n or n-1 for sample standard deviation?

2822309062

New Member
Joined
Nov 8, 2019
Messages
3
Gender
Male
HSC
2020
So I was doing some 2020 school trial papers and I found that none of them included sqrt(p(1-p)/(n-1)) as the answer for the sample standard deviation. Just wondering what you guys think: would n-1 be safer in the HSC, or will n be fine?
 

Trebla

Administrator
Joined
Feb 16, 2005
Messages
8,118
Gender
Male
HSC
2006
I would be interested to see what the questions actually ask in those papers because there is a nuance here that I don't think is easily understood.

If X is a B(n,p) random variable to model the sampling distribution of n trials of independent Bernoulli random variables, then if we define

$$\hat{p} = \frac{X}{n}$$

this means that

$$\operatorname{Var}(\hat{p}) = \frac{p(1-p)}{n}$$

Notice that p is actually a parameter describing the 'population' in relation to each Bernoulli trial. In other words, if p is found directly from the problem (as the probability of success) rather than being estimated from relative frequencies in a sample, then you’re not working with a “sample” variance per se.
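
A quick way to see this numerically is the minimal Python sketch below. The function name `simulate_proportions` and the values n = 100 and p = 0.1 are illustrative assumptions, not taken from any exam question. It simulates many samples of n Bernoulli trials with a known p, then compares the spread of the resulting sample proportions with sqrt(p(1-p)/n).

```python
import math
import random
import statistics

def simulate_proportions(n, p, reps, seed=0):
    """Return `reps` simulated sample proportions, each from n Bernoulli(p) trials."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

n, p = 100, 0.1  # illustrative values only (assumption)
props = simulate_proportions(n, p, reps=20000)

theoretical_sd = math.sqrt(p * (1 - p) / n)  # uses the known population parameter p
empirical_sd = statistics.pstdev(props)      # spread of the simulated proportions
print(theoretical_sd, empirical_sd)
```

The two printed values should be close to each other, which suggests that when p is given as a known probability, the divisor in the standard deviation of the proportion is n.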
 

2822309062

That's interesting, I thought sample variance is a general reference to whenever we are dealing with samples.
So, for example, suppose a factory that only produces pens makes 1000 pens (the population) and 100 are chosen as a sample, and the question asks for the standard deviation of the sampling distribution.

If an overall faulty rate of 10% is given for the whole population of 1000 pens, would it be appropriate to use 10% as the probability to calculate the "sample" standard deviation for the sample, forming [formula missing]? Otherwise, is [formula missing] the population standard deviation, since you mentioned that "then you’re not working with a “sample” variance per se"? And if this is the population standard deviation, wouldn't it be different from [formula missing], which should also represent the population standard deviation?

Moreover, if the overall faulty rate of 10% is disregarded, and independent testing of the 100 sampled pens finds a fault rate of 5%, should the "sample" standard deviation then be [formula missing]?

I am a bit confused now and might have misinterpreted your message. I would really appreciate it if you could reply.
Many thanks
 

Trebla

The value of the sample variance is the spread of the data within the sample. It will always vary depending on what sample you get. It is a realisation of an experiment so it shouldn't really have any probabilities in it. In your example, say you take a sample of 100 pens out of 1000 pens and see how many are faulty. It doesn't make sense that in every sample of 100 pens you will get exactly 10 pens that are faulty. You may get 9 or 12 pens etc as it depends on your sample.
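
To illustrate how the count changes from sample to sample, here is a rough Python sketch of the pen example. It assumes exactly 100 of the 1000 pens are faulty (the 10% rate) and that each sample of 100 is drawn without replacement. The function name `faulty_counts` and the seed are hypothetical choices for illustration.

```python
import random

def faulty_counts(population_size, n_faulty, sample_size, reps, seed=1):
    """Count faulty pens in each of `reps` random samples drawn without replacement."""
    rng = random.Random(seed)
    # 1 marks a faulty pen, 0 a good one
    population = [1] * n_faulty + [0] * (population_size - n_faulty)
    return [sum(rng.sample(population, sample_size)) for _ in range(reps)]

counts = faulty_counts(population_size=1000, n_faulty=100, sample_size=100, reps=10)
print(counts)  # one faulty-pen count per sample; these typically differ
```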

Perhaps worth looking at this from first principles. Let each of $X_1, X_2, \ldots, X_n$ be Bernoulli trials. Denote $x_1, x_2, \ldots, x_n$ as the realisations (or data points) of these random variables. These are just a bunch of 0s and 1s where 0 represents failure and 1 represents success.

Suppose that out of the n trials, we get m successes (m<n) which is our particular sample. This means that our sample data has m lots of 1s and n-m lots of 0s so

$$\sum_{i=1}^{n} x_i = m$$

hence the sample mean is simply

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{m}{n}$$

The sample variance is given by

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

After some algebra simplifying you get

$$s^2 = \frac{m(n-m)}{n(n-1)}$$

which means that

$$s^2 = \frac{n}{n-1}\,\bar{x}(1-\bar{x})$$

Notice that your sample mean is in fact your sample proportion. At no point did we use the probability of success in the context of the sample but rather the relative frequency in the sample.
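
As a sanity check, the short Python sketch below builds a sample of 0s and 1s and computes its sample variance directly. It assumes the sample variance uses the n-1 divisor, and it borrows the hypothetical figure of 5 faulty pens out of 100 from the example earlier in the thread. The helper name `sample_variance_from_counts` is made up for illustration.

```python
import statistics

def sample_variance_from_counts(n, m):
    """Closed form n/(n-1) * xbar * (1 - xbar) for m ones among n zero/one values."""
    xbar = m / n  # sample mean, which is also the sample proportion
    return n / (n - 1) * xbar * (1 - xbar)

n, m = 100, 5  # hypothetical sample: 5 faulty pens out of 100
data = [1] * m + [0] * (n - m)

s2_direct = statistics.variance(data)          # divides by n - 1
s2_closed = sample_variance_from_counts(n, m)  # closed form in terms of the sample mean
v_divide_by_n = statistics.pvariance(data)     # divides by n, equals xbar * (1 - xbar)
print(s2_direct, s2_closed, v_divide_by_n)
```

The first two values agree, and neither uses a probability of success, only the relative frequency m/n observed in the sample.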

This is different to the variance of the random variable which uses probabilities and distribution functions to measure its spread. This is what most questions typically ask about.
 
