The point estimators of population parameters ($\bar{X}$ and $\bar{p}$ in our case) are random variables, and they follow a normal distribution (at least approximately). Their expected values are the values of the population parameters, and their variances/standard deviations are in the given form.

We can consider each point estimate to be one of the values of a random variable that follows a normal distribution.

We want to use our point estimate to find the population parameter.

However, the probability that our point estimate actually IS the population parameter is VERY small.

Therefore, the conclusion is that point estimates are not very helpful in finding the population parameter if used ALONE.

We need to take a different strategy. Suppose we play a game: I have one number in my mind, with some restrictions, and you will guess what that number is.

A. Your guess can only be ONE number.
B. Your guess could be a range (or interval) of numbers.

In which case do you think you have a better chance of getting the number?

Clearly, B gives you a better chance.

Let’s do that in our estimation here. Naturally, that strategy is called Interval Estimation.

Idea of interval estimation.

Form of interval estimation:

Point Estimate ± Margin of Error

That is usually called a confidence interval.

Recall some of the poll results of the presidential election: “someone is leading at 52% with a margin of error of 3%”.

Two things we need to know about interval estimation: 1. How to do it. 2. How to use it.

Let’s start with the second one, which is true for all types of interval estimation.

We use the term “confidence” interval, which means we should have some confidence in the interval estimation we come up with.

How do we quantify “confidence”?

Remember that if we draw many, many simple random samples (SRS) and calculate the point estimate for each one of them, the mean of those point estimates should be the population parameter of interest.

The same idea applies to confidence intervals.

If we take many, many SRS and calculate the confidence interval around the point estimate from each sample, we expect a certain proportion of those confidence intervals to cover the true population parameter.

That “proportion” is our measure of confidence.

Usually, we use 95% as our confidence level.

Now, how to calculate confidence intervals:

For the sample mean, $\bar{X}$, the confidence interval is in the form of:

$$\bar{X} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

where $\bar{X}$ is the sample mean, $n$ is the sample size, and $\sigma$ is the population standard deviation.

Now let’s look at $Z_{\alpha/2}$; that is how we take “confidence” into account.

We assume the sample mean is normally distributed, or at least approximately normally distributed.

With 95% probability, the sample mean falls within the following distance of the population mean $\mu$:

$$Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

Then, if a point estimate falls in the central 95% of that distribution around $\mu$, the interval built around the point estimate will cover $\mu$.

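Next question: how do we decide $Z_{\alpha/2}$? If we are interested in 95%, then the remaining 5% ($\alpha = 0.05$) is outside the interval. Taking into account that the normal distribution is symmetric, that 5% is split into 2.5% in each tail, so we should use the cutoff at 97.5%.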

Therefore, a 95% confidence interval of $\mu$ is:

$$\bar{X} \pm 1.96\,\frac{\sigma}{\sqrt{n}}$$

Also, we can try other confidence levels. For example, if we want a 99% confidence interval, we should use 2.576, and if we want a 90% confidence interval, we should use 1.645.
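The cutoffs quoted above can be checked numerically. The following is a minimal sketch using Python's standard library; the helper name `z_critical` is our own and is not part of the original slides.

```python
from statistics import NormalDist

def z_critical(level):
    """Two-sided cutoff Z_{alpha/2} for a given confidence level (e.g. 0.95)."""
    alpha = 1 - level
    # Symmetric normal: alpha/2 in each tail, so look up the 1 - alpha/2 quantile.
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
# 90% confidence: z = 1.645
# 95% confidence: z = 1.960
# 99% confidence: z = 2.576
```

For 95%, the lookup is at the 97.5% quantile, which matches the cutoff described earlier.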

Example: Average GPA of management students.

Using the previous example, if we know that the standard deviation of GPA among management students is 0.8, create a 95% confidence interval for the estimate we got in our sample. How about a 90%?
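As a sketch of how this calculation might go: the standard deviation 0.8 comes from the example, but the sample mean and sample size from the previous example are not repeated here, so the values `x_bar = 3.0` and `n = 64` below are purely hypothetical placeholders. The function name `mean_confidence_interval` is also our own.

```python
from statistics import NormalDist

def mean_confidence_interval(x_bar, sigma, n, level=0.95):
    """Interval x_bar +/- Z_{alpha/2} * sigma / sqrt(n), with sigma known."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sigma / n ** 0.5
    return x_bar - margin, x_bar + margin

# HYPOTHETICAL sample results (not from the slides): replace with your own.
x_bar, n = 3.0, 64
sigma = 0.8  # population standard deviation given in the example

for level in (0.95, 0.90):
    low, high = mean_confidence_interval(x_bar, sigma, n, level)
    print(f"{level:.0%} CI: ({low:.3f}, {high:.3f})")
```

With the same sample, the 90% interval comes out narrower than the 95% interval, because its cutoff (1.645) is smaller than 1.96.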

Some observations:

1. Given that a sample has been drawn, the margin of error, $Z_{\alpha/2}\,\sigma/\sqrt{n}$, depends entirely on the confidence level. The higher the confidence level, the larger the margin of error.

2. The width of the confidence interval also depends on the confidence level: the higher the confidence level, the wider the confidence interval.

Interpretation of confidence interval (at 95%, for example):

Basically, if we draw many samples and calculate the confidence interval for each, 95% of those confidence intervals will cover the true population parameter.

Or, since each sample is equally likely to be drawn, we can say that the confidence interval we are about to build has a 95% chance of covering the true population parameter.

A confidence interval is something that we calculate after we draw a sample and calculate the point estimate.
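The "many samples" interpretation can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population is taken to be normal with a mean of 3.0 and a standard deviation of 0.8 (illustrative values we chose), each sample has 30 observations, and the interval is the known-sigma form $\bar{X} \pm Z_{\alpha/2}\,\sigma/\sqrt{n}$. The function name `coverage_rate` is hypothetical.

```python
import random
from statistics import NormalDist, fmean

def coverage_rate(mu, sigma, n, level=0.95, n_samples=2000, seed=0):
    """Fraction of simulated confidence intervals that cover the true mean mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sigma / n ** 0.5
    covered = 0
    for _ in range(n_samples):
        # Draw one simple random sample from the assumed normal population.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)
        if x_bar - margin <= mu <= x_bar + margin:
            covered += 1
    return covered / n_samples

# ASSUMED population values, chosen only for illustration.
print(coverage_rate(mu=3.0, sigma=0.8, n=30))
```

The printed fraction should land close to 0.95; it will not be exactly 0.95 because only a finite number of samples is simulated.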

You may also ask: what if we do NOT have the population variance/standard deviation?

What if we want a confidence interval for the sample proportion? It is very similar:

$$\bar{p} \pm Z_{\alpha/2}\,\sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$$

And everything else follows as it is for the sample mean.

Example: if we want to find a 95% confidence interval for the proportion of management students whose GPA is higher than 2.8, what shall we do and how shall we interpret it?
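One possible way to carry out this calculation is sketched below. The example does not give sample results, so the values `p_bar = 0.6` and `n = 100` are hypothetical placeholders, and the function name `proportion_confidence_interval` is our own.

```python
from statistics import NormalDist

def proportion_confidence_interval(p_bar, n, level=0.95):
    """Interval p_bar +/- Z_{alpha/2} * sqrt(p_bar * (1 - p_bar) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * (p_bar * (1 - p_bar) / n) ** 0.5
    return p_bar - margin, p_bar + margin

# HYPOTHETICAL sample: 60 of 100 sampled students have a GPA above 2.8.
low, high = proportion_confidence_interval(p_bar=0.6, n=100)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

The interpretation is the same as for the mean: if we repeated the sampling many times, about 95% of the intervals built this way would cover the true population proportion.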

So far we have assumed that you have the power to make and carry out the decisions of sampling scheme and data collection.

Unfortunately, we do not have that power or resource most of the time.

In real research, we are always faced with the question, how large do you think your sample should be to get a desired margin of error?

If the population parameter of interest is the population mean, we will assume that we know the population standard deviation and use the following formula:

$$n = \frac{(Z_{\alpha/2})^2\,\sigma^2}{E^2}$$

where $E$ is the desired margin of error.

If our interest is in the population proportion, then use the following formula:

$$n = \frac{(Z_{\alpha/2})^2\,p(1-p)}{E^2}$$

How to find $p$?

1. Use other people’s results;
2. Use a pilot study;
3. Use judgment;
4. Start with $p = 0.5$.

Example: if you are interested in the average GPA of management students and you know that the standard deviation of GPA among management students is 0.8. You want your estimate to be off by at most 0.4. How will you determine the number of students to collect information from?
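A minimal sketch of this calculation follows, using $n = (Z_{\alpha/2})^2\sigma^2/E^2$. The example does not state a confidence level, so 95% is assumed here; the function name `sample_size_for_mean` is our own.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, level=0.95):
    """Smallest n such that Z_{alpha/2} * sigma / sqrt(n) <= margin."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # Round up: a fractional student cannot be sampled, and rounding down
    # would leave the margin of error slightly above the target.
    return math.ceil((z * sigma / margin) ** 2)

# sigma = 0.8 and E = 0.4 come from the example; the 95% level is an assumption.
print(sample_size_for_mean(sigma=0.8, margin=0.4))  # 16
```

The raw formula gives about 15.4, which is rounded up to 16 students.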

Example: You are interested in how many chocolate beans there are in an M&M packet. Someone from the factory told you that the standard deviation of the number of chocolate beans is 10. How many packs of M&M will you have to sample to get an estimate that is within 3 beans of the actual mean number of beans per packet? What if you change your desired margin of error to 20?
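As a worked sketch, assuming a 95% confidence level (the example does not state one), so that $Z_{\alpha/2} = 1.96$, with $\sigma = 10$ and $E = 3$:

$$n = \frac{(Z_{\alpha/2})^2\,\sigma^2}{E^2} = \frac{(1.96)^2 (10)^2}{3^2} = \frac{384.16}{9} \approx 42.7 \;\Rightarrow\; n = 43$$

With the margin of error relaxed to $E = 20$:

$$n = \frac{384.16}{20^2} = \frac{384.16}{400} \approx 0.96 \;\Rightarrow\; n = 1$$

A margin of error that is twice the standard deviation is so loose that, under these assumptions, a single packet already satisfies it.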

Example: Again, you are interested in the proportion of management students with a GPA higher than 2.8. You heard from someone who has done this study before that the proportion in her study was 60%. If you decide to be off by no more than 5%, how many students should you include in your sample?
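This one can be sketched with $n = (Z_{\alpha/2})^2\,p(1-p)/E^2$, taking the earlier study's 60% as the planning value for $p$. A 95% confidence level is assumed, since the example does not state one, and the function name `sample_size_for_proportion` is our own.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(p, margin, level=0.95):
    """Smallest n such that Z_{alpha/2} * sqrt(p * (1 - p) / n) <= margin."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

# p = 0.60 and E = 0.05 come from the example; the 95% level is an assumption.
print(sample_size_for_proportion(p=0.60, margin=0.05))  # 369

# With no prior information, p = 0.5 is the most conservative choice.
print(sample_size_for_proportion(p=0.50, margin=0.05))  # 385
```

Starting with $p = 0.5$ always gives the largest required sample, because $p(1-p)$ is at its maximum there.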