15 estimation and sample size

14-04-2012


Research Methodology Workshop

Dr. Nimit Chowdhary, Professor

Saturday, April 14, 2012. © Dr. Nimit Chowdhary

We want to know the behavior (characteristics) of a population.

We draw samples from this population.

We make conclusions about the population parameters based on sample statistics.

A characteristic of the sample is an estimate of the corresponding characteristic of the population.


Estimation

There are two types of estimate: the point estimate and the interval estimate.


A point estimate uses a single sample value to estimate the desired population parameter.

Example: The sample mean is a point estimate of the population mean.



Unbiasedness: An estimator is unbiased if its expected value (the average of the statistic over all possible samples) is equal to the population parameter.

Consistency: An estimator is consistent if the statistic approaches the population parameter as the sample size increases and approaches the population size.

Efficiency: An estimator is efficient if its value remains stable from sample to sample. The best estimator shows the least variation between samples.

Sufficiency: An estimator is sufficient if it uses all the information about the population parameter contained in the sample (the mean uses all the information while the median does not).
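
The efficiency property can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population (normal, mean 50, s.d. 10), the sample size of 25, the number of trials, and the helper name `estimator_spread` are all hypothetical choices, not part of the workshop material. It draws many samples and compares how much the sample mean and the sample median vary from sample to sample.

```python
import random
import statistics


def estimator_spread(estimator, n=25, trials=2000, seed=0):
    """Return (average, variance) of an estimator over repeated samples.

    Samples are drawn from a hypothetical normal population with
    mean 50 and standard deviation 10 (illustrative values only).
    """
    rng = random.Random(seed)
    estimates = [
        estimator([rng.gauss(50.0, 10.0) for _ in range(n)])
        for _ in range(trials)
    ]
    return statistics.mean(estimates), statistics.variance(estimates)


# Same seed, so both estimators see exactly the same samples.
mean_avg, mean_var = estimator_spread(statistics.mean)
median_avg, median_var = estimator_spread(statistics.median)

print("sample mean:   average", round(mean_avg, 2), "variance", round(mean_var, 2))
print("sample median: average", round(median_avg, 2), "variance", round(median_var, 2))
```

For a normal population both estimators centre on the population mean, but the sample mean should show the smaller sample-to-sample variance, which is what makes it the more efficient of the two here.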



A point estimate may not exactly locate the population parameter.

It does not indicate how far the estimate is from the true value.

It does not specify how confident we can be that the estimate is close to the population parameter.


Instead of a point value we can indicate a range as an estimate.

We can be reasonably confident that the true parameter will lie in this range



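The sampling distribution of the sample mean is normally distributed with a mean of $\mu$ and a standard deviation of $\sigma_{\bar{x}}$.

[Figure: normal curve centred on $\bar{X}$, with limits $x_1 = \bar{x} - Z\sigma_{\bar{x}}$ and $x_2 = \bar{x} + Z\sigma_{\bar{x}}$ on either side.]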

Suppose we want to find out a confidence interval around the sample mean within which the population mean is expected to lie 95% of the time.

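[Figure: normal curve with the central 95% of the area (47.5% on each side of the mean) lying between $x_1$ and $x_2$, and 2.5% in each tail.]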



This can be interpreted as:

If all possible samples of size n were taken, then on average 95% of these samples would include the population mean within the interval around their sample means bounded by x1 and x2.

If we took a random sample of size n from a given population, the probability is 0.95 that the population mean would lie within the interval bounded by x1 and x2 around the sample mean.
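
The repeated-sampling interpretation above can be checked by simulation. The following is a minimal sketch, assuming a hypothetical normal population (mean 27, s.d. 8) and a sample size of 100; these values and the function name `coverage` are illustrative assumptions. It counts how often the interval around each sample mean contains the true population mean.

```python
import random
from math import sqrt


def coverage(mu=27.0, sigma=8.0, n=100, z=1.96, trials=2000, seed=1):
    """Fraction of samples whose interval xbar +/- z*sigma/sqrt(n) contains mu."""
    rng = random.Random(seed)
    standard_error = sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n and compute its mean.
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - z * standard_error <= mu <= xbar + z * standard_error:
            hits += 1
    return hits / trials


rate = coverage()
print("share of intervals containing the population mean:", rate)
```

With Z = 1.96 the share should come out close to 0.95, though it will not be exactly 0.95 for any finite number of trials.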


This can be interpreted as:

If a random sample of size n was taken from a given population, we can be 95% confident in our assertion that the population mean will lie around the sample mean in the interval bounded by the values x1 and x2 as shown (also known as the 95% confidence interval).

At the 95% confidence level, the z score taken from the z score table is 1.96.
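
The table lookup can also be done numerically. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the helper name `z_for_confidence` is an illustrative assumption. It splits the area outside the confidence level equally between the two tails and inverts the standard normal distribution.

```python
from statistics import NormalDist


def z_for_confidence(level):
    """Two-sided critical z value for a confidence level such as 0.95."""
    tail = (1.0 - level) / 2.0          # area left in each tail
    return NormalDist().inv_cdf(1.0 - tail)


z95 = z_for_confidence(0.95)
z99 = z_for_confidence(0.99)
print("95%:", round(z95, 2))
print("99%:", round(z99, 2))
```

The values agree with the z table to two decimal places (1.96 for 95%, and 2.58 for 99%).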



The sponsor of a TV programme targeted at the children’s market (age 4-10 years) wants to find out the average amount of time children spend watching TV. A random sample of 100 children indicated the average time spent by these children watching TV per week to be 27.2 hours. From previous experience, the population s.d. of the weekly extent of TV watched ($\sigma$) is known to be 8 hours. A confidence level of 95% is considered to be adequate.


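[Figure: normal curve centred on $\bar{X} = 27.2$ with $\sigma = 8$; the limits $x_1$ and $x_2$ lie $1.96\,\sigma_{\bar{x}}$ on either side of the mean.]

Confidence interval is given by

$\bar{x} \pm Z\sigma_{\bar{x}}$

So $\mu$ must lie between

$\bar{x} - Z\sigma_{\bar{x}}$ and $\bar{x} + Z\sigma_{\bar{x}}$

also,

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$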



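Therefore, in our case

$\bar{x} = 27.2,\quad Z = 1.96,\quad \sigma = 8,\quad n = 100$

Hence,

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{8}{\sqrt{100}} = 0.8$

Therefore the confidence interval is

$\bar{x} \pm Z\sigma_{\bar{x}} = 27.2 \pm 1.96 \times 0.8 = 27.2 \pm 1.568 = (25.632,\ 28.768)$

This means that we can conclude with 95% confidence that a child on average spends between 25.632 and 28.768 hours per week watching television.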
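
The arithmetic of this example can be reproduced in a few lines. A minimal sketch, assuming the population s.d. is known; the function name `mean_confidence_interval` is an illustrative choice, and the inputs are the figures given in the example (sample mean 27.2, s.d. 8, n = 100, Z = 1.96).

```python
from math import sqrt


def mean_confidence_interval(xbar, sigma, n, z):
    """Interval xbar +/- z * sigma / sqrt(n) for a known population s.d."""
    standard_error = sigma / sqrt(n)   # s.d. of the sampling distribution
    margin = z * standard_error
    return xbar - margin, xbar + margin


lower, upper = mean_confidence_interval(xbar=27.2, sigma=8.0, n=100, z=1.96)
print(round(lower, 3), round(upper, 3))
```

The computed limits are 25.632 and 28.768 hours, matching the interval derived from the formula.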


Calculate the confidence interval in the previous example, if we want to increase our confidence level from 95% to 99%. Other values remain the same.

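[Figure: normal curve centred on $\bar{X} = 27.2$ with $\sigma = 8$; the central 99% of the area (49.5% on each side of the mean) lies within $2.58\,\sigma_{\bar{x}}$ of the mean, with 0.5% in each tail.]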



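As in the previous case, but now with the z score for 99% confidence,

$\bar{x} = 27.2,\quad Z = 2.58,\quad \sigma = 8,\quad n = 100$

Hence,

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{8}{\sqrt{100}} = 0.8$

Therefore the confidence interval is

$\bar{x} \pm Z\sigma_{\bar{x}} = 27.2 \pm 2.58 \times 0.8 = 27.2 \pm 2.064 = (25.136,\ 29.264)$

This means that we can conclude with 99% confidence that a child on average spends between 25.136 and 29.264 hours per week watching television. Note that the limits have to be spread further apart to achieve a higher level of confidence.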


In situations when the population variation (s.d.) is unknown, and when the sample size is reasonably large (30 or more), we can approximate the population standard deviation ($\sigma$) by the sample standard deviation ($s$), so that the confidence interval

$\bar{x} \pm Z\sigma_{\bar{x}}$

is approximated by the interval, when $n \ge 30$,

$\bar{x} \pm Z s_{\bar{x}}$

Where,

$s_{\bar{x}} = \dfrac{s}{\sqrt{n}}$



Since we are interested in the distribution of means, the dispersion of the distribution of sample means can be estimated from the sample standard deviation. We have assumed that the sample s.d. is approximately equal to the population s.d. for n ≥ 30.

$s_{\bar{x}} = \dfrac{s}{\sqrt{n}}$


It is desired to estimate the average age of students who graduate with an MBA degree in the university system. A random sample of 64 graduating students showed that the average age was 27 years with a standard deviation of 4 years.



Construct a 95% confidence interval estimate of the true average (population mean) age of all such graduating students at the university.

How would the confidence interval limits change if the confidence level was increased from 95% to 99%?

Since the sample size (n = 64) is sufficiently large, we can approximate the population standard deviation by the sample standard deviation.

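[Figure: normal curve centred on $\bar{X} = 27$ with $s = 4$; the limits $x_1$ and $x_2$ lie $1.96\,s_{\bar{x}}$ on either side of the mean.]

$s_{\bar{x}} = \dfrac{s}{\sqrt{n}} = \dfrac{4}{\sqrt{64}} = 0.5$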



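95% confidence interval of population mean is given by:

$\bar{x} \pm Z s_{\bar{x}} = 27 \pm 1.96 \times 0.5 = (26.02,\ 27.98)$

Hence, $26.02 \le \mu \le 27.98$

For 99% confidence, Z = 2.58

$\bar{x} \pm Z s_{\bar{x}} = 27 \pm 2.58 \times 0.5 = (25.71,\ 28.29)$

Hence, $25.71 \le \mu \le 28.29$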
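
The same calculation with the sample s.d. standing in for the population s.d. can be sketched as follows. The function name `large_sample_interval` and the guard on n are illustrative assumptions; the guard simply encodes the n ≥ 30 condition stated for this approximation. The inputs are the figures from the MBA example (mean 27, s = 4, n = 64).

```python
from math import sqrt


def large_sample_interval(xbar, s, n, z):
    """Interval xbar +/- z * s / sqrt(n), using s in place of sigma."""
    if n < 30:
        # The approximation of sigma by s is only assumed for n >= 30.
        raise ValueError("large-sample approximation assumes n >= 30")
    standard_error = s / sqrt(n)
    return xbar - z * standard_error, xbar + z * standard_error


results = {}
for label, z in (("95%", 1.96), ("99%", 2.58)):
    low, high = large_sample_interval(xbar=27.0, s=4.0, n=64, z=z)
    results[label] = (low, high)
    print(label, round(low, 2), round(high, 2))
```

Raising the confidence level from 95% to 99% widens the interval from (26.02, 27.98) to (25.71, 28.29) years.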


A sample is studied to infer the population parameters.

The more variation there is in the population, the bigger the sample required to estimate its parameters.

The bigger the sample, the greater our confidence in the estimate.



The choice of sample size depends upon two things:

The degree of accuracy we require in our estimate.

The degree of confidence we want that the error in the estimate remains within the desired degree of accuracy.

Ideally sample mean should be equal to the population mean

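If the entire population is taken as a sample, then $\bar{X}$ would be equal to $\mu$.

For a sampling exercise, $(\bar{X} - \mu)$ can be considered as the error or deviation of the estimator from the population mean.

We know that:

$Z = \dfrac{\bar{X} - \mu}{\sigma_{\bar{X}}}$

But,

$(\bar{X} - \mu) = E$ and $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$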


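$Z = \dfrac{E}{\sigma/\sqrt{n}} = \dfrac{E\sqrt{n}}{\sigma}$

$\sqrt{n} = \dfrac{Z\sigma}{E}$

$n = \dfrac{Z^2\sigma^2}{E^2}$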

Therefore sample size depends upon:

Confidence level desired, $Z$

Maximum error allowed, $E$

Variability of the population, $\sigma$


We will need a larger sample if:

We want a smaller error (want to be more accurate!),

We want to be more confident, and/or

The population variance is larger.
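
These three effects can be seen by varying one input at a time in the sample size formula n = Z²σ²/E². The sketch below is illustrative only: the function name `required_n` is an assumption, and the baseline values (Z = 1.96, σ = 3, E = 1) are chosen for illustration. The result is rounded up because a sample size must be a whole number.

```python
from math import ceil


def required_n(z, sigma, error):
    """Smallest whole sample size satisfying n >= (z * sigma / error) ** 2."""
    return ceil((z * sigma / error) ** 2)


baseline = required_n(z=1.96, sigma=3.0, error=1.0)
smaller_error = required_n(z=1.96, sigma=3.0, error=0.5)   # halve the error
more_confident = required_n(z=2.58, sigma=3.0, error=1.0)  # 99% instead of 95%
more_variable = required_n(z=1.96, sigma=6.0, error=1.0)   # double the s.d.

print("baseline:", baseline)
print("smaller error:", smaller_error)
print("more confident:", more_confident)
print("more variable population:", more_variable)
```

Because n grows with the square of Z and σ and with the inverse square of E, halving the allowable error or doubling the population s.d. roughly quadruples the required sample.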



We would like to know the average time that a child spends watching television over the weekend. We want our estimate to be within ± 1 hour of the true population average. (This means that the maximum allowable error is 1 hour.) Previous studies have shown the population s.d. to be 3 hours. What sample size should be taken for this purpose, if we want to be 95% confident that the error in our estimate will not exceed the maximum allowable error?


For the 95% confidence level, the values are

$Z = 1.96$

$E = 1$ hour (given)

$\sigma = 3$ hours (given)

then,

$n = \dfrac{Z^2\sigma^2}{E^2} = \dfrac{(1.96)^2 (3)^2}{(1)^2} = 34.57 \approx 35$
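
The rounding step in this example deserves a quick check: the result is rounded up to 35 rather than down to 34. A minimal sketch, with the hypothetical helper names `sample_size` and `margin_of_error`, computes the raw value and then confirms which whole sample size keeps the error within 1 hour.

```python
from math import ceil, sqrt


def sample_size(z, sigma, error):
    """Return the raw value of (z * sigma / error) ** 2 and its ceiling."""
    raw = (z * sigma / error) ** 2
    return raw, ceil(raw)


def margin_of_error(z, sigma, n):
    """Maximum error z * sigma / sqrt(n) achieved with a sample of size n."""
    return z * sigma / sqrt(n)


raw, n = sample_size(z=1.96, sigma=3.0, error=1.0)
print("raw value:", round(raw, 2), "-> sample size:", n)
print("error with n = 34:", round(margin_of_error(1.96, 3.0, 34), 4))
print("error with n = 35:", round(margin_of_error(1.96, 3.0, 35), 4))
```

With 34 children the error at 95% confidence would slightly exceed 1 hour, so 35 is the smallest sample that meets the requirement.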
