15 estimation and sample size



Research Methodology
Dr. Nimit Chowdhary, Professor

Research Methodology Workshop, Saturday, April 14, 2012
© Dr. Nimit Chowdhary

We want to know the behavior (characteristics) of a population.

We draw samples from this population.

We draw conclusions about the population parameters based on sample statistics.

A characteristic of the sample is an estimate of the corresponding characteristic of the population.


Estimation

Point estimate
Interval estimate


A point estimate uses a single sample value to estimate the desired population parameter.

Example: The sample mean is a point estimate of the population mean.


Unbiasedness: The expected value of the estimator (a statistic derived from the sample) is equal to the population parameter.

Consistency: The statistic approaches the population parameter as the sample size increases towards the population size.


Efficiency: The value of the estimate remains stable from sample to sample. The best estimator shows the least sample-to-sample variation.

Sufficiency: The estimator uses all the information about the population parameter contained in the sample (the mean uses every observation, while the median does not).
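The properties above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the normal population (mean 50, s.d. 10), the sample size of 25, and the function name `simulate_estimators` are all hypothetical choices made for illustration, not part of the workshop material.

```python
import random
import statistics

def simulate_estimators(pop_mean=50.0, pop_sd=10.0, n=25, repetitions=2000, seed=1):
    """Draw repeated samples and record the mean and the median of each one."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means, medians = [], []
    for _ in range(repetitions):
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return means, medians

means, medians = simulate_estimators()

# Unbiasedness: the average of the sample means should sit close to 50.
print("average of sample means:", round(statistics.fmean(means), 2))

# Efficiency: the estimator with the smaller sample-to-sample spread is preferred.
print("s.d. of sample means:  ", round(statistics.stdev(means), 2))
print("s.d. of sample medians:", round(statistics.stdev(medians), 2))
```

For a normal population the sample means should vary less from sample to sample than the sample medians, which is what efficiency refers to.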


A point estimate may not exactly locate the population parameter.

It does not indicate how far the estimate is from the true value.

It does not specify how confident we can be that the estimate is close to the population parameter.


Instead of a point value, we can give a range as the estimate.

We can then be reasonably confident that the true parameter lies in this range.


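The sampling distribution of the sample mean is normally distributed with a mean of $\mu$ and a standard deviation of $\sigma_{\bar{x}}$.

The interval around the sample mean is $\bar{x} \pm Z\sigma_{\bar{x}}$, with limits $x_1 = \bar{x} - Z\sigma_{\bar{x}}$ and $x_2 = \bar{x} + Z\sigma_{\bar{x}}$.

[Figure: normal curve centred on $\bar{X}$ with interval limits $x_1$ and $x_2$.]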


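Suppose we want to find a confidence interval around the sample mean within which the population mean is expected to lie 95% of the time.

[Figure: normal curve centred on $\bar{X}$ with limits $x_1$ and $x_2$; the central 95% of the area (47.5% on each side of the mean) lies between the limits, leaving 2.5% in each tail.]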


This can be interpreted as:

If all possible samples of size n were taken, then on average 95% of these samples would include the population mean within the interval around their sample means bounded by x1 and x2.

If we take a random sample of size n from a given population, the probability is 0.95 that the interval from x1 to x2 constructed around the sample mean will contain the population mean.
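The repeated-sampling interpretation can be checked by simulation. The sketch below assumes a hypothetical normal population (mean 27, s.d. 8) and samples of size 100; these values and the function name `coverage` are illustrative assumptions. It counts how often the interval around each sample mean contains the population mean.

```python
import math
import random
import statistics

def coverage(pop_mean=27.0, pop_sd=8.0, n=100, z=1.96, repetitions=2000, seed=7):
    """Return the fraction of simulated samples whose interval contains pop_mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    standard_error = pop_sd / math.sqrt(n)
    hits = 0
    for _ in range(repetitions):
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        sample_mean = statistics.fmean(sample)
        x1 = sample_mean - z * standard_error  # lower limit
        x2 = sample_mean + z * standard_error  # upper limit
        if x1 <= pop_mean <= x2:
            hits += 1
    return hits / repetitions

print("share of intervals containing the population mean:", coverage())
```

The share should come out close to 0.95, although it will not be exactly 0.95 for a finite number of repetitions.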


This can also be interpreted as:

If a random sample of size n is taken from a given population, we can be 95% confident in our assertion that the population mean lies around the sample mean in the interval bounded by the values x1 and x2 as shown (also known as the 95% confidence interval).

At the 95% confidence level, the value of the z score taken from the z score table is 1.96.
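Instead of a printed table, the z score can be computed. A minimal sketch, assuming Python 3.8 or later for `statistics.NormalDist`; the helper name `z_score` is a hypothetical choice.

```python
from statistics import NormalDist

def z_score(confidence_level):
    """Two-sided critical value of the standard normal distribution."""
    tail = (1 - confidence_level) / 2      # area left in each tail, e.g. 0.025 for 95%
    return NormalDist().inv_cdf(1 - tail)  # z with (1 - tail) of the area to its left

print(round(z_score(0.95), 2))  # 1.96
print(round(z_score(0.99), 2))  # 2.58
```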


The sponsor of a TV programme targeted at the children’s market (age 4-10 years) wants to find out the average amount of time children spend watching TV. A random sample of 100 children indicated the average time spent by these children watching TV per week to be 27.2 hours. From previous experience, the population s.d. of the weekly extent of TV watched ($\sigma$) is known to be 8 hours. A confidence level of 95% is considered to be adequate.


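[Figure: normal curve centred on $\bar{X} = 27.2$ with $\sigma = 8$; the limits $x_1$ and $x_2$ lie $1.96\,\sigma_{\bar{x}}$ on either side of the mean.]

The confidence interval is given by

$$\bar{x} \pm Z\sigma_{\bar{x}}$$

So $\mu$ must lie between $\bar{x} - Z\sigma_{\bar{x}}$ and $\bar{x} + Z\sigma_{\bar{x}}$.

Also,

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$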


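Therefore, in our case,

$\bar{X} = 27.2$, $Z = 1.96$, $\sigma = 8$, $n = 100$

Hence,

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{8}{\sqrt{100}} = 0.8$$

Therefore the confidence interval is

$$\bar{x} \pm Z\sigma_{\bar{x}} = 27.2 \pm 1.96 \times 0.8 = 27.2 \pm 1.568 = (25.632,\ 28.768)$$

This means that we can conclude with 95% confidence that a child on average spends between 25.632 and 28.768 hours per week watching television.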
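The arithmetic of the TV-viewing example can be reproduced in a few lines. This is a sketch rather than a prescribed method; the function name `confidence_interval` and its parameter names are assumptions made for illustration.

```python
import math

def confidence_interval(sample_mean, sigma, n, z):
    """Interval for the population mean when the population s.d. (sigma) is known."""
    standard_error = sigma / math.sqrt(n)  # s.d. of the sampling distribution of the mean
    margin = z * standard_error            # half-width of the interval
    return sample_mean - margin, sample_mean + margin

# Values from the example: mean 27.2 hours, sigma 8 hours, n = 100, Z = 1.96.
lower, upper = confidence_interval(sample_mean=27.2, sigma=8, n=100, z=1.96)
print(round(lower, 3), round(upper, 3))  # 25.632 28.768
```

Passing a larger Z, such as 2.58 for 99% confidence, widens the interval while the standard error stays the same.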


Calculate the confidence interval in the previous example, if we want to increase our confidence level from 95% to 99%. Other values remain the same.

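[Figure: normal curve centred on $\bar{X} = 27.2$ with $\sigma = 8$; the central 99% of the area (49.5% on each side of the mean) lies between $x_1$ and $x_2$, which are $2.58\,\sigma_{\bar{x}}$ on either side of the mean, leaving 0.5% in each tail.]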


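As in the previous case, but with $Z = 2.58$ for 99% confidence,

$\bar{X} = 27.2$, $Z = 2.58$, $\sigma = 8$, $n = 100$

Hence,

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{8}{\sqrt{100}} = 0.8$$

Therefore the confidence interval is

$$\bar{x} \pm Z\sigma_{\bar{x}} = 27.2 \pm 2.58 \times 0.8 = 27.2 \pm 2.064 = (25.136,\ 29.264)$$

This means that we can conclude with 99% confidence that a child on average spends between 25.136 and 29.264 hours per week watching television. Note that the limits have to be spread further apart to give more confidence.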


In situations where the population variation (s.d.) is unknown and the sample size is reasonably large (30 or more), we can approximate the population standard deviation ($\sigma$) by the sample standard deviation ($s$), so that the confidence interval

$$\bar{x} \pm Z\sigma_{\bar{x}}$$

is approximated by the interval

$$\bar{x} \pm Zs_{\bar{x}}, \quad \text{when } n \ge 30$$

where

$$s_{\bar{x}} = \frac{s}{\sqrt{n}}$$


Since we are interested in the distribution of means, the dispersion of the distribution of sample means can be estimated from the sample standard deviation. Here we have assumed that the sample s.d. is approximately equal to the population s.d. for n ≥ 30.

$$s_{\bar{x}} = \frac{s}{\sqrt{n}}$$


It is desired to estimate the average age of students who graduate with an MBA degree in the university system. A random sample of 64 graduating students showed that the average age was 27 years with a standard deviation of 4 years.


Construct a 95% confidence interval estimate of the true average (population mean) age of all such graduating students at the university.

How would the confidence interval limits change if the confidence level were increased from 95% to 99%?

Since the sample size n = 64 is sufficiently large (n ≥ 30), we can approximate the population standard deviation by the sample standard deviation.

[Figure: normal curve centred on $\bar{X} = 27$ with $s = 4$; the limits $x_1$ and $x_2$ lie $1.96\,s_{\bar{x}}$ on either side of the mean.]

$$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{4}{\sqrt{64}} = 0.5$$


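The 95% confidence interval of the population mean is given by:

$$\bar{x} \pm Zs_{\bar{x}} = 27 \pm 1.96 \times 0.5 = (26.02,\ 27.98)$$

Hence, $26.02 \le \mu \le 27.98$

For 99% confidence, $Z = 2.58$:

$$\bar{x} \pm Zs_{\bar{x}} = 27 \pm 2.58 \times 0.5 = (25.71,\ 28.29)$$

Hence, $25.71 \le \mu \le 28.29$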
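The MBA graduate example can be checked in the same way, this time using the sample standard deviation. A minimal sketch: the function name `large_sample_interval` and the explicit check for n ≥ 30 are illustrative assumptions that mirror the large-sample condition stated earlier.

```python
import math

def large_sample_interval(sample_mean, s, n, z):
    """Interval for the population mean using the sample s.d. (s) when n >= 30."""
    if n < 30:
        raise ValueError("approximating sigma by s is assumed here only for n >= 30")
    s_xbar = s / math.sqrt(n)  # estimated standard error of the mean
    return sample_mean - z * s_xbar, sample_mean + z * s_xbar

# Values from the example: mean age 27 years, s = 4 years, n = 64.
for z in (1.96, 2.58):
    lower, upper = large_sample_interval(sample_mean=27, s=4, n=64, z=z)
    print(z, round(lower, 2), round(upper, 2))
```

The two printed lines correspond to the 95% and 99% intervals worked out by hand.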


A sample is studied to draw inferences about the population parameters.

The greater the variation in the population, the bigger the sample required for estimation.

The bigger the sample, the greater our confidence in the estimate.


The choice of sample size depends upon two things:
The degree of accuracy we require in our estimate
The degree of confidence we want that the error in the estimate remains within the desired degree of accuracy

Ideally the sample mean should be equal to the population mean.

If the entire population is taken as the sample, then $\bar{X}$ would be equal to $\mu$.

For a sampling exercise, $(\bar{X} - \mu)$ can be considered as the error, or deviation, of the estimator from the population mean.

We know that:

$$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$$

But,

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \quad \text{and} \quad (\bar{X} - \mu) = E$$


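Writing $E$ for the error $(\bar{X} - \mu)$ and $\sigma/\sqrt{n}$ for $\sigma_{\bar{X}}$:

$$Z = \frac{E}{\sigma_{\bar{X}}} = \frac{E}{\sigma/\sqrt{n}} = \frac{E\sqrt{n}}{\sigma}$$

$$\sqrt{n} = \frac{Z\sigma}{E}$$

$$n = \frac{Z^2\sigma^2}{E^2}$$

Therefore sample size depends upon:
Confidence level desired, $Z$
Maximum error allowed, $E$
Variability of the population, $\sigma$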


If we want a smaller error (that is, want to be more accurate),

want to be more confident, and/or

the population variance is larger,

we will have to take a larger sample.


We would like to know the average time that a child spends watching television over the weekend. We want our estimate to be within ± 1 hour of the true population average. (This means that the maximum allowable error is 1 hour.) Previous studies have shown the population s.d. to be 3 hours. What sample size should be taken for this purpose, if we want to be 95% confident that the error in our estimate will not exceed the maximum allowable error?


For the 95% confidence level, the values are:

$Z = 1.96$

$E = 1$ hour (given)

$\sigma = 3$ hours (given)

Then,

$$n = \frac{Z^2\sigma^2}{E^2} = \frac{(1.96)^2 (3)^2}{(1)^2} = 34.57 \approx 35$$
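The sample size calculation can be wrapped in a small helper. This is a sketch under the same formula, $n = Z^2\sigma^2/E^2$; the function name `required_sample_size` is hypothetical, and rounding up with `math.ceil` is an assumption that matches the rounding of 34.57 to 35 in the example.

```python
import math

def required_sample_size(z, sigma, max_error):
    """Smallest whole sample size for which the error stays within max_error."""
    n_exact = (z * sigma / max_error) ** 2  # n = Z^2 * sigma^2 / E^2
    return math.ceil(n_exact)               # round up: a fraction of a respondent is not possible

# Values from the example: Z = 1.96, sigma = 3 hours, E = 1 hour.
print(required_sample_size(z=1.96, sigma=3, max_error=1))    # 35

# Halving the allowable error roughly quadruples the required sample.
print(required_sample_size(z=1.96, sigma=3, max_error=0.5))  # 139
```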