Research Methodology Dr. Nimit Chowdhary, Professor
© Dr. Nimit Chowdhary, Research Methodology Workshop, Saturday, April 14, 2012
We want to know the behavior (characteristics) of a population.
We draw samples from this population.
We make conclusions about the population parameters based on sample statistics.
A characteristic of the sample is an estimate of the corresponding characteristic of the population.
Estimation
Point estimate
Interval estimate
A point estimate uses a single sample value to estimate the desired population parameter.
Example: The sample mean is a point estimate of the population mean.
Unbiasedness: The expected value of the estimator (the statistic derived from samples) is equal to the population parameter.
Consistency: The statistic approaches the population parameter as the sample size increases and approaches the population size.
Efficiency: The value of the estimate remains stable from sample to sample. The best estimator shows the least variation between samples.
Sufficiency: The estimator uses all the information about the population parameter contained in the sample (the mean uses all the information, while the median does not).
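The efficiency and consistency properties can be illustrated with a small simulation. This is a minimal sketch, not part of the workshop material: the function name `estimator_spread`, the normal population (mean 50, s.d. 10), the number of trials and the seed are all assumptions chosen for illustration.

```python
import random
import statistics


def estimator_spread(sample_size, trials=2000, mu=50.0, sigma=10.0, seed=1):
    """Return (variance of sample means, variance of sample medians).

    Draws `trials` random samples from an assumed normal population and
    measures how much each estimator varies from sample to sample.
    """
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.pvariance(means), statistics.pvariance(medians)


# Efficiency: compare the two estimators at the same sample size.
var_mean, var_median = estimator_spread(sample_size=25)
print(f"n=25  variance of means: {var_mean:.3f}, of medians: {var_median:.3f}")

# Consistency: the spread of the sample mean shrinks as n grows.
var_mean_large, _ = estimator_spread(sample_size=100)
print(f"n=100 variance of means: {var_mean_large:.3f}")
```

For a normal population the sample mean should vary less between samples than the sample median, and its spread should shrink as the sample size grows.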
A point estimate may not exactly locate the population parameter.
It may not indicate how far the estimate is from the true value.
It does not specify how confident we can be that the estimate is close to the population parameter.
Instead of a point value we can indicate a range as an estimate.
We can be reasonably confident that the true parameter will lie in this range
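The sampling distribution of the sample mean is normally distributed with a mean of $\mu$ and a standard deviation of $\sigma_{\bar{x}}$.
$$Z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$$
$$\mu = \bar{x} \pm Z\sigma_{\bar{x}}, \qquad \bar{x} - Z\sigma_{\bar{x}} \le \mu \le \bar{x} + Z\sigma_{\bar{x}}$$
[Figure: normal curve centered on $\bar{X}$ with interval limits x1 and x2]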
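Suppose we want to find a confidence interval around the sample mean within which the population mean is expected to lie 95% of the time.
[Figure: normal curve centered on $\bar{X}$ with limits x1 and x2; the central 95% area is split into 47.5% on each side of the mean, leaving 2.5% in each tail]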
This can be interpreted as:
If all possible samples of size n were taken, then on average 95% of these samples would include the population mean within the interval around their sample means bounded by x1 and x2.
If we took a random sample of size n from a given population, the probability is 0.95 that the population mean would lie in the interval between x1 and x2 around the sample mean.
This can be interpreted as:
If a random sample of size n was taken from a given population, we can be 95% confident in our assertion that the population mean will lie around the sample mean in the interval bounded by the values x1 and x2 as shown (also known as the 95% confidence interval).
At the 95% confidence level, the z score taken from the z table is 1.96.
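Both the table value of 1.96 and the "95% of all samples" interpretation can be checked numerically. The sketch below is illustrative only: the helper names `z_for_confidence` and `coverage`, the assumed normal population (mean 50, s.d. 10), the sample size of 30 and the seed are not from the slides.

```python
import random
from statistics import NormalDist, fmean


def z_for_confidence(level):
    """Two-sided critical z value; level=0.95 leaves 2.5% in each tail."""
    return NormalDist().inv_cdf(0.5 + level / 2)


def coverage(level=0.95, mu=50.0, sigma=10.0, n=30, trials=2000, seed=7):
    """Fraction of simulated samples whose interval contains the true mean."""
    rng = random.Random(seed)
    half_width = z_for_confidence(level) * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        x_bar = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / trials


print(f"z for 95%: {z_for_confidence(0.95):.2f}")
print(f"z for 99%: {z_for_confidence(0.99):.2f}")
print(f"share of intervals containing the population mean: {coverage():.3f}")
```

The simulated share should come out close to 0.95, which is the repeated-sampling reading of the confidence level.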
The sponsor of a TV programme targeted at the children’s market (age 4-10 years) wants to find out the average amount of time children spend watching TV. A random sample of 100 children indicated that the average time these children spent watching TV per week was 27.2 hours. From previous experience, the population s.d. of weekly TV watching time (σ) is known to be 8 hours. A confidence level of 95% is considered adequate.
[Figure: normal curve centered on $\bar{X} = 27.2$ with limits x1 and x2 at $1.96\,\sigma_{\bar{x}}$ on either side; σ = 8]
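The confidence interval is given by
$$\bar{x} \pm Z\sigma_{\bar{x}}$$
So $\mu$ must lie between $\bar{x} - Z\sigma_{\bar{x}}$ and $\bar{x} + Z\sigma_{\bar{x}}$.
Also,
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$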
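Therefore, in our case
$$\bar{X} = 27.2, \quad Z = 1.96, \quad \sigma = 8, \quad n = 100$$
Hence,
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{8}{\sqrt{100}} = 0.8$$
Therefore the confidence interval is
$$\bar{x} \pm Z\sigma_{\bar{x}} = 27.2 \pm 1.96 \times 0.8 = 27.2 \pm 1.568 = (25.632,\ 28.768)$$
This means that we can conclude with 95% confidence that a child on average spends between 25.632 and 28.768 hours per week watching television.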
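The arithmetic of this example can be reproduced with a short Python sketch. The function name `mean_confidence_interval` and its parameter names are illustrative assumptions; the z value is passed in as read from the z table.

```python
import math


def mean_confidence_interval(x_bar, sigma, n, z):
    """Interval x_bar +/- z * sigma / sqrt(n) for a known population s.d."""
    standard_error = sigma / math.sqrt(n)  # s.d. of the sampling distribution
    margin = z * standard_error
    return x_bar - margin, x_bar + margin


# TV-watching example: sample mean 27.2, sigma 8, n 100, z 1.96
low, high = mean_confidence_interval(x_bar=27.2, sigma=8, n=100, z=1.96)
print(f"95% interval: ({low:.3f}, {high:.3f})")  # 95% interval: (25.632, 28.768)
```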
Calculate the confidence interval in the previous example, if we want to increase our confidence level from 95% to 99%. Other values remain the same.
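[Figure: normal curve centered on $\bar{X} = 27.2$ with limits x1 and x2 at $2.58\,\sigma_{\bar{x}}$ on either side; the central 99% area is split into 49.5% on each side of the mean, leaving 0.5% in each tail; σ = 8]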
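As in the previous case, but with Z = 2.58 for 99% confidence,
$$\bar{X} = 27.2, \quad Z = 2.58, \quad \sigma = 8, \quad n = 100$$
Hence,
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{8}{\sqrt{100}} = 0.8$$
Therefore the confidence interval is
$$\bar{x} \pm Z\sigma_{\bar{x}} = 27.2 \pm 2.58 \times 0.8 = 27.2 \pm 2.064 = (25.136,\ 29.264)$$
This means that we can conclude with 99% confidence that a child on average spends between 25.136 and 29.264 hours per week watching television. Note that the limits have to be spread further apart to gain more confidence.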
In situations where the population standard deviation is unknown and the sample size is reasonably large (30 or more), we can approximate the population standard deviation (σ) by the sample standard deviation (s), so that the confidence interval
$$\bar{x} \pm Z\sigma_{\bar{x}}$$
is approximated by the interval
$$\bar{x} \pm Z s_{\bar{x}}, \quad \text{when } n \ge 30$$
where
$$s_{\bar{x}} = \frac{s}{\sqrt{n}}$$
Since we are interested in the distribution of means, the dispersion of the distribution of sample means can be estimated from the sample standard deviation. We have assumed that the sample s.d. approximates the population s.d. for n ≥ 30.
$$s_{\bar{x}} = \frac{s}{\sqrt{n}}$$
It is desired to estimate the average age of students who graduate with an MBA degree in the university system. A random sample of 64 graduating students showed that the average age was 27 years with a standard deviation of 4 years.
Construct a 95% confidence interval estimate of the true average (population mean) age of all such graduating students at the university.
How would the confidence interval limits change if the confidence level were increased from 95% to 99%?
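Since the sample size n is sufficiently large, we can approximate the population standard deviation by the sample standard deviation.
[Figure: normal curve centered on $\bar{X} = 27$ with limits x1 and x2 at $1.96\,s_{\bar{x}}$ on either side; s = 4]
$$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{4}{\sqrt{64}} = 0.5$$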
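The 95% confidence interval of the population mean is given by:
$$\bar{x} \pm Z s_{\bar{x}} = 27 \pm 1.96 \times 0.5 = (26.02,\ 27.98)$$
Hence, $26.02 \le \mu \le 27.98$
For 99% confidence, Z = 2.58:
$$\bar{x} \pm Z s_{\bar{x}} = 27 \pm 2.58 \times 0.5 = (25.71,\ 28.29)$$
Hence, $25.71 \le \mu \le 28.29$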
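The same calculation with the sample standard deviation in place of σ might look like the sketch below. The function name `large_sample_interval` and the explicit n ≥ 30 check are illustrative assumptions; the z values are the table values used in the slides.

```python
import math


def large_sample_interval(x_bar, s, n, z):
    """Interval x_bar +/- z * s / sqrt(n), using s as a stand-in for sigma."""
    if n < 30:
        raise ValueError("approximating sigma by s assumes n >= 30")
    s_x_bar = s / math.sqrt(n)  # estimated standard error of the mean
    return x_bar - z * s_x_bar, x_bar + z * s_x_bar


# MBA graduates example: sample mean 27 years, s = 4 years, n = 64
for label, z in (("95%", 1.96), ("99%", 2.58)):
    low, high = large_sample_interval(x_bar=27, s=4, n=64, z=z)
    print(f"{label} interval: ({low:.2f}, {high:.2f})")
```

Running the loop for both z values shows the interval widening as the confidence level rises from 95% to 99%.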
A sample is studied to draw inferences about the population parameters.
The greater the variation in the population, the bigger the sample required for estimation.
The bigger the sample, the greater our confidence in the estimate.
The choice of sample size depends upon two things:
The degree of accuracy we require in our estimate
The degree of confidence we want that the error in the estimate remains within the desired degree of accuracy
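Ideally, the sample mean should be equal to the population mean.
If the entire population is taken as a sample, then $\bar{X}$ would be equal to $\mu$.
For a sampling exercise, $(\bar{X} - \mu)$ can be considered as the error or deviation of the estimator from the population mean.
We know that:
$$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$$
But,
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \quad \text{and} \quad (\bar{X} - \mu) = E$$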
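$$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{E}{\sigma/\sqrt{n}} = \frac{E\sqrt{n}}{\sigma}$$
$$\sqrt{n} = \frac{Z\sigma}{E}, \qquad n = \frac{Z^2\sigma^2}{E^2}$$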
Therefore, the sample size depends upon:
Confidence level desired, Z
Maximum error allowed, E
Variability of the population, σ
We will need a larger sample if:
We want a smaller error (want to be more accurate),
We want to be more confident, and/or
The population variance is larger
We would like to know the average time that a child spends watching television over the weekend. We want our estimate to be within ± 1 hour of the true population average. (This means that the maximum allowable error is 1 hour.) Previous studies have shown the population s.d. to be 3 hours. What sample size should be taken for this purpose, if we want to be 95% confident that the error in our estimate will not exceed the maximum allowable error?
For the 95% confidence level, the values are
Z = 1.96
E = 1 hour (given)
σ = 3 hours (given)
Then,
$$n = \frac{Z^2\sigma^2}{E^2} = \frac{(1.96)^2 (3)^2}{(1)^2} = 34.57 \approx 35$$
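The sample size formula can be wrapped in a small helper to see how n responds to the inputs. This is a sketch under stated assumptions: the function name `required_sample_size` is hypothetical, and rounding up to the next whole unit is an assumption consistent with 34.57 being taken as 35.

```python
import math


def required_sample_size(z, sigma, max_error):
    """Smallest whole n satisfying n >= (z * sigma / max_error) ** 2."""
    return math.ceil((z * sigma / max_error) ** 2)


# Weekend TV example: Z = 1.96, sigma = 3 hours, E = 1 hour
print(required_sample_size(z=1.96, sigma=3, max_error=1))    # 35

# Halving the allowable error roughly quadruples the required sample.
print(required_sample_size(z=1.96, sigma=3, max_error=0.5))  # 139

# Raising the confidence level to 99% (Z = 2.58) also increases n.
print(required_sample_size(z=2.58, sigma=3, max_error=1))    # 60
```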