gulshan-kumar-sinha
Session 12
Reference
Levin, R. I. and Rubin, D. S., Statistics for Management, Pearson Education.
Black, K., Business Statistics, 5th Edn., Wiley.
Q? What is the purpose of obtaining a sample?
A. To provide a description of a population
In the inferential statistics process, a researcher selects a random sample from the population, computes a statistic on the sample, and reaches conclusions about the population parameter from the statistic.
In attempting to analyze the sample statistic, it is essential to know the distribution of the statistic.
Sampling distribution: The probability distribution of a statistic, obtained by selecting all the possible samples of a specific size from a population.
Predicting the characteristics of a sample
Example
Frequency distribution for a population of four scores: 2, 4, 6, 8
Suppose we know the marks of four students
Scores: 2,4,6,8
Let’s construct a distribution of sample means
Population scores: 2, 4, 6, 8
Specify a sample size, say n=2
Examine all possible samples (A,A), (A,B), (A,C)….
The possible samples of n = 2 scores from the population
Figure – Distribution of sample means
The distribution of sample means for n = 2
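The enumeration described in this example can be sketched in Python. This is a minimal illustration that assumes sampling with replacement (as the (A,A) notation suggests), so there are 4 × 4 = 16 equally likely samples; the variable names are illustrative only.

```python
from collections import Counter
from itertools import product
from statistics import mean

population = [2, 4, 6, 8]
n = 2  # sample size

# All ordered samples of size n, drawn with replacement (assumption).
samples = list(product(population, repeat=n))
sample_means = [mean(s) for s in samples]

# Frequency and probability of each distinct sample mean.
distribution = Counter(sample_means)
for value in sorted(distribution):
    print(value, distribution[value], distribution[value] / len(samples))

# The mean of the sample means can be compared with the population mean.
mean_of_means = mean(sample_means)
print("mean of sample means:", mean_of_means)
print("population mean:", mean(population))
```

The printed frequencies pile up around the middle value, which is the pattern the characteristics of sample means describe.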
Characteristics of sample means
Sample means tend to pile up around the population mean
The distribution of sample means is approximately normal in shape.
The distribution of sample means can be used to answer probability questions about sample means
What do we use when n is large and we do not want to enumerate all of the possible samples?
Central Limit Theorem
CLT: For any population with a mean of μ and a standard deviation of σ, the distribution of sample means for sample size n will approach a normal distribution with a mean of μ and a standard deviation of $\sigma/\sqrt{n}$ as n approaches infinity.
Central Limit Theorem Cont’d
Distribution of sample means tends to be a normal distribution particularly if one of the following is true:
The population from which the sample is drawn is normal.
The number of scores (n) in each sample is relatively large (n>30)
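A small simulation can make the theorem concrete. The sketch below is only an illustration: it assumes a hypothetical skewed population (exponential with mean 1 and standard deviation 1), a sample size of 40, and 5000 repeated samples, none of which come from the slides.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 40                  # sample size (> 30)
num_samples = 5000      # number of repeated samples (assumption)

# Hypothetical population: exponential with mean 1 and standard deviation 1.
mu, sigma = 1.0, 1.0

# Draw many samples and record the mean of each one.
sample_means = [
    mean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

center = mean(sample_means)          # should be close to mu
spread = stdev(sample_means)         # should be close to sigma / sqrt(n)
expected_spread = sigma / n ** 0.5

# Share of sample means within 2 standard errors of mu (roughly 95% if normal).
within_2_se = sum(
    abs(m - mu) <= 2 * expected_spread for m in sample_means
) / num_samples

print(center, spread, expected_spread, within_2_se)
```

Even though the assumed population is far from normal, the sample means should centre near μ with a spread near $\sigma/\sqrt{n}$.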
Expected value of $\bar{X}$
Sample means should be close to the population mean (the expected value of $\bar{X}$).
Expected value of $\bar{X}$: the mean of the distribution of sample means will be equal to μ (the population mean): $\mu_{\bar{X}} = \mu$
Standard Error of $\bar{X}$
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
where $\sigma_{\bar{x}}$ is the standard error of the mean for an infinite population and $\sigma$ is the standard deviation of the population.
The magnitude of the standard error is determined by:
The size of the sample
The standard deviation of the population from which the sample is selected
Law of large numbers: the larger n is, the more probable it is that the sample mean will be close to the population mean.
Estimating the Population Mean
Interval estimate
Suppose a marketing research director needs an estimate of the average life in months of the car batteries his company manufactures.
A random sample of 200 batteries is selected.
Enquire about the life of the batteries.
Mean battery life is 36 months.
Point estimate: Mean battery life is 36 months.
What about the uncertainty in this estimate?
To answer this, we need the standard error of the mean.
The standard error is calculated as 0.707 months.
In other words, the true mean battery life may lie somewhere in the interval estimate of 35.293 to 36.707 months (36 ± 1 standard error).
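The arithmetic behind these figures can be checked with a short sketch. The slides do not state the population standard deviation; the value σ = 10 months used below is an assumption, chosen because it reproduces the quoted standard error of 0.707 with n = 200.

```python
import math

n = 200          # batteries sampled
x_bar = 36.0     # sample mean life in months (point estimate)
sigma = 10.0     # ASSUMED population standard deviation; not given in the slides

# Standard error of the mean: sigma / sqrt(n).
se = sigma / math.sqrt(n)

# Interval of one standard error on either side of the point estimate.
lower, upper = x_bar - se, x_bar + se

print(round(se, 3), round(lower, 3), round(upper, 3))
```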
Session 13
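Confidence Interval to Estimate μ when σ is Known
Point estimate: $\bar{x} = \frac{\sum x}{n}$
Interval estimate: $\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
or
$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$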
What is a confidence interval?
[Figure: confidence intervals from 20 samples, plotted by sample number (1 to 20) on a scale running from 10 to 20.]
One sample out of 20 (5%) gives an interval that does not contain the true mean, 15.
Confidence Interval (contd..)
95% confidence means: 95% of all the sample means are within about ±2 (more precisely, ±1.96) standard errors of μ.
Equivalently, μ is within about ±2 standard errors of 95% of all the sample means.
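[Figure: Distribution of Sample Means for 95% Confidence. The central 95% of the area is split into .4750 on each side of the mean, leaving .025 in each tail; the corresponding z values are -1.96 and 1.96.]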
For a 95% confidence interval
α = 0.05
α/2 = 0.025
To find the z value for α/2 (z.025), look in the standard normal distribution table under
.5000 - .0250 = .4750
In the standard normal table, look up the area 0.4750 and read z = 1.96 from the corresponding row and column.
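The table lookup can be reproduced with Python's standard library. This is a sketch of the same idea rather than a replacement for the table; `z_critical` is a hypothetical helper name.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical z value for a given confidence level, e.g. 0.95."""
    alpha = 1 - confidence
    # Area to the left of the upper critical value is 1 - alpha/2.
    return NormalDist().inv_cdf(1 - alpha / 2)

# Area between the mean and z = 1.96, as tabulated in a standard normal table.
area_0_to_z = NormalDist().cdf(1.96) - 0.5

print(round(z_critical(0.95), 2))   # 95% confidence
print(round(z_critical(0.90), 3))   # 90% confidence
print(round(area_0_to_z, 4))
```

Values computed this way carry more decimals than a printed table, so they can differ slightly from rounded table values such as 2.33 for 98% confidence.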
Estimating the Population Mean
α is used to locate the Z value in constructing the confidence interval
The confidence interval yields a range within which the researcher feels, with some confidence, that the population mean is located.
Z score – the number of standard deviations a value (x) is above or below the mean of a set of numbers when the data are normally distributed
Estimating the Population Mean
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
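[Figure: 95% Confidence Intervals for μ. Intervals built around several sample means $\bar{X}$ are shown against the distribution of sample means, whose central area is 95%.]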
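95% Confidence Interval for μ
$\bar{x} = 1300,\ \sigma = 160,\ n = 85,\ z_{\alpha/2} = 1.96$
$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
$1300 - 1.96\frac{160}{\sqrt{85}} \le \mu \le 1300 + 1.96\frac{160}{\sqrt{85}}$
$1300 - 34.01 \le \mu \le 1300 + 34.01$
$1265.99 \le \mu \le 1334.01$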
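The same calculation can be written as a small function. This is a minimal sketch using the figures of this example ($\bar{x}$ = 1300, σ = 160, n = 85, z = 1.96); `z_interval` is an illustrative name, not something defined in the slides.

```python
import math

def z_interval(x_bar, sigma, n, z):
    """Confidence interval for the mean when sigma is known."""
    margin = z * sigma / math.sqrt(n)   # z times the standard error
    return x_bar - margin, x_bar + margin

lower, upper = z_interval(x_bar=1300, sigma=160, n=85, z=1.96)
print(round(lower, 2), round(upper, 2))
```

Passing the critical value z explicitly keeps the sketch aligned with the table-based approach used throughout these sessions.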
Problem # 1
A survey was taken of U.S. companies that do business with firms in India.
One of the questions on the survey was: Approximately how many years has your company been trading with firms in India?
A random sample of 44 responses to this question yielded a mean of 10.455 years. Suppose the population standard deviation for this question is 7.7 years.
Using this information, construct a 90% confidence interval for the mean number of years that a U.S. company has been trading with firms in India.
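Problem 1 - Solution
$\bar{x} = 10.455,\ \sigma = 7.7,\ n = 44$
90% confidence: $z = 1.645$
$\bar{x} - z\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z\frac{\sigma}{\sqrt{n}}$
$10.455 - 1.645\frac{7.7}{\sqrt{44}} \le \mu \le 10.455 + 1.645\frac{7.7}{\sqrt{44}}$
$10.455 - 1.91 \le \mu \le 10.455 + 1.91$
$8.545 \le \mu \le 12.365$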
Problem # 2
A study is conducted in a company that employs 800 engineers. A random sample of 50 engineers reveals that the average sample age is 34.3 years. Historically, the population standard deviation of the age of the company’s engineers is approximately 8 years.
Construct a 98% confidence interval to estimate the average age of all the engineers in this company.
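Problem 2 - Solution
$\bar{x} = 34.3,\ \sigma = 8,\ N = 800,\ n = 50$
98% confidence: $z = 2.33$
$\bar{x} - z\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} \le \mu \le \bar{x} + z\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$
$34.3 - 2.33\frac{8}{\sqrt{50}}\sqrt{\frac{800-50}{800-1}} \le \mu \le 34.3 + 2.33\frac{8}{\sqrt{50}}\sqrt{\frac{800-50}{800-1}}$
$34.3 - 2.554 \le \mu \le 34.3 + 2.554$
$31.75 \le \mu \le 36.85$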
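Because the sample of 50 is drawn from a finite population of 800, the solution multiplies the standard error by the finite population correction factor $\sqrt{(N-n)/(N-1)}$. A minimal sketch of that calculation follows; the function name is illustrative.

```python
import math

def z_interval_finite(x_bar, sigma, n, N, z):
    """z interval for the mean with the finite population correction."""
    fpc = math.sqrt((N - n) / (N - 1))      # finite population correction
    se = sigma / math.sqrt(n) * fpc         # corrected standard error
    margin = z * se
    return x_bar - margin, x_bar + margin, margin

lower, upper, margin = z_interval_finite(x_bar=34.3, sigma=8, n=50, N=800, z=2.33)
print(round(margin, 3), round(lower, 2), round(upper, 2))
```

Without the correction the margin would be slightly wider, since the factor is always below 1 when n > 1.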
Estimating the Mean of a Normal Population: Sample Size is Small (n<30)
The distribution of sample means is approximately normal if the population has a normal distribution.
The z formulas can be used to estimate a population mean if the value of the population Standard Deviation is known.
Problem #3
Suppose a car rental firm wants to estimate the
average distance travelled per day by each of its
cars. Data from a random sample of 20 cars reveal
that the sample mean travel distance per day is
85.5 km, with a population standard deviation of
19.3 km. Assume that the distance travelled per
day is normally distributed in the population.
Compute a 99% confidence interval to estimate the
population mean.
Answer: 74.4 ≤ μ ≤ 96.6
Problem ??
The Greensboro Coliseum is considering
expanding its seating capacity and needs to know
both the average number of people who attend
events there and the variability in this number.
The following are the attendances in thousands
at nine randomly selected sporting events. Find
the point estimates of the mean and the variance
of the population from which the sample was
drawn.
8.8 14.0 21.3 7.9 12.5 20.6 16.3 14.1 13.0
Answer: mean = 14.278 thousand; variance = 21.119
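These point estimates can be verified with the standard library. The sketch below uses the sample mean and the sample variance with an n - 1 denominator, which is what `statistics.variance` computes.

```python
from statistics import mean, variance

# Attendance (in thousands) at the nine sampled sporting events.
attendance = [8.8, 14.0, 21.3, 7.9, 12.5, 20.6, 16.3, 14.1, 13.0]

x_bar = mean(attendance)          # point estimate of the population mean
s_squared = variance(attendance)  # point estimate of the population variance (n - 1)

print(round(x_bar, 3), round(s_squared, 3))
```

The same two calls, with `statistics.stdev` in place of `variance`, apply to any problem that asks for a point estimate of the standard deviation.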
Problem ??
The National Bank of Lincoln is trying to
determine the number of tellers available during
the lunch rush on Fridays. The bank has
collected data on the number of people who
entered the bank during the last 3 months on
Friday from 11 a.m. to 1 p.m. Using the data
below, find the point estimates of the mean and
standard deviation of the population from which
the sample was drawn.
242, 275, 289, 306, 342, 385, 279, 245, 269, 305,
294, 328
Answer: x bar = 296.58 people; s =40.75
Problem ??
Bobby wants to purchase a used car. He randomly selected 125
want ads and found that the average price of a car in this sample
was Rs.1.75 lakhs. He knows that the standard deviation of the
used-car prices in the city is Rs.33500.
(a) Establish an interval estimate for the average price of a car so
that Bobby can be 68.3 percent certain that the population mean
lies within this interval?
Answer: (a) 172003.6 – 177996.3
(b) Establish an interval estimate for the average price of a car so
that Bobby can be 95.5 percent certain that the population mean
lies within this interval
Answer: (b) 169007.3 – 180992.7
Session 14
Problem ??
The Westview High School Principal is interested in knowing the
average height of seniors at this school, but she does not have
enough time to examine the records of all 430 seniors. It is
assumed that the height of seniors follows normal distribution.
She randomly selects 48 students. She finds the sample mean to
be 64.5 inches and the standard deviation to be 2.3 inches.
(a) Find the estimated standard error of the mean
Answer: (a) 0.31326
(b) Construct a 90 percent confidence interval for the mean
Answer: (b) 63.986 – 65.014
t Distribution
When the population standard deviation is unknown and the sample size is small (n < 30), use the t distribution.
Early theoretical work on t distribution was done by W.S. Gosset in early 1900s (Guinness Brewery, Dublin)
t distribution is used instead of the z distribution for doing inferential statistics on the population mean when the population Std Dev is unknown and the population is normally distributed
With the t distribution, you use the Sample Std Dev, s
t formula: $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
A family of distributions: there is a unique distribution for each value of its parameter, the degrees of freedom (d.f.)
t Distribution
t distributions are symmetric and unimodal with mean = 0; they are flatter in the middle and have more area in their tails than the normal distribution
The t distribution approaches the normal curve as n becomes larger
t distribution is to be used when the population variance or population Std Dev is unknown, regardless of the size of the sample
t Distribution Characteristics
The t table uses the area in the tail of the distribution.
Emphasis in the t table is on α, and each tail of the distribution contains α/2 of the area under the curve when confidence intervals are constructed
t values are located at the intersection of the df value and the selected α/2 value
Reading the t Distribution
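Confidence Intervals for μ of a Normal Population: σ Unknown
$\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$
or
$\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$
$df = n - 1$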
Table of Critical Values of t
With df = 24 and α = 0.05, t = 1.711.
df     t0.100   t0.050   t0.025   t0.010   t0.005
1      3.078    6.314    12.706   31.821   63.656
2      1.886    2.920    4.303    6.965    9.925
3      1.638    2.353    3.182    4.541    5.841
4      1.533    2.132    2.776    3.747    4.604
5      1.476    2.015    2.571    3.365    4.032
23     1.319    1.714    2.069    2.500    2.807
24     1.318    1.711    2.064    2.492    2.797
25     1.316    1.708    2.060    2.485    2.787
29     1.311    1.699    2.045    2.462    2.756
30     1.310    1.697    2.042    2.457    2.750
40     1.303    1.684    2.021    2.423    2.704
60     1.296    1.671    2.000    2.390    2.660
120    1.289    1.658    1.980    2.358    2.617
∞      1.282    1.645    1.960    2.327    2.576
Problem #4 The owner of a large equipment rental company wants to make a rather quick estimate of the average number of days a piece of ditch digging equipment is rented out per person per time. The company has records of all rentals, but the amount of time required to conduct an audit of all accounts would be prohibitive. The owner decides to take a random sample of rental invoices. Fourteen different rentals of ditch diggers are selected randomly from the files, yielding the following data. The owner uses these data to construct a 99% confidence interval to estimate the average number of days that a ditch digger is rented and assumes that the number of days per rental is normally distributed in the population.
3 1 3 2 5 1 2 1 4 2 1 3 1 1
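Solution for Problem #4
$\bar{x} = 2.14,\ s = 1.29,\ n = 14,\ df = n - 1 = 13$
$\frac{\alpha}{2} = \frac{1 - .99}{2} = 0.005$
$t_{.005,13} = 3.012$
$\bar{x} - t\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t\frac{s}{\sqrt{n}}$
$2.14 - 3.012\frac{1.29}{\sqrt{14}} \le \mu \le 2.14 + 3.012\frac{1.29}{\sqrt{14}}$
$2.14 - 1.04 \le \mu \le 2.14 + 1.04$
$1.10 \le \mu \le 3.18$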
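The summary statistics and the interval for Problem #4 can be recomputed from the raw data. In this sketch the critical value 3.012 is the t value for a tail area of 0.005 with 13 degrees of freedom quoted in the solution; it is entered by hand rather than computed, since the Python standard library has no t distribution.

```python
import math
from statistics import mean, stdev

# Days rented for the 14 sampled ditch-digger rentals.
days = [3, 1, 3, 2, 5, 1, 2, 1, 4, 2, 1, 3, 1, 1]

t_crit = 3.012            # t for tail area .005, df = 13 (quoted value, not computed)
n = len(days)
x_bar = mean(days)        # sample mean
s = stdev(days)           # sample standard deviation (n - 1 denominator)

margin = t_crit * s / math.sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(round(x_bar, 2), round(s, 2), round(lower, 2), round(upper, 2))
```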
Problem ??
Suppose a researcher wants to estimate the average amount of
extra working hours (beyond their 40-hour week) used per week
for managers in the aerospace industry. He randomly samples 18
managers and measures the amount of extra time they work
during a specific week and obtains the results (in hours) as shown
below:
6 21 17 20 7 0 8 16 29
3 8 12 11 9 21 25 15 16
Construct a 90% confidence interval to estimate the average
amount of extra time per week worked by a manager
Answer: 10.356 – 16.754, using t0.05,17 = 1.740
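Confidence Interval to Estimate the Population Proportion
$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$
where:
$\hat{p}$ = sample proportion
$\hat{q} = 1 - \hat{p}$
$p$ = population proportion
$n$ = sample size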
Estimates of the population proportion must often be made
Problem #5
A clothing company produces men’s jeans. The jeans are made and sold with either a regular cut or a boot cut. In an effort to estimate the proportion of their men’s jeans market in Oklahoma City that prefers boot-cut jeans, the analyst takes a random sample of 423 jeans sales from the company’s two Oklahoma City retail outlets. Only 72 of the sales were for boot-cut jeans. Construct a 90% confidence interval to estimate the proportion of the population in Oklahoma City who prefer boot-cut jeans.
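Solution Problem #5
$n = 423,\ x = 72,\ \hat{p} = \frac{x}{n} = \frac{72}{423} = 0.17$
$\hat{q} = 1 - \hat{p} = 1 - 0.17 = 0.83$
90% confidence: $z = 1.645$
$\hat{p} - z\sqrt{\frac{\hat{p}\hat{q}}{n}} \le p \le \hat{p} + z\sqrt{\frac{\hat{p}\hat{q}}{n}}$
$0.17 - 1.645\sqrt{\frac{(0.17)(0.83)}{423}} \le p \le 0.17 + 1.645\sqrt{\frac{(0.17)(0.83)}{423}}$
$0.17 - 0.03 \le p \le 0.17 + 0.03$
$0.14 \le p \le 0.20$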
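A minimal sketch of the proportion interval for Problem #5 follows. It keeps the unrounded sample proportion 72/423 instead of rounding to 0.17 first, so intermediate values differ slightly from the hand calculation while the rounded bounds agree; `proportion_interval` is an illustrative name.

```python
import math

def proportion_interval(x, n, z):
    """Confidence interval for a population proportion (large-sample z method)."""
    p_hat = x / n                 # sample proportion
    q_hat = 1 - p_hat
    margin = z * math.sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin

lower, upper = proportion_interval(x=72, n=423, z=1.645)
print(round(lower, 2), round(upper, 2))
```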
Determining Sample Size when Estimating μ
It may be necessary to estimate the required sample size when planning a project.
In studies where µ is being estimated, the size of the sample can be determined by using the z formula for sample means to solve for n.
The difference between $\bar{x}$ and µ is the error of estimation.
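Determining Sample Size when Estimating μ
z formula: $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
Error of Estimation (tolerable error): $E = \bar{x} - \mu$
Estimated Sample Size: $n = \frac{z_{\alpha/2}^2\,\sigma^2}{E^2} = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$
Estimated σ: $\sigma \approx \frac{1}{4}\,\text{range}$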
Problem #6
Suppose you want to estimate the average age of all Boeing 737-300 airplanes now in active domestic U.S. service. You want to be 95% confident, and you want your estimate to be within one year of the actual figure. The 737-300 was first placed in service about 24 years ago, but you believe that no active 737-300s in the U.S. domestic fleet are more than 20 years old. How large of a sample should you take?
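Solution for Problem 6
$E = 1,\ z = 1.96,\ \sigma \approx \frac{1}{4}\,\text{range} = \frac{20}{4} = 5$
$n = \frac{z^2\sigma^2}{E^2} = \frac{(1.96)^2(5)^2}{1^2} = 96.04$, or 97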
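The sample-size formula and the round-up step can be sketched as follows, using the Problem 6 values (z = 1.96, σ estimated as 5, E = 1). The function name is illustrative.

```python
import math

def sample_size_for_mean(z, sigma, error):
    """Smallest whole n with z * sigma / sqrt(n) no larger than the tolerable error."""
    raw = (z * sigma / error) ** 2
    return math.ceil(raw)   # always round up: a fractional unit cannot be sampled

n_required = sample_size_for_mean(z=1.96, sigma=5, error=1)
print(n_required)
```

Rounding up rather than to the nearest integer ensures the error of estimation does not exceed E.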
Determining Sample Size when Estimating p
z formula: $z = \frac{\hat{p} - p}{\sqrt{\frac{pq}{n}}}$
Error of Estimation (tolerable error): $E = \hat{p} - p$
Estimated Sample Size: $n = \frac{z^2\,pq}{E^2}$
Problem #7
Hewitt Associates conducted a national survey to determine the extent to which employers are promoting health and fitness among their employees. One of the questions asked was, Does your company offer on-site exercise classes? Suppose it was estimated before the study that no more than 40% of the companies would answer Yes. How large a sample would Hewitt Associates have to take in estimating the population proportion to ensure a 98% confidence in the results and to be within .03 of the true population proportion?
Solution for Problem 7
$E = 0.03$
98% confidence: $z = 2.33$
Estimated $p = 0.40$
$q = 1 - p = 0.60$
$n = \frac{z^2\,pq}{E^2} = \frac{(2.33)^2(0.40)(0.60)}{(.03)^2} = 1{,}447.7$, or 1,448
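The calculation for Problem 7 can be checked with a short sketch. It takes the table value z = 2.33 and the prior estimate p = 0.40 from the solution; `sample_size_for_proportion` is an illustrative name.

```python
import math

def sample_size_for_proportion(z, p, error):
    """Smallest whole n that keeps the error of estimation for p within `error`."""
    q = 1 - p
    raw = z ** 2 * p * q / error ** 2
    return math.ceil(raw)   # round up to the next whole unit

n_required = sample_size_for_proportion(z=2.33, p=0.40, error=0.03)
print(n_required)
```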