Session 12

Module 2 Part I


Page 1: Module 2 Part I

Session 12

Page 2: Module 2 Part I

Reference

Levin, R. I. and Rubin, D. S., Statistics for Management, Pearson Education.

Black, K., Business Statistics, 5th Edn., Wiley Publication.

Page 3: Module 2 Part I

Q? What is the purpose of obtaining a sample?

A. To provide a description of a population

Page 4: Module 2 Part I

In the inferential statistics process, a researcher selects a random sample from the population, computes a statistic on the sample, and reaches conclusions about the population parameter from the statistic.

In attempting to analyze the sample statistic, it is essential to know the distribution of the statistic.

Sampling distribution: The probability distribution of a statistic, obtained by selecting all the possible samples of a specific size from a population.

Predicting the characteristics of a sample

Page 5: Module 2 Part I

Example

Frequency distribution for a population of four scores: 2, 4, 6, 8

Suppose we know the marks of four students

Scores: 2,4,6,8

Page 6: Module 2 Part I

Let’s construct a distribution of sample means

Population parameters (scores) : 2,4,6,8

Specify a sample size, say n=2

Examine all possible samples (A,A), (A,B), (A,C)….

The possible samples of n = 2 scores from the population

Page 7: Module 2 Part I

Figure – Distribution of sample means

The distribution of sample means for n = 2
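The construction above can be sketched in a few lines of Python. This is only an illustration: it assumes sampling with replacement (suggested by the sample (A,A) in the list of possible samples), and the variable names are made up for the example.

```python
from collections import Counter
from itertools import product

population = [2, 4, 6, 8]
n = 2  # sample size

# Every ordered sample of size n drawn with replacement: (2, 2), (2, 4), ...
samples = list(product(population, repeat=n))

# Mean of each sample
sample_means = [sum(s) / n for s in samples]

# Frequency of each distinct sample mean (the distribution of sample means)
distribution = Counter(sample_means)
for value in sorted(distribution):
    print(f"sample mean {value}: {distribution[value]} of {len(samples)} samples")

# The mean of all sample means equals the population mean
mean_of_means = sum(sample_means) / len(sample_means)
population_mean = sum(population) / len(population)
```

With these assumptions there are 16 possible samples, and the sample means pile up around the population mean of 5.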

Page 8: Module 2 Part I

Characteristics of sample means

Sample means tend to pile up around the population mean

The distribution of sample means is approximately normal in shape.

The distribution of sample means can be used to answer probability questions about sample means

Page 9: Module 2 Part I

What do we use when we have a large n and do not want to calculate all of the

possible samples ?

Page 10: Module 2 Part I

Central Limit Theorem

CLT: For any population with a mean of $\mu$ and a standard deviation of $\sigma$, the distribution of sample means for sample size n will approach a normal distribution with a mean of $\mu$ and a standard deviation of $\sigma / \sqrt{n}$ as n approaches infinity.

Page 11: Module 2 Part I

Central Limit Theorem Cont’d

Distribution of sample means tends to be a normal distribution particularly if one of the following is true:

The population from which the sample is drawn is normal.

The number of scores (n) in each sample is relatively large (n>30)
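A small simulation can make the theorem concrete. The sketch below is not from the slides: the exponential population (mean 1, standard deviation 1), the sample size of 36 and the number of repetitions are all assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

n = 36              # sample size (assumed; larger than 30)
num_samples = 5000  # number of repeated samples (assumed)

# Skewed, non-normal population: exponential with mean 1 and std dev 1
sample_means = [
    mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = mean(sample_means)   # should be close to mu = 1
observed_se = stdev(sample_means)    # should be close to sigma / sqrt(n)
theoretical_se = 1.0 / sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f}")
print(f"observed SE: {observed_se:.3f}, theoretical SE: {theoretical_se:.3f}")
```

Even though the population is strongly skewed, the sample means cluster around the population mean with a spread close to $\sigma / \sqrt{n}$.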

Page 12: Module 2 Part I

Expected value of $\bar{X}$

Sample means should be close to the population mean (the expected value of $\bar{X}$).

Expected value of $\bar{X}$: the mean of the distribution of sample means is equal to $\mu$, the population mean:

$\mu_{\bar{X}} = \mu$

Page 13: Module 2 Part I

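Standard Error of $\bar{X}$

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$

where $\sigma_{\bar{x}}$ is the standard error of the mean for an infinite population and $\sigma$ is the standard deviation of the population.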

Page 14: Module 2 Part I

The magnitude of the standard error is determined by:

The size of the sample

The standard deviation of the population from which the sample is selected

Law of large numbers: the larger the sample size n, the more probable it is that the sample mean will be close to the population mean.

Page 15: Module 2 Part I

Estimating the Population Mean

Page 16: Module 2 Part I

Interval estimate

Suppose a marketing research director needs an estimate of the average life, in months, of the car batteries his company manufactures.

A random sample of 200 batteries is selected.

Enquire about the life of the batteries.

Mean battery life is 36 months.

Point estimate: Mean battery life is 36 months.

What about the uncertainty in this estimate?

To answer this we need to find the standard error of the mean.

The standard error is calculated as 0.707 months.

In other words, the actual mean battery life may lie in the interval estimate 36 ± 0.707, that is, from 35.293 to 36.707 months.

Page 17: Module 2 Part I

Session 13

Page 18: Module 2 Part I

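Confidence Interval to Estimate $\mu$ when $\sigma$ is Known

Point estimate: $\bar{x} = \dfrac{\sum x}{n}$

Interval estimate: $\bar{x} \pm z \dfrac{\sigma}{\sqrt{n}}$

or

$\bar{x} - z \dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z \dfrac{\sigma}{\sqrt{n}}$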

Page 19: Module 2 Part I

What is a confidence interval?

One sample out of 20 (5%) gives an interval that does not contain the true mean, 15.

[Figure: confidence intervals from 20 samples; horizontal axis: Sample (1 to 20), vertical axis: 10 to 20]

Page 20: Module 2 Part I

Confidence Interval (contd..)

95% confidence means: 95% of all the sample means are within about ±2 standard errors (more precisely, ±1.96) of μ.

Equivalently, μ is within ±1.96 standard errors of 95% of all the sample means.

Page 21: Module 2 Part I

Distribution of Sample Means for 95% Confidence

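[Figure: normal curve of $\bar{X}$ with 95% of the area in the middle (.4750 on each side of the mean) and .025 in each tail; Z axis marked at -1.96, 0 and 1.96]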

Page 22: Module 2 Part I

For a 95% confidence interval

α = 0.05

α/2 = 0.025

To find the z value for α/2 (that is, z.025), look in the standard normal distribution table under the area

.5000 - .0250 = .4750

In the standard normal table, locate 0.4750 and read the z value 1.96 from the corresponding row and column.
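The table lookup can be cross-checked in code. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_critical` is made up for this example.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical z value for a given confidence level (e.g. 0.95)."""
    alpha = 1 - confidence
    # Area to the left of the upper critical value is 1 - alpha/2
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```

Rounded to the precision used in tables, this gives 1.645 for 90%, 1.96 for 95% and 2.33 for 98% confidence.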

Estimating the Population Mean

Page 23: Module 2 Part I

α is used to locate the Z value in constructing the confidence interval

The confidence interval yields a range within which the researcher feels, with some confidence, the population mean is located

Z score – the number of standard deviations a value (x) is above or below the mean of a set of numbers when the data are normally distributed

Estimating the Population Mean

$z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

Page 24: Module 2 Part I

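95% Confidence Intervals for $\mu$

[Figure: sampling distribution of $\bar{X}$ with the central 95% region marked, and intervals constructed around several sample means $\bar{X}$]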

Page 25: Module 2 Part I

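95% Confidence Interval for $\mu$

$\bar{x} = 1300,\ \sigma = 160,\ n = 85,\ z_{\alpha/2} = 1.96$

$\bar{x} - z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}$

$1300 - 1.96 \dfrac{160}{\sqrt{85}} \le \mu \le 1300 + 1.96 \dfrac{160}{\sqrt{85}}$

$1300 - 34.01 \le \mu \le 1300 + 34.01$

$1265.99 \le \mu \le 1334.01$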

Page 26: Module 2 Part I

Problem # 1

A survey was taken of U.S. companies that do business with firms in India.

One of the questions on the survey was: Approximately how many years has your company been trading with firms in India?

A random sample of 44 responses to this question yielded a mean of 10.455 years. Suppose the population standard deviation for this question is 7.7 years.

Using this information, construct a 90% confidence interval for the mean number of years that a U.S. company has been trading with firms in India.

Page 27: Module 2 Part I

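Problem 1 - Solution

$\bar{x} = 10.455,\ \sigma = 7.7,\ n = 44$

90% confidence: $z = 1.645$

$\bar{x} - z \dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z \dfrac{\sigma}{\sqrt{n}}$

$10.455 - 1.645 \dfrac{7.7}{\sqrt{44}} \le \mu \le 10.455 + 1.645 \dfrac{7.7}{\sqrt{44}}$

$10.455 - 1.91 \le \mu \le 10.455 + 1.91$

$8.545 \le \mu \le 12.365$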
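The arithmetic of Problem 1 can be checked with a short Python sketch. The function name `z_interval` is hypothetical, and the z value is passed in as read from the table rather than computed.

```python
from math import sqrt

def z_interval(x_bar, sigma, n, z):
    """Confidence interval for the mean when sigma is known."""
    margin = z * sigma / sqrt(n)  # z times the standard error
    return x_bar - margin, x_bar + margin

# Problem 1: 90% confidence, z = 1.645
lower, upper = z_interval(x_bar=10.455, sigma=7.7, n=44, z=1.645)
print(f"{lower:.3f} <= mu <= {upper:.3f}")
```

The same function reproduces the interval for any other problem in this section where the population standard deviation is known and no finite population correction is needed.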

Page 28: Module 2 Part I

Problem # 2

A study is conducted in a company that employs 800 engineers. A random sample of 50 engineers reveals that the average sample age is 34.3 years. Historically, the population standard deviation of the age of the company’s engineers is approximately 8 years.

Construct a 98% confidence interval to estimate the average age of all the engineers in this company.

Page 29: Module 2 Part I

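Problem 2 - Solution

$\bar{x} = 34.3,\ \sigma = 8,\ N = 800,\ n = 50$

98% confidence: $z = 2.33$

$\bar{x} - z \dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{N - n}{N - 1}} \le \mu \le \bar{x} + z \dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{N - n}{N - 1}}$

$34.3 - 2.33 \dfrac{8}{\sqrt{50}} \sqrt{\dfrac{800 - 50}{800 - 1}} \le \mu \le 34.3 + 2.33 \dfrac{8}{\sqrt{50}} \sqrt{\dfrac{800 - 50}{800 - 1}}$

$34.3 - 2.554 \le \mu \le 34.3 + 2.554$

$31.75 \le \mu \le 36.85$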
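Because the population here is finite (N = 800), the standard error is multiplied by the finite population correction factor. A minimal Python sketch of that calculation follows; the function name is made up for illustration and z = 2.33 is taken from the table.

```python
from math import sqrt

def z_interval_finite(x_bar, sigma, n, N, z):
    """Confidence interval for the mean with the finite population correction."""
    fpc = sqrt((N - n) / (N - 1))        # finite population correction factor
    margin = z * sigma / sqrt(n) * fpc
    return x_bar - margin, x_bar + margin

# Problem 2: 98% confidence, z = 2.33
lower, upper = z_interval_finite(x_bar=34.3, sigma=8, n=50, N=800, z=2.33)
print(f"{lower:.2f} <= mu <= {upper:.2f}")
```

The correction factor is below 1, so the interval is slightly narrower than it would be for an infinite population.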

Page 30: Module 2 Part I

Estimating the Mean of a Normal Population: Sample Size is Small (n<30)

The distribution of sample means is approximately normal if the population has a normal distribution.

The z formulas can be used to estimate a population mean if the value of the population Standard Deviation is known.

Page 31: Module 2 Part I

Problem #3

Suppose a car rental firm wants to estimate the average distance travelled per day by each of its cars. Data from a random sample of 20 cars reveal that the sample mean travel distance per day is 85.5 km, with a population standard deviation of 19.3 km. Assume that the distance travelled per day is normally distributed in the population.

Compute a 99% confidence interval to estimate the population mean.

Answer: $74.4 \le \mu \le 96.6$

Page 32: Module 2 Part I

Problem ??

The Greensboro Coliseum is considering expanding its seating capacity and needs to know both the average number of people who attend events there and the variability in this number. The following are the attendances, in thousands, at nine randomly selected sporting events. Find the point estimates of the mean and the variance of the population from which the sample was drawn.

8.8 14.0 21.3 7.9 12.5 20.6 16.3 14.1 13.0

Answer: mean = 14.2777 thousand; variance = 21.119

Page 33: Module 2 Part I

Problem ??

The National Bank of Lincoln is trying to determine the number of tellers available during the lunch rush on Fridays. The bank has collected data on the number of people who entered the bank during the last 3 months on Friday from 11 a.m. to 1 p.m. Using the data below, find the point estimates of the mean and standard deviation of the population from which the sample was drawn.

242, 275, 289, 306, 342, 385, 279, 245, 269, 305, 294, 328

Answer: $\bar{x}$ = 296.58 people; s = 40.75
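The point estimates in the two problems above can be reproduced with the Python standard library. This is a sketch for checking the arithmetic; `variance` and `stdev` in the `statistics` module use the sample (n - 1) denominator, which is what a point estimate of the population variance requires.

```python
from statistics import mean, stdev, variance

# Greensboro Coliseum attendances, in thousands
attendance = [8.8, 14.0, 21.3, 7.9, 12.5, 20.6, 16.3, 14.1, 13.0]
attendance_mean = mean(attendance)
attendance_var = variance(attendance)  # sample variance (n - 1 denominator)

# National Bank of Lincoln, people entering during the Friday lunch rush
customers = [242, 275, 289, 306, 342, 385, 279, 245, 269, 305, 294, 328]
customers_mean = mean(customers)
customers_sd = stdev(customers)        # sample standard deviation

print(f"attendance: mean = {attendance_mean:.4f}, variance = {attendance_var:.3f}")
print(f"customers:  mean = {customers_mean:.2f}, s = {customers_sd:.2f}")
```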

Page 34: Module 2 Part I

Problem ??

Bobby wants to purchase a used car. He randomly selected 125 want ads and found that the average price of a car in this sample was Rs. 1.75 lakhs. He knows that the standard deviation of used-car prices in the city is Rs. 33,500.

(a) Establish an interval estimate for the average price of a car so that Bobby can be 68.3 percent certain that the population mean lies within this interval.

Answer: (a) 172003.7 to 177996.3

(b) Establish an interval estimate for the average price of a car so that Bobby can be 95.5 percent certain that the population mean lies within this interval.

Answer: (b) 169007.3 to 180992.7
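The two intervals correspond to one and two standard errors on either side of the sample mean. A short Python sketch of the calculation, assuming 1.75 lakhs is written as Rs. 175,000 and using multipliers of exactly 1 and 2 for the 68.3% and 95.5% levels:

```python
from math import sqrt

x_bar = 175_000   # sample mean price in rupees (Rs. 1.75 lakhs)
sigma = 33_500    # population standard deviation in rupees
n = 125           # number of want ads sampled

standard_error = sigma / sqrt(n)

# 68.3% certainty: one standard error either side of the sample mean
interval_68 = (x_bar - standard_error, x_bar + standard_error)

# 95.5% certainty: two standard errors either side of the sample mean
interval_95 = (x_bar - 2 * standard_error, x_bar + 2 * standard_error)

print(f"standard error = {standard_error:.2f}")
print(f"68.3%: {interval_68[0]:.1f} to {interval_68[1]:.1f}")
print(f"95.5%: {interval_95[0]:.1f} to {interval_95[1]:.1f}")
```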

Page 35: Module 2 Part I

Session 14

Page 36: Module 2 Part I

Problem ??

The Westview High School Principal is interested in knowing the average height of seniors at this school, but she does not have enough time to examine the records of all 430 seniors. It is assumed that the height of seniors follows a normal distribution. She randomly selects 48 students. She finds the sample mean to be 64.5 inches and the standard deviation to be 2.3 inches.

(a) Find the estimated standard error of the mean.

Answer: (a) 0.31326

(b) Construct a 90 percent confidence interval for the mean.

Answer: (b) 63.986 to 65.014

Page 37: Module 2 Part I

t Distribution

When the population standard deviation is unknown and the sample size is small (n < 30), use the t distribution.

Early theoretical work on t distribution was done by W.S. Gosset in early 1900s (Guinness Brewery, Dublin)

t distribution is used instead of the z distribution for doing inferential statistics on the population mean when the population Std Dev is unknown and the population is normally distributed

With the t distribution, you use the Sample Std Dev, s

Page 38: Module 2 Part I

t Distribution

A family of distributions: a unique distribution for each value of its parameter, the degrees of freedom (d.f.)

t formula:

$t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$

Page 39: Module 2 Part I

t distributions are symmetric and unimodal with mean = 0; they are flatter in the middle and have more area in their tails than the normal distribution

The t distribution approaches the normal curve as n becomes larger

t distribution is to be used when the population variance or population Std Dev is unknown, regardless of the size of the sample

t Distribution Characteristics

Page 40: Module 2 Part I

The t table uses the area in the tail of the distribution.

Emphasis in the t table is on α, and each tail of the distribution contains α/2 of the area under the curve when confidence intervals are constructed

t values are located at the intersection of the df value and the selected α/2 value

Reading the t Distribution

Page 41: Module 2 Part I

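Confidence Intervals for $\mu$ of a Normal Population: $\sigma$ Unknown

$\bar{x} \pm t_{\alpha/2,\,n-1} \dfrac{s}{\sqrt{n}}$

or

$\bar{x} - t_{\alpha/2,\,n-1} \dfrac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1} \dfrac{s}{\sqrt{n}}$

$df = n - 1$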

Page 42: Module 2 Part I

Table of Critical Values of t

With df = 24 and $\alpha$ = 0.05 (area in one tail), $t_{\alpha}$ = 1.711.

df    t0.100   t0.050   t0.025   t0.010   t0.005
1     3.078    6.314    12.706   31.821   63.656
2     1.886    2.920    4.303    6.965    9.925
3     1.638    2.353    3.182    4.541    5.841
4     1.533    2.132    2.776    3.747    4.604
5     1.476    2.015    2.571    3.365    4.032
23    1.319    1.714    2.069    2.500    2.807
24    1.318    1.711    2.064    2.492    2.797
25    1.316    1.708    2.060    2.485    2.787
29    1.311    1.699    2.045    2.462    2.756
30    1.310    1.697    2.042    2.457    2.750
40    1.303    1.684    2.021    2.423    2.704
60    1.296    1.671    2.000    2.390    2.660
120   1.289    1.658    1.980    2.358    2.617
∞     1.282    1.645    1.960    2.327    2.576

Page 43: Module 2 Part I

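Confidence Intervals for $\mu$ of a Normal Population: $\sigma$ Unknown

$\bar{x} \pm t \dfrac{s}{\sqrt{n}}$

or

$\bar{x} - t \dfrac{s}{\sqrt{n}} \le \mu \le \bar{x} + t \dfrac{s}{\sqrt{n}}$

$df = n - 1$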

Page 44: Module 2 Part I

Problem #4 The owner of a large equipment rental company wants to make a rather quick estimate of the average number of days a piece of ditch digging equipment is rented out per person per time. The company has records of all rentals, but the amount of time required to conduct an audit of all accounts would be prohibitive. The owner decides to take a random sample of rental invoices. Fourteen different rentals of ditch diggers are selected randomly from the files, yielding the following data. The owner uses these data to construct a 99% confidence interval to estimate the average number of days that a ditch digger is rented and assumes that the number of days per rental is normally distributed in the population.

3 1 3 2 5 1 2 1 4 2 1 3 1 1

Page 45: Module 2 Part I

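Solution for Problem #4

$\bar{x} = 2.14,\ s = 1.29,\ n = 14,\ df = n - 1 = 13$

$\dfrac{\alpha}{2} = \dfrac{1 - .99}{2} = 0.005$

$t_{.005,13} = 3.012$

$\bar{x} - t \dfrac{s}{\sqrt{n}} \le \mu \le \bar{x} + t \dfrac{s}{\sqrt{n}}$

$2.14 - 3.012 \dfrac{1.29}{\sqrt{14}} \le \mu \le 2.14 + 3.012 \dfrac{1.29}{\sqrt{14}}$

$2.14 - 1.04 \le \mu \le 2.14 + 1.04$

$1.10 \le \mu \le 3.18$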
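The solution to Problem #4 can be reproduced from the raw rental data with a short Python sketch. The Python standard library has no t distribution, so the critical value 3.012 is hard-coded here as read from a t table (df = 13, α/2 = 0.005); the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Days rented for the 14 sampled ditch-digger rentals
days = [3, 1, 3, 2, 5, 1, 2, 1, 4, 2, 1, 3, 1, 1]

t_crit = 3.012        # t_{.005,13}, taken from a t table (not computed here)

n = len(days)
x_bar = mean(days)
s = stdev(days)       # sample standard deviation (n - 1 denominator)

margin = t_crit * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin
print(f"x_bar = {x_bar:.2f}, s = {s:.2f}")
print(f"{lower:.2f} <= mu <= {upper:.2f}")
```

Working from the unrounded mean and standard deviation gives the same interval, to two decimals, as the hand calculation.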

Page 46: Module 2 Part I

Problem ??

Suppose a researcher wants to estimate the average amount of extra working hours (beyond their 40-hour week) used per week for managers in the aerospace industry. He randomly samples 18 managers and measures the amount of extra time they work during a specific week and obtains the results (in hours) as shown below:

6 21 17 20 7 0 8 16 29

3 8 12 11 9 21 25 15 16

Construct a 90% confidence interval to estimate the average amount of extra time per week worked by a manager.

Answer: 10.356 to 16.754, using $t_{0.05,17} = 1.740$

Page 47: Module 2 Part I

Confidence Interval to Estimate the Population Proportion

An estimate of the population proportion must often be made.

$\hat{p} - z_{\alpha/2} \sqrt{\dfrac{\hat{p}\hat{q}}{n}} \le p \le \hat{p} + z_{\alpha/2} \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$

where:

$\hat{p}$ = sample proportion

$\hat{q} = 1 - \hat{p}$

$p$ = population proportion

$n$ = sample size

Page 48: Module 2 Part I

Problem #5

A clothing company produces men’s jeans. The jeans are made and sold with either a regular cut or a boot cut. In an effort to estimate the proportion of their men’s jeans market in Oklahoma City that prefers boot-cut jeans, the analyst takes a random sample of 423 jeans sales from the company’s two Oklahoma City retail outlets. Only 72 of the sales were for boot-cut jeans. Construct a 90% confidence interval to estimate the proportion of the population in Oklahoma City who prefer boot-cut jeans.

Page 49: Module 2 Part I

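Solution Problem #5

$n = 423,\ x = 72,\ \hat{p} = \dfrac{x}{n} = \dfrac{72}{423} = 0.17$

$\hat{q} = 1 - \hat{p} = 1 - 0.17 = 0.83$

90% confidence: $z = 1.645$

$\hat{p} - z \sqrt{\dfrac{\hat{p}\hat{q}}{n}} \le p \le \hat{p} + z \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$

$0.17 - 1.645 \sqrt{\dfrac{(0.17)(0.83)}{423}} \le p \le 0.17 + 1.645 \sqrt{\dfrac{(0.17)(0.83)}{423}}$

$0.17 - 0.03 \le p \le 0.17 + 0.03$

$0.14 \le p \le 0.20$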
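The proportion interval for Problem #5 can be checked with a small Python sketch. The function name `proportion_interval` is hypothetical, and unlike the hand calculation it keeps the unrounded sample proportion 72/423, which gives the same interval to two decimals.

```python
from math import sqrt

def proportion_interval(x, n, z):
    """Confidence interval for a population proportion from x successes in n trials."""
    p_hat = x / n              # sample proportion
    q_hat = 1 - p_hat
    margin = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin

# Problem #5: 72 boot-cut sales out of 423, 90% confidence (z = 1.645)
lower, upper = proportion_interval(x=72, n=423, z=1.645)
print(f"{lower:.2f} <= p <= {upper:.2f}")
```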

Page 50: Module 2 Part I

Determining Sample Size when Estimating $\mu$

It may be necessary to estimate the sample size when working on a project

In studies where µ is being estimated, the size of the sample can be determined by using the z formula for sample means to solve for n

The difference between $\bar{x}$ and µ is the error of estimation

Page 51: Module 2 Part I

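Determining Sample Size when Estimating $\mu$

z formula: $z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

Error of Estimation (tolerable error): $E = \bar{x} - \mu$

Estimated Sample Size: $n = \dfrac{z_{\alpha/2}^{2}\,\sigma^{2}}{E^{2}} = \left( \dfrac{z_{\alpha/2}\,\sigma}{E} \right)^{2}$

Estimated $\sigma$: $\sigma \approx \tfrac{1}{4}\,\text{range}$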

Page 52: Module 2 Part I

Problem #6

Suppose you want to estimate the average age of all Boeing 737-300 airplanes now in active domestic U.S. service. You want to be 95% confident, and you want your estimate to be within one year of the actual figure. The 737-300 was first placed in service about 24 years ago, but you believe that no active 737-300s in the U.S. domestic fleet are more than 20 years old. How large of a sample should you take?

Page 53: Module 2 Part I

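Solution for Problem 6

$E = 1$; 95% confidence: $z = 1.96$; range $= 20$, so estimated $\sigma = \tfrac{1}{4}(20) = 5$

$n = \dfrac{z^{2}\sigma^{2}}{E^{2}} = \dfrac{(1.96)^{2}(5)^{2}}{1^{2}} = 96.04$ or 97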
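The sample-size calculation for Problem 6 can be sketched in Python as follows. The function name is made up for illustration; the estimate of σ as one quarter of the range (20 years) and z = 1.96 follow the approach described above, and the result is rounded up because a sample size must be a whole number.

```python
from math import ceil

def sample_size_for_mean(z, sigma, error):
    """Smallest n for which z * sigma / sqrt(n) does not exceed the tolerable error."""
    return ceil((z * sigma / error) ** 2)

age_range = 20                  # believed maximum age of active aircraft, in years
sigma_estimate = age_range / 4  # rough estimate: sigma is about range / 4

n_required = sample_size_for_mean(z=1.96, sigma=sigma_estimate, error=1)
print(n_required)
```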

Page 54: Module 2 Part I

Determining Sample Size when Estimating p

z formula: $z = \dfrac{\hat{p} - p}{\sqrt{\dfrac{pq}{n}}}$

Error of Estimation (tolerable error): $E = \hat{p} - p$

Estimated Sample Size: $n = \dfrac{z^{2}\,pq}{E^{2}}$

Page 55: Module 2 Part I

Problem #7

Hewitt Associates conducted a national survey to determine the extent to which employers are promoting health and fitness among their employees. One of the questions asked was, Does your company offer on-site exercise classes? Suppose it was estimated before the study that no more than 40% of the companies would answer Yes. How large a sample would Hewitt Associates have to take in estimating the population proportion to ensure a 98% confidence in the results and to be within .03 of the true population proportion?

Page 56: Module 2 Part I

Solution for Problem 7

$E = 0.03$

98% confidence: $z = 2.33$

Estimated $p = 0.40$

$q = 1 - p = 0.60$

$n = \dfrac{z^{2}\,pq}{E^{2}} = \dfrac{(2.33)^{2}(0.40)(0.60)}{(.03)^{2}} = 1{,}447.7$ or 1,448
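The same calculation for a proportion can be sketched in Python. The function name is hypothetical; the inputs (z = 2.33, estimated p = 0.40, E = 0.03) are the values used in the solution above, and the result is rounded up to the next whole number.

```python
from math import ceil

def sample_size_for_proportion(z, p, error):
    """Smallest n for which z * sqrt(p * q / n) does not exceed the tolerable error."""
    q = 1 - p
    return ceil(z ** 2 * p * q / error ** 2)

n_required = sample_size_for_proportion(z=2.33, p=0.40, error=0.03)
print(n_required)
```

When no prior estimate of p is available, using p = 0.5 gives the largest (most conservative) sample size, since pq is largest at that value.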