Statistics 571: Statistical Methods
Ramón V. León

Unit 5: Sampling Distributions of Statistics

6/12/2004 Unit 5 - Stat 571 - Ramon V. Leon


Definitions and Key Concepts

• A sample statistic used to estimate an unknown population parameter is called an estimate

• The discrepancy between the estimate and the true parameter value is known as sampling error

• Sampling error is due to sampling variation


Frequentist Approach to Statistics

• Assesses the accuracy of a sample estimate by considering how the estimate would vary around the true parameter value if repeated random samples are drawn from the same population

• A statistic is a random variable with a probability distribution, called the sampling distribution, which is generated by repeated sampling.

• We use the sampling distribution of a statistic to assess the sampling error of an estimate


Sample Mean

A random sample is a set of independent, identically distributed (i.i.d.) observations X1, X2, … , Xn (when sampling from a large population or with replacement)

Assume that the population has mean $\mu = E(X_i)$ and variance $\sigma^2 = \operatorname{Var}(X_i)$.

How does the sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ vary on repeated random samples of size $n$?

This is called the sampling distribution of the sample mean.


Mean and Variance of a Die Toss
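
The slide's own calculation is not recoverable from the extracted text, so the following is only a minimal sketch of how the mean and variance of a single toss can be computed, assuming a fair six-sided die. The names `faces`, `die_mean`, and `die_var` are illustrative and not from the course materials.

```python
from fractions import Fraction

# Assumption: a fair six-sided die, each face with probability 1/6.
faces = range(1, 7)
prob = Fraction(1, 6)

# Population mean: E(X) = sum of x * P(X = x).
die_mean = sum(x * prob for x in faces)

# Population variance: Var(X) = E[(X - mean)^2].
die_var = sum((x - die_mean) ** 2 * prob for x in faces)

print(die_mean, die_var)  # 7/2 35/12
```

Exact fractions are used so that the result, mean 7/2 and variance 35/12, is not blurred by rounding.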


Simulating a Die Toss in JMP


Rolling Two Dice

Each one of these 36 outcomes is equally likely, i.e., each one occurs with probability 1/36.


Rolling Two Dice Sampling Distribution
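
The table on this slide did not survive extraction. As a hedged sketch, the sampling distribution of the sample mean for two fair dice can be rebuilt by enumerating the 36 equally likely outcomes. The variable names are illustrative assumptions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (x1, x2) of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

# Sampling distribution of the sample mean: P(mean = m) for each possible m.
counts = Counter(Fraction(x1 + x2, 2) for x1, x2 in outcomes)
mean_dist = {m: Fraction(c, len(outcomes)) for m, c in sorted(counts.items())}

# Mean and variance of the sample mean, computed from its distribution.
mean_of_mean = sum(m * p for m, p in mean_dist.items())
var_of_mean = sum((m - mean_of_mean) ** 2 * p for m, p in mean_dist.items())

for m, p in mean_dist.items():
    print(f"mean = {float(m):.1f}  probability = {p}")
print(mean_of_mean, var_of_mean)
```

The mean of the sample mean equals the single-die mean 7/2, and its variance is the single-die variance divided by n = 2.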


Homework To be Done Right Away

Use the Sampling Distribution simulation Java applet at the Rice Virtual Lab in Statistics to do the following.

• Draw 10,000 random samples of size N=5 from the normal distribution provided.
  – Construct the histogram of the sampling distribution of the sample mean.
  – Construct the histogram of the sampling distribution of the sample variance.
  – Turn in this output with the rest of the homework for Unit 5.

• Draw 10,000 random samples of size N=20 from the normal distribution provided.
  – Construct the histogram of the sampling distribution of the sample mean.
  – Construct the histogram of the sampling distribution of the sample variance.

• Draw 10,000 random samples of size N=5 from a uniform distribution on [0,32].
  – Construct the histogram of the sampling distribution of the sample mean.
  – Construct the histogram of the sampling distribution of the sample variance.

• Draw 10,000 random samples of size N=20 from a uniform distribution on [0,32].
  – Construct the histogram of the sampling distribution of the sample mean.
  – Construct the histogram of the sampling distribution of the sample variance.

• Draw 10,000 random samples of size N=5 from the skewed distribution provided.
  – Construct the histogram of the sampling distribution of the sample mean.
  – Construct the histogram of the sampling distribution of the sample variance.
  – Construct the histogram of the sampling distribution of the sample median.

• Draw 10,000 random samples of size N=20 from the skewed distribution provided.
  – Construct the histogram of the sampling distribution of the sample mean.
  – Construct the histogram of the sampling distribution of the sample variance.
  – Construct the histogram of the sampling distribution of the sample median.
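
For readers without access to the applet, the same kind of repeated sampling can be sketched in code. This is not the applet's method, only an illustrative simulation for the uniform [0,32] case with N=5. It prints summary numbers rather than drawing histograms, and the seed and variable names are assumptions.

```python
import random
import statistics

random.seed(571)  # arbitrary seed so the run is repeatable

NUM_SAMPLES = 10_000   # number of repeated samples
N = 5                  # size of each sample
LOW, HIGH = 0.0, 32.0  # uniform population on [0, 32]

sample_means = []
sample_variances = []
for _ in range(NUM_SAMPLES):
    sample = [random.uniform(LOW, HIGH) for _ in range(N)]
    sample_means.append(statistics.fmean(sample))
    sample_variances.append(statistics.variance(sample))  # divisor n - 1

pop_mean = (LOW + HIGH) / 2
pop_var = (HIGH - LOW) ** 2 / 12

print("mean of sample means:", statistics.fmean(sample_means), "vs", pop_mean)
print("variance of sample means:", statistics.pvariance(sample_means), "vs", pop_var / N)
print("mean of sample variances:", statistics.fmean(sample_variances), "vs", pop_var)
```

The lists `sample_means` and `sample_variances` hold the simulated sampling distributions and could be passed to any plotting tool to produce the requested histograms.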


Distribution of Sample Means

• If the i.i.d. r.v.’s are Bernoulli, Normal, or Exponential, the distribution of the sample mean can be calculated exactly.

• However, in general the exact distribution of the sample mean is difficult to calculate.

• What can be said about the distribution of the sample mean when the sample is drawn from an arbitrary population?

• In many cases we can approximate the distribution of the sample mean when n is large by a normal distribution. This result is called the Central Limit Theorem.


Central Limit Theorem

Let X1, X2, … , Xn be a random sample drawn from an arbitrary distribution with a finite mean µ and variance σ². Then if n is sufficiently large

$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \approx N(0,1)$$

Sometimes the theorem is given in terms of the sums:

$$\frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}} \approx N(0,1)$$
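
A small simulation can make the theorem concrete. The sketch below assumes an Exponential population with rate 1 (mean 1, standard deviation 1) and n = 50. Both choices are illustrative and not taken from the slides. It standardizes each sample mean and checks how often the result lands within ±1.96, which should be close to 95% if the normal approximation holds.

```python
import math
import random
import statistics

random.seed(5)  # arbitrary seed for repeatability

N = 50            # sample size (assumed "sufficiently large" for this sketch)
REPS = 20_000     # number of repeated samples
MU = SIGMA = 1.0  # an Exponential(rate 1) population has mean 1 and sd 1

z_values = []
for _ in range(REPS):
    sample = [random.expovariate(1.0) for _ in range(N)]
    x_bar = statistics.fmean(sample)
    # Standardize the sample mean as in the theorem.
    z_values.append((x_bar - MU) / (SIGMA / math.sqrt(N)))

# If Z is close to N(0,1), about 95% of values fall within +/- 1.96.
coverage = sum(abs(z) <= 1.96 for z in z_values) / REPS
print(f"fraction within +/-1.96: {coverage:.3f}")
```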


Central Limit Theorem Illustration


Screen Shots of the Output of the Sampling Distribution Simulation Java Applet

$\dfrac{\sigma}{\sqrt{n}} = \dfrac{6.22}{\sqrt{5}} = 2.78$ (the slide also shows the value 2.81)


Central Limit Theorem and Law of Large Numbers

• Both are asymptotic results about the sample mean

• Law of Large Numbers says that as n goes to infinity the sample mean converges to the population mean, i.e. $\bar{X} - \mu$ converges to 0 as $n \to \infty$

• CLT says that as n goes to infinity $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ converges to $N(0,1)$
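
The Law of Large Numbers statement can be illustrated with a short sketch. It assumes tosses of a fair die (population mean 3.5) and three arbitrary sample sizes, and reports how far the sample mean is from µ in each case.

```python
import random
import statistics

random.seed(14)  # arbitrary seed for repeatability

MU = 3.5  # population mean of a fair die

errors = {}
for n in (10, 1_000, 100_000):
    tosses = [random.randint(1, 6) for _ in range(n)]
    errors[n] = abs(statistics.fmean(tosses) - MU)
    print(f"n = {n:>7}: |sample mean - mu| = {errors[n]:.4f}")
```

The distance typically shrinks as n grows, although for any single run it need not decrease at every step.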


Central Limit Theorem

Let X1, X2, … , Xn be a random sample drawn from an arbitrary distribution with a finite mean µ and variance σ². Then if n is sufficiently large

$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \approx N(0,1)$$

Sometimes the theorem is given in terms of the sums:

$$\frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}} \approx N(0,1)$$


Normal Approximation to the Binomial

A binomial r.v. is the sum of i.i.d. Bernoulli r.v.’s so the CLT can be used to approximate its distribution.

Suppose that Z is Bernoulli. Then the mean of Z is p and its variance is p(1 − p).

By the CLT we have for the Binomial(n, p) r.v. X:

$$\frac{X - np}{\sqrt{np(1-p)}} = \frac{\sum_{i=1}^{n} Z_i - np}{\sqrt{np(1-p)}} = \frac{\sum_{i=1}^{n} Z_i - nE(Z)}{\sqrt{\operatorname{Var}(Z)} \times \sqrt{n}} \approx N(0,1)$$

How large a sample, n, do we need for the approximation to be good?

Rule of Thumb: np ≥ 10 and n(1 − p) ≥ 10
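
The rule of thumb is easy to turn into a check. The helper names below (`rule_of_thumb_ok`, `min_sample_size`) are hypothetical and not part of the course materials. Exact fractions are used only to avoid floating-point edge cases at the boundary.

```python
import math
from fractions import Fraction


def rule_of_thumb_ok(n, p):
    """Return True when np >= 10 and n(1 - p) >= 10."""
    p = Fraction(str(p))  # exact arithmetic avoids floating-point edge cases
    return n * p >= 10 and n * (1 - p) >= 10


def min_sample_size(p):
    """Smallest n satisfying the rule of thumb for a given p in (0, 1)."""
    p = Fraction(str(p))
    return math.ceil(10 / min(p, 1 - p))


for p in (0.5, 0.1, 0.02):
    print(p, min_sample_size(p))
```

The required n grows quickly as p moves away from 0.5, because the smaller of np and n(1 − p) is the binding condition.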


CLT Approximation to the Binomial When p is Close to 0.5

For a good approximation, np = n(1 − p) = 0.5n should be at least 10. So n should be at least 20.


CLT Approximation to the Binomial When p is Not Close to 0.5

np = n(0.1) should be at least 10. So n should be at least 100.


Continuity Correction

$$P(X \le 8) \approx \Phi\left(\frac{8.5 - np}{\sqrt{np(1-p)}}\right)$$

Similarly:

$$P(X \ge 8) \approx 1 - \Phi\left(\frac{7.5 - np}{\sqrt{np(1-p)}}\right)$$
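
To see what the correction buys, the sketch below compares the exact binomial probabilities with the normal approximation, with and without the half-unit adjustment. The values n = 20 and p = 0.5 are assumptions chosen for illustration, since the slide's own n and p are not in the extracted text.

```python
import math
from statistics import NormalDist

# Hypothetical values chosen only for illustration; the slide's n and p are not given.
n, p = 20, 0.5
k = 8

mu = n * p
sd = math.sqrt(n * p * (1 - p))
phi = NormalDist().cdf  # standard normal CDF


def binom_pmf(j):
    """Exact Binomial(n, p) probability P(X = j)."""
    return math.comb(n, j) * p**j * (1 - p) ** (n - j)


# P(X <= 8): exact, plain normal approximation, and continuity-corrected (8.5).
exact_le = sum(binom_pmf(j) for j in range(k + 1))
plain_le = phi((k - mu) / sd)
corrected_le = phi((k + 0.5 - mu) / sd)

# P(X >= 8): exact and continuity-corrected (7.5).
exact_ge = sum(binom_pmf(j) for j in range(k, n + 1))
corrected_ge = 1 - phi((k - 0.5 - mu) / sd)

print(f"P(X <= 8): exact {exact_le:.4f}, plain {plain_le:.4f}, corrected {corrected_le:.4f}")
print(f"P(X >= 8): exact {exact_ge:.4f}, corrected {corrected_ge:.4f}")
```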


Screen Shots of the Output of the Java Applet “Normal Approximation to the Binomial Distribution”

Homework: See the Homework Log.


Why the Normal Approximation to the Binomial Distribution Works in Pictures

Green area is approximately the same as the red area


Java Applet for N=100 and p=.1


Example: CLT Approximation to the Binomial


Rolling Two Dice

Each one of these 36 outcomes is equally likely, i.e., each one occurs with probability 1/36.

Now we pay attention to the sample variance.


Sampling Distribution of the Sample Variance: Two Dice Example
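
The slide's table is not in the extracted text. As a hedged sketch, the sampling distribution of the sample variance for two fair dice can be enumerated the same way as for the sample mean. The function and variable names are illustrative assumptions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (x1, x2) of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))


def sample_variance(x1, x2):
    """Sample variance of two observations (divisor n - 1 = 1)."""
    x_bar = Fraction(x1 + x2, 2)
    return (x1 - x_bar) ** 2 + (x2 - x_bar) ** 2


# Sampling distribution of S^2: P(S^2 = s) for each possible value s.
counts = Counter(sample_variance(x1, x2) for x1, x2 in outcomes)
var_dist = {s2: Fraction(c, len(outcomes)) for s2, c in sorted(counts.items())}

expected_s2 = sum(s2 * p for s2, p in var_dist.items())

for s2, p in var_dist.items():
    print(f"S^2 = {float(s2):>5.1f}  probability = {p}")
print("E(S^2) =", expected_s2)
```

The expected value of S² comes out to 35/12, the variance of a single die toss, and the distribution is visibly right-skewed.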


Chi-Square Distribution


Using JMP to Simulate a Chi-Square Random Sample with 5 d.f.

See the JMP tutorial “Chi-Square Simulation” on the course home page

The number of rows is the size of the random sample


Sample of 1000 Random Chi-Square Random Variables

Notice the right skewness


Fitted Chi-Square Based on the Sample


Chi-Square Density Function Curves

Notice how similar this density function is to the histogram on the previous page.


Critical Values for the Chi-Square

See the JMP tutorial “Tabled Values of Common Distributions”


Distribution of Sample Variance

Assuming that the random sample comes from a normal distribution
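
The formula on this slide did not survive extraction. The standard result for a normal random sample is that (n − 1)S²/σ² has a chi-square distribution with n − 1 degrees of freedom. The simulation below is a minimal sketch of that result, assuming n = 10 and a hypothetical N(0, 2²) population, and checks the mean and variance of the scaled sample variance against the chi-square values n − 1 and 2(n − 1).

```python
import random
import statistics

random.seed(32)  # arbitrary seed for repeatability

N = 10                # sample size (assumed for illustration)
MU, SIGMA = 0.0, 2.0  # hypothetical normal population
REPS = 10_000

scaled = []
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    s2 = statistics.variance(sample)        # sample variance, divisor n - 1
    scaled.append((N - 1) * s2 / SIGMA**2)  # should follow chi-square, N - 1 d.f.

# A chi-square r.v. with k d.f. has mean k and variance 2k.
print("mean:", statistics.fmean(scaled), "expected", N - 1)
print("variance:", statistics.pvariance(scaled), "expected", 2 * (N - 1))
```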


Application of the Distribution of Sample Variance – Measurement Precision

Introduction to the ideas of hypothesis testing


Application of the Distribution of Sample Variance – Measurement Precision

$\chi^2_{9,\,0.05} = 16.92$ (upper tail area 0.05)
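
The course obtains critical values from JMP or a table. As an independent check, the sketch below recovers the upper 0.05 critical value for 9 degrees of freedom by numerically integrating the chi-square density (Simpson's rule) and bisecting on the tail area. The function names are hypothetical, and the integration assumes more than 2 degrees of freedom so the density is zero at the origin.

```python
import math


def chi2_pdf(x, df):
    """Density of the chi-square distribution with df degrees of freedom (df > 2)."""
    if x <= 0:
        return 0.0
    half = df / 2
    return x ** (half - 1) * math.exp(-x / 2) / (2**half * math.gamma(half))


def chi2_cdf(x, df, steps=2000):
    """P(chi-square <= x) by Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = chi2_pdf(0.0, df) + chi2_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * chi2_pdf(i * h, df)
    return total * h / 3


def chi2_upper_critical(alpha, df):
    """Find c with P(chi-square > c) = alpha by bisection."""
    lo, hi = 0.0, 200.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - chi2_cdf(mid, df) > alpha:
            lo = mid  # tail still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2


critical_value = chi2_upper_critical(0.05, 9)
print(round(critical_value, 2))
```

The result should agree with the tabled value 16.92 to two decimal places.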


Student’s t-Distribution

Consider a random sample, $X_1, X_2, \ldots, X_n$, drawn from a $N(\mu, \sigma^2)$ distribution.

It is known that $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is exactly distributed as N(0,1) for any n.

But $T = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$ is no longer distributed as N(0,1).

The distribution of T is named Student’s t-distribution. (A different distribution for each number ν = n − 1 = degrees of freedom)

Play with the Java applet “Student’s t Distribution”


t-Distribution Table

See the JMP tutorial “Tabled Values of Common Distributions”


Application of the t-Distribution Calculation – Process Control


Example: t-Distribution Calculation

$t_{9,\,0.005} = 3.250$
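
In standard t tables, 3.250 is the upper 0.005 critical point for 9 degrees of freedom. The sketch below is one way to reproduce such a value without a table, by integrating the t density numerically and bisecting on the tail area. The function names are hypothetical, and this is not the JMP procedure the course uses.

```python
import math


def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using symmetry and Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def t_upper_critical(alpha, df):
    """Find c >= 0 with P(T > c) = alpha by bisection (alpha < 0.5)."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha:
            lo = mid  # tail still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2


t_crit = t_upper_critical(0.005, 9)
print(round(t_crit, 3))
```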


F-Distribution

Consider two independent random samples, $X_1, X_2, \ldots, X_{n_1}$ from an $N(\mu_1, \sigma_1^2)$ and $Y_1, Y_2, \ldots, Y_{n_2}$ from an $N(\mu_2, \sigma_2^2)$.

Then $\dfrac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$ has an F distribution with ν1 = n1 − 1 d.f. in the numerator and ν2 = n2 − 1 d.f. in the denominator.
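
A simulation gives a rough check of this result. The sample sizes and population parameters below are hypothetical, chosen only for illustration. The sketch compares the simulated mean of the variance ratio with the known mean of an F distribution, ν2/(ν2 − 2), which holds when ν2 > 2.

```python
import random
import statistics

random.seed(39)  # arbitrary seed for repeatability

N1, N2 = 6, 11          # hypothetical sample sizes
MU1, SIGMA1 = 0.0, 1.0  # hypothetical N(mu1, sigma1^2)
MU2, SIGMA2 = 5.0, 3.0  # hypothetical N(mu2, sigma2^2)
REPS = 10_000

ratios = []
for _ in range(REPS):
    x = [random.gauss(MU1, SIGMA1) for _ in range(N1)]
    y = [random.gauss(MU2, SIGMA2) for _ in range(N2)]
    s1_sq = statistics.variance(x)  # sample variance of the X sample
    s2_sq = statistics.variance(y)  # sample variance of the Y sample
    ratios.append((s1_sq / SIGMA1**2) / (s2_sq / SIGMA2**2))

nu2 = N2 - 1
# An F r.v. with nu2 > 2 denominator d.f. has mean nu2 / (nu2 - 2).
print("simulated mean:", statistics.fmean(ratios), "theory:", nu2 / (nu2 - 2))
```

Note that the population means play no role in the ratio, only the two variances do.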


F-Distribution Table

See the JMP tutorial “Tabled Values of Common Distributions”