the Central Limit Theorem, Point Estimation & Estimatorshomepage.divms.uiowa.edu/~rdecook/stat2020/notes/ch7_pt1.pdf · statistic of interest calculated from the data when estimating

Chapter 7Sampling Distributions and Point Estimation ofParameters

Part 1: Sampling Distributions,the Central Limit Theorem,Point Estimation & Estimators

Sections 7-1 to 7-2

1 / 26

Statistical Inferences

A random sample is collected on a population to draw conclusions, ormake statistical inferences, about the population.

Definition (Random Sample)

The random variables X1, X2, . . . , Xn are a random sample of size n if...1) the Xi’s are independent2) every Xi has the same probability distribution

Types of statistical inference:1 Parameter estimation (e.g. estimating µ) with a confidence interval

For estimating µ, we collect data and we use the observed samplemean x̄ as a point estimate for µ and create a confidence interval toreport a likely range in which µ lies.

2 Hypothesis testing about a population parameter (e.g. H0 : µ = 50)

We wish to compare the mean time that women and men spend at theCRWC. H0 : µM = µW ? Or perhaps there is evidence against thishypothesis.

2 / 26

Sample Mean X̄, a Point Estimate for the populationmean µ

The sample mean X̄ is a point estimate for the population mean µ.

NOTATION: µ̂ = X̄ (a ‘hat’ over a parameter represents an estimator,X̄ is the estimator here)

Prior to data collection, X̄ is a random variable and it is thestatistic of interest calculated from the data when estimating µ.

The value we get for X̄ (the sample mean) depends on the specificsample chosen!

If X̄ is a random variable, then it has a certain expected value,variance, and distribution. The distribution of the random variable X̄is called the sampling distribution of X̄.

3 / 26

Sample-to-Sample Variability

As stated earlier, there is randomness in the X̄ value we get from arandom sample. Suppose I want to estimate a population meanheight µ using a sample mean X̄.

Suppose I randomly select 50 individuals from a population, measuretheir heights, and find the sample mean x̄ = 5 foot 6 inches

Suppose I repeat the process1, I again randomly select 50 individualsfrom a population, measure their heights, and find the sample meanx̄ = 5 foot 8 inches

Suppose I repeat the process, I again randomly select 50 individualsfrom a population, measure their heights, and find the sample meanx̄ = 5 foot 5 inches

I didn’t do anything wrong in my data collection, this is justSAMPLING VARIABILITY!

1In reality, we only take one sample. The above is meant to emphasize the existence of sample-to-sample variability.

4 / 26

The Sampling Distribution of X̄

Definition (Sampling Distribution)

The probability distribution of a statistic is called a sampling distribution.

X̄ is a statistic calculated from a random sample X1, X2, . . . , Xn.

X̄ is a linear combination of random variables.

X̄ =∑n

i=1Xi

n = 1nX1 + 1

nX2 + · · ·+ 1nXn

For a random sample X1, X2, . . . , Xn drawn from any distributionwith mean E(Xi) = µ and variance V (Xi) = σ2 , we have

E(X̄) = µ and V (X̄) = σ2

n

But a mean and variance does not fully specify a distribution.Do we know the probability distribution of X̄? ...

5 / 26


It turns out that X̄ has some predictable behavior...

If the X1, X2, . . . , Xn are drawn from a normal distribution, or bynotation Xi ∼ N(µ, σ2) for all i, then

X̄ ∼ N(µ, σ2

n ) for any sample size n.

Example

Suppose IQ scores are normally distributed with mean µ = 100 andvariance σ2 = 256. If n = 9 IQ scores are drawn at random from thispopulation, what is the probability that the sample mean is less than 93?

ANSWER: Find P (X̄ < 93) (next page).

6 / 26


Example

Suppose IQ scores are normally distributed with mean µ = 100 andvariance σ2 = 256. If n = 9 IQ scores are drawn at random from thispopulation, what is the probability that the sample mean is less than 98?

ANSWER: Find P (X̄ < 93).

We first need a distribution for X̄ (it follows a normal distribution!), andthen we’ll use it to create a Z random variable and use the Z-table.

7 / 26


The graphic below shows how the variability in X̄ decreases as thesample size n increases. Recall X̄ ∼ N(µ, σ

2

n).

8 / 26


Notation:

E(X̄) = µX̄ = E(X) = µ

V (X̄) = σ2X̄

= V (X)n = σ2

n

Terminology:

The term standard deviation refers to the population standarddeviation, or

√V (X) = σ, and...

Z = X−µσ

The term standard error is a value related to X̄ and is also morefully stated as the standard error of the sample mean and it is thesquare root of the variance of X̄.

Std. Error of X̄ is√V (X̄) =

√σ2

n = σ√n

And then...

Z =X̄ − µX̄σX̄

=X̄ − µ√

σ2

n

=X̄ − µσ/√n

9 / 26


Even when Xi are NOT drawn from a normal distribution, it turnsout that X̄ has some predictable behavior...

If the X1, X2, . . . , Xn were NOT drawn from a normal distribution,or by notation Xi ∼?(µ, σ2) for all i, then X̄ is approximatelynormally distributed as long as n is large enough or

X̄·∼ N(µ, σ

2

n ) for n > 25 or 30.

Thus, X̄ follows a normal distribution!!! (for a sufficiently large n)This is an incredibly useful result for calculating probabilities for X̄!!

10 / 26


Example (Probability for X̄, Flaws in a copper wire)

Let X denote the number of flaws in a 1 inch length of copper wire. Theprobability mass function of X is presented in the following table:

x P (X = x)

0 0.481 0.392 0.123 0.01

Suppose n = 100 wires are randomly sampled from this population. Whatis the probability that the average number of flaws per wire in the sampleis less than 0.5? (i.e. find P (X̄ < 0.5)... next page)

11 / 26


Example (Probability for X̄, Flaws in a copper wire)

ANSWER: P (X̄ < 0.5))=

12 / 26

Central Limit Theorem (CLT)

Definition (Central Limit Theorem)

Let X1, X2, . . . , Xn be a random sample drawn from any population (ordistribution) with mean µ and variance σ2. If the sample size is*sufficiently large*, then X̄ follows an approximate normal distribution.

We write: X̄d→ N(µ, σ

2

n ) as n→∞

Or: Z =X̄−µσ/√n

d→ N(0, 1) as n→∞

If the random sample is drawn from a non-normal population, then X̄ isapproximately normal for sufficient large n (at least 25 or 30) and theapproximation gets better and better as n increases.

NOTE: If the original ‘parent population’ from which the sample was drawn isnormal, then X̄ follows a normal distribution for any n (a linear combination ofnormals is normal), and the CLT is not needed to achieve normality.

13 / 26

The Sampling Distribution of X̄ (simulation)

Let’s simulate this situation...

Case 1: Original population is normally distributed x

f(x)

1 Choose a sample of size n from a normal distribution2 Compute x̄3 Plot the x̄ on our frequency histogram4 Do steps 1-3 many time, such as 1000 times5 Draw a histogram of the 1000 x̄ values

(to see the sampling distribution of X̄)

See applet at:http://onlinestatbook.com/stat sim/sampling dist/index.html

14 / 26


Case 1: Original population is normally distributed (with n=2)

The empirical distribution for X̄n=2 is in the lower plot (in blue). Its meanis very close to the parent population mean µ = 16, and its s.d. of 3.59 isvery close to the theoretical σx̄ = σ√

n= 5√

2= 3.54.

15 / 26


Case 1: Original population is normally distributed (with n=25)

The empirical distribution for X̄n=25 is in the lower plot (in blue). Itsmean is very close to the parent population mean µ = 16, and its s.d. ofof 1.0 is the same as the theoretical σx̄ = σ√

n= 5√

25= 1.

16 / 26


x

f(x)

RESULT - If the parent population (the one you are drawing from)is normal , then X̄ will follow a normal distribution for any samplesize n with known mean and variance as show below.

X̄ ∼ N(µ, σ2

n )

17 / 26


Let’s simulate this situation...

Case 2: Original population is NOT normally distributed...

x

f(x)

x

f(x)

x

f(x)

1 Choose a sample of size n from a NON-normal distribution2 Compute x̄3 Plot the x̄ on our frequency histogram4 Do steps 1-3 many time, such as 1000 times5 Draw a histogram of the 1000 x̄ values

(to see the sampling distribution of X̄)

See applet at:http://onlinestatbook.com/stat sim/sampling dist/index.html

18 / 26


Case 2: Original population is NOT normally distributed(with right-skewed parent population and n=10)

The empirical distribution for X̄n=10 is in the lower plot (in blue). Itsbell-shaped with a mean equal to the parent population mean µ = 8.08.Its s.d. of 1.96 is very close to the theoretical σx̄ = σ√

n= 6.22√

10= 1.97.

19 / 26


Case 2: Original population is NOT normally distributed(with very non-normal parent population and n=2)

FAIL!!!! The empirical distribution for X̄n=2 is in the lower plot (in blue)and it is not normally distributed. This is just too small of a sample size toovercome the very non-normal parent population.

20 / 26


Case 2: Original population is NOT normally distributed(with very non-normal parent population and n=25)

The empirical distribution for X̄n=25 is in the lower plot (in blue). Its bell-shaped

with a mean close to the parent population mean µ = 16.92. Its s.d. of 2.46 is

very close to the theoretical σx̄ = σ√n

= 12.29√25

= 2.458.21 / 26


x

f(x)

x

f(x)

x

f(x)

RESULT - If the parent population (the one you are drawing from)is NOT normal , then X̄ will follow an approximate normaldistribution for sufficiently large n (we’ll say n > 25 or 30).

X̄·∼ N(µ, σ

2

n )

This is the Central Limit Theorem.The approximation improves as n increases.

22 / 26


A couple comments:

Averages are less variable than individual observations.

The distribution for X̄ has less variability than the distribution for X.

The distribution of our estimator X̄n is squeezed closer to, or istighter, around the thing we’re trying to estimate as n increases.

For some non-normal distributions, the approximation is pretty goodfor n lower than 25 or 30, so it depends on the parent populationfrom which we are drawing.

23 / 26


The next graphic shows 3 different original populations (one nearlynormal, two that are not), and the sampling distribution for X̄ basedon a sample of size n = 5 and size n = 30.

The three original distributions are on the far left (one that is nearlysymmetric and bell-shaped, one that is right skewed, and one that ishighly right skewed).

The graphic emphasizes the concept that the normal approximationbecomes better as n increases.

24 / 26


As shown in: Navidi, W. ‘Statistics for Engineers and Scientists’, McGraw Hill, 2006 25 / 26


The variability of X̄ decreases as n increasesRecall: V (X̄) = σ2

n .

If the original population has a shape that’s closer to normal, smallern is sufficient for X̄ to be normal.

The normal approximation gets better with larger n when you’restarting with a non-normal population.

Even when X has a very non-normal distribution, X̄ still has anapproximately normal distribution with a large enough n.

26 / 26

Documents

the Central Limit Theorem, Point Estimation & Estimatorshomepage.divms.uiowa.edu/~rdecook/stat2020/notes/ch7_pt1.pdf · statistic of interest calculated from the data when estimating