Unit 6 Sampling Distributions and Statistical Inference - 1 FPP Chapters 16-18, 20-21, 23


6-1Stats

Unit 6

Sampling Distributions and Statistical Inference - 1

FPP Chapters 16-18, 20-21, 23

The Law of Averages (Ch 16)

Box Models (Ch 16)

Sampling Distribution

Probability Histogram (Ch 17)

Sampling Distribution

Central Limit Theorem (Ch 17, 18)

Expected Value (Ch 17, 18) for average (mean), sum, percentage, count

Standard Error (Ch 17, 18) for average (mean), sum, percentage, count

Chance Error

Confidence Intervals (Ch 21) A.05

6-2Stats

The Law of Averages

• Toss a coin 10,000 times.
• At each toss we expect 50% to be heads.
• At each toss let’s note
  – the number of heads
  – the percentage of heads

6-3Stats

Number of heads

[Figure: number of heads minus expected number of heads (vertical axis, -50 to 0 shown) plotted against number of tosses (horizontal axis, 10 to 10,000).]

6-4Stats

Percentage of heads

[Figure: percentage of heads minus 50% (vertical axis, -10 to 10) plotted against number of tosses (horizontal axis, 10 to 10,000).]

6-5Stats

The Law of Averages

With a large number of tosses, the percentage of heads is likely to be close to 50%, although it is not likely to be exactly equal to 50%.

6-6Stats

The Law of Averages does NOT say …

“The ___________________ team has had such a long string of losses, they are due to get a win. Therefore their chances of winning the next game are greater.”

“I have tossed a coin many times, and now have a string of 5 heads. So the chances of getting tails on the next toss must be greater than 50%.”

6-7Stats

Number of Heads, Chance Error

• Number of heads = 50% of the number of tosses + chance error

•Can we assess what the chance error is?

6-8Stats

Coin toss example

$\sqrt{100} = 10$

• It turns out that
  – after 100 tosses, the chance error is likely to be around 5
  – after 10,000 tosses, the chance error is likely to be around 50
  – increasing the number of tosses by 100 times, chance error increases _______ times.

•Why does the percentage go to 50%?
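The coin-toss figures above can be checked with a short simulation. This is a minimal sketch, not part of the original slides: the function names and the fixed seed are illustrative assumptions, and the "likely size" uses the standard-error formula for a sum that appears later in this unit (the SD of a box with half 0's and half 1's is 0.5).

```python
import random
from math import sqrt

def likely_chance_error(tosses):
    """Likely size of the chance error in the number of heads: sqrt(tosses) x 0.5."""
    return sqrt(tosses) * 0.5

def simulate(checkpoints, seed=0):
    """Toss a fair coin and record the chance error at each checkpoint."""
    rng = random.Random(seed)   # fixed seed (an assumption) so the run is repeatable
    wanted = set(checkpoints)
    heads = 0
    rows = []
    for toss in range(1, max(checkpoints) + 1):
        heads += rng.random() < 0.5
        if toss in wanted:
            error = heads - toss / 2            # heads minus expected heads
            rows.append((toss, error, 100 * error / toss))
    return rows

for toss, error, pct_error in simulate([100, 10_000]):
    print(f"{toss:>6} tosses: chance error {error:+.1f} heads "
          f"({pct_error:+.2f} percentage points), "
          f"likely size about {likely_chance_error(toss):.0f}")
```

The chance error in the number of heads tends to grow with the square root of the number of tosses, while the same error expressed as a percentage of the tosses tends to shrink.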

6-9Stats

Example

We have the choice of tossing a coin 10 times or 100 times. We win if

– we get more than 60% heads.
– we get more than 40% heads.
– we get between 40% and 60% heads.
– we get exactly 50% heads.

Should we toss 10 or 100 times?
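One way to compare the two options is to compute the exact chances from the binomial formula. The sketch below assumes a fair coin and reads "between 40% and 60%" as inclusive; the names `chance` and `games` are illustrative, not from the slides.

```python
from math import comb

def chance(tosses, wins):
    """Exact chance that the number of heads k in fair tosses satisfies wins(k, tosses)."""
    total = 2 ** tosses
    favorable = sum(comb(tosses, k) for k in range(tosses + 1) if wins(k, tosses))
    return favorable / total

# Integer comparisons avoid floating-point trouble at the boundaries.
games = {
    "more than 60% heads": lambda k, n: 100 * k > 60 * n,
    "more than 40% heads": lambda k, n: 100 * k > 40 * n,
    "between 40% and 60% heads (inclusive)": lambda k, n: 40 * n <= 100 * k <= 60 * n,
    "exactly 50% heads": lambda k, n: 2 * k == n,
}

for name, wins in games.items():
    print(f"{name}: 10 tosses {chance(10, wins):.3f}, 100 tosses {chance(100, wins):.3f}")
```

Games that need the percentage to land far from 50% favor few tosses; games that need it to stay near 50% favor many tosses, except when an exact value is required.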

6-10Stats

Baseball series

• Team A believes that on any day they have a 60% chance of beating Team B.
• They have the option of playing
  – 1 game, or
  – best 2 out of 3

•Which format should they choose?
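The two formats can be compared with a short calculation. This sketch assumes the games are independent and that the 60% chance is the same in every game; the function name is illustrative.

```python
def chance_best_two_of_three(p):
    """Chance of winning at least 2 of 3 games, each won independently with chance p.

    Playing out all 3 games even after the series is decided does not change the answer.
    """
    return p**3 + 3 * p**2 * (1 - p)

p = 0.6
print(f"single game:       {p:.3f}")
print(f"best 2 out of 3:   {chance_best_two_of_three(p):.3f}")
```

Under these assumptions, the longer series favors the team that is more likely to win each individual game.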

6-11Stats

Where we are headed

• We want to perform a political survey and randomly sample citizens.
• We want to quantify the chance variability of our sample. (We don’t want everyone in the sample to be Republican.)
• We can solve variability questions like these by analogy with drawing from a box.

6-12Stats

Making a Box Model

In specifying a box model, we would like to know

- What numbers go into the box

- How many of each kind

- How many draws (sample size)

In practice, what do we really know / not know?

Why do we make box models?

6-13Stats

Variability in the box model

[Box with tickets: 1 2 3 4 5 6]

• Sample 25 tickets with replacement.
• Record the sum of the 25 tickets.

3 2 3 2 6 4 6 5 1 5 6 1 5 3 1 3 5 2 4 2 2 6 5 3 4

•Their sum is 89.

6-14Stats

Try again

4 4 6 1 4 1 6 1 5 2 1 4 5 2 1 4 5 2 2 5 4 3 3 2 6

•sum is 83

3 2 3 5 1 4 4 6 5 1 2 1 5 2 1 2 4 3 4 6 1 6 3 1 3

•sum is 78

• Other tries: 82, 92, 71, 73, 90
• The possible range is 25 to 150, but we only observed 71 to 92.
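The experiment above can be repeated many times in code. This is a sketch under stated assumptions: the seed and the 10,000 repetitions are arbitrary choices, and the expected value and standard error use the formulas for a sum given later in this unit.

```python
import random
import statistics

box = [1, 2, 3, 4, 5, 6]
draws = 25

def sample_sum(rng):
    """Sum of 25 tickets drawn at random with replacement from the box."""
    return sum(rng.choice(box) for _ in range(draws))

rng = random.Random(1)                      # fixed seed, chosen arbitrarily
sums = [sample_sum(rng) for _ in range(10_000)]

box_avg = statistics.mean(box)              # average of the box
box_sd = statistics.pstdev(box)             # SD of the box (population SD)
ev_sum = draws * box_avg                    # number of draws x average of box
se_sum = draws ** 0.5 * box_sd              # sqrt(number of draws) x SD of box

within_2_se = sum(abs(s - ev_sum) <= 2 * se_sum for s in sums) / len(sums)
print(f"EV(sum) = {ev_sum}, SE(sum) = {se_sum:.2f}")
print(f"smallest and largest simulated sums: {min(sums)}, {max(sums)}")
print(f"share of sums within 2 SEs of the expected value: {within_2_se:.3f}")
```

Sums near the extremes of 25 or 150 are possible but so unlikely that they essentially never show up, which is why the observed sums cluster around the expected value.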

6-15Stats

Roulette

• A roulette wheel has 38 pockets
  – 18 red numbers
  – 18 black numbers
  – 2 green (0 and 00)
• We put a dollar on red. What are the chances of winning?
• What numbers are in the box?

6-16Stats

Net gain

• Net gain is the amount that we have won or lost.
• Let’s play 10 times…

Outcome:            R   R   R   B   G   R   R   B   B   R
Gain on this play:  +1  +1  +1  –1  –1  +1  +1  –1  –1  +1
Net gain so far:    +1  +2  +3  +2  +1  +2  +3  +2  +1  +2

6-17Stats

So, Our Box Model is …

6-18Stats

6-19Stats

Which game?

You win if you draw a “1”.

• A box has 1 “0” ticket and 9 “1” tickets.

Or

• A box has 10 “0” tickets and 90 “1” tickets.

Or

• You draw 10 times with replacement. If the sum is 10 then you win.

6-20Stats

Our Box Model is …

6-21Stats

Expected Value (Chapter 17)

“The expected value for the sum of draws made at random with replacement from a box”

equals the expected value for a sample sum

equals

$\text{EV(sum)} = (\text{number of draws}) \times (\text{average of box})$.

A sample sum is likely to be around its expected value, but to be off by a chance error similar in size to the standard error for sum.

6-22Stats

Standard Error for Sum

The standard error for sum, SE(sum), for a random sample of a given sample size is

$\text{SE(sum)} = \sqrt{\text{sample size}} \times (\text{population SD})$.

In FPP, this is

$\text{SE(sum)} = \sqrt{\text{number of draws}} \times (\text{SD of box})$.

6-23Stats

A Sample Sum is Likely ...

The sample sum is likely to be around ____________, give or take ____________ or so.

The expected value for the sum, EV(sum), fills the first blank.

The standard error for sum, SE(sum), fills the second blank.

Observed values are rarely more than 2 or 3 SE’s away from the expected value.

6-24Stats

A Reminder

The formulas here are for simple random samples. They likely do not apply to other kinds of samples.

6-25Stats

Example - Keno

In Keno, if you bet on one number, you get $2 if you win and lose $1 if you lose.

The chance of winning is …________.

What does the box model look like?

What is the expected net gain after 100 plays?

6-26Stats

6-27Stats

Example: Washington State Lottery

In MegaMillions, you pay $1 to play. You select 5 numbers between 1 and 56, and one MegaBall number between 1 and 46. If you match all 5 numbers AND the MegaBall number, you win the jackpot (starts at $12 million).

The chance of winning is …_____.

What does the box model look like?

What is the expected net gain after 100 plays?
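The rules stated above are enough to count the possible tickets. The sketch below makes several simplifying assumptions that the slides do not state: the 5 numbers are distinct and their order does not matter, the jackpot is the only prize, it is never shared, and it is worth the starting $12 million. The function names are illustrative.

```python
from math import comb

def jackpot_combinations(pool=56, picks=5, mega_pool=46):
    """Number of equally likely tickets: ways to choose the 5 numbers times MegaBall choices."""
    return comb(pool, picks) * mega_pool

n_outcomes = jackpot_combinations()
chance_of_jackpot = 1 / n_outcomes

def expected_net_gain(plays, jackpot, price=1.0):
    """Expected net gain over several plays when the jackpot is the only prize (assumption)."""
    per_play = jackpot * chance_of_jackpot - price
    return plays * per_play

print(f"possible tickets: {n_outcomes:,}")
print(f"expected net gain after 100 plays: ${expected_net_gain(100, 12_000_000):.2f}")
```

In box-model terms, this corresponds to a box with one very large winning ticket and an enormous number of tickets marked with the loss of the $1 stake.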

6-28Stats

6-29Stats

Washington State Lottery, continued

Today’s jackpot is ___________.

Suppose you play 10 times. We want to know about your net gain.

What is the relevant box model?

6-30Stats

Washington State Lottery, continued

What is the expected net gain if you buy 100 tickets?

What does that mean?

What is the standard error for your net gain?

What does that tell us?

6-31Stats

Probability histogram

Earlier in the course we displayed data in histograms.

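[Figure: probability histogram for one roll of a die, with outcomes x = 1 to 6 on the horizontal axis and chance (0.0 to 0.4) on the vertical axis.]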

• Probability histograms represent the true chance of an outcome (as opposed to the chance observed in data).

• Example: rolling a die

6-32Stats

Sum of two dice

[Figure: four histograms of the sum x (2 to 12), with vertical axes from 0.0 to 0.30. Panels: 100 rolls, 1,000 rolls, 10,000 rolls, and the truth.]

6-33Stats

Empirical vs. truth

After rolling 100 times we see that we never rolled a 2. But we know a 2 is possible.

After rolling 1,000 times the distribution seems more symmetric.

After 10,000 rolls the histogram is nearly symmetric.

The empirical histogram converges to the true histogram.
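The convergence described above can be illustrated by comparing simulated rolls against the exact chances. This is a sketch, assuming two fair dice; the helper names and the fixed seed are illustrative choices.

```python
import random
from collections import Counter
from fractions import Fraction

# True chances for the sum of two fair dice: count the 36 equally likely pairs.
pair_counts = Counter(a + b for a in range(1, 7) for b in range(1, 7))
true_chance = {s: Fraction(c, 36) for s, c in pair_counts.items()}

def empirical_chance(rolls, rng):
    """Fraction of simulated rolls of two dice giving each sum from 2 to 12."""
    counts = Counter(rng.randint(1, 6) + rng.randint(1, 6) for _ in range(rolls))
    return {s: counts[s] / rolls for s in range(2, 13)}

def max_gap(rolls, seed=0):
    """Largest difference between the empirical and the true histogram."""
    emp = empirical_chance(rolls, random.Random(seed))
    return max(abs(emp[s] - float(true_chance[s])) for s in range(2, 13))

for rolls in (100, 1_000, 10_000):
    print(f"{rolls:>6} rolls: largest gap from the truth = {max_gap(rolls):.4f}")
```

The gap is typically largest for 100 rolls and shrinks as the number of repetitions grows, although any single run can deviate from that pattern.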

6-34Stats

Caution

There are two counts that may be confused
– the number of things added together
– the number of repetitions of the experiment

As the number of repetitions increases, the empirical distribution converges to the true histogram.

What happens when the number of things added together increases?

6-35Stats

Expected Value (Chapter 23)

“The expected value for the average of draws made at random with replacement from a box”

equals the expected value for a sample mean

equals

$\text{EV(avg)} = \text{average of box}$, that is, the population mean.

A sample average (mean) is likely to be around its expected value, but to be off by a chance error similar in size to the standard error for average.

6-36Stats

Standard Error for Average

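The standard error for average, SE(avg), for a random sample of a given sample size is

$\text{SE(avg)} = \dfrac{\text{population SD}}{\sqrt{\text{sample size}}}$.

In FPP, this is

$\text{SE(avg)} = \dfrac{\text{SD of box}}{\sqrt{\text{number of draws}}}$.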

6-37Stats

A Sample Average is Likely ...

The sample average is likely to be around ____________, give or take ____________ or so.

The expected value for the average, EV(avg), fills the first blank.

The standard error for average, SE(avg), fills the second blank.

Observed values are rarely more than 2 or 3 SE’s away from the expected value.

6-38Stats

A Warning

The formulas here are for simple random samples. They likely do not apply to other kinds of samples.

6-39Stats

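Probability histograms and the normal curve

Toss a coin 100 times

[Figure: probability histogram for the number of heads in 100 tosses, with the horizontal axis running from 35 to 65 and the vertical axis from 0.0 to 0.08.]

Average = 50

SD = 5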

6-40Stats

Using the Normal

• A coin is tossed 100 times. Use the normal curve to estimate the chances of
  – exactly 50 heads (7.96%)
  – between 45 and 55 heads inclusive (72.87%)
  – between 45 and 55 heads exclusive (63.19%)

• Probability histograms can be difficult to compute but the normal curve is easy.
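The three estimates above can be reproduced with the normal curve, using the average of 50 and SD of 5 from the previous slide. The sketch assumes the usual half-unit endpoints (for example, 49.5 to 50.5 for "exactly 50 heads"); the function names are illustrative.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def normal_chance(low, high, ev=50.0, se=5.0):
    """Area under the normal curve between low and high, measured in number of heads."""
    return normal_cdf((high - ev) / se) - normal_cdf((low - ev) / se)

exactly_50 = normal_chance(49.5, 50.5)     # the bar over 50 runs from 49.5 to 50.5
inclusive = normal_chance(44.5, 55.5)      # bars for 45 through 55
exclusive = normal_chance(45.5, 54.5)      # bars for 46 through 54

for label, value in [("exactly 50", exactly_50),
                     ("45 to 55 inclusive", inclusive),
                     ("45 to 55 exclusive", exclusive)]:
    print(f"{label}: {100 * value:.2f}%")
```

Small differences in the last digit from the slide's figures can arise from rounding in a printed normal table.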

6-41Stats

Drawing from a lopsided box

Assume that the box has tickets 1,9,5,5,5

[Figure: three probability histograms for draws from this box, with horizontal axes running from about 2 to 8, 400 to 550, and 4800 to 5300.]

6-42Stats

6-43Stats

Central Limit Theorem

When drawing
• a LARGE sample
• at random
• with replacement from a box,

and computing the sample sum of draws (net gain), the sample count (# heads), the sample average, or the sample percent,

the probability histogram will follow a normal curve.

6-44Stats

Central Limit Theorem

When the sample size is large enough, to use a normal curve to make probability calculations we simply need
– the expected value of the sum
  (This can tell us about the ____________)
– the standard error of the sum
  (This can tell us about the ____________)

6-45Stats

Central Limit Theorem

When drawing
• a LARGE sample
• at random
• with replacement from a box,

the probability histogram for the sample sum will follow a normal curve.

The average of this probability histogram is the EV(sum),

and the SD of this probability histogram is SE(sum).

6-46Stats

Central Limit Theorem

When drawing
• a LARGE sample
• at random
• with replacement from a box,

and computing the average of draws,

the probability histogram for the sample average (mean) will follow a normal curve.

The average of this probability histogram is the

EV(avg) = the population mean,

and the SD of this probability histogram is SE(avg).

6-47Stats

Using the normal curve

In practice

About 68% of the time the observed sum will be within the expected value ± 1 SE

About 95% of the time the observed sum will be within the expected value ± 2 SEs

6-48Stats

Using Normal Curves to figure probabilities

Example: Roulette

There are 161 students, 3 TA’s, and one professor for this course.

Suppose that we each play ten $1 games of roulette, always betting on red.

Recall that a roulette wheel has 18 red, 18 black, and 2 green pockets.

If the ball lands in a red pocket, we get back our $1 and win an additional $1.

If the ball lands in a black or green pocket, we lose our $1.

6-49Stats

Roulette example

• Box model

• Expected value of sum

• Standard error

• Probability
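The four steps above can be sketched in code. The slide does not say which probability is wanted, so this example assumes the question is the chance that the whole class, pooled together, comes out ahead; it also assumes all plays are independent and skips the continuity correction. Variable names are illustrative.

```python
from math import erf, sqrt
from statistics import mean, pstdev

# Box model for a $1 bet on red: 18 winning tickets, 20 losing tickets.
box = [+1] * 18 + [-1] * 20

people = 161 + 3 + 1            # students, TAs, professor
plays = people * 10             # ten $1 games each, pooled (assumption)

box_avg = mean(box)             # average of the box
box_sd = pstdev(box)            # SD of the box
ev_sum = plays * box_avg        # expected net gain of the group
se_sum = sqrt(plays) * box_sd   # standard error of the net gain

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Chance the pooled net gain is above $0, by the normal approximation.
chance_group_ahead = 1 - normal_cdf((0 - ev_sum) / se_sum)

print(f"EV(sum) = ${ev_sum:.2f}, SE(sum) = ${se_sum:.2f}")
print(f"chance the group comes out ahead: {100 * chance_group_ahead:.1f}%")
```

For a single person's ten games the same box applies with 10 draws, but the normal approximation is rougher with so few draws.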

6-50Stats

A short cut to SE

When there are only two different numbers in the box

$\text{SD} = (\text{big number} - \text{small number}) \times \sqrt{(\text{fraction with big number}) \times (\text{fraction with small number})}$

$\text{SE} = \sqrt{\text{number of draws}} \times \text{SD}$
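As a check, the short cut can be compared with the SD computed directly from the tickets. The sketch uses the roulette box for a bet on red (18 tickets of +1 and 20 tickets of -1); the function name `two_number_sd` is illustrative.

```python
from math import sqrt, isclose
from statistics import pstdev

def two_number_sd(big, small, n_big, n_small):
    """SD of a box holding only two different numbers, by the short-cut formula."""
    total = n_big + n_small
    return (big - small) * sqrt((n_big / total) * (n_small / total))

box = [1] * 18 + [-1] * 20
shortcut = two_number_sd(big=1, small=-1, n_big=18, n_small=20)
direct = pstdev(box)            # root-mean-square deviation from the box average

print(f"short cut: {shortcut:.4f}, direct: {direct:.4f}")
print("agree:", isclose(shortcut, direct, rel_tol=1e-9))
```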

6-51Stats

Classifying & Counting

For percentages or counts (number of occurrences of something), we can use a special Box Model.

For classifying and counting (looking at percentages or counts) use a box with 0’s and 1’s on the tickets.

Tickets marked ‘1’ signify a “special” item.

Tickets marked ‘0’ signify a “non-special” item.

6-52Stats

Classifying & Counting, continued

What is the average of all of the ticket values in a 0-1 box?

What is the SD of all of the ticket values in a 0-1 box?

6-53Stats

Classifying & Counting, continued further

What is the sum of a sample of n draws from a 0-1 box?

What is the expected value for the sum of a sample of n draws from a 0-1 box?

What is the SD for the sum of a sample of n draws from a 0-1 box?

6-54Stats

Expected Value and Standard Error for Sample Counts

What is the Expected Value of the number of 1’s drawn from a 0-1 box?

(This is the Expected Value for a sample count drawn from a population with _____ “special” items and _______ “non-special” items.)

What is the Standard Error for the count of 1’s drawn from a 0-1 box?
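One way to work these questions out is to treat the count of 1's as the sum of draws from the 0-1 box and reuse the formulas for a sum. In the sketch below, the box average is the fraction of 1's and the box SD comes from the two-number short cut with big number 1 and small number 0; the function name and the coin example are illustrative.

```python
from math import sqrt

def count_ev_se(n_draws, fraction_of_ones):
    """Expected value and standard error for the number of 1's in n draws from a 0-1 box."""
    p = fraction_of_ones
    box_avg = p                          # average of a 0-1 box is the fraction of 1's
    box_sd = sqrt(p * (1 - p))           # (1 - 0) x sqrt(fraction of 1's x fraction of 0's)
    return n_draws * box_avg, sqrt(n_draws) * box_sd

# Counting heads in 100 tosses of a fair coin: half the tickets are 1's.
ev, se = count_ev_se(100, 0.5)
print(f"EV(count) = {ev}, SE(count) = {se}")
```

For 100 tosses of a fair coin this gives the average of 50 and SD of 5 shown earlier for the probability histogram of the number of heads.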

6-55Stats

A Sample Count is Likely ...

The sample count is likely to be around

____________, give or take ____________ or so.

The expected value for the count, EV(count), fills the first blank.

The standard error for count, SE(count), fills the second blank.

Observed values are rarely more than 2 or 3 SE’s away from the expected value.

6-56Stats

Remember ...

The formulas here are for simple random samples. They likely do not apply to other kinds of samples.

6-57Stats

Expected Value and Standard Error for Sample Proportions

What is the Expected Value of the percentage of 1’s drawn from a 0-1 box?

(This is the Expected Value for a sample percentage drawn from a population with _____ “special” items and _______ “non-special” items.)

What is the Standard Error for the percentage of 1’s drawn from a 0-1 box?

6-58Stats

A Sample Percentage is Likely ...

The sample percentage is likely to be around

____________, give or take ____________ or so.

The expected value for the percentage, EV(%), fills the first blank.

The standard error for percentage, SE(%), fills the second blank.

Observed values are rarely more than 2 or 3 SE’s away from the expected value.

6-59Stats

Central Limit Theorem for Percentages & Counts

When drawing a LARGE sample at random with replacement from a box, the probability histogram for the sample percentage will follow a normal curve.

The average of this probability histogram is

EV(%) = the population %,

and the SD of this probability histogram is

$\text{SE(\%)} = \dfrac{\text{SD of box}}{\sqrt{\text{number of draws}}} \times 100\%$.

6-60Stats

Central Limit Theorem for Percentages & Counts

When drawing a LARGE sample at random with replacement from a box, the probability histogram for the sample count will follow a normal curve.

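The average of this probability histogram is

$\text{EV(count)} = (\text{number of draws}) \times (\text{fraction of 1's in the box})$,

and the SD of this probability histogram is

$\text{SE(count)} = \sqrt{\text{number of draws}} \times (\text{SD of box})$.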

6-61Stats

Summarizing …

Expected Values and Standard Errors

6-62Stats

Shape of the Sampling Distribution and Sample Size

What happens to the Shape of the Sampling Distribution as the Sample Size gets large?

6-63Stats

Expected Values, Standard Errors, and Sample Size

What happens to Expected Values and Standard Errors as Sample Size increases?
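A small table can help with this question. The sketch below applies the unit's formulas for the sum and the average to the six-ticket box used earlier (tickets 1 through 6); the sample sizes 25, 100, and 400 are arbitrary illustrative choices.

```python
from math import sqrt
from statistics import mean, pstdev

box = [1, 2, 3, 4, 5, 6]

def summary(n_draws):
    """Expected values and standard errors for the sum and the average of n draws."""
    avg, sd = mean(box), pstdev(box)
    return {
        "EV(sum)": n_draws * avg,
        "SE(sum)": sqrt(n_draws) * sd,
        "EV(avg)": avg,
        "SE(avg)": sd / sqrt(n_draws),
    }

for n in (25, 100, 400):
    row = summary(n)
    print(f"n = {n:>3}: " + ", ".join(f"{name} = {value:.2f}" for name, value in row.items()))
```

Multiplying the sample size by 4 doubles SE(sum) and halves SE(avg), while EV(avg) does not change.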

6-64Stats

Summarizing the Central Limit Theorem

As the sample size (# of draws from the box, n) gets large, …

6-65Stats

Estimation

Box models: If we know what goes in the box, then we can say how likely various outcomes are.

In practice, we do not know what is in the box. That is, we do not know the population parameters. Instead, we use data to estimate the population parameters, such as average, %, SD, …

6-66Stats

Confidence Intervals

Point estimate:

To estimate the population average (mean) with a single value, use the sample average.

The likely size of your estimation error is the standard error for the average, SE(avg).

Interval estimate:

To estimate the population average (mean) with an interval of values, the width of your interval depends upon how confident you want to be that your interval includes the population mean.

6-67Stats

Confidence Intervals

A confidence interval is used when estimating an unknown parameter from sample data. The interval gives a range for the parameter, and a confidence level that the range covers the true value.

Chances are in the sampling procedure, not in the parameter.

6-68Stats

Confidence Interval Example

Pennies

6-69Stats

Confidence Intervals

Point estimate:

To estimate the population percentage with a single value, use the sample percentage.

The likely size of your estimation error is the standard error for the percentage, SE(%).

Interval estimate:

To estimate the population percentage with an interval of values, the width of your interval depends upon how confident you want to be that your interval includes the population percentage.

6-70Stats

Confidence Interval Example

Pennies

6-71Stats

The Bootstrap

When estimating a population percentage (i.e. when sampling from a 0-1 box), the fraction of 0’s and 1’s in the box is unknown.

The SD of the box can be estimated by substituting the fraction of 0’s and 1’s in the sample for the unknown fractions in the box.

The estimate is good when the sample is reasonably large.
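The substitution described above can be sketched as follows. The sample of 55 ones in 100 draws is made-up illustrative data, not from the slides, and the interval of plus or minus 2 SEs is an assumption based on the earlier statement that about 95% of observed values fall within 2 SEs of the expected value.

```python
from math import sqrt

def percentage_interval(ones, n_draws, n_se=2):
    """Approximate confidence interval for the population percentage of 1's.

    The unknown SD of the box is estimated from the sample fractions (the bootstrap step).
    """
    p_hat = ones / n_draws                      # fraction of 1's in the sample
    sd_estimate = sqrt(p_hat * (1 - p_hat))     # estimated SD of the 0-1 box
    se_pct = 100 * sd_estimate / sqrt(n_draws)  # estimated SE for the percentage
    estimate = 100 * p_hat
    return estimate - n_se * se_pct, estimate + n_se * se_pct

# Hypothetical sample: 55 ones in 100 draws.
low, high = percentage_interval(ones=55, n_draws=100)
print(f"approximate 95% interval: {low:.1f}% to {high:.1f}%")
```

With a small sample the estimated SD can be far from the SD of the box, which is why the slide asks for a reasonably large sample.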

6-72Stats

Basic Method for Constructing Confidence Intervals

6-73Stats

Interpreting a Confidence Interval

6-74Stats

Margin of Error

6-75Stats

Sample Size Computations

6-76Stats
