Chapter 3
Section 3.1: The probability histogram
Below is the probability histogram for the IS
(p. 80 of text):
[Probability histogram for the IS: rectangles centered at
−0.90, −0.70, . . . , 0.90; two of the heights are 1.5755
and 0.7500; vertical axis marked 0.5, 1.0, 1.5.]
How is it determined?
On a horizontal number line, list all possible
values of x; in this case:
−0.90, −0.70, −0.50, −0.30, −0.10, 0.10,
0.30, 0.50, 0.70, 0.90.
Draw rectangles: Each x serves as the center
of the base of its rectangle.
The base of each rectangle equals δ (this means
the rectangles touch, but do not overlap).
The height of the rectangle centered at x is:
P(X = x)/δ.
Below is the verification of two heights given
in our picture:
P(X = 0.10)/0.20 = 0.3151/0.20 = 1.5755
P(X = 0.30)/0.20 = 0.1500/0.20 = 0.7500
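The two height calculations above can be sketched directly; a minimal sketch, using the δ and the two probabilities quoted in the notes:

```python
# Heights in a probability histogram: height = P(X = x) / delta.
delta = 0.20                          # base of each rectangle for the IS

probs = {0.10: 0.3151, 0.30: 0.1500}  # the two P(X = x) values quoted above

heights = {x: p / delta for x, p in probs.items()}
print(heights)  # heights 1.5755 and 0.7500
```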
Why this strange definition of height?
B/c we want to consider areas.
The area of the rectangle centered at x is, of
course, its base times its height:
δ[P(X = x)/δ] = P(X = x).
Here is the main thing to remember:
In a probability histogram, the area of a rect-
angle equals the probability of its center value.
Once we know this, we can ‘see’ the symme-
try in the sampling distribution for the IS.
We are now going to add another adjective
(remember actual?).
The sampling distribution of Chapter 2 will be
called the exact sampling distribution and it
yields the exact P-value.
We say this b/c in Chapter 3 we will learn two
ways to approximate a sampling distribution:
computer simulation and fancy math.
Section 3.2: Computer simulation
In the text, I talk about a Colloquium Study
(CQS). Its data are below:
Treat.   S    F   Total
  1      7    7    14
  2      1   13    14
Total    8   20    28
It can be shown that there are 40,116,600
possible assignments, and that 12,932,920 of
these give x = 0 (a = 4). Thus,
P(X = 0) = 12,932,920/40,116,600 = 0.3224.
The idea of the computer simulation approxi-
mation is quite simple: Perhaps we can obtain
a good approximation to any probability by
looking at some, not all of the assignments.
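A minimal sketch of such a simulation for the CQS, assuming two treatment groups of 14 with 8 successes among the 28 subjects (as in the table above); the seed is an arbitrary choice of mine:

```python
import random

random.seed(1)                # arbitrary; each seed gives slightly different RFs

cards = [1] * 8 + [0] * 20    # 8 successes and 20 failures among the 28 subjects
runs = 10_000
count = 0                     # number of assignments with a = 4, i.e. x = 0

for _ in range(runs):
    random.shuffle(cards)
    a = sum(cards[:14])       # successes assigned to treatment 1
    if a == 4:                # a = 4 gives p1 - p2 = 4/14 - 4/14 = 0
        count += 1

rf = count / runs
print(rf)                     # should be close to the exact P(X = 0) = 0.3224
```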
For example, I looked at 10,000 assignments
for the CQS and discovered that 3267 of them
gave a = 4 and x = 0. Thus, the relative
frequency (RF) of occurrence of 0 is 0.3267,
which is very close to its probability, 0.3224.
But, I am ahead of myself. Once we decide
to look at only some of the assignments, two
questions arise.
1. How many should we look at? The answer
is called the number of runs of the com-
puter simulation. As we shall see, 10,000
is a good choice for the number of runs.
2. Which ones should we look at? Well, to
avoid bias, we select assignments at ran-
dom.
If you want to see more details on this, read
pages 84 and 85 in the text. But you don’t
need to understand this.
Below are the results of my computer simula-
tion study with 10,000 runs for the CQS:
   x      RF     Probability
 −4/7   0.0005     0.0010
 −3/7   0.0143     0.0155
 −2/7   0.0848     0.0879
 −1/7   0.2354     0.2345
   0    0.3267     0.3224
  1/7   0.2330     0.2345
  2/7   0.0902     0.0879
  3/7   0.0140     0.0155
  4/7   0.0011     0.0010
 Total  1.0000     1.0002
First, note that the RFs and probabilities are
‘close.’ (Remember Section 2.4)
We can use the RFs to approximate the P-
value.
The ingredients: the actual x = 3/7, the al-
ternative was >.
Thus, the exact P-value is
P(X ≥ 3/7) = 0.0155 + 0.0010 = 0.0165.
We can approximate this by
RF (X ≥ 3/7) = 0.0140 + 0.0011 = 0.0151.
The picture below is taken from p. 87 of the
text.
[Three probability histograms of relative frequencies for
the CQS, labeled '10,000 Runs,' '1000 Runs,' and '100
Runs'; horizontal axes run from −4/7 to 4/7, vertical
axes marked 1.0 and 2.0.]
Results of a simulation experiment with 10,000
runs for the Ballerina study:
          Rel. Freq.  Rel. Freq.  Rel. Freq.
   x        of x       of ≤ x      of ≥ x
 −0.40     0.0009      0.0009      1.0000
 −0.32     0.0072      0.0081      0.9991
 −0.24     0.0383      0.0464      0.9919
 −0.16     0.1137      0.1601      0.9536
 −0.08     0.2169      0.3770      0.8399
  0.00     0.2591      0.6361      0.6230
  0.08     0.2022      0.8383      0.3639
  0.16     0.1140      0.9523      0.1617
  0.24     0.0383      0.9906      0.0477
  0.32     0.0089      0.9995      0.0094
  0.40     0.0005      1.0000      0.0005
Recall, the P-value is P(X ≤ −0.24) = 0.0477.
We approximate this with RF(X ≤ −0.24) =
0.0464.
Results of a simulation experiment with 10,000
runs for the Crohn’s study:
          Rel. Freq.  Rel. Freq.  Rel. Freq.
   x        of x       of ≤ x      of ≥ x
 −0.46     0.0002      0.0002      1.0000
 −0.41     0.0005      0.0007      0.9998
 −0.35     0.0027      0.0034      0.9993
 −0.29     0.0094      0.0128      0.9966
 −0.24     0.0289      0.0417      0.9872
 −0.18     0.0593      0.1010      0.9583
 −0.12     0.1178      0.2188      0.8990
 −0.07     0.1540      0.3728      0.7812
 −0.01     0.1893      0.5621      0.6272
  0.05     0.1724      0.7345      0.4379
  0.10     0.1287      0.8632      0.2655
  0.16     0.0830      0.9462      0.1368
  0.21     0.0345      0.9807      0.0538
  0.27     0.0143      0.9950      0.0193
  0.33     0.0039      0.9989      0.0050
  0.38     0.0010      0.9999      0.0011
  0.44     0.0001      1.0000      0.0001
Recall, the P-value = P(X ≥ 0.27) = 0.0198,
which we approximate by RF(X ≥ 0.27) =
0.0193.
Section 3.3: Center and Spread
Page 89 of the text shows probability histograms
for four studies in the text. In each picture, if
you sum the areas of the teal-colored rectangles,
you get the P-value.
There are two facts (for all FTs) that are
revealed in these pictures.
• Each picture has one central peak. The
peak can be one or two rectangles wide,
but never three.
• As you move away from the central peak
in either direction, the rectangles become
shorter and shorter.
It is useful to have a concept of a ‘left to
right’ center for such a picture.
Clearly, if the picture is symmetric, then its
center is 0.
To include all pictures, we define the center
of the PH to be its center of gravity.
It can be shown that for every FT, the center
of gravity is 0.
In general, the center of gravity, or mean, of
a PH is denoted by the Greek letter mu: µ.
For FT, µ = 0. (In Chapter 5, we will have
pictures for which µ ≠ 0.)
Thus, all FTs are similar in that they all have
µ = 0.
But the pictures on page 89 look very differ-
ent. This is b/c they have different amounts
of spread.
For the four pictures, IS has the most spread,
then CQS, then Soccer and finally CCD has
the least spread. (This is an obvious visual
assessment.)
We need more than a visual assessment of
spread. We need a number that summarizes
the spread in a PH. The number is the stan-
dard deviation of the PH, denoted by the
Greek letter sigma: σ.
There is a simple formula for calculating σ:
σ = √[m1m2/(n1n2(n − 1))].
Below are the standard deviations for the four
pictures on page 89. (See text for details.)
Study: IS CQS Soccer CCD
σ: 0.2283 0.1739 0.1418 0.1193
Note that more spread corresponds to a larger
σ.
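As a check, the formula reproduces two of these values. The margins below come from the worked examples later in these notes (BSS: m1 = 38, m2 = 12, n1 = n2 = 25; CCD: n1 = 37, n2 = 34, with m1 = 22 + 11 = 33 successes inferred from the p̂'s given later):

```python
from math import sqrt

def sigma_ft(m1, m2, n1, n2):
    """Standard deviation of the FT probability histogram."""
    n = n1 + n2
    return sqrt(m1 * m2 / (n1 * n2 * (n - 1)))

sigma_bss = sigma_ft(38, 12, 25, 25)   # Ballerina study margins
sigma_ccd = sigma_ft(33, 38, 37, 34)   # CCD: 33 successes, 38 failures (inferred)

print(round(sigma_bss, 4), round(sigma_ccd, 4))  # 0.122 0.1193
```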
Why do we want to measure spread? Be pa-
tient please.
Recall that X is the test statistic for FT. It is
also called a random variable.
Let X be any random variable, with mean µ
and standard deviation σ. Define the stan-
dardized version of X to be Z, where
Z = (X − µ)/σ.
Transforming X to Z is called ‘standardizing
X.’
The observed value of Z is denoted by z and
is computed by:
z = (x − µ)/σ.
For FT, b/c µ = 0:
Z = X/σ and z = x/σ.
Data → X → x → Z → z
In Chapter 2, we learned about the sampling
distribution for X.
In a similar way, Z has a sampling distribution.
Assuming we have the sampling distribution
for X, it is very easy to get the sampling dis-
tribution for Z.
We will illustrate with the IS.
Recall for the IS: σ = 0.2283 and the possible
values of x are: −0.90, −0.70, . . . , 0.90.
Thus, z = x/0.2283 and we get all possible
values of z by taking every possible value of x
and dividing it by 0.2283.
Namely, −0.90/0.2283 = −3.94,
−0.70/0.2283 = −3.07, . . . ,
0.90/0.2283 = 3.94.
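This conversion is mechanical, as a short sketch shows:

```python
sigma = 0.2283                    # standard deviation for the IS

xs = [round(-0.90 + 0.20 * i, 2) for i in range(10)]   # -0.90, -0.70, ..., 0.90
zs = [round(x / sigma, 2) for x in xs]                 # standardized values

print(zs)
```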
The probabilities for the z’s are automatic.
For example, the event that z = −3.94 is the
same event as x = −0.90, so they have the
same probability.
To summarize, given the sampling distribution
for X, it is easy to get the sampling distribu-
tion for Z. You don’t need to worry about
reproducing these details; this result is simply
to motivate what happens next.
B/c Z has a sampling distribution, we can
draw its probability histogram and I have done
so on page 94 of the text. (Ignore the teal
color and smooth curve on p. 94.)
Compare the pictures for the IS on pages 89
and 94 of the text: These pictures have the
same shape and both are centered at 0. They
differ in their spreads.
I repeated the above procedure (standardiz-
ing) for the other three probability histograms
on p. 89. The details will not be given. The
pictures I get for the three Z’s are on pages
95 and 96.
As with the IS, if you compare each picture
for X on page 89 with its picture for Z, you
will find that the two pictures have the same
shape and both are centered at 0, but they
have different spreads.
Let’s go back to the picture for Z for the IS
on page 94.
B/c Z is equivalent to X, we may use X or Z
to find the P-value. To make this precise:
P(X ≥ 0.30) = P(Z ≥ 1.31) and
P(X ≤ −0.30) = P(Z ≤ −1.31).
Thus, the area of the teal-shaded rectangles
on page 94 is the P-value for the IS.
Similarly, the area of the teal-shaded rectan-
gles is the P-value for the three pictures on
pages 95 and 96.
Now, look at the four pictures on pages 94–
96. They look very similar. Why? B/c: the
mean of each picture is 0 and the standard
deviation of each picture is 1.
This is what standardizing does to a pic-
ture: It creates a new picture which has the
same shape as the old picture. In addition,
the new picture is centered at 0 and is scaled
to have a standard deviation of 1.
Working with X, different data sets give very
different pictures (see page 89), but working
with Z, different data sets give very similar
pictures.
But why do we desire similar pictures?
This is where the smooth curve enters the
argument.
B/c the Z pictures are similar, we can use
one curve as an approximation to each of
them.
[Sketch from the text: the smooth curve and a
histogram rectangle near z = 1.31, with the three
regions they cut off labeled A, B and C.]
Exact Area = B + C
Approximate Area = A + B
If A = C, then the approximation is perfect.
The smooth curve is called the standard nor-
mal curve (snc).
The approximation method motivated above
is useful only if it is easy to find areas under
the snc.
[The standard normal curve: a bell-shaped curve over
the interval −3 to 3, centered at 0; vertical axis
marked 0.1 through 0.5.]
Two facts: The snc is symmetric around 0
and its total area is 1.
Suppose that I want to find the area under
the snc to the right of, say, z = 1.34.
Good news: The table in the front of the
book is designed to answer this question.
First, take the z and divide it into two pieces:
1.3 and 0.04.
Then go to the table as shown below.
  z     0.01  ...  0.04  ...
 0.0
 0.1
 ...
 1.3               0.0901
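If no table is at hand, the same tail areas can be computed directly; this sketch uses the error function from Python's standard library rather than the book's table:

```python
from math import erf, sqrt

def area_right(z):
    """Area under the snc to the right of z (1 minus the normal CDF)."""
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))

print(round(area_right(1.34), 4))   # 0.0901, the table entry above
```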
For another example, suppose we want the
area under the snc to the right of z = −1.56.
Break z into −1.5 and 0.06.
  z     0.01  ...  0.06  ...
−3.5
−3.4
 ...
−1.5               0.9406
Suppose we want the area under the snc to
the right of z = 1.47286. The difficulty is:
what to do with the extra digits? In my class,
just round off the z to 1.47 and proceed as
above.
Finally, suppose we want the area under the
snc to the right of z = 3.98. The difficulty is
that the table goes no higher than 3.59. But
read the 3.5 row in the table (does it remind
you of the movie ‘The Shining?’).
Fact: For any z > 3.59, the area under the
snc to the right of z is ≤ 0.0002.
What is the area under the snc to the left of
z = −1.27?
Thinking like the MITK, use symmetry to re-
alize that the area to the left of z = −1.27
equals the area to the right of z = +1.27,
and we know how to find this latter area (it
is 0.1020).
Fact: The area under the snc to the left of
any z equals the area under the snc to the
right of −z.
Now we are ready to use the snc to obtain an
approximate P-value for FT.
Consider again the CCD. Recall that the exact
P-value is 0.0198 and, symbolically, is
P(X ≥ 0.2711). Also, recall that for the CCD,
σ = 0.1193. Thus,
P(X ≥ 0.2711) = P(X/σ ≥ 0.2711/0.1193) =
P(Z ≥ 2.27).
Look at the picture on p. 96 of the text. This
P-value is the area of the rectangle centered
at 2.27 plus the areas of all the rectangles to
the right of this rectangle. (Only one of these
rectangles can be seen on p. 96).
It can be shown (more on this later) that the
rectangle centered at 2.27 has for its end-
points: 2.04 and 2.51 (these look ‘funny’ b/c
of round-off error).
The snc approximation to this P-value should
be the area under the snc to the right of 2.04,
which is 0.0207. This is a pretty good approx-
imation, almost as good as we obtained with
computer simulation (that was 0.0193).
BUT now I must tell you something strange
about the rest of the world, outside this class-
room. Every other book I have seen says,
‘It’s too much work to find the 2.04 above,
let’s just calculate the area to the right of
z = 2.27!’ I am NOT making this up!
The area to the right of 2.27 is 0.0116, which
is a horrible approximation of 0.0198.
To summarize, every other book I have seen,
after spending all of Chapter 2 convincing you
that the P-value is important, gets to Chapter
3 and says, ‘It is too much trouble to approx-
imate it accurately.’
For now, let's focus on what all the lazy peo-
ple do. Although the method gives a lousy
answer, it is very easy to use.
1. Calculate z = x/σ.
2. For the alternative >:
The approximate P-value is the area under
the snc to the right of z.
For the alternative < the exact P-value is
P(X ≤ x) = P(Z ≤ z), where z = x/σ.
Thus, the lazy approximation is the area under
the snc to the left of z, which is equal to the
area under the snc to the right of −z.
Finally, for the alternative ≠, we know that
the P-value has two pieces to it. The approx-
imation is:
Twice the area under the snc to the right of
|z|.
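The three-part rule above can be sketched as a small function; the tail area is computed with the error function instead of the table, and the alternative labels are my own encoding:

```python
from math import erf, sqrt

def area_right(z):
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))   # snc tail area, in place of the table

def lazy_p_value(x, sigma, alternative):
    """Approximate P-value for FT without the continuity correction."""
    z = x / sigma                     # for FT, mu = 0, so z = x / sigma
    if alternative == '>':
        return area_right(z)
    if alternative == '<':
        return area_right(-z)         # area left of z = area right of -z
    return 2.0 * area_right(abs(z))   # the two-sided alternative

# CCD: x = 0.2711, sigma = 0.1193, alternative >.
print(round(lazy_p_value(0.2711, 0.1193, '>'), 4))
```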
This three-part rule (one for each alternative)
is presented on p. 102 of the text.
Its main advantage is that we calculate the
same z regardless of the alternative and then
look up either z, −z or |z| in the table, re-
membering to double the area we find for the
alternative ≠. Its main disadvantage is that it
often gives horribly inaccurate approximations.
Of course, nobody else calls this the lazy method.
They call it the method.
Mostly, they don’t acknowledge the existence
of an exact P-value, thus avoiding the issue
that their approximation is bad.
If anything, they call the lazy method the
approximation ‘without the continuity correc-
tion’ and my better method the approxima-
tion ‘with the continuity correction.’
There is an additional bit of perversity go-
ing on: the approximation with the continuity
correction given in all books that give it is ac-
tually a very bad correction and gives answers
barely different from the ‘without.’ This helps
them discount people like me who want to im-
prove on their answers: by presenting an inef-
fective improvement, they send the message
that improvements are not needed!
In their defense it is true that for very large
studies the continuity correction does not change
the answer much.
Thus, one can reasonably take the following
approach: For small studies use the website
to get the exact P-value and for larger studies
if you use the snc, it does not matter much
whether you use the continuity correction.
(BTW, I am convinced that after some thresh-
old for being a ‘large study’ the website that
we use actually uses an snc approximation.)
When I have managed to challenge, in person,
somebody who teaches the lazy method they
always say, ‘Oh, we only use the snc when the
study is too large to get the exact P-value.’
But they taught the lazy method before there
were computers readily available to obtain the
exact answer.
Now I am going to, perhaps, surprise you.
On our homework and especially on the midterm,
I encourage you to use the lazy method. (Dis-
cuss.)
If you submit project 1 (A or B) you will need
to use the approximation with the continuity
correction. Its steps are outlined on page 3
of the ‘Course Notes’ on the course webpage.
(Discuss projects briefly.)
Consider BSS. Recall that x = −0.24, the al-
ternative is <, the exact P-value is 0.0477.
Also,
σ = √[38(12)/(25(25)(49))] = 0.1220.
Following page 3 of the Course Notes,
g = δ/2 = 50/[2(25)(25)] = 0.04.
Thus, x2 = x + g = −0.24 + 0.04 = −0.20,
which standardized gives z2 = −0.20/0.1220 =
−1.64. We look up −z2 = 1.64 and find that
the approximate P-value is 0.0505.
W/o the c/c, z = −0.24/0.1220 = −1.97,
giving an approximate P-value of 0.0244.
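The BSS computations above, with and without the correction, can be reproduced as follows (tail areas via the error function rather than the table):

```python
from math import erf, sqrt

def area_right(z):
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))

sigma = 0.1220                    # BSS standard deviation
g = 50 / (2 * 25 * 25)            # g = delta/2 = 0.04

# With the continuity correction (alternative <): shift x inward by g.
z2 = round((-0.24 + g) / sigma, 2)        # -0.20 / 0.1220 = -1.64
with_cc = area_right(-z2)                 # look up -z2 = 1.64

# Without the correction:
z = round(-0.24 / sigma, 2)               # -1.97
without_cc = area_right(-z)

print(round(with_cc, 4), round(without_cc, 4))  # 0.0505 and 0.0244; exact is 0.0477
```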
Let’s look at CCD again. As I mentioned ear-
lier in lecture, b/c of the nasty numbers for
n’s we need to be more precise in our compu-
tations. In particular,
p̂1 = 22/37 = 0.5946, p̂2 = 11/34 = 0.3235,
x = 0.5946 − 0.3235 = 0.2711,
g = δ/2 = 71/[2(37)(34)] = 0.0282.
Thus,
x1 = x − g = 0.2711 − 0.0282 = 0.2429 and
z1 = 0.2429/0.1193 = 2.04.
Finally, consider IS as an example of ≠. Recall
that δ = 0.20, so g = 0.10. First, we compare
|x| to g. If |x| ≤ g then the exact P-value is 1
and no approximation is needed. In this case,
|x| = 0.30 > g, so we must continue.
Next, x3 = |x| − g = 0.30 − 0.10 = 0.20. As
stated earlier, σ = 0.2283, making
z3 = 0.20/0.2283 = 0.88. The area to the
right of 0.88 is 0.1894; doubling this, we get
0.3788 as the approximation to the exact 0.3698.
W/o the c/c, z = 0.30/0.2283 = 1.31 giving
an approximation of 2(0.0951) = 0.1902.
One somewhat positive comment about the
lazy method.
If all the margins are really large, w/ and w/o
the c/c give about the same answers. For
example, take
n1 = n2 = 5000, m1 = 4000 and x = 0.02.
One can verify σ = 0.0098 and g = 0.0002.
With the c/c, z = 0.0198/0.0098 = 2.02 and
w/o it, z = 0.02/0.0098 = 2.04.
But this is a huge study!
Chapter 5
We spent Chapters 2 and 3 examining the
Skeptic’s argument. The Skeptic makes no
attempt to (formally) extend conclusions be-
yond the subjects in the study.
For example, we concluded that cyclosporine
was superior to placebo for the 71 people
in the study. We concluded that Julie was
better spinning right than left for the 50 trials
in her study.
Typically, for better or worse, researchers want
to extend their conclusions to a ‘larger situa-
tion.’
There are many techniques for such exten-
sions, but central to every one is the notion
of a population.
For our purposes, there are two types of pop-
ulations: finite and infinite. And the word
infinite here means it is not what we call a
finite population. (?)
A finite population is a well-defined collection
of individuals.
Examples: All persons who will vote in this
year’s presidential election; all persons eligi-
ble to vote in this year’s presidential election;
all persons in this room; all persons enrolled
for one or more credits this semester at UW-
Madison.
We need a way to think about a finite popu-
lation.
Imagine a box of cards, called the population
box. Each member of the population has a
card in the box. On the member’s card are
the values of one or more variables, or fea-
tures, of the member. (Same features for all
members.)
For simplicity, we begin with one dichotomous
feature per card.
As in Chapter 1, the possible values of the fea-
ture are labeled success and failure. A ‘1’ on
a card denotes that that member is a success
for the feature, and a ‘0’ denotes a failure.
Thus, every card has a ‘1’ or a ‘0’ on it.
Statisticians paraphrase ‘Snoopy’ who once
famously said: I love mankind, it’s people I
can’t stand!
That is, statisticians are interested in the box
in totality, not in any particular member’s card.
For a given population box:
• Let s denote the number of cards in the
box marked ‘1.’
• Let f denote the number of cards in the
box marked ‘0.’
• Let N = s+ f denote the total number of
cards in the box.
• Let p = s/N denote the proportion of
cards in the box marked ‘1.’
• Let q = f/N denote the proportion of
cards in the box marked ‘0.’
These five numbers, s, f, N, p and q, tell us
what is in the box. And, of course, knowl-
edge of two of these numbers, N and p, allows
us to determine the others. As a result, I will
describe a box as Box(N ; p).
For example, Box(10;0.60) is a box with N =
10 cards, of which a proportion p = 0.60 of
them are successes.
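A minimal sketch of this bookkeeping (the helper name `box` is mine, not the text's):

```python
def box(N, p):
    """The five numbers describing Box(N; p)."""
    s = round(N * p)              # cards marked '1'
    f = N - s                     # cards marked '0'
    return {'s': s, 'f': f, 'N': N, 'p': s / N, 'q': f / N}

b = box(10, 0.60)
print(b)   # s = 6, f = 4, N = 10, p = 0.6, q = 0.4
```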
In practice, a researcher does not know p and
often does not even know N . There are two
ways for a researcher to ‘learn’ about what is
in the population box.
First, is a census; this means one examines
every card in the box. Discuss.
Second, is a survey; this means one examines
only some of the cards in the box. Discuss.
We will focus on surveys. The cards actually
examined in a survey comprise the sample.
A sample is called representative if it looks
like the box. Every (honest) researcher wants
a representative sample, but, alas, there is no
way to guarantee getting one.
Let’s consider this idea of representative again.
Suppose that two researchers, A and B, each
select samples of size n = 5 from Box(N ; 0.60).
Below are their samples:
A: 1, 1, 0, 1, 1
B: 0, 1, 1, 0, 1.
Which sample is representative? B’s b/c its
p̂ = p.
Discuss Bill Clinton’s cabinet.
We cannot guarantee a representative sam-
ple, so we advocate selecting what is called
a random sample. A random sample (much
like randomization) is a process and does not
guarantee a sample that is representative, or
necessarily even ‘close’ to representative.
Its great virtue is that a random sample allows
us to calculate the probability that we
will obtain a sample that is close to rep-
resentative.
As stated earlier, in practice, a researcher does
not know p and often does not even know N .
But, for now, let’s assume that we know both
of these numbers.
Consider the chance mechanism of selecting
n cards at random from Box(N ; p).
Imagine that we select the cards one-at-a-
time.
But once we think of selecting the cards one-
by-one, two ways of sampling come to mind:
—Without replacement (smart), and
—With replacement (dumb).
Why do I label these smart and dumb? Dis-
cuss.
Two probability histograms for X, the num-
ber of successes in a sample of size n = 10
from Box(1000; 0.60). Solid [dashed] rectan-
gles are for a random sample with [without]
replacement.
[Histogram: horizontal axis 2 to 10, vertical axis
marked 0.05 to 0.25.]
As above, but from Box(20; 0.60).
[Histogram: horizontal axis 2 to 10, vertical axis
marked 0.05 to 0.35.]
Note that in the first of these pictures, prob-
abilities are the ‘same’ for both methods of
sampling, but for the second picture the prob-
abilities are quite different.
Also, note that the probabilities are ‘better’
for the smarter method of sampling.
Finally, note that for the dumb way of sam-
pling, the probabilities do not depend on the
value of N .
The key is the ratio, n/N of sample size to
population size. In the first example, this ratio
is 10/1000 = 0.01 and in the second example,
10/20 = 0.50.
The general guideline is: If n/N ≤ 0.05, then
probabilities calculated w/replacement are ap-
proximately equal to probabilities calculated
w/o/replacement.
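This guideline can be checked numerically. The sketch below compares the two sampling methods for the two boxes pictured above; the with-replacement probabilities are binomial, and the without-replacement probabilities come from the hypergeometric formula, which these notes do not derive:

```python
from math import comb

def prob_with(n, x, p):
    """P(X = x) when sampling with replacement: binomial."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def prob_without(n, x, N, s):
    """P(X = x) when sampling without replacement: hypergeometric."""
    return comb(s, x) * comb(N - s, n - x) / comb(N, n)

# n/N = 10/1000 = 0.01: the two methods nearly agree.
big = (prob_with(10, 6, 0.60), prob_without(10, 6, 1000, 600))
# n/N = 10/20 = 0.50: they differ quite a bit.
small = (prob_with(10, 6, 0.60), prob_without(10, 6, 20, 12))

print(big, small)
```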
Why does any of this matter?
Well, it turns out that it is much easier, both
computationally and theoretically, to work with
the probabilities for the dumb way of sam-
pling. Thus, regardless of how you select a
random sample, provided n/N ≤ 0.05, it is
valid to calculate probabilities the easy way.
Extended enrichment example:
It is sometimes better to sample the dumb
way.
Recall the CCD study. There are
N = 2.1 × 10^20 possible assignments of
subjects to treatments.
Let each possible assignment be a ‘pop. mem-
ber’ and as a group they form our (very very
large) finite population. Define an assignment
to be a success if it would yield x ≥ 0.27. (Re-
member, the actual x = 0.27.)
Thus, the p for this population box equals
0.0198, the P-value for the FT.
Our computer simulation experiment selected
a sample of n = 10,000 assignments in the
dumb way. But it would be very difficult to
write a computer program to select assignments
the smart way (and it would require lots
of memory and would run slowly). And, as our
above result shows, the smart way and dumb
way of sampling give the same answers.
We will now investigate computing probabil-
ities for the dumb way of sampling (with re-
placement).
Recall that we plan to (probabilities are al-
ways about the future) select n cards at ran-
dom with replacement from the population
box.
Define X1 to be the number on the first card
selected (0 or 1); X2 to be the number on the
second card selected; . . . ; and Xn to be the
number on the nth (last) card selected.
Let’s begin by computing probabilities for X1.
Clearly,
P(X1 = 1) = s/N = p and
P(X1 = 0) = f/N = q.
We can present these equations in a table:
Value  Probability
  0        q
  1        p
This table presents the sampling distribution
for X1. Upon reflection, this table is the sam-
pling distribution for X2, X3, . . . , and Xn.
We summarize by saying that X1, X2, X3,
. . . , and Xn are identically distributed.
So, we can calculate probabilities for any in-
dividual card.
Next, we consider two cards simultaneously.
For example,
P(X1 = 1 and X2 = 0) = P(X1 = 1, X2 = 0).
The result is the multiplication rule for prob-
abilities. In this example, the multiplication
rule says:
P(X1 = 1, X2 = 0) =
P(X1 = 1)P(X2 = 0) = pq. In words, we re-
place the word ‘and’ by the operation of mul-
tiplying. (Similarly, recall, the addition rule
replaced ‘or’ by adding.)
My argument for justifying the multiplication
rule is a little tricky, so I will give it for a
specific example, p = 0.60. For this p,
P(X1 = 1, X2 = 0) = pq = 0.60(0.40) = 0.24.
A brute force argument is given on pages 153–
5 of the text. The argument below appeals
to the long-run interpretation of probability
given in Section 2.4.
Consider the chance mechanism of selecting
two cards at random w/replacement from the
box. Now, imagine operating this chance mech-
anism a large number of times.
We know that in the long-run approximately
60% of the operations will give a first card of
‘1.’ And of those operations that first give a
‘1,’ approximately 40% will give a second card
of ‘0.’ Thus, in the long-run, 40% of 60% of
the operations will give a '1' followed by a '0.'
Next, remember that a percent of a percent
is computed by converting to decimals and
multiplying:
0.40(0.60) = qp = 0.24.
The multiplication rule can be extended in two
directions.
First, it is true for any two cards, not just the
first two. Thus, for example,
P(X3 = 1, X7 = 1) = pp = p^2.
Second, it is true for more than two cards, for
example,
P(X3 = 1, X6 = 0, X7 = 1) = pqp = p^2q.
Do you recognize 4!?
This is read 4-factorial (you don’t need to
shout) and it is calculated by:
4! = 4(3)(2)(1) = 24.
Similarly,
3! = 3(2)(1) = 6 and
5! = 5(4)(3)(2)(1) = 120.
These guys get big fast; for example,
50! = 3.04 × 10^64. By special definition,
0! = 1.
Now, define X to be X1 + X2 + . . . + Xn.
Literally, X is the sum of the numbers on the
n cards selected. But b/c each card has a '1'
or a '0,' X can be interpreted as the total
number of successes in the sample.
The variable X is very important in scientific
applications. Thus, we would like to know its
sampling distribution. Fortunately, there is a
simple (?) formula for it, given on page 159
of the text:
P(X = x) = [n!/(x!(n − x)!)] p^x q^(n−x),
for x = 0, 1, . . . , n.
This is a pretty amazing formula. It works
for any choice of n and any value of p. It is
called the Binomial sampling distribution with
parameters n and p, written Bin(n, p).
In this class, you need to be able to evaluate
this formula with a hand calculator for n ≤ 6.
For example, suppose we select n = 5 cards
at random with replacement from a box with
p = 0.60. What is the probability we will get
a representative sample?
First, we realize that p̂ = x/5 will equal p =
0.60 if, and only if, x = 3. Thus, we want to
calculate P(X = 3):
P(X = 3) = [5!/(3!(5 − 3)!)](0.60)^3(0.40)^(5−3) =
[120/(6(2))](0.216)(0.16) = 0.3456.
For another example, for the same n and box,
let’s calculate P(X = 5). First, note that we
can calculate this by using the multiplication
rule. The event (X = 5) means that every
card is a ‘1.’ Thus,
P(X1 = 1, X2 = 1, X3 = 1, X4 = 1, X5 = 1) =
p^5 = (0.60)^5 = 0.0778.
Using the binomial formula we get:
P(X = 5) = [5!/(5!(5 − 5)!)](0.60)^5(0.40)^(5−5) =
[120/(120(1))](0.0778)(1) = 0.0778.
Note that we need the definition 0! = 1 so
that the formula works.
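Both of these computations can be checked with a few lines of code, using the standard library's `comb` and `factorial`:

```python
from math import comb, factorial

def binomial_prob(n, x, p):
    """P(X = x) for Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

assert factorial(5) // (factorial(3) * factorial(2)) == comb(5, 3) == 10

p3 = binomial_prob(5, 3, 0.60)    # the representative-sample probability
p5 = binomial_prob(5, 5, 0.60)    # matches the multiplication rule: 0.60**5

print(round(p3, 4), round(p5, 4))  # 0.3456 0.0778
```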
There are many statistical software packages
that will compute binomial probabilities for
us. For example, in the text on page 161 I
give the computer-generated probabilities for
Bin(25,0.50).
Once we have binomial probabilities, we can
draw probability histograms for binomials. Sev-
eral such pictures are given on pages 162–3
of the text.
These pictures illustrate the following facts
about the binomial.
• δ = 1; thus, the height of each rectangle
equals the probability of its center.
• Just like the pictures for FT, there is one
central peak (one or two, but never more,
rectangles wide) and the probabilities steadily
decrease as you move away from the peak.
• The ph is symmetric if, and only if, p = 0.50.
• For p ≠ 0.50, the ph is roughly symmet-
ric provided that both np and nq are 'large.'
People disagree on what 'large' means; most
say 5 or 10 or 15.
Yes, we can use a computer to calculate bino-
mial probabilities, but a computer should not
be seen as a panacea. For example, if I try
to use my software package for Bin(130,0.50)
I get an error message. Unless a program is
written with extreme care, the accurate cal-
culation of, say, n! for large n is difficult.
Thus, we might want to find a way to ap-
proximate binomial probabilities. Given the
similarities of the binomial ph to the ph for
FT, it is appealing to use the snc as an ap-
proximation.
To this end, note that it can be shown that
for Bin(n, p),
µ = np and σ = √(npq).
Thus, for example, Bin(100, 0.50) has:
µ = 100(0.50) = 50 and
σ = √[100(0.50)(0.50)] = √25 = 5.
As a result, it is easy to standardize X:
Z = (X − np)/√(npq).
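A sketch of why this approximation works; the tail probability P(X ≥ 60) for Bin(100, 0.50) is my example, not one from the text, and the correction uses g = δ/2 = 0.5 since δ = 1 here:

```python
from math import comb, erf, sqrt

n, p = 100, 0.50
mu, sigma = n * p, sqrt(n * p * (1 - p))    # mu = 50, sigma = 5

# Exact binomial tail probability P(X >= 60):
exact = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(60, n + 1))

# snc approximation with the continuity correction (delta = 1, so g = 0.5):
z = (60 - 0.5 - mu) / sigma                 # left endpoint of the x = 60 rectangle
approx = 0.5 * (1.0 - erf(z / sqrt(2.0)))

print(round(exact, 4), round(approx, 4))    # both close to 0.028
```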
On pages 164–166 of the text are probability
histograms for three Z’s, each with an snc
for comparison. Visually, it is clear that the
snc can give good approximations for Z, and
hence for X.
The details will not be given and you are not
responsible for them. This approximation will
be used in Chapter 6.
Section 5.3: Bernoulli Trials
Consider the following experiments:
–Julie spins in circles to the right;
–Clyde shoots free throws; and
–Bob repeatedly tosses a coin.
In each experiment, a person is conducting a
sequence of trials.
Consider the following question: Suppose that
on Monday, Clyde attempts 100 free throws
and achieves 77 successes. Clyde plans to at-
tempt 200 free throws on Tuesday.
What is the probability that he will make 150
or more free throws on Tuesday?
In order to answer this and similar questions,
we need a mathematical model for the pro-
cess that generates the results of the trials,
whatever that means.
We begin with a simple sequence of trials:
repeated tosses of a fair coin.
Think about this for a minute. What does it
mean to you when you read, ‘Repeated tosses
of a fair coin?’
In particular, I want us to write down mathe-
matical assumptions that describe this notion.
There are three assumptions needed.
1. Each toss results in one of two outcomes:
a heads or a tails.
2. The probability of heads is 0.50 for every
toss.
3. The tosses exhibit ‘no memory.’
We have studied the chance mechanism of
selecting cards at random, with replacement,
from a population box. For this CM, we learned
two very useful techniques: the multiplication
rule for a particular sequence of outcomes and
the binomial sampling distribution for the to-
tal number of successes.
Is there any relationship between the assump-
tions above and selecting cards from a box?
Well, imagine a box with two cards, one card
marked ‘1’ for heads and the other marked
‘0’ for tails. Suppose that we select cards at
random with replacement from this box.
I claim that this selection of cards from this
box satisfies the three assumptions given above.
Discuss.
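If you would like to "toss" this box on a computer, here is a minimal Python sketch (the seed is arbitrary, chosen only so the run is reproducible):

```python
import random

random.seed(0)   # arbitrary seed, only for reproducibility
box = [1, 0]     # '1' for heads, '0' for tails

# Select 10,000 cards at random, with replacement, from the box
tosses = [random.choice(box) for _ in range(10_000)]

# The proportion of heads should be close to 0.50
print(sum(tosses) / len(tosses))
```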
Thus, we can perform the following compu-
tations:
–If I toss a fair coin four times, the probability
I get all heads is:
P(H, H, H, H) = (0.50)^4 = 0.0625.
–If I toss a fair coin eight times, the probability that I get a total of exactly six heads is:
[8!/(6!2!)](0.50)^8 = 0.1094.
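Both computations can be checked with a short Python sketch (not part of the course):

```python
from math import comb

# All heads in four tosses of a fair coin: (0.50)^4
print(0.50 ** 4)  # 0.0625

# Exactly six heads in eight tosses: [8!/(6!2!)](0.50)^8
print(round(comb(8, 6) * 0.50 ** 8, 4))  # 0.1094
```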
Next, we generalize the above assumptions.
+ 132
+ +
Suppose that we have a sequence of trials. If
they satisfy the following three assumptions,
then we say that we have Bernoulli Trials
(BT).
1. Each trial results in one of two outcomes:
a success or a failure.
2. The probability of success equals p for ev-
ery trial.
3. The trials exhibit ‘no memory.’
As argued above for a coin, BT are mathe-
matically equivalent to selecting cards at ran-
dom with replacement from Box(N ; p).
For example, Katie is a very good free throw
shooter. On the assumption that Katie’s free
throws are BT with p = 0.85,
+ 133
+ +
we can calculate the following probabilities.
–If Katie shoots three free throws, the prob-
ability she makes all three is:
P(S, S, S) = (0.85)^3 = 0.6141.
–If Katie shoots ten free throws, the proba-
bility she makes a total of exactly nine is:
[10!/(9!1!)](0.85)^9(0.15) = 0.3474.
To summarize, if we are told that we have BT
and we are told the value of p, then we can
calculate probabilities about the outcomes of
the trials.
What are the difficulties with this?
Well, education should be more than learning
to obey authority figures! I acknowledge this
above by prefacing my computation by saying,
‘On the assumption . . . are BT,’ but we can
(sometimes) do more.
+ 134
+ +
If we have previous outcomes of the trials,
we can use these data to investigate (not
determine!) whether the assumptions of BT
seem reasonable.
I offer two ways to investigate. One way
is designed to explore the second assump-
tion (constancy of success probability) and
the other focuses on the third assumption
(lack of memory).
To be precise, suppose we have observed the
following results of n = 10 trials:
1 0 0 0 1 1 0 0 0 1.
We investigate constancy by creating the fol-
lowing table:
Half    S   F   Total   p̂
1       2   3   5       0.40
2       2   3   5       0.40
Total   4   6   10
+ 135
+ +
We need to be careful. We want to know
whether p remains constant, but we never get
to see p. We can see the p̂’s. For the table
above,
p̂1 = p̂2 = 0.40.
B/c the p̂’s do not change from the first half
to the second half, there is no evidence that
p has changed.
Of course, this argument is not conclusive.
We could make it more formal, but that is
not my goal. I encourage you to think of this
as an ‘informal hypothesis test’ for which the
null hypothesis is that p remains constant.
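Here is a minimal Python sketch of the halves computation (not part of the course):

```python
trials = [1, 0, 0, 0, 1, 1, 0, 0, 0, 1]

# Split the n = 10 trials into halves and compute p-hat for each half
half = len(trials) // 2
for label, chunk in (("1", trials[:half]), ("2", trials[half:])):
    s = sum(chunk)          # successes in this half
    f = len(chunk) - s      # failures in this half
    print(label, s, f, len(chunk), round(s / len(chunk), 2))
```

Each line reproduces one row of the table above: 2 successes, 3 failures, p̂ = 0.40.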
Let’s look at the trials again:
1 0 0 0 1 1 0 0 0 1.
Do you see memory?
Well, the last five trials yield exactly the same
outcomes as the first five, so this looks like
memory. But it could be simply the result of
chance.
+ 136
+ +
But now you need to be a scientist. Does
this ‘5 step’ memory make any sense to you?
Discuss.
Often, what makes sense is ‘1 step’ memory;
i.e. memory in which a current trial is influ-
enced by the outcome of the trial immediately
before it.
We take the 10 trials and form 9 ‘overlapping
pairs.’ (It is too difficult to do this w/my word
processor. I will illustrate it on the board.)
We create the following table. Note that in
this table we are counting pairs not trials.
                 Current Trial
Previous Trial   S   F   Total   p̂
S                1   2   3       0.33
F                2   4   6       0.33
Total            3   6   9
+ 137
+ +
B/c the two p̂’s are equal, there is no evidence
of 1 step memory. Discuss.
I want to show you an easier and faster way
to create the above memory table.
First, we know that the number of pairs will
be (n−1), which is nine for these data. Also,
we know from previously comparing halves that
there is a total of four successes and six fail-
ures. We put these numbers in the margins,
yielding the partial table below.
                 Current Trial
Previous Trial   S   F   Total   p̂
S                        4
F                        6
Total            4   6   9
A difficulty with this table, of course, is that
4 + 6 = 10, not 9. But look at the first trial
in the sequence, a success.
+ 138
+ +
The first trial appears in only one pair b/c it
has no trial before it. In other words, the first
trial never ‘gets to be’ current in the mem-
ory table. Thus, even though there are four
successes in the sequence, only three of them
are current. Thus, we change the column to-
tal for S to 3.
Similarly, the last trial, also an S, never gets
to be previous. Thus, we subtract one from
the row total for S to get the following partial
table, which has the correct margins.
                 Current Trial
Previous Trial   S   F   Total   p̂
S                        3
F                        6
Total            3   6   9
Now, we simply determine by counting one of
the entries in the table, and obtain the others
by subtracting.
For example, the pair SS (11) occurs exactly
once in the sequence.
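A short Python sketch of this pair-counting (the S/F labels correspond to the 1/0 coding of the trials):

```python
from collections import Counter

trials = [1, 0, 0, 0, 1, 1, 0, 0, 0, 1]

# Count the (n - 1) = 9 overlapping (previous, current) pairs
pairs = Counter(zip(trials, trials[1:]))

for prev, label in ((1, "S"), (0, "F")):
    s = pairs[(prev, 1)]    # previous -> success
    f = pairs[(prev, 0)]    # previous -> failure
    print(label, s, f, s + f, round(s / (s + f), 2))
```

Each printed line is one row of the memory table; in particular the pair SS, i.e. (1, 1), is counted exactly once.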
+ 139
+ +
Here is another example with n = 20:
1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0.
These data yield the following tables.
Half    S    F    Total   p̂
1       10   0    10      1.00
2       0    10   10      0.00
Total   10   10   20
                 Current Trial
Previous Trial   S   F    Total   p̂
S                9   1    10      0.90
F                0   9    9       0.00
Total            9   10   19
Discuss.
Over the years, I have had trouble convinc-
ing my students that BT exist in the world.
(Math majors are extremely willing to believe
assumptions; nonmath majors are not.)
I decided to use the video game Tetris to con-
vince my students that BTs do exist. Sadly,
as we will see, I failed miserably.
+ 140
+ +
I collected these data circa 1990 using a Nin-
tendo system played on my television.
A trial is a ‘shape’ of blocks falling from the
top of the screen.
First difficulty: There are 7, not 2, shapes.
Solution: Define ‘log’ to be an S; all others
are F’s.
I played 8 games and observed n = 1872 tri-
als. Instead of dividing the trials into halves,
I created the following table:
Play      S     F      Total   p̂
First     22    171    193     0.114
Second    33    185    218     0.151
Third     36    215    251     0.143
Fourth    25    206    231     0.108
Fifth     30    198    228     0.132
Sixth     42    220    262     0.160
Seventh   33    215    248     0.133
Eighth    33    208    241     0.137
Total     254   1618   1872    0.136
+ 141
+ +
[Plot: p̂ (vertical axis, 0.00 to 0.16) versus Play (horizontal axis, 1 to 8).]
Looking at this table and its plot, I was feeling
pretty good. There is some evidence that p
is not constant, but the evidence seems weak
to me. (A formal HT of the null hypothesis
that p is constant versus the alternative that
it can change every game—see Chapter 11 for
details—gives a very large P-value, 0.751.)
But then I created the memory table:
                  Current Shape
Previous Shape    S     F      Total   p̂
Success           2     251    253     0.008
Failure           249   1362   1611    0.155
Total             251   1613   1864    0.135
+ 142
+ +
Note: The grand total for this table is (n−8)
b/c we lose one pair for each play. (I believe
that it does not make sense to look for mem-
ory by pairing the last trial of one game with
the first trial of the next game.)
It is obvious (?) that there is memory. (The
P-value is 1.1 × 10^(−14).)
One of my favorite projects: Describe the pet-
turtle study.
Section 5.4: Some Practical Considerations
The big result of Chapter 5 is the multiplica-
tion rule, from which we obtain the Binomial
and many other results not discussed in this
course.
For a finite population, we get the MR if we
sample at random w/replacement and for an
infinite population if we have BTs.
+ 143
+ +
B/c humans, and I would say especially math-
ematicians, like answers, there is a tremen-
dous pressure to assume/pretend/deceive that
we have the MR.
In my experience, it is very common for re-
searchers to claim that they have a random
sample from a finite population, even when
they clearly do not.
Examples: Pretty much every survey you ever
read about. My Wisconsin DOT study. (Dis-
cuss.)
Birthdays. Below is a very famous table in
probability theory: for n people, it gives the
probability that at least two share a birthday.
+ 144
+ +
n    Prob.    n    Prob.    n    Prob.
2    0.0027   13   0.1939   24   0.5374
3    0.0082   14   0.2226   25   0.5677
4    0.0163   15   0.2523   26   0.5972
5    0.0271   16   0.2829   27   0.6258
6    0.0404   17   0.3143   28   0.6534
7    0.0561   18   0.3461   29   0.6799
8    0.0741   19   0.3783   30   0.7053
9    0.0944   20   0.4106   31   0.7295
10   0.1166   21   0.4428   32   0.7524
11   0.1408   22   0.4748   33   0.7740
12   0.1666   23   0.5063   34   0.7944

n    Prob.    n    Prob.
35   0.8135   46   0.9478
36   0.8313   47   0.9544
37   0.8479   48   0.9602
38   0.8633   49   0.9654
39   0.8775   50   0.9701
40   0.8905   55   0.9861
41   0.9025   60   0.9940
42   0.9134   70   0.9991
43   0.9234   80   0.9999
44   0.9324   90   1.0000
45   0.9405
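The table's entries appear to treat all 366 possible birthdays as equally likely (that assumption is mine, inferred from the values); under it, a minimal Python sketch of the computation:

```python
def birthday_prob(n, days=366):
    """P(at least two of n people share a birthday), assuming all
    days equally likely and the birthdays independent."""
    p_distinct = 1.0
    for k in range(n):
        p_distinct *= (days - k) / days  # k-th person avoids the first k birthdays
    return 1.0 - p_distinct

print(round(birthday_prob(23), 4))  # matches the table's entry for n = 23
```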
+ 145
+ +
Probabilities must be computed before the
CM is operated. But despite my feeling this
way, many people try to calculate probabili-
ties after the CM has been operated. In my
experience, the huge majority of such compu-
tations lead to gibberish.
There are three major mistakes that people
make. I call them:
–ignoring failed attempts;
–focusing on a winner; and
–inappropriate use of ELC.
Multiple lotto winner. Said to beat the odds
of one in 16 trillion. Discuss.
+ 146
+ +
Discuss batting order example.
It was reported on TV that the probability of
this happening was 1 in 1.317 × 1011.
Let’s put that number in perspective. There
are currently 30 teams in MLB, each playing 162
games per year, for a total of 2430 games per
year. The above event is, thus, expected to
happen once every 54,190,080 years. I really
doubt that MLB will survive that long!
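The arithmetic behind that claim, as a sketch (the reported odds are simply taken at face value):

```python
reported_odds = 1.317e11          # the reported 1-in-... odds
games_per_year = 30 * 162 // 2    # 4860 team-games shared by 2 teams = 2430 games

# Waiting time, in years, for a 1-in-reported_odds event
# when there are 2430 independent chances per year
years = reported_odds / games_per_year
print(f"about {years:.3g} years")  # roughly 54 million years
```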
+ 147
+ +
Yahtzee. Assuming that the five dice are bal-
anced (ELC) and act independently (no memory from die to die), the probability of
getting a Yahtzee on a single throw of the
five dice is:
(1/6)^4 = 1/1296.
Thus, if you pick up five dice and throw a
Yahtzee, that is pretty great.
But suppose you toss the dice 10 times per
minute for two hours and get a Yahtzee; what
then?
Suppose that 12,960 people each throw five
dice at once. Do you think there will be any
Yahtzees? How many?
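As a check on the Yahtzee arithmetic:

```python
# Probability of a Yahtzee on one throw of five balanced dice:
# the first die can be anything; each of the other four must match it.
p_yahtzee = (1 / 6) ** 4   # 1/1296, about 0.00077

# Expected number of Yahtzees among 12,960 independent throws
print(round(12960 * p_yahtzee, 6))  # 10.0
```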
+ 148