Chapter 3
Section 3.1: The probability histogram
Below is the probability histogram for the IS
(p. 80 of text):
[Probability histogram for the IS: rectangles centered at
−0.90, −0.70, . . . , 0.90; two of the heights are 1.5755
and 0.7500; vertical axis marked 0.5, 1.0, 1.5.]
How is it determined?
On a horizontal number line, list all possible
values of x; in this case:
−0.90, −0.70, −0.50, −0.30, −0.10, 0.10,
0.30, 0.50, 0.70, 0.90.
Draw rectangles: Each x serves as the center
of the base of its rectangle.
The base of each rectangle equals δ (this means
the rectangles touch, but do not overlap).
The height of the rectangle centered at x is:
P(X = x)/δ.
Below is the verification of two heights given
in our picture:
P(X = 0.10)/0.20 = 0.3151/0.20 = 1.5755
P(X = 0.30)/0.20 = 0.1500/0.20 = 0.7500
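The two height calculations above can be sketched directly; a minimal sketch, using the δ and the two probabilities quoted in the notes:

```python
# Heights in a probability histogram: height = P(X = x) / delta.
delta = 0.20                          # base of each rectangle for the IS

probs = {0.10: 0.3151, 0.30: 0.1500}  # the two P(X = x) values quoted above

heights = {x: p / delta for x, p in probs.items()}
print(heights)  # heights 1.5755 and 0.7500
```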
Why this strange definition of height?
B/c we want to consider areas.
The area of the rectangle centered at x is, of
course, its base times its height:
δ[P(X = x)/δ] = P(X = x).
Here is the main thing to remember:
In a probability histogram, the area of a rect-
angle equals the probability of its center value.
Once we know this, we can ‘see’ the symme-
try in the sampling distribution for the IS.
We are now going to add another adjective
(remember actual?).
The sampling distribution of Chapter 2 will be
called the exact sampling distribution and it
yields the exact P-value.
We say this b/c in Chapter 3 we will learn two
ways to approximate a sampling distribution:
computer simulation and fancy math.
Section 3.2: Computer simulation
In the text, I talk about a Colloquium Study
(CQS). Its data are below:
Treat.   S    F   Total
  1      7    7    14
  2      1   13    14
Total    8   20    28
It can be shown that there are 40,116,600
possible assignments, and that 12,932,920 of
these give x = 0 (a = 4). Thus,
P(X = 0) = 12,932,920/40,116,600 = 0.3224.
The idea of the computer simulation approxi-
mation is quite simple: Perhaps we can obtain
a good approximation to any probability by
looking at some, not all of the assignments.
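A minimal sketch of such a simulation for the CQS, assuming two treatment groups of 14 with 8 successes among the 28 subjects (as in the table above); the seed is an arbitrary choice of mine:

```python
import random

random.seed(1)                # arbitrary; each seed gives slightly different RFs

cards = [1] * 8 + [0] * 20    # 8 successes and 20 failures among the 28 subjects
runs = 10_000
count = 0                     # number of assignments with a = 4, i.e. x = 0

for _ in range(runs):
    random.shuffle(cards)
    a = sum(cards[:14])       # successes assigned to treatment 1
    if a == 4:                # a = 4 gives p1 - p2 = 4/14 - 4/14 = 0
        count += 1

rf = count / runs
print(rf)                     # should be close to the exact P(X = 0) = 0.3224
```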
For example, I looked at 10,000 assignments
for the CQS and discovered that 3267 of them
gave a = 4 and x = 0. Thus, the relative
frequency (RF) of occurrence of 0 is 0.3267,
which is very close to its probability, 0.3224.
But, I am ahead of myself. Once we decide
to look at only some of the assignments, two
questions arise.
1. How many should we look at? The answer
is called the number of runs of the com-
puter simulation. As we shall see, 10,000
is a good choice for the number of runs.
2. Which ones should we look at? Well, to
avoid bias, we select assignments at ran-
dom.
If you want to see more details on this, read
pages 84 and 85 in the text. But you don’t
need to understand this.
Below are the results of my computer simula-
tion study with 10,000 runs for the CQS:
   x      RF     Probability
 −4/7   0.0005     0.0010
 −3/7   0.0143     0.0155
 −2/7   0.0848     0.0879
 −1/7   0.2354     0.2345
   0    0.3267     0.3224
  1/7   0.2330     0.2345
  2/7   0.0902     0.0879
  3/7   0.0140     0.0155
  4/7   0.0011     0.0010
 Total  1.0000     1.0002
First, note that the RFs and probabilities are
‘close.’ (Remember Section 2.4)
We can use the RFs to approximate the P-
value.
The ingredients: the actual x = 3/7, the al-
ternative was >.
Thus, the exact P-value is
P(X ≥ 3/7) = 0.0155 + 0.0010 = 0.0165.
We can approximate this by
RF (X ≥ 3/7) = 0.0140 + 0.0011 = 0.0151.
The picture below is taken from p. 87 of the
text.
[Three probability histograms of relative frequencies for
the CQS, labeled '10,000 Runs,' '1000 Runs,' and '100
Runs'; horizontal axes run from −4/7 to 4/7, vertical
axes marked 1.0 and 2.0.]
Results of a simulation experiment with 10,000
runs for the Ballerina study:
          Rel. Freq.  Rel. Freq.  Rel. Freq.
   x        of x       of ≤ x      of ≥ x
 −0.40     0.0009      0.0009      1.0000
 −0.32     0.0072      0.0081      0.9991
 −0.24     0.0383      0.0464      0.9919
 −0.16     0.1137      0.1601      0.9536
 −0.08     0.2169      0.3770      0.8399
  0.00     0.2591      0.6361      0.6230
  0.08     0.2022      0.8383      0.3639
  0.16     0.1140      0.9523      0.1617
  0.24     0.0383      0.9906      0.0477
  0.32     0.0089      0.9995      0.0094
  0.40     0.0005      1.0000      0.0005
Recall, the P-value is P(X ≤ −0.24) = 0.0477.
We approximate this with RF(X ≤ −0.24) =
0.0464.
Results of a simulation experiment with 10,000
runs for the Crohn’s study:
          Rel. Freq.  Rel. Freq.  Rel. Freq.
   x        of x       of ≤ x      of ≥ x
 −0.46     0.0002      0.0002      1.0000
 −0.41     0.0005      0.0007      0.9998
 −0.35     0.0027      0.0034      0.9993
 −0.29     0.0094      0.0128      0.9966
 −0.24     0.0289      0.0417      0.9872
 −0.18     0.0593      0.1010      0.9583
 −0.12     0.1178      0.2188      0.8990
 −0.07     0.1540      0.3728      0.7812
 −0.01     0.1893      0.5621      0.6272
  0.05     0.1724      0.7345      0.4379
  0.10     0.1287      0.8632      0.2655
  0.16     0.0830      0.9462      0.1368
  0.21     0.0345      0.9807      0.0538
  0.27     0.0143      0.9950      0.0193
  0.33     0.0039      0.9989      0.0050
  0.38     0.0010      0.9999      0.0011
  0.44     0.0001      1.0000      0.0001
Recall, the P-value = P(X ≥ 0.27) = 0.0198,
which we approximate by RF(X ≥ 0.27) =
0.0193.
Section 3.3: Center and Spread
Page 89 of the text shows probability histograms
for four studies in the text. In each picture, if
you sum the areas of the teal-colored rectangles,
you get the P-value.
There are two facts (for all FTs) that are
revealed in these pictures.
• Each picture has one central peak. The
peak can be one or two rectangles wide,
but never three.
• As you move away from the central peak
in either direction, the rectangles become
shorter and shorter.
It is useful to have a concept of a ‘left to
right’ center for such a picture.
Clearly, if the picture is symmetric, then its
center is 0.
To include all pictures, we define the center
of the PH to be its center of gravity.
It can be shown that for every FT, the center
of gravity is 0.
In general, the center of gravity, or mean, of
a PH is denoted by the Greek letter mu: µ.
For FT, µ = 0. (In Chapter 5, we will have
pictures for which µ ≠ 0.)
Thus, all FTs are similar in that they all have
µ = 0.
But the pictures on page 89 look very differ-
ent. This is b/c they have different amounts
of spread.
For the four pictures, IS has the most spread,
then CQS, then Soccer and finally CCD has
the least spread. (This is an obvious visual
assessment.)
We need more than a visual assessment of
spread. We need a number that summarizes
the spread in a PH. The number is the stan-
dard deviation of the PH, denoted by the
Greek letter sigma: σ.
There is a simple formula for calculating σ:
σ = √[m1m2/(n1n2(n − 1))].
Below are the standard deviations for the four
pictures on page 89. (See text for details.)
Study: IS CQS Soccer CCD
σ: 0.2283 0.1739 0.1418 0.1193
Note that more spread corresponds to a larger
σ.
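As a check, the formula reproduces two of these values. The margins below come from the worked examples later in these notes (BSS: m1 = 38, m2 = 12, n1 = n2 = 25; CCD: n1 = 37, n2 = 34, with m1 = 22 + 11 = 33 successes inferred from the p̂'s given later):

```python
from math import sqrt

def sigma_ft(m1, m2, n1, n2):
    """Standard deviation of the FT probability histogram."""
    n = n1 + n2
    return sqrt(m1 * m2 / (n1 * n2 * (n - 1)))

sigma_bss = sigma_ft(38, 12, 25, 25)   # Ballerina study margins
sigma_ccd = sigma_ft(33, 38, 37, 34)   # CCD: 33 successes, 38 failures (inferred)

print(round(sigma_bss, 4), round(sigma_ccd, 4))  # 0.122 0.1193
```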
Why do we want to measure spread? Be pa-
tient please.
Recall that X is the test statistic for FT. It is
also called a random variable.
Let X be any random variable, with mean µ
and standard deviation σ. Define the stan-
dardized version of X to be Z, where
Z = (X − µ)/σ.
Transforming X to Z is called ‘standardizing
X.’
The observed value of Z is denoted by z and
is computed by:
z = (x − µ)/σ.
For FT, b/c µ = 0:
Z = X/σ and z = x/σ.
Data → X → x → Z → z
In Chapter 2, we learned about the sampling
distribution for X.
In a similar way, Z has a sampling distribution.
Assuming we have the sampling distribution
for X, it is very easy to get the sampling dis-
tribution for Z.
We will illustrate with the IS.
Recall for the IS: σ = 0.2283 and the possible
values of x are: −0.90, −0.70, . . . , 0.90.
Thus, z = x/0.2283 and we get all possible
values of z by taking every possible value of x
and dividing it by 0.2283.
Namely, −0.90/0.2283 = −3.94,
−0.70/0.2283 = −3.07, . . . ,
0.90/0.2283 = 3.94.
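This conversion is mechanical, as a short sketch shows:

```python
sigma = 0.2283                    # standard deviation for the IS

xs = [round(-0.90 + 0.20 * i, 2) for i in range(10)]   # -0.90, -0.70, ..., 0.90
zs = [round(x / sigma, 2) for x in xs]                 # standardized values

print(zs)
```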
The probabilities for the z’s are automatic.
For example, the event that z = −3.94 is the
same event as x = −0.90, so they have the
same probability.
To summarize, given the sampling distribution
for X, it is easy to get the sampling distribu-
tion for Z. You don’t need to worry about
reproducing these details; this result is simply
to motivate what happens next.
B/c Z has a sampling distribution, we can
draw its probability histogram and I have done
so on page 94 of the text. (Ignore the teal
color and smooth curve on p. 94.)
Compare the pictures for the IS on pages 89
and 94 of the text: These pictures have the
same shape and both are centered at 0. They
differ in their spreads.
I repeated the above procedure (standardiz-
ing) for the other three probability histograms
on p. 89. The details will not be given. The
pictures I get for the three Z’s are on pages
95 and 96.
As with the IS, if you compare each picture
for X on page 89 with its picture for Z, you
will find that the two pictures have the same
shape and both are centered at 0, but they
have different spreads.
Let’s go back to the picture for Z for the IS
on page 94.
B/c Z is equivalent to X, we may use X or Z
to find the P-value. To make this precise:
P(X ≥ 0.30) = P(Z ≥ 1.31) and
P(X ≤ −0.30) = P(Z ≤ −1.31).
Thus, the area of the teal-shaded rectangles
on page 94 is the P-value for the IS.
Similarly, the area of the teal-shaded rectan-
gles is the P-value for the three pictures on
pages 95 and 96.
Now, look at the four pictures on pages 94–
96. They look very similar. Why? B/c: the
mean of each picture is 0 and the standard
deviation of each picture is 1.
This is what standardizing does to a pic-
ture: It creates a new picture which has the
same shape as the old picture. In addition,
the new picture is centered at 0 and is scaled
to have a standard deviation of 1.
Working with X, different data sets give very
different pictures (see page 89), but working
with Z, different data sets give very similar
pictures.
But why do we desire similar pictures?
This is where the smooth curve enters the
argument.
B/c the Z pictures are similar, we can use
one curve as an approximation to each of
them.
[Sketch from the text: the smooth curve and a
histogram rectangle near z = 1.31, with the three
regions they cut off labeled A, B and C.]
Exact Area = B + C
Approximate Area = A + B
If A = C, then the approximation is perfect.
The smooth curve is called the standard nor-
mal curve (snc).
The approximation method motivated above
is useful only if it is easy to find areas under
the snc.
[The standard normal curve: a bell-shaped curve over
the interval −3 to 3, centered at 0; vertical axis
marked 0.1 through 0.5.]
Two facts: The snc is symmetric around 0
and its total area is 1.
Suppose that I want to find the area under
the snc to the right of, say, z = 1.34.
Good news: The table in the front of the
book is designed to answer this question.
First, take the z and divide it into two pieces:
1.3 and 0.04.
Then go to the table as shown below.
  z     0.01  ...  0.04  ...
 0.0
 0.1
 ...
 1.3               0.0901
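If no table is at hand, the same tail areas can be computed directly; this sketch uses the error function from Python's standard library rather than the book's table:

```python
from math import erf, sqrt

def area_right(z):
    """Area under the snc to the right of z (1 minus the normal CDF)."""
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))

print(round(area_right(1.34), 4))   # 0.0901, the table entry above
```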
For another example, suppose we want the
area under the snc to the right of z = −1.56.
Break z into −1.5 and 0.06.
  z     0.01  ...  0.06  ...
−3.5
−3.4
 ...
−1.5               0.9406
Suppose we want the area under the snc to
the right of z = 1.47286. The difficulty is:
what to do with the extra digits? In my class,
just round off the z to 1.47 and proceed as
above.
Finally, suppose we want the area under the
snc to the right of z = 3.98. The difficulty is
that the table goes no higher than 3.59. But
read the 3.5 row in the table (does it remind
you of the movie ‘The Shining?’).
Fact: For any z > 3.59, the area under the
snc to the right of z is ≤ 0.0002.
What is the area under the snc to the left of
z = −1.27?
Thinking like the MITK, use symmetry to re-
alize that the area to the left of z = −1.27
equals the area to the right of z = +1.27,
and we know how to find this latter area (it
is 0.1020).
Fact: The area under the snc to the left of
any z equals the area under the snc to the
right of −z.
Now we are ready to use the snc to obtain an
approximate P-value for FT.
Consider again the CCD. Recall that the exact
P-value is 0.0198 and, symbolically, is
P(X ≥ 0.2711). Also, recall that for the CCD,
σ = 0.1193. Thus,
P(X ≥ 0.2711) = P(X/σ ≥ 0.2711/0.1193) =
P(Z ≥ 2.27).
Look at the picture on p. 96 of the text. This
P-value is the area of the rectangle centered
at 2.27 plus the areas of all the rectangles to
the right of this rectangle. (Only one of these
rectangles can be seen on p. 96).
It can be shown (more on this later) that the
rectangle centered at 2.27 has for its end-
points: 2.04 and 2.51 (these look ‘funny’ b/c
of round-off error).
The snc approximation to this P-value should
be the area under the snc to the right of 2.04,
which is 0.0207. This is a pretty good approx-
imation, almost as good as we obtained with
computer simulation (that was 0.0193).
BUT now I must tell you something strange
about the rest of the world, outside this class-
room. Every other book I have seen says,
‘It’s too much work to find the 2.04 above,
let’s just calculate the area to the right of
z = 2.27!’ I am NOT making this up!
The area to the right of 2.27 is 0.0116, which
is a horrible approximation of 0.0198.
To summarize, every other book I have seen,
after spending all of Chapter 2 convincing you
that the P-value is important, gets to Chapter
3 and says, ‘It is too much trouble to approx-
imate it accurately.’
For now, let's focus on what all the lazy peo-
ple do. Although the method gives a lousy
answer, it is very easy to use.
1. Calculate z = x/σ.
2. For the alternative >:
The approximate P-value is the area under
the snc to the right of z.
For the alternative < the exact P-value is
P(X ≤ x) = P(Z ≤ z), where z = x/σ.
Thus, the lazy approximation is the area under
the snc to the left of z, which is equal to the
area under the snc to the right of −z.
Finally, for the alternative ≠, we know that
the P-value has two pieces to it. The approx-
imation is:
Twice the area under the snc to the right of
|z|.
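The three-part rule above can be sketched as a small function; the tail area is computed with the error function instead of the table, and the alternative labels are my own encoding:

```python
from math import erf, sqrt

def area_right(z):
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))   # snc tail area, in place of the table

def lazy_p_value(x, sigma, alternative):
    """Approximate P-value for FT without the continuity correction."""
    z = x / sigma                     # for FT, mu = 0, so z = x / sigma
    if alternative == '>':
        return area_right(z)
    if alternative == '<':
        return area_right(-z)         # area left of z = area right of -z
    return 2.0 * area_right(abs(z))   # the two-sided alternative

# CCD: x = 0.2711, sigma = 0.1193, alternative >.
print(round(lazy_p_value(0.2711, 0.1193, '>'), 4))
```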
This three-part rule (one for each alternative)
is presented on p. 102 of the text.
Its main advantage is that we calculate the
same z regardless of the alternative and then
look up either z, −z or |z| in the table, re-
membering to double the area we find for the
alternative ≠. Its main disadvantage is that it
often gives horribly inaccurate approximations.
Of course, nobody else calls this the lazy method.
They call it the method.
Mostly, they don’t acknowledge the existence
of an exact P-value, thus avoiding the issue
that their approximation is bad.
If anything, they call the lazy method the
approximation ‘without the continuity correc-
tion’ and my better method the approxima-
tion ‘with the continuity correction.’
There is an additional bit of perversity go-
ing on: the approximation with the continuity
correction given in all books that give it is ac-
tually a very bad correction and gives answers
barely different from the ‘without.’ This helps
them discount people like me who want to im-
prove on their answers: by presenting an inef-
fective improvement, they send the message
that improvements are not needed!
In their defense it is true that for very large
studies the continuity correction does not change
the answer much.
Thus, one can reasonably take the following
approach: For small studies use the website
to get the exact P-value and for larger studies
if you use the snc, it does not matter much
whether you use the continuity correction.
(BTW, I am convinced that after some thresh-
old for being a ‘large study’ the website that
we use actually uses an snc approximation.)
When I have managed to challenge, in person,
somebody who teaches the lazy method they
always say, ‘Oh, we only use the snc when the
study is too large to get the exact P-value.’
But they taught the lazy method before there
were computers readily available to obtain the
exact answer.
Now I am going to, perhaps, surprise you.
On our homework and especially on the midterm,
I encourage you to use the lazy method. (Dis-
cuss.)
If you submit project 1 (A or B) you will need
to use the approximation with the continuity
correction. Its steps are outlined on page 3
of the ‘Course Notes’ on the course webpage.
(Discuss projects briefly.)
Consider BSS. Recall that x = −0.24, the al-
ternative is <, the exact P-value is 0.0477.
Also,
σ = √[38(12)/(25(25)(49))] = 0.1220.
Following page 3 of the Course Notes,
g = δ/2 = 50/[2(25)(25)] = 0.04.
Thus, x2 = x + g = −0.24 + 0.04 = −0.20,
which standardized gives z2 = −0.20/0.1220 =
−1.64. We look up −z2 = 1.64 and find that
the approximate P-value is 0.0505.
W/o the c/c, z = −0.24/0.1220 = −1.97,
giving an approximate P-value of 0.0244.
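The BSS computations above, with and without the correction, can be reproduced as follows (tail areas via the error function rather than the table):

```python
from math import erf, sqrt

def area_right(z):
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))

sigma = 0.1220                    # BSS standard deviation
g = 50 / (2 * 25 * 25)            # g = delta/2 = 0.04

# With the continuity correction (alternative <): shift x inward by g.
z2 = round((-0.24 + g) / sigma, 2)        # -0.20 / 0.1220 = -1.64
with_cc = area_right(-z2)                 # look up -z2 = 1.64

# Without the correction:
z = round(-0.24 / sigma, 2)               # -1.97
without_cc = area_right(-z)

print(round(with_cc, 4), round(without_cc, 4))  # 0.0505 and 0.0244; exact is 0.0477
```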
Let’s look at CCD again. As I mentioned ear-
lier in lecture, b/c of the nasty numbers for
n’s we need to be more precise in our compu-
tations. In particular,
p̂1 = 22/37 = 0.5946, p̂2 = 11/34 = 0.3235,
x = 0.5946 − 0.3235 = 0.2711,
g = δ/2 = 71/[2(37)(34)] = 0.0282.
Thus,
x1 = x − g = 0.2711 − 0.0282 = 0.2429 and
z1 = 0.2429/0.1193 = 2.04.
Finally, consider IS as an example of ≠. Recall
that δ = 0.20, so g = 0.10. First, we compare
|x| to g. If |x| ≤ g then the exact P-value is 1
and no approximation is needed. In this case,
|x| = 0.30 > g, so we must continue.
Next, x3 = |x| − g = 0.30 − 0.10 = 0.20. As
stated earlier, σ = 0.2283, making
z3 = 0.20/0.2283 = 0.88. The area to the
right of 0.88 is 0.1894; doubling this, we get
0.3788 as the approximation to the exact 0.3698.
W/o the c/c, z = 0.30/0.2283 = 1.31 giving
an approximation of 2(0.0951) = 0.1902.
One somewhat positive comment about the
lazy method.
If all the margins are really large, w/ and w/o
the c/c give about the same answers. For
example, take
n1 = n2 = 5000, m1 = 4000 and x = 0.02.
One can verify σ = 0.0098 and g = 0.0002.
With the c/c, z = 0.0198/0.0098 = 2.02 and
w/o it, z = 0.02/0.0098 = 2.04.
But this is a huge study!
Chapter 5
We spent Chapters 2 and 3 examining the
Skeptic’s argument. The Skeptic makes no
attempt to (formally) extend conclusions be-
yond the subjects in the study.
For example, we concluded that cyclosporine
was superior to placebo for the 71 people
in the study. We concluded that Julie was
better spinning right than left for the 50 trials
in her study.
Typically, for better or worse, researchers want
to extend their conclusions to a ‘larger situa-
tion.’
There are many techniques for such exten-
sions, but central to every one is the notion
of a population.
For our purposes, there are two types of pop-
ulations: finite and infinite. And the word
infinite here means it is not what we call a
finite population. (?)
A finite population is a well-defined collection
of individuals.
Examples: All persons who will vote in this
year’s presidential election; all persons eligi-
ble to vote in this year’s presidential election;
all persons in this room; all persons enrolled
for one or more credits this semester at UW-
Madison.
We need a way to think about a finite popu-
lation.
Imagine a box of cards, called the population
box. Each member of the population has a
card in the box. On the member’s card are
the values of one or more variables, or fea-
tures, of the member. (Same features for all
members.)
For simplicity, we begin with one dichotomous
feature per card.
As in Chapter 1, the possible values of the fea-
ture are labeled success and failure. A ‘1’ on
a card denotes that that member is a success
for the feature, and a ‘0’ denotes a failure.
Thus, every card has a ‘1’ or a ‘0’ on it.
Statisticians paraphrase ‘Snoopy’ who once
famously said: I love mankind, it’s people I
can’t stand!
That is, statisticians are interested in the box
in totality, not in any particular member’s card.
For a given population box:
• Let s denote the number of cards in the
box marked ‘1.’
• Let f denote the number of cards in the
box marked ‘0.’
• Let N = s+ f denote the total number of
cards in the box.
• Let p = s/N denote the proportion of
cards in the box marked ‘1.’
• Let q = f/N denote the proportion of
cards in the box marked ‘0.’
These five numbers, s, f, N, p and q, tell us
what is in the box. And, of course, knowl-
edge of two of these numbers, N and p, allows
us to determine the others. As a result, I will
describe a box as Box(N ; p).
For example, Box(10;0.60) is a box with N =
10 cards, of which a proportion p = 0.60 of
them are successes.
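A minimal sketch of this bookkeeping (the helper name `box` is mine, not the text's):

```python
def box(N, p):
    """The five numbers describing Box(N; p)."""
    s = round(N * p)              # cards marked '1'
    f = N - s                     # cards marked '0'
    return {'s': s, 'f': f, 'N': N, 'p': s / N, 'q': f / N}

b = box(10, 0.60)
print(b)   # s = 6, f = 4, N = 10, p = 0.6, q = 0.4
```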
In practice, a researcher does not know p and
often does not even know N . There are two
ways for a researcher to ‘learn’ about what is
in the population box.
First, is a census; this means one examines
every card in the box. Discuss.
Second, is a survey; this means one examines
only some of the cards in the box. Discuss.
We will focus on surveys. The cards actually
examined in a survey comprise the sample.
A sample is called representative if it looks
like the box. Every (honest) researcher wants
a representative sample, but, alas, there is no
way to guarantee getting one.
Let’s consider this idea of representative again.
Suppose that two researchers, A and B, each
select samples of size n = 5 from Box(N ; 0.60).
Below are their samples:
A: 1, 1, 0, 1, 1
B: 0, 1, 1, 0, 1.
Which sample is representative? B’s b/c its
p̂ = p.
Discuss Bill Clinton’s cabinet.
We cannot guarantee a representative sam-
ple, so we advocate selecting what is called
a random sample. A random sample (much
like randomization) is a process and does not
guarantee a sample that is representative, or
necessarily even ‘close’ to representative.
Its great virtue is that a random sample allows
us to calculate the probability that we
will obtain a sample that is close to rep-
resentative.
As stated earlier, in practice, a researcher does
not know p and often does not even know N .
But, for now, let’s assume that we know both
of these numbers.
Consider the chance mechanism of selecting
n cards at random from Box(N ; p).
Imagine that we select the cards one-at-a-
time.
But once we think of selecting the cards one-
by-one, two ways of sampling come to mind:
—Without replacement (smart), and
—With replacement (dumb).
Why do I label these smart and dumb? Dis-
cuss.
Two probability histograms for X, the num-
ber of successes in a sample of size n = 10
from Box(1000; 0.60). Solid [dashed] rectan-
gles are for a random sample with [without]
replacement.
[Histogram: horizontal axis 2 to 10, vertical axis
marked 0.05 to 0.25.]
As above, but from Box(20; 0.60).
[Histogram: horizontal axis 2 to 10, vertical axis
marked 0.05 to 0.35.]
Note that in the first of these pictures, prob-
abilities are the ‘same’ for both methods of
sampling, but for the second picture the prob-
abilities are quite different.
Also, note that the probabilities are ‘better’
for the smarter method of sampling.
Finally, note that for the dumb way of sam-
pling, the probabilities do not depend on the
value of N .
The key is the ratio, n/N of sample size to
population size. In the first example, this ratio
is 10/1000 = 0.01 and in the second example,
10/20 = 0.50.
The general guideline is: If n/N ≤ 0.05, then
probabilities calculated w/replacement are ap-
proximately equal to probabilities calculated
w/o/replacement.
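This guideline can be checked numerically. The sketch below compares the two sampling methods for the two boxes pictured above; the with-replacement probabilities are binomial, and the without-replacement probabilities come from the hypergeometric formula, which these notes do not derive:

```python
from math import comb

def prob_with(n, x, p):
    """P(X = x) when sampling with replacement: binomial."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def prob_without(n, x, N, s):
    """P(X = x) when sampling without replacement: hypergeometric."""
    return comb(s, x) * comb(N - s, n - x) / comb(N, n)

# n/N = 10/1000 = 0.01: the two methods nearly agree.
big = (prob_with(10, 6, 0.60), prob_without(10, 6, 1000, 600))
# n/N = 10/20 = 0.50: they differ quite a bit.
small = (prob_with(10, 6, 0.60), prob_without(10, 6, 20, 12))

print(big, small)
```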
Why does any of this matter?
Well, it turns out that it is much easier, both
computationally and theoretically, to work with
the probabilities for the dumb way of sam-
pling. Thus, regardless of how you select a
random sample, provided n/N ≤ 0.05, it is
valid to calculate probabilities the easy way.
Extended enrichment example:
It is sometimes better to sample the dumb
way.
Recall the CCD study. There are
N = 2.1 × 10^20 possible assignments of
subjects to treatments.
Let each possible assignment be a ‘pop. mem-
ber’ and as a group they form our (very very
large) finite population. Define an assignment
to be a success if it would yield x ≥ 0.27. (Re-
member, the actual x = 0.27.)
Thus, the p for this population box equals
0.0198, the P-value for the FT.
Our computer simulation experiment selected
a sample of n = 10,000 assignments in the
dumb way. But it would be very difficult to
write a computer program to select assignments
the smart way (and it would require lots
of memory and would run slowly). And, as our
above result shows, the smart way and dumb
way of sampling give the same answers.
We will now investigate computing probabil-
ities for the dumb way of sampling (with re-
placement).
Recall that we plan to (probabilities are al-
ways about the future) select n cards at ran-
dom with replacement from the population
box.
Define X1 to be the number on the first card
selected (0 or 1); X2 to be the number on the
second card selected; . . . ; and Xn to be the
number on the nth (last) card selected.
Let’s begin by computing probabilities for X1.
Clearly,
P(X1 = 1) = s/N = p and
P(X1 = 0) = f/N = q.
We can present these equations in a table:
Value  Probability
  0        q
  1        p
This table presents the sampling distribution
for X1. Upon reflection, this table is the sam-
pling distribution for X2, X3, . . . , and Xn.
We summarize by saying that X1, X2, X3,
. . . , and Xn are identically distributed.
So, we can calculate probabilities for any in-
dividual card.
Next, we consider two cards simultaneously.
For example,
P(X1 = 1 and X2 = 0) = P(X1 = 1, X2 = 0).
The result is the multiplication rule for prob-
abilities. In this example, the multiplication
rule says:
P(X1 = 1, X2 = 0) =
P(X1 = 1)P(X2 = 0) = pq. In words, we re-
place the word ‘and’ by the operation of mul-
tiplying. (Similarly, recall, the addition rule
replaced ‘or’ by adding.)
My argument for justifying the multiplication
rule is a little tricky, so I will give it for a
specific example, p = 0.60. For this p,
P(X1 = 1, X2 = 0) = pq = 0.60(0.40) = 0.24.
A brute force argument is given on pages 153–
5 of the text. The argument below appeals
to the long-run interpretation of probability
given in Section 2.4.
Consider the chance mechanism of selecting
two cards at random w/replacement from the
box. Now, imagine operating this chance mech-
anism a large number of times.
We know that in the long-run approximately
60% of the operations will give a first card of
‘1.’ And of those operations that first give a
‘1,’ approximately 40% will give a second card
of ‘0.’ Thus, in the long-run, 40% of 60% of
the operations will give a '1' followed by a '0.'
Next, remember that a percent of a percent
is computed by converting to decimals and
multiplying:
0.40(0.60) = qp = 0.24.
The multiplication rule can be extended in two
directions.
First, it is true for any two cards, not just the
first two. Thus, for example,
P(X3 = 1, X7 = 1) = pp = p^2.
Second, it is true for more than two cards, for
example,
P(X3 = 1, X6 = 0, X7 = 1) = pqp = p^2q.
Do you recognize 4!?
This is read 4-factorial (you don’t need to
shout) and it is calculated by:
4! = 4(3)(2)(1) = 24.
Similarly,
3! = 3(2)(1) = 6 and
5! = 5(4)(3)(2)(1) = 120.
These guys get big fast; for example,
50! = 3.04 × 10^64. By special definition,
0! = 1.
Now, define X to be X1 + X2 + . . . + Xn.
Literally, X is the sum of the numbers on the
n cards selected. But b/c each card has a '1'
or a '0,' X can be interpreted as the total
number of successes in the sample.
The variable X is very important in scientific
applications. Thus, we would like to know its
sampling distribution. Fortunately, there is a
simple (?) formula for it, given on page 159
of the text:
P(X = x) = [n!/(x!(n − x)!)] p^x q^(n−x),
for x = 0, 1, . . . , n.
This is a pretty amazing formula. It works
for any choice of n and any value of p. It is
called the Binomial sampling distribution with
parameters n and p, written Bin(n, p).
In this class, you need to be able to evaluate
this formula with a hand calculator for n ≤ 6.
For example, suppose we select n = 5 cards
at random with replacement from a box with
p = 0.60. What is the probability we will get
a representative sample?
First, we realize that p̂ = x/5 will equal p =
0.60 if, and only if, x = 3. Thus, we want to
calculate P(X = 3):
P(X = 3) = [5!/(3!(5 − 3)!)](0.60)^3(0.40)^(5−3) =
[120/(6(2))](0.216)(0.16) = 0.3456.
For another example, for the same n and box,
let’s calculate P(X = 5). First, note that we
can calculate this by using the multiplication
rule. The event (X = 5) means that every
card is a ‘1.’ Thus,
P(X1 = 1, X2 = 1, X3 = 1, X4 = 1, X5 = 1) =
p^5 = (0.60)^5 = 0.0778.
Using the binomial formula we get:
P(X = 5) = [5!/(5!(5 − 5)!)](0.60)^5(0.40)^(5−5) =
[120/(120(1))](0.0778)(1) = 0.0778.
Note that we need the definition 0! = 1 so
that the formula works.
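Both of these computations can be checked with a few lines of code, using the standard library's `comb` and `factorial`:

```python
from math import comb, factorial

def binomial_prob(n, x, p):
    """P(X = x) for Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

assert factorial(5) // (factorial(3) * factorial(2)) == comb(5, 3) == 10

p3 = binomial_prob(5, 3, 0.60)    # the representative-sample probability
p5 = binomial_prob(5, 5, 0.60)    # matches the multiplication rule: 0.60**5

print(round(p3, 4), round(p5, 4))  # 0.3456 0.0778
```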
There are many statistical software packages
that will compute binomial probabilities for
us. For example, in the text on page 161 I
give the computer-generated probabilities for
Bin(25,0.50).
Once we have binomial probabilities, we can
draw probability histograms for binomials. Sev-
eral such pictures are given on pages 162–3
of the text.
These pictures illustrate the following facts
about the binomial.
• δ = 1; thus, the height of each rectangle
equals the probability of its center.
• Just like the pictures for FT, there is one
central peak (one or two, but never more,
rectangles wide) and the probabilities steadily
decrease as you move away from the peak.
• The ph is symmetric if, and only if, p = 0.50.
• For p ≠ 0.50, the ph is roughly symmet-
ric provided that both np and nq are 'large.'
People disagree on what 'large' means; most
say 5 or 10 or 15.
Yes, we can use a computer to calculate bino-
mial probabilities, but a computer should not
be seen as a panacea. For example, if I try
to use my software package for Bin(130,0.50)
I get an error message. Unless a program is
written with extreme care, the accurate cal-
culation of, say, n! for large n is difficult.
Thus, we might want to find a way to ap-
proximate binomial probabilities. Given the
similarities of the binomial ph to the ph for
FT, it is appealing to use the snc as an ap-
proximation.
To this end, note that it can be shown that
for Bin(n, p),
µ = np and σ = √(npq).
Thus, for example, Bin(100, 0.50) has:
µ = 100(0.50) = 50 and
σ = √[100(0.50)(0.50)] = √25 = 5.
As a result, it is easy to standardize X:
Z = (X − np)/√(npq).
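A sketch of why this approximation works; the tail probability P(X ≥ 60) for Bin(100, 0.50) is my example, not one from the text, and the correction uses g = δ/2 = 0.5 since δ = 1 here:

```python
from math import comb, erf, sqrt

n, p = 100, 0.50
mu, sigma = n * p, sqrt(n * p * (1 - p))    # mu = 50, sigma = 5

# Exact binomial tail probability P(X >= 60):
exact = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(60, n + 1))

# snc approximation with the continuity correction (delta = 1, so g = 0.5):
z = (60 - 0.5 - mu) / sigma                 # left endpoint of the x = 60 rectangle
approx = 0.5 * (1.0 - erf(z / sqrt(2.0)))

print(round(exact, 4), round(approx, 4))    # both close to 0.028
```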
On pages 164–166 of the text are probability
histograms for three Z’s, each with an snc
for comparison. Visually, it is clear that the
snc can give good approximations for Z, and
hence for X.
The details will not be given and you are not
responsible for them. This approximation will
be used in Chapter 6.
Section 5.3: Bernoulli Trials
Consider the following experiments:
–Julie spins in circles to the right;
–Clyde shoots free throws; and
–Bob repeatedly tosses a coin.
In each experiment, a person is conducting a
sequence of trials.
Consider the following question: Suppose that
on Monday, Clyde attempts 100 free throws
and achieves 77 successes. Clyde plans to at-
tempt 200 free throws on Tuesday.
What is the probability that he will make 150
or more free throws on Tuesday?
In order to answer this and similar questions,
we need a mathematical model for the pro-
cess that generates the results of the trials,
whatever that means.
We begin with a simple sequence of trials:
repeated tosses of a fair coin.
Think about this for a minute. What does it
mean to you when you read, ‘Repeated tosses
of a fair coin?’
In particular, I want us to write down mathe-
matical assumptions that describe this notion.
There are three assumptions needed.
1. Each toss results in one of two outcomes:
a heads or a tails.
2. The probability of heads is 0.50 for every
toss.
3. The tosses exhibit ‘no memory.’
We have studied the chance mechanism of
selecting cards at random, with replacement,
from a population box. For this CM, we learned
two very useful techniques: the multiplication
rule for a particular sequence of outcomes and
the binomial sampling distribution for the to-
tal number of successes.
Is there any relationship between the assump-
tions above and selecting cards from a box?
Well, imagine a box with two cards, one card
marked ‘1’ for heads and the other marked
‘0’ for tails. Suppose that we select cards at
random with replacement from this box.
I claim that this selection of cards from this
box satisfies the three assumptions given above.
Discuss.
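If you would like to "toss" this box on a computer, here is a minimal Python sketch (the seed is arbitrary, chosen only so the run is reproducible):

```python
import random

random.seed(0)   # arbitrary seed, only for reproducibility
box = [1, 0]     # '1' for heads, '0' for tails

# Select 10,000 cards at random, with replacement, from the box
tosses = [random.choice(box) for _ in range(10_000)]

# The proportion of heads should be close to 0.50
print(sum(tosses) / len(tosses))
```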
Thus, we can perform the following compu-
tations:
–If I toss a fair coin four times, the probability
I get all heads is:
P(H, H, H, H) = (0.50)^4 = 0.0625.
–If I toss a fair coin eight times, the probability that I get a total of exactly six heads is:
[8!/(6!2!)](0.50)^8 = 0.1094.
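Both computations can be checked with a short Python sketch (not part of the course):

```python
from math import comb

# All heads in four tosses of a fair coin: (0.50)^4
print(0.50 ** 4)  # 0.0625

# Exactly six heads in eight tosses: [8!/(6!2!)](0.50)^8
print(round(comb(8, 6) * 0.50 ** 8, 4))  # 0.1094
```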
Next, we generalize the above assumptions.
+ 132
+ +
Suppose that we have a sequence of trials. If
they satisfy the following three assumptions,
then we say that we have Bernoulli Trials
(BT).
1. Each trial results in one of two outcomes:
a success or a failure.
2. The probability of success equals p for ev-
ery trial.
3. The trials exhibit ‘no memory.’
As argued above for a coin, BT are mathe-
matically equivalent to selecting cards at ran-
dom with replacement from Box(N ; p).
For example, Katie is a very good free throw
shooter. On the assumption that Katie’s free
throws are BT with p = 0.85,
+ 133
+ +
we can calculate the following probabilities.
–If Katie shoots three free throws, the prob-
ability she makes all three is:
P(S, S, S) = (0.85)^3 = 0.6141.
–If Katie shoots ten free throws, the proba-
bility she makes a total of exactly nine is:
[10!/(9!1!)](0.85)^9(0.15) = 0.3474.
To summarize, if we are told that we have BT
and we are told the value of p, then we can
calculate probabilities about the outcomes of
the trials.
What are the difficulties with this?
Well, education should be more than learning
to obey authority figures! I acknowledge this
above by prefacing my computation by saying,
‘On the assumption . . . are BT,’ but we can
(sometimes) do more.
+ 134
+ +
If we have previous outcomes of the trials,
we can use these data to investigate (not
determine!) whether the assumptions of BT
seem reasonable.
I offer two ways to investigate. One way
is designed to explore the second assump-
tion (constancy of success probability) and
the other focuses on the third assumption
(lack of memory).
To be precise, suppose we have observed the
following results of n = 10 trials:
1 0 0 0 1 1 0 0 0 1.
We investigate constancy by creating the fol-
lowing table:
Half    S   F   Total   p̂
1       2   3   5       0.40
2       2   3   5       0.40
Total   4   6   10
+ 135
+ +
We need to be careful. We want to know
whether p remains constant, but we never get
to see p. We can see the p̂’s. For the table
above,
p̂1 = p̂2 = 0.40.
B/c the p̂’s do not change from the first half
to the second half, there is no evidence that
p has changed.
Of course, this argument is not conclusive.
We could make it more formal, but that is
not my goal. I encourage you to think of this
as an ‘informal hypothesis test’ for which the
null hypothesis is that p remains constant.
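Here is a minimal Python sketch of the halves computation (not part of the course):

```python
trials = [1, 0, 0, 0, 1, 1, 0, 0, 0, 1]

# Split the n = 10 trials into halves and compute p-hat for each half
half = len(trials) // 2
for label, chunk in (("1", trials[:half]), ("2", trials[half:])):
    s = sum(chunk)          # successes in this half
    f = len(chunk) - s      # failures in this half
    print(label, s, f, len(chunk), round(s / len(chunk), 2))
```

Each line reproduces one row of the table above: 2 successes, 3 failures, p̂ = 0.40.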
Let’s look at the trials again:
1 0 0 0 1 1 0 0 0 1.
Do you see memory?
Well, the last five trials yield exactly the same
outcomes as the first five, so this looks like
memory. But it could be simply the result of
chance.
+ 136
+ +
But now you need to be a scientist. Does
this ‘5 step’ memory make any sense to you?
Discuss.
Often, what makes sense is ‘1 step’ memory;
i.e. memory in which a current trial is influ-
enced by the outcome of the trial immediately
before it.
We take the 10 trials and form 9 ‘overlapping
pairs.’ (It is too difficult to do this w/my word
processor. I will illustrate it on the board.)
We create the following table. Note that in
this table we are counting pairs not trials.
                 Current Trial
Previous Trial   S   F   Total   p̂
S                1   2   3       0.33
F                2   4   6       0.33
Total            3   6   9
+ 137
+ +
B/c the two p̂’s are equal, there is no evidence
of 1 step memory. Discuss.
I want to show you an easier and faster way
to create the above memory table.
First, we know that the number of pairs will
be (n−1), which is nine for these data. Also,
we know from previously comparing halves that
there is a total of four successes and six fail-
ures. We put these numbers in the margins,
yielding the partial table below.
                 Current Trial
Previous Trial   S   F   Total   p̂
S                        4
F                        6
Total            4   6   9
A difficulty with this table, of course, is that
4 + 6 = 10, not 9. But look at the first trial
in the sequence, a success.
+ 138
+ +
The first trial appears in only one pair b/c it
has no trial before it. In other words, the first
trial never ‘gets to be’ current in the mem-
ory table. Thus, even though there are four
successes in the sequence, only three of them
are current. Thus, we change the column to-
tal for S to 3.
Similarly, the last trial, also an S, never gets
to be previous. Thus, we subtract one from
the row total for S to get the following partial
table, which has the correct margins.
                 Current Trial
Previous Trial   S   F   Total   p̂
S                        3
F                        6
Total            3   6   9
Now, we simply determine by counting one of
the entries in the table, and obtain the others
by subtracting.
For example, the pair SS (11) occurs exactly
once in the sequence.
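A short Python sketch of this pair-counting (the S/F labels correspond to the 1/0 coding of the trials):

```python
from collections import Counter

trials = [1, 0, 0, 0, 1, 1, 0, 0, 0, 1]

# Count the (n - 1) = 9 overlapping (previous, current) pairs
pairs = Counter(zip(trials, trials[1:]))

for prev, label in ((1, "S"), (0, "F")):
    s = pairs[(prev, 1)]    # previous -> success
    f = pairs[(prev, 0)]    # previous -> failure
    print(label, s, f, s + f, round(s / (s + f), 2))
```

Each printed line is one row of the memory table; in particular the pair SS, i.e. (1, 1), is counted exactly once.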
+ 139
+ +
Here is another example with n = 20:
1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0.
These data yield the following tables.
Half    S    F    Total   p̂
1       10   0    10      1.00
2       0    10   10      0.00
Total   10   10   20
                 Current Trial
Previous Trial   S   F    Total   p̂
S                9   1    10      0.90
F                0   9    9       0.00
Total            9   10   19
Discuss.
Over the years, I have had trouble convinc-
ing my students that BT exist in the world.
(Math majors are extremely willing to believe
assumptions; nonmath majors are not.)
I decided to use the video game Tetris to con-
vince my students that BTs do exist. Sadly,
as we will see, I failed miserably.
+ 140
+ +
I collected these data circa 1990 using a Nin-
tendo system played on my television.
A trial is a ‘shape’ of blocks falling from the
top of the screen.
First difficulty: There are 7, not 2, shapes.
Solution: Define ‘log’ to be an S; all others
are F’s.
I played 8 games and observed n = 1872 tri-
als. Instead of dividing the trials into halves,
I created the following table:
Play      S     F      Total   p̂
First     22    171    193     0.114
Second    33    185    218     0.151
Third     36    215    251     0.143
Fourth    25    206    231     0.108
Fifth     30    198    228     0.132
Sixth     42    220    262     0.160
Seventh   33    215    248     0.133
Eighth    33    208    241     0.137
Total     254   1618   1872    0.136
+ 141
+ +
[Plot: p̂ (vertical axis, 0.00 to 0.16) versus Play (horizontal axis, 1 to 8).]
Looking at this table and its plot, I was feeling
pretty good. There is some evidence that p
is not constant, but the evidence seems weak
to me. (A formal HT of the null hypothesis
that p is constant versus the alternative that
it can change every game—see Chapter 11 for
details—gives a very large P-value, 0.751.)
But then I created the memory table:
                  Current Shape
Previous Shape    S     F      Total   p̂
Success           2     251    253     0.008
Failure           249   1362   1611    0.155
Total             251   1613   1864    0.135
+ 142
+ +
Note: The grand total for this table is (n−8)
b/c we lose one pair for each play. (I believe
that it does not make sense to look for mem-
ory by pairing the last trial of one game with
the first trial of the next game.)
It is obvious (?) that there is memory. (The
P-value is 1.1 × 10^(−14).)
One of my favorite projects: Describe the pet-
turtle study.
Section 5.4: Some Practical Considerations
The big result of Chapter 5 is the multiplica-
tion rule, from which we obtain the Binomial
and many other results not discussed in this
course.
For a finite population, we get the MR if we
sample at random w/replacement and for an
infinite population if we have BTs.
+ 143
+ +
B/c humans, and I would say especially math-
ematicians, like answers, there is a tremen-
dous pressure to assume/pretend/deceive that
we have the MR.
In my experience, it is very common for re-
searchers to claim that they have a random
sample from a finite population, even when
they clearly do not.
Examples: Pretty much every survey you ever
read about. My Wisconsin DOT study. (Dis-
cuss.)
Birthdays. Below is a very famous table in
probability theory: for n people, it gives the
probability that at least two share a birthday.
+ 144
+ +
n    Prob.    n    Prob.    n    Prob.
2    0.0027   13   0.1939   24   0.5374
3    0.0082   14   0.2226   25   0.5677
4    0.0163   15   0.2523   26   0.5972
5    0.0271   16   0.2829   27   0.6258
6    0.0404   17   0.3143   28   0.6534
7    0.0561   18   0.3461   29   0.6799
8    0.0741   19   0.3783   30   0.7053
9    0.0944   20   0.4106   31   0.7295
10   0.1166   21   0.4428   32   0.7524
11   0.1408   22   0.4748   33   0.7740
12   0.1666   23   0.5063   34   0.7944

n    Prob.    n    Prob.
35   0.8135   46   0.9478
36   0.8313   47   0.9544
37   0.8479   48   0.9602
38   0.8633   49   0.9654
39   0.8775   50   0.9701
40   0.8905   55   0.9861
41   0.9025   60   0.9940
42   0.9134   70   0.9991
43   0.9234   80   0.9999
44   0.9324   90   1.0000
45   0.9405
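The table's entries appear to treat all 366 possible birthdays as equally likely (that assumption is mine, inferred from the values); under it, a minimal Python sketch of the computation:

```python
def birthday_prob(n, days=366):
    """P(at least two of n people share a birthday), assuming all
    days equally likely and the birthdays independent."""
    p_distinct = 1.0
    for k in range(n):
        p_distinct *= (days - k) / days  # k-th person avoids the first k birthdays
    return 1.0 - p_distinct

print(round(birthday_prob(23), 4))  # matches the table's entry for n = 23
```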
+ 145
+ +
Probabilities must be computed before the
CM is operated. But despite my feeling this
way, many people try to calculate probabili-
ties after the CM has been operated. In my
experience, the huge majority of such compu-
tations lead to gibberish.
There are three major mistakes that people
make. I call them:
–ignoring failed attempts;
–focusing on a winner; and
–inappropriate use of ELC.
Multiple lotto winner. Said to beat the odds
of one in 16 trillion. Discuss.
+ 146
+ +
Discuss batting order example.
It was reported on TV that the probability of
this happening was 1 in 1.317 × 1011.
Let’s put that number in perspective. There
are currently 30 teams in MLB, each playing 162
games per year, for a total of 2430 games per
year. The above event is, thus, expected to
happen once every 54,190,080 years. I really
doubt that MLB will survive that long!
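The arithmetic behind that claim, as a sketch (the reported odds are simply taken at face value):

```python
reported_odds = 1.317e11          # the reported 1-in-... odds
games_per_year = 30 * 162 // 2    # 4860 team-games shared by 2 teams = 2430 games

# Waiting time, in years, for a 1-in-reported_odds event
# when there are 2430 independent chances per year
years = reported_odds / games_per_year
print(f"about {years:.3g} years")  # roughly 54 million years
```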
+ 147
+ +
Yahtzee. Assuming that the five dice are bal-
anced (ELC) and act independently (no memory from die to die), the probability of
getting a Yahtzee on a single throw of the
five dice is:
(1/6)^4 = 1/1296.
Thus, if you pick up five dice and throw a
Yahtzee, that is pretty great.
But suppose you toss the dice 10 times per
minute for two hours and get a Yahtzee; what
then?
Suppose that 12,960 people each throw five
dice at once. Do you think there will be any
Yahtzees? How many?
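As a check on the Yahtzee arithmetic:

```python
# Probability of a Yahtzee on one throw of five balanced dice:
# the first die can be anything; each of the other four must match it.
p_yahtzee = (1 / 6) ** 4   # 1/1296, about 0.00077

# Expected number of Yahtzees among 12,960 independent throws
print(round(12960 * p_yahtzee, 6))  # 10.0
```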
+ 148