Probability - Simon Fraser University

Probability

Role of probability in statistics:

Gather data by probabilistic (random) mecha-

nism.

Use probability to predict results of experiment

under assumptions.

Compute probability of error larger than given

amount.

Compute probability of given departure between

prediction and results under assumption.

Decide whether or not assumptions likely real-

istic.

101

Outline in randomized clinical trial context: Salk

vaccine.

Suppose vaccine useless – no cases of polio

prevented, none caused.

Between treatment and control total of 56+142=198

cases.

Each child assigned to treatment / control with

chance 1/2.

Is a 56 – 142 split likely in 198 coin tosses?

Actual chance of 56 or fewer heads in 198

tosses is:

4× 10−10

(4 in 10 billion).

Conclusion. Random assignment almost cer-

tainly not explanation for difference.

102

Chance of exactly 56 heads? 2.6× 10−10

Chance of exactly 99 heads? 0.056 or so.

No outcome particularly likely.

Evidence for effective vaccine: low number of

cases in treatment group.

We judge evidence against hypothesis of no

vaccine effect by computing:

Probability of getting evidence against hypoth-

esis as strong as, or stronger than, the evidence

we did get assuming hypothesis true.

Literary technique: foreshadowing – we come

back to this logic in Chapter 14,

103

Where did all these numbers come from?

Rules, jargon of probability:

Do chance / random experiment: toss thumb-

tack on each screen.

Possible outcomes: each tack can land point

up or tipped over.

The Sample Space

Screen ShortOutcome number Left Right hand1 up up UU2 up over UO3 over up OU4 over over OO

Sample space has 4 elements: {UU,UO,OU,UU}.

104

Probability: assign to each possible a number,

the probability of that outcome.

Rules for the numbers:

all probabilities are greater than or equal to 0.

all probabilities are less than or equal to 1.

the probabilities to each possible outcome must

add up to 1.

We assign probabilities to other things like:

Probability the two thumbtacks land opposite

to each other?

P(UO) + P(OU)

P is for probability.

105

BUT: how do we assign the four numbers?

Make simple assumptions.

Use rules of probability.

Simpler experiment: toss coin on each screen.

Possible outcomes:

{HH,HT, TH, TT}

Use interpretation of probability: P(HH) is

long-run fraction of times HH occurs.

In the long run which of four possibilities should

occur most often?

None. H should be about as common as T

on each screen and when H shows up on left

screen H and T should be equally common on

right screen.

Argument by symmetry: all four outcomes

have same chance.

106

So:

P(HH) = P(HT) = P(TH) = P(TT) =1

4

Does the same go for thumbtacks?

NO: thumbtack not symmetric in the same

way.

But: suppose long run fraction of U on left

were say 1/3 and long run fraction of U on

right were say 1/4. (Different tacks!)

Long run fraction of UU would be

1

3× 1

4

“Of the trials where left tack turns out U ,

should see 1/4 have U on right screen — no

contact/influence between screens.”

Jargon: left, right screen results independent.

107

This would lead to

Outcome UU UO OU OO

Prob 112

312

212

612

Where do the 1/3 and 1/4 come from?

Experimentation – empirical measurement.

Hypothesis / theory – as in genetics.

Symmetry (not in this example, though)

Physics, maybe.

Only empirical method likely to work in this

experiment.

108

Return to Salk trial. If no vaccine effect then:

number of cases in treatment like number of

heads in 198 tosses of fair coin. (Treat number

of cases as predestined.)

How do we compute chance of 56 heads in 198

tosses.

Method 1: exact calculation based on rules of

probability.

Leads to Binomial distribution.

Gives answer

198× 197× · · · × 1

(56× · · · × 1)× (142× · · · × 1)× 1

2198

109

Commentary:

Uselessly difficult to compute, even with cal-

culator.

Care need to compute accurately on computer.

Can do just as well with a normal approxima-

tion.

Method two: use central limit theorem.

Next: some details of Method 1.

Then: details of Method 2.

110

Some examples to illustrate ideas underlying

binomial distribution.

Toss coin twice.

The Sample Space

Toss #1 2 HeadsT T 0T H 1H T 1H H 2

111

Convert to chances for 0, 1, 2 heads:

Distribution of Number of Heads

# Heads 0 1 2

Probability 14

24

14

Notice: 22 = 4 possible outcomes.

All chances of form:

# of ways divided by 4.

112

Next three tosses:

The Sample Space

Toss #1 2 3 Heads ProbT T T 0 1/8T T H 1 1/8T H T 1 1/8H T T 1 1/8T H H 2 1/8H T H 2 1/8H H T 2 1/8H H H 3 1/8

113

Distribution of Number of Heads

# Heads 0 1 2 3

Probability 18

38

38

18

Notice: 23 = 8 possible outcomes.

All chances of form:

# ways / 8.

114

In general: toss coin n times.

2n possible outcomes.

each chance is 2n.

Count up number of ways to get desired num-

ber of heads.

Chance of that many heads is

# ways / 2n.

115

What if it had been a thumbtack?

More difficult to compute chances.

Use rules:

Jargon “get 56 heads” is an event — collec-

tion of all outcomes where there are 56 heads

and 142 tails.

Simpler case:

Toss coin three times:

Some events:

Get 2 heads {HHT,HTH, THH}Get 0 heads {TTT}Get at least 2 heads {HHT,HTH, THH,HHH}

116

Rules of probability: in following A,B short-

hand for two events. S is collection of all pos-

sible outcomes, the sample space.

1) 0 ≤ P(A) ≤ 1

2) P(S) = 1.

3) P(A doesn’t happen) = 1− P(A).

4) if it is impossible for both A and B to happen

at the same time and C is the event “either A

happens or B happens or both happen” then

P(C) = P(A) + P(B)

5) if A and B are independent and D is the

event “both A and B happen” then

P(D) = P(A)× P(B)

4) is the addition rule

5) is the multiplication rule.

117

Independent means: whether or not A hap-

pens doesn’t influence chance B happens.

Example: back to two thumbtacks.

A is “left tack lands U”

B is “right tack lands U”

Suppose P(A) = 1/3 and P(B) = 1/4.

If I tell you A happened then you should still

think probability that B will happen is 1/4 –

no way for result of left toss to influence result

of right toss.

D is event both A and B happen, that is, get

UU .

Then

P(D) = P(A)× P(B) =1

3× 1

4=

1

12

118

Recognizing use of Binomial distribution:

1) repeat same basic experiment a fixed num-

ber, n say, of times.

2) each trial results either in something hap-

pening (called “SUCCESS” or ’S’) or not hap-

pening (“FAILURE” or ‘F’);

3) trials independent. Outcome of one does

not influence any other.

4) prob of success on each trial is same, say p.

If so then

P(exactly k successes)

=n!

k!(n− k)!pk(1− p)n−k

Notation in other books:

n!

k!(n− k)!=

(n

k

)

= nCk = Cnk .

119

In this course: we don’t memorize the formula.

How is formula derived: use rules of probability

above and combinatorics or counting.

Graphical presentation of binomial probabili-

ties: draw histogram. Area of bar = chance of

corresponding value.

Example: toss coin twice; n = 2, p = 1/2.

−0.5 0.0 0.5 1.0 1.5 2.0 2.5

0.0

0.2

0.4

120

Total area of all bars is 1.

n = 3, p = 0.5

0 1 2 3

0.0

0.2

121

n = 16, p = 0.5

2 4 6 8 10 12 14

0.00

0.10

0.20

n = 64, p = 0.5

20 25 30 35 40 45

0.00

0.06

122

n = 256, p = 0.5

110 120 130 140 150

0.00

0.03

123

Now try p = 1/4.

n = 2, p = 1/4.

−0.5 0.0 0.5 1.0 1.5 2.0 2.5

0.0

0.2

0.4

n = 3, p = 1/4

0 1 2 3

0.0

0.2

0.4

124

n = 16, p = 1/4

0 2 4 6 8

0.00

0.10

0.20

n = 64, p = 1/4

5 10 15 20 25

0.00

0.06

125

n = 256, p = 1/4

50 60 70 80

0.00

0.03

126

Notice increasing symmetry.

Notice general shape of normal curve.

Now superimpose normal curves!

Idea: compute probabilities by adding up areas

of bars or make normal approximation.

To do so: need mean and standard deviation

of histogram!

Mean is µ = np

SD is σ =√

np(1− p).

127

Superimpose normal curves. p = 0.5

n = 2

−0.5 0.0 0.5 1.0 1.5 2.0 2.5

0.0

0.2

0.4

n = 3

0 1 2 3

0.0

0.2

0.4

128

n = 16

2 4 6 8 10 12 14

0.00

0.10

0.20

n = 64

20 25 30 35 40 45

0.00

0.06

n = 256

110 120 130 140 150

0.00

0.03

129

p = 0.25

n = 2

−0.5 0.0 0.5 1.0 1.5 2.0 2.5

0.0

0.3

0.6

n = 3

0 1 2 3

0.0

0.2

0.4

130

n = 16

0 2 4 6 8

0.00

0.10

0.20

n = 64

20 25 30 35 40 45

0.00

0.06

n = 256

110 120 130 140 150

0.00

0.03

131

Making normal approximations to compute bi-

nomial probabilities:

Identify range:

Compute mean and standard deviation.

Convert range to standard deviation units.

Look up area in normal table.

132

Example:

Probability of 1 or 2 heads in 3 tosses of fair

coin.

1) Range is 0.5 to 2.5. Reason for halfs: edges

of bars at -0.5,0.5,1.5, and so on.

2) Mean is µ = np = 3 ∗ 0.5 = 1.5

SD is σ =√

3 ∗ 0.5 ∗ (1− 0.5) = 0.866.

Convert to standard deviation units:

0.5− 1.5

0.866to

2.5− 1.5

0.866

or -1.15 to 1.15.

Look up area: 0.7499≈0.75.

Correct answer is 3/4=0.75.

133

Salk vaccine examples.

If the vaccine is ineffective then number of po-

lio cases in treatment group is like number of

heads in 198 tosses of a fair coin.

That is: Binomial distribution for number of

cases in treatment.

Reasoning: 198 cases destined regardless of

outcome of randomization. Each case assigned

to treatment with same chance, 1/2. The

cases are assigned independently.

Chance of 56 heads or fewer in 198 tosses:

Limits: 56.5 or fewer.

Mean is µ = 198 ∗ 0.5 = 99.

SD is σ =√198 ∗ 0.5 ∗ 0.5 = 7.04.

134

Convert 56.5 to standard deviation units:

56.5− 99

7.03= −6.04

Off end of the tables!

Chance ≤ 0.0003 from tables.

But actual chance from software is: 7.7×10−10.

Interpretation: either the hypothesis above (where

I wrote ‘if’) is wrong or something extremely

unlikely has happened; conclude hypothesis of

no treatment effect is wrong.

135

Why not compute chance of exactly 56 heads

instead of 56 or fewer?

Another example:

Imagine toss coin 10,000 times. Chance of

exactly 5,000 heads?

Range is 4999.5 to 5000.5.

Mean is µ = 5000.

SD is σ =√10000 ∗ 0.5 ∗ 0.5 = 50.

Convert range to standard units:

4999.5− 5000

50to

5000.5− 5000

50

136

This is -0.01 to 0.01 so chance is approxi-

mately 0.0080.

Notice: even most likely outcome is not very

likely!

In hypothesis testing we compute chance of

results “as extreme as or more extreme than’

the results we actually got assuming some hy-

pothesis is true. If the chance comes out small

we conclude the hypothesis is (likely) not true.

137

Documents

Probability - Simon Fraser University