MCB 432 Basic Probability: Considering the Odds

Probability

The probability of an event is usually represented as P(event). For tossing a fair coin:

P(head) = 1/2
P(tail) = 1/2

When throwing a fair 6-sided die, the probability of any specific number is 1/6. So,

P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6.

Combining Alternatives

If there are multiple mutually exclusive ways to arrive at a goal, the combined probability is the sum of the individual probabilities. That is,

P(A or B) = P(A) + P(B)

Because the outcomes of a single throw are mutually exclusive (you cannot get a 1 and a 2 on the same throw), the probability of a die throw yielding a 1 or a 2 is:

P(1 or 2) = P(1) + P(2) = 1/6 + 1/6 = 2/6 = 1/3

Similarly, the probability of an even number is:

P(even) = P(2) + P(4) + P(6) = 1/6 + 1/6 + 1/6 = 3/6 = 1/2
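The addition rule above can be checked with exact fractions. This is a minimal sketch; the dictionary name `p` and the variable names are illustrative choices, not part of the original slides.

```python
from fractions import Fraction

# Fair six-sided die: each face has probability 1/6 (as stated in the text).
p = {face: Fraction(1, 6) for face in range(1, 7)}

# Mutually exclusive outcomes: add the individual probabilities.
p_1_or_2 = p[1] + p[2]
p_even = sum(p[face] for face in (2, 4, 6))

print(p_1_or_2)  # 1/3
print(p_even)    # 1/2
```

Using `Fraction` rather than floating point keeps the results in the same form as the hand calculation.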

Something Happens

A useful rule (meaning that you will use it later in the course) is that the sum of the probabilities of all possible outcomes is 1 (something will happen). This is why the probability of a given number on a die is 1/6 (it is trivial in this case, but it is very useful in other circumstances). It comes from:

P(1) + P(2) + P(3) + P(4) + P(5) + P(6) = 1

By the definition of a fair die, all outcomes are equally likely:

P(1) = P(2) = P(3) = P(4) = P(5) = P(6)

So we can replace P(2) etc. with P(1):

P(1) + P(1) + P(1) + P(1) + P(1) + P(1) = 1
6 x P(1) = 1
P(1) = 1/6 [= P(2) = P(3) = P(4) = P(5) = P(6)]

Combinations of Events

To get the probability that both of two separate (independent) things will happen, we take the product of the probabilities. That is,

P(A and B) = P(A) x P(B)

For example, when throwing a die two times, the probability of getting 1's both times is the probability of a 1 followed by a 1, which is:

P(1,1) = P(1) x P(1) = 1/6 x 1/6 = 1/36

The same concept applies to the probability of a 1 followed by a 2:

P(1,2) = P(1) x P(2) = 1/6 x 1/6 = 1/36

Alternative Combinations of Events

The previous case specified a 1 followed by a 2. What if all we want is that the total of the dice is 3? There are two ways to get a total of 3: 1 then 2, or 2 then 1.

P(total 3) = P(1,2) + P(2,1) = 1/6 x 1/6 + 1/6 x 1/6 = 1/36 + 1/36 = 2/36 = 1/18

If order does not matter, then we need to consider all possible orders. All alternatives that arrive at the goal must be considered. When throwing two dice, there are six ways for the dice to total 7: 1 & 6, 2 & 5, 3 & 4, 4 & 3, 5 & 2 or 6 & 1. Thus the probability of the total of two throws being 7 is:

P(total 7) = P(1,6) + P(2,5) + P(3,4) + P(4,3) + P(5,2) + P(6,1) = 6 x (1/6 x 1/6) = 6 x 1/36 = 1/6
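One way to confirm that no alternative has been missed is to enumerate all 36 ordered outcomes of two throws. The sketch below does that; the function name `p_total` is hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

def p_total(target):
    """Probability that two fair dice sum to `target`, by enumerating ordered throws."""
    outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely ordered pairs
    hits = [o for o in outcomes if sum(o) == target]
    return Fraction(len(hits), len(outcomes))

print(p_total(3))  # 1/18
print(p_total(7))  # 1/6
```

Each ordered pair has probability 1/36 (the product rule), and the matching pairs are mutually exclusive, so counting them is the same as adding their probabilities.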

Russian Roulette

Russian Roulette is "played" with a 6-shot revolver with 5 empty chambers and a live cartridge in the sixth chamber. The cylinder is spun at the beginning of the turn, so there is a 1/6 chance that the gun is ready to fire, and a 5/6 chance that the cylinder has stopped at an empty chamber. The player points it at his or her head and pulls the trigger. If a player has played 3 times, what is the probability that he or she is still alive?

View 1 (the textbook answer): There is a 5/6 chance of being alive after a turn. Therefore the probability of being alive after 3 turns is

P(alive) = 5/6 x 5/6 x 5/6 = 125/216 ≈ 0.579

View 2 (the logical, from my perspective, answer): In order to play the third time, the player survived the first 2 rounds. Therefore the probability of being alive, given that the third turn was played at all, is

P(alive) = 1 x 1 x 5/6 = 5/6 ≈ 0.833
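The two views differ only in what is taken as already known. A small sketch, with variable names that are illustrative assumptions, treats View 2 as the conditional probability of surviving three turns given survival of the first two:

```python
from fractions import Fraction

p_survive_turn = Fraction(5, 6)  # the cylinder is re-spun before every turn

# View 1: probability of surviving all three turns, judged before the game starts.
p_view1 = p_survive_turn ** 3

# View 2: the same event, conditional on the first two turns having been survived.
p_view2 = p_survive_turn ** 3 / p_survive_turn ** 2

print(p_view1)  # 125/216
print(p_view2)  # 5/6
```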

Poker

Poker is played with a 52-card deck, with 13 different card types in each of 4 suits. We will consider hands of 5 cards, with no replacements. The strength of a hand is determined by special combinations of cards with the following rankings:

1 pair (two-of-a-kind)
2 pairs
3-of-a-kind
straight (five consecutive cards from the sequence 2-3-4-5-6-7-8-9-10-J-Q-K-A)
flush (5 cards in same suit)
full house (3-of-a-kind and a pair)
four-of-a-kind
straight flush (straight with all cards in same suit)

Poker Probability of a Pair

With 5 random cards, a pair can be arrived at by several different paths. It could be the first 2 cards drawn, the first and third, first and fourth, etc. There are 10 alternatives (permutations) of this sort. The second factor is that of getting 2 that are the same, and 3 other cards that are all different. Let's work this out for the case of drawing the pair in the first 2 cards.

For the first card, any card will do:

P = 52/52

The second card must be one of the 3 remaining that match the first card:

P = 3/51

The third card must be one of the 48 that do not match the pair:

P = 48/50

The fourth card must be one of the 44 that do not match the previous (2 of 49 match the pair, 3 of 49 match the 3rd card):

P = 44/49

The fifth card must be one of the 40 that do not match the previous:

P = 40/48

The overall probability for this sequence is:

P = 52/52 x 3/51 x 48/50 x 44/49 x 40/48
  = (3 x 44 x 40)/(51 x 50 x 49)
  = (44 x 4)/(17 x 5 x 49)
  = 176/4165 ≈ 0.0423

What if we were to draw it in a different sequence, say getting the pair on the second and fourth cards?

For the first card, any card will do:

P = 52/52

The second card (the first of the pair) must be different:

P = 48/51

The third card must be different from both of the previous cards:

P = 44/50

The fourth card must match the second:

P = 3/49

The fifth card must not match the previous:

P = 40/48

Same numerators and denominators, just a different order.

There are 10 different ways to arrive at exactly one pair. If A, B, C and D are 4 different card types, we can get:

AABCD, ABACD, ABCAD, ABCDA, BAACD, BACAD, BACDA, BCAAD, BCADA, or BCDAA

Each specific order has a probability of 176/4165, so the overall frequency of drawing a pair (but no better) is:

P = 10 x 176/4165 = 1760/4165 ≈ 0.423
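The path-by-path result can be cross-checked against a direct count of hands. The sketch below is one way to do it; the counting formula (choose the paired type, 2 of its 4 cards, 3 other types, and one card of each) is a standard alternative and is not taken from the slides, and all variable names are illustrative.

```python
from fractions import Fraction
from math import comb

# One specific order (pair in the first two cards), multiplying the five draws.
p_sequence = (Fraction(52, 52) * Fraction(3, 51) * Fraction(48, 50)
              * Fraction(44, 49) * Fraction(40, 48))

# The pair can occupy any 2 of the 5 draw positions.
n_orders = comb(5, 2)
p_pair = n_orders * p_sequence

# Cross-check by counting hands: type of the pair, 2 of its 4 cards,
# 3 other types out of the remaining 12, and 1 of 4 cards for each of them.
n_hands = 13 * comb(4, 2) * comb(12, 3) * 4**3
p_check = Fraction(n_hands, comb(52, 5))

print(p_sequence)         # 176/4165
print(float(p_pair))      # roughly 0.4226
print(p_pair == p_check)  # True
```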

Poker Probability of 3-of-a-Kind

3-of-a-kind is 3 cards of one type, and two cards that do not match the 3, or each other. Let's work this out for the case of completing it in the first 3 cards.

For the first card, any card will do:

P = 52/52

The second card must match the first card:

P = 3/51

The third card must match the first card:

P = 2/50

The fourth card must be different:

P = 48/49

The fifth card must be different:

P = 44/48

The overall probability for this sequence is:

P = 52/52 x 3/51 x 2/50 x 48/49 x 44/48
  = (3 x 2 x 44)/(51 x 50 x 49)
  = 44/(17 x 25 x 49)
  = 44/20825 ≈ 0.00211

There are 10 alternative orders of completing it:

AAABC, AABAC, AABCA, ABAAC, ABACA, ABCAA, BAAAC, BAACA, BACAA, and BCAAA

The probability for arriving at 3-of-a-kind in any order is:

P = 10 x 44/20825 = 440/20825 ≈ 0.0211
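The count of 10 orders follows from choosing which 3 of the 5 draws hold the matching cards. A brief sketch, with illustrative names, that generates those positions and repeats the multiplication:

```python
from fractions import Fraction
from itertools import combinations

# Probability of one specific order (the three matching cards drawn first).
p_sequence = (Fraction(52, 52) * Fraction(3, 51) * Fraction(2, 50)
              * Fraction(48, 49) * Fraction(44, 48))

# Each order is fixed by which 3 of the 5 draw positions hold the matching cards.
orders = list(combinations(range(5), 3))
p_three = len(orders) * p_sequence

print(len(orders))  # 10
print(p_sequence)   # 44/20825
print(float(p_three))
```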

Poker Probability of a Full House

A full house is 3-of-a-kind and a pair. Let's work this out for the case of drawing the 3-of-a-kind first, then the pair.

For the first card, any card will do:

P = 52/52

The second card must match the first card:

P = 3/51

The third card must match the first card:

P = 2/50

The fourth card must be different:

P = 48/49

The fifth card must match the fourth:

P = 3/48

The overall probability for this sequence is:

P = 52/52 x 3/51 x 2/50 x 48/49 x 3/48
  = (3 x 2 x 3)/(51 x 50 x 49)
  = 3/(17 x 25 x 49) [exactly 3/44 the 3-of-a-kind value]
  = 3/20825 ≈ 0.000144

There are 10 alternative orders of completing it:

AAABB, AABAB, AABBA, ABAAB, ABABA, ABBAA, BAAAB, BAABA, BABAA, and BBAAA

The probability for arriving at a full house in any order is:

P = 10 x 3/20825 = 30/20825 ≈ 0.00144
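The claim that every order has the same probability can be checked mechanically. The sketch below walks through a pattern such as AABAB one card at a time; the helper `pattern_probability` is a hypothetical name, and it assumes that each distinct letter stands for a distinct card type.

```python
from fractions import Fraction

def pattern_probability(pattern):
    """Probability of drawing card types in exactly this order from a 52-card deck.

    Assumption: each distinct letter stands for a distinct card type.
    """
    seen = {}  # letter -> number of cards of that type drawn so far
    p = Fraction(1)
    for i, letter in enumerate(pattern):
        if letter in seen:
            favorable = 4 - seen[letter]      # remaining cards of that type
        else:
            favorable = 4 * (13 - len(seen))  # cards of any type not yet drawn
        p *= Fraction(favorable, 52 - i)      # 52 - i cards are left in the deck
        seen[letter] = seen.get(letter, 0) + 1
    return p

full_house_orders = ["AAABB", "AABAB", "AABBA", "ABAAB", "ABABA",
                     "ABBAA", "BAAAB", "BAABA", "BABAA", "BBAAA"]
p_full_house = sum(pattern_probability(o) for o in full_house_orders)

print(pattern_probability("AAABB"))  # 3/20825
print(float(p_full_house))
```

The same helper reproduces the one-pair sequence, since `pattern_probability("AABCD")` multiplies 52/52 x 3/51 x 48/50 x 44/49 x 40/48.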

Poker Probability of 4-of-a-Kind

Four-of-a-kind is 4 of the same type of cards. Let's work this out for the case of drawing the 4 of the same type first.

For the first card, any card will do:

P = 52/52

The second card must match the first card:

P = 3/51

The third card must match the first card:

P = 2/50

The fourth card must match the first card:

P = 1/49

The fifth card will be different (all of the kind are gone):

P = 48/48

The overall probability for this sequence is:

P = 52/52 x 3/51 x 2/50 x 1/49 x 48/48
  = (3 x 2)/(51 x 50 x 49)
  = 1/(17 x 25 x 49) [exactly 1/3 the full house value]
  = 1/20825 ≈ 0.0000480

There are 5 alternative orders of completing it:

AAAAB, AAABA, AABAA, ABAAA and BAAAA

The probability for arriving at 4-of-a-kind in any order is:

P = 5 x 1/20825 = 5/20825 ≈ 0.000240
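As a cross-check, four-of-a-kind hands can also be counted directly: 13 choices for the type and 48 choices for the odd card. That counting argument is a standard alternative rather than the method used in the slides, and the variable names below are illustrative.

```python
from fractions import Fraction
from math import comb

# Path method from the text: one order, times the 5 positions for the odd card.
p_sequence = (Fraction(52, 52) * Fraction(3, 51) * Fraction(2, 50)
              * Fraction(1, 49) * Fraction(48, 48))
p_four = 5 * p_sequence

# Counting method: 13 types for the four matching cards, 48 choices for the fifth.
p_check = Fraction(13 * 48, comb(52, 5))

print(p_sequence)         # 1/20825
print(p_four == p_check)  # True
```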

The low probabilities of exciting hands are one reason why televised poker is most commonly based on 7-card hands!

Poker Probability of 4-of-a-Kind in 7 Cards

So, what is the probability of 4-of-a-kind in 7 cards? Let's work this out for the case of drawing the 4-of-a-kind first. The first 5 cards are as above:

P = 52/52 x 3/51 x 2/50 x 1/49 x 48/48 = 1/20825

The sixth and seventh cards can be anything (you can only use 5 cards, so getting another pair does not change the hand):

P = 47/47 x 46/46

The overall probability for this sequence remains:

P = 52/52 x 3/51 x 2/50 x 1/49 x 48/48 x 47/47 x 46/46 = 1/20825 ≈ 0.0000480

The higher probability comes from the fact that there are now 35 possible orders in which the 4 matching cards can be drawn (not just 5 orders). Thus,

P = 35 x 1/20825 = 35/20825 ≈ 0.00168
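The factor of 35 is the number of ways to choose which 4 of the 7 draws hold the matching cards. A short sketch, with illustrative names, that also cross-checks the result by counting 7-card hands (13 types, then any 3 of the other 48 cards):

```python
from fractions import Fraction
from math import comb

p_sequence = Fraction(1, 20825)  # four matching cards drawn first, as in the text

n_orders = comb(7, 4)            # positions of the 4 matching cards among 7 draws
p_four_in_seven = n_orders * p_sequence

# Counting cross-check: choose the type, then any 3 of the remaining 48 cards.
p_check = Fraction(13 * comb(48, 3), comb(52, 7))

print(n_orders)                    # 35
print(p_four_in_seven == p_check)  # True
```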

Poker Probability of a Full House in 7 Cards

Let's work this out for drawing the 3-of-a-kind, then the pair. The first 5 cards are as above:

P = 52/52 x 3/51 x 2/50 x 48/49 x 3/48 = 3/20825

Now it gets uglier. There are 210 permutations for having both of the remaining cards different from both the 3-of-a-kind and the pair, with

P(full house) = 3/20825 x 44/47 x 43/46

But there are another 210 ways in which one of the remaining cards could match the pair (which does not improve the hand):

P(2 3-of-a-kind) = 3/20825 x 2/47 x 44/46

Putting everything together:

P = 210 x 3/20825 x (44/47 x 43/46 + 2/47 x 44/46)
  = 210 x 3/20825 x (44/47 x 45/46)
  = 210 x 5940/45023650
  ≈ 210 x 0.000132
  ≈ 0.0277

The Monty Hall Problem

You are offered three doors and asked to choose one. One of the two that you did not choose is opened, but this is never the Grand Prize. You are offered the chance to keep your original door, or switch to the other unopened door.

What is the optimal strategy: keep the door you originally chose, switch to the other unopened door, or does it not matter? Using the optimal strategy, what is the probability of winning the Grand Prize?

Initially,

P(car0) = 1/3, P(goat0) = 2/3

Keep the original door strategy:

P(car) = P(car0) x P(car0→car) + P(goat0) x P(goat0→car)
       = 1/3 x 1 + 2/3 x 0
       = 1/3

P(goat) = P(car0) x P(car0→goat) + P(goat0) x P(goat0→goat)
        = 1/3 x 0 + 2/3 x 1
        = 2/3

Switch door strategy:

P(car) = P(car0) x P(car0→car) + P(goat0) x P(goat0→car)
       = 1/3 x 0 + 2/3 x 1
       = 2/3

P(goat) = P(car0) x P(car0→goat) + P(goat0) x P(goat0→goat)
        = 1/3 x 1 + 2/3 x 0
        = 1/3
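Both strategies can be checked by enumerating the 9 equally likely combinations of car position and first pick. This is a sketch under the assumptions stated in the problem (the host always opens an unchosen door that hides no prize); the function name `p_win` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def p_win(switch):
    """Exact win probability over all (car position, first pick) combinations."""
    cases = list(product(range(3), repeat=2))  # 9 equally likely cases
    wins = 0
    for car, pick in cases:
        # The host opens a door that is neither the player's pick nor the car.
        # When two doors qualify, the choice does not affect who wins.
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            final = next(d for d in range(3) if d != pick and d != opened)
        else:
            final = pick
        wins += (final == car)
    return Fraction(wins, len(cases))

print(p_win(switch=False))  # 1/3
print(p_win(switch=True))   # 2/3
```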

Probabilities of Nucleotide Sequences

DNA sequences have a 4-letter alphabet: A, C, G and T. RNA sequences have a 4-letter alphabet: A, C, G and U. The probability that a given six nucleotide DNA sequence is GAATTC (the EcoRI endonuclease recognition sequence) is

P(GAATTC) = P(G) x P(A) x P(A) x P(T) x P(T) x P(C)

If each of the 4 nucleotides is equally likely, then

P(A) = P(C) = P(G) = P(T) = 1/4

so

P(GAATTC) = 1/4 x 1/4 x 1/4 x 1/4 x 1/4 x 1/4 = 1/4096 ≈ 0.00024

What if the probabilities of the nucleotides are not equal? What if

P(A) = P(T) = 1/6
P(C) = P(G) = 1/3

then

P(GAATTC) = 1/3 x 1/6 x 1/6 x 1/6 x 1/6 x 1/3 = 1/11664 ≈ 0.000086

So, the base composition of the DNA matters in restriction site frequencies.
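The product over positions generalizes to any sequence and any base composition. A minimal sketch follows; the function name `sequence_probability` and the dictionary names are illustrative, and the calculation assumes, as the text does, that positions are independent.

```python
from fractions import Fraction

def sequence_probability(seq, base_probs):
    """Product of per-base probabilities, assuming independent positions."""
    p = Fraction(1)
    for base in seq:
        p *= base_probs[base]
    return p

uniform = {base: Fraction(1, 4) for base in "ACGT"}
at_poor = {"A": Fraction(1, 6), "T": Fraction(1, 6),
           "C": Fraction(1, 3), "G": Fraction(1, 3)}

print(sequence_probability("GAATTC", uniform))  # 1/4096
print(sequence_probability("GAATTC", at_poor))  # 1/11664
```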

If we have a 10,240 basepair circular plasmid with equal frequencies of each of the 4 nucleotides, what is the probability that the plasmid is cleaved (one or more times) by EcoRI? We approach this most easily by computing the probability that it is not cleaved (I told you that the total probability being 1 would be useful):

P(cleaved one or more times) + P(not cleaved anywhere) = 1

So,

P(cleaved one or more times) = 1 – P(not cleaved anywhere)

For the plasmid to not be cleaved anywhere, it is necessary that it is not cleaved at position 1, and not cleaved at position 2, ... and not cleaved at position 10,240. The probability that a given site is not cleaved is 1 minus the probability that it is cleaved at the site:

P(not cleaved at position 1) = 1 – 1/4096 = 4095/4096

Or more generally, for any position i:

P(not cleaved at position i) = 1 – 1/4096 = 4095/4096

The probability that it is not cleaved at any of the 10,240 positions is the product of the probabilities for not being cut at each individual position:

P(plasmid not cleaved anywhere) = (4095/4096)^10240 ≈ 0.082

So,

P(plasmid cleaved) = 1 – P(plasmid not cleaved) ≈ 0.918
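The complement calculation is a one-liner to reproduce. The sketch below uses floating point and illustrative variable names, and it keeps the text's simplification that each of the 10,240 positions is treated as an independent trial.

```python
p_site = 1 / 4**6        # probability that a given position starts GAATTC (1/4096)
n_positions = 10240      # circular plasmid: every position can start a site

p_not_cleaved = (1 - p_site) ** n_positions
p_cleaved = 1 - p_not_cleaved

print(round(p_not_cleaved, 3))  # 0.082
print(round(p_cleaved, 3))      # 0.918
```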

What if we had approached the above question as:

P(cleaved somewhere) = number_of_sites x P(cleaved at site i)

This formulation is

P(cleaved at site 1) + P(cleaved at site 2) + ... + P(cleaved at last site)

This would only make sense as a way of combining probabilities if the events were mutually exclusive solutions to the problem, but, in fact, more than one site can be cleaved. If we were to use this formula we would get:

P = 10240 x 1/4096 = 2.5

This cannot be a probability; a probability cannot be greater than 1 (or less than 0).

If the above expression is not the probability of a cleavage, what is it? It is the expected number of cleavages:

E = 10240 x 1/4096 = 2.5

That is, if we had a large number of plasmids of this size and base composition, the number of cleavages per plasmid, averaged over all of the plasmids, would be 2.5. Any individual plasmid would be cut a specific number of times (0, 1, 2, 3, ...), but the average need not be an integer. The expected number of events is fundamental to the Poisson distribution, where it is usually called µ (i.e., the Greek letter mu). By the way, when the expected number is very small (<<1), it is a good approximation of the probability of one or more events occurring, but assuming that this is always true will get you in trouble.
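To see where the expected number stops being a usable stand-in for the probability, it helps to tabulate both for several sequence lengths. In the sketch below the lengths other than 10,240 are arbitrary illustrative choices, `compare` is a hypothetical helper, and the third column uses the Poisson result that the probability of zero events is exp(-µ).

```python
from math import exp

p_site = 1 / 4096  # probability of a site at a given position (equal base frequencies)

def compare(n_positions):
    """Return (expected number, exact P(one or more), Poisson estimate 1 - exp(-mu))."""
    mu = n_positions * p_site
    exact = 1 - (1 - p_site) ** n_positions
    poisson = 1 - exp(-mu)
    return mu, exact, poisson

for n in (10, 100, 1000, 10240):
    mu, exact, poisson = compare(n)
    print(f"n={n:6d}  mu={mu:.4f}  exact={exact:.4f}  1-exp(-mu)={poisson:.4f}")
```

For short sequences µ and the exact probability nearly coincide; at 10,240 positions µ is 2.5 while the probability stays below 1.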