Thinking About Probability
Outline
Basic Idea
Different types of probability
Definitions and Rules
Conditional and Joint probabilities
Essentials of understanding stats
Discrete and Continuous probability distributions
Density
Permutations
A visit to the Binomial distribution
The Bayesian approach
The Problem with Probabilities
Can be very hard to grasp, e.g. the Monty Hall problem
TV show “Let’s Make a Deal”
3 closed doors, behind 1 is a prize (others have “goats”)
Select a door
Monty Hall opens one of the remaining doors that does NOT contain a prize
Now allowed to keep your original door or switch to the other one
Does it make a difference if you switch?
http://www.stat.sc.edu/~west/javahtml/LetsMakeaDeal.html
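One way to build intuition for the Monty Hall question is to simulate it. The sketch below is an illustration, not part of the original slides; the function names `play` and `win_rate`, the trial count, and the seed are all assumptions. It follows the rules as stated above: the host always opens a door that is neither the contestant's pick nor the prize.

```python
import random

def play(switch, rng):
    """Play one round; return True if the contestant ends up with the prize."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the prize.
    opened = rng.choice([d for d in doors if d != pick and d != prize])
    if switch:
        # Move to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize

def win_rate(switch, trials=100_000, seed=0):
    """Estimate the long-run proportion of wins for a given strategy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials

if __name__ == "__main__":
    print("stay:  ", win_rate(switch=False))
    print("switch:", win_rate(switch=True))
```

In the long run, staying should win about 1/3 of the time and switching about 2/3, so switching does make a difference.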
Properties of probabilities
0 ≤ p(A) ≤ 1
0 = never happens, 1 = always happens
A priori definition
p(A) = (number of events classifiable as A) / (total number of classifiable events)
A posteriori definition
p(A) = (number of times A occurred) / (total number of occurrences)
Properties of probabilities
So: p(A) = nA/N = the number of events belonging to subset A out of the total number possible (which includes A).
If 6 movies are playing at the theater and 5 are crappy but 1 is not so crappy, what is the probability that I will be disappointed?
5/6 or p = .8333
Probability in Perspective
Analytic view (Fisher)
The common approach: if there are 4 bad movies and one good one, I have an 80% chance of selecting a bad one
Relative frequency view (Neyman-Pearson)
Refers to the long run of events: the probability is the limit of the relative frequency, i.e. in a hypothetical infinite number of movie weekends I will select a bad movie about 80% of the time
Subjective view (Bayesian)
Probability is akin to a statement of belief and is subjective, e.g. I always seem to pick a good one
Some definitions
Mutually exclusive
Both events cannot occur simultaneously
A and !A together = impossible
Exhaustive sets
The set includes all possible events
The sum of the probabilities of all the events in the set = 1
Some definitions
Equal likelihood: roll a fair die; each time the likelihood of 1-6 is the same; whichever one we get, we could just as easily have gotten another
Counterexample: put the numbers 1-7 in a hat. What’s the probability of even vs. odd? (Four of the seven numbers are odd and three are even, so odd has probability 4/7 and even 3/7.)
Independent events: occurrence of one event has no effect on the probability of occurrence of the other
Laws of probability: Addition
The question of Or
If A and B are mutually exclusive: p(A or B) = p(A) + p(B)
Probability of getting a grape or lemon skittle in a bag of 60 pieces where there are 15 strawberry, 13 grape, 12 orange, 8 lemon, 12 lime?
p(G) = 13/60, p(L) = 8/60
13/60 + 8/60 = 21/60 = .35, or a 35% chance we’ll get one of those two flavors when we open the bag and pick one out
Laws of probability: Multiplication
The question of And
If A and B are independent:
p(A and B) = p(A)p(B)
p(A and B and C) = p(A)p(B)p(C)
Probability of getting a grape and then a lemon in two draws from the bag (putting the grape back after the first draw)
p(Grape)*p(Lemon) = 13/60 * 8/60 = ~.0289
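The addition and multiplication laws can be checked with exact fractions. This is a minimal sketch using the Skittles counts given in the slides; the dictionary `bag` and the helper `p` are names introduced here for illustration only.

```python
from fractions import Fraction

# Flavor counts from the example bag of 60 pieces.
bag = {"strawberry": 15, "grape": 13, "orange": 12, "lemon": 8, "lime": 12}
total = sum(bag.values())

def p(flavor):
    """Probability of drawing the given flavor on a single draw."""
    return Fraction(bag[flavor], total)

# Addition law: grape and lemon are mutually exclusive on one draw.
p_grape_or_lemon = p("grape") + p("lemon")

# Multiplication law: two draws with replacement are independent.
p_grape_then_lemon = p("grape") * p("lemon")

if __name__ == "__main__":
    print(p_grape_or_lemon, float(p_grape_or_lemon))
    print(p_grape_then_lemon, float(p_grape_then_lemon))
```

Using `Fraction` rather than floats keeps the arithmetic exact, which makes it easier to compare against the hand calculation.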
Conditional Probabilities and Joint Events
Conditional probability
One where you are looking for the probability of some event with some sort of information in hand
e.g. the probability of having a boy given that you had a girl already.
Joint probability
Probability of the co-occurrence of events
E.g. the probability that you have a boy and a girl for children, i.e. a combination of events
In this case the conditional would be higher, because if we knew there was already a girl that means they’re of child-rearing age, able to have kids, possibly interested in having more etc.
Conditional probabilities
If events are not independent then:
p(X|Y) = probability that X happens given that Y happens
The probability of X “conditional on” Y
p(A and B) = p(A)*p(B|A)
Stress and sleep relationship conditioned on gender
Little relation for women, negative relation for men
The observed p-value at the heart of hypothesis testing is a conditional probability
p(Data|H0)
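To see the rule p(A and B) = p(A)*p(B|A) at work when events are not independent, consider drawing from a bag without replacement. The example below is an assumption added for illustration: it reuses the 60-piece Skittles counts (13 grape, 8 lemon) but, unlike the earlier slide, does not put the grape back.

```python
from fractions import Fraction

grape, lemon, total = 13, 8, 60

# p(A): the first piece drawn is grape.
p_grape_first = Fraction(grape, total)

# p(B|A): the second piece is lemon, given a grape was removed.
# Only 59 pieces remain, and all 8 lemons are still in the bag.
p_lemon_given_grape = Fraction(lemon, total - 1)

# p(A and B) = p(A) * p(B|A)
p_grape_then_lemon = p_grape_first * p_lemon_given_grape

if __name__ == "__main__":
    print(p_grape_then_lemon, float(p_grape_then_lemon))
```

The result is slightly larger than the with-replacement value of 13/60 * 8/60, because removing a grape raises the chance that the next piece is a lemon.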
Joint probability
When dealing with independent events, we can just use the multiplicative law.
Joint probabilities are of particular interest in classification problems and understanding multivariate relationships
E.g. bivariate and multivariate normal distributions
Simpson’s paradox
Success rates of a particular therapy
What’s wrong with this picture?
Is the treatment a success?

                 Control          Treatment
Male (N=30)      7/10  (70%)      13/20 (65%)
Female (N=30)    7/20  (35%)      3/10  (30%)
Total            14/30 (46.7%)    16/30 (53.3%)
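The reversal in the table can be verified directly. The sketch below uses only the counts shown in the table; the structure `data` and the helper `rate` are names chosen here for illustration.

```python
from fractions import Fraction

# (successes, group size) for each gender and arm, taken from the table.
data = {
    "Male":   {"Control": (7, 10), "Treatment": (13, 20)},
    "Female": {"Control": (7, 20), "Treatment": (3, 10)},
}

def rate(successes, n):
    """Success rate as an exact fraction."""
    return Fraction(successes, n)

# Success rate within each gender.
within = {
    gender: {arm: rate(*counts) for arm, counts in arms.items()}
    for gender, arms in data.items()
}

# Success rate after pooling the genders.
pooled = {}
for arm in ("Control", "Treatment"):
    successes = sum(data[g][arm][0] for g in data)
    n = sum(data[g][arm][1] for g in data)
    pooled[arm] = rate(successes, n)

if __name__ == "__main__":
    for gender, arms in within.items():
        print(gender, {arm: float(r) for arm, r in arms.items()})
    print("Pooled", {arm: float(r) for arm, r in pooled.items()})
```

Within each gender the control group does better, yet the pooled totals favor the treatment. The reversal arises because the treatment group is mostly male (20 of 30) and males have higher success rates in both arms.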
Discrete probability distribution
Involves the distribution for a variable that takes on only a few values
Common example would be the Likert scale
Continuous probability distribution
We often deal with continuous probability distributions in inference, the most famous of which is the normal distribution
The height of the curve is known as the density
We expect values near the ‘hump’ to be more common
Permutations
$$N! = N(N-1)(N-2)\cdots(1)$$
$$P^N_k = \frac{N!}{(N-k)!}$$
Counting is a key part of understanding probability (e.g. we can’t tell how often something occurs if we don’t know how many events occur in general).
Some complexity arises when we consider whether we track the order and whether events are able to be placed back for future selection.
How many ways can a set of N units be ordered? Factorial
Permutations of size k taken from N objects
Ordered, without replacement
There are 5 songs on your top list, and you want to hear any two of them. How many pairs of songs can you create? In this case ab != ba, i.e. each ordering counts
5!/(5-2)! = 20
Combinations
$$C^N_k = \frac{N!}{k!(N-k)!}$$
Combinations: finding the number of combinations of k objects you can choose from a set of N objects
Unordered, without replacement
In this case, any pair considered will not be considered again i.e. ab = ba
From our previous example, there are now only 10 unique pairs to be considered
The combination described above will come back into play as we discuss the binomial
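The song example can be checked by counting and by enumeration. This is a small sketch using Python's standard library (`math.perm` and `math.comb` require Python 3.8 or later); the song labels a through e are placeholders.

```python
import math
from itertools import combinations, permutations

songs = ["a", "b", "c", "d", "e"]
k = 2

# Ordered, without replacement: N! / (N-k)!
n_ordered = math.perm(len(songs), k)

# Unordered, without replacement: N! / (k! (N-k)!)
n_unordered = math.comb(len(songs), k)

# Enumerating the pairs explicitly gives the same counts.
ordered_pairs = list(permutations(songs, k))    # ("a","b") and ("b","a") both appear
unordered_pairs = list(combinations(songs, k))  # only ("a","b") appears

if __name__ == "__main__":
    print(n_ordered, len(ordered_pairs))
    print(n_unordered, len(unordered_pairs))
```

Each unordered pair corresponds to k! = 2 ordered pairs, which is why the permutation count is twice the combination count here.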
The Binomial
Bernoulli trials = 2 mutually exclusive outcomes
Distribution of outcomes
Order of items does not matter
Only the probability of various outcomes in terms of e.g. numbers of heads and tails
N = # trials = 3
Coin toss
How many possible outcomes of the 3 coin tosses are there?
List them out: HHH HHT HTT TTT TTH THH THT HTH
Now condense them ignoring order
e.g. HTT = THT = TTH: each set of flips results in only 1 head
What is the probability of 0 heads, 1 head, 2 heads, 3 heads?
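The listing and condensing steps for three coin tosses can be sketched in a few lines. This is an illustration that assumes a fair coin, so all eight sequences are equally likely; the variable names are chosen here.

```python
from collections import Counter
from itertools import product

# All ordered outcomes of 3 tosses: HHH, HHT, ..., TTT.
outcomes = ["".join(seq) for seq in product("HT", repeat=3)]

# Condense by ignoring order: count how many sequences have each number of heads.
heads_counts = Counter(seq.count("H") for seq in outcomes)

# With a fair coin each sequence is equally likely, so divide by the total.
probs = {h: heads_counts[h] / len(outcomes) for h in sorted(heads_counts)}

if __name__ == "__main__":
    print(outcomes)
    print(probs)
```

There are 8 sequences in total, with 1, 3, 3, and 1 of them giving 0, 1, 2, and 3 heads, so the probabilities are 1/8, 3/8, 3/8, and 1/8.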
Distribution of outcomes
Now how about 10 coin flips?
That’d be a lot of work writing out all the possibilities.
What’s another way to find the probability of coin flips?
Use the formula for combinations
Binomial distribution
Find a probability for an event using:
N = number of trials
r = number of ‘successes’
p = probability of ‘success’ on any trial
q = 1-p (probability of ‘failure’)
$C^N_r$ = the number of combinations of N things taken r at a time
$$p(r) = C^N_r \, p^r q^{N-r} = \frac{N!}{r!(N-r)!} \, p^r q^{N-r}$$
So if I want to know the probability of getting 9 heads out of 10 coin flips, in any order (e.g. H,H,H,H,H,H,H,H,H,T):
$$p(9) = \frac{10!}{9!(10-9)!} \, p(H)^9 p(T)^{10-9} = \frac{10!}{9!(10-9)!} (.5)^9 (.5)^1 = \frac{10 \cdot 9!}{9! \, 1!} (.5)^9 (.5)^1 = 10(.001953)(.5) = .0098 \approx .01$$
Now if we did this for all possible hits (heads) on 10 flips:
Number of heads    Probability
 0                 .001
 1                 .010
 2                 .044
 3                 .117
 4                 .205
 5                 .246
 6                 .205
 7                 .117
 8                 .044
 9                 .010
10                 .001
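The table above can be reproduced from the binomial formula. The function name `binom_pmf` is chosen here for illustration; it is a direct transcription of p(r) = C(N, r) p^r q^(N-r).

```python
from math import comb

def binom_pmf(r, n, p):
    """Probability of exactly r successes in n independent trials."""
    q = 1 - p
    return comb(n, r) * p**r * q**(n - r)

# Probability of each possible number of heads in 10 fair coin flips,
# rounded to three decimals as in the table.
table = {r: round(binom_pmf(r, 10, 0.5), 3) for r in range(11)}

if __name__ == "__main__":
    for r, prob in table.items():
        print(f"{r:>2}  {prob:.3f}")
```

Because the coin is fair, the distribution is symmetric: the probability of r heads equals the probability of 10 - r heads.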
Using these probabilities
What is the probability of getting 4 or fewer heads in 10 coin tosses?
Addition
p(4 or less) = p(4) + p(3) + p(2) + p(1) + p(0) = .205 + .117 + .044 + .010 + .001 = .377
About a 38% chance of getting 4 or fewer heads on 10 flips
Test a Hypothesis
Now take it out a step.
Suppose you were giving some sort of treatment to depressed individuals and assumed the treatment could work or not work, and in general would have a 50/50 chance of doing so if it wasn’t anything special (i.e. just a placebo). Then it worked an average of 9 times out of 10 administrations.
Would you think there was something special going on or that it was just a chance occurrence based on what was expected?
p = p(9) + p(10) = .010 + .001 = .011
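Both tail sums, 4 or fewer heads and 9 or more successes, are cumulative binomial probabilities. The sketch below computes them without rounding the individual terms first; `binom_pmf` and `binom_cdf` are names introduced here, and the 50/50 placebo assumption comes from the slide. The cumulative function plays the same role as R's `pbinom`.

```python
from math import comb

def binom_pmf(r, n, p):
    """Probability of exactly r successes in n independent trials."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def binom_cdf(r, n, p):
    """Probability of r or fewer successes in n independent trials."""
    return sum(binom_pmf(i, n, p) for i in range(r + 1))

# Lower tail: 4 or fewer heads in 10 fair flips.
p_at_most_4 = binom_cdf(4, 10, 0.5)

# Upper tail: treatment works 9 or 10 times out of 10 under a 50/50 placebo.
p_at_least_9 = 1 - binom_cdf(8, 10, 0.5)

if __name__ == "__main__":
    print(round(p_at_most_4, 3))
    print(round(p_at_least_9, 3))
```

A result as extreme as 9 or more successes would occur only about 1% of the time if the treatment were no better than a coin flip.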
Not just 50/50
Not every 2 outcome situation has equal probabilities associated with each option
There are two parameters we are concerned with when considering a binomial distribution
1. p = the probability of a success (q is 1-p)
2. n = the number of (Bernoulli) trials
More info about the binomial distribution
$$\mu = Np, \qquad \sigma^2 = Npq$$
In R: Rcmdr (Distribution menu), or ?pbinom (command line)
Approximately “normal” curve when:
p is close to 0.5 (if not, then a “skewed” distribution)
N is large (if not, then not as representative a distribution)
Examples (figures not reproduced): small N; p = .8, N = 10
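The mean and variance formulas, and the skew that appears when p is far from 0.5, can be checked numerically for the p = .8, N = 10 example. This is a sketch; the moment calculations are written out by hand rather than taken from a statistics library.

```python
from math import comb

def binom_pmf(r, n, p):
    """Probability of exactly r successes in n independent trials."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 10, 0.8
pmf = [binom_pmf(r, n, p) for r in range(n + 1)]

# Moments computed directly from the distribution.
mean = sum(r * pr for r, pr in enumerate(pmf))
variance = sum((r - mean) ** 2 * pr for r, pr in enumerate(pmf))
third_central_moment = sum((r - mean) ** 3 * pr for r, pr in enumerate(pmf))

if __name__ == "__main__":
    print("mean:", mean, "expected Np =", n * p)
    print("variance:", variance, "expected Npq =", n * p * (1 - p))
    print("third central moment:", third_central_moment)
```

The mean and variance agree with Np and Npq, and the negative third central moment indicates a longer tail toward low counts, which is the skew expected when p is well above 0.5 and N is small.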
Bayesian Probability
Thomas Bayes (c. 1702–1761)
The Bayesian approach involves weighing the probability of an event by prior experience/knowledge, and as such fits in well with the accumulation of knowledge that is science.
As new evidence presents itself, we will revise our previous assessment of the likelihood of some event
Prior probability
Initial assessment
Posterior probability
Revised estimate
Bayesian Probability
$$p(H_0|D) = \frac{p(D|H_0)\,p(H_0)}{p(D|H_0)\,p(H_0) + p(D|H_1)\,p(H_1)}$$
With regard to hypothesis testing:
p(H0) = probability of the null hypothesis
p(D|H0) = the observed p-value we’re used to seeing, i.e. the probability of the data given the null hypothesis
p(H1) = probability of an alternative
p(D|H1) = probability of the data given the alternative hypothesis
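A small numeric sketch may help make the formula concrete. Everything specific here is an assumption for illustration and not from the slides: the data are taken to be 9 successes in 10 trials, H0 is a 50% success rate, H1 is a hypothetical 80% success rate, and the two hypotheses get equal priors of 0.5. The likelihoods use the binomial probability of exactly 9 successes rather than a tail probability.

```python
from math import comb

def binom_pmf(r, n, p):
    """Probability of exactly r successes in n independent trials."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def posterior_h0(lik_h0, prior_h0, lik_h1, prior_h1):
    """Bayes' rule for two competing hypotheses: returns p(H0 | D)."""
    numerator = lik_h0 * prior_h0
    return numerator / (numerator + lik_h1 * prior_h1)

# Hypothetical priors: no initial preference between the hypotheses.
prior_h0, prior_h1 = 0.5, 0.5

# Likelihood of the assumed data (9 of 10 successes) under each hypothesis.
lik_h0 = binom_pmf(9, 10, 0.5)  # H0: success rate is 0.5
lik_h1 = binom_pmf(9, 10, 0.8)  # H1: success rate is 0.8 (assumed alternative)

post_h0 = posterior_h0(lik_h0, prior_h0, lik_h1, prior_h1)
post_h1 = 1 - post_h0

if __name__ == "__main__":
    print("p(H0 | D):", round(post_h0, 3))
    print("p(H1 | D):", round(post_h1, 3))
```

Under these assumptions the data shift most of the belief to H1. Changing the priors or the alternative changes the posterior, which is why the choice of priors needs care.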
Empirical Bayes method in statistics
Bayesian statistics is becoming more common in a variety of disciplines
Advantages: all the probabilities regarding hypothesis testing make sense; interval estimates etc. are what we think they are, which they are not in null hypothesis testing
Disadvantage: if the priors are not well thought out, they could lead to erroneous conclusions
Why don’t we see more of it?
You actually have to think of not only ‘non-nil’ hypotheses but perhaps several viable competing hypotheses, and this entails:
Actually knowing prior research very well
Not being lazy with regard to the ‘null’, which now becomes any other hypothesis
We will return with examples regarding proportions and means later in the semester.
Summary
While it seems second nature to assess probabilities, it’s actually not an easy process in the scientific realm
Knowing exactly what our probability regards and what it does not is the basis for inferring from a sample to the population
Not knowing what the probability entails results in much of the misinformed approach you see in statistics in the behavioral sciences