41
Probability and Statistics Review Thursday Mar 12

Probability and Statistics Review Thursday Mar 12

Embed Size (px)

Citation preview

Page 1: Probability and Statistics Review Thursday Mar 12

Probability and Statistics Review

Thursday Mar 12

Page 2: Probability and Statistics Review Thursday Mar 12

מודל למידה

?מה המשתנים החשובים?מה הטווח שלהם?מהן הקומבינציות החשובות

Page 3: Probability and Statistics Review Thursday Mar 12

חזרה מהירה על הסתברות

מאורע , אוסף מאורעות•משתנה מקרי •הסתברות מותנית , חוק ההסתברות השלמה , •

חוק השרשרת ,חוק בייסתלות ואי תלות •שונות ,תוחלת ושונות משותפת••Moments

Page 4: Probability and Statistics Review Thursday Mar 12

Sample space and Events

• Sample Space, result of an experiment• If you toss a coin twice

• Event: a subset of • First toss is head = {HH,HT}

• S: event space, a set of events• Closed under finite union and complements

• Entails other binary operation: union, diff, etc.• Contains the empty event and

Page 5: Probability and Statistics Review Thursday Mar 12

Probability Measure

• Defined over (Ss.t.• P() >= 0 for all in S• P() = 1• If are disjoint, then

• P( U ) = p() + p()• We can deduce other axioms from the above ones

• Ex: P( U ) for non-disjoint event

Page 6: Probability and Statistics Review Thursday Mar 12

Visualization

• We can go on and define conditional probability, using the above visualization

Page 7: Probability and Statistics Review Thursday Mar 12

הסתברות מותנית

-P(F|H) = Fraction of worlds in which H is true that also have F true

)(

)()|(

Hp

HFpHFp

Page 8: Probability and Statistics Review Thursday Mar 12

Rule of total probability

A

B1

B2B3

B4

B5

B6B7

ii BAPBPAp |

Page 9: Probability and Statistics Review Thursday Mar 12

Bayes Rule

• We know that P(smart) = .7• If we also know that the students grade is

A+, then how this affects our belief about his intelligence?

• Where this comes from?

)(

)|()(|

yP

xyPxPyxP

Page 10: Probability and Statistics Review Thursday Mar 12

דוגמא

מתוצרת המפעל מיוצרת B . 10% ו-Aבמפעל פעולות שתי מכונות 5% ו A מהמוצרים המיוצרים במכונה B. 1% במכונה 90% ו-Aבמכונה

הם פגומים.Bמהמוצרים המיוצרים במכונה ?נבחר מוצר אקראי, מה ההסתברות שהוא פגום נמצא מוצר שהוא פגום, מה ההסתברות שיוצר במכונהA? אחרי ביקור של טכנאי שמטפל במכונהB ממוצרי 1.9% , מוצאים ש

Bהמפעל הם פגומים. מה עכשיו ההסתברות שמוצר המיוצר במכונה יהיה פגום?

פתרון:-המוצר A , B-המוצר הנבחר יוצר במכונה Aנגדיר את המאורעות הבאים:

-המוצר שנבחר פגום.B, Cהנבחר יוצר במכונה P(C|A)*P(A)+P(C|B)*P(B)א. עפ"י נוסחת ההסתברות השלמה מתקיים:

0.046=0.01*0.1 +0.05*0.9|P(A|C)=P(C|A)*P(A)/P(C) => P(A ב. עפ"י נוסחת בייס:

C)=0.01*0.1/0.046=0.021

Page 11: Probability and Statistics Review Thursday Mar 12

משתנה מקרי בדיד

מ"מ הוא פונקציה ממרחב המאורעות הכללי (העולם) •למרחב המאפיינים

בעצם ניתן לייצג התפלגות חדשה על פי המשתנה המקרי •

• Modeling students (Grade and Intelligence):• all possible students• What are events

• Grade_A = all students with grade A• Grade_B = all students with grade A• Intelligence_High = … with high intelligence

Page 12: Probability and Statistics Review Thursday Mar 12

Random Variables

High

low

A

B A+

I:Intelligence

G:Grade

Page 13: Probability and Statistics Review Thursday Mar 12

Random Variables

High

low

A

B A+

I:Intelligence

G:Grade

P(I = high) = P( {all students whose intelligence is high})

Page 14: Probability and Statistics Review Thursday Mar 12

הסתברות משותפת

• Joint probability distributions quantify this • P( X= x, Y= y) = P(x, y)

• How probable is it to observe these two attributes together?• How can we manipulate Joint probability distributions?

.1,2,3,4דוגמא:מרכיבים באקראי מס' דו סיפרתי מהספרות 1 מס' הפעמים שהספרה Y מס' הספרות השונות המופיעות במס' ו Xיהי

מופיעה.?)X,Y(מהי ההתפלגות המשותפת של הזוג

XY

1 2

0 3/16 6/16

1 0 6/16

2 1/16 0

Page 15: Probability and Statistics Review Thursday Mar 12

חוק השרשרת

• Always true• P(x,y,z) = p(x) p(y|x) p(z|x, y)

= p(z) p(y|z) p(x|y, z) =…

כדי לסבך קצת את העניינים נוסיף לשאלה מקודם

יהיה מס הפעמים שמופיע ספרה גדולה ממש zאת הנתון הבא 2מ

P(x=2,y=1,z=1)=P(x=2)*P(y=1|x=2)*P(z=1|y=1,x=2)=0.75*[(6/16)/P(x=2)]*[(4/16)/(6/16)]

Page 16: Probability and Statistics Review Thursday Mar 12

Conditional Probability

P X YP X Y

P Y

x yx y

y

)(

),(|

yp

yxpyxP

But we will always write it this way:

events

Page 17: Probability and Statistics Review Thursday Mar 12

הסתברות השולית

• We know P(X,Y), what is P(X=x)?• We can use the low of total probability, why?

y

y

yxPyP

yxPxp

|

,

A

B1

B2B3B4

B5

B6B7

Page 18: Probability and Statistics Review Thursday Mar 12

Marginalization Cont.

• Another example

yz

zy

zyxPzyP

zyxPxp

,

,

,|,

,,

Page 19: Probability and Statistics Review Thursday Mar 12

Bayes Rule cont.

• You can condition on more variables

)|(

),|()|(,|

zyP

zxyPzxPzyxP

Page 20: Probability and Statistics Review Thursday Mar 12

אי תלות

• X is independent of Y means that knowing Y does not change our belief about X.• P(X|Y=y) = P(X) • P(X=x, Y=y) = P(X=x) P(Y=y)

• Why this is true?• The above should hold for all x, y• It is symmetric and written as X Y

Page 21: Probability and Statistics Review Thursday Mar 12

CI: Conditional Independence

• X Y | Z if once Z is observed, knowing the value of Y does not change our belief about X• The following should hold for all x,y,z• P(X=x | Z=z, Y=y) = P(X=x | Z=z) • P(Y=y | Z=z, X=x) = P(Y=y | Z=z) • P(X=x, Y=y | Z=z) = P(X=x| Z=z) P(Y=y| Z=z)

We call these factors : very useful concept !!

Page 22: Probability and Statistics Review Thursday Mar 12

Properties of CI

• Symmetry: – (X Y | Z) (Y X | Z)

• Decomposition: – (X Y,W | Z) (X Y | Z)

• Weak union: – (X Y,W | Z) (X Y | Z,W)

• Contraction: – (X W | Y,Z) & (X Y | Z) (X Y,W | Z)

• Intersection: – (X Y | W,Z) & (X W | Y,Z) (X Y,W | Z) – Only for positive distributions!– P()>0, 8, ;

Page 23: Probability and Statistics Review Thursday Mar 12

Monty Hall Problem

You're given the choice of three doors: Behind one door is a car; behind the others, goats.

You pick a door, say No. 1 The host, who knows what's behind the doors,

opens another door, say No. 3, which has a goat. Do you want to pick door No. 2 instead?

Page 24: Probability and Statistics Review Thursday Mar 12

                        

                        

Host mustreveal Goat B

                        

                        

Host mustreveal Goat A

                       

                             

Host revealsGoat A

orHost reveals

Goat B

           

             

Page 25: Probability and Statistics Review Thursday Mar 12

Monty Hall Problem: Bayes Rule

: the car is behind door i, i = 1, 2, 3 : the host opens door j after you pick door i

iC

ijH 1 3iP C

0

0

1 2

1 ,

ij k

i j

j kP H C

i k

i k j k

Page 26: Probability and Statistics Review Thursday Mar 12

Monty Hall Problem

WLOG, i=1, j=3

13 1 11 13

13

P H C P CP C H

P H

13 1 11 1 1

2 3 6P H C P C

Page 27: Probability and Statistics Review Thursday Mar 12

Monty Hall Problem: Bayes Rule cont.

13 13 1 13 2 13 3

13 1 1 13 2 2

, , ,

1 11

6 31

2

P H P H C P H C P H C

P H C P C P H C P C

1 131 6 1

1 2 3P C H

Page 28: Probability and Statistics Review Thursday Mar 12

Monty Hall Problem: Bayes Rule cont.

1 131 6 1

1 2 3P C H

You should switch!

2 13 1 131 2

13 3

P C H P C H

Page 29: Probability and Statistics Review Thursday Mar 12

Moments

Mean (Expectation): Discrete RVs:

Continuous RVs:

Variance: Discrete RVs:

Continuous RVs:

XE X P X

ii iv

E v v XE xf x dx

2X XV E 2X P X

ii iv

V v v 2XV x f x dx

Page 30: Probability and Statistics Review Thursday Mar 12

Properties of Moments

Mean If X and Y are independent,

Variance If X and Y are independent,

X Y X YE E E X XE a aE

XY X YE E E

2X XV a b a V

X Y (X) (Y)V V V

Page 31: Probability and Statistics Review Thursday Mar 12

The Big Picture

Model Data

Probability

Estimation/learning

Page 32: Probability and Statistics Review Thursday Mar 12

Statistical Inference

Given observations from a model What (conditional) independence assumptions

hold? Structure learning

If you know the family of the model (ex, multinomial), What are the value of the parameters: MLE, Bayesian estimation. Parameter learning

Page 33: Probability and Statistics Review Thursday Mar 12

MLE

Maximum Likelihood estimation Example on board

Given N coin tosses, what is the coin bias ( )? Sufficient Statistics: SS

Useful concept that we will make use later In solving the above estimation problem, we only

cared about Nh, Nt , these are called the SS of this model. All coin tosses that have the same SS will result in the

same value of Why this is useful?

Page 34: Probability and Statistics Review Thursday Mar 12

Statistical Inference

Given observation from a model What (conditional) independence assumptions

holds? Structure learning

If you know the family of the model (ex, multinomial), What are the value of the parameters: MLE, Bayesian estimation. Parameter learning

We need some concepts from information theory

Page 35: Probability and Statistics Review Thursday Mar 12

Information Theory

• P(X) encodes our uncertainty about X• Some variables are more uncertain that others

• How can we quantify this intuition?• Entropy: average number of bits required to encode X

P(X) P(Y)

X Y

xP xP

xPxp

EXH1

log1

log

Page 36: Probability and Statistics Review Thursday Mar 12

Information Theory cont.

• Entropy: average number of bits required to encode X

• We can define conditional entropy similarly

• We can also define chain rule for entropies (not surprising)

xP xP

xPxp

EXH1

log1

log

YHYXHyxp

EYXH PPP

,

|

1log|

YXZHXYHXHZYXH PPPP ,||,,

Page 37: Probability and Statistics Review Thursday Mar 12

Mutual Information: MI

• Remember independence?• If XY then knowing Y won’t change our belief about X• Mutual information can help quantify this! (not the only

way though)• MI:

• Symmetric• I(X;Y) = 0 iff, X and Y are independent!

YXHXHYXI PPP |;

Page 38: Probability and Statistics Review Thursday Mar 12

Continuous Random Variables

What if X is continuous? Probability density function (pdf) instead of

probability mass function (pmf) A pdf is any function that describes the

probability density in terms of the input variable x.

f x

Page 39: Probability and Statistics Review Thursday Mar 12

PDF Properties of pdf

Actual probability can be obtained by taking the integral of pdf E.g. the probability of X being between 0 and 1 is

0,f x x

1f x

1

0P 0 1X f x dx

1 ???f x

Page 40: Probability and Statistics Review Thursday Mar 12

Cumulative Distribution Function

Discrete RVs

Continuous RVs

X P XF v v

X P Xi

ivF v v

X

vF v f x dx

X

dF x f x

dx

Page 41: Probability and Statistics Review Thursday Mar 12

Acknowledgment

Andrew Moore Tutorial: http://www.autonlab.org/tutorials/prob.html

Monty hall problem: http://en.wikipedia.org/wiki/Monty_Hall_problem http://www.cs.cmu.edu/~guestrin/Class/10701-F07/recitation_schedule.html