Data Analysis Class 4: Probability distributions and densities


Data Analysis

Class 4: Probability distributions and densities


Random variables

• Binary (e.g. heads = 1, tails = 0; plane crashes = 1, does not crash = 0)

• Discrete (e.g. number of heads in a series of coin tosses; number of plane crashes in a given time span)

• Continuous (e.g. time to next plane crash; height of a person)

• Vectorial

• …


Probability distribution

• For binary/discrete random variables X: P(X=x) = ?

• Specifying this for all x gives a probability distribution

• Condition: $\sum_x P(X=x) = 1$


Probability distribution

• For example, for a (biased) coin flip: P(X=1)=0.4 and P(X=0)=0.6

• Note: P(X=1)+P(X=0)=1
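The condition on a discrete distribution can be checked mechanically. The sketch below stores the coin flip example as a Python dictionary; the helper name `is_valid_distribution` and the tolerance are illustrative assumptions, not part of the lecture.

```python
import math

# The coin flip example: outcome -> probability.
coin = {1: 0.4, 0: 0.6}

def is_valid_distribution(dist, tol=1e-12):
    """Return True if all probabilities are non-negative and sum to 1."""
    probs = list(dist.values())
    return all(p >= 0 for p in probs) and math.isclose(sum(probs), 1.0, abs_tol=tol)

print(is_valid_distribution(coin))               # satisfies the condition
print(is_valid_distribution({1: 0.4, 0: 0.5}))   # sums to 0.9, so it does not
```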


Probability density function

• For continuous random variables X:

• P(X=x), defined in the same way, is 0 for every x (there are infinitely many other possible outcomes)

• So we’ll define P(X=x) in a different way, as a density, such that P(X=x)dx is the probability that X falls in the infinitesimal interval [x, x+dx]

• Note: this probability is indeed infinitesimally small

• Condition: $\int_x P(X=x)\,dx = 1$


Probability density function

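• E.g. height of random people

• Gaussian distribution:

$$P(X=x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

• A ‘bell’ curve centred at μ, with width proportional to σ

• Note:

$$\int_{-\infty}^{\infty} P(X=x)\,dx = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) dx = 1$$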
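The normalisation of the Gaussian density can be checked numerically. This is a minimal sketch using a hand-written trapezoidal rule; the values of `mu` and `sigma` (heights in cm) are hypothetical and chosen only for illustration.

```python
import math

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Gaussian density with mean mu and standard deviation sigma."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def integrate(f, a, b, steps=10_000):
    """Approximate the integral of f over [a, b] with the trapezoidal rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

mu, sigma = 170.0, 10.0  # hypothetical values, not from the lecture

def density(x):
    return gaussian_pdf(x, mu, sigma)

# Integrating far into both tails should give (almost exactly) 1.
area = integrate(density, mu - 8 * sigma, mu + 8 * sigma)
# Probability of falling within one standard deviation of the mean.
within_one_sigma = integrate(density, mu - sigma, mu + sigma)

print(area, within_one_sigma)
```

Note that the density itself is not a probability: only its integral over an interval is.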


Some probability distributions

• Bernoulli distribution (binary)

• Geometric distribution (discrete)

• Binomial distribution (discrete)

• Gaussian distribution (continuous)

• Exponential distribution (continuous)

• Poisson distribution (discrete)


Bernoulli distribution

• X is a binary random variable (success/1 versus failure/0)

• For example:
  – Biased coin
  – Whether a given plane crashes

$$P(X=1) = p, \qquad P(X=0) = 1-p$$
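A Bernoulli variable is straightforward to simulate. The sketch below draws many samples with a fixed seed and compares the observed frequency of successes with p; the value p = 0.4 and the helper name `bernoulli_sample` are assumptions made for illustration.

```python
import random

def bernoulli_sample(p, rng):
    """Return 1 (success) with probability p, otherwise 0 (failure)."""
    return 1 if rng.random() < p else 0

rng = random.Random(0)  # fixed seed so the run is reproducible
p = 0.4                 # illustrative success probability
samples = [bernoulli_sample(p, rng) for _ in range(100_000)]

# The fraction of successes should be close to p.
frequency = sum(samples) / len(samples)
print(frequency)
```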


Geometric distribution

• X is the number of Bernoulli experiments to the first ‘success’, where the success probability is p

$$P(X=x) = (1-p)^{x-1}\,p$$

• Note: $\sum_{x=1}^{\infty} (1-p)^{x-1}\,p = 1$, as required
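The normalisation of the geometric distribution follows from the geometric series, and the partial sums can be inspected directly. In this sketch the value p = 0.2 is an arbitrary illustrative choice.

```python
def geometric_pmf(x, p):
    """P(X = x) = (1 - p)^(x - 1) * p for x = 1, 2, ... (0 otherwise)."""
    if x < 1:
        return 0.0
    return (1 - p) ** (x - 1) * p

p = 0.2  # illustrative success probability

# The first N terms sum to 1 - (1 - p)^N, which tends to 1 as N grows.
partial_10 = sum(geometric_pmf(x, p) for x in range(1, 11))
partial_200 = sum(geometric_pmf(x, p) for x in range(1, 201))

print(partial_10, 1 - (1 - p) ** 10)
print(partial_200)
```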


Binomial distribution

• X is the number of successes (with success probability p) in n Bernoulli experiments

$$P(X=x) = \binom{n}{x}\,p^x\,(1-p)^{n-x}$$

• Again: sums to 1…
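The binomial probabilities can be tabulated with `math.comb` (available from Python 3.8). The values n = 10 and p = 0.3 in this sketch are illustrative assumptions.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x): probability of x successes in n Bernoulli experiments."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.3  # illustrative values
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(pmf)  # should be 1 up to rounding error
for x, prob in enumerate(pmf):
    print(x, round(prob, 4))
print("sum:", total)
```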


Gaussian distribution

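• E.g. height of random people

• Gaussian density function:

$$P(X=x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

• A ‘bell’ curve centred at μ, with width proportional to σ

• Note:

$$\int_{-\infty}^{\infty} P(X=x)\,dx = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) dx = 1$$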


Exponential density function

• Time to the first future plane crash (i.e. X>0)

• Assume non-zero interval Δx

• Probability of a crash in Δx is p=λΔx for some λ

• Then, the probability that the first crash is at time x is (geometric distribution):

$$P(X=x)\,\Delta x = (1-\lambda\Delta x)^{x/\Delta x - 1}\,\lambda\Delta x$$

• Only valid for small enough Δx (then the probability of >1 crashes in Δx becomes negligible)

• Limit for Δx→0: $P(X=x)\,dx = \lambda \exp(-\lambda x)\,dx$
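The limit can be made tangible by evaluating the geometric expression for shrinking Δx and comparing it with λ exp(−λx). In this sketch the values of `lam` and `x` are arbitrary illustrative choices, and the geometric probability is divided by Δx so that both quantities are densities.

```python
import math

def geometric_density(x, lam, dx):
    """Geometric probability of the first crash in the step at time x, divided by dx."""
    steps = round(x / dx)  # number of intervals of length dx up to time x
    return (1 - lam * dx) ** (steps - 1) * lam  # (lam * dx) / dx = lam

def exponential_density(x, lam):
    """Limiting density lam * exp(-lam * x)."""
    return lam * math.exp(-lam * x)

lam, x = 0.5, 2.0  # illustrative rate and time
errors = {}
for dx in (0.5, 0.1, 0.01, 0.001):
    errors[dx] = abs(geometric_density(x, lam, dx) - exponential_density(x, lam))
    print(dx, geometric_density(x, lam, dx), exponential_density(x, lam))
```

The gap between the two columns shrinks as Δx decreases, which is the numerical counterpart of the limit stated above.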


Exponential probability density

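• Thus (with P the probability density function):

$$P(X=x) = \lambda \exp(-\lambda x)$$

• Exponentially decaying…

• From this, the cumulative exponential distribution function:

$$P(X \le t) = \int_{x=0}^{t} P(X=x)\,dx = 1 - \exp(-\lambda t)$$

• Note:

$$\int_{0}^{\infty} P(X=x)\,dx = \int_{0}^{\infty} \lambda \exp(-\lambda x)\,dx = 1$$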
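The closed-form cumulative distribution can be cross-checked in two ways: by integrating the density numerically, and by counting how many random samples fall below t. The sketch uses the standard-library `random.expovariate`; the values of `lam` and `t` are illustrative assumptions.

```python
import math
import random

def exponential_pdf(x, lam):
    """Density lam * exp(-lam * x) for x >= 0, and 0 otherwise."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def exponential_cdf(t, lam):
    """Cumulative distribution 1 - exp(-lam * t) for t >= 0, and 0 otherwise."""
    return 1.0 - math.exp(-lam * t) if t >= 0 else 0.0

lam, t = 2.0, 0.7  # illustrative values

# Midpoint-rule integral of the density from 0 to t.
steps = 10_000
h = t / steps
numeric = sum(exponential_pdf((i + 0.5) * h, lam) for i in range(steps)) * h

# Fraction of simulated waiting times that are at most t.
rng = random.Random(1)
samples = [rng.expovariate(lam) for _ in range(100_000)]
empirical = sum(1 for s in samples if s <= t) / len(samples)

print(exponential_cdf(t, lam), numeric, empirical)
```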


Poisson distribution

• Distribution over the number of plane crashes in a unit time interval

• Limit of the Binomial distribution:
  – Binomial: n trials, probability p per trial
  – Poisson: n/Δx trials, probability pΔx per trial, in the limit for Δx→0 (work it out!)

• Result (with λ = np, the expected number of events):

$$P(X=x) = \frac{\lambda^x \exp(-\lambda)}{x!}$$
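The limit can be explored numerically before working it out by hand. This sketch keeps the expected count fixed at an illustrative λ = 3 and increases the number of trials, so the per-trial probability λ/n shrinks; the helper `max_gap` is a hypothetical name introduced here.

```python
import math

def binomial_pmf(x, n, p):
    """Probability of x successes in n trials with success probability p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """P(X = x) = lam^x * exp(-lam) / x!"""
    return lam ** x * math.exp(-lam) / math.factorial(x)

def max_gap(n, lam, x_max=10):
    """Largest difference between the two distributions for x = 0..x_max."""
    p = lam / n  # keeps the expected number of successes equal to lam
    return max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(x_max + 1))

lam = 3.0  # illustrative expected count
for n in (10, 100, 10_000):
    print(n, max_gap(n, lam))

poisson_total = sum(poisson_pmf(x, lam) for x in range(51))
print("Poisson sum over x = 0..50:", poisson_total)
```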


Summary

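| Type | Random variable X | Distribution / density function | Parameters |
|---|---|---|---|
| Bernoulli | Binary | $P(X=1) = p$, $P(X=0) = 1-p$ | $p$ |
| Geometric | Positive integer | $P(X=x) = (1-p)^{x-1}\,p$ | $p$ |
| Binomial | 0,1,…,n | $P(X=x) = \binom{n}{x}\,p^x\,(1-p)^{n-x}$ | $n, p$ |
| Gaussian | Real number | $P(X=x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$ | $\mu, \sigma$ |
| Exponential | Positive real | $P(X=x) = \lambda \exp(-\lambda x)$ | $\lambda$ |
| Poisson | Non-negative integer | $P(X=x) = \frac{\lambda^x \exp(-\lambda)}{x!}$ | $\lambda$ |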


Properties of distributions

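• Mean: $\mu = E(X) = \sum_x x\,P(X=x)$ (discrete) or $\int x\,P(X=x)\,dx$ (continuous)

• Variance: $\sigma^2 = E\left((X-\mu)^2\right) = \sum_x (x-\mu)^2\,P(X=x)$ (discrete) or $\int (x-\mu)^2\,P(X=x)\,dx$ (continuous)

• Standard deviation = square root of variance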
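For a discrete distribution the definitions translate directly into sums over the outcomes. The sketch below applies them to a coin with P(X=1)=0.4; the dictionary representation and function names are illustrative assumptions.

```python
import math

def mean(dist):
    """E(X) = sum over x of x * P(X = x)."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """E((X - mu)^2) = sum over x of (x - mu)^2 * P(X = x)."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def standard_deviation(dist):
    """Square root of the variance."""
    return math.sqrt(variance(dist))

coin = {1: 0.4, 0: 0.6}  # outcome -> probability
print(mean(coin), variance(coin), standard_deviation(coin))
```

For this coin the mean equals p and the variance equals p(1−p), which matches the general Bernoulli result.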


Summary

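| Type | Distribution / density function | Mean | Variance |
|---|---|---|---|
| Bernoulli | $P(X=1) = p$, $P(X=0) = 1-p$ | $p$ | $p(1-p)$ |
| Geometric | $P(X=x) = (1-p)^{x-1}\,p$ | $\frac{1}{p}$ | $\frac{1-p}{p^2}$ |
| Binomial | $P(X=x) = \binom{n}{x}\,p^x\,(1-p)^{n-x}$ | $np$ | $np(1-p)$ |
| Gaussian | $P(X=x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$ | $\mu$ | $\sigma^2$ |
| Exponential | $P(X=x) = \lambda \exp(-\lambda x)$ | $\frac{1}{\lambda}$ | $\frac{1}{\lambda^2}$ |
| Poisson | $P(X=x) = \frac{\lambda^x \exp(-\lambda)}{x!}$ | $\lambda$ | $\lambda$ |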
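Entries of such a table can be sanity-checked by summing over a truncated support. The sketch below does this for the geometric and Poisson rows; the parameter values and truncation points are assumptions chosen so that the neglected tail is negligible.

```python
import math

def moments(pmf, support):
    """Return (mean, variance) of a discrete distribution over the given support."""
    mu = sum(x * pmf(x) for x in support)
    var = sum((x - mu) ** 2 * pmf(x) for x in support)
    return mu, var

p, lam = 0.25, 4.0  # illustrative parameters

def geometric_pmf(x):
    return (1 - p) ** (x - 1) * p

def poisson_pmf(x):
    return lam ** x * math.exp(-lam) / math.factorial(x)

geo_mean, geo_var = moments(geometric_pmf, range(1, 500))
poi_mean, poi_var = moments(poisson_pmf, range(0, 100))

print(geo_mean, 1 / p)            # geometric mean vs 1/p
print(geo_var, (1 - p) / p ** 2)  # geometric variance vs (1-p)/p^2
print(poi_mean, poi_var, lam)     # Poisson mean and variance vs lambda
```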


Lab session

• Compute the conditional probability density and expectation for the exponential density, conditional on X>t. (report)

• Complete the tables in these lecture notes with a uniform distribution and uniform density. (report)

• Compute the cumulative distributions of all distributions discussed (or find them on Wikipedia!)

• Plot the exponential distribution for 3 different values of lambda, as well as the cumulative exponential distribution. (report)

• Plot the Poisson distribution for 3 different values of lambda, as well as the cumulative Poisson distribution.

• Randomly sample n=10 passengers, do this N=1000 times, and plot a histogram of how many of these 10 passengers are in third class in each of the 1000 randomisations. Which distribution does this follow? (report)

• Make a histogram of the temperatures in all January months. Which distribution does this follow? (report)

$$P(X=x \mid X>t) = \frac{P(X=x,\, X>t)}{P(X>t)} = \frac{P(X=x)}{P(X>t)} \quad \text{for } x > t$$
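The passenger-sampling exercise can be set up along the following lines. The course data set is not reproduced here, so `passenger_classes` is a purely hypothetical stand-in (1000 passengers, half of them in third class) and should be replaced by the real class column; the function name and seed are likewise assumptions. The histogram is printed as text to keep the sketch dependency-free.

```python
import random
from collections import Counter

# HYPOTHETICAL stand-in for the real data: ticket class of each passenger.
passenger_classes = [1] * 250 + [2] * 250 + [3] * 500

def third_class_counts(classes, n=10, N=1000, seed=0):
    """Repeat N times: draw n distinct passengers and count those in third class."""
    rng = random.Random(seed)
    counts = []
    for _ in range(N):
        sample = rng.sample(classes, n)  # sampling without replacement
        counts.append(sum(1 for c in sample if c == 3))
    return counts

counts = third_class_counts(passenger_classes)
histogram = Counter(counts)

# Text histogram: one '#' per 5 randomisations.
for k in range(11):
    print(f"{k:2d} {'#' * (histogram[k] // 5)}")
```

Comparing the shape of this histogram with the distributions in the summary tables is then the actual exercise.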