Data Analysis
Class 4: Probability distributions and densities
Random variables
• Binary (e.g. heads = 1, tails = 0; plane crashes = 1, does not crash = 0)
• Discrete (e.g. number of heads in a series of coin tosses; number of plane crashes in a given time span)
• Continuous (e.g. time to next plane crash; height of a person)
• Vectorial
• …
Probability distribution
• For binary/discrete random variables X: P(X=x) = ?
• Specification of this for all x is a probability distribution
• Condition:
$$\sum_x P(X=x) = 1$$
Probability distribution
• For example, for a coin flip: P(X=1)=0.4, and P(X=0)=0.6
• Note: P(X=1)+P(X=0)=1
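The condition on a probability distribution can be checked mechanically. The following is a minimal sketch, assuming the distribution is stored as a Python dictionary mapping outcomes to probabilities; the helper name `is_distribution` is hypothetical and only used for illustration.

```python
import math

# Biased coin from the slide: P(X=1)=0.4 and P(X=0)=0.6.
coin = {1: 0.4, 0: 0.6}

def is_distribution(pmf, tol=1e-12):
    """Return True if all probabilities are non-negative and sum to 1."""
    non_negative = all(p >= 0 for p in pmf.values())
    sums_to_one = math.isclose(sum(pmf.values()), 1.0, abs_tol=tol)
    return non_negative and sums_to_one

print(is_distribution(coin))              # the coin from the slide
print(is_distribution({1: 0.4, 0: 0.5}))  # probabilities that do not sum to 1
```

The same check applies to any discrete random variable with finitely many outcomes.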
Probability density function
• For continuous random variables X:
• P(X=x), defined in the same way, is 0 (there are infinitely many other possible outcomes)
• So we define P(X=x) in a different way, such that P(X=x) dx is the probability that X falls in the infinitesimal interval [x, x+dx]
• Note: this probability is indeed infinitesimally small
• Condition:
$$\int_x P(X=x)\,dx = 1$$
Probability density function
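• E.g. height of random people
• Gaussian distribution:
$$P(X=x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
• A 'bell' centred around μ and with width proportional to σ
• Note:
$$\int_x P(X=x)\,dx = \int_x \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) dx = 1$$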
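The normalisation of the Gaussian density can be verified numerically. This is a rough sketch using a simple midpoint rule; the values of `mu` and `sigma` (a height in cm and its spread) are made-up illustration values, and the integration range of ten standard deviations on either side is an assumption that makes the neglected tails negligible.

```python
import math

def gaussian_density(x, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def integrate(f, a, b, steps=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

mu, sigma = 175.0, 10.0  # hypothetical values, not from the slides
total = integrate(lambda x: gaussian_density(x, mu, sigma),
                  mu - 10 * sigma, mu + 10 * sigma)
print(total)  # should be very close to 1
```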
Some probability distributions
• Bernoulli distribution (binary)
• Geometric distribution (discrete)
• Binomial distribution (discrete)
• Gaussian distribution (continuous)
• Exponential distribution (continuous)
• Poisson distribution (discrete)
Bernoulli distribution
• X is a binary random variable (success/1 versus failure/0)
• Distribution:
$$P(X=1) = p, \qquad P(X=0) = 1 - p$$
• For example:
– Biased coin
– Whether a given plane crashes
Geometric distribution
• X is the number of Bernoulli experiments to the first ‘success’, where the success probability is p
• Distribution:
$$P(X=x) = (1-p)^{x-1}\,p$$
• Note:
$$\sum_{x=1}^{\infty} (1-p)^{x-1}\,p = 1$$
as required
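The normalisation of the geometric distribution follows from the geometric series. A short derivation, assuming 0 < p ≤ 1 and writing q = 1 − p (the symbols q and k are introduced here for convenience and do not appear in the slides):

```latex
\begin{align*}
\sum_{x=1}^{\infty} (1-p)^{x-1}\,p
  &= p \sum_{k=0}^{\infty} q^{k}
     && (k = x-1,\ q = 1-p) \\
  &= p \cdot \frac{1}{1-q}
     && (\text{geometric series, } 0 \le q < 1) \\
  &= \frac{p}{p} = 1 .
\end{align*}
```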
Binomial distribution
• X is the number of successes (with success probability p) in n Bernoulli experiments
• Distribution:
$$P(X=x) = \binom{n}{x}\, p^x (1-p)^{n-x}$$
• Again: sums to 1…
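That the binomial probabilities sum to 1 can be checked directly for a concrete case. A minimal sketch, assuming illustrative values n = 10 and p = 0.3 that are not taken from the slides:

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n Bernoulli experiments."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.3  # hypothetical example values
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
print(sum(probs))  # should be 1 up to floating-point rounding
```

For n = 1 the function reduces to the Bernoulli distribution: `binomial_pmf(1, 1, p)` is p and `binomial_pmf(0, 1, p)` is 1 − p.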
Gaussian distribution
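• E.g. height of random people
• Gaussian density function:
$$P(X=x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
• A 'bell' centred around μ and with width proportional to σ
• Note:
$$\int_x P(X=x)\,dx = \int_x \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) dx = 1$$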
Exponential density function
• Time to the first future plane crash (i.e. X>0)
• Assume non-zero interval Δx
• Probability of a crash in Δx is p=λΔx for some λ
• Then, the probability that the first crash is at time x is (geometric distribution):
$$P(X=x)\,\Delta x = (1-\lambda\Delta x)^{x/\Delta x - 1}\,\lambda\Delta x$$
• Only valid for small enough Δx (then the probability of >1 crashes in Δx becomes negligible)
• Limit for Δx→0:
$$P(X=x)\,dx = \lambda \exp(-\lambda x)\,dx$$
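The limit can be made tangible by shrinking Δx numerically. The sketch below evaluates the geometric expression divided by Δx for decreasing interval widths and compares it with λ exp(−λx); the values of `lam` and `x` are arbitrary illustration values, and `geometric_density` is a hypothetical helper name.

```python
import math

def geometric_density(x, lam, dx):
    """Geometric probability of the first crash at time x, divided by dx."""
    n_intervals = x / dx  # number of intervals of width dx up to time x
    return (1 - lam * dx) ** (n_intervals - 1) * lam

lam, x = 0.5, 2.0  # hypothetical rate and time
for dx in (0.1, 0.01, 0.001, 0.0001):
    print(dx, geometric_density(x, lam, dx))
print("limit:", lam * math.exp(-lam * x))
```

The printed values should approach the limit as Δx decreases, which is the behaviour the derivation relies on.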
Exponential probability density
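• Thus (with P the probability density function):
$$P(X=x) = \lambda \exp(-\lambda x)$$
• Exponentially decaying…
• From this, the cumulative exponential distribution function:
$$P(X<t) = \int_{x=0}^{t} P(X=x)\,dx = 1 - \exp(-\lambda t)$$
• Note:
$$\int_x P(X=x)\,dx = \int_{x=0}^{\infty} \lambda \exp(-\lambda x)\,dx = 1$$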
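The relation between the exponential density and its cumulative distribution function can be checked by integrating the density numerically. A minimal sketch with a midpoint rule; `lam` and `t` are arbitrary illustration values and the function names are hypothetical.

```python
import math

def exponential_density(x, lam):
    """Exponential density lam * exp(-lam * x) for x >= 0, and 0 otherwise."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def exponential_cdf(t, lam):
    """Cumulative distribution function 1 - exp(-lam * t) for t >= 0."""
    return 1.0 - math.exp(-lam * t) if t >= 0 else 0.0

def integrate(f, a, b, steps=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

lam, t = 1.5, 3.0  # hypothetical values
numeric = integrate(lambda x: exponential_density(x, lam), 0.0, t)
print(numeric, exponential_cdf(t, lam))  # the two numbers should agree closely
```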
Poisson distribution
• Distribution over the number of plane crashes in a unit time interval
• Limit of the Binomial distribution:
– Binomial: n trials, probability p per trial
– Poisson: n/Δx trials, probability pΔx per trial, in the limit for Δx→0 (work it out!)
• Result, with λ = np:
$$P(X=x) = \frac{\lambda^x \exp(-\lambda)}{x!}$$
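The convergence of the binomial to the Poisson distribution can be observed numerically without working out the limit by hand. In this sketch the expected count `lam` is held fixed while the number of trials grows and the per-trial probability shrinks as `lam / n_trials`, which plays the role of n/Δx trials with probability pΔx; the value of `lam` and the range of x compared are assumptions for illustration.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n Bernoulli experiments."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """Poisson probability of x events when lam events are expected."""
    return lam ** x * exp(-lam) / factorial(x)

lam = 2.0  # hypothetical expected number of events per unit interval
gaps = {}
for n_trials in (10, 100, 1000, 10000):
    p = lam / n_trials
    gaps[n_trials] = max(abs(binomial_pmf(x, n_trials, p) - poisson_pmf(x, lam))
                         for x in range(11))
    print(n_trials, gaps[n_trials])
```

The largest difference between the two distributions should shrink as the number of trials increases.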
Summary
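Type | Random variable X | Distribution / density function | Parameters
Bernoulli | Binary | $P(X=1) = p,\ P(X=0) = 1-p$ | $p$
Geometric | Positive integer | $P(X=x) = (1-p)^{x-1}\,p$ | $p$
Binomial | 0,1,…,n | $P(X=x) = \binom{n}{x}\, p^x (1-p)^{n-x}$ | $n, p$
Gaussian | Real number | $P(X=x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$ | $\mu, \sigma$
Exponential | Positive real | $P(X=x) = \lambda \exp(-\lambda x)$ | $\lambda$
Poisson | Non-negative integer | $P(X=x) = \frac{\lambda^x \exp(-\lambda)}{x!}$ | $\lambda$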
Properties of distributions
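• Mean (sum for discrete X, integral for continuous X):
$$\mu_X = E(X) = \sum_x x\,P(X=x) \quad\text{or}\quad \int_x x\,P(X=x)\,dx$$
• Variance:
$$\sigma_X^2 = E\left((X-\mu_X)^2\right) = \sum_x (x-\mu_X)^2\,P(X=x) \quad\text{or}\quad \int_x (x-\mu_X)^2\,P(X=x)\,dx$$
• Standard deviation = square root of variance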
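For a discrete random variable the definitions of mean and variance translate directly into sums. A minimal sketch, assuming the distribution is given as a dictionary from outcomes to probabilities and reusing the biased coin with P(X=1)=0.4 as the example; the function names are hypothetical.

```python
import math

def mean(pmf):
    """Mean: sum over x of x * P(X=x)."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Variance: sum over x of (x - mean)^2 * P(X=x)."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

coin = {1: 0.4, 0: 0.6}  # biased coin from the earlier slide
mu = mean(coin)
var = variance(coin)
std = math.sqrt(var)  # standard deviation = square root of variance
print(mu, var, std)
```

For this coin the mean equals p = 0.4 and the variance equals p(1 − p) = 0.24, up to floating-point rounding.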
Summary
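Type | Distribution / density function | Mean | Variance
Bernoulli | $P(X=1) = p,\ P(X=0) = 1-p$ | $p$ | $p(1-p)$
Geometric | $P(X=x) = (1-p)^{x-1}\,p$ | $\frac{1}{p}$ | $\frac{1-p}{p^2}$
Binomial | $P(X=x) = \binom{n}{x}\, p^x (1-p)^{n-x}$ | $np$ | $np(1-p)$
Gaussian | $P(X=x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$ | $\mu$ | $\sigma^2$
Exponential | $P(X=x) = \lambda \exp(-\lambda x)$ | $\frac{1}{\lambda}$ | $\frac{1}{\lambda^2}$
Poisson | $P(X=x) = \frac{\lambda^x \exp(-\lambda)}{x!}$ | $\lambda$ | $\lambda$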
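The tabulated means and variances can be sanity-checked by sampling. The sketch below draws from an exponential density using the standard library's `random.expovariate` and compares the sample mean and variance with 1/λ and 1/λ²; the value of `lam`, the sample size, and the fixed seed are arbitrary choices made only so that the run is reproducible.

```python
import random

def sample_mean_and_variance(samples):
    """Return the sample mean and the (population-style) sample variance."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((s - mean) ** 2 for s in samples) / n
    return mean, variance

rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
lam = 2.0               # hypothetical rate parameter
samples = [rng.expovariate(lam) for _ in range(100_000)]
mean, variance = sample_mean_and_variance(samples)
print(mean, 1 / lam)           # sample mean versus 1/lambda
print(variance, 1 / lam ** 2)  # sample variance versus 1/lambda^2
```

The agreement is only approximate because of sampling noise, and it improves with more samples.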
Lab session
• Compute the conditional probability density and expectation for the exponential density, conditional on X>t. (report)
• Complete the tables in these lecture notes with a uniform distribution and uniform density. (report)
• Compute the cumulative distributions of all distributions discussed (or find them on Wikipedia!)
• Plot the exponential distribution for 3 different values of lambda, as well as the cumulative exponential distribution. (report)
• Plot the Poisson distribution for 3 different values of lambda, as well as the cumulative Poisson distribution.
• Randomly sample n=10 passengers, do this N=1000 times, and plot a histogram of how many of these 10 passengers are in third class in each of the 1000 randomisations. Which distribution does this follow? (report)
• Make a histogram of the temperatures in all January months. Which distribution does this follow? (report)
Conditional density (for x > t):
$$P(X=x \mid X>t) = \frac{P(X=x,\ X>t)}{P(X>t)} = \frac{P(X=x)}{P(X>t)}$$