
Probability and statistics crash course

http://www.comp.leeds.ac.uk/hannah/mathsclub

Probability 1 (for dummies:-)

Stats 1 (averages and deviations)

Probability 2 (Trials and distributions)

Stats 2 (significance)

Stats 3 (errors)

– p. 1/19


Random variables

A random variable is an abstraction: it is how probability theorists refer to things that can take more than one state, usually with associated probabilities.

A fair 6 sided dice can be modelled as a random variable with the possible outcomes {1, 2, 3, 4, 5, 6}. The probability of each state is 1/6.

Random variables can be discrete, like dice. . .

. . . or continuous, like the height of a sunflower

Continuous random variables can take infinitely many different values
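The discrete case above can be written down directly. A minimal Python sketch (illustrative only; the slides themselves contain no code) modelling the dice as a set of states with associated probabilities:

```python
# A discrete random variable: a set of states, each with a probability.
# Here, a fair six-sided dice.
die_pmf = {outcome: 1 / 6 for outcome in range(1, 7)}

# The probabilities over all possible states must sum to 1.
total = sum(die_pmf.values())
```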

– p. 2/19


Probability functions

For discrete random variables, we can represent the probability that each possible state occurs with a probability mass function, or pmf.

For continuous random variables, we can represent the distribution of probabilities with a probability density function, or pdf.

To find the probability of a particular continuous random variable falling between two values, calculate the area under the pdf between those values.
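The "area under the pdf" idea can be checked numerically. A hedged Python sketch (not from the slides), using an exponential pdf as a stand-in and the midpoint rule for the area:

```python
import math

def exp_pdf(x, lam=0.5):
    """pdf of an exponential distribution, used here as an example pdf."""
    return lam * math.exp(-lam * x)

def area_under_pdf(pdf, a, b, steps=100_000):
    """Approximate the area under pdf between a and b (midpoint rule)."""
    h = (b - a) / steps
    return sum(pdf(a + (i + 0.5) * h) for i in range(steps)) * h

# P(1 <= X <= 3) for an exponential with lambda = 0.5...
prob = area_under_pdf(exp_pdf, 1.0, 3.0)
# ...which has the closed form e^(-lambda*a) - e^(-lambda*b) for comparison.
exact = math.exp(-0.5 * 1.0) - math.exp(-0.5 * 3.0)
```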

– p. 3/19


Bernoulli trials

Any experiment where there’s a random outcome that can be classed as either “success” or “failure” is known as a Bernoulli trial. A discrete random variable with 2 values is all you need.

Heads or tails?

Female or male?

Throwing a 6?

The Expectation E of a Bernoulli distribution with probability p is E(X) = p, and the variance V(X) = p(1 − p).
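Both formulas can be checked by direct enumeration over the two states, since E(X) = Σ x·P(x) and V(X) = Σ (x − E(X))²·P(x). A small sketch (the value p = 0.3 is an arbitrary choice for illustration):

```python
p = 0.3  # arbitrary success probability, chosen for illustration
pmf = {0: 1 - p, 1: p}  # the two states of a Bernoulli variable

# Expectation: sum of value times probability over both states.
mean = sum(x * prob for x, prob in pmf.items())
# Variance: expected squared deviation from the mean.
variance = sum((x - mean) ** 2 * prob for x, prob in pmf.items())
# These should come out as p and p*(1 - p) respectively.
```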

– p. 4/19


Bernoulli distribution

The Bernoulli distribution is the distribution of outcomes in a Bernoulli trial. It’s really very simple. It’s indexed by a single parameter p, which is the probability of success.

Figure 1: pmf of a Bernoulli trial with p=0.2

– p. 5/19


Trials

Think about throwing a dice repeatedly. How long are you likely to have to wait to get a 6?

Figure 2: Waiting for a 6

– p. 6/19


Trials 2

Think about throwing a dice 50 times. How many sixes will you get?

Figure 3: Counting the number of 6s in 50 throws, 1000 times

– p. 7/19


Distributions

The previous two slides are examples of the kind of distribution you get when you ask particular questions about a Bernoulli variable, and then carry out the experiments to see what happens.

If you want to play with the parameters, the C++ code to generate the data is on the web at http://www.comp.leeds.ac.uk/hannah/mathsclub

Perhaps unsurprisingly, these distributions can be modelledtheoretically...
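The C++ code on the course page is not reproduced here, but the two experiments are simple enough to sketch in Python (an illustrative re-implementation, with a fixed seed so it is reproducible):

```python
import random

rng = random.Random(1)  # fixed seed for reproducibility

def wait_for_six():
    """Number of throws until the first 6 (the 'waiting for a 6' experiment)."""
    throws = 1
    while rng.randint(1, 6) != 6:
        throws += 1
    return throws

def sixes_in_50_throws():
    """Number of 6s in 50 throws (the 'counting' experiment)."""
    return sum(1 for _ in range(50) if rng.randint(1, 6) == 6)

waits = [wait_for_six() for _ in range(10_000)]
counts = [sixes_in_50_throws() for _ in range(1_000)]

mean_wait = sum(waits) / len(waits)     # theory says this is near 1/p = 6
mean_count = sum(counts) / len(counts)  # theory says this is near 50/6
```

Histograms of `waits` and `counts` reproduce the shapes of Figures 2 and 3.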

– p. 8/19


Waiting for a 6 revisited

The probability of getting a 6 on the first throw is 1/6.

The probability of NOT getting a 6 on the first throw then getting one on the second throw is (5/6) × (1/6).

The probability of NOT getting a 6 on the first or second throws then getting one on the third throw is (5/6)² × (1/6).

– p. 9/19


Geometric distribution

Waiting for success in a Bernoulli trial is governed by a Geometric distribution.

P(X = j) = (1 − p)^(j−1) p
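The pmf is easy to evaluate directly. A small sketch (not from the slides) checking that the probabilities sum to 1 and that the expected waiting time is 1/p = 6 for the dice example:

```python
p = 1 / 6  # probability of rolling a 6 on any one throw

def geometric_pmf(j, p):
    """P(X = j): j - 1 failures followed by a success."""
    return (1 - p) ** (j - 1) * p

# Summing over a generous range of j captures essentially all the probability...
total = sum(geometric_pmf(j, p) for j in range(1, 1000))
# ...and the expected waiting time E(X) = sum of j * P(X = j) should be 1/p.
mean = sum(j * geometric_pmf(j, p) for j in range(1, 1000))
```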

– p. 10/19


Probability distribution: Geometric

Figure 4: Probability of rolling a 6 in X throws; theoretical

Note: Unlike a few slides back, this pmf sums to 1, and is a bit tidier!

– p. 11/19


Binomial distribution

The pmf for a sum of n independent Bernoulli random variables with success probability p is the Binomial distribution.

p(x) = (n choose x) p^x (1 − p)^(n−x)

As you will remember from a few weeks back, “n choose x” is defined as

(n choose x) = n! / (x! (n − x)!)
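This pmf gives the theoretical counterpart of the "sixes in 50 throws" experiment. A sketch (illustrative, using Python's built-in `math.comb` for n choose x) checking that the pmf sums to 1 and has mean n·p:

```python
import math

n, p = 50, 1 / 6  # 50 dice throws, "success" = rolling a 6

def binomial_pmf(x, n, p):
    """P(exactly x successes in n trials)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

total = sum(binomial_pmf(x, n, p) for x in range(n + 1))     # sums to 1
mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))  # equals n * p
```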

– p. 12/19


pdfs for continuous random variables

Three main types of pdf are found with continuous random variables. There are more, but these are the three big ones.

Continuous uniform distribution

Exponential distribution

Normal distribution (the famed Gaussian)

The probability of something falling between two values in one of these distributions is the area under the pdf between those values.

– p. 13/19


Continuous uniform distribution

A continuous uniform distribution on the interval [a, b] has pdf given by

f(x) = 1/(b − a)

At all points the density is the same, equal to 1/(b − a).
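Because the density is constant, areas under a uniform pdf are just rectangles. A tiny sketch (values chosen for illustration):

```python
a, b = 2.0, 10.0  # interval of the uniform distribution

density = 1 / (b - a)  # constant density at every point of [a, b]

def uniform_prob(c, d, a, b):
    """P(c <= X <= d): area of a rectangle of width d - c, height 1/(b - a)."""
    return (d - c) / (b - a)

prob = uniform_prob(3.0, 5.0, a, b)  # width 2, height 1/8, so area 0.25
```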

– p. 14/19


Exponential distribution

The pdf of a continuous random variable can sometimes be modelled as an exponential distribution

Figure 5: Exponential distributions: lambda=0.2 and 0.5

– p. 15/19


Exponential distribution: The sums bit

f(x) = λe−λx

Mean of an exponential distribution = 1/λ.

Variance of an exponential distribution = 1/λ².
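These two results can be checked numerically, since the mean is the integral of x·f(x) and the variance is the integral of x²·f(x) minus the mean squared. A sketch (not from the slides) for λ = 0.5, where theory gives mean 2 and variance 4:

```python
import math

lam = 0.5

def exp_pdf(x):
    return lam * math.exp(-lam * x)

def integrate(g, a, b, steps=200_000):
    """Midpoint-rule integral of g over [a, b]."""
    h = (b - a) / steps
    return sum(g(a + (i + 0.5) * h) for i in range(steps)) * h

# The tail beyond x = 60 is negligible for lambda = 0.5.
mean = integrate(lambda x: x * exp_pdf(x), 0.0, 60.0)          # 1/lambda
second_moment = integrate(lambda x: x * x * exp_pdf(x), 0.0, 60.0)
variance = second_moment - mean ** 2                           # 1/lambda^2
```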

– p. 16/19


Normal distribution

Most things are normally distributed (blanket statement alert). A normal distribution (Gaussian) is defined by two parameters: mean and standard deviation.

Figure 6: Normal distributions: sd=2, 4 and 1, means at either 5 or 7

– p. 17/19


Normal distribution: The sums bit

The random variable X is normally distributed with mean µ and variance σ² if the pdf is given by...

f(x) = (1/(σ√(2π))) e^(−(1/2)((x − µ)/σ)²)

If the normal distribution has mean 0 and variance 1, it’s called the Standard Normal distribution and is referred to as Z.
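The pdf and the standardisation z = (x − µ)/σ are both one-liners. A sketch (illustrative; not from the slides):

```python
import math

def normal_pdf(x, mu, sigma):
    """pdf of a normal distribution with mean mu and standard deviation sigma."""
    coeff = 1 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-0.5 * ((x - mu) / sigma) ** 2)

# The standard normal Z peaks at x = 0 with height 1/sqrt(2*pi).
peak = normal_pdf(0.0, 0.0, 1.0)

def standardise(x, mu, sigma):
    """Map a value of any normal X onto the standard normal Z."""
    return (x - mu) / sigma
```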

– p. 18/19


Central Limit Theorem

The central limit theorem is what a lot of statistics is based upon.

The distribution of an average is approximately normal (for a large enough sample), even if the distribution from which the average is drawn is totally strange

This bit is magic, and probably best addressed with an animation: http://www.statisticalengineering.com/central_limit_theorem.htm
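The animation's effect can also be seen numerically. A seeded sketch (not from the slides) averaging uniform(0, 1) draws, whose means should behave like a normal with mean 0.5 and standard deviation √(1/12)/√n:

```python
import random

rng = random.Random(42)  # fixed seed for reproducibility

# Averages of n uniform(0, 1) samples; by the CLT these averages look
# normal with mean 0.5 and sd sqrt(1/12)/sqrt(n), whatever the source shape.
n, trials = 30, 5_000
means = [sum(rng.random() for _ in range(n)) / n for _ in range(trials)]

grand_mean = sum(means) / trials
sd_of_means = (sum((m - grand_mean) ** 2 for m in means) / trials) ** 0.5
# Theory predicts sd_of_means near sqrt(1/12)/sqrt(30), about 0.053.
```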

– p. 19/19