Theoretical Distributions in Probability and Statistics


Page 1: Theoretical Distributions in Probability and Statistics

Theoretical Distributions in Probability and Statistics


Decision-making

In a large family where it is known that there is genetic pre-disposition to suffer from diabetes, how many children out of a possible 7 are likely to be affected by diabetes?

A hospital administrator needs to decide how many people to staff the Accident and Emergency Department of the hospital during 9am to 12pm on weekdays. How should the administrator decide?

In 2004, the World Health Organisation (WHO) revised the body-mass index (BMI) definitions for overweight and obese individuals in Asian populations. Instead of a BMI range of 25 – 29 for defining overweight, and a BMI range of > 30 for defining obese (as are used in Caucasian populations), the corresponding ranges for Asian populations are 23 – 27.5 and > 27.5. How did the scientists at WHO decide on the new ranges?

Modeling the outcome variable with some appropriate theoretical framework


1. Data checking, identifying problems and characteristics

2. Understanding chance and uncertainty

3. How will the data for one attribute behave, in a theoretical framework?

Data exploration and Statistical analysis


Data → Data exploration (categorical / numerical outcomes) → Model each outcome with a theoretical distribution


Random variable

Definition: A random variable is a theoretical consideration of the possible outcome of an event.

Example: In a survey of 5 students, how many female students are there?

The answer to this is a random variable. The possible outcomes are 0, 1, 2, 3, 4 or 5 female students. So the random variable describes what the answer could have been, prior to finding out the actual answer.

Suppose we know that out of 5 students, there are 4 girls. Then there is no longer any uncertainty or variability: the exact answer is known, and thus this is no longer a random variable.


Discrete random variables


Probability mass function

Definition: The PMF describes the probability of the possible events for a random outcome.

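Properties of a probability function: (i) 0 ≤ P(X = x) ≤ 1 for every possible value x; (ii) the probabilities over all possible values of X sum to 1.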


Example 1: Let X denote the number of heads obtained when an unbiased coin is tossed 3 times. Find the probability distribution of X. Find also P(|X – 2| 1.2).
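A minimal Python sketch of Example 1, enumerating the eight equally likely outcomes of three fair tosses to obtain the distribution of X. The inequality sign in the last part of the question did not survive in the text above; the final lines assume it was "≤", which is only a guess, so the complementary probability is computed as well.

```python
from fractions import Fraction
from itertools import product

# Enumerate all 2^3 equally likely outcomes of three fair coin tosses.
outcomes = list(product("HT", repeat=3))

# PMF of X = number of heads among the three tosses.
pmf = {k: Fraction(0) for k in range(4)}
for outcome in outcomes:
    pmf[outcome.count("H")] += Fraction(1, len(outcomes))

for k, p in pmf.items():
    print(f"P(X = {k}) = {p}")

# ASSUMPTION: the lost sign is "<="; the opposite case is the complement.
p_within = sum(p for k, p in pmf.items() if abs(k - 2) <= 1.2)
p_outside = 1 - p_within
print("P(|X - 2| <= 1.2) =", p_within)
print("P(|X - 2| >  1.2) =", p_outside)
```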


Cumulative distribution function

Definition: The CDF gives the cumulative probability that the random variable takes a value at or below a given point, and is formally defined as F(x) = P(X ≤ x) for any real x.

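Properties of a CDF: (i) 0 ≤ F(x) ≤ 1 for all x; (ii) F is non-decreasing in x; (iii) F(x) → 0 as x → –∞ and F(x) → 1 as x → ∞.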


Example 2:


Uniform Distribution

Definition: A random variable is said to follow a Uniform distribution if all of the possible outcomes are equally likely.

Mathematically: P(X = x) = constant.

So if there are n possible outcomes, the chance of each of the outcomes is 1 / n.


Example 3: In a game of chance, a gambler chooses an integer between 13 and 18 inclusive (including 13 and 18). There are equal chances for any number in the set {13, 14, 15, 16, 17, 18} to be drawn. Let X be the random variable denoting the number drawn. Find the probability distribution of X and also P(X < 16).
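As a quick check of Example 3, the sketch below builds the discrete uniform distribution on {13, …, 18} and sums the probabilities of the values below 16. The variable names are illustrative only.

```python
from fractions import Fraction

# Six equally likely values: 13, 14, ..., 18 (range excludes the upper bound).
values = range(13, 19)
pmf = {x: Fraction(1, len(values)) for x in values}

# P(X < 16) adds up the probabilities of 13, 14 and 15.
p_less_than_16 = sum(p for x, p in pmf.items() if x < 16)

for x, p in pmf.items():
    print(f"P(X = {x}) = {p}")
print("P(X < 16) =", p_less_than_16)
```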


Bernoulli Distribution

A random experiment with two possible outcomes, conveniently defined as “success” or “failure” is called a Bernoulli trial after Jacob Bernoulli (1654 – 1705). The choice of the event as “success” or “failure” is completely arbitrary.


Example: a toss of a coin will show either a head or a tail. The “success” event can be either the head, or the tail.

Conventionally, p denotes the probability of success and 1 – p denotes the probability of failure.


Binomial Distribution

The number of “success” events out of n repeated trials, each trial resulting in 2 mutually exclusive outcomes with the repeated trials being mutually independent, follows a Binomial distribution.


Example 4: A batch of pregnancy test kits contains 50 kits, of which 10% are known to be defective. If 3 test kits are randomly chosen with replacement from the batch, what is the probability that:

(i) all will be defective;
(ii) none will be defective;
(iii) at least one will be defective;
(iv) exactly one will be defective;
(v) exactly two will be defective;
(vi) not more than two will be defective.
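Because the kits are drawn with replacement, the three draws can be treated as independent Bernoulli trials with p = 0.10, so the number of defective kits is Binomial(3, 0.1). The sketch below works through parts (i) to (vi) under that reading; `binomial_pmf` is a helper name introduced here for illustration.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 3, 0.10
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]

answers = {
    "(i) all defective": probs[3],
    "(ii) none defective": probs[0],
    "(iii) at least one defective": 1 - probs[0],
    "(iv) exactly one defective": probs[1],
    "(v) exactly two defective": probs[2],
    "(vi) not more than two defective": 1 - probs[3],
}
for label, value in answers.items():
    print(f"{label}: {value:.3f}")
```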


Multinomial Distribution

The Binomial distribution has been used to obtain probabilities for the number of times an event of interest (out of 2 possible events) occurs when the same experiment is repeated several times.

Sometimes one is interested to count the number of occurrences of several events simultaneously. In such a situation the multinomial distribution is useful.

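Assuming there are k possible outcomes, and E1, E2, …, Ek denote the corresponding number of occurrences of each of the possible outcomes out of a total of n events, then

$$P(E_1 = x_1, E_2 = x_2, \ldots, E_k = x_k) = \frac{n!}{x_1!\, x_2! \cdots x_k!}\; p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}, \qquad x_1 + x_2 + \cdots + x_k = n,$$

with pi denoting the probability that a single trial results in outcome i (so that p1 + p2 + … + pk = 1).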


Example 5: When snapdragons with pink flowers are crossed, a randomly chosen offspring has either red (with prob. 0.25), pink (with prob. 0.50) or white (with prob. 0.25) flowers. What is the probability that among 10 randomly chosen seeds, 3 will develop white flowers, 2 red ones and 5 pink flowers?
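Example 5 can be evaluated with the multinomial probability n! / (x1! … xk!) × p1^x1 … pk^xk. The sketch below is one way to compute it; `multinomial_pmf` is a name introduced here, and the seeds are assumed to develop independently.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """Probability of observing the given counts in sum(counts) independent trials."""
    n = sum(counts)
    coefficient = factorial(n)
    for c in counts:
        coefficient //= factorial(c)  # stays an exact integer at every step
    return coefficient * prod(p**c for p, c in zip(probs, counts))

# Order of categories: red, pink, white.
p_example5 = multinomial_pmf(counts=[2, 5, 3], probs=[0.25, 0.50, 0.25])
print(f"P(2 red, 5 pink, 3 white) = {p_example5:.4f}")
```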


Poisson Distribution

The Poisson distribution is usually used to calculate the probabilities of a number of occurrences of a rare event. Often these cases are such that an event can occur repeatedly over a long period of time or over a large area; the distribution applies to the number of occurrences in a small interval of time or over a small area.

Example: machine breakdowns, arrivals of calls at a telephone exchange, faults developing in a pipeline, random arrival of customers at a service station, accident occurrences, radioactive decay, gene mutations at a particular locus


Assumptions of a Poisson Distribution

• The outcomes occur randomly.

• The number of outcomes occurring in one time interval or specified region is independent of the number that occur in any other disjoint time interval or region.

• The probability that a single outcome will occur during a very short time interval or in a small region is very small and constant.

• The probability of 2 or more outcomes occurring in such a short time interval or falling in such a small region is negligible.


Properties of a Poisson Distribution

(A) If X ~ Binomial(n, p), then X → Poisson(np) as n → ∞, p → 0, with np constant. That is, the Poisson distribution arises as the limiting case of the Binomial distribution.

(B) Suppose that X1 and X2 are independent random variables with X1 ~ Poisson(λ1) and X2 ~ Poisson(λ2), then Y = X1 + X2 ~ Poisson(λ1 + λ2). That is, the sum of two independent Poisson random variables also has a Poisson distribution.
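Property (A) can be illustrated numerically. The sketch below holds np fixed at an arbitrarily chosen value of 2 and shows the largest pointwise gap between the Binomial(n, p) and Poisson(np) probabilities shrinking as n grows. The helper names and the choice of n values are assumptions made for the illustration.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

lam = 2.0  # np is held constant at this (hypothetical) value
gaps = []
for n in (10, 100, 1000):
    p = lam / n
    # Largest absolute difference between the two PMFs over k = 0..10.
    gap = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(11))
    gaps.append(gap)
    print(f"n = {n:5d}, p = {p:.4f}, max |Binomial - Poisson| = {gap:.6f}")
```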


Example 6: The number of emergency admissions each day to a hospital is found to have a Poisson distribution with mean 2.

a) Evaluate the probability that on a particular day there will be no emergency admissions.

b) At the beginning of one day, the hospital has 5 beds available for emergencies. Calculate the probability that this will be an insufficient number for the day.

c) Calculate the probability that there will be exactly 3 admissions altogether on two consecutive days.
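A short sketch of Example 6, assuming admissions on different days are independent so that the two-day total is Poisson with mean 4 (the additive property of independent Poisson variables). The helper names are introduced here for illustration.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))

daily_mean = 2.0

# (a) no emergency admissions on a given day
p_none = poisson_pmf(0, daily_mean)

# (b) 5 beds are insufficient when more than 5 patients are admitted
p_more_than_5 = 1 - poisson_cdf(5, daily_mean)

# (c) total over two independent days ~ Poisson(2 + 2)
p_three_in_two_days = poisson_pmf(3, 2 * daily_mean)

print(f"(a) {p_none:.4f}")
print(f"(b) {p_more_than_5:.4f}")
print(f"(c) {p_three_in_two_days:.4f}")
```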


Example 7: Oranges are packed in crates each containing 250. On the average 0.6% are found to be bad when the crates are opened. What is the probability that there will be more than 2 bad oranges in a crate?
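Example 7 is a natural place to use the Poisson approximation to the Binomial, with mean np = 250 × 0.006 = 1.5. The sketch below computes P(X > 2) both ways for comparison; it assumes each orange is bad independently of the others, which the example does not state explicitly.

```python
from math import comb, exp, factorial

n, p = 250, 0.006
lam = n * p  # Poisson mean used for the approximation

# Poisson approximation: P(X > 2) = 1 - P(X <= 2)
poisson_tail = 1 - sum(exp(-lam) * lam**k / factorial(k) for k in range(3))

# Exact Binomial(250, 0.006) answer for comparison
binomial_tail = 1 - sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(3))

print(f"Poisson approximation: {poisson_tail:.4f}")
print(f"Exact binomial:        {binomial_tail:.4f}")
```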


Recap – Numerical EDA

• Calculating informative numbers which summarise the dataset

• What are the numbers useful for describing the age of 1,059 individuals with diabetes?

[Figure: distribution of AGE, horizontal axis from 20 to 80]

• Location parameters (mean, median, mode)

Mean age (54.6 years)

• Spread (range, standard deviation, interquartile range)

• Skewness

Properties of means and variances in theoretical distributions play important roles in determining variations in the definitions of the outcomes


Mean (Expectation) of a discrete random variable

The expectation of a discrete outcome X, commonly known as the mean of X or the expected value of X, is denoted as E(X) and defined as

$$E(X) = \sum_{x} x \, P(X = x)$$

The value of E(X) refers to the average value of x that one can expect after sampling a large number of values from the distribution of X. E(X) is the long run average of observations of the variable X.

The expectation of any function g(.) which depends on the random variable X, g(X), is defined as follows

$$E[g(X)] = \sum_{x} g(x) \, P(X = x)$$


Variance of a discrete random variable

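The variance of X, or the population variance of X, is denoted by Var(X) and is defined as

$$\mathrm{Var}(X) = E\big[(X - E(X))^2\big] = E(X^2) - [E(X)]^2$$

Var(X) is usually denoted by σ², and σ, the square root of the variance, is defined to be the standard deviation of X.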


Functions of means and variances


Example 8: Find the expected score of a single roll of a fair die.
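Example 8 follows directly from the definition of expectation as a probability-weighted sum. The sketch below uses exact fractions and also computes the variance, which the example does not ask for but which uses the same sums.

```python
from fractions import Fraction

# A fair die: each face 1..6 has probability 1/6.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# E(X) = sum over x of x * P(X = x)
mean = sum(x * p for x, p in pmf.items())

# Var(X) = sum over x of (x - E(X))^2 * P(X = x)
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

print("E(X)   =", mean, "=", float(mean))
print("Var(X) =", variance)
```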


Continuous random variables

Definition: A continuous random variable X takes any value in a given range, and theoretically can be measured to any desired degree of accuracy. (E.g. height, weight, age, etc.)

When the total number of possible outcomes is very large, the histogram will approximate to a smooth curve called a frequency curve or a probability density curve. The function represented by this curve is called the frequency function, or more commonly known as the probability density function, denoted by f.

As the function f denotes a probability function, f(x) ≥ 0 for all x, and the total area under the curve is 1, that is, ∫ f(x) dx = 1 over the whole range of X.


Some notes on continuous random variables


Properties of continuous random variables

The cumulative distribution function (cdf) of a continuous random variable is denoted FX(x) = P(X ≤ x) for any real x


Uniform Distribution

Definition: A random variable is said to follow a Uniform distribution in the interval [a, b] if the probability density function is a constant in the interval.


Normal Distribution

[Figure: distribution of exam marks for a Mathematics exam, horizontal axis from 40 to 80. About 68% of the probability lies within 1 standard deviation of the mean, and about 95% within 2 SDs.]


Also known as the Gaussian distribution.

A useful distribution to model outcomes in the natural world.



Properties of the Normal distribution

- Special case: If μ = 0, σ² = 1, then X has a Standard Normal distribution. Usually, the probability density function of the standard normal is written φ(x), and the cdf is written Φ(x).

- If X ~ N(0, 1), and Y = aX + b, then Y ~ N(b, a²). Conversely, if X ~ N(μ, σ²), and Y = (X – μ) / σ, then Y ~ N(0, 1).

- If X1 ~ N(μ1, σ1²) and X2 ~ N(μ2, σ2²), and X1 and X2 are mutually independent, then Y = X1 + X2 ~ N(μ1 + μ2, σ1² + σ2²).

- The plot of density function f is bell-shaped and symmetrical about the line x = μ with a single peak. So the mean, mode and median of the normal distribution coincide.

- Practically all of the population (about 99.7%) lies in the interval μ ± 3σ, about 95% of the population lies in the interval μ ± 2σ and about 68% of the population lies in the interval μ ± σ.


- Suppose X ~ Binomial(n, p). For large n and relatively large p, the normal distribution can be used as an approximation, and X ≈ N(np, np(1 – p)).

- Suppose X ~ Poisson(λ). For large λ, the normal distribution can also be used as an approximation, and X ≈ N(λ, λ).

- When the Normal distribution is used to approximate a discrete distribution, a continuity correction must be used. This is because the discrete probability P(X = x) is equivalent to the continuous probability P(x – 0.5 ≤ X < x + 0.5).

- For example, suppose X is discrete and the normal approximation is used, and the question asks for P(X < 35). This is equivalent to finding the continuous probability P(X < 34.5): the discrete value x = 35 is not included in the range X < 35, so the continuous random variable cannot be bigger than 34.5 (since 34.5 ≤ x < 34.9999… would still round up to give 35 in the discrete random variable).
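To see what the continuity correction buys, the sketch below compares the exact value of P(X < 35) with the normal approximation evaluated at 34.5 and at 35. The choice X ~ Binomial(100, 0.4) is hypothetical and made only so that there is something concrete to compute; `NormalDist` is from Python's standard `statistics` module.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.4  # hypothetical parameters, chosen only for illustration
cutoff = 35      # target: P(X < 35), i.e. P(X <= 34) for discrete X

# Exact binomial probability, summing k = 0..34.
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(cutoff))

# Normal approximation N(np, np(1 - p)); NormalDist takes the standard deviation.
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
with_correction = approx.cdf(cutoff - 0.5)  # P(X < 34.5)
without_correction = approx.cdf(cutoff)     # P(X < 35), no correction

print(f"exact binomial:          {exact:.4f}")
print(f"normal, with correction: {with_correction:.4f}")
print(f"normal, no correction:   {without_correction:.4f}")
```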


Calculating probabilities for N(0,1)

- http://www.stat.psu.edu/~babu/418/norm-tables.pdf

- Cumulative Standard Normal table



P(Z < 0.45) = 0.67364

P(Z > 1.12) = 1 – P(Z < 1.12) = 1 – 0.8686 = 0.1314

P(Z < -0.45) = 1 – P(Z > -0.45) = 1 – P(Z < 0.45) = 1 – 0.67364 = 0.32636
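The same table look-ups can be reproduced in software. The sketch below uses `NormalDist` from Python's standard `statistics` module as a stand-in for the printed table; it is offered as an alternative to the RExcel route used in these slides, not as a replacement for it.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

p_below_045 = Z.cdf(0.45)        # P(Z < 0.45)
p_above_112 = 1 - Z.cdf(1.12)    # P(Z > 1.12) = 1 - P(Z < 1.12)
p_below_neg045 = Z.cdf(-0.45)    # P(Z < -0.45), equal to 1 - P(Z < 0.45) by symmetry

print(f"P(Z < 0.45)  = {p_below_045:.5f}")
print(f"P(Z > 1.12)  = {p_above_112:.5f}")
print(f"P(Z < -0.45) = {p_below_neg045:.5f}")
```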


RExcel and Normal distribution



Example 9: Suppose X ~ N(0, 1), and x takes values from the set X. Find the following probabilities, by using RExcel.
a) P(X < x) for x = 0.65
b) P(X ≤ x) for x = 0.123
c) P(X > x) for x = 2.78
d) P(X > x) for x = 0


Example 10: X and Y are independent random variables which are both normally distributed, with X ~ N(100, 25) and Y ~ N(120, 20). Calculate the following probabilities:
(a) P(X > 92)
(b) P(Y > X)
(c) P(2X + Y < 300)
(d) P(|X – Y| < 10)
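Example 10 relies on the fact that linear combinations of independent normal variables are again normal, with means adding and variances adding (scaled by the squared coefficients). The sketch below assumes the second parameter in N(100, 25) and N(120, 20) is the variance, following the N(μ, σ²) notation used in these slides; if it were meant as a standard deviation the numbers would differ.

```python
from math import sqrt
from statistics import NormalDist

# ASSUMPTION: second parameter is the variance, as in N(mu, sigma^2).
mu_x, var_x = 100.0, 25.0
mu_y, var_y = 120.0, 20.0

# (a) P(X > 92)
X = NormalDist(mu_x, sqrt(var_x))
p_a = 1 - X.cdf(92)

# (b) P(Y > X) = P(Y - X > 0), where Y - X ~ N(mu_y - mu_x, var_y + var_x)
D = NormalDist(mu_y - mu_x, sqrt(var_y + var_x))
p_b = 1 - D.cdf(0)

# (c) 2X + Y ~ N(2*mu_x + mu_y, 4*var_x + var_y)
S = NormalDist(2 * mu_x + mu_y, sqrt(4 * var_x + var_y))
p_c = S.cdf(300)

# (d) |X - Y| < 10 is the same event as -10 < Y - X < 10
p_d = D.cdf(10) - D.cdf(-10)

for label, value in [("(a)", p_a), ("(b)", p_b), ("(c)", p_c), ("(d)", p_d)]:
    print(f"{label} {value:.4f}")
```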


Exponential Distribution

Recall that, under certain assumptions, the number of occurrences of rare events follows a Poisson distribution. Sometimes, the interest may be in the time till the observation of the event.

Let Yt denote the number of occurrences of rare events in t time units. Suppose the mean number of events is λ per time unit. Then Yt follows a Poisson distribution with mean λt.

Let X denote the time, measured from an arbitrary moment to the first event.

Then P(X > x) = P(No events in an interval of x time units) = P(Yx = 0) = e^(–λx)

Therefore FX(x) = P(X ≤ x) = 1 – P(X > x) = 1 – e^(–λx), and f(x) = λe^(–λx)

This is called the exponential distribution or the waiting time distribution.


The waiting time until an event occurs in a Poisson process follows the exponential distribution.


Lack of memory property

This is rather relevant to some of you! The waiting time for a bus follows an Exponential distribution (prove this!), and this property of an Exponential distribution is rather depressing. It says that the chance that you have to wait another 5 minutes for the bus is exactly the same even if you have already waited 20 minutes and still have not seen it arrive!
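The property can be written out in one line. Assuming X is exponential with rate λ, so that P(X > x) = e^(–λx), and taking s as the time already waited and t as the additional wait (symbols introduced here for the derivation):

```latex
P(X > s + t \mid X > s)
  = \frac{P(X > s + t)}{P(X > s)}
  = \frac{e^{-\lambda (s + t)}}{e^{-\lambda s}}
  = e^{-\lambda t}
  = P(X > t)
```

With s = 20 minutes and t = 5 minutes, the conditional probability of waiting a further 5 minutes equals the unconditional probability of waiting 5 minutes from scratch.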


Example 11: Assume that a radioactive substance emits particles at a mean rate of 1.5 per second. What is the chance that we have to wait more than three seconds for the first emission to occur?
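For Example 11, the waiting time to the first emission is exponential with rate 1.5 per second if emissions are assumed to follow a Poisson process, so the answer is the survival probability e^(–rate × time). A minimal sketch:

```python
from math import exp

rate_per_second = 1.5  # mean number of emissions per second
wait_seconds = 3.0

# P(X > 3) = P(no emissions in 3 seconds) = exp(-rate * time)
p_wait_longer = exp(-rate_per_second * wait_seconds)
print(f"P(wait > 3 s) = {p_wait_longer:.4f}")
```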


Example 12: Assume that the average time between two subsequent visits of insects to a certain flower is 12 minutes. You are starting to observe the flower. What is the chance that you will have to wait for no more than 15 minutes for the first insect to arrive? What is the chance that the time between the first and second arriving insect is less than 15 minutes? What is the chance that fewer than 3 insects will visit the flower, given that you observe the flower for one hour?
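A sketch of Example 12, assuming insect visits form a Poisson process so that gaps between visits are exponential with mean 12 minutes and the count in one hour is Poisson with mean 60 / 12 = 5. The first two questions then have the same answer, because the wait from an arbitrary starting moment and the gap between consecutive visits share the same exponential distribution.

```python
from math import exp, factorial

mean_gap_minutes = 12.0
rate = 1 / mean_gap_minutes  # visits per minute

# Wait for the first insect is at most 15 minutes: 1 - exp(-rate * 15)
p_first_within_15 = 1 - exp(-rate * 15)

# Gap between first and second insect is under 15 minutes: same distribution
p_gap_within_15 = p_first_within_15

# Visits in one hour ~ Poisson(rate * 60); fewer than 3 means 0, 1 or 2
hour_mean = rate * 60
p_fewer_than_3 = sum(exp(-hour_mean) * hour_mean**k / factorial(k) for k in range(3))

print(f"P(first wait <= 15 min) = {p_first_within_15:.4f}")
print(f"P(gap < 15 min)         = {p_gap_within_15:.4f}")
print(f"P(fewer than 3 in 1 h)  = {p_fewer_than_3:.4f}")
```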


Entropy

Often in medical research, we are interested in predicting the outcome given some probability statements.

Suppose there are four possible outcomes after chemotherapy treatment: (complete remission, partial remission, no change, early death)

If the probabilities of the four outcomes estimated from current data are: (0.90, 0.08, 0.02, 0.00),

you will feel confident about the treatment, since current data intuitively provide a lot of information and this information seems to suggest a high likelihood of positive outcomes.

Similarly, if the probabilities are (0.01, 0.01, 0.08, 0.90),

you will also feel confident that you should avoid undergoing the treatment, because again, current data provide a lot of information to suggest negative outcomes.


However, if the probabilities are:(0.25, 0.25, 0.25, 0.25)

you actually will not gain additional information from previous data, or previous data are perfectly uninformative.

Entropy is a statistical measure that quantifies the amount of information available for prediction, and is calculated using all the probabilities of the possible outcomes (i.e. from the probability function).

Statistical definition
The entropy of a random variable X with probability function p(x) is defined to be the quantity

$$H(X) = -\sum_{x} p(x) \log p(x),$$

with the convention that terms with p(x) = 0 contribute 0.


It can be shown that for a random variable with n possible values, the entropy is always bounded between 0 and log(n), where:
- 0 corresponds to the situation with perfect information
- log(n) corresponds to the situation with no information.

Relative mutual information
It is increasingly common to define the relative mutual information (RMI) as

RMI(X) = 1 – [H(X)/log(n)]

to yield a more intuitive information criterion that is bounded between 0 and 1, where:
- 0 corresponds to the situation with no information
- 1 corresponds to the situation with perfect information.
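To make the bounds concrete, the sketch below computes the entropy and RMI for the three chemotherapy probability vectors discussed earlier. It assumes the usual Shannon form H(X) = –Σ p(x) log p(x) with natural logarithms and treats 0 × log 0 as 0; the base of the logarithm cancels in the RMI. The function names and scenario labels are introduced here for illustration.

```python
from math import log

def entropy(probs):
    """Shannon entropy in nats; outcomes with probability 0 contribute 0."""
    return -sum(p * log(p) for p in probs if p > 0)

def rmi(probs):
    """Relative mutual information: 1 - H(X) / log(n), n = number of outcomes."""
    return 1 - entropy(probs) / log(len(probs))

# Outcome order: complete remission, partial remission, no change, early death.
scenarios = {
    "mostly remission": [0.90, 0.08, 0.02, 0.00],
    "mostly early death": [0.01, 0.01, 0.08, 0.90],
    "uninformative": [0.25, 0.25, 0.25, 0.25],
}
for name, probs in scenarios.items():
    print(f"{name}: H = {entropy(probs):.3f}, RMI = {rmi(probs):.3f}")
```

The first two vectors give an RMI well above 0, while the uniform vector gives an RMI of 0, matching the intuition described in the text.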


Example 13: Let X denote the outcome when flipping a fair coin and Y the outcome when rolling a fair die. Let furthermore Z be one, if two fair dice show a double six and zero otherwise. Notice that if you want to predict the outcome of these random variables, you have the best chance to predict Z correctly. Y is hardest to predict. Calculate the entropies and the relative mutual information of these three random variables.
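A sketch of Example 13, again assuming the Shannon form H = –Σ p log p with natural logarithms (the RMI does not depend on the base). X and Y are uniform over 2 and 6 outcomes respectively, and Z equals one with probability 1/36.

```python
from math import log

def entropy(probs):
    """Shannon entropy in nats; outcomes with probability 0 contribute 0."""
    return -sum(p * log(p) for p in probs if p > 0)

def rmi(probs):
    """Relative mutual information: 1 - H / log(n)."""
    return 1 - entropy(probs) / log(len(probs))

variables = {
    "X (fair coin)": [1 / 2, 1 / 2],
    "Y (fair die)": [1 / 6] * 6,
    "Z (double six or not)": [1 / 36, 35 / 36],
}
results = {name: (entropy(p), rmi(p)) for name, p in variables.items()}
for name, (h, r) in results.items():
    print(f"{name}: H = {h:.4f}, RMI = {r:.4f}")
```

Y has the largest entropy, which agrees with it being hardest to predict, while Z has the highest RMI, which agrees with it being easiest. X and Y are both uniform, so each has an RMI of 0 even though their entropies differ.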


Something fun – practical application of what we have learnt so far!

It is very common for students to go through the material on probability and theoretical distributions wondering what relevance all of this has in real life!

Let’s look at something fun, which most of you will hopefully have some experience with:



Monopoly


- 40 grids possible

- each player moves his avatar around the game board by rolling two dice

- Community Chest / Chance

- Acquire properties across the game board

- Develop properties of the same colour combination into houses and hotels

- Aim to bankrupt other players and be the richest (sounds familiar?)

- Potential of going to jail if landing on “Go to jail”

- or if you roll doubles 3 times in a row

- or if Chance / Community Chest sends you there.


- 40 grids possible

- Every grid equally likely? (or 2.5% chance?)

- What are the properties that are most likely to be landed on?

- Computer simulation of Monopoly, with all the rules and regulations

- turns out that the Jail spot has the highest occupancy rates (5.88%)

- that inevitably results in the orange properties being the most frequented (8.47%)

Possible outcomes from roll of two dice:
Prob(X = 2) = 1 in 36
Prob(X = 3) = 2 in 36
Prob(X = 4) = 3 in 36
Prob(X = 5) = 4 in 36
Prob(X = 6) = 5 in 36
Prob(X = 7) = 6 in 36
Prob(X = 8) = 5 in 36
Prob(X = 9) = 4 in 36
Prob(X = 10) = 3 in 36
Prob(X = 11) = 2 in 36
Prob(X = 12) = 1 in 36
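The two-dice probabilities can be checked by enumerating all 36 equally likely ordered pairs, as in this short sketch. It covers only the dice roll itself, not the full Monopoly simulation mentioned above.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 ordered (die1, die2) pairs give each total.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
pmf = {total: Fraction(count, 36) for total, count in sorted(counts.items())}

for total in pmf:
    print(f"Prob(X = {total}) = {counts[total]} in 36")

# The single most likely total for one roll of two dice.
most_likely = max(pmf, key=pmf.get)
print("Most likely total:", most_likely)
```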

Simple probability theory and knowledge of dice outcome can provide a marginal edge in games!


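[Figure: Monopoly board annotated with simulated landing percentages. Values shown: 4.57%, 6.62%, 7.17%, 8.47%, 8.19%, 7.61%, 7.52%, 4.61%, 2.15%, 2.91%, 2.65%, 2.20%, 5.88%, 3.06%, 2.96%.]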


Waiting time?

We can model the waiting time for someone to land on a particular grid with an Exponential distribution.

For example, let’s suppose we are interested in the most expensive property on the board.


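[Figure: Monopoly board annotated with one value per grid (units not shown). Values shown: 40.5, 47.6, 46.2, 43.7, 46.1, 39.0, 41.9, 45.3, 41.5, 34.3, 36.6, 33.8, 36.0, 38.2, 39.8, 32.7, 37.8, 38.8, 39.4, 38.4, 40.1, 40.2, 39.2, 41.9, 45.3, 47.9, 39.6, 46.5.]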


Students should be able to:

• know the definitions of the various terminologies and distributions

• know how to calculate the probability mass/density function for the theoretical distributions, and in empirical situations

• calculate the probability of specific outcomes, when assuming a theoretical distribution for these outcomes

• understand the interpretation of entropy and know how to calculate the entropy