The Gaussian (Normal) Distribution: More Details & Some Applications


Page 1: The Gaussian (Normal) Distribution: More Details & Some Applications

The Gaussian (Normal) Distribution: More Details & Some Applications

Page 2: The Gaussian (Normal) Distribution: More Details & Some Applications

The Gaussian (Normal) Distribution

The Gaussian Distribution is one of the most used distributions in all of science. It is also called the “bell curve” or the Normal Distribution.

If this is the “Normal Distribution”, logically, shouldn’t there also be an “Abnormal Distribution”?

Page 3: The Gaussian (Normal) Distribution: More Details & Some Applications

Johann Carl Friedrich Gauss (1777–1855, Germany)

• Mathematician, Astronomer & Physicist.
• Sometimes called the “Prince of Mathematicians” (?)
• A child prodigy in math.

(Do you have trouble believing some of the following? I do!)

• Age 3: He informed his father of a mistake in a payroll calculation & gave the correct answer!!
• Age 7: His teacher gave the problem of summing all integers from 1 to 100 to his class to keep them busy. Gauss quickly wrote the correct answer, 5050, on his slate!!
• Whether or not you believe all of this, it is 100% true that he made a HUGE number of contributions to Mathematics, Physics, & Astronomy!!

Page 4: The Gaussian (Normal) Distribution: More Details & Some Applications

Johann Carl Friedrich Gauss: A Genius!

He made a HUGE number of contributions to Mathematics, Physics, & Astronomy:

1. Proved the Fundamental Theorem of Algebra, that every non-constant polynomial has a root of the form a + bi.
2. Proved the Fundamental Theorem of Arithmetic, that every natural number greater than 1 can be represented as a product of primes in only one way (up to the order of the factors).
3. Proved that every positive integer is the sum of at most 3 triangular numbers.
4. Developed the method of least squares fitting & many other methods in statistics & probability.
5. Proved many theorems of integral calculus, including the divergence theorem (when applied to the E field, it is what is called Gauss’s Law).
6. Proved many theorems of number theory.
7. Made many contributions to the orbital mechanics of the solar system.
8. Made many contributions to Non-Euclidean geometry.
9. One of the first to rigorously study the Earth’s magnetic field.

Page 5: The Gaussian (Normal) Distribution: More Details & Some Applications

Characteristics of a Normal or Gaussian Distribution

[Figure: normal distribution curve f(x) versus x, with μ = 0, σ = 1]

• It is symmetric.
• Its mean, median, & mode are equal.

Page 6: The Gaussian (Normal) Distribution: More Details & Some Applications

A 2-Dimensional Gaussian

Page 7: The Gaussian (Normal) Distribution: More Details & Some Applications

Gaussian or Normal Distribution

• It is a symmetrical, bell-shaped curve.
• It has points of inflection 1 standard deviation on either side of the mean, at X = μ ± σ.

Formula:

$$f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(X-\mu)^2}{2\sigma^2}}$$

[Figure: bell-shaped curve of f(X) versus X]

Page 8: The Gaussian (Normal) Distribution: More Details & Some Applications

The Normal Distribution

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$

Note the constants: π = 3.14159, e = 2.71828

This is a bell shaped curve with different centers and spreads depending on μ and σ.
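The density formula translates directly into a few lines of code. The sketch below is only an illustration: the function name `normal_pdf` and the sample parameter values are assumptions chosen for this example, not part of the original slides.

```python
import math

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Height of the standard normal curve at its center: 1 / sqrt(2*pi)
peak = normal_pdf(0.0)

# Changing mu moves the center; doubling sigma widens the curve and halves the peak
shifted_peak = normal_pdf(5.0, mu=5.0, sigma=2.0)
```

Evaluating the function at points equally far above and below μ gives the same value, which is the symmetry of the bell curve.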

Page 9: The Gaussian (Normal) Distribution: More Details & Some Applications

• There are only 2 parameters that determine the curve, the mean μ & the variance σ². The rest are constants.

• For “z scores” (μ = 0, σ = 1), the equation becomes:

$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$$

The negative exponent means that big |z| values give small function values in the tails.

Page 10: The Gaussian (Normal) Distribution: More Details & Some Applications

Normal Distribution

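$$\int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} dx = 1$$

It’s a probability density function, so no matter what the values of μ and σ, it must integrate to 1!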
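The claim that the area is 1 for any μ and σ can be checked numerically. The following is a rough sketch using the trapezoid rule over μ ± 10σ; the helper names, the integration range, and the step count are all assumptions made for this illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def area_under_curve(mu, sigma, half_width=10.0, steps=20_000):
    """Trapezoid-rule estimate of the integral over mu +/- half_width * sigma."""
    a = mu - half_width * sigma
    b = mu + half_width * sigma
    h = (b - a) / steps
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(a + i * h, mu, sigma)
    return total * h

# Three quite different (mu, sigma) pairs; each area should come out very close to 1
areas = [area_under_curve(mu, sigma) for mu, sigma in [(0, 1), (100, 10), (-3, 0.5)]]
```

The tails beyond 10 standard deviations are ignored here, which is harmless because the density there is vanishingly small.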

Page 11: The Gaussian (Normal) Distribution: More Details & Some Applications

The Normal Distribution is Defined by its Mean & Standard Deviation.

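Mean = μ

$$\mu = \int_{-\infty}^{\infty} x\, \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} dx$$

Variance = σ²

$$\sigma^2 = \left( \int_{-\infty}^{\infty} x^2\, \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} dx \right) - \mu^2$$

Standard Deviation = σ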
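The integrals for the mean and the variance can also be evaluated numerically as a sanity check. This is a minimal sketch under stated assumptions: the function names, the choice μ = 20 and σ = 5, and the trapezoid rule over μ ± 10σ are illustrative only.

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def moment_integrals(mu, sigma, half_width=10.0, steps=20_000):
    """Trapezoid estimates of (integral of x*f(x), integral of x^2*f(x))."""
    a = mu - half_width * sigma
    h = 2.0 * half_width * sigma / steps
    first = 0.0
    second = 0.0
    for i in range(steps + 1):
        x = a + i * h
        weight = 0.5 if i in (0, steps) else 1.0  # end points count half
        fx = weight * normal_pdf(x, mu, sigma)
        first += x * fx
        second += x * x * fx
    return first * h, second * h

mean, second_moment = moment_integrals(mu=20.0, sigma=5.0)
variance = second_moment - mean ** 2   # should be close to sigma^2 = 25
std_dev = math.sqrt(variance)          # should be close to sigma = 5
```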

Page 12: The Gaussian (Normal) Distribution: More Details & Some Applications

Normal Distribution

• Can take on an infinite number of possible values.
• The probability of any one of those values occurring is essentially zero.
• Curve has area or probability = 1

Page 13: The Gaussian (Normal) Distribution: More Details & Some Applications

• A normal distribution with a mean of 0 and a standard deviation of 1 is called the standard normal distribution.
• Z Value: The distance between a selected value, designated X, and the population mean μ, divided by the population standard deviation, σ.

$$Z = \frac{X - \mu}{\sigma}$$

Page 14: The Gaussian (Normal) Distribution: More Details & Some Applications

Example 1

• The monthly incomes of recent MBA graduates in a large corporation are normally distributed with a mean of $2000 and a standard deviation of $200. What is the Z value for an income of $2200? An income of $1700?

• For X = $2200, Z= (2200-2000)/200 = 1.

• For X = $1700, Z = (1700-2000)/200 = -1.5

• A Z value of 1 indicates that the value of $2200 is 1 standard deviation above the mean of $2000, while a Z value of -1.5 indicates that the value of $1700 is 1.5 standard deviations below the mean of $2000.
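The arithmetic of Example 1 can be reproduced with a one-line helper. The name `z_score` is an assumption introduced for this sketch.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Monthly incomes from Example 1: mean $2000, standard deviation $200
z_2200 = z_score(2200, mu=2000, sigma=200)   # 1.0
z_1700 = z_score(1700, mu=2000, sigma=200)   # -1.5
```

The sign carries the direction: positive values lie above the mean, negative values below it.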

Page 15: The Gaussian (Normal) Distribution: More Details & Some Applications

Probabilities Depicted by Areas Under the Curve

• Total area under the curve is 1
• The area in red is equal to p(z > 1)
• The area in blue is equal to p(-1 < z < 0)
• Since the properties of the normal distribution are known, areas can be looked up on tables or calculated on a computer.
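As one way of calculating these areas on a computer, the sketch below uses `NormalDist` from Python's standard `statistics` module (available since Python 3.8). The variable names `p_red` and `p_blue` simply mirror the colors mentioned on the slide and are otherwise arbitrary.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Area to the right of z = 1 (the "red" area)
p_red = 1.0 - std_normal.cdf(1.0)

# Area between z = -1 and z = 0 (the "blue" area)
p_blue = std_normal.cdf(0.0) - std_normal.cdf(-1.0)

print(f"p(z > 1)      = {p_red:.4f}")
print(f"p(-1 < z < 0) = {p_blue:.4f}")
```

The two areas come out at roughly 0.16 and 0.34, consistent with about 68% of the total area lying within one standard deviation of the mean.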

Page 16: The Gaussian (Normal) Distribution: More Details & Some Applications

Probability of an Interval

[Figure: Normal Curve, Interval Probability; probability density versus Z from -4 to 4]

$$F(2) - F(1) = p(1 < X \le 2)$$

Page 17: The Gaussian (Normal) Distribution: More Details & Some Applications

Cumulative Probability

$$F(a) = p(X \le a)$$

[Figure: Normal Curve, Cumulative Probability; probability density versus Z, with the point X = a marked]

$$1 - F(a) = p(a < X)$$
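The cumulative probability F has no elementary closed form, but it can be written with the error function as F(z) = ½[1 + erf(z/√2)]. The sketch below uses `math.erf` from the standard library; the function name `std_normal_cdf` is an assumption made for this example.

```python
import math

def std_normal_cdf(z: float) -> float:
    """F(z) = p(Z <= z) for a standard normal variable, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Interval probability: F(2) - F(1) = p(1 < Z <= 2)
interval = std_normal_cdf(2.0) - std_normal_cdf(1.0)

# Upper-tail probability: 1 - F(a) = p(a < Z), here with a = 1
upper_tail = 1.0 - std_normal_cdf(1.0)
```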

Page 18: The Gaussian (Normal) Distribution: More Details & Some Applications

A table will give this probability

Given any positive value for z, the corresponding probability can be looked up in standard tables.

[Figure: standard normal curve with a given positive z marked]

The probability found using a table is the probability of having a standard normal variable between 0 & the given positive z.

Page 19: The Gaussian (Normal) Distribution: More Details & Some Applications

Areas Under the Standard Normal Curve

Page 20: The Gaussian (Normal) Distribution: More Details & Some Applications

Areas and Probabilities

• The Table shows cumulative normal probabilities. Some selected entries:

z     F(z)      z     F(z)      z     F(z)
0     .50       .3    .62       1     .84
.1    .54       .4    .66       2     .98
.2    .58       .5    .69       3     .99

About 54% of scores fall below a z of .1. About 46% of scores fall below a z of -.1 (1 - .54 = .46). About 14% of scores fall between z of 1 and 2 (.98 - .84). (The entry for z = 3 is truncated; to four decimals it is .9987.)
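Entries like those in the table can be regenerated rather than looked up. The following sketch uses `statistics.NormalDist`; the list of z values and the dictionary name `table` are assumptions chosen to match the selected entries above.

```python
from statistics import NormalDist

std_normal = NormalDist()
z_values = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 1.0, 2.0, 3.0]

# Cumulative probability F(z) = p(Z <= z) for each selected z
table = {z: std_normal.cdf(z) for z in z_values}

for z, f in table.items():
    print(f"z = {z:>3}   F(z) = {f:.4f}")

# By symmetry, the share below -z equals 1 - F(z)
below_minus_point_one = 1.0 - table[0.1]
between_1_and_2 = table[2.0] - table[1.0]
```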

Page 21: The Gaussian (Normal) Distribution: More Details & Some Applications

Areas Under the Normal Curve

• About 68 percent of the area under the normal curve is within one standard deviation of the mean: μ ± 1σ
• About 95 percent is within two standard deviations of the mean: μ ± 2σ
• About 99.7 percent is within three standard deviations of the mean: μ ± 3σ

Page 22: The Gaussian (Normal) Distribution: More Details & Some Applications

Areas Under the Normal Curve

[Figure: normal distribution curve f(x) versus x, with μ = 0, σ = 1, marking the intervals μ ± 1σ, μ ± 2σ, and μ ± 3σ]

Between:
1. μ ± 1σ: 68.26%
2. μ ± 2σ: 95.44%
3. μ ± 3σ: 99.74%

Irwin/McGraw-Hill © The McGraw-Hill Companies, Inc., 1999

Page 23: The Gaussian (Normal) Distribution: More Details & Some Applications

Key Areas Under the Curve

For normal distributions

μ ± 1σ ~ 68%

μ ± 2σ ~ 95%

μ ± 3σ ~ 99.7%

Page 24: The Gaussian (Normal) Distribution: More Details & Some Applications

“68-95-99.7 Rule”

68% of the data

95% of the data

99.7% of the data

Page 25: The Gaussian (Normal) Distribution: More Details & Some Applications

68.26-95.44-99.74 Rule

For a Normally distributed variable:

1. 68.26% of all possible observations lie within one standard deviation on either side of the mean (between μ − σ and μ + σ).
2. 95.44% of all possible observations lie within two standard deviations on either side of the mean (between μ − 2σ and μ + 2σ).
3. 99.74% of all possible observations lie within three standard deviations on either side of the mean (between μ − 3σ and μ + 3σ).

These figures come from a four-decimal table; to two decimals the exact percentages are 68.27%, 95.45%, and 99.73%.

Page 26: The Gaussian (Normal) Distribution: More Details & Some Applications

• Using the unit normal (z), we can find areas and probabilities for any normal distribution.

• Suppose X = 120, μ = 100, σ = 10.

• Then z = (120-100)/10 = 2.

• About 98% of cases fall below a score of 120 if the distribution is normal. In the normal, most (95%) are within 2σ of the mean. Nearly everybody (over 99%) is within 3σ of the mean.
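The same calculation can be done without converting to z by hand. This sketch assumes a `NormalDist` object with the slide's values μ = 100 and σ = 10; the variable names are illustrative.

```python
from statistics import NormalDist

scores = NormalDist(mu=100, sigma=10)

# z score for X = 120
z = (120 - scores.mean) / scores.stdev

# Share of cases below 120, computed directly on the original scale
below_120 = scores.cdf(120)

# The standard normal gives the same answer when evaluated at z
below_z = NormalDist().cdf(z)
```

Both routes give a probability of about 0.977, which the slide rounds to 98%.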

Page 27: The Gaussian (Normal) Distribution: More Details & Some Applications

68.26-95.44-99.74 Rule

Page 28: The Gaussian (Normal) Distribution: More Details & Some Applications

68-95-99.7 Rule in Math terms…

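$$\int_{\mu-\sigma}^{\mu+\sigma} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} dx \approx .68$$

$$\int_{\mu-2\sigma}^{\mu+2\sigma} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} dx \approx .95$$

$$\int_{\mu-3\sigma}^{\mu+3\sigma} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} dx \approx .997$$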
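Each of these three integrals reduces to erf(k/√2), where k is the number of standard deviations, so the rule can be verified in a few lines. The function name `prob_within` is an assumption for this sketch.

```python
import math

def prob_within(k: float) -> float:
    """p(mu - k*sigma < X < mu + k*sigma) for a normal X; the result does not depend on mu or sigma."""
    return math.erf(k / math.sqrt(2.0))

coverage = {k: prob_within(k) for k in (1, 2, 3)}

for k, p in coverage.items():
    print(f"within {k} standard deviation(s): {100 * p:.2f}%")
```

Because the substitution z = (x − μ)/σ removes μ and σ from the integral, the same three percentages apply to every normal distribution.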

Page 29: The Gaussian (Normal) Distribution: More Details & Some Applications

Example 2

• The daily water usage per person in New Providence, New Jersey is normally distributed with a mean of 20 gallons and a standard deviation of 5 gallons.

• About 68% of the daily water usage per person in New Providence lies between what two values?

• μ ± 1σ = 20 ± 1(5).

• That is, about 68% of the daily water usage will lie between 15 and 25 gallons.
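Example 2 can be checked in both directions: compute the interval μ ± 1σ, then confirm how much probability it holds. The sketch assumes `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

usage = NormalDist(mu=20, sigma=5)  # gallons per person per day

# One standard deviation on either side of the mean
low = usage.mean - usage.stdev
high = usage.mean + usage.stdev

# Probability that daily usage falls inside that interval
share = usage.cdf(high) - usage.cdf(low)
```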

Page 30: The Gaussian (Normal) Distribution: More Details & Some Applications

Normal Approximation to the Binomial

• Using the normal distribution (a continuous distribution) as a substitute for a binomial distribution (a discrete distribution) for large values of n seems reasonable because as n increases, a binomial distribution gets closer and closer to a normal distribution.

• The normal probability distribution is generally deemed a good approximation to the binomial probability distribution when nπ and n(1 − π) are both greater than 5. (Here π denotes the probability of success on a single trial, not the constant 3.14159.)
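To see how close the approximation gets, the sketch below compares an exact binomial probability with its normal counterpart for n = 20 and success probability 0.5. Two things here are assumptions not taken from the slide: the example values themselves, and the continuity correction (evaluating the normal at k + 0.5), which is a common refinement when a continuous curve stands in for a discrete one.

```python
import math
from statistics import NormalDist

def binomial_cdf(k: int, n: int, p: float) -> float:
    """Exact p(X <= k) for a binomial with n trials and success probability p."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_approx_cdf(k: int, n: int, p: float) -> float:
    """Normal approximation to p(X <= k), with a continuity correction of 0.5."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return NormalDist(mean, sd).cdf(k + 0.5)

n, p, k = 20, 0.5, 8

# Rule of thumb from the slide: both n*p and n*(1 - p) should exceed 5
rule_ok = n * p > 5 and n * (1 - p) > 5

exact = binomial_cdf(k, n, p)
approx = normal_approx_cdf(k, n, p)
print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```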

Page 31: The Gaussian (Normal) Distribution: More Details & Some Applications

Binomial Distribution for n = 3 & n = 20

[Figure: two bar charts of P(x) versus number of occurrences, one for n = 3 (x = 0 to 3) and one for n = 20 (x = 2 to 20)]

Page 32: The Gaussian (Normal) Distribution: More Details & Some Applications

Central Limit Theorem

• Flip coin N times

• Each outcome has an associated random variable Xi (= 1, if heads, otherwise 0)

• Number of heads: NH = X1 + X2 + … + XN

• NH is a random variable

Page 33: The Gaussian (Normal) Distribution: More Details & Some Applications

Central Limit Theorem

• Coin flip problem.

• Probability function of NH

– P(Head) = 0.5 (fair coin)

[Figure: probability function of NH for N = 5, N = 10, and N = 40]
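For a fair coin the probability function of NH is known exactly: p(NH = k) = C(N, k) / 2^N. The sketch below tabulates it for the three values of N named on the slide; the function name `heads_pmf` is an assumption for this example.

```python
import math

def heads_pmf(n_flips: int) -> list[float]:
    """p(NH = k) for k = 0..n_flips when flipping a fair coin n_flips times."""
    return [math.comb(n_flips, k) / 2**n_flips for k in range(n_flips + 1)]

for n_flips in (5, 10, 40):
    pmf = heads_pmf(n_flips)
    # First index with the largest probability (ties resolve to the smaller k)
    most_likely = max(range(n_flips + 1), key=pmf.__getitem__)
    print(f"N = {n_flips}: most likely NH = {most_likely}, p = {pmf[most_likely]:.4f}")
```

Plotting each list as a bar chart shows the shape becoming smoother and more bell-like as N grows.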

Page 34: The Gaussian (Normal) Distribution: More Details & Some Applications

Central Limit Theorem

The distribution of the sum of N independent, identically distributed random variables with finite variance becomes increasingly Gaussian as N grows.

Example: N uniform [0,1] random variables.
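A small simulation makes the uniform example concrete. The sketch below draws many sums of N uniform [0,1] variables and measures what fraction of them fall within one standard deviation of the mean; for a Gaussian that fraction is about 0.68. The sample size, the seed, the choice of N values, and the function names are all assumptions made for this illustration.

```python
import random
import statistics

def sums_of_uniforms(n_terms: int, n_samples: int, seed: int = 0) -> list[float]:
    """Draw n_samples sums, each of n_terms independent uniform [0,1] variables."""
    rng = random.Random(seed)
    return [sum(rng.random() for _ in range(n_terms)) for _ in range(n_samples)]

def fraction_within_one_sd(samples: list[float]) -> float:
    """Fraction of samples lying within one standard deviation of the sample mean."""
    mean = statistics.fmean(samples)
    sd = statistics.pstdev(samples)
    inside = sum(1 for s in samples if abs(s - mean) <= sd)
    return inside / len(samples)

# N = 1 is a plain uniform variable; N = 12 is already close to Gaussian
fractions = {n: fraction_within_one_sd(sums_of_uniforms(n, 20_000)) for n in (1, 2, 12)}

for n, frac in fractions.items():
    print(f"N = {n:>2}: fraction within one standard deviation = {frac:.3f}")
```

A single uniform variable puts about 58% of its probability within one standard deviation, and the fraction moves toward the Gaussian value as N increases.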

Page 35: The Gaussian (Normal) Distribution: More Details & Some Applications

[Figure: histogram of weights; horizontal axis POUNDS from 80 to 160, vertical axis Percent from 0 to 25, with 112.3, 127.8, and 143.3 marked]

Page 36: The Gaussian (Normal) Distribution: More Details & Some Applications

[Figure: Normal Distribution; vertical axis: Probability / %]

Page 37: The Gaussian (Normal) Distribution: More Details & Some Applications

Normal Distribution

Why are normal distributions so important?

• Many dependent variables are commonly assumed to be normally distributed in the population

• If a variable is approximately normally distributed we can make inferences about values of that variable

• Example: Sampling distribution of the mean

• So what?

• Remember the Binomial distribution

– With a few trials we were able to calculate possible outcomes and the probabilities of those outcomes

• Now try it for a continuous distribution with an infinite number of possible outcomes. Yikes!

• The normal distribution and its properties are well known, and if our variable of interest is normally distributed, we can apply what we know about the normal distribution to our situation, and find the probabilities associated with particular outcomes.

Page 38: The Gaussian (Normal) Distribution: More Details & Some Applications

• Since we know the shape of the normal curve, we can calculate the area under the curve

• The fraction of that area lying beyond a given value gives the probability of drawing a value at least that extreme from the distribution.

• The area under the curve tells us about the probability. In other words, if our result (data) can be treated as coming from a normal distribution, we can obtain a p-value for it.
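As a rough illustration of turning an area into a p-value, the sketch below computes a two-sided p-value for an observed value under an assumed normal distribution. The two-sided convention, the function name, and the example numbers (an observation of 120 with μ = 100 and σ = 10) are assumptions chosen for this example.

```python
from statistics import NormalDist

def two_sided_p_value(x: float, mu: float, sigma: float) -> float:
    """Probability of a value at least as far from mu as x, in either direction."""
    z = (x - mu) / sigma
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

# An observation 2 standard deviations above the mean
p = two_sided_p_value(120, mu=100, sigma=10)
print(f"p-value = {p:.4f}")
```

A one-sided p-value would use only the upper (or lower) tail and would be half as large for the same observation.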