Information Analysis Gaussian or Normal Distribution


[Figure: normal (Gaussian) probability curve; x-axis: X (0 to 350), y-axis: Probability (0 to 0.012)]

μ = mean, estimated as x̄
x̄ = observed sample mean = Σx/n
σ = standard deviation, estimated as s
n = sample size
s = observed standard deviation

$$ s = \left[ \frac{n \sum x_i^2 - \left( \sum x_i \right)^2}{n(n-1)} \right]^{1/2} $$
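The sample mean and sample standard deviation defined above can be computed directly from a list of observations. The sketch below is illustrative only: the function name `sample_mean_and_std` and the sample values are hypothetical, and the standard deviation uses the computational form s = [(n Σx² − (Σx)²) / (n(n − 1))]^(1/2).

```python
import math

def sample_mean_and_std(values):
    """Return (x_bar, s): the sample mean and sample standard deviation."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    sum_x = sum(values)
    sum_x2 = sum(v * v for v in values)
    x_bar = sum_x / n
    # Computational form: s^2 = (n * sum(x^2) - (sum x)^2) / (n * (n - 1))
    variance = (n * sum_x2 - sum_x ** 2) / (n * (n - 1))
    # Guard against tiny negative values caused by floating-point round-off
    return x_bar, math.sqrt(max(variance, 0.0))

# Hypothetical observations, for illustration only
x_bar, s = sample_mean_and_std([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(x_bar, s)
```

For data with a large mean and a small spread, the two-pass form based on Σ(x − x̄)² (as used by Python's `statistics.stdev`) is numerically safer and gives the same result.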

[Figure: normal probability curve, Probability versus X (0 to 350). Area under curve = 1.]


Coefficient of Variation

$$ C_v = \frac{s}{\bar{x}} $$

[Figure: two normal curves, Probability versus X (0 to 350), both with mean 150; the narrow curve has s = 20 and the wide curve has s = 60]

Narrow curve: Cv = 20/150 = 0.13
Wide curve: Cv = 60/150 = 0.40
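As a quick check on the two curves, the sketch below evaluates the coefficient of variation using the conventional definition Cv = s / x̄. It assumes the two curves share a mean of 150 and have standard deviations of 20 and 60, which is a reading of the figures rather than something the slides state outright; the function name is hypothetical.

```python
def coefficient_of_variation(mean, std_dev):
    """Conventional definition: standard deviation divided by the mean."""
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return std_dev / mean

# Assumed reading of the two curves: same mean, different spread
for s in (20.0, 60.0):
    print(s, round(coefficient_of_variation(150.0, s), 3))
```

A larger Cv means the data are more widely spread relative to their mean.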

Example

100 kg of glass is recovered from municipal refuse and processed. The glass is crushed and sieved. Plot the cumulative distribution of particle size from the data below.

4 mm holes: 10 kg of glass remained on the sieve (90 kg went through)
3 mm holes: 25 kg remained on the sieve
2 mm holes: 35 kg remained on the sieve
1 mm holes: 20 kg remained on the sieve
No holes (pan): 10 kg went all the way through

[Figure: bar chart of Fraction Retained versus Sieve Size (mm) for the 4, 3, 2, and 1 mm sieves and the pan]

Sieve Size (mm)    Fraction Retained
4                  10/100 = 0.10
3                  25/100 = 0.25
2                  35/100 = 0.35
1                  20/100 = 0.20
<1 (pan)           10/100 = 0.10

Cumulative Distribution

Sieve Size (mm)    Fraction Smaller Than Sieve Size
4                  1 – 0.10 = 0.90
3                  1 – (0.10 + 0.25) = 0.65
2                  1 – (0.35 + 0.35) = 0.30
1                  1 – (0.70 + 0.20) = 0.10
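The two tables can be reproduced with a short calculation: divide each retained mass by the total, then subtract the running sum of retained fractions from 1. This is a minimal sketch using the masses from the example; the function name `cumulative_finer` is hypothetical.

```python
def cumulative_finer(retained_kg):
    """retained_kg: list of (sieve_size_mm, mass_kg), coarsest sieve first.
    The last entry is the pan (material that passed every sieve).
    Returns (fraction_retained, fraction_finer), each a list of (size, value)."""
    total = sum(mass for _, mass in retained_kg)
    fraction_retained = [(size, mass / total) for size, mass in retained_kg]
    fraction_finer = []
    running = 0.0
    for size, frac in fraction_retained[:-1]:  # skip the pan
        running += frac
        # Everything not retained on this sieve or a coarser one is finer
        fraction_finer.append((size, 1.0 - running))
    return fraction_retained, fraction_finer

# Masses from the recovered-glass example
data = [(4, 10.0), (3, 25.0), (2, 35.0), (1, 20.0), ("pan", 10.0)]
retained, finer = cumulative_finer(data)
for size, frac in finer:
    print(size, round(frac, 2))
```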

[Figure: cumulative distribution; x-axis: Particle Size (mm), 0 to 5; y-axis: Fraction of particles smaller than size indicated, 0 to 1]

Graphs

Independent variable: abscissa (x-axis)

Dependent variable: ordinate (y-axis)

A variable is independent if its value is chosen, like the sieve size in the previous example.

A variable is dependent if its value is determined by experiment, like the fraction of glass retained on each sieve.

Probability Paper

The x-axis is linear.

The y-axis is scaled so that if the data are normally distributed (Gaussian), the cumulative probability plots as a straight line.

If this is the case, the mean is at 0.5 (50%), and one standard deviation spans about 0.34 of the cumulative probability on either side of the mean, that is, from about 0.16 to 0.84.

You can also calculate s from the values at the 10% and 90% cumulative probabilities: s = (2/5)(x90 – x10)

Example

Consider the recycled glass data from the previous example. What are the mean, the standard deviation, and the 95% interval?

The mean is the value on the x-axis where the y-axis value is 0.5: 2.4 mm.

The standard deviation is the spread around the mean such that 68% of the data fall within the range (about 34% on either side of the mean).

0.5 + 0.34 = 0.84, which corresponds to 3.5 mm, so s = 3.5 – 2.4 = 1.1 mm. Alternatively, s = (2/5)(3.9 – 1.0) = 1.16 mm.

The 95% interval is the range containing 95% of the data, that is, between cumulative probabilities of 0.025 and 0.975, or between 0.2 mm and 4.8 mm.
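Reading values off probability paper amounts to fitting a straight line to particle size versus the standard normal quantile of each cumulative fraction. The sketch below does this numerically with an ordinary least-squares line, which is an assumption of this example (the slides draw the line by eye); the function name `fit_normal_from_cdf` is hypothetical. Because the fit is numerical rather than graphical, its results can differ somewhat from the values read off the graph.

```python
from statistics import NormalDist

def fit_normal_from_cdf(points):
    """points: (x, cumulative fraction) pairs with 0 < fraction < 1.
    Fits x = mean + s * z by least squares, where z is the standard
    normal quantile of each cumulative fraction. Returns (mean, s)."""
    std_normal = NormalDist()
    z = [std_normal.inv_cdf(p) for _, p in points]
    x = [xv for xv, _ in points]
    n = len(points)
    z_bar = sum(z) / n
    x_bar = sum(x) / n
    s_zx = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x))
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    slope = s_zx / s_zz                 # estimate of the standard deviation
    intercept = x_bar - slope * z_bar   # estimate of the mean (z = 0)
    return intercept, slope

# Cumulative distribution of the recovered glass: (sieve size in mm, fraction finer)
points = [(1, 0.10), (2, 0.30), (3, 0.65), (4, 0.90)]
mean, s = fit_normal_from_cdf(points)
fitted = NormalDist(mean, s)
low, high = fitted.inv_cdf(0.025), fitted.inv_cdf(0.975)
print(round(mean, 2), round(s, 2), round(low, 2), round(high, 2))
```

For these four points the fit gives a mean of roughly 2.5 mm and a standard deviation of roughly 1.2 mm, close to the graphical estimates.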

Return Period

Return period is how often an event is expected to recur.

If the annual probability of an event occurring is 5%, then the event can be expected to occur once every 20 years, or have a return period of 20 years:

Return period = 1/fractional probability
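The relationship is a one-line calculation. The sketch below uses a hypothetical helper named `return_period` and checks it against the 5% example.

```python
def return_period(probability):
    """Return period (in the same time unit as the probability, e.g. years)
    for an event with the given fractional probability of occurring."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be a fraction in (0, 1]")
    return 1.0 / probability

# An event with a 5% annual probability recurs about once every 20 years
print(return_period(0.05))
```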

To determine return periods, first rank time-variant data (smallest to largest or largest to smallest) then calculate the probabilities and plot the data.

Return Period Example

The data below are from a wastewater treatment plant. BOD (biochemical oxygen demand) is a measure of organic pollution in water. The BOD is measured daily.

Do these data fit the normal distribution? Can they be used to calculate the mean and standard deviation? What is the worst quality expected in 30 days?

First, rank the data:

Now plot the data. We will plot m/n (the rank m divided by the number of observations n, which is the probability) versus the BOD.

The data fit the normal distribution fairly well.

The mean is about 35 mg/L BOD.

To find the worst quality in a 30-day period, calculate 29/30 = 0.967. This is the fraction of days on which the quality is better than on the worst day out of 30.

Enter the graph at 0.967 and read the answer: 67 mg/L BOD.
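The ranking and plotting steps can be sketched in code. The measured BOD values are not reproduced here, so the list below is hypothetical and serves only to show the mechanics; the function names are also hypothetical. The sketch interpolates linearly between ranked points instead of reading a fitted line from probability paper, so it will not reproduce the 67 mg/L read from the graph.

```python
def plotting_positions(values):
    """Rank values from smallest to largest and pair each with m/n,
    where m is the rank and n is the number of observations."""
    ranked = sorted(values)
    n = len(ranked)
    return [(m / n, v) for m, v in enumerate(ranked, start=1)]

def value_at_probability(positions, p):
    """Linearly interpolate the ranked data at cumulative probability p."""
    if p <= positions[0][0]:
        return positions[0][1]
    for (p0, v0), (p1, v1) in zip(positions, positions[1:]):
        if p0 <= p <= p1:
            return v0 + (v1 - v0) * (p - p0) / (p1 - p0)
    return positions[-1][1]

# Hypothetical daily BOD values in mg/L, for illustration only
bod = [28, 41, 35, 22, 50, 33, 39, 30, 45, 36]
positions = plotting_positions(bod)

# Worst quality expected in 30 days: exceeded on 1 day out of 30
worst = value_at_probability(positions, 29 / 30)
print(round(worst, 1))
```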

Sometimes data is analyzed after it is grouped. Often the mean is used to analyze the data.

Example:

Using the data from the previous problem, estimate the highest BOD expected to occur once every 30 days using grouped data analysis.

First define groups of BOD values.

Now plot these data. Notice how the data points form a curve. This means the data don't really fit the normal distribution, but we'll go ahead anyway.

Now P = 29/30 = 0.967, and we read 67 mg/L BOD from the graph.
