Engineering Statistics Mnge 417 Introduction ©Dr. B. C. Paul 2003


Why Should Engineers Even Care?

• Engineers design and plan
– We have all heard of significant figures: 15.232176451234 gets called 15.23
– Is everything built actually 15.23 ft?
• Are our roof bolt spacings in the field really 5 feet?
– Much of the profession is built around engineering tolerances: how close do I have to be to make it work?
• Reality is then (hopefully) a bunch of minor variations around an acceptable answer

Building to Tolerance

• Design says shafts are machined to 1.25 inches +/- some tolerance
• Reality says there is actually a bunch of very similar sizes that are close to 1.25 inches
– Every so often we will get a dud (we accept the reality but want to minimize the frequency)
• Often we can’t check all parts for tolerance but can check every so many to make sure the process is under control
– A sample is a few values collected from a larger population
• A statistical probability distribution is a model of this process

Engineers Make Changes as a Means to Improve Things

• Mining engineering example
– Coal production from a face area is critical to costs and competitiveness
– Make a policy or equipment change: does it work?
• Will Joy’s new high-voltage miner really improve coal production?
• Will a change in a ventilation pattern really reduce dust violations that limit production?

Cause and Effect Relationships

• Very few real results of anything are just one value
– Coal production has up and down days
• Does the new policy, equipment, or practice result in more up days of higher value?
– How many good results do you actually need to see before you can feel confident that it’s not just coincidentally higher (or lower) values?
• Varying effects can be modeled as a probability distribution

Engineering Design Practice

• You have a bunch of equations and formulas that tell you whether something should work.
• The next design step is often to consider that things don’t always work exactly as they should
– A mining truck or a water treatment plant processing train does not work all the time
• Thus real production is different from the design equation

Modeling

• We build a mathematical model of the situation and then do the math to see if it is going to work for us in the real world
• We may not think of it, but most of our engineering design equations are mathematical models that were fit to actual data long ago
– Newtonian physics (we call them laws now)
– Darcy’s law and the Bernoulli equation

How Do You Decide if a Mathematical Model Fits What You See?

• You usually can’t measure with 100% accuracy, and you don’t think of or can’t consider every minor effect
– Real results tend to be distributed around our potential mathematical models
• Statistical models consider a distribution of answers around an underlying trend

Sometimes you don’t know what is driving a result

• Is absenteeism being driven by work assignments, health, deer season, etc.?
• Statistical models can compare variations to possible causes and help identify what is driving things.

Spatial Relationships

• We take samples of an ore body
– Do the results mean we have a certain tonnage of ore at a certain grade?
• We use samples to tell us what material to take to the processing plant or the waste dump
• We may want to tell our mill operator how much the grade of ore may go up or down
• We can have statistical models built with a spatial or location relationship.

How Statistics Works

• We are often trained to think that the answer to real-world problems comes out of an equation
• We actually create mathematical models that approximately fit reality and then work off of something predictable
– The math that is actually used to study mathematical models may be something only a French mathematician could love
– A lot of the basic ideas are fairly intuitive


Example

• If I have a random number generator that produces numbers between 1 and 100, what value is most likely?

• If I take 25 of those random numbers what will the average value most likely be close to?

What Did You Assume to Get Those Answers?

• You assumed how those values were distributed
– You considered what is called a uniform distribution (all numbers are equally likely to come up)
– Statistics begins with a series of standard mathematical distributions
• We try to pick one that most nearly matches our reality
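The two questions about the 1-to-100 random number generator can be explored by simulation. The following is only a minimal sketch: it assumes Python's standard `random.randint` as a stand-in for the generator, and the seed and the number of repetitions are arbitrary choices made for illustration.

```python
import random

def average_of_draws(rng, n_draws=25, low=1, high=100):
    """Average of n_draws integers drawn uniformly from low..high."""
    draws = [rng.randint(low, high) for _ in range(n_draws)]
    return sum(draws) / n_draws

rng = random.Random(417)  # arbitrary seed so the run is repeatable

# Under a uniform distribution no single value is more likely than another,
# but averages of 25 draws cluster near the middle of the range, (1 + 100) / 2.
averages = [average_of_draws(rng) for _ in range(2000)]
grand_average = sum(averages) / len(averages)

print(f"center of range: {(1 + 100) / 2}")
print(f"typical average of 25 draws: {grand_average:.1f}")
```

Any single average of 25 draws will wander above or below 50.5; it is the assumed uniform distribution and the random sampling that make "close to the middle" the sensible guess.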

Getting Your Answers

• You also assumed that the numbers were taken from that distribution at random
– i.e., no one is cherry-picking any values preferentially to any other
– This is one of the reasons that statisticians get so crazy if they think someone is cherry-picking the sample
• The root of all statistics is that you assume reality follows a standard mathematical distribution and that the part we see was picked at random from that distribution

How Do We Come Up With What Distribution Closely Resembles Our Reality?

• Process starts with figuring out which of our standard model distributions it is
• Three levels of effort
• Level 1: Say “I Believe” and assume one
– Most commonly done with the “Normal Distribution” (“Bell Curve”)
– Many things tend to be normally distributed
– Strength of past experience becomes the rationale
• We also have people who do it without having any idea what they have done
– Standard statistics is built around the normal distribution

Levels of Effort

• Level 2
– Study the distribution to see if we are doing something terrible
– Common approach is called a “Histogram”
• It’s a bar graph that we plot our data on so we can look at it
– We also have things like probability paper, where you plot your data and see if you get a straight line
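A histogram only needs a count of how many values fall in each bin. The sketch below prints one as text; the data values and the bin width of 10 are hypothetical, invented purely for illustration, and the helper name `bin_counts` is not from the course.

```python
from collections import Counter

def bin_counts(values, bin_width):
    """Count values per bin and return {bin_start: count}, ordered by bin."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical measurements, invented purely for illustration.
data = [12, 15, 17, 21, 22, 23, 24, 26, 27, 28, 31, 33, 34, 38, 45]
width = 10

bins = bin_counts(data, width)
for start, count in bins.items():
    # One '#' per observation, so bar length shows the frequency.
    print(f"{start:>3}-{start + width - 1:<3} {'#' * count}")
```

Here the bars rise to a single peak and fall away on both sides, the kind of shape that would make assuming a normal distribution look reasonable rather than terrible.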

Effort Level 3

• Use statistical techniques to test whether our sample data is like a set that could reasonably be pulled from some standard distribution
– These are often called goodness-of-fit tests
• All three levels of effort have some degree of custom for their use in some practices

Measuring Properties of Distributions

• Put sample data into a standard equation that generates a number
– We often actually call that number a statistic
– It measures some property of the distribution that the data was taken from
• Some statistics have obvious tangible meaning
– Example: the Mean, the mathematical average value of the sample or population

Calculating a Mean (or simple average)

• Add up all the numbers and then divide by however many numbers you added
• Example
– Numbers 5, 10, 15, 20, 25
– What is the Mean?
• Calculate
– (5 + 10 + 15 + 20 + 25)/5
– Numerator totals to 75
– Denominator is the number of values I put in
– Divide the total by the number of values put in
– Answer is 15 (the Mean or Average Value)
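The same calculation can be written in a few lines of Python. This is just an illustrative sketch using the five numbers from the example; the function name `mean` is chosen here for convenience.

```python
def mean(values):
    """Add up all the numbers, then divide by how many there are."""
    return sum(values) / len(values)

numbers = [5, 10, 15, 20, 25]
total = sum(numbers)   # numerator: 75
count = len(numbers)   # denominator: 5

print(total, count, mean(numbers))  # 75 5 15.0
```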

Statisticians Need Confusing Ways to Write Equations

• $X_i$ means a sample value
– The i subscript tells you whether it was the first, second, third, etc. sample
• From the example on the last slide we know $X_2$ was the second number we looked at, which was 10
• Σ means the sum of a series of values
• n means the number of samples considered
• Thus we write the formula for the mean as

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

• We of course also have a special symbol for a mean
– $\bar{X}$

Can Do Problems with Software (in this case SPSS)

Type in Data

Ready to Enter Data

Type in the Data

Command to Analyze

Pull Down Analyze Menu

Highlight Descriptive Statistics

Highlight Frequencies

Click on Frequencies

It gives me a list of variables to use

This list is tough with only one variable

Highlight the variable and push the arrow to move it into the use area

Click Statistics

Choose Your Statistics

Check off Mean and push Continue

Click OK on the Frequencies Screen

Read off our Mean at 2.89

More Measurements

• Mode
– The value that has the greatest chance of coming up
• Example
– If I have 10 people who are 5’10”
– 2 people who are 4’3”
– 2 people who are 6’10”
– If you pick a person at random from my group, what height will that person most likely be?
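The height question can be answered by counting. The sketch below assumes the heights are converted to inches for convenience (5'10" = 70, 4'3" = 51, 6'10" = 82); the variable names are illustrative.

```python
from collections import Counter

# Heights from the example, converted to inches.
heights = [70] * 10 + [51] * 2 + [82] * 2

counts = Counter(heights)
mode_height, mode_count = counts.most_common(1)[0]
chance = mode_count / len(heights)  # share of the group at the modal height

print(mode_height, f"{chance:.0%}")
```

The mode is 70 inches (5'10"), because 10 of the 14 people share that height.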

More Measures

• Median
– Half of the values are higher, half are lower
• Mean, Median, and Mode all seem to have somewhat obvious physical meanings
• Other statistics are less obvious
– Variance: a number that comes out of a formula that tells you how spread out the distribution is
• Square root of the variance is the Standard Deviation
– Roughly the typical difference between a sample and the mean value

The Standard Deviation

• Standard Deviation is roughly the typical difference between individual samples and the mean (strictly a root-mean-square difference, not a plain average)

$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$

What does it mean?
Take each sample number, subtract the average sample value from it, and square the result. Do this for every number and add up the results, then divide by one less than the number of samples you took, and then take the square root of that value.
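The verbal recipe translates almost word for word into code. The sketch below reuses the numbers 5, 10, 15, 20, 25 from the mean example; the function name is an illustrative choice, and Python's built-in `statistics.stdev` is mentioned only as a cross-check.

```python
import math
import statistics

def sample_std_dev(values):
    """Follow the verbal recipe step by step."""
    n = len(values)
    average = sum(values) / n
    # Subtract the average from each value and square the result.
    squared_diffs = [(x - average) ** 2 for x in values]
    # Add them up, divide by one less than the sample count, take the root.
    return math.sqrt(sum(squared_diffs) / (n - 1))

numbers = [5, 10, 15, 20, 25]
s = sample_std_dev(numbers)

print(round(s, 4))                           # 7.9057
print(round(statistics.stdev(numbers), 4))   # same value from the standard library
```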

As a Practical Matter That’s a Pain

• I have to compute the average before I can do the math for standard deviation
• Alternative Formula

$$s = \sqrt{\frac{n \sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n-1)}}$$

Tells you to keep track of two numbers:
1- Take each number, square it, and then add the squares up
2- Take each number, add them up, and then square the total
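The appeal of the alternative formula is that it works in a single pass with running totals. The sketch below assumes the usual computational form, with n times the sum of squares minus the squared sum, all divided by n(n - 1); the function and variable names are illustrative.

```python
import math

def running_std_dev(values):
    """One pass: keep a count, a running sum, and a running sum of squares."""
    n = 0
    total = 0.0
    total_of_squares = 0.0
    for x in values:
        n += 1
        total += x
        total_of_squares += x * x
    return math.sqrt((n * total_of_squares - total ** 2) / (n * (n - 1)))

numbers = [5, 10, 15, 20, 25]
s_alt = running_std_dev(numbers)

print(round(s_alt, 4))  # 7.9057
```

One caution: with very large values and a small spread, subtracting two nearly equal totals can lose floating-point precision, so the two-step method is the safer choice in software.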

Getting Standard Deviation

• Statistical calculators have multiple memories
– They add up numbers in one memory
– They square and add up numbers in another
– They count the entries in another
– They then apply the standard deviation formula
• Of course we can also use SPSS

Doing Standard Deviation with SPSS

Pull Down Analyze

Highlight Descriptive Statistics

Highlight and click Frequencies

Check Off Standard Deviation

Push Continue

Push OK on the Frequencies menu

Read Off the Output

Std is 1.12

Variance is also a measure of how much things differ from their average

• Variance is just the standard deviation squared
• To calculate a variance just do the standard deviation thing without taking the square root at the end
• Of course I could also check off Variance instead of Standard Deviation in SPSS

Types of Distributions

• Idea is that we try to approximate reality with a mathematically defined distribution
– Then we can use mathematical operations to predict our answers
• Distributions that often fit reality
– Normal Distribution (developed in 1733)
• Bell Curve
– Uniform Distribution
– Binomial Distribution
– t Distribution
– Chi-Square Distribution
– Lognormal Distribution

Derived Distributions

• The t, Chi-Square, and Lognormal distributions are all derived from the Normal Distribution for specific types of situations

Normal Distribution

• Shaped like a bell curve (figure omitted)

Formula

$$Y = f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

where μ is the mean and σ is the standard deviation
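The normal curve formula can be evaluated directly. This is a minimal sketch that assumes the standard density with mean `mu` and standard deviation `sigma`; these parameter names and the function name are chosen here for illustration.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal curve at x for mean mu and standard deviation sigma."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)

peak = normal_pdf(0.0)                             # highest point, at the mean
left, right = normal_pdf(-1.0), normal_pdf(1.0)    # one standard deviation each side

print(round(peak, 4), round(left, 4), round(right, 4))
```

The curve is highest at the mean and has equal heights at equal distances on either side, which is the bell shape in numbers.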

Symmetric Distributions with a Central Tendency

• Normal Distribution is the classic example
– Most of the chances are right near the center of the distribution
• Frequency drops off to the sides
• Mode is at the center of the distribution
– Distribution is a mirror image about its center
• Allows us to compute just one side
• Median is the Mean is the Mode
• A lot of reality has central tendency with relatively symmetric sides
– t distribution is like that too
• Sides slope off a little differently

Why the Normal Distribution

• One of the first mathematically defined distributions that was a really good fit
– People developed other formulas and distributions from calculations done on the normal distribution
• t distribution and Chi-Square distribution both result from performing mathematical operations on samples of a normal distribution
– Normal Distribution was first to press with a distribution that was heavy at the center and symmetric

Reality 101 for Statistical Distributions

• Probably no such thing as a real normal distribution in life
• Even if there were, we almost never count each and every member of the population, so you’d never know if it was
• Statistical distributions let us take limited data
– See what it approximately is
– Then use the defined mathematical model to suddenly know everything about it

Back to Why the Normal Distribution

• A big part of the real world has central tendency and is symmetric
• Found that calculations done with a normal distribution were robust
– Minor lack of fit in real-world data doesn’t change the answers much
– Thus works on almost anything with central tendency and near symmetry

Most Common Lack of Fit

• Not Symmetric

Robustness covers a little skewness

This type of shape can be fit with a distribution adapted from the normal, called lognormal

If you take averages of about 25 samples from this, the averages will be approximately normal (averaging normalizes)

Taking logarithms of the data will make the transformed distribution normal

Taking the square root will normalize a few others
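Both remedies for skewness, averaging and taking logarithms, can be checked on simulated data. The sketch below is an illustration under stated assumptions: the skewed data are hypothetical values drawn with Python's `random.lognormvariate`, the lognormal parameters and the seed are arbitrary, and the moment-based skewness measure is one common definition rather than anything prescribed by the course.

```python
import math
import random
import statistics

def skewness(values):
    """Moment-based skewness: near 0 if symmetric, > 0 for a long right tail."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((x - m) / sd) ** 3 for x in values) / len(values)

rng = random.Random(2003)  # arbitrary seed for repeatability
raw = [rng.lognormvariate(0.0, 0.75) for _ in range(5000)]  # skewed, hypothetical data

# Remedy 1: take logarithms of every value.
logged = [math.log(x) for x in raw]
# Remedy 2: replace the data with averages of consecutive groups of 25.
averaged = [statistics.fmean(raw[i:i + 25]) for i in range(0, len(raw), 25)]

print(f"raw skewness:    {skewness(raw):.2f}")
print(f"log-transformed: {skewness(logged):.2f}")
print(f"averages of 25:  {skewness(averaged):.2f}")
```

The log-transformed values should come out close to symmetric, while the averages of 25 are noticeably less skewed than the raw data but not perfectly normal, which is why "approximately" matters.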

Multi-Modal Distributions

These types of distributions are often 3 different normally distributed families overlying each other

Finding what is causing the three families often helps us to better understand our world


Uniform Distribution

• All values within some range (which may or may not be plus or minus infinity) are equally likely

• Distribution has no central tendency

• Tends to be associated with truly random events (or at least events where the underlying cause is eluding our mathematical modeling)

Characteristics of Uniform Distribution

• Because all values are equally likely it has no mode
• Mean is at the center of the range
• Uniform is still symmetric about the Mean, so the Median and Mean are the same
• Standard Deviation is the range divided by √12, about 0.29 times the range (if the range is infinite, obviously that’s not defined)
• Variance is Standard Deviation squared
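These properties can be checked numerically. For a continuous uniform distribution the standard deviation works out to the range divided by √12, about 0.29 of the range. The sketch below assumes a hypothetical range of 0 to 100 and an arbitrary seed, both chosen only for illustration.

```python
import math
import random
import statistics

low, high = 0.0, 100.0     # hypothetical range, chosen for illustration
rng = random.Random(46)    # arbitrary seed for repeatability
sample = [rng.uniform(low, high) for _ in range(100_000)]

center = (low + high) / 2
theoretical_sd = (high - low) / math.sqrt(12)

sample_mean = statistics.fmean(sample)
sample_median = statistics.median(sample)
sample_sd = statistics.stdev(sample)

print(f"mean   {sample_mean:.2f}  (center of range = {center})")
print(f"median {sample_median:.2f}")
print(f"sd     {sample_sd:.2f}  (range / sqrt(12) = {theoretical_sd:.2f})")
```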

Binomial Distribution

• Outcomes that are either off or on
– Clearly describes computers and digital data
• Many things either work or they don’t
– In mining, whether our trucks are in working order
– Water treatment plant: the water purification train is working or not working
– Coin tosses are heads or tails

New Problem

• Can’t talk about means, modes, and medians because the outcome has no continuous distribution
• Want to know what fraction of the outcomes are “yes”
– P = 0.85 means 85% of the members of a binomial population are positive
• Usually interested in what the chances are that we can take 5 members out of the population and have them all positive
– Example: if I have 5 mining trucks, how much of the time will all 5 be running?
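The truck question can be worked with the binomial formula. The sketch below makes two assumptions that the slide does not state: that each truck is available with the P = 0.85 used above, and that the trucks fail independently of one another. The function name is illustrative.

```python
from math import comb

def binomial_probability(k, n, p):
    """Chance of exactly k 'yes' outcomes in n independent trials with yes-probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

p_running = 0.85   # assumed availability of each truck
trucks = 5

all_running = binomial_probability(trucks, trucks, p_running)
print(f"all {trucks} running: {all_running:.3f}")   # about 0.444

# The full distribution: how often exactly k of the 5 trucks are running.
for k in range(trucks + 1):
    print(k, round(binomial_probability(k, trucks, p_running), 4))
```

Under these assumptions all five trucks run together only about 44% of the time, even though each one is available 85% of the time.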

The Ordinate Problem

• How continuously distributed are our outcomes?
– Our number line is continuous, so at first glance we almost assumed everything was continuous
• When are they not, and what if they are not?
• This usually doesn’t take a very smart statistician to figure out
• Some things are yes-or-no distributed
– Use the binomial distribution model. Duh!

Some Things are Integer Distributed

• Continuity really is a function of observational scale
– According to quantum physics everything is made of integer numbers of discrete quanta
– At our observation scale the little integer jumps are perhaps so small we cannot even measure them
– Many times integer continuity is negligible

What If Integer Continuity is Not Negligible?

• Happens when we have small numbers or integer-distributed data
– How does one deal with teacher rankings in classes of 5 students?
• Our scale of observation is integer
• Our sample size is small enough that we can’t mask it
• If it was a class of 500 students we could probably model outcomes rather well as if continuous
• Non-Parametric Statistical Models

Summary of Ideas

• Real-world data comes as distributions of answers, not single numbers out of one equation
• We can represent these distributions with mathematical models that fully define how the data is distributed
– Allows us to approximate things we could never get enough data to count

• We work on these models and call our work Statistics