Basic Statistics for Process Improvement
© Visteon Corporation BB Mod #5 Basic Stats Rev 1.0 3/02


The Breakthrough Strategy
Define: BB Works with Management

1 Select Output Characteristic and identify key process input and output variables

2 Define Performance Standards

3 Validate Measurement System

4 Establish Product Capability

5 Define Performance Objectives

6 Identify Variation Sources

7 Screen Potential Causes

8 Discover Variable Relationships

9 Establish Operating Tolerances

10 Validate Measurement System

11 Determine Process Capability

12 Implement Process Controls

Phases: Measure, Analyze, Improve, Control; Characterize, Optimize


Measurement Phase
• Project Definition:
  – Problem Description
  – Project Metrics
• Process Exploration:
  – Process Flow Diagram
  – C&E Matrix, PFMEA, Fishbones
  – Data collection system
• Measurement System(s) Analysis (MSA):
  – Attribute / Variable Gage Studies
• Capability Assessment (on each Y):
  – Capability (Cpk, Ppk, σ Level, DPU, RTY)
• Graphical & Statistical Tools
• Project Summary:
  – Conclusion(s)
  – Issues and barriers
  – Next steps
• Completed “Local Project Review”


Basic Statistics: Fundamentals of Improvement

[Figure: X-bar charts (Sample Mean vs. Sample Number) for Process A (center line 70.91, UCL 77.20, LCL 64.62) and Process B (center line 70.98, UCL 77.27, LCL 64.70)]

• Variability
  – Is the process on target with minimum variability?
  – We use the mean to determine whether the process is on target. We use the standard deviation to determine spread.
• Stability
  – How does the process perform over time?
  – Stability is represented by a constant mean and predictable variability over time.


Warm-Up Exercise

[Figure: X-bar charts (Sample Mean vs. Sample Number) for Machine A (center line 100.7, UCL 138.4, LCL 62.93), Machine B (center line 101.0, UCL 108.5, LCL 93.42; two points flagged "1") and Machine C (center line 115.0, UCL 119.7, LCL 110.4)]

• Assume machines A, B, and C make identical products (with range charts in control)
• Assume that the target value for each product output variable is 100 mm
• Answer the following questions:
  – Which machine(s) exhibit variation?
  – Where is each machine centered?
  – Which machines are predictable over time?
  – Which machines have special cause variation?
  – Which machine would you want making your product?
  – Which machine would probably be easiest to fix?
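One way to make the “special cause” question concrete is to flag any sample mean that falls outside the control limits, the simplest out-of-control signal on an X-bar chart. The minimal Python sketch below assumes the limits printed on Machine B's chart; the sample means are hypothetical (the individual chart points are not recoverable here), and the function names are illustrative.

```python
# Sketch: flag sample means outside the X-bar chart control limits and
# measure how far the center line sits from the target.
# Limits come from Machine B's chart; the sample means are hypothetical.

def points_beyond_limits(sample_means, lcl, ucl):
    """Return the 0-based indices of sample means outside [lcl, ucl]."""
    return [i for i, m in enumerate(sample_means) if m < lcl or m > ucl]


def offset_from_target(center_line, target):
    """Distance of the chart's center line from the target value."""
    return center_line - target


if __name__ == "__main__":
    machine_b = {"center": 101.0, "lcl": 93.42, "ucl": 108.5}
    hypothetical_means = [100.2, 101.5, 109.3, 99.8, 92.7, 100.9]
    print(points_beyond_limits(hypothetical_means, machine_b["lcl"], machine_b["ucl"]))
    print(offset_from_target(machine_b["center"], 100.0))
```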


Can We Tolerate Variability?

[Figure: Cost vs. output value across LSL, Nom and USL, comparing the Traditional View (values between LSL and USL marked “Acceptable”) with the Taguchi Loss Function (New View)]

• There will always be variability present in any process
• We can tolerate variability if:
  – the process is on target
  – the total variability is relatively small compared to the process specifications
  – the process is stable over time
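The Taguchi loss function in the figure is commonly written as a quadratic, $L(y) = k(y - T)^2$, where $T$ is the target (nominal) and $k$ is a cost constant. Averaged over a process with mean $\mu$ and standard deviation $\sigma$, the expected loss is $k[\sigma^2 + (\mu - T)^2]$, which is one way to see why being on target and having small variability both matter. The Python sketch below checks this identity on a few parts; `k = 1` and the measurements are illustrative assumptions, not values from this module.

```python
from statistics import fmean, pvariance


def quadratic_loss(y, target, k=1.0):
    """Taguchi-style loss for one part: k * (y - target)**2."""
    return k * (y - target) ** 2


def expected_loss(mu, sigma, target, k=1.0):
    """Average loss for a process with mean mu and standard deviation sigma."""
    return k * (sigma ** 2 + (mu - target) ** 2)


if __name__ == "__main__":
    parts = [98.0, 100.0, 102.0]  # hypothetical measurements, target 100 mm
    direct = fmean(quadratic_loss(y, 100.0) for y in parts)
    formula = expected_loss(fmean(parts), pvariance(parts) ** 0.5, 100.0)
    print(direct, formula)                    # the two agree
    print(expected_loss(100.0, 2.0, 100.0))   # on target, wider spread
    print(expected_loss(103.0, 1.0, 100.0))   # off target, narrow spread
```

With these illustrative numbers the off-target process costs more even though its spread is half as large.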


Data Analysis Tasks for Improvement

• Determine if the process is stable
  – If the process is not stable, identify and remove causes (X’s) of instability (obvious non-random variation)
• Estimate the magnitude of the total variability. Is it acceptable with respect to the customer requirements (spec limits)?
  – If not, identify the sources of the variability and eliminate or reduce their influence on the process
• Determine the location of the process mean. Is it on target?
  – If not, identify the variables (X’s) which affect the mean and determine optimal settings to achieve the target value
• We will now review statistics that help this process


Types of Outputs (Data)

• Attribute Data (Qualitative)
  – Categories
  – Yes, No
  – Go, No go
  – Machine 1, Machine 2, Machine 3
  – Pass/Fail
• Variable Data (Quantitative)
  – Discrete (Count) Data
    • Maintenance equipment failures, fiber breakouts, number of clogs
  – Continuous Data
    • Decimal subdivisions are meaningful
    • Dimension, chemical yield, cycle time


Selecting Statistical Techniques

• There are statistical techniques available to analyze all combinations of input / output data.

Inputs \ Outputs       Discrete (Attribute)                        Continuous (Variable)
Discrete (Attribute)   Chi-square                                  Analysis of Variance
Continuous (Variable)  Discriminant Analysis, Logistic Regression  Correlation, Multiple Regression


Statistical Distributions

• We can describe the behavior of any process or system by plotting multiple data points for the same variable:
  – over time
  – across products
  – on different machines, etc.
• The accumulation of this data can be viewed as a distribution of values
• Represented by:
  – dot plots
  – histograms
  – normal curve or other “smoothed” distribution


Dot plot distribution

[Dot plot: pump rate in GPM, horizontal axis from 49.00 to 51.00]

• Imagine a metering pump, geared to pump material at 50 gallons/minute
• The actual pump rate is measured at 100 separate instances in time.
• Each dot is plotted and represents one “event” of output at a given value (pump speed). As the dots accumulate, the nature of the pump’s actual performance can be seen as a “distribution” of pump speed values.


Histogram Distribution

[Figure: histogram of the same pump data; x-axis GPM (48.8 to 51.3), y-axis Frequency (0 to 40)]

• Now imagine the same data, grouped into “intervals”, with the number of times that a pump speed data point falls within a given interval determining the height of the interval bar.
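Grouping data into intervals can be sketched in a few lines of Python. The readings, bin start and bin width below are hypothetical choices for illustration (the module's 100 pump readings are not reproduced here); each interval includes its lower edge and excludes its upper edge.

```python
def histogram_counts(data, start, width, n_bins):
    """Count values in each interval [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for x in data:
        i = int((x - start) // width)
        if 0 <= i < n_bins:  # values outside the chosen range are ignored
            counts[i] += 1
    return counts


if __name__ == "__main__":
    gpm = [49.1, 49.6, 49.9, 50.0, 50.2, 50.4, 50.9]  # hypothetical pump readings
    start, width = 48.8, 0.5
    for i, c in enumerate(histogram_counts(gpm, start, width, 5)):
        lo = start + i * width
        print(f"{lo:5.1f}-{lo + width:5.1f} | {'*' * c}")  # bar height = count
```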


Smoothed (Normal) distribution

[Figure: the pump data with a smoothed normal curve (red line); x-axis GPM (48.0 to 52.0)]

• Finally, we can view the data as a smoothed distribution (red line).
• In this example, using the “normal distribution” assumption (we’ll discuss this later) provides an approximation of how the data might look if we were to collect an infinite number of data points.


Population Parameters vs. Sample Statistics

“Population Parameters” (Population): $\mu$ = population mean; $\sigma$ = population standard deviation
“Sample Statistics” (Sample): $\bar{x}$ = sample mean; $s$ = sample standard deviation

• Population:
  – an entire group of objects that have been made or will be made containing a characteristic of interest
  – is it likely we can ever know the true population parameters?
• Sample:
  – the group of objects actually measured in a statistical study
  – a sample is usually a subset of the population of interest


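Computational Equations

Population Mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} X_i$

Sample Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Population Standard Deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N}(X_i - \mu)^2}{N}}$

Sample Standard Deviation: $\hat{\sigma} = s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}}$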
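A minimal Python sketch of the equations above, written out directly so that the only difference between σ and s (dividing by N versus n − 1) is visible. The data set is hypothetical; the standard library's `statistics.pstdev` and `statistics.stdev` implement the same two conventions and serve as a cross-check.

```python
import math
import statistics


def mean(xs):
    """x-bar (or mu, if xs is the whole population): sum of values / count."""
    return sum(xs) / len(xs)


def population_sd(xs):
    """sigma: square root of the sum of squared deviations divided by N."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def sample_sd(xs):
    """s (sigma-hat): the same sum of squares, divided by n - 1."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))


if __name__ == "__main__":
    data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values
    print(mean(data), population_sd(data), sample_sd(data))
    # Standard-library cross-check: pstdev divides by N, stdev by n - 1.
    print(statistics.pstdev(data), statistics.stdev(data))
```

Dividing by n − 1 makes s slightly larger than the N-divisor value, which compensates for measuring deviations from the sample mean rather than the unknown population mean.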


Measures of Central Tendency

• Mean: Arithmetic average of a set of values, $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
  – Reflects the influence of all values
  – Strongly influenced by extreme values
• Median: Reflects the 50% rank, the center number after a set of numbers has been sorted
  – Does not necessarily include all values in calculation
  – Is “robust” to extreme scores
• Mode:
  – Most frequently occurring value in a data set
• Why would we mainly use the mean, instead of the median, in process improvement efforts?
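A quick illustration of the mean's sensitivity and the median's robustness, using Python's `statistics` module on hypothetical readings in which one value is replaced by an extreme value:

```python
from statistics import mean, median, mode

clean = [10, 11, 11, 12, 13]           # hypothetical readings
with_outlier = [10, 11, 11, 12, 130]   # same readings, last one made extreme

for label, data in (("clean", clean), ("with outlier", with_outlier)):
    print(f"{label:>12}: mean={mean(data):.1f} median={median(data)} mode={mode(data)}")
```

Replacing 13 with 130 moves the mean from 11.4 to 34.8, while the median and mode stay at 11.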


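Measures of Variability:

Range $= \max - \min$

Variance: $s^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n - 1}$

Standard Deviation: $s = \sqrt{\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n - 1}}$

• Range:
  – Numerical distance between the highest and the lowest values in a data set.
• Variance ($\sigma^2$; $s^2$):
  – The average squared deviation of each individual data point from the mean (for a sample, the sum of squared deviations is divided by n − 1 rather than n).
• Standard Deviation ($\sigma$; $s$):
  – The square root of the variance.
  – Most commonly used measurement to quantify variability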
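The three measures of variability, sketched in Python on a hypothetical data set. `statistics.variance` and `statistics.stdev` use the sample (n − 1) formulas shown above; `data_range` is an illustrative helper name.

```python
import math
from statistics import stdev, variance


def data_range(xs):
    """Range: highest value minus lowest value (uses only two data points)."""
    return max(xs) - min(xs)


data = [10, 12, 13, 15, 20]   # hypothetical measurements
r = data_range(data)
s2 = variance(data)           # sample variance, divides by n - 1
s = stdev(data)               # sample standard deviation
print(r, s2, s)
assert math.isclose(s, math.sqrt(s2))  # standard deviation = sqrt(variance)
```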


The Quadratic Deviation

[Figure: squared deviation (Sq-Dev, 0 to 100) plotted against the deviate (0 to 10)]

• Squaring the deviation, $(x - \bar{x})^2$, weights extreme deviations from the mean very heavily
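To put a number on “weights extreme deviations very heavily”, the hypothetical readings below contain one extreme value. Measured by absolute deviation it accounts for half of the total; after squaring it accounts for about 78% of the sum of squares.

```python
data = [47, 48, 48, 49, 58]              # hypothetical readings, one extreme
x_bar = sum(data) / len(data)            # 50.0
dev = [x - x_bar for x in data]          # -3, -2, -2, -1, 8
abs_share = max(abs(d) for d in dev) / sum(abs(d) for d in dev)
sq_share = max(d ** 2 for d in dev) / sum(d ** 2 for d in dev)
print(f"extreme point: {abs_share:.0%} of |deviation|, {sq_share:.0%} of squared deviation")
```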


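Principle of Six Sigma

• Variances add, standard deviations do not
• Variances of the (independent) inputs add to calculate the total variance in the output

If $\sigma^2_{Total}$ = variance of the process output, $\sigma^2_{X_1}$ = variance due to Input Variable $X_1$, and $\sigma^2_{X_2}$ = variance due to Input Variable $X_2$,

then $\sigma^2_{Total} = \sigma^2_{X_1} + \sigma^2_{X_2}$,

so $\sigma_{Total} = \sqrt{\sigma^2_{X_1} + \sigma^2_{X_2}}$.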
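A minimal Python sketch of the principle, assuming the inputs are independent (the simple addition rule does not hold for correlated inputs). With hypothetical input sigmas of 3 and 4, the output sigma is 5, not 7; a seeded simulation with normally distributed inputs gives approximately the same result. The function names are illustrative.

```python
import math
import random
import statistics


def combined_sigma(*input_sigmas):
    """Total sigma when independent input variances add: sqrt(sum of sigma_i**2)."""
    return math.sqrt(sum(s ** 2 for s in input_sigmas))


def simulated_sigma(sigma1, sigma2, n=100_000, seed=1):
    """Sample sd of Y = X1 + X2 for independent normal inputs (simulation check)."""
    rng = random.Random(seed)
    y = [rng.gauss(0, sigma1) + rng.gauss(0, sigma2) for _ in range(n)]
    return statistics.stdev(y)


if __name__ == "__main__":
    print(combined_sigma(3.0, 4.0))             # not 3 + 4 = 7
    print(round(simulated_sigma(3.0, 4.0), 2))  # close to the formula
```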


The Normal Distribution

• The “Normal” Distribution is a distribution of data which has certain consistent properties

• These properties are very useful in our understanding of the characteristics of the underlying process from which the data were obtained

• Many natural phenomena and man-made processes are approximately normally distributed, or can be represented as normally distributed


The Normal Distribution

• Property 1: A normal distribution can be described completely by knowing only the:
  – mean, and
  – standard deviation

[Figure: three normal curves labeled Distribution One, Distribution Two and Distribution Three]

What is the difference among these three normal distributions?


The Normal Curve and Its Probabilities

[Figure: normal curve; x-axis “Number of standard deviations from the mean” (-4 to 4), y-axis “Probability of sample value” (0% to 40%); the bands within ±1, ±2 and ±3 standard deviations are labeled 68%, 95% and 99.73%, the cumulative probability of obtaining a value between two values]

• Property 2: The area under sections of the curve can be used to estimate the cumulative probability of a certain “event” occurring
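Python's `statistics.NormalDist` can reproduce the probabilities in the figure from nothing but a mean and a standard deviation, which also illustrates Property 1. The helper name `prob_within` is illustrative.

```python
from statistics import NormalDist


def prob_within(k, mu=0.0, sigma=1.0):
    """P(mu - k*sigma < X < mu + k*sigma) for a normal distribution."""
    d = NormalDist(mu, sigma)
    return d.cdf(mu + k * sigma) - d.cdf(mu - k * sigma)


for k in (1, 2, 3):
    print(f"+/-{k} standard deviations: {prob_within(k):.2%}")

# The same fractions hold for any mean and standard deviation, e.g. 70 and 10.
print(f"{prob_within(1, mu=70, sigma=10):.2%}")
```

For ±2 standard deviations the exact value is about 95.45%; the 95% on the chart is rounded.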


Empirical Rules for the Standard Deviation

• The previous rules of cumulative probability closely apply even when a set of data is not perfectly normally distributed.
• Let’s compare the values for a theoretical (perfect) normal distribution to empirical (real-world) distributions.

Number of Standard Deviations   Theoretical Normal   Empirical Normal
+/- 1                           68%                  60-75%
+/- 2                           95%                  90-98%
+/- 3                           99.7%                99-100%
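The empirical column can be checked on any data set by counting the share of observations within ±k sample standard deviations of the mean. The Python sketch below uses the hypothetical values 1 to 10, which are evenly spread rather than bell-shaped; 60% of them fall within ±1 standard deviation and all of them within ±2, in line with the empirical ranges in the table.

```python
from statistics import mean, stdev


def fraction_within(data, k):
    """Share of observations within k sample standard deviations of the mean."""
    m, s = mean(data), stdev(data)
    return sum(1 for x in data if abs(x - m) <= k * s) / len(data)


data = list(range(1, 11))  # hypothetical, evenly spread (clearly not normal)
for k in (1, 2, 3):
    print(k, fraction_within(data, k))
```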


Normal Probability Plots

• We can check whether a given data set can be described as “normal” with a Normal Probability Plot
• If a distribution is close to normal, the normal probability plot will be a straight line.
• Minitab makes the normal probability plot easy.
  – Open Distskew.Mtw
  – Choose: Stat > Basic Statistics > Normality Test
• Produce a normal plot of each of the first 3 columns. Which appear to be normal?
• Now, graph a histogram of each.
• What does this reveal?
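Outside Minitab, the idea behind a normal probability plot can be sketched by pairing each sorted value with the standard normal quantile of its plotting position; if the data are close to normal, those pairs fall on a straight line and their correlation is close to 1. This is a minimal Python illustration, not Minitab's implementation: the (i − 0.5)/n plotting position and the correlation-based `straightness` measure are assumptions chosen for simplicity, and the data sets are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean


def normal_plot_points(data):
    """Pair sorted values with standard normal quantiles at (i - 0.5) / n (assumed)."""
    xs = sorted(data)
    n = len(xs)
    qs = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return qs, xs


def straightness(data):
    """Pearson correlation of quantiles vs. sorted data (1.0 = perfectly straight)."""
    qs, xs = normal_plot_points(data)
    mq, mx = fmean(qs), fmean(xs)
    sqx = sum((q - mq) * (x - mx) for q, x in zip(qs, xs))
    sqq = sum((q - mq) ** 2 for q in qs)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sqx / sqrt(sqq * sxx)


if __name__ == "__main__":
    qs, _ = normal_plot_points(range(20))
    normal_like = [70 + 10 * q for q in qs]   # hypothetical, shaped like a normal
    skewed = [1, 1, 1, 1, 1, 1, 1, 1, 1, 100]  # hypothetical, strongly skewed
    print(round(straightness(normal_like), 4), round(straightness(skewed), 4))
```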


Normal Probability Plots

[Figure: histograms (Frequency vs. value) of C1 (about 20 to 110), C2 (about 60 to 130) and C3 (about 0 to 80)]

[Figure: normal probability plots (Probability .001 to .999) with Anderson-Darling Normality Test results; N of data = 500, Average = 70, Std Dev = 10 in each case:
  – Normal Distribution (Normal): A-Squared = 0.418, p-value = 0.328
  – Positive Skewed Distribution (Pos Skew): A-Squared = 46.447, p-value = 0.000
  – Negative Skewed Distribution (Neg Skew): A-Squared = 43.953, p-value = 0.000]
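The A-Squared values reported with each plot come from the Anderson-Darling statistic, $A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i - 1)\left[\ln F(Y_i) + \ln\left(1 - F(Y_{n+1-i})\right)\right]$, where $Y_1 \le \dots \le Y_n$ are the sorted data and $F$ is the normal CDF with the sample mean and standard deviation. Larger values mean a poorer fit to the normal. The Python sketch below computes only the statistic; converting it to a p-value requires a further approximation that is not reproduced here, and the test data are hypothetical.

```python
from math import log
from statistics import NormalDist, mean, stdev


def anderson_darling_a2(data):
    """Anderson-Darling A-squared against a normal fitted with the sample mean and sd."""
    ys = sorted(data)
    n = len(ys)
    fitted = NormalDist(mean(ys), stdev(ys))
    F = [fitted.cdf(y) for y in ys]  # F(Y_1) <= ... <= F(Y_n)
    # Pair the i-th smallest value with the i-th largest (0-based index n - i).
    s = sum((2 * i - 1) * (log(F[i - 1]) + log(1.0 - F[n - i]))
            for i in range(1, n + 1))
    return -n - s / n


if __name__ == "__main__":
    near_normal = [NormalDist().inv_cdf((i - 0.5) / 20) for i in range(1, 21)]
    skewed = [1] * 9 + [100]
    print(round(anderson_darling_a2(near_normal), 3), round(anderson_darling_a2(skewed), 3))
```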


Mystery Distribution

• Generate a Normal Probability Plot for the Mystery variable in C5.

• What is your conclusion? Is this a normal distribution?

[Figure: normal probability plot of Mystery (x-axis 50 to 150, Probability .001 to .999); Anderson-Darling Normality Test: A-Squared = 27.108, p-value = 0.000; N of data = 500, Average = 100, Std Dev = 32.3849]


Exercise

• Open file DISTSKEW.MTW
• Stat > Basic Statistics > Display Descriptive Statistics

Variable   N    Mean     Median   Tr Mean   StDev    SE Mean
Normal     500  70.000   69.977   70.014    10.000   0.447
Pos Skew   500  70.000   65.695   68.554    10.000   0.447
Neg Skew   500  70.000   73.783   71.368    10.000   0.447
Mystery    500  100.00   104.20   99.94     32.38    1.45

Variable   Min      Max      Q1       Q3
Normal     29.824   103.301  63.412   76.653
Pos Skew   62.921   130.366  63.647   72.821
Neg Skew   1.866    77.106   67.891   76.290
Mystery    41.77    162.82   68.69    130.81
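The columns of this Minitab output can be approximated in Python. This is a sketch under stated assumptions: “Tr Mean” is taken to be a 5% trimmed mean (5% of the values, rounded, removed from each end), and quartiles use `statistics.quantiles(..., method="exclusive")`, which interpolates at positions (n + 1)p. Both conventions are assumed, not confirmed, to match Minitab's. SE Mean is StDev / √n (for example, 10 / √500 ≈ 0.447, as in the table). The function name `describe` is illustrative.

```python
from math import sqrt
from statistics import mean, median, quantiles, stdev


def describe(data, trim=0.05):
    """Minitab-style descriptive statistics (sketch; conventions are assumptions)."""
    xs = sorted(data)
    n = len(xs)
    k = int(trim * n + 0.5)  # values trimmed from each end (assumed rounding)
    q1, _, q3 = quantiles(xs, n=4, method="exclusive")  # (n + 1)p positions
    s = stdev(xs)
    return {
        "N": n,
        "Mean": mean(xs),
        "Median": median(xs),
        "TrMean": mean(xs[k:n - k]),
        "StDev": s,
        "SE Mean": s / sqrt(n),
        "Min": xs[0],
        "Max": xs[-1],
        "Q1": q1,
        "Q3": q3,
    }


if __name__ == "__main__":
    print(describe(list(range(1, 20)) + [100]))  # hypothetical data, one extreme value
```

Comparing Mean with Median is a quick skewness check: in the table above, Pos Skew has its mean (70.000) above its median (65.695), and Neg Skew the reverse.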


Graphical Summary

Stat > Basic Statistics > Display Descriptive Statistics > Graphs > Graphical Summary

[Figure: Descriptive Statistics graphical summary for Variable: Mystery, with histogram and 95% confidence interval plots for Mu and Median]

Anderson-Darling Normality Test:   A-Squared 27.11   p-value 0.00
Mean 100.00   Std Dev 32.38   Variance 1048.78   Skewness 0.01   Kurtosis -1.63   n of data 500
Minimum 41.77   1st Quartile 68.69   Median 104.20   3rd Quartile 130.81   Maximum 162.82
95% Confidence Interval for Mu:       97.15 to 102.85
95% Confidence Interval for Sigma:    30.49 to 34.53
95% Confidence Interval for Median:   82.78 to 117.66
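The 95% confidence interval for Mu is centered on the mean with a half-width of about two standard errors. The Python sketch below uses the normal value z ≈ 1.96 in place of Student's t, an approximation that is reasonable for n = 500; with the Mystery summary values it gives about 97.16 to 102.84, slightly narrower than Minitab's t-based 97.15 to 102.85. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci_normal_approx(x_bar, s, n, confidence=0.95):
    """Large-sample interval x_bar +/- z * s / sqrt(n) (z used instead of t)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width


lo, hi = mean_ci_normal_approx(100.00, 32.38, 500)  # Mystery summary values
print(round(lo, 2), round(hi, 2))
```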


Exercise in “Data Mining”

• Remember the basic premise of Six Sigma, that sources of variation can be:
  – Identified
  – Quantified
  – Eliminated or Controlled
• The following example investigates potential sources of variation in breaking strength in a spin draw process.
  – Output: Breaking Strength
  – Inputs Tracked: Day, Doff, Spinneret and Draw Ratio
• Objective: Which X’s affect variation in Y?
• Filename: Bhhmult.mtw


Data Set

Column   Count   Missing   Name
C1       36      0         Day
C2       36      0         Doff
C3       36      0         Spinnert
C4       36      0         DrwRatio
C5       36      0         BrkStren

The Info window of Minitab shows that the data set contains information about Day, Doff, Spinneret, Draw Ratio and Breaking Strength. There are 36 observations. The challenge is to determine which inputs are causing variation in the output.


Total Variation of Breaking Strength

[Figure: histogram of BrkStren; x-axis 15 to 29, y-axis Frequency (0 to 10)]

Using the Graph > Histogram function, we see the distribution of Breaking Strength. Values range from about 15 to about 30.

Variable   N    Mean     Median   Tr Mean   StDev   SE Mean
BrkStren   36   21.865   22.380   21.819    3.428   0.571

Variable   Min      Max      Q1       Q3
BrkStren   15.330   29.720   19.242   24.138


Mining the Data

Let’s look at Draw Ratio and its effect on the variability of Breaking Strength. Go to Stat > Basic Statistics > Display Descriptive Statistics and use the “By” statement.


Breakdown by Draw Ratio

Variable   DrwRatio   N    Mean     Median   Tr Mean   StDev   SE Mean
BrkStren   1          12   18.774   18.990   18.625    2.560   0.739
           5          12   22.282   22.815   22.377    1.821   0.526
           10         12   24.538   24.565   24.621    3.017   0.871

Variable   DrwRatio   Min      Max      Q1       Q3
BrkStren   1          15.330   23.710   16.373   20.317
           5          18.960   24.650   20.888   23.220
           10         18.530   29.720   22.715   26.898

These results show that, as Draw Ratio varies from 1% to 10%, the average Breaking Strength varies from 18.8 to 24.5. If we could center Draw Ratio on 5%, the sigma for Breaking Strength would be reduced from 3.4 (the overall StDev of 3.428) to about 1.8.
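The breakdown also shows the “variances add” principle at work. Using only the N, Mean and StDev columns above, the total sum of squares of Breaking Strength splits into a within-Draw-Ratio part and a between-Draw-Ratio part, and recombining them recovers the overall StDev of about 3.43 reported for all 36 observations. In this Python sketch (standard library only), roughly half of the total variation lies between the Draw Ratio levels.

```python
from math import sqrt

# (N, Mean, StDev) of BrkStren for each DrwRatio level, from the table above
groups = {1: (12, 18.774, 2.560), 5: (12, 22.282, 1.821), 10: (12, 24.538, 3.017)}

N = sum(n for n, _, _ in groups.values())
grand_mean = sum(n * m for n, m, _ in groups.values()) / N
ss_within = sum((n - 1) * s ** 2 for n, _, s in groups.values())            # spread inside each level
ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups.values())  # spread of level means
total_sd = sqrt((ss_within + ss_between) / (N - 1))
share_between = ss_between / (ss_within + ss_between)

print(f"grand mean {grand_mean:.3f}, total StDev {total_sd:.3f}")
print(f"share of variation between Draw Ratio levels: {share_between:.0%}")
```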


Data Mining Graphically

Go to Graph > Character Graph > Dotplot and display Breaking Strength (BrkStren) BY Draw Ratio (DrwRatio).


Dotplots

[Character dot plots of BrkStren for DrwRatio 1, 5 and 10 on a common axis from 15.0 to 30.0]

Exercise: Investigate Day, Doff and Spinneret in the same way and be ready to report conclusions. Which is the strongest input in explaining variation in Breaking Strength?
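For the exercise, the same comparison can be repeated for each input. The Python sketch below ranks inputs by the share of the output's total sum of squares that lies between their levels (the quantity sometimes called eta-squared), a simple “which X explains the most variation” measure. It assumes the worksheet has been exported as a list of row dictionaries keyed by the Minitab column names; the four rows shown are hypothetical placeholders, not the Bhhmult.mtw data, and `between_group_share` is an illustrative name.

```python
from collections import defaultdict
from statistics import fmean


def between_group_share(rows, x, y):
    """Fraction of the total sum of squares of y that lies between the levels of x."""
    values = [r[y] for r in rows]
    grand = fmean(values)
    groups = defaultdict(list)
    for r in rows:
        groups[r[x]].append(r[y])
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_between = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups.values())
    return ss_between / ss_total


# Hypothetical rows; in practice these would be the 36 rows of Bhhmult.mtw.
rows = [
    {"Day": 1, "Doff": 1, "BrkStren": 1.0},
    {"Day": 1, "Doff": 2, "BrkStren": 3.0},
    {"Day": 2, "Doff": 1, "BrkStren": 5.0},
    {"Day": 2, "Doff": 2, "BrkStren": 7.0},
]
ranking = sorted(("Day", "Doff"),
                 key=lambda x: between_group_share(rows, x, "BrkStren"),
                 reverse=True)
print(ranking)
```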