Advanced Statistics Manual PDF

8/17/2019 Advanced Statistics Manual PDF


ADVANCED STATISTICAL METHODS FOR ENGINEERS

Chapter Zero

Welcome to Advanced Statistical Methods for Engineers!

Ground rules – please…

• Use name tents
• Cell phones:
– Turn off or use vibrate
– Take phone calls outside
• Keep side conversations to a minimum
• Be prompt in returning from breaks
• Don’t do other work during class
• Let instructor know if you need to leave for more than 30 minutes
• Listen with an open and active mind…
• If you have a question at any time, ask!
– Other Ground Rules wanted by students?
– Class agree to these Ground Rules?



Agenda

Day 1 to Day 4, 8:00 to 5:00

• 8:00 Ch 0: Welcome
• 9:00 Ch 1: ANOVA and Equivalence Testing
• Ch 2: Measurement Systems Analysis
• Ch 3: Distribution Analysis
• Ch 4: Process Capability and Tolerance Intervals
• Ch 5: Regression and GLM (with a continued session)
• Ch 6: Logistic Regression
• Ch 7: Statistical Resources
• Online Evaluations
• Breaks as Needed
• Lunch on your own (each day)
• End of Day Review (each day)

Logistics

• Starting Time: 8:00

• Ending Time: Not later than 5:00

• Lunch 12:00-1:00

• Breaks every 90-120 minutes

• Power Outlets

• Rest Room Location

• Food and drink locations (snacks, cafeteria, etc)



You Need ...

– Laptop with MINITAB and a working wireless Internet connection

– Writing instruments

– Access to data files


Icebreaker (5 Minutes)

My favorite statistician, living or dead, is . . .

My favorite statistics joke is …

In my journey through the world of statistics…

(Extra Credit)

One thing that has worked well for me is …

One thing that has been a challenge for me is …



Expectations

– Tools, tools, tools…

• Course may overlap with material from DRM or Lean Sigma

• Tools may be familiar, but the intent is to present the tools with a focus on statistical thinking and decision-making.

• Topics may be explored in greater mathematical depth than is offered in other curricula.

– Benefits

• A deep mathematical dive can actually help you better see the surface.

• Awareness of mathematical assumptions is a critical first step for growing in your statistical knowledge, but advanced practitioners need to know:

– Which assumptions are most critical?

– When is it appropriate to break the rules?

– What are the consequences of breaking the rules?

• Statistical sophistication allows for flexibility and creativity in problem solving.

Expectations – Experience Chart

• Mark an X in the column that best describes your experience with each topic

– Your Expectations

• Create a list at your table

• Each table will report

• Spokesperson: skip items already mentioned

– Time: 10 Minutes

Topic | None | A Little | Comfortable | Proficient | I could teach it
Equivalence Testing
Tolerance Intervals
ANOVA Signal Interpretation
Measurement Systems Analysis
Distribution Analysis
Process Capability
General Linear Models

Your Feedback is Critical

• September 17-20 represents the first wave of Advanced SME at MDT

• Given that many of you already are leaders in the statistical or DRM worlds, your suggestions for course improvements are extremely important!

• At the end of each day, we will engage in a brief feedback session.

• At the end of the week, there will be an online survey for you to formally evaluate the course.

• If you wish to provide more detailed feedback, please send an email to the instructor team: Leroy Mattson, Karen Hulting, Jeremy Strief, Tom Keenan, Grant Short, Dayna Cruz

| MDT Confidential

What questions do you have?



Chapter 1: ANOVA and Equivalence Testing

Topics

• Quality Trainer Review

• ANOVA

– Assumptions

– Using Minitab Assistant vs Stat Menu

– Calculation Deep Dive

– Sample Size

– ANOVA Signals

• Equivalence Testing


Quality Trainer Review

Comparing Grouped Data: Variables Data Response

ANOVA: ASSUMPTIONS


One-way ANOVA: Testing for the significance of one factor

• The null hypothesis:
– H0: μ1 = μ2 = … = μk
– Meaning that the population (response) means are equal at each of the k levels of this factor, i.e. the factor is NOT significant.
• The alternative hypothesis:
– HA: at least two population means are unequal
– Meaning that the factor IS significant
• Perform the One-way ANOVA and reject the null hypothesis if the p-value is < alpha
– Usually alpha = 0.05 (or 0.10 or 0.01)
– A way to remember: “If p is low – the null must go”.

ANOVA: General Process Steps

• Select a model

• Plan sample size using relevant data or guesses

• (Optional) Simulate the data and try the analysis

• Collect real data

• Fit the model (perform ANOVA and get p value)

• Examine the residuals

• Transform the response or update the model, if necessary

• State conclusion

Typical Assumptions for ANOVA Factors

• Factors (or “Inputs”)

– Each factor can be set to two or more distinct levels

– Factor levels can be measured adequately

– Factor levels are “fixed” rather than “random”

– For multiple factors, all combinations of all levels are represented (levels are “completely crossed”)

Typical Assumptions for ANOVA Responses

• Response data is “complete”, not censored

• Some software requires “balanced” data – same sample size for each level of the input factor

• Assumptions on Residuals

– Residual = Response – Fitted Value

– Normally distributed

– Equal variance (assumption relaxed in Minitab Assistant)

– Independent (e.g. no time trend)

ANOVA CALCULATIONS DEEP DIVE: STAT MENU & MINITAB ASSISTANT

ANOVA Calculations

• See www.khanacademy.org

– ANOVA 1 – Calculating SST (7:39)

– ANOVA 2 – Calculating SSW and SSB (13:20)

– ANOVA 3 – Hypothesis Test and F Statistic (10:14)
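
The three calculations named in the videos above (SST, then SSW and SSB, then the F statistic) can be sketched in a few lines of Python. This is a minimal illustration, not Minitab's implementation; the function name `one_way_anova` and the three small groups are made up here for easy hand checking and are not course data.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (SST, SSW, SSB, F) for a list of groups of observations."""
    all_values = [y for g in groups for y in g]
    grand_mean = mean(all_values)
    # Total sum of squares: every observation vs. the grand mean
    sst = sum((y - grand_mean) ** 2 for y in all_values)
    # Within-group sum of squares: every observation vs. its own group mean
    ssw = sum((y - mean(g)) ** 2 for g in groups for y in g)
    # Between-group sum of squares: group means vs. the grand mean
    ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ssb / df_between) / (ssw / df_within)
    return sst, ssw, ssb, f_stat

# Hypothetical data: three groups of three observations
groups = [[3, 2, 1], [5, 3, 4], [5, 6, 7]]
sst, ssw, ssb, f_stat = one_way_anova(groups)
print(f"SST={sst:.2f} SSW={ssw:.2f} SSB={ssb:.2f} F={f_stat:.2f}")
```

Note that SST = SSW + SSB, which is a useful check on any hand calculation. The p-value comes from comparing F with the F distribution on (k-1, N-k) degrees of freedom, which Minitab reports directly.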


Minitab Analysis of Khan Dataset


Can arrange either Stacked or Unstacked



Consider a PQ Dataset

• Three runs of n=10 units produced and tensile tested

• See Ch1DataFile.mtw

• Columns TipTensile1, TipTensile2, TipTensile3


Minitab Options

• Could use
– Stat -> ANOVA -> One way
– Stat -> ANOVA -> One way (Unstacked)
– Stat -> ANOVA -> General Linear Model
– Stat -> Regression -> General Regression
– Minitab Assistant

• Data arrangement
– Stacked (one column for X, one column for Y)
– Unstacked (Y values in columns for each X)

ANOVA using Minitab Statistics Menu


Stat Menu Outputs


S, R2 and adjusted R2 are measures of how well the model fits the data.

Judging model fit

• S is measured in the units of the response variable and represents the standard distance data values fall from the fitted values
– For a given study, the better the model predicts the response, the lower S is
• R2 (R-Sq) describes the amount of variation in the observed response values that is explained by the predictor(s)
– R2 always increases with additional predictors.
– R2 is most useful when comparing models of the same size
• Adjusted R2 is a modified R2 that has been adjusted for the number of terms in the model
– R2 can be artificially high with unnecessary terms, while adjusted R2 may get smaller when terms are added to the model
– Use adjusted R2 to compare models with different numbers of predictors
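
As a rough illustration of how the three fit measures relate to the ANOVA sums of squares, the sketch below computes them from an error sum of squares and a total sum of squares. The function name `fit_summary` and the input numbers are assumptions made for this example; they are not taken from the course data files.

```python
import math

def fit_summary(ss_error, ss_total, n_obs, n_model_df):
    """Return (S, R-sq, adjusted R-sq) from ANOVA sums of squares.

    n_model_df is the degrees of freedom used by the model terms
    (number of groups minus 1 for a one-way ANOVA).
    """
    df_error = n_obs - 1 - n_model_df
    df_total = n_obs - 1
    s = math.sqrt(ss_error / df_error)        # in the units of the response
    r_sq = 1 - ss_error / ss_total            # share of variation explained
    # Adjusted R-sq compares mean squares, so extra terms cost degrees of freedom
    adj_r_sq = 1 - (ss_error / df_error) / (ss_total / df_total)
    return s, r_sq, adj_r_sq

# Hypothetical one-way ANOVA: 9 observations, 3 groups, SSE = 6, SST = 30
s, r_sq, adj_r_sq = fit_summary(ss_error=6.0, ss_total=30.0, n_obs=9, n_model_df=2)
print(f"S={s:.3f}  R-Sq={r_sq:.1%}  R-Sq(adj)={adj_r_sq:.1%}")
```

Because the adjusted value divides each sum of squares by its degrees of freedom, it can fall when a term that explains little is added, which is why it is the fairer measure across models of different size.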


Comparisons Output




ANOVA – Examining Residuals

1) Test for Normality: Normal Probability Plot is a straight line

2) Test for Equal Variances: Residual vs. Fitted Values is evenly distributed around the 0 line

Using the Stacked arrangement, there would also be a 4th Residual plot – Time Order. This is a Test for Independence – looking for a pattern over time.

Residuals are strongly non-normal . . .

Possible Causes:
• Failure of Equal Variance Assumption
• Outliers
• Missing Important Factors in the Model
• Data is from Non-Normal Population

What to do?
• Check for Outliers
• Check if Equal Variance is satisfied
• Perform Normality Test
• If data is from Non-Normal Population, consider using Non-Parametric Tests or Transform the Response variable

If Residuals differ Group to Group

Possible Causes:

• Non-Constant Variance

• Outliers

• Missing Important Factors

in the Model

What to do?

• Test for equal variance assumption using Stat > ANOVA > Test for Equal Variances

• If test indicates unequal variances then consider transforming the response variable

• Verify if the outlier is a data entry error

• Add the factor into the model

If there is a time pattern in the data . . .

What to do?

• Prevent by Randomizing

• A time effect may be present

• Consider time series procedure



Common Transformations

Transformation | Comments
√y | Appropriate for Poisson Distributed Data
Log(y) | If the Response is exponentially increasing then this transformation is appropriate
1/y | Appropriate when responses are close to zero
sin⁻¹(√y) | Called the Arcsine Square Root function. Appropriate when Response is a proportion between zero and one.

Another useful tool is Box-Cox Transformation

Box-Cox Procedure:

$$Y^{(\lambda)} = \begin{cases} Y^{\lambda}, & \text{when } \lambda \neq 0 \\ \log_e(Y), & \text{when } \lambda = 0 \end{cases}$$

Box-Cox Transformation in Minitab

[Figure: Box-Cox Plot of Data 1. StDev versus Lambda (-1 to 3), with Lower CL and Upper CL marked. Lambda (using 95.0% confidence): Estimate 0.03, Lower CL -0.30, Upper CL 0.38, Rounded Value 0.00.]

Minitab > Stat > Control Charts > Box-Cox Transformation
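
To show what a StDev-versus-Lambda search involves, here is a minimal Python sketch. It assumes the standardized form of the Box-Cox transform (scaled by the geometric mean so that standard deviations are comparable across lambda values) and a simple grid search; Minitab's exact algorithm may differ. The function names and the simulated data are hypothetical.

```python
import math
import random
from statistics import stdev

def box_cox_stdev(y, lam):
    """Standard deviation of the standardized Box-Cox transform of y (y > 0)."""
    g = math.exp(sum(math.log(v) for v in y) / len(y))   # geometric mean
    if lam == 0:
        w = [g * math.log(v) for v in y]
    else:
        w = [(v ** lam - 1) / (lam * g ** (lam - 1)) for v in y]
    return stdev(w)

def best_lambda(y, grid):
    """Grid value of lambda with the smallest standardized StDev."""
    return min(grid, key=lambda lam: box_cox_stdev(y, lam))

random.seed(1)
# Simulated right-skewed (lognormal) data, so a lambda near 0 is expected
y = [random.lognormvariate(3.0, 0.8) for _ in range(200)]
grid = [i / 10 for i in range(-10, 31)]    # lambda from -1.0 to 3.0 in steps of 0.1
lam_hat = best_lambda(y, grid)
print("lambda with smallest StDev:", lam_hat)
```

In practice the estimate is usually rounded to a convenient value (0 for a log, 0.5 for a square root), as the Rounded Value in the Minitab plot suggests.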

Minitab Screenshots



ANOVA using Minitab Assistant


http://www.minitab.com/support/documentation/Answers/Assistant%20White%20Papers/OneWayANOVA_MtbAsstMenuWhitePaper.pdf

Report Card




Diagnostic Report


Power Report




Summary Report


ANOVA - Exercise

• Use Ch1DataFile.mtw

• Test for differences between the group means using both Stat menu ANOVA and Minitab Assistant ANOVA . . . for these 3-lot PQ studies:

– For TubeTensile1, TubeTensile2, TubeTensile3

– For Diameter1, Diameter2, Diameter3

• What are your conclusions?




ANOVA – Alternate Exercise

Analyze this data two ways: 1) Assistant and 2) Stat>ANOVA

Note: Stat>ANOVA assumes equal variances (and so may need transformations), but Minitab Assistant ANOVA does not assume equal variances.

An article in the IEEE Transactions on Components, Hybrids, and Manufacturing Technology (Vol. 15, No. 2, 1992, pp. 146-153) described an experiment in which the contact resistance of a brake-only relay was studied for three different materials (all were silver-based alloys).

Alloy-Contact Resistance.MPJ

Test at an alpha = 0.01 level

Does the type of alloy affect mean contact resistance?

Applied Statistics and Probability for Engineers, 4th Edition, Douglas C. Montgomery and George C. Runger

Alloy-Contact Resistance.MPJ

General Regression can be used for ANOVA

Use for multiple regression – more than one X

General regression can handle: 1) all continuous input(s), 2) all categorical input(s), 3) a mixture of continuous and categorical inputs, and 4) a non-normal response (it allows for the Box-Cox transformation of the response).

The response must be continuous or considered as continuous.

General Regression: Example of ANOVA

Background: The forces exerted by three different stylets in a lead are compared at 4 different position/advancement conditions (blocks). The data are given below:

Force in Grams

Condition | Stylet 1 | Stylet 2 | Stylet 3
1 | 18.1 | 14.5 | 14.0
2 | 20.0 | 16.1 | 16.3
3 | 30.2 | 27.5 | 26.8
4 | 42.5 | 39.4 | 38.7
Mean (x̄) | 27.70 | 24.38 | 23.95

Note: A blocked One-way ANOVA is a two-way ANOVA where one factor’s effect is to be “blocked out”. The randomization is done within each block.

Perform an ANOVA analysis using Stat > Regression > General Regression and determine if:

(1) there are significant differences between different stylets, and if

(2) the blocking factor employed was effective.
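
For readers who want to see the arithmetic behind a blocked analysis, the sketch below partitions the stylet data into stylet, block and error sums of squares and forms the two F ratios. It is an illustration in Python rather than the Minitab General Regression procedure, and it stops at the F ratios; p-values would come from Minitab or an F table.

```python
from statistics import mean

# Force in grams from the stylet example: rows = conditions (blocks), columns = stylets
force = [
    [18.1, 14.5, 14.0],
    [20.0, 16.1, 16.3],
    [30.2, 27.5, 26.8],
    [42.5, 39.4, 38.7],
]
n_blocks, n_stylets = len(force), len(force[0])
grand = mean([v for row in force for v in row])
stylet_means = [mean([row[j] for row in force]) for j in range(n_stylets)]
block_means = [mean(row) for row in force]

ss_total = sum((v - grand) ** 2 for row in force for v in row)
ss_stylet = n_blocks * sum((m - grand) ** 2 for m in stylet_means)
ss_block = n_stylets * sum((m - grand) ** 2 for m in block_means)
# With one observation per cell, what is left over serves as the error term
ss_error = ss_total - ss_stylet - ss_block

df_stylet, df_block = n_stylets - 1, n_blocks - 1
df_error = df_stylet * df_block
ms_error = ss_error / df_error
f_stylet = (ss_stylet / df_stylet) / ms_error
f_block = (ss_block / df_block) / ms_error
print(f"F(stylet) = {f_stylet:.1f} on ({df_stylet}, {df_error}) df")
print(f"F(block)  = {f_block:.1f} on ({df_block}, {df_error}) df")
```

A large block F ratio indicates that blocking removed a substantial amount of variation that would otherwise have inflated the error term and hidden the stylet differences.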

Stylet.MTW

Blocked One-way ANOVA

Condition is the Block

Blocked One-way ANOVA

(1) Are there significant differences between different stylets?

(2) Is the blocking factor employed effective?

SAMPLE SIZE FOR ANOVA




Planning Sample Size in ANOVA

• Fill in the number of levels for the factor

• Always fill in Standard Deviation (use conservative estimate)

• Then fill in two of the three long boxes

• Can specify several values, separated by spaces

Sample Size for One-Way ANOVA Example



Sample Size for One-Way ANOVA

RESPONDING TO ANOVA SIGNALS




Statistical vs. Practical Significance

• Key idea in any hypothesis testing effort

– If the test detects a difference (a “signal”), then what?

– Don’t assume the signal is automatically bad news (if you’re hoping for consistency) or good news (if you’re hoping for a change)

• For example, “ANOVA Failure” in PQ

– Examine the size of the signal in the appropriate context . . . determine the “practical” significance of the difference

– The appropriate response depends on an assessment of both statistical and practical significance


ANOVA Signal in PQ

• There was a realization that a significant p-value in the comparison of lot means should not necessarily mean the PQ fails

• Analysis sometimes included to assess the “power” of the ANOVA and the practical significance of the difference in the means.

• Eventually, Corporate Policy on Manufacturing Process Validation added the “ANOVA Failure Flow Chart”


2008 Version of Corporate Guideline for Manufacturing Process Validation

2012 Version of CRDM ANOVA Signal Flow Chart

Pros and Cons

• Pro

– Provides a consistent way to address the question of practical significance

– Relatively Simple

– Effective – expect the approach to stand up to regulatory scrutiny

• Con

– Can be very prescriptive

– Standards for Ppk are quite high: 95% confidence bound on Ppk > 1.33

– Disincentive for larger sample size


Current approaches

• Corporate Guideline phased out

• CV procedure still has essentially the same ANOVA Signal Flowchart

• CRDM originally had a more prescriptive version

• CRDM currently has a simplified version

• Would also work to include a discussion of the sample size of the ANOVA and the practical significance of the difference

• Discussion – other businesses?

Example of ANOVA Signal Flow Chart

• Recall the ANOVA exercise on Ch1DataFile.mtw for TubeTensile1, TubeTensile2, TubeTensile3


ANOVA Signal Flow Chart Ppk Analysis


First Stack the 3 lots using Data -> Stack -> Columns

Then run

Stat -> Quality Tools -> Capability Analysis -> Normal

Add confidence interval for Ppk using Options button
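
As a reminder of what the capability analysis reports, the point estimate of Ppk can be sketched as below. This uses the overall standard deviation of the stacked data and the usual min-of-two-sides definition; the function name `ppk` and the readings are hypothetical, and the confidence bound that Minitab adds through the Options button is not reproduced here.

```python
from statistics import mean, stdev

def ppk(values, lsl=None, usl=None):
    """Point estimate of Ppk using the overall standard deviation."""
    m, s = mean(values), stdev(values)
    candidates = []
    if lsl is not None:
        candidates.append((m - lsl) / (3 * s))   # distance to lower spec in 3-sigma units
    if usl is not None:
        candidates.append((usl - m) / (3 * s))   # distance to upper spec in 3-sigma units
    if not candidates:
        raise ValueError("at least one specification limit is required")
    return min(candidates)

# Hypothetical stacked tensile readings (lb) with a lower spec limit of 3 lb
stacked = [4.1, 4.4, 3.9, 4.6, 4.2, 4.0, 4.5, 4.3, 4.2, 4.4]
print(f"Ppk = {ppk(stacked, lsl=3.0):.2f}")
```

The lower confidence bound is always below this point estimate, and the gap shrinks as the total sample size grows.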



Next steps

• Total sample size is 90, so use confidence bound

• Lower 95% confidence bound on Ppk is 0.92

• Must make 3 more runs

– TubeTensile4, TubeTensile5, TubeTensile6

– These must pass tolerance interval analysis (like the first three runs did)

– All six runs pass tolerance interval analysis


Conclusion


Note: Ppk analysis of all six lots is not required. Included here FYI.

Exercise: ANOVA Signal

• Run ANOVA and assess practical significance for

– In Ch1DataFile.mtw, analyze

• WireTensile1, WireTensile2, WireTensile3

• Specification is 3 lb minimum

– Use one of the ANOVA Signal Flowcharts

– Then use another approach to determine the practical significance of the difference between the means

– Conclusion?


ANOVA: Summary And Recap

• Review Quality Trainer

• Calculations Deep Dive into ANOVA

• Analytically, ANOVA is a special case of Regression

• Sample Size

• ANOVA Signal Flow chart – some Medtronic divisions use one to standardize response to ANOVA Signal in PQ

EQUIVALENCE TESTING


Statistical Logic for Equivalence

• The basic statistical logic is designed to disprove equality.

– Null hypothesis: Two population parameters are equal, e.g. μ1 = μ2.

– Alternative hypothesis: Two population parameters are not equal, e.g. μ1 ≠ μ2.

• We need a different form of logic to affirmatively prove equivalence.

– Null hypothesis: Two population parameters differ by Δ or more, e.g. |μ1 - μ2| ≥ Δ.

– Alternative hypothesis: Two population parameters differ by less than Δ, e.g. |μ1 - μ2| < Δ.

Equality vs. Equivalence

Part of the confusion around the issue of equivalence is that the concepts of equality and equivalence may not be distinguished.

– Equality: Two values/processes are mathematically identical.

– Equivalence: The difference between two values/processes is sufficiently small that it can be deemed practically insignificant.


Approach 1: Confidence Intervals

• The idea is to demonstrate that the confidence interval for the difference of interest is fully contained within the range of practical significance [-Δ,Δ].

Jones, BMJ 1996

Approach 1: Confidence Intervals

• Step 1: Define Practical Significance
– Before collecting data, use scientific/engineering principles to decide what difference, Δ, is practically negligible.

• Step 2: Estimate Sample Size for Experiment
– Based on characterization data or other assumptions, estimate the sample size needed to produce a confidence interval fully contained within [-Δ,Δ]. (Stat

Example of Approach 1


Method

Parameter: Mean
Distribution: Normal
Standard deviation: 3 (estimate)
Confidence level: 95%
Confidence interval: Two-sided

Results

Margin of Error | Sample Size
2 | 12

We need n=12 from BOTH processes.

Example Output

• Conclusions:

• The processes are statistically different (p=0.003), which is a statement about non-equality.

• Despite being unequal, the processes are still equivalent. The 95% confidence interval for the difference in means is (0.671, 2.798), which is a strict subset of [-3, 3]


Two-sample T for New vs Old

N Mean StDev SE Mean

New 12 30.927 0.858 0.25

Old 12 29.19 1.52 0.44

Difference = mu (New) - mu (Old)

Estimate for difference: 1.735

95% CI for difference: (0.671, 2.798)

T-Test of difference = 0 (vs not =): T-Value = 3.44 P-Value = 0.003 DF = 17
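
The decision rule in Approach 1 is simple enough to express as a short check. The sketch below takes the confidence limits reported by Minitab in the two-sample output for New vs Old and tests whether the interval sits inside [-Δ, Δ]; the function name `equivalence_by_ci` is made up for this illustration, and Δ = 3 is the range used in the example conclusions.

```python
def equivalence_by_ci(ci_low, ci_high, delta):
    """True when the confidence interval lies entirely inside [-delta, +delta]."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    return -delta <= ci_low and ci_high <= delta

# 95% confidence interval for mu(New) - mu(Old), with delta = 3
ci_low, ci_high = 0.671, 2.798
statistically_different = not (ci_low <= 0 <= ci_high)   # interval excludes zero
equivalent = equivalence_by_ci(ci_low, ci_high, delta=3)
print("Statistically different:", statistically_different)
print("Practically equivalent:", equivalent)
```

Both statements can be true at once, which is the point of the example: the interval excludes zero (not equal) yet stays inside the range of practical significance (equivalent).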



Approach 1: Summary

• The confidence interval approach is the gold standard for clinical trials and other high scrutiny experiments requiring FDA approval.

• It is mathematically equivalent to a p-value-driven approach called TOST (Two One-Sided T-tests).

• The confidence interval approach is easier to understand than the original form of TOST.


Post-hoc Problems

• Rigorous application of approach 1 requires that the Δ value be established before collecting data.

• What should we do when data have already been collected without defining the difference of interest or planning sample size?

Approach 2: Retrospective Power Analysis

• When data have already been collected without planning for rigorous “equivalence testing”, equivalence may be assessed by displaying an entire power curve.

• Even if this approach does not set a-priori standards for equivalence,
– it provides additional context for an insignificant p-value

– it can help engineering experts to make decisions

• Subjective judgment will be required to determine if the experiment was suitably powered to demonstrate equivalence.

• A power curve is a useful supplement to a traditional analysis, but it does not match the rigor in approach 1.


Approach 2 Method

• After collecting the means and standard deviation of the observed data, create a power curve through the Power and Sample Size platform in Minitab.

• Display and interpret the Power Curve in your data analysis report.

• You may honestly believe that your experiment was sufficiently powered (>80%) to detect meaningful differences, but the post-hoc nature of the analysis makes your argument weaker.

Example


• Consider again our old and new processes which have distributions of N(30, 2²) and N(31, 1²), respectively.

• Suppose we forgot to take approach 1 and instead just collected 5 data points from each process.

• We found a statistical difference when we collected 12 data points, but the p-value goes above 0.05 when collecting only 5:

Two-sample T for New_5 vs Old_5

N Mean StDev SE Mean

New_5 5 30.744 0.933 0.42

Old_5 5 29.42 3.02 1.4

Difference = mu (New_5) - mu (Old_5)

Estimate for difference: 1.32

95% CI for difference: (-2.61, 5.25)

T-Test of difference = 0 (vs not =): T-Value = 0.93 P-Value = 0.403 DF = 4

Power Curve Inputs

• The observed sample size is n=5

• Desired power levels are in the range of .8-.95

• The pooled standard deviation is 2.24.




Power Curve Output

• With 80% power, this experiment could have detected a difference of about 4.5.

• With 95% power, this experiment could have detected a difference of about 6.

• It is a subjective engineering judgment as to whether such values provide sufficient reassurance about the experimental results.
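
One way to sanity-check power figures like these without Minitab is a small Monte Carlo simulation. The sketch below assumes a pooled two-sample t-test with 5 units per group, a common standard deviation of 2.24 and a two-sided alpha of 0.05; the critical value 2.306 is the standard table value for 8 degrees of freedom. The function names are hypothetical, and the simulated power only approximates what the Power and Sample Size platform calculates.

```python
import math
import random

N_PER_GROUP = 5
T_CRIT_DF8 = 2.306   # two-sided 5% critical value of Student's t with 2*5 - 2 = 8 df

def _mean_var(x):
    """Sample mean and sample variance (n - 1 denominator)."""
    m = sum(x) / len(x)
    return m, sum((v - m) ** 2 for v in x) / (len(x) - 1)

def simulated_power(diff, sigma, n_sim=20000, seed=1):
    """Share of simulated experiments in which the pooled t-test rejects equality."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        a = [rng.gauss(0.0, sigma) for _ in range(N_PER_GROUP)]
        b = [rng.gauss(diff, sigma) for _ in range(N_PER_GROUP)]
        (mean_a, var_a), (mean_b, var_b) = _mean_var(a), _mean_var(b)
        pooled_sd = math.sqrt((var_a + var_b) / 2)          # equal group sizes
        t = (mean_b - mean_a) / (pooled_sd * math.sqrt(2 / N_PER_GROUP))
        if abs(t) > T_CRIT_DF8:
            rejections += 1
    return rejections / n_sim

for diff in (2.0, 4.5, 6.0):
    print(f"true difference {diff}: simulated power = {simulated_power(diff, sigma=2.24):.2f}")
```

Running several differences in this way traces out points on the power curve, which makes it easier to explain to reviewers what size of difference the experiment could realistically have detected.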


Extensions and Challenges

• Confidence intervals and power curves can be calculated for almost any type of statistical scenario:

– Comparing 2 means

– Comparing >2 means

– Comparing standard deviations

– Comparing reliability curves

• However, the required sample size for proving equivalence of standard deviations is often much larger than the sample size for means.

• Equivalence for means can reasonably be quantified in terms of arithmetic differences (e.g. |μ1 – μ2| < 5), but equivalence for standard deviations will be quantified in terms of multiplicative differences (e.g. ½ < σ1/σ2 < 2).

Exercise – Lesion Depth

• Consider the key requirement for a new ablation catheter: equivalent (or greater) maximum lesion depth, compared to the current design, where the difference of interest is 0.5 mm.

• Previous data shows
– Normal distribution model is adequate for Max Lesion Depth
– Current Design has average max lesion depth of 2.3 mm
– New Design has average max lesion depth of 2.2 mm
– Largest pooled standard deviation of max lesion depth is 0.356.

• Follow Approach 1 to plan sample size for the equivalence test

• Assume test data as follows to complete the equivalence analysis
– New: n=15, mean = 2.733, stdev = 0.342
– Current: n=15, mean = 2.723, stdev = 0.386

• State your conclusion


Alternate Exercise: Equivalence Testing

• Within your team, identify an example of equivalence testing in your own work.

• Apply Approach 1, using actual or made-up characterization data for the planning step.

• Use Minitab to simulate data collection.

– Hint: Use Calc -> Random Data -> Normal . . .

• Use Minitab to complete the Approach 1 data analysis.

• State your conclusion from the data.

EQUIVALENCE Take Away Messages

• An insignificant p-value is not a rigorous method of proving equivalence.

• Ideally, practical significance and sample size should be considered before the experiment begins.

• Rigorously proving equivalence first demands carefully defining the threshold (Δ) of practical significance.

• The most rigorous way to prove equivalence is to demonstrate that a confidence interval is fully contained within [-Δ, Δ].

• An alternative, but less formal, approach is to retrospectively perform a power analysis.

• Don’t feel like you need to remember all the Minitab steps; we hope you remember the concepts and call your neighborhood statistician for further support.


Summary and Review


• ANOVA

– Assumptions

– Using Minitab Assistant vs Stat Menu

– Calculation Deep Dive

– Sample Size

– ANOVA Signals

• Equivalence Testing




Chapter 2: Measurement Systems Analysis

Topics


• Topics with Variables Data

– Gage R&R Sample Size

– Probability of Misclassification (Variables Data)

– Helpful Hints

• MSA for Destructive Tests

• MSA for Attribute Tests


Quality Trainer Review


Value of Measurement Systems Analysis

If your goal is . . . | then MSA helps by . . .
Process Improvement | Reducing variability in Xs and Ys so that the “key” Xs may be discovered.
Capability Demonstration or Estimation | More accurate measurements of process performance
Sorting Out Bad Product | Reducing the Probability of Misclassification
Innovation | Reduced noise allows discovery of more subtle signals

Recall . . . MSA Concepts

• Bias – Mean (Delta, the difference from reference)
• Linearity – Mean (Bias vs Part or Operating Value)
• Stability – Mean (Bias vs Time)
• Repeatability – Standard Deviation
• Reproducibility – Standard Deviation
• Gage R&R – Standard Deviation

…so linearity and stability should be plotted

…while bias, repeatability and reproducibility are just single numbers

Gage Bias and Linearity

• Bias is the difference between the average of repeated measurements and the “true value”

• MSA tends to focus on Gage R&R (variability), but accuracy (= lack of bias) is equally important

– Assumption that procedures for Calibration are in place – need to confirm

– Assumption that procedures for Calibration are adequate – need to confirm

• “Linearity” is a study of bias across the range of measured values

• In Minitab, use Stat -> Quality Tools -> Gage Study -> Gage Linearity and Bias Study


Gage Stability

> Stat > Control Charts > Variables Charts for Subgroups > Xbar-R

[Figure: Xbar-R Chart of Rep1, ..., Rep3, plotted by Day (8-Sep to 12-Sep, at 11:00 and 5:00). Sample Mean panel: center line X-double-bar = 0.2497, UCL = 0.253458, LCL = 0.245942. Sample Range panel: center line R-bar = 0.00367, UCL = 0.00946, LCL = 0.]

Measurement system is stable over time as evidenced by:

• Xbar Chart - in control

• R Chart - in control

Snap Gauge.mtw
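
The control limits on an Xbar-R chart follow from the two center lines and standard Shewhart constants. The sketch below assumes subgroups of size 3 (Rep1 to Rep3), for which the tabulated constants are A2 = 1.023, D3 = 0 and D4 = 2.574, and uses the center lines read from the stability chart; the function name is made up for this illustration.

```python
# Standard Shewhart control chart constants for subgroups of size 3
A2, D3, D4 = 1.023, 0.0, 2.574

def xbar_r_limits(grand_mean, r_bar):
    """Return ((LCL, UCL) for the Xbar chart, (LCL, UCL) for the R chart)."""
    xbar_limits = (grand_mean - A2 * r_bar, grand_mean + A2 * r_bar)
    r_limits = (D3 * r_bar, D4 * r_bar)
    return xbar_limits, r_limits

# Center lines as displayed on the chart (rounded values)
(x_lcl, x_ucl), (r_lcl, r_ucl) = xbar_r_limits(grand_mean=0.2497, r_bar=0.00367)
print(f"Xbar chart: LCL={x_lcl:.6f}  UCL={x_ucl:.6f}")
print(f"R chart:    LCL={r_lcl:.5f}  UCL={r_ucl:.5f}")
```

Small differences from the limits printed on the chart are expected, because the center lines used here are the rounded values shown on the plot.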

GAGE R&R SAMPLE SIZE


Gage R&R Sample Size

• General recommendation:

– 5 to 10 Parts (P)

– 2 to 3 Operators (O)

– 2 to 3 Repeats (R)

• More rigorous methods

– Specify minimum Degrees of Freedom for estimating Repeatability and Reproducibility standard deviations

– Use confidence intervals for standard deviation estimates (option provided in Minitab 16)

Degrees of Freedom Approach

• Estimating Reproducibility Std Dev: O-1

– Include as many operators as feasible

• Estimating Repeatability Std Dev: P*O*(R-1)

– With 30 df, 90% confidence bound on ratio of estimate to true value is (0.79, 1.21). Ref: on www.minitab.com search for “ID 2613” to access “Minitab Assistant White Papers.”
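
The degrees-of-freedom rule and the quoted interval can be checked with a short simulation. The sketch below relies on the fact that, for normal data, s²/σ² follows a chi-square distribution divided by its degrees of freedom, and it draws chi-square values as Gamma(df/2, scale 2) variates. The study layout (10 parts, 3 operators, 2 repeats) and the function names are assumptions chosen only to give 30 df.

```python
import math
import random

def repeatability_df(parts, operators, repeats):
    """Degrees of freedom for the repeatability estimate: P*O*(R-1)."""
    return parts * operators * (repeats - 1)

def sd_ratio_interval(df, coverage=0.90, n_sim=100_000, seed=1):
    """Simulated central interval for (estimated sd / true sd) with df degrees of freedom."""
    rng = random.Random(seed)
    # chi-square(df) is a Gamma distribution with shape df/2 and scale 2
    ratios = sorted(math.sqrt(rng.gammavariate(df / 2, 2.0) / df) for _ in range(n_sim))
    tail = (1 - coverage) / 2
    return ratios[int(tail * n_sim)], ratios[int((1 - tail) * n_sim) - 1]

df = repeatability_df(parts=10, operators=3, repeats=2)   # hypothetical study layout
low, high = sd_ratio_interval(df)
print(f"df = {df}, 90% interval for s/sigma is roughly ({low:.2f}, {high:.2f})")
```

Repeating the call with a smaller df (for example the O-1 degrees of freedom available for reproducibility with only 2 or 3 operators) shows how much wider the interval becomes, which is the argument for including as many operators as feasible.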


CVG Test Method Validation

PROBABILITY OF MISCLASSIFICATION



Two Misclassification Probabilities

[Figure: Misclassification regions relative to LSL and USL: Probability of Misclassifying Good Unit as Bad Unit; Probability of Misclassifying Bad Unit as Good Unit]

• Probability of Misclassifying Bad Unit as Good

• Probability of Misclassifying Good Unit as Bad


MINITAB Simulated Estimation of Misclassification: Following Gage R&R study

Part mean = 30, Part Std Dev = 10, Part Upper Spec = 40

No measurement system bias, Gage R&R Std Dev = 2.6

1) Calc/Random Data/Normal (simulate true part measurements)

2) Calc/Random Data/Normal (simulate gage variability)

MINITAB Simulated Estimation of Misclassification (cont)

3) Calc/Calculator: use the “+” to add 1) + 2) to simulate observed measurements

4) Calc/Calculator: assign a 1 for in spec for 1). Ex: (‘TrueMeasure’ ≤ 40)

5) Calc/Calculator: assign a 1 for in spec for 3). Ex: (‘ObsMeasure’ ≤ 40)

6) Stat/Table/Crosstabs to crosstabulate 4) and 5).



Estimated % of Truly Out of Spec called In Spec is 2.1%.

The simulation sample size was 10000. A larger sample size would be better.
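
The six simulation steps translate directly into a short script, which also makes it easy to use a larger simulation size. The sketch below is a Python analogue of the Minitab steps, not the Minitab procedure itself; the function name is made up, and the default of 100,000 simulated parts is an arbitrary choice.

```python
import random

def simulate_misclassification(part_mean, part_sd, gage_sd, usl, n=100_000, seed=1):
    """Estimate the two misclassification probabilities against an upper spec limit."""
    rng = random.Random(seed)
    bad_called_good = good_called_bad = 0
    for _ in range(n):
        true_value = rng.gauss(part_mean, part_sd)          # step 1: true part value
        observed = true_value + rng.gauss(0.0, gage_sd)     # steps 2-3: add gage error
        truly_good = true_value <= usl                      # step 4: true status
        called_good = observed <= usl                       # step 5: observed status
        if called_good and not truly_good:                  # step 6: cross-tabulate
            bad_called_good += 1
        elif truly_good and not called_good:
            good_called_bad += 1
    return bad_called_good / n, good_called_bad / n

# Part mean 30, part std dev 10, gage R&R std dev 2.6, upper spec 40
p_bad_good, p_good_bad = simulate_misclassification(30, 10, 2.6, usl=40)
print(f"truly out of spec, called in spec: {p_bad_good:.3f}")
print(f"truly in spec, called out of spec: {p_good_bad:.3f}")
```

Both results are proportions of all simulated parts (cells of the cross-tabulation), so they are directly comparable with the percentage read from the Crosstabs output.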




MINITAB Misclassification



Two problems:

1) Only three decimals for probabilities (i.e. 0.000)

2) Can’t enter historical: 1) process mean 2) part std. dev 3) gage std. dev

(Note: (2) can now be done with CSR Work Aid 13)


Misclassification Using Minitab and Work Aid 13

Load into the worksheet: the Part mean (30), the Part Sigma (10) and the Gage Sigma (2.6)

CSRworkaid13 POM.mtw


MINITAB Misclassification

Enlarging the label on the sample mean chart, we see the mean is 30.

Examining the output we see: USL 40, the Part Sigma (10) and the Gage Sigma (2.6).

Prob. of a truly bad part called good is .021

Probability of Misclassification (POM) Tool

• Originally written in R by Tarek Haddad to re-create functionality lost when Medstat was retired.

• Jim Dawson collaborated with Tarek to continue development and turn it into an Excel tool.

• A substantial Software Validation effort was undertaken by Nick Finstrom and Barry Christy, with the support of Pete Patel and the CVG Test Method Council. Validation work to be completed in early 2014.


POM Tool

• Replicates Medstat functionality

• More resolution in results than Minitab

• Graphics

• Guardbanding

• Normal, Lognormal and Weibull distributions of parts




POM with Guardband


Exercise

• Run POM analysis

– Using Minitab Simulation

– Using Work Aid 13 and Minitab GRR

– Using POM Tool

HELPFUL HINTS


Gage R&R Helpful Hints - Normality

• Normality testing is not needed for Gage R&R analysis

– Distribution of the raw data will depend strongly on the parts used in the study – there is no expectation or assumption that the raw data will follow any specific distribution

– Repeated measurements on the same part by the same operator will likely follow a normal distribution

• Like any ANOVA model, the residuals are assumed to follow a normal distribution – but the analysis is relatively “robust” to non-normality of the residuals

– Probability of Misclassification does depend on the part or process distribution (each part measured once)

Gage R&R Helpful Hints – One-Sided Specification

• In the case of a one-sided specification, the Percent Tolerance metric depends on the part average

• Minitab uses the overall average in the Gage R&R study as the estimate of the part average

• If the parts used in the study are not representative of the expected part distribution . . .

– The overall average will be a poor estimate of the process average

– The percent tolerance result will be misleading

– Best practice would be to calculate Percent Tolerance separately using a better estimate of the process average

– Being “not representative” can be a good practice – for example, including parts that don’t meet the specification


Corrective Actions for Failed Gage R&R

• Repeatability problem

– Could be due to part positional variation

• Standardize by measuring same position on each part

• Or make multiple measurements at random or systematic positions and use the average

– If gage itself is too variable, may need to improve or replace

• In the meantime, Repeatability variability can be filtered out by taking repeated, independent measurements and using the average. Note that this approach does not correct for Reproducibility issues.

Corrective Actions for Failed Gage R&R

• Reproducibility Problem

– Look for assignable causes that explain the operator-to-operator differences

– Understand any Operator*Part interactions – these may provide clues to differences in technique.

– Possibly improve the measurement procedure and/or re-train the operators

– Improve any visual aids or samples used in the measurement procedure



Approaches to Robust Gage R&R

Standard Gage R&R methods assume that other factors that affect

measurements have been studied and controlled in the development

of the test method.

If these sources of variability still affect the measurements, then . . .

The Expanded Gage R&R allows you to add additional factors.

Besides operator & part, you could add fixture number, gage

number or other factors. The Expanded GRR can also handle

missing data.

Reference: “Make Your Destructive, Dynamic, and Attribute Measurement System Work for You” by William Mawby.

This book includes the Analysis of Covariance method, which allows one to include varying environmental factors such as temperature and humidity (covariates) in a GRR.

The General Linear Model in Minitab (under the ANOVA branch)

can be used to model covariates (also handles missing data).



MSA FOR DESTRUCTIVE

MEASUREMENTS



Two Types of Destructive Measurements

1. Truly destructive: Measurement destroys unit being measured

Pull test

Peel test

Tensile test

2. Non-replicable: Measurement process can change the unit,

or you are measuring a transient phenomenon

Catapult distance

Motor speed

Heart rate

Dimension of silicon part (can compress)

Dimensions of heart tissue (can compress)

Ref: Make Your Destructive, Dynamic, and Attribute Measurement System Work for You, by W. D. Mawby

In neither case is it possible

to take repeated measures,

so gage R&R is not possible.



Approaches to Destructive MSA

Approach: Develop a non-destructive measurement
Pro: Ideal solution
Con: Often difficult or impossible

Approach: Attempt to use identical parts as “repeat” measurements and apply usual requirements for GRR %Tolerance
Pro: Easy to apply usual Minitab calculations
Con: Rarely works because parts aren’t actually identical

Approach: Use a coupon test so that parts are more identical
Pro: Results better than above
Con: Coupons may not be representative – easier to measure than real parts

Approach: Focus on improving the measurement process using DMAIC
Pro: Proven methodology
Con: Cannot conclude whether measurement system is adequate

Approach: Focus on Reproducibility
Pro: Not affected by part-to-part variability
Con: Might miss a Repeatability issue


What about using “Nested” Gage R&R?

• The “nested” Gage R&R analysis applies when one operator measures different parts than another operator.

– For example, John measures parts 1, 2, 3, 4, 5 repeatedly and Jane measures parts 6, 7, 8, 9, 10 repeatedly.

– Common application would be “Inter-laboratory Testing,” where operators at each location measure different parts repeatedly.

– Can work for Destructive MSA if each homogeneous sample may be sub-sampled. Then operators can measure different samples repeatedly.

• Analysis

– The nested analysis does not include a term for Part * Operator interaction.

– Note that Minitab Assistant doesn’t offer the Nested analysis

• Unless sub-sampling of homogeneous material is possible, Nested does not solve the key problem of Destructive MSA

– It’s impossible to repeat the measurement





Destructive Gage R&R Example

Tensile testing of tubing

8 pieces of tubing

Each tubing cut into 2 sub samples

Assume variation between sub

samples due to measurement error

Assume an upper specification of

850 g

MINITAB® worksheet: TestingSupplierCoils.mtw

Destructive Gage R&R using sub-samples






Large result for

% Tolerance

Measurement system does

not distinguish one part from

another within the range of

parts used in the study

Nearly all measurement system variation is due to repeatability rather than operator (reproducibility) . . . or maybe sub-sample differences?





Destructive Gage R&R using subsamples gave poor results

Since repeatability accounts for most of the apparent measurement variation, it is likely that the parts were not very similar

In this project, the team used the DMAIC Process Knowledge method to improve the system without obtaining a formal measurement

Focus on Reproducibility

• With destructive measurements, the Repeatability Standard Deviation always includes the part-to-part or subsample-to-subsample variation. In general, repeatability standard deviation cannot be accurately estimated.

• If one population of parts is randomly assigned to multiple operators, then the Reproducibility Standard Deviation is not affected by part-to-part variation.

• Reproducibility standard deviation can be estimated accurately even for destructive tests.




Reproducibility

• Stop

– Trying to force (Repeatability + Part) Standard

Deviation to be small enough to meet a requirement.

– Trying to obtain or create “identical” parts.

• Start

– Estimate Reproducibility standard deviation and ensure

that it is small enough. This standard deviation

depends only on the differences between operator

means.

– Compare operator standard deviations. Identify cases where operators show substantially different variation across equivalent sets of parts.


Example: CVG Test Method Validation for Destructive Tests

• Obtain a population of 40 parts

– Do not need to get identical or nearly identical

parts

• Randomly assign 10 parts to each of 4 operators

• Calculate %Tolerance for Reproducibility

– Compare to requirement of 25%

• Calculate Std Dev Ratio

– Compare to simulation-based critical values (for a typical study, the critical value is 3.10)




Example Calculations

• Data based on actual TMV studies

– But altered to disguise

– Detection Time A, Detection Time P


Detection Time A




Run One-Way ANOVA

• Reproducibility = sqrt((0.778-0.627)/10) = 0.123


Calculate Results

• % Tolerance (Reproducibility)

= 100 * ((6*0.123) / (2*(30-11.740)))

= 100 * (0.738 / 36.52) = 2.02%

• Std Dev Ratio = 0.986 / 0.546 = 1.81

• Result: Pass




Detection Time P


Calculations for Detection Time P

• Reproducibility = sqrt((11.225-0.976)/10) = 1.01

• % Tolerance (Reproducibility)

= 100 * ((6*1.01) / (2*(30-14.798)))

= 100 * (6.06 / 30.40)

= 19.9%

• Std Dev Ratio = 1.113 / 0.846 = 1.32

• Result: Pass
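The two worked examples follow the same arithmetic, so they can be sketched as a few small functions. This is a minimal sketch, not the validated TMV procedure. It assumes that 0.778 and 0.627 (and 11.225 and 0.976) are the between-operator and error mean squares from the one-way ANOVA, that 30 is the one-sided specification limit, and that the Std Dev Ratio is the largest operator standard deviation divided by the smallest. The function names are hypothetical.

```python
import math

def reproducibility_sd(ms_operator, ms_error, n_per_operator):
    """Operator variance component from one-way ANOVA mean squares,
    returned as a standard deviation (negative estimates are set to 0)."""
    variance = max(0.0, (ms_operator - ms_error) / n_per_operator)
    return math.sqrt(variance)

def pct_tolerance_one_sided(sd, spec_limit, process_mean):
    """100 * 6*sd / (2 * |spec - mean|), the one-sided form used on the slides."""
    return 100.0 * (6.0 * sd) / (2.0 * abs(spec_limit - process_mean))

def std_dev_ratio(operator_sds):
    """Largest operator standard deviation over the smallest (assumed definition)."""
    return max(operator_sds) / min(operator_sds)

# Detection Time A (values from the slides; 10 parts per operator)
sd_a = reproducibility_sd(0.778, 0.627, 10)
pct_a = pct_tolerance_one_sided(sd_a, 30, 11.740)
ratio_a = std_dev_ratio([0.986, 0.546])  # only the extreme SDs are shown on the slide

# Detection Time P
sd_p = reproducibility_sd(11.225, 0.976, 10)
pct_p = pct_tolerance_one_sided(sd_p, 30, 14.798)
ratio_p = std_dev_ratio([1.113, 0.846])

# Pass criteria from the slides: 25% requirement, critical ratio 3.10.
# Whether the limits are inclusive is an assumption here.
passes_a = pct_a <= 25.0 and ratio_a <= 3.10
passes_p = pct_p <= 25.0 and ratio_p <= 3.10

print(f"A: sd={sd_a:.3f}  %Tol={pct_a:.2f}  ratio={ratio_a:.2f}  pass={passes_a}")
print(f"P: sd={sd_p:.3f}  %Tol={pct_p:.2f}  ratio={ratio_p:.2f}  pass={passes_p}")
```

Because the sketch carries the unrounded standard deviation forward, the Detection Time P result comes out slightly above the slide's 19.9%, which was computed from the rounded value 1.01.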




Exercises

• Open Destructive Exercises.mtw

• For Bond Strength results:

– Assume specification is Minimum 5 lb

– Analysis

• Individual Value Plot

• % Tolerance for Reproducibility

• Std Dev Ratio

• Is this destructive measurement system adequate?

• Repeat for Buckle Force results

– Assume specification is Maximum 340 grams


MSA FOR ATTRIBUTE MEASUREMENTS




ATTRIBUTE GAGE R&R

• Attribute data are usually the result of human judgment

– Which category does this item belong in?

• When categorizing items, you need a high degree of

agreement on which way an item should be categorized

• The best way to assess human judgment is to have all

operators repeatedly categorize several known test units

(Attribute Gage R&R)

– Look for agreement

• each person categorizes the same unit consistently

• there is agreement between the operators on each unit

– Use disagreements as opportunities to determine and eliminate problems


SETTING UP AN ATTRIBUTE GAGE STUDY

• Most important aspect of attribute Gage Study is selecting parts (representative defects)

• Most challenging aspect is choosing parts for the study. Typically use . . .

– 50% acceptable parts

– 50% defective parts

• Have operators repeatedly classify parts in random order without knowledge of which part they are classifying (blind study)




Analysis of Attribute Gage R&R

• Stat > Quality Tools > Attribute Agreement Analysis

– Percent Agreement based on number of Parts

– Kappa Statistics (range -1 to 1)

• Minitab Assistant > Measurement Systems Analysis

– More graphical output

– Accuracy statistics based on number of Appraisals

– No Kappa statistics


Use Minitab Assistant > Measurement Systems Analysis (MSA)



Create Attribute Agreement worksheet




• Choose Number of Appraisers = 3

• Choose Number of Trials = 2

• Choose Number of Test Items = 10

• Items 1-5 are “Good”; Items 6-10 are “Bad”

• Click “OK”

• Copy column “Standards” and paste into “Results”

• Fix column name back to “Results”

• Find first trial of Item 1 and Item 2

– Change result from “Good” to “Bad” to inject two errors into the simulated study

• Save onto Desktop as “Attribute GRR”

Create Result Data

Attribute Agreement Analysis



Summary Report

Attribute Agreement Analysis for Results

Is the overall % accuracy acceptable? (gauge from < 50% “No” to 100% “Yes”): 96.7%

The appraisals of the test items correctly matched the standard 96.7% of the time.

% Accuracy by Appraiser (bar chart for Appraiser 1, Appraiser 2, Appraiser 3; values shown: 100.0, 100.0, 90.0)

Misclassification Rates

Overall error rate: 3.3%
Good rated Bad: 6.7%
Bad rated Good: 0.0%
Mixed ratings (same item rated both ways): 6.7%

Comments

Consider the following when assessing how the measurement system can be improved:

-- Low accuracy rates: Low rates for some appraisers may indicate a need for additional training for those appraisers. Low rates for all appraisers may indicate more systematic problems, such as poor operating definitions, poor training, or incorrect standards.

-- High misclassification rates: May indicate that either too many Good items are being rejected, or too many Bad items are being passed on to the consumer (or both).

-- High percentage of mixed ratings: May indicate items in the study were borderline cases between Good and Bad, thus very difficult to assess.

Slide annotations:

• Attribute “c=0” result . . . showing that no bad parts were misclassified as good

• Overall, 96.7% of presentations were classified correctly
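The rates in the Summary Report can be reproduced from the simulated worksheet. The sketch below is a hedged reconstruction, not Minitab's algorithm. It assumes the two injected errors belong to Appraiser 1 (trial 1, items 1 and 2), and it uses denominators chosen because they reproduce the slide's figures: appraisals for the accuracy and error rates, and appraiser-by-item combinations for mixed ratings.

```python
from collections import defaultdict

APPRAISERS = ["Appraiser 1", "Appraiser 2", "Appraiser 3"]
TRIALS = [1, 2]
# Items 1-5 are Good, items 6-10 are Bad (from the worksheet setup)
STANDARD = {item: ("Good" if item <= 5 else "Bad") for item in range(1, 11)}

# Start with every appraisal matching the standard
rows = [
    {"appraiser": a, "trial": t, "item": i, "standard": s, "result": s}
    for a in APPRAISERS for t in TRIALS for i, s in STANDARD.items()
]

# Inject the two errors (assumed here to belong to Appraiser 1, trial 1)
for row in rows:
    if row["appraiser"] == "Appraiser 1" and row["trial"] == 1 and row["item"] in (1, 2):
        row["result"] = "Bad"

def pct(count, total):
    return 100.0 * count / total

overall_accuracy = pct(sum(r["result"] == r["standard"] for r in rows), len(rows))
overall_error = 100.0 - overall_accuracy

good_rows = [r for r in rows if r["standard"] == "Good"]
bad_rows = [r for r in rows if r["standard"] == "Bad"]
good_rated_bad = pct(sum(r["result"] == "Bad" for r in good_rows), len(good_rows))
bad_rated_good = pct(sum(r["result"] == "Good" for r in bad_rows), len(bad_rows))

# Mixed ratings: an appraiser gave the same item more than one distinct rating
ratings = defaultdict(set)
for r in rows:
    ratings[(r["appraiser"], r["item"])].add(r["result"])
mixed_ratings = pct(sum(len(v) > 1 for v in ratings.values()), len(ratings))

accuracy_by_appraiser = {
    a: pct(sum(r["result"] == r["standard"] for r in rows if r["appraiser"] == a),
           sum(r["appraiser"] == a for r in rows))
    for a in APPRAISERS
}

print(f"Overall accuracy {overall_accuracy:.1f}%, error {overall_error:.1f}%")
print(f"Good rated Bad {good_rated_bad:.1f}%, Bad rated Good {bad_rated_good:.1f}%")
print(f"Mixed ratings {mixed_ratings:.1f}%")
print(accuracy_by_appraiser)
```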

Accuracy Report

Attribute Agreement Analysis for Results

[Figure: four panels of accuracy rates (axis 40 to 100): % by Appraiser (Appraiser 1, 2, 3); % by Standard (Good, Bad); % by Trial (1, 2); % by Appraiser and Standard]

All graphs show 95% confidence intervals for accuracy rates. Intervals that do not overlap are likely to be different.

Slide annotation: Illustrates the 95% / 90% result



Kappa


Kappa is a measure of raters’ agreement.

Minitab:

• Reports two Kappa statistics: Fleiss’ & Cohen’s

• Defaults to Fleiss’ Kappa

Minitab will only calculate Cohen’s Kappa if you choose the option for Cohen’s Kappa, and if one of these two conditions is true:

• A) Two appraisers perform a single trial on each sample

• B) One appraiser performs two trials on each sample

Kappa is meant for attribute data.

Kappa ranges from -1 to 1.
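For the two-rater case (condition A above), Cohen's Kappa compares the observed agreement with the agreement expected by chance from each rater's marginal rates: kappa = (Po - Pe) / (1 - Pe). The sketch below is a minimal illustration with made-up ratings, not data from the course worksheets. The function name `cohens_kappa` is hypothetical, and Fleiss' Kappa (Minitab's default) uses a different chance-agreement term that is not shown here.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who each rate the same items once."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    # Observed proportion of items on which the raters agree
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement from the product of each rater's category proportions
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    p_expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1.0 - p_expected)

# Illustrative ratings only: the raters disagree on one of ten items
rater_1 = ["Good"] * 5 + ["Bad"] * 5
rater_2 = ["Good"] * 4 + ["Bad"] * 6
kappa = cohens_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.2f}")
```

Here the observed agreement is 0.9 and the chance agreement is 0.5, so kappa is about 0.8 even though only one item was rated differently.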


Kappa (Landis and Koch)

According to AIAG (Auto industry), a general rule of thumb is:

A Kappa value greater than 0.75 indicates good to excellent agreement

Kappa values less than 0.40 indicate poor agreement.

This general rule of thumb may not apply for most Medtronic

applications. Any disagreement on rejectable units would be of

concern.



Kappa calculations


Kappa results




Summary and Recap


• Topics with Variables Data

– Gage R&R Sample Size

– Probability of Misclassification (Variables Data)

– Helpful Hints

• MSA for Destructive Tests

• MSA for Attribute Tests


BACKUP SLIDES





Destructive Gage R&R - 2 Nested Designs

• 2 Stage Nested Design Approach

• Samples are parts that can be subdivided into homogeneous sub-samples.

• Stage 1: 1 operator measures sub-samples (2-5) from parts (5-10).

• Stage 2: 3 operators each measure same location per part (5-10).

[Diagram: Stage 1 – 1 operator measures locations 1, 2, … 5 on each of parts 1, 2, … 10. Stage 2 – operators 1, 2, 3 each measure parts 1, 2, … 10, with 1 sub-sample per part.]


Destructive Gage R&R - 2 Stage Die Bond Example (cont.)

• Project: Destructive 2 stage nested.mpj (MINITAB®)

Pull testing of die bond. Parts are die. Sub-samples are 5 wire locations on the die. Spec = 7.5 grams minimum.

Stage 1: 1 operator pull tests all 5 wire locations on each of 10 die.

Stage 2: Each of 3 operators pull tests 10 die at wire location 1.




Destructive Gage R&R - 2 Stage Die Bond Example (cont.)

Stage 1: Stat > ANOVA > Fully Nested ANOVA (from worksheet: stage1)

Nested ANOVA: Pull Strength versus Die

Variance Components

Source | Var Comp. | % of Total | StDev
Die | 0.088 | 15.50 | 0.296
Error | 0.479 | 84.50 | 0.692
Total | 0.567 | | 0.753

The Die variance component (0.088) is the estimate $\hat{\sigma}^2_{part}$.


Destructive Gage R&R - 2 Stage Die Bond Example (cont.)

Stage 2: Stat > ANOVA > Fully Nested ANOVA (from worksheet: stage2)

Nested ANOVA: Pull Strength (Wire 1) versus Operator

Variance Components

Source | Var Comp. | % of Total | StDev
Operator | 0.053 | 11.08 | 0.231
Error | 0.428 | 88.92 | 0.654
Total | 0.481 | | 0.694

The Operator variance component (0.053) is the estimate $\hat{\sigma}^2_{operator}$; the Error variance component (0.428) is the estimate $\hat{\sigma}^2_{repeat/part}$.




Destructive Gage R&R - 2 Stage Die Bond Example (cont.)

Manual calculation of Gage Repeatability and Reproducibility

$\hat{\sigma}^2_{repeat} = \hat{\sigma}^2_{repeat/part} - \hat{\sigma}^2_{part} = 0.428 - 0.088 = 0.340$

$\hat{\sigma}^2_{R\&R} = 0.340 + 0.053 = 0.393$

Compare Gage R&R variance to part variance if parts are chosen to be representative of production process.

Since this is a one-sided spec (7.5 grams), use Misclassification to determine gage acceptance.
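The manual calculation combines variance components from the two stages, and it can be sketched in a few lines. The function name and the guard against a negative repeatability estimate are assumptions added for illustration; the three inputs are the variance components read from the two Minitab outputs.

```python
import math

def two_stage_grr(var_part, var_repeat_plus_part, var_operator):
    """Combine Stage 1 and Stage 2 variance components.

    var_part             -- Die component from Stage 1
    var_repeat_plus_part -- Error component from Stage 2 (repeatability confounded with part)
    var_operator         -- Operator component from Stage 2
    """
    # Remove the part-to-part contribution; a negative estimate is set to zero
    var_repeat = max(0.0, var_repeat_plus_part - var_part)
    var_grr = var_repeat + var_operator
    return var_repeat, var_grr

var_repeat, var_grr = two_stage_grr(var_part=0.088,
                                    var_repeat_plus_part=0.428,
                                    var_operator=0.053)
sd_grr = math.sqrt(var_grr)
print(f"repeatability variance = {var_repeat:.3f}")
print(f"Gage R&R variance      = {var_grr:.3f} (std dev {sd_grr:.3f})")
```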


Kappa – Call Center Example

Call Center workers were asked to categorize types of calls they received.

MINITAB® worksheet: Callcat.mtw




Kappa Attribute Analysis: Option Setting


Kappa : Within Appraiser Agreement




Kappa: Each Appraiser vs Standard


Kappa for Appraisers

What do we conclude from this analysis about the raters’ performance?

What would you do next?

Can this method be applied to the banana data?



Distribution Analysis: The Art of Finding Useful Models

Jeremy Strief, Ph.D.

MECC Principal Statistician

Objectives

• Explain why distributional analysis is statistically

complicated (and sometimes emotionally frustrating!)

• Emphasize the importance of engineering theory and

historical precedent.

• Encourage the use of multiple graphical methods in

addition to numerical tests.

• Review common causes of Non-Normality.

• Discuss Transformations and how they compare to

fitting non-Normal distributions.

Medtronic Confidential



Recap from Quality Trainer

• Normal Distribution Basics

• Capability Analysis (Normal)

• Capability Analysis (Non-Normal)

• Graphical tools

– Boxplots

– Histograms

– Individual Value Plots


Distribution Analysis: Motivation and Philosophy




Why Assess Distribution

• Statistical tools vary in sensitivity to and effect of distributional assumptions

• Some MDT procedures require distributional assessment for those statistical methods which are highly sensitive to distributional assumptions

Statistical Tool | Distributional Sensitivity | Effect of Poor Distributional Fit
Capability Analysis | High | Incorrect PPM/Ppk
Tolerance Intervals | High | Incorrect bounds
Variables Lot Acceptance Sampling | High | Altered rejection and acceptance rates
Individuals Chart for SPC | High | Incorrect control limits
GLM / Regression / ANOVA | Med | Approximate p-value
Xbar chart for SPC | Med/Low | Approximate p-value
Two-sample t-test | Low | Approximate p-value
Non-parametric methods | Low | Approximate p-value


Not All Data Are Normal: Example

[Figure: Probability Plot of Time (Normal): Mean 12.31, StDev 9.656, N 100, AD 5.738, P-Value < 0.005]

[Figure: Histogram of Time (Frequency vs. Time)]

Lead Time Data usually have a long tail – skewed distribution



Not All Data are Normal: Considerations

• Observed data need not follow any tractable mathematical model.

• Some mathematical models may be useful, if

imperfect, representations of the data.


Frustrations with Distributional Analysis

• Larger sample sizes (n>100) cause the statistical

tests to detect small departures from a theoretical

model. Such departures may not be practically

significant.

• Smaller sample sizes (n



The Underlying Statistical Hypotheses

• The statistical hypothesis testing is ‘backward,’ in that the null

hypothesis assumes that the particular distribution is a good fit.

– H0: Distribution specified has a good fit

– H1: Distribution specified has lack-of-fit

• Low p-values will disprove the fit of a distribution. So certain distributions can be ruled out as reasonable models.

• Using the standard goodness-of-fit metrics, it is technically not

possible to prove that a particular distribution is the “true model”

for the data.

• Instead of providing statistical “proof”, distribution analysis is

geared toward assessing which statistical distributions are

plausible models for the data at hand.


Philosophy of Distribution Analysis

“All models are approximations. Essentially, all

models are wrong, but some are useful. However,

the approximate nature of the model must always

be borne in mind.”

--G.E.P. Box




N=15 Probability Plots


N=500 Examples


Only 12 out of 500 values were affected by the truncation or

censoring.




How to Determine Distribution

Priority order:

1. Scientific/Engineering Knowledge

2. Historical distribution analysis

3. Distribution analysis

Why is distribution analysis last?

• Sample size (50 to 100)

• Regardless of n, key Xs and shift and drift can mask true distribution

Distribution applies to short term data only

Importance of Engineering Theory

• The choice of distribution should be both statistically

plausible and scientifically justified.

• Engineering theory and historical precedents often

suggest whether a distribution should be Normal,

Lognormal, or Weibull.

• If scientific theory does not lead to one single

statistical model, at least consider

– Whether the distribution should be skewed or symmetric

– Which distributions can be ruled out




Data Analysis Philosophy

• Information shouldn’t be destroyed. Examples of information destruction are

– Converting variables data to attribute data.

– Heavy rounding with a bad measurement system.

– Drifting measurement system.

• Check the quality and structure of the raw data.

– Are there physically impossible values, wild outliers, missing values, too many ties?

– Are the data paired or unpaired?

– Was randomization employed?

– How was the data generated?


Data Analysis Philosophy

• Plot the data AND do analytics.

– PLOT histograms, run charts, scatter plots, . . . See what is going on. Do a probability plot for process data.

– Use ANALYTICS to get quantitative about what you have seen. Examine the residual plots from analytical model fits.

• Analyses are performed on yesterday’s data today to predict tomorrow’s performance.

– Data from an unstable process that is analyzed (ignoring the instability) may result in a conclusion that will not hold up tomorrow.




Distribution Analysis: Review of Engineering Distributions

Most Common Statistical Models for

Engineering Applications

• Weibull

• Exponential (special case of Weibull)

• Lognormal

• Normal




Weibull

• A flexible model which can assume many different shapes, depending on the choice of parameters

• Scale parameter α or η

• Shape parameter β

• Arises from “weakest link” failures, or situations when the underlying process

focuses on the minimum or maximum value of independent, positive random

variables.

• Models stress-strength failures


Exponential

• Special case of Weibull when β=1

• Constant hazard rate, meaning that the probability of failure is not a

function of the age of the device/material.

• May occur when multiple failure modes are operating simultaneously

• May be useful in modeling software failures resulting from external

sources (e.g. cosmic radiation causes bit-flips at an extremely low,

constant rate)
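The link between the Weibull shape parameter and the hazard rate can be checked numerically. The sketch below uses the standard two-parameter Weibull hazard, h(t) = (β/η)(t/η)^(β−1), and shows that β = 1 (the Exponential case) gives a hazard that does not depend on age. The function names and the example parameter values are illustrative assumptions.

```python
import math

def weibull_hazard(t, scale, shape):
    """Hazard rate h(t) = (shape/scale) * (t/scale)**(shape - 1) for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def weibull_cdf(t, scale, shape):
    """Probability of failure by time t."""
    return 1.0 - math.exp(-((t / scale) ** shape))

scale = 100.0
for shape in (0.5, 1.0, 4.0):
    early = weibull_hazard(10.0, scale, shape)
    late = weibull_hazard(50.0, scale, shape)
    trend = "constant" if math.isclose(early, late) else ("decreasing" if late < early else "increasing")
    print(f"shape={shape}: h(10)={early:.5f}, h(50)={late:.5f} -> {trend} hazard")
```

A shape below 1 gives a decreasing hazard, a shape of 1 gives the constant Exponential hazard 1/η, and a shape above 1 gives an increasing hazard.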




Lognormal

• Models time-to-failure caused by several forces which combine

multiplicatively.

• Describes time to fracture from fatigue crack growth in metals.

• Right skewed distribution, useful when data values take multiple

orders of magnitude (e.g. 1.4, 14, 140).

• Two parameters (μ,σ), each of which is traditionally expressed on

the log scale.

• So if X~Lognormal(μ,σ), then ln(X)~Normal(μ,σ)
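The log-scale parameterization can be illustrated with a quick simulation. This is a sketch with arbitrary example values (μ = 2.0, σ = 0.5, a fixed seed), using only the Python standard library: the mean and standard deviation of ln(X) should land close to μ and σ, while the raw data are right skewed.

```python
import math
import random
import statistics

random.seed(0)          # fixed seed so the sketch is repeatable
mu, sigma = 2.0, 0.5    # example parameters on the log scale (assumed values)

x = [random.lognormvariate(mu, sigma) for _ in range(20000)]
log_x = [math.log(v) for v in x]

log_mean = statistics.mean(log_x)
log_sd = statistics.stdev(log_x)
raw_mean = statistics.mean(x)
raw_median = statistics.median(x)

print(f"ln(X): mean={log_mean:.3f}, sd={log_sd:.3f}")
# Right skew on the raw scale: the mean sits above the median
print(f"X:     mean={raw_mean:.3f}, median={raw_median:.3f}")
```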


Normal

• Models time-to-failure caused by additive, independent forces

• Commonly describes gage error, dimensional measurements from

a supplier, and other symmetric, bell-shaped phenomena




Additional Models to Consider

• Logistic

• Smallest Extreme Value (SEV)

• Largest Extreme Value (LEV)


Some Relationships

• SEV distribution = ln(Weibull distribution).

• LEV distribution = ln(1/Weibull distribution).

• Normal distribution = ln(Log-normal distribution).

• All Weibull distributions can be rescaled and

repowered to get another Weibull.

• The Weibull(100,4) is very close to a Normal

(mean=90.64, s.d= 25.43). This normal is thicker in

the tails than the Weibull (100,4). Ref: 02SR013 “Algorithm for Computing Weibull Sample Size for Complete Data”
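The Normal approximation quoted for the Weibull(100,4) follows from the standard Weibull moment formulas, mean = η·Γ(1 + 1/β) and variance = η²·[Γ(1 + 2/β) − Γ(1 + 1/β)²]. The sketch below recomputes them and compares tail probabilities at two arbitrary cut points (10 and 200, chosen only for illustration); the function name is hypothetical.

```python
import math
from statistics import NormalDist

def weibull_mean_sd(scale, shape):
    """Mean and standard deviation of a two-parameter Weibull."""
    g1 = math.gamma(1.0 + 1.0 / shape)
    g2 = math.gamma(1.0 + 2.0 / shape)
    return scale * g1, scale * math.sqrt(g2 - g1 ** 2)

scale, shape = 100.0, 4.0
mean, sd = weibull_mean_sd(scale, shape)
print(f"Weibull({scale:g},{shape:g}): mean={mean:.2f}, sd={sd:.2f}")

# Compare tail probabilities with the matching Normal
normal = NormalDist(mu=mean, sigma=sd)
weibull_lower = 1.0 - math.exp(-((10.0 / scale) ** shape))   # P(X < 10)
weibull_upper = math.exp(-((200.0 / scale) ** shape))        # P(X > 200)
normal_lower = normal.cdf(10.0)
normal_upper = 1.0 - normal.cdf(200.0)
print(f"P(X < 10):  Weibull {weibull_lower:.2e}, Normal {normal_lower:.2e}")
print(f"P(X > 200): Weibull {weibull_upper:.2e}, Normal {normal_upper:.2e}")
```

At both cut points the Normal assigns more probability to the tail than the Weibull does, which is the sense in which the Normal is thicker in the tails.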





Review: Common Engineering Distributions

[Figure: diagram relating typical applications to the Normal, Lognormal, and Weibull distributions. Labels: Dimensions; Measurement error; Lead Time; Default; Time to stress/strength related failure; Time to fatigue related failure; Infant mortality; Wearout]

Distribution Analysis: Statistical Overview



Statistical Approach to Distribution Analysis

• Both graphical and numerical approaches are needed

• P-value is not definitive, given the “backward”

nature of hypothesis testing

• Visual assessment of the probability plot is

crucial

• Reasonably large sample sizes (~50) are

needed. Consult your local procedures (e.g.

DOC000550 within CRDM) for specific rules.


Distribution Analysis: Graphical Methods

Good Distribution Analysis Should

Always Begin With Plots!

• Probability plots

• Histograms

• Time plots


Probability Plot

• A probability plot is a 2-dimensional plot with specialized (often

logarithmic) axes, to facilitate comparison between observed

data and a hypothesized distribution.

• More specifically, a probability plot is a comparison between the

observed and theoretical quantiles (i.e. percentiles) for a

hypothesized distribution.




Probability Plot Interpretation

• If the distribution is a good fit to the data, the plotted points

should fall approximately in a straight line.

• When interpreting the probability plot, examine both the p-value

and the visual fit.

– At the tails of the distribution, look whether the points are falling on

the conservative side of the fitted line.

– Look for major deviations in the pattern of points from a straight

line—kinks, ties, curves, jumps, etc. Do not worry if a few points

fall outside the confidence bounds.

– Fat Pencil Test: Can the observed data values be covered up by a

“fat pencil”?


Probability Plot in Minitab




Probability Plot Examples

Large N makes for

obvious curvature:


Right skew and

curvature:


“Subtle Patterns” can be caused by randomness


Both datasets were

sampled directly from a

Normal distribution.




• Distribution does not pass the Anderson-Darling test, but the lower tail of the distribution falls on the conservative side of the fitted line.

• Distribution appears to have a lower limit of zero

• It would be conservative to use the Normal model to estimate

the lower tail behavior.


Histograms in Minitab

The graph menu offers a histogram platform, but the graphical

summary platform offers more information with fewer clicks.




Histograms

• More intuitive than probability plots, since the x-y axes are not transformed.

• Not informative with small sample sizes (




Why is Stability needed to Assess Distribution?

Distribution Assessment Risks

• Shift and Drift, and Variation in Key Xs

masks distribution

• Initial capability data always contains

Shift and Drift

• At Final Capability, process is stable

and variation in Key Xs is removed

Distribution Analysis Shift and Drift.mtw

100 samples from Week 1



Distribution applies to short term data only

MINITAB®


Initial Process Data often have Shift and Drift

[Figure: I Chart of Initial Capability Data (Individual Value vs. Observation): X̄ = 19.93, UCL = 26.30, LCL = 13.55; many points are flagged “1”]




Long Term Data May not be Normal

[Figure: Probability Plot of Initial Capability Data (Percent vs. Initial Capability Data)]


Distribution Analysis: Numerical Methods

Numerical Methods

• For all numerical methods:

– A large (≥0.05) p-value implies there is no evidence against the hypothesized distribution.

– A small (




Most Common Normality Tests

• Anderson-Darling (AD) test

• Ryan-Joiner test

Note: The Ryan-Joiner test is essentially

equivalent to the Shapiro-Wilk test.

Anderson-Darling

• Default approach in Minitab.

• May be used to assess fit of Normal and non-

Normal distributions.

• Gives unreliable results when data are

discretized/grouped, which is fairly common

when measurement system resolution is poor.




Anderson-Darling in Minitab

For assessing Normality:


Anderson-Darling in Minitab

For any/all distributions:




Anderson-Darling Results

Normal(10,1.5)


Normal(10,1.5)--Rounded

Ryan-Joiner

• Useful for discretized, rounded, or clumpy data

• Will not declare significant lack-of-fit simply due to poor

measurement resolution

• Recommended minimum of 5 “groups” to have a meaningful p-

value. Fewer groups may yield an overly optimistic (high) p-

value.


Anderson-Darling Ryan-Joiner



Ryan-Joiner in Minitab


Truncation

• The Normal distribution may be used to model tail

behavior if it provides a conservative estimate of

those tails.

• This situation arises when data are truncated, which

is quantitatively captured as negative kurtosis.




Truncation

• In principle, truncated data may be evaluated graphically or through a Skewness-Kurtosis (SK) test.

• The SK test checks whether the tails of the Normal

distribution are longer or shorter than the tails of your

data.

• MECC has created and validated an Excel

spreadsheet (R134997) which executes the SK test.

• In practice, consult your local procedures to ensure

your analysis of truncated data is compliant.



Avoiding Parametric Distributions Altogether

• Chebyshev’s inequality captures the tail behavior of any statistical distribution with a finite variance.

– For any random variable X and constant k > 1,

$P(|X - \mu| \ge k\sigma) \le 1/k^2$

• This inequality may be useful for skipping the issue of distributional fit altogether, especially if distributional fit is being assessed in order to compute a tolerance interval.

• Chebyshev’s will only be helpful if the process capability is extremely high.

• Consult your own procedures for details, but CRDM procedures invoke the following version of Chebyshev:

– If the nearest specification is at least 10 standard deviations away from the mean, it may be inferred by Chebyshev that at least 99% of the distribution will fall within specification.
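The 10-standard-deviation rule is the k = 10 case of the inequality: the probability outside k standard deviations is at most 1/k² = 0.01. A minimal sketch, with hypothetical function names and a Normal tail shown only for comparison:

```python
from statistics import NormalDist

def chebyshev_tail_bound(k):
    """Upper bound on P(|X - mu| >= k*sigma) for any finite-variance distribution."""
    if k <= 0:
        raise ValueError("k must be positive")
    return min(1.0, 1.0 / k ** 2)

def min_fraction_within(k):
    """Guaranteed minimum fraction of the distribution within k standard deviations."""
    return 1.0 - chebyshev_tail_bound(k)

for k in (2, 3, 6, 10):
    normal_tail = 2.0 * NormalDist().cdf(-k)   # two-sided Normal tail, for comparison only
    print(f"k={k:>2}: Chebyshev >= {100 * min_fraction_within(k):.2f}% within, "
          f"Normal tail = {normal_tail:.2e}")
```

The bound is distribution-free, so it is far more conservative than a Normal calculation; this is why it only helps when the nearest specification is many standard deviations from the mean.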




Why Normality Tests Fail

1. A shift occurred in the middle of the data

2. Multiple sources or multiple failure modes with different distributions

3. Outliers

4. Piled up data.

5. Truncated data (sorted before you get it)

6. The underlying distribution is not normal (skewed)

7. Poor measurement resolution

8. Too much data (over powered to detect non-normality)

9. Due to random chance – you expect the test to fail 5% of the time (i.e. 95% confidence) if the data were truly from a normal distribution.

Resolving Non-Normality

1. Data shift: Sublot; Skewness/kurtosis test; Attribute sampling

2. Multiple data sources: Sublot; Attribute sampling

3. Outliers: Attribute sampling; Outlier removal (May remove outliers only if they constitute typos or data collection errors.)

4/5. Censored/Truncated data (tails lost): Conservative fitting; Attribute sampling

6. Distribution not normal: Non-normal analysis; Transformation; Attribute sampling

7. Poor measurement resolution: Ryan-Joiner

8. Too much data: Graphical evidence; Random subsampling

9. Random chance: Historical assessment



When Multiple Distributions Fit

Prior engineering knowledge is particularly useful when multiple distributions yield p-values above 0.05:

– Picking the distribution solely based on best p-value or best R² is rational when there is absolutely no history or scientific theory.

– A better approach is to assemble a list of plausible (p>0.05) distributions and then make a final choice based upon history and science.

– P-values will sometimes be below 0.05 simply as a result of chance (Type I error). It is not recommended to immediately change years of analysis based on one significant p-value. Investigate and monitor before changing distributions.


Avoid the daily special

– Do NOT take the “distribution du jour” approach, in

which multiple distributions are chosen for a single

process. This reflects either:

• An out-of-control process, which can’t be

captured by a single distribution anyway.

• The bad statistical practice of just defaulting to

the distribution with the highest p-value.





Example: Capability for Non-Normal Data

using Tribal Knowledge for Distribution

Problem Statement: Time (in days) to process

(reject/accept) loan applications is too long causing loss in

customer applications

Project Goal: Decrease potential customer loss from

15% to 5%. Customer expectation is 20 days.

Project Strategy: Path Y = Time

Task: Determine capability for Y = Time

LoanApplicationTime.MTW

Assume lead time has a LogNormal Distribution

MINITAB®


Verify Lognormal Distribution

Check if LogNormal provides a good fit

[Figure: Probability Plot of Time, Lognormal - 95% CI: Loc 2.269, Scale 0.6845, N 100, AD 0.432, P-Value 0.299]




Capability for Non-Normal Data using LogNormal

[Figure: Process Capability of Time, Calculations Based on Lognormal Distribution Model (histogram of Time from 0 to 50 with USL marked)]

Process Data
LSL: *
Target: *
USL: 20
Sample Mean: 12.31
Sample N: 100
Location: 2.26918
Scale: 0.684493

Overall Capability
Z.Bench: 1.06
Z.LSL: *
Z.USL: 0.47
Ppk: 0.16

Observed Performance
PPM < LSL: *
PPM > USL: 160000
PPM Total: 160000

Exp. Overall Performance
PPM < LSL: *
PPM > USL: 144242
PPM Total: 144242
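The expected PPM above the USL can be cross-checked from the fitted lognormal parameters, since ln(Time) is modeled as Normal(Location, Scale). The sketch below is an independent check using the Python standard library, not a reproduction of Minitab's internal method; it recovers the expected PPM and Z.Bench only, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

location = 2.26918   # mean of ln(Time), from the Minitab output
scale = 0.684493     # standard deviation of ln(Time), from the Minitab output
usl = 20.0           # customer expectation of 20 days

# Work on the log scale, where the model is Normal
z_log = (math.log(usl) - location) / scale
p_above_usl = 1.0 - NormalDist().cdf(z_log)
ppm_above_usl = 1e6 * p_above_usl

# Benchmark Z: the standard Normal quantile leaving the same total tail area
z_bench = NormalDist().inv_cdf(1.0 - p_above_usl)

print(f"Expected PPM > USL: {ppm_above_usl:.0f}")
print(f"Z.Bench: {z_bench:.2f}")
```

Roughly 14% of applications are expected to exceed 20 days under this model, consistent with the 15% customer loss in the problem statement.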

Distribution Analysis: Transformations

Two Options

• When a dataset is non-Normal, it is acceptable either to

– Mathematically transform the data to achieve Normality

– Fit a non-Normal distribution

• Transformation carries the practical advantage that many statistical methods are based upon Normality, so there will be more analytical tools available for the transformed dataset.

• Transformation carries the disadvantages of creating unnatural units (e.g. log-meters instead of meters) and altering potentially relevant structures of the data.

• Note: Please do NOT try transformations of data from an unstable process, or bimodal data (two bumps).


Transformation Advice

• If a transformation is chosen, it should be as

simple as possible, and it should ideally have a

physical interpretation.

• A log transformation is particularly desirable,

since it

– Is monotonic

– Is straightforward to interpret (it turns multiplicative

effects into additive effects)
