8/17/2019 Advanced Statistics Manual PDF
ADVANCED
STATISTICAL
METHODS
FOR ENGINEERS
Chapter Zero
Welcome to Advanced Statistical Methods for Engineers!
Ground rules – please…
• Use name tents
• Cell phones:
– Turn off or use vibrate
– Take phone calls outside
• Keep side conversations to a minimum
• Be prompt in returning from breaks
• Don’t do other work during class
• Let instructor know if you need to leave for more than 30 minutes
• Listen with an open and active mind…
• If you have a question at any time, ask!
– Other Ground Rules wanted by students?
– Class agree to these Ground Rules?
Agenda
Day 1 to Day 4, 8:00 to 5:00 each day
• 8:00 (Day 1) Ch 0: Welcome
• 9:00 (Day 1) Ch 1: ANOVA and Equivalence Testing
• Ch 2: Measurement Systems Analysis
• Ch 3: Distribution Analysis
• Ch 4: Process Capability and Tolerance Intervals
• Ch 5: Regression and GLM (continued on the following day)
• Ch 6: Logistic Regression
• Ch 7: Statistical Resources
• Online Evaluations
• Breaks as needed
• Lunch on your own each day
• End of Day Review each day
Logistics
• Starting Time: 8:00
• Ending Time: Not later than 5:00
• Lunch 12:00-1:00
• Breaks every 90-120 minutes
• Power Outlets
• Rest Room Location
• Food and drink locations (snacks, cafeteria, etc)
You Need ...
– Laptop with MINITAB and a working wireless Internet connection
– Writing instruments
– Access to data files
Icebreaker (5 Minutes)
My favorite statistician, living or dead, is . . .
My favorite statistics joke is …
In my journey through the world of statistics…
(Extra Credit)
One thing that has worked well for me is …
One thing that has been a challenge for me is …
Expectations
– Tools, tools, tools…
• Course may overlap with material from DRM or Lean Sigma
• Tools may be familiar, but the intent is to present the tools with a focus on statistical thinking and decision-making.
• Topics may be explored in greater mathematical depth than is offered in other curricula.
– Benefits
• A deep mathematical dive can actually help you better see the surface.
• Awareness of mathematical assumptions is a critical first step for growing in your statistical knowledge, but advanced practitioners need to know:
– Which assumptions are most critical?
– When is it appropriate to break the rules?
– What are the consequences of breaking the rules?
• Statistical sophistication allows for flexibility and creativity in problem solving.
Expectations – Experience Chart
• Mark an X in the column that best describes your experience with each topic
– Columns: None | A Little | Comfortable | Proficient | I could teach it
– Topics (rows): Equivalence Testing; Tolerance Intervals; ANOVA Signal Interpretation; Measurement Systems Analysis; Distribution Analysis; Process Capability; General Linear Models
– Your Expectations
• Create a list at your table
• Each table will report
• Spokesperson: skip items already mentioned
– Time: 10 Minutes
Your Feedback is Critical
• September 17-20 represents the first wave of Advanced SME at MDT
• Given that many of you already are leaders in the statistical or DRM worlds, your suggestions for course improvements are extremely important!
• At the end of each day, we will engage in a brief feedback session.
• At the end of the week, there will be an online survey for you to formally evaluate the course.
• If you wish to provide more detailed feedback, please send an email to the instructor team: Leroy Mattson, Karen Hulting, Jeremy Strief, Tom Keenan, Grant Short, Dayna Cruz
| MDT Confidential9
What questions do you have?
Chapter 1: ANOVA and Equivalence Testing
Topics
• Quality Trainer Review
• ANOVA
– Assumptions
– Using Minitab Assistant vs Stat Menu
– Calculation Deep Dive
– Sample Size
– ANOVA Signals
• Equivalence Testing
Quality Trainer Review
Comparing Grouped Data: Variables Data Response
ANOVA: ASSUMPTIONS
One-way ANOVA: Testing for the significance of one factor
• The null hypothesis:
– H0: μ1 = μ2 = … = μk
– Meaning that the population (response) means are equal at each of the k levels of this factor, or the factor is NOT significant.
• The alternative hypothesis:
– HA: at least two population means are unequal
– Meaning that the factor IS significant
• Perform the One-way ANOVA and reject the null hypothesis if the p-value is < alpha
– Usually alpha = 0.05 (or 0.10 or 0.01)
– A way to remember: “If p is low – the null must go”.
ANOVA: General Process Steps
• Select a model
• Plan sample size using relevant data or guesses
• (Optional) Simulate the data and try the analysis
• Collect real data
• Fit the model (perform ANOVA and get p value)
• Examine the residuals
• Transform the response or update the model, if
necessary
• State conclusion
Typical Assumptions for ANOVA Factors
• Factors (or “Inputs”)
– Each factor can be set to two or more distinct
levels
– Factor levels can be measured adequately
– Factor levels are “fixed” rather than “random”
– For multiple factors, all combinations of all levels
are represented (levels are “completely crossed”)
Typical Assumptions for ANOVA Responses
• Response data is “complete”, not censored
• Some software requires “balanced” data – same
sample size for each level of the input factor
• Assumptions on Residuals
– Residual = Response – Fitted Value
– Normally distributed
– Equal variance (assumption relaxed in Minitab
Assistant)
– Independent (e.g. no time trend)
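The residual definition above (Residual = Response – Fitted Value) can be sketched in a few lines. In a one-way layout the fitted value for each observation is its group mean. The lot names and data values below are hypothetical and are not taken from the course data files.

```python
from statistics import mean, variance

def residuals_by_group(groups):
    """Return residuals (response - fitted value) for a one-way layout.

    In a one-way ANOVA the fitted value of an observation is the mean
    of the group it belongs to.
    """
    return {name: [y - mean(ys) for y in ys] for name, ys in groups.items()}

# Hypothetical data for three lots (illustration only).
groups = {
    "Lot1": [10.1, 9.8, 10.4, 9.9],
    "Lot2": [10.6, 10.2, 10.9, 10.3],
    "Lot3": [9.7, 10.0, 9.5, 9.8],
}

residuals = residuals_by_group(groups)

# Sample variance of each group: a quick look at the equal-variance assumption.
group_variances = {name: variance(ys) for name, ys in groups.items()}

for name in groups:
    print(name, [round(r, 3) for r in residuals[name]], round(group_variances[name], 4))
```

The residuals within each group sum to zero by construction, so the checks of interest are their shape (normality), spread by group (equal variance), and any pattern in run order (independence).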
ANOVA CALCULATIONS DEEP DIVE: STAT MENU & MINITAB ASSISTANT
ANOVA Calculations
• See www.khanacademy.org
– ANOVA 1 – Calculating SST (7:39)
– ANOVA 2 – Calculating SSW and SSB (13:20)
– ANOVA 3 – Hypothesis Test and F Statistic (10:14)
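The three calculations named in the video titles above (SST, then SSW and SSB, then the F statistic) can be sketched as follows. This is a minimal illustration on a small made-up dataset of three groups; the data values are an assumption and are not taken from the course files.

```python
def one_way_anova(groups):
    """Compute the one-way ANOVA sums of squares and F statistic.

    groups: list of lists, one list of responses per factor level.
    """
    all_values = [y for g in groups for y in g]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total

    # Total sum of squares: every value against the grand mean.
    sst = sum((y - grand_mean) ** 2 for y in all_values)
    # Within-group sum of squares: every value against its own group mean.
    ssw = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
    # Between-group sum of squares: group means against the grand mean.
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ssb / df_between) / (ssw / df_within)
    return {"SST": sst, "SSW": ssw, "SSB": ssb,
            "df_between": df_between, "df_within": df_within, "F": f_stat}

# Illustrative data: three groups of three observations.
data = [[3, 2, 1], [5, 3, 4], [5, 6, 7]]
result = one_way_anova(data)
print(result)
```

Note that SST = SSW + SSB. The F statistic is then compared with the F distribution on (df_between, df_within) degrees of freedom to obtain the p-value, which is the step Minitab performs.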
Minitab Analysis of Khan Dataset
Data can be arranged either Stacked or Unstacked
Consider a PQ Dataset
• Three runs of n=10 units produced and tensile
tested
• See Ch1DataFile.mtw
• Columns TipTensile1, TipTensile2, TipTensile3
Minitab Options
• Could use
– Stat -> ANOVA -> One way
– Stat -> ANOVA -> One way (Unstacked)
– Stat -> ANOVA -> General Linear Model
– Stat -> Regression -> General Regression
– Minitab Assistant
• Data arrangement
– Stacked (one column for X, one column for Y)
– Unstacked (Y values in columns for each X)
ANOVA using Minitab Statistics Menu
Stat Menu Outputs
S, R² and adjusted R² are measures of how well the model fits the data.
Judging model fit
• S is measured in the units of the response variable and represents the standard distance data values fall from the fitted values
– For a given study, the better the model predicts the response, the lower S is
• R² (R-Sq) describes the amount of variation in the observed response values that is explained by the predictor(s)
– R² always increases with additional predictors.
– R² is most useful when comparing models of the same size
• Adjusted R² is a modified R² that has been adjusted for the number of terms in the model
– R² can be artificially high with unnecessary terms, while adjusted R² may get smaller when terms are added to the model
– Use adjusted R² to compare models with different numbers of predictors
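The three fit measures can be computed directly from the ANOVA sums of squares. The sketch below uses the standard textbook definitions, which are assumed here to match what Minitab reports; the input numbers are illustrative only.

```python
import math

def fit_statistics(ss_error, ss_total, n_obs, df_error):
    """Return S, R-squared and adjusted R-squared.

    ss_error: error (within) sum of squares
    ss_total: total sum of squares
    n_obs:    total number of observations
    df_error: error degrees of freedom (n_obs minus number of fitted parameters)
    """
    s = math.sqrt(ss_error / df_error)          # in the units of the response
    r_sq = 1.0 - ss_error / ss_total            # share of variation explained
    # Adjusted R-squared compares mean squares, so extra terms cost a df.
    r_sq_adj = 1.0 - (ss_error / df_error) / (ss_total / (n_obs - 1))
    return s, r_sq, r_sq_adj

# Illustrative values: 9 observations in 3 groups, SSE = 6, SST = 30.
s, r_sq, r_sq_adj = fit_statistics(ss_error=6.0, ss_total=30.0, n_obs=9, df_error=6)
print(f"S = {s:.3f}, R-Sq = {r_sq:.1%}, R-Sq(adj) = {r_sq_adj:.1%}")
```

Because adjusted R² divides each sum of squares by its degrees of freedom, adding a term that explains little lowers df_error faster than it lowers the error sum of squares, and adjusted R² can go down.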
Comparisons Output
ANOVA – Examining Residuals
1) Test for Normality: Normal Probability Plot is a straight line
2) Test for Equal Variances: Residual vs. Fitted Values is evenly distributed around the 0 line
Using the Stacked arrangement, there would also be a 4th Residual plot: Time Order. This is a Test for Independence, looking for a pattern over time.
Residuals are strongly non-normal . . .
Possible Causes:
• Failure of Equal Variance Assumption
• Outliers
• Missing Important Factors in the Model
• Data is from Non-Normal Population
What to do?
• Check for Outliers
• Check if Equal Variance is satisfied
• Perform Normality Test
• If data is from Non-Normal Population, consider using Non-Parametric Tests or Transform the Response variable
If Residuals differ Group to Group
Possible Causes:
• Non-Constant Variance
• Outliers
• Missing Important Factors in the Model
What to do?
• Test for equal variance assumption using Stat > ANOVA > Test for Equal Variances
• If test indicates unequal variances then consider transforming the response variable
• Verify if the outlier is a data entry error
• Add the factor into the model
If there is a time pattern in the data . . .
What to do?
• Prevent by Randomizing
• A time effect may be present
• Consider time series procedure
Common Transformations
Transformation | Comments
√y | Appropriate for Poisson Distributed Data
Log(y) | If the Response is exponentially increasing then this transformation is appropriate
1/y | Appropriate when responses are close to zero
sin⁻¹(√y) | Called the Arcsine Square Root function. Appropriate when Response is a proportion between zero and one.
Another useful tool is Box-Cox Transformation
Minitab Box-Cox Procedure:
$$Y' = \begin{cases} Y^{\lambda}, & \text{when } \lambda \neq 0 \\ \log_e(Y), & \text{when } \lambda = 0 \end{cases}$$
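The Box-Cox rule as given here (raise Y to the power λ when λ is not zero, take the natural log when λ is zero) can be sketched as below. The function name and the sample values are hypothetical; choosing λ itself (the Box-Cox plot) is left to Minitab.

```python
import math

def box_cox(values, lam):
    """Apply the simple Box-Cox power transformation to positive data.

    lam != 0 -> y ** lam
    lam == 0 -> natural log of y
    """
    if any(y <= 0 for y in values):
        raise ValueError("Box-Cox requires strictly positive responses")
    if lam == 0:
        return [math.log(y) for y in values]
    return [y ** lam for y in values]

# Hypothetical right-skewed responses.
y = [1.0, 4.0, 9.0]
print(box_cox(y, 0.5))   # square-root transformation
print(box_cox(y, 0))     # log transformation
print(box_cox(y, -1))    # reciprocal transformation
```

The common transformations in the table are special cases: λ = 0.5 is the square root, λ = 0 is the log, and λ = -1 is the reciprocal.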
Box-Cox Transformation in Minitab
[Figure: Box-Cox Plot of Data 1, StDev vs. Lambda, with Lower CL, Upper CL and Limit marked. Lambda (using 95.0% confidence): Estimate 0.03, Lower CL -0.30, Upper CL 0.38, Rounded Value 0.00]
Minitab > Stat > Control Charts > Box-Cox Transformation
ANOVA using Minitab Assistant
http://www.minitab.com/support/documentation/Answers/Assistant%20White%20Papers/OneWayANOVA_MtbAsstMenuWhitePaper.pdf
Report Card
Diagnostic Report
Power Report
Summary Report
ANOVA - Exercise
• Use Ch1DataFile.mtw
• Test for differences between the group means
using both Stat menu ANOVA and Minitab
Assistant ANOVA . . . for these 3-lot PQ studies:
– For TubeTensile1, TubeTensile2, TubeTensile3
– For Diameter1, Diameter2, Diameter3
• What are your conclusions?
ANOVA – Alternate Exercise
Analyze this data two ways: 1) Assistant and 2) Stat>ANOVA
Note: Stat>ANOVA assumes equal variances (and so may need transformations), but Minitab Assistant ANOVA does not assume equal variances.
An article in the IEEE Transactions on Components, Hybrids, and Manufacturing Technology (Vol. 15, No. 2, 1992, pp. 146-153) described an experiment in which the contact resistance of a brake-only relay was studied for three different materials (all were silver-based alloys).
Alloy-Contact Resistance.MPJ
Test at the alpha = 0.01 level
Does the type of alloy affect mean contact resistance?
Applied Statistics and Probability for Engineers, 4th Edition, Douglas C. Montgomery and George C. Runger
General Regression can be used for ANOVA
Use for multiple regression – more than one X
General regression can handle: 1) all continuous input(s), 2) all categorical input(s), 3) a mixture of continuous and categorical inputs, and 4) a non-normal response (it allows for the Box-Cox transformation of the response).
The response must be continuous or considered as continuous.
General Regression: Example of ANOVA
Note: A blocked One-way ANOVA is a two-way ANOVA where one factor’s effect is to be “blocked out”. The randomization is done within each block.
Background: The forces exerted by three different stylets in a lead are compared at 4 different position/advancement conditions (blocks). The data are given below:
Force in Grams
Condition | Stylet 1 | Stylet 2 | Stylet 3
1 | 18.1 | 14.5 | 14.0
2 | 20.0 | 16.1 | 16.3
3 | 30.2 | 27.5 | 26.8
4 | 42.5 | 39.4 | 38.7
Mean (x̄) | 27.70 | 24.38 | 23.95
Perform an ANOVA analysis using Stat > Regression > General Regression and determine if:
(1) there are significant differences between different stylets, and if
(2) the blocking factor employed was effective.
Data file: Stylet.MTW (Condition is the Block)
Blocked One-way ANOVA
(1) Are there significant differences between different stylets?
(2) Is the blocking factor employed effective?
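The sums of squares behind the blocked analysis can be sketched by hand on the stylet force data. This is an illustrative calculation for a design with one observation per block-treatment cell, not a reproduction of the Minitab General Regression output; the function and key names are hypothetical.

```python
def blocked_anova(table):
    """Sums of squares and F ratios for a randomized block design.

    table[i][j] is the response for block i (row) and treatment j (column),
    with exactly one observation per cell.
    """
    n_blocks, n_treat = len(table), len(table[0])
    n = n_blocks * n_treat
    grand = sum(sum(row) for row in table) / n
    treat_means = [sum(row[j] for row in table) / n_blocks for j in range(n_treat)]
    block_means = [sum(row) / n_treat for row in table]

    ss_total = sum((y - grand) ** 2 for row in table for y in row)
    ss_treat = n_blocks * sum((m - grand) ** 2 for m in treat_means)
    ss_block = n_treat * sum((m - grand) ** 2 for m in block_means)
    ss_error = ss_total - ss_treat - ss_block

    df_treat, df_block = n_treat - 1, n_blocks - 1
    df_error = df_treat * df_block
    ms_error = ss_error / df_error
    return {
        "ss_treat": ss_treat, "ss_block": ss_block, "ss_error": ss_error,
        "df_treat": df_treat, "df_block": df_block, "df_error": df_error,
        "f_treat": (ss_treat / df_treat) / ms_error,
        "f_block": (ss_block / df_block) / ms_error,
    }

# Force in grams: rows are Conditions 1-4 (blocks), columns are Stylets 1-3.
force = [
    [18.1, 14.5, 14.0],
    [20.0, 16.1, 16.3],
    [30.2, 27.5, 26.8],
    [42.5, 39.4, 38.7],
]
out = blocked_anova(force)
for key, value in out.items():
    print(f"{key}: {value:.3f}")
```

The stylet F ratio is referred to an F distribution with (2, 6) degrees of freedom and the block F ratio to (3, 6). A large block F ratio indicates that removing the condition-to-condition variation from the error term was worthwhile.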
SAMPLE SIZE FOR ANOVA
Planning Sample Size in ANOVA
• Fill in the number of levels for the factor
• Always fill in Standard Deviation (use conservative estimate)
• Then fill in two of the three long boxes
• Can specify several values, separated by spaces
Sample Size for One-Way ANOVA Example
Sample Size for One-Way ANOVA
RESPONDING TO ANOVA SIGNALS
Statistical vs. Practical Significance
• Key idea in any hypothesis testing effort
– If the test detects a difference (a “signal”), then what?
– Don’t assume the signal is automatically bad news (if
you’re hoping for consistency) or good news (if you’re
hoping for a change)
• For example, “ANOVA Failure” in PQ
– Examine the size of the signal in the appropriate
context . . . determine the “practical” significance of the
difference
– The appropriate response depends on an assessment
of both statistical and practical significance
ANOVA Signal in PQ
• There was a realization that a significant p-value
in the comparison of lot means should not
necessarily mean the PQ fails
• Analysis sometimes included to assess the
“power” of the ANOVA and the practical
significance of the difference in the means.
• Eventually, Corporate Policy on Manufacturing Process Validation added the “ANOVA Failure Flow Chart”
[Figure: 2008 Version of Corporate Guideline for Manufacturing Process Validation]
[Figure: 2012 Version of CRDM ANOVA Signal Flow Chart]
Pros and Cons
• Pro
– Provides a consistent way to address the question of practical significance
– Relatively Simple
– Effective – expect the approach to stand up to regulatory scrutiny
• Con
– Can be very prescriptive
– Standards for Ppk are quite high: 95% confidence bound on Ppk > 1.33
– Disincentive for larger sample size
Current approaches
• Corporate Guideline phased out
• CV procedure still has essentially the same
ANOVA Signal Flowchart
• CRDM originally had a more prescriptive version
• CRDM currently has a simplified version
• Would also work to include a discussion of the sample size of the ANOVA and the practical significance of the difference
• Discussion – other businesses?
Example of ANOVA Signal Flow Chart
• Recall the ANOVA exercise on Ch1DataFile.mtw
for TubeTensile1, TubeTensile2, TubeTensile3
ANOVA Signal Flow Chart Ppk Analysis
First stack the 3 lots using Data -> Stack -> Columns
Then run Stat -> Quality Tools -> Capability Analysis -> Normal
Add confidence interval for Ppk using Options button
Next steps
• Total sample size is 90, so use confidence bound
• Lower 95% confidence bound on Ppk is 0.92
• Must make 3 more runs
– TubeTensile4, TubeTensile5, TubeTensile6
– These must pass tolerance interval analysis (like
the first three runs did)
– All six runs pass tolerance interval analysis
Conclusion
Note: Ppk analysis of all six lots is not required. Included here FYI.
Exercise: ANOVA Signal
• Run ANOVA and assess practical significance for
– In Ch1DataFile.mtw, analyze
• WireTensile1, WireTensile2, WireTensile3
• Specification is 3 lb minimum
– Use one of the ANOVA Signal Flowcharts
– Then use another approach to determine the
practical significance of the difference between the
means
– Conclusion?
ANOVA: Summary And Recap
• Review Quality Trainer
• Calculations Deep Dive into ANOVA
• Analytically, ANOVA is a special case of
Regression
• Sample Size
• ANOVA Signal Flow chart – some Medtronic
divisions use one to standardize response to ANOVA Signal in PQ
EQUIVALENCE TESTING
Statistical Logic for Equivalence
• The basic statistical logic is designed to disprove equality.
– Null hypothesis: Two population parameters are equal, e.g. μ1 = μ2.
– Alternative hypothesis: Two population parameters are not equal, e.g. μ1 ≠ μ2.
• We need a different form of logic to affirmatively prove equivalence.
– Null hypothesis: Two population parameters differ by Δ or more, e.g. |μ1 - μ2| ≥ Δ.
– Alternative hypothesis: Two population parameters differ by less than Δ, e.g. |μ1 - μ2| < Δ.
Equality vs. Equivalence
Part of the confusion around the issue of
equivalence is that the concepts of equality and
equivalence may not be distinguished.
– Equality: Two values/processes are
mathematically identical.
– Equivalence: The difference between two
values/processes is sufficiently small that it can be
deemed practically insignificant.
Approach 1: Confidence Intervals
• The idea is to demonstrate that the confidence interval for the difference of interest is fully contained within the range of practical significance [-Δ,Δ].
Source: Jones, BMJ 1996
Approach 1: Confidence Intervals
• Step 1: Define Practical Significance
– Before collecting data, use scientific/engineering principles to decide what difference, Δ, is practically negligible.
• Step 2: Estimate Sample Size for Experiment
– Based on characterization data or other assumptions, estimate the sample size needed to produce a confidence interval fully contained within [-Δ,Δ]. (Stat
Example of Approach 1
Method
Parameter: Mean
Distribution: Normal
Standard deviation: 3 (estimate)
Confidence level: 95%
Confidence interval: Two-sided
Results
Margin of Error | Sample Size
2 | 12
We need n=12 from BOTH processes.
Example Output
• Conclusions:
• The processes are statistically different (p=0.003), which is a statement about non-equality.
• Despite being unequal, the processes are still equivalent. The 95% confidence interval for the difference in means is (0.671, 2.798), which is a strict subset of [-3, 3]
Two-sample T for New vs Old
       N    Mean  StDev  SE Mean
New   12  30.927  0.858     0.25
Old   12   29.19   1.52     0.44
Difference = mu (New) - mu (Old)
Estimate for difference: 1.735
95% CI for difference: (0.671, 2.798)
T-Test of difference = 0 (vs not =): T-Value = 3.44  P-Value = 0.003  DF = 17
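The confidence-interval check in Approach 1 can be sketched from the summary statistics in this output. The sketch assumes an unequal-variance (Welch) interval, which is consistent with the DF = 17 shown, and takes the t critical value 2.110 (two-sided 95%, 17 df) from a t table rather than computing it. Function names are hypothetical.

```python
import math

def welch_ci(mean1, sd1, n1, mean2, sd2, n2, t_crit):
    """Confidence interval for mean1 - mean2 without assuming equal variances.

    t_crit must be the t critical value for the Welch degrees of freedom,
    so in practice the df is computed first and t_crit looked up afterwards.
    Returns (lower, upper, df), with df truncated to an integer.
    """
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    diff = mean1 - mean2
    return diff - t_crit * se, diff + t_crit * se, math.floor(df)

def is_equivalent(lower, upper, delta):
    """True when the whole interval lies strictly inside [-delta, +delta]."""
    return -delta < lower and upper < delta

T_CRIT_95_DF17 = 2.110  # from a t table: two-sided 95%, 17 degrees of freedom

lower, upper, df = welch_ci(30.927, 0.858, 12, 29.19, 1.52, 12, T_CRIT_95_DF17)
print(f"df = {df}, 95% CI = ({lower:.3f}, {upper:.3f})")
print("Equivalent within +/-3:", is_equivalent(lower, upper, delta=3))
print("Excludes zero (statistically different):", lower > 0 or upper < 0)
```

The interval differs from the Minitab output in the third decimal because the inputs are rounded summary statistics. The same interval answers both questions: it excludes 0 (not equal) and sits inside [-3, 3] (equivalent).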
Approach 1: Summary
• The confidence interval approach is the gold
standard for clinical trials and other high scrutiny
experiments requiring FDA approval.
• It is mathematically equivalent to a p-value-driven
approach called TOST (Two One-Sided T-tests).
• The confidence interval approach is easier to
understand than the original form of TOST.
Post-hoc Problems
• Rigorous application of approach 1 requires that
the Δ value be established before collecting data.
• What should we do when data have already been
collected without defining the difference of
interest or planning sample size?
Approach 2: Retrospective Power Analysis
• When data have already been collected without planning for rigorous “equivalence testing”, equivalence may be assessed by displaying an entire power curve.
• Even if this approach does not set a-priori standards for equivalence,
– it provides additional context for an insignificant p-value
– it can help engineering experts to make decisions
• Subjective judgment will be required to determine if the experiment was suitably powered to demonstrate equivalence.
• A power curve is a useful supplement to a traditional analysis, but it does not match the rigor in approach 1.
Approach 2 Method
• After collecting the means and standard deviation
of the observed data, create a power curve
through the Power and Sample Size platform in
Minitab.
• Display and interpret the Power Curve in your
data analysis report.
• You may honestly believe that your experiment was sufficiently powered (>80%) to detect meaningful differences, but the post-hoc nature of the analysis makes your argument weaker.
Example
• Consider again our old and new processes, which have distributions of N(30, 2²) and N(31, 1²), respectively.
• Suppose we forgot to take approach 1 and instead just collected 5 data points from each process.
• We found a statistical difference when we collected 12 data points, but the p-value goes above 0.05 when collecting only 5:
Two-sample T for New_5 vs Old_5
        N    Mean  StDev  SE Mean
New_5   5  30.744  0.933     0.42
Old_5   5   29.42   3.02      1.4
Difference = mu (New_5) - mu (Old_5)
Estimate for difference: 1.32
95% CI for difference: (-2.61, 5.25)
T-Test of difference = 0 (vs not =): T-Value = 0.93  P-Value = 0.403  DF = 4
Power Curve Inputs
• The observed sample size is n=5
• Desired power levels are in the range of .8-.95
• The pooled standard deviation is 2.24.
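The pooled standard deviation used as a power curve input can be checked from the two sample standard deviations in the n=5 output (0.933 and 3.02). The sketch below uses the usual degrees-of-freedom-weighted pooling formula; the function name is hypothetical.

```python
import math

def pooled_sd(sds, ns):
    """Pool sample standard deviations, weighting each variance by n - 1."""
    numerator = sum((n - 1) * s ** 2 for s, n in zip(sds, ns))
    denominator = sum(n - 1 for n in ns)
    return math.sqrt(numerator / denominator)

sp = pooled_sd(sds=[0.933, 3.02], ns=[5, 5])
print(f"Pooled standard deviation = {sp:.2f}")
```

With equal sample sizes this reduces to the square root of the average of the two variances.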
Power Curve Output
• With 80% power, this experiment could have detected a difference of about 4.5.
• With 95% power, this experiment could have detected a difference of about 6.
• It is a subjective engineering judgment as to whether such values provide sufficient reassurance about the experimental results.
Extensions and Challenges
• Confidence intervals and power curves can be calculated for almost any type of statistical scenario:
– Comparing 2 means
– Comparing >2 means
– Comparing standard deviations
– Comparing reliability curves
• However, the required sample size for proving equivalence of standard deviations is often much larger than the sample size for means.
• Equivalence for means can reasonably be quantified in terms of arithmetic differences (e.g. |μ1 – μ2| < 5), but equivalence for standard deviations will be quantified in terms of multiplicative differences (e.g. ½ < σ1/σ2 < 2).
Exercise – Lesion Depth
• Consider the key requirement for a new ablation catheter: equivalent (or greater) maximum lesion depth, compared to the current design, where the difference of interest is 0.5 mm.
• Previous data shows
– Normal distribution model is adequate for Max Lesion Depth
– Current Design has average max lesion depth of 2.3 mm
– New Design has average max lesion depth of 2.2 mm
– Largest pooled standard deviation of max lesion depth is 0.356.
• Follow Approach 1 to plan sample size for the equivalence test
• Assume test data as follows to complete the equivalence analysis
– New: n=15, mean = 2.733, stdev = 0.342
– Current: n=15, mean = 2.723, stdev = 0.386
• State your conclusion
Alternate Exercise: Equivalence Testing
• Within your team, identify an example of
equivalence testing in your own work.
• Apply Approach 1, using actual or made-up
characterization data for the planning step.
• Use Minitab to simulate data collection.
– Hint: Use Calc -> Random Data -> Normal . . .
• Use Minitab to complete the Approach 1 data
analysis.
• State your conclusion from the data.
EQUIVALENCE Take Away Messages
• An insignificant p-value is not a rigorous method of proving equivalence.
• Ideally, practical significance and sample size should be considered before the experiment begins.
• Rigorously proving equivalence first demands carefully defining the threshold (Δ) of practical significance.
• The most rigorous way to prove equivalence is to demonstrate that a confidence interval is fully contained within [-Δ, Δ].
• An alternative, but less formal, approach is to retrospectively perform a power analysis.
• Don’t feel like you need to remember all the Minitab steps; we hope you remember the concepts and call your neighborhood statistician for further support.
Summary and Review
• Quality Trainer Review
• ANOVA
– Assumptions
– Using Minitab Assistant vs Stat Menu
– Calculation Deep Dive
– Sample Size
– ANOVA Signals
• Equivalence Testing
Chapter 2: Measurement Systems Analysis
Topics
• Quality Trainer Review
• Topics with Variables Data
– Gage R&R Sample Size
– Probability of Misclassification (Variables Data)
– Helpful Hints
• MSA for Destructive Tests
• MSA for Attribute Tests
Quality Trainer Review
Value of Measurement Systems Analysis
If your goal is . . . | then MSA helps by . . .
Process Improvement | Reducing variability in Xs and Ys so that the “key” Xs may be discovered.
Capability Demonstration or Estimation | More accurate measurements of process performance
Sorting Out Bad Product | Reducing the Probability of Misclassification
Innovation | Reduced noise allows discovery of more subtle signals
Recall . . . MSA Concepts
• Bias – Mean (delta, i.e. difference from reference)
• Linearity – Mean (Bias vs Part or Operating Value)
• Stability – Mean (Bias vs Time)
• Repeatability – Standard Deviation
• Reproducibility – Standard Deviation
• Gage R&R – Standard Deviation
Linearity and stability should be plotted, while bias, repeatability and reproducibility are just single numbers.
Gage Bias and Linearity
• Bias is the difference between the average of repeated measurements and the “true value”
• MSA tends to focus on Gage R&R (variability), but accuracy (= lack of bias) is equally important
– Assumption that procedures for Calibration are in place – need to confirm
– Assumption that procedures for Calibration are adequate – need to confirm
• “Linearity” is a study of bias across the range of measured values
• In Minitab, use Stat -> Quality Tools -> Gage Study -> Gage Linearity and Bias Study
Gage Stability
Minitab > Stat > Control Charts > Variables Charts for Subgroups > Xbar-R
[Figure: Xbar-R Chart of Rep1, ..., Rep3, plotted by Day (8-Sep 11:00 through 12-Sep 5:00). Sample Mean panel: center line 0.2497, UCL = 0.253458, LCL = 0.245942. Sample Range panel: R-bar = 0.00367, UCL = 0.00946, LCL = 0]
Measurement system is stable over time as evidenced by:
• Xbar Chart - in control
• R Chart - in control
Data file: Snap Gauge.mtw
GAGE R&R SAMPLE SIZE
Gage R&R Sample Size
• General recommendation:
– 5 to 10 Parts (P)
– 2 to 3 Operators (O)
– 2 to 3 Repeats (R)
• More rigorous methods
– Specify minimum Degrees of Freedom for
estimating Repeatability and Reproducibility
standard deviations
– Use confidence intervals for standard deviation estimates (option provided in Minitab 16)
Degrees of Freedom Approach
• Estimating Reproducibility Std Dev: O-1
– Include as many operators as feasible
• Estimating Repeatability Std Dev: P*O*(R-1)
– With 30 df, 90% confidence bound on ratio of estimate to true value is (0.79, 1.21). Ref: on www.minitab.com search for “ID 2613” to access “Minitab Assistant White Papers.”
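The two degrees-of-freedom formulas above (O-1 for Reproducibility, P*O*(R-1) for Repeatability) are easy to tabulate when planning a study. The sketch below is illustrative; the function names and the candidate study sizes are hypothetical.

```python
def reproducibility_df(operators):
    """Degrees of freedom for the Reproducibility estimate: O - 1."""
    return operators - 1

def repeatability_df(parts, operators, repeats):
    """Degrees of freedom for the Repeatability estimate: P * O * (R - 1)."""
    return parts * operators * (repeats - 1)

# Candidate study designs as (parts, operators, repeats).
designs = [(5, 2, 2), (10, 3, 2), (10, 3, 3)]
for p, o, r in designs:
    print(f"P={p}, O={o}, R={r}: "
          f"repeatability df = {repeatability_df(p, o, r)}, "
          f"reproducibility df = {reproducibility_df(o)}")
```

A 10 part, 3 operator, 2 repeat study reaches the 30 df mentioned above for Repeatability, but still leaves only 2 df for Reproducibility, which is why the slide recommends including as many operators as feasible.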
CVG Test Method Validation
PROBABILITY OF MISCLASSIFICATION
Misclassification
[Figure: part distribution with LSL and USL marked, showing the probability of misclassifying a good unit as a bad unit and the probability of misclassifying a bad unit as a good unit]
Two Misclassification Probabilities
• Probability of Misclassifying Bad Unit as Good
• Probability of Misclassifying Good Unit as Bad
MINITAB Simulated Estimation of Misclassification: Following Gage R&R study
Part mean = 30, Part Std Dev = 10, Part Upper Spec = 40
No measurement system bias, Gage R&R Std Dev = 2.6
1) Calc/Random Data/Normal (simulate true part measurements)
2) Calc/Random Data/Normal (simulate gage variability)
3) Calc/Calculator: use the “+” to add 1) and 2) to simulate observed measurements
4) Calc/Calculator: assign a 1 for in spec for the true measurements from 1). Ex: (‘TrueMeasure’ ≤ 40)
5) Calc/Calculator: assign a 1 for in spec for the observed measurements from 3). Ex: (‘ObsMeasure’ ≤ 40)
6) Stat/Table/Crosstabs to crosstabulate 4) and 5).
Estimated % of Truly Out of Spec called In Spec is 2.1%.
The simulation sample size was 10000. A larger sample size would be better.
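The same simulation can be sketched outside Minitab. The code below follows the six steps with the slide's inputs (part mean 30, part standard deviation 10, gage standard deviation 2.6, upper spec 40, no bias); the function name, the seed, and the larger sample size of 100,000 are assumptions made for illustration.

```python
import random

def simulate_misclassification(part_mean, part_sd, gage_sd, usl, n, seed=0):
    """Monte Carlo estimate of the two misclassification probabilities.

    Returns (P[truly bad and called good], P[truly good and called bad]),
    both expressed as fractions of all simulated parts.
    """
    rng = random.Random(seed)
    bad_called_good = 0
    good_called_bad = 0
    for _ in range(n):
        true_value = rng.gauss(part_mean, part_sd)       # step 1: true part value
        observed = true_value + rng.gauss(0.0, gage_sd)  # steps 2-3: add gage error
        true_in_spec = true_value <= usl                 # step 4
        obs_in_spec = observed <= usl                    # step 5
        if not true_in_spec and obs_in_spec:             # step 6: cross-tabulate
            bad_called_good += 1
        elif true_in_spec and not obs_in_spec:
            good_called_bad += 1
    return bad_called_good / n, good_called_bad / n

p_bad_good, p_good_bad = simulate_misclassification(
    part_mean=30, part_sd=10, gage_sd=2.6, usl=40, n=100_000)
print(f"Truly out of spec called in spec: {p_bad_good:.1%}")
print(f"Truly in spec called out of spec: {p_good_bad:.1%}")
```

Both results are fractions of all parts, which matches the cell percentages of a cross-tabulation. The estimate varies from run to run with the random seed, and the variation shrinks as the simulation sample size grows.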
MINITAB Misclassification
Two problems:
1) Only three decimals for probabilities (i.e. 0.000)
2) Can’t enter historical: 1) process mean 2) part std. dev 3) gage std. dev
(Note: (2) can now be done with CSR work aid 13)
Misclassification Using Minitab and Work Aid 13
Load into the worksheet: the Part mean (30), the Part Sigma (10) and the Gage Sigma (2.6)
Data file: CSRworkaid13 POM.mtw
Enlarging the label on the sample mean chart, we see the mean is 30.
Examining the output we see that: USL is 40, the Part Sigma is 10 and the Gage Sigma is 2.6.
Prob. of a truly bad part called good is .021
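As a cross-check that does not depend on random sampling, the probability of a truly bad part being called good can be approximated by numerical integration. This sketch assumes normally distributed parts, normally distributed and unbiased gage error, and an upper specification only; it is not the method used by Minitab or by the work aid, and the function name is hypothetical.

```python
from statistics import NormalDist

def prob_bad_called_good(part_mean, part_sd, gage_sd, usl, steps=20_000):
    """P(true value > USL and observed value <= USL), by the midpoint rule.

    For a part with true value t above the USL, the chance that the gage
    reads it at or below the USL is P(gage error <= USL - t).
    """
    part = NormalDist(part_mean, part_sd)
    gage_error = NormalDist(0.0, gage_sd)
    upper = max(usl, part_mean) + 8 * part_sd   # far enough into the tail
    width = (upper - usl) / steps
    total = 0.0
    for i in range(steps):
        t = usl + (i + 0.5) * width             # midpoint of each slice
        total += part.pdf(t) * gage_error.cdf(usl - t)
    return total * width

p = prob_bad_called_good(part_mean=30, part_sd=10, gage_sd=2.6, usl=40)
print(f"Prob. of a truly bad part called good: {p:.4f}")
```

The result carries more decimal places than the three shown in the Minitab output, which is useful when the probabilities of interest are small.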
Probability of Misclassification (POM) Tool
• Originally written in R by Tarek Haddad to re-
create functionality lost when Medstat was
retired.
• Jim Dawson collaborated with Tarek to continue
development and turn it into an Excel tool.
• A substantial Software Validation effort was
undertaken by Nick Finstrom and Barry Christy,
with the support of Pete Patel and the CVG Test
Method Council. Validation work to be completed in early 2014.
| MDT Confidential23
POM Tool
• Replicates Medstat functionality
• More resolution in results than Minitab
• Graphics
• Guardbanding
• Normal, Lognormal and Weibull distributions of parts
| MDT Confidential24
POM with Guardband
| MDT Confidential25
Exercise
• Run POM analysis
– Using Minitab
Simulation
– Using Work Aid 13 and
Minitab GRR
– Using POM Tool
| MDT Confidential26
HELPFUL HINTS
| MDT Confidential27
Gage R&R Helpful Hints - Normality
• Normality testing is not needed for Gage R&R analysis
– Distribution of the raw data will depend strongly on the parts used in the study – there is no expectation or assumption that the raw data will follow any specific distribution
– Repeated measurements on the same part by the same operator will likely follow a normal distribution
• Like any ANOVA model, the residuals are assumed to follow a normal distribution – but the analysis is relatively “robust” to non-normality of the residuals
– Probability of Misclassification does depend on the part or process distribution (each part measured once)
| MDT Confidential28
Gage R&R Helpful Hints – One-Sided Specification
• In the case of a one-sided specification, the Percent Tolerance metric depends on the part average
• Minitab uses the overall average in the Gage R&R study as the estimate of the part average
• If the parts used in the study are not representative of the expected part distribution . . .
– The overall average will be a poor estimate of the process average
– The percent tolerance result will be misleading
– Best practice would be to calculate Percent Tolerance separately using a better estimate of the process average
– Being “not representative” can be a good practice – for example, including parts that don’t meet the specification
| MDT Confidential29
Corrective Actions for Failed Gage R&R
• Repeatability problem
– Could be due to part positional variation
• Standardize by measuring same position on each part
• Or make multiple measurements at random or systematic
positions and use the average
– If gage itself is too variable, may need to improve
or replace
• In the meantime, Repeatability variability can be filtered
out by taking repeated, independent measurements and
using the average. Note that this approach does not
correct for Reproducibility issues.
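The effect of averaging can be sketched numerically. The snippet below assumes independent repeat readings by a single operator, so only the repeatability variance is divided by the number of readings; the function name and the sigma values are hypothetical.

```python
import math

def effective_gage_sd(repeatability_sd, reproducibility_sd, n_repeats):
    """Gage standard deviation when the reported value is the mean of n repeat readings."""
    repeat_var = repeatability_sd ** 2 / n_repeats   # averaging shrinks this term only
    return math.sqrt(repeat_var + reproducibility_sd ** 2)

# Hypothetical values, for illustration only.
for n in (1, 2, 4, 9):
    print(n, round(effective_gage_sd(3.0, 1.0, n), 3))
```

However many readings are averaged, the result never drops below the reproducibility standard deviation.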
| MDT Confidential30
Corrective Actions for Failed Gage R&R
• Reproducibility Problem
– Look for assignable causes that explain the
operator-to-operator differences
– Understand any Operator*Part interactions – these
may provide clues to differences in technique.
– Possibly improve the measurement procedure
and/or re-train the operators
– Improve any visual aids or samples used in the
measurement procedure
| MDT Confidential31
32 | MDT Confidential
Approaches to Robust Gage R&R
Standard Gage R&R methods assume that other factors that affect
measurements have been studied and controlled in the development
of the test method.
If these sources of variability still affect the measurements, then . . .
The Expanded Gage R&R allows you to add additional factors.
Besides operator & part, you could add fixture number, gage
number or other factors. The Expanded GRR can also handle
missing data.
Reference: “Make Your Destructive, Dynamic, and Attribute Measurement System Work for You” by William Mawby.
This book includes the Analysis of Covariance method that allows one to load varying environmental factors like temperature & humidity (covariates) into a GRR.
The General Linear Model in Minitab (under the ANOVA branch)
can be used to model covariates (also handles missing data).
MSA FOR DESTRUCTIVE
MEASUREMENTS
| MDT Confidential33
34 | MDT Confidential
Two Types of Destructive Measurements
1. Truly destructive: Measurement destroys unit being measured
Pull test
Peel test
Tensile test
2. Non-replicable: Measurement process can change the unit
or you are measuring a transient phenomenon
Catapult distance
Motor speed
Heart rate
Dimension of silicon part (can compress)
Dimensions of heart tissue (can compress)
Ref: Make Your Destructive, Dynamic, and Attribute Measurement System Work for You, by W. D. Mawby
In neither case is it possible
to take repeated measures,
so gage R&R is not possible.
Approaches to Destructive MSA
Approach | Pro | Con
Develop a non-destructive measurement | Ideal solution | Often difficult or impossible
Attempt to use identical parts as “repeat” measurements and apply usual requirements for GRR %Tolerance | Easy to apply usual Minitab calculations | Rarely works because parts aren’t actually identical
Use a coupon test so that parts are more identical | Results better than above | Coupons may not be representative – easier to measure than real parts
Focus on improving the measurement process using DMAIC | Proven methodology | Cannot conclude whether measurement system is adequate
Focus on Reproducibility | Not affected by part-to-part variability | Might miss a Repeatability issue
| MDT Confidential35
What about using “Nested” Gage R&R?
• The “nested” Gage R&R analysis applies when one operator measures different parts than another operator.
– For example, John measures parts 1, 2, 3, 4, 5 repeatedly and Jane measures parts 6, 7, 8, 9, 10 repeatedly.
– Common application would be “Inter-laboratory Testing,” where operators at each location measure different parts repeatedly.
– Can work for Destructive MSA if each homogeneous sample may be sub-sampled. Then operators can measure different samples repeatedly.
• Analysis
– The nested analysis does not include a term for Part * Operator interaction.
– Note that Minitab Assistant doesn’t offer the Nested analysis
• Unless sub-sampling of homogeneous material is possible, Nested does not solve the key problem of Destructive MSA
– It’s impossible to repeat the measurement
| MDT Confidential36
37 | MDT Confidential
Destructive Gage R&R Example
Tensile testing of tubing
8 pieces of tubing
Each tubing cut into 2 sub samples
Assume variation between sub
samples due to measurement error
Assume an upper specification of
850 g
TestingSupplierCoils.mtw (MINITAB®)
Destructive Gage R&R using sub-samples
38 | MDT Confidential
Destructive Gage R&R using sub-samples
39 | MDT Confidential
40 | MDT Confidential
Destructive Gage R&R using sub-samples
Large result for
% Tolerance
Measurement system does
not distinguish one part from
another within the range of
parts used in the study
Nearly all measurement system
variation due to repeatability
rather than operator
(reproducibility) . . . Or maybe
sub-sample differences?
41 | MDT Confidential
Destructive Gage R&R using sub-samples
Destructive Gage R&R using sub-samples gave poor results
Since repeatability accounts for most of the apparent measurement variation, it is likely that the parts were not very similar
In this project, the team used the DMAIC Process Knowledge method to improve the system without obtaining a formal measurement
Focus on Reproducibility
• With destructive measurements, the Repeatability Standard Deviation always includes the part-to-part or subsample-to-subsample variation. In general, repeatability standard deviation cannot be accurately estimated.
• If one population of parts is randomly assigned to multiple operators, then the Reproducibility Standard Deviation is not affected by part-to-part variation.
• Reproducibility standard deviation can be estimated accurately even for destructive tests.
| MDT Confidential42
Reproducibility
• Stop
– Trying to force (Repeatability + Part) Standard
Deviation to be small enough to meet a requirement.
– Trying to obtain or create “identical” parts.
• Start
– Estimate Reproducibility standard deviation and ensure
that it is small enough. This standard deviation
depends only on the differences between operator
means.
– Compare operator standard deviations. Identify cases where operators show substantially different variation across equivalent sets of parts.
| MDT Confidential43
Example: CVG Test Method Validation for Destructive Tests
• Obtain a population of 40 parts
– Do not need to get identical or nearly identical
parts
• Randomly assign 10 parts to each of 4 operators
• Calculate %Tolerance for Reproducibility
– Compare to requirement of 25%
• Calculate Std Dev Ratio
– Compare to simulation-based critical values (for a typical study, the critical value is 3.10)
| MDT Confidential44
Example Calculations
• Data based on actual TMV studies
– But altered to disguise
– Detection Time A, Detection Time P
| MDT Confidential45
Detection Time A
| MDT Confidential46
Run One-Way ANOVA
• Reproducibility = sqrt((0.778-0.627)/10) = 0.123
| MDT Confidential47
Calculate Results
• % Tolerance (Reproducibility)
= 100 * ((6*0.123) / (2*(30-11.740)))
= 100 * (0.738 / 36.52)
= 2.02%
• Std Dev Ratio = 0.986 / 0.546 = 1.81
• Result: Pass
| MDT Confidential48
Detection Time P
| MDT Confidential49
Calculations for Detection Time P
• Reproducibility = sqrt((11.225-0.976)/10) = 1.01
• % Tolerance (Reproducibility)
= 100 * ((6*1.01) / (2*(30-14.798)))
= 100 * (6.06 / 30.40)
= 19.9%
• Std Dev Ratio = 1.113 / 0.846 = 1.32
• Result: Pass
| MDT Confidential50
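The Detection Time A and Detection Time P calculations can be reproduced with a short script. This is a sketch under stated assumptions: the two numbers inside each square root are taken to be the operator and error mean squares from the one-way ANOVA, 10 is the number of parts per operator, 30 is the upper specification, and the Std Dev Ratio is the largest operator standard deviation divided by the smallest. The function names are hypothetical.

```python
import math

def reproducibility_sd(ms_operator, ms_error, parts_per_operator):
    """Operator standard deviation from one-way ANOVA mean squares (floored at zero)."""
    return math.sqrt(max(ms_operator - ms_error, 0.0) / parts_per_operator)

def pct_tolerance_one_sided(sd, spec_limit, process_mean):
    """100 * 6*sd / (2 * |spec limit - process mean|), the form used on the slides."""
    return 100.0 * (6.0 * sd) / (2.0 * abs(spec_limit - process_mean))

def std_dev_ratio(largest_operator_sd, smallest_operator_sd):
    return largest_operator_sd / smallest_operator_sd

# Detection Time A
sd_a = reproducibility_sd(0.778, 0.627, 10)
pct_a = pct_tolerance_one_sided(sd_a, 30.0, 11.740)
ratio_a = std_dev_ratio(0.986, 0.546)

# Detection Time P
sd_p = reproducibility_sd(11.225, 0.976, 10)
pct_p = pct_tolerance_one_sided(sd_p, 30.0, 14.798)
ratio_p = std_dev_ratio(1.113, 0.846)

for name, sd, pct, ratio in (("A", sd_a, pct_a, ratio_a), ("P", sd_p, pct_p, ratio_p)):
    verdict = "Pass" if pct <= 25.0 and ratio <= 3.10 else "Fail"
    print(f"Detection Time {name}: sd={sd:.3f}  %Tol={pct:.2f}  ratio={ratio:.2f}  {verdict}")
```

The script carries unrounded intermediate values, so the Detection Time P percentage comes out slightly above the slide's 19.9%, which was computed from the rounded 1.01.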
Exercises
• Open Destructive Exercises.mtw
• For Bond Strength results:
– Assume specification is Minimum 5 lb
– Analysis
• Individual Value Plot
• % Tolerance for Reproducibility
• Std Dev Ratio
• Is this destructive measurement system adequate?
• Repeat for Buckle Force results
– Assume specification is Maximum 340 grams
| MDT Confidential51
MSA FOR ATTRIBUTE MEASUREMENTS
| MDT Confidential52
ATTRIBUTE GAGE R&R
• Attribute data are usually the result of human judgment
– Which category does this item belong in?
• When categorizing items, you need a high degree of
agreement on which way an item should be categorized
• The best way to assess human judgment is to have all
operators repeatedly categorize several known test units
(Attribute Gage R&R)
– Look for agreement
• each person categorizes the same unit consistently
• there is agreement between the operators on each unit
– Use disagreements as opportunities to determine and eliminate problems
| MDT Confidential53
SETTING UP AN ATTRIBUTE GAGE STUDY
• Most important aspect of attribute Gage Study is
selecting parts (representative defects)
• Most challenging aspect is choosing parts for the study. Typically use . . .
– 50% acceptable parts
– 50% defective parts
• Have operators repeatedly classify parts in random order without knowledge of which part they are classifying (blind study)
| MDT Confidential54
Analysis of Attribute Gage R&R
• Stat > Quality Tools > Attribute Agreement Analysis
– Percent Agreement based on number of Parts
– Kappa Statistics (range -1 to 1)
• Minitab Assistant > Measurement Systems Analysis
– More graphical output
– Accuracy statistics based on number of Appraisals
– No Kappa statistics
| MDT Confidential55
Use Minitab Assistant-> Measurement Systems Analysis (MSA)
Create Attribute Agreement worksheet
• Choose Number of Appraisers = 3
• Choose Number of Trials = 2
• Choose Number of Test Items = 10
• Items 1-5 are “Good”; Items 6-10 are “Bad”
• Click “OK”
• Copy column “Standards” and paste into “Results”
• Fix column name back to “Results”
• Find first trial of Item 1 and Item 2
– Change result from “Good” to “Bad” to inject two errors into the simulated study
• Save onto Desktop as “Attribute GRR”
Create Result Data
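The simulated study can be mirrored in Python to show where the accuracy and misclassification percentages come from. This is a sketch only: it assumes the two injected errors fall on Appraiser 1, Trial 1, Items 1 and 2, and it computes the rates from their plain definitions rather than reproducing Minitab's output.

```python
from itertools import product

appraisers, trials, items = (1, 2, 3), (1, 2), range(1, 11)
standard = {item: ("Good" if item <= 5 else "Bad") for item in items}

# Start from perfect agreement, then inject two errors
# (assumed here: Appraiser 1, Trial 1, Items 1 and 2).
results = {(a, t, i): standard[i] for a, t, i in product(appraisers, trials, items)}
results[(1, 1, 1)] = "Bad"
results[(1, 1, 2)] = "Bad"

accuracy = sum(results[k] == standard[k[2]] for k in results) / len(results)
good_keys = [k for k in results if standard[k[2]] == "Good"]
bad_keys = [k for k in results if standard[k[2]] == "Bad"]
good_rated_bad = sum(results[k] == "Bad" for k in good_keys) / len(good_keys)
bad_rated_good = sum(results[k] == "Good" for k in bad_keys) / len(bad_keys)
# Mixed rating: the same appraiser rates the same item both ways across trials.
mixed = sum(
    len({results[(a, t, i)] for t in trials}) > 1
    for a, i in product(appraisers, items)
) / (len(appraisers) * len(items))

print(f"Overall accuracy:  {accuracy:.1%}")
print(f"Good rated Bad:    {good_rated_bad:.1%}")
print(f"Bad rated Good:    {bad_rated_good:.1%}")
print(f"Mixed ratings:     {mixed:.1%}")
```

With 60 appraisals and two injected errors, the overall accuracy is 58/60.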
Attribute Agreement Analysis
Summary Report
| MDT Confidential61
Attribute Agreement Analysis for Results
Summary Report
Is the overall % accuracy acceptable? (scale: < 50% = No, 100% = Yes) Result: 96.7%
The appraisals of the test items correctly matched the standard 96.7% of the time.
% Accuracy by Appraiser (bar chart, Appraisers 1 to 3): 90.0, 100.0, 100.0
Misclassification Rates
Overall error rate 3.3%
Good rated Bad 6.7%
Bad rated Good 0.0%
Mixed ratings (same item rated both ways) 6.7%
Comments
Consider the following when assessing how the measurement system can be improved:
-- Low accuracy rates: Low rates for some appraisers may indicate a need for additional training for those appraisers. Low rates for all appraisers may indicate more systematic problems, such as poor operating definitions, poor training, or incorrect standards.
-- High misclassification rates: May indicate that either too many Good items are being rejected, or too many Bad items are being passed on to the consumer (or both).
-- High percentage of mixed ratings: May indicate items in the study were borderline cases between Good and Bad, thus very difficult to assess.
Callouts:
Attribute “c=0” result . . . Showing that no bad parts were misclassified as good
Overall, 96.7% of presentations were classified correctly
Attribute Agreement Analysis for Results
Accuracy Report
All graphs show 95% confidence intervals for accuracy rates. Intervals that do not overlap are likely to be different.
[Figure panels: % by Appraiser; % by Standard (Good, Bad); % by Trial (1, 2); % by Appraiser and Standard]
Callout: Accuracy Report illustrates the 95% / 90% result
Kappa
63 | MDT Confidential
Kappa is a measure of rater agreement.
Minitab:
• Reports two Kappa statistics: Fleiss’ & Cohen’s
• Defaults to Fleiss’ Kappa
Minitab will only calculate Cohen’s Kappa if you choose the option for Cohen’s Kappa, and if one of these two conditions is true:
• A) Two appraisers perform a single trial on each sample
• B) One appraiser performs two trials on each sample
Kappa is meant for attribute data.
Kappa ranges from -1 to 1.
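For the two-rater case, Cohen's Kappa compares observed agreement with the agreement expected by chance: kappa = (p_observed - p_expected) / (1 - p_expected). The sketch below implements that textbook formula with hypothetical ratings; it is not Minitab's code and it does not cover Fleiss' Kappa.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equal-length sequences of categorical ratings."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement: product of the two raters' marginal proportions, summed over categories.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (observed - expected) / (1.0 - expected)

# Hypothetical ratings: the raters disagree on one of ten items.
rater_1 = ["Good"] * 5 + ["Bad"] * 5
rater_2 = ["Good"] * 4 + ["Bad"] * 6
kappa = cohens_kappa(rater_1, rater_2)
print(round(kappa, 3))
```

Here 90% raw agreement corresponds to a Kappa of 0.8, because half of the agreement would be expected by chance alone. The function is undefined when both raters use a single identical category for every item, since chance agreement is then 100%.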
64 | MDT Confidential
Kappa (Landis and Koch)
According to AIAG (Auto industry), a general rule of thumb is:
A Kappa value greater than 0.75 indicates good to excellent agreement
Kappa values less than 0.40 indicate poor agreement.
This general rule of thumb may not apply for most Medtronic
applications. Any disagreement on rejectable units would be of
concern.
Kappa calculations
| MDT Confidential65
Kappa results
| MDT Confidential66
Summary and Recap
• Quality Trainer Review
• Topics with Variables Data
– Gage R&R Sample Size
– Probability of Misclassification (Variables Data)
– Helpful Hints
• MSA for Destructive Tests
• MSA for Attribute Tests
| MDT Confidential67
BACKUP SLIDES
| MDT Confidential68
69 | MDT Confidential
Destructive Gage R&R - 2 Nested Designs
• 2 Stage Nested Design Approach
• Samples are parts that can be subdivided into homogeneous sub-samples.
• Stage 1: 1 operator measures sub-samples (2-5) from parts (5-10).
• Stage 2: 3 operators each measure same location per part (5-10).
[Diagram: Stage 1 – 1 operator; Parts 1, 2, ... 10; Locations 1, 2, ... 5 on each part. Stage 2 – Operators 1, 2, 3; Parts 1, 2, ... 10 for each operator; 1 sub-sample per part.]
70 | MDT Confidential
Destructive Gage R&R - 2 Stage Die Bond Example (cont.)
• Project: Destructive 2 stage nested.mpj
Pull testing of die bond.
Parts are die. Sub-samples
are 5 wire locations on the
die. Spec = 7.5 grams
minimum.
Stage1: 1 operator pull
tests all 5 wire locations on
each of 10 die.
Stage 2: Each of 3
operators pull test 10 die at
wire location 1.
MINITAB®
71 | MDT Confidential
Destructive Gage R&R - 2 Stage Die Bond
Example (cont.)
Stage 1: Stat > ANOVA > Fully Nested
ANOVA
From worksheet: stage1
Nested ANOVA: Pull Strength versus Die
Variance Components
Source | Var Comp. | % of Total | StDev
Die | 0.088 | 15.50 | 0.296
Error | 0.479 | 84.50 | 0.692
Total | 0.567 | | 0.753
The Die variance component is the estimate $\hat{\sigma}^2_{\text{part}}$.
72 | MDT Confidential
Destructive Gage R&R - 2 Stage Die Bond Example (cont.)
Stage 2: Stat > ANOVA > Fully Nested ANOVA
From worksheet: stage2
Nested ANOVA: Pull Strength (Wire 1) versus Operator
Variance Components
Source | Var Comp. | % of Total | StDev
Operator | 0.053 | 11.08 | 0.231
Error | 0.428 | 88.92 | 0.654
Total | 0.481 | | 0.694
The Operator variance component is the estimate $\hat{\sigma}^2_{\text{operator}}$; the Error variance component is the estimate $\hat{\sigma}^2_{\text{repeat/part}}$.
73 | MDT Confidential
Destructive Gage R&R - 2 Stage Die Bond Example (cont.)
Manual calculation of Gage Repeatability and Reproducibility
$\hat{\sigma}^2_{\text{repeat}} = \hat{\sigma}^2_{\text{repeat/part}} - \hat{\sigma}^2_{\text{part}} = 0.428 - 0.088 = 0.340$
$\hat{\sigma}^2_{R\&R} = \hat{\sigma}^2_{\text{repeat}} + \hat{\sigma}^2_{\text{operator}} = 0.340 + 0.053 = 0.393$
Compare Gage R&R variance to part variance if parts are chosen to be representative of production process.
Since this is a one-sided spec (7.5 grams) use Misclassification to determine gage acceptance.
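The manual calculation is simple enough to script. The sketch below uses the variance components reported in the two nested ANOVA tables; the variable names are hypothetical, and flooring the repeatability variance at zero is an added safeguard for cases where the subtraction goes negative, not something shown on the slide.

```python
import math

var_part = 0.088         # Stage 1: Die variance component
var_repeat_part = 0.428  # Stage 2: Error (repeatability confounded with part)
var_operator = 0.053     # Stage 2: Operator variance component

# Remove the part contribution from the confounded Stage 2 error term.
var_repeat = max(var_repeat_part - var_part, 0.0)
var_rr = var_repeat + var_operator

print(f"Repeatability variance: {var_repeat:.3f}")
print(f"Gage R&R variance:      {var_rr:.3f}")
print(f"Gage R&R std dev:       {math.sqrt(var_rr):.3f}")
```

The Gage R&R standard deviation from the last line is the gage sigma that would feed a misclassification calculation against the 7.5 gram minimum.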
74 | MDT Confidential
Kappa – Call Center Example
Call Center workers were asked to categorize types of calls they
received: Callcat.mtw (MINITAB®)
75 | MDT Confidential
Kappa Attribute Analysis: Option Setting
76 | MDT Confidential
Kappa : Within Appraiser Agreement
77 | MDT Confidential
Kappa: Each Appraiser vs Standard
78 | MDT Confidential
Kappa for Appraisers
What do we conclude from this analysis about the raters’ performance?
What would you do next?
Can this method be applied to the banana data?
Distribution Analysis
The Art of Finding Useful Models
Jeremy Strief, Ph.D.
MECC Principal Statistician
Objectives
• Explain why distributional analysis is statistically
complicated (and sometimes emotionally frustrating!)
• Emphasize the importance of engineering theory and
historical precedent.
• Encourage the use of multiple graphical methods in
addition to numerical tests.
• Review common causes of Non-Normality.
• Discuss Transformations and how they compare to
fitting non-Normal distributions.
Medtronic Confidential
Recap from Quality Trainer
• Normal Distribution Basics
• Capability Analysis (Normal)
• Capability Analysis (Non-Normal)
• Graphical tools
– Boxplots
– Histograms
– Individual Value Plots
| MDT Confidential3
Distribution Analysis
Motivation and Philosophy
5 | MDT Confidential
Why Assess Distribution
• Statistical tools vary in sensitivity to and effect of distributional assumptions
• Some MDT procedures require distributional assessment for those statistical methods which are highly sensitive to distributional assumptions
Statistical Tool | Distributional Sensitivity | Effect of Poor Distributional Fit
Capability Analysis | High | Incorrect PPM/Ppk
Tolerance Intervals | High | Incorrect Bounds
Variables Lot Acceptance Sampling | High | Altered rejection and acceptance rates
Individuals Chart for SPC | High | Incorrect control limits
GLM / Regression / ANOVA | Med | approximate p-value
Xbar chart for SPC | Med/Low | approximate p-value
Two-sample t-test | Low | approximate p-value
Non-parametric methods | Low | approximate p-value
6 | MDT Confidential
Not All Data Are Normal: Example
Lead Time Data usually have a long tail – skewed distribution
[Figure: Probability Plot of Time (Normal); Mean 12.31, StDev 9.656, N 100, AD 5.738, P-Value < 0.005]
[Figure: Histogram of Time (Frequency vs. Time)]
Not All Data are Normal: Considerations
• Observed data need not follow any tractable mathematical model.
• Some mathematical models may be useful, if
imperfect, representations of the data.
| MDT Confidential7
Frustrations with Distributional Analysis
• Larger sample sizes (n>100) cause the statistical
tests to detect small departures from a theoretical
model. Such departures may not be practically
significant.
• Smaller sample sizes (n
The Underlying Statistical Hypotheses
• The statistical hypothesis testing is ‘backward,’ in that the null
hypothesis assumes that the particular distribution is a good fit.
– H0: Distribution specified has a good fit
– H1: Distribution specified has lack-of-fit
• Low p-values provide evidence against the fit of a distribution, so certain
distributions can be ruled out as reasonable models.
• Using the standard goodness-of-fit metrics, it is technically not
possible to prove that a particular distribution is the “true model”
for the data.
• Instead of providing statistical “proof”, distribution analysis is
geared toward assessing which statistical distributions are
plausible models for the data at hand.
| MDT Confidential9
Philosophy of Distribution Analysis
“All models are approximations. Essentially, all
models are wrong, but some are useful. However,
the approximate nature of the model must always
be borne in mind.”
--G.E.P. Box
| MDT Confidential10
N=15 Probability Plots
Medtronic Confidential
N=500 Examples
Medtronic Confidential
Only 12 out of 500 values were affected by the truncation or
censoring.
13 | MDT Confidential
How to Determine Distribution
1. Scientific/Engineering Knowledge
2. Historical distribution analysis
3. Distribution analysis
Priority order
Why is
distribution
analysis last?
• Sample size (50 to 100)
• Regardless of n, key Xs and shift and drift
can mask true distribution
Distribution applies to short term data only
Importance of Engineering Theory
• The choice of distribution should be both statistically
plausible and scientifically justified.
• Engineering theory and historical precedents often
suggest whether a distribution should be Normal,
Lognormal, or Weibull.
• If scientific theory does not lead to one single
statistical model, at least consider
– Whether the distribution should be skewed or symmetric
– Which distributions can be ruled out
Medtronic Confidential
Data Analysis Philosophy
• Information shouldn’t be destroyed. Examples of information destruction are
– Converting variables data to attribute data.
– Heavy rounding with a bad measurement system.
– Drifting measurement system.
• Check the quality and structure of the raw data.
– Are there physically impossible values, wild outliers, missing values, too many ties?
– Are the data paired or unpaired?
– Was randomization employed?
– How were the data generated?
| MDT Confidential15
Data Analysis Philosophy
• Plot the data AND do analytics.
– PLOT histograms, run charts, scatter plots, … See what is going on. Do a probability plot for process data.
– Use ANALYTICS to get quantitative about what you have seen. Examine the residual plots from analytical model fits.
• Analyses are performed on yesterday’s data today to predict tomorrow’s performance.
– Data from an unstable process that is analyzed (ignoring the instability) may result in a conclusion that will not hold up tomorrow.
| MDT Confidential16
Distribution Analysis
Review of Engineering Distributions
Most Common Statistical Models for
Engineering Applications
• Weibull
• Exponential (special case of Weibull)
• Lognormal
• Normal
| MDT Confidential18
Weibull
• A flexible model which can assume many different shapes, depending on the choice of parameters
• Scale parameter α or η
• Shape parameter β
• Arises from “weakest link” failures, or situations when the underlying process
focuses on the minimum or maximum value of independent, positive random
variables.
• Models stress-strength failures
| MDT Confidential19
Exponential
• Special case of Weibull when β=1
• Constant hazard rate, meaning that the probability of failure is not a
function of the age of the device/material.
• May occur when multiple failure modes are operating simultaneously
• May be useful in modeling software failures resulting from external
sources (e.g. cosmic radiation causes bit-flips at an extremely low,
constant rate)
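The link between the Weibull shape parameter and the hazard rate can be illustrated with the standard two-parameter Weibull hazard function, h(t) = (β/η)(t/η)^(β-1). The sketch below uses hypothetical scale and time values; only the formula itself is standard.

```python
def weibull_hazard(t, scale, shape):
    """Hazard rate h(t) = (shape/scale) * (t/scale)**(shape - 1) for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)

# Hypothetical scale of 100; shape < 1, = 1 and > 1 give falling, constant and rising hazards.
for shape in (0.5, 1.0, 3.0):
    rates = [round(weibull_hazard(t, 100.0, shape), 5) for t in (10.0, 50.0, 200.0)]
    print(f"shape={shape}: {rates}")
```

With shape = 1 the hazard equals 1/scale at every age, which is the Exponential case; shapes below 1 correspond to infant mortality and shapes above 1 to wearout.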
| MDT Confidential20
Lognormal
• Models time-to-failure caused by several forces which combine
multiplicatively.
• Describes time to fracture from fatigue crack growth in metals.
• Right skewed distribution, useful when data values take multiple
orders of magnitude (e.g. 1.4, 14, 140).
• Two parameters (μ,σ), each of which is traditionally expressed on
the log scale.
• So if X~Lognormal(μ,σ), then ln(X)~Normal(μ,σ)
| MDT Confidential21
Normal
• Models time-to-failure caused by additive, independent forces
• Commonly describes gage error, dimensional measurements from
a supplier, and other symmetric, bell-shaped phenomena
| MDT Confidential22
Additional Models to Consider
• Logistic
• Smallest Extreme Value (SEV)
• Largest Extreme Value (LEV)
| MDT Confidential23
Some Relationships
• SEV distribution = ln(Weibull distribution).
• LEV distribution = ln(1/Weibull distribution).
• Normal distribution = ln(Log-normal distribution).
• All Weibull distributions can be rescaled and
repowered to get another Weibull.
• The Weibull(100,4) is very close to a Normal
(mean = 90.64, s.d. = 25.43). This normal is thicker in
the tails than the Weibull(100,4). Ref: 02SR013
“Algorithm for Computing Weibull Sample Size for Complete Data”
| MDT Confidential24
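The mean and standard deviation quoted for the Weibull(100,4) follow from the standard Weibull moment formulas, mean = η·Γ(1 + 1/β) and variance = η²·[Γ(1 + 2/β) − Γ(1 + 1/β)²]. A short check, reading (100, 4) as (scale, shape); the function name is hypothetical:

```python
import math

def weibull_mean_sd(scale, shape):
    """Mean and standard deviation of a two-parameter Weibull distribution."""
    g1 = math.gamma(1.0 + 1.0 / shape)
    g2 = math.gamma(1.0 + 2.0 / shape)
    return scale * g1, scale * math.sqrt(g2 - g1 ** 2)

mean, sd = weibull_mean_sd(100.0, 4.0)
print(f"mean = {mean:.2f}, s.d. = {sd:.2f}")
```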
25 | MDT Confidential
Review: Common Engineering Distributions
Dimensions
Measurement error
Lead Time
Normal Weibull
Lognormal
Default
Time to
stress/strength
related failure
Time to
fatigue
related failure
Infant
mortality
Wearout
Distribution Analysis
Statistical Overview
Statistical Approach to Distribution Analysis
• Both graphical and numerical approaches are needed
• P-value is not definitive, given the “backward”
nature of hypothesis testing
• Visual assessment of the probability plot is
crucial
• Reasonably large sample sizes (~50) are
needed. Consult your local procedures (e.g.
DOC000550 within CRDM) for specific rules.
| MDT Confidential27
Distribution Analysis
Graphical Methods
Good Distribution Analysis Should
Always Begin With Plots!
• Probability plots
• Histograms
• Time plots
Medtronic Confidential
Probability Plot
• A probability plot is a 2-dimensional plot with specialized (often
logarithmic) axes, to facilitate comparison between observed
data and a hypothesized distribution.
• More specifically, a probability plot is a comparison between the
observed and theoretical quantiles (i.e. percentiles) for a
hypothesized distribution.
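The quantile comparison behind a Normal probability plot can be sketched as follows. The plotting position (i − 0.3)/(n + 0.4) is an assumed median-rank convention chosen for illustration; software packages differ in the exact formula, and the data values are hypothetical.

```python
from statistics import NormalDist

def normal_probability_points(data):
    """Pair each sorted observation with the standard Normal quantile of its plotting position."""
    n = len(data)
    std_normal = NormalDist()
    points = []
    for i, x in enumerate(sorted(data), start=1):
        p = (i - 0.3) / (n + 0.4)   # median-rank plotting position (assumed convention)
        points.append((std_normal.inv_cdf(p), x))
    return points

# Hypothetical measurements.
sample = [9.1, 10.4, 8.7, 11.9, 10.0, 9.6, 12.3, 10.8]
for z, x in normal_probability_points(sample):
    print(f"{z:6.2f}  {x:5.1f}")
```

If the Normal model fits, plotting the second column against the first gives points that fall close to a straight line.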
| MDT Confidential30
Probability Plot Interpretation
• If the distribution is a good fit to the data, the plotted points
should fall approximately in a straight line.
• When interpreting the probability plot, examine both the p-value
and the visual fit.
– At the tails of the distribution, look whether the points are falling on
the conservative side of the fitted line.
– Look for major deviations in the pattern of points from a straight
line—kinks, ties, curves, jumps, etc. Do not worry if a few points
fall outside the confidence bounds.
– Fat Pencil Test: Can the observed data values be covered up by a
“fat pencil”?
| MDT Confidential31
Probability Plot in Minitab
| MDT Confidential32
Probability Plot Examples
Large N makes for
obvious curvature:
Medtronic Confidential
Right skew and
curvature:
Probability Plot Examples
“Subtle Patterns” can be caused by randomness
Medtronic Confidential
Both datasets were
sampled directly from a
Normal distribution.
Probability Plot Examples
• Distribution does not pass the Anderson-Darling test, but the
lower tail of the distribution falls on the conservative side of the fitted line.
• Distribution appears to have a lower limit of zero
• It would be conservative to use the Normal model to estimate
the lower tail behavior.
| MDT Confidential35
Histograms in Minitab
The graph menu offers a histogram platform, but the graphical
summary platform offers more information with fewer clicks.
| MDT Confidential36
Histograms
• More intuitive than probability plots, since the x-y axes are not transformed.
• Not informative with small sample sizes (
39 | MDT Confidential
Why is Stability needed to Assess Distribution?
Distribution Assessment Risks
• Shift and Drift, and Variation in Key Xs
masks distribution
• Initial capability data always contains
Shift and Drift
• At Final Capability, process is stable
and variation in Key Xs is removed
Distribution Analysis Shift and Drift.mtw
100 samples from Week 1
25 samples from Week 2
100 samples from Week 3
Distribution applies to short term data only
MINITAB®
40 | MDT Confidential
Initial Process Data often have Shift and Drift
[Figure: I Chart of Initial Capability Data; Individual Value vs. Observation; X-bar = 19.93, UCL = 26.30, LCL = 13.55; many points flagged with a “1”]
41 | MDT Confidential
Long Term Data May not be Normal
[Figure: Probability plot of Initial Capability Data (vertical axis: Percent)]
Distribution Analysis
Numerical Methods
Numerical Methods
• For all numerical methods:
– A large (≥0.05) p-value implies there is no evidence against the hypothesized distribution.
– A small (
45 | MDT Confidential
Most Common Normality Tests
• Anderson-Darling (AD) test
• Ryan-Joiner test
Note: The Ryan-Joiner test is essentially
equivalent to the Shapiro-Wilk test.
Anderson-Darling
• Default approach in Minitab.
• May be used to assess fit of Normal and non-
Normal distributions.
• Gives unreliable results when data are
discretized/grouped, which is fairly common
when measurement system resolution is poor.
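For readers who want to see what the AD statistic measures, the sketch below computes the textbook Anderson-Darling statistic A² against a Normal distribution fitted by the sample mean and standard deviation. It is an illustration only: it does not compute a p-value or apply any small-sample adjustment, so its output is not expected to match Minitab's exactly. The two datasets are idealized, hypothetical examples built from quantiles.

```python
import math
from statistics import NormalDist, mean, stdev

def anderson_darling_normal(data):
    """Anderson-Darling A^2 against a Normal fitted by the sample mean and std dev."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    cdf = [fitted.cdf(x) for x in xs]
    # A^2 = -n - (1/n) * sum (2i - 1) [ln F(x_i) + ln(1 - F(x_(n+1-i)))]
    total = sum(
        (2 * i - 1) * (math.log(cdf[i - 1]) + math.log(1.0 - cdf[n - i]))
        for i in range(1, n + 1)
    )
    return -n - total / n

n = 50
probs = [(i - 0.5) / n for i in range(1, n + 1)]
bell_shaped = [NormalDist(10, 1.5).inv_cdf(p) for p in probs]   # idealized Normal(10, 1.5) data
right_skewed = [-math.log(1.0 - p) for p in probs]              # idealized long-tailed data

a2_bell = anderson_darling_normal(bell_shaped)
a2_skewed = anderson_darling_normal(right_skewed)
print(f"A^2 bell-shaped: {a2_bell:.3f}   A^2 right-skewed: {a2_skewed:.3f}")
```

Larger A² values indicate a worse fit; the statistic weights the tails heavily, which is one reason ties from rounding can inflate it.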
| MDT Confidential46
Anderson-Darling in Minitab
For assessing Normality:
| MDT Confidential47
Anderson-Darling in Minitab
For any/all distributions:
| MDT Confidential48
Anderson-Darling Results
Normal(10,1.5)
| MDT Confidential49
Normal(10,1.5)--Rounded
Ryan-Joiner
• Useful for discretized, rounded, or clumpy data
• Will not declare significant lack-of-fit simply due to poor
measurement resolution
• Recommended minimum of 5 “groups” to have a meaningful p-
value. Fewer groups may yield an overly optimistic (high) p-
value.
| MDT Confidential50
[Figure panels: Anderson-Darling; Ryan-Joiner]
Ryan-Joiner in Minitab
Truncation
• The Normal distribution may be used to model tail
behavior if it provides a conservative estimate of
those tails.
• This situation arises when data are truncated, which
is quantitatively captured as negative kurtosis.
Truncation
• In principle, truncated data may be evaluated graphically or through a Skewness-Kurtosis (SK) test.
• The SK test checks whether the tails of the Normal
distribution are longer or shorter than the tails of your
data.
• MECC has created and validated an Excel
spreadsheet (R134997) which executes the SK test.
• In practice, consult your local procedures to ensure
your analysis of truncated data is compliant.
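The link between truncation and negative kurtosis can be illustrated with a short Python sketch. It computes only the moment-based sample skewness and excess kurtosis; it is not the MECC spreadsheet or any validated SK test, and the function name, seed, and the truncation at ±2 standard deviations are assumptions made for the illustration.

```python
import random
from statistics import fmean


def skewness_and_excess_kurtosis(data):
    """Moment-based sample skewness and excess kurtosis (Normal gives 0, 0)."""
    n = len(data)
    centre = fmean(data)
    m2 = sum((x - centre) ** 2 for x in data) / n
    m3 = sum((x - centre) ** 3 for x in data) / n
    m4 = sum((x - centre) ** 4 for x in data) / n
    return m3 / m2**1.5, m4 / m2**2 - 3.0


random.seed(7)  # hypothetical seed, for repeatability only
full = [random.gauss(10, 1.5) for _ in range(2000)]

# Mimic parts sorted before measurement: keep only values within
# 2 standard deviations of the mean (7.0 to 13.0).
truncated = [x for x in full if 7.0 <= x <= 13.0]

skew_full, kurt_full = skewness_and_excess_kurtosis(full)
skew_trunc, kurt_trunc = skewness_and_excess_kurtosis(truncated)

print(f"Full sample:      skewness {skew_full:+.2f}, excess kurtosis {kurt_full:+.2f}")
print(f"Truncated sample: skewness {skew_trunc:+.2f}, excess kurtosis {kurt_trunc:+.2f}")
```

The truncated sample has shorter tails than a Normal distribution with the same mean and standard deviation, which shows up as negative excess kurtosis.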
Avoiding Parametric Distributions Altogether
• Chebyshev’s inequality captures the tail behavior of any statistical distribution with a finite variance.
– For any random variable X and constant k > 1,
P(|X − μ| ≥ kσ) ≤ 1/k²
• This inequality may be useful for skipping the issue of distributional fit altogether, especially if distributional fit is being assessed in order to compute a tolerance interval.
• Chebyshev’s will only be helpful if the process capability is extremely high.
• Consult your own procedures for details, but CRDM procedures invoke the following version of Chebyshev:
– If the nearest specification is at least 10 standard deviations away from the mean, it may be inferred by Chebyshev that at least 99% of the distribution will fall within specification.
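The 10-standard-deviation rule follows directly from the inequality with k = 10, since 1/k² = 0.01. A minimal Python sketch of that arithmetic is shown here; the function name and the example mean, standard deviation, and specification limits are hypothetical and are not taken from any procedure.

```python
def chebyshev_min_fraction_within(mean, sd, lsl=None, usl=None):
    """Distribution-free lower bound on the fraction within specification.

    Uses P(|X - mean| >= k*sd) <= 1/k**2, where k is the distance from the
    mean to the nearest specification limit in standard deviations.
    """
    if sd <= 0:
        raise ValueError("sd must be positive")
    limits = [limit for limit in (lsl, usl) if limit is not None]
    if not limits:
        raise ValueError("at least one specification limit is required")
    # The bound is only meaningful when the mean sits inside the limits.
    if (lsl is not None and mean <= lsl) or (usl is not None and mean >= usl):
        return 0.0
    k = min(abs(limit - mean) / sd for limit in limits)
    if k <= 1.0:
        return 0.0  # the inequality says nothing useful for k <= 1
    return 1.0 - 1.0 / k**2


# Hypothetical process: nearest limit (LSL) is exactly 10 SD from the mean.
bound = chebyshev_min_fraction_within(mean=50.0, sd=0.5, lsl=45.0, usl=56.0)
print(f"At least {bound:.2%} of the distribution is within specification")
```

With the nearest limit only 3 standard deviations away, the same bound drops to about 89%, which is why the approach helps only when capability is extremely high.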
Why Normality Tests Fail
1. A shift occurred in the middle of the data
2. Multiple sources or multiple failure modes with different distributions
3. Outliers
4. Piled up data.
5. Truncated data (sorted before you get it)
6. The underlying distribution is not normal (skewed)
7. Poor measurement resolution
8. Too much data (overpowered to detect non-normality)
9. Random chance: at 95% confidence, you expect the test to fail 5% of the time even if the data truly came from a normal distribution.
Resolving Non-Normality
No. | Cause | Possible resolutions
1 | Data shift | Sublot; skewness/kurtosis test; attribute sampling
2 | Multiple data sources | Sublot; skewness/kurtosis test; attribute sampling
3 | Outliers | Attribute sampling; outlier removal (may remove outliers only if they constitute typos or data collection errors)
4/5 | Censored/truncated data (tails lost) | Skewness/kurtosis test; conservative fitting; attribute sampling
6 | Distribution not normal | Non-normal analysis; transformation; attribute sampling
7 | Poor measurement resolution | Ryan-Joiner; skewness/kurtosis test
8 | Too much data | Graphical evidence; random subsampling
9 | Random chance | Historical assessment
When Multiple Distributions Fit
Prior engineering knowledge is particularly useful when multiple distributions yield p-values above 0.05:
– Picking the distribution solely based on best p-value or best R² is rational when there is absolutely no history or scientific theory.
– A better approach is to assemble a list of plausible (p>0.05) distributions and then make a final choice based upon history and science.
– P-values will sometimes be below 0.05 simply as a result of chance (Type I error). It is not recommended to immediately change years of analysis based on one significant p-value. Investigate and monitor before changing distributions.
Avoid the daily special
– Do NOT take the “distribution du jour” approach, in
which multiple distributions are chosen for a single
process. This reflects either:
• An out-of-control process, which can’t be
captured by a single distribution anyway.
• The bad statistical practice of just defaulting to
the distribution with the highest p-value.
Example: Capability for Non-Normal Data
using Tribal Knowledge for Distribution
Problem Statement: Time (in days) to process (reject/accept) loan applications is too long, causing loss of customer applications
Project Goal: Decrease potential customer loss from
15% to 5%. Customer expectation is 20 days.
Project Strategy: Path Y = Time
Task: Determine capability for Y = Time
LoanApplicationTime.MTW
Assume lead time has a LogNormal Distribution
MINITAB®
Verify Lognormal Distribution
Check if LogNormal provides a good fit
[Figure: Probability Plot of Time, Lognormal - 95% CI. X axis: Time (log scale, 1 to 100); y axis: Percent. Legend: Loc 2.269, Scale 0.6845, N 100, AD 0.432, P-Value 0.299.]
Capability for Non-Normal Data using LogNormal
[Figure: Process Capability of Time. Calculations Based on Lognormal Distribution Model. Histogram of Time (0 to 50) with the USL marked.]
Process Data: Sample N 100; Location 2.26918; Scale 0.684493; LSL *; Target *; USL 20; Sample Mean 12.31
Overall Capability: Z.Bench 1.06; Z.LSL *; Z.USL 0.47; Ppk 0.16
Observed Performance: PPM < LSL *; PPM > USL 160000; PPM Total 160000
Exp. Overall Performance: PPM < LSL *; PPM > USL 144242; PPM Total 144242
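The expected-performance figures in the capability output can be checked by hand from the fitted Location and Scale. The Python sketch below is an illustration only: Z.Bench and PPM > USL follow from the lognormal model directly, while the Ppk line assumes a percentile-based definition, (USL − median) / (99.865th percentile − median), which is an assumption about how the software computes it and is not stated in the source.

```python
import math
from statistics import NormalDist

# Values read from the Minitab capability output for Time.
location = 2.26918  # mean of ln(Time)
scale = 0.684493    # standard deviation of ln(Time)
usl = 20.0          # customer expectation, in days

std_normal = NormalDist()

# Benchmark Z: the USL expressed in standard deviations on the log scale.
z_bench = (math.log(usl) - location) / scale

# Expected fraction of applications taking longer than the USL, in PPM.
ppm_above_usl = (1.0 - std_normal.cdf(z_bench)) * 1_000_000

# Assumed percentile-based Ppk for a one-sided upper specification.
median = math.exp(location)
upper_percentile = math.exp(location + 3.0 * scale)  # about the 99.865th percentile
ppk = (usl - median) / (upper_percentile - median)

print(f"Z.Bench   = {z_bench:.2f}")
print(f"PPM > USL = {ppm_above_usl:.0f}")
print(f"Ppk       = {ppk:.2f}")
```

The results should land close to the reported Z.Bench of 1.06, expected PPM > USL of 144242, and Ppk of 0.16. About 14% of applications are expected to exceed 20 days, consistent with the 15% customer loss in the problem statement.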
Distribution Analysis: Transformations
Two Options
• When a dataset is non-Normal, it is acceptable either to
– Mathematically transform the data to achieve Normality
– Fit a non-Normal distribution
• Transformation carries the practical advantage that many statistical methods are based upon Normality, so there will be more analytical tools available for the transformed dataset.
• Transformation carries the disadvantages of creating unnatural units (e.g. log-meters instead of meters) and altering potentially relevant structures of the data.
• Note: Please do NOT try transformations of data from an unstable process, or bimodal data (two bumps).
Transformation Advice
• If a transformation is chosen, it should be as
simple as possible, and it should ideally have a
physical interpretation.
• A log transformation is particularly desirable,
since it
– Is monotonic
– Is straightforward to interpret (it turns multiplicative
effects into additive effects)