© 2001 ConceptFlow
Review of Analyze
Analyze Phase Deliverables
• A prioritized list of potential sources of variation
• Variation Component Studies
• Measurement Analysis on the x’s
• Data collected to validate sources
• Graphical and statistical analysis of data
• P-value establishing level of significance and probability
• Correlation and regression analysis to determine variable relationships
• Reduced list of potential key input variables that affect the output(s)
• Updated control charts, process map & FMEA
• Results to date (compared to baseline)
Define, Measure, Analyze, Improve, Control
Statistically links key input variables with key output variable
Analyze Week Topics
• Review of Measure Week
• Central Limit Theorem
• Confidence Intervals
• Introduction to Hypothesis Testing
• Hypothesis Testing
  • Means
  • Variance
  • Proportion
  • Chi Square
• Analysis of Variance (ANOVA)
• Variation Components
• Correlation and Simple Regression
• Multiple Regression
• Wrap-up and Deliverables
Central Limit Theorem Defined
• If variable x has an unknown distribution with mean = $\mu$ and standard deviation = $\sigma$, then
• Sampling distribution of $\bar{x}$ (mean) having sample size of n will
(1) have a mean, $\mu_{\bar{x}} = \mu$
(2) have a standard deviation, $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
(3) tend to be normal as the sample size becomes large (n > 30 for unknown distributions)
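A short simulation can make the three properties concrete. This is only an illustrative sketch, not part of the course material: it assumes an exponential population (mean 1, standard deviation 1), and the name `sample_means` is hypothetical.

```python
import random
import statistics

def sample_means(n, num_samples, rng):
    """Draw num_samples samples of size n from an exponential population
    (mean 1, standard deviation 1) and return the mean of each sample."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(num_samples)
    ]

rng = random.Random(42)  # fixed seed so the run is repeatable
for n in (1, 5, 30):
    means = sample_means(n, 5000, rng)
    print(
        f"n={n:2d}  mean of sample means={statistics.fmean(means):.3f}  "
        f"SD of sample means={statistics.stdev(means):.3f}  "
        f"sigma/sqrt(n)={1 / n ** 0.5:.3f}"
    )
```

As n grows, the mean of the sample means should stay near the population mean while their standard deviation shrinks toward sigma/sqrt(n).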
Standard Error of the Mean
$$SE_{\text{Mean}} = \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
where
$\sigma_{\bar{x}}$ = Standard Error of the Mean
$\sigma$ = Standard Deviation for the Individual Scores
$n$ = Sample Size for the mean
[Figure: Distribution of Sample Averages compared with the Population of Individuals]
Central Limit Theorem Objectives
By the end of this module the participant should be able to:
• Discuss the Central Limit Theorem (CLT) and demonstrate its results using a practical example
• Discuss the implications of Central Limit Theorem in statistical analysis
• Describe how to apply the Central Limit Theorem to reduce measurement variation
A Graphical View
A 95% confidence interval suggests that approximately 95 out of 100 confidence intervals will contain the population parameter
[Figure: Confidence Intervals around each Sample Mean, plotted against the Population Mean]
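The "95 out of 100" reading can be checked by simulation. The sketch below is an illustration under stated assumptions: a normal population with a known standard deviation, z-based intervals, and a hypothetical function name `coverage`.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, trials, confidence=0.95, seed=0):
    """Fraction of z-based confidence intervals that contain the true mean."""
    rng = random.Random(seed)
    k = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = k * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials

# Made-up population values, for illustration only.
print(coverage(mu=50.0, sigma=10.0, n=25, trials=2000))
```

The printed fraction should land close to the chosen confidence level, and it will vary slightly with the seed.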
Population Versus Sample
ENTIRE POPULATION: “Population Parameters”
μ = Population mean
σ = Population standard deviation
SAMPLE WITHIN (subset): “Sample Statistics”
X̄ = Sample mean
s = Sample standard deviation
If we only pull samples, do we ever know the true population parameters?
Estimating Confidence Intervals (CIs)
• Parametric confidence intervals in most cases take the general form:
CI = Sample Statistic ± Margin of Error
Margin of Error = K * Measure of Variability
Statistic = Mean, Variance, Proportion, etc. from sample
Confidence Factor, K = Constant based on a statistical distribution
• Confidence intervals reflect the sample to sample variation of our point estimates
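The general form can be sketched for a mean as follows. This is a minimal illustration, assuming the confidence factor K comes from the standard normal distribution (reasonable for larger samples; a t-based K would be wider for small ones). The function name and the data are hypothetical.

```python
from statistics import NormalDist, fmean, stdev

def mean_confidence_interval(data, confidence=0.95):
    """CI = sample mean ± K * (s / sqrt(n)), with K from the standard normal.

    Assumption: n is large enough for a normal K to be reasonable.
    """
    n = len(data)
    xbar = fmean(data)                               # sample statistic
    std_error = stdev(data) / n ** 0.5               # measure of variability
    k = NormalDist().inv_cdf(0.5 + confidence / 2)   # confidence factor K
    margin = k * std_error                           # margin of error
    return xbar - margin, xbar + margin

# Made-up measurements, for illustration only.
data = [10, 12, 11, 13, 9, 12, 11, 10]
low, high = mean_confidence_interval(data)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

Raising the confidence level increases K and therefore widens the interval.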
Confidence Interval and Central Limit Theorem
[Figure: frequency histograms of the Population and of a Sample (x-axis 30 to 100), and a normal curve of the Probability of Sample Value marking 68.26%, 95.44%, and 99.73% within ±1, ±2, and ±3 standard errors]
95% of all sample means are within two “standard errors” of the population mean
Confidence Interval Objectives
By the end of this module participants should be able to:
• Discuss the role of confidence intervals in statistical analysis
• Discuss the meaning of confidence intervals in practical terms
• Calculate confidence intervals for the mean, standard deviation, proportion and other derived parameters such as Cp and Pp
What is Hypothesis Testing?
• In hypothesis testing, relatively small samples are used to answer questions about population parameters (inferential statistics)
• There is always a chance that the selected sample is not representative of the population; therefore, there is always a chance that the conclusion obtained is wrong (Alpha & Beta Risks)
• With some assumptions, inferential statistics allows the estimation of the probability of getting an “odd” sample and quantifies the probability (p-value) of a wrong conclusion
Process Flow of a Hypothesis Test
1. Define the problem and state objectives
2. State a “Null Hypothesis” (Ho)
3. State the “Alternate Hypothesis” (Ha)
4. Establish significance level (α)
5. Collect sample data
6. Calculate test statistic and/or p-value
7. DECIDE: What does the evidence suggest? Reject Ho, or fail to reject Ho?
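The flow can be traced in code with the simplest case, a two-sided 1-sample Z-test. This is a sketch under assumptions that are not from the slides: the population standard deviation is known, the data are made up, and the function name is hypothetical.

```python
from statistics import NormalDist, fmean

def one_sample_z_test(data, target, sigma, alpha=0.05):
    """Two-sided 1-sample Z-test of Ho: mu = target against Ha: mu != target.

    sigma is the known population standard deviation; alpha is the
    significance level chosen before the data are collected.
    """
    n = len(data)
    z = (fmean(data) - target) / (sigma / n ** 0.5)   # test statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))      # two-sided p-value
    decision = "reject Ho" if p_value < alpha else "fail to reject Ho"
    return z, p_value, decision

# Made-up sample, target, and sigma, for illustration only.
sample = [5.2, 4.9, 5.4, 5.1, 5.3, 5.0, 5.5, 5.2]
z, p, decision = one_sample_z_test(sample, target=5.0, sigma=0.2)
print(f"z = {z:.2f}, p-value = {p:.4f}, decision: {decision}")
```

The decision depends only on comparing the p-value with the significance level that was fixed in advance.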
Forming a Hypothesis
• Null Hypothesis (Ho)
  • No difference / no change
  • Factor not statistically significant
  • Population follows a normal distribution
• Alternative Hypothesis (Ha)
  • Difference/change occurred
  • Factor statistically significant
  • Population does not follow a normal distribution
Assume Ho to be true until proven otherwise. Burden of proof rests with Ha
α (Alpha) - Simplified Perspective
Null Hypothesis (Ho) assumed true
• e.g., defendant assumed innocent
• Prosecuting attorney must provide evidence beyond reasonable doubt that assumption is not true
• Reasonable doubt = α (significance level)
Alpha (α) & Beta (β) Risk
α-risk
• Risk of finding a difference when there really isn’t one
• Type I error or Producers’ risk
β-risk
• Risk of not finding a difference when there really is one
• Type II error or Consumers’ risk
Sensitivity
δ/σ, where δ = size of difference and σ = SD
• Relative magnitude or size of the difference being tested, expressed in standard deviations
• Called test sensitivity
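One practical use of the sensitivity ratio is sample-size planning. The sketch below uses the common normal-approximation formula for a two-sided 1-sample Z-test, n = ((z for alpha/2 + z for beta) / (delta/sigma))^2. The formula choice and the function name are assumptions for illustration, not taken from the slides.

```python
import math
from statistics import NormalDist

def sample_size(delta, sigma, alpha=0.05, beta=0.10):
    """Approximate n for a two-sided 1-sample Z-test to detect a shift of
    delta, given alpha-risk alpha and beta-risk beta."""
    sensitivity = delta / sigma                    # difference in SD units
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha/2
    z_beta = NormalDist().inv_cdf(1 - beta)        # critical value for beta
    return math.ceil(((z_alpha + z_beta) / sensitivity) ** 2)

for ratio in (2.0, 1.0, 0.5):
    print(f"delta/sigma = {ratio}: n = {sample_size(ratio, 1.0)}")
```

Halving the sensitivity roughly quadruples the required sample size, which is why small differences are expensive to detect.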
The Relationship in Hypothesis Testing
Truth \ Decision | Fail to reject Ho | Reject Ho
Ho true | Correct Decision (CI = 1 - α) | Type I Error (α-Risk, or false positive); Producers’ Risk
Ha true | Type II Error (β-Risk, or false negative); Consumers’ Risk | Correct Decision (Power = 1 - β)
Test Statistic and p-value Graphical View
[Figure: distribution centered at 0, marking the Critical value (α-risk area) and the Observed value of Test Statistic (p-value area)]
Hypothesis Testing Introduction Objectives
By the end of this module participants should be able to:
• Discuss the hypothesis testing process
• Recognize α and β risks and how they affect hypothesis testing
• Discuss how the p-value is used for decision making
• Relate the hypothesis testing process to real world examples
Comparison of Means: 4 Scenarios
1. Single Mean Comparison
• One sample vs. target (μ vs. target value)
• σ is known
2. Single Mean Comparison
• One sample vs. target (μ vs. target value)
• σ is NOT known
Comparison of Means: 4 Scenarios
3. Two Sample Comparison
• Two independent samples compared to each other (μ1 vs. μ2)
4. Paired Comparison
• The difference (“δ”) between two paired samples (μ1 - μ2 = μd; μd vs. target)
Hypothesis Testing of Means - Roadmap
Comparing Means
• 1 Factor
  • 1 Sample
    • σ known: 1-sample Z-test
    • σ not known: 1-sample t-test
  • 2 Samples
    • independent: 2-sample t-test
    • paired: Paired t-test
  • 2 or more samples: One way ANOVA
• 2 Factors: Two way ANOVA
• 3 or more factors: ANOVA GLM
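The roadmap is a small decision tree, so it can be written as a lookup function. This is a hypothetical helper for illustration. One choice here is an assumption: for exactly two samples it returns a t-test, although the roadmap also allows One way ANOVA for "2 or more samples".

```python
def choose_means_test(factors, samples=1, sigma_known=False, paired=False):
    """Return the name of the test the means roadmap points to."""
    if factors >= 3:
        return "ANOVA GLM"
    if factors == 2:
        return "Two way ANOVA"
    # One factor: the choice depends on the number of samples.
    if samples == 1:
        return "1-sample Z-test" if sigma_known else "1-sample t-test"
    if samples == 2:
        return "Paired t-test" if paired else "2-sample t-test"
    return "One way ANOVA"

print(choose_means_test(factors=1, samples=1, sigma_known=True))
print(choose_means_test(factors=1, samples=2, paired=True))
print(choose_means_test(factors=1, samples=4))
```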
Means Hypothesis Testing Objectives
By the end of this module the participant should be able to:
• Choose the appropriate test for a given problem regarding population mean
• Perform hypothesis tests of mean
• Design and apply hypothesis tests of mean on projects
Comparison of Variance: 3 Scenarios
1. Single Variance Comparison
• One population standard deviation compared to a target value (σ vs. target value)
2. Two Sample Comparison
• Variances of two independent populations compared to each other (σ1² vs. σ2²)
Comparison of Variance: 3 Scenarios
3. More than Two Sample Comparison
• Variances of more than two independent populations compared to each other (σ1² vs. σ2² vs. σ3²)
Hypothesis Testing of Variation - Roadmap
Comparing Variances
• 1 Sample: 1 Variance Test (Descriptive Statistics)
• 2 Sample: 2 Variance Test (F-Test, Levene’s Test)
• More Than 2 Samples: Test for Equal Variance (Bartlett’s Test, Levene’s Test)
Variation Hypothesis Testing Objectives
By the end of this module participants should be able to:
• Choose the appropriate test of variance for a given problem
• Perform hypothesis tests of variance
• Design and apply hypothesis tests of variance on projects
Comparison of Proportion: 2 Scenarios
1. Single Proportion Comparison
• One population proportion compared to a target value (P vs. target value)
2. Two Sample Comparison
• Proportions of two independent populations compared to each other (P1 vs. P2)
Hypothesis Testing of Proportion - Roadmap
Comparing Proportions
• 1 Sample: 1 Proportion Test
• 2 Sample: 2 Proportion Test
• More than 2 samples: Chi-Square Test
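For the two-sample branch, a 2 proportion comparison can be sketched with the pooled normal approximation. This is an illustrative assumption: statistical packages may use other methods, such as exact tests, and the counts below are made up.

```python
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided test of Ho: P1 = P2 using the pooled normal approximation.

    x1, x2 are event counts; n1, n2 are the sample sizes.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                       # estimate under Ho
    se = (pooled * (1 - pooled) * (1 / n1 + 1 / n2)) ** 0.5
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Made-up defect counts from two processes, for illustration only.
z, p = two_proportion_z_test(30, 100, 10, 100)
print(f"z = {z:.3f}, p-value = {p:.5f}")
```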
Proportion Hypothesis Testing Objectives
By the end of this module participants should be able to:
• Choose the appropriate test of proportion for a given problem
• Perform hypothesis tests of proportion
• Determine sample size for 1 proportion and 2 proportion hypothesis testing
• Design and apply hypothesis tests of proportion on projects
What Are Chi-Square Tools?
• Chi-Square Goodness-of-Fit Test
  • To test if a particular distribution (model) is a good fit for a population
• Chi-Square Test for Association
  • To test if a relationship between two attribute variables exists
Both of these tools use the Chi-Square distribution, where $f_o$ and $f_e$ are the observed and expected frequencies, respectively.
Chi-Square Statistic
$$\chi^2 = \sum_{j=1}^{g} \frac{(f_o - f_e)^2}{f_e}$$
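The statistic is a direct sum over cells, so it is easy to compute by hand or in code. The sketch below uses made-up counts and hypothetical function names. For the test for association it assumes the usual expected count of row total times column total divided by the grand total.

```python
def chi_square_statistic(observed, expected):
    """Sum of (fo - fe)^2 / fe over all cells."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

def expected_counts(table):
    """Expected cell counts under no association:
    row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

# Goodness of fit: 60 made-up die rolls against a uniform model.
rolls = [8, 12, 10, 9, 11, 10]
print(chi_square_statistic(rolls, [10] * 6))

# Association: made-up 2x2 table of attribute counts.
table = [[10, 20], [20, 10]]
exp = expected_counts(table)
flat_obs = [cell for row in table for cell in row]
flat_exp = [cell for row in exp for cell in row]
print(chi_square_statistic(flat_obs, flat_exp))
```

The resulting value is then compared against the Chi-Square distribution with the appropriate degrees of freedom.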
The Chi-Square Distribution
• Measure of difference between observed counts and expected counts
• Observations must be independent
• Works best with 5 or more observations in each cell
• Cells may be combined to pool observations
[Figure: Chi-square distribution for various degrees of freedom (ν = 2, 4, 6, 10); x-axis: χ², y-axis: value of the χ² distribution]
Chi Square Hypothesis Testing Objectives
By the end of this module the participants should be able to:
• Formulate appropriate hypotheses for Chi-Square tests
• Apply Chi-Square Goodness-of-Fit Test to practical problems
• Apply Chi-Square Test for Association to practical problems
What is ANOVA?
• Hypothesis Test for MEANS
• Uses two components of variance
  • within variance (no change)
  • between variance (after a change)
• Uses the F-distribution to test the variance components
• Comprehensive test for significance
• Backbone test statistic for subsequent complex analysis
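The two variance components can be computed directly for a one-way layout. This is a minimal sketch with made-up data and a hypothetical function name. It returns only the F ratio, since the standard library has no F-distribution for the p-value.

```python
from statistics import fmean

def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean(x for g in groups for x in g)
    # Between variance: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within variance: scatter of the observations around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Three made-up groups, for illustration only.
print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))
```

A large F means the between variance is large relative to the within variance, which is evidence that at least one mean differs.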
When to Use ANOVA
Variables Road Map
Variables Data
• 1 Mean (1 Sample): 1 Sample t-test
• 2 Means (2 Samples): 2 Sample t-test, Paired Comparisons, Tukey's Quick Test
• 2+ Means (2 or more samples): ANOVA
ANOVA is used to test two or more means
Working With the ANOVA Data
• ANOVA data analysis will determine
  • Total process variance
  • Within factor variance
    • Variation due to noise
    • Technology focus
  • Between factor variance
    • Variation due to factor change
    • Process focus
ANOVA Objectives
By the end of this module, the participant should be able to:
• Explain how ANOVA works
• Interpret an ANOVA table
• Determine significant effects
• Perform a residual analysis
• Determine if data is normal
• Test groups of data for equal variances
• Run main effects plots
What is a Variation Component Study?
• A variation component study combines techniques from familiar areas:
  • Shewhart control chart model
  • Rational sub-grouping
  • Measurement systems analysis
  • Graphical, Multi-Variate charts
  • Analysis of variance (ANOVA) methods
• This type of study partitions potential sources of variation within a process so the researcher will know where to work first
Crossed Versus Nested Studies
Crossed Study: Subjects are not unique to one group
• Group 1: Subject 1, Subject 2, Subject 3
• Group 2: Subject 1, Subject 2, Subject 3
• ...
• Group k: Subject 1, Subject 2, Subject 3
Nested Study: Subjects are unique to one group
• Group 1: Subject 1, Subject 2, Subject 3
• Group 2: Subject 4, Subject 5, Subject 6
• ...
• Group k: Subject 16, Subject 17, Subject 18
Variation Component Studies Objectives
By the end of this module the participant should be able to:
• Design appropriate sampling plans for variation component studies
• Recognize whether data is crossed, nested or both and model the scenarios using ANOVA
• Analyze studies
  • Graphically
  • With control charts
  • Using ANOVA methods
• Provide estimates of variation components (quantify)
• Provide guidance/direction for process improvement
Correlation Coefficient
[Figure: three scatter plots of Y versus X illustrating r = -1.0, r = +1.0, and r = 0.0 (No correlation)]
Correlation and Regression
• Correlation tells how much linear association exists between two variables
• Regression provides an equation describing the nature of relationship
Correlations: Shelf Space, Sales
Pearson correlation of Shelf Space and Sales = 0.978
p-value = 0.000
Regression Analysis: Sales versus Shelf Space
The regression equation is Sales = - 4711 + 10.1 Shelf Space
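The Pearson correlation reported in output like this can be computed from the sums of squares and cross-products. The sketch below uses made-up numbers, not the Shelf Space and Sales data (which are not included in the slides), and the function name is hypothetical.

```python
from statistics import fmean

def pearson_r(xs, ys):
    """Pearson correlation: Sxy / sqrt(Sxx * Syy)."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Made-up paired observations, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(f"r = {pearson_r(x, y):.3f}")
```

A value near +1 or -1 indicates a strong linear association; it says nothing about whether the relationship is causal.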
Types of Regression
• Simple Linear Regression
  • Single regressor (x) variable such as x1 and model linear with respect to coefficients
• Multiple Linear Regression
  • Multiple regressor (x) variables such as x1, x2, x3 and model linear with respect to coefficients
• Simple Non-Linear Regression
  • Single regressor (x) variable such as x and model non-linear with respect to coefficients
• Multiple Non-Linear Regression
  • Multiple regressor (x) variables such as x1, x2, x3 and model non-linear with respect to coefficients
Method of Least Squares
Objective:
• Find a line that will minimize sum of squares of residuals
Residual = Y - Ŷ
Residuals are the error of prediction
[Figure: Regression Plot of Sales versus Shelf Space, showing the Regression Line (Ŷ), an observed value (Y), and the residual between them]
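For a single predictor, the line that minimizes the sum of squared residuals has a closed form: slope = Sxy / Sxx and intercept = mean(Y) - slope * mean(X). The sketch below applies it to made-up data with hypothetical function names.

```python
from statistics import fmean

def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the data."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope

def residuals(xs, ys, intercept, slope):
    """Residual = Y - Y-hat for each observation."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Made-up observations, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = least_squares_line(x, y)
res = residuals(x, y, b0, b1)
print(f"Y-hat = {b0:.3f} + {b1:.3f} X")
print("sum of squared residuals:", round(sum(r * r for r in res), 4))
```

With an intercept in the model, the residuals always sum to zero (up to rounding), which is a quick sanity check on any fit.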
Correlation and Simple Regression Objectives
By the end of this module the participant should be able to:
• Measure the strength of correlation between two variables
• Determine if a correlation coefficient is statistically significant
• Perform simple linear regression including polynomial regression
• Perform model diagnostics and validate assumptions
• Use a regression model to predict the value of a response variable for a given value of predictor
What is Multiple Regression?
• Procedure of establishing relationship between a continuous type response variable and two or more independent variables
• Multiple regression equation can be used to predict a response based on values of predictor variables
• Multiple regression equation takes the form
Y = f (x1, x2, x3, ….)
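When f is linear in its coefficients, the least-squares fit can be obtained by solving the normal equations. The sketch below is one way to do that using only the standard library. The data and the function names `solve` and `fit_multiple_linear` are hypothetical, and a statistics package would normally be used in practice.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_multiple_linear(rows, y):
    """Least-squares coefficients [b0, b1, b2, ...] for
    Y = b0 + b1*x1 + b2*x2 + ..., from the normal equations (X'X) b = X'y."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept column
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)

# Made-up data generated from Y = 1 + 2*x1 + 3*x2, for illustration only.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
print([round(b, 6) for b in fit_multiple_linear(rows, y)])
```

Because the example data contain no noise, the fit should recover the generating coefficients; real data would also need the diagnostics described later in the module.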
Types of Multiple Regression
• Multiple Linear Regression
• Multiple regressor (x) variables such as x1, x2, x3 and model linear with respect to coefficients
• Multiple Non-Linear Regression
• Multiple regressor (x) variables such as x1, x2, x3 and model non-linear with respect to coefficients
This module focuses on multiple linear regression applying general least squares method
Predictor Variable Selection
• What combination of predictor variables is best for the regression model?
• Three options in MINITAB™:
  • Stepwise: procedure to add and remove variables to the regression model to produce a useful subset of predictors
  • Best Subsets: procedure to give the best fitting regression model that can be constructed with one variable, two variable, three variable, etc. models
  • Regression: once the best model is selected, use Regression to get more detailed diagnostics
Multiple Regression Objectives
By the end of this module the participant should be able to:
• Determine, for a given response variable, the key process input variables from a set of multiple input variables
• Perform multiple linear regression for a given set of response variables using several input variables
• Perform model diagnostics and validate assumptions
• Use a regression model to predict the value of a response variable for given values of predictor variables
Analyze Phase Deliverables
Prepare and deliver a 10 minute presentation that discusses the following project status items:
• Week 1 Deliverables summarized and updated
• Revised problem statement reflecting an increased understanding of the problem
• Detailed Process Map revised
• Additional sources of variation quantified and prioritized
• Use and display data to identify and verify the “vital few” factors
• Sampling plan
• Graphical analysis and interpretation of data
• Correlation and Regression Analysis
• Confidence interval for Y metric(s)
• Hypothesis statement(s), null hypothesis and alternative hypothesis
• MINITAB hypothesis test output, p value and interpretation
• Project management report (Gantt chart, timelines, milestones, critical path)
• Any red flags with project or project scope and recommendations to resolve
• Next steps
• Signed approval of report out by Project Champion
Appendix
Non-Parametric Hypothesis Testing Roadmap
Non-Parametric Tests
• Binomial (Dichotomous)
  • Independent: Mann-Whitney U (t-test analog)
  • Dependent: Wilcoxon Sign (Paired t-test analog)
• 3 or more Levels
  • Independent: Kruskal-Wallis H (One way ANOVA analog)
  • Dependent: Friedman Two way ANOVA (Repeated measure ANOVA)
Trademarks and Service Marks
Six Sigma is a federally registered trademark of Motorola, Inc.
Breakthrough Strategy is a federally registered trademark of Six Sigma Academy.
VISION. FOR A MORE PERFECT WORLD is a federally registered trademark of Six Sigma Academy.
ESSENTEQ is a trademark of Six Sigma Academy.
FASTART is a trademark of Six Sigma Academy.
Breakthrough Design is a trademark of Six Sigma Academy.
Breakthrough Lean is a trademark of Six Sigma Academy.
Design with the Power of Six Sigma is a trademark of Six Sigma Academy.
Legal Lean is a trademark of Six Sigma Academy.
SSA Navigator is a trademark of Six Sigma Academy.
SigmaCALC is a trademark of Six Sigma Academy.
iGrafx is a trademark of Micrografx, Inc.
SigmaTRAC is a trademark of DuPont.
MINITAB is a trademark of Minitab, Inc.