3-1 Review of Analyze Phase

© 2001 ConceptFlow

Review of Analyze


Analyze Phase Deliverables

• A prioritized list of potential sources of variation
• Variation Component Studies
• Measurement Analysis on the x’s
• Data collected to validate sources
• Graphical and statistical analysis of data
• P-value establishing level of significance and probability
• Correlation and regression analysis to determine variable relationships
• Reduced list of potential key input variables that affect the output(s)
• Updated control charts, process map & FMEA
• Results to date (compared to baseline)

Define → Measure → Analyze → Improve → Control

Statistically links key input variables with key output variable


Analyze Week Topics

• Review of Measure Week
• Central Limit Theorem
• Confidence Intervals
• Introduction to Hypothesis Testing
• Hypothesis Testing
  • Means
  • Variance
  • Proportion
  • Chi Square
• Analysis of Variance (ANOVA)
• Variation Components
• Correlation and Simple Regression
• Multiple Regression
• Wrap-up and Deliverables


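Central Limit Theorem Defined

• If variable $x$ has an unknown distribution with mean = $\mu$ and standard deviation = $\sigma$, then

• The sampling distribution of $\bar{x}$ (the mean) for samples of size $n$ will

(1) have a mean $\mu_{\bar{x}} = \mu$,

(2) have a standard deviation $\sigma_{\bar{x}} = \sigma / \sqrt{n}$,

(3) tend to be normal as the sample size becomes large ($n > 30$ for unknown distributions)

[Figure: distribution of individuals ($x$) compared with the distribution of sample means ($\bar{x}$) for sample size $n$]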


Standard Error of the Mean

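SE Mean = $\sigma_{\bar{x}} = \dfrac{\sigma_x}{\sqrt{n}}$

where
$\sigma_{\bar{x}}$ = Standard Error of the Mean
$\sigma_x$ = Standard Deviation for the Individual Scores
$n$ = Sample Size for the mean

[Figure: Population of Individuals ($x$) and Distribution of Sample Averages ($\bar{x}$)]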
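The standard error relationship can be checked with a small simulation. The sketch below is illustrative only: it assumes a skewed exponential population with mean 1 and standard deviation 1, and the function name `simulate_sample_means` and all parameter values are hypothetical choices, not part of the course material.

```python
import random
import statistics

def simulate_sample_means(n, num_samples=5000, seed=1):
    """Draw num_samples samples of size n from an exponential population
    (mean 1, standard deviation 1) and return the list of sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(num_samples)
    ]

n = 36
means = simulate_sample_means(n)
observed_mean = statistics.fmean(means)   # should be close to the population mean
observed_se = statistics.stdev(means)     # spread of the sample averages
predicted_se = 1.0 / n ** 0.5             # sigma_x / sqrt(n)

print(f"mean of sample means: {observed_mean:.3f} (population mean 1.0)")
print(f"SD of sample means:   {observed_se:.3f} (predicted {predicted_se:.3f})")
```

Even though the individual values are far from normal, the sample averages cluster around the population mean with a spread close to the predicted standard error.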


Central Limit Theorem Objectives

By the end of this module the participant should be able to:
• Discuss the Central Limit Theorem (CLT) and demonstrate its results using a practical example
• Discuss the implications of the Central Limit Theorem in statistical analysis
• Describe how to apply the Central Limit Theorem to reduce measurement variation


A Graphical View

A 95% confidence level means that approximately 95 out of 100 confidence intervals built from repeated samples will contain the population parameter

[Figure: confidence intervals around successive sample means, compared with the population mean]


Population Versus Sample

“Population Parameters”
$\mu$ = Population mean
$\sigma$ = Population standard deviation

“Sample Statistics”
$\bar{X}$ = Sample mean
$s$ = Sample standard deviation

[Figure: Sample (subset) within Entire Population]

If we only pull samples, do we ever know the true population parameters?


Estimating Confidence Intervals (CIs)

• Parametric confidence intervals in most cases take the general form:

CI = Sample Statistic ± Margin of Error

Margin of Error = K × Measure of Variability

Statistic = Mean, Variance, Proportion, etc. from sample

Confidence Factor, K = Constant based on a statistical distribution

• Confidence intervals reflect the sample-to-sample variation of our point estimates
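The general form can be sketched for the mean as follows. This is a minimal example under stated assumptions: K is taken from the standard normal distribution (a t-based K would be more appropriate for a sample this small), and the function name `mean_confidence_interval` and the `cycle_times` data are hypothetical.

```python
from statistics import NormalDist, fmean, stdev

def mean_confidence_interval(data, confidence=0.95):
    """Return (lower, upper) using CI = statistic +/- K * measure of variability.

    Assumption: K comes from the standard normal distribution; small
    samples would normally use a K from the t-distribution instead.
    """
    n = len(data)
    x_bar = fmean(data)                          # sample statistic
    standard_error = stdev(data) / n ** 0.5      # measure of variability
    k = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # confidence factor
    margin = k * standard_error                  # margin of error
    return x_bar - margin, x_bar + margin

# Hypothetical cycle-time measurements
cycle_times = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3,
               12.2, 11.7, 12.5, 12.0, 12.1, 11.9]
lower, upper = mean_confidence_interval(cycle_times)
print(f"95% CI for the mean: ({lower:.3f}, {upper:.3f})")
```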


Confidence Interval and Central Limit Theorem

[Figure: three panels. Population histogram (Frequency vs. value); Sample histogram (Frequency vs. value); normal curve of the Probability of Sample Value showing 68.26%, 95.44%, and 99.73% of values within ±1, ±2, and ±3 standard errors]

Approximately 95% of all sample means are within two “standard errors” of the population mean


Confidence Interval Objectives

By the end of this module participants should be able to:
• Discuss the role of confidence intervals in statistical analysis
• Discuss the meaning of confidence intervals in practical terms
• Calculate confidence intervals for the mean, standard deviation, proportion and other derived parameters such as Cp and Pp


What is Hypothesis Testing?

• In hypothesis testing, relatively small samples are used to answer questions about population parameters (inferential statistics)

• There is always a chance that the selected sample is not representative of the population; therefore, there is always a chance that the conclusion obtained is wrong (Alpha & Beta Risks)

• With some assumptions, inferential statistics allows the estimation of the probability of getting an “odd” sample and quantifies the probability (p-value) of a wrong conclusion


Process Flow of a Hypothesis Test

1. Define the problem and state objectives

2. State a “Null Hypothesis” (Ho)

3. State the “Alternate Hypothesis” (Ha)

4. Establish significance level ($\alpha$)

5. Collect sample data

6. Calculate test statistic and/or p-value

7. DECIDE: What does the evidence suggest? Reject Ho? Or fail to reject Ho?
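The flow can be traced end to end with a two-sided 1-sample Z-test. This is a sketch, not the course's worked example: the function name `one_sample_z_test`, the fill-weight data, the target of 50.0, and the assumed known standard deviation of 0.5 are all hypothetical.

```python
from statistics import NormalDist, fmean

def one_sample_z_test(data, target, sigma, alpha=0.05):
    """Two-sided 1-sample Z-test of Ho: mean = target (sigma assumed known)."""
    n = len(data)
    # Test statistic: distance from target in standard errors
    z = (fmean(data) - target) / (sigma / n ** 0.5)
    # p-value: probability of a result at least this extreme if Ho is true
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    decision = "Reject Ho" if p_value < alpha else "Fail to reject Ho"
    return z, p_value, decision

# Ho: mean fill weight = 50.0    Ha: mean fill weight != 50.0
fill_weights = [50.2, 49.8, 50.5, 50.9, 50.4, 50.7, 50.1, 50.6, 50.3]
z, p_value, decision = one_sample_z_test(fill_weights, target=50.0, sigma=0.5)
print(f"z = {z:.3f}, p-value = {p_value:.4f}, decision: {decision}")
```

The significance level is fixed before the data are examined; the decision then follows mechanically from comparing the p-value with it.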


Forming a Hypothesis

• Null Hypothesis (Ho)
  • No difference / no change
  • Factor not statistically significant
  • Population follows a normal distribution

• Alternative Hypothesis (Ha)
  • Difference/change occurred
  • Factor statistically significant
  • Population does not follow a normal distribution

Assume Ho to be true until proven otherwise. Burden of proof rests with Ha


$\alpha$ (Alpha) - Simplified Perspective

Null Hypothesis (Ho) assumed true

• e.g., defendant assumed innocent
• Prosecuting attorney must provide evidence beyond reasonable doubt that assumption is not true
• Reasonable doubt = $\alpha$ (significance level)


Alpha ($\alpha$) & Beta ($\beta$) Risk

$\alpha$-risk
• Risk of finding a difference when there really isn’t one
• Type I error or Producers’ risk

$\beta$-risk
• Risk of not finding a difference when there really is one
• Type II error or Consumers’ risk


Sensitivity

$\delta / \sigma$, where $\delta$ = size of difference and $\sigma$ = SD
• Relative magnitude or size of the difference being tested, expressed in standard deviations
• Called test sensitivity

[Figure: distribution regions labeled $1-\beta$ and $\alpha/2$]
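Sensitivity ties directly to sample size. The sketch below uses the standard normal-approximation formula for a two-sided 1-sample Z-test, $n = \left( (z_{1-\alpha/2} + z_{1-\beta}) / (\delta/\sigma) \right)^2$, which is not stated on the slide and is included here as an assumption; the function name `sample_size_for_sensitivity` is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_for_sensitivity(delta, sigma, alpha=0.05, beta=0.10):
    """Approximate n needed to detect a shift of delta with the given risks.

    Assumption: two-sided 1-sample Z-test with known sigma.
    """
    sensitivity = delta / sigma                     # difference in SD units
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # guards against Type I error
    z_beta = NormalDist().inv_cdf(1 - beta)         # guards against Type II error
    return math.ceil(((z_alpha + z_beta) / sensitivity) ** 2)

n_one_sd = sample_size_for_sensitivity(delta=1.0, sigma=1.0)
n_half_sd = sample_size_for_sensitivity(delta=0.5, sigma=1.0)
print(n_one_sd, n_half_sd)
```

Halving the sensitivity roughly quadruples the required sample size, which is why small differences are expensive to detect.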


The Relationship in Hypothesis Testing

Decision | Truth: Ho true | Truth: Ha true
Fail to reject Ho | Correct Decision (CI = $1-\alpha$) | Type II Error ($\beta$-risk or false negative); Consumers’ Risk
Reject Ho | Type I Error ($\alpha$-risk or false positive); Producers’ Risk | Correct Decision (Power = $1-\beta$)


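Test Statistic and p-value Graphical View

[Figure: distribution of the test statistic centered at 0, marking the critical value and the observed value of the test statistic; the tail area beyond the critical value is the $\alpha$-risk and the tail area beyond the observed value is the p-value]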


Hypothesis Testing Introduction Objectives

By the end of this module participants should be able to:
• Discuss the hypothesis testing process
• Recognize $\alpha$ and $\beta$ risks and how they affect hypothesis testing
• Discuss how the p-value is used for decision making
• Relate the hypothesis testing process to real world examples


Comparison of Means: 4 Scenarios

1. Single Mean Comparison

• One sample vs. target value

• $\sigma$ is known

2. Single Mean Comparison

• One sample vs. target value

• $\sigma$ is NOT known


Comparison of Means: 4 Scenarios

3. Two Sample Comparison

• Two independent samples compared to each other ($\mu_1$ vs. $\mu_2$)

4. Paired Comparison

• The difference (“d”) between two paired samples ($x_1 - x_2 = d$; $\bar{d}$ vs. target)


Hypothesis Testing of Means-Roadmap

Comparing Means

• 1 Factor
  • 1 Sample
    • $\sigma$ known: 1-sample Z-test
    • $\sigma$ not known: 1-sample t-test
  • 2 Samples
    • independent: 2-sample t-test
    • paired: Paired t-test
  • 2 or more samples: One way ANOVA
• 2 Factors: Two way ANOVA
• 3 or more factors: ANOVA GLM
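One way to read the roadmap is as a decision function. The sketch below is an interpretation of the slide, not an official selection rule: the function name `choose_means_test` and its parameters are hypothetical, and it assumes that two samples are routed to a t-test while three or more go to one way ANOVA.

```python
def choose_means_test(factors, samples=1, sigma_known=False, paired=False):
    """Return the name of the means test suggested by the roadmap."""
    if factors >= 3:
        return "ANOVA GLM"
    if factors == 2:
        return "Two way ANOVA"
    # Single factor: branch on the number of samples
    if samples == 1:
        return "1-sample Z-test" if sigma_known else "1-sample t-test"
    if samples == 2:
        return "Paired t-test" if paired else "2-sample t-test"
    return "One way ANOVA"

print(choose_means_test(factors=1, samples=1, sigma_known=True))
print(choose_means_test(factors=1, samples=2, paired=True))
print(choose_means_test(factors=1, samples=4))
```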


Means Hypothesis Testing Objectives

By the end of this module participants should be able to:
• Choose the appropriate test for a given problem regarding population mean
• Perform hypothesis tests of mean
• Design and apply hypothesis tests of mean on projects


Comparison of Variance: 3 Scenarios

1. Single Variance Comparison

• One population standard deviation compared to a target value ($\sigma$ vs. target value)

2. Two Sample Comparison

• Variances of two independent populations compared to each other ($\sigma_1^2$ vs. $\sigma_2^2$)


Comparison of Variance: 3 Scenarios

3. More than Two Sample Comparison

• Variances of more than two independent populations compared to each other

$\sigma_1^2$ vs. $\sigma_2^2$ vs. $\sigma_3^2$


Hypothesis Testing of Variation - Roadmap

Comparing Variances

• 1 Sample: 1 Variance Test
  • Descriptive Statistics
• 2 Samples: 2 Variance Test
  • F-Test
  • Levene’s Test
• More Than 2 Samples: Test for Equal Variance
  • Bartlett’s Test
  • Levene’s Test


Variation Hypothesis Testing Objectives

By the end of this module participants should be able to:
• Choose the appropriate test of variance for a given problem
• Perform hypothesis tests of variance
• Design and apply hypothesis tests of variance on projects


Comparison of Proportion: 2 Scenarios

1. Single Proportion Comparison

• One population proportion compared to a target value ($P$ vs. target value)

2. Two Sample Comparison

• Proportions of two independent populations compared to each other ($P_1$ vs. $P_2$)
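The second scenario can be sketched as a two-sided 2-proportion Z-test. This is an illustrative example using the pooled-proportion normal approximation, which is an assumption on my part rather than the procedure shown in the course; the function name `two_proportion_z_test` and the defect counts are hypothetical.

```python
from statistics import NormalDist

def two_proportion_z_test(defects_1, n_1, defects_2, n_2, alpha=0.05):
    """Two-sided test of Ho: P1 = P2 using the pooled normal approximation."""
    p_1, p_2 = defects_1 / n_1, defects_2 / n_2
    pooled = (defects_1 + defects_2) / (n_1 + n_2)   # common proportion under Ho
    standard_error = (pooled * (1 - pooled) * (1 / n_1 + 1 / n_2)) ** 0.5
    z = (p_1 - p_2) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    decision = "Reject Ho" if p_value < alpha else "Fail to reject Ho"
    return z, p_value, decision

# Hypothetical data: 30 defects in 200 units vs. 15 defects in 200 units
z, p_value, decision = two_proportion_z_test(30, 200, 15, 200)
print(f"z = {z:.3f}, p-value = {p_value:.4f}, decision: {decision}")
```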


Hypothesis Testing of Proportion - Roadmap

Comparing Proportions

• 1 Sample: 1 Proportion Test
• 2 Samples: 2 Proportion Test
• More than 2 samples: Chi-Square Test


Proportion Hypothesis Testing Objectives

By the end of this module participants should be able to:
• Choose the appropriate test of proportion for a given problem
• Perform hypothesis tests of proportion
• Determine sample size for 1 proportion and 2 proportion hypothesis testing
• Design and apply hypothesis tests of proportion on projects


What Are Chi-Square Tools?

• Chi-Square Goodness-of-Fit Test

• To test if a particular distribution (model) is a good fit for a population

• Chi-Square Test for Association

• To test if a relationship between two attribute variables exists

Both of these tools use the Chi-Square distribution, where $f_o$ and $f_e$ are the observed and expected frequencies, respectively.

Chi-Square Statistic

$\chi^2 = \sum_{j=1}^{g} \dfrac{(f_o - f_e)^2}{f_e}$
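The statistic is simple to compute by hand or in code. The sketch below applies it to a goodness-of-fit check of a six-sided die; the function name `chi_square_statistic` and the roll counts are hypothetical, and the decision uses the tabulated critical value for 5 degrees of freedom at a 0.05 significance level (11.070).

```python
def chi_square_statistic(observed, expected):
    """Sum of (fo - fe)^2 / fe over all cells."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

# Hypothetical data: 120 rolls of a die, Ho: all faces equally likely
observed_rolls = [22, 17, 20, 26, 22, 13]
expected_rolls = [20] * 6

chi2 = chi_square_statistic(observed_rolls, expected_rolls)
CRITICAL_VALUE_DF5_ALPHA_05 = 11.070   # chi-square table, df = 6 - 1 = 5
decision = "Reject Ho" if chi2 > CRITICAL_VALUE_DF5_ALPHA_05 else "Fail to reject Ho"
print(f"chi-square = {chi2:.2f}, decision: {decision}")
```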


The Chi-Square Distribution

• Measure of difference between observed counts and expected counts

• Observations must be independent

• Works best with 5 or more observations in each cell

• Cells may be combined to pool observations

[Figure: Chi-square distribution for various degrees of freedom ($\nu$ = 2, 4, 6, 10); x-axis: $\chi^2$; y-axis: value of the ($\chi^2$) distribution]


Chi Square Hypothesis Testing Objectives

By the end of this module the participants should be able to:
• Formulate appropriate hypotheses for Chi-Square tests
• Apply Chi-Square Goodness-of-Fit Test to practical problems
• Apply Chi-Square Test for Association to practical problems


What is ANOVA?

• Hypothesis Test for MEANS
• Uses two components of variance
  • within variance (no change)
  • between variance (after a change)
• Uses the F-distribution to test the variance components
• Comprehensive test for significance
• Backbone test statistic for subsequent complex analysis


When to Use ANOVA

Variables Road Map

Variables Data

• 1 Mean (1 Sample): 1 Sample t-test
• 2 Means (2 Samples): 2 Sample t-test, Paired Comparisons, Tukey's Quick Test
• 2+ Means (2 or more samples): ANOVA

ANOVA is used to test two or more means


Working With the ANOVA Data

• ANOVA data analysis will determine
  • Total process variance
  • Within factor variance
    • Variation due to noise
    • Technology focus
  • Between factor variance
    • Variation due to factor change
    • Process focus
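The split into within and between variance can be sketched for a one way ANOVA. This is a minimal illustration with hypothetical data and a hypothetical function name `one_way_anova_f`; it returns only the F ratio and its degrees of freedom, leaving the p-value to a statistics package or an F table.

```python
from statistics import fmean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    all_values = [x for group in groups for x in group]
    grand_mean = fmean(all_values)
    # Between factor variation: group means around the grand mean
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within factor variation (noise): values around their own group mean
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical response values at three factor levels
levels = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
f_ratio, df_between, df_within = one_way_anova_f(levels)
print(f"F = {f_ratio:.2f} with ({df_between}, {df_within}) degrees of freedom")
```

A large F ratio means the variation between factor levels is large relative to the noise within them; here the tabulated 0.05 critical value for (2, 6) degrees of freedom is about 5.14.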


ANOVA Objectives

By the end of this module, the participant should be able to:
• Explain how ANOVA works
• Interpret an ANOVA table
• Determine significant effects
• Perform a residual analysis
• Determine if data is normal
• Test groups of data for equal variances
• Run main effects plots


What is a Variation Component Study?

• A variation component study combines techniques from familiar areas:
  • Shewhart control chart model
  • Rational sub-grouping
  • Measurement systems analysis
  • Graphical, Multi-Variate charts
  • Analysis of variance (ANOVA) methods

• This type of study partitions potential sources of variation within a process so the researcher will know where to work first


Crossed Versus Nested Studies

[Figure: for each study type, Subject 1, Subject 2, and Subject 3 shown against Group 1, Group 2, ..., Group k]

Crossed Study: Subjects are not unique to one group

Nested Study: Subjects are unique to one group


Variation Component Studies Objectives

By the end of this module participants should be able to:
• Design appropriate sampling plans for variation component studies
• Recognize whether data is crossed, nested or both and model the scenarios using ANOVA
• Analyze studies
  • Graphically
  • With control charts
  • Using ANOVA methods
• Provide estimates of variation components (quantify)
• Provide guidance/direction for process improvement


Correlation Coefficient

[Figure: three scatter plots of Y versus X: r = -1.0 (perfect negative correlation), r = +1.0 (perfect positive correlation), r = 0.0 (no correlation)]


Correlation and Regression

• Correlation tells how much linear association exists between two variables

• Regression provides an equation describing the nature of relationship

Correlations: Shelf Space, Sales

Pearson correlation of Shelf Space and Sales = 0.978

p-value = 0.000

Regression Analysis: Sales versus Shelf Space

The regression equation is Sales = - 4711 + 10.1 Shelf Space
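The Pearson correlation reported above can be computed directly from its definition. The sketch below is illustrative: the function name `pearson_r` is hypothetical, and the small data sets are made up to show the limiting cases rather than taken from the Shelf Space example, whose raw data are not given here.

```python
from statistics import fmean

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

xs = [10, 15, 20, 25, 30]
r_positive = pearson_r(xs, [2 * x + 1 for x in xs])   # exact increasing line
r_negative = pearson_r(xs, [100 - 3 * x for x in xs]) # exact decreasing line
r_none = pearson_r(xs, [73, 75, 72, 75, 73])          # no linear trend
print(r_positive, r_negative, r_none)
```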


Types of Regression

• Simple Linear Regression

• Single regressor (x) variable such as x1 and model linear with respect to coefficients

• Multiple Linear Regression

• Multiple regressor (x) variables such as x1, x2, x3 and model linear with respect to coefficients

• Simple Non-Linear Regression

• Single regressor (x) variable such as x and model non-linear with respect to coefficients

• Multiple Non-Linear Regression

• Multiple regressor (x) variables such as x1, x2, x3 and model non-linear with respect to coefficients


Method of Least Squares

Objective:

• Find a line that will minimize sum of squares of residuals

[Figure: Regression Plot of Sales versus Shelf Space, showing observed values (Y) and the Regression Line (Ŷ)]

Residual = Y - Ŷ

Residuals are the error of prediction
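For a single predictor, the least squares line has a closed-form solution. The sketch below is a minimal illustration with hypothetical data and a hypothetical function name `fit_line`; it returns the intercept and slope that minimize the sum of squared residuals.

```python
from statistics import fmean

def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line Y-hat = b0 + b1 * x."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope = fit_line(xs, ys)

# Residual = Y - Y-hat for each observation
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
sum_of_squares = sum(e ** 2 for e in residuals)
print(f"Y-hat = {intercept:.2f} + {slope:.2f} x, SSE = {sum_of_squares:.3f}")
```

The residuals of a least squares line with an intercept always sum to zero, which is a quick sanity check on any fitted model.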


Correlation and Simple Regression Objectives

By the end of this module the participant should be able to:
• Measure the strength of correlation between two variables
• Determine if a correlation coefficient is statistically significant
• Perform simple linear regression including polynomial regression
• Perform model diagnostics and validate assumptions
• Use a regression model to predict the value of a response variable for a given value of predictor


What is Multiple Regression?

• Procedure of establishing relationship between a continuous type response variable and two or more independent variables

• Multiple regression equation can be used to predict a response based on values of predictor variables

• Multiple regression equation takes the form

Y = f (x1, x2, x3, ….)


Types of Multiple Regression

• Multiple Linear Regression

• Multiple regressor (x) variables such as x1, x2, x3 and model linear with respect to coefficients

• Multiple Non-Linear Regression

• Multiple regressor (x) variables such as x1, x2, x3 and model non-linear with respect to coefficients

This module focuses on multiple linear regression applying general least squares method


Predictor Variable Selection

• What combination of predictor variables is best for the regression model?

• Three options in MINITAB™:
  • Stepwise: procedure to add and remove variables to the regression model to produce a useful subset of predictors
  • Best Subsets: procedure to give the best fitting regression model that can be constructed with one variable, two variable, three variable, etc. models
  • Regression: once the best model is selected, use Regression to get more detailed diagnostics


Multiple Regression Objectives

By the end of this module participants should be able to:
• Determine, for a given response variable, the key process input variables from a set of multiple input variables
• Perform multiple linear regression for a given set of response variables using several input variables
• Perform model diagnostics and validate assumptions
• Use a regression model to predict the value of a response variable for given values of predictor variables


Analyze Phase Deliverables

Prepare and deliver a 10 minute presentation that discusses the following project status items:

• Week 1 Deliverables summarized and updated

• Revised problem statement reflecting an increased understanding of the problem

• Detailed Process Map revised

• Additional sources of variation quantified and prioritized

• Use and display data to identify and verify the “vital few” factors

• Sampling plan

• Graphical analysis and interpretation of data

• Correlation and Regression Analysis

• Confidence interval for Y metric(s)

• Hypothesis statement(s), null hypothesis and alternative hypothesis

• MINITAB hypothesis test output, p value and interpretation

• Project management report (Gantt chart, timelines, milestones, critical path)

• Any red flags with project or project scope and recommendations to resolve

• Next steps

• Signed approval of report out by Project Champion


Appendix


Non-Parametric Hypothesis Testing Roadmap

Non-Parametric Tests

• Binomial (Dichotomous)
  • Independent: Mann-Whitney U (t-test analog)
  • Dependent: Wilcoxon Sign (paired t-test analog)
• 3 or more Levels
  • Independent: Kruskal-Wallis H (one way ANOVA analog)
  • Dependent: Friedman Two way ANOVA (repeated measure ANOVA)

Trademarks and Service Marks

Six Sigma is a federally registered trademark of Motorola, Inc.

Breakthrough Strategy is a federally registered trademark of Six Sigma Academy.

VISION. FOR A MORE PERFECT WORLD is a federally registered trademark of Six Sigma Academy.

ESSENTEQ is a trademark of Six Sigma Academy.

FASTART is a trademark of Six Sigma Academy.

Breakthrough Design is a trademark of Six Sigma Academy.

Breakthrough Lean is a trademark of Six Sigma Academy.

Design with the Power of Six Sigma is a trademark of Six Sigma Academy.

Legal Lean is a trademark of Six Sigma Academy.

SSA Navigator is a trademark of Six Sigma Academy.

SigmaCALC is a trademark of Six Sigma Academy.

iGrafx is a trademark of Micrografx, Inc.

SigmaTRAC is a trademark of DuPont.

MINITAB is a trademark of Minitab, Inc.
