This material is not for general distribution, and its contents should not be quoted, extracted for publication, or otherwise copied or distributed without prior coordination with the Department of the Army, ATTN: ETF. UNCLASSIFIED / FOUO
National Guard Black Belt Training
Module 36
Simple Linear Regression
CPI Roadmap – Analyze
Note: Activities and tools vary by project. Lists provided here are not necessarily all-inclusive.
TOOLS
• Value Stream Analysis
• Process Constraint ID
• Takt Time Analysis
• Cause and Effect Analysis
• Brainstorming
• 5 Whys
• Affinity Diagram
• Pareto
• Cause and Effect Matrix
• FMEA
• Hypothesis Tests
• ANOVA
• Chi Square
• Simple and Multiple Regression
ACTIVITIES
• Identify Potential Root Causes
• Reduce List of Potential Root Causes
• Confirm Root Cause to Output Relationship
• Estimate Impact of Root Causes on Key Outputs
• Prioritize Root Causes
• Complete Analyze Tollgate
8-STEP PROCESS
1. Validate the Problem
2. Identify Performance Gaps
3. Set Improvement Targets
4. Determine Root Cause
5. Develop Counter-Measures
6. See Counter-Measures Through
7. Confirm Results & Process
8. Standardize Successful Processes
Phases: Define, Measure, Analyze, Improve, Control
Learning Objectives
Terminology and data requirements for conducting a regression analysis
Interpretation and use of scatter plots
Interpretation and use of correlation coefficients
The difference between correlation and causation
How to generate, interpret, and use regression equations
Application Examples
Administrative – A financial analyst wants to predict the cash needed to support growth and increases in training
Market/Customer Research – The main exchange wants to determine how to predict a customer’s buying decision from demographics and product characteristics
Hospitality – The MWR Guest House wants to see if there is a relationship between room service delays and order size
When Should I Use Regression?
The tool depends on the data type. Regression is typically used with a continuous input and a continuous response but can also be used with count or categorical inputs and outputs.
Tool selection by data type of the Independent Variable (X) and the Dependent Variable (Y):
X Continuous, Y Continuous: Regression
X Attribute, Y Continuous: ANOVA
X Continuous, Y Attribute: Logistic Regression
X Attribute, Y Attribute: Chi-Square (χ²) Test
General Strategy for Regression Modeling
Planning and Data Collection
• What variables?
• How will I get the data?
• How much data do I need?
Initial Analysis and Reduction of Variables
• What input variables have the biggest effect on the response variable?
Select and Refine Models
• What are some candidate prediction models?
• What is the best model?
Validate Model
• How well does the model predict new observations?
Regression Terminology
Types of Variables
Input Variables (Xs)
These are also called predictor variables or independent variables
Best if the variables are continuous, but can be count or categorical
Output Variables (Ys)
These are also called response variables or dependent variables (what we’re trying to predict)
Best if the variables are continuous, but can be count or categorical
[Diagram: inputs X1, X2, and X3 feed a Process or Product, which produces the output Y plus Error]
Visualize the Data – A Good Start!
Lets you “see” patterns in data
Supports or refutes theories about the data
Helps create or refine hypotheses
Predicts effects under other circumstances (be careful extending predictions beyond the range of data used)
Scatter Plot: A graph showing a relationship (or correlation) between two factors or variables
Be careful: correlation does not guarantee causation!
Correlation vs. Causation
Correlation by itself does not imply a cause and effect relationship!
Lurking variables!
[Example plots: Average life expectancy vs. # divorces/10,000; Gas mileage vs. Price of automobiles]
Other examples?
When is it correct to infer causation?
Example: Mortgage Estimates
A Belt is trying to reduce the call length for military clients calling for a good faith estimate on a VA loan
The Belt thinks that there is a relationship between broker experience and call length, and creates a scatter plot to visualize the relationship
Example: Mortgage Estimate Scatter Plot
Does it look like a relationship exists between Broker Experience and Call Length?
[Scatter plot: Call Length (Y axis) vs. Broker Experience (X axis)]
Hypothesis: Brokers with more experience can provide estimates in a shorter time.
Scatter Plot - Structure
[Annotated scatter plot: Call Length vs. Broker Experience]
Y Axis (Result?)
X Axis (Suspected Influence)
Paired Data
Paired Data? To use a scatter plot, you must have measured two factors for a single observation or item (ex: for a given measurement, you need to know both the call length and the broker’s experience). You have to make sure that the data “pair-up” properly in Minitab, or the diagram will be meaningless.
Input, Process, Output Context
Input (X), PREDICTOR MEASURES
• Arrival Time
• Accuracy
• Cost
• Key Specs
Process (X), PREDICTOR MEASURES
• Time Per Task
• In-Process Errors
• Labor Hours
• Exceptions
Output (Y), RESULTS MEASURES
• Customer Satisfaction
• Total Defects
• Cycle Time
• Cost
X Axis – Independent Variable
Y Axis – Dependent Variable
Scatter Plots
See how one factor relates to changes in another
Develop and/or verify hypotheses
Judge strength of relationship by width or tightness of scatter
Don’t assume a causal relationship!
[Example plots: No Correlation, Positive, Curvilinear, Negative]
Exercise: Interpreting Scatter Plots
1. As a team, review assigned Scatter Plots – see next pages
2. What kind of correlation do you see? (Name)
3. What does it mean?
4. What can you conclude?
5. What data might this represent? (Example)
Example One
Example Two
Example Three
Minitab Example: Scatter Plot
Next, we will work through a Minitab example using data collected at the Anthony’s Pizza company
The Belt suspects that the customers have to wait too long on days when there are many deliveries to make at Anthony’s Pizza
Minitab Example: Pizza Scatter Plot
A month of data was collected, and stored in the Minitab file Regression-Pizza.mtw
Pizza Scatter Plot (Cont.)
1. Open worksheet Regression-Pizza.mtw
2. Choose Graph > Scatterplot
Pizza Scatter Plot (Cont.)
When you click on Scatterplots, this is the first dialog box that comes up
3. Select the Simple Scatterplot
4. Click on OK to move to the next dialog box
Pizza Scatter Plot (Cont.)
5. Double click on C5 Wait Time to enter it as the Y variable, then double click on C6 Deliveries to enter it as the X variable
6. Edit dialog box options (Optional)
7. Click OK
Pizza Scatter Plot (Cont.)
Does it look like the number of Deliveries influences the customer’s Wait Time?
[Scatterplot of Wait Time vs Deliveries: Wait Time on the Y axis, Deliveries on the X axis]
Pizza Scatter Plot (Cont.)
Note: Hold your cursor over any point on the Scatterplot and Minitab will identify the Row, X-Value and Y-Value for that point
Correlation Coefficients (r & r²)
Numbers that indicate the strength of the correlation between two factors
r - strength and the direction of the relationship
Also called Pearson’s Correlation Coefficient
r² - percentage of variation in Y attributable to the independent variable X
Adds precision to a person’s visual judgment about correlation
Test the power of your hypothesis
How much influence does this factor have?
Are there other, more important, “vital few” causes?
Interpreting Correlation Coefficients
r falls on or between -1 and 1
Calculate in Minitab
As a rule of thumb, values of r below -0.65 or above 0.65 indicate a meaningful correlation
1 = “Perfect” positive correlation
-1 = “Perfect” negative correlation
Use to calculate r²
[Example plots: r = 0 and r = -0.8]
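For readers who want to see what the correlation calculation involves outside Minitab, here is a minimal Python sketch of the standard Pearson formula. The function names `pearson_r` and `is_meaningful` and the small sample lists are illustrative assumptions, not course data; the 0.65 cutoff is the rule of thumb from this slide.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired observations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


def is_meaningful(r, cutoff=0.65):
    """Apply the slide's rule of thumb: |r| above the cutoff is meaningful."""
    return abs(r) > cutoff


# Made-up paired data, chosen only to show the two "perfect" cases
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # Y rises in step with X
falling = [10, 8, 6, 4, 2]   # Y falls in step with X

r_pos = pearson_r(x, rising)
r_neg = pearson_r(x, falling)
print(f"rising:  r = {r_pos:.3f}, meaningful: {is_meaningful(r_pos)}")
print(f"falling: r = {r_neg:.3f}, meaningful: {is_meaningful(r_neg)}")
```

The two sample lists lie exactly on straight lines, so they reproduce the "perfect" positive and negative cases; real process data will fall somewhere in between.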
Pearson Correlation Coefficient (r) – Mortgage
Betty Black Belt used the scatter plot to get a visual picture of the relationship between broker experience and call length
Now she uses the Pearson Correlation Coefficient, r, to quantify the strength of the relationship
r = -0.896 (a strong negative correlation)
[Scatter plot: Call Length (Y axis) vs. Broker Experience (X axis)]
Exercise: Correlation
The scatter plot shows that the customers are waiting longer when Anthony’s Pizza has to make more deliveries
Next, the Belt wants to quantify the strength of that relationship
To do that, we will calculate the Pearson Correlation Coefficient, r
Pizza Correlation
1. Choose Stat > Basic Statistics > Correlation
Correlation Input Window
2. Double click on C5 Wait Time and C6 Deliveries to add them to the Variables box
3. Uncheck the box, Display p-values
4. Click OK
Correlation Coefficient
Since r, the Pearson correlation, is 0.970, there is a meaningful correlation between the wait time and number of deliveries
Interpreting Coefficients – r²
First, we obtained r from the Correlation analysis
Next, in Regression, we will look at r² to see how good our model (regression equation) is
r²: Compute by multiplying r x r (Pearson correlation squared)
Example: With an r value of .970 in the Pizza example, the team computed r²:
.970 x .970 = .941 or 94.1%
So, 94% of the variation in wait time is explained by the variability in deliveries
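As a supplementary note on why r squared reads as "variation explained": for a straight-line fit with one predictor, it equals one minus the share of the total variation in Y that the line leaves unexplained. The symbols SSE, SST, the hat, and the bar are standard notation added here; they do not appear on the slides.

```latex
r^2 \;=\; 1 - \frac{\text{SSE}}{\text{SST}}
    \;=\; 1 - \frac{\sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2}
                   {\sum_{i=1}^{n} \left( Y_i - \bar{Y} \right)^2}

% Pizza example: r = 0.970
r^2 = 0.970 \times 0.970 = 0.9409 \approx 94.1\%
```

Here each fitted value (Y with a hat) is the point on the regression line for that observation, and Y with a bar is the average of the observed Y values.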
Regression Analysis
Regression Analysis is used in conjunction with Correlation and Scatter Plots to predict future performance using past results
While Correlation shows how strong the linear relationship between two variables is, Regression defines the relationship more precisely
Use this tool when there is existing data over a defined range
Regression analysis is a tool that uses data on relevant variables to develop a prediction equation, or model
Linear Regression
In Simple Linear Regression, a single variable “X” is used to define/predict “Y”
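e.g., Wait Time = B1 + (B2) x (Deliveries) + (error)
Simple Regression Equation: Y = B1 + (B2) x (X) + (error)
B1 = Intercept (the value of Y where the line crosses X = 0)
B2 = Slope (the change in y for each one-unit change in x)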
Exercise: Regression
Since the Pearson Correlation (r) was .970, we know that there is a strong positive correlation between the number of deliveries and the wait time
Next, the Belt would like to get an equation to predict how long the customers will be waiting
Regression (Cont.)
1. Choose Stat > Regression > Fitted Line Plot
Fitted Line Input Window
2. Double click on C5 Wait Time to enter it as the Response (Y) variable
3. Double click on C6 Deliveries to enter it as the Predictor (X) variable
4. Make sure Linear is checked for the type of Regression
5. Edit dialog box options (Optional)
6. Click OK
Pizza Regression Plot
Fitted Line Plot: Wait Time = 32.05 + 0.5825 Deliveries
[Plot: Wait Time (Y axis) vs. Deliveries (X axis) with the fitted line]
S = 1.11885
R-Sq = 94.1%
R-Sq(adj) = 93.9%
Regression Analysis Results – Session Window
R-Sq is the percentage of variation in the response (Wait Time) explained by the model. Notice that 94.1% = .941 = .970 x .970: R-Sq is the square of the Pearson correlation from the previous analysis.
Prediction Equation (Regression Model): Wait Time = 32.05 + 0.5825 Deliveries
Using the Prediction Equation
If we have 20 deliveries to make, how long will the customer have to wait for their order?
Based on our 30 minute guarantee, how acceptable is our performance?
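One way to work the first question is to plug the number of deliveries into the fitted equation, Wait Time = 32.05 + 0.5825 Deliveries. The short Python sketch below does that; the names `predict_wait_time` and `GUARANTEE_MINUTES` are illustrative, and it assumes Wait Time is recorded in minutes, which the slides imply but do not state.

```python
# Coefficients taken from the fitted line plot:
# Wait Time = 32.05 + 0.5825 Deliveries
INTERCEPT = 32.05        # B1
SLOPE = 0.5825           # B2
GUARANTEE_MINUTES = 30   # assumes Wait Time is measured in minutes


def predict_wait_time(deliveries):
    """Predicted wait time for a given number of deliveries."""
    return INTERCEPT + SLOPE * deliveries


predicted = predict_wait_time(20)
meets_guarantee = predicted <= GUARANTEE_MINUTES

print(f"Predicted wait for 20 deliveries: {predicted:.2f}")
print(f"Within the {GUARANTEE_MINUTES}-minute guarantee: {meets_guarantee}")
```

With 20 deliveries the prediction is about 43.7, well above the guarantee. The intercept alone (32.05) already exceeds 30, so the model predicts a missed guarantee at every delivery volume in the data.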
Method of “Least Squares” Regression – Technical Note
Minitab will find the “best fitting” line for us. How does it do that?
• We want to have as little difference as possible between the true observations and the fitted line
• Minitab minimizes the sum of the squared vertical distances between the fitted and true observations
[Fitted Line Plot: Wait Time = 32.05 + 0.5825 Deliveries, with Wait Time on the Y axis and Deliveries on the X axis]
Y = true observation (the data point)
Ŷ = “fitted” observation (the line)
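For reference, the standard least-squares result for a single predictor can be written out as follows. It uses B1 (intercept) and B2 (slope) from the Linear Regression slide; the summation notation, the hat for fitted values, and the bars for averages are additions here rather than Minitab output.

```latex
% Fitted value for observation i, and the quantity being minimized
\hat{Y}_i = B_1 + B_2 X_i, \qquad
\text{SSE} = \sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2

% The slope and intercept that make SSE as small as possible
B_2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}
           {\sum_{i=1}^{n} (X_i - \bar{X})^2}, \qquad
B_1 = \bar{Y} - B_2 \bar{X}
```

Squaring the distances keeps positive and negative misses from cancelling out and penalizes large misses more heavily than small ones.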
Multiple Regression
Use this when you want to consider more than one predictor variable
The benefit is that you might need more predictors to create an accurate model
In the case of our Anthony’s Pizza example, we may want to look at the impact that incorrect orders, damaged pizzas, and cold pizzas have on wait time
Individual Exercise: Pizza
As an Anthony’s Pizza Belt, you suspect that the number of pizza defects increases when more pizzas are ordered. You want to visualize the data and quantify the relationship
Use the Minitab file Pizza Exercise.mtw data to investigate the relationship between “Total Pizzas” and “Defects”
Create a scatter plot
Determine correlation
Create a fitted line plot
Determine the prediction equation
How many defects do we usually have when 50 pizzas are on order? What do you think of this model?
Another Exercise: Absentee Rate
The human resources director of a chain of fast-food restaurants studied the absentee rate of employees. Whenever employees called in sick, or simply did not show up, the restaurant manager had to find replacements in a hurry, or else work short-handed
The director had data on the number of absences per 100 employees per week (Y) and the average number of months’ experience at the restaurant (X) for 10 restaurants in the chain. The director expected that long-term employees would be more reliable and absent less often
Absentee Rate
1. Open a blank Minitab worksheet and input the data
2. Create a scatter plot and decide whether a straight line is a reasonable model
3. Conduct a regression analysis and get the linear prediction equation
4. Predict the number of absences for employees with 19.5 months of experience
Experience (months) Absences (per 100 employees per week)
18.1 31.5
20.0 33.1
20.8 27.4
21.5 24.5
22.0 27.0
22.4 27.8
22.9 23.3
24.0 24.7
25.4 16.9
27.3 18.1
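For readers working outside Minitab, steps 3 and 4 could be sketched in plain Python as below, using the ten data pairs from the table. This is only a cross-check of the closed-form least-squares calculation, not the Minitab procedure the exercise asks for; `fit_line` is a hypothetical helper name.

```python
# Data from the exercise table: months of experience (X) and
# absences per 100 employees per week (Y) for 10 restaurants
experience = [18.1, 20.0, 20.8, 21.5, 22.0, 22.4, 22.9, 24.0, 25.4, 27.3]
absences = [31.5, 33.1, 27.4, 24.5, 27.0, 27.8, 23.3, 24.7, 16.9, 18.1]


def fit_line(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(experience, absences)

# Step 4: predict at 19.5 months, which lies inside the observed range
predicted_absences = intercept + slope * 19.5

print(f"Absences = {intercept:.3f} + ({slope:.4f}) x Experience")
print(f"Predicted absences at 19.5 months: {predicted_absences:.2f}")
```

The fitted slope is negative, which is consistent with the director's expectation that restaurants with longer-serving employees see fewer absences. The scatter plot in step 2 is still needed to judge whether a straight line is a reasonable model.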
Takeaways
Start with a visual tool – create a scatter plot
Determine the Pearson correlation coefficient, r, to determine the strength of the relationship
Remember that correlation does not guarantee causation!
Create and interpret the Regression Plot
Use the prediction equation
Validate the prediction model’s r-squared using new data (not part of the data set used in creating the prediction equation)
What other comments or questions do you have?