colin-nichols
3
Example: Computer Repair
A company markets and repairs small computers. How fast (Time) an electronic component (Computer Unit) can be repaired is very important to the efficiency of the company. The variables in this example are Time and Units.
4
Humm…
How long will it take me to repair this unit?
Goal: to predict the length of repair Time for a given number of computer Units
5
Computer Repair Data
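| Units | Minutes | Units | Minutes |
|-------|---------|-------|---------|
| 1 | 23 | 6 | 97 |
| 2 | 29 | 7 | 109 |
| 3 | 49 | 8 | 119 |
| 4 | 64 | 9 | 149 |
| 4 | 74 | 9 | 145 |
| 5 | 87 | 10 | 154 |
| 6 | 96 | 10 | 166 |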
6
Graphical Summary of Two Quantitative Variables
Scatterplot of response variable against explanatory variable
• What is the overall (average) pattern?
• What is the direction of the pattern?
• How much do data points vary from the overall (average) pattern?
• Any potential outliers?
7
Summary for Computer Repair Data
Scatterplot (Time vs Units): Some Simple Conclusions
• Time is linearly related to computer Units.
• (The length of) Time increases as (the number of) Units increases.
• Data points are close to the line.
• No potential outliers.
9
Review: Math Equation for a Line
Y = b0 + b1X
Y: the response variable; X: the explanatory variable
b0: the intercept, the value of Y when X = 0
b1: the slope, the change in Y when X increases by 1
10
Regression Equation
The regression line models the relationship between X and Y on average.
The equation of the regression line is called the regression equation.
11
The Predicted Y Value
We use the regression line to estimate the average Y value for a specified X value, and we use this estimate to predict the Y value we might observe at that X value in the near future.
This predicted Y value, denoted $\hat{Y}$ and pronounced “y hat,” is the Y value on the regression line. So the regression equation is
$\hat{Y} = b_0 + b_1 X$
12
The Usage of Regression Equation
Predict the value of Y for a given X value.
Eg. Wish to predict a lady’s weight by her height.
** What is X? Y?
** Suppose b0 = -205 and b1 = 5:
** For ladies with HT of 60”, their WT will be predicted as b0 + b1 x 60 = -205 + 5 x 60 = 95 pounds, the (estimated) average WT of all ladies with HT of 60”.
13
The Usage of Regression Equation
Eg. How long will it take to repair 3 computer units?
** Suppose b0= 4.16 and b1=15.51:
** the predicted time = 4.16+15.51x3 = 50.69
** It will take about 50.69 minutes.
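The two predictions above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `predict` is a hypothetical helper, and the coefficients are the ones quoted in the slides.

```python
def predict(x, b0, b1):
    """Return the predicted Y (y hat), i.e. the point on the line b0 + b1 * x."""
    return b0 + b1 * x


# Coefficients quoted in the slides for the two examples.
repair_minutes = predict(3, b0=4.16, b1=15.51)  # 3 computer units
weight_pounds = predict(60, b0=-205, b1=5)      # height of 60 inches

print(round(repair_minutes, 2), weight_pounds)
```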
14
Examples of the Predicted Y
• The predicted WT of a given HT: $\hat{Y} = -205 + 5X$
• The predicted repair time of a given # of units: $\hat{Y} = 4.16 + 15.51X$
15
The Limitation of the Regression Equation
The regression equation cannot be used to predict the Y value for X values that are (far) outside the range in which the data were observed.
Eg. Given HT of 40”, the regression equation gives a WT of -205 + 5 x 40 = -5 pounds!!
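One way to guard against this kind of extrapolation is to compare X with the observed range before predicting. The sketch below assumes the computer repair data and coefficients from the earlier slides; the function name `predict_within_range` and the choice to raise an error are illustrative assumptions, not something the slides prescribe.

```python
# Units column of the computer repair data.
units = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]


def predict_within_range(x, observed_x, b0, b1):
    """Predict Y only when x lies inside the range of the observed X values."""
    lowest, highest = min(observed_x), max(observed_x)
    if not lowest <= x <= highest:
        raise ValueError(
            f"x = {x} is outside the observed range [{lowest}, {highest}]"
        )
    return b0 + b1 * x


print(round(predict_within_range(3, units, 4.16, 15.51), 2))

try:
    predict_within_range(20, units, 4.16, 15.51)  # far beyond 10 units
except ValueError as err:
    print("Refusing to extrapolate:", err)
```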
16
The Unpredicted Part
The value $Y - \hat{Y}$ is the part the regression equation (model) cannot catch, and it is called the “residual.”
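As an illustration, the residuals for the computer repair data can be listed directly. This sketch assumes the 14 data points and the coefficients b0 = 4.16 and b1 = 15.51 from the earlier slides; the variable names are made up for the example.

```python
# Computer repair data from the earlier slide.
units = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]

b0, b1 = 4.16, 15.51  # coefficients quoted in the slides

# Predicted value (y hat) and residual (observed minus predicted) per point.
predicted = [b0 + b1 * x for x in units]
residuals = [y - y_hat for y, y_hat in zip(minutes, predicted)]

for x, y, e in zip(units, minutes, residuals):
    print(f"units={x:2d}  observed={y:3d}  residual={e:6.2f}")
```

Positive residuals mark repairs that took longer than the line predicts; negative residuals mark repairs that were faster.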
18
Correlation between X and Y
X and Y might be related to each other in many ways: linear or curved.
19
Examples of Different Levels of Correlation
[Figure: two scatterplots of y against x, with x from 0.0 to 1.0. One panel: r = .98, strong linearity. Other panel: r = .71, medium linearity.]
20
Examples of Different Levels of Correlation
[Figure: two scatterplots of y against x, with x from 0.0 to 1.0. One panel: r = -.09, nearly uncorrelated. Other panel: r = .00, nearly curved.]
21
Correlation Coefficient of X and Y
A measurement of the strength of the “LINEAR” association between X and Y.
$s_x$: the standard deviation of the data values in X; $s_y$: the standard deviation of the data values in Y.
The correlation coefficient of X and Y is:
$$r = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})}{(n-1)\, s_x s_y}$$
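The correlation coefficient defined on this slide can be sketched in plain Python. The function name `correlation` and the two tiny data sets are made up for illustration; `statistics.stdev` is the sample standard deviation (divisor n - 1), which matches the definition used here.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Sum of cross-deviations divided by (n - 1) * s_x * s_y."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    cross = sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys))
    return cross / ((n - 1) * stdev(xs) * stdev(ys))


# Illustrative data: one perfectly increasing line, one perfectly decreasing.
print(correlation([1, 2, 3, 4], [2, 4, 6, 8]))
print(correlation([1, 2, 3, 4], [8, 6, 4, 2]))
```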
22
Correlation Coefficient of X and Y
• -1 ≤ r ≤ 1
• The magnitude of r measures the strength of the linear association of X and Y, which is the overall closeness of the points to a line.
• The sign of r indicates the direction of the association: “-” negative association, “+” positive association.
** Visit the previous 4 plots again.
23
Correlation Coefficient
The value r is almost 0:
• the best line to fit the data points is nearly horizontal
• the value of X hardly changes our prediction of Y
The value r is almost 1:
• a line fits the data points almost perfectly
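A short note on why r near 0 goes with a flat line. It assumes the regression line is the least-squares line, which the slides do not state explicitly. Under that assumption the slope and intercept satisfy

$$b_1 = r\,\frac{s_y}{s_x}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$

so when r is close to 0 the slope $b_1$ is close to 0, and the prediction $\hat{Y} = b_0 + b_1 X$ stays close to $\bar{y}$ whatever X is.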
24
Correlation does not Prove Causation
Four Ways to interpret an observed association:
• Causation
• There might be causation, but other variables contribute as well
• The association is explained by how other variables affect X and Y
• Y is causing a change in X
25
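Table for Computing Mean, St. Deviation, and Corr. Coef.

| $i$ | $y_i$ | $y_i - \bar{y}$ | $(y_i - \bar{y})^2$ | $x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$ | $(y_i - \bar{y})(x_i - \bar{x})$ |
|-----|-------|-----------------|---------------------|-------|-----------------|---------------------|----------------------------------|
| 1 | $y_1$ | $y_1 - \bar{y}$ | $(y_1 - \bar{y})^2$ | $x_1$ | $x_1 - \bar{x}$ | $(x_1 - \bar{x})^2$ | $(y_1 - \bar{y})(x_1 - \bar{x})$ |
| 2 | $y_2$ | $y_2 - \bar{y}$ | $(y_2 - \bar{y})^2$ | $x_2$ | $x_2 - \bar{x}$ | $(x_2 - \bar{x})^2$ | $(y_2 - \bar{y})(x_2 - \bar{x})$ |
| … | … | … | … | … | … | … | … |
| $n$ | $y_n$ | $y_n - \bar{y}$ | $(y_n - \bar{y})^2$ | $x_n$ | $x_n - \bar{x}$ | $(x_n - \bar{x})^2$ | $(y_n - \bar{y})(x_n - \bar{x})$ |
| Total | $\sum_{i=1}^{n} y_i$ | * | $\sum_{i=1}^{n} (y_i - \bar{y})^2$ | $\sum_{i=1}^{n} x_i$ | * | $\sum_{i=1}^{n} (x_i - \bar{x})^2$ | $\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})$ |
| | $\bar{y}$ | * | $s_y$ | $\bar{x}$ | 0 | $s_x$ | $r$ |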
26
Example: Computer Repair Time
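$n = 14$, $\sum_{i=1}^{n} y_i = 1361$, $\bar{y} = 1361/14 = 97.21$
$\sum_{i=1}^{n} (y_i - \bar{y})^2 = 27768.36$, $Var(Y) = 27768.36/(14-1) = 2136$, $s_y = 46.22$
$\sum_{i=1}^{n} x_i = 84$, $\bar{x} = 84/14 = 6$
$\sum_{i=1}^{n} (x_i - \bar{x})^2 = 114$, $Var(X) = 114/(14-1) = 8.769$, $s_x = 2.96$
$\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x}) = 1768$, $Cov(Y, X) = 1768/(14-1) = 136$
$Cor(Y, X) = Cov(Y, X)/(s_y s_x) = 136/(46.217 \times 2.96) = .994$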
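The quantities in this example can be recomputed from the 14 data points on the Computer Repair Data slide, following the same table steps (totals, deviations, then variance, covariance and correlation). The last two lines go one step further and assume the usual least-squares formulas for the slope and intercept, which the slides do not state; they are included only to show where coefficients like 15.51 and 4.16 can come from.

```python
from math import sqrt

# Computer repair data from the earlier slide.
units = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]

n = len(units)
x_bar = sum(units) / n
y_bar = sum(minutes) / n

# Column totals of the computing table.
sxx = sum((x - x_bar) ** 2 for x in units)
syy = sum((y - y_bar) ** 2 for y in minutes)
sxy = sum((y - y_bar) * (x - x_bar) for x, y in zip(units, minutes))

# Sample variance, standard deviation, covariance (divisor n - 1).
s_x = sqrt(sxx / (n - 1))
s_y = sqrt(syy / (n - 1))
cov_yx = sxy / (n - 1)
r = cov_yx / (s_y * s_x)

# Assumption: least-squares slope and intercept (not stated in the slides).
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

print(f"n={n}  sum_y={sum(minutes)}  y_bar={y_bar:.2f}  x_bar={x_bar:.2f}")
print(f"Sxx={sxx:.2f}  Syy={syy:.2f}  Sxy={sxy:.2f}")
print(f"s_x={s_x:.3f}  s_y={s_y:.3f}  cov={cov_yx:.2f}  r={r:.3f}")
print(f"b0={b0:.2f}  b1={b1:.2f}")
```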
27
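Exercise
(1) Fill in the following table, then compute the mean and st. deviation of Y and X.
(2) Compute the corr. coef. of Y and X.
(3) Draw a scatterplot.

| $i$ | $x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$ | $y_i$ | $y_i - \bar{y}$ | $(y_i - \bar{y})^2$ | $(y_i - \bar{y})(x_i - \bar{x})$ |
|-----|-------|-----------------|---------------------|-------|-----------------|---------------------|----------------------------------|
| 1 | -.3 | -.3 | .09 | .1 | -.9 | .81 | .27 |
| 2 | -.2 | -.2 | .04 | .4 | -.6 | .36 | .12 |
| 3 | -.1 | ___ | .01 | .7 | ___ | ___ | ___ |
| 4 | .1 | .1 | .01 | 1.2 | .2 | ___ | ___ |
| 5 | .2 | ___ | .04 | 1.6 | .6 | ___ | ___ |
| 6 | .3 | .3 | .09 | 2.0 | ___ | ___ | ___ |
| Total | 0 | * | ___ | 6.0 | * | ___ | ___ |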
28
The Influence of Outliers
[Figure: scatterplot of Y3 against X3]
• The slope becomes larger (toward the outlier)
• The size of r becomes smaller
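Both effects can be checked numerically. The axis labels X3 and Y3 suggest the third data set of Anscombe's quartet, so the sketch below uses that data set; this identification is an assumption, as are the helper name `slope_and_r` and the use of the least-squares slope.

```python
from math import sqrt

# Anscombe's third data set (assumed to be the one plotted on the slide).
x3 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]


def slope_and_r(xs, ys):
    """Return the least-squares slope and the correlation coefficient."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys))
    return sxy / sxx, sxy / sqrt(sxx * syy)


# Fit once with every point, once with the outlier (13, 12.74) removed.
slope_all, r_all = slope_and_r(x3, y3)
kept = [(x, y) for x, y in zip(x3, y3) if (x, y) != (13, 12.74)]
slope_trim, r_trim = slope_and_r([x for x, _ in kept], [y for _, y in kept])

print(f"with outlier:    slope={slope_all:.3f}  r={r_all:.3f}")
print(f"without outlier: slope={slope_trim:.3f}  r={r_trim:.3f}")
```

With the outlier included the slope is steeper and r is smaller in size than when it is left out, which is the pattern the slide describes.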