CHAPTER 3 REVIEW: LINEAR REGRESSION
Haroon Alam, Mitchell Sanders, Chuck McAllister-Ashley, and Arjun Patel
The Big Idea
Plot data on a scatterplot. Interpret what you see: direction, form, strength, and outliers.
Numerical summary: mean of x and y, standard deviation of x and y, and r.
Least-Squares Regression Line
How well does it fit: r and r²
Vocabulary
Response Variable: output, dependent variable, y value
Explanatory Variable: input, independent variable, x value
Scatterplot: a mathematical diagram that shows values for two variables as points on a Cartesian plane; best used for quantitative data
Outlier: an observation that has a large residual
Vocabulary
Influential Point: a point that has a large effect on the slope of a regression line but has a small residual
Correlation: a measure of the direction and strength of the linear relationship between two quantitative variables
Residual: the difference between the observed value of the response variable and the value predicted by the regression line
Vocabulary
Least-Squares Regression Line: the line that makes the sum of the squared vertical distances of the data points from the line as small as possible
Sum of Squared Errors: a measure of the difference between the values estimated by the linear regression and the actual observations
Total Sum of Squares: a measure of the difference between the estimated values on the line y = ȳ and the actual observed values
Coefficient of Determination: the fraction of the variation in the values of the response variable that can be explained by the LSRL of y on x
Residual Plot: a plot of the residuals against the explanatory variable
Extrapolation: the use of a regression line for prediction outside the range of values of the explanatory variable
Lurking Variable: a variable that is not among the explanatory or response variables in a study and yet may influence the interpretation of relationships among those variables
Key Topics
Data: categorical and quantitative
Scatterplots and descriptions: strong/weak, positive/negative, linear/not linear
Outliers and influential points
Creating the least-squares regression line
Calculating correlation and the coefficient of determination
Formulas
To calculate the correlation r: r = (1/(n-1)) Σ [(x - x̄)/sx][(y - ȳ)/sy]
To calculate the slope, b, of the least-squares regression line: b = r(sy/sx)
To calculate the y-intercept: a = ȳ - b x̄
To calculate the sum of squared errors, SSE: SSE = Σ(y - ŷ)²
To calculate the total sum of squares, SSM: SSM = Σ(y - ȳ)²
To calculate the coefficient of determination: r² = (SSM - SSE)/SSM = 1 - SSE/SSM
Or the correlation r could be squared: r² = r × r
To calculate the residual: residual = y - ŷ
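The formulas above can be checked by hand. As a rough sketch, they can also be computed step by step in Python using only the standard library; this uses the shoe-size data from the example problem later in the review.

```python
import math

# Shoe-size (x) and height (y) data from the example problem
x = [7, 10, 12, 8, 9.5, 10.5, 11, 12.5, 13.5, 10]
y = [64, 69, 71, 68, 71, 70, 72, 74, 77, 68]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
# Sample standard deviations (divide by n - 1)
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

# Correlation: average product of the standardized scores
r = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
        for xi, yi in zip(x, y)) / (n - 1)

b = r * s_y / s_x          # slope
a = y_bar - b * x_bar      # y-intercept

y_hat = [a + b * xi for xi in x]             # predicted values
residuals = [yi - yh for yi, yh in zip(y, y_hat)]
sse = sum(e ** 2 for e in residuals)         # sum of squared errors
ssm = sum((yi - y_bar) ** 2 for yi in y)     # total sum of squares
r_squared = 1 - sse / ssm                    # coefficient of determination

print(f"a = {a:.2f}, b = {b:.2f}, r = {r:.4f}, r^2 = {r_squared:.4f}")
```

Note that `r_squared` computed from 1 - SSE/SSM matches `r * r`, as the slide states.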
Calculator Key Strokes
To make a scatterplot on the calculator, first enter the explanatory variable data in L1. Then enter the corresponding response variable data in L2. Push "2nd" "Y=" "ENTER" "ENTER". Next push "ZoomStat" to view the scatterplot.
To overlay the least-squares regression line on the scatterplot, follow the scatterplot steps above and the regression steps below. However, after pushing "8", store the RegEQ: select RegEQ:, push "VARS", scroll over to "Y-VARS", and push "ENTER" twice. Push "ENTER" twice again to calculate the least-squares regression line. Next, push "ZoomStat" to view the scatterplot with the overlaid least-squares regression line.
Calculator Key Strokes
To calculate the least-squares regression line, r, and r², first push "MODE". Scroll down to "Stat Diagnostics", select "ON", and hit "ENTER". Enter the explanatory variable data in L1 and the corresponding response variable data in L2. Press "STAT", choose "CALC", then push "8". Hit "ENTER" five times. The y-intercept, slope, r, and r² will be calculated.
Calculator Key Strokes To create a residual plot in the calculator,
first enter the explanatory variable data in L1. Then enter the corresponding response variable data in L2. Next, calculate the least-squares regression line. Then, push “2nd” “Y=” “ENTER”. Turn on Plot1, make sure the scatterplot form is selected, and Xlist should be L1. Ylist should be changed to Resid. This is done by selecting Ylist, then pushing “2nd” “Stat” “Resid”. Next push “ZoomStat” to view the residual plot.
Example Problem
With this data, find the LSRL. Start by entering the data into list 1 and list 2.
Shoe Size (men’s U.S.) Height (in)
7 64
10 69
12 71
8 68
9.5 71
10.5 70
11 72
12.5 74
13.5 77
10 68
Example Problem
Results of the regression: a = 53.24, b = 1.65, r² = 0.8422, r = 0.9177
ŷ = 53.24 + 1.65x
Height = 53.24 + 1.65(shoe size)
Example Problem
Interpreting the intercept: when your shoe size is 0, you should be about 53.24 inches tall. Of course, this does not make much sense in the context of the problem.
Interpreting the slope: for each increase of 1 in shoe size, we would expect the height to increase by 1.65 inches.
Making predictions: how tall might you expect someone to be who has a shoe size of 12.5? Plug in 12.5: Height = 53.24 + 1.65(12.5) = 73.865 inches.
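The prediction step is just arithmetic on the fitted line, which a quick Python check confirms:

```python
# Plugging shoe size 12.5 into the fitted line from the example
a, b = 53.24, 1.65
height = a + b * 12.5
print(f"{height:.3f} inches")  # 73.865 inches
```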
Helpful Hints
Our eyes are not good judges of how strong a linear relationship is.
Correlation requires that both variables be quantitative.
Correlation makes no distinction between explanatory and response variables.
r does not change when the units of measurement of x or y change.
The correlation r is always a number between -1 and 1.
Helpful Hints
Correlation measures the strength of linear relationships only.
The correlation is strongly affected by outliers.
Regression, unlike correlation, requires that we have an explanatory variable and a response variable.
The size of the LSRL slope does not determine how important a relationship is.
There is a close connection between correlation and the slope of the LSRL.
Helpful Hints
Do not forget to use ŷ in the equations; write the LSRL in the form ŷ = a + bx.
Extrapolation produces unreliable predictions.
Lurking variables can make correlation misleading.
Correlations based on averages are usually too high when applied to individuals.
Association does not imply causation.