
Page 1: CHAPTER 3 REVIEW: LINEAR REGRESSION

Haroon Alam, Mitchell Sanders, Chuck McAllister-Ashley, and Arjun Patel

Page 2: The Big Idea

Plot the data on a scatterplot. Interpret what you see: direction, form, strength, and outliers.
Numerical summary: mean of x and y, standard deviation of x and y, and r.
Find the least-squares regression line.
How well does it fit: r and r² (a quick numerical sketch of these steps follows below).
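As a rough companion to these steps (added here, not part of the original slides), here is a minimal Python sketch of the numerical-summary step, using the shoe-size/height data from the example problem later in this review:

```python
import numpy as np

# Shoe size (x) and height (y) from the example problem later in this review
x = np.array([7, 10, 12, 8, 9.5, 10.5, 11, 12.5, 13.5, 10])
y = np.array([64, 69, 71, 68, 71, 70, 72, 74, 77, 68])

# Numerical summary: means, sample standard deviations, and correlation r
x_bar, y_bar = x.mean(), y.mean()
s_x, s_y = x.std(ddof=1), y.std(ddof=1)
r = np.corrcoef(x, y)[0, 1]

print(f"mean of x = {x_bar:.2f}, mean of y = {y_bar:.2f}")
print(f"s_x = {s_x:.3f}, s_y = {s_y:.3f}, r = {r:.4f}")
```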

Page 3: Vocabulary

Response Variable: the output, or dependent, variable; the y value.
Explanatory Variable: the input, or independent, variable; the x value.
Scatterplot: a diagram that shows values for two variables as points on a Cartesian plane; best used for quantitative data.
Outlier: an observation that lies outside the overall pattern of the other observations; in regression, a point with a large residual.

Page 4: Vocabulary

Influential Point: a point that has a large effect on the slope of the regression line; it often has a small residual because it pulls the line toward itself.
Correlation: a measure of the direction and strength of the linear relationship between two quantitative variables.
Residuals: the difference between the observed value of the response variable and the value predicted by the regression line.

Page 5: Vocabulary

Least-Squares Regression Line: the line that makes the sum of the squared vertical distances of the data points from the line as small as possible.
Sum of Squared Errors (SSE): a measure of the difference between the values estimated by the linear regression and the actual observations.
Total Sum of Squares: a measure of the difference between the estimated values on the line y = ȳ and the actual observed values.

Page 6: Vocabulary

Coefficient of Determination: the fraction of the variation in the values of the response variable that can be explained by the LSRL of y on x.
Residual Plot: a plot of the residuals against the explanatory variable.
Extrapolation: the use of a regression line for prediction outside the range of values of the explanatory variable.
Lurking Variable: a variable that is not among the explanatory or response variables in a study and yet may influence the interpretation of relationships among those variables.

Page 7: Key Topics

Data: categorical and quantitative
Scatterplots and descriptions: strong/weak, positive/negative, linear/not linear
Outliers and influential points
Creating the least-squares regression line
Calculating correlation and the coefficient of determination

Page 8: Formulas

To calculate the correlation r:
r = [1/(n − 1)] Σ [(x_i − x̄)/s_x][(y_i − ȳ)/s_y]

To calculate the slope, b, of the least-squares regression line:
b = r(s_y/s_x)

To calculate the y-intercept, a:
a = ȳ − b·x̄

To calculate the sum of squared errors, SSE:
SSE = Σ (y_i − ŷ_i)²
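These formulas can be written out directly in code. As a minimal sketch (added here, not part of the original slides), the following Python computes r, b, a, and SSE from the definitions, using the shoe-size/height data from the example problem later in this review:

```python
import numpy as np

# Shoe size (x) and height (y) from the example problem
x = np.array([7, 10, 12, 8, 9.5, 10.5, 11, 12.5, 13.5, 10])
y = np.array([64, 69, 71, 68, 71, 70, 72, 74, 77, 68])
n = len(x)

x_bar, y_bar = x.mean(), y.mean()
s_x, s_y = x.std(ddof=1), y.std(ddof=1)     # sample standard deviations

# Correlation: average product of standardized x and standardized y
r = np.sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)) / (n - 1)

# Slope and intercept of the least-squares regression line
b = r * s_y / s_x
a = y_bar - b * x_bar

# Sum of squared errors: squared vertical distances from the line
y_hat = a + b * x
SSE = np.sum((y - y_hat) ** 2)

print(f"r = {r:.4f}, b = {b:.3f}, a = {a:.3f}, SSE = {SSE:.3f}")
```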

Page 9: Formulas

To calculate the total sum of squares, SSM:
SSM = Σ (y_i − ȳ)²

To calculate the coefficient of determination:
r² = (SSM − SSE)/SSM
(or the correlation r could simply be squared)

To calculate a residual:
residual = observed y − predicted y = y_i − ŷ_i
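Similarly, a small sketch (not from the original slides) computing SSM, the coefficient of determination both ways, and the residuals for the same data:

```python
import numpy as np

x = np.array([7, 10, 12, 8, 9.5, 10.5, 11, 12.5, 13.5, 10])
y = np.array([64, 69, 71, 68, 71, 70, 72, 74, 77, 68])

# Least-squares fit (slope b, intercept a) and predicted values
b, a = np.polyfit(x, y, 1)
y_hat = a + b * x

# Total sum of squares (about the mean) and sum of squared errors
SSM = np.sum((y - y.mean()) ** 2)
SSE = np.sum((y - y_hat) ** 2)

# Coefficient of determination, two equivalent ways
r_sq_from_sums = (SSM - SSE) / SSM
r_sq_from_r = np.corrcoef(x, y)[0, 1] ** 2

# Residuals: observed y minus predicted y
residuals = y - y_hat

print(f"SSM = {SSM:.3f}, SSE = {SSE:.3f}")
print(f"r^2 = {r_sq_from_sums:.4f} (from sums) = {r_sq_from_r:.4f} (from r)")
print("residuals:", np.round(residuals, 2))
```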

Page 10: Calculator Keystrokes

To make a scatterplot with the calculator, first enter the explanatory variable data in L1. Then enter the corresponding response variable data in L2. Then push “2nd” “Y=” “ENTER” “ENTER”. Next, push “ZoomStat” to view the scatterplot.

To overlay the least-squares regression line on the scatterplot, follow the scatterplot steps above together with the regression-line steps on the next page. However, after pushing “8”, choose to store the regression equation: select RegEQ:, push “VARS”, scroll over to “Y-VARS”, and push “ENTER” twice. Push “ENTER” twice again to calculate the least-squares regression line. Next, push “ZoomStat” to view the scatterplot with the least-squares regression line overlaid.
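For readers working outside the TI calculator, here is a rough Python/matplotlib equivalent of these steps (added here, not from the original slides); the data arrays are the shoe-size/height values from the example problem later in this review.

```python
import numpy as np
import matplotlib.pyplot as plt

# Explanatory (x) and response (y) data, as on the example-problem slide
x = np.array([7, 10, 12, 8, 9.5, 10.5, 11, 12.5, 13.5, 10])
y = np.array([64, 69, 71, 68, 71, 70, 72, 74, 77, 68])

# Least-squares regression line: degree-1 polyfit returns (slope, intercept)
b, a = np.polyfit(x, y, 1)

# Scatterplot with the regression line overlaid
plt.scatter(x, y)
xs = np.linspace(x.min(), x.max(), 100)
plt.plot(xs, a + b * xs)
plt.xlabel("Shoe size (men's U.S.)")
plt.ylabel("Height (in)")
plt.show()
```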

Page 11: Calculator Keystrokes

To calculate the least-squares regression line, r, and r², first push “MODE”. Scroll down to “Stat Diagnostics” and select “ON”. Hit “ENTER”. Enter the explanatory variable data in L1, then enter the corresponding response variable data in L2. Press “STAT”, choose “CALC”, then push “8”. Hit “ENTER” five times. The y-intercept, slope, r, and r² will be calculated.
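As an aside (not from the original slides), the same one-command regression can be done in Python with scipy's linregress, which returns the slope, intercept, and r in a single call; the data are again the example-problem values.

```python
import numpy as np
from scipy.stats import linregress

x = np.array([7, 10, 12, 8, 9.5, 10.5, 11, 12.5, 13.5, 10])
y = np.array([64, 69, 71, 68, 71, 70, 72, 74, 77, 68])

# One call gives the slope (b), intercept (a), and correlation r
result = linregress(x, y)
print(f"a = {result.intercept:.2f}, b = {result.slope:.2f}")
print(f"r = {result.rvalue:.4f}, r^2 = {result.rvalue**2:.4f}")
```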

Page 12: Calculator Keystrokes

To create a residual plot on the calculator, first enter the explanatory variable data in L1 and the corresponding response variable data in L2. Next, calculate the least-squares regression line. Then push “2nd” “Y=” “ENTER”. Turn on Plot1, make sure the scatterplot form is selected, and set Xlist to L1. Ylist should be changed to RESID: select Ylist, then push “2nd” “STAT” and choose “RESID”. Next, push “ZoomStat” to view the residual plot.
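The Python analogue of this residual plot (a minimal sketch, not from the original slides, using the example-problem data) is:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([7, 10, 12, 8, 9.5, 10.5, 11, 12.5, 13.5, 10])
y = np.array([64, 69, 71, 68, 71, 70, 72, 74, 77, 68])

# Fit the least-squares line, then compute residuals = observed - predicted
b, a = np.polyfit(x, y, 1)
residuals = y - (a + b * x)

# Residual plot: residuals against the explanatory variable
plt.scatter(x, residuals)
plt.axhline(0)               # reference line at residual = 0
plt.xlabel("Shoe size (men's U.S.)")
plt.ylabel("Residual (in)")
plt.show()
```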

Page 13: Example Problem

With this data, find the LSRL. Start by entering the data into list 1 and list 2.

Shoe Size (men's U.S.)   Height (in)
7                        64
10                       69
12                       71
8                        68
9.5                      71
10.5                     70
11                       72
12.5                     74
13.5                     77
10                       68

Page 14: Example Problem

Results of the regression: a = 53.24, b = 1.65, r² = 0.8422, r = 0.9177

ŷ = 53.24 + 1.65x
predicted height = 53.24 + 1.65(shoe size)

Page 15: Example Problem

Interpreting the intercept: when your shoe size is 0, you should be about 53.24 inches tall. Of course, this does not make much sense in the context of the problem.

Interpreting the slope: for each increase of 1 in shoe size, we would expect the height to increase by 1.65 inches.

Making predictions: how tall might you expect someone to be who has a shoe size of 12.5? Plug in 12.5: height = 53.24 + 1.65(12.5) = 73.865 inches.
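As a quick check (added here, not part of the original slides), the regression results and the prediction above can be reproduced with a few lines of Python:

```python
import numpy as np

# Example-problem data: shoe size (x) and height in inches (y)
x = np.array([7, 10, 12, 8, 9.5, 10.5, 11, 12.5, 13.5, 10])
y = np.array([64, 69, 71, 68, 71, 70, 72, 74, 77, 68])

# Least-squares fit: slope b and intercept a
b, a = np.polyfit(x, y, 1)
r = np.corrcoef(x, y)[0, 1]
print(f"a = {a:.2f}, b = {b:.2f}, r^2 = {r**2:.4f}, r = {r:.4f}")
# Expect roughly a = 53.24, b = 1.65, r^2 = 0.8422, r = 0.9177

# Prediction for a shoe size of 12.5
print(f"predicted height = {a + b * 12.5:.3f} inches")
# About 73.86 in; the slide's 73.865 uses the rounded coefficients
```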

Page 16: Helpful Hints

Our eyes are not good judges of how strong a linear relationship is.
Correlation requires that both variables be quantitative.
Correlation makes no distinction between explanatory and response variables.
r does not change when the units of measurement of x or y change (checked in the sketch below).
The correlation r is always a number between -1 and 1.
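The last two hints are easy to check numerically. This small sketch (added here, not from the original slides) rescales the units of x and y and confirms that r is unchanged and stays between -1 and 1; the data are again the example-problem values.

```python
import numpy as np

x = np.array([7, 10, 12, 8, 9.5, 10.5, 11, 12.5, 13.5, 10])
y = np.array([64, 69, 71, 68, 71, 70, 72, 74, 77, 68])

r = np.corrcoef(x, y)[0, 1]
# A linear change of units (e.g., inches to centimeters) leaves r alone
r_rescaled = np.corrcoef(10 * x + 3, 2.54 * y)[0, 1]

print(f"r = {r:.4f}, r after changing units = {r_rescaled:.4f}")
assert abs(r - r_rescaled) < 1e-9 and -1 <= r <= 1
```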

Page 17: Helpful Hints

Correlation measures the strength of linear relationships only.
The correlation is strongly affected by outliers (see the sketch below).
Regression, unlike correlation, requires that we have an explanatory variable and a response variable.
The size of the LSRL slope does not determine how important a relationship is.
There is a close connection between correlation and the slope of the LSRL: b = r(s_y/s_x).
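To illustrate how much a single outlier can move r (an illustration added here, not from the original slides), compare the example-problem data with and without one made-up outlying point:

```python
import numpy as np

x = np.array([7, 10, 12, 8, 9.5, 10.5, 11, 12.5, 13.5, 10])
y = np.array([64, 69, 71, 68, 71, 70, 72, 74, 77, 68])

r = np.corrcoef(x, y)[0, 1]

# Add one made-up outlier: a size-14 shoe on a 60-inch-tall person
x_out = np.append(x, 14)
y_out = np.append(y, 60)
r_out = np.corrcoef(x_out, y_out)[0, 1]

print(f"r without outlier = {r:.4f}")   # about 0.92
print(f"r with one outlier = {r_out:.4f}")  # drops to roughly 0.24
```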

Page 18: Helpful Hints

Do not forget to use y-hat in the equations. Write the equation in the form ŷ = a + bx.
Extrapolation produces unreliable predictions.
Lurking variables can make correlation misleading.
Correlations based on averages are usually too high when applied to individuals.
Association does not imply causation.