28
WELCOME TO SEMINAR 8 Chapter 9 Correlation & Regression Anthony J Feduccia MM207 Statistics

WELCOME TO SEMINAR 8 Chapter 9 Correlation & Regression

Embed Size (px)

DESCRIPTION

Example Scatterplot. WELCOME TO SEMINAR 8 Chapter 9 Correlation & Regression. Anthony J Feduccia MM207 Statistics. Chapter Outline. 9.1 Correlation 9.2 Linear Regression Skip 9.3 and 9.4. Correlation. Correlation A relationship between two variables. - PowerPoint PPT Presentation

Citation preview

Page 1: WELCOME TO SEMINAR  8   Chapter 9 Correlation & Regression

WELCOME TO SEMINAR 8 Chapter 9

Correlation & Regression

Example Scatterplot

Anthony J Feduccia MM207 Statistics

Page 2: WELCOME TO SEMINAR  8   Chapter 9 Correlation & Regression

Chapter Outline

• 9.1 Correlation

• 9.2 Linear Regression

• Skip 9.3 and 9.4

2Larson/Farber 4th ed.

Page 3: WELCOME TO SEMINAR  8   Chapter 9 Correlation & Regression

Correlation

Correlation

• A relationship between two variables.

• The data can be represented by ordered pairs (x, y) – x is the independent (or explanatory) variable

– y is the dependent (or response) variable

3Larson/Farber 4th ed.

Page 4: WELCOME TO SEMINAR  8   Chapter 9 Correlation & Regression

Example: Constructing a Scatter Plot

A marketing manager conducted a study to determine whether there is a linear relationship between money spent on advertising and company sales. The data are shown in the table. Display the data in a scatter plot and determine whether there appears to be a positive or negative linear correlation or no linear correlation.

Advertisingexpenses,($1000), x

Companysales

($1000), y2.4 2251.6 1842.0 2202.6 2401.4 1801.6 1842.0 1862.2 215

4Larson/Farber 4th ed.

Page 497 Example 1

Page 5: WELCOME TO SEMINAR  8   Chapter 9 Correlation & Regression

Solution: Constructing a Scatter Plot

x

y

Advertising expenses(in thousands of dollars)

Co

mp

any

sa

les

(in

thou

san

ds

of d

olla

rs)

Appears to be a positive linear correlation. As the advertising

expenses increase, the sales tend to increase.5

Page 6: WELCOME TO SEMINAR  8   Chapter 9 Correlation & Regression

Correlation CoefficientCorrelation coefficient• A measure of the strength and the direction of a linear relationship between two

variables. • The symbol r represents the sample correlation coefficient. • A formula for r is

• The population correlation coefficient is represented by ρ (rho).

n is the number of data pairs

6Larson/Farber 4th ed.

Page 7: WELCOME TO SEMINAR  8   Chapter 9 Correlation & Regression

Correlation Coefficient

• The range of the correlation coefficient is -1 to 1.

-1 0 1

If r = -1 there is a perfect negative

correlation

If r = 1 there is a perfect positive

correlation

If r is close to 0 there is no linear

correlation

7Larson/Farber 4th ed.

Page 8: WELCOME TO SEMINAR  8   Chapter 9 Correlation & Regression

Linear Correlation

Strong negative correlation

Weak positive correlation

Strong positive correlation

Nonlinear Correlation

x

y

x

y

x

y

x

y

r = 0.91 r = 0.88

r = 0.42 r = 0.07

8Larson/Farber 4th ed.

Page 9: WELCOME TO SEMINAR  8   Chapter 9 Correlation & Regression

Regression lines• After verifying that the linear correlation between two

variables is significant, next we determine the equation of the line that best models the data (regression line).

• Can be used to predict the value of y for a given value of x.

x

y

9Larson/Farber 4th ed.

Page 10: WELCOME TO SEMINAR  8   Chapter 9 Correlation & Regression

Regression line (line of best fit)

• The line for which the sum of the squares of the residuals is a minimum.

• The equation of a regression line for an independent variable x and a dependent variable y is

ŷ = mx + b

Regression Line

Predicted y-value for a given x-value

Slopey-intercept

10

Page 11: WELCOME TO SEMINAR  8   Chapter 9 Correlation & Regression

The Equation of a Regression Line

• ŷ = mx + b where

• is the mean of the y-values in the data

• is the mean of the x-values in the data

• The regression line always passes through the point

22

n xy x ym

n x x

y xb y mx mn n

yx

,x y

11Larson/Farber 4th ed.

Page 12: WELCOME TO SEMINAR  8   Chapter 9 Correlation & Regression

Correlation Coefficient

Hours, x 0 1 2 3 3 5 5 5 6 7 7 10

Test score, y

96 85 82 74 95 68 76 84 58 65 75 50

Example:The following data represents the number of hours 12 different students watched television during the weekend and the scores of each student who took a test the following Monday.a.) Display the scatter plot.b.) Calculate the correlation coefficient r.

Continued.

Page 13: WELCOME TO SEMINAR  8   Chapter 9 Correlation & Regression

Correlation Coefficient Using Excel

Hours, x 0 1 2 3 3 5 5 5 6 7 7 10

Test score, y

96 85 82 74 95 68 76 84 58 65 75 50

Use Excel. Insert> Chart>Scatter Diagram

100

x

y

Hours watching TV

Tes

t sco

re 80

60

40

20

2 4 6 8 10

Go to Course Extras for Excel Template

Page 14: WELCOME TO SEMINAR  8   Chapter 9 Correlation & Regression

Correlation Coefficient

Hours, x 0 1 2 3 3 5 5 5 6 7 7 10Test score, y

96 85 82 74 95 68 76 84 58 65 75 50

xy 0 85 164 222 285 340 380 420 348 455 525 500

x2 0 1 4 9 9 25 25 25 36 49 49 100

y2 9216 7225 6724 5476 9025 4624 5776 7056 3364 4225 5625 2500

The formula method:

2 22 2

n xy x yr

n x x n y y

22

12(3724) 54 908

12(332) 54 12(70836) 908

0.831

There is a strong negative linear correlation (-0.831).As the number of hours spent watching TV increases, the test scores tend to decrease.

54x 908y 3724xy 2 332x 2 70836y

Page 15: WELCOME TO SEMINAR  8   Chapter 9 Correlation & Regression

Testing a Population Correlation Coefficient

Hours, x 0 1 2 3 3 5 5 5 6 7 7 10

Test score, y

96 85 82 74 95 68 76 84 58 65 75 50

Example:The following data represents the number of hours 12 different students watched television during the weekend and the scores of each student who took a test the following Monday.

Continued.

The correlation coefficient r 0.831.

Is the correlation coefficient significant at = 0.01?

Page 16: WELCOME TO SEMINAR  8   Chapter 9 Correlation & Regression

Hypothesis Testing for ρ

The t-Test for the Correlation CoefficientA t-test can be used to test whether the correlation between two variables is significant. The test statistic is r and the standardized test statistic

follows a t-distribution with n – 2 degrees of freedom.

2

2r

r rtσ 1 r

n

In this text, only two-tailed hypothesis tests for ρ are considered.

Page 17: WELCOME TO SEMINAR  8   Chapter 9 Correlation & Regression

Hypothesis Testing for ρ

Example continued:

Ha: ρ 0 (significant correlation)H0: ρ = 0 (no correlation)

The level of significance is = 0.01.

Degrees of freedom are d.f. = 12 – 2 = 10.

The critical values are t0 = 3.169 and t0 = 3.169.

The standardized test statistic is

t0 = 3.169t

0 t0 = 3.169

2

0.831

1 ( 0.831)12 2

2

2

rt1 rn

4.72.

The test statistic falls in the rejection region, so H0 is rejected.

At the 1% level of significance, there is enough evidence to conclude that there is a significant linear correlation between the number of hours of TV watched over the weekend and the test scores on Monday morning.

Page 18: WELCOME TO SEMINAR  8   Chapter 9 Correlation & Regression

Correlation and Causation

The fact that two variables are strongly correlated does not in itself imply a cause-and-effect relationship between the variables.

If there is a significant correlation between two variables, you should consider the following possibilities.

1. Is there a direct cause-and-effect relationship between the variables?Does x cause y?

2. Is there a reverse cause-and-effect relationship between the variables?Does y cause x?

3. Is it possible that the relationship between the variables can be caused by a third variable or by a combination of several other variables?

4. Is it possible that the relationship between two variables may be a coincidence?

Page 19: WELCOME TO SEMINAR  8   Chapter 9 Correlation & Regression

Regression LineExample:The following data represents the number of hours 12 different students watched television during the weekend and the scores of each student who took a test the following Monday.

Hours, x 0 1 2 3 3 5 5 5 6 7 7 10Test score, y

96 85 82 74 95 68 76 84 58 65 75 50

xy 0 85 164 222 285 340 380 420 348 455 525 500

x2 0 1 4 9 9 25 25 25 36 49 49 100

y2 9216 7225 6724 5476 9025 4624 5776 7056 3364 4225 5625 250054x 908y 3724xy 2 332x 2 70836y

a.) Find the equation of the regression line.b.) Use the equation to find the expected test score for a student who

watches 9 hours of TV.

Page 20: WELCOME TO SEMINAR  8   Chapter 9 Correlation & Regression

Regression LineExample continued:

22

n xy x ym

n x x

2

12(3724) 54 908

12(332) 54

4.067

b y mx 908 54( 4.067)12 12

93.97

ŷ = –4.07x + 93.97

100

x

y

Hours watching TV

Tes

t sco

re 80

60

40

20

2 4 6 8 10

54 908( , ) , 4.5,75.712 12

x y

Continued.

Use Excel instead. See Course Extras

Page 21: WELCOME TO SEMINAR  8   Chapter 9 Correlation & Regression
Page 22: WELCOME TO SEMINAR  8   Chapter 9 Correlation & Regression
Page 23: WELCOME TO SEMINAR  8   Chapter 9 Correlation & Regression
Page 24: WELCOME TO SEMINAR  8   Chapter 9 Correlation & Regression
Page 25: WELCOME TO SEMINAR  8   Chapter 9 Correlation & Regression
Page 26: WELCOME TO SEMINAR  8   Chapter 9 Correlation & Regression

0 961

85

2 82

3 74

3 95

5 68

5 76

5 84

6 58

7 65

7 75

10 50

EXCEL OUTPUT

Page 27: WELCOME TO SEMINAR  8   Chapter 9 Correlation & Regression

6 15 Guided Exercise 3 Page 143

20 31 Enter x ,y values in COL a and COL B respectively

0 10 Shade COLs A and B, then select Scatter Plot under Insert

14 16 Right Click on any dot on the scatter plot and select Add Trendline

25 28 Under Trendline options, select Linear then "Add Equation to Chart" then "Display r^2"

16 20 Click on the chart and notice Chart Tools appears at the top of the screen. Select Layout.

28 40 Title the Chart using Chart Title and the axes using Axes titles

18 25

10 12

8 15

Page 28: WELCOME TO SEMINAR  8   Chapter 9 Correlation & Regression