
Author: Phillip E. Pfeifer

© 2012 Phillip E. Pfeifer and Management by the Numbers, Inc.

Descriptive Statistics II

This module covers statistics commonly used to describe the relationship between two numerically-scaled variables (correlation and regression).

Two Kinds of Descriptive Statistics

MBTN | Management by the Numbers

• Measures of Central Tendency
   • Mean
   • Median
   • Mode

• Measures of Variability
   • Range (Maximum – Minimum)
   • Standard Deviation
   • Variance

Descriptive Statistics I covered these six statistical measures used to describe a single numerically-scaled variable. If we have two (or more) variables, we often begin by calculating and examining these statistics for each of the variables of interest.

Example

Heights and Weights of 30 Students (in inches and pounds)*

• Using what we learned in Descriptive Statistics I, we can calculate (and interpret) summary statistics for height and weight.

• These calculations and interpretations are accomplished separately for the two variables.

• Our summary of height ignores weight and vice versa.

*http://www.sci.usq.edu.au/staff/dunn/Datasets/Books/Hand/Hand-R/height-R.html

Student  Height  Weight
1        53      57
2        57      72
3        60      121
4        61      110
5        55      70
6        52      55
7        59      97
8        54      68
9        56      79
10       57      77
11       56      61
12       54      61
13       61      79
14       59      105
15       61      79
16       52      68
17       59      74
18       56      70
19       65      103
20       57      81
21       59      101
22       58      79
23       60      103
24       55      72
25       56      92
26       58      70
27       59      70
28       56      63
29       54      74
30       53      66

Example

Separate Summary Statistics for Height and Weight:

These descriptive statistics were discussed in Descriptive Statistics I.

Notice that none of them measure anything about the relationship between height and weight.

Statistic            Height   Weight
Sample Mean          57.07    79.23
Median               57       74
Mode                 59       70
Standard Deviation   3.07     17.00
Sample Variance      9.44     289.08
Count                30       30
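The summary statistics above can be reproduced outside Excel. The following is a minimal sketch in Python (a substitute for the spreadsheet used in this module, not the author's method); the helper name `summarize` is hypothetical. Note that both variables have a tie for the most frequent value (56 and 59 for height, 70 and 79 for weight); `statistics.mode` in Python 3.8 or later reports the tied value that appears first in the data, which gives the 59 and 70 shown in the table.

```python
import statistics

# Heights (inches) and weights (pounds) of the 30 students, in student order.
heights = [53, 57, 60, 61, 55, 52, 59, 54, 56, 57, 56, 54, 61, 59, 61,
           52, 59, 56, 65, 57, 59, 58, 60, 55, 56, 58, 59, 56, 54, 53]
weights = [57, 72, 121, 110, 70, 55, 97, 68, 79, 77, 61, 61, 79, 105, 79,
           68, 74, 70, 103, 81, 101, 79, 103, 72, 92, 70, 70, 63, 74, 66]


def summarize(values):
    """Return the single-variable summary statistics (hypothetical helper)."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),           # first value among ties (3.8+)
        "stdev": statistics.stdev(values),         # sample standard deviation (n - 1)
        "variance": statistics.variance(values),   # sample variance (n - 1)
        "count": len(values),
    }


height_summary = summarize(heights)
weight_summary = summarize(weights)

for name, summary in (("Height", height_summary), ("Weight", weight_summary)):
    print(name, {key: round(value, 2) for key, value in summary.items()})
```

Each summary is computed from one list only, which is why none of these numbers says anything about how height and weight move together.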

The Scatter Plot

• A great way to begin examining the relationship between two variables is to construct a scatter plot.

• The scatter plots of weight (on the Y-axis) versus height (on the X-axis) and height (on the Y-axis) versus weight (on the X-axis) both show that there is a positive relationship between these two variables.

• Students with greater heights tend to have greater weights (and vice versa).

[Figure: two scatter plots of the same data. Left: Weight (pounds) versus Height (inches). Right: Height (inches) versus Weight (pounds).]

The Scatter Plot

• Many of us might say that the relationship between the two variables looks “stronger” in the left plot compared to the right plot.

• But that is nonsense given that both charts plot the same 30 pairs of data.

• The “problem” is one of scaling. Changing the scales on the axes changes the “look” of the plot.

[Figure: the same two scatter plots. Left: Weight (pounds) versus Height (inches). Right: Height (inches) versus Weight (pounds).]

The Scatter Plot

The first scatter plots

New scatter plots created by changing the scale of the axes. By changing the scales, notice how the height vs. weight looks “stronger” than weight vs. height, the opposite of the “look” above.

[Figure, top row (the first scatter plots): Weight (pounds) versus Height (inches), and Height (inches) versus Weight (pounds).]

[Figure, bottom row (rescaled axes): the same two plots, Weight (pounds) versus Height (inches) and Height (inches) versus Weight (pounds), drawn with different axis ranges.]

The Scatter Plot

• The “look” of a scatter plot depends on the scales used for the axes.

• Be aware of this as you interpret scatter plots (and charts in general).

• As a consequence, we want/need a statistic that measures the direction (positive, negative, or zero) and the amount/strength of the relationship between two variables.


• If high values of X tend to be paired with high values of Y (and low with low), the sign of the statistic should be positive. If high values of X tend to be paired with low values of Y, the sign should be negative.

• If the value of X is of no help in predicting Y (and vice versa), the statistic should equal zero.

• If the value of X is a perfect predictor of Y (and vice versa), the statistic should be either +1 or -1, in which case all the points in the scatter plot will fall on a straight line.

The Correlation Coefficient

Insights

The correlation coefficient measures both the direction and strength of the relationship between two numerically-scaled variables.

The correlation of X and Y equals the correlation of Y and X.

The correlation doesn’t depend on the scales used in the scatter plot, and doesn’t even depend on the scales used to measure the variables.

• Convert the heights to centimeters and/or the weights to kilos, and the correlation coefficient won’t change.

Definition

Correlation Coefficient = Covariance of X and Y / (Standard Deviation of X × Standard Deviation of Y)

Excel function: =CORREL(array1, array2)
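As an illustration of the insights on this slide, here is a minimal Python sketch that computes the standard (Pearson) correlation coefficient directly from deviations about the means. It is offered as an alternative to Excel's CORREL, not as the module's own method; the function name `correlation` and the unit-conversion lists are assumptions introduced for the example.

```python
import math

heights = [53, 57, 60, 61, 55, 52, 59, 54, 56, 57, 56, 54, 61, 59, 61,
           52, 59, 56, 65, 57, 59, 58, 60, 55, 56, 58, 59, 56, 54, 53]
weights = [57, 72, 121, 110, 70, 55, 97, 68, 79, 77, 61, 61, 79, 105, 79,
           68, 74, 70, 103, 81, 101, 79, 103, 72, 92, 70, 70, 63, 74, 66]


def correlation(xs, ys):
    """Pearson correlation: co-deviation divided by the product of spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = correlation(heights, weights)
r_swapped = correlation(weights, heights)       # Y with X instead of X with Y

# Same students measured in centimeters and kilograms.
heights_cm = [h * 2.54 for h in heights]
weights_kg = [w * 0.45359237 for w in weights]
r_metric = correlation(heights_cm, weights_kg)

print(round(r, 3), round(r_swapped, 3), round(r_metric, 3))
```

All three values agree (up to floating-point rounding), which matches the claims that the correlation is symmetric and does not depend on the units of measurement.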

The Correlation Coefficient

For this data set, the correlation of Height and Weight is 0.72.

• It is positive, as expected.

• And it appears to be high (close to one).

[Figure: the two scatter plots again. Left: Weight (pounds) versus Height (inches). Right: Height (inches) versus Weight (pounds).]

Example Correlation Coefficients

[Figure: five example scatter plots of Y versus X, with correlation coefficients of -1, -0.6, 0, +0.6, and +1.]

Points Lined Up on a Flat Line?

[Figure: scatter plot of Y versus X in which every point lies on a flat line.]

Points all on a line; the correlation should be +1 or -1?

But the line is flat; correlation coefficient should be 0?

Math to the rescue; the correlation is 0/0 which is UNDEFINED.

Insight

In order to measure the relationship between two variables, both have to exhibit variability. Since Y was always 5, we can’t tell whether Y goes up or down with X. Y never changed!!
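The 0/0 situation can be sketched in a few lines of Python. The X values below are hypothetical (the slide's exact points did not survive), and the helper `correlation_or_none` is an illustrative name; it returns `None` rather than dividing by zero when either variable has no variability.

```python
import math


def correlation_or_none(xs, ys):
    """Pearson correlation, or None when it is undefined (zero variability)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # numerator and denominator are both 0: undefined
    return sxy / math.sqrt(sxx * syy)


flat_x = list(range(3, 14))     # hypothetical X values: 3, 4, ..., 13
flat_y = [5] * len(flat_x)      # Y is always 5, as on the slide

flat_result = correlation_or_none(flat_x, flat_y)
print(flat_result)
```

Returning `None` is one design choice among several; raising an error or returning NaN would express the same fact that the statistic does not exist for these data.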

Points Lined Up on a Curved Line?

X appears to be a PERFECT predictor of Y.

However, the relationship is NOT linear.

The correlation coefficient for these data is ZERO!

Insight

The correlation coefficient measures the direction and strength of a LINEAR relationship between X and Y.

Because the best straight line through these data is a flat one, X and Y are uncorrelated.
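A small Python sketch makes the point concrete. The data here are an assumption chosen for illustration (a symmetric parabola, not the slide's actual points): Y is completely determined by X, yet the correlation coefficient comes out to zero.

```python
import math


def correlation(xs, ys):
    """Pearson correlation computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical curved data: Y = 8 * (X - 5)^2 for X = 0, 1, ..., 10.
curve_x = list(range(11))
curve_y = [8 * (x - 5) ** 2 for x in curve_x]

r_curve = correlation(curve_x, curve_y)
print(r_curve)
```

The positive co-deviations on the right half of the curve cancel the negative ones on the left half, so the best straight line through the points is flat.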

[Figure: scatter plot of Y versus X in which the points fall on a curve.]

Correlation Versus Causation

The correlation coefficient measures the direction and strength of a possible LINEAR relationship between X and Y in the observed data.

• Just because X and Y moved together in the past does not mean X caused Y or that Y caused X.

   • Both could have been caused by something else (Z?).
   • They could have moved together just by chance.

• Just because X and Y are uncorrelated does not mean that X did not cause Y.

• Refer to the previous slide. If X causes Y in a nonlinear manner, the correlation can come out to be zero.

Properties of the Correlation Coefficient

• As discussed in previous slides, neither the scales used for the chart nor the units used for the variables (pounds or kilograms, centimeters or inches) change the correlation coefficient.

• In Descriptive Statistics I we learned how adding and multiplying by constants changed the descriptive statistics (mean, standard deviation, range, etc.), but how does this affect the correlation coefficient?

Properties of the Correlation Coefficient

What happens to the correlation coefficient if we add constants to X and/or Y, or multiply X and/or Y by non-zero constants?

If X and Y are positively correlated, X and –Y will be negatively correlated.

• Adding a constant to X and/or Y will not change the correlation coefficient.

• Multiplying X and/or Y by a constant can change the sign of the correlation coefficient (if we multiply by a negative constant) but not the magnitude.
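These properties can be checked numerically. The sketch below, in Python rather than Excel, uses the height and weight data; the particular constants (+5, ×1.05, ×1.02, ×-1) are illustrative choices, and `correlation` is a helper defined here for the example.

```python
import math

heights = [53, 57, 60, 61, 55, 52, 59, 54, 56, 57, 56, 54, 61, 59, 61,
           52, 59, 56, 65, 57, 59, 58, 60, 55, 56, 58, 59, 56, 54, 53]
weights = [57, 72, 121, 110, 70, 55, 97, 68, 79, 77, 61, 61, 79, 105, 79,
           68, 74, 70, 103, 81, 101, 79, 103, 72, 92, 70, 70, 63, 74, 66]


def correlation(xs, ys):
    """Pearson correlation computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = correlation(heights, weights)

# Adding a constant to one variable leaves the correlation unchanged.
r_added = correlation(heights, [w + 5 for w in weights])

# Multiplying by positive constants leaves the correlation unchanged.
r_scaled = correlation([h * 1.05 for h in heights], [w * 1.02 for w in weights])

# Multiplying one variable by a negative constant flips the sign only.
r_negated = correlation(heights, [-w for w in weights])

print(round(r, 3), round(r_added, 3), round(r_scaled, 3), round(r_negated, 3))
```

The first three values match and the last is their negative, in line with the two bullets above.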

The Correlation Coefficient

Question 1: The correlation coefficient for the height and weight data was 0.72. If the device used to measure these weights understated each weight by 5 pounds, what will be the correlation between height and the corrected weights?

Answer:

0.72!

Adding 5 to each weight will not change the correlation coefficient.

The Correlation Coefficient

Question 2: Over the course of a year, each student’s height increased 5% and each weight increased 2%. What is the new correlation coefficient?

Answer:

0.72!

Multiplying all heights by 1.05 and all weights by 1.02 will not change the correlation coefficient.

The Correlation Coefficient

Question 3: If the tallest student loses 5 pounds and the shortest student gains 5 pounds, what will happen to the correlation coefficient?

Answer:

It will be less than 0.72.

Since the data started out being positively correlated, moving the Y value for a high X down and the Y value for a low X up will make the scatter plot flatter. The correlation coefficient will be less than 0.72.

In contrast, if the tallest student gained weight and/or the shortest student lost weight, the correlation coefficient would increase.
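Question 3 can also be checked by recomputing the coefficient. This is a minimal Python sketch under one stated assumption: two students tie for shortest at 52 inches, and the code adjusts the first of them (student 6). The helper `correlation` is defined here for the example.

```python
import math

heights = [53, 57, 60, 61, 55, 52, 59, 54, 56, 57, 56, 54, 61, 59, 61,
           52, 59, 56, 65, 57, 59, 58, 60, 55, 56, 58, 59, 56, 54, 53]
weights = [57, 72, 121, 110, 70, 55, 97, 68, 79, 77, 61, 61, 79, 105, 79,
           68, 74, 70, 103, 81, 101, 79, 103, 72, 92, 70, 70, 63, 74, 66]


def correlation(xs, ys):
    """Pearson correlation computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


tallest = heights.index(max(heights))    # student 19 (65 inches)
shortest = heights.index(min(heights))   # assumption: first 52-inch student (student 6)

adjusted = list(weights)
adjusted[tallest] -= 5    # tallest student loses 5 pounds
adjusted[shortest] += 5   # shortest student gains 5 pounds

r_before = correlation(heights, weights)
r_after = correlation(heights, adjusted)
print(round(r_before, 3), round(r_after, 3))
```

Adjusting the other 52-inch student instead also lowers the coefficient; only the size of the drop differs.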

The Regression Line

• The correlation between height and weight was 0.72.

• So we know the relationship is positive (taller students tend to weigh more), and 0.72 measures the “strength” of the relationship.

• But other than as a relative measure of “strength” is there any other direct use for the correlation coefficient?

Not Really!

[Figure: scatter plot of Weight (pounds) versus Height (inches).]

The Regression Line

• So if the correlation coefficient left you longing for something a little more useful, you are going to like the regression line.

• Since height and weight are correlated, we should be able to use one to help predict the other.

• The regression line is the way to accomplish that prediction task.

• If a new student is 61 inches tall, how can we predict what that student will weigh?

[Figure: scatter plot of Weight (pounds) versus Height (inches).]

The Regression Line

If a new student is 61 inches tall, how can we predict what that student will weigh?

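• One approach would be to predict (110+79)/2 = 94.5 pounds. This is the average of the two weights observed at 61 inches. (Students 13 and 15 are both 61 inches and 79 pounds, so they share one point on the plot; averaging all three 61-inch students would give about 89.3 pounds.)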

[Figure: scatter plot of Weight (pounds) versus Height (inches), with the points at 61 inches called out.]

One 61-inch tall student weighed 110 pounds.

The other 61-inch tall student weighed 79 pounds.

The Regression Line

If a new student is 61 inches tall, how can we predict what that student will weigh?

• But rather than base the prediction on only two data values, a regression line lets us use ALL the data.

• If you have charted the data in Excel, it is very easy to find the regression line.

[Figure: scatter plot of Weight (pounds) versus Height (inches).]

The Regression Line

Finding the Regression Line

• Right-click on the charted data.

• Select “Add Trendline”

• Select “Display Equation on chart” and “Display R-squared value on chart”.

[Figure: scatter plot of Weight (pounds) versus Height (inches).]

The Regression Line

Finding the Regression Line

• We ran a regression of weight on height using the “Add Trendline” option on the graph in Excel.*

• Weight was the Y or dependent variable

• Height was the X or independent variable

• We regressed weight on height to find an equation (the regression line) that can be used to predict weight based on height.

[Figure: scatter plot of Weight (pounds) versus Height (inches) with the fitted trendline.]

The regression line! (Excel output)

f(x) = 3.97103213242454 x − 147.38023369036

R² = 0.515142660874044

*Note that there is an alternative way to run regression that provides a more complete set of regression output using the Excel Analysis Toolpak Add-in, but that is beyond the scope of this module.

The Regression Line

Finding the Regression Line

• Predicted weight = 3.971 * height – 147.38

• For the new student…

• Predicted weight = 3.971 * 61 – 147.38

• Therefore, predicted weight = 94.9 pounds.
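The same line can be found without a chart. The sketch below uses the ordinary least-squares formulas (slope = co-deviation of X and Y divided by squared deviation of X; intercept chosen so the line passes through the means). It is a Python stand-in for Excel's trendline, and the names `fit_line` and `predict_weight` are hypothetical.

```python
heights = [53, 57, 60, 61, 55, 52, 59, 54, 56, 57, 56, 54, 61, 59, 61,
           52, 59, 56, 65, 57, 59, 58, 60, 55, 56, 58, 59, 56, 54, 53]
weights = [57, 72, 121, 110, 70, 55, 97, 68, 79, 77, 61, 61, 79, 105, 79,
           68, 74, 70, 103, 81, 101, 79, 103, 72, 92, 70, 70, 63, 74, 66]


def fit_line(xs, ys):
    """Least-squares regression of ys on xs; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(heights, weights)


def predict_weight(height):
    """Predicted weight (pounds) for a given height (inches)."""
    return intercept + slope * height


print(round(slope, 3), round(intercept, 2))
print(round(predict_weight(61), 1))
```

The fitted coefficients agree with the trendline equation shown on the chart, and the prediction for a 61-inch student uses all 30 data points rather than only the students who happen to be 61 inches tall.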

[Figure: scatter plot of Weight (pounds) versus Height (inches) with the regression line, f(x) = 3.97103213242454 x − 147.38023369036, R² = 0.515142660874044.]

The Regression Line

Predicted Weight = 3.971 * Height – 147.38

• The 3.971 number is called the regression coefficient.

• The -147.38 is called the regression intercept.

• If X and Y are positively correlated, the regression coefficient will be positive (and vice versa)

• If X and Y are negatively correlated, the regression coefficient will be negative.

• If X and Y are UN-correlated, the regression coefficient will be zero.

[Figure: scatter plot of Weight (pounds) versus Height (inches) with the regression line, f(x) = 3.97103213242454 x − 147.38023369036, R² = 0.515142660874044.]

The Regression Line

Predicted Weight = 3.971 * Height – 147.38

• We also asked Excel to calculate and display the R-squared for the regression.

• For this regression, R-squared was 0.5151.

• R-squared is also a measure of the strength of the linear relationship.

• So both the correlation coefficient and R-squared measure the strength of the linear relationship? Why do we need two?

• We don’t. One is simply the square of the other.

• The square of the correlation coefficient is the R-squared.

• 0.718^2 = 0.515
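To see that the two numbers really are the same thing, the sketch below computes R-squared independently, as one minus the ratio of the squared prediction errors to the total squared deviation of Y (a standard definition, assumed here since the slides do not spell it out), and compares it with the squared correlation coefficient. Variable names are illustrative.

```python
import math

heights = [53, 57, 60, 61, 55, 52, 59, 54, 56, 57, 56, 54, 61, 59, 61,
           52, 59, 56, 65, 57, 59, 58, 60, 55, 56, 58, 59, 56, 54, 53]
weights = [57, 72, 121, 110, 70, 55, 97, 68, 79, 77, 61, 61, 79, 105, 79,
           68, 74, 70, 103, 81, 101, 79, 103, 72, 92, 70, 70, 63, 74, 66]

n = len(heights)
mean_x = sum(heights) / n
mean_y = sum(weights) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(heights, weights))
sxx = sum((x - mean_x) ** 2 for x in heights)
syy = sum((y - mean_y) ** 2 for y in weights)

# Regression of weight on height.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# R-squared from the regression's prediction errors.
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(heights, weights))
r_squared = 1 - sse / syy

# Correlation coefficient, computed separately.
r = sxy / math.sqrt(sxx * syy)

print(round(r, 3), round(r ** 2, 3), round(r_squared, 3))
```

The last two printed values coincide, which is the relationship stated in the bullets above.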

[Figure: scatter plot of Weight (pounds) versus Height (inches) with the regression line, f(x) = 3.97103213242454 x − 147.38023369036, R² = 0.515142660874044. The R² value is labeled “The R-squared”.]

The Regression Line

Predicted Weight = 3.971 * Height – 147.38

• One way to think about the correlation coefficient is as a summary of the regression of Y on X.

• The sign of the correlation coefficient tells you the sign of the regression coefficient.

• And the square of the correlation coefficient tells you the R-squared of the regression, a measure of the ability of the regression line to predict Y.

[Figure: scatter plot of Weight (pounds) versus Height (inches) with the regression line, f(x) = 3.97103213242454 x − 147.38023369036, R² = 0.515142660874044.]

The R-squared is the square of the correlation coefficient

The Regression Line: Example

Question 4: A new student is surprisingly short…just 50 inches tall. What is the predicted weight of this new student based on the above regression line?

Answer:

Just substitute X=50 into the regression equation.

Predicted Weight = 3.971 * 50 – 147.38 = 51.2 pounds.

Regression Line: Predicted Weight = 3.971 * Height – 147.38

The Regression Line

Question 5: A new student is of average height (57.07 inches from the summary statistics given earlier). Will this new student weigh more or less than the average?

Answer:

Substitute X=57.07 into the regression equation.

Predicted Weight = 3.971 * 57.07 – 147.38 ≈ 79.2 pounds (79.23 pounds when the unrounded coefficients and mean height are used).

The sample mean weight of the 30 students was also 79.23 pounds.

THIS IS NOT A COINCIDENCE!

The regression prediction for the sample mean X is ALWAYS the sample mean of the Y’s.

Regression Line: Predicted Weight = 3.971 * Height – 147.38
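The claim that the line always passes through the point of means follows from how the intercept is chosen (mean of Y minus slope times mean of X). A short Python check, with variable names chosen for illustration:

```python
heights = [53, 57, 60, 61, 55, 52, 59, 54, 56, 57, 56, 54, 61, 59, 61,
           52, 59, 56, 65, 57, 59, 58, 60, 55, 56, 58, 59, 56, 54, 53]
weights = [57, 72, 121, 110, 70, 55, 97, 68, 79, 77, 61, 61, 79, 105, 79,
           68, 74, 70, 103, 81, 101, 79, 103, 72, 92, 70, 70, 63, 74, 66]

n = len(heights)
mean_height = sum(heights) / n
mean_weight = sum(weights) / n

sxy = sum((x - mean_height) * (y - mean_weight) for x, y in zip(heights, weights))
sxx = sum((x - mean_height) ** 2 for x in heights)
slope = sxy / sxx
intercept = mean_weight - slope * mean_height   # forces the line through the means

prediction_at_mean = intercept + slope * mean_height
print(round(mean_weight, 2), round(prediction_at_mean, 2))
```

Substituting the intercept into the prediction gives mean weight − slope × mean height + slope × mean height, so the slope terms cancel whatever the data are.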

The Regression Line

Question 6: Using the same 30 data points, suppose we regress height on weight (rather than weight on height). Will the resulting regression coefficient be positive, negative, or zero? What will be the resulting R-squared?

Answer:

Because the correlation between X and Y is the same as the correlation between Y and X, the regression of Y on X has the same R-squared as the regression of X on Y.

R-squared for the new regression will be 0.515

The coefficient will be positive (because the variables are positively correlated), but it will not equal 1/3.971. To find the new coefficient, one has to run the regression.

Regression Line: Predicted Weight = 3.971 * Height – 147.38

R-squared = 0.515
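Running the reverse regression is straightforward in code. The sketch below (Python, with a hypothetical helper `slope_of`) fits both directions and shows that the two slopes are not reciprocals; instead their product equals the R-squared, a standard property of simple regression that the slides do not state explicitly.

```python
heights = [53, 57, 60, 61, 55, 52, 59, 54, 56, 57, 56, 54, 61, 59, 61,
           52, 59, 56, 65, 57, 59, 58, 60, 55, 56, 58, 59, 56, 54, 53]
weights = [57, 72, 121, 110, 70, 55, 97, 68, 79, 77, 61, 61, 79, 105, 79,
           68, 74, 70, 103, 81, 101, 79, 103, 72, 92, 70, 70, 63, 74, 66]


def slope_of(ys, xs):
    """Least-squares slope from regressing ys on xs (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


weight_on_height = slope_of(weights, heights)   # pounds per inch
height_on_weight = slope_of(heights, weights)   # inches per pound

reciprocal = 1 / weight_on_height
product = weight_on_height * height_on_weight   # equals R-squared

print(round(weight_on_height, 3), round(height_on_weight, 3))
print(round(reciprocal, 3), round(product, 3))
```

The height-on-weight coefficient is positive, as the answer predicts, and noticeably smaller than 1/3.971.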

Descriptive Statistics - Further Reference

Any introductory statistics book, such as Introductory Statistics (9th Edition), Neil A. Weiss, Pearson Publishing, 2010.