COMPLETE BUSINESS STATISTICS, fourth edition
Aczel
Irwin/McGraw-Hill © The McGraw-Hill Companies, Inc., 1999

Page 1: Using Statistics The Simple Linear Regression Model Estimation: The Method of Least Squares

• Using Statistics

• The Simple Linear Regression Model

• Estimation: The Method of Least Squares

• Error Variance and the Standard Errors of Regression Estimators

• Correlation

• Hypothesis Tests about the Regression Relationship

• How Good is the Regression?

• Analysis of Variance Table and an F Test of the Regression Model

• Residual Analysis and Checking for Model Inadequacies

• Use of the Regression Model for Prediction

• Using the Computer

• Summary and Review of Terms

10 Simple Linear Regression and Correlation

Page 2: Using Statistics The Simple Linear Regression Model Estimation: The Method of Least Squares

This scatterplot locates pairs of observations of advertising expenditures on the x-axis and sales on the y-axis. We notice that:

Larger (smaller) values of sales tend to be associated with larger (smaller) values of advertising.

[Figure: Scatterplot of Advertising Expenditures (X) and Sales (Y). Horizontal axis: Advertising, 0 to 50; vertical axis: Sales, 0 to 140.]

The scatter of points tends to be distributed around a positively sloped straight line.

The pairs of values of advertising expenditures and sales are not located exactly on a straight line. The scatter plot reveals a more or less strong tendency rather than a precise linear relationship. The line represents the nature of the relationship on average.

10-1 Using Statistics

Page 3: Using Statistics The Simple Linear Regression Model Estimation: The Method of Least Squares

[Figure: a set of scatterplot panels, each plotting Y against X.]

Examples of Other Scatterplots

Page 4: Using Statistics The Simple Linear Regression Model Estimation: The Method of Least Squares

The inexact nature of the relationship between advertising and sales suggests that a statistical model might be useful in analyzing the relationship.

A statistical model separates the systematic component of a relationship from the random component.

[Diagram: Data → Statistical model → Systematic component + Random errors]

In ANOVA, the systematic component is the variation of means between samples or treatments (SSTR) and the random component is the unexplained variation (SSE).

In regression, the systematic component is the overall linear relationship, and the random component is the variation around the line.

Model Building

Page 5: Using Statistics The Simple Linear Regression Model Estimation: The Method of Least Squares

The population simple linear regression model:

$$Y = \beta_0 + \beta_1 X + \varepsilon$$

Here $\beta_0 + \beta_1 X$ is the nonrandom or systematic component and $\varepsilon$ is the random component,

where Y is the dependent variable, the variable we wish to explain or predict; X is the independent variable, also called the predictor variable; and $\varepsilon$ is the error term, the only random component in the model, and thus the only source of randomness in Y.

$\beta_0$ is the intercept of the systematic component of the regression relationship. $\beta_1$ is the slope of the systematic component.

The conditional mean of Y:

$$E[Y \mid X] = \beta_0 + \beta_1 X$$

10-2 The Simple Linear Regression Model
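The split between the systematic and the random component can be illustrated by simulation. The sketch below is not from the slides: the function name `simulate_regression`, the parameter values, and the seed are hypothetical choices made only for demonstration.

```python
import random

def simulate_regression(xs, beta0, beta1, sigma, seed=0):
    """Draw Y = beta0 + beta1*X + error for fixed X values.

    The error is normal with mean 0 and standard deviation sigma,
    so it is the only source of randomness in Y.
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]

# Hypothetical values, chosen only for illustration.
xs = [10, 20, 30, 40, 50]
ys = simulate_regression(xs, beta0=5.0, beta1=2.0, sigma=3.0)

# With sigma = 0 there is no random component, so every Y lies
# exactly on the line E[Y|X] = beta0 + beta1*X.
on_line = simulate_regression(xs, beta0=5.0, beta1=2.0, sigma=0.0)
```

Setting `sigma` to 0 removes the random component and leaves only the conditional mean.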

Page 6: Using Statistics The Simple Linear Regression Model Estimation: The Method of Least Squares

The simple linear regression model posits an exact linear relationship between the expected or average value of Y, the dependent variable, and X, the independent or predictor variable:

$$E[Y_i] = \beta_0 + \beta_1 X_i$$

Actual observed values of Y differ from the expected value by an unexplained or random error:

$$Y_i = E[Y_i] + \varepsilon_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

[Figure: Regression Plot. The line $E[Y] = \beta_0 + \beta_1 X$ is drawn with intercept $\beta_0$ and slope $\beta_1$ (the rise per unit of X); the error $\varepsilon_i$ is the vertical distance between the observed $Y_i$ and the line at $X_i$.]

Picturing the Simple Linear Regression Model

Page 7: Using Statistics The Simple Linear Regression Model Estimation: The Method of Least Squares

• The relationship between X and Y is a straight-line relationship.

• The values of the independent variable X are assumed fixed (not random); the only randomness in the values of Y comes from the error term $\varepsilon_i$.

• The errors $\varepsilon_i$ are normally distributed with mean 0 and variance $\sigma^2$. The errors are uncorrelated (not related) in successive observations. That is: $\varepsilon \sim N(0, \sigma^2)$.

[Figure: Identical normal distributions of errors, all centered on the regression line $E[Y] = \beta_0 + \beta_1 X$.]

Assumptions of the Simple Linear Regression Model

Page 8: Using Statistics The Simple Linear Regression Model Estimation: The Method of Least Squares

Estimation of a simple linear regression relationship involves finding estimated or predicted values of the intercept and slope of the linear regression line.

The estimated regression equation:

$$Y = b_0 + b_1 X + e$$

where $b_0$ estimates the intercept of the population regression line, $\beta_0$; $b_1$ estimates the slope of the population regression line, $\beta_1$; and $e$ stands for the observed errors, the residuals from fitting the estimated regression line $b_0 + b_1 X$ to a set of n points.

The estimated regression line:

$$\hat{Y} = b_0 + b_1 X$$

where $\hat{Y}$ (Y-hat) is the value of Y lying on the fitted regression line for a given value of X.

10-3 Estimation: The Method of Least Squares

Page 9: Using Statistics The Simple Linear Regression Model Estimation: The Method of Least Squares

Fitting a Regression Line

[Figure: three panels plotting Y against X: the data; three errors from a fitted line; three errors from the least squares regression line.]

The least squares regression line is the line that minimizes the sum of the squared errors.

Page 10: Using Statistics The Simple Linear Regression Model Estimation: The Method of Least Squares

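[Figure: the fitted regression line $\hat{Y} = b_0 + b_1 X$, with an observed value $Y_i$ at $X_i$ and the predicted value $\hat{Y}_i$ on the line.]

Error: $e_i = Y_i - \hat{Y}_i$, where $\hat{Y}_i$ is the predicted value of Y for $X_i$.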

Errors in Regression

Page 11: Using Statistics The Simple Linear Regression Model Estimation: The Method of Least Squares

Least Squares Regression

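The sum of squared errors in regression is:

$$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

The least squares regression line is that which minimizes the SSE with respect to the estimates $b_0$ and $b_1$.

The normal equations:

$$\sum_{i=1}^{n} y_i = n b_0 + b_1 \sum_{i=1}^{n} x_i$$

$$\sum_{i=1}^{n} x_i y_i = b_0 \sum_{i=1}^{n} x_i + b_1 \sum_{i=1}^{n} x_i^2$$

[Figure: SSE plotted as a surface over $b_0$ and $b_1$; its minimum is at the least squares $b_0$ and the least squares $b_1$.]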
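The slides do not show how the normal equations lead to the estimators. One standard route, sketched here as a worked derivation, writes $SS_X = \sum x_i^2 - (\sum x_i)^2/n$ and $SS_{XY} = \sum x_i y_i - (\sum x_i)(\sum y_i)/n$, solves the first normal equation for $b_0$, and substitutes the result into the second.

```latex
\begin{aligned}
\sum y_i = n b_0 + b_1 \sum x_i
  \;&\Rightarrow\; b_0 = \bar{y} - b_1 \bar{x} \\
\sum x_i y_i = (\bar{y} - b_1 \bar{x}) \sum x_i + b_1 \sum x_i^2
  \;&\Rightarrow\; \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}
     = b_1 \left( \sum x_i^2 - \frac{(\sum x_i)^2}{n} \right) \\
  \;&\Rightarrow\; b_1 = \frac{SS_{XY}}{SS_X}
\end{aligned}
```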

Page 12: Using Statistics The Simple Linear Regression Model Estimation: The Method of Least Squares

Sums of Squares and Cross Products:

$$SS_X = \sum (x - \bar{x})^2 = \sum x^2 - \frac{(\sum x)^2}{n}$$

$$SS_Y = \sum (y - \bar{y})^2 = \sum y^2 - \frac{(\sum y)^2}{n}$$

$$SS_{XY} = \sum (x - \bar{x})(y - \bar{y}) = \sum xy - \frac{(\sum x)(\sum y)}{n}$$

Least squares regression estimators:

$$b_1 = \frac{SS_{XY}}{SS_X}$$

$$b_0 = \bar{y} - b_1 \bar{x}$$

Sums of Squares, Cross Products, and Least Squares Estimators

Page 13: Using Statistics The Simple Linear Regression Model Estimation: The Method of Least Squares

Miles    Dollars    Miles^2      Miles*Dollars
1211     1802       1466521      2182222
1345     2405       1809025      3234725
1422     2005       2022084      2851110
1687     2511       2845969      4236057
1849     2332       3418801      4311868
2026     2305       4104676      4669930
2133     3016       4549689      6433128
2253     3385       5076009      7626405
2400     3090       5760000      7416000
2468     3694       6091024      9116792
2699     3371       7284601      9098329
2806     3998       7873636      11218388
3082     3555       9498724      10956510
3209     4692       10297681     15056628
3466     4244       12013156     14709704
3643     5298       13271449     19300614
3852     4801       14837904     18493452
4033     5147       16265089     20757852
4267     5738       18207288     24484046
4498     6420       20232004     28877160
4533     6059       20548088     27465448
4804     6426       23078416     30870504
5090     6321       25908100     32173890
5233     7026       27384288     36767056
5439     6964       29582720     37877196
Total:
79448    106605     293426944    390185024

$$SS_X = \sum x^2 - \frac{(\sum x)^2}{n} = 293426944 - \frac{79448^2}{25} \approx 40947552$$

$$SS_{XY} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 390185024 - \frac{(79448)(106605)}{25} \approx 51402848$$

$$b_1 = \frac{SS_{XY}}{SS_X} = \frac{51402848}{40947552} \approx 1.255333776 \approx 1.26$$

$$b_0 = \bar{y} - b_1 \bar{x} = \frac{106605}{25} - (1.255333776)\left(\frac{79448}{25}\right) = 274.85$$

Example 10-1
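The estimators for Example 10-1 can be recomputed directly from the 25 (Miles, Dollars) pairs. This is a minimal sketch using only the Python standard library; the variable names are illustrative, and the sums are recomputed from the data rather than copied from the table's total row, so the last digits may differ slightly from the rounded totals printed on the slide.

```python
# Example 10-1 data: miles traveled (x) and dollars charged (y).
miles = [1211, 1345, 1422, 1687, 1849, 2026, 2133, 2253, 2400, 2468,
         2699, 2806, 3082, 3209, 3466, 3643, 3852, 4033, 4267, 4498,
         4533, 4804, 5090, 5233, 5439]
dollars = [1802, 2405, 2005, 2511, 2332, 2305, 3016, 3385, 3090, 3694,
           3371, 3998, 3555, 4692, 4244, 5298, 4801, 5147, 5738, 6420,
           6059, 6426, 6321, 7026, 6964]

n = len(miles)
sum_x = sum(miles)
sum_y = sum(dollars)
sum_x2 = sum(x * x for x in miles)
sum_xy = sum(x * y for x, y in zip(miles, dollars))

# Sums of squares and cross products (computational formulas).
ss_x = sum_x2 - sum_x ** 2 / n
ss_xy = sum_xy - sum_x * sum_y / n

# Least squares estimators of slope and intercept.
b1 = ss_xy / ss_x
b0 = sum_y / n - b1 * sum_x / n

print(f"SS_X = {ss_x:.2f}, SS_XY = {ss_xy:.2f}")
print(f"b1 = {b1:.5f}, b0 = {b0:.2f}")
```

The results agree with the values SS_X = 40947557.84 and SS_XY = 51402852.4 used on the later slides.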

Page 14: Using Statistics The Simple Linear Regression Model Estimation: The Method of Least Squares

[Figure: Regression of Dollars Charged against Miles. Scatterplot of Dollars (1000 to 8000) against Miles (1000 to 5500) with the fitted line Y = 274.850 + 1.25533X; R-Squared = 0.965.]

MTB > Regress 'Dollars' 1 'Miles';
SUBC> Constant.

Regression Analysis

The regression equation is
Dollars = 275 + 1.26 Miles

Predictor    Coef       Stdev      t-ratio    p
Constant     274.8      170.3      1.61       0.120
Miles        1.25533    0.04972    25.25      0.000

s = 318.2    R-sq = 96.5%    R-sq(adj) = 96.4%

Analysis of Variance

SOURCE        DF    SS          MS          F         p
Regression    1     64527736    64527736    637.47    0.000
Error         23    2328161     101224
Total         24    66855896

Example 10-1: Using the Computer

Page 15: Using Statistics The Simple Linear Regression Model Estimation: The Method of Least Squares

The results below are the output created by selecting the REGRESSION option from the DATA ANALYSIS toolkit.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.98243393
R Square             0.965176428
Adjusted R Square    0.963662359
Standard Error       318.1578225
Observations         25

ANOVA
              df    SS             MS            F              Significance F
Regression    1     64527736.8     64527736.8    637.4721586    2.85084E-18
Residual      23    2328161.201    101224.4
Total         24    66855898

             Coefficients    Standard Error    t Stat        P-value        Lower 95%       Upper 95%      Lower 95.0%     Upper 95.0%
Intercept    274.8496867     170.3368437       1.61356569    0.120259309    -77.51844165    627.217815     -77.51844165    627.217815
MILES        1.255333776     0.049719712       25.248211     2.85084E-18    1.152480856     1.358186696    1.152480856     1.358186696

Example 10-1: Using Computer-Excel

Page 16: Using Statistics The Simple Linear Regression Model Estimation: The Method of Least Squares

Residual Analysis. The plot shows the absence of a relationship between the residuals and the X-values (miles).

[Figure: Residuals vs. Miles. Residuals (-800 to 600) plotted against Miles (0 to 6000).]

Example 10-1: Using Computer-Excel

Page 17: Using Statistics The Simple Linear Regression Model Estimation: The Method of Least Squares

[Figure: two panels plotting Y against X. First panel: what you see when looking at the total variation of Y. Second panel: what you see when looking along the regression line at the error variance of Y.]

Total Variance and Error Variance

Page 18: Using Statistics The Simple Linear Regression Model Estimation: The Method of Least Squares

Degrees of Freedom in Regression:

df = (n - 2) (n total observations less one degree of freedom for each parameter estimated ($b_0$ and $b_1$))

$$SSE = \sum (Y - \hat{Y})^2 = SS_Y - \frac{(SS_{XY})^2}{SS_X} = SS_Y - b_1 SS_{XY}$$

An unbiased estimator of $\sigma^2$, denoted by $s^2$:

$$MSE = \frac{SSE}{n - 2}$$

[Figure: Y plotted against X with the regression line. Square and sum all regression errors to find SSE.]

Example 10-1:

$$SSE = SS_Y - b_1 SS_{XY} = 66855898 - (1.255333776)(51402852.4) = 2328161.2$$

$$MSE = \frac{SSE}{n - 2} = \frac{2328161.2}{23} = 101224.4$$

$$s = \sqrt{MSE} = \sqrt{101224.4} = 318.158$$

10-4 Error Variance and the Standard Errors of Regression Estimators
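The error-variance arithmetic for Example 10-1 can be checked in a few lines. This sketch assumes the summary values quoted on the slides (SS_X = 40947557.84, SS_Y = 66855898, SS_XY = 51402852.4, n = 25); the variable names are illustrative.

```python
import math

# Summary statistics for Example 10-1, as quoted on the slides.
n = 25
ss_x = 40947557.84
ss_y = 66855898.0
ss_xy = 51402852.4

b1 = ss_xy / ss_x            # least squares slope
sse = ss_y - b1 * ss_xy      # sum of squared errors
mse = sse / (n - 2)          # unbiased estimator of the error variance
s = math.sqrt(mse)           # standard error of estimate

print(f"SSE = {sse:.1f}, MSE = {mse:.1f}, s = {s:.3f}")
```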

Page 19: Using Statistics The Simple Linear Regression Model Estimation: The Method of Least Squares

The standard error of $b_0$ (intercept):

$$s(b_0) = s \sqrt{\frac{\sum x^2}{n\, SS_X}}, \quad \text{where } s = \sqrt{MSE}$$

The standard error of $b_1$ (slope):

$$s(b_1) = \frac{s}{\sqrt{SS_X}}$$

Example 10-1:

$$s(b_0) = s \sqrt{\frac{\sum x^2}{n\, SS_X}} = 318.158 \sqrt{\frac{293426944}{(25)(40947557.84)}} = 170.338$$

$$s(b_1) = \frac{s}{\sqrt{SS_X}} = \frac{318.158}{\sqrt{40947557.84}} = 0.04972$$

Standard Errors of Estimates in Regression
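As a quick check of the Example 10-1 standard errors, the two formulas can be evaluated with the values quoted on the slides. The sketch below assumes s = 318.158, the printed total of Miles squared (293426944), and SS_X = 40947557.84; the variable names are illustrative.

```python
import math

n = 25
s = 318.158             # standard error of estimate, sqrt(MSE)
sum_x2 = 293426944      # sum of squared Miles, as printed on the slide
ss_x = 40947557.84

# Standard error of the intercept and of the slope.
se_b0 = s * math.sqrt(sum_x2 / (n * ss_x))
se_b1 = s / math.sqrt(ss_x)

print(f"s(b0) = {se_b0:.3f}, s(b1) = {se_b1:.5f}")
```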

Page 20: Using Statistics The Simple Linear Regression Model Estimation: The Method of Least Squares

A $(1 - \alpha)100\%$ confidence interval for $\beta_0$:

$$b_0 \pm t_{\alpha/2,\,(n-2)}\, s(b_0)$$

A $(1 - \alpha)100\%$ confidence interval for $\beta_1$:

$$b_1 \pm t_{\alpha/2,\,(n-2)}\, s(b_1)$$

Example 10-1, 95% Confidence Intervals:

$$b_0 \pm t_{0.025,\,(25-2)}\, s(b_0) = 274.85 \pm (2.069)(170.338) = 274.85 \pm 352.43 = [-77.58,\ 627.28]$$

$$b_1 \pm t_{0.025,\,(25-2)}\, s(b_1) = 1.25533 \pm (2.069)(0.04972) = 1.25533 \pm 0.10287 = [1.15246,\ 1.35820]$$

[Figure: the slope drawn as a triangle with Length = 1 and Height = Slope. Least-squares point estimate: $b_1 = 1.25533$; upper 95% bound on slope: 1.35820; lower 95% bound: 1.15246. A slope of 0 is marked as not a possible value of the regression slope at 95%.]

Confidence Intervals for the Regression Parameters
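The two intervals follow the same pattern, estimate plus or minus critical value times standard error, which a small helper makes explicit. This is a sketch: the helper name `confidence_interval` is hypothetical, and the critical value 2.069 (t with 23 degrees of freedom, 0.025 in each tail) is taken from the slide rather than computed, since the Python standard library has no t distribution.

```python
def confidence_interval(estimate, std_error, t_crit):
    """Return (lower, upper) for estimate +/- t_crit * std_error."""
    margin = t_crit * std_error
    return estimate - margin, estimate + margin

t_crit = 2.069  # t(0.025, 23), as quoted on the slide

ci_b0 = confidence_interval(274.85, 170.338, t_crit)   # intercept
ci_b1 = confidence_interval(1.25533, 0.04972, t_crit)  # slope

# Zero lies outside the slope interval, so a slope of 0 is not
# plausible at the 95% level.
zero_in_slope_ci = ci_b1[0] <= 0.0 <= ci_b1[1]

print(ci_b0, ci_b1, zero_in_slope_ci)
```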

Page 21: Using Statistics The Simple Linear Regression Model Estimation: The Method of Least Squares

The correlation between two random variables, X and Y, is a measure of the degree of linear association between the two variables.

The population correlation, denoted by $\rho$, can take on any value from -1 to 1.

$\rho = -1$ indicates a perfect negative linear relationship
$-1 < \rho < 0$ indicates a negative linear relationship
$\rho = 0$ indicates no linear relationship
$0 < \rho < 1$ indicates a positive linear relationship
$\rho = 1$ indicates a perfect positive linear relationship

The absolute value of $\rho$ indicates the strength or exactness of the relationship.

10-5 Correlation

Page 22: Using Statistics The Simple Linear Regression Model Estimation: The Method of Least Squares

[Figure: six panels plotting Y against X, labeled $\rho = 0$, $\rho = -.8$, $\rho = .8$, $\rho = 0$, $\rho = -1$, and $\rho = 1$.]

Illustrations of Correlation

Page 23: Using Statistics The Simple Linear Regression Model Estimation: The Method of Least Squares

The sample correlation coefficient*:

$$r = \frac{SS_{XY}}{\sqrt{SS_X\, SS_Y}}$$

The population correlation coefficient:

$$\rho = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}$$

The covariance of two random variables X and Y:

$$Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$$

where $\mu_X$ and $\mu_Y$ are the population means of X and Y respectively.

Example 10-1:

$$r = \frac{SS_{XY}}{\sqrt{SS_X\, SS_Y}} = \frac{51402852.4}{\sqrt{(40947557.84)(66855898)}} = \frac{51402852.4}{52321943.29} = 0.9824$$

*Note: If $\rho < 0$, $\beta_1 < 0$. If $\rho = 0$, $\beta_1 = 0$. If $\rho > 0$, $\beta_1 > 0$.

Covariance and Correlation
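The sample correlation for Example 10-1 can be reproduced from the three sums of squares quoted on the slides. A minimal sketch with illustrative variable names:

```python
import math

# Sums of squares and cross products for Example 10-1 (from the slides).
ss_x = 40947557.84
ss_y = 66855898.0
ss_xy = 51402852.4

# Sample correlation coefficient.
r = ss_xy / math.sqrt(ss_x * ss_y)

print(f"r = {r:.4f}")
```

Because the denominator is positive, r takes its sign from SS_XY, just as the slope estimate SS_XY / SS_X does.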

Page 24: Using Statistics The Simple Linear Regression Model Estimation: The Method of Least Squares

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.992265946
R Square             0.984591707
Adjusted R Square    0.98266567
Standard Error       0.279761372
Observations         10

ANOVA
              df    SS             MS             F              Significance F
Regression    1     40.0098686     40.0098686     511.2009204    1.55085E-08
Residual      8     0.626131402    0.078266425
Total         9     40.636

             Coefficients     Standard Error    t Stat          P-value        Lower 95%       Upper 95%      Lower 95.0%     Upper 95.0%
Intercept    -8.762524695     0.594092798       -14.74942084    4.39075E-07    -10.13250603    -7.39254336    -10.13250603    -7.39254336
US           1.423636087      0.062965575       22.60975277     1.55085E-08    1.278437117     1.568835058    1.278437117     1.568835058

RESIDUAL OUTPUT

Observation    Predicted Y    Residuals
1              2.057109569    0.242890431
2              2.484200395    0.115799605
3              3.05365483     -0.15365483
4              3.480745656    -0.280745656
5              3.765472874    -0.065472874
6              4.050200091    0.049799909
7              4.619654526    0.180345474
8              5.758563396    -0.058563396
9              7.466926701    -0.466926701
10             8.463471962    0.436528038

[Figure: X Variable 1 Line Fit Plot. Y and Predicted Y (0 to 10) plotted against X Variable 1 (0 to 15).]

Example 10-2: Using Computer-Excel

Page 25: Using Statistics The Simple Linear Regression Model Estimation: The Method of Least Squares

[Figure: Regression Plot of International (2 to 9) against United States (8 to 12) with the fitted line Y = -8.76252 + 1.42364X; R-Sq = 0.9846.]

Example 10-2: Regression Plot

Page 26: Using Statistics The Simple Linear Regression Model Estimation: The Method of Least Squares

$H_0: \rho = 0$ (No linear relationship)
$H_1: \rho \neq 0$ (Some linear relationship)

Test Statistic:

$$t_{(n-2)} = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$

Example 10-1:

$$t_{(n-2)} = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{0.9824}{\sqrt{\dfrac{1 - 0.9651}{25 - 2}}} = \frac{0.9824}{0.0389} = 25.25$$

$$t_{0.005} = 2.807 < 25.25$$

$H_0$ rejected at 1% level

Hypothesis Tests for the Correlation Coefficient
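The test statistic for the correlation coefficient is easy to evaluate directly. The sketch below uses the unrounded r from the Excel output for Example 10-1 (Multiple R = 0.98243393), so the result differs from the slide's hand calculation only in the third decimal. The function name `correlation_t` is hypothetical, and the critical value 2.807 is taken from the slide.

```python
import math

def correlation_t(r, n):
    """t statistic with n - 2 degrees of freedom for H0: rho = 0."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))

t_stat = correlation_t(0.98243393, 25)
t_crit = 2.807  # t(0.005, 23), as quoted on the slide

# Two-tailed test at the 1% level.
reject_h0 = abs(t_stat) > t_crit

print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")
```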

Page 27: Using Statistics The Simple Linear Regression Model Estimation: The Method of Least Squares

[Figure: three panels plotting Y against X: Constant Y; Unsystematic Variation; Nonlinear Relationship.]

A hypothesis test for the existence of a linear relationship between X and Y:

$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$

Test statistic for the existence of a linear relationship between X and Y:

$$t_{(n-2)} = \frac{b_1}{s(b_1)}$$

where $b_1$ is the least-squares estimate of the regression slope and $s(b_1)$ is the standard error of $b_1$.

When the null hypothesis is true, the statistic has a t distribution with n - 2 degrees of freedom.

Hypothesis Tests about the Regression Relationship

Page 28: Using Statistics The Simple Linear Regression Model Estimation: The Method of Least Squares

Example 10-1:

$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$

$$t_{(n-2)} = \frac{b_1}{s(b_1)} = \frac{1.25533}{0.04972} = 25.25$$

$$t_{(0.005,\,23)} = 2.807 < 25.25$$

$H_0$ is rejected at the 1% level and we may conclude that there is a relationship between charges and miles traveled.

Example 10-3:

$H_0: \beta_1 = 1$
$H_1: \beta_1 \neq 1$

$$t_{(n-2)} = \frac{b_1 - 1}{s(b_1)} = \frac{1.24 - 1}{0.21} = 1.14$$

$$t_{(0.05,\,58)} = 1.671 > 1.14$$

$H_0$ is not rejected at the 10% level. We may not conclude that the beta coefficient is different from 1.

Hypothesis Tests for the Regression Slope
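Both examples use the same statistic, the estimated slope minus its hypothesized value, divided by its standard error. The sketch below evaluates it for Examples 10-1 and 10-3; the function name `slope_t` is hypothetical, and the critical values are those quoted on the slide.

```python
def slope_t(b1, se_b1, beta1_null=0.0):
    """t statistic for H0: beta1 = beta1_null, with n - 2 df."""
    return (b1 - beta1_null) / se_b1

# Example 10-1: H0: beta1 = 0, two-tailed test at the 1% level.
t_101 = slope_t(1.25533, 0.04972)
reject_101 = abs(t_101) > 2.807   # t(0.005, 23)

# Example 10-3: H0: beta1 = 1, two-tailed test at the 10% level.
t_103 = slope_t(1.24, 0.21, beta1_null=1.0)
reject_103 = abs(t_103) > 1.671   # t(0.05, 58)

print(f"Example 10-1: t = {t_101:.2f}, reject H0: {reject_101}")
print(f"Example 10-3: t = {t_103:.2f}, reject H0: {reject_103}")
```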

Page 29: Using Statistics The Simple Linear Regression Model Estimation: The Method of Least Squares

The coefficient of determination, $r^2$, is a descriptive measure of the strength of the regression relationship, a measure of how well the regression line fits the data.

[Figure: Y plotted against X with the regression line, marking for one point the Total Deviation from $\bar{y}$, the Explained Deviation, and the Unexplained Deviation.]

Total Deviation = Unexplained Deviation (Error) + Explained Deviation (Regression)

$$(y - \bar{y}) = (y - \hat{y}) + (\hat{y} - \bar{y})$$

$$\sum (y - \bar{y})^2 = \sum (y - \hat{y})^2 + \sum (\hat{y} - \bar{y})^2$$

SST = SSE + SSR

$$r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

Percentage of total variation explained by the regression.

10-7 How Good is the Regression?

Page 30: Using Statistics The Simple Linear Regression Model Estimation: The Method of Least Squares

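[Figure: three panels plotting Y against X for $r^2 = 0$, $r^2 = 0.50$, and $r^2 = 0.90$, each marking SST and its components SSE and SSR (only SSE and SST are marked when $r^2 = 0$).]

Example 10-1:

$$r^2 = \frac{SSR}{SST} = \frac{64527736.8}{66855898} = 0.96518$$

[Figure: scatterplot of Dollars (2000 to 7000) against Miles (1000 to 5500).]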

The Coefficient of Determination

Page 31: Using Statistics The Simple Linear Regression Model Estimation: The Method of Least Squares

10-8 Analysis of Variance and an F Test of the Regression Model

Example 10-1

Source of Variation    Sum of Squares    Degrees of Freedom    Mean Square    F Ratio    p Value
Regression             64527736.8        1                     64527736.8     637.47     0.000
Error                  2328161.2         23                    101224.4
Total                  66855898.0        24

Source of Variation    Sum of Squares    Degrees of Freedom    Mean Square    F Ratio
Regression             SSR               (1)                   MSR            MSR/MSE
Error                  SSE               (n-2)                 MSE
Total                  SST               (n-1)                 MST
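The entries of the ANOVA table for Example 10-1 can be rebuilt from SST, SSE, and n. This is a sketch using the values quoted in the table; the variable names are illustrative.

```python
n = 25
sst = 66855898.0    # total sum of squares
sse = 2328161.2     # error sum of squares

ssr = sst - sse             # regression sum of squares
msr = ssr / 1               # one degree of freedom for regression
mse = sse / (n - 2)         # n - 2 degrees of freedom for error
f_ratio = msr / mse         # F statistic with (1, n - 2) df
r_squared = ssr / sst       # coefficient of determination

print(f"SSR = {ssr:.1f}, MSE = {mse:.1f}")
print(f"F = {f_ratio:.2f}, r^2 = {r_squared:.5f}")
```

In simple linear regression the F ratio equals the square of the t statistic for the slope, which is why 637.47 is approximately 25.25 squared.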

Page 32: Using Statistics The Simple Linear Regression Model Estimation: The Method of Least Squares

[Figure: four residual plots, each showing residuals scattered about 0.]

• Residuals against x or $\hat{y}$. Homoscedasticity: Residuals appear completely random. No indication of model inadequacy.

• Residuals against x or $\hat{y}$. Heteroscedasticity: Variance of residuals changes when x changes.

• Residuals against Time. Residuals exhibit a linear trend with time.

• Residuals against x or $\hat{y}$. Curved pattern in residuals resulting from underlying nonlinear relationship.

10-9 Residual Analysis and Checking for Model Inadequacies

Page 33: Using Statistics The Simple Linear Regression Model Estimation: The Method of Least Squares

• Point Prediction
  – A single-valued estimate of Y for a given value of X obtained by inserting the value of X in the estimated regression equation.

• Prediction Interval
  – For a value of Y given a value of X
    • Variation in regression line estimate
    • Variation of points around regression line
  – For an average value of Y given a value of X
    • Variation in regression line estimate

10-10 Use of the Regression Model for Prediction

Page 34: Using Statistics The Simple Linear Regression Model Estimation: The Method of Least Squares

[Figure: panels plotting Y against X.]

1) Uncertainty about the slope of the regression line: the regression line is shown with an upper limit on slope and a lower limit on slope.

2) Uncertainty about the intercept of the regression line: the regression line is shown with an upper limit on intercept and a lower limit on intercept.

Errors in Predicting E[Y|X]

Page 35: Using Statistics The Simple Linear Regression Model Estimation: The Method of Least Squares

[Figure: Y against X, showing the regression line and the prediction band for E[Y|X].]

• The prediction band for E[Y|X] is narrowest at the mean value of X.

• The prediction band widens as the distance from the mean of X increases.

• Predictions become very unreliable when we extrapolate beyond the range of the sample itself.

Prediction Interval for E[Y|X]
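The widening of the band can be checked numerically. This is a minimal sketch using the Example 10-1 quantities quoted on the slides (n = 25, mean of X = 3177.92, SSX = 40947557.84, t = 2.069). The value s = 318.16 is an assumption, taken as the square root of the MSE of 101224 in the regression output, and `mean_band_half_width` is an illustrative name.

```python
import math

# Values quoted from Example 10-1 on the slides.
N = 25
X_BAR = 3177.92
SS_X = 40947557.84
T_CRIT = 2.069   # t value used on the slides for a 95% interval
# Assumption: s is sqrt(MSE) with MSE = 101224 from the regression output.
S = 318.16

def mean_band_half_width(x):
    """Half-width of the band for E[Y|X] at a given x."""
    return T_CRIT * S * math.sqrt(1.0 / N + (x - X_BAR) ** 2 / SS_X)

# The half-width is smallest at the mean of X and grows with distance from it.
for x in (X_BAR, 4000, 6000, 8000):
    print(f"x = {x:8.2f}  half-width = {mean_band_half_width(x):7.2f}")
```

The squared distance from the mean of X is the only term that depends on x, which is why the band is narrowest at the mean and keeps widening under extrapolation.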


Page 36: Using Statistics The Simple Linear Regression Model Estimation: The Method of Least Squares


Additional Error in Predicting Individual Value of Y

3) Variation around the regression line

[Figure: Y against X, showing the regression line and the scatter of individual points around it.]

[Figure: Y against X, showing the regression line, the prediction band for E[Y|X], and the wider prediction band for Y.]

Page 37: Using Statistics The Simple Linear Regression Model Estimation: The Method of Least Squares


A $(1-\alpha)100\%$ prediction interval for Y:

$$\hat{y} \pm t_{\alpha/2}\, s(\epsilon) \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{SS_X}}$$

Example 10-1 (X = 4000):

$$\{274.85 + (1.2553)(4000)\} \pm 2.069 \cdot 318.16 \sqrt{1 + \frac{1}{25} + \frac{(4000 - 3177.92)^2}{40947557.84}} = 5296.05 \pm 676.62 = [4619.43,\ 5972.67]$$

Prediction Interval for a Value of Y
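The arithmetic of Example 10-1 can be replayed in a few lines. This is a sketch, not the textbook's own procedure: `prediction_interval_y` is an illustrative name, the t value 2.069 is copied from the slide rather than computed, and s = 318.16 is an assumption taken as the square root of the MSE (101224) in the regression output.

```python
import math

def prediction_interval_y(x, b0, b1, s, n, x_bar, ss_x, t_crit):
    """Interval for an individual Y at x: y_hat +/- t * s * sqrt(1 + 1/n + (x - x_bar)^2 / SSX)."""
    y_hat = b0 + b1 * x
    margin = t_crit * s * math.sqrt(1.0 + 1.0 / n + (x - x_bar) ** 2 / ss_x)
    return y_hat - margin, y_hat + margin

lower, upper = prediction_interval_y(
    x=4000,
    b0=274.85, b1=1.2553,   # Example 10-1 estimates from the slides
    s=318.16,               # assumption: sqrt(MSE) from the regression output
    n=25, x_bar=3177.92, ss_x=40947557.84,
    t_crit=2.069,           # copied from the slide, not computed here
)
print(f"[{lower:.2f}, {upper:.2f}]")
```

The limits should land within rounding of the slide's [4619.43, 5972.67].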

Page 38: Using Statistics The Simple Linear Regression Model Estimation: The Method of Least Squares


A $(1-\alpha)100\%$ prediction interval for E[Y|X]:

$$\hat{y} \pm t_{\alpha/2}\, s(\epsilon) \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{SS_X}}$$

Example 10-1 (X = 4000):

$$\{274.85 + (1.2553)(4000)\} \pm 2.069 \cdot 318.16 \sqrt{\frac{1}{25} + \frac{(4000 - 3177.92)^2}{40947557.84}} = 5296.05 \pm 156.48 = [5139.57,\ 5452.53]$$

Prediction Interval for the Average Value of Y
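The interval for an individual Y and the interval for the average of Y differ only by the leading 1 under the square root, which carries the variation of points around the regression line. As a worked check (assuming s = 318.16, the square root of the MSE in the regression output), the squared half-widths should add up approximately:

$$s^2(\epsilon)\left[1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{SS_X}\right] = s^2(\epsilon) + s^2(\epsilon)\left[\frac{1}{n} + \frac{(x - \bar{x})^2}{SS_X}\right]$$

$$676.62^2 \approx (2.069 \times 318.16)^2 + 156.48^2 \approx 433{,}323 + 24{,}486 = 457{,}809, \qquad \sqrt{457{,}809} \approx 676.6$$

Most of the width of the interval for an individual Y at X = 4000 therefore comes from the scatter around the line, not from uncertainty in the estimated line itself.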

Page 39: Using Statistics The Simple Linear Regression Model Estimation: The Method of Least Squares


MTB > regress 'Dollars' 1 'Miles' tres in C3 fits in C4;
SUBC> predict 4000;
SUBC> residuals in C5.

Regression Analysis

The regression equation is
Dollars = 275 + 1.26 Miles

Predictor       Coef      Stdev    t-ratio        p
Constant       274.8      170.3       1.61    0.120
Miles        1.25533    0.04972      25.25    0.000

s = 318.2     R-sq = 96.5%     R-sq(adj) = 96.4%

Analysis of Variance

SOURCE        DF          SS          MS         F        p
Regression     1    64527736    64527736    637.47    0.000
Error         23     2328161      101224
Total         24    66855896

     Fit  Stdev.Fit         95.0% C.I.           95.0% P.I.
  5296.2       75.6   ( 5139.7, 5452.7)    ( 4619.5, 5972.8)

Using the Computer
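The summary statistics in the output follow from the analysis of variance table. A minimal sketch, with the sums of squares, degrees of freedom, slope, and its standard deviation copied from the output and variable names chosen here for illustration:

```python
import math

# Figures copied from the analysis of variance table in the output.
ssr, sse, sst = 64527736.0, 2328161.0, 66855896.0
df_reg, df_err, df_tot = 1, 23, 24

msr = ssr / df_reg                 # mean square for regression
mse = sse / df_err                 # mean square error
f_stat = msr / mse                 # F ratio for the regression
s = math.sqrt(mse)                 # standard error of estimate
r_sq = ssr / sst                   # coefficient of determination
r_sq_adj = 1.0 - (sse / df_err) / (sst / df_tot)

# t-ratio for the slope, from the coefficient table.
t_slope = 1.25533 / 0.04972

print(f"F = {f_stat:.2f}, s = {s:.1f}")
print(f"R-sq = {100 * r_sq:.1f}%, R-sq(adj) = {100 * r_sq_adj:.1f}%")
print(f"t(slope) = {t_slope:.2f}, t squared = {t_slope ** 2:.2f}")
```

In simple linear regression the squared t-ratio of the slope equals the F ratio up to rounding, so the last line serves as a quick consistency check on the output.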

Page 40: Using Statistics The Simple Linear Regression Model Estimation: The Method of Least Squares


MTB > PLOT 'Resids' * 'Fits'

[Figure: Resids (axis from -500 to 500) plotted against Fits (axis from 2000 to 7000).]

MTB > PLOT 'Resids' * 'Miles'

[Figure: Resids (axis from -500 to 500) plotted against Miles (axis from 1000 to 5500).]

Plotting on the Computer (1)

Page 41: Using Statistics The Simple Linear Regression Model Estimation: The Method of Least Squares


Plotting on the Computer (2)

MTB > HISTOGRAM 'StRes'

[Figure: histogram of StRes (axis from -2 to 2) with Frequency from 0 to 8.]

MTB > PLOT 'Dollars' * 'Miles'

[Figure: Dollars (axis from 2000 to 7000) plotted against Miles (axis from 1000 to 5500).]