106

6-1 Introduction To Empirical Models

  • Upload
    marty

  • View
    34

  • Download
    0

Embed Size (px)

DESCRIPTION

6-1 Introduction To Empirical Models. 6-1 Introduction To Empirical Models. Based on the scatter diagram, it is probably reasonable to assume that the mean of the random variable Y is related to x by the following straight-line relationship:. - PowerPoint PPT Presentation

Citation preview

Page 1: 6-1 Introduction To Empirical Models
Page 2: 6-1 Introduction To Empirical Models
Page 3: 6-1 Introduction To Empirical Models
Page 4: 6-1 Introduction To Empirical Models

6-1 Introduction To Empirical Models

Page 5: 6-1 Introduction To Empirical Models

6-1 Introduction To Empirical Models

Based on the scatter diagram, it is probably reasonable to assume that the mean of the random variable Y is related to x by the following straight-line relationship:

where the slope and intercept of the line are called regression coefficients.The simple linear regression model is given by

where is the random error term.

Page 6: 6-1 Introduction To Empirical Models

6-1 Introduction To Empirical Models

Page 7: 6-1 Introduction To Empirical Models

We think of the regression model as an empirical model.

Suppose that the mean and variance of are 0 and 2, respectively, then

The variance of Y given x is

6-1 Introduction To Empirical Models

Page 8: 6-1 Introduction To Empirical Models

• The true regression model is a line of mean values:

where 1 can be interpreted as the change in the mean of Y for a unit change in x.• Also, the variability of Y at a particular value of x

is determined by the error variance, 2.• This implies there is a distribution of Y-values at

each x and that the variance of this distribution is the same at each x.

6-1 Introduction To Empirical Models

Page 9: 6-1 Introduction To Empirical Models

6-1 Introduction To Empirical Models

Page 10: 6-1 Introduction To Empirical Models

6-1 Introduction To Empirical Models

A Multiple Linear Regression Model:

where = the intercept of the plane, = partial regression coefficients

Page 11: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression

6-2.1 Least Squares Estimation• The case of simple linear regression considers a single regressor or predictor x and a dependent or response variable Y.

• The expected value of Y at each level of x is a random variable:

• We assume that each observation, Y, can be described by the model

Page 12: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression6-2.1 Least Squares Estimation

• Suppose that we have n pairs of observations (x1, y1), (x2, y2), …, (xn, yn).

• The method of least squares is used to estimate the parameters, 0 and 1 by minimizing the sum of the squares of the vertical deviations in Figure 6-6.

Page 13: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression6-2.1 Least Squares Estimation

• Using Equation 6-8, the n observations in the sample can be expressed as

• The sum of the squares of the deviations of the observations from the true regression line is

Page 14: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression6-2.1 Least Squares Estimation

Page 15: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression6-2.1 Least Squares Estimation

Page 16: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression6-2.1 Least Squares Estimation

Page 17: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression6-2.1 Least Squares Estimation

Page 18: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression6-2.1 Least Squares Estimation

Page 19: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression6-2.1 Least Squares Estimation

Page 20: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression6-2.1 Least Squares Estimation

Page 21: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression6-2.1 Least Squares Estimation

Page 22: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression

Sums of Squares and Cross-products Matrix

The Sums of squares and cross-products matrix is a convenient way to summarize the quantities needed to do the hand calculations in regression. It also plays a key role in the internal calculations of the computer. It is outputed from PROC REG and PROC GLM if the XPX option is included on the model statement. The elements are

X’X

Intercept X Y

Interceptn

X

Y

Page 23: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression6-2.1 Least Squares Estimation

Page 24: 6-1 Introduction To Empirical Models

6-2 Simple Linear RegressionRegression Assumptions and Model Properties

Page 25: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression

Regression Assumptions and Model Properties

Page 26: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression

Regression and Analysis of Variance𝑆𝑆𝑇=𝑆𝑦𝑦=∑

𝑖=1

𝑛

(𝑦 𝑖− 𝑦 )2=∑𝑖=1

𝑛

𝑦2 −¿¿¿

Page 27: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression

Regression and Analysis of Variance

Page 28: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression

OPTIONS NOOVP NODATE NONUMBER LS=80;DATA ex61;INPUT salt area @@;LABEL salt='Salt Conc' area='Roadway area';CARDS;3.8 0.19 5.9 0.15 14.1 0.57 10.4 0.414.6 0.7 14.5 0.67 15.1 0.63 11.9 0.4715.5 0.75 9.3 0.6 15.6 0.78 20.8 0.8114.6 0.78 16.6 0.69 25.6 1.3 20.9 1.0529.9 1.52 19.6 1.06 31.3 1.74 32.7 1.62PROC REG; MODEL salt=area/xpx r; PLOT salt*area; /* Scatter Plot*/RUN; QUIT;

Example 6-1

Page 29: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression

salt = 2.6765 +17.547areaN 20 Rsq 0.9514AdjRsq0.9487RMSE 1.7907

Linear Regression of SALT vs AREASa

lt Con

c

0

5

10

15

20

25

30

35

Roadway area0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8

Page 30: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression

Linear Regression of SALT vs AREA The REG Procedure Model: MODEL1 Dependent Variable: salt Salt Conc

Number of Observations Read 20 Number of Observations Used 20----------------------------------------------------------------------------------------- Analysis of Variance Sum of Mean Source DF Squares Square F Value Pr > F Model 1 1130.14924 1130.14924 352.46 <.0001 Error 18 57.71626 3.20646 Corrected Total 19 1187.86550

Root MSE 1.79066 R-Square 0.9514 Dependent Mean 17.13500 Adj R-Sq 0.9487 Coeff Var 10.45030

Parameter Estimates Parameter Standard Variable Label DF Estimate Error t Value Pr > |t| Intercept Intercept 1 2.67655 0.86800 3.08 0.0064 area Roadway area 1 17.54667 0.93463 18.77 <.0001

Linear Regression of SALT vs AREA The REG Procedure Model: MODEL1 Model Crossproducts X'X X'Y Y'Y

Variable Label Intercept area salt Intercept Intercept 20 16.48 342.7 area Roadway area 16.48 17.2502 346.793 salt Salt Conc 342.7 346.793 7060.03-----------------------------------------------------------------------------------------

Page 31: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression

SAS 시스템

The REG Procedure Model: MODEL1 Dependent Variable: salt Salt Conc

Output Statistics

Dependent Predicted Std Error Std Error Student Cook's Obs Variable Value Mean Predict Residual Residual Residual -2-1 0 1 2 D

1 3.8000 6.0104 0.7152 -2.2104 1.642 -1.346 | **| | 0.172 2 5.9000 5.3085 0.7464 0.5915 1.628 0.363 | | | 0.014 3 14.1000 12.6781 0.4655 1.4219 1.729 0.822 | |* | 0.025 4 10.4000 9.6952 0.5633 0.7048 1.700 0.415 | | | 0.009 5 14.6000 14.9592 0.4168 -0.3592 1.741 -0.206 | | | 0.001 6 14.5000 14.4328 0.4255 0.0672 1.739 0.0386 | | | 0.000 7 15.1000 13.7309 0.4395 1.3691 1.736 0.789 | |* | 0.020 8 11.9000 10.9235 0.5194 0.9765 1.714 0.570 | |* | 0.015 9 15.5000 15.8365 0.4063 -0.3365 1.744 -0.193 | | | 0.001 10 9.3000 13.2045 0.4518 -3.9045 1.733 -2.253 | ****| | 0.173 11 15.6000 16.3629 0.4025 -0.7629 1.745 -0.437 | | | 0.005 12 20.8000 16.8893 0.4006 3.9107 1.745 2.241 | |**** | 0.132 13 14.6000 16.3629 0.4025 -1.7629 1.745 -1.010 | **| | 0.027 14 16.6000 14.7837 0.4195 1.8163 1.741 1.043 | |** | 0.032 15 25.6000 25.4872 0.5985 0.1128 1.688 0.0668 | | | 0.000 16 20.9000 21.1005 0.4527 -0.2005 1.732 -0.116 | | | 0.000 17 29.9000 29.3475 0.7639 0.5525 1.620 0.341 | | | 0.013 18 19.6000 21.2760 0.4571 -1.6760 1.731 -0.968 | *| | 0.033 19 31.3000 33.2077 0.9451 -1.9077 1.521 -1.254 | **| | 0.304 20 32.7000 31.1021 0.8449 1.5979 1.579 1.012 | |** | 0.147

Sum of Residuals 0 Sum of Squared Residuals 57.71626 Predicted Residual SS (PRESS) 70.97373

Page 32: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression

6-2.2 Testing Hypothesis in Simple Linear Regression

Use of t-TestsSuppose we wish to test

An appropriate test statistic would be

Page 33: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression

6-2.2 Testing Hypothesis in Simple Linear Regression

Use of t-Tests

We would reject the null hypothesis if

Page 34: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression

6-2.2 Testing Hypothesis in Simple Linear Regression

Use of t-TestsSuppose we wish to test

An appropriate test statistic would be

Page 35: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression

6-2.2 Testing Hypothesis in Simple Linear Regression

Use of t-Tests

We would reject the null hypothesis if

Page 36: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression

6-2.2 Testing Hypothesis in Simple Linear Regression

Use of t-TestsAn important special case of the hypotheses of Equation 6-23 is

These hypotheses relate to the significance of regression.

Failure to reject H0 is equivalent to concluding that there is no linear relationship between x and Y.

Page 37: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression

6-2.2 Testing Hypothesis in Simple Linear Regression

Use of t-Tests

Page 38: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression

6-2.2 Testing Hypothesis in Simple Linear Regression

Use of t-Tests

Page 39: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression

The Analysis of Variance Approach

Page 40: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression

6-2.2 Testing Hypothesis in Simple Linear Regression

The Analysis of Variance Approach

Page 41: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression

6-2.3 Confidence Intervals in Simple Linear Regression

Page 42: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression

6-2.3 Confidence Intervals in Simple Linear Regression

Page 43: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression

Page 44: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression

6-2.4 Prediction of Future Observations

Page 45: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression

6-2.4 Prediction of Future Observations

Page 46: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression

6-2.5 Checking Model Adequacy• Fitting a regression model requires several

assumptions.

1. Errors are uncorrelated random variables with mean zero;

2. Errors have constant variance; and,

3. Errors be normally distributed.• The analyst should always consider the validity of

these assumptions to be doubtful and conduct analyses to examine the adequacy of the model

Page 47: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression

6-2.5 Checking Model Adequacy

• The residuals from a regression model are ei = yi - ŷi , where yi is an actual observation and ŷi is the corresponding fitted value from the regression model.

• Analysis of the residuals is frequently helpful in checking the assumption that the errors are approximately normally distributed with constant variance, and in determining whether additional terms in the model would be useful.

Page 48: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression

6-2.5 Checking Model Adequacy

• As an approximate check of normality, construct a frequency histogram or a normal probability plot of residuals.

• Standardize the residuals by computing , i = 1, 2, …, n. If the errors are normally distributed, approximately 95% of the standardized residuals should fall in the interval (.

Page 49: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression

6-2.5 Checking Model Adequacy• Plot the residuals (1) in time sequence (if known), (2)

against the , and (3) against the independent variable .

,

Page 50: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression

6-2.5 Checking Model Adequacy

Page 51: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression

6-2.5 Checking Model Adequacy

Page 52: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression6-2.5 Checking Model Adequacy

Page 53: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression

OPTIONS NOOVP NODATE NONUMBER LS=120;DATA ex61;INPUT salt area @@;LABEL salt='Salt Conc' area='Roadway area';CARDS;3.8 0.19 5.9 0.15 14.1 0.57 10.4 0.414.6 0.7 14.5 0.67 15.1 0.63 11.9 0.4715.5 0.75 9.3 0.6 15.6 0.78 20.8 0.8114.6 0.78 16.6 0.69 25.6 1.3 20.9 1.0529.9 1.52 19.6 1.06 31.3 1.74 32.7 1.62PROC REG; MODEL salt=area/CLB CLM CLI; /* CLB for (100-) % confidence limits for the parameter estimates, CLM for (100-) % confidence limits for the expected value of the dependent variable, CLI for (100-) % confidence limits for an individual predicted value */ PLOT salt*area/pred conf; /* pred for (100-) % prediction intervals for individual responses, conf for (100-) % confidence intervals for the mean */ PLOT npp.* residual.; /* Normal probability-probability plot */ PLOT residual.*pred.; /* RESIDUAL PLOT */RUN; QUIT;RUN; QUIT;

Example 6-1 (Continued)

Page 54: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression Linear Regression of SALT vs AREA

The REG Procedure Model: MODEL1 Dependent Variable: salt Salt Conc

Number of Observations Read 20 Number of Observations Used 20

Analysis of Variance

Sum of Mean Source DF Squares Square F Value Pr > F

Model 1 1130.14924 1130.14924 352.46 <.0001 Error 18 57.71626 3.20646 Corrected Total 19 1187.86550

Root MSE 1.79066 R-Square 0.9514 Dependent Mean 17.13500 Adj R-Sq 0.9487 Coeff Var 10.45030

Parameter Estimates

Parameter Standard Variable Label DF Estimate Error t Value Pr > |t|

Intercept Intercept 1 2.67655 0.86800 3.08 0.0064 area Roadway area 1 17.54667 0.93463 18.77 <.0001

Parameter Estimates

Variable Label DF 95% Confidence Limits

Intercept Intercept 1 0.85294 4.50015 area Roadway area 1 15.58308 19.51025

Page 55: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression SAS 시스템

The REG Procedure Model: MODEL1 Dependent Variable: salt Salt Conc

Output Statistics

Dependent Predicted Std Error Std Error Student Cook's Obs Variable Value Mean Predict 95% CL Mean 95% CL Predict Residual Residual Residual -2-1 0 1 2 D

1 3.8000 6.0104 0.7152 4.5079 7.5129 1.9594 10.0614 -2.2104 1.642 -1.346 | **| | 0.172 2 5.9000 5.3085 0.7464 3.7404 6.8767 1.2328 9.3843 0.5915 1.628 0.363 | | | 0.014 3 14.1000 12.6781 0.4655 11.7002 13.6561 8.7911 16.5652 1.4219 1.729 0.822 | |* | 0.025 4 10.4000 9.6952 0.5633 8.5117 10.8788 5.7514 13.6390 0.7048 1.700 0.415 | | | 0.009 5 14.6000 14.9592 0.4168 14.0835 15.8350 11.0966 18.8218 -0.3592 1.741 -0.206 | | | 0.001 6 14.5000 14.4328 0.4255 13.5389 15.3267 10.5660 18.2996 0.0672 1.739 0.0386 | | | 0.000 7 15.1000 13.7309 0.4395 12.8075 14.6544 9.8572 17.6047 1.3691 1.736 0.789 | |* | 0.020 8 11.9000 10.9235 0.5194 9.8322 12.0147 7.0064 14.8406 0.9765 1.714 0.570 | |* | 0.015 9 15.5000 15.8365 0.4063 14.9829 16.6902 11.9789 19.6942 -0.3365 1.744 -0.193 | | | 0.001 10 9.3000 13.2045 0.4518 12.2553 14.1538 9.3246 17.0845 -3.9045 1.733 -2.253 | ****| | 0.173 11 15.6000 16.3629 0.4025 15.5173 17.2086 12.5070 20.2189 -0.7629 1.745 -0.437 | | | 0.005 12 20.8000 16.8893 0.4006 16.0477 17.7310 13.0343 20.7444 3.9107 1.745 2.241 | |**** | 0.132 13 14.6000 16.3629 0.4025 15.5173 17.2086 12.5070 20.2189 -1.7629 1.745 -1.010 | **| | 0.027 14 16.6000 14.7837 0.4195 13.9023 15.6652 10.9198 18.6477 1.8163 1.741 1.043 | |** | 0.032 15 25.6000 25.4872 0.5985 24.2297 26.7447 21.5206 29.4538 0.1128 1.688 0.0668 | | | 0.000 16 20.9000 21.1005 0.4527 20.1495 22.0516 17.2201 24.9809 -0.2005 1.732 -0.116 | | | 0.000 17 29.9000 29.3475 0.7639 27.7427 30.9523 25.2575 33.4375 0.5525 1.620 0.341 | | | 0.013 18 19.6000 21.2760 0.4571 20.3156 22.2364 17.3933 25.1587 -1.6760 1.731 -0.968 | *| | 0.033 19 31.3000 33.2077 0.9451 31.2221 35.1934 28.9538 37.4616 -1.9077 1.521 -1.254 | **| | 0.304 20 32.7000 31.1021 0.8449 29.3271 32.8772 26.9424 35.2619 1.5979 1.579 1.012 | |** | 0.147

Sum of Residuals 0 Sum of Squared Residuals 57.71626 Predicted Residual SS (PRESS) 70.97373

Page 56: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression

Scatter Diagram

Page 57: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression

Normal Probability plot

Page 58: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression

Residual Plot

Page 59: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression

6-2.6 Correlation and Regression

The sample correlation coefficient between X and Y is

• as the population correlation coefficient, which is a measure of the strength of the linear relationship between Y and X in the population or joint distribution.

Page 60: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression

6-2.6 Correlation and Regression

The sample correlation coefficient is also closely related to the slope in a linear regression model

Page 61: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression

Correlation Analysis

Correlation is a measure of the linear relationship between two random variables, X and Y. It is a parameter of the bivariate distribution (joint distribution) of X & Y and will be denoted by the Greek letter rho, .

= Corr(X, Y)

If =1 then X and Y have a perfect direct linear relationship.If =1 then X and Y have a perfect inverse linear relationship.If =0 then X and Y have no linear relationship.If then X and Y are directly related. The closer to 1 the more perfect the linear relationship.If then X and Y are inversely related. The closer to the more perfect the linear relationship.

Page 62: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression

Page 63: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression

6-2.6 Correlation and Regression

It is often useful to test the hypotheses

The appropriate test statistic for these hypotheses is

Reject H0 if |t0| > t/2,n-2.

Page 64: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression

OPTIONS NOOVP NODATE NONUMBER LS=80;DATA ex61;INPUT salt area @@;LABEL salt='Salt Conc' area='Roadway area';CARDS;3.8 0.19 5.9 0.15 14.1 0.57 10.4 0.414.6 0.7 14.5 0.67 15.1 0.63 11.9 0.4715.5 0.75 9.3 0.6 15.6 0.78 20.8 0.8114.6 0.78 16.6 0.69 25.6 1.3 20.9 1.0529.9 1.52 19.6 1.06 31.3 1.74 32.7 1.62PROC CORR DATA=EX61; VAR SALT AREA;TITLE 'Correlation between SALT and AREA';PROC REG; MODEL salt=area/XPX; TITLE 'Linear Regression of SALT vs AREA';RUN; QUIT;

Example 6-1 (Continued)

Page 65: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression

Correlation between SALT and AREA

CORR 프로시저

2 Variables: salt area

단순 통계량

변수 N 평균 표준편차 합 최소값 최대값

salt 20 17.13500 7.90691 342.70000 3.80000 32.70000 area 20 0.82400 0.43954 16.48000 0.15000 1.74000

단순 통계량

변수 레이블

salt Salt Conc area Roadway area

피어슨 상관 계수 , N = 20 H0: Rho=0 가정하에서 Prob > |r|

salt area

salt 1.00000 0.97540 Salt Conc <.0001

area 0.97540 1.00000 Roadway area <.0001

Linear Regression of SALT vs AREA

The REG Procedure Model: MODEL1 Dependent Variable: salt Salt Conc

Number of Observations Read 20 Number of Observations Used 20

Analysis of Variance

Sum of Mean Source DF Squares Square F Value Pr > F

Model 1 1130.14924 1130.14924 352.46 <.0001 Error 18 57.71626 3.20646 Corrected Total 19 1187.86550

Root MSE 1.79066 R-Square 0.9514 Dependent Mean 17.13500 Adj R-Sq 0.9487 Coeff Var 10.45030

Parameter Estimates

Parameter Standard Variable Label DF Estimate Error t Value Pr > |t|

Intercept Intercept 1 2.67655 0.86800 3.08 0.0064 area Roadway area 1 17.54667 0.93463 18.77 <.0001

Page 66: 6-1 Introduction To Empirical Models

6-3 Multiple Regression6-3.1 Estimation of Parameters in Multiple Regression

Page 67: 6-1 Introduction To Empirical Models

6-3 Multiple Regression6-3.1 Estimation of Parameters in Multiple Regression• The least squares function is given by

• The least squares estimates must satisfy

Page 68: 6-1 Introduction To Empirical Models

6-3 Multiple Regression6-3.1 Estimation of Parameters in Multiple Regression• The least squares normal equations are

• The solution to the normal equations are the least squares estimators of the regression coefficients.

Page 69: 6-1 Introduction To Empirical Models

6-3 Multiple Regression

Page 70: 6-1 Introduction To Empirical Models

6-3 Multiple Regression

Page 71: 6-1 Introduction To Empirical Models

6-3 Multiple Regression

Page 72: 6-1 Introduction To Empirical Models

6-3 Multiple Regression

Page 73: 6-1 Introduction To Empirical Models

6-3 Multiple Regression

Page 74: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression

OPTIONS NOOVP NODATE NONUMBER LS=80;DATA ex67;INPUT strength length height @@;CARDS;9.95 2 50 24.45 8 110 31.75 11 120 35 10 55025.02 8 295 16.86 4 200 14.38 2 375 9.6 2 5224.35 9 100 27.5 8 300 17.08 4 412 37 11 40041.95 12 500 11.66 2 360 21.65 4 205 17.89 4 40069 20 600 10.3 1 585 34.93 10 540 46.59 15 25044.88 15 290 54.12 16 510 56.63 17 590 22.13 6 10021.15 5 400PROC SGSCATTER data=ex67; MATRIX STRENGTH LENGTH HEIGHT;TITLE 'Scatter Plot Matrix for Wire Bond Data';PROC REG data=ex67; MODEL strength=length height/xpx r; TITLE 'Multiple Regression';RUN; QUIT;

Example 6-7

Page 75: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression

Page 76: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression Multiple Regression

The REG Procedure Model: MODEL1

Model Crossproducts X'X X'Y Y'Y

Variable Intercept length height strength

Intercept 25 206 8294 725.82 length 206 2396 77177 8008.47 height 8294 77177 3531848 274816.71 strength 725.82 8008.47 274816.71 27178.5316----------------------------------------------------------------------------------------- Multiple Regression

The REG Procedure Model: MODEL1 Dependent Variable: strength

Number of Observations Read 25 Number of Observations Used 25

Analysis of Variance

Sum of Mean Source DF Squares Square F Value Pr > F

Model 2 5990.77122 2995.38561 572.17 <.0001 Error 22 115.17348 5.23516 Corrected Total 24 6105.94470

Root MSE 2.28805 R-Square 0.9811 Dependent Mean 29.03280 Adj R-Sq 0.9794 Coeff Var 7.88090

Parameter Estimates

Parameter Standard Variable DF Estimate Error t Value Pr > |t|

Intercept 1 2.26379 1.06007 2.14 0.0441 length 1 2.74427 0.09352 29.34 <.0001 height 1 0.01253 0.00280 4.48 0.0002

Page 77: 6-1 Introduction To Empirical Models

6-2 Simple Linear Regression Multiple Regression

The REG Procedure Model: MODEL1 Dependent Variable: strength

Output Statistics

Dependent Predicted Std Error Std Error Student Cook's Obs Variable Value Mean Predict Residual Residual Residual -2-1 0 1 2 D

1 9.9500 8.3787 0.9074 1.5713 2.100 0.748 | |* | 0.035 2 24.4500 25.5960 0.7645 -1.1460 2.157 -0.531 | *| | 0.012 3 31.7500 33.9541 0.8620 -2.2041 2.119 -1.040 | **| | 0.060 4 35.0000 36.5968 0.7303 -1.5968 2.168 -0.736 | *| | 0.021 5 25.0200 27.9137 0.4677 -2.8937 2.240 -1.292 | **| | 0.024 6 16.8600 15.7464 0.6261 1.1136 2.201 0.506 | |* | 0.007 7 14.3800 12.4503 0.7862 1.9297 2.149 0.898 | |* | 0.036 8 9.6000 8.4038 0.9039 1.1962 2.102 0.569 | |* | 0.020 9 24.3500 28.2150 0.8185 -3.8650 2.137 -1.809 | ***| | 0.160 10 27.5000 27.9763 0.4651 -0.4763 2.240 -0.213 | | | 0.001 11 17.0800 18.4023 0.6960 -1.3223 2.180 -0.607 | *| | 0.013 12 37.0000 37.4619 0.5246 -0.4619 2.227 -0.207 | | | 0.001 13 41.9500 41.4589 0.6553 0.4911 2.192 0.224 | | | 0.001 14 11.6600 12.2623 0.7689 -0.6023 2.155 -0.280 | | | 0.003 15 21.6500 15.8091 0.6213 5.8409 2.202 2.652 | |***** | 0.187 16 17.8900 18.2520 0.6785 -0.3620 2.185 -0.166 | | | 0.001 17 69.0000 64.6659 1.1652 4.3341 1.969 2.201 | |**** | 0.565 18 10.3000 12.3368 1.2383 -2.0368 1.924 -1.059 | **| | 0.155 19 34.9300 36.4715 0.7096 -1.5415 2.175 -0.709 | *| | 0.018 20 46.5900 46.5598 0.8780 0.0302 2.113 0.0143 | | | 0.000 21 44.8800 47.0609 0.8238 -2.1809 2.135 -1.022 | **| | 0.052 22 54.1200 52.5613 0.8432 1.5587 2.127 0.733 | |* | 0.028 23 56.6300 56.3078 0.9771 0.3222 2.069 0.156 | | | 0.002 24 22.1300 19.9822 0.7557 2.1478 2.160 0.995 | |* | 0.040 25 21.1500 20.9963 0.6176 0.1537 2.203 0.0698 | | | 0.000

Sum of Residuals 0 Sum of Squared Residuals 115.17348 Predicted Residual SS (PRESS) 156.16295

Page 78: 6-1 Introduction To Empirical Models

6-3 Multiple Regression6-3.1 Estimation of Parameters in Multiple Regression

Page 79: 6-1 Introduction To Empirical Models

6-3 Multiple Regression

6-3.2 Inferences in Multiple RegressionTest for Significance of Regression

Page 80: 6-1 Introduction To Empirical Models

6-3 Multiple Regression6-3.2 Inferences in Multiple RegressionInference on Individual Regression Coefficients

• This is called a partial or marginal test

Page 81: 6-1 Introduction To Empirical Models

6-3 Multiple Regression6-3.2 Inferences in Multiple Regression

Confidence Intervals on the Mean Response and Prediction Intervals

Page 82: 6-1 Introduction To Empirical Models

6-3 Multiple Regression6-3.2 Inferences in Multiple Regression

Confidence Intervals on the Mean Response and Prediction Intervals

Page 83: 6-1 Introduction To Empirical Models

6-3 Multiple Regression6-3.2 Inferences in Multiple Regression

Confidence Intervals on the Mean Response and Prediction Intervals

Page 84: 6-1 Introduction To Empirical Models

6-3 Multiple Regression6-3.2 Inferences in Multiple RegressionA Test for the Significance of a Group of Regressors

Page 85: 6-1 Introduction To Empirical Models

6-3 Multiple Regression6-3.3 Checking Model AdequacyResidual Analysis

Page 86: 6-1 Introduction To Empirical Models

6-3 Multiple Regression6-3.3 Checking Model AdequacyResidual Analysis

Page 87: 6-1 Introduction To Empirical Models

6-3 Multiple Regression6-3.3 Checking Model AdequacyResidual Analysis

Page 88: 6-1 Introduction To Empirical Models

6-3 Multiple Regression6-3.3 Checking Model AdequacyResidual Analysis

Page 89: 6-1 Introduction To Empirical Models

6-3 Multiple Regression6-3.3 Checking Model AdequacyResidual Analysis

Page 90: 6-1 Introduction To Empirical Models

6-3 Multiple Regression6-3.3 Checking Model AdequacyInfluential Observations

Page 91: 6-1 Introduction To Empirical Models

6-3 Multiple Regression6-3.3 Checking Model AdequacyInfluential Observations

Page 92: 6-1 Introduction To Empirical Models

6-3 Multiple Regression6-3.3 Checking Model Adequacy

Page 93: 6-1 Introduction To Empirical Models

6-3 Multiple Regression6-3.3 Checking Model AdequacyMulticollinearity

Page 94: 6-1 Introduction To Empirical Models

6-4 Other Aspects of Regression6-4.1 Polynomial Models

Page 95: 6-1 Introduction To Empirical Models

6-4 Other Aspects of Regression6-4.1 Polynomial Models

Page 96: 6-1 Introduction To Empirical Models

6-4 Other Aspects of Regression6-4.1 Polynomial Models

Page 97: 6-1 Introduction To Empirical Models

6-4 Other Aspects of Regression6-4.1 Polynomial Models

Page 98: 6-1 Introduction To Empirical Models

6-4 Other Aspects of Regression6-4.1 Polynomial Models

Page 99: 6-1 Introduction To Empirical Models

6-4 Other Aspects of Regression6-4.2 Categorical Regressors

• Many problems may involve qualitative or categorical variables.

• The usual method for the different levels of a qualitative variable is to use indicator variables.

• For example, to introduce the effect of two different operators into a regression model, we could define an indicator variable as follows:

Page 100: 6-1 Introduction To Empirical Models

6-4 Other Aspects of Regression6-4.2 Categorical Regressors

Page 101: 6-1 Introduction To Empirical Models

6-4 Other Aspects of Regression6-4.2 Categorical Regressors

Page 102: 6-1 Introduction To Empirical Models

6-4 Other Aspects of Regression6-4.3 Variable Selection Procedures

Best Subsets Regressions

Page 103: 6-1 Introduction To Empirical Models

6-4 Other Aspects of Regression6-4.3 Variable Selection ProceduresBackward Elimination

Page 104: 6-1 Introduction To Empirical Models

6-4 Other Aspects of Regression6-4.3 Variable Selection ProceduresForward Selection

Page 105: 6-1 Introduction To Empirical Models

6-4 Other Aspects of Regression6-4.3 Variable Selection ProceduresStepwise Regression

Page 106: 6-1 Introduction To Empirical Models