

Advanced Engineering Statistics -Multiple Linear Regression

Project 2

Instructor: Dr. Victoria Chen

Group Members: Rakesh Raj. N

Jaime Sanguino

Shriraam Madanagopal


Introduction to Multiple Linear Regression:

Multiple linear regression (MLR) is a statistical technique that uses several explanatory (predictor) variables to predict the outcome of a response variable. Its goal is to model the relationship between the explanatory variables and the response. The objective of this project is to develop the best multiple linear regression model for the response variable and the regressors (the set of predictor variables). The MLR model, given n observations, is:

$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \varepsilon_i$, where $i = 1, 2, \dots, n$

MLR takes a group of random variables and tries to find a mathematical relationship between them. The model expresses the response as a linear function of the predictors, chosen to best approximate the individual data points. MLR is often used to determine how specific factors, such as the price of a commodity, interest rates, and particular industries or sectors, influence the price movement of an asset. For example, the current price of oil, lending rates, and the price movement of oil futures can all affect the price of an oil company's stock. MLR could be used to model the impact that each of these variables has on the stock's price.
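As an illustration of the model equation in the introduction, the following is a minimal sketch of how the coefficients can be estimated by least squares, by solving the normal equations (X'X)b = X'y. The function name `fit_mlr` and the toy data are assumptions made for this example; the project itself used SAS, not this code.

```python
def fit_mlr(rows, y):
    """Least-squares estimates for y = b0 + b1*x1 + ... + bp*xp.

    rows: list of predictor tuples (x1, ..., xp); y: list of responses.
    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination.
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]  # prepend intercept column
    k = len(X[0])
    n = len(X)
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
        + [sum(X[i][a] * y[i] for i in range(n))]
        for a in range(k)
    ]
    for col in range(k):
        # Partial pivoting for numerical stability
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        pv = A[col][col]
        A[col] = [v / pv for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[r][k] for r in range(k)]


# Hypothetical toy data generated exactly from y = 1 + 2*x1 + 3*x2
toy_x = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
toy_y = [1 + 2 * a + 3 * b for a, b in toy_x]
coeffs = fit_mlr(toy_x, toy_y)
print(coeffs)
```

Because the toy responses contain no error term, the estimates recover the generating coefficients up to rounding; with real data the residuals would be non-zero.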

Our Project:

The water line at America's Beverage Company (Kroger Manufacturing) is the main source of income for the manufacturing plant; it produced 591,092 cases of water during the month of October. The plant also has three (3) soft drink lines, which are not returning the expected dividends for marketing reasons but are increasing the facility's production costs. At this point, it is imperative to maximize the number of water cases processed on the water line in order to keep the plant running and to justify any capital appropriation requested from the General Office.

Industrial engineering concepts suggest minimizing scheduled downtime, unscheduled downtime, and setup time while maximizing the running time and efficiency of the equipment. Achieving these objectives will enhance the profits generated by the automated water line.

DISCUSSION: 

Taking the number of water cases produced on the line as the dependent variable (y = number of water cases) and using run time, downtime, unscheduled downtime, and setup time as candidate predictors, we have the following variables:

X1: Run time, the time during which the line is processing the product.

X2: Downtime, the time during which preventive maintenance is used to check the performance of the equipment and execute any repairs if necessary.

X3: Setup time, the time used to make changes to the equipment when the bottle size changes.

X4: Efficiency, the key performance indicator used by management to check the status of production.

Data Set:

Cases (Y)    Runmin (X1)   Downmin (X2)   Setupmin (X3)   Effper (X4)
33,551.0     1,027.2       222.3          177.12          80.6
24,120.0       733.2       301.7          247.60          69.9
28,800.0       885.5       257.1          292.37          75.6
36,504.0     1,094.4       249.8           93.90          81.5
34,776.0     1,061.4        89.8          288.82          90.6
35,064.0     1,071.7       348.1           20.27          74.1
31,390.0       955.0       171.9          299.87          83.6
28,008.0       846.9        99.1          314.05          88.8
33,264.0     1,159.0       101.0          180.00          79.2
27,028.0     1,260.0         0.0          180.00          64.4
22,680.0     1,019.8       240.0          180.00          54.0
31,392.0       975.5       142.4          319.97          84.3
25,992.0       782.9       270.6          373.48          74.0
17,314.0       468.2       289.1          177.55          68.6
32,327.0       963.4       205.6          242.85          83.0
11,592.0       352.4       138.7           27.22          70.8
22,104.0       660.3       134.0            0.00          83.5
   173.0         5.2         0.0            0.00          99.2
36,576.0     1,108.1       291.9           39.98          78.4
24,912.0       759.0       292.2          388.82          71.1
33,509.0     1,005.0       223.7          211.35          81.8
20,016.0       618.5       492.8          297.03          54.0

As suggested by Dr. Chen, we had to choose between the Not Scheduled and Down_min variables; we opted for Down_min and continued with the analysis of the project.

A Methodical approach to our Project:

In our project we have four predictors. The preliminary model is:

$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4} + \varepsilon_i$, for $i = 1, \dots, n$ observations

X1: Run time, the time during which the line is processing the product.

X2: Downtime, the time during which preventive maintenance is used to check the performance of the equipment and execute any repairs if necessary.

X3: Setup time, the time used to make changes to the equipment when the bottle size changes.

X4: Efficiency, the key performance indicator used by management to check the status of production.

From the graph attached above, we can observe the relationships between the predictors and the response variable, as well as the relationships among the predictors. We find that there is no major trend present in X2, X3 and X5 (since we have omitted X4, i.e. NOT SCHEDULED, we should check the correlation of X5). The relationship between the predictors and the response variable appears reasonably linear. The predictor vs. predictor plots show good scatter apart from the X1 and X4 plot. The correlation table below shows the correlations between the predictors and the response variable. The highest correlation with the response is that of Runmin, 0.91564 in the N = 22 table below (0.85271 in the N = 21 table later in the report), which indicates that Runmin has the strongest linear association with the response variable.

The CORR Procedure

5 Variables: Cases Runmin Downmin Setupmin Effper

Simple Statistics

Variable    N    Mean        Std Dev     Sum      Minimum     Maximum     Label
Cases       22   26868       8888        591092   173.00000   36576       Cases
Runmin      22   855.10381   295.30889   18812    5.23333     1260        Runmin
Downmin     22   207.34846   115.36229   4562     0           492.81662   Downmin
Setupmin    22   197.82879   122.34451   4352     0           388.81667   Setupmin
Effper      22   76.86493    10.87210    1691     54.01255    99.18188    Effper

Pearson Correlation Coefficients, N = 22

            Cases      Runmin     Downmin    Setupmin   Effper
Cases       1.00000    0.91564    0.18742    0.25811    0.07561
Runmin      0.91564    1.00000    0.03402    0.22376    -0.11767
Downmin     0.18742    0.03402    1.00000    0.19526    -0.57898
Setupmin    0.25811    0.22376    0.19526    1.00000    -0.14302
Effper      0.07561    -0.11767   -0.57898   -0.14302   1.00000
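The Pearson coefficients reported by PROC CORR can be reproduced from the raw data. The sketch below recomputes the Cases vs. Runmin correlation from the 22 rows of the data set as printed in this report; the helper name `pearson` is an assumption for illustration, and because the printed Runmin values are rounded to one decimal, the result should be close to, but not necessarily identical to, the SAS value.

```python
import math

# Cases (Y) and Runmin (X1) columns from the data set, in the printed order
cases = [33551, 24120, 28800, 36504, 34776, 35064, 31390, 28008, 33264, 27028,
         22680, 31392, 25992, 17314, 32327, 11592, 22104, 173, 36576, 24912,
         33509, 20016]
runmin = [1027.2, 733.2, 885.5, 1094.4, 1061.4, 1071.7, 955.0, 846.9, 1159.0,
          1260.0, 1019.8, 975.5, 782.9, 468.2, 963.4, 352.4, 660.3, 5.2,
          1108.1, 759.0, 1005.0, 618.5]


def pearson(x, y):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r_cases_runmin = pearson(cases, runmin)
print(round(r_cases_runmin, 5))
```

The same function applied to each pair of columns fills in the rest of the correlation matrix.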


Our preliminary analysis suggests that the bivariate relationships between the individual predictors should not cause a problem in our model, so the model assumptions need to be evaluated to further assess the appropriateness of the whole model.

Model Adequacy:

The residual analysis is used to verify our model assumptions:

1. The current MLR model is reasonable.
2. The residuals have constant variance.
3. The residuals are normally distributed.
4. The residuals are uncorrelated.
5. There are no outliers.
6. The predictors are not highly correlated with each other.

Residuals vs. fitted values: Our preliminary fitted model is a first-order linear equation in four variables, of the form shown below:

$Y_i = b_0 + b_1 x_{i1} + b_2 x_{i2} + b_3 x_{i3} + b_4 x_{i4} + e_i$

Cases = -28850 + 24.46143 Runmin + 26.47767 Downmin - 0.42515 Setupmin + 388.10655 Effper

The residuals ($e_i$) represent the difference between the observed and fitted values of the cases. Plotting them against the fitted values is useful for identifying possible outliers, checking the general form of the model, and checking the constancy of the variance of the error terms. The plot of residuals vs. fitted values is shown in the figure.

Inference: A funnel shape can be observed in the plot of the residuals against the fitted values. This indicates that constant variance is NOT OK. Hence we need to proceed with a transformation of Y; we use a square root transformation to check whether the non-constant variance can be improved.


Residuals vs. Predictor variables:

The plots of the residuals against each predictor are given in the figures.

The graphs show the relationship between the residuals and each predictor of the model. We observe random scatter in all the plots. Since there is no curvature, we can state that the current MLR model form is OK.

Normal probability plot:

The plot of the residuals against the normal scores is shown below. The points do not fall on a straight line; hence normality is NOT OK.


Plots for Predictor - Predictor variables

Below are the plots between Runmin, Downmin, Setupmin and Effper.

From the above plots we observe random scatter, with no trend or curvature, among the predictor vs. predictor pairs.

Transformation:

A funnel shape can be observed in the plot of the residuals against $\hat{Y}$. Hence, as suggested by Dr. Chen, we applied a square root transformation to Y. Since no appreciable improvement was observed, we reverted to the original data set without any transformation.

Formal tests were carried out for constancy of variance, multicollinearity, normality of error terms, lack of fit, and X or Y outliers. The results are as follows:

i. Test for normality: We conduct a correlation test for normality at α = 0.05. From the SAS output, the coefficient of correlation between the residuals and their normal scores is 0.9263. For α = 0.05, the critical value from Table B6 of the textbook is 0.9525.

The decision rule is as given below:

H0: Normality is OK

H1: Normality is violated

If the coefficient of correlation is less than the critical value (Table B6), reject H0.

Since 0.9263 < 0.9525, the decision rule rejects H0: by this test, the normality assumption is not satisfied for these residuals.

The CORR Procedure

2 Variables: e2 enrm

Simple Statistics

Variable   N    Mean   Std Dev   Sum   Minimum    Maximum   Label
e2         21   0      0.01188   0     -0.03180   0.02978   Residual
enrm       21   0      0.96464   0     -1.88951   1.88951   Normal Scores

Pearson Correlation Coefficients, N = 21
Prob > |r| under H0: Rho=0

                       e2         enrm
e2 (Residual)          1.00000    0.92630
                                  <.0001
enrm (Normal Scores)   0.92630    1.00000
                       <.0001
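The correlation test for normality can be sketched as follows. This is a minimal illustration, assuming the normal scores are computed with the approximation z[(k - 0.375)/(n + 0.25)] for the k-th smallest residual; that formula is an assumption here, although it reproduces the extreme normal scores of ±1.88951 shown in the SAS output for n = 21. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def normal_scores(n, scale=1.0):
    """Approximate expected normal order statistics for a sample of size n.

    Assumes the approximation z[(k - 0.375) / (n + 0.25)], optionally scaled
    (for example by sqrt(MSE)); scaling does not change the correlation.
    """
    z = NormalDist()
    return [scale * z.inv_cdf((k - 0.375) / (n + 0.25)) for k in range(1, n + 1)]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def normality_correlation(residuals):
    """Correlation between the ordered residuals and their normal scores."""
    ordered = sorted(residuals)
    return pearson(ordered, normal_scores(len(ordered)))


scores = normal_scores(21)

# Decision rule with the values quoted in the report
r_reported = 0.9263   # correlation from the SAS output
r_critical = 0.9525   # Table B6 value quoted for alpha = 0.05
reject_h0 = r_reported < r_critical
print(round(scores[-1], 5), reject_h0)
```

A correlation close to 1 supports normality; values below the tabulated critical value lead to rejecting H0.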

Test for Multicollinearity: The variance inflation factors associated with the predictor variables are given below:

$VIF_{runmin} = (1 - R^2_{runmin})^{-1} = 1.13198$

$VIF_{downtime} = (1 - R^2_{downtime})^{-1} = 1.38942$

$VIF_{setuptime} = (1 - R^2_{setuptime})^{-1} = 1.00751$

$VIF_{effper} = (1 - R^2_{effper})^{-1} = 1.33575$

$\overline{VIF} = 1.216165 < 5$

These VIF values confirm that there is little multicollinearity among the individual predictor variables. A VIF value near or above 5 would indicate serious variance inflation, i.e. serious multicollinearity, whereas the ideal VIF value is 1, to which all of our variables are relatively close. The maximum value, 1.38942 for downtime, used as an indicator for the whole model, confirms that multicollinearity is not present, as suggested by the earlier plots.
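The VIF arithmetic above can be checked with a short sketch. It takes the four VIF values from the SAS parameter estimates, backs out the implied $R^2$ of each predictor regressed on the others from VIF = 1/(1 - R²), and computes the mean VIF. The cutoff of 5 is the one quoted in the text; the variable names are chosen for this example.

```python
# VIF values as reported in the SAS parameter estimates
vif = {
    "Runmin": 1.13198,
    "Downmin": 1.38942,
    "Setupmin": 1.00751,
    "Effper": 1.33575,
}

# Since VIF_k = 1 / (1 - R_k^2), the implied R_k^2 is 1 - 1 / VIF_k
implied_r2 = {name: 1.0 - 1.0 / value for name, value in vif.items()}

mean_vif = sum(vif.values()) / len(vif)
largest = max(vif, key=vif.get)
serious = [name for name, value in vif.items() if value >= 5]  # cutoff from the text

print(round(mean_vif, 6), largest, serious)
for name, r2 in implied_r2.items():
    print(name, round(r2, 4))
```

The largest implied R² belongs to Downmin, which matches its having the largest VIF.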

BONFERRONI TEST FOR OUTLIER: From Figure 2, we identify the 7th observation as a possible outlier. It is a Y outlier because it lies in the Y direction. Hence we use the Bonferroni outlier test, two-tailed at α = 0.05.

The Bonferroni critical value is given as $t(1 - \alpha/2n;\ n - p - 1) = t(1 - 0.05/(2 \times 21);\ 21 - 5 - 1) = 3.286$

From the SAS output, the critical value is:

Obs tinvtres finv50

1 3.29725 0.90583


16:00 Saturday, December 6, 2008

The REG Procedure
Model: MODEL1
Dependent Variable: yprime

Number of Observations Read   21
Number of Observations Used   21

Analysis of Variance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             4    0.30425          0.07606       431.05    <.0001
Error             16   0.00282          0.00017646
Corrected Total   20   0.30707

Root MSE         0.01328   R-Square   0.9908
Dependent Mean   4.43447   Adj R-Sq   0.9885
Coeff Var        0.29956

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Variance Inflation
Intercept   1    3.40999              0.03197          106.66    <.0001     0
Runmin      1    0.00046190           0.00001363       33.88     <.0001     1.13198
Downmin     1    0.00047875           0.00003234       14.80     <.0001     1.38942
Setupmin    1    0.00009967           0.00002550       3.91      0.0013     1.00751
Effper      1    0.00641              0.00034657       18.51     <.0001     1.33575

Output Statistics

Obs   Residual    RStudent   Hat Diag H   Cov Ratio   DFFITS
1     0.000243    0.0185     0.0839       1.5071      0.0056
2     0.0163      1.3212     0.0962       0.8811      0.4310
3     0.003302    0.2515     0.0805       1.4705      0.0744
4     -0.004805   -0.3892    0.1818       1.6050      -0.1835
5     -0.0118     -0.9941    0.1998       1.2544      -0.4968
6     -0.004059   -0.3671    0.3445       2.0145      -0.2662
7     -0.002662   -0.2058    0.1090       1.5281      -0.0720
8     -0.002175   -0.1791    0.2151       1.7405      -0.0938
9     0.002376    0.1876     0.1465       1.5990      0.0777
10    0.008857    1.1031     0.6297       2.5247      1.4384
11    -0.004586   -0.4284    0.3837       2.1084      -0.3380
12    -0.004503   -0.3548    0.1366       1.5339      -0.1411
13    0.001862    0.1485     0.1635       1.6387      0.0657
14    0.0161      1.4161     0.2244       0.9507      0.7618
15    -0.000378   -0.0287    0.0830       1.5054      -0.0086
16    -0.0318     -8.4950    0.5674       0.0005      -9.7298
17    0.0298      3.6993     0.3415       0.0820      2.6641
18    -0.005207   -0.4481    0.2731       1.7775      -0.2747
19    0.001193    0.0970     0.1965       1.7130      0.0480
20    -0.001818   -0.1385    0.0834       1.4968      -0.0418
21    -0.006185   -0.6212    0.4597       2.2511      -0.5731

Output Statistics: DFBETAS

Obs   Intercept   Runmin    Downmin   Setupmin   Effper
1     -0.0026     0.0024    0.0020    -0.0013    0.0023
2     0.0890      -0.1393   0.1299    0.0981     -0.0646
3     -0.0136     0.0033    0.0208    0.0409     0.0076
4     0.0886      -0.0873   -0.0899   0.1003     -0.0737
5     0.1991      -0.0561   0.1258    -0.1748    -0.2414
6     0.0798      -0.1173   -0.1672   0.1730     -0.0436
7     0.0261      -0.0022   0.0045    -0.0379    -0.0296
8     0.0175      0.0320    0.0334    -0.0418    -0.0415
9     0.0094      0.0395    -0.0362   -0.0071    -0.0138
10    0.9167      0.5288    -1.0356   0.0160     -1.0440
11    -0.2378     -0.1161   0.0890    0.0114     0.3041
12    0.0378      -0.0048   0.0316    -0.0829    -0.0447
13    -0.0029     -0.0130   0.0093    0.0506     0.0009
14    0.3462      -0.6029   -0.0053   -0.0956    -0.0817
15    0.0046      -0.0012   -0.0020   -0.0018    -0.0049
16    -6.1126     7.5499    4.4954    4.1659     1.5784
17    0.4446      -1.4287   -0.6237   -1.7983    0.7613
18    0.1043      -0.1317   -0.1509   0.1786     -0.0739
19    0.0019      -0.0090   0.0074    0.0370     -0.0050
20    0.0235      -0.0139   -0.0159   0.0005     -0.0215
21    -0.1220     0.0500    -0.2880   -0.1304    0.2082

Sum of Residuals                0
Sum of Squared Residuals        0.00282
Predicted Residual SS (PRESS)   0.00938

We see that observations 10, 16 and 17 have DFBETAS with absolute values greater than 1. We measure the values against 1 because the data set has fewer than 30 observations.

We find that the 16th and 17th observations have |t_i| values (8.4950 and 3.6993) higher than the critical value of 3.29725.
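The comparison of the studentized deleted residuals with the Bonferroni critical value can be written as a short check. The sketch below uses the RStudent column from the SAS output statistics and the critical value 3.29725 reported by SAS; the variable names are chosen for this example.

```python
# RStudent values for observations 1..21, from the SAS output statistics
rstudent = [0.0185, 1.3212, 0.2515, -0.3892, -0.9941, -0.3671, -0.2058,
            -0.1791, 0.1876, 1.1031, -0.4284, -0.3548, 0.1485, 1.4161,
            -0.0287, -8.4950, 3.6993, -0.4481, 0.0970, -0.1385, -0.6212]

critical_value = 3.29725  # Bonferroni critical value reported in the SAS output

# Flag observations (1-based) whose |t_i| exceeds the critical value
y_outliers = [obs for obs, t in enumerate(rstudent, start=1)
              if abs(t) > critical_value]
print(y_outliers)
```

Only the two observations named in the text exceed the critical value; the next largest |t_i| is 1.4161 for observation 14.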

LEVERAGE: To test for X outliers, the leverage values h_ii were calculated and compared with the cutoff 2p/n, and influence was assessed with |DFFITS| > 1. With p = 5 parameters and n = 21 observations, the leverage cutoff is 2(5)/21 ≈ 0.476. In the Output Statistics table, observations 10 (h_ii = 0.6297) and 16 (h_ii = 0.5674) exceed this cutoff and observation 21 (h_ii = 0.4597) is close to it; all other points are below it. The 10th, 16th and 17th observations have |DFFITS| exceeding 1.
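The leverage and DFFITS screening can be sketched in the same way. The values below are the Hat Diag H and DFFITS columns of the SAS output statistics; the leverage cutoff 2p/n with p = 5 and n = 21 is an assumption based on the rule used later in this report, and the cutoff of 1 for |DFFITS| is the one quoted in the text.

```python
# Hat Diag H and DFFITS for observations 1..21, from the SAS output statistics
hat = [0.0839, 0.0962, 0.0805, 0.1818, 0.1998, 0.3445, 0.1090, 0.2151,
       0.1465, 0.6297, 0.3837, 0.1366, 0.1635, 0.2244, 0.0830, 0.5674,
       0.3415, 0.2731, 0.1965, 0.0834, 0.4597]
dffits = [0.0056, 0.4310, 0.0744, -0.1835, -0.4968, -0.2662, -0.0720,
          -0.0938, 0.0777, 1.4384, -0.3380, -0.1411, 0.0657, 0.7618,
          -0.0086, -9.7298, 2.6641, -0.2747, 0.0480, -0.0418, -0.5731]

n = len(hat)   # 21 observations
p = 5          # assumed: intercept plus four predictors
leverage_cutoff = 2 * p / n

high_leverage = [obs for obs, h in enumerate(hat, start=1) if h > leverage_cutoff]
influential = [obs for obs, d in enumerate(dffits, start=1) if abs(d) > 1]

print(round(leverage_cutoff, 4), high_leverage, influential)
```

A point can be influential without high leverage (observation 17 here), which is why both diagnostics are examined.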

From the residual vs. X1X4 plot we observe a linear trend; hence we need to add this interaction term to the model. After adding the term and standardizing the model, we obtain the plots and results below.


Interaction and Partial Regression:

Below, the residuals are plotted against each of the two-way (bilinear) interaction terms of the predictor variables. If a plot shows a linear or curvilinear trend, it suggests that the term needs to be considered in the model selection process. In the plot of the residuals against the X1X4 interaction, the points follow a set pattern, i.e. a linear trend. Hence we conclude that the X1X4 interaction term significantly affects the model, and it needs to be included as a possible term in the model selection process.

The SAS System 12:27 Wednesday, December 2, 1992 85

The CORR Procedure

11 Variables: Cases Runmin Downmin Setupmin Effper x1x2 x1x3 x1x4 x2x3 x2x4 x3x4

Simple Statistics

Variable   N    Mean        Std Dev     Sum       Minimum     Maximum
Cases      21   28139       6754        590919    11592       36576
Runmin     21   895.57384   231.80595   18807     352.36668   1260
Downmin    21   217.22857   108.26411   4562      0           492.80000
Setupmin   21   207.25000   116.90431   4352      0           388.82000
Effper     21   75.80000    9.90571     1592      54.00000    90.60000
x1x2       21   185305      102699      3891401   0           312136
x1x3       21   186914      94604       3925202   0           373047
x1x4       21   68455       20823       1437560   24948       96160
x2x3       21   45731       38792       960343    0           146376
x2x4       21   15970       6706        335367    0           26611
x3x4       21   15739       9364        330515    0           27888

Pearson Correlation Coefficients, N = 21

           Cases      Runmin     Downmin    Setupmin   Effper     x1x2
Cases      1.00000    0.85271    -0.12048   0.02289    0.58199    0.25666
Runmin     0.85271    1.00000    -0.31922   -0.01174   0.26098    0.29392
Downmin    -0.12048   -0.31922   1.00000    0.05890    -0.48570   -0.21515
Setupmin   0.02289    -0.01174   0.05890    1.00000    0.02654    0.92742
Effper     0.58199    0.26098    -0.48570   0.02654    1.00000    0.18734
x1x2       0.25666    0.29392    -0.21515   0.92742    0.18734    1.00000
x1x3       0.35604    0.15644    0.84200    -0.07242   -0.22919   -0.18400
x1x4       0.94699    0.90401    -0.42237   0.01737    0.64029    0.32251
x2x3       -0.24437   -0.34795   0.66165    0.73111    -0.39934   0.47990
x2x4       0.08254    -0.22874   0.94610    0.01404    -0.22875   -0.22638
x3x4       0.16147    0.06401    -0.10221   0.96448    0.27622    0.94632

Pearson Correlation Coefficients, N = 21 (continued)

           x1x3       x1x4       x2x3       x2x4       x3x4
Cases      0.35604    0.94699    -0.24437   0.08254    0.16147
Runmin     0.15644    0.90401    -0.34795   -0.22874   0.06401
Downmin    0.84200    -0.42237   0.66165    0.94610    -0.10221
Setupmin   -0.07242   0.01737    0.73111    0.01404    0.96448
Effper     -0.22919   0.64029    -0.39934   -0.22875   0.27622
x1x2       -0.18400   0.32251    0.47990    -0.22638   0.94632
x1x3       1.00000    0.05837    0.37322    0.89412    -0.15526
x1x4       0.05837    1.00000    -0.41554   -0.24201   0.18346
x2x3       0.37322    -0.41554   1.00000    0.55003    0.56619
x2x4       0.89412    -0.24201   0.55003    1.00000    -0.08268
x3x4       -0.15526   0.18346    0.56619    -0.08268   1.00000

[Figures: residuals plotted against the interaction terms x1x2, x1x3, x1x4, x2x4, x2x3 and x3x4.]

To measure the influence of the X1X4 term, we perform the regression after adding the standardized X1X4 term (stdx1x4) to the model. The results are as follows:

The CORR Procedure

6 Variables: Cases Runmin Downmin Setupmin Effper stdx1x4

Simple Statistics

Variable   N    Mean        Std Dev     Sum       Minimum     Maximum
Cases      21   28139       6754        590919    11592       36576
Runmin     21   895.57384   231.80595   18807     352.36668   1260
Downmin    21   217.22857   108.26411   4562      0           492.80000
Setupmin   21   207.25000   116.90431   4352      0           388.82000
Effper     21   75.80000    9.90571     1592      54.00000    90.60000
stdx1x4    21   0.24855     0.90505     5.21956   -1.80919    2.63021

Pearson Correlation Coefficients, N = 21

           Cases      Runmin     Downmin    Setupmin   Effper     stdx1x4
Cases      1.00000    0.85271    -0.12048   0.02289    0.58199    -0.20520
Runmin     0.85271    1.00000    -0.31922   -0.01174   0.26098    -0.51074
Downmin    -0.12048   -0.31922   1.00000    0.05890    -0.48570   0.54015
Setupmin   0.02289    -0.01174   0.05890    1.00000    0.02654    0.15996
Effper     0.58199    0.26098    -0.48570   0.02654    1.00000    -0.05965
stdx1x4    -0.20520   -0.51074   0.54015    0.15996    -0.05965   1.00000


Obs   Cases   Runmin    Downmin   Setupmin   Effper   stdx1      stdx2      stdx3
1     33551   1027.17   222.3     177.12     80.6     0.56769    0.04684    -0.25773
2     24120   733.15    301.7     247.60     69.9     -0.70069   0.78023    0.34515
3     28800   885.47    257.1     292.37     75.6     -0.04360   0.36828    0.72812
4     36504   1094.37   249.8     93.90      81.5     0.85758    0.30085    -0.96960
5     34776   1061.37   89.8      288.82     90.6     0.71522    -1.17702   0.69775
6     35064   1071.67   348.1     20.27      74.1     0.75966    1.20882    -1.59943
7     31390   954.95    171.9     299.87     83.6     0.25615    -0.41869   0.79227
8     28008   846.90    99.1      314.05     88.8     -0.20998   -1.09111   0.91357
9     33264   1159.02   101.0     180.00     79.2     1.13648    -1.07357   -0.23310
10    27028   1259.98   0.0       180.00     64.4     1.57205    -2.00647   -0.23310
11    22680   1019.83   240.0     180.00     54.0     0.53605    0.21033    -0.23310
12    31392   975.52    142.4     319.97     84.3     0.34487    -0.69117   0.96421
13    25992   782.85    270.6     373.48     74.0     -0.48629   0.49297    1.42193
14    17314   468.18    289.1     177.55     68.6     -1.84374   0.66385    -0.25405
15    32327   963.37    205.6     242.85     83.0     0.29246    -0.10741   0.30452
16    11592   352.37    138.7     27.22      70.8     -2.34337   -0.72534   -1.53998
17    22104   660.32    134.0     0.00       83.5     -1.01489   -0.76875   -1.77282
18    36576   1108.13   291.9     39.98      78.4     0.91697    0.68972    -1.43083
19    24912   758.97    292.2     388.82     71.1     -0.58932   0.69249    1.55315
20    33509   1004.95   223.7     211.35     81.8     0.47184    0.05977    0.03507
21    20016   618.53    492.8     297.03     54.0     -1.19514   2.54536    0.76798

Obs   stdx4      stdx1x2    stdx1x3    stdx1x4    stdx2x3    stdx2x4    stdx3x4
1     0.48457    0.02659    -0.14631   0.27508    -0.01207   0.02270    -0.12489
2     -0.59562   -0.54670   -0.24185   0.41734    0.26930    -0.46472   -0.20558
3     -0.02019   -0.01606   -0.03175   0.00088    0.26815    -0.00744   -0.01470
4     0.57543    0.25801    -0.83151   0.49348    -0.29170   0.17312    -0.55793
5     1.49409    -0.84183   0.49905    1.06861    -0.82126   -1.75857   1.04250
6     -0.17162   0.91829    -1.21502   -0.13037   -1.93341   -0.20745   0.27449
7     0.78742    -0.10724   0.20294    0.20170    -0.33171   -0.32968   0.62385
8     1.31237    0.22911    -0.19183   -0.27557   -0.99681   -1.43195   1.19894
9     0.34324    -1.22009   -0.26491   0.39008    0.25024    -0.36849   -0.08001
10    -1.15085   -3.15426   -0.36644   -1.80919   0.46770    2.30915    0.26826
11    -2.20075   0.11275    -0.12495   -1.17971   -0.04903   -0.46289   0.51299
12    0.85809    -0.23836   0.33253    0.29593    -0.66643   -0.59308   0.82738
13    -0.18171   -0.23973   -0.69146   0.08836    0.70098    -0.08958   -0.25838
14    -0.72685   -1.22397   0.46841    1.34013    -0.16865   -0.48252   0.18466
15    0.72685    -0.03141   0.08906    0.21257    -0.03271   -0.07807   0.22134
16    -0.50476   1.69975    3.60874    1.18284    1.11701    0.36612    0.77732
17    0.77733    0.78020    1.79921    -0.78890   1.36286    -0.59758   -1.37806
18    0.26247    0.63245    -1.31203   0.24068    -0.98686   0.18103    -0.37556
19    -0.47447   -0.40809   -0.91530   0.27962    1.07554    -0.32857   -0.73693
20    0.60571    0.02820    0.01655    0.28580    0.00210    0.03621    0.02124
21    -2.20075   -3.04206   -0.91784   2.63021    1.95478    -5.60171   -1.69013
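The standardized values in the listing appear to follow the usual z-score transformation, with the standardized interaction term formed as the product of the two standardized predictors. The sketch below checks this reading against observations 1 and 21, using the N = 21 means and standard deviations from the Simple Statistics table; that this is how the SAS variables were constructed is an assumption inferred from the numbers, and the function name is hypothetical.

```python
def standardize(value, mean, std_dev):
    """z-score: centre by the sample mean and scale by the sample std dev."""
    return (value - mean) / std_dev


# N = 21 simple statistics from the CORR output
runmin_mean, runmin_sd = 895.57384, 231.80595
effper_mean, effper_sd = 75.80000, 9.90571

# Observation 1: Runmin = 1027.17, Effper = 80.6
std_x1_obs1 = standardize(1027.17, runmin_mean, runmin_sd)
std_x4_obs1 = standardize(80.6, effper_mean, effper_sd)
std_x1x4_obs1 = std_x1_obs1 * std_x4_obs1  # standardized interaction term

# Observation 21: Runmin = 618.53, Effper = 54.0
std_x1_obs21 = standardize(618.53, runmin_mean, runmin_sd)
std_x4_obs21 = standardize(54.0, effper_mean, effper_sd)
std_x1x4_obs21 = std_x1_obs21 * std_x4_obs21

print(round(std_x1_obs1, 5), round(std_x4_obs1, 5), round(std_x1x4_obs1, 5))
print(round(std_x1_obs21, 5), round(std_x4_obs21, 5), round(std_x1x4_obs21, 5))
```

Forming the interaction from centred variables is a common way to reduce the correlation between an interaction term and its component predictors.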

Model search:

We now apply three search algorithms: stepwise regression, backward elimination, and best subsets regression. The criteria used to evaluate the candidate models are higher $R^2$ and adjusted $R^2$ ($R^2_a$); lower MSE and PRESS; fewer predictor variables; and $C_p$ close to p. We included the following variables in the model search algorithms: Run_Min, Down_min, Schedule_min, eff_per, and the interaction terms. The model has been standardized because the values of the predictor variables and the response variable have varying magnitudes.

a. Selection process:

The different model selection procedures were carried out, and their results are as follows:


1: Best subset model

First best set:

The REG Procedure
Model: MODEL1
Dependent Variable: Cases

Adjusted R-Square Selection Method

Number of Observations Read   21
Number of Observations Used   21

Number in Model   Adjusted R-Square   R-Square   C(p)      AIC        SBC         Variables in Model
3                 0.9951              0.9958     13.7203   262.2361   266.41422   Runmin Downmin Effper

Second best set:

Number in Model   Adjusted R-Square   R-Square   C(p)     AIC        SBC         Variables in Model
5                 0.9969              0.9977     6.0000   254.1113   260.37839   Runmin Downmin Setupmin Effper

The new subset obtained is:

Dependent Variable: Cases

Adjusted R-Square Selection Method

Number of Observations Read   21
Number of Observations Used   21

Number in Model   Adjusted R-Square   R-Square   C(p)     AIC        SBC         Variables in Model
3                 0.9951              0.9958     3.2091   262.2361   266.41422   Runmin Downmin Effper
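The selection criteria in the table can be recomputed from the error and total sums of squares. The sketch below uses SSE = 3801874 and SSTO = 912296982 for the three-predictor model (Runmin, Downmin, Effper), as reported in the stepwise output of this report, with n = 21 and p = 4 parameters. The AIC and SBC formulas n·ln(SSE/n) + 2p and n·ln(SSE/n) + p·ln(n) are assumed here; they reproduce the tabulated values. The function name is hypothetical.

```python
import math


def selection_criteria(sse, ssto, n, p):
    """Return (R2, adjusted R2, AIC, SBC) for a model with p parameters.

    p counts the intercept, so a three-predictor model has p = 4.
    """
    r2 = 1.0 - sse / ssto
    adj_r2 = 1.0 - (sse / (n - p)) / (ssto / (n - 1))
    aic = n * math.log(sse / n) + 2 * p
    sbc = n * math.log(sse / n) + p * math.log(n)
    return r2, adj_r2, aic, sbc


# Three-predictor model: Runmin, Downmin, Effper
r2, adj_r2, aic, sbc = selection_criteria(sse=3801874, ssto=912296982, n=21, p=4)
print(round(r2, 4), round(adj_r2, 4), round(aic, 4), round(sbc, 5))
```

Lower AIC and SBC indicate a better trade-off between fit and the number of parameters, which is how the candidate subsets in the table are compared.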

Similarly, the backward deletion and stepwise regression procedures were carried out. The output is as follows:

The REG Procedure
Model: MODEL1
Dependent Variable: Cases

Number of Observations Read   21
Number of Observations Used   21

Stepwise Selection: Step 1

Variable Runmin Entered: R-Square = 0.7271 and C(p) = 1732.637

Analysis of Variance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             1    663351807        663351807     50.63     <.0001
Error             19   248945175        13102378
Corrected Total   20   912296982

Variable    Parameter Estimate   Standard Error   Type II SS   F Value   Pr > F
Intercept   5888.80747           3225.28395       43678620     3.33      0.0836
Runmin      24.84462             3.49169          663351807    50.63     <.0001

Bounds on condition number: 1, 1

Stepwise Selection: Step 2

Variable Effper Entered: R-Square = 0.8658 and C(p) = 845.6608

Analysis of Variance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             2    789838790        394919395     58.05     <.0001
Error             18   122458192        6803233
Corrected Total   20   912296982

Variable    Parameter Estimate   Standard Error   Type II SS   F Value   Pr > F
Intercept   -11419               4638.30362       41235217     6.06      0.0241
Runmin      21.91167             2.60637          480833856    70.68     <.0001
Effper      262.99050            60.99227         126486982    18.59     0.0004

Bounds on condition number: 1.0731, 4.2923

Stepwise Selection: Step 3

Variable Downmin Entered: R-Square = 0.9958 and C(p) = 13.7203

Analysis of Variance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             3    908495108        302831703     1354.11   <.0001
Error             17   3801874          223640
Corrected Total   20   912296982

Variable    Parameter Estimate   Standard Error   Type II SS   F Value   Pr > F
Intercept   -28901               1132.80996       145571123    650.92    <.0001
Runmin      24.46158             0.48535          568081327    2540.16   <.0001
Downmin     26.43461             1.14763          118656318    530.57    <.0001
Effper      387.74395            12.31346         221757766    991.59    <.0001

Bounds on condition number: 1.3806, 11.529

Stepwise Selection: Step 4

Variable stdx1x4 Entered: R-Square = 0.9975 and C(p) = 5.2554

Analysis of Variance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             4    909984098        227496024     1573.77   <.0001
Error             16   2312884          144555
Corrected Total   20   912296982

Variable    Parameter Estimate   Standard Error   Type II SS   F Value   Pr > F
Intercept   -28237               933.96796        132134456    914.08    <.0001
Runmin      25.15032             0.44533          461068290    3189.56   <.0001
Downmin     24.42755             1.11462          69428305     480.29    <.0001
Effper      375.20126            10.64318         179646862    1242.76   <.0001
stdx1x4     425.29879            132.51505        1488990      10.30     0.0055

Bounds on condition number: 2.0148, 28.068

All variables left in the model are significant at the 0.1000 level.

No other variable met the 0.1000 significance level for entry into the model.

Summary of Stepwise Selection

Step   Variable Entered   Variable Removed   Number Vars In   Partial R-Square   Model R-Square   C(p)      F Value   Pr > F
1      Runmin                                1                0.7271             0.7271           1732.64   50.63     <.0001
2      Effper                                2                0.1386             0.8658           845.661   18.59     0.0004
3      Downmin                               3                0.1301             0.9958           13.7203   530.57    <.0001
4      stdx1x4                               4                0.0016             0.9975           5.2554    10.30     0.0055

To check for outliers, we use the following terms:

Run_min, Down_min, Eff_per (std x1, x4)

F* = MSR/MSE = 302831703/223640

F* = 1354.103

Run_min: 1000

Down_min: 250

Eff_per: 90

From the ANOVA table, for the model with Run_min, Down_min and Eff_per:

Leverage cutoff: 2p/n = 2*4/21 = 0.38095

Observation 10: h_ii = 0.6296

Observation 16: h_ii = 0.4634

Observation 21: h_ii = 0.4359

Critical value = 3.297: no Y outliers.
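The arithmetic in these notes can be verified with a short sketch. It recomputes F* from the MSR and MSE of the three-predictor stepwise model and compares the three quoted leverage values with the 2p/n cutoff (p = 4, n = 21); the variable names are chosen for this example.

```python
# Mean squares for the model with Runmin, Downmin and Effper (stepwise step 3)
msr = 302831703
mse = 223640
f_star = msr / mse

# Leverage cutoff 2p/n with p = 4 parameters and n = 21 observations
n, p = 21, 4
leverage_cutoff = 2 * p / n

# Leverage values quoted in the notes, keyed by observation number
hat_values = {10: 0.6296, 16: 0.4634, 21: 0.4359}
above_cutoff = sorted(obs for obs, h in hat_values.items() if h > leverage_cutoff)

print(round(f_star, 3), round(leverage_cutoff, 5), above_cutoff)
```

The F* computed from the rounded mean squares differs slightly from the 1354.11 printed by SAS, which works from unrounded values.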

Conclusion

The water line at America's Beverage Company (Kroger Manufacturing) is the main source of income for the manufacturing plant; it produced 591,092 cases of water during the month of October. The plant also has three (3) soft drink lines, which are not returning the expected dividends for marketing reasons but are increasing the facility's production costs. At this point, it is imperative to maximize the number of water cases processed on the water line in order to keep the plant running and to justify any capital appropriation requested from the General Office.

In our final model, the response variable has a linear relationship with the predictor variables.

The final MLR model form is reasonable. The final model satisfies the model assumptions: the variance is constant, normality is OK, and the multicollinearity problem is eliminated.