Project part C.docx

DeVry University

AJ DAVIS DEPARTMENT STORES

Part C: Regression Analysis

Kunal Desai

2/17/2015

Math 533 Applied Managerial Statistics

Project Part C: Regression and Correlation Analysis

1. Generate a scatterplot for INCOME (1000) vs CREDIT BALANCE including the graph of the BEST FIT line. Interpret.

ANS:

Each point in the scatterplot represents one customer's combination of income and credit balance. As income increases, credit balance also tends to increase, so the scatterplot shows a positive relationship between the two variables and we expect their correlation to be positive. The best-fit line follows the pattern of the points reasonably well. Based on the scatterplot, a customer with a high income is expected to have a high credit balance.

2. Determine the equation of the BEST FIT line, which describes the relationship between INCOME and CREDIT BALANCE.

The Minitab output is given below.

Regression Analysis: Income($1000) versus Credit Balance($)

The regression equation is
Income($1000) = - 3.52 + 0.0119 Credit Balance($)

Predictor               Coef   SE Coef      T      P
Constant              -3.516     5.483  -0.64  0.524
Credit Balance($)   0.011926  0.001289   9.25  0.000

S = 8.40667 R-Sq = 64.1% R-Sq(adj) = 63.3%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       1  6052.7  6052.7  85.65  0.000
Residual Error  48  3392.3    70.7
Total           49  9445.0

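Based on the output, the equation of the best-fit line is

Income = -3.516 + 0.011926 * Credit Balance

where Credit Balance is measured in dollars and Income in thousands of dollars. The slope means that each additional $1 of credit balance is associated with an increase of about 0.0119 thousand dollars (roughly $11.93) in predicted income.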

3. Determine the coefficient of correlation. Interpret.

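The coefficient of correlation between Income and Credit Balance is r = 0.801. It is the square root of R-Sq = 64.1%, taken with a positive sign because the fitted slope is positive. This large positive value indicates a strong positive linear relationship between the two variables: customers with higher credit balances tend to have higher incomes, and customers with lower credit balances tend to have lower incomes. The correlation describes the strength and direction of the linear association; it does not mean that the two variables change by the same number of units.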
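The value r = 0.801 can be checked from the numbers in the Minitab output. The short Python sketch below is only an illustration (the variable names are my own, not Minitab's): it divides the regression sum of squares by the total sum of squares to get R-Sq, then takes the square root with the sign of the slope.

```python
import math

# Values copied from the Minitab ANOVA and coefficient tables above.
ss_regression = 6052.7   # SS for Regression
ss_total = 9445.0        # SS Total
slope = 0.011926         # coefficient of Credit Balance($)

# R-Sq is the share of total variation explained by the regression.
r_squared = ss_regression / ss_total

# In simple linear regression, r is the square root of R-Sq
# carrying the sign of the fitted slope.
r = math.copysign(math.sqrt(r_squared), slope)

print(f"R-Sq = {r_squared:.3f}, r = {r:.3f}")
```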

4. Determine the coefficient of determination. Interpret.

The coefficient of determination is 0.641, or 64.1%. It measures how much of the variation in the dependent variable is explained by the independent variable. Here, 64.1% of the variation in Income is explained by its linear relationship with Credit Balance, and the remaining 35.9% is left unexplained by the model. This moderate value of R-Sq indicates that the model is a reasonable, but not excellent, fit.

5. Test the utility of this regression model (use a two tail test with α =.05). Interpret your results, including the p-value.

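The utility of this model can be tested with a t-test on beta-1, with hypotheses H0: β1 = 0 versus Ha: β1 ≠ 0. From the output, the test statistic is t = 9.25 and the corresponding p-value is reported as 0.000 (that is, smaller than 0.0005). Since the p-value is smaller than the significance level α = .05, we reject H0 and conclude that the slope is significantly different from zero, so the model is useful: Credit Balance is a significant linear predictor of Income.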
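As a rough check on the t-test, the statistic can be recomputed from the coefficient and its standard error. The sketch below assumes a table value of about 2.0106 for the two-tailed critical t at α = .05 with 48 degrees of freedom; that constant and the variable names are my own additions, not part of the Minitab output.

```python
# Coefficient and standard error for Credit Balance($), from the Minitab output.
coef = 0.011926
se_coef = 0.001289

n = 50                 # Total DF of 49 in the ANOVA table implies n = 50
df_error = n - 2       # 48, matching the Residual Error DF

# Assumed t-table value: two-tailed critical t, alpha = 0.05, 48 df.
T_CRIT_48 = 2.0106

# Test statistic for H0: beta-1 = 0.
t_stat = coef / se_coef
reject_h0 = abs(t_stat) > T_CRIT_48

print(f"t = {t_stat:.2f}, df = {df_error}, reject H0: {reject_h0}")
```

In simple linear regression the square of this t statistic is, up to rounding, the F statistic in the ANOVA table, so the t-test and the F-test lead to the same conclusion.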

6. Based on your findings in 1-5, what is your opinion about using CREDIT BALANCE to predict INCOME? Explain.

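Parts 1-5 show that the model is significant, which means the independent variable is significant in predicting the dependent variable. The relationship is positive and strong (r = 0.801), so using Credit Balance to predict Income is appropriate. However, Credit Balance alone explains only 64.1% of the variation in Income, so the predictions are useful but leave room for improvement.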

7. Compute the 95% confidence interval for beta-1 (the population slope). Interpret this interval.

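The 95% confidence interval for β1 is (0.009335, 0.014518). We are 95% confident that the true population slope lies in this interval. In other words, each additional $1 of credit balance is associated with an increase in mean income of between about 0.0093 and 0.0145 thousand dollars (roughly $9.34 to $14.52). Because the interval does not contain zero, it agrees with the conclusion of the t-test in part 5.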
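The interval for β1 follows the usual form, coefficient ± t × SE. The sketch below reproduces it approximately from the rounded values in the output; the critical value 2.0106 (two-tailed, α = .05, 48 df) is an assumed table value, so the last digits can differ slightly from the interval reported above.

```python
# Slope and its standard error, from the Minitab output.
coef = 0.011926
se_coef = 0.001289

# Assumed t-table value: two-tailed critical t, alpha = 0.05, 48 df.
T_CRIT_48 = 2.0106

margin = T_CRIT_48 * se_coef
lower = coef - margin
upper = coef + margin

print(f"95% CI for beta-1: ({lower:.6f}, {upper:.6f})")
# Income is in $1000, so multiply by 1000 to read the slope in dollars.
print(f"Per $1 of balance: ${lower * 1000:.2f} to ${upper * 1000:.2f}")
```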

8. Using an interval, estimate the average income for customers that have credit balance of $4,000. Interpret this interval.

The 95% confidence interval for the mean income is (41.77, 46.61), in thousands of dollars. We are 95% confident that the average income of all customers who have a credit balance of $4,000 is between $41,770 and $46,610.

9. Using an interval, predict the income for a customer that has a credit balance of $4,000. Interpret this interval.

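The 95% prediction interval is (27.11, 61.27), in thousands of dollars. We are 95% confident that the income of an individual customer with a credit balance of $4,000 is between $27,110 and $61,270. This interval is much wider than the confidence interval for the mean income because it must also account for the variation of individual customers around the mean.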
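Both intervals at a credit balance of $4,000 can be approximated from the printed regression output alone. The sketch below is an illustration under stated assumptions, not the procedure Minitab runs: it backs out Sxx and the mean credit balance from the two reported standard errors (assuming the mean balance is positive), and uses an assumed table value of 2.0106 for the critical t with 48 df. Because the inputs are rounded, the results should land close to, but not exactly on, the reported intervals.

```python
import math

# Values copied from the simple regression output.
b0, b1 = -3.516, 0.011926          # intercept and slope
se_b0, se_b1 = 5.483, 0.001289     # their standard errors
s = 8.40667                        # standard error of the regression (S)
n = 50                             # sample size (Total DF + 1)

# Assumed t-table value: two-tailed critical t, alpha = 0.05, 48 df.
T_CRIT_48 = 2.0106

# SE(b1) = s / sqrt(Sxx), so Sxx can be recovered from the output.
sxx = (s / se_b1) ** 2
# SE(b0) = s * sqrt(1/n + x_bar^2 / Sxx); solve for x_bar (assumed positive).
x_bar = math.sqrt(((se_b0 / s) ** 2 - 1 / n) * sxx)


def intervals(x0):
    """Return (fit, confidence interval, prediction interval) at x0."""
    fit = b0 + b1 * x0
    spread = 1 / n + (x0 - x_bar) ** 2 / sxx
    half_ci = T_CRIT_48 * s * math.sqrt(spread)        # mean response
    half_pi = T_CRIT_48 * s * math.sqrt(1 + spread)    # single new customer
    return fit, (fit - half_ci, fit + half_ci), (fit - half_pi, fit + half_pi)


fit, ci, pi = intervals(4000)
print(f"fit = {fit:.2f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"95% PI = ({pi[0]:.2f}, {pi[1]:.2f})")
```

The only difference between the two formulas is the extra 1 under the square root, which is why the prediction interval is so much wider than the confidence interval.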

10. What can we say about the income for a customer that has a credit balance of $10,000? Explain your answer.

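Substituting a Credit Balance of $10,000 into the regression equation gives

Income = -3.516 + 0.011926 * 10,000 ≈ 115.74.

So, based on the fitted regression model, a customer with a credit balance of $10,000 is expected to have an income of roughly $115,700. This estimate should be treated with caution: the model is only reliable within the range of credit balances observed in the sample, and if $10,000 lies outside that range the result is an extrapolation.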
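The substitution can be written as a small function so that the units stay clear. This is only a sketch; the function name is hypothetical, and it uses the rounded coefficients printed in the output, which give about 115.74 (in $1000) for a $10,000 balance.

```python
def predict_income_thousands(credit_balance_dollars):
    """Predicted income in $1000 from credit balance in $ (simple model)."""
    return -3.516 + 0.011926 * credit_balance_dollars


income = predict_income_thousands(10_000)
print(f"Predicted income: {income:.2f} thousand dollars")
print(f"In dollars: ${income * 1000:,.0f}")
```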

11. In an attempt to improve the model, we attempt to do a multiple regression model predicting INCOME based on CREDIT BALANCE, YEARS and SIZE.

Using MINI-TAB run the multiple regression analysis using the variables CREDIT BALANCE, YEARS and SIZE to predict INCOME. State the equation for this multiple regression model.

The output in this case is given below,

Regression Analysis: Income($1000) versus Credit Balance($), Size, Years

The regression equation is
Income($1000) = - 13.2 + 0.0108 Credit Balance($) + 0.615 Size + 1.21 Years

Predictor                Coef    SE Coef      T      P
Constant              -13.186      3.608  -3.65  0.001
Credit Balance($)   0.0107922  0.0008184  13.19  0.000
Size                   0.6151     0.4178   1.47  0.148
Years                  1.2097     0.2322   5.21  0.000

S = 5.26121 R-Sq = 86.5% R-Sq(adj) = 85.6%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       3  8171.7  2723.9  98.41  0.000
Residual Error  46  1273.3    27.7
Total           49  9445.0

Source             DF  Seq SS
Credit Balance($)   1  6052.7
Size                1  1368.0
Years               1   750.9

So the fitted multiple regression equation is

Income = -13.186 + 0.0107922 * Credit Balance + 0.6151 * Size + 1.2097 * Years

where Income is in thousands of dollars and Credit Balance is in dollars.

12. Perform the Global Test for Utility (F-Test). Explain your conclusion.

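The global test has hypotheses H0: β1 = β2 = β3 = 0 versus Ha: at least one βi ≠ 0. From the Minitab output, the F-test statistic is 98.41 and the corresponding p-value is reported as 0.000. Since the p-value is smaller than α = .05, we reject the null hypothesis that none of the predictors is useful and conclude that the regression model as a whole is significant in predicting the dependent variable Income.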
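The F statistic is the ratio of the regression mean square to the error mean square, so it can be rebuilt from the ANOVA sums of squares. In the sketch below, the critical value of about 2.81 for F(3, 46) at α = .05 is an approximate table value that I am assuming; it is not part of the Minitab output.

```python
# Sums of squares and degrees of freedom from the multiple regression ANOVA table.
ss_regression, df_regression = 8171.7, 3
ss_error, df_error = 1273.3, 46

# Assumed approximate F-table value for alpha = 0.05 with (3, 46) df.
F_CRIT_3_46 = 2.81

ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error
f_stat = ms_regression / ms_error

print(f"MSR = {ms_regression:.1f}, MSE = {ms_error:.1f}, F = {f_stat:.2f}")
print("Reject H0:", f_stat > F_CRIT_3_46)
```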

13. Perform the t-test on each independent variable. Explain your conclusions and clearly state how you should proceed. In particular, which independent variables should we keep and which should be discarded.

From the output, the t-test statistic for Credit Balance is 13.19 with p-value 0.000, for Size it is 1.47 with p-value 0.148, and for Years it is 5.21 with p-value 0.000. The p-values for Credit Balance and Years are smaller than 0.05, so both variables are significant in predicting Income and should be kept in the model. The p-value for Size is greater than 0.05, so Size is not a significant predictor of Income once Credit Balance and Years are in the model. We should remove Size and rerun the regression with Credit Balance and Years only.
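The keep-or-discard decision can be laid out as a loop over the coefficient table. This is a sketch under one assumption: the two-tailed critical t at α = .05 with 46 degrees of freedom is taken as about 2.0129 from a t-table. Comparing |t| with that value gives the same decisions as comparing the p-values with 0.05.

```python
# (coefficient, standard error) pairs copied from the Minitab output.
predictors = {
    "Credit Balance($)": (0.0107922, 0.0008184),
    "Size": (0.6151, 0.4178),
    "Years": (1.2097, 0.2322),
}

# Assumed t-table value: two-tailed critical t, alpha = 0.05, 46 df.
T_CRIT_46 = 2.0129

kept = []
for name, (coef, se) in predictors.items():
    t_stat = coef / se
    significant = abs(t_stat) > T_CRIT_46
    if significant:
        kept.append(name)
    decision = "keep" if significant else "discard"
    print(f"{name}: t = {t_stat:.2f} -> {decision}")

print("Variables to keep:", kept)
```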

14. Is this multiple regression model better than the linear model that we generated in parts 1-10? Explain.

To compare the models we look at the coefficient of determination, and in particular at the adjusted R-Sq, because plain R-Sq never decreases when more predictors are added. The adjusted R-Sq for the multiple regression model (85.6%) is much greater than for the simple linear model (63.3%), and R-Sq itself rises from 64.1% to 86.5%. The standard error of the regression also falls from S = 8.41 to S = 5.26. The multiple regression model therefore explains much more of the variation in Income, and it is better than the simple linear regression model.
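The R-Sq(adj) values printed in the two Minitab outputs can be reproduced from the ANOVA tables, which makes the comparison between the models easy to verify. The helper below is a hypothetical sketch (the function name is my own); it applies the standard adjustment, which penalizes each model for the number of predictors it uses.

```python
def adjusted_r_squared(ss_error, ss_total, n, k):
    """Adjusted R-Sq for a model with k predictors fitted to n observations."""
    return 1 - (ss_error / (n - k - 1)) / (ss_total / (n - 1))


n = 50
ss_total = 9445.0

# Residual Error SS from each ANOVA table.
adj_simple = adjusted_r_squared(3392.3, ss_total, n, k=1)
adj_multiple = adjusted_r_squared(1273.3, ss_total, n, k=3)

print(f"Simple model:   R-Sq(adj) = {adj_simple:.1%}")
print(f"Multiple model: R-Sq(adj) = {adj_multiple:.1%}")
```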
