Chapter 15 CRAVEN SALES MODEL - Multiple Regression

  • 7/30/2019 Chapter 15 CRAVEN SALES MODEL - Multiple Regression

    Variable Selection in Multiple Regression Modeling

    Use multiple regression to answer the following questions:

    1) Taking Sales as the target variable, develop a regression model.
    2) Which variables should be included, and why?
    3) Is the model adequate for future predictions?

    Section A : Find Correlations between Predictor Variables

    Variable Accounts has significant correlations with variables Time, Poten and Share.
    Variable Poten has a significant correlation with variable Time.
    Variable Rating has a significant correlation with variable AdvExp.
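
    As a rough illustration of how the pairwise correlations in Section A could be computed outside SPSS, the sketch below builds a Pearson correlation matrix in plain Python. The column names and values are made-up toy data, not the chapter's sales data, and the helper names are assumptions introduced here.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Pairwise correlations for a dict mapping column name -> list of values."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}

# Made-up toy values for illustration only (NOT the chapter's data).
toy = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [2.1, 3.9, 6.2, 8.1, 9.8],
    "x3": [5.0, 3.0, 4.0, 1.0, 2.0],
}
matrix = correlation_matrix(toy)
for name, row in matrix.items():
    print(name, {other: round(r, 3) for other, r in row.items()})
```

    Pairs of predictors with a large absolute correlation are the ones to watch for collinearity when all predictors are entered together.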

    Section B : Perform Multiple Linear Regression

    B.1 : ENTER REGRESSION METHOD (Include ALL Predictor Variables)

    Now enter all 8 variables as predictors and Sales as the dependent variable for linear regression. The output is shown below. The Model Summary output shows that adjusted R2 is 0.89, indicating that the model explains about 89% of the variability in Sales after adjusting for the number of predictors.
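
    For readers who want to see how adjusted R2 relates to plain R2, here is a minimal sketch of the standard formula. The inputs in the example call are hypothetical and are not taken from the chapter's SPSS output; the function name is an assumption.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_obs - n_predictors - 1  # residual degrees of freedom
    if dof <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / dof

# Hypothetical inputs for illustration only (not the chapter's values).
example = adjusted_r2(r2=0.90, n_obs=30, n_predictors=8)
print(round(example, 4))
```

    Because the penalty grows with the number of predictors, adjusted R2 can fall when a weak predictor is added, which is why it is the figure compared across the models in this chapter.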

    The ANOVA table below shows that the regression model as a whole is statistically significant, so it fits the data adequately.

    The Coefficients table below supports the correlation observations of Section A. Variable Accounts has the largest variance inflation factor (VIF), 5.637, so it is the predictor most strongly correlated with the other predictors.

    Variable Share also shows noticeable collinearity, with a VIF of 3.395, as does variable Time, with a VIF of 3.356.
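
    To make the VIF figures easier to interpret, the sketch below inverts the usual definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. The three VIF values are the ones quoted above; the function name is an assumption.

```python
def r2_from_vif(vif):
    """Invert VIF_j = 1 / (1 - R_j^2) to recover R_j^2."""
    if vif < 1.0:
        raise ValueError("a VIF cannot be smaller than 1")
    return 1.0 - 1.0 / vif

# VIF values as quoted in the text above.
quoted_vifs = [("Accounts", 5.637), ("Share", 3.395), ("Time", 3.356)]
for name, vif in quoted_vifs:
    print(f"{name}: VIF = {vif}, implied R_j^2 = {r2_from_vif(vif):.3f}")
```

    On this reading, roughly 82% of the variation in Accounts, and about 70% of the variation in Share and in Time, is explained by the remaining predictors.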

    Variables Time, Change, Accounts, Work and Rating are to be excluded, since their p-values exceed 0.05. The variables that pass the significance test and are retained in the regression model are:

    Poten, AdvExp, Share

    To ascertain the reliability of this conclusion, we continue with other variable selection methods in multiple regression.

    B.2: Stepwise Regression method.

    We apply the stepwise method as shown below.

    Now click on the Options tab as shown below.

    The Options tab opens the Stepping Method Criteria, as shown.

    A variable is entered into the solution if its p-value is less than 0.05 and is removed from the solution if its p-value becomes greater than 0.10. In stepwise regression, the order in which variables are selected into the solution is shown in the Coefficients table.

    The selected variables are:

    Accounts, AdvExp, Poten, Share

    Each selected variable has a p-value less than 0.05. As shown in the ANOVA table, the four variables are selected in four steps. Since the model p-value in the ANOVA table is less than 0.05, the model fits the data adequately.
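
    The entry and removal rules described above can be sketched as a small loop. This is only an illustration of the control flow, not SPSS's implementation: fit_pvalues is a hypothetical callback that would refit the regression on the given variables and return each coefficient's p-value, and the toy stub below returns made-up numbers chosen so that one variable is entered and later removed.

```python
def stepwise_select(candidates, fit_pvalues, p_enter=0.05, p_remove=0.10):
    """Stepwise selection using fit_pvalues(variables) -> {name: p-value}."""
    selected = []
    remaining = list(candidates)
    for _ in range(2 * len(remaining) + 1):  # guard against endless cycling
        if not remaining:
            break
        # Entry step: p-value of each candidate when added to the current model.
        trial = {v: fit_pvalues(selected + [v])[v] for v in remaining}
        best = min(trial, key=trial.get)
        if trial[best] >= p_enter:
            break
        selected.append(best)
        remaining.remove(best)
        # Removal step: drop the worst variable while it exceeds p_remove.
        while selected:
            pvals = fit_pvalues(selected)
            worst = max(selected, key=pvals.get)
            if pvals[worst] <= p_remove:
                break
            selected.remove(worst)
            remaining.append(worst)
    return selected

def toy_fit_pvalues(variables):
    """Made-up p-values: x1 looks useful alone but not once x2 is present."""
    pvals = {}
    for v in variables:
        if v == "x1":
            pvals[v] = 0.30 if "x2" in variables else 0.01
        elif v == "x2":
            pvals[v] = 0.001 if "x1" in variables else 0.02
        else:
            pvals[v] = 0.50
    return pvals

result = stepwise_select(["x1", "x2", "x3"], toy_fit_pvalues)
print(result)
```

    Keeping the removal threshold (0.10) above the entry threshold (0.05) prevents a variable from being entered and removed again in the same step.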

    The strength of the model is given in the Model Summary table, shown below.

    Note that the final model is obtained at the fourth step and explains 88.1% of the variability in the data. Adjusted R2 increases while the Standard Error of the Estimate decreases at each step.

    B.3: Backward Regression method.

    Invoke the backward regression as shown below.

    Initially all the variables are entered in the solution. Then variables Work, Rating and Accounts are removed at subsequent stages, since their respective p-values are greater than 0.10, as shown below.

    As seen in the Coefficients table below, all eight predictor variables are entered in the initial solution to find their respective p-values.

    Since variable Work has the highest p-value, 0.487, it is excluded from the solution. The solution for the remaining seven variables (excluding Work) is shown below. Since variable Rating now has the highest p-value, 0.440, it is excluded next.

    The regression model is then rebuilt with six variables. The output is shown below. Variable Accounts has the highest p-value, 0.224, so it is excluded from the subsequent solution.

    The regression model is now rebuilt with five variables to find their respective p-values. The Coefficients output is shown below.

    Since all five remaining variables have p-values less than 0.10, no further exclusion is carried out.
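
    The backward procedure walked through above (fit, drop the variable with the highest p-value if it exceeds 0.10, refit) can be sketched as follows. As before, fit_pvalues is a hypothetical callback standing in for the regression fit, and the toy p-values are made up and deliberately kept fixed to keep the example short; in a real fit they change after every removal.

```python
def backward_eliminate(variables, fit_pvalues, p_remove=0.10):
    """Backward elimination; returns (kept variables, removal order)."""
    selected = list(variables)
    removed = []
    while selected:
        pvals = fit_pvalues(selected)            # refit on the current set
        worst = max(selected, key=pvals.get)     # least significant variable
        if pvals[worst] <= p_remove:
            break                                # everything left passes
        selected.remove(worst)
        removed.append(worst)
    return selected, removed

# Made-up p-values for illustration only (NOT the chapter's output).
TOY_P = {"x1": 0.002, "x2": 0.45, "x3": 0.07, "x4": 0.20}

def toy_fit_pvalues(variables):
    return {v: TOY_P[v] for v in variables}

kept, dropped = backward_eliminate(["x1", "x2", "x3", "x4"], toy_fit_pvalues)
print("kept:", kept)
print("dropped in order:", dropped)
```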

    The ANOVA tables for the four stages of model development are shown below.

    Note that the value of F increases with each stage. The five variables in Model 4 have a collective p-value reported as 0.000 (that is, zero to three decimal places), attesting that the model is adequate for predictive analysis.

    To find the strength of the model, we look at the Model Summary table.

    The five variables in Model 4 give an adjusted R Square of 0.893, so the model explains a very high share of the variability.

    B.4: Forward Regression method.

    The forward selection procedure starts with no independent variables. It adds variables one at a time, using the same procedure as stepwise regression to determine whether an independent variable should be entered into the model. However, the forward selection procedure does not permit a variable to be removed from the model once it has been entered. The procedure stops when the p-value for each of the independent variables not in the model is greater than the alpha to enter (0.05 here).
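
    A minimal sketch of the forward procedure just described: start empty, add the candidate with the smallest p-value as long as it is below the entry threshold, and never remove anything. The callback fit_pvalues and the toy p-values are assumptions made for illustration; a real callback would refit the regression each time.

```python
def forward_select(candidates, fit_pvalues, p_enter=0.05):
    """Forward selection; returns (selected variables, entry history)."""
    selected = []
    history = []                      # (variable, p-value at entry) per step
    remaining = list(candidates)
    while remaining:
        # p-value of each candidate when added to the already selected set.
        trial = {v: fit_pvalues(selected + [v])[v] for v in remaining}
        best = min(trial, key=trial.get)
        if trial[best] >= p_enter:
            break                     # no candidate meets the entry threshold
        selected.append(best)
        remaining.remove(best)
        history.append((best, trial[best]))
    return selected, history

# Made-up p-values for illustration only (NOT the chapter's output).
TOY_P = {"x1": 0.0001, "x2": 0.004, "x3": 0.02, "x4": 0.30, "x5": 0.60}

def toy_fit_pvalues(variables):
    return {v: TOY_P[v] for v in variables}

chosen, steps = forward_select(["x3", "x5", "x1", "x4", "x2"], toy_fit_pvalues)
for step, (name, p) in enumerate(steps, start=1):
    print(f"step {step}: enter {name} (p = {p})")
```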

    We apply the forward regression method by selecting the Forward option, as shown below.

    We try each variable in turn as the only predictor, with Sales as the dependent variable. Variable Accounts has the smallest p-value of the eight predictor variables, so it is entered as the first variable in the solution, as shown below.

    Now there are seven variables remaining. Which one should be chosen? Pair each remaining variable, one at a time, with variable Accounts and find the p-value of the added variable, as shown below.

    Variable AdvExp yields the lowest p-value when paired with variable Accounts, so it joins the group of selected variables, as seen in the Coefficients table.

    Now we have two variables (Accounts, AdvExp) in the selected variable list. Adding each of the remaining six variables in turn and finding its p-value gives the following table.

    As seen, variable Poten has a p-value of 0.019 when combined with the two selected variables (Accounts, AdvExp). So we add variable Poten to the selected variables list, as seen in the following Coefficients table.

    Next, we combine each of the remaining five variables in turn with the three selected variables (Accounts, AdvExp, Poten) to select the next variable with the lowest p-value (less than .05). The results are shown in the table below.

    Variable Share has a p-value of .001, so it joins the three selected variables, as seen in the following Coefficients table.

    Next we combine each of the remaining four variables in turn with the four selected variables to find their p-values, which are shown in the following table.

    None of the remaining four variables has a p-value less than 0.05, so no further variable is added to the list of selected variables.

    The ANOVA table for the four models is shown below; Model 4 has the highest F value.

    The Model Summary table shows that Model 4 has the lowest standard error and the highest adjusted R2, 0.881.

    SUMMARY OF FOUR REGRESSION METHODS

    Here we compare the four regression methods and list, for each of the eight variables, whether each method recommends including it.

    Situation A:

    Analysis of Regression Model including Four Variables Suggested by Backward Regression

    Situation B:

    Regression Model by including three Variables as Recommended by ENTER method.

    Situation C:

    Regression Model by Including Four Variables as Suggested by

    (i) Stepwise Method
    (ii) Forward Method

    Comparison of Three Situations

    Situation C has the best parameters of the three situations. The model proposed by this situation has the largest adjusted R2, 0.881, and the maximum F value, 45.226. However, its standard error of the estimate is greater than that obtained in Situation B.

    So the recommended model, according to Situation C, is:

    Estimate of Sales = -1441.932 + 0.038 (Poten) + 0.175 (AdvExp) + 190.144 (Share) + 9.214 (Accounts)
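
    The recommended equation can be applied directly to a new territory. The sketch below simply evaluates it; the coefficients are those in the equation above, while the function name and the input values in the example call are hypothetical and chosen only to show the arithmetic.

```python
# Coefficients copied from the recommended Situation C equation.
INTERCEPT = -1441.932
COEFFICIENTS = {
    "Poten": 0.038,
    "AdvExp": 0.175,
    "Share": 190.144,
    "Accounts": 9.214,
}

def estimate_sales(poten, adv_exp, share, accounts):
    """Evaluate the fitted equation for one set of predictor values."""
    return (
        INTERCEPT
        + COEFFICIENTS["Poten"] * poten
        + COEFFICIENTS["AdvExp"] * adv_exp
        + COEFFICIENTS["Share"] * share
        + COEFFICIENTS["Accounts"] * accounts
    )

# Hypothetical predictor values for illustration only (not from the data set).
prediction = estimate_sales(poten=40000, adv_exp=5000, share=8.0, accounts=100)
print(round(prediction, 3))
```

    Predictions of this kind are only as reliable as the fitted model, and they are safest for predictor values within the range covered by the original data.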