Regression
Shibin Liu, SAS Beijing R&D

Page 1: Regression

Regression

Shibin LiuSAS Beijing R&D

Page 2: Regression

Agenda
• 0. Lesson overview
• 1. Exploratory Data Analysis
• 2. Simple Linear Regression
• 3. Multiple Regression
• 4. Model Building and Interpretation
• 5. Summary

2

Page 3: Regression

Agenda

• 0. Lesson overview
• 1. Exploratory Data Analysis
• 2. Simple Linear Regression
• 3. Multiple Regression
• 4. Model Building and Interpretation
• 5. Summary

3

Page 4: Regression

Lesson overview

4

ANOVA

Predictor Variable → Response Variable

+

Page 5: Regression

Lesson overview

5

Continuous → Continuous

Correlation analysis

Linear regression

Page 6: Regression

Lesson overview

6

Continuous predictor → Continuous response

• Measure linear association
• Examine the relationship
• Screen for outliers
• Interpret the correlation

Correlation analysis

Page 7: Regression

Lesson overview

7

Continuous predictor → Continuous response

• Define the linear association
• Determine the equation for the line
• Explain or predict variability

Linear regression

Page 8: Regression

Lesson overview

8

What do you want to examine?

• The relationship between variables → Which kind of variables?
  • Continuous only: CORRELATIONS, LINEAR REGRESSION
  • Categorical response variable: LOGISTIC REGRESSION
• The difference between groups on one or more variables → How many groups?
  • Two: TTEST
  • Two or more: LINEAR MODELS (analysis of variance)
• The location, spread, and shape of the data's distribution → DISTRIBUTION ANALYSIS (descriptive statistics, histogram, normal probability plots)
• Summary statistics or graphics? Both → SUMMARY STATISTICS (descriptive statistics), ONE-WAY FREQUENCIES & TABLE ANALYSIS (frequency tables, chi-square test)

Descriptive Statistics (Lesson 1) | Inferential Statistics (Lesson 2, Lesson 3 & 4, Lesson 5)

Page 9: Regression

Agenda
• 0. Lesson overview
• 1. Exploratory Data Analysis
• 2. Simple Linear Regression
• 3. Multiple Regression
• 4. Model Building and Interpretation
• 5. Summary

9

Page 10: Regression

Exploratory Data Analysis: Introduction

10

Weight Height

Continuous variable Continuous variable

Scatter plot Correlation analysis

Exploratory data analysis

Linear regression

Page 11: Regression

Exploratory Data Analysis: Objective
• Examine the relationship between continuous variables using a scatter plot
• Quantify the degree of association between two continuous variables using correlation statistics
• Avoid potential misuses of the correlation coefficient
• Obtain Pearson correlation coefficients

11

Page 12: Regression

Exploratory Data Analysis: Using Scatter Plots to Describe Relationships between Continuous Variables

12

Scatter plot Correlation analysis

Exploratory data analysis

Relationship

Trend

Range

Outlier

Communicate analysis result

X: Predictor variable
Y: Response variable
Coordinates: values of X and Y

Page 13: Regression

Exploratory Data Analysis: Using Scatter Plots to Describe Relationships between Continuous Variables

13

[Scatter plot: a curved trend suggests adding a squared term, giving a quadratic model.]

Page 14: Regression

Exploratory Data Analysis: Using Correlation to Measure Relationships between Continuous Variables

14

Exploratory data analysis

Scatter plot, Correlation analysis

Linear association

Zero, Positive, Negative

Page 15: Regression

Exploratory Data Analysis: Using Correlation to Measure Relationships between Continuous Variables

15

Pearson correlation coefficient:

For the population: ρ = Cov(X, Y) / (σ_X σ_Y)

For a sample: r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √( Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)² )

Page 16: Regression

Exploratory Data Analysis: Using Correlation to Measure Relationships between Continuous Variables

16

Correlation analysis

Pearson correlation coefficient: r

−1 ← r → +1

Near −1: strong negative linear relationship
Near 0: no linear relationship
Near +1: strong positive linear relationship
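The sample formula above can be computed directly. A minimal Python sketch (an illustration only, not part of the SAS tasks used in this course):

```python
from math import sqrt

def pearson_r(x, y):
    """Sample Pearson correlation: r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # perfectly increasing, so r is 1.0
print(pearson_r([1, 2, 3], [3, 2, 1]))         # perfectly decreasing, so r is -1.0
```

In practice the Correlation task (PROC CORR) reports this value; the sketch only shows what the statistic measures.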

Page 17: Regression

Exploratory Data Analysis: Hypothesis testing for a Correlation

17

Correlation Coefficient Test

H₀: ρ = 0

Hₐ: ρ ≠ 0

Population parameter: ρ

Sample statistic: r

• A p-value does not measure the magnitude of the association.• Sample size affects the p-value.

Rejecting the null hypothesis means only that you can be confident that the true population correlation is not 0. A small p-value can occur (as with many statistics) because of a very large sample size. Even a correlation coefficient of 0.01 can be statistically significant with a large enough sample size. Therefore, it is important to also look at the value of r itself to see whether it is meaningfully large.
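The test behind this p-value uses the statistic t = r·√((n − 2)/(1 − r²)) with n − 2 degrees of freedom. A small Python sketch (illustrative only) of how the same modest r becomes "more significant" purely as the sample grows:

```python
from math import sqrt

def corr_t_stat(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r * r))

# The same r = 0.1 yields a growing t statistic (hence a shrinking p-value)
# as n grows, which is why sample size alone can make r "significant".
for n in (10, 100, 1000):
    print(n, corr_t_stat(0.1, n))
```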

Page 18: Regression

Exploratory Data Analysis: Hypothesis testing for a Correlation

18

-1 0 +1

r = 0.72, r = 0.81

Page 19: Regression

Exploratory Data Analysis: Avoiding Common Errors in Interpreting CorrelationsCause and Effect

19

Correlation does not imply causation

Besides causality, could other reasons account for strong correlation between two variables?

Page 20: Regression

Exploratory Data Analysis: Avoiding Common Errors in Interpreting CorrelationsCause and Effect

20

Weight Height

Correlation does not imply causation

A strong correlation between two variables does not mean change in one variable causes the other variable to change, or vice versa.

Page 21: Regression

Exploratory Data Analysis: Avoiding Common Errors in Interpreting CorrelationsCause and Effect

21

Correlation does not imply causation

Page 22: Regression

Exploratory Data Analysis: Avoiding Common Errors in Interpreting CorrelationsCause and Effect

22

Correlation does not imply causation

Page 23: Regression

Exploratory Data Analysis: Avoiding Common Errors in Interpreting CorrelationsCause and Effect

23

X: the percent of students who take the SAT exam in one of the states
Y: SAT scores

? Is the SAT score required for college entrance or not?

Page 24: Regression

Exploratory Data Analysis: Avoiding Common Errors: Types of Relationships

24

Pearson correlation coefficient: r → 0

?

curvilinear

quadratic

parabolic

Page 25: Regression

Exploratory Data Analysis: Avoiding Common Errors: outliers

25

Data one: r = 0.82
Data two: r = 0.02

Page 26: Regression

Exploratory Data Analysis: Avoiding Common Errors: outliers

26

What to do with an outlier? First ask: why is it an outlier, is it valid or an error?

• If it is an error: correct it, collect new data, or replicate the data.
• If it is valid: compute two correlation coefficients, with and without the outlier, and report both.

Page 27: Regression

Exploratory Data Analysis: Scenario: Exploring Data Using Correlation and Scatter Plots

27

Fitness → oxygen consumption?

Page 28: Regression

Exploratory Data Analysis: Exploring Data with Correlations and Scatter Plots

28

Page 29: Regression

Exploratory Data Analysis: Exploring Data with Correlations and Scatter Plots

29

What’s the Pearson correlation coefficient of Oxygen_Consumption with Run_Time?

What’s the p-value for the correlation of Oxygen_Consumption with Performance?

Page 30: Regression

Exploratory Data Analysis: Exploring Data with Correlations and Scatter Plots

30

Page 31: Regression

Exploratory Data Analysis: Examining Correlations between Predictor Variables

31

Page 32: Regression

Exploratory Data Analysis: Examining Correlations between Predictor Variables

32

What are the two highest Pearson correlation coefficients?

Page 33: Regression

Exploratory Data Analysis

33

Question 1.

The correlation between tuition and rate of graduation at U.S. colleges is 0.55. What does this mean?

a) The way to increase graduation rates at your college is to raise tuition
b) Increasing graduation rates is expensive, causing tuition to rise
c) Students who are richer tend to graduate more often than poorer students
d) None of the above

Answer: d

Page 34: Regression

Agenda
• 0. Lesson overview
• 1. Exploratory Data Analysis
• 2. Simple Linear Regression
• 3. Multiple Regression
• 4. Model Building and Interpretation
• 5. Summary

34

Page 35: Regression

Simple Linear Regression: Introduction

35

Page 36: Regression

Simple Linear Regression: Introduction

36

-1 0 +1

Linear relationships

Variable A, Variable B, Variable C, Variable D

Page 37: Regression

Simple Linear Regression: Introduction

37

[Two scatter plots: the same r can come from the same or from different relationships.]

Page 38: Regression

Simple Linear Regression: Introduction

38

Simple Linear Regression

Y: variable of primary interest

X: explains variability in Y

Regression Line

Page 39: Regression

Simple Linear Regression: Objective

39

• Explain the concepts of simple linear regression
• Fit a simple linear regression using the Linear Regression task
• Produce predicted values and confidence intervals

Page 40: Regression

Simple Linear Regression: Scenario: Performing Simple Linear Regression

40

Fitness

Run_Time Oxygen_Consumption

Simple Linear Regression

Linear regression

Page 41: Regression

Simple Linear Regression: The Simple Linear Regression Model

41

Page 42: Regression

Simple Linear Regression: The Simple Linear Regression Model

42

Question 2.

What does epsilon represent?

a) The intercept parameter
b) The predictor variable
c) The variation of X around the line
d) The variation of Y around the line

Answer: d

Page 43: Regression

Simple Linear Regression: How SAS Performs Linear Regression

43

Method of least squares: minimize the sum of squared errors, Σ(yᵢ − ŷᵢ)².

The resulting estimates are Best Linear Unbiased Estimators (BLUE): they are unbiased and have minimum variance among linear unbiased estimators.
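The least squares solution for a single predictor has a closed form: b₁ = Sxy/Sxx and b₀ = ȳ − b₁x̄. A Python sketch; the data pairs here are made up for illustration, not taken from the Fitness data set:

```python
def fit_line(x, y):
    """Least squares estimates: b1 = Sxy / Sxx, b0 = ybar - b1 * xbar."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx        # slope
    b0 = my - b1 * mx     # intercept
    return b0, b1

# Hypothetical (runtime, oxygen-consumption-like) pairs, purely for illustration:
b0, b1 = fit_line([9, 10, 11, 12, 13], [49, 45, 42, 40, 37])
print(b0, b1)   # fitted line: yhat = b0 + b1 * x
```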

Page 44: Regression

Simple Linear Regression: Measuring How Well a Model Fits the Data

44

Regression model Baseline model

VS.

Page 45: Regression

Simple Linear Regression: Comparing the Regression Model to a Baseline Model

45

Baseline model: Ȳ, the mean of Y

Better model: explains more variability

Type of variability | Equation
Explained (SSM) | Σ(ŷᵢ − ȳ)²
Unexplained (SSE) | Σ(yᵢ − ŷᵢ)²
Total (SST) | Σ(yᵢ − ȳ)²
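The identity SST = SSM + SSE can be checked numerically. A Python sketch using made-up numbers for illustration (the Linear Regression task produces this table for you):

```python
def variability(x, y):
    """Fit the least squares line, then split total variability: SST = SSM + SSE."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    b0 = my - b1 * mx
    yhat = [b0 + b1 * a for a in x]
    ssm = sum((f - my) ** 2 for f in yhat)            # explained by the model
    sse = sum((b - f) ** 2 for b, f in zip(y, yhat))  # unexplained error
    sst = sum((b - my) ** 2 for b in y)               # total
    return ssm, sse, sst

ssm, sse, sst = variability([9, 10, 11, 12, 13], [49, 45, 42, 40, 37])
print(ssm, sse, sst)           # ssm + sse equals sst
print("R-square:", ssm / sst)  # share of total variability the model explains
```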

Page 46: Regression

Simple Linear Regression: Hypothesis Testing for Linear Regression

46

Linear regression hypothesis test:

H₀: β₁ = 0
Hₐ: β₁ ≠ 0

Page 47: Regression

Simple Linear Regression: Assumptions of Simple Linear Regression

47

Linear regression

Assumptions:

1. The mean of Y is linearly related to X.
2. Errors are normally distributed.
3. Errors have equal variances.
4. Errors are independent.

Page 48: Regression

Simple Linear Regression: Performing Simple Linear Regression

48

Task >Regression>Linear Regression

Page 49: Regression

Simple Linear Regression: Performing Simple Linear Regression

49

Task >Regression>Linear Regression

Page 50: Regression

Simple Linear Regression: Performing Simple Linear Regression

50

Question 3.

In a model of Y on X, if the parameter estimate (slope) of X is 0, then which of the following is the best guess (predicted value) for Y when X equals 13?

a) 13
b) The mean of Y
c) A random number
d) The mean of X
e) 0

Answer: b

Page 51: Regression

Simple Linear Regression: Confidence and Prediction Intervals

51

Page 52: Regression

Simple Linear Regression: Confidence and Prediction Intervals

52

Question 4.

Suppose you have a 95% confidence interval around the mean. How do you interpret it?

a) The probability is .95 that the true population mean of Y for a particular X is within the interval.

b) You are 95% confident that a newly sampled value of Y for a particular X is within the interval.

c) You are 95% confident that your interval contains the true population mean of Y for a particular X.

Answer: c

Page 53: Regression

Simple Linear Regression: Confidence and Prediction Intervals

53

Page 54: Regression

Simple Linear Regression: Confidence and Prediction Intervals

54

Page 55: Regression

Simple Linear Regression: Producing Predicted Values of the Response Variable

55

data Need_Predictions;
   input Runtime @@;
   datalines;
9 10 11 12 13
;
run;

Page 56: Regression

Simple Linear Regression: Producing Predicted Values of the Response Variable

56

data Need_Predictions;
   input Runtime @@;
   datalines;
9 10 11 12 13
;
run;

Page 57: Regression

Simple Linear Regression: Producing Predicted Values of the Response Variable

57

18

Page 58: Regression

Agenda
• 0. Lesson overview
• 1. Exploratory Data Analysis
• 2. Simple Linear Regression
• 3. Multiple Regression
• 4. Model Building and Interpretation
• 5. Summary

58

Page 59: Regression

Multiple Regression
• 0. Lesson overview
• 1. Exploratory Data Analysis
• 2. Simple Linear Regression
• 3. Multiple Regression
• 4. Model Building and Interpretation
• 5. Summary

59

Page 60: Regression

Multiple Regression: Introduction

60

Simple linear regression: one predictor variable → response variable

Multiple linear regression: more than one predictor variable → response variable

Page 61: Regression

Multiple Regression: Introduction

61

Simple linear regression: Y = β₀ + β₁X + ε

Multiple linear regression: Y = β₀ + β₁X₁ + … + βₖXₖ + ε

When k = 2: Y = β₀ + β₁X₁ + β₂X₂ + ε

Page 62: Regression

Multiple Regression: Objective

62

• Explain the mathematical model for multiple regression
• Describe the main advantage of multiple regression versus simple linear regression
• Explain the standard output from the Linear Regression task
• Describe common pitfalls of multiple linear regression

Page 63: Regression

Multiple RegressionAdvantages and Disadvantages of Multiple Regression

63

Advantages vs. disadvantages of multiple linear regression:

• Disadvantage: 127 possible models
• Disadvantage: complex to interpret

Page 64: Regression

Multiple RegressionPicturing the Model for Multiple Regression

64

Multiple Linear Regression

The model has k + 1 parameters, including the intercept.

Page 65: Regression

Multiple RegressionPicturing the Model for Multiple Regression

65

Multiple Linear Regression

Page 66: Regression

Multiple RegressionCommon applications

66

Multiple Linear Regression is a powerful tool for the following tasks:

1. Prediction: develop a model to predict future values of a response variable (Y) based on its relationships with other predictor variables (Xs).

2. Analytical or explanatory analysis: develop an understanding of the relationships between the response variable and the predictor variables.

Page 67: Regression

Multiple Regression Analysis versus Prediction in Multiple Regression

67

Prediction

1. The terms in the model, the values of their coefficients, and their statistical significance are of secondary importance.

2. The focus is on producing a model that is the best at predicting future values of Y as a function of the Xs.

Ŷ = β̂₀ + β̂₁X₁ + … + β̂ₖXₖ

Page 68: Regression

Multiple Regression Analysis versus Prediction in Multiple Regression

68

Analytical or Explanatory Analysis

1. The focus is on understanding the relationship between the dependent variable and the independent variables.

2. Consequently, the statistical significance of the coefficients is important, as well as the magnitudes and signs of the coefficients.

Ŷ = β̂₀ + β̂₁X₁ + … + β̂ₖXₖ

Page 69: Regression

Multiple RegressionHypothesis Testing for Multiple Regression

69

Multiple linear regression hypothesis test:

H₀: β₁ = β₂ = … = βₖ = 0 (the regression model does not fit the data better than the baseline model)

Hₐ: at least one βᵢ ≠ 0 (the regression model does fit the data better than the baseline model)

Page 70: Regression

Multiple RegressionHypothesis Testing for Multiple Regression

70

Question 4.

Match the items on the left and right.

1. At least one slope of the regression in the population is not 0, and at least one predictor variable explains a significant amount of variability in the response variable → a) Reject the null hypothesis

2. No predictor variable explains a significant amount of variability in the response variable → b) Fail to reject the null hypothesis

3. The estimated linear regression model does not fit the data better than the baseline model → b) Fail to reject the null hypothesis

Page 71: Regression

Multiple RegressionAssumptions for Multiple Regression

71

Linear regression model

Assumptions:

1. The mean of Y is linearly related to X.
2. Errors are normally distributed.
3. Errors have equal variances.
4. Errors are independent.

Page 72: Regression

Multiple Regression: Scenario: Using Multiple Regression to Explain Oxygen Consumption

72

Performance

Age

Page 73: Regression

Multiple Regression: Adjust R2

73

Adjusted R²:

R²adj = 1 − [(n − i)(1 − R²)] / (n − p)

where:
i = 1 if there is an intercept and 0 otherwise
n = the number of observations used to fit the model
p = the number of parameters in the model
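A quick Python sketch of the formula with made-up values (the Linear Regression task reports this statistic for you):

```python
def adjusted_r2(r2, n, p, intercept=True):
    """Adjusted R-square: 1 - (n - i) * (1 - R^2) / (n - p)."""
    i = 1 if intercept else 0
    return 1 - (n - i) * (1 - r2) / (n - p)

# With R^2 held fixed, adding a parameter (p: 3 -> 4) lowers adjusted R-square.
# That is the penalty for model complexity; plain R-square never decreases.
print(adjusted_r2(0.75, 31, 3))
print(adjusted_r2(0.75, 31, 4))
```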

Page 74: Regression

Multiple Regression: Performing Multiple Linear Regression

74

Page 75: Regression

Multiple Regression: Performing Multiple Linear Regression

75

What’s the p-value of the overall model?

Should we reject the null hypothesis or not?

Based on our evidence, do we reject the null hypothesis that the parameter estimate is 0?

Page 76: Regression

Multiple Regression: Performing Multiple Linear Regression

76

[Diagram: Performance and RunTime each predict Oxygen_Consumption, but Performance and RunTime are themselves related. What happens when both are in the model?]

Page 77: Regression

Multiple Regression: Performing Multiple Linear Regression

77

The correlation between Performance and RunTime is −0.82049: collinearity.

Page 78: Regression

Agenda
• 0. Lesson overview
• 1. Exploratory Data Analysis
• 2. Simple Linear Regression
• 3. Multiple Regression
• 4. Model Building and Interpretation
• 5. Summary

78

Page 79: Regression

Model Building and Interpretation : Introduction

79

Performance

Age

Page 80: Regression

Model Building and Interpretation : Introduction

80

?

Page 81: Regression

Model Building and Interpretation : Introduction

81

Stepwise selection methods

Forward

Backward

Stepwise

All possible regressions rank criteria:

R2

Adjusted R2

Mallows’ Cp

‘No selection’ is the default

Page 82: Regression

Model Building and Interpretation: Objectives

82

• Explain the Linear Regression task options for the model selection

• Describe model selection options and interpret output to evaluate the fit of several models

Page 83: Regression

Model Building and Interpretation : Approaches to Selecting Models: Manual

83

Full Model

?!

Page 84: Regression

Model Building and Interpretation : SAS and Automated Approaches to Modeling

84

Stepwise selection methods

Forward

Backward

Stepwise

All possible regressions rank criteria:

R2

Adjusted R2

Mallows’ Cp

‘No selection’ is the default

Run all methods

Look for commonalities

Narrow down models

Page 85: Regression

Model Building and Interpretation : The All-Possible Regressions Approach to Model Building

85

Fitness

Predictor variables

128 possible models

Page 86: Regression

Model Building and Interpretation : Evaluating Models Using Mallows' Cp Statistic

86

Cp

Mallows' Cp Statistic

Model Bias

Under-fitting Over-fitting

Page 87: Regression

Model Building and Interpretation : Evaluating Models Using Mallows' Cp Statistic

87

Cp

Mallows' Cp Statistic

Model Bias

For prediction: Mallows' criterion, Cp ≤ p

For parameter estimation: Hocking's criterion, Cp ≤ 2p − p_full + 1
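Both cutoffs are simple inequalities. A Python sketch; the p_full = 8 example mirrors the slides' setup of 7 variables plus an intercept:

```python
def passes_mallows(cp, p):
    """Prediction: keep models with Cp <= p (Mallows' criterion)."""
    return cp <= p

def passes_hocking(cp, p, p_full):
    """Parameter estimation: keep models with Cp <= 2p - p_full + 1 (Hocking's criterion)."""
    return cp <= 2 * p - p_full + 1

# With p_full = 8 (7 variables + intercept), a model with p = 6 parameters
# must have Cp <= 2*6 - 8 + 1 = 5 to meet Hocking's criterion.
print(passes_mallows(4.0, 6), passes_hocking(4.0, 6, 8))
print(passes_mallows(5.5, 6), passes_hocking(5.5, 6, 8))
```

Hocking's bound is tighter than Mallows' whenever p < p_full − 1, so fewer models qualify for parameter estimation than for prediction.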

Page 88: Regression

Model Building and Interpretation : Viewing Mallows' Cp Statistic

88

Cp in the Linear Regression task

Partial output (number of variables in the model + 1 = p)

Page 89: Regression

Model Building and Interpretation : Viewing Mallows' Cp Statistic

89

Partial output

Mallows' criterion: Cp <= p

In this output, how many models have a value for Cp that is less than or equal to p?

Which of these models has the fewest parameters?

Page 90: Regression

Model Building and Interpretation : Viewing Mallows' Cp Statistic

90

Partial output

How many models meet Hockings' criterion for Cp for parameter estimation?

First of all, what is p for the full model? Hocking's criterion: Cp ≤ 2p − p_full + 1

p_full = 8 (7 variables + 1 intercept)

For example, for a model with p = 6: Cp ≤ 2(6) − 8 + 1 = 5

Page 91: Regression

Model Building and Interpretation : Viewing Mallows' Cp Statistic

91

Question 5.

What happens when you use the all-possible regressions method? Select all that apply.

y 1. You compare the R-square, adjusted R-square, and Cp statistics to evaluate the models.
y 2. SAS computes all possible models.
3. You choose a selection method (stepwise, forward, or backward).
y 4. SAS ranks the results.
5. You cannot reduce the number of models in the output.
y 6. You can produce a plot to help identify models that satisfy criteria for the Cp statistic.

Page 92: Regression

Model Building and Interpretation : Viewing Mallows' Cp Statistic

92

Question 6.

Match the items on the left and right.

1. Preferred to R-square for evaluating multiple linear regression models (takes into account the number of terms in the model) → c) adjusted R-square

2. Useful for parameter estimation → b) Hocking's criterion for Cp

3. Useful for prediction → a) Mallows' criterion for Cp

Page 93: Regression

Model Building and Interpretation : Using Automatic Model Selection

93

Page 94: Regression

Model Building and Interpretation : Using Automatic Model Selection

94

Page 95: Regression

Model Building and Interpretation : Estimating and Testing Coefficients for Selected Models – Prediction model

95

Page 96: Regression

Model Building and Interpretation : Estimating and Testing Coefficients for Selected Models –Explanatory Model

96

Page 97: Regression

Model Building and Interpretation : Estimating and Testing Coefficients for Selected Models

97

Page 98: Regression

Model Building and Interpretation : The Stepwise Selection Approach to Model Building

98

Stepwise selection methods

Forward

Backward

Stepwise

Page 99: Regression

Model Building and Interpretation : The Stepwise Selection Approach to Model Building

99

Forward

The forward selection method starts with no variables, then repeatedly adds the most significant variable until no remaining variable is significant. A variable that has been added is not removed, even if it becomes insignificant later.

Page 100: Regression

Model Building and Interpretation : The Stepwise Selection Approach to Model Building

100

Backward

The backward selection method starts with all variables in the model, then repeatedly removes the least significant variable until all remaining variables are significant. Once a variable is removed, it cannot re-enter.

Page 101: Regression

Model Building and Interpretation : The Stepwise Selection Approach to Model Building

101

Stepwise

Stepwise selection combines the ideas of forward and backward selection. It starts with no variables and adds the most significant variable, as in forward selection; however, like backward selection, it can also drop insignificant variables, one at a time. The method stops when all terms in the model are significant and all terms not in the model are insignificant.
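The forward step shared by these methods can be sketched as a loop. The variable names and p-values below are hypothetical, and a real implementation would refit the model and recompute each p-value at every step:

```python
def forward_select(candidates, pvalue, alpha=0.05):
    """Forward selection sketch: repeatedly add the remaining variable with the
    smallest p-value, stopping when none is below alpha. Added variables stay in."""
    model, remaining = [], list(candidates)
    while remaining:
        best = min(remaining, key=lambda v: pvalue(model, v))
        if pvalue(model, best) >= alpha:
            break  # no remaining variable meets the entry significance level
        model.append(best)
        remaining.remove(best)
    return model

# Hypothetical variables with fixed p-values, purely for illustration:
pvals = {"RunTime": 0.001, "Age": 0.03, "Weight": 0.40}
print(forward_select(pvals, lambda model, v: pvals[v]))
```

Backward and stepwise selection differ only in where they start and in whether variables already in the model can be dropped again.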

Page 102: Regression

Model Building and Interpretation : The Stepwise Selection Approach to Model Building

102

Application Stepwise selection methods

Forward

Backward

Stepwise

Identify candidate models

Use expertise to choose

Page 103: Regression

Model Building and Interpretation : Performing Stepwise Regression: Forward selection

103

Page 104: Regression

Model Building and Interpretation : Performing Stepwise Regression: Forward selection

104

Page 105: Regression

Model Building and Interpretation : Performing Stepwise Regression: Backward selection

105

Page 106: Regression

Model Building and Interpretation : Performing Stepwise Regression: Backward selection

106

Page 107: Regression

Model Building and Interpretation : Performing Stepwise Regression: Stepwise selection

107

Page 108: Regression

Model Building and Interpretation : Performing Stepwise Regression: Stepwise selection

108

Page 109: Regression

Model Building and Interpretation : Using Alternative Significance Criteria for Stepwise Models

109

Stepwise regression models with default significance levels

Using 0.05 significance levels

Page 110: Regression

Model Building and Interpretation : Comparison of Selection Methods

110

Stepwise selection methods: use fewer computer resources.

All-possible regressions: generate more candidate models, which might have nearly equal R² and Cp statistics.

Page 111: Regression

Agenda• 0. Lesson overview• 1. Exploratory Data Analysis• 2. Simple Linear Regression• 3. Multiple Regression• 4. Model Building and Interpretation

• 5. Summary

111

Page 112: Regression

Home Work: Exercise 1
1.1 Describing the Relationship between Continuous Variables

Percentage of body fat, age, weight, height, and 10 body circumference measurements (for example, abdomen) were recorded for 252 men. The data are stored in the BodyFat2 data set. Body fat, one measure of health, was accurately estimated by an underwater weighing technique. There are two measures of percentage body fat in this data set.

Case: case number
PctBodyFat1: percent body fat using Brozek's equation, 457/Density − 414.2
PctBodyFat2: percent body fat using Siri's equation, 495/Density − 450
Density: density (gm/cm³)
Age: age (yrs)
Weight: weight (lbs)
Height: height (inches)

112

Page 113: Regression

Home Work: Exercise 1

113

Page 114: Regression

Home Work: Exercise 1
1.1 Describing the Relationship between Continuous Variables

a. Generate scatter plots and correlations for the variables Age, Weight, Height, and the circumference measures versus the variable PctBodyFat2.

Important! The Correlation task limits you to 10 variables at a time for scatter plot matrices, so for this exercise, look at the relationships with Age, Weight, and Height separately from the circumference variables (Neck, Chest, abdomen, Hip, thigh, Knee, Ankle, Biceps, Forearm, and Wrist)

Note: Correlation tables can be created using more than 10 VAR variables at a time.

b. What variable has the highest correlation with PctBodyFat2?
c. What is the value of the coefficient?
d. Is the correlation statistically significant at the 0.05 level?
e. Can straight lines adequately describe the relationships?
f. Are there any outliers that you should investigate?
g. Generate correlations among the variables Age, Weight, and Height, and among the circumference measures. Are there any notable relationships?

114

Page 115: Regression

Home Work: Exercise 2
2.1 Fitting a Simple Linear Regression Model

Use the BodyFat2 data set for this exercise:

a. Perform a simple linear regression model with PctBodyFat2 as the response variable and Weight as the predictor.

b. What is the value of the F statistic and the associated p-value? How would you interpret this with regard to the null hypothesis?

c. Write the predicted regression equation.
d. What is the value of the R² statistic? How would you interpret this?
e. Produce predicted values for PctBodyFat2 when Weight is 125, 150, 175, 200, and 225 (see the SAS code in the comments section below).
f. What are the predicted values?
g. What is the predicted value of PctBodyFat2 when Weight is 150?

115

Page 116: Regression

Home Work: Exercise 3
3.1 Performing a Multiple Regression

a. Using the BodyFat2 data set, run a regression of PctBodyFat2 on the variables Age, Weight, Height, Neck, Chest, Abdomen, Hip, Thigh, Knee, Ankle, Biceps, Forearm, and Wrist. Compare the ANOVA table with that from the model with only Weight in the previous exercise. What is different?

b. How do the R2 and the adjusted R2 compare with these statistics for the Weight regression demonstration?

c. Did the estimate for the intercept change? Did the estimate for the coefficient of Weight change?

116

Page 117: Regression

Home Work: Exercise 3
3.2 Simplifying the Model

a. Rerun the model in the previous exercise, but eliminate the variable with the highest p-value. Compare the result with the previous model.

b. Did the p-value for the model change notably?
c. Did the R² and adjusted R² change notably?
d. Did the parameter estimates and their p-values change notably?

3.3 More simplifying of the model

e. Rerun the model in the previous exercise, but eliminate the variable with the highest p-value.

f. How did the output change from the previous model?
g. Did the number of parameters with a p-value less than 0.05 change?

117

Page 118: Regression

Home Work: Exercise 4
4.1 Using Model Building Techniques

Use the BodyFat2 data set to identify a set of “best” models.

a. Using the Mallows' Cp option, use an all-possible regression technique to identify a set of candidate models that predict PctBodyFat2 as a function of the variables Age, Weight, Height, Neck, Chest, abdomen, Hip, thigh, Knee, Ankle, Biceps, Forearm, and Wrist .

Hint: select the best 60 models based on Cp to compare

b. Use a stepwise regression method to select a candidate model. Try Forward selection, Backward selection, and Stepwise selection.

c. How many variables would result from a model using forward selection and a significance level for entry of 0.05, instead of the default of 0.50?

118

Page 119: Regression

Thank you!