11/4/2015Slide 1 SOLVING THE PROBLEM Simple linear regression is an appropriate model of the relationship between two quantitative variables provided the

04/20/23 Slide 1

SOLVING THE PROBLEM

Simple linear regression is an appropriate model of the relationship between two quantitative variables provided the data satisfies the assumption of linearity in a scatterplot of the raw data, provided the spread of the residuals is equal for all of the predicted values in the residual plot, and provided there are no outliers impacting the linear model. When the relationship we are analyzing does not meet these criteria, the use of regression analysis can still be justified if re-expressing one or both variables reduces the non-linear pattern in the scatterplot, equalizes the variance in the residual plot, and reduces the distance of outliers from the other cases in the distributions.

Clues that re-expression might be effective in linearizing the relationship are severe skewing of one or both variables (skewness outside the range from -1.0 to +1.0) and a Spearman's rho that is greater than Pearson's r.

There is no guarantee that re-expression will produce a scatterplot that satisfies the assumptions of linear regression. When it does not, we are left with the choice of determining that the violations are not of serious consequence, or choosing an alternative strategy for modeling the relationship.

04/20/23 Slide 2

To solve these problems, we will first assess the conformity of the relationship to regression assumptions. Second, we will examine the criteria that suggest that re-expression might be effective. Third, we will examine the model using re-expressed variables to assess conformity to regression assumptions.

Finally, if the model using raw data supports the regression assumption, we will interpret the direction and strength of the relationship. If it was necessary to re-express one or both variables, we will interpret the strength of the re-expressed model, providing it satisfies the regression assumptions.

If the model still violates the conditions for a linear model, we will not interpret the direction and strength of the relationship. This is the convention in our homework problems; in a real application, I would consider interpreting the relationship, attaching a caution which identifies the violation of the assumptions.

In these problems, outliers are defined as cases that have a Cook’s distance greater than 0.5, and hence have a larger influence on the regression solution than other cases included in the analysis.
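To make the 0.5 cutoff concrete, here is a minimal sketch of how Cook's distance can be computed by hand for a simple linear regression. It uses the standard textbook formula (squared residual scaled by the mean squared error and the case's leverage); we assume, but have not verified against the output shown in these slides, that SPSS reports the same quantity. The function names and the six data points are made up for illustration and are not taken from world2007.sav.

```python
from statistics import mean


def cooks_distances(x, y):
    """Cook's distance for each case in a simple linear regression of y on x."""
    n = len(x)
    p = 2  # estimated coefficients: intercept and slope
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)

    distances = []
    for xi, e in zip(x, residuals):
        # Leverage of a case in simple regression depends only on its x value.
        leverage = 1 / n + (xi - x_bar) ** 2 / sxx
        distances.append((e ** 2 / (p * mse)) * leverage / (1 - leverage) ** 2)
    return distances


def influential_cases(x, y, cutoff=0.5):
    """Indices of cases whose Cook's distance exceeds the cutoff (0.5 in these problems)."""
    return [i for i, d in enumerate(cooks_distances(x, y)) if d > cutoff]


# Hypothetical data: five points on a rising line plus one distant point.
x = [1, 2, 3, 4, 5, 20]
y = [1, 2, 3, 4, 5, 0]
print(influential_cases(x, y))
```

In this made-up example only the last case, which is far from the others on x and well off their trend, is flagged.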

04/20/23 Slide 3

The introductory statement in the question indicates:
• The data set to use (world2007.sav)
• The task to accomplish (simple linear regression)
• The variables to use in the analysis: the independent variable population median age in years [agemdn] and the dependent variable infant mortality rate [infmort]

04/20/23 Slide 4

The second paragraph tells us how to re-express the variables, should it be necessary.

04/20/23 Slide 5

The first statement asks about the size of the sample. To answer this question, we run the linear regression in SPSS.

04/20/23 Slide 6

To compute a simple linear regression, select Regression > Linear from the Analyze menu.

04/20/23 Slide 7

First, move the dependent variable, infmort, to the Dependent text box.

Second, move the independent variable, agemdn, to the Independent(s) list box.

Third, click on the Statistics button to request basic descriptive statistics.

04/20/23 Slide 8

First, in addition to the defaults marked by SPSS, mark the check box for Descriptives so that we get the number of cases used in the analysis.

Second, click on the Continue button to close the dialog box.

04/20/23 Slide 9

Next, click on the Plots button to request the residual plot.

Though we do not need it to answer the immediate question, producing the residual plot now will save us time later in the problem.

04/20/23 Slide 10

First, move *ZRESID (for standardized residuals) to the Y axis text box.

Second, move *ZPRED (for standardized predictions) to the X axis text box.

Third, click on the Continue button to close the dialog box.

04/20/23 Slide 11

Next, click on the Save button to include Cook’s distance in the output.

04/20/23 Slide 12

First, mark the check box for Cook’s distance to include this value in the data view and the output.

Second, click on the Continue button to close the dialog box.

04/20/23 Slide 13

Click on the OK button to request the output.

04/20/23 Slide 14

In the table of Descriptive Statistics, we see that the number of cases that have valid data for both variables is 190.

04/20/23 Slide 15

The number of cases with valid data to analyze the relationship between "population median age in years" and "infant mortality rate" was 190, out of the total of 192 cases in the data set.

Mark the check box for a correct statement.

04/20/23 Slide 16

The next statement asks us to examine a scatterplot to evaluate the assumption of linearity.

04/20/23 Slide 17

To create the scatterplot, select Legacy Dialogs > Scatter/Dot from the Graphs menu.

04/20/23 Slide 18

In the Scatter/Dot dialog box, we click on Simple Scatter as the type of plot we want to create.

Click on the Define button to go to the next step.

04/20/23 Slide 19

First, move the dependent variable infmort to the Y axis text box.

Second, move the independent variable agemdn to the X axis text box.

Third, click on the OK button to produce the plot.

04/20/23 Slide 20

The scatterplot appears in the SPSS output window.

To facilitate our determination about the linearity of the plot, we will add a linear fit line, a loess fit line, and a confidence interval to the plot.

See slides 8 through 18 in the PowerPoint titled SimpleLinearRegression-Part2.ppt for directions on adding the fit lines and confidence interval to the plot.

04/20/23 Slide 21

The criterion we use for evaluating linearity is a comparison of the loess fit line to the linear fit line. If the loess fit line falls within a 99% confidence interval around the linear fit line, we characterize the relationship as linear. Minor fluctuations of the loess line across the boundaries of the confidence interval are ignored.

The pattern of points in the scatterplot shows an obvious curve indicating non-linearity.

The assumption of linearity is not satisfied.

04/20/23 Slide 22

The pattern of points in the scatterplot showed an obvious curve indicating non-linearity. The assumption of linearity is not satisfied.

The check box is not marked.

We can try re-expressing one or both variables to see if we can improve the linearity of the relationship sufficiently to justify the use of linear regression to analyze the relationship.

04/20/23 Slide 23

The next statement asks us to examine the residual plot for evidence that the assumptions of linearity or homogeneity of variance are violated.

We will not examine the residual plot when we have a clear violation of the assumption of linearity in the scatterplot.

We would leave this statement unchecked when we violate linearity because it is difficult to evaluate homogeneity of variance when the plot is non-linear.

However, we will examine the residual plot just for practice.

If we violate linearity, homogeneity of variance, or have influential cases, we jump ahead to the question comparing r and rho.

04/20/23 Slide 24

If we add a loess fit line to the residual plot, we see that the non-linearity in the scatterplot is supported by the residual plot.

Just for practice, we examine the residual plot for linearity and homogeneity of variance.

04/20/23 Slide 25

The violation of homogeneity of variance is also clearly visible in the residual plot. The spread of the residuals is narrow for low predicted values, but substantially larger for high predicted values.

04/20/23 Slide 26

The next statement asks us to examine the impact of influential cases on the analysis. Like outliers, we might consider elimination of influential cases to improve the fit for the majority of cases.

Since re-expression will alter the distance used to identify influential cases, we will not evaluate Cook's distances until we satisfy the assumptions of linearity and homogeneity.

While we would leave this statement unchecked when we violate linearity or homogeneity of variance, we will check for the presence of influential cases just for practice.

04/20/23 Slide 27

Summary information about Cook’s Distances is found in the table of Residual Statistics.

The maximum Cook’s distance is .150, less than the cutoff of .50 specified for the problems.

We do not have cases with very high values of Cook’s distance for the variables in this relationship.

04/20/23 Slide 28

The next statement asks us to compare Spearman’s rho to Pearson’s r to assess our expectations for the effectiveness of re-expressing the variables.

To compare rho and r, we compute the correlation coefficients in SPSS.

04/20/23 Slide 29

To compute correlations, select Correlate > Bivariate from the Analyze menu.

04/20/23 Slide 30

First, move the variables agemdn and infmort to the Variables list box.

Second, mark the check box for Spearman and leave the check box for Pearson marked.

Third, click on the OK button to produce the output.

04/20/23 Slide 31

Spearman's rho (-0.886) was larger in absolute value than Pearson's r (-0.732). The feasibility of re-expressing the data to correct for the violation of regression assumptions is supported.
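The comparison of rho and r can be sketched outside SPSS as follows. This is only an illustration under stated assumptions: Spearman's rho is computed as Pearson's r on the ranks (with tied values given their average rank), and the two coefficients are compared on absolute value, ignoring sign. The helper names and the small data set are hypothetical.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson's r for two equal-length lists of numbers."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of the 1-based positions start..end
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho computed as Pearson's r on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def reexpression_supported(x, y):
    """True when |rho| exceeds |r|, the clue used in these problems."""
    return abs(spearman_rho(x, y)) > abs(pearson_r(x, y))


# Hypothetical curved, steadily decreasing relationship.
x = [1, 2, 3, 4, 5, 6]
y = [32, 16, 8, 4, 2, 1]
print(round(pearson_r(x, y), 3), round(spearman_rho(x, y), 3))
print(reexpression_supported(x, y))
```

A relationship that is monotonic but curved, like the made-up one here, keeps a perfect rank ordering while the straight-line correlation drops, which is why a larger rho hints that re-expression may help.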

04/20/23 Slide 32

Spearman's rho (-0.886) was larger in absolute value than Pearson's r (-0.732). The feasibility of re-expressing the data to correct for the violation of regression assumptions is supported.

The check box for a correct answer is marked.

04/20/23 Slide 33

The next statement asks us which transformations we used to try to induce linearity in the scatterplot.

We should re-express variables that have skewness less than -1.0 or greater than +1.0.

04/20/23 Slide 34

We will use the Descriptives procedure to obtain skewness for both variables.

Select Descriptive Statistics > Descriptives from the Analyze menu.

04/20/23 Slide 35

First, move the variables agemdn and infmort to the Variable(s) list box.

Second, click on the Options button to specify our choice for statistics.

04/20/23 Slide 36

Next, mark the check boxes for Kurtosis and Skewness in addition to the defaults marked by SPSS.

Finally, click on the Continue button to close the dialog box.

04/20/23 Slide 37

Click on the OK button to produce the output.

04/20/23 Slide 38

The skewness for "infant mortality rate" [infmort] was 1.470. The skewness for "population median age in years" [agemdn] was 0.456.

Since the skew for the dependent variable "infant mortality rate" [infmort] (1.470) was equal to or greater than +1.0, we attempt to correct the violation of assumptions by re-expressing "infant mortality rate" on a logarithmic scale. Since the skew for the independent variable "population median age in years" [agemdn] (0.456) was greater than -1.0 and less than +1.0, we do not attempt to re-express it.
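The skewness rule and the logarithmic re-expression can be sketched as below. The skewness function uses the adjusted Fisher-Pearson sample formula, which we assume matches the statistic SPSS reports; treat that as an assumption rather than a documented fact. The function names are hypothetical, and the only values taken from the text are the two reported skewness figures.

```python
import math
from statistics import mean, stdev


def sample_skewness(values):
    """Adjusted Fisher-Pearson sample skewness (needs at least 3 values)."""
    n = len(values)
    m, s = mean(values), stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in values)


def needs_reexpression(skew, cutoff=1.0):
    """Rule from the text: re-express when skew is at or beyond -1.0 or +1.0."""
    return skew <= -cutoff or skew >= cutoff


def log10_reexpress(values):
    """Base-10 log of each value, mirroring LG10(); only defined for positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("log10 needs strictly positive values")
    return [math.log10(v) for v in values]


# Skewness values reported in the text.
print(needs_reexpression(1.470))  # infmort
print(needs_reexpression(0.456))  # agemdn
```

The guard in `log10_reexpress` is a design choice for the sketch: a rate of zero or below has no logarithm, so it is better to fail loudly than to produce missing values silently.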

04/20/23 Slide 39

Since the skew for the dependent variable "infant mortality rate" [infmort] (1.470) was equal to or greater than +1.0, we attempt to correct violation of assumptions by re-expressing "infant mortality rate" on a logarithmic scale.

We mark the check box for a correct statement.

04/20/23 Slide 40

The next statement asks whether or not the relationship with the re-expressed variable, LG_infmort, supports the assumption of linearity.

04/20/23 Slide 41

We first create the transformed variable, the logarithm of infmort.

Select the Compute Variable command from the Transform menu.

04/20/23 Slide 42

First, type the name for the re-expressed variable in the Target Variable text box.

The directions for the problem give us the formula for the transformation: Use the formula LG10(infmort) to create the log transformation of infant mortality rate [LG_infmort].

Second, type the formula in the Numeric Expression text box.

Third, click on the OK button to compute the transformation.

04/20/23 Slide 43

Next, we create the scatterplot for the relationship with the re-expressed variable.

To create the scatterplot, select Legacy Dialogs > Scatter/Dot from the Graphs menu.

04/20/23 Slide 44

In the Scatter/Dot dialog box, we click on Simple Scatter as the type of plot we want to create.

Click on the Define button to go to the next step.

04/20/23 Slide 45

First, move the dependent variable LG_infmort to the Y axis text box.

Second, move the independent variable agemdn to the X axis text box.

Third, click on the OK button to produce the plot.

04/20/23 Slide 46

The scatterplot looks linear, but to make sure we will add fit lines and a confidence interval.

04/20/23 Slide 47

The pattern of points in the scatterplot does not show an obvious curve indicating non-linearity. The assumption of linearity is satisfied.

See slides 8 through 18 in the PowerPoint titled SimpleLinearRegression-Part2.ppt for directions on adding the fit lines and confidence interval to the plot.

04/20/23 Slide 48

The pattern of points in the scatterplot does not show an obvious curve indicating non-linearity. The assumption of linearity is satisfied.

The check box for a correct answer is marked.

04/20/23 Slide 49

The next statement asks whether or not the residual plot supports the assumptions of linearity and equal variance.

04/20/23 Slide 50

We next do the regression analysis, creating the residual plot in the process.

To compute a simple linear regression, select Regression > Linear from the Analyze menu.

04/20/23 Slide 51

First, move the dependent variable, LG_infmort, to the Dependent text box.

Second, move the independent variable, agemdn, to the Independent(s) list box.

Third, click on the Statistics button to request basic descriptive statistics.

04/20/23 Slide 52

First, in addition to the defaults marked by SPSS, mark the check box for Descriptives so that we get the number of cases used in the analysis.

Second, click on the Continue button to close the dialog box.

04/20/23 Slide 53

Next, click on the Plots button to request the residual plot.

04/20/23 Slide 54

First, move *ZRESID (for standardized residuals) to the Y axis text box.

Second, move *ZPRED (for standardized predictions) to the X axis text box.

Third, click on the Continue button to close the dialog box.

04/20/23 Slide 55

Next, click on the Save button to include Cook’s distance in the output.

04/20/23 Slide 56

First, mark the check box for Cook’s distance to include this value in the data view and the output.

Second, click on the Continue button to close the dialog box.

04/20/23 Slide 57

Click on the OK button to request the output.

04/20/23 Slide 58

The pattern of points in the residual plot does not show an obvious curve indicating non-linearity. The assumption of linearity is satisfied, confirming the finding of linearity in the scatterplot for the data.

The pattern of points in the residual plot shows equal spread across the standardized predictions of the log transformation of "infant mortality rate" [LG_infmort]. The assumption of equal variance is satisfied.

04/20/23 Slide 59

The pattern of points in the residual plot does not show an obvious curve indicating non-linearity and shows equal spread across the standardized predictions.

The check box for a correct answer is marked.

04/20/23 Slide 60

The next statement asks about the presence of influential cases after the variable is re-expressed.

04/20/23 Slide 61

Summary information about Cook’s Distances is found in the table of Residual Statistics.

No cases had a Cook's distance of 0.5 or greater, so none qualify as influential cases.

04/20/23 Slide 62

Since there were no Cook’s distances greater than 0.5, we mark the check box as correct.

04/20/23 Slide 63

The next statement asks about the direction of the relationship between the variables.

04/20/23 Slide 64

The slope for the regression equation between "population median age in years" [agemdn] and the log transformation of "infant mortality rate" [LG_infmort] was -.05. The negative value for the slope means that scores for the two variables change in the opposite direction. Higher scores on the variable "population median age in years" were associated with lower scores on the log transformation of "infant mortality rate".
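Because the dependent variable is on a base-10 log scale, the slope can also be read as a multiplier on the original infant mortality rate. The arithmetic below is a rough sketch that assumes the rounded slope of -.05 reported above; with the unrounded coefficient from the SPSS output the figures would differ slightly.

```python
slope_log10 = -0.05  # rounded slope reported in the text

# Adding 1 to agemdn adds slope_log10 to log10(infmort), which multiplies
# infmort itself by 10 ** slope_log10.
factor_per_year = 10 ** slope_log10
percent_change = (factor_per_year - 1) * 100

print(f"multiplier per additional year of median age: {factor_per_year:.3f}")
print(f"approximate percent change: {percent_change:.1f}%")
```

Under that assumption, each additional year of median age goes with an infant mortality rate roughly 0.89 times as large, a decrease of about 11 percent.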

04/20/23 Slide 65

The negative slope implies an inverse relationship in which increases in one variable are associated with lower scores on the other variable. The statement that "countries who had a higher median age in years had a lower infant mortality rate" is correct.

We mark the statement as correct.

04/20/23 Slide 66

The next statement asks about the strength of the relationship based on Tukey’s criteria.

04/20/23 Slide 67

To answer the question about the strength of the relationship, we look to the Model Summary table.

The rule of thumb attributed to Tukey treats an R² between 0.0 and 0.04 as very weak; 0.04 to 0.16 as weak; 0.16 to 0.36 as moderate; 0.36 to 0.64 as strong; and greater than 0.64 as very strong. By this rule, the relationship between the log transformation of "infant mortality rate" [LG_infmort] and "population median age in years" [agemdn] was correctly characterized as a very strong relationship (R² = 78.2%).
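The Tukey rule of thumb can be written as a small lookup. This is a sketch, not part of the SPSS procedure; the function name is hypothetical. The text does not say which category a value exactly on a boundary belongs to, so the code assumes each upper bound is inclusive, which is consistent with "greater than 0.64" being very strong.

```python
def tukey_strength(r_squared):
    """Label an R-squared (as a proportion, 0 to 1) using the Tukey rule of thumb.

    Assumption: a value exactly on a boundary falls in the lower category.
    """
    if r_squared <= 0.04:
        return "very weak"
    if r_squared <= 0.16:
        return "weak"
    if r_squared <= 0.36:
        return "moderate"
    if r_squared <= 0.64:
        return "strong"
    return "very strong"


# R-squared for the re-expressed model, 78.2%, entered as a proportion.
print(tukey_strength(0.782))
```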

04/20/23 Slide 68

The relationship between the log transformation of "infant mortality rate" [LG_infmort] and "population median age in years" [agemdn] was correctly characterized as a very strong relationship (R² = 78.2%).

The check box for a correct answer is marked.

04/20/23 Slide 69

The final statement asks about the strength of the relationship based on Cohen’s criteria.

04/20/23 Slide 70

Applying Cohen's criteria for effect size (less than 0.01 = trivial; 0.01 up to 0.09 = weak or small; 0.09 up to 0.25 = moderate; 0.25 or greater = strong or large), the relationship between the log transformation of "infant mortality rate" [LG_infmort] and "population median age in years" [agemdn] was incorrectly characterized as a moderate relationship.

The relationship should have been characterized as a strong relationship (R² = 78.2%).
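Cohen's criteria as quoted above can be sketched the same way. The cut points come straight from the text ("up to" is read as excluding the upper bound, and "0.25 or greater" as including it); the function name is hypothetical.

```python
def cohen_strength(r_squared):
    """Label an R-squared (as a proportion, 0 to 1) using the Cohen criteria in the text."""
    if r_squared < 0.01:
        return "trivial"
    if r_squared < 0.09:
        return "weak"
    if r_squared < 0.25:
        return "moderate"
    return "strong"


# R-squared for the re-expressed model, 78.2%, entered as a proportion.
print(cohen_strength(0.782))
```

Note that the same R² can receive different labels under the two schemes, because Cohen's top category starts at 0.25 while the Tukey rule of thumb reserves its top category for values above 0.64.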

04/20/23 Slide 71

The relationship between the log transformation of "infant mortality rate" [LG_infmort] and "population median age in years" [agemdn] was incorrectly characterized as a moderate relationship.

The check box for a correct statement is not marked.

04/20/23 Slide 72

Because of the large number of steps needed to solve this problem, we will outline the process.

Dependent variable and independent variable both quantitative?
• Yes: Go on to the statement about sample size.
• No: Stop. Remaining statements are not marked.

There is no explicit question about level of measurement, but it should always be something we consider.

Sample size stated correctly?
• Yes: Mark statement check box.
• No: Do not mark check box.

04/20/23 Slide 73

Scatterplot supports linearity assumption?
• Yes: Mark statement check box.
• No: Do not mark check box. If we don’t satisfy linearity, we go to the comparison of r and rho.

Residual plot supports linearity/homogeneity?
• Yes: Mark statement check box.
• No: Do not mark check box. If we don’t satisfy homogeneity, we go to the comparison of r and rho.

04/20/23 Slide 74

Cook’s distance < 0.5 for all cases?
• Yes: Mark statement check box.
• No: Do not mark check box.

When we satisfy the linear conditions, we bypass the questions on re-expression.

Spearman’s rho > Pearson’s r?
• Yes: Mark statement check box.
• No: Do not mark check box.

We use the comparison of r and rho to frame our expectations of re-expressing variables. Note: we compare absolute values, ignoring the sign of both r and rho.

04/20/23 Slide 75

Re-express variables with skew ≤ -1.0 or ≥ 1.0

We re-express one or both variables, and re-test for a linear relationship.

Scatterplot supports linearity assumption?
• Yes: Mark statement check box.
• No: Do not mark check box.

Residual plot supports linearity/homogeneity?
• Yes: Mark statement check box.
• No: Do not mark check box.

04/20/23 Slide 76

Cook’s distance < 0.5 for all cases?
• Yes: Mark statement check box.
• No: Do not mark check box. Stop. Linear model is not appropriate.

If we support a linear model with either the raw data or the re-expressed data, we interpret direction and strength.

Direction of relationship (b) interpreted correctly?
• Yes: Mark statement check box.
• No: Do not mark check box.

This is the strategy for our homework problems. In reality, we might choose to interpret the relationship even though assumptions were violated.

04/20/23 Slide 77

Strength of relationship (R²) interpreted correctly?
• Yes: Mark statement check box.
• No: Do not mark check box.
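The outline above reduces to a simple decision: interpret direction and strength only for a model that meets all three conditions, trying the raw data first and the re-expressed data second. The sketch below encodes that homework convention; the function names and the boolean inputs are hypothetical stand-ins for the judgments we make from the SPSS plots and tables.

```python
def conditions_met(scatterplot_linear, residual_plot_ok, no_influential_cases):
    """True when linearity, homogeneity of variance, and Cook's distance all pass."""
    return scatterplot_linear and residual_plot_ok and no_influential_cases


def model_to_interpret(raw_ok, reexpressed_ok):
    """Which model, if any, we interpret under the homework convention."""
    if raw_ok:
        return "raw"
    if reexpressed_ok:
        return "re-expressed"
    return None  # Stop: linear model is not appropriate.


# Findings from the worked problem: the raw data failed linearity and
# homogeneity of variance; the model with LG_infmort passed all three checks.
raw_ok = conditions_met(False, False, True)
reexpressed_ok = conditions_met(True, True, True)
print(model_to_interpret(raw_ok, reexpressed_ok))
```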