core.ecu.educore.ecu.edu/.../MV/LSANOVA/ANOVA_Dummy-Coded_PDS.docx · Web viewCreate a Word document in which you write your summary statement (see my example below) and then you

PDS: Independent Samples ANOVA via Multiple Regression

Your assignment is to recode your PDS data so that the two dichotomous variables are converted into a 4-level classification variable. You will then dummy code the groups and use a multiple regression analysis to conduct a one-way ANOVA comparing the four groups on one of your continuous variables.

Create a Word document in which you write your summary statement (see my example below) and then you paste in the SAS, SPSS, or R output. Then go to the Discussion Board in BlackBoard, PDS_ANOVA_Dummy forum. Create a new thread using your last name as the subject. Paste in your summary statement and attach your Word document.

Here is how to do this, using data kindly provided by Grace Williams. Our dichotomous variables are whether or not the subject is male and whether or not the subject is a college student. For both, 0 = no and 1 = yes. The outcome variable is how much time the subject takes to get ready in the morning. For me, this includes making breakfast, eating breakfast, pinching a loaf, brushing my teeth, cleaning my eyeglasses, checking my email, and getting dressed. In the Winter, this may also include scraping the damn ice off of the windshield of my Subaru. My prep time can be as little as 30 minutes or as much as 60 minutes.

SAS

Proc Format; value sex 0='Female' 1='Male'; value grp 1='Fem Not' 2='Fem Stud' 3='Male Not' 4 = 'Male Stud';Data Grace;Input Male CollegeStud MorningPrepMin SleepHours;*Create the variable Group;If Male = 0 AND CollegeStud = 0 then Group = 1; Else If Male = 0 AND CollegeStud = 1 then Group = 2; Else If Male = 1 AND CollegeStud = 0 then Group = 3; Else If Male = 1 AND CollegeStud = 1 then Group = 4;*Create the Dummy Variables;D1 = 0; D2 = 0; D3=0;If Group = 1 then D1 = 1; If Group = 2 then D2 = 1; If Group = 3 then D3 = 1;cards;0 0 20 70 0 60 61 0 60 61 0 60 70 1 60 7.51 0 30 80 0 180 50 1 60 6.50 1 20 60 0 30 60 0 60 70 0 60 6.51 1 10 71 1 20 60 1 60 9.50 0 15 9.50 1 45 7.5

http://core.ecu.edu/psyc/wuenschk/SPSS/MorningPrepTime.sav

1 0 60 61 0 30 7.50 0 90 61 1 30 71 0 30 61 0 20 81 0 30 5.50 1 90 5.51 1 30 71 1 25 61 1 60 61 0 10 6.51 0 45 60 1 90 60 0 120 7.50 0 20 70 1 20 5.50 0 45 71 0 20 6.51 0 60 70 1 30 61 0 20 60 0 60 50 0 45 4.50 0 45 60 0 20 6.50 0 30 5.51 1 10 7.50 0 20 81 0 30 50 0 60 91 0 10 61 1 15 4.5;Proc Sort; By Group; run;Proc Means; Var MorningPrepMin; By Group; run;Proc Reg; Model MorningPrepMin = D1 D2 D3; run; quit;*Normally the data would be in an external file and brought in with INFILE. Here I put the data in the program so you could see them.

The SAS OutputGroup=Fem Not Stdnt

Analysis Variable : MorningPrepMin

N Mean Std Dev Minimum Maximum

18

54.4444444 41.4760345

15.0000000

180.0000000

Group=Fem Stdnt



9 52.7777778

26.5884269

20.0000000 90.0000000

Group=Male Not Stdnt



15 34.3333333

18.2117178

10.0000000

60.0000000

Group=Male Stdnt



8 25.0000000

16.2568667

10.0000000 60.0000000

The REG Procedure

Dependent Variable: MorningPrepMin

Number of Observations Read

50

Number of Observations Used

50

Analysis of Variance

Source DF Sum ofSquares

MeanSquare

F Value Pr > F

Model 3 6928.66667

2309.55556

2.57 0.0659

Error 46 41393 899.85507

Analysis of Variance

Source DF Sum ofSquares

MeanSquare

F Value Pr > F

Corrected Total

49 48322

The omnibus ANOVA is not quite significant, but the planned comparisons (comparing the reference group with each of the other groups) can still be interpreted. Women who are not students take more time to get ready in the morning than do men who are students. We still need to get a 90% confidence interval for the R2. Instructions for doing this can be found here: Confidence Intervals for R and R 2 .

Root MSE 29.99758 R-Square 0.1434

Dependent Mean 43.40000 Adj R-Sq 0.0875

Coeff Var 69.11886

Parameter EstimatesVariable DF Parameter

EstimateStandard

Errort Value Pr > |

t|Squared

Semi-partialCorr Type II

Intercept 1 25.00000 10.60575 2.36 0.0227 .

D1 1 29.44444 12.74652 2.31 0.0254 0.09937

D2 1 27.77778 14.57621 1.91 0.0630 0.06763

D3 1 9.33333 13.13287 0.71 0.4809 0.00941

The contrast between Group 1 and Group 4 is significant.

Missing DataIf you have missing data on the classification variables, it is possible that you will have cases

that belong in none of the groups but nevertheless have data on the three dummy variables. Such cases should be excluded from the analysis. If your sample is small, this can be done manually. If your sample is large, use syntax

Data Cull; Set Original; If Group GE 0 ;

SPSSRecode the two dichotomous variables into a different variable, Group. Start with cases IF

Male = 0.

http://core.ecu.edu/psyc/wuenschk/docs30/CI-R2.docx

http://core.ecu.edu/psyc/wuenschk/docs30/CI-R2.docx

Recode again, but this time IF MALE = 1.

Values 3 and 4 for Group.

DO IF (Male = 0).RECODE CollegeStud (0=1) (1=2) INTO Group.END IF.EXECUTE.

DO IF (Male = 1).RECODE CollegeStud (0=3) (1=4) INTO Group.END IF.EXECUTE.

Another way to create the group variable

Dichotomous variables X1 and X2 are coded 0,1. Recode X2 to 0,2. Then compute Group =

X1 + X2.

SAS. If X2 = 1 then X2 = 2; Group = X1 + X2;

SPSS. RECODE X2 (1=2). EXECUTE. COMPUTE Group=X1 + X2. EXECUTE.

X1 = 0, X2 = 0, Group = 0.

X1 = 1, X2 = 0, Group = 1.

X1 = 0, X2 = 2, Group = 2.

X1 = 1, X2 = 2, Group = 3.

Creating the Dummy Variables: This syntax will do the trick. Yes, you can create the syntax by

pointing and clicking, but that is rather tedious.

COMPUTE D1=0.EXECUTE.COMPUTE D2=0.EXECUTE.COMPUTE D3=0.EXECUTE.IF (Group=1) D1=1.EXECUTE.IF (Group=2) D2=1.EXECUTE.IF (Group=3) D3=1.EXECUTE.

Predict Morning Prep Time From The Dummy Variables

Variables Entered/Removeda

ModelVariables Entered

Variables Removed Method

1 D2, D3, D1b . Enter

a. Dependent Variable: MorningPrepMinb. All requested variables entered.

Model Summary

Model R R SquareAdjusted R

SquareStd. Error of the Estimate

1 .379a .143 .088 29.998

a. Predictors: (Constant), D2, D3, D1

ANOVAa

ModelSum of

Squares df Mean Square F Sig.

1 Regression 6928.667 3 2309.556 2.567 .066b

Residual 41393.333 46 899.855

Total 48322.000 49

a. Dependent Variable: MorningPrepMinb. Predictors: (Constant), D2, D3, D1

Obtaining the Confidence Interval for R2

The CI runs from 0 to .259

Coefficientsa

Model

Unstandardized Coefficients

Standardized Coefficients

t Sig.Correlations

B Beta Zero-order Partial Part

1 (Constant) 25.000 2.357 .023

D1 29.444 .455 2.310 .025 .266 .322 .315

D2 27.778 .343 1.906 .063 .141 .271 .260

D3 9.333 .138 .711 .481 -.191 .104 .097

a. Dependent Variable: MorningPrepMin

Creating a new grouping variable from existing dichotomous variables can also be done via

concatenation.

Missing DataIf you have missing data on the classification variables, it is possible that you will have cases

that belong in none of the groups but nevertheless have data on the three dummy variables. Such cases should be excluded from the analysis. Look at Case 22 in the screenshot below. The Ethnicity datum is missing, but that case still has data on the three dummy variables. To eliminate such cases, use Data, Select Cases.USE ALL.COMPUTE filter_$=(Group > 0).VARIABLE LABELS filter_$ 'Group > 0 (FILTER)'.VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.FORMATS filter_$ (f1.0).FILTER BY filter_$.EXECUTE.

http://core.ecu.edu/psyc/wuenschk/SAS/Concatenate.docx

Presenting the ResultsA short survey was administered to a convenience sample of 50 of my friends. Subjects were

asked whether they were female or male, student or nonstudent, and how many minutes it typically took them to get ready in the morning. Four groups were defined: Female nonstudent, female student, male nonstudent, male student. The male student group was designated as the reference group for planned comparisons with the other groups.

The descriptive statistics are displayed in Table 1. Although the multiple regression fell short of significance, F(3, 46) = 2.57, p = .066, R2 = .143. 90% CI [0, .259], the comparison between female

nonstudents and male students was significant, p = .025. Female nonstudents took significantly more time to get ready in the morning than did male students.

Table 1Time Taken to Get Ready in the Morning (Minutes)

Group M SD N g1 g2 sr p

Female Nonstudent

54.4 41.5 18 1.90 4.24 .315 .025

Female Student 52.8 26.6 9 0.18 -1.13 .260 .063

Male Nonstudent 34.3 18.2 8 0.42 -1.19 .097 .481

Male Student 25.0 16.3 15 1.56 3.03

Note. Values of sr and p are for comparing each of first three groups with the male student group.

I am not requiring you to report g1 and g2 (skewness and kurtosis), but it would be good practice to obtain these estimates and, if they indicate problems, report them and take corrective action. In SPSS one would split the file by group and then obtain the desired statistics. Here I show only the output for the skewness and kurtosis statistics.

StatisticsMorningPrepMin1 N Valid 18

Missing 0

Skewness 1.902

Kurtosis 4.244

2 N Valid 9

Missing 0

Skewness .184

Kurtosis -1.126

3 N Valid 15

Missing 0

Skewness .420

Kurtosis -1.193

4 N Valid 8

Missing 0

Skewness 1.563

Kurtosis 3.021

Notes on the AnalysisThe ratio of the largest group variance to the smallest group variance, (41.5/16.3)2, is 6.5. This

is pretty good evidence that the homogeneity of variance assumption has been violated. Also note that the highest score among the female nonstudents was a whopping 180 minutes. I am thinking this is an outlier that needs to be investigated, and if it is a valid score then the distribution within that group is likely positively skewed, violating the normality assumption. I checked the skewness and kurtosis within each group and added those to the table above. There is unacceptable positive skewness in the female nonstudent and the male students groups. Any transformation that would correct the skewness in those groups would make the scores in the other group unacceptably negatively skewed. The high values of kurtosis in the troublesome groups indicates the likely presence of outliers. To analyze these data properly, we would need to resort to nonparametric or resampling analysis.

Regression diagnostics showed that the female nonstudent case with a score of 180 had a standardized residual of 4.3. In that same group there was also a case with a score of 120 and a standardized residual of 2.3. The outlier in the male student group had a standardized residual of 1.2.

The schematic plot to the right illustrates the problems with these data.

http://core.ecu.edu/psyc/wuenschk/StatHelp/Resampling.htm

http://core.ecu.edu/psyc/wuenschk/docs30/nonparm.docx

Documents

core.ecu.educore.ecu.edu/.../MV/LSANOVA/ANOVA_Dummy-Coded_PDS.docx · Web viewCreate a Word document in which you write your summary statement (see my example below) and then you