Upload
nguyenliem
View
216
Download
2
Embed Size (px)
Citation preview
PDS: Independent Samples ANOVA via Multiple Regression
Your assignment is to recode your PDS data so that the two dichotomous variables are converted into a 4-level classification variable. You will then dummy code the groups and use a multiple regression analysis to conduct a one-way ANOVA comparing the four groups on one of your continuous variables.
Create a Word document in which you write your summary statement (see my example below) and then you paste in the SAS, SPSS, or R output. Then go to the Discussion Board in BlackBoard, PDS_ANOVA_Dummy forum. Create a new thread using your last name as the subject. Paste in your summary statement and attach your Word document.
Here is how to do this, using data kindly provided by Grace Williams. Our dichotomous variables are whether or not the subject is male and whether or not the subject is a college student. For both, 0 = no and 1 = yes. The outcome variable is how much time the subject takes to get ready in the morning. For me, this includes making breakfast, eating breakfast, pinching a loaf, brushing my teeth, cleaning my eyeglasses, checking my email, and getting dressed. In the Winter, this may also include scraping the damn ice off of the windshield of my Subaru. My prep time can be as little as 30 minutes or as much as 60 minutes.
SAS
Proc Format; value sex 0='Female' 1='Male'; value grp 1='Fem Not' 2='Fem Stud' 3='Male Not' 4 = 'Male Stud';Data Grace;Input Male CollegeStud MorningPrepMin SleepHours;*Create the variable Group;If Male = 0 AND CollegeStud = 0 then Group = 1; Else If Male = 0 AND CollegeStud = 1 then Group = 2; Else If Male = 1 AND CollegeStud = 0 then Group = 3; Else If Male = 1 AND CollegeStud = 1 then Group = 4;*Create the Dummy Variables;D1 = 0; D2 = 0; D3=0;If Group = 1 then D1 = 1; If Group = 2 then D2 = 1; If Group = 3 then D3 = 1;cards;0 0 20 70 0 60 61 0 60 61 0 60 70 1 60 7.51 0 30 80 0 180 50 1 60 6.50 1 20 60 0 30 60 0 60 70 0 60 6.51 1 10 71 1 20 60 1 60 9.50 0 15 9.50 1 45 7.5
1 0 60 61 0 30 7.50 0 90 61 1 30 71 0 30 61 0 20 81 0 30 5.50 1 90 5.51 1 30 71 1 25 61 1 60 61 0 10 6.51 0 45 60 1 90 60 0 120 7.50 0 20 70 1 20 5.50 0 45 71 0 20 6.51 0 60 70 1 30 61 0 20 60 0 60 50 0 45 4.50 0 45 60 0 20 6.50 0 30 5.51 1 10 7.50 0 20 81 0 30 50 0 60 91 0 10 61 1 15 4.5;Proc Sort; By Group; run;Proc Means; Var MorningPrepMin; By Group; run;Proc Reg; Model MorningPrepMin = D1 D2 D3; run; quit;*Normally the data would be in an external file and brought in with INFILE. Here I put the data in the program so you could see them.
The SAS OutputGroup=Fem Not Stdnt
Analysis Variable : MorningPrepMin
N Mean Std Dev Minimum Maximum
18
54.4444444 41.4760345
15.0000000
180.0000000
Group=Fem Stdnt
Analysis Variable : MorningPrepMin
N Mean Std Dev Minimum Maximum
9 52.7777778
26.5884269
20.0000000 90.0000000
Group=Male Not Stdnt
Analysis Variable : MorningPrepMin
N Mean Std Dev Minimum Maximum
15 34.3333333
18.2117178
10.0000000
60.0000000
Group=Male Stdnt
Analysis Variable : MorningPrepMin
N Mean Std Dev Minimum Maximum
8 25.0000000
16.2568667
10.0000000 60.0000000
The REG Procedure
Dependent Variable: MorningPrepMin
Number of Observations Read
50
Number of Observations Used
50
Analysis of Variance
Source DF Sum ofSquares
MeanSquare
F Value Pr > F
Model 3 6928.66667
2309.55556
2.57 0.0659
Error 46 41393 899.85507
Analysis of Variance
Source DF Sum ofSquares
MeanSquare
F Value Pr > F
Corrected Total
49 48322
The omnibus ANOVA is not quite significant, but the planned comparisons (comparing the reference group with each of the other groups) can still be interpreted. Women who are not students take more time to get ready in the morning than do men who are students. We still need to get a 90% confidence interval for the R2. Instructions for doing this can be found here: Confidence Intervals for R and R 2 .
Root MSE 29.99758 R-Square 0.1434
Dependent Mean 43.40000 Adj R-Sq 0.0875
Coeff Var 69.11886
Parameter EstimatesVariable DF Parameter
EstimateStandard
Errort Value Pr > |
t|Squared
Semi-partialCorr Type II
Intercept 1 25.00000 10.60575 2.36 0.0227 .
D1 1 29.44444 12.74652 2.31 0.0254 0.09937
D2 1 27.77778 14.57621 1.91 0.0630 0.06763
D3 1 9.33333 13.13287 0.71 0.4809 0.00941
The contrast between Group 1 and Group 4 is significant.
Missing DataIf you have missing data on the classification variables, it is possible that you will have cases
that belong in none of the groups but nevertheless have data on the three dummy variables. Such cases should be excluded from the analysis. If your sample is small, this can be done manually. If your sample is large, use syntax
Data Cull; Set Original; If Group GE 0 ;
SPSSRecode the two dichotomous variables into a different variable, Group. Start with cases IF
Male = 0.
Recode again, but this time IF MALE = 1.
Values 3 and 4 for Group.
DO IF (Male = 0).RECODE CollegeStud (0=1) (1=2) INTO Group.END IF.EXECUTE.
DO IF (Male = 1).RECODE CollegeStud (0=3) (1=4) INTO Group.END IF.EXECUTE.
Another way to create the group variable
Dichotomous variables X1 and X2 are coded 0,1. Recode X2 to 0,2. Then compute Group =
X1 + X2.
SAS. If X2 = 1 then X2 = 2; Group = X1 + X2;
SPSS. RECODE X2 (1=2). EXECUTE. COMPUTE Group=X1 + X2. EXECUTE.
X1 = 0, X2 = 0, Group = 0.
X1 = 1, X2 = 0, Group = 1.
X1 = 0, X2 = 2, Group = 2.
X1 = 1, X2 = 2, Group = 3.
Creating the Dummy Variables: This syntax will do the trick. Yes, you can create the syntax by
pointing and clicking, but that is rather tedious.
COMPUTE D1=0.EXECUTE.COMPUTE D2=0.EXECUTE.COMPUTE D3=0.EXECUTE.IF (Group=1) D1=1.EXECUTE.IF (Group=2) D2=1.EXECUTE.IF (Group=3) D3=1.EXECUTE.
Predict Morning Prep Time From The Dummy Variables
Variables Entered/Removeda
ModelVariables Entered
Variables Removed Method
1 D2, D3, D1b . Enter
a. Dependent Variable: MorningPrepMinb. All requested variables entered.
Model Summary
Model R R SquareAdjusted R
SquareStd. Error of the Estimate
1 .379a .143 .088 29.998
a. Predictors: (Constant), D2, D3, D1
ANOVAa
ModelSum of
Squares df Mean Square F Sig.
1 Regression 6928.667 3 2309.556 2.567 .066b
Residual 41393.333 46 899.855
Total 48322.000 49
a. Dependent Variable: MorningPrepMinb. Predictors: (Constant), D2, D3, D1
Obtaining the Confidence Interval for R2
The CI runs from 0 to .259
Coefficientsa
Model
Unstandardized Coefficients
Standardized Coefficients
t Sig.Correlations
B Beta Zero-order Partial Part
1 (Constant) 25.000 2.357 .023
D1 29.444 .455 2.310 .025 .266 .322 .315
D2 27.778 .343 1.906 .063 .141 .271 .260
D3 9.333 .138 .711 .481 -.191 .104 .097
a. Dependent Variable: MorningPrepMin
Creating a new grouping variable from existing dichotomous variables can also be done via
concatenation.
Missing DataIf you have missing data on the classification variables, it is possible that you will have cases
that belong in none of the groups but nevertheless have data on the three dummy variables. Such cases should be excluded from the analysis. Look at Case 22 in the screenshot below. The Ethnicity datum is missing, but that case still has data on the three dummy variables. To eliminate such cases, use Data, Select Cases.USE ALL.COMPUTE filter_$=(Group > 0).VARIABLE LABELS filter_$ 'Group > 0 (FILTER)'.VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.FORMATS filter_$ (f1.0).FILTER BY filter_$.EXECUTE.
Presenting the ResultsA short survey was administered to a convenience sample of 50 of my friends. Subjects were
asked whether they were female or male, student or nonstudent, and how many minutes it typically took them to get ready in the morning. Four groups were defined: Female nonstudent, female student, male nonstudent, male student. The male student group was designated as the reference group for planned comparisons with the other groups.
The descriptive statistics are displayed in Table 1. Although the multiple regression fell short of significance, F(3, 46) = 2.57, p = .066, R2 = .143. 90% CI [0, .259], the comparison between female
nonstudents and male students was significant, p = .025. Female nonstudents took significantly more time to get ready in the morning than did male students.
Table 1Time Taken to Get Ready in the Morning (Minutes)
Group M SD N g1 g2 sr p
Female Nonstudent
54.4 41.5 18 1.90 4.24 .315 .025
Female Student 52.8 26.6 9 0.18 -1.13 .260 .063
Male Nonstudent 34.3 18.2 8 0.42 -1.19 .097 .481
Male Student 25.0 16.3 15 1.56 3.03
Note. Values of sr and p are for comparing each of first three groups with the male student group.
I am not requiring you to report g1 and g2 (skewness and kurtosis), but it would be good practice to obtain these estimates and, if they indicate problems, report them and take corrective action. In SPSS one would split the file by group and then obtain the desired statistics. Here I show only the output for the skewness and kurtosis statistics.
StatisticsMorningPrepMin1 N Valid 18
Missing 0
Skewness 1.902
Kurtosis 4.244
2 N Valid 9
Missing 0
Skewness .184
Kurtosis -1.126
3 N Valid 15
Missing 0
Skewness .420
Kurtosis -1.193
4 N Valid 8
Missing 0
Skewness 1.563
Kurtosis 3.021
Notes on the AnalysisThe ratio of the largest group variance to the smallest group variance, (41.5/16.3)2, is 6.5. This
is pretty good evidence that the homogeneity of variance assumption has been violated. Also note that the highest score among the female nonstudents was a whopping 180 minutes. I am thinking this is an outlier that needs to be investigated, and if it is a valid score then the distribution within that group is likely positively skewed, violating the normality assumption. I checked the skewness and kurtosis within each group and added those to the table above. There is unacceptable positive skewness in the female nonstudent and the male students groups. Any transformation that would correct the skewness in those groups would make the scores in the other group unacceptably negatively skewed. The high values of kurtosis in the troublesome groups indicates the likely presence of outliers. To analyze these data properly, we would need to resort to nonparametric or resampling analysis.
Regression diagnostics showed that the female nonstudent case with a score of 180 had a standardized residual of 4.3. In that same group there was also a case with a score of 120 and a standardized residual of 2.3. The outlier in the male student group had a standardized residual of 1.2.
The schematic plot to the right illustrates the problems with these data.