
hseltman/files/Revised 402 HW 2 XIMRAD... · Web view1168.926 Gender*Age and Gender*Treatment 1169.902 Age*Memory and Gender*Treatment 1177.291 Age*Treatment and Gender*Treatment


Tias Sen 6/4/17

Data Analysis Report

Executive Summary

In this study, subjects were randomly assigned to one of three instruction types (standard, A, or B) and then played a game to earn money. In addition to treatment type, the independent variables gender, age, and pre-game working memory size were analyzed to determine which of them best predict how many dollars a subject earns in the game. Subjects who went bankrupt during the game were excluded from the data used for model selection. The final model drops the age and working memory variables because they contribute little to predicting earnings, and instead uses the interaction between a subject’s gender and treatment type. Under this model, women earned the most money, on average, when assigned to treatment A, while men earned the most, on average, when assigned to treatment B.

Introduction

For the dataset analyzed in this study, a psychologist studied how much money subjects earn in a game following exposure to one of three instruction approaches. The subjects were randomly assigned to the instruction approaches, and their gender, age, and pre-game working memory size were also recorded as independent variables. This report details how the dataset was analyzed to arrive at a model of the relationship between the independent variables and the dependent variable, the money the subjects earned.

Methods

Exploratory data analysis (EDA) was used to begin examining the relationships between the variables. First, the distributions of the individual variables were examined through univariate numerical and graphical EDA. The numerical EDA shows that the ages of the subjects ranged from 19 to 27 years. It also shows that the treatment levels (A, B, and standard) were recorded with both lowercase and uppercase labels (Figure 1), and that one subject had no treatment assignment recorded. Both features of the treatment variable are important to note for cleaning the data.

Figure 1

Treatment label   (blank)   a   A    b   B    std   Std
Count             1         1   74   3   81   3     79

Univariate graphical EDA revealed that the distribution of the subjects’ ages is skewed left, as seen in Figure 2.

Figure 2

Additionally, Figure 3 shows that the pre-game measurement of subjects’ working memory is approximately Normally distributed.

Figure 3

The data were recoded so that treatments labeled with lowercase letters were counted with their uppercase counterparts (for example, treatment a is the same as treatment A), and the record of the subject whose treatment assignment was not recorded was then removed. The resulting distribution in Figure 4 shows similar numbers of subjects assigned to each treatment.
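The cleaning step described above can be sketched in R roughly as follows. The data frame `toy` and its dollar values are invented for illustration; only the set of labels (a, A, b, B, std, Std, and one blank) comes from Figure 1, and the lookup-vector approach is one possible way to do the recoding, not necessarily the one used in the appendix.

```r
# Hypothetical toy data mimicking the mixed-case labels reported in Figure 1
toy <- data.frame(
  treatment = c("a", "A", "b", "B", "std", "Std", ""),
  dollars   = c(25, 30, 33, 28, 21, 19, 24),
  stringsAsFactors = FALSE
)

# Lookup vector mapping every spelling to one canonical label;
# a blank label matches no name and therefore becomes NA
canonical <- c(a = "A", A = "A", b = "B", B = "B", std = "Std", Std = "Std")
toy$treatment <- unname(canonical[toy$treatment])

# Drop the record with no treatment, without relying on a hard-coded row number
toy <- toy[!is.na(toy$treatment), ]

# Put the standard treatment first so it becomes the baseline level
toy$treatment <- factor(toy$treatment, levels = c("Std", "A", "B"))
print(table(toy$treatment))
```

Filtering on the missing label rather than on a row index keeps the step valid even if the rows of the file are reordered.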

Figure 4

Finally, for the purposes of this data analysis, subjects who went bankrupt during the game (those with $0 earned) were removed from the dataset. Figure 5 indicates that an outlier exists in the distribution of money earned by the remaining subjects.

Figure 5

For multivariate EDA, a pairs plot was used to look at the pairwise relationships among all of the variables. A pairs plot without the outlier was used to see the relationships more clearly, as seen in Figure 6. There are two key observations to note from Figure 6. First, the scatterplot of memory score against dollars earned shows only a weak positive trend, implying that memory score on its own might not be a strong predictor of dollars earned. Second, the boxplots of dollars earned by treatment suggest that subjects assigned to the standard treatment tended to earn less money than those assigned to treatment A or treatment B.

Figure 6

Following the EDA, an initial model including all of the variables, with no transformations or interactions, was built. Residual analyses showed no evidence that the regression error assumptions are violated (Figure 7); however, the outlier was included in the dataset used to fit the model, potentially distorting all of the model estimates.

Figure 7

To check this suspicion, the Cook’s distance of the outlier was compared with an F distribution with 5 and 178 degrees of freedom. The outlier fell at the 60th percentile of this F distribution; because this is above the commonly used 50th-percentile threshold, the outlier was treated as an influential point.
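A minimal sketch of this kind of influence check is shown below on simulated data. Everything in it (the data, the seed, the planted value) is invented for illustration. It also uses the common textbook convention of p and n - p degrees of freedom, where p counts all estimated coefficients including the intercept; that convention is an assumption here and may differ from the 5 and 178 used in the report.

```r
set.seed(402)  # arbitrary seed so the illustration is reproducible

# Simulated data with one planted wild value, loosely echoing the $311 record
n <- 60
sim <- data.frame(x = rnorm(n))
sim$y <- 20 + 3 * sim$x + rnorm(n, sd = 2)
sim$x[1] <- 4
sim$y[1] <- 311

fit <- lm(y ~ x, data = sim)
d <- cooks.distance(fit)

# Percentile of each Cook's distance within an F(p, n - p) distribution
p <- length(coef(fit))
pct <- pf(d, df1 = p, df2 = n - p)

# Row with the largest Cook's distance and its F percentile
worst <- which.max(d)
print(worst)
print(pct[worst])
```

A percentile near or above 0.5 is the usual signal that a single observation is moving the fitted coefficients substantially.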

The next model considered was the previous model fit to the dataset without the influential point. Figure 8 shows that removing the influential point changed the model substantially, so all further models use the data without the influential point.

Figure 8

The residual plots in Figure 8 indicate room for further improvement because there is evidence of assumption violations. Additionally, the residuals fall in vertical bands on the residual versus fitted plot, which suggests that the model may be missing an interaction.

Before moving on to additional models, transformations were considered that might reduce the skewness of the continuous variables, either to improve linearity in the relationship with dollars earned or to prevent outliers from being too influential. For example, the original age data were skewed left, but neither a log nor a squared transformation reduced the skewness much (Figure 9).

Figure 9

Next, potential interactions between the variables were checked. The six possible two-way interactions are gender and age, gender and memory score, gender and treatment, age and memory score, age and treatment, and memory score and treatment. In each of the models built to test these interactions, the independent variables not involved in the interaction term were kept in the model as main effects.
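In R's formula notation, which the appendix code uses, writing `a * b` expands to both main effects plus their interaction, and any other variables are simply added alongside. A small sketch using the variable names from the appendix (here `treatment` stands in for the releveled factor, and no data are needed to inspect the terms):

```r
# Formula for the gender-by-treatment model; variable names follow the appendix
f <- dollars ~ age + memScore + female * treatment

# List the terms that the formula expands to
term_labels <- attr(terms(f), "term.labels")
print(term_labels)
```

The `female:treatment` entry is the interaction itself; the remaining entries are the main effects that stay in the model.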

The model with an interaction between gender and age shows less evidence of assumption violations than the previous model, and the overall model is significant with a p-value of 1.575 x 10^-8 (Figure 10).

Figure 10

Next, for the model with the interaction between gender and working memory, the residual plots appeared very similar to those of the model in Figure 8, indicating that this interaction might not add anything to the model that excludes the influential point and includes all of the variables with no interactions. On the other hand, the interaction between gender and treatment produced a clear change. While R^2 is generally not a reliable measure of model strength on its own, it is notable that the R^2 of this model was about 45%, while all of the other models considered had an R^2 of about 20 to 25%. Additionally, the p-value of this model is 2.2 x 10^-16, and the residual plots show no evidence of assumption violations (Figure 11).

Figure 11

The next model, with the interaction of age and working memory, has a p-value of 1.133 x 10^-7, and its residual versus fitted plot suggests that an interaction term is still missing from this model, as seen in Figure 12.

Figure 12

The model with the interaction between age and treatment has plots similar to those in Figure 12, again suggesting that an interaction term may be missing from this model. The final interaction considered was the interaction between working memory and treatment. This model has a p-value of 4.891 x 10^-8, and the scatter on the residual versus fitted plot shows no evidence of nonlinearity. There is still some widening in the residual versus fitted plot, indicating heteroskedasticity (Figure 13). Upon closer inspection of the data, the heteroskedasticity may be due to the lack of data for subjects at the younger end of the 19 to 27 year age range. With a larger sample of older subjects, there is naturally more variability in the residuals towards the right side of the plot.

Figure 13

Before choosing a final model, the BIC value of each interaction model was examined to compare the models quantitatively. Figure 14 shows the BIC value for each interaction.

Figure 14

Interaction             BIC
Gender and Age          1230.197
Gender and Memory       1235.843
Gender and Treatment    1173.416
Age and Memory          1234.631
Age and Treatment       1240.414
Memory and Treatment    1235.348

The BIC values show that the model with the interaction of gender and treatment predicts dollars earned best, because its BIC is much lower than that of any other interaction model (a lower BIC is better). For this reason, the interaction of gender and treatment was included in the final model.
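For readers unfamiliar with the criterion, the sketch below recomputes BIC by hand for a linear model and checks it against R's built-in `BIC()`. The data are simulated and purely illustrative; the point is only that BIC is minus twice the log-likelihood plus a penalty of log(n) per estimated parameter, so extra terms must improve the fit enough to pay for themselves.

```r
set.seed(17)  # arbitrary seed for a reproducible illustration

# Simulated data, unrelated to the study
sim <- data.frame(x = rnorm(80))
sim$y <- 5 + 2 * sim$x + rnorm(80)
fit <- lm(y ~ x, data = sim)

ll <- logLik(fit)
k  <- attr(ll, "df")  # estimated parameters: coefficients plus the error variance
n  <- nobs(fit)

# BIC = -2 * log-likelihood + k * log(n)
bic_by_hand <- -2 * as.numeric(ll) + k * log(n)

print(all.equal(bic_by_hand, BIC(fit)))
```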

To select the final model, a model with only the interaction of gender and treatment as a predictor was compared against models that included this interaction along with different combinations of the age and memory score variables. The residual plots of the model using only the interaction of gender and treatment are shown in Figure 15.

Figure 15

Figure 15 indicates that the residuals of the model with only the gender and treatment interaction show no evidence that the regression error assumptions are violated. For the models that included this interaction along with combinations of the age and memory variables, the residual plots looked similar to those in Figure 15, with slightly more variability in the scatter of points. The similar residual plots suggest that age and working memory contribute little to the model. To check this, BIC was used again to compare the models, and the results can be seen in Figure 16.

Figure 16

Model (* indicates interaction)            BIC
Gender*Treatment                           1163.773
Age and Gender*Treatment                   1168.291
Memory and Gender*Treatment                1168.926
Gender*Age and Gender*Treatment            1169.902
Age*Memory and Gender*Treatment            1177.291
Age*Treatment and Gender*Treatment         1178.606
Memory*Treatment and Gender*Treatment      1177.442

Although the BIC values of the models are similar, the model with only the interaction of gender and treatment has the lowest BIC and is also the simplest of the candidates, so choosing it guards against overfitting the data.

Results

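The final model chosen, with the interaction of gender and treatment, is:

$$E(\text{Dollars}) = \beta_0 + \beta_F \cdot \text{Female} + \beta_{TA} \cdot \text{Treatment A} + \beta_{TB} \cdot \text{Treatment B} + \beta_{FTA} \cdot \text{Female} \cdot \text{Treatment A} + \beta_{FTB} \cdot \text{Female} \cdot \text{Treatment B}$$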

Figure 17 summarizes the coefficient estimate, standard error, and p-value for each term in the model.

Figure 17

Term                  Estimate   Standard Error   P-Value
Intercept             20.50      0.96             8.24 x 10^-51
Female                6.38       1.40             9.18 x 10^-6
Treatment A           5.87       1.33             1.77 x 10^-5
Treatment B           13.30      1.31             2.97 x 10^-19
Female*Treatment A    1.35       1.96             4.92 x 10^-1
Female*Treatment B    -14.20     1.95             1.21 x 10^-11

The assumptions of regression (fixed x, linearity, constant variance of errors, Normality of errors, and independent errors) were checked for the chosen model in the following ways. The fixed x assumption holds because subjects were randomly assigned to one of three exactly defined treatments. Linearity and constant variance of errors are supported by the residual versus fitted values plot (Figure 15). Normality of errors is supported by the Normal Q-Q plot (Figure 15). Finally, the errors are independent because each measurement corresponds to an individual subject, and the subjects did not interact with each other.

From the model, it is possible to predict the expected value of dollars earned for every

combination of gender and treatment. The following simplified equations demonstrate this.

$$
\begin{aligned}
E(\text{Dollars} \mid \text{Female, Standard Treatment}) &= \beta_0 + \beta_F \cdot \text{Female} \\
E(\text{Dollars} \mid \text{Male, Standard Treatment}) &= \beta_0 \\
E(\text{Dollars} \mid \text{Female, Treatment A}) &= \beta_0 + \beta_F \cdot \text{Female} + \beta_{TA} \cdot \text{Treatment A} + \beta_{FTA} \cdot \text{Female} \cdot \text{Treatment A} \\
E(\text{Dollars} \mid \text{Male, Treatment A}) &= \beta_0 + \beta_{TA} \cdot \text{Treatment A} \\
E(\text{Dollars} \mid \text{Female, Treatment B}) &= \beta_0 + \beta_F \cdot \text{Female} + \beta_{TB} \cdot \text{Treatment B} + \beta_{FTB} \cdot \text{Female} \cdot \text{Treatment B} \\
E(\text{Dollars} \mid \text{Male, Treatment B}) &= \beta_0 + \beta_{TB} \cdot \text{Treatment B}
\end{aligned}
$$
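As a worked check, plugging the rounded estimates from Figure 17 into the equations above (with each indicator set to 1 where it applies) gives the cell means below. The results differ from the values quoted in the Discussion by a few cents, presumably because the coefficients in Figure 17 are rounded.

```latex
\begin{aligned}
E(\text{Dollars} \mid \text{Female, Standard}) &= 20.50 + 6.38 = 26.88 \\
E(\text{Dollars} \mid \text{Male, Standard}) &= 20.50 \\
E(\text{Dollars} \mid \text{Female, A}) &= 20.50 + 6.38 + 5.87 + 1.35 = 34.10 \\
E(\text{Dollars} \mid \text{Male, A}) &= 20.50 + 5.87 = 26.37 \\
E(\text{Dollars} \mid \text{Female, B}) &= 20.50 + 6.38 + 13.30 - 14.20 = 25.98 \\
E(\text{Dollars} \mid \text{Male, B}) &= 20.50 + 13.30 = 33.80
\end{aligned}
```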

Using the estimates from the model, Figure 18 plots the expected dollars earned against treatment type for each gender. The solid pink and blue lines indicate the expected value of dollars earned for each combination of gender and treatment type, while the dotted pink and blue lines show the change in expected value for each combination. Additionally, the green intervals show the 95% confidence interval for each combination of gender and treatment type.

Figure 18

Discussion

Based on the model chosen, a female subject assigned to the standard treatment will earn, on average, $26.89; there is 95% confidence that the true mean dollars earned for female subjects assigned to the standard treatment falls between $24.89 and $28.88. The corresponding results for the other groups are as follows. A male subject assigned to the standard treatment will earn, on average, $20.51 (95% confidence interval $18.61 to $22.41). A female subject assigned to treatment A will earn, on average, $34.11 (95% confidence interval $32.07 to $36.14). A male subject assigned to treatment A will earn, on average, $26.38 (95% confidence interval $24.57 to $28.19). A female subject assigned to treatment B will earn, on average, $26.00 (95% confidence interval $23.96 to $28.03). A male subject assigned to treatment B will earn, on average, $33.77 (95% confidence interval $32.01 to $35.53).
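Expected values and confidence intervals of this kind can be obtained from a fitted model with `predict()`. The sketch below uses simulated data with no real effects, so its numbers are unrelated to the study; the variable names mirror the appendix, and the grid of gender by treatment cells is built with `expand.grid()`.

```r
set.seed(2017)  # arbitrary seed for a reproducible illustration

# One row per gender-by-treatment combination
cells <- expand.grid(
  female    = factor(c(0, 1)),
  treatment = factor(c("Std", "A", "B"), levels = c("Std", "A", "B"))
)

# Simulated study: 30 subjects per cell, invented earnings
sim <- cells[rep(seq_len(nrow(cells)), each = 30), ]
sim$dollars <- 25 + rnorm(nrow(sim), sd = 6)

fit <- lm(dollars ~ female * treatment, data = sim)

# Expected value and 95% confidence interval for the mean of each cell
ci <- cbind(cells, predict(fit, newdata = cells,
                           interval = "confidence", level = 0.95))
print(ci)
```

Because the model contains the full interaction, each fitted value equals the sample mean of its cell, and the `lwr` and `upr` columns bound the true cell mean rather than an individual subject's earnings.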

Based on the expected values, confidence intervals, and Figure 18, it is possible to conclude that females assigned to treatment A earn more money, on average, than females assigned to the standard treatment or treatment B. However, males assigned to treatment B earn more money, on average, than males assigned to the standard treatment or treatment A. Additionally, when assigned to the standard treatment, females tend to earn more money than males. Therefore, the model selected shows that the treatment type associated with the highest earnings in the game is different for females and males.

Finally, as with any statistical model, it is important to account for the sources of error in this study. One error that can potentially be fixed is the one introduced by the outlier that influenced the original model estimates. The recorded dollars earned for the outlier subject was $311. This is likely a data recording error, and the subject probably earned $31 instead. The analysis could be re-run with this data point corrected to $31 instead of removing the outlier. Another possible error lies in the assumption of fixed x. Although each subject was randomly assigned an exact treatment, it is possible that not all subjects participated in the treatment experience equally, or that different psychologists delivered the treatments in different ways. While there will always be irreducible error in any statistical model, these sources of error are important to address in future replications of the study.
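The suggested correction could be sketched as follows. The earnings vector is hypothetical apart from the 311 value, the cutoff of 300 matches the one used in the appendix to locate the outlier, and the replacement value of 31 is the assumption proposed in the text, not a verified fact about the data.

```r
# Hypothetical earnings; only the 311 value comes from the report
toy <- data.frame(dollars = c(28, 311, 35, 22))

# Flag the suspect record with the same cutoff the appendix uses
suspect <- toy$dollars > 300

# Assumed correction: treat $311 as a mistyped $31 and keep the row
toy_fixed <- toy
toy_fixed$dollars[suspect] <- 31

print(toy_fixed)
```

Keeping the corrected row preserves the sample size, whereas dropping it discards one subject's data.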

In conclusion, for this study, the model presented identifies the interaction of gender and

treatment to be an important predictor of dollars earned in the psychology game.

Appendix

# Author: Tias Sen
# Last Edited: June 4, 2017
# Purpose: Find best model to predict dollars earned in a game used in a psychology study

# Packages
library(ggplot2)
library(GGally)

# Import the Dataset
# Save HW2.dat to the same file location as this code before running the next line
psych_game <- read.csv("HW2.dat")

# ggplot Theme
tsen_theme <- theme_bw() +
  theme(axis.text = element_text(family = "Helvetica", color = "darkgreen"),
        axis.title = element_text(family = "Helvetica", face = "bold", color = "gray27", size = 12),
        axis.ticks = element_line(color = "darkgreen"),
        plot.title = element_text(family = "Helvetica", face = "bold", color = "gray27", size = 16, hjust = 0.5),
        legend.text = element_text(family = "Helvetica", size = 12),
        legend.title = element_text(family = "Helvetica", face = "bold", color = "gray27", size = 12))

# Univariate Numerical EDA
dim(psych_game)

summary(as.factor(psych_game$female))

summary(psych_game$age)
sd(psych_game$age)
unique(psych_game$age)

summary(psych_game$memScore)
sd(psych_game$memScore)

# as.factor() gives label counts whether the column was read as character or factor
summary(as.factor(psych_game$treatment))

summary(psych_game$dollars)
sd(psych_game$dollars)

# Univariate Graphical EDA
gender_bar <- ggplot(data = psych_game, aes(x = as.factor(female))) +
  geom_bar() +
  scale_x_discrete(labels = c("Male", "Female")) +
  ggtitle("Distribution of Gender of Subjects") +
  labs(x = "Gender of Subject", y = "Frequency") +
  tsen_theme
gender_bar

age_hist <- ggplot(data = psych_game, aes(x = age)) +
  geom_histogram(binwidth = 0.5) +
  scale_x_continuous(breaks = c(19, 20, 21, 22, 23, 24, 25, 26, 27)) +
  ggtitle("Distribution of Age of Subjects") +
  labs(x = "Age of Subject\n (Years)", y = "Frequency") +
  tsen_theme
age_hist

mem_hist <- ggplot(data = psych_game, aes(x = memScore)) +
  geom_histogram(binwidth = 1) +
  ggtitle("Distribution of Pre-Game Measurement\n of Subjects' Working Memory") +
  labs(x = "Size of Subject's Working Memory", y = "Frequency") +
  tsen_theme
mem_hist

# Recode lowercase treatment labels to match their uppercase counterparts
psych_game$treatment[which(psych_game$treatment == "a")] <- "A"
psych_game$treatment[which(psych_game$treatment == "b")] <- "B"
psych_game$treatment[which(psych_game$treatment == "std")] <- "Std"

# Locate the subject with no recorded treatment (row 52 in the original run),
# then drop it by condition rather than by a hard-coded row number
which(psych_game$treatment == "")
psych_game <- psych_game[psych_game$treatment != "", ]

treatment_bar <- ggplot(data = psych_game, aes(x = as.factor(treatment))) +
  geom_bar() +
  ggtitle("Distribution of Subjects by Treatment Type") +
  labs(x = "Treatment Assigned to Subject", y = "Frequency") +
  tsen_theme
treatment_bar

dollars_hist <- ggplot(data = psych_game, aes(x = dollars)) +
  geom_histogram(binwidth = 5) +
  ggtitle("Distribution of Dollars Earned by Subjects") +
  labs(x = "Dollars Earned by Subject", y = "Frequency") +
  tsen_theme
dollars_hist

# Remove bankrupt subjects; logical indexing stays correct even if no rows match
psych_game_no_bankrupt <- psych_game[psych_game$dollars != 0, ]
dollars_no_bankrupt_hist <- ggplot(data = psych_game_no_bankrupt, aes(x = dollars)) +
  geom_histogram(binwidth = 5) +
  ggtitle("Distribution of Dollars Earned by Subjects\n Excluding Bankrupt Subjects") +
  labs(x = "Dollars Earned by Subject", y = "Frequency") +
  tsen_theme
dollars_no_bankrupt_hist

# Remove the outlier (dollars > 300)
psych_game_rm_outlier <- psych_game_no_bankrupt[psych_game_no_bankrupt$dollars <= 300, ]
dollars_no_outlier_hist <- ggplot(data = psych_game_rm_outlier, aes(x = dollars)) +
  geom_histogram(binwidth = 2) +
  ggtitle("Distribution of Dollars Earned by Subjects\n Excluding Bankrupt Subjects and Outliers") +
  labs(x = "Dollars Earned by Subject", y = "Frequency") +
  tsen_theme
dollars_no_outlier_hist

# Multivariate EDA
psych_game_no_bankrupt$female <- as.factor(psych_game_no_bankrupt$female)
psych_game_rm_outlier$female <- as.factor(psych_game_rm_outlier$female)
psych_game_pairs <- ggpairs(data = psych_game_no_bankrupt) +
  ggtitle("Pairs Plot of Variables\n Excluding Bankrupt Subjects") +
  tsen_theme
psych_game_pairs_outlier <- ggpairs(data = psych_game_rm_outlier) +
  ggtitle("Pairs Plot of Variables\n Excluding Bankrupt Subjects and Outliers") +
  tsen_theme
psych_game_pairs
psych_game_pairs_outlier

# Initial Model
init_model <- lm(dollars ~ female + age + memScore + factor(treatment, levels = c("Std", "A", "B")),
                 data = psych_game_no_bankrupt)
summary(init_model)

# Cook's distance, compared with an F distribution with 5 and 178 degrees of freedom
cooks_init_model <- cooks.distance(init_model)
which.max(cooks_init_model)
cooks_init_model[160]
head(sort(pf(cooks_init_model, 5, 178), decreasing = TRUE))

resid_init_model <- plot(x = init_model$fitted.values, y = init_model$residuals,
                         main = "Residuals vs. Fitted Values\n for the Initial Model",
                         xlab = "Fitted Values", ylab = "Residuals", pch = 16)
abline(a = 0, b = 0)

qqnorm(init_model$residuals, pch = 16)
qqline(init_model$residuals)

# Model Removing Influential Point
wo_inf_model <- lm(dollars ~ female + age + memScore + factor(treatment, levels = c("Std", "A", "B")),
                   data = psych_game_rm_outlier)
summary(wo_inf_model)

resid_wo_inf_model <- plot(x = wo_inf_model$fitted.values, y = wo_inf_model$residuals,
                           main = "Residuals vs. Fitted Values\n for the Model without the Influential Point",
                           xlab = "Fitted Values", ylab = "Residuals", pch = 16)
abline(a = 0, b = 0)

qqnorm(wo_inf_model$residuals, pch = 16)
qqline(wo_inf_model$residuals)

# Check for Transformation Possibilities
# Axis breaks are placed on the transformed scale; breaks at 19 to 27 would fall
# outside the range of log(age) and age^2 and would not be drawn
log_age_hist <- ggplot(data = psych_game_rm_outlier, aes(x = log(age))) +
  geom_histogram(binwidth = 0.02) +
  scale_x_continuous(breaks = log(19:27), labels = round(log(19:27), 2)) +
  ggtitle("Distribution of Log Age of Subjects") +
  labs(x = "Log Age of Subject", y = "Frequency") +
  tsen_theme
log_age_hist

sqr_age_hist <- ggplot(data = psych_game_rm_outlier, aes(x = age^2)) +
  geom_histogram(binwidth = 15) +
  scale_x_continuous(breaks = (19:27)^2) +
  ggtitle("Distribution of Squared Age of Subjects") +
  labs(x = "Squared Age of Subject", y = "Frequency") +
  tsen_theme
sqr_age_hist

# Interaction between Gender and Age
int_gen_age_model <- lm(dollars ~ female*age + memScore + factor(treatment, levels = c("Std", "A", "B")),
                        data = psych_game_rm_outlier)
summary(int_gen_age_model)

resid_int_gen_age_model <- plot(x = int_gen_age_model$fitted.values, y = int_gen_age_model$residuals,
                                main = "Residuals vs. Fitted Values for the Model\n with Interaction between Gender and Age",
                                xlab = "Fitted Values", ylab = "Residuals", pch = 16)
abline(a = 0, b = 0)

qqnorm(int_gen_age_model$residuals, pch = 16)
qqline(int_gen_age_model$residuals)

# Interaction between Gender and MemScore
int_gen_mem_model <- lm(dollars ~ age + female*memScore + factor(treatment, levels = c("Std", "A", "B")),
                        data = psych_game_rm_outlier)
summary(int_gen_mem_model)

resid_int_gen_mem_model <- plot(x = int_gen_mem_model$fitted.values, y = int_gen_mem_model$residuals,
                                main = "Residuals vs. Fitted Values for the Model\n with Interaction between Gender and MemScore",
                                xlab = "Fitted Values", ylab = "Residuals", pch = 16)
abline(a = 0, b = 0)

qqnorm(int_gen_mem_model$residuals, pch = 16)
qqline(int_gen_mem_model$residuals)

# Interaction between Gender and Treatment
int_gen_treat_model <- lm(dollars ~ age + memScore + female*factor(treatment, levels = c("Std", "A", "B")),
                          data = psych_game_rm_outlier)
summary(int_gen_treat_model)

resid_int_gen_treat_model <- plot(x = int_gen_treat_model$fitted.values, y = int_gen_treat_model$residuals,
                                  main = "Residuals vs. Fitted Values for the Model\n with Interaction between Gender and Treatment",
                                  xlab = "Fitted Values", ylab = "Residuals", pch = 16)
abline(a = 0, b = 0)

qqnorm(int_gen_treat_model$residuals, pch = 16)
qqline(int_gen_treat_model$residuals)

# Interaction between Age and MemScore
int_age_mem_model <- lm(dollars ~ female + age*memScore + factor(treatment, levels = c("Std", "A", "B")),
                        data = psych_game_rm_outlier)
summary(int_age_mem_model)

resid_int_age_mem_model <- plot(x = int_age_mem_model$fitted.values, y = int_age_mem_model$residuals,
                                main = "Residuals vs. Fitted Values for the Model\n with Interaction between Age and MemScore",
                                xlab = "Fitted Values", ylab = "Residuals", pch = 16)
abline(a = 0, b = 0)

qqnorm(int_age_mem_model$residuals, pch = 16)
qqline(int_age_mem_model$residuals)

# Interaction between Age and Treatment
int_age_treat_model <- lm(dollars ~ female + memScore + age*factor(treatment, levels = c("Std", "A", "B")),
                          data = psych_game_rm_outlier)
summary(int_age_treat_model)

resid_int_age_treat_model <- plot(x = int_age_treat_model$fitted.values, y = int_age_treat_model$residuals,
                                  main = "Residuals vs. Fitted Values for the Model\n with Interaction between Age and Treatment",
                                  xlab = "Fitted Values", ylab = "Residuals", pch = 16)
abline(a = 0, b = 0)

qqnorm(int_age_treat_model$residuals, pch = 16)
qqline(int_age_treat_model$residuals)

# Interaction between MemScore and Treatment
int_mem_treat_model <- lm(dollars ~ female + age + memScore*factor(treatment, levels = c("Std", "A", "B")),
                          data = psych_game_rm_outlier)
summary(int_mem_treat_model)

resid_int_mem_treat_model <- plot(x = int_mem_treat_model$fitted.values, y = int_mem_treat_model$residuals,
                                  main = "Residuals vs. Fitted Values for the Model\n with Interaction between MemScore and Treatment",
                                  xlab = "Fitted Values", ylab = "Residuals", pch = 16)
abline(a = 0, b = 0)

qqnorm(int_mem_treat_model$residuals, pch = 16)
qqline(int_mem_treat_model$residuals)

26

Page 27: hseltman/files/Revised 402 HW 2 XIMRAD... · Web view1168.926 Gender*Age and Gender*Treatment 1169.902 Age*Memory and Gender*Treatment 1177.291 Age*Treatment and Gender*Treatment

Tias Sen 6/4/17

# BIC for Interaction ModelsBIC(int_gen_age_model, int_gen_mem_model, int_gen_treat_model, int_age_mem_model, int_age_treat_model, int_mem_treat_model)

# Test Final Models
final_model_1 <- lm(dollars ~ female*factor(treatment, levels = c("Std", "A", "B")),
                    data = psych_game_rm_outlier)
summary(final_model_1)
signif(coefficients(summary(final_model_1)), 3)

resid_final_model_1 <- plot(x = final_model_1$fitted.values, y = final_model_1$residuals,
                            main = "Residuals vs. Fitted Values for Final Model 1",
                            xlab = "Fitted Values", ylab = "Residuals", pch = 16)
abline(a = 0, b = 0)

qqnorm(final_model_1$residuals, pch = 16)
qqline(final_model_1$residuals)

final_model_2 <- lm(dollars ~ age + female*factor(treatment, levels = c("Std", "A", "B")),
                    data = psych_game_rm_outlier)
summary(final_model_2)

resid_final_model_2 <- plot(x = final_model_2$fitted.values, y = final_model_2$residuals,
                            main = "Residuals vs. Fitted Values for Final Model 2",
                            xlab = "Fitted Values", ylab = "Residuals", pch = 16)
abline(a = 0, b = 0)

qqnorm(final_model_2$residuals, pch = 16)
qqline(final_model_2$residuals)

final_model_3 <- lm(dollars ~ memScore + female*factor(treatment, levels = c("Std", "A", "B")),
                    data = psych_game_rm_outlier)
summary(final_model_3)

resid_final_model_3 <- plot(x = final_model_3$fitted.values, y = final_model_3$residuals,
                            main = "Residuals vs. Fitted Values for Final Model 3",
                            xlab = "Fitted Values", ylab = "Residuals", pch = 16)
abline(a = 0, b = 0)

qqnorm(final_model_3$residuals, pch = 16)
qqline(final_model_3$residuals)


final_model_4 <- lm(dollars ~ female*age + female*factor(treatment, levels = c("Std", "A", "B")),
                    data = psych_game_rm_outlier)
summary(final_model_4)

resid_final_model_4 <- plot(x = final_model_4$fitted.values, y = final_model_4$residuals,
                            main = "Residuals vs. Fitted Values for Final Model 4",
                            xlab = "Fitted Values", ylab = "Residuals", pch = 16)
abline(a = 0, b = 0)

qqnorm(final_model_4$residuals, pch = 16)
qqline(final_model_4$residuals)

final_model_5 <- lm(dollars ~ age*memScore + female*factor(treatment, levels = c("Std", "A", "B")),
                    data = psych_game_rm_outlier)
summary(final_model_5)

resid_final_model_5 <- plot(x = final_model_5$fitted.values, y = final_model_5$residuals,
                            main = "Residuals vs. Fitted Values for Final Model 5",
                            xlab = "Fitted Values", ylab = "Residuals", pch = 16)
abline(a = 0, b = 0)

qqnorm(final_model_5$residuals, pch = 16)
qqline(final_model_5$residuals)

final_model_6 <- lm(dollars ~ age*factor(treatment, levels = c("Std", "A", "B")) +
                      female*factor(treatment, levels = c("Std", "A", "B")),
                    data = psych_game_rm_outlier)
summary(final_model_6)

resid_final_model_6 <- plot(x = final_model_6$fitted.values, y = final_model_6$residuals,
                            main = "Residuals vs. Fitted Values for Final Model 6",
                            xlab = "Fitted Values", ylab = "Residuals", pch = 16)
abline(a = 0, b = 0)

qqnorm(final_model_6$residuals, pch = 16)
qqline(final_model_6$residuals)

final_model_7 <- lm(dollars ~ memScore*factor(treatment, levels = c("Std", "A", "B")) +
                      female*factor(treatment, levels = c("Std", "A", "B")),
                    data = psych_game_rm_outlier)
summary(final_model_7)

resid_final_model_7 <- plot(x = final_model_7$fitted.values, y = final_model_7$residuals,
                            main = "Residuals vs. Fitted Values for Final Model 7",
                            xlab = "Fitted Values", ylab = "Residuals", pch = 16)
abline(a = 0, b = 0)

qqnorm(final_model_7$residuals, pch = 16)
qqline(final_model_7$residuals)

BIC(final_model_1, final_model_2, final_model_3, final_model_4, final_model_5, final_model_6, final_model_7)
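
As a hedged aside on what the BIC() calls above compare: for a fitted lm, BIC is -2 * logLik + k * log(n), where k counts the estimated parameters (regression coefficients plus the residual variance) and n is the number of observations; lower values are preferred. The sketch below checks this identity on R's built-in mtcars data, which is used purely as a stand-in and is not the study data. The names toy_fit and manual_bic are illustrative only.

```r
# Stand-in example on the built-in mtcars data (NOT the study data).
toy_fit <- lm(mpg ~ wt * factor(am), data = mtcars)

ll <- logLik(toy_fit)
k <- attr(ll, "df")   # number of coefficients + 1 for the residual variance
n <- nobs(toy_fit)    # number of observations used in the fit

# BIC by hand: -2 * log-likelihood + k * log(n)
manual_bic <- -2 * as.numeric(ll) + k * log(n)

# The hand computation should agree with stats::BIC().
all.equal(manual_bic, BIC(toy_fit))
```

Because the penalty k * log(n) grows with every added term, an extra interaction has to improve the fit noticeably before its BIC drops below that of a simpler model.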

# Simplified Equations
# exp_dollars <- 20.51 + 6.37*female + 5.86*A + 13.26*B + 1.35*female*A - 14.15*female*B
fem_std <- 20.51 + 6.37
fem_a <- 20.51 + 6.37 + 5.86 + 1.35
fem_b <- 20.51 + 6.37 + 13.26 - 14.15
male_std <- 20.51
male_a <- 20.51 + 5.86
male_b <- 20.51 + 13.26
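
The six expected values above can also be computed in one pass from the rounded coefficients, which makes the gender-by-treatment crossover easier to check. This is a minimal sketch: coefs and cells are illustrative names, and the numbers are the rounded estimates quoted in the simplified equation, not values re-extracted from final_model_1.

```r
# Rounded coefficients copied from the simplified equation (assumed values).
coefs <- c(intercept = 20.51, female = 6.37, A = 5.86, B = 13.26,
           female_A = 1.35, female_B = -14.15)

# One row per gender-by-treatment cell, with 0/1 indicators for A and B.
cells <- expand.grid(female = c(0, 1), treatment = c("Std", "A", "B"),
                     stringsAsFactors = FALSE)
cells$A <- as.numeric(cells$treatment == "A")
cells$B <- as.numeric(cells$treatment == "B")

# Expected dollars for each cell from the interaction model equation.
cells$exp_dollars <- with(cells,
  coefs[["intercept"]] + coefs[["female"]] * female +
    coefs[["A"]] * A + coefs[["B"]] * B +
    coefs[["female_A"]] * female * A + coefs[["female_B"]] * female * B)

cells[, c("female", "treatment", "exp_dollars")]
```

With these rounded coefficients, the largest expected earnings for women fall under Treatment A (34.09) and for men under Treatment B (33.77).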

# Adjust Data for Interaction Plot
fem_std_ind <- which(psych_game_rm_outlier$female == 1 & psych_game_rm_outlier$treatment == "Std")
fem_a_ind <- which(psych_game_rm_outlier$female == 1 & psych_game_rm_outlier$treatment == "A")
fem_b_ind <- which(psych_game_rm_outlier$female == 1 & psych_game_rm_outlier$treatment == "B")
male_std_ind <- which(psych_game_rm_outlier$female == 0 & psych_game_rm_outlier$treatment == "Std")
male_a_ind <- which(psych_game_rm_outlier$female == 0 & psych_game_rm_outlier$treatment == "A")
male_b_ind <- which(psych_game_rm_outlier$female == 0 & psych_game_rm_outlier$treatment == "B")

# Create the column first so the indexed assignments below always have a full-length target
psych_game_rm_outlier$interaction <- NA_character_
psych_game_rm_outlier$interaction[fem_std_ind] <- "Female,\n Standard Treatment"
psych_game_rm_outlier$interaction[fem_a_ind] <- "Female,\n Treatment A"
psych_game_rm_outlier$interaction[fem_b_ind] <- "Female,\n Treatment B"
psych_game_rm_outlier$interaction[male_std_ind] <- "Male,\n Standard Treatment"
psych_game_rm_outlier$interaction[male_a_ind] <- "Male,\n Treatment A"
psych_game_rm_outlier$interaction[male_b_ind] <- "Male,\n Treatment B"


# Interaction Plot
int_plot <- ggplot(data = psych_game_rm_outlier,
                   aes(x = factor(interaction,
                                  levels = c("Female,\n Standard Treatment", "Male,\n Standard Treatment",
                                             "Female,\n Treatment A", "Male,\n Treatment A",
                                             "Female,\n Treatment B", "Male,\n Treatment B")),
                       y = dollars)) +
  geom_point() +
  ggtitle("Dollars Earned by Subjects\n versus Interaction of\n Subjects' Gender & Treatment Type") +
  labs(x = "Subject's Gender & Treatment Type", y = "Dollars Earned by Subject") +
  tsen_theme +
  theme(axis.text.x = element_text(angle = 30, hjust = 1))

# Horizontal segments: expected dollars for each gender-by-treatment group
int_plot <- int_plot + geom_segment(x = 0.5, xend = 1.5, y = fem_std, yend = fem_std, col = "deep pink")
int_plot <- int_plot + geom_segment(x = 1.5, xend = 2.5, y = male_std, yend = male_std, col = "blue")
int_plot <- int_plot + geom_segment(x = 2.5, xend = 3.5, y = fem_a, yend = fem_a, col = "deep pink")
int_plot <- int_plot + geom_segment(x = 3.5, xend = 4.5, y = male_a, yend = male_a, col = "blue")
int_plot <- int_plot + geom_segment(x = 4.5, xend = 5.5, y = fem_b, yend = fem_b, col = "deep pink")
int_plot <- int_plot + geom_segment(x = 5.5, xend = 6.5, y = male_b, yend = male_b, col = "blue")

# Dotted segments: connect each gender's expected dollars across treatments
int_plot <- int_plot + geom_segment(x = 1.5, xend = 2.5, y = fem_std, yend = fem_a, col = "deep pink", linetype = "dotted")
int_plot <- int_plot + geom_segment(x = 3.5, xend = 4.5, y = fem_a, yend = fem_b, col = "deep pink", linetype = "dotted")
int_plot <- int_plot + geom_segment(x = 2.5, xend = 3.5, y = male_std, yend = male_a, col = "blue", linetype = "dotted")
int_plot <- int_plot + geom_segment(x = 4.5, xend = 5.5, y = male_a, yend = male_b, col = "blue", linetype = "dotted")
int_plot

# 95% CIs
fem_std_ci <- predict(final_model_1, newdata = list(female = as.factor(1), treatment = "Std"),
                      level = 0.95, interval = "confidence")
male_std_ci <- predict(final_model_1, newdata = list(female = as.factor(0), treatment = "Std"),
                       level = 0.95, interval = "confidence")
fem_a_ci <- predict(final_model_1, newdata = list(female = as.factor(1), treatment = "A"),
                    level = 0.95, interval = "confidence")
male_a_ci <- predict(final_model_1, newdata = list(female = as.factor(0), treatment = "A"),
                     level = 0.95, interval = "confidence")
fem_b_ci <- predict(final_model_1, newdata = list(female = as.factor(1), treatment = "B"),
                    level = 0.95, interval = "confidence")
male_b_ci <- predict(final_model_1, newdata = list(female = as.factor(0), treatment = "B"),
                     level = 0.95, interval = "confidence")

# predict() returns the columns fit, lwr, upr, so [2] is the lower bound and [3] the upper bound.
# Create the columns first so the indexed assignments below always have a full-length target
psych_game_rm_outlier$ci_lwr <- NA_real_
psych_game_rm_outlier$ci_upr <- NA_real_
psych_game_rm_outlier$ci_lwr[fem_std_ind] <- fem_std_ci[2]
psych_game_rm_outlier$ci_upr[fem_std_ind] <- fem_std_ci[3]
psych_game_rm_outlier$ci_lwr[fem_a_ind] <- fem_a_ci[2]
psych_game_rm_outlier$ci_upr[fem_a_ind] <- fem_a_ci[3]
psych_game_rm_outlier$ci_lwr[fem_b_ind] <- fem_b_ci[2]
psych_game_rm_outlier$ci_upr[fem_b_ind] <- fem_b_ci[3]
psych_game_rm_outlier$ci_lwr[male_std_ind] <- male_std_ci[2]
psych_game_rm_outlier$ci_upr[male_std_ind] <- male_std_ci[3]
psych_game_rm_outlier$ci_lwr[male_a_ind] <- male_a_ci[2]
psych_game_rm_outlier$ci_upr[male_a_ind] <- male_a_ci[3]
psych_game_rm_outlier$ci_lwr[male_b_ind] <- male_b_ci[2]
psych_game_rm_outlier$ci_upr[male_b_ind] <- male_b_ci[3]

int_plot <- int_plot + geom_errorbar(data = psych_game_rm_outlier, aes(ymin = ci_lwr, ymax = ci_upr),
                                     color = "darkgreen", width = 0.5, size = 0.75)
int_plot
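
The six separate predict() calls above could probably be collapsed into a single call over a grid of gender-by-treatment cells. The sketch below illustrates the idea on simulated stand-in data; toy, toy_model, cells, and cell_ci are hypothetical names, and the simulated values have no connection to the study's results. It also assumes treatment is stored as a factor in the data, whereas the models above wrap it in factor() inside the formula.

```r
# Simulated stand-in data (NOT the study data): 10 subjects per cell.
set.seed(1)
toy <- data.frame(
  female = rep(c(0, 1), each = 30),
  treatment = factor(rep(c("Std", "A", "B"), times = 20),
                     levels = c("Std", "A", "B"))
)
toy$dollars <- 20 + 5 * toy$female + rnorm(nrow(toy), sd = 4)

toy_model <- lm(dollars ~ female * treatment, data = toy)

# One row per gender-by-treatment cell; expand.grid keeps the level order given.
cells <- expand.grid(female = c(0, 1), treatment = levels(toy$treatment))

# A single predict() call returns fit, lwr, and upr for all six cells.
cell_ci <- cbind(cells,
                 predict(toy_model, newdata = cells,
                         interval = "confidence", level = 0.95))
cell_ci
```

A table like cell_ci could then be merged back onto the data by female and treatment, instead of assigning each bound by index.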
