Advanced Methods and Models in Behavioral Research – 2013

Instead of Friday, March 20: this Monday, 25 March, hours 7 and 8

In addition to the regular program: Tuesday, 2 April, hours 7 and 8

Regular: Friday, 5 April, hours 5 and 6

Check out: my logistic regression run on auto.dta

(Not easy / thinking out loud / there is more than one correct answer)

Revisit your own and others’ logit do files; check if you are able to do this yourself

The exam

Same kind of setup as MMBR:

• On laptop
• ExamMonitor installed
• No books or notes allowed, only Stata’s help files

but:

• No (or hardly any) multiple choice questions
• Largest part is working on data
• MMBR is considered working knowledge
• You get the data before the exam (!)

Our exam data

• Roger Fuchs & Freek Schoonbrood BEP project

• Go through the experiment at the link supplied in a minute

• Make sure to:
  – Answer seriously
  – Understand that you are doing a conjoint analysis
  – Realize that the data from this experiment are the ones that we will use during the exam
  – Write down notes for improvement of the survey

Logistics

• We now go through the experiment

• We try to come up with as many improvements as we can think of

• Roger and Freek implement the ones they feel make sense today

• Everyone arranges for at least 5 participants as of tomorrow: ensure some variance! (and/or put an invite on your Facebook/Twitter/... page)

• As soon as I have at least some data, I will put the data set online (note that it might not be complete yet, as more participants may follow later)

• (note that we are doing this sort of quick-and-dirty: we do not check for all kinds of sampling biases, etc. Think about how we could have done that!)

http://bep.freek.ws

In with the (multi-level) statistics...

MULTI – LEVEL ANALYSIS

Multi-level models or ...

dealing with clustered data.

One solution: the variance component model

• Bayesian hierarchical models
• mixed models (in SPSS)
• hierarchical linear models
• random effects models
• random coefficient models
• subject specific models
• variance component models
• variance heterogeneity models

Clustered data / multi-level models

• Pupils within schools (within regions within countries)

• Firms within regions (or sectors)

• Vignettes within persons

• Employees within stores (our fastfood.dta example)

Two issues with clustered data

• Your estimates will (in all likelihood) look too precise, because the standard errors come out too small: you find effects that do not exist in the population

[do we get that?]

• You will want to distinguish between effects within clusters and effects between clusters

[see next two slides]

On individual vs aggregate data

For instance:

• X = introvert, Y = school results
• X = age of McDonald’s employee, Y = like the manager

Had we only known that the data are clustered!

So the effect of an X within clusters can be different from the effect between clusters!

Using the school example: lines represent schools. And within schools the effect of being introvert is positive!
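The point above can be checked with a few invented numbers. The sketch below is only an illustration: the three schools, the introversion scores (x) and the school results (y) are made up, and the plain least-squares slope is just a convenient way to summarize each "effect".

```python
# Made-up data: three schools, each a list of (introversion, school result) pairs.
schools = {
    "A": [(1, 8), (2, 9), (3, 10)],
    "B": [(4, 5), (5, 6), (6, 7)],
    "C": [(7, 2), (8, 3), (9, 4)],
}


def ols_slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


def school_mean(points):
    """Mean x and mean y of one school, as an (x, y) pair."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


# Ignoring the clustering: one regression on all pupils.
pooled_slope = ols_slope([p for pts in schools.values() for p in pts])

# Within clusters: one regression per school.
within_slopes = {name: ols_slope(pts) for name, pts in schools.items()}

# Between clusters: one regression on the school means.
between_slope = ols_slope([school_mean(pts) for pts in schools.values()])

print(pooled_slope)    # -0.8
print(within_slopes)   # {'A': 1.0, 'B': 1.0, 'C': 1.0}
print(between_slope)   # -1.0
```

With these numbers the slope is positive inside every school, yet negative across school means and in the pooled data, which is the pattern the slide describes.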

MAIN MESSAGES

Be able to recognize clustered data and deal with it appropriately (how to do that will follow)

Distinguish two kinds of effects: those at the "micro-level" (within clusters) vs those at the aggregate level (between clusters). They need not be the same!

(and ... do not test a micro-hypothesis with aggregate data)

A toy example – two schools, two pupils

Two schools, each with two pupils. We first calculate the means.

Overall mean = (3 + 2 + (-1) + (-4))/4 = 0

[Figure: exam score (vertical axis) for School 1 and School 2; pupil scores 3, 2, -1 and -4, with the overall mean (0) marked]

(taken from Rasbash)

Now the variance

[Figure: exam scores 3, 2, -1 and -4 for School 1 and School 2, plotted around the overall mean (0)]

The total variance is the sum of the squared departures of the observations from the overall mean, divided by the sample size (4):

(9 + 4 + 1 + 16)/4 = 7.5

The variance of the school means around the overall mean

[Figure: exam scores 3, 2, -1 and -4 for School 1 and School 2, with the school means 2.5 and -2.5 marked next to the overall mean (0)]

The variance of the school means around the overall mean =

(2.5² + (-2.5)²)/2 = 6.25 (total variance was 7.5)

The variance of the pupils’ scores around their school’s mean

[Figure: exam scores 3, 2, -1 and -4 for School 1 and School 2, each compared with its school mean (2.5 and -2.5)]

The variance of the pupils’ scores around their school’s mean =

((3-2.5)² + (2-2.5)² + (-1-(-2.5))² + (-4-(-2.5))²)/4 = 1.25

-> So you can partition the total variance into individual level variance and school level variance

How much of the variability in pupil attainment is attributable to factors at the school level and how much to factors at the pupil level?

In terms of our toy example we can now say

6.25/7.5 = 83% of the total variation of pupils’ attainment is attributable to school level factors

1.25/7.5 = 17% of the total variation of pupils’ attainment is attributable to pupil level factors

And this is important: we want to know how to explain (in this example) school attainment, and apparently the differences are at the school level more than at the pupil level
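The toy calculation can be reproduced in a few lines. This is a minimal sketch using the four scores from the example; which pair of scores is labeled "School 1" and which "School 2" is an assumption here and does not affect the result. Like the slides, it divides by the number of observations rather than by n - 1.

```python
# Scores from the toy example; the school labels are assumed.
scores = {"School 1": [3, 2], "School 2": [-1, -4]}

all_scores = [s for pupils in scores.values() for s in pupils]
n = len(all_scores)
overall_mean = sum(all_scores) / n

# Total variance: squared departures from the overall mean, divided by n.
total_var = sum((s - overall_mean) ** 2 for s in all_scores) / n

# Between-school variance: school means around the overall mean.
# (Dividing by the number of schools is fine here because both schools have the same size.)
school_means = {name: sum(p) / len(p) for name, p in scores.items()}
between_var = sum((m - overall_mean) ** 2 for m in school_means.values()) / len(school_means)

# Within-school variance: pupils around their own school's mean.
within_var = sum(
    (s - school_means[name]) ** 2 for name, pupils in scores.items() for s in pupils
) / n

school_share = between_var / total_var
pupil_share = within_var / total_var

print(total_var, between_var, within_var)  # 7.5 6.25 1.25
print(round(school_share, 3), round(pupil_share, 3))  # 0.833 0.167
```

The two parts add up to the total (6.25 + 1.25 = 7.5), and the school share comes out at roughly 83% when rounded.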

Standard multiple regression won't do

 Y    D1   D2   D3   D4   D5   id   …
+4    -1   -1    0    1    0    1
-3     1    1    1    0   -1    1
+2     0    0    1    0   -1    2
 0     1    0   -1    1    0    2
+1     …    …    …    …    …    3
+2     …    …    …    …    …    3
-3     …    …    …    …    …    4
+4     …    …    …    …    …    4
 …     …    …    …    …    …    …

So you can use all the data and just run a multiple regression, but then you disregard the clustering, which gives incorrect confidence intervals (and you cannot distinguish between effects at the individual level and effects at the cluster level)

Possible solution (but not so good): you can aggregate within clusters, and then run a multiple regression on the aggregate data. Two problems: no individual-level testing is possible, and you are left with far fewer data points.

So what can we do?

Multi-level models

The standard multiple regression model assumes

... with the subscript "i" defined at the case-level.

... and the epsilons independently distributed with covariance matrix I.

With clustered data, you know these assumptions are not met.
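The formula itself did not survive on this slide. In the usual textbook notation (the symbols below are chosen here and may differ from the original slide), the model presumably reads:

```latex
y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + \varepsilon_i ,
\qquad i = 1, \dots, n ,
\qquad \operatorname{Cov}(\varepsilon_i, \varepsilon_{i'}) = 0 \ \text{for } i \neq i' .
```

With clustered data the last condition is the one that fails: two cases from the same cluster (the same school, store or person) tend to have correlated errors.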

Solution 1: add dummy-variables per cluster

• Try multiple regression, but with as many dummy variables as you have clusters (minus 1)

... where, in this example, there are j+1 clusters.

IF the clustering differences are (largely) due to differences in the intercept between persons, this might work.

BUT if there are only a handful of cases per person, this necessitates a huge number of extra variables
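The formula for this slide is missing as well. A plausible reconstruction, assuming one predictor and symbols chosen here (the D's are 0/1 cluster dummies, with one cluster left out as the reference category):

```latex
y_i = \beta_0 + \beta_1 x_{1i}
      + \gamma_1 D_{1i} + \gamma_2 D_{2i} + \dots + \gamma_j D_{ji}
      + \varepsilon_i
```

Each \gamma shifts the intercept for one cluster relative to the reference cluster, so j dummies cover j+1 clusters. With, say, 100 persons who each rate a handful of vignettes, this already means 99 extra parameters.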

Solution 2: split your micro-level X-vars

Say you have:

then create:

and add both as predictors (instead of x1)

Make sure that you understand what is happening here, and why it is of use.
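Neither expression survived on this slide. A sketch of what the split usually looks like, with case i in cluster j and notation assumed here:

```latex
\text{Say you have: } x_{1ij}
\qquad
\text{then create: }
\bar{x}_{1j} = \frac{1}{n_j} \sum_{i=1}^{n_j} x_{1ij}
\quad \text{and} \quad
d_{1ij} = x_{1ij} - \bar{x}_{1j}

y_{ij} = \beta_0 + \beta_B \, \bar{x}_{1j} + \beta_W \, d_{1ij} + \dots + \varepsilon_{ij}
```

Here \beta_B picks up the effect between clusters (differences in cluster means) and \beta_W the effect within clusters (deviations from the own cluster mean), so the two are allowed to differ.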

Solution 3: the variance component model

In the variance component model, we split the randomness into a "personal part" and a "rest part"
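In symbols, and assuming a single predictor plus the usual normality assumptions (the notation is chosen here, not taken from the slide), this split can be written as:

```latex
y_{ij} = \beta_0 + \beta_1 x_{1ij} + u_j + \varepsilon_{ij} ,
\qquad u_j \sim N(0, \sigma_u^2) ,
\qquad \varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)
```

The term u_j is the "personal part": it is shared by all cases i of person (cluster) j. The term \varepsilon_{ij} is the "rest part". Only two variance parameters are estimated, however many clusters there are.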

Now: how do you do this in Stata?

<See Stata demo> [note to CS: use age and schooling as examples to split at restaurant level]

relevant commands
xtset and xtreg
bys <varA>: egen <meanvarB> = mean(<varB>)
gen dvarB = <varB> - <meanvarB>

convenience commands
tab <var>, gen()
drop
order
des
edit
sum
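For readers who want to check the logic of the bys/egen/gen lines outside Stata, here is a rough Python analogue. The store ids and ages are invented, and the names mean_age and d_age are placeholders, not variables from the course data.

```python
from collections import defaultdict

# Hypothetical employees: (store id, age).
rows = [(1, 20), (1, 24), (1, 28), (2, 30), (2, 40)]

# Analogue of: bys store: egen mean_age = mean(age)
ages_by_store = defaultdict(list)
for store, age in rows:
    ages_by_store[store].append(age)
mean_age = {store: sum(ages) / len(ages) for store, ages in ages_by_store.items()}

# Analogue of: gen d_age = age - mean_age
split = [(store, age, mean_age[store], age - mean_age[store]) for store, age in rows]

for store, age, m, d in split:
    print(store, age, m, d)
# 1 20 24.0 -4.0
# 1 24 24.0 0.0
# 1 28 24.0 4.0
# 2 30 35.0 -5.0
# 2 40 35.0 5.0
```

The store mean is constant within a store and carries the between-store information; the deviation sums to zero within each store and carries the within-store information.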

Up next

• How do we run the "Solution 1", "Solution 2", and "Solution 3" analyses and compare which works best? What about assumption checking?

• We have now seen the random intercept, but how about random slopes?

When you have multi-level data (2 levels)

1. If applicable: consider whether using separate dummies per group might help (use only when this does not create a lot of dummies)

2. Run an empty mixed model (i.e., just the constant included) in Stata. Look at the level on which most of the variance resides.

3. If applicable: divide micro-variables in "group mean" variables and "difference from group mean" variables.

4. Re-run your mixed model with these variables included (as you would a multiple regression analysis)

5. (and note: use regression diagnostics secretly, to find outliers and such)
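For step 2, the empty model and the share of variance at the group level can be written as follows. This is the standard formulation with notation chosen here; the intraclass correlation \rho is not named on the slide.

```latex
y_{ij} = \beta_0 + u_j + \varepsilon_{ij} ,
\qquad
\rho = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_\varepsilon^2}
```

A \rho close to 0 means that almost all variance sits at the individual level and the clustering matters little; a \rho close to 1 means that most of the variance sits at the group level.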

On non-response

Non-response analysis

• Not everyone who is invited is going to participate

• Think about selective non-response: some (kinds of) individuals might be less likely to participate.

How might that influence the results?


Data: TVSFP on influencing behavior

Online as motoroccasion8March2013.dta