Review of Identifying Causal Effects

Methods of Economic Investigation

Lecture 13

Last Term

Classical OLS has 5 main assumptions:

A1. Full Rank: X is a T x k matrix with rank ρ(X) = k ≤ T

A2. Linearity: y = Xβ + ε where E(ε) = 0

A3. Exogeneity: X is exogenous with respect to ε, i.e. E(ε | X) = 0
    A somewhat weaker condition: X is uncorrelated with ε, so that E(X'ε) = 0

A4. Homoskedasticity: E(εε') = σ²I_T

A5. Normality: ε ~ N(0, σ²I_T)

Relaxing the assumptions

Even in finite samples, with these assumptions linear regression is BLUE

Last term you talked about some of the consequences of relaxing some of these assumptions
  Can rely on large-sample properties to get around some of the problems (e.g. A5)
  Can construct more robust estimates which are OK in large samples (e.g. A4)

This term

The biggest problem in estimation is A3

We call this the “Conditional Independence Assumption” or CIA

When it fails, our estimates become inconsistent, which means we cannot rely on large samples to fix the problem

There are lots of ways violations of A3 can happen
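To see what "inconsistent" means in practice, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the data-generating process, the omitted variable `w`, the true slope of 1, and the helper names `ols_slope` and `simulate_slope`. Because `w` enters both the regressor and the error, E(ε | X) ≠ 0, and under this particular setup the OLS slope should settle near 1.5 rather than 1 however large the sample gets.

```python
import random

def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def simulate_slope(n, seed=0):
    """Draw n observations in which x is correlated with the error term."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        w = rng.gauss(0, 1)          # omitted variable (hypothetical)
        x = w + rng.gauss(0, 1)      # regressor depends on w
        eps = w + rng.gauss(0, 1)    # error also depends on w, so A3 fails
        y = 1.0 * x + eps            # true beta is 1 by construction
        xs.append(x)
        ys.append(y)
    return ols_slope(xs, ys)

# The estimate does not drift toward 1 as the sample grows.
slopes = {n: simulate_slope(n) for n in (100, 10_000, 100_000)}
for n, b in slopes.items():
    print(n, round(b, 3))
```

In this setup Cov(x, ε) = 1 and Var(x) = 2, so the probability limit of the slope is 1 + 1/2. More data only makes the estimate more precisely wrong.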

CIA is violated, now what?

We need to figure out what is generating the correlations:
  Measurement error
  Omitted variables
  Selection on unobservables
  Simultaneity of determination
  Correlations across different periods

How do we figure out what’s wrong?

We put this in the context of a “program evaluation”
  In truth, there may not be any “program” per se
  Think of the variable of interest as a “treatment”

For simplicity, we now talk about the variable of interest as an indicator variable that can be zero or one
  In practice it can be continuous
  Then use derivatives rather than differences to calculate changes

The Imaginary Experimental Ideal

Pretend you could run an imaginary experiment
  Not bounded by reality
  Everything observable

How would you construct a test to isolate the effect of your variable of interest T on the outcome of interest Y?

Partitioning the World

Two groups of people in the world:
  People who got the treatment (so T=1)
  People who did not get the treatment (so T=0)

To think of a counterfactual outcome we need to know what would have happened in the absence of the treatment

The Road Not Taken…

We imagine 2 states of the world: one where someone gets T and one where that same person does not

Now let’s define our usual notation

Individual A
  Gets T:          Y1A
  Doesn’t Get T:   Y0A

Our Gold Standard

If we could observe both these states of the world we could know what would have happened in the absence of the treatment

The ONLY DIFFERENCE is the treatment, so we know any difference in the Y’s must be caused by the treatment

This is our Average Treatment Effect: E[Y1A – Y0A]

Back to the real world

Sadly, we do not observe the true counterfactual

What do we observe?
  Y1 for all the people in the treatment group (let them all be A’s)
  Y0 for all the people in the control group (let them all be B’s)

The Difference Estimate

What if we just take the difference between treatment and control? That is, what if we computed:
  E[Y1 | T = 1] – E[Y0 | T = 0]

NOTE: These are now CONDITIONAL expectations because we don’t observe the treated outcome for the control group and vice versa

Decomposing the Difference Estimate

What if we rewrite our difference estimate so that:

E[Y1A | T = 1] – E[Y0B | T = 0]
  = {E[Y1A | T = 1] – E[Y0A | T = 1]} + {E[Y0A | T = 1] – E[Y0B | T = 0]}
  = TOT + Selection Bias
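The decomposition can be checked numerically in a world where, unlike reality, both potential outcomes are visible. The sketch below is only an illustration: the outcome distributions, the average gain of 1, and the self-selection rule (people with a high untreated outcome are more likely to take treatment) are all assumed.

```python
import random
from statistics import mean

rng = random.Random(1)
people = []
for _ in range(50_000):
    y0 = rng.gauss(10, 2)                        # outcome without treatment
    y1 = y0 + 1.0 + rng.gauss(0, 1)              # outcome with treatment (average gain of 1)
    t = 1 if y0 + rng.gauss(0, 2) > 10 else 0    # assumed self-selection on y0
    people.append((y0, y1, t))

treated = [(y0, y1) for y0, y1, t in people if t == 1]
control = [(y0, y1) for y0, y1, t in people if t == 0]

# What we could actually observe: E[Y1 | T=1] - E[Y0 | T=0]
naive = mean(y1 for _, y1 in treated) - mean(y0 for y0, _ in control)

# The two pieces, computable only because the simulation reveals both outcomes
tot = mean(y1 - y0 for y0, y1 in treated)
selection_bias = mean(y0 for y0, _ in treated) - mean(y0 for y0, _ in control)

print(round(naive, 3), round(tot, 3), round(selection_bias, 3))
```

The naive difference equals TOT plus selection bias up to floating-point error. With this selection rule the bias is positive and larger than the treatment effect itself.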

ATE vs. TOT

Let’s look at the two definitions:
  TOT: E[Y1A – Y0A | T = 1]
  ATE: E[Y1A – Y0A]

Why might these differ even if SB = 0?
  Heterogeneous Treatment Effects
  This means there may be an idiosyncratic individual component to the treatment effect
  Not the same as selection, more a function of the actual effect

ATE vs. TOT - 2

Suppose Y1A = μ1 + ξ1 and Y0A = μ0 + ξ0

Then we can rewrite

TOT = E[Y1A – Y0A | T = 1]
    = E[μ1 – μ0 | T = 1] + E[ξ1 – ξ0 | T = 1]

ATE = E[Y1A – Y0A]
    = E[μ1 – μ0] + E[ξ1 – ξ0]
    = {E[μ1 – μ0 | T = 1] + E[ξ1 – ξ0 | T = 1]} * Pr(T = 1)
    + {E[μ1 – μ0 | T = 0] + E[ξ1 – ξ0 | T = 0]} * Pr(T = 0)

So the ATE will be a weighted average of TOT and a treatment effect for the Control group
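A small numerical sketch of this weighted average, under assumed heterogeneous effects: each person has their own gain Y1 – Y0, and people with larger gains are more likely to take treatment. The distributions and the take-up rule are hypothetical, and `tuc` is just a label chosen here for the treatment effect in the control group.

```python
import random
from statistics import mean

rng = random.Random(2)
gains, treat = [], []
for _ in range(40_000):
    gain = rng.gauss(1.0, 1.0)                      # individual effect Y1 - Y0
    t = 1 if gain + rng.gauss(0, 1) > 1.0 else 0    # assumed: larger gains raise take-up
    gains.append(gain)
    treat.append(t)

p_treated = mean(treat)                                   # Pr(T = 1)
tot = mean(g for g, t in zip(gains, treat) if t == 1)     # effect on the treated
tuc = mean(g for g, t in zip(gains, treat) if t == 0)     # effect on the controls
ate = mean(gains)                                         # effect on everyone

weighted = tot * p_treated + tuc * (1 - p_treated)
print(round(ate, 3), round(weighted, 3), round(tot, 3), round(tuc, 3))
```

Here TOT exceeds ATE purely because of who ends up treated, even though no untreated outcome Y0 appears anywhere in the calculation.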

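Visual Representation of Difference

[Figure: groups A (Assigned Treatment, T=1) and B (Not Assigned Treatment, T=0), with potential outcomes Y1A, Y0A, Y1B and Y0B; the figure marks which comparisons correspond to the ATE and to the TOT.]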

MY CARELESS NOTATION

Exercise 2
  ATE was defined as: ATE = E(Y1i – Y0i | Ti = 1)
  This is really ATET
  ATET = ATE if E(Y1i – Y0i | Ti = 1) = E(Y1i – Y0i | Ti = 0)
  This holds when there is no heterogeneity in treatment effects
  In general, these are not the same

Why do we care about TOT?

In the case of an experiment, we can get an estimate of TOT (may not be ATE)

Why? We observe: E[Y1A | T = 1] – E[Y0B | T = 0]

This can be decomposed into two parts:
  E[Y1A – Y0A | T = 1] + {E[Y0A | T = 1] – E[Y0B | T = 0]}

If E[Y0A | T = 1] = E[Y0B | T = 0] then the observed difference in outcomes is our estimate of TOT!

That’s why experiments are good!

If you have an experiment which is randomly assigned with no compliance issues, then we can estimate TOT

If there are compliance issues, then we estimate ITT:
  E[Y1A | T = 1] – E[Y0B | T = 0]
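As a rough check on the random-assignment claim, the sketch below assigns treatment by coin flip with perfect compliance. The outcome distributions and the average gain of 1 are assumed for illustration. Because assignment is unrelated to Y0, the selection-bias term should be close to zero and the observed difference should be close to TOT.

```python
import random
from statistics import mean

rng = random.Random(3)
treated, control = [], []
for _ in range(50_000):
    y0 = rng.gauss(10, 2)                   # outcome without treatment
    y1 = y0 + 1.0 + rng.gauss(0, 1)         # outcome with treatment
    if rng.random() < 0.5:                  # coin-flip assignment, everyone complies
        treated.append((y0, y1))
    else:
        control.append((y0, y1))

naive = mean(y1 for _, y1 in treated) - mean(y0 for y0, _ in control)
tot = mean(y1 - y0 for y0, y1 in treated)
selection_bias = mean(y0 for y0, _ in treated) - mean(y0 for y0, _ in control)

print(round(naive, 3), round(tot, 3), round(selection_bias, 3))
```

The selection-bias term is not exactly zero in any one sample; it is sampling noise that shrinks as the experiment gets larger.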

TOT vs. ITT

ITT may not be the same as TOT (and thus, in the case of random assignment, not the same as ATE) because of compliance

DEVIATION FROM PREVIOUS NOTATION:
  Previously we assumed that if T = 1 then you were both assigned to and received treatment
  Now we need two separate things: T, the assignment to treatment, and R, the receipt of treatment
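A minimal sketch of how ITT can differ from the effect of actually receiving treatment. The setup is assumed: random assignment T, one-sided non-compliance (only 70% of those assigned receive treatment, and nobody in the control group does), take-up unrelated to outcomes, and a constant effect of 2 for anyone who receives it.

```python
import random
from statistics import mean

rng = random.Random(4)
TRUE_EFFECT = 2.0   # assumed constant effect of receiving treatment
TAKE_UP = 0.7       # assumed share of the assigned group that receives it

assigned, received, outcome = [], [], []
for _ in range(100_000):
    t = 1 if rng.random() < 0.5 else 0                      # assignment T
    r = 1 if (t == 1 and rng.random() < TAKE_UP) else 0     # receipt R
    y = rng.gauss(10, 2) + TRUE_EFFECT * r                  # observed outcome
    assigned.append(t)
    received.append(r)
    outcome.append(y)

# ITT compares groups by assignment, ignoring who actually received treatment.
itt = (mean(y for y, t in zip(outcome, assigned) if t == 1)
       - mean(y for y, t in zip(outcome, assigned) if t == 0))

print(round(itt, 3), TAKE_UP * TRUE_EFFECT)
```

Under these assumptions ITT is roughly the take-up rate times the effect, so it understates the effect of treatment on those who receive it.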

Visual Representation of Difference

[Figure: groups A (Assigned Treatment, T=1) and B (Not Assigned Treatment, T=0), each split by receipt of treatment into R=1 and R=0.]

Compare A (orange) to B (blue) = ITT

Compare R=1 (solid) to R=0 (striped) = TOT + SB

Compare A,R=1 (orange solid) to B,R=0 (blue striped) = LATE

Compliance Issues

In the case where R ≠ T, rewrite the observed difference in outcomes as

E[Y1 | R = 1] – E[Y0 | R = 0]
  = E[YA | R = 1, T = 1] * Pr[T = 1 | R = 1]     (Treatment Group Compliers)
  + E[YB | R = 1, T = 0] * Pr[T = 0 | R = 1]     (Always Takers)
  – E[YB | R = 0, T = 0] * Pr[T = 0 | R = 0]     (Control Group Compliers)
  – E[YA | R = 0, T = 1] * Pr[T = 1 | R = 0]     (Never Takers)

Imagine non-compliance is symmetric

Rewrite with Pr[T = 1 | R = 0] = Pr[T = 0 | R = 1] = p

E[Y1 | R = 1] – E[Y0 | R = 0]
  = {E[YA | R = 1, T = 1] – E[YB | R = 0, T = 0]} (1 – p)
  + {E[YB | R = 1, T = 0] – E[YA | R = 0, T = 1]} p
  = (TOT + SB) * (1 – p) + (AT – NT) * p

If SB = 0 (no selection bias, i.e. among compliers, the counterfactual for the treatment group is the same) then this is a weighted average of the TOT and the AT/NT bias.
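The regrouping under symmetric non-compliance is plain arithmetic, which the sketch below checks with made-up cell means. The function names and the numbers are hypothetical; only the formula comes from the slide.

```python
def diff_by_receipt(m_a_r1, m_b_r1, m_b_r0, m_a_r0, p):
    """E[Y | R=1] - E[Y | R=0] from the four cell means,
    with Pr[T=0 | R=1] = Pr[T=1 | R=0] = p."""
    mean_r1 = m_a_r1 * (1 - p) + m_b_r1 * p   # receivers: compliers and always takers
    mean_r0 = m_b_r0 * (1 - p) + m_a_r0 * p   # non-receivers: compliers and never takers
    return mean_r1 - mean_r0

def regrouped(m_a_r1, m_b_r1, m_b_r0, m_a_r0, p):
    """The same quantity written as (TOT + SB)(1 - p) + (AT - NT) p."""
    complier_part = (m_a_r1 - m_b_r0) * (1 - p)
    at_nt_part = (m_b_r1 - m_a_r0) * p
    return complier_part + at_nt_part

# Hypothetical cell means, for illustration only.
cells = dict(m_a_r1=12.0, m_b_r1=13.0, m_b_r0=10.0, m_a_r0=9.0)
for p in (0.0, 0.2, 0.5):
    print(p, diff_by_receipt(p=p, **cells), regrouped(p=p, **cells))
```

With p = 0 the comparison collapses to the complier term alone. As p rises, more of the observed difference reflects the gap between always takers and never takers.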

Roadmap of the course so far:

Hypothetical counterfactual difference
  Experiment
    Perfect Compliance: TOT
    Imperfect Compliance: ITT
  Non-experimental
    Fixed Differences between Groups: Fixed Effect
    Groups with Parallel Trends: Difference-in-Differences
    Groups with similar characteristics: Matching Methods
    Estimands shown for these methods: TOT, TOT/ITT, TOT

What we’ve done so far…

Ways to define a ‘control group’:

Fixed Effect
  Assume: individuals within a group are, on average, the same
  Attribute any within-group difference to treatment

Difference-in-Differences
  Assume: fixed differences over time
  Attribute any change in trend to treatment

Propensity Score Matching
  Assume: treatment, conditional on observables, is as if randomly assigned
  Attribute any difference in outcomes to treatment
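As a reminder of the mechanics, the simplest two-group, two-period difference-in-differences is the change for the treated group minus the change for the control group. The helper name and the group means below are hypothetical, and the result only has a causal reading if the fixed-differences assumption holds.

```python
def diff_in_diff(treat_pre, treat_post, control_pre, control_post):
    """Change in the treated group minus change in the control group."""
    return (treat_post - treat_pre) - (control_post - control_pre)

# Hypothetical group means: the groups start at different levels (10 vs 8),
# which is fine as long as that gap would have stayed fixed without treatment.
estimate = diff_in_diff(treat_pre=10.0, treat_post=15.0,
                        control_pre=8.0, control_post=10.0)
print(estimate)
```

The control group's change of 2 stands in for what would have happened to the treated group without treatment, leaving 3 attributed to treatment.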

Next time…

Instrumental Variables
  What are they?
  What about LATE?
  How to estimate
