Review of Identifying Causal Effects
Methods of Economic Investigation
Lecture 13
Last Term
  Classical OLS has 5 main assumptions:
  A1. Full rank: X is a T x k matrix with rank ρ(X) = k ≤ T
  A2. Linearity: y = Xβ + ε, where E(ε) = 0
  A3. Exogeneity: X is exogenous with respect to ε, i.e. E(ε | X) = 0
    A somewhat weaker condition: X is uncorrelated with ε, so that E(X'ε) = 0
  A4. Homoskedasticity: E(εε') = σ²I_T
  A5. Normality: ε ~ N(0, σ²I_T)
Relaxing the assumptions
  Even in finite samples, under these assumptions OLS is BLUE (the best linear unbiased estimator)
  Last term you talked about some of the consequences of relaxing some of these assumptions
    Can rely on large-sample properties to get around some of the problems (e.g. a failure of A5)
    Can construct more robust estimates that remain valid in large samples (e.g. when A4 fails)
This term
  The biggest problem in estimation is A3
    We call this the “Conditional Independence Assumption” or CIA
    When it fails, our estimates become inconsistent, which means we cannot rely on large samples to fix the problem
  There are lots of ways violations of A3 can happen
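To make the inconsistency concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration (the data-generating process, the coefficient values, and the function names `ols_slope` and `simulate_slope`); none of it comes from the lecture. The error term contains an omitted variable z that also drives x, so E(ε | X) ≠ 0. With these particular numbers the OLS slope should settle near 1 + 2·Var(z)/Var(x) = 2 rather than the true value of 1, however large the sample.

```python
import random

def ols_slope(x, y):
    """Slope from a one-regressor OLS fit with intercept: cov(x, y) / var(x)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate_slope(n, seed=0):
    """Simulate y = 1*x + e, where e contains an omitted variable z that also drives x."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)          # omitted variable (unobserved by the analyst)
        xi = z + rng.gauss(0, 1)     # regressor is correlated with z
        e = 2 * z + rng.gauss(0, 1)  # error term contains z, so E(e | x) != 0
        x.append(xi)
        y.append(1.0 * xi + e)       # the true beta is 1
    return ols_slope(x, y)

if __name__ == "__main__":
    # A larger sample tightens the estimate around the wrong value; it does not remove the bias.
    for n in (1_000, 100_000):
        print(n, round(simulate_slope(n), 3))
```

A larger n only shrinks the noise around the biased value, which is the sense in which large samples cannot fix a violation of A3.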
CIA is violated, now what?
  We need to figure out what is generating the correlation between X and ε:
    Measurement error
    Omitted variables
    Selection on unobservables
    Simultaneity of determination
    Correlations across different periods
How do we figure out what’s wrong?
  We put this in the context of a “program evaluation”
    In truth, there may not be any “program” per se
    Think of the variable of interest as a “treatment”
  For simplicity, we now talk about the variable of interest as an indicator variable that can be zero or one
    In practice it can be continuous
    Then use derivatives rather than differences to calculate changes
The Imaginary Experimental Ideal
  Pretend you could run an imaginary experiment
    Not bounded by reality
    Everything observable
  How would you construct a test to isolate the effect of your variable of interest T on the outcome of interest Y?
Partitioning the World
  Two groups of people in the world
    People who got the treatment (so T=1)
    People who did not get the treatment (so T=0)
  To think of a counterfactual outcome we need to know what would have happened in the absence of the treatment
The Road Not Taken…
  We imagine 2 states of the world: one where someone gets T and one where that same person does not
  Now let’s define our usual notation for Individual A:
    Gets T: Y1A
    Doesn’t get T: Y0A
Our Gold Standard
  If we could observe both these states of the world we could know what would have happened in the absence of the treatment
  The ONLY DIFFERENCE is the treatment, so we know any difference in the Y’s must be caused by the treatment
  This is our Average Treatment Effect: E[Y1A – Y0A]
Back to the real world
  Sadly, we do not observe the true counterfactual
  What do we observe?
    Y1 for all the people in the treatment group (let them all be A’s)
    Y0 for all the people in the control group (let them all be B’s)
The Difference Estimate
  What if we just take the difference between treatment and control? That is, what if we computed:
    E[Y1 | T = 1] – E[Y0 | T = 0]
  NOTE: These are now CONDITIONAL expectations because we do not observe Y1 for the control group or Y0 for the treatment group
Decomposing the Difference Estimate
  What if we rewrite our difference estimate so that:
    E[Y1A | T = 1] – E[Y0B | T = 0]
      = {E[Y1A | T = 1] – E[Y0A | T = 1]} + {E[Y0A | T = 1] – E[Y0B | T = 0]}
      = TOT + Selection Bias
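The decomposition can be checked by hand on a toy data set. The sketch below uses made-up potential outcomes (all numbers and variable names are hypothetical) and pretends we can see both Y0 and Y1 for everyone, which is exactly what real data do not allow.

```python
from statistics import mean

# Hypothetical potential outcomes (y0, y1) per person; the numbers are made up.
treated = [(5, 8), (6, 9), (7, 11)]   # group A, T = 1: in real data only y1 is observed
control = [(2, 4), (3, 5), (4, 7)]    # group B, T = 0: in real data only y0 is observed

# What we can actually compute from observed data: E[Y1 | T = 1] - E[Y0 | T = 0]
observed_diff = mean(y1 for _, y1 in treated) - mean(y0 for y0, _ in control)

# The two unobservable pieces of the decomposition
tot = mean(y1 - y0 for y0, y1 in treated)                                   # E[Y1A - Y0A | T = 1]
selection_bias = mean(y0 for y0, _ in treated) - mean(y0 for y0, _ in control)

print(observed_diff, tot, selection_bias)
```

In this toy example the treated group would have had higher outcomes even without treatment, so the raw difference overstates the TOT by the selection bias term.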
ATE vs. TOT
  Let’s look at the two definitions:
    TOT: E[Y1A – Y0A | T = 1]
    ATE: E[Y1A – Y0A]
  Why might these differ even if SB = 0?
    Heterogeneous treatment effects
    This means there may be an idiosyncratic individual component to the treatment effect
    Not the same as selection, more a function of the actual effect
ATE vs. TOT - 2
  Suppose Y1A = μ1 + ξ1 and Y0A = μ0 + ξ0
  Then we can rewrite:
    TOT = E[Y1A – Y0A | T=1]
        = E[μ1 – μ0 | T=1] + E[ξ1 – ξ0 | T=1]
    ATE = E[Y1A – Y0A] = E[μ1 – μ0] + E[ξ1 – ξ0]
        = {E[μ1 – μ0 | T=1] + E[ξ1 – ξ0 | T=1]} * Pr(T=1)
        + {E[μ1 – μ0 | T=0] + E[ξ1 – ξ0 | T=0]} * Pr(T=0)
  So the ATE is a weighted average of the TOT and the treatment effect for the control group
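A small numerical sketch of the weighted-average statement, again with made-up potential outcomes and hypothetical names (`avg_effect`, `effect_on_controls`). The individual effects differ across people, so TOT and ATE differ even though nothing here involves selection bias.

```python
from statistics import mean

# Hypothetical potential outcomes (y0, y1); effects are heterogeneous by construction.
treated = [(5, 8), (6, 10)]                    # individual effects: 3, 4
control = [(2, 4), (3, 5), (4, 5), (3, 4)]     # individual effects: 2, 2, 1, 1

def avg_effect(group):
    """Average of y1 - y0 over a group (only computable with both potential outcomes)."""
    return mean(y1 - y0 for y0, y1 in group)

p_treated = len(treated) / (len(treated) + len(control))   # Pr(T = 1)

tot = avg_effect(treated)                  # effect on the treated
effect_on_controls = avg_effect(control)   # effect the controls would have had
ate = avg_effect(treated + control)        # effect over everyone

weighted = tot * p_treated + effect_on_controls * (1 - p_treated)
print(tot, effect_on_controls, ate, weighted)
```

The weights are the group shares, so the ATE moves toward the TOT as the treated share of the population grows.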
[Figure: Visual Representation of Difference. Group A is Assigned Treatment (T=1) and group B is Not Assigned Treatment (T=0); the potential outcomes shown are Y1A, Y0A, Y1B, and Y0B, with the ATE and TOT marked.]
MY CARELESS NOTATION: Exercise 2
  ATE was defined as: ATE = E(Y1i – Y0i | Ti = 1)
  This is really ATET
  ATET = ATE if E(Y1i – Y0i | Ti = 1) = E(Y1i – Y0i | Ti = 0)
    No heterogeneity in treatment effects
  In general, these are not the same
Why do we care about TOT?
  In the case of an experiment, we can get an estimate of TOT (which may not be ATE)
  Why?
    We observe: E[Y1A | T = 1] – E[Y0B | T = 0]
    This can be decomposed into two parts:
      E[Y1A – Y0A | T = 1] + {E[Y0A | T = 1] – E[Y0B | T = 0]}
    If E[Y0A | T = 1] = E[Y0B | T = 0], then the observed difference in outcomes is our estimate of TOT!
That’s why experiments are good!
  If you have an experiment which is randomly assigned with no compliance issues, then we can estimate TOT
  If there are compliance issues, then we estimate ITT:
    E[Y1A | T = 1] – E[Y0B | T = 0]
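A simulation sketch of why random assignment helps, under assumptions chosen purely for illustration (normally distributed untreated outcomes, a 50/50 coin flip for assignment, and a self-selection rule where people with high Y0 opt in). The selection-bias term E[Y0 | T = 1] – E[Y0 | T = 0] is computable here only because the data are simulated.

```python
import random

def avg(values):
    values = list(values)
    return sum(values) / len(values)

def selection_bias(y0, t):
    """E[Y0 | T = 1] - E[Y0 | T = 0] for simulated data where Y0 is known for everyone."""
    return (avg(v for v, ti in zip(y0, t) if ti == 1)
            - avg(v for v, ti in zip(y0, t) if ti == 0))

rng = random.Random(13)
n = 50_000
y0 = [rng.gauss(10, 2) for _ in range(n)]   # hypothetical untreated outcomes

# Assignment by coin flip: unrelated to y0 by construction
t_random = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
# Self-selection: people with above-average y0 choose treatment
t_selected = [1 if v > 10 else 0 for v in y0]

sb_random = selection_bias(y0, t_random)
sb_selected = selection_bias(y0, t_selected)
print(round(sb_random, 3), round(sb_selected, 3))
```

Under the coin flip the selection-bias term should be close to zero, so the observed difference estimates the TOT; under self-selection it is large and the observed difference is contaminated.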
TOT vs. ITT
  ITT may not be the same as TOT (and thus, in the case of random assignment, not the same as ATE) because of compliance
  DEVIATION FROM PREVIOUS NOTATION:
    Before, we assumed that if T = 1 then you were both assigned to and received treatment
    Now we need two separate things: T, the assignment to treatment, and R, the receipt of treatment
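A toy illustration of the T versus R distinction, with made-up numbers and several simplifying assumptions that are not from the lecture: a constant effect of 4 for anyone who actually receives treatment, identical Y0 values in the two assignment arms (as if randomized), and non-compliance only among those assigned.

```python
from statistics import mean

EFFECT = 4  # hypothetical constant effect of actually receiving treatment

# (y0, assigned T, received R); made-up numbers
people = [
    (5, 1, 1), (6, 1, 1), (7, 1, 1), (8, 1, 0),   # assigned; the last person does not take it up
    (5, 0, 0), (6, 0, 0), (7, 0, 0), (8, 0, 0),   # not assigned; nobody receives treatment
]

def observed(y0, r):
    """Observed outcome: untreated outcome plus the effect if treatment was received."""
    return y0 + EFFECT * r

# ITT: compare by assignment T, ignoring who actually received treatment
itt = (mean(observed(y0, r) for y0, t, r in people if t == 1)
       - mean(observed(y0, r) for y0, t, r in people if t == 0))

# Comparing by receipt R instead mixes the effect with selection into take-up
by_receipt = (mean(observed(y0, r) for y0, t, r in people if r == 1)
              - mean(observed(y0, r) for y0, t, r in people if r == 0))

take_up = mean(r for _, t, r in people if t == 1)   # share of the assigned who receive
print(itt, by_receipt, take_up)
```

In this example the ITT is diluted to the effect times the take-up rate, while the comparison by receipt misses the true effect because the person who declined had a different Y0 from those who accepted.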
[Figure: Visual Representation of Difference. Group A is Assigned Treatment (T=1) and group B is Not Assigned Treatment (T=0); each group is split into R=1 and R=0.]
Compare A (orange) to B (blue) = ITT
Compare R=1 (solid) to R=0 (striped) = TOT + SB
Compare A,R=1 (orange solid) to B,R=0 (blue striped) = LATE
Compliance Issues
  In the case where R ≠ T, rewrite the observed difference in outcomes as:
    E[Y1 | R = 1] – E[Y0 | R = 0]
      = E[YA | R = 1, T = 1] * Pr[T=1 | R=1]   (Treatment Group Compliers)
      + E[YB | R = 1, T = 0] * Pr[T=0 | R=1]   (Always Takers)
      – E[YB | R = 0, T = 0] * Pr[T=0 | R=0]   (Control Group Compliers)
      – E[YA | R = 0, T = 1] * Pr[T=1 | R=0]   (Never Takers)
Imagine non-compliance is symmetric
  Rewrite with Pr[T=1 | R=0] = Pr[T=0 | R=1] = p:
    E[Y1 | R = 1] – E[Y0 | R = 0]
      = {E[YA | R = 1, T = 1] – E[YB | R = 0, T = 0]} (1 – p)
      + {E[YB | R = 1, T = 0] – E[YA | R = 0, T = 1]} p
      = (TOT + SB) * (1 – p) + (AT – NT) p
  If SB = 0 (no selection bias, i.e. among compliers, the counterfactual for the treatment group is the same as the control group outcome), then this is a weighted average of the TOT and the AT/NT bias
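The symmetric non-compliance expression can be verified on a tiny made-up data set. The numbers and helper names (`cell_mean`, `share`) are hypothetical; the data are constructed so that Pr[T=1 | R=0] = Pr[T=0 | R=1] = 0.25.

```python
from statistics import mean

# (assigned T, received R, observed Y); made-up numbers
data = [
    (1, 1, 10), (1, 1, 12), (1, 1, 11),   # treatment group compliers
    (0, 1, 15),                           # always taker: not assigned, but receives
    (0, 0, 6), (0, 0, 7), (0, 0, 8),      # control group compliers
    (1, 0, 4),                            # never taker: assigned, but does not receive
]

def cell_mean(t, r):
    """E[Y | T = t, R = r]."""
    return mean(y for ti, ri, y in data if ti == t and ri == r)

def share(t, r):
    """Pr[T = t | R = r]."""
    in_r = [ti for ti, ri, _ in data if ri == r]
    return sum(1 for ti in in_r if ti == t) / len(in_r)

# Left-hand side: raw comparison by receipt of treatment
lhs = mean(y for _, r, y in data if r == 1) - mean(y for _, r, y in data if r == 0)

# Right-hand side: complier comparison weighted by (1 - p) plus the AT - NT term weighted by p
p = share(1, 0)   # equals share(0, 1) by construction
rhs = ((cell_mean(1, 1) - cell_mean(0, 0)) * (1 - p)
       + (cell_mean(0, 1) - cell_mean(1, 0)) * p)
print(lhs, rhs, p)
```

Both sides agree, and the always-taker versus never-taker gap contributes to the raw comparison in proportion to p.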
Roadmap of the course so far:
[Diagram: Hypothetical counterfactual difference
  Experiment
    Perfect Compliance: TOT
    Imperfect Compliance: ITT
  Non-experimental
    Fixed Differences between Groups: Fixed Effect
    Groups with Parallel Trends: Difference-in-Differences
    Groups with similar characteristics: Matching Methods
  Estimands shown for the non-experimental methods: TOT, TOT/ITT, TOT]
What we’ve done so far…
  Ways to define a ‘control group’
    Fixed Effect
      Assume: individuals within a group are, on average, the same
      Attribute any within-group difference to treatment
    Difference-in-Differences
      Assume: fixed differences over time
      Attribute any change in trend to treatment
    Propensity Score Matching
      Assume: treatment, conditional on observables, is as if randomly assigned
      Attribute any difference in outcomes to treatment
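As a reminder of the arithmetic behind the difference-in-differences idea, here is a minimal sketch on made-up group-by-period means. The numbers, the dictionary layout, and the function name `diff_in_diff` are all hypothetical, and the calculation is only meaningful under the fixed-differences (parallel trends) assumption.

```python
# Hypothetical mean outcomes by group and period; the numbers are made up.
means = {
    ("treated", "pre"): 10.0, ("treated", "post"): 15.0,
    ("control", "pre"): 8.0,  ("control", "post"): 10.0,
}

def diff_in_diff(m):
    """Change over time for the treated group minus the change for the control group."""
    change_treated = m[("treated", "post")] - m[("treated", "pre")]
    change_control = m[("control", "post")] - m[("control", "pre")]
    return change_treated - change_control

did = diff_in_diff(means)
print(did)
```

The fixed gap between the groups in the pre period drops out, and only the difference in changes is attributed to treatment.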
Next time…
  Instrumental Variables
    What are they?
    What about LATE?
    How to estimate