Week 6: Synthetic Control Method

Jack Blumenau

2021. 3. 19.


Outline

Motivation

Synthetic Control

Inference

Additional application

Conclusion


Motivation


If an instance in which the phenomenon under investigation occurs and an instance in which it does not occur, have every circumstance in common save one, that one occurring only in the former, the circumstance in which alone the two instances differ, is the effect, or the cause, or an indispensable part of the cause, of the phenomenon.

J.S. Mill on the “Method of Difference”


Comparative case studies have a long history in applied political science:

• Qualitative: “thick” description of the context/features of two or more instances of specific phenomena. Aim to describe contrasts or similarities across the cases and reason inductively about causality

• Quantitative: more explicitly causal, using aggregate data from one treated unit and a small set of control units. Often based on ‘natural experiments’ where a shock affects one unit, but not others.


Quantitative Comparative Case Studies

Goal:

• Estimate effects of events or policy interventions that take place at an aggregate level

• Types of unit: cities, states, countries, etc

• Types of intervention: passage of laws, economic shocks, etc

Approach:

• Compare the evolution of an aggregate outcome for the unit affected by the intervention to the evolution of the same outcome for some control group

• e.g. Card (1990), Card and Krueger (1994), Abadie and Gardeazabal (2003)


Advantages:

• Policy interventions often take place at an aggregate level

• Aggregate/macro data are often available

Problems:

• Reasons for selection of control group are often ambiguous

• Standard errors do not reflect uncertainty about the ability of the control group to reproduce the counterfactual of interest

Solution:

• If you don’t have a good control group: synthesize one


Running example

Reunification of West and East Germany

What were the economic effects of reunification on the West German economy? Many economic historians argue that reunification had large economic costs, but identification is difficult because there is no obvious country with which we can compare the growth trajectory of West Germany. Abadie et al (2015) estimate the effects of reunification by comparing the actual time series for West Germany with a synthetic control group which provides the counterfactual.

• Outcome: GDP per capita (inflation adjusted)

• Treatment: Reunification (1 for W. Germany after 1990, 0 otherwise)

• Time: Years (1960 to 2003)


What should be the control group?

What is the most appropriate control group for evaluating the effects of reunification on West Germany in 1990?

• Geographical/cultural: Austria?

• Economic: USA?

• Average: OECD countries?

The choice of the control group matters!

[Figure: GDP per capita by year, 1960 to 2000. Series: West Germany, rest of OECD sample. Reunification marked.]

[Figure: GDP per capita by year, 1960 to 2000. Series: West Germany, mean of OECD sample. Reunification marked.]

[Figure: GDP per capita by year, 1960 to 2000. Series: West Germany, Austria. Reunification marked.]

[Figure: GDP per capita by year, 1960 to 2000. Series: West Germany, USA. Reunification marked.]

[Figure: GDP per capita by year, 1960 to 2000. Series: West Germany, USA and Austria. Reunification marked.]


Synthetic control moves away from using a single control unit or a simple average of control units.

Instead we use a weighted average of the set of control or “donor” units.

Rather than assuming that either the USA or Austria are similar to W. Germany, we calculate a weighted average (the synthetic control) which is more similar to West Germany than any individual country.

Intuition

When we only have a few aggregate units, a ‘synthetic’ combination of control units may do a better job of reproducing the characteristics of the treated unit than any one unit alone.


Synthetic Control


Notation

Definition

For units $j \in \{1, ..., J + 1\}$:

• Unit 1 is the unit of interest (which receives the treatment)

• Units 2 to $J + 1$ are the ‘donor pool’ or potential comparison units

Time periods $t \in \{1, ..., T\}$:

• Pre-treatment period: $t = 1, ..., T_0$

• Post-treatment period: $t = T_0 + 1, ..., T$


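Estimand

$$\tau_{1t} = Y^I_{1t} - Y^N_{1t} = Y_{1t} - Y^N_{1t} \quad \text{for all } t > T_0$$

i.e. the treatment effect on the treated unit in the post-treatment periods, where $Y^I_{1t}$ is the outcome of unit 1 under the intervention and $Y^N_{1t}$ is its outcome without the intervention.

Problem

We cannot observe $Y^N_{1t}$ for $t > T_0$.

Why? → Fundamental problem of causal inference.

The critical question, as always, is how should we impute $Y^N_{1t}$?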


Imputing $Y^N_{1t}$

1. Matching

• For each time period $t$, find the $M$ ‘closest’ units to unit 1 and average the observed outcomes:

$$\hat{Y}^N_{1,t=1} = \frac{1}{M} \sum_{m=1}^{M} Y_{j_m(1),t=1}$$

2. Diff-in-diff

• Add the average change in outcome for the control group to the treated unit’s outcome in the pre-treatment period

$$\hat{Y}^N_{1,t=1} = Y_{1,t=0} + (\bar{Y}_{0,t=1} - \bar{Y}_{0,t=0})$$

3. Synthetic control

• Take a weighted average of the outcomes of the donor units

• Weights defined by closeness to the trend of the outcome for the treated unit in the pre-treatment period

$$\hat{Y}^N_{1,t=1} = \sum_{j=2}^{J+1} w^*_j Y_{j,t=1}$$
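The three imputation rules can be compared side by side on a toy example. The sketch below is only illustrative: the numbers, the function names, and the choice to measure ‘closeness’ by the pre-treatment outcome are assumptions, not part of the lecture.

```python
# Hypothetical toy data: one pre-treatment period (t=0) and one post-treatment period (t=1).
treated_pre = 10.0
donors_pre = [9.0, 12.0, 10.5, 7.0]
donors_post = [10.0, 13.5, 11.5, 8.0]


def impute_matching(treated_pre, donors_pre, donors_post, m):
    """Average the post-period outcomes of the m donors closest to the treated unit.

    Assumption: closeness is measured by the pre-treatment outcome only.
    """
    order = sorted(range(len(donors_pre)),
                   key=lambda j: abs(donors_pre[j] - treated_pre))
    nearest = order[:m]
    return sum(donors_post[j] for j in nearest) / m


def impute_did(treated_pre, donors_pre, donors_post):
    """Treated pre-period outcome plus the average change among the donors."""
    mean_pre = sum(donors_pre) / len(donors_pre)
    mean_post = sum(donors_post) / len(donors_post)
    return treated_pre + (mean_post - mean_pre)


def impute_synthetic(weights, donors_post):
    """Weighted average of donor post-period outcomes."""
    return sum(w * y for w, y in zip(weights, donors_post))


# Hypothetical weights; in practice they are estimated from pre-treatment data.
weights = [0.25, 0.25, 0.5, 0.0]

y_matching = impute_matching(treated_pre, donors_pre, donors_post, m=2)
y_did = impute_did(treated_pre, donors_pre, donors_post)
y_synth = impute_synthetic(weights, donors_post)
```

All three functions return an imputed value for the unobserved outcome; they differ only in how the donor units are combined.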


Defining the synthetic control

Definition (Synthetic control)

A synthetic control is a vector of weights, $W$, with one weight for each of the $J$ available donor units.

Compare to our examples above. $W$ is a vector with…

• …equal weight for each unit (OECD average)

• …0 weight for all units, except Austria where $w_j = 1$ (Austria)

• …0 weight for all units, except USA where $w_j = 1$ (USA)

• …0 weight for all units, except USA where $w_j = .5$ and Austria where $w_j = .5$ (USA and Austria)

There are many potential synthetic controls! The goal is to select $W$ such that the characteristics of the treated unit are best resembled by the characteristics of the synthetic control.


Estimating $W$

For each donor unit, define a weight, collected in $W = \{w_2, w_3, ..., w_{J+1}\}$, where:

$$\sum_{j=2}^{J+1} w_j = 1$$

and

$$0 \leq w_j \leq 1 \quad \forall j \in \{2, ..., J+1\}$$

Goal: Find values for $w_j$ which make treatment and control units as similar as possible.
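The comparison groups considered earlier are all special cases of such a weight vector. The sketch below builds them and checks the two constraints; the helper names are hypothetical, and the donor list is the 16-country pool used in the running example.

```python
DONORS = ["USA", "UK", "Austria", "Belgium", "Denmark", "France", "Italy",
          "Netherlands", "Norway", "Switzerland", "Japan", "Greece",
          "Portugal", "Spain", "Australia", "New Zealand"]


def equal_weights(donors):
    """Every donor gets the same weight (a simple average)."""
    return {country: 1.0 / len(donors) for country in donors}


def weights_on(donors, chosen):
    """Zero weight everywhere except the countries listed in `chosen`."""
    return {country: chosen.get(country, 0.0) for country in donors}


def is_valid_synthetic_control(weights, tol=1e-9):
    """Check that each weight lies in [0, 1] and that the weights sum to one."""
    in_unit_interval = all(0.0 <= w <= 1.0 for w in weights.values())
    sums_to_one = abs(sum(weights.values()) - 1.0) < tol
    return in_unit_interval and sums_to_one


oecd_average = equal_weights(DONORS)
austria_only = weights_on(DONORS, {"Austria": 1.0})
usa_only = weights_on(DONORS, {"USA": 1.0})
usa_and_austria = weights_on(DONORS, {"USA": 0.5, "Austria": 0.5})

# Sums to one but leaves the [0, 1] range, so it is not a valid synthetic control.
extrapolating = weights_on(DONORS, {"USA": 1.5, "Austria": -0.5})
```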


We want $w_j$ such that treatment/control units are similar in terms of:

• Pre-intervention outcome values

$$Y_{1,t} \approx \sum_{j=2}^{J+1} w_j Y_{j,t} \quad \text{for all } t \in \{1, ..., T_0\}$$

• Covariates that are predictive of post-intervention outcomes

$$Z_1 \approx \sum_{j=2}^{J+1} w_j Z_j$$

The idea is to give more weight to units in the donor pool that closely approximate the treated unit in the pre-intervention period.


Design decisions

1. Which variables should be included in $Z_j$?

• Those that reflect the most important determinants of the outcome

• Can use either time-varying or time-invariant covariates (R will average the time-varying values)

2. Which units should be included in the donor pool?

• Units whose outcome is determined in the same way as the treated unit

• Control units should not become treated in any of the post-treatment periods

• Control units should not be subject to idiosyncratic shocks in the post-treatment period

Estimating $W$

We find the values of $W$ by minimizing the following expression:

$$\sum_{m=1}^{k} v_m (X_{1m} - X_{0m} W)^2$$

where

1. $X_1 = \{Z_1, Y_{1,1}, Y_{1,2}, ..., Y_{1,T_0}\}$ and $X_0$ is a matrix containing the same information for each of the control units

2. $v_m$ is a weight that reflects the importance of the $m$th variable that we use to measure the distance between treated and control units
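To make the minimization concrete, here is a minimal sketch that solves it for given importance weights $v_m$, subject to the weights being non-negative and summing to one. It uses projected gradient descent as a stand-in optimizer, which is an assumption of this example and not necessarily how the Synth package solves the problem. All names and numbers are hypothetical.

```python
def project_to_simplex(values):
    """Euclidean projection onto {w : w_j >= 0, sum_j w_j = 1}."""
    ordered = sorted(values, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for i, u in enumerate(ordered, start=1):
        cumulative += u
        candidate = (cumulative - 1.0) / i
        if u - candidate > 0:
            theta = candidate
    return [max(x - theta, 0.0) for x in values]


def weighted_loss(x1, x0, v, w):
    """sum_m v_m * (X_1m - X_0m W)^2, with x0[m][j] the value of variable m for donor j."""
    total = 0.0
    for m in range(len(x1)):
        fitted = sum(x0[m][j] * w[j] for j in range(len(w)))
        total += v[m] * (x1[m] - fitted) ** 2
    return total


def fit_weights(x1, x0, v, iterations=5000):
    """Minimize the weighted loss over the simplex by projected gradient descent."""
    n_vars, n_donors = len(x1), len(x0[0])
    w = [1.0 / n_donors] * n_donors  # start from the simple average
    # Step size from an upper bound on the curvature of the quadratic loss.
    bound = 2.0 * sum(v[m] * sum(x * x for x in x0[m]) for m in range(n_vars))
    step = 1.0 / bound
    for _ in range(iterations):
        residuals = [x1[m] - sum(x0[m][j] * w[j] for j in range(n_donors))
                     for m in range(n_vars)]
        gradient = [-2.0 * sum(v[m] * residuals[m] * x0[m][j] for m in range(n_vars))
                    for j in range(n_donors)]
        w = project_to_simplex([w[j] - step * gradient[j] for j in range(n_donors)])
    return w


# Hypothetical data: 3 variables (rows) for 3 donors (columns). The treated unit
# is constructed as 0.5 * donor 1 + 0.3 * donor 2 + 0.2 * donor 3.
x0 = [[2.0, 1.0, 3.0],
      [1.0, 3.0, 2.0],
      [3.0, 2.0, 1.0]]
x1 = [1.9, 1.8, 2.3]
v = [0.5, 0.3, 0.2]

w_hat = fit_weights(x1, x0, v)
```

Because the toy treated unit is an exact weighted average of the donors, the fitted weights recover that combination and the loss falls to essentially zero; with real data the fit is only approximate.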


Estimating 𝑣𝑚

We also need to establish which variables get the largest weights ($v_m$). To do so, we use cross-validation:

1. Split the pre-treatment period into a training period (1960-1980) and a validation period (1981-1990)

2. Using training period data, select $v_m$ such that $W$ minimizes the root mean squared prediction error for the validation period

$$RMSPE = \sqrt{\frac{1}{T_0} \sum_{t=1}^{T_0} \Big( Y_{1t} - \sum_{j=2}^{J+1} w_j Y_{jt} \Big)^2}$$

Implications:

1. Selects $v_m$ that minimizes out-of-sample prediction errors

2. $v_m$ indicate which covariates are most predictive of the outcome

3. Most weight ($w_j$) is put on control units which are similar to the treated unit on covariates ($Z_1, Z_0$) that are predictive of the outcome ($Y_{1,t}, Y_{2,t}, ..., Y_{J+1,t}$) in the pre-intervention period ($t \leq T_0$)
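The selection step can be sketched as follows. This is a simplified illustration: instead of searching over $v_m$, it assumes a few candidate donor-weight vectors have already been fitted on the training period and simply picks the one with the lowest RMSPE on validation-period outcomes. Names and data are hypothetical.

```python
import math


def rmspe(treated, donors, weights):
    """Root mean squared prediction error between treated and synthetic outcomes.

    treated: outcomes of the treated unit, one per period
    donors:  one list of outcomes per donor unit, same periods
    weights: one weight per donor unit
    """
    squared_errors = []
    for t in range(len(treated)):
        synthetic = sum(w * donor[t] for w, donor in zip(weights, donors))
        squared_errors.append((treated[t] - synthetic) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical validation-period outcomes.
treated_validation = [10.0, 11.0, 12.0]
donors_validation = [[9.0, 10.0, 11.0],
                     [12.0, 13.0, 14.0]]

# Hypothetical candidates, e.g. each obtained under a different choice of v.
candidates = [[1.0, 0.0], [0.5, 0.5], [0.75, 0.25]]

scores = [rmspe(treated_validation, donors_validation, w) for w in candidates]
best_weights = candidates[scores.index(min(scores))]
```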

Estimating $W$

SC is, at heart, a sort of difference-in-differences matching estimator.

• Diff-in-diff: establish a control group which follows a parallel trend in the absence of treatment (note we are still assuming that the parallel trend would continue post-treatment!)

• Matching: $w_j$ calculated using observed pre-treatment covariates

Overall, SC tries to find the weighted counterfactual that minimises the distance, in terms of time-invariant characteristics and pre-treatment outcomes, between the treated unit and the synthetic control.

Estimating $W$ (Intuition)

Goal: minimize difference in outcome trend in pre-treatment period.

[Figure, repeated over successive frames: GDP difference (Real − Synthetic) by year, 1960 to 1990, with reunification marked, shown next to the country weights for that frame. The final frame appears twice in the slides.]

Country        Frame 1   Frame 2   Frame 3   Frame 4   Frame 5
Austria        0.06      0.14      0.18      0.28      0.42
USA            0.06      0.07      0.11      0.10      0.22
Japan          0.06      0.08      0.07      0.07      0.16
Switzerland    0.06      0.07      0.05      0.09      0.11
Netherlands    0.06      0.08      0.13      0.08      0.09
UK             0.06      0.06      0.07      0.00      0.00
Belgium        0.06      0.04      0.04      0.03      0.00
Denmark        0.06      0.06      0.04      0.07      0.00
France         0.06      0.06      0.04      0.03      0.00
Italy          0.06      0.05      0.00      0.00      0.00
Norway         0.06      0.05      0.04      0.03      0.00
Greece         0.06      0.04      0.07      0.07      0.00
Portugal       0.06      0.04      0.01      0.03      0.00
Spain          0.06      0.06      0.04      0.03      0.00
Australia      0.06      0.06      0.04      0.03      0.00
New Zealand    0.06      0.03      0.07      0.07      0.00

The Synth package in R will automate this optimization problem for us.

Interpreting $W$ (country weights)

Country weights in synthetic Germany:

Country        Weight
Austria        0.42
USA            0.22
Japan          0.16
Switzerland    0.11
Netherlands    0.09
UK             0.00
Belgium        0.00
Denmark        0.00
France         0.00
Italy          0.00
Norway         0.00
Greece         0.00
Portugal       0.00
Spain          0.00
Australia      0.00
New Zealand    0.00


Country weights in synthetic Germany (SC and OLS):

Country        Synth    Reg
Austria        0.42     0.26
USA            0.22     0.13
Japan          0.16     0.19
Switzerland    0.11     0.05
Netherlands    0.09     0.14
UK             0.00     0.06
Belgium        0.00     -0.00
Denmark        0.00     0.08
France         0.00     0.04
Italy          0.00     -0.05
Norway         0.00     0.04
Greece         0.00     -0.09
Portugal       0.00     -0.08
Spain          0.00     -0.01
Australia      0.00     0.12
New Zealand    0.00     0.12

Regression weights can be greater than 1 or less than zero → extrapolation outside of the support of control units.

Extrapolation is not possible in the SC case because the weights are bounded between 0 and 1.
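The contrast can be checked directly from the two columns of weights reported above. The sketch below copies those values and flags the countries whose regression weight is negative; the variable names are hypothetical.

```python
# Weights copied from the table above (synthetic control vs. regression).
synth_weights = {
    "Austria": 0.42, "USA": 0.22, "Japan": 0.16, "Switzerland": 0.11,
    "Netherlands": 0.09, "UK": 0.00, "Belgium": 0.00, "Denmark": 0.00,
    "France": 0.00, "Italy": 0.00, "Norway": 0.00, "Greece": 0.00,
    "Portugal": 0.00, "Spain": 0.00, "Australia": 0.00, "New Zealand": 0.00,
}
reg_weights = {
    "Austria": 0.26, "USA": 0.13, "Japan": 0.19, "Switzerland": 0.05,
    "Netherlands": 0.14, "UK": 0.06, "Belgium": -0.00, "Denmark": 0.08,
    "France": 0.04, "Italy": -0.05, "Norway": 0.04, "Greece": -0.09,
    "Portugal": -0.08, "Spain": -0.01, "Australia": 0.12, "New Zealand": 0.12,
}

# Countries that receive a strictly negative weight under regression.
negative_reg = sorted(c for c, w in reg_weights.items() if w < 0)

# Both sets of weights sum to one, but only the SC weights stay inside [0, 1].
synth_total = sum(synth_weights.values())
reg_total = sum(reg_weights.values())
synth_in_range = all(0.0 <= w <= 1.0 for w in synth_weights.values())
reg_in_range = all(0.0 <= w <= 1.0 for w in reg_weights.values())
```

The regression column puts negative weight on four countries, which is what allows it to extrapolate beyond the range of the donor units.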


Interpreting $W$ (assessing balance)

GDP predictor means:

Predictor          Treated    Synthetic   Rest of OECD Sample
GDP per-capita     15808.9    15802.2     8021.1
Trade openness     56.8       56.9        31.9
Inflation rate     2.6        3.5         7.4
Industry share     34.5       34.4        34.2
Schooling          55.5       55.2        44.1
Investment rate    27.0       27.0        25.9
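One way to read the balance table is to compare, predictor by predictor, how far the synthetic control and the plain OECD average are from the treated unit. The sketch below does this with the values from the table; the variable names are hypothetical.

```python
# (treated, synthetic, rest of OECD sample), copied from the table above.
predictor_means = {
    "GDP per-capita": (15808.9, 15802.2, 8021.1),
    "Trade openness": (56.8, 56.9, 31.9),
    "Inflation rate": (2.6, 3.5, 7.4),
    "Industry share": (34.5, 34.4, 34.2),
    "Schooling": (55.5, 55.2, 44.1),
    "Investment rate": (27.0, 27.0, 25.9),
}

# Absolute gap to the treated unit for each comparison group.
gaps = {
    name: (abs(treated - synthetic), abs(treated - oecd))
    for name, (treated, synthetic, oecd) in predictor_means.items()
}

# Predictors on which the synthetic control is closer than the OECD average.
synthetic_closer = [name for name, (g_synth, g_oecd) in gaps.items()
                    if g_synth < g_oecd]
```

On these numbers the synthetic control is closer to West Germany than the OECD average on every predictor.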


Interpreting 𝑣

Which variables are most important for determining the synthetic control?

Variable           $v$
GDP per-capita     0.44
Investment rate    0.24
Trade openness     0.13
Schooling          0.11
Inflation rate     0.07
Industry share     0.00

The weights 𝑣1, … , 𝑣𝑘 should reflect the predictive value of the covariates.


Causal effects

Weighting donor units leads to a synthetic unit with a similar outcome trend in the pre-intervention period as the treated unit.

Given the estimated weights $w_j$, an unbiased estimator of $\tau_{1t}$ is:

$$\hat{\tau}_{1t} = Y_{1,t} - \sum_{j=2}^{J+1} w_j Y_{j,t} \quad \text{for } t \in \{T_0 + 1, ..., T\}$$

where

• $Y_{1,t}$ is the outcome for the treated unit in post-treatment period $t$

• $\sum_{j=2}^{J+1} w_j Y_{j,t}$ is the outcome for the synthetic control unit in post-treatment period $t$

• $\hat{\tau}_{1t}$ is the estimated ATT for time period $t$
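As a minimal sketch of this estimator, the function below subtracts the weighted donor outcomes from the treated outcome in each post-treatment period. The data, unit labels, and function name are hypothetical.

```python
def treatment_effects(treated_post, donors_post, weights):
    """Per-period gap between the treated unit and its synthetic control.

    treated_post: treated outcomes for t = T0 + 1, ..., T
    donors_post:  dict mapping donor name -> outcomes for the same periods
    weights:      dict mapping donor name -> weight
    """
    effects = []
    for t in range(len(treated_post)):
        synthetic = sum(weights[name] * donors_post[name][t] for name in weights)
        effects.append(treated_post[t] - synthetic)
    return effects


# Hypothetical post-treatment outcomes for three periods.
treated_post = [20.0, 21.0, 22.0]
donors_post = {"A": [18.0, 19.0, 20.0], "B": [22.0, 24.0, 26.0]}
weights = {"A": 0.5, "B": 0.5}

tau_hat = treatment_effects(treated_post, donors_post, weights)
```

Plotting these per-period gaps against time gives the kind of ‘gap’ figure used to present synthetic control results.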

Causal effect I

[Figure: per-capita GDP (PPP, 2002 USD) by year, 1960 to 2000. Series: West Germany, synthetic West Germany. Reunification marked.]

Causal effect II

[Figure: gap in per-capita GDP (PPP, 2002 USD) by year, 1960 to 2000. Reunification marked.]


Inference


Asymptotic inference in synthetic control

Standard errors from regression/t-tests are typically used to characterise uncertainty about aggregate data:

• e.g. use a sample of restaurants in NJ and PA to estimate employment trends in each state

• standard errors reflect unavailability of aggregate data on employment

So, if we use aggregate data, is there zero uncertainty? No!

• We do not have perfect information about potential outcomes, even when we use aggregate data

• We have uncertainty about the potential outcome under control for the treated unit

But, because the number of units is small in most SC applications, large-sample inferential techniques are not appropriate.


Permutation inference

Instead, we turn to an alternative inference technique: permutation inference.

1. Calculate the test-statistic under the actual treatment assignment

2. Calculate the distribution of the test-statistic under alternative treatment assignments assuming treatment effects of zero

3. Assess whether the ‘true’ test-statistic is unlikely under the null distribution of treatment effects

Here, this implies constructing a synthetic control for every country in our sample, summarising the treatment effect, and comparing it to the treatment effect in West Germany.


Permutation inference (example)

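What is the test-statistic here? For each unit $j$, treat it as if it were the treated unit, build its synthetic control from the remaining units $i \neq j$, and calculate:

$$\text{RMSE}_{j,T_0} = \sqrt{\frac{1}{T_0} \sum_{t=1}^{T_0} \Big( Y_{j,t} - \sum_{i \neq j} w_i Y_{i,t} \Big)^2}$$

$$\text{RMSE}_{j,T_1} = \sqrt{\frac{1}{T_1} \sum_{t=T_0+1}^{T} \Big( Y_{j,t} - \sum_{i \neq j} w_i Y_{i,t} \Big)^2}$$

where $T_1 = T - T_0$ is the number of post-treatment periods and

• $\text{RMSE}_{j,T_0}$ → pre-treatment difference between unit and SC

• $\text{RMSE}_{j,T_1}$ → post-treatment difference between unit and SC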


Given these, the test-statistic is:

$$t_j = \frac{\text{RMSE}_{j,T_1}}{\text{RMSE}_{j,T_0}} = \frac{\text{Post-intervention ‘fit’}}{\text{Pre-intervention ‘fit’}}$$

Intuition:

• More confident that the effect is different from zero when the estimated treatment effect is larger ($\text{RMSE}_{j,T_1}$)

• Less confident that the effect is different from zero when the pre-treatment fit with the SC is worse, i.e. the pre-treatment error is larger ($\text{RMSE}_{j,T_0}$)

P-value: how likely would it be to observe a ratio as large as the one we actually observe if the treatment effects were zero and we picked a country at random?
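The ratio and the resulting p-value can be sketched in a few lines. The example assumes that the gaps between each unit and its own synthetic control have already been computed; the gap values, unit labels, and function names are hypothetical.

```python
import math


def rmse(gaps):
    """Root mean squared gap between a unit and its synthetic control."""
    return math.sqrt(sum(g * g for g in gaps) / len(gaps))


def rmse_ratio(pre_gaps, post_gaps):
    """Test statistic: post-treatment RMSE divided by pre-treatment RMSE."""
    return rmse(post_gaps) / rmse(pre_gaps)


def permutation_p_value(ratios, treated):
    """Share of units whose ratio is at least as large as the treated unit's."""
    at_least_as_large = sum(1 for r in ratios.values() if r >= ratios[treated])
    return at_least_as_large / len(ratios)


# Hypothetical ratios for 16 placebo units (each treated 'as if' it had the intervention).
ratios = {f"placebo_{k}": 1.0 + 0.25 * k for k in range(16)}

# Hypothetical gaps for the actually treated unit: small before, large after.
ratios["treated"] = rmse_ratio(pre_gaps=[1.0, -1.0, 1.0, -1.0],
                               post_gaps=[6.0, 8.0])

p_value = permutation_p_value(ratios, "treated")
```

With 17 units and the treated unit showing the largest ratio, the p-value is 1/17, the smallest value this procedure can produce for a sample of that size.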

[Figure: post-period RMSE / pre-period RMSE for each country (axis ticks at 5, 10, 15). Countries shown: Portugal, Denmark, France, Netherlands, Japan, UK, Austria, Switzerland, Belgium, Australia, Spain, USA, New Zealand, Italy, Greece, Norway, West Germany.]

$p = 1/17 \approx 0.059$

Placebos in space

[Figure: gap in per-capita GDP (PPP, 2002 USD) by year, 1960 to 2000. Series: West Germany, donor countries.]

Placebos in time

[Figure: per-capita GDP (PPP, 2002 USD) by year, 1960 to 1990. Series: West Germany, synthetic West Germany. Placebo reunification marked.]


Additional application


California’s Proposition 99

Anti-smoking legislation and cigarette consumption

In 1988, California passed comprehensive tobacco control legislation. This was a package of measures that included a tax increase, more earmarked spending to anti-smoking health initiatives, and anti-smoking media campaigns. We will investigate the effect of this legislation on cigarette consumption in California using synthetic control methods.

• Outcome variable (Y): Per capita cigarette sales (packs)

• Treatment (D): 1 for CA after 1988, 0 for all other periods/states

• Time (T): 1970 to 2000

(All states which passed similar legislation are excluded from the donor pool.)

[Figure: per-capita cigarette sales (in packs) by year, 1970 to 2000. Series: California, rest of the U.S. Passage of Proposition 99 marked.]

[Figure: per-capita cigarette sales (in packs) by year, 1970 to 2000. Series: California, synthetic California. Passage of Proposition 99 marked.]

State Weights in Synthetic California

State          Weight
Utah           0.334
Nevada         0.234
Montana        0.199
Colorado       0.164
Connecticut    0.069


Predictor means: Real vs. Synthetic California

Variables                           California (Real)   California (Synthetic)   Average of 38 control states
Ln(GDP per capita)                  10.08               9.86                     9.86
Percent aged 15-24                  17.40               17.40                    17.29
Retail price                        89.42               89.41                    87.27
Beer consumption per capita         24.28               24.20                    23.75
Cigarette sales per capita 1988     90.10               91.62                    114.20
Cigarette sales per capita 1980     120.20              120.43                   136.58
Cigarette sales per capita 1975     127.10              126.99                   132.81

Note: All variables except lagged cigarette sales are averaged for the 1980-1988 period (beer consumption is averaged 1984-1988).

[Figure: gap in per-capita cigarette sales (in packs) by year, 1970 to 2000. Passage of Proposition 99 marked. Caption: Cigarette sales gap in CA (versus synthetic CA).]

[Figure: gap in per-capita cigarette sales (in packs) by year, 1970 to 2000. Series: California, control states. Passage of Proposition 99 marked. Caption: Cigarette sales gap in all 38 states.]

46 / 52

Page 60: Week 6: Synthetic Control Method · 2021. 3. 19. · 15/52. Notation 15/52. ... 43/52. Predictormeans:Realvs.SyntheticCalifornia California Averageof Variables Real Synthetic 38controlstates

California’s Proposition 99

[Figure: gap in per-capita cigarette sales (in packs) plotted against year, 1970 to 2000, on a scale from −30 to 30; separate lines for California and the control states; an annotation marks the passage of Proposition 99.]

Cigarette sales gap in states with pre-intervention MSPE < 20 ⋅ MSPE_CA.

47 / 52


California’s Proposition 99

[Figure: gap in per-capita cigarette sales (in packs) plotted against year, 1970 to 2000, on a scale from −30 to 30; separate lines for California and the control states; an annotation marks the passage of Proposition 99.]

Cigarette sales gap in states with pre-intervention MSPE < 5 ⋅ MSPE_CA.

48 / 52


California’s Proposition 99

[Figure: gap in per-capita cigarette sales (in packs) plotted against year, 1970 to 2000, on a scale from −30 to 30; separate lines for California and the control states; an annotation marks the passage of Proposition 99.]

Cigarette sales gap in states with pre-intervention MSPE < 2 ⋅ MSPE_CA.
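The last three figures keep only those control states whose synthetic control fits well before the intervention, using cutoffs of 20, 5 and 2 times California's pre-intervention MSPE. A rough Python sketch of that filtering step follows. The gap values and state labels are hypothetical, and the function names `mspe` and `keep_well_fitted` are illustrative rather than taken from any package.

```python
def mspe(gaps):
    """Mean squared prediction error: the average squared gap between a
    unit's outcome and its synthetic control over the given periods."""
    return sum(g * g for g in gaps) / len(gaps)


def keep_well_fitted(pre_gaps, treated, multiple):
    """Keep the treated unit plus every placebo unit whose pre-intervention
    MSPE is below `multiple` times the treated unit's pre-intervention MSPE."""
    cutoff = multiple * mspe(pre_gaps[treated])
    return [
        unit
        for unit, gaps in pre_gaps.items()
        if unit == treated or mspe(gaps) < cutoff
    ]


# HYPOTHETICAL pre-intervention gaps (actual minus synthetic), one list per unit.
pre_gaps = {
    "California": [1.0, -1.0, 1.0, -1.0],  # MSPE = 1
    "State A": [1.2, -1.2, 1.2, -1.2],     # MSPE is about 1.44
    "State B": [2.0, -2.0, 2.0, -2.0],     # MSPE = 4
    "State C": [4.0, -4.0, 4.0, -4.0],     # MSPE = 16
    "State D": [5.0, -5.0, 5.0, -5.0],     # MSPE = 25
}

# The same three cutoffs as in the figure captions.
for multiple in (20, 5, 2):
    print(multiple, keep_well_fitted(pre_gaps, "California", multiple))
```

Tightening the cutoff drops placebo states with poor pre-intervention fit, so the remaining comparison lines are those whose post-intervention gaps are most informative.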

49 / 52


Conclusion


Data requirements

Synthetic control has relatively low data requirements:

• Can use aggregate data (often administrative)

• e.g. economic indicators such as GDP, current-account balance, etc.; political indicators such as turnout, vote share, etc.

• Causal factors can be big and important

• e.g. legislation changes, macro-shocks, etc

• Units of analysis can be large

• Countries, states, regions, etc

• Does not even require full panel data for the pre-treatment period

• Can use averages of covariates rather than full panel data (useful when covariates do not vary yearly)
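The last point can be illustrated with a small Python sketch that collapses whatever pre-treatment observations exist for a covariate into a single average. The data and the helper name `pre_period_mean` are hypothetical; the averaging windows simply mirror the 1980-1988 and 1984-1988 windows used in the Proposition 99 predictor table.

```python
from statistics import mean


def pre_period_mean(series, start, end):
    """Average the observations available between `start` and `end`
    (inclusive). Years with no observation are simply skipped."""
    values = [value for year, value in series.items() if start <= year <= end]
    if not values:
        raise ValueError("no observations in the requested window")
    return mean(values)


# HYPOTHETICAL covariate series for one unit, keyed by year.
# Beer consumption is observed in only three years; retail price in all nine.
beer = {1984: 24.0, 1986: 24.5, 1988: 24.4}
retail_price = {year: 80.0 + 2.0 * (year - 1980) for year in range(1980, 1989)}

# One averaged value per covariate is all that enters the matching step.
predictor_values = {
    "beer": pre_period_mean(beer, 1984, 1988),
    "retail_price": pre_period_mean(retail_price, 1980, 1988),
}
print(predictor_values)
```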

50 / 52


Conclusion

The synthetic control approach…is arguably the most important innovation in the policy evaluation literature in the last 15 years.

Athey and Imbens, 2017

Advantages:

• Builds on D&D and Matching by essentially forcing the data to exhibit parallel trends in the pre-treatment period

• Amenable to small-ish N comparisons (often easier to get data)

• Clear, transparent, and easily communicable comparisons (e.g. Germany is part Austria, part USA, etc.)

Disadvantages 1:

• Provides inferences limited to single cases, not “average” treatment effects

• Often easy to think of “compound” treatments, or multiple changes affecting the treated unit at the same time as the treatment

• Pre-intervention period must be relatively long for us to trust that parallel trends hold in the post-intervention period

Disadvantages 2:

• Inference is not straightforward! Asymptotic inference does not work with the SC method

• Coding is not straightforward! As you will see in the seminar :)

51 / 52


Thanks for watching, and have a good week!

52 / 52