Natural experiments: the basics
Class: Data Mining Technology for Business and Society
Program: M. Sc. Data Science
University: Sapienza University of Rome
Semester: Spring 2016
Lecturer: Carlos Castillo http://chato.cl/
Sources:
● Thad Dunning: Natural Experiments in the Social Sciences. Cambridge University Press, 2012 [link].
Results from a “Natural Experiment”
Yulia Tyshchuk, Cindy Hui, Martha Grabowski, William A. Wallace: “Social Media and Warning Response Impacts in Extreme Events: Results from a Naturally Occurring Experiment” HICSS 2012
● “On April 6th, 2010, at 8:15 a.m., an armed perpetrator robbed Regina Check Cashing Corporation, located at 450 Hoosick Street in Troy, N.Y., which is about one mile away from the Rensselaer Polytechnic Institute (RPI) campus. Later on, the perpetrator was seen on campus, specifically, in the East Campus Athletic Village. The RPIAlert system was activated and the first ‘stay in shelter’ warning, via on campus loudspeakers, emails, phone calls, voice mails and text messages, was issued at 9:30 a.m. Two more ‘stay in shelter’ warnings were issued at 10:48 a.m. and 11:48 a.m. that day, before the ‘all clear’ message was issued at 12:52 p.m.”
● Paper describes Twitter's network evolution, keywords, etc. after the event
Natural experiments
Thad Dunning: Natural Experiments in the Social Sciences. Cambridge University Press, 2012 [link].
What is a randomized controlled experiment?
● Start with a population (= “study group”)
● Separate control and treatment groups at random
● Apply treatment to treatment group
● Measure outcomes in both groups
● Compare outcomes
● Profit!
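The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `run_experiment` and the `outcome` callback are hypothetical names, and the toy outcome (a baseline plus a fixed treatment effect of 2.0) is made up for the example.

```python
import random
from statistics import mean


def run_experiment(population, outcome, seed=0):
    """Randomly split a study group in two halves and compare mean outcomes.

    `outcome(subject, treated)` is a hypothetical callback returning the
    measured response of one subject.
    """
    rng = random.Random(seed)            # the randomization device
    shuffled = list(population)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    treatment, control = shuffled[:half], shuffled[half:]

    # Apply treatment to the treatment group only, then measure both groups.
    y_treatment = [outcome(s, True) for s in treatment]
    y_control = [outcome(s, False) for s in control]

    # Compare outcomes: difference of group means.
    return mean(y_treatment) - mean(y_control)


# Toy study group: the baseline depends on the subject, treatment adds 2.0.
subjects = list(range(1000))
effect = run_experiment(
    subjects, lambda s, treated: (s % 10) + (2.0 if treated else 0.0)
)
print(f"estimated effect: {effect:.2f}")
```

Because assignment is random, the baseline differences between subjects average out and the estimate lands close to the true effect of 2.0.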
Key elements of
randomized controlled experiments
1.Randomized: assignment of subjects to treatment/control groups is done at random
2.Controlled: response of subjects assigned to treatment is compared to response of subjects assigned to control
3.Experiment: treatment received by treatment group is under the control of a researcher
Why natural experiments?
● In some contexts, direct manipulation is
– Expensive
– Impractical
– Unethical
● Most results from social sciences and computational social science in large populations are observational
Example: John Snow's cholera research (1855)
Do not confuse John Snow with Jon Snow.
Red = cholera death
Blue = water pump
Prevalent hypothesis: cholera was caused by miasma in air.
● Two water companies: (1) Southwark & Vauxhall, (2) Lambeth
● In 1852, Lambeth moved its intake pipes upstream on the River Thames, above the city's sewage discharge, but Southwark & Vauxhall did not
Snow's observations (1853-1854)
Comparison
Randomized controlled experiment
1.Response of subjects assigned to treatment compared to response of subjects assigned to control
2.Assignment of subjects to groups is done using a randomization device
3.Treatment is under the control of a researcher
Natural experiment
1.Response of subjects assigned to treatment compared to response of subjects assigned to control
2.Assignment of subjects to groups is as-if random, or as good as random
3.Treatment was not under the control of a researcher
Types of natural experiment
1.Standard natural experiment
2.Instrumental variables
3.Regression discontinuity
“Perfect” natural experiment
● Doherty, Gerber, Green 2006
– Compare political attitudes of lottery winners vs lottery non-winners
– Found that “lottery-induced affluence increases hostility toward estate taxes, marginally increases hostility towards government redistribution, but has little effect on broader attitudes concerning economic stratification or the role of government as a provider of social insurance”
Weakness: study group = lottery players. Are they representative of the whole society?
Another natural experiment: random attacks incite violence? (No)
Lyall 2009
● Russian soldiers in Chechnya were instructed to strike random positions at random times for random durations
● Moreover, many Russian soldiers were drunk (data from disciplinary actions)
● Fewer attacks on Russian soldiers came from villages that had been attacked
Natural experiment through instrumental variables
● Control/treatment differences too difficult to model
● Use an instrumental variable instead of assignment to control/treatment
Example: exposure of Chinese to protests in Hong Kong [Zhang 2015]
● Study group: Chinese who traveled to Hong Kong before the protests started
● Control: returned to China 36 to 6 days before the protests started
● Treatment: returned after the protests, and hence possibly witnessed them
Evaluating effects: difference in differences
● Found increase in political activity on Weibo
● Before: 1.66/100 posts were political posts
● After: difference in differences of 0.66 posts: 75% more.
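A difference in differences subtracts the change seen in the control group from the change seen in the treatment group, so that trends affecting everybody cancel out. A minimal sketch follows; the function name `diff_in_diff` and the four rates are hypothetical and are not taken from [Zhang 2015].

```python
def diff_in_diff(treat_before, treat_after, control_before, control_after):
    """Change in the treatment group minus change in the control group."""
    return (treat_after - treat_before) - (control_after - control_before)


# Hypothetical rates of political posts per 100 posts (made-up numbers).
did = diff_in_diff(
    treat_before=1.5, treat_after=2.5,
    control_before=1.5, control_after=2.0,
)
print(f"difference in differences: {did:.2f} posts per 100")
```

In this toy case both groups become more political over time, but the treatment group rises by 0.5 posts per 100 more than the control group, and only that extra rise is attributed to the treatment.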
Example: economic output and war [Miguel, Satyanath, Sergenti 2004]
● Economic shocks push countries to war?
– Big methodological problem is reciprocal causation: poverty creates the conditions for war, which creates more poverty
● External variable for economic output: rainfall
● Study across countries with high/low relative rainfall
● Low rainfall in one year increases chance of war next year
→ evidence that economic shocks breed war
Example: military and future income
● Setting: we want to study whether going to the Vietnam war affected the future income of people in the US
– E.g. lost experience or years of career caused a drop in future salary, trauma of war caused a decline in productivity, etc.
– Very important to apportion stipends, pensions, etc.
● Instrumental variable: eligibility for military draft
– Each date of birth is assigned a random lottery number from 1 to 365
– All whose number is at or below a cutoff X are eligible for the draft
– Not all drafted go to war, not all that go to war were drafted
Example: military and future income
● Study group: white men of military age in 1971
● This is called intention-to-treat analysis because the study group is divided by the intention to send people to war (draft eligibility), not by who actually went
● Why? Because going to war is not a random process, while day of birth is as-if random
● More on this later ...
Instrumental variable          1984 earnings adjusted by inflation
Eligible by day of birth       $16,172
Not eligible by day of birth   $15,813 (about 2.2% less)
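The intention-to-treat comparison in the table is a plain difference of group means. The sketch below only redoes that arithmetic, reading the two figures as 16,172 and 15,813 dollars; the variable names are illustrative.

```python
# Mean 1984 earnings (inflation-adjusted) by draft eligibility, from the table.
earnings_eligible = 16_172
earnings_not_eligible = 15_813

# Intention-to-treat difference: groups are defined by eligibility,
# not by who actually went to war.
itt_difference = earnings_eligible - earnings_not_eligible
relative_difference = itt_difference / earnings_eligible

print(f"difference: ${itt_difference}")
print(f"relative difference: {relative_difference:.1%}")
```

The relative difference reproduces the "about 2.2%" quoted in the table.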
Regression discontinuity design
● A scalar variable is used to decide who receives treatment and who does not
● An arbitrary threshold is used
● Observe outcomes just below and just above the threshold
Regression discontinuity designs
Students with a score above 11 are given a “Certificate of Merit”
Hypothesis: getting a certificate of merit increases chances of scholarship
Electronic voting in Brazil [Fujiwara 2015]
● Literally hundreds of candidates per ballot
● Municipalities with 40,500 voters or more
→ electronic ballot
● Municipalities with fewer than 40,500 voters
→ paper ballot
● Create two bands and compare, using h=20k
[40500-h, 40500) [40500, 40500+h)
● Jump from 75% to 90% valid votes, particularly in municipalities with lower literacy
Band size: if it's too narrow we question sample size, if it's too wide we question randomness
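The band comparison can be sketched as follows. This is a simplified illustration, not the estimation procedure of [Fujiwara 2015]: `rd_estimate` is a hypothetical helper, it compares raw means in the two bands without fitting any regression, and the municipalities in `toy` are invented.

```python
from statistics import mean

THRESHOLD = 40_500  # number of voters at which the ballot type changes


def rd_estimate(municipalities, h=20_000, threshold=THRESHOLD):
    """Difference in mean outcome between the bands
    [threshold, threshold + h) and [threshold - h, threshold).

    `municipalities` is an iterable of (voters, valid_vote_share) pairs.
    """
    below = [y for v, y in municipalities if threshold - h <= v < threshold]
    above = [y for v, y in municipalities if threshold <= v < threshold + h]
    if not below or not above:
        raise ValueError("one of the bands is empty; consider a wider h")
    return mean(above) - mean(below)


# Invented data: (number of voters, share of valid votes).
toy = [
    (10_000, 0.70),  # outside the lower band, ignored
    (25_000, 0.74),
    (39_000, 0.76),
    (41_000, 0.89),
    (55_000, 0.91),
    (90_000, 0.95),  # outside the upper band, ignored
]
jump = rd_estimate(toy)
print(f"estimated jump at the threshold: {jump:.2f}")
```

Shrinking `h` keeps only municipalities that are more alike but leaves fewer data points, which is the trade-off stated above.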
The Neyman-Holland-Rubin model (see, e.g., [Freedman 2006])
What we really would like to observe
● We would like to observe the following:
– If we take a random subject i
– And create two parallel universes, one (T) in which i receives treatment and one (C) in which she doesn't
– What would be the expected treatment effect:
What we observe
● Instead, we observe
● Actually, by construction we never observe both outcomes for the same unit; that's why the unobserved outcome is called counterfactual
● Instead, we either observe YT or YC
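In the usual potential-outcomes notation, the two quantities discussed above can be sketched as below. The symbols are assumptions made here for illustration: Y_i(T) and Y_i(C) are the outcomes of unit i in the treatment and control universes, and 𝒯 and 𝒞 are the sets of units actually assigned to treatment and control.

```latex
\begin{align*}
\text{what we would like to observe:}\quad
  & \mathbb{E}\big[\, Y_i(T) - Y_i(C) \,\big] \\[4pt]
\text{what we can compute:}\quad
  & \frac{1}{|\mathcal{T}|} \sum_{i \in \mathcal{T}} Y_i(T)
    \;-\;
    \frac{1}{|\mathcal{C}|} \sum_{i \in \mathcal{C}} Y_i(C)
\end{align*}
```

The second line uses only one observed outcome per unit; it is a reasonable stand-in for the first only if assignment to the two groups is random or as-if random.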
Example in regression discontinuity designs
(Students with a score above 11 were given a “Certificate of Merit”)
Justification
● We assume:
● Which requires:
Average causal effect
How do we satisfy these requirements?
Reality is more complicated ...
● In reality, matching designs are often used, in which the control group is of similar size to treatment group and with similar characteristics
● The best matching design is strong in terms of “similar characteristics”, ideally taking everything into account
● More on this later, but first ...
In instrumental variables
● Instrumental variable: eligibility for military draft
– Each date of birth is assigned a random lottery number from 1 to 365
– All whose number is at or below a cutoff X are eligible for the draft
● Note imperfect overlap
[Figure: two overlapping sets, “Drafted” and “Went to war”]
– Drafted only: declared medically unfit, escaped to Canada, etc.
– Went to war only: volunteered to go kill people
– Both: drafted and went to war
In instrumental variables
● Complier:
– If drafted, goes to war
– If not drafted, doesn't go to war
● Always-treat:
– Goes to war no matter what
● Never-treat:
– Never goes to war
Neyman's model with crossovers
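[Figure: the study group is split at random into a treatment group and a control group. In the treatment group, compliers and always-treats receive treatment while never-treats do not. In the control group, always-treats receive treatment while compliers and never-treats do not.]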
Analysis for instrumental variables
● Average response of compliers to treatment
– How much money will a random person who was drafted and hence went to war make?
● Average response of compliers to control
– How much money will a random person who was not drafted and hence did not go to war make?
● Objective: determine the average response of compliers to treatment
Let's work this out
N = effect on never-treats; T = effect on always-treats
α proportion of always-treats; β proportion of compliers, γ proportion of never-treats
How do we estimate β?
[Figure: the study group is split at random into a treatment group and a control group; the proportion of treated units is read off in each group]
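● α + β = proportion of treated in the treatment group (always-treats and compliers)
● α = proportion of treated in the control group (always-treats only)
● Hence β can be estimated as the difference between these two proportions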
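Under the definitions above, always-treats and compliers are treated in the treatment group (share α + β) and only always-treats are treated in the control group (share α). A common way to use this, sketched here as an assumption and not necessarily the exact derivation on the slides, is to divide the intention-to-treat difference by the estimated share of compliers. The function name and all numbers are hypothetical.

```python
def complier_average_effect(mean_y_assigned_treatment, mean_y_assigned_control,
                            share_treated_in_treatment, share_treated_in_control):
    """Return (effect on compliers, estimated share of compliers beta).

    Assumes assignment only changes outcomes through the compliers, so the
    whole intention-to-treat difference is attributed to them.
    """
    # beta = (alpha + beta) - alpha
    beta = share_treated_in_treatment - share_treated_in_control
    if beta <= 0:
        raise ValueError("no compliers detected; the estimator is undefined")
    itt = mean_y_assigned_treatment - mean_y_assigned_control
    return itt / beta, beta


# Made-up numbers: 75% treated among those assigned to treatment,
# 25% treated among those assigned to control.
effect, beta = complier_average_effect(10.0, 9.0, 0.75, 0.25)
print(f"share of compliers: {beta:.2f}, effect on compliers: {effect:.2f}")
```

With half of the study group being compliers, an intention-to-treat difference of 1.0 corresponds to an effect of 2.0 on those who comply.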
Qualitative evidence
● Do not ignore qualitative evidence! It tells you about the causal process
● Several kinds of causal-process observations, including:
– How were units assigned to treatment?
– Which instrumental variables could be useful?
– What is the mechanism by which treatment acts?
Evaluation
● Quality of random assignment
– Information
– Incentives
– Capacities
● Credibility of the model
● Relevance of the intervention
Additional material: application to WWW research
● See also this tutorial:
https://sites.google.com/site/csswwwtutorial/