Social Media Use During Humanitarian Emergencies and Disasters
Alexandra Olteanu Social Computing +
Computational Social Science
IBM Science for Social Good Applied Data Science
IBM TJ Watson Research Center
IBM Science for Social Good Initiative
Exploring a new approach to addressing the world’s challenges
Partners IBM Research scientists with academic fellows and subject matter experts from a diverse range of non-governmental organizations (NGOs) to tackle emerging societal challenges using science and technology.
Events & Social Media
High volumes of event-related tweets
• 25+ million tweets for Hurricane Sandy
• 24+ million tweets for the 2016 Academy Awards
• 10+ million tweets for the Supreme Court ruling on marriage equality
• 4+ million tweets about the UK 2015 elections
Endorsed by governmental agencies
Social media are to enhance, not replace
What kind of information do people share during humanitarian crises?
with Carlos Castillo and Sarah Vieweg
Data Collection & Annotation
✓ Twitter sample API
  ✓ 2012 & 2013
  ✓ ~1% random sample of Twitter public stream
  ✓ ~130+ million tweets per month
✓ Keyword-based searches
  ✓ proper names of affected location: manila floods, boston bombings, #newyork derailment
  ✓ proper names of meteorological phenomena: sandy hurricane, typhoon yolanda
  ✓ promoted hashtags: #SafeNow, #RescuePH, #ReliefPH
✓ 26 crisis events
  ✓ 14 countries and 8 languages
  ✓ 12 different hazard types: earthquakes, wildfires, floods, bombings, shootings, etc.
  ✓ 15 instantaneous crises
  ✓ 15 diffused crises
✓ 1000 annotated tweets per crisis
  ✓ Content dimensions: Informativeness, Source of information, Type of information
  ✓ Crowdsource workers from the affected countries
Content & Source Variations
Affected individuals 20% (min 5%, max 57%)
Infrastructure & Utilities 7% (min 0%, max 22%)
Eyewitness accounts 9% (min 0%, max 54%)
Outsiders 38% (min 3%, max 65%)
[Figure: per-crisis charts. Crises labeled: Singapore Haze 2013, Typhoon Pablo 2012, Australia Wildfire 2013, Guatemala Quake 2012, Boston Bombings 2013, Typhoon Yolanda 2013, Savar Building Collapse 2013, Alberta Floods 2013.]
Data Patterns: Types
[Figure: the 26 crises grouped by similarity; axis label: lower similarity. Two groups are annotated: "Natural, diffused, and progressive" and "Human-induced, focalized, and instantaneous".]
Crises shown, in the order they appear on the slide: Savar building collapse ’13, LA airport shootings ’13, NY train crash ’13, Russia meteor ’13, Colorado wildfires ’12, Guatemala earthquake ’12, Glasgow helicopter crash ’13, West Texas explosion ’13, Lac Megantic train crash ’13, Venezuela refinery ’12, Bohol earthquake ’13, Boston bombings ’13, Brazil nightclub fire ’13, Spain train crash ’13, Manila floods ’13, Alberta floods ’13, Philippines floods ’12, Typhoon Yolanda ’13, Costa Rica earthquake ’12, Singapore haze ’13, Italy earthquakes ’12, Australia bushfire ’13, Queensland floods ’13, Colorado floods ’13, Sardinia floods ’13, Typhoon Pablo ’12.
Data Patterns: Sources
[Figure: the 26 crises grouped by similarity; axis label: lower similarity. Two groups are annotated: "Instantaneous, focalized, and human-induced" and "Natural, diffused, and progressive".]
Crises shown, in the order they appear on the slide: Singapore haze ’13, Philippines floods ’12, Alberta floods ’13, Manila floods ’13, Italy earthquakes ’12, Australia bushfire ’13, Colorado wildfires ’12, Colorado floods ’13, Queensland floods ’13, Typhoon Pablo ’12, Typhoon Yolanda ’13, Brazil nightclub fire ’13, Russia meteor ’13, Boston bombings ’13, Glasgow helicopter crash ’13, Bohol earthquake ’13, Venezuela refinery ’12, Sardinia floods ’13, West Texas explosion ’13, NY train crash ’13, Costa Rica earthquake ’12, LA airport shootings ’13, Savar building collapse ’13, Lac Megantic train crash ’13, Guatemala earthquake ’12, Spain train crash ’13.
Similar events tend to have a similar distribution of message types & sources.
Sources & Message Types
[Figure: heatmap of sources against message types. Sources: Media, Outsiders, Eyewitness, Government, NGOs, Business. Message types: Other Useful Info., Sympathy, Affected Ind., Donat. & Volun., Caution & Advice, Infrast. & Utilities. Color scale: 0, 5%, 10%, >15%.]
Temporal Distribution: Information Types
[Figure: peak of each information type along a timeline (12h, 24h, 36h, 48h, … several days). Types shown, in slide order: Caution & Advice, Sympathy & Support, Affected Individuals, Infrastructure & Utilities, Other Useful Info., Donations & Volunteering.]
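Temporal Distribution: Sources
[Figure: peak of each source along a timeline (12h, 24h, 36h, 48h, … several days); panel label: Progressive. Sources shown, in slide order: Eyewitness, Government, Businesses, NGOs, Outsiders, Media.]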
How do we collect social media data during humanitarian crises?
with Carlos Castillo, Fernando Diaz, and Sarah Vieweg
Data Collection: How Is It Done?
Tweets are queried by
✓ Content: #prayforwest, #abflood
  ✓ Low recall: 33%. Not everyone uses the keywords.
  ✓ Maximum 400 terms.
✓ Location: longitude: [-97.5, -96.5] & latitude: [31.5, 32]
  ✓ Low precision: 12%. Not everyone on the ground talks about the event.
  ✓ Maximum 25 geo-rectangles.
✓ Maximum 1% of all tweets
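The two query modes can be sketched as simple filters over already-collected tweets. This is only an illustration: the `Tweet` record is a hypothetical minimal structure (not the Twitter API schema), the sample tweets are made up, and only the hashtags and the bounding box come from the slide.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Example query terms and bounding box taken from the slide.
KEYWORDS = ("#prayforwest", "#abflood")
LON_RANGE = (-97.5, -96.5)
LAT_RANGE = (31.5, 32.0)


@dataclass
class Tweet:
    """Hypothetical minimal tweet record (assumption, not the API schema)."""
    text: str
    coords: Optional[Tuple[float, float]] = None  # (longitude, latitude)


def matches_keywords(tweet, keywords=KEYWORDS):
    """Content-based query: true if any keyword occurs in the text."""
    text = tweet.text.lower()
    return any(k in text for k in keywords)


def in_bounding_box(tweet, lon_range=LON_RANGE, lat_range=LAT_RANGE):
    """Location-based query: true if the tweet is geo-tagged inside the box."""
    if tweet.coords is None:
        return False
    lon, lat = tweet.coords
    return (lon_range[0] <= lon <= lon_range[1]
            and lat_range[0] <= lat <= lat_range[1])


# Made-up tweets, for illustration only.
tweets = [
    Tweet("Thinking of everyone in West tonight #prayforwest"),
    Tweet("Huge blast, windows shattered on our street", coords=(-97.1, 31.8)),
    Tweet("Great tacos for lunch", coords=(-97.0, 31.6)),
]

by_keyword = [t for t in tweets if matches_keywords(t)]
by_location = [t for t in tweets if in_bounding_box(t)]
```

In this toy data the keyword filter misses the on-topic tweet that uses no hashtag (a recall problem), while the location filter also returns the off-topic lunch tweet (a precision problem).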
Key Insight: Distill a Crisis Lexicon
✓ damage
✓ affected people
✓ people displaced
✓ donate blood
✓ text redcross
✓ stay safe
✓ crisis deepens
✓ evacuated
✓ toll raises
✓ …
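One way a lexicon like this could be distilled is to score candidate terms by how strongly they are associated with crisis tweets. The sketch below uses pointwise mutual information (PMI) over document frequencies. The label PMI does appear in the precision/recall figure of this deck, but the exact scoring procedure is not given in the slides, so the function `pmi_scores`, its inputs, and the toy data are all assumptions.

```python
import math
from collections import Counter


def pmi_scores(crisis_docs, other_docs):
    """Score terms by PMI with the crisis class.

    PMI(term, crisis) = log2( P(term | crisis) / P(term) ),
    estimated from document frequencies. Each doc is a list of terms
    (unigrams or bigrams). Only terms seen in crisis docs are scored.
    """
    crisis_df = Counter(t for doc in crisis_docs for t in set(doc))
    all_df = Counter(t for doc in crisis_docs + other_docs for t in set(doc))
    n_crisis = len(crisis_docs)
    n_all = n_crisis + len(other_docs)
    scores = {}
    for term, count in crisis_df.items():
        p_term_given_crisis = count / n_crisis
        p_term = all_df[term] / n_all
        scores[term] = math.log2(p_term_given_crisis / p_term)
    return scores


# Toy, made-up documents (lists of already-extracted terms).
crisis_docs = [["damage", "evacuated"], ["donate blood", "damage"], ["stay safe"]]
other_docs = [["stay safe", "happy birthday"], ["damage"]]

scores = pmi_scores(crisis_docs, other_docs)
ranked = sorted(scores, key=scores.get, reverse=True)
```

Terms that occur only in crisis tweets ("evacuated", "donate blood") score highest, while a term that is just as common elsewhere ("stay safe" in this toy data) scores below zero and would be a weak lexicon candidate.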
Precision vs. Recall
[Figure: precision (on-crisis % of retrieved tweets) plotted against recall (% of crisis tweets retrieved). Labeled points and regions: Desired, KW-based, Geo-based, High Precision, High Recall, PMI, PMI+Freq., CHI2, Freq. Tick labels: 100%, 30%, 100%, 10%.]
✓ Keyword-based sampling: #sandy, #bostonbombings, #qldflood
✓ Location-based sampling: tweets geo-tagged in the area of the disaster
Precision is straightforward to measure. Recall requires a complete data collection. We use geo data as proxy.
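The two measures on the axes can be written out as follows. This is a minimal sketch, assuming tweets are identified by IDs and that the crisis-related tweets geo-tagged in the disaster area serve as the reference set for recall (the proxy mentioned above). The function names and the toy IDs are illustrative only.

```python
def precision(retrieved_ids, on_crisis_ids):
    """Share of retrieved tweets that are on-crisis."""
    retrieved = set(retrieved_ids)
    if not retrieved:
        return 0.0
    return len(retrieved & set(on_crisis_ids)) / len(retrieved)


def recall(retrieved_ids, reference_crisis_ids):
    """Share of reference crisis tweets that were retrieved.

    The reference set stands in for the (unavailable) complete collection;
    here it is assumed to be crisis tweets geo-tagged in the affected area.
    """
    reference = set(reference_crisis_ids)
    if not reference:
        return 0.0
    return len(set(retrieved_ids) & reference) / len(reference)


# Toy, made-up IDs.
retrieved = [1, 2, 3, 4, 5]            # tweets returned by a sampling method
on_crisis = {1, 2, 3}                  # retrieved tweets labeled as on-crisis
geo_reference = {2, 3, 6, 7, 8, 9}     # crisis tweets from the geo-based proxy

p = precision(retrieved, on_crisis)    # 3 of 5 retrieved are on-crisis
r = recall(retrieved, geo_reference)   # 2 of 6 reference tweets were retrieved
```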
Representativeness
[Figure: bar charts of tweets per source (Eyewitness, Government, NGOs, For-profit Corp., Media, Other sources), axis ticks 0 to 60. Panels: Geo-sampling (baseline), Keyword-sampling.]
Keyword-based samples appear to overrepresent media reports and underrepresent eyewitness accounts.
Lexicon-based samples better preserve the original distribution of message types and sources.
[Figure: Representativeness, panels Geo-sampling (baseline) and S4; same sources (Eyewitness, Government, NGOs, For-profit Corp., Media, Other sources) and axis ticks 0 to 60.]
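How closely a sample preserves the baseline distribution of sources can be quantified with a distance between the two distributions. The slides do not say which measure was used, so the total variation distance below is an assumption, and the label lists are made up for illustration. Only the source categories come from the chart.

```python
from collections import Counter

# Source categories from the Representativeness chart.
SOURCES = ["Eyewitness", "Government", "NGOs",
           "For-profit Corp.", "Media", "Other sources"]


def source_shares(labels):
    """Turn a list of per-tweet source labels into shares per category."""
    if not labels:
        raise ValueError("need at least one labeled tweet")
    counts = Counter(labels)
    return {s: counts.get(s, 0) / len(labels) for s in SOURCES}


def total_variation(p, q):
    """Total variation distance: 0 for identical distributions, 1 for disjoint."""
    return 0.5 * sum(abs(p[s] - q[s]) for s in SOURCES)


# Made-up annotations for a geo-based baseline and a keyword-based sample.
geo_labels = ["Eyewitness"] * 4 + ["Media"] * 4 + ["Government"] * 2
kw_labels = ["Eyewitness"] * 1 + ["Media"] * 8 + ["Government"] * 1

geo = source_shares(geo_labels)
kw = source_shares(kw_labels)
distance = total_variation(geo, kw)
```

A smaller distance to the geo-sampling baseline would indicate a more representative sample under this measure.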