Evolution of the Humanitarian Data Ecosystem Sara Terp, AAAI 2015


SJ’s Stages of Data Use

• Hand-scraping (including lists of where to look), random categories, SMS, maps

• Standards and dataset visualisations

• Mashups and statistical analysis

• Stable datastores and local data scientists

2004-2009

• December 2004: Boxing Day Tsunami kills 230,000 people. Sri Lankan techs create Sahana

• January 2008: Kenyan news blackout during post-election violence.

Bloggers create Ushahidi

• June 2009: CrisisCommons forms after a tweet-up

• October 2009: ICCM conference, Cleveland

• 2009: Ushahidi creates CrisisMappers

• 2009: First RHOK hackathon creates PeopleFinder

• 2009: CDAC forms after a discussion in a bar

Intelligence Systems

HUMANS

Good at: complex analysis, heuristics, pragmatic translations, creative data finding, sudden onset

Not so good at: high volume, repetitive, 24/7 accurate

BOTS

Good at: high volume, repetitive, complex pattern finding, long term

Not so good at: complexity, human foibles

Unmanned Vehicle Control

PACT locus of authority | Computer autonomy | PACT level | Sheridan & Verplank

Computer monitored by human | Full | 5b | Computer does everything autonomously

 | | 5a | Computer chooses action, performs it & informs human

Computer backed up by human | Action unless revoked | 4b | Computer chooses action & performs it unless human disapproves

 | | 4a | Computer chooses action & performs it if human approves

Human backed up by computer | Advice, and if authorised, action | 3 | Computer suggests options and proposes one of them

Human assisted by computer | Advice | 2 | Computer suggests options to human

Human assisted by computer only when requested | Advice only if requested | 1 | Human asks computer to suggest options and human selects

Operator | None | 0 | Whole task done by human except for actual operations
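The PACT levels above can be read as a rule for when software is allowed to act on its own. The following is a minimal Python sketch of that reading; the function name `computer_may_act` and its `approved` / `vetoed` flags are illustrative assumptions, not part of PACT or of the talk.

```python
# Hypothetical helper: decides whether the computer may carry out an action
# at a given PACT level, following the level descriptions in the table above.
def computer_may_act(level: str, approved: bool = False, vetoed: bool = False) -> bool:
    if level in ("5a", "5b"):
        # Computer acts autonomously (5a additionally informs the human).
        return True
    if level == "4b":
        # Computer acts unless the human disapproves.
        return not vetoed
    if level in ("3", "4a"):
        # Computer proposes an action and only acts if the human authorises it.
        return approved
    if level in ("0", "1", "2"):
        # Human selects and performs; the computer at most offers advice.
        return False
    raise ValueError(f"unknown PACT level: {level!r}")


if __name__ == "__main__":
    for lvl in ("0", "2", "4a", "4b", "5b"):
        print(lvl, computer_may_act(lvl))
```

A crisis-data bot that auto-categorises messages but waits for a volunteer to confirm before publishing would sit at level 4a under this reading.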

2010: Haiti, VTCs

“Don’t be Imperial”

Pro: “Laboratory” = on behalf of

Per: “Community” = alongside

Para: “Grassroots” = by and within

Volunteer Skills Used

Programming

Telecommunications

Mapping

User Experience

IT project management

Data analysis

Relief work experience

Local knowledge

Translation

Communications & PR

Facilitation and admin

Making tea!

Data Scientist Skills

Data Process

Ask a good question…

Obtain datasets

Clean, combine, transform data

Explore the data

Try models (classification, machine learning etc)

Interpret and communicate your results

People started conversations…

• Twitter

• Facebook

• SMS

• Phones

• Photos

• News

• Sneakernet

Decisions

GAP

Overworked Field People

SMS to Map

@bodaceacat

http://blog.overcognition.com/

Creating Datasets

• People add features to OpenStreetMap

• Person sends SMS to 4636

• Message goes to CrowdFlower

• Person translates and geolocates message

• Message goes to Ushahidi display

• Message gets to responders, public, aunts, Sahana etc.
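The flow above (SMS in, human translation and geolocation, then display) can be sketched as a small record that is enriched step by step. This is only an illustration in Python: the `Report` class, `enrich`, `ready_for_map` and the two stand-in functions are hypothetical names, and the example message and coordinates are made up, not taken from the Haiti deployment.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

LatLon = Tuple[float, float]


@dataclass
class Report:
    raw_text: str                      # SMS text as received
    translation: Optional[str] = None  # filled in by a volunteer translator
    location: Optional[LatLon] = None  # (lat, lon) filled in by a volunteer


def enrich(report: Report,
           translate: Callable[[str], Optional[str]],
           geolocate: Callable[[str], Optional[LatLon]]) -> Report:
    """Run the two human steps; either may fail and return None."""
    report.translation = translate(report.raw_text)
    if report.translation is not None:
        report.location = geolocate(report.translation)
    return report


def ready_for_map(report: Report) -> bool:
    """Only translated and geolocated messages go on to the map display."""
    return report.translation is not None and report.location is not None


# Stand-ins for the volunteers (illustrative data only).
def demo_translate(text: str) -> Optional[str]:
    return {"Nou bezwen dlo": "We need water"}.get(text)


def demo_geolocate(text: str) -> Optional[LatLon]:
    return (18.54, -72.34) if text else None


if __name__ == "__main__":
    print(enrich(Report("Nou bezwen dlo"), demo_translate, demo_geolocate))
```

Keeping the raw text alongside the translation means a later reviewer can re-check a doubtful message without going back to the SMS gateway.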

Interpreting Aerial Images

Building Technologies

Ongoing:

• CDAC website review

• Field Voices

• Haiti Amps Network

• Haitian Voices

• Machine Translation System

• Oil Spill Response

• PAP outskirts food relief

• Telecommunications technical project

• Low-bandwidth Ushahidi

• Kapab Medical Facility Capacity Finder

• Disaster Accountability Public Database

• Sync the Sheet

• Testing Crabgrass

Closed:

• Translators in Action - other translation tools were developed

Proposed

• Mining Relief Data

• Automating Aid Request via a Voice Phone Call

• Building A Refugee Camp Cell Phone Early Warning System

• Community Tool Box

• CrisisCommons Rolodex

• Facebook for ARC Safe and Well site

• Haitian Skilled Workforce Retention

• Post Disaster Child Protection

• CDAC Radio Website

Unknown

• Disaster Accountability Hotline

• Incident visualisation

• Needs Categorization

• World Academic TeaCHing Hospitals disaster relief

Improving Technologies

• ReliefWeb UX redesign

• Ushahidi UX redesign

• CDAC website review

• OpenStreetMap development: developers at one end of the table, OpenStreetMap users at the other

Building Interfaces

Creating Community Sensors


What’s an appropriate crisis to help?

• Information

– Information deluge

– Knowledge drought

• Infrastructure

– Local infrastructure is overwhelmed

– Existing information channels

• Stages

– Mitigation

– Preparedness

– Response

– Recovery

– Sustainability


user questions for pkfloods

• Where can I find out who needs my help?

• Where can I find people to help me deliver aid?

• Where can I find out information?

• How do I find out if I'm about to be flooded?

• Who should I alert/give my information to?

• Where can I find general information out about #pkfloods?

• Where can I search for people? (I cannot find my grandmother/relative)

• I have been 'found' - who should I alert/give my status to?

• I need food/water/supplies, how can I tell people I need something?

• I have food/water/supplies, how can I find out where there's a need?

• I want to get to location x, where can I find out about the state of the roads?

• I am observing/know the state of the roads, who should I alert/give my information to?

• How can I find out where there are information blackspots/there is no telecoms coverage?

• I know where the telecoms/information blackspots are, who should I give my alert/information to and how?


Pkfloods Use Cases

What if the datapoints move?

• Ash cloud from Eyjafjallajökull left planes on the ground and thousands of people stranded

• UK crisis mappers started news and twitter watches

• Needed a tool that let us track who was stranded and ways for people to get home

• But all the methods we had were static


The 2010 Vision:

effective crisis information ecosystems

Responder-triggered VTCs

Task Types

• Message level:

– Media monitoring, source checking (e.g. SMS), summarisation, translation, geolocation, cleaning (e.g. PII removal), categorising (e.g. grouping)

• Meta level:

– Analysis (producing graphs, explanations, connections)

– Verification

– Tasks / team control

– Communication

– After-action reporting (inc. evaluation)
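As an illustration of the "cleaning (e.g. PII removal)" task above, here is a minimal Python sketch that masks phone numbers and email addresses in a message before it is shared. The patterns and the `scrub` function are assumptions for illustration only: they are deliberately crude, will miss many formats, and do nothing about names or addresses, which still need a human pass.

```python
import re

# Rough patterns (illustrative, not exhaustive):
# a run of 8+ digits, optionally with a leading "+" and spaces or hyphens inside.
PHONE = re.compile(r"\+?\d[\d\s\-]{6,}\d")
# something@something.tld
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def scrub(text: str) -> str:
    """Replace anything that looks like an email address or phone number."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


if __name__ == "__main__":
    print(scrub("Call 509 3456 7890 or mail jean@example.org"))
```

Emails are masked first so that digits inside an address are not half-replaced by the phone pattern.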

Sudden-Onset Crisis

• Fire, flood, heat, cold, tsunami, earthquake, storm, tornado, hurricane, cyclone, refugees, bombings, election issues / violence etc

2011: UN Data Science

Slow-Burn Crises

Droughts, agriculture, food insecurity, conflict, education, disease, employment, shelter, trade, endemic violence, GBV etc.

“Human development is a process of enlarging people’s choices. The most critical ones are to lead a long and healthy life, to be educated and to enjoy a decent standard of living. Additional choices include political freedom, guaranteed human rights and self-respect – what Adam Smith called the ability to mix with others without being ashamed to appear in public” – UNDP Human Development Report

Crisismapping Early 2011: radiation

Category Standards

Human/Machine Data Generation

Data CrossWalks

DR Congo in Data.UN.Org:

“Congo, Democratic Republic of the”, “Congo Democratic”, “Democratic Republic of the Congo”, “Congo (Democratic Republic of the)”, “Congo, Dem. Rep.”, “Congo Dem. Rep.”, “Congo, Democratic Republic of”, “Dem. Rep. of Congo”, “Dem. Rep. of the Congo”

DR Congo in common standards:

“Democratic Republic of the Congo” (UN Stats), “Congo, The Democratic Republic of the” (ISO3166), “Congo, Democratic Republic of the” (FIPS10, Stanag), “180” (UN Stats), “COD” (ISO3166, Stanag), “CG” (FIPS10)
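A crosswalk for the name variants above can be as simple as a lookup table keyed on a normalised form of each name. The Python sketch below maps the variants listed on this slide to the ISO3166 code “COD”; the `normalise` and `to_iso3` helpers are illustrative names, and a real crosswalk would need a table covering every country and standard in use.

```python
import re
from typing import Optional

# Name variants for DR Congo, copied from the lists above.
DRC_VARIANTS = [
    "Congo, Democratic Republic of the",
    "Congo Democratic",
    "Democratic Republic of the Congo",
    "Congo (Democratic Republic of the)",
    "Congo, Dem. Rep.",
    "Congo Dem. Rep.",
    "Congo, Democratic Republic of",
    "Dem. Rep. of Congo",
    "Dem. Rep. of the Congo",
    "Congo, The Democratic Republic of the",
]


def normalise(name: str) -> str:
    """Lowercase, drop punctuation and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z ]", " ", name.lower()).split())


# Exact-match table: normalised variant -> ISO3166 alpha-3 code.
CROSSWALK = {normalise(v): "COD" for v in DRC_VARIANTS}


def to_iso3(name: str) -> Optional[str]:
    """Return the code for a known variant, or None so a human can review it."""
    return CROSSWALK.get(normalise(name))


if __name__ == "__main__":
    for n in ("Congo, Dem. Rep.", "CONGO (Democratic Republic of the)", "Republic of the Congo"):
        print(n, "->", to_iso3(n))
```

The table uses exact matches on purpose: “Republic of the Congo” is a different country, so a fuzzy match on “Congo” would silently merge two datasets that should stay apart.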

2012: Partial Automation

ACAPS DNA

Data Finding

Common Data Needs

• Rolodexes: which response groups to follow, and who’s likely to bring what

• 3Ws: who’s doing what where

• GIS data: knowing where medical facilities, schools, roads, bridges are

• Communications: cell tower locations and signal maps

• Demographics.

• Technology and social media use by demographic group

Commonly Available Data

• Direct messages (SMS etc)

• Social media messages (tweets etc)

• Demographic data (e.g. surveys)

• News reports

• 3Ws, situation reports (both official, via news sources and on social media), field notes

• Photos: ground, aerial, satellite, videos

• CSVs, webpages, PDFs, audio recordings (e.g. radio)

Common Issues

• Massively dispersed and unstructured data (still)

• Named entity and category mismatches between datasets

• Trust

• Personally Identifiable Information (and risk)

• Crisis response is time-limited

• Crisis data response is resource-limited

• Crisis preparation is attention-limited (if you want resilience, either pay or lead)

(Some of) What’s Broken

• Crisis Data

– Remote vs Ground disconnect

– Crisis vs Development disconnect

– Deployment lead overload

• Development Data

– Broken data formats, access, coverage, standards

– Ignored data sources

– Human vs Data disconnect

• Communities

– Stovepipes, fiefdoms, imperialism, finding…

2013: Data Overloads

Cleaner Workflows

More Maps

2013 Boston bombings

My Personal Three Vs

• Variety

– Data all over the place

– CSV, JSON, XML, Excel, PDF, text, webpages, RSS, scanned pages, images, videos, audio files, maps, proprietary, etc.

• Velocity

– Streams updating too fast for a mapping team (100-200 people) to handle

– Pages updating too frequently to check by hand

• Volume

– Can’t open the data in a spreadsheet

– Can’t fit the data on my laptop

– Maxes out my credit card (thank you Amazon!)
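For the first Volume problem (a file too big to open in a spreadsheet), one common workaround is to stream the file row by row and keep only a running summary in memory. A minimal Python sketch, using the standard `csv` module; the function name `count_by_column`, the file name `reports.csv` and the `category` column are hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable


def count_by_column(lines: Iterable[str], column: str) -> Counter:
    """Count values in one column, reading one row at a time."""
    counts: Counter = Counter()
    for row in csv.DictReader(lines):
        counts[row[column]] += 1
    return counts


if __name__ == "__main__":
    # Small in-memory sample; an open file handle works the same way, e.g.
    #   with open("reports.csv", newline="") as f:   # hypothetical file
    #       print(count_by_column(f, "category"))
    sample = ["category,text", "water,a", "food,b", "water,c"]
    print(count_by_column(sample, "category"))
```

This only helps while the summary itself stays small; it does nothing for the second and third Volume problems, where the data no longer fits on one machine.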

The other Vs: Veracity

Mappers Needed More Data Science Literacy

Datastores

2014: Datastores

We Build Community Data

Tools

Ushahidi is a Dataset

Ushahidi Platform

PHOTOS, VIDEOS

Ushahidi Platform as Data

Non-Expert Visualisations

Word-level analysis

Typhoon Ruby, Dec 2014

Where to Map?

Stuff Happens

Lots of groups curate data

Including volunteer mappers

Ruby Datastores

Local wins. Local should (almost) always win

2015: NGO Data Scientists

Ushahidi Platforms as Datasets

Datastores and Viz

Resilience

And are making it part of “normality”

Here are some missing pieces

• Basic vocabularies, e.g. stopword lists for most languages (including SMSspeak in different languages)

• Pre-crisis datasets for many crisis-prone countries

– Philippines: local response groups set up

– Missing Maps project for GIS data

– What about the rest?

• User datasets in existing tools

– E.g. adding own gazetteers into Ushahidi.
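To show why the stopword lists mentioned above matter for word-level analysis, here is a minimal Python sketch that counts the most frequent words in a set of messages after dropping stopwords. The tiny English `STOPWORDS` set and the sample messages are made up for illustration; the point of the slide is that equivalent lists, including SMS abbreviations, do not yet exist for most languages.

```python
import re
from collections import Counter
from typing import Iterable, List, Set, Tuple

# Toy stopword list (illustrative only; real lists are much longer).
STOPWORDS = {"the", "a", "is", "in", "we", "and", "to", "of"}


def top_words(messages: Iterable[str], stopwords: Set[str], n: int = 3) -> List[Tuple[str, int]]:
    """Return the n most common non-stopword words across all messages."""
    counts: Counter = Counter()
    for message in messages:
        # [^\W\d_]+ matches runs of letters, including non-ASCII ones.
        for word in re.findall(r"[^\W\d_]+", message.lower()):
            if word not in stopwords:
                counts[word] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    sample = [
        "We need water in the camp",
        "Water is needed",
        "The road to the camp is blocked",
    ]
    print(top_words(sample, STOPWORDS, n=2))
```

Without the stopword filter, “the” and “is” would top the list and hide the words responders care about.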