Validating search protocols for mining of health and disease events on Twitter

Aditya Lia Ramadona 1,2*, Lutfan Lazuardi 3, Sulistyawati 1,4, Anwar Dwi Cahyono 5, Åsa Holmner 6, Hari Kusnanto 3, Joacim Rocklöv 1

The International Conference on Public Health (ICPH), Solo, Indonesia; September 14-15, 2016

https://arxiv.org/abs/1608.05910


Introduction

Twitter
• free social networking and microblogging service
• 140 characters: news, events, personal feelings and experiences, …
• May 2016: 24.34 million active Indonesian users, ~10% (Statista, 2016)

Twitter offers a continuous stream of public data that
• might contain health-related information
• can be explored for public health monitoring and surveillance (Paul et al. 2016)

[Figure: Indonesia social media trend (Jakpat, 2016)]


Previous studies
• Signorini et al. 2011: tracking levels of disease activity
• Eichstaedt et al. 2015: predicting heart disease mortality
• Strom et al. 2013: measuring health-related quality of life
• many more…

Methodological challenges
• data and language processing
• model development


Subjects and Methods

Develop groups of words and phrases in Bahasa Indonesia relevant to disease symptoms and health outcomes
• historical Twitter feeds
• Twitter stream: 14d, real-time


Sentiment analysis
• examining individual tweets from Twitter feeds
• decisions made by people with expert knowledge
• for millions of tweets, manual assessment is time-consuming and inefficient

Replicating expert assessment
• develop a model, interpret the results and adjust the model
• make predictions

Results: text analysis

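Historical Twitter feeds: 390 tweets
• search query: "rumah OR sakit OR rawat OR inap OR demam OR panas -cuaca OR berdarah OR pendarahan OR tombosit OR badan OR muntah OR badan OR tua OR ':('"
• key terms: rumah sakit (hospital), rawat inap (inpatient care), demam (fever), panas (hot, feverish), cuaca (weather), berdarah (bleeding), pendarahan (haemorrhage), badan (body), muntah (vomiting), tua (old)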
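The search string above uses Twitter's query syntax. As a rough illustration of applying such a keyword protocol to tweets that have already been collected, the sketch below assumes a simplified reading of the query: a tweet matches if it contains at least one keyword or the `:(` emoticon, and does not contain `cuaca`. Twitter's own operator handling may differ; `matches_query` and the example tweets are hypothetical.

```python
import re

# Keywords from the slide's query; the duplicated "badan" collapses in a set.
KEYWORDS = {
    "rumah", "sakit", "rawat", "inap", "demam", "panas", "berdarah",
    "pendarahan", "tombosit", "badan", "muntah", "tua",
}
EMOTICON = ":("
# "-cuaca": presumably drops weather uses of "panas" (hot). Simplified interpretation.
EXCLUDED = {"cuaca"}


def matches_query(text):
    """Assumed semantics: any keyword or ':(' present, and no excluded word."""
    tokens = set(re.findall(r"\w+", text.lower()))
    if tokens & EXCLUDED:
        return False
    return bool(tokens & KEYWORDS) or EMOTICON in text


# Made-up example tweets.
tweets = [
    "Lagi demam, badan lemas",
    "Cuaca panas sekali hari ini",
    "Besok ujian :(",
    "Selamat pagi semua",
]
print([matches_query(t) for t in tweets])  # [True, False, True, False]
```

Tokenizing on word characters keeps `:(` out of the token set, which is why the emoticon is checked against the raw text instead.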

Preprocessing
• removing retweets and other noise
• removing punctuation and numbers, converting to lowercase, and removing Bahasa Indonesia stop words (e.g. kamu "you" and aja "just")

[107] "@XYZ kamu izin aja, bilang kamu sakit :((" → [107] "xyz izin bilang sakit"
(English: "@XYZ just ask for leave, say you are sick :((")
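The slides list the preprocessing steps but not the code. A minimal Python sketch of the same steps, assuming that retweets start with "RT @" and using a tiny illustrative stop-word list in which only kamu and aja come from the slides; `is_retweet` and `preprocess` are hypothetical names. It reproduces the [107] example:

```python
import re

# Illustrative stop-word list: only "kamu" and "aja" come from the slides.
STOP_WORDS = {"kamu", "aja", "yang", "dan", "di"}


def is_retweet(text):
    """Assumption: retweets in the raw feed start with 'RT @'."""
    return text.startswith("RT @")


def preprocess(text):
    text = text.lower()                    # remove capitalization
    text = re.sub(r"[^\w\s]", " ", text)   # remove punctuation, including '@' and emoticons
    text = re.sub(r"\d+", " ", text)       # remove numbers
    return " ".join(w for w in text.split() if w not in STOP_WORDS)


raw = ["@XYZ kamu izin aja, bilang kamu sakit :((", "RT @abc: demam 3 hari"]
cleaned = [preprocess(t) for t in raw if not is_retweet(t)]
print(cleaned)  # ['xyz izin bilang sakit']
```

Stripping only the "@" symbol keeps the mention text ("xyz"), which matches the slide's output.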


1,632 words
• words most highly correlated with sakit (sick, ill, pain):
  • hati (0.23) ~ shame, broken heart, …
  • rasa (0.13) ~ pain
  • perut (0.12) ~ stomach ache

Figure 1. Words that appear more than 10 times
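The slides report correlations between sakit and other words but not how they were computed. A common approach (for example, `findAssocs` in R's tm package) correlates the per-tweet counts of two terms across the document-term matrix. The sketch below assumes Pearson correlation on raw counts; the toy tweets are made up, and `term_associations` is a hypothetical helper.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def term_associations(docs, target):
    """Correlate each term's per-tweet count with the target term's count, highest first."""
    counts = [Counter(doc.split()) for doc in docs]   # rows of a document-term matrix
    vocab = sorted(set().union(*counts))
    target_col = [c[target] for c in counts]
    assoc = {}
    for term in vocab:
        if term != target:
            assoc[term] = round(pearson([c[term] for c in counts], target_col), 2)
    return sorted(assoc.items(), key=lambda kv: -kv[1])


docs = ["sakit hati banget", "sakit hati lagi", "sakit perut", "hari ini cerah"]
print(term_associations(docs, "sakit")[:3])  # [('hati', 0.58), ('banget', 0.33), ('lagi', 0.33)]
```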

Results: model development

Predictors
• the 22 words with the highest frequencies
• counting the occurrences of the predictor words in each tweet
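The slides do not show how the predictor matrix was built. One reading of the two bullets is: pick the most frequent words, then give each tweet one count column per word. A sketch under that assumption, with toy tweets and hypothetical helper names (the study used k = 22; the example uses k = 2):

```python
from collections import Counter


def top_k_words(docs, k):
    """The k most frequent words across all preprocessed tweets."""
    freq = Counter(word for doc in docs for word in doc.split())
    return [word for word, _ in freq.most_common(k)]


def count_features(doc, predictors):
    """One feature per predictor word: how many times it occurs in this tweet."""
    counts = Counter(doc.split())
    return [counts[word] for word in predictors]


docs = ["sakit perut", "sakit hati sakit", "demam tinggi", "sakit demam"]
predictors = top_k_words(docs, 2)
print(predictors)                                     # ['sakit', 'demam']
print([count_features(d, predictors) for d in docs])  # [[1, 0], [2, 0], [0, 1], [1, 1]]
```

If a single total were intended instead, it would be `sum(count_features(doc, predictors))`.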

Classification and Regression Trees (CART) model (Breiman et al. 1983)
• rpart package for R (Therneau et al. 2015)
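The slides fit CART with R's rpart package but do not show the fitted tree or its settings. To make the idea concrete, here is a from-scratch Python sketch of CART-style classification: binary splits on count features, chosen by Gini impurity (rpart's default criterion for classification). It omits the cost-complexity pruning and cross-validation that rpart performs; all function names and the toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Try every feature and threshold; keep the binary split with the lowest weighted Gini."""
    best = None  # (weighted_gini, feature_index, threshold)
    n = len(y)
    for j in range(len(X[0])):
        # Candidate thresholds: every observed value except the largest.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, j, t)
    return best


def build_tree(X, y, depth=0, max_depth=3, min_size=2):
    """Grow the tree recursively; leaves hold the majority class."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(y) < min_size or gini(y) == 0.0:
        return majority
    split = best_split(X, y)
    if split is None or split[0] >= gini(y):
        return majority  # no split reduces impurity
    _, j, t = split
    left = [i for i in range(len(y)) if X[i][j] <= t]
    right = [i for i in range(len(y)) if X[i][j] > t]

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth + 1, max_depth, min_size)

    return (j, t, grow(left), grow(right))


def predict(tree, row):
    """Walk from the root to a leaf; internal nodes are (feature, threshold, left, right)."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree


# Toy data: columns are counts of two hypothetical predictor words ("sakit", "demam").
X = [[1, 0], [2, 1], [0, 1], [0, 0], [1, 1], [0, 0]]
y = [True, True, True, False, True, False]
tree = build_tree(X, y)
print(tree)  # (0, 0, (1, 0, False, True), True)
```

Trees are easy to read back as rules ("sakit appears at least once → health-related"), which suits the goal of interpreting and adjusting the model alongside experts.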


Historical Twitter feeds: 390 tweets
• 273 tweets (70%): training
• 117 tweets (30%): validation
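The slides give only the 70/30 proportions, not how tweets were assigned to each set. A minimal sketch, assuming a simple seeded random shuffle; the seed and the function name are hypothetical:

```python
import random


def train_validation_split(items, train_pct=70, seed=2016):
    """Shuffle a copy, then cut at train_pct percent (integer arithmetic avoids float rounding)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    return shuffled[:n_train], shuffled[n_train:]


labelled = list(range(390))  # stand-ins for the 390 labelled tweets
train, valid = train_validation_split(labelled)
print(len(train), len(valid))  # 273 117
```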

Twitter stream feeds (testing): 1,145,649 tweets
• Indonesia: between 11°S and 6°N, and between 95°E and 141°E
• 7 days: 26 July – 1 August 2016
• 100 sampled from 6,109 TRUE results
• 100 sampled from 1,139,540 FALSE results
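Two mechanics are implied here: restricting tweets to the study's bounding box, and drawing a fixed-size sample from each model output class for expert review. The slides do not say whether the box was applied at collection time or afterwards, so this sketch assumes a post-hoc filter on geotag coordinates and a simple random sample per class; the helper names, seed and tweet ids are hypothetical.

```python
import random

# Bounding box from the slide: 11°S to 6°N, 95°E to 141°E (south and west are negative).
LAT_MIN, LAT_MAX = -11.0, 6.0
LON_MIN, LON_MAX = 95.0, 141.0


def in_indonesia_box(lat, lon):
    """True if a geotagged point falls inside the rectangular study box."""
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX


def sample_for_review(predictions, n_per_class=100, seed=1):
    """Draw n_per_class tweet ids from each predicted class for expert checking.

    predictions maps a (hypothetical) tweet id to the model's True/False output.
    """
    rng = random.Random(seed)
    positives = [tid for tid, pred in predictions.items() if pred]
    negatives = [tid for tid, pred in predictions.items() if not pred]
    return rng.sample(positives, n_per_class), rng.sample(negatives, n_per_class)


print(in_indonesia_box(-7.8, 110.4))  # True  (around Yogyakarta)
print(in_indonesia_box(35.7, 139.7))  # False (Tokyo)
```

A rectangle is only an approximation of the country: the same box also contains Singapore, Brunei and Timor-Leste.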

Results

Model performance               Validation   Testing
AUC                             0.82         0.70
Sensitivity (%)                 80.0         42.0
Specificity (%)                 84.6         98.0
Positive predictive value (%)   86.7         95.5
Negative predictive value (%)   77.2         62.8
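For reference, the four percentages in the table follow from a 2×2 confusion matrix, and AUC can be read as the probability that a randomly chosen health-related tweet scores higher than a randomly chosen unrelated one. A minimal sketch of both; the counts passed in are hypothetical (the slides do not report the confusion matrix), chosen so that they happen to reproduce the testing column.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard confusion-matrix metrics, returned as percentages."""
    return {
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
        "ppv": 100 * tp / (tp + fp),
        "npv": 100 * tn / (tn + fn),
    }


def auc(scores_pos, scores_neg):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 1/2)."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(scores_pos) * len(scores_neg))


m = binary_metrics(tp=42, fp=2, fn=58, tn=98)  # hypothetical counts
print({k: round(v, 1) for k, v in m.items()})
# {'sensitivity': 42.0, 'specificity': 98.0, 'ppv': 95.5, 'npv': 62.8}
```

The pairwise AUC loop is quadratic, which is fine for a few hundred reviewed tweets.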

Limitations + Challenges = Future Work

Team members involved
• academics, health workers

Twitter users
• telecommunications infrastructure
• characteristics of people

Methods
• data: streaming (Indonesia, 7 days × 24 h ≈ 1.5 GB in CSV format)
• model: CART, RandomForest, GBM, …
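A week of the national stream came to about 1.5 GB of CSV. Reading such a file row by row keeps memory use flat regardless of its size. A minimal sketch, assuming hypothetical `created_at` (ISO timestamp) and `text` columns; the slides do not describe the file layout, and `daily_counts` is a hypothetical helper.

```python
import csv
import io
from collections import Counter


def daily_counts(lines, keyword):
    """Stream CSV rows one at a time and count tweets per day that contain keyword."""
    per_day = Counter()
    for row in csv.DictReader(lines):
        day = row["created_at"][:10]  # assumes ISO timestamps, e.g. 2016-07-26T08:00:00
        if keyword in row["text"].lower().split():
            per_day[day] += 1
    return per_day


sample = io.StringIO(
    "created_at,text\n"
    "2016-07-26T08:00:00,Demam sejak pagi\n"
    "2016-07-26T09:30:00,Cuaca cerah\n"
    "2016-07-27T10:15:00,masih demam\n"
)
print(daily_counts(sample, "demam"))  # Counter({'2016-07-26': 1, '2016-07-27': 1})
```

For a real file, pass an open handle, e.g. `with open(path, newline="", encoding="utf-8") as f: daily_counts(f, "demam")`, where `path` is wherever the stream was saved.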

Summary

Monitoring of public sentiment on Twitter + contextual knowledge
• a nearly real-time proxy for health-related indicators

Models do not replace expert judgment
• accurately analyze small amounts of information (tweets)
• improve and refine the model
• bias and emotion: integrate assessments of many experts
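The last bullet suggests pooling several experts' judgements to dilute individual bias. Majority voting is one simple way to do this (a choice made for this sketch, not a method stated in the slides); `majority_label` is a hypothetical helper, and ties are returned as None so the tweet can be re-reviewed.

```python
from collections import Counter


def majority_label(votes):
    """Majority vote over several experts' labels for one tweet; a tie returns None."""
    (top, n_top), *rest = Counter(votes).most_common()
    if rest and rest[0][1] == n_top:
        return None
    return top


print(majority_label([True, True, False]))  # True
print(majority_label([True, False]))        # None
```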


1 Department of Public Health and Clinical Medicine, Epidemiology and Global Health, Umeå University
2 Center for Environmental Studies, Universitas Gadjah Mada
3 Department of Public Health, Faculty of Medicine, Universitas Gadjah Mada
4 Department of Public Health, Universitas Ahmad Dahlan
5 District Health Office, Yogyakarta
6 Department of Radiation Sciences, Umeå University

* Corresponding author
