24
Chief Science Officer, Epidemico, Inc. Presented at TREAT-NMD Washington, DC December 8, 2015 01 Slides available @epidemico 02 Use of Social Media for Data Mining in Pharmacovigilance Nabarun Dasgupta, MPH, PhD This work is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License. To view a copy of this license, visit

TREAT-NMD Conference, Washington D.C. 12/8/15

Embed Size (px)

Citation preview

Page 1: TREAT-NMD Conference, Washington D.C. 12/8/15

Chief Science Officer, Epidemico, Inc.

Presented at TREAT-NMDWashington, DC

December 8, 2015

01

Slides available @epidemico

02

Use of Social Mediafor Data Miningin PharmacovigilanceNabarun Dasgupta, MPH, PhD

This work is licensed under the Creative CommonsAttribution-ShareAlike 4.0 International License.To view a copy of this license, visithttp://creativecommons.org/licenses/by-sa/4.0/

Page 2: TREAT-NMD Conference, Washington D.C. 12/8/15

DisclosuresEpidemico is commercializing aspects of this research.

Epidemico is wholly-owned subsidiary of Booz Allen Hamilton.

FDA has been funding research technology for 3 years.

Office of the Chief Scientist (OCS)Center for Tobacco Products (CTP)

Office of International Programs (OIP)

US FDA 01

A public-private partnership entering Year 2. The WEB-RADR project is supported by theInnovative Medicines Initiative Joint Undertaking (IMI JU) under grant agreement n° 115632,resources of which are composed of financial contributions from the European Union's SeventhFramework Programme (FP7/2007-2013) and EFPIA companies’ in kind contribution.

EUROPE - IMI

02

GSK has been a development partner to extendtechnology and obtain GxP software validation,

over the last 2 years.

INDUSTRY

03

Page 3: TREAT-NMD Conference, Washington D.C. 12/8/15

MethodsHow to tame social media

01

Page 4: TREAT-NMD Conference, Washington D.C. 12/8/15

Data Processing Overview

Automated processes to identifyrelevant posts and clean data

02. Filter

Automated statistics and visual display serve hypothesis generation. HOWEVER, determining causation requiresinformation from other sources.

04. Statistics… and Determining Causation

OPTIONAL: remove false positives,disentangle attributions,

expand dictionary, correct geography

03. Curate

Collect social media data from APIs, third-partyauthorized resellers, and automated scraping

01. Acquire

Page 5: TREAT-NMD Conference, Washington D.C. 12/8/15

Automated ProcessesFor making sense of social media data

Use post geodata orprofile location

(~50% of Proto-AEs)

GeotagInternet vernacular to Med-

DRA preferred terms

Translate Vernacular

Reduce multiplicate copiesRemove identifiers

Consolidate

Automated NLPBayesian classifierMachine learning

Identify Events

Page 6: TREAT-NMD Conference, Washington D.C. 12/8/15

Proto-AEPosts with Resemblance to Adverse Events

DizzinessMedDRA: 10013573

ICD-10: R53

AstheniaMedDRA: 10003549

ICD-10: R42

For more examples see demo.medwatcher.org

Page 7: TREAT-NMD Conference, Washington D.C. 12/8/15

System Statistics

Drugs, medical devices,vaccines, biologics, tobaccoproducts, supply chain

Regulated products2,500

Updated daily by humancurators. Mandarin, French,Spanish & Dutch next.

Vernacular English phrases10,000

Public posts from Facebook, Twitter

& patient forums, back to 2012

Posts collected63,000,000

Certified coders train machinelearning classifiers

& build dictionary

Manually classified380,000

As of October 23, 2015

Page 8: TREAT-NMD Conference, Washington D.C. 12/8/15

System PerformanceHow well does automated processing work?

Automated tools identify 9 out of 10 adverse events.

Sensitivity or Recall

7 out of 10 posts contain AEinformation. Can increase to100% with manual curation.May vary by product subset.

Positive Predictive Valueor Precision

68%

88%

Across all products, all time, all data sources

100%

Page 9: TREAT-NMD Conference, Washington D.C. 12/8/15

SC

RA

NTO

N B

, DE

VR

IES

J, B

ER

INAT

O S

, BO

RTZ

C. H

AR

VAR

D B

US

INES

S R

EV

IEW

, JU

NE

201

3

Page 10: TREAT-NMD Conference, Washington D.C. 12/8/15

Indicator score calculationHow well does automated processing work?

Albuterol has me feeling all sorts of dizzy and weak this morning

albute

rol

albute

rol h

as

albute

rol h

as m

e

has m

e0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

Proportion out of all keeps

Keep

Proportion of posts to-ken

never shows up in

None

Proportion of all trash

Trash

albuterolalbuterol hasalbuterol has mehas mehas me feelingfeelingfeeling allfeeling all sortssortssorts of dizzydizzydizzy and weakweakweak thisweak this morningmorningmorning

1. Tokenization 2. Score calculationFor each token “w”

b(w) = (number of positives containing w)

(total number of positives)

g(w) = (number of negatives containing w)

(total number of negatives)

Page 11: TREAT-NMD Conference, Washington D.C. 12/8/15

One Applied Examplealbuterol or salbutamolEstablished oral and inhaled drug for asthma & COPDHighly genericized marketA priori selected demonstration drug at demo.medwatcher.org

02

Page 12: TREAT-NMD Conference, Washington D.C. 12/8/15

MethodsAlbuterol/salbutamol has been a demo drug from project inception due to client interest

English language onlyUse 0.7 indicator score thresholdMultiplicate copies consolidated

02. Proto-AEs

Tree plots, top events, PRRComparison to deduplicated UK spontaneous reportsDrug Analysis Prints form MHRA

04. Statistics

Need to check for false positives03. No curation

1 year ending October 23, 2015albuterol, salbutamol, brand names

01. Public Facebook and Twitter

Page 13: TREAT-NMD Conference, Washington D.C. 12/8/15

Proto-AEs Mentions n= 13,055

Albuterol/Salbutamol: Proto-AE vs. Mentions

Mentions represent the patient voice.They exclude most spam, news, coupons,

and corporate announcements.

3% of Mentions were Proto-AEs

2.5%

Proto-AEs Mentions n=50,229

Proto-AEs Mentions n = 63,284

3.1%3.2%

Unique posts from Twitter & Facebookn = 1,946 Proto-AEs

21 out of 26 System Organ Classes81% MedDRA SOCs

Posts can have multiple symptoms andproducts mentioned within same posts

2,987 Drug-Event Pairs

1,613 from Twitter & 333 from Facebook83% From Twitter

Higher level group terms82 HLGTs

Higher level terms125 HLTs

MedDRA v18199 Preferred Terms

24-Oct 2014 to 23-Oct 2015, Facebook & Twitter n=63,284 Mentions

Page 14: TREAT-NMD Conference, Washington D.C. 12/8/15

Albuterol/Salbutamol by System Organ Class24-Oct 2014 to 23-Oct 2015, Facebook & Twitter n=1,946 Proto-AEs

Rectangle size = Proto-AE count | Color intensity = PRR

Page 15: TREAT-NMD Conference, Washington D.C. 12/8/15

Albuterol/Salbutamol by Preferred Term24-Oct 2014 to 23-Oct 2015, Facebook & Twitter n=1,946 Proto-AEsRectangle size = Proto-AE count | Color intensity = PRR | Black = small sample PRR restriction

Palpitations

Drug ineffective

PainMalaise

Incorrect doseadministered

Hypersensitivity

Insomnia

Altered state ofconsciousnessFeeling jittery

Tremor

Restlessness

Cough

Lung disorder

Anxiety Crying Somnolence

Wheezing

Dizziness

Nonspecificreaction

FatigueConditionaggravated

Feelingabnormal

Heart rateincreased

Drugdoseomission

Bronchitis

Headache

Nausea VomitingPsychomotorhyperactivity

Burningsensation

Affect lability Dependence

Pyrexia

Pneumonia

IrritabilityMuscletwitching

Page 16: TREAT-NMD Conference, Washington D.C. 12/8/15

Albuterol/Salbutamol – Top 5 by Count24-Oct 2014 to 23-Oct 2015, Facebook & Twitter n=1,946 Proto-AEs

Public Facebook and Twitter combined.

0 50 100 150 200 250 300 350 400 450 500

123

136

175

204

433Tremor

Drug ineffective

Altered state of consciousness

Malaise

Cough

Represents the things that most concern patients

Page 17: TREAT-NMD Conference, Washington D.C. 12/8/15

Albuterol/Salbutamol – Top 5 by PRR24-Oct 2014 to 23-Oct 2015, Facebook & Twitter n=1,946 Proto-AEs

Unadjusted PRRs calculated against all other products in social listening database (n=2,551 medical products), 5 or more events

0 5 10 15 20 25 30 35 40

23.6

26.6

31.1

32.5

35.7Tremor

Palpitations

Feeling jittery

Tachycardia

Wheezing

Corresponds well with product label

Page 18: TREAT-NMD Conference, Washington D.C. 12/8/15

Correlation: Spontaneous Report vs. Social MediaNon-parametric density (isometric plot) at HLGT level

Respiratory disorders NEC(cough, lung disorder, oropharyngeal pain, rhinorrhoea, chest pain, dyspnoea)

Movement disorders(tremor)

Therapeutic/non-therapeutic effects(drug ineffective)

General system disorders(malaise, feeling jittery, pain, condition aggravated,

fatigue, feeling abnormal, asthenia)

Sleep disorders(insomnia, somnolence)

Epidermal & dermal conditions(pruritus, skin discomfort, rash, angioedema, urticaria)

Spontaneous07-Apr-1969 to 26-Aug-2015

46.5 years2,199 unique non-fatal re-

portsn=3,851 reactions at HLGT

156 unique HLGT

Social24-Oct-2014 to 23-Oct-20151 year1,946 unique non-fatal re-portsN=2,419 reactions at HLGT57 unique HLGT

ConcordanceMore common in spontaneous dataMore common in social media data

Page 19: TREAT-NMD Conference, Washington D.C. 12/8/15

Benefit Discussions and ContextualizationForthcoming manuscript, 15 GSK Rx & OTC drugs, n=15,490 posts, Facebook & Twitter Oct 2012 - Oct 2014

Page 20: TREAT-NMD Conference, Washington D.C. 12/8/15

LimitationsWhat does “bias” mean in this setting?

03

Page 21: TREAT-NMD Conference, Washington D.C. 12/8/15

How well can patients assess causality? How much should we care?Is a solely quantitative approach justified?

Causality & Quantitative Emphasis

01

Are we willing to disenfranchise the patient voice on privacy grounds?

Patient Privacy vs. Muting the Patient Voice02

How do you build tools for an ever-shifting data environment?Stability of Data Availability & Emerging Arenas

03

Can we all agree that nobody wants to see each Tweet as a separatecase report in a safety database? So, why is everyone so afraid?

Regulation Unclear

04

ChallengesA few among many

Page 22: TREAT-NMD Conference, Washington D.C. 12/8/15

What we know, how we know itEach source of product safety information provides different perspectives on patient experience

Clinical Trials Sponsor AE Practice Data Social Media0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%PRO Seriousness Completeness

Social media can elucidate whatpatients experience most frequently.

1. Patient-Reported Outcomes

Traditional PV focus on most serious of eventsmay be less central in social media.

2. Seriousness

Social media posts may by less complete.Conversely, contain less sensitive identifying

information than electronic health records.

3. Completeness

Three Dimensions of Bias

Hypothetical data for illustrative purposes only.

Page 23: TREAT-NMD Conference, Washington D.C. 12/8/15

Collaboration with WHO-UMC & MedDRA MSSO

Identify Products, Translate Vernacular to MedDRA

01

Reduce noise, provide indicator scores

Remove Spam, Identify AE , Benefits, Sentiment02

Demographics, time to onset, dose, route,medical conditions, lab results, challenge

Entity Extraction for Causal Determination03

Names, hospital name, medical record #, phone, email

Anonymize by Removing PII

04

Coming 1Q2016: Free Public APIIn appreciation for the public investment in this tool (FDA, IMI & Web-RADR)

For social media, clinical trial patient diaries, case safety reports, scientific literature

Page 24: TREAT-NMD Conference, Washington D.C. 12/8/15

Thank [email protected]

@epidemico@WEBRADR

demo.medwatcher.org