Utilizing Social Health Websites for Cognitive Computing and Clinical Decision Support Systems

Exploring the Potential of User-Generated Health Content for Clinical Decision Support Systems

Harriëtte Smook

28 October 2014

Cognitive Computing Systems: 'Prostheses' for human cognition

Introduce a new generation of Clinical Decision Support Systems

• Learn by being used: Humans can often easily detect machine errors. System usage can be arranged so that humans understand the system and the problems it solves.
• Expand human cognition: Ease processes, especially those involving large data sets or data that requires human interpretation.
• Interact naturally: Machines and users should be brought closer together by enabling machines to understand human natural language.

Examples: Apple Siri, Google Glass, IBM Watson

Why Cognitive Systems? IBM Research. Retrieved from http://www.research.ibm.com/cognitive-computing/why-cognitive-systems.shtml, accessed 16 July 2014.

Lora Aroyo. CrowdTruth: The 7 Myths of Human Annotation. Cognitive Computing Forum 2014. Retrieved from http://www.slideshare.net/laroyo/truth-is-a-lie-7-myths-about-human-annotation-cogcomputing-forum-2014, accessed 28 October 2014.

Clinical Decision Support Systems: IBM Watson

1. Understands human natural language & human communication
2. Generates & evaluates evidence-based hypotheses
3. Adapts & learns from user selections & responses

Transformational technologies combined

Lora Aroyo. CrowdTruth: The 7 Myths of Human Annotation. Cognitive Computing Forum 2014. Retrieved from http://www.slideshare.net/laroyo/truth-is-a-lie-7-myths-about-human-annotation-cogcomputing-forum-2014, accessed 28 October 2014.

How can Health 2.0 help cognitive computing systems?

Collaboration of patients, medical experts and researchers

Collective aggregation of information, experiences and data

Tools for collecting, tracking and sharing health information:
• Monitoring new treatments
• Collecting real-world experiences
• Patients have more explicit control over their own data

Social Health Websites:
• PatientsLikeMe
• …
• HealthUnlocked

[Figure: Health Tracking Tools]

How can Health 2.0 help cognitive computing systems?

Crowdsourcing Human Semantics

• The crowd provides human perspectives: patients, health-aware citizens
• Experts provide formal knowledge: doctors
• Combined: a new generation of Clinical Decision Support Systems

Doctor: "My patient has acute coryza!"

Patient: "Well, I only have a cold."

How to utilize user-generated health content as training data for cognitive computing systems?

1. Gather the data: PatientsLikeMe, publicly available pages
2. Data Analysis: representativeness, validity, consistency
3. Create Ground Truth Data: compare with existing Watson data

Data Analysis: Important aspects for obtaining widespread health data

Aspect, and how PatientsLikeMe (PLM) meets it:
• Coverage of different medical conditions: > 500 conditions
• Availability of different kinds of data: diverse health tracking tools
• Consistency in the used vocabulary: 43% of the symptoms covered by UMLS
• Cultural and geographical dispersion of users: > 260.000 users, website in English

Catherine Arnott Smith and Paul J. Wicks. PatientsLikeMe: Consumer health vocabulary as a folksonomy. In AMIA Annual Symposium Proceedings, vol. 2008, p. 682. American Medical Informatics Association, 2008.

PLM Data Analysis

Demographic analysis:
• Data analysis in terms of demographics & population
• Countries of residence, gender & age

Analysis of top-reported conditions:
• Prevalence on PLM vs. prevalence in the U.S.
• Demographics per top-reported condition vs. official health statistics: gender, peak age & onset age

Analysis of top-reported treatments:
• Top-reported treatments vs. official drug prescription statistics
• PLM treatments per top-reported condition vs. officially listed treatments in the U.S.

Lexical analysis:
• PLM conditions and treatments compared with official medical terminology (UMLS)

PLM Data Characteristics

• 697 Conditions: current age, onset age
• 432 Conditions: reported treatments, perceived effectiveness of treatments
• 1617 Treatments: current patients, stopped patients, adherence, burden, costs, current duration, past duration, severity of side effects
• 1257 Treatments: reported purpose, perceived effectiveness per purpose
• 1172 Treatments: top reported dosages
• 1032 Treatments: top reasons why people stopped
• 663 Treatments: top reported side effects
• 663 Conditions: current patients, gender, primary condition, condition status, top reported symptoms
• 373600 Patients: age, gender, gender per age category
• 233153 Unique members
• 99274 U.S. members

Demographic Analysis: Countries of residence, gender and age

37% of PatientsLikeMe's members live in the United States

Country of residence    Share of members
United States           37,2%
United Kingdom          4,2%
Canada                  2,7%
Australia               1,1%
India                   0,8%
South Africa            0,3%
Ireland                 0,3%
New Zealand             0,2%
Other                   51,7%

The dataset is biased towards women

[Figure: percentage of all members per age category, from 0 – 4 up to 75 – 79, split by gender]

Gender ratio: Male 1 : Female 2,35
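The gender ratio on this slide (male 1 : female 2,35) can be turned into shares of the membership with a small calculation. The sketch below is only an illustration: the function name `shares_from_ratio` is hypothetical, the decimal comma of the slide is written as a decimal point, and it assumes the ratio covers only members who reported a gender.

```python
def shares_from_ratio(male: float, female: float) -> tuple[float, float]:
    """Convert a male:female ratio into (male share, female share)."""
    total = male + female
    return male / total, female / total


# Ratio reported on the slide: male 1 : female 2,35
male_share, female_share = shares_from_ratio(1.0, 2.35)
print(f"male: {male_share:.1%}, female: {female_share:.1%}")
```

Under these assumptions roughly seven out of ten members with a reported gender are women.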

People aged 30 - 70 are overrepresented

[Figure: percentage per age category, from 0 – 4 up to 75 – 79, PLM USA versus USA]

Top-reported conditions: Are more prevalent on PatientsLikeMe than in the United States

Rank  Condition                        PLM US   US
1     Fibromyalgia                     21,4%    2%
2     Multiple Sclerosis               19,3%    0,1%
3     Major Depressive Disorder        8,7%     6,7%
4     Generalized Anxiety Disorder     7%       3,1%
5     Chronic Fatigue Syndrome         6,6%     0,3%
6     Parkinson's Disease              6,6%     0,3%
7     Epilepsy                         4,5%     0,2%
8     Rheumatoid Arthritis             2,4%     0,6%
9     Amyotrophic Lateral Sclerosis    3,3%     0,01%
10    Post-Traumatic Stress Disorder   3,4%     3,6%

The most prevalent conditions in the U.S. are mainly related to heart disease and overweight
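One way to quantify how strongly a condition is overrepresented is to divide its share on PLM by its prevalence in the U.S. The sketch below is a minimal illustration using the percentages listed for the top-reported conditions on this slide. It assumes the first figure per condition is the PLM share and the second the U.S. prevalence; the names `prevalence` and `overrepresentation` are hypothetical, and decimal commas are written as decimal points.

```python
# condition -> (share on PLM in %, prevalence in the U.S. in %), copied from the slide
prevalence = {
    "Fibromyalgia": (21.4, 2.0),
    "Multiple Sclerosis": (19.3, 0.1),
    "Major Depressive Disorder": (8.7, 6.7),
    "Generalized Anxiety Disorder": (7.0, 3.1),
    "Chronic Fatigue Syndrome": (6.6, 0.3),
    "Parkinson's Disease": (6.6, 0.3),
    "Epilepsy": (4.5, 0.2),
    "Rheumatoid Arthritis": (2.4, 0.6),
    "Amyotrophic Lateral Sclerosis": (3.3, 0.01),
    "Post-Traumatic Stress Disorder": (3.4, 3.6),
}


def overrepresentation(plm_pct: float, us_pct: float) -> float:
    """Ratio above 1 means the condition is more common on PLM than in the U.S."""
    return plm_pct / us_pct


ratios = {name: overrepresentation(plm, us) for name, (plm, us) in prevalence.items()}

# Print the conditions from most to least overrepresented.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {ratio:.1f}x")
```

A ratio is used rather than a difference so that rare conditions, whose absolute percentages are tiny, remain comparable with common ones.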

Demographics per condition

Gender: Women are overrepresented in all top conditions on PatientsLikeMe

Peak age: PLM patients suffering from mental health conditions are remarkably older than the official peak age, while PLM patients suffering from conditions common among the elderly are remarkably younger

Onset age: PLM patients suffering from mental health conditions often experience these already in their childhood

Top-reported treatments: Are less popular prescription drugs in the U.S.

Top-reported PLM treatments versus official U.S. rankings

PLM rank  Treatment        U.S. rank
1         Gabapentin       20
2         Duloxetine       n.a.
3         Pregabalin       n.a.
4         Baclofen         n.a.
5         Clonazepam       n.a.
6         Copaxone         n.a.
7         Levothyroxine    2
8         Tramadol         21
9         Lamotrigine      n.a.
10        Bupropion        n.a.

Official U.S. rankings versus top-reported PLM treatments

U.S. rank  Treatment                 PLM rank
1          Hydrocodone Paracetamol   13
2          Levothyroxine Sodium      7
3          Lisinopril                37
4          Simvastatin               42
5          Metoprolol                53
6          Amlodipine                57
7          Omeprazole                9
8          Metformin                 22
9          Salbutamol                28
10         Atorvastatin              n.a.

Frequently prescribed drugs in the U.S. are less popular on PLM

Lexical analysis: The majority of the treatments and conditions is covered by UMLS

Lexical tools:
• BeCas [1]
• UMLS Metathesaurus Browser [2]
• NCBO BioPortal Annotator [3]
• RxTerms [4]

All treatments and conditions from the data set are compared with UMLS
• Only 2 out of 1025 unique treatments & 9 out of 663 unique conditions are not covered:
  • Too general term (e.g. accidental fall)
  • Term is proposed and not yet included in UMLS or under discussion
  • Term is removed from UMLS
  • Term is not evidence-based and used by alternative healers

[1] http://bioinformatics.ua.pt/becas/#!/about
[2] http://uts.nlm.nih.gov/home.html
[3] http://bioportal.bioontology.org/annotator
[4] http://wwwcf.nlm.nih.gov/umlslicense/rxtermApp/rxTerm.cfm
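The coverage counts on this slide translate directly into coverage percentages. The sketch below is a simple worked calculation, not the tooling used in the study: the `coverage` helper is hypothetical, and the only inputs are the counts reported above (2 of 1025 unique treatments and 9 of 663 unique conditions not covered by UMLS).

```python
def coverage(total_terms: int, not_covered: int) -> float:
    """Fraction of unique terms that have a match in the vocabulary."""
    if total_terms <= 0:
        raise ValueError("total_terms must be positive")
    return (total_terms - not_covered) / total_terms


# Counts as reported on the slide.
treatment_coverage = coverage(1025, 2)
condition_coverage = coverage(663, 9)

print(f"treatments covered by UMLS: {treatment_coverage:.1%}")
print(f"conditions covered by UMLS: {condition_coverage:.1%}")
```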

Issues in utilizing user-generated health content as training data for cognitive computing systems

• Bias & Limitations: Each data source comes with bias and limitations that need to be considered
• Accessibility: Data is not easily accessible
• Privacy issues: How to avoid?

Opportunities in utilizing user-generated health content as training data for cognitive computing systems

• Access to high coverage of (rare) medical conditions
• Access to patients and health-aware citizens as an intermediate between the general crowd and experts
• Knowledge from the patients' perspective

In the future...

• Perform analysis on data from alternative geographical contexts
• Perform analysis on data with different characteristics
• Generate better ground truth data

Healthcare
