DESCRIPTION
Crowdsourced annotation data offers cognitive computing systems insight into lay semantics. This is especially important in health care, where medical terminology is often not aligned with patients' lay language. However, the general crowd often has limited medical knowledge. This research therefore investigated the opportunities social health websites offer for obtaining ground truth annotation data for cognitive computing systems, including clinical decision support systems. By identifying these websites and analyzing their data, it offers a starting point for the future use of user-generated health content in cognitive systems. However, the opportunities of social health data are currently limited by various legal regulations. This paper therefore also addresses the legal aspects of using social health data in cognitive computing systems.
Utilizing Social Health Websites for Cognitive Computing: Exploring the Potential of User-Generated Health Content for Clinical Decision Support Systems
Harriëtte Smook
28 October 2014
Cognitive Computing Systems
‘Prostheses’ for human cognition
Introduce a new generation of Clinical Decision Support Systems
• Learn by being used: Humans can often easily detect machine errors. System usage can be arranged in such a way that humans understand the system and the problems it solves.
• Expand human cognition: Ease processes, especially those involving large data sets or data that requires human interpretation.
• Interact naturally: Machines and users are brought closer together by enabling machines to understand human natural language.
Apple Siri, Google Glass, IBM Watson
Why Cognitive Systems? IBM Research. Retrieved from http://www.research.ibm.com/cognitive-computing/why-cognitive-systems.shtml, accessed 16 July 2014.
Lora Aroyo. CrowdTruth: The 7 Myths of Human Annotation. Cognitive Computing Forum 2014. Retrieved from http://www.slideshare.net/laroyo/truth-is-a-lie-7-myths-about-human-annotation-cogcomputing-forum-2014, accessed 28 October 2014.
Clinical Decision Support Systems: IBM Watson
1. Understands human natural language & human communication
2. Generates & evaluates evidence-based hypotheses
3. Adapts & learns from user selections & responses
Transformational technologies combined
Lora Aroyo. CrowdTruth: The 7 Myths of Human Annotation. Cognitive Computing Forum 2014. Retrieved from http://www.slideshare.net/laroyo/truth-is-a-lie-7-myths-about-human-annotation-cogcomputing-forum-2014, accessed 28 October 2014.
How can Health 2.0 help cognitive computing systems?
• Collaboration of patients, medical experts and researchers
• Collective aggregation of information, experiences and data
• Tools for collecting, tracking and sharing health information:
  • Monitoring new treatments
  • Collecting real-world experiences
  • Patients have more explicit control over their own data
Social Health Websites: PatientsLikeMe, …, HealthUnlocked
Health Tracking Tools:
How can Health 2.0 help cognitive computing systems?
Crowdsourcing Human Semantics
• The crowd provides human perspectives: patients, health-aware citizens
• Experts provide formal knowledge: doctors
Doctor: "My patient has acute coryza!"
Patient: "Well, I only have a cold."
New generation of Clinical Decision Support Systems
How to utilize user-generated health content as training data for cognitive computing systems?
1. Gather the data: PatientsLikeMe, publicly available pages
2. Data analysis: representativeness, validity, consistency
3. Create ground truth data: compare with existing Watson data
PatientsLikeMe (PLM)
Data Analysis: important aspects for obtaining widespread health data
• Coverage of different medical conditions: > 500 conditions
• Availability of different kinds of data: diverse health tracking tools
• Consistency in the used vocabulary: 43% of the symptoms covered by UMLS
• Cultural and geographical dispersion of users: > 260.000 users, website in English
Catherine Arnott Smith and Paul J Wicks. Patientslikeme: Consumer health vocabulary as a folksonomy. In AMIA annual symposium proceedings, vol. 2008, p. 682. American Medical Informatics Association, 2008.
PLM Data Analysis
Demographic analysis:
• Data analysis in terms of demographics & population
• Countries of residence, gender & age
Analysis of top-reported conditions:
• Prevalence on PLM vs. prevalence in the U.S.
• Demographics per top-reported condition vs. official health statistics: gender, peak age & onset age
Analysis of top-reported treatments:
• Top-reported treatments vs. official drug prescription statistics
• PLM treatments per top-reported condition vs. officially listed treatments in the U.S.
Lexical analysis:
• PLM conditions and treatments compared with official medical terminology (UMLS)
PLM Data Characteristics
• 697 Conditions: current age, onset age
• 432 Conditions: reported treatments, perceived effectiveness of treatments
• 1617 Treatments: current patients, stopped patients, adherence, burden, costs, current duration, past duration, severity of side effects
• 1257 Treatments: reported purpose, perceived effectiveness per purpose
• 1172 Treatments: top reported dosages
• 1032 Treatments: top reasons why people stopped
• 663 Treatments: top reported side effects
• 663 Conditions: current patients, gender, primary condition, condition status, top reported symptoms
• 373600 Patients: age, gender, gender per age category
• 233153 Unique members
• 99274 U.S. members
Demographic Analysis
Countries of residence, gender and age
37% of PatientsLikeMe’s members live in the United States
Country          Share of members
United States    37,2%
United Kingdom   4,2%
Canada           2,7%
Australia        1,1%
India            0,8%
South Africa     0,3%
Ireland          0,3%
New Zealand      0,2%
Other            51,7%
The dataset is biased towards women
[Figure: percentage of all members (axis 0 to 10) per age category (0 – 4 through 75 – 79), by gender]
Gender ratio: Male 1 : Female 2,35
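The gender ratio can be restated as percentage shares with a line of arithmetic. The following is a minimal sketch, assuming the ratio covers only members who reported a gender; the function name is illustrative, and the decimal comma of the slide is written as a decimal point in code.

```python
def shares_from_ratio(male: float, female: float) -> tuple[float, float]:
    """Convert a male:female ratio into percentage shares of the total."""
    total = male + female
    return 100 * male / total, 100 * female / total


# Ratio from the slide: Male 1 : Female 2,35
male_share, female_share = shares_from_ratio(1.0, 2.35)
print(f"male: {male_share:.1f}%, female: {female_share:.1f}%")
```

Under that assumption, roughly seven out of ten members in the ratio are women.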
[Figure: percentage (axis 0 to 18) per age category (0 – 4 through 75 – 79); series: USA PLM, USA]
People aged 30 - 70 are overrepresented
Top-reported conditions
Are more prevalent on PatientsLikeMe than in the United States
     Condition                         PLM US   US
 1   Fibromyalgia                      21,4%    2%
 2   Multiple Sclerosis                19,3%    0,1%
 3   Major Depressive Disorder         8,7%     6,7%
 4   Generalized Anxiety Disorder      7%       3,1%
 5   Chronic Fatigue Syndrome          6,6%     0,3%
 6   Parkinson’s Disease               6,6%     0,3%
 7   Epilepsy                          4,5%     0,2%
 8   Rheumatoid Arthritis              2,4%     0,6%
 9   Amyotrophic Lateral Sclerosis     3,3%     0,01%
10   Post-Traumatic Stress Disorder    3,4%     3,6%
U.S. most prevalent conditions are mainly related to heart disease and overweight
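How strongly each condition is overrepresented can be expressed as the PLM prevalence divided by the U.S. prevalence. The sketch below uses the ten value pairs from the slide (decimal commas written as decimal points); the ratio itself and the names `prevalence` and `overrepresentation` are illustrative assumptions, not a measure taken from the presentation.

```python
# Prevalence in percent: (PLM, U.S.), copied from the slide.
prevalence = {
    "Fibromyalgia": (21.4, 2.0),
    "Multiple Sclerosis": (19.3, 0.1),
    "Major Depressive Disorder": (8.7, 6.7),
    "Generalized Anxiety Disorder": (7.0, 3.1),
    "Chronic Fatigue Syndrome": (6.6, 0.3),
    "Parkinson's Disease": (6.6, 0.3),
    "Epilepsy": (4.5, 0.2),
    "Rheumatoid Arthritis": (2.4, 0.6),
    "Amyotrophic Lateral Sclerosis": (3.3, 0.01),
    "Post-Traumatic Stress Disorder": (3.4, 3.6),
}


def overrepresentation(plm_pct: float, us_pct: float) -> float:
    """Ratio above 1 means the condition is more prevalent on PLM."""
    return plm_pct / us_pct


ratios = {name: overrepresentation(plm, us) for name, (plm, us) in prevalence.items()}

# Print conditions from most to least overrepresented.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:32s} {ratio:7.1f}x")
```

With these figures the rare conditions (Amyotrophic Lateral Sclerosis, Multiple Sclerosis) show the largest ratios, while Post-Traumatic Stress Disorder is the only entry with a ratio slightly below 1.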
Demographics per condition
• Gender: Women are overrepresented in all top conditions on PatientsLikeMe.
• Peak age: PLM patients suffering from mental health conditions are remarkably older than the peak age. PLM patients suffering from conditions common among the elderly are remarkably younger.
• Onset age: PLM patients suffering from mental health conditions often experience these already in their childhood.
Top-reported treatments
Are less popular prescription drugs in the U.S.

Top-reported PLM treatments versus official U.S. rankings
     PLM Treatment      U.S. rank
 1   Gabapentin         20
 2   Duloxetine         n.a.
 3   Pregabalin         n.a.
 4   Baclofen           n.a.
 5   Clonazepam         n.a.
 6   Copaxone           n.a.
 7   Levothyroxine      2
 8   Tramadol           21
 9   Lamotrigine        n.a.
10   Bupropion          n.a.

Official U.S. rankings versus top-reported PLM treatments
     U.S. Treatment             PLM rank
 1   Hydrocodone Paracetamol    13
 2   Levothyroxine Sodium       7
 3   Lisinopril                 37
 4   Simvastatin                42
 5   Metoprolol                 53
 6   Amlodipine                 57
 7   Omeprazole                 9
 8   Metformin                  22
 9   Salbutamol                 28
10   Atorvastatin               n.a.

Frequently prescribed drugs in the U.S. are less popular on PLM
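The comparison of the two rankings can be sketched in a few lines. The example below takes the PLM top ten with the U.S. ranks as listed on the slide and reports, for each treatment that has a U.S. rank, how far the two ranks are apart. Treating "n.a." as "not present in the U.S. ranking that was consulted" is an assumption, and `rank_shifts` is a hypothetical helper name.

```python
from typing import Optional

# PLM top-10 treatments in PLM rank order with their U.S. rank from the slide.
# None stands for "n.a." (assumed: not present in the U.S. ranking).
plm_top10: list[tuple[str, Optional[int]]] = [
    ("Gabapentin", 20),
    ("Duloxetine", None),
    ("Pregabalin", None),
    ("Baclofen", None),
    ("Clonazepam", None),
    ("Copaxone", None),
    ("Levothyroxine", 2),
    ("Tramadol", 21),
    ("Lamotrigine", None),
    ("Bupropion", None),
]


def rank_shifts(rows: list[tuple[str, Optional[int]]]) -> dict[str, int]:
    """Return {treatment: U.S. rank - PLM rank} for treatments with a U.S. rank."""
    return {
        name: us_rank - plm_rank
        for plm_rank, (name, us_rank) in enumerate(rows, start=1)
        if us_rank is not None
    }


shifts = rank_shifts(plm_top10)
unranked = [name for name, us_rank in plm_top10 if us_rank is None]

print(shifts)          # positive: ranked lower in the U.S. than on PLM
print(len(unranked))   # number of PLM top-10 treatments without a U.S. rank
```

A positive shift means the treatment ranks lower in the U.S. list than on PLM; seven of the ten PLM treatments have no U.S. rank at all.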
Lexical analysis
The majority of the treatments and conditions are covered by UMLS
Lexical tools:
• BeCas [1]
• UMLS Metathesaurus Browser [2]
• NCBO BioPortal Annotator [3]
• RxTerms [4]
All treatments and conditions from the data set were compared with UMLS
• Only 2 out of 1025 unique treatments & 9 out of 663 unique conditions are not covered:
  • Term is too general (e.g. accidental fall)
  • Term is proposed and not yet included in UMLS, or under discussion
  • Term has been removed from UMLS
  • Term is not evidence-based and is used by alternative healers
[1] http://bioinformatics.ua.pt/becas/#!/about
[2] http://uts.nlm.nih.gov/home.html
[3] http://bioportal.bioontology.org/annotator
[4] http://wwwcf.nlm.nih.gov/umlslicense/rxtermApp/rxTerm.cfm
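The counts on this slide translate directly into coverage percentages. A minimal sketch of that arithmetic follows; the function name `coverage_pct` is illustrative, and the only inputs are the four numbers stated above.

```python
def coverage_pct(total_terms: int, not_covered: int) -> float:
    """Percentage of unique terms that were found in UMLS."""
    return 100 * (total_terms - not_covered) / total_terms


# Counts from the slide: 2 of 1025 treatments and 9 of 663 conditions not covered.
treatment_coverage = coverage_pct(1025, 2)
condition_coverage = coverage_pct(663, 9)

print(f"treatments covered: {treatment_coverage:.1f}%")
print(f"conditions covered: {condition_coverage:.1f}%")
```

Both values come out above 98%, which is what the slide summarizes as the majority of terms being covered.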
Issues in utilizing user-generated health content as training data for cognitive computing systems
• Bias & limitations: Each data source comes with bias and limitations that need to be considered
• Accessibility: Data is not easily accessible
• Privacy issues: How to avoid?
Opportunities in utilizing user-generated health content as training data for cognitive computing systems
• Access to high coverage of (rare) medical conditions
• Access to patients and health-aware citizens as an intermediate between the general crowd and experts
• Knowledge from the patients’ perspective
In the future...
• Perform analysis on data from alternative geographical contexts
• Perform analysis on data with different characteristics
• Generate better ground truth data