March 16, 2010: I. Sim, Translational eScience, Epi – 206 Medical Informatics
Translational eScience
Ida Sim, MD, PhD
March 16, 2010
Division of General Internal Medicine, and Graduate Group in Biological and Medical Informatics
UCSF
Copyright Ida Sim, 2010. All federal and state rights reserved for all original material presented in this course through any medium, including lecture or print.
Some Observations
• We reinvent the wheel with every study
• We don’t repurpose data efficiently
• Research and care are separate, unintegrated
• We use computers for data processing, not concept processing
• Research policy emphasizes “let a thousand flowers bloom” more than coherence and comparability of research results
• It’s logistically hard to work with collaborators
• ...
These Problems...
• ...will increasingly limit the clinical and translational research we want and need to do
– “The ‘clinical research grid’ is failing” (Crowley, et al, JAMA 2004; 291:1120-1126), Institute of Medicine
Outline
• From Here to There (Web 2.0/3.0 eScience)
• Collaborative Care and Web 2.0
• Collaborative Research and Web 2.0/3.0
– study interpretation/hypothesis generation
– study design/execution
– publication and dissemination
• Closing the Loop
• Class Summary
Here
[Figure: diagram with two columns, PATIENT CARE / WELLNESS and RESEARCH. Labels: Virtual Patient; Transactions; Raw data; Medical knowledge; Decision support; Medical logic; Clinical research transactions; Raw research data; EHRs; CRMSs. Base layer: Workflow modeling and support, usability, cognitive support, computer-supported cooperative work (CSCW), etc.]
[Figure: clinical research workflow diagram. Labels: Sponsors; Academic PIs; Trial Design; Protocol; IRB; Funding Agency; Contract Research Organization (CRO); Site Management Organization (SMO); Site 1, Site 2, Site 3; Study DB; Data analysis; Results reporting.]
Here
[Figure: “Clinic 2008” connectivity diagram. Labels: Front Desk; Radiology; Medical Information Bureau; Walgreens; Pharm Benefit Manager; Benefits Check (RxHub); HealthNet; B&T; UCare; Specialist; Referral Authorization; Lab; UniLab (HL-7). Connection types: Internet, Intranet, Phone/Paper/Fax.]
There?
• Open data/open science on epic scale
– everyone produces content
– automated data mining and knowledge discovery across all of biomedicine
– collaborative, flat, fluid, emergent, open participation
– even very esoteric communities can be supported
• “Not your grandfather’s clinical research”
General Drivers of Change
• A “grand convergence” of
– maturation of the Internet as connective data technology
– ubiquity of microchips in computers, appliances, and sensors
– explosion of data from everywhere and everything (Big Data)
• For all fields, frontiers of research driven by
– ability to do large-scale multi-disciplinary data analysis, visualization, etc.
Biomedical Drivers of Change
• Personalized medicine, geno-pheno correlations
– need genomic and phenotype data in computable form for large-scale small signal correlations
• predictors more likely to be rare vs common variants
• Genomic data will be a commodity
– SNPs, whole genome analysis
• Large-scale phenotype is the bottleneck
• Requires tighter connection between research and care
– huge volume, complex data that needs to be made sense of
How?
• Combination of web 2.0 and semantic web applied to health and biomedical science
– web 2.0: vague-ish term on emerging web, strongly based on social computing
• people are as important as computers in the network
– semantic web (aka web 3.0):
• web 1.0 is a web of documents
• web 3.0 is a web of (computer-understandable) data
• Building the research “cyberinfrastructure” is the single most important challenge confronting the nation’s science laboratories (NSF)
http://www.nsf.gov/news/special_reports/cyber/index.jsp
Big Picture + People
[Figure: diagram with two columns, PATIENT CARE / WELLNESS and RESEARCH, with people added. Labels: Virtual Patient; Transactions; Raw data; Medical knowledge; Decision support; Medical logic; Clinical research transactions; Raw research data; EHRs; CTMSs; “Where clinicians want to stay”. People: Primary Care MD; Patient; Principal Investigator. Base layer: Workflow modeling and support, usability, cognitive support, computer-supported cooperative work (CSCW), etc.]
Outline
• From Here to There (Web 2.0/3.0 eScience)
• Collaborative Care and Web 2.0
• Collaborative Research and Web 2.0/3.0
– study interpretation/hypothesis generation
– study design/execution
– publication and dissemination
• Closing the Loop
• Class Summary
Collaborative Care
• “Upskilling” all participants
– almost 50% of Americans have 1 or more chronic conditions
• chronic diseases account for >75% of total medical costs
– not enough primary care or specialists for chronic disease management
– must increase knowledge of entire care team (e.g., families)
• Beyond the EHR (i.e., beyond record-keeping)
• Must support collaborative care
– messaging, task management, shared conceptualization of problem/education, group decision making, secure distributed permissioned access
– contextualized to work and living for all team members
Web 2.0 in Health
• Vague-ish term on emerging web, strongly based on social computing
– people are as important as computers in the network
• Several principles
– user-generated content
– harness power/wisdom of crowds
– openness
– architecture of participation
– niche markets
(P. Anderson, What is Web 2.0? JISC Tech and Standards Watch, Feb 2007)
User-Generated Content
• Anyone anywhere is a source of content
– YouTube, Flickr, Wikipedia, citizen journalism, blogs
– e.g., http://PatientsLikeMe.com
• Exists in parallel with (trumps?) Old/Main Stream Media (MSM), hierarchical information sources
– NIH MedlinePlus http://www.nlm.nih.gov/medlineplus/
– WebMD.com
Power/Wisdom of Crowds
• Tapping into distributed intelligence of people
– Wikipedia (as accurate as Encyclopedia Britannica)
– www.intrade.com: “stock market” for health care reform passage
– e.g., Google Flu
• Use distributed machine and people resources
– parallel computing for cheap: donate your PC cycles to find signs of intelligence from outer space
• http://setiathome.berkeley.edu/
• Crowdsourcing: e.g., http://www.answers.com/
– 250,300 questions in health http://wiki.answers.com/Q/FAQ/431
Openness
• Dimensions of openness
– open source: computer code open to all for wisdom of crowds to improve (e.g., VistA VA EHR system)
– open access: no restrictions on use or distribution of content
– open participation: everyone can participate
• communal management, flat hierarchies, consensus emergent decision-making
• Allows “mash-ups” of freed data
– http://www.googlelittrips.com/GoogleLit/Home.html for Aeneid, Grapes of Wrath, user-generated road trips...
– e.g., http://healthmap.org/en http://www.nature.com/avianflu/google-earth/index.html
Architecture of Participation
• Network externalities concept: “the service automatically gets better the more people use it” e.g.,
– fax machines, cell phones...the more the better
– Google search
• the more “link paths” people tread, the richer the data for the Google search algorithm
– Amazon book ratings, Netflix ratings
• Anonymity important for this to happen in healthcare
– whoissick.org/sickness/
– better epi data if everyone contributed to public health data
• 1-3% refuse to share clinical data for research
Niche Markets
• “The web” is an unlimited resource
– can service even extremely small market niches
• Shape of the web: the “long tail”
[Figure: long-tail curve. Y-axis: # people; x-axis: market niche/things being done. Head of the curve labeled “where traditional focus is”; tail labeled “with infinitely long tail, majority of action is here”.]
Niche Markets in Health
• Rare diseases
– PatientsLikeMe
• Geographic, ethnic, other niches
– Russian-speaking boy scouts with ADHD in rural Montana
Outline
• From Here to There (Web 2.0/3.0 eScience)
• Collaborative Care and Web 2.0
• Collaborative Research and Web 2.0/3.0
– study interpretation/hypothesis generation
– study design/execution
– publication and dissemination
• Closing the Loop
• Class Summary
Study Interpretation/Hypothesis Generation
• New hypotheses arise from examining prior data and knowledge
– clinical data, e.g.,
• claims data
• EHR data/data warehouses
– research data (aka the literature)
• basic science research results (e.g., animal studies)
• clinical research (e.g., RCTs, GWAS, observational studies)
– all other data
Data Mining in IDR
[Figure: Integrated Data Repository (IDR). Source feeds: ADT, Chem, EHR, XRay, PBM, Claims. Consumers, via Internet: MICU, Finance, Research, QA.]
• autofeed nightly, data stored securely with backup
Data Mining
• The process of automatically discovering useful information in large data repositories
– predictive: find variables to predict unknown or future variables
• e.g., classification of people into likely tax cheaters, credit risks
• e.g., who is at risk of ER bounce-backs?
– descriptive: finding human-interpretable patterns that describe the data
• clustering: e.g., network analysis of depression trials in ClinicalTrials.gov
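A minimal sketch of the descriptive case, assuming each trial record simply lists the interventions it studies. The trial IDs, intervention names, and the `cooccurrence` helper are all hypothetical; a real analysis of ClinicalTrials.gov would need its own extraction step. Counting how often pairs of interventions appear in the same trial is one simple way to get the edge weights for a co-occurrence network.

```python
from collections import Counter
from itertools import combinations

# Hypothetical trial records: each trial lists the interventions it studies.
trials = {
    "T1": {"sertraline", "fluoxetine"},
    "T2": {"sertraline", "placebo"},
    "T3": {"st_johns_wort", "placebo"},
    "T4": {"sertraline", "fluoxetine", "placebo"},
}

def cooccurrence(records):
    """Count how often each unordered pair of interventions shares a trial."""
    counts = Counter()
    for interventions in records.values():
        # Sort so that each pair has one canonical (alphabetical) ordering.
        for pair in combinations(sorted(interventions), 2):
            counts[pair] += 1
    return counts

pair_counts = cooccurrence(trials)
for pair, n in pair_counts.most_common():
    print(pair, n)
```

Pairs that never share a trial have a count of zero, which is the kind of gap a network view makes visible.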
Anti-depressants vs. Herbals
[Figure: co-occurrence network. Annotations: “High Co-occurrence of Antidepressants”; “Low Co-occurrence of Antidepressants and Natural Supplements”.]
Hypothesis Generation from Clinical Data
• Background data mining algorithms running on IDR
– “promising findings” put up on a website where UCSF researchers can “vote” on their interest and/or examine
• Let non-researchers nominate hypotheses
– e.g., a window in Epic for clinicians to suggest a research question
• Collect different data to drive data mining
– e.g., patients can twitter adverse symptoms, may lead to earlier detection for adverse effects of new drugs?
Study Interpretation/Hypothesis Generation
• New hypotheses arise from examining prior data and knowledge
– clinical data, e.g.,
• claims data
• EHR data/data warehouses
– research data (aka the literature)
• basic science research results (e.g., animal studies)
• clinical research (e.g., RCTs, GWAS, observational studies)
– all other data
Biomedical Research Data
• Biomedical research data repositories
– GenBank, UK BioBank, deCODE
– Gene Expression Omnibus (GEO) gene expression and genomic hybridization experiments http://www.ncbi.nlm.nih.gov/geo
– PharmGKB, pharmacogenomics http://pharmgkb.org/
– ClinicalTrials.gov http://clinicaltrials.gov/
• Biomedical literature (i.e., PubMed)
• E.g., “human studyome”
– totality of human studies worldwide
– is the scientific foundation for understanding human health and disease and for advancing human health
Sharing Raw Results
46.4 (39.2-51.2) 45.1 (39.9-50.5)
0.83 (0.79-0.99) 0.91 (0.93-1.04)
2.2 (1.7-3.4) 2.7 (1.1 - 4.1)
110 (87-134) 121 (99-129)
Need Standardized Metadata
• Variable names are metadata
• MeSH, ICD, SNOMED, etc. are standard clinical vocabularies
– ionized calcium: UMLS code C0373561
Age 46.4 (39.2-51.2) 45.1 (39.9-50.5)
ICa 0.83 (0.79-0.99) 0.91 (0.93-1.04)
Creatinine 2.2 (1.7-3.4) 2.7 (1.1 - 4.1)
Weight (lbs) 110 (87-134) 121 (99-129)
Need Metadata About the Study
Garlic Chocolate
Age 46.4 (39.2-51.2) 45.1 (39.9-50.5)
ICa 0.83 (0.79-0.99) 0.91 (0.93-1.04)
Creatinine 2.2 (1.7-3.4) 2.7 (1.1 - 4.1)
Weight (lbs) 110 (87-134) 121 (99-129)
• Study results = “study data”
• Variable names = “study results metadata”
• Data about study design = “study metadata”
Need Study Design Metadata
Garlic Chocolate
Age 46.4 (39.2-51.2) 45.1 (39.9-50.5)
ICa 0.83 (0.79-0.99) 0.91 (0.93-1.04)
Creatinine 2.2 (1.7-3.4) 2.7 (1.1 - 4.1)
Weight (lbs) 110 (87-134) 121 (99-129)
• Randomized trial of garlic vs. chocolate for weight loss? Observational study of ionized calcium levels?
• i.e., need data standardized in an ontology of human studies research
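The three layers named on these slides (study data, study results metadata, study metadata) can be sketched as a small data structure. This is an illustration only, not the Ontology of Clinical Research: the class and field names are hypothetical, the "randomized trial" design is an assumption (the slides deliberately leave the design unknown), and the slides give no units for ICa. The UMLS code C0373561 for ionized calcium is the one quoted on the slide.

```python
from dataclasses import dataclass, field

@dataclass
class Variable:
    """Study results metadata: what a reported number actually measures."""
    label: str          # name as printed in the results table
    concept_code: str   # code from a standard vocabulary
    units: str

@dataclass
class Study:
    """Study metadata (design, arms) wrapped around the study data."""
    design: str
    arms: list
    variables: dict = field(default_factory=dict)
    results: dict = field(default_factory=dict)

    def describe(self, label, arm):
        """Render one number together with the context needed to interpret it."""
        var = self.variables[label]
        value = self.results[label][arm]
        return (f"{var.label} ({var.concept_code}) = {value} {var.units} "
                f"in the {arm} arm of a {self.design}")

study = Study(
    design="randomized trial",  # assumed for illustration only
    arms=["Garlic", "Chocolate"],
    variables={"ICa": Variable("ICa", "C0373561", "(units not stated)")},
    results={"ICa": {"Garlic": 0.83, "Chocolate": 0.91}},
)
print(study.describe("ICa", "Garlic"))
```

Without the `Variable` and `Study` wrappers, 0.83 is just a number; with them, a program can tell what was measured and under which design.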
Computerizing the Studyome
• Computerize human studies design and results for large-scale discovery, reanalysis, reuse
• Based on Ontology of Clinical Research
http://hsdbwiki.org/
Study Interpretation/Hypothesis Generation
• New hypotheses arise from examining prior data and knowledge
– clinical data, e.g.,
• claims data
• EHR data/data warehouses
– research data (aka the literature)
• basic science research results (e.g., animal studies)
• clinical research (e.g., RCTs, GWAS, observational studies)
– all other data
Data Mining with “Big Data”
• Text mining, data mining, model building across ALL data on web
– within and outside biomedicine
– supervised (e.g., neural net) and unsupervised (e.g., clustering) learning
• Current web is non-semantic
– “the web” does not “understand” the meaning of
• content of web pages, or
• data that is sent over the network (e.g., Netflix movie names, or movie content)
– how to go from a web of documents to a web of (computer- understandable) data?
Semantic Web
• All content on or sent over the web is expressed using OWL ontologies
– Web Ontology Language, for describing everything, like “SNOMED for everything”
• see OntoWiki, National Center for Biomedical Ontology
• “Intelligent agents” can roam the web doing smart things for you
– e.g., booking your summer vacation, making appointment with the best cardiothoracic surgeon, re-balancing your retirement portfolio
– learning from your actions, acting on your behalf
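To make "a web of (computer-understandable) data" concrete: semantic web data is commonly expressed as subject-predicate-object triples, and a class hierarchy lets software draw simple inferences. The sketch below uses plain Python rather than a real OWL toolkit; every identifier in it (`study:42`, `hasDesign`, `subClassOf`, and so on) is made up for illustration.

```python
# Each fact is a (subject, predicate, object) triple.
# All identifiers are hypothetical.
triples = {
    ("study:42", "hasDesign", "RandomizedTrial"),
    ("study:42", "hasIntervention", "drug:garlic"),
    ("study:43", "hasDesign", "ObservationalStudy"),
    ("study:43", "measures", "lab:ionized_calcium"),
    ("RandomizedTrial", "subClassOf", "InterventionalStudy"),
}

def match(pattern, facts):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in facts
        if all(p is None or p == v for p, v in zip(pattern, t))
    )

def studies_of_type(kind, facts):
    """Find studies whose design is `kind` or a direct subclass of it."""
    kinds = {kind} | {s for s, _, _ in match((None, "subClassOf", kind), facts)}
    return sorted(
        s for s, _, o in match((None, "hasDesign", None), facts) if o in kinds
    )

print(studies_of_type("InterventionalStudy", triples))
```

The query for interventional studies finds `study:42` even though no triple says so directly; the answer follows from the subclass fact, which is the kind of "understanding" a web of documents cannot offer.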
Semantic Web Databases/Technologies
• www.freebase.com
– free + database = absolutely everything in structured, computable form using OWL ontologies
How Will You be Getting New Ideas?
• Automated discovery of unimaginably large data sets (i.e., the whole web)
• Crowdsourcing
– using distributed human intelligence and the wisdom of crowds to sort the wheat from the chaff
• Will it be better to share your best ideas widely? or to hold them tight?
Outline
• From Here to There (Web 2.0/3.0 eScience)
• Collaborative Care and Web 2.0
• Collaborative Research and Web 2.0/3.0
– study interpretation/hypothesis generation
– study design/execution
– publication and dissemination
• Closing the Loop
• Class Summary
A Research Commons
• Science Commons: open science data on semantic web http://sciencecommons.org/
• Health Commons virtual labs vision http://www.healthcommons.net/
– “buy” scientific elements
• e.g., PhenX, NHGRI’s common phenotypes for GWAS studies
– https://www.phenxtoolkit.org/
– “buy” scientific services like you shop at Amazon
• high-throughput genotyping, array analysis, trial recruitment, survey design
– assemble your team as needed
– IP, material transfer agreements, etc. all handled by Health Commons framework (like e-commerce)
• Predicated on large-scale, open data
On an Open Software Platform
• iPhone-like health care and research “apps”
• Clinical research 24/7/without walls
• Needs technical standards and a market mechanism
• ???
Outline
• From Here to There (Web 2.0/3.0 eScience)
• Collaborative Care and Web 2.0
• Collaborative Research and Web 2.0/3.0
– study interpretation/hypothesis generation
– study design/execution
– publication and dissemination
• Closing the Loop
• Class Summary
Content Production
• Anyone can produce “content” (researchers, clinicians, patients, etc.)
– clinicians: e.g., www.ganfyd.org, a medical wiki for MDs, www.sermo.com, etc.
– patients: tens of thousands of web sites...
– social tagging/social bookmarking (e.g., del.icio.us)
• (content, your-bookmark-tag, your-name) <==> (content, same-bookmark-tag, potential-collaborator)
• All content is open
– e.g., Consolidated Appropriations Act, 2008 (signed December 2007) requires open online access to NIH funded research
– NIH Data Sharing initiative, PubMed Central, etc.
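The social-tagging idea above, that two people who put the same tag on the same content are potential collaborators, can be sketched as a simple grouping over (content, tag, user) triples. The bookmark data and the `potential_collaborators` function are hypothetical; no real bookmarking service's API is implied.

```python
from collections import defaultdict

# Hypothetical bookmarks as (content, tag, user) triples, as on the slide.
bookmarks = [
    ("pmid:111", "long-tail", "you"),
    ("pmid:111", "long-tail", "alice"),
    ("pmid:111", "semantic-web", "bob"),
    ("pmid:222", "phenotype", "you"),
    ("pmid:222", "phenotype", "carol"),
]

def potential_collaborators(me, records):
    """Map each other user to the (content, tag) pairs they share with `me`."""
    users_by_key = defaultdict(set)
    for content, tag, user in records:
        users_by_key[(content, tag)].add(user)

    shared = defaultdict(set)
    for key, users in users_by_key.items():
        if me in users:
            for other in users - {me}:
                shared[other].add(key)
    return dict(shared)

matches = potential_collaborators("you", bookmarks)
print(sorted(matches))
```

Here `bob` bookmarked the same paper but under a different tag, so he is not matched; requiring both content and tag to agree is the stricter reading of the slide's notation.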
Publication
• Publication is self-controlled
– self-archiving, self-publishing in institutional repositories and/or eScience communities (e.g., http://escholarship.org/ for UC)
– e.g., PLoS One, Nature portals -- “the long tail”
• papers published into PLoS platform
• scientists self-aggregate into (niche) communities
• reader ratings & comments “direct” papers to relevant communities
• evaluation is by # of views, # of comments/citations, ratings, link outs, blog mentions, etc.
• Publications should be in computable form
– e.g., using Ontology of Clinical Research for human studies
Disclosure: I’m on PLoS One Advisory Board
Outline
• From Here to There (Web 2.0/3.0 eScience)
• Collaborative Care and Web 2.0
• Collaborative Research and Web 2.0/3.0
– study interpretation/hypothesis generation
– study design/execution
– publication and dissemination
• Closing the Loop
• Class Summary
Big Data + Web 2.0 + Web 3.0
[Figure: diagram with two columns, PATIENT CARE / WELLNESS and RESEARCH, with people added. Labels: Virtual Patient; Transactions; Raw data; Medical knowledge; Decision support; Medical logic; Clinical research transactions; Raw research data; EHRs; CTMSs; “Where clinicians want to stay”. People: Primary Care MD; Patient; Principal Investigator. Base layer: Workflow modeling and support, usability, cognitive support, computer-supported cooperative work (CSCW), etc.]
eCare and eScience
[Figure: layered architecture across three columns: Administrative, Clinical Care, Research. Shared layers: Physical Networking; Standard Communications Protocols (e.g., HL-7). Applications: Practice Management Systems; EHR; Execution; Analysis. Data models: Medical Business Data Model; Clinical Care Data Model; Clinical Study Data Models. Also shown: Open de-identified repositories; OWL Ontologies of Everything.]
Collab Care and Research
• Beyond data storage, security, and access to smarter knowledge-based systems
• Beyond supporting transactions to supporting collaborative sense-making
– visualization, human and automated pattern matching and testing, combining multi-disciplinary worldviews
– “marketplace” of ideas, research methods, research tools
• Continuous learning by all participants
– teachable moments for new methods, findings, hypotheses
– tighter coupling of front-line clinical evidence needs to research questions
Open Discussion
• How to balance standardization and comparability (e.g., of EHR notes, of research outcomes) with flexibility/innovation?
• Biomedical researchers are conservative
– will all this web 2.0/3.0 stuff pass right by us?
• How will this change what you do/how you think, if at all?
• What would you like to see from academia/UCSF to help you stay as competitive in research as possible?
• ???
Outline
• From Here to There (Web 2.0/3.0 eScience)
• Collaborative Care and Web 2.0
• Collaborative Research and Web 2.0/3.0
– study interpretation/hypothesis generation
– study design/execution
– publication and dissemination
• Closing the Loop
• Class Summary
Summary
• IT focuses on storing, accessing, and exchanging data
• Informatics is use of computers to make sense of data
• The more “computable” the information, the more the computer can do for us
• ...not just us individually, but together as a community of care and science
Computers Must Interoperate
• In a networked world, data and actions must be shared across people and computers
– syntactic interoperation: a common grammar for machines talking to each other in biomedicine (e.g., HL7)
– semantic interoperation: predictable and meaningful exchange of common meaning
• requires standard vocabularies and standard data models
• SNOMED most comprehensive but use is unproven
• Other challenging things that need standardization in biomedicine
– “common data elements” in research
– a standard EHR data model so all EHRs “look” alike
– standard protocol models for human studies, etc.
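A rough sketch of the difference between the two kinds of interoperation, under stated assumptions: the pipe-delimited message layout, site names, and local lab codes are all invented for illustration and are not HL7. Both sites share a syntax, so `parse` works on every message, but results only become comparable once local codes are mapped to a shared concept (here the UMLS code C0373561 for ionized calcium quoted earlier in the slides).

```python
# Two hypothetical sites send results in the same pipe-delimited syntax
# (site|local_code|value), but each uses its own local lab codes.
messages = [
    "siteA|ICA|0.83",
    "siteB|CA_ION|0.91",
    "siteB|XYZ|5.0",
]

# Hypothetical local-to-standard mapping.
LOCAL_TO_STANDARD = {
    ("siteA", "ICA"): "C0373561",
    ("siteB", "CA_ION"): "C0373561",
}

def parse(message):
    """Syntactic step: split a message into its three fields."""
    site, code, value = message.split("|")
    return site, code, float(value)

def normalize(message):
    """Semantic step: translate the local code to the shared concept."""
    site, code, value = parse(message)
    concept = LOCAL_TO_STANDARD.get((site, code))  # None if unmapped
    return {"site": site, "concept": concept, "value": value}

normalized = [normalize(m) for m in messages]
comparable = [r for r in normalized if r["concept"] == "C0373561"]
print(comparable)
```

The third message parses without error yet stays unmapped, which shows that a shared grammar alone does not give shared meaning.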
State of Health IT Use
• EHR adoption still low
– barriers include finances, lack of organizational change expertise, fragmentation of health care system, misaligned incentives
• Recovery Act will spur EHR adoption, for good or ill
• EHR and data warehouses can but don’t always help research
• Limited success of decision support systems
• Fundamental tradeoff of coding effort vs. “smartness” of system limits both EHR and CDSS return on investment
Take-Home Message
• Informatics helps make sense of data and knowledge
– is necessary for better care and research
• Today’s technologies promise transactional support
– major barriers are economic, policy, and workflow related
• Need brand new technologies for other 3/4 of Big Picture
• Disruptive change to eScience seems quite possible
– as we go from data processing to concept processing
– as mobile technologies break down time and space barriers
– as social computing takes off
Big Data + Web 2.0 + Web 3.0
[Figure: diagram with two columns, PATIENT CARE / WELLNESS and RESEARCH, with people added. Labels: Virtual Patient; Transactions; Raw data; Medical knowledge; Decision support; Medical logic; Clinical research transactions; Raw research data; EHRs; CTMSs; “Where clinicians want to stay”. People: Primary Care MD; Patient; Principal Investigator. Base layer: Workflow modeling and support, usability, cognitive support, computer-supported cooperative work (CSCW), etc.]