Open source community for “Real World Data” Analysis
JANUARY 26, 2017, SCOPE SUMMIT, MIAMI
Kees van Bochove, CEO & Founder, The Hyve – @keesvanbochove
With thanks to Patrick Ryan, Nigel Hughes & Bart Vannieuwenhuyse from Janssen for slides & feedback!
Agenda
1. Introduction: The Hyve & Open Source
2. What’s OHDSI & what can it do for you?
3. Under the hood: OMOP Data Model & Mapping Process
4. Showcase: OHDSI data analytics tools
5. The application of OMOP and OHDSI in IMI EMIF
1. INTRODUCTION: THE HYVE
The Hyve
• Professional support for open source software for bioinformatics & medical informatics software, such as tranSMART, cBioPortal, i2b2, Galaxy, CKAN and OHDSI
Mission: Enable pre-competitive collaboration in life science R&D by leveraging open source software
Core values: Share, Reuse, Specialize
Office locations: Utrecht, Netherlands; Cambridge, MA, United States
Services: Software development, Data science services, Consultancy, Hosting / SLAs
Fast-growing: Started in 2012, 40 people by now
Interdisciplinary team
Software engineers, data scientists, project managers & staff; expertise in bioinformatics, medical informatics, software engineering, biostatistics etc.
Open Source
• Source code openly accessible and reusable for everyone
• Enables pre-competitive collaboration: both academics and industry can use and enhance it
• Transparency: verification (scientific as well as IT security) can be done by anyone, no ‘black box’
3 Health Data Areas The Hyve is active in
• Translational Research Data (‘Clinical & bioinformatics data’)
• Population Health Data (‘Real world data’)
• Personal Health Data (‘Mobile & sensors data’)
Example (RWD) projects:
2. WHAT IS OHDSI?
OBSERVATIONAL HEALTH DATA SCIENCES AND INFORMATICS
What is OHDSI to you?
• OHDSI is a scientific community to develop best practices for observational research studies
• OHDSI is a data network bringing together data from over 650 million patients worldwide to execute studies
• OMOP is an open data model and OHDSI is a suite of open source software tools for analysis (epidemiology, but also e.g. inclusion/exclusion criteria feasibility)
Questions OHDSI can answer given a set of patient journeys
Questions OHDSI can answer
Clinical characterization
• Which treatment did patients choose after diagnosis?
• Which patients chose which treatments?
• How many patients experienced the outcome after treatment?
Population-level effect estimation
• Does one treatment cause the outcome more than an alternative?
• Does treatment cause outcome?
Patient-level prediction
• What is the probability I will develop the disease?
• What is the probability I will experience the outcome?
How are patients with major depressive disorder treated in real world data (250M)?
http://bit.ly/2jYCGkI
Informing Clinical Trial Design
• Designing and testing inclusion/exclusion criteria for trials
• Performing observational studies as a basis for choosing effective randomized clinical trial designs and targets
• Elucidating real world use of medicines and treatments for safety purposes
3. UNDER THE HOOD: THE OMOP DATA MODEL & MAPPING PROCESS
OMOP & OHDSI Tools - Overview
• OMOP: Common Data Model for observational healthcare data: persons, drugs, procedures, devices, conditions etc.
• OHDSI: Large-scale analytics tools for observational data
An open source community, developing among others:
• Tools to support the ETL / mapping process into OMOP (White Rabbit etc.)
• Tools to perform analytics: e.g. Achilles for data profiling, Calypso for feasibility assessment → now being integrated into ATLAS
www.omop.org
www.ohdsi.org
OMOP Common Data Model v5.0
• OMOP = Observational Medical Outcomes Partnership
• CDM = Common Data Model
• SQL Tables
OMOP-CDM Person data table
Mapping the source data to OMOP CDM
Steps: ETL design, ETL implementation, ETL verification
Tools used:
• White Rabbit: source data inventory
• Rabbit in a Hat: map source tables to CDM structure (syntactic mapping)
• Usagi: map source terms to CDM ontologies (vocabularies) (semantic mapping)
• Achilles (ETL verification): review database profiles; review data quality assessment (Achilles Heel)
Output from White Rabbit
Tab “Overview”: fields for each table
Tab “Medication”: per table, the values in each field and their frequencies
=Medication name
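The kind of per-field value/frequency listing described for the White Rabbit output can be approximated in a few lines. This is only an illustrative sketch, not White Rabbit's implementation; the rows, field names and the function name `scan_table` are made up for the example.

```python
from collections import Counter

# Hypothetical source rows; field names and values are invented for illustration.
medication_rows = [
    {"medication_name": "Risperdal", "unit": "MG"},
    {"medication_name": "Risperdal", "unit": "MG"},
    {"medication_name": "Forsteo", "unit": "mikg"},
]


def scan_table(rows):
    """Return, for each field, a Counter of the values seen in that field."""
    frequencies = {}
    for row in rows:
        for field, value in row.items():
            frequencies.setdefault(field, Counter())[value] += 1
    return frequencies


report = scan_table(medication_rows)
for field, counts in report.items():
    # most_common() lists values from most to least frequent
    print(field, counts.most_common())
```

A listing like this is what makes it possible to spot unexpected codes or free-text values before the ETL is designed.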
Mapping of tables to CDM
• All coded items (gender, race etc.) need to be mapped
• Medication, diagnosis and procedure values are mapped to the appropriate ontology (RxNorm, ICD-9 etc.)
Map terms to target vocabularies
NHANES Gender code | NHANES Gender description | Equivalent OMOP SOURCE_CODE | OMOP SOURCE_CODE_DESCRIPTION | SOURCE_TO_CONCEPT_MAP_ID
.                  | missing                   | U                           | UNKNOWN                      | 8551
1                  | Male                      | M                           | MALE                         | 8507
2                  | Female                    | F                           | FEMALE                       | 8532
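In an ETL script, a small coded field like this usually ends up as a lookup table. A minimal sketch using the three rows of the gender table; the function name and the choice to fall back to UNKNOWN (8551) for unexpected codes are assumptions of this example, not something the slide prescribes.

```python
# NHANES gender code -> target ID, copied from the three rows of the gender table.
GENDER_CODE_TO_CONCEPT_ID = {
    ".": 8551,  # missing -> UNKNOWN
    "1": 8507,  # Male
    "2": 8532,  # Female
}

UNKNOWN_CONCEPT_ID = 8551


def map_gender(source_code):
    """Map a source gender code to its target ID.

    Unexpected codes fall back to UNKNOWN; this fallback is an assumption.
    """
    key = str(source_code).strip()
    return GENDER_CODE_TO_CONCEPT_ID.get(key, UNKNOWN_CONCEPT_ID)


print(map_gender(1), map_gender("2"), map_gender("."))
```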
Overview of ontologies used in OMOP
over 80 healthcare vocabularies mapped
4. OHDSI – ANALYTICS TOOLS
Tools on GitHub
Work with the community
Ask the community
What can I do with OHDSI tools?
• Explore & QC the mapped data
• Build cohort definitions using concept sets
• Look at patient profiles
• Run and evaluate queries for clinical study feasibility assessment
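To give a feel for what a feasibility query over CDM-shaped data looks like, here is a toy sketch using an in-memory SQLite database. It is not how ATLAS builds its queries. The `person` and `condition_occurrence` tables are reduced to a few columns, the rows are invented, and the two condition concept IDs (1001, 2002) are placeholders rather than real vocabulary concepts; 8507 and 8532 are the gender IDs from the mapping table earlier in the deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        gender_concept_id INTEGER,
        year_of_birth INTEGER
    );
    CREATE TABLE condition_occurrence (
        person_id INTEGER,
        condition_concept_id INTEGER
    );
    -- Invented toy data
    INSERT INTO person VALUES (1, 8532, 1950), (2, 8507, 1980),
                              (3, 8532, 1975), (4, 8532, 1990);
    INSERT INTO condition_occurrence VALUES (1, 1001), (2, 1001),
                                            (3, 1001), (3, 2002), (4, 2002);
    """
)

INCLUSION_CONCEPT = 1001  # placeholder ID for the condition required for inclusion
EXCLUSION_CONCEPT = 2002  # placeholder ID for the condition that excludes a patient

# Count female patients who have the inclusion condition and lack the exclusion one.
query = """
SELECT COUNT(DISTINCT p.person_id)
FROM person p
JOIN condition_occurrence inc
  ON inc.person_id = p.person_id
 AND inc.condition_concept_id = ?
WHERE p.gender_concept_id = 8532
  AND NOT EXISTS (
        SELECT 1
        FROM condition_occurrence exc
        WHERE exc.person_id = p.person_id
          AND exc.condition_concept_id = ?
  )
"""
eligible = conn.execute(query, (INCLUSION_CONCEPT, EXCLUSION_CONCEPT)).fetchone()[0]
print("eligible patients:", eligible)
```

Because every site maps to the same tables and concept IDs, the same query text can in principle be run unchanged against each data source.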
ACHILLES: Database overview
ACHILLES: Achilles Heel Report
ACHILLES: Conditions Overview
ATLAS: Vocabulary Search
ATLAS: Concept Set Definition
ATLAS: Cohort Definition
ATLAS: Individual Patient Profile
Inclusion/Exclusion Query Results
Slide from P. Ryan, Janssen
5. IMI EUROPEAN MEDICAL INFORMATION FRAMEWORK
To become the trusted European hub for health care data intelligence,
enabling new insights into diseases and treatments
EMIF vision
Discover
Assess
Reuse
The real story of the treatments in clinical practice
The value of healthcare data for secondary uses in clinical research and development — Gary K. Mallow, Merck, HIMSS 2012
[Chart: # patient experiences / records (axis from 1,000 to 1 million) against years (1 to 9), with the stages R&D, Product Launch and Phase IV marked]
The “burning platform” for life sciences
Pharma-owned highly controlled clinical trials data
Clinical practice, patients, payers and providers own the data
Challenge
Today, Pharma doesn’t have ready access to this data, yet insights for safety, CER and other areas are within this clinical domain, which includes medical records, pharmacy, labs, claims, radiology etc.
Data available through EMIF consortium
• Large variety in “types” of data
• Data is available from more than 53 million subjects from seven EU countries, including
Primary care data sets
Hospital data
Administrative data
Regional record-linkage systems
Registries and cohorts (broad and disease specific)
Biobanks
>25,000 subjects in AD cohorts
>90,000 subjects in metabolic cohorts
EMIF Platform Design
[Diagram: data sources (1° care, 2° care, hospital, admin, regional, registries & cohorts, biobanks, paediatric; >40 million subjects) are extracted per site (Site Y, Site Z) via data access modules, with a common ontology and de-identification, into the EMIF platform solution. Labels also shown: governance, data owners, researchers, user admin, remote user 1, remote user 2.]
Available data sources in EMIF
[Chart: approximate total (cumulative) number of subjects per data source, axis from 100 to 100,000,000. Data sources shown: MAAS, SDR, EGCUT, PEDIANET, SCTS, IMASIS, HSD, AUH, IPCI, ARS, SIDIAP, PHARMO, THIN]
EMIF-Platform
EMIF-Available Data Sources; EXAMPLES
[Figure: subject counts for the example data sources, status Jan 2016: 1K, 2K, 52K, 400K, 475K, 2.8M, 2.3M, 10M, 3.6M, 1.6M, 1M, 12M, 6M]
Catalogue with available data sources
https://emif-catalogue.eu
Just released last week!
see www.emif.eu
Automatic Mapping of Drug Concepts to the RxNorm Vocabulary
Maxim Moinat* [1], Lars Pedersen [2], Jolanda Strubel [1], Marinel Cavelaars [1], Kees van Bochove [1], Peter Rijnbeek [3], Michel van Speybroeck [4], Martijn Schuemie [4]
[1] The Hyve, Utrecht, The Netherlands; The Hyve, Cambridge, United States
[2] Aarhus University Hospital, Aarhus, Denmark
[3] Erasmus MC, Rotterdam, The Netherlands
[4] Janssen Pharmaceuticals, Inc.
*E-mail: [email protected].
1. Background
Mapping source concepts to the standard concepts in the OMOP vocabularies is one of the most time-consuming tasks during the transformation to the OMOP Common Data Model. Drug mapping is particularly challenging, because different components have to be mapped: ingredient, dose form and strength. As part of the European Medical Information Framework (EMIF) project, Danish population health data are mapped to the OMOP CDM, including the local drug codes. The Hyve assists in creating a script to automatically map a set of 4754 drugs to the RxNorm vocabulary. The input data contains ATC codes, dosage forms, numerical strengths and strength units. Two examples are shown in Figure 1.
The mapping procedure presented here is based on the drug mapping for the Japan Medical Data Center Claims Database [I].
▲ Figure 2: Mapping concepts and relationships with an example for each concept.
We empower scientists by building on open source software
▲ Figure 3: Mapping results. Percentages are based on the count of unique drugs (red bars, n= 4754) or on the number of prescriptions (blue bars, n= 1,093,056). The striped bars show the percentage of manually mapped drugs and prescriptions. Each drug is mapped to only one of the concept classes. If a drug could be mapped to Clinical Drug, then it is not included in the percentages of Clinical Drug Component or Clinical Drug Form. It can be seen that 91% of the drugs could be mapped and 67.2% without any loss of information (red bars).
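The two denominators used in Figure 3 (unique drugs versus prescriptions) can give quite different coverage figures, because a few frequently prescribed drugs dominate the prescription count. A minimal sketch with invented numbers, not the poster's data; the function name and input layout are assumptions of this example.

```python
def coverage(drugs):
    """Compute mapping coverage two ways.

    drugs: list of (mapped, prescription_count) tuples, one per unique drug.
    Returns (percent of unique drugs mapped, percent of prescriptions mapped).
    """
    total_drugs = len(drugs)
    total_prescriptions = sum(count for _, count in drugs)
    mapped_drugs = sum(1 for mapped, _ in drugs if mapped)
    mapped_prescriptions = sum(count for mapped, count in drugs if mapped)
    by_drug = 100.0 * mapped_drugs / total_drugs
    by_prescription = 100.0 * mapped_prescriptions / total_prescriptions
    return by_drug, by_prescription


# Invented toy input: two mapped drugs and one rarely prescribed unmapped drug.
toy_drugs = [(True, 900), (True, 50), (False, 50)]
by_drug, by_prescription = coverage(toy_drugs)
print(f"{by_drug:.1f}% of drugs, {by_prescription:.1f}% of prescriptions")
```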
4. Challenges
Many unmappable drugs consist of multiple ingredients and cannot be automatically mapped to one RxNorm ingredient. The ATC concept is often too general, see Example 2 in Figure 1. The automatic mapping from ATC to RxNorm ingredient should be revised to accommodate these drugs.
Other challenges include:
● Synonymous dose forms (e.g. ‘Cream’ and ‘Topical Cream’)
● Numerator and denominator unit (e.g. ‘GL’ to ‘gram’ and ‘liter’)
● Strength derivation (e.g. 8 gram to 8000 milligram)
● Duplicate mappings (e.g. one drug to multiple Drug Forms)
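Two of the challenges listed, splitting compound units and deriving strengths in a common unit, come down to small lookup tables. A hedged sketch: the unit codes `G` and `MG` and both lookup tables are assumptions made for this example, and only the ‘GL’ and 8 gram cases come from the poster. The rounding to two decimals follows step 3 of the mapping procedure.

```python
# Conversion factors to milligram; only the units needed for this sketch (assumed codes).
TO_MILLIGRAM = {"G": 1000.0, "MG": 1.0}

# Hypothetical lookup for compound source units: (numerator, denominator).
COMPOUND_UNITS = {"GL": ("gram", "liter")}


def strength_in_mg(value, unit):
    """Convert a strength to milligram, rounded to two decimals."""
    factor = TO_MILLIGRAM[unit.upper()]
    return round(value * factor, 2)


def split_unit(unit):
    """Split a compound unit into (numerator, denominator).

    Units not in the compound lookup are returned unchanged with no denominator.
    """
    return COMPOUND_UNITS.get(unit.upper(), (unit, None))


print(strength_in_mg(8, "G"))  # 8 gram expressed in milligram
print(split_unit("GL"))
```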
2. Mapping Procedure
The mapping uses the RxNorm hierarchy and consists of four steps (see Figure 2).
1. Drugs are mapped to RxNorm Ingredient via the 5th level ATC code. The OMOP relationship ‘ATC - RxNorm’ is used for this purpose.
2. Dose form is added to the ingredient level, to map to Clinical Drug Form level.
3. The information on drug strength (including unit) is added to map to Clinical Drug Component. The strength is rounded to two decimals.
4. The above three mappings are combined to map to a Clinical Drug concept.
Manual mappings are added for a small number of frequently prescribed drugs. The Danish dose forms and units are also manually mapped to OMOP concepts.
Ingr. | Form | Str. | Level
✓     | ✓    | ✓    | Clinical Drug
✓     | ✓    |      | Clinical Drug Form
✓     |      | ✓    | Clinical Drug Component
✓     |      |      | Ingredient
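The relation between matched components and target level can be written as a small decision function. This is a simplified sketch, not the authors' script: it takes three booleans saying which components could be matched, whereas a real mapping also has to check that the combined concept actually exists in RxNorm. The function name is an assumption.

```python
def target_concept_class(has_ingredient, has_form, has_strength):
    """Pick the most specific RxNorm concept class the matched components allow.

    Returns None when no ingredient was matched, since every level builds on it.
    """
    if not has_ingredient:
        return None
    if has_form and has_strength:
        return "Clinical Drug"
    if has_strength:
        return "Clinical Drug Component"
    if has_form:
        return "Clinical Drug Form"
    return "Ingredient"


# A drug with ingredient, dose form and strength all matched
print(target_concept_class(True, True, True))
# A drug where only the ingredient and dose form were matched
print(target_concept_class(True, True, False))
```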
6. Conclusions
The majority of the Danish drug concepts have been mapped automatically to the RxNorm vocabulary (red solid bars in Figure 3). Further improvements can be made by extending the manual mappings and supporting multi-ingredient drugs.
However, a 100% mapping is not achievable with this method. Our work clearly demonstrates the need for the addition of an RxNorm extension to enable the mapping of currently missing drugs, forms and strengths in the standard OMOP vocabulary.
5. Results
▲ Figure 1: Examples of input data. Example 1 is successfully mapped automatically. Example 2 consists of two ingredients and has an ATC concept that could not be mapped to an RxNorm concept.
3. RxNorm extension
A major limitation of the mapping is the incomplete RxNorm vocabulary. Multiple drugs in this Danish dataset do not have a counterpart in RxNorm. An example is Litarex 6 mMol, where RxNorm only contains strengths in mg. To be able to map these drugs, the OHDSI community has proposed to use an extension of RxNorm, called Pseudo-RxNorm. Work is in progress to add as many drugs as possible to Pseudo-RxNorm for a complete mapping.
Example 1
➢ Risperdal
➢ N05AX08
➢ Filmovertrukne tabletter
➢ 0.5
➢ MG
Mapped to: Risperidone 0.5 MG Oral Tablet (RxNorm Clinical Drug)
Example 2
➢ Fortzaar
➢ C09DA06
➢ depottabletter
➢ 100 + 25 mg
Candesartan and diuretics (ATC code)
➢ Forsteo
➢ H06AA02
➢ Injektionsvæske, opløsning
➢ 20 mikg/80 mikrol.
Teriparatide Injectable Solution (Clinical Drug)
[Figure 3 bar labels: 3.9%, 10.3%]
References: [I] Schuemie M, Kubota K, “JMDC drug to OMOP Vocabulary mapping”, 2014 August 26
Use of OMOP/OHDSI provides EMIF with:
• A uniform way to perform suitability and feasibility queries across multiple diverse European data sources
• An entry point to quickly initiate and perform observational studies within one or more data sources
• Direct insight & dashboarding of data for data owners (e.g. national registries, hospitals)
The goal is patient benefit
“We need to learn from experience and find ways to unite the large volumes of data in Europe. At the end of the day, we are in this for better health care.”
Prof. Johan van der Lei, Erasmus MC University Medical Center, Co-coordinator EMIF-Platform