Open source community for “Real World Data” Analysis

JANUARY 26, 2017, SCOPE SUMMIT, MIAMI

Kees van Bochove, CEO & Founder, The Hyve – @keesvanbochove

With thanks to Patrick Ryan, Nigel Hughes & Bart Vannieuwenhuyse from Janssen for slides & feedback!

SCOPE Summit - Applying the OMOP data model & OHDSI software to national European health data registries: the IMI EMIF project


Agenda

1.  Introduction: The Hyve & Open Source

2.  What’s OHDSI & what can it do for you?

3.  Under the hood: OMOP Data Model & Mapping Process

4.  Showcase: OHDSI data analytics tools

5.  The application of OMOP and OHDSI in IMI EMIF

1. INTRODUCTION: THE HYVE

The Hyve

• Professional support for open source software for bioinformatics & medical informatics software, such as tranSMART, cBioPortal, i2b2, Galaxy, CKAN and OHDSI

Mission: Enable pre-competitive collaboration in life science R&D by leveraging open source software

Core values: Share, Reuse, Specialize

Office locations: Utrecht, Netherlands; Cambridge, MA, United States

Services: Software development, Data science services, Consultancy, Hosting / SLAs

Fast-growing: Started in 2012, 40 people by now

Interdisciplinary team

Software engineers, data scientists, project managers & staff; expertise in bioinformatics, medical informatics, software engineering, biostatistics etc.

Open Source

• Source code openly accessible and reusable for everyone

• Enables pre-competitive collaboration: both academics and industry can use and enhance it

• Transparency: verification (scientific as well as IT security) can be done by anyone, no ‘black box’


3 Health Data Areas The Hyve is active in

• Translational Research Data (‘Clinical & bioinformatics data’)

• Population Health Data (‘Real world data’)

• Personal Health Data (‘Mobile & sensors data’)

Example (RWD) projects:


2. WHAT IS OHDSI?

OBSERVATIONAL HEALTH DATA SCIENCES AND INFORMATICS


What is OHDSI to you?

• OHDSI is a scientific community to develop best practices for observational research studies

• OHDSI is a data network bringing together data from over 650 million patients worldwide to execute studies

• OMOP is an open data model and OHDSI is a suite of open source software tools for analysis (epidemiology, but also e.g. inclusion/exclusion criteria feasibility)


Questions OHDSI can answer given a set of patient journeys


Questions OHDSI can answer

Clinical characterization

• Which treatment did patients choose after diagnosis?

• Which patients chose which treatments?

• How many patients experienced the outcome after treatment?

Population-level effect estimation

• Does treatment cause outcome?

• Does one treatment cause the outcome more than an alternative?

Patient-level prediction

• What is the probability I will develop the disease?

• What is the probability I will experience the outcome?


How are patients with major depressive disorder treated in real world data (250M)?

http://bit.ly/2jYCGkI


Informing Clinical Trial Design

• Designing and testing inclusion/exclusion criteria for trials

• Performing observational studies as a basis for choosing effective randomized clinical trial designs and targets

• Elucidating real world use of medicines and treatments for safety purposes
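
As a rough illustration of the first point, testing inclusion/exclusion criteria against observational data amounts to counting how many patients remain after each criterion is applied. The sketch below is not the OHDSI implementation: the patient records, the criteria and the `attrition` helper are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Patient:
    person_id: int
    age: int
    conditions: set = field(default_factory=set)
    drugs: set = field(default_factory=set)


# Toy records, invented for illustration only
PATIENTS = [
    Patient(1, 34, {"major depressive disorder"}, {"sertraline"}),
    Patient(2, 71, {"major depressive disorder", "dementia"}),
    Patient(3, 52, {"hypertension"}, {"lisinopril"}),
    Patient(4, 45, {"major depressive disorder"}),
    Patient(5, 16, {"major depressive disorder"}),
]

# Hypothetical criteria as (label, predicate); a patient must pass every one
CRITERIA = [
    ("diagnosed with major depressive disorder",
     lambda p: "major depressive disorder" in p.conditions),
    ("aged 18 to 65", lambda p: 18 <= p.age <= 65),
    ("no dementia diagnosis (exclusion)", lambda p: "dementia" not in p.conditions),
    ("no sertraline exposure (exclusion)", lambda p: "sertraline" not in p.drugs),
]


def attrition(patients, criteria):
    """Apply the criteria in order and record how many patients remain after each."""
    remaining = list(patients)
    steps = [("all patients", len(remaining))]
    for label, keep in criteria:
        remaining = [p for p in remaining if keep(p)]
        steps.append((label, len(remaining)))
    return steps, [p.person_id for p in remaining]


steps, eligible_ids = attrition(PATIENTS, CRITERIA)
for label, count in steps:
    print(f"{count:>3}  {label}")
```

Reporting the count after every step, rather than only the final number, shows which criterion costs the most patients, which is usually the question a trial designer wants answered.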

3. UNDER THE HOOD: THE OMOP DATA MODEL & MAPPING PROCESS

OMOP & OHDSI Tools - Overview

• OMOP: Common Data Model for observational healthcare data: persons, drugs, procedures, devices, conditions etc.

• OHDSI: Large-scale analytics tools for observational data. An open source community, developing among others:

  • Tools to support the ETL / mapping process into OMOP (White Rabbit etc.)

  • Tools to perform analytics: e.g. Achilles for data profiling, Calypso for feasibility assessment → now being integrated into ATLAS

www.omop.org

www.ohdsi.org


OMOP Common Data Model v5.0

• OMOP = Observational Medical Outcomes Partnership

• CDM = Common Data Model

• SQL Tables


OMOP-CDM Person data table
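
To give an impression of what such a table looks like, the sketch below creates a reduced PERSON table in an in-memory SQLite database and counts persons per gender concept. The column names follow the OMOP CDM v5 PERSON table, but only a subset of columns is shown, the rows are invented, and SQLite is used purely for convenience.

```python
import sqlite3

# Subset of the OMOP CDM v5 PERSON columns; the full table has more fields.
PERSON_DDL = """
CREATE TABLE person (
    person_id            INTEGER PRIMARY KEY,
    gender_concept_id    INTEGER NOT NULL,
    year_of_birth        INTEGER NOT NULL,
    month_of_birth       INTEGER,
    day_of_birth         INTEGER,
    race_concept_id      INTEGER NOT NULL,
    ethnicity_concept_id INTEGER NOT NULL,
    person_source_value  TEXT,
    gender_source_value  TEXT
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(PERSON_DDL)

# Invented rows. 8507 (MALE) and 8532 (FEMALE) are standard OMOP gender concepts;
# concept id 0 is used here for race and ethnicity to mean "no matching concept".
rows = [
    (1, 8507, 1970, 5, 17, 0, 0, "SRC-001", "1"),
    (2, 8532, 1985, None, None, 0, 0, "SRC-002", "2"),
    (3, 8532, 1992, 11, 3, 0, 0, "SRC-003", "2"),
]
conn.executemany("INSERT INTO person VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows)

# Count persons per gender concept
counts = dict(
    conn.execute(
        "SELECT gender_concept_id, COUNT(*) FROM person GROUP BY gender_concept_id"
    ).fetchall()
)
print(counts)
```

Note that the original source value is kept next to the mapped concept id, so the mapping can always be traced back to the source data.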


Mapping the source data to OMOP CDM

ETL design (tools used)

• White Rabbit: source data inventory

• Rabbit in a Hat: map source tables to CDM structure (syntactic mapping)

• Usagi: map source terms to CDM ontologies (vocabularies) (semantic mapping)

ETL implementation

ETL verification (tools used)

• Achilles: review database profiles; review data quality assessment (Achilles Heel)

Output from White Rabbit

• Tab “Overview”: fields for each table

• Tab “Medication”: per table, values in fields and frequencies (callout on the slide: = Medication name)
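
A scan report of this kind can be approximated in a few lines. The sketch below is not White Rabbit itself: it profiles an invented in-memory "Medication" table and counts how often each value occurs per field, which is the kind of information the two tabs summarise. The `scan_table` helper and the rows are assumptions made for illustration.

```python
from collections import Counter

# Invented source rows standing in for a "Medication" table
medication_rows = [
    {"patient_id": "1", "medication_name": "Risperdal", "unit": "MG"},
    {"patient_id": "2", "medication_name": "Risperdal", "unit": "MG"},
    {"patient_id": "2", "medication_name": "Forsteo", "unit": ""},
    {"patient_id": "3", "medication_name": "Fortzaar", "unit": "mg"},
]


def scan_table(rows):
    """Return {field name: Counter(value -> frequency)} for a list of dict rows."""
    profile = {}
    for row in rows:
        for field_name, value in row.items():
            profile.setdefault(field_name, Counter())[value] += 1
    return profile


profile = scan_table(medication_rows)

# "Overview" tab: which fields the table has
print("Fields:", sorted(profile))

# Per-table tab: values and their frequencies for one field
for value, freq in profile["medication_name"].most_common():
    print(f"{value}\t{freq}")
```

Value frequencies make inconsistencies visible early, for instance the mixed `MG` / `mg` / empty units in the toy data, before any mapping work starts.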


Mapping of tables to CDM

Map terms to target vocabularies

• All coded items (gender, race etc.) need to be mapped

• Mapping of medication, diagnosis and procedure values to the appropriate ontology (RxNorm, ICD-9 etc.)

NHANES gender code   NHANES gender description   Equivalent OMOP SOURCE_CODE   OMOP SOURCE_CODE_DESCRIPTION   SOURCE_TO_CONCEPT_MAP_ID
.                    missing                     U                             UNKNOWN                        8551
1                    Male                        M                             MALE                           8507
2                    Female                      F                             FEMALE                         8532
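
In code, a term mapping like the NHANES gender example on this slide is a plain lookup. The sketch below transcribes the three rows into a dictionary; the `map_gender` helper and its fallback to UNKNOWN for unlisted codes are assumptions, not part of the slide.

```python
# Lookup transcribed from the NHANES gender example on the slide
GENDER_MAP = {
    ".": {"source_code": "U", "description": "UNKNOWN", "concept_id": 8551},
    "1": {"source_code": "M", "description": "MALE", "concept_id": 8507},
    "2": {"source_code": "F", "description": "FEMALE", "concept_id": 8532},
}


def map_gender(nhanes_code):
    """Map an NHANES gender code to a concept id.

    Codes that are not in the table fall back to UNKNOWN (an assumption).
    """
    entry = GENDER_MAP.get(str(nhanes_code).strip(), GENDER_MAP["."])
    return entry["concept_id"]


source_values = [1, 2, ".", 2, 9]
mapped = [map_gender(value) for value in source_values]
print(mapped)
```

Whether an unlisted code should map to UNKNOWN or be flagged for manual review is an ETL design decision; the fallback here only keeps the example short.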


Overview of ontologies used in OMOP

over 80 healthcare vocabularies mapped

4. OHDSI – ANALYTICS TOOLS

Tools on GitHub


Work with the community


Ask the community


What can I do with OHDSI tools?

• Explore & QC the mapped data

• Build cohort definitions using concept sets

• Look at patient profiles

• Run and evaluate queries for clinical study feasibility assessment
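
Building a cohort from a concept set boils down to expanding the set to its descendant concepts and selecting the persons with a matching record. The sketch below illustrates that idea only: the hierarchy, the concept ids and the condition records are invented placeholders, and nothing here calls ATLAS.

```python
# Invented parent -> children hierarchy; the ids are placeholders, not real OMOP concept ids
CHILDREN = {
    100: [110, 120],  # a broad condition and its two subtypes
    110: [111],
    111: [],
    120: [],
    200: [],          # an unrelated condition
}


def expand_concept_set(roots, children):
    """Return the root concepts plus all of their descendants."""
    expanded, stack = set(), list(roots)
    while stack:
        concept_id = stack.pop()
        if concept_id not in expanded:
            expanded.add(concept_id)
            stack.extend(children.get(concept_id, []))
    return expanded


# Invented (person_id, condition_concept_id) records
CONDITION_RECORDS = [(1, 111), (2, 200), (3, 120), (3, 200), (4, 100)]

concept_set = expand_concept_set([100], CHILDREN)
cohort = sorted({person for person, concept in CONDITION_RECORDS if concept in concept_set})
print(sorted(concept_set), cohort)
```

Defining the cohort on the broad concept and letting the hierarchy supply the subtypes keeps the definition short and independent of how finely each source codes its diagnoses.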


ACHILLES: Database overview


ACHILLES: Achilles Heel Report


ACHILLES: Conditions Overview


ATLAS: Vocabulary Search


ATLAS: Concept Set Definition


ATLAS: Cohort Definition


ATLAS: Individual Patient Profile


Inclusion/Exclusion Query Results

Slide from P. Ryan, Janssen

5. IMI EUROPEAN MEDICAL INFORMATION FRAMEWORK

EMIF vision

To become the trusted European hub for health care data intelligence, enabling new insights into diseases and treatments

Discover

Assess

Reuse

The real story of the treatments in clinical practice

The value of healthcare data for secondary uses in clinical research and development (Gary K. Mallow, Merck, HIMSS 2012)

[Figure: The “burning platform” for life sciences. Vertical axis: number of patient experiences / records (ticks at 1,000, 10,000, 100,000 and 1 million). Horizontal axis: years (1 to 9). Timeline labels: R&D, Product Launch, Phase IV. Two regions: Pharma-owned highly controlled clinical trials data; clinical practice, patients, payers and providers own the data.]

Challenge: Today, Pharma doesn’t have ready access to this data, yet insights for safety, CER and other areas are within this clinical domain, which includes medical records, pharmacy, labs, claims, radiology etc.

Data available through EMIF consortium

• Large variety in “types” of data

• Data is available from more than 53 million subjects from seven EU countries, including:

  • Primary care data sets

  • Hospital data

  • Administrative data

  • Regional record-linkage systems

  • Registries and cohorts (broad and disease specific)

  • Biobanks

• >25,000 subjects in AD cohorts

• >90,000 subjects in metabolic cohorts

EMIF Platform Design

[Diagram; recoverable labels: Data Sources (1° care, 2° care, Hospital, Admin, Regional, Registries & cohorts, Biobanks, Paediatric); Extract (Site Y, Site Z); Data access Module; Common Ontology / De-identification; EMIF platform solution; Governance; Data owners; Researchers; User admin; Remote user 1; Remote user 2.]

Available data sources in EMIF

EMIF-Available Data Sources; EXAMPLES (status Jan 2016)

[Bar chart (EMIF-Platform): approximate total (cumulative) number of subjects per data source, with axis ticks from 100 to 100,000,000 and an overall label of >40 million. Data sources shown: MAAS, SDR, EGCUT, PEDIANET, SCTS, IMASIS, HSD, AUH, IPCI, ARS, SIDIAP, PHARMO, THIN. Bar labels: 1K, 2K, 52K, 400K, 475K, 2.8M, 2.3M, 10M, 3.6M, 1.6M, 1M, 12M, 6M.]

Catalogue with available data sources

https://emif-catalogue.eu

Just released last week!

see www.emif.eu


Automatic Mapping of Drug Concepts to the RxNorm Vocabulary

Maxim Moinat* [1], Lars Pedersen [2], Jolanda Strubel [1], Marinel Cavelaars [1], Kees van Bochove [1], Peter Rijnbeek [3], Michel van Speybroeck [4], Martijn Schuemie [4]

[1] The Hyve, Utrecht, The Netherlands; The Hyve, Cambridge, United States
[2] Aarhus University Hospital, Aarhus, Denmark
[3] Erasmus MC, Rotterdam, The Netherlands
[4] Janssen Pharmaceuticals, Inc.

*E-mail: [email protected].

1. Background

Mapping source concepts to the standard concepts in the OMOP vocabularies is one of the most time-consuming tasks during the transformation to the OMOP Common Data Model. Drug mapping is particularly challenging, because different components have to be mapped: ingredient, dose form and strength. As part of the European Medical Information Framework (EMIF) project, Danish population health data are mapped to the OMOP CDM, including the local drug codes. The Hyve assists in creating a script to automatically map a set of 4754 drugs to the RxNorm vocabulary. The input data contains ATC codes, dosage forms, numerical strengths and strength units. Two examples are shown in Figure 1. The mapping procedure presented here is based on the drug mapping for the Japan Medical Data Center Claims Database [I].

Example 1
➢ Risperdal
➢ N05AX08
➢ Filmovertrukne tabletter (film-coated tablets)
➢ 0.5
➢ MG
Mapped to: Risperidone 0.5 MG Oral Tablet (RxNorm Clinical Drug)

Example 2
➢ Fortzaar
➢ C09DA06
➢ depottabletter (prolonged-release tablets)
➢ 100 + 25 mg
Mapped to: Candesartan and diuretics (ATC code)

➢ Forsteo
➢ H06AA02
➢ Injektionsvæske, opløsning (solution for injection)
➢ 20 mikg/80 mikrol.
Mapped to: Teriparatide Injectable Solution (Clinical Drug)

▲ Figure 1: Examples of input data. Example 1 is successfully mapped automatically. Example 2 consists of two ingredients and has an ATC concept that could not be mapped to a RxNorm concept.

2. Mapping Procedure

The mapping uses the RxNorm hierarchy and consists of four steps (see Figure 2).

1. Drugs are mapped to RxNorm Ingredient via the 5th level ATC code. The OMOP relationship ‘ATC - RxNorm’ is used for this purpose.
2. Dose form is added to the ingredient level, to map to Clinical Drug Form level.
3. The information on drug strength (including unit) is added to map to Clinical Drug Component. The strength is rounded to two decimals.
4. The above three mappings are combined to map to a Clinical Drug concept.

Manual mappings are added for a small number of frequently prescribed drugs. The Danish dose forms and units are also manually mapped to OMOP concepts.

Ingr.  Form  Str.  Level
✓      ✓     ✓     Clinical Drug
✓      ✓           Clinical Drug Form
✓            ✓     Clinical Drug Component

▲ Figure 2: Mapping concepts and relationships with an example for each concept.

3. RxNorm extension

A major limitation of the mapping is the incomplete RxNorm vocabulary. Multiple drugs in this Danish dataset do not have a counterpart in RxNorm. An example is Litarex 6 mMol, where RxNorm only contains strengths in mg. To be able to map these drugs, the OHDSI community has proposed to use an extension on RxNorm, called Pseudo-RxNorm. Work is in progress to add as many drugs as possible to Pseudo-RxNorm for a complete mapping.

4. Challenges

Many unmappable drugs consist of multiple ingredients and cannot be automatically mapped to one RxNorm ingredient. The ATC concept is often too general, see Example 2 in Figure 1. The automatic mapping from ATC to RxNorm ingredient should be revised to accommodate the mapping of these drugs.

Other challenges include:
● Synonymous dose forms (e.g. ‘Cream’ and ‘Topical Cream’)
● Numerator and denominator unit (e.g. ‘GL’ to ‘gram’ and ‘liter’)
● Strength derivation (e.g. 8 gram to 8000 milligram)
● Duplicate mappings (e.g. one drug to multiple Drug Forms)

5. Results

[Figure 3 bar chart; recovered bar labels: 3.9%, 10.3%]

▲ Figure 3: Mapping results. Percentages are based on the count of unique drugs (red bars, n = 4754) or on the number of prescriptions (blue bars, n = 1,093,056). The striped bars show the percentage of manually mapped drugs and prescriptions. Each drug is mapped to only one of the concept classes. If a drug could be mapped to Clinical Drug, then it is not included in the percentages of Clinical Drug Component or Clinical Drug Form. It can be seen that 91% of the drugs could be mapped and 67.2% without any loss of information (red bars).

6. Conclusions

The majority of the Danish drug concepts have been mapped automatically to the RxNorm vocabulary (red solid bars in Figure 3). Further improvements can be made by extending the manual mappings and supporting multi-ingredient drugs. However, a 100% mapping is not achievable with this method. Our work clearly demonstrates the need for the addition of an RxNorm extension to enable the mapping of currently missing drugs, forms and strengths in the standard OMOP vocabulary.

References: [I] Schuemie M, Kubota K, “JMDC drug to OMOP Vocabulary mapping”, 2014 August 26

We empower scientists by building on open source software
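
The four-step mapping procedure from the poster can be sketched as a cascade that tries the most specific RxNorm level first. This is an illustration under stated assumptions, not the script the poster describes: the lookup tables are toy stand-ins for the OMOP vocabulary, the `G` unit code and the helper names are hypothetical, and trying Clinical Drug Component before Clinical Drug Form is an arbitrary choice, since the poster only states that Clinical Drug takes precedence.

```python
# Toy lookup tables (assumptions); a real ETL would query the OMOP vocabulary tables.
ATC_TO_INGREDIENT = {"N05AX08": "Risperidone"}               # step 1, 'ATC - RxNorm'
DOSE_FORM_MAP = {"Filmovertrukne tabletter": "Oral Tablet"}  # manual Danish dose form mapping
UNIT_FACTORS = {"MG": ("MG", 1.0), "G": ("MG", 1000.0)}      # source unit -> (target unit, factor)

# Hypothetical slice of the target vocabulary: concept name -> concept class
KNOWN_CONCEPTS = {
    "Risperidone": "Ingredient",
    "Risperidone Oral Tablet": "Clinical Drug Form",
    "Risperidone 0.5 MG": "Clinical Drug Component",
    "Risperidone 0.5 MG Oral Tablet": "Clinical Drug",
}


def format_strength(value, unit):
    """Convert to the target unit and round to two decimals, e.g. 8 G -> '8000 MG'."""
    target_unit, factor = UNIT_FACTORS[unit]
    text = f"{round(float(value) * factor, 2):.2f}".rstrip("0").rstrip(".")
    return f"{text} {target_unit}"


def map_drug(atc, dose_form, strength, unit):
    """Return (concept name, concept class) at the most specific level found, else None."""
    ingredient = ATC_TO_INGREDIENT.get(atc)  # step 1: ATC code -> ingredient
    if ingredient is None:
        return None
    form = DOSE_FORM_MAP.get(dose_form)      # input for step 2
    strength_text = None                     # input for step 3
    if strength is not None and unit in UNIT_FACTORS:
        strength_text = format_strength(strength, unit)

    candidates = []
    if form and strength_text:
        candidates.append(f"{ingredient} {strength_text} {form}")  # step 4: Clinical Drug
    if strength_text:
        candidates.append(f"{ingredient} {strength_text}")         # step 3: Clinical Drug Component
    if form:
        candidates.append(f"{ingredient} {form}")                  # step 2: Clinical Drug Form
    candidates.append(ingredient)                                  # fall back to the ingredient

    for name in candidates:
        if name in KNOWN_CONCEPTS:
            return name, KNOWN_CONCEPTS[name]
    return None


print(map_drug("N05AX08", "Filmovertrukne tabletter", "0.5", "MG"))
print(map_drug("C09DA06", "depottabletter", "100", "MG"))
```

With the toy tables, the Risperdal input from Example 1 resolves to a Clinical Drug, while the multi-ingredient ATC code C09DA06 has no single ingredient and stays unmapped, mirroring the limitation described under Challenges.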


Use of OMOP/OHDSI provides EMIF with:

• A uniform way to perform suitability and feasibility queries across multiple diverse European data sources

• An entry point to quickly initiate and perform observational studies within one or more data sources

• Direct insight & dashboarding of data for data owners (e.g. national registries, hospitals)

The goal is patient benefit

“We need to learn from experience and find ways to unite the large volumes of data in Europe. At the end of the day, we are in this for better health care.”

Prof. Johan van der Lei, Erasmus MC University Medical Center, Co-coordinator EMIF-Platform