“Big Data, Big Opportunities: It’s time to Roll up our sleeves and get to work!” Brian D. Athey, Ph.D. Professor and Chair, Department of Computational Medicine and Bioinformatics Professor of Psychiatry and of Internal Medicine Director, Academic Informatics University of Michigan Medical School

Page 1

“Big Data, Big Opportunities:

It’s time to Roll up our sleeves

and get to work!”

Brian D. Athey, Ph.D.

Professor and Chair,

Department of Computational Medicine

and Bioinformatics

Professor of Psychiatry and of Internal Medicine

Director, Academic Informatics

University of Michigan Medical School

Page 2

• I am founding chair of the S&TAB of Appistry, Inc.; St. Louis, MO

• I am a founding member of the SAB, Accentia Biosciences; Tampa, FL

• I am joining the SAB of AssureRx Health; Mason, OH

• I am chair of the Technical Advisory Board; 1Mind4Research, Washington, DC

• I serve (or have served) on numerous CTSA Steering Committees and Boards at Academic Health Centers and Clinics (e.g., Marshfield Clinic)

• I serve on the ad hoc subcommittee overseeing caBIG and the NCI Informatics Strategy, reporting to the NCI Director and the NCI Board of Scientific Advisors

• I am no longer a consultant to the NIH CIO (ended March, 2011)

Page 3

4/23/2012 Brian D. Athey, Ph.D. Department of Computational Medicine and Bioinformatics

Outline

• Our new data-intensive world/issues

• The science of Biomedical Informatics

• Genomics and Medicine: Toward Personalized Medicine

• Gearing up the Academic Health Center Enterprise data infrastructure

• Regional and national projects: Promise and Complexity

• Issues to think about to enhance future success

Page 4

• Personal Computing

• Social Networks

• “Mobile”

• “Cloud”

• “Big Data”

• Media Driven

-----------------------------

Net = New way of life for us and for Academic Health Centers

See also The Economist, Oct. 8th – 14th, 2011

Page 5

Lee Hood, IOM February 27, 2012

Page 6

Lee Hood, IOM February 27, 2012

Page 7

New York Times, January 4, 2012

IBM's chairman, Samuel Palmisano, said in a speech last September: “But there are also upward of a trillion interconnected and intelligent objects and organisms - what some call the Internet of Things. All of this is generating vast stores of information. It is estimated that there will be 44 times as much data and content coming over the next decade...reaching 35 zettabytes in 2020. A zettabyte is a 1 followed by 21 zeros. And thanks to advanced computation and analytics, we can now make sense of that data in something like real time.”
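The figures in the quote can be checked with a few lines of arithmetic. The sketch below assumes that "the next decade" means ten years and that growth is a constant compound rate; neither assumption is stated in the quote, and the variable names are illustrative only.

```python
# Back-of-the-envelope check of the quoted figures.
# Assumptions (not from the quote): a 10-year span and constant compound growth.
ZETTABYTE = 10**21  # bytes: a 1 followed by 21 zeros

projected_2020_zb = 35  # "35 zettabytes in 2020"
growth_factor = 44      # "44 times as much data and content"
years = 10              # assumed length of "the next decade"

# Starting volume implied by a 44-fold increase that ends at 35 ZB.
baseline_zb = projected_2020_zb / growth_factor

# Constant yearly multiplier that compounds to 44x over the assumed span.
annual_growth = growth_factor ** (1 / years)

print(f"Implied starting volume: {baseline_zb:.2f} ZB")
print(f"Implied compound growth: {(annual_growth - 1) * 100:.0f}% per year")
print(f"35 ZB expressed in bytes: {projected_2020_zb * ZETTABYTE:.1e}")
```

Under these assumptions the starting volume is a little under one zettabyte, and the yearly growth rate is well above what storage budgets typically absorb, which is the point of the slide.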

Page 8

“Data volumes are growing exponentially”

• There are many reasons for this growth:

– the creation of nearly all data today in digital form

– a proliferation of sensors (e.g. Next-Generation Sequencing)

– new data sources such as high-resolution imagery and video.

• The collection, management, and analysis of data is a fast-growing concern of NIT research.

• Automated analysis techniques such as data mining and machine learning facilitate the transformation of data into knowledge, and of knowledge into action.

“Every federal agency needs to have a ‘big data’ strategy”

Page 9

“In this data-rich world, your competitive advantage is your ability to transport, collect, store, organize, mine, visualize, and machine learn against data. This ‘computational knowledge extraction’ lies at the heart of 21st century discovery.”

This CI idea pervades all fields of modern research

--Ed Lazowska

Bill & Melinda Gates Professor at the University of Washington

Co-Chair, PCAST NITRD Subcommittee

E-mail responding to Athey-Glotzer U-Michigan CI Report

Page 10
Page 11

• Associate the avalanche of genomic and high-throughput molecular information with disease risk

• Powerful computational methods and integrated cyberinfrastructure to enable sophisticated hierarchical systems modeling and analysis

• Effective linkages with better environmental, dietary, and behavioral datasets for eco-genetic analyses

• Credible privacy and confidentiality protections in research and clinical care

• Breakthrough tests, vaccines, drugs, behaviors, and regulatory actions to reduce health risks and cost-effectively treat patients globally.

Omenn and Athey, 2010

Page 12

Population(s)

Athey and Omenn, 2010

Page 13

QSP White Paper October, 2011, Page 14

Contemporary systems biology: four complementary approaches

Systems biology is advancing in four distinct but complementary directions, all of which are relevant to pharmacology. The first involves large-scale measurement and network inference. This approach aims to discover interactions among hundreds or even thousands of genes and proteins using systematic, high-throughput measurements (e.g. mRNA profiling, two-hybrid screening, mass spectrometry-based proteomics and metabolomics). The resulting data, which typically derive from high-throughput genomic, proteomic or other –omic approaches are assembled into complex networks whose properties are studied using graph-based methods derived from computer science. Networks of this type have been used to characterize drug targets in a systematic manner [30-32] and are increasingly important in developing disease classifiers based on sequence or transcription data (sometimes called “systems medicine” [28, 33]). The second direction involves attempts to elucidate the principles of biological design or function based on analogies with engineering or physics. Properties elucidated for one biological network may be generalized into concepts such as “feed forward control”, “robustness”, “adaptation”, etc. A notable success of these efforts has been the recognition that noise plays an important role in limiting the accuracy of biochemical circuits and in creating cell-to-cell variability; conversely, the ability of some regulatory motifs (positive feedback for example) to increase precision in the face of this variability has attracted interest in it as a design feature [34]. The third thrust in systems biology involves combining mathematical modeling of regulatory and signaling pathways with multiplex and single-cell experimental data as a means to understand the precise biochemistry, dynamics and functions of the networks that control normal cellular physiology and cause disease. This approach is a natural complement to molecular, structural and cellular biology [35, 36]. At the moment, this type of analysis is often limited to pathways of 20-100 components, but the size of networks that can be analyzed is expected to increase rapidly in the future. Because systems pharmacology is necessarily multi-scale, all three of these systems biology approaches are expected to be important in the future development of the field. The fourth approach, which may have a large impact in the long term, is “synthetic biology”. The synthetic strand in systems biology aims to create fundamentally new biological devices based on discoveries from other areas of systems biology and new approaches to genetic engineering. Synthetic biology adds the fields of biochemical engineering and industrial process optimization to systems biology and also adds problems outside the purview of conventional biomedicine, such as bioenergy and bioremediation.

Figure 4. Horizontal and vertical integration in systems biology and pharmacology. One representation of horizontal and vertical integration emphasizing changes in physiological complexity, which tends to parallel changes in time scales (from seconds and minutes to years and lifespans). The goal for QSP is to bring network-level understanding of drugs to the complex physiology of patient responses. The arrows denote trend lines.

Achieving horizontal and vertical integration through multiplex measurement and modeling

The 2008 white paper on quantitative and systems pharmacology (summarized in Appendix 1) carefully considered the complementary strengths of “horizontal” (Appendix 2) and “vertical” (Appendix 3) integration in pharmacology (Figure 4). Many practical and conceptual challenges remain in achieving effective horizontal and vertical integration of biological knowledge, and the difficulties are magnified by the tendency of practitioners to focus on a single type of data (proteomics or genomics, for example) and of funding agencies and academic organizations to value specialists over integrators. Cultural

Emergence of Quantitative and Systems Pharmacology:

An NIGMS White Paper (Sorger et al., 2011)
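The first approach in the excerpt above (assembling interaction data into networks and studying them with graph-based methods) can be illustrated with a minimal sketch. The gene names and interaction pairs below are hypothetical placeholders, not measurements, and degree ranking is only one simple graph property among the many the white paper could have in mind.

```python
from collections import defaultdict

# Hypothetical pairwise interactions (placeholders, not real data).
interactions = [
    ("geneA", "geneB"),
    ("geneA", "geneC"),
    ("geneA", "geneD"),
    ("geneB", "geneC"),
    ("geneD", "geneE"),
]


def build_network(edges):
    """Build an undirected adjacency map from pairwise interactions."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def degree_ranking(graph):
    """Return nodes ordered by number of partners (ties broken by name)."""
    return sorted(graph, key=lambda node: (-len(graph[node]), node))


network = build_network(interactions)
hubs = degree_ranking(network)

for node in hubs:
    print(node, len(network[node]))
```

Highly connected nodes are a common first place to look for candidate drug targets, although a real analysis would weight edges by evidence and use richer measures than degree.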

Page 14

Lee Hood IOM February 27, 2012

The Science in the Middle: Linking Core Facilities to

Models and Driving Research Problems

Page 15

The Scope of Biomedical Informatics and Cyberinfrastructure: Classical View

Ted Shortliffe, 2005

Page 16

Eric E. Schadt, “Molecular networks as sensors and drivers of common human diseases” (2009). Nature 461, 218-223. doi:10.1038/nature08454

General Models we Must Consider

Page 17

Bill Stead, IOM 2007

Page 18

PCAST NITRD recommends development of:

• Electronic Health Records (EHR), Personal Health Records (PHR) and Health Information Exchange

• Universal Exchange Mechanism for Health IT Data

• Dynamic ‘OMIC’ Analytics and Data Management Infrastructure for Longitudinal Patient-Centric EHR/PHR

• Pharmacogenetic Informatics

• Link Integrated Medication Systems to Basic Research Systems such as High-Throughput Sequencing

• Development of an Informatics-based Surgery Network

Red indicates touch points to Genomics

Page 19

• “It is recommended that a Dynamic ‘Omics Analytics and Data Management Infrastructure for enhanced analysis and standardized interoperability with a Longitudinal Patient-Centric Electronic Health Record (EHR)/Personal Health Record (PHR) be created. This will enable Integration between ‘multi-omics’ data at Patient/Research Participant level in EHR:

• Genomics; Epigenomics; Proteomics; Metabolomics

• Pharmacogenomics; Toxicogenomics

• Imaging; Cognitive and Behavioral measures; Environmental measures

• Secure links to Patient Data in EHR/PHR

• Socio-economic measures”
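One way to picture the recommendation above is a participant-level record that holds a secure link to the EHR/PHR plus pointers to each 'omics dataset. The sketch below is an assumption made for illustration: the class names, fields, and storage paths are hypothetical and do not come from the PCAST text.

```python
from dataclasses import dataclass, field


@dataclass
class OmicsDataset:
    modality: str   # e.g. "genomics", "proteomics", "imaging"
    location: str   # pointer to where the data lives, not the data itself


@dataclass
class ParticipantRecord:
    ehr_link: str                        # opaque, secure reference to the EHR/PHR
    datasets: list = field(default_factory=list)

    def add(self, modality, location):
        """Attach one dataset to this participant."""
        self.datasets.append(OmicsDataset(modality, location))

    def modalities(self):
        """List the distinct data types held for this participant."""
        return sorted({d.modality for d in self.datasets})


# Hypothetical usage with made-up identifiers and paths.
record = ParticipantRecord(ehr_link="ehr://participant/0001")
record.add("genomics", "store://wgs/0001.vcf")
record.add("metabolomics", "store://metab/0001.csv")
record.add("genomics", "store://exome/0001.vcf")

print(record.modalities())
```

Keeping pointers rather than raw data in the record is a design choice: the large 'omics files stay in their own repositories while the longitudinal record stays small and queryable.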

Page 20

The ‘Generic’ Genomics Culprit

[Figure. Column headings: Transcriptome; Digital Gene Expression; Methylation; Small RNA; ChIP-Seq. Other labels: Gene Fusion Discovery; Mutation discovery; Alternative Splice (AS) Variants; Gene expression Discovery; miRNA profiling; Binding site detection; Enumerate gene expression.]

Page 21

Stein LD: The case for cloud computing in genome informatics. Genome Biology 2010, 11:207.

Colliding worlds of data production and storage

Page 22
Page 23

George Poste, IOM Feb. 28, 2012

Page 24

George Poste,

IOM Feb. 28, 2012

Page 25
Page 26

Spanning Discovery, Translation, and Patient Care

•“Clinomics”

•Epigenomics

•Pharmacogenomics

•Microbiome

•Biomarkers

--Courtesy Gianrico Farrugia, M.D.; Mayo Clinic Rochester

Page 27
Page 28

Westfall, J. M. et al. JAMA 2007;297:403-406.

Every step of the translational research pathway requires integration with HIT

T4 Outcomes

Page 29

• Interoperability with Institutional EHR Systems

• Clinical transaction systems

• Clinical Data Repository (CDR)

• De-identification/Honest Brokering

• Tools to Facilitate Extracting/Downloading Data Software tools

• CTSI Portals

• Clinical Trial/Study Databases

• Genomic, Proteomic, and Metabolomic High-Throughput Data Repositories and Analysis Tools

• Clinical Imaging Data Repositories and Analysis Tools

• An Institutional Specimen Tracking System

• A CTSA Core Lab LIMS (Laboratory Information Management System)

• Population/Public Health Databases & Informatics Needs

• Standards to promote interoperation within and between CTSA sites

• Informatics Teaching & Training (Interface with CTSA Education Program)

• Biomedical Informatics Research in Support of C&T Research

• Faculty, Staff, and Administrative Structure for Biomedical Informatics

CTSA Informatics Consortium Operations Committee

Bill Hersh (OHSU) and Brian Athey (UMICH), co-chairs. 2007

Page 30

We need more than IT

• Bioinformatics: How to utilize basic science data to attain knowledge and make it useful

• Clinical Informatics: How to organize, structure and manage clinical data to make it content rich

• Computer Science: Science and research behind computing and data management capabilities, e.g. storage, speed, cost, etc.

• Information Technology: Hardware + Software; where and how to capture, store, process and communicate data

Functional output for: Research, Education, Patient Care, Administration

[Other diagram labels: Data Strategy, Architecture and Translation; “Science” + “Practice” =; Computation]

Page 31

William S. Dalton; Moffitt Cancer Center; IOM Feb. 27, 2012

Page 32

[Diagram labels: Gender; Ethnicity; Age; Weight; Diagnosis; Medical History; Literature; Databases; Terminologies; Ontologies; Lab Tests; Genes; Proteins; Biological Models; Technologies; Algorithms; Research]

Page 33

Essence of what we need out of the Data Factory

Lawrence Shulman, Dana-Farber Cancer Institute IOM Feb 27, 2012

Page 34

This perspective highlights the importance of investing in foundational IT and architecture, and interfaces between “silos” to enable secure data flow across patient care, business operations, research, and education.

Operational Management (Historical. e.g. quality, billing, reporting etc.)

Biomedical Research & Education

Trials

Quality Reports

Clinical Data Warehouse 1. CAD 2. QMP 3. ‘HSDW’ 4. Clarity 5. Others……..

Comparative Effectiveness

Research

Population Research

‘Omics Repository Administration Systems

Patient Care (Electronic Health Record)

Multiple Clinical Systems

Research Warehouse Clinical Data Repository

External Organizations

External Organizations

• Honest Broker • De-Identification • Anonymization • Consents • Identity Management • Vocabulary Mapping

---------------Enterprise Data Warehouse------------

Financial Reports


Page 35

Data Warehouses must become Data Factories

William S. Dalton; Moffitt Cancer Center; IOM

Feb. 27, 2012

Page 36

UMHS Data Architecture Unifying the Three Missions: Education, Research, & Patient Care

Brian Athey & ECRIT, 1/11/11

[Architecture diagram. Labels recovered from the figure, in extraction order: Biomedical Engineering; Historical Data; Registries; OpenClinica; Velos; BioDBX; REDCap; Research Data Warehouse; Messaging Bus, ETL & External Collaboration Services (SOA, caGRID, SHRINE, ...); Vocabulary & Terminology Mapping Services (ICD-9/10 SNOMED, IMO, caDSR, ...); Common Identifier Services (Patient, Provider, Research, Specimens, External Mappings); HIPAA/IRB Services (Honest Broker, De-ID Consent Management, …); Epic Clarity; HIM/Documentation; Radiology; Pathology; Pharmacy; CareLink/Eclipsys; Others…; Scheduling; Revenue Cycle; Emergency Med.; Ambulatory; International Data Sharing with External Collaborators; CTSAs; caBIG; TCGA; ULAM; Tissue Biorepositories; Metabolomics; Proteomics; Bioinformatics; Next-Gen Sequencing; Collexis; eThority (billing); Click Commerce (IRB); Research Pre, Post-Award; Industry: Pharma/Biotech; I2b2/SHRINE; Others …; Demographics; Diseases; Individuals; Populations; Portals / Providers, Payors, P. Health Databases / HIEs / NHIN; IT Security; Campus Systems; HSDW; i2b2; High Performance Cloud Computing & Data Storage; Reporting; Visualization; Analysis & Data Mining; Health Sciences Library Resources; NIH-Specific & External Data Resources (PubMed, GenBank, KEGG, GO, etc.); SPORES; Others; Clinical Analysis Database (CAD); Visiting Student Application Service (VSAS); M-Pathways; CTools/Sakai 3; Curriculum Eval. System; Clinical Scheduling & Grading System; Comprehensive Clinical Assessment Exam; Admissions; CAD; CDR Education Knowledge Repository; Research Administration Data Warehouse; Research Administration Systems; Research Core Facilities/‘Omics’; Research Data Management Systems; Research & Quality Metrics Data Marts; Quality Metrics Reporting & Peer Review; Patient Care Systems; Legacy+/Epic EHR; Enterprise Federated Data Warehouse; Service-Oriented Information Bus; Education; Bioinformatics and Systems Biology Workbenches.]

Page 37

UMHS Data Architecture Unifying the Three Missions: Education, Research, & Patient Care

Brian Athey & ECRIT, 1/11/11

[Architecture diagram. Labels recovered from the figure, in extraction order: Biomedical Engineering; Historical Data; Research Data Management Systems; Registries; OpenClinica; Velos; BioDBX; REDCap; Research Data Warehouse; Messaging Bus, ETL & External Collaboration Services (SOA, caGRID, SHRINE, ...); Vocabulary & Terminology Mapping Services (ICD-9/10 SNOMED, IMO, caDSR, ...); Research Administration Systems; Common Identifier Services (Patient, Provider, Research, Specimens, External Mappings); HIPAA/IRB Services (Honest Broker, De-ID Consent Management, …); Epic Clarity; Patient Care Systems; Centricity Documentation; Radiology; Pathology; Pharmacy; CareLink/Eclipsys; Others…; Scheduling; Revenue Cycle; Emergency Med.; Ambulatory; Research Core Facilities/‘Omics’; International Data Sharing with External Collaborators; CTSAs; caBIG; TCGA; Epic EHR Legacy +; ULAM; Tissue Biorepositories; Metabolomics; Proteomics; Bioinformatics; Next-Gen Sequencing; Bioinformatics and Systems Biology Workbenches; Collexis; eThority (billing); Click Commerce (IRB); Research Pre, Post-Award; Industry: Pharma/Biotech; I2b2/SHRINE; Others …; Demographics; Diseases; Individuals; Populations; Portals / Providers, Payors, P. Health Databases / HIEs / NHIN; IT Security; Campus Systems; HSDW; i2b2; High Performance Cloud Computing & Data Storage; Reporting; Visualization; Analysis & Data Mining; Research & Quality Metrics Data Marts; Health Sciences Library Resources; NIH-Specific & External Data Resources (PubMed, GenBank, KEGG, GO, etc.); SPORES; Others; CIDSS Analytics & Reporting Tools; Quality Metrics Reporting & Peer Review; Education; CAD; CDR Education Knowledge Repository; Research Administration Data Warehouse; HIM; Others…; M-Pathways; CTools/Sakai 3; Curriculum Evaluation System; Clinical Scheduling; Comprehensive Clinical Assessment Exam; Admissions.]

Page 38

[Biorepository and consent workflow diagram. Labels recovered from the figure, in extraction order: I2b2/EMERSE; Biomedical Informatics Layer; Neonates; UMClinicalStudies.org; Informed Consent Process/Forms; Genomic DNA + EHR/PHI (Disease Only; No Restrictions; Re-consent); DNA Samples; caTISSUE Database; EHR/PHI Data; Center for Health Communication Research; Wellness; Participant Portal; Asset Layer; Permission Layer; Informed Consent Layer; Vulnerability Domains; Aged; Fatal Illness; Research Data Warehouse; Honest Broker; PI-Driven Informatics Analysis (BIC); IRB review & approval; DNA Sequencing Core & Data; PI Portal; De-ID; Sequence Data; Sequence DNA Samples; Access DNA Samples (De-ID or Re-ID); Recruitment Enrollment, Biospecimen Processing & Storage, EHR/PHI Capture; Data Organization, Analyses, Integration & Sharing; Recruitment Layer; MICHR Stewardship; Design & Enable Specific Protocols (BERD); INSTITUTIONAL REVIEW BOARD.]

Page 39

I2b2/ EMERSE

Research Data

Warehouse

Honest

Broker (ID, De-ID)

PI-Driven Informatics Analysis

IRB review & approval

DNA Sequencing Core & Data

PI Portal

Sequence

Data

Sequence DNA Samples

Access DNA Samples

(ID, De-ID)

Data Organization, Analyses, Integration & Sharing

Design & Enable Specific

Protocols

INSTITUTIONAL REVIEW BOARD

DNA Samples

EHR/PHI Data
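The honest broker step in the workflow above (ID, De-ID) can be sketched as a component that swaps direct identifiers for a study ID and keeps the only copy of the mapping. This is a deliberately simplified illustration: the class, the field names, and the two-field identifier list are assumptions, and real de-identification must handle many more identifier types than shown here.

```python
import secrets


class HonestBroker:
    """Holds the only mapping between real identifiers and study IDs."""

    def __init__(self):
        self._id_to_study = {}
        self._study_to_id = {}

    def deidentify(self, record, identifier_fields=("mrn", "name")):
        """Return a copy of the record with identifiers replaced by a study ID."""
        mrn = record["mrn"]
        study_id = self._id_to_study.get(mrn)
        if study_id is None:
            # Random token, so the study ID reveals nothing about the MRN.
            study_id = "P-" + secrets.token_hex(8)
            self._id_to_study[mrn] = study_id
            self._study_to_id[study_id] = mrn
        cleaned = {k: v for k, v in record.items() if k not in identifier_fields}
        cleaned["study_id"] = study_id
        return cleaned

    def reidentify(self, study_id):
        """Look up the original identifier; in practice gated by IRB approval."""
        return self._study_to_id[study_id]


# Hypothetical record with made-up values.
broker = HonestBroker()
source = {"mrn": "12345", "name": "Jane Doe", "diagnosis": "T2D"}
released = broker.deidentify(source)
print(released)
```

Because the same patient always receives the same study ID, sequence data and EHR data released at different times can still be joined by investigators without exposing the identifier.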

Page 40

Marshfield Clinic

UW-Madison

UW-Milwaukee

Med. College of Wisc.

Wisconsin Genomics Initiative - Expertise/unique resources

- Clinical Data

- Biobank & Genetic Results

- Genotyping Facilities

- Biostatistics

- Machine Learning

• Rarely a “one-stop shop”

• Expertise to collect, curate and maintain rich sources of data

• Expertise and resource to process rich sources of data

• Development of shared resources and networks

• Move beyond just “big data”

Page 41

How To Process It (Past / Present / Future)

[Figure. Data sources: Structured Data; Handwritten Documents (Scanned, Paper Charts); Images (PACS, Photos); Electronic Text. Processing methods: Optical Character Recognition (OCR); Manual Abstraction; Image Analysis; Natural Language Processing (NLP); Data Warehouse (DW) Queries.]

• The richest source of information isn’t always the easiest to get to

• A single source of information rarely tells the whole picture

• Once processed, data doesn’t always “make sense”

• Multi-disciplinary and iterative approach

• Scientists (what the goal of the data is)

• Data experts (how data is represented)

• Content experts (how the data is collected/created)

Peissig P, et al. JAMIA 2012:19

Rasmussen L, et al. JAMIA 2011:ePub

Starren J, Personal Communication

Genomics High Performance Computing (HPC)
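The point that a single source rarely tells the whole picture can be made concrete with a small sketch that combines evidence from several processing routes before calling a patient a case. The source names and the "at least two sources" rule are assumptions chosen for illustration, not the method used in the cited papers.

```python
def phenotype_status(evidence, min_sources=2):
    """Flag a case only when enough independent sources agree.

    evidence maps a source name to True/False (condition found or not).
    """
    supporting = sorted(src for src, found in evidence.items() if found)
    return {
        "is_case": len(supporting) >= min_sources,
        "sources": supporting,
    }


# Hypothetical evidence for one patient.
evidence = {
    "structured_codes": True,     # data warehouse query
    "nlp_notes": True,            # natural language processing of text
    "ocr_scanned_chart": False,   # OCR of handwritten documents
}

result = phenotype_status(evidence)
print(result)
```

Raising or lowering `min_sources` trades sensitivity against precision, which is the kind of decision the slide assigns to scientists, data experts, and content experts working iteratively.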

Page 42

NIH National Center for Integrative Biomedical Informatics (NCIBI): Overview

Page 43

Integrated Tools and that are Integrative

and Analyzable

Athey, B.D., Cavalcoli, J., H.V. Jagadish, G.S. Omenn, B. Mirel, M. Kretzler, C. Burant, R. Isokpehi, C. DeLisi, the NCIBI faculty, trainees, and staff. 2011. The NIH National Center for Integrative Biomedical Informatics (NCIBI). J. Am. Med. Inform. Assoc. doi:10.1136/amiajnl-2011-00552.

Page 44

Harvard i2b2 – Michigan Partnership: Enterprise SMART i2b2

[Architecture diagram. Labels recovered from the figure: SMART Container; i2b2; PHR; EMR (Epic, Cerner, Vista); CDR; CTMS; ETL; SHRINE (Cohort Discovery, Federated Queries); SMART Application (Patients); SMART Application (Research); SMART Application (Physicians).]
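The cohort discovery and federated query idea in this diagram can be sketched as follows: each site evaluates the query against its own data and returns only a count, and a coordinator sums the counts. This is a conceptual illustration only. The function names, record fields, and site names are hypothetical, and the code does not represent the actual SHRINE or i2b2 interfaces.

```python
def local_count(patients, criteria):
    """Evaluate the query inside one site; only the count leaves the site."""
    return sum(
        1
        for patient in patients
        if all(patient.get(key) == value for key, value in criteria.items())
    )


def federated_count(sites, criteria):
    """Send the same criteria to every site and aggregate the counts."""
    per_site = {
        name: local_count(patients, criteria) for name, patients in sites.items()
    }
    return per_site, sum(per_site.values())


# Hypothetical site data with made-up patients.
sites = {
    "site_a": [
        {"dx": "T2D", "sex": "F"},
        {"dx": "T2D", "sex": "M"},
        {"dx": "CAD", "sex": "F"},
    ],
    "site_b": [
        {"dx": "T2D", "sex": "F"},
    ],
}

per_site, total = federated_count(sites, {"dx": "T2D", "sex": "F"})
print(per_site, total)
```

Returning counts rather than rows lets institutions answer "how many patients match?" across sites while patient-level data stays behind each site's own governance.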

Page 45

• EMR View running inside i2b2 web client

• Searchable / scrollable list of installed SMART apps

Example – SMART enabled i2b2

Page 46

The Meducation SMART app processes medication lists from

the patient record and then enables viewing and printing of

simplified medication instructions in any of a dozen languages

(http://smartapp.meducation.com/)

Page 47

The “SMART Clinical Research app” allows investigators to automatically populate clinical research forms (following the CDISC CDASH content standard) from the EHR for the “Demographics” (DM) and “Prior and Concomitant Medications” (CM) domains, for which the FDA requires data to be submitted in a standardized way.
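Automatic form population of this kind comes down to mapping EHR fields onto the form's variables. The sketch below is an assumption-laden illustration: the EHR field names are invented, and the form variable names (BRTHDAT, SEX, CMTRT, and so on) are given from memory as examples and should be checked against the published CDASH standard before any real use. It is not the app's actual implementation.

```python
# Hypothetical EHR field -> form variable mappings (verify against CDASH).
DM_MAP = {
    "birth_date": "BRTHDAT",
    "sex": "SEX",
    "race": "RACE",
    "ethnicity": "ETHNIC",
}
CM_MAP = {
    "drug_name": "CMTRT",
    "start_date": "CMSTDAT",
    "end_date": "CMENDAT",
}


def populate_dm(patient):
    """Build one Demographics form row; missing EHR fields become None."""
    return {target: patient.get(source) for source, target in DM_MAP.items()}


def populate_cm(medications):
    """Build one Concomitant Medications row per medication entry."""
    return [
        {target: med.get(source) for source, target in CM_MAP.items()}
        for med in medications
    ]


# Made-up patient data for illustration.
patient = {"birth_date": "1970-01-31", "sex": "F", "race": "ASIAN"}
medications = [{"drug_name": "metformin", "start_date": "2011-05-01"}]

dm_form = populate_dm(patient)
cm_forms = populate_cm(medications)
print(dm_form)
print(cm_forms)
```

Leaving unmapped values as `None` rather than guessing keeps gaps visible to the study coordinator, who can then complete them by hand.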

Page 48

Priority Contact™ is an application that enhances the work process of a clinician by managing contact with patients after they have left the clinic and new information relevant to their treatment plan has been obtained (e.g., the results of tests).

Page 49

[Architecture diagram. Labels shown: Data Warehouse; Curation; ETL; Master Data; Federation; Mining; Text Corpus (DIP, Conference Abstracts); Lucene Index; Internal Clinical Research and Trials; Clinical Data; Gene Expression; RBM; Analysis Results; SNP; Curated Content; Non-clinical gene expression.]

Page 50

[Diagram. Recoverable label text: Users; Data; And; Providers.]

Page 51

• Clinical data (after database load; de-identified, anonymized)

• Gene expression data (Affymetrix, Next-Gen Sequencing)

• Protein profiling data (Rules-Based Medicine panels)

• Genetic data (candidate SNPs)

• Metabolomics data

• ELISA assays

• Laboratory chemistry data

• Proteomics data

Page 52
Page 53
Page 54

[Organization diagram: tranSMART Foundation. Labels shown: Non-Profits; Government; Committers; Pharma / Biotech; eTRIKS; Academics; IC London; Subcontractors; Core Developer(s); Community Manager; Project Code and Data.]

Page 55
Page 56

• Science is global and thrives in the digital dimensions;

• Digital scientific data are national and global assets;

• Not all digital scientific data need to be preserved and not all preserved data need to be preserved indefinitely;

• Communities of practice are an essential feature of the digital landscape;

• Preservation of digital scientific data is both a government and private sector responsibility and benefits society as a whole;

• Long-term preservation, access, and interoperability require management of the full data life cycle; and

• Dynamic strategies are required.

Page 57

CIO vs. CRIO… (the “new guy”)

CIO | CRIO
EHR, ERP, Quality, PACS | CTMS, IRB, Bio Banking, Grants Mgt, etc.
JCAHO, AHRQ, MU, ACO, HIPAA, OSHA, HIE | i2b2, caBIG, VIVO, IRB/HIPAA, clinicaltrials.gov
Customers: Physicians, nurses, CXO, managers | Customers: Researchers, students, faculty
HIMSS | AMIA
1000s FTEs | 10s-100s FTEs
EDW (Clinical, Quality) | EDW (Research, Education)

… and, IT vs. Analytics (the “new buzzword”)

IT | Analytics
Operations and workflows | Decision support
Technologies | Knowledge
Security | Risk
Predictability, reproducibility | Unpredictability, innovation
Engineering, implementation | Architecture, research

Umberto Tachinardi, Ph.D, U Wisconsin CTSA Informatics IKFC

An Emerging Need in Academic Health Center IT Organizations

Page 58

[Figure. Labels shown: Classic IT; Classic Academia/NIH. Source: IOM 2011.]

Page 59

"If your daily life seems poor, do not blame it; blame yourself that you are not poet enough to call forth its riches."

--R. M. Rilke

Let’s get to work!

Page 60

NCIBI Program Officer (PO) – Dr. Karen Skinner, NIDA

NCIBI Lead Science Officer (LSO) – Dr. Jane Ye, NLM Director of Bioinformatics and Computational Biology

Dr. German Cavelier, NIMH; NCIBI Science Officer

Dr. Peter Lyster, NIGMS; Center for Bioinformatics and Computational Biology

Director, Center for Bioinformatics and Computational Biology, NIGMS; Dr. Karin Remington

Elaine Collier, NCRR

NIGMS/NIDA U54-DA-0215191

UL-1RR024986/NCRR CTSA

tranSMART: Johnson & Johnson Corporation—Garry Neal, Corporate VP Pharma R&D; John Shon, Director, Clinical and Translational Programs