Genomics and Informatics: Technology Solutions to Compute with Sensitive Data April 21, 2015 Innovations in Healthcare Informatics NIH U54 HL108460 Lucila Ohno-Machado, MD, MBA, PhD Biomedical Informatics, University of California San Diego


Genomics and Informatics: Technology Solutions to Compute with Sensitive Data

April 21, 2015, Innovations in Healthcare Informatics

NIH U54 HL108460, Lucila Ohno-Machado, MD, MBA, PhD

Biomedical Informatics, University of California San Diego

Patient Interaction

Data Analysis Statistics Comparative Effectiveness

Data Structuring Natural Language Processing Data Modeling

Predictive Modeling Statistical/Machine Learning Evaluation Methods

Decision Support Tools Guidelines Alerts & Reminders

Data Collection Tools Clinical Data Warehouse

Data Integration Omics Sensors

Data De-Identification Privacy Technology Ethics consents

Communication Strategies Consumer Health Informatics Medical Education

Medicine in the Era of Big Data

• Biomedical science is moving towards data-driven approaches
  » 60+ genetic variants are adopted in clinical practice
  » Sequencing is getting cheaper
  » Family history data is considered the best genetic test available

Slide from Dr. Xiaoqian Jiang

Excerpt from a consent for care form

USE OF MEDICAL INFORMATION AND SPECIMENS

I understand that my medical information, photographs, and/or video in any form may be used for other [INSTITUTION] purposes, such as quality improvement, patient safety, and education. I also understand that my medical information and tissue, fluids, cells, and other specimens (collectively, "Specimens") that [INSTITUTION] may collect during the course of my treatment and care may be used and shared with researchers.

Health Insurance Portability & Accountability Act (HIPAA)

• Security
  » Risk of disclosure
  » Liability
• Privacy
  » Risk of re-identification
  » Informed consent

HIPAA de-identified data
• Removal of 18 identifiers, such as dates, biometrics, names, etc.
• Expert certification of low risk of re-identification
• Limited data sets have dates

• Presidential Commission for the Study of Bioethical Issues, Privacy and Progress in Whole Genome Sequencing (http://www.bioethics.gov): covers genome sequencing, exome sequencing, genome-wide SNV analysis, and data from large-scale genomic studies

Biometrics and PHI

Genomes are Biometrics

Biometrics are Protected Health Information (PHI)

• Biometrics require HIPAA

PHI requires HIPAA

4/20/2015 Supported by the NIH Grant U54 HL108460 to the University of California San Diego

What is the problem?

• Imagine a public data set of genomes about a certain disease (e.g., Alzheimer's)
• Can you find out if your neighbor has Alzheimer's?
• The same problem happens with clinical data, which can be re-identified to a target patient
  » Dates of visits
  » Combination of patient characteristics

New DNA tests on a secret sample collected from a relative of suspect Albert DeSalvo triggered the exhumation after authorities said there was a familial match with genetic material preserved in the killing of Sullivan.

Authorities made the match through DNA taken from a water bottle thrown away by DeSalvo's nephew.

But a lawyer for the DeSalvos told CNN the family was outraged, disgusted, and offended by the decision to secretly take a DNA sample of one of its members.

Bigger data, bigger challenge

• High-dimensional data (e.g., whole genome sequencing)
• Privacy rules (PHI, including genomic data, needs to be protected under HIPAA)
• Institutional processes (institutional review boards)
• Patient preferences: tiered information consent
• Federated clinical/genome data research networks
  » Secure multiparty computation
  » Homomorphic encryption

Jiang X, Sarwate A, Ohno-Machado L. Privacy technology to support data sharing for comparative effectiveness research: a systematic review. Medical Care 51(8 Suppl 3):S58-65, 2013. PMID 23774511

Adapted from Dr. Xiaoqian Jiang
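As a rough illustration of secure multiparty computation in a federated network, the sketch below uses additive secret sharing so that several sites can learn the total of their private counts without any site revealing its own count. This is a textbook construction, not the protocol used by any system named in these slides; the modulus `PRIME`, the function names, and the example counts are all assumptions made for illustration.

```python
import secrets

PRIME = 2**61 - 1  # public modulus (assumed); all arithmetic is modulo this prime


def make_shares(value, n_parties):
    """Split an integer into n additive shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def secure_sum(private_values):
    """Simulate the protocol: every party shares its value with all parties,
    each party adds up the shares it received, and only those partial sums
    are combined. No single share reveals anything about a private value."""
    n = len(private_values)
    # shares_sent[i][j] is the share that party i sends to party j
    shares_sent = [make_shares(v, n) for v in private_values]
    partial = [sum(shares_sent[i][j] for i in range(n)) % PRIME for j in range(n)]
    return sum(partial) % PRIME


# Hypothetical per-site patient counts for some cohort
site_counts = [120, 75, 310]
total = secure_sum(site_counts)
```

The result is only correct while the true total stays below `PRIME`; a real deployment would also need authenticated channels and protection against colluding parties, which this sketch ignores.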


[Figure: iDASH overview diagram. Labels: Knowledge & Tools (Privacy, Consent); Services/Platform (Service, WWW, Apps, Exec, Aggreg., Hosting, Sharing, Policies, Platform, Research, Develop, Federation); Data (Sensors, Genomic, Clinical).]

Our Goals

• Share access to data and computation
• Train the new generation of data scientists
• Provide innovative software platform and infrastructure
• Protect privacy

Develop
  » Algorithms
  » Tools
  » Infrastructure
  » Policies

Ohno-Machado L. To Share or Not To Share: That Is Not the Question. Science Translational Medicine 2012; 4(165).

Models for Sharing Data

Clinical Data Network - UC-ReX (UC-Research eXchange)

• Clinical Data Warehouses from 5 Medical Centers and affiliated institutions exchange (>12 million patients)
• Improve patient safety, surveillance, quality improvement, translational research

Funded by the UC Office of the President

[Figure: UC-ReX architecture. A researcher who wants data about a population sends a request for data on a cohort. A data matching function maps data D onto data dictionaries and uses data warehouses derived from EHRs, registries, and other DBs at UCSD (Epic), UC Irvine (Allscripts), UC Davis (Epic), UCSF (Epic), and community partners. Results are returned on harmonized data.]

- Currently
  - Count queries
- Future steps
  - Individual-level patient data delivery
  - Cross-institutional analytics

[Figure: request flow. A user requests data for quality improvement or research ("Are the data accessible?"). Trusted broker(s) and a security entity provide identity & trust management and policy enforcement.]

Quality improvement/health services research

How many patients over 65 are on Warfarin or Dabigatran?

What are the major and minor bleeding rates for patients on these drugs?

Is CYP2C9 mutation an important factor?

Count queries and statistics across data warehouses of diverse healthcare entities in 3 different states (federal, state, private)

AHRQ R01HS19913, EDM Forum

Scalable National Network for Comparative Effectiveness Research

Kim K, et al. Development of a Privacy and Security Policy Framework for a Multi-State Comparative Effectiveness Research Network. Medical Care 2013.

Kim K, et al. Data Governance Requirements for Distributed Clinical Research Networks: Triangulating Perspectives of Diverse Stakeholders. JAMIA 2014.

Meeker et al. A System to Build Distributed Multivariate Models and Manage Disparate Data Sharing Policies: Implementation in the Scalable National Network for Effectiveness Research. JAMIA (accepted).

Kim K, et al. Comparison of Consumers' Views on Electronic Data Sharing for Healthcare and Research. JAMIA (accepted).

Privacy-preserving distributed computing

Policy embedded in software

Predictive modeling and adjustment for confounders require lots of data

Some institutions cannot move data outside their firewalls; we can bring computation to the data

Wu Y, et al. Grid Binary LOgistic REgression (GLORE): Building Shared Models Without Sharing Data. JAMIA 2012.

Wang S, et al. EXpectation Propagation LOgistic REgRession (EXPLORER): Distributed Privacy-Preserving Online Model Learning. J Biomed Inf 2013.

Jiang W, et al. WebGLORE: A Webservice for Grid Logistic Regression. Bioinformatics 2014.

Wu Y, et al. Grid Multi-Category Response Logistic Models. BMC Med Inform Dec Making 2015.

Distributed Model

Conclusion: no patient data needs to be sent from the sites, only aggregates
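The idea that sites send only aggregates can be sketched with a logistic regression fitted by Newton-Raphson, where each site computes its gradient and Hessian contributions locally and a coordinator sums them. This is a minimal sketch in the spirit of the grid logistic regression papers cited above, not their actual implementation; the function names, the tiny Gaussian-elimination solver, and the toy data (an intercept column plus one feature) are assumptions made for illustration.

```python
import math


def site_aggregates(X, y, beta):
    """Run locally at one site. Returns the gradient and Hessian sums;
    only these aggregates leave the site, never the rows of X or y."""
    d = len(beta)
    grad = [0.0] * d
    hess = [[0.0] * d for _ in range(d)]
    for xi, yi in zip(X, y):
        z = sum(b * v for b, v in zip(beta, xi))
        p = 1.0 / (1.0 + math.exp(-z))
        w = p * (1.0 - p)
        for j in range(d):
            grad[j] += (yi - p) * xi[j]
            for k in range(d):
                hess[j][k] += w * xi[j] * xi[k]
    return grad, hess


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_distributed(sites, d, iters=25):
    """Coordinator: sum the per-site aggregates, then take a Newton step."""
    beta = [0.0] * d
    for _ in range(iters):
        grad = [0.0] * d
        hess = [[0.0] * d for _ in range(d)]
        for X, y in sites:
            g, h = site_aggregates(X, y, beta)
            grad = [a + b for a, b in zip(grad, g)]
            hess = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(hess, h)]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta


# Toy data (hypothetical): columns are [intercept, feature]
site_a = ([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 3.5]], [0, 0, 1, 0])
site_b = ([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]], [0, 1, 1, 1])
beta_distributed = fit_distributed([site_a, site_b], d=2)
beta_pooled = fit_distributed([(site_a[0] + site_b[0], site_a[1] + site_b[1])], d=2)
```

Because the gradient and Hessian are sums over patients, adding the per-site sums gives the same Newton step as pooling all rows, so `beta_distributed` and `beta_pooled` agree up to floating-point rounding. The aggregates themselves can still leak information in small samples, which this sketch does not address.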

Horizontal and Vertical Partitions

Patient  Age  Insurance
A1       45   X
A2       32   Y

Patient  Age  Insurance
B1       45   Y
B2       32   Y

Privacy-preserving SVM for vertically partitioned data

Que J, Jiang X, Ohno-Machado L. A Collaborative Framework for Distributed Privacy-Preserving Support Vector Learning. AMIA Annual Symp Proc 2012:1350-9.
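The difference between the two partition types can be shown on the small patient table from this slide. In a horizontal partition each site holds different patients with the same attributes; in a vertical partition each site holds different attributes for the same patients. The sketch below is illustrative only: the holder names `clinic` and `payer`, and the rule that the first letter of the patient ID names the site, are assumptions.

```python
records = [
    {"patient": "A1", "age": 45, "insurance": "X"},
    {"patient": "A2", "age": 32, "insurance": "Y"},
    {"patient": "B1", "age": 45, "insurance": "Y"},
    {"patient": "B2", "age": 32, "insurance": "Y"},
]


def horizontal_partition(rows, site_of):
    """Split by rows: every site keeps all columns for its own patients."""
    parts = {}
    for row in rows:
        parts.setdefault(site_of(row), []).append(row)
    return parts


def vertical_partition(rows, key, column_groups):
    """Split by columns: every holder keeps the join key plus its own attributes."""
    return {
        name: [{key: row[key], **{c: row[c] for c in cols}} for row in rows]
        for name, cols in column_groups.items()
    }


# Assumption: the first letter of the patient ID identifies the site
by_site = horizontal_partition(records, lambda r: r["patient"][0])
# Assumption: a clinic holds ages and a payer holds insurance information
by_attr = vertical_partition(
    records, "patient", {"clinic": ["age"], "payer": ["insurance"]}
)
```

Horizontally partitioned data suit methods that sum per-site statistics, while vertically partitioned data require the holders to cooperate on every patient, which is the setting the SVM framework cited above addresses.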

pSCANNER Clinical Data Research

Members: 5 UCs, National VA (VINCI resource), 3 Federally Qualified Health Systems in LA (USC), RAND

Cohorts: Kawasaki Disease, Congestive Heart Failure, Obesity

Abbreviations: TCC, The Children's Clinic; VM, Virtual Machine; SFGH, San Francisco General Hospital; OMOP, Observational Medical Outcomes Partnership

Funded by PCORI as part of PCORnet

International Collaboration

Slide from Dr. Shuang Wang

Models for Sharing Data

[Figure: iDASH cloud architecture. Users upload & download data, tools, and recipes, either public or HIPAA/non-public. Compute requests and direct upload & download of proprietary data, tools, and recipes pass through middleware and HIPAA security developed by iDASH (powered by MIDAS). Compute nodes, memory, disk storage, and networking are automated (powered by VMware).]

iDASH On-Demand Resources

• OVERCAST: On-demand Virtualized Elastic Resilient Compute And Storage Technology
• SHADE: Safe HIPAA-compliant Annotated Data deposit box Environment

3 computation tiers, 3 storage tiers, 10GbE throughout, full redundancy, RSA two-factor authentication, remote data replication

800+ cores, 7TB+ RAM, 600TB+ storage

More planned: 600+ cores, 4TB+ RAM, 200TB+ storage

[Figure: iDASH On-Demand Resources. A bookshelf of blueprints, applied to MyDATA, gives repeatable results. Each blueprint pairs a workflow (short reads → index reference → align to reference → call variants → annotate variants → pick high impact → deleterious SNPs) with its context (reference DB, test data, configuration, helper tools, OS).]
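One way to picture why bundling a workflow with its context supports repeatable results is a small manifest that is fingerprinted as a whole, so that any change to a step or to the context is detectable. This is a hypothetical sketch, not the iDASH blueprint format; the `Blueprint` class, its fields, and the example context values are assumptions.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class Blueprint:
    """Hypothetical manifest: ordered workflow steps plus execution context."""
    workflow: tuple
    context: dict = field(default_factory=dict)

    def fingerprint(self):
        # Canonical JSON (sorted keys) so equal content gives an equal hash
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


steps = (
    "index reference",
    "align to reference",
    "call variants",
    "annotate variants",
    "pick high impact",
)
# Context values below are placeholders, not real versions or paths
bp = Blueprint(steps, {"reference_db": "example-ref", "os": "example-os", "config": {"threads": 4}})
bp_same = Blueprint(steps, {"os": "example-os", "config": {"threads": 4}, "reference_db": "example-ref"})
bp_changed = Blueprint(steps, {"reference_db": "example-ref", "os": "example-os", "config": {"threads": 8}})
```

Two runs that report the same fingerprint used the same steps and context, which is the property a bookshelf of blueprints would rely on.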

homomorphic encryption

secure multiparty computation


commons

iDASH'15 Privacy Protection Challenge

• Focus on secure outsourcing and secure data analysis in a distributed setting (humangenomeprivacy.org)
• Task 1: Homomorphic encryption (HME) based secure genomic data analysis
• Task 2: Secure comparison between genomic data in a distributed setting
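To make the idea behind Task 1 concrete, the sketch below implements a toy version of the Paillier cryptosystem, which is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of the plaintexts, so an untrusted server can add encrypted counts without seeing them. This is a standard textbook scheme chosen for illustration, not the scheme used in the challenge. The fixed primes are far too small and too well known to be secure, and all function names are assumptions.

```python
import math
import secrets


def keygen(p, q):
    """Return (public n, private (lam, mu)) for primes p and q, using g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+)
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt 0 <= m < n as (1 + n)^m * r^n mod n^2 with random r."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (1 + m * n) % n2 * pow(r, n, n2) % n2


def decrypt(n, priv, c):
    lam, mu = priv
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n


def add_encrypted(n, c1, c2):
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    return c1 * c2 % (n * n)


# Toy key from two Mersenne primes (illustration only, NOT secure)
n, priv = keygen(2**31 - 1, 2**61 - 1)
# Hypothetical allele counts held by two data owners
c_total = add_encrypted(n, encrypt(n, 3), encrypt(n, 4))
total = decrypt(n, priv, c_total)
```

Only the key holder can decrypt `c_total`; the party that multiplies the ciphertexts never sees 3, 4, or 7.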

differential privacy


Models for Sharing Data

'De-Identification' (microdata release)

Name   Age  Education  Hours/week  …  HTN
Frank  42   6          40          …  Y
Bob    31   10         60          …  Y
Dave   43   9          40          …  N

• HIPAA compliant methods
  » Safe harbor dataset
    • Removal of 18 safe harbor identifiers
  » Limited dataset
    • Removal of direct identifiers
  » Statistical methods
    • Removal/grouping of attributes
    • Risk reasonably low
• Re-identification and disclosure risks

Courtesy of Li Xiong

Statistical Data Release (macrodata release)

Mohammed N. Privacy Preserving Heterogeneous Health Data Sharing. J Am Med Inform Assoc 2013.

Jiang XL. Differential-Private Data Publishing Through Component Analysis. Transactions on Data Privacy 2013.

Courtesy of Li Xiong

Statistical Data Release: Disclosure Risk

[Figure: original records and original histogram.]

Courtesy of Li Xiong

Statistical Data Release: Differential Privacy

[Figure: original records, original histogram, and perturbed histogram with differential privacy.]

Differential Privacy (Dwork et al.)

• D and D′ are neighboring databases if they differ on at most one record

A privacy mechanism A gives ε-differential privacy if, for all neighbouring databases D, D′ and for any possible output S ∈ Range(A):

$$\Pr[A(D) = S] \le \exp(\varepsilon) \times \Pr[A(D') = S]$$

Courtesy of Li Xiong

Global Sensitivity

Differential Privacy: Introducing Noise

• Laplace Mechanism

For example, for a single count query Q over a dataset D, returning Q(D) + Laplace(1/ε) gives ε-differential privacy.

Courtesy of Li Xiong
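The Laplace mechanism for a single count query can be sketched in a few lines. The sketch assumes that a count changes by at most 1 between neighboring databases (global sensitivity 1), so the noise scale is 1/ε. The function names and the toy records are assumptions, and the noise is drawn as the difference of two exponential variates, which follows a Laplace distribution.

```python
import random


def laplace_noise(scale, rng=random):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials
    that each have mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=random):
    """Return Q(D) + Laplace(1/epsilon) for the count query Q defined by predicate."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical records: (name, age)
people = [("Frank", 42), ("Bob", 31), ("Dave", 43)]
noisy = private_count(people, lambda r: r[1] > 40, epsilon=0.5)
```

A smaller ε means a larger noise scale and stronger privacy. Python's `random` module is not a cryptographically secure source and floating-point Laplace sampling has known subtleties, so this is for illustration rather than production use.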

iDASH 2014 First Privacy Protection Challenge

Evaluate solutions of guaranteed privacy protection for protecting the output of genomic data analysis

• Task 1: Privacy-preserving SNP data sharing
• Task 2: Privacy-preserving release of top K most significant SNPs

Statistical Data Release with Differential Privacy

[Figure: original data pass through a differentially private interface that releases statistics and synthetic records. Open questions: which statistics to compute, and how to apportion the privacy budget.]
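The question of how to apportion the privacy budget can be illustrated with sequential composition: when several differentially private statistics are released from the same data, their ε values add up, so a total budget must be split among them. The sketch below splits a total ε in proportion to chosen weights and derives the Laplace noise scale for each statistic. The weighting scheme, the function names, and the example statistics with their sensitivities are assumptions for illustration.

```python
def apportion_budget(total_epsilon, weights):
    """Split total_epsilon across statistics in proportion to their weights.
    Under sequential composition the per-statistic epsilons sum to the total."""
    total_w = sum(weights.values())
    return {name: total_epsilon * w / total_w for name, w in weights.items()}


def laplace_scale(sensitivity, epsilon):
    """Noise scale for the Laplace mechanism: sensitivity / epsilon."""
    return sensitivity / epsilon


# Hypothetical release plan: the histogram is given 3x the budget of the count
budget = apportion_budget(1.0, {"cohort_count": 1, "age_histogram": 3})
scales = {
    "cohort_count": laplace_scale(1.0, budget["cohort_count"]),
    "age_histogram": laplace_scale(1.0, budget["age_histogram"]),
}
```

Statistics that receive a larger share of the budget get less noise, which is the trade-off behind choosing which statistics to compute at all.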

Statistical Data Release under Differential Privacy

[Figure: original health records (data challenges: high dimensionality, self-correlation) pass through a data privacy (DP) interface that releases statistics and synthetic records (target applications: cross-sectional studies, longitudinal studies).]

SHARE: Statistical & Synthetic Health Information Release

• Low-dimensional histograms (DPCube)
• High-dimensional distributions (DPCopula)
• Sequential data release (DPTrie)

Courtesy of Li Xiong

differential privacy


Models for Sharing Data


[Figure: shifting control. User U requests data D on individual I from a trusted broker (data use agreements, study registry) in front of healthcare institutions. Through a patient interface, a consent management system asks patient I, "Do I wish to disclose data D to U?", and the data are shared if the answer is yes. A sharing look-up lets patient I check that U looked at data D.]

What to share? Who to share with?

Some preliminary findings from a limited set of interviews:
• Healthy volunteers do not want to share with commercially sponsored researchers
• Some want their medical information shared with only UCSD researchers, no others
• Many do not want to share at least 1 category of sensitive information
• Most common decline was genetic, followed by sexual & reproductive health

Courtesy of E. Bell
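The consent flow described here (a user requests a data category, the patient's preferences decide, and the patient can later look up who asked) can be sketched as a small data structure. This is a hypothetical illustration, not the design of any consent system named in these slides; the class, field names, recipient types, and category labels are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentRecord:
    """Hypothetical per-patient preferences plus an access log for look-up."""
    patient_id: str
    declined_categories: set = field(default_factory=set)
    allowed_recipients: set = field(default_factory=set)  # empty set: any recipient type
    access_log: list = field(default_factory=list)


def may_disclose(consent, requester, recipient_type, category):
    """Decide whether `category` data may go to `requester`, and log the request
    so the patient can later see who asked for what."""
    recipient_ok = (
        not consent.allowed_recipients
        or recipient_type in consent.allowed_recipients
    )
    allowed = recipient_ok and category not in consent.declined_categories
    consent.access_log.append((requester, category, allowed))
    return allowed


# Example preferences: share only with institutional researchers, never genetic data
prefs = ConsentRecord(
    "patient-I",
    declined_categories={"genetic"},
    allowed_recipients={"institutional"},
)
ok_labs = may_disclose(prefs, "user-U", "institutional", "laboratory")
no_genetic = may_disclose(prefs, "user-U", "institutional", "genetic")
no_commercial = may_disclose(prefs, "user-V", "commercial", "laboratory")
```

Logging denied requests as well as granted ones keeps the look-up complete, which matches the idea that the patient can check who looked at their data.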


iCONCUR: informed CONsent for Clinical data Use for Research

[Figure: iCONCUR architecture. Patient I sets preferences in a consent management system with a sharing look-up registry. User U sends a query and receives results through concierge or automated services at a trusted entity. Connecting software links to the healthcare institution's electronic health record and clinical data warehouse.]

• Privacy Preserving Analytics for KD in African-Americans
• Consent for Data and Biosample Sharing in an Underserved Population
• Partnership for Epidemiological Research on Latinos

Data and Biospecimen Sharing, Privacy Preserving Computation

Which DNA variants are implicated in KD susceptibility in this population? (Emory, Genome Institute of Singapore, Imperial College)

Does consent rate depend on who is obtaining the consent? (Maricopa Health System)

Do patients understand what they consented to? (San Diego State University)

Privacy & Technology Projects

Looking into the Near Future

Healthcare data and Biospecimen collections

Several terabytes of data are being generated every day and samples are being collected. Participants could decide what/when/with whom to share.

• Genomes
• Images
• Body sensors
• Environmental sensors
• Electronic Health Records
• Personal Health Records
• Pharmacy
• Social media

UCSD's Journey

[Figure: timeline (2008-2009, 2010-2011, 2012-2013, 2014-2015) across clinical data, research data, applications, and integration. Recoverable labels:
- Electronic health record system: Epic & Clarity; Epic CDDS; new modules
- Other systems: PACS, lab, etc.
- Personnel systems: Active Directory
- Query tools: UC-ReX Explorer, privacy technology
- Clinical research data: REDCap, Velos, other DBs
- Recruitment: consent tools, custom apps; accrual for clinical trials
- iDASH HIPAA SHADE: images, human genomes, etc.; iDASH HIPAA/FISMA OVERCAST; cloud; FISMA
- Healthcare clinical data; clinical data warehouse for research
- Scalable network (distributed analytics tools): pSCANNER (PCORI CDRN); VA, LA clinics, UCSF, Davis, Irvine, UCLA, USC/LAC, Cedars-Sinai, San Mateo
- External data (patient-reported data)
- Other labels: iCONCUR, cancer genomics, text mining/NLP, analytical tools, de-ID tools, BRIGHT, bioCADDIE, CTRI, School of Medicine, PCORI contracts, K22, K99, NLM Training Grant, UC-ReX, iDASH, SCANNER, PhenDISCO]

Acknowledgements

Department of Biomedical Informatics at UCSD, funded by NIH (U54, UL1, U24, UH3, R21, U01, T15, R00, K22, K99, D43), AHRQ, UCBRAIDOP, PCORI, NVIDIA

Page 2: Genomics and Informatics · 2015-04-20 · Genomics and Informatics: Technology Solutions to Compute with Sensitive Data April 21, 2015 Innovations in Healthcare Informatics NIH U54

Patient Interaction

Data Analysis Statistics Comparative Effectiveness

Data Structuring Natural Language Processing Data Modeling

Predictive Modeling StatisticalMachine Learning Evaluation Methods

Decision Support Tools Guidelines Alert amp Reminders

Data Collection Tools Clinical Data Warehouse

Data Integration Omics Sensors

Data De-Identification Privacy Technology Ethics consents

Communication Strategies Consumer Health Informatics Medical Education

Medicine in the Era of Big Data

bull Biomedical science is moving towards data-driven approaches raquo 60+ genetic variants are

adopted in clinical practice

raquo Sequencing is getting cheaper

raquo Family history data is consider as the best genetic test available

slide from Dr Xiaoqian Jiang

3

Excerpt from a consent for care form

USE OF MEDICAL INFORMATION AND SPECIMENS

I understand that my medical information photographs andor video in any form may be used for other [INSTITUTION] purposes such as quality improvement patient safety and education I also understand that my medical information and tissue fluids cells and other specimens (collectively Specimens) that [INSTITUTION] may collect during the course of my treatment and care may be used and shared with researchers

4

Health Insurance Portability amp Accountability Act HIPAA

bull Security HIP De-identified data raquo Risk of disclosure bull removal of 18 identifiers such

raquo Liability as dates biometrics names etc bull expert certification of low risk of

bull Privacy re-identification

raquo Risk of re-identification bull Limited data sets have dates

raquo Informed consent

bull Presidential Commission for the Study of Bioethical Issues Privacy and progress in whole genome sequencing httpwwwbioethicsgov covers genome sequencing exome sequencing genome-wide SNV analysis and data from large scale genomic studies

5

Biometrics and PHI

Biometrics are

Protected Health Information (PHI)

bull Biometrics require HIPAA

PHI requires HIPAA

Genomes are Biometrics

Biometrics are

Protected Health Information (PHI)

bull Biometrics require HIPAA

PHI requires HIPAA

4202015 Supported by the NIH Grant U54 HL108460 to the University of California San Diego 8

What is the problem

bull Imagine a public data set of genomes about a certain disease (eg lzheimer s)

bull Can you find out if your neighbor has lzheimer s

bull Same problem happens with clinical data which can be re-identified to a target patient

raquo Dates of visits

raquo Combination of patient characteristics

New DNA tests on a secret sample collected from a relative of suspect Albert deSalvo triggered the exhumation after authorities said there was a familial match with genetic material preserved in the killing of Sullivan

Authorities made the match through DNA taken from a water bottle thrown away by DeSalvo s nephew

But a lawyer for the DeSalvos told CNN the family was outraged disgusted and offended by the decision to secretly take a DN sample of one of its members

Bigger data bigger challenge

bull High dimensional data (eg whole genomic sequencing)

bull Privacy rules (PHI including genomic data need to be protected under HIPAA)

bull Institutional processes (internal review boards)

bull Patient preferences Tiered information consent bull Federated clinicalgenome data research

networks raquo Secure multiparty computation

raquo Homomorphic encryption

Jiang X Sarwate A Ohno-Machado L Privacy technology to support data sharing for comparative effectiveness research A systematic review Medical Care 51(8 Suppl 3)S58-65 2013 PMID 23774511

adapted from Dr Xiaoqian Jiang

10

Knowledge amp Tools

Privacy

Consent

Data

Share access to data and

Train the new generation of

Provide innovative software platform and infrastructure

iDASH

Knowledge

amp Tools

ServicesPlatform

Data

Sensors

Genomic

Clinical

Service WWW

Apps

Exec

AggregHosting

Sharing

Policies

Platform

Research

Develop

Federation

11Supported by the NIH Grant U54 HL108460 to the University of California San Diego

Our Goals

bull computation bull

data scientists bull

bull Protect privacy Develop

raquo Algorithms

raquo Tools

raquo Infrastructure

raquo Policies

Ohno-Machado L To Share or Not To Share That Is Not the Question Science Translational Medicine 2012 4(165)

Supported by the NIH Grant U54 HL108460 to the University of California San Diego 12

Models for Sharing Data

Clinical Data Network ndash UC-ReX

bull Clinical Data Warehouses from 5 Medical Centers and affiliated institutions exchange (gt12 million patients)

bull Improve patient safety surveillance quality improvement translational research

Funded by the UC Office of the President

UC-Research eXchange

UCSD

(Epic)

Data Matching Function Map D onto data dictionaries use Data Warehouses derived from

EHR

Results on Harmonized Data

Request for Data on Cohort

Registries

other DBs UC Irvine

(Allscripts) UC Davis

(Epic)

UCSF

(Epic)

Community

Partners

Researcher

wants data about

population

- Currently

- Count queries

- Future steps

- Individual-level patient data delivery

- Cross-institutional analytics

User requests data for

Quality Improvement

or Research Are the data

accessible

bullIdentity amp Trust

Management

bullPolicy

enforcement

Trusted

Broker(s)

Security Entity

Quality improvementhealth services research

Diverse Healthcare

How many patients over 65 are on Warfarin or Dabigatran

What are the major and minor bleeding rates for patients on these drugs

Is CYP2C9 mutation an important factor

AHRQ R01HS19913 EDM forum Entities Count queries and statistics across data in 3 different states warehouses (federal state private)

User requests data for

Quality Improvement

or Research

bullIdentity amp Trust

Management

bullPolicy

enforcement

Trusted

Broker(s)

Security Entity

Kim K et al Development of a Privacy and Security Policy Framework for a Multi-State Comparative Effectiveness Research Network Medical Care 2013

Kim K et al Data Governance Requirements for Distributed Clinical Research Networks Triangulating Perspectives of Diverse Stakeholders JAMIA 2014

Meeker et al A System to Build Distributed Multivariate Models and Manage Disparate Data Sharing Policies Implementation in the Scalable National Network for Effectiveness Research JAMIA (accepted)

Kim K et al Comparison of Consumers Views on Electronic Data Sharing for Healthcare and Research JAMIA (accepted) Scalable National Network for

Comparative Effectiveness Research

Privacy-preserving distributed computing

Policy embedded in software

AHRQ R01HS19913 EDM forum

User requests data for

Quality Improvement

or Research

bullIdentity amp Trust

Management

bullPolicy

enforcement

Trusted

Broker(s)

Security Entity

Privacy-preserving distributed computing

Predictive modeling and adjustment for cofounders require lots of data

Some institutions cannot move data outside their firewalls we can bring computation to the data

Diverse Healthcare Entities in 3 different states (federal state private)

Wu Y et al Grid Binary LOgistic REgression (GLORE) Building Shared Models Without Sharing Data JAMIA 2012

Wang S et al EXpectation Propagation LOgistic REgRession (EXPLORER) Distributed Privacy-AHRQ R01HS19913 EDM forum Preserving Online Model Learning J Biomed Inf 2013

Scalable National Network for Jiang W et al WebGLORE A Webservice for Grid Logistic Regression Bioinformatics 2014 Comparative Effectiveness Research Wu Y et al Grid Multi-Category Response Logistic Models BMC Med Inform Dec Making 2015

4202015 Supported by the NIH Grant U54 HL108460 to the University of California San Diego 18

Distributed Model

Conclusion no patient data needs to be sent from the sites only aggregates

Patient Age Insurance

A1 45 X

A2 32 Y

Patient Age Insurance

B1 45 Y

B2 32 Y

Patient Age Insurance

A1 45 X

A2 32 Y

Horizontal and Vertical Partitions

Privacy-preserving SVM for vertically partitioned data

Que J Jiang X Ohno-Machado L A Collaborative Framework for Distributed Privacy-Preserving Support Vector Learning AMIA Annual Symp Proc 20121350-9

pSCANNER Clinical Data Research

Members 5 UCs National VA (VINCI resource) 3 Federally Qualified Health Systems in LA (USC) RAND

Cohorts Kawasaki Disease Congestive Heart Failure Obesity

TCC The Childrenrsquos Clinic VM Virtual Machine SFGH San Francisco General Hospital OMOP Observational Medical Outcomes Partnership

Funded by PCORI as part of PCORnet

International Collaboration

Slide from Dr Shuang Wang

Ohno-Machado L To Share or Not To Share That Is Not the Question Science Translational Medicine 2012 4(165)

Supported by the NIH Grant U54 HL108460 to the University of California San Diego 23

Models for Sharing Data

HIPAA and non-public data

public data tools recipes

Po

wer

ed b

y M

IDA

S

Data Tools Recipes

upload amp download data

compute request direct upload amp download of proprietary data tool recipe

middleware and HIPAA security developed by iDASH

Compute nodes Memory Disk storage Networking

Po

wer

ed b

yV

Mw

are AUTOMATED

4202015 Supported by the NIH Grant U54 HL108460 to the University of California San Diego 24

iDASH On-Demand Resources

On-demand Virtualized Elastic Resilient Compute And Storage Technology

Safe HIPAA-compliant Annotated Data deposit box Environment

3 computation tiers 3 storage tiers 10GbE throughout Full redundancy RSA Two Factor Auth Remote data replication

800+ cores 7TB+ RAM 600TB+ storage

More planned 600+ cores 4TB+ RAM 200TB+ storage

iDASH On-Demand Resources

4202015 Supported by the NIH Grant U54 HL108460 to the University of California San Diego 26

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterio us SNPs

Blueprint

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterious SNPs

Co

nte

xt

Reference DB

Test data

Configuration

Helper tools

OS

Blueprint

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterious SNPs

Co

nte

xt

Reference DB

Test data

Configuration

Helper tools

OS

Blueprint

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterious SNPs

Co

nte

xt

Reference DB

Test data

Configuration

Helper tools

OS

Blueprint

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterious SNPs

Co

nte

xt

Reference DB

Test data

Configuration

Helper tools

OS

iDASH On-Demand Resources

Bookshelf

Repeatable Results

4202015 Supported by the NIH Grant U54 HL108460 to the University of California San Diego 27

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterio us SNPs

Blueprint

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterious SNPs

Co

nte

xt

Reference DB

Test data

Configuration

Helper tools

OS

Blueprint

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterious SNPs

Co

nte

xt

Reference DB

Test data

Configuration

Helper tools

OS

Blueprint

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterious SNPs

Co

nte

xt

Reference DB

Test data

Configuration

Helper tools

OS

Blueprint

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterious SNPs

Co

nte

xt

Reference DB

Test data

Configuration

Helper tools

OS

iDASH On-Demand Resources

Bookshelf

MyDATA

Repeatable Results

homomorphic encryption

secure multiparty computation

Ohno-Machado L To Share or Not To Share That Is Not the Question Science Translational Medicine 2012 4(165)

Supported by the NIH Grant U54 HL108460 to the University of California San Diego 28

commons

iDASH'15 Privacy Protection Challenge

• Focus on secure outsourcing and secure data analysis in a distributed setting (humangenomeprivacy.org)

• Task 1: Homomorphic encryption (HME) based secure genomic data analysis

• Task 2: Secure comparison between genomic data in a distributed setting
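As a rough illustration of secure computation in a distributed setting, the sketch below uses additive secret sharing to total per-site counts without any site revealing its own count. It is not the challenge's method: the modulus, the function names `share` and `reconstruct`, and the site counts are all assumptions made for this example.

```python
import secrets

# Assumed public modulus; every share is a number in [0, MODULUS).
MODULUS = 2**61 - 1

def share(value, n_parties):
    """Split value into n_parties random shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]

def reconstruct(shares):
    """Recombine shares; any proper subset of them looks uniformly random."""
    return sum(shares) % MODULUS

# Hypothetical per-site counts (for example, carriers of a variant at each site).
site_counts = [12, 7, 30]
n = len(site_counts)

# Each site splits its count into n shares and sends one share to every site.
all_shares = [share(count, n) for count in site_counts]

# Site j adds up the j-th share it received from every site.
partial_sums = [sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)]

# Publishing only the partial sums reveals the total, not the individual counts.
total = reconstruct(partial_sums)
```

Only the total is recovered at the end; a single share or partial sum on its own carries no information about any one site's count.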

Models for Sharing Data

Figure label: differential privacy

Ohno-Machado L. To Share or Not To Share: That Is Not the Question. Science Translational Medicine 2012; 4(165).

Name   Age  Education  Hours/week  …  HTN
Frank  42   6          40          …  Y
Bob    31   10         60          …  Y
Dave   43   9          40          …  N

'De-Identification' (microdata release)

• HIPAA compliant methods
  » Safe harbor dataset
    • Removal of 18 safe harbor identifiers
  » Limited dataset
    • Removal of direct identifiers
  » Statistical methods
    • Removal/grouping of attributes
    • Risk reasonably low

• Re-identification and disclosure risks

Courtesy of Li Xiong

Statistical Data Release (macrodata release)

Mohammed N. Privacy Preserving Heterogeneous Health Data Sharing. J Am Med Inform Assoc 2013.

Jiang XL. Differential-Private Data Publishing Through Component Analysis. Transactions on Data Privacy 2013.

Courtesy of Li Xiong

Statistical Data Release: Disclosure Risk

Figure panels: original records, original histogram.

Courtesy of Li Xiong

Statistical Data Release: Differential Privacy

Figure panels: original records, original histogram, perturbed histogram with differential privacy.

Courtesy of Li Xiong

Differential Privacy (Dwork et al.)

• $D$ and $D'$ are neighboring databases if they differ on at most one record.

A privacy mechanism $A$ gives ε-differential privacy if, for all neighboring databases $D$, $D'$ and for any possible output $S \in \mathrm{Range}(A)$,

$$\Pr[A(D) = S] \le \exp(\varepsilon) \times \Pr[A(D') = S]$$

Courtesy of Li Xiong

Global Sensitivity
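A sketch of the usual definition, given here as an assumption about what this heading refers to: for a query $f$, the global sensitivity is the largest change in its output between any two neighboring databases $D$ and $D'$ (the symbol $\Delta f$ is introduced only for this note).

$$\Delta f = \max_{D,\,D'\ \text{neighboring}} \lVert f(D) - f(D') \rVert_1$$

A count query changes by at most 1 when one record is added or removed, so $\Delta f = 1$ for counts. Calibrating Laplace noise to scale $\Delta f / \varepsilon$ then yields ε-differential privacy.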

Differential Privacy: Introducing Noise

• Laplace Mechanism

For example, for a single count query $Q$ over a dataset $D$, returning $Q(D) + \mathrm{Laplace}(1/\varepsilon)$ gives ε-differential privacy.

Courtesy of Li Xiong
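The count-query example above can be sketched in a few lines. This is a minimal illustration, not the presenters' implementation: the function names, the toy records, and the choice of ε = 0.5 are assumptions. Laplace noise is drawn as the difference of two exponential samples so that only the standard library is needed.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)

def noisy_count(records, predicate, epsilon, rng):
    """Answer one count query with Laplace(1/epsilon) noise added."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    # A count has sensitivity 1, so the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Hypothetical toy records: (age, has_hypertension).
records = [(42, True), (31, True), (43, False)]
rng = random.Random(0)
released = noisy_count(records, lambda r: r[1], epsilon=0.5, rng=rng)
```

A smaller ε means a larger noise scale, so the released count is more private and less accurate.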

iDASH 2014: First Privacy Protection Challenge

Evaluate solutions that offer guaranteed privacy protection for the output of genomic data analysis

• Task 1: Privacy-preserving SNP data sharing

• Task 2: Privacy-preserving release of the top K most significant SNPs

Statistical Data Release with Differential Privacy

Figure: Original Data passes through a Data Privacy Interface to produce Released Statistics and Synthetic Records. Design questions shown: which statistics to compute, and how to apportion the privacy budget.

Courtesy of Li Xiong
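One simple way to think about apportioning a privacy budget is sequential composition: if several count queries are answered on the same data with budgets ε_1, ..., ε_k, the total privacy loss is at most their sum. The sketch below splits a total ε across queries by weight. The query names, counts, and weights are hypothetical, and this is not a description of the interface in the figure.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)

def release_counts(true_counts, total_epsilon, weights, rng):
    """Split total_epsilon across count queries in proportion to weights."""
    total_weight = sum(weights.values())
    released, spent = {}, {}
    for name, count in true_counts.items():
        eps = total_epsilon * weights[name] / total_weight
        # Each count has sensitivity 1, so its noise scale is 1 / eps.
        released[name] = count + laplace_noise(1.0 / eps, rng)
        spent[name] = eps
    return released, spent

# Hypothetical counts and weights: the second query gets 3x the budget,
# so it receives less noise than the first.
true_counts = {"age_over_65": 120, "on_warfarin": 45}
weights = {"age_over_65": 1.0, "on_warfarin": 3.0}
released, spent = release_counts(true_counts, 1.0, weights, random.Random(0))
```

Giving a query a larger share of ε makes its answer more accurate at the expense of the others, which is the trade-off behind the "how to apportion" question.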

SHARE: Statistical & Synthetic Health Information Release

Figure: Original Health Records pass through a DP Interface to produce Released Statistics and Synthetic Records.

Data challenges
- High dimensionality
- Self-correlation

Target applications
- Cross-sectional studies
- Longitudinal studies

Components
- Low Dimensional Histograms (DPCube)
- High Dimensional Distributions (DPCopula)
- Sequential Data Release (DPTrie)

Courtesy of Li Xiong

Models for Sharing Data

Figure label: differential privacy

Ohno-Machado L. To Share or Not To Share: That Is Not the Question. Science Translational Medicine 2012; 4(165).

Shifting control

Figure: User U requests data D on individual I from a trusted broker (data use agreements, study registry). A Consent Management System asks patient I through the patient interface, "Do I wish to disclose data D to U?" (answer shown: Yes). Healthcare institutions hold the data. Through the Sharing Look-up, patient I can check that U looked at data D.
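The request, consent, and look-up roles in this figure can be sketched as a small in-memory registry. This is an illustrative assumption, not the design of any system named in the talk: the class `ConsentRegistry`, its methods, and the data categories are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ConsentRegistry:
    # (patient, data category) -> set of users the patient agreed to share with
    grants: dict = field(default_factory=dict)
    # every request is recorded, whether or not it was allowed
    log: list = field(default_factory=list)

    def grant(self, patient, category, user):
        """Patient answers yes: disclose this category of data to this user."""
        self.grants.setdefault((patient, category), set()).add(user)

    def request(self, user, patient, category):
        """User asks for data; the answer follows the patient's stored choice."""
        allowed = user in self.grants.get((patient, category), set())
        self.log.append((user, patient, category, allowed))
        return allowed

    def lookup(self, patient):
        """Patient checks who asked for their data and what was decided."""
        return [entry for entry in self.log if entry[1] == patient]

registry = ConsentRegistry()
registry.grant("I", "clinical", "U")
ok = registry.request("U", "I", "clinical")    # patient I agreed to this
denied = registry.request("U", "I", "genetic")  # no consent for this category
history = registry.lookup("I")
```

Keeping consent per data category mirrors the tiered preferences patients describe, and the log is what lets patient I check that U looked at data D.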

What to share? Who to share with?

Some preliminary findings from a limited set of interviews:

• Healthy volunteers do not want to share with commercially sponsored researchers

• Some want their medical information shared with only UCSD researchers, no others

• Many do not want to share at least 1 category of sensitive information

• Most common decline was genetic, followed by sexual & reproductive health

Courtesy of E Bell

iCONCUR: informed CONsent for Clinical data Use for Research

Figure labels: Consent Management System; Sharing Look-up Registry; Patient I; Healthcare Institution (Electronic Health Record, Clinical Data Warehouse); User U (Query, Results); Concierge or Automated Services; Connecting Software; Trusted Entity.

Privacy & Technology Projects

Themes: Data and Biospecimen Sharing; Privacy Preserving Computation

Projects

• Privacy Preserving Analytics for KD in African-Americans

• Consent for Data and Biosample Sharing in an Underserved Population

• Partnership for Epidemiological Research on Latinos

Questions and partner sites

• Which DNA variants are implicated in KD susceptibility in this population? (Emory, Genome Institute of Singapore, Imperial College)

• Does consent rate depend on who is obtaining the consent? (Maricopa Health System)

• Do patients understand what they consented to? (San Diego State University)

Looking into the Near Future

Healthcare data and biospecimen collections

Several terabytes of data are being generated every day, and samples are being collected. Participants could decide what, when, and with whom to share.

• Genomes
• Images
• Body sensors
• Environmental sensors
• Electronic Health Records
• Personal Health Records
• Pharmacy
• Social media

UCSD's Journey

Figure: timeline (2008-2009, 2010-2011, 2012-2013, 2014-2015) covering Clinical Data, Research Data, Applications, and Integration.

Systems: Electronic Health Record System (Epic & Clarity); Other Systems (PACS, lab, etc.); Personnel Systems (Active Directory); Query Tools (UC-ReX, Explorer, Privacy Technology); Clinical Research Data (REDCap, Velos, other DBs); iDASH HIPAA SHADE (images, human genomes, etc.); Recruitment (consent tools, custom apps); Healthcare Clinical Data; Clinical Data Warehouse for Research; Scalable Network (Distributed Analytics Tools); External data (patient reported data).

Sites: VA, LA Clinics, UCSF, Davis, Irvine, UCLA, USC/LAC, Cedars Sinai, San Mateo.

Projects and funding: Epic CDDS, New modules, Accrual for Clinical Trials, HIPAA, FISMA, pSCANNER (PCORI CDRN), bioCADDIE, iDASH HIPAA/FISMA OVERCAST, iDASH, CTRI, School of Medicine, iCONCUR, Cancer Genomics, Text Mining NLP, Analytical Tools, De-ID Tools, BRIGHT, Cloud, PCORI contracts, K22, K99, NLM Training Grant, UC-ReX, SCANNER, PhenDISCO.

Acknowledgements

Department of Biomedical Informatics at UCSD, funded by NIH (U54, UL1, U24, UH3, R21, U01, T15, R00, K22, K99, D43), AHRQ, UCBRAIDOP, PCORI, NVIDIA



Page 4: Genomics and Informatics · 2015-04-20 · Genomics and Informatics: Technology Solutions to Compute with Sensitive Data April 21, 2015 Innovations in Healthcare Informatics NIH U54

Excerpt from a consent for care form

USE OF MEDICAL INFORMATION AND SPECIMENS

I understand that my medical information photographs andor video in any form may be used for other [INSTITUTION] purposes such as quality improvement patient safety and education I also understand that my medical information and tissue fluids cells and other specimens (collectively Specimens) that [INSTITUTION] may collect during the course of my treatment and care may be used and shared with researchers

4

Health Insurance Portability amp Accountability Act HIPAA

bull Security HIP De-identified data raquo Risk of disclosure bull removal of 18 identifiers such

raquo Liability as dates biometrics names etc bull expert certification of low risk of

bull Privacy re-identification

raquo Risk of re-identification bull Limited data sets have dates

raquo Informed consent

bull Presidential Commission for the Study of Bioethical Issues Privacy and progress in whole genome sequencing httpwwwbioethicsgov covers genome sequencing exome sequencing genome-wide SNV analysis and data from large scale genomic studies

5

Biometrics and PHI

Biometrics are

Protected Health Information (PHI)

bull Biometrics require HIPAA

PHI requires HIPAA

Genomes are Biometrics

Biometrics are

Protected Health Information (PHI)

bull Biometrics require HIPAA

PHI requires HIPAA

4202015 Supported by the NIH Grant U54 HL108460 to the University of California San Diego 8

What is the problem

bull Imagine a public data set of genomes about a certain disease (eg lzheimer s)

bull Can you find out if your neighbor has lzheimer s

bull Same problem happens with clinical data which can be re-identified to a target patient

raquo Dates of visits

raquo Combination of patient characteristics

New DNA tests on a secret sample collected from a relative of suspect Albert deSalvo triggered the exhumation after authorities said there was a familial match with genetic material preserved in the killing of Sullivan

Authorities made the match through DNA taken from a water bottle thrown away by DeSalvo s nephew

But a lawyer for the DeSalvos told CNN the family was outraged disgusted and offended by the decision to secretly take a DN sample of one of its members

Bigger data bigger challenge

bull High dimensional data (eg whole genomic sequencing)

bull Privacy rules (PHI including genomic data need to be protected under HIPAA)

bull Institutional processes (internal review boards)

bull Patient preferences Tiered information consent bull Federated clinicalgenome data research

networks raquo Secure multiparty computation

raquo Homomorphic encryption

Jiang X Sarwate A Ohno-Machado L Privacy technology to support data sharing for comparative effectiveness research A systematic review Medical Care 51(8 Suppl 3)S58-65 2013 PMID 23774511

adapted from Dr Xiaoqian Jiang

10

Knowledge amp Tools

Privacy

Consent

Data

Share access to data and

Train the new generation of

Provide innovative software platform and infrastructure

iDASH

Knowledge

amp Tools

ServicesPlatform

Data

Sensors

Genomic

Clinical

Service WWW

Apps

Exec

AggregHosting

Sharing

Policies

Platform

Research

Develop

Federation

11Supported by the NIH Grant U54 HL108460 to the University of California San Diego

Our Goals

bull computation bull

data scientists bull

bull Protect privacy Develop

raquo Algorithms

raquo Tools

raquo Infrastructure

raquo Policies

Ohno-Machado L To Share or Not To Share That Is Not the Question Science Translational Medicine 2012 4(165)

Supported by the NIH Grant U54 HL108460 to the University of California San Diego 12

Models for Sharing Data

Clinical Data Network ndash UC-ReX

bull Clinical Data Warehouses from 5 Medical Centers and affiliated institutions exchange (gt12 million patients)

bull Improve patient safety surveillance quality improvement translational research

Funded by the UC Office of the President

UC-Research eXchange

UCSD

(Epic)

Data Matching Function Map D onto data dictionaries use Data Warehouses derived from

EHR

Results on Harmonized Data

Request for Data on Cohort

Registries

other DBs UC Irvine

(Allscripts) UC Davis

(Epic)

UCSF

(Epic)

Community

Partners

Researcher

wants data about

population

- Currently

- Count queries

- Future steps

- Individual-level patient data delivery

- Cross-institutional analytics

User requests data for

Quality Improvement

or Research Are the data

accessible

bullIdentity amp Trust

Management

bullPolicy

enforcement

Trusted

Broker(s)

Security Entity

Quality improvementhealth services research

Diverse Healthcare

How many patients over 65 are on Warfarin or Dabigatran

What are the major and minor bleeding rates for patients on these drugs

Is CYP2C9 mutation an important factor

AHRQ R01HS19913 EDM forum Entities Count queries and statistics across data in 3 different states warehouses (federal state private)

User requests data for

Quality Improvement

or Research

bullIdentity amp Trust

Management

bullPolicy

enforcement

Trusted

Broker(s)

Security Entity

Kim K et al Development of a Privacy and Security Policy Framework for a Multi-State Comparative Effectiveness Research Network Medical Care 2013

Kim K et al Data Governance Requirements for Distributed Clinical Research Networks Triangulating Perspectives of Diverse Stakeholders JAMIA 2014

Meeker et al A System to Build Distributed Multivariate Models and Manage Disparate Data Sharing Policies Implementation in the Scalable National Network for Effectiveness Research JAMIA (accepted)

Kim K et al Comparison of Consumers Views on Electronic Data Sharing for Healthcare and Research JAMIA (accepted) Scalable National Network for

Comparative Effectiveness Research

Privacy-preserving distributed computing

Policy embedded in software

AHRQ R01HS19913 EDM forum

User requests data for

Quality Improvement

or Research

bullIdentity amp Trust

Management

bullPolicy

enforcement

Trusted

Broker(s)

Security Entity

Privacy-preserving distributed computing

Predictive modeling and adjustment for cofounders require lots of data

Some institutions cannot move data outside their firewalls we can bring computation to the data

Diverse Healthcare Entities in 3 different states (federal state private)

Wu Y et al Grid Binary LOgistic REgression (GLORE) Building Shared Models Without Sharing Data JAMIA 2012

Wang S et al EXpectation Propagation LOgistic REgRession (EXPLORER) Distributed Privacy-AHRQ R01HS19913 EDM forum Preserving Online Model Learning J Biomed Inf 2013

Scalable National Network for Jiang W et al WebGLORE A Webservice for Grid Logistic Regression Bioinformatics 2014 Comparative Effectiveness Research Wu Y et al Grid Multi-Category Response Logistic Models BMC Med Inform Dec Making 2015

4202015 Supported by the NIH Grant U54 HL108460 to the University of California San Diego 18

Distributed Model

Conclusion no patient data needs to be sent from the sites only aggregates

Patient  Age  Insurance
A1       45   X
A2       32   Y

Patient  Age  Insurance
B1       45   Y
B2       32   Y

Patient  Age  Insurance
A1       45   X
A2       32   Y

Horizontal and Vertical Partitions

Privacy-preserving SVM for vertically partitioned data

Que J, Jiang X, Ohno-Machado L. A Collaborative Framework for Distributed Privacy-Preserving Support Vector Learning. AMIA Annual Symp Proc. 2012:1350-9.
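To make the two partition types concrete, here is a small sketch. It only illustrates the data layout and one well-known linear-algebra property: for a linear kernel, the Gram matrix of vertically partitioned data is the sum of each party's local Gram matrix. It is not the protocol from the cited paper, and the array values and party names are made up.

```python
import numpy as np

# Full (hypothetical) table: 4 patients x 4 numeric features.
X = np.array([[45., 1., 120., 0.],
              [32., 0., 135., 1.],
              [58., 1., 150., 1.],
              [41., 0., 110., 0.]])

# Horizontal partition: each site holds different patients, same features.
site_a_rows, site_b_rows = X[:2], X[2:]

# Vertical partition: each party holds the same patients, different features.
party_1_cols, party_2_cols = X[:, :2], X[:, 2:]

# Linear-kernel Gram matrix on the pooled data.
K_pooled = X @ X.T

# Each party computes a Gram matrix on its own columns only;
# the sum of the local matrices equals the pooled Gram matrix.
K_from_parties = party_1_cols @ party_1_cols.T + party_2_cols @ party_2_cols.T
```

A kernel method such as an SVM needs only the Gram matrix, so in this layout the raw feature values do not have to be exchanged. Whether sharing a local Gram matrix is itself acceptable is a separate privacy question.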

pSCANNER Clinical Data Research

Members: 5 UCs, National VA (VINCI resource), 3 Federally Qualified Health Systems in LA (USC), RAND

Cohorts: Kawasaki Disease, Congestive Heart Failure, Obesity

TCC: The Children's Clinic; VM: Virtual Machine; SFGH: San Francisco General Hospital; OMOP: Observational Medical Outcomes Partnership

Funded by PCORI as part of PCORnet

International Collaboration

Slide from Dr Shuang Wang

Ohno-Machado L. To Share or Not To Share: That Is Not the Question. Science Translational Medicine. 2012;4(165).

Models for Sharing Data

HIPAA and non-public data

Public data, tools, recipes

Powered by MIDAS

Data, Tools, Recipes

Upload & download data

Compute request; direct upload & download of proprietary data, tool, recipe

Middleware and HIPAA security developed by iDASH

Compute nodes, memory, disk storage, networking

Powered by VMware

AUTOMATED

iDASH On-Demand Resources

OVERCAST: On-demand Virtualized Elastic Resilient Compute And Storage Technology

SHADE: Safe HIPAA-compliant Annotated Data deposit box Environment

3 computation tiers, 3 storage tiers, 10GbE throughout, full redundancy, RSA two-factor authentication, remote data replication

800+ cores, 7TB+ RAM, 600TB+ storage

More planned: 600+ cores, 4TB+ RAM, 200TB+ storage

[Figure: a bookshelf of blueprints. Each blueprint pairs a workflow (Short reads, Index reference, Align to reference, Call variants, Annotate variants, Pick high impact, Deleterious SNPs) with its context (Reference DB, Test data, Configuration, Helper tools, OS).]

iDASH On-Demand Resources

Bookshelf

Repeatable Results

[Figure: the bookshelf of blueprints, each pairing a workflow (Short reads, Index reference, Align to reference, Call variants, Annotate variants, Pick high impact, Deleterious SNPs) with its context (Reference DB, Test data, Configuration, Helper tools, OS), shown together with MyDATA.]

iDASH On-Demand Resources

Bookshelf

MyDATA

Repeatable Results

homomorphic encryption

secure multiparty computation


commons

iDASH'15 Privacy Protection Challenge

• Focus on secure outsourcing and secure data analysis in a distributed setting (humangenomeprivacy.org)
• Task 1: Homomorphic encryption (HME) based secure genomic data analysis
• Task 2: Secure comparison between genomic data in a distributed setting

differential privacy


Models for Sharing Data

Name   Age  Education  Hours/week  …  HTN
Frank  42   6          40             Y
Bob    31   10         60             Y
Dave   43   9          40             N

'De-Identification' (microdata release)

• HIPAA compliant methods
  » Safe harbor dataset
    • Removal of 18 safe harbor identifiers
  » Limited dataset
    • Removal of direct identifiers
  » Statistical methods
    • Removal/grouping of attributes
    • Risk reasonably low
• Re-identification and disclosure risks

Courtesy of Li Xiong

Statistical Data Release (macrodata release)

Mohammed N. Privacy Preserving Heterogeneous Health Data Sharing. J Am Med Inform Assoc. 2013.

Jiang XL. Differential-Private Data Publishing Through Component Analysis. Transactions on Data Privacy. 2013.

Courtesy of Li Xiong

Statistical Data Release: Disclosure Risk

[Figure: original records and original histogram. Courtesy of Li Xiong]

Statistical Data Release: Differential Privacy

[Figure: original records, original histogram, and perturbed histogram with differential privacy. Courtesy of Li Xiong]

Differential Privacy (Dwork et al)

D, D′

• D and D′ are neighboring databases if they differ on at most one record

A privacy mechanism A gives ε-differential privacy if, for all neighboring databases D, D′ and for any possible output S ∈ Range(A):

Pr[A(D) = S] ≤ exp(ε) × Pr[A(D′) = S]

Courtesy of Li Xiong

Global Sensitivity

Differential Privacy Introducing Noise

• Laplace Mechanism

For example, for a single count query Q over a dataset D, returning Q(D) + Laplace(1/ε) gives ε-differential privacy

Courtesy of Li Xiong
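The count-query example above can be sketched in a few lines. This is an illustration under stated assumptions, not code from the talk: the records, the predicate, and the function name `laplace_count` are hypothetical. It relies on the fact that a count changes by at most 1 when one record changes, so its global sensitivity is 1 and the Laplace noise scale is 1/ε.

```python
import numpy as np

def laplace_count(data, predicate, epsilon, rng):
    """Release a count plus Laplace noise of scale 1/epsilon.

    The global sensitivity of a count is 1, so scale = sensitivity / epsilon
    reduces to 1 / epsilon.
    """
    true_count = sum(1 for row in data if predicate(row))
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Hypothetical records: (age, has_hypertension)
records = [(42, True), (31, True), (43, False), (67, True), (55, False)]
rng = np.random.default_rng(seed=1)

noisy = laplace_count(records, lambda r: r[1], epsilon=0.5, rng=rng)
```

A smaller ε means a larger noise scale and stronger privacy, at the cost of a less accurate released count.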

iDASH 2014 First Privacy Protection Challenge

Evaluate solutions of guaranteed privacy protection for protecting the output of genomic data analysis

• Task 1: Privacy-preserving SNP Data Sharing
• Task 2: Privacy-preserving release of top K most significant SNPs

Original Data

Which statistics to compute

How to apportion privacy budget

Released Statistics Synthetic Records

Statistical Data Release with Differential Privacy

Statistical Data Release under Differential Privacy

Data Privacy Interface Courtesy of Li Xiong
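One simple way to think about apportioning a privacy budget is sequential composition: if several statistics are released from the same data, each with its own ε_i, the overall release is differentially private with ε equal to the sum of the ε_i. The sketch below splits a total budget across count queries in proportion to weights chosen by the analyst. The statistic names, weights, and the function `release_counts` are assumptions for illustration; the systems named on these slides may allocate budget differently.

```python
import numpy as np

def release_counts(true_counts, total_epsilon, weights, rng):
    """Split total_epsilon across count queries in proportion to weights.

    Each count has sensitivity 1, so query i gets Laplace noise with
    scale 1 / epsilon_i. The per-query epsilons add up to total_epsilon.
    """
    weights = np.asarray(weights, dtype=float)
    epsilons = total_epsilon * weights / weights.sum()
    noisy = {
        name: count + rng.laplace(scale=1.0 / eps)
        for (name, count), eps in zip(true_counts.items(), epsilons)
    }
    return noisy, epsilons

# Hypothetical statistics chosen for release.
true_counts = {"age_over_65": 120, "on_warfarin": 45, "bleeding_event": 7}
rng = np.random.default_rng(seed=7)
noisy, epsilons = release_counts(true_counts, total_epsilon=1.0,
                                 weights=[1, 1, 2], rng=rng)
```

Giving the rare outcome a larger share of the budget means less noise on the count where a fixed amount of noise would matter most in relative terms.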

Original Health Records

Data challenges
- High dimensionality
- Self-correlation

DP Interface

Released Statistics, Synthetic Records

Target applications
- Cross-sectional studies
- Longitudinal studies

Low Dimensional Histograms (DPCube)

High Dimensional Distributions (DPCopula)

Sequential Data Release (DPTrie)

SHARE: Statistical & Synthetic Health Information Release

Courtesy of Li Xiong

differential privacy

Models for Sharing Data

Sharing Look-up

Patient I: I can check that U looked at my data D

• Data use agreements
• Study registry

Trusted broker

User U requests Data D on individual I

Consent Management System

Do I wish to disclose data D to U?

Shifting control

Patient Interface

Healthcare Institutions

Yes

What to share? Who to share with?

Some preliminary findings from a limited set of interviews:

• Healthy volunteers do not want to share with commercially sponsored researchers
• Some want their medical information shared with only UCSD researchers, no others
• Many do not want to share at least 1 category of sensitive information
• Most common decline was genetic, followed by sexual & reproductive health

Courtesy of E Bell
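The tiered choices described in these findings (what to share, and with whom) together with the look-up idea (a patient can check who requested their data) can be sketched as a small data structure. Everything here is hypothetical: the class names, category labels, and recipient types are invented for illustration and do not describe the actual consent system.

```python
from dataclasses import dataclass, field

@dataclass
class ConsentPreferences:
    """One patient's choices: recipient type -> set of data categories allowed."""
    allowed: dict = field(default_factory=dict)

    def permits(self, recipient_type, category):
        return category in self.allowed.get(recipient_type, set())

@dataclass
class TrustedBroker:
    consents: dict                                   # patient id -> ConsentPreferences
    access_log: list = field(default_factory=list)   # (patient, user, category, granted)

    def request(self, user, recipient_type, patient_id, category):
        prefs = self.consents.get(patient_id)
        granted = prefs is not None and prefs.permits(recipient_type, category)
        # Every request is logged so the patient can later see who asked for what.
        self.access_log.append((patient_id, user, category, granted))
        return granted

    def lookups_for(self, patient_id):
        return [entry for entry in self.access_log if entry[0] == patient_id]

# Hypothetical patient who shares labs and medications with university
# researchers only, and does not share genetic data with anyone.
prefs = ConsentPreferences(allowed={"university": {"labs", "medications"}})
broker = TrustedBroker(consents={"I": prefs})

ok_labs = broker.request("U", "university", "I", "labs")
ok_genetic = broker.request("U", "university", "I", "genetic")
ok_commercial = broker.request("V", "commercial", "I", "labs")
```

Denied requests are logged as well as granted ones, which is one possible design choice: it lets the patient see attempted access, not only successful access.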


Consent Management System

Sharing Look-up Registry

Patient I

Electronic Health Record

Clinical Data Warehouse

Healthcare Institution

Query

User U

Results

Concierge or Automated Services

Connecting Software

Trusted Entity

iCONCUR: informed CONsent for Clinical data Use for Research

Privacy Preserving Analytics for KD in African-Americans

Consent for Data and Biosample Sharing in an Underserved Population

Partnership for Epidemiological Research on Latinos

Data and Biospecimen Sharing, Privacy Preserving Computation

Which DNA variants are implicated in KD susceptibility in this population? (Emory, Genome Institute of Singapore, Imperial College)

Does consent rate depend on who is obtaining the consent? (Maricopa Health System)

Do patients understand what they consented to? (San Diego State University)

Privacy amp Technology Projects

Looking into the Near Future

Healthcare data and Biospecimen collections

Several terabytes of data are being generated every day, and samples are being collected. Participants could decide what/when/with whom to share.

• Genomes
• Images
• Body sensors
• Environmental sensors
• Electronic Health Records
• Personal Health Records
• Pharmacy
• Social media

[Figure: UCSD's Journey, a timeline spanning 2008-2009, 2010-2011, 2012-2013, and 2014-2015 under the headings Clinical Data, Research Data, Applications, and Integration. Labels recoverable from the figure: Electronic Health Record System (Epic & Clarity); Epic CDDS; New modules; Other Systems (PACS, lab, etc.); Personnel Systems (Active Directory); Healthcare Clinical Data; Clinical Data Warehouse for Research; Clinical Research Data (REDCap, Velos, other DBs); Query Tools (UC-ReX Explorer); Privacy Technology; Recruitment, Consent tools, Custom Apps; Accrual for Clinical Trials; iDASH HIPAA SHADE (images, human genomes, etc.); iDASH HIPAA/FISMA OVERCAST; Scalable Network (Distributed Analytics Tools); HIPAA; FISMA; External data (patient reported data); pSCANNER PCORI CDRN; bioCADDIE; iCONCUR; Cancer Genomics; Text Mining, NLP; Analytical Tools; De-ID Tools; BRIGHT; Cloud; PCORI contracts; K22, K99; NLM Training Grant; UC-ReX; iDASH; SCANNER; PhenDISCO; iDASH, CTRI, School of Medicine; VA, LA Clinics, UCSF, Davis, Irvine, UCLA, USC/LAC, Cedars Sinai, San Mateo.]

Acknowledgements

Department of Biomedical Informatics at UCSD, funded by NIH (U54, UL1, U24, UH3, R21, U01, T15, R00, K22, K99, D43), AHRQ, UCBRAIDOP, PCORI, NVIDIA


Health Insurance Portability & Accountability Act (HIPAA)

• Security
  » Risk of disclosure
  » Liability
• Privacy
  » Risk of re-identification
  » Informed consent

HIPAA de-identified data

• Removal of 18 identifiers such as dates, biometrics, names, etc.
• Expert certification of low risk of re-identification
• Limited data sets have dates

• Presidential Commission for the Study of Bioethical Issues. Privacy and Progress in Whole Genome Sequencing. http://www.bioethics.gov. Covers genome sequencing, exome sequencing, genome-wide SNV analysis, and data from large scale genomic studies.

Biometrics and PHI

Biometrics are Protected Health Information (PHI)

• Biometrics require HIPAA

PHI requires HIPAA

Genomes are Biometrics

Biometrics are Protected Health Information (PHI)

• Biometrics require HIPAA

PHI requires HIPAA

What is the problem?

• Imagine a public data set of genomes about a certain disease (e.g., Alzheimer's)
• Can you find out if your neighbor has Alzheimer's?
• Same problem happens with clinical data, which can be re-identified to a target patient
  » Dates of visits
  » Combination of patient characteristics
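The re-identification risk from dates and combinations of characteristics can be illustrated with a simple uniqueness check: count how many records share each combination of quasi-identifiers, and flag the combinations that occur only once. The records, field names, and helper functions below are hypothetical and serve only as a sketch.

```python
from collections import Counter

def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)

def unique_records(records, quasi_identifiers):
    """Records whose combination of values appears exactly once."""
    sizes = group_sizes(records, quasi_identifiers)
    return [r for r in records
            if sizes[tuple(r[q] for q in quasi_identifiers)] == 1]

# Hypothetical records with names removed but dates and traits kept.
records = [
    {"visit_date": "2015-03-02", "age": 45, "zip3": "921", "diagnosis": "A"},
    {"visit_date": "2015-03-02", "age": 45, "zip3": "921", "diagnosis": "B"},
    {"visit_date": "2015-03-09", "age": 32, "zip3": "920", "diagnosis": "C"},
    {"visit_date": "2015-03-10", "age": 67, "zip3": "921", "diagnosis": "D"},
]
at_risk = unique_records(records, ["visit_date", "age", "zip3"])
```

Anyone who knows a target patient's visit date, age, and area could link a unique record to that person and learn the diagnosis, even though no name appears in the data.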

New DNA tests on a secret sample collected from a relative of suspect Albert DeSalvo triggered the exhumation after authorities said there was a familial match with genetic material preserved in the killing of Sullivan.

Authorities made the match through DNA taken from a water bottle thrown away by DeSalvo's nephew.

But a lawyer for the DeSalvos told CNN the family was outraged, disgusted and offended by the decision to secretly take a DNA sample of one of its members.

Bigger data, bigger challenge

• High dimensional data (e.g., whole genomic sequencing)
• Privacy rules (PHI, including genomic data, needs to be protected under HIPAA)
• Institutional processes (internal review boards)
• Patient preferences: Tiered information consent
• Federated clinical/genome data research networks
  » Secure multiparty computation
  » Homomorphic encryption

Jiang X, Sarwate A, Ohno-Machado L. Privacy technology to support data sharing for comparative effectiveness research: A systematic review. Medical Care. 2013;51(8 Suppl 3):S58-65. PMID 23774511.

adapted from Dr Xiaoqian Jiang

[Figure: iDASH overview. Labels recoverable from the figure: Knowledge & Tools; Privacy; Consent; Data; Sensors; Genomic; Clinical; Services/Platform; Service; WWW; Apps; Exec; AggregHosting; Sharing; Policies; Platform; Research; Develop; Federation.]

Our Goals

• Share access to data and computation
• Train the new generation of data scientists
• Provide innovative software platform and infrastructure
• Protect privacy

Develop

» Algorithms
» Tools
» Infrastructure
» Policies

Models for Sharing Data

Clinical Data Network – UC-ReX (UC-Research eXchange)

• Clinical Data Warehouses from 5 Medical Centers and affiliated institutions exchange (>12 million patients)
• Improve patient safety surveillance, quality improvement, translational research

Funded by the UC Office of the President

[Figure: a researcher who wants data about a population sends a Request for Data on Cohort. A Data Matching Function maps D onto data dictionaries and uses Data Warehouses derived from EHR, registries, and other DBs at UCSD (Epic), UC Irvine (Allscripts), UC Davis (Epic), UCSF (Epic), and Community Partners. Results on Harmonized Data are returned.]

- Currently
  - Count queries
- Future steps
  - Individual-level patient data delivery
  - Cross-institutional analytics

User requests data for Quality Improvement or Research

Are the data accessible?

Trusted Broker(s)

• Identity & Trust Management
• Policy enforcement

Security Entity

Quality improvement/health services research

Diverse Healthcare Entities in 3 different states (federal, state, private)

Count queries and statistics across data warehouses

• How many patients over 65 are on Warfarin or Dabigatran?
• What are the major and minor bleeding rates for patients on these drugs?
• Is CYP2C9 mutation an important factor?

AHRQ R01HS19913, EDM forum
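A count query of the kind listed above ("How many patients over 65 are on Warfarin or Dabigatran?") can be answered across warehouses without moving patient rows: each site runs the count locally and returns a single number, and the requester adds the numbers up. The sketch below assumes a made-up record layout and site names; it leaves out the identity, trust, and policy-enforcement steps that a real broker would apply.

```python
def local_count(patients, min_age, drugs):
    """Runs inside one site; returns a single number, not patient rows."""
    return sum(1 for p in patients
               if p["age"] > min_age and p["drug"] in drugs)

def federated_count(sites, min_age, drugs):
    """Collect one count per site and add them up."""
    per_site = {name: local_count(rows, min_age, drugs)
                for name, rows in sites.items()}
    return per_site, sum(per_site.values())

# Hypothetical warehouses at three sites.
sites = {
    "site_1": [{"age": 70, "drug": "warfarin"}, {"age": 50, "drug": "warfarin"}],
    "site_2": [{"age": 81, "drug": "dabigatran"}, {"age": 66, "drug": "aspirin"}],
    "site_3": [{"age": 67, "drug": "warfarin"}, {"age": 72, "drug": "dabigatran"}],
}
per_site, total = federated_count(sites, min_age=65,
                                  drugs={"warfarin", "dabigatran"})
```

Very small per-site counts can still be revealing, so a deployment might suppress or perturb them before release; that step is not shown here.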


Page 6: Genomics and Informatics · 2015-04-20 · Genomics and Informatics: Technology Solutions to Compute with Sensitive Data April 21, 2015 Innovations in Healthcare Informatics NIH U54

Biometrics and PHI

Biometrics are

Protected Health Information (PHI)

bull Biometrics require HIPAA

PHI requires HIPAA

Genomes are Biometrics

Biometrics are

Protected Health Information (PHI)

bull Biometrics require HIPAA

PHI requires HIPAA

4202015 Supported by the NIH Grant U54 HL108460 to the University of California San Diego 8

What is the problem

bull Imagine a public data set of genomes about a certain disease (eg lzheimer s)

bull Can you find out if your neighbor has lzheimer s

bull Same problem happens with clinical data which can be re-identified to a target patient

raquo Dates of visits

raquo Combination of patient characteristics

New DNA tests on a secret sample collected from a relative of suspect Albert deSalvo triggered the exhumation after authorities said there was a familial match with genetic material preserved in the killing of Sullivan

Authorities made the match through DNA taken from a water bottle thrown away by DeSalvo s nephew

But a lawyer for the DeSalvos told CNN the family was outraged disgusted and offended by the decision to secretly take a DN sample of one of its members

Bigger data bigger challenge

bull High dimensional data (eg whole genomic sequencing)

bull Privacy rules (PHI including genomic data need to be protected under HIPAA)

bull Institutional processes (internal review boards)

bull Patient preferences Tiered information consent bull Federated clinicalgenome data research

networks raquo Secure multiparty computation

raquo Homomorphic encryption

Jiang X Sarwate A Ohno-Machado L Privacy technology to support data sharing for comparative effectiveness research A systematic review Medical Care 51(8 Suppl 3)S58-65 2013 PMID 23774511

adapted from Dr Xiaoqian Jiang

10

Knowledge amp Tools

Privacy

Consent

Data

Share access to data and

Train the new generation of

Provide innovative software platform and infrastructure

iDASH

Knowledge

amp Tools

ServicesPlatform

Data

Sensors

Genomic

Clinical

Service WWW

Apps

Exec

AggregHosting

Sharing

Policies

Platform

Research

Develop

Federation

11Supported by the NIH Grant U54 HL108460 to the University of California San Diego

Our Goals

bull computation bull

data scientists bull

bull Protect privacy Develop

raquo Algorithms

raquo Tools

raquo Infrastructure

raquo Policies

Ohno-Machado L To Share or Not To Share That Is Not the Question Science Translational Medicine 2012 4(165)

Supported by the NIH Grant U54 HL108460 to the University of California San Diego 12

Models for Sharing Data

Clinical Data Network ndash UC-ReX

bull Clinical Data Warehouses from 5 Medical Centers and affiliated institutions exchange (gt12 million patients)

bull Improve patient safety surveillance quality improvement translational research

Funded by the UC Office of the President

UC-Research eXchange

UCSD

(Epic)

Data Matching Function Map D onto data dictionaries use Data Warehouses derived from

EHR

Results on Harmonized Data

Request for Data on Cohort

Registries

other DBs UC Irvine

(Allscripts) UC Davis

(Epic)

UCSF

(Epic)

Community

Partners

Researcher

wants data about

population

- Currently

- Count queries

- Future steps

- Individual-level patient data delivery

- Cross-institutional analytics

[Figure: a user requests data for quality improvement or research ("Are the data accessible?"). Trusted broker(s) and a security entity provide
• Identity & trust management
• Policy enforcement]

Quality improvement/health services research

Count queries and statistics across data warehouses of diverse healthcare entities in 3 different states (federal, state, private):

How many patients over 65 are on Warfarin or Dabigatran?

What are the major and minor bleeding rates for patients on these drugs?

Is CYP2C9 mutation an important factor?

AHRQ R01HS19913, EDM forum
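Count queries like the first question above can be answered without pooling patient records: each site counts locally and only the counts leave the site. The following is a minimal Python sketch of that idea, not the network's actual software. The record fields (`age`, `drugs`), the three site names, and the toy records are all hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]

# Hypothetical toy records; in a real network each list stays inside its own site.
SITES: Dict[str, List[Record]] = {
    "site_a": [{"age": 70, "drugs": {"warfarin"}}, {"age": 50, "drugs": {"warfarin"}}],
    "site_b": [{"age": 81, "drugs": {"dabigatran"}}, {"age": 67, "drugs": set()}],
    "site_c": [{"age": 66, "drugs": {"warfarin", "aspirin"}}],
}


def over_65_on_anticoagulant(record: Record) -> bool:
    """Cohort definition: older than 65 and on Warfarin or Dabigatran."""
    return record["age"] > 65 and bool(record["drugs"] & {"warfarin", "dabigatran"})


def local_count(records: List[Record], predicate: Callable[[Record], bool]) -> int:
    """Runs at a site: returns one number, never the records themselves."""
    return sum(1 for r in records if predicate(r))


def federated_count(sites: Dict[str, List[Record]],
                    predicate: Callable[[Record], bool]) -> Dict[str, int]:
    """Runs at the coordinator: collects per-site counts and adds them up."""
    per_site = {name: local_count(recs, predicate) for name, recs in sites.items()}
    per_site["total"] = sum(per_site.values())
    return per_site


result = federated_count(SITES, over_65_on_anticoagulant)
print(result)
```

Small per-site counts can still disclose information about individuals, so a deployed system would typically combine this pattern with policy enforcement or output perturbation.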


Scalable National Network for Comparative Effectiveness Research

Privacy-preserving distributed computing

Policy embedded in software

Kim K, et al. Development of a Privacy and Security Policy Framework for a Multi-State Comparative Effectiveness Research Network. Medical Care 2013

Kim K, et al. Data Governance Requirements for Distributed Clinical Research Networks: Triangulating Perspectives of Diverse Stakeholders. JAMIA 2014

Meeker et al. A System to Build Distributed Multivariate Models and Manage Disparate Data Sharing Policies: Implementation in the Scalable National Network for Effectiveness Research. JAMIA (accepted)

Kim K, et al. Comparison of Consumers' Views on Electronic Data Sharing for Healthcare and Research. JAMIA (accepted)

AHRQ R01HS19913, EDM forum


Privacy-preserving distributed computing

Predictive modeling and adjustment for confounders require lots of data

Some institutions cannot move data outside their firewalls; we can bring computation to the data

Diverse healthcare entities in 3 different states (federal, state, private)

Wu Y, et al. Grid Binary LOgistic REgression (GLORE): Building Shared Models Without Sharing Data. JAMIA 2012

Wang S, et al. EXpectation Propagation LOgistic REgRession (EXPLORER): Distributed Privacy-Preserving Online Model Learning. J Biomed Inf 2013

Jiang W, et al. WebGLORE: A Webservice for Grid Logistic Regression. Bioinformatics 2014

Wu Y, et al. Grid Multi-Category Response Logistic Models. BMC Med Inform Dec Making 2015

Scalable National Network for Comparative Effectiveness Research, AHRQ R01HS19913, EDM forum


Distributed Model

Conclusion: no patient data needs to be sent from the sites, only aggregates
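To illustrate why aggregates are enough, here is a minimal Python sketch of a logistic regression fitted across sites. It is a simplification under stated assumptions: it uses plain gradient ascent on summed gradients rather than the Newton-Raphson updates described in the GLORE paper, and the function names, learning rate, and toy data are hypothetical. Each site returns only a gradient sum and a row count; the coordinator never sees a patient row.

```python
import math
from typing import List, Tuple

Row = Tuple[List[float], int]  # (features including an intercept term, binary label)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def local_gradient(rows: List[Row], beta: List[float]) -> Tuple[List[float], int]:
    """Runs at a site: summed log-likelihood gradient and number of rows."""
    grad = [0.0] * len(beta)
    for x, y in rows:
        p = sigmoid(sum(b * xi for b, xi in zip(beta, x)))
        for j, xi in enumerate(x):
            grad[j] += (y - p) * xi
    return grad, len(rows)


def fit_distributed(sites: List[List[Row]], n_features: int,
                    lr: float = 0.5, steps: int = 2000) -> List[float]:
    """Runs at the coordinator: sees only per-site gradient sums and counts."""
    beta = [0.0] * n_features
    for _ in range(steps):
        parts = [local_gradient(rows, beta) for rows in sites]
        n = sum(count for _, count in parts)
        total = [sum(g[j] for g, _ in parts) for j in range(n_features)]
        beta = [b + lr * t / n for b, t in zip(beta, total)]
    return beta


# Hypothetical toy data: [intercept, one covariate], label.
site_a = [([1.0, 0.5], 0), ([1.0, 1.5], 1), ([1.0, 2.0], 0)]
site_b = [([1.0, 1.0], 0), ([1.0, 2.5], 1), ([1.0, 3.0], 1)]

beta_distributed = fit_distributed([site_a, site_b], n_features=2)
beta_pooled = fit_distributed([site_a + site_b], n_features=2)
print(beta_distributed, beta_pooled)
```

Because the gradient of the log-likelihood is a sum over rows, adding the per-site sums gives the same update as pooling all rows in one place, so the two fits agree up to floating-point rounding.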

Patient  Age  Insurance
A1       45   X
A2       32   Y

Patient  Age  Insurance
B1       45   Y
B2       32   Y

Patient  Age  Insurance
A1       45   X
A2       32   Y

Horizontal and Vertical Partitions
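The two partition types can be shown on the small patient tables above. This is a minimal Python sketch; the helper names are hypothetical, and assigning a patient to a site by the first letter of the patient ID is an assumption made only for illustration.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

records: List[Row] = [
    {"patient": "A1", "age": 45, "insurance": "X"},
    {"patient": "A2", "age": 32, "insurance": "Y"},
    {"patient": "B1", "age": 45, "insurance": "Y"},
    {"patient": "B2", "age": 32, "insurance": "Y"},
]


def horizontal_partition(rows: List[Row], site_of: Callable[[Row], str]) -> Dict[str, List[Row]]:
    """Horizontal: each site holds every attribute, but only for its own patients."""
    parts: Dict[str, List[Row]] = {}
    for row in rows:
        parts.setdefault(site_of(row), []).append(row)
    return parts


def vertical_partition(rows: List[Row], key: str,
                       column_groups: List[List[str]]) -> List[List[Row]]:
    """Vertical: each site holds some attributes for every patient, linked by a shared key."""
    return [[{key: row[key], **{c: row[c] for c in cols}} for row in rows]
            for cols in column_groups]


# Assumption for this sketch: the first letter of the patient ID names the site.
by_site = horizontal_partition(records, lambda r: r["patient"][0])
age_site, insurance_site = vertical_partition(records, "patient", [["age"], ["insurance"]])

print(by_site["A"])
print(age_site[0], insurance_site[0])
```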

Privacy-preserving SVM for vertically partitioned data

Que J, Jiang X, Ohno-Machado L. A Collaborative Framework for Distributed Privacy-Preserving Support Vector Learning. AMIA Annual Symp Proc 2012:1350-9

pSCANNER Clinical Data Research

Members: 5 UCs, National VA (VINCI resource), 3 Federally Qualified Health Systems in LA (USC), RAND

Cohorts: Kawasaki Disease, Congestive Heart Failure, Obesity

TCC: The Children's Clinic; VM: Virtual Machine; SFGH: San Francisco General Hospital; OMOP: Observational Medical Outcomes Partnership

Funded by PCORI as part of PCORnet

International Collaboration

Slide from Dr Shuang Wang


Models for Sharing Data

[Figure: iDASH cloud architecture. Users upload and download data (HIPAA and non-public data; public data, tools, recipes), submit compute requests, and directly upload and download proprietary data, tools, and recipes. Data, tools, and recipes layer: powered by MIDAS. Middleware and HIPAA security: developed by iDASH. Compute nodes, memory, disk storage, networking: powered by VMware, automated.]


iDASH On-Demand Resources

On-demand Virtualized Elastic Resilient Compute And Storage Technology

Safe HIPAA-compliant Annotated Data deposit box Environment

3 computation tiers, 3 storage tiers, 10GbE throughout, full redundancy, RSA two-factor auth, remote data replication

800+ cores, 7TB+ RAM, 600TB+ storage

More planned: 600+ cores, 4TB+ RAM, 200TB+ storage

iDASH On-Demand Resources

Bookshelf

Repeatable Results

[Figure: a bookshelf of blueprints. Each blueprint packages a workflow (short reads → index reference → align to reference → call variants → annotate variants → pick high impact → deleterious SNPs) together with its context (reference DB, test data, configuration, helper tools, OS).]

iDASH On-Demand Resources

Bookshelf

MyDATA

Repeatable Results

[Figure: a bookshelf of blueprints shown alongside MyDATA. Each blueprint packages a workflow (short reads → index reference → align to reference → call variants → annotate variants → pick high impact → deleterious SNPs) together with its context (reference DB, test data, configuration, helper tools, OS).]

homomorphic encryption

secure multiparty computation


commons

iDASH'15 Privacy Protection Challenge

• Focus on secure outsourcing and secure data analysis in a distributed setting (humangenomeprivacy.org)
• Task 1: Homomorphic encryption (HME) based secure genomic data analysis
• Task 2: Secure comparison between genomic data in a distributed setting
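Secure multiparty computation, named on the slides above, can be illustrated with its simplest building block, additive secret sharing. The Python sketch below computes a sum of private counts so that no single party sees another party's input. It is not the challenge's protocol and does not implement homomorphic encryption or secure comparison; it also assumes parties that follow the protocol and do not collude. The modulus, function names, and toy counts are hypothetical.

```python
import secrets
from typing import List

MODULUS = 2**61 - 1  # public modulus; an arbitrary choice for this sketch


def share(value: int, n_parties: int) -> List[int]:
    """Split value into n random-looking shares that add up to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values: List[int]) -> int:
    """Simulate all parties in one process; each list below would live on a different machine."""
    n = len(private_values)
    # Party i splits its value and sends share j to party j.
    distributed = [share(v, n) for v in private_values]
    # Party j adds the shares it received and publishes only that partial sum.
    partial_sums = [sum(distributed[i][j] for i in range(n)) % MODULUS for j in range(n)]
    # Anyone can add the published partial sums to obtain the total.
    return sum(partial_sums) % MODULUS


allele_counts = [12, 7, 30]  # hypothetical per-site counts
total = secure_sum(allele_counts)
print(total)
```

Each individual share is uniformly random on its own, so a party learns nothing from the shares it receives beyond what the final total reveals.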

differential privacy


Models for Sharing Data

Name   Age  Education  Hours/week  …  HTN
Frank  42   6          40          …  Y
Bob    31   10         60          …  Y
Dave   43   9          40          …  N

'De-Identification' (microdata release)

• HIPAA compliant methods
» Safe harbor dataset
   • Removal of 18 safe harbor identifiers
» Limited dataset
   • Removal of direct identifiers
» Statistical methods
   • Removal/grouping of attributes
   • Risk reasonably low
• Re-identification and disclosure risks

Courtesy of Li Xiong

Statistical Data Release (macrodata release)

Mohammed N. Privacy Preserving Heterogeneous Health Data Sharing. J Am Med Inform Assoc 2013

Jiang XL. Differential-Private Data Publishing Through Component Analysis. Transactions on Data Privacy 2013

Courtesy of Li Xiong

Statistical Data Release: Disclosure Risk

[Figure: original records and original histogram]

Courtesy of Li Xiong

Statistical Data Release: Differential Privacy

[Figure: original records, original histogram, and perturbed histogram with differential privacy]

Differential Privacy (Dwork et al)

• D and D′ are neighboring databases if they differ on at most one record

A privacy mechanism A gives ε-differential privacy if, for all neighbouring databases D, D′ and for any possible output S ∈ Range(A),

$$\Pr[A(D) = S] \le \exp(\varepsilon) \times \Pr[A(D') = S]$$

Courtesy of Li Xiong

Global Sensitivity

Differential Privacy: Introducing Noise

• Laplace Mechanism

For example, for a single count query Q over a dataset D, returning Q(D) + Laplace(1/ε) gives ε-differential privacy
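The count-query example can be sketched in a few lines of Python. A count has global sensitivity 1, because adding, removing, or changing one record moves it by at most 1, which is why the noise scale is 1/ε. The function names and toy records below are hypothetical, and the sampler uses the fact that the difference of two independent exponential variables with rate 1/b follows a Laplace distribution with scale b. Floating-point noise samplers have known subtleties, so this is an illustration rather than a hardened implementation.

```python
import random
from typing import Callable, Dict, List

Record = Dict[str, int]


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample: difference of two i.i.d. exponentials with rate 1/scale."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records: List[Record], predicate: Callable[[Record], bool],
                  epsilon: float, rng: random.Random) -> float:
    """Return Q(D) + Laplace(1/epsilon) for a count query Q."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical toy dataset: how many patients are over 65?
records = [{"age": 70}, {"age": 66}, {"age": 40}, {"age": 81}]
rng = random.Random(0)
noisy = private_count(records, lambda r: r["age"] > 65, epsilon=1.0, rng=rng)
print(noisy)
```

A smaller ε means a larger noise scale, so stronger privacy is paid for with a less accurate released count.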

Courtesy of Li Xiong

iDASH 2014 First Privacy Protection Challenge

Evaluate solutions of guaranteed privacy protection for protecting the output of genomic data analysis

• Task 1: Privacy-preserving SNP Data Sharing
• Task 2: Privacy-preserving release of top K most significant SNPs

Statistical Data Release with Differential Privacy

[Figure: original data pass through a data privacy interface to produce released statistics and synthetic records. Questions posed: Which statistics to compute? How to apportion privacy budget?]

Courtesy of Li Xiong
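One common way to think about apportioning a privacy budget is sequential composition: answering several queries with ε₁, ε₂, ... on the same data costs ε₁ + ε₂ + ... in total. The Python sketch below splits a total ε across count queries in proportion to user-chosen weights and refuses to answer once the budget is spent. The class, the weights, and the toy counts are hypothetical, and this is not the interface shown in the figure.

```python
import random
from typing import Dict


class PrivacyBudget:
    """Tracks how much of a total epsilon has been spent (sequential composition)."""

    def __init__(self, total_epsilon: float) -> None:
        self.total = total_epsilon
        self.spent = 0.0

    def spend(self, epsilon: float) -> float:
        # Small tolerance so that floating-point rounding does not reject an exact split.
        if epsilon <= 0 or self.spent + epsilon > self.total + 1e-12:
            raise ValueError("privacy budget exceeded")
        self.spent += epsilon
        return epsilon


def noisy_counts(true_counts: Dict[str, int], weights: Dict[str, float],
                 budget: PrivacyBudget, rng: random.Random) -> Dict[str, float]:
    """Release each count with Laplace noise, splitting the remaining budget by weight."""
    remaining = budget.total - budget.spent
    weight_sum = sum(weights.values())
    released = {}
    for name, count in true_counts.items():
        eps_i = budget.spend(remaining * weights[name] / weight_sum)
        # Laplace(0, 1/eps_i): difference of two exponentials with rate eps_i.
        noise = rng.expovariate(eps_i) - rng.expovariate(eps_i)
        released[name] = count + noise
    return released


budget = PrivacyBudget(total_epsilon=1.0)
released = noisy_counts({"cases": 10, "controls": 20},
                        {"cases": 1.0, "controls": 3.0},
                        budget, random.Random(0))
print(released, budget.spent)
```

Giving a query a larger weight gives it a larger share of ε and therefore less noise, at the expense of the other released statistics.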

SHARE: Statistical & Synthetic Health Information Release

[Figure: original health records pass through the DP interface to produce released statistics and synthetic records.]

Data challenges
- High dimensionality
- Self-correlation

Target applications
- Cross-sectional studies
- Longitudinal studies

Low Dimensional Histograms (DPCube)

High Dimensional Distributions (DPCopula)

Sequential Data Release (DPTrie)

Courtesy of Li Xiong

differential privacy


Models for Sharing Data


Shifting control

[Figure: user U requests data D on individual I from a trusted broker (data use agreements, study registry). Through a patient interface to the consent management system, patient I answers "Do I wish to disclose data D to U?" (Yes). Through a sharing look-up, patient I can check that U looked at data D. Healthcare institutions hold the data.]

What to share? Who to share with?

Some preliminary findings from a limited set of interviews:
• Healthy volunteers do not want to share with commercially sponsored researchers
• Some want their medical information shared with only UCSD researchers, no others
• Many do not want to share at least 1 category of sensitive information
• Most common decline was genetic, followed by sexual & reproductive health

Courtesy of E Bell


iCONCUR: informed CONsent for Clinical data Use for Research

[Figure: user U sends a query to a trusted entity (concierge or automated services) and receives results. Connecting software links the trusted entity to the healthcare institution's electronic health record and clinical data warehouse, and to the consent management system and sharing look-up registry used by patient I.]
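The consent flow can be sketched as a small data structure: a registry that stores each patient's choices per data category, answers requests from users, and keeps a log that the patient can look up later. This is a minimal Python sketch under assumptions; the class, method names, and category labels are hypothetical and do not describe the iCONCUR software.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class ConsentRegistry:
    # (patient, data category) -> users or groups the patient agreed to share with
    permissions: Dict[Tuple[str, str], Set[str]] = field(default_factory=dict)
    # every request is logged: (user, patient, category, allowed)
    access_log: List[Tuple[str, str, str, bool]] = field(default_factory=list)

    def grant(self, patient: str, category: str, user: str) -> None:
        """Patient answers 'yes' to disclosing one data category to one user."""
        self.permissions.setdefault((patient, category), set()).add(user)

    def request(self, user: str, patient: str, category: str) -> bool:
        """User U requests data D on individual I; the decision is recorded either way."""
        allowed = user in self.permissions.get((patient, category), set())
        self.access_log.append((user, patient, category, allowed))
        return allowed

    def who_looked(self, patient: str) -> List[Tuple[str, str, bool]]:
        """Sharing look-up: lets a patient check who asked for which data."""
        return [(u, c, ok) for u, p, c, ok in self.access_log if p == patient]


registry = ConsentRegistry()
registry.grant("patient_I", "medications", "user_U")
medications_ok = registry.request("user_U", "patient_I", "medications")
genetic_ok = registry.request("user_U", "patient_I", "genetic")
print(medications_ok, genetic_ok, registry.who_looked("patient_I"))
```

Defaulting to "not allowed" when no choice is recorded keeps the sketch conservative; a real system would also need identity management and policy enforcement around it.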

Privacy & Technology Projects

Data and Biospecimen Sharing; Privacy Preserving Computation

Projects
- Privacy Preserving Analytics for KD in African-Americans
- Consent for Data and Biosample Sharing in an Underserved Population
- Partnership for Epidemiological Research on Latinos

Questions and partner sites
- Which DNA variants are implicated in KD susceptibility in this population? (Emory, Genome Institute of Singapore, Imperial College)
- Does consent rate depend on who is obtaining the consent? (Maricopa Health System)
- Do patients understand what they consented for? (San Diego State University)

Looking into the Near Future

Healthcare data and Biospecimen collections

Several terabytes of data are being generated every day and samples are being collected. Participants could decide what/when/with whom to share

• Genomes
• Images
• Body sensors
• Environmental sensors
• Electronic Health Records
• Personal Health Records
• Pharmacy
• Social media

UCSD's Journey

[Figure: timeline of UCSD informatics infrastructure over 2008-2009, 2010-2011, 2012-2013, and 2014-2015, in four tracks (Clinical Data, Research Data, Applications, Integration). Labels that survive from the figure:
- Healthcare clinical data: Electronic Health Record System (Epic & Clarity); other systems (PACS, lab, etc.); personnel systems (Active Directory); Epic CDDS, new modules
- Clinical Data Warehouse for Research; clinical research data (REDCap, Velos, other DBs); external data (patient reported data)
- Query tools (UC-ReX Explorer), privacy technology, analytical tools, de-ID tools, text mining/NLP, recruitment and consent tools, custom apps, accrual for clinical trials
- iDASH HIPAA SHADE (images, human genomes, etc.); iDASH HIPAA/FISMA OVERCAST; cloud
- Scalable network (distributed analytics tools): UCSF, Davis, Irvine, UCLA, VA, LA clinics, USC/LAC, Cedars Sinai, San Mateo
- Projects and funding: UC-ReX, iDASH, SCANNER, pSCANNER (PCORI CDRN), PhenDISCO, bioCADDIE, iCONCUR, BRIGHT, Cancer Genomics, PCORI contracts, K22, K99, NLM Training Grant, CTRI, School of Medicine]

Acknowledgements

Department of Biomedical Informatics at UCSD, funded by

NIH (U54, UL1, U24, UH3, R21, U01, T15, R00, K22, K99, D43), AHRQ

UCBRAIDOP, PCORI, NVIDIA


Genomes are Biometrics

Biometrics are Protected Health Information (PHI)

• Biometrics require HIPAA

PHI requires HIPAA


What is the problem?

• Imagine a public data set of genomes about a certain disease (e.g., Alzheimer's)
• Can you find out if your neighbor has Alzheimer's?
• Same problem happens with clinical data, which can be re-identified to a target patient
» Dates of visits
» Combination of patient characteristics

New DNA tests on a secret sample collected from a relative of suspect Albert DeSalvo triggered the exhumation after authorities said there was a familial match with genetic material preserved in the killing of Sullivan

Authorities made the match through DNA taken from a water bottle thrown away by DeSalvo's nephew

But a lawyer for the DeSalvos told CNN the family was outraged, disgusted and offended by the decision to secretly take a DNA sample of one of its members

Bigger data, bigger challenge

• High dimensional data (e.g., whole genome sequencing)
• Privacy rules (PHI, including genomic data, needs to be protected under HIPAA)
• Institutional processes (internal review boards)
• Patient preferences: tiered information consent
• Federated clinical/genome data research networks
» Secure multiparty computation
» Homomorphic encryption

Jiang X, Sarwate A, Ohno-Machado L. Privacy technology to support data sharing for comparative effectiveness research: A systematic review. Medical Care 51(8 Suppl 3):S58-65, 2013. PMID 23774511

adapted from Dr Xiaoqian Jiang


Page 8: Genomics and Informatics · 2015-04-20 · Genomics and Informatics: Technology Solutions to Compute with Sensitive Data April 21, 2015 Innovations in Healthcare Informatics NIH U54

4202015 Supported by the NIH Grant U54 HL108460 to the University of California San Diego 8

What is the problem

bull Imagine a public data set of genomes about a certain disease (eg lzheimer s)

bull Can you find out if your neighbor has lzheimer s

bull Same problem happens with clinical data which can be re-identified to a target patient

raquo Dates of visits

raquo Combination of patient characteristics

New DNA tests on a secret sample collected from a relative of suspect Albert deSalvo triggered the exhumation after authorities said there was a familial match with genetic material preserved in the killing of Sullivan

Authorities made the match through DNA taken from a water bottle thrown away by DeSalvo s nephew

But a lawyer for the DeSalvos told CNN the family was outraged disgusted and offended by the decision to secretly take a DN sample of one of its members

Bigger data bigger challenge

bull High dimensional data (eg whole genomic sequencing)

bull Privacy rules (PHI including genomic data need to be protected under HIPAA)

bull Institutional processes (internal review boards)

bull Patient preferences Tiered information consent bull Federated clinicalgenome data research

networks raquo Secure multiparty computation

raquo Homomorphic encryption

Jiang X, Sarwate A, Ohno-Machado L. Privacy technology to support data sharing for comparative effectiveness research: a systematic review. Medical Care 51(8 Suppl 3):S58-65, 2013. PMID 23774511

Adapted from Dr. Xiaoqian Jiang

[Figure: iDASH overview. Labels: Knowledge & Tools; Privacy; Consent; Data (Sensors, Genomic, Clinical); Services/Platform (Service, WWW, Apps, Exec, Aggreg, Hosting); Sharing; Policies; Platform; Research; Develop; Federation]

Our Goals

• Share access to data and computation

• Train the new generation of data scientists

• Provide innovative software platform and infrastructure

• Protect privacy

Develop

» Algorithms

» Tools

» Infrastructure

» Policies

Ohno-Machado L. To Share or Not To Share: That Is Not the Question. Science Translational Medicine 2012; 4(165)


Models for Sharing Data

Clinical Data Network – UC-ReX

• Clinical Data Warehouses from 5 Medical Centers and affiliated institutions exchange (>12 million patients)

• Improve patient safety surveillance, quality improvement, translational research

Funded by the UC Office of the President

UC-Research eXchange

[Figure: UC-ReX architecture. Labels: Researcher wants data about population; Request for Data on Cohort; Data Matching Function (map D onto data dictionaries); Data Warehouses derived from EHR, registries, other DBs; UCSD (Epic); UC Irvine (Allscripts); UC Davis (Epic); UCSF (Epic); Community Partners; Results on Harmonized Data]

- Currently

  - Count queries

- Future steps

  - Individual-level patient data delivery

  - Cross-institutional analytics

Quality improvement/health services research

[Figure: User requests data for quality improvement or research ("Are the data accessible?"). Labels: Trusted Broker(s); Security Entity; • Identity & Trust Management; • Policy enforcement; Diverse Healthcare Entities in 3 different states (federal, state, private)]

Count queries and statistics across data warehouses

How many patients over 65 are on Warfarin or Dabigatran?

What are the major and minor bleeding rates for patients on these drugs?

Is CYP2C9 mutation an important factor?

AHRQ R01HS19913, EDM Forum
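A count query of the kind listed on this slide can be answered without moving patient-level rows. The sketch below is a minimal illustration, not the SCANNER implementation: the record fields (`age`, `drugs`), the site names, and both function names are hypothetical. Each site evaluates the query locally and only an integer count leaves the site.

```python
def local_count(records, min_age, drugs):
    """Runs inside one site's firewall; only the integer count is returned."""
    return sum(
        1
        for r in records
        if r["age"] > min_age and any(d in drugs for d in r["drugs"])
    )


def federated_count(sites, min_age, drugs):
    """Coordinator: collects per-site counts and sums them."""
    per_site = {
        name: local_count(records, min_age, drugs)
        for name, records in sites.items()
    }
    return per_site, sum(per_site.values())


# Hypothetical toy data standing in for two site warehouses.
sites = {
    "site_a": [
        {"age": 70, "drugs": ["warfarin"]},
        {"age": 50, "drugs": ["warfarin"]},
    ],
    "site_b": [
        {"age": 81, "drugs": ["dabigatran", "aspirin"]},
        {"age": 66, "drugs": ["aspirin"]},
    ],
}

per_site, total = federated_count(
    sites, min_age=65, drugs={"warfarin", "dabigatran"}
)
print(per_site, total)
```

In practice the coordinator role would sit with the trusted broker, which would also enforce each site's policy before a count is released.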

Scalable National Network for Comparative Effectiveness Research

Privacy-preserving distributed computing

Policy embedded in software

[Figure: User requests data for quality improvement or research. Labels: Trusted Broker(s); Security Entity; • Identity & Trust Management; • Policy enforcement]

Kim K et al. Development of a Privacy and Security Policy Framework for a Multi-State Comparative Effectiveness Research Network. Medical Care 2013

Kim K et al. Data Governance Requirements for Distributed Clinical Research Networks: Triangulating Perspectives of Diverse Stakeholders. JAMIA 2014

Meeker et al. A System to Build Distributed Multivariate Models and Manage Disparate Data Sharing Policies: Implementation in the Scalable National Network for Effectiveness Research. JAMIA (accepted)

Kim K et al. Comparison of Consumers' Views on Electronic Data Sharing for Healthcare and Research. JAMIA (accepted)

AHRQ R01HS19913, EDM Forum

Privacy-preserving distributed computing

[Figure: User requests data for quality improvement or research. Labels: Trusted Broker(s); Security Entity; • Identity & Trust Management; • Policy enforcement; Diverse Healthcare Entities in 3 different states (federal, state, private)]

Predictive modeling and adjustment for confounders require lots of data

Some institutions cannot move data outside their firewalls; we can bring computation to the data

Wu Y et al. Grid Binary LOgistic REgression (GLORE): Building Shared Models Without Sharing Data. JAMIA 2012

Wang S et al. EXpectation Propagation LOgistic REgRession (EXPLORER): Distributed Privacy-Preserving Online Model Learning. J Biomed Inf 2013

Jiang W et al. WebGLORE: A Webservice for Grid Logistic Regression. Bioinformatics 2014

Wu Y et al. Grid Multi-Category Response Logistic Models. BMC Med Inform Dec Making 2015

Scalable National Network for Comparative Effectiveness Research

AHRQ R01HS19913, EDM Forum

Distributed Model

Conclusion: no patient data need to be sent from the sites, only aggregates
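The idea that only aggregates leave each site can be sketched for logistic regression, in the spirit of GLORE. This is a simplified illustration and not the authors' code: it assumes a single covariate plus an intercept so that the 2x2 Newton step can be written in closed form, and the data and function names are hypothetical. Each site returns only its gradient and Hessian sums; the coordinator adds them and updates the shared coefficients.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def local_aggregates(rows, beta):
    """Runs at one site. rows holds (x, y) pairs that never leave the site;
    only the gradient and Hessian sums are returned."""
    b0, b1 = beta
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in rows:
        p = sigmoid(b0 + b1 * x)
        w = p * (1.0 - p)
        g0 += y - p
        g1 += (y - p) * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return (g0, g1), (h00, h01, h11)


def fit_distributed(sites, iterations=25):
    """Coordinator: Newton-Raphson on the summed site aggregates."""
    beta = (0.0, 0.0)
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for rows in sites:
            (a, b), (c, d, e) = local_aggregates(rows, beta)
            g0 += a
            g1 += b
            h00 += c
            h01 += d
            h11 += e
        det = h00 * h11 - h01 * h01
        # Newton step beta += H^{-1} g, with the 2x2 inverse in closed form.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        beta = (beta[0] + step0, beta[1] + step1)
    return beta


# Hypothetical toy data: (covariate, outcome) pairs held at two sites.
site_a = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 0)]
site_b = [(2.5, 1), (3.5, 1), (5.0, 1), (1.5, 0), (4.5, 0)]

distributed = fit_distributed([site_a, site_b])
pooled = fit_distributed([site_a + site_b])
print(distributed, pooled)
```

Because the gradient and Hessian are sums over patients, adding the per-site sums gives the same update as pooling the rows, which is why the two fits agree up to floating-point rounding.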

Horizontal and Vertical Partitions

Patient  Age  Insurance
A1       45   X
A2       32   Y

Patient  Age  Insurance
B1       45   Y
B2       32   Y

Patient  Age  Insurance
A1       45   X
A2       32   Y

Privacy-preserving SVM for vertically partitioned data

Que J, Jiang X, Ohno-Machado L. A Collaborative Framework for Distributed Privacy-Preserving Support Vector Learning. AMIA Annual Symp Proc 2012:1350-9

pSCANNER Clinical Data Research

Members: 5 UCs, National VA (VINCI resource), 3 Federally Qualified Health Systems in LA (USC), RAND

Cohorts: Kawasaki Disease, Congestive Heart Failure, Obesity

TCC: The Children's Clinic; VM: Virtual Machine; SFGH: San Francisco General Hospital; OMOP: Observational Medical Outcomes Partnership

Funded by PCORI as part of PCORnet

International Collaboration

Slide from Dr Shuang Wang


Models for Sharing Data

[Figure: iDASH cloud architecture. Labels: HIPAA and non-public data; public data, tools, recipes (powered by MIDAS); Data, Tools, Recipes; upload & download data; compute request; direct upload & download of proprietary data, tool, recipe; middleware and HIPAA security developed by iDASH; compute nodes, memory, disk storage, networking (powered by VMware); AUTOMATED]

iDASH On-Demand Resources

On-demand Virtualized Elastic Resilient Compute And Storage Technology (OVERCAST)

Safe HIPAA-compliant Annotated Data deposit box Environment (SHADE)

3 computation tiers, 3 storage tiers, 10GbE throughout, full redundancy, RSA two-factor authentication, remote data replication

800+ cores, 7TB+ RAM, 600TB+ storage

More planned: 600+ cores, 4TB+ RAM, 200TB+ storage

[Figure: bookshelf of blueprints. Each blueprint packages a workflow (short reads → index reference → align to reference → call variants → annotate variants → pick high impact → deleterious SNPs) with its context (reference DB, test data, configuration, helper tools, OS)]

iDASH On-Demand Resources

Bookshelf

Repeatable Results

[Figure: bookshelf of blueprints shown together with MyDATA. Each blueprint packages a workflow (short reads → index reference → align to reference → call variants → annotate variants → pick high impact → deleterious SNPs) with its context (reference DB, test data, configuration, helper tools, OS)]

iDASH On-Demand Resources

Bookshelf

MyDATA

Repeatable Results

homomorphic encryption

secure multiparty computation


commons

iDASH'15 Privacy Protection Challenge

• Focus on secure outsourcing and secure data analysis in a distributed setting (humangenomeprivacy.org)

• Task 1: Homomorphic encryption (HME) based secure genomic data analysis

• Task 2: Secure comparison between genomic data in a distributed setting

differential privacy


Models for Sharing Data

'De-Identification' (microdata release)

Name   Age  Education  Hours/week  …  HTN
Frank  42   6          40          …  Y
Bob    31   10         60          …  Y
Dave   43   9          40          …  N

• HIPAA compliant methods

» Safe harbor dataset

  • Removal of 18 safe harbor identifiers

» Limited dataset

  • Removal of direct identifiers

» Statistical methods

  • Removal/grouping of attributes

  • Risk reasonably low

• Re-identification and disclosure risks

Courtesy of Li Xiong

Statistical Data Release (macrodata release)

Mohammed N. Privacy Preserving Heterogeneous Health Data Sharing. J Am Med Inform Assoc 2013

Jiang XL. Differential-Private Data Publishing Through Component Analysis. Transactions on Data Privacy 2013

Courtesy of Li Xiong

Statistical Data Release: Disclosure Risk

[Figure: original records; original histogram]

Courtesy of Li Xiong

Statistical Data Release: Differential Privacy

[Figure: original records; original histogram; perturbed histogram with differential privacy]

Differential Privacy (Dwork et al.)

• D and D′ are neighboring databases if they differ on at most one record

A privacy mechanism A gives ε-differential privacy if, for all neighbouring databases D, D′ and for any possible output S ∈ Range(A),

Pr[A(D) = S] ≤ exp(ε) × Pr[A(D′) = S]

Courtesy of Li Xiong

Global Sensitivity

Differential Privacy: Introducing Noise

• Laplace Mechanism

For example, for a single count query Q over a dataset D, returning Q(D) + Laplace(1/ε) gives ε-differential privacy
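The Laplace mechanism for a single count query can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout and function names are hypothetical, the sensitivity of a count query is taken as 1, and Laplace noise is drawn as the difference of two exponential variates from the standard library. Production use would need a carefully vetted noise sampler.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: the difference of two independent
    exponential variates with mean `scale` is Laplace distributed."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Returns Q(D) + Laplace(1/epsilon) for the count query Q."""
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1.0  # one record changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical records; the htn flag is illustrative only.
records = [
    {"name": "Frank", "htn": True},
    {"name": "Bob", "htn": True},
    {"name": "Dave", "htn": False},
]

rng = random.Random(0)
noisy = private_count(records, lambda r: r["htn"], epsilon=1.0, rng=rng)
print(noisy)
```

A smaller ε gives a larger noise scale 1/ε, so stronger privacy is paid for with a less accurate count.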

Courtesy of Li Xiong

iDASH 2014 First Privacy Protection Challenge

Evaluate solutions of guaranteed privacy protection for protecting the output of genomic data analysis

• Task 1: Privacy-preserving SNP data sharing

• Task 2: Privacy-preserving release of top K most significant SNPs

Statistical Data Release with Differential Privacy

[Figure: original data → data privacy interface → released statistics, synthetic records]

Which statistics to compute?

How to apportion privacy budget?

Courtesy of Li Xiong

Statistical Data Release under Differential Privacy

SHARE: Statistical & Synthetic Health Information Release

[Figure: original health records → DP interface → released statistics, synthetic records]

Data challenges

- High dimensionality

- Self-correlation

Target applications

- Cross-sectional studies

- Longitudinal studies

Low Dimensional Histograms (DPCube)

High Dimensional Distributions (DPCopula)

Sequential Data Release (DPTrie)

Courtesy of Li Xiong
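The question of how to apportion a privacy budget can be illustrated with a simple proportional split. This sketch relies on sequential composition (answering several queries with budgets ε_i costs at most their sum) and is not how SHARE, DPCube, DPCopula, or DPTrie allocate budget. The function names and weights are hypothetical.

```python
def apportion_budget(total_epsilon, weights):
    """Splits a total privacy budget across queries in proportion to weights,
    so the per-query budgets sum to total_epsilon."""
    if total_epsilon <= 0 or not weights or any(w <= 0 for w in weights):
        raise ValueError("total_epsilon and all weights must be positive")
    total_weight = sum(weights)
    return [total_epsilon * w / total_weight for w in weights]


def laplace_scales(epsilons, sensitivity=1.0):
    """Noise scale each query needs under the Laplace mechanism."""
    return [sensitivity / eps for eps in epsilons]


# Give the first statistic twice the budget of the other two.
epsilons = apportion_budget(1.0, [2, 1, 1])
scales = laplace_scales(epsilons)
print(epsilons, scales)
```

Statistics that receive a larger share of the budget are released with less noise, which is the trade-off behind choosing which statistics to compute.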

differential privacy


Models for Sharing Data


Shifting control

[Figure: User U requests data D on individual I from healthcare institutions through a trusted broker (• data use agreements • study registry). A consent management system asks through the patient interface: "Do I wish to disclose data D to U?" (Yes). Sharing look-up for patient I: "I can check that U looked at my data D"]
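The consent check and sharing look-up in this diagram can be sketched as a small registry. This is an illustration only, not the iCONCUR design: the class, its methods, and the category and recipient labels are all hypothetical. A request is granted only if patient I has allowed that data category for that type of recipient, and every request is logged so the patient can see who looked at their data.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentRegistry:
    # (patient, data category) -> recipient types the patient allows
    allowed: dict = field(default_factory=dict)
    # append-only log of (user, patient, category, granted)
    access_log: list = field(default_factory=list)

    def grant(self, patient, category, recipient_type):
        """Patient agrees to share one data category with one recipient type."""
        self.allowed.setdefault((patient, category), set()).add(recipient_type)

    def request(self, user, recipient_type, patient, category):
        """User U requests data D on individual I; the decision is logged."""
        granted = recipient_type in self.allowed.get((patient, category), set())
        self.access_log.append((user, patient, category, granted))
        return granted

    def who_looked(self, patient):
        """Sharing look-up: granted accesses to this patient's data."""
        return [
            (user, category)
            for user, p, category, granted in self.access_log
            if p == patient and granted
        ]


registry = ConsentRegistry()
registry.grant("I", "clinical", "ucsd_researcher")

ok = registry.request("U", "ucsd_researcher", "I", "clinical")
denied_commercial = registry.request("V", "commercial", "I", "clinical")
denied_genetic = registry.request("U", "ucsd_researcher", "I", "genetic")
print(ok, denied_commercial, denied_genetic, registry.who_looked("I"))
```

Keying consent on both data category and recipient type is one way to represent tiered preferences such as sharing clinical data but not genetic data.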

What to share? Who to share with?

Some preliminary findings from a limited set of interviews:

• Healthy volunteers do not want to share with commercially sponsored researchers

• Some want their medical information shared with only UCSD researchers, no others

• Many do not want to share at least 1 category of sensitive information

• Most common decline was genetic, followed by sexual & reproductive health

Courtesy of E Bell

iCONCUR

informed CONsent for Clinical data Use for Research

[Figure: iCONCUR architecture. Labels: Patient I; Consent Management System; Sharing Look-up Registry; Trusted Entity; Concierge or Automated Services; Connecting Software; Healthcare Institution (Electronic Health Record, Clinical Data Warehouse); User U (Query, Results)]

Privacy & Technology Projects

Data and Biospecimen Sharing; Privacy Preserving Computation

Privacy Preserving Analytics for KD in African-Americans

Consent for Data and Biosample Sharing in an Underserved Population

Partnership for Epidemiological Research on Latinos

Which DNA variants are implicated in KD susceptibility in this population? (Emory, Genome Institute of Singapore, Imperial College)

Does consent rate depend on who is obtaining the consent? (Maricopa Health System)

Do patients understand what they consented for? (San Diego State University)

Looking into the Near Future

Healthcare data and Biospecimen collections

Several terabytes of data are being generated every day, and samples are being collected. Participants could decide what/when/with whom to share.

• Genomes

• Images

• Body sensors

• Environmental sensors

• Electronic Health Records

• Personal Health Records

• Pharmacy

• Social media

UCSD's Journey

[Figure: timeline (2008-2009, 2010-2011, 2012-2013, 2014-2015) covering Clinical Data, Research Data, Applications, Integration. Labels: Epic CDDS; New modules; Accrual for Clinical Trials; FISMA; pSCANNER; bioCADDIE; Electronic Health Record System (Epic & Clarity); Other Systems (PACS, lab, etc.); Personnel Systems (Active Directory); Query Tools; UC-ReX Explorer; Privacy Technology; Clinical Research Data (REDCap, Velos, other DBs); iDASH HIPAA SHADE (images, human genomes, etc.); Recruitment; Consent tools; Custom Apps; VA; LA Clinics; UCSF; Davis; Irvine; UCLA; Healthcare Clinical Data; Clinical Data Warehouse for Research; Scalable Network (Distributed Analytics Tools); HIPAA; External data (patient reported data); pSCANNER PCORI CDRN; iDASH HIPAA/FISMA OVERCAST; iDASH; CTRI; School of Medicine; iCONCUR; Cancer Genomics; Text Mining NLP; USC/LAC; Cedars Sinai; San Mateo; Analytical Tools; De-ID Tools; BRIGHT; Cloud; PCORI contracts; K22, K99; NLM Training Grant; UC-ReX; iDASH; SCANNER; PhenDISCO]

Acknowledgements

Department of Biomedical Informatics at UCSD funded by

NIH (U54, UL1, U24, UH3, R21, U01, T15, R00, K22, K99, D43), AHRQ,

UCBRAIDOP, PCORI, NVIDIA


Page 10: Genomics and Informatics · 2015-04-20 · Genomics and Informatics: Technology Solutions to Compute with Sensitive Data April 21, 2015 Innovations in Healthcare Informatics NIH U54

Bigger data, bigger challenge

• High-dimensional data (e.g., whole-genome sequencing)

• Privacy rules (PHI, including genomic data, needs to be protected under HIPAA)

• Institutional processes (institutional review boards)

• Patient preferences: tiered information consent

• Federated clinical/genome data research networks
  » Secure multiparty computation
  » Homomorphic encryption

Jiang X, Sarwate A, Ohno-Machado L. Privacy technology to support data sharing for comparative effectiveness research: a systematic review. Medical Care 2013;51(8 Suppl 3):S58-65. PMID 23774511.

Adapted from Dr. Xiaoqian Jiang

iDASH

[Figure: iDASH overview linking Knowledge & Tools, Services/Platform, and Data (sensors, genomic, clinical). Labels: service, WWW, apps, exec, aggreg, hosting, sharing, policies, platform, research, develop, federation. Themes: privacy, consent, data.]

Supported by the NIH Grant U54 HL108460 to the University of California San Diego

Our Goals

• Share access to data and computation
• Train the new generation of data scientists
• Provide innovative software platform and infrastructure
• Protect privacy

Develop
  » Algorithms
  » Tools
  » Infrastructure
  » Policies

Ohno-Machado L. To Share or Not To Share: That Is Not the Question. Science Translational Medicine 2012;4(165).

Models for Sharing Data

Clinical Data Network – UC-ReX

• Clinical Data Warehouses from 5 Medical Centers and affiliated institutions exchange (>12 million patients)

• Improve patient safety surveillance, quality improvement, translational research

Funded by the UC Office of the President

UC-Research eXchange

[Figure: a researcher who wants data about a population sends a request for data on a cohort. A data matching function maps the request onto the data dictionaries of data warehouses derived from EHRs, registries, and other DBs at UCSD (Epic), UC Irvine (Allscripts), UC Davis (Epic), UCSF (Epic), and community partners. Results are returned on harmonized data.]

Currently
- Count queries

Future steps
- Individual-level patient data delivery
- Cross-institutional analytics

[Figure: a user requests data for quality improvement or research. Are the data accessible? Trusted broker(s) and a security entity provide:
• Identity & trust management
• Policy enforcement]

Quality improvement/health services research

Diverse healthcare entities in 3 different states (federal, state, private)

Count queries and statistics across data warehouses:

• How many patients over 65 are on warfarin or dabigatran?
• What are the major and minor bleeding rates for patients on these drugs?
• Is CYP2C9 mutation an important factor?

AHRQ R01HS19913, EDM Forum
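A minimal sketch of a count query answered across sites, assuming each site keeps its own records and returns only a count. The record fields (`age`, `drugs`), the site names, and the function names are hypothetical illustrations, not taken from the network software described in the slides.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


def local_count(records: List[Record], predicate: Callable[[Record], bool]) -> int:
    """Runs at a site: counts matching patients without exporting any record."""
    return sum(1 for r in records if predicate(r))


def federated_count(sites: Dict[str, List[Record]],
                    predicate: Callable[[Record], bool]) -> Dict[str, int]:
    """Runs at the coordinator: collects one count per site and adds them up."""
    per_site = {name: local_count(recs, predicate) for name, recs in sites.items()}
    per_site["total"] = sum(per_site.values())
    return per_site


def over_65_on_anticoagulant(r: Record) -> bool:
    """Hypothetical cohort definition: older than 65 and on warfarin or dabigatran."""
    return r["age"] > 65 and bool({"warfarin", "dabigatran"} & set(r["drugs"]))


# Toy data (invented for illustration only).
SITES = {
    "site_a": [
        {"age": 70, "drugs": ["warfarin"]},
        {"age": 60, "drugs": ["warfarin"]},
        {"age": 81, "drugs": ["aspirin"]},
    ],
    "site_b": [
        {"age": 66, "drugs": ["dabigatran"]},
        {"age": 90, "drugs": ["warfarin", "aspirin"]},
    ],
}

RESULT = federated_count(SITES, over_65_on_anticoagulant)
print(RESULT)
```

Only the per-site counts cross institutional boundaries in this sketch; small counts can still be disclosive, which is one reason the later slides discuss adding noise.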


Scalable National Network for Comparative Effectiveness Research

Kim K, et al. Development of a Privacy and Security Policy Framework for a Multi-State Comparative Effectiveness Research Network. Medical Care 2013.

Kim K, et al. Data Governance Requirements for Distributed Clinical Research Networks: Triangulating Perspectives of Diverse Stakeholders. JAMIA 2014.

Meeker, et al. A System to Build Distributed Multivariate Models and Manage Disparate Data Sharing Policies: Implementation in the Scalable National Network for Effectiveness Research. JAMIA (accepted).

Kim K, et al. Comparison of Consumers' Views on Electronic Data Sharing for Healthcare and Research. JAMIA (accepted).

Privacy-preserving distributed computing

Policy embedded in software


Predictive modeling and adjustment for confounders require lots of data

Some institutions cannot move data outside their firewalls; we can bring computation to the data

Diverse healthcare entities in 3 different states (federal, state, private)

Wu Y, et al. Grid Binary LOgistic REgression (GLORE): Building Shared Models Without Sharing Data. JAMIA 2012.

Wang S, et al. EXpectation Propagation LOgistic REgRession (EXPLORER): Distributed Privacy-Preserving Online Model Learning. J Biomed Inf 2013.

Jiang W, et al. WebGLORE: A Web Service for Grid Logistic Regression. Bioinformatics 2014.

Wu Y, et al. Grid Multi-Category Response Logistic Models. BMC Med Inform Dec Making 2015.

Distributed Model

Conclusion: no patient data needs to be sent from the sites, only aggregates
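A minimal sketch of how a shared logistic regression model can be fit from site-level aggregates only, in the spirit of the distributed approach cited above. This is not the authors' implementation: the one-feature model, the toy data, and all function names are assumptions made for illustration. Each site returns only the gradient and Hessian sums of its own log-likelihood; the coordinator adds them and takes a Newton-Raphson step.

```python
import math
from typing import List, Sequence, Tuple

Row = Tuple[float, int]  # (feature value x, binary outcome y)


def site_aggregates(rows: Sequence[Row], beta: Tuple[float, float]):
    """Runs inside a site: returns gradient and Hessian sums, never the rows."""
    b0, b1 = beta
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in rows:
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))  # predicted probability
        w = p * (1.0 - p)                           # Newton weight
        g0 += y - p
        g1 += (y - p) * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return (g0, g1), (h00, h01, h11)


def fit(sites: List[Sequence[Row]], iterations: int = 25) -> Tuple[float, float]:
    """Runs at the coordinator: sums site aggregates and updates the coefficients."""
    beta = (0.0, 0.0)  # (intercept, slope)
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for rows in sites:
            (a0, a1), (c00, c01, c11) = site_aggregates(rows, beta)
            g0, g1 = g0 + a0, g1 + a1
            h00, h01, h11 = h00 + c00, h01 + c01, h11 + c11
        # Solve the 2x2 system H * delta = g in closed form.
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        beta = (beta[0] + d0, beta[1] + d1)
    return beta


# Toy data (invented): two sites whose outcomes overlap, so the fit is finite.
SITE_A = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 0)]
SITE_B = [(2.0, 1), (3.0, 0), (4.0, 1), (5.0, 1)]

BETA_DISTRIBUTED = fit([SITE_A, SITE_B])   # sites stay separate
BETA_POOLED = fit([SITE_A + SITE_B])       # all rows in one place, for comparison
print(BETA_DISTRIBUTED, BETA_POOLED)
```

Because the log-likelihood is a sum over patients, the summed aggregates equal those of the pooled data, so both fits agree up to floating-point rounding.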

Horizontal and Vertical Partitions

Patient  Age  Insurance
A1       45   X
A2       32   Y

Patient  Age  Insurance
B1       45   Y
B2       32   Y

Patient  Age  Insurance
A1       45   X
A2       32   Y

Privacy-preserving SVM for vertically partitioned data

Que J, Jiang X, Ohno-Machado L. A Collaborative Framework for Distributed Privacy-Preserving Support Vector Learning. AMIA Annual Symp Proc 2012:1350-9.

pSCANNER Clinical Data Research

Members: 5 UCs, National VA (VINCI resource), 3 Federally Qualified Health Systems in LA (USC), RAND

Cohorts: Kawasaki Disease, Congestive Heart Failure, Obesity

TCC: The Children's Clinic; VM: Virtual Machine; SFGH: San Francisco General Hospital; OMOP: Observational Medical Outcomes Partnership

Funded by PCORI as part of PCORnet

International Collaboration

Slide from Dr Shuang Wang


Models for Sharing Data

[Figure: iDASH cloud. Users upload and download data, tools, and recipes (public data, tools, and recipes; HIPAA and non-public data) and submit compute requests, including direct upload and download of proprietary data, tools, and recipes. Middleware and HIPAA security developed by iDASH (powered by MIDAS) sit on top of automated compute nodes, memory, disk storage, and networking (powered by VMware).]

iDASH On-Demand Resources

OVERCAST: On-demand Virtualized Elastic Resilient Compute And Storage Technology

SHADE: Safe HIPAA-compliant Annotated Data deposit box Environment

3 computation tiers, 3 storage tiers, 10GbE throughout, full redundancy, RSA two-factor authentication, remote data replication

800+ cores, 7TB+ RAM, 600TB+ storage

More planned: 600+ cores, 4TB+ RAM, 200TB+ storage

[Figure: iDASH on-demand resources. A bookshelf of blueprints supports repeatable results. Each blueprint packages a workflow (short reads, index reference, align to reference, call variants, annotate variants, pick high impact, deleterious SNPs) together with its context (reference DB, test data, configuration, helper tools, OS). A second view of the same figure adds MyDATA next to the bookshelf.]

homomorphic encryption

secure multiparty computation


commons

iDASH'15 Privacy Protection Challenge

• Focus on secure outsourcing and secure data analysis in a distributed setting (humangenomeprivacy.org)

• Task 1: Homomorphic encryption (HME) based secure genomic data analysis

• Task 2: Secure comparison between genomic data in a distributed setting

differential privacy


Models for Sharing Data

Name   Age  Education  Hours/week  …  HTN
Frank  42   6          40          …  Y
Bob    31   10         60          …  Y
Dave   43   9          40          …  N

'De-Identification' (microdata release)

• HIPAA-compliant methods
  » Safe harbor dataset
    • Removal of 18 safe harbor identifiers
  » Limited dataset
    • Removal of direct identifiers
  » Statistical methods
    • Removal/grouping of attributes
    • Risk reasonably low

• Re-identification and disclosure risks

Courtesy of Li Xiong

Statistical Data Release (macrodata release)

Mohammed N. Privacy Preserving Heterogeneous Health Data Sharing. J Am Med Inform Assoc 2013.

Jiang XL. Differential-Private Data Publishing Through Component Analysis. Transactions on Data Privacy 2013.

Courtesy of Li Xiong

Statistical Data Release: Disclosure Risk

[Figure: original records and the original histogram.]

Courtesy of Li Xiong

Statistical Data Release: Differential Privacy

[Figure: original records, the original histogram, and the perturbed histogram with differential privacy.]

Differential Privacy (Dwork et al.)

• D and D′ are neighboring databases if they differ on at most one record

A privacy mechanism A gives ε-differential privacy if, for all neighboring databases D, D′ and for any possible output S ∈ Range(A):

$$\Pr[A(D) = S] \le \exp(\varepsilon) \times \Pr[A(D') = S]$$

Courtesy of Li Xiong

Global Sensitivity

Differential Privacy: Introducing Noise

• Laplace Mechanism

For example, for a single count query Q over a dataset D, returning Q(D) + Laplace(1/ε) gives ε-differential privacy
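A minimal sketch of the Laplace mechanism for a single count query, assuming the count changes by at most 1 when one record is added or removed (sensitivity 1). The function names are hypothetical. Laplace noise is drawn as the difference of two independent exponential draws, which has a Laplace distribution with the same scale.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two Exp(mean=scale) draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Return Q(D) + Laplace(1/epsilon) for a count query with sensitivity 1."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Usage: a smaller epsilon means more noise and stronger privacy.
rng = random.Random(0)
TRUE_COUNT = 100
SAMPLES = [private_count(TRUE_COUNT, 1.0, rng) for _ in range(20000)]
print(sum(SAMPLES) / len(SAMPLES))
```

The noise has mean zero, so repeated releases average toward the true count; this is why each release has to be charged against a privacy budget.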

Courtesy of Li Xiong

iDASH 2014 First Privacy Protection Challenge

Evaluate solutions of guaranteed privacy protection for protecting the output of genomic data analysis

• Task 1: Privacy-preserving SNP data sharing

• Task 2: Privacy-preserving release of top K most significant SNPs

Statistical Data Release with Differential Privacy

[Figure: original data pass through a data privacy interface that produces released statistics and synthetic records. Questions: which statistics to compute, and how to apportion the privacy budget.]

Courtesy of Li Xiong
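One simple way to apportion a privacy budget is an even split: by sequential composition, answering k queries with ε/k each gives ε-differential privacy overall. The sketch below assumes every query is a count with sensitivity 1; the statistic names and values are invented, and an even split is only one possible policy, not the method used by the tools on these slides.

```python
import random
from typing import Dict, List


def split_budget(total_epsilon: float, n_queries: int) -> List[float]:
    """Even split: the per-query epsilons add up to the total budget."""
    return [total_epsilon / n_queries] * n_queries


def release_counts(true_counts: Dict[str, int], total_epsilon: float,
                   rng: random.Random) -> Dict[str, float]:
    """Release each count with Laplace noise scaled to its share of the budget."""
    shares = split_budget(total_epsilon, len(true_counts))
    released = {}
    for (name, count), eps in zip(true_counts.items(), shares):
        # Difference of two Exp(rate=eps) draws is Laplace with scale 1/eps.
        noise = rng.expovariate(eps) - rng.expovariate(eps)
        released[name] = count + noise
    return released


# Toy statistics (invented for illustration).
TRUE_COUNTS = {"hypertension": 120, "diabetes": 45, "both": 30}
RELEASED = release_counts(TRUE_COUNTS, total_epsilon=1.0, rng=random.Random(0))
print(RELEASED)
```

Each extra statistic shrinks every share and so increases the noise per statistic, which is the trade-off behind choosing which statistics to compute.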

SHARE: Statistical & Synthetic Health Information Release

[Figure: original health records pass through the DP interface to produce released statistics and synthetic records.]

Data challenges
- High dimensionality
- Self-correlation

Target applications
- Cross-sectional studies
- Longitudinal studies

Low-dimensional histograms (DPCube)

High-dimensional distributions (DPCopula)

Sequential data release (DPTrie)

Courtesy of Li Xiong


Models for Sharing Data


Shifting control

[Figure: user U requests data D on individual I from the healthcare institutions through a trusted broker (data use agreements, study registry). A consent management system asks patient I through a patient interface: "Do I wish to disclose data D to U?" If yes, the data are shared. A sharing look-up lets patient I check that U looked at data D.]
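A minimal sketch of the request flow described above: a broker checks the patient's recorded consent before releasing data and logs every request so the patient can look it up later. The class, method names, and the default-deny rule are assumptions for illustration, not the design of any system named in the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class ConsentBroker:
    # (patient, data category) -> users the patient agreed to share with
    consents: Dict[Tuple[str, str], Set[str]] = field(default_factory=dict)
    # (user, patient, category, allowed) for every request, granted or not
    audit_log: List[Tuple[str, str, str, bool]] = field(default_factory=list)

    def grant(self, patient: str, category: str, user: str) -> None:
        """Patient answers yes to 'Do I wish to disclose data D to U?'."""
        self.consents.setdefault((patient, category), set()).add(user)

    def request(self, user: str, patient: str, category: str) -> bool:
        """User U requests data D on individual I; deny unless consent exists."""
        allowed = user in self.consents.get((patient, category), set())
        self.audit_log.append((user, patient, category, allowed))
        return allowed

    def lookup(self, patient: str) -> List[Tuple[str, str, str, bool]]:
        """Patient checks who requested their data and whether it was released."""
        return [entry for entry in self.audit_log if entry[1] == patient]


# Usage with invented names.
BROKER = ConsentBroker()
BROKER.grant("patient_i", "labs", "researcher_u")
OK = BROKER.request("researcher_u", "patient_i", "labs")
DENIED = BROKER.request("researcher_u", "patient_i", "genetic")
print(OK, DENIED, BROKER.lookup("patient_i"))
```

Consent is stored per data category so that a patient can share some categories and decline others.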

What to share? Who to share with?

Some preliminary findings from a limited set of interviews:

• Healthy volunteers do not want to share with commercially sponsored researchers
• Some want their medical information shared with only UCSD researchers, no others
• Many do not want to share at least 1 category of sensitive information
• Most common decline was genetic, followed by sexual & reproductive health

Courtesy of E Bell


iCONCUR: informed CONsent for Clinical data Use for Research

[Figure: user U sends a query and receives results through connecting software and concierge or automated services. A trusted entity runs the consent management system and the sharing look-up registry used by patient I. The healthcare institution holds the electronic health record and the clinical data warehouse.]

Privacy & Technology Projects

Data and Biospecimen Sharing; Privacy Preserving Computation

• Privacy Preserving Analytics for KD in African-Americans
• Consent for Data and Biosample Sharing in an Underserved Population
• Partnership for Epidemiological Research on Latinos

• Which DNA variants are implicated in KD susceptibility in this population? Emory, Genome Institute of Singapore, Imperial College
• Does consent rate depend on who is obtaining the consent? Maricopa Health System
• Do patients understand what they consented for? San Diego State University

Looking into the Near Future

Healthcare data and Biospecimen collections

Several terabytes of data are being generated every day, and samples are being collected. Participants could decide what/when/with whom to share.

• Genomes
• Images
• Body sensors
• Environmental sensors
• Electronic Health Records
• Personal Health Records
• Pharmacy
• Social media

UCSD's Journey

[Figure: timeline for 2008-2009, 2010-2011, 2012-2013, and 2014-2015 covering clinical data, research data, applications, and integration. Labels: Epic CDDS, new modules; accrual for clinical trials; FISMA; pSCANNER; bioCADDIE; Electronic Health Record System (Epic & Clarity); other systems (PACS, lab, etc.); personnel systems (Active Directory); query tools (UC-ReX, Explorer, privacy technology); clinical research data (REDCap, Velos, other DBs); iDASH HIPAA SHADE (images, human genomes, etc.); recruitment, consent tools, custom apps; VA, LA clinics, UCSF, Davis, Irvine, UCLA; healthcare clinical data; Clinical Data Warehouse for Research; scalable network (distributed analytics tools); HIPAA; external data (patient reported data); pSCANNER PCORI CDRN; iDASH HIPAA/FISMA OVERCAST; iDASH, CTRI, School of Medicine; iCONCUR; cancer genomics; text mining NLP; USC/LAC, Cedars Sinai; San Mateo; analytical tools; De-ID tools; BRIGHT; cloud; PCORI contracts; K22, K99; NLM Training Grant; UC-ReX; iDASH; SCANNER; PhenDISCO.]

Acknowledgements

Department of Biomedical Informatics at UCSD, funded by:

NIH (U54, UL1, U24, UH3, R21, U01, T15, R00, K22, K99, D43), AHRQ

UCBRAIDOP, PCORI, NVIDIA

Page 11: Genomics and Informatics · 2015-04-20 · Genomics and Informatics: Technology Solutions to Compute with Sensitive Data April 21, 2015 Innovations in Healthcare Informatics NIH U54

Knowledge amp Tools

Privacy

Consent

Data

Share access to data and

Train the new generation of

Provide innovative software platform and infrastructure

iDASH

Knowledge

amp Tools

ServicesPlatform

Data

Sensors

Genomic

Clinical

Service WWW

Apps

Exec

AggregHosting

Sharing

Policies

Platform

Research

Develop

Federation

11Supported by the NIH Grant U54 HL108460 to the University of California San Diego

Our Goals

bull computation bull

data scientists bull

bull Protect privacy Develop

raquo Algorithms

raquo Tools

raquo Infrastructure

raquo Policies

Ohno-Machado L To Share or Not To Share That Is Not the Question Science Translational Medicine 2012 4(165)

Supported by the NIH Grant U54 HL108460 to the University of California San Diego 12

Models for Sharing Data

Clinical Data Network ndash UC-ReX

bull Clinical Data Warehouses from 5 Medical Centers and affiliated institutions exchange (gt12 million patients)

bull Improve patient safety surveillance quality improvement translational research

Funded by the UC Office of the President

UC-Research eXchange

UCSD

(Epic)

Data Matching Function Map D onto data dictionaries use Data Warehouses derived from

EHR

Results on Harmonized Data

Request for Data on Cohort

Registries

other DBs UC Irvine

(Allscripts) UC Davis

(Epic)

UCSF

(Epic)

Community

Partners

Researcher

wants data about

population

- Currently

- Count queries

- Future steps

- Individual-level patient data delivery

- Cross-institutional analytics

User requests data for

Quality Improvement

or Research Are the data

accessible

bullIdentity amp Trust

Management

bullPolicy

enforcement

Trusted

Broker(s)

Security Entity

Quality improvementhealth services research

Diverse Healthcare

How many patients over 65 are on Warfarin or Dabigatran

What are the major and minor bleeding rates for patients on these drugs

Is CYP2C9 mutation an important factor

AHRQ R01HS19913 EDM forum Entities Count queries and statistics across data in 3 different states warehouses (federal state private)

User requests data for

Quality Improvement

or Research

bullIdentity amp Trust

Management

bullPolicy

enforcement

Trusted

Broker(s)

Security Entity

Kim K et al Development of a Privacy and Security Policy Framework for a Multi-State Comparative Effectiveness Research Network Medical Care 2013

Kim K et al Data Governance Requirements for Distributed Clinical Research Networks Triangulating Perspectives of Diverse Stakeholders JAMIA 2014

Meeker et al A System to Build Distributed Multivariate Models and Manage Disparate Data Sharing Policies Implementation in the Scalable National Network for Effectiveness Research JAMIA (accepted)

Kim K et al Comparison of Consumers Views on Electronic Data Sharing for Healthcare and Research JAMIA (accepted) Scalable National Network for

Comparative Effectiveness Research

Privacy-preserving distributed computing

Policy embedded in software

AHRQ R01HS19913 EDM forum

User requests data for

Quality Improvement

or Research

bullIdentity amp Trust

Management

bullPolicy

enforcement

Trusted

Broker(s)

Security Entity

Privacy-preserving distributed computing

Predictive modeling and adjustment for cofounders require lots of data

Some institutions cannot move data outside their firewalls we can bring computation to the data

Diverse Healthcare Entities in 3 different states (federal state private)

Wu Y et al Grid Binary LOgistic REgression (GLORE) Building Shared Models Without Sharing Data JAMIA 2012

Wang S et al EXpectation Propagation LOgistic REgRession (EXPLORER) Distributed Privacy-AHRQ R01HS19913 EDM forum Preserving Online Model Learning J Biomed Inf 2013

Scalable National Network for Jiang W et al WebGLORE A Webservice for Grid Logistic Regression Bioinformatics 2014 Comparative Effectiveness Research Wu Y et al Grid Multi-Category Response Logistic Models BMC Med Inform Dec Making 2015

4202015 Supported by the NIH Grant U54 HL108460 to the University of California San Diego 18

Distributed Model

Conclusion no patient data needs to be sent from the sites only aggregates

Patient Age Insurance

A1 45 X

A2 32 Y

Patient Age Insurance

B1 45 Y

B2 32 Y

Patient Age Insurance

A1 45 X

A2 32 Y

Horizontal and Vertical Partitions

Privacy-preserving SVM for vertically partitioned data

Que J Jiang X Ohno-Machado L A Collaborative Framework for Distributed Privacy-Preserving Support Vector Learning AMIA Annual Symp Proc 20121350-9

pSCANNER Clinical Data Research

Members 5 UCs National VA (VINCI resource) 3 Federally Qualified Health Systems in LA (USC) RAND

Cohorts Kawasaki Disease Congestive Heart Failure Obesity

TCC The Childrenrsquos Clinic VM Virtual Machine SFGH San Francisco General Hospital OMOP Observational Medical Outcomes Partnership

Funded by PCORI as part of PCORnet

International Collaboration

Slide from Dr Shuang Wang

Ohno-Machado L To Share or Not To Share That Is Not the Question Science Translational Medicine 2012 4(165)

Supported by the NIH Grant U54 HL108460 to the University of California San Diego 23

Models for Sharing Data

HIPAA and non-public data

public data tools recipes

Po

wer

ed b

y M

IDA

S

Data Tools Recipes

upload amp download data

compute request direct upload amp download of proprietary data tool recipe

middleware and HIPAA security developed by iDASH

Compute nodes Memory Disk storage Networking

Po

wer

ed b

yV

Mw

are AUTOMATED

4202015 Supported by the NIH Grant U54 HL108460 to the University of California San Diego 24

iDASH On-Demand Resources

On-demand Virtualized Elastic Resilient Compute And Storage Technology

Safe HIPAA-compliant Annotated Data deposit box Environment

3 computation tiers 3 storage tiers 10GbE throughout Full redundancy RSA Two Factor Auth Remote data replication

800+ cores 7TB+ RAM 600TB+ storage

More planned 600+ cores 4TB+ RAM 200TB+ storage

iDASH On-Demand Resources

4202015 Supported by the NIH Grant U54 HL108460 to the University of California San Diego 26

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterio us SNPs

Blueprint

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterious SNPs

Co

nte

xt

Reference DB

Test data

Configuration

Helper tools

OS

Blueprint

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterious SNPs

Co

nte

xt

Reference DB

Test data

Configuration

Helper tools

OS

Blueprint

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterious SNPs

Co

nte

xt

Reference DB

Test data

Configuration

Helper tools

OS

Blueprint

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterious SNPs

Co

nte

xt

Reference DB

Test data

Configuration

Helper tools

OS

iDASH On-Demand Resources

Bookshelf

Repeatable Results

4202015 Supported by the NIH Grant U54 HL108460 to the University of California San Diego 27

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterio us SNPs

Blueprint

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterious SNPs

Co

nte

xt

Reference DB

Test data

Configuration

Helper tools

OS

Blueprint

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterious SNPs

Co

nte

xt

Reference DB

Test data

Configuration

Helper tools

OS

Blueprint

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterious SNPs

Co

nte

xt

Reference DB

Test data

Configuration

Helper tools

OS

Blueprint

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterious SNPs

Co

nte

xt

Reference DB

Test data

Configuration

Helper tools

OS

iDASH On-Demand Resources

Bookshelf

MyDATA

Repeatable Results

homomorphic encryption

secure multiparty computation

Ohno-Machado L To Share or Not To Share That Is Not the Question Science Translational Medicine 2012 4(165)

Supported by the NIH Grant U54 HL108460 to the University of California San Diego 28

commons

iDSH15 Privacy Protection Challenge

bull Focus on secure outsourcing and secure data analysis in a distributed setting (humangenomeprivacyorg)

bull Task 1 Homomorphic encryption (HME) based secure genomic data analysis bull Task 2 Secure comparison between genomic data in

a distributed setting

differential privacy

Ohno-Machado L To Share or Not To Share That Is Not the Question Science Translational Medicine 2012 4(165)

Supported by the NIH Grant U54 HL108460 to the University of California San Diego 30

Models for Sharing Data

Name Age Educ ation

Hoursw eek

hellip H T N

Frank 42 6 40 Y

Bob 31 10 60 Y

Dave 43 9 40 N

lsquoDe-Identificationrsquo (microdata release)

bull HIPAA compliant methods raquo Safe harbor dataset

bull Removal of 18 safe harbor identifiers

raquo Limited dataset

bull Removal of direct identifiers

raquo Statistical methods

bull Removalgrouping of attributes

bull Risk reasonably low

bull Re-identification and disclosure risks

Courtesy of Li Xiong

Statistical Data Release (macrodata release)

Mohammed N Privacy Preserving Heterogeneous Health Data Sharing J Am Med Inform Assoc 2013

Jiang XL Differential-Private Data Publishing Through Component Analysis Transactions on Data Privacy 2013

Courtesy of Li Xiong

Statistical Data Release Disclosure Risk

Original records Original histogram Courtesy of Li Xiong

Courtesy of Li Xiong

Statistical Data Release Differential Privacy

Original records Original histogram Perturbed histogram with differential privacy

Differential Privacy (Dwork et al)

D Drsquo

bull D and D are neighboring databases if they differ on at most one record

A privacy mechanism A gives ε-differential privacy if for all neighbouring databases D Drsquo and for any possible output S isin Range(A)

Pr[A(D) = S] le exp(ε) times Pr[A(D) = S]

Courtesy of Li Xiong

Global Sensitivity

Differential Privacy Introducing Noise

bull Laplace Mechanism

For example for a single count query Q over a dataset D returning Q(D)+Laplace(1ε) gives ε-differential privacy

Courtesy of Li Xiong

iDASH 2014 First Privacy Protection Challenge

Evaluate solutions of guaranteed privacy protection for protecting the output of genomic data analysis

bull Task 1 Privacy-preserving SNP Data Sharing bull Task 2 Privacy-preserving release of top K

most significant SNPs

Original Data

Which statistics to compute

How to apportion privacy budget

Released Statistics Synthetic Records

Statistical Data Release with Differential Privacy

Statistical Data Release under Differential Privacy

Data Privacy Interface Courtesy of Li Xiong

Original Health Records

Data challenges - High dimensionality - Self-correlation

DP Interface

Released Statistics Synthetic Records

Target applications - Cross-sectional studies - Longitudinal studies

Low Dimensional Histograms (DPCube )

High Dimensional Distributions (DPCopula)

Sequential Data Release (DPTrie)

SHARE Statistical amp Synthetic Health Information Release

SHARE

Courtesy of Li Xiong

differential privacy

Supported by the NIH Grant U54 HL108460 to the University of California San Diego 40

Models for Sharing Data

Ohno-Machado L To Share or Not To Share That Is Not the Question Science Translational Medicine 2012 4(165)

4202015 Supported by the NIH Grant U54 HL108460 to the University of California San Diego 41

Sharing

Look-up Patient I

I can check

that U looked

at my data D

bull Data use

agreements

bull Study registry

Trusted broker

User U requests

Data D on

individual I

Consent

Management

System

Do I wish to

disclose data D

to U

Shifting control

Patient Interface

Healthcare

Institutions

Yes

What to share Who to share with

Some preliminary findings from a limited set of interviews bull Healthy volunteers do not want to share with

commercially sponsored researchers bull Some want their medical information shared with

only UCSD researchers no others bull Many do not want to share at least 1 category of

sensitive information bull Most common decline was genetic followed by

sexual amp reproductive health

Courtesy of E Bell

42

44

Consent

Management System

Sharing Look-up Registry Patient I

Electronic Health Record

Clinical Data Warehouse

Healthcare Institution

Query

User U Results

Co

nci

erg

e o

rA

uto

mat

ed S

ervi

ces

Co

nn

ecti

ng

Soft

war

e

iCONCUR

Trusted Entity

informed CONsent for Clinical data Use for Research

Privacy Preserving Analytics for KD in African-Americans

Consent for Data and Biosample Sharing in an Underserved Population

Partnership for Epidemiological

Research on Latinos

8213 Supported by the NIH Grant U54 HL108460 to the University of California San Diego 45

Data and Biospecimen Sharing Privacy Preserving Computation

Which DNA variants are implicated in KD susceptibility in this population Emory Genome Institute of Singapore Imperial College

Does consent rate depend on who is obtaining the consent Maricopa Health System

Do patients understand what they consented for San Diego State University

Privacy amp Technology Projects

Looking into the Near Future

Healthcare data and Biospecimen collections

Several terabytes of data are being generated every day and samples are being collected Participants could decide whatwhenwith whom to share

bull Genomes bull Images bull Body sensors bull Environmental sensors bull Electronic Health Records bull Personal Health Records bull Pharmacy bull Social media

2008-2009 2010-2011 2012-2013 2014-2015 Clinical Data Research Data Applications Integration

Epic CDDS New modules

Accrual for Clinical Trials

FISMA

pSCANNER

bioCADDIE

Electronic Health Record System Epic amp Clarity

Other Systems PACS lab etc

Personnel Systems Active Directory

Query Tools UC-ReX Explorer Privacy Technology

Clinical Research Data RedCAP Velos Other DBs

iDASH HIPAA SHADE Images human genomes etc

Recruitment Consent tools Custom Apps

VA LA Clinics

UCSF

Davis

Irvine

UCLA

Healthcare Clinical Data Clinical Data Warehouse for Research

Scalable Network (Distributed Analytics Tools)

HIPAA

External data (patient reported data)

pSCANNER PCORI CDRN

iDASH HIPAAFISMA OVERCAST iDASH CTRI School of Medicine

UCSDrsquos Journey

iCONCUR Cancer Genomics

Text Mining NLP

USCLAC Cedars Sinai

San Mateo

Analytical Tools

De-ID Tools

BRIGHT

Cloud

PCORI contracts

K22 K99

NLM Training Grant

UC-ReX

iDASH

SCANNER

PhenDISCO

Acknowledgements

Department of Biomedical Informatics at UCSD funded by

NIH (U54 UL1 U24 UH3 R21 U01 T15 R00 K22 K99 D43) AHRq

UCBRAIDOP PCORI NVIDIA

Page 12: Genomics and Informatics · 2015-04-20 · Genomics and Informatics: Technology Solutions to Compute with Sensitive Data April 21, 2015 Innovations in Healthcare Informatics NIH U54

Ohno-Machado L To Share or Not To Share That Is Not the Question Science Translational Medicine 2012 4(165)

Supported by the NIH Grant U54 HL108460 to the University of California San Diego 12

Models for Sharing Data

Clinical Data Network ndash UC-ReX

bull Clinical Data Warehouses from 5 Medical Centers and affiliated institutions exchange (gt12 million patients)

bull Improve patient safety surveillance quality improvement translational research

Funded by the UC Office of the President

UC-Research eXchange

UCSD

(Epic)

Data Matching Function Map D onto data dictionaries use Data Warehouses derived from

EHR

Results on Harmonized Data

Request for Data on Cohort

Registries

other DBs UC Irvine

(Allscripts) UC Davis

(Epic)

UCSF

(Epic)

Community

Partners

Researcher

wants data about

population

- Currently

- Count queries

- Future steps

- Individual-level patient data delivery

- Cross-institutional analytics

Quality improvement / health services research

[Diagram: a user requests data for quality improvement or research. Are the data accessible? Requests pass through trusted broker(s) and a security entity that provide:]

• Identity & trust management
• Policy enforcement

Diverse healthcare entities in 3 different states (federal, state, private)

• How many patients over 65 are on warfarin or dabigatran?
• What are the major and minor bleeding rates for patients on these drugs?
• Is CYP2C9 mutation an important factor?

Count queries and statistics across data warehouses

AHRQ R01HS19913, EDM Forum
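The count queries on this slide can be sketched as follows. This is only an illustration, not the SCANNER software: the `Patient` record, the site names, and all rows are made up for the example. The point it shows is that each site evaluates the query locally and only an integer leaves the site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    """Hypothetical minimal record held at one site."""
    age: int
    drugs: frozenset


def local_count(patients, min_age, drug_names):
    """Runs at the site; only the resulting integer leaves it."""
    return sum(1 for p in patients if p.age > min_age and p.drugs & drug_names)


def network_count(sites, min_age, drug_names):
    """Runs at the coordinating node; sees per-site counts, never rows."""
    per_site = {name: local_count(rows, min_age, drug_names)
                for name, rows in sites.items()}
    return per_site, sum(per_site.values())


# Made-up rows for three hypothetical sites.
sites = {
    "site_federal": [Patient(70, frozenset({"warfarin"})),
                     Patient(50, frozenset({"warfarin"}))],
    "site_state": [Patient(81, frozenset({"dabigatran", "aspirin"})),
                   Patient(66, frozenset({"aspirin"}))],
    "site_private": [Patient(68, frozenset({"warfarin"})),
                     Patient(90, frozenset({"dabigatran"}))],
}

per_site, total = network_count(sites, 65, frozenset({"warfarin", "dabigatran"}))
print(per_site, total)
```

Small per-site counts can themselves be identifying, which is one reason the later slides turn to differential privacy for released statistics.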

Scalable National Network for Comparative Effectiveness Research (SCANNER)

• Privacy-preserving distributed computing
• Policy embedded in software

Kim K, et al. Development of a Privacy and Security Policy Framework for a Multi-State Comparative Effectiveness Research Network. Medical Care 2013.

Kim K, et al. Data Governance Requirements for Distributed Clinical Research Networks: Triangulating Perspectives of Diverse Stakeholders. JAMIA 2014.

Meeker et al. A System to Build Distributed Multivariate Models and Manage Disparate Data Sharing Policies: Implementation in the Scalable National Network for Effectiveness Research. JAMIA (accepted).

Kim K, et al. Comparison of Consumers' Views on Electronic Data Sharing for Healthcare and Research. JAMIA (accepted).

AHRQ R01HS19913, EDM Forum

Privacy-preserving distributed computing

• Predictive modeling and adjustment for confounders require lots of data
• Some institutions cannot move data outside their firewalls; we can bring computation to the data

Wu Y, et al. Grid Binary LOgistic REgression (GLORE): Building Shared Models Without Sharing Data. JAMIA 2012.

Wang S, et al. EXpectation Propagation LOgistic REgRession (EXPLORER): Distributed Privacy-Preserving Online Model Learning. J Biomed Inf 2013.

Jiang W, et al. WebGLORE: A Web Service for Grid Logistic Regression. Bioinformatics 2014.

Wu Y, et al. Grid Multi-Category Response Logistic Models. BMC Med Inform Dec Making 2015.

Distributed Model

Conclusion: no patient data needs to be sent from the sites, only aggregates.

Patient  Age  Insurance
A1       45   X
A2       32   Y

Patient  Age  Insurance
B1       45   Y
B2       32   Y

Patient  Age  Insurance
A1       45   X
A2       32   Y
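A minimal sketch of the "only aggregates leave the site" idea for logistic regression is given below. It is not the GLORE code: the cited GLORE paper describes a Newton-Raphson style update on aggregated intermediate results, whereas this sketch swaps in plain gradient ascent to stay short. The rows, feature scaling, and function names are all hypothetical. Each site sums its own gradient contributions, and the coordinator sees only those sums.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def local_gradient(rows, beta):
    """Runs inside one site; returns only the summed log-likelihood gradient."""
    grad = [0.0] * len(beta)
    for features, outcome in rows:
        p = sigmoid(sum(b * x for b, x in zip(beta, features)))
        for j, x in enumerate(features):
            grad[j] += (outcome - p) * x
    return grad


def fit_distributed(sites, n_features, steps=500, learning_rate=0.05):
    """Coordinator: adds up per-site aggregates and updates the shared model."""
    beta = [0.0] * n_features
    n_total = sum(len(rows) for rows in sites)  # site sizes are aggregates too
    for _ in range(steps):
        total = [0.0] * n_features
        for rows in sites:
            for j, g in enumerate(local_gradient(rows, beta)):
                total[j] += g
        beta = [b + learning_rate * t / n_total for b, t in zip(beta, total)]
    return beta


# Made-up rows: ((intercept, age / 10), outcome).
site_a = [((1.0, 4.5), 1), ((1.0, 3.2), 0), ((1.0, 6.1), 1), ((1.0, 5.0), 0)]
site_b = [((1.0, 4.5), 0), ((1.0, 3.2), 0), ((1.0, 7.0), 1), ((1.0, 2.5), 1)]

shared = fit_distributed([site_a, site_b], n_features=2)
pooled = fit_distributed([site_a + site_b], n_features=2)
print(shared, pooled)
```

Because the gradient of the log-likelihood is a sum over patients, adding the per-site sums gives the same update as pooling all rows in one place, which is why the two fits agree up to floating-point rounding.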

Horizontal and Vertical Partitions

Privacy-preserving SVM for vertically partitioned data

Que J, Jiang X, Ohno-Machado L. A Collaborative Framework for Distributed Privacy-Preserving Support Vector Learning. AMIA Annual Symp Proc 2012:1350-9.

pSCANNER Clinical Data Research

Members: 5 UCs, National VA (VINCI resource), 3 Federally Qualified Health Systems in LA (USC), RAND

Cohorts: Kawasaki Disease, Congestive Heart Failure, Obesity

Abbreviations: TCC, The Children's Clinic; VM, Virtual Machine; SFGH, San Francisco General Hospital; OMOP, Observational Medical Outcomes Partnership

Funded by PCORI as part of PCORnet

International Collaboration

Slide from Dr Shuang Wang

Models for Sharing Data

[Diagram: cloud for sharing data, tools, and recipes. Recoverable labels:]

• HIPAA and non-public data
• Public data, tools, recipes
• Powered by MIDAS: data, tools, recipes
• Upload & download data
• Compute request; direct upload & download of proprietary data, tool, recipe
• Middleware and HIPAA security developed by iDASH
• Compute nodes, memory, disk storage, networking
• Powered by VMware; automated

iDASH On-Demand Resources

OVERCAST: On-demand Virtualized Elastic Resilient Compute And Storage Technology

SHADE: Safe HIPAA-compliant Annotated Data deposit box Environment

• 3 computation tiers, 3 storage tiers
• 10GbE throughout
• Full redundancy
• RSA two-factor authentication
• Remote data replication

800+ cores, 7TB+ RAM, 600TB+ storage

More planned: 600+ cores, 4TB+ RAM, 200TB+ storage

iDASH On-Demand Resources

[Diagram, shown in several animation builds: a Blueprint packages a Workflow together with its Context.]

Workflow: Short reads → Index reference → Align to reference → Call variants → Annotate variants → Pick high impact → Deleterious SNPs

Context: Reference DB, Test data, Configuration, Helper tools, OS

Bookshelf

MyDATA

Repeatable Results

homomorphic encryption

secure multiparty computation

commons

iDASH'15 Privacy Protection Challenge

• Focus on secure outsourcing and secure data analysis in a distributed setting (humangenomeprivacy.org)

• Task 1: Homomorphic encryption (HME) based secure genomic data analysis
• Task 2: Secure comparison between genomic data in a distributed setting
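To make the idea behind Task 1 concrete, here is a toy sketch of additively homomorphic encryption using the Paillier scheme. It is not taken from the challenge, and nothing in the slides says the challenge entries used Paillier. The primes are deliberately tiny and the scenario (two sites contributing allele counts) is hypothetical, so this is not secure and only illustrates the property that a server can add values it cannot read.

```python
import math
import random


def keygen(p, q):
    """Toy Paillier key generation from two small primes (NOT secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                  # standard simple choice of generator
    mu = pow(lam, -1, n)       # modular inverse; valid because g = n + 1
    return (n, g), (lam, mu)


def encrypt(public_key, message, rng):
    n, g = public_key
    n_sq = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, message, n_sq) * pow(r, n, n_sq)) % n_sq


def decrypt(public_key, private_key, ciphertext):
    n, _ = public_key
    lam, mu = private_key
    x = pow(ciphertext, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(public_key, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    n, _ = public_key
    return (c1 * c2) % (n * n)


rng = random.Random(0)
public_key, private_key = keygen(1009, 1013)

# Two hypothetical sites encrypt their local allele counts.
c_site_a = encrypt(public_key, 3, rng)
c_site_b = encrypt(public_key, 4, rng)

# An untrusted server combines them without seeing 3 or 4.
c_total = add_encrypted(public_key, c_site_a, c_site_b)
print(decrypt(public_key, private_key, c_total))
```

Only the holder of the private key can decrypt the combined ciphertext. The server that performs the addition works on ciphertexts alone, which is the sense in which computation is outsourced securely.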

differential privacy

Models for Sharing Data

Name   Age  Education  Hours/week  …  HTN
Frank  42   6          40             Y
Bob    31   10         60             Y
Dave   43   9          40             N

'De-Identification' (microdata release)

• HIPAA compliant methods
  » Safe harbor dataset
    • Removal of 18 safe harbor identifiers
  » Limited dataset
    • Removal of direct identifiers
  » Statistical methods
    • Removal/grouping of attributes
    • Risk reasonably low
• Re-identification and disclosure risks

Courtesy of Li Xiong

Statistical Data Release (macrodata release)

Mohammed N. Privacy Preserving Heterogeneous Health Data Sharing. J Am Med Inform Assoc 2013.

Jiang XL. Differential-Private Data Publishing Through Component Analysis. Transactions on Data Privacy 2013.

Courtesy of Li Xiong

Statistical Data Release: Disclosure Risk

[Figure: original records and original histogram] Courtesy of Li Xiong

Statistical Data Release: Differential Privacy

[Figure: original records, original histogram, and perturbed histogram with differential privacy] Courtesy of Li Xiong

Differential Privacy (Dwork et al.)

[Figure: neighboring databases D and D′]

• D and D′ are neighboring databases if they differ on at most one record

A privacy mechanism A gives ε-differential privacy if, for all neighbouring databases D, D′ and for any possible output S ∈ Range(A):

$$\Pr[A(D) = S] \le \exp(\varepsilon) \times \Pr[A(D') = S]$$

Courtesy of Li Xiong

Differential Privacy: Introducing Noise

• Global Sensitivity

• Laplace Mechanism

For example, for a single count query Q over a dataset D, returning Q(D) + Laplace(1/ε) gives ε-differential privacy.
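The single-count example above can be sketched in a few lines. This is an illustration only: the function names are hypothetical, the ages are the three from the earlier example table, and the Laplace draw is built from the standard-library fact that the difference of two independent exponential variables with the same rate is Laplace distributed.

```python
import random


def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Return Q(D) + Laplace(1/epsilon) for a count query Q."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    # Adding or removing one record changes a count by at most 1,
    # so the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(0)
ages = [42, 31, 43]  # ages from the example table; the query is made up
print(private_count(ages, lambda age: age > 40, epsilon=0.5, rng=rng))
```

A smaller ε means a larger noise scale and stronger protection, at the cost of a less accurate released count.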

Courtesy of Li Xiong

iDASH 2014 First Privacy Protection Challenge

Evaluate solutions that guarantee privacy protection for the output of genomic data analysis

• Task 1: Privacy-preserving SNP data sharing
• Task 2: Privacy-preserving release of the top K most significant SNPs

Statistical Data Release with Differential Privacy

[Diagram: Original Data → Data Privacy Interface → Released Statistics / Synthetic Records. Design questions: which statistics to compute, and how to apportion the privacy budget.] Courtesy of Li Xiong
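The two design questions on this slide (which statistics to compute, how to apportion the privacy budget) can be illustrated with a small sketch. It is not the SHARE, DPCube, DPCopula, or DPTrie code. It assumes neighbouring datasets differ by adding or removing one record, splits a total ε across two histograms by hypothetical weights, and reuses the three rows of the earlier example table.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def split_budget(total_epsilon, weights):
    """Sequential composition: per-statistic epsilons must sum to the total."""
    norm = sum(weights.values())
    return {name: total_epsilon * w / norm for name, w in weights.items()}


def noisy_histogram(values, bins, epsilon, rng):
    """Each record falls in exactly one bin, so one Laplace(1/epsilon) draw
    per bin is enough under the add-or-remove-one-record assumption."""
    counts = Counter(values)
    return {b: counts.get(b, 0) + laplace_noise(1.0 / epsilon, rng) for b in bins}


# (age, HTN) pairs from the example table; weights are an arbitrary choice.
records = [(42, "Y"), (31, "Y"), (43, "N")]
budget = split_budget(1.0, {"age_decade": 1, "htn": 3})

rng = random.Random(0)
released = {
    "age_decade": noisy_histogram([age // 10 * 10 for age, _ in records],
                                  [30, 40], budget["age_decade"], rng),
    "htn": noisy_histogram([htn for _, htn in records],
                           ["Y", "N"], budget["htn"], rng),
}
print(budget)
print(released)
```

Giving a statistic a larger share of ε makes it more accurate and leaves less budget for everything else released from the same data, which is the trade-off the slide asks about.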

SHARE: Statistical & Synthetic Health Information Release

[Diagram: Original Health Records → DP Interface → Released Statistics / Synthetic Records]

Data challenges
- High dimensionality
- Self-correlation

Target applications
- Cross-sectional studies
- Longitudinal studies

• Low Dimensional Histograms (DPCube)
• High Dimensional Distributions (DPCopula)
• Sequential Data Release (DPTrie)

Courtesy of Li Xiong

differential privacy

Models for Sharing Data

Shifting control

[Diagram: User U requests data D on individual I from healthcare institutions through a trusted broker (data use agreements, study registry). Through a patient interface, the Consent Management System asks patient I, "Do I wish to disclose data D to U?" (answer shown: Yes). A sharing look-up lets the patient say, "I can check that U looked at my data D."]

What to share? Who to share with?

Some preliminary findings from a limited set of interviews:

• Healthy volunteers do not want to share with commercially sponsored researchers
• Some want their medical information shared with only UCSD researchers, no others
• Many do not want to share at least 1 category of sensitive information
• Most common decline was genetic, followed by sexual & reproductive health

Courtesy of E Bell
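The consent flow described in these slides (user U requests data D on individual I, the patient's preferences decide, and the patient can later look up who accessed what) might be modeled as below. This is a hypothetical sketch, not the iCONCUR system: the class names, recipient types, and category labels are invented for the example, and denying requests when no preferences are on file is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentPreferences:
    allowed_recipients: set    # e.g. {"ucsd_researcher"}
    declined_categories: set   # e.g. {"genetic"}


@dataclass
class ConsentBroker:
    preferences: dict = field(default_factory=dict)  # patient id -> preferences
    access_log: list = field(default_factory=list)   # (patient, user, category, granted)

    def request(self, user, recipient_type, patient, category):
        """User U asks for data category D on individual I."""
        prefs = self.preferences.get(patient)
        granted = (prefs is not None
                   and recipient_type in prefs.allowed_recipients
                   and category not in prefs.declined_categories)
        # Every request is logged, whether or not it was granted.
        self.access_log.append((patient, user, category, granted))
        return granted

    def lookups_for(self, patient):
        """Sharing look-up: lets a patient see who asked for their data."""
        return [entry for entry in self.access_log if entry[0] == patient]


broker = ConsentBroker()
broker.preferences["patient_1"] = ConsentPreferences(
    allowed_recipients={"ucsd_researcher"},
    declined_categories={"genetic", "sexual_reproductive_health"},
)

print(broker.request("dr_a", "ucsd_researcher", "patient_1", "medications"))
print(broker.request("dr_a", "ucsd_researcher", "patient_1", "genetic"))
print(broker.request("acme", "commercial_researcher", "patient_1", "medications"))
print(broker.lookups_for("patient_1"))
```

Keeping the decision and the audit log in one trusted component is what allows both halves of the slide: the patient controls disclosure and can also verify afterwards that U looked at D.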

iCONCUR: informed CONsent for Clinical data Use for Research

[Diagram: User U sends a query to a trusted entity (concierge or automated services, connecting software) and receives results. The trusted entity links the Consent Management System, the Sharing Look-up Registry available to Patient I, and the healthcare institution's Electronic Health Record and Clinical Data Warehouse.]

Privacy & Technology Projects

Data and Biospecimen Sharing; Privacy Preserving Computation

Projects:
• Privacy Preserving Analytics for KD in African-Americans
• Consent for Data and Biosample Sharing in an Underserved Population
• Partnership for Epidemiological Research on Latinos

Questions and partner institutions:
• Which DNA variants are implicated in KD susceptibility in this population? (Emory, Genome Institute of Singapore, Imperial College)
• Does consent rate depend on who is obtaining the consent? (Maricopa Health System)
• Do patients understand what they consented for? (San Diego State University)

Looking into the Near Future

Healthcare data and Biospecimen collections

Several terabytes of data are being generated every day, and samples are being collected. Participants could decide what, when, and with whom to share.

• Genomes
• Images
• Body sensors
• Environmental sensors
• Electronic Health Records
• Personal Health Records
• Pharmacy
• Social media

UCSD's Journey

[Timeline diagram spanning 2008-2009, 2010-2011, 2012-2013, and 2014-2015, with bands for Clinical Data, Research Data, Applications, and Integration. Labels recoverable from the figure:]

• Electronic Health Record System: Epic & Clarity; Epic CDDS; new modules
• Other Systems: PACS, lab, etc.
• Personnel Systems: Active Directory
• Healthcare Clinical Data; Clinical Data Warehouse for Research
• Query Tools: UC-ReX, Explorer; Privacy Technology
• Clinical Research Data: REDCap, Velos, other DBs
• iDASH: HIPAA SHADE (images, human genomes, etc.); HIPAA/FISMA OVERCAST; Cloud
• Recruitment: consent tools, custom apps; Accrual for Clinical Trials; iCONCUR
• Scalable Network (Distributed Analytics Tools): SCANNER; pSCANNER (PCORI CDRN)
• Sites: VA, LA Clinics, UCSF, Davis, Irvine, UCLA, USC/LAC, Cedars Sinai, San Mateo
• External data (patient reported data)
• Cancer Genomics; Text Mining, NLP; Analytical Tools; De-ID Tools
• Other labels: FISMA, HIPAA, bioCADDIE, BRIGHT, PhenDISCO, CTRI, School of Medicine, PCORI contracts, K22, K99, NLM Training Grant

Acknowledgements

Department of Biomedical Informatics at UCSD, funded by:

NIH (U54, UL1, U24, UH3, R21, U01, T15, R00, K22, K99, D43), AHRQ, UCBRAIDOP, PCORI, NVIDIA

Page 14: Genomics and Informatics · 2015-04-20 · Genomics and Informatics: Technology Solutions to Compute with Sensitive Data April 21, 2015 Innovations in Healthcare Informatics NIH U54

UC-Research eXchange

UCSD

(Epic)

Data Matching Function Map D onto data dictionaries use Data Warehouses derived from

EHR

Results on Harmonized Data

Request for Data on Cohort

Registries

other DBs UC Irvine

(Allscripts) UC Davis

(Epic)

UCSF

(Epic)

Community

Partners

Researcher

wants data about

population

- Currently

- Count queries

- Future steps

- Individual-level patient data delivery

- Cross-institutional analytics

User requests data for

Quality Improvement

or Research Are the data

accessible

bullIdentity amp Trust

Management

bullPolicy

enforcement

Trusted

Broker(s)

Security Entity

Quality improvementhealth services research

Diverse Healthcare

How many patients over 65 are on Warfarin or Dabigatran

What are the major and minor bleeding rates for patients on these drugs

Is CYP2C9 mutation an important factor

AHRQ R01HS19913 EDM forum Entities Count queries and statistics across data in 3 different states warehouses (federal state private)

User requests data for

Quality Improvement

or Research

bullIdentity amp Trust

Management

bullPolicy

enforcement

Trusted

Broker(s)

Security Entity

Kim K et al Development of a Privacy and Security Policy Framework for a Multi-State Comparative Effectiveness Research Network Medical Care 2013

Kim K et al Data Governance Requirements for Distributed Clinical Research Networks Triangulating Perspectives of Diverse Stakeholders JAMIA 2014

Meeker et al A System to Build Distributed Multivariate Models and Manage Disparate Data Sharing Policies Implementation in the Scalable National Network for Effectiveness Research JAMIA (accepted)

Kim K et al Comparison of Consumers Views on Electronic Data Sharing for Healthcare and Research JAMIA (accepted) Scalable National Network for

Comparative Effectiveness Research

Privacy-preserving distributed computing

Policy embedded in software

AHRQ R01HS19913 EDM forum

User requests data for

Quality Improvement

or Research

bullIdentity amp Trust

Management

bullPolicy

enforcement

Trusted

Broker(s)

Security Entity

Privacy-preserving distributed computing

Predictive modeling and adjustment for cofounders require lots of data

Some institutions cannot move data outside their firewalls we can bring computation to the data

Diverse Healthcare Entities in 3 different states (federal state private)

Wu Y et al Grid Binary LOgistic REgression (GLORE) Building Shared Models Without Sharing Data JAMIA 2012

Wang S et al EXpectation Propagation LOgistic REgRession (EXPLORER) Distributed Privacy-AHRQ R01HS19913 EDM forum Preserving Online Model Learning J Biomed Inf 2013

Scalable National Network for Jiang W et al WebGLORE A Webservice for Grid Logistic Regression Bioinformatics 2014 Comparative Effectiveness Research Wu Y et al Grid Multi-Category Response Logistic Models BMC Med Inform Dec Making 2015

4202015 Supported by the NIH Grant U54 HL108460 to the University of California San Diego 18

Distributed Model

Conclusion no patient data needs to be sent from the sites only aggregates

Patient Age Insurance

A1 45 X

A2 32 Y

Patient Age Insurance

B1 45 Y

B2 32 Y

Patient Age Insurance

A1 45 X

A2 32 Y

Horizontal and Vertical Partitions

Privacy-preserving SVM for vertically partitioned data

Que J Jiang X Ohno-Machado L A Collaborative Framework for Distributed Privacy-Preserving Support Vector Learning AMIA Annual Symp Proc 20121350-9

pSCANNER Clinical Data Research

Members 5 UCs National VA (VINCI resource) 3 Federally Qualified Health Systems in LA (USC) RAND

Cohorts Kawasaki Disease Congestive Heart Failure Obesity

TCC The Childrenrsquos Clinic VM Virtual Machine SFGH San Francisco General Hospital OMOP Observational Medical Outcomes Partnership

Funded by PCORI as part of PCORnet

International Collaboration

Slide from Dr Shuang Wang

Ohno-Machado L To Share or Not To Share That Is Not the Question Science Translational Medicine 2012 4(165)

Supported by the NIH Grant U54 HL108460 to the University of California San Diego 23

Models for Sharing Data

[Figure: sharing-model architecture; labels recovered in reading order]

HIPAA and non-public data

Public data, tools, recipes

Powered by MIDAS

Data, Tools, Recipes

Upload & download data

Compute request; direct upload & download of proprietary data, tool, recipe

Middleware and HIPAA security developed by iDASH

Compute nodes, memory, disk storage, networking

Powered by VMware

AUTOMATED


iDASH On-Demand Resources

OVERCAST: On-demand Virtualized Elastic Resilient Compute And Storage Technology

SHADE: Safe HIPAA-compliant Annotated Data deposit box Environment

3 computation tiers, 3 storage tiers, 10GbE throughout, full redundancy, RSA two-factor authentication, remote data replication

800+ cores, 7TB+ RAM, 600TB+ storage

More planned: 600+ cores, 4TB+ RAM, 200TB+ storage


[Figure: Blueprint boxes, repeated several times on the slide]

Blueprint

Workflow: Short reads → Index reference → Align to reference → Call variants → Annotate variants → Pick high impact → Deleterious SNPs

Context: Reference DB, Test data, Configuration, Helper tools, OS

iDASH On-Demand Resources

Bookshelf

Repeatable Results


[Figure: the Blueprint boxes (Workflow and Context) from the previous slide, shown again]

iDASH On-Demand Resources

Bookshelf

MyDATA

Repeatable Results

homomorphic encryption

secure multiparty computation


commons

iDASH'15 Privacy Protection Challenge

• Focus on secure outsourcing and secure data analysis in a distributed setting (humangenomeprivacy.org)
• Task 1: Homomorphic encryption (HME) based secure genomic data analysis
• Task 2: Secure comparison between genomic data in a distributed setting
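As a rough illustration of what secure computation in a distributed setting can look like, the sketch below uses additive secret sharing to add up private counts so that no party sees another party's value. This is a generic textbook building block for secure multiparty computation, assumed here for illustration; it is not homomorphic encryption and it is not the challenge's comparison task. `MODULUS`, `make_shares`, and `secure_sum` are hypothetical names.

```python
import secrets

MODULUS = 2**61 - 1  # all arithmetic is done modulo this value

def make_shares(value, n_parties):
    """Split a non-negative integer into n random shares that add up to it (mod MODULUS)."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]

def secure_sum(private_values):
    n = len(private_values)
    # sent[i][j] is the share that party i sends to party j.
    sent = [make_shares(v, n) for v in private_values]
    # Each party j publishes only the sum of the shares it received.
    partial_sums = [sum(sent[i][j] for i in range(n)) % MODULUS for j in range(n)]
    return sum(partial_sums) % MODULUS

# Hypothetical per-site counts of carriers of some variant.
total = secure_sum([12, 7, 30])
```

Each individual share is uniformly random, so a single share reveals nothing about the value it came from; only the combined total is meaningful. The sketch assumes honest parties and a true total smaller than `MODULUS`.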

differential privacy


Models for Sharing Data

Name   Age  Education  Hours/week  …  HTN
Frank  42   6          40          …  Y
Bob    31   10         60          …  Y
Dave   43   9          40          …  N

'De-Identification' (microdata release)

• HIPAA compliant methods
  » Safe harbor dataset
    • Removal of 18 safe harbor identifiers
  » Limited dataset
    • Removal of direct identifiers
  » Statistical methods
    • Removal/grouping of attributes
    • Risk reasonably low
• Re-identification and disclosure risks

Courtesy of Li Xiong

Statistical Data Release (macrodata release)

Mohammed N. Privacy Preserving Heterogeneous Health Data Sharing. J Am Med Inform Assoc 2013

Jiang XL. Differential-Private Data Publishing Through Component Analysis. Transactions on Data Privacy 2013

Courtesy of Li Xiong

Statistical Data Release: Disclosure Risk

[Figure: original records and original histogram. Courtesy of Li Xiong]

Statistical Data Release: Differential Privacy

[Figure: original records, original histogram, and perturbed histogram with differential privacy]

Differential Privacy (Dwork et al.)

• D and D' are neighboring databases if they differ on at most one record

A privacy mechanism A gives ε-differential privacy if, for all neighbouring databases D, D' and for any possible output S ∈ Range(A),

$$\Pr[A(D) = S] \le \exp(\varepsilon) \times \Pr[A(D') = S]$$

Courtesy of Li Xiong

Global Sensitivity

Differential Privacy: Introducing Noise

• Laplace Mechanism

For example, for a single count query Q over a dataset D, returning Q(D) + Laplace(1/ε) gives ε-differential privacy
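The count-query example can be tried out with the standard library alone. The sketch below is illustrative: `laplace_noise` and `private_count` are hypothetical names, the ages are the three from the microdata table earlier in the deck, and the noise is drawn as the difference of two exponential variates, which is one standard way to sample a Laplace distribution.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon, rng):
    true_count = sum(1 for r in records if predicate(r))
    # Adding or removing one record changes a count by at most 1,
    # so the noise scale is 1/epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)

ages = [42, 31, 43]
rng = random.Random(0)  # seeded only so the example is repeatable
noisy = private_count(ages, lambda age: age > 40, epsilon=1.0, rng=rng)
```

A smaller ε means a larger noise scale and therefore stronger privacy but a less accurate released count.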

Courtesy of Li Xiong

iDASH 2014 First Privacy Protection Challenge

Evaluate solutions of guaranteed privacy protection for protecting the output of genomic data analysis

• Task 1: Privacy-preserving SNP Data Sharing
• Task 2: Privacy-preserving release of top K most significant SNPs

Statistical Data Release under Differential Privacy

[Figure: Original Data → Data Privacy Interface → Released Statistics, Synthetic Records. Courtesy of Li Xiong]

• Which statistics to compute
• How to apportion privacy budget
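One simple way to think about apportioning the budget is sequential composition: if several statistics are released about the same records, their individual ε values add up to the total privacy cost. The sketch below splits a total ε by weights and derives the Laplace noise scale for each release. The function names, the weights, and the assumption that each statistic has sensitivity 1 are illustrative choices, not something stated in the slides.

```python
def apportion_budget(total_epsilon, weights):
    """Split a total epsilon across releases in proportion to the given weights."""
    total_weight = sum(weights.values())
    return {name: total_epsilon * w / total_weight for name, w in weights.items()}

def laplace_scale(sensitivity, epsilon):
    """Noise scale for the Laplace mechanism: sensitivity divided by epsilon."""
    return sensitivity / epsilon

# Hypothetical release plan: the age histogram matters more, so it gets 3x the budget.
budget = apportion_budget(1.0, {"age_histogram": 3, "insurance_counts": 1})
scales = {name: laplace_scale(1.0, eps) for name, eps in budget.items()}
```

The statistic given the smaller share of the budget receives proportionally more noise, which is the trade-off behind the question of which statistics to compute at all.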

SHARE: Statistical & Synthetic Health Information Release

[Figure: Original Health Records → DP Interface → Released Statistics, Synthetic Records. Courtesy of Li Xiong]

Data challenges: high dimensionality, self-correlation

Target applications: cross-sectional studies, longitudinal studies

Low Dimensional Histograms (DPCube)

High Dimensional Distributions (DPCopula)

Sequential Data Release (DPTrie)

differential privacy


Models for Sharing Data


Shifting control

[Figure: patient-controlled sharing; labels recovered from the diagram]

User U requests Data D on individual I

Healthcare Institutions

Trusted broker
• Data use agreements
• Study registry

Consent Management System (Patient Interface): "Do I wish to disclose data D to U?" Yes

Sharing Look-up (Patient I): "I can check that U looked at my data D"
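The flow in the diagram, where a request from user U for data D on individual I is checked against the patient's consent and recorded so the patient can look it up later, can be sketched as follows. This is a toy model under stated assumptions: the class `ConsentBroker`, its methods, and the data category names are hypothetical and do not describe the actual system.

```python
from dataclasses import dataclass, field

@dataclass
class ConsentBroker:
    # consents[(patient, category)] is the set of users that patient has approved.
    consents: dict = field(default_factory=dict)
    # Every request is logged, whether or not it was allowed.
    audit_log: list = field(default_factory=list)

    def grant(self, patient, category, user):
        """Patient answers yes to 'Do I wish to disclose data D to U?'."""
        self.consents.setdefault((patient, category), set()).add(user)

    def request(self, user, patient, category):
        """User U requests data D on individual I; returns whether access is allowed."""
        allowed = user in self.consents.get((patient, category), set())
        self.audit_log.append((user, patient, category, allowed))
        return allowed

    def who_looked(self, patient):
        """Sharing look-up: which users accessed which categories of my data."""
        return [(u, c) for (u, p, c, ok) in self.audit_log if p == patient and ok]

broker = ConsentBroker()
broker.grant("I", "labs", "U")
labs_ok = broker.request("U", "I", "labs")        # consent was given for labs
genetic_ok = broker.request("U", "I", "genetic")  # no consent recorded for genetic data
```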

What to share? Who to share with?

Some preliminary findings from a limited set of interviews

• Healthy volunteers do not want to share with commercially sponsored researchers
• Some want their medical information shared with only UCSD researchers, no others
• Many do not want to share at least 1 category of sensitive information
• Most common decline was genetic, followed by sexual & reproductive health

Courtesy of E Bell


iCONCUR: informed CONsent for Clinical data Use for Research

[Figure: iCONCUR architecture; labels recovered from the diagram]

Patient I

Consent Management System

Sharing Look-up Registry

Trusted Entity

Healthcare Institution

Electronic Health Record

Clinical Data Warehouse

Connecting Software

Concierge or Automated Services

User U: Query, Results

Privacy Preserving Analytics for KD in African-Americans

Consent for Data and Biosample Sharing in an Underserved Population

Partnership for Epidemiological Research on Latinos

Data and Biospecimen Sharing; Privacy Preserving Computation

Which DNA variants are implicated in KD susceptibility in this population? (Emory, Genome Institute of Singapore, Imperial College)

Does consent rate depend on who is obtaining the consent? (Maricopa Health System)

Do patients understand what they consented for? (San Diego State University)

Privacy & Technology Projects

Looking into the Near Future

Healthcare data and Biospecimen collections

Several terabytes of data are being generated every day, and samples are being collected. Participants could decide what/when/with whom to share

• Genomes
• Images
• Body sensors
• Environmental sensors
• Electronic Health Records
• Personal Health Records
• Pharmacy
• Social media

UCSD's Journey

[Figure: timeline 2008-2009, 2010-2011, 2012-2013, 2014-2015 across Clinical Data, Research Data, Applications, Integration; labels recovered from the diagram]

Electronic Health Record System: Epic & Clarity
Other Systems: PACS, lab, etc.
Personnel Systems: Active Directory
Query Tools: UC-ReX Explorer, Privacy Technology
Clinical Research Data: RedCAP, Velos, Other DBs
iDASH HIPAA SHADE: images, human genomes, etc.
Recruitment: Consent tools, Custom Apps
Healthcare Clinical Data; Clinical Data Warehouse for Research
Scalable Network (Distributed Analytics Tools)
External data (patient reported data)
Epic CDDS, New modules; Accrual for Clinical Trials; Text Mining, NLP; Analytical Tools; De-ID Tools; Cloud
HIPAA; FISMA; iDASH HIPAA/FISMA OVERCAST
pSCANNER (PCORI CDRN); bioCADDIE; iCONCUR; Cancer Genomics; BRIGHT; UC-ReX; iDASH; SCANNER; PhenDISCO
iDASH, CTRI, School of Medicine
VA, LA Clinics, UCSF, Davis, Irvine, UCLA, USC/LAC, Cedars Sinai, San Mateo
PCORI contracts; K22; K99; NLM Training Grant

Acknowledgements

Department of Biomedical Informatics at UCSD funded by NIH (U54, UL1, U24, UH3, R21, U01, T15, R00, K22, K99, D43), AHRQ, UCBRAIDOP, PCORI, NVIDIA


User requests data for Quality Improvement or Research

Are the data accessible?

Trusted Broker(s)
• Identity & Trust Management
• Policy enforcement

Security Entity

Quality improvement/health services research

Diverse Healthcare Entities in 3 different states (federal, state, private)

Count queries and statistics across data warehouses

• How many patients over 65 are on Warfarin or Dabigatran?
• What are the major and minor bleeding rates for patients on these drugs?
• Is CYP2C9 mutation an important factor?

AHRQ R01HS19913, EDM forum
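A count query across warehouses can be answered by having each site count locally and return only its number. The sketch below illustrates this for the first question above; the site names, field names, and patient rows are made up for the example and do not come from the network itself.

```python
def local_count(rows, min_age, drugs):
    """Runs at one site: counts matching patients and returns only the count."""
    return sum(1 for r in rows if r["age"] > min_age and r["drug"] in drugs)

def network_count(sites, min_age, drugs):
    """Coordinator: adds up per-site counts without seeing any patient rows."""
    per_site = {name: local_count(rows, min_age, drugs) for name, rows in sites.items()}
    return sum(per_site.values()), per_site

# Hypothetical warehouses, one per kind of entity named on the slide.
sites = {
    "federal": [{"age": 70, "drug": "warfarin"}, {"age": 50, "drug": "warfarin"}],
    "state": [{"age": 81, "drug": "dabigatran"}, {"age": 67, "drug": "aspirin"}],
    "private": [{"age": 66, "drug": "warfarin"}, {"age": 72, "drug": "dabigatran"}],
}
total, per_site = network_count(sites, min_age=65, drugs={"warfarin", "dabigatran"})
```

In practice small per-site counts can themselves be identifying, so a real deployment would likely add safeguards on top of this; none are shown here.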


Kim K et al. Development of a Privacy and Security Policy Framework for a Multi-State Comparative Effectiveness Research Network. Medical Care 2013

Kim K et al. Data Governance Requirements for Distributed Clinical Research Networks: Triangulating Perspectives of Diverse Stakeholders. JAMIA 2014

Meeker et al. A System to Build Distributed Multivariate Models and Manage Disparate Data Sharing Policies: Implementation in the Scalable National Network for Effectiveness Research. JAMIA (accepted)

Kim K et al. Comparison of Consumers' Views on Electronic Data Sharing for Healthcare and Research. JAMIA (accepted)

Scalable National Network for Comparative Effectiveness Research

Privacy-preserving distributed computing

Policy embedded in software

AHRQ R01HS19913 EDM forum



Page 16: Genomics and Informatics · 2015-04-20 · Genomics and Informatics: Technology Solutions to Compute with Sensitive Data April 21, 2015 Innovations in Healthcare Informatics NIH U54

User requests data for

Quality Improvement

or Research

bullIdentity amp Trust

Management

bullPolicy

enforcement

Trusted

Broker(s)

Security Entity

Kim K et al Development of a Privacy and Security Policy Framework for a Multi-State Comparative Effectiveness Research Network Medical Care 2013

Kim K et al Data Governance Requirements for Distributed Clinical Research Networks Triangulating Perspectives of Diverse Stakeholders JAMIA 2014

Meeker et al A System to Build Distributed Multivariate Models and Manage Disparate Data Sharing Policies Implementation in the Scalable National Network for Effectiveness Research JAMIA (accepted)

Kim K et al Comparison of Consumers Views on Electronic Data Sharing for Healthcare and Research JAMIA (accepted) Scalable National Network for

Comparative Effectiveness Research

Privacy-preserving distributed computing

Policy embedded in software

AHRQ R01HS19913 EDM forum

User requests data for

Quality Improvement

or Research

bullIdentity amp Trust

Management

bullPolicy

enforcement

Trusted

Broker(s)

Security Entity

Privacy-preserving distributed computing

Predictive modeling and adjustment for cofounders require lots of data

Some institutions cannot move data outside their firewalls we can bring computation to the data

Diverse Healthcare Entities in 3 different states (federal state private)

Wu Y et al Grid Binary LOgistic REgression (GLORE) Building Shared Models Without Sharing Data JAMIA 2012

Wang S et al EXpectation Propagation LOgistic REgRession (EXPLORER) Distributed Privacy-AHRQ R01HS19913 EDM forum Preserving Online Model Learning J Biomed Inf 2013

Scalable National Network for Jiang W et al WebGLORE A Webservice for Grid Logistic Regression Bioinformatics 2014 Comparative Effectiveness Research Wu Y et al Grid Multi-Category Response Logistic Models BMC Med Inform Dec Making 2015

4202015 Supported by the NIH Grant U54 HL108460 to the University of California San Diego 18

Distributed Model

Conclusion no patient data needs to be sent from the sites only aggregates

Patient Age Insurance

A1 45 X

A2 32 Y

Patient Age Insurance

B1 45 Y

B2 32 Y

Patient Age Insurance

A1 45 X

A2 32 Y

Horizontal and Vertical Partitions

Privacy-preserving SVM for vertically partitioned data

Que J Jiang X Ohno-Machado L A Collaborative Framework for Distributed Privacy-Preserving Support Vector Learning AMIA Annual Symp Proc 20121350-9

pSCANNER Clinical Data Research

Members 5 UCs National VA (VINCI resource) 3 Federally Qualified Health Systems in LA (USC) RAND

Cohorts Kawasaki Disease Congestive Heart Failure Obesity

TCC The Childrenrsquos Clinic VM Virtual Machine SFGH San Francisco General Hospital OMOP Observational Medical Outcomes Partnership

Funded by PCORI as part of PCORnet

International Collaboration

Slide from Dr Shuang Wang

Ohno-Machado L To Share or Not To Share That Is Not the Question Science Translational Medicine 2012 4(165)

Supported by the NIH Grant U54 HL108460 to the University of California San Diego 23

Models for Sharing Data

HIPAA and non-public data

public data tools recipes

Po

wer

ed b

y M

IDA

S

Data Tools Recipes

upload amp download data

compute request direct upload amp download of proprietary data tool recipe

middleware and HIPAA security developed by iDASH

Compute nodes Memory Disk storage Networking

Po

wer

ed b

yV

Mw

are AUTOMATED

4202015 Supported by the NIH Grant U54 HL108460 to the University of California San Diego 24

iDASH On-Demand Resources

On-demand Virtualized Elastic Resilient Compute And Storage Technology

Safe HIPAA-compliant Annotated Data deposit box Environment

3 computation tiers 3 storage tiers 10GbE throughout Full redundancy RSA Two Factor Auth Remote data replication

800+ cores 7TB+ RAM 600TB+ storage

More planned 600+ cores 4TB+ RAM 200TB+ storage

iDASH On-Demand Resources

4202015 Supported by the NIH Grant U54 HL108460 to the University of California San Diego 26

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterio us SNPs

Blueprint

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterious SNPs

Co

nte

xt

Reference DB

Test data

Configuration

Helper tools

OS

Blueprint

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterious SNPs

Co

nte

xt

Reference DB

Test data

Configuration

Helper tools

OS

Blueprint

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterious SNPs

Co

nte

xt

Reference DB

Test data

Configuration

Helper tools

OS

Blueprint

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterious SNPs

Co

nte

xt

Reference DB

Test data

Configuration

Helper tools

OS

iDASH On-Demand Resources

Bookshelf

Repeatable Results

4202015 Supported by the NIH Grant U54 HL108460 to the University of California San Diego 27

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterio us SNPs

Blueprint

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterious SNPs

Co

nte

xt

Reference DB

Test data

Configuration

Helper tools

OS

Blueprint

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterious SNPs

Co

nte

xt

Reference DB

Test data

Configuration

Helper tools

OS

Blueprint

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterious SNPs

Co

nte

xt

Reference DB

Test data

Configuration

Helper tools

OS

Blueprint

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterious SNPs

Co

nte

xt

Reference DB

Test data

Configuration

Helper tools

OS

iDASH On-Demand Resources

Bookshelf

MyDATA

Repeatable Results

homomorphic encryption

secure multiparty computation

Ohno-Machado L To Share or Not To Share That Is Not the Question Science Translational Medicine 2012 4(165)

Supported by the NIH Grant U54 HL108460 to the University of California San Diego 28

commons

iDSH15 Privacy Protection Challenge

bull Focus on secure outsourcing and secure data analysis in a distributed setting (humangenomeprivacyorg)

bull Task 1 Homomorphic encryption (HME) based secure genomic data analysis bull Task 2 Secure comparison between genomic data in

a distributed setting

differential privacy

Ohno-Machado L To Share or Not To Share That Is Not the Question Science Translational Medicine 2012 4(165)

Supported by the NIH Grant U54 HL108460 to the University of California San Diego 30

Models for Sharing Data

Name Age Educ ation

Hoursw eek

hellip H T N

Frank 42 6 40 Y

Bob 31 10 60 Y

Dave 43 9 40 N

lsquoDe-Identificationrsquo (microdata release)

bull HIPAA compliant methods raquo Safe harbor dataset

bull Removal of 18 safe harbor identifiers

raquo Limited dataset

bull Removal of direct identifiers

raquo Statistical methods

bull Removalgrouping of attributes

bull Risk reasonably low

bull Re-identification and disclosure risks

Courtesy of Li Xiong

Statistical Data Release (macrodata release)

Mohammed N Privacy Preserving Heterogeneous Health Data Sharing J Am Med Inform Assoc 2013

Jiang XL Differential-Private Data Publishing Through Component Analysis Transactions on Data Privacy 2013

Courtesy of Li Xiong

Statistical Data Release Disclosure Risk

Original records Original histogram Courtesy of Li Xiong

Courtesy of Li Xiong

Statistical Data Release Differential Privacy

Original records Original histogram Perturbed histogram with differential privacy

Differential Privacy (Dwork et al)

D Drsquo

bull D and D are neighboring databases if they differ on at most one record

A privacy mechanism A gives ε-differential privacy if for all neighbouring databases D Drsquo and for any possible output S isin Range(A)

Pr[A(D) = S] le exp(ε) times Pr[A(D) = S]

Courtesy of Li Xiong

Global Sensitivity

Differential Privacy: Introducing Noise

• Laplace Mechanism

For example, for a single count query Q over a dataset D, returning Q(D) + Laplace(1/ε) gives ε-differential privacy

Courtesy of Li Xiong
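The count-query example can be sketched in a few lines. This is an illustration under stated assumptions, not the implementation behind the slides: the function names are hypothetical, the three records are the ones from the microdata table shown earlier, and the Laplace noise is drawn as the difference of two exponential variates, which has a Laplace distribution with the given scale.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Return a noisy count of the records matching the predicate.

    A count changes by at most 1 when one record changes, so its global
    sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Records taken from the example table (Name, Age, HTN).
records = [
    {"name": "Frank", "age": 42, "htn": "Y"},
    {"name": "Bob", "age": 31, "htn": "Y"},
    {"name": "Dave", "age": 43, "htn": "N"},
]

rng = random.Random(2015)  # seeded only so the sketch is repeatable
noisy = private_count(records, lambda r: r["htn"] == "Y", epsilon=0.5, rng=rng)
print(f"noisy HTN count: {noisy:.2f}")
```

A smaller ε gives stronger privacy and noisier answers. With ε = 0.5 the noise scale is 2, which is large relative to a true count of 2, so a release this small would be of little use in practice.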

iDASH 2014 First Privacy Protection Challenge

Evaluate solutions that guarantee privacy protection for the output of genomic data analysis

• Task 1: Privacy-preserving SNP data sharing
• Task 2: Privacy-preserving release of the top K most significant SNPs

Statistical Data Release with Differential Privacy

Figure labels: Original Data; Data Privacy Interface; Which statistics to compute; How to apportion privacy budget; Released Statistics; Synthetic Records. Courtesy of Li Xiong

SHARE: Statistical & Synthetic Health Information Release

Statistical Data Release under Differential Privacy

Figure labels: Original Health Records; DP Interface; Released Statistics; Synthetic Records

Data challenges
- High dimensionality
- Self-correlation

Target applications
- Cross-sectional studies
- Longitudinal studies

• Low Dimensional Histograms (DPCube)
• High Dimensional Distributions (DPCopula)
• Sequential Data Release (DPTrie)

Courtesy of Li Xiong
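One simple way to think about apportioning a privacy budget is sequential composition: if several statistics are released from the same records, their ε values add up. The sketch below splits a total ε across releases in proportion to weights. The function names, the weights, and the release plan are hypothetical, each statistic is assumed to have sensitivity 1, and the slides do not say that DPCube, DPCopula, or DPTrie allocate their budgets this way.

```python
def apportion_budget(total_epsilon: float, weights: dict) -> dict:
    """Split a total epsilon across releases in proportion to their weights.

    Under sequential composition the per-release epsilons sum to the total,
    so the combined release satisfies total_epsilon-differential privacy.
    """
    if total_epsilon <= 0:
        raise ValueError("total_epsilon must be positive")
    if not weights or any(w <= 0 for w in weights.values()):
        raise ValueError("weights must be non-empty and positive")
    total_weight = sum(weights.values())
    return {name: total_epsilon * w / total_weight for name, w in weights.items()}


def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Noise scale of the Laplace mechanism for one release."""
    return sensitivity / epsilon


# Hypothetical plan: the age histogram matters most, so it gets half the budget.
plan = apportion_budget(
    1.0, {"age_histogram": 2, "htn_count": 1, "education_histogram": 1}
)
scales = {name: laplace_scale(1.0, eps) for name, eps in plan.items()}

for name in plan:
    print(f"{name}: epsilon={plan[name]:.2f}, Laplace scale={scales[name]:.1f}")
```

Statistics given a larger share of the budget receive less noise, which is the trade-off behind the question of which statistics to compute.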

differential privacy


Models for Sharing Data


Shifting control

Figure labels (consent management workflow):
• Trusted broker: User U requests Data D on individual I
• Data use agreements
• Study registry
• Consent Management System (Patient Interface): "Do I wish to disclose data D to U?" Yes
• Sharing Look-up, Patient I: "I can check that U looked at my data D"
• Healthcare Institutions

What to share? Who to share with?

Some preliminary findings from a limited set of interviews:
• Healthy volunteers do not want to share with commercially sponsored researchers
• Some want their medical information shared with only UCSD researchers, no others
• Many do not want to share at least 1 category of sensitive information
• Most common decline was genetic, followed by sexual & reproductive health

Courtesy of E Bell


iCONCUR: informed CONsent for Clinical data Use for Research

Figure labels: Consent Management System; Sharing Look-up Registry; Patient I; Electronic Health Record; Clinical Data Warehouse; Healthcare Institution; Query; User U; Results; Concierge or Automated Services; Connecting Software; Trusted Entity

Privacy & Technology Projects

Data and Biospecimen Sharing; Privacy Preserving Computation

• Privacy Preserving Analytics for KD (Kawasaki Disease) in African-Americans
• Consent for Data and Biosample Sharing in an Underserved Population
• Partnership for Epidemiological Research on Latinos

• Which DNA variants are implicated in KD susceptibility in this population? (Emory, Genome Institute of Singapore, Imperial College)
• Does consent rate depend on who is obtaining the consent? (Maricopa Health System)
• Do patients understand what they consented to? (San Diego State University)

Looking into the Near Future

Healthcare data and biospecimen collections

Several terabytes of data are being generated every day, and samples are being collected. Participants could decide what, when, and with whom to share.

• Genomes
• Images
• Body sensors
• Environmental sensors
• Electronic Health Records
• Personal Health Records
• Pharmacy
• Social media

UCSD's Journey

Figure: timeline (2008-2009, 2010-2011, 2012-2013, 2014-2015) covering Clinical Data, Research Data, Applications, and Integration. Labels on the slide include: Electronic Health Record System (Epic & Clarity); Epic CDDS; New modules; Other Systems (PACS, lab, etc.); Personnel Systems (Active Directory); Healthcare Clinical Data; Clinical Data Warehouse for Research; Query Tools (UC-ReX, Explorer, Privacy Technology); Clinical Research Data (REDCap, Velos, other DBs); iDASH HIPAA SHADE (images, human genomes, etc.); iDASH HIPAA/FISMA OVERCAST; Recruitment (consent tools, custom apps); Accrual for Clinical Trials; Scalable Network (Distributed Analytics Tools); External data (patient reported data); pSCANNER (PCORI CDRN); SCANNER; PhenDISCO; bioCADDIE; iCONCUR; Cancer Genomics; Text Mining, NLP; Analytical Tools; De-ID Tools; BRIGHT; Cloud; FISMA; HIPAA; PCORI contracts; K22, K99; NLM Training Grant; iDASH; CTRI; School of Medicine; partner sites VA, LA Clinics, UCSF, Davis, Irvine, UCLA, USC/LAC, Cedars Sinai, San Mateo.

Acknowledgements

Department of Biomedical Informatics at UCSD, funded by NIH (U54, UL1, U24, UH3, R21, U01, T15, R00, K22, K99, D43), AHRQ, UCBRAIDOP, PCORI, NVIDIA


Figure labels:
User requests data for Quality Improvement or Research
• Identity & Trust Management
• Policy enforcement
Trusted Broker(s)
Security Entity

Privacy-preserving distributed computing

Predictive modeling and adjustment for confounders require lots of data

Some institutions cannot move data outside their firewalls; we can bring computation to the data

Diverse Healthcare Entities in 3 different states (federal, state, private)

Scalable National Network for Comparative Effectiveness Research: AHRQ R01HS19913, EDM Forum

Wu Y et al. Grid Binary LOgistic REgression (GLORE): Building Shared Models Without Sharing Data. JAMIA 2012

Wang S et al. EXpectation Propagation LOgistic REgRession (EXPLORER): Distributed Privacy-Preserving Online Model Learning. J Biomed Inf 2013

Jiang W et al. WebGLORE: A Webservice for Grid Logistic Regression. Bioinformatics 2014

Wu Y et al. Grid Multi-Category Response Logistic Models. BMC Med Inform Dec Making 2015


Distributed Model

Conclusion: no patient data needs to be sent from the sites, only aggregates
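The "only aggregates" idea can be sketched for logistic regression. Each site computes the gradient and Hessian contributions of its own rows, and a coordinator sums them and takes a Newton-Raphson step. This is a simplified illustration in the spirit of grid logistic regression, not the GLORE code itself: the function names, the single-feature model, and the toy rows (age in decades, binary outcome) are assumptions.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def local_summaries(rows, beta):
    """Run inside one site: return aggregate gradient and Hessian terms.

    Individual rows never leave the site; only these sums are shared.
    """
    grad = [0.0, 0.0]
    hess = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in rows:
        xv = (1.0, x)  # intercept plus one feature
        p = sigmoid(beta[0] + beta[1] * x)
        w = p * (1.0 - p)
        for i in range(2):
            grad[i] += (y - p) * xv[i]
            for j in range(2):
                hess[i][j] += w * xv[i] * xv[j]
    return grad, hess


def newton_step(beta, summaries):
    """Run at the coordinator: add up site summaries and update beta."""
    g = [sum(s[0][i] for s in summaries) for i in range(2)]
    h = [[sum(s[1][i][j] for s in summaries) for j in range(2)] for i in range(2)]
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    d0 = (h[1][1] * g[0] - h[0][1] * g[1]) / det
    d1 = (h[0][0] * g[1] - h[1][0] * g[0]) / det
    return [beta[0] + d0, beta[1] + d1]


def fit(sites, iterations: int = 25):
    beta = [0.0, 0.0]
    for _ in range(iterations):
        beta = newton_step(beta, [local_summaries(rows, beta) for rows in sites])
    return beta


# Hypothetical rows held at two sites: (age in decades, outcome 0/1).
site_a = [(4.5, 1), (3.2, 0), (5.0, 1), (2.8, 0), (4.0, 0)]
site_b = [(4.5, 0), (3.2, 1), (6.1, 1), (3.9, 1), (2.5, 0)]

beta = fit([site_a, site_b])
print([round(b, 4) for b in beta])
```

Because the gradient and Hessian are sums over patients, adding up the per-site sums gives the same update as pooling all rows in one place, so the distributed fit matches the centralized one up to floating-point rounding.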

Horizontal and Vertical Partitions

Patient  Age  Insurance
A1       45   X
A2       32   Y

Patient  Age  Insurance
B1       45   Y
B2       32   Y

Patient  Age  Insurance
A1       45   X
A2       32   Y

Privacy-preserving SVM for vertically partitioned data

Que J, Jiang X, Ohno-Machado L. A Collaborative Framework for Distributed Privacy-Preserving Support Vector Learning. AMIA Annual Symp Proc 2012:1350-9
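The difference between the two kinds of partition can be shown with the patient rows from the tables above. In a horizontal partition each site holds all columns for some of the patients. In a vertical partition each site holds some of the columns for all patients, linked by a shared key. The function names, the site labels ("clinic", "payer"), and the rule of splitting by the first letter of the patient ID are assumptions made for this sketch.

```python
rows = [
    {"patient": "A1", "age": 45, "insurance": "X"},
    {"patient": "A2", "age": 32, "insurance": "Y"},
    {"patient": "B1", "age": 45, "insurance": "Y"},
    {"patient": "B2", "age": 32, "insurance": "Y"},
]


def horizontal_partition(rows, site_of):
    """Split by rows: every site keeps all columns for its own patients."""
    sites = {}
    for r in rows:
        sites.setdefault(site_of(r), []).append(dict(r))
    return sites


def vertical_partition(rows, key, column_groups):
    """Split by columns: every site keeps the key plus its own attributes."""
    return {
        site: [{key: r[key], **{c: r[c] for c in cols}} for r in rows]
        for site, cols in column_groups.items()
    }


def join_vertical(parts, key):
    """Rebuild full rows by matching the shared key across sites."""
    merged = {}
    for part in parts.values():
        for r in part:
            merged.setdefault(r[key], {}).update(r)
    return list(merged.values())


by_site = horizontal_partition(rows, lambda r: r["patient"][0])
by_column = vertical_partition(
    rows, "patient", {"clinic": ["age"], "payer": ["insurance"]}
)

print(by_site["A"])
print(by_column["payer"])
```

Horizontally partitioned data suits methods that add up per-site aggregates. Vertically partitioned data needs record linkage on the shared key, or a protocol that avoids the explicit join, which is the setting of the SVM work cited above.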

pSCANNER Clinical Data Research

Members: 5 UCs; National VA (VINCI resource); 3 Federally Qualified Health Systems in LA (USC); RAND

Cohorts: Kawasaki Disease, Congestive Heart Failure, Obesity

Abbreviations: TCC, The Children's Clinic; VM, Virtual Machine; SFGH, San Francisco General Hospital; OMOP, Observational Medical Outcomes Partnership

Funded by PCORI as part of PCORnet

International Collaboration

Slide from Dr Shuang Wang


Models for Sharing Data

Figure labels: HIPAA and non-public data; public data, tools, recipes; Powered by MIDAS; Data, Tools, Recipes; upload & download data; compute request; direct upload & download of proprietary data, tool, recipe; middleware and HIPAA security developed by iDASH; compute nodes, memory, disk storage, networking; Powered by VMware; AUTOMATED

iDASH On-Demand Resources

OVERCAST: On-demand Virtualized Elastic Resilient Compute And Storage Technology

SHADE: Safe HIPAA-compliant Annotated Data deposit box Environment

• 3 computation tiers
• 3 storage tiers
• 10GbE throughout
• Full redundancy
• RSA two-factor authentication
• Remote data replication

800+ cores, 7TB+ RAM, 600TB+ storage

More planned: 600+ cores, 4TB+ RAM, 200TB+ storage

iDASH On-Demand Resources

Figure: Bookshelf, MyDATA, and Repeatable Results. The slides show repeated Blueprint panels, each combining a Workflow (short reads, index reference, align to reference, call variants, annotate variants, pick high impact, deleterious SNPs) with a Context (reference DB, test data, configuration, helper tools, OS).

homomorphic encryption

secure multiparty computation



Page 19: Genomics and Informatics · 2015-04-20 · Genomics and Informatics: Technology Solutions to Compute with Sensitive Data April 21, 2015 Innovations in Healthcare Informatics NIH U54

Patient Age Insurance

A1 45 X

A2 32 Y

Patient Age Insurance

B1 45 Y

B2 32 Y

Patient Age Insurance

A1 45 X

A2 32 Y

Horizontal and Vertical Partitions

Privacy-preserving SVM for vertically partitioned data

Que J Jiang X Ohno-Machado L A Collaborative Framework for Distributed Privacy-Preserving Support Vector Learning AMIA Annual Symp Proc 20121350-9

pSCANNER Clinical Data Research

Members 5 UCs National VA (VINCI resource) 3 Federally Qualified Health Systems in LA (USC) RAND

Cohorts Kawasaki Disease Congestive Heart Failure Obesity

TCC The Childrenrsquos Clinic VM Virtual Machine SFGH San Francisco General Hospital OMOP Observational Medical Outcomes Partnership

Funded by PCORI as part of PCORnet

International Collaboration

Slide from Dr Shuang Wang

Ohno-Machado L To Share or Not To Share That Is Not the Question Science Translational Medicine 2012 4(165)

Supported by the NIH Grant U54 HL108460 to the University of California San Diego 23

Models for Sharing Data

HIPAA and non-public data

public data tools recipes

Po

wer

ed b

y M

IDA

S

Data Tools Recipes

upload amp download data

compute request direct upload amp download of proprietary data tool recipe

middleware and HIPAA security developed by iDASH

Compute nodes Memory Disk storage Networking

Po

wer

ed b

yV

Mw

are AUTOMATED

4202015 Supported by the NIH Grant U54 HL108460 to the University of California San Diego 24

iDASH On-Demand Resources

On-demand Virtualized Elastic Resilient Compute And Storage Technology

Safe HIPAA-compliant Annotated Data deposit box Environment

3 computation tiers 3 storage tiers 10GbE throughout Full redundancy RSA Two Factor Auth Remote data replication

800+ cores 7TB+ RAM 600TB+ storage

More planned 600+ cores 4TB+ RAM 200TB+ storage

iDASH On-Demand Resources

4202015 Supported by the NIH Grant U54 HL108460 to the University of California San Diego 26

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterio us SNPs

Blueprint

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterious SNPs

Co

nte

xt

Reference DB

Test data

Configuration

Helper tools

OS

Blueprint

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterious SNPs

Co

nte

xt

Reference DB

Test data

Configuration

Helper tools

OS

Blueprint

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterious SNPs

Co

nte

xt

Reference DB

Test data

Configuration

Helper tools

OS

Blueprint

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterious SNPs

Co

nte

xt

Reference DB

Test data

Configuration

Helper tools

OS


iDASH On-Demand Resources

Bookshelf

MyDATA

Repeatable Results

homomorphic encryption

secure multiparty computation

Ohno-Machado L. To Share or Not To Share: That Is Not the Question. Science Translational Medicine 2012; 4(165)

commons

iDASH'15 Privacy Protection Challenge

• Focus on secure outsourcing and secure data analysis in a distributed setting (humangenomeprivacy.org)
• Task 1: Homomorphic encryption (HME) based secure genomic data analysis
• Task 2: Secure comparison between genomic data in a distributed setting

differential privacy


Models for Sharing Data

Name    Age   Education   Hours/week   …   HTN
Frank   42    6           40           …   Y
Bob     31    10          60           …   Y
Dave    43    9           40           …   N

‘De-Identification’ (microdata release)

• HIPAA-compliant methods
  » Safe harbor dataset
    • Removal of 18 safe harbor identifiers
  » Limited dataset
    • Removal of direct identifiers
  » Statistical methods
    • Removal/grouping of attributes
    • Risk reasonably low
• Re-identification and disclosure risks

Courtesy of Li Xiong
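The Safe Harbor idea on this slide can be illustrated with a minimal sketch that drops identifier fields from the example table. The column names and the short identifier list are assumptions made for illustration; a real Safe Harbor pass has to cover all 18 identifier types and is not reduced to this function. The one rule encoded beyond column removal, grouping ages over 89 into a single "90+" category, is part of the Safe Harbor method.

```python
# Hypothetical column names; this is NOT the full list of 18 Safe Harbor identifiers.
DIRECT_IDENTIFIER_COLUMNS = {"name", "phone", "email", "ssn", "mrn"}


def deidentify(record):
    """Return a copy of `record` without the listed identifier columns."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIER_COLUMNS}
    # Safe Harbor groups all ages over 89 into one category.
    if isinstance(cleaned.get("age"), int) and cleaned["age"] > 89:
        cleaned["age"] = "90+"
    return cleaned


# Rows taken from the example table; "htn" is an assumed name for the last column.
records = [
    {"name": "Frank", "age": 42, "education": 6, "hours_per_week": 40, "htn": "Y"},
    {"name": "Bob", "age": 31, "education": 10, "hours_per_week": 60, "htn": "Y"},
    {"name": "Dave", "age": 43, "education": 9, "hours_per_week": 40, "htn": "N"},
]

released = [deidentify(r) for r in records]
```

Removing direct identifiers does not remove the re-identification risk named in the last bullet: the remaining attributes can still act as quasi-identifiers when combined.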

Statistical Data Release (macrodata release)

Mohammed N, et al. Privacy Preserving Heterogeneous Health Data Sharing. J Am Med Inform Assoc 2013

Jiang XL, et al. Differential-Private Data Publishing Through Component Analysis. Transactions on Data Privacy 2013

Courtesy of Li Xiong

Statistical Data Release: Disclosure Risk

[Figure: original records and original histogram] Courtesy of Li Xiong

Statistical Data Release: Differential Privacy

[Figure: original records, original histogram, and perturbed histogram with differential privacy] Courtesy of Li Xiong

Differential Privacy (Dwork et al.)

• D and D′ are neighboring databases if they differ on at most one record

A privacy mechanism A gives ε-differential privacy if, for all neighbouring databases D, D′ and for any possible output S ∈ Range(A):

Pr[A(D) = S] ≤ exp(ε) × Pr[A(D′) = S]

Courtesy of Li Xiong

Global Sensitivity
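For reference, the standard definition of global sensitivity, written with the neighboring databases D and D′ from the definition of differential privacy. The symbols f (a numeric query) and Δf are introduced here and are not taken from the slide.

```latex
\Delta f \;=\; \max_{D,\,D' \text{ neighboring}} \bigl\lVert f(D) - f(D') \bigr\rVert_1
```

A count query changes by at most 1 when a single record is added or removed, so its global sensitivity is Δf = 1.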

Differential Privacy: Introducing Noise

• Laplace Mechanism

For example, for a single count query Q over a dataset D, returning Q(D) + Laplace(1/ε) gives ε-differential privacy

Courtesy of Li Xiong
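The count-query example can be written as a minimal sketch using only the Python standard library. The function names, the record layout, and the "htn" field are assumptions for illustration. Laplace noise is drawn as the difference of two exponential variates, which follows a Laplace distribution with the same scale.

```python
import random


def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return Q(D) + Laplace(1/epsilon) for the count query Q defined by `predicate`."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    # A count has global sensitivity 1, so the noise scale is 1/epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical records; only the field used by the query is shown.
patients = [{"htn": "Y"}, {"htn": "Y"}, {"htn": "N"}]
noisy = private_count(patients, lambda p: p["htn"] == "Y", epsilon=0.5)
print(f"noisy count of patients with htn == 'Y': {noisy:.2f}")
```

A smaller ε gives a larger noise scale and stronger privacy. This sketch ignores floating-point subtleties of noise sampling that a hardened implementation would need to address.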

iDASH 2014 First Privacy Protection Challenge

Evaluate solutions that offer guaranteed privacy protection for the output of genomic data analysis

• Task 1: Privacy-preserving SNP data sharing
• Task 2: Privacy-preserving release of top K most significant SNPs

Statistical Data Release with Differential Privacy

[Figure: Original Data → Released Statistics, Synthetic Records. Design questions: which statistics to compute; how to apportion the privacy budget]
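The two design questions on this slide, which statistics to compute and how to apportion the privacy budget, can be illustrated with a minimal sketch. It assumes an equal split of ε across statistics (justified by sequential composition), neighboring databases that differ by adding or removing one record (so a histogram over disjoint bins has sensitivity 1), and bins fixed in advance rather than derived from the data. All function names are hypothetical.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def split_budget(total_epsilon, num_statistics):
    """Equal split; under sequential composition the shares add up to total_epsilon."""
    return [total_epsilon / num_statistics] * num_statistics


def private_histogram(values, bins, epsilon, rng):
    """Add Laplace(1/epsilon) noise to each bin count; `bins` must not depend on the data."""
    counts = Counter(values)
    return {b: counts.get(b, 0) + laplace_noise(1.0 / epsilon, rng) for b in bins}


rng = random.Random()
eps_age, eps_htn = split_budget(1.0, 2)

# Values taken from the example table; the age-group bins are an assumption.
age_groups = ["40-49", "30-39", "40-49"]
htn = ["Y", "Y", "N"]

released = {
    "age_group": private_histogram(age_groups, ["30-39", "40-49", "50-59"], eps_age, rng),
    "htn": private_histogram(htn, ["Y", "N"], eps_htn, rng),
}
```

An equal split is only the simplest choice. Giving more of the budget to the statistics that matter most for the target analysis is the kind of decision the slide points to.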

Statistical Data Release under Differential Privacy

SHARE: Statistical & Synthetic Health Information Release

[Figure: Original Health Records → DP Interface (Data Privacy Interface) → Released Statistics, Synthetic Records]

Data challenges
- High dimensionality
- Self-correlation

Target applications
- Cross-sectional studies
- Longitudinal studies

Low Dimensional Histograms (DPCube)

High Dimensional Distributions (DPCopula)

Sequential Data Release (DPTrie)

Courtesy of Li Xiong


Shifting control: What to share? Who to share with?

[Figure: consent management through a trusted broker, between the patient interface and healthcare institutions]

• User U requests data D on individual I
• Consent Management System asks the patient: Do I wish to disclose data D to U? (Yes)
• Sharing look-up for Patient I: I can check that U looked at my data D
• Trusted broker: data use agreements, study registry
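The request, consent check, and look-up steps in this figure can be sketched as a small in-memory broker. This is an illustration under stated assumptions, not the design of any system named in the slides: the class, method names, data categories, and requester groups are all hypothetical, and the default is to deny when no preference is recorded.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ConsentBroker:
    # (patient, data category) -> set of requester groups the patient allows
    preferences: dict = field(default_factory=dict)
    # every request is logged, granted or not, so the patient can look it up later
    access_log: list = field(default_factory=list)

    def set_preference(self, patient, category, allowed_groups):
        """Record what the patient shares (category) and with whom (groups)."""
        self.preferences[(patient, category)] = set(allowed_groups)

    def request(self, user, user_group, patient, category):
        """User U requests data D on individual I; deny unless consent is recorded."""
        granted = user_group in self.preferences.get((patient, category), set())
        self.access_log.append({
            "time": datetime.now(timezone.utc),
            "user": user,
            "patient": patient,
            "category": category,
            "granted": granted,
        })
        return granted

    def lookup(self, patient):
        """Sharing look-up: which users asked for this patient's data."""
        return [entry for entry in self.access_log if entry["patient"] == patient]


broker = ConsentBroker()
broker.set_preference("patient-I", "labs", {"university researchers"})
ok = broker.request("user-U", "university researchers", "patient-I", "labs")
denied = broker.request("user-V", "commercial sponsors", "patient-I", "genetic")
```

Keying preferences by data category and requester group mirrors the two questions in the slide title, what to share and who to share with.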

Some preliminary findings from a limited set of interviews:

• Healthy volunteers do not want to share with commercially sponsored researchers
• Some want their medical information shared with only UCSD researchers, no others
• Many do not want to share at least 1 category of sensitive information
• Most common decline was genetic, followed by sexual & reproductive health

Courtesy of E Bell

iCONCUR: informed CONsent for Clinical data Use for Research

[Figure components: Consent Management System; Sharing Look-up Registry; Patient I; Healthcare Institution (Electronic Health Record, Clinical Data Warehouse); User U (Query, Results); Concierge or Automated Services; Connecting Software; Trusted Entity]

Privacy & Technology Projects

Data and Biospecimen Sharing; Privacy Preserving Computation

Projects:
• Privacy Preserving Analytics for KD in African-Americans
• Consent for Data and Biosample Sharing in an Underserved Population
• Partnership for Epidemiological Research on Latinos

Questions and partner institutions:
• Which DNA variants are implicated in KD susceptibility in this population? (Emory, Genome Institute of Singapore, Imperial College)
• Does consent rate depend on who is obtaining the consent? (Maricopa Health System)
• Do patients understand what they consented for? (San Diego State University)

Looking into the Near Future

Healthcare data and Biospecimen collections

Several terabytes of data are being generated every day, and samples are being collected. Participants could decide what/when/with whom to share.

• Genomes
• Images
• Body sensors
• Environmental sensors
• Electronic Health Records
• Personal Health Records
• Pharmacy
• Social media

UCSD’s Journey

[Figure: timeline spanning 2008-2009, 2010-2011, 2012-2013, and 2014-2015, with tracks for Clinical Data, Research Data, Applications, and Integration. Labels recovered from the figure follow.]

Healthcare Clinical Data; Clinical Data Warehouse for Research
Electronic Health Record System: Epic & Clarity
Other Systems: PACS, lab, etc.
Personnel Systems: Active Directory
Query Tools: UC-ReX, Explorer, Privacy Technology
Clinical Research Data: RedCAP, Velos, other DBs
iDASH HIPAA SHADE: images, human genomes, etc.
Recruitment: consent tools, custom apps
Scalable Network (Distributed Analytics Tools)
External data (patient reported data)
pSCANNER: PCORI CDRN
iDASH HIPAA/FISMA OVERCAST; iDASH; CTRI; School of Medicine
Sites: VA, LA Clinics, UCSF, Davis, Irvine, UCLA, USC/LAC, Cedars Sinai, San Mateo
Other labels: Epic CDDS, New modules, Accrual for Clinical Trials, FISMA, HIPAA, pSCANNER, bioCADDIE, iCONCUR, Cancer Genomics, Text Mining, NLP, Analytical Tools, De-ID Tools, BRIGHT, Cloud, PCORI contracts, K22, K99, NLM Training Grant, UC-ReX, iDASH, SCANNER, PhenDISCO

Acknowledgements

Department of Biomedical Informatics at UCSD, funded by NIH (U54, UL1, U24, UH3, R21, U01, T15, R00, K22, K99, D43), AHRQ, UCBRAIDOP, PCORI, NVIDIA


Privacy-preserving SVM for vertically partitioned data

Que J, Jiang X, Ohno-Machado L. A Collaborative Framework for Distributed Privacy-Preserving Support Vector Learning. AMIA Annual Symp Proc 2012:1350-9

pSCANNER Clinical Data Research

Members: 5 UCs; National VA (VINCI resource); 3 Federally Qualified Health Systems in LA (USC); RAND

Cohorts: Kawasaki Disease, Congestive Heart Failure, Obesity

Abbreviations: TCC, The Children’s Clinic; VM, Virtual Machine; SFGH, San Francisco General Hospital; OMOP, Observational Medical Outcomes Partnership

Funded by PCORI as part of PCORnet

International Collaboration

Slide from Dr Shuang Wang


Models for Sharing Data

[Figure: layers of the sharing platform]

• HIPAA and non-public data; public data, tools, recipes (Powered by MIDAS)
• Data, Tools, Recipes: upload & download data; compute request; direct upload & download of proprietary data, tool, recipe
• Middleware and HIPAA security developed by iDASH
• Compute nodes, memory, disk storage, networking (Powered by VMware); AUTOMATED



Page 22: Genomics and Informatics · 2015-04-20 · Genomics and Informatics: Technology Solutions to Compute with Sensitive Data April 21, 2015 Innovations in Healthcare Informatics NIH U54

International Collaboration

Slide from Dr Shuang Wang

Ohno-Machado L To Share or Not To Share That Is Not the Question Science Translational Medicine 2012 4(165)

Supported by the NIH Grant U54 HL108460 to the University of California San Diego 23

Models for Sharing Data

HIPAA and non-public data

public data tools recipes

Po

wer

ed b

y M

IDA

S

Data Tools Recipes

upload amp download data

compute request direct upload amp download of proprietary data tool recipe

middleware and HIPAA security developed by iDASH

Compute nodes Memory Disk storage Networking

Po

wer

ed b

yV

Mw

are AUTOMATED

4202015 Supported by the NIH Grant U54 HL108460 to the University of California San Diego 24

iDASH On-Demand Resources

On-demand Virtualized Elastic Resilient Compute And Storage Technology

Safe HIPAA-compliant Annotated Data deposit box Environment

3 computation tiers 3 storage tiers 10GbE throughout Full redundancy RSA Two Factor Auth Remote data replication

800+ cores 7TB+ RAM 600TB+ storage

More planned 600+ cores 4TB+ RAM 200TB+ storage

iDASH On-Demand Resources

4202015 Supported by the NIH Grant U54 HL108460 to the University of California San Diego 26

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterio us SNPs

Blueprint

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterious SNPs

Co

nte

xt

Reference DB

Test data

Configuration

Helper tools

OS

Blueprint

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterious SNPs

Co

nte

xt

Reference DB

Test data

Configuration

Helper tools

OS

Blueprint

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterious SNPs

Co

nte

xt

Reference DB

Test data

Configuration

Helper tools

OS

Blueprint

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterious SNPs

Co

nte

xt

Reference DB

Test data

Configuration

Helper tools

OS

iDASH On-Demand Resources

Bookshelf

Repeatable Results

[Figure: iDASH On-Demand Resources. The Bookshelf of Blueprints (each a Workflow plus its Context) alongside MyDATA, for Repeatable Results.]

homomorphic encryption

secure multiparty computation

Ohno-Machado L. To Share or Not To Share: That Is Not the Question. Science Translational Medicine 2012; 4(165).

commons

iDSH15 Privacy Protection Challenge

• Focus on secure outsourcing and secure data analysis in a distributed setting (humangenomeprivacy.org)
• Task 1: Homomorphic encryption (HME) based secure genomic data analysis
• Task 2: Secure comparison between genomic data in a distributed setting
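Secure analysis in a distributed setting is commonly built from secret sharing. As a minimal sketch, the code below uses additive secret sharing, a standard building block of secure multiparty computation, to total a count held across three sites without any site revealing its own count. This is not the challenge's protocol: the modulus, the site counts, and the function names are all assumptions made for illustration.

```python
import secrets
from typing import List

MODULUS = 2**61 - 1  # assumed public modulus, far larger than any realistic total


def share(value: int, n_parties: int) -> List[int]:
    """Split value into n_parties shares that sum to value modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares: List[int]) -> int:
    """Recover the shared value (or a sum of shared values) from all shares."""
    return sum(shares) % MODULUS


# Hypothetical per-site counts of a variant allele.
site_counts = [12, 7, 30]
n = len(site_counts)

# Each site splits its count; share j goes to party j.
all_shares = [share(count, n) for count in site_counts]

# Party j adds the shares it received. Each share on its own is uniformly random.
partial_sums = [sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)]

total = reconstruct(partial_sums)
print(total)  # 49
```

Only the combined total is reconstructed; secure comparison, as in Task 2, needs more machinery than this sum.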

differential privacy


Models for Sharing Data

Name    Age   Education   Hours/week   …   HTN
Frank   42    6           40               Y
Bob     31    10          60               Y
Dave    43    9           40               N

'De-Identification' (microdata release)

• HIPAA compliant methods
  » Safe harbor dataset
    • Removal of 18 safe harbor identifiers
  » Limited dataset
    • Removal of direct identifiers
  » Statistical methods
    • Removal/grouping of attributes
    • Risk reasonably low
• Re-identification and disclosure risks
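To make "removal" and "grouping" concrete, the sketch below drops the direct identifier (Name) from the example records and groups Age into ten-year bands. The band width, the one-item identifier list, and the function names are assumptions for illustration. This step alone does not make a dataset HIPAA compliant, and the re-identification risk noted above still applies.

```python
from typing import Dict, List

# Example records from the slide's table.
records: List[Dict[str, object]] = [
    {"Name": "Frank", "Age": 42, "Education": 6, "Hours/week": 40, "HTN": "Y"},
    {"Name": "Bob", "Age": 31, "Education": 10, "Hours/week": 60, "HTN": "Y"},
    {"Name": "Dave", "Age": 43, "Education": 9, "Hours/week": 40, "HTN": "N"},
]

DIRECT_IDENTIFIERS = {"Name"}  # assumed; a real identifier list is much longer
AGE_BAND = 10                  # assumed band width in years


def generalize_age(age: int, band: int = AGE_BAND) -> str:
    """Group an exact age into a band such as '40-49'."""
    low = (age // band) * band
    return f"{low}-{low + band - 1}"


def deidentify(record: Dict[str, object]) -> Dict[str, object]:
    """Remove direct identifiers and replace the exact age with its band."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    out["Age"] = generalize_age(int(out["Age"]))
    return out


released = [deidentify(r) for r in records]
for row in released:
    print(row)
```

Frank and Dave now share the band 40-49 but still differ on Education, which shows why grouping one attribute may not be enough.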

Courtesy of Li Xiong

Statistical Data Release (macrodata release)

Mohammed N. Privacy Preserving Heterogeneous Health Data Sharing. J Am Med Inform Assoc 2013.

Jiang XL. Differential-Private Data Publishing Through Component Analysis. Transactions on Data Privacy 2013.

Courtesy of Li Xiong

Statistical Data Release: Disclosure Risk

[Figure: original records and original histogram.] Courtesy of Li Xiong

Statistical Data Release: Differential Privacy

[Figure: original records, original histogram, and perturbed histogram with differential privacy.]

Differential Privacy (Dwork et al)

D, D′

• D and D′ are neighboring databases if they differ on at most one record

A privacy mechanism A gives ε-differential privacy if, for all neighbouring databases D, D′ and for any possible output S ∈ Range(A),

\[ \Pr[A(D) = S] \le \exp(\varepsilon) \times \Pr[A(D') = S] \]
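When the output space is finite, the inequality above can be checked directly. The sketch below does so for randomized response on a one-record, one-bit database. Randomized response is a different and simpler mechanism than the ones on these slides, chosen only because its output distribution is easy to write down. The function names and the value of ε are assumptions.

```python
import math
from typing import Dict


def randomized_response(bit: int, epsilon: float) -> Dict[int, float]:
    """Output distribution of randomized response on a single bit.

    The true bit is reported with probability e^eps / (1 + e^eps),
    and the flipped bit otherwise.
    """
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return {bit: p_truth, 1 - bit: 1.0 - p_truth}


def satisfies_dp(p: Dict[int, float], q: Dict[int, float], epsilon: float) -> bool:
    """Check Pr[A(D)=S] <= exp(eps) * Pr[A(D')=S] for every output S, both ways."""
    bound = math.exp(epsilon) * (1.0 + 1e-12)  # small slack for float rounding
    outputs = set(p) | set(q)
    return all(
        p.get(s, 0.0) <= bound * q.get(s, 0.0)
        and q.get(s, 0.0) <= bound * p.get(s, 0.0)
        for s in outputs
    )


eps = 0.5
dist_d = randomized_response(0, eps)        # database D: the single record is 0
dist_d_prime = randomized_response(1, eps)  # neighbouring D': the record is 1

print(satisfies_dp(dist_d, dist_d_prime, eps))        # True
print(satisfies_dp(dist_d, dist_d_prime, eps / 2.0))  # False
```

The second check fails because the probability ratio for this mechanism is exactly exp(ε), so no smaller ε bounds it.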

Courtesy of Li Xiong

Global Sensitivity

Differential Privacy: Introducing Noise

• Laplace Mechanism

For example, for a single count query Q over a dataset D, returning Q(D) + Laplace(1/ε) gives ε-differential privacy.
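The count-query example can be sketched in a few lines of Python. The code below draws Laplace noise with scale 1/ε as the difference of two exponential draws and adds it to the true count. The function names, the sample records, and the seed are assumptions for illustration. A seeded `random.Random` is used only to make the demo repeatable and is not suitable for a real release.

```python
import random
from typing import Callable, Iterable


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(records: Iterable[dict], predicate: Callable[[dict], bool],
                  epsilon: float, rng: random.Random) -> float:
    """Return Q(D) + Laplace(1/epsilon) for the count query Q given by predicate."""
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(2015)  # seeded for a repeatable demo only
patients = [{"age": 42}, {"age": 31}, {"age": 43}]  # hypothetical records
noisy = private_count(patients, lambda r: r["age"] > 40, epsilon=1.0, rng=rng)
print(f"true count = 2, released count = {noisy:.2f}")
```

A smaller ε means a larger noise scale, so the released count is more private and less accurate.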

Courtesy of Li Xiong

iDASH 2014 First Privacy Protection Challenge

Evaluate solutions that offer guaranteed privacy protection for the output of genomic data analysis

• Task 1: Privacy-preserving SNP data sharing
• Task 2: Privacy-preserving release of the top K most significant SNPs

[Figure: Original Data → Released Statistics, Synthetic Records. Open questions: which statistics to compute, and how to apportion the privacy budget.]
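One simple answer to the budget question is an even split. By sequential composition, answering k count queries with ε/k each stays within a total budget of ε. The sketch below applies that split with Laplace noise. The even allocation, the record fields, and the function names are assumptions, and uneven allocations that favour the most useful statistics are equally valid.

```python
import random
from typing import Callable, Dict, List


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def release_counts(records: List[dict],
                   queries: Dict[str, Callable[[dict], bool]],
                   total_epsilon: float,
                   rng: random.Random) -> Dict[str, float]:
    """Answer every count query, giving each an equal share of the budget."""
    per_query_epsilon = total_epsilon / len(queries)
    released = {}
    for name, predicate in queries.items():
        true_count = sum(1 for record in records if predicate(record))
        released[name] = true_count + laplace_noise(1.0 / per_query_epsilon, rng)
    return released


# Hypothetical case/control records with a carrier flag for one SNP.
cohort = [
    {"case": True, "carrier": True},
    {"case": True, "carrier": False},
    {"case": False, "carrier": True},
    {"case": False, "carrier": False},
]
queries = {
    "cases": lambda r: r["case"],
    "carriers": lambda r: r["carrier"],
    "case_carriers": lambda r: r["case"] and r["carrier"],
}

rng = random.Random(2014)  # seeded for a repeatable demo only
print(release_counts(cohort, queries, total_epsilon=1.0, rng=rng))
```

With three queries each answer gets ε/3, so every added statistic makes all of them noisier. That trade-off is what "which statistics to compute" refers to.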

Statistical Data Release with Differential Privacy

Statistical Data Release under Differential Privacy

Data Privacy Interface

[Figure: Original Health Records → DP Interface → Released Statistics, Synthetic Records.]

Data challenges
- High dimensionality
- Self-correlation

Target applications
- Cross-sectional studies
- Longitudinal studies

• Low Dimensional Histograms (DPCube)
• High Dimensional Distributions (DPCopula)
• Sequential Data Release (DPTrie)

SHARE: Statistical & Synthetic Health Information Release

Courtesy of Li Xiong

differential privacy


Models for Sharing Data


Shifting control

[Figure: consent management flow. Labels recovered from the figure:]
• User U requests data D on individual I
• Trusted broker: data use agreements, study registry
• Consent Management System, with Patient Interface: "Do I wish to disclose data D to U?" Yes
• Sharing Look-up, Patient I: "I can check that U looked at my data D"
• Healthcare Institutions
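The flow in the figure (user U requests data D on individual I, the patient decides, and the patient can later look up who asked) can be sketched as a small registry. The class, its method names, and the data categories below are hypothetical and are not the design of any system named in the talk.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class ConsentRegistry:
    """Hypothetical consent store with a look-up log."""
    # (patient, data category) -> users the patient agreed to share with
    grants: Dict[Tuple[str, str], Set[str]] = field(default_factory=dict)
    # every request is logged as (user, patient, category, allowed)
    access_log: List[Tuple[str, str, str, bool]] = field(default_factory=list)

    def grant(self, patient: str, category: str, user: str) -> None:
        """Patient says yes to disclosing one data category to one user."""
        self.grants.setdefault((patient, category), set()).add(user)

    def request(self, user: str, patient: str, category: str) -> bool:
        """User asks for data; the decision is recorded whether or not it is allowed."""
        allowed = user in self.grants.get((patient, category), set())
        self.access_log.append((user, patient, category, allowed))
        return allowed

    def lookup(self, patient: str) -> List[Tuple[str, str, str, bool]]:
        """Let a patient check who requested which of their data."""
        return [entry for entry in self.access_log if entry[1] == patient]


registry = ConsentRegistry()
registry.grant("I", "labs", "U")

print(registry.request("U", "I", "labs"))     # True: consent was given
print(registry.request("U", "I", "genetic"))  # False: no consent for this category
print(registry.lookup("I"))
```

Keying consent by data category lets a patient share some categories while withholding others.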

What to share? Who to share with?

Some preliminary findings from a limited set of interviews:
• Healthy volunteers do not want to share with commercially sponsored researchers
• Some want their medical information shared with UCSD researchers only
• Many do not want to share at least 1 category of sensitive information
• The most commonly declined category was genetic information, followed by sexual & reproductive health

Courtesy of E Bell

iCONCUR: informed CONsent for Clinical data Use for Research

[Figure: iCONCUR architecture. Labels recovered from the figure: Consent Management System; Sharing Look-up Registry; Patient I; Trusted Entity; Healthcare Institution (Electronic Health Record, Clinical Data Warehouse); Concierge or Automated Services; Connecting Software; User U (Query, Results).]

Privacy Preserving Analytics for KD in African-Americans

Consent for Data and Biosample Sharing in an Underserved Population

Partnership for Epidemiological Research on Latinos

Data and Biospecimen Sharing; Privacy Preserving Computation

• Which DNA variants are implicated in KD susceptibility in this population? (Emory, Genome Institute of Singapore, Imperial College)
• Does consent rate depend on who is obtaining the consent? (Maricopa Health System)
• Do patients understand what they consented to? (San Diego State University)

Privacy & Technology Projects

Looking into the Near Future

Healthcare data and Biospecimen collections

Several terabytes of data are being generated every day, and samples are being collected. Participants could decide what, when, and with whom to share.

• Genomes
• Images
• Body sensors
• Environmental sensors
• Electronic Health Records
• Personal Health Records
• Pharmacy
• Social media

UCSD's Journey

[Figure: timeline 2008-2009, 2010-2011, 2012-2013, 2014-2015, with the labels Clinical Data, Research Data, Applications, Integration. Other labels recovered from the figure:]

• Electronic Health Record System: Epic & Clarity; Epic CDDS, new modules
• Other Systems: PACS, lab, etc.
• Personnel Systems: Active Directory
• Query Tools: UC-ReX, Explorer, Privacy Technology
• Clinical Research Data: REDCap, Velos, other DBs
• iDASH: HIPAA, SHADE, images, human genomes, etc.; HIPAA/FISMA, OVERCAST
• Recruitment: consent tools, custom apps; Accrual for Clinical Trials
• Healthcare Clinical Data; Clinical Data Warehouse for Research
• Scalable Network (Distributed Analytics Tools); External data (patient reported data)
• VA, LA Clinics, UCSF, Davis, Irvine, UCLA, USC/LAC, Cedars Sinai, San Mateo
• pSCANNER (PCORI CDRN), SCANNER, bioCADDIE, PhenDISCO, BRIGHT, iCONCUR, UC-ReX, CTRI, School of Medicine
• Cancer Genomics, Text Mining (NLP), Analytical Tools, De-ID Tools, Cloud, FISMA
• PCORI contracts, K22, K99, NLM Training Grant

Acknowledgements

Department of Biomedical Informatics at UCSD, funded by NIH (U54, UL1, U24, UH3, R21, U01, T15, R00, K22, K99, D43), AHRQ, UCBRAIDOP, PCORI, NVIDIA


Models for Sharing Data

[Figure: iDASH data, tools, and recipes. Labels recovered from the figure:]
• HIPAA and non-public data
• Public data, tools, recipes
• Powered by MIDAS
• Data, Tools, Recipes: upload & download data
• Compute request; direct upload & download of proprietary data, tool, recipe
• Middleware and HIPAA security developed by iDASH
• Compute nodes, Memory, Disk storage, Networking
• Powered by VMware
• AUTOMATED



Page 25: Genomics and Informatics · 2015-04-20 · Genomics and Informatics: Technology Solutions to Compute with Sensitive Data April 21, 2015 Innovations in Healthcare Informatics NIH U54

3 computation tiers 3 storage tiers 10GbE throughout Full redundancy RSA Two Factor Auth Remote data replication

800+ cores 7TB+ RAM 600TB+ storage

More planned 600+ cores 4TB+ RAM 200TB+ storage

iDASH On-Demand Resources

4202015 Supported by the NIH Grant U54 HL108460 to the University of California San Diego 26

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterio us SNPs

Blueprint

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterious SNPs

Co

nte

xt

Reference DB

Test data

Configuration

Helper tools

OS

Blueprint

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterious SNPs

Co

nte

xt

Reference DB

Test data

Configuration

Helper tools

OS

Blueprint

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterious SNPs

Co

nte

xt

Reference DB

Test data

Configuration

Helper tools

OS

Blueprint

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterious SNPs

Co

nte

xt

Reference DB

Test data

Configuration

Helper tools

OS

iDASH On-Demand Resources

Bookshelf

Repeatable Results

4202015 Supported by the NIH Grant U54 HL108460 to the University of California San Diego 27

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterio us SNPs

Blueprint

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterious SNPs

Co

nte

xt

Reference DB

Test data

Configuration

Helper tools

OS

Blueprint

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterious SNPs

Co

nte

xt

Reference DB

Test data

Configuration

Helper tools

OS

Blueprint

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterious SNPs

Co

nte

xt

Reference DB

Test data

Configuration

Helper tools

OS

Blueprint

Workflow

Short reads

Index reference

Align to reference

Call variants

Annotate variants

Pick high impact

Deleterious SNPs

Co

nte

xt

Reference DB

Test data

Configuration

Helper tools

OS

iDASH On-Demand Resources

Bookshelf

MyDATA

Repeatable Results

homomorphic encryption

secure multiparty computation

Ohno-Machado L To Share or Not To Share That Is Not the Question Science Translational Medicine 2012 4(165)

Supported by the NIH Grant U54 HL108460 to the University of California San Diego 28

commons

iDSH15 Privacy Protection Challenge

bull Focus on secure outsourcing and secure data analysis in a distributed setting (humangenomeprivacyorg)

bull Task 1 Homomorphic encryption (HME) based secure genomic data analysis bull Task 2 Secure comparison between genomic data in

a distributed setting

differential privacy

Ohno-Machado L To Share or Not To Share That Is Not the Question Science Translational Medicine 2012 4(165)

Supported by the NIH Grant U54 HL108460 to the University of California San Diego 30

Models for Sharing Data

Name Age Educ ation

Hoursw eek

hellip H T N

Frank 42 6 40 Y

Bob 31 10 60 Y

Dave 43 9 40 N

lsquoDe-Identificationrsquo (microdata release)

bull HIPAA compliant methods raquo Safe harbor dataset

bull Removal of 18 safe harbor identifiers

raquo Limited dataset

bull Removal of direct identifiers

raquo Statistical methods

bull Removalgrouping of attributes

bull Risk reasonably low

bull Re-identification and disclosure risks

Courtesy of Li Xiong

Statistical Data Release (macrodata release)

Mohammed N. Privacy Preserving Heterogeneous Health Data Sharing. J Am Med Inform Assoc. 2013.

Jiang XL. Differential-Private Data Publishing Through Component Analysis. Transactions on Data Privacy. 2013.

Courtesy of Li Xiong

Statistical Data Release: Disclosure Risk

[Figure: original records and original histogram. Courtesy of Li Xiong]

Statistical Data Release: Differential Privacy

[Figure: original records, original histogram, and perturbed histogram with differential privacy.]
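A perturbed histogram of this kind can be sketched as follows. This is a minimal illustration, assuming that neighboring datasets differ by adding or removing one record (so each record changes exactly one bin count by 1) and that the bin labels are fixed in advance; the age bins and helper names are hypothetical.

```python
import random
from collections import Counter

def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def perturbed_histogram(values, bins, epsilon, rng):
    """Add independent Laplace(1/epsilon) noise to every bin count.

    Empty bins are perturbed too; otherwise the set of released bins
    would itself reveal which values occur in the data.
    """
    counts = Counter(values)
    return {b: counts.get(b, 0) + laplace(1.0 / epsilon, rng) for b in bins}

rng = random.Random(0)
bins = ["20-29", "30-39", "40-49", "50-59"]
ages = ["40-49", "30-39", "40-49"]  # binned ages of three example records
noisy = perturbed_histogram(ages, bins, epsilon=1.0, rng=rng)
```

Because the bins are disjoint, the whole histogram is released at the same ε as a single count; noisy counts can be negative or fractional and are often post-processed before release.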

Differential Privacy (Dwork et al.)

• D and D′ are neighboring databases if they differ on at most one record

A privacy mechanism A gives ε-differential privacy if, for all neighboring databases D, D′ and for any possible output S ∈ Range(A),

$$\Pr[A(D) = S] \le \exp(\varepsilon) \times \Pr[A(D') = S]$$

Courtesy of Li Xiong

Global Sensitivity
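For reference, the standard definition of the global sensitivity of a query Q, assumed here to be the one intended, can be written with the same D, D′ notation as above:

```latex
\[
  GS(Q) \;=\; \max_{D,\,D' \text{ neighboring}} \bigl\lVert Q(D) - Q(D') \bigr\rVert_1
\]
% For a single count query, changing one record changes the count by at most 1:
\[
  GS(Q_{\text{count}}) = 1
\]
```

The noise added by a mechanism is typically scaled to this quantity: the larger the effect one record can have on the answer, the more noise is needed for the same ε.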

Differential Privacy: Introducing Noise

• Laplace Mechanism

For example, for a single count query Q over a dataset D, returning Q(D) + Laplace(1/ε) gives ε-differential privacy.

Courtesy of Li Xiong
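The count-query example can be sketched in a few lines. The records reuse the small example table (Frank, Bob, Dave); the function names are hypothetical, and the final check only illustrates numerically that the output densities for two neighboring answers (counts 2 and 1) never differ by more than a factor of exp(ε).

```python
import math
import random

def laplace(scale, rng):
    # Difference of two i.i.d. exponentials with mean `scale` ~ Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon, rng):
    """Return Q(D) + Laplace(1/epsilon) for a count query Q (sensitivity 1)."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace(1.0 / epsilon, rng)

def laplace_pdf(x, mean, scale):
    return math.exp(-abs(x - mean) / scale) / (2.0 * scale)

records = [
    {"name": "Frank", "age": 42, "htn": "Y"},
    {"name": "Bob", "age": 31, "htn": "Y"},
    {"name": "Dave", "age": 43, "htn": "N"},
]
epsilon = 0.5
rng = random.Random(0)
noisy_htn_count = private_count(records, lambda r: r["htn"] == "Y", epsilon, rng)

# Neighboring datasets: true count 2 (with Bob) versus 1 (without Bob).
scale = 1.0 / epsilon
grid = [x / 10.0 for x in range(-100, 101)]
max_ratio = max(laplace_pdf(x, 2, scale) / laplace_pdf(x, 1, scale) for x in grid)
```

A smaller ε means a larger noise scale 1/ε, so stronger privacy is bought with a less accurate count.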

iDASH 2014: First Privacy Protection Challenge

Evaluate solutions that guarantee privacy protection for the output of genomic data analysis

• Task 1: Privacy-preserving SNP data sharing
• Task 2: Privacy-preserving release of the top K most significant SNPs

Statistical Data Release with Differential Privacy

[Figure: Original Data → Data Privacy Interface → Released Statistics, Synthetic Records. Questions shown: which statistics to compute; how to apportion the privacy budget. Courtesy of Li Xiong]

Statistical Data Release under Differential Privacy

[Figure: SHARE (Statistical & Synthetic Health Information Release). Original Health Records → DP Interface → Released Statistics, Synthetic Records. Courtesy of Li Xiong]

Data challenges
- High dimensionality
- Self-correlation

Target applications
- Cross-sectional studies
- Longitudinal studies

Low Dimensional Histograms (DPCube)

High Dimensional Distributions (DPCopula)

Sequential Data Release (DPTrie)
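One simple way to think about apportioning a privacy budget is sequential composition: if several statistics are released from the same records with budgets ε₁, …, εₖ, the overall release is (ε₁ + … + εₖ)-differentially private. The sketch below splits a total ε by weights and derives the Laplace noise scale for each statistic. The statistic names, weights, and sensitivities are hypothetical, and this is not a description of how SHARE, DPCube, DPCopula, or DPTrie allocate their budgets.

```python
def apportion_budget(total_epsilon, weights):
    """Split total_epsilon across statistics in proportion to their weights."""
    total_weight = sum(weights.values())
    return {name: total_epsilon * w / total_weight for name, w in weights.items()}

def laplace_scales(budget, sensitivities):
    """Laplace noise scale for each statistic: sensitivity / epsilon_i."""
    return {name: sensitivities[name] / eps for name, eps in budget.items()}

# Hypothetical release plan: favor the histogram over the single count.
weights = {"age_histogram": 3, "htn_count": 1}
sensitivities = {"age_histogram": 1.0, "htn_count": 1.0}

budget = apportion_budget(1.0, weights)
scales = laplace_scales(budget, sensitivities)
```

Statistics given a larger share of the budget receive less noise, so the split encodes which outputs matter most to the analyst.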

Models for Sharing Data

[Figure: models for sharing data. Label: differential privacy. From: Ohno-Machado L. To Share or Not To Share: That Is Not the Question. Science Translational Medicine. 2012;4(165).]

[Figure: Shifting control. User U requests data D on individual I from the healthcare institutions through a trusted broker (data use agreements, study registry). The consent management system asks, through the patient interface, "Do I wish to disclose data D to U?" (answer shown: Yes). Through the sharing look-up, patient I can check that U looked at data D.]
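The request, consent check, and look-up steps in this flow can be sketched as a small in-memory registry. Everything here is hypothetical (class name, method names, category and recipient labels); it only illustrates the idea that a request is answered from the patient's stated preferences and that every request is logged so the patient can later see who looked.

```python
from dataclasses import dataclass, field

@dataclass
class ConsentRegistry:
    # patient -> set of (data category, recipient type) pairs the patient allows
    allowed: dict = field(default_factory=dict)
    # append-only log of (patient, user, category, granted)
    audit_log: list = field(default_factory=list)

    def set_preference(self, patient, category, recipient_type, allow):
        """Record whether `patient` allows `category` to be shared with `recipient_type`."""
        prefs = self.allowed.setdefault(patient, set())
        if allow:
            prefs.add((category, recipient_type))
        else:
            prefs.discard((category, recipient_type))

    def request(self, user, recipient_type, patient, category):
        """User U asks for data D on individual I; the decision is always logged."""
        granted = (category, recipient_type) in self.allowed.get(patient, set())
        self.audit_log.append((patient, user, category, granted))
        return granted

    def who_looked(self, patient):
        """Sharing look-up: granted requests concerning this patient."""
        return [(u, c) for p, u, c, g in self.audit_log if p == patient and g]

registry = ConsentRegistry()
registry.set_preference("patient_I", "labs", "university_researcher", allow=True)
registry.set_preference("patient_I", "genetic", "university_researcher", allow=False)

ok = registry.request("user_U", "university_researcher", "patient_I", "labs")
denied = registry.request("user_U", "university_researcher", "patient_I", "genetic")
viewers = registry.who_looked("patient_I")
```

Keeping preferences per data category and per recipient type mirrors the two questions a patient faces: what to share, and with whom.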

What to share? Who to share with?

Some preliminary findings from a limited set of interviews:
• Healthy volunteers do not want to share with commercially sponsored researchers
• Some want their medical information shared with only UCSD researchers, no others
• Many do not want to share at least 1 category of sensitive information
• Most common decline was genetic, followed by sexual & reproductive health

Courtesy of E Bell

[Figure: iCONCUR (informed CONsent for Clinical data Use for Research). Labels: Patient I; Consent Management System; Sharing Look-up Registry; Trusted Entity; User U (Query, Results); Concierge or Automated Services; Connecting Software; Healthcare Institution (Electronic Health Record, Clinical Data Warehouse).]

Privacy & Technology Projects

Data and Biospecimen Sharing; Privacy Preserving Computation

Projects:
• Privacy Preserving Analytics for KD in African-Americans
• Consent for Data and Biosample Sharing in an Underserved Population
• Partnership for Epidemiological Research on Latinos

Questions:
• Which DNA variants are implicated in KD susceptibility in this population? (Emory, Genome Institute of Singapore, Imperial College)
• Does consent rate depend on who is obtaining the consent? (Maricopa Health System)
• Do patients understand what they consented for? (San Diego State University)

Looking into the Near Future

Healthcare data and biospecimen collections

Several terabytes of data are being generated every day, and samples are being collected. Participants could decide what, when, and with whom to share.

• Genomes
• Images
• Body sensors
• Environmental sensors
• Electronic Health Records
• Personal Health Records
• Pharmacy
• Social media

UCSD’s Journey

[Figure: timeline with periods 2008-2009, 2010-2011, 2012-2013, 2014-2015 and tracks Clinical Data, Research Data, Applications, Integration. Labels: Healthcare Clinical Data; Electronic Health Record System (Epic & Clarity); Other Systems (PACS, lab, etc.); Personnel Systems (Active Directory); Clinical Data Warehouse for Research; Query Tools (UC-ReX, Explorer, Privacy Technology); Clinical Research Data (REDCap, Velos, other DBs); iDASH HIPAA/FISMA Cloud (SHADE, OVERCAST; images, human genomes, etc.); Recruitment, consent tools, custom apps; Epic CDDS, new modules; Accrual for Clinical Trials; Analytical Tools; De-ID Tools; Text Mining, NLP; iCONCUR; Cancer Genomics; BRIGHT; External data (patient reported data); Scalable Network (Distributed Analytics Tools); SCANNER; pSCANNER (PCORI CDRN); UC-ReX; bioCADDIE; PhenDISCO; CTRI; School of Medicine; sites: UCSF, Davis, Irvine, UCLA, VA LA Clinics, USCLAC, Cedars Sinai, San Mateo; funding: PCORI contracts, K22, K99, NLM Training Grant.]

Acknowledgements

Department of Biomedical Informatics at UCSD, funded by NIH (U54, UL1, U24, UH3, R21, U01, T15, R00, K22, K99, D43), AHRQ, UCBRAIDOP, PCORI, NVIDIA




Page 29: Genomics and Informatics · 2015-04-20 · Genomics and Informatics: Technology Solutions to Compute with Sensitive Data April 21, 2015 Innovations in Healthcare Informatics NIH U54

iDSH15 Privacy Protection Challenge

bull Focus on secure outsourcing and secure data analysis in a distributed setting (humangenomeprivacyorg)

bull Task 1 Homomorphic encryption (HME) based secure genomic data analysis bull Task 2 Secure comparison between genomic data in

a distributed setting

differential privacy

Ohno-Machado L To Share or Not To Share That Is Not the Question Science Translational Medicine 2012 4(165)

Supported by the NIH Grant U54 HL108460 to the University of California San Diego 30

Models for Sharing Data

Name Age Educ ation

Hoursw eek

hellip H T N

Frank 42 6 40 Y

Bob 31 10 60 Y

Dave 43 9 40 N

lsquoDe-Identificationrsquo (microdata release)

bull HIPAA compliant methods raquo Safe harbor dataset

bull Removal of 18 safe harbor identifiers

raquo Limited dataset

bull Removal of direct identifiers

raquo Statistical methods

bull Removalgrouping of attributes

bull Risk reasonably low

bull Re-identification and disclosure risks

Courtesy of Li Xiong

Statistical Data Release (macrodata release)

Mohammed N Privacy Preserving Heterogeneous Health Data Sharing J Am Med Inform Assoc 2013

Jiang XL Differential-Private Data Publishing Through Component Analysis Transactions on Data Privacy 2013

Courtesy of Li Xiong

Statistical Data Release Disclosure Risk

Original records Original histogram Courtesy of Li Xiong

Courtesy of Li Xiong

Statistical Data Release Differential Privacy

Original records Original histogram Perturbed histogram with differential privacy

Differential Privacy (Dwork et al)

D Drsquo

bull D and D are neighboring databases if they differ on at most one record

A privacy mechanism A gives ε-differential privacy if for all neighbouring databases D Drsquo and for any possible output S isin Range(A)

Pr[A(D) = S] le exp(ε) times Pr[A(D) = S]

Courtesy of Li Xiong

Global Sensitivity

Differential Privacy: Introducing Noise

• Laplace Mechanism

For example, for a single count query Q over a dataset D, returning Q(D) + Laplace(1/ε) gives ε-differential privacy

Courtesy of Li Xiong
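The Laplace mechanism for a single count query can be sketched in a few lines. The record layout, the predicate, and the function names below are assumptions for illustration; the noise scale 1/ε follows the statement above for a count query.

```python
import random
from typing import Callable, Dict, Iterable, Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records: Iterable[Dict[str, object]],
                  predicate: Callable[[Dict[str, object]], bool],
                  epsilon: float,
                  rng: Optional[random.Random] = None) -> float:
    """Return Q(D) + Laplace(1/epsilon) for the count query Q defined by predicate."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical records mirroring the example table.
patients = [
    {"age": 42, "htn": "Y"},
    {"age": 31, "htn": "Y"},
    {"age": 43, "htn": "N"},
]

noisy = private_count(patients, lambda r: r["htn"] == "Y", epsilon=0.5)
print(noisy)   # true count is 2; the released value varies from run to run
```

A smaller ε means a larger noise scale, so stronger privacy comes at the cost of a less accurate count.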

iDASH 2014 First Privacy Protection Challenge

Evaluate solutions that offer guaranteed privacy protection for the output of genomic data analysis

• Task 1: Privacy-preserving SNP data sharing
• Task 2: Privacy-preserving release of the top K most significant SNPs
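One textbook-style way to approach Task 2, sketched below and not necessarily what challenge participants used, is to apply the exponential mechanism K times without replacement, spending ε/K on each pick so that the picks compose to ε overall. The SNP identifiers, the significance scores, and the sensitivity value are hypothetical inputs.

```python
import math
import random
from typing import Dict, List, Optional


def exponential_pick(scores: Dict[str, float], epsilon: float,
                     sensitivity: float, rng: random.Random) -> str:
    """Pick one item with probability proportional to exp(eps * score / (2 * sensitivity))."""
    top = max(scores.values())           # subtract the max for numerical stability
    names = list(scores)
    weights = [math.exp(epsilon * (scores[n] - top) / (2.0 * sensitivity)) for n in names]
    return rng.choices(names, weights=weights, k=1)[0]


def private_top_k(scores: Dict[str, float], k: int, epsilon: float,
                  sensitivity: float, rng: Optional[random.Random] = None) -> List[str]:
    """Select k items by repeated exponential-mechanism picks, epsilon / k each."""
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of items")
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    if rng is None:
        rng = random.Random()
    remaining = dict(scores)
    chosen: List[str] = []
    for _ in range(k):
        pick = exponential_pick(remaining, epsilon / k, sensitivity, rng)
        chosen.append(pick)
        del remaining[pick]
    return chosen


# Hypothetical association scores for four SNPs.
snp_scores = {"rs_a": 9.0, "rs_b": 5.0, "rs_c": 4.5, "rs_d": 1.0}
print(private_top_k(snp_scores, k=2, epsilon=1.0, sensitivity=1.0))
```

With a small ε the selection is close to random; as ε grows, the picks concentrate on the truly highest-scoring SNPs.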

Statistical Data Release with Differential Privacy

[Figure: Original Data; Released Statistics and Synthetic Records. Design questions: which statistics to compute; how to apportion the privacy budget]
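One simple answer to the question of how to apportion the privacy budget is to split a total ε evenly across the released statistics, relying on sequential composition (the ε values of successive releases add up). The sketch assumes count queries of sensitivity 1; the even split, the statistic names, and the function names are illustrative choices, not a recommendation from the slides.

```python
import random
from typing import Dict, List, Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def apportion(total_epsilon: float, n_queries: int) -> List[float]:
    """Split the total budget evenly; the shares sum to total_epsilon."""
    if total_epsilon <= 0 or n_queries <= 0:
        raise ValueError("budget and query count must be positive")
    return [total_epsilon / n_queries] * n_queries


def release_counts(true_counts: Dict[str, int], total_epsilon: float,
                   rng: Optional[random.Random] = None) -> Dict[str, float]:
    """Release every count with Laplace noise sized to its share of the budget."""
    if rng is None:
        rng = random.Random()
    shares = apportion(total_epsilon, len(true_counts))
    return {
        name: count + laplace_noise(1.0 / eps, rng)
        for (name, count), eps in zip(true_counts.items(), shares)
    }


# Hypothetical statistics a data custodian might release.
stats = {"htn_yes": 2, "age_over_40": 2, "hours_over_50": 1}
print(release_counts(stats, total_epsilon=1.0))
```

Each of the k statistics receives noise of scale k/ε, so asking for more statistics under a fixed budget makes every answer noisier.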

Statistical Data Release under Differential Privacy

SHARE: Statistical & Synthetic Health Information Release

[Figure: Original Health Records; DP Interface (data privacy interface); Released Statistics and Synthetic Records]

Data challenges
- High dimensionality
- Self-correlation

Target applications
- Cross-sectional studies
- Longitudinal studies

SHARE
- Low Dimensional Histograms (DPCube)
- High Dimensional Distributions (DPCopula)
- Sequential Data Release (DPTrie)

Courtesy of Li Xiong


Shifting control

[Figure: patient-controlled sharing through a trusted broker]
- User U requests data D on individual I
- Trusted broker: data use agreements; study registry
- Consent Management System (patient interface): "Do I wish to disclose data D to U?" "Yes"
- Healthcare institutions
- Sharing look-up, Patient I: "I can check that U looked at my data D"
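The request, consent, and look-up flow in this diagram can be sketched as a small in-memory model. Everything here is hypothetical: the class name, the method names, and the data categories are illustrative and do not describe the actual system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

LogEntry = Tuple[str, str, str, bool]    # (user, patient, category, allowed)


@dataclass
class ConsentManagementSystem:
    # (patient, data category) -> users the patient has agreed to share with
    permissions: Dict[Tuple[str, str], Set[str]] = field(default_factory=dict)
    # every request is logged, granted or not, so the patient can review it later
    access_log: List[LogEntry] = field(default_factory=list)

    def grant(self, patient: str, category: str, user: str) -> None:
        """Patient answers 'yes' to: do I wish to disclose data D to U?"""
        self.permissions.setdefault((patient, category), set()).add(user)

    def request(self, user: str, patient: str, category: str) -> bool:
        """User U requests data D on individual I; the decision is recorded."""
        allowed = user in self.permissions.get((patient, category), set())
        self.access_log.append((user, patient, category, allowed))
        return allowed

    def lookup(self, patient: str) -> List[LogEntry]:
        """Sharing look-up: the patient checks who asked for which data."""
        return [entry for entry in self.access_log if entry[1] == patient]


cms = ConsentManagementSystem()
cms.grant("patient_I", "labs", "user_U")

print(cms.request("user_U", "patient_I", "labs"))       # consented category
print(cms.request("user_U", "patient_I", "genetic"))    # no consent on record
print(cms.lookup("patient_I"))
```

Logging denied requests as well as granted ones is a design choice in this sketch; it lets the patient see attempted access, not only completed access.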

What to share? Who to share with?

Some preliminary findings from a limited set of interviews:
• Healthy volunteers do not want to share with commercially sponsored researchers
• Some want their medical information shared with UCSD researchers only, and no others
• Many decline to share at least one category of sensitive information
• The most commonly declined category was genetic information, followed by sexual & reproductive health

Courtesy of E Bell


iCONCUR: informed CONsent for Clinical data Use for Research

[Figure: iCONCUR components: Patient I; Consent Management System; Sharing Look-up Registry; Trusted Entity; User U (Query, Results); Concierge or Automated Services; Connecting Software; Healthcare Institution (Electronic Health Record, Clinical Data Warehouse)]

Privacy & Technology Projects

Data and Biospecimen Sharing; Privacy Preserving Computation

Projects
- Privacy Preserving Analytics for KD in African-Americans
- Consent for Data and Biosample Sharing in an Underserved Population
- Partnership for Epidemiological Research on Latinos

Questions
- Which DNA variants are implicated in KD susceptibility in this population? (Emory, Genome Institute of Singapore, Imperial College)
- Does consent rate depend on who is obtaining the consent? (Maricopa Health System)
- Do patients understand what they consented to? (San Diego State University)

Looking into the Near Future

Healthcare data and biospecimen collections

Several terabytes of data are generated every day, and samples are being collected. Participants could decide what, when, and with whom to share.

• Genomes
• Images
• Body sensors
• Environmental sensors
• Electronic Health Records
• Personal Health Records
• Pharmacy
• Social media

UCSD’s Journey

[Figure: timeline with periods 2008-2009, 2010-2011, 2012-2013, and 2014-2015, labeled Clinical Data, Research Data, Applications, and Integration. Labeled elements include: Healthcare Clinical Data; Electronic Health Record System (Epic & Clarity); Other Systems (PACS, lab, etc.); Personnel Systems (Active Directory); Epic CDDS, new modules; Clinical Data Warehouse for Research; Query Tools (UC-ReX Explorer, Privacy Technology); Clinical Research Data (REDCap, Velos, other DBs); Recruitment (consent tools, custom apps); Accrual for Clinical Trials; iDASH (HIPAA/FISMA; SHADE; images, human genomes, etc.; OVERCAST); Scalable Network (Distributed Analytics Tools); pSCANNER (PCORI CDRN); External data (patient reported data); iCONCUR; Cancer Genomics; Text Mining, NLP; Analytical Tools; De-ID Tools; BRIGHT; Cloud; bioCADDIE; UC-ReX; SCANNER; PhenDISCO; CTRI; School of Medicine; sites VA, LA Clinics, UCSF, Davis, Irvine, UCLA, USC/LAC, Cedars Sinai, San Mateo; PCORI contracts; K22, K99; NLM Training Grant]

Acknowledgements

Department of Biomedical Informatics at UCSD, funded by NIH (U54, UL1, U24, UH3, R21, U01, T15, R00, K22, K99, D43), AHRQ, UCBRAIDOP, PCORI, NVIDIA

Page 34: Genomics and Informatics · 2015-04-20 · Genomics and Informatics: Technology Solutions to Compute with Sensitive Data April 21, 2015 Innovations in Healthcare Informatics NIH U54

Courtesy of Li Xiong

Statistical Data Release Differential Privacy

Original records Original histogram Perturbed histogram with differential privacy

Differential Privacy (Dwork et al)

D Drsquo

bull D and D are neighboring databases if they differ on at most one record

A privacy mechanism A gives ε-differential privacy if for all neighbouring databases D Drsquo and for any possible output S isin Range(A)

Pr[A(D) = S] le exp(ε) times Pr[A(D) = S]

Courtesy of Li Xiong

Global Sensitivity

Differential Privacy Introducing Noise

bull Laplace Mechanism

For example for a single count query Q over a dataset D returning Q(D)+Laplace(1ε) gives ε-differential privacy

Courtesy of Li Xiong

iDASH 2014 First Privacy Protection Challenge

Evaluate solutions of guaranteed privacy protection for protecting the output of genomic data analysis

bull Task 1 Privacy-preserving SNP Data Sharing bull Task 2 Privacy-preserving release of top K

most significant SNPs

Original Data

Which statistics to compute

How to apportion privacy budget

Released Statistics Synthetic Records

Statistical Data Release with Differential Privacy

Statistical Data Release under Differential Privacy

Data Privacy Interface Courtesy of Li Xiong

Original Health Records

Data challenges - High dimensionality - Self-correlation

DP Interface

Released Statistics Synthetic Records

Target applications - Cross-sectional studies - Longitudinal studies

Low Dimensional Histograms (DPCube )

High Dimensional Distributions (DPCopula)

Sequential Data Release (DPTrie)

SHARE Statistical amp Synthetic Health Information Release

SHARE

Courtesy of Li Xiong

differential privacy

Supported by the NIH Grant U54 HL108460 to the University of California San Diego 40

Models for Sharing Data

Ohno-Machado L To Share or Not To Share That Is Not the Question Science Translational Medicine 2012 4(165)

4202015 Supported by the NIH Grant U54 HL108460 to the University of California San Diego 41

Sharing

Look-up Patient I

I can check

that U looked

at my data D

bull Data use

agreements

bull Study registry

Trusted broker

User U requests

Data D on

individual I

Consent

Management

System

Do I wish to

disclose data D

to U

Shifting control

Patient Interface

Healthcare

Institutions

Yes

What to share Who to share with

Some preliminary findings from a limited set of interviews bull Healthy volunteers do not want to share with

commercially sponsored researchers bull Some want their medical information shared with

only UCSD researchers no others bull Many do not want to share at least 1 category of

sensitive information bull Most common decline was genetic followed by

sexual amp reproductive health

Courtesy of E Bell

42

44

Consent

Management System

Sharing Look-up Registry Patient I

Electronic Health Record

Clinical Data Warehouse

Healthcare Institution

Query

User U Results

Co

nci

erg

e o

rA

uto

mat

ed S

ervi

ces

Co

nn

ecti

ng

Soft

war

e

iCONCUR

Trusted Entity

informed CONsent for Clinical data Use for Research

Privacy & Technology Projects

Data and Biospecimen Sharing; Privacy Preserving Computation

• Privacy Preserving Analytics for KD in African-Americans
• Consent for Data and Biosample Sharing in an Underserved Population
• Partnership for Epidemiological Research on Latinos

• Which DNA variants are implicated in KD susceptibility in this population? (Emory, Genome Institute of Singapore, Imperial College)
• Does consent rate depend on who is obtaining the consent? (Maricopa Health System)
• Do patients understand what they consented for? (San Diego State University)

Looking into the Near Future

Healthcare data and biospecimen collections

Several terabytes of data are being generated every day, and samples are being collected. Participants could decide what, when, and with whom to share.

• Genomes
• Images
• Body sensors
• Environmental sensors
• Electronic Health Records
• Personal Health Records
• Pharmacy
• Social media

UCSD’s Journey

[Timeline diagram: 2008-2009, 2010-2011, 2012-2013, 2014-2015; Clinical Data, Research Data, Applications, Integration]

Diagram labels: Epic CDDS, New modules; Accrual for Clinical Trials; FISMA; pSCANNER; bioCADDIE; Electronic Health Record System (Epic & Clarity); Other Systems (PACS, lab, etc.); Personnel Systems (Active Directory); Query Tools, UC-ReX, Explorer, Privacy Technology; Clinical Research Data (RedCAP, Velos, other DBs); iDASH, HIPAA, SHADE (images, human genomes, etc.); Recruitment, Consent tools, Custom Apps; VA, LA Clinics; UCSF; Davis; Irvine; UCLA; Healthcare Clinical Data; Clinical Data Warehouse for Research; Scalable Network (Distributed Analytics Tools); HIPAA; External data (patient reported data); pSCANNER, PCORI CDRN; iDASH, HIPAA/FISMA, OVERCAST, CTRI, School of Medicine; iCONCUR; Cancer Genomics; Text Mining, NLP; USCLAC; Cedars Sinai; San Mateo; Analytical Tools; De-ID Tools; BRIGHT; Cloud; PCORI contracts; K22, K99; NLM Training Grant; UC-ReX; iDASH; SCANNER; PhenDISCO

Acknowledgements

Department of Biomedical Informatics at UCSD, funded by NIH (U54, UL1, U24, UH3, R21, U01, T15, R00, K22, K99, D43), AHRQ, UCBRAIDOP, PCORI, NVIDIA


Differential Privacy (Dwork et al.)

• D and D′ are neighboring databases if they differ on at most one record.

A privacy mechanism A gives ε-differential privacy if, for all neighboring databases D, D′ and for any possible output S ∈ Range(A),

$$\Pr[A(D) = S] \le \exp(\varepsilon) \times \Pr[A(D') = S]$$

Courtesy of Li Xiong
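To make the inequality concrete, the sketch below computes the smallest ε for which a discrete mechanism satisfies the definition on one pair of neighboring databases. The example mechanism, randomized response with a 0.75 chance of reporting the truth, is not from the slides; it is swapped in only because its output distribution is easy to write down. The function names are hypothetical.

```python
import math


def min_epsilon(p, q):
    """Smallest eps with p[s] <= exp(eps) * q[s] and q[s] <= exp(eps) * p[s] for every output s.

    p and q map each possible output to its probability under A(D) and A(D').
    """
    eps = 0.0
    for s in set(p) | set(q):
        ps, qs = p.get(s, 0.0), q.get(s, 0.0)
        if ps == 0.0 and qs == 0.0:
            continue
        if ps == 0.0 or qs == 0.0:
            # An output possible under only one database cannot satisfy the bound.
            return math.inf
        eps = max(eps, abs(math.log(ps / qs)))
    return eps


def randomized_response(true_bit, p_truth=0.75):
    """Output distribution of a mechanism that reports the true bit with probability p_truth."""
    return {true_bit: p_truth, 1 - true_bit: 1.0 - p_truth}


# Neighboring single-record databases: the record holds 1 in D and 0 in D'.
dist_d = randomized_response(1)
dist_d_prime = randomized_response(0)
epsilon = min_epsilon(dist_d, dist_d_prime)
print(f"smallest epsilon: {epsilon:.4f}")  # smallest epsilon: 1.0986 (ln 3)
```

A mechanism that always reports the truth would return infinity here, which matches the intuition that releasing exact values offers no differential privacy.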

Differential Privacy: Introducing Noise

Global Sensitivity

• Laplace Mechanism

For example, for a single count query Q over a dataset D, returning Q(D) + Laplace(1/ε) gives ε-differential privacy.

Courtesy of Li Xiong
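A minimal sketch of the count-query example follows. It assumes the usual reading of global sensitivity as the largest change in Q between neighboring databases, which is 1 for a count, so the noise scale is 1/ε. The helper names and the toy data are hypothetical, and the Laplace draw is built from two exponential draws so that only the standard library is needed.

```python
import random


def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale) as the difference of two exponential draws with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(records, predicate, epsilon, rng=None):
    """Release a count through the Laplace mechanism.

    A count changes by at most 1 when one record changes, so its global
    sensitivity is taken to be 1 and the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical toy data: ages of six patients.
ages = [34, 51, 67, 45, 72, 58]
rng = random.Random(0)
released = noisy_count(ages, lambda age: age >= 50, epsilon=0.5, rng=rng)
print(released)  # the true count is 4; the released value depends on the noise draw
```

Smaller ε means a larger noise scale, so stronger privacy is paid for with a less accurate released count.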

iDASH 2014: First Privacy Protection Challenge

Evaluate solutions that offer guaranteed privacy protection for the output of genomic data analysis

• Task 1: Privacy-preserving SNP Data Sharing
• Task 2: Privacy-preserving release of top K most significant SNPs

Statistical Data Release under Differential Privacy

Original Data

Which statistics to compute

How to apportion privacy budget

Released Statistics, Synthetic Records
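One simple way to think about apportioning the privacy budget is sequential composition: if several statistics are released from the same data with budgets ε_1, ..., ε_k, the overall privacy loss is at most their sum. The sketch below splits a total ε across count queries in proportion to weights and adds Laplace noise to each. The statistics, weights, and function names are hypothetical, and this is not the allocation strategy used by DPCube, DPCopula, or DPTrie.

```python
import random


def split_budget(total_epsilon, weights):
    """Divide a total privacy budget across queries in proportion to their weights."""
    total_weight = sum(weights.values())
    return {name: total_epsilon * w / total_weight for name, w in weights.items()}


def release_counts(true_counts, budgets, rng):
    """Add Laplace noise of scale 1 / eps_i to each count (sensitivity 1 assumed)."""
    released = {}
    for name, count in true_counts.items():
        scale = 1.0 / budgets[name]
        # Difference of two exponential draws with mean `scale` is Laplace(0, scale).
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        released[name] = count + noise
    return released


# Hypothetical statistics and weights; a larger weight buys less noise.
true_counts = {"diabetes": 120, "hypertension": 340, "asthma": 75}
weights = {"diabetes": 2, "hypertension": 1, "asthma": 1}
budgets = split_budget(1.0, weights)
print(budgets)  # {'diabetes': 0.5, 'hypertension': 0.25, 'asthma': 0.25}
print(release_counts(true_counts, budgets, random.Random(0)))
```

Giving a statistic a larger share of the budget lowers its noise at the expense of the others, which is the trade-off behind both questions on the slide.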







Page 41: Genomics and Informatics · 2015-04-20 · Genomics and Informatics: Technology Solutions to Compute with Sensitive Data April 21, 2015 Innovations in Healthcare Informatics NIH U54

4202015 Supported by the NIH Grant U54 HL108460 to the University of California San Diego 41

Sharing

Look-up Patient I

I can check

that U looked

at my data D

bull Data use

agreements

bull Study registry

Trusted broker

User U requests

Data D on

individual I

Consent

Management

System

Do I wish to

disclose data D

to U

Shifting control

Patient Interface

Healthcare

Institutions

Yes

What to share Who to share with

Some preliminary findings from a limited set of interviews bull Healthy volunteers do not want to share with

commercially sponsored researchers bull Some want their medical information shared with

only UCSD researchers no others bull Many do not want to share at least 1 category of

sensitive information bull Most common decline was genetic followed by

sexual amp reproductive health

Courtesy of E Bell

42

44

Consent

Management System

Sharing Look-up Registry Patient I

Electronic Health Record

Clinical Data Warehouse

Healthcare Institution

Query

User U Results

Co

nci

erg

e o

rA

uto

mat

ed S

ervi

ces

Co

nn

ecti

ng

Soft

war

e

iCONCUR

Trusted Entity

informed CONsent for Clinical data Use for Research

Privacy Preserving Analytics for KD in African-Americans

Consent for Data and Biosample Sharing in an Underserved Population

Partnership for Epidemiological

Research on Latinos

8213 Supported by the NIH Grant U54 HL108460 to the University of California San Diego 45

Data and Biospecimen Sharing Privacy Preserving Computation

Which DNA variants are implicated in KD susceptibility in this population Emory Genome Institute of Singapore Imperial College

Does consent rate depend on who is obtaining the consent Maricopa Health System

Do patients understand what they consented for San Diego State University

Privacy amp Technology Projects

Looking into the Near Future

Healthcare data and Biospecimen collections

Several terabytes of data are being generated every day and samples are being collected Participants could decide whatwhenwith whom to share

bull Genomes bull Images bull Body sensors bull Environmental sensors bull Electronic Health Records bull Personal Health Records bull Pharmacy bull Social media

UCSD's Journey

[Timeline figure. Periods: 2008-2009, 2010-2011, 2012-2013, 2014-2015. Tracks: Clinical Data, Research Data, Applications, Integration. Labels recovered from the figure:]

• Electronic Health Record System: Epic & Clarity
• Epic CDDS, new modules
• Other Systems: PACS, lab, etc.
• Personnel Systems: Active Directory
• Query Tools: UC-ReX, Explorer; Privacy Technology
• Clinical Research Data: REDCap, Velos, other DBs
• iDASH, HIPAA, SHADE: images, human genomes, etc.
• Recruitment, consent tools, custom apps
• Accrual for Clinical Trials
• Healthcare Clinical Data; Clinical Data Warehouse for Research
• Scalable Network (Distributed Analytics Tools)
• External data (patient reported data)
• pSCANNER, PCORI CDRN
• iDASH, HIPAA/FISMA, OVERCAST, CTRI, School of Medicine
• iCONCUR, Cancer Genomics
• Text Mining, NLP
• Analytical Tools, De-ID Tools, BRIGHT, Cloud
• Sites: VA LA Clinics, UCSF, Davis, Irvine, UCLA, USC/LAC, Cedars Sinai, San Mateo
• Funding: PCORI contracts, K22, K99, NLM Training Grant
• Projects: UC-ReX, iDASH, SCANNER, PhenDISCO, pSCANNER, bioCADDIE

Acknowledgements

Department of Biomedical Informatics at UCSD, funded by NIH (U54, UL1, U24, UH3, R21, U01, T15, R00, K22, K99, D43), AHRQ, UCBRAIDOP, PCORI, NVIDIA





