The analyses upon which this publication is based were performed under Contract Number HHSM-500-2009-00046C sponsored by the Centers for Medicare & Medicaid Services, Department of Health and Human Services.
Current issues and challenges in sharing biomedical human subjects data
OASIS 2014
Lucila Ohno-Machado, MD, PhD
Division of Biomedical Informatics
University of California San Diego
Personalized Healthcare
What is the influence of genetics, environment?
Which therapies work best for individual patients?
Person-Centered Outcomes Research
• Genome
  – Sequencing data
• Phenotype
  – Personal monitoring
    • Blood pressure, glucose
  – Personal health records
  – Behavior monitoring
    • Adherence to medication, exercise
• Environment
  – Air sensors, food quality
  – Location
Source: DOE
Where does knowledge come from?
• Controlled studies with strict eligibility criteria
• Does this apply to me?
Hopefully, but we need a lot of data to answer this question:
• We need to build infrastructure to access large data repositories
  – Lower the barriers to sharing data
• We need to share tools to analyze the data
  – Algorithms and computational facilities
Big Data, Medium Data, and Small Data
• Data integration across biological scales
• Data annotation and harmonization
• Data ‘anonymization’ and privacy preservation
Data for Personalized Medicine
Prevention, Diagnosis and Therapy
– Genetic predisposition
– Biomarkers
– Pharmacogenomics
– Health records
– Sensors
Handling Protected Health Information
– Secure Electronic Environment
  • Electronic Health Records
  • Genetic Data
Sharing Data
• Sharing data today
  – Data sharing plans required
    • Little incentive to actually share
  – One model: users download data
  – Yes/No decision on sharing
• Data use agreements across institutions
  – Pairwise, limited and complicated
  – Specific to a particular study
  – Resources for sharing are limited
  – Security/privacy constraints are hard for small institutions to follow
Mission
“A national center for biomedical computing that develops new algorithms, open-source tools, computational infrastructure, and services that will enable biomedical and behavioral researchers nationwide to integrate Data for Analysis, ‘anonymization,’ and Sharing”
Vision
• Share access to data and computation
  – Allow healthcare providers to focus on care, biomedical researchers to focus on research
  – Provide software, platform, and infrastructure
  – Protect privacy
  – Share
    • Data
    • Workflows
    • Computation
    • Security
    • Policies
Models for Data Sharing
• Cloud Storage: data exported for computation elsewhere
  – Users download data from the cloud
• Cloud Compute and Virtualization: computation goes to the data
  – Users analyze data in the cloud
  – Users download virtual machines
Three Different Models for Data Sharing
1. Users download data
2. Users compute in a central facility
3. Users install software that operates on their data and transmits results of operations (e.g., queries, analyses)
Model 1: Users download data
• “De-identification” may be necessary
• Encrypted transmission
• Data Use Agreement Central
  Lawyers from the University of California helped write:
  – Data Contributor Agreement
    • Who can have access for what purpose
  – Data User Agreement
    • Terms of use
• iDASH serves as ‘agent’ for the data
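As a rough illustration of what the “de-identification” step in Model 1 might involve before data are released for download, the sketch below drops direct identifiers and coarsens quasi-identifiers. The field names (`name`, `mrn`, `admit_date`, `zip`) and the specific rules are assumptions made for the example; they are not taken from iDASH and do not amount to a complete or HIPAA-compliant procedure.

```python
# Hypothetical field names; a real data set would need its own review.
DIRECT_IDENTIFIERS = {"name", "mrn", "phone", "email"}


def deidentify(record):
    """Return a copy of one record with identifiers removed or coarsened."""
    # Drop fields that identify a person directly.
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

    # Keep only the year of an ISO-formatted date (assumed "YYYY-MM-DD").
    if "admit_date" in cleaned:
        cleaned["admit_year"] = cleaned.pop("admit_date")[:4]

    # Keep only the first three digits of the ZIP code.
    if "zip" in cleaned:
        cleaned["zip3"] = cleaned.pop("zip")[:3]

    return cleaned


example = {
    "name": "Jane Doe",
    "mrn": "000123",
    "admit_date": "2014-03-17",
    "zip": "92093",
    "glucose": 104,
}
released = deidentify(example)
```

Even after such a step, the data contributor and data user agreements still determine who may download the result and for what purpose.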
Model 2: Users compute in central facility
• Securing the privacy of human subjects data including biometrics such as genomes
• There are known security issues with commercial clouds (business associate liability agreement mitigates some risks)
• A protected cloud compute environment is capable of operating on genomes and clinical data
• We have built this cloud environment in iDASH
Infrastructure Security for Human Subjects Data
• HIPAA (Health Insurance Portability and Accountability Act) compliant computing environment
• Segmentation (Zones) of projects & functionality
• Physical and environmental protection of compute hardware
• Access control with Two Factor Authentication
• Secure (encrypted tunnel) system access and upload capability
• Centralized logging, intrusion detection
• Proxies and filters
• Hardened (secured) system configurations
Model 3: Computation goes to the data
• Some health systems cannot host data outside their facilities (e.g., VA)
• Software can be sent to those facilities in order to build an overall model (e.g., regression)
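One way to picture Model 3 is a simple linear regression fitted from per-site summaries: each facility runs the same small function on its own rows and sends back only aggregate sums, which a coordinator combines into the overall model. This is a minimal sketch under assumptions (one predictor, ordinary least squares, hypothetical function and class names); the slides do not say which regression method is actually used.

```python
from dataclasses import dataclass


@dataclass
class SiteSummary:
    """Aggregate sums that leave a site; no patient-level rows are included."""
    n: int
    sum_x: float
    sum_y: float
    sum_xx: float
    sum_xy: float


def summarize_site(rows):
    """Runs inside a facility: reduce local (x, y) rows to aggregate sums."""
    return SiteSummary(
        n=len(rows),
        sum_x=sum(x for x, _ in rows),
        sum_y=sum(y for _, y in rows),
        sum_xx=sum(x * x for x, _ in rows),
        sum_xy=sum(x * y for x, y in rows),
    )


def fit_from_summaries(summaries):
    """Runs at the coordinator: least-squares fit of y = intercept + slope * x."""
    n = sum(s.n for s in summaries)
    sx = sum(s.sum_x for s in summaries)
    sy = sum(s.sum_y for s in summaries)
    sxx = sum(s.sum_xx for s in summaries)
    sxy = sum(s.sum_xy for s in summaries)
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("x has no variance across the pooled data")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return intercept, slope


# Toy data held at two separate sites (illustrative values only).
site_a = [(1.0, 3.0), (2.0, 5.0)]
site_b = [(3.0, 7.0), (4.0, 9.0), (5.0, 11.0)]

intercept, slope = fit_from_summaries(
    [summarize_site(site_a), summarize_site(site_b)]
)
```

Because the sums are additive, the coordinator obtains the same coefficients it would get from pooling all rows in one place, while the rows themselves never leave the facilities.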
University of California Research eXchange UC-ReX
1. UC Davis
2. UC Irvine
3. UC Los Angeles
4. UC San Diego
5. UC San Francisco
Funded by the UC Office of the President to the NIH-funded CTSAs
• Integration of Clinical Data Warehouses from 5 University of California Medical Centers and affiliated institutions (>10 million patients)
  – Aggregate and individual-level patient data will be accessible according to data use agreements and IRB approval
  – Distributed models to adjust for confounders
• Objectives
  – Monitor patient safety
  – Improve outcomes
  – Promote research
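The aggregate-level access described for UC-ReX can be pictured as a federated count query: the same cohort definition is evaluated at every medical center, and only the resulting integer leaves each site. The sketch below is illustrative only. The site names, record fields, and the `min_total` suppression threshold are assumptions for the example, not details of the actual UC-ReX system.

```python
def local_count(records, predicate):
    """Runs at a site; only the integer count leaves the site."""
    return sum(1 for record in records if predicate(record))


def federated_count(site_records, predicate, min_total=3):
    """Sum per-site counts; return None when the total is below min_total.

    Suppressing small totals is an assumed safeguard against singling out
    individual patients, not a documented UC-ReX rule.
    """
    counts = {
        site: local_count(records, predicate)
        for site, records in site_records.items()
    }
    total = sum(counts.values())
    return total if total >= min_total else None


# Toy records standing in for three clinical data warehouses.
sites = {
    "site_1": [{"age": 67, "diabetic": True}, {"age": 45, "diabetic": False}],
    "site_2": [{"age": 71, "diabetic": True}, {"age": 80, "diabetic": True}],
    "site_3": [{"age": 59, "diabetic": True}],
}


def is_older_diabetic(record):
    return record["diabetic"] and record["age"] >= 65


older_diabetics = federated_count(sites, is_older_diabetic)
```

Individual-level access would follow a different path, gated by the data use agreements and IRB approval mentioned above.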