DESCRIPTION
Health care data is growing at an explosive rate, driven by highly detailed recordings of physiological processes, high-resolution scanning techniques (e.g. MRI), wireless health monitoring systems, and the move of traditional patient information to Electronic Medical Records (EMR) systems. The challenges in leveraging these huge data resources and transforming them into knowledge for improving patient care include the size of the datasets, multi-modality, and traditional forms of heterogeneity (syntactic, structural, and semantic). In addition, the US NIH is emphasizing more multi-center clinical studies, which increases the complexity of data access, sharing, and integration. In this talk, I explore potential solutions to these challenges that use the semantics of clinical data, both implicit and explicit, together with Semantic Web technologies. I specifically discuss the ontology-driven Physio-MIMI platform for clinical data management in multi-center research studies.
Further Details: http://cci.case.edu/cci/index.php/Satya_Sahoo
Presentation at: Dagstuhl Seminar: Semantic Data Management 2012
Author: Satya S. Sahoo
Awakening Clinical Data: Semantics for Scalable Medical Research Informatics
Satya S. Sahoo Division of Medical Informatics
Electrical Engineering and Computer Science Department Case Western Reserve University
Cleveland, OH, USA
Patient Reports
Polysomnograms 1-20GB each
source: PRISM project, BME dept CWRU
source: NLM and Wikipedia source: Physio-MIMI, PRISM CWRU
source: PRISM project CWRU
500-600MB per patient per stay in EMU
Epilepsy Monitoring Unit (EMU) Data
Pathology Reports, Tissue Bank
National Sleep Research Resource: 500 TB
Case Western EMU: 250 TB
Wireless Health Data source: CWRU School of Engineering
MRI: 50-100MB PET: 60-100MB
MRI, PET scans
143,961 patients per year (e.g. Emory)
~5.6 billion wireless connections and growing
Big Picture of Data in Clinical Research
• Ultra large volume of data and growing rapidly
• Data is Multi-modal, Heterogeneous
• Heterogeneity: Syntactic, Structural, Semantic
Patient Reports
Polysomnograms
source: PRISM project, BME dept CWRU
source: NLM and Wikipedia source: Physio-MIMI, PRISM CWRU
source: PRISM project CWRU
Epilepsy Monitoring Unit (EMU) Data
Pathology Reports, Tissue Bank
Exemplar: Sleep Medicine Research
Wireless Health Data source: CWRU School of Engineering
MRI, PET scans
Scalability in Medical Informatics: Beyond Volume
• Multi-Center Studies with differing administrative requirements – business logic
• Dynamic data – grows over project duration
• Data Semantics as foundation to support a wide spectrum of users – clinicians, nurse practitioners, research fellows
A Wish List for Scalable Clinical Data Management
• Reconcile Data Heterogeneity – most critical to successful translational research
o Syntactic heterogeneity – less of a problem, data dictionaries help
o Structural heterogeneity – problematic, XML somewhat helpful
o Semantic heterogeneity – a huge problem, ontologies to the rescue?
• Provenance – essential for data quality, compliance, insight
o Blood Oxygen Baseline: oxygen saturation during the first 15 or 30 seconds of sleep
o Patient's blood report from last month as the cause of a change in medication – Domain Provenance (not just tuple provenance)
• Intuitive access to information – clinical trials eligibility, cohort identification
• Scalable – data sources, research partners added or removed dynamically
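The Blood Oxygen Baseline item above can be made concrete with a small sketch. It assumes a hypothetical one-sample-per-second SpO2 trace and shows how the same variable name yields different values depending on whether a site uses a 15-second or a 30-second window, and why the definition should travel with the value as domain provenance. The function name, the record layout, and the sample data are illustrative assumptions, not part of Physio-MIMI.

```python
from statistics import mean


def blood_oxygen_baseline(spo2_per_second, window_seconds):
    """Mean SpO2 over the first `window_seconds` of sleep, with its definition attached."""
    if window_seconds <= 0 or window_seconds > len(spo2_per_second):
        raise ValueError("window must fit inside the recorded trace")
    window = spo2_per_second[:window_seconds]
    return {
        "value": round(mean(window), 2),
        # Domain provenance: which definition of "baseline" produced this value.
        "provenance": {
            "definition": f"mean SpO2 over first {window_seconds} s of sleep",
            "window_seconds": window_seconds,
            "samples_used": len(window),
        },
    }


# Hypothetical trace (percent saturation), one sample per second from sleep onset.
trace = [98] * 15 + [94] * 15

site_a = blood_oxygen_baseline(trace, 15)  # site that uses a 15 s window
site_b = blood_oxygen_baseline(trace, 30)  # site that uses a 30 s window

print(site_a["value"], site_a["provenance"]["definition"])
print(site_b["value"], site_b["provenance"]["definition"])
```

Without the provenance record, the two values would look like conflicting measurements of the same quantity; with it, the difference is traceable to the definition used.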
A “not to do” list for Clinical Data Management
• No Linked Open Patient Data – HIPAA, HITECH Act (US), Data Protection Act (UK)
o De-identified data – IRB approval
• Ontology as global schema – but no RDF
o Vast majority of data is stored in RDBs
o Practical issues with RDF – URIs cannot be institution-specific (privacy)
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch
Physio-MIMI: Multi‐Modality, Multi‐Resource Environment for Physiological and Clinical Research
Figure labels: Sleep Domain Ontology; FMA, OGMS …, SNOMED-CT; Any number of new centers; Clinical Researcher
Physio-MIMI: Enabling Scalable Medical Research
• NCRR-funded, multi-CTSA site project: Sleep medicine as exemplar
• Federated data management – scalable, adapts to changing data access policies
• Ontology-driven:
o Data mappings – Ontology class to data dictionary terms (manually curated)
o Drive query interface
o Manage provenance
• Privacy aware, IRB-compliant
• Collaboration among Case Western, U. of Michigan, Marshfield Clinic and U. of Wisconsin, Madison
o Now Harvard Medical School
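A minimal sketch of the federated idea listed above, assuming each center exposes a local query adapter and returns only de-identified aggregates. The `Federation` class, the adapter functions, and the concept name are hypothetical; the slides do not describe the actual Physio-MIMI interfaces.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
SiteQuery = Callable[[str], List[Record]]


class Federation:
    """Registry of site adapters; data stays at each site and is queried in place."""

    def __init__(self) -> None:
        self._sites: Dict[str, SiteQuery] = {}

    def add_site(self, name: str, query_fn: SiteQuery) -> None:
        self._sites[name] = query_fn

    def remove_site(self, name: str) -> None:
        # Removing an unknown site is a no-op, so policy changes are cheap to apply.
        self._sites.pop(name, None)

    def query(self, concept: str) -> List[Record]:
        results: List[Record] = []
        for name, query_fn in self._sites.items():
            for record in query_fn(concept):
                results.append({**record, "site": name})
        return results


# Hypothetical adapters returning de-identified aggregate counts.
def site_a(concept: str) -> List[Record]:
    return [{"concept": concept, "count": 120}]


def site_b(concept: str) -> List[Record]:
    return [{"concept": concept, "count": 75}]


federation = Federation()
federation.add_site("site_a", site_a)
federation.add_site("site_b", site_b)
both = federation.query("SleepApnea")

federation.remove_site("site_b")  # e.g. a center withdraws or its access policy changes
only_a = federation.query("SleepApnea")
```

Because the registry is the only shared state, adding or removing a center does not require migrating or re-loading any data.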
Key Resource: Sleep Domain Ontology (SDO) https://mimi.case.edu/concepts
Data Mappings: SDO to Data Dictionary
Physio-Map Module
• Visual interface
• Stores mappings in XML – moving towards rules
• Dynamically executed in response to user query
User Voting
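To illustrate how an XML-stored mapping could be executed when a user queries an ontology concept, here is a rough sketch. The XML layout, the concept name, and the table and column names are all invented for illustration; the slides do not show the real Physio-Map schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping file: one ontology concept mapped to per-site dictionary terms.
MAPPING_XML = """
<mappings>
  <mapping concept="BodyMassIndex">
    <term site="site_a" table="visit" column="bmi"/>
    <term site="site_b" table="anthropometry" column="body_mass_idx"/>
  </mapping>
</mappings>
"""


def load_mappings(xml_text):
    """Parse the mapping XML into {concept: [(site, table, column), ...]}."""
    root = ET.fromstring(xml_text)
    mappings = {}
    for mapping in root.findall("mapping"):
        terms = [
            (t.get("site"), t.get("table"), t.get("column"))
            for t in mapping.findall("term")
        ]
        mappings[mapping.get("concept")] = terms
    return mappings


def translate(concept, mappings):
    """Rewrite a concept-level request into one query string per site."""
    if concept not in mappings:
        raise KeyError(f"no mapping curated for concept {concept!r}")
    # Identifiers come from the curated mapping file, not from user input.
    return {
        site: f"SELECT {column} FROM {table}"
        for site, table, column in mappings[concept]
    }


mappings = load_mappings(MAPPING_XML)
site_queries = translate("BodyMassIndex", mappings)
for site, sql in site_queries.items():
    print(site, "->", sql)
```

The user only ever names the ontology concept; the site-specific vocabulary is resolved at query time from the curated mappings.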
Provenance: Contextual Metadata for Clinical Research
Slide courtesy: Remo Mueller
Provenance: To Trace Variations in Data and Results
Slide courtesy: Remo Mueller
Modified from slide courtesy: Remo Mueller
Provenance: Source information for Patient Data
Slide courtesy: Remo Mueller
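Source information of this kind can be written down with the W3C PROV-O vocabulary, which the talk later names as the basis for the Sleep Provenance Ontology. The sketch below only uses core PROV-O terms (`prov:Entity`, `prov:Activity`, `prov:Agent`, `prov:used`, `prov:wasGeneratedBy`, `prov:wasAssociatedWith`, `prov:wasDerivedFrom`); the `ex:` namespace and every resource name are hypothetical, and this is not the Sleep Provenance Ontology itself.

```python
PROV = "http://www.w3.org/ns/prov#"
EX = "http://example.org/sleep/"  # hypothetical namespace for illustration only


def to_turtle(triples):
    """Serialize (subject, predicate, object) triples as a small Turtle document."""
    lines = [f"@prefix prov: <{PROV}> .", f"@prefix ex: <{EX}> .", ""]
    lines += [f"{s} {p} {o} ." for s, p, o in triples]
    return "\n".join(lines)


# A derived value, the scoring run that produced it, and the recording it came from.
triples = [
    ("ex:psg_recording_001", "a", "prov:Entity"),
    ("ex:scoring_run_42", "a", "prov:Activity"),
    ("ex:technician_7", "a", "prov:Agent"),
    ("ex:scoring_run_42", "prov:used", "ex:psg_recording_001"),
    ("ex:scoring_run_42", "prov:wasAssociatedWith", "ex:technician_7"),
    ("ex:ahi_value_001", "a", "prov:Entity"),
    ("ex:ahi_value_001", "prov:wasGeneratedBy", "ex:scoring_run_42"),
    ("ex:ahi_value_001", "prov:wasDerivedFrom", "ex:psg_recording_001"),
]

document = to_turtle(triples)
print(document)
```

A domain ontology would typically specialize these generic classes and properties with sleep-specific ones, which is what makes the provenance "domain provenance" rather than tuple-level lineage.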
Intuitive Query Interface: Ontology (SDO)-driven Visual Aggregator and Explorer (VisAgE)
DataSets
Ontology Concept – Type of Query Widget
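One plausible reading of "Ontology Concept – Type of Query Widget" is that metadata on each concept decides which input control the interface renders. The sketch below assumes that reading; the datatype labels, widget names, and concept records are hypothetical and are not taken from VisAgE.

```python
# Hypothetical lookup from a concept's datatype to a query widget type.
WIDGET_BY_DATATYPE = {
    "numeric": "range_slider",
    "categorical": "checkbox_list",
    "boolean": "toggle",
}


def choose_widget(concept):
    """Build a widget specification from a concept's metadata."""
    widget = WIDGET_BY_DATATYPE.get(concept["datatype"], "text_search")
    spec = {"label": concept["label"], "widget": widget}
    if widget == "range_slider":
        spec["min"], spec["max"] = concept["range"]
    elif widget == "checkbox_list":
        spec["options"] = list(concept["values"])
    return spec


# Hypothetical concept records, as they might be read from an ontology.
age = {"label": "Age", "datatype": "numeric", "range": (0, 120)}
sex = {"label": "Sex", "datatype": "categorical", "values": ["female", "male"]}
note = {"label": "Clinical note", "datatype": "text"}

for concept in (age, sex, note):
    print(choose_widget(concept))
```

Driving the interface from concept metadata means a new concept added to the ontology gets a usable query control without interface code changes.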
Physio-MIMI in National Sleep Research Resource
• National Sleep Research Resource (NSRR) – scored and awaiting funding review
• Collaboration between Harvard Medical School (domain experts) and Case Western (CS) with 15 projects
o 50,000 sleep research studies – total size of 500TB
• Semantic Data Integration – SDO and Sleep Provenance Ontology (extending W3C PROV Ontology PROV-O)
• Signal processing tools – using a common format called European Data Format (EDF), XML-based
• Domain analysis, cross-linking – secure Web access
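As background on the signal format named above: an EDF file begins with a 256-byte, fixed-width ASCII header (followed by per-signal headers and the binary samples). The sketch below reads only that first fixed header and runs against a synthetic header built in the example, so the patient and recording strings are made up; it is not a complete EDF reader.

```python
# Fixed-width fields of the first 256 bytes of an EDF header (name, width in bytes).
EDF_FIELDS = [
    ("version", 8),
    ("patient_id", 80),
    ("recording_id", 80),
    ("start_date", 8),   # dd.mm.yy
    ("start_time", 8),   # hh.mm.ss
    ("header_bytes", 8),
    ("reserved", 44),
    ("num_records", 8),
    ("record_duration_s", 8),
    ("num_signals", 4),
]


def parse_edf_header(raw):
    """Decode the fixed 256-byte EDF header into a dict."""
    if len(raw) < 256:
        raise ValueError("EDF fixed header is 256 bytes")
    header, offset = {}, 0
    for name, width in EDF_FIELDS:
        header[name] = raw[offset:offset + width].decode("ascii").strip()
        offset += width
    for key in ("header_bytes", "num_records", "num_signals"):
        header[key] = int(header[key])
    header["record_duration_s"] = float(header["record_duration_s"])
    return header


def _field(value, width):
    """Left-justify a value into a space-padded ASCII field."""
    return str(value).ljust(width).encode("ascii")


# Synthetic header for a 2-signal recording: 256 + 2 * 256 header bytes in total.
values = ["0", "X X X X", "Startdate X X X X", "01.01.12", "22.30.00",
          768, "", 100, 30, 2]
raw_header = b"".join(_field(v, w) for v, (_, w) in zip(values, EDF_FIELDS))

header = parse_edf_header(raw_header)
print(header["num_signals"], header["num_records"], header["record_duration_s"])
```

A common header layout like this is what lets signal processing tools be shared across studies without per-site parsers.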
Challenges: Semantics in Large Scale Clinical Data
• Incentives for adopting RDF in clinical data management – what is not already possible in RDB?
• OWL2, RDFS reasoning – Privacy aware reasoning, semantics-aware access control (Nguyen et al. 2012)
• Missing Semantics?
o Variable, missing provenance in original study – re-create provenance with (limited) provenance?
o Fine-level granularity for semantic annotation of signal data – currently not scalable
• A little semantics does not go too far in clinical data
o Need for greater involvement of Semantic Web community in development of EHR systems
Acknowledgements
• Guo-Qiang Zhang, Remo Mueller, Samden Lhatoo, Susan Redline, Alireza Bozorgi
• Division of Medical Informatics: Lingyun Luo, Joe Teagno, Meng Zhao, Jake Luo, Licong Cui, Chien-Hung Chen, Catherine Jayapandian
• Physio-MIMI Team: http://physiomimi.case.edu/
• Contact Information: [email protected], http://cci.case.edu/cci/index.php/Satya_Sahoo