
Developing Novel Data Architectures for Comparative Effectiveness Research

Health Care Day, Leadership Tampa

April 6, 2011

David A. Fenstermacher, Ph.D.

Chair & Associate Professor

Department of Biomedical Informatics

H. Lee Moffitt Cancer Center & Research Institute

What is Comparative Effectiveness Research?

• Comparative Effectiveness Research

– The generation and synthesis of evidence that compares the benefits and harms of alternative methods to prevent, diagnose, treat and monitor a clinical condition or to improve the delivery of care.

– Provides an opportunity to improve the quality and outcomes of health care by providing more and better information to support decisions by the public, patients, caregivers, clinicians and policy makers.

From: Initial National Priorities for Comparative Effectiveness Research, National Academies Press

CER - Not Without Controversy

Total Cancer Care and Patient Centered Outcomes Research

The Model

“The purpose of comparative effectiveness research (CER) is to provide information that helps clinicians and patients choose which option best fits an individual patient's needs and preferences.”

Federal Coordinating Council for CER (6/30/2009) Key Statements

The Consent Process

Wireless touch-screen tablet

Connects via secure interface and forwards HIPAA-compliant information to database

Consists of IRB Approved:

• Introductory Video

• Consent Video by PI

• Informed Consent

• Signature Capture

• Demographics Survey

Electronic Consenting System

The Total Cancer Care Protocol

• Can we follow you throughout your lifetime?

• Can we study your tumor using molecular technology?

• Can we recontact you?

Partners in the Fight Against Cancer

18 Consortium Sites (including MCC)

88,616 Consented Patients: MCC (61%), Sites (39%)

33,435 Tumors Collected: MCC (37%), Sites (63%)

16,226 Gene Expression Profiles (TCC Consented since inception)

Data Generated from Specimens

CEL Files (Gene Expression Data): 16,226 files

Targeted Exome Sequencing: 4,016 samples

Whole Exome Sequencing (Ovary, Lung, Colon): 535 samples

Whole Genome Sequencing (Melanoma): 13 samples with normal pairs

SNP/CNV (Lung, Breast, Colon): 559 samples

As of 6/01/2012

Total Cancer Care™ to Date

Stratifying Populations for CER

• Stratification means that the investigator has enough knowledge of the population to subdivide the population, and to allocate sampling effort accordingly.

[Figure: Non-Small Cell Lung Cancer population stratified by stage (Stage 2, Stage 3, Stage 4) and by treatment (Treatment A, Treatment B, Treatment C)]
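The stratification idea above can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the talk: the patient records, the `stratify` helper, and the choice of stage and treatment as stratification keys are assumptions that mirror the labels on the slide.

```python
from collections import defaultdict

# Hypothetical patient records. The stage and treatment labels mirror the slide,
# but the individual patients are invented for illustration.
patients = [
    {"id": 1, "stage": "Stage 2", "treatment": "Treatment A"},
    {"id": 2, "stage": "Stage 2", "treatment": "Treatment B"},
    {"id": 3, "stage": "Stage 3", "treatment": "Treatment A"},
    {"id": 4, "stage": "Stage 3", "treatment": "Treatment A"},
    {"id": 5, "stage": "Stage 4", "treatment": "Treatment C"},
]


def stratify(records, keys):
    """Group record IDs into strata defined by the values of the given keys."""
    strata = defaultdict(list)
    for record in records:
        stratum = tuple(record[k] for k in keys)
        strata[stratum].append(record["id"])
    return dict(strata)


strata = stratify(patients, ["stage", "treatment"])
for stratum, ids in sorted(strata.items()):
    # Stratum sizes show where sampling effort would need to be allocated.
    print(stratum, len(ids))
```

Small or empty strata are the practical signal here: they indicate subpopulations for which a comparison between treatments would rest on too few patients.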

Molecular Stratification

• Molecular technologies

– Genomics/Transcriptomics

– Proteomics

– Metabolomics

Leveraging Data for Patient Centered Outcomes Research

• Observational Clinical Data

– Must assess a comprehensive array of health-related outcomes for diverse patient populations

– Interventions may compare medications, procedures, medical and assistive devices and technologies, diagnostic testing, behavioral change, and delivery system strategies

– This research necessitates the development, expansion, and use of a variety of data sources and methods to assess comparative effectiveness and actively disseminate the results

Issues Curtailing Patient Centered Outcomes Research

• The information gap

– Partially due to how the data are collected, whether by electronic medical records that contain a mixture of discrete and unstructured data or in paper format. A recent survey of U.S. hospitals revealed that only 12% of respondents have a comprehensive EMR, and only an additional 6% of clinician offices have an EHR [1,2].

– An additional hurdle is that only a small portion of patients are ever enrolled in studies that strive to capture information on risk factors, quality of life or other patient-centric parameters that will be essential to supporting personalized medicine.

[1] DesRoches et al., 2008. New England Journal of Medicine 359(1):50-60

[2] Jha et al., 2010. Health Affairs (Millwood) 29(10):1951-1957

Issues Curtailing Patient Centered Outcomes Research

• Although many nomenclatures and data standards exist (SNOMED CT, ICD-9-CM, MedDRA, LOINC, and GO) and are integrated through enterprise vocabulary systems, few healthcare organizations have created enterprise data governance strategies to adopt these standards across their information technology infrastructure.

Issues Curtailing Patient Centered Outcomes Research

• Data describing the lineage and transformation of clinical and research data, once moved from primary data systems (e.g., EMR or LIMS), rarely exist in formats consumable by clinicians, researchers and patients. Also, the lack of data quality standards poses significant challenges for the interpretation and usability of the data.

• No national healthcare ID; patient mobility

Issues Curtailing Patient Centered Outcomes Research

• The architecture of health information systems will be critical to sharing data between healthcare providers. Such sharing facilitates personalized medicine and patient centered outcomes research, and supplies the information necessary to develop evidence-based guidelines. The two main architectures currently in use are the centralized and the federated data models.

The Federated Network Data Model
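As a rough sketch of the difference between the two architectures, the Python below answers the same counting question both ways. Everything here is an assumption made for illustration (the site names, the `stage` field, and the three helper functions); it shows the general pattern only and is not a description of the network presented in the talk.

```python
from collections import Counter

# Hypothetical site-local datasets. In a federated model these rows stay at the site.
SITE_DATA = {
    "site_a": [{"stage": "Stage 2"}, {"stage": "Stage 3"}],
    "site_b": [{"stage": "Stage 3"}, {"stage": "Stage 4"}, {"stage": "Stage 3"}],
}


def local_count(rows, field):
    """Runs at each site: returns aggregate counts only, never patient rows."""
    return Counter(row[field] for row in rows)


def federated_count(site_data, field):
    """Coordinator: sends the same query to every site and merges the aggregates."""
    total = Counter()
    for rows in site_data.values():
        total += local_count(rows, field)
    return total


def centralized_count(site_data, field):
    """Centralized alternative: pool all rows into one store, then query once."""
    warehouse = [row for rows in site_data.values() for row in rows]
    return Counter(row[field] for row in warehouse)


fed = federated_count(SITE_DATA, "stage")
cen = centralized_count(SITE_DATA, "stage")
print(fed == cen, dict(fed))
```

Both routes give the same counts for a simple aggregate. The trade-off is in what has to move: the centralized route copies patient-level rows, while the federated route moves only the query and its summary result.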

Moffitt and CER

• Creating CER Infrastructure based on Total Cancer Care Model

– Enhance the Total Cancer Care Informatics Infrastructure

– Capitalize on biomedical informatics, biostatistics, clinical trials and information technology expertise

– Assess evolving CER infrastructure using pilot projects

Research Information Exchange

Data Warehouse Enhancements

CER Data Mart

Creating CER Semantics

[Diagram labels: Infrastructure: Hardware & Software; Information Science; Research Processes (CER); Overall Goals; Physical Metadata; Contextual Metadata]

• Metadata is simply data about data

• Distinct classes of metadata are required within a data warehouse (DW) environment

• Two main classes of metadata

– Contextual: relating to the research processes

– Physical: relating to the DW infrastructure (data lineage, data transformations, etc.)
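One way to picture the two metadata classes is as separate records attached to each data element. The sketch below is illustrative only: the class names follow the slide, but every field (`research_question`, `definition`, `source_system`, `transformations`) and the example element are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ContextualMetadata:
    """Relates a data element to the research process (hypothetical fields)."""
    research_question: str
    definition: str


@dataclass
class PhysicalMetadata:
    """Relates a data element to the DW infrastructure (hypothetical fields)."""
    source_system: str
    transformations: list = field(default_factory=list)

    def lineage(self):
        """Return the lineage as a readable chain from source system to DW."""
        return " -> ".join([self.source_system, *self.transformations, "DW"])


@dataclass
class DataElement:
    name: str
    contextual: ContextualMetadata
    physical: PhysicalMetadata


element = DataElement(
    name="tumor_stage",
    contextual=ContextualMetadata(
        research_question="Compare outcomes by stage",
        definition="Stage recorded at diagnosis",
    ),
    physical=PhysicalMetadata(
        source_system="EMR",
        transformations=["extract", "map to standard codes"],
    ),
)
print(element.physical.lineage())
```

Keeping the two classes apart lets a researcher read the contextual record without needing to understand the warehouse, while the physical record preserves the lineage needed to audit a value.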

The Moffitt Data Dictionary

Conceptual Domain: Agent

Data Element Concept: Chemopreventive Agent Name

Data Element: Chemopreventive Agent Name

Value Domain: CTEP Drug Names

Valid Values:

• Cyclooxygenase Inhibitor

• Doxercalciferol

• Eflornithine

• …

• Ursodiol

The ISO/IEC 11179 Model

MCC data dictionary, built using ISO/IEC 11179 metadata standards
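The dictionary entry above can be expressed as a small data structure with a validity check. This is a simplified sketch that follows the slide's labels rather than the full ISO/IEC 11179 metamodel; the class layout and the `is_valid` method are assumptions, and the valid-value set contains only the values visible on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueDomain:
    name: str
    valid_values: frozenset


@dataclass(frozen=True)
class DataElement:
    conceptual_domain: str
    data_element_concept: str
    name: str
    value_domain: ValueDomain

    def is_valid(self, value):
        """True if the value belongs to this element's value domain."""
        return value in self.value_domain.valid_values


ctep_drug_names = ValueDomain(
    name="CTEP Drug Names",
    # Only the values shown on the slide; the slide elides the rest of the list.
    valid_values=frozenset(
        {"Cyclooxygenase Inhibitor", "Doxercalciferol", "Eflornithine", "Ursodiol"}
    ),
)

agent_name = DataElement(
    conceptual_domain="Agent",
    data_element_concept="Chemopreventive Agent Name",
    name="Chemopreventive Agent Name",
    value_domain=ctep_drug_names,
)

print(agent_name.is_valid("Ursodiol"), agent_name.is_valid("Unknown Agent"))
```

Binding each data element to a named value domain is what allows incoming values to be checked mechanically instead of by convention.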

SNOMED CT, ICD-9-CM, MedDRA, LOINC, GO

MCC

CER Data Dictionary

Unlocking Clinical Data

• Natural Language Processing

• The EMR is a mixture of data

– Discrete data

– BLOBs and CLOBs (text documents, .pdf)

– Images – scanned (medical history)
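To give a feel for how free text can be turned into discrete data, here is a deliberately simple dictionary lookup in Python. It is a stand-in for a real NLP pipeline, not the system used at Moffitt: the lexicon, the placeholder concept IDs (which are not real ontology codes), and the `extract_concepts` function are all hypothetical, and the sketch ignores negation and context.

```python
import re

# Hypothetical lexicon: surface phrase -> placeholder concept ID (not real codes).
LEXICON = {
    "non-small cell lung cancer": "CONCEPT:0001",
    "shortness of breath": "CONCEPT:0002",
    "smoker": "CONCEPT:0003",
}


def extract_concepts(note, lexicon=LEXICON):
    """Find lexicon phrases in a free-text note and return them as discrete hits."""
    # Lowercase and collapse whitespace so phrase matching is robust to layout.
    text = re.sub(r"\s+", " ", note.lower())
    found = []
    for phrase, concept_id in lexicon.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for match in re.finditer(pattern, text):
            # Offsets refer to the normalized text, not the original note.
            found.append(
                {"concept": concept_id, "phrase": phrase, "start": match.start()}
            )
    return sorted(found, key=lambda hit: hit["start"])


note = (
    "Former smoker. Presents with shortness of  breath; "
    "biopsy confirms non-small cell lung cancer."
)
hits = extract_concepts(note)
for hit in hits:
    print(hit["concept"], hit["phrase"])
```

An ontology-based system would add what this sketch lacks: synonym handling, negation detection, and mapping to codes from a standard terminology.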

Displaying Ontological-Based NLP Results

Accessing a Wealth of Data

• Effectiveness Score

– A quality metric derived from the data quality project (Attribute Score)

– A measurement of that data element’s correlation to a defined outcome variable

– ES scores can be used to simply evaluate the univariate effectiveness for each element or serve as the input data set for advanced multivariate comparative effectiveness analysis and CER modeling.

Attribute Score

• Created a Data Quality Metrics Framework, a scoring system that provides percent weights and scores for each element and for each data quality attribute
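The slide does not say how weights and scores are combined, so the following Python sketch assumes the simplest reading: a weighted average of per-attribute scores, with percent weights that sum to 100. The attribute names (`completeness`, `validity`, `timeliness`), the example numbers, and the `attribute_score` function are hypothetical.

```python
def attribute_score(scores, weights):
    """Weighted data-quality score for one data element.

    scores:  attribute name -> score in [0, 1]
    weights: attribute name -> percent weight; weights must sum to 100
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same attributes")
    if abs(sum(weights.values()) - 100.0) > 1e-9:
        raise ValueError("percent weights must sum to 100")
    return sum(scores[name] * weights[name] for name in scores) / 100.0


# Hypothetical data quality attributes and values for a single element.
weights = {"completeness": 50, "validity": 30, "timeliness": 20}
scores = {"completeness": 0.9, "validity": 0.8, "timeliness": 0.5}

as_value = attribute_score(scores, weights)
print(round(as_value, 3))
```

Under this assumption an element that is complete but stale is penalized in proportion to the weight given to timeliness, which makes the weighting itself a governance decision.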

E-Score Algorithm

• $ES_i = P(R^i_p \mid U \le p_i, H_0) \cdot P(R^i_a \mid W \ge a_i, H_0)$, where $W$ is a random variable following the empirical distribution of the AS: $P(W \ge A) = (\#\text{ of AS} \ge A)/N$. $U(x)$ is a test statistic of the data of the i-th element ($x$) such that $0 \le U(\cdot) \le 1$ and $P(U(X) \le a) = a$ if the i-th element is not significant.

• Interpretation:

– $P(R^i_p \mid U \le p_i, H_0)$ is the probability that the i-th element has the highest significance conditional on all the uniformly optimal elements.

– $P(R^i_a \mid W \ge a_i, H_0)$ is the probability that the i-th element has the largest AS conditional on all the uniformly optimal elements.

Conclusion

$ES_i = \frac{1-(1-p_i)^M}{M\,p_i} \cdot \frac{1-P(W<a_i)^M}{M\,\bigl(1-P(W<a_i)\bigr)}$
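The closed-form expression above is straightforward to evaluate. The Python sketch below does so under stated assumptions: the slide does not define M, so it is treated here as a positive integer supplied by the caller (plausibly the number of elements being compared); p_i is taken to be the observed value of the statistic U for element i; and P(W < a_i) is estimated from the empirical list of attribute scores. The function name and the example numbers are hypothetical.

```python
def effectiveness_score(p_i, a_i, attribute_scores, m):
    """Evaluate the closed-form ES_i from the slide.

    p_i: observed significance statistic for element i, in [0, 1]
    a_i: attribute score (AS) of element i
    attribute_scores: all AS values, defining the empirical distribution of W
    m: the exponent M in the formula (assumed to be a positive integer)
    """
    if not 0.0 <= p_i <= 1.0:
        raise ValueError("p_i must lie in [0, 1]")
    if m < 1:
        raise ValueError("m must be a positive integer")
    if not attribute_scores:
        raise ValueError("attribute_scores must not be empty")

    n = len(attribute_scores)
    # Empirical P(W < a_i): fraction of attribute scores strictly below a_i.
    q = sum(1 for score in attribute_scores if score < a_i) / n

    # Each factor tends to 1 in the limit (p_i -> 0, q -> 1), so use that
    # value instead of dividing by zero.
    p_term = 1.0 if p_i == 0.0 else (1.0 - (1.0 - p_i) ** m) / (m * p_i)
    a_term = 1.0 if q == 1.0 else (1.0 - q ** m) / (m * (1.0 - q))
    return p_term * a_term


# Hypothetical attribute scores for four data elements.
all_scores = [0.5, 0.6, 0.7, 0.8]
es = effectiveness_score(p_i=0.5, a_i=0.8, attribute_scores=all_scores, m=4)
print(es)
```

Both factors lie between 1/M and 1, so an element scores close to 1 only when it combines a small p_i with an attribute score near the top of the empirical distribution.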

Data Representation

Information from the CER data model can be retrieved and displayed in several formats. The data model includes tables that hold information about CER Projects, along with Milestone and Participation data; these can be displayed using SQL queries against the database and BIRT-generated reports. The Cmap node links can launch “on demand” reports or present various preformatted documents such as PDF docs, Excel spreadsheets, etc.
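A minimal sketch of such a data model, using Python's built-in `sqlite3` module, is shown below. The table names, column names, and sample rows are all hypothetical; the actual CER schema is not given in the slides, and the BIRT reporting layer is not shown.

```python
import sqlite3

# In-memory database with a hypothetical project / milestone / participation schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE project (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL
    );
    CREATE TABLE milestone (
        id INTEGER PRIMARY KEY,
        project_id INTEGER NOT NULL REFERENCES project(id),
        name TEXT NOT NULL,
        done INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE participation (
        project_id INTEGER NOT NULL REFERENCES project(id),
        person TEXT NOT NULL
    );
    INSERT INTO project (id, title) VALUES (1, 'Pilot project 1');
    INSERT INTO milestone (project_id, name, done)
        VALUES (1, 'Data mart loaded', 1), (1, 'Report published', 0);
    INSERT INTO participation (project_id, person)
        VALUES (1, 'Investigator A'), (1, 'Analyst B');
    """
)

# One summary row per project: milestone totals, completed milestones, participants.
rows = conn.execute(
    """
    SELECT p.title,
           (SELECT COUNT(*) FROM milestone m WHERE m.project_id = p.id),
           (SELECT COUNT(*) FROM milestone m
             WHERE m.project_id = p.id AND m.done = 1),
           (SELECT COUNT(*) FROM participation t WHERE t.project_id = p.id)
    FROM project p
    ORDER BY p.id
    """
).fetchall()

for title, milestones, completed, participants in rows:
    print(title, milestones, completed, participants)
```

A reporting tool would run queries of this kind and render the result; the correlated subqueries keep the milestone and participant counts from multiplying each other, as they would in a naive double join.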

Challenges for CER

• To improve patient outcomes and safety, new information management systems built on semantic interoperability are required

• To realize evidence-based approaches, regional consortia are needed that can collect patient-level data (clinical, environmental, risk factor, molecular, and outcomes), focus on specific classes of disease, develop research methodologies, create validation networks and encourage partnerships with industry leaders

Challenges for Patient Centered Outcomes Research

• Initiatives in comparative effectiveness research need to be developed, as validation through clinical trials is not scalable and does not necessarily reflect standard of care where the care is being given

• Data sharing and privacy policies need to become global rather than regional to support Patient Centered Outcomes Research

Our Mission and Vision

To contribute to the prevention and cure of cancer

&

To be the leader in the discovery, translation, and delivery of personalized cancer care