Getting Clinical Data for Research:
Columbia’s Clinical Data Warehouse
Adam Wilcox, PhD
Associate Professor of Biomedical Informatics
Secondary Data Analysis of Electronic Clinical Data
Benefits
◦ Unobtrusive
◦ Fast & inexpensive
◦ Easy
Challenges
◦ Availability
◦ Quality
◦ Security
What data are available?
How good are the data?
How do I get data?
What’s the worst that can happen?
Questions
What’s the worst that can happen?
HIPAA PHI
◦ Names
◦ MRNs
◦ Addresses
◦ Telephone and fax #s
◦ SSNs
◦ Email addresses
◦ Dates
◦ Certificate numbers
◦ Employers' names/addresses
◦ Geographic subdivisions smaller than state, except initial 3 digits of zip code
◦ Account #s
◦ URLs
◦ IP addresses
◦ Biometric identifiers
◦ Full face photographs
◦ Any other characteristics that may be used individually or in combination to identify the individual
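The identifier list above can be sketched as a field filter. This is only a minimal illustration, not a complete de-identification procedure: the field names (`mrn`, `zip`, `birth_date`, and so on) are hypothetical and will differ from any real warehouse schema, dates are simply dropped, ZIP codes are assumed to be stored as strings, and a field list cannot catch "any other characteristics" or identifiers buried in free-text notes.

```python
# Hypothetical field names, chosen for illustration only.
DIRECT_IDENTIFIER_FIELDS = {
    "name", "mrn", "address", "phone", "fax", "ssn", "email",
    "certificate_number", "employer", "account_number", "url",
    "ip_address", "biometric_id", "photo",
}
# Dates are on the identifier list; this sketch drops them entirely.
DATE_FIELDS = {"birth_date", "admit_date", "discharge_date"}


def strip_identifiers(record: dict) -> dict:
    """Return a copy of `record` without the listed identifier fields.

    The ZIP code is reduced to its initial 3 digits and stored as `zip3`.
    The input record is not modified.
    """
    cleaned = {}
    for key, value in record.items():
        if key in DIRECT_IDENTIFIER_FIELDS or key in DATE_FIELDS:
            continue  # drop the identifier
        if key == "zip":
            # Keep only the initial 3 digits (assumes a string ZIP).
            cleaned["zip3"] = str(value)[:3]
            continue
        cleaned[key] = value
    return cleaned


if __name__ == "__main__":
    example = {
        "name": "Jane Doe",
        "mrn": "123",
        "zip": "10032",
        "birth_date": "1970-01-01",
        "diagnosis": "E11.9",
        "sex": "F",
    }
    print(strip_identifiers(example))
```

A filter like this only covers structured columns; whether a resulting data set counts as de-identified is still a decision for the approval process, not for the code.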
Notification of Breach
◦ If more than 500 patients, HHS also notified
◦ Media
Civil penalties
◦ Up to $250,000
◦ Repeat violations up to $1.5M
HITECH Penalties
What’s the worst that can happen?
1994: Created, sponsored by Columbia University Department of Medical Informatics and Office of Clinical Trials
◦ Populated with data from existing clinical data repository
◦ Supporting clinical research
1998: Columbia + Cornell = NewYork-Presbyterian Hospital
◦ Warehouse funded by NYPH
◦ Goal to incorporate and provide data across whole system
2004: Formal analysis of CDW user needs by Clinical Quality and Information Technology Committee (CQIT)
◦ Creation of Data Warehousing Subgroup
◦ Need to bring together disparate clinical data sources
◦ Need to manage user requests for data
CUMC/NYP Clinical Data Warehouse History
◦ Patient demographics
◦ Visit history
◦ Diagnoses
◦ Procedures
◦ Vital signs
◦ Medications
◦ Flowsheet elements, structured notes (Notes)
What data are available?
[Chart: Patients and Visits, by year, 1985 to 2011; y-axis: 0 to 1,400,000; series: Patients, Visits]
[Chart: Patient Ages (visits in last year); x-axis: age, 1 to 97; y-axis: 0 to 35,000]
[Chart: Race/Ethnicity; categories: Asian, Black/Non-Hispanic, Declined, White Hispanic, American Indian, Other, Pacific Islander, Unknown, White/Non-Hispanic, Black Hispanic]
[Chart: Sex; categories: Female, Male]
[Chart: Visit Types (last 5 years); categories: Ambulatory Surgery, Clinic, DPO visit, ED, Inpatient, Therapy; y-axis: 0 to 4.5 million]
Data type                        Count
Diagnoses                        3.3M
Procedures                       570K
Lab tests
Medications                      1.5M
Vital signs                      ~80% of patients
Flowsheet/structured elements    400M
Notes                            6.3M
Other Data (in last year)
1: Gain access to data (to be updated in coming weeks)
◦ I have a WebCIS login (Y/N)
◦ Submit HIPAA D preparatory to research forms
◦ Receive HIPAA approval
◦ Fill out DISCOVERY form to request data
◦ Contact Adam Wilcox
2: Explore data using tools & select variables
◦ Top 50 Variables List & Meaningful Use variables
◦ De-identified databases: RedEx, I2B2*
◦ Other**
◦ Pin down key variables to submit via DISCOVERY
3: Request & refine data from Clinical Data Warehouse (CDW)
◦ What level of identifying patient information are you requesting? De-identified, Limited***, Identifiable
◦ Covered by HIPAA G§
◦ Fill out HIPAA B
◦ Receive HIPAA approval
◦ Submit IRB & receive approval
◦ Work with programmer to refine data
4: Data management & analysis
◦ Receive data set
◦ Import & manage data for analysis using: SAS, Stata, REDCap, AMALGA, Other
◦ Share results with CER Studio regarding findings & DISCOVERY process
◦ Loop back to DISCOVERY for approval to publish data and findings
How do I get data?
DISCOVERY
WICER Community Survey
Household Surveys
Community Outreach Center
Ambulatory Clinics
Existing Studies 8,000+ surveys
WICER Research Data Warehouse
Research Data Explorer (RedX)
I2b2 Workbench
WICER CER Studio
Identify priority disparity areas for CER
Integrate statistical expertise via preliminary studies
Validation analyses on cost and service utilization
Identify high-risk physical & mental comorbidities
Integration of data
Collection and storage of patient-reported data
Identify individuals based upon eligibility criteria
EHR plug-in
Informatics tools to support data retrieval
Intervention delivery
De-identify and link datasets
What data are available?
How good are the data?
How do I get data?
What’s the worst that can happen?
Questions