February 22, 2005: I. Sim, EHRs and Research, Epi 206: Medical Informatics
Electronic Health Records for Clinical Research
Ida Sim, MD, PhD
Division of General Internal Medicine, and Program in Biological and Medical Informatics
UCSF
February 22, 2005
Copyright Ida Sim, 2005. All federal and state rights reserved for all original material presented in this course through any medium, including lecture or print.
The Promise
• Electronic health records (EHRs)
  – clinical benefits
    • reduction in medical errors, prescription errors
    • supports quality improvement programs
  – research benefits
    • “Frankly, one of the biggest attractions to LastWord (now called CareCast) is going to be a boon to clinical research. Information will be accessible in a much more uniform and complete way.” ex-Dean Haile Debas, Daybreak, Feb. 2, 2001
• UCSF spending $50 mil on CareCast
• How real is the promise of EHRs for research?
Learning Objectives
• Understand key properties of useful electronic health records and data warehousing
  – free vs. coded entry
  – importance of a standardized clinical vocabulary
• Understand implications of database technologies on clinical research
• Be familiar with basic concepts in data security and privacy (time permitting)
Outline
• Example Study
  – a single-institution outcomes research question
• Electronic Health Records (EHRs)
  – relational databases
  – vocabulary
• Data Warehousing
• (Security and Privacy)
An Outcomes Research Project
• Retrospective analysis
• Compare 1-year re-admission rate for acute MI for
  – diabetics admitted with acute MI, discharged
    • on β-blockers
    • not on β-blockers
• First acute MI in 2000 to 2002, follow-up to 2004
Study Steps
• Find diabetics admitted with AMI, 2000 to 2002
• Find whether D/C’ed on β-blocker
• For these patients, find all re-admissions in the year following the index MI
  – identify re-admissions that were for acute MI
• Analyze
  – predictor = β-blocker status
  – primary outcome = acute MI readmission rate
  – secondary outcome = length of stay (LOS)
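The Analyze step can be sketched in a few lines of Python. This assumes the earlier steps have already produced one record per diabetic patient, holding the index-admission discharge date, a discharge β-blocker flag, and any later admissions. The record layout and the five patients are invented for illustration, and the output is a crude proportion per group, not the full statistical analysis.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class IndexAdmission:
    """A diabetic patient's first acute MI admission (hypothetical record layout)."""
    patient_id: str
    discharge: date
    on_beta_blocker: bool
    # Later admissions as (admit date, principal discharge diagnosis) pairs.
    later_admissions: list = field(default_factory=list)


def readmitted_for_ami(idx: IndexAdmission, window_days: int = 365) -> bool:
    """True if any later admission for acute MI starts within the window after discharge."""
    end = idx.discharge + timedelta(days=window_days)
    return any(idx.discharge < admit <= end and dx == "Acute MI"
               for admit, dx in idx.later_admissions)


def readmission_rates(cohort):
    """Fraction of each beta-blocker group readmitted for acute MI within the window."""
    rates = {}
    for status in (True, False):
        group = [a for a in cohort if a.on_beta_blocker == status]
        if group:
            rates[status] = sum(readmitted_for_ami(a) for a in group) / len(group)
    return rates


# Invented example cohort.
cohort = [
    IndexAdmission("p1", date(2000, 1, 12), True, [(date(2000, 6, 1), "Acute MI")]),
    IndexAdmission("p2", date(2001, 3, 4), True),
    IndexAdmission("p3", date(2001, 5, 20), False, [(date(2001, 9, 2), "Acute MI")]),
    IndexAdmission("p4", date(2002, 2, 16), False,
                   [(date(2002, 3, 1), "COPD"), (date(2002, 11, 30), "Acute MI")]),
    # Readmitted for MI, but after the one-year window, so not counted.
    IndexAdmission("p5", date(2000, 8, 10), False, [(date(2001, 12, 1), "Acute MI")]),
]
rates = readmission_rates(cohort)
print(rates)  # {True: 0.5, False: 0.6666666666666666}
```

The one-year window is a parameter (`window_days`), so the same function could check other follow-up periods.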
Data Needed for β-Blocker Study
• Data needed
  – admission: Admission Discharge Transfer (ADT) system
  – diabetes diagnosis: chart, HgbA1C
  – MI diagnosis: chart, troponins, EKG readings
    • or just trust coding of admission diagnosis?
  – β-blocker usage: orders, pharmacy
• Existing (legacy) systems
  – claims, pharmacy, ADT, lab, xray, med record, etc.
Health System Minnesota: 50 paper, 50 computer systems; 200,000 lives, 460 physicians
Data Collection Method
Method | Pros | Cons
Chart Review | |
Electronic Health Record | |
What is an EHR?
• EHR provides individual patient data for
  – real-time clinical care
  – reimbursement (e.g., for E&M coding)
  – see table for major functionality dimensions
• Clinical workstation includes interfaces to
  – practice management systems
  – pharmacy benefit management
  – knowledge resources (e.g., WWW, guidelines)
• “EHRs” range from flat-file, text-based systems to full-featured workstations
8 Types of EHR Functionality
Functionality | Description
Viewing | Electronic viewing of chart notes, problem and medication lists, discharge summaries, laboratory results, and radiology results.
Documentation | Entry of visit note and other information into the EMR, whether through dictation or direct keyboard entry.
Order Entry | Electronic physician order entry of drug prescriptions, laboratory tests, radiology studies, or referrals.
Care Planning and Management | Managing patients in disease management programs, such as for asthma or congestive heart failure.
Patient-Directed | Patient education materials, web-based education modules, self-diagnosis algorithms, patient viewing of EMR data, and e-mail with care providers.
Billing and Other Administrative | Determination of insurance eligibility, assistance with visit level coding, management and tracking of referrals.
Performance Reporting | Quality and utilization reporting to both internal and external audiences.
Messaging | E-mail or other messaging system among providers and staff within the organization, or to external organizations.
Critical EHR Features
• Physician friendliness
  – if docs won’t use it, it won’t help research
• What data it contains
• How that data is stored (and retrieved)
• Security
• Cost, maintenance, technical support, etc.
Physician Friendliness
• Workflow compatible
  – portable
• Easy data entry
  – voice recognition
  – pen-based (PDAs)
  – digital ink
• Preserves doctor-patient relationship
• Secure
(Figure: Fujitsu 510)
EHR Structure and Contents
• Structure: should store contents in relational form
  – lots of one-to-many relationships in clinical care
• Contents
  – type of content
    • real-time clinical care: notes, orders, labs, prescriptions, xray (reports)...
    • administration: demographic, billing, provider IDs...
    • research: standardized data collection, symptom scales, etc.
  – form of content
    • free text versus coded entries
Relational Admissions Database
Master Table
ID | Name | Sex | Birthdate | Insurance
000-01-001 | Lee | M | 09-Jul-00 | B/T Healthnet
000-01-002 | Smith | F | 22-Oct-25 | Medicare
000-01-003 | Perez | F | 13-Jun-57 | B/T Pacificare
Admission Number Table
Adm# | ID | Admit Date | Discharge Date
001 | 000-01-001 | 31-Dec-94 | 12-Jan-95
002 | 000-01-001 | 27-Mar-96 | 31-Mar-96
003 | 000-01-002 | 03-Feb-95 | 16-Feb-95
004 | 000-01-002 | 27-Feb-95 | 20-Mar-95
005 | 000-01-003 | 19-Nov-97 | 23-Nov-97
Admission Table
Adm# | Admit Service | Admit Diagnosis | Principal Discharge Diagnosis
001 | Med | Acute MI | Acute MI
002 | Med | COPD | Pneumonia
003 | Surg | THR | THR
004 | Med | Acute MI | Acute MI
005 | Gyn | Menorrhagia | von Willebrand's
Secondary Discharge Diagnosis Table
Admission # | Secondary Discharge Diagnosis
001 | COPD
001 | Diabetes
002 | COPD
003 | Acute MI
004 | VF Arrest
005 | Diabetes
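To make the one-to-many links concrete, here is a minimal sketch that loads the four tables into an in-memory SQLite database with Python's standard `sqlite3` module and joins them to find acute MI admissions in patients with diabetes. The table and column names are simplified renderings of the slide, not a real UCSF schema, and admission dates are rewritten in ISO format so they sort correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE master (id TEXT PRIMARY KEY, name TEXT, sex TEXT,
                     birthdate TEXT, insurance TEXT);
-- One patient can have many admissions (one-to-many).
CREATE TABLE admission_number (adm TEXT PRIMARY KEY, id TEXT REFERENCES master(id),
                               admit_date TEXT, discharge_date TEXT);
CREATE TABLE admission (adm TEXT PRIMARY KEY REFERENCES admission_number(adm),
                        admit_service TEXT, admit_dx TEXT, principal_dx TEXT);
-- One admission can have many secondary discharge diagnoses (one-to-many).
CREATE TABLE secondary_dx (adm TEXT REFERENCES admission_number(adm), dx TEXT);
""")
conn.executemany("INSERT INTO master VALUES (?, ?, ?, ?, ?)", [
    ("000-01-001", "Lee", "M", "09-Jul-00", "B/T Healthnet"),
    ("000-01-002", "Smith", "F", "22-Oct-25", "Medicare"),
    ("000-01-003", "Perez", "F", "13-Jun-57", "B/T Pacificare"),
])
# Admission dates rewritten as ISO 8601 so they sort correctly.
conn.executemany("INSERT INTO admission_number VALUES (?, ?, ?, ?)", [
    ("001", "000-01-001", "1994-12-31", "1995-01-12"),
    ("002", "000-01-001", "1996-03-27", "1996-03-31"),
    ("003", "000-01-002", "1995-02-03", "1995-02-16"),
    ("004", "000-01-002", "1995-02-27", "1995-03-20"),
    ("005", "000-01-003", "1997-11-19", "1997-11-23"),
])
conn.executemany("INSERT INTO admission VALUES (?, ?, ?, ?)", [
    ("001", "Med", "Acute MI", "Acute MI"),
    ("002", "Med", "COPD", "Pneumonia"),
    ("003", "Surg", "THR", "THR"),
    ("004", "Med", "Acute MI", "Acute MI"),
    ("005", "Gyn", "Menorrhagia", "von Willebrand's"),
])
conn.executemany("INSERT INTO secondary_dx VALUES (?, ?)", [
    ("001", "COPD"), ("001", "Diabetes"), ("002", "COPD"),
    ("003", "Acute MI"), ("004", "VF Arrest"), ("005", "Diabetes"),
])

# Acute MI admissions of patients with diabetes recorded on any admission.
rows = conn.execute("""
    SELECT m.name, a.adm, n.admit_date
    FROM master m
    JOIN admission_number n ON n.id = m.id
    JOIN admission a ON a.adm = n.adm
    WHERE a.principal_dx = 'Acute MI'
      AND m.id IN (SELECT n2.id FROM admission_number n2
                   JOIN secondary_dx s ON s.adm = n2.adm
                   WHERE s.dx = 'Diabetes')
    ORDER BY n.admit_date
""").fetchall()
print(rows)  # [('Lee', '001', '1994-12-31')]
```

Smith's admission 004 is also an acute MI admission, but it is excluded because diabetes is never recorded for her; the query can only find what was entered.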
What Goes Into the Table Cells?
• Column names help narrow the meaning
  – “Pneumonia” entry in AdmitDiagnosis column, vs.
  – “Pneumonia” entry in PastHistory column
• Free text vs. coded entries
  – “Pneumococcal pneumonia”, “Pn PNA”, or “RLL PNA”
  – Pneumonia: Yes, Organism: Pneumococcus, Location: Right Lower
Admission Table
Adm# | Admit Service | Admit Diagnosis | Principal Discharge Diagnosis
001 | Med | Acute MI | Acute MI
002 | Med | PNA-1 | Pneumonia
003 | Surg | THR | THR
004 | Med | Acute MI | Acute MI
005 | Gyn | Menorrhagia | von Willebrand's
Pneumonia Table
PNA # | Organism | Location
PNA-1 | Pneumococcus | Right Lower
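A short Python sketch of why the form of the entry matters: a keyword search over free text has to anticipate every way the concept might be written, whereas coded entries can be filtered field by field. The strings come from the slide; the dictionary layout for the coded entries is an illustrative assumption.

```python
import re

# Free-text admitting diagnoses, written the way the slide's examples are.
free_text = ["Pneumococcal pneumonia", "Pn PNA", "RLL PNA", "Acute MI", "COPD exacerbation"]

# A naive keyword search misses the abbreviated entries.
naive_hits = [t for t in free_text if "pneumonia" in t.lower()]

# Adding the abbreviation "PNA" catches them, but every synonym and
# abbreviation has to be anticipated by hand, and "RLL" is still just text.
pattern = re.compile(r"\b(pneumonia|pna)\b", re.IGNORECASE)
regex_hits = [t for t in free_text if pattern.search(t)]

# Coded entries: each attribute lives in its own field, so the query is exact
# and can filter on organism or location as well.
coded = [
    {"pneumonia": True, "organism": "Pneumococcus", "location": "Right Lower"},
    {"pneumonia": True, "organism": None, "location": "Right Lower"},
    {"pneumonia": False, "organism": None, "location": None},
]
right_lower = [r for r in coded if r["pneumonia"] and r["location"] == "Right Lower"]

print(len(naive_hits), len(regex_hits), len(right_lower))  # 1 3 2
```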
Standardization of Clinical Terms
• A term is a designation of a concept or an object in a specific vocabulary
  – e.g., English blood = German Blut
• Standardization required for communication
  – acts like a dictionary
• DGIM tried to use STOR to pull out all CHF patients for a quality improvement program, but the terms used were too varied
  – CHF, LVF, heart failure, etc.
• Vocabularies are collections of terms
  – general standardized: ICD-9, CPT, MeSH
  – research-domain specific: for cancer, diabetes, etc.
  – your own data dictionary
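One common remedy for the DGIM problem is to map each local term to a single standard concept before querying. The sketch below uses a hand-built synonym table; the concept label "Heart failure" and the synonym list are illustrative assumptions, not entries copied from ICD-9 or SNOMED-CT.

```python
# Hypothetical local-term -> standard-concept map (a tiny data dictionary).
SYNONYMS = {
    "chf": "Heart failure",
    "congestive heart failure": "Heart failure",
    "lvf": "Heart failure",
    "heart failure": "Heart failure",
}


def normalize(term):
    """Map a free-text problem-list entry to its standard concept, or None if unmapped."""
    return SYNONYMS.get(term.strip().lower())


# Invented problem-list entries, written inconsistently as in the STOR example.
problem_list = [("pt1", "CHF"), ("pt2", "LVF "), ("pt3", "Heart Failure"), ("pt4", "COPD")]

chf_patients = sorted(pid for pid, term in problem_list if normalize(term) == "Heart failure")
unmapped = [term for _, term in problem_list if normalize(term) is None]
print(chf_patients, unmapped)  # ['pt1', 'pt2', 'pt3'] ['COPD']
```

Entries that map to `None` (here "COPD") may be genuinely different concepts or synonyms nobody has added yet; each has to be reviewed, which is the ongoing cost of maintaining such a table.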
Costs vs. Benefits of Coding
• The more coded and structured your data, the more powerful the computing you can do, because the computer can “understand” more
• But coding and structuring cost time and effort
  – e.g., selecting billing codes for outpatient practice
  – tiresome to pick codes for clinical care, let alone the even more specific codes needed for research
• Tradeoff between
  – costs of more coding and structuring, and
  – benefits that accrue from “smarter” computing
  – for both care and research
Notable Clinical Vocabularies
Vocabulary | Name | Domain | Use
SNOMED-CT | Systematized Nomenclature of Medicine, Clinical Terms | Clinical medicine | EHR documentation
MeSH | Medical Subject Headings | Biomedical indexing | Bibliographic retrieval
ICD-9 | International Classification of Diseases | Diseases | Billing
CPT | Current Procedural Terminology | Medical procedures | Billing
DSM-IV | Diagnostic and Statistical Manual of Mental Disorders | Psychiatry | Billing, nosology
LOINC | Logical Observation Identifiers Names and Codes | Labs | Lab systems, billing
READ | Read Clinical Classification | Clinical medicine | EHRs in the UK
Dangers of ICD-9 Coding
• VBAC uterine rupture rate
  – 665.0 and 665.1 ICD-9 discharge codes used in study (NEJM 2001;345:3-8)
  – letter to the editor: in 9 years of Massachusetts data
    • 716 patients discharged with 665.0 or 665.1
    • 709 charts reviewed
    • 363 (51.2%) had actual uterine rupture
  – others had incidental extensions of the C-section incision, or were incorrectly coded or typed
• 674.1 (dehiscence of the uterine wound) used to code another 197 ruptures (35% of confirmed cases of uterine rupture)
• i.e., sensitivity 65%, positive predictive value 51.2%
• Administrative codes are not ideal for research
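The percentages follow from the counts reported in the letter, taking chart review as the gold standard and assuming that the 363 ruptures coded 665.0/665.1 plus the 197 coded 674.1 make up all confirmed ruptures. The arithmetic in Python:

```python
reviewed_665 = 709        # charts reviewed among patients coded 665.0 or 665.1
true_rupture_665 = 363    # of those, actual uterine rupture on chart review
rupture_coded_674 = 197   # confirmed ruptures coded only as 674.1

# Assumption: these two groups together are all confirmed ruptures.
confirmed_ruptures = true_rupture_665 + rupture_coded_674   # 560

ppv = true_rupture_665 / reviewed_665                   # P(rupture | coded 665.x)
sensitivity = true_rupture_665 / confirmed_ruptures     # P(coded 665.x | rupture)
missed_share = rupture_coded_674 / confirmed_ruptures   # ruptures the 665.x codes miss

print(f"PPV {ppv:.1%}, sensitivity {sensitivity:.1%}, missed {missed_share:.1%}")
# PPV 51.2%, sensitivity 64.8%, missed 35.2%
```

Specificity cannot be computed from these counts, because the number of deliveries without rupture that were correctly left uncoded is not reported; 363/709 is the positive predictive value of the codes.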
ICD-9 Concept Coverage
• How well would ICD-9 do in capturing a medical chart?
• Inpatient and outpatient charts from 4 medical centers abstracted into 3061 concepts [Chute, 96]
  – diagnoses, modifiers, findings, treatments and procedures, other
• Mean matching score (0 = no match, 1 = partial, 2 = complete)
  – 1.60 for diagnoses
  – 0.77 overall
  – ICD-9 augmented with CPT: 0.82 overall
SNOMED-CT
• 364,400 health care concepts; 984,000 descriptions
• Formally constructed terminology
  – 18 high-level hierarchies
    • e.g., finding, organism, substance, body structure, event, social context
  – each concept can be described by many attributes
    • e.g., finding site = lung, associated morphology = inflammation
  – encodes “knowledge”
    • pneumonia is an infection of the lung by an organism
  – can build up “post-coordinated” concepts to increase expressive power
    • pneumonia: finding site = lung; finding site = lower lobe; laterality = right; causative agent = pneumococcus
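A post-coordinated expression can be modeled as a focus concept refined by attribute-value pairs. The Python sketch below represents the slide's pneumonia example that way, so a query on the focus concept or on any single attribute still finds it. The class and attribute names are illustrative; real SNOMED-CT expressions use numeric concept identifiers and a formal compositional grammar, which are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expression:
    """A focus concept refined by (attribute, value) pairs.
    Illustrative only: not SNOMED-CT's concept IDs or expression syntax."""
    focus: str
    refinements: frozenset = frozenset()

    def has(self, attribute: str, value: str) -> bool:
        return (attribute, value) in self.refinements


# The slide's post-coordinated pneumonia.
rll_pneumococcal_pna = Expression(
    focus="pneumonia",
    refinements=frozenset({
        ("finding site", "lung"),
        ("finding site", "lower lobe"),
        ("laterality", "right"),
        ("causative agent", "pneumococcus"),
    }),
)

records = [rll_pneumococcal_pna, Expression("pneumonia"), Expression("asthma")]

# Queries can use the focus concept, any single refinement, or both.
all_pneumonia = [r for r in records if r.focus == "pneumonia"]
pneumococcal = [r for r in records if r.has("causative agent", "pneumococcus")]
print(len(all_pneumonia), len(pneumococcal))  # 2 1
```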
Using SNOMED-CT
• Semantic coverage
  – best coverage of all clinical vocabularies
  – test of 1603 concepts in 20 ophthalmology cases
    • SNOMED-CT 1.625 ± 0.667 (0 = no match, 1 = partial, 2 = complete)
    • ICD-9-CM 0.280 ± 0.619
• Site-licensed (i.e., free) to the entire U.S. as of 2004
  – the de facto standard for EHR clinical vocabulary
• Coding barriers
  – how to get docs to reliably pick the right code out of 364,000?
  – coded data entry is the biggest barrier to more computable EHRs
• Financial barriers
  – was $50K per site; now free, but the site license runs only until 2009
Research Data Dictionaries
• Research data dictionaries are lists of study variables and their definitions
• Standardization of data dictionaries facilitates data sharing, merging, and meta-analysis
• Terms in data dictionaries should ideally come from a standard clinical vocabulary
  – e.g., SOB? shortness of breath? breathlessness?
    • ICD-9: Dyspnea and other respiratory abnormalities (786.0)
    • CPT: no matching concept or term
    • SNOMED: Dyspnea
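A data dictionary entry can be represented as a small record holding the variable's name, definition, allowed values, and its mappings into standard vocabularies. This is a hypothetical structure; the ICD-9 code 786.0 and the SNOMED term come from the slide, and the SNOMED-CT mapping is kept as a term because the slide gives no identifier.

```python
from dataclasses import dataclass, field


@dataclass
class DataElement:
    """One study variable in a research data dictionary (hypothetical structure)."""
    name: str
    definition: str
    allowed_values: tuple
    # Vocabulary name -> code or term; None records "no matching concept".
    vocabulary_codes: dict = field(default_factory=dict)

    def validate(self, value):
        """Data-entry check: reject anything outside the allowed values."""
        if value not in self.allowed_values:
            raise ValueError(f"{self.name}: {value!r} not in {self.allowed_values}")
        return value


dyspnea = DataElement(
    name="dyspnea",
    definition="Shortness of breath (hypothetical study definition)",
    allowed_values=("yes", "no", "unknown"),
    vocabulary_codes={"ICD-9": "786.0", "CPT": None, "SNOMED-CT": "Dyspnea"},
)

entries = [dyspnea.validate(v) for v in ("yes", "no")]

try:
    dyspnea.validate("SOB")  # a free-text synonym is not an allowed value
    rejected = False
except ValueError:
    rejected = True
print(entries, rejected)  # ['yes', 'no'] True
```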
Notable Research Data Dictionaries
• Defined by clinical domain
  – e.g., Common Data Elements (CDE, from the NCI)
    • standardized variables for breast, lung, cervical, prostate CA
    • http://ncicb.nci.nih.gov/CDEBrowser/
  – e.g., HCFA’s MedQuest modules
    • a fib, CHF, diabetes, pneumonia, orthopedics, etc.
• Defined by a research community
  – e.g., NCI, UCSF
CDE Example #1
CDE Example #2
• Menopausal Status: “Indication of whether a woman is potentially fertile or not.”
• Allowed values:
  – Post (Prior bilateral ovariectomy, OR >12 mo since LMP with no prior hysterectomy and not currently receiving therapy with LH-RH analogs [e.g., Zoladex])
  – Post (Prior bilateral ovariectomy, OR >12 mo since LMP with no prior hysterectomy)
  – Pre (<6 mo since LMP AND no prior bilateral ovariectomy, AND not on estrogen replacement)
  – Above categories not applicable AND Age < 50
  – Above categories not applicable AND Age >= 50
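Because each allowed value has an explicit definition, the element can be applied mechanically. The Python sketch below encodes the first "Post" definition (the one with the LH-RH analog clause), the "Pre" definition, and the age-based fallback. The argument names are hypothetical, and cases the slide leaves undefined (for example, 6 to 12 months since LMP, or LMP unknown) fall through to the age categories, which is one reading of "above categories not applicable".

```python
from typing import Optional


def menopausal_status(age: int, *,
                      bilateral_ovariectomy: bool = False,
                      months_since_lmp: Optional[float] = None,
                      prior_hysterectomy: bool = False,
                      on_lhrh_analog: bool = False,
                      on_estrogen_replacement: bool = False) -> str:
    """Assign the CDE menopausal status category (sketch of one reading)."""
    # Post: prior bilateral ovariectomy, OR >12 months since LMP with no prior
    # hysterectomy and no current LH-RH analog therapy.
    if bilateral_ovariectomy:
        return "Post"
    if (months_since_lmp is not None and months_since_lmp > 12
            and not prior_hysterectomy and not on_lhrh_analog):
        return "Post"
    # Pre: <6 months since LMP, no bilateral ovariectomy (already excluded above),
    # and not on estrogen replacement.
    if (months_since_lmp is not None and months_since_lmp < 6
            and not on_estrogen_replacement):
        return "Pre"
    # Everything else falls to the age-based "not applicable" categories.
    return "Not applicable, age < 50" if age < 50 else "Not applicable, age >= 50"


examples = {
    "12+ months since LMP": menopausal_status(58, months_since_lmp=24),
    "recent LMP": menopausal_status(42, months_since_lmp=1),
    "ovariectomy": menopausal_status(40, bilateral_ovariectomy=True, months_since_lmp=2),
    "hysterectomy, LMP unknown": menopausal_status(45, prior_hysterectomy=True),
    "9 months since LMP": menopausal_status(52, months_since_lmp=9),
    "on LH-RH analog": menopausal_status(47, months_since_lmp=14, on_lhrh_analog=True),
}
for case, status in examples.items():
    print(f"{case}: {status}")
```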
EHR for Research Summary
• An EHR is not automatically going to help clinical research
  – if it’s all unstructured free text, it won’t help much at all
    • the more structured it is (i.e., more defined fields), the better
  – if it’s just coded sporadically in ICD-9
    • problem with gamed codes
    • poor coverage of many clinical concepts
  – if it’s coded in SNOMED
    • some clinical concepts still not well covered
• EHR better than chart review; can we do even better?
Outline
• Sample Study
  – a single-institution outcomes research question
• Electronic Health Records (EHRs)
  – relational databases
  – vocabulary
• Data Warehousing
• Security and Privacy
Types of Queries
• Clinical care
  – What was Mr. Smith’s last potassium?
  – Does he have an old CXR for comparison?
  – What antihypertensives has he been on before?
  – What did the neurology consult say about his epilepsy?
• Research
  – What % of diabetics with AMI admissions were discharged on β-blockers?
  – What was the average Medicine length of stay in 2004 compared to 2000?
  – What is the trend in use of head CTs in patients with migraine?
  – Is admission creatinine an independent predictor of bacteremia outcomes?
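The difference between the two columns shows up directly in SQL: a clinical query retrieves one patient's most recent value, while a research query aggregates across many patients. The Python `sqlite3` sketch below uses a hypothetical two-table schema and invented rows purely to show the contrast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE labs (patient TEXT, test TEXT, value REAL, drawn TEXT);
CREATE TABLE admissions (adm TEXT, patient TEXT, diabetic INTEGER,
                         principal_dx TEXT, beta_blocker_on_dc INTEGER);
INSERT INTO labs VALUES
  ('Smith', 'K', 4.1, '2005-01-03'),
  ('Smith', 'K', 3.6, '2005-02-20'),
  ('Lee',   'K', 4.8, '2005-02-21');
INSERT INTO admissions VALUES
  ('001', 'Lee',   1, 'Acute MI', 1),
  ('002', 'Perez', 1, 'Acute MI', 0),
  ('003', 'Smith', 0, 'Acute MI', 1),
  ('004', 'Jones', 1, 'Acute MI', 1),
  ('005', 'Kim',   1, 'COPD',     0);
""")

# Clinical query: one patient, most recent value.
last_k = conn.execute(
    "SELECT value FROM labs WHERE patient = ? AND test = 'K' "
    "ORDER BY drawn DESC LIMIT 1", ("Smith",)).fetchone()[0]

# Research query: a percentage across all diabetic AMI admissions.
pct_bb = conn.execute(
    "SELECT 100.0 * SUM(beta_blocker_on_dc) / COUNT(*) FROM admissions "
    "WHERE diabetic = 1 AND principal_dx = 'Acute MI'").fetchone()[0]

print(last_k, round(pct_bb, 1))  # 3.6 66.7
```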
What is a Data Warehouse?
(Diagram: a data warehouse connected via the Internet to source systems ADT, Chem, EHR, XRay, PMB, and Claims, and to users in MICU, Finance, Research, and QA)
• Integrated historical data common to the entire enterprise
Types of Data Warehouses
• A data warehouse is just a collection of data from other databases
  – is itself just a database
• Two somewhat distinct types
  – clinical data repository
    • collects data from day-to-day clinical care, admin data, etc.
    • for quality improvement, outcomes research, business decision making…
  – research data repository
    • collects data from multiple research projects
    • may also collect data from day-to-day clinical care, admin data, etc.
Data Warehouses: Hype and Hope
• Touted for
  – business decision making
  – health care quality improvement
  – outcomes research
  – genotype-phenotype correlations for translational research
• UCSF Clinical and Genomic Information Management (CGIM) database (now defunct)
  – was a $4-6 million partnership with IBM
  – goal: a single repository of research data from all UCSF research projects, plus data from STOR, radiology, CareCast, etc., to enable
    • correlation of clinical, genomic, imaging, etc. data across data sets
• Stanford’s STRIDE research data repository
  – CareCast and research data
Why are Data Warehouses Useful?
• Need many types of data for research and QI
• E.g., for our outcomes study, need
  – admission: ADT (admission/discharge/transfer) system
  – diabetes diagnosis: e-chart, HgbA1C
  – MI diagnosis: e-chart, troponins, EKG readings
  – β-blocker usage: online ordering, pharmacy system
• Existing (legacy) systems
  – claims, pharmacy, ADT, lab, xray, med record, etc.
  – Health System Minnesota, with 50 computer systems and 50 paper systems (200,000 lives, 460 physicians)
Data Warehousing Procedure
• Extract data from legacy systems
• Clean data and feed it to warehouse
• Allow ad hoc use
  – data query, data mining, data analysis
• Service users
  – modify data content based on queries
  – provide standard reports
  – provide alerts to trends
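The extract, clean, and load steps can be sketched in Python with the standard `sqlite3` and `datetime` modules. The two legacy sources, their field layouts, and the abbreviation table are hypothetical; a production pipeline would also reconcile patient identifiers, units, and duplicate records.

```python
import sqlite3
from datetime import datetime

# Extract: rows as two hypothetical legacy systems might export them.
adt_rows = [("000-01-001", "31-Dec-94", "AMI"), ("000-01-002", "03-Feb-95", "PNA")]
lab_rows = [
    {"mrn": "000-01-001", "test": "HgbA1C", "result": "9.1"},
    {"mrn": "000-01-002", "test": "hgba1c", "result": "pending"},
]

# Clean: one abbreviation table and one date format for the whole warehouse.
TERMS = {"AMI": "Acute MI", "PNA": "Pneumonia"}


def clean_adt(row):
    mrn, admit, dx = row
    iso_date = datetime.strptime(admit, "%d-%b-%y").date().isoformat()
    return mrn, iso_date, TERMS.get(dx, dx)


def clean_lab(rec):
    try:
        value = float(rec["result"])
    except ValueError:
        return None  # non-numeric result; a real pipeline would log and review it
    return rec["mrn"], rec["test"].upper(), value


# Load: feed the cleaned rows to the warehouse.
warehouse = sqlite3.connect(":memory:")
warehouse.executescript("""
CREATE TABLE admissions (mrn TEXT, admit_date TEXT, dx TEXT);
CREATE TABLE labs (mrn TEXT, test TEXT, value REAL);
""")
warehouse.executemany("INSERT INTO admissions VALUES (?, ?, ?)",
                      [clean_adt(r) for r in adt_rows])
clean_labs = [c for c in map(clean_lab, lab_rows) if c is not None]
warehouse.executemany("INSERT INTO labs VALUES (?, ?, ?)", clean_labs)

# Ad hoc use: an analyst's query across what used to be two separate systems.
ad_hoc = warehouse.execute(
    "SELECT a.mrn, a.dx, l.value FROM admissions a "
    "JOIN labs l ON l.mrn = a.mrn WHERE l.test = 'HGBA1C'"
).fetchall()
print(ad_hoc)  # [('000-01-001', 'Acute MI', 9.1)]
```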
Prerequisites for Data Warehouse Construction
• For uploading data to warehouse
  – a physical connection between the computers
  – common data transmission protocols
    • e.g., HL7
  – common database communication protocol
    • e.g., SQL over TCP/IP (the Internet protocol suite)
• For sharing and merging
  – common data schema
    • type (e.g., relational)
    • data modeling (i.e., column names)
  – common naming of data items
    • e.g., “PNA” vs. “pneumonia”
Networking
• Requires physical networking and transmission standards (protocols)
(Diagram: the warehouse connected via the Internet to source systems ADT, Chem, EHR, XRay, PMB, and Claims, and to users in MICU, Finance, Research, and QA)
Health-Specific Network Protocols
• Health Level Seven (HL7)
  – “original HL7”
  – HL7 RIM: full object model of all of health care
• Digital Imaging and Communications in Medicine (DICOM)
  – common data exchange format for medical images
Example HL7 message:
MSH|… message header
PID|… patient identifier
OBX segments (observation results):
OBX|1|ST|84295^NA||150|mmol/l|136-148|H||A|F|19850301<CR>
OBX|2|ST|84132^K+||4.5|mmol/l|3.5-5|N||N|F|19850301<CR>
OBX|3|ST|82435^CL||102|mmol/l|94-105|N||N|F|19850301<CR>
OBX|4|ST|82374^CO2||27|mmol/l|24-31|N||N|F|19850301<CR>
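To see how a receiving system might read the OBX segments in this example, here is a minimal Python sketch that splits them on the | field separator and the ^ component separator. It handles only this simple case; real HL7 v2 parsing must honor the separators declared in MSH-1 and MSH-2, escape sequences, and repeating fields, which is why production interfaces usually rely on an interface engine or a dedicated HL7 library.

```python
# OBX segments from the example; HL7 v2 ends each segment with a carriage return.
message = (
    "OBX|1|ST|84295^NA||150|mmol/l|136-148|H||A|F|19850301\r"
    "OBX|2|ST|84132^K+||4.5|mmol/l|3.5-5|N||N|F|19850301\r"
    "OBX|3|ST|82435^CL||102|mmol/l|94-105|N||N|F|19850301\r"
    "OBX|4|ST|82374^CO2||27|mmol/l|24-31|N||N|F|19850301\r"
)


def parse_obx(segment: str) -> dict:
    """Pull the commonly used fields out of one OBX segment."""
    fields = segment.split("|")            # fields[i] is OBX-i, since fields[0] is "OBX"
    code, name = fields[3].split("^")[:2]  # OBX-3: observation identifier components
    return {
        "code": code,
        "test": name,
        # OBX-2 says the value type is ST (string); converting to float is our
        # assumption and works only because these particular results are numeric.
        "value": float(fields[5]),         # OBX-5: observation value
        "units": fields[6],                # OBX-6: units
        "range": fields[7],                # OBX-7: reference range
        "flag": fields[8],                 # OBX-8: abnormal flag (H = high, N = normal)
    }


results = [parse_obx(s) for s in message.split("\r") if s.startswith("OBX")]
abnormal = [(r["test"], r["value"]) for r in results if r["flag"] not in ("", "N")]
print(abnormal)  # [('NA', 150.0)]
```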
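Data Warehouse Contents
(Diagram: source systems ADT, Chem, EHR, XRay, PMB, and Claims and users in MICU, Finance, Research, and QA, linked via the Internet to a central store labeled “???”)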
UCSF “CGIM” Example
• Standard coding vocabulary? data representation?
• Are queries mostly within or across projects? ongoing or completed projects or both?
• Need administrative data (e.g., insurance)?
(Diagram: Projects 1 to 4, each with its own database (DB 1 to DB 4), feed the CGIM repository together with STOR, radiology xrays/CT/MRI (in DICOM), microarray data (in MAGE-OM), and data in SNOMED-CT, plus data mining/display tools. The project databases hold overlapping but differently defined variables: Breast CA (not DCIS) and Menopause; Osteoporosis (Heel US) and Menopause; Osteoporosis (DXA) and Menopause; Breast CA (DCIS ok) and Alzheimer's (path).)
What’s the Warehouse for?
• Clinical care
  – What was Mr. Smith’s last potassium?
  – Does he have an old CXR for comparison?
• Research
  – What % of diabetics with AMI admissions were discharged on β-blockers?
  – What was the average Medicine length of stay in 2004 compared to 2000?
• Use same schema for warehouse and EHR?
  – should depend on anticipated queries
• Anticipated use has huge implications for design (and eventual worth) of warehouse
  – if you don’t know what you want, no technology will give it to you
Clinical Data Warehouse Schema
Discharge Diagnoses Table
Discharge Diagnosis | Admission # | LOS | Service | Team | Attending
Acute MI | 001 | 13 | Med | II | Red
Acute MI | 004 | 22 | Med | I | Blue
THR | 003 | 14 | Surg | III | Bronze
COPD | 002 | 5 | Med | II | White
Metrorrhagia | 005 | 4 | Gyn | A | Buff
Discharge Meds for AMI Admissions Table
Admission # | Aspirin on D/C | Beta-Blocker on D/C | Statin on D/C | ACE Inhibitor on D/C
001 | ASA 325 mg QD | Atenolol 50 mg QD | Simvastatin 20 mg QD | Lisinopril 10 mg QD
004 | ECA 81 mg QD | Metoprolol 100 mg BID | Atorvastatin 40 mg QD | Ramipril 5 mg QD
• diagnoses would be ICD-9 codes
• one warehouse for both clinical improvement and research?
Choosing a Vocabulary
• For an EHR
  – billing: ICD-9, CPT, Read codes (in UK)
  – clinical data capture: SNOMED-CT best
  – research: any is better than none!
• For your own research databases
  – if a standard domain-specific data dictionary exists, use it
  – if not, ideally use a standard clinical vocabulary
    • often ICD-9 or CPT, or SNOMED
  – try not to define your own terms and your own definitions
    • upfront work will make it easier to share data later…
Examples from the UK
• UK practices contribute to research databases
  – e.g., General Practice Research Database
• Features that make them work
  – National Health Service covering treatment, prescriptions, etc.
  – 90% of UK practices use EHRs
  – Dept. of Health mandates Read codes for all general practice
  – relatively simple “research warehouse” data structure
    • registration, prescription, problems/diagnoses, notes files
• Weaknesses/cautions
  – biases in patients and/or practices included or excluded
  – completeness and accuracy of reporting (e.g., 90% sensitive for DM)
  – lots of information in free text in the notes files (e.g., specialist referrals, drug dosing instructions)
  – needs dedicated staff and resources to maintain
Data Warehouse Summary
• Enterprise viewpoint more appropriate for research than patient viewpoint of EHR
• Integrates data from multiple sources
  – need standardization of codes, definitions, and data formats
• Schema can evolve to optimize for analytic needs
  – can make or modify tables off of legacy systems
• Querying and processing occur “offline”
  – little impact on real-time clinical care
System | Viewpoint | Time | Queries
EHR | Patient | Real-time | Clinical
Data Warehouse | Enterprise | Historical | Ad hoc
Study Steps Using EHR
• Compare 1-year re-admission rate for acute MI in diabetics discharged on β-blockers or not
  – data captured in EHR and other databases
  – data aggregated in data warehouse
  – you query the data warehouse: NOT YET…
Outline
• Sample Study
  – a single-institution outcomes research question
• Electronic Health Records (EHRs)
  – relational databases
  – vocabulary
• Data Warehousing
• Security and Privacy
Privacy vs. Security
• Security (a technical feature)
  – confidentiality
    • ensuring that only authorized persons can read or copy information
    • encryption of data during transmission impedes eavesdropping only
  – integrity
    • ensuring that information is modified only in appropriate ways
  – availability
    • ensuring that information is not made inaccessible
• Privacy (a legal concept): see HIPAA
  – right to keep personal information from the outside world
    • study nurse, data entry clerk, investigator, database administrator, etc. may be authorized to see data but may disclose it inappropriately
Network Security
• Physical security
  – firewalls
• Encryption
  – public/private keys
• People security
  – authority
  – authentication
  – access
  – audit
(Diagram: the Internet, a firewall, and a LAN with nodes labeled itsa, jaundice, and ucsf.edu)
People Security
• Authentication
  – are you who you say you are?
  – use passwords, biometrics (e.g., retinal scan), smartcards
• Authority
  – do you have a need to know?
  – different levels of data access for different users
• Access
  – how to allow only authenticated users to perform authorized activities on authorized data?
• Audit
  – record of who actually got into what
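The four A's can be seen working together in a small Python sketch: authenticate the user, check the user's role (authority) before allowing an action on a data set (access), and record every attempt (audit). The users, roles, and permissions are invented, and the password check uses PBKDF2 from the standard library only to illustrate storing a salted hash instead of the password; it is not a complete security design.

```python
import hashlib
import hmac
import os
from datetime import datetime, timezone


def hash_password(password: str, salt: bytes) -> bytes:
    """Salted, slow hash so the stored value is not the password itself."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


# Hypothetical user store: one study nurse with a salted password hash and a role.
_salt = os.urandom(16)
USERS = {"rnurse": {"salt": _salt, "hash": hash_password("s3cret", _salt), "role": "study_nurse"}}

# Authority: which (role, action, data set) combinations are allowed.
PERMISSIONS = {("study_nurse", "read", "limited_data_set")}

# Audit: every request is recorded, whether or not it succeeds.
AUDIT_LOG = []


def authenticate(user: str, password: str) -> bool:
    """Authentication: is this user who they say they are?"""
    rec = USERS.get(user)
    return rec is not None and hmac.compare_digest(
        rec["hash"], hash_password(password, rec["salt"]))


def request_access(user: str, password: str, action: str, dataset: str) -> bool:
    """Access: allow only authenticated users to do authorized things to authorized data."""
    allowed = (authenticate(user, password)
               and (USERS[user]["role"], action, dataset) in PERMISSIONS)
    AUDIT_LOG.append((datetime.now(timezone.utc).isoformat(), user, action, dataset, allowed))
    return allowed


outcomes = [
    request_access("rnurse", "s3cret", "read", "limited_data_set"),    # granted
    request_access("rnurse", "s3cret", "read", "identified_records"),  # no authority
    request_access("rnurse", "wrong", "read", "limited_data_set"),     # failed authentication
    request_access("intruder", "guess", "read", "limited_data_set"),   # unknown user
]
print(outcomes)  # [True, False, False, False]
```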
De-identification Isn’t Easy
• 87% of the American populace can be uniquely identified by only [Sweeney, L. ‘97]:
  – date of birth
    • in a room of 23 people, what is the chance that 2 people will share the same birthday (independent of year of birth)?
    • http://www.people.virginia.edu/~rjh9u/birthday.html
  – gender
  – five-digit ZIP code
  – easy to find someone’s info if you’re looking for it; harder to find out whose info it is that you have
• Anonymizing databases does not remove your duty to enforce security and safeguard privacy
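Both parts of the slide can be checked numerically. The birthday question has a closed-form answer, and re-identification risk can be gauged by counting how many records are unique on the three quasi-identifiers. This Python sketch assumes 365 equally likely birthdays and uses invented records.

```python
from collections import Counter
from math import prod


def shared_birthday_probability(n: int, days: int = 365) -> float:
    """P(at least two of n people share a birthday), ignoring leap years and
    assuming every day is equally likely."""
    return 1 - prod((days - i) / days for i in range(n))


p23 = shared_birthday_probability(23)
print(round(p23, 3))  # 0.507

# Invented records reduced to the three quasi-identifiers on the slide.
records = [
    ("1957-06-13", "F", "94143"),
    ("1957-06-13", "F", "94143"),
    ("1925-10-22", "F", "94110"),
    ("1960-07-09", "M", "94122"),
]
counts = Counter(records)
# A record that is alone in its (birth date, gender, ZIP) group can be
# re-identified by anyone who knows those three facts about the person.
unique_share = sum(counts[r] == 1 for r in records) / len(records)
print(unique_share)  # 0.5
```

With 23 people the probability is about 0.507, so a shared birthday is more likely than not. A full date of birth combined with gender and a five-digit ZIP code splits a population into far smaller groups, which is why so many people end up unique on those three fields.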
Summary of Privacy & Security
• Computing/network infrastructure can deal with security
  – but privacy is a policy matter
• Anonymizing of databases helps, but it isn’t foolproof
• In general, people are the weakest security and privacy link
Outcomes Research Project
• Compare 1-year re-admission rate for acute MI in diabetics discharged on β-blockers or not
  – data captured in EHR and other databases
  – data aggregated in data warehouse
  – you request IRB approval
  – you are authorized to conduct a HIPAA-compliant search (e.g., Limited Data Set) in the data warehouse
  – an audit trail of queries is maintained
Take-Home Points
• EHR does not always = easier clinical research
• Structure and coding are critical
  – structure: e.g., relational schema, designed to support intended queries
  – coding: standardized, coded data trumps free text
    • especially important for research
    • but most standardized vocabularies have insufficient clinical coverage
  – standard formats needed for genomic, imaging, etc. data
• Clinical/research data warehouses could be useful for research but must be designed “correctly” with high-quality, cross-compatible data