An Overview of Patient Matching Shaun Grannis, MD MS Medical Informatics Research Scientist, Regenstrief Institute Assistant Professor of Family Medicine, Indiana University School of Medicine U.S. Population Health Technical Work Group Co- Chair, Health Information Technology Standards Panel


What We’ll Cover

• Definition and Motivation
• Use Cases
• Barriers to Accurate Patient Identification
• Patient Identifier Characteristics
• Patient Identification Terminology
• Patient Matching Methodologies
• Patient Identification Architectures
• Overview of OpenMRS Patient Matching Process

Patient Matching: Description

“… Each person in the world creates a book of life. The book starts with birth and ends with death. Its pages are made up of all the principal events in life. Record linkage is the name given to the process of assembling the pages of this book into one volume. The person retains the same identity throughout the book. Except for advancing age, he is that same person …”

- Dunn, 1946

Patient Matching: Synonyms and Definition

• Synonyms: “Patient Matching”, “Patient Linkage”, “Record Matching”, “Record Linkage”, “Identity Management”
• Definition: identify records that represent the same entity.
• Entities are typically individual persons, but can be families, twins, organizations, etc.
• Records contain fields describing the entity. These fields can include “unique” IDs, names, birth dates, addresses, sex, parents’ names, tribe, telephone numbers, etc.

Motivation

• Clinical information is fragmented across many independent databases using different identifiers
• This situation makes record matching challenging for such uses as:
  - Public Health/Administrative Reporting
  - Outcomes management
  - Vital status determination
  - Research
  - Clinical Care

Patient Matching Use Cases

• Data Aggregation
  - Immunization Registry
• Process Improvement
  - Newborn screening
• Process Evaluation
  - ELR Completeness
• Reporting/Research (combining datasets to evaluate outcomes)
  - Cancer rates among Depressed/Anxious
  - Mortality Assessment, e.g. Cancer Survival
  - Assessing effects of Maternal EtOH use on fetal outcomes
  - De-identified Linkage
• Health Information Exchange

Barriers to Accurate Patient Matching

• Recording Errors
  - Phonetic (“Shaun”, “Sean”, “Shawn”)
  - Typographical (“Smith” → “Snith”, “07” → “01”)
• Changing Identifiers
  - Last Name (Marriage)
  - Geographic location (Home address, etc.)
• Sharing Identifiers (SSN, etc.)
• Identifiers Limited or Unavailable

Ideal Identifier Characteristics

• Unique (e.g., fingerprint, iris, DNA, national ID)
• Ubiquitous (e.g., name, DOB, sex, eye color)
• Unchanging (e.g., DOB, sex, given name, DNA)
• Uncomplicated (e.g., name, DOB, sex)
• Uncontroversial (e.g., avoid sensitive data)
• Easily and Inexpensively Accessible

No identifier meets all of these characteristics.

Patient Matching Terminology

• True match / True link / True positive: truly matching records declared to be the same entity
• False match / False link / False positive: truly non-matching records declared to be the same entity
• True non-match / True non-link / True negative: truly non-matching records not declared to be the same entity
• False non-match / False non-link / False negative: truly matching records not declared to be the same entity

Patient Matching Terminology

                                         “Truth”
Matching System Declaration    True Match               True Non-Match
True Match                     True Match (TM)          False Match (FM)
True Non-Match                 False Non-Match (FNM)    True Non-Match (TNM)

• “Pos Predictive Value” or “Precision” = TM / (TM + FM)
• “Neg Predictive Value” = TNM / (TNM + FNM)
• “Sensitivity” or “Recall” = TM / (TM + FNM)
• “Specificity” = TNM / (TNM + FM)
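The four accuracy measures on this slide can be computed directly from the counts in the 2x2 table. The following is a minimal sketch, not part of the original slides; the class name, function names, and the example counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class LinkageCounts:
    """Counts from comparing a matcher's declarations against known truth."""
    tm: int   # true matches
    fm: int   # false matches
    tnm: int  # true non-matches
    fnm: int  # false non-matches


def ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def accuracy_measures(c: LinkageCounts) -> dict[str, float]:
    """Compute the four measures named on the slide."""
    return {
        "ppv": ratio(c.tm, c.tm + c.fm),            # precision
        "npv": ratio(c.tnm, c.tnm + c.fnm),
        "sensitivity": ratio(c.tm, c.tm + c.fnm),   # recall
        "specificity": ratio(c.tnm, c.tnm + c.fm),
    }


# Hypothetical evaluation of 100 record pairs (illustrative numbers only).
counts = LinkageCounts(tm=9, fm=2, tnm=88, fnm=1)
measures = accuracy_measures(counts)
```

With these made-up counts, sensitivity is 9 / (9 + 1) and precision is 9 / (9 + 2), which shows how a matcher can miss few true links while still declaring some false ones.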

Patient Matching Terminology

• Potential Pairs / Potential Links: record-pairs that have not been declared a match or non-match
• Blocking / Grouping: method to limit the search space for potential links, usually by forcing exact match on one or more fields (analogous to sorting socks by color before pairing)
• Field Agreement Weight / Score: value assigned when two fields are declared to agree
• Field Disagreement Weight / Score: value assigned when two fields are declared to disagree
• Record Pair Score / Composite Score / Global Score: value derived from individual field contributions (typically the product or sum of field weights)
• Score Threshold: record pair score above which a match is declared and/or below which a non-match is declared
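Blocking, as defined in the terminology above, can be sketched as grouping one file by a blocking field and pairing only records that share the same value. The records, field names, and the choice of birth year as the blocking variable below are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product

# Hypothetical records from two files to be linked.
file_1 = [
    {"id": "A", "last": "SMITH", "birth_year": 1943},
    {"id": "B", "last": "JONES", "birth_year": 1955},
    {"id": "C", "last": "WILLIAMS", "birth_year": 1968},
]
file_2 = [
    {"id": "X", "last": "SMYTH", "birth_year": 1943},
    {"id": "Y", "last": "JONES", "birth_year": 1955},
    {"id": "Z", "last": "DOE", "birth_year": 2004},
]


def block_pairs(left, right, key):
    """Yield only the pairs whose blocking field agrees exactly."""
    buckets = defaultdict(list)
    for rec in right:
        buckets[rec[key]].append(rec)
    for rec in left:
        for candidate in buckets.get(rec[key], []):
            yield rec, candidate


# Without blocking, every record is compared with every other record.
all_pairs = list(product(file_1, file_2))

# With blocking on birth year, only same-year pairs become potential links.
blocked = list(block_pairs(file_1, file_2, "birth_year"))
pair_ids = [(a["id"], b["id"]) for a, b in blocked]
```

Here the search space shrinks from nine pairs to two. The trade-off is that a true link with an error in the blocking field is never compared, so the blocking variable should be a field that is recorded reliably.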

Potential Solutions

• National Patient Identifier
  - Recording errors
  - Sharing IDs
  - Lost IDs
  - Controversial (in some regions)
• Biometrics
  - Require proprietary hardware for all data generators
  - How secure?
  - Privacy concerns

Patient Matching Methodologies

Increasing complexity: Deterministic → Fuzzy Match → Probabilistic → Machine Learning

Deterministic

• ‘Rules-based’ or ‘Heuristic’
• Accuracy is highly dependent on presence of discriminating identifiers (national or local ID, etc.)
• Rule-based, e.g., declare a match if exact match on:
  - National ID + DOB
  - Full Name + Address
  - etc.
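The two example rules on this slide can be sketched as a small rule table. This is an illustration only: the field names are hypothetical, and the upper-casing and trimming step is an added assumption (the slide itself only says "exact match").

```python
def normalize(value):
    """Upper-case and trim a field; treat missing values as empty."""
    return (value or "").strip().upper()


# Each rule is a set of fields that must all agree exactly.
RULES = [
    ("national_id", "dob"),
    ("full_name", "address"),
]


def field_agrees(rec_a, rec_b, field):
    """Exact agreement after normalization; empty values never agree."""
    a, b = normalize(rec_a.get(field)), normalize(rec_b.get(field))
    return bool(a) and a == b


def deterministic_match(rec_a, rec_b, rules=RULES):
    """Declare a match if every field of at least one rule agrees."""
    return any(all(field_agrees(rec_a, rec_b, f) for f in rule) for rule in rules)


# Hypothetical records.
a = {"national_id": "123", "dob": "1943-10-12",
     "full_name": "Jane Smith", "address": "1 Main St"}
b = {"national_id": "123", "dob": "1943-10-12",
     "full_name": "Jayne Smith", "address": "1 Main St"}
c = {"national_id": "", "dob": "1943-10-12",
     "full_name": "Jayne Smith", "address": "1 main st "}
```

Records `a` and `b` match on the first rule despite the name misspelling, while `a` and `c` match on neither rule because `c` lacks the national ID. That is the dependence on discriminating identifiers noted above.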

Fuzzy Match

• Non-exact agreement, allows for errors:
  - “If last name agrees on first 6 characters then declare agreement”
  - “If birth date is within 1 month, then declare agreement”
• To loosen agreement, string comparators or phonetic transformation functions may be used:
  - Soundex (phonetic)
  - NYSIIS (phonetic)
  - Levenshtein Edit Distance (comparator)
  - Jaro-Winkler Comparator (comparator)
  - Longest Common Sub-sequence (comparator)
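The two quoted rules and one of the listed comparators can be sketched as follows. This is a minimal illustration: the function names are hypothetical, and "within 1 month" is approximated as 31 days, which is an assumption rather than something the slide specifies.

```python
from datetime import date


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or no change)
            ))
        previous = current
    return previous[-1]


def last_name_agrees(a: str, b: str, prefix_len: int = 6) -> bool:
    """Agreement if the first prefix_len characters are identical."""
    return a.upper()[:prefix_len] == b.upper()[:prefix_len]


def birth_date_agrees(a: date, b: date, max_days: int = 31) -> bool:
    """Agreement if the dates differ by at most max_days."""
    return abs((a - b).days) <= max_days
```

For the typographical example "Smith" vs. "Snith", the prefix rule declares disagreement, while the edit distance is 1, so a comparator with a tolerance of one edit would still treat the names as agreeing.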

Probabilistic/Machine Learning

• Implements a statistical model for matching
• A common model is the Fellegi-Sunter maximum likelihood model
• Model parameters are established using machine learning algorithms (e.g., expectation-maximization, EM) or bootstrap review
• A Maximum Entropy Model is also used

Patient Matching Methodologies

Deterministic/Heuristic
• Rapid implementation
• Simple calculations
• Relies on accurate and consistent data
• May not generalize well to other data sets

Probabilistic
• Complex implementation
• Computationally intensive
• More forgiving of data errors
• Algorithms adapt to data being linked


Probabilistic (F-S) Example

Among the 10 true-links, the last names agreed in 9/10 pairs (e.g. one of the last names was misspelled)

This represents a 90% AGREEMENT RATE for last name among TRUE LINKS.

Similarly, among the 90 non-links, last names agreed (by random chance) in 2/90 pairs

This represents an approximately 2% AGREEMENT RATE for last name among NON-LINKS.

Probabilistic (F-S) Example

90% / 2% = 45

• Records that agree on last name are 45 times more likely to be a true-link than a non-link
• Weights for each field are combined to form a composite record pair score.
• Field disagreement contributes a negative weight, and reduces the overall record pair score.
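The arithmetic in this example can be sketched in code. In the Fellegi-Sunter model the field weight is conventionally the logarithm of the agreement-rate ratio, so that summing weights corresponds to multiplying ratios. The use of log base 2 and the birth-year rates below are assumptions for illustration; only the 90% and 2% last-name rates come from the slides.

```python
import math


def agreement_weight(m: float, u: float) -> float:
    """Weight when a field agrees: log2 of (rate among true links /
    rate among non-links)."""
    return math.log2(m / u)


def disagreement_weight(m: float, u: float) -> float:
    """Weight when a field disagrees; negative whenever m > u."""
    return math.log2((1 - m) / (1 - u))


def record_pair_score(agreements, params):
    """Sum field weights. agreements maps field -> bool,
    params maps field -> (m, u)."""
    total = 0.0
    for field, agrees in agreements.items():
        m, u = params[field]
        total += agreement_weight(m, u) if agrees else disagreement_weight(m, u)
    return total


# Last-name rates from the slides: 90% among true links, 2% among non-links.
M_LAST, U_LAST = 0.90, 0.02

params = {
    "last_name": (M_LAST, U_LAST),
    "birth_year": (0.95, 0.05),   # hypothetical rates, not from the slides
}

# A pair that agrees on last name but disagrees on birth year.
score = record_pair_score({"last_name": True, "birth_year": False}, params)
```

With the slide's rates, last-name agreement contributes log2(45), about 5.49, and last-name disagreement contributes log2(0.10 / 0.98), about -3.29.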

Probabilistic (F-S) Example

Generate record-pairs: Records A, B, and C from File 1 are paired with Records X, Y, and Z from File 2.

Each record pair is assigned a score. A histogram of scores may look like:

[Figure: histogram of scores for the potential record pairs. Which are true links?]

Probabilistic Linkage Overview: Human Review Thresholds
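One common way to apply human review thresholds is to use two score thresholds: pairs scoring at or above the upper threshold are declared matches, pairs at or below the lower threshold are declared non-matches, and pairs in between go to a reviewer. The sketch below assumes that arrangement; the function name, threshold values, and boundary handling are hypothetical.

```python
def classify(score: float, lower: float, upper: float) -> str:
    """Assign a record pair to match, non-match, or human review."""
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    if score >= upper:
        return "match"
    if score <= lower:
        return "non-match"
    return "human review"


# Hypothetical record-pair scores and thresholds.
scores = {"A-X": 12.4, "B-Y": 3.1, "C-Z": -7.8}
decisions = {pair: classify(s, lower=0.0, upper=8.0) for pair, s in scores.items()}
```

Setting the two thresholds equal removes the review band entirely, which trades reviewer workload for a higher risk of false matches or false non-matches near the cut-off.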

Patient Identity Architectures

• There is no ideal architecture, only best principles and practices for a particular use case(s):
  - Patient care
  - Reporting/Research
  - Registry clean-up
• Potential Architectures:
  - Peer-to-peer
  - Patient carried
  - Central Index

Peer-to-Peer

• No central list of patient demographics
• Each participating data source maintains a patient registry
• Each source is queried for potential matches; result sets are linked
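The peer-to-peer flow described here can be sketched as a loop over sources, each searching its own registry, with the hits combined afterwards. Everything in this sketch is hypothetical: the source names, record layout, and the exact-match query stand in for whatever matcher each source actually runs.

```python
def query_source(registry, last_name, dob):
    """Each source searches its own registry. This stand-in uses an exact
    match on last name and date of birth."""
    return [p for p in registry if p["last"] == last_name and p["dob"] == dob]


def peer_to_peer_lookup(sources, last_name, dob):
    """Query every source and link the result sets into one list of
    (source name, local patient ID) pointers."""
    linked = []
    for name, registry in sources.items():
        for hit in query_source(registry, last_name, dob):
            linked.append((name, hit["local_id"]))
    return linked


# Hypothetical registries; there is no central list of demographics.
sources = {
    "Hospital A": [{"local_id": "HA-1", "last": "SMITH", "dob": "1943-10-12"}],
    "Clinic B": [
        {"local_id": "CB-7", "last": "SMITH", "dob": "1943-10-12"},
        {"local_id": "CB-8", "last": "JONES", "dob": "1955-02-07"},
    ],
    "Clinic C": [],
}
hits = peer_to_peer_lookup(sources, "SMITH", "1943-10-12")
```

Because every lookup fans out to all sources, response time depends on the slowest participant, which is one reason a central index may be preferred for some use cases.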

Peer-to-Peer

[Figure: a Query/Matcher node connected to four Matcher nodes.]

Central Index

• Contains patient identifiers with pointers to clinical data sources.
• No clinical data contained in the repository
• Contributing data sources send patient demographics; matching can be performed in real-time or near real-time

Name             Birth Date     Sex   Source
Smith, Jane      12-Oct-1943    F     Public Health
Jones, Fred L    07-Feb-1955    M     Hospital A
Smith, Jayne     12-Oct-1943    F     Clinic B
Williams, Mary   20-Dec-1968    F     Clinic A
Mary, Williams   20-Dec-1968          Hospital B
Jones, Freddy    01-Feb-1955    M     Clinic A

Central Index

• Jane receives immunizations @ Health Department → data delivered to Immunization Registry
• Jane receives immunizations and other care (measurements, labs, diagnoses, etc.) @ Clinical Practice (Clinic A) → data delivered to EMR

Central Index

[Figure: Immunization Registry (Registry Web Interface) and Clinic A (EMR Interface), with “???” between them.]

Central Index

Immunization Registry
  Patient ID: 123LMNOP
  Name: Jane Doe
  DOB: 01/01/04
  SSN: N/A
  Address: 555 Johnson Road
  City: Indianapolis
  State: Indiana
  ZIP: 46202

Clinic A
  Patient ID: 6789XYZ
  Name: Jane Ellen Doe
  DOB: 01/01/04
  SSN: 123-45-6789
  Address: 555 Johnson Road
  City: Indianapolis
  State: Indiana
  ZIP: 46202

Central Patient Index
  Global ID: 45678
  Name: Jane Ellen Doe
  Lots of Demographics..
  MRF1 ID: OU81247
  MRF2 ID: 4564356
  IMM REG ID: 123LMNOP
  CLINIC A ID: 6789XYZ
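The central index entry above is essentially a global ID with pointers to each source's local ID. A minimal in-memory sketch of that idea follows; the class and method names are hypothetical and do not correspond to any particular product's API. The example IDs are the ones shown on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    """One person in the central index: identifiers only, no clinical data."""
    global_id: str
    name: str
    source_ids: dict[str, str] = field(default_factory=dict)  # source -> local ID


class CentralIndex:
    def __init__(self):
        self._entries = {}     # global_id -> IndexEntry
        self._by_source = {}   # (source, local_id) -> global_id

    def add_entry(self, global_id, name):
        self._entries[global_id] = IndexEntry(global_id, name)

    def link(self, global_id, source, local_id):
        """Record that a source's local ID refers to this global ID."""
        self._entries[global_id].source_ids[source] = local_id
        self._by_source[(source, local_id)] = global_id

    def cross_reference(self, source, local_id):
        """Return the IDs used for the same person at the other sources."""
        global_id = self._by_source.get((source, local_id))
        if global_id is None:
            return {}
        ids = self._entries[global_id].source_ids
        return {s: i for s, i in ids.items() if s != source}


index = CentralIndex()
index.add_entry("45678", "Jane Ellen Doe")
index.link("45678", "IMM REG", "123LMNOP")
index.link("45678", "CLINIC A", "6789XYZ")
other_ids = index.cross_reference("IMM REG", "123LMNOP")
```

Starting from the immunization registry's ID, the lookup returns Clinic A's ID for the same person, so a clinician can locate the record at the other source without the index holding any clinical data itself.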

Central Index

[Figure: the Central Patient Index connected to the Immunization Registry, Clinic A, Clinic B, Clinic C, Hospital A, and Hospital B.]


A Nation-wide Infrastructure of Central Indexes (?)

OpenMRS Patient Matching: Overview

OpenMRS Record Linkage Module

1. Analytic API Component:
- Fields are examined for NULL values/default values (1900, ‘JOHN DOE’, etc.)
- Data sources to be linked are analyzed to customize probabilistic matching parameters
- Threshold match scores are established
- Blocking variables are established

2. Operational API Component:
- Incoming data is preprocessed and validated (case normalized, fields validated)
- Potential pairs are formed (blocking) and scored (recently implemented frequency scaling through Google Summer of Code)
- Post-processing (detect twins/familial linkages that may represent false matches)

OpenMRS Patient Matching: Overview

1. Inbound HL7 Registration or Results message
2. Linking fields validated, cleaned (Name, DOB, etc.)
3. Record passed to Linkage Module
4. Potential pairs scored using Fellegi-Sunter probabilistic model, returned to OpenMRS registration handler
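The validation and cleaning step can be illustrated with a small sketch that normalizes case and discards default values such as the ones the slides mention (1900, ‘JOHN DOE’). This is not the OpenMRS module's code or API; the function names, the ISO date format, and the extra "UNKNOWN" placeholder are assumptions for illustration.

```python
from datetime import datetime

# 'JOHN DOE' and 1900 come from the slides; "UNKNOWN" is a hypothetical addition.
PLACEHOLDER_NAMES = {"JOHN DOE", "UNKNOWN"}
PLACEHOLDER_BIRTH_YEARS = {1900}


def clean_name(raw):
    """Upper-case, collapse whitespace, and drop placeholder names."""
    if raw is None:
        return None
    name = " ".join(raw.upper().split())
    if not name or name in PLACEHOLDER_NAMES:
        return None
    return name


def clean_dob(raw, fmt="%Y-%m-%d"):
    """Parse a date of birth; return None if unparseable or a default value."""
    if not raw:
        return None
    try:
        parsed = datetime.strptime(raw.strip(), fmt).date()
    except ValueError:
        return None
    if parsed.year in PLACEHOLDER_BIRTH_YEARS:
        return None
    return parsed
```

Returning `None` for placeholders lets the scoring step treat the field as missing rather than as agreement, so two unrelated ‘JOHN DOE’ records do not gain weight from a shared default value.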


Questions?

Bibliography - Theory

• Fellegi IP, Sunter AB. (1969). A Theory for Record Linkage. Journal of the American Statistical Association, 64(328), 1183-1210.
• Dunn HL. (1946). Record Linkage. Am J Public Health, 36, 1412-1416.
• Newcombe HB. (1988). Handbook of Record Linkage: Methods for Health and Statistical Studies, Administration, and Business. Oxford University Press.
• Newcombe HB, Kennedy JM, Axford SJ, James AP. (1959). Automatic Linkage of Vital Records. Science, 130, 954-959.
• Gill L. (2001). Methods for Automatic Record Matching and Linking and their use in National Statistics. Her Majesty’s Stationery Office, Norwich.
• Porter E, Winkler W. (1999). Approximate String Comparison and its Effect on an Advanced Record Linkage System. Record Linkage Techniques--1997: Proceedings of an International Workshop and Exposition. National Academy Press, Washington DC.
• Public Health Informatics Institute. (2006). The Unique Records Portfolio. Decatur, GA: Public Health Informatics Institute.

Bibliography: Applications and Research (1)

• Christen P. Febrl: A freely available record linkage system with a graphical user interface. Submitted to the Australasian Workshop on Health Data and Knowledge Management (HDKM), Wollongong, January 2008.
• Potosky A, Riley G, Lubitz J, et al. Potential for Cancer Related Health Services Research Using a Linked Medicare-Tumor Registry Database. Medical Care 1993;31(8):732-748.
• Whalen D, Pepitone A, Graver L, Busch JD. Linking Client Records from Substance Abuse, Mental Health and Medicaid State Agencies. SAMHSA Publication No. SMA-01-3500. Rockville, MD: Center for Substance Abuse Treatment and Center for Mental Health Services, Substance Abuse and Mental Health Services Administration, July 2000.
• Liu S, Wen SW. Development of Record Linkage of Hospital Discharge Data for the Study of Neonatal Readmission. Chronic Diseases in Canada 1999;20(2):77-81.
• Pates R, Scully W, et al. Adding Value to Clinical Data by Linkage to a Public Death Registry. MedInfo 2001;10(Pt 2):1384-8.

Bibliography: Applications and Research (2)

• Lynch BT, Arends WL. Selection of a surname coding procedure for the SRS record linkage system. Washington, DC: US Department of Agriculture, Sample Survey Research Branch, Research Division, 1977.
• Newman T, Brown A. Use of Commercial Record Linkage Software and Vital Statistics to Identify Patient Deaths. J Am Med Inform Assoc. 1997 May-June;4(3):233-237.
• Schadow G, McDonald CJ. Maintaining Patient Privacy in a Large Scale Multi-Institutional Clinical Case Research Network. AMIA Proceedings (2002 Submission).
• Public Health Informatics Institute. (2006). The Unique Records Portfolio. Decatur, GA: Public Health Informatics Institute.
• Sideli R, Friedman C. Validating Patient Names in an Integrated Clinical Information System. Symposium on Computer Applications in Medical Care, Washington, DC. November 1991:588-592.

Bibliography: Applications and Research (3)

• Miller PL, Frawley SJ, Sayward FG. IMM/Scrub: a domain-specific tool for the deduplication of vaccination history records in childhood immunization registries. Computers and Biomedical Research 2000;33:126–143.
• Salkowitz SM, Clyde S. De-duplication technology and practices for integrated child-health information systems. Decatur, GA: All Kids Count, Public Health Informatics Institute, 2003.
• Van Den Brandt PA, Schouten LJ, Goldbohm RA, Dorant E, Hunan PMH. Development of a record linkage protocol for use in the Dutch Cancer Registry for epidemiological research. Int J Epidemiol 1990;19:553-8.
• Grannis SJ, Overhage JM, McDonald CJ. Analysis of Identifier Performance Using a Deterministic Linkage Algorithm. Proc AMIA Symp 2002:305-9.
• Grannis SJ, Overhage JM, McDonald CJ. Analysis of a Probabilistic Record Linkage Technique without Human Review. In: Proceedings of American Medical Informatics Association Fall Symposium; 2003; Washington, D.C.
• Integrating the Healthcare Enterprise. (2006). Patient Identifier Cross-Reference (PIX) and Patient Demographic Query (PDQ) HL7 v3 Transaction Updates. Available at: http://www.ihe.net/Technical_Framework/upload/IHE_ITI_TF_Suppl_PIXPDQ_HL7v3_PC_2006_08_15.pdf