5
A Randomized Controlled Trial of the Accuracy of Clinical Record Retrieval using SNOMED-RT as Compared with ICD9-CM Peter L. Elkin, MD, Alexander P. Ruggieri, MD, Steven H. Brown, MD, James Buntrock, Brent A. Bauer, MD, Dietlind Wahner-Roedler, MD, Scott C. Litin, MD, Julie Beinborn, Kent R. Bailey, PhD, Larry Bergstrom, MD Abstract: Background: Concept-based Indexing is purported to provide more granular data representation for clinical records" '2 This implies that a detailed clinical terminology should be able to provide improved access to clinical records. To date there is no data to show that a clinical reference terminology is superior to a precoordinated terminology in its ability to provide access to the clinical record. Today, ICD9-CM is the most commonly used method of retrieving clinical records. Objective: In this study, we compare the sensitivity, specificity, positive likelihood ratio, positive predictive value and accuracy of SNOMED-RT vs. ICD9-CM in retrieving ten diagnoses from a random sample of2,022 episodes ofcare. Method: We randomly selected 1,014 episodes of care from the inpatient setting and 1,008 episodes of care from the outpatient setting. Each record had associated with it, the free text final diagnoses from the Master Sheet Index at the Mayo Clinic and the ICD9-CM codes used to billfor the encounters within the episode of care. The free text diagnoses were coded by two expert indexers (disagreements were addressed by a Staff Clinician) as to whether queries regarding one of 5 common or 5 uncommon diagnoses should return this encounter. The free text entries were automatically coded using the Mayo Vocabulary Processor. Each of the ten diagnoses was exploded in both SNOMED-RT and ICD9-CM and using these entry points, a retrieval set was generated from the underlying corpus of records. Each retrieval set was compared with the Gold Standard created by the expert indexers. Results: SNOMED-RTproduced significantly greater specificity in its retrieval sets (99.8% vs. 98.3%, p<0.001 McNemar Test). The positive likelihood ratios were significantly better for SNOMED-RT retrieval sets (264.9 vs. 33.8, p<0.001 McNemar Test). The positive predictive value of a SNOMED- RT retrieval was also significantly better than ICD9- CM (92.9% vs. 62.4%, p<O.001 McNemar Test). The accuracy defined as 1 - (the total error rate (FP+FN) / Total # episodes queried (20,220)) was significantly greater for SNOMED-RT (98.2% vs. 96.8%, p=0.002 McNemar Test). Interestingly, the sensitivity of the SNOMED-RT generated retrieval set was not significantly differentfrom ICD9-CM, but there was a trend toward significance (60.4% vs. 57.6%, p=0.067 McNemar Test). However, if we examine only the outpatient practice SNOMED-RT produced a more sensitive retrieval set than ICD9- CM (54.8% vs. 46.4%/, p=0.002 McNemar Test). Conclusions: Our data clearly shows that information regarding both common and rare disorders is more accurately identified with automated SNOMED-RT indexing using the Mayo Vocabulary Processor than it is with traditional hand picked constellations of codes using ICD9-CM. SNOMED-RT provided more sensitive retrievals of outpatient episodes of care than ICD9-CM. Introduction: ICD9-CM is the current standard for billing and morbidity indexing in the United States.3 This reporting responsibility makes the assignment of ICD9-CM codes a necessity for most or all healthcare organizations. The availability of this coded data has lead to the use of ICD9-CM codes for research involving the clinical record. The availability of clinical reference terminologies such as SNOMED- RT, offers the potential to provide much more granular coding of the data found within the clinical recorded' This advantage is achieved by compositional encoding of health concepts.6 This means that a clinician or system can put together multiple concepts in order to form a more detailed clinical expression."' An example of this is, "Cellulitis of the Left foot with Osteomyelitis of the third Metatarsal without Lymphangitis." This notion is represented by an expression containing six concepts within SNOMED-RT. The ability to record arbitrary complex structure within a clinical reference terminology holds the promise to significantly address the problem of content completeness.9 1067-5027/01/$5.00 C 2001 AMIA, Inc. 159

using SNOMED-RT as Compared with ICD9-CM

  • Upload
    dangnhu

  • View
    230

  • Download
    0

Embed Size (px)

Citation preview

Page 1: using SNOMED-RT as Compared with ICD9-CM

A Randomized Controlled Trial of the Accuracy of Clinical Record Retrievalusing SNOMED-RT as Compared with ICD9-CM

Peter L. Elkin, MD, Alexander P. Ruggieri, MD, Steven H. Brown, MD, James Buntrock,Brent A. Bauer, MD, Dietlind Wahner-Roedler, MD, Scott C. Litin, MD, Julie Beinborn,

Kent R. Bailey, PhD, Larry Bergstrom, MD

Abstract:

Background: Concept-based Indexing is purportedto provide more granular data representation forclinical records"'2 This implies that a detailedclinical terminology should be able to provideimproved access to clinical records. To date there isno data to show that a clinical reference terminologyis superior to a precoordinated terminology in itsability to provide access to the clinical record.Today, ICD9-CM is the most commonly used methodofretrieving clinical records.

Objective: In this study, we compare the sensitivity,specificity, positive likelihood ratio, positivepredictive value and accuracy of SNOMED-RT vs.ICD9-CM in retrieving ten diagnosesfrom a randomsample of2,022 episodes ofcare.

Method: We randomly selected 1,014 episodes ofcarefrom the inpatient setting and 1,008 episodes ofcare from the outpatient setting. Each record hadassociated with it, the free text final diagnoses fromthe Master Sheet Index at the Mayo Clinic and theICD9-CM codes used to billfor the encounters withinthe episode of care. The free text diagnoses werecoded by two expert indexers (disagreements wereaddressed by a Staff Clinician) as to whether queriesregarding one of 5 common or 5 uncommondiagnoses should return this encounter. Thefree textentries were automatically coded using the MayoVocabulary Processor. Each of the ten diagnoseswas exploded in both SNOMED-RT and ICD9-CMand using these entry points, a retrieval set wasgenerated from the underlying corpus of records.Each retrieval set was compared with the GoldStandard created by the expert indexers.

Results: SNOMED-RTproduced significantly greaterspecificity in its retrieval sets (99.8% vs. 98.3%,p<0.001 McNemar Test). The positive likelihoodratios were significantly better for SNOMED-RTretrieval sets (264.9 vs. 33.8, p<0.001 McNemarTest). The positive predictive value of a SNOMED-RT retrieval was also significantly better than ICD9-CM (92.9% vs. 62.4%, p<O.001 McNemar Test). The

accuracy defined as 1 - (the total error rate(FP+FN) / Total # episodes queried (20,220)) wassignificantly greater for SNOMED-RT (98.2% vs.96.8%, p=0.002 McNemar Test). Interestingly, thesensitivity of the SNOMED-RT generated retrievalset was not significantly differentfrom ICD9-CM, butthere was a trend toward significance (60.4% vs.57.6%, p=0.067 McNemar Test). However, if weexamine only the outpatient practice SNOMED-RTproduced a more sensitive retrieval set than ICD9-CM (54.8% vs. 46.4%/, p=0.002 McNemar Test).

Conclusions: Our data clearly shows thatinformation regarding both common and raredisorders is more accurately identified withautomated SNOMED-RT indexing using the MayoVocabulary Processor than it is with traditional handpicked constellations of codes using ICD9-CM.SNOMED-RT provided more sensitive retrievals ofoutpatient episodes ofcare than ICD9-CM.

Introduction:

ICD9-CM is the current standard for billing andmorbidity indexing in the United States.3 Thisreporting responsibility makes the assignment ofICD9-CM codes a necessity for most or all healthcareorganizations. The availability of this coded data haslead to the use of ICD9-CM codes for researchinvolving the clinical record. The availability ofclinical reference terminologies such as SNOMED-RT, offers the potential to provide much moregranular coding of the data found within the clinicalrecorded' This advantage is achieved bycompositional encoding of health concepts.6 Thismeans that a clinician or system can put togethermultiple concepts in order to form a more detailedclinical expression."' An example of this is,"Cellulitis of the Left foot with Osteomyelitis of thethird Metatarsal without Lymphangitis." This notionis represented by an expression containing sixconcepts within SNOMED-RT. The ability to recordarbitrary complex structure within a clinical referenceterminology holds the promise to significantlyaddress the problem of content completeness.9

1067-5027/01/$5.00 C 2001 AMIA, Inc. 159

Page 2: using SNOMED-RT as Compared with ICD9-CM

Vocabularies such as SNOMED-RT, the UMLS orone of several other vocabulary efforts have commoncharacteristics that are helpful to understand withinthe scope of Mayo Vocabulary Processor (MVP).?"1,11 A vocabulary is essentially a set of concepts thatare identified by a unique identifier and described byterms and relationships. For each concept, thereexists one or more terms that belong to that concept.For example, in SNOMED-RT the concept"Myocardial Infarction" is identified by a code andcontains the terms "heart attack," "infarction ofheart," and "cardiac infarction." The concept is alsodescribed by its relationships to other concepts.These relationships commonly include hierarchiesbuilt on parent/child relationships but often extend tomany other types of relationships such asmorphology, topology and etiology.

Theoretically, this should lead to improvedinformation retrieval of relevant records whenpresented with a clinical query. We have designedthis experiment to test the hypothesis that the use ofSNOMED-RT will lead to more specific informationretrieval of clinical records when compared withICD9-CM.

Method

The Mayo Clinic has a long history of exactingmanual record keeping. The Master Sheet Indexrelates to Mayo's view of a clinical episode. AtMayo, an episode of care is defined at the divisionallevel and is related to the notion that there is a pointin the time course of caring for a particular patientwhen the case is stable and is therefore no longer influx. Often this is after the diagnoses have beenestablished, or after the major interventions havebeen accomplished. At this point in time, thePrimary Physician caring for the patient is required tolist the "Final" Diagnoses for this episode of care.This requires the physician to perform the actions ofFiltering, Subsumption, and Prioritization. Filteringwould include eliminating diagnoses, originallycontained in one's differential diagnosis, that wereruled out during this episode. Subsumption allowsthe clinician to state that the patient's presentingproblem of Chest Pain was in fact due toAtherosclerotic Coronary Artery Disease (CAD).Prioritization relates to the notion that although thepatient's problem list may in fact be very long, theclinician may have only dealt with a subset of thepatient's problems during this episode. For example,our patient with Chest Pain may also have SeborrheicKeratosis, which we may not have had occasion todiscuss during the work up and treatment of thispatient's CAD.

Mayo Vocabulary Processor

The Mayo Vocabulary Processor (MVP) is a set oftools that facilitate medical vocabulary indexing.This allows the mapping of free text into coded datathat is both expressive and comparable for clinical,educational and research purposes. There are twomain divisions within the set of tools correspondingto the server side and the client side. On the serverside, there are vocabulary services that contain thepreprocessing of the underlying referencevocabulary. This application is designed to beplatform and programming language neutral. On theclient side, there are GUI building blocks forsearching and navigating the underlying vocabulary,building complex coded expressions (Using ourAutomated Term Composition6 and Dissection'2Methods), and maintaining a personal list ofcommonly used coded expressions.

Decision Support System

The Decision Support System (DSS) is a softwareand data base management system that aggregatescoded clinical and financial data from various sourceswithin Mayo Foundation for both inpatient andoutpatient care. DSS captures ICD9-CM codes foreach episode of care, which occurs within the MayoFoundation. DSS allows an analyst to directly accessand model the information to support clinical andfinancial decision-making.

Study Design

We randomly selected 1,050 episodes of care fromthe inpatient setting and 1,050 episodes of care fromthe Outpatient setting (note: all episodes were fromyear 2000). This yielded 1,014 inpatient and 1,008outpatient unique episodes of care, which containedDSS assigned ICD9-CM codes. Each record hadassociated with it the free text final diagnoses fromthe Master Sheet Index at the Mayo Clinic and theICD9 codes used to bill for the encounters within theepisode of care. The free text diagnoses were codedby two expert indexers (disagreements wereaddressed by a Staff Clinician, 607 out of 2022(30%) episodes were reviewed in this fashion) as towhether queries regarding one of 5 common or 5uncommon diagnoses should return this encounter.The five most common diagnoses at Saint Mary'sHospital (Mayo's largest teaching Hospital) wereCongestive Heart Failure, Coronary Artery Disease,Chronic Obstructive Pulmonary Disease, Pneumoniaand Atrial Fibrillation. The five uncommon (rare)diagnoses were number 50 through 53 on the samefrequency list of Master Sheet entries (Ulcerative

160

Page 3: using SNOMED-RT as Compared with ICD9-CM

Colitis, Restrictive Lung Disease, Vasculitis,Nephrotic Syndrome, Histoplasmosis). The free textentries were automatically coded using the MayoVocabulary Processor. Each of the ten diagnoses wasexploded in both SNOMED-RT and ICD9-CM andusing these entry points, a retrieval set was generatedfrom the underlying corpus of records. ForSNOMED-RT, all concepts associated with theprimary mapping by either an "isa" relationship or a

co-occurrences of the "Assoc-Morph" and "Assoc-Topo" concept relations were included in the retrieval

set. The ICD9-CM explosions were performed byhand by an expert nosologist with over 20 years ofexperience with formulating ICD9-CM queries. Eachretrieval set was compared with the Gold Standardcreated by the expert indexers (See Figure #1).Formal statistical comparison of sensitivity,specificity, and accuracy was performed by use ofMcNemar's test applied to the 2-way tables SNO-MED by ICD9, within the sets of cases with orwithout each diagnosis (by the gold standard).

Figure #1 The free text master sheet entries were hand indexed by professional medical indexers and this was compared to theretrieval sets generated using SNOMED-RT and ICD9-CM respectively.

Results

SNOMED-RT provided significantly greaterspecificity in its retrieval sets (99.8% vs. 98.3%,p<0.001 McNemar Test), (See Table #1). Thepositive likelihood ratios were significantly better forSNOMED-RT retrieval sets (264.9 vs. 33.8, p<0.001McNemar Test). The positive predictive value of a

SNOMED-RT retrieval was also significantly betterthan ICD9-CM (92.9% vs. 62.4%, p<0.001 McNemarTest). The accuracy defined as 1 - (the total error

rate (FP+FN) / Total # episodes queried (20,220))was significantly greater for SNOMED-RT (98.2%

vs. 96.8%, p=0.002 McNemar Test). Interestingly,the sensitivity of the SNOMED-RT generatedretrieval set was not significantly different from

ICD9-CM, but there was a trend toward significance(60.4% vs. 57.6%, p=0.067 McNemar Test).However, if we examine only the outpatient practiceSNOMED-RT produced a more sensitive retrieval setthan ICD9-CM (54.8% vs. 46.4%, p=0.002 McNemarTest) (See Table #2).

Retrievals of common diagnoses were more sensitivebut not more specific than rare diagnoses and overallhad higher positive likelihood ratios and positivepredictive values.

161

Study Design

I

Page 4: using SNOMED-RT as Compared with ICD9-CM

TP FN Sons +L38 44 0.463415 20.90754

59 ~37 0.61 4583 56.3660771 33 0.682692 .25.67459

18 ~~0.6 42.36429207 ~126 0.621622 19.08943

-402 258 0.60909 29.07022 17 0.105263 105.42111 29 0.033333 1.4755560 2 0 #DIV/0!7 21 ~~0.25 , ~ 83.08333 X34

RostrI~~~~ 7 19 0.269231 9 ~~12.21329k i

Table #1: 2022 Randomly selected episodes of care were queried for 5 common and 5- rare disorders. The results of a queryusing SNOMED-RT were compared with ICD9-CM against a gold standard retrieval set compiled by two expertindexers.(TP=True Positive, TN=True Negative, FPP-False Positive, FN=False Negative, Sens--Sensitivity, Spec=Specificity,+LR=Positive Likelihood Ratio, and PPV=Positive Predictive Value)

-LR140.001143.591147.844LRI19.508JI18.538%21.1 58i

116 0.540 0.06977LE6O.43273E

,nt and the outpatient subsets are presented.

+LR469.747697.25E632.933

+LR61 .641

9.6841138.364(

162

Page 5: using SNOMED-RT as Compared with ICD9-CM

Discussion

Clinical terminologies that provide clinicians andhealth systems with the ability to code their clinicaldata using a robust reference terminology, whichsupport compositional expressions, can signifcantlyimprove the accuracy and specificity of informationretrieval as compared with the current indexingprovided by ICD9-CM.' This improvement in thespecificity of clinical retrieval sets represents adistinct advantage to the organization adopting thistechnology. Aggregation and retrieval of data forresearch, education, improving the practice andadministrating the practice of medicine will becomeeasier and more accurate. For outpatient episodes,SNOMED-RT provided more sensitive retrievalsthan ICD9-CM. The authors suspect that morerigorous assignment of ICD9-CM coding in theinpatient setting may account for the lack of asignificant difference in the sensitivities of theretrieval sets, also up to 9 diagnoses are codeablefrom a hospitalization while only 3 are allowed for anoutpatient episode. SNOMED would obviouslyshow a greater advantage for concepts whoseaccurate representation requires a compositionalapproach. The greater availability of high qualityinformation will in some, perhaps many, cases be thedefining competitive advantage that one healthcareorganization has over its competitors in thehealthcare marketplace.

The overall low sensitivity of SNOMED-RT forretrieving clinically relevant episodes of care for bothcommon and rare disorders highlights the need forthe continued development of more robustrelationships, both hierarchical and non-hierarchical(e.g. "Diastolic Dysfunction" in SNOMED-RT is notlisted as a type or etiology of "CHF" whereasclinically CHF secondary to diastolic dysfunction isone of the important distinctions in the diagnosis ofchf). Of note, the description logic provided nobenefit in terms of increased retrievals from thisdataset. Some of the false negative retrievalsstemmed from the cases where the diagnosis wasdiscussed but not mentioned. Other failures were dueto the co-occurrence of concepts, which to theindexers implied a high probability that the episodewas relevant.

Our data clearly shows that information regardingboth common and rare disorders is more accuratelyidentified with automated SNOMED-RT indexingusing the Mayo Vocabulary Processor than it is withtraditional hand picked constellations of codes usingICD9-CM. SNOMED-RT provided more sensitive

retrievals of outpatient episodes of care han ICD9-CM.

References:1. Campbell KE, Musen MA. Representation of

Clinical Data Using SNOMED Ill andConceptual Graphs. Symposium on ComputerApplications in Medical Care 1992;16:354-358.

2. Elkin PL, et al; "Standard Problem ListGeneration, Utilizing the Mayo CanonicalVocabulary Embedded within the UnifiedMedical Language System"; JAMIA Suppl 1997,p. 500-504.

3. http://www.cdc.gov/nchs/about/otheract/icd9/abticd9.htm

4. Spacn KA, Campbell KE. CompositionalConcept Representation Using SNOMED:Towards Further Convergence of ClinicalTerminologies. JAMIA 1998;Symp.Suppl(1998):740-744.

5. Cote R, Rothwell D, Palotay J, Beckett R,Brochu L, eds. The Systemized Nomenclature ofHuman and Veterinary Medicine: SNOMEDInternational. Northfield: College ofAmericanPathologists; 1993.

6. Elkin PL, et al.. A Randomized Controlled TrialofAutomated Term Composition. JAMIA1998;SympSuppl:765-769.

7. Elkin P, et al. Medical KnowledgeRepresentation: From a Clinically DerivedTerminology to Understanding. In: ThirteenthSymposium: TEPR 1997.

8. Elkin PL, Harris M, Brown SH, et al; "SemanticAugmentation ofDescription Logic BasedTerminologies", IMIA WG6 Symp. 2000.

9. Bousquet c, et al. Using Semantic Distance forthe Efficient Coding ofMedical Concepts;JAMIA Suppl. 2000.

10. Chute C, Elkin P. A Clinically Derived Lexicon:Qualifications to Reduction. JAMIA 1997:5704.

11. Elkin PL, Tuttle MS, et al. The Role ofCompositionality in Standardized Problem ListGeneration. In: Cesnik B, McCray AT, ScherrerJ-R, ed. Ninth World Congress on MedicalInfornatics; IOS Press; 1998. p. 660-664.

12. PL Elkin, BA Bauer, et al; "A RandomizedDouble-Blind Controlled Trial ofAutomatedTerm Dissection", JAMIA Suppl. 1999

AcknowledeementsThis work has been supported, in part, by aMayo: Clinician Engaged in EducationGrant and the Mayo Clinic's Department ofInternal Medicine. The authors wish tothank Dawn Bergen and Philip Ogren fortheir technical support of this project.

163