MCORES: a system for noun phrase coreference resolution for clinical records

Andreea Bodnari (1), Peter Szolovits (1), Ozlem Uzuner (2)
(1) MIT, CSAIL, Cambridge, MA, USA
(2) Department of Information Studies, University at Albany SUNY, Albany, NY, USA

2012 SHARPn Summit “Secondary Use”, 10.16.2012, Rochester, MN



Outline

  • Medical coreference resolution system (MCORES)
  • Experimental results
  • Conclusion

Why coreference resolution?

  • Electronic Medical Records (EMRs) – large information repositories
  • Clinical information requires processing
      ▪ Lower level: sentence parsing, tokenization
      ▪ Higher level: coreference resolution, semantic disambiguation
  • Coreference resolution: a fundamental step in text processing

Data: i2b2/VA corpus

  • English medical corpus provided by i2b2 National Center for Biomedical Computing
  • De-identified medical discharge summaries
      ▪ Source: PH & BIDMC
      ▪ Content: 230 (PH) + 196 (BIDMC) discharge summaries
  • Annotated concepts and coreference chains
  • Concept types
      ▪ Persons
      ▪ Problems
      ▪ Treatments
      ▪ Tests
      ▪ Pronouns

Coreference resolution algorithm

  1. NP Instance Creation
  2. Feature Generation
  3. Classification
  4. Output Clustering

1. NP instance creation

  • Markables of the same semantic category are paired together
  • MCORES creates positive instances only from neighboring markable pairs in a chain

Note: instance creation is akin to that of McCarthy and Lehnert.
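The pairing rule above can be sketched as follows. This is a minimal illustration, not the authors' implementation. The `Markable` record and its field names are hypothetical, and two choices are assumptions the slides do not confirm: pairs from different chains are labeled negative, and non-neighboring pairs from the same chain are skipped.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Markable:
    """Hypothetical record for one annotated noun phrase."""
    text: str
    category: str             # e.g. "problem", "treatment", "test", "person"
    position: int             # order of appearance in the document
    chain_id: Optional[int]   # gold coreference chain, None for singletons


def create_instances(markables: List[Markable]) -> List[Tuple[Markable, Markable, bool]]:
    """Return (antecedent, anaphor, label) training instances."""
    ordered = sorted(markables, key=lambda m: m.position)
    instances = []
    for antecedent, anaphor in combinations(ordered, 2):
        if antecedent.category != anaphor.category:
            continue  # only markables of the same semantic category are paired
        same_chain = (antecedent.chain_id is not None
                      and antecedent.chain_id == anaphor.chain_id)
        if not same_chain:
            instances.append((antecedent, anaphor, False))  # assumed negative
            continue
        # Neighbors in a chain: no other chain member lies between the two.
        has_member_between = any(
            m.chain_id == antecedent.chain_id
            and antecedent.position < m.position < anaphor.position
            for m in ordered
        )
        if not has_member_between:
            instances.append((antecedent, anaphor, True))
        # Non-neighboring same-chain pairs are skipped (assumption).
    return instances
```

Restricting positives to neighboring chain members keeps the positive class small and local; the clustering step is then responsible for linking the neighbors back into full chains.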

1. NP instance creation

Table 3: Distribution of coreferent and non-coreferent instances per semantic category over instances containing exact, partial, and no textual overlap.

2. Feature Generation

  • Multi-perspective features
      ▪ Antecedent perspective
      ▪ Anaphor perspective
      ▪ Greedy perspective
      ▪ Stingy perspective

  • Phrase-level lexical
  • Sentence-level lexical
  • Syntactic
  • Semantic
  • Miscellaneous
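The slides name the four perspectives without defining them. One plausible reading, offered here purely as an assumption, is that an overlap score is normalized from the point of view of each mention (by the antecedent's size, then by the anaphor's size), with the greedy perspective taking the larger of the two values and the stingy perspective the smaller. The function name `perspectives` and this definition are illustrative and are not taken from the slides.

```python
from typing import Dict, List


def perspectives(antecedent: List[str], anaphor: List[str]) -> Dict[str, float]:
    """Token overlap seen from four perspectives (assumed definitions)."""
    antecedent_tokens = set(antecedent)
    anaphor_tokens = set(anaphor)
    shared = len(antecedent_tokens & anaphor_tokens)
    # Share of each mention's distinct tokens that also occur in the other.
    from_antecedent = shared / len(antecedent_tokens) if antecedent_tokens else 0.0
    from_anaphor = shared / len(anaphor_tokens) if anaphor_tokens else 0.0
    return {
        "antecedent": from_antecedent,
        "anaphor": from_anaphor,
        "greedy": max(from_antecedent, from_anaphor),  # most favorable view
        "stingy": min(from_antecedent, from_anaphor),  # least favorable view
    }
```

Under this reading, a short mention fully contained in a longer one scores 1.0 from the greedy perspective even when the stingy perspective scores it low.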

2. Feature Generation (lexical)

  • Phrase-level lexical
      ▪ Token overlap*
      ▪ Normalized token overlap
      ▪ Edit-distance
      ▪ Normalized edit-distance
  • Sentence-level lexical
      ▪ Sentence-level token overlap*
      ▪ Filtered sentence-level token overlap*
      ▪ Left and right mention overlap (stingy and greedy perspectives only)

* multi-perspective feature
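A minimal sketch of the phrase-level lexical features follows. The slides list the feature names only, so the details are assumptions: tokens are lowercased and split on whitespace, edit distance is the standard Levenshtein distance over characters, and the normalized variant divides by the length of the longer string.

```python
def token_overlap(a: str, b: str) -> int:
    """Number of distinct lowercased tokens shared by two phrases."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (0 if char_a == char_b else 1)
            current.append(min(previous[j] + 1,      # deletion
                               current[j - 1] + 1,   # insertion
                               substitution))
        previous = current
    return previous[-1]


def normalized_edit_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer string's length (assumed normalizer)."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0
```

Normalizing keeps long medication or problem names from dominating the raw distance, so a single-character misspelling scores similarly regardless of phrase length.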

2. Feature Generation (syntactic & semantic)

  • Syntactic
      ▪ Number agreement
      ▪ Noun overlap*
      ▪ Surname match
  • Semantic
      ▪ UMLS CUI overlap*
      ▪ UMLS CUI token overlap*
      ▪ UMLS semantic type overlap*
      ▪ Anaphor UMLS semantic type

* multi-perspective feature

2. Feature Generation (miscellaneous)

  • Token distance
  • Mention distance
  • All-mention distance
  • Sentence distance
  • Section match
  • Section distance
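Four of these features can be sketched from position information alone. The `Mention` record and its fields are hypothetical, and the definitions (absolute index differences, section match as equality of section names) are assumptions. Mention distance and all-mention distance are left out because the slides do not say how they differ.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Mention:
    """Hypothetical positional record for one markable."""
    token_index: int      # index of the mention's first token in the document
    sentence_index: int   # index of the sentence containing the mention
    section_index: int    # index of the section containing the mention
    section_name: str     # e.g. "History of Present Illness"


def distance_features(antecedent: Mention, anaphor: Mention) -> Dict[str, Union[int, bool]]:
    """Positional features for one markable pair (assumed definitions)."""
    return {
        "token_distance": abs(anaphor.token_index - antecedent.token_index),
        "sentence_distance": abs(anaphor.sentence_index - antecedent.sentence_index),
        "section_match": antecedent.section_name == anaphor.section_name,
        "section_distance": abs(anaphor.section_index - antecedent.section_index),
    }
```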

3. Classification

  • C4.5 decision tree algorithm
      ▪ Flexible
      ▪ Readable prediction model
  • Classify pairs of markables based on values of the feature vectors
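C4.5 selects splits by gain ratio, that is, information gain divided by the entropy of the split itself. The sketch below computes that criterion for one discrete feature on toy data. It is not a full C4.5 implementation (no continuous thresholds, pruning, or tree construction), and the function names and example values are illustrative assumptions.

```python
from collections import Counter
from math import log2
from typing import Dict, Hashable, List, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(feature_values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Gain ratio of splitting `labels` on a discrete feature."""
    total = len(labels)
    partitions: Dict[Hashable, List[Hashable]] = {}
    for value, label in zip(feature_values, labels):
        partitions.setdefault(value, []).append(label)
    # Expected label entropy after the split, weighted by partition size.
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    information_gain = entropy(labels) - remainder
    split_information = entropy(feature_values)
    return information_gain / split_information if split_information > 0 else 0.0
```

For instance, a feature that separates coreferent from non-coreferent pairs perfectly scores 1.0 on a balanced two-way split, while a feature unrelated to the label scores 0.0.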

4. Output Clustering

  • Classifier makes pairwise predictions only
  • Pairwise predictions clustered into coreference chains
  • Aggressive-merge [1] clustering algorithm
      ▪ Prediction [M1] - [M2]
      ▪ All preceding pairwise predictions linked to [M1] or [M2]

[1] Aggressive-merge algorithm proposed by McCarthy and Lehnert
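If every positive pairwise prediction merges the clusters that hold its two markables, aggressive-merge clustering amounts to taking the transitive closure of the positive links. The sketch below implements that reading with a union-find structure. The function name and the choice to drop singletons from the output are assumptions, not details from the slides.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def aggressive_merge(
    markables: Iterable[Hashable],
    positive_pairs: Iterable[Tuple[Hashable, Hashable]],
) -> List[Set[Hashable]]:
    """Cluster markables into chains from positive pairwise predictions."""
    parent: Dict[Hashable, Hashable] = {m: m for m in markables}

    def find(m: Hashable) -> Hashable:
        # Walk to the cluster representative, shortening the path on the way.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for left, right in positive_pairs:
        parent.setdefault(left, left)
        parent.setdefault(right, right)
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left  # merge the two clusters

    clusters: Dict[Hashable, Set[Hashable]] = {}
    for m in parent:
        clusters.setdefault(find(m), set()).add(m)
    # Singletons are not reported as chains (assumption).
    return [chain for chain in clusters.values() if len(chain) > 1]
```

Because one positive link is enough to join two clusters, a single false positive can fuse two unrelated chains, which is the usual trade-off of this strategy.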

Evaluation

  • Feature set evaluation
  • Perspectives evaluation
  • Performance evaluation against
      ▪ In-house baseline
      ▪ Third-party systems (RECONCILE ACL09 & BART)
  • Evaluation metric: unweighted averages of Recall, Precision, and F-measures of
      ▪ MUC
      ▪ B3
      ▪ CEAF
      ▪ BLANC
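As an illustration of one of the four metrics, the sketch below computes B3 precision, recall, and F-measure from a key (gold) partition and a response (system) partition. It assumes both partitions cover the same mentions, with singletons listed as one-element sets. The function name is illustrative, and the deck's averaging across MUC, B3, CEAF, and BLANC is not reproduced here.

```python
from typing import Hashable, List, Set, Tuple


def b_cubed(key: List[Set[Hashable]], response: List[Set[Hashable]]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) under the B3 metric."""
    key_of = {m: chain for chain in key for m in chain}
    response_of = {m: chain for chain in response for m in chain}
    mentions = key_of.keys() & response_of.keys()
    if not mentions:
        return 0.0, 0.0, 0.0
    # Per-mention scores: overlap of the mention's key and response chains.
    precision = sum(
        len(key_of[m] & response_of[m]) / len(response_of[m]) for m in mentions
    ) / len(mentions)
    recall = sum(
        len(key_of[m] & response_of[m]) / len(key_of[m]) for m in mentions
    ) / len(mentions)
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure
```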


Discussion

  • MCORES’ advantage comes from linking markables with no token overlap
  • Phrase-level sub-MCORES performs similarly to MCORES
  • Greedy perspective system is the most favorable single-perspective system
  • Multi-perspective system performs as well as or better than single-perspective systems
  • Error analysis
      ▪ MCORES fails to classify misspelled person pairs
      ▪ Medical problems: false positives due to the difference between new and recurring events
      ▪ Treatments: false positives due to medications with different routes of administration
      ▪ Tests: false positives due to the large number of full-overlap instances that did not corefer

Conclusion

  • Developed coreference resolution system for the medical domain (MCORES)
  • MCORES innovates through a multi-perspective and knowledge-based feature set
  • MCORES outperforms third-party systems and an in-house baseline, improving coreference resolution on clinical records