EliXR: An Approach to Eligibility Criteria Extraction and Representation

Chunhua Weng, PhD, Zhihui Luo, PhD, Stephen B. Johnson, PhD

Department of Biomedical Informatics, Columbia University

March 10, 2011

Problem

Free-text clinical research eligibility criteria are not amenable to machine processing.

Computational representations (e.g., ontologies) are needed to support electronic eligibility determination, clinical evidence application, clinical research knowledge management, etc.

Related Work

• Eligibility Rule Grammar and Ontology (ERGO)

• Agreement on Standardized Protocol Inclusion Requirements for Eligibility (ASPIRE)

• Many other prior efforts [1]

[1] Weng C, Tu SW, Sim I, Richesson R. Formal Representations of Eligibility Criteria: A Literature Review. Journal of Biomedical Informatics. 2010;43:451-467.

The Research Gap

• Plethora of representations, no canonical model

• Ontology for human annotation vs. ontology for NLP

• Lacking ontology and NLP symbiosis

Our Research Question

Can we induce templates that can facilitate both representation and extraction from criteria text?

(A template is a “world model” for eligibility criteria, represented as a semantic network.)

From Text to Templates

TEXT
• Phrases
• Sentence
• Phrase Co-occurrence Frequency

TEMPLATES
• Concepts
• Semantic Relationships

Template development = segmentation of UMLS Semantic Network for the eligibility criteria domain

Methods: The EliXR Framework

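[Figure: EliXR framework flow diagram. Inputs: Criteria Corpus; UMLS. Processing steps: Lexicon Creation (1); Semantic Annotation (1); Dynamic Criteria Categorization (2,3); Semantic Dependency Parsing (4); Semantic Pattern Mining (5); Template Induction (6); Automatic Template Selection; Template Filling. Output: Structured Criteria. Parenthesized numbers are the superscript markers shown on the slide.]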

Semantic Annotation Compared with MMTx

Example: Patients with complications such as serious cardiac, renal and hepatic disorders.

EliXR Annotation: {Patients | Patient or Disabled Group} {with|} {complications | Pathologic Function} {such|} {as|} {serious | Qualitative Concept} {cardiac | Body Part, Organ, or Organ Component} {renal | Body Part, Organ, or Organ Component} {and|} {hepatic | Body Location or Region} {disorders | Disease or Syndrome} {.|.|}

MMTx 2.4C Annotation: {Patients | Patient or Disabled Group} {with complications | Pathologic Function} {such as serious cardiac, renal | Idea or Concept} {and|} {hepatic disorders | Disease or Syndrome}

[Luo, CRI‐10]
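The EliXR annotation in the example assigns a semantic type to each word, whereas MMTx groups words into longer phrases. A minimal sketch of word-level, lexicon-based annotation follows. The `TOY_LEXICON` dictionary, the `annotate` and `render` helpers, and the simple tokenizer are hypothetical stand-ins built only from the slide's example; they are not the actual EliXR lexicon or annotator.

```python
import re

# Toy lexicon copied from the slide's example only (hypothetical stand-in;
# a real lexicon would be derived from the UMLS).
TOY_LEXICON = {
    "patients": "Patient or Disabled Group",
    "complications": "Pathologic Function",
    "serious": "Qualitative Concept",
    "cardiac": "Body Part, Organ, or Organ Component",
    "renal": "Body Part, Organ, or Organ Component",
    "hepatic": "Body Location or Region",
    "disorders": "Disease or Syndrome",
}


def annotate(sentence, lexicon=TOY_LEXICON):
    """Return (token, semantic_type) pairs; unknown tokens get an empty type."""
    tokens = re.findall(r"[A-Za-z]+", sentence)  # words only, punctuation dropped
    return [(tok, lexicon.get(tok.lower(), "")) for tok in tokens]


def render(pairs):
    """Format pairs in the slide's {token | type} notation."""
    return " ".join(
        "{%s | %s}" % (tok, sem) if sem else "{%s|}" % tok for tok, sem in pairs
    )


example = "Patients with complications such as serious cardiac, renal and hepatic disorders."
pairs = annotate(example)
print(render(pairs))
```

Because every word is looked up on its own, modifiers such as "cardiac" and "renal" keep their own semantic types instead of being absorbed into a phrase-level concept.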

The 27 Eligibility Criteria Categories [Luo, AMIA‐10]

Dependency Parsing for “at least 1 week since discontinuation of prior pulmonary hypertension medication”

Frequent Semantic Patterns

Groups                         Semantic patterns in each group
Disease Criteria               60
Lab Results Criteria           36
Cancer Criteria                28
Medication Criteria            23
Therapy or Surgery Criteria    23
Temporal Expression            9
Total Unique Patterns          175

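Aggregated Patterns for Disease Criteria

[Figure: semantic network of aggregated patterns. Node groups (semantic type abbreviations): CLAS | FTCN | QLCO | QNCO | CLNA; DSYN | NEOP | PATF | SOSY | FNDG; TOPP | PHSU | CLDG; BLOR | BPOC; LBPR | DIAP; TMCO; VIRS; PODG; ORGA. Edge labels: Causes; Manifestation Of; Occurs in.]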

Disease Criteria Template

[Figure: template diagram. Nodes: Modifier; Disease; Manifestation; Treatment (Therapy or Drug); Body Location; Diagnostic Procedure; Temporal Constraints; Etiology; Population Group; History. Edge labels: Causes; Manifestation of; Occurs in. Legend: attribute; Class; Has‐a; cardinality 0:m.]

Micro‐Templates for Temporal Expressions

[Figure: Temporal Expression micro-template. Elements: temporal relationship; reference interval; temporal pattern; intrinsic temporal pattern; intrinsic duration; anchor event; event; cycle; frequency. Cardinalities shown: 1:m and 0:m.]
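To make the idea of filling a temporal micro-template concrete, here is a minimal sketch applied to the criterion used on the dependency parsing slide. The slot names come from the figure labels, but the `TemporalExpression` class, the regular expression, and the choice of which text span fills which slot are all assumptions of this sketch, not the EliXR implementation.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class TemporalExpression:
    # Slot names follow labels in the micro-template figure; the mapping of
    # text spans to slots is an assumption made for illustration.
    temporal_relationship: Optional[str] = None
    reference_interval: Optional[str] = None
    anchor_event: Optional[str] = None


# Hypothetical pattern covering only expressions shaped like
# "<comparator> <number> <unit> since/after/before <event>".
PATTERN = re.compile(
    r"(?P<interval>(?:at least|at most|within)?\s*\d+\s*(?:day|week|month|year)s?)\s+"
    r"(?P<relation>since|after|before)\s+(?P<anchor>.+)",
    re.IGNORECASE,
)


def fill_temporal_template(text):
    """Return a filled TemporalExpression, or None if the pattern does not match."""
    m = PATTERN.search(text)
    if m is None:
        return None
    return TemporalExpression(
        temporal_relationship=m.group("relation").lower(),
        reference_interval=m.group("interval").strip(),
        anchor_event=m.group("anchor").strip(),
    )


criterion = "at least 1 week since discontinuation of prior pulmonary hypertension medication"
filled = fill_temporal_template(criterion)
print(filled)
```

A single regular expression handles only one surface form. The slides describe template filling driven by dependency parses and mined semantic patterns, which this sketch does not attempt.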

AQUA Parsing Accuracy

• Only 900 criteria sentences were used for training

• A human review served as the gold standard

Five test sets (100 criteria each)    Tree structure correctness
1                                     90.60%
2                                     94.30%
3                                     92.80%
4                                     93.00%
5                                     93.00%
Avg.                                  92.70%
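As a quick arithmetic check, the reported average can be recomputed from the five per-set scores. The snippet assumes an unweighted mean, which is reasonable because each test set contains the same number of criteria.

```python
# Tree structure correctness (%) for the five test sets, as reported on the slide.
scores = [90.6, 94.3, 92.8, 93.0, 93.0]

# Unweighted mean: every test set has 100 criteria.
average = sum(scores) / len(scores)
print(f"{average:.1f}%")  # prints 92.7%
```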

Evaluation of Semantic Patterns

Semantic Type Patterns (min. support = 2)

Pattern set                   Total patterns   Total binary patterns   Coverage
Frequent sub-trees            1825             669                     91.30%
Maximal frequent sub-trees    175              -                       81.30%

Min. pattern cover: total number 183; overlap with patterns 90

Semantic Group Patterns (min. support = 2)

Pattern set                   Total patterns   Total binary patterns   Coverage
Frequent sub-trees            2378             120                     92.50%
Maximal frequent sub-trees    39               -                       90.60%
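The notions of minimum support, binary patterns, and coverage can be illustrated with a small sketch. It handles only binary (parent, child) semantic-type patterns, so it is a simplification rather than the frequent sub-tree mining algorithm cited in the references. The definition of coverage used here (the fraction of trees containing at least one frequent pattern) is an assumption, and the toy trees are hypothetical, not data from the study.

```python
from collections import Counter


def frequent_binary_patterns(trees, min_support=2):
    """Count (parent, child) semantic-type pairs by the number of trees containing them."""
    support = Counter()
    for edges in trees:
        support.update(set(edges))  # each tree counts a pattern at most once
    return {p: n for p, n in support.items() if n >= min_support}


def coverage(trees, patterns):
    """Fraction of trees containing at least one given pattern (assumed definition)."""
    if not trees:
        return 0.0
    covered = sum(1 for edges in trees if any(e in patterns for e in edges))
    return covered / len(trees)


# Hypothetical toy trees: each is a list of (parent_type, child_type) edges
# written with semantic type abbreviations. Not data from the study.
toy_trees = [
    [("DSYN", "BPOC"), ("DSYN", "QLCO")],
    [("DSYN", "BPOC"), ("DSYN", "TMCO")],
    [("PHSU", "TMCO")],
    [("DSYN", "QLCO")],
]

patterns = frequent_binary_patterns(toy_trees, min_support=2)
print(sorted(patterns.items()))
print(coverage(toy_trees, patterns))
```

With a minimum support of 2, only the two pairs that occur in at least two trees survive, and three of the four toy trees contain one of them.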

Contributions

1. Templates with rich semantics that can be mapped to UMLS and semantically aligned with text;

2. A method for segmenting the UMLS to bootstrap knowledge representation for eligibility criteria;

3. A method combining machine learning and dependency tree pattern mining for iterative, (semi-)automatic knowledge acquisition.

Acknowledgements

• NLM R01 LM009886 (04/01/09 - ) “Bridging the semantic gap between eligibility criteria and clinical data” (PI: Weng)

• Colleagues on the AQUA and EliXR team
  – Eneida Mendonça, MD, PhD
  – Robert Duffy, MS
  – Xiaoying Wu, PhD

• Feedback from Ida Sim, Samson Tu, James J. Cimino, Nigam Shah, GQ Zhang, Albert Lai

Resources for Sharing & Collaboration

1. A UMLS-based semantic lexicon for eligibility criteria

2. A semantic annotator

3. A dynamic semantic classifier for criteria sentences

4. A dependency parser with enriched semantic information

5. A temporal expression ontology for eligibility criteria

6. A tool for temporal expression extraction and encoding

References

1. Johnson SB. Conceptual graph grammar--a simple formalism for sublanguage. Methods of Information in Medicine. 1998 Nov;37(4-5):345-52.

2. Johnson SB. A semantic lexicon for medical language processing. J Am Med Inform Assoc. 1999;6:205-218.

3. Campbell DA, Johnson SB. A transformational-based learner for dependency grammars in discharge summaries. Proc of ACL-02 Workshop on BioNLP. 2002:37-44.

4. Zaki MJ. Efficiently Mining Frequent Trees in a Forest: Algorithms and Applications. IEEE Trans Knowl Data Eng. 2005;17(8):1021-1035.

5. Luo Z, Duffy R, Johnson SB, Weng C. Corpus-based Approach to Creating a Semantic Lexicon for Clinical Research Eligibility Criteria from UMLS. Proc of AMIA Summit on Clinical Research Informatics. 2010:26-31.

6. Luo Z, Johnson SB, Chase HS, Weng C. Semi-automatically Inducing Semantic Classes of Clinical Research Eligibility Criteria Using UMLS and Hierarchical Clustering. Proc of AMIA Symp. 2010:487-91.

7. Weng C, Luo Z. Dynamic Categorization of Eligibility Criteria. Proc of AMIA Fall Symp. 2010:1306.