Chunhua Weng, PhD - EliXR

EliXR: An Approach to Eligibility Criteria Extraction and Representation

Chunhua Weng, PhD, Zhihui Luo, PhD, Stephen B. Johnson, PhD
Department of Biomedical Informatics
Columbia University
March 10, 2011


Problem

Free-text clinical research eligibility criteria are not amenable to machine processing. Computational representations (e.g., ontologies) are much needed to support electronic eligibility determination, clinical evidence application, clinical research knowledge management, etc.

Related Work

• Eligibility Rule Grammar and Ontology (ERGO)
• Agreement on Standardized Protocol Inclusion Requirements for Eligibility (ASPIRE)
• Many other prior efforts¹

¹ Weng C, Tu SW, Sim I, Richesson R. Formal Representations of Eligibility Criteria: A Literature Review. Journal of Biomedical Informatics 43 (2010), 451-467.

The Research Gap

• Plethora of representations, no canonical model
• Ontology for human annotation vs. ontology for NLP
• Lacking ontology and NLP symbiosis


Our Research Question

Can we induce templates that can facilitate both representation and extraction from criteria text?

(A template is a world model for eligibility criteria as a semantic network)

From Text to Templates

[Figure: mapping between TEXT and TEMPLATES. Recoverable labels: Concepts, Phrases, Semantic Relationships, Sentence, Phrase Co-occurrence Frequency]

Template development = segmentation of the UMLS Semantic Network for the eligibility criteria domain

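Methods: The EliXR Framework

[Figure: EliXR framework diagram. Recoverable component labels, with superscript numbers as shown on the slide:]

• Criteria Corpus
• Semantic Dependency Parsing⁴
• Semantic Annotation¹
• Dynamic Criteria Categorization²,³
• Semantic Pattern Mining⁵
• Template Induction⁶
• Lexicon Creation¹
• UMLS
• Automatic Template Selection
• Template Filling
• Structured Criteria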

Semantic Annotation Compared with MMTx [Luo, CRI10]

Example: Patients with complications such as serious cardiac, renal and hepatic disorders.

EliXR Annotation: {Patients | Patient or Disabled Group} {with|} {complications | Pathologic Function} {such|} {as|} {serious | Qualitative Concept} {cardiac | Body Part, Organ, or Organ Component} {renal | Body Part, Organ, or Organ Component} {and|} {hepatic | Body Location or Region} {disorders | Disease or Syndrome} {.|.|}

MMTx 2.4C Annotation: {Patients | Patient or Disabled Group} {with complications | Pathologic Function} {such as serious cardiac, renal | Idea or Concept} {and|} {hepatic disorders | Disease or Syndrome}
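The word-by-word style of the EliXR annotation above can be illustrated with a minimal lexicon lookup. This is only a sketch, not the EliXR annotator: the `LEXICON` dictionary, the `annotate` and `render` functions, and the tokenization rule are all assumptions made for illustration, with the semantic types copied from the example on the slide. Unlike the slide, this sketch also emits the comma as a token.

```python
import re

# Hypothetical mini-lexicon (word -> UMLS semantic type), copied from the
# EliXR annotation on the slide. A real lexicon would be far larger.
LEXICON = {
    "patients": "Patient or Disabled Group",
    "complications": "Pathologic Function",
    "serious": "Qualitative Concept",
    "cardiac": "Body Part, Organ, or Organ Component",
    "renal": "Body Part, Organ, or Organ Component",
    "hepatic": "Body Location or Region",
    "disorders": "Disease or Syndrome",
}


def annotate(sentence):
    """Split into word and punctuation tokens and look each one up."""
    tokens = re.findall(r"\w+|[^\w\s]", sentence)
    return [(tok, LEXICON.get(tok.lower(), "")) for tok in tokens]


def render(pairs):
    """Format pairs in the {token | semantic type} style used on the slide."""
    return " ".join(
        "{%s | %s}" % (tok, sem) if sem else "{%s|}" % tok for tok, sem in pairs
    )


pairs = annotate(
    "Patients with complications such as serious cardiac, renal and hepatic disorders."
)
print(render(pairs))
```

Because every word is looked up on its own, "cardiac" and "renal" each receive a body-part type instead of being absorbed into a longer phrase, which is the contrast with the MMTx output that the slide draws.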

The 27 Eligibility Criteria Categories [Luo, AMIA10]


Dependency Parsing for "at least 1 week since discontinuation of prior pulmonary hypertension medication"


Frequent Semantic Patterns

Groups                         Semantic patterns in each group
Disease Criteria               60
Lab Results Criteria           36
Cancer Criteria                28
Medication Criteria            23
Therapy or Surgery Criteria    23
Temporal Expression            9
Total Unique Patterns          175

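Aggregated Patterns for Disease Criteria

[Figure: semantic network of aggregated patterns. Node labels (UMLS semantic type abbreviations): LBPR | DIAP; CLAS | FTCN | QLCO | QNCO | CLNA; BLOR | BPOC; DSYN | NEOP; VIRS; PATF | SOSY | FNDG; TMCO; PODG; ORGA; TOPP | PHSU | CLDG. Relation labels: Manifestation Of, Causes, Occurs in]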

Disease Criteria Template

[Figure: disease criteria template. Node labels: Diagnostic Procedure, Modifier, Body Location, Disease, Etiology, Manifestation, Temporal Constraints, Population Group, History, Treatment (Therapy or Drug). Relation labels: Manifestation of, Causes, Occurs in]

Micro Templates for Temporal Expressions

[Figure: class diagram centered on Temporal Expression. Legend: Class, attribute, Has a. Related labels: intrinsic duration, event, reference interval, intrinsic temporal pattern, temporal relationship, cycle temporal pattern, anchor event, frequency. Cardinalities shown on the links: 0:m (most links) and 1:m (one link)]
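One way to picture such a micro template is as a record with one multi-valued slot per label in the diagram. The sketch below is an assumption-laden illustration, not the EliXR temporal ontology: the class name, field names, and the `filled_slots` helper are hypothetical, every slot is modeled as a list because the diagram shows "many" cardinalities, and the slot assignments in the example are a hand-filled guess for the criterion used on the dependency parsing slide.

```python
from dataclasses import dataclass, field, asdict
from typing import Dict, List


@dataclass
class TemporalExpression:
    """Slots named after the diagram labels; each may hold several values."""
    intrinsic_duration: List[str] = field(default_factory=list)
    event: List[str] = field(default_factory=list)
    reference_interval: List[str] = field(default_factory=list)
    intrinsic_temporal_pattern: List[str] = field(default_factory=list)
    temporal_relationship: List[str] = field(default_factory=list)
    cycle_temporal_pattern: List[str] = field(default_factory=list)
    anchor_event: List[str] = field(default_factory=list)
    frequency: List[str] = field(default_factory=list)

    def filled_slots(self) -> Dict[str, List[str]]:
        """Return only the slots that have at least one value."""
        return {name: vals for name, vals in asdict(self).items() if vals}


# Hand-filled guess (an assumption, not output of any EliXR tool) for:
# "at least 1 week since discontinuation of prior pulmonary hypertension medication"
example = TemporalExpression(
    intrinsic_duration=["at least 1 week"],
    temporal_relationship=["since"],
    anchor_event=["discontinuation of prior pulmonary hypertension medication"],
)
print(example.filled_slots())
```

Keeping every slot optional and multi-valued mirrors the 0:m links in the figure; the single 1:m link could be enforced with a validation step once it is clear which slot it applies to.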

AQUA Parsing Accuracy

• Only 900 criteria sentences were used for training
• A human review served as the gold standard

Five Test Sets (100 criteria each)    1         2         3         4         5         Avg.
Tree Structure Correctness            90.60%    94.30%    92.80%    93.00%    93.00%    92.70%
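As a quick arithmetic check on the reported average, the five per-set scores can be averaged directly. The variable names are illustrative; the numbers are the ones on the slide.

```python
# Tree structure correctness (%) for the five test sets, as reported.
scores = [90.60, 94.30, 92.80, 93.00, 93.00]

mean = sum(scores) / len(scores)
print(round(mean, 1))  # 92.7
```

The unrounded mean is about 92.74%, which agrees with the reported 92.70% to one decimal place.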

Evaluation of Semantic Patterns

Semantic Type Patterns

Min. support                                      2
Frequent sub-trees: total patterns                1825
Frequent sub-trees: binary patterns               669
Frequent sub-trees: coverage                      91.30%
Maximal frequent sub-trees: total patterns        175
Maximal frequent sub-trees: coverage              81.30%
Min. pattern cover: total number                  183
Min. pattern cover: overlap with patterns         90

Semantic Group Patterns

Min. support                                      2
Frequent sub-trees: total patterns                2378
Frequent sub-trees: binary patterns               120
Frequent sub-trees: coverage                      92.50%
Maximal frequent sub-trees: total patterns        39
Maximal frequent sub-trees: coverage              90.60%
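The role of minimum support can be illustrated with the simplest case, binary patterns (a single head-dependent edge between two semantic types). The sketch below is a deliberately simplified stand-in and not the frequent sub-tree mining used in the talk: the function names, the toy trees, and the definition of coverage (share of trees containing at least one frequent pattern) are all assumptions for illustration.

```python
from collections import Counter


def frequent_binary_patterns(trees, min_support=2):
    """Count each (head type, dependent type) edge once per tree and
    keep the edges that occur in at least min_support trees."""
    support = Counter()
    for edges in trees:
        support.update(set(edges))
    return {pattern: n for pattern, n in support.items() if n >= min_support}


def coverage(trees, patterns):
    """Assumed definition: fraction of trees with at least one frequent pattern."""
    covered = sum(1 for edges in trees if any(e in patterns for e in edges))
    return covered / len(trees)


# Toy dependency trees (hypothetical), each given as a list of edges
# between UMLS semantic type abbreviations.
trees = [
    [("DSYN", "BPOC"), ("DSYN", "TMCO")],
    [("DSYN", "BPOC"), ("PHSU", "TMCO")],
    [("NEOP", "BLOR")],
]

patterns = frequent_binary_patterns(trees, min_support=2)
print(patterns)
print(coverage(trees, patterns))
```

With a minimum support of 2, only the edge shared by the first two toy trees survives, and the third tree is left uncovered. Larger sub-tree patterns would require a dedicated tree-mining algorithm rather than this edge count.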

Contributions

1. Templates with rich semantics that can be mapped to UMLS and semantically aligned with text;
2. A method for segmenting UMLS for bootstrapping knowledge representation for eligibility criteria;
3. A method combining machine learning and dependency tree pattern mining for iterative, (semi)automatic knowledge acquisition.


Acknowledgements

• NLM R01 LM009886 (04/01/09 - ), Bridging the semantic gap between eligibility criteria and clinical data (PI: Weng)
• Colleagues on the AQUA and EliXR team: Eneida Mendoca, MD, PhD; Robert Duffy, MS; Xiaoying Wu, PhD
• Feedback from Ida Sim, Samson Tu, James J. Cimino, Nigam Shah, GQ Zhang, Albert Lai


Resources for Sharing & Collaboration

1. A UMLS-based semantic lexicon for eligibility criteria
2. A semantic annotator
3. A dynamic semantic classifier for criteria sentences
4. A dependency parser with enriched semantic information
5. A temporal expression ontology for eligibility criteria
6. A tool for temporal expression extraction and encoding


References

1. Johnson SB. Conceptual graph grammar--a simple formalism for sublanguage. Methods of Information in Med. 1998 Nov;37(4-5):345-52.
2. Johnson SB. A semantic lexicon for medical language processing. J Am Med Inform Assoc, 6:205-218, 1999.
3. Campbell DA, Johnson SB. A transformational-based learner for dependency grammars in discharge summaries. Proc. of ACL-02 workshop on BioNLP, 37-44, 2002.
4. Zaki MJ. Efficiently Mining Frequent Trees in a Forest: Algorithms and Applications. IEEE Trans. Knowl. Data Eng., 17(8):1021-1035, 2005.
5. Luo Z, Duffy R, Johnson SB, Weng C. Corpus-based Approach to Creating a Semantic Lexicon for Clinical Research Eligibility Criteria from UMLS. Proc of AMIA Summit on Clinical Research Informatics. 2010: 26-31.
6. Luo Z, Johnson SB, Chase HS, Weng C. Semi-automatically Inducing Semantic Classes of Clinical Research Eligibility Criteria Using UMLS and Hierarchical Clustering. Proc of AMIA Symp 2010, 487-91.
7. Weng C, Luo Z. Dynamic Categorization of Eligibility Criteria. Proc of AMIA Fall Symp 2010, 1306.