Introducing ODIE NCBO Seminar Series February 18, 2009


Page 1: Introducing ODIE

Introducing ODIE

NCBO Seminar Series

February 18, 2009

Page 2: Introducing ODIE

Example

Page 3: Introducing ODIE

IE using ontologies

Diagnosis: Malignant Melanoma
Breslow Depth: 0.72 mm
Lateral Margin: Positive
Regression: Probable
Ulceration: Negative
TIL: Focally Brisk

Page 4: Introducing ODIE

OE using documents

punch biopsy
junctional component
pagetoid spread
dermal melanocytes
Breslow depth
lymphocytic infiltrates
regression
microscopic satellites
vascular invasion
tumor infiltrating lymphocytes
Spitz nevus
epithelioid nevus

Page 5: Introducing ODIE

Two Tasks ~ One problem

Ontology

Text

Ontology Enrichment (Specific Aims 2, 3, 4): Uses concepts as source of concepts and relationships to enrich and validate ontology

Information Extraction (Specific Aims 1, 3, 5): Uses concepts as source of concepts and relationships to enrich and validate ontology

Page 6: Introducing ODIE

Specific Aims

Specific Aim 1: Develop and evaluate methods for information extraction (IE) tasks using existing OBO ontologies, including:

Named Entity Recognition (NER)

Co-reference Resolution (CR)

Discourse Reasoning (DR)

Attribute Value Extraction (AVE)

Specific Aim 2: Develop and evaluate general methods for clinical-text mining to assist in ontology development, including:

Concept Discovery (CD)

Concept Clustering (CC)

Taxonomic Positioning (TP)

Specific Aim 3: Develop reusable software for performing information extraction and ontology development leveraging existing NCBO tools and compatible with NCBO architecture.

Specific Aim 4: Enhance National Cancer Institute Thesaurus Ontology using the ODIE toolkit.

Specific Aim 5: Test the ability of the resulting software and ontologies to address important translational research questions in hematologic cancers.

Page 7: Introducing ODIE

Ontology Enrichment

• Machine assisted

- Extraction
- Filtering and Organization
- Visualization
- Suggestions

• Human decision-maker (developer, curator)

• Feedback and improvement of OE

Page 8: Introducing ODIE

Project Organization

Concept Discovery
Kaihong Liu, Rebecca Crowley, Wendy Chapman, Kevin Mitchell
Study and compare methods for ontology enrichment; design methods for evaluation

Coreference Resolution
Wendy Chapman, Guergana Savova, Melissa Castine
Develop annotation scheme; create Reference Standard; consider and test existing algorithms; design, implement & test new algorithms

ODIE 0.5
Rebecca Crowley, Kevin Mitchell, Girish Chavan, Eugene Tseytlin
Develop and implement architecture and UI; create framework for using results of research; implement work of research groups

Page 9: Introducing ODIE

Domain

Will attempt to develop general tools whenever possible

• Priorities for evaluation of components:

Radiology and pathology reports

NCIT as well as clinically relevant OBO ontologies (e.g. RadLex, FMA)

Cancer domains (including hematologic oncology)

Page 10: Introducing ODIE

Progress

• ODIE 0.5 pre-release on NCBO SourceForge

• Annotation software and document sets

• Res Proj #1: LSP annotation project

• Res Proj #2: Coreference resolution annotation

• Starting Res Proj #3: Discourse Reasoning

Page 11: Introducing ODIE

ODIE Software

• Toolkit for developers of NLP applications and ontologies

• Pre-released on NCBO SourceForge as ODIE 0.5

• Current release focuses on NER and CD

• Support interaction and experimentation

• Package systems at the conclusion of working with ODIE

• Foster cycle of enrichment and extraction needed to advance development of NLP systems

• Ontology enrichment as opposed to de novo development

• Human-machine collaboration as opposed to fully automated learning

Page 12: Introducing ODIE

ODIE Download/Info

ODIE Installer: http://caties.cabig.upmc.edu/ODIE/odieinstaller.exe

GForge Site: https://bmir-gforge.stanford.edu/gf/project/odie/

User Forums: https://bmir-gforge.stanford.edu/gf/project/odie/forum/

ODIE on NCBO Tools Page: http://bioontology.org/tools/ODIE.html

Page 13: Introducing ODIE

Users/Workflow

ODIE is intended for:

• users who want to use NCBO ontologies to perform various NLP tasks (+/- may need to add concepts locally to achieve sufficient performance)

• users who want to enrich ontologies using concepts derived from documents (very early in process of ontology development)

Page 14: Introducing ODIE

Plans for ODIE 1.0

Ability to import additional ontologies from BioPortal or from OWL files

Ability to export proposal/enriched ontologies.

Ability to add and configure new processing resources (UIMA or GATE based)

Ability to build processing pipelines using processing resources

Will come out of the box with a processing pipeline and processing resources for NER, CD and COREF.
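As a rough illustration of what "building a processing pipeline from processing resources" could look like, the sketch below chains simple callables over a document. It does not use the UIMA or GATE APIs and is not ODIE's implementation; `build_pipeline`, `tokenize`, and `dictionary_ner` are hypothetical names chosen for this example, and the dictionary lookup is a deliberately naive stand-in for NER.

```python
from typing import Callable, Dict, List, Set

# A document is modelled as a plain dict; each processing resource takes a
# document and returns an enriched copy. Both choices are assumptions made
# for this sketch.
Document = Dict[str, object]
ProcessingResource = Callable[[Document], Document]


def build_pipeline(resources: List[ProcessingResource]) -> ProcessingResource:
    """Compose processing resources so they run in the given order."""
    def run(doc: Document) -> Document:
        for resource in resources:
            doc = resource(doc)
        return doc
    return run


def tokenize(doc: Document) -> Document:
    """Hypothetical resource: whitespace tokenization."""
    return {**doc, "tokens": str(doc["text"]).split()}


def dictionary_ner(vocabulary: Set[str]) -> ProcessingResource:
    """Hypothetical resource: mark tokens found in a set of concept names."""
    def annotate(doc: Document) -> Document:
        hits = [t for t in doc["tokens"] if t.lower() in vocabulary]
        return {**doc, "concepts": hits}
    return annotate


pipeline = build_pipeline([tokenize, dictionary_ner({"regression", "ulceration"})])
result = pipeline({"text": "Regression probable , ulceration negative"})
print(result["concepts"])
```

Keeping each resource a plain function of the document makes it easy to swap one resource for another or reorder them, which is the kind of experimentation the plans above describe.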

Page 15: Introducing ODIE

Research Project 1: Ontology Enrichment

Nearly completed survey of lexical, statistical and hybrid methods for ontology enrichment

Methodology to study “utility” of various approaches (Liu, PhD Thesis in progress)

First project underway involves the simplest of the methods to be studied – Lexicosyntactic Patterns (LSP) – regular expressions over part-of-speech (POS) tags

Concept Discovery

Kaihong Liu, Rebecca Crowley, Wendy Chapman, Kevin Mitchell

Study and compare methods for ontology enrichment; design methods for evaluation
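A minimal sketch of a lexico-syntactic pattern written as a regular expression over POS-tagged tokens. The tags are hand-assigned in a Penn Treebank style purely for illustration (a real system would obtain them from a tagger), the noun-phrase definition is a simplification, and the names `NP`, `SUCH_AS`, and `find_such_as` are hypothetical rather than taken from ODIE.

```python
import re

# Hand-assigned POS tags (an assumption for this example).
TAGGED = [
    ("COMPATIBLE", "JJ"), ("WITH", "IN"),
    ("BENIGN", "JJ"), ("ECCRINE", "JJ"), ("NEOPLASIA", "NN"),
    (",", ","),
    ("SUCH", "JJ"), ("AS", "IN"),
    ("NODULAR", "JJ"), ("HIDROADENOMA", "NN"),
]

# Simplified noun phrase: any run of adjectives/nouns ending in a noun.
NP = r"(?:\S+/(?:JJ|NN) )*\S+/NN "
# "NP such as NP", with an optional comma before the cue.
SUCH_AS = re.compile(rf"(?<!\S)({NP})(?:,/, )?such/\S+ as/\S+ ({NP})", re.IGNORECASE)


def encode(tagged):
    """Render tokens as 'word/TAG ' so a regex can see words and tags together."""
    return "".join(f"{word}/{tag} " for word, tag in tagged)


def strip_tags(chunk):
    """Drop the '/TAG' suffix from every token in a matched chunk."""
    return " ".join(token.rsplit("/", 1)[0] for token in chunk.split())


def find_such_as(tagged):
    """Return (broader phrase, narrower phrase) for the first 'such as' match."""
    match = SUCH_AS.search(encode(tagged))
    if match is None:
        return None
    return strip_tags(match.group(1)), strip_tags(match.group(2))


print(find_such_as(TAGGED))
```

Matching on tags lets the pattern stop at the noun-phrase boundary, so "COMPATIBLE WITH" is left out of the first phrase without any hand-written word list.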

Page 16: Introducing ODIE

LSP Patterns

The presence of certain “lexico-syntactic patterns” can indicate a particular semantic relationship between two nouns

Example:

DIFFERENTIAL DIAGNOSIS INCLUDES, BUT IS NOT LIMITED TO, SPINDLE CELL NEOPLASM OF PERINEURIAL ORIGIN (SUCH AS SCHWANNOMA) AND SPINDLE CELL MALIGNANT MELANOMA

“such as” indicates a hyponym relationship between two noun phrases

Page 17: Introducing ODIE

Technique 1 - LSP

PRURIGO NODULE (aka LICHEN SIMPLEX CHRONICUS)

COMPATIBLE WITH BENIGN ECCRINE NEOPLASIA, SUCH AS NODULAR HIDROADENOMA
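The two sentences above can be split around their pattern cue with a few lines of code. The sketch below works on surface text only and simply returns the text before and after the cue, leaving it to a human to decide which phrases are meaningful; the cue list is a subset of the patterns named in these slides, and `split_on_lsp` is a hypothetical helper, not part of ODIE.

```python
import re

# Subset of cue phrases from the slides; extend as needed.
CUES = ["such as", "aka", "including"]
CUE_RE = re.compile(
    r"[\s,(]+(" + "|".join(re.escape(cue) for cue in CUES) + r")\s+",
    re.IGNORECASE,
)


def split_on_lsp(sentence):
    """Return (text before cue, cue, text after cue), or None if no cue is found."""
    match = CUE_RE.search(sentence)
    if match is None:
        return None
    before = sentence[:match.start()].strip(" ,()")
    after = sentence[match.end():].strip(" ,().")
    return before, match.group(1).lower(), after


for sentence in (
    "PRURIGO NODULE (aka LICHEN SIMPLEX CHRONICUS)",
    "COMPATIBLE WITH BENIGN ECCRINE NEOPLASIA, SUCH AS NODULAR HIDROADENOMA",
):
    print(split_on_lsp(sentence))
```

Note that the text before the cue in the second sentence still contains "COMPATIBLE WITH"; trimming it down to a candidate term needs either POS information or a human annotator.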

Page 18: Introducing ODIE

LSP distribution result

Number of sentences containing lexico-syntactic patterns

Pathology Corpus: 852764 reports, 16157608 sentences
Radiology Corpus: 209997 reports, 4057228 sentences

                     Pathology Corpus                      Radiology Corpus
Patterns             # Sentences   Unique # of sentences   # Sentences   Unique # of sentences
NP especially NP     14            11                      19            10
NP also called NP    48            37                      29            22
NP such as NP        98            95                      906           251
NP's NP              202           45                      5             2
NP in NP             4851          1689                    106           47
NP aka NP            5396          460                     2             2
NP including NP      6291          4952                    1403          747
NP other NP          6940          2251                    10622         1407
NP like NP           7649          2267                    410           235
NP, NP               8211          5351                    7385          3889
NP of NP             14275         4032                    2906          607
NP in the NP         47124         23178                   64044         29285
NP is NP             92374         25024                   7349          2896
NP of the NP         246798        70735                   173016        54895

Page 19: Introducing ODIE

Step 1 - Domain Expert annotation

• Annotation tasks:
1. Meaningful medical phrases (MMP) that can stand alone before LSP and after LSP.
2. The phrases before and after LSP have to be related

• Before LSP • LSP • After LSP

PRURIGO NODULE (aka LICHEN SIMPLEX CHRONICUS)

COMPATIBLE WITH BENIGN ECCRINE NEOPLASIA, SUCH AS NODULAR HIDROADENOMA

Term1                       Term2
PRURIGO NODULE              LICHEN SIMPLEX CHRONICUS
BENIGN ECCRINE NEOPLASIA    NODULAR HIDROADENOMA
…..                         …….

• Calculate: total # of MMP, # of MMP per LSP
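The "# of MMP per LSP" figure can be read as a simple ratio, sketched below under the assumption that it is the total number of meaningful medical phrases divided by the number of LSP instances examined, expressed as a percentage. The function name `mmp_per_lsp` is hypothetical; the inputs are the "such as" counts reported for pathology reports in the first experiment (25 LSP instances, 27 phrases before the LSP, 55 after).

```python
def mmp_per_lsp(total_mmp, num_lsp):
    """Meaningful medical phrases per LSP instance, as a percentage."""
    if num_lsp <= 0:
        raise ValueError("num_lsp must be positive")
    return 100.0 * total_mmp / num_lsp


# Pathology reports, "such as", 25 LSP instances (values from the deck).
print(round(mmp_per_lsp(27, 25)))  # phrases before the LSP
print(round(mmp_per_lsp(55, 25)))  # phrases after the LSP
```

A value above 100% means that, on average, more than one meaningful phrase was annotated on that side of the pattern.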

Page 20: Introducing ODIE

Step 2 - Curator Judgment

For each term:

1. Is the concept in the ontology?

2. If not, should it be added into the ontology?

3. If not, what is the reason?

For each pair of terms:

1. What is the relationship between them?

2. Does this relationship exist in the ontology?

3. If not, should it be added into the ontology?

4. If not, what is the reason?

Term1                       Term2
PRURIGO NODULE              LICHEN SIMPLEX CHRONICUS
BENIGN ECCRINE NEOPLASIA    NODULAR HIDROADENOMA
…..                         …….

New Concept and Relationship Suggestion Rates

New Concept and Relationship Acceptance Rates

Page 21: Introducing ODIE

First experiment result – concept enrichment

MMP = meaningful medical phrase

Radiology Reports

             Preceding the LSP                        Following the LSP
LSP          Total # of MMP   # of MMP / # of LSP     Total # of MMP   # of MMP / # of LSP
such as      17               100%                    31               124%
including    27               159%                    66               264%

Pathology Reports

             Preceding the LSP                            Following the LSP
LSP          Total # of MMP   # of MMP / # of LSP (25)    Total # of MMP   # of MMP / # of LSP (25)
such as      27               108%                        55               220%
including    24               96%                         35               233%
aka          25               100%                        28               112%

Page 22: Introducing ODIE

First experiment result – concept enrichment (NCIT)

Page 23: Introducing ODIE

First experiment – extracted relationships

Page 24: Introducing ODIE

First experiment – extracted relationships

[Figure: bar chart of Percentage (0 to 100) for the LSPs such as, including, and aka. Series: Hyponym relationship is not in the NCIT; Hyponym relationship should be added into the NCIT]

Page 25: Introducing ODIE

First experiment – Concept Enrichment for RadLex

                 # of Terms   Not in RadLex   In RadLex   Blank   Should be added to RadLex   Suggestion rate   Acceptance rate
Preceding LSP    29           11              16          2       10                          38%               91%
Following LSP    68           24              41          3       10                          35%               42%
Total            97           35              57          5       20                          36%               57%
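The slides do not spell out how the two rates are computed. The sketch below assumes suggestion rate = terms not in RadLex / total terms, and acceptance rate = terms that should be added / terms not in RadLex; under those assumed definitions the three rows of the table are reproduced after rounding. The function name `enrichment_rates` is hypothetical.

```python
def enrichment_rates(num_terms, not_in_ontology, should_be_added):
    """Return (suggestion rate, acceptance rate) as whole percentages.

    Assumed definitions (not stated in the slides):
      suggestion rate = not_in_ontology / num_terms
      acceptance rate = should_be_added / not_in_ontology
    """
    if num_terms <= 0 or not_in_ontology <= 0:
        raise ValueError("counts must be positive")
    suggestion = 100.0 * not_in_ontology / num_terms
    acceptance = 100.0 * should_be_added / not_in_ontology
    return round(suggestion), round(acceptance)


# Rows of the RadLex table: (# of terms, not in RadLex, should be added).
rows = {
    "Preceding LSP": (29, 11, 10),
    "Following LSP": (68, 24, 10),
    "Total": (97, 35, 20),
}
for label, counts in rows.items():
    print(label, enrichment_rates(*counts))
```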

Page 26: Introducing ODIE

Research Project 2: Coreference Resolution

Anaphoric relations are relations between linguistic expressions where the interpretation of one linguistic expression (the anaphor) relies on the interpretation of another linguistic expression (the antecedent)

Examples of types of anaphoric relations:

Identity (or coreference)
Set/subset
Part/whole

Anaphora resolution is a computational technique for the discovery of anaphoric relations

Coreference Resolution

Wendy Chapman, Guergana Savova, Melissa Castine

Develop annotation scheme; create Reference Standard, consider and test existing algorithms; design, implement & test new algorithms

Page 27: Introducing ODIE

Definitions

Anaphoric relations are relations between linguistic expressions where the interpretation of one linguistic expression (the anaphor) relies on the interpretation of another linguistic expression (the antecedent)

Types of anaphoric relations:

Identity (or coreference)
Set/subset
Part/whole
Other

Anaphora resolution is a computational technique for the discovery of anaphoric relations
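One way to represent annotated anaphoric relations, and to group identity links into coreference chains, is sketched below. This is an illustrative data structure only, not the project's annotation schema: the class `AnaphoricRelation`, the function `coreference_chains`, and the example mentions are all invented for this sketch. Identity links are merged with a small union-find; the other relation types are stored but not merged.

```python
from collections import defaultdict
from dataclasses import dataclass

RELATION_TYPES = {"identity", "set/subset", "part/whole", "other"}


@dataclass(frozen=True)
class AnaphoricRelation:
    anaphor: str        # expression whose interpretation depends on the antecedent
    antecedent: str     # expression it relies on
    relation_type: str  # one of RELATION_TYPES


def coreference_chains(relations):
    """Group mentions connected by identity relations into sorted chains."""
    parent = {}

    def find(mention):
        parent.setdefault(mention, mention)
        while parent[mention] != mention:
            parent[mention] = parent[parent[mention]]  # path halving
            mention = parent[mention]
        return mention

    for rel in relations:
        if rel.relation_type not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {rel.relation_type}")
        if rel.relation_type == "identity":
            parent[find(rel.anaphor)] = find(rel.antecedent)

    chains = defaultdict(set)
    for mention in list(parent):
        chains[find(mention)].add(mention)
    return sorted((sorted(chain) for chain in chains.values()), key=lambda c: c[0])


# Invented mentions, for illustration only.
relations = [
    AnaphoricRelation("the mass", "a 2 cm mass", "identity"),
    AnaphoricRelation("it", "the mass", "identity"),
    AnaphoricRelation("the upper lobe", "the left lung", "part/whole"),
]
print(coreference_chains(relations))
```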

Page 28: Introducing ODIE

Progress

Completed and Ongoing:

Annotation schema development

Guidelines

Training of annotators

4 training sessions

IAA: after session 1 – in the 40’s

IAA: after session 3 – in the 60’s

Planned:

Complete Reference Standard (RS)

Algorithm testing and further development

Page 29: Introducing ODIE

Data Sets for RS

50 clinical notes (named entities annotated)

50 Pathology (disorders, tumors)

20 Pathology (conditions)

20 Radiology (conditions)

20 Discharge summaries (conditions)

20 ED (conditions)

20 ED (respiratory conditions)

• Mayo

• Pitt

Page 30: Introducing ODIE

QUESTIONS?

Page 31: Introducing ODIE

Visualization of document set

Page 32: Introducing ODIE

NER – viewing concepts

Page 33: Introducing ODIE

Multiple Ontologies

Page 34: Introducing ODIE

OE – Concept Suggestion

Page 35: Introducing ODIE

Ranked Suggestions

Page 36: Introducing ODIE

Adding Proposals