Christopher J. O. Baker, Institute for InfoComm Research, A*STAR, Singapore

"Ontology-centric navigation of the scientific literature"


DESCRIPTION

Bridging Worlds Conference 2008, Singapore. Day Two, Track Three. Speaker 1: Christopher Baker


Page 1: "Ontology-centric navigation of the scientific literature"

Christopher J. O. Baker
Institute for InfoComm Research, A*STAR, Singapore

Ontology-centric Knowledge Navigation
.. of the scientific literature

Page 2: "Ontology-centric navigation of the scientific literature"

Motivation

• Scientists typically need to integrate a spectrum of information to successfully complete a task.
• On average, a scientist or knowledge worker spends 1 day per week searching for, integrating and analyzing information, 50% of which is in unstructured digital formats.
• Access to information structured according to explicit knowledge representations or taxonomies is a fundamental concern of all scientists.
• Moving beyond keyword search requires tools that extend lexical matching to the semantic, conceptual and contextual levels of information. This entails an infrastructure for indexing text segments according to domain-specific metadata.

Page 3: "Ontology-centric navigation of the scientific literature"

In the future ….

• Users will be involved in the design of information systems
• Publishers will charge users for value added search (who will build such search systems?)
• Users will search across semantically integrated data sources and data types (how to facilitate system creation / adoption?)
• Knowledge driven systems - rapidly built and deployed with the engagement of domain experts in a knowledge engineering team

Page 4: "Ontology-centric navigation of the scientific literature"

Literature-driven, Ontology-centric Knowledge Integration and Navigation

[Diagram labels: Text Mining; Ontology Population; Ontology; Reasoning; Visual Query; Content delivery using expressive semantics. Annotations: 500 documents, blogs, newsfeeds to browse; 50 sentences to read]

Page 5: "Ontology-centric navigation of the scientific literature"

W3C Semantic Web Technologies

• URI / LSID
• Ontologies
• Reasoners
• Query Languages
• Web Services
• Service Registries
• Agents
• Multi Agent Systems
• Workflow Engines
• GRID / Semantic GRID
• Text Mining
• Service Oriented Architecture

Page 6: "Ontology-centric navigation of the scientific literature"

Controlled Vocabularies and Ontologies

[Figure: spectrum from controlled vocabularies to ontologies. Labels: Catalog/ID; Terms/Glossary/Controlled vocabularies; Thesauri ("narrower term"); Informal is-a / part-of; Formal is-a / part-of; Formal instance; Frames (properties); Value restrictions; General logical constraints]

Capture knowledge: The meaning of important vocabulary (classes, properties/relations and instance data in a domain model). Common domain terminology

Basis for interoperability between information systems.

Make the content in information sources explicit.

Index and query model to a repository of information.

Page 7: "Ontology-centric navigation of the scientific literature"

Lipid Ontology

Lipid Hierarchy

Concept Definitions

DL Axioms Graph fragment

> Implementation: OWL-DL
> DL Expressivity: ALCHIQ
> Uses LIPIDMAPS systematic nomenclature
> 560 Named classes
> 352 Lipid subclasses
> 71 Object properties (inc. inv.)
> 4 Datatype properties
> Lipid instance: LIPIDMAPS systematic name
> Depth: 8 levels

Domain Knowledge vs information system metadata

Page 8: "Ontology-centric navigation of the scientific literature"

Ontologies Online

Page 9: "Ontology-centric navigation of the scientific literature"

Ontology-centric knowledge architecture

Page 10: "Ontology-centric navigation of the scientific literature"

Ontology-centric Knowledge Integration

• Content Delivery Platform - Automated
  Document delivery from online databases; tools for conversion to text-minable text
• Text Mining - Customized and Automated
  Regular expressions, named entities, relations
• Knowledge Engineering – Ontology Creation
  Domain modeling / customized rapid prototyping
• Ontology Population – Automated Instantiation
  Sentences as instances / co-occurrence and named relations (rules)

Content Acquisition

Domain-specific raw text

Page 11: "Ontology-centric navigation of the scientific literature"

Domain Ontology vs Mixed Metadata: a literature specification

Page 12: "Ontology-centric navigation of the scientific literature"

Ontology Population Workflow

• Ontology based information retrieval - applies NLP to link documents to existing ontologies
• Ontology-driven NLP - NLP that actively uses ontological resources for NLP tasks
• Ontological NLP - ontologies are used as a knowledge base for NLP tasks, and the results of NLP analyses are also exported into an ontology that can then support subsequent semantic queries using description logic reasoners and A-box reasoning
• Ontology based NLP - the results of NLP are exported to another ontology, using external resources for text processing

Witte et al. 2007

Page 13: "Ontology-centric navigation of the scientific literature"

Text Mining

• Class Instance Generation from full text
  – Named entity recognition (gazetteer based)
  – Dictionary-based matching of text tokens to domain-specific vocabularies (LipidBank, Lipidmaps, KEGG, IUPAC), curated Swissprot terms and the disease ontology of CGM
  – Normalization and grounding to canonical names
• Relation Detection - Role Assertions
  – Co-occurrence and rule-based relation detection of binary pairs, from which knowledgebase instances are generated
  – Primary set of binary interactions mined from text: Lipid-Protein, Lipid-Disease, Protein-Disease
  – Domain-specific library of curated biological relations
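The gazetteer matching and grounding steps listed above can be sketched as follows. This is a minimal illustration only: the dictionary entries, the `GAZETTEER` table and the `tag_entities` function are hypothetical and are not taken from the vocabularies or tooling used in the talk.

```python
import re

# Hypothetical gazetteer: lowercase surface form -> (entity class, canonical name).
# Entries are illustrative; the real vocabularies are far larger.
GAZETTEER = {
    "sphingosine 1-phosphate": ("Lipid", "sphingosine 1-phosphate"),
    "s1p": ("Lipid", "sphingosine 1-phosphate"),
    "cholesterol": ("Lipid", "cholesterol"),
    "protein kinase b": ("Protein", "AKT1"),
    "akt": ("Protein", "AKT1"),
    "atherosclerosis": ("Disease", "atherosclerosis"),
}


def tag_entities(sentence):
    """Return the sorted, de-duplicated (class, canonical name) pairs found in a sentence."""
    remaining = sentence.lower()
    found = set()
    # Try longer surface forms first so that multi-word names win over substrings.
    for surface in sorted(GAZETTEER, key=len, reverse=True):
        pattern = r"(?<!\w)" + re.escape(surface) + r"(?!\w)"
        if re.search(pattern, remaining):
            found.add(GAZETTEER[surface])  # grounding to the canonical name
            remaining = re.sub(pattern, " ", remaining)
    return sorted(found)


if __name__ == "__main__":
    print(tag_entities("S1P binds protein kinase B."))
```

Synonyms such as "S1P" and the full name resolve to the same canonical entry, which is the point of the normalization step: downstream instantiation sees one name per entity.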

Page 14: "Ontology-centric navigation of the scientific literature"

Knowledgebase Instantiation

1) Rule based identification of sentences containing target keywords
2) Instantiation using the JENA API http://jena.sourceforge.net/

Target keywords found in sentences are instantiated to the corresponding ontology class:

• Lipid / Protein / Disease instances are instantiated to the respective ontology classes (as tagged by the gazetteer)
• Binary pairs are instantiated to the respective Object Properties as role assertions
• Sentences are instantiated to the respective Datatype properties

For each lipid identified in a sentence, the corresponding data are instantiated to the ontology from Lipid Data Warehouse records, requiring no further text processing:

• Lipid - LIPIDMAPS Systematic Name and its associated Lipid - IUPAC Name, Lipid - synonyms and Lipid - Database ID
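The three kinds of assertion described above (class instances, object-property role assertions, and sentences as datatype-property values) can be illustrated with plain Python tuples standing in for ontology statements. This sketch does not use the JENA API; the `instantiate` function and the property names `interactsWith` and `occursInSentence` are assumptions made for illustration.

```python
def instantiate(sentence, entities, pairs):
    """Build (subject, predicate, object) assertions for one sentence.

    entities: list of (ontology class, canonical name) as tagged by a gazetteer.
    pairs:    list of (name, object property, name) binary relations.
    All property names used here are hypothetical.
    """
    triples = []
    for cls, name in entities:
        # Class assertion: the named entity becomes an instance of its class.
        triples.append((name, "rdf:type", cls))
        # Datatype property: the sentence text is attached as a literal value.
        triples.append((name, "occursInSentence", sentence))
    for subject, prop, obj in pairs:
        # Role assertion: the binary pair populates an object property.
        triples.append((subject, prop, obj))
    return triples


if __name__ == "__main__":
    sentence = "Ceramide binds to PP2A."
    for triple in instantiate(
        sentence,
        [("Lipid", "ceramide"), ("Protein", "PP2A")],
        [("ceramide", "interactsWith", "PP2A")],
    ):
        print(triple)
```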

Page 15: "Ontology-centric navigation of the scientific literature"

Knowledgebase Instantiation

[Figure labels: Lipid Instance; Lipid Class; Protein Instance]

Rule Based Sentence Processing

<Lipid> AND <Protein> AND LipidProteinInteraction-TriggerWord, e.g. "interact", "bind", "mediate"
<Lipid> AND <Disease> AND LipidDiseaseInteraction-TriggerWord, e.g. "involve", "cause"
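The two rules above can be sketched as a small filter over sentences whose entity classes have already been tagged. The trigger words are the examples quoted on the slide; matching them as word stems, and the names `TRIGGERS` and `fired_rules`, are assumptions of this sketch.

```python
import re

# Trigger words as quoted on the slide, keyed by the pair of entity classes.
# Treating them as stems (so "bind" also matches "binds") is an assumption.
TRIGGERS = {
    ("Lipid", "Protein"): ("interact", "bind", "mediate"),
    ("Lipid", "Disease"): ("involve", "cause"),
}


def fired_rules(sentence, entity_classes):
    """Return the class pairs whose co-occurrence rule fires for this sentence.

    entity_classes is the set of classes tagged in the sentence,
    e.g. {"Lipid", "Protein"}; the tagging step itself is not shown here.
    """
    text = sentence.lower()
    fired = []
    for (left, right), stems in TRIGGERS.items():
        both_present = left in entity_classes and right in entity_classes
        if both_present and any(re.search(r"\b" + stem, text) for stem in stems):
            fired.append((left, right))
    return fired


if __name__ == "__main__":
    print(fired_rules("Ceramide binds to PP2A.", {"Lipid", "Protein"}))
    print(fired_rules("Ceramide and PP2A were measured.", {"Lipid", "Protein"}))
```

A sentence that merely mentions a lipid and a protein together passes the co-occurrence test but is dropped by the trigger-word test, which is how a rule step reduces a set of co-occurrence sentences to a smaller set of interaction sentences.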

Page 16: "Ontology-centric navigation of the scientific literature"

[Diagram labels: User; User input query; Search Engine; Web content or full text papers; NLP tagging; docs tagged with relevant name entities; Ontology instantiation; “Instantiated ontology”; Knowledge Integration and Query; Knowledge Navigation vehicle; Output for end user]

Baker CJ, Kanagasabai R, Ang WT, Veeramani A, Low HS, and Wenk MR. Towards ontology-driven navigation of the lipid bibliosphere. BMC Bioinformatics. 2008;9 Suppl 1:S5.

Papers identified: 262
  141 papers contributed to ontology instantiation
  121 papers with no lipid protein relations

186 lipid names
528 protein names

After normalisation and grounding:
  92 Lipidmaps systematic names
  52 IUPAC names, 412 exact synonyms, 6 broad synonyms, 319 protein names
  Cross link to 59 Lipidbank entries

Sentences:
  Co-occurrence before rules: 1356 sentences
  After rules: 683 interaction sentences

92 Lipidmaps names instantiated to 35 classes (2.6 lipids per class)

Instantiation Time: 22 seconds

Page 17: "Ontology-centric navigation of the scientific literature"

[Diagram labels: User; User input query; Search Engine; Web content or full text papers; NLP tagging; docs tagged with relevant name entities; Ontology instantiation; “Instantiated ontology”; Knowledge Integration and Query; Knowledge Navigation vehicle; Output for end user]

Baker CJ, Kanagasabai R, Ang WT, Veeramani A, Low HS, and Wenk MR. Towards ontology-driven navigation of the lipid bibliosphere. BMC Bioinformatics. 2008;9 Suppl 1:S5.

Page 18: "Ontology-centric navigation of the scientific literature"

Knowlegator

Query Composition Panel

Ontology Content

Results Panel

Query Syntax

Query Engine Dialogue

Concept Properties

Overview

Page 19: "Ontology-centric navigation of the scientific literature"

Domain expert

Informatician

Find documents and sentences describing protein-lipid interactions and the corresponding lipid synonyms.

Complex Query Generation

Page 20: "Ontology-centric navigation of the scientific literature"

Pathway Discovery Algorithm

… paths across any object properties, or across user-defined object properties only, e.g. protein interacts with protein

Finds transitive paths across the graph: between source and target concepts. Can define path length and result size
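A minimal sketch of this kind of path search is given below, assuming the instantiated ontology is available as a list of (subject, property, object) assertions. It uses a plain breadth-first search over simple paths; the talk does not specify the algorithm, so the `find_paths` function, its parameters and the example data are hypothetical.

```python
from collections import deque


def find_paths(edges, source, target, max_length=3, max_results=10, allowed=None):
    """Breadth-first search for simple paths from source to target.

    edges:       iterable of (subject, property, object) assertions.
    max_length:  maximum number of edges in a path.
    max_results: stop after this many paths have been found.
    allowed:     optional set of property names to restrict the search to.
    """
    graph = {}
    for subject, prop, obj in edges:
        if allowed is None or prop in allowed:
            graph.setdefault(subject, []).append((prop, obj))

    results = []
    queue = deque([(source, [])])  # (current node, path of edges so far)
    while queue and len(results) < max_results:
        node, path = queue.popleft()
        if node == target and path:
            results.append(path)
            continue
        if len(path) == max_length:
            continue
        visited = {source} | {obj for _, _, obj in path}
        for prop, obj in graph.get(node, []):
            if obj not in visited:  # keep paths simple (no revisits)
                queue.append((obj, path + [(node, prop, obj)]))
    return results


if __name__ == "__main__":
    # Illustrative assertions only.
    edges = [
        ("proteinA", "interactsWith", "proteinB"),
        ("proteinB", "interactsWith", "proteinC"),
        ("proteinA", "involvedIn", "diseaseD"),
        ("diseaseD", "associatedWith", "proteinC"),
    ]
    for path in find_paths(edges, "proteinA", "proteinC", allowed={"interactsWith"}):
        print(path)
```

Because the search is breadth-first, shorter paths are returned before longer ones, so a small `max_results` keeps the most direct connections.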

Page 21: "Ontology-centric navigation of the scientific literature"

Pathway Knowledge Discovery

... across multiple relations

Results with semantic labelling

Kanagasabai R, Low HS, Ang WT, Wenk MR, Baker CJO. Ontology-centric navigation of pathway information mined from text. Bio-Ontologies SIG: Knowledge in Biology, ISMB, July 2008.

2 concepts or keywords

Page 22: "Ontology-centric navigation of the scientific literature"

Pathway Knowledge Discovery 2

Page 23: "Ontology-centric navigation of the scientific literature"

Navigation of Cancer Pathways

Page 24: "Ontology-centric navigation of the scientific literature"

1 search term (instance or concept) generates a list of natural language questions answerable by the ontology and a direct link to answers

Ang WT, Kanagasabai R, Baker CJ. Knowledge Translation: Computing the query potential of bio-ontologies, Genome Informatics Workshop 2008 Submitted …..
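One simple way to picture question generation from a search term is to fill templates from the object properties whose domain or range matches the term. The slide does not describe the actual method, so the sketch below is purely illustrative; the `SCHEMA` entries, the question templates and the `questions_for` function are all hypothetical.

```python
# Hypothetical schema: (domain class, object property phrase, range class).
SCHEMA = [
    ("Lipid", "interacts with", "Protein"),
    ("Lipid", "is implicated in", "Disease"),
    ("Protein", "is implicated in", "Disease"),
]


def questions_for(concept, schema=SCHEMA):
    """List template questions the schema could answer about a concept."""
    questions = []
    for domain, prop, rng in schema:
        if concept == rng:
            # The concept is the object: ask which subjects relate to it.
            questions.append(f"Which {domain} {prop} {rng}?")
        if concept == domain:
            # The concept is the subject: ask which objects it relates to.
            questions.append(f"{domain} {prop} which {rng}?")
    return questions


if __name__ == "__main__":
    for question in questions_for("Protein"):
        print(question)
```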

Page 25: "Ontology-centric navigation of the scientific literature"

Application Workflow

Page 26: "Ontology-centric navigation of the scientific literature"

Semantic Technologies Architecture

Page 27: "Ontology-centric navigation of the scientific literature"

Knowledge Services: Development

Phase 1, Phase 2

[Diagram labels: Navigation Paradigms; NLP & Text Mining; Semantic Data Integration; Databases; Ontology; Ontology Engineering (Maintenance, Evolution, Quality); Knowledge Worker involved in Discovery; Multi-user involvement; Domain Expert; Semantics Engineer; Ontology Engineer; Text Mining Engineer]

Page 28: "Ontology-centric navigation of the scientific literature"

Annotation Services

Page 29: "Ontology-centric navigation of the scientific literature"

Acknowledgements

Semantic Technology Group
Christopher J. O. Baker
Kanagasabai Rajaraman
Menaka Rajapakse
Anitha Veeramani
Ang Wee Tiong
Alexander Garcia (Alumnus)

Collaborators
Markus R Wenk, NUS
Low Hong-Sang, NUS
Choo Kar Heng, I2R
Shoba Ranganathan, NUS
Suisheng Tan, I2R