36
A Semantic Framework for Biomedical Image Discovery Ahmad C. Bukhari , Mate Levente Nagy, Michael Krauthammer , Paolo Ciccarese , Christopher J. O. Baker 1

A semantic framework for biomedical image discovery

Embed Size (px)

Citation preview

Page 1: A semantic framework for biomedical image discovery

A Semantic Framework for Biomedical Image

Discovery

Ahmad C. Bukhari , Mate Levente Nagy, Michael Krauthammer , Paolo Ciccarese ,

Christopher J. O. Baker

1

Page 2: A semantic framework for biomedical image discovery

Background and Motivation

• Rapid development in Biomedical research produces a

continuous stream of new knowledge.

• Efficient literature accessing practices are essential to transfer of

information from the research community to peer investigators

and other healthcare practitioners.

• Images depict key findings of research papers and help

academically to better understand the biological concepts.

Page 3: A semantic framework for biomedical image discovery

Background and Motivation

Protein and DNA sequence Images: Provide exact specification of the

composition of a biological entity.

Pathway diagrams:

Signaling flow / interacting proteins.

Page 4: A semantic framework for biomedical image discovery

Background and Motivation

MRI Scans - locations of specific brain

activity.

Gel Imaging - information about DNA and

protein manipulation

Ultra

soun

d

X-

Rays

Gra

phs

Page 5: A semantic framework for biomedical image discovery

Background and Motivation

• Making biomedical image content explicit is essential with regards to making

medical decisions such as:

5

Diagnosis, Treatment and Follow-up

Data management and the secondary use for biomedical research

Assessment of care delivery.

• However, the issues associated with knowledge management and

utility operations unique to image data are only recently gaining

recognition.

Page 6: A semantic framework for biomedical image discovery

Background and Motivation

6

In our previous work, we have developed

Page 7: A semantic framework for biomedical image discovery

Yale Image Finder

• Yale Image Finder (YIF) is one of the most widely accessed biomedical image

search engines.

• It retrieves biomedical images and associated data based on queries made

over the metadata of the images.

• YIF also searches within the image using a sophisticated image segmentation

method followed by OCR

• YIF repository currently holds over two million biomedical images and

associated metadata in its index.

7

Page 8: A semantic framework for biomedical image discovery

Challenges

• Searching for images of a certain type is error prone as images are still opaque to

information retrieval and knowledge extraction engines.

• In the Life Sciences, spreadsheets, databases and XML files continue to be the

conventional formats used to store experimental data, e.g. Biota , DrugBank and Open

Microscopy Environment (OME)

• The fact that data exists only in these legacy formats frequently impedes data integration

and significantly impedes scientific knowledge discovery. Interoperability and Reusability

Data integration

Image Provenance (Orphan Data)

Semantic Search is not possible

Page 9: A semantic framework for biomedical image discovery

Proposed Solution

• To overcome these issues and to accelerate the adoption of the YIF for next

generation biomedical applications, we have developed a publically

accessible semantic API for biomedical images with multiple modalities called

• iCyrus is powered by a dedicated semantic architecture that exposes the YIF

content as linked Image data

9

Page 10: A semantic framework for biomedical image discovery

Proposed Solution

• iCyrus permits integration with related information resources and consume by

linked data-aware data services.

• To facilitate the adhoc integration of image data with other online data

resources, we also built semantic web services for iCyrus, such that it is

compatible with the SADI framework.

• We have extended iCyrus functionalities further through the incorporation of

Domeo.

• The iCyrus triplestore currently holds more than thirty-five million triples and

can be accessed and operated through syntactic or semantic query interfaces.

10

Page 11: A semantic framework for biomedical image discovery

iCyrus Process diagram

11

Page 12: A semantic framework for biomedical image discovery

Stage 1

• At stage 1, Image datasets are acquired from Yale

Image Finder repository to build a knowledgebase for

iCyrus API

• Establishes connections with YIF and PubMed

concurrently to crosscheck the image metadata

• Resolve Image redundancy and completes the

metadata information

• Stored the clean image data in mysql as parallel

storage with triplestore12

Page 13: A semantic framework for biomedical image discovery

Stage 2

• The foremost task in semantic data publication is defining

appropriate semantic vocabularies.

• Reusability is considered a noble practice in semantic web

application development and it is generally accepted that

well-known semantic vocabularies

• To identify appropriate semantic mappings between

available ontologies and the YIF metadata, we created a

Java program that suggests possible mappings.

13

Page 14: A semantic framework for biomedical image discovery

Stage 2 (Continue)

A cursory evaluation of the derived mappings showed there were

three types of results;

① Mappings that fully met our requirements which suggested

predicates such as hasPubMedID and hasPMCID in the FRBR-

aligned Bibliographic Ontology (fbio)

② Mappings that were insufficiently defined, like the imageFeature

property that exists in DICOM Ontology

③ Mappings with hosted resources that did not appear trustworthy.

To manage the new vocabularies that fulfill the requirements

of iCyrus and SEBI

We developed BIM (Biomedical Image) ontology and set up an ontology-

14

Page 15: A semantic framework for biomedical image discovery

BIM Ontology

◼ Biomedical Image Ontology provides the semantic vocabularies to all modules ofSEBI.

◼ BIM vocabularies can be categorised into four types:

Annotation vocabularies hasImageAnnotationSet, hasSequenceType

Provenance Vocabularies hasAnnVerfiBy, hasCreatedBy

Features and function vocabularies hasSequenceMotif, hasConservedResidue

Semantic service vocabularies SADI services input and output

◼ It maintains the provenance on semantic annotations.

Page 16: A semantic framework for biomedical image discovery

BIM Model of automatic sequence annotation by a web

service

Page 17: A semantic framework for biomedical image discovery

17

BIM

cro

wd

-sourc

e M

odelin

g o

f a b

iom

edic

al

Image

Page 18: A semantic framework for biomedical image discovery

18

BIM

Modelin

g for

Image a

ssocia

ted text

Page 19: A semantic framework for biomedical image discovery

UNB-VPS (Semantic Vocabularies Publishing Server)

http://cbakerlab.unbsj.ca/unbvps

UNB-VPS is vocabulary publishing server deployed at UNB SJ campus and are used to publish the semantic vocabularies.

Page 20: A semantic framework for biomedical image discovery

The BIM Ontology

Page 21: A semantic framework for biomedical image discovery

Stage 2 (Continue)

• we developed a customizable domain-dependent schema mapper

for the curation of initial mappings based on our MySQL stored

data.

• With the help of the Jena Model API and domain-dependent

schema mapper, we RDFized the image data.

• At last stage, we stored the RDFized data into Sesame Triple

store

21

Page 22: A semantic framework for biomedical image discovery

Stage 3

• To expose the linked open data, we configured a

Sesame triple store and deployed SNORQL, an

AJAX front-end for exploring RDF SPARQL

endpoints

• SNORQL permits users to view and export data in

XML, XHTML and JSON - a javascript object

notation, a popular format among web developers.

• To facilitate end user navigation for technical users

through iCyrus linked data, we deployed Pubby , a

Linked Data interface for local and remote SPARQL

protocol servers. 22

Page 23: A semantic framework for biomedical image discovery

23

iCyrus SNORQL endpoint

Page 24: A semantic framework for biomedical image discovery

24

iCyrus Linked data explorer

Page 25: A semantic framework for biomedical image discovery

Stage 4

• To demonstrate iCyrus’ usability as a semantic image

API in general, we developed a number of web services

using the SADI framework to advertise their availability.

• Web services are effective medium for the use of

software functionalities in distributed environments

without deploying the entire application on the client

machine.

• A plethora of Bioinformatics software is available on the

internet, but most of them have their own accessing

criteria and information exchange formats.25

Page 26: A semantic framework for biomedical image discovery

Stage 4 (Continue)

• To get full benefit out of these utilities, output should be available in an integrated and interoperable format.

Available web services protocols - REST, SOAP WSDL etc.

Non Semantic

Schema oriented therefore interoperability and scalability issues

Page 27: A semantic framework for biomedical image discovery

SADI Framework- Stage 4 (Continue)

• The SADI framework is a set of conventions for creating HTTP-based semantic

web services that can be automatically discovered and orchestrated

• SADI services consume RDF document(s) as input and produce RDF

document(s) as output

It solves the interoperability problem.

• SADI framework is designed to achieve semantic interoperability among web

services designed for different purposes.

Page 28: A semantic framework for biomedical image discovery

Use Case (getAlternateGeneName)

SADI Service Modeling

Page 29: A semantic framework for biomedical image discovery

iCyrus with Federated Query Client

• We configured iCyrus’ SADI services registry with the

SHARE federated query engine to illustrate SHARE’s

automatic information discovery feature.

• As an example, we ran the following query: Display

images of the ‘IDS gene’ from all documents along with

their captions and extend the search with alternate

gene names.

Page 30: A semantic framework for biomedical image discovery

iCyrus Plugin for DOMEO

• The Domeo collection of software components provides a rich set of

features that can be further extended through the development of new

software plugins.

• In order to provide the additional information about the scientific contents in

Domeo, we mashed up metadata provided by Yale Image Finder through

the iCyrus SPARQL endpoint.

• We developed a server-side software connector that, given a PubMed

Central document, is able to query the iCyrus SPARQL end-point for all the

metadata related to a document’s images.

Page 31: A semantic framework for biomedical image discovery

31

Domeo EnvironmentiCyrus Images is crowd

Annotated and annotations

Are stored back to triple-

Page 32: A semantic framework for biomedical image discovery

SEBI as an Extension of iCyrus

32

• Image-first knowledge discovery framework to foster the efficient biomedical

literature searching practices and to facilitate the biological image discovery

and reuse.

Working

• SEBI unlocks information associated with and contained in biomedical

sequence images.

• Utilize the information extracted from images to harvest new image

annotations from heterogeneous online biomedical resources. e.g. BLAST,

HMMER

• SEBI incorporates knowledge infrastructure components and services

including image feature extraction, Semantic Web data services, linked open

data

These resources discover and semantically interlink new information in a

SEBI stands for Semantic Enrichment of Biomedical Images

Page 33: A semantic framework for biomedical image discovery

Semantic Sequence Image Enrichment

33

Page 34: A semantic framework for biomedical image discovery

Related Image Finding in SEBI

34

SEBI employs the cosine similarity along with fuzzy rule engine to discover and categorize the related images

Page 35: A semantic framework for biomedical image discovery

Who will get benefit out of this work?

35

Clinician looks for the visual representation of a disease or condition

Researcher searches for studies with certain types of analyses

Students seek for diagrams that elucidate complex processes such as DNA

replication

Professional or educator look for an image for a presentation

Patient wants to better understand his disease.

Page 36: A semantic framework for biomedical image discovery

Any Question?

Scan for project page

36Please Feedback @bukharig8