
Page 1: Presentation forpd bj_1

Cross Search Service for Life Science and Semantic web

National Institute of Biomedical Innovation Maori Ito


Presentation Materials http://l.bitcasa.com/ayav_jSQ

Page 2: Presentation forpd bj_1

Sagace

Search for Biomedical Data & Resources in Japan

Page 3: Presentation forpd bj_1

Features

•  Focus on biomedical databases
•  Semi-automated ranking
•  Refining search results with facets
•  More informative search results with metadata

Page 4: Presentation forpd bj_1

http://integbio.jp/en/

Page 5: Presentation forpd bj_1

Mechanisms of Search Engine

1.  Crawling
2.  Indexing
3.  Query Processing
4.  Scoring
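The four stages above can be sketched as a toy pipeline. This is only an illustration: the crawl is replaced by a hard-coded dictionary of hypothetical pages, and the term-frequency scoring is an assumption, not a description of how Sagace actually ranks results.

```python
from collections import Counter, defaultdict

# Crawling (stand-in): hypothetical pages, URL -> text. A real crawler
# would fetch these from the registered databases.
CRAWLED = {
    "http://example.org/db/a": "protein structure of human kinase",
    "http://example.org/db/b": "drug interaction data for human patients",
    "http://example.org/db/c": "kinase inhibitor drug screening",
}


def tokenize(text):
    """Lower-case the text and split it on whitespace."""
    return text.lower().split()


def build_index(pages):
    """Indexing: map each term to a Counter of {url: term frequency}."""
    index = defaultdict(Counter)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term][url] += 1
    return index


def search(index, query):
    """Query processing and scoring: sum term frequencies per URL."""
    scores = Counter()
    for term in tokenize(query):
        for url, tf in index.get(term, {}).items():
            scores[url] += tf
    # Highest score first; ties are broken by URL for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


INDEX = build_index(CRAWLED)
RESULTS = search(INDEX, "human kinase")
```

Here `RESULTS` lists page `a` first, because it is the only page that contains both query terms.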

Page 6: Presentation forpd bj_1

Crawling

[Figure: a crawling program collects data from the databases]

Page 7: Presentation forpd bj_1

Indexing

•  Split the data into conveniently sized pieces and store them on our own server

Internal Server

Indexing Data

Page 8: Presentation forpd bj_1

Query Processing and Scoring

Page 9: Presentation forpd bj_1

Search System

•  Collaborate by using a P2P architecture
–  NIBIO
–  MEDALS
–  JCGGDB
–  NBDC / DBCLS
–  AgriTogo

Page 10: Presentation forpd bj_1

Log Analysis and Reflect Search Results

•  The top 8 databases are almost always the same:
–  Patents
–  KEGG MEDICUS
–  Medicine and pharmaceutical proceedings
–  Drug emergency call
–  Ingredients information of health food
–  Merck Manual
–  Medical Information Network Distribution Service
–  The Encyclopedia of Psychoactive Drugs

Page 11: Presentation forpd bj_1

Comparison of Databases

•  Popular databases are medical or pharmaceutical databases that are rich in textual (“literal”) information.

•  Top databases run away with the winnings!

•  More than half of the databases have never been clicked!

Page 12: Presentation forpd bj_1

Unpopular databases

•  Sagace started its service in March 2012.
•  Some databases have never been clicked since then.
•  We eliminated these databases.
•  Number of databases:
–  272 DB -> 122 DB
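The pruning step can be sketched as a filter over a click log. The log format, a list of (query, database) pairs, and all database names below are hypothetical; the actual log structure is not given in these slides.

```python
from collections import Counter


def clicked_databases(click_log):
    """Count clicks per database from (query, database) log records."""
    return Counter(db for _query, db in click_log)


def prune(all_databases, click_log):
    """Keep only the databases that appear at least once in the click log."""
    clicks = clicked_databases(click_log)
    return [db for db in all_databases if clicks[db] > 0]


# Hypothetical data; the names are placeholders, not real Sagace targets.
ALL_DATABASES = ["db_patents", "db_drugs", "db_rare", "db_unused"]
CLICK_LOG = [
    ("aspirin", "db_drugs"),
    ("aspirin patent", "db_patents"),
    ("ibuprofen", "db_drugs"),
]

KEPT = prune(ALL_DATABASES, CLICK_LOG)
```

With this sample log, `KEPT` contains only `db_patents` and `db_drugs`; the two databases that were never clicked are dropped from the search targets.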


Page 13: Presentation forpd bj_1

Results

•  Accuracy for users must have improved.
•  Reducing the number of databases also sped up the search.

Page 14: Presentation forpd bj_1

Specific databases in life science

•  Some life science databases lack “literal information” (text).

•  A cross search engine is suited to showing literal information.

•  The semantic web will help these databases.

Page 15: Presentation forpd bj_1

Semantic Web?


Page 16: Presentation forpd bj_1

What is semantic web?

The semantic web is a web of meaningful, machine-understandable data.

Page 17: Presentation forpd bj_1

Web of Document

http://pdbj.org/mine/summary/2yi1

Page 18: Presentation forpd bj_1

Search Engine Results

Query “2yi1 pdbj” searched on Google

A search engine can reflect only text data.

Page 19: Presentation forpd bj_1

Web of Document to Web of Data

[Figure: the page http://pdbj.org/mine/summary/2yi1 shown as a collection of individual data items]

Page 20: Presentation forpd bj_1

How should the computer recognize these data?

Page 21: Presentation forpd bj_1

A. (Focusing on the search service) The database developer marks up the data with metadata.

Page 22: Presentation forpd bj_1

What is metadata?

•  Data about Data

Entry ID: 2YI1
Species: HOMO SAPIENS
Reference: PubMed ID 22343627
See Also: 2YHY, 2YHW
Experimental method: X-RAY DIFFRACTION
Image: http://pdbj.org/pdb_images/2yi1.jpg

[Figure: an entry page annotated with the fields See Also, Keywords, Reference, Species, Experimental method, Entry ID, and Image]

Page 23: Presentation forpd bj_1

Reflect Search Results

•  Metadata helps users encounter databases

[Figure: a search result with the Image field highlighted]

Page 24: Presentation forpd bj_1

How to markup? (microdata)

•  Add metadata with HTML tags

http://schema.org/BiologicalDatabaseEntry/entryID

http://pdbj.org/mine/summary/2yi1 2YI1

<div itemscope="" itemtype="http://schema.org/BiologicalDatabaseEntry">
  <span itemprop="entryID">2YI1</span>
</div>

Declare vocabulary: itemtype
Property (predicate): itemprop="entryID"
Content (object): 2YI1

Page 25: Presentation forpd bj_1

How to reflect?

•  A crawler program can find the metadata easily!

•  Add it to the indexed data

•  Reflect it in the search results

@BiologicalDatabaseEntry_entryID=2YI1

<div itemscope="" itemtype="http://schema.org/BiologicalDatabaseEntry">
  <span itemprop="entryID">2YI1</span>
</div>
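How a crawler might turn this markup into an index key can be sketched with Python's standard `html.parser`. The class name `MicrodataExtractor` is hypothetical, and the key format simply imitates the `@BiologicalDatabaseEntry_entryID=2YI1` line above; this is a minimal sketch, not Sagace's crawler, and it only handles properties whose value is plain text inside the element.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag; the sketch ignores them.
VOID = {"img", "br", "meta", "link", "input", "hr"}


class MicrodataExtractor(HTMLParser):
    """Collect itemprop text values as '@Type_property=value' keys."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of (tag, item type name or None)
        self.prop = None     # itemprop whose text is being collected
        self.text = []
        self.fields = []

    def current_type(self):
        """Return the nearest enclosing item type name, if any."""
        for _tag, type_name in reversed(self.open_tags):
            if type_name:
                return type_name
        return None

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        attrs = dict(attrs)
        type_name = None
        if "itemscope" in attrs and attrs.get("itemtype"):
            # Keep only the last path segment of the itemtype URL.
            type_name = attrs["itemtype"].rstrip("/").rsplit("/", 1)[-1]
        self.open_tags.append((tag, type_name))
        if "itemprop" in attrs:
            self.prop = attrs["itemprop"]
            self.text = []

    def handle_data(self, data):
        if self.prop is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if self.prop is not None:
            value = "".join(self.text).strip()
            self.fields.append(f"@{self.current_type()}_{self.prop}={value}")
            self.prop = None
        if self.open_tags and self.open_tags[-1][0] == tag:
            self.open_tags.pop()


PAGE = """
<div itemscope="" itemtype="http://schema.org/BiologicalDatabaseEntry">
  <span itemprop="entryID">2YI1</span>
</div>
"""

extractor = MicrodataExtractor()
extractor.feed(PAGE)
FIELDS = extractor.fields
```

For the sample page, `FIELDS` holds the single key `@BiologicalDatabaseEntry_entryID=2YI1`, which can then be stored alongside the indexed text.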

Page 26: Presentation forpd bj_1

Machine Understandable Data

•  Declaring the vocabulary is important.

E.g. entryID: book? products? biological? recipe?

Page 27: Presentation forpd bj_1

Machine Understandable Data

•  Declaring the vocabulary is important.

E.g. entryID=2YI1

BiologicalDatabaseEntry!!

<div itemscope="" itemtype="http://schema.org/BiologicalDatabaseEntry">
  <span itemprop="entryID">2YI1</span>
</div>

Page 28: Presentation forpd bj_1

What is schema.org?

•  “Schema.org is a set of extensible schemas that enables webmasters to embed structured data on their web pages for use by search engines and other applications.”
–  (http://schema.org/)

Page 29: Presentation forpd bj_1

It’s not only in Sagace.

•  “Search engines including Bing, Google, Yahoo! and Yandex rely on this markup to improve the display of search results, making it easier for people to find the right web pages.” (http://schema.org/)

Page 30: Presentation forpd bj_1

•  Google supports these content types:
–  Reviews
–  People
–  Products
–  Businesses and organizations
–  Recipes
–  Events
–  Music

Page 31: Presentation forpd bj_1

Current Situation

•  We defined original properties for Biological Database and Biological Database Entry for schema.org
–  entryID, isEntryOf, taxon, seeAlso, reference
–  Schema.org proposal
–  http://www.w3.org/wiki/WebSchemas/BioDatabases

•  Sagace can reflect them in search results.

•  The organizations collaborating on search will also reflect them in search results.
–  NBDC
–  MEDALS (molprof)

•  How to mark up, with search result examples in Sagace
–  http://sagace.nibio.go.jp/press/metadata/markup/

Page 32: Presentation forpd bj_1

Sagace reflects these properties

•  image
•  isEntryOf (Database name)
•  entryID
•  taxon (Species)
•  disease
•  seeAlso (Reference database entry)
•  dateModified (last modified)
•  reference (Reference article)
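Once properties like these are stored with each hit, faceted refinement becomes a matter of counting and filtering on them. The sketch below assumes hits are plain dictionaries keyed by property name; the entry IDs, database names, and values are placeholders, not real Sagace results.

```python
from collections import Counter

# Hypothetical search hits; every value here is a placeholder.
HITS = [
    {"entryID": "E001", "isEntryOf": "Database A", "taxon": "Homo sapiens"},
    {"entryID": "E002", "isEntryOf": "Database A", "taxon": "Mus musculus"},
    {"entryID": "E003", "isEntryOf": "Database B", "taxon": "Homo sapiens"},
    {"entryID": "E004", "isEntryOf": "Database B"},  # no taxon metadata
]


def facet_counts(hits, prop):
    """Count how many hits carry each value of one metadata property."""
    return Counter(hit[prop] for hit in hits if prop in hit)


def refine(hits, prop, value):
    """Keep only the hits whose property equals the chosen facet value."""
    return [hit for hit in hits if hit.get(prop) == value]


TAXON_FACET = facet_counts(HITS, "taxon")
HUMAN_HITS = refine(HITS, "taxon", "Homo sapiens")
```

Hits without a given property, such as `E004` here, are simply left out of that facet, which is one reason marked-up entries are easier for users to reach.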


Page 33: Presentation forpd bj_1

To reflect biological data in the major search engines, the proposal must be added to schema.org.

[Figure: the schema.org proposal for Biological Database and Biological Database Entry, its addition to schema.org, and its reflection in search results]

Page 34: Presentation forpd bj_1

•  To get our proposal added to schema.org, we “need more people who think it is a good idea” (according to the organizers at schema.org).

•  We need more databases!

Page 35: Presentation forpd bj_1

9 DBs have applied microdata!

•  DoBISCUIT (Database Of BIoSynthesis clusters CUrated and InTegrated)
•  JCRB Cell Bank
•  Functional Glycomics with KO mice database
•  Glyco-Disease Genes Database
•  Carbohydrate Interaction Database (Carint)
•  JCGGDB Report
•  MEDALS
•  Integbio Database Catalog
•  Life Science Database Archive

Page 36: Presentation forpd bj_1

Search Results Example 1


Page 37: Presentation forpd bj_1

Search Results Example 2


Page 38: Presentation forpd bj_1

Issues (Cons) for Microdata

•  Microdata strongly recommends using the schema.org vocabulary.

•  Microdata is a W3C working group document, not a Recommendation.

•  If we integrate RDF data, we have to reconsider which vocabularies are suitable.

Page 39: Presentation forpd bj_1

RDFa Lite

•  RDFa Lite is a minimal subset of RDFa, the Resource Description Framework in attributes (http://www.w3.org/TR/rdfa-lite/)
–  Influenced by Microdata
–  W3C Recommendation, 07 June 2012

•  Ability to specify more than one vocabulary (not only schema.org)

•  Easy to mark up

Page 40: Presentation forpd bj_1

How to markup? (RDFa Lite)

•  Add metadata with HTML tags

http://schema.org/BiologicalDatabaseEntry/entryID

http://pdbj.org/mine/summary/2yi1 2YI1

<div vocab="http://schema.org/" typeof="BiologicalDatabaseEntry">
  <span property="entryID">2YI1</span>
</div>

Declare vocabulary: vocab and typeof
Property (predicate): property="entryID"
Content (object): 2YI1

Page 41: Presentation forpd bj_1

If you use PDBo as an extension vocabulary

<div prefix="PDBo: http://rdf.wwpdb.org/schema/pdbx-v40.owl#">
  <span property="PDBo:exptl.method">X-RAY DIFFRACTION</span>
</div>

Declare vocabulary: prefix
Property (predicate): property="PDBo:exptl.method"
Content (object): X-RAY DIFFRACTION
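How a default vocabulary and an extension prefix combine can be sketched as follows. The helper names (`parse_prefixes`, `expand`, `RdfaLiteExtractor`) are hypothetical, and this is a simplified reading of RDFa Lite that handles only text-valued `property` attributes; it is not a conforming RDFa processor.

```python
from html.parser import HTMLParser


def parse_prefixes(value):
    """Turn 'p1: iri1 p2: iri2' into {'p1': 'iri1', 'p2': 'iri2'}."""
    tokens = value.split()
    return {name.rstrip(":"): iri
            for name, iri in zip(tokens[0::2], tokens[1::2])}


def expand(term, vocab, prefixes):
    """Resolve a property to a full IRI via a prefix or the default vocab."""
    if ":" in term:
        prefix, local = term.split(":", 1)
        # An unknown prefix is treated as an already absolute IRI.
        return prefixes[prefix] + local if prefix in prefixes else term
    return (vocab or "") + term


class RdfaLiteExtractor(HTMLParser):
    """Collect (property IRI, text) pairs. Void elements such as <img>
    are not handled in this sketch."""

    def __init__(self):
        super().__init__()
        self.contexts = [(None, {})]  # (vocab, prefixes) per open element
        self.prop = None
        self.text = []
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        vocab, prefixes = self.contexts[-1]
        if attrs.get("vocab"):
            vocab = attrs["vocab"]
        if attrs.get("prefix"):
            prefixes = {**prefixes, **parse_prefixes(attrs["prefix"])}
        self.contexts.append((vocab, prefixes))
        if attrs.get("property"):
            self.prop = expand(attrs["property"], vocab, prefixes)
            self.text = []

    def handle_data(self, data):
        if self.prop is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if self.prop is not None:
            self.pairs.append((self.prop, "".join(self.text).strip()))
            self.prop = None
        if len(self.contexts) > 1:
            self.contexts.pop()


PAGE = """
<div vocab="http://schema.org/" typeof="BiologicalDatabaseEntry"
     prefix="PDBo: http://rdf.wwpdb.org/schema/pdbx-v40.owl#">
  <span property="entryID">2YI1</span>
  <span property="PDBo:exptl.method">X-RAY DIFFRACTION</span>
</div>
"""

extractor = RdfaLiteExtractor()
extractor.feed(PAGE)
PAIRS = extractor.pairs
```

In the sample, `entryID` resolves against the default vocabulary and `PDBo:exptl.method` against the declared prefix, so both vocabularies coexist in one element.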

Page 42: Presentation forpd bj_1

If metadata is added to databases...

•  Search engines can pick up many important data.

•  Database developers can promote their services more effectively.

•  Users can easily find what they are looking for.

Page 43: Presentation forpd bj_1

Current Situation

•  KNApSAcK has applied RDFa Lite.
•  We’d like to reflect more information by using RDFa Lite.
•  If you add metadata to your databases, please contact NBDC or me ([email protected])
•  Please collaborate with us!
•  Please tell me what kind of information is suitable to show and refine.

Page 44: Presentation forpd bj_1

Acknowledgement

•  National Institute of Biomedical Innovation
–  Mizuguchi Kenji
–  Morita Mizuki
–  Igarashi Yoshinobu
–  Sakate Ryuichi
–  Nagao Chioko
–  Chen Yi-an
–  Akiko Fukagawa
–  Tohru Masui
–  Johan Nystrom-Persson

•  This project is supported by a collaboration "Database integration in NIBIO and cooperation with outside organizations" with the NBDC.

•  National Bioscience Database Center (NBDC)

•  National Institute of Agrobiological Sciences database (NIAS)

•  Molecular Profiling Research Center for Drug Discovery (molprof)

•  Japan Consortium for Glycobiology and Glycotechnology DataBase (JCGGDB)

Page 45: Presentation forpd bj_1


Page 46: Presentation forpd bj_1


Web of Data (Concept)

Page 47: Presentation forpd bj_1

[Figure: Web of Data concept. The entry http://pdbj.org/mine/summary/xxxx (entryID xxxx, via http://schema.org/BiologicalDatabaseEntry/entryID) is connected to PDBj, PubMed:xxxxxxx, and Database A (http://databaseA.org/publication) through the properties http://schema.org/BiologicalDatabaseEntry/reference and http://schema.org/BiologicalDatabaseEntry/isEntryOf]