Semantic Textmining Goals and achievements BioHackathon 2010

Textmining activities at BioHackathon 2010


Summary of the activities carried out by the text mining task force during BioHackathon 2010: http://hackathon3.dbcls.jp/wiki/TextMining


Semantic Textmining

Goals and achievements

BioHackathon 2010

Team members

• Hammad
• Matthias
• Venkata
• Heiko
• YAMAMOTO-san
• Alberto

Original Proposal

• Integration of text mining results
  – Reflect, Whatizit, MEDIE
  – Results as triplets
• URI and predicates
  – Implementation with SADI
  – Result presentation using aTag
• Explore relations
• Interfaces

The work done

• Integration of text mining results
  – Reflect, Whatizit, MEDIE
  – Results as triplets
• URI and predicates
  – Future BioPython module and REST service
• Explore relations
  – Sesame endpoint
  – Biogateway
  – ARQ for federated queries
• Interfaces
  – Result presentation using aTag
  – Exhibit faceted interface

http://whatizit.neurocommons.org

RDF schema for TM

<rdf:Description>
  <rdf:type rdf:resource="http://rdfs.org/sioc/ns#Item"/>
  <sioc:about rdf:resource="http://www.ncbi.nlm.nih.gov/pubmed/9002550"/>
  <sioc:content>SBMA</sioc:content>
  <sioc:topic rdf:resource="http://purl.uniprot.org/uniprot/P10275"/>
  <rdfs:seeAlso rdf:resource="http://www.ncbi.nlm.nih.gov/pubmed/9002550"/>
</rdf:Description>
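The record above can also be generated programmatically. The following is a minimal sketch, assuming each text-mining hit reduces to a PubMed ID, the matched text, and a UniProt accession. The function names `annotation_triples` and `to_ntriples`, the blank-node label `_:hit1`, and the choice of N-Triples as output are illustrative assumptions and not part of the task force's code; the `#` in the SIOC namespace is taken from the published SIOC vocabulary.

```python
# Namespaces used by the schema on the slide.
SIOC = "http://rdfs.org/sioc/ns#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def annotation_triples(pmid, text, uniprot_acc, node="_:hit1"):
    """Return (subject, predicate, object) terms for one text-mining hit.

    `node` is a hypothetical blank-node label for the annotation itself.
    """
    article = f"<http://www.ncbi.nlm.nih.gov/pubmed/{pmid}>"
    protein = f"<http://purl.uniprot.org/uniprot/{uniprot_acc}>"
    # Escape the characters that N-Triples string literals require.
    escaped = (
        text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    return [
        (node, f"<{RDF_TYPE}>", f"<{SIOC}Item>"),
        (node, f"<{SIOC}about>", article),
        (node, f"<{SIOC}content>", f'"{escaped}"'),
        (node, f"<{SIOC}topic>", protein),
        (node, f"<{RDFS_SEE_ALSO}>", article),
    ]


def to_ntriples(triples):
    """Serialise already-formatted terms, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(annotation_triples("9002550", "SBMA", "P10275")))
```

Plain string formatting keeps the sketch dependency-free; a real pipeline would more likely rely on an RDF library for serialisation and validation.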

http://reflect.ws

MEDIE and Enju APIs

• MEDIE is an intelligent search engine that retrieves biomedical correlations from MEDLINE, based on indexing by natural language processing and text mining techniques.
• Enju is a syntactic parser for English.

MEDIE XML output: http://www-tsujii.is.s.u-tokyo.ac.jp/medie/dbcls.cgi?pmid=19116711

Enju XML output: http://docman.dbcls.jp/medie/conv?pmid=17551671

http://togows.dbcls.jp/entry/pubmed/pmid.ttl

http://www.uniprot.org/uniprot/P12345.rdf

Workflow

[Workflow diagram. Components: Whatizit, Reflect, MEDIE, TogoWS, UniProt, PubMed. Data formats shown: XML (three times), RDF (three times).]
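The XML-to-RDF step suggested by the workflow can be sketched as follows, assuming the tools' XML output is converted into annotation triples of the kind shown in the RDF schema. The input format here (`annotations` and `entity` elements with `pmid`, `type`, and `id` attributes) is hypothetical and chosen only for illustration; the actual Whatizit, Reflect, and MEDIE outputs use their own schemas. The function name `xml_to_triples` and the blank-node labels are likewise assumptions.

```python
import xml.etree.ElementTree as ET

SIOC = "http://rdfs.org/sioc/ns#"

# Hypothetical annotation XML, NOT the real output of any of the tools.
SAMPLE_XML = """
<annotations pmid="9002550">
  <entity type="uniprot" id="P10275">SBMA</entity>
  <entity type="other" id="X1">ignored</entity>
</annotations>
"""


def xml_to_triples(xml_text):
    """Turn UniProt-tagged entities into (subject, predicate, object) tuples."""
    root = ET.fromstring(xml_text)
    article = f"http://www.ncbi.nlm.nih.gov/pubmed/{root.get('pmid')}"
    triples = []
    for i, entity in enumerate(root.iter("entity"), start=1):
        if entity.get("type") != "uniprot":
            continue  # this sketch only links protein mentions
        node = f"_:hit{i}"  # one blank node per annotation
        topic = f"http://purl.uniprot.org/uniprot/{entity.get('id')}"
        triples.append((node, SIOC + "about", article))
        triples.append((node, SIOC + "content", (entity.text or "").strip()))
        triples.append((node, SIOC + "topic", topic))
    return triples


if __name__ == "__main__":
    for triple in xml_to_triples(SAMPLE_XML):
        print(triple)
```

Each tool would need its own small parser in front of a shared triple-building step like this one, so that the three XML sources end up in a common RDF representation.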

[Diagram of example relations. Labels: Substance A, Interacts with, Receptor B, Region C, axonal projections, brain region D, Region D, aversive stimuli.]

Interlink these entities with taxonomies and ontologies

TMOntology

http://hackathon3.dbcls.jp/wiki/TextMining
