Automatically indexing science using natural-language processing, RDF and SPARQL

Andrew Walkingshaw, Nick Day, Peter Corbett, Jim Downing, Joe Townsend, Peter Murray-Rust

February 16, 2008

Outline

• Gathering data

• Extracting (meta)data

• Using the data

• Thanks

SemanticCamp London, 16th February 2008


My presentation at SemanticCamp London, 16th February 2008

Data sources

• Supplemental and experimental data

• Journals

• Self-archived papers (e.g. arXiv)

• Mainstream journalism

• Blogs

Supplemental data: CrystalEye

• http://wwmm.ch.cam.ac.uk/crystaleye/

• Repository for crystallographic data

Journals and arXiv

• “Traditional” journal articles

• Titles and abstracts...

Journalism and blogs

• Unstructured text with little semantics;

• ... hence Google Scholar, Web of Science, etc.

Semi-structured data: Golem

• We’ve got a lot of chemical data as CML

• http://en.wikipedia.org/wiki/Chemical_Markup_Language

• ... but we still need to get data out of that and into a more useful form

• hence Golem: http://www.lexical.org.uk/science/golem/

• GRDDLish strategy for extracting data from CML files: identify dialect-specific concepts with XPath expressions and XSLT stylesheets

• upshot: we can extract JSON objects from CML files.
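
The idea of mapping named concepts to XPath expressions and emitting JSON can be sketched with the Python standard library alone. This is not Golem's code or API: the concept names, the `ex:` dictionary references and the sample CML fragment are all made up for illustration, and only the CML namespace URI is real.

```python
import json
import xml.etree.ElementTree as ET

CML_NS = {"cml": "http://www.xml-cml.org/schema"}

# Hypothetical mapping from a concept name to the XPath that locates it.
# A real dictionary would define one such mapping per CML dialect.
CONCEPTS = {
    "cell_volume": ".//cml:property[@dictRef='ex:cellVolume']/cml:scalar",
    "temperature": ".//cml:property[@dictRef='ex:temperature']/cml:scalar",
}

# Made-up CML fragment, just enough structure for the XPaths above.
SAMPLE_CML = """<cml xmlns="http://www.xml-cml.org/schema"
     xmlns:ex="http://example.org/dict#">
  <property dictRef="ex:cellVolume">
    <scalar units="units:ang3">1234.5</scalar>
  </property>
  <property dictRef="ex:temperature">
    <scalar units="units:k">150</scalar>
  </property>
</cml>"""


def extract_concepts(cml_text, concepts=CONCEPTS):
    """Return a JSON-ready dict of every concept found in the CML text."""
    root = ET.fromstring(cml_text)
    result = {}
    for name, xpath in concepts.items():
        node = root.find(xpath, CML_NS)
        if node is not None:
            result[name] = {"value": float(node.text),
                            "units": node.get("units")}
    return result


extracted = extract_concepts(SAMPLE_CML)
print(json.dumps(extracted, indent=2, sort_keys=True))
```

Keeping the XPaths in a table rather than in code means a new CML dialect only needs a new table, which is the appeal of the dictionary-driven approach.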

Free text: OSCAR3

• http://oscar3-chem.sourceforge.net/

• Natural-language parser for documents about chemistry

• Dark magic: don’t ask me how it works!

• ... but it can be run as a Jetty web service so as long as it does, I’m happy

• Author’s blog: http://wwmm.ch.cam.ac.uk/blogs/corbett/

Getting the data in

• Everything (more or less) talks RSS nowadays...

• RSS 0.91, RSS 1.0 (which one?), Atom, etc etc etc.

• Thankfully: feedparser (http://feedparser.org/)
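
To show why a library helps here, the sketch below pulls entry titles out of three feed dialects using only the standard library. It is a deliberately tiny stand-in, not how feedparser works internally, and the sample feeds are invented.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
RSS1 = "{http://purl.org/rss/1.0/}"
RDF_ROOT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF"


def entry_titles(feed_xml):
    """Return (dialect, [entry titles]) for RSS 0.9x/2.0, RSS 1.0 or Atom."""
    root = ET.fromstring(feed_xml)
    if root.tag == ATOM + "feed":
        return "atom", [e.findtext(ATOM + "title")
                        for e in root.iter(ATOM + "entry")]
    if root.tag == RDF_ROOT:
        # RSS 1.0 is RDF/XML: items live in the RSS 1.0 namespace.
        return "rss1.0", [i.findtext(RSS1 + "title")
                          for i in root.iter(RSS1 + "item")]
    if root.tag == "rss":
        # RSS 0.9x and 2.0 use no namespace at all.
        return "rss" + root.get("version", ""), [i.findtext("title")
                                                 for i in root.iter("item")]
    raise ValueError("unrecognised feed dialect: " + root.tag)


# Invented sample feeds.
RSS_SAMPLE = """<rss version="0.91"><channel><title>Example journal</title>
<item><title>Crystal structure of X</title></item></channel></rss>"""

ATOM_SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
<title>Example blog</title>
<entry><title>Synthesis of Y</title></entry></feed>"""

print(entry_titles(RSS_SAMPLE))
print(entry_titles(ATOM_SAMPLE))
```

With feedparser itself none of this branching is needed: `feedparser.parse(...)` returns an object whose `entries` list exposes fields such as `title` and `link` in the same way whichever dialect the feed uses.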

Serializing metadata

• RDF – using:

• Dublin Core terms

• A homebrew ontology based on the IUCr’s CIF data format

• and another homebrew ontology for OSCAR annotations

• (it’d be good to standardise these, but to be honest, not many people are doing this sort of thing)
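
As a rough illustration of what this serialized metadata might look like, the sketch below writes a few statements as N-Triples by hand. The Dublin Core and CrystalEye dictionary namespaces are the ones used elsewhere in this talk. The record URI, the `cell_volume` term and the values are hypothetical, and a real pipeline would more likely use an RDF library such as rdflib than string formatting.

```python
DC = "http://purl.org/dc/terms/"
CE = "http://wwmm.ch.cam.ac.uk/crystaleye/dictionary#"


def uri(value):
    """Wrap an absolute URI in N-Triples angle brackets."""
    return "<%s>" % value


def literal(value):
    """Render a value as a plain N-Triples string literal."""
    escaped = (str(value).replace("\\", "\\\\")
                         .replace('"', '\\"')
                         .replace("\n", "\\n"))
    return '"%s"' % escaped


def to_ntriples(subject, properties):
    """Serialize (predicate, value) pairs about one subject."""
    return "\n".join(
        "%s %s %s ." % (uri(subject), uri(predicate), literal(value))
        for predicate, value in properties
    )


record = "http://example.org/data/structure-1"  # hypothetical record URI
triples = to_ntriples(record, [
    (DC + "title", "Crystal structure of an example compound"),
    (DC + "contributor", "A. N. Author"),
    (CE + "cell_volume", 1234.5),  # hypothetical term in the CIF-based ontology
])
print(triples)
```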

The process

• For each feed in a list of feeds:

• If it’s supplying CML data, set Golem on each entry, get the observables out, and turn them into triples; run OSCAR3 over the title and/or abstract

• If it’s not, extract the free text from each entry, send it to the OSCAR web service, and assign triples based on the chemical entities OSCAR finds

• Upload the RDF to your triple store

• (I’m using the Talis platform, so that’s just curl)

• And...
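
The loop described on this slide can be sketched as below. Everything here is a stand-in: `fake_golem` and `fake_oscar` are toy functions in place of the real tools, the feed structure and predicate names are invented, and the upload step is left out because it depends on the triple store.

```python
def index_feeds(feeds, golem_extract, oscar_entities):
    """Turn feed entries into (subject, predicate, object) triples."""
    triples = []
    for feed in feeds:
        for entry in feed["entries"]:
            subject = entry["link"]
            if feed["supplies_cml"]:
                # Structured route: observables from the CML become triples,
                # and the title/abstract still go through text mining.
                for name, value in golem_extract(entry["cml"]).items():
                    triples.append((subject, "ce:" + name, value))
                text = entry.get("title", "") + " " + entry.get("abstract", "")
            else:
                # Unstructured route: only free text is available.
                text = entry.get("text", "")
            for entity in oscar_entities(text):
                triples.append((subject, "oscar:mentions", entity))
    return triples


# Toy stand-ins so the sketch runs without Golem or OSCAR3.
def fake_golem(cml_text):
    return {"cell_volume": 1234.5}


KNOWN_ENTITIES = {"benzene", "ethanol"}


def fake_oscar(text):
    return [word for word in text.lower().split() if word in KNOWN_ENTITIES]


feeds = [
    {"supplies_cml": True, "entries": [
        {"link": "http://example.org/data/1", "cml": "<cml/>",
         "title": "Structure of a benzene solvate"}]},
    {"supplies_cml": False, "entries": [
        {"link": "http://example.org/blog/1", "text": "Why ethanol matters"}]},
]

result = index_feeds(feeds, fake_golem, fake_oscar)
for triple in result:
    print(triple)
```

Passing the extractors in as arguments keeps the loop independent of how Golem and OSCAR are actually invoked, whether in-process or over HTTP.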

SPARQL is great.

Just post queries at a SPARQL endpoint:

authortemplate = '''
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX ce: <http://wwmm.ch.cam.ac.uk/crystaleye/dictionary#>
DESCRIBE ?file WHERE { ?file dc:contributor some author . }
'''
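
A minimal sketch of posting such a query with the standard library follows. It assumes the author is stored as a plain string literal, and the endpoint URL is a placeholder. The request is only built, not sent, so the example runs offline.

```python
import urllib.parse
import urllib.request

# Assumption: dc:contributor values are plain literals, hence the quotes.
AUTHOR_QUERY = """PREFIX dc: <http://purl.org/dc/terms/>
DESCRIBE ?file WHERE { ?file dc:contributor "%s" . }"""


def sparql_literal(text):
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_request(endpoint, query):
    """Build a form-encoded POST carrying the query in a 'query' field."""
    body = urllib.parse.urlencode({"query": query}).encode("ascii")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/rdf+xml",  # DESCRIBE returns an RDF graph
        },
    )


ENDPOINT = "http://example.org/sparql"  # placeholder, not a real endpoint
query = AUTHOR_QUERY % sparql_literal("A. N. Author")
request = build_request(ENDPOINT, query)
# response = urllib.request.urlopen(request)  # uncomment for a real endpoint
print(request.get_method(), request.full_url)
```

Escaping the author name before substituting it into the template matters as soon as the value comes from a web form rather than from trusted code.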

SPARQL isn’t (entirely) great.

• Scientists shouldn’t have to know this stuff.

• So we need to build a front end which your average senior academic might be able to use...

• (i.e. it’s got to look like a website.)

What queries do we want?

• What experimental data is an author responsible for?

• What chemical entities are in some data?

• Where is a given chemical entity talked about?

• So we can build a web app around these queries.

• django + rdflib + sparql + Talis Platform
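
Each of the three questions boils down to a single triple pattern. The sketch below shows the shape of each over a small in-memory list of triples, with a SPARQL equivalent in the comments. The `mentions` predicate and the data are invented for illustration, and a real app would send the SPARQL to the store instead of filtering a list.

```python
DC_CONTRIBUTOR = "http://purl.org/dc/terms/contributor"
MENTIONS = "http://example.org/oscar#mentions"  # hypothetical predicate

# Invented sample data.
TRIPLES = [
    ("http://example.org/data/1", DC_CONTRIBUTOR, "A. N. Author"),
    ("http://example.org/data/1", MENTIONS, "benzene"),
    ("http://example.org/data/2", DC_CONTRIBUTOR, "A. N. Author"),
    ("http://example.org/data/2", MENTIONS, "ethanol"),
    ("http://example.org/blog/1", MENTIONS, "benzene"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def data_by_author(triples, author):
    # SELECT ?file WHERE { ?file dc:contributor "author" }
    return sorted(t[0] for t in match(triples, p=DC_CONTRIBUTOR, o=author))


def entities_in(triples, document):
    # SELECT ?entity WHERE { <document> ex:mentions ?entity }
    return sorted(t[2] for t in match(triples, s=document, p=MENTIONS))


def documents_mentioning(triples, entity):
    # SELECT ?doc WHERE { ?doc ex:mentions "entity" }
    return sorted(t[0] for t in match(triples, p=MENTIONS, o=entity))


print(data_by_author(TRIPLES, "A. N. Author"))
print(entities_in(TRIPLES, "http://example.org/data/2"))
print(documents_mentioning(TRIPLES, "benzene"))
```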

Demo!

And here it is.

Thanks to...

• Talis (http://n2.talis.com/) for access to their platform

• and to the RSC and IUCr for their support of CrystalEye.
