Publishing Linked Data from RDB

Boris Villazón-Terrazas, Oscar Corcho
Facultad de Informática, Universidad Politécnica de Madrid
Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid
http://www.oeg-upm.net
[email protected]
Phone: 34.91.3366605, Fax: 34.91.3524819

Slides available at: http://www.slideshare.net/boricles/

Acknowledgements: Freddy Priyatna, Jan Schulte, Richard Cyganiak and many others that we may have omitted.

Work distributed under the license Creative Commons Attribution-Noncommercial-Share Alike 3.0


Page 1: Publishing Linked Data from RDB

Publishing Linked Data from RDB


Page 2: Publishing Linked Data from RDB


Page 3: Publishing Linked Data from RDB

Specification – RDB about Research Groups


Page 4: Publishing Linked Data from RDB

Specification - URI design

• Base URI

• http://research.ull.es

• TBox URI

• http://research.ull.es/ontology/{class|property}

• ABOX URI

• http://research.ull.es/resource/{resourceType}/{resource}
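A minimal Python sketch of minting URIs that follow the patterns above. The function names and the example class and resource names (ResearchGroup, Person, Jane Doe) are hypothetical, and percent-encoding each path segment is an assumption rather than something the slides prescribe.

```python
from urllib.parse import quote

# Base URI taken from the URI design slide.
BASE = "http://research.ull.es"


def tbox_uri(term: str) -> str:
    """Ontology term URI: {BASE}/ontology/{class|property}."""
    return f"{BASE}/ontology/{quote(term, safe='')}"


def abox_uri(resource_type: str, resource_id) -> str:
    """Instance URI: {BASE}/resource/{resourceType}/{resource}.

    Each segment is percent-encoded (an assumption, not from the slides)
    so that spaces or slashes in database keys cannot break the path.
    """
    return (f"{BASE}/resource/{quote(resource_type, safe='')}/"
            f"{quote(str(resource_id), safe='')}")


if __name__ == "__main__":
    print(tbox_uri("ResearchGroup"))        # hypothetical class name
    print(abox_uri("Person", "Jane Doe"))   # hypothetical instance
```

For example, `abox_uri("Person", "Jane Doe")` yields `http://research.ull.es/resource/Person/Jane%20Doe`. Keeping TBox and ABox URIs under separate paths makes it easy to serve them differently later.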


Page 5: Publishing Linked Data from RDB


Page 6: Publishing Linked Data from RDB

Modelling

Linked Open Vocabularies http://labs.mondeca.com/dataset/lov/

FOAF: Friend of a Friend vocabulary

BIBO: The Bibliographic Ontology

GEOP: FAO Geopolitical Ontology

Page 7: Publishing Linked Data from RDB

Modelling – NeOn Toolkit

http://neon-toolkit.org/

New Project

New Ontology


Page 8: Publishing Linked Data from RDB

Modelling – Creating some elements

Classes

Object Properties

Datatype Properties
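The NeOn Toolkit creates these elements through its GUI. Serialized as Turtle, a class, an object property and a datatype property look roughly like the output of this Python sketch. The terms ResearchGroup, belongsTo and groupName are hypothetical; only the ontology namespace follows the TBox URI pattern, and foaf:Person is reused from FOAF.

```python
# Only the namespace follows the TBox URI pattern from the slides;
# ResearchGroup, belongsTo and groupName are hypothetical terms.
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "foaf": "http://xmlns.com/foaf/0.1/",  # reused vocabulary
    "ont": "http://research.ull.es/ontology/",
}


def owl_declarations() -> str:
    """Return Turtle declaring one class, one object and one datatype property."""
    lines = [f"@prefix {p}: <{ns}> ." for p, ns in PREFIXES.items()]
    lines += [
        "",
        # Class
        "ont:ResearchGroup a owl:Class ;",
        '    rdfs:label "Research group"@en .',
        "",
        # Object property: relates two individuals
        "ont:belongsTo a owl:ObjectProperty ;",
        "    rdfs:domain foaf:Person ;",
        "    rdfs:range ont:ResearchGroup .",
        "",
        # Datatype property: relates an individual to a literal value
        "ont:groupName a owl:DatatypeProperty ;",
        "    rdfs:domain ont:ResearchGroup ;",
        "    rdfs:range xsd:string .",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(owl_declarations())
```

The distinction that matters is the range: object properties point to other resources, datatype properties point to literals.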

Page 9: Publishing Linked Data from RDB


Page 10: Publishing Linked Data from RDB

Transformation – RDB2RDF

• A majority of dynamic Web content is backed by relational databases (RDB), and so are many enterprise systems.

• W3C RDB2RDF Working Group [1]
  • R2RML: RDB to RDF Mapping Language (http://www.w3.org/TR/r2rml/)
  • Direct Mapping (http://www.w3.org/TR/rdb-direct-mapping/)
  • R2RML and Direct Mapping Test Cases (http://www.w3.org/2001/sw/rdb2rdf/test-cases/)

R2RML and Direct Mapping are in the process of becoming W3C Recommendations.

[1] http://www.w3.org/TR/r2rml/
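To make the Direct Mapping idea concrete, here is a simplified Python sketch using the standard sqlite3 module. It follows the Direct Mapping's IRI conventions (row IRIs as `table/pk=value`, properties as `table#column`, one `rdf:type` per row, no triples for NULLs) but handles only one table with a single-column primary key, ignores foreign keys, and types only integers. The table and its data are made up.

```python
import sqlite3
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def enc(text) -> str:
    """Percent-encode an IRI component."""
    return quote(str(text), safe="")


def literal(value) -> str:
    """N-Triples literal; integers typed, everything else a plain string."""
    if isinstance(value, int):
        return f'"{value}"^^<{XSD_INTEGER}>'
    text = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{text}"'


def direct_mapping(conn, table: str, pk: str, base: str) -> list:
    """Simplified Direct Mapping of one table with a single-column primary key.

    Foreign keys (reference triples) and most SQL datatypes are ignored.
    The table name is trusted input here; it is interpolated into SQL.
    """
    cur = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cur.description]
    triples = []
    for row in cur:
        record = dict(zip(columns, row))
        subject = f"<{base}{enc(table)}/{enc(pk)}={enc(record[pk])}>"
        triples.append(f"{subject} <{RDF_TYPE}> <{base}{enc(table)}> .")
        for column, value in record.items():
            if value is None:  # NULLs produce no triple
                continue
            triples.append(f"{subject} <{base}{enc(table)}#{enc(column)}> {literal(value)} .")
    return triples


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ResearchGroup (id INTEGER PRIMARY KEY, name TEXT, acronym TEXT)")
    conn.execute("INSERT INTO ResearchGroup VALUES (1, 'Ontology Engineering Group', NULL)")
    print("\n".join(direct_mapping(conn, "ResearchGroup", "id", "http://research.ull.es/")))
```

The resulting IRIs (for example `http://research.ull.es/ResearchGroup/id=1`) are derived mechanically from the schema and do not follow a hand-designed URI scheme; customizing IRIs, classes and properties is what a mapping language such as R2RML is for.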

Page 11: Publishing Linked Data from RDB

Transformation – R2O & ODEMapster

• R2O is an extensible, fully declarative language to describe mappings between relational database schemas and ontologies.

• The ODEMapster processor generates RDF instances from relational instances based on the mapping description expressed in the R2O document.

www.oeg-upm.net/index.php/en/downloads/9-r2o-odempaster
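The following Python sketch illustrates the general idea of a declarative mapping executed by a generic processor: the mapping is data, and one function turns relational rows into RDF. It is neither R2O's XML syntax nor ODEMapster's implementation; the mapping keys, the ontology terms and the rows are hypothetical, and key values are assumed to be URI-safe.

```python
from string import Template

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical mapping in the spirit of an R2O concept mapping (NOT R2O's
# XML syntax): target class, how instance URIs are built, and which
# column feeds which datatype property.
MAPPING = {
    "class": "http://research.ull.es/ontology/ResearchGroup",
    "uri_as": Template("http://research.ull.es/resource/ResearchGroup/${id}"),
    "attributes": {"http://research.ull.es/ontology/groupName": "name"},
}


def escape(text: str) -> str:
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def apply_mapping(mapping: dict, rows) -> list:
    """Generic processor: relational rows (dicts) in, N-Triples lines out."""
    triples = []
    for row in rows:
        # Assumes key values are already URI-safe.
        subject = mapping["uri_as"].substitute(row)
        triples.append(f"<{subject}> <{RDF_TYPE}> <{mapping['class']}> .")
        for prop, column in mapping["attributes"].items():
            if row.get(column) is not None:
                triples.append(f'<{subject}> <{prop}> "{escape(str(row[column]))}" .')
    return triples


if __name__ == "__main__":
    rows = [{"id": 1, "name": "OEG"}, {"id": 2, "name": None}]  # made-up rows
    print("\n".join(apply_mapping(MAPPING, rows)))
```

Because the mapping is plain data, changing which class or URI pattern a table maps to needs no code change, which is the main appeal of a declarative mapping language.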

Page 12: Publishing Linked Data from RDB

Generation – Transformation - ODEMapster

(Diagram: ODEMapster generating RDF from the Research RDB)

• Included in the NeOn Toolkit (NTK); currently under revision

Page 13: Publishing Linked Data from RDB

Transformation - ODEMapster


Page 14: Publishing Linked Data from RDB

New R2O Mapping – DB connection information


Page 15: Publishing Linked Data from RDB

ODEMapster – DB Connections


Page 16: Publishing Linked Data from RDB

R2O Mapping Perspective


Page 17: Publishing Linked Data from RDB

ODEMapster – Creating Mappings


Page 18: Publishing Linked Data from RDB

ODEMapster – Creating Mappings


Page 19: Publishing Linked Data from RDB

ODEMapster2 – command line version

• Odemapster2 folder

• research.r2o.properties


Page 20: Publishing Linked Data from RDB

Main sections of an R2O Mapping


Page 21: Publishing Linked Data from RDB

ODEMapster2 – command line version

• research.r2o.xml

  • conceptmap-def
  • uri-as
  • attributemap-def
  • dbrelationmap-def
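Of the elements highlighted above, dbrelationmap-def is presumably the one that links instances of two concepts through an object property. As a rough analogy only (not R2O semantics in detail), this Python sketch derives object-property triples from a SQL join over a hypothetical person/research_group schema using sqlite3; the property belongsTo and all table and column names are assumptions.

```python
import sqlite3

# Hypothetical URIs and property; only the URI patterns follow the slides.
BELONGS_TO = "http://research.ull.es/ontology/belongsTo"
PERSON = "http://research.ull.es/resource/Person/{}"
GROUP = "http://research.ull.es/resource/ResearchGroup/{}"

JOIN = (
    "SELECT p.id, g.id FROM person AS p "
    "JOIN research_group AS g ON p.group_id = g.id"
)


def relation_triples(conn) -> list:
    """One object-property triple per joined (person, group) pair."""
    return [
        f"<{PERSON.format(pid)}> <{BELONGS_TO}> <{GROUP.format(gid)}> ."
        for pid, gid in conn.execute(JOIN)
    ]


def demo_db():
    """In-memory database with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE research_group (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, group_id INTEGER);
        INSERT INTO research_group VALUES (10, 'OEG');
        INSERT INTO person VALUES (1, 'Ana', 10);
        INSERT INTO person VALUES (2, 'Luis', NULL);
    """)
    return conn


if __name__ == "__main__":
    # Luis has no group, so the inner join yields a single triple.
    print("\n".join(relation_triples(demo_db())))
```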

Page 22: Publishing Linked Data from RDB

ODEMapster2 – command line version

• research.bat


Page 23: Publishing Linked Data from RDB

ODEMapster2

• File generated


Page 24: Publishing Linked Data from RDB

Linking - Identify suitable data sets as linking targets

• http://ckan.net

• Semantic Web Dog Food Corpus: http://data.semanticweb.org/

• Endpoint: http://data.semanticweb.org/snorql/
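Before linking, it helps to see what a candidate target contains. Under the SPARQL 1.1 Protocol a query can be sent as an HTTP GET with the query text in the `query` parameter; this Python sketch only builds such a request. The endpoint path `http://data.semanticweb.org/sparql` is an assumption (the slide points at the snorql browser UI), and the class-count query is just one way to survey a dataset.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed machine endpoint; the slide links the snorql browser UI instead.
ENDPOINT = "http://data.semanticweb.org/sparql"


def class_survey_query(limit: int = 10) -> str:
    """Which classes does the candidate target use, and how often?"""
    return ("SELECT ?class (COUNT(?s) AS ?n) WHERE { ?s a ?class } "
            f"GROUP BY ?class ORDER BY DESC(?n) LIMIT {limit}")


def sparql_request(endpoint: str, query: str) -> Request:
    """SPARQL 1.1 Protocol query via GET, asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    req = sparql_request(ENDPOINT, class_survey_query())
    print(req.full_url)
    # urllib.request.urlopen(req) would send it; not done here.
```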

Page 25: Publishing Linked Data from RDB

Linking – Silk framework

• http://www4.wiwiss.fu-berlin.de/bizer/silk/

• Copy workbench.war to the webapps directory (Tomcat directory)

• Silk Workbench

Page 26: Publishing Linked Data from RDB

Link Specification Language
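A Silk link specification essentially says: compare some property of each source resource with some property of each target resource, and emit a link (typically owl:sameAs) when the similarity passes a threshold. The Python sketch below is not Silk; it is a plain illustration of that idea using a normalized Levenshtein similarity. The labels, URIs and the 0.9 threshold are made up.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized to [0, 1]; 1.0 means identical after lower-casing."""
    a, b = a.strip().lower(), b.strip().lower()
    return 1.0 - levenshtein(a, b) / (max(len(a), len(b)) or 1)


def discover_links(source: dict, target: dict, threshold: float = 0.9) -> list:
    """source/target map resource URI -> label; returns (s, o, score) tuples.

    Compares every pair, which is fine for a demo; link discovery tools
    typically use indexing to avoid this quadratic number of comparisons.
    """
    links = []
    for s, s_label in source.items():
        for o, o_label in target.items():
            score = similarity(s_label, o_label)
            if score >= threshold:
                links.append((s, o, score))
    return links


if __name__ == "__main__":
    source = {"http://research.ull.es/resource/Person/1": "Boris Villazón-Terrazas"}
    target = {"http://example.org/person/a": "Boris Villazon-Terrazas",
              "http://example.org/person/b": "Jane Doe"}
    for link in discover_links(source, target):
        print(link)
```

The threshold trades precision for recall: lowering it finds more spelling variants but also more false links, which is why the Workbench has a separate validation step.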


Page 27: Publishing Linked Data from RDB

Silk framework

• http://localhost/workbench


Page 28: Publishing Linked Data from RDB

Silk framework

• First source


Page 29: Publishing Linked Data from RDB

Silk framework

• Second source


Page 30: Publishing Linked Data from RDB

Silk framework

• Add a task


Page 31: Publishing Linked Data from RDB

Silk framework

• Add output


Page 32: Publishing Linked Data from RDB

Silk framework

• Edit Linking task


Page 33: Publishing Linked Data from RDB

Silk framework

• Edit researchlinks


Page 34: Publishing Linked Data from RDB

Silk framework

• Generate links


Page 35: Publishing Linked Data from RDB

Silk framework

• Validate links


Page 36: Publishing Linked Data from RDB

Silk framework

• Export
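The exported links presumably end up in the N-Triples file output.nt that is later uploaded to Virtuoso. This Python sketch shows what such a file contains: one owl:sameAs triple per accepted link. The function name, the score field and the example link are hypothetical.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def links_to_ntriples(links, min_score: float = 0.0) -> str:
    """Serialize (source, target, score) links as owl:sameAs N-Triples."""
    return "".join(
        f"<{s}> <{OWL_SAME_AS}> <{o}> .\n"
        for s, o, score in links
        if score >= min_score
    )


if __name__ == "__main__":
    accepted = [  # hypothetical validated link
        ("http://research.ull.es/resource/Person/1", "http://example.org/person/a", 0.96),
    ]
    with open("output.nt", "w", encoding="utf-8") as f:
        f.write(links_to_ntriples(accepted, min_score=0.9))
```

N-Triples is line-based, so link files from several runs can be concatenated without re-parsing.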


Page 37: Publishing Linked Data from RDB


Page 38: Publishing Linked Data from RDB

Publication - Virtuoso Open-source edition

• http://sourceforge.net/projects/virtuoso/files/virtuoso/6.1.4/virtuoso-opensource-win32-20111101.zip/download

• Unzip to C:\software

• ODBC registration:
  • Set the VIRTUOSO_HOME environment variable to point to the virtuoso-opensource directory
  • cd %VIRTUOSO_HOME%\lib
  • regsvr32 virtodbc.dll

• Creating a Windows service:
  • cd %VIRTUOSO_HOME%\database
  • SET PATH=%PATH%;%VIRTUOSO_HOME%\bin;%VIRTUOSO_HOME%\lib
  • virtuoso-t -?  (to verify)
  • virtuoso-t +service create +instance "Instance Name" +configfile virtuoso.ini
  • virtuoso-t +service list  (to verify)
  • virtuoso-t -I "Instance Name" +service start  (to start the service)

Page 39: Publishing Linked Data from RDB

Virtuoso - Conductor

• http://localhost:8890/conductor


Page 40: Publishing Linked Data from RDB

Virtuoso - Conductor

• Upload the generated files

• Ontology: http://research.ull.es/graph/ontology (research.owl)

• Dataset: http://research.ull.es/graph/dataset (research.rdf)

• Links: http://research.ull.es/graph/links (output.nt)

Page 41: Publishing Linked Data from RDB

Virtuoso endpoint

• http://localhost:8890/sparql


Page 42: Publishing Linked Data from RDB

Virtuoso endpoint

• Now you can play a bit with SPARQL … ;)
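Beyond the web form, the endpoint can be queried from a script. This Python sketch sends a query over HTTP GET and flattens the W3C SPARQL 1.1 JSON results format into dictionaries. The query itself is hypothetical (it assumes the dataset graph uses foaf:name), and running the main block requires the local Virtuoso instance from the previous slides.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "http://localhost:8890/sparql"

# Hypothetical query: assumes the dataset uses foaf:name.
QUERY = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name
FROM <http://research.ull.es/graph/dataset>
WHERE { ?person foaf:name ?name }
LIMIT 10"""


def build_request(endpoint: str, query: str) -> Request:
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows(results_json: str) -> list:
    """Flatten SPARQL 1.1 JSON results into [{var: value}, ...]."""
    doc = json.loads(results_json)
    names = doc["head"]["vars"]
    # Unbound variables are simply absent from a binding.
    return [{v: b[v]["value"] for v in names if v in b}
            for b in doc["results"]["bindings"]]


if __name__ == "__main__":
    with urlopen(build_request(ENDPOINT, QUERY)) as resp:
        for row in rows(resp.read().decode("utf-8")):
            print(row)
```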


Page 43: Publishing Linked Data from RDB

Metadata publication – VOiD

• VOiD description: void.ttl
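A VOiD description is itself a small RDF file. This Python sketch writes a void.ttl-style description with one void:Dataset and one owl:sameAs void:Linkset. The #dataset and #links fragment URIs, the title, the FOAF vocabulary entry and the target dataset URI are all hypothetical, and the localhost endpoint from the slides would need to be replaced with a public one before publishing.

```python
def void_ttl(base: str = "http://research.ull.es/",
             endpoint: str = "http://localhost:8890/sparql",
             target: str = "http://example.org/void#target",
             triples=None) -> str:
    """Turtle VOiD description: one dataset plus one owl:sameAs linkset.

    The #dataset/#links fragment URIs and the target dataset are hypothetical.
    """
    ds, ls = f"<{base}void#dataset>", f"<{base}void#links>"
    lines = [
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "",
        f"{ds} a void:Dataset ;",
        '    dcterms:title "Research groups" ;',
        f"    void:sparqlEndpoint <{endpoint}> ;",
        f'    void:uriSpace "{base}resource/" ;',
        "    void:vocabulary <http://xmlns.com/foaf/0.1/> ;",
    ]
    if triples is not None:  # only state a size you have actually counted
        lines.append(f"    void:triples {int(triples)} ;")
    lines += [
        f"    void:subset {ls} .",
        "",
        f"{ls} a void:Linkset ;",
        "    void:linkPredicate owl:sameAs ;",
        f"    void:subjectsTarget {ds} ;",
        f"    void:objectsTarget <{target}> .",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    with open("void.ttl", "w", encoding="utf-8") as f:
        f.write(void_ttl())
```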

Page 44: Publishing Linked Data from RDB

Metadata Publication – CKAN.net / thedatahub.org


Page 45: Publishing Linked Data from RDB

Enable effective discovery

• Sindice: a search engine for RDF and other Semantic Web data

Page 46: Publishing Linked Data from RDB

Enable effective discovery

• Sitemap Protocol (http://sitemaps.org/)
  • Used by web crawlers
  • Efficiently find all your content and discover what has been updated

A sitemap file contains information regarding one or more URLs on your Web site. The information stored there helps search engines crawl your website more effectively.

Page 47: Publishing Linked Data from RDB

Sitemap.xml example

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://yoursite/</loc>
  </url>
  <url>
    <loc>http://yoursite/products/53546</loc>
  </url>
  <url>
    <loc>http://yoursite/products/98421</loc>
  </url>
  <url>
    <loc>http://yoursite/products/41003</loc>
    <!-- Optional parts -->
    <lastmod>2010-06-24</lastmod>
    <changefreq>daily</changefreq>
  </url>
</urlset>
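A sitemap like the one above can be generated rather than written by hand. This Python sketch uses the standard xml.etree.ElementTree module, which also takes care of escaping characters such as `&` in URLs. The function name and the tuple layout of the entries are assumptions; `xml_declaration=True` in `ET.tostring` needs Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries) -> bytes:
    """entries: iterable of (loc, lastmod, changefreq); the last two may be None."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod, changefreq in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # ElementTree escapes & and <
        if lastmod:      # optional, W3C Datetime such as 2010-06-24
            ET.SubElement(url, "lastmod").text = lastmod
        if changefreq:   # optional, e.g. daily, weekly, monthly
            ET.SubElement(url, "changefreq").text = changefreq
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    print(build_sitemap([
        ("http://yoursite/", None, None),
        ("http://yoursite/products/41003", "2010-06-24", "daily"),
    ]).decode("utf-8"))
```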

Page 48: Publishing Linked Data from RDB

Sitemap Protocol: Huge sitemaps

• Gzip-compress your sitemap
• Limit: 50k URLs or 10MB (uncompressed) per sitemap file
• Split large sitemaps into multiple sitemap files
• Add a sitemap index file
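Splitting and indexing can be automated. This Python sketch chunks a URL list into gzip-compressed sitemap files of at most 50,000 URLs and builds a sitemap index that points at them. The file names (`sitemapN.xml.gz`, `sitemap_index.xml`) are hypothetical, and only the URL-count limit is enforced; the byte-size limit is not checked.

```python
import gzip
from xml.sax.saxutils import escape

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file URL limit from the slide


def _urlset(urls) -> str:
    body = "".join(f"<url><loc>{escape(u)}</loc></url>" for u in urls)
    return f'<?xml version="1.0" encoding="UTF-8"?><urlset xmlns="{NS}">{body}</urlset>'


def _index(locs) -> str:
    body = "".join(f"<sitemap><loc>{escape(loc)}</loc></sitemap>" for loc in locs)
    return (f'<?xml version="1.0" encoding="UTF-8"?>'
            f'<sitemapindex xmlns="{NS}">{body}</sitemapindex>')


def split_sitemaps(urls, base_url: str, max_urls: int = MAX_URLS) -> dict:
    """Return {filename: bytes}: gzipped sitemap parts plus a plain index.

    Only the URL count is enforced; the size limit is not checked here.
    """
    files, locs = {}, []
    for n, start in enumerate(range(0, len(urls), max_urls), 1):
        name = f"sitemap{n}.xml.gz"
        part = _urlset(urls[start:start + max_urls])
        files[name] = gzip.compress(part.encode("utf-8"))
        locs.append(base_url + name)
    files["sitemap_index.xml"] = _index(locs).encode("utf-8")
    return files
```

Writing each entry to disk under its file name and pointing crawlers at `sitemap_index.xml` is left to the caller.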

Page 49: Publishing Linked Data from RDB

Sitemap Protocol: Discovery

• Publish the sitemap file

• Add a line to http://yoursite/robots.txt

• Web site owners use the /robots.txt file to give instructions about their site to web robots; this is called the Robots Exclusion Protocol.

Sitemap: http://yoursite/sitemap.xml
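Adding the Sitemap line can be scripted, and the result can be checked with the standard library's robots.txt parser. In this Python sketch the helper names and the example robots.txt content are hypothetical; `RobotFileParser.site_maps()` requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser


def add_sitemap_line(robots_txt: str, sitemap_url: str) -> str:
    """Append a Sitemap directive unless an identical one is already there."""
    line = f"Sitemap: {sitemap_url}"
    if line in robots_txt.splitlines():
        return robots_txt
    if robots_txt and not robots_txt.endswith("\n"):
        robots_txt += "\n"
    return robots_txt + line + "\n"


def declared_sitemaps(robots_txt: str):
    """Sitemap URLs a crawler would find (RobotFileParser.site_maps, Python 3.8+)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps()  # None when there is no Sitemap line


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"  # hypothetical existing file
    updated = add_sitemap_line(robots, "http://yoursite/sitemap.xml")
    print(updated)
    print(declared_sitemaps(updated))
```

The Sitemap directive is independent of any User-agent group, so appending it at the end of the file is fine.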


Page 50: Publishing Linked Data from RDB

sitemap4rdf

• Simple command-line tool
• Sends a SPARQL query to list all URIs
• Generates a sitemap

sitemap4rdf http://yoursite/sparql http://yoursite/resource/

Examples:

sitemap4rdf http://geo.linkeddata.es/sparql http://geo.linkeddata.es/
sitemap4rdf http://localhost:8890/sparql http://research.ull.es/

• Run sitemap4rdf specifying the SPARQL endpoint and the prefix of the URLs to include in the sitemap
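The slides do not show which query sitemap4rdf actually sends. As an assumption, a tool of this kind could ask for every distinct subject IRI under the given prefix and turn the answer into a sitemap, as in this Python sketch. The function names and query shape are hypothetical; STRSTARTS is SPARQL 1.1, so an older endpoint may need a REGEX filter instead.

```python
import json
from xml.sax.saxutils import escape


def list_uris_query(prefix: str) -> str:
    """Assumed query shape: every distinct subject IRI under the prefix.

    STRSTARTS is SPARQL 1.1; an older endpoint may need REGEX instead.
    """
    lit = prefix.replace("\\", "\\\\").replace('"', '\\"')
    return ("SELECT DISTINCT ?s WHERE { ?s ?p ?o . "
            f'FILTER(isIRI(?s) && STRSTARTS(STR(?s), "{lit}")) }}')


def uris_from_results(results_json: str, prefix: str) -> list:
    """Pull ?s values out of SPARQL JSON results, re-checking the prefix."""
    bindings = json.loads(results_json)["results"]["bindings"]
    uris = {b["s"]["value"] for b in bindings if b.get("s", {}).get("type") == "uri"}
    return sorted(u for u in uris if u.startswith(prefix))


def sitemap(uris) -> str:
    body = "".join(f"  <url><loc>{escape(u)}</loc></url>\n" for u in uris)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            f"{body}</urlset>\n")
```

Re-checking the prefix on the client side guards against endpoints that ignore or mis-evaluate the FILTER.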

Page 51: Publishing Linked Data from RDB

Submit the sitemap location - Sindice

• http://sindice.com/main/submit


Page 52: Publishing Linked Data from RDB

Submit the sitemap location - Google

• https://www.google.com/webmasters/tools/


Page 53: Publishing Linked Data from RDB


Page 54: Publishing Linked Data from RDB