Methodological Guidelines for Publishing Linked Data

Boris Villazón-Terrazas, Asunción Gómez-Pérez, and Óscar Corcho

Facultad de Informática, Universidad Politécnica de Madrid
Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid

http://www.oeg-upm.net
{bvillazon,asun,ocorcho}@fi.upm.es

Phone: 34.91.3366605, Fax: 34.91.3524819

CONSEGI 2011 – Brasília, Brazil
12th May, 2011


Page 2: Methodological Guidelines for Publishing Linked Data

ToC

• Introduction to Linked Data

• Guidelines for Publishing Linked Data

• Demo

Page 3: Methodological Guidelines for Publishing Linked Data

ToC

• Introduction to Linked Data

• Guidelines for Publishing Linked Data

• Demo


Page 4: Methodological Guidelines for Publishing Linked Data

Classic Web

Data exposed to the Web via HTML, PDF, etc.

[Figure labels: MovieDB, CIA World FactBook]

© Slide adapted from “5min Introduction to Linked Data”- Olaf Hartig

Page 5: Methodological Guidelines for Publishing Linked Data

Classic Web

• Information from single pages can be found via search engines

• Complex queries over multiple pages / data sources?

© Slide adapted from “5min Introduction to Linked Data”- Olaf Hartig

Page 6: Methodological Guidelines for Publishing Linked Data

What do we actually want?

• Use the Web like a single global database

[Figure labels: MovieDB, CIA World FactBook]

© Slide adapted from “5min Introduction to Linked Data”- Olaf Hartig

Page 7: Methodological Guidelines for Publishing Linked Data

Linked Data enables such a Web of Data

• Global Identifier: URI (Uniform Resource Identifier), which is a string of characters used to identify a name or a resource on the Internet.
• Data Model: RDF (Resource Description Framework), which is a standard model for data interchange on the Web
• Access Mechanism: HTTP
• Connection: Typed Links

[Figure: http://imdb.../TLLuvia has http://.../name “Even the Rain” and http://.../filming_location http://cia.../Bolivia; http://cia.../Bolivia has http://.../population 8000000. Sources: MovieDB, CIA World FactBook]

© Slide adapted from “5min Introduction to Linked Data”- Olaf Hartig
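The figure above can be read as three RDF statements. The following is a minimal sketch of that reading, written as N-Triples by hand. The slide elides the real URIs, so every example.org URI below is a hypothetical stand-in, not an identifier used by either data source.

```python
# Hypothetical URIs: the slide elides the real ones, so example.org stands in.
MOVIE = "http://example.org/movie/EvenTheRain"
COUNTRY = "http://example.org/country/Bolivia"
NAME = "http://example.org/vocab/name"
FILMING_LOCATION = "http://example.org/vocab/filming_location"
POPULATION = "http://example.org/vocab/population"


def uri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return "<" + value + ">"


def literal(value):
    """Quote a plain string literal, escaping backslashes and quotes."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


triples = [
    (uri(MOVIE), uri(NAME), literal("Even the Rain")),
    # A typed link: the predicate says how the two resources are related.
    (uri(MOVIE), uri(FILMING_LOCATION), uri(COUNTRY)),
    # Kept as a plain literal for brevity; a typed literal would be more precise.
    (uri(COUNTRY), uri(POPULATION), literal(8000000)),
]


def to_ntriples(rows):
    """Serialize (subject, predicate, object) rows, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in rows)


print(to_ntriples(triples))
```

The second statement is the one that connects the two data sets: its subject lives in the movie source and its object in the country source.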

Page 8: Methodological Guidelines for Publishing Linked Data

In a nutshell

• An extension of the current Web…
• … where information, services, and data are given well-defined and explicitly represented meaning, …
• … so that it can be shared and used by humans and machines, ...
• ... better enabling them to work in cooperation

• How?
• Promoting information exchange by tagging web content with machine processable descriptions of its meaning.
• And technologies and infrastructure to do this
• And clear principles on how to publish data

Page 9: Methodological Guidelines for Publishing Linked Data

The four principles (Tim Berners-Lee, 2006)

1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
4. Include links to other URIs, so that they can discover more things.

• http://www.w3.org/DesignIssues/LinkedData.html
• http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html
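Principles 2 and 3 are usually exercised through HTTP content negotiation: a client looks up a URI and asks for an RDF serialization. The sketch below only builds such a request and does not send it. The resource URI is taken from a later slide; whether the server answers with Turtle for this media type is an assumption.

```python
import urllib.request


def build_rdf_request(uri, media_type="text/turtle"):
    """Prepare an HTTP GET that asks for an RDF representation of a URI."""
    return urllib.request.Request(uri, headers={"Accept": media_type})


request = build_rdf_request("http://geo.linkeddata.es/resource/Provincia/Madrid")
print(request.get_method(), request.full_url, request.get_header("Accept"))

# To actually dereference the URI (needs network access):
#     with urllib.request.urlopen(request, timeout=10) as response:
#         print(response.headers.get("Content-Type"))
```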

Page 10: Methodological Guidelines for Publishing Linked Data

So does that mean I have to publish my data as Linked Data, now?

• But, why?

• What was your incentive to publish an HTML page in 1990?
  • Share data in documents and because your neighbor was doing it

• So, why should we publish Linked Data in 2011?
  • Share data as data and because your neighbor is doing it

© Slide adapted from “Introduction to Linked Data”- Juan Sequeda

Page 11: Methodological Guidelines for Publishing Linked Data

And guess who is starting to publish Linked Data now?

• UK Government
• US Government
• BBC
• Open Calais
• Freebase
• NY Times
• CNET
• DBpedia
• …

Page 12: Methodological Guidelines for Publishing Linked Data

Linked Open Data evolution

[Figure: Linked Open Data cloud diagrams for 2007, 2008, and 2009]

Page 13: Methodological Guidelines for Publishing Linked Data

Linked Open Data

[Figure: Linked Open Data cloud diagram for 2010]

http://richard.cyganiak.de/2007/10/lod/

Page 14: Methodological Guidelines for Publishing Linked Data

ToC

• Introduction to Linked Data

• Guidelines for Publishing Linked Data

• Demo

Page 15: Methodological Guidelines for Publishing Linked Data

Linked Data in OEG

• GeoLinkedData is an open initiative whose aim is to enrich the Web of Data with Spanish geospatial data. http://geo.linkeddata.es

• El Viajero Linked Data is a project that focuses on the integration of the contents produced by newspapers and digital platforms belonging to Prisa Group. http://webenemasuno.linkeddata.es/

• A project with the Biblioteca Nacional to publish the library information as Linked Data. http://cultura.linkeddata.es/visualizer/

Page 16: Methodological Guidelines for Publishing Linked Data

Linked Data in OEG

• Tools for generating and consuming Linked Data, e.g.,
  » geometry2rdf http://www.oeg-upm.net/index.php/downloads/151-geometry2rdf
  » map4rdf http://oegdev.dia.fi.upm.es/projects/map4rdf/

• Spanish Thematic Network of Linked Data http://red.linkeddata.es
  » Group leader: Ontology Engineering Group
  » 19 Research Groups
  » 4 companies

Page 17: Methodological Guidelines for Publishing Linked Data

Guidelines for Publishing Linked Data


Page 18: Methodological Guidelines for Publishing Linked Data

Guidelines for Publishing Linked Data


Page 19: Methodological Guidelines for Publishing Linked Data

Identification of the data sources

• Guidelines based on the Open Data Manual 1

• Two possibilities

• To find the data sources already available in a public data catalog, e.g., Aporta project 2

• To get an agreement with a particular government body to publish its data sources, e.g., GeoLinkedData - IGN

1 http://opendatamanual.org/
2 http://aporta.es

Page 20: Methodological Guidelines for Publishing Linked Data

Identification of the data sources: GeoLinkedData

• IGN, National Geographic Institute of Spain
  » Agreement with the IGN
  » Oracle & MySQL

• INE, National Statistics Institute of Spain
  » Data sources available in a public data catalog

Page 21: Methodological Guidelines for Publishing Linked Data

Identification of the data sources: IGN & INE

[Figure labels: Province, Year, Industry Production Index]

Page 22: Methodological Guidelines for Publishing Linked Data

Guidelines for Publishing Linked Data


Page 23: Methodological Guidelines for Publishing Linked Data

Vocabulary Modelling: Ontology

• An ontology is an engineering artifact, which provides:
  » A set of terms
  » A set of explicit assumptions regarding the intended meaning of the terms.
    Almost always including concepts and their classification
    Almost always including properties between concepts

• Shared understanding of a domain of interest

Page 24: Methodological Guidelines for Publishing Linked Data

Vocabulary Modelling: Reuse available vocabularies

1. Search for suitable vocabularies, e.g., in Linked Open Vocabularies
2. Are there suitable vocabularies?
  » Yes: Build the vocabulary by reusing available vocabularies
  » No: Reuse available non-ontological resources

Page 25: Methodological Guidelines for Publishing Linked Data

Vocabulary Modelling: Reuse available non-ontological resources

1. Search for suitable non-ontological resources
  » Highly reliable Web Sites
  » Domain-related sites
  » Government Catalogs
2. Are there suitable resources?
  » Yes: Build the vocabulary by transforming available resources
  » No: Build the vocabulary from scratch

Page 26: Methodological Guidelines for Publishing Linked Data

Vocabulary Modelling: GeoLinkedData

• scv:Dimension, scv:Item, scv:Dataset
• WGS84 Geo Positioning: an RDF vocabulary
• Hydrographical phenomena (rivers, lakes, etc.)
• Vocabulary for instants, intervals, durations, etc.
• Ontology for OGC Geography Markup Language
• Names and international code systems for territories and groups

Classes: 33
Object Properties: 44
Data Properties: 318

http://neon-toolkit.org/

Page 27: Methodological Guidelines for Publishing Linked Data

Guidelines for Publishing Linked Data


Page 28: Methodological Guidelines for Publishing Linked Data

Generation of the RDF Data

• INE → NOR2O
• IGN → ODEMapster
• IGN (geospatial column) → Geometry2RDF

Page 29: Methodological Guidelines for Publishing Linked Data

Generation of the RDF Data: NOR2O

[Figure labels: Province, Year, Industry Production Index, NOR2O]

Page 30: Methodological Guidelines for Publishing Linked Data

Generation of the RDF Data: R2O & ODEMapster

• R2O is an extensible, fully declarative language to describe mappings between relational database schemas and ontologies.

• The ODEMapster processor generates RDF instances from relational instances based on the mapping description expressed in the R2O document

www.oeg-upm.net/index.php/en/downloads/9-r2o-odempaster
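The split between a declarative mapping and a processor that applies it can be sketched as follows. This is not R2O syntax and not how ODEMapster is implemented: the mapping is a plain Python dict, and the table rows, column names, and URI template are hypothetical. Only the class URI http://geo.linkeddata.es/ontology/Lago appears in the slides.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical mapping description: which class the rows instantiate,
# how subject URIs are minted, and which column feeds which property.
MAPPING = {
    "class_uri": "http://geo.linkeddata.es/ontology/Lago",
    "uri_template": "http://geo.linkeddata.es/resource/Lago/{id}",
    "columns": {"name": RDFS_LABEL},
}

# Hypothetical relational instances, as a database cursor might return them.
ROWS = [
    {"id": 1, "name": "Lake A"},
    {"id": 2, "name": "Lake B"},
]


def rows_to_triples(rows, mapping):
    """Apply the mapping to each row and return (subject, predicate, object) triples."""
    triples = []
    for row in rows:
        subject = mapping["uri_template"].format(**row)
        triples.append((subject, RDF_TYPE, mapping["class_uri"]))
        for column, prop in mapping["columns"].items():
            value = row.get(column)
            if value is not None:  # SQL NULLs produce no statement
                triples.append((subject, prop, value))
    return triples


for triple in rows_to_triples(ROWS, MAPPING):
    print(triple)
```

Keeping the mapping as data means the same processor can be pointed at another table by changing only the description, which is the design idea the two bullets above describe.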

Page 31: Methodological Guidelines for Publishing Linked Data

Generation of the RDF Data: R2O & ODEMapster

• Creation of the R2O Mappings

Page 32: Methodological Guidelines for Publishing Linked Data

Generation of the RDF Data: R2O & ODEMapster

Excerpt of the R2O document

Page 33: Methodological Guidelines for Publishing Linked Data

Generation of the RDF Data: geometry2rdf

• Tool for generating RDF from geometrical information

• The geometry could be available in GML or WKT

• The RDF generated follows our Geometry Model

http://www.oeg-upm.net/index.php/en/downloads/151-geometry2rdf

Page 34: Methodological Guidelines for Publishing Linked Data

Generation of the RDF Data: geometry2rdf

Oracle SDO_UTIL package

    SELECT TO_CHAR(SDO_UTIL.TO_GML311GEOMETRY(geometry)) AS Gml311Geometry
    FROM "BCN200"."BCN200_0301L_RIO" c
    WHERE c.Etiqueta='Arroyo'

Page 35: Methodological Guidelines for Publishing Linked Data

Generation of the RDF Data: geometry2rdf

Page 36: Methodological Guidelines for Publishing Linked Data

Generation of the RDF Data: Geometry Model

geoes: http://geo.linkeddata.es/
geo: http://www.w3.org/2003/01/geo/wgs84_pos#

• geoes:ontology/Geometría
  » geo:Point, rdfs:subClassOf geoes:ontology/Geometría, with geo:lat and geo:long
  » geoes:ontology/Curva, rdfs:subClassOf geoes:ontology/Geometría, formadoPor a collection of 2 or more geo:Points
  » geoes:ontology/Polígono, rdfs:subClassOf geoes:ontology/Geometría, formadoPor a collection of 3 or more geo:Points
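To make the model concrete, here is a minimal sketch that turns a two-dimensional WKT LINESTRING into triples shaped like the Curva branch of the model. It is an illustration, not geometry2rdf's implementation. Assumptions: the property URI geoes:ontology/formadoPor (the slide shows only the local name), the per-point URI scheme, and the reading of WKT x as longitude and y as latitude.

```python
import re

GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
GEOES = "http://geo.linkeddata.es/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def parse_linestring(wkt):
    """Parse a 2D WKT LINESTRING into a list of (x, y) float pairs."""
    match = re.fullmatch(r"\s*LINESTRING\s*\((.*)\)\s*", wkt, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("not a WKT LINESTRING: " + wkt)
    points = []
    for pair in match.group(1).split(","):
        x, y = pair.split()  # raises ValueError unless exactly two numbers
        points.append((float(x), float(y)))
    if len(points) < 2:
        raise ValueError("a curve needs 2 or more points")
    return points


def curve_to_triples(curve_uri, points):
    """Describe a curve and its points as (subject, predicate, object) triples."""
    triples = [(curve_uri, RDF_TYPE, GEOES + "ontology/Curva")]
    for index, (lon, lat) in enumerate(points):
        point_uri = f"{curve_uri}/point/{index}"  # hypothetical URI scheme
        # Assumed namespace for formadoPor; the slide gives only the local name.
        triples.append((curve_uri, GEOES + "ontology/formadoPor", point_uri))
        triples.append((point_uri, RDF_TYPE, GEO + "Point"))
        triples.append((point_uri, GEO + "long", lon))
        triples.append((point_uri, GEO + "lat", lat))
    return triples


points = parse_linestring("LINESTRING (-3.7 40.4, -3.6 40.5)")
for triple in curve_to_triples("http://example.org/curve/1", points):
    print(triple)
```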

Page 37: Methodological Guidelines for Publishing Linked Data

Generation of the RDF Data: RDF generated according to our Geometry Model

Page 38: Methodological Guidelines for Publishing Linked Data

Generation of the RDF Data: URI Generation

• URIs are extremely relevant in this process since they are the key for the alignment of heterogeneous resources that come from different data sources.
  » Cool URIs 1
  » UK Cabinet Office 2

• Examples:
  http://geo.linkeddata.es/ontology/{class/property}
  http://geo.linkeddata.es/ontology/Lago
  http://geo.linkeddata.es/resource/dataset/type/{resourcename}
  http://geo.linkeddata.es/resource/Provincia/Madrid

1 http://www.w3.org/TR/cooluris/
2 http://www.cabinetoffice.gov.uk/media/301253/puiblic sector uri.pdf
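The URI patterns in the examples can be captured in two small helper functions, so that every generator step mints identifiers the same way. This is a sketch under assumptions: the function names are hypothetical, and replacing whitespace with underscores is one possible convention that the slides do not prescribe.

```python
import re

BASE = "http://geo.linkeddata.es/"


def ontology_uri(term):
    """Mint a URI for a class or property of the vocabulary."""
    return f"{BASE}ontology/{term}"


def resource_uri(resource_type, name):
    """Mint a URI for an instance, following .../resource/{type}/{name}."""
    # Assumption: collapse whitespace runs into underscores.
    local_name = re.sub(r"\s+", "_", name.strip())
    return f"{BASE}resource/{resource_type}/{local_name}"


print(ontology_uri("Lago"))
print(resource_uri("Provincia", "Madrid"))
```

Names with non-ASCII characters still need percent-encoding before publication; that step is part of data cleansing and is not handled here.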

Page 39: Methodological Guidelines for Publishing Linked Data

Generation of the RDF Data: Provenance Information

• It is relevant
  » to manage the provenance information of the resources
  » to establish the license of the information

• Example

Pubby: http://www4.wiwiss.fu-berlin.de/pubby/

Page 40: Methodological Guidelines for Publishing Linked Data

Guidelines for Publishing Linked Data


Page 41: Methodological Guidelines for Publishing Linked Data

Publication of the RDF data

• Virtuoso 6.1.0
• Pubby 0.3, including provenance support http://www4.wiwiss.fu-berlin.de/pubby/
• map4rdf http://oegdev.dia.fi.upm.es/projects/map4rdf/
• Access: SPARQL, Linked Data, HTML

Page 42: Methodological Guidelines for Publishing Linked Data

Guidelines for Publishing Linked Data


Page 43: Methodological Guidelines for Publishing Linked Data

Data Cleansing

• To find possible errors, identified by Hogan et al.
  » http-level issues, such as accessibility and dereferenceability, e.g., HTTP URIs return 40x/50x errors
  » reasoning issues, such as namespace without vocabulary, e.g., rss:item term invented
  » malformed/incompatible datatypes, e.g., “true” as xsd:int

• To fix the identified errors

• Example, encoding URIs
  » Special characters á, é, ñ
  » http://geo.linkeddata.es/resource/Provincia/M%C3%A1laga
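Three of the checks above are easy to automate. The sketch below percent-encodes a resource name as in the Málaga example, tests whether a lexical form is a valid xsd:int, and flags 40x/50x status codes. The function names are hypothetical, and this is an illustration of the checks rather than the tooling used in the project.

```python
import re
from urllib.parse import quote

# xsd:int lexical form: optional sign followed by ASCII digits.
XSD_INT = re.compile(r"[+-]?[0-9]+")


def encode_resource_uri(base, name):
    """Percent-encode a resource name (as UTF-8) and append it to the base URI."""
    return base + quote(name, safe="")


def is_valid_xsd_int(lexical):
    """Return True if the string is a legal xsd:int (32-bit signed integer)."""
    if XSD_INT.fullmatch(lexical) is None:
        return False
    return -2**31 <= int(lexical) <= 2**31 - 1


def is_http_error(status):
    """Return True for client (4xx) and server (5xx) error status codes."""
    return 400 <= status <= 599


print(encode_resource_uri("http://geo.linkeddata.es/resource/Provincia/", "Málaga"))
print(is_valid_xsd_int("true"), is_valid_xsd_int("42"))
print(is_http_error(404), is_http_error(200))
```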

Page 44: Methodological Guidelines for Publishing Linked Data

Guidelines for Publishing Linked Data


Page 45: Methodological Guidelines for Publishing Linked Data

Linking the RDF Data

1. Identify suitable data sets as linking targets
  » http://ckan.net
2. Discover relationships between data items
  » Silk Framework http://www4.wiwiss.fu-berlin.de/bizer/silk/
  » LIMES http://aksw.org/Projects/limes
3. Validate the relationships discovered
  » sameAs Validator http://oegdev.dia.fi.upm.es:8080/sameAs/
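The discovery step can be illustrated with a deliberately simple matcher that compares labels and proposes owl:sameAs candidates above a threshold. This is not how Silk or LIMES work internally; it is a sketch using the standard library's difflib. The source URIs, labels, and the 0.9 threshold are hypothetical, and the output is a list of candidates that still needs the validation step.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def similarity(a, b):
    """Case-insensitive similarity ratio between two labels, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def candidate_links(source, target, threshold=0.9):
    """Compare every source label with every target label and keep close matches."""
    links = []
    for source_uri, source_label in source.items():
        for target_uri, target_label in target.items():
            if similarity(source_label, target_label) >= threshold:
                links.append((source_uri, OWL_SAME_AS, target_uri))
    return links


# Hypothetical source data set, keyed by URI.
source = {
    "http://example.org/resource/Madrid": "Madrid",
    "http://example.org/resource/Sevilla": "Sevilla",
}
target = {
    "http://dbpedia.org/resource/Madrid": "Madrid",
    "http://dbpedia.org/resource/Barcelona": "Barcelona",
}

for link in candidate_links(source, target):
    print(link)
```

The nested loop compares all pairs, which is fine for a sketch but too slow for large data sets; dedicated link discovery frameworks exist largely to avoid that cost.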

Page 46: Methodological Guidelines for Publishing Linked Data

Linking the RDF Data: GeoLinkedData

[Figure: links between GeoLinkedData, GeoNames, and DBpedia]

• http://geo.linkeddata.es/.../Madrid
• http://sws.geonames.org/6355233/
• http://dbpedia.org/resource/Madrid

Page 47: Methodological Guidelines for Publishing Linked Data

Linking the RDF Data: sameAs Validator

http://oegdev.dia.fi.upm.es:8080/sameAs/

Page 48: Methodological Guidelines for Publishing Linked Data

Guidelines for Publishing Linked Data


Page 49: Methodological Guidelines for Publishing Linked Data

Enable Effective Discovery: Register the dataset into CKAN Registry

• Add the dataset to CKAN, the open registry of data and content packages

• Minimum information
  » Name, unique ID for your data set on CKAN
  » Title, full name of your data set
  » URL, link to the data set home page

http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation
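Before registering, it can help to check that the minimum information is present. The sketch below validates a metadata record held as a plain dict. The lowercase field keys and the sample values are hypothetical, and no CKAN API call is made.

```python
# The three minimum fields listed above, as hypothetical dict keys.
REQUIRED_FIELDS = ("name", "title", "url")


def missing_fields(metadata):
    """Return the required fields that are absent or blank, in a fixed order."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = metadata.get(field)
        if not isinstance(value, str) or not value.strip():
            missing.append(field)
    return missing


# Hypothetical record; the "name" value is an illustrative ID only.
dataset = {
    "name": "geolinkeddata",
    "title": "GeoLinkedData",
    "url": "http://geo.linkeddata.es",
}

print(missing_fields(dataset))
print(missing_fields({"name": "geolinkeddata"}))
```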

Page 50: Methodological Guidelines for Publishing Linked Data

Enable Effective Discovery: Sitemap protocol

• Used by web crawlers
• Efficiently find all your content & discover what has been updated

http://sitemaps.org/

A sitemap file contains information regarding one or more URLs on your Web site. The information that is stored there helps search engines better spider your website.

Page 51: Methodological Guidelines for Publishing Linked Data

Enable Effective Discovery: Sindice, the best RDF search engine

Page 52: Methodological Guidelines for Publishing Linked Data

Enable Effective Discovery: sitemap4rdf

• Simple command line tool
• Sends a SPARQL query to list all URIs
• Generates sitemap

    sitemap4rdf http://yoursite/sparql http://yoursite/resource/

• run sitemap4rdf specifying the SPARQL endpoint and the prefix of the URLs to include in the Sitemap

Example:

    sitemap4rdf http://geo.linkeddata.es/sparql http://geo.linkeddata.es/

http://lab.linkeddata.deri.ie/2010/sitemap4rdf/
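The two steps the tool performs, listing URIs and writing a sitemap, can be sketched with the standard library. This is not sitemap4rdf's code: the SPARQL query is one plausible way to list subject URIs and is an assumption, the query is not executed here, and the input URIs are supplied by hand.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Assumed query for listing subject URIs; the tool's actual query may differ.
LIST_URIS_QUERY = "SELECT DISTINCT ?s WHERE { ?s ?p ?o . FILTER(isIRI(?s)) }"


def build_sitemap(uris, prefix):
    """Return sitemap XML listing each distinct URI that starts with the prefix."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for uri in sorted(set(uris)):
        if uri.startswith(prefix):
            url = ET.SubElement(urlset, "url")
            ET.SubElement(url, "loc").text = uri
    return ET.tostring(urlset, encoding="unicode")


# URIs as they might come back from the endpoint (supplied by hand here).
uris = [
    "http://geo.linkeddata.es/resource/Provincia/Madrid",
    "http://geo.linkeddata.es/ontology/Lago",
    "http://dbpedia.org/resource/Madrid",
    "http://geo.linkeddata.es/resource/Provincia/Madrid",
]

print(build_sitemap(uris, "http://geo.linkeddata.es/"))
```

Filtering by prefix keeps URIs from other data sets, such as link targets, out of the sitemap, which matches the second command line argument described above.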

Page 53: Methodological Guidelines for Publishing Linked Data

Enable Effective Discovery: Submit the sitemap location - Sindice

• http://sindice.com/main/submit

Page 54: Methodological Guidelines for Publishing Linked Data

Enable Effective Discovery: Submit the sitemap location - Google

• https://www.google.com/webmasters/tools/

Page 55: Methodological Guidelines for Publishing Linked Data

ToC

• Introduction to Linked Data

• Guidelines for Publishing Linked Data

• Demo

Page 56: Methodological Guidelines for Publishing Linked Data

DEMO

http://geo.linkeddata.es/browser

Page 57: Methodological Guidelines for Publishing Linked Data

Provinces


Page 58: Methodological Guidelines for Publishing Linked Data

Capital of Province


Page 59: Methodological Guidelines for Publishing Linked Data

Provinces – Industry Production Index


Page 60: Methodological Guidelines for Publishing Linked Data

Beaches


Page 61: Methodological Guidelines for Publishing Linked Data

DEMO

http://webenemasuno.linkeddata.es/

Page 62: Methodological Guidelines for Publishing Linked Data

Trips


Page 63: Methodological Guidelines for Publishing Linked Data

Guide Locations


Page 64: Methodological Guidelines for Publishing Linked Data

Guide


Page 65: Methodological Guidelines for Publishing Linked Data

Future Work


Page 66: Methodological Guidelines for Publishing Linked Data
Page 67: Methodological Guidelines for Publishing Linked Data

Methodological Guidelines for Publishing Linked DataPublishing Linked Data

Boris Villazón-Terrazas, Asunción Gómez-Pérez, and Óscar Corcho

Facultad de Informática, Universidad Politécnica de MadridCampus de Montegancedo sn, 28660 Boadilla del Monte, Madrid

http://www oeg upm nethttp://www.oeg-upm.net{bvillazon,asun,ocorcho}@fi.upm.es

Phone: 34.91.3366605, Fax: 34.91.3524819

CONSEGI 2011 – Brasília, Brazil12th May, 2011