Methodological Guidelines for Publishing Linked Data presented at CONSEGI 2011
Methodological Guidelines for Publishing Linked Data
Boris Villazón-Terrazas, Asunción Gómez-Pérez, and Óscar Corcho
Facultad de Informática, Universidad Politécnica de Madrid
Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid
http://www.oeg-upm.net
{bvillazon,asun,ocorcho}@fi.upm.es
Phone: 34.91.3366605, Fax: 34.91.3524819
CONSEGI 2011 – Brasília, Brazil
12th May, 2011
ToC
• Introduction to Linked Data
• Guidelines for Publishing Linked Data
• Demo
Classic Web
• Data exposed to the Web via HTML, PDF, etc.
[Figure: two example data sources, MovieDB and CIA World FactBook]
© Slide adapted from “5min Introduction to Linked Data”- Olaf Hartig
Classic Web
• Information from single pages can be found via search engines
• Complex queries over multiple pages / data sources?
© Slide adapted from “5min Introduction to Linked Data”- Olaf Hartig
What do we actually want?
• Use the Web like a single global database
[Figure: MovieDB and CIA World FactBook]
© Slide adapted from “5min Introduction to Linked Data”- Olaf Hartig
Linked Data enables such a Web of Data
• Global Identifier: URI (Uniform Resource Identifier), which is a string of characters used to identify a name or a resource on the Internet.
• Data Model: RDF (Resource Description Framework), which is a standard model for data interchange on the Web
• Access Mechanism: HTTP
• Connection: Typed Links
[Figure: MovieDB and CIA World FactBook connected by typed links]
• http://imdb.../TLLuvia, http://.../name, “Even the Rain”
• http://imdb.../TLLuvia, http://.../filming_location, http://cia.../Bolivia
• http://cia.../Bolivia, http://.../population, 8000000
© Slide adapted from “5min Introduction to Linked Data”- Olaf Hartig
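The example on this slide can be read as three subject/predicate/object statements. The following is a minimal sketch in plain Python, without any RDF library. The URIs are the abbreviated forms shown on the slide (the "..." parts are elided there and are kept as-is), so they are placeholders and not resolvable addresses; the helper names are hypothetical.

```python
# Each statement is a (subject, predicate, object) triple.
# URIs are the abbreviated forms from the slide, not real addresses.
MOVIE = "http://imdb.../TLLuvia"
COUNTRY = "http://cia.../Bolivia"

triples = [
    (MOVIE, "http://.../name", "Even the Rain"),
    (MOVIE, "http://.../filming_location", COUNTRY),  # typed link across data sets
    (COUNTRY, "http://.../population", 8000000),
]


def objects(subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def population_of_filming_locations(movie):
    """Follow the typed link from a movie to its locations, then read the population."""
    result = {}
    for place in objects(movie, "http://.../filming_location"):
        for value in objects(place, "http://.../population"):
            result[place] = value
    return result


print(population_of_filming_locations(MOVIE))
```

The second function is the kind of query that spans both data sets: it only works because the filming location is a link to the other source rather than a plain string.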
In a nutshell
• An extension of the current Web…
• … where information, services, and data are given well-defined and explicitly represented meaning, …
• … so that it can be shared and used by humans and machines, ...
• ... better enabling them to work in cooperation
• How?
  • Promoting information exchange by tagging web content with machine processable descriptions of its meaning.
  • And technologies and infrastructure to do this
  • And clear principles on how to publish data
The four principles (Tim Berners-Lee, 2006)
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
4. Include links to other URIs, so that they can discover more things.
• http://www.w3.org/DesignIssues/LinkedData.html
• http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html
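Principles 2 and 3 come down to an ordinary HTTP lookup in which the client says which representation it wants. The sketch below only prepares such a request with the Python standard library and does not send it. The resource URI is taken from the URI examples later in the talk; the `application/rdf+xml` media type and the function name are assumptions, and whether a given server honours that header is not something this sketch can confirm.

```python
from urllib.request import Request


def build_lookup_request(uri, media_type="application/rdf+xml"):
    """Prepare (but do not send) an HTTP GET that asks for an RDF representation."""
    return Request(uri, headers={"Accept": media_type}, method="GET")


req = build_lookup_request("http://geo.linkeddata.es/resource/Provincia/Madrid")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the actual lookup.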
So does that mean I have to publish my data as Linked Data, now?
• But, why?
• What was your incentive to publish an HTML page in 1990?
  • Share data in documents and because your neighbor was doing it
• So, why should we publish Linked Data in 2011?
  • Share data as data and because your neighbor is doing it
© Slide adapted from “Introduction to Linked Data”- Juan Sequeda
And guess who is starting to publish Linked Data now?
• UK Government
• US Government
• BBC
• Open Calais
• Freebase
• NY Times
• CNET
• DBpedia
• ….
Linked Open Data evolution
[Figure: Linked Open Data in 2007, 2008, and 2009]
Linked Open Data
[Figure: Linked Open Data in 2010]
http://richard.cyganiak.de/2007/10/lod/
ToC
• Introduction to Linked Data
• Guidelines for Publishing Linked Data
• Demo
Linked Data in OEG
• GeoLinkedData is an open initiative whose aim is to enrich the Web of Data with Spanish geospatial data. http://geo.linkeddata.es
• El Viajero Linked Data is a project that focuses on the integration of the contents produced by newspapers and digital platforms belonging to Prisa Group. http://webenemasuno.linkeddata.es/
• A project with the Biblioteca Nacional to publish the library information as Linked Data. http://cultura.linkeddata.es/visualizer/
Linked Data in OEG
• Tools for generating and consuming Linked Data, e.g.,
  • geometry2rdf http://www.oeg-upm.net/index.php/downloads/151-geometry2rdf
  • map4rdf http://oegdev.dia.fi.upm.es/projects/map4rdf/
• Spanish Thematic Network of Linked Data http://red.linkeddata.es
  » Group leader: Ontology Engineering Group
  » 19 Research Groups
  » 4 companies
Guidelines for Publishing Linked Data
Identification of the data sources
• Guidelines based on the Open Data Manual 1
• Two possibilities
• To find the data sources already available in a public data catalog, e.g., Aporta project 2
• To get an agreement with a particular government body to publish its data sources, e.g., GeoLinkedData - IGN
1 http://opendatamanual.org/
2 http://aporta.es
Identification of the data sources: GeoLinkedData
• IGN, National Geographic Institute of Spain
  • Agreement with the IGN
  • Oracle & MySQL
• INE, National Statistic Institute of Spain
  • Data sources available in a public data catalog
Identification of the data sources: IGN & INE
[Figure: sample data labeled Year, Industry Production Index, and Province]
Vocabulary Modelling: Ontology
• An ontology is an engineering artifact, which provides:
  • A set of terms
  • A set of explicit assumptions regarding the intended meaning of the terms.
  • Almost always including concepts and their classification
  • Almost always including properties between concepts
• Shared understanding of a domain of interest
Vocabulary Modelling: Reuse available vocabularies
• Search for suitable vocabularies (Linked Open Vocabularies)
• Are there suitable vocabularies?
  • Yes: build the vocabulary by reusing available vocabularies
  • No: …
Vocabulary Modelling: Reuse available non-ontological resources
• Search for suitable non-ontological resources
  • Highly reliable Web Sites
  • Domain-related sites
  • Government Catalogs
• Are there suitable resources?
  • Yes: build the vocabulary by transforming available resources
  • No: build the vocabulary from scratch
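The two flowcharts for vocabulary modelling chain into a single decision: reuse vocabularies if any fit, otherwise transform non-ontological resources if any fit, otherwise start from scratch. A minimal sketch of that order of preference follows; the function name and the idea of passing the search results in as lists are illustrative assumptions, not part of the guidelines.

```python
def choose_vocabulary_strategy(suitable_vocabularies, suitable_resources):
    """Return the modelling strategy, preferring reuse over transformation over scratch."""
    if suitable_vocabularies:
        return "build the vocabulary by reusing available vocabularies"
    if suitable_resources:
        return "build the vocabulary by transforming available resources"
    return "build the vocabulary from scratch"


# Example: no vocabulary fits, but a government catalog does.
print(choose_vocabulary_strategy([], ["government catalog"]))
```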
Vocabulary Modelling: GeoLinkedData
[Figure: vocabulary components]
• scv:Dimension, scv:Item, scv:Dataset
• WGS84 Geo Positioning: an RDF vocabulary
• hydrographical phenomena (rivers, lakes, etc.)
• Vocabulary for instants, intervals, durations, etc.
• Ontology for OGC Geography Markup Language
• Names and international code systems for territories and groups
Classes: 33
Object Properties: 44
Data Properties: 318
http://neon-toolkit.org/
Generation of the RDF Data
• INE: NOR2O
• IGN: ODEMapster
• IGN, geospatial column: Geometry2RDF
Generation of the RDF Data: NOR2O
[Figure: NOR2O applied to the Industry Production Index data, labeled Year and Province]
Generation of the RDF Data: R2O & ODEMapster
• R2O is an extensible, fully declarative language to describe mappings between relational database schemas and ontologies.
• The ODEMapster processor generates RDF instances from relational instances based on the mapping description expressed in the R2O document
www.oeg-upm.net/index.php/en/downloads/9-r2o-odempaster
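To make the idea of a declarative relational-to-RDF mapping concrete, here is a minimal sketch in Python using an in-memory SQLite table. It is not R2O syntax and does not reproduce ODEMapster: the `MAPPING` dictionary, the `provincia` table, the class URI `ontology/Provincia`, and the use of `rdfs:label` are all assumptions chosen for illustration. Only the resource URI pattern comes from the talk.

```python
import sqlite3

BASE = "http://geo.linkeddata.es/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical mapping description (NOT R2O syntax): one table, one class,
# a URI template for the subject, and one property per mapped column.
MAPPING = {
    "table": "provincia",
    "class_uri": BASE + "ontology/Provincia",
    "uri_template": BASE + "resource/Provincia/{nombre}",
    "columns": {"nombre": RDFS_LABEL},
}


def rows_to_triples(conn, mapping):
    """Generate (subject, predicate, object) triples from the rows of the mapped table."""
    cols = list(mapping["columns"])
    cursor = conn.execute(f"SELECT {', '.join(cols)} FROM {mapping['table']}")
    triples = []
    for row in cursor:
        record = dict(zip(cols, row))
        subject = mapping["uri_template"].format(**record)
        triples.append((subject, RDF_TYPE, mapping["class_uri"]))
        for col, prop in mapping["columns"].items():
            triples.append((subject, prop, record[col]))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE provincia (nombre TEXT)")
conn.executemany("INSERT INTO provincia VALUES (?)", [("Madrid",), ("Sevilla",)])
triples = rows_to_triples(conn, MAPPING)
for t in triples:
    print(t)
```

Keeping the mapping as data, separate from the code that walks the rows, is the point of the declarative approach: the same processor can be reused with a different mapping document.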
Generation of the RDF Data: R2O & ODEMapster
• Creation of the R2O Mappings
[Figure: excerpt of the R2O document]
Generation of the RDF Data: geometry2rdf
• Tool for generating RDF from geometrical information
• The geometry could be available in GML or WKT
• The RDF generated follows our Geometry Model
http://www.oeg-upm.net/index.php/en/downloads/151-geometry2rdf
Generation of the RDF Data: geometry2rdf
Oracle SDO_UTIL package
SELECT TO_CHAR(SDO_UTIL.TO_GML311GEOMETRY(geometry)) AS Gml311Geometry
FROM "BCN200"."BCN200_0301L_RIO" c
WHERE c.Etiqueta='Arroyo'
Generation of the RDF Data: geometry2rdf, Geometry Model
geoes: http://geo.linkeddata.es/
geo: http://www.w3.org/2003/01/geo/wgs84_pos#
• geo:Point rdfs:subClassOf geoes:ontology/Geometría
• geoes:ontology/Curva rdfs:subClassOf geoes:ontology/Geometría
• geoes:ontology/Polígono rdfs:subClassOf geoes:ontology/Geometría
• geo:Point: geo:lat, geo:long
• geoes:ontology/Curva formadoPor: Collection of 2 or more geo:Points
• geoes:ontology/Polígono formadoPor: Collection of 3 or more geo:Points
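A rough sketch of how a WKT curve could be turned into statements that follow this Geometry Model. This is not the geometry2rdf implementation. Assumptions: the input is a simple two-dimensional `LINESTRING`, WKT coordinates are ordered longitude then latitude, the point URI pattern is invented for the example, the collection is flattened into repeated `formadoPor` statements, and `formadoPor` is left as a bare name because its full URI is not given in the talk.

```python
import re

GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"


def parse_wkt_linestring(wkt):
    """Parse 'LINESTRING (x y, x y, ...)' into a list of (x, y) float pairs."""
    match = re.fullmatch(r"\s*LINESTRING\s*\((.*)\)\s*", wkt, flags=re.IGNORECASE)
    if not match:
        raise ValueError("not a WKT LINESTRING")
    points = []
    for pair in match.group(1).split(","):
        x, y = pair.split()
        points.append((float(x), float(y)))
    if len(points) < 2:
        raise ValueError("a curve needs 2 or more points")
    return points


def curve_to_triples(curve_uri, wkt):
    """Emit formadoPor, geo:lat and geo:long statements for each point of the curve."""
    triples = []
    # Assumption: x is longitude and y is latitude.
    for i, (lon, lat) in enumerate(parse_wkt_linestring(wkt), start=1):
        point_uri = f"{curve_uri}/point/{i}"  # hypothetical URI pattern
        triples.append((curve_uri, "formadoPor", point_uri))
        triples.append((point_uri, GEO + "lat", lat))
        triples.append((point_uri, GEO + "long", lon))
    return triples


triples = curve_to_triples("http://example.org/curva/1", "LINESTRING (-3.7 40.4, -3.6 40.5)")
for t in triples:
    print(t)
```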
Generation of the RDF Data: RDF generated according to our Geometry Model
[Figure: example of the generated RDF]
Generation of the RDF Data: URI Generation
• URIs are extremely relevant in this process since they are the key for the alignment of heterogeneous resources that come from different data sources.
  • Cool URIs 1
  • UK Cabinet Office 2
• Examples:
  http://geo.linkeddata.es/ontology/{class/property}
  http://geo.linkeddata.es/ontology/Lago
  http://geo.linkeddata.es/resource/dataset/type/{resourcename}
  http://geo.linkeddata.es/resource/Provincia/Madrid
1 http://www.w3.org/TR/cooluris/
2 http://www.cabinetoffice.gov.uk/media/301253/puiblic sector uri.pdf
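The URI patterns above can be applied mechanically once the base and the two path prefixes are fixed. A minimal sketch, assuming the base `http://geo.linkeddata.es/` from the examples; the function names are hypothetical, and percent-encoding the final path segment is a choice made here so that names with accents still yield valid URIs.

```python
from urllib.parse import quote

BASE = "http://geo.linkeddata.es/"


def ontology_uri(term):
    """URI for a class or property of the vocabulary."""
    return f"{BASE}ontology/{quote(term, safe='')}"


def resource_uri(resource_type, name):
    """URI for an instance, grouped by its type."""
    return f"{BASE}resource/{quote(resource_type, safe='')}/{quote(name, safe='')}"


print(ontology_uri("Lago"))
print(resource_uri("Provincia", "Madrid"))
```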
Generation of the RDF Data: Provenance Information
• It is relevant
  • to manage the provenance information of the resources
  • to establish the license of the information
• Example
Pubby: http://www4.wiwiss.fu-berlin.de/pubby/
Publication of the RDF data
• SPARQL, Linked Data, HTML
• map4rdf http://oegdev.dia.fi.upm.es/projects/map4rdf/
• Pubby 0.3, including provenance support http://www4.wiwiss.fu-berlin.de/pubby/
• Virtuoso 6.1.0
Data Cleansing
• To find possible errors, identified by Hogan et al.
  • http-level issues, such as accessibility and dereferenceability, e.g., HTTP URIs return 40x/50x errors
  • reasoning issues, such as namespace without vocabulary, e.g., rss:item term invented
  • malformed/incompatible datatypes, e.g., “true” as xsd:int
• To fix the identified errors
• Example, encoding URIs
  • Special characters á, é, ñ
  • http://geo.linkeddata.es/resource/Provincia/M%C3%A1laga
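Two of the checks above are easy to automate. The sketch below, using only the Python standard library, tests whether a literal is a valid xsd:int lexical form and repairs a URI whose path contains unencoded special characters. The function names are hypothetical, and this is only an illustration of the two checks, not a full validator in the sense of Hogan et al.

```python
import re
from urllib.parse import quote, urlsplit, urlunsplit


def is_valid_xsd_int(lexical):
    """True if the string is an optionally signed integer within the 32-bit range."""
    if not re.fullmatch(r"[+-]?[0-9]+", lexical):
        return False
    return -2**31 <= int(lexical) <= 2**31 - 1


def encode_uri_path(uri):
    """Percent-encode special characters in the path; existing escapes are left alone."""
    parts = urlsplit(uri)
    return urlunsplit(parts._replace(path=quote(parts.path, safe="/%")))


print(is_valid_xsd_int("true"))
print(encode_uri_path("http://geo.linkeddata.es/resource/Provincia/Málaga"))
```

Treating `%` as safe keeps the repair idempotent, so running it over data that is already clean does not double-encode anything.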
Linking the RDF Data
• Identify suitable data sets as linking targets
  • http://ckan.net
• Discover relationships between data items
  • Silk Framework http://www4.wiwiss.fu-berlin.de/bizer/silk/
  • LIMES http://aksw.org/Projects/limes
• Validate the relationships discovered
  • sameAs Validator http://oegdev.dia.fi.upm.es:8080/sameAs/
Linking the RDF Data: GeoLinkedData
[Figure: links between GeoLinkedData, GeoNames, and DBpedia]
• http://geo.linkeddata.es/.../Madrid
• http://sws.geonames.org/6355233/
• http://dbpedia.org/resource/Madrid
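The discovery step can be pictured as comparing labels across two data sets and proposing an owl:sameAs link when they are close enough. The sketch below uses a plain string similarity from the Python standard library; it is not how Silk or LIMES work internally, and the labels, the threshold, and the non-matching `example.org` entry are assumptions added for illustration. The elided GeoLinkedData URI is kept exactly as it appears on the slide. Links found this way are candidates that still need the validation step.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def similarity(a, b):
    """Case-insensitive similarity between two labels, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def discover_links(source, target, threshold=0.9):
    """Propose sameAs links between items whose labels are similar enough."""
    links = []
    for s_uri, s_label in source.items():
        for t_uri, t_label in target.items():
            if similarity(s_label, t_label) >= threshold:
                links.append((s_uri, OWL_SAME_AS, t_uri))
    return links


# Labels are assumed for the example; the first URI is abbreviated as on the slide.
source = {"http://geo.linkeddata.es/.../Madrid": "Madrid"}
target = {
    "http://dbpedia.org/resource/Madrid": "Madrid",
    "http://sws.geonames.org/6355233/": "Madrid",
    "http://example.org/resource/other": "Málaga",  # hypothetical non-match
}

links = discover_links(source, target)
for link in links:
    print(link)
```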
Linking the RDF Data: sameAs Validator
http://oegdev.dia.fi.upm.es:8080/sameAs/
Enable Effective Discovery: Register the dataset into CKAN Registry
• Add the dataset to CKAN, the open registry of data and content packages
• Minimum information
  • Name, unique ID for your data set on CKAN
  • Title, full name of your data set
  • URL, link to the data set home page
http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation
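Before registering, it can help to check that the minimum information is complete. The sketch below is a local check in plain Python and does not call any CKAN API; the dictionary keys mirror the three items listed above, and the `name` value is a hypothetical ID.

```python
REQUIRED_FIELDS = ("name", "title", "url")


def missing_fields(record):
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


record = {
    "name": "geolinkeddata",  # hypothetical unique ID
    "title": "GeoLinkedData",
    "url": "http://geo.linkeddata.es",
}

print(missing_fields(record))
print(missing_fields({"title": "GeoLinkedData"}))
```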
Enable Effective Discovery: Sitemap protocol
• Used by web crawlers
• Efficiently find all your content & discover what has been updated
http://sitemaps.org/
A sitemap file contains information regarding one or more URLs on your Web site. The information that is stored there helps search engines better spider your website.
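A sitemap file is a small XML document with one `url` entry per address. The following sketch builds one with the Python standard library, using the namespace defined by the Sitemap protocol; the function name is hypothetical and only the mandatory `loc` element is written.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap document (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


sitemap = build_sitemap(["http://geo.linkeddata.es/resource/Provincia/Madrid"])
print(sitemap)
```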
Enable Effective Discovery: Sindice, the best RDF search engine
Enable Effective Discovery: sitemap4rdf
• Simple command line tool
• Sends a SPARQL query to list all URIs
• Generates sitemap
• Run sitemap4rdf specifying the SPARQL endpoint and the prefix of the URLs to include in the Sitemap
sitemap4rdf http://yoursite/sparql http://yoursite/resource/
Example:
sitemap4rdf http://geo.linkeddata.es/sparql http://geo.linkeddata.es/
http://lab.linkeddata.deri.ie/2010/sitemap4rdf/
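The "list all URIs" step can be imagined as a query of the following shape, sent to the endpoint as an HTTP GET with a `query` parameter. This is a guess at the general idea and not the query that sitemap4rdf actually sends. It assumes an endpoint that supports the SPARQL 1.1 `STRSTARTS` function, and the `limit` value and function names are arbitrary choices for the sketch. Nothing is sent over the network here.

```python
from urllib.parse import urlencode


def list_uris_query(prefix, limit=1000):
    """Build a SPARQL query selecting subject URIs that start with the given prefix."""
    return (
        "SELECT DISTINCT ?s WHERE { ?s ?p ?o . "
        f'FILTER(isIRI(?s) && STRSTARTS(STR(?s), "{prefix}")) }} '
        f"LIMIT {limit}"
    )


def query_url(endpoint, query):
    """Build the GET URL for a SPARQL endpoint using the standard 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


query = list_uris_query("http://geo.linkeddata.es/")
url = query_url("http://geo.linkeddata.es/sparql", query)
print(query)
print(url)
```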
Enable Effective Discovery: Submit the sitemap location - Sindice
• http://sindice.com/main/submit
Enable Effective Discovery: Submit the sitemap location - Google
• https://www.google.com/webmasters/tools/
ToC
• Introduction to Linked Data
• Guidelines for Publishing Linked Data
• Demo
DEMO
http://geo.linkeddata.es/browser
Provinces
Capital of Province
Provinces – Industry Production Index
Beaches
DEMO
http://webenemasuno.linkeddata.es/
Trips
Guide Locations
Guide
Future Work