Introduction to Linked Data
Laura Po - Exploration, Visualization and Querying of Linked Open Data sources 2nd Keystone Training School - Keyword Search in Big Linked Data, University of Santiago de Compostela (USC), Spain.
Laura Po
Objectives
By the end of this module you should have an understanding of
• What is linked data
• What is open data
• What is the difference between linked and open data
• How to publish linked data (5-star schema)
• What are the linked data principles and the linked data technologies (the semantic web stack)
• The economic and social impact of linked data
The Web of Data
The evolution from a Web of linked documents to a Web of linked data
The Web as a huge decentralized database (knowledge base) of machine-accessible data
Web of documents... Web of linked data...
The evolution of the web
• The Web started as a collection of documents published online, each accessible at a Web location identified by a URL.
• These documents often contain data about real-world resources which is mainly human-readable and cannot be understood by machines.
• The Web of Data is about enabling the access to this data, by making it available in machine-readable formats and connecting it using Uniform Resource Identifiers (URIs), thus enabling people and machines to collect the data, and put it together to do all kinds of things with it (permitted by the licence).
Machine-readable data (or metadata) is data in a format that can be interpreted by a computer.
2 types of machine-readable data:
• human-readable data that is marked up so that it can also be understood by computers, e.g. microformats, RDFa;
• data formats intended principally for computers, e.g. RDF, XML and JSON.
Linked Data and the ‘Web of Data’
● Term refers to an idea originally from Tim Berners-Lee (Tim Berners-Lee, Linked Data, 2006, http://www.w3.org/DesignIssues/LinkedData.html)
● Set of best practices for publication and linking of structured data on the web
● Basic assumption: The value of data on the web increases when they are connected to other data sources
M.Hausenblas, Quick Linked Data Introduction, http://www.slideshare.net/mediasemanticweb/quick-linked-data-introduction
The Semantic Web isn't just about putting data on the web. It is about making links, so that a person or machine can explore the web of data. With linked data, when you have some of it, you can find other, related, data.
Defining linked data
“Linked data is a set of design principles for sharing machine-readable data on the Web for use by public administrations, business and citizens.”
EC ISA Case Study: How Linked Data is transforming eGovernment
Linked Data Principles
1. Use URIs as names for things.
2. Use HTTP URIs, so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
4. Include links to other URIs, so that they can discover more things.
How to get Data from the Web?
● Data can only be found on the Web if it is available at some website
[Figure: a Browser talks to a Web Server over HTTP; the Web Server talks to a Database over JDBC]
How to get Data from the Web?
● There are a number of different (proprietary) Web APIs and data exchange formats, with Mashups on top of them
[Figure: Database 1 to Database 4 are each exposed through their own Web API (Web API 1 to Web API 4); a Mashup sits on top of the Web APIs]
In the Web today...
● Data is locked up in small data islands
● Other applications usually cannot access this data...
[Figure: many isolated Databases]
Semantic Web Technologies, Dr. Harald Sack, Hass
http://www.w3.org/2009/Talks/0204-ted-tbl/#(22)
How to get rid of Closed Data Islands?
● Apply Semantic Web technologies
○ to publish (structured) data on the web
○ to draw connections from one data source to data from other data sources
[Figure: Database 1 to Database 4, each published as RDF data]
Linked Data Principles (1/4)
1. Use URIs as names for things.
○ URIs do not only identify documents but also arbitrary objects of the real world as well as abstract concepts
https://viaf.org/viaf/32197206/
http://dbpedia.org/resource/Wolfgang_Amadeus_Mozart
http://musicbrainz.org/artist/20244d07-534f-4eff-b4d4-930878889970
http://www.imdb.com/title/tt3659388
Linked Data Principles (2/4)
2. Use HTTP URIs, so that people can look up those names.
○ HTTP URIs (URLs) as globally unique names enable dereferencing of associated information in the Web
○ via HTTP Content Negotiation, machines and humans can access the resource identified by the URI
○ http://dbpedia.org/resource/Wolfgang_Amadeus_Mozart: URI represents the Designatum
○ http://dbpedia.org/page/Wolfgang_Amadeus_Mozart: HTML Document FOR HUMANS (URI represents a Designator)
○ http://dbpedia.org/data/Wolfgang_Amadeus_Mozart: RDF Document FOR MACHINES (URI represents a Designator)
Dereferenceable: every term in a LOD source must be accessible via its URI through an HTTP GET. Once we access the URI, we find the definition of the term.
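The lookup described above can be sketched in Python as a request that names the wanted representation in its Accept header. This is only an illustration: the helper name build_lookup and the choice of text/turtle as the machine-readable media type are assumptions, and the request is built but not sent, so no claim is made about what a particular server returns.

```python
from urllib.request import Request

# Media types chosen for illustration; a server may offer others.
FOR_HUMANS = "text/html"
FOR_MACHINES = "text/turtle"


def build_lookup(uri: str, accept: str) -> Request:
    """Build (but do not send) an HTTP GET that asks for one representation.

    build_lookup is a hypothetical helper, not part of any Linked Data library.
    """
    return Request(uri, headers={"Accept": accept}, method="GET")


uri = "http://dbpedia.org/resource/Wolfgang_Amadeus_Mozart"
for_humans = build_lookup(uri, FOR_HUMANS)
for_machines = build_lookup(uri, FOR_MACHINES)

print(for_machines.get_method(), for_machines.full_url)
print("Accept:", for_machines.get_header("Accept"))

# Sending the request needs network access, so it is left commented out:
# from urllib.request import urlopen
# with urlopen(for_machines) as response:
#     print(response.geturl())  # the document URI reached after any redirect
```

The same URI is used in both requests; only the Accept header differs, which is the point of content negotiation.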
Linked Data Principles (3/4)
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
○ RDF as universal data model for publishing structured data on the Web
○ Make all URIs in the RDF graph dereferenceable
○ Avoid RDF constructs that cause problems in a Linked Data context
■ RDF Reification
■ RDF Collections and Containers
■ unnamed Blank Nodes
Linked Data Principles (4/4)
4. Include links to other URIs, so that they can discover more things.
○ Set RDF links between data in different data sources:
○ owl:sameAs – creates a link between individuals that refer to the same thing
○ rdfs:seeAlso – states that a resource may provide additional information
○ Relationship Links: links to external LOD entities related to the original entity
○ Identity Links: links to external LOD entities referring to the same object or concept
○ Vocabulary Links: links to definitions of the original entity
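As a rough sketch of the three link kinds listed above, the following Python function sorts an outgoing link by its predicate. The rule used here (owl:sameAs means identity, rdf:type means vocabulary, anything else means relationship) is a simplifying assumption made for illustration, classify_link is a hypothetical name, and the example.org targets are placeholders rather than real datasets.

```python
from urllib.parse import urlparse

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def classify_link(subject: str, predicate: str, obj: str):
    """Return the kind of external link, or None for a link inside one source.

    Hypothetical helper; the predicate-based rule is an assumption.
    """
    if urlparse(subject).netloc == urlparse(obj).netloc:
        return None  # same host: not a link to another data source
    if predicate == OWL_SAME_AS:
        return "identity"
    if predicate == RDF_TYPE:
        return "vocabulary"
    return "relationship"


santiago = "http://dbpedia.org/resource/Santiago_de_Compostela"
links = [
    (santiago, OWL_SAME_AS, "http://example.org/place/santiago"),    # placeholder target
    (santiago, RDF_TYPE, "http://schema.org/City"),
    (santiago, RDFS_SEE_ALSO, "http://example.org/guide/santiago"),  # placeholder target
    (santiago, "http://dbpedia.org/ontology/country", "http://dbpedia.org/resource/Spain"),
]
kinds = [classify_link(*link) for link in links]
print(kinds)
```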
Advantages of Linked Open Data vs. APIs
○ Simple and generic API for various heterogeneous data sources enables simple reuse and data sharing among applications
○ RDF data model guarantees (simple) extensibility
○ Transport via HTTP on standard port 80 avoids the need to adapt firewalls
○ Ontologies enable meaningful connections between data sources
○ Reasoning over Linked Data makes it possible to generate new knowledge, i.e. inference from implicit to explicit knowledge
The Semantic Web Technology Stack
URI - Uniform Resource Identifier
Santiago de Compostela: http://dbpedia.org/resource/Santiago_de_Compostela
From Wikipedia to DBpedia
https://en.wikipedia.org/wiki/Santiago_de_Compostela
http://dbpedia.org/resource/Santiago_de_Compostela
RDF - Resource Description Framework
:Santiago_de_Compostela rdf:type dbo:City .
:Santiago_de_Compostela dbo:country dbr:Spain .
:Santiago_de_Compostela owl:sameAs geodata:Santiago di Compostela .
dbr:University_of_Santiago_de_Compostela dbp:city dbr:Santiago_de_Compostela .
:Santiago_de_Compostela dbp:populationTotal 95671 (xsd:integer) .
...
RDF Triple = RDF Subject + RDF Property + RDF Object
:Santiago rdf:type dbo:City .
● Resource
○ can be anything
○ must be uniquely identified and referenceable via URI
● Description
○ = description of resources
○ via representing properties and relationships among resources as graphs
● Framework
○ = combination of web-based protocols and formats (URI, HTTP, XML, Turtle, JSON, …)
○ based on a formal model (semantics)
● Knowledge in RDF is expressed as a list of statements
● All RDF statements follow the same simple schema (= RDF Triple)
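A minimal sketch of the "list of statements" idea: each statement is a (subject, property, object) tuple, and grouping the tuples by subject yields the graph view. The prefixed names are kept as plain strings taken from the DBpedia example in these slides; as_graph is a hypothetical helper and no RDF library is assumed.

```python
from collections import defaultdict

# Each statement follows the same schema: (subject, property, object).
statements = [
    ("dbr:Santiago_de_Compostela", "rdf:type", "dbo:City"),
    ("dbr:Santiago_de_Compostela", "dbo:country", "dbr:Spain"),
    ("dbr:Santiago_de_Compostela", "dbp:populationTotal", 95671),
    ("dbr:University_of_Santiago_de_Compostela", "dbp:city", "dbr:Santiago_de_Compostela"),
]


def as_graph(triples):
    """Group triples by subject: node -> list of (property, object) edges."""
    graph = defaultdict(list)
    for subject, prop, obj in triples:
        graph[subject].append((prop, obj))
    return dict(graph)


graph = as_graph(statements)
for node, edges in graph.items():
    for prop, obj in edges:
        print(node, "--", prop, "->", obj)
```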
Resource Description Framework
● RDF Statements (RDF Triple): Subject + Property + Object / Value
● RDF Building Blocks: URI + URI + URI / Literal
N-Triples Serialization
<http://dbpedia.org/resource/Santiago_de_Compostela> <http://dbpedia.org/ontology/populationTotal> "95671" .
[Figure: graph representation of the same triple]
Resource Description Framework
● URIs and Literals
○ URIs reference resources uniquely
○ Literals describe data values that don’t have a separate existence
<http://dbpedia.org/resource/Santiago_de_Compostela> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/Spain> .
<http://dbpedia.org/resource/Santiago_de_Compostela> <http://dbpedia.org/ontology/populationTotal> "95671" .
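The difference between URIs and literals shows up directly when a triple is written as N-Triples: URIs go in angle brackets, literals in double quotes. The sketch below assumes two small illustrative classes, URI and Literal, and handles only plain string literals with basic escaping; datatypes, language tags and full escaping rules are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


def term(t) -> str:
    """Serialise one term: <...> for a URI, "..." for a plain literal."""
    if isinstance(t, URI):
        return f"<{t.value}>"
    # Simplified escaping: backslash and double quote only.
    escaped = t.value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def ntriple(subject, prop, obj) -> str:
    """Serialise one triple as a single N-Triples line."""
    return f"{term(subject)} {term(prop)} {term(obj)} ."


santiago = URI("http://dbpedia.org/resource/Santiago_de_Compostela")
lines = [
    ntriple(santiago, URI("http://dbpedia.org/ontology/country"),
            URI("http://dbpedia.org/resource/Spain")),
    ntriple(santiago, URI("http://dbpedia.org/ontology/populationTotal"),
            Literal("95671")),
]
print("\n".join(lines))
```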
RDF Schema
dbo:City rdf:type owl:Class .
dbo:City rdfs:subClassOf dbo:Settlement .
dbo:foundationPlace rdfs:range dbo:City .
...
[Figure: City is linked to Settlement by rdfs:subClassOf; foundationPlace points to City]
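What the schema statements above allow a reasoner to conclude can be sketched in a few lines of Python. The sketch covers only two RDFS patterns, rdfs:subClassOf and rdfs:range, and the triple about ex:SomeCompany is a made-up input used to trigger the range rule; the function names are hypothetical.

```python
# Schema taken from the slide, kept as plain prefixed names.
SUBCLASS_OF = {"dbo:City": {"dbo:Settlement"}}
RANGES = {"dbo:foundationPlace": "dbo:City"}


def superclasses(cls, subclass_of):
    """All classes reachable from cls via rdfs:subClassOf."""
    found, todo = set(), [cls]
    while todo:
        for parent in subclass_of.get(todo.pop(), ()):
            if parent not in found:
                found.add(parent)
                todo.append(parent)
    return found


def infer_types(triples, subclass_of, ranges):
    """Collect rdf:type facts, including those implied by range and subclass."""
    types = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
        elif p in ranges:
            # rdfs:range: the object of p is an instance of the range class.
            types.setdefault(o, set()).add(ranges[p])
    for classes in types.values():
        for cls in list(classes):
            classes |= superclasses(cls, subclass_of)
    return types


# ex:SomeCompany is a hypothetical resource used only for this example.
data = [("ex:SomeCompany", "dbo:foundationPlace", "dbr:Santiago_de_Compostela")]
types = infer_types(data, SUBCLASS_OF, RANGES)
print(types)
```

No rdf:type statement is given for Santiago de Compostela here; both of its types follow from the schema alone.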
The Semantic Web Technology Stack
http://dbpedia.org/ontology/City
[Figure: entities (Madrid, Spain, connected by dbo:country) are linked by rdf:type to classes (City); classes are related by rdfs:subClassOf, and properties point to classes by rdfs:range]
Description logics + logical rules
● logical constraint: Small_town ∩ Capital = ∅
● logical rule: ∀x. ( City(x) ∧ seatOfGovernment(x) → Capital(x) )
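The rule and the constraint shown on this slide can be applied mechanically. The following sketch represents facts as (predicate, individual) pairs, applies the rule City(x) ∧ seatOfGovernment(x) → Capital(x) once, and checks the disjointness of Small_town and Capital. The facts about Madrid are illustrative inputs, and the function names are hypothetical.

```python
def apply_capital_rule(facts):
    """Add Capital(x) for every x with City(x) and seatOfGovernment(x)."""
    inferred = set(facts)
    for predicate, x in facts:
        if predicate == "City" and ("seatOfGovernment", x) in facts:
            inferred.add(("Capital", x))
    return inferred


def disjointness_violations(facts, class_a="Small_town", class_b="Capital"):
    """Individuals that belong to both classes, which the constraint forbids."""
    members_a = {x for predicate, x in facts if predicate == class_a}
    members_b = {x for predicate, x in facts if predicate == class_b}
    return members_a & members_b


# Illustrative input facts.
facts = {("City", "Madrid"), ("seatOfGovernment", "Madrid")}
closed = apply_capital_rule(facts)
print(sorted(closed))
print(disjointness_violations(closed))

# Adding a contradictory fact makes the constraint fail for Madrid.
inconsistent = closed | {("Small_town", "Madrid")}
print(disjointness_violations(inconsistent))
```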
The Semantic Web Technology Stack
Look for all cities located in the same area as Santiago de Compostela (use the property dbp:subdivisionName)
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT DISTINCT ?area ?city
FROM <http://dbpedia.org/>
WHERE {
  ?area dbp:subdivisionName dbr:Santiago_de_Compostela .
  ?area dbp:subdivisionName ?city .
}
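To run such a query from a program, the SPARQL protocol allows the query text to be sent as the query parameter of an HTTP GET to the endpoint. The sketch below only builds that URL for the DBpedia endpoint named in these slides and does not send it. It keeps the two triple patterns from the query above but, as a simplification, drops the unused prefixes and the FROM clause; query_url is a hypothetical helper.

```python
from urllib.parse import urlencode, urlparse, parse_qs

ENDPOINT = "http://dbpedia.org/sparql"

QUERY = """\
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT DISTINCT ?area ?city
WHERE {
  ?area dbp:subdivisionName dbr:Santiago_de_Compostela .
  ?area dbp:subdivisionName ?city .
}
"""


def query_url(endpoint: str, query: str) -> str:
    """Build the GET URL that carries the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = query_url(ENDPOINT, QUERY)
print(url)

# Sending it needs network access, so it is left commented out. The Accept
# value is the standard media type for JSON-encoded SPARQL results.
# from urllib.request import Request, urlopen
# request = Request(url, headers={"Accept": "application/sparql-results+json"})
# with urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```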
The Semantic Web Technology Stack
http://dbpedia.org/sparql
SPARQL – SPARQL Protocol and RDF Query Language
Query language for RDF data, designed with a syntax similar to SQL, the language used for retrieving data from relational databases.
Different query forms:
• SELECT returns variables and their bindings directly.
• CONSTRUCT returns a single RDF graph specified by a graph template.
• ASK tests whether or not a query pattern has a solution. Returns yes/no.
• DESCRIBE returns a single RDF graph containing RDF data about resources.
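The difference between the query forms is mainly in what they return. The toy Python functions below mimic SELECT, ASK and CONSTRUCT over an in-memory list of triples with a single triple pattern; this is a sketch of the return shapes only, not a SPARQL engine, and all function names are hypothetical. DESCRIBE is omitted because what it returns is left to the query service.

```python
def solutions(triples, pattern):
    """Yield one binding dict per triple matching the pattern.

    Pattern terms starting with '?' are variables; others must match exactly.
    """
    for triple in triples:
        binding = {}
        for want, got in zip(pattern, triple):
            if want.startswith("?"):
                if binding.setdefault(want, got) != got:
                    break  # same variable bound to two different values
            elif want != got:
                break
        else:
            yield binding


def select(triples, pattern, variables):
    """SELECT: a table of variable bindings."""
    return [tuple(b[v] for v in variables) for b in solutions(triples, pattern)]


def ask(triples, pattern):
    """ASK: True if the pattern has at least one solution."""
    return any(True for _ in solutions(triples, pattern))


def construct(triples, pattern, template):
    """CONSTRUCT: a new set of triples built from a template."""
    return {tuple(b.get(t, t) for t in template) for b in solutions(triples, pattern)}


data = [
    ("dbr:Santiago_de_Compostela", "rdf:type", "dbo:City"),
    ("dbr:Santiago_de_Compostela", "dbo:country", "dbr:Spain"),
]
rows = select(data, ("?s", "rdf:type", "?c"), ["?s", "?c"])
in_spain = ask(data, ("?s", "dbo:country", "dbr:Spain"))
# ex:hasCity is a made-up property used only in the template.
new_graph = construct(data, ("?s", "dbo:country", "?c"), ("?c", "ex:hasCity", "?s"))
print(rows, in_spain, new_graph)
```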
SQL versus SPARQL
SQL | SPARQL
Based on relations (tables). | Based on labelled directed graphs.
The relations (tables) to be matched over should be indicated. | Assumes a default graph. (The FROM clause populates this with specific identified subgraphs).
(Retrieval) queries produce a relation from a relation. | SPARQL SELECT queries produce a relation from a graph. CONSTRUCT queries (considered later) produce a graph from a graph.
The application of the Linked Data Principles leads to a ‘Web of Data’
>1014 Datasets, >74B RDF Triples, 808M Links (as of August 2014)
The Development of the Web of Data
[Figures: snapshots of the Web of Data from May 2007, Nov 2007, July 2009 and Aug 2014]
Linked Open Data
○ Public Linked Data resources in the Web, licensed as Creative Commons CC-BY
○ Tim Berners-Lee’s 5-Star Criteria for Linked Open Data
★ Available on the web (whatever format) but with an open licence, to be Open Data
★★ Available as machine-readable structured data (e.g. Excel instead of image scan of a table)
★★★ as (2) plus non-proprietary format (e.g. CSV instead of Excel)
★★★★ All the above plus: use open standards from W3C (URI, RDF and SPARQL) to identify things, so that people can point at your stuff
★★★★★ All the above, plus: link your data to other people’s data to provide context
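Because each star builds on the previous ones, a rating can be computed by counting criteria in order and stopping at the first one that is not met. The sketch below assumes a small Dataset record with one yes/no field per criterion; the field names and the stars function are invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    # One flag per criterion, in star order. Field names are illustrative.
    on_web_with_open_licence: bool
    machine_readable: bool
    non_proprietary_format: bool
    uses_w3c_standards: bool
    links_to_other_data: bool


def stars(dataset: Dataset) -> int:
    """Count criteria met in order, stopping at the first unmet one."""
    criteria = [
        dataset.on_web_with_open_licence,
        dataset.machine_readable,
        dataset.non_proprietary_format,
        dataset.uses_w3c_standards,
        dataset.links_to_other_data,
    ]
    count = 0
    for met in criteria:
        if not met:
            break
        count += 1
    return count


open_csv = Dataset(True, True, True, False, False)
linked_rdf = Dataset(True, True, True, True, True)
closed_rdf = Dataset(False, True, True, True, True)
print(stars(open_csv), stars(linked_rdf), stars(closed_rdf))
```

Under this reading, data that is linked but not openly licensed gets no stars, which matches the first criterion being the entry condition for Open Data.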
December 2007: 8 principles for Open Government Data:
● Complete
● Primary (not aggregate)
● Up to date
● Accessible
● Machine processable
● Non-discriminatory
● Non-proprietary
● No license fees
https://opengovdata.org/
Linked Data vs Open Data
Open data
Data can be published and be publicly available under an open licence without linking to other data sources.
“Open data is data that can be freely used, reused and redistributed by anyone – subject only, at most, to the requirement to attribute and sharealike.” - OpenDefinition.org
Linked data
Data can be linked to URIs from other data sources, using open standards such as RDF, without being publicly available under an open licence.
See also: Cobden et al., A research agenda for Linked Closed Data, http://ceur-ws.org/Vol-782/CobdenEtAl_COLD2011.pdf
Linked (open) government data
• Flexible data integration: Linked Open Government Data (LOGD) facilitates data integration and enables the interconnection of previously disparate government datasets.
• Increase in data quality: The increased (re)use of LOGD triggers a growing demand to improve data quality. Through crowd-sourcing and self-service mechanisms, errors are progressively corrected.
• New services: The availability of LOGD gives rise to new services offered by the public and/or private sector.
• Cost reduction: The reuse of LOGD in e-Government applications leads to considerable cost reductions.
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
Key milestones for linked government data
Linked Data - A Guided Tour
● Datasets ordered by category
http://linkeddatacatalog.dws.informatik.uni-mannheim.de/state/
Government
● 183 datasets
● top 10 highest indegree: reference.data.gov.uk
● 48 proprietary vocabularies used
● c. 21% fully dereferenceable
Dereferenceable
Every term in a LOD source must be accessible via its URI through an HTTP GET. Once we access the URI, we find the definition of the term.
The dereferenceability quota of a LOD source is defined as the number of dereferenceable terms divided by the number of all terms collected in the source.
● fully dereferenceable LOD source: a definition exists for all URIs
● partially dereferenceable LOD source: a definition could be retrieved for some terms, but not for all
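The quota defined above is a simple ratio, and the full/partial distinction follows from it. In this sketch the outcome of each HTTP GET is assumed to be already known and stored as True or False per term; the example.org term URIs are placeholders and the function names are hypothetical.

```python
def dereferenceability_quota(lookups: dict) -> float:
    """Number of dereferenceable terms divided by the number of all terms."""
    if not lookups:
        raise ValueError("the source contains no terms")
    return sum(lookups.values()) / len(lookups)


def classify_source(lookups: dict) -> str:
    """Label a source as fully, partially or not dereferenceable."""
    quota = dereferenceability_quota(lookups)
    if quota == 1.0:
        return "fully dereferenceable"
    if quota > 0.0:
        return "partially dereferenceable"
    return "not dereferenceable"


# Placeholder terms; True means an HTTP GET on the URI returned a definition.
lookups = {
    "http://example.org/vocab#City": True,
    "http://example.org/vocab#Country": True,
    "http://example.org/vocab#population": False,
    "http://example.org/vocab#mayor": False,
}
quota = dereferenceability_quota(lookups)
label = classify_source(lookups)
print(quota, label)
```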
Media
● 22 datasets
● 22 proprietary vocabularies used
● 0% fully dereferenceable
● 9% partially dereferenceable
User Generated Content
● 48 datasets
● top 10 highest outdegree: semanticweb.org
● 30 proprietary vocabularies used
● 13% fully dereferenceable
● 10% partially dereferenceable
Linguistics
● no statistics available so far
Bibliographic Data
● 96 datasets
● top 10 highest indegree: data.semanticweb.org
● top 10 highest outdegree: bibsonomy.org
● 58 proprietary vocabularies used
● 21% fully dereferenceable
● 7% partially dereferenceable
Life Sciences
● 83 datasets
● 35 proprietary vocabularies used
● 28% fully dereferenceable
● 6% partially dereferenceable
Cross Domain
● 41 datasets
● top 10 highest indegree: dbpedia.org, w3.org, lexvo.org
● 55 proprietary vocabularies used
● 27% fully dereferenceable
● 11% partially dereferenceable
Social Networking
● 520 datasets
● top 10 highest indegree: quitter.se, status.net, …
● top 10 highest outdegree: deri.org, harth.org, ...
● 128 proprietary vocabularies used
● 16% fully dereferenceable
● 6% partially dereferenceable
Semantic Web Technologies , Dr. Harald Sack, Hasso Plattner Insti
Geographic
● 21 datasets
● top 10 highest indegree: geonames.org
● 24 proprietary vocabularies used
● 21% fully dereferenceable
● 4% partially dereferenceable
Linked Data Ontologies
● Ontologies hold the Linked Data Cloud together
● OWL
○ owl:sameAs connects identical individuals
○ owl:equivalentClass connects equivalent classes
● SKOS
○ “Simple Knowledge Organization System”
○ based on RDF and RDFS
○ applied for definitions and mappings of vocabularies and ontologies
■ skos:Concept (classes)
■ skos:narrower
■ skos:broader
■ skos:related
■ skos:exactMatch (vocabulary)
■ skos:narrowMatch
■ skos:broadMatch
■ skos:relatedMatch
● UMBEL
○ “Upper Mapping and Binding Exchange Layer”
○ Subset of OpenCyc as RDF Triples based on SKOS and OWL2
○ Upper Ontology with 28,000 concepts (skos:Concept)
○ 46,000 Mappings into DBpedia, geonames and others (owl:equivalentClass, rdfs:subClassOf)
○ Links to more than 2 million Wikipedia pages
Member State initiatives – some examples
Some examples of supra-national, national, regional and private initiatives in the area of linked (open) data across Europe.
DE – Bibliotheksverbund Bayern
Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg.
IT – Agenzia per l’Italia Digitale
Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems and the Classifications for the data in Public Administration.
NL – Building and address register
The Dutch Address and Buildings base register published as linked data.
UK – Ordnance Survey
Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open and the administrative geography taken from Boundary Line.
UK – Companies House
Publishing basic company details as linked data using a simple URI for each company in their database.
See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
Linked Government Data & Metadata initiatives funded by the European Commission
ADMS.SW
Core Public Service Vocabulary
Linked Government Data Pilots
http://health.testproject.eu/PPP/
http://maritime.testproject.eu/CISE/
http://cpsv.testproject.eu/CPSV/
Non-governmental applications
Conclusion
• Linked data is a set of design principles for sharing machine-readable data on the Web.
• Linked data and open data are not the same.
• URIs, RDF and SPARQL form the foundational layer for Linked data.
• Linked data offers a number of advantages:
  • Data integration with small impact on legacy systems;
  • Semantic interoperability;
  • Creativity and innovation through context and knowledge creation.
Group questions
Is there supply and demand for (Linked) Open Government Data in your country?
What are, in your opinion, the expected benefits and pitfalls of Linked Data?
Do you know if there are any Linked (Open) Data initiatives in your country? If so, how many stars would you give them?
Download the slides from
My research group website: www.dbgroup.unimore.it
On SlideShare: http://www.slideshare.net/polaura
References
Some of the materials used in these slides have been rearranged from
- Slides of the “Knowledge Engineering with Semantic Web Technologies 2015” course held by Dr. Harald Sack, https://open.hpi.de/courses/semanticweb2015
- Slides of the "Introduction to linked data" of Open Data Support, http://www.slideshare.net/OpenDataSupport/introduction-to-linked-data-23402165
- Slides of "Usage of Linked Data Introduction and Application Scenarios" and "Querying Linked Data" by Barry Norton, EUCLID project
Further readings
Linked Open Government Data. Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454
EUCLID - Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1
Linked Open Data: The Essentials. Florian Bauer, Martin Kaltenböck. http://www.semantic-web.at/LOD-TheEssentials.pdf
Linked Data: Evolving the Web into a Global Data Space. Tom Heath and Christian Bizer. http://linkeddatabook.com/editions/1.0/
Related projects and initiatives
LOD2 FP7 project, http://lod2.eu/
The Open Knowledge Foundation, http://okfn.org/
W3C Semantic Web, http://www.w3.org/standards/semanticweb/
EUCLID, http://projecteuclid.org/
ISA Programme, http://ec.europa.eu/isa/
W3C LOGD WG, http://www.w3.org/2011/gld/wiki/Main_Page
LOD Around The Clock FP7 project, http://latc-project.eu/
Data.gov.uk, http://data.gov.uk/linked-data