Introduction to Linked Data

Laura Po

Exploration, Visualization and Querying of Linked Open Data sources. 2nd Keystone Training School - Keyword Search in Big Linked Data, University of Santiago de Compostela (USC), Spain.

Objectives

By the end of this module you should have an understanding of:

• What linked data is
• What open data is
• What the difference between linked and open data is
• How to publish linked data (5-star schema)
• What the linked data principles and the linked data technologies (the semantic web stack) are
• The economic and social impact of linked data

The Web of Data

The evolution from a Web of linked documents to a Web of linked data.

The Web as a huge decentralized database (knowledge base) of machine-accessible data.

Figure: Web of documents... Web of linked data...

The evolution of the web

• The Web started as a collection of documents published online, each accessible at a Web location identified by a URL.

• These documents often contain data about real-world resources which is mainly human-readable and cannot be understood by machines.

• The Web of Data is about enabling access to this data by making it available in machine-readable formats and connecting it using Uniform Resource Identifiers (URIs). People and machines can then collect the data and combine it for all kinds of purposes, as far as the licence permits.

Machine-readable data (or metadata) is data in a format that can be interpreted by a computer.

Two types of machine-readable data:

• human-readable data that is marked up so that it can also be understood by computers, e.g. microformats, RDFa;

• data formats intended principally for computers, e.g. RDF, XML and JSON.

Linked Data and the ‘Web of Data’

● The term refers to an idea originally from Tim Berners-Lee (Tim Berners-Lee, Linked Data, 2006, http://www.w3.org/DesignIssues/LinkedData.html)

● A set of best practices for the publication and linking of structured data on the web

● Basic assumption: the value of data on the web increases when it is connected to other data sources

M.Hausenblas, Quick Linked Data Introduction, http://www.slideshare.net/mediasemanticweb/quick-linked-data-introduction

The Semantic Web isn't just about putting data on the web. It is about making links, so that a person or machine can explore the web of data. With linked data, when you have some of it, you can find other, related, data.


Defining linked data

“Linked data is a set of design principles for sharing machine-readable data on the Web for use by public administrations, business and citizens.” - EC ISA Case Study: How Linked Data is transforming eGovernment

Linked Data Principles

1. Use URIs as names for things.

2. Use HTTP URIs, so that people can look up those names.

3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)

4. Include links to other URIs, so that they can discover more things.


How to get Data from the Web?

● Data can only be found on the Web if it is available at some website

Diagram (labels): Browser, HTTP, Web Server, JDBC, Database

How to get Data from the Web?

● There are a number of different (proprietary) Web APIs and data exchange formats, with Mashups built on top of them

Diagram (labels): Database 1 to Database 4, Web API 1 to Web API 4, Mashup

In the Web today...

● Data is locked up in small data islands
● Other applications usually cannot access this data...

Diagram: many separate, unconnected Database islands

Semantic Web Technologies, Dr. Harald Sack, Hass

http://www.w3.org/2009/Talks/0204-ted-tbl/#(22)

How to get rid of Closed Data Islands?

● Apply Semantic Web technologies
○ to publish (structured) data on the web
○ to draw connections from one data source to data from other data sources

Diagram (labels): Database 1 to Database 4, each with its own RDF data

Linked Data Principles (1/4)

1. Use URIs as names for things.

○ URIs identify not only documents but also arbitrary real-world objects and abstract concepts

https://viaf.org/viaf/32197206/

http://dbpedia.org/resource/Wolfgang_Amadeus_Mozart

http://musicbrainz.org/artist/20244d07-534f-4eff-b4d4-930878889970

http://www.imdb.com/title/tt3659388


Linked Data Principles (2/4)

2. Use HTTP URIs, so that people can look up those names.

○ HTTP URIs (URLs) as globally unique names enable dereferencing of associated information in the Web

○ Via HTTP content negotiation, machines and humans can access the resource identified by the URI

http://dbpedia.org/resource/Wolfgang_Amadeus_Mozart (URI represents the designatum)

http://dbpedia.org/data/Wolfgang_Amadeus_Mozart (URI represents a designator: the RDF document, for machines)

http://dbpedia.org/page/Wolfgang_Amadeus_Mozart (URI represents a designator: the HTML document, for humans)

Dereferenceable: every term in a LOD source must be accessible via its URI through an HTTP GET. Once we access the URI, we find the definition of the term.
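A minimal sketch of the content negotiation step described above, using only Python's standard library. The two requests are constructed but not sent. The media types `application/rdf+xml` and `text/html` and the helper name `build_request` are assumptions chosen for illustration; the slides do not prescribe them.

```python
from urllib.request import Request

# Resource URI taken from the slide; it names the thing itself, not a document.
RESOURCE_URI = "http://dbpedia.org/resource/Wolfgang_Amadeus_Mozart"


def build_request(uri: str, for_machine: bool) -> Request:
    """Build an HTTP GET whose Accept header states the wanted representation.

    Assumption: a machine asks for RDF/XML and a human-facing client for HTML.
    """
    accept = "application/rdf+xml" if for_machine else "text/html"
    return Request(uri, headers={"Accept": accept}, method="GET")


machine_request = build_request(RESOURCE_URI, for_machine=True)
human_request = build_request(RESOURCE_URI, for_machine=False)

print(machine_request.get_method(), machine_request.full_url,
      machine_request.get_header("Accept"))
print(human_request.get_method(), human_request.full_url,
      human_request.get_header("Accept"))
```

Passing either request to `urllib.request.urlopen` would perform the actual lookup; a server that supports content negotiation is then expected to answer with, or redirect to, the matching document.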


Linked Data Principles (3/4)

3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).

○ RDF as universal data model for publishing structured data on the Web
○ Make all URIs in the RDF graph dereferenceable
○ Avoid RDF constructs that cause problems in a Linked Data context
■ RDF Reification
■ RDF Collections and Containers
■ unnamed Blank Nodes

Linked Data Principles (4/4)

4. Include links to other URIs, so that they can discover more things.

○ Set RDF links between data from different data sources:

○ owl:sameAs: creates a link between individuals

○ rdfs:seeAlso: states that a resource may provide additional information

○ Relationship Links: links to external LOD entities related to the original entity

○ Identity Links: links to external LOD entities referring to the same object or concept

○ Vocabulary Links: links to definitions of the original entity
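To illustrate how identity links can be used, here is a small Python sketch that groups URIs connected by owl:sameAs into clusters that denote the same thing. The link pairs in `SAME_AS_LINKS` are hypothetical examples built from URIs shown earlier in these slides; whether those datasets actually publish these exact links is an assumption.

```python
def same_as_clusters(links):
    """Group URIs into clusters, treating owl:sameAs as symmetric and transitive."""
    parent = {}

    def find(uri):
        # Union-find lookup with path halving.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]
            uri = parent[uri]
        return uri

    for left, right in links:
        parent[find(left)] = find(right)

    clusters = {}
    for uri in list(parent):
        clusters.setdefault(find(uri), set()).add(uri)
    return list(clusters.values())


# Hypothetical owl:sameAs statements (subject, object), for illustration only.
SAME_AS_LINKS = [
    ("http://dbpedia.org/resource/Wolfgang_Amadeus_Mozart",
     "https://viaf.org/viaf/32197206/"),
    ("https://viaf.org/viaf/32197206/",
     "http://musicbrainz.org/artist/20244d07-534f-4eff-b4d4-930878889970"),
]

for cluster in same_as_clusters(SAME_AS_LINKS):
    print(sorted(cluster))
```

An application could then merge the descriptions found under every URI of a cluster into one combined view of the entity.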


Advantages of Linked Open Data vs. APIs

○ A simple and generic API for various heterogeneous data sources enables simple reuse and data sharing among applications

○ The RDF data model guarantees (simple) extensibility

○ Transport via HTTP on standard port 80 avoids the need for firewall changes

○ Ontologies enable meaningful connections between data sources

○ Reasoning over Linked Data makes it possible to generate new knowledge, i.e. to infer explicit knowledge from implicit knowledge

The Semantic Web Technology Stack

URI - Uniform Resource Identifier

Santiago de Compostela: http://dbpedia.org/resource/Santiago_de_Compostela

From Wikipedia to DBpedia

https://en.wikipedia.org/wiki/Santiago_de_Compostela

http://dbpedia.org/resource/Santiago_de_Compostela

RDF - Resource Description Framework

:Santiago_de_Compostela rdf:type dbo:City .
:Santiago_de_Compostela dbo:country dbr:Spain .
:Santiago_de_Compostela owl:sameAs geodata:Santiago di Compostela .
dbr:University_of_Santiago_de_Compostela dbp:city dbr:Santiago_de_Compostela .
:Santiago_de_Compostela dbp:populationTotal 95671 (xsd:integer) .
...

RDF Triple = RDF Subject + RDF Property + RDF Object, for example:

:Santiago rdf:type dbo:City .

Resource Description Framework

● Resource
○ can be everything
○ must be uniquely identified and referenceable via a URI

● Description
○ = description of resources
○ by representing properties and relationships among resources as graphs

● Framework
○ = combination of web-based protocols and formats (URI, HTTP, XML, Turtle, JSON, …)
○ based on a formal model (semantics)

● Knowledge in RDF is expressed as a list of statements
● All RDF statements follow the same simple schema (= RDF Triple)

Resource Description Framework

● RDF Statements (RDF Triple):

Subject + Property + Object / Value

RDF Building Blocks: URI + URI + URI / Literal

N-Triples Serialization

<http://dbpedia.org/resource/Santiago_de_Compostela> <http://dbpedia.org/ontology/populationTotal> "95671" .

Graph representation: the subject node <http://dbpedia.org/resource/Santiago_de_Compostela> is connected by an edge labelled <http://dbpedia.org/ontology/populationTotal> to the literal "95671".

Resource Description Framework

● URIs and Literals
○ URIs reference resources uniquely
○ Literals describe data values that don’t have a separate existence

<http://dbpedia.org/resource/Santiago_de_Compostela> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/Spain> .

<http://dbpedia.org/resource/Santiago_de_Compostela> <http://dbpedia.org/ontology/populationTotal> "95671" .
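The distinction between URIs and literals can be made concrete with a short Python sketch that writes one triple as an N-Triples line. It is a simplified illustration: the helper names `term` and `to_ntriples` are made up here, and only backslashes and double quotes are escaped in literals, which is less than the full N-Triples grammar requires.

```python
def term(value: str, is_uri: bool) -> str:
    """Render a URI in angle brackets or a plain literal in double quotes."""
    if is_uri:
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(subject: str, prop: str, obj: str, obj_is_uri: bool) -> str:
    """Serialize one triple; subject and property are always URIs here."""
    return f"{term(subject, True)} {term(prop, True)} {term(obj, obj_is_uri)} ."


SANTIAGO = "http://dbpedia.org/resource/Santiago_de_Compostela"

# Object is another resource, so it is written as a URI.
print(to_ntriples(SANTIAGO, "http://dbpedia.org/ontology/country",
                  "http://dbpedia.org/resource/Spain", obj_is_uri=True))

# Object is a data value, so it is written as a literal.
print(to_ntriples(SANTIAGO, "http://dbpedia.org/ontology/populationTotal",
                  "95671", obj_is_uri=False))
```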

The Semantic Web Technology Stack: RDF Schema

http://dbpedia.org/ontology/City

dbo:City rdf:type owl:Class .
dbo:City rdfs:subClassOf dbo:Settlement .
dbo:foundationPlace rdfs:range dbo:City .
...

Diagram: City is connected to Settlement by rdfs:subClassOf; foundationPlace is connected to City by rdfs:range.
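What rdfs:subClassOf and rdfs:range allow a machine to conclude can be sketched in a few lines of Python. The function below applies two RDFS entailment patterns until nothing new is derived. The triple `ex:SomeOrganisation dbo:foundationPlace dbr:Santiago_de_Compostela` is a hypothetical fact added only to trigger the range rule, and prefixed names are treated as plain strings.

```python
def rdfs_closure(triples):
    """Apply the subclass and range rules repeatedly until a fixed point."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            # x rdf:type C and C rdfs:subClassOf D  =>  x rdf:type D
            if p == "rdf:type":
                for c, q, d in inferred:
                    if q == "rdfs:subClassOf" and c == o:
                        new.add((s, "rdf:type", d))
            # P rdfs:range C and x P y  =>  y rdf:type C
            for prop, q, cls in inferred:
                if q == "rdfs:range" and prop == p:
                    new.add((o, "rdf:type", cls))
        if new <= inferred:
            return inferred
        inferred |= new


SCHEMA_AND_DATA = {
    ("dbo:City", "rdfs:subClassOf", "dbo:Settlement"),
    ("dbo:foundationPlace", "rdfs:range", "dbo:City"),
    # Hypothetical instance data, for illustration only.
    ("ex:SomeOrganisation", "dbo:foundationPlace", "dbr:Santiago_de_Compostela"),
}

closure = rdfs_closure(SCHEMA_AND_DATA)
for triple in sorted(closure - SCHEMA_AND_DATA):
    print(triple)
```

The range rule first types Santiago de Compostela as a dbo:City, and the subclass rule then types it as a dbo:Settlement, although neither statement was given explicitly.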


The Semantic Web Technology Stack: description logics + logical rules

Diagram: entities (Spain, Madrid, connected by dbo:country) are linked by rdf:type to classes (City); classes are related by rdfs:subClassOf.

Logical constraint: Small_town ∩ Capital = ∅

Logical rule: ∀x. ( City(x) ∧ seatOfGovernment(x) → Capital(x) )
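The rule and the constraint from this slide can be tried out on a handful of facts. The Python sketch below is an illustration under simple assumptions: facts are (predicate, individual) pairs, the fact base is made up for the example, and the rule is applied in a single pass rather than by a description logic reasoner.

```python
def apply_capital_rule(facts):
    """City(x) and seatOfGovernment(x) imply Capital(x)."""
    derived = set(facts)
    for predicate, x in facts:
        if predicate == "City" and ("seatOfGovernment", x) in facts:
            derived.add(("Capital", x))
    return derived


def disjointness_violations(facts):
    """Individuals that break the constraint Small_town ∩ Capital = ∅."""
    small_towns = {x for predicate, x in facts if predicate == "Small_town"}
    capitals = {x for predicate, x in facts if predicate == "Capital"}
    return small_towns & capitals


# Illustrative fact base, not taken from any dataset.
FACTS = {
    ("City", "Madrid"),
    ("seatOfGovernment", "Madrid"),
    ("City", "Santiago_de_Compostela"),
}

derived = apply_capital_rule(FACTS)
print(sorted(derived - FACTS))           # newly inferred facts
print(disjointness_violations(derived))  # empty set means the facts are consistent
```

Adding the fact `("Small_town", "Madrid")` would make the second function report Madrid, which is how a constraint exposes contradictory data.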


The Semantic Web Technology Stack: SPARQL

Look for all cities located in the same area as Santiago de Compostela (use the property dbp:subdivisionName).

PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX dbr: <http://dbpedia.org/resource/>

SELECT DISTINCT ?area ?city
FROM <http://dbpedia.org/>
WHERE {
  ?area dbp:subdivisionName dbr:Santiago_de_Compostela .
  ?area dbp:subdivisionName ?city .
}

Endpoint: http://dbpedia.org/sparql
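A query like the one above is usually sent to an endpoint over HTTP. The sketch below only builds the GET URL and does not contact the server. It relies on the SPARQL protocol convention of passing the query text in a `query` parameter; the helper name `build_query_url` is made up for this example, and the unused prefixes are left out for brevity.

```python
from urllib.parse import parse_qs, urlencode, urlparse

ENDPOINT = "http://dbpedia.org/sparql"

# Query text as on the slide, reduced to the prefixes it actually uses.
QUERY = """\
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX dbr: <http://dbpedia.org/resource/>

SELECT DISTINCT ?area ?city
FROM <http://dbpedia.org/>
WHERE {
  ?area dbp:subdivisionName dbr:Santiago_de_Compostela .
  ?area dbp:subdivisionName ?city .
}
"""


def build_query_url(endpoint: str, query: str) -> str:
    """Percent-encode the query and attach it as the 'query' parameter."""
    return f"{endpoint}?{urlencode({'query': query})}"


url = build_query_url(ENDPOINT, QUERY)
print(url)

# Round trip: decoding the URL gives back the original query text.
decoded = parse_qs(urlparse(url).query)["query"][0]
print(decoded == QUERY)
```

Opening the resulting URL, ideally with an Accept header naming the wanted result format, would run the query against the endpoint.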


SPARQL - SPARQL Protocol and RDF Query Language

A query language for RDF data, designed with a syntax similar to SQL, the language used for retrieving data from relational databases.

Different query forms:

• SELECT returns variables and their bindings directly.
• CONSTRUCT returns a single RDF graph specified by a graph template.
• ASK tests whether or not a query pattern has a solution. Returns yes/no.
• DESCRIBE returns a single RDF graph containing RDF data about resources.

SQL versus SPARQL

SQL: Based on relations (tables).
SPARQL: Based on labelled directed graphs.

SQL: The relations (tables) to be matched over should be indicated.
SPARQL: Assumes a default graph. (The FROM clause populates this with specific identified subgraphs.)

SQL: (Retrieval) queries produce a relation from a relation.
SPARQL: SELECT queries produce a relation from a graph. CONSTRUCT queries (considered later) produce a graph from a graph.
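The last row of the comparison can be made tangible with a toy matcher in Python. It is a sketch of the idea only, handling a single triple pattern over an in-memory list of triples; real SPARQL engines do far more. The property `ex:hasCity` in the template is hypothetical.

```python
def match_pattern(graph, pattern):
    """SELECT-like step: match one triple pattern, return rows of bindings."""
    rows = []
    for triple in graph:
        binding = {}
        for pattern_term, value in zip(pattern, triple):
            if pattern_term.startswith("?"):
                # A variable must bind consistently within one triple.
                if binding.get(pattern_term, value) != value:
                    break
                binding[pattern_term] = value
            elif pattern_term != value:
                break
        else:
            rows.append(binding)
    return rows


def construct(rows, template):
    """CONSTRUCT-like step: fill a triple template once per row of bindings."""
    return [tuple(row.get(term, term) for term in template) for row in rows]


GRAPH = [
    ("dbr:Santiago_de_Compostela", "rdf:type", "dbo:City"),
    ("dbr:Santiago_de_Compostela", "dbo:country", "dbr:Spain"),
    ("dbr:University_of_Santiago_de_Compostela", "dbp:city",
     "dbr:Santiago_de_Compostela"),
]

# A relation (table of bindings) from a graph.
rows = match_pattern(GRAPH, ("?city", "dbo:country", "?country"))
print(rows)

# A graph from a graph; ex:hasCity is a made-up property.
new_graph = construct(rows, ("?country", "ex:hasCity", "?city"))
print(new_graph)
```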


The application of the Linked Data Principles leads to a ‘Web of Data’

>1014 datasets, >74B RDF triples, 808M links (as of August 2014)

The Development of the Web of Data

Figures: snapshots of the Web of Data dated May 2007, Nov 2007, July 2009 and Aug 2014 (one further snapshot carries no date).

Linked Open Data

○ Public Linked Data resources in the Web under an open licence, e.g. Creative Commons CC-BY
○ Tim Berners-Lee’s 5-Star Criteria for Linked Open Data

★ Available on the web (whatever format) but with an open licence, to be Open Data

★★ Available as machine-readable structured data (e.g. Excel instead of an image scan of a table)

★★★ As (2), plus non-proprietary format (e.g. CSV instead of Excel)

★★★★ All the above, plus: use open standards from W3C (URI, RDF and SPARQL) to identify things, so that people can point at your stuff

★★★★★ All the above, plus: link your data to other people’s data to provide context
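Because each star builds on the previous ones, the scheme can be read as a simple cumulative check. The Python sketch below is one possible reading, with made-up parameter names; it is not an official scoring tool.

```python
def star_rating(open_licence_on_web: bool,
                machine_readable: bool,
                non_proprietary_format: bool,
                uses_w3c_standards: bool,
                linked_to_other_data: bool) -> int:
    """Count stars in order, stopping at the first criterion that is not met."""
    criteria = [
        open_licence_on_web,
        machine_readable,
        non_proprietary_format,
        uses_w3c_standards,
        linked_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# A scanned table published under an open licence.
print(star_rating(True, False, False, False, False))
# An openly licensed CSV file.
print(star_rating(True, True, True, False, False))
# RDF with links to other datasets, openly licensed.
print(star_rating(True, True, True, True, True))
```

Note that data without an open licence scores zero under this reading, however well it is structured or linked.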

http://5stardata.info/en/

December 2007: 8 principles for Open Government Data

• Complete
• Primary (not aggregate)
• Up to date
• Accessible
• Machine processable
• Non-discriminatory
• Non-proprietary
• No license fees

https://opengovdata.org/

Linked Data vs Open Data

Open data

Data can be published and be publicly available under an open licence without linking to other data sources.

“Open data is data that can be freely used, reused and redistributed by anyone – subject only, at most, to the requirement to attribute and sharealike.” - OpenDefinition.org

Linked data

Data can be linked to URIs from other data sources, using open standards such as RDF, without being publicly available under an open licence.

See also: Cobden et al., A research agenda for Linked Closed Data, http://ceur-ws.org/Vol-782/CobdenEtAl_COLD2011.pdf

Linked (open) government data

Benefits of Linked Open Government Data (LOGD):

• Flexible data integration: LOGD facilitates data integration and enables the interconnection of previously disparate government datasets.

• Increase in data quality: The increased (re)use of LOGD triggers a growing demand to improve data quality. Through crowd-sourcing and self-service mechanisms, errors are progressively corrected.

• New services: The availability of LOGD gives rise to new services offered by the public and/or private sector.

• Cost reduction: The reuse of LOGD in e-Government applications leads to considerable cost reductions.

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

Key milestones for linked government data


Linked Data - A Guided Tour

● Datasets ordered by category

http://linkeddatacatalog.dws.informatik.uni-mannheim.de/state/

Government
● 183 datasets
● top 10 highest indegree: reference.data.gov.uk
● 48 proprietary vocabularies used
● c. 21% fully dereferenceable

Dereferenceable: every term in a LOD source must be accessible via its URI through an HTTP GET. Once we access the URI, we find the definition of the term. The dereferenceability quota of a LOD source is defined as the number of dereferenceable terms divided by the number of all terms collected in the source.

Fully dereferenceable LOD source: a definition exists for all URIs.
Partially dereferenceable LOD source: a definition could be retrieved for some terms, but not for all.
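The quota defined above is a plain ratio, which the following Python sketch computes from the outcome of per-term lookups. The lookups themselves are not performed here: the dictionary of results and the `http://example.org/` term URIs are hypothetical, and the label for a source with no dereferenceable term is an addition for completeness.

```python
def dereferenceability_quota(lookup_results: dict) -> float:
    """Dereferenceable terms divided by all terms collected in the source."""
    if not lookup_results:
        return 0.0
    return sum(1 for ok in lookup_results.values() if ok) / len(lookup_results)


def classify_source(lookup_results: dict) -> str:
    """Label a source according to the definitions on the slide."""
    quota = dereferenceability_quota(lookup_results)
    if quota == 1.0:
        return "fully dereferenceable"
    if quota > 0.0:
        return "partially dereferenceable"
    return "not dereferenceable"


# Hypothetical outcome of HTTP GET lookups: term URI -> definition retrieved?
RESULTS = {
    "http://example.org/vocab#City": True,
    "http://example.org/vocab#country": True,
    "http://example.org/vocab#population": False,
    "http://example.org/vocab#mayor": False,
}

print(dereferenceability_quota(RESULTS))
print(classify_source(RESULTS))
```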


Media
● 22 datasets
● 22 proprietary vocabularies used
● 0% fully dereferenceable
● 9% partially dereferenceable

User Generated Content
● 48 datasets
● top 10 highest outdegree: semanticweb.org
● 30 proprietary vocabularies used
● 13% fully dereferenceable
● 10% partially dereferenceable

Linguistics
● no statistics available so far

Bibliographic Data
● 96 datasets
● top 10 highest indegree: data.semanticweb.org
● top 10 highest outdegree: bibsonomy.org
● 58 proprietary vocabularies used
● 21% fully dereferenceable
● 7% partially dereferenceable

Life Sciences
● 83 datasets
● 35 proprietary vocabularies used
● 28% fully dereferenceable
● 6% partially dereferenceable

Cross Domain
● 41 datasets
● top 10 highest indegree: dbpedia.org, w3.org, lexvo.org
● 55 proprietary vocabularies used
● 27% fully dereferenceable
● 11% partially dereferenceable

Social Networking
● 520 datasets
● top 10 highest indegree: quitter.se, status.net, …
● top 10 highest outdegree: deri.org, harth.org, ...
● 128 proprietary vocabularies used
● 16% fully dereferenceable
● 6% partially dereferenceable

Geographic
● 21 datasets
● top 10 highest indegree: geonames.org
● 24 proprietary vocabularies used
● 21% fully dereferenceable
● 4% partially dereferenceable

Linked Data Ontologies

● Ontologies hold the Linked Data Cloud together

● OWL
○ owl:sameAs connects identical individuals
○ owl:equivalentClass connects equivalent classes

● SKOS
○ "Simple Knowledge Organization System"
○ based on RDF and RDFS
○ applied for definitions and mappings of vocabularies and ontologies
■ skos:Concept (classes)
■ skos:narrower
■ skos:broader
■ skos:related
■ skos:exactMatch (vocabulary)
■ skos:narrowMatch
■ skos:broadMatch
■ skos:relatedMatch

● UMBEL
○ "Upper Mapping and Binding Exchange Layer"
○ Subset of OpenCyc as RDF triples based on SKOS and OWL 2
○ Upper ontology with 28,000 concepts (skos:Concept)
○ 46,000 mappings into DBpedia, geonames and others (owl:equivalentClass, rdfs:subClassOf)
○ Links to more than 2 million Wikipedia pages

Member State initiatives – some examples

Some examples of supra-national, national, regional and private initiatives in the area of linked (open) data across Europe.

DE – Bibliotheksverbund Bayern

Linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg.

IT – Agenzia per l’Italia Digitale

Three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems and the Classifications for the data in Public Administration.

NL – Building and address register

The Dutch Address and Buildings base register published as linked data.

UK – Ordnance Survey

Three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open and the administrative geography taken from Boundary Line.

UK – Companies House

Publishing basic company details as linked data using a simple URI for each company in their database.

See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd

Linked Government Data Pilots

http://health.testproject.eu/PPP/

http://maritime.testproject.eu/CISE/

http://cpsv.testproject.eu/CPSV/


Non-governmental applications


Conclusion

• Linked data is a set of design principles for sharing machine-readable data on the Web.

• Linked data and open data are not the same.

• URIs, RDF and SPARQL form the foundational layer for linked data.

• Linked data offers a number of advantages:
  • data integration with small impact on legacy systems;
  • semantic interoperability;
  • creativity and innovation through context and knowledge creation.

Group questions

Is there supply and demand for (Linked) Open Government Data in your country?

What are, in your opinion, the expected benefits and pitfalls of Linked Data?

Do you know if there are any Linked (Open) Data initiatives in your country? If so, how many stars would you give them?


Download the slides from

My research group website: www.dbgroup.unimore.it

On SlideShare: http://www.slideshare.net/polaura

References

Some of the materials used in these slides have been rearranged from

- Slides of the “Knowledge Engineering with Semantic Web Technologies 2015” course held by Dr. Harald Sack, https://open.hpi.de/courses/semanticweb2015

- Slides of the "Introduction to linked data" of Open Data Support, http://www.slideshare.net/OpenDataSupport/introduction-to-linked-data-23402165

- Slides of "Usage of Linked Data Introduction and Application Scenarios" and "Querying Linked Data" by Barry Norton, EUCLID project

Further readings

Linked Open Government Data. Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454

EUCLID - Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1

Linked Open Data: The Essentials. Florian Bauer, Martin Kaltenböck. http://www.semantic-web.at/LOD-TheEssentials.pdf

Linked Data: Evolving the Web into a Global Data Space. Tom Heath and Christian Bizer. http://linkeddatabook.com/editions/1.0/

Related projects and initiatives

LOD2 FP7 project, http://lod2.eu/

The Open Knowledge Foundation, http://okfn.org/

W3C Semantic Web, http://www.w3.org/standards/semanticweb/

EUCLID, http://projecteuclid.org/

ISA Programme, http://ec.europa.eu/isa/

W3C LOGD WG, http://www.w3.org/2011/gld/wiki/Main_Page

LOD Around The Clock FP7 project, http://latc-project.eu/

Data.gov.uk, http://data.gov.uk/linked-data