Eurostat
THE CONTRACTOR IS ACTING UNDER A FRAMEWORK CONTRACT CONCLUDED WITH THE COMMISSION
Introduction to the semantic web and to the web of linked data
Prof. Eero Hyvönen
University of Helsinki and Aalto University
Helsinki, Finland
Sept 28, 2017
ESTP course on Introduction to Linked Open Data
Outline
• The Vision
• Technological Basis: Metadata, ontologies, rules
• How Does It Work in Practice?
• What has been learned?
1. How to build an interoperable Web of Data?
2. How to build a more intelligent web?
1. Artificial Intelligence approach
• The contents stay as they are
• The machines operate in a more human-like way (Artificial Intelligence)
2. Contents represented in intelligent ways
• The contents are easier to understand
• Machines may stay more stupid
In practice, both ways are needed
• More intelligent systems process more intelligently represented contents
Web generations
1G WWW:
• WWW pages for human interpretation
• HTML language
2G WWW:
• Structures for human/machine interpretation
• XML language
3G WWW: Semantic Web
• Meanings for human/machine use
• RDF(S) language
4G WWW: Ubiquitous web for humans and machines
⇒ Semantics = foundation for intelligent web services
• Semantic = “understandable” to machines
Why Semantics?
<artifact>
  <id>NBA:H26069:467</id>
  <target>cup and plate</target>
  <material>porcelain</material>
  <creationLocation>Germany</creationLocation>
  <creator>Meissen</creator>
</artifact>
• This metadata cannot answer the following questions:
  • Find all vessels?
  • Find all ceramic products?
  • Find artifacts manufactured in Europe?
  • Does the city of Meissen manufacture ceramics?
NBA-H26069-467
    :object ”cup and plate” ;
    :object_concept object:cup ;
    :object_concept object:plate ;
    :material ”porcelain” ;
    :material_concept material:porcelain ;
    :creationPlace ”Germany” ;
    :creationPlace_concept place:Germany ;
    :creator ”Meissen” ;
    :creator_concept actor:Meissen .
[Figure: RDF graph connecting NBA-H26069-467 to four ontologies (object, place, actor, material). Visible links: object_concept to object:cup and object:plate; rdfs:subClassOf links leading up to object:vessel; creationLocation_concept to place:Germany; loc:partOf from place:Germany to place:Europe; material_concept to material:porcelain. Other nodes shown: place:Meissen, actor:Meissen.]
Find all vessels?
Find all ceramic products?
Find artifacts manufactured in Europe?
Does the city of Meissen manufacture ceramics?
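The effect of linking metadata values to ontology concepts can be sketched in a few lines of Python. This is a toy illustration, not the system behind the slides: the dictionaries below are hypothetical stand-ins for the object and place ontologies, and only the links visible in the example (cup is a vessel, Germany is part of Europe) are included.

```python
# Toy stand-ins for the ontologies; the structure is an assumption
# made for illustration, only the individual links come from the example.
SUBCLASS_OF = {"object:cup": "object:vessel"}
PART_OF = {"place:Germany": "place:Europe"}

ITEMS = {
    "NBA-H26069-467": {
        "object_concept": ["object:cup", "object:plate"],
        "creationPlace_concept": ["place:Germany"],
    }
}


def ancestors(node, links):
    """Return node plus everything reachable by following links upward."""
    chain = [node]
    while node in links:
        node = links[node]
        chain.append(node)
    return chain


def find_items(prop, target, links):
    """Items whose value for prop equals target or lies below it."""
    return [
        item
        for item, meta in ITEMS.items()
        if any(target in ancestors(value, links) for value in meta.get(prop, []))
    ]


vessels = find_items("object_concept", "object:vessel", SUBCLASS_OF)
european = find_items("creationPlace_concept", "place:Europe", PART_OF)
print(vessels, european)
```

The plain XML record could only be matched on the literal strings "cup and plate" and "Germany"; here the same record is found through the concept hierarchy.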
Case Rijksmuseum Amsterdam:CHIP Demonstrator
Example in Turtle notation
• VRA metadata schema (extension of Dublin Core)
• (Aroyo et al., 2007)
A resource in the TGN ontology / vocabulary
Amsterdam in TGN
An Ontology Concept Hierarchy
Technological Basis of Semantic Web
The classical ”layer cake model”
(Tim Berners-Lee)
Metadata
Vocabularies/ontologies
Reasoning/logic
Trust
METADATA LEVEL
Why isn’t XML alone sufficient for the basis of Semantic web?
• The semantics of XML is only in the human brain
<ADDRESS>
  <NAME>Peter Programmer</NAME>
  <PHONE>123 456</PHONE>
</ADDRESS>
• We need a markup language whose interpretation is:
  - Commonly agreed
  - Cross-domain
  - Machine-”understandable”
  and whose data can be aggregated (documents combined)
The Semantic web solution: RDF (Resource Description Framework)
• General metadata description language for web resources
• Relational model, not a syntax (as opposed to XML)
  RDF description = directed graph
• Semantics is defined based on logic
• Syntax/serialization
  XML-based RDF/XML, especially for machines
  Simple triple notations (N3, Turtle, N-Triples) for humans
• Standardized and commonly used
  W3C draft 1999
  W3C recommendation RDF 1.0, 10.2.2004
  W3C recommendation RDF 1.1, 25.2.2014
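To make "RDF description = directed graph" concrete, the following minimal Python sketch models a description as a set of (subject, predicate, object) triples. It uses no RDF library; the `ex:` identifiers are invented for illustration, and the values reuse the address example from the XML slide.

```python
from collections import defaultdict

# A triple is (subject, predicate, object). The ex: names are hypothetical.
doc_a = {("ex:Peter", "ex:name", "Peter Programmer")}
doc_b = {
    ("ex:Peter", "ex:phone", "123 456"),
    ("ex:Peter", "ex:name", "Peter Programmer"),
}

# Aggregating two descriptions is plain set union; duplicates collapse.
merged = doc_a | doc_b


def outgoing(triples):
    """Index the graph by subject: subject -> list of (predicate, object) edges."""
    index = defaultdict(list)
    for subject, predicate, obj in sorted(triples):
        index[subject].append((predicate, obj))
    return dict(index)


graph = outgoing(merged)
print(graph["ex:Peter"])
```

Because every statement is self-contained, two documents about the same resource combine without any schema negotiation, which is the aggregation property the XML slide asks for.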
Metadata schemas
• Standardized formats for metadata descriptions
  • A set of elements/properties, and
  • Predefined value sets for them
• Different content types typically require different properties
  • E.g., book vs. song vs. museum item
• Problems
  • How are the element values represented?
    Tarja Halonen vs. Halonen T.
    11.9.2001 vs. Sept 11, 2001 vs. 2001/09/11
  • What do the values mean?
    ”glass”, ”nokia”, ”Pyhäjärvi”
  • How can different schema structures be combined?
    writer vs. creator as a property
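The value representation problem can be illustrated with the three date spellings above. The sketch below normalizes them to one syntax using Python's standard `datetime` module. It assumes an English locale for month names and a day-first reading of "11.9.2001"; the format list is an illustrative guess, not a complete solution.

```python
from datetime import datetime

# Formats matching the three spellings in the example; not exhaustive.
# "%d.%m.%Y" assumes day-first order, which is a convention, not a fact
# that can be read off the string itself.
FORMATS = ["%d.%m.%Y", "%b %d, %Y", "%Y/%m/%d"]


def to_iso(text):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    cleaned = text.replace("Sept ", "Sep ")  # strptime expects "Sep"
    for fmt in FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue
    return None


for raw in ["11.9.2001", "Sept 11, 2001", "2001/09/11"]:
    print(raw, "->", to_iso(raw))
```

Syntax can be normalized mechanically in this way; the meaning of values such as ”glass” or ”nokia” cannot, which is where ontologies come in.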
Example: Dublin Core
• Set of 15 general properties for different contents
• Dublin Core Metadata Element Set (ISO Standard 15836)
  Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Relation, Source, Language, Coverage, Rights
Dublin Core (2)
• DCMI Metadata Terms defines tens of qualifiers/refinements, which specialize the semantics of Dublin Core elements
  • E.g., accessRights < Rights
• Dumb-down principle
  • A qualified version can always be replaced with a more generic one
    I.e., a qualifier can only refine the meaning of an element
• The element values may have predefined encoding formats
  • Vocabulary encoding scheme
    Set of possible terms, e.g., listing of different resource types (text, image, …)
  • Syntax encoding scheme
    E.g., date ”2001-09-11”
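The dumb-down principle amounts to a lookup from each refinement to the element it refines. A minimal Python sketch follows; the accessRights to rights pair is the one given above, while the other pairs are further DCMI Metadata Terms refinements added for illustration, and the record layout is an assumption.

```python
# Refinement -> more generic element. Property names are written without
# a namespace prefix for brevity; the dict layout is illustrative only.
REFINES = {
    "accessRights": "rights",
    "created": "date",
    "abstract": "description",
    "alternative": "title",
}


def dumb_down(record):
    """Replace each qualified property with its more generic element."""
    simple = {}
    for prop, values in record.items():
        generic = REFINES.get(prop, prop)
        simple.setdefault(generic, []).extend(values)
    return simple


qualified = {
    "title": ["Maija's eyeglasses"],
    "created": ["2001-09-11"],
    "accessRights": ["public"],
}
print(dumb_down(qualified))
```

A consumer that only knows the 15 base elements can still use the record after this step, at the cost of losing the finer distinctions.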
Dublin Core (3)
• Application Profiles
  • Define a combination of DC elements and qualifiers + value encoding formats for a given domain (e.g., for library data)
• Possible own extensions
  • E.g., Visual Resources Association (VRA) Core 4.0
    New elements, such as ”measurements”
Metadata Schema in HealthFinland
HealthFinland portal: Maija’s eyeglasses – PDF document on the web
Maija’s eyeglasses: metadata in RDF form
ONTOLOGY LEVEL
What is an ontology?
• “An ontology is an explicit specification of a conceptualization
  ... definitions need to be couched in some common formalism” (Gruber, 1993)
  Explicit: machine can understand
  Common (shared): communication is possible (not mentioned by Gruber)
  Formal: precisely defined
• Defines the concepts/objects and their relations in a given domain
• A first requirement for humans and machines to understand each other
EXAMPLE OF AN ONTOLOGY
class-def animal
class-def plant
    subclass-of NOT animal
class-def tree
    subclass-of plant
class-def branch
    slot-constraint is-part-of has-value tree
class-def leaf
    slot-constraint is-part-of has-value branch
class-def defined carnivore
    subclass-of animal
    slot-constraint eats value-type animal
class-def defined herbivore
    subclass-of animal
    slot-constraint eats
        value-type plant OR (slot-constraint is-part-of has-value plant)
class-def herbivore
    subclass-of NOT carnivore
class-def giraffe
    subclass-of animal
    slot-constraint eats
        value-type leaf
class-def lion
    subclass-of animal
    slot-constraint eats value-type herbivore
class-def tasty-plant
    subclass-of plant
    slot-constraint eaten-by has-value herbivore, carnivore
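The kind of conclusion such definitions support can be imitated with a hand-written Python sketch. This is not a description logic reasoner and does not implement the semantics of the notation above; it hard-codes two of the defined classes (herbivore, carnivore) and treats is-part-of as transitive, which are simplifying assumptions.

```python
# Facts copied from the class definitions; the dict encoding is an assumption.
SUBCLASS_OF = {"tree": "plant", "giraffe": "animal", "lion": "animal",
               "herbivore": "animal", "carnivore": "animal"}
PART_OF = {"leaf": "branch", "branch": "tree"}
EATS_ONLY = {"giraffe": "leaf", "lion": "herbivore"}


def is_a(cls, target):
    """True if cls equals target or is a (transitive) subclass of it."""
    while cls is not None:
        if cls == target:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def is_plant_or_part(cls):
    """True if cls is a plant or is (transitively) part of a plant."""
    while cls is not None:
        if is_a(cls, "plant"):
            return True
        cls = PART_OF.get(cls)
    return False


def classify(animal):
    """Apply the herbivore and carnivore definitions to what the animal eats."""
    food = EATS_ONLY[animal]
    if is_plant_or_part(food):
        return "herbivore"
    if is_a(food, "animal"):
        return "carnivore"
    return "unknown"


print(classify("giraffe"), classify("lion"))
```

Nothing in the input states that a giraffe is a herbivore; the classification follows from leaf being part of a branch, branch part of a tree, and tree being a plant.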
Ontology types
• Glossary: word list, little structure
• Thesaurus: relations (NT, BT, RT, etc.)
• Taxonomy: relations, inheritance, constraints
• Axiomatized theory: formal system, logic-based
[Figure: the four types ordered along an axis labelled "Ontological complexity/depth", with scales "Human understandable" and "Machine understandable"; further labels: "Numbers", "Philosophical text".]
W3C standards for Semantic web ontologies/vocabularies
• SKOS (Simple Knowledge Organization System)
  • Light-weight semantics
  • E.g., for representing existing glossaries, classification schemes, thesauri
• OWL (Web Ontology Language)
  • Rich semantics based on logic
  • Supports more reasoning
IEEE Suggested Upper Merged Ontology (SUMO)
• Goals
  • Automated reasoning support in knowledge-based applications
  • Interoperability
    Define new data elements using SUMO and obtain mutual interoperability
    Interoperability between applications using domain specific ontologies (that use SUMO)
    Neutral interchange format for different systems
• Application areas
  • E-commerce
  • E-learning
  • Natural language understanding tasks
  • …
SUMO
SUMO principal distinctions
SUMO Object:
AAT (Art & Architecture Thesaurus)
• Maintained by the J. Paul Getty Trust
• 7 main classes, 125 000 concepts
Union List of Artist Names (ULAN)
• 120,000 instances
• 293,000 names
Resolving Identities
Geonames
• Classes: 9 feature classes, 645 feature codes
• Instances:
  • 8 million geographical names, 6.5 million unique features, 2.2 million populated places, 1.8 million alternate names
• Registries and Wiki used for populating the ontology
ONKI.fi -> Finto.fi
RULE LEVEL
The idea of rules
• “New” information can be derived from old by reasoning
• Semantic Web is based on logic!
SUMO knowledge representation
• Developed in KIF (Knowledge Interchange Format)
  • A version of first order predicate logic
  • Other versions exist (e.g., OWL)
• Size
  • 1006 terms
  • 4142 axioms
  • 814 rules
Application example: MuseumFinland recommends links
• Inference rules tell the machine about the world
  • E.g., that ”student’s cap” is related to ”parties”
  • E.g., that entities are related to each other if their superclasses are related to each other
  • Etc.
• The machine can:
  • Reason interesting new relations between museum items, and
  • Provide them to end users as recommendation links
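The second rule above (items are related if their classes are related) can be sketched in Python as follows. This is a hypothetical toy, not MuseumFinland's rule engine: only the ”student’s cap” and ”parties” pair comes from the slide, and the item identifiers are invented.

```python
# Concept-level knowledge: unordered pairs of related concepts.
RELATED_CONCEPTS = {("student's cap", "parties")}

# Hypothetical museum items and the concept each one is an instance of.
CONCEPT_OF = {
    "item:cap-1": "student's cap",
    "item:invitation-7": "parties",
    "item:axe-3": "tools",
}


def concepts_related(a, b):
    """Relatedness is symmetric, so check the pair in both orders."""
    return (a, b) in RELATED_CONCEPTS or (b, a) in RELATED_CONCEPTS


def recommendations(item):
    """Items whose concept is related to the concept of the given item."""
    concept = CONCEPT_OF[item]
    return sorted(
        other
        for other, other_concept in CONCEPT_OF.items()
        if other != item and concepts_related(concept, other_concept)
    )


print(recommendations("item:cap-1"))
```

The link between the two items is not stored anywhere in the catalogue data; it is derived from one statement at the concept level and applies to every item annotated with those concepts.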
(META)DATA + ONTOLOGIES = LINKED DATA
How Does It Work in Practice?
Finnish Biography center and libraries collect historical data of people
[Figure: the table below drawn as an RDF graph. P1 and P2 each have type person, with name, occupation and birth place links.]

person  name                   occupation  birth place  …
P1      Akseli Gallen-Kallela  artist      Lemu
P2      Gustaf Mannerheim      marshal     Askainen
…
Museum catalogues paintings
[Figure: work W1 drawn as an RDF graph with type painting, time 1929, a creator named ”Akseli Gallen-Kallela” and a topic named ”Gustaf Mannerheim”.]

Work  name                    creator                time  topic              …
W1    Portrait of Mannerheim  Akseli Gallen-Kallela  1929  Gustaf Mannerheim
W2    Aino Triptych           Akseli Gallen-Kallela  1891  Aino, Kalevala
…
Land survey maintains place registries
[Figure: place graph. Askainen, Lemu and Turku (type municipality) are part-of Varsinais-Suomi (type province), which is part-of Finland.]

municipality  province
Askainen      Varsinais-Suomi
Helsinki      Uusimaa
Lemu          Varsinais-Suomi
Turku         Varsinais-Suomi
…
National library builds ontologies
[Figure: KOKO ontology. A subclassOf hierarchy over the classes concept, abstract, endurant, perdurant, physical object, person, artist, marshal, occupation, painting, place, municipality, province and time.]
Semantic RDF graph combines them all: Web of Data
[Figure: the people, painting, place and KOKO ontology graphs from the previous slides combined into a single RDF graph.]
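What the combined graph makes possible can be sketched with plain Python sets of triples. This is an illustration under stated assumptions: the property names follow the slide labels, and the creator and topic of W1 are assumed to have already been resolved from name strings to the person identifiers P1 and P2, a step the slides do not show.

```python
# One set of triples per data provider.
people = {
    ("P1", "name", "Akseli Gallen-Kallela"), ("P1", "birth place", "Lemu"),
    ("P2", "name", "Gustaf Mannerheim"), ("P2", "birth place", "Askainen"),
}
works = {
    ("W1", "type", "painting"), ("W1", "time", "1929"),
    ("W1", "creator", "P1"), ("W1", "topic", "P2"),  # assumed resolved links
}
places = {
    ("Lemu", "part-of", "Varsinais-Suomi"),
    ("Askainen", "part-of", "Varsinais-Suomi"),
    ("Varsinais-Suomi", "part-of", "Finland"),
}

web_of_data = people | works | places  # combining graphs is set union


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in web_of_data if s == subject and p == predicate}


def located_in(place, region):
    """Follow part-of links upward from place until region is reached."""
    todo, seen = [place], set()
    while todo:
        current = todo.pop()
        if current == region:
            return True
        if current not in seen:
            seen.add(current)
            todo.extend(objects(current, "part-of"))
    return False


def paintings_by_artists_born_in(region):
    """Paintings whose creator has a birth place inside the region."""
    return sorted(
        work
        for work, p, o in web_of_data
        if p == "type" and o == "painting"
        and any(
            located_in(birth_place, region)
            for creator in objects(work, "creator")
            for birth_place in objects(creator, "birth place")
        )
    )


print(paintings_by_artists_born_in("Finland"))
```

The query crosses three providers (museum, biography centre, land survey); none of them could answer it from its own data alone.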
1+1>2
AI
In Principle a Piece of Cake but …
How to align concepts (URIs) used by different organizations?
How to align metadata models used by different organizations?
SHARED INFRA NEEDED!
Linked Data – Web of Data
• Utilization of distributed work
• Aggregating massive cross-domain contents
• Linked Open Data thinking
• Semantic portals
http://linkeddata.org
Application domains of Semantic web
• Information retrieval
• Recommender systems
• Knowledge management
• E-business and web services
• Profiling and customization
• …
• Virtually any field dealing with data!
• https://www.w3.org/2001/sw/sweo/public/UseCases/
Application Example: WarSampo – Linked Death
Finnish WW2 on the Semantic Web
(Hyvönen et al., ESWC 2016)
https://vimeo.com/212249404
WHAT HAS BEEN LEARNED?
Conclusions
What is the Semantic web?
Content perspective: A new (meta)data layer on the web
• Web as a global database system
• Web of Pages vs. Web of Data
Application perspective: Machine-understandable web
• Semantics for machines
• Enables human usage
  Intelligent web services
  Semantic interoperability
Technological perspective: Next layers above XML
• W3C standards: RDF(S), OWL, SPARQL, etc.
  Metadata
  Ontology
  Rules
Semantic Web Makes a Difference!
• End-user’s perspective
  • Global view to heterogeneous, distributed contents
  • Automatic content aggregation
  • Semantic search
  • Semantic browsing and recommendations
  • Other intelligent services (knowledge discovery, personalization, visualization, …)
• Content publisher’s perspective
  • Distributed content creation
  • Enriching each other’s contents semantically
  • Automated link maintenance
  • Shared content publication channel
  • Reusing aggregated content in other applications
But the Lunch is not Free
• More collaboration is needed -> complicates work
• Integration of semantic portals with legacy systems
• Manual annotations are costly and may not scale up
• Automatic annotation lowers data quality
”Intellectuals solve problems - geniuses prevent them”
Albert Einstein
Key Lesson Learned: create high quality semantic data when recording data
Questions