Upload
kale-edkins
View
221
Download
0
Tags:
Embed Size (px)
Citation preview
1
Introduction to KR
and
Semantic Web
Many slides are based on tutorial by Ivan Herman (W3C) reproduced here with kind permission. Additionally, many of the intro slides are based on tutorial by Sean Bechhofer, Ian Horrocks and Peter F. Patel-Schneider, with additions by Suresh Manandhar and Dimitar Kazakov.
2
History of the Semantic WebWeb was “invented” by Tim Berners-Lee (amongst others), a physicist
working at CERN
TBL’s original vision of the Web was much more ambitious than the
reality of the existing (syntactic) Web:
TBL (and others) have since been working towards realising this vision,
which has become known as the Semantic WebE.g., article in May 2001 issue of Scientific American…
“... a goal of the Web was that, if the interaction between person and hypertext could be so intuitive that the machine-readable information space gave an accurate representation of the state of people's thoughts, interactions, and work patterns, then machine analysis could become a very powerful management tool, seeing patterns in our work and facilitating our working together through the typical problems which beset the management of large organizations.”
3
Realising the complete “vision” is too hard for now (probably)But we can make a start by adding semantic annotation to web resources
Scientific American, May 2001:
4
Where we are Today: the Syntactic Web
[Hendler & Miller 02]
5
The Syntactic Web is…
A hypermedia, a digital libraryA library of documents called (web pages) interconnected by a hypermedia of links
A database, an application platformA common portal to applications accessible through web pages, and presenting their results as web pages
A platform for multimediaBBC Radio 4 anywhere in the world! Terminator 3 trailers!
A naming schemeUnique identity for those documents
A place where computers do the presentation (easy) and people do the linking and interpreting (hard).
Why not get computers to do more of the hard work?
[Goble 03]
6
Hard Work using the Syntactic Web…
Find images of Peter Patel-Schneider, Frank van Harmelen and Alan Rector…
Rev. Alan M. Gates, Associate Rector of the Church of the Holy Spirit, Lake Forest, Illinois
7
Impossible (?) using the Syntactic Web…
Complex queries involving background knowledgeFind information about “animals that use sonar but are not either bats or dolphins”
Locating information in data repositoriesTravel enquiries
Prices of goods and services
Results of human genome experiments
Finding and using “web services”Visualise surface interactions between two proteins
Delegating complex tasks to web “agents”Book me a holiday next weekend somewhere warm, not too far away, and where they speak French or English
, e.g., Barn Owl
8
What is the Problem?Consider a typical web page:
Markup consists of: – rendering
information (e.g., font size and colour)
– Hyper-links to related content
Semantic content is accessible to humans but not (easily) to computers…
9
What information can we see…
WWW2002The eleventh international world wide web conferenceSheraton waikiki hotelHonolulu, hawaii, USA7-11 may 20021 location 5 days learn interactRegistered participants coming fromaustralia, canada, chile denmark, france, germany, ghana, hong kong, india,
ireland, italy, japan, malta, new zealand, the netherlands, norway, singapore, switzerland, the united kingdom, the united states, vietnam, zaire
Register nowOn the 7th May Honolulu will provide the backdrop of the eleventh
international world wide web conference. This prestigious event …Speakers confirmedTim berners-lee Tim is the well known inventor of the Web, …Ian FosterIan is the pioneer of the Grid, the next generation internet …
10
What information can a machine see…
WWW2002The eleventh international world wide web conferenceSheraton waikiki hotelHonolulu, hawaii, USA7-11 may 20021 location 5 days learn interactRegistered participants coming fromaustralia, canada, chile denmark, france, germany, ghana, hong kong,
india, ireland, italy, japan, malta, new zealand, the netherlands, norway, singapore, switzerland, the united kingdom, the united states, vietnam, zaire
Register nowOn the 7th May Honolulu will provide the backdrop of the eleventh
international world wide web conference. This prestigious event …Speakers confirmedTim berners-lee Tim is the well known inventor of the Web, …Ian FosterIan is the pioneer of the Grid, the next generation internet …
11
Solution: XML markup with “meaningful” tags?<name>WWW2002The eleventh international world wide webcon</name><location>Sheraton waikiki hotelHonolulu, hawaii, USA</location><date>7-11 may 2002</date><slogan>1 location 5 days learn interact</slogan><participants>Registered participants coming fromaustralia, canada, chile denmark, france, germany, ghana, hong kong, india,
ireland, italy, japan, malta, new zealand, the netherlands, norway, singapore, switzerland, the united kingdom, the united states, vietnam, zaire</participants>
<introduction>Register nowOn the 7th May Honolulu will provide the backdrop of the eleventh
international world wide web conference. This prestigious event …Speakers confirmed</introduction><speaker>Tim berners-lee</speaker><bio>Tim is the well known inventor of the Web,</bio>…
12
But What About…<conf>WWW2002The eleventh international world wide webcon</conf><place>Sheraton waikiki hotelHonolulu, hawaii, USA</place><date>7-11 may 2002</date>
<slogan>1 location 5 days learn interact</slogan><participants>Registered participants coming fromaustralia, canada, chile denmark, france, germany, ghana, hong kong, india,
ireland, italy, japan, malta, new zealand, the netherlands, norway, singapore, switzerland, the united kingdom, the united states, vietnam, zaire</participants>
<introduction>Register nowOn the 7th May Honolulu will provide the backdrop of the eleventh
international world wide web conference. This prestigious event …Speakers confirmed</introduction><speaker>Tim berners-lee</speaker><bio>Tim is the well known inventor of the Web,…
13
Machine sees…<name>WWW2002The eleventh international world wide webc</name><location>Sheraton waikiki hotelHonolulu, hawaii, USA</location><date>7-11 may 2002</date><slogan>1 location 5 days learn interact</slogan><participants>Registered participants coming fromaustralia, canada, chile denmark, france, germany, ghana, hong kong, india,
ireland, italy, japan, malta, new zealand, the netherlands, norway, singapore, switzerland, the united kingdom, the united states, vietnam, zaire</participants>
<introduction>Register nowOn the 7th May Honolulu will provide the backdrop of the eleventh
international world wide web conference. This prestigious event …Speakers confirmed</introduction><speaker>Tim berners-lee</speaker><bio>Tim is the well known inventor of the W</bio><speaker>Ian Foster</speaker><bio>Ian is the pioneer of the Grid, the ne</bio>
14
Need to Add “Semantics”External agreement on meaning of annotations
E.g., Dublin CoreAgree on the meaning of a set of annotation tags
Problems with this approachInflexible
Limited number of things can be expressed
Use Ontologies to specify meaning of annotationsOntologies provide a vocabulary of terms
New terms can be formed by combining and extending existing ones
Meaning (semantics) of such terms is formally specified
Can also specify relationships between terms in multiple ontologies
15
An ontology is an engineering artifact:
It is constituted by a specific vocabulary used to describe a certain reality, plus
a set of explicit assumptions regarding the intended meaning of the vocabulary.
Thus, an ontology describes a formal specification of a certain
domain:
Shared understanding of a domain of interest
Formal and machine manipulable model of a domain of interest
“An explicit specification of a conceptualisation” [Gruber93]
Ontology in Computer Science
16
Structure of an OntologyOntologies typically have two distinct components:
Names for important concepts in the domainElephant is a concept whose members are a kind of animal
Herbivore is a concept whose members are exactly those animals who eat only plants or parts of plants
Adult_Elephant is a concept whose members are exactly those elephants whose age is greater than 20 years
Background knowledge/constraints on the domainAdult_Elephants weigh at least 2,000 kg
All Elephants are either African_Elephants or Indian_Elephants
No individual can be both a Herbivore and a Carnivore
18
A Semantic Web — First Steps
Extend existing rendering markup with semantic markupMetadata annotations that describe content/funtion of web accessible resources
Use formal Ontologies to provide vocabulary for
annotations“Formal specification” is accessible to machines
A prerequisite is a standard web ontology languageNeed to agree common syntax before we can share semantics
Syntactic web based on standards such as HTTP and HTML
Make web resources more accessible to automated processes
19
Ontology Design and DeploymentGiven key role of ontologies in the Semantic Web, it will
be essential to provide tools and services to help users:Design and maintain high quality ontologies, e.g.:
Meaningful — all named classes can have instances
Correct — captured intuitions of domain experts
Minimally redundant — no unintended synonyms
Richly axiomatised — (sufficiently) detailed descriptions
Store (large numbers) of instances of ontology classes, e.g.:Annotations within web pages
Answer queries over ontology classes and instances, e.g.:Find more general/specific classes
Retrieve annotations/pages matching a given description
Integrate and align multiple ontologies
20
Lets have a go towards building a semantic website
21
HTML and XML
<H1>Book </H1><UL> <LI>Title: AI introduction
<LI>Author: Frank Russell<LI>ISBN: 554-15197912-554X
</UL>
HTML:
<book><title>AI introduction</title><author>Frank Russell</author><ISBN>554-15197912-554X</ISBN>
</book>
XML:
22
XML documents are trees over textbox nodes
<book><title>...</title><author>...</author><isbn>...</isbn>
</book>
• XML Schema: grammar to describe legal trees
• Problem: No agreed semantics
book
author isbntitle
23
Sharing data across the web
Requires
Shared Syntax: XML provides this
Shared Semantics : XML does not provide
this
24
But this is a difficult task
Web documents also contain unstructured pagesWords are ambiguous
bank – river bank or financial institution (context dependent, so cannot do dictionary lookup)
Even though sentences may be unambiguous. Mostly, but not always:
I saw a man with a telescope
But the web is a collection of documents
25
But hang on
The web is not just a collection of documentsThe web is lot more:
It’s a collection of links between documentsIt’s a collection of structured documents
Hence, we can gain a lot by attaching shared semantics to:
the linksthe structure within documents
At least, this is a start!!
26
But how do we do it?
27
Example: The Music site of the BBC
27
28
28
29
Site editors roam the Web for new factsmay discover further links while roaming
They update the site manually
And the site gets soon out-of-date
How to build such a site 1.
30
Editors roam the Web for new data
published on Web sites
“Scrape” the sites with a program to extract
the informationIe, write some code to incorporate the new data
Easily gets out of date again…
Content changes
How to build such a site 2.
31
Editors roam the Web for new data via API-s
Understand those…input, output arguments, datatypes used, etc
Write some code to incorporate the new data
Easily get out of date again…
Format changes
How to build such a site 3.
32
Use external, public datasetsWikipedia, MusicBrainz, …
They are available as data not API-s or hidden on a Web site
data can be extracted using, e.g., HTTP requests or standard queries
standardised formats/semantics
The choice of the BBC
33
Use the Web of Data as a Content
Management System
Use the community at large as content
editors
Get others to do the work
In short…
34
And this is no secret…
35
There are more an more data on the Webgovernment data, health related data, general knowledge, company information, flight information, restaurants,…
More and more applications rely on the
availability of that data
Data on the Web
36
But… data are often in isolation, “silos”
Photo credit “nepatterson”, Flickr
37
A “Web” wheredocuments are available for download on the Internet
but there would be no hyperlinks among them
Imagine…
38
And the problem is real…
39
We need a proper infrastructure for a real Web
of Datadata is available on the Web
accessible via standard Web technologies
data are interlinked over the Web
ie, data can be integrated over the Web
This is where Semantic Web technologies
come in
We want machines to do the integration.
But this requires machines to understand data
Data on the Web is not enough…
40
Lets have a go i.e. connect the silos
Photo credit “kxlly”, Flickr
41
We will use a simplistic example to
introduce the main Semantic Web concepts
In what follows…
42
Map the various data onto an abstract
semantically grounded data
representation
Merge the resulting representations
Start making queries on the whole!queries not possible on the individual data sets
The rough structure of data integration
43
We start with a book...
44
A simplified bookstore data (dataset “A”)
ISBN Author Title Publisher Year
0006511409X id_xyz The Glass Palace id_qpr 2000
ID Name Homepage
id_xyz Ghosh, Amitav http://www.amitavghosh.com
ID Publisher’s name City
id_qpr Harper Collins London
45
1st: export your data as a set of relations
http://…isbn/000651409X
Ghosh, Amitav http://www.amitavghosh.com
The Glass Palace
2000
London
Harper Collins
a:title
a:year
a:city
a:p_name
a:namea:homepage
a:authora:publish
er
46
Relations form a graphthe nodes refer to the “real” data or contain some literal
how the graph is represented in machine is immaterial for now
Some notes on the exporting the data
47
Same book in French…
48
Another bookstore data (dataset “F”)
A B C D
1 ID Titre Traducteur Original2 ISBN 2020286682 Le Palais des Miroirs $A12$ ISBN 0-00-6511409-X3
4
5
6 ID Auteur7 ISBN 0-00-6511409-
X$A11$
8
9
10 Nom11 Ghosh, Amitav12 Besse, Christianne
49
2nd: export your second set of datahttp://…isbn/000651409X
Ghosh, Amitav
Besse, Christianne
Le palais des miroirs
f:original
f:nom
f:traducteur
f:auteurf:ti
tre
http://…isbn/2020386682
f:nom
50
3rd: start merging your data
http://…isbn/000651409X
Ghosh, Amitav
Besse, Christianne
Le palais des miroirs
f:original
f:nom
f:traducteur
f:auteur f:titre
http://…isbn/2020386682
f:nom
http://…isbn/000651409X
Ghosh, Amitav
http://www.amitavghosh.com
The Glass Palace
2000
London
Harper Collins
a:title
a:year
a:city
a:p_name
a:namea:homepage
a:author
a:publisher
51
3rd: start merging your data (cont)
http://…isbn/000651409X
Ghosh, Amitav
Besse, Christianne
Le palais des miroirs
f:original
f:nom
f:traducteur
f:auteur f:titre
http://…isbn/2020386682
f:nom
http://…isbn/000651409X
Ghosh, Amitav
http://www.amitavghosh.com
The Glass Palace
2000
London
Harper Collins
a:title
a:year
a:city
a:p_name
a:namea:homepage
a:author
a:publisher
Same URI!
52
3rd: start merging your dataa:title
Ghosh, Amitav
Besse, Christianne
Le palais des miroirs
f:original
f:nom
f:traducteur
f:auteur
f:titre
http://…isbn/2020386682
f:nom
Ghosh, Amitav
http://www.amitavghosh.com
The Glass Palace
2000
London
Harper Collins
a:year
a:city
a:p_name
a:namea:homepage
a:author
a:publisher
http://…isbn/000651409X
53
User of data “F” can now ask queries like:“give me the title of the original”
well, … « donnes-moi le titre de l’original »
This information is not in the dataset “F”…
…but can be retrieved by merging with
dataset “A”!
Start making queries…
54
We “feel” that a:author and f:auteur should be the
same
But an automatic merge does not know that!
Let us add some extra information to the merged
data:a:author same as f:auteur
both identify a “Person”
a term that a community may have already defined:a “Person” is uniquely identified by his/her name and, say, homepage
it can be used as a “category” for certain type of resources
However, more can be achieved…
55
3rd revisited: use the extra knowledge
Besse, Christianne
Le palais des miroirsf:original
f:nom
f:traducteur
f:auteur
f:titre
http://…isbn/2020386682
f:nom
Ghosh, Amitav
http://www.amitavghosh.com
The Glass Palace
2000
London
Harper Collins
a:title
a:year
a:city
a:p_name
a:namea:homepage
a:author
a:publisher
http://…isbn/000651409X
http://…foaf/Personr:type
r:type
56
User of dataset “F” can now query:“donnes-moi la page d’accueil de l’auteur de l’original”
well… “give me the home page of the original’s ‘auteur’”
The information is not in datasets “F” or “A”…
…but was made available by:merging datasets “A” and datasets “F”
adding three simple extra statements as an extra “glue”
Start making richer queries!
57
Using, e.g., the “Person”, the dataset can
be combined with other sources
For example, data in Wikipedia can be
extracted using dedicated toolse.g., the “dbpedia” project can already extract the “infobox” information from Wikipedia (check out the dbpedia ontology)
Combine with different datasets
58
Merge with Wikipedia data
Besse, Christianne
Le palais des miroirsf:original
f:nom
f:traducteur
f:auteur
f:titre
http://…isbn/2020386682
f:nom
Ghosh, Amitav http://www.amitavghosh.com
The Glass Palace
2000
London
Harper Collins
a:title
a:year
a:city
a:p_name
a:namea:homepage
a:author
a:publisher
http://…isbn/000651409X
http://…foaf/Personr:type
r:type
http://dbpedia.org/../Amitav_Ghosh
r:type
foaf:name w:reference
59
Merge with Wikipedia data
Besse, Christianne
Le palais des miroirsf:original
f:nom
f:traducteur
f:auteur
f:titre
http://…isbn/2020386682
f:nom
Ghosh, Amitav http://www.amitavghosh.com
The Glass Palace
2000
London
Harper Collins
a:title
a:year
a:city
a:p_name
a:namea:homepage
a:author
a:publisher
http://…isbn/000651409X
http://…foaf/Personr:type
r:type
http://dbpedia.org/../Amitav_Ghosh
http://dbpedia.org/../The_Hungry_Tide
http://dbpedia.org/../The_Calcutta_Chromosome
http://dbpedia.org/../The_Glass_Palace
r:type
foaf:name w:reference
w:author_of
w:author_of
w:author_of
w:isbn
60
Merge with Wikipedia data
Besse, Christianne
Le palais des miroirsf:original
f:nom
f:traducteur
f:auteur
f:titre
http://…isbn/2020386682
f:nom
Ghosh, Amitav http://www.amitavghosh.com
The Glass Palace
2000
London
Harper Collins
a:title
a:year
a:city
a:p_name
a:namea:homepage
a:author
a:publisher
http://…isbn/000651409X
http://…foaf/Personr:type
r:type
http://dbpedia.org/../Amitav_Ghosh
http://dbpedia.org/../The_Hungry_Tide
http://dbpedia.org/../The_Calcutta_Chromosome
http://dbpedia.org/../Kolkata
http://dbpedia.org/../The_Glass_Palace
r:type
foaf:name w:reference
w:author_of
w:author_of
w:author_of
w:born_in
w:isbn
w:long w:lat
61
It may look like it but, in fact, it should not
be…
What happened via automatic means is
done every day by Web users!
The difference: a bit of extra rigour so that
machines could do this, too
Is that surprising?
62
We could add extra knowledge to the merged
datasetse.g., a full classification of various types of library data
geographical information
etc.
This is where ontologies, extra rules, etc, come inontologies/rule sets can be relatively simple and small, or huge, or anything in between…
Even more powerful queries can be asked as a
result
It could become even more powerful
63
What did we do?
Data in various formats
Data represented in abstract format
Applications
Map,Expose,…
ManipulateQuery…
64
What did we do? (alternate view)
Inferencing
Query and Update
Web of Data Applications
Browser Applications
Stand Alone Applications
Common “Graph” Format &Common
Vocabularies
“Bridges”
Data on the Web
65
… the graph representation is independent
of the exact structures
… a change in local database schemas,
XHTML structures, etc, does not affect the
whole“schema independence”
… new data, new connections can be
added seamlessly
The abstraction pays off because…
66
Through URI-s we can link any data to any
data
The “network effect” is extended to the
(Web) data
“Mashup on steroids” become possible
The network effect
67
Ontology Languagesfor theSemantic Web
68
Ontology LanguagesWide variety of languages for “Explicit Specification”
Graphical notationsSemantic networksTopic Maps (see http://www.topicmaps.org/)UMLRDF
Logic basedDescription Logics (e.g., OIL, DAML+OIL, OWL)Rules (e.g., RuleML, LP/Prolog)First Order Logic (e.g., KIF)Conceptual graphs(Syntactically) higher order logics (e.g., LBase)Non-classical logics (e.g., Flogic, Non-Mon, modalities)
Probabilistic/fuzzy
Degree of formality varies widelyIncreased formality makes languages more amenable to machine processing (e.g., automated reasoning)
69
Objects/Instances/IndividualsElements of the domain of discourseEquivalent to constants in FOL
Types/Classes/ConceptsSets of objects sharing certain characteristicsEquivalent to unary predicates in FOL
Relations/Properties/RolesSets of pairs (tuples) of objectsEquivalent to binary predicates in FOL
Such languages are/can be:Well understoodFormally specified(Relatively) easy to useAmenable to machine processing
Many languages use “object oriented” model based on:
70
Web “Schema” LanguagesExisting Web languages extended to facilitate content
descriptionXML XML Schema (XMLS)
RDF RDF Schema (RDFS)
XMLS not an ontology languageChanges format of DTDs (document schemas) to be XML
Adds an extensible type hierarchyIntegers, Strings, etc.
Can define sub-types, e.g., positive integers
RDFS is recognisable as an ontology languageClasses and properties
Sub/super-classes (and properties)
Range and domain (of properties)
71
RDF and RDFSRDF stands for Resource Description Framework
It is a W3C candidate recommendation
(http://www.w3.org/RDF)
RDF is graphical formalism ( + XML syntax +
semantics)for representing metadata
for describing the semantics of information in a machine- accessible way
RDFS extends RDF with “schema vocabulary”, e.g.:Class, Property
type, subClassOf, subPropertyOf
range, domain
72
URIsURI = Uniform Resource Identifier
"The generic set of all names/addresses that are short cuts referring to resources"
URLs (Uniform Resource Locators) are a particular type of URI, used for resources that can be accessed on the WWW (e.g., web pages)
In RDF, URIs typically look like “normal” URLs, often with fragment identifiers to point at specific parts of a document:
http://www.somedomain.com/some/path/to/file#fragmentID
73
Web Ontology Language RequirementsDesirable features identified for Web Ontology Language:
Extends existing Web standards Such as XML, RDF, RDFS
Easy to understand and useShould be based on familiar KR idioms
Formally specified
Of “adequate” expressive power
Possible to provide automated reasoning support
74
From RDFS to OWL
Two languages developed to satisfy above requirementsOIL: developed by group of (largely) European researchers (several from EU OntoKnowledge project)
DAML-ONT: developed by group of (largely) US researchers (in DARPA DAML programme)
Efforts merged to produce DAML+OILDevelopment was carried out by “Joint EU/US Committee on Agent Markup Languages”
Extends (“DL subset” of) RDF
DAML+OIL submitted to W3C as basis for standardisationWeb-Ontology (WebOnt) Working Group formed
WebOnt group developed OWL language based on DAML+OIL
OWL language now a W3C Candidate Recommendation
Will soon become Proposed Recommendation
75
OWL Axioms
Axioms (mostly) reducible to inclusion (v)C ´ D iff both C v D and D v C
76
Semantic Web Wedding Cake
77
The Semantic Web provides technologies
to make such integration possible!
Hopefully you get a full picture at the end …
So where is the Semantic Web?
78
Books – Semantic Web + OWL
79
Books – Semantic Web + OWL
80
Papers – Description Logics
A Description Logic Primer
Markus Krötzsch, Frantisek Simancik, Ian Horrocks
This paper provides a self-contained first introduction to description
logics (DLs).
http://arxiv.org/abs/1201.4089
Attributive concept descriptions with complements
Manfred Schmidt-Schauß, Gert Smolka
The description logic described in this paper is closely related to the
one taught in ARIN.
http://dx.doi.org/10.1016/0004-3702(91)90078-X,