78
1 236369

The Semantic Web

Embed Size (px)

DESCRIPTION

The Semantic Web. The Need. “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.” Berners-Lee, T, Hendler , J & Lassila , O ‘The semantic web’, Scientific American , May 2001. Semantic Processing. - PowerPoint PPT Presentation

Citation preview

Page 1: The Semantic Web

1236369

Page 2: The Semantic Web

The Need

“Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.”

Berners-Lee, T, Hendler, J & Lassila, O ‘The semantic web’, Scientific American, May 2001

2236369

Page 3: The Semantic Web

Semantic ProcessingOur goal: to be able to pose complex search

tasks that using the semantics of pieces of information, e.g.,

I want to purchase a DVDof “Dore the Explorer” at a price lower than 10$.Is such a CD available atamazon.com?

3

Page 4: The Semantic Web

Current search agents are not suitable for such task

4236369

Page 5: The Semantic Web

236369 5

In order to get the feeling, try to read and understand a page written in a foreign

language

Page 6: The Semantic Web

Current Solution

Use “intelligent” agents

The Semantic-Web Approach

Content is machine-understandable by being bound to some formal description

of itself (i.e. metadata)

6236369

Page 7: The Semantic Web

So what is the Semantic Web?Humans can easily “connect the dots” when

browsing the Web… you disregard advertisements you “know” (from the context) that one

phone number in my homepage is my office phone and the other one is my fax number

etc.… but machines can’t!The goal is to have a Web of Data to that

machines can exploite

7236369

Page 8: The Semantic Web

Some NeedsBetter searchSemantics in Web ServicesData integration

8236369

Page 9: The Semantic Web

Need to Improve Web Search

9236369

Page 10: The Semantic Web

Need for Semantics in Web Services

If the services are ubiquitous, searching issue comes up, for example: “find me the most elegant Schrödinger equation solver”

but what does it mean to be “elegant”? “most elegant”?

mathematicians ask these questions all the time… It is necessary to characterize the service not

only in terms of input and output parameters… …but also in terms of its semantics

10236369

Page 11: The Semantic Web

Need for Data IntegrationDatabases are very different in structure & in

contentLots of applications require managing several

databases after company mergers combination of administrative data for e-

Government biochemical, genetic, pharmaceutical research etc.

Most of these data are accessible on the Web (though not necessarily public yet)

11236369

Page 12: The Semantic Web

Example: Data Integration in Life Sciences

12236369

Page 13: The Semantic Web

And the Problem is Real

13236369

Page 14: The Semantic Web

GoalsWeb of data - provides common data

representation framework to facilitate integrating multiple sources to draw new conclusions

Increase the utility of information by connecting it to its definitions and to its context

More efficient information access and analysis

236369 14

Page 15: The Semantic Web

What Is Needed (Technically)?To make data machine process-able, we need:

unambiguous names for resources that may also bind data to real world objects: URI-s

a common data model to access, connect, describe the resources: RDF

access to that data: SPARQL common vocabularies: RDFS, OWL, SKOS reasoning logics: OWL, Rules

The “Semantic Web” is an infrastructure for the interchanging and integrating data on the Web

It extends the current Web (and does not replace it)

15236369

Page 16: The Semantic Web

Required ApplicationsAgents that search the Web and retrieve

valuable information to the end userWeb services that publish their informationPrograms that try to integrate data of

different web services and to produce new results or draw new conclusions from the integrated data

236369 16

Page 17: The Semantic Web

Ontologies & Inference Engines

“For the semantic web to function, computers must have access to structured collections of information and sets of inference rules that they can use to conduct automated reasoning.”

Berners-Lee, T, Hendler, J & Lassila, O ‘The semantic web’, Scientific American, May 2001

236369 17

Page 18: The Semantic Web

The Four Building Blocks

1. XML2. RDF3. Ontologies4. Agents

236369 18

Page 19: The Semantic Web

XML

“XML allows users to add arbitrary structure to their documents but says nothing about what the structures mean”

236369 19

Page 20: The Semantic Web

RDF –Resource Description FrameworkMeaning encoded in sets of ‘triples’: entities

have properties which have valuesEntities, properties and values all have

distinct URIs

236369 20

“imagine that we have access to a variety of databases with information about people, including their addresses. If we want to find people living in a specific zip code, we need to know which fields in each database represent names and which represent zip codes. RDF can specify that "(field 5 in database A) (is a field of type) (zip code)," using URIs rather than phrases for each term.” Berners-Lee, T, Hendler, J & Lassila, O ‘The semantic web’, Scientific American, May 2001a

Page 21: The Semantic Web

OntologiesDatabase A and Database B may use

different fields to contain ‘zip code’Ontologies sort this outOntology = ‘a document or file that

formally defines the relations among terms’

Ontologies for the web normally haveA taxonomyA set of inference rules

236369 21

Page 22: The Semantic Web

Agents

“Agent based computing appears to be the appropriate paradigm to work in a complex world with multiple ontologies, fragments and multiple inferencing engines.”

Stork, Hans-Georg and Mastroddi, Franco, Semantic Web Technologies - a New Action Line in the European Commission’s IST Programme, 2001

236369 22

Page 23: The Semantic Web

The Power of Agents - Integration“The real power of the Semantic Web will be realized when people create many programs that collect Web content from diverse sources, process the information and exchange the results with other programs. The effectiveness of such software agents will increase exponentially as more machine-readable Web content and automated services (including other agents) become available.”

Berners-Lee, T, Hendler, J & Lassila, O ‘The semantic web’, Scientific American, May 2001

236369 23

Page 24: The Semantic Web

‘Ambient Intelligence’

“In the next step, the Semantic Web will break out of the virtual realm and extend into our physical world. URIs can point to anything, including physical entities, which means we can use the RDF language to describe devices such as cell phones and TVs.”Berners-Lee, T, Hendler, J & Lassila, O ‘The semantic web’, Scientific American, May 2001

236369 24

Page 25: The Semantic Web

Resource Description Framework

236369 25

Page 26: The Semantic Web

26

What is RDF?A part of the semantic-web activityRDF is a general-purpose language for

representing information on the webSpecifically, objects and relationships

Designed to allow computer applications to process data based on its semanticsRather than displaying data to humans (as

opposed to RSS)

An RDF document is actually a labeled graph that is represented in XMLThe specific language is called RDF/XMLW3C recommendation (Feb. 2004)

236369

Page 27: The Semantic Web

Resource Description Framework (RDF)

The data model of the Semantic WebThe data model of the Semantic Web

A schema-less data model that features unambiguous identifiers and named relations between pairs of resources

A schema-less data model that features unambiguous identifiers and named relations between pairs of resources

A labeled, directed graph of relations between resources and literal valuesA labeled, directed graph of relations between resources and literal values

27236369

Page 28: The Semantic Web

RDF Data Consists of TripletsRDF data is a set of statementsEach statement is a triplet (Resource,

Property, Value)Sometimes we refer to a triplet using the

terminology of (Subject, Predicate, Object)

236369 28

The author of http://www.cs.technion.ac.il/kanza/myPage.html

is Yaron KanzaResource (Subject): myPage.html

Property (Predicate): author Value (Object): Yaron Kanza

Page 29: The Semantic Web

29

RDF Data

Subject Objectpredicate

The basic element: Triple (labeled edge)

Person#845

#1002

address

postalCode

6941Haifa

city

Herzel

street

RDF document: edge-labeled graph

Statement

236369

Page 30: The Semantic Web

URI-s Play a Fundamental RoleAnybody can create (meta)data on any resource

on the Web e.g., the same Person could be annotated through

other terms semantics is added to existing Web resources via

URI-sData exist on the Web, because it is accessible

through standard Web meansURI-s ground RDF into the Web

information can be retrieved using existing tools this makes the “Semantic Web”, well… “Semantic

Web”

30236369

Page 31: The Semantic Web

Example: RDF Triples

<rdf:Description rdf:about="http://.../membership.svg#FullSlide"> <axsvg:graphicsType>Chart</axsvg:graphicsType> <axsvg:labelledBy rdf:resource="http://...#BottomLegend"/> <axsvg:chartType>Line</axsvg:chartType></rdf:Description>

31236369

Page 32: The Semantic Web

URI-s: Merging (data integration)

It becomes easy to merge data Merge can be done because statements refer

to the same URI-s nodes with identical URI-s are considered

identicalMerging is a very powerful feature of RDF

metadata may be defined by several (independent) parties…

…and combined by an application one of the areas where RDF is much handier

than pure XML

32236369

Page 33: The Semantic Web

Data integration example

33236369

Page 34: The Semantic Web

34

page.html

John Smith

John’s Home Page

DC:Creator

DC:Title

<?xml version=“1.0”?><rdf:RDF

xmlns:rdf=“http://www.w3.org/TR/WD-rdf-syntax#”xmlns:dc=“http://purl.org/metadata/dublin_core#”>

<rdf:Description about=“page.html”> <dc:Creator>John Smith</dc:Creator> <dc:Title>John’s Home Page</dc:Title> </rdf:Description>

</rdf:RDF>

236369

Page 35: The Semantic Web

35

page.html

John SmithJohn’s Home Page [email protected]

dc:Title

dc:Creator

Name Email

. . .<Description about=“page.html”> <dc:Creator> <Description> <corp:Name>John Smith</corp:Name> <corp:Email>[email protected]</corp:Email> </Description> </dc:Creator> <dc:Title>John’s Home Page</dc:Title></Description>

</RDF>236369

Page 36: The Semantic Web

ContainersGroups of things: <bag> <seq>

<alt><bag>: unordered list; duplicates

allowed<seq>: ordered list; duplicates

allowed<alt>: list of alternatives; one will

be selected236369 36

Page 37: The Semantic Web

DC – Dublin CoreThe mission: To make it easier to find

resources using the Internet through the following activities

Developing metadata standards for discovery across domains

Defining frameworks for the interoperation of metadata sets

Facilitating the development of community- or discipline-specific metadata sets that are consistent with the above items

37236369

Page 38: The Semantic Web

DC - Core Metadata Element SetoTitle (dc:title)oSubject (dc:subject)oDescription

(dc:description)oCreatoroPublisher

(dc:publisher)oContributoroDate

o Typeo Formato Identifier

(dc:identifier)o Sourceo Languageo Relation

(dcterms:isPartOf)o Coverageo Rights

38236369

Page 39: The Semantic Web

39

A set of fifteen basic properties for describing generalized Web resources“Title”: the name given to the resource“Creator”: the person or organization primarily

responsible for the resource“Subject”: what the resource is about“Description”: a description of the content“Publisher”: the person or organization responsible

for making the resource available“Contributor”: someone who has provided content to

the resource other than the creator“Date”: date of creation or publication

Dublin Core

236369

Page 40: The Semantic Web

40

“Type”: type of resource, such as home page, technical report, novel, photograph…

“Format”: data format of the resource“Identifier”: URL, ISBN number, …“Source”: another resource that this resource is derived

from“Language”: the language of the content“Relation”: another resource and its relationship to this

one“Coverage”: the portion of time or space described by

this resource (atlases, histories, etc.)“Rights”: the intellectual property rights adhering to

this resource, or a pointer to them

Dublin Core

236369

Page 41: The Semantic Web

41

Manually from HTML or “user domain XML”With special assisting tools – like Protégé,

Reggie, DC-dot, RDF for XMLIdeally – with some automated procedure

from HTML/XML documentsCan we use XSLT for that?

Creating RDF documents

236369

Page 42: The Semantic Web

236369 42

Page 43: The Semantic Web

RDF SchemaRDF Schema (RDFS) enriches the data model

of RDF, adding vocabulary and associated semantics forClasses and subclassesProperties and sub-propertiesTyping of properties

Support for describing simple ontologiesAdds an object-oriented flavorBut with a logic-oriented approach and using

“open world” semantics236369 43

Page 44: The Semantic Web

44

Not an XML Schema!A “companion” specification for RDF specClass, Type, subClassOf,Properties: domain, rangeMisc: label, comment, isDefinedBy,etc.

RDF Schema

236369

Page 45: The Semantic Web

Classes, Resources, … Basically, traditional ontologies:

use the term “mammal” “every dolphin is a mammal” “Flipper is a dolphin” etc.

RDFS defines resources and classes: everything in RDF is a “resource” “classes” are also resources, but… they are also a collection of possible resources (i.e.,

“individuals”) “mammal”, “dolphin”, …

Relationships are defined among classes/resources: “typing”: an individual belongs to a specific class (“Flipper is a

dolphin”) “subclassing”: instance of one is also the instance of the other

(“every dolphin is a mammal”)

RDFS formalizes these notions in RDF45236369

Page 46: The Semantic Web

Classes, Resources in RDF(S)

RDFS defines rdfs:Resource, rdfs:Class as nodes; rdf:type, rdfs:subClassOf as properties

46236369

Page 47: The Semantic Web

Typed NodesA resource may belong to several classes

rdf:type is just a property…“Flipper is a mammal, but Flipper is also a TV

star…”i.e., it is not like a data type in this sense!The type information may be very important

for applications e.g., it may be used for a categorization of

possible nodes

47236369

Page 48: The Semantic Web

Inferred Properties

(#Flipper rdf:type #Mammal) is not in the original RDF data… …but can be inferred from the RDFS rules Better RDF environments will return that triplet, too

48236369

Page 49: The Semantic Web

Inference: FormalThe RDF Semantics document has a list of

(44) entailment rules :“if such and such triplets are in the graph, add this

and this triplet”do that recursively until the graph does not changethis can be done in polynomial time for a specific

graph

Whether those extra triplets are physically added to the graph, or deduced when needed is an implementation issue

49236369

Page 50: The Semantic Web

Properties (Predicates)Property is a special class (rdf:Property)Properties are constrained by their range

and domainProperties are also resources… (have URI’s)For example, (P rdfs:range C) means:

1. P is a property

2.C is a class instance3.when using P, the “object” must be an individual in C

* this is an RDF statement with subject P, object C, and property rdfs:range

50236369

Page 51: The Semantic Web

Property Specification Example

51236369

Page 52: The Semantic Web

LiteralsLiterals may have a data type (floats, ints, etc)

defined in XML Schemas, including full XML Fragments

(Natural) language can also be specified (via xml:lang)

52236369

Page 53: The Semantic Web

236369 53

<rdf:RDFxmlns:rdf= "http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"xml:base= "http://www.animals.fake/animals#">

<rdf:Description rdf:ID="animal"> <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/></rdf:Description>

<rdf:Description rdf:ID="horse"> <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/> <rdfs:subClassOf rdf:resource="#animal"/></rdf:Description>

</rdf:RDF>Horse is defined as subclass of animal

Example

Page 54: The Semantic Web

Example<?xml version="1.0"?><rdf:RDF

xmlns:rdf= "http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xml:base= "http://www.animals.fake/animals#"><rdfs:Class rdf:ID="animal" /><rdfs:Class rdf:ID="horse">

<rdfs:subClassOf rdf:resource="#animal"/> </rdfs:Class>

</rdf:RDF>

236369 54

Abbreviated version. Works because an RDFS class is an RDF resource.

Use rdfs:Class instead of rdfDescription and drop the rdf:type information

Page 55: The Semantic Web

RDF and RDF Schema

<rdf:RDF xmlns:g=“http://schema.org/gen” xmlns:u=“http://schema.org/univ”> <u:Chair rdf:ID=“john”> <g:name>John Smith</g:name> </u:Chair></rdf:RDF>

<rdfs:Property rdf:ID=“name”> <rdfs:domain rdf:resource=“Person”></rdfs:Property>

<rdfs:Class rdf:ID=“Chair”> <rdfs:subclassOf rdf:resource= “http://schema.org/gen#Person”></rdfs:Class>

u:Chair

John Smith

rdf:typeg:name

g:Person

g:name

rdfs:Class rdfs:Property

rdf:typerdf:type

rdf:type

rdfs:subclassOf

rdfs:domain

55236369

Page 56: The Semantic Web

SPARQL Protocol and RDF Query Language

236369 56

Page 57: The Semantic Web

SPARQLSPARQL =

Query Language + Protocol + XML Results Format

• Access and query RDF graphs

• Product of the RDF Data Access Working Group

236369 57

Page 58: The Semantic Web

58

SPARQL QueryPREFIX dc: <http://purl.org/dc/elements/1.1/>

SELECT ?title2WHERE{ ?doc dc:title "SPARQL at speed" . ?doc dc:creator ?c . ?docOther dc:creator ?c . ?docOther dc:title ?title2

}

• On an abstracts/papers database:“Find other papers by the authors of a given paper.”

236369

Page 59: The Semantic Web

59

SPARQL QueryPREFIX dc: <http://purl.org/dc/elements/1.1/>PREFIX foaf: <http://xmlns.com/foaf/0.1/>PREFIX shop: <http://example/shop#>

SELECT ?titleWHERE{ ?doc dc:title ?title . FILTER regex(?title, "SPARQL") . ?doc dc:creator ?c . ?c foaf:name ?name . OPTIONAL { ?doc shop:price ?price }}

• “Find books with ‘SPARQL’ in the title. Get the authors’ name and the price (if available).”

• Multiple vocabularies

236369

Page 60: The Semantic Web

60

Inference• An RDF graph may be backed by inference

− OWL, RDFS, application, rules

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>SELECT ?typeWHERE{ ?x rdf:type ?type .}

:x rdf:type :C .:C rdfs:subClassOf :D .

--------| type |========| :C || :D |--------

236369

Page 61: The Semantic Web

61

Another ExamplePREFIX dc: <http://purl.org/dc/elements/1.1/>PREFIX ldap: <http://ldap.hp.com/people#>PREFIX foaf:

SELECT ?name ?email{ ?doc dc:title ?title . FILTER regex(?title, “SPARQL”) . ?doc dc:creator ?reseacher . ?researcher ldap:email ?email . ?researcher ldap:name ?name}

• “Find the name and email addresses of authors of a paper about SPARQL”

236369

Page 62: The Semantic Web

Other SPARQL FeaturesLimit the number of returned results;

remove duplicates, sort them, … Return the full subgraph (instead of a

list of bound variables) Construct a graph combining a

separate pattern and the query results Use datatypes and/or language tags

when matching a pattern

62236369

Page 63: The Semantic Web

Programming Practice - JenaRDF toolkit in Java from HP’s Bristol lab; it has

a large number of classes/methods listing, removing associated properties, objects, comparing

full RDF graphs manage typed literals, mapping Seq, Alt, etc. to Java

constructs

an “RDFS Reasoner” a full SPARQL implementation a layer (Joseki) for remote access of triples

(essentially, and triple database) and more… Probably the most widely used RDF environment in

Java today

63236369

Page 64: The Semantic Web

Programming Practice - Jena RDF toolkit in Java from HP’s Bristol lab; it has

a large number of classes/methods listing, removing associated properties, objects,

comparing full RDF graphs manage typed literals, mapping Seq, Alt, etc. to Java

constructs an “RDFS Reasoner” a full SPARQL implementation a layer (Joseki) for remote access of triples

(essentially, and triple database) and more… Probably the most widely used RDF environment in

Java today

64236369

Page 65: The Semantic Web

236369 65

Page 66: The Semantic Web

RDF Schema is LimitedWe cannot express facts such as

Two classes are disjointBuild a class that is the union of two classesCardinality restrictionScope of propertiesProvide relationships between properties, such

as transitive, unique, inverse

236369 66

Page 67: The Semantic Web

Ontologies (OWL)RDFS is useful, but does not solve all the

issuesCan a program reason about some terms?

E.g.: “if «A» is left of «B» and «B» is left of «C», is «A» left of «C»?”

Programs should be able to deduce such statements

If somebody else defines a set of terms: are they the same?

67236369

Page 68: The Semantic Web

How to speak Ontology?We need a Web Ontologies Language to

define: more on the terminology used in a specific context more constraints on properties, logical

characterization of properties etc.

W3C’s Ontology Language (OWL) - A layer on top of RDFS with additional possibilities

“OWL is now the most used KR language in the history of AI…” (Jim Hendler)

68236369

Page 69: The Semantic Web

OWLA Web ontology language that is more

expressive than RDF and RDF SchemaWritten in XML on top of RDFUsing OWL we want to provide exact

descriptions of items and the relationships between them

236369 69

Page 70: The Semantic Web

Why “OWL” and not “WOL”?Some urban legends…

e.g., reference to Owl from Winie the Pooh, who misspelled his name as “WOL”

A reference to an AI project at MIT of the mid 70’s by Bill Martin, called “One World Language”…

an early attempt for a KR language and associated ontology, intended to be a universal language for encoding meaning for computers

70236369

Page 71: The Semantic Web

Property Restrictions in OWLRestriction may be by: value constraints (i.e.,

further restrictions on the range) all values must be from a class at least one value must be from a class

cardinality constraints (i.e., how many times the property can be

used on an instance?) minimum cardinality maximum cardinality exact cardinality

71236369

Page 72: The Semantic Web

Property Restriction Example“A dolphin is a mammal living in the sea or in

the Amazonas”

72236369

Page 73: The Semantic Web

OWL (cont.)Property Characterization

In OWL, one can characterize the behavior of properties (symmetric, transitive, …)

Term Equivalence/Relations For classes (owl:equivalentClass, owl:disjointWith) For properties (owl:equivalentProperty, owl:inverseOf) For individuals (owl:sameAs, owl:differentFrom)

Special class owl:Ontology with special properties:

owl:imports, owl:versionInfo, owl:priorVersion owl:backwardCompatibleWith, owl:incompatibleWith rdfs:label, rdfs:comment can also be used

73236369

Page 74: The Semantic Web

OWL and LogicOWL expresses a small subset of

First Order Logic it has a “structure” (class hierarchies,

properties, datatypes…), and “axioms” can be stated within that structure only.

Inference based on OWL is within this framework only

74236369

Page 75: The Semantic Web

However: Ontologies are Hard!A full ontology-based application is

Complex systemHard to implementHeavy to runAnd not all application may need it

Three layers of OWL are defined: Lite, DL(Description Logic) Full

decreasing level of complexity and expressiveness

75236369

Page 76: The Semantic Web

OWL Example: FOAF – The way to describe yourselfStands for "Friend Of A Friend"Provides structured linksInformation distributed & extensible

Name (foaf:name) E-mail (Foaf:mbox) Representing picture

(Foaf:img) Your publications

(Foaf:publications) Your online account

(Foaf:holdsAccount)

76236369

Page 77: The Semantic Web

Linkshttp://www.w3.org/RDF/s

236369 77

Page 78: The Semantic Web

SPARQL LinksJena: Java and .Net Semantic Web

Frameworkhttp://jena.sourceforge.net/

SPARQL Queryhttp://jena.sourceforge.net/ARQ

SPARQL Protocolhttp://www.joseki.org

SquirrelRDF: Access legacy SQL:http://jena.sourceforge.net/SquirrelRDF

236369 78