
Some news about the SW


Presentation given during a tour of Australia in May 2009. The target audience is people who are already familiar with the fundamentals of the Semantic Web; the presentation gives an overview of what is happening at W3C.


Page 1: Some news about the SW

1

State of the Semantic Web

Ivan Herman, W3C

May 2009

Page 2: Some news about the SW

2

What is the overall status of the Semantic Web?

Page 3: Some news about the SW

3

We have the basic technologies

• Stable specifications for the basics since 2004: RDF, OWL
• Work is being done to properly incorporate rules
• We have a standard for query since 2008: SPARQL
• We have some additional technologies to access/create RDF data: GRDDL, RDFa, POWDER, …
• Some fundamental vocabularies became pervasive (FOAF, Dublin Core, …)

Page 4: Some news about the SW

4

Lots of Tools (not an exhaustive list!)

• Categories:
  • Triple Stores
  • Inference engines
  • Converters
  • Search engines
  • Middleware
  • CMS
  • Semantic Web browsers
  • Development environments
  • Semantic Wikis
  • …
• Some names:
  • Jena, AllegroGraph, Mulgara, Sesame, flickurl, …
  • TopBraid Suite, Virtuoso environment, Falcon, Drupal 7, Redland, Pellet, …
  • Disco, Oracle 11g, RacerPro, IODT, Ontobroker, OWLIM, Talis Platform, …
  • RDF Gateway, RDFLib, Open Anzo, DartGrid, Zitgist, Ontotext, Protégé, …
  • Thetus publisher, SemanticWorks, SWI-Prolog, RDFStore…
  • …

Page 5: Some news about the SW

5

Lots of tools (cont.)

• Significant speed, store capacity, etc, improvements are reported every day
• Some of the tools are open source, some are not; some are very mature, some are not: it is the usual picture of software tools, nothing special any more!
• Anybody can start developing RDF-based applications today

Page 6: Some news about the SW

6

There is a great community

• There are lots of tutorials, overviews, and books around
  • again, some of them good, some of them bad, just as with any other area…
• Active developers’ communities
  • blogs, IRC channels, mailing lists, various fora: more than what one person can oversee…

Page 7: Some news about the SW

7

Great community…

From a presentation given by David Norheim, Computas AS, at the ESTC2008 Conference, Vienna, Austria

Page 8: Some news about the SW

8

Some deployment communities

• Major communities pick the technology up: digital libraries, defence, eGovernment, energy sector, financial services, health care, oil and gas industry, life sciences …
• Health care and life science sector is now very active
• Semantic Web also appears in the “Web 2.0/Web 3.0” world (whatever that means)
  • exchange of social data
  • personal “space” applications
  • multimedia asset management (video, photos, audio, …)
  • etc

Page 9: Some news about the SW

9

So what is the Semantic Web?

Page 10: Some news about the SW

10

• There is a growing number of application patterns referring to the Semantic Web:
  • data integration using RDF, SKOS, OWL, …
  • knowledge engineering with complex ontologies
    • using, eg, OWL and/or rule-based reasoning
  • better data management, archiving, cataloging, etc
    • eg, digital library applications
  • managing, coordinating, combining Web services
  • intelligent software agents
  • improving search (usually using domain-specific vocabularies…)
  • etc

Page 11: Some news about the SW

11

The nice, structured view…

Page 12: Some news about the SW

12

But maybe this is where we are?

Page 13: Some news about the SW

13

• Maybe, but being an elephant is not necessarily bad!
  • it shows that the Semantic Web is a mature technology
  • that there is lots of interest, applications
  • various application areas pick what they need…
    • e.g., some need sophisticated knowledge management, so they go for complex ontologies…
    • some concentrate on semantically simpler vocabularies but large volumes of data
    • …and that is fine, there is room for many!

Page 14: Some news about the SW

14

• But it is good to (re-)emphasize some principles
• The Semantic Web:
  • extends the principles of the Web from documents to data; creates a Web of data

Page 15: Some news about the SW

15

• It is the Semantic Web, and not only Semantics!
  • data, ontologies, vocabularies, etc, can (and should!) be shared, reused, potentially on Web scale
  • one can use the Web infrastructure to denote “things”…
    • Eg: http://www.ivan-herman.net/me denotes, well, me (not my home page, not my foaf file, but me!)
  • … and add relationships for those, too!
• The major importance of the SW is that it provides an abstract integration layer for data on the Web

Page 16: Some news about the SW

16

Some new technologies to watch

Page 17: Some news about the SW

17

How do I get data out?

Page 18: Some news about the SW

18

How to provide RDF data?

• Of course, one could create RDF data manually…
• … but that is unrealistic on a large scale
• Goal is to generate RDF data automatically when possible and “fill in” by hand only when necessary
• Various data formats should be considered
  • databases (relational or otherwise)
  • data in XML, HTML, in pictures, videos, etc
• Details of the process are still the subject of very active R&D!

Page 19: Some news about the SW

19

Bridge to relational databases

• Huge amounts of data are stored in (relational) databases
• “RDFying” them is impossible
• “Bridges” are being defined:
  • a layer between RDF and the relational data
    • RDB tables are “mapped” to RDF graphs, possibly on the fly
  • a number of systems can be used as databases as well as triple stores (eg, Oracle, OpenLink, …)
• Work for a standard mapping language may start at W3C soon
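To make the "bridge" idea concrete, here is a minimal sketch of mapping one relational table to RDF-style triples on the fly. It is not a standard mapping language: the base URI `http://example.org/db/`, the `city` table, and the function name `rows_to_triples` are all assumptions made up for illustration.

```python
import sqlite3

# Hypothetical base URI for minted identifiers; not taken from the slides.
BASE = "http://example.org/db/"


def rows_to_triples(conn, table, key_column):
    """Map each row of `table` to (subject, predicate, object) triples.

    The subject URI is built from the table name and the key value,
    and each remaining non-NULL column becomes one predicate.
    """
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{record[key_column]}"
        for column, value in record.items():
            if column != key_column and value is not None:
                triples.append((subject, f"{BASE}{table}#{column}", value))
    return triples


# Illustrative in-memory database with a single row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
conn.execute("INSERT INTO city VALUES (1, 'Amsterdam', 'NL')")

triples = rows_to_triples(conn, "city", "id")
for triple in triples:
    print(triple)
```

The triples are computed at query time, so the relational data stays where it is; a real bridge would also need datatype handling and foreign keys mapped to links between subjects.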

Page 20: Some news about the SW

20

Linking Open Data Project

• Goal: “expose” open datasets in RDF
• Set RDF links among the data items from different datasets
• Set up query endpoints
• Altogether billions of triples, millions of links…

Page 21: Some news about the SW

21

Example data source: DBpedia

• DBpedia is a community effort to
  • extract structured (“infobox”) information from Wikipedia
  • provide a query endpoint to the dataset
  • interlink the DBpedia dataset with other datasets on the Web

Page 22: Some news about the SW

22

Extracting Wikipedia structured data

    @prefix dbpedia: <http://dbpedia.org/resource/> .
    @prefix dbterm: <http://dbpedia.org/property/> .

    dbpedia:Amsterdam
        dbterm:officialName "Amsterdam" ;
        dbterm:longd "4" ;
        dbterm:longm "53" ;
        dbterm:longs "32" ;
        ...
        dbterm:leaderTitle "Mayor" ;
        dbterm:leaderName dbpedia:Job_Cohen ;
        ...
        dbterm:areaTotalKm "219" ;
        ...

    dbpedia:ABN_AMRO
        dbterm:location dbpedia:Amsterdam ;
        ...

Page 23: Some news about the SW

23

Automatic links among open datasets

    <http://dbpedia.org/resource/Amsterdam>
        owl:sameAs <http://rdf.freebase.com/ns/...> ;
        owl:sameAs <http://sws.geonames.org/2759793> ;
        ...

    <http://sws.geonames.org/2759793>
        owl:sameAs <http://dbpedia.org/resource/Amsterdam> ;
        wgs84_pos:lat "52.3666667" ;
        wgs84_pos:long "4.8833333" ;
        geo:inCountry <http://www.geonames.org/countries/#NL> ;
        ...

Processors can switch automatically from one to the other…
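One way a processor might "switch" between datasets is to treat URIs connected by owl:sameAs as a single node and pool their statements. The following is a minimal sketch of that idea with plain tuples; the function names `find` and `merge_same_as` are hypothetical, and a real processor would work on a proper RDF graph.

```python
from collections import defaultdict

OWL_SAME_AS = "owl:sameAs"

# A few statements taken from the two datasets shown on the slide.
triples = [
    ("http://dbpedia.org/resource/Amsterdam", OWL_SAME_AS, "http://sws.geonames.org/2759793"),
    ("http://dbpedia.org/resource/Amsterdam", "dbterm:officialName", "Amsterdam"),
    ("http://sws.geonames.org/2759793", "wgs84_pos:lat", "52.3666667"),
    ("http://sws.geonames.org/2759793", "wgs84_pos:long", "4.8833333"),
]


def find(parent, node):
    """Return the representative URI of the equivalence class of `node`."""
    parent.setdefault(node, node)
    while parent[node] != node:
        node = parent[node]
    return node


def merge_same_as(triples):
    """Group all non-sameAs statements under one representative per class."""
    parent = {}
    for s, p, o in triples:
        if p == OWL_SAME_AS:
            # Link the class of the object to the class of the subject.
            parent[find(parent, o)] = find(parent, s)
    merged = defaultdict(set)
    for s, p, o in triples:
        if p != OWL_SAME_AS:
            merged[find(parent, s)].add((p, o))
    return dict(merged)


merged = merge_same_as(triples)
for subject, statements in merged.items():
    print(subject, sorted(statements))
```

After merging, the official name from DBpedia and the coordinates from Geonames are available under one subject.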

Page 24: Some news about the SW

24

The LOD “cloud”, March 2008

Page 25: Some news about the SW

25

The LOD “cloud”, September 2008

Page 26: Some news about the SW

26

The LOD “cloud”, March 2009

Page 27: Some news about the SW

27

Generate (meta)data from unstructured data

• An emerging approach:
  • use Natural Language Processing (NLP) to analyse text
  • services exist (Reuters’ Open Calais and Tagaroo, Zemanta)
  • these often return URI-s into, eg, DBpedia
• Use these techniques to, eg, automatically “tag” entries
  • eg: Twine, Faviki
  • the tag URI-s provide “integration points”

Page 28: Some news about the SW

28

Data may be extracted (a.k.a. “scraped”)

• Different tools, services, etc, come to the fore:
  • services to get RDF data from images’ XMP data, from Flickr…
  • scripts to convert spreadsheets to RDF
  • etc
• Many of these tools are still individual “hacks”, but show a general tendency
• Hopefully more tools will emerge
  • there is a separate wiki page collecting references to existing ones

Page 29: Some news about the SW

29

Getting structured data to RDF: GRDDL

• Access structured data in XML/XHTML and turn it into RDF:
  • defines XML attributes to bind a suitable script to transform (part of) the data into RDF
    • script is usually XSLT but not necessarily
    • has a variant for XHTML
  • a “GRDDL Processor” runs the script and produces RDF on-the-fly
• A way to access existing structured data and “bring” it to RDF
  • eg, a possible link to microformats
  • exposing data from large XML use bases, like XBRL

Page 30: Some news about the SW

30

Getting structured data to RDF: RDFa

• Extends XHTML with a set of attributes to include structured data into XHTML
• Makes it easy to “bring” existing RDF vocabularies into XHTML
  • uses namespaces for an easy mix of terminologies
• It can also be used with GRDDL
  • but: no need to implement a separate transformation per vocabulary
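As a rough illustration of how attribute-based markup carries structured data, the sketch below pulls statements out of a small XHTML fragment using only the `about`, `property`, and `content` attributes. It is a deliberately tiny subset, not a conforming RDFa processor: the class name `TinyRDFaExtractor` is made up, prefixes are not expanded, and the input is assumed to be well-formed XHTML.

```python
from html.parser import HTMLParser


class TinyRDFaExtractor(HTMLParser):
    """Collect (subject, property, value) triples from about/property/content."""

    def __init__(self):
        super().__init__()
        self.subjects = [""]   # stack of current subjects; "" means the document
        self.pending = None    # (subject, property, depth) waiting for text
        self.text = []
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # An element inherits its subject unless it sets a new one via @about.
        subject = attrs.get("about", self.subjects[-1])
        self.subjects.append(subject)
        if "property" in attrs:
            if "content" in attrs:
                self.triples.append((subject, attrs["property"], attrs["content"]))
            else:
                # The literal value is the element's text content.
                self.pending = (subject, attrs["property"], len(self.subjects))
                self.text = []

    def handle_data(self, data):
        if self.pending:
            self.text.append(data)

    def handle_endtag(self, tag):
        if self.pending and self.pending[2] == len(self.subjects):
            subject, prop, _ = self.pending
            self.triples.append((subject, prop, "".join(self.text).strip()))
            self.pending = None
        self.subjects.pop()


XHTML = """
<div xmlns:dc="http://purl.org/dc/elements/1.1/"
     about="http://www.w3.org/2009/Talks/05-Oz-StateOfSW-IH/">
  <span property="dc:title">State of the Semantic Web</span>
  <span property="dc:creator">Ivan Herman</span>
  <meta property="dc:date" content="2009-05" />
</div>
"""

extractor = TinyRDFaExtractor()
extractor.feed(XHTML)
for triple in extractor.triples:
    print(triple)
```

Because the vocabulary terms sit directly in the attributes, the same extractor works for any vocabulary, which is the point of the last bullet above.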

Page 31: Some news about the SW

31

How to “assign” RDF data to resources?

• This is important when the RDF data is used as “metadata”
• Some examples:
  • copyright information for your photographs
  • is a Web page usable on a mobile phone and how?
  • bibliographical data for a publication
  • annotation of the data resulting from a scientific experiment
  • etc
• The issue: if I have the URI of the resource (photograph, publication, etc), how do I find the relevant RDF data?

Page 32: Some news about the SW

32

The data might be embedded

• Some data formats allow the direct inclusion of (RDF) metadata:
  • SVG (Scalable Vector Graphics)
  • XHTML+RDFa
  • microformats+GRDDL
  • JPG files using the comment area and, eg, Adobe’s XMP technology
• That can include all the information, or link to further data

Page 33: Some news about the SW

33

POWDER

• POWDER (Protocol for Web Description Resources) provides for more elaborate scenarios
• Lets you define predicates that are automatically “assigned” to a set of resources

Page 34: Some news about the SW

34

POWDER scenario: copyright for photos

Page 35: Some news about the SW

35

Some technical details…

• The “description resource” is an XML file
• This XML file has a canonical conversion to OWL
• Specialized POWDER services will be set up:
  – give the URI of a Resource and the corresponding description resource, return all RDF statements on that URI
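The service described above can be pictured as a lookup: given a resource URI and a description resource, return the statements that apply. The sketch below uses a Python dictionary as a simplified stand-in for the XML description resource; the host `photos.example.org`, the path prefix, the licence URI, and the function name `describe` are all hypothetical and do not reflect the actual POWDER syntax.

```python
from urllib.parse import urlparse

# Hypothetical, simplified stand-in for a POWDER description resource:
# a set of resources (selected by host and path prefix) plus the
# statements that are "assigned" to every member of that set.
DESCRIPTION_RESOURCES = [
    {
        "hosts": {"photos.example.org"},
        "path_prefix": "/2009/",
        "statements": [("dc:rights", "http://example.org/licenses/my-license")],
    },
]


def describe(uri, description_resources=DESCRIPTION_RESOURCES):
    """Return all (subject, predicate, object) statements that apply to `uri`."""
    parsed = urlparse(uri)
    result = []
    for resource_set in description_resources:
        in_set = (
            parsed.hostname in resource_set["hosts"]
            and parsed.path.startswith(resource_set["path_prefix"])
        )
        if in_set:
            result.extend((uri, p, o) for p, o in resource_set["statements"])
    return result


print(describe("http://photos.example.org/2009/sydney.jpg"))
print(describe("http://photos.example.org/2008/vienna.jpg"))
```

The copyright statement is written once and applies to every photo under the prefix, rather than being embedded in each file.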

Page 36: Some news about the SW

36

Simple Knowledge Organization System

• Goal: represent and share classifications, glossaries, thesauri, etc, as developed in the “Print World”.
  • for example:
    • Dewey Decimal Classification, Art and Architecture Thesaurus, ACM classification of keywords and terms…
  • allow for a quick port of this traditional data, combine it with other data
• This is where SKOS comes in: define classes and properties to add those structures to an RDF universe

Page 37: Some news about the SW

37

Example: entries in a glossary (from the RDF Semantics Glossary)

Assertion: (i) Any expression which is claimed to be true. (ii) The act of claiming something to be true.

Class: A general concept, category or classification. Something used primarily to classify or categorize other things.

Resource: (i) An entity; anything in the universe. (ii) As a class name: the class of everything; the most inclusive category possible.

Page 38: Some news about the SW

38

Example: entries in a glossary in SKOS

Page 39: Some news about the SW

39

A more complex structure (using LCSH terms)

Page 40: Some news about the SW

40

SKOS and digital libraries

• SKOS plays an important role in “bridging” to digital libraries
  • a huge community with its own traditions, style…
  • … but a huge amount of data to be “linked” to the Semantic Web!
• Major library metadata standards are being redefined in terms of RDF (and SKOS)
  • eg, “Resource Description and Access” (RDA)
    • a major cataloguing rule set for librarians
    • potentially, all major library catalogues around the globe could be translated into RDF and, eg, linked as Linked Open Data…

Page 41: Some news about the SW

41

Conclusions on data access

• There are many different data sources around
• Making them available on the Web and interlinking them is essential
  • “Give your raw data” (Tim Berners-Lee)
• There are a number of technologies to do that:
  • mapping from databases, GRDDL, RDFa, SKOS, POWDER, conversion tools

Page 42: Some news about the SW

42

Querying Data

Page 43: Some news about the SW

43

Querying RDF: SPARQL

• Is a W3C Standard since January 2008
  • it has already become one of the absolutely essential technologies on the SW
• SPARQL is
  • a query language based on graph patterns
  • a protocol layer to use SPARQL over, eg, HTTP
  • an XML return format for the query results
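To show what "based on graph patterns" means, the sketch below matches a list of triple patterns against a small in-memory graph and returns the variable bindings. It only illustrates the idea of basic graph pattern matching and is not a SPARQL implementation; the function names and the `example.org` person are assumptions made up for the example.

```python
def match_pattern(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`.

    Terms starting with "?" are variables. Returns the extended binding,
    or None when the pattern and the triple are incompatible.
    """
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def query(patterns, triples):
    """Return every binding that satisfies all triple patterns at once."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return bindings


ME = "<http://www.ivan-herman.net/me>"
FRIEND = "<http://example.org/people/a>"  # hypothetical person
triples = [
    (ME, "foaf:name", "Ivan Herman"),
    (ME, "foaf:knows", FRIEND),
    (FRIEND, "foaf:name", "A. Person"),
]

# Roughly: SELECT ?person ?name WHERE { <me> foaf:knows ?person . ?person foaf:name ?name }
results = query([(ME, "foaf:knows", "?person"), ("?person", "foaf:name", "?name")], triples)
print(results)
```

The shared variable `?person` is what joins the two patterns, which is how a graph pattern expresses a query without naming tables or columns.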

Page 44: Some news about the SW

44

SPARQL as a unifying point!

Page 45: Some news about the SW

45

New SPARQL WG: Goals

• To define a small set of extensions to SPARQL
• No complex change, backward compatibility
• Listen to user and implementation experiences of the past few years
• Group started in February 2009

Page 46: Some news about the SW

46

Planned features

• Update, ie, ability to change the RDF store
• Service description framework
  • what type of extensions, inference possibilities, etc, are available at the endpoint
• Addition to the query language
  • aggregate functions
  • subqueries
  • negation
  • project expressions

Page 47: Some news about the SW

47

Planned features (tentative syntax examples)

• Aggregate functions and project expressions:

    SELECT AVG(?age) AS average_age WHERE { .... }
    SELECT (?age < 18) AS minor WHERE { ... }

• Subqueries:

    SELECT ?person (SELECT ?n WHERE { ?person foaf:name ?n } LIMIT 1)
    WHERE { <http://www.ivan-herman.net/me> foaf:knows ?person. }

• Negation:

    SELECT *
    WHERE { ?x :p ?v. UNSAID { ?x :q ?v. } }
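Since the syntax above is tentative, it may help to spell out what the aggregate, project-expression, and negation queries are meant to compute. The sketch below does so over a handful of made-up triples (the `:alice` and `:bob` data are assumptions, not from the slides) using ordinary Python rather than a SPARQL engine.

```python
# Illustrative data only; the resources and values are hypothetical.
triples = {
    (":alice", ":age", 30),
    (":bob", ":age", 12),
    (":alice", ":p", "v1"),
    (":alice", ":q", "v1"),
    (":bob", ":p", "v2"),
}

# Aggregate function: the intent of SELECT AVG(?age) AS average_age
ages = [o for s, p, o in triples if p == ":age"]
average_age = sum(ages) / len(ages)

# Project expression: the intent of SELECT (?age < 18) AS minor
minors = {s: o < 18 for s, p, o in triples if p == ":age"}

# Negation: keep matches of "?x :p ?v" for which "?x :q ?v" is NOT in the graph
unsaid = sorted(
    (x, v) for x, p, v in triples if p == ":p" and (x, ":q", v) not in triples
)

print(average_age)
print(minors)
print(unsaid)
```

In each case the new feature moves work that clients previously did after the query (averaging, computing a flag, filtering out matches) into the query itself.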

Page 48: Some news about the SW

48

Possible features (time permitting)

• Definition of “entailment regimes”
  • RDFS, OWL Profiles, RIF
• Property paths
• Commonly used functions (eg, string manipulation)
• Basic control for federated queries
• Additional query language syntax
  • commas in select lists, some operators in filters

Page 49: Some news about the SW

49

Ontologies (OWL)

Page 50: Some news about the SW

50

Ontologies: OWL

• This is also a stable specification since 2004
• Separate layers have been defined, balancing expressibility vs. implementability (OWL-Lite, OWL-DL, OWL-Full)
• Looking at the tool list on W3C’s wiki again:
  • a number of programming environments include OWL reasoners
  • stand-alone reasoners (downloadable or on the Web)
  • ontology editors come to the fore

Page 51: Some news about the SW

51

Ontologies

• Large ontologies are being developed (converted from other formats or defined in OWL). For example:
  • eClassOwl: eBusiness ontology for products and services, 75,000 classes and 5,500 properties
  • National Cancer Institute’s ontology: about 58,000 classes
  • Open Biomedical Ontologies Foundry: a collection of ontologies, including the Gene Ontology, to describe gene and gene product attributes; or UniProt for protein sequence and annotation terminology and data
  • BioPAX: for biological pathway data
  • ISO 15926: “Integration of life-cycle data for process plants including oil and gas production facilities”

Page 52: Some news about the SW

52

OWL in applications

• An increasing number of applications rely on OWL (Pfizer, NASA, Eli Lilly, Elsevier, FAO, …)
• Not all use complex reasoning; in many cases a small fraction of OWL is used

Page 53: Some news about the SW

53

OWL Working Group

• A new Working Group works on the revision of OWL (a.k.a. OWL 2)
• The goal of the group:
  1. add a few extensions to current OWL that are useful and are known to be implementable
    • many things happened in research since 2004
  2. define “profiles” of OWL that are:
    • smaller, easier to implement and deploy
    • cover important application areas and are easily understandable to non-expert users

Page 54: Some news about the SW

54

Some new features in OWL 2

• Syntactic sugars
  – eg, disjoint union of classes
• New constructs for properties
  – property chains, reflexive properties
• Extended datatype facilities
  – define a numerical interval as an OWL Datatype class
• Profiles

Page 55: Some news about the SW

55

The overall structure has not changed

Page 56: Some news about the SW

56

Profiles

• OWL 2 has the same duality with Full and DL
• But, for a number of applications, even OWL Lite is too much
• There is a need for “light” versions of OWL: just a few extra possibilities added to RDFS

Page 57: Some news about the SW

57

OWL 2 defines “profiles”

• Further restrictions on how terms can be used and what inferences can be expected
• The semantic approaches are identical, but restrictions may ensure even more manageable implementations

Page 58: Some news about the SW

58

OWL 2 profiles

• Classification and instance queries in polynomial time: OWL-EL
• Implementable on top of conventional relational database engines: OWL-QL
• Implementable on top of traditional rule engines: OWL-RL

Page 59: Some news about the SW

59

An example: OWL-RL

• Goal: to be implementable through rule engines
• Usage follows a similar approach to RDFS:
  − merge the ontology and the instance data into a big RDF graph
  − use the rule engine to add new triples (as long as it is possible)
  − then, for example, use SPARQL to query the resulting (expanded) graph
• This application model is very important for RDF based applications
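The three steps above (merge, expand with rules until nothing new appears, then query) can be sketched with a toy forward-chaining loop. Only three rules are included here (subclass transitivity, type inheritance, and symmetric properties), which is a small illustrative subset and not the OWL-RL rule set; the class names, `:adjacentTo`, `:x`, and `:y` are hypothetical.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SYMMETRIC = "owl:SymmetricProperty"


def expand(graph):
    """Apply the rules repeatedly until no new triple can be added."""
    graph = set(graph)
    while True:
        inferred = set()
        for s, p, o in graph:
            if p == SUBCLASS:
                for s2, p2, o2 in graph:
                    # (s subClassOf o), (o subClassOf o2) => (s subClassOf o2)
                    if p2 == SUBCLASS and s2 == o:
                        inferred.add((s, SUBCLASS, o2))
                    # (s2 type s), (s subClassOf o) => (s2 type o)
                    if p2 == TYPE and o2 == s:
                        inferred.add((s2, TYPE, o))
            elif p == TYPE and o == SYMMETRIC:
                # (s type SymmetricProperty), (a s b) => (b s a)
                for s2, p2, o2 in graph:
                    if p2 == s:
                        inferred.add((o2, s, s2))
        if inferred <= graph:
            return graph
        graph |= inferred


ontology = {
    (":Capital", SUBCLASS, ":City"),
    (":City", SUBCLASS, ":Place"),
    (":adjacentTo", TYPE, SYMMETRIC),
}
instances = {
    (":Amsterdam", TYPE, ":Capital"),
    (":x", ":adjacentTo", ":y"),
}

# Step 1: merge; step 2: expand; step 3: query the expanded graph.
expanded = expand(ontology | instances)
places = sorted(s for s, p, o in expanded if p == TYPE and o == ":Place")
print(places)
```

The query in the last step is a plain lookup: all the reasoning has already been materialised as triples, which is why this model fits on top of existing rule engines and triple stores.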

Page 60: Some news about the SW

60

Miscellaneous

Page 61: Some news about the SW

61

Everything has not been solved…

• There are a number of issues, problems
  • missing functionalities: encryption/signatures, fuzzy reasoning, …
  • misconceptions, messaging problems
  • need for more applications, deployment, acceptance
  • incorporation of rule languages (that is being worked on by the RIF Working Group)
  • etc

Page 62: Some news about the SW

62

Other items…

• Security, trust, provenance
  • combining cryptographic techniques with the RDF model, sign a portion of the graph, etc
  • trust models
• Quality constraints on graphs
  • “may I be sure that certain patterns are present in a graph?”
• Ontology merging, alignment, term equivalences, versioning, development, …
• What does reasoning mean on billions of triples?
• etc

Page 63: Some news about the SW

63

Other items: uncertainty

• Fuzzy logic
  • look at alternatives of DL based on fuzzy logic
  • alternatively, extend RDF(S) with fuzzy notions
• Probabilistic statements
  • have an OWL class membership with a specific probability
  • combine reasoners with Bayesian networks
• A W3C Incubator Group issued a report on the current status, possibilities, directions, etc
  • report published in April 2008

Page 64: Some news about the SW

64

Other items: naming

• The SW infrastructure relies on unique naming of “things” via URI-s
• Lots of discussions are happening that also touch upon general Web architecture:
  • HTTP URI-s or other URN-s?
    • using non-HTTP unnecessarily complicates the general infrastructure
  • URI-s for “informational resources” and “non informational resources”
  • how to ensure that URI-s used on the SW are dereferenceable
  • etc

Page 65: Some news about the SW

65

Other items: naming (cont)

• A different aspect of naming: what is the URI for a specific entity (regardless of the technical details)
  • what is the unique URI for, eg, Bach’s Well-Tempered Clavier?
    • obviously important for, eg, music ontologies and data
    • who has the authority or the means to define and maintain such URI-s?
    • should we define characterizing properties for these and use owl:sameAs instead of a URI?
    • the traditional library community may be of a big help in this area
  • what is the URI of a time-dependent entity (e.g., a specific point within a video)?

Page 66: Some news about the SW

66

Revision of the RDF model?

• Some restrictions in RDF may be unnecessary (bNodes as predicates, literals as subject, …)
• Issue of “named graph”: possibility to give a URI to a set of triples and make statements on those
• Syntax issues in RDF/XML
• Add a time tag to statements?
• …

Page 67: Some news about the SW

67

A major problem: messaging

• Some of the messaging on Semantic Web has gone terribly wrong over the years
• This has created lots of (unnecessary) controversies
• The whole community should be active in rectifying those…

Page 68: Some news about the SW

68

Thank you for your attention!

These slides are also available on the Web:

http://www.w3.org/2009/Talks/05-Oz-StateOfSW-IH/