Linking the Open Data? Petko Valtchev (Assoc. Prof., Dept. of CS, UQAM) ODX’13 Montreal, April 6th


Slides presented at Open Data Exchange 2013, April 6 2013, Montreal, Canada. ODX13.com. Sponsored by Trudat.co

Page 1: Linking the Open Data? by Petko Valtchev

Linking the Open Data?

Petko Valtchev (Assoc. Prof., Dept. of CS, UQAM)

ODX’13, Montreal, April 6th

Page 2: Linking the Open Data? by Petko Valtchev

Why Link The Data

“I want you to put your data on the Web.”

Sir T. Berners-Lee (TED’09)

• Original Web (1990s):

• network of linked documents

• Web of Data (2000s):

• network of interlinked data items

• Linked Open Data: publish data on the Web for

• maximal reuse and interconnection, minimal redundancy, and a network effect

Data becomes really useful when it is shared and combined with other data.

Page 3: Linking the Open Data? by Petko Valtchev

Linking Data?

• But how should one produce such data?

1. Global identification: every data item should be identified by a URL.

2. Reachability via HTTP: accessing the URL should retrieve the data item.

3. Linked structure: outgoing (typed!) links in the data should point to additional data via URLs.

• THE language: Resource Description Framework (RDF)

• Benefits: links provide context

http://www.w3.org/DesignIssues/LinkedData.html
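
The three principles above can be sketched in a few lines of Python. This is a minimal illustration, not an RDF library: a data item is a (subject, predicate, object) triple, and compact names are expanded into full URLs. The `foaf`, `rdf` and `dbpedia` namespaces are the published ones; the `pd` prefix and its `http://example.org/people/` URL are placeholders assumed for illustration, as is the helper name `expand`.

```python
# Namespace prefixes. foaf, rdf and dbpedia are the published namespaces;
# "pd" is a hypothetical prefix standing in for a personal-data site.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dbpedia": "http://dbpedia.org/resource/",
    "pd": "http://example.org/people/",  # assumption, not from the slides
}


def expand(name: str) -> str:
    """Expand a compact name such as 'foaf:name' into a full URL."""
    prefix, _, local = name.partition(":")
    return PREFIXES[prefix] + local


# A typed link: the predicate says *how* the two items are related.
triple = ("pd:tedstr", "foaf:based_near", "dbpedia:Montreal")
subject_url, predicate_url, object_url = (expand(term) for term in triple)

print(subject_url)    # globally unique identifier of the data item
print(predicate_url)  # the type of the link
print(object_url)     # URL under which more data about the target is published
```

The sketch covers principles 1 and 3 only. Principle 2 would correspond to an HTTP GET on `object_url`, which is deliberately left out so that the example runs offline.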

Page 4: Linking the Open Data? by Petko Valtchev

A Graph?

[Figure: RDF graph with the following edges]

• pd:tedstr rdf:type foaf:Person

• pd:tedstr foaf:name "Ted Strauss"

• pd:tedstr foaf:based_near dbpedia:Montreal

• dbpedia:Montreal dpprop:population 3,407,963

Page 5: Linking the Open Data? by Petko Valtchev

A Graph?

[Figure: RDF graph with the following edges]

• pd:tedstr rdf:type foaf:Person

• pd:tedstr foaf:name "Ted Strauss"

• pd:tedstr foaf:based_near dbpedia:Montreal

• dbpedia:Montreal dpprop:population 3,407,963

• dbpedia:Montreal dbpedia-owl:country dbpedia:Canada

Page 6: Linking the Open Data? by Petko Valtchev

A Graph? Global?

[Figure: RDF graph with the following edges]

• pd:tedstr rdf:type foaf:Person

• pd:tedstr foaf:name "Ted Strauss"

• pd:tedstr foaf:based_near dbpedia:Montreal

• dbpedia:Montreal dpprop:population 3,407,963

• dbpedia:Montreal dbpedia-owl:country dbpedia:Canada

• pd:linguo rdf:type foaf:Person

• pd:linguo foaf:name "Linkun Guo"

• pd:linguo foaf:based_near dbpedia:Beijing

• dbpedia:Beijing dpprop:population 20,693,000

• foaf:knows links pd:tedstr and pd:linguo

Page 7: Linking the Open Data? by Petko Valtchev

A Graph? Global? Giant?

[Figure: RDF graph with the following edges]

• pd:tedstr rdf:type foaf:Person

• pd:tedstr foaf:name "Ted Strauss"

• pd:tedstr foaf:based_near dbpedia:Montreal

• dbpedia:Montreal dpprop:population 3,407,963

• dbpedia:Montreal dbpedia-owl:country dbpedia:Canada

• pd:linguo rdf:type foaf:Person

• pd:linguo foaf:name "Linkun Guo"

• pd:linguo foaf:based_near dbpedia:Beijing

• dbpedia:Beijing dpprop:population 20,693,000

• foaf:knows links pd:tedstr and pd:linguo

• dbpedia:Toronto dbpedia-owl:country dbpedia:Canada

• dbpedia:Quebec dbpedia-owl:country dbpedia:Canada
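
The way the graph grows from slide to slide can be illustrated with plain Python sets. This is only a sketch under stated assumptions: the split into a `people` dataset and a `places` dataset is hypothetical, and the helper names `objects` and `countries_of` are invented for the example. The point is that two independently published sets of triples merge by simple union, because both use the same identifiers for the same things.

```python
# Two independently published sets of (subject, predicate, object) triples.
# The split into two datasets is an assumption made for illustration.
people = {
    ("pd:tedstr", "rdf:type", "foaf:Person"),
    ("pd:tedstr", "foaf:name", "Ted Strauss"),
    ("pd:tedstr", "foaf:based_near", "dbpedia:Montreal"),
    ("pd:linguo", "rdf:type", "foaf:Person"),
    ("pd:linguo", "foaf:name", "Linkun Guo"),
    ("pd:linguo", "foaf:based_near", "dbpedia:Beijing"),
}
places = {
    ("dbpedia:Montreal", "dbpedia-owl:country", "dbpedia:Canada"),
    ("dbpedia:Toronto", "dbpedia-owl:country", "dbpedia:Canada"),
    ("dbpedia:Quebec", "dbpedia-owl:country", "dbpedia:Canada"),
}

# Merging is a plain set union: shared identifiers join the two graphs.
graph = people | places


def objects(graph, subject, predicate):
    """Return every object linked from `subject` by `predicate`."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def countries_of(graph, person):
    """Follow the two-step path person -> based_near -> country."""
    return {
        country
        for city in objects(graph, person, "foaf:based_near")
        for country in objects(graph, city, "dbpedia-owl:country")
    }


print(countries_of(graph, "pd:tedstr"))  # {'dbpedia:Canada'}
print(countries_of(graph, "pd:linguo"))  # set(): no country triple for Beijing here
```

Neither dataset alone can answer "which country is Ted based in?"; the answer only appears once the two are merged.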

Page 8: Linking the Open Data? by Petko Valtchev

How is it Open?

• “If you want to start interlinking data then you can only do that if the data is licensed in a way that allows such interlinking.” (Rufus Pollock)

• But why is Open data on the Web not “linked”?

• CSV, XML, RDBs: no easy integration

• Web 2.0 mashups? Data sources are fixed

• Linked Open Data (LOD) cloud: a global data space

Page 9: Linking the Open Data? by Petko Valtchev

The LOD cloud family picture

Sept. 2011

Page 10: Linking the Open Data? by Petko Valtchev

What for?

• Linking Open Drug Data (LODD), since 2008

• Publish/interlink publicly available data about drugs

• Provide answers to non-trivial questions on the LODD

• For physicians

• Which drugs are equivalent for a given condition?

• What drugs are currently under clinical trial?

• For patients

• What alternatives exist to a given drug?

• What are the contraindications for a drug?

Page 11: Linking the Open Data? by Petko Valtchev

Supplemental Slides

Petko Valtchev

(Assoc. Prof., Dept. of CS, UQAM)

ODX’13

Montreal, April 6th

Page 12: Linking the Open Data? by Petko Valtchev

Main Entry Points into the LOD cloud

• DBpedia - a large multi-domain dataset extracted from Wikipedia; it contains about 3.77M concepts and 400M+ facts, with abstracts in 11 different languages.

• YAGO - precise knowledge base with 1.7M entities and 15M facts derived from Wikipedia and WordNet.

• FOAF (Friend Of A Friend) - describes people, the links between them and the things they create and do.

• GoodRelations - a vocabulary for eCommerce, enabling web sites to publish details of their products and services in a machine-readable way.

• GeoNames - provides RDF descriptions of more than 6.5M geographical features worldwide.

Page 13: Linking the Open Data? by Petko Valtchev

Cross-Media Cultural Heritage Management with LOD

• Simon is a maths student visiting Montreal. He is fond of reading, cinema, music and history. His friends recommended the flourishing Mile End district, where many cafés serve espresso and European pastries.

• Once settled in a bar, he opens his iPad to see what is exciting about the surroundings. Knowing his preferences, the mobile app suggests an excerpt from "The Apprenticeship of Duddy Kravitz", a novel by the local "enfant du quartier", Mordecai Richler. The excerpt describes the life of the Jewish community in the 1930s on two of the area's principal streets, St. Urbain St. and "The Main".

• Once finished, Simon feels intrigued and accepts the suggestion to go for a short walk looking for remains from that period. While sipping his coffee, Simon checks the author's biography and finds that he has written another book, "Barney's Version".

• After he skims a summary, the app suggests the eponymous film directed by Richard J. Lewis. While watching the trailer, he notices the youthful red-haired actress playing the first wife of the main character; after querying the app’s knowledge base, he learns that she is Rachelle Lefevre, who was born in Montreal.

• Before walking out, he checks the availability of a copy of "Barney's Version" and discovers that he can find one in the local municipal library.

• When he is on the go, the system plays "I'm Your Man", a song by Leonard Cohen, another literary celebrity from Montreal.

Page 14: Linking the Open Data? by Petko Valtchev

The Semantic Annotations: RDFa

• RDFa serializes RDF through HTML attributes

• similar to microformats

• @resource, @property, @href, @typeof, @rel, etc.
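
To make the idea concrete, the sketch below embeds a small RDFa-annotated HTML fragment in a Python string and pulls triples out of it with the standard-library `html.parser`. It is a toy that handles only the attribute combinations appearing in the fragment, not a conformant RDFa processor. The page fragment, the `http://example.org/people/tedstr` URL and the class name `RdfaSketch` are assumptions made for illustration; typing uses the `typeof` attribute defined in the RDFa Recommendation.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; the example.org URL is a placeholder.
PAGE = """
<div resource="http://example.org/people/tedstr" typeof="foaf:Person">
  <span property="foaf:name">Ted Strauss</span>
  <a rel="foaf:based_near" href="http://dbpedia.org/resource/Montreal">Montreal</a>
</div>
"""


class RdfaSketch(HTMLParser):
    """Collects triples from a small subset of RDFa attributes."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self.subject = None           # resource currently being described
        self.pending_property = None  # property waiting for its text value

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "resource" in a and "typeof" in a:
            # New subject, plus an rdf:type triple for it.
            self.subject = a["resource"]
            self.triples.append((self.subject, "rdf:type", a["typeof"]))
        if "rel" in a and "href" in a:
            # Typed link from the current subject to another resource.
            self.triples.append((self.subject, a["rel"], a["href"]))
        if "property" in a:
            # Literal value: taken from the element's text content.
            self.pending_property = a["property"]

    def handle_data(self, data):
        text = data.strip()
        if self.pending_property and text:
            self.triples.append((self.subject, self.pending_property, text))
            self.pending_property = None


parser = RdfaSketch()
parser.feed(PAGE)
for triple in parser.triples:
    print(triple)
```

The same HTML still renders as an ordinary page for human readers; the attributes only add a machine-readable layer on top of it.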

Page 15: Linking the Open Data? by Petko Valtchev

Cool applications of semantic annotations

• Semantic query answering:

• Where do my colleagues live?

• Possible answers from their own web pages (via Trudat HP)

• dbpedia:Montreal

• dbpedia:Laval

• dbpedia:Toronto

• What are their dietary restrictions?
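
A question such as "Where do my colleagues live?" amounts to matching triple patterns against the harvested data. The sketch below assumes the annotations have already been collected into a list of triples; the `ex:` identifiers, the `ex:colleagueOf` predicate and the helper name `match` are hypothetical and only serve the illustration. A real deployment would more likely use a SPARQL engine.

```python
# Hypothetical triples, as they might be harvested from colleagues' web pages.
# All ex: identifiers and the ex:colleagueOf predicate are placeholders.
triples = [
    ("ex:me", "ex:colleagueOf", "ex:alice"),
    ("ex:me", "ex:colleagueOf", "ex:bob"),
    ("ex:me", "ex:colleagueOf", "ex:carol"),
    ("ex:alice", "foaf:based_near", "dbpedia:Montreal"),
    ("ex:bob", "foaf:based_near", "dbpedia:Laval"),
    ("ex:carol", "foaf:based_near", "dbpedia:Toronto"),
    ("ex:dave", "foaf:based_near", "dbpedia:Quebec"),  # not a colleague
]


def match(triples, pattern):
    """Yield the triples that fit `pattern`; None acts as a wildcard."""
    for triple in triples:
        if all(want is None or want == got for want, got in zip(pattern, triple)):
            yield triple


# Step 1: who are my colleagues?
colleagues = [o for _, _, o in match(triples, ("ex:me", "ex:colleagueOf", None))]

# Step 2: where is each of them based?
homes = {
    person: city
    for person in colleagues
    for _, _, city in match(triples, (person, "foaf:based_near", None))
}

print(homes)
```

The dietary-restrictions question would follow the same two-step pattern with a different predicate in the second step.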

Page 16: Linking the Open Data? by Petko Valtchev

Practical take on OD vs LOD

• OD for social justice in the US (say, Atlanta)?

• Dataset 1: census data

• Focus on a particular area, with houses distinguished as

• inhabited by black people vs white people

• Dataset 2: water supply data (houses connected to water lines or not)

• By superposing datasets 1 and 2, the analysis uncovered discrimination

• ~83% of the unconnected houses were inhabited by black people!

• How was it done (a guess)?

• by matching addresses compared as strings :-(

• LOD format: simpler and more reliable processing

• finding paths in the graph
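
The fragility of string matching can be shown with a toy join. All records below are invented for illustration (the addresses, the `ex:house/...` identifiers and the helper name `join` are assumptions, not data from the study mentioned above). The same two houses appear in both datasets, but their addresses are spelled differently, so an exact string join finds nothing, while a join on shared identifiers finds both.

```python
# Hypothetical records; addresses and identifiers are invented.
census_rows = [
    {"address": "12 Oak Street", "house": "ex:house/1"},
    {"address": "48 Pine Ave.", "house": "ex:house/2"},
]
water_rows = [
    {"address": "12 Oak St.", "house": "ex:house/1", "connected": False},
    {"address": "48 Pine Avenue", "house": "ex:house/2", "connected": True},
]


def join(left, right, key):
    """Pair up rows whose values under `key` are exactly equal."""
    index = {row[key]: row for row in right}
    return [(row, index[row[key]]) for row in left if row[key] in index]


by_string = join(census_rows, water_rows, "address")
by_identifier = join(census_rows, water_rows, "house")

print(len(by_string))      # 0: spelling variants defeat exact matching
print(len(by_identifier))  # 2: shared identifiers line up directly
```

In practice string matching is usually rescued with normalisation or fuzzy comparison, which adds work and room for error; shared URLs make that step unnecessary.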

Page 17: Linking the Open Data? by Petko Valtchev

Data about the Data

• Reasoning about the dataset:

• Metadata:

• e.g. Dublin Core vocabulary

• Notion of provenance

• The problem of trust: anybody can publish anything
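
One simple way to act on such metadata is to describe each dataset with Dublin Core terms and keep only the datasets whose publisher is on a trusted list. The sketch below is only an assumption about how this could look: `dcterms:publisher` and `dcterms:issued` are real Dublin Core terms, whereas the dataset and organisation identifiers, the trust list and the helper name `trusted_datasets` are hypothetical.

```python
# Dublin Core terms namespace (published); everything under ex: is invented.
DCTERMS = "http://purl.org/dc/terms/"

metadata = [
    ("ex:dataset/census", DCTERMS + "publisher", "ex:org/statistics-office"),
    ("ex:dataset/census", DCTERMS + "issued", "2011-09-01"),
    ("ex:dataset/rumours", DCTERMS + "publisher", "ex:org/unknown-blog"),
]

# Hypothetical list of publishers the consumer has decided to trust.
trusted_publishers = {"ex:org/statistics-office"}


def trusted_datasets(metadata, trusted):
    """Return the datasets whose dcterms:publisher is in `trusted`."""
    return sorted(
        {
            subject
            for subject, predicate, obj in metadata
            if predicate == DCTERMS + "publisher" and obj in trusted
        }
    )


print(trusted_datasets(metadata, trusted_publishers))  # ['ex:dataset/census']
```

A publisher whitelist is a crude trust model; richer provenance descriptions would also record how and from what sources a dataset was derived.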