Linked Data & Semantic Web Technology Linked Data Usecases Dr. Myungjin Lee

Linked Data Usecases

DESCRIPTION

This slide deck describes use cases of Linked Data.

Page 1: Linked Data Usecases

Linked Data & Semantic Web Technology

Linked Data Usecases

Dr. Myungjin Lee

Page 2: Linked Data Usecases

Agenda

• Introduction of the Linked Data

• Linked Data for Cross-Domain

• Linked Geographic Data

• Linked Government Data

• Linked Media Data

• Linked Data for User Generated Content

• Linked Publication Data

• Linked Life Science Data

Page 3: Linked Data Usecases

Introduction of the Linked Data

Page 4: Linked Data Usecases

What is Linked Data?

• a method of publishing structured data so that data can be interlinked and become more useful

• based on standard Web technologies such as HTTP, RDF and URIs.

• to share information in a way that can be read automatically by computers.

Page 5: Linked Data Usecases

Stack and Requirements for Linked Data

• XML: an elemental syntax for content structure within documents

• RDF: a simple language for expressing data models, which refer to objects ("resources") and their relationships

• RDF Schema: a vocabulary for describing properties and classes of RDF-based resources

• SPARQL: a protocol and query language for semantic web data sources

• URI: a string of characters used to identify a name or a resource

Page 6: Linked Data Usecases

Four Principles of Linked Data

1. Use URIs to identify things.

2. Use HTTP URIs so that these things can be referred to and looked up ("dereferenced") by people and user agents.

3. Provide useful information about the thing when its URI is dereferenced, using standard formats such as RDF/XML.

4. Include links to other, related URIs in the exposed data to improve discovery of other related information on the Web.
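Principles 2 and 3 are usually met through HTTP content negotiation: the client asks for an RDF serialization in the `Accept` header when it dereferences a URI. The following is a minimal sketch of the client side only. The function name, the preference order of media types, and the example URI are assumptions made for illustration; the request is built but never sent.

```python
from urllib.request import Request

# Media types a Linked Data client might ask for, in order of preference.
# The ordering and q-values are illustrative assumptions.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9, */*;q=0.1"


def build_dereference_request(uri: str) -> Request:
    """Build (but do not send) an HTTP GET request that asks for RDF."""
    if not uri.startswith(("http://", "https://")):
        # Principle 2: things should be named with HTTP URIs.
        raise ValueError("not an HTTP URI: " + uri)
    return Request(uri, headers={"Accept": RDF_ACCEPT})


request = build_dereference_request("http://dbpedia.org/resource/Seoul")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the actual lookup; what comes back depends on the server.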

Page 7: Linked Data Usecases

5 Star Linked Data

★ Available on the web (whatever format) but with an open licence, to be Open Data

★★ Available as machine-readable structured data (e.g. Excel instead of an image scan of a table)

★★★ As (2), plus a non-proprietary format (e.g. CSV instead of Excel)

★★★★ All the above, plus: use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff

★★★★★ All the above, plus: Link your data to other people’s data to provide context
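Because each star builds on the ones before it, the rating can be read as a cumulative checklist. The sketch below encodes that reading; the `Dataset` class and its field names are hypothetical and only mirror the five criteria listed above.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    # Hypothetical flags, one per star criterion.
    on_web_open_licence: bool      # star 1
    machine_readable: bool         # star 2
    non_proprietary_format: bool   # star 3
    uses_w3c_standards: bool       # star 4 (RDF and SPARQL)
    links_to_other_data: bool      # star 5


def star_rating(d: Dataset) -> int:
    """Count stars cumulatively: a star only counts if all earlier ones hold."""
    stars = 0
    for criterion in (d.on_web_open_licence, d.machine_readable,
                      d.non_proprietary_format, d.uses_w3c_standards,
                      d.links_to_other_data):
        if not criterion:
            break
        stars += 1
    return stars


csv_on_the_web = Dataset(True, True, True, False, False)
print(star_rating(csv_on_the_web))  # 3
```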

Page 8: Linked Data Usecases

The Linking Open Data cloud diagram

Page 9: Linked Data Usecases

[Legend of the Linking Open Data cloud diagram: Media, User Generated Content, Publications, Government, Geographic, Cross-Domain, Life Sciences]

Domain                   Datasets   Triples          (Out-)Links
Media                    25         1,841,852,061    50,440,705
Geographic               31         6,145,532,484    35,812,328
Government               49         13,315,009,400   19,343,519
Publications             87         2,950,720,693    139,925,218
Cross-domain             41         4,184,635,715    63,183,065
Life Sciences            41         3,036,336,004    191,844,090
User-generated Content   20         134,127,413      3,449,143
Total                    295        31,634,213,770   503,998,829

Page 10: Linked Data Usecases

Linked Data for Cross-Domain

Page 11: Linked Data Usecases

DBpedia

• a project aiming to extract structured content from the information created as part of the Wikipedia project

• as of September 2011, more than 3.64 million things, more than 6.5 million interlinks, and over 1 billion pieces of information (RDF triples)

Page 12: Linked Data Usecases

Page 13: Linked Data Usecases

The DBpedia Information Extraction Framework

• Source
  – an abstraction over a source of MediaWiki pages

• WikiParser
  – a parser which transforms a MediaWiki page source into an Abstract Syntax Tree (AST)

• Extractor
  – a mapping from a page node to a graph of statements about it

• Destination
  – an abstraction over a destination of RDF statements
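The four components form a simple pipeline: pages flow from a Source through the WikiParser and an Extractor into a Destination. The toy sketch below only illustrates that flow and is not the DBpedia framework's actual code. The class bodies, the infobox regular expression, the `http://example.org/` namespace, and the sample page are all assumptions made for this example.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]


class Source:
    """Abstraction over a source of MediaWiki pages (here: an in-memory dict)."""
    def __init__(self, pages: Dict[str, str]):
        self.pages = pages

    def __iter__(self) -> Iterator[Tuple[str, str]]:
        return iter(self.pages.items())


class WikiParser:
    """Toy parser: reduces page source to a tiny 'AST' of infobox fields."""
    FIELD = re.compile(r"^\|\s*(\w+)\s*=\s*(.+?)\s*$", re.MULTILINE)

    def parse(self, title: str, source: str) -> dict:
        return {"title": title, "infobox": dict(self.FIELD.findall(source))}


class Extractor:
    """Maps a page node to a list of statements about it."""
    def extract(self, node: dict) -> List[Triple]:
        # example.org is a placeholder namespace, not DBpedia's.
        subject = "http://example.org/resource/" + node["title"].replace(" ", "_")
        return [(subject, "http://example.org/property/" + key, value)
                for key, value in node["infobox"].items()]


class Destination:
    """Abstraction over a destination of RDF statements (here: a list)."""
    def __init__(self) -> None:
        self.triples: List[Triple] = []

    def write(self, triples: Iterable[Triple]) -> None:
        self.triples.extend(triples)


def run(source: Source, parser: WikiParser,
        extractor: Extractor, destination: Destination) -> None:
    for title, text in source:
        destination.write(extractor.extract(parser.parse(title, text)))


# Made-up page used only to exercise the pipeline.
pages = {"Example City": "{{Infobox settlement\n| name = Example City\n| country = Exampleland\n}}"}
dest = Destination()
run(Source(pages), WikiParser(), Extractor(), dest)
for triple in dest.triples:
    print(triple)
```

Keeping the four roles behind separate classes means a different page source or output format can be swapped in without touching the extraction logic.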

Page 15: Linked Data Usecases

Page 16: Linked Data Usecases

OpenCyc

• Cyc
  – an artificial intelligence project that attempts to assemble a comprehensive ontology and knowledge base of everyday common sense knowledge

• OpenCyc
  – mainly taxonomic assertions, not the complex rules available in Cyc
  – 239,000 concepts, 2,093,000 facts, and 69,000 owl:sameAs links to external (non-Cyc) semantic data
  – the RDF-compatible content extracted from OpenCyc using the open source Texai

Page 17: Linked Data Usecases

Linked Geographic Data

Page 18: Linked Data Usecases

GeoNames

• a geographical database available and accessible through various web services, under a Creative Commons attribution license

• over 10,000,000 geographical names corresponding to over 7,500,000 unique features

Page 19: Linked Data Usecases

Page 20: Linked Data Usecases

LinkedGeoData

• an effort to add a spatial dimension to the Web of Data / Semantic Web, using the information collected by the OpenStreetMap project and publishing it according to the Linked Data principles

Dataset                  #Triples
Ontology                 8K
RelevantNodes            66 million
RelevantWays             65 million
RelevantWayNodes         74 million
RelevantNodePositions    60 million
DBpedia Interlinks       101K
GeoNames Interlinks      487K

Page 21: Linked Data Usecases

Page 22: Linked Data Usecases

etc.

• Linked Sensor Data
  – an RDF dataset containing expressive descriptions of ~20,000 weather stations in the United States

• U.S. Census
  – basic geographic data for the U.S., the states, counties, cities, ZCTAs (ZIP Code Tabulation Areas), and congressional districts
  – 1,016,219 triples in N3 format

<http://www.rdfabout.com/rdf/usgov/geo/us/sc/counties/hampton_county>
    rdf:type usgovt:County ;
    usgovt:fipsCountyCode "049" ;
    usgovt:fipsStateCountyCode "45:049" ;
    dc:title "Hampton County" ;
    dcterms:isPartOf <http://www.rdfabout.com/rdf/usgov/geo/us/sc> ;
    geo:lat 32.796299 ;
    geo:long -81.131622 ;
    census:population 21386 ;
    census:households 8582 ;
    census:landArea "1449823309 m^2" ;
    census:waterArea "7369890 m^2" ;
    census:details <http://www.rdfabout.com/rdf/usgov/geo/us/sc/counties/hampton_county/censustables> .

<http://www.rdfabout.com/rdf/usgov/geo/us/sc>
    dcterms:hasPart <http://www.rdfabout.com/rdf/usgov/geo/us/sc/counties/hampton_county> .
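Each statement in the N3 example above is a (subject, predicate, object) triple, and a basic lookup is just pattern matching with wildcards. The sketch below copies a few of those statements into plain Python tuples to show the idea. The `match` helper is a hypothetical stand-in for a real RDF library, and prefixed names are kept as plain strings rather than expanded to full URIs.

```python
from typing import List, Tuple

Triple = Tuple[str, str, object]

SC = "http://www.rdfabout.com/rdf/usgov/geo/us/sc"
HAMPTON = SC + "/counties/hampton_county"

# A few statements copied from the example, as (subject, predicate, object).
triples: List[Triple] = [
    (HAMPTON, "rdf:type", "usgovt:County"),
    (HAMPTON, "dc:title", "Hampton County"),
    (HAMPTON, "dcterms:isPartOf", SC),
    (HAMPTON, "census:population", 21386),
    (HAMPTON, "census:households", 8582),
    (SC, "dcterms:hasPart", HAMPTON),
]


def match(s=None, p=None, o=None) -> List[Triple]:
    """Return the triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


# Which resources does the state list as parts?
print(match(s=SC, p="dcterms:hasPart"))

# Combine two values of the same subject: average household size.
population = match(HAMPTON, "census:population")[0][2]
households = match(HAMPTON, "census:households")[0][2]
print(round(population / households, 2))
```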

Page 23: Linked Data Usecases

Linked Government Data

Page 24: Linked Data Usecases

Open Government Data

• By “open” we mean data that is free for anyone to use, re-use and re-distribute.

• By “government data” we mean data and information produced or commissioned by government or government controlled entities.

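[Venn diagram: three overlapping circles labelled Open, Gov and Data; the pairwise overlaps are Open Data, Open Gov and Gov Data, and the centre is Open Gov Data]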

Page 25: Linked Data Usecases

United States

• Data.gov
  – "The purpose of Data.gov is to increase public access to high value, machine readable datasets generated by the Executive Branch of the Federal Government."
  – "a repository for all the information the government collects"
  – over 250,000 datasets

• Data-gov Wiki
  – a project investigating open government datasets using semantic web technologies
  – to translate datasets into RDF, to get them linked to the linked data cloud, and to develop interesting applications on linked government data
  – Dataset Statistics
    • 417 RDFized datasets and 6.46 billion RDF triples
    • 35 non-Data.gov datasets and 0.9 billion more RDF triples

Page 26: Linked Data Usecases

Page 27: Linked Data Usecases

United Kingdom

• Data.gov.uk
  – a UK Government project to make available non-personal UK government data as open data
  – over 9,000 datasets
  – the use of Linked Data standards for flexible and easy re-use
  – Datasets
    • Environment, Finance, Legislation, Location, Reference, Statistics, Transport, etc.

Page 28: Linked Data Usecases

Page 29: Linked Data Usecases

All around the world

Country          Official?   Rating   Datasets
Sweden           N           ★★       few
New Zealand      Y           ★★       many
Ireland          Y           ★★★      few
Canada           Y           ★★★      many
United States    Y           ★★★★     many
Spain            N           ★★★★★    few
United Kingdom   Y           ★★★★★    many
Korea            ?           ?        ?

Page 30: Linked Data Usecases

Korea

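• Public Data Portal (공공데이터포털)
  – opens the wide range of public information held by the state to citizens and supports convenient, easy use of it
  – 1,717 datasets and 242 Open APIs
  – http://www.data.go.kr

• Public DBpedia (공공 DB 피디아)
  – 24 datasets and 50,184 resources
  – http://lod.data.go.kr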

Page 31: Linked Data Usecases

<rdf:RDF
    xmlns:ns1="http://lod.data.go.kr/sample/schema#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ns0="http://lod.data.go.kr/schema/dataset#" >
  <rdf:Description rdf:about="http://lod.data.go.kr/sample/data/DS-0501">
    <ns0:sampleResource rdf:resource="http://lod.data.go.kr/sample/data/DS-0501/SportFacility/SD10209PUEF"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://lod.data.go.kr/sample/data/DS-0501/SportFacility/SD10209PUEF">
    <ns0:prefLabel> 자전거체험장 </ns0:prefLabel>
    <ns0:nodeLabel> 자전거체험장 </ns0:nodeLabel>
    <ns1:phone>02-2204-7634</ns1:phone>
    <ns1:name> 자전거체험장 </ns1:name>
    <ns1:manageOrg> 성동구도시관리공단 </ns1:manageOrg>
    <ns1:description> 남녀노소 모두 편하게 이용할 수 있는 자전거체험장 </ns1:description>
    <ns1:address> 서울특별시 성동구 마장동 802-2 마장 2 교 ~ 사근램프 사이 </ns1:address>
    <rdf:type rdf:resource="http://lod.data.go.kr/sample/schema#SportFacility"/>
  </rdf:Description>
</rdf:RDF>
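The sample above uses only flat `rdf:Description` blocks, so it can be turned into triples with a generic XML parser. The sketch below does this with Python's standard library for a shortened copy of the sample. The `to_triples` helper is hypothetical and handles only this flat shape, so it is not a general RDF/XML parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SAMPLE_NS = "http://lod.data.go.kr/sample/schema#"
FACILITY = "http://lod.data.go.kr/sample/data/DS-0501/SportFacility/SD10209PUEF"

# Shortened copy of the sample (two properties only).
RDF_XML = """<rdf:RDF xmlns:ns1="http://lod.data.go.kr/sample/schema#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="http://lod.data.go.kr/sample/data/DS-0501/SportFacility/SD10209PUEF">
    <ns1:phone>02-2204-7634</ns1:phone>
    <rdf:type rdf:resource="http://lod.data.go.kr/sample/schema#SportFacility"/>
  </rdf:Description>
</rdf:RDF>"""


def to_triples(rdf_xml: str):
    """Yield (subject, predicate, object) for flat rdf:Description blocks."""
    root = ET.fromstring(rdf_xml)
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree spells tags as "{namespace}local"; dropping the
            # braces gives the full predicate URI.
            predicate = prop.tag.replace("{", "").replace("}", "")
            resource = prop.get(f"{{{RDF_NS}}}resource")
            yield subject, predicate, resource or (prop.text or "").strip()


for triple in to_triples(RDF_XML):
    print(triple)
```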

Page 32: Linked Data Usecases

Seoul, Korea

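• Seoul Open Data Plaza (서울 열린 데이터 광장)
  – opens the city's public information to the private sector and shares it, to raise public benefit, work efficiency and transparency, and to create new services and public value through citizens' voluntary participation
  – http://data.seoul.go.kr

• Seoul Open Data Plaza Linked Data Beta service
  – about 13,600 items covering administrative districts (by administrative dong), cultural facilities and cultural properties
  – http://lod.seoul.go.kr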

Page 33: Linked Data Usecases

Page 34: Linked Data Usecases

KDATA (Linked Data for Korea)

• open foundation data that implements Linked Data with the W3C's Semantic Web standard technologies

• http://kdata.kr

• http://www.li-st.com

Page 35: Linked Data Usecases

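Domain                                            Triples
Country codes                                     3,899
Entertainment                                     44,278
Administrative districts                          2,969
Elementary, middle and high schools               126,469
Offices of education                              1,130
Universities                                      2,833
Social enterprises                                5,539
Seoul open public restrooms                       47,340
Baseball players and teams                        228,872
Subway stations                                   4,450
History                                           5,392
Administrative data standard terms                109,101
Hanok villages                                    1,155
Public WiFi installation information              1,671
KDATA classification terms                        808
Traditional markets                               4,535
National parks                                    10,605
Cultural properties                               80,156
Public sports facilities                          49,799
Biological taxonomy                               3,256
Cultural facilities                               9,418
Park information and programs                     2,429
Price-stabilization model businesses              16,212
Price-stabilization model business product lists  14,300
Certified public facility products                6,931
Snow-removal box locations                        39,218
Wildlife information                              115,099
Wildlife occurrence information                   139,608
Total                                             1,077,472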

Page 36: Linked Data Usecases

Page 37: Linked Data Usecases

Linked Media Data

Page 38: Linked Data Usecases

MusicBrainz

• MusicBrainz
  – a project that aims to create an open content music database
  – information about 750,000 artists, 1 million releases, and 12 million recordings

• LinkedBrainz
  – to help MusicBrainz publish its database as Linked Data
  – mapped to concepts in the Music Ontology

Page 39: Linked Data Usecases

Music Ontology

• main concepts and properties for describing music (e.g. artists, albums and tracks, but also performances, arrangements, etc.) on the Semantic Web

Page 40: Linked Data Usecases

Linked Data on BBC

• Problems
  – a lot of data (between 1,000 and 1,500 programmes broadcast a day)
  – hand-crafted, customized sites
  – often not maintained
  – often not persistent

• build upon Open Data Repositories
  – such as MusicBrainz and Wikipedia

Page 41: Linked Data Usecases

Data from Wikipedia

Data from MusicBrainz

Page 42: Linked Data Usecases

Page 43: Linked Data Usecases

BBC Ontologies

• Programmes Ontology
  – every programme brand, series and episode broadcast by the BBC
  – the Programmes Ontology to expose data following the Linked Data approach, enabling the interchange of programme information on the Semantic Web

• Wildlife Ontology
  – a simple vocabulary for describing biological species and related taxa
  – terms for describing the names and ranking of taxa, as well as providing support for describing their habitats, conservation status, and behavioural characteristics, etc.

• Curriculum Ontology
  – a core data model for formally describing the national curricula across the UK
  – to provide a model of the national curricula across the UK

Page 44: Linked Data Usecases

LinkedMDB

• publishing the first open semantic web database for movies, including a large number of interlinks to several datasets

Page 46: Linked Data Usecases

flickr™ wrappr

• to extend DBpedia with RDF links to photos posted on flickr

• to generate a collection of flickr photos for each of the 1.95 million DBpedia concepts

Page 47: Linked Data Usecases

Page 48: Linked Data Usecases

Revyu.com

• a web site where you can review and rate things

Page 49: Linked Data Usecases

Open Graph Protocol

• to integrate web pages into Facebook's social graph, based on RDFa

<html xmlns:og="http://opengraphprotocol.org/schema/"
      xmlns:fb="http://www.facebook.com/2008/fbml">
  <head>
    <meta property="og:url" content="http://www.imdb.com/title/tt1285016/" />
    <meta property='og:image' content='http://ia.media-imdb.com/…140_.jpg'>
    <meta property='og:type' content='movie' />
    <meta property='fb:app_id' content='115109575169727' />
    <meta property='og:title' content='The Social Network (2010)' />
    <meta property='og:site_name' content='IMDb' />
    ...
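A consumer of this markup only needs to collect the `<meta>` elements whose `property` starts with `og:`. The following is a minimal sketch using Python's standard `html.parser`. The `OpenGraphParser` class name is an assumption, and the input is a shortened, closed copy of the page head shown above.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        if prop.startswith("og:"):
            self.properties[prop] = attributes.get("content")


# Shortened copy of the example; fb:app_id is kept to show it is skipped.
HTML = """<html xmlns:og="http://opengraphprotocol.org/schema/">
<head>
<meta property="og:url" content="http://www.imdb.com/title/tt1285016/" />
<meta property='og:type' content='movie' />
<meta property='fb:app_id' content='115109575169727' />
<meta property='og:title' content='The Social Network (2010)' />
<meta property='og:site_name' content='IMDb' />
</head></html>"""

parser = OpenGraphParser()
parser.feed(HTML)
for name, value in parser.properties.items():
    print(name, "=", value)
```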

Page 51: Linked Data Usecases

BIO2RDF

• a biological database using Semantic Web technologies to provide interlinked life science data

Page 52: Linked Data Usecases

Linked Life Data

• a semantic data integration platform for the biomedical domain

• Search and explore over RDF statements from various sources including UniProt, PubMed, EntrezGene and so forth

Page 53: Linked Data Usecases

Select drugs related to asthma that are linked to a molecular interaction

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX biopax2: <http://www.biopax.org/release/biopax-level2.owl#>
PREFIX uniprot: <http://purl.uniprot.org/core/>
PREFIX drugbank: <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/>

SELECT distinct ?fullname ?drugname ?indication
WHERE {
    ?physicalEntity skos:semanticRelation ?protein .
    ?protein uniprot:recommendedName ?name .
    ?name uniprot:fullName ?fullname .
    ?target skos:exactMatch ?protein .
    ?drug drugbank:target ?target .
    ?drug drugbank:genericName ?drugname .
    ?drug drugbank:indication ?indication .
    filter(regex(?indication, "asthma", "i"))
}
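A query like this is typically sent to a SPARQL endpoint over HTTP, with the query text passed in the `query` parameter defined by the SPARQL Protocol. The sketch below builds such a request URL without sending it, and mirrors the case-insensitive `regex` filter with Python's `re` module for this literal pattern. The endpoint URL is a placeholder, and the shortened query and both helper names are assumptions made for illustration.

```python
import re
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint, not from the slides

# Shortened variant of the query above, kept small for readability.
QUERY = """PREFIX drugbank: <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/>
SELECT DISTINCT ?drugname ?indication
WHERE {
    ?drug drugbank:genericName ?drugname .
    ?drug drugbank:indication ?indication .
    FILTER(regex(?indication, "asthma", "i"))
}"""


def build_query_url(endpoint: str, query: str) -> str:
    """Encode a SPARQL query as an HTTP GET URL using the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


def indication_matches(indication: str, pattern: str = "asthma") -> bool:
    """Rough Python counterpart of regex(?indication, "asthma", "i")."""
    return re.search(pattern, indication, re.IGNORECASE) is not None


url = build_query_url(ENDPOINT, QUERY)
print(url[:60] + "...")
print(indication_matches("For the treatment of Asthma"))
```

The `"i"` flag in the SPARQL filter corresponds to `re.IGNORECASE` here, which is why indications mentioning "Asthma" or "ASTHMA" would both be selected.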

Page 54: Linked Data Usecases

References

• http://en.wikipedia.org/wiki/Linked_data
• http://en.wikipedia.org/wiki/Semantic_Web_Stack
• http://www.w3.org/DesignIssues/LinkedData
• http://lod-cloud.net/
• http://en.wikipedia.org/wiki/Dbpedia
• http://dbpedia.org/About
• http://en.wikipedia.org/wiki/Freebase
• http://www.freebase.com/
• http://en.wikipedia.org/wiki/OpenCyc
• http://www.cyc.com/platform/opencyc
• http://en.wikipedia.org/wiki/GeoNames
• http://www.geonames.org/
• http://www.geonames.org/ontology/documentation.html
• http://linkedgeodata.org/About
• http://wiki.knoesis.org/index.php/SSW_Datasets
• http://www.rdfabout.com/demo/census/
• http://www.slideshare.net/cygri/the-state-of-linked-government-data
• http://www.slideshare.net/onlyjiny/linked-open-government-data-15708234
• http://data-gov.tw.rpi.edu/wiki/The_Data-gov_Wiki
• http://data.gov.uk/linked-data
• http://musicbrainz.org/
• http://wiki.musicbrainz.org/LinkedBrainz
• http://musicontology.com/
• http://www.slideshare.net/alabarga/linked-data-in-industry
• http://www.bbc.co.uk/ontologies/
• http://linkedmdb.org/
• http://wifo5-03.informatik.uni-mannheim.de/flickrwrappr/
• http://revyu.com/
• http://en.wikipedia.org/wiki/Open_Graph_protocol#Open_Graph_protocol
• http://www.slideshare.net/onlyjiny/social-semantic-web-on-facebook-open-graph-protocol-and-twitter-annotations
• http://bio2rdf.org/
• http://linkedlifedata.com/
• http://www.slideshare.net/echo4ngel/linked-data-in-healthcare-and-life-sciences-16926052

Page 55: Linked Data Usecases

Dr. Myungjin Lee

e-Mail : [email protected]

Twitter : http://twitter.com/MyungjinLee

Facebook : http://www.facebook.com/mjinlee

SlideShare : http://www.slideshare.net/onlyjiny/

Thanks for your attention.