DBpedia - A Crystallization Point for the Web of Data 2011.10.05 Junghee - Han


Page 1: DBpedia - A Crystallization Point

DBpedia - A Crystallization Point for the Web of Data

2011.10.05

Junghee - Han

Page 2: DBpedia - A Crystallization Point

Outline

- The DBpedia Project
- Understanding Linked Data
- The DBpedia Knowledge Extraction Framework
- The DBpedia Knowledge Base
- Accessing the DBpedia Knowledge Base
- Applications facilitated by DBpedia

Page 3: DBpedia - A Crystallization Point


The DBpedia Project

DBpedia is a community effort to extract structured information from Wikipedia and make it available on the Web. Its goals are to:

- Extract structured information from Wikipedia
- Make this information available on the Web under an open license
- Interlink the DBpedia dataset with other open datasets on the Web

Page 4: DBpedia - A Crystallization Point

The DBpedia Project

The DBpedia knowledge base currently describes more than 2.6 million entities, including:

- 198,000 persons
- 328,000 places
- 101,000 musical works
- 34,000 films
- 20,000 companies

The knowledge base contains 3.1 million links to external web pages and 4.9 million RDF links into other Web data sources.

Page 5: DBpedia - A Crystallization Point


Linked Data


Page 6: DBpedia - A Crystallization Point


Linked Data

[Figure: Web browsers and search engines, each connected over HTTP]

Page 7: DBpedia - A Crystallization Point


Linked Data

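RDF stands for:

- Resource: anything that has a URI (web pages, images, videos, etc.)
- Description: a statement of the properties, characteristics, and relationships of resources
- Framework: the model, language, and syntax used to describe the above

RDF has a graph model.

Reference: [KSWC2010] Linked Data That Increases the Value of Data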

Page 8: DBpedia - A Crystallization Point


Linked Data

Graph model example

RDF syntax

Triple-form representation

Reference: [KSWC2010] Linked Data That Increases the Value of Data

SPARQL (SPARQL Protocol and RDF Query Language): an RDF query language created by the W3C
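The triple form and the idea of querying it can be sketched in a few lines of Python. This is only an illustration: the `ex:` names and the `match` helper are made up for this sketch, and a real SPARQL engine does far more than single-pattern matching.

```python
# Illustrative only: RDF statements modeled as (subject, predicate, object) tuples.
# The "ex:" names are hypothetical and do not come from any real dataset.
TRIPLES = [
    ("ex:HongGilDong", "ex:name", "Hong, Gil Dong"),
    ("ex:HongGilDong", "ex:age", "35"),
    ("ex:HongGilDong", "ex:residence", "ex:Seoul"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that agree with every position that is not None."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Roughly what `SELECT ?o WHERE { ex:HongGilDong ex:residence ?o }` asks for.
residences = [o for _, _, o in match(TRIPLES, s="ex:HongGilDong", p="ex:residence")]
print(residences)
```

Leaving a position as `None` plays the role of a SPARQL variable: it matches anything in that slot.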

Page 9: DBpedia - A Crystallization Point

Linked Data

1. Use URIs (Uniform Resource Identifiers) as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful RDF information.
4. Include RDF statements that link to other URIs so that they can discover related things.

Tim Berners-Lee, 2007, http://www.w3.org/DesignIssues/LinkedData.html
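Principles 2 and 3 amount to an ordinary HTTP lookup in which the client asks for RDF rather than HTML. A minimal sketch using only the Python standard library follows. It builds the request without sending it. The choice of `application/rdf+xml` and the assumption that the server honors the `Accept` header are illustrative, not taken from the slides.

```python
from urllib.request import Request


def rdf_lookup_request(uri: str) -> Request:
    """Build (but do not send) an HTTP GET request that asks for RDF/XML."""
    # Assumption: the server performs content negotiation on the Accept header.
    return Request(uri, headers={"Accept": "application/rdf+xml"})


req = rdf_lookup_request("http://dbpedia.org/resource/Seoul")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the actual lookup; that step is left out here so the sketch runs without network access.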

Page 10: DBpedia - A Crystallization Point

Linked Data

1. Use URIs as names for things

http://bibleontology.com/page/Bilhah

Reference: [KSWC2010] Linked Data That Increases the Value of Data

Page 11: DBpedia - A Crystallization Point

Linked Data

2. Use HTTP URIs so that people can look up those names

http://bibleontology.com/page/Bilhah

Reference: [KSWC2010] Linked Data That Increases the Value of Data

Page 12: DBpedia - A Crystallization Point

Linked Data

3. When someone looks up a URI, provide useful RDF information

http://bibleontology.com/page/Bilhah

Reference: [KSWC2010] Linked Data That Increases the Value of Data

Page 13: DBpedia - A Crystallization Point

Linked Data

4. Include RDF statements that link to other URIs so that they can discover related things

http://bibleontology.com/page/Bilhah

Reference: [KSWC2010] Linked Data That Increases the Value of Data

Page 14: DBpedia - A Crystallization Point

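Linked Data

[Figure: example RDF graph. The resource HongGilDong has [name] "Hong, Gil Dong" and [age] 35, and is linked by [residences] to Seoul and by [researches] to SemanticWeb. Seoul and SemanticWeb are connected in turn, through [sameAs], [nearbyFeatures], and [hasPhotoCollection] links, to the following external URIs.]

- http://dbpedia.org/resource/Seoul
- http://sws.geonames.org/1835848/
- http://sws.geonames.org/1835848/nearby.rdf
- http://dbpedia.org/resource/Semantic_Web
- http://www4.wiwiss.fu-berlin.de/flickrwrappr/photos/Semantic_Web

Reference: [KSWC2010] Linked Data That Increases the Value of Data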

Page 15: DBpedia - A Crystallization Point

Linked Data

[Figure: SPARQL shown alongside SQL]

Reference: [KSWC2010] Linked Data That Increases the Value of Data

Page 16: DBpedia - A Crystallization Point

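Linked Data

[Figure: two panels, "Disconnected national public information" and "Connected national public information". Both panels show the same categories: spatial, travel, transportation, real estate, cultural heritage, bibliographic, land, environmental, product, job, and other ("XXX") information. In the connected panel, national public information is also linked to private-sector information (portals and media, universities, others) and to overseas information (DBpedia, BBC, etc.).]

Reference: [KSWC2010] Linked Data That Increases the Value of Data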

Page 17: DBpedia - A Crystallization Point


Wikipedia Content

Title

Description

Languages

Web Links

Categorization

Domain-specific Data

Images

Infoboxes


Page 18: DBpedia - A Crystallization Point

The DBpedia Knowledge Extraction Framework(1/2)

Currently 19 extractors:

- Labels (title, rdfs:label)
- Abstracts (first paragraph, rdfs:comment)
- Interlanguage links
- Images
- Redirects
- Disambiguation (dbpedia:disambiguates)
- External links (dbpedia:reference)
- Page links (dbpedia:wikilink)
- Homepages (foaf:homepage)
- Geo-coordinates
- Person data
- PND
- SKOS categories
- Page ID
- Revision ID
- Category label
- Article categories
- Mappings
- Infobox

Until March 2010, the DBpedia project used a PHP-based extraction framework to extract different kinds of structured information from Wikipedia. That framework has been superseded by the new Scala-based extraction framework, and the old PHP framework is no longer maintained.

Page 19: DBpedia - A Crystallization Point


The DBpedia Knowledge Extraction Framework(2/2)

Two workflows:

Dump-based extraction

- The Wikimedia Foundation publishes SQL dumps of all Wikipedia editions on a monthly basis.
- The dump-based workflow uses the DatabaseWikipedia page collection as the source of article texts and the N-Triples serializer as the output destination.

Live extraction

- Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
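To make the "N-Triples serializer as the output destination" concrete, here is a minimal sketch of what writing one triple in N-Triples looks like. It is not the framework's serializer: the function names are hypothetical, and language tags, datatypes, and full Unicode escaping are deliberately left out.

```python
def nt_term(term: str, is_literal: bool = False) -> str:
    """Render one term: literals in escaped double quotes, URIs in angle brackets."""
    if is_literal:
        escaped = term.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        return f'"{escaped}"'
    return f"<{term}>"


def nt_line(subject: str, predicate: str, obj: str, literal: bool = False) -> str:
    """Serialize one triple as a single N-Triples line terminated by ' .'."""
    return f"{nt_term(subject)} {nt_term(predicate)} {nt_term(obj, literal)} ."


line = nt_line(
    "http://dbpedia.org/resource/Berlin",
    "http://www.w3.org/2000/01/rdf-schema#label",
    "Berlin",
    literal=True,
)
print(line)
```

One triple per line is what makes the format convenient for large dumps: files can be split, concatenated, and streamed without parsing any surrounding structure.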


Page 20: DBpedia - A Crystallization Point


Infobox Extraction

dbpedia:BBC p:network_name "British Broadcasting Corporation (BBC)"

dbpedia:BBC p:country dbpedia:United_Kingdom

dbpedia:BBC p:key_people dbpedia:Michael_Lyons dbpedia:Mark_Thompson
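The triples above suggest the general idea: each infobox attribute becomes a property, plain text becomes a literal, and wiki links become resource references. The following is a rough sketch of that idea only. The wikitext fragment is hypothetical (the real BBC infobox differs), and the parsing rules are assumptions that are much simpler than those of the actual extractor.

```python
import re

# Hypothetical infobox fragment in wikitext, written for this sketch.
WIKITEXT = """\
| network_name = British Broadcasting Corporation (BBC)
| country = [[United Kingdom]]
| key_people = [[Michael Lyons]], [[Mark Thompson]]
"""

# Matches [[Target]] and [[Target|display text]], capturing only the target.
LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def infobox_triples(subject, wikitext):
    """Yield (subject, property, object) for each '| key = value' line."""
    for raw in wikitext.splitlines():
        if not raw.startswith("|") or "=" not in raw:
            continue
        key, value = raw[1:].split("=", 1)
        prop = "p:" + key.strip()
        targets = LINK.findall(value)
        if targets:
            # Wiki links become resource references, one triple per link.
            for target in targets:
                yield (subject, prop, "dbpedia:" + target.strip().replace(" ", "_"))
        else:
            # Plain text is kept as a literal string.
            yield (subject, prop, value.strip())


triples = list(infobox_triples("dbpedia:BBC", WIKITEXT))
for t in triples:
    print(t)
```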


Page 21: DBpedia - A Crystallization Point

The DBpedia Knowledge Base

Identifying Entities

Resources are assigned a URI according to the pattern http://dbpedia.org/resource/Name, where Name is taken from the URL of the source Wikipedia article, which has the form http://en.wikipedia.org/wiki/Name.

Classifying Entities

DBpedia entities are classified within four classification schemata in order to fulfill different application requirements:

- Wikipedia Categories
- YAGO
- UMBEL (Upper Mapping and Binding Exchange Layer)
- DBpedia Ontology

Describing Entities

Every DBpedia entity is described by a set of general properties.
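The URI pattern described under "Identifying Entities" is a simple prefix swap, which can be sketched as follows. The function name is hypothetical, and the sketch handles only the English Wikipedia URL form given on the slide.

```python
WIKIPEDIA_PREFIX = "http://en.wikipedia.org/wiki/"
DBPEDIA_PREFIX = "http://dbpedia.org/resource/"


def dbpedia_uri(wikipedia_url: str) -> str:
    """Map an English Wikipedia article URL to the matching DBpedia resource URI."""
    if not wikipedia_url.startswith(WIKIPEDIA_PREFIX):
        raise ValueError(f"not an English Wikipedia article URL: {wikipedia_url}")
    # Name is whatever follows the /wiki/ prefix and is reused unchanged.
    name = wikipedia_url[len(WIKIPEDIA_PREFIX):]
    return DBPEDIA_PREFIX + name


uri = dbpedia_uri("http://en.wikipedia.org/wiki/Berlin")
print(uri)
```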


Page 22: DBpedia - A Crystallization Point

Accessing the DBpedia Knowledge Base over the Web

- Linked Data: DBpedia resource identifiers (e.g. http://dbpedia.org/resource/Berlin)
- SPARQL Endpoint: http://dbpedia.org/sparql
- RDF Dumps: http://wiki.dbpedia.org/Downloads32
- Lookup Index: http://lookup.dbpedia.org/api/search.asmx
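As a sketch of how the SPARQL endpoint is typically addressed, the snippet below builds a GET URL that carries a query in the `query` parameter defined by the SPARQL protocol. The query itself is an illustrative assumption, and nothing is sent over the network. Result formats and any endpoint-specific parameters are not covered.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint URL from the slide

# Illustrative query: ask for labels of the Berlin resource.
QUERY = """\
SELECT ?label WHERE {
  <http://dbpedia.org/resource/Berlin>
      <http://www.w3.org/2000/01/rdf-schema#label> ?label .
} LIMIT 5
"""


def sparql_get_url(endpoint: str, query: str) -> str:
    """Return a GET URL with the query percent-encoded in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)
print(url)
```

Fetching `url` with any HTTP client would run the query against the live endpoint, subject to whatever that service currently supports.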


Page 23: DBpedia - A Crystallization Point

Interlinked Web Content


Currently contains 4.9 million outgoing RDF links

Page 24: DBpedia - A Crystallization Point

Applications facilitated by DBpedia (1/3)

Browsing and Exploration: DBpedia Mobile

Page 25: DBpedia - A Crystallization Point

Applications facilitated by DBpedia (2/3)

Querying and Search: DBpedia Query Builder

http://querybuilder.dbpedia.org

Page 26: DBpedia - A Crystallization Point

Applications facilitated by DBpedia (3/3)

Querying and Search: Relationship Finder

Page 27: DBpedia - A Crystallization Point

Conclusions and Future Work

Conclusion

The resulting DBpedia knowledge base covers a wide range of different domains and connects entities across these domains.

Future Work

Cross-language infobox knowledge fusion

- Derive an astonishingly detailed multi-domain knowledge base

Wikipedia article augmentation

- Develop a MediaWiki extension that augments Wikipedia articles with additional information as well as media items (pictures, audio) from these sources

Wikipedia consistency checking

- Improve the overall quality of Wikipedia