LOD InterLinking
Lee Kyung-wook (이경욱), Advanced Technology Research Team Leader


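Page 2

CONTENTS

01 Before InterLinking
   Opening public data at the national level
   What is Linked Open Data?
   Basic principles of Linked Data
   RDF graph model
   Cases abroad
   Status of the LOD cloud

02 What is InterLinking?
   InterLinking example

03 Why InterLinking is needed
   Preventing duplicate data construction
   Preventing duplicate data construction: use case
   Discovering latent knowledge and extending knowledge

04 Automating InterLinking
   Interlinking methods
   Interlinking systems
   Interlinking goals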

Pages 3-4

Before InterLinking - Opening public data at the national level

Opening national databases; securing transparency; creative use of data

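Page 5

Before InterLinking - Opening public data at the national level

Government

Open: proactively open public information and provide it as Linked Open Data so that anyone can use it freely.

Convergence: build a framework for convergence and integration; integrate source data on a Linked Open Data (LOD) basis.

Reuse: prepare an environment for providing information; provide a platform for opening, linking, and using Linked Open Data.

Creation: create new content; the private sector crosses the opened public information over with knowledge from other fields to develop new services.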

Page 6

Before InterLinking - What is Linked Open Data? (HTML)

[Figure: resources connected to one another by links]

Document-centric Web (Web of Documents): HTML (hyperlinks)

Page 7

Before InterLinking - What is Linked Open Data? (HTML)

Human Readable

Page 8

Before InterLinking - What is Linked Open Data? (RDF)

[Figure: RDF graph. Node labels: 예종 (Yejong), 숙종 (Sukjong), 문종 (Munjong), 왕옹, 1054.07, 경릉(景陵). Edge labels: nikh:hasFather (twice), nikh:hasGrandFather, nikh:realName, nikh:birthDate, nikh:tombPlace.]

Data (Things)-centric Web (Web of Data): RDF (linking data and giving it meaning)

Page 9

Before InterLinking - What is Linked Open Data? (RDF)

Machine Readable

Page 10

Before InterLinking - What is Linked Open Data?

The four basic principles of Linked Data (Tim Berners-Lee)

1) Use URIs as names for things.

2) Use HTTP URIs so that people can look up those names.

3) When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).

4) Include links to other URIs, so that they can discover more things.
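
Principles 2 and 3 amount to an HTTP lookup in which the client asks for RDF. The following is a minimal Python sketch, assuming a server that honors the Accept header; `build_lookup_request` and `RDF_ACCEPT` are hypothetical names introduced for illustration, and the request is only built, not sent.

```python
from urllib.request import Request

# Media types for RDF serializations: Turtle preferred, RDF/XML as a fallback.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def build_lookup_request(uri: str) -> Request:
    """Build (but do not send) an HTTP GET that asks for RDF about `uri`."""
    if not uri.startswith(("http://", "https://")):
        # Principle 2: names should be HTTP URIs so that they can be looked up.
        raise ValueError("not an HTTP URI: " + uri)
    return Request(uri, headers={"Accept": RDF_ACCEPT})


request = build_lookup_request("http://dbpedia.org/resource/Amsterdam")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the actual lookup; what comes back depends on the server.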

Page 11

Before InterLinking - What is Linked Open Data? RDF graph model

Subject - Predicate - Object

Example: "예종's father is 숙종"
  Subject:   예종 (Yejong)
  Predicate: father (nikh:hasFather)
  Object:    숙종 (Sukjong)

<RDF:Description RDF:about="http://www.history.go.kr/ontology/인명_예종">
  <nikh:realName RDF:datatype="http://www.w3.org/2001/XMLSchema#string">왕우</nikh:realName>
  <nikh:birthDate RDF:datatype="http://www.w3.org/2001/XMLSchema#string">10790100</nikh:birthDate>
  <nikh:deathDate RDF:datatype="http://www.w3.org/2001/XMLSchema#string">11220400</nikh:deathDate>
  <nikh:tombPlace RDF:datatype="http://www.w3.org/2001/XMLSchema#string">유릉(裕陵)</nikh:tombPlace>
  <nikh:hasFather RDF:datatype="http://www.w3.org/2001/XMLSchema#string">숙종(肅宗)</nikh:hasFather>
  <nikh:hasGrandFather RDF:datatype="http://www.w3.org/2001/XMLSchema#string">문종(文宗)</nikh:hasGrandFather>
</RDF:Description>

In this description, RDF:about gives the Subject, each nikh: element name is a Predicate, and each element value is an Object.
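
The subject-predicate-object model can be sketched in plain Python by writing each statement of the RDF description above as a tuple. This is only an illustration of the data model, not an RDF library; `objects` is a hypothetical helper.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

BASE = "http://www.history.go.kr/ontology/"

# The statements from the RDF description, one tuple per triple.
triples: List[Triple] = [
    (BASE + "인명_예종", "nikh:realName", "왕우"),
    (BASE + "인명_예종", "nikh:birthDate", "10790100"),
    (BASE + "인명_예종", "nikh:deathDate", "11220400"),
    (BASE + "인명_예종", "nikh:tombPlace", "유릉(裕陵)"),
    (BASE + "인명_예종", "nikh:hasFather", "숙종(肅宗)"),
    (BASE + "인명_예종", "nikh:hasGrandFather", "문종(文宗)"),
]


def objects(graph, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in graph if s == subject and p == predicate]


print(objects(triples, BASE + "인명_예종", "nikh:hasFather"))  # ['숙종(肅宗)']
```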

Page 12

Before InterLinking - What is Linked Open Data? RDF graph model

[Figure]

Pages 13-15

Before InterLinking - What is Linked Open Data? Cases abroad

[Figures]

Page 16

Before InterLinking - What is Linked Open Data? Status of the LOD cloud

[Figure: LOD cloud diagrams for 2008, 2009, 2010, and 2011]

Version 0.3, 09/19/2011

Page 17

Before InterLinking - What is Linked Open Data? Status of the LOD cloud

[Figure: LOD cloud diagram]

Version 0.3, 09/19/2011

Page 18

What is InterLinking? - InterLinking example

<http://dbpedia.org/resource/Amsterdam>
    owl:sameAs <http://rdf.freebase.com/ns/...> ;
    owl:sameAs <http://sws.geonames.org/2759793> ;
    ...

<http://sws.geonames.org/2759793>
    owl:sameAs <http://dbpedia.org/resource/Amsterdam> ;
    wgs84_pos:lat "52.3666667" ;
    wgs84_pos:long "4.8833333" ;
    geo:inCountry <http://www.geonames.org/countries/#NL> ;
    ...

Amsterdam in the DBpedia dataset and 2759793 (Amsterdam) in the GeoNames dataset are declared to be the same instance with owl:sameAs:

<http://dbpedia.org/resource/Amsterdam>
    owl:sameAs <http://sws.geonames.org/2759793> .
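
One consequence of owl:sameAs is that an application can treat the linked URIs as one resource and pool their properties. The sketch below shows this with a small union-find over the Amsterdam triples from this slide (the elided Freebase link is left out). `find` and `merged_properties` are hypothetical helpers, and a real RDF store or reasoner would handle this differently.

```python
from collections import defaultdict

SAME_AS = "owl:sameAs"
DBPEDIA = "http://dbpedia.org/resource/Amsterdam"
GEONAMES = "http://sws.geonames.org/2759793"

triples = [
    (DBPEDIA, SAME_AS, GEONAMES),
    (GEONAMES, SAME_AS, DBPEDIA),
    (GEONAMES, "wgs84_pos:lat", "52.3666667"),
    (GEONAMES, "wgs84_pos:long", "4.8833333"),
    (GEONAMES, "geo:inCountry", "http://www.geonames.org/countries/#NL"),
]


def find(parent, x):
    """Return the representative of x's equivalence class (with path compression)."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def merged_properties(graph, uri):
    """Collect the properties of `uri` and of every URI linked to it by owl:sameAs."""
    parent = {}
    for s, p, o in graph:
        if p == SAME_AS:
            parent[find(parent, s)] = find(parent, o)
    root = find(parent, uri)
    props = defaultdict(set)
    for s, p, o in graph:
        if p != SAME_AS and find(parent, s) == root:
            props[p].add(o)
    return dict(props)


# The DBpedia URI has no coordinates of its own; they arrive through the link.
print(merged_properties(triples, DBPEDIA)["wgs84_pos:lat"])  # {'52.3666667'}
```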

Page 19

Why interlinking is needed when building LOD - Preventing duplicate data construction

Page 20

Why interlinking is needed when building LOD - Preventing duplicate data construction

Relational databases: primary keys

Books table:   ID (primary key), Title, Author (foreign key to Authors.ID), Year
Authors table: ID (primary key), Name, Year

Books record
  ID      1289
  Title   The Da Vinci Code
  Author  456
  Year    2003

Authors record
  ID      456
  Name    Dan Brown
  Year    1964

Page 21

Why interlinking is needed when building LOD - Preventing duplicate data construction

Relational databases and applications

Database -> (SQL) -> Application -> User interface

SQL issued by the application:
  SELECT title, year FROM books;
  SELECT name, authors.year FROM authors JOIN books ON books.author = authors.id;

User interface output:
  Title: The Da Vinci Code
  Author: Dan Brown, 1964
  Year: 2003

Records: the Books and Authors records shown on Page 20.
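
The relational case can be reproduced with Python's built-in sqlite3 module. This is a minimal sketch that uses the Books and Authors records from the slides; the column types are assumptions, since the slides do not specify them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT, year INTEGER);
    CREATE TABLE books (
        id INTEGER PRIMARY KEY,
        title TEXT,
        author INTEGER REFERENCES authors(id),  -- foreign key
        year INTEGER
    );
    INSERT INTO authors VALUES (456, 'Dan Brown', 1964);
    INSERT INTO books VALUES (1289, 'The Da Vinci Code', 456, 2003);
""")

# The foreign key books.author only has meaning inside this one database.
row = conn.execute("""
    SELECT books.title, authors.name, authors.year, books.year
    FROM books JOIN authors ON books.author = authors.id
""").fetchone()

print("Title: {}\nAuthor: {}, {}\nYear: {}".format(*row))
```

The key 456 identifies Dan Brown only within this database, which is why every database that needs author data tends to build its own copy.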

Page 22

Why interlinking is needed when building LOD - Preventing duplicate data construction

Triple repository: URIs (primary keys)

OpenLibrary: URI (primary key), Title, Author (foreign key: a VIAF URI), Year
VIAF:        URI (primary key), Name, Year

Books record
  URI     http://openlibrary.org/works/OL76837W
  Title   The Da Vinci Code
  Author  http://viaf.org/viaf/102403515
  Year    2003

Authors record
  URI     http://viaf.org/viaf/102403515
  Name    Dan Brown
  Year    1964

Page 23

Why interlinking is needed when building LOD - Preventing duplicate data construction

Linked data and applications

Database -> (SPARQL) -> Application -> User interface

SPARQL issued by the application:
  SELECT ?title ?year …
  SELECT ?name ?year WHERE …..

User interface output:
  Title: The Da Vinci Code
  Author: Dan Brown, 1964
  Year: 2003

Records: the OpenLibrary Books record and the VIAF Authors record shown on Page 22.
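
With URIs as keys, the book record can point straight at an author record published by someone else, so the author data does not need to be built twice. The following plain-Python sketch illustrates that lookup without a SPARQL engine; the predicate names ("title", "author", "name", "year") and the helpers `value` and `describe_book` are assumptions for illustration.

```python
VIAF_AUTHOR = "http://viaf.org/viaf/102403515"
OL_WORK = "http://openlibrary.org/works/OL76837W"

# Two independently published datasets; predicate names are illustrative.
openlibrary = [
    (OL_WORK, "title", "The Da Vinci Code"),
    (OL_WORK, "author", VIAF_AUTHOR),
    (OL_WORK, "year", "2003"),
]
viaf = [
    (VIAF_AUTHOR, "name", "Dan Brown"),
    (VIAF_AUTHOR, "year", "1964"),
]


def value(graph, subject, predicate):
    """Return the first object for (subject, predicate), or None if absent."""
    for s, p, o in graph:
        if s == subject and p == predicate:
            return o
    return None


def describe_book(work_uri):
    """Follow the author URI from the book dataset into the author dataset."""
    author_uri = value(openlibrary, work_uri, "author")
    return {
        "title": value(openlibrary, work_uri, "title"),
        "author": value(viaf, author_uri, "name"),
        "author_year": value(viaf, author_uri, "year"),
        "year": value(openlibrary, work_uri, "year"),
    }


print(describe_book(OL_WORK))
```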

Page 24

Why interlinking is needed when building LOD - Preventing duplicate data construction: use case, BBC Music site

[Figure: BBC Music artist page, with the Artist Profile and Artist Biography areas marked]

Page 25

Why interlinking is needed when building LOD - Discovering latent knowledge and extending knowledge

<RDF:Description RDF:about="http://www.history.go.kr/ontology/사건_거란, 만주족 전쟁">
  <nikh:isCausedBy RDF:datatype="http://www.w3.org/2001/XMLSchema#string">매(海東靑)</nikh:isCausedBy>
  <nikh:hasStartAge RDF:datatype="http://www.w3.org/2001/XMLSchema#string">xxx</nikh:hasStartAge>
  <nikh:beginDate RDF:datatype="http://www.w3.org/2001/XMLSchema#string">xxx</nikh:beginDate>
  <nikh:hasEventPlace RDF:datatype="http://www.w3.org/2001/XMLSchema#string">xxx</nikh:hasEventPlace>
  <RDF:type RDF:resource="http://www.history.go.kr/ontology/event"/>
</RDF:Description>

<RDF:Description RDF:about="http://www.biology.go.kr/ontology/조류">
  <nikh:hasName RDF:datatype="http://www.w3.org/2001/XMLSchema#string">매(海東靑)</nikh:hasName>
  <nikh:isCategory RDF:datatype="http://www.w3.org/2001/XMLSchema#string">척삭동물</nikh:isCategory>
  <nikh:isSpecies RDF:datatype="http://www.w3.org/2001/XMLSchema#string">매과</nikh:isSpecies>
  <nikh:isLivedIn RDF:datatype="http://www.w3.org/2001/XMLSchema#string">xxx</nikh:isLivedIn>
  <RDF:type RDF:resource="http://www.biology.go.kr/ontology/event"/>
</RDF:Description>

The two occurrences of 매(海東靑) (falcon), one in the history dataset's event "거란, 만주족 전쟁" (the war between the Khitan and the Manchu peoples) and one in the biology dataset's entry under 조류 (birds; 척삭동물: chordates, 매과: Falconidae), are linked with owl:sameAs.

Knowledge extension: falconry triggered the war between the Khitan and the Manchu peoples.
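
A very simple way to surface such cross-domain connections is to look for values that occur in both datasets. The sketch below does that for the two descriptions on this slide (placeholder "xxx" values omitted). `candidate_links` is a hypothetical helper, and exact value matching is an assumption; it only proposes candidates that a person or a stricter matcher would still have to confirm.

```python
from collections import defaultdict

history = [
    ("http://www.history.go.kr/ontology/사건_거란, 만주족 전쟁", "nikh:isCausedBy", "매(海東靑)"),
]
biology = [
    ("http://www.biology.go.kr/ontology/조류", "nikh:hasName", "매(海東靑)"),
    ("http://www.biology.go.kr/ontology/조류", "nikh:isCategory", "척삭동물"),
    ("http://www.biology.go.kr/ontology/조류", "nikh:isSpecies", "매과"),
]


def candidate_links(source, target):
    """Pair up statements from both datasets that share the same object value."""
    by_value = defaultdict(list)  # object value -> [(subject, predicate), ...]
    for s, p, o in target:
        by_value[o].append((s, p))
    return [
        (s, p, o, target_s, target_p)
        for s, p, o in source
        for target_s, target_p in by_value.get(o, [])
    ]


for link in candidate_links(history, biology):
    print(link)
```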

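Page 26

Why interlinking is needed when building LOD - Discovering latent knowledge and extending knowledge

[Figure: linked domains. Labels: History; Medicine and patents; Biology and medicinal herbs]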

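Page 27

What is InterLinking automation?

01 An enormous LOD cloud

02 Inefficient LOD links

03 Efficient link recommendation: a system is needed that automatically extracts meaningful instances from the source dataset, finds the most similar instances in the target dataset, and recommends them.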
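
The recommendation idea can be sketched as "score every target instance against a source instance and propose the best one". The example below uses the standard library's difflib for the string similarity. The similarity measure, the 0.8 threshold, the `recommend` helper, and the Rotterdam entry are all assumptions for illustration, not part of any system named in these slides.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] from difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def recommend(source_label, target_labels, threshold=0.8):
    """Return (best_target_uri, score), or None if nothing reaches the threshold."""
    scored = [(similarity(source_label, label), uri) for uri, label in target_labels.items()]
    if not scored:
        return None
    score, uri = max(scored)
    return (uri, score) if score >= threshold else None


targets = {
    "http://sws.geonames.org/2759793": "Amsterdam",
    "http://target.dataset.org/Rotterdam": "Rotterdam",  # hypothetical entry
}
print(recommend("amsterdam", targets))
```

Comparing every source instance with every target instance does not scale to a large LOD cloud, which is one reason the systems on the following slides first narrow down which predicates and candidates to compare.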

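Page 28

What is InterLinking automation? - Interlinking methods

Each publisher stores and publishes data with a different schema structure.

Schema dependent: requires knowledge of what the RDF predicates mean.
Ex) One has to know that the predicate #PreLable in the source dataset means the same as the predicate #Name in the target dataset.

Schema independent: requires no human knowledge of the schema.

Matching approaches: ontology matching, graph matching, instance matching, data matching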

Page 29

What is InterLinking automation? - Interlinking systems: SERIMI

Schema independent

System:                   SERIMI
Comparison key:           Predicate
Distinguishing algorithm: String Matching (RWSA) Algorithm
Procedure:
  1) Select a class in the source dataset.
  2) Select the instances of that class.
  3) Select the predicates of those instances.
  4) Keep only the high-entropy predicates.
  5) Generate the property list.
  6) Do the same for the target dataset.
  7) Search for predicates that are the same or similar.
  8) Look at the values of the matched properties and decide whether or not to interlink.
  9) If so, assert sameAs.
Example: for steps 4, 5, 6, 7, and 9, see the next page.

Example input: the RDF description of 예종 shown on Page 11.

Page 30

What is InterLinking automation? - Interlinking systems: SERIMI

Step 1: Select high-entropy predicates
  realName (high entropy), birthDate (high entropy), deathDate (high entropy)
  tombPlace (low entropy)

Step 2: Generate the property list
  realName, birthDate, deathDate

Step 3: Do the same for the target dataset
  name, bDate, dDate

Step 4: Search for predicates that are the same or similar
  realName = name, birthDate = bDate, deathDate = dDate

Step 5: InterLinking
  <http://source.dataset.org/resource/왕우> owl:sameAs <http://target.dataset.org/왕우> .

Example input: the RDF description of 예종 shown on Page 11.
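
The high-entropy selection step can be illustrated as follows. The slides do not define the entropy measure, so this sketch assumes Shannon entropy over the distribution of each predicate's object values. The threshold, the toy data, and the function names are hypothetical, and this is not SERIMI's actual implementation.

```python
import math
from collections import Counter, defaultdict


def predicate_entropy(triples):
    """Shannon entropy (in bits) of the object-value distribution of each predicate."""
    values = defaultdict(list)
    for _subject, predicate, obj in triples:
        values[predicate].append(obj)
    entropy = {}
    for predicate, objs in values.items():
        total = len(objs)
        entropy[predicate] = -sum(
            (n / total) * math.log2(n / total) for n in Counter(objs).values()
        )
    return entropy


def property_list(triples, threshold):
    """Keep the predicates whose entropy reaches the (assumed) threshold."""
    return sorted(p for p, h in predicate_entropy(triples).items() if h >= threshold)


# Toy data: four instances whose names and dates differ but whose tomb place is shared.
triples = []
for i in range(1, 5):
    subject = "http://source.dataset.org/resource/s{}".format(i)
    triples += [
        (subject, "realName", "name-{}".format(i)),
        (subject, "birthDate", "date-{}".format(i)),
        (subject, "tombPlace", "place-A"),
    ]

print(property_list(triples, threshold=1.0))  # ['birthDate', 'realName']
```

A predicate whose value is the same for every instance carries no information for telling instances apart, so it is dropped from the property list.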

Page 31

What is InterLinking automation? - Interlinking systems: SLINT

Schema independent

System:                    SLINT
Comparison key:            Predicate
Distinguishing algorithms: Blocking step, Coverage, Discriminability, Dice coefficient, TF-IDF, Inverted indexing (weighted co-occurrence)
Procedure:
  1) Select the important predicates, using coverage and discriminability.
  2) Combine the predicates selected from the source dataset and the target dataset by matching type to generate predicate alignments.
  3) Evaluate the confidence of each predicate alignment with the Dice coefficient.
  4) Extract the object values from the source and target datasets and build an inverted index.
  5) URI, String: TF-IDF.
  6) Decimal, Integer, Date: 0/1.
  7) Assert sameAs for pairs above a suitable threshold.
Example: 3) Similar predicates carry similar information. Ex) title <-> titleKor

Example input: the RDF description of 예종 shown on Page 11.
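
Step 1 of the SLINT procedure can be sketched as below. The slides name coverage and discriminability without defining them, so this example assumes coverage is the share of instances that use a predicate and discriminability is the number of distinct object values divided by the number of triples with that predicate. The 0.5 thresholds, the toy data, and the function name are hypothetical.

```python
from collections import defaultdict


def coverage_and_discriminability(triples):
    """Return {predicate: (coverage, discriminability)} for one dataset."""
    instances = {s for s, _p, _o in triples}
    subjects = defaultdict(set)  # predicate -> instances that use it
    objects = defaultdict(set)   # predicate -> distinct object values
    counts = defaultdict(int)    # predicate -> number of triples
    for s, p, o in triples:
        subjects[p].add(s)
        objects[p].add(o)
        counts[p] += 1
    return {
        p: (len(subjects[p]) / len(instances), len(objects[p]) / counts[p])
        for p in counts
    }


# Toy data: realName is frequent and varied, type is frequent but constant,
# nickname is varied but rare.
triples = [("s{}".format(i), "realName", "name-{}".format(i)) for i in range(1, 5)]
triples += [("s{}".format(i), "type", "person") for i in range(1, 5)]
triples += [("s1", "nickname", "nick-1")]

stats = coverage_and_discriminability(triples)
important = sorted(p for p, (cov, dis) in stats.items() if cov >= 0.5 and dis >= 0.5)
print(stats)
print(important)  # ['realName']
```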

Page 32

What is InterLinking automation? - Interlinking systems: SLINT

Step 1: Select the important predicates
  realName (Coverage), birthDate (Coverage), deathDate (Coverage)
  tombPlace (Discriminability)

Step 2: Generate predicate alignments
  Type:string  || S:realName,  D:name
  Type:date    || S:birthDate, D:bDate
  Type:Integer || S:age,       D:age

Step 3: Evaluate the confidence of the predicate alignments
  Ex) title <-> titleKor
  String: split into tokens
  URI: split on '/'
  Decimal: round to 2 decimal digits
  Integer, Date: unchanged

Step 4: Extract the object values and build an inverted index
  Ex) 숙종, 이순신, 강감찬, …
  URI, String: TF-IDF
  Decimal, Integer, Date: 0/1

Step 5: InterLinking
  <http://source.dataset.org/resource/왕우> owl:sameAs <http://target.dataset.org/왕우> .

Example input: the RDF description of 예종 shown on Page 11.
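
The confidence evaluation in Step 3 can be sketched by normalizing the object values of two aligned predicates as listed on this slide and comparing the resulting token sets with the Dice coefficient, 2|A∩B| / (|A| + |B|). The function names, the lowercasing of string tokens, and the sample values are assumptions. SLINT itself may compute this differently.

```python
import re


def tokens(value, kind):
    """Normalize one object value into a set of comparable tokens."""
    if kind == "string":
        return set(re.findall(r"\w+", value.lower()))  # split into word tokens
    if kind == "uri":
        return {part for part in value.split("/") if part}  # split on '/'
    if kind == "decimal":
        return {"{:.2f}".format(float(value))}  # keep 2 decimal digits
    return {value}  # integer, date: unchanged


def dice(a, b):
    """Dice coefficient of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def alignment_confidence(source_values, target_values, kind):
    """Confidence that two predicates of the same type describe the same information."""
    src = set().union(*(tokens(v, kind) for v in source_values))
    dst = set().union(*(tokens(v, kind) for v in target_values))
    return dice(src, dst)


# Hypothetical object values of a source `title` and a target `titleKor` predicate.
print(alignment_confidence(["The Da Vinci Code"], ["Da Vinci Code"], "string"))
print(tokens("52.3666667", "decimal"))  # {'52.37'}
```

Alignments whose confidence falls below a chosen cutoff would be discarded before instances are compared.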

Page 33

What is InterLinking automation? - Interlinking systems: SILK, AgreementMaker

Schema independent

System:         SILK
Comparison key: Predicate

System:         AgreementMaker
Comparison key: Predicate

Pages 34-35

What is InterLinking automation? - Interlinking systems: SILK management tool

[Figures]

Pages 36-37

Goals of InterLinking

[Figures]

Page 38

THANK YOU