OKFN Korea Hackathon Day 2013. 06. 22. Toward Open Data World

20130622 okfn hackathon t2


Page 1: 20130622 okfn hackathon t2

OKFN Korea

Hackathon Day

2013. 06. 22.

Toward Open Data World

Page 2: 20130622 okfn hackathon t2


What is linked data, Open data?

Refine

Modelling

Access

TripleStorage

other topics

image: Leo Oosterloo @ flickr.com

Page 3: 20130622 okfn hackathon t2

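Seoul City Data Enrichment

Goals

Design or map an ontology to add detail to Seoul city data

Structure, semantics, and linking: model Seoul city data (unstructured data) with an ontology and link it to external data

English localization: provide Seoul city data that non-Korean-speaking users can use

Scope

About 40 Seoul city datasets

Cultural heritage: domestic cultural properties collected by the Cultural Heritage Administration (National Treasures, Treasures, designated cultural properties, intangible cultural properties, etc.)

Methodology: model the data by reusing existing RDF vocabularies

1) Data selection: choose the datasets to model from the Seoul Open Data Plaza

2) Dataset field review: review how each dataset field maps to the DBpedia ontology (classes, properties)

• DBpedia ontology: covers concepts for things and the fields of Wikipedia infoboxes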


Page 4: 20130622 okfn hackathon t2

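Seoul City Data Enrichment

For example, when modelling a 'museum':

• Select the infobox template for museums from Wikipedia

• Select the vocabulary that DBpedia maps to the museum infobox

• Map the vocabulary to the dataset fields

• Decide whether to model the fields that remain unmapped (including classes and properties): a modelling tool needs to be chosen

• Apply a URI scheme (needs to be designed separately)

• Ontology schema design is complete

3) Data cleaning

• Clean the data with Google Refine

• Work to do before loading data into Refine

– Location data: convert or add location values in the source data (Seoul city)

– English names: convert and map the Korean names (manual work required)

• Work to do in Refine

– Add Korean and English Wikipedia URLs

– Add DBpedia and Freebase URLs: added using Refine reconciliation

– Build the RDF export mapping skeleton

– Export RDF and Excel

4) Data upload (RDF or Excel)

Choose a data store

Jena, 4Store, …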
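
The field-mapping step in the workflow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`name_ko`, `name_en`, `phone`), the sample row, and the `http://example.org/...` base URI are hypothetical placeholders, and the choice of `dbo:Museum` and `rdfs:label` as targets is an assumption, not the mapping actually used for the Seoul datasets.

```python
# Minimal sketch: turn one dataset row into triples via a field-to-property map.
# Field names, the base URI, and the sample row are hypothetical.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
DBO_MUSEUM = "http://dbpedia.org/ontology/Museum"
BASE_URI = "http://example.org/seoul/museum/"  # placeholder URI scheme

# dataset field -> (predicate URI, language tag)
FIELD_MAP = {
    "name_ko": (RDFS_LABEL, "ko"),
    "name_en": (RDFS_LABEL, "en"),
}


def row_to_triples(row):
    """Return (triples, unmapped_fields) for one row of the dataset."""
    subject = BASE_URI + row["id"]
    triples = [(subject, RDF_TYPE, DBO_MUSEUM)]
    unmapped = []
    for field, value in row.items():
        if field == "id":
            continue
        if field in FIELD_MAP:
            predicate, lang = FIELD_MAP[field]
            triples.append((subject, predicate, (value, lang)))
        else:
            # Fields without a vocabulary term still need a modelling decision.
            unmapped.append(field)
    return triples, unmapped


row = {"id": "m001", "name_ko": "예시 박물관", "name_en": "Example Museum",
       "phone": "02-000-0000"}
triples, unmapped = row_to_triples(row)
for triple in triples:
    print(triple)
print("unmapped fields:", unmapped)
```

The list of unmapped fields is returned separately because the workflow treats them as an explicit decision point rather than silently dropping them.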

Page 5: 20130622 okfn hackathon t2

Contents

1. Modeling Issues

2. Management Issues

Page 6: 20130622 okfn hackathon t2

Modelling – RDF

Subject Predicate Object

Page 7: 20130622 okfn hackathon t2

Modelling – RDF

Subject Predicate Object

some school has a name/label some literal

Page 8: 20130622 okfn hackathon t2

Modelling – RDF

Subject Predicate Object

http://education.data.gov.uk/id/school/401874 has a name/label “Cardiff High School”

Page 9: 20130622 okfn hackathon t2

Modelling – RDF

Subject Predicate Object

http://education.data.gov.uk/id/school/401874 http://www.w3.org/2000/01/rdf-schema#label “Cardiff High School”

Page 10: 20130622 okfn hackathon t2

Modelling – RDF

Subject Predicate Object

school:401874 rdfs:label “Cardiff High School”

where

school: = http://education.data.gov.uk/id/school/

rdfs: = http://www.w3.org/2000/01/rdf-schema#
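
The prefix notation above is plain string substitution, which the following Python sketch makes explicit. The two prefixes are the ones from the slide; the `expand` helper is a hypothetical name used only for illustration.

```python
# Minimal sketch: expand a prefixed name such as school:401874 to a full URI.
PREFIXES = {
    "school": "http://education.data.gov.uk/id/school/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def expand(term):
    """Expand prefix:local using PREFIXES; return other terms unchanged."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return term


print(expand("school:401874"))
print(expand("rdfs:label"))
```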

Page 11: 20130622 okfn hackathon t2

Modelling – RDF

Subject Predicate Object

school:401874 rdfs:label “Cardiff High School”

school:401874 ont:districtAdministrative la:00PT

la:00PT rdfs:label “Cardiff”

Page 12: 20130622 okfn hackathon t2

Modelling – RDF

Subject Predicate Object

school:401874 rdfs:label “Cardiff High School”

school:401874 ont:districtAdministrative la:00PT

la:00PT rdfs:label “Cardiff”

[Figure: the same three triples drawn as a graph. school:401874 has an rdfs:label edge to “Cardiff High School” and an ont:districtAdministrative edge to la:00PT, which has an rdfs:label edge to “Cardiff”.]

Page 13: 20130622 okfn hackathon t2

Modelling – RDF

Subject Predicate Object

school:401874 rdfs:label “Cardiff High School”

school:401874 ont:districtAdministrative la:00PT

la:00PT rdfs:label “Cardiff”

la:00PT rdfs:label “Caerdydd”@cy
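
As a rough illustration, the four triples above can be held as Python tuples and queried by language tag. Representing a literal as a `(text, language)` pair is an assumption made for this sketch, not an RDF library convention.

```python
# Minimal sketch: triples as tuples; literals are (text, language-tag) pairs,
# with None standing for "no language tag".
triples = [
    ("school:401874", "rdfs:label", ("Cardiff High School", None)),
    ("school:401874", "ont:districtAdministrative", "la:00PT"),
    ("la:00PT", "rdfs:label", ("Cardiff", None)),
    ("la:00PT", "rdfs:label", ("Caerdydd", "cy")),
]


def labels(subject, lang=None):
    """Return the rdfs:label texts of subject that carry the given language tag."""
    return [o[0] for s, p, o in triples
            if s == subject and p == "rdfs:label"
            and isinstance(o, tuple) and o[1] == lang]


def objects(subject, predicate):
    """Return every object asserted for (subject, predicate)."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# Follow the link from the school to its district, then read the Welsh label.
district = objects("school:401874", "ont:districtAdministrative")[0]
print(district, labels(district, "cy"))
```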

Page 14: 20130622 okfn hackathon t2

Modelling – vocabularies

Logical modelling

modelling the domain, not a particular data structure

what exists

what is asserted? what can you deduce from that?

not about constraints as such

monotonic, open world

[Figure: spectrum from controlled vocabulary through taxonomy and thesaurus to ontology]

Page 15: 20130622 okfn hackathon t2

Modelling – vocabularies

unfamiliar terminology but related to information architecture and conceptual modelling

domain-driven design

... and yes knowledge representation

Page 16: 20130622 okfn hackathon t2

RDF Schema is…

Elements of:

Vocabulary (defining terms)

• I define a relationship called “prescribed dose.”

Schema (defining types)

• “prescribed dose” relates “treatments” to “dosages”

Taxonomy (defining hierarchies)

• Any “doctor” is a “medical professional”

Page 17: 20130622 okfn hackathon t2

Modelling – RDFS

RDF vocabulary description language

classes, types and type hierarchy

ont:School rdf:type rdfs:Class

ont:School rdfs:label “School”

Page 18: 20130622 okfn hackathon t2

Modelling – RDFS

RDF vocabulary description language

classes, types and type hierarchy

ont:School rdf:type rdfs:Class

ont:School rdfs:label “School”

ont:WelshEstablishment rdf:type rdfs:Class

ont:WelshEstablishment rdfs:subClassOf ont:School

Page 19: 20130622 okfn hackathon t2

Modelling – RDFS

RDF vocabulary description language

classes, types and type hierarchy

ont:School rdf:type rdfs:Class

ont:School rdfs:label “School”

ont:WelshEstablishment rdf:type rdfs:Class

ont:WelshEstablishment rdfs:subClassOf ont:School

school:401874 rdf:type ont:WelshEstablishment

Page 20: 20130622 okfn hackathon t2

Modelling – RDFS

RDF vocabulary description language

classes, types and type hierarchy

ont:School rdf:type rdfs:Class

ont:School rdfs:label “School”

ont:WelshEstablishment rdf:type rdfs:Class

ont:WelshEstablishment rdfs:subClassOf ont:School

school:401874 rdf:type ont:WelshEstablishment

inferred: school:401874 rdf:type ont:School
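
The type-hierarchy inference on this slide can be sketched as a small fixed-point loop in Python. This is a hand-rolled illustration of the RDFS subclass rule only, not a full RDFS reasoner; the function name `infer_types` is hypothetical.

```python
# Minimal sketch of one RDFS rule:
# if X rdf:type C and C rdfs:subClassOf D, then X rdf:type D.
def infer_types(triples):
    """Apply the subclass rule until no new rdf:type triples appear."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        subclass_pairs = [(s, o) for s, p, o in inferred
                          if p == "rdfs:subClassOf"]
        for s, p, o in list(inferred):
            if p != "rdf:type":
                continue
            for child, parent in subclass_pairs:
                new_triple = (s, "rdf:type", parent)
                if o == child and new_triple not in inferred:
                    inferred.add(new_triple)
                    changed = True
    return inferred


asserted = {
    ("ont:WelshEstablishment", "rdfs:subClassOf", "ont:School"),
    ("school:401874", "rdf:type", "ont:WelshEstablishment"),
}
result = infer_types(asserted)
print(("school:401874", "rdf:type", "ont:School") in result)
```

The loop repeats until nothing changes so that longer subclass chains are followed as well.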

Page 21: 20130622 okfn hackathon t2

Modelling – RDFS

RDF vocabulary description language

properties, property hierarchy

person:JoeBloggs ont:headOf school:401874

ont:headOf rdf:type rdf:Property

ont:headOf rdfs:subPropertyOf ont:staffAt

inferred: person:JoeBloggs ont:staffAt school:401874
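
The property hierarchy works the same way for predicates: a statement made with a sub-property also holds for its super-property. The sketch below assumes each property has at most one direct super-property, which is a simplification for illustration; `infer_superproperties` is a hypothetical helper name.

```python
# Minimal sketch of the RDFS sub-property rule:
# if S P O and P rdfs:subPropertyOf Q, then S Q O.
def infer_superproperties(triples):
    """Add each statement again under every super-property of its predicate."""
    super_of = {s: o for s, p, o in triples if p == "rdfs:subPropertyOf"}
    inferred = set(triples)
    for s, p, o in triples:
        seen = {p}  # guards against cycles in the property hierarchy
        while p in super_of and super_of[p] not in seen:
            p = super_of[p]
            seen.add(p)
            inferred.add((s, p, o))
    return inferred


asserted = [
    ("ont:headOf", "rdfs:subPropertyOf", "ont:staffAt"),
    ("person:JoeBloggs", "ont:headOf", "school:401874"),
]
result = infer_superproperties(asserted)
print(("person:JoeBloggs", "ont:staffAt", "school:401874") in result)
```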

Page 22: 20130622 okfn hackathon t2

Modelling – RDFS

RDF vocabulary description language

class/property relations

domain

range

Already have power to do some vocabulary mapping

declare classes or properties from different vocabularies to be equivalent:

A rdfs:subClassOf B

B rdfs:subClassOf A
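
Domain and range are easy to misread as validation constraints. In RDFS they act as inference rules, as the following Python sketch tries to show. The class `ont:Person` and the domain/range declarations on `ont:headOf` are assumptions added for illustration; they do not appear in the slides.

```python
# Minimal sketch: rdfs:domain and rdfs:range used as inference rules.
# If P rdfs:domain C and S P O, then S rdf:type C.
# If P rdfs:range  C and S P O, then O rdf:type C.
def infer_from_domain_range(triples):
    domains = {s: o for s, p, o in triples if p == "rdfs:domain"}
    ranges = {s: o for s, p, o in triples if p == "rdfs:range"}
    inferred = set(triples)
    for s, p, o in triples:
        if p in domains:
            inferred.add((s, "rdf:type", domains[p]))
        if p in ranges:
            inferred.add((o, "rdf:type", ranges[p]))
    return inferred


asserted = [
    ("ont:headOf", "rdfs:domain", "ont:Person"),   # hypothetical declaration
    ("ont:headOf", "rdfs:range", "ont:School"),    # hypothetical declaration
    ("person:JoeBloggs", "ont:headOf", "school:401874"),
]
result = infer_from_domain_range(asserted)
print(sorted(t for t in result if t[1] == "rdf:type"))
```

Nothing is rejected here: using `ont:headOf` simply causes its subject and object to be typed, which fits the "not about constraints as such" point made earlier in the deck.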

Page 23: 20130622 okfn hackathon t2

WOL OWL is…


Web Ontology Language

Page 24: 20130622 okfn hackathon t2

OWL is…

Elements of ontology

Same/different identity

• “author” and “auteur” are the same relation

• two resources with the same “ISBN” are the same “book”

More expressive type definitions

• A “cycle” is a “vehicle” with at least one “wheel”

• A “bicycle” is a “cycle” with exactly two “wheels”

More expressive relation definitions

• “sibling” is a symmetric predicate

• the value of the “favorite dwarf” relation must be one of “happy”, “sleepy”, “sneezy”, “grumpy”, “dopey”, “bashful”, “doc”
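
Two of the examples above, the symmetric “sibling” relation and identifying books by “ISBN”, can be approximated with plain Python. This is a sketch of the intended conclusions, not of how an OWL reasoner works; the resource names and ISBN values are made-up placeholders.

```python
# Minimal sketch of two OWL-style inferences with plain data structures.
def symmetric_closure(pairs):
    """For a symmetric predicate, (a, b) also implies (b, a)."""
    return set(pairs) | {(b, a) for a, b in pairs}


def same_by_key(resources, key):
    """Group resources that share a value for an identifying key."""
    groups = {}
    for resource_id, props in resources.items():
        if key in props:
            groups.setdefault(props[key], set()).add(resource_id)
    return [group for group in groups.values() if len(group) > 1]


siblings = symmetric_closure({("ex:alice", "ex:bob")})

books = {  # hypothetical resources and placeholder ISBN values
    "ex:book1": {"isbn": "ISBN-A"},
    "ex:book2": {"isbn": "ISBN-A"},
    "ex:book3": {"isbn": "ISBN-B"},
}
same_books = same_by_key(books, "isbn")

print(sorted(siblings))
print(same_books)
```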

Page 25: 20130622 okfn hackathon t2

What can we do with OWL?

Answer questions of

Consistency

• Are there any contradictions in this model?

Classification

• What are all the inferred types of this resource?

Satisfiability

• Are there any classes in this ontology that cannot possibly have any members?

Page 26: 20130622 okfn hackathon t2

Building Useful Ontologies

Developing and maintaining quality ontologies is very challenging

Users need tools and services, e.g., to help check if an ontology is:

Meaningful — all named classes can have instances

http://www.aber.ac.uk/compsci/public/media/presentations/OUCL-seminar.ppt

Page 27: 20130622 okfn hackathon t2

Building Useful Ontologies

Developing and maintaining quality ontologies is very challenging

Users need tools and services, e.g., to help check if an ontology is:

Meaningful — all named classes can have instances

Correct — captures intuitions of domain experts

Page 28: 20130622 okfn hackathon t2

Building Useful Ontologies

Developing and maintaining quality ontologies is very challenging

Users need tools and services, e.g., to help check if an ontology is:

Meaningful — all named classes can have instances

Correct — captures intuitions of domain experts

Minimally redundant — no unintended synonyms

Example: “Banana split” vs. “Banana sundae”

Page 29: 20130622 okfn hackathon t2

Modelling – OWL

richer modelling and semantics

axioms on properties

• transitive, symmetric, inverseOf, ...

• functional, inverse functional

• equivalent property

axioms on classes

• intersection, union, disjoint, equivalent

restrictions on classes

• some values from, all values from, cardinality, has value, one of, keys

axioms on individuals

• same as, different from, all different

imports

Page 30: 20130622 okfn hackathon t2

Modelling – OWL

supports much richer modelling

consistency checking of model

consistency checking of data

some surprises if you are used to schema languages

open world, no unique name assumption

can extend to closed world checking

inference

classification

inferred relationships
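
The "no unique name assumption" surprise can be illustrated with a functional property. A schema validator would report two values as an error. Under OWL semantics the two names are instead inferred to denote the same thing, unless they are explicitly declared different. The sketch below assumes `ont:districtAdministrative` is functional and invents a second identifier `la:Cardiff`; both are assumptions for illustration only.

```python
# Minimal sketch: two values on a functional property.
# Without a unique name assumption this yields a sameAs inference;
# it is only a contradiction if the values are declared different.
def check_functional(values, different_from=frozenset()):
    """Return (inferred_same_as, conflicts) as sets of frozenset pairs."""
    same_as, conflicts = set(), set()
    for i, a in enumerate(values):
        for b in values[i + 1:]:
            if a == b:
                continue
            pair = frozenset((a, b))
            if pair in different_from:
                conflicts.add(pair)
            else:
                same_as.add(pair)
    return same_as, conflicts


# Values asserted for school:401874 on the (assumed functional) property.
district_values = ["la:00PT", "la:Cardiff"]  # la:Cardiff is hypothetical

open_world = check_functional(district_values)
declared = {frozenset(("la:00PT", "la:Cardiff"))}
with_difference = check_functional(district_values, declared)

print("inferred sameAs:", open_world[0])
print("conflicts once declared different:", with_difference[1])
```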

Page 31: 20130622 okfn hackathon t2

Modelling

Spectrum of goals and styles

Lightweight vocabularies

• simple modelling

• just enough agreement to get useful work done

• removing boundaries to enable information to be found and connected

• global consistency not possible

• a little semantics goes a long way

Rich ontological models

• rich domain models

• need expressivity

• consistency is critical

• make complex inferences you can rely on, across data you trust

• knowledge is power

Page 32: 20130622 okfn hackathon t2

Modelling

Ontology reuse

invest in complete ontology for a domain

rich but general model, may be modular inside

strong “ontological commitment”

e.g. medical ontologies

reuse small, common, vocabularies

FOAF, SIOC, Dublin Core, Org ...

pick and choose classes and properties you need

fill in a few missing links for your domain

generic reusable vocabularies

Data cube vocabulary

Page 33: 20130622 okfn hackathon t2

Reusable, public ontologies


Measurement Units Ontology

The Event Ontology

FOAF

Page 34: 20130622 okfn hackathon t2

Schema.org

schema.org is one of a number of microdata vocabularies

it is a shared collection of microdata schemas for use by webmasters

includes a type hierarchy, like an RDFS schema

starts with top-level Thing and DataType types

properties are inherited by descendant types

Page 35: 20130622 okfn hackathon t2

microdata properties

annotate an item with text-valued properties using the “itemprop” attribute

<div itemscope>

<p>My name is <span itemprop="name">Daniel</span>.</p>

</div>

<div itemscope>

<p>Flavors in my favorite ice cream:</p>

<ul>

<li itemprop="flavor">Lemon sorbet</li>

<li itemprop="flavor">Apricot sorbet</li>

</ul>

</div>
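
To see what a consumer might extract from the markup above, here is a rough Python sketch using the standard library's `html.parser`. It handles only flat, text-valued `itemprop` elements as in these two examples; nested items and nested tags inside a property value are out of scope, and the class name `ItempropCollector` is hypothetical.

```python
# Minimal sketch: collect text-valued itemprop values per itemscope element.
from html.parser import HTMLParser


class ItempropCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.items = []       # one dict of property -> values per itemscope
        self._current = None  # name of the itemprop currently being read
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            self.items.append({})
        if "itemprop" in attrs and self.items:
            self._current = attrs["itemprop"]
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        # Simplification: the first closing tag ends the property value.
        if self._current is not None:
            text = "".join(self._buffer).strip()
            self.items[-1].setdefault(self._current, []).append(text)
            self._current = None


HTML = """
<div itemscope>
 <p>My name is <span itemprop="name">Daniel</span>.</p>
</div>
<div itemscope>
 <p>Flavors in my favorite ice cream:</p>
 <ul>
  <li itemprop="flavor">Lemon sorbet</li>
  <li itemprop="flavor">Apricot sorbet</li>
 </ul>
</div>
"""

collector = ItempropCollector()
collector.feed(HTML)
print(collector.items)
```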

Page 36: 20130622 okfn hackathon t2

Why should you use schema.org?

Google

Yahoo

Bing

Page 37: 20130622 okfn hackathon t2

Top types


Page 38: 20130622 okfn hackathon t2

Schema.rdfs.org

maintains schema.org ↔ RDF mappings

there are mappings for BIBO, DBpedia, Dublin Core, FOAF, GoodRelations, SIOC, and WordNet

also provides examples, tutorials, and data dumps

Page 39: 20130622 okfn hackathon t2

Triple Store


Page 41: 20130622 okfn hackathon t2

Storage Solutions for RDF Data

Triple Table (Basic Idea)

Store all RDF triples in a single table

Create indexes on combinations of S, P, and O
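
The triple-table idea can be tried out with Python's built-in `sqlite3` module. This is a toy sketch: the table name, index names, and the choice of three index orderings are assumptions, and literals are stored as plain text without distinguishing them from URIs.

```python
# Minimal sketch: a single triples table with indexes on S/P/O combinations.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE triples (s TEXT NOT NULL, p TEXT NOT NULL, o TEXT NOT NULL)"
)
# Three orderings so that lookups by subject, predicate, or object can each
# start from an index prefix.
conn.execute("CREATE INDEX idx_spo ON triples (s, p, o)")
conn.execute("CREATE INDEX idx_pos ON triples (p, o, s)")
conn.execute("CREATE INDEX idx_osp ON triples (o, s, p)")

conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("school:401874", "rdfs:label", "Cardiff High School"),
        ("school:401874", "ont:districtAdministrative", "la:00PT"),
        ("la:00PT", "rdfs:label", "Cardiff"),
    ],
)


def match(s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    clauses, params = [], []
    for column, value in (("s", s), ("p", p), ("o", o)):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    query = "SELECT s, p, o FROM triples" + where + " ORDER BY s, p, o"
    return conn.execute(query, params).fetchall()


print(match(p="rdfs:label"))
print(match(o="la:00PT"))
```

Each triple pattern becomes one `WHERE` clause. The obvious cost of this layout is that multi-pattern queries turn into self-joins on the same table.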


Page 42: 20130622 okfn hackathon t2

The Internet Map

http://internet-map.net/

Page 43: 20130622 okfn hackathon t2

credits

These slides are partially based on “Linked data and its role in the semantic web” by Dave Reynolds, Epimorphics Ltd.


Page 44: 20130622 okfn hackathon t2

OKFN Korea