KD2R: a Key Discovery method for semantic Reference Reconciliation
Danai Symeonidou, Nathalie Pernelle and Fatiha Saïs
LRI (University Paris-Sud)
WOD’2013, June 3rd


Data Linking

• More and more heterogeneous RDF sources
• Links can be asserted between them
  ▫ sameAs is one of the most important types of links: it combines information given in different data sources
  ▫ LOD: the number of already existing links is very small
• How to create links automatically?

[Figure: Linked Open Data cloud]


Data Linking Problem

Record   FirstName   LastName   SSN         Job         Age
P1       George      Thomson    011223456   Artist
P2       George      Thomson    444223456   Professor
P3       George      Thomson    011223456               45

The records are spread over two sources: Dataset1 and Dataset2.

Data Linking Problem (continued)

The next two builds of the slide repeat the same three records and add SameAs links between them: one link on the first build, two on the second.


Data Linking with or without key constraints

• No knowledge given about the properties: all the properties have the same importance.
• Knowledge given by an expert:
  ▫ Specific expert rules [Arasu et al.’09, Low et al.’01, Volz et al.’09 (Silk)]
    Example: max(jaro(phone-number; phone-number); jaro-winkler(SSN; SSN)) > 0.88
  ▫ Key constraints [Saïs, Pernelle and Rousset’09]
    Example: hasKey(Museum (museumName) (museumAddress))
• OWL2 Key for a class expression: a combination of (inverse) properties which uniquely identifies an entity
  ▫ hasKey( CE ( OPE1 ... OPEm ) ( DPE1 ... DPEn ) )
  Example: hasKey(Museum (museumName) (museumAddress)) expresses:
  Museum(x1) ∧ Museum(x2) ∧ museumName(x1, y) ∧ museumName(x2, y) ∧ museumAddress(x1, w) ∧ museumAddress(x2, w) ⇒ sameAs(x1, x2)
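As a rough illustration of how a key constraint such as hasKey(Museum (museumName) (museumAddress)) yields sameAs links, the Python sketch below links two instances when they share at least one value on every key property. The function name link_by_key, the dictionary layout and the museum values are assumptions made for this example; they are not taken from KD2R or LN2R.

```python
from itertools import combinations


def link_by_key(instances, key_properties):
    """Return sameAs pairs of instance ids that agree on every key property.

    `instances` maps an id to {property: set of values} (hypothetical layout).
    Two instances agree on a property when their value sets intersect.
    """
    links = set()
    for id1, id2 in combinations(sorted(instances), 2):
        d1, d2 = instances[id1], instances[id2]
        if all(d1.get(p, set()) & d2.get(p, set()) for p in key_properties):
            links.add((id1, id2))
    return links


# Illustrative data only: m1 and m2 share both name and address, m3 does not.
museums = {
    "m1": {"museumName": {"Louvre"}, "museumAddress": {"Rue de Rivoli, Paris"}},
    "m2": {"museumName": {"Louvre"}, "museumAddress": {"Rue de Rivoli, Paris"}},
    "m3": {"museumName": {"Louvre"}, "museumAddress": {"Saadiyat, Abu Dhabi"}},
}
same_as = link_by_key(museums, ["museumName", "museumAddress"])
```

With this data only the pair (m1, m2) is linked. A missing property never produces a link in this sketch, which is one possible reading of the rule rather than the only one.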


Key Discovery Problem

• Problem: when data sources contain numerous data and/or complex ontologies
  ▫ Some keys are not obvious to find.
  ▫ Erroneous keys can be given by the expert.
• Aim: automatic discovery of a complete set of keys from data
• Naïve automatic way to discover keys: examine all the possible combinations of properties
  ▫ Example: given an instance described by 15 properties, the number of candidate keys is 2^15 - 1 = 32767
  ▫ For each candidate key we have to scan all the instances of the data
• Objective: find keys efficiently by:
  ▫ Reducing the combinations
  ▫ Partially scanning the data
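To make the size of the naïve search space concrete, the following Python sketch enumerates every non-empty combination of properties. The generator name candidate_keys and the property names p0 ... p14 are placeholders chosen for this example.

```python
from itertools import combinations


def candidate_keys(properties):
    """Yield every non-empty subset of the given properties, smallest first."""
    props = sorted(properties)
    for size in range(1, len(props) + 1):
        for combo in combinations(props, size):
            yield frozenset(combo)


# 15 hypothetical properties give 2**15 - 1 candidate keys.
fifteen_properties = [f"p{i}" for i in range(15)]
number_of_candidates = sum(1 for _ in candidate_keys(fifteen_properties))
```

Here number_of_candidates equals 32767, the figure quoted on the slide. A naïve method would then scan all instances once per candidate, which is the cost the objective above aims to avoid.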


Key Discovery Problem

• RDF data sources (conforming to an OWL 2 ontology)
• Mappings between classes and properties of the different ontologies
• Open world assumption (incomplete data) and multivalued properties may exist

id   lastName   firstName   hasFriend
i1   Tompson    Manuel      i2, i3
i2   Tompson    Maria
i3   David      George      i2, i4
i4   Solgar     Michel

How to discover keys when we do not know if:
  i1 =?= i2 =?= i3 =?= i4
  hasFriend(i1, i4), hasFriend(i2, i3) … ??
  firstName(i1, Elodie) … ?


Key Discovery Problem: Assumptions

• Unique Name Assumption (UNA): two different URIs refer to distinct entities (data sources generated from relational databases, Yago)
  i1 <> i2 <> i3 <> i4
• Two literals that are syntactically different are semantically different
  ▫ (e.g. “Napoleon Bonaparte” <> “Napoleon”)


Key Discovery: Heuristics

• Heuristic 1 - Pessimistic:
  ▫ Not instantiated property → all the values are possible
    Example: hasFriend(i2, i3), hasFriend(i4, i2) are possible.
  ▫ Instantiated property → only given values are considered
    Example: not hasFriend(i1, i4)

Non keys: {lastName}, {hasFriend}
Keys: {firstName}, {lastName, firstName}, {firstName, hasFriend}
Undetermined keys: {hasFriend, lastName}


Key Discovery: Heuristics

• Heuristic 2 - Optimistic:
  ▫ Not instantiated property → the value is not one of the already existing ones
    Example: not hasFriend(i2, i3), not hasFriend(i2, i1), not hasFriend(i2, i4).
  ▫ Instantiated property → only given values are considered
    Example: not hasFriend(i1, i4)

Non keys: {lastName}, {hasFriend}
Keys: {firstName}, {lastName, firstName}, {firstName, hasFriend}, {hasFriend, lastName}
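The two heuristics can be checked on the four-instance example with a brute-force Python sketch. It compares every pair of instances and is only meant to reproduce the classifications listed on the slides; it is not the KD2R algorithm. The names DATA, compare and classify, and the labels "share", "differ" and "unknown", are assumptions introduced for this example.

```python
from itertools import combinations

# The example table: id -> {property: set of values}; absent means not instantiated.
DATA = {
    "i1": {"lastName": {"Tompson"}, "firstName": {"Manuel"}, "hasFriend": {"i2", "i3"}},
    "i2": {"lastName": {"Tompson"}, "firstName": {"Maria"}},
    "i3": {"lastName": {"David"}, "firstName": {"George"}, "hasFriend": {"i2", "i4"}},
    "i4": {"lastName": {"Solgar"}, "firstName": {"Michel"}},
}


def compare(d1, d2, prop, optimistic):
    """Compare two instances on one property under the chosen heuristic."""
    v1, v2 = d1.get(prop), d2.get(prop)
    if v1 and v2:
        # Instantiated on both sides: only the given values are considered.
        return "share" if v1 & v2 else "differ"
    # Not instantiated on at least one side.
    return "differ" if optimistic else "unknown"


def classify(data, props, optimistic):
    """Return 'non key', 'undetermined key' or 'key' for a set of properties."""
    undetermined = False
    for id1, id2 in combinations(sorted(data), 2):
        results = {compare(data[id1], data[id2], p, optimistic) for p in props}
        if "differ" in results:
            continue  # this pair is distinguished by some property
        if results == {"share"}:
            return "non key"  # two distinct instances agree on every property
        undetermined = True  # the pair could agree once missing values are known
    return "undetermined key" if undetermined else "key"
```

For instance, classify(DATA, ["hasFriend", "lastName"], optimistic=False) gives "undetermined key" because i1 and i2 share a last name and i2 has no known friends, while the same call with optimistic=True gives "key".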


KD2R approach

• Topological sort of the classes (subsumption)
• Key Finder
  ▫ Discover non keys
    Ex: {lastName}, {hasFriend}
  ▫ Derive keys using non keys
    Ex: {firstName}, {lastName, firstName}, {firstName, hasFriend}, {hasFriend, lastName}
• Key Merge
  ▫ Cartesian product of minimal key sets in S1, S2
    Ex: Ks1 = {firstName}, Ks2 = {hasFriend}, Ks1-s2 = {firstName, hasFriend}

Technical report available: https://www.lri.fr/~bibli/Rapports-internes/2013/RR1559.pdf
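The two steps named on this slide can be sketched in Python under simplifying assumptions. minimal_keys treats a property set as a key when it is not contained in any discovered non key and enumerates candidates by brute force, which is not how KD2R derives keys (see the technical report for the actual procedure). merge_keys takes the union of one key from each source for every pair in the Cartesian product and keeps the minimal results. Both function names are hypothetical.

```python
from itertools import combinations, product


def minimal_keys(properties, non_keys):
    """Brute-force derivation of minimal keys from a complete set of non keys."""
    props = sorted(properties)
    found = []
    for size in range(1, len(props) + 1):
        for combo in combinations(props, size):
            candidate = frozenset(combo)
            if any(candidate <= non_key for non_key in non_keys):
                continue  # contained in a non key, so it cannot be a key
            if any(key <= candidate for key in found):
                continue  # a smaller key is already included, so not minimal
            found.append(candidate)
    return set(found)


def merge_keys(keys_s1, keys_s2):
    """Combine the minimal keys of two sources and keep only minimal unions."""
    unions = {k1 | k2 for k1, k2 in product(keys_s1, keys_s2)}
    return {k for k in unions if not any(other < k for other in unions)}


# Non keys from the slide; the result holds only the minimal keys.
keys = minimal_keys(
    {"lastName", "firstName", "hasFriend"},
    [frozenset({"lastName"}), frozenset({"hasFriend"})],
)
# Key Merge example from the slide.
merged = merge_keys({frozenset({"firstName"})}, {frozenset({"hasFriend"})})
```

On the slide's non keys the sketch returns {firstName} and {hasFriend, lastName}; the other keys listed on the slide are supersets of {firstName}. The merge example returns the single key {firstName, hasFriend}.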


KD2R approach: Key Finder

• Computation of maximal non keys and undetermined keys
  ▫ Represent data in a prefix-tree (a compact representation of the data of one class)
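A prefix tree over instance descriptions can be sketched in Python as nested dictionaries, one level per property, so that instances sharing the same leading values share a path. This is a simplified illustration: the function name build_prefix_tree, the use of None for a missing value and the branching on multivalued properties are assumptions of this sketch, and the structure used by KD2R may differ in its details.

```python
def build_prefix_tree(instances, property_order):
    """Build a nested-dict prefix tree; leaves are sets of instance ids.

    `instances` maps an id to {property: set of values}. Level k of the tree
    is keyed by the values of property_order[k].
    """
    root = {}
    last_depth = len(property_order) - 1
    for inst_id, description in instances.items():
        nodes = [root]
        for depth, prop in enumerate(property_order):
            # A missing property is stored under None; several values branch.
            values = description.get(prop) or {None}
            next_nodes = []
            for node in nodes:
                for value in sorted(values, key=str):
                    if depth == last_depth:
                        node.setdefault(value, set()).add(inst_id)
                    else:
                        next_nodes.append(node.setdefault(value, {}))
            nodes = next_nodes
    return root


people = {
    "i1": {"lastName": {"Tompson"}, "firstName": {"Manuel"}},
    "i2": {"lastName": {"Tompson"}, "firstName": {"Maria"}},
    "i3": {"lastName": {"David"}, "firstName": {"George"}},
    "i4": {"lastName": {"Solgar"}, "firstName": {"Michel"}},
}
tree = build_prefix_tree(people, ["lastName", "firstName"])
```

In the resulting tree, i1 and i2 share the node for "Tompson" and only split at the firstName level, which is what makes the representation compact.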


Validation of approach

• Datasets where KD2R has been tested:

Dataset / RDF file        #instances   Optimistic   Pessimistic

OAEI Restaurants Dataset
  Restaurant1             339          Yes          Yes
  Restaurant2             1390         Yes          Yes
OAEI Persons Dataset
  Person11                1000         Yes          Yes
  Person12                1000         Yes          Yes
  Person21                1200         Yes          Yes
DBpedia Dataset (properties instantiated in at least 80% of the data)
  Person                  763644       Yes          No
  NaturalPlace            78400        Yes          No
  BodyOfWater             34008        Yes          No
  Lake                    33348        Yes          No
googleFusion Dataset
  G_Restaurant            372813       Yes          Yes
ChefMoz Dataset
  C_Restaurant            1047         Yes          Yes


Demo

• Ontologies
  ▫ Data conforming to one ontology
• RDF data
  ▫ DBpedia NaturalPlace dataset (78400 instances)
  ▫ OAEIPerson dataset (2000 instances)
• Data linking
  ▫ Link data using LN2R
  ▫ Measure quality of linking using: recall, precision, f-measure
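The three quality measures can be computed from a set of produced links and a reference set of correct links, as in the Python sketch below. The function name linking_quality and the representation of a link as an unordered pair of identifiers are assumptions of this example, not part of LN2R.

```python
def linking_quality(found_links, reference_links):
    """Return (precision, recall, f_measure) for a set of sameAs links.

    Each link is a pair of identifiers; (a, b) and (b, a) count as the same link.
    """
    found = {frozenset(pair) for pair in found_links}
    reference = {frozenset(pair) for pair in reference_links}
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        f_measure = 0.0
    else:
        f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

With two produced links of which one is correct, and two reference links, all three measures equal 0.5.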


QUESTIONS???


THANK YOU!!!