DEER - Automating RDF Dataset Transformation and Enrichment

Motivation Approach Evaluation Conclusion and Future Work

DEER

Automating RDF Dataset Transformation and Enrichment

Mohamed Ahmed Sherif, Axel-Cyrille Ngonga Ngomo and Jens Lehmann

June 3, 2015

Mohamed Ahmed Sherif, Axel-Cyrille Ngonga Ngomo and Jens Lehmann — DEER 1/26

Outline

1 Motivation

2 Approach

3 Evaluation

4 Conclusion and Future Work

Why RDF Transformation & Enrichment?

Dataset: DrugBank

Goal: Gather information about companies related to drugs for a market study

[Figure: RDF graph of a DrugBank excerpt. :Aspirin, :Paracetamol, :Ibuprofen and :Quinine are each typed (a) as :Drug; :Ibuprofen and :Aspirin are linked by owl:sameAs to db:Ibuprofen and db:Aspirin; db:Ibuprofen has the rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..."]

RDF Transformation & Enrichment

Need for enriched datasets

Tourism

Question Answering

Enhanced Reality

...

RDF transformation and enrichment

Triples to be added to the original KB and/or

Triples to be deleted from the original KB

[Figure: the DrugBank excerpt without external links. :Aspirin, :Paracetamol, :Ibuprofen and :Quinine are each typed (a) as :Drug]

Manual Knowledge Base Enrichment

Demands for the specification of data enrichment pipelines

Describe how data is to be integrated (usually manually)

Manual customized enrichment pipelines

⊕ Leads to the expected results

⊖ Time consuming

⊖ Cannot be ported easily to other datasets

Automatic Knowledge Base Enrichment

Enrichment pipeline $M : \mathcal{K} \to \mathcal{K}$ that maps a KB $K$ to an enriched KB $K'$ with $K' = M(K)$.

$M$ is an ordered list of atomic enrichment functions $m \in \mathcal{M}$:

$$M = \begin{cases} \phi & \text{if } K = K', \\ (m_1, \ldots, m_n), \text{ where } m_i \in \mathcal{M},\ 1 \le i \le n & \text{otherwise.} \end{cases}$$

Research questions

1 How to create self-configuring atomic enrichment functions $m \in \mathcal{M}$?

2 How to automatically generate an enrichment pipeline $M$?
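
The pipeline definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a KB is modelled as a frozen set of (subject, predicate, object) string triples, and the two atomic functions `add_label` and `drop_types` are hypothetical placeholders, not functions from DEER.

```python
from typing import Callable, FrozenSet, Sequence, Tuple

Triple = Tuple[str, str, str]
KB = FrozenSet[Triple]
EnrichmentFunction = Callable[[KB], KB]


def apply_pipeline(pipeline: Sequence[EnrichmentFunction], kb: KB) -> KB:
    """Apply the atomic enrichment functions in list order: K' = M(K)."""
    for m in pipeline:
        kb = m(kb)
    return kb


# Hypothetical atomic functions, used only for illustration.
def add_label(kb: KB) -> KB:
    """Add one triple to the KB."""
    return kb | {(":Ibuprofen", "rdfs:label", "Ibuprofen")}


def drop_types(kb: KB) -> KB:
    """Delete all rdf:type ("a") triples from the KB."""
    return frozenset(t for t in kb if t[1] != "a")


source: KB = frozenset({(":Ibuprofen", "a", ":Drug")})
enriched = apply_pipeline([add_label, drop_types], source)
unchanged = apply_pipeline([], source)  # the empty pipeline leaves K as it is
```

The empty list plays the role of the empty pipeline for the case K = K′; any other pipeline is just the ordered composition of its atomic functions.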

Atomic Enrichment Functions

I. Dereferencing atomic enrichment function

Datasets are linked (e.g., using owl:sameAs)

Dereferences a pre-specified set of predicates

Adds the found predicates to the source dataset

[Figure: the DrugBank excerpt with owl:sameAs links from :Ibuprofen to db:Ibuprofen and from :Aspirin to db:Aspirin. Dereferencing copies the rdfs:comment of db:Ibuprofen ("Ibuprofen was extracted by the research arm of Boots company during the 1960s ...") to :Ibuprofen]
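
A minimal sketch of the dereferencing step follows. It assumes that the external dataset is already available as an in-memory set of triples (a real implementation would fetch the linked resources over HTTP); the function name `dereference` and its parameters are hypothetical.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]

COMMENT = ("Ibuprofen was extracted by the research arm of "
           "Boots company during the 1960s ...")


def dereference(source: Set[Triple], external: Set[Triple],
                predicates: Iterable[str],
                link: str = "owl:sameAs") -> Set[Triple]:
    """Copy the requested predicates from linked external resources."""
    wanted = set(predicates)
    result = set(source)
    for s, p, o in source:
        if p != link:
            continue
        # o is the external resource that s is linked to
        for es, ep, eo in external:
            if es == o and ep in wanted:
                result.add((s, ep, eo))
    return result


source = {
    (":Ibuprofen", "a", ":Drug"),
    (":Ibuprofen", "owl:sameAs", "db:Ibuprofen"),
}
external = {("db:Ibuprofen", "rdfs:comment", COMMENT)}

enriched = dereference(source, external, ["rdfs:comment", ":relatedCompany"])
```

Only predicates that actually occur on the linked resource are added, so asking for `:relatedCompany` here has no effect.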

Self-Configuration

I. Dereferencing Enrichment Functions

Finds the set of predicates Dp from the enriched CBDs that are missing from the source CBDs

Non-enriched CBD of Ibuprofen

[Figure: :Ibuprofen a :Drug; :Ibuprofen owl:sameAs db:Ibuprofen]

Enriched CBD of Ibuprofen

[Figure: :Ibuprofen a :Drug; :Ibuprofen owl:sameAs db:Ibuprofen; :Ibuprofen rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..."; :Ibuprofen :relatedCompany :BootsCompany]

Dp = {:relatedCompany, rdfs:comment}
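
The computation of Dp for this example can be sketched as a set difference over predicates. The triple representation and the name `missing_predicates` are assumptions made for illustration.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

COMMENT = ("Ibuprofen was extracted by the research arm of "
           "Boots company during the 1960s ...")


def missing_predicates(source_cbd: Set[Triple],
                       enriched_cbd: Set[Triple]) -> Set[str]:
    """Predicates used in the enriched CBD but absent from the source CBD."""
    source_predicates = {p for _, p, _ in source_cbd}
    enriched_predicates = {p for _, p, _ in enriched_cbd}
    return enriched_predicates - source_predicates


source_cbd = {
    (":Ibuprofen", "a", ":Drug"),
    (":Ibuprofen", "owl:sameAs", "db:Ibuprofen"),
}
enriched_cbd = source_cbd | {
    (":Ibuprofen", "rdfs:comment", COMMENT),
    (":Ibuprofen", ":relatedCompany", ":BootsCompany"),
}

dp = missing_predicates(source_cbd, enriched_cbd)
```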

Self-Configuration

I. Dereferencing Enrichment Functions

Dereferences Dp = {:relatedCompany, rdfs:comment}

CBD of Ibuprofen

[Figure: the linked DrugBank excerpt. :Aspirin, :Paracetamol, :Ibuprofen and :Quinine are each typed (a) as :Drug; :Ibuprofen and :Aspirin are linked by owl:sameAs to db:Ibuprofen and db:Aspirin; db:Ibuprofen has the rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..."]

Finds only rdfs:comment, adds it to the source dataset

Dereferencing-enriched CBD of Ibuprofen

[Figure: :Ibuprofen a :Drug; :Ibuprofen owl:sameAs db:Ibuprofen; :Ibuprofen rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..."]

Atomic Enrichment Functions

II. NLP atomic enrichment function

Datatype objects contain unstructured information

Uses Named Entity Recognition to extract implicit data

Adds extracted entities to the source dataset

[Figure: :Ibuprofen a :Drug; :Ibuprofen owl:sameAs db:Ibuprofen; :Ibuprofen rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..."; named entity recognition on the comment adds :Ibuprofen fox:relatedTo :BootsCompany]
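
The shape of the NLP step can be sketched as follows. The named entity recogniser is deliberately a toy: a hypothetical gazetteer lookup stands in for a real NER service, and the names `gazetteer_recognizer` and `nlp_enrich` are assumptions, not part of DEER or FOX.

```python
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Recognizer = Callable[[str], List[str]]

COMMENT = ("Ibuprofen was extracted by the research arm of "
           "Boots company during the 1960s ...")


def gazetteer_recognizer(gazetteer: Dict[str, str]) -> Recognizer:
    """Toy stand-in for NER: match known surface forms in the text."""
    def recognize(text: str) -> List[str]:
        return [uri for surface, uri in gazetteer.items() if surface in text]
    return recognize


def nlp_enrich(kb: Set[Triple], recognize: Recognizer,
               text_predicate: str = "rdfs:comment",
               link_predicate: str = "fox:relatedTo") -> Set[Triple]:
    """Link each subject to the entities found in its literal objects."""
    result = set(kb)
    for s, p, o in kb:
        if p == text_predicate:
            for entity in recognize(o):
                result.add((s, link_predicate, entity))
    return result


kb = {
    (":Ibuprofen", "a", ":Drug"),
    (":Ibuprofen", "rdfs:comment", COMMENT),
}
recognize = gazetteer_recognizer({"Boots company": ":BootsCompany"})
nlp_enriched = nlp_enrich(kb, recognize)
```

Passing the recogniser in as a callable keeps the enrichment function independent of whichever NER tool is actually used.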

Self-Configuration

II. NLP Enrichment Function

Extracts all possible named entity types

Adds extracted entities to the source dataset

NLP-enriched CBD of Ibuprofen

[Figure: :Ibuprofen a :Drug; :Ibuprofen owl:sameAs db:Ibuprofen; :Ibuprofen rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..."; :Ibuprofen fox:relatedTo :BootsCompany]

Atomic Enrichment Functions

III. Predicate conformation atomic enrichment function

Enriched datasets may contain diverse ontologies

Predicate conformation maps a set of pre-specified predicates to a target ontology

[Figure: in the NLP-enriched CBD of :Ibuprofen, the predicate fox:relatedTo on the edge to :BootsCompany is replaced by :relatedCompany; the a, owl:sameAs and rdfs:comment edges are unchanged]

Self-Configuration

III. Predicate conformation Enrichment Function

Finds lists of predicates Ps and Pt from the source and target datasets, respectively, that connect the same subjects and objects

Replaces each Ps with its respective Pt

NLP-enriched CBD of Ibuprofen

[Figure: :Ibuprofen a :Drug; :Ibuprofen owl:sameAs db:Ibuprofen; :Ibuprofen rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..."; :Ibuprofen fox:relatedTo :BootsCompany, with fox:relatedTo being replaced by :relatedCompany]

Enriched CBD of Ibuprofen (positive example target)

[Figure: :Ibuprofen a :Drug; :Ibuprofen owl:sameAs db:Ibuprofen; :Ibuprofen rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..."; :Ibuprofen :relatedCompany :BootsCompany]
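
The self-configuration of predicate conformation can be sketched as below, under the assumption that source and target are sets of string triples. The function names are hypothetical, and choosing the alphabetically first candidate when several target predicates fit is an arbitrary tie-break made only for this sketch.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

COMMENT = ("Ibuprofen was extracted by the research arm of "
           "Boots company during the 1960s ...")


def learn_predicate_mapping(source: Set[Triple],
                            target: Set[Triple]) -> Dict[str, str]:
    """Map each source predicate Ps to a target predicate Pt that
    connects the same subject and object."""
    target_by_pair: Dict[Tuple[str, str], Set[str]] = {}
    for s, p, o in target:
        target_by_pair.setdefault((s, o), set()).add(p)
    mapping: Dict[str, str] = {}
    for s, p, o in source:
        candidates = target_by_pair.get((s, o), set())
        if candidates and p not in candidates:
            mapping[p] = sorted(candidates)[0]  # arbitrary tie-break
    return mapping


def conform(kb: Set[Triple], mapping: Dict[str, str]) -> Set[Triple]:
    """Replace every mapped predicate, keep all other triples as they are."""
    return {(s, mapping.get(p, p), o) for s, p, o in kb}


common = {
    (":Ibuprofen", "a", ":Drug"),
    (":Ibuprofen", "owl:sameAs", "db:Ibuprofen"),
    (":Ibuprofen", "rdfs:comment", COMMENT),
}
nlp_enriched = common | {(":Ibuprofen", "fox:relatedTo", ":BootsCompany")}
target = common | {(":Ibuprofen", ":relatedCompany", ":BootsCompany")}

mapping = learn_predicate_mapping(nlp_enriched, target)
conformed = conform(nlp_enriched, mapping)
```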

KB Enrichment Refinement Operator

Input

Set of atomic enrichment functions $\mathcal{M}$

Set of positive examples $E$

Refinement Operator

$$\rho(M) = \bigcup_{\forall m \in \mathcal{M}} M \mathbin{+\!+} m$$

where $\mathbin{+\!+}$ is the list append operator

Output

Enrichment pipeline $M$
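
Read literally, the refinement operator appends every atomic function to the current pipeline, yielding one refined pipeline per function. A small sketch, assuming pipelines are tuples and atomic functions are represented by their names:

```python
from typing import List, Sequence, Tuple

Pipeline = Tuple[str, ...]


def refine(pipeline: Pipeline, atomic_functions: Sequence[str]) -> List[Pipeline]:
    """rho(M): one refinement M ++ m for every atomic function m."""
    return [pipeline + (m,) for m in atomic_functions]


atomic = ["m1", "m2", "m3"]
first_level = refine((), atomic)        # refinements of the empty pipeline
second_level = refine(("m1",), atomic)  # refinements of (m1)
```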

Positive Example

[Figure: Non-enriched CBD of Ibuprofen. :Ibuprofen a :Drug; :Ibuprofen owl:sameAs db:Ibuprofen]

[Figure: Enriched CBD of Ibuprofen. :Ibuprofen a :Drug; :Ibuprofen owl:sameAs db:Ibuprofen; :Ibuprofen rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..."; :Ibuprofen :relatedCompany :BootsCompany]

Learning Algorithm

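1 Start with an empty enrichment pipeline $M = \bot$

2 Self-configure all $m_i \in \mathcal{M}$ and add them as children of $\bot$

3 Select the most promising node

4 Expand the most promising node

[Figure: refinement tree rooted at ⊥ with children (m1), (m2), (m3); (m1) is expanded to (m1,m2) and (m1,m3); (m3) is expanded to (m3,m1) and (m3,m2); (m3,m2) is expanded to (m3,m2,m1)]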
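
The four steps can be sketched as a best-first search over pipelines. Several things here are assumptions for illustration only: atomic functions are represented by names, `score` stands in for the node fitness, a function is not appended twice to the same pipeline (as in the tree above), and `toy_score` is an invented scoring function that rewards agreement with a fixed target pipeline.

```python
from typing import Callable, List, Sequence, Tuple

Pipeline = Tuple[str, ...]

TARGET: Pipeline = ("m3", "m2", "m1")


def toy_score(pipeline: Pipeline) -> float:
    """Invented score: fraction of positions that agree with TARGET."""
    hits = sum(1 for a, b in zip(pipeline, TARGET) if a == b)
    return hits / max(len(pipeline), len(TARGET))


def learn_pipeline(atomic: Sequence[str],
                   score: Callable[[Pipeline], float],
                   max_iterations: int = 10) -> Pipeline:
    """Best-first search following steps 1 to 4."""
    best: Pipeline = ()                               # step 1: empty pipeline
    frontier: List[Pipeline] = [(m,) for m in atomic]  # step 2: children of the root
    for _ in range(max_iterations):
        if not frontier:
            break
        node = max(frontier, key=score)               # step 3: most promising leaf
        if score(node) > score(best):
            best = node
        if score(best) >= 1.0:                        # optimal pipeline found
            break
        frontier.remove(node)                         # step 4: expand the node
        frontier.extend(node + (m,) for m in atomic if m not in node)
    return best


learned = learn_pipeline(["m1", "m2", "m3"], toy_score)
```

With this toy score the search expands (m3), then (m3,m2), and stops at (m3,m2,m1), which mirrors the path shown in the refinement tree.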

Most Promising Node Selection

Node complexity $c(n)$

Linear combination of the node's children count and level

Node fitness $f(n)$

Difference between the F-measure of the node's enrichment pipeline and its weighted complexity, $f(n) = F(n) - \omega \cdot c(n)$

$\omega$ controls the tradeoff between

Greedy search ($\omega = 0$)

Search strategies closer to breadth-first search ($\omega > 0$)

Most promising node

The leaf node with the maximum fitness in the whole refinement tree
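
A small sketch of the node selection follows. The slides only state that the complexity is a linear combination of children count and level, so the equal weights `alpha` and `beta` below are assumptions, as are the class and function names and the example F-measures.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    f_measure: float                 # F(n) of the node's pipeline
    level: int                       # depth in the refinement tree
    children: List["Node"] = field(default_factory=list)


def complexity(node: Node, alpha: float = 0.5, beta: float = 0.5) -> float:
    """c(n): linear combination of children count and level (weights assumed)."""
    return alpha * len(node.children) + beta * node.level


def fitness(node: Node, omega: float) -> float:
    """f(n) = F(n) - omega * c(n)."""
    return node.f_measure - omega * complexity(node)


def leaves(node: Node) -> List[Node]:
    """All leaf nodes below (or equal to) the given node."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def most_promising(root: Node, omega: float) -> Node:
    """The leaf with maximum fitness in the whole tree."""
    return max(leaves(root), key=lambda n: fitness(n, omega))


deep = Node(f_measure=0.70, level=2)
shallow = Node(f_measure=0.65, level=1)
inner = Node(f_measure=0.60, level=1, children=[deep])
root = Node(f_measure=0.0, level=0, children=[inner, shallow])

greedy_choice = most_promising(root, omega=0.0)
penalised_choice = most_promising(root, omega=0.75)
```

In this example ω = 0 picks the deeper leaf with the higher F-measure, while ω = 0.75 penalises depth enough to prefer the shallower leaf, which is the breadth-first tendency described above.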

Experimental Setup

Datasets

1 manual experimental enrichment pipeline for Jamendo

2 manual experimental enrichment pipelines for DrugBank

5 manual experimental enrichment pipelines for DBpedia (AdministrativeRegion)

Learning Algorithm

6 atomic enrichment functions

Termination criterion:

Maximum number of iterations of 10

Optimal enrichment pipeline found (F-score = 1)

Configuration of the Search Strategy

Node fitness $f(n) = F(n) - \omega \cdot c(n)$

$\omega$ controls the tradeoff between

Greedy search ($\omega = 0$)

Search strategies closer to breadth-first search ($\omega > 0$)

Result: $\omega = 0.75$ leads to the best results

ω      P     R     F
0      1.0   0.99  0.99
0.25   1.0   0.99  0.99
0.50   1.0   0.99  0.99
0.75   1.0   1.0   1.0
1.0    1.0   0.99  0.99

Effect of Positive Examples

Manual M     Examples count  Size of M  Time M(KB)  Size of learned M′  Time M′(KB)  Learn time  Iterations count  F-score
M1 DBpedia   1               1          0.2         1                   1.6          1.3         1                 1.0
M1 DBpedia   2               1          0.2         1                   1.8          1.3         1                 1.0
M2 DBpedia   1               2          23.3        1                   0.1          0.2         1                 0.99
M2 DBpedia   2               2          15          2                   17           0.3         9                 0.99
M3 DBpedia   1               3          14.7        3                   15.2         6.1         9                 0.99
M3 DBpedia   2               3          15          2                   15.1         0.1         9                 0.99
M4 DBpedia   1               4          0.4         2                   0.1          0.7         2                 0.99
M4 DBpedia   2               4          0.6         2                   0.3          0.9         2                 0.99
M5 DBpedia   1               5          22          2                   0.1          0.7         2                 1.0
M5 DBpedia   2               5          25.5        2                   0.2          0.9         2                 1.0
M1 DrugBank  1               2          3.5         1                   4.1          0.1         10                0.99
M1 DrugBank  2               2          3.6         1                   3.4          0.1         10                0.99
M2 DrugBank  1               3          25.2        1                   0.1          0.1         10                0.99
M2 DrugBank  2               3          22.8        1                   0.1          0.1         10                0.99
M1 Jamendo   1               1          10.9        2                   10.6         0.1         2                 0.99
M1 Jamendo   2               1          10.4        2                   10.4         0.1         1                 0.99

Conclusion and Future Work

Conclusion

Presented self-configuring atomic enrichment functions

Presented an approach for learning enrichment pipelines based on a refinement operator

Showed that our approach can easily reconstruct manually created enrichment pipelines

Future Work

Parallelize the algorithm on several CPUs and add load balancing

Support directed acyclic graphs as enrichment specifications by allowing datasets to be split and merged

Pro-active enrichment strategies and active learning

Thank You!

Questions?

Mohamed Sherif

Augustusplatz 10

D-04109 Leipzig

[email protected]

http://aksw.org/MohamedSherif

http://aksw.org/Projects/DEER

#akswgroup
