Transcript
Page 1: Semantically Conceptualizing and Annotating Tables

ASWC’08: Semantically Conceptualizing and Annotating Tables

Semantically Conceptualizing and Annotating Tables

Stephen Lynn & David W. Embley
Data Extraction Research Group
Department of Computer Science
Brigham Young University

Supported by the

Page 2: Semantically Conceptualizing and Annotating Tables

Overview

- Context
  - WoK: Web of Knowledge
  - TANGO: Table ANalysis for Generating Ontologies
  - MOGO: Mini-Ontology GeneratOr
- Semantic Enrichment via MOGO
  - Implementation
  - Experimentation
  - Enhancements
- Challenges & Opportunities

Page 3: Semantically Conceptualizing and Annotating Tables


WoK: a Web of Knowledge

Page 4: Semantically Conceptualizing and Annotating Tables


TANGO

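Example table (nonsense labels):

fleck     velter
          gonsity (ld/gg)   hepth (gd)
burlam    1.2               120
falder    2.3               230
multon    2.5               400

[Figure: the table is mapped to a mini-ontology over the concepts fleck, velter, hepth, and gonosity, linked by "has" relationship sets with participation constraints 1 and 1:*; the mini-ontology is then merged into the Growing Ontology.]

TANGO repeatedly turns raw tables into conceptual mini-ontologies and integrates them into a growing ontology.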

Page 5: Semantically Conceptualizing and Annotating Tables


MOGO


MOGO generates mini-ontologies from interpreted tables.

Page 6: Semantically Conceptualizing and Annotating Tables

MOGO Overview

- Table Interpretation
  - Yields a canonical table
- Canonical Table
  - Concept/Value Recognition
  - Relationship Discovery
  - Constraint Discovery
  - Yields a semantically enriched conceptual model
- Mini-ontology Integration into a growing ontology

Page 7: Semantically Conceptualizing and Annotating Tables

Sample Input

Region and State Information

Location        Population (2000)   Latitude   Longitude
Northeast       2,122,869
  Delaware      817,376             45         -90
  Maine         1,305,493           44         -93
Northwest       9,690,665
  Oregon        3,559,547           45         -120
  Washington    6,131,118           43         -120

Sample Output

[Figure: canonical table for the sample input. Title: Region and State Information. First dimension tree: Location, with Northeast and Northwest above the states Delaware, Maine, Oregon, and Washington. Second dimension tree: [Dimension2], with Population (2000), Latitude, and Longitude. Data cells shown include 2,122,869, 817,376, and -120.]
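The sample input and output can be made concrete with a minimal Python sketch of a canonical table held as two dimension trees. The row encoding, the `indent` field, and the names `ROWS`, `location_tree`, and `canonical` are illustrative assumptions, not MOGO's actual data structures; the nesting of states under regions is read off the Location dimension tree in the sample output.

```python
# Hypothetical encoding of the sample table. The indent level (0 = region,
# 1 = state) stands in for the label nesting recovered by table interpretation.
ROWS = [
    # (indent, location, population_2000, latitude, longitude)
    (0, "Northeast", 2122869, None, None),
    (1, "Delaware", 817376, 45, -90),
    (1, "Maine", 1305493, 44, -93),
    (0, "Northwest", 9690665, None, None),
    (1, "Oregon", 3559547, 45, -120),
    (1, "Washington", 6131118, 43, -120),
]


def location_tree(rows):
    """Build the Location dimension tree from nested row labels."""
    tree = {}
    current = None
    for indent, label, *_ in rows:
        if indent == 0:
            # A top-level label opens a new branch.
            current = label
            tree[current] = []
        else:
            # A nested label hangs under the most recent top-level label.
            tree[current].append(label)
    return {"Location": tree}


canonical = {
    "title": "Region and State Information",
    "dimension1": location_tree(ROWS),
    "dimension2": {"[Dimension2]": ["Population (2000)", "Latitude", "Longitude"]},
}

print(canonical["dimension1"])
```

Keeping the two dimensions separate mirrors the sample output, where row labels and column labels form independent trees and each data cell is addressed by one leaf from each.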

Page 8: Semantically Conceptualizing and Annotating Tables

Concept/Value Recognition

- Lexical Clues
  - Labels as data values
  - Data value assignment
- Data Frame Clues
  - Labels as data values
  - Data value assignment
- Default
  - Recognize concepts and values by syntax and layout

Page 9: Semantically Conceptualizing and Annotating Tables

Concept/Value Recognition

Concepts and Value Assignments:

Location
Region: Northeast, Northwest
State: Delaware, Maine, Oregon, Washington

Page 10: Semantically Conceptualizing and Annotating Tables

Concept/Value Recognition

Concepts and Value Assignments:

Population: 2,122,869; 817,376; 1,305,493; 9,690,665; 3,559,547; 6,131,118
Latitude: 45, 44, 45, 43
Longitude: -90, -93, -120, -120
Year: 2002, 2003
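Data frame clues can be illustrated with a small sketch in which each concept carries a recognizer for its values and a column is assigned to every concept whose recognizer accepts all of its cells. The patterns, the numeric ranges, and the names `DATA_FRAMES` and `candidate_concepts` are assumptions made for illustration; they are not taken from MOGO's data-frame library.

```python
import re


def _in_range(low, high):
    """Return a recognizer for plain numbers that fall within [low, high]."""
    def accepts(text):
        if re.fullmatch(r"-?\d+(\.\d+)?", text) is None:
            return False
        return low <= float(text) <= high
    return accepts


# Hypothetical data frames: concept name -> value recognizer.
DATA_FRAMES = {
    "Population": lambda t: re.fullmatch(r"\d{1,3}(,\d{3})+", t) is not None,
    "Latitude": _in_range(-90, 90),
    "Longitude": _in_range(-180, 180),
    "Year": lambda t: re.fullmatch(r"(19|20)\d{2}", t) is not None,
}


def candidate_concepts(values):
    """List every concept whose recognizer accepts all values in the column."""
    return [
        name
        for name, accepts in DATA_FRAMES.items()
        if all(accepts(value) for value in values)
    ]


print(candidate_concepts(["2,122,869", "817,376"]))
print(candidate_concepts(["-90", "-93", "-120", "-120"]))
print(candidate_concepts(["45", "44", "45", "43"]))
```

In this sketch the column 45, 44, 45, 43 stays ambiguous between Latitude and Longitude, which is where a lexical clue such as the column label would be needed to decide.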

Page 11: Semantically Conceptualizing and Annotating Tables

Relationship Discovery

- Dimension Tree Mappings
- Lexical Clues
  - Generalization/Specialization
  - Aggregation
- Data Frames
- Ontology Fragment Merge
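As a rough illustration of dimension tree mappings, the sketch below proposes one binary relationship set between the root of the row dimension and each column label, plus an aggregation candidate when the row labels are nested. The function `propose_relationships`, the tuple format, and the relationship names are hypothetical; the slides do not spell out MOGO's actual mapping rules.

```python
def propose_relationships(row_tree, column_labels, level_names):
    """Propose relationship sets from a row dimension tree and column labels.

    row_tree:      {root_concept: {parent_label: [child_labels]}}
    column_labels: leaf labels of the second dimension
    level_names:   (parent_concept, child_concept) names for the nesting levels
    """
    root, groups = next(iter(row_tree.items()))
    # One binary relationship set per column concept.
    proposals = [(root, "has", label) for label in column_labels]
    parent_name, child_name = level_names
    # Nested labels suggest an aggregation between the two levels.
    if any(groups.values()):
        proposals.append((child_name, "is part of", parent_name))
    return proposals


ROW_TREE = {
    "Location": {
        "Northeast": ["Delaware", "Maine"],
        "Northwest": ["Oregon", "Washington"],
    }
}

proposals = propose_relationships(
    ROW_TREE, ["Population", "Latitude", "Longitude"], ("Region", "State")
)
for proposal in proposals:
    print(proposal)
```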

Page 12: Semantically Conceptualizing and Annotating Tables

Relationship Discovery

Page 13: Semantically Conceptualizing and Annotating Tables

Constraint Discovery

- Generalization/Specialization
- Computed Values
- Functional Relationships
- Optional Participation
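Three of these constraint kinds can be checked directly against the Region and State Information sample table. The sketch below is only an illustration under assumed names (`REGIONS`, `is_sum_of_parts`, `participation`, `is_functional`) and is not MOGO's implementation: a computed value holds when a region's population equals the sum of its states, participation is optional when some rows lack a value, and a relationship is functional when each left-hand value maps to a single right-hand value.

```python
# Population values copied from the sample table; the nesting is assumed
# from the Location dimension tree.
REGIONS = {
    "Northeast": {"population": 2122869,
                  "states": {"Delaware": 817376, "Maine": 1305493}},
    "Northwest": {"population": 9690665,
                  "states": {"Oregon": 3559547, "Washington": 6131118}},
}


def is_sum_of_parts(region):
    """Computed value: the region total equals the sum over its states."""
    return region["population"] == sum(region["states"].values())


def participation(values):
    """Return 'optional' if any row lacks a value, else 'mandatory'."""
    return "optional" if any(v is None for v in values) else "mandatory"


def is_functional(pairs):
    """True if every left-hand value maps to exactly one right-hand value."""
    seen = {}
    for left, right in pairs:
        if seen.setdefault(left, right) != right:
            return False
    return True


print({name: is_sum_of_parts(region) for name, region in REGIONS.items()})
# Latitude column in row order; the two region rows have no latitude.
print(participation([None, 45, 44, None, 45, 43]))
print(is_functional([("Delaware", 45), ("Maine", 44), ("Oregon", 45)]))
print(is_functional([(45, "Delaware"), (44, "Maine"), (45, "Oregon")]))
```

In the sample data both regional totals match the sum of their states, latitude is optional for locations, and location determines latitude while latitude does not determine location.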

Page 14: Semantically Conceptualizing and Annotating Tables

Validation

- Concept/Value Recognition
  - Correctly identified concepts
  - Missed concepts
  - False positives
  - Data values assignment
- Relationship Discovery
  - Valid relationship sets
  - Invalid relationship sets
  - Missed relationship sets
- Constraint Discovery
  - Valid constraints
  - Invalid constraints
  - Missed constraints

                         Precision   Recall   F-measure
Concept Recognition      87%         94%      90%
Relationship Discovery   73%         81%      77%
Constraint Discovery     89%         91%      90%

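\[
\text{precision} = \frac{\text{Total\_Correct\_Found}}{\text{Actual\_Correct} + \text{Total\_Incorrect\_Found}}
\]

\[
\text{recall} = \frac{\text{Total\_Correct\_Found}}{\text{Actual\_Correct}}
\]

\[
\text{F-measure} = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
\]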
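As a quick consistency check, the F-measure (two times precision times recall, divided by precision plus recall) can be recomputed from the precision and recall figures reported on this slide. The function name `f_measure` and the `REPORTED` dictionary are illustrative; the numbers are the ones in the results table.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-measure) from the results table.
REPORTED = {
    "Concept Recognition": (0.87, 0.94, 0.90),
    "Relationship Discovery": (0.73, 0.81, 0.77),
    "Constraint Discovery": (0.89, 0.91, 0.90),
}

for task, (precision, recall, reported) in REPORTED.items():
    computed = f_measure(precision, recall)
    print(f"{task}: computed {computed:.2f}, reported {reported:.2f}")
```

For all three tasks the recomputed value rounds to the reported F-measure.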

Page 15: Semantically Conceptualizing and Annotating Tables

Concept Recognition

Counted:
- Correct/Incorrect/Missing Concepts
- Correct/Incorrect/Missing Labels
- Data value assignments

Page 16: Semantically Conceptualizing and Annotating Tables

Relationship Discovery

Counted:
- Correct/incorrect/missing relationship sets
- Correct/incorrect/missing aggregations and generalization/specializations

Page 17: Semantically Conceptualizing and Annotating Tables

Constraint Discovery

Counted (Correct/Incorrect/Missing):
- Generalization/Specialization constraints
- Computed value constraints
- Functional constraints
- Optional constraints

Page 18: Semantically Conceptualizing and Annotating Tables

Concept Recognition

- Successes
  - 98% of concepts identified
  - Missing label identification
  - 97% of values assigned to correct concept
- Common problems
  - Finding an appropriate label
  - Duplicate concepts

Page 19: Semantically Conceptualizing and Annotating Tables

Relationship Discovery

- Recall of 92% for relationship sets
- Missing aggregations and generalization/specializations (only found in label nesting)
- Unnecessary relationship sets generated (are computable)

Page 20: Semantically Conceptualizing and Annotating Tables

Constraint Discovery

- F-measure of 98% for functional relationship sets
- Computed value discovery
- Functional/non-functional lists in cells

Page 21: Semantically Conceptualizing and Annotating Tables

MOGO Contributions

- Tool to generate mini-ontologies
- Accuracy encouraging

                         Precision   Recall   F-measure
Concept Recognition      87%         94%      90%
Relationship Discovery   73%         81%      77%
Constraint Discovery     89%         91%      90%

Page 22: Semantically Conceptualizing and Annotating Tables

Opportunities & Challenges: MOGO

- Enhancements
  - Check for inter-label relationships
  - Check for more complex computations
  - Check for lists in cells
  - …
- Wish List
  - Data-frame library
    - Atomic knowledge components
    - Instance recognizers
  - Library of molecular components
  - Semi-automatic construction of a WordNet-like resource for knowledge components

Page 23: Semantically Conceptualizing and Annotating Tables

Summary

- MOGO
  - Semantic Enrichment
  - Encouraging Results
  - But More Possible
- Broader Implications ~ Vision & Challenges
  - TANGO
  - WoK
    - Web of Data
    - Semantic Annotation
    - User-friendly Query Answering


Page 24: Semantically Conceptualizing and Annotating Tables

Opportunities & Challenges: TANGO

- Table Interpretation
  - Transforming tables to F-logic [Pivk07]
  - Layout-independent table representation [Jha08]
  - Table interpretation by sibling tables [Tao07]
- Semantic Enhancement / Ontology Generation
  - Naming unnamed table concepts [Pivk07]
  - MOGO [Lynn09]
- Semi-automatic Ontology Integration
  - Ontology Matching [Euzenat07]
  - Ontology-mapping tools [Falconer07]
  - Direct and indirect schema mappings for TANGO [Xu06]

Page 25: Semantically Conceptualizing and Annotating Tables

Opportunities & Challenges: WoK

- Web of Data
  - “The Semantic Web is a web of data.” [W3C]
  - Upcoming special issue of Journal of Web Semantics
  - “Enabling a Web of Knowledge” [Tao09]
- Information Extraction
  - Domain-independent IE from web tables [Gatterbauer07]
  - Open IE [Banko07]

Page 26: Semantically Conceptualizing and Annotating Tables

Opportunities & Challenges: WoK …

- Semantic Annotation wrt Ontologies
  - Linking Data to Ontologies [Poggi08]
  - TISP [Tao07]
  - FOCIH [Tao09]
- Reasoning & Query Answering
  - Description Logics [Baader03]
  - NLIDB Community
  - AskOntos [Ding06]
  - SerFR [Al-Muhammed07]

Page 27: Semantically Conceptualizing and Annotating Tables

References

[Al-Muhammed07] Al-Muhammed and Embley, “Ontology-Based Constraint Recognition for Free-Form Service Requests”, Proceedings of the 23rd International Conference on Data Engineering, 2007.

[Baader03] Baader, Calvanese, McGuinness, Nardi and Patel-Schneider, The Description Logic Handbook, Cambridge University Press, 2003.

[Banko07] Banko, Cafarella, Soderland, Broadhead and Etzioni, “Open Information Extraction from the Web”, Proceedings of the International Joint Conference on Artificial Intelligence, 2007.

[Ding06] Ding, Embley and Liddle, “Automatic Creation and Simplified Querying of Semantic Web Content: An Approach Based on Information-Extraction Ontologies”, Proceedings of the First Asian Semantic Web Conference, 2006.

[Euzenat07] Euzenat and Shvaiko, Ontology Matching, Springer Verlag, 2007.

[Falconer07] Falconer, Noy and Storey, “Ontology Mapping - A User Survey”, Proceedings of the Second International Workshop on Ontology Mapping, 2007.

[Gatterbauer07] Gatterbauer, Bohunsky, Herzog and Pollak, “Towards Domain-Independent Information Extraction from Web Tables”, Proceedings of the Sixteenth International World Wide Web Conference, 2007.

[Jha08] Jha and Nagy, “Wang Notation Tool: Layout Independent Representation of Tables”, Proceedings of the 19th International Conference on Pattern Recognition, 2008.

[Pivk07] Pivk, Sure, Cimiano, Gams, Rajkovič and Studer, “Transforming Arbitrary Tables into Logical Form with TARTAR”, Data & Knowledge Engineering, 2007.

[Poggi08] Poggi, Lembo, Calvanese, DeGiacomo, Lenzerini and Rosati, “Linking Data to Ontologies”, Journal on Data Semantics, 2008.

[Tao07] Tao and Embley, “Automatic Hidden-Web Table Interpretation by Sibling Page Comparison”, Proceedings of the 26th International Conference on Conceptual Modeling, 2007.

[Tao09] Tao, Embley and Liddle, “Enabling a Web of Knowledge”, Technical Report: tango.byu.edu/papers, 2009.

[Xu06] Xu and Embley, “A Composite Approach to Automating Direct and Indirect Schema Mappings”, Information Systems, 2006.