TAIPAN: Automatic Property Mapping for Tabular Data
Ivan Ermilov and Axel-Cyrille Ngonga Ngomo
November 22nd, 2016
Web Scale Data Mining from Web Tables
Web Data Commons
Dresden Table Dataset
Other tables
The Web
TAIPAN
● Structured
● Schemaless
● Not using standards*
● SPARQL
● RDFS
● OWL
TAIPAN Approach Overview
Step 1: Identify Subject Column
Step 2: Atomize a Table
Step 3: Identify Property for Each Table
Step 4: Return Mappings
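The atomization step can be illustrated with a minimal sketch. It assumes that atomizing a table means pairing the subject column with each remaining column to obtain two-column tables; the function name `atomize` and the sample rows are hypothetical and are not taken from the TAIPAN code base.

```python
def atomize(table, subject_idx):
    """Split a table (list of rows) into two-column tables.

    Returns a dict mapping each non-subject column index to a list of
    (subject_cell, other_cell) pairs. This pairing is an assumption about
    what "atomize" means on the slide.
    """
    if not table:
        return {}
    n_cols = len(table[0])
    return {
        j: [(row[subject_idx], row[j]) for row in table]
        for j in range(n_cols)
        if j != subject_idx
    }


# Hypothetical sample table: city, country, population in millions.
sample = [
    ["Berlin", "Germany", "3.5"],
    ["Paris", "France", "2.1"],
]
atomic_tables = atomize(sample, subject_idx=0)
```

Each resulting two-column table can then be handled independently when looking for a property in Step 3.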
TAIPAN Approach Overview (example)
The Core of TAIPAN
Subject Column Identification
● Unsupervised ML
● Structural features
● Semantic features
  ○ Support of a column
  ○ Connectivity
Property Mapping
● Retrieve seed entities
● Rank entities
● Return top entity
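The retrieve, rank, and return-top flow can be sketched for a single two-column table as follows. This is an illustration under stated assumptions, not the TAIPAN implementation: it assumes the ranked candidates are knowledge-base properties linking a subject cell to the value in the other column, and that ranking is a simple vote count. `LABELS`, `TRIPLES`, and `map_property` are hypothetical names, and the toy data stands in for a real knowledge base such as DBpedia.

```python
from collections import Counter

# Hypothetical toy knowledge base (not real DBpedia data).
LABELS = {
    "Berlin": "ex:Berlin",
    "Paris": "ex:Paris",
    "Germany": "ex:Germany",
    "France": "ex:France",
}
TRIPLES = [
    ("ex:Berlin", "ex:country", "ex:Germany"),
    ("ex:Berlin", "ex:capitalOf", "ex:Germany"),
    ("ex:Paris", "ex:country", "ex:France"),
]


def map_property(atomic_table, labels=LABELS, triples=TRIPLES):
    """Return the property that links the most (subject, object) rows, or None."""
    votes = Counter()
    for subject_cell, object_cell in atomic_table:
        subject = labels.get(subject_cell)
        if subject is None:
            continue  # subject cell could not be linked to an entity
        # Literals (e.g. numbers) are compared as-is when no entity is found.
        obj = labels.get(object_cell, object_cell)
        for s, p, o in triples:
            if s == subject and o == obj:
                votes[p] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


best = map_property([("Berlin", "Germany"), ("Paris", "France")])
```

Here `ex:country` is supported by both rows and `ex:capitalOf` by only one, so the former is returned.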
Experimental setup
For T2K: 128 GB, 4 cores, Ubuntu 14.04
For TAIPAN: 16 GB, 4 cores, Ubuntu 14.04
Dataset 1: curated T2D gold standard (T2D)
Dataset 2: DBpedia table dataset (DBD)
Subject Column Identification Experiments
Rule-based approach achieves only 51.72% accuracy
Using support and connectivity increases precision
Observations
Can be further improved using ML techniques
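To make the two semantic features concrete, here is a rough sketch of how support and connectivity might be computed. The definitions are assumptions made for illustration: support is taken as the fraction of cells in a column that can be linked to a knowledge-base entity, and connectivity as the fraction of other columns reachable from the column through at least one knowledge-base triple. The names `KB_LABELS`, `KB_TRIPLES`, `support`, and `connectivity` are hypothetical, and the toy data is not from DBpedia.

```python
# Hypothetical toy knowledge base (not real DBpedia data).
KB_LABELS = {
    "Berlin": "ex:Berlin",
    "Paris": "ex:Paris",
    "Germany": "ex:Germany",
    "France": "ex:France",
}
KB_TRIPLES = {
    ("ex:Berlin", "ex:country", "ex:Germany"),
    ("ex:Paris", "ex:country", "ex:France"),
    ("ex:Berlin", "ex:population", "3.5"),
}


def support(column):
    """Fraction of cells that can be linked to an entity (assumed definition)."""
    if not column:
        return 0.0
    return sum(1 for cell in column if cell in KB_LABELS) / len(column)


def _linked(subject_cell, object_cell):
    """True if some triple connects the subject cell's entity to the other cell."""
    subject = KB_LABELS.get(subject_cell)
    if subject is None:
        return False
    obj = KB_LABELS.get(object_cell, object_cell)  # literals compared as-is
    return any(s == subject and o == obj for s, _, o in KB_TRIPLES)


def connectivity(table, col_idx):
    """Fraction of other columns linked to this column in at least one row."""
    others = [j for j in range(len(table[0])) if j != col_idx]
    if not others:
        return 0.0
    connected = sum(
        1 for j in others if any(_linked(row[col_idx], row[j]) for row in table)
    )
    return connected / len(others)


rows = [["Berlin", "Germany", "3.5"], ["Paris", "France", "2.1"]]
columns = list(zip(*rows))
features = [(support(columns[i]), connectivity(rows, i)) for i in range(3)]
```

In this toy table the city and country columns both have full support, but only the city column is connected to the others, which is the kind of signal that would single it out as the subject column.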
Property Mapping Experiments
TAIPAN achieves better recall, but lower precision than T2K
On the DBD dataset T2K could match only 1 property
Observations
Overall TAIPAN performs better than the state of the art
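For reference, precision and recall over property mappings can be computed as in the sketch below. It assumes each mapping is a (column, property) pair and that a prediction counts as correct only when it matches the gold standard exactly; the function name and the sample pairs are hypothetical and are not results from the experiments.

```python
def precision_recall_f1(predicted, gold):
    """Compute precision, recall, and F1 for two sets of (column, property) pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical mappings for one table.
predicted = {(1, "ex:country"), (2, "ex:population"), (3, "ex:area")}
gold = {(1, "ex:country"), (2, "ex:populationTotal")}
p, r, f1 = precision_recall_f1(predicted, gold)
```

A system that returns more mappings per table tends to gain recall at the cost of precision, which is the trade-off noted above.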
Conclusions & Future Work
Curated T2D & DBD datasets
Novel TAIPAN approach
Open Table Extraction
Table Extraction Benchmark (HOBBIT)
Integration of TAIPAN into the GEISER project