andre-freitas
Semantic Computing for coping with the
long tail of data variety
[Figure: the long tail of data variety. Axes: frequency of use vs. # of entities and attributes. Regions along the tail: relational, NoSQL, schema-less, unstructured. Annotation: more knowledge.]
Full data coverage
Full automation
Full knowledge
[Figure: Structure/Semantics spectrum. Unstructured Data: easy to generate. Structured Data: easy to analyze (consistent, comparable, processable).]
Semantic Computing
Robust Semantic Model
Semantic intelligent behavior is highly dependent on knowledge scale (commonsense, semantic)
Semantics = Formal meaning representation model (lots of data) + inference model
Robust Semantic Model: Not scalable!
1st Hard problem: Acquisition
2nd Hard problem: Consistency
3rd Hard problem: Performance
Semantics for a Complex World
“Most semantic models have dealt with particular types of constructions, and have been carried out under very simplifying assumptions, in true lab conditions.”
“If these idealizations are removed it is not clear at all that modern semantics can give a full account of all but the simplest models/statements.”
(Baroni et al. 2013)
Formal World vs. Real World
Distributional Semantic Models
Semantic Model with low acquisition effort (automatically built from text)
Simplification of the representation
Enables the construction of comprehensive commonsense/semantic KBs
What is the cost?
Some level of noise (semantic best-effort)
Limited semantic model
Distributional Hypothesis
“Words occurring in similar (linguistic) contexts tend to be semantically similar”
“He filled the wampimuk with the substance, passed it around and we all drunk some”
(Harris, 1954; McDonald & Ramscar, 2001; Baroni & Boleda, 2010)
Distributional Semantic Models (DSMs)
“The dog barked in the park. The owner of the dog put him on the leash since he barked.”
contexts = nouns and verbs in the same sentence
[Figure: "dog" plotted as a vector over context dimensions such as bark, park and leash]
Context counts for "dog":
bark : 2
park : 1
leash : 1
owner : 1
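The counting behind this vector can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the per-sentence word lists are lemmatized by hand (an assumption standing in for a tokenizer, POS tagger and lemmatizer), and the helper name `context_vector` is hypothetical. Note that a strict "nouns and verbs" reading also counts the verb "put", which the slide's vector does not list.

```python
from collections import Counter

# Hand-lemmatized nouns and verbs for each sentence of the example.
# This stands in for real tokenization, POS tagging and lemmatization.
sentences = [
    ["dog", "bark", "park"],                   # "The dog barked in the park."
    ["owner", "dog", "put", "leash", "bark"],  # "The owner of the dog put him on the leash since he barked."
]

def context_vector(target, sentences):
    """Count the words that occur in the same sentence as `target`."""
    counts = Counter()
    for words in sentences:
        if target in words:
            counts.update(w for w in words if w != target)
    return counts

dog = context_vector("dog", sentences)
for word, count in dog.most_common():
    print(f"{word} : {count}")
```

Words with similar count vectors are then treated as semantically related, which is the distributional hypothesis put to work.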
Shift in the Database Landscape
Very large and dynamic “schemas”.
before 2000: 10s-100s attributes
circa 2015: 1,000s-1,000,000s attributes
(Brodie & Liu, 2010)
Schema-agnosticism
Abstraction Layer
Who is the daughter of Bill Clinton?
Data: (Bill Clinton, child, Chelsea Clinton)
Vocabulary Problem for Databases
Who is the daughter of Bill Clinton married to?
Semantic Gap
Schema-agnostic query mechanisms
Abstraction level differences
Lexical variation
Structural (compositional) differences
Proposed Approach
Ƭ-Space: Hybrid Distributional-Relational
Semantic Model
(A Distributional Structured Semantic Space for Querying RDF Graph Data, IJSC 2012)
Approach Overview
[Figure: architecture with the components Schema-agnostic Query, Query Analysis, Query Features, Query Planner, Query Plan, Ƭ, Large-scale unstructured data, and Database.]
Addressing the Vocabulary Problem for
Databases (with Distributional Semantics)
Gaelic: direction
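As a rough illustration of the idea, query terms can be mapped to schema terms by ranking the schema terms on distributional relatedness. The sketch below uses cosine similarity over context-count vectors. The vectors are invented toy numbers, the candidate property names are hypothetical, and `rank_schema_terms` is an illustrative helper, not the Ƭ-Space implementation.

```python
import math

# Toy context-count vectors. All numbers are invented for illustration only.
vectors = {
    "daughter":  {"family": 4, "parent": 3, "born": 2},
    "married":   {"married": 4, "wedding": 2, "family": 1},
    "child":     {"family": 5, "parent": 4, "born": 3, "school": 1},
    "spouse":    {"family": 2, "married": 5, "wedding": 3},
    "almaMater": {"school": 4, "university": 5},
}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0) for key, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_schema_terms(query_term, schema_terms):
    """Order schema terms from most to least related to the query term."""
    q = vectors[query_term]
    return sorted(schema_terms, key=lambda t: cosine(q, vectors[t]), reverse=True)

schema = ["spouse", "almaMater", "child"]  # hypothetical dataset properties
for term in ("daughter", "married"):
    print(term, "->", rank_schema_terms(term, schema))
```

With these toy vectors, "daughter" ranks the property child first and "married" ranks spouse first, even though neither query word appears in the schema.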
Dataset
Dataset (DBpedia 3.6 + YAGO classes):
45,768 properties
288,316 classes
9,434,677 instances
128,071,259 triples
Comparative Analysis
Better recall and query coverage compared to baselines with
equivalent precision.
More comprehensive semantic matching.
Distributional Semantics vs WordNet
Distributional semantics provides more comprehensive semantic matching.
(A Distributional Approach for Terminological Semantic Search on the Linked Data Web, ACM SAC, 2012)
Large-scale Querying
Schema-agnostic querying
Relation/Graph Extraction
Now that we are schema-agnostic ...
From Text to Knowledge Graph
Relations + Context + Entity Linking
Ontology-agnostic
RDF serialization
“In 2002, GE acquired the wind power assets of Enron.”
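To make "Relations + Context + RDF serialization" concrete, here is a minimal sketch of how the sentence above could be held as a relation with attached context and written out as N-Triples-style lines using RDF reification. The relation is filled in by hand rather than extracted, the `http://example.org/` namespace is hypothetical, and the `Relation` class and `to_ntriples` helper are illustrative names, not the authors' extractor.

```python
from dataclasses import dataclass, field

BASE = "http://example.org/"  # hypothetical namespace, not from the slides
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

@dataclass
class Relation:
    subject: str
    predicate: str
    obj: str
    context: dict = field(default_factory=dict)  # e.g. {"time": "2002"}

def iri(term):
    """Turn a surface string into an IRI in the hypothetical namespace."""
    return "<" + BASE + term.strip().replace(" ", "_") + ">"

def to_ntriples(rel, rel_id):
    """Serialize as a reified statement so that context can attach to it."""
    node = iri(rel_id)
    lines = [
        f"{node} <{RDF}subject> {iri(rel.subject)} .",
        f"{node} <{RDF}predicate> {iri(rel.predicate)} .",
        f"{node} <{RDF}object> {iri(rel.obj)} .",
    ]
    for key, value in sorted(rel.context.items()):
        lines.append(f'{node} {iri(key)} "{value}" .')
    return lines

# Relation written by hand from the example sentence.
rel = Relation("GE", "acquired", "wind power assets of Enron", {"time": "2002"})
triples = to_ntriples(rel, "relation1")
print("\n".join(triples))
```

Keeping the predicate as the surface verb, rather than mapping it to a fixed ontology term, is what keeps the representation ontology-agnostic.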
“General Electric Company, or GE, is an American multinational conglomerate corporation incorporated in Schenectady, New York.”
(A Semantic Best-Effort Approach for Extracting Structured Discourse Graphs from Wikipedia, WoLE 2012)
Large-scale Extraction
Large-scale Graph Extraction
Commonsense Reasoning
Coping with KB incompleteness - Supporting semantic approximation
Selective (focussed) reasoning - Selecting the relevant facts in the context of the inference
Acquisition
Scalability
Strategy: Using distributional semantics to solve both the acquisition and scalability problems
Commonsense Reasoning
Query: Does John Smith have a degree?
Instance-level fact: (John Smith, occupation, Engineer)
Commonsense KB, nodes revealed step by step: Engineer, learn, memorization, education, university, college, degree
Commonsense KB, relation labels in order of appearance: subjectof, is a, have or involve, at location, gives
Selective reasoning: only the facts relevant to the query are followed through the KB.
Coping with KB incompleteness: illustrated at the step that introduces college.
A Distributional Semantics Approach for Selective Reasoning on
Commonsense Graph Knowledge Bases, NLDB 2014.
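The selective navigation idea can be sketched as a depth-first search that expands only neighbors sufficiently related to the query target, trying the most related first. Everything below is an assumption made for illustration: the edge layout of the toy graph, the distractor nodes, the relatedness scores (a stand-in for a distributional measure) and the threshold. It is not the algorithm from the cited paper.

```python
# Toy commonsense graph. Node names echo the slides, but the edge layout
# and the distractor nodes are assumptions made up for this sketch.
GRAPH = {
    "John Smith": ["engineer"],
    "engineer": ["learn", "bridge"],
    "learn": ["education", "memorization"],
    "education": ["university", "homework"],
    "university": ["degree", "campus"],
}

# Invented stand-in for distributional relatedness of each node to "degree".
RELATEDNESS_TO_DEGREE = {
    "engineer": 0.40, "learn": 0.45, "bridge": 0.05,
    "education": 0.60, "memorization": 0.20,
    "university": 0.75, "homework": 0.25,
    "degree": 1.00, "campus": 0.35,
}

def selective_path(source, target, graph, score, threshold=0.3, max_depth=6):
    """Depth-first search that skips neighbors scoring below `threshold`
    and tries the remaining neighbors in order of decreasing relatedness."""
    def walk(node, path):
        if node == target:
            return path
        if len(path) > max_depth:
            return None
        candidates = [
            n for n in graph.get(node, [])
            if n not in path and score.get(n, 0.0) >= threshold
        ]
        for nxt in sorted(candidates, key=lambda n: score[n], reverse=True):
            found = walk(nxt, path + [nxt])
            if found:
                return found
        return None
    return walk(source, [source])

path = selective_path("John Smith", "degree", GRAPH, RELATEDNESS_TO_DEGREE)
print(" -> ".join(path))
```

Pruning by relatedness keeps the search away from facts such as bridge or homework, which is the scalability argument; raising the threshold trades recall for speed.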
Programming in a Schema-agnostic World
Towards An Approximative Ontology-Agnostic Approach for Logic Programs, FOIKS 2014.
Semantics at Scale: When Distributional Semantics meets Logic Programming, ALP Newsletter, 2014.
Schema-agnostic programs
Take-away Message
Existing semantic technologies can address major data management problems today.
Multi-disciplinarity is one key: NLP + IR + Semantic Web + Databases.
Schema-agnosticism is a central property/functionality/goal!
Distributional Semantics + semantics of structured data = schema-agnosticism
Schema-agnosticism has a major impact on information systems.
We can tame the long tail of data variety!
The wave is just starting. Be a part of it!