Open System Categorical Quantum Semantics in Natural Language Processing
R. Piedeleu¹, D. Kartsaklis², B. Coecke¹, M. Sadrzadeh²
¹Department of Computer Science, University of Oxford
²School of Electronic Engineering and Computer Science, Queen Mary University of London
CALCO 2015
[email protected] Open System Categorical Quantum Semantics in NLP 1/28
In a nutshell
Categorical compositional distributional semantics unifies two orthogonal semantic paradigms:
The type-logical compositional approach of formal semantics
The quantitative perspective of vector space models of meaning
The goal is to represent sentences as points in some high-dimensional metric space
In this work:
Inspired by categorical quantum mechanics, we extend the model in order to explicitly take into account lexical ambiguity during the compositional process.
Outline
1 Categorical compositional distributional models
2 Composition and lexical ambiguity
3 Open system quantum semantics
4 From theory to practice
The meaning of words
Distributional hypothesis
Words that occur in similar contexts have similar meanings [Harris, 1958].
The functional interplay of philosophy and ? should, as a minimum, guarantee...
...and among works of dystopian ? fiction...
The rapid advance in ? today suggests...
...calculus, which are more popular in ?-oriented schools.
But because ? is based on mathematics...
...the value of opinions formed in ? as well as in the religions...
...if ? can discover the laws of human nature...
...is an art, not an exact ?.
...factors shaping the future of our civilization: ? and religion.
...certainty which every new discovery in ? either replaces or reshapes.
...if the new technology of computer ? is to grow significantly
He got a ? scholarship to Yale.
...frightened by the powers of destruction ? has given...
...but there is also specialization in ? and technology...
The functional interplay of philosophy and science should, as a minimum, guarantee...
...and among works of dystopian science fiction...
The rapid advance in science today suggests...
...calculus, which are more popular in science-oriented schools.
But because science is based on mathematics...
...the value of opinions formed in science as well as in the religions...
...if science can discover the laws of human nature...
...is an art, not an exact science.
...factors shaping the future of our civilization: science and religion.
...certainty which every new discovery in science either replaces or reshapes.
...if the new technology of computer science is to grow significantly
He got a science scholarship to Yale.
...frightened by the powers of destruction science has given...
...but there is also specialization in science and technology...
Distributional models of meaning
A word is a vector of co-occurrence statistics with every other word in a selected subset of the vocabulary:
cat = (milk: 12, cute: 8, dog: 5, bank: 0, money: 1)
[Figure: word vectors plotted in the context space; labelled points: cat, dog, account, money, pet]
Semantic relatedness is usually based on cosine similarity:
$$\mathrm{sim}(\vec{v}, \vec{u}) = \cos\theta_{\vec{v},\vec{u}} = \frac{\langle \vec{v} \cdot \vec{u} \rangle}{\|\vec{v}\| \, \|\vec{u}\|}$$
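The cosine measure above can be sketched in a few lines of NumPy. The vector for cat uses the counts shown on the slide; the vectors for dog and account are hypothetical counts invented purely for illustration, as is the function name `cosine_similarity`.

```python
import numpy as np

def cosine_similarity(v, u):
    """Cosine of the angle between two non-zero vectors."""
    v = np.asarray(v, dtype=float)
    u = np.asarray(u, dtype=float)
    return float(v @ u / (np.linalg.norm(v) * np.linalg.norm(u)))

# Basis order: milk, cute, dog, bank, money.
cat = [12, 8, 5, 0, 1]       # counts taken from the slide
dog = [6, 9, 4, 0, 2]        # hypothetical counts, for illustration only
account = [0, 0, 1, 10, 9]   # hypothetical counts, for illustration only

sim_cat_dog = cosine_similarity(cat, dog)
sim_cat_account = cosine_similarity(cat, account)
print(sim_cat_dog, sim_cat_account)
```

With these made-up counts, cat comes out much closer to dog than to account, which is the behaviour the distributional hypothesis leads one to expect.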
Moving to phrases and sentences
We would like to generalize this idea to phrases and sentences
However, it’s not clear how
There are practical problems: there is not enough data.
But even if we had a very large corpus, what would the context of a sentence be?
A solution:
For a sentence $w_1 w_2 \ldots w_n$, find a function $f$ such that:
$$\vec{s} = f(\vec{w_1}, \vec{w_2}, \ldots, \vec{w_n})$$
Categorical compositional distributional semantics
Coecke, Sadrzadeh and Clark (2010):
Let syntax drive the semantic derivation, as in formal semantics.
Pregroup grammars are structurally homomorphic with the category of finite-dimensional Hilbert spaces and linear maps (both share compact closure)
In abstract terms, there exists a structure-preserving passage from grammar to meaning:
$$F : \text{Grammar} \to \text{Meaning}$$
The meaning of a sentence $w_1 w_2 \ldots w_n$ with grammatical derivation $\alpha$ is defined as:
$$\overrightarrow{w_1 w_2 \ldots w_n} := F(\alpha)(\vec{w_1} \otimes \vec{w_2} \otimes \ldots \otimes \vec{w_n})$$
A multi-linear model
The grammatical type of a word defines the vector space in which the word lives:
Nouns are vectors in N;
adjectives are linear maps N → N, i.e. elements of N ⊗ N;
intransitive verbs are linear maps N → S, i.e. elements of N ⊗ S;
transitive verbs are bi-linear maps N ⊗ N → S, i.e. elements of N ⊗ S ⊗ N;
and so on.
The composition operation is tensor contraction, based on the inner product.
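The type assignments above translate directly into array shapes, and tensor contraction into `numpy.einsum`. The following is a minimal sketch under stated assumptions: the dimensions of N and S, the random tensors, and the word choices are all hypothetical stand-ins for tensors that would normally be learned from a corpus.

```python
import numpy as np

rng = np.random.default_rng(0)
dim_n, dim_s = 4, 3                        # hypothetical sizes of N and S

kids = rng.random(dim_n)                   # noun: vector in N
games = rng.random(dim_n)                  # noun: vector in N
happy = rng.random((dim_n, dim_n))         # adjective: element of N (x) N
sleep = rng.random((dim_n, dim_s))         # intransitive verb: element of N (x) S
play = rng.random((dim_n, dim_s, dim_n))   # transitive verb: element of N (x) S (x) N

# Adjective-noun: contract the adjective's second index with the noun.
happy_kids = np.einsum("ij,j->i", happy, kids)
# Subject + intransitive verb: contract the noun with the verb's N index.
kids_sleep = np.einsum("i,is->s", kids, sleep)
# Subject + transitive verb + object: contract both N indices of the verb.
kids_play_games = np.einsum("i,isj,j->s", kids, play, games)

print(happy_kids.shape, kids_sleep.shape, kids_play_games.shape)
```

The noun phrase stays in N, while both sentences land in S, which is what lets sentences of different grammatical structure be compared in the same space.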
Categorical composition: example
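Parse tree: [S [NP [Adj happy] [N kids]] [VP [V play] [N games]]]
Pregroup types: happy: $n \cdot n^l$; kids: $n$; play: $n^r \cdot s \cdot n^l$; games: $n$
Type reduction morphism:
$$(\epsilon^r_n \cdot 1_s) \circ (1_n \cdot \epsilon^l_n \cdot 1_{n^r \cdot s} \cdot \epsilon^l_n) : n \cdot n^l \cdot n \cdot n^r \cdot s \cdot n^l \cdot n \to s$$
$$F[(\epsilon^r_n \cdot 1_s) \circ (1_n \cdot \epsilon^l_n \cdot 1_{n^r \cdot s} \cdot \epsilon^l_n)](\mathit{happy} \otimes \overrightarrow{kids} \otimes \mathit{play} \otimes \overrightarrow{games})$$
$$= (\epsilon_N \otimes 1_S) \circ (1_N \otimes \epsilon_N \otimes 1_{N \otimes S} \otimes \epsilon_N)(\mathit{happy} \otimes \overrightarrow{kids} \otimes \mathit{play} \otimes \overrightarrow{games})$$
$$= \mathit{happy} \times \overrightarrow{kids} \times \mathit{play} \times \overrightarrow{games}$$
$\overrightarrow{kids}, \overrightarrow{games} \in N$; $\mathit{happy} \in N \otimes N$; $\mathit{play} \in N \otimes S \otimes N$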
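The two-stage reduction for "happy kids play games" can be followed step by step in NumPy. This is a sketch under assumptions: the dimensions and random tensors are hypothetical, and each ε map is read as contraction of two adjacent N indices.

```python
import numpy as np

rng = np.random.default_rng(0)
dim_n, dim_s = 4, 3                        # hypothetical sizes of N and S

happy = rng.random((dim_n, dim_n))         # element of N (x) N
kids = rng.random(dim_n)                   # element of N
play = rng.random((dim_n, dim_s, dim_n))   # element of N (x) S (x) N
games = rng.random(dim_n)                  # element of N

# First stage, 1_N (x) eps_N (x) 1_{N (x) S} (x) eps_N:
# contract happy with kids and play with games, leaving N (x) N (x) S.
stage_one = np.einsum("ab,b,csd,d->acs", happy, kids, play, games)
# Second stage, eps_N (x) 1_S: contract the two remaining N indices.
sentence = np.einsum("aas->s", stage_one)

# The same result as one contraction over the whole sentence.
direct = np.einsum("ab,b,asd,d->s", happy, kids, play, games)
print(np.allclose(sentence, direct))
```

Both routes give the same vector in S, reflecting that the order in which independent contractions are carried out does not matter.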
Ambiguity in word spaces
Compositional distributional models of meaning are mainly based on ambiguous semantic spaces:
[Figure: real word vectors projected onto a 2-dimensional space using MDS. Labelled points: organ; organ (medicine) with donor, transplant, liver, transplantation, kidney, lung; organ (music) with accompaniment, bass, orchestra, hymn, recital, violin, concert.]
Homonymy and polysemy (1/2)
We distinguish between two types of lexical ambiguity:
In cases of homonymy (organ, bank, vessel etc.), due to some historical accident the same word is used to describe two (or more) completely unrelated concepts.
Polysemy relates to subtle deviations between the different senses of the same word.
Example:
The distinction between the financial sense and the river sense of bank is a case of homonymy;
Within the financial sense, a distinction between the abstract concept of bank as an institution and the concrete building is a case of polysemy.
Homonymy and polysemy (2/2)
Example #1: “I went to the bank to open a savings account”
The word bank is used with its financial sense
The speaker refers to both of the polysemous meanings of bank_fin (institution and building) at the same time
Example #2: “I went to the bank”
The word bank is probably used with the financial sense in mind (because most of the time this is the case)
However, a small possibility that the speaker has actually visited a river bank still exists
Main point:
Polysemy: Relatively coherent and self-contained concepts
Homonymy: Lack of specification
Setting our goals
The problem:
How can we formalize the explicit treatment of lexical ambiguity in the categorical compositional model?
We seek a model that will allow us:
1 to express homonymous words as probabilistic mixings of their individual meanings;
2 to retain the ambiguity until the presence of sufficient context that will eventually resolve it during composition time;
3 to achieve all the above in the multi-linear setting imposed by the vector space semantics of our original model.
A little quantum theory
Quantum mechanics and distributional models of meaning are both based on vector space semantics
The state of a quantum system is represented by a vector in a Hilbert space H. Fixing a basis for H:
$$|\psi\rangle = c_1|k_1\rangle + c_2|k_2\rangle + \ldots + c_n|k_n\rangle$$
we take $|\psi\rangle$ to be a quantum superposition of the basis states $\{|k_i\rangle\}_i$.
i.e. the quantum system co-exists in all basis states in parallel with strengths denoted by the corresponding weights
Such a state is called a pure state.
Word vectors as quantum states
We take words to be quantum systems, and word vectors specific states of these systems:
$$|w\rangle = c_1|k_1\rangle + c_2|k_2\rangle + \ldots + c_n|k_n\rangle$$
Each element of the ONB $\{|k_i\rangle\}_i$ is essentially an atomic symbol:
$$|cat\rangle = 12|milk'\rangle + 8|cute'\rangle + \ldots + 0|bank'\rangle$$
In other words, a word vector is a probability distribution over atomic symbols
$|w\rangle$ is a pure state: when word w is seen alone, it is like co-occurring with all the basis words with strengths denoted by the various coefficients.
Encoding homonymy with mixed states
Ideally, every disjoint meaning of a homonymous word must be represented by a distinct pure state:
$$|bank_{fin}\rangle = a_1|k_1\rangle + a_2|k_2\rangle + \ldots + a_n|k_n\rangle$$
$$|bank_{riv}\rangle = b_1|k_1\rangle + b_2|k_2\rangle + \ldots + b_n|k_n\rangle$$
$\{a_i\}_i \neq \{b_i\}_i$, since the financial sense and the river sense are expected to be seen in drastically different contexts
So we have two distinct states referring to the same system
We cannot be certain under which state our system may be found; we only know that the former state is more probable than the latter
In other words, the system is better described by a probabilistic mixture of pure states, i.e. a mixed state.
Density operators
Mathematically, a mixed state is represented by a density operator:
$$\rho(w) = \sum_i p_i |s_i\rangle\langle s_i|$$
For example:
$$\rho(bank) = 0.80\,|bank_{fin}\rangle\langle bank_{fin}| + 0.20\,|bank_{riv}\rangle\langle bank_{riv}|$$
A density operator is a probability distribution over vectors.
Properties of a density operator ρ
Positive semi-definite: $\langle v|\rho|v\rangle \geq 0 \;\; \forall v \in H$
Of trace one: $\mathrm{Tr}(\rho) = 1$
Self-adjoint: $\rho = \rho^\dagger$
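The bank example and the three properties can be checked numerically. The sketch below assumes real-valued sense vectors with made-up coordinates; `pure_state` and `mixed_state` are hypothetical helper names, and only the 0.80/0.20 weights come from the slide.

```python
import numpy as np

def pure_state(v):
    """Projector |v><v| for the normalised version of a real vector v."""
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)
    return np.outer(v, v)

def mixed_state(probs, vectors):
    """Convex combination sum_i p_i |s_i><s_i| of pure states."""
    return sum(p * pure_state(v) for p, v in zip(probs, vectors))

# Hypothetical sense vectors over a 4-word context basis.
bank_fin = [9, 7, 0, 1]
bank_riv = [0, 1, 8, 6]
rho_bank = mixed_state([0.80, 0.20], [bank_fin, bank_riv])

eigenvalues = np.linalg.eigvalsh(rho_bank)
print("trace:", np.trace(rho_bank))
print("self-adjoint:", np.allclose(rho_bank, rho_bank.T))
print("smallest eigenvalue:", eigenvalues.min())
```

Normalising each sense vector before taking the outer product is what makes the trace come out as the sum of the mixing weights.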
Complete positivity: The CPM construction
We need:
to replace word vectors with density operators
to replace linear maps with completely positive linear maps, i.e. maps that send density operators to density operators while respecting the monoidal structure.
Selinger (2007):
Any dagger compact closed category is associated with a category in which the objects are the objects of the original category, but the maps are completely positive maps.
For $f_1 : A \otimes A^* \to B \otimes B^*$ and $f_2 : C \otimes C^* \to D \otimes D^*$:
$$f_1 \otimes_{\mathrm{CPM}} f_2 : A \otimes C \otimes C^* \otimes A^* \xrightarrow{\cong} A \otimes A^* \otimes C \otimes C^* \xrightarrow{f_1 \otimes f_2} B \otimes B^* \otimes D \otimes D^* \xrightarrow{\cong} B \otimes D \otimes D^* \otimes B^*$$
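One simple family of completely positive maps is obtained by "doubling" an ordinary linear map f : A → B into a map on A ⊗ A* that sends ρ to f ρ f†. The sketch below illustrates only this special case, not the general CPM construction; the dimensions and random matrices are hypothetical, and matrices are flattened in NumPy's default row-major order.

```python
import numpy as np

rng = np.random.default_rng(1)
dim_a, dim_b = 3, 2                        # hypothetical dimensions of A and B

# An arbitrary complex linear map f : A -> B.
f = rng.normal(size=(dim_b, dim_a)) + 1j * rng.normal(size=(dim_b, dim_a))

# A random density matrix on A (positive semi-definite, trace one).
m = rng.normal(size=(dim_a, dim_a)) + 1j * rng.normal(size=(dim_a, dim_a))
rho = m @ m.conj().T
rho = rho / np.trace(rho).real

# Doubled map f (x) conj(f), acting on the row-major flattening of rho.
doubled = np.kron(f, f.conj())
out_doubled = (doubled @ rho.reshape(-1)).reshape(dim_b, dim_b)

# The same map written directly as rho -> f rho f^dagger.
out_direct = f @ rho @ f.conj().T

print(np.allclose(out_doubled, out_direct))
print(np.linalg.eigvalsh(out_direct).min() >= -1e-12)
```

The output stays positive semi-definite, although its trace is in general no longer one, so a normalisation step would be needed to read it as a density operator again.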
Categorical model of meaning: Reprise
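The passage from a grammar to distributional meaning is defined according to the following composition:
$$\mathrm{Preg} \xrightarrow{F} \mathrm{FHilb} \xrightarrow{L} \mathrm{CPM}(\mathrm{FHilb})$$
The meaning of a sentence $w_1 w_2 \ldots w_n$ with grammatical derivation $\alpha$ becomes:
$$L(F(\alpha))(\rho(w_1) \otimes_{\mathrm{CPM}} \rho(w_2) \otimes_{\mathrm{CPM}} \ldots \otimes_{\mathrm{CPM}} \rho(w_n))$$
Composition takes this form:
Subject-intransitive verb: $\rho_{IN} = \mathrm{Tr}_N(\rho(v) \circ (\rho(s) \otimes 1_S))$
Adjective-noun: $\rho_{AN} = \mathrm{Tr}_N(\rho(adj) \circ (1_N \otimes \rho(n)))$
Subj-trans. verb-Obj: $\rho_{TS} = \mathrm{Tr}_{N,N}(\rho(v) \circ (\rho(s) \otimes 1_S \otimes \rho(o)))$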
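The subject and intransitive verb formula can be tried out numerically with a partial trace. This is a sketch under assumptions: the density matrices are random real ones rather than corpus-derived, the dimensions are hypothetical, and `random_density` and `partial_trace_first` are illustrative helper names.

```python
import numpy as np

def random_density(dim, rng):
    """Random real density matrix: positive semi-definite with trace one."""
    m = rng.normal(size=(dim, dim))
    rho = m @ m.T
    return rho / np.trace(rho)

def partial_trace_first(op, dim_first, dim_second):
    """Trace out the first factor of an operator on first (x) second."""
    t = op.reshape(dim_first, dim_second, dim_first, dim_second)
    return np.einsum("isit->st", t)

rng = np.random.default_rng(2)
dim_n, dim_s = 3, 2                            # hypothetical sizes of N and S

rho_subj = random_density(dim_n, rng)          # subject: operator on N
rho_verb = random_density(dim_n * dim_s, rng)  # intransitive verb: operator on N (x) S

# rho_IN = Tr_N( rho(v) o (rho(s) (x) 1_S) )
product = rho_verb @ np.kron(rho_subj, np.eye(dim_s))
rho_sentence = partial_trace_first(product, dim_n, dim_s)

print(rho_sentence)
print("trace:", np.trace(rho_sentence))
```

The result is a positive semi-definite operator on S whose trace is at most one, so it would typically be renormalised before being compared with other sentence representations.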
Using Frobenius algebras
Every vector space with a fixed basis has Frobenius maps $\Delta :: |i\rangle \mapsto |i\rangle \otimes |i\rangle$ and $\mu :: |i\rangle \otimes |i\rangle \mapsto |i\rangle$ over it, useful for:
reducing space and time complexity (Kartsaklis et al., COLING 2012);
encoding the meaning of functional words, such as relative pronouns (Sadrzadeh et al., MoL 2013);
modelling various linguistic phenomena, such as intonation (Kartsaklis and Sadrzadeh, MoL 2015).
The new formulation allows for non-commutative versions of Frobenius algebras:
$$\mu := (1_A \otimes \epsilon_A \otimes 1^*_A) \circ (1_{A \otimes A} \otimes \sigma_{A,A^*}) : A \otimes A \to A \qquad \iota := \eta_{A^*} : I \to A$$
$$\mu(\rho(w_1) \otimes \rho(w_2)) = \rho(w_1) \circ \rho(w_2)$$
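The contrast between the basis-induced Frobenius maps and the non-commutative multiplication on density operators can be seen in a small example. The sketch assumes the usual extension of μ to mixed basis pairs, μ(|i⟩ ⊗ |j⟩) = δ_ij |i⟩, which the slide does not spell out; the vectors and matrices are hypothetical.

```python
import numpy as np

dim = 3
# Copying tensor: entry (i, i, i) is 1, everything else is 0.
delta = np.zeros((dim, dim, dim))
for i in range(dim):
    delta[i, i, i] = 1.0

def copy(v):
    """Delta: |i> -> |i> (x) |i>, extended linearly; returns diag(v)."""
    return np.einsum("abi,i->ab", delta, v)

def merge(t):
    """mu: |i> (x) |j> -> delta_ij |i>; keeps the diagonal of t."""
    return np.einsum("abi,ab->i", delta, t)

v = np.array([1.0, 2.0, 3.0])
w = np.array([4.0, 5.0, 6.0])
merged_vw = merge(np.outer(v, w))   # element-wise product of v and w
merged_wv = merge(np.outer(w, v))   # same result: this mu is commutative

# On density operators, mu is operator composition, which need not commute.
rho_1 = np.array([[0.5, 0.5], [0.5, 0.5]])
rho_2 = np.array([[1.0, 0.0], [0.0, 0.0]])
product_12 = rho_1 @ rho_2
product_21 = rho_2 @ rho_1

print(merged_vw, merged_wv)
print(np.allclose(product_12, product_21))
```

The vector-level merge is symmetric in its arguments, whereas the operator-level product depends on word order, which is the extra expressive freedom the slide points to.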
Measuring Von Neumann entropy
Relative clauses
noun: verb1/verb2        noun   noun that verb1   noun that verb2
organ: enchant/ache      0.18   0.11              0.08
vessel: swell/sail       0.25   0.16              0.01
queen: fly/rule          0.28   0.14              0.16
nail: gleam/grow         0.19   0.06              0.14
bank: overflow/loan      0.21   0.19              0.18

Adjectives
noun: adj1/adj2          noun   adj1 noun         adj2 noun
organ: music/body        0.18   0.10              0.13
vessel: blood/naval      0.25   0.05              0.07
queen: fair/chess        0.28   0.05              0.16
nail: rusty/finger       0.19   0.04              0.11
bank: water/financial    0.21   0.20              0.16
An important aspect of the proposed model:
Disambiguation = Purification
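Von Neumann entropy, S(ρ) = −Tr(ρ ln ρ), is zero for a pure state and positive for a proper mixture, which is why a drop in entropy can be read as disambiguation. The sketch below uses hypothetical sense vectors and the natural logarithm; the logarithm base and any normalisation behind the figures in the tables are not stated on the slide, so the numbers here are not comparable with them.

```python
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -Tr(rho ln rho), computed from the eigenvalues of rho."""
    eigenvalues = np.linalg.eigvalsh(rho)
    eigenvalues = eigenvalues[eigenvalues > 1e-12]   # treat 0 * ln 0 as 0
    return float(-np.sum(eigenvalues * np.log(eigenvalues)))

def projector(v):
    """Pure state |v><v| for the normalised version of a real vector v."""
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)
    return np.outer(v, v)

# Hypothetical sense vectors for an ambiguous word.
sense_a = projector([1.0, 0.0, 0.2])
sense_b = projector([0.1, 1.0, 0.0])
ambiguous = 0.8 * sense_a + 0.2 * sense_b

entropy_pure = von_neumann_entropy(sense_a)
entropy_mixed = von_neumann_entropy(ambiguous)
print(entropy_pure, entropy_mixed)
```

Composition with a disambiguating context would be expected to move the mixed state closer to one of its pure components and so lower the entropy, in line with the pattern in the tables above.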
Conclusion and future work
Density operators offer richer semantic representations for distributional models of meaning
From probability distributions over symbols we advance to probability distributions over vectors
Many opportunities for further research
The non-commutative algebras offer a variety of options, the linguistic intuition of which needs to be explored
Iterated use of the CPM construction is an intriguing feature that deserves separate treatment
Density operators support a form of logic whose distributional and compositional properties remain to be examined
Large-scale experimental evaluation currently in progress
Thank you for listening!
References
Abramsky, S. and Coecke, B. (2004).
A categorical semantics of quantum protocols. In 19th Annual IEEE Symposium on Logic in Computer Science, pages 415–425.
Balkır, E. (2014).
Using density matrices in a compositional distributional model of meaning. Master’s thesis, University of Oxford.
Coecke, B., Sadrzadeh, M., and Clark, S. (2010).
Mathematical Foundations for a Compositional Distributional Model of Meaning. Lambek Festschrift. Linguistic Analysis, 36:345–384.
Kartsaklis, D., Kalchbrenner, N., and Sadrzadeh, M. (2014).
Resolving lexical ambiguity in tensor regression models of meaning. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 212–217, Baltimore, Maryland. Association for Computational Linguistics.
Kartsaklis, D. and Sadrzadeh, M. (2013).
Prior disambiguation of word tensors for constructing sentence vectors. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1590–1601, Seattle, Washington, USA. Association for Computational Linguistics.
Kartsaklis, D. and Sadrzadeh, M. (2015).
A Frobenius model of information structure in categorical compositional distributional semantics. In Proceedings of the 14th Meeting on Mathematics of Language.
Kartsaklis, D., Sadrzadeh, M., and Pulman, S. (2012).
A unified sentence space for categorical distributional-compositional semantics: Theory and experiments. In Proceedings of 24th International Conference on Computational Linguistics (COLING 2012): Posters, pages 549–558, Mumbai, India. The COLING 2012 Organizing Committee.
Kartsaklis, D., Sadrzadeh, M., Pulman, S., and Coecke, B. (2015).
Reasoning about meaning in natural language with compact closed categories and Frobenius algebras. In Chubb, J., Eskandarian, A., and Harizanov, V., editors, Logic and Algebraic Structures in Quantum Computing and Information, Association for Symbolic Logic Lecture Notes in Logic. Cambridge University Press.
Piedeleu, R., Kartsaklis, D., Coecke, B., and Sadrzadeh, M. (2015).
Open system categorical quantum semantics in natural language processing. arXiv preprint arXiv:1502.00831.
Sadrzadeh, M., Clark, S., and Coecke, B. (2013).
The Frobenius anatomy of word meanings I: subject and object relative pronouns. Journal of Logic and Computation, Advance Access.
Sadrzadeh, M., Clark, S., and Coecke, B. (2014).
The Frobenius anatomy of word meanings II: Possessive relative pronouns. Journal of Logic and Computation.
Selinger, P. (2007).
Dagger compact closed categories and completely positive maps. Electronic Notes in Theoretical Computer Science, 170:139–163.