Towards Explainable, Data-Efficient, Verifiable Representation …itrc.inha.ac.kr/wp-content/uploads/2019/09/... · 2019-09-02 · Symbolic Ontologies, First-Order Logic, Logic Programming,

Towards Explainable, Data-Efficient, Verifiable Representation Learning

Pasquale Minervini University College London

Symbolic and Sub-Symbolic AI

Symbolic

Ontologies, First-Order Logic, Logic Programming, Knowledge Bases, Theorem Proving..

∀X, Y, Z : grandFather(X, Y ) ⇐ father(X, Z), parent(Z, Y )


Symbolic




✓Data-Efficient

✓Interpretable and Explainable

✓Easy to Incorporate Knowledge

✓Verifiable

Symbolic



Sub-Symbolic/Connectionist

Neural Representation Learning, (Deep) Latent Variable Models..

f1( f2(…fn( )…)) = {0.15 ≡ cat0.85 ≡ dog


✓Data-Efficient



✓Verifiable



f1( f2(…fn( )…)) = {0.15 ≡ cat0.85 ≡ dog

✓Noisy, Ambiguous, Sensory Data

✓Highly Parallel

✓High Predictive Accuracy

Symbolic


✓Data-Efficient



✓Verifiable





f1( f2(…fn( )…)) = {0.15 ≡ cat0.85 ≡ dog

✓Noisy, Ambiguous, Sensory Data

✓Highly Parallel

✓High Predictive Accuracy

Symbolic


✓Data-Efficient



✓Verifiable



Knowledge Graph — graph structured Knowledge Base, where knowledge is encoded by relationships between entities.

Knowledge Graphs

Knowledge Graph — graph structured Knowledge Base, where knowledge is encoded by relationships between entities. In practice — set of subject-predicate-object triples, denoting a relationship of type predicate between subject and object.

subject predicate object

Barack Obama was born in Honolulu

Hawaii has capital Honolulu

Barack Obama is politician of United States

Hawaii is located in United States

Barack Obama is married to Michelle Obama

Michelle Obama is a Lawyer

Michelle Obama lives in United States

Knowledge Graphs

Knowledge Graphs

Knowledge Graphs

Link Prediction in Knowledge Graphs

Washington

Malia Ann Obama

Sasha Obama

Barack Obama

Michelle Obama

lives in

parent of

?

Rule-Based Link Prediction

∀X, Y, Z :married with(X, Y) ⇐

parent of(X, Z),parent of(Y, Z)

Washington

Malia Ann Obama

Sasha Obama

Barack Obama

Michelle Obama

lives in

parent of

?

✕ Not always true ✕ Hard to learn from data ✕ Hard to formalise for other modalities

Washington

Malia Ann Obama

Sasha Obama

Barack Obama

Michelle Obama

lives in

parent of

?

∀X, Y, Z :married with(X, Y) ⇐

parent of(X, Z),parent of(Y, Z)

Rule-Based Link Prediction

Washington

Malia Ann Obama

Sasha Obama

Barack Obama

Michelle Obama

lives in

parent of

?

Neural Link Prediction

P(BO married MO) ∝

fmarried( , )

Washington

Malia Ann Obama

Sasha Obama

Barack Obama

Michelle Obama

lives in

parent of

?


Learning Representationsℒ(𝒢 ∣ Θ) = ∑

(s,p,o)∈𝒢

log σ (fp(es, eo))+ ∑

(s,p,o)∉𝒢

log [1 − σ (fp(es, eo))]Washington

Malia Ann Obama

Sasha Obama

Barack Obama

Michelle Obama

lives in

parent of

?

P(BO married MO) ∝

fmarried( , )


Neural Link Prediction — Scoring Functions

Models Scoring Functions Parameters

RESCAL [Nickel et al. 2011]

TransE [Bordes et al. 2013]

DistMult [Yang et al. 2015]

HolE [Nickel et al. 2016]

ComplEx [Trouillon et al. 2016]

ConvE [Dettmers et al. 2017]

− es + rp − eo2

p

⟨es, rp, eo⟩

Re (⟨es, rp, eo⟩)

r⊤p (ℱ−1 [ℱ[es] ⊙ ℱ[eo]])

f (vec (f ([es; rp] * ω)) W) eo

rp ∈ ℝk

rp ∈ ℝk

rp ∈ ℝk

rp ∈ ℝk, W ∈ ℝc×k

rp ∈ ℂk

e⊤s Wpeo Wp ∈ ℝk×k

The interaction between the latent features is defined by the scoring function — several variants in the literature:f( ⋅ )









− es + rp − eo2

p

⟨es, rp, eo⟩


r⊤p (ℱ−1 [ℱ[es] ⊙ ℱ[eo]])


rp ∈ ℝk

rp ∈ ℝk

rp ∈ ℝk


rp ∈ ℂk











− es + rp − eo2

p

⟨es, rp, eo⟩


r⊤p (ℱ−1 [ℱ[es] ⊙ ℱ[eo]])


rp ∈ ℝk

rp ∈ ℝk

rp ∈ ℝk


rp ∈ ℂk











− es + rp − eo2

p

⟨es, rp, eo⟩


r⊤p (ℱ−1 [ℱ[es] ⊙ ℱ[eo]])


rp ∈ ℝk

rp ∈ ℝk

rp ∈ ℝk


rp ∈ ℂk



Convolutional 2D Knowledge Graph Embeddings

Subject Embedding

Predicate Embedding

[AAAI 2018]

Idea — use ideas from computer vision for modeling the interactions between latent features.


Subject Embedding

Predicate Embedding

Reshape

“Latent Images”


[AAAI 2018]


Subject Embedding

Predicate Embedding

Reshape

“Latent Images”

2D ConvolutionsConcatenate


[AAAI 2018]


Subject Embedding

Predicate Embedding

Reshape

“Latent Images”

2D ConvolutionsConcatenateProjection Logits

Product with Entity Matrix


[AAAI 2018]✓Scalable

✓State-of-the-art Results

Convolutional 2D Knowledge Graph EmbeddingsIdea — use ideas from computer vision for modeling the interactions between latent features.

[AAAI 2018]

0.1

0.225

0.35

0.475

0.6

DistMult ComplEx R-GCN ConvE

MRR Hits@1 Hits@3 Hits@10

✓Efficiency via parameter sharing ✓State-of-the-art Results

Interpreting Knowledge Graph EmbeddingsQuite hard to understand the semantics of the learned representations..

[Minervini et al. ECML 2017]

Interpreting Knowledge Graph EmbeddingsQuite hard to understand the semantics of the learned representations..

.. but we can use their geometric relationships for identifying — and incorporating — semantic relationships between them.


Regularising Knowledge Graph EmbeddingsQuite hard to understand the semantics of the learned representations..



Regularising Knowledge Graph Embeddings


Regularising Knowledge Graph EmbeddingsQuite hard to understand the semantics of the learned representations..



✕ is a(x, y) ∧ is a(y, z) ⇒ is a(x, z)

Incorporating Background Knowledge via Adversarial Training

Idea — adversarial training process where, iteratively:

[Minervini et al. UAI 2017]


Idea — adversarial training process where, iteratively: • An adversary searches for inputs where the model violates constraints


e.g. x, y, z such that is a(x, y) ∧ is a(y, z) ∧ ¬is a(x, z)


Idea — adversarial training process where, iteratively: • An adversary searches for inputs where the model violates constraints • The model is regularised to correct such violations.



Idea — adversarial training process where, iteratively: • An adversary searches for inputs where the model violates constraints, • The model is regularised to correct such violations.

Formally:

minΘ

ℒdata(D ∣ Θ) + λ maxS

ℒviolation(S, D ∣ Θ)


e.g. S = {x, y, z} such that is a(x, y) ∧ is a(y, z) ∧ ¬is a(x, z)



Formally:

• Inputs S can be either input space or embedding space

• In most interesting cases, max has closed form solutions

• Constraints are guaranteed to hold everywhere in embedding space.


minΘ



e.g. S = {ex, ey, ez} such that is a(ex, ey) ∧ is a(ey, ez) ∧ ¬is a(ex, ez)



Formally:


e.g. S = {ex, ey, ez} such that is a(ex, ey) ∧ is a(ey, ez) ∧ ¬is a(ex, ez)

✓Incorporates Background Knowledge

✓Verifiable

minΘ





55

60.75

66.5

72.25

78

TransE KALE-Pre KALE-Joint DistMult ASR-DistMult ComplEx ASR-ComplEx

Hits@3 Hits@5 Hits@10



55

60.75

66.5

72.25

78

TransE KALE-Pre KALE-Joint DistMult ASR-DistMult ComplEx ASR-ComplEx


Incorporating Background Knowledge in Natural Language Inference Models

[Minervini et al. CoNLL 2018]

Natural Language Inference — detect the type of relationship, i.e. entailment, contradiction, neutral, between two sentences.


If a stentence x contradicts y, then also y contradicts x. If x entails y, and y entails z, then x also entails z.





x) A man in uniform is pushing a medical bed. y) A man is pushing carrying something.





ℒviolation({x, y}) : 0.01 ⇝ 0.92P(x entails y) = 0.72

P(y contradicts x) = 0.93

x) A man in uniform is pushing a medical bed. y) A man is pushing carrying something.





End-to-End Differentiable Reasoning

[NAMPI 2018, ACL 2019]

✓Can learn representations via gradient-based optimisation

✓Can provide explanations to the user (proofs)

✓Can incorporate background knowledge (rules)

✓Can learn interpretable rules from data

End-to-End Differentiable ReasoningTake a standard reasoning algorithm — e.g. backward-chaining — and replace matching between symbols with an end-to-end differentiable comparison operator.

[NAMPI 2018, ACL 2019]


Backward Chaining TL;DR — start with a list of goals, and work backwards from the consequent Q to the antecedent P (if P then Q) to see if any data supports any of the consequents.

[NAMPI 2018, ACL 2019]



Recursively reformulate each goal in simpler subgoals:

[NAMPI 2018, ACL 2019]

p(a)p(b)p(c)

…

q(X) ← p(X)q(a)?




[NAMPI 2018, ACL 2019]

p(a)p(b)p(c)

…

q(X) ← p(X)q(a)?

p(a)




[NAMPI 2018, ACL 2019]

p(a)p(b)p(c)

…

q(X) ← p(X)q(a)?

p(a)

✓


𝚐𝚛𝚊𝚗𝚍𝙿𝚊𝙾𝚏 (𝚊𝚋𝚎, 𝚋𝚊𝚛𝚝)

𝚐𝚛𝚊𝚗𝚍𝙵𝚊𝚝𝚑𝚎𝚛𝙾𝚏 (𝚊𝚋𝚎, 𝚋𝚊𝚛𝚝)

✓ ✓✕

q(X) ← p(X)q(a)?p(a)

p(b)p(c)

…p(a)

✓


[NAMPI 2018, ACL 2019]

Take a standard reasoning algorithm — e.g. backward-chaining — and replace matching between symbols with an end-to-end differentiable comparison operator.

𝚐𝚛𝚊𝚗𝚍𝙿𝚊𝙾𝚏 (𝚊𝚋𝚎, 𝚋𝚊𝚛𝚝)

𝚐𝚛𝚊𝚗𝚍𝙵𝚊𝚝𝚑𝚎𝚛𝙾𝚏 (𝚊𝚋𝚎, 𝚋𝚊𝚛𝚝)

✓ ✓✓sim = 1sim = 1sim = 0.9


[NAMPI 2018, ACL 2019]


Knowledge Base:

𝚏𝚊𝚝𝚑𝚎𝚛𝙾𝚏(𝚊𝚋𝚎, 𝚑𝚘𝚖𝚎𝚛)𝚙𝚊𝚛𝚎𝚗𝚝𝙾𝚏(𝚑𝚘𝚖𝚎𝚛, 𝚋𝚊𝚛𝚝)

𝚐𝚛𝚊𝚗𝚍𝙵𝚊𝚝𝚑𝚎𝚛𝙾𝚏(X, Y ) ⇐𝚏𝚊𝚝𝚑𝚎𝚛𝙾𝚏(X, Z ),𝚙𝚊𝚛𝚎𝚗𝚝𝙾𝚏(Z, Y ) .

𝚐𝚛𝚊𝚗𝚍𝙿𝚊𝙾𝚏(𝚊𝚋𝚎, 𝚋𝚊𝚛𝚝)


[NAMPI 2018, ACL 2019]


Knowledge Base:




𝚏𝚊𝚝𝚑𝚎𝚛𝙾𝚏(𝚊𝚋𝚎, 𝚑𝚘𝚖𝚎𝚛)

proof score S1

𝚙𝚊𝚛𝚎𝚗𝚝𝙾𝚏(𝚑𝚘𝚖𝚎𝚛, 𝚋𝚊𝚛𝚝)

proof score S2


[NAMPI 2018, ACL 2019]


Knowledge Base:





proof score S1


proof score S2

𝚐𝚛𝚊𝚗𝚍𝙵𝚊𝚝𝚑𝚎𝚛𝙾𝚏(X, Y )X /𝚊𝚋𝚎 Y/𝚋𝚊𝚛𝚝

𝚏𝚊𝚝𝚑𝚎𝚛𝙾𝚏(𝚊𝚋𝚎, Z )𝚙𝚊𝚛𝚎𝚗𝚝𝙾𝚏(Z, 𝚋𝚊𝚛𝚝)

Subgoals:

proof score S3


[NAMPI 2018, ACL 2019]


Knowledge Base:





proof score S1


proof score S2



Subgoals:

proof score S3

𝚏𝚊𝚝𝚑𝚎𝚛𝙾𝚏(𝚊𝚋𝚎, Z )Z

proof score S4

𝚏𝚊𝚝𝚑𝚎𝚛𝙾𝚏(𝚊𝚋𝚎, 𝚑𝚘𝚖𝚎𝚛) …


[NAMPI 2018, ACL 2019]


Knowledge Base:𝚐𝚛𝚊𝚗𝚍𝙿𝚊𝙾𝚏(𝚊𝚋𝚎, 𝚋𝚊𝚛𝚝)


proof score S1


proof score S2



Subgoals:

proof score S3

𝚏𝚊𝚝𝚑𝚎𝚛𝙾𝚏(𝚊𝚋𝚎, Z )Z

proof score S4

𝚏𝚊𝚝𝚑𝚎𝚛𝙾𝚏(𝚊𝚋𝚎, 𝚑𝚘𝚖𝚎𝚛) …

𝚏𝚊𝚝𝚑𝚎𝚛𝙾𝚏(𝚊𝚋𝚎, 𝚑𝚘𝚖𝚎𝚛)𝚙𝚊𝚛𝚎𝚗𝚝𝙾𝚏(𝚑𝚘𝚖𝚎𝚛, 𝚋𝚊𝚛𝚝)θ1(X, Y ) ⇐ θ2(X, Z ), θ3(Z, Y ) .

∑F∈K

log pKB∖F(F)

− ∑F̃∼corr(F)

log pKB(F̃)

TrainingMaximise Log-Likelihood:

Differentiable Reasoning at ScaleProblem — The number of proof paths grows exponentially with the proof depth.

[NAMPI 2018, ACL 2019]

Differentiable Reasoning at ScaleProblem — The number of proof paths grows exponentially with the proof depth. Solution — Dynamically consider only a subset of proof paths.

Knowledge Base:𝚏𝚊𝚝𝚑𝚎𝚛𝙾𝚏(𝚊𝚋𝚎, 𝚑𝚘𝚖𝚎𝚛)

𝚙𝚊𝚛𝚎𝚗𝚝𝙾𝚏(𝚑𝚘𝚖𝚎𝚛, 𝚋𝚊𝚛𝚝)𝚐𝚛𝚊𝚗𝚍𝙵𝚊𝚝𝚑𝚎𝚛𝙾𝚏(X, Y ) ⇐

𝚏𝚊𝚝𝚑𝚎𝚛𝙾𝚏(X, Z ),𝚙𝚊𝚛𝚎𝚗𝚝𝙾𝚏(Z, Y ) .



𝚏𝚊𝚝𝚑𝚎𝚛𝙾𝚏(X /𝚊𝚋𝚎, Z )𝚙𝚊𝚛𝚎𝚗𝚝𝙾𝚏(Z, Y/𝚋𝚊𝚛𝚝)

𝚏𝚊𝚝𝚑𝚎𝚛𝙾𝚏(X /𝚊𝚋𝚎, Z )

𝚏𝚊𝚝𝚑𝚎𝚛𝙾𝚏(X /𝚊𝚋𝚎, Z /𝚑𝚘𝚖𝚎𝚛)

Z

𝚙𝚊𝚛𝚎𝚗𝚝𝙾𝚏(Z /𝚑𝚘𝚖𝚎𝚛, Y/𝚋𝚊𝚛𝚝)

𝚙𝚊𝚛𝚎𝚗𝚝𝙾𝚏(Z /𝚑𝚘𝚖𝚎𝚛, Y/𝚋𝚊𝚛𝚝)

Subgoal:

Subgoals:

[NAMPI 2018, ACL 2019]

Differentiable Reasoning at ScaleProblem — The number of proof paths grows exponentially with the proof depth. Solution — Dynamically consider only a subset of proof paths.

[NAMPI 2018, ACL 2019]

55

60.75

66.5

72.25

78

TransE DistMult ComplEx GNTP


Explainable Neural Link Prediction

[NAMPI 2018, ACL 2019]

Differentiable Reasoning in Natural LanguageProblem — Classic reasoners can only reason over symbolic, unambiguous representations.

[NAMPI 2018, ACL 2019]

Differentiable Reasoning in Natural LanguageProblem — Classic reasoners can only reason over symbolic, unambiguous representations. Solution — Represents Knowledge Base and Natural Language facts in a shared embedding space, and reason over them.

Rule Group p(X, Y) :- q(Y, X) Rules Rule Group p(X, Y) :- q(X, Z), r(Z, Y)

X Y :- Y X

encoder

KB

encoder

Query

AND

containedIn(River Thames, UK)

“London is located in the UK”

“London is standing on the River Thames”

“[X] is located in the [Y]”(X, Y) :- locatedIn(X, Y)

locatedIn(X, Y) :- locatedIn(X, Z), locatedIn(Z, Y)

KB Rep. Text Representations

X Y :- Y XX Y :- Y X X Y :- X Z,

Z YX Y :- X Z,

Z Y

Recurse

k-NN OR

[NAMPI 2018, ACL 2019]

Rule Group p(X, Y) :- q(Y, X) Rules Rule Group p(X, Y) :- q(X, Z), r(Z, Y)

X Y :- Y X

encoder

KB

encoder

Query

AND

containedIn(River Thames, UK)

“London is located in the UK”

“London is standing on the River Thames”

“[X] is located in the [Y]”(X, Y) :- locatedIn(X, Y)

locatedIn(X, Y) :- locatedIn(X, Z), locatedIn(Z, Y)

KB Rep. Text Representations

X Y :- Y XX Y :- Y X X Y :- X Z,

Z YX Y :- X Z,

Z Y

Recurse

k-NN OR

Differentiable Reasoning in Natural Language

[NAMPI 2018, ACL 2019]

Differentiable Reasoning in Natural Language

[NAMPI 2018, ACL 2019]

Open Problems

Reasoning Over Large and Web-scale Corpora• Reasoning over large corpora is still expensive

• We need ways of retrieving and extracting useful fragments of knowledge from massively large corpora

Natural Language Constraints and Explanations

• Background knowledge, constraints, and explanations are communicated in a declarative, formal, logic-like language • We should be able to express them in natural language,

and be able to trust machine-generated explanations

Formal Verification of Representation Learning Models

• I showed we can provide provable guarantees for a model to comply with a given set of constraints • How far can we go in this direction? • Can we have models that are provably unbiased and fair?

Thank you! [email protected]

mailto:[email protected]

Documents

Towards Explainable, Data-Efficient, Verifiable Representation …itrc.inha.ac.kr/wp-content/uploads/2019/09/... · 2019-09-02 · Symbolic Ontologies, First-Order Logic, Logic Programming,