
Deep Prolog: End-to-end Differentiable

Proving in Knowledge Bases

Tim Rocktäschel

University College London, Computer Science

2nd Conference on Artificial Intelligence and Theorem Proving

26th of March 2017


Overview

[Diagram: Machine Learning / Deep Learning as a trainable function (an artificial neural network) mapping inputs X to outputs Y]

First-order Logic

“Every father of a parent is a grandfather.”

grandfatherOf(X,Y) :-
    fatherOf(X,Z),
    parentOf(Z,Y).

Machine learning:
• Behavior learned automatically
• Strong generalization
• Needs a lot of training data
• Behavior not interpretable

First-order logic:
• Behavior defined manually
• No generalization
• Needs no training data
• Behavior interpretable

Tim Rocktaschel Deep Prolog: End-to-end Differentiable Proving in Knowledge Bases 1/37


Outline

1 Reasoning with Symbols: Knowledge Bases; Prolog: Backward Chaining

2 Reasoning with Neural Representations: Symbolic vs. Neural Representations; Neural Link Prediction; Computation Graphs

3 Deep Prolog: Neural Backward Chaining

4 Optimizations: Batch Proving; Gradient Approximation; Regularization by Neural Link Predictor

5 Experiments

6 Summary

Tim Rocktaschel Deep Prolog: End-to-end Differentiable Proving in Knowledge Bases 2/37



Notation

Constant: homer, bart, lisa etc. (lowercase)

Variable: X, Y etc. (uppercase, universally quantified)

Term: constant or variable

Predicate: fatherOf, parentOf etc.; a function from terms to a Boolean

Atom: predicate and terms, e.g., parentOf(X, bart)

Literal: negated or non-negated atom, e.g., not parentOf(bart, lisa)

Rule: head :- body. The head is a literal; the body is a (possibly empty) list of literals representing a conjunction

Fact: ground rule (no free variables) with empty body, e.g., parentOf(homer, bart).

Tim Rocktaschel Deep Prolog: End-to-end Differentiable Proving in Knowledge Bases 4/37


Example Knowledge Base

1. fatherOf(abe, homer).
2. parentOf(homer, lisa).
3. parentOf(homer, bart).
4. grandpaOf(abe, lisa).
5. grandfatherOf(abe, maggie).
6. grandfatherOf(X1, Y1) :- fatherOf(X1, Z1), parentOf(Z1, Y1).
7. grandparentOf(X2, Y2) :- grandfatherOf(X2, Y2).

Tim Rocktaschel Deep Prolog: End-to-end Differentiable Proving in Knowledge Bases 5/37


Backward Chaining

def or(KB, goal, Ψ):
    for rule head :- body in KB do
        Ψ′ ← unify(head, goal, Ψ)
        if Ψ′ ≠ failure then
            for Ψ′′ in and(KB, body, Ψ′) do
                yield Ψ′′

def and(KB, subgoals, Ψ):
    if subgoals is empty then return Ψ
    else
        subgoal ← substitute(head(subgoals), Ψ)
        for Ψ′ in or(KB, subgoal, Ψ) do
            for Ψ′′ in and(KB, tail(subgoals), Ψ′) do
                yield Ψ′′
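As a concrete illustration, here is a minimal Python sketch of this backward-chaining procedure. The generator structure and the helper names (unify, substitute) follow the pseudocode above; the term encoding (tuples for atoms, dicts for substitutions) and the toy knowledge base are assumptions made for the example only.

def is_var(t):
    # Variables are uppercase strings, as in the notation slide.
    return isinstance(t, str) and t[:1].isupper()

def walk(t, subst):
    # Follow variable bindings until reaching a constant or an unbound variable.
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def unify(a, b, subst):
    if subst is None:
        return None
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return {**subst, a: b}
    if is_var(b):
        return {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None

def substitute(atom, subst):
    return tuple(walk(t, subst) for t in atom)

def or_(kb, goal, subst):          # "or" and "and" are reserved words in Python
    for head, body in kb:
        s = unify(head, goal, subst)
        if s is not None:
            yield from and_(kb, body, s)

def and_(kb, subgoals, subst):
    if not subgoals:
        yield subst
    else:
        subgoal = substitute(subgoals[0], subst)
        for s in or_(kb, subgoal, subst):
            yield from and_(kb, subgoals[1:], s)

# Toy knowledge base; rule variables are named so they do not clash with query variables.
kb = [
    (("fatherOf", "abe", "homer"), []),
    (("parentOf", "homer", "bart"), []),
    (("grandfatherOf", "X1", "Y1"), [("fatherOf", "X1", "Z1"), ("parentOf", "Z1", "Y1")]),
]
for proof in or_(kb, ("grandfatherOf", "abe", "Q"), {}):
    print(proof)   # the resulting substitution binds Q to bart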

Tim Rocktaschel Deep Prolog: End-to-end Differentiable Proving in Knowledge Bases 6/37


Unification

def unify(A, B, Ψ):
    if Ψ = failure then return failure
    else if A is a variable then return unifyvar(A, B, Ψ)
    else if B is a variable then return unifyvar(B, A, Ψ)
    else if A = [a1, ..., aN] and B = [b1, ..., bN] are atoms then
        Ψ′ ← unify([a2, ..., aN], [b2, ..., bN], Ψ)
        return unify(a1, b1, Ψ′)
    else if A = B then return Ψ
    else return failure

Tim Rocktaschel Deep Prolog: End-to-end Differentiable Proving in Knowledge Bases 7/37


Example

Example Knowledge Base:
1. fatherOf(abe, homer).
2. parentOf(homer, bart).
3. grandfatherOf(X,Y) :- fatherOf(X,Z), parentOf(Z,Y).

Query: grandfatherOf(abe, bart)?

[Proof tree] The query fails to unify with facts 1 and 2, but unifies with the head of rule 3, yielding the substitution {X/abe, Y/bart}. Subgoal 3.1, fatherOf(abe, Z), unifies with fact 1 and extends the substitution to {X/abe, Y/bart, Z/homer} (it fails against 2 and 3). Subgoal 3.2, parentOf(homer, bart), then unifies with fact 2 (failing against 1 and 3), so the proof succeeds with {X/abe, Y/bart, Z/homer}.

Tim Rocktaschel Deep Prolog: End-to-end Differentiable Proving in Knowledge Bases 8/37



Symbolic Representations

Symbols (constants and predicates) do not share any information: grandpaOf ≠ grandfatherOf

No notion of similarity: apple ∼ orange, professorAt ∼ lecturerAt

No generalization beyond what can be symbolically inferred: isFruit(apple), apple ∼ orange, isFruit(orange)?

But... leads to powerful inference mechanisms and proofs for predictions:
fatherOf(abe, homer). parentOf(homer, lisa). parentOf(homer, bart).
grandfatherOf(X,Y) :- fatherOf(X,Z), parentOf(Z,Y).
grandfatherOf(abe, Q)? {Q/lisa}, {Q/bart}
Fairly easy to debug and trivial to incorporate domain knowledge: just change/add rules

Hard to work with language, vision and other modalities:
"is a film based on the novel of the same name by"(X, Y)

Tim Rocktaschel Deep Prolog: End-to-end Differentiable Proving in Knowledge Bases 10/37


Neural Representations

Lower-dimensional fixed-length vector representations of symbols (predicates and constants): v_apple, v_orange, v_fatherOf, ... ∈ R^k

Can capture similarity and even semantic hierarchy of symbols: v_grandpaOf = v_grandfatherOf, v_apple ∼ v_orange, v_apple < v_fruit

Can be trained from raw task data (e.g. facts)

Can be compositional: v_"is the father of" = RNN_θ(v_is, v_the, v_father, v_of)

But... need a large amount of training data

No direct way of incorporating prior knowledge: v_grandfatherOf(X,Y) :- v_fatherOf(X,Z), v_parentOf(Z,Y).

Tim Rocktaschel Deep Prolog: End-to-end Differentiable Proving in Knowledge Bases 11/37


Related Work

Fuzzy Logic (Zadeh, 1965)

Probabilistic Logic Programming, e.g., IBAL (Pfeffer, 2001), BLOG (Milch et al., 2005), Markov Logic Networks (Richardson and Domingos, 2006), ProbLog (De Raedt et al., 2007), ...

Inductive Logic Programming, e.g., Plotkin (1970), Shapiro (1991), Muggleton (1991), De Raedt (1999), ...

Statistical Predicate Invention (Kok and Domingos, 2007)

Neural-symbolic Connectionism
Propositional rules: EBL-ANN (Shavlik and Towell, 1989), KBANN (Towell and Shavlik, 1994), C-LIP (Garcez and Zaverucha, 1999)
First-order inference (no training of symbol representations): Unification Neural Networks (Holldobler, 1990; Komendantskaya, 2011), SHRUTI (Shastri, 1992), Neural Prolog (Ding, 1995), CLIP++ (Franca et al., 2014), Lifted Relational Networks (Sourek et al., 2015)

Tim Rocktaschel Deep Prolog: End-to-end Differentiable Proving in Knowledge Bases 12/37


Neural Link Prediction

Real world knowledge bases (like Freebase) are incomplete!

placeOfBirth attribute is missing for 71% of people!

Commonsense knowledge often not stated explicitly

Weak logical relationships that can be used for inferring facts

[Diagram: melinda is spouseOf bill, bill is chairmanOf microsoft, microsoft is headquarteredIn seattle; livesIn(melinda, seattle)?]

Predict livesIn(melinda, seattle) using a local scoring function f(v_livesIn, v_melinda, v_seattle)

Das et al. (2016) 13/37


State-of-the-art Neural Link Prediction

f(v_livesIn, v_melinda, v_seattle)

DistMult (Yang et al., 2014): v_s, v_i, v_j ∈ R^k
f(v_s, v_i, v_j) = v_s^T (v_i ⊙ v_j) = Σ_k v_sk v_ik v_jk

ComplEx (Trouillon et al., 2016): v_s, v_i, v_j ∈ C^k
f(v_s, v_i, v_j) = real(v_s)^T (real(v_i) ⊙ real(v_j))
                 + real(v_s)^T (imag(v_i) ⊙ imag(v_j))
                 + imag(v_s)^T (real(v_i) ⊙ imag(v_j))
                 − imag(v_s)^T (imag(v_i) ⊙ real(v_j))
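As a quick illustration, here is a small numpy sketch of the two scoring functions above; the embedding dimensionality and the random vectors are placeholders, not values from the slides.

import numpy as np

rng = np.random.default_rng(0)
k = 4

def distmult(v_s, v_i, v_j):
    # f = v_s^T (v_i * v_j) = sum_k v_sk * v_ik * v_jk
    return float(np.sum(v_s * v_i * v_j))

def complex_score(v_s, v_i, v_j):
    # v_s, v_i, v_j live in C^k; combine real and imaginary parts as in ComplEx
    re, im = np.real, np.imag
    return float(re(v_s) @ (re(v_i) * re(v_j))
                 + re(v_s) @ (im(v_i) * im(v_j))
                 + im(v_s) @ (re(v_i) * im(v_j))
                 - im(v_s) @ (im(v_i) * re(v_j)))

v_s, v_i, v_j = rng.normal(size=(3, k))
print(distmult(v_s, v_i, v_j))

c_s, c_i, c_j = rng.normal(size=(3, k)) + 1j * rng.normal(size=(3, k))
print(complex_score(c_s, c_i, c_j))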

Training Loss
L = Σ_{(r_s(e_i, e_j), y) ∈ T} −y log(σ(f(v_s, v_i, v_j))) − (1 − y) log(1 − σ(f(v_s, v_i, v_j)))

Gradient-based optimization for learning v_s, v_i, v_j from data

How do we calculate the gradients ∇_{v_s} L, ∇_{v_i} L, ∇_{v_j} L?

Tim Rocktaschel Deep Prolog: End-to-end Differentiable Proving in Knowledge Bases 14/37


Computation Graphs

[Computation graph: inputs x and y feed into u1 = dot(x, y), which feeds into z = sigm(u1)]

Example: z = f(x, y) = σ(x^T y)

Nodes represent variables (inputs or parameters)

Directed edges to a node correspond to a differentiable operation

Tim Rocktaschel Deep Prolog: End-to-end Differentiable Proving in Knowledge Bases 15/37


Backpropagation

[Computation graph from the previous slide, annotated with the gradients ∇z, ∂z/∂u1, ∂u1/∂x and ∂u1/∂y flowing backwards through the sigm and dot nodes]

Chain Rule of Calculus: given a function z = f(a) = f(g(b)),
∇_a z = (∂b/∂a)^T ∇_b z

Backpropagation is efficient recursive application of the Chain Rule

Gradient of z = σ(x^T y) w.r.t. x:
∇_x z = (∂z/∂u1)(∂u1/∂x) = σ(u1)(1 − σ(u1)) y

Given upstream supervision on z, we can learn x and y!

Deep Learning = “Large” differentiable computation graphs
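To make the example concrete, here is a short numpy sketch of the forward pass z = σ(x^T y) and the hand-derived gradient, checked against finite differences; the input vectors are made up for illustration.

import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

x = np.array([0.5, -1.0, 2.0])
y = np.array([1.5, 0.3, -0.2])

u1 = x @ y               # dot node
z = sigmoid(u1)          # sigm node

dz_du1 = z * (1.0 - z)   # sigmoid derivative at u1
grad_x = dz_du1 * y      # grad_x z = sigma(u1)(1 - sigma(u1)) y
grad_y = dz_du1 * x      # grad_y z, by symmetry

# Numerical check of grad_x z with finite differences
eps = 1e-6
numeric = np.array([(sigmoid((x + eps * np.eye(3)[i]) @ y) - z) / eps for i in range(3)])
print(np.allclose(grad_x, numeric, atol=1e-4))   # True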

Tim Rocktaschel Deep Prolog: End-to-end Differentiable Proving in Knowledge Bases 16/37



Aims

“We are attempting to replace symbols by vectors so wecan replace logic by algebra.” — Yann LeCun

End-to-end differentiable proving

Calculate gradient of proof success w.r.t. symbol representations

Train symbol representations from facts and rules in a knowledge base via gradient descent

Use similarity of symbol representations during proofs

Induce rules of predefined structure via gradient descent

Tim Rocktaschel Deep Prolog: End-to-end Differentiable Proving in Knowledge Bases 18/37


Neural Knowledge Base

Symbolic Representation

1. fatherOf(abe, homer).
2. parentOf(homer, lisa).
3. parentOf(homer, bart).
4. grandpaOf(abe, lisa).
5. grandfatherOf(abe, maggie).
6. grandfatherOf(X1, Y1) :- fatherOf(X1, Z1), parentOf(Z1, Y1).
7. grandparentOf(X2, Y2) :- grandfatherOf(X2, Y2).

Neural-Symbolic Representation

1. v_fatherOf(v_abe, v_homer).
2. v_parentOf(v_homer, v_lisa).
3. v_parentOf(v_homer, v_bart).
4. v_grandpaOf(v_abe, v_lisa).
5. v_grandfatherOf(v_abe, v_maggie).
6. v_grandfatherOf(X1, Y1) :- v_fatherOf(X1, Z1), v_parentOf(Z1, Y1).
7. v_grandparentOf(X2, Y2) :- v_grandfatherOf(X2, Y2).

Tim Rocktaschel Deep Prolog: End-to-end Differentiable Proving in Knowledge Bases 19/37


Neural Unification

Soft-matching: τ_{A,B} = exp(−‖v_A − v_B‖_2) ∈ [0, 1]

def unify(A, B, Ψ, τ):
    if Ψ = failure then return failure, 0
    else if A is a variable then return unifyvar(A, B, Ψ), τ
    else if B is a variable then return unifyvar(B, A, Ψ), τ
    else if A = [a1, ..., aN] and B = [b1, ..., bN] are atoms then
        Ψ′, τ′ ← unify([a2, ..., aN], [b2, ..., bN], Ψ, τ)
        return unify(a1, b1, Ψ′, τ′)
    else if A and B are symbol representations then return Ψ, min(τ, τ_{A,B})
    else return failure, 0

Example: unify v_grandfatherOf(X, v_bart) with v_grandpaOf(v_abe, v_bart):
Ψ = {X/v_abe}, τ = min(exp(−‖v_grandfatherOf − v_grandpaOf‖_2), exp(−‖v_bart − v_bart‖_2))
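A small numpy sketch of the soft-matching score, using random stand-in embeddings (in the model these vectors are trained parameters):

import numpy as np

rng = np.random.default_rng(0)
emb = {name: rng.normal(size=5) for name in ["grandfatherOf", "grandpaOf", "abe", "bart"]}

def tau(a, b):
    # tau_{A,B} = exp(-||v_A - v_B||_2), always in (0, 1]
    return float(np.exp(-np.linalg.norm(emb[a] - emb[b])))

# Soft-unifying grandfatherOf(X, bart) with grandpaOf(abe, bart):
# X is bound to abe symbolically; the proof success is the minimum of the
# pairwise symbol similarities encountered along the way.
score = min(tau("grandfatherOf", "grandpaOf"), tau("bart", "bart"))
print(score)   # tau("bart", "bart") is 1.0, so the minimum is tau("grandfatherOf", "grandpaOf")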

Tim Rocktaschel Deep Prolog: End-to-end Differentiable Proving in Knowledge Bases 20/37


Compiling a Computation Graph using Backward Chaining

def or(KB, goal, Ψ, τ, D):
    for rule head :- body in KB do
        Ψ′, τ′ ← unify(head, goal, Ψ, τ)
        if Ψ′ ≠ failure then
            for Ψ′′, τ′′ in and(KB, body, Ψ′, τ′, D) do
                yield Ψ′′, τ′′

def and(KB, subgoals, Ψ, τ, D):
    if subgoals is empty then return Ψ, τ
    else if D = 0 then return failure
    else
        subgoal ← substitute(head(subgoals), Ψ)
        for Ψ′, τ′ in or(KB, subgoal, Ψ, τ, D − 1) do
            for Ψ′′, τ′′ in and(KB, tail(subgoals), Ψ′, τ′, D) do
                yield Ψ′′, τ′′
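Putting the two pieces together, the following is a compact, self-contained Python sketch of this differentiable backward chaining: unification soft-matches symbol embeddings, every proof carries a success score τ, and the depth parameter D bounds recursion. The data encodings (tuples, dicts, per-symbol numpy vectors) and the variable-renaming helper are illustrative choices, not the implementation from the slides.

import numpy as np

rng = np.random.default_rng(0)
EMB = {}
FRESH = [0]

def vec(sym):
    # One embedding per non-variable symbol, created on first use.
    return EMB.setdefault(sym, rng.normal(size=5))

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def walk(t, subst):
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def soft_match(a, b):
    # tau_{A,B} = exp(-||v_A - v_B||_2)
    return float(np.exp(-np.linalg.norm(vec(a) - vec(b))))

def unify(head, goal, subst, tau):
    # Atoms have equal arity here; soft-match symbols, bind variables.
    for a, b in zip(head, goal):
        a, b = walk(a, subst), walk(b, subst)
        if is_var(a):
            subst = {**subst, a: b}
        elif is_var(b):
            subst = {**subst, b: a}
        else:
            tau = min(tau, soft_match(a, b))
    return subst, tau

def rename(rule):
    # Standardize rule variables apart for each application.
    FRESH[0] += 1
    f = lambda t: f"{t}#{FRESH[0]}" if is_var(t) else t
    head, body = rule
    return tuple(map(f, head)), [tuple(map(f, atom)) for atom in body]

def or_(kb, goal, subst, tau, depth):
    for rule in kb:
        head, body = rename(rule)
        s, t = unify(head, goal, subst, tau)
        yield from and_(kb, body, s, t, depth)

def and_(kb, subgoals, subst, tau, depth):
    if not subgoals:
        yield subst, tau
    elif depth == 0:
        return                              # failure: depth bound reached
    else:
        subgoal = tuple(walk(t, subst) for t in subgoals[0])
        for s, t in or_(kb, subgoal, subst, tau, depth - 1):
            yield from and_(kb, subgoals[1:], s, t, depth)

kb = [
    (("fatherOf", "abe", "homer"), []),
    (("parentOf", "homer", "bart"), []),
    (("grandfatherOf", "X", "Y"), [("fatherOf", "X", "Z"), ("parentOf", "Z", "Y")]),
]
# Proof success of the query is the maximum tau over all generated proofs.
proofs = list(or_(kb, ("grandpaOf", "abe", "bart"), {}, 1.0, 2))
print(max(t for _, t in proofs))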

Tim Rocktaschel Deep Prolog: End-to-end Differentiable Proving in Knowledge Bases 21/37


Example

Example Neural Knowledge Base:
1. v_fatherOf(v_abe, v_homer)
2. v_parentOf(v_homer, v_bart)
3. v_grandfatherOf(X,Y) :- v_fatherOf(X,Z), v_parentOf(Z,Y)

Query: v_s(v_i, v_j)?

[Proof tree] Since unification is soft, the query unifies with fact 1 (empty substitution, score τ1), fact 2 (empty substitution, τ2), and the head of rule 3 ({X/v_i, Y/v_j}, τ3). Subgoal 3.1, v_fatherOf(v_i, Z), unifies with fact 1 ({Z/v_homer}, τ31) and fact 2 ({Z/v_bart}, τ32); applying rule 3 again fails at the depth limit. The corresponding subgoals 3.2, v_parentOf(v_homer, v_j) and v_parentOf(v_bart, v_j), each unify with facts 1 and 2, yielding the proof scores τ311, τ312, τ321, and τ322, while rule 3 again fails.

Tim Rocktaschel Deep Prolog: End-to-end Differentiable Proving in Knowledge Bases 22/37


Training

Proof Aggregation
Ψ, τ = or(KB, q, { }, 1, D)
τ_q = max τ

Supervision Signal
y_q = 1.0 if q ∈ F, 0.0 otherwise

Masking Unification for Training Facts
τ_{q,B} = 0.0 if q ∈ F and q = B, otherwise τ_{q,B}

Loss
L = Σ_{q ∈ T} −y_q log(τ_q) − (1 − y_q) log(1 − τ_q)
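As a small illustration of the objective above, here is a sketch in which placeholder proof scores stand in for the differentiable τ values produced by the prover:

import numpy as np

def proof_success(taus):
    # tau_q: aggregate all proofs of query q by taking the maximum score
    return max(taus)

def loss(y_q, tau_q, eps=1e-12):
    # negative log-likelihood of the proof success under the label y_q
    return float(-y_q * np.log(tau_q + eps) - (1 - y_q) * np.log(1 - tau_q + eps))

print(loss(1.0, proof_success([0.2, 0.9, 0.4])))   # known fact: low loss if some proof is strong
print(loss(0.0, proof_success([0.1, 0.3])))        # sampled negative: penalizes spurious proofs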

Tim Rocktaschel Deep Prolog: End-to-end Differentiable Proving in Knowledge Bases 23/37


Neural Inductive Logic Programming

1. v_fatherOf(v_abe, v_homer).
2. v_parentOf(v_homer, v_lisa).
3. v_parentOf(v_homer, v_bart).
4. v_grandpaOf(v_abe, v_lisa).
5. v_grandfatherOf(v_abe, v_maggie).
6. θ1(X1, Y1) :- θ2(X1, Z1), θ3(Z1, Y1).
7. θ4(X2, Y2) :- θ5(X2, Y2).
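The θ symbols can be treated as additional trainable vectors living in the same space as the predicate representations; after training, each θ can be inspected by decoding it to its nearest predicate embedding, which is one way the induced rules shown later can be read off. A tiny sketch of that decoding step, with random stand-in vectors:

import numpy as np

rng = np.random.default_rng(0)
preds = {p: rng.normal(size=5) for p in ["fatherOf", "parentOf", "grandpaOf", "grandfatherOf"]}
theta2 = rng.normal(size=5)   # parameter vector for the template predicate theta_2 in rule 6

closest = min(preds, key=lambda p: np.linalg.norm(preds[p] - theta2))
print(closest)                # after training, ideally "fatherOf" for rule 6 above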

Tim Rocktaschel Deep Prolog: End-to-end Differentiable Proving in Knowledge Bases 24/37



Batch Proving: Utilizing GPUs

Let A ∈ R^{N×k} be a matrix of N symbol representations that are to be unified with M other symbol representations B ∈ R^{M×k}:

τ_{A,B} = exp(−√(A_sq + B_sq − 2AB^T + ε)) ∈ R^{N×M}

A_sq = [Σ_{i=1}^k A_{1i}^2, ..., Σ_{i=1}^k A_{Ni}^2]^T 1_M^T ∈ R^{N×M}   (squared row norms of A, repeated across columns)
B_sq = 1_N [Σ_{i=1}^k B_{1i}^2, ..., Σ_{i=1}^k B_{Mi}^2] ∈ R^{N×M}   (squared row norms of B, repeated across rows)
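A numpy sketch of this batched soft-matching, using the ‖a − b‖² = a·a + b·b − 2 a·b identity; shapes follow the formula above and the data is random for illustration:

import numpy as np

rng = np.random.default_rng(0)
N, M, k, eps = 3, 4, 5, 1e-8
A = rng.normal(size=(N, k))
B = rng.normal(size=(M, k))

A_sq = np.sum(A**2, axis=1, keepdims=True)     # (N, 1), broadcast across columns
B_sq = np.sum(B**2, axis=1, keepdims=True).T   # (1, M), broadcast across rows
tau = np.exp(-np.sqrt(A_sq + B_sq - 2 * A @ B.T + eps))   # (N, M) matrix of pairwise scores

# Sanity check against the direct pairwise computation
direct = np.exp(-np.array([[np.linalg.norm(a - b) for b in B] for a in A]))
print(np.allclose(tau, direct, atol=1e-3))     # True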

Tim Rocktaschel Deep Prolog: End-to-end Differentiable Proving in Knowledge Bases 26/37


Batch Proving Example

Example Neural Knowledge Base:
1. v_fatherOf(v_abe, v_homer)
2. v_parentOf(v_homer, v_bart)
3. v_grandfatherOf(X,Y) :- v_fatherOf(X,Z), v_parentOf(Z,Y)

Query: v_s(v_i, v_j)?

[Proof tree, batched] Unifying the query with facts 1 and 2 in a single batch yields the empty substitution with score vector [τ1, τ2]; unifying with the head of rule 3 yields {X/v_i, Y/v_j} with score τ3. Subgoal 3.1, v_fatherOf(v_i, Z), is unified with facts 1 and 2 in one batch, binding Z to the stacked representations [v_homer; v_bart] with scores [τ31, τ32] (rule 3 fails at the depth limit). Subgoal 3.2 is then proven as one batched query, [v_parentOf, v_parentOf]([v_homer; v_bart], [v_j, v_j])?, which unifies with facts 1 and 2 to give the score matrix [τ311 τ321; τ312 τ322], again with failure for rule 3.

Tim Rocktaschel Deep Prolog: End-to-end Differentiable Proving in Knowledge Bases 27/37


Gradient Approximation with Kmax Proofs

Example Neural Knowledge Base:
1. v_fatherOf(v_abe, v_homer)
2. v_parentOf(v_homer, v_bart)
3. v_grandfatherOf(X,Y) :- v_fatherOf(X,Z), v_parentOf(Z,Y)

Query: v_s(v_i, v_j)?

[Proof tree with Kmax proofs] As before, the query unifies with facts 1 and 2 ([τ1, τ2]) and with the head of rule 3 ({X/v_i, Y/v_j}, τ3). For subgoal 3.1, v_fatherOf(v_i, Z), only the K best-scoring substitutions are kept, binding Z to [v_K] with scores [τ3K]; subgoal 3.2, [v_parentOf]([v_K], [v_j])?, is then unified with facts 1 and 2, giving scores [τ3K1; τ3K2] (rule 3 fails at the depth limit), so the gradient is approximated using only these K retained proof paths.

Tim Rocktaschel Deep Prolog: End-to-end Differentiable Proving in Knowledge Bases 28/37


Regularization by Neural Link Predictor

Train jointly with neural link prediction method

Share symbol representations

Neural link prediction model quickly learns similarities between symbols

Let p_q be the score by the neural link prediction model (DistMult or ComplEx), and τ_q be the proof success

Multi-task training loss:
L = Σ_{q ∈ T} −y_q (log(τ_q) + log(p_q)) − (1 − y_q)(log(1 − τ_q) + log(1 − p_q))
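A minimal sketch of this multi-task loss; the τ and p values below are placeholders for the prover's proof success and the link predictor's (sigmoid) score, which share symbol representations:

import numpy as np

def joint_loss(y, tau, p, eps=1e-12):
    # the same label y supervises both the prover (tau) and the link predictor (p)
    return float(-y * (np.log(tau + eps) + np.log(p + eps))
                 - (1 - y) * (np.log(1 - tau + eps) + np.log(1 - p + eps)))

print(joint_loss(1.0, tau=0.8, p=0.9))   # known fact
print(joint_loss(0.0, tau=0.2, p=0.1))   # sampled negative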

Tim Rocktaschel Deep Prolog: End-to-end Differentiable Proving in Knowledge Bases 29/37



Experiments

Countries Knowledge Base (Bouchard et al., 2015)

[Diagram: train and test countries are connected to subregions and regions by locatedIn edges and to each other by neighborOf edges; for test countries, the locatedIn links to be predicted are held out]

Tim Rocktaschel Deep Prolog: End-to-end Differentiable Proving in Knowledge Bases 31/37


Models

NTP: prover is trained alone

DistMult: neural link prediction model by Yang et al. (2014)

NTP DistMult: prover and DistMult trained jointly; the maximum prediction is used at test time

NTP DistMult λ: only the prover is used at test time; DistMult acts as a regularizer

ComplEx: neural link prediction model by Trouillon et al. (2016)

NTP ComplEx: prover and ComplEx trained jointly; the maximum prediction is used at test time

NTP ComplEx λ: only the prover is used at test time; ComplEx acts as a regularizer

Tim Rocktaschel Deep Prolog: End-to-end Differentiable Proving in Knowledge Bases 32/37


Rule Templates

S1: θ1(X,Y) :- θ2(Y,X).
    θ1(X,Y) :- θ2(X,Z), θ2(Z,Y).
S2: θ1(X,Y) :- θ2(X,Z), θ3(Z,Y).
S3: θ1(X,Y) :- θ2(X,Z), θ3(Z,W), θ4(W,Y).

Tim Rocktaschel Deep Prolog: End-to-end Differentiable Proving in Knowledge Bases 33/37


Results

Model                              S1      S2      S3
Random                             32.3    32.3    32.3
Frequency                          32.3    32.3    30.8
ER-MLP (Dong et al., 2014)         96.0    74.5    65.0
Rescal (Nickel et al., 2012)       99.7    74.5    65.0
HolE (Nickel et al., 2015)         99.7    77.2    69.7
TARE (Wang et al., 2017)           99.4    90.6    89.0
NTP                                97.3    83.7    70.0
DistMult (Yang et al., 2014)       98.1    98.3    65.5
NTP DistMult                       99.2    96.7    87.0
NTP DistMult λ                     99.4    98.3    95.9
ComplEx (Trouillon et al., 2016)   99.9    97.1    78.6
NTP ComplEx                        100.0   98.9    89.1
NTP ComplEx λ                      99.3    98.2    95.1

Tim Rocktaschel Deep Prolog: End-to-end Differentiable Proving in Knowledge Bases 34/37


Results

[Bar chart: AUC per task (S1, S2, S3) for NTP, DistMult, ComplEx, NTP DistMult, NTP ComplEx, NTP DistMult λ, and NTP ComplEx λ; y-axis ranges from 0.3 to 1.0]

Tim Rocktaschel Deep Prolog: End-to-end Differentiable Proving in Knowledge Bases 35/37


Induced Logic Programs

Task   Confidence   Rule
S1     0.999        neighborOf(X,Y) :- neighborOf(Y,X).
       0.767        locatedIn(X,Y) :- locatedIn(X,Z), locatedIn(Z,Y).
S2     0.998        neighborOf(X,Y) :- neighborOf(Y,X).
       0.995        locatedIn(X,Y) :- locatedIn(X,Z), locatedIn(Z,Y).
       0.705        locatedIn(X,Y) :- neighborOf(X,Z), locatedIn(Z,Y).
S3     0.891        neighborOf(X,Y) :- neighborOf(Y,X).
       0.750        locatedIn(X,Y) :- neighborOf(X,Z), neighborOf(Z,W), locatedIn(W,Y).

[Diagram: the Countries knowledge base schema (train/test countries, subregions, regions with neighborOf and locatedIn links), shown again alongside the induced rules]

Tim Rocktaschel Deep Prolog: End-to-end Differentiable Proving in Knowledge Bases 36/37


Summary

Prolog’s backward chaining can be used as a recipe for recursively constructing a neural network

Proof success differentiable w.r.t. symbol representations

Can learn vector representations of symbols and rules of predefined structure

Various optimizations: batch proving, gradient approximation

Outperforms neural link prediction models on a medium-sized knowledge base

Induces interpretable rules

Tim Rocktaschel Deep Prolog: End-to-end Differentiable Proving in Knowledge Bases 37/37


Thank you!

http://rockt.github.com

tim [dot] rocktaeschel [at] gmail [dot] com
Twitter: @_rockt


References

Guillaume Bouchard, Sameer Singh, and Theo Trouillon. 2015. On approximate reasoning capabilities of low-rank vector spaces. AAAI Spring Symposium on Knowledge Representation and Reasoning (KRR): Integrating Symbolic and Neural Approaches.

Rajarshi Das, Arvind Neelakantan, David Belanger, and Andrew McCallum. 2016. Chains of reasoning over entities, relations, and text using recurrent neural networks. arXiv preprint arXiv:1607.01426.

Xin Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Murphy, Thomas Strohmann, Shaohua Sun, and Wei Zhang. 2014. Knowledge vault: A web-scale approach to probabilistic knowledge fusion. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 601–610. ACM.

Maximilian Nickel, Lorenzo Rosasco, and Tomaso Poggio. 2015. Holographic embeddings of knowledge graphs. arXiv preprint arXiv:1510.04935.

Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel. 2012. Factorizing YAGO: scalable machine learning for linked data. In Proceedings of the International Conference on World Wide Web (WWW), pages 271–280.

Tim Rocktäschel. 2017. Combining Representation Learning with Logic for Language Processing. Ph.D. thesis, University College London, Gower Street, London WC1E 6BT, United Kingdom.

Tim Rocktäschel and Sebastian Riedel. 2016. Learning knowledge base inference with neural theorem provers. In NAACL Workshop on Automated Knowledge Base Construction (AKBC).

Theo Trouillon, Johannes Welbl, Sebastian Riedel, Eric Gaussier, and Guillaume Bouchard. 2016. Complex embeddings for simple link prediction.

Mengya Wang, Hankui Zhuo, and Huiling Zhu. 2017. Embedding knowledge graphs based on transitivity and antisymmetry of rules. arXiv preprint arXiv:1702.07543.

Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. 2014. Embedding entities and relations for learning and inference in knowledge bases. arXiv preprint arXiv:1412.6575.