Neural-Symbolic Integration
Steffen Hölldobler
International Center for Computational Logic
Technische Universität Dresden, Germany
ICCL: International Center for Computational Logic
Algebra, Logic and Formal Methods in Computer Science
Introduction & Motivation: Overview
I Introduction & Motivation
I Propositional Logic
. Existing Approaches
. Propositional Logic Programs and the Core Method
I First-Order Logic
. Existing Approaches
. First-Order Logic Programs and the Core Method
I The Neural-Symbolic Learning Cycle
I Challenge Problems
Introduction & Motivation: Connectionist Systems
I Well-suited to learn, to adapt to new environments, to degrade gracefully etc.
I Many successful applications.
I Approximate functions.
. Hardly any knowledge about the functions is needed.
. Trained using incomplete data.
I Declarative semantics is not available.
I Recursive networks are hardly understood.
I McCarthy 1988: We still observe a propositional fixation.
I Structured objects are difficult to represent.
. Smolensky 1987: Can we instantiate the power of symbolic computation within fully connectionist systems?
Introduction & Motivation: Logic Systems
I Well-suited to represent and reason about structured objects and structure-sensitive processes.
I Many successful applications.
I Direct implementation of relations and functions.
I Explicit expert knowledge is required.
I Highly recursive structures.
I Well understood declarative semantics.
I Logic systems are brittle.
I Expert knowledge may not be available.
. Can we instantiate the power of connectionist computation within a logic system?
Introduction & Motivation: Objective
I Seek the best of both paradigms!
I Understand the relation between connectionist and logic systems.
I Contribute to the open research problems of both areas.
I Well developed for the propositional case.
I Hard problem: going beyond it.
I In this lecture:
. Overview on existing approaches.
. Logic programs and recurrent networks.
. Semantic operators for logic programs can be computed by connectionist systems.
. Semantic operators can be learned.
. Logic programs can be extracted.
Neural-Symbolic Integration using the Core Method
Connectionist Networks
I A connectionist network consists of
. a set U of units and
. a set W ⊆ U × U of connections.
I Each connection is labeled by a weight w ∈ R.
I If there is a connection from unit uj to uk, then wkj is its associated weight.
I A unit is specified by
. an input vector ~i = (i1, . . . , im), ij ∈ R, 1 ≤ j ≤ m,
. an activation function Φ mapping ~i to a potential p ∈ R,
. an output function Ψ mapping p to an (output) value v ∈ R.
I If there is a connection from uj to uk, then wkj vj is the input received by uk along this connection.
I The potential and value of a unit are synchronously recomputed (or updated).
I Often a linear time t is added as parameter to input, potential and value.
I The state of a network with units u1, . . . , un at time t is (v1(t), . . . , vn(t)).
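The unit specification above can be sketched in a few lines of Python (an illustration only; the weighted-sum Φ and identity Ψ below are one concrete choice, not fixed by the definition):

```python
def make_unit(phi, psi):
    """A unit as defined above: Phi maps the input vector to a potential,
    Psi maps the potential to the output value."""
    def unit(inputs):
        return psi(phi(inputs))
    return unit

# One concrete (assumed) choice: Phi is a weighted sum, Psi is the identity.
weights = [1.0, -0.5]
linear_unit = make_unit(lambda i: sum(w * x for w, x in zip(weights, i)),
                        lambda p: p)
```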
A Simple Connectionist Network
[Figure: input units u1, u2 feed units u3, u4 with weights w31 and w42; u3 and u4 are mutually connected via weights w43 and w34, and produce the values v3 and v4.]
w34 = w43 = −0.5, w31 = w42 = 1
pi(t + 1) = pi(t) + Σ_{j=1}^{4} wij vj(t)
vi(t) = round(pi(t))
v1(t) = 6 if t = 0, and 2 otherwise
v2(t) = 5 if t = 0, and 2 otherwise
I What happens if the network is synchronously updated?
I A winner-take-all network is a synchronously updated connectionist network of n units (not counting input units) such that, after each unit receives an initial input at t = 0, eventually only the unit with the highest initial input produces a value greater than 0, whereas the value of all other units is 0.
I Exercise Construct a winner-take-all network of 3 units.
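The synchronous update of the simple network above can be traced with a short script (a sketch; the initial potentials p3(0) = p4(0) = 0 are an assumption, since the slide does not fix them):

```python
def simulate(steps):
    """Synchronously update units u3, u4 of the simple network above."""
    w = {(3, 1): 1.0, (4, 2): 1.0, (3, 4): -0.5, (4, 3): -0.5}
    p = {3: 0.0, 4: 0.0}                     # assumed initial potentials
    v = {3: 0, 4: 0}
    history = []
    for t in range(steps):
        vals = {1: 6.0 if t == 0 else 2.0,   # external inputs from the slide
                2: 5.0 if t == 0 else 2.0,
                3: v[3], 4: v[4]}
        # synchronous step: all potentials are recomputed from values at time t
        p = {i: p[i] + sum(w.get((i, j), 0.0) * vals[j] for j in vals)
             for i in (3, 4)}
        v = {i: round(p[i]) for i in (3, 4)}
        history.append((v[3], v[4]))
    return history
```

Running it suggests that the mutual −0.5 connections make the two units compete, which is exactly the behaviour a winner-take-all network needs.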
Literature
I Feldman, Ballard 1982: Connectionist Models and Their Properties. Cognitive Science 6 (3), 205-254.
I McCarthy 1988: Epistemological Challenges for Connectionism. Behavioural and Brain Sciences 11, 44.
I Smolensky 1987: On Variable Binding and the Representation of Symbolic Structures in Connectionist Systems. Report No. CU-CS-355-87, Department of Computer Science & Institute of Cognitive Science, University of Colorado, Boulder.
Propositional Logic
I Existing Approaches
. Finite Automata and McCulloch-Pitts Networks
. Weighted Automata and Semiring Artificial Neural Networks
. Propositional Reasoning and Symmetric/Stochastic Networks
. Other Approaches
I Propositional Logic Programs and the Core Method
. The Very Idea
. Logic Programs
. Propositional Core Method
. Backpropagation
. Knowledge-Based Artificial Neural Networks
. Propositional Core Method using Sigmoidal Units
. Further Extensions
McCulloch-Pitts Networks
I McCulloch, Pitts 1943: Can the activities of nervous systems be modelled by a logical calculus?
I A McCulloch-Pitts network consists of a set U of binary threshold unitsand a set W ⊆ U × U of weighted connections.
I The set UI of input units is defined as UI = {uk ∈ U | (∀uj ∈ U) wkj = 0}.
I The set UO of output units is defined as UO = {uj ∈ U | (∀uk ∈ U) wkj = 0}.
[Figure: a McCulloch-Pitts network drawn as a box, with input units UI entering on the left and output units UO leaving on the right.]
Binary Threshold Units
I uk is a binary threshold unit if
Φ(~ik) = pk = Σ_{j=1}^{m} wkj vj
Ψ(pk) = vk = 1 if pk ≥ θk, and 0 otherwise,
where θk ∈ R is a threshold.
I Three binary threshold units:
. Negation: a single input v1 with w21 = −1 and θ2 = −0.5 gives v2 = ¬v1.
. Disjunction: inputs v1, v2 with w31 = w32 = 1 and θ3 = 0.5 give v3 = v1 ∨ v2.
. Conjunction: inputs v1, v2 with w31 = w32 = 1 and θ3 = 1.5 give v3 = v1 ∧ v2.
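The three units can be checked directly (a minimal sketch of the definitions above):

```python
def threshold_unit(weights, theta):
    """Binary threshold unit: value 1 iff the weighted input sum reaches theta."""
    def unit(*values):
        potential = sum(w * v for w, v in zip(weights, values))
        return 1 if potential >= theta else 0
    return unit

# The three units from the slide
neg  = threshold_unit([-1], -0.5)    # v2 = NOT v1
disj = threshold_unit([1, 1], 0.5)   # v3 = v1 OR v2
conj = threshold_unit([1, 1], 1.5)   # v3 = v1 AND v2
```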
A Simple McCulloch-Pitts Network
I Example Consider the following network of logical threshold units:
[Figure: a network of five threshold units u1, . . . , u5, each with threshold 0.5, connected by weights 1 and −1.]
I Exercise
. Specify UI and UO.
. What is computed by the network if all units are updated synchronously?
. Specify the states of the network ignoring input and output units.
Finite Automata
I A finite automaton consists of:
. Σ, a finite set of input symbols,
. Φ, a finite set of output symbols,
. Q, a finite set of states,
. q0 ∈ Q, an initial state,
. F ⊂ Q, a set of final states,
. δ : Q × Σ → Q, a state transition function,
. ρ : Q → Φ, an output function.
I Exercise Let Σ = Φ = {1, 2}, Q = {p, q, r}, F = {r}, q0 = p,
ρ: ρ(p) = 1, ρ(q) = 1, ρ(r) = 2,
δ: δ(p, 1) = q, δ(p, 2) = p, δ(q, 1) = r, δ(q, 2) = q, δ(r, 1) = r, δ(r, 2) = r.
What is computed by this automaton?
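The sample automaton can be simulated directly (a sketch; it is assumed here that the automaton emits ρ of the state reached after each input symbol):

```python
def run_automaton(delta, rho, q0, inputs):
    """Run a finite automaton, emitting rho of each state reached."""
    state, outputs = q0, []
    for symbol in inputs:
        state = delta[(state, symbol)]
        outputs.append(rho[state])
    return outputs

# The transition and output tables from the exercise
delta = {('p', 1): 'q', ('p', 2): 'p',
         ('q', 1): 'r', ('q', 2): 'q',
         ('r', 1): 'r', ('r', 2): 'r'}
rho = {'p': 1, 'q': 1, 'r': 2}
```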
Finite Automata and McCulloch-Pitts Networks
I Theorem McCulloch-Pitts networks are finite automata and vice versa.
I Proof
⇒ Exercise.
⇐ Let T = (Σ, Φ, Q, q0, F, δ, ρ) be an automaton with
• Σ = {b1, . . . , bm},
• Φ = {c1, . . . , cr},
• Q = {q0, . . . , qk−1}.
To show: there exists a network N with
• inputs {b′1, . . . , b′m},
• outputs {c′1, . . . , c′r},
• states {q′0, . . . , q′k−1}
such that if T generates cj1, . . . , cjn given bj1, . . . , bjn, then N generates c′j1, . . . , c′jn given b′j1, . . . , b′jn.
Construction of the Network: Inputs and Outputs
I Remember |Σ| = m, |Φ| = r.
I Inputs x1, . . . , xm with b′j = ~x where
xi = 1 if i = j, and 0 otherwise.
I Outputs y1, . . . , yr with c′j = ~y where
yi = 1 if i = j, and 0 otherwise.
Construction of the Network: Units and Connections
I Remember |Σ| = m, |Φ| = r, |Q| = k.
I qb-units represent that T in state q receives input b (k×m units).
I c-units represent output c (r units).
I Connections
. Let {k1, . . . , kn(k)} = {(q, b) | δ(q, b) = q∗}; then
vuq∗b∗(t + 1) = 1 if xb∗(t) ∧ [k1(t) ∨ . . . ∨ kn(k)(t)], and 0 otherwise.
. Let {l1, . . . , ln(l)} = {(q, b) | ρ(q) = c}; then
vuc(t + 1) = 1 if l1(t) ∨ . . . ∨ ln(l)(t), and 0 otherwise.
I The theorem follows by induction on the length of the input sequence.
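One synchronous step of the constructed network can be sketched as follows (an illustration only; the qb- and c-unit names follow the construction, while the dictionary encoding and the reuse of the earlier sample tables are assumptions):

```python
def step(delta, rho, qb_values, x):
    """One synchronous update of the qb-units and c-units built from (delta, rho).
    qb_values maps (q, b) to 0/1; x maps input symbols to 0/1 (one-hot input)."""
    new_qb = {}
    for (q_star, b_star) in qb_values:
        # predecessor units: all (q, b) with delta(q, b) = q_star
        preds = [qb_values[(q, b)] for (q, b) in qb_values
                 if delta[(q, b)] == q_star]
        new_qb[(q_star, b_star)] = 1 if x.get(b_star, 0) and any(preds) else 0
    # c-units: active iff some active qb-unit has rho(q) = c
    outputs = {c: int(any(v for (q, b), v in qb_values.items() if rho[q] == c))
               for c in set(rho.values())}
    return new_qb, outputs

# Sample automaton from the earlier exercise (assumed encoding)
delta = {('p', 1): 'q', ('p', 2): 'p',
         ('q', 1): 'r', ('q', 2): 'q',
         ('r', 1): 'r', ('r', 2): 'r'}
rho = {'p': 1, 'q': 1, 'r': 2}
```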
Exercises
I Specify the automaton corresponding to the sample network.
I Specify the network corresponding to the sample finite automaton.
I Complete the proof of the theorem.
Some Remarks on McCulloch-Pitts Networks
I McCulloch-Pitts networks are not just simple reactive systems: their behavior depends on previous inputs as well as the activity within the network.
. Example [Figure: an input x feeding a chain of two threshold units with threshold 0.5 and connection weights 1, producing an output y.]
I Literature
. Arbib: Brains, Machines and Mathematics. Springer, 2nd edition (1987).
. McCulloch & Pitts: A Logical Calculus of the Ideas Immanent in Nervous Activity. Bulletin of Mathematical Biophysics 5, 115-133 (1943).
Weighted Automata and Semiring Artificial Neural Networks
I Bader, Hölldobler, Scalzitti 2004: Can the result by McCulloch and Pitts be extended to weighted automata?
I Let (K, ⊕, ⊙, 0K, 1K) be a semiring.
I uk is a ⊕-unit if Φ(~ik) = pk = ⊕_{j=1}^{m} (wkj ⊙ vj) and Ψ(pk) = vk = pk.
I uk is a ⊙-unit if Φ(~ik) = pk = ⊙_{j=1}^{m} (wkj ⊙ vj) and Ψ(pk) = vk = pk.
I A semiring artificial neural network consists of a set U of ⊕- and ⊙-units and a set W ⊆ U × U of K-weighted connections.
I Theorem Weighted automata are semiring artificial neural networks.
I Literature Bader, Hölldobler, Scalzitti 2004: Semiring Artificial Neural Networks and Weighted Automata – and an Application to Digital Image Encoding. In: KI 2004: Advances in Artificial Intelligence, Lecture Notes in Artificial Intelligence 3238, 281-294.
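The two unit types can be sketched generically over any semiring (a sketch; the function encoding is an assumption, and the min-plus example below is one standard semiring, not taken from the paper):

```python
from functools import reduce

def make_units(add, mul, zero):
    """Build an (add)-unit and a (mul)-unit over the semiring (K, add, mul)."""
    def add_unit(weights, values):
        # p_k = big-oplus over j of (w_kj otimes v_j)
        return reduce(add, (mul(w, v) for w, v in zip(weights, values)), zero)
    def mul_unit(weights, values):
        # p_k = big-otimes over j of (w_kj otimes v_j)
        return reduce(mul, (mul(w, v) for w, v in zip(weights, values)))
    return add_unit, mul_unit

# Ordinary semiring (R, +, *, 0, 1): the add-unit is a standard linear unit
lin_add, lin_mul = make_units(lambda a, b: a + b, lambda a, b: a * b, 0)
# Min-plus (tropical) semiring (R with infinity, min, +, inf, 0)
trop_add, trop_mul = make_units(min, lambda a, b: a + b, float('inf'))
```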
Symmetric Networks
I Hopfield 1982: Can statistical models for magnetic materials explain the behavior of certain classes of networks?
I Original application: associative memory.
I A symmetric network consists of a set U of binary threshold units and a set W ⊆ U × U of weighted connections such that wkj = wjk for all k, j with k ≠ j.
I Asynchronous update procedure: while state ~v is unstable, update an arbitrary unit.
[Figure: a small symmetric network whose units have thresholds 0, 0 and 5 and whose connections carry weights 2, 2 and −1; the original slides animate an asynchronous update run of this network step by step.]
Energy Minimization
I What happens precisely when a symmetric network is updated?
I Consider the energy function
E(t) = −(1/2) Σ_{k,j} wkj vj(t) vk(t) + Σ_k θk vk(t)
= −Σ_{k<j} wkj vj(t) vk(t) + Σ_k θk vk(t)
describing the state of the network at time t.
I We assume wii = 0 for all units i in the network.
I Exercise
. Specify E(t) for the symmetric network on the previous page.
. How does an update change the energy of a symmetric network (you may assume that θk = 0 for all k)?
I Theorem E is monotone decreasing, i.e., E(t + 1) ≤ E(t).
I Exercise Does this theorem still hold if we drop the assumption that wij = wji?
I Exercise How plausible is the assumption that wij = wji?
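The energy function can be computed directly (a sketch; the weights dictionary holds each symmetric pair once, and the three-unit network from the earlier figure is an assumed example):

```python
def energy(weights, thetas, v):
    """E = -sum_{k<j} w_kj v_j v_k + sum_k theta_k v_k for a symmetric network."""
    pair_term = sum(w * v[i] * v[j] for (i, j), w in weights.items())
    threshold_term = sum(t * x for t, x in zip(thetas, v))
    return -pair_term + threshold_term

# Assumed example network: thresholds 0, 0, 5; weights w01 = w02 = 2, w12 = -1
weights = {(0, 1): 2, (0, 2): 2, (1, 2): -1}
thetas = [0, 0, 5]
```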
Stochastic Networks or Boltzmann Machines
I Hinton, Sejnowski 1983: Can we escape local minima?
I A stochastic network is a symmetric network, but the values are computed probabilistically:
P(vk = 1) = 1 / (1 + e^((θk − pk)/T))
where T is called the pseudo temperature.
I In equilibrium, stochastic networks are more likely to be in a state with low energy.
I Kirkpatrick et al. 1983: Can we compute a global minimum?
I Simulated annealing: decrease T gradually.
I Theorem (Geman, Geman 1984) A global minimum is reached if T is decreased in infinitesimally small steps.
I Applications Combinatorial optimization problems like the travelling salesman problem or the graph bipartitioning problem.
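The probabilistic update rule can be sketched as follows (a sketch; the sampling interface is an assumption):

```python
import math
import random

def stochastic_value(p_k, theta_k, T, rng=random):
    """Sample v_k with P(v_k = 1) = 1 / (1 + exp((theta_k - p_k) / T))."""
    prob_one = 1.0 / (1.0 + math.exp((theta_k - p_k) / T))
    return 1 if rng.random() < prob_one else 0
```

For p_k well above theta_k at low T the unit almost surely fires; raising T flattens the distribution towards a fair coin, which is what lets simulated annealing escape local minima.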
Literature
I Geman, Geman 1984: Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images. IEEE Transactions on Pattern Analysis and Machine Intelligence 6, 721-741.
I Hinton, Sejnowski 1983: Optimal Perceptual Inference. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 448-453.
I Hopfield 1982: Neural Networks and Physical Systems with Emergent Collective Computational Abilities. In: Proceedings of the National Academy of Sciences USA, 2554-2558.
I Kirkpatrick et al. 1983: Optimization by Simulated Annealing. Science 220, 671-680.
Propositional Logic
I Variables are p1, . . . , pn.
I Connectives are ¬,∨,∧.
I Atoms are variables.
I Literals are atoms and negated atoms.
I Clauses are (generalized) disjunctions of literals.
I Formulas in clause form are (generalized) conjunctions of clauses.
I Notation Sometimes variables are denoted by different letters if there is a bijection between these letters and p1, . . . , pn.
I Example
(¬o ∨m) ∧ (¬s ∨ ¬m) ∧ (¬c ∨m) ∧ (¬c ∨ s) ∧ (¬v ∨ ¬m)
which is abbreviated by
〈[¬o, m], [¬s,¬m], [¬c, m], [¬c, s], [¬v,¬m]〉.
Interpretations and Models
I Notation (all symbols may be indexed)
. A denotes an atom.
. L denotes a literal.
. F, G denote formulas.
. C denotes a clause.
I Interpretations are mappings from {p1, . . . , pn} to {0, 1}.
. They can be encoded as ~v.
. They are extended to formulas as follows:
pi(~v) = vi
(¬F)(~v) = 1 − F(~v)
(F ∧ G)(~v) = F(~v) × G(~v)
(F ∨ G)(~v) = F(~v) + G(~v) − F(~v) × G(~v)
I ~v is a model for F iff F (~v) = 1.
I F is satisfiable if it has a model.
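The arithmetic semantics above translates directly into code (a sketch; the tuple encoding of formulas is an assumption):

```python
def evaluate(formula, v):
    """Evaluate a formula under interpretation v, where v[i-1] is the value of p_i.
    Formulas are tuples: ('p', i), ('not', F), ('and', F, G), ('or', F, G)."""
    op = formula[0]
    if op == 'p':
        return v[formula[1] - 1]
    if op == 'not':
        return 1 - evaluate(formula[1], v)
    if op == 'and':
        return evaluate(formula[1], v) * evaluate(formula[2], v)
    f, g = evaluate(formula[1], v), evaluate(formula[2], v)   # 'or'
    return f + g - f * g
```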
Interpretations and Models – Example
I Let F = 〈[¬p1, p2], [p3,¬p2]〉 and ~v = (1, 0, 1), then:
F(~v) = [¬p1, p2](~v) × [p3,¬p2](~v)
= ((¬p1)(~v) + p2(~v) − (¬p1)(~v) × p2(~v)) × (p3(~v) + (¬p2)(~v) − p3(~v) × (¬p2)(~v))
= ((1 − p1(~v)) + p2(~v) − (1 − p1(~v)) × p2(~v)) × (p3(~v) + (1 − p2(~v)) − p3(~v) × (1 − p2(~v)))
= ((1 − 1) + 0 − (1 − 1) × 0) × (1 + (1 − 0) − 1 × (1 − 0))
= 0 × 1
= 0
I Hence, ~v is not a model for F , but is a model for [p3,¬p2].
I Exercise
. Is F satisfiable? Prove your claim.
. Is 〈[¬p], [p,¬q], [q]〉 satisfiable? Prove your claim.
. Find all models of 〈[¬o, m], [¬s,¬m], [¬c, m], [¬c, s], [¬v,¬m]〉.
Propositional Reasoning and Energy Minimization
I Pinkas 1991: Is there a link between propositional logic and symmetric networks?
I Let F = 〈C1, . . . , Cm〉 be a propositional formula in clause form.
I We define
τ(C) = 0 if C = [ ],
τ(C) = A if C = [A],
τ(C) = 1 − A if C = [¬A],
τ(C) = τ(C1) + τ(C2) − τ(C1)τ(C2) if C = (C1 ∨ C2),
and
τ(F) = Σ_{i=1}^{m} (1 − τ(Ci)).
I Example τ(〈[¬o, m], [¬s,¬m], [¬c, m], [¬c, s], [¬v,¬m]〉) = vm − cm − cs + sm − om + 2c + o.
I Exercise Compute τ (〈[¬p], [p,¬q], [q]〉).
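τ can be evaluated numerically as a penalty function (a sketch; the list encoding of clauses is an assumption):

```python
def clause_tau(clause, val):
    """tau(C) for a clause given as a list of ('pos'|'neg', variable) literals."""
    t = 0.0
    for sign, name in clause:
        lit = val[name] if sign == 'pos' else 1 - val[name]
        t = t + lit - t * lit   # tau(C1 or C2) = tau(C1) + tau(C2) - tau(C1)tau(C2)
    return t

def tau(formula, val):
    """tau(F) = sum_i (1 - tau(C_i)): the number of clauses falsified by val."""
    return sum(1 - clause_tau(c, val) for c in formula)
```

On 0/1 assignments tau(F)(~v) counts violated clauses, so ~v is a model of F exactly when tau(F)(~v) = 0.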
Propositional Reasoning and Symmetric Networks
I Theorem F(~v) = 1 iff τ(F) has a global minimum at ~v and τ(F)(~v) = 0.
I Compare τ(F) = vm − cm − cs + sm − om + 2c + o
with E(~v) = −Σ_{k<j} wkj vj vk + Σ_k θk vk.
[Figure: the corresponding symmetric network with units u1 = o, u2 = m, u3 = s, u4 = c, u5 = v; reading the weights off τ(F) gives wvm = −1, wsm = −1, wcm = 1, wcs = 1, wom = 1 and thresholds θc = 2, θo = 1, which the original slides add to the figure one connection at a time.]
2
1
0
0
0
Propositional Reasoning and Symmetric Networks
I Theorem F (~v) = 1 iff τ (F ) has a global minima at ~v and τ (F )(~v) = 0.
I Compare τ (F ) = vm− cm− cs + sm− om + 2c + o
with E(~v) = −P
k<j wkjvjvk +P
k θkvk.
mu1 = o
mu2 = m
mu3 = s
mu5 = v
mu4 = c
������������
−1
AA
AA
AA
AA
AA
AA
1
��
��
��
1
HHHHHH
HHHHHH
−1
������
������
1
2
1
0
0
0
}0
}0
Propositional Non-Monotonic Reasoning
I Pinkas 1991a: Can the above-mentioned approach be extended to non-monotonic reasoning?
I Consider F = 〈(C1, k1), . . . , (Cm, km)〉, where Ci are clauses and ki ∈ R+.
I The penalty of ~v for (C, k) is k if C(~v) = 0 and 0 otherwise.
I The penalty of ~v for F is the sum of the penalties for (Ci, ki).
I ~v is preferred over ~w wrt F
if the penalty of ~v for F is smaller than the penalty of ~w for F .
I Modify τ to become τ(F) = ∑_{i=1}^{m} ki(1 − τ(Ci)), e.g.,

τ(〈([¬o, m], 1), ([¬s, ¬m], 2), ([¬c, m], 4), ([¬c, s], 4), ([¬v, ¬m], 4)〉) = 4vm − 4cm − 4cs + 2sm − om + 8c + o.
I The corresponding stochastic network computes most preferred interpretations.
Exercises and Literature
I Exercise Consider
F = 〈([¬o, m], 1), ([¬s,¬m], 2), ([¬c, m], 4), ([¬c, s], 4), ([¬v,¬m], 4)〉.
. Compute the most preferred interpretations of F .
. What happens if we add (o, 100) to F ?
. What happens if we add (o, 100) and (s, 100) to F ?
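The penalty semantics can be checked by brute force (a sketch I added; Pinkas computes preferred interpretations with a stochastic symmetric network, whereas this enumeration only works for small variable sets):

```python
from itertools import product

# Weighted clauses (clause, k): the penalty of an interpretation is the sum
# of the weights of the violated clauses; most preferred interpretations
# minimize the penalty.

def clause_true(clause, I):
    return any(I[var] if pos else 1 - I[var] for var, pos in clause)

def penalty(weighted_clauses, I):
    return sum(k for clause, k in weighted_clauses if not clause_true(clause, I))

def most_preferred(weighted_clauses, variables):
    best, best_pen = [], None
    for bits in product([0, 1], repeat=len(variables)):
        I = dict(zip(variables, bits))
        p = penalty(weighted_clauses, I)
        if best_pen is None or p < best_pen:
            best, best_pen = [I], p
        elif p == best_pen:
            best.append(I)
    return best_pen, best

F = [([("o", False), ("m", True)], 1), ([("s", False), ("m", False)], 2),
     ([("c", False), ("m", True)], 4), ([("c", False), ("s", True)], 4),
     ([("v", False), ("m", False)], 4)]
pen, preferred = most_preferred(F, ["o", "m", "s", "c", "v"])
print(pen)  # 0: F itself is satisfiable
```

Re-running `most_preferred` after appending the weighted facts from the exercise shows how the preferred interpretations shift.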
I Literature
. Pinkas 1991: Symmetric Neural Networks and Logic Satisfiability. Neural Computation 3, 282-291.

. Pinkas 1991a: Propositional Non-Monotonic Reasoning and Inconsistency in Symmetrical Neural Networks. In: Proceedings of the International Joint Conference on Artificial Intelligence, 525-530.
Propositional Logic Programs and the Core Method
I The Very Idea
I Logic Programs
I Propositional Core Method
I Backpropagation
I Knowledge-Based Artificial Neural Networks
I Propositional Core Method using Sigmoidal Units
I Further Extensions
The Very Idea

I Various semantics for logic programs coincide with fixed points of associated immediate consequence operators (e.g., Apt, van Emden 1982).

I Banach Contraction Mapping Theorem A contraction mapping f defined on a complete metric space (X, d) has a unique fixed point. The sequence y, f(y), f(f(y)), . . . converges to this fixed point for any y ∈ X.

. Fitting 1994: Consider logic programs whose immediate consequence operator is a contraction.

I Funahashi 1989: Every continuous function on the reals can be uniformly approximated by feed-forward connectionist networks.

. Holldobler, Kalinke, Storr 1999: Consider logic programs whose immediate consequence operator is continuous on the reals.
Metrics
I A metric on a space M is a mapping d : M ×M → R such that
. d(x, y) = 0 iff x = y,
. d(x, y) = d(y, x), and
. d(x, y) ≤ d(x, z) + d(z, y).
I Let (M, d) be a metric space and S = (si | si ∈ M) a sequence.

. S converges if (∃s ∈ M)(∀ε > 0)(∃N)(∀n ≥ N) d(sn, s) ≤ ε.

. S is Cauchy if (∀ε > 0)(∃N)(∀n, m ≥ N) d(sn, sm) ≤ ε.
. (M, d) is complete if every Cauchy sequence converges.
I A mapping f : M →M is a contraction on (M, d)if (∃0 < k < 1)(∀x, y ∈M) d(f(x), f(y)) ≤ k · d(x, y).
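The contraction property can be illustrated numerically (a minimal sketch with my own example: f(x) = x/2 + 1 is a contraction on (R, |x − y|) with factor k = 1/2 and unique fixed point 2):

```python
# Iterating a contraction from any starting point converges to its unique
# fixed point, as the Banach contraction mapping theorem guarantees.

def iterate_to_fixpoint(f, x, tol=1e-12, max_steps=1000):
    for _ in range(max_steps):
        fx = f(x)
        if abs(fx - x) <= tol:
            return fx
        x = fx
    return x

f = lambda x: x / 2 + 1  # contraction with factor 1/2; fixed point x = 2
print(iterate_to_fixpoint(f, 100.0))  # approximately 2.0, from any start
```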
Propositional Logic Programs
I A propositional logic program P over a propositional language L is a finite set of clauses

A ← L1 ∧ . . . ∧ Ln,

where A is an atom, the Li are literals, and n ≥ 0. P is definite if all Li, 1 ≤ i ≤ n, are atoms.

I Let V be the set of all propositional variables occurring in L.

I An interpretation I is a mapping V → {⊤, ⊥}.

I I can be represented by the set of atoms which are mapped to ⊤ under I.

I 2^V is the set of all interpretations.

I Immediate consequence operator TP : 2^V → 2^V:

TP(I) = {A | there is a clause A ← L1 ∧ . . . ∧ Ln ∈ P such that I |= L1 ∧ . . . ∧ Ln}.

I I is a supported model iff TP(I) = I.
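TP can be written down directly (a sketch; the (head, body) clause representation and the helper names are my own):

```python
# A program is a list of clauses (head, body); each body literal is a pair
# (atom, positive). An interpretation is the set of atoms mapped to true.

def tp(program, I):
    """T_P(I): heads of all clauses whose body is true under I."""
    return {head for head, body in program
            if all((atom in I) == positive for atom, positive in body)}

def iterate_tp(program, I=frozenset(), max_steps=100):
    """Iterate T_P from I until a fixed point (a supported model) is hit.
    Termination is not guaranteed in general, e.g. for {p <- not p}."""
    for _ in range(max_steps):
        J = frozenset(tp(program, I))
        if J == I:
            return I
        I = J
    raise RuntimeError("no fixed point reached")

# P = {p, q <- p, r <- q}:
P = [("p", []), ("q", [("p", True)]), ("r", [("q", True)])]
print(sorted(iterate_tp(P)))  # ['p', 'q', 'r'], a supported model
```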
Exercises
I Consider P = {p, q ← p, r ← q}.

. Draw the lattice of all interpretations of P wrt the ⊆ ordering.

. Mark the models of P.

. Compute TP(∅), TP(TP(∅)), . . .

. Mark the supported models of P.

I Let P be a definite program.

. Show that if M1 and M2 are models of P then so is M1 ∩ M2.

. Let M be the least model of P. Show that M is a supported model.
The Core Method

I Let L be a logic language.

I Given a logic program P together with its immediate consequence operator TP.

I Let I be the set of interpretations for P.

I Find a mapping R : I → R^n.

I Construct a feed-forward network computing fP : R^n → R^n, called the core, such that the following holds:

. If TP(I) = J then fP(R(I)) = R(J), where I, J ∈ I.

. If fP(~s) = ~t then TP(R⁻¹(~s)) = R⁻¹(~t), where ~s, ~t ∈ R^n.

I Connect the units in the output layer recursively to the units in the input layer.

I Show that the following holds:

. I = lfp(TP) iff the recurrent network converges to or approximates R(I).

Connectionist model generation using recurrent networks with a feed-forward core.
3-Layer Recurrent Networks

[Figure: a 3-layer network whose input, hidden, and output layers form the feed-forward core; each output unit is connected back to the corresponding input unit, and ". . ." indicates arbitrarily many units per layer.]
I At each point in time all units do:
. apply activation function to obtain potential,
. apply output function to obtain output.
Propositional Core Method using Binary Threshold Units

I Let L be the language of propositional logic over a set V of variables.

I Let P be a propositional logic program, e.g.,

P = {p, r ← p ∧ ¬q, r ← ¬p ∧ q}.

I I = 2^V is the set of interpretations for P.

I TP(I) = {A | A ← L1 ∧ . . . ∧ Lm ∈ P such that I |= L1 ∧ . . . ∧ Lm}.

TP(∅) = {p}
TP({p}) = {p, r}
TP({p, r}) = {p, r} = lfp(TP)
Representing Interpretations
I I = 2^V.

I Let n = |V| and identify V with {1, . . . , n}.

I Define R : I → R^n such that for all 1 ≤ j ≤ n we find:

R(I)[j] = 1 if j ∈ I, 0 if j ∉ I.

E.g., if V = {p, q, r} = {1, 2, 3} and I = {p, r} then R(I) = (1, 0, 1).

I Other encodings are possible, e.g.,

R(I)[j] = 1 if j ∈ I, −1 if j ∉ I.
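The embedding R can be sketched as (a small sketch; the function name and arguments are my own):

```python
# R maps an interpretation (a set of atoms) to a vector in R^n; the second
# encoding mentioned above uses -1 instead of 0 for false atoms.

def embed(I, variables, false_value=0):
    """R(I)[j] = 1 if the j-th variable is in I, else false_value."""
    return [1 if v in I else false_value for v in variables]

print(embed({"p", "r"}, ["p", "q", "r"]))      # [1, 0, 1]
print(embed({"p", "r"}, ["p", "q", "r"], -1))  # [1, -1, 1]
```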
Computing the Core

I Consider again P = {p, r ← p ∧ ¬q, r ← ¬p ∧ q}.

I A translation algorithm translates P into a core of binary threshold units:

[Figure: input layer units p, q, r with thresholds 1/2; one hidden layer unit per clause, connected with weight ω to each positive and −ω to each negative body literal (threshold −ω/2 for the fact p, ω/2 for each of the two rules for r); output layer units p, q, r with thresholds ω/2, connected with weight ω to the hidden units of their clauses.]
I Exercise Specify the core for {p1 ← p2, p1 ← p3 ∧ p4, p1 ← p5 ∧ p6}.
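The translation algorithm can be sketched as follows (my reconstruction in the spirit of the slide: weight ω for positive and −ω for negative body literals; the threshold (ℓ − 1/2)ω for a body with ℓ positive literals is an assumption consistent with the figure):

```python
# Clauses as (head, body) with signed body literals, interpretations as sets
# of true atoms.

OMEGA = 1.0

def build_core(program):
    """One binary threshold unit per clause: weight OMEGA for positive and
    -OMEGA for negative body literals; a body with l positive literals gets
    threshold (l - 1/2) * OMEGA (so facts, l = 0, always fire)."""
    hidden = []
    for head, body in program:
        weights = {atom: (OMEGA if pos else -OMEGA) for atom, pos in body}
        n_pos = sum(1 for _, pos in body if pos)
        hidden.append((head, weights, (n_pos - 0.5) * OMEGA))
    return hidden

def core_step(hidden, I):
    """One pass through the core: computes T_P(I)."""
    potential = {}
    for head, weights, theta in hidden:
        p = sum(w * (1 if atom in I else 0) for atom, w in weights.items())
        if p >= theta:  # the hidden threshold unit fires
            potential[head] = potential.get(head, 0.0) + OMEGA
    # an output unit fires iff its potential reaches OMEGA / 2
    return {a for a, p in potential.items() if p >= OMEGA / 2}

P = [("p", []), ("r", [("p", True), ("q", False)]),
     ("r", [("p", False), ("q", True)])]
core = build_core(P)
print(sorted(core_step(core, set())))       # ['p']
print(sorted(core_step(core, {"p"})))       # ['p', 'r']
print(sorted(core_step(core, {"p", "r"})))  # ['p', 'r'], a fixed point
```

Feeding the output back as the next input mimics the recurrent connections added on the following slide.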
Some Results

I Proposition 2-layer networks cannot compute TP for definite P.

I Theorem For each program P, there exists a core computing TP.

I Recall P = {p, r ← p ∧ ¬q, r ← ¬p ∧ q}.

I Adding recurrent connections:

[Figure: the core for P from the previous slide, extended by connections of weight 1 from each output unit back to the corresponding input unit.]
Strongly Determined Programs
I A logic program P is said to be strongly determined if there exists a metric d on the set of all Herbrand interpretations for P such that TP is a contraction wrt d.
I Exercise Are the following programs strongly determined?
. {p, q ← p, r ← q},
. {p1 ← p2, p1 ← p3 ∧ p4, p1 ← p5 ∧ p6},
. {p← ¬p}.
I Corollary Let P be a strongly determined program. Then there exists a core with recurrent connections such that the computation with an arbitrary initial input converges and yields the unique fixed point of TP.
Time and Space Complexity
I Let n be the number of clauses and m be the number of propositional variables occurring in P.

. 2m + n units, 2mn connections in the core.
. TP(I) is computed in 2 steps.
. The parallel computational model to compute TP(I) is optimal.
. The recurrent network settles down in 3n steps in the worst case.
I Exercise Give an example of a program with worst case time behavior.
Rule Extraction (1)
I Proposition For each core C there exists a program P such that C computes TP.
[Figure: a core with input units u1, u2, hidden units u3, u4, u5, and output units u6, u7; its weights and thresholds produce the potentials and outputs tabulated below.]

u1 u2 | p3 v3 | p4 v4 | p5 v5 | p6 v6 | p7 v7
0  0  | 0   0 | 0   1 | 0   0 | 0   1 | −1  0
0  1  | 1.5 1 | .3  1 | .8  1 | 1.8 1 | .7  1
1  0  | 1   1 | −1  0 | −.5 0 | 2   1 | .7  1
1  1  | 2.5 1 | −.7 0 | .3  0 | 2   1 | .7  1
Rule Extraction (2)
I Extracted program:
P = { q1 ← ¬q1 ∧ ¬q2,
q1 ← ¬q1 ∧ q2, q2 ← ¬q1 ∧ q2,
q1 ← q1 ∧ ¬q2, q2 ← q1 ∧ ¬q2,
q1 ← q1 ∧ q2, q2 ← q1 ∧ q2 }.
I Simplified form: P = {q1, q2 ← q1, q2 ← ¬q1 ∧ q2}.
I You can do much better compared to this simple approach(see Mayer-Eichberger 2006).
Literature
I Apt, van Emden 1982: Contributions to the Theory of Logic Programming. Journal of the ACM 29, 841-862.

I Fitting 1994: Metric Methods – Three Examples and a Theorem. Journal of Logic Programming 21, 113-127.

I Funahashi 1989: On the Approximate Realization of Continuous Mappings by Neural Networks. Neural Networks 2, 183-192.

I Hitzler, Holldobler, Seda 2004: Logic Programs and Connectionist Networks. Journal of Applied Logic 2, 245-272.

I Holldobler, Kalinke 1994: Towards a Massively Parallel Computational Model for Logic Programming. In: Proceedings of the ECAI94 Workshop on Combining Symbolic and Connectionist Processing, 68-77.

I Holldobler, Kalinke, Storr 1999: Approximating the Semantics of Logic Programs by Recurrent Neural Networks. Applied Intelligence 11, 45-59.

I Mayer-Eichberger 2006: Extracting Propositional Logic Programs from Neural Networks: A Decompositional Approach. Bachelor Thesis, TU Dresden.
3-Layer Feed-Forward Networks Revisited
I Theorem (Funahashi 1989) Suppose that Ψ : R → R is non-constant, bounded, monotone increasing and continuous. Let K ⊆ R^n be compact, let f : K → R be continuous, and let ε > 0. Then there exists a 3-layer feed-forward network with output function Ψ for the hidden layer and linear output function for the input and output layer whose input-output mapping g : K → R satisfies

max_{x∈K} |f(x) − g(x)| < ε.

. Every continuous function f : K → R can be uniformly approximated by input-output functions of 3-layer feed-forward networks.

I uk is a sigmoidal unit if

Φ(~ik) = pk = ∑_{j=1}^{m} wkj vj,

Ψ(pk) = vk = 1 / (1 + e^{β(θk − pk)}),

where θk ∈ R is a threshold (or bias) and β > 0 a steepness parameter.
Backpropagation
I Bryson, Ho 1969, Werbos 1974, Parker 1985, Rumelhart et al. 1986: Can 3-layer feed-forward networks learn a particular function?

I Training set of input-output pairs {(~i^l, ~o^l) | 1 ≤ l ≤ n}.

I Minimize E = ∑_l E_l where E_l = (1/2) ∑_k (o^l_k − v^l_k)².

I Gradient descent algorithm to learn appropriate weights.

I Backpropagation

. Initialize weights arbitrarily.

. Do until all input-output patterns are correctly classified:

1 Present input pattern ~i^l at time t.
2 Compute output pattern ~v^l at time t + 2.
3 Change weights according to Δw^l_ij = η δ^l_i v^l_j, where

δ^l_i = Ψ'_i(p^l_i) × (o^l_i − v^l_i) if i is an output unit,
δ^l_i = Ψ'_i(p^l_i) × ∑_k δ^l_k w_ki if i is a hidden unit,

and η > 0 is called the learning rate.
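The update rules above can be sketched as a from-scratch implementation (my own code; the network size, learning rate, and the AND training set are arbitrary illustrative choices):

```python
import math
import random

# 3-layer network of sigmoidal units (beta = 1, biases folded in), trained
# by gradient descent on E = sum_l 1/2 sum_k (o_k - v_k)^2.

random.seed(0)

def sigmoid(p):
    return 1.0 / (1.0 + math.exp(-p))

class Net:
    def __init__(self, n_in, n_hid, n_out):
        r = lambda: random.uniform(-0.5, 0.5)
        self.w1 = [[r() for _ in range(n_in)] for _ in range(n_hid)]
        self.b1 = [r() for _ in range(n_hid)]
        self.w2 = [[r() for _ in range(n_hid)] for _ in range(n_out)]
        self.b2 = [r() for _ in range(n_out)]

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
             for ws, b in zip(self.w1, self.b1)]
        o = [sigmoid(sum(w * hi for w, hi in zip(ws, h)) + b)
             for ws, b in zip(self.w2, self.b2)]
        return h, o

    def train_step(self, x, target, eta=0.5):
        h, o = self.forward(x)
        # delta = v (1 - v)(o - v) for output units (sigmoid derivative)
        d_out = [v * (1 - v) * (t - v) for v, t in zip(o, target)]
        # delta = v (1 - v) sum_k delta_k w_ki for hidden units
        d_hid = [h[i] * (1 - h[i]) * sum(d_out[k] * self.w2[k][i]
                                         for k in range(len(d_out)))
                 for i in range(len(h))]
        for k in range(len(d_out)):          # delta w = eta * delta * v
            for i in range(len(h)):
                self.w2[k][i] += eta * d_out[k] * h[i]
            self.b2[k] += eta * d_out[k]
        for i in range(len(h)):
            for j in range(len(x)):
                self.w1[i][j] += eta * d_hid[i] * x[j]
            self.b1[i] += eta * d_hid[i]

def error(net, data):
    return sum(0.5 * sum((t - v) ** 2 for t, v in zip(tgt, net.forward(x)[1]))
               for x, tgt in data)

data = [([0, 0], [0]), ([0, 1], [0]), ([1, 0], [0]), ([1, 1], [1])]  # AND
net = Net(2, 2, 1)
before = error(net, data)
for _ in range(2000):
    for x, tgt in data:
        net.train_step(x, tgt)
after = error(net, data)
print(after < before)  # True: gradient descent reduced the error
```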
Output Functions Revisited
I Remember the sigmoidal function (with β = 1):

v_i = 1 / (1 + e^{−(∑_j w_ij v_j + θ_i)})

I We find

dv_i / d(∑_j w_ij v_j + θ_i) = v_i(1 − v_i).

I Hence

δ^l_i = v^l_i(1 − v^l_i)(o^l_i − v^l_i) if u_i is an output unit,
δ^l_i = v^l_i(1 − v^l_i) ∑_k δ^l_k w_ki if u_i is a hidden unit.

I Units are active if v_i ≥ 0.9 and passive if v_i ≤ 0.1.
Properties
I Learning rate η:
. If η is large, then system learns rapidly but may oscillate.
. If η is small, then system learns slowly but will not oscillate.
. In the ideal case η should be adapted during learning:
∆wij(t + 1) = ηδi(t)vj(t) + α∆wij(t)
where α is a constant and α∆wij(t) is called momentum term.
I Almost all functions can be learned.
I Learning is NP–hard.
I Literature Rumelhart et al. 1986: Parallel Distributed Processing. MIT Press.
Level Mappings and Hierarchical Logic Programs
I Let V be a set of propositional variables and P be a propositional logic program wrt V.

I A level mapping for P is a function l : V → N.

. We define l(¬A) = l(A).

I P is hierarchical if for all clauses A ← L1 ∧ . . . ∧ Ln ∈ P we find l(A) > l(Li) for all 1 ≤ i ≤ n.
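Checking the hierarchy condition is straightforward (a sketch; the clause representation and the example level mapping are my own):

```python
# A program is hierarchical w.r.t. a level mapping l iff every clause head
# has a strictly higher level than each of its body literals, where
# l(not A) = l(A), so the sign of the literal is ignored.

def is_hierarchical(program, level):
    return all(level[head] > level[atom]
               for head, body in program for atom, _ in body)

P = [("A", [("B", True), ("C", True), ("D", False)]),
     ("A", [("D", True), ("E", False)]),
     ("H", [("F", True), ("G", True)]),
     ("K", [("A", True), ("H", False)])]
level = {"B": 0, "C": 0, "D": 0, "E": 0, "F": 0, "G": 0,
         "A": 1, "H": 1, "K": 2}
print(is_hierarchical(P, level))  # True
```

A program such as {p ← ¬p} admits no such level mapping, since it would need l(p) > l(p).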
Knowledge Based Artificial Neural Networks

I Towell, Shavlik 1994: Can we do better than empirical learning?

I Sets of hierarchical logic programs, e.g.,

P = {A ← B ∧ C ∧ ¬D, A ← D ∧ ¬E, H ← F ∧ G, K ← A ∧ ¬H}.

[Figure: the corresponding network with input units B, C, D, E, F, G; one unit per rule with weight ω for each positive and −ω for each negated condition (thresholds 3ω/2 for A ← B ∧ C ∧ ¬D, ω/2 for A ← D ∧ ¬E, 3ω/2 for H ← F ∧ G, ω/2 for K ← A ∧ ¬H); the two rules for A are combined by a unit with threshold ω/2.]
Knowledge Based Artificial Neural Networks – Learning
I Given hierarchical sets of propositional rules as background knowledge.
I Map rules into multi-layer feed forward networks with sigmoidal units.
I Add hidden units (optional).
I Add units for known input features that are not referenced in the rules.
I Fully connect layers.
I Add near-zero random numbers to all links and thresholds.
I Apply backpropagation.
. Empirical evaluation: system performs betterthan purely empirical and purely hand-built classifiers.
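The initialization steps can be sketched as follows (my reading of the recipe above, not Towell and Shavlik's code; the value of ω, the noise level, and the threshold formula (ℓ − 1/2)ω for ℓ positive conditions are assumptions consistent with the figure):

```python
import random

# KBANN-style initialization: weight OMEGA for positive conditions, -OMEGA
# for negated ones, full connectivity via near-zero random noise on all
# remaining links (backpropagation then refines everything).

OMEGA = 4.0

def kbann_init(rules, inputs, noise=0.01):
    """Return one weight row and threshold per rule (the rule units)."""
    random.seed(1)
    units = []
    for head, body in rules:
        row = {a: random.uniform(-noise, noise) for a in inputs}  # full connect
        n_pos = 0
        for atom, positive in body:
            row[atom] = OMEGA if positive else -OMEGA
            n_pos += positive
        theta = (n_pos - 0.5) * OMEGA  # threshold for the conjunction
        units.append((head, row, theta))
    return units

rules = [("A", [("B", True), ("C", True), ("D", False)]),
         ("A", [("D", True), ("E", False)])]
units = kbann_init(rules, ["B", "C", "D", "E"])
print(units[0][2])  # 6.0, i.e. 3*OMEGA/2 for the two-positive-condition rule
```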
Knowledge Based Artificial Neural Networks – A Problem

I Works if rules have few conditions and there are few rules with the same head.

[Figure: unit A with ten inputs A1, . . . , A10 and threshold 19ω/2; unit B with ten inputs B1, . . . , B10 and threshold 19ω/2; unit C with threshold ω/2 receiving connections of weight ω from A and B.]

I pA = pB = 9ω and vA = vB = 1/(1 + e^{β(9.5ω−9ω)}) ≈ 0.46 with β = 1.

I pC = 0.92ω and vC = 1/(1 + e^{β(0.5ω−0.92ω)}) ≈ 0.6 with β = 1.

I Thus C becomes active although neither A nor B is.

I Literature Towell, Shavlik 1994: Knowledge Based Artificial Neural Networks. Artificial Intelligence 70, 119-165.
Propositional Core Method using Bipolar Sigmoidal Units
I d'Avila Garcez, Zaverucha, Carvalho 1997: Can we combine the ideas in Holldobler, Kalinke 1994 and Towell, Shavlik 1994 while avoiding the above-mentioned problem?

I Consider a propositional logic language.

I Let I be an interpretation and a ∈ [0, 1].

R(I)[j] = v ∈ [a, 1] if j ∈ I, w ∈ [−1, −a] if j ∉ I.

I Replace threshold and sigmoidal units by bipolar sigmoidal ones, i.e., units with

Φ(~ik) = pk = ∑_{j=1}^{m} wkj vj,

Ψ(pk) = vk = 2 / (1 + e^{β(θk − pk)}) − 1,

where θk ∈ R is a threshold (or bias) and β > 0 a steepness parameter.
The Task
I How should a, ω and θi be selected such that:
. vi ∈ [a, 1] or vi ∈ [−1,−a] and
. the core computes the immediate consequence operator?
Hidden Layer Units

I Consider A ← L1 ∧ . . . ∧ Ln.

I Let u be the hidden layer unit for this rule.

. Suppose I |= L1 ∧ . . . ∧ Ln.

• u receives input ≥ ωa from each unit representing an Li.
• pu ≥ nωa = p⁺_u.

. Suppose I ⊭ L1 ∧ . . . ∧ Ln.

• u receives input ≤ −ωa from at least one unit representing an Li.
• pu ≤ (n − 1)ω · 1 − ωa = (n − 1)ω − ωa = p⁻_u.

I θu = (nωa + (n − 1)ω − ωa)/2 = (na + n − 1 − a)ω/2 = (n − 1)(a + 1)ω/2.
Output Layer Units

I Let µ be the number of clauses with head A.

I Consider A ← L1 ∧ . . . ∧ Ln.

I Suppose I |= L1 ∧ . . . ∧ Ln.

. pA ≥ ωa + (µ − 1)ω(−1) = ωa − (µ − 1)ω = p⁺_A.

I Suppose for all rules of the form A ← L1 ∧ . . . ∧ Ln we find I ⊭ L1 ∧ . . . ∧ Ln.

. pA ≤ −µωa = p⁻_A.

I θA = (ωa − (µ − 1)ω − µωa)/2 = (a − µ + 1 − µa)ω/2 = (1 − µ)(a + 1)ω/2.
Computing a Value for a

I p⁺_u > p⁻_u:

. nωa > (n − 1)ω − ωa.
. nωa + ωa > (n − 1)ω.
. a(n + 1)ω > (n − 1)ω.
. a > (n − 1)/(n + 1).

I p⁺_A > p⁻_A:

. ωa − (µ − 1)ω > −µaω.
. ωa + µaω > (µ − 1)ω.
. a(1 + µ)ω > (µ − 1)ω.
. a > (µ − 1)/(µ + 1).

I Considering all rules yields a minimum value for a.
Computing a Value for ω
I Ψ(p) = 2/(1 + eβ(θ−p))− 1 ≥ a.
I 2/(1 + eβ(θ−p)) ≥ 1 + a.
I 2/(1 + a) ≥ 1 + eβ(θ−p).
I 2/(1 + a)− 1 = (2− (1 + a))/(1 + a) = (1− a)/(1 + a) ≥ eβ(θ−p).
I ln((1− a)/(1 + a)) ≥ β(θ − p).
I (1/β) ln((1− a)/(1 + a)) ≥ θ − p.
I Consider a hidden layer unit:
. (1/β) ln((1− a)/(1 + a)) ≥ (n− 1)(a + 1)ω/2− nωa = ((na + n− a− 1− 2na)/2)ω = ((n− 1− a(n + 1))/2)ω.
. ω ≥ (2/((n− 1− a(n + 1))β)) ln((1− a)/(1 + a)) because a > (n− 1)/(n + 1).
I Considering all hidden and output layer units as well as the case Ψ(p) ≤ −a yields a minimum value for ω.
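A minimal sketch of the resulting bound on ω for a single hidden unit with n body literals and sigmoid steepness β (function and parameter names are illustrative, not from the lecture):

```python
# Sketch: lower bound for omega from the inequality above, for a hidden unit
# with n body literals; beta is the sigmoid steepness.
import math

def minimal_omega(n: int, a: float, beta: float) -> float:
    # Requires a > (n-1)/(n+1) so that n - 1 - a*(n+1) < 0 and the bound,
    # being a quotient of two negative quantities, is positive.
    assert a > (n - 1) / (n + 1)
    return 2 / ((n - 1 - a * (n + 1)) * beta) * math.log((1 - a) / (1 + a))

print(minimal_omega(n=2, a=0.5, beta=1.0))
```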
Exercises
I Show that hierarchical programs are strongly determined.
I Consider P = {r ← p ∧ ¬q, r ← ¬p ∧ q, p← s ∧ t}.
. Compute values for a, ω and θi.
. Specify the core for P .
. How can the approach be extended to handle facts like s and t?
I Consider now P ′ = P ∪ {s, t}, where P is as before.
. Show that P ′ is strongly determined.
. Show that the recurrent network computes the least model of P ∪ {s, t}.
Results
I Relation to logic programs is preserved.
I The core is trainable by backpropagation.
I Many interesting applications, e.g.:
. DNA sequence analysis.
. Power system fault diagnosis.
I Empirical evaluation: the system performs better than well-known machine learning systems.
I See d’Avila Garcez, Broda, Gabbay 2002 for details.
I Literature
. d’Avila Garcez, Zaverucha, Carvalho 1997: Logic Programming and Inductive Inference in Artificial Neural Networks. In: Knowledge Representation in Neural Networks, Logos, Berlin, 33-46.
. d’Avila Garcez, Broda, Gabbay 2002: Neural-Symbolic Learning Systems: Foundations and Applications, Springer.
Further Extensions
I Many-valued logic programs
I Modal logic programs
I Answer set programming
I Metalevel priorities
I Rule extraction
Propositional Core Method – Three-Valued Logic Programs
I Kalinke 1994: Consider truth values >, ⊥, u.
I Interpretations are pairs I = 〈I+, I−〉.
I Immediate consequence operator ΦP(I) = 〈J+, J−〉, where
J+ = {A | A← L1 ∧ . . . ∧ Lm ∈ P and I(L1 ∧ . . . ∧ Lm) = >},
J− = {A | for all A← L1 ∧ . . . ∧ Lm ∈ P : I(L1 ∧ . . . ∧ Lm) = ⊥}.
I Let n = |V| and identify V with {1, . . . , n}.
I Define R : I → R2n as follows:
R(I)[2j − 1] = 1 if j ∈ I+, 0 if j 6∈ I+    and    R(I)[2j] = 1 if j ∈ I−, 0 if j 6∈ I−.
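A sketch of this encoding, assuming atoms are identified with 1, . . . , n as above (the function name is illustrative):

```python
# Sketch of the vector encoding R for three-valued interpretations
# I = (I_plus, I_minus) over atoms identified with 1..n.

def encode(I_plus: set, I_minus: set, n: int) -> list:
    vec = []
    for j in range(1, n + 1):
        vec.append(1 if j in I_plus else 0)   # position 2j-1: A_j is true
        vec.append(1 if j in I_minus else 0)  # position 2j:   A_j is false
    return vec

# Atom 1 true, atom 3 false, atom 2 undefined (u): both of its bits are 0.
print(encode({1}, {3}, 3))  # [1, 0, 0, 0, 0, 1]
```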
Propositional Core Method – Multi-Valued Logic Programs
I For each program P , there exists a core computing ΦP , e.g.,
P = {C ← A ∧ ¬B, D ← C ∧ E, D ← ¬C}.
[Figure: a core with input and output layers of units A, ¬A, B, ¬B, C, ¬C, D, ¬D, E, ¬E (thresholds 1/2), hidden units with thresholds ω/2 and 3ω/2, and connections of weight ±ω/2.]
I Lane, Seda 2004: Extension to finitely determined sets of truth values.
Propositional Core Method – Modal Logic Programs
I d’Avila Garcez, Lamb, Gabbay 2002.
I Let L be a propositional logic language plus
. the modalities 2 and 3, and
. a finite set of labels w1, . . . , wk denoting worlds.
I Let B be an atom, then 2B and 3B are modal atoms.
I A modal definite logic program P is a set of clauses of the form
wi : A← A1 ∧ . . . ∧Am
together with a finite set of relations wi I wj, where wi, wj, 1 ≤ i, j ≤ k, are labels and A, A1, . . . , Am are atoms or modal atoms.
I P = P1 ∪ . . . ∪ Pk, where Pi consists of all clauses labelled with wi.
Modal Logic Programs – Semantics
I Example: P = {w1 : A, w1 : 3C ← A} ∪ {w2 : B} ∪ {w3 : B} ∪ {w4 : B} ∪ {w1 I w2, w1 I w3, w1 I w4, w2 I w4}
I Kripke semantics:
[Figure: worlds w1, . . . , w4 with accessibility edges w1 I w2, w1 I w3, w1 I w4 and w2 I w4, annotated with the atoms and modal atoms true at each world; the existential 3C at w1 is witnessed by C at an accessible world, here fC(w1) = w4.]
Modal Immediate Consequence Operator
I Interpretations are tuples I = 〈I1, . . . , Ik〉.
I Immediate consequence operator MTP(I) = 〈J1, . . . , Jk〉, where
Ji = {A | there exists A← A1 ∧ . . . ∧Am ∈ Pi such that {A1, . . . , Am} ⊆ Ii}
∪ {3A | there exists wi I wj ∈ P and A ∈ Ij}
∪ {2A | for all wi I wj ∈ P we find A ∈ Ij}
∪ {A | there exists wj I wi ∈ P and 2A ∈ Ij}
∪ {A | there exists wj I wi ∈ P, 3A ∈ Ij and fA(wj) = wi}
Modal Logic Programs – The Translation Algorithm
I Let n = |V| and identify V with {1, . . . , n}.
I Let a ∈ [0, 1].
I Define R : I → R3n as follows:
R(I)[3j − 2] = v ∈ [a, 1] if j ∈ Ij, w ∈ [−1,−a] if j 6∈ Ij
R(I)[3j − 1] = v ∈ [a, 1] if 2j ∈ Ij, w ∈ [−1,−a] if 2j 6∈ Ij
R(I)[3j] = v ∈ [a, 1] if 3j ∈ Ij, w ∈ [−1,−a] if 3j 6∈ Ij
I Translation algorithm such that
. for each world the “local” part of MTP is computed by a core,
. the cores are turned into recurrent networks, and
. the cores are connected with respect to the given set of relations.
The Example Network
[Figure: one core per world w1, w2, w3, w4, each with input and output units A, 2A, 3A, B, 2B, 3B, C, 2C, 3C and hidden ∧- and ∨-units; the cores are turned into recurrent networks and interconnected according to the accessibility relations.]
First-Order Logic
I Existing Approaches
. Reflexive Reasoning and SHRUTI
. Connectionist Term Representations
• Holographic Reduced Representations Plate 1991
• Recursive Auto-Associative Memory Pollack 1988
. Horn logic and CHCL Holldobler 1990, Holldobler, Kurfess 1992
. Other Approaches
I First-Order Logic Programs and the Core Method
. Initial Approach
. Construction of Approximating Networks
. Topological Analysis and Generalisations
. Employing Iterated Function Systems
Literature
I Holldobler 1990: A Structured Connectionist Unification Algorithm. In: Proceedings of the AAAI National Conference on Artificial Intelligence, 587-593.
I Holldobler, Kurfess 1992: CHCL – A Connectionist Inference System. In: Parallelization in Inference Systems, Lecture Notes in Artificial Intelligence, 590, 318-342.
I Plate 1991: Holographic Reduced Representations. In: Proceedings of the International Joint Conference on Artificial Intelligence, 30-35.
I Pollack 1988: Recursive auto-associative memory: Devising compositional distributed representations. In: Proceedings of the Annual Conference of the Cognitive Science Society, 33-39.
Reflexive Reasoning
I Humans are capable of performing a wide variety of cognitive taskswith extreme ease and efficiency.
I For traditional AI systems, the same problems turn out to be intractable.
I Human commonsense knowledge: about 10^8 rules and facts.
I Wanted: “Reflexive” decisions within sublinear time.
I Shastri, Ajjanagadde 1993: SHRUTI.
SHRUTI – Knowledge Base
I Finite set of constants C, finite set of variables V .
I Rules:
. (∀X1 . . . Xm) (p1(. . .) ∧ . . . ∧ pn(. . .)→ (∃Y1 . . . Yk) p(. . .)).
. p, pi, 1 ≤ i ≤ n, are multi-place predicate symbols.
. Arguments of the pi: variables from {X1, . . . , Xm} ⊆ V .
. Arguments of p are from {X1, . . . , Xm} ∪ {Y1, . . . , Yk} ∪ C.
. {Y1, . . . , Yk} ⊆ V .
. {X1, . . . , Xm} ∩ {Y1, . . . , Yk} = ∅.
I Facts and queries (goals):
. (∃Z1 . . . Zl) q(. . .).
. Multi-place predicate symbol q.
. Arguments of q are from {Z1, . . . , Zl} ∪ C.
. {Z1, . . . , Zl} ⊆ V .
Further Restrictions
I Restrictions to rules, facts, and goals:
. No function symbols except constants.
. Only universally bound variables may occur as argumentsin the conditions of a rule.
. All variables occurring in a fact or goal occur only onceand are existentially bound.
. An existentially quantified variable is only unified with variables.
. A variable which occurs more than once in the conditions of a rule mustoccur in the conclusion of the rule and must be bound when the conclusionis unified with a goal.
. A rule is used only a fixed number of times.
Incompleteness.
SHRUTI – Example
I Rules P = { owns(Y, Z)← gives(X, Y, Z),
owns(X, Y )← buys(X, Y ),
can-sell(X, Y )← owns(X, Y ),
gives(john, josephine, book),
(∃X) buys(john, X),
owns(josephine, ball) }
I Queries:
can-sell(josephine, book) ; yes
(∃X) owns(josephine, X) ; yes {X 7→ book}, {X 7→ ball}
SHRUTI : The Network
[Figure: the SHRUTI network for the example knowledge base, with predicate clusters for gives, buys, owns and can-sell, argument nodes connected to the constants john, josephine, ball and book, and links implementing the rules; queries are answered by propagating synchronised activation through the network.]
Solving the Variable Binding Problem
[Figure: activation traces over time for the constants book, john, ball and josephine and for the argument nodes of can-sell, owns, gives and buys; an argument is bound to a constant by firing in the same phase.]
SHRUTI – Remarks
I Answers are derived in time proportional to depth of search space.
I Number of units as well as of connections is linear in the sizeof the knowledge base.
I Extensions:
. compute answer substitutions
. allow a fixed number of copies of rules
. allow multiple literals in the body of a rule
. build in a taxonomy
I ROBIN (Lange, Dyer 1989): signatures instead of phases.
I Biological plausibility.
I Trading expressiveness for time and size.
I Logical reconstruction by Beringer, Holldobler 1993:
. Reflexive reasoning is reasoning by reduction.
Literature
I Beringer, Holldobler 1993: On the Adequateness of the Connection Method. In: Proceedings of the AAAI National Conference on Artificial Intelligence, 9-14.
I Shastri, Ajjanagadde 1993: From Associations to Systematic Reasoning: A Connectionist Representation of Rules, Variables and Dynamic Bindings using Temporal Synchrony. Behavioural and Brain Sciences 16, 417-494.
I Lange, Dyer 1989: High-Level Inferencing in a Connectionist Network. Connection Science 1, 181-217.
First-Order Logic Programs and the Core Method
I Initial Approach
I Construction of Approximating Networks
I Topological Analysis and Generalisations
I Employing Iterated Function Systems
Logic Programs
I A logic program P over a first-order language L is a finite set of clauses
A← L1 ∧ . . . ∧ Ln,
where A is an atom, the Li are literals and n ≥ 0.
I BL is the set of all ground atoms over L, called the Herbrand base.
I A Herbrand interpretation I is a mapping BL → {>,⊥}.
I 2BL is the set of all Herbrand interpretations.
I ground(P) is the set of all ground instances of clauses in P .
I Immediate consequence operator TP : 2BL → 2BL:
TP(I) = {A | there is a clause A← L1 ∧ . . . ∧ Ln ∈ ground(P) such that I |= L1 ∧ . . . ∧ Ln}.
I I is a supported model iff TP(I) = I.
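The operator TP on a (finite, ground) program can be sketched as follows, assuming a simple literal representation (atom, sign); names are illustrative, not from the slides:

```python
# Sketch: immediate consequence operator T_P for a finite ground program,
# with literals written as ("A", True) for A and ("A", False) for ~A.

def tp(clauses, I: frozenset) -> frozenset:
    """clauses: list of (head_atom, [(atom, positive)]); I: set of true atoms."""
    holds = lambda atom, pos: (atom in I) == pos
    return frozenset(h for h, body in clauses
                     if all(holds(a, p) for a, p in body))

P = [("r", [("p", True), ("q", False)]), ("p", [])]
I = frozenset()
while True:                      # iterate to a fixed point (a supported model)
    J = tp(P, I)
    if J == I:
        break
    I = J
print(sorted(I))  # ['p', 'r']
```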
The Initial Approach
I Holldobler, Kalinke, Storr 1999: Can the core method be extended to first-order logic programs?
I Problem
. Given a logic program P over a first order language L together with TP : 2BL → 2BL.
. BL is countably infinite.
. The method used to relate propositional logic and connectionist systems is not applicable.
. How can the gap between the discrete, symbolic setting of logic and the continuous, real-valued setting of connectionist networks be closed?
The Goal
I Find R : 2BL → R and fP : R→ R such that the following conditions hold.
. TP(I) = I′ implies fP(R(I)) = R(I′).
. fP(x) = x′ implies TP(R−1(x)) = R−1(x′).
fP is a sound and complete encoding of TP .
. TP is a contraction on 2BL iff fP is a contraction on R.
The contraction property and fixed points are preserved.
. fP is continuous on R.
A connectionist network approximating fP is known to exist.
Acyclic Logic Programs
I Let P be a program over a first order language L.
I A level mapping for P is a function l : BL → N.
. We define l(¬A) = l(A).
I We can associate a metric dL with L and l. Let I, J ∈ 2BL:
dL(I, J) = 0 if I = J, and dL(I, J) = 2−n if n is the smallest level on which I and J differ.
I Proposition (Fitting 1994) (2BL, dL) is a complete metric space.
I P is said to be acyclic wrt a level mapping l if for every A← L1 ∧ . . . ∧ Ln ∈ ground(P) we find l(A) > l(Li) for all i.
I Proposition Let P be an acyclic logic program wrt l and dL the metric associated with L and l, then TP is a contraction on (2BL, dL).
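A sketch of dL for finite test interpretations, with the level mapping given as a dictionary (names and atoms are illustrative, not from the slides):

```python
# Sketch of the metric d_L induced by a level mapping l (here a dict
# from ground atoms to levels), evaluated on finite sets of true atoms.

def d_L(I: set, J: set, level: dict) -> float:
    diff = I.symmetric_difference(J)
    if not diff:
        return 0.0
    n = min(level[a] for a in diff)    # smallest level where I and J differ
    return 2.0 ** (-n)

level = {"q0": 1, "q1": 2, "q2": 3}
print(d_L({"q0", "q1"}, {"q0"}, level))  # differ first at level 2 -> 0.25
```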
Mapping Interpretations to Real Numbers
I Let D = {r ∈ R | r = Σ∞i=1 ai4−i, where ai ∈ {0, 1} for all i}.
I Let l be a bijective level mapping.
I {>,⊥} can be identified with {0, 1}.
I The set of all mappings BL → {>,⊥} can be identified with the set of all mappings N→ {0, 1}.
I Let IL be the set of all mappings from BL to {0, 1}.
I Let R : IL → D be defined as R(I) = Σ∞i=1 I(l−1(i))4−i.
I Proposition R is a bijection.
We have a sound and complete encoding of interpretations.
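Truncating the infinite sum to the first k levels gives a computable approximation of R; a hedged sketch (function name is illustrative):

```python
# Sketch of the encoding R: an interpretation, given as a 0/1 sequence indexed
# by levels 1, 2, ..., maps to the real number sum_i I(l^{-1}(i)) * 4^{-i}.
# We truncate to the first k levels, which approximates R up to about 4^{-k}.

def R(bits, k: int = 20) -> float:
    return sum(b * 4.0 ** (-i) for i, b in zip(range(1, k + 1), bits))

# The interpretation making every atom true encodes (close to) 1/3:
print(R([1] * 20))
```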
Mapping Immediate Consequence Operators to Functions on the Reals
I We define fP : D → D : r 7→ R(TP(R−1(r))).
[Figure: commutative diagram relating I and I′ via TP to r and r′ via fP , with R mapping interpretations to reals.]
We have a sound and complete encoding of TP .
I Proposition Let P be an acyclic program wrt a bijective level mapping. Then fP is a contraction on D.
Contraction property and fixed points are preserved.
Approximating Continuous Functions
I Corollary fP is continuous.
I Recall Funahashi’s theorem:
. Every continuous function f : K → R can be uniformly approximated byinput-output functions of 3-layer feed forward networks.
I Theorem fP can be uniformly approximated by input-output functions of 3-layer feed forward networks.
. TP can be approximated as well by applying R−1.
Connectionist network approximating immediate consequence operator exists.
An Example
I Consider P = {q(0), q(s(X))← q(X)} and let l(q(sn(0))) = n + 1.
. P is acyclic wrt l, l is bijective, R(BL) = 1/3.
. fP(R(I)) = 4−l(q(0)) + Σq(X)∈I 4−l(q(s(X)))
= 4−1 + Σq(X)∈I 4−(l(q(X))+1) = (1 + R(I))/4.
I Approximation of fP to accuracy ε yields
f(x) ∈ [(1 + x)/4− ε, (1 + x)/4 + ε].
I Starting with some x and iterating f yields in the limit a value
r ∈ [(1− 4ε)/3, (1 + 4ε)/3].
I Applying R−1 to r we find
q(sn(0)) ∈ R−1(r) if n < −log4 ε− 1.
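For this example fP has the closed form (1 + x)/4, so the convergence to the encoding of the least model can be checked directly; a minimal sketch (illustrative code, not from the lecture):

```python
# Sketch: iterating the exact f_P(x) = (1 + x)/4 from the example converges
# to the encoding of the least model, R(B_L) = 1/3; a network computing f_P
# only up to accuracy eps lands within 4*eps/3 of that fixed point.

def f(x: float) -> float:
    return (1 + x) / 4

x = 0.0
for _ in range(30):   # contraction factor 1/4, so convergence is fast
    x = f(x)
print(abs(x - 1 / 3) < 1e-15)  # True
```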
Approximation of Interpretations
I Let P be a logic program over a first order language L and l a level mapping.
I An interpretation I approximates an interpretation J to a degree n ∈ Nif for all atoms A ∈ BL with l(A) < n we find I(A) = > iff J(A) = >.
. I approximates J to a degree n iff dL(I, J) ≤ 2−n.
Approximation of Supported Models
I Given an acyclic logic program P with bijective level mapping.
I Let TP be the immediate consequence operator associated with P and MP the least supported model of P .
I We can approximate TP by a 3-layer feed forward network.
I We can turn this network into a recurrent one.
Does the recurrent network approximate the supported model of P?
I Theorem For an arbitrary m ∈ N there exists a recursive network with sigmoidal activation functions for the hidden layer units and linear activation functions for the input and output layer units computing a function fP such that there exists an n0 ∈ N such that for all n ≥ n0 and for all x ∈ [−1, 1] we find
dL(R−1(fPn(x)), MP) ≤ 2−m, where fPn denotes the n-fold iteration of fP .
First Order Core Method – Extensions
I Detailed study of (topological) continuity of semantic operators, Hitzler, Seda 2003 and Hitzler, Holldobler, Seda 2004:
. many-valued logics,
. larger class of logic programs,
. other approximation theorems.
I A core method for reflexive reasoning, Holldobler, Kalinke, Wunderlich 2000.
I The graph of fP is an attractor of some iterated function system, Bader 2003 and Bader, Hitzler 2004:
. representation theorems,
. fractal interpolation,
. core with units computing radial basis functions.
I Finitely determined sets of truth values, Lane, Seda 2004.
Constructive Approaches: Fibring Artificial Neural Networks
I Fibring function Φ associated with neuron i maps some weights w of a network to new values depending on w and the input x of i (Garcez, Gabbay 2004).
[Figure: a neuron with input x and output y whose fibring function Φ rewrites the weights w of a subnetwork.]
I Idea: approximate fP by computing values of atoms with level n = 1, 2, . . ..
[Figure: clause subnetworks Clause1, Clause2, . . . mapping I to TP(I), with Φ advancing the level counter n by 1.]
I Works well for acyclic logic programs with bijective level mapping (Bader, Garcez, Hitzler 2004).
Constructive Approaches: Approximating Piecewise Constant Functions
I Consider the graph of fP .
I Approximate fP up to a given level l.
I Construct a core computing the piecewise constant function.
. Step activation functions.
. Sigmoidal activation functions.
. Radial basis functions.
[Figure: the step-shaped graph of fP on [0, 0.5] and its approximation by sigmoidal and radial basis activation functions.]
I Bader, Hitzler, Witzel 2005.
Open Problems
I How can first order terms be represented and manipulated in a connectionist system? Pollack 1990, Holldobler 1990, Plate 1994.
I Can the mapping R be learned? Gust, Kuhnberger 2004.
I How can first order rules be extracted from a connectionist system?
I How can multiple instances of first order rules be represented in a connectionist system? Shastri 1990.
I What does a theory for the integration of logic and connectionist systems look like?
I Can such a theory be applied in real domains outperforming conventional approaches?
I How does the core method relate to model-based reasoning approaches in cognitive science (e.g. Barnden 1989, Johnson-Laird, Byrne 1993)?