
Neural-Symbolic Integration

Steffen Hölldobler, International Center for Computational Logic, Technische Universität Dresden, Germany


Introduction & Motivation: Overview

I Introduction & Motivation

I Propositional Logic

. Existing Approaches

. Propositional Logic Programs and the Core Method

I First-Order Logic

. Existing Approaches

. First-Order Logic Programs and the Core Method

I The Neural-Symbolic Learning Cycle

I Challenge Problems


Introduction & Motivation: Connectionist Systems

I Well-suited to learn, to adapt to new environments, to degrade gracefully etc.

I Many successful applications.

I Approximate functions.

. Hardly any knowledge about the functions is needed.

. Trained using incomplete data.

I Declarative semantics is not available.

I Recursive networks are hardly understood.

I McCarthy 1988: We still observe a propositional fixation.

I Structured objects are difficult to represent.

. Smolensky 1987: Can we instantiate the power of symbolic computation within fully connectionist systems?


Introduction & Motivation: Logic Systems

I Well-suited to represent and reason about structured objects and structure-sensitive processes.

I Many successful applications.

I Direct implementation of relations and functions.

I Explicit expert knowledge is required.

I Highly recursive structures.

I Well understood declarative semantics.

I Logic systems are brittle.

I Expert knowledge may not be available.

. Can we instantiate the power of connectionist computation within a logic system?


Introduction & Motivation: Objective

I Seek the best of both paradigms!

I Understanding the relation between connectionist and logic systems.

I Contribute to the open research problems of both areas.

I Well developed for propositional case.

I Hard problem: going beyond.

I In this lecture:

. Overview on existing approaches.

. Logic programs and recurrent networks.

. Semantic operators for logic programs can be computed by connectionist systems.

. Semantic operators can be learned.

. Logic programs can be extracted.


Neural-Symbolic Integration using the Core Method


Connectionist Networks

I A connectionist network consists of

. a set U of units and

. a set W ⊆ U × U of connections.

I Each connection is labeled by a weight w ∈ R.

I If there is a connection from unit uj to uk, then wkj is its associated weight.


I A unit is specified by

. an input vector ~i = (i1, . . . , im), ij ∈ R, 1 ≤ j ≤ m,

. an activation function Φ mapping ~i to a potential p ∈ R,

. an output function Ψ mapping p to an (output) value v ∈ R.


I If there is a connection from uj to uk, then wkjvj is the input received by uk along this connection.

I The potential and value of a unit are synchronously recomputed (or updated).

I Often a linear time t is added as parameter to input, potential and value.

I The state of a network with units u1, . . . , un at time t is (v1(t), . . . , vn(t)).


A Simple Connectionist Network

[Figure: four units; u1 feeds u3 with weight w31 and u2 feeds u4 with weight w42, while u3 and u4 are mutually connected with weights w34 and w43; v3 and v4 are the observable values.]

w34 = w43 = −0.5, w31 = w42 = 1

pi(t + 1) = pi(t) + ∑_{j=1}^{4} wij vj(t)

vi(t) = round(pi(t))

v1(t) = 6 if t = 0, 2 otherwise

v2(t) = 5 if t = 0, 2 otherwise

I What happens if the network is synchronously updated?
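To see the answer concretely, here is a small simulation (an illustrative sketch, not part of the slides; it assumes u1 and u2 are input units whose values are clamped to the given sequence):

```python
import numpy as np

W = np.zeros((5, 5))                 # 1-based indexing: W[k, j] holds wkj
W[3, 1] = W[4, 2] = 1.0
W[3, 4] = W[4, 3] = -0.5

p = np.zeros(5)                      # potentials (index 0 unused)
v = np.zeros(5)                      # values

def clamp_inputs(t):
    v[1] = 6 if t == 0 else 2        # v1(t)
    v[2] = 5 if t == 0 else 2        # v2(t)

for t in range(8):
    clamp_inputs(t)
    # synchronous update: p_i(t+1) = p_i(t) + sum_j w_ij * v_j(t)
    p[3:5] = p[3:5] + W[3:5, 1:5] @ v[1:5]
    v[3:5] = np.round(p[3:5])
    print(t + 1, v[3], v[4])         # values v3, v4 at time t+1
```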


I A winner-take-all network is a synchronously updated connectionist network of n units (not counting input units) such that, after each unit receives an initial input at t = 0, eventually only the unit with the highest initial input produces a value greater than 0, whereas the value of all other units is 0.

I Exercise Construct a winner-take-all network of 3 units.


Literature

I Feldman, Ballard 1982: Connectionist Models and Their Properties. Cognitive Science 6 (3), 205-254.

I McCarthy 1988: Epistemological Challenges for Connectionism. Behavioural and Brain Sciences 11, 44.

I Smolensky 1987: On Variable Binding and the Representation of Symbolic Structures in Connectionist Systems. Report No. CU-CS-355-87, Department of Computer Science & Institute of Cognitive Science, University of Colorado, Boulder.


Propositional Logic

I Existing Approaches

. Finite Automata and McCulloch-Pitts Networks

. Weighted Automata and Semiring Artificial Neural Networks

. Propositional Reasoning and Symmetric/Stochastic Networks

. Other Approaches

I Propositional Logic Programs and the Core Method

. The Very Idea

. Logic Programs

. Propositional Core Method

. Backpropagation

. Knowledge-Based Artificial Neural Networks

. Propositional Core Method using Sigmoidal Units

. Further Extensions


McCulloch-Pitts Networks

I McCulloch, Pitts 1943: Can the activities of nervous systems be modelled by a logical calculus?

I A McCulloch-Pitts network consists of a set U of binary threshold unitsand a set W ⊆ U × U of weighted connections.

I The set UI of input units is defined as UI = {uk ∈ U | (∀uj ∈ U) wkj = 0}.

I The set UO of output units is defined as UO = {uj ∈ U | (∀uk ∈ U) wkj = 0}.

[Figure: a McCulloch-Pitts network drawn as a black box, with the input units UI on the left and the output units UO on the right.]


Binary Threshold Units

I uk is a binary threshold unit if

Φ(~ik) = pk = ∑_{j=1}^{m} wkj vj

Ψ(pk) = vk = 1 if pk ≥ θk, 0 otherwise,

where θk ∈ R is a threshold.

I Three binary threshold units:

. Negation: w21 = −1 and θ2 = −0.5 yield v2 = ¬v1.

. Disjunction: w31 = w32 = 1 and θ3 = 0.5 yield v3 = v1 ∨ v2.

. Conjunction: w31 = w32 = 1 and θ3 = 1.5 yield v3 = v1 ∧ v2.
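These three units are easy to test exhaustively; the following is a minimal sketch (my own code, not from the slides):

```python
from itertools import product

def threshold_unit(weights, theta):
    """Binary threshold unit: outputs 1 iff sum_j w_j * v_j >= theta."""
    return lambda *vs: int(sum(w * v for w, v in zip(weights, vs)) >= theta)

NOT = threshold_unit([-1], -0.5)      # w21 = -1, theta2 = -0.5
OR  = threshold_unit([1, 1], 0.5)     # w31 = w32 = 1, theta3 = 0.5
AND = threshold_unit([1, 1], 1.5)     # w31 = w32 = 1, theta3 = 1.5

for v1 in (0, 1):
    assert NOT(v1) == 1 - v1
for v1, v2 in product((0, 1), repeat=2):
    assert OR(v1, v2) == (v1 or v2)
    assert AND(v1, v2) == (v1 and v2)
print("all truth tables check out")
```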


A Simple McCulloch-Pitts Network

I Example Consider the following network of logical threshold units:

"!#

"!#

.5 .5u1 u3

-

-

1

1"!#

"!#

.5 .5

u2 u4

����

��������*HHHH

HHHHHH

HHj

-1-1

"!#

.5u5������������:

XXXXXXXXXXXXz

1

1

I Exercise

. Specify UI and UO.

. What is computed by the network if all units are updated synchronously?

. Specify the states of the network ignoring input and output units.


Finite Automata

I A finite automaton consists of:

. Σ, a finite set of input symbols,

. Φ, a finite set of output symbols,

. Q, a finite set of states,

. q0 ∈ Q, an initial state,

. F ⊂ Q, a set of final states,

. δ : Q × Σ → Q, a state transition function,

. ρ : Q→ Φ, an output function.

I Exercise Let Σ = Φ = {1, 2}, Q = {p, q, r}, F = {r}, q0 = p,

ρ: ρ(p) = 1, ρ(q) = 1, ρ(r) = 2,

δ: δ(p, 1) = q, δ(p, 2) = p, δ(q, 1) = r, δ(q, 2) = q, δ(r, 1) = r, δ(r, 2) = r.

What is computed by this automaton?
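One way to answer the exercise is to simulate the automaton; the sketch below is my own encoding (the convention that an output symbol is emitted for the successor state after each transition is an assumption):

```python
# The exercise automaton as dictionaries.
delta = {('p', 1): 'q', ('p', 2): 'p',
         ('q', 1): 'r', ('q', 2): 'q',
         ('r', 1): 'r', ('r', 2): 'r'}
rho = {'p': 1, 'q': 1, 'r': 2}

def run(inputs, q0='p'):
    """Feed the input symbols one by one; emit rho(state) after each step."""
    state, outputs = q0, []
    for b in inputs:
        state = delta[(state, b)]
        outputs.append(rho[state])
    return outputs, state

outs, final = run([1, 2, 1, 1])
print(outs, final, final in {'r'})    # [1, 1, 2, 2] r True
```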


Finite Automata and McCulloch-Pitts Networks

I Theorem McCulloch-Pitts networks are finite automata and vice versa.

I Proof

⇒ Exercise.

⇐ Let T = (Σ, Φ, Q, q0, F, δ, ρ) be an automaton with

• Σ = {b1, . . . , bm},
• Φ = {c1, . . . , cr},
• Q = {q0, . . . , qk−1}.

To show: there exists a network N with

• inputs {b′1, . . . , b′m},
• outputs {c′1, . . . , c′r},
• states {q′0, . . . , q′k−1}

such that if T generates cj1, . . . , cjn given bj1, . . . , bjn, then N generates c′j1, . . . , c′jn given b′j1, . . . , b′jn.


Construction of the Network: Inputs and Outputs

I Remember |Σ| = m, |Φ| = r.

I Inputs x1, . . . , xm with b′j = ~x where xi = 1 if i = j, 0 otherwise.

I Outputs y1, . . . , yr with c′j = ~y where yi = 1 if i = j, 0 otherwise.


Construction of the Network: Units and Connections

I Remember |Σ| = m, |Φ| = r, |Q| = k.

I qb-units represent that T in state q receives input b (k × m units).

I c-units represent output c (r units).

I Connections

. Let {k1, . . . , kn(k)} = {(q, b) | δ(q, b) = q∗}; then

vuq∗b∗(t + 1) = 1 if xb∗(t) ∧ [k1(t) ∨ . . . ∨ kn(k)(t)], 0 otherwise.

. Let {l1, . . . , ln(l)} = {(q, b) | ρ(q) = c}; then

vuc(t + 1) = 1 if l1(t) ∨ . . . ∨ ln(l)(t), 0 otherwise.

I The theorem follows by induction on the length of the input sequence.
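The construction can be played through in a few lines. The sketch below is my own set-based encoding of the qb- and c-units, reusing the exercise automaton from above:

```python
delta = {('p', 1): 'q', ('p', 2): 'p', ('q', 1): 'r', ('q', 2): 'q',
         ('r', 1): 'r', ('r', 2): 'r'}
rho = {'p': 1, 'q': 1, 'r': 2}
states, symbols = ['p', 'q', 'r'], [1, 2]

def step(active_qb, b_star):
    """One synchronous update; x_{b*} is on, all other input lines are off."""
    next_qb = set()
    for q_star in states:
        for b in symbols:
            # qb-unit (q*, b*) fires iff some active (q, a) has delta(q, a) = q*
            if b == b_star and any(delta[(q, a)] == q_star
                                   for (q, a) in active_qb):
                next_qb.add((q_star, b))
    outputs = {rho[q] for (q, b) in active_qb}   # the c-unit for rho(q) fires
    return next_qb, outputs

active = {('p', 1)}                  # encodes: T in state q0 = p reading 1
for b in [2, 1, 1]:
    active, out = step(active, b)
    print(active, out)
```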


Exercises

I Specify the automaton corresponding to the sample network.

I Specify the network corresponding to the sample finite automaton.

I Complete the proof of the theorem.


Some Remarks on McCulloch-Pitts Networks

I McCulloch-Pitts networks are not just simple reactive systems, but their behavior depends on previous inputs as well as the activity within the network.

. Example

[Figure: input x drives two threshold-0.5 units connected in series with weight 1, producing output y.]

I Literature

. Arbib: Brains, Machines and Mathematics. Springer, 2nd edition (1987).

. McCulloch & Pitts: A Logical Calculus of the Ideas Immanent in Nervous Activity. Bulletin of Mathematical Biophysics 5, 115-133 (1943).


Weighted Automata and Semiring Artificial Neural Networks

I Bader, Hölldobler, Scalzitti 2004: Can the result by McCulloch and Pitts be extended to weighted automata?

I Let (K, ⊕, ⊙, 0K, 1K) be a semiring.

I uk is a ⊕-unit if Φ(~ik) = pk = ⊕_{j=1}^{m} (wkj ⊙ vj) and Ψ(pk) = vk = pk.

I uk is a ⊙-unit if Φ(~ik) = pk = ⊙_{j=1}^{m} (wkj ⊙ vj) and Ψ(pk) = vk = pk.

I A semiring artificial neural network consists of a set U of ⊕- and ⊙-units and a set W ⊆ U × U of K-weighted connections.

I Theorem Weighted automata are semiring artificial neural networks.

I Literature Bader, Hölldobler, Scalzitti 2004: Semiring Artificial Neural Networks and Weighted Automata – and an Application to Digital Image Encoding. In: KI 2004: Advances in Artificial Intelligence, Lecture Notes in Artificial Intelligence 3238, 281-294.


Symmetric Networks

I Hopfield 1982: Can statistical models for magnetic materials explain the behavior of certain classes of networks?

I Original application: associative memory.

I A symmetric network consists of a set U of binary threshold units and a set W ⊆ U × U of weighted connections such that wkj = wjk for all k, j with k ≠ j.

I Asynchronous update procedure: while state ~v is unstable: update an arbitrary unit.

[Figure: a small symmetric network with unit thresholds 0, 0, and 5 and connection weights 2, 2, and −1, followed by a trace of asynchronous updates until a stable state is reached.]


Energy Minimization

I What happens precisely when a symmetric network is updated?

I Consider the energy function

E(t) = −(1/2) ∑_{k,j} wkj vj(t) vk(t) + ∑_k θk vk(t)
     = −∑_{k<j} wkj vj(t) vk(t) + ∑_k θk vk(t)

describing the state of the network at time t.

I We assume wii = 0 for all units i in the network.

I Exercise

. Specify E(t) for the symmetric networks on the previous page.

. How does an update change the energy of a symmetric network (you may assume that θk = 0 for all k)?

I Theorem E is monotonically decreasing, i.e., E(t + 1) ≤ E(t).

I Exercise Does this theorem still hold if we drop the assumption that wij = wji?

I Exercise How plausible is the assumption that wij = wji?
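The theorem can be checked numerically. The sketch below is illustrative code with a hand-picked symmetric weight matrix; it performs asynchronous updates and asserts that E never increases:

```python
import numpy as np

rng = np.random.default_rng(0)
W = np.array([[0., 2., -1.],
              [2., 0., 2.],
              [-1., 2., 0.]])          # symmetric, zero diagonal
theta = np.array([0., 0., 5.])

def energy(v):
    return -0.5 * v @ W @ v + theta @ v

v = rng.integers(0, 2, size=3).astype(float)
for _ in range(20):                    # update an arbitrary unit at a time
    k = rng.integers(0, 3)
    old_e = energy(v)
    v[k] = 1.0 if W[k] @ v >= theta[k] else 0.0   # threshold update
    assert energy(v) <= old_e + 1e-12             # E(t+1) <= E(t)
print("stable state:", v, "energy:", energy(v))
```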


Stochastic Networks or Boltzmann Machines

I Hinton, Sejnowski 1983: Can we escape local minima?

I A stochastic network is a symmetric network, but the values are computed probabilistically:

P(vk = 1) = 1 / (1 + e^{(θk − pk)/T})

where T is called the pseudo temperature.

I In equilibrium, stochastic networks are more likely to be in a state with low energy.

I Kirkpatrick et al. 1983: Can we compute a global minimum?

I Simulated annealing: decrease T gradually.

I Theorem (Geman, Geman 1984) A global minimum is reached if T is decreased in infinitesimally small steps.

I Applications Combinatorial optimization problems like the travelling salesman problem or the graph bipartitioning problem.
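A corresponding sketch of the probabilistic update with annealing (illustrative; the network is the same hand-picked one as in the previous snippet and the annealing schedule is my choice):

```python
import numpy as np

W = np.array([[0., 2., -1.],
              [2., 0., 2.],
              [-1., 2., 0.]])
theta = np.array([0., 0., 5.])
energy = lambda v: -0.5 * v @ W @ v + theta @ v

rng = np.random.default_rng(1)
v = rng.integers(0, 2, size=3).astype(float)

def stochastic_step(v, T):
    k = rng.integers(0, len(v))
    # P(v_k = 1) = 1 / (1 + exp((theta_k - p_k) / T))
    prob_on = 1.0 / (1.0 + np.exp((theta[k] - W[k] @ v) / T))
    v[k] = 1.0 if rng.random() < prob_on else 0.0
    return v

# simulated annealing: decrease the pseudo temperature gradually
for T in np.geomspace(5.0, 0.05, num=300):
    v = stochastic_step(v, T)
print("final state:", v, "energy:", energy(v))
```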


Literature

I Geman, Geman 1984: Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images. IEEE Transactions on Pattern Analysis and Machine Intelligence 6, 721-741.

I Hinton, Sejnowski 1983: Optimal Perceptual Inference. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 448-453.

I Hopfield 1982: Neural Networks and Physical Systems with Emergent Collective Computational Abilities. In: Proceedings of the National Academy of Sciences USA, 2554-2558.

I Kirkpatrick et al. 1983: Optimization by Simulated Annealing. Science 220, 671-680.



Propositional Logic

I Variables are p1, . . . , pn.

I Connectives are ¬,∨,∧.

I Atoms are variables.

I Literals are atoms and negated atoms.

I Clauses are (generalized) disjunctions of literals.

I Formulas in clause form are (generalized) conjunctions of clauses.

I Notation Sometimes variables are denoted by different letters if there is a bijection between these letters and p1, . . . , pn.

I Example

(¬o ∨ m) ∧ (¬s ∨ ¬m) ∧ (¬c ∨ m) ∧ (¬c ∨ s) ∧ (¬v ∨ ¬m)

which is abbreviated by

〈[¬o, m], [¬s, ¬m], [¬c, m], [¬c, s], [¬v, ¬m]〉.



Interpretations and Models

I Notation (all symbols may be indexed)

. A denotes an atom.

. L denotes a literal.

. F, G denote formulas.

. C denotes a clause.

I Interpretations are mappings from {p1, . . . , pn} to {0, 1}.

. They can be encoded as ~v.

. They are extended to formulas as follows:

pi(~v) = vi
(¬F)(~v) = 1 − F(~v)
(F ∧ G)(~v) = F(~v) × G(~v)
(F ∨ G)(~v) = F(~v) + G(~v) − F(~v) × G(~v)

I ~v is a model for F iff F (~v) = 1.

I F is satisfiable if it has a model.
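The arithmetic semantics translates directly into code; a small sketch (the nested-tuple formula representation is my own choice):

```python
def ev(F, v):
    """F is a variable index, ('not', F), ('and', F, G) or ('or', F, G);
    v is the 0/1 vector encoding the interpretation."""
    if isinstance(F, int):
        return v[F - 1]                      # p_i(v) = v_i
    op = F[0]
    if op == 'not':
        return 1 - ev(F[1], v)
    if op == 'and':
        return ev(F[1], v) * ev(F[2], v)
    if op == 'or':
        a, b = ev(F[1], v), ev(F[2], v)
        return a + b - a * b

# F = <[~p1, p2], [p3, ~p2]> and v = 101:
F = ('and', ('or', ('not', 1), 2), ('or', 3, ('not', 2)))
print(ev(F, [1, 0, 1]))                      # 0, so v is not a model of F
```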


Interpretations and Models – Example

I Let F = 〈[¬p1, p2], [p3, ¬p2]〉 and ~v = ~101, then:

F(~v) = [¬p1, p2](~v) × [p3, ¬p2](~v)
= ((¬p1)(~v) + p2(~v) − (¬p1)(~v) × p2(~v)) × (p3(~v) + (¬p2)(~v) − p3(~v) × (¬p2)(~v))
= ((1 − p1(~v)) + p2(~v) − (1 − p1(~v)) × p2(~v)) × (p3(~v) + (1 − p2(~v)) − p3(~v) × (1 − p2(~v)))
= ((1 − 1) + 0 − (1 − 1) × 0) × (1 + (1 − 0) − 1 × (1 − 0))
= 0 × 1
= 0

I Hence, ~v is not a model for F, but is a model for [p3, ¬p2].

I Exercise

. Is F satisfiable? Prove your claim.

. Is 〈[¬p], [p,¬q], [q]〉 satisfiable? Prove your claim.

. Find all models of 〈[¬o, m], [¬s,¬m], [¬c, m], [¬c, s], [¬v,¬m]〉.


Propositional Reasoning and Energy Minimization

I Pinkas 1991: Is there a link between propositional logic and symmetric networks?

I Let F = 〈C1, . . . , Cm〉 be a propositional formula in clause form.

I We define

τ(C) = 0 if C = [ ],
τ(C) = A if C = [A],
τ(C) = 1 − A if C = [¬A],
τ(C) = τ(C1) + τ(C2) − τ(C1)τ(C2) if C = (C1 ∨ C2),

τ(F) = ∑_{i=1}^{m} (1 − τ(Ci)).

I Example τ(〈[¬o, m], [¬s, ¬m], [¬c, m], [¬c, s], [¬v, ¬m]〉) = vm − cm − cs + sm − om + 2c + o.

I Exercise Compute τ (〈[¬p], [p,¬q], [q]〉).
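Since τ(F)(~v) = 0 exactly at the models of F, the global minima can be found by brute force. A sketch (numeric evaluation over all interpretations instead of symbolic expansion; the clause encoding is mine):

```python
from itertools import product

def tau_clause(clause, val):
    """clause: list of (atom, positive) literals; val: dict atom -> 0/1."""
    t = 0
    for atom, positive in clause:
        lit = val[atom] if positive else 1 - val[atom]
        t = t + lit - t * lit            # tau(C1 v C2)
    return t

def tau(F, val):
    return sum(1 - tau_clause(c, val) for c in F)

F = [[('o', False), ('m', True)], [('s', False), ('m', False)],
     [('c', False), ('m', True)], [('c', False), ('s', True)],
     [('v', False), ('m', False)]]
atoms = ['o', 'm', 's', 'c', 'v']
for bits in product((0, 1), repeat=5):
    val = dict(zip(atoms, bits))
    if tau(F, val) == 0:                 # global minima = models of F
        print(val)
```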


Propositional Reasoning and Symmetric Networks

I Theorem F(~v) = 1 iff τ(F) has a global minimum at ~v and τ(F)(~v) = 0.

I Compare τ(F) = vm − cm − cs + sm − om + 2c + o

with E(~v) = −∑_{k<j} wkj vj vk + ∑_k θk vk.

[Figure: a symmetric network with units u1 = o, u2 = m, u3 = s, u4 = c, u5 = v; matching τ(F) with E(~v) gives connection weights −1 between v and m, 1 between c and m, 1 between c and s, −1 between s and m, 1 between o and m, and thresholds 2 for c, 1 for o, and 0 for the remaining units.]


Propositional Non-Monotonic Reasoning

I Pinkas 1991a: Can the above-mentioned approach be extended to non-monotonic reasoning?

I Consider F = 〈(C1, k1), . . . , (Cm, km)〉, where Ci are clauses and ki ∈ R+.

I The penalty of ~v for (C, k) is k if C(~v) = 0 and 0 otherwise.

I The penalty of ~v for F is the sum of the penalties for (Ci, ki).

I ~v is preferred over ~w wrt F

if the penalty of ~v for F is smaller than the penalty of ~w for F .

I Modify τ to become τ(F) = ∑_{i=1}^{m} ki (1 − τ(Ci)), e.g.,

τ(〈([¬o, m], 1), ([¬s, ¬m], 2), ([¬c, m], 4), ([¬c, s], 4), ([¬v, ¬m], 4)〉) = 4vm − 4cm − 4cs + 2sm − om + 8c + o.

I The corresponding stochastic network computes most preferred interpretations.


Exercises and Literature

I Exercise Consider

F = 〈([¬o, m], 1), ([¬s,¬m], 2), ([¬c, m], 4), ([¬c, s], 4), ([¬v,¬m], 4)〉.

. Compute the most preferred interpretations of F .

. What happens if we add (o, 100) to F ?

. What happens if we add (o, 100) and (s, 100) to F ?

I Literature

. Pinkas 1991: Symmetric Neural Networks and Logic Satisfiability. Neural Computation 3, 282-291.

. Pinkas 1991a: Propositional Non-Monotonic Reasoning and Inconsistency in Symmetrical Neural Networks. In: Proceedings International Joint Conference on Artificial Intelligence, 525-530.


Propositional Logic Programs and the Core Method

I The Very Idea

I Logic Programs

I Propositional Core Method

I Backpropagation

I Knowledge-Based Artificial Neural Networks

I Propositional Core Method using Sigmoidal Units

I Further Extensions


The Very Idea

I Various semantics for logic programs coincide with fixed points of associated immediate consequence operators (e.g., Apt, van Emden 1982).

I Banach Contraction Mapping Theorem A contraction mapping f defined on a complete metric space (X, d) has a unique fixed point. The sequence y, f(y), f(f(y)), . . . converges to this fixed point for any y ∈ X.

. Fitting 1994: Consider logic programs whose immediate consequence operator is a contraction.

I Funahashi 1989: Every continuous function on the reals can be uniformly approximated by feedforward connectionist networks.

. Hölldobler, Kalinke, Störr 1999: Consider logic programs whose immediate consequence operator is continuous on the reals.


Metrics

I A metric on a space M is a mapping d : M ×M → R such that

. d(x, y) = 0 iff x = y,

. d(x, y) = d(y, x), and

. d(x, y) ≤ d(x, z) + d(z, y).

I Let (M, d) be a metric space and S = (si | si ∈ M) a sequence.

. S converges if (∃s ∈ M)(∀ε > 0)(∃N)(∀n ≥ N) d(sn, s) ≤ ε.

. S is Cauchy if (∀ε > 0)(∃N)(∀n, m ≥ N) d(sn, sm) ≤ ε.

. (M, d) is complete if every Cauchy sequence converges.

I A mapping f : M → M is a contraction on (M, d) if (∃ 0 < k < 1)(∀x, y ∈ M) d(f(x), f(y)) ≤ k · d(x, y).


Propositional Logic Programs

I A propositional logic program P over a propositional language L is a finite set of clauses

A ← L1 ∧ . . . ∧ Ln,

where A is an atom, the Li are literals and n ≥ 0. P is definite if all Li, 1 ≤ i ≤ n, are atoms.

I Let V be the set of all propositional variables occurring in L.

I An interpretation I is a mapping V → {⊤, ⊥}.

I I can be represented by the set of atoms which are mapped to ⊤ under I.

I 2V is the set of all interpretations.

I Immediate consequence operator TP : 2V → 2V:

TP(I) = {A | there is a clause A ← L1 ∧ . . . ∧ Ln ∈ P such that I |= L1 ∧ . . . ∧ Ln}.

I I is a supported model iff TP(I) = I.
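TP and its iteration are a few lines of code. A sketch (the (head, positive body, negative body) clause representation is mine):

```python
def T_P(program, I):
    """Immediate consequence operator for propositional programs."""
    return {head for head, pos, neg in program
            if all(a in I for a in pos) and all(a not in I for a in neg)}

# P = {p, q <- p, r <- q}
P = [('p', [], []), ('q', ['p'], []), ('r', ['q'], [])]

I = set()
while True:                      # iterate T_P from the empty interpretation
    J = T_P(P, I)
    print(sorted(J))             # {p}, {p,q}, {p,q,r}, {p,q,r}
    if J == I:
        break                    # fixed point, i.e., a supported model
    I = J
```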


Exercises

I Consider P = {p, q ← p, r ← q}.

. Draw the lattice of all interpretations of P wrt the ⊆ ordering.

. Mark the models of P.

. Compute TP(∅), TP(TP(∅)), . . ..

. Mark the supported models of P.

I Let P be a definite program.

. Show that if M1 and M2 are models of P then so is M1 ∩ M2.

. Let M be the least model of P. Show that M is a supported model.


The Core Method

I Let L be a logic language.

I Given a logic program P together with its immediate consequence operator TP.

I Let I be the set of interpretations for P.

I Find a mapping R : I → Rn.

I Construct a feed-forward network computing fP : Rn → Rn, called the core, such that the following holds:

. If TP(I) = J then fP(R(I)) = R(J), where I, J ∈ I.

. If fP(~s) = ~t then TP(R−1(~s)) = R−1(~t), where ~s,~t ∈ Rn.

I Connect the units in the output layer recursively to the units in the input layer.

I Show that the following holds

. I = lfp (TP) iff the recurrent network converges to or approximates R(I).


Connectionist model generation using recurrent networks with a feed-forward core.


3-Layer Recurrent Networks

[Figure: a 3-layer feed-forward core — input layer, hidden layer, output layer — with the units of the output layer connected recurrently back to the units of the input layer.]

I At each point in time all units do:

. apply the activation function to obtain the potential,

. apply the output function to obtain the output.


Propositional Core Method using Binary Threshold Units

I Let L be the language of propositional logic over a set V of variables.

I Let P be a propositional logic program, e.g.,

P = {p, r ← p ∧ ¬q, r ← ¬p ∧ q}.

I I = 2V is the set of interpretations for P.

I TP(I) = {A | A ← L1 ∧ . . . ∧ Lm ∈ P such that I |= L1 ∧ . . . ∧ Lm}.

TP(∅) = {p}
TP({p}) = {p, r}
TP({p, r}) = {p, r} = lfp(TP)


Representing Interpretations

I I = 2V.

I Let n = |V| and identify V with {1, . . . , n}.

I Define R : I → Rn such that for all 1 ≤ j ≤ n we find:

R(I)[j] = 1 if j ∈ I, 0 if j ∉ I.

E.g., if V = {p, q, r} = {1, 2, 3} and I = {p, r} then R(I) = (1, 0, 1).

I Other encodings are possible, e.g.,

R(I)[j] = 1 if j ∈ I, −1 if j ∉ I.


Computing the Core

I Consider again P = {p, r ← p ∧ ¬q, r ← ¬p ∧ q}.

I A translation algorithm translates P into a core of binary threshold units:

[Figure: input layer units p, q, r with thresholds 1/2, one hidden unit per clause, and output layer units p, q, r with thresholds ω/2. Positive body literals connect with weight ω, negative ones with weight −ω; the hidden unit of the fact p has threshold −ω/2, the hidden units of the two rules have threshold ω/2, and each hidden unit connects to the output unit of its head with weight ω.]

I Exercise Specify the core for {p1 ← p2, p1 ← p3 ∧ p4, p1 ← p5 ∧ p6}.
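A compact rendering of the translation plus the recurrent iteration (a sketch; the hidden-unit threshold (|positive body| − 1/2) · ω follows the figure above, and the clause encoding is the same as in the earlier snippet):

```python
import numpy as np

# P = {p, r <- p & ~q, r <- ~p & q} as (head, positives, negatives)
P = [('r', ['p'], ['q']), ('r', ['q'], ['p']), ('p', [], [])]
V = ['p', 'q', 'r']
omega = 1.0

def core(I_vec):
    """One pass through the core: one hidden threshold unit per clause."""
    out = np.zeros(len(V))
    for head, pos, neg in P:
        # hidden unit: threshold (|pos| - 1/2) * omega, weights +-omega
        potential = sum(omega * I_vec[V.index(a)] for a in pos) \
                  - sum(omega * I_vec[V.index(a)] for a in neg)
        if potential >= (len(pos) - 0.5) * omega:
            out[V.index(head)] = 1.0   # output unit: threshold omega/2, weight omega
    return out

v = np.zeros(3)                        # R(empty interpretation)
for _ in range(4):                     # recurrent connections
    v = core(v)
    print({V[j] for j in range(3) if v[j] == 1})   # {p}, {p,r}, {p,r}, ...
```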


Some Results

I Proposition 2-layer networks cannot compute TP for definite P.

I Theorem For each program P, there exists a core computing TP.

I Recall P = {p, r ← p ∧ ¬q, r ← ¬p ∧ q}.

I Adding recurrent connections:

[Figure: the core for P from the previous slide with each output unit fed back to the corresponding input unit with weight 1; the trace of network states corresponds to the iteration TP(∅) = {p}, TP({p}) = {p, r}, TP({p, r}) = {p, r}.]


Strongly Determined Programs

I A logic program P is said to be strongly determined if there exists a metric d on the set of all Herbrand interpretations for P such that TP is a contraction wrt d.

I Exercise Are the following programs strongly determined?

. {p, q ← p, r ← q},

. {p1 ← p2, p1 ← p3 ∧ p4, p1 ← p5 ∧ p6},

. {p ← ¬p}.

I Corollary Let P be a strongly determined program. Then there exists a core with recurrent connections such that the computation with an arbitrary initial input converges and yields the unique fixed point of TP.


Time and Space Complexity

I Let n be the number of clauses and m be the number of propositional variables occurring in P.

. 2m + n units, 2mn connections in the core.

. TP(I) is computed in 2 steps.

. The parallel computational model to compute TP(I) is optimal.

. The recurrent network settles down in 3n steps in the worst case.

I Exercise Give an example of a program with worst case time behavior.


Rule Extraction (1)

I Proposition For each core C there exists a program P such that C computes TP.

[Figure: a core with input units u1, u2, hidden units u3, u4, u5, and output units u6, u7, whose weights and thresholds are arbitrary reals (thresholds −0.2 and 0.2 on the output units, −0.4, 0.3 and 0.6 on the hidden units; connection weights 2, 0.7, 0, −1, −0.2, 1, 1, −2, −0.5, 1.5, 0.3, 0.8).]

v1 v2 | p3 v3 | p4 v4 | p5 v5 | p6 v6 | p7 v7
0  0  | 0   0 | 0   1 | 0   0 | 0   1 | −1  0
0  1  | 1.5 1 | .3  1 | .8  1 | 1.8 1 | .7  1
1  0  | 1   1 | −1  0 | −.5 0 | 2   1 | .7  1
1  1  | 2.5 1 | −.7 0 | .3  0 | 2   1 | .7  1


Rule Extraction (2)

I Extracted program:

P = { q1 ← ¬q1 ∧ ¬q2,
q1 ← ¬q1 ∧ q2, q2 ← ¬q1 ∧ q2,
q1 ← q1 ∧ ¬q2, q2 ← q1 ∧ ¬q2,
q1 ← q1 ∧ q2, q2 ← q1 ∧ q2 }.

I Simplified form: P = {q1, q2 ← q1, q2 ← ¬q1 ∧ q2}.

I One can do much better than this simple approach (see Mayer-Eichberger 2006).


Literature

I Apt, van Emden 1982: Contributions to the Theory of Logic Programming. Journal of the ACM 29, 841-862.

I Fitting 1994: Metric Methods – Three Examples and a Theorem. Journal of Logic Programming 21, 113-127.

I Funahashi 1989: On the Approximate Realization of Continuous Mappings by Neural Networks. Neural Networks 2, 183-192.

I Hitzler, Hölldobler, Seda 2004: Logic Programs and Connectionist Networks. Journal of Applied Logic 2, 245-272.

I Hölldobler, Kalinke 1994: Towards a Massively Parallel Computational Model for Logic Programming. In: Proceedings of the ECAI94 Workshop on Combining Symbolic and Connectionist Processing, 68-77.

I Hölldobler, Kalinke, Störr 1999: Approximating the Semantics of Logic Programs by Recurrent Neural Networks. Applied Intelligence 11, 45-59.

I Mayer-Eichberger 2006: Extracting Propositional Logic Programs from Neural Networks: A Decompositional Approach. Bachelor Thesis, TU Dresden.


3-Layer Feed-Forward Networks Revisited

I Theorem (Funahashi 1989) Suppose that Ψ : R → R is non-constant, bounded, monotone increasing and continuous. Let K ⊆ Rn be compact, let f : K → R be continuous, and let ε > 0. Then there exists a 3-layer feed-forward network with output function Ψ for the hidden layer and linear output function for the input and output layer whose input-output mapping f̄ : K → R satisfies

max_{x∈K} |f(x) − f̄(x)| < ε.

. Every continuous function f : K → R can be uniformly approximated by input-output functions of 3-layer feed-forward networks.

I uk is a sigmoidal unit if

Φ(~ik) = pk = ∑_{j=1}^{m} wkj vj

Ψ(pk) = vk = 1 / (1 + e^{β(θk − pk)})

where θk ∈ R is a threshold (or bias) and β > 0 a steepness parameter.


Backpropagation

I Bryson, Ho 1969, Werbos 1974, Parker 1985, Rumelhart et al. 1986: Can 3-layer feed-forward networks learn a particular function?

I Training set of input-output pairs {(~il, ~ol) | 1 ≤ l ≤ n}.

I Minimize E = ∑_l El where El = (1/2) ∑_k (olk − vlk)².

I Gradient descent algorithm to learn appropriate weights.

I Backpropagation

. Initialize weights arbitrarily.

. Do until all input-output patterns are correctly classified:

1 Present input pattern ~il at time t.
2 Compute output pattern ~vl at time t + 2.
3 Change weights according to ∆wlij = η δli vlj, where

δli = Ψ′i(pli) × (oli − vli) if i is an output unit,
δli = Ψ′i(pli) × ∑_k δlk wki if i is a hidden unit,

and η > 0 is called the learning rate.
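A minimal backpropagation sketch with sigmoidal units (illustrative only; XOR as the training set and all hyperparameters are my choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
O = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(0, 0.5, (2, 4)); b1 = np.zeros(4)   # input -> hidden
W2 = rng.normal(0, 0.5, (4, 1)); b2 = np.zeros(1)   # hidden -> output
sig = lambda p: 1.0 / (1.0 + np.exp(-p))
eta = 2.0                                            # learning rate

for epoch in range(5000):
    h = sig(X @ W1 + b1)                 # hidden values
    v = sig(h @ W2 + b2)                 # output values
    delta_out = v * (1 - v) * (O - v)    # Psi'(p) * (o - v)
    delta_hid = h * (1 - h) * (delta_out @ W2.T)
    W2 += eta * h.T @ delta_out; b2 += eta * delta_out.sum(0)
    W1 += eta * X.T @ delta_hid; b1 += eta * delta_hid.sum(0)

print(np.round(v, 2).ravel())            # typically close to [0, 1, 1, 0]
```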


Output Functions Revisited

I Remember the sigmoidal function (with β = 1):

vi = 1 / (1 + e^{−(∑_j wij vj + θi)})

I We find

dvi / d(∑_j wij vj + θi) = vi (1 − vi).

I Hence

δli = vli (1 − vli) (oli − vli) if ui is an output unit,
δli = vli (1 − vli) ∑_k δlk wki if ui is a hidden unit.

I Units are active if vi ≥ 0.9 and passive if vi ≤ 0.1.


Properties

I Learning rate η:

. If η is large, then the system learns rapidly but may oscillate.

. If η is small, then the system learns slowly but will not oscillate.

. In the ideal case η should be adapted during learning:

∆wij(t + 1) = η δi(t) vj(t) + α ∆wij(t)

where α is a constant and α ∆wij(t) is called the momentum term.

I Almost all functions can be learned.

I Learning is NP-hard.

I Literature Rumelhart et al. 1986: Parallel Distributed Processing. MIT Press.


Level Mappings and Hierarchical Logic Programs

I Let V be a set of propositional variables and P be a propositional logic program wrt V.

I A level mapping for P is a function l : V → N.

. We define l(¬A) = l(A).

I P is hierarchical if for all clauses A ← L1 ∧ . . . ∧ Ln ∈ P we find l(A) > l(Li) for all 1 ≤ i ≤ n.


Knowledge Based Artificial Neural Networks

I Towell, Shavlik 1994: Can we do better than empirical learning?

I Sets of hierarchical logic programs, e.g.,

P = {A ← B ∧ C ∧ ¬D, A ← D ∧ ¬E, H ← F ∧ G, K ← A ∧ ¬H}.

[Figure: the network obtained from P; input units B, C, D, E, F, G at the bottom. Each clause becomes a hidden unit with weight ω for positive and −ω for negative body literals (thresholds 3ω/2, ω/2, 3ω/2 and ω/2 for the four clause bodies), and the units for A, H and K collect the hidden units of their clauses with weight ω and threshold ω/2.]


Knowledge Based Artificial Neural Networks – Learning

I Given hierarchical sets of propositional rules as background knowledge (a code sketch of the following steps is given after this list).

I Map rules into multi-layer feed forward networks with sigmoidal units.

I Add hidden units (optional).

I Add units for known input features that are not referenced in the rules.

I Fully connect layers.

I Add near-zero random numbers to all links and thresholds.

I Apply backpropagation.

. Empirical evaluation: the system performs better than purely empirical and purely hand-built classifiers.
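A sketch of the rule-to-network translation under the weight scheme recoverable from these slides (weight ω per condition, −ω for negated conditions, threshold (2p − 1)ω/2 for a rule unit with p positive conditions, ω/2 for a unit combining several rules); the clause representation and the value ω = 4 are assumptions.

```python
OMEGA = 4.0   # assumed weight magnitude

def kbann_init(program):
    """Map (head, body) rules to initial weights {(unit, source): w}
    and thresholds {unit: theta} of a feed-forward network."""
    weights, theta, rule_units = {}, {}, {}
    for idx, (head, body) in enumerate(program):
        unit = f"{head}#{idx}"                    # one unit per rule
        rule_units.setdefault(head, []).append(unit)
        p = sum(not lit.startswith('~') for lit in body)
        for lit in body:
            weights[(unit, lit.lstrip('~'))] = -OMEGA if lit.startswith('~') else OMEGA
        theta[unit] = (2 * p - 1) * OMEGA / 2
    for head, units in rule_units.items():        # combine rules with the same head
        for u in units:
            weights[(head, u)] = OMEGA
        theta[head] = OMEGA / 2
    return weights, theta

P = [('A', ['B', 'C', '~D']), ('A', ['D', '~E']),
     ('H', ['F', 'G']), ('K', ['A', '~H'])]
W, theta = kbann_init(P)
print(theta['A#0'], theta['A#1'], theta['A'])     # 6.0 2.0 2.0, i.e. 3w/2, w/2, w/2
```

Afterwards the layers are fully connected, near-zero random numbers are added to all links and thresholds, and backpropagation is applied, exactly as listed above.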


Knowledge Based Artificial Neural Networks – A Problem

I Works if rules have few conditions and there are few rules with the same head.

[Figure: units A and B each represent a rule with ten conditions (A1, . . . , A10 and B1, . . . , B10), all connections with weight ω and thresholds 19ω/2 = 9.5ω; unit C represents the disjunction of A and B with weights ω and threshold ω/2.]

I Suppose nine of the ten conditions of each rule are satisfied. Then pA = pB = 9ω and vA = vB = 1/(1 + e^{β(9.5ω−9ω)}) ≈ 0.46 with β = 1.

I pC = 0.92ω and vC = 1/(1 + e^{β(0.5ω−0.92ω)}) ≈ 0.6 with β = 1.

I Hence C is closer to active than to passive although neither rule is satisfied.

I Literature Towell, Shavlik 1994: Knowledge Based Artificial Neural Networks. Artificial Intelligence 70, 119-165.


Propositional Core Method using Bipolar Sigmoidal Units

I d’Avila Garcez, Zaverucha, Carvalho 1997: Can we combine the ideas in Holldobler, Kalinke 1994 and Towell, Shavlik 1994 while avoiding the above-mentioned problem?

I Consider a propositional logic language.

I Let I be an interpretation and a ∈ [0, 1].

R(I)[j] = v ∈ [a, 1] if j ∈ I,  w ∈ [−1, −a] if j ∉ I.

I Replace threshold and sigmoidal units by bipolar sigmoidal ones, i.e., units with

Φ(~i_k) = p_k = Σ_{j=1}^{m} w_kj v_j,
Ψ(p_k) = v_k = 2/(1 + e^{β(θ_k − p_k)}) − 1,

where θ_k ∈ R is a threshold (or bias) and β > 0 a steepness parameter.
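A direct transcription of such a unit (a sketch; the default β = 1 is an assumption):

```python
import math

def bipolar_unit(weights, inputs, theta, beta=1.0):
    """p_k = sum_j w_kj v_j;  Psi(p_k) = 2 / (1 + e^{beta (theta_k - p_k)}) - 1."""
    p = sum(w * v for w, v in zip(weights, inputs))
    return 2.0 / (1.0 + math.exp(beta * (theta - p))) - 1.0
```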


The Task

I How should a, ω and θi be selected such that:

. vi ∈ [a, 1] or vi ∈ [−1,−a] and

. the core computes the immediate consequence operator?


Hidden Layer Units

I Consider A ← L1 ∧ . . . ∧ Ln.

I Let u be the hidden layer unit for this rule.

. Suppose I |= L1 ∧ . . . ∧ Ln.

• u receives input ≥ ωa from each unit representing an Li.
• pu ≥ nωa = p_u^+.

. Suppose I ⊭ L1 ∧ . . . ∧ Ln.

• u receives input ≤ −ωa from at least one unit representing an Li.
• pu ≤ (n − 1)ω · 1 − ωa = p_u^−.

I θu = (nωa + (n − 1)ω − ωa)/2 = (na + n − 1 − a)ω/2 = (n − 1)(a + 1)ω/2.


Output Layer Units

I Let µ be the number of clauses with head A.

I Consider A ← L1 ∧ . . . ∧ Ln.

I Suppose I |= L1 ∧ . . . ∧ Ln.

. pA ≥ ωa + (µ − 1)ω(−1) = ωa − (µ − 1)ω = p_A^+.

I Suppose for all rules of the form A ← L1 ∧ . . . ∧ Ln we find I ⊭ L1 ∧ . . . ∧ Ln.

. pA ≤ −µωa = p_A^−.

I θA = (ωa − (µ − 1)ω − µωa)/2 = (a − µ + 1 − µa)ω/2 = (1 − µ)(a + 1)ω/2.


Computing a Value for a

I p_u^+ > p_u^−:

. nωa > (n − 1)ω − ωa.
. nωa + ωa > (n − 1)ω.
. a(n + 1)ω > (n − 1)ω.
. a > (n − 1)/(n + 1).

I p_A^+ > p_A^−:

. ωa − (µ − 1)ω > −µaω.
. ωa + µaω > (µ − 1)ω.
. a(1 + µ)ω > (µ − 1)ω.
. a > (µ − 1)/(µ + 1).

I Considering all rules yields the minimum value for a.


Computing a Value for ω

I Ψ(p) = 2/(1 + e^{β(θ−p)}) − 1 ≥ a.

I 2/(1 + e^{β(θ−p)}) ≥ 1 + a.

I 2/(1 + a) ≥ 1 + e^{β(θ−p)}.

I 2/(1 + a) − 1 = 2/(1 + a) − (1 + a)/(1 + a) = (1 − a)/(1 + a) ≥ e^{β(θ−p)}.

I ln((1 − a)/(1 + a)) ≥ β(θ − p).

I (1/β) ln((1 − a)/(1 + a)) ≥ θ − p.

I Consider a hidden layer unit:

. (1/β) ln((1 − a)/(1 + a)) ≥ (n − 1)(a + 1)ω/2 − nωa = (na + n − a − 1 − 2na)ω/2 = (n − 1 − a(n + 1))ω/2.

. ω ≥ 2 ln((1 − a)/(1 + a)) / ((n − 1 − a(n + 1))β), because a > (n − 1)/(n + 1) makes the factor n − 1 − a(n + 1) negative and flips the inequality.

I Considering all hidden and output layer units as well as the case Ψ(p) ≤ −a yields the

minimum value for ω.
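Putting the last three slides together, suitable values can be computed mechanically. A minimal sketch (clause representation as before; the small safety margin on a and the helper names are assumptions), demonstrated on the program from the exercises below:

```python
import math

def core_parameters(program, beta=1.0):
    """Compute a, omega and the thresholds theta for the bipolar-sigmoid core."""
    n_max = max(len(body) for _, body in program)      # longest rule body
    mu = {}
    for head, _ in program:
        mu[head] = mu.get(head, 0) + 1
    mu_max = max(mu.values())                          # most rules per head
    # a > (n-1)/(n+1) and a > (mu-1)/(mu+1) for all rules and heads:
    a = max((n_max - 1) / (n_max + 1), (mu_max - 1) / (mu_max + 1)) + 0.01
    # omega >= 2 ln((1-a)/(1+a)) / ((k - 1 - a(k+1)) beta) for k = n and k = mu:
    bound = lambda k: 2 * math.log((1 - a) / (1 + a)) / ((k - 1 - a * (k + 1)) * beta)
    omega = max(bound(n_max), bound(mu_max))
    theta = {}
    for idx, (head, body) in enumerate(program):
        n = len(body)
        theta[f"{head}#{idx}"] = (n - 1) * (a + 1) * omega / 2   # hidden unit
        theta[head] = (1 - mu[head]) * (a + 1) * omega / 2       # output unit
    return a, omega, theta

P = [('r', ['p', '~q']), ('r', ['~p', 'q']), ('p', ['s', 't'])]
a, omega, theta = core_parameters(P)
```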


Exercises

I Show that hierarchical programs are strongly determined.

I Consider P = {r ← p ∧ ¬q, r ← ¬p ∧ q, p ← s ∧ t}.

. Compute values for a, ω and θi.

. Specify the core for P.

. How can the approach be extended to handle facts like s and t?

I Consider now P′ = P ∪ {s, t}, where P is as before.

. Show that P′ is strongly determined.

. Show that the recurrent network computes the least model of P′.


Results

I Relation to logic programs is preserved.

I The core is trainable by backpropagation.

I Many interesting applications, e.g.:

. DNA sequence analysis.

. Power system fault diagnosis.

I Empirical evaluation: the system performs better than well-known machine learning systems.

I See d’Avila Garcez, Broda, Gabbay 2002 for details.

I Literature

. d’Avila Garcez, Zaverucha, Carvalho 1997: Logic Programming and Inductive Inference in Artificial Neural Networks. In: Knowledge Representation in Neural Networks, Logos, Berlin, 33-46.

. d’Avila Garcez, Broda, Gabbay 2002: Neural-Symbolic Learning Systems: Foundations and Applications, Springer.


Further Extensions

I Many-valued logic programs

I Modal logic programs

I Answer set programming

I Metalevel priorities

I Rule extraction


Propositional Core Method – Three-Valued Logic Programs

I Kalinke 1994: Consider truth values >, ⊥, u.

I Interpretations are pairs I = 〈I+, I−〉.

I Immediate consequence operator ΦP(I) = 〈J+, J−〉, where

J+ = {A | A ← L1 ∧ . . . ∧ Lm ∈ P and I(L1 ∧ . . . ∧ Lm) = >},
J− = {A | for all A ← L1 ∧ . . . ∧ Lm ∈ P : I(L1 ∧ . . . ∧ Lm) = ⊥}.

I Let n = |V| and identify V with {1, . . . , n}.

I Define R : I → R^2n as follows:

R(I)[2j − 1] = 1 if j ∈ I+, 0 if j ∉ I+;   R(I)[2j] = 1 if j ∈ I−, 0 if j ∉ I−.
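A sketch of ΦP and the truth evaluation it relies on (representation assumed: clauses as (head, body) pairs, V the set of variables, and I given by the two sets I+ and I−):

```python
def lit_value(lit, Ipos, Ineg):
    """Three-valued value of a literal under <I+, I->: 'T', 'F' or 'U'."""
    A, neg = lit.lstrip('~'), lit.startswith('~')
    if A in Ipos: return 'F' if neg else 'T'
    if A in Ineg: return 'T' if neg else 'F'
    return 'U'

def body_value(body, Ipos, Ineg):
    vals = [lit_value(l, Ipos, Ineg) for l in body]
    if all(v == 'T' for v in vals): return 'T'
    if any(v == 'F' for v in vals): return 'F'
    return 'U'

def phi(program, V, Ipos, Ineg):
    """Phi_P(<I+, I->) = <J+, J->."""
    Jpos = {h for h, b in program if body_value(b, Ipos, Ineg) == 'T'}
    Jneg = {A for A in V
            if all(body_value(b, Ipos, Ineg) == 'F'
                   for h, b in program if h == A)}
    return Jpos, Jneg
```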


Propositional Core Method – Multi-Valued Logic Programs

I For each program P, there exists a core computing ΦP, e.g.,

P = {C ← A ∧ ¬B, D ← C ∧ E, D ← ¬C}.

[Figure: the core for P, built up incrementally. The input and output layers each contain the unit pairs A, ¬A, B, ¬B, C, ¬C, D, ¬D, E, ¬E with thresholds 1/2; the hidden layer contains one unit per clause with thresholds ω/2 and 3ω/2 as indicated, connected to the corresponding input and output units, including the ±ω/2 links handling negation.]

I Lane, Seda 2004: Extension to finitely determined sets of truth values.


Propositional Core Method – Modal Logic Programs

I d’Avila Garcez, Lamb, Gabbay 2002.

I Let L be a propositional logic language plus

. the modalities □ and ◇, and

. a finite set of labels w1, . . . , wk denoting worlds.

I Let B be an atom; then □B and ◇B are modal atoms.

I A modal definite logic program P is a set of clauses of the form

wi : A ← A1 ∧ . . . ∧ Am

together with a finite set of relations wi I wj, where wi, wj, 1 ≤ i, j ≤ k, are labels and A, A1, . . . , Am are atoms or modal atoms.

I P = ∪_{i=1}^{k} Pi, where Pi consists of all clauses labelled with wi.


Modal Logic Programs – Semantics

I Example: P = {w1 : A, w1 : ◇C ← A} ∪ {w2 : B} ∪ {w3 : B} ∪ {w4 : B} ∪ {w1 I w2, w1 I w3, w1 I w4, w2 I w4}

I Kripke semantics:

[Figure: worlds w1, w2, w3, w4 with accessibility arrows w1 → w2, w1 → w3, w1 → w4 and w2 → w4. Successive builds annotate the worlds with derived (modal) atoms, e.g. A, □B, ◇B and ◇C at w1, B at w2, w3, w4, and vacuous □-atoms at the worlds without successors, until C is placed at w4, witnessed by the choice fC(w1) = w4.]


Modal Immediate Consequence Operator

I Interpretations are tuples I = 〈I1, . . . , Ik〉.

I Immediate consequence operator MTP(I) = 〈J1, . . . , Jk〉, where

Ji = {A | there exists A ← A1 ∧ . . . ∧ Am ∈ Pi such that {A1, . . . , Am} ⊆ Ii}
 ∪ {◇A | there exists wi I wj ∈ P and A ∈ Ij}
 ∪ {□A | for all wi I wj ∈ P we find A ∈ Ij}
 ∪ {A | there exists wj I wi ∈ P and □A ∈ Ij}
 ∪ {A | there exists wj I wi ∈ P, ◇A ∈ Ij and fA(wj) = wi}
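A sketch of one application of MTP over a finite atom universe (the data layout, i.e. clauses per world, accessibility as a set of pairs and the choice functions fA as nested dicts, is an assumption):

```python
def mtp(atoms, clauses, acc, f, I):
    """MT_P(<I_1, ..., I_k>) = <J_1, ..., J_k>.
    clauses: dict world -> [(head, body)], elements 'A', ('box','A') or ('dia','A');
    acc: set of pairs (wi, wj); f: dict atom -> {wj: chosen world};
    I: dict world -> set of (modal) atoms."""
    J = {w: set() for w in I}
    for wi in I:
        # local rules whose body holds in I_i
        J[wi] |= {h for h, body in clauses.get(wi, []) if set(body) <= I[wi]}
        succ = {wj for v, wj in acc if v == wi}
        pred = {wj for wj, v in acc if v == wi}
        # dia A if A holds in some accessible world, box A if in all of them
        J[wi] |= {('dia', A) for A in atoms if any(A in I[wj] for wj in succ)}
        J[wi] |= {('box', A) for A in atoms if all(A in I[wj] for wj in succ)}
        # A propagates from box A in a predecessor, or from dia A with f_A(wj) = wi
        J[wi] |= {A for A in atoms if any(('box', A) in I[wj] for wj in pred)}
        J[wi] |= {A for A in atoms
                  if any(('dia', A) in I[wj] and f.get(A, {}).get(wj) == wi
                         for wj in pred)}
    return J
```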


Modal Logic Programs – The Translation Algorithm

I Let n = |V| and identify V with {1, . . . , n}.

I Let a ∈ [0, 1].

I Define R : I → R^3n as follows:

R(I)[3j − 2] = v ∈ [a, 1] if j ∈ Ii, w ∈ [−1, −a] if j ∉ Ii
R(I)[3j − 1] = v ∈ [a, 1] if □j ∈ Ii, w ∈ [−1, −a] if □j ∉ Ii
R(I)[3j] = v ∈ [a, 1] if ◇j ∈ Ii, w ∈ [−1, −a] if ◇j ∉ Ii

I Translation algorithm such that

. for each world the “local” part of MTP is computed by a core,

. the cores are turned into recurrent networks, and

. the cores are connected with respect to the given set of relations.


The Example Network

[Figure: one core per world w1, . . . , w4, each with input and output units for A, □A, ◇A, B, □B, ◇B, C, □C, ◇C; the cores are made recurrent, and they are connected across worlds by ∧- and ∨-units implementing the □ and ◇ cases of MTP along the given accessibility relation.]


First-Order Logic

I Existing Approaches

. Reflexive Reasoning and SHRUTI

. Connectionist Term Representations

• Holographic Reduced Representations, Plate 1991
• Recursive Auto-Associative Memory, Pollack 1988

. Horn logic and CHCL Holldobler 1990, Holldobler, Kurfess 1992

. Other Approaches

I First-Order Logic Programs and the Core Method

. Initial Approach

. Construction of Approximating Networks

. Topological Analysis and Generalisations

. Employing Iterated Function Systems


Literature

I Holldobler 1990: A Structured Connectionist Unification Algorithm. In: Proceedings of the AAAI National Conference on Artificial Intelligence, 587-593.

I Holldobler, Kurfess 1992: CHCL – A Connectionist Inference System. In: Parallelization in Inference Systems, Lecture Notes in Artificial Intelligence 590, 318-342.

I Plate 1991: Holographic Reduced Representations. In: Proceedings of the International Joint Conference on Artificial Intelligence, 30-35.

I Pollack 1988: Recursive Auto-Associative Memory: Devising Compositional Distributed Representations. In: Proceedings of the Annual Conference of the Cognitive Science Society, 33-39.


Reflexive Reasoning

I Humans are capable of performing a wide variety of cognitive tasks with extreme ease and efficiency.

I For traditional AI systems, the same problems turn out to be intractable.

I Human consensus knowledge: about 10^8 rules and facts.

I Wanted: “Reflexive” decisions within sublinear time.

I Shastri, Ajjanagadde 1993: SHRUTI.


SHRUTI – Knowledge Base

I Finite set of constants C, finite set of variables V .

I Rules:

. (∀X1 . . . Xm) (p1(. . .) ∧ . . . ∧ pn(. . .) → (∃Y1 . . . Yk) p(. . .)).

. p, pi, 1 ≤ i ≤ n, are multi-place predicate symbols.

. Arguments of the pi: variables from {X1, . . . , Xm} ⊆ V .

. Arguments of p are from {X1, . . . , Xm} ∪ {Y1, . . . , Yk} ∪ C.

. {Y1, . . . , Yk} ⊆ V .

. {X1, . . . , Xm} ∩ {Y1, . . . , Yk} = ∅.

I Facts and queries (goals):

. (∃Z1 . . . Zl) q(. . .).

. Multi-place predicate symbol q.

. Arguments of q are from {Z1, . . . , Zl} ∪ C.

. {Z1, . . . , Zl} ⊆ V .


Further Restrictions

I Restrictions to rules, facts, and goals:

. No function symbols except constants.

. Only universally bound variables may occur as arguments in the conditions of a rule.

. All variables occurring in a fact or goal occur only once and are existentially bound.

. An existentially quantified variable is only unified with variables.

. A variable which occurs more than once in the conditions of a rule must occur in the conclusion of the rule and must be bound when the conclusion is unified with a goal.

. A rule is used only a fixed number of times.

Incompleteness.


SHRUTI – Example

I Rules P = { owns(Y, Z) ← gives(X, Y, Z),
  owns(X, Y) ← buys(X, Y),
  can-sell(X, Y) ← owns(X, Y),
  gives(john, josephine, book),
  (∃X) buys(john, X),
  owns(josephine, ball) }.

I Queries:
  can-sell(josephine, book) ⇒ yes
  (∃X) owns(josephine, X) ⇒ yes, {X ↦ book}, {X ↦ ball}


SHRUTI : The Network

[Figure: the SHRUTI network for the example, built up incrementally. Each of the predicates gives, buys, owns and can-sell has a cluster with an enabler and a collector unit plus one unit per argument; the argument units are wired according to the rules (can-sell to owns, owns to gives and buys), and the constants john, josephine, ball and book feed into the argument units ("from john", "from jos.", "from book"). Successive builds trace the query can-sell(josephine, book): activation propagates through the rule connections, and bindings are represented by units firing in the same phase.]


Solving the Variable Binding Problem

[Figure: activation traces over time for the constants book, john, ball and josephine and for the enabler, collector and argument units of can-sell, owns, gives and buys; a constant is bound to an argument by firing in the same phase.]


SHRUTI – Remarks

I Answers are derived in time proportional to depth of search space.

I The number of units as well as of connections is linear in the size of the knowledge base.

I Extensions:

. compute answer substitutions

. allow a fixed number of copies of rules

. allow multiple literals in the body of a rule

. built in a taxonomy

I ROBIN (Lange, Dyer 1989): signatures instead of phases.

I Biological plausibility.

I Trading expressiveness for time and size.

I Logical reconstruction by Beringer, Holldobler 1993:

. Reflexive reasoning is reasoning by reduction.


Literature

I Beringer, Holldobler 1993: On the Adequateness of the Connection Method. In: Proceedings of the AAAI National Conference on Artificial Intelligence, 9-14.

I Shastri, Ajjanagadde 1993: From Associations to Systematic Reasoning: A Connectionist Representation of Rules, Variables and Dynamic Bindings using Temporal Synchrony. Behavioural and Brain Sciences 16, 417-494.

I Lange, Dyer 1989: High-Level Inferencing in a Connectionist Network. Connection Science 1, 181-217.


First-Order Logic Programs and the Core Method

I Initial Approach

I Construction of Approximating Networks

I Topological Analysis and Generalisations

I Employing Iterated Function Systems


Logic Programs

I A logic program P over a first-order language L is a finite set of clauses

A← L1 ∧ . . . ∧ Ln,

where A is an atom, Li are literals and n ≥ 0.

I BL is the set of all ground atoms over L, called the Herbrand base.

I A Herbrand interpretation I is a mapping BL → {>, ⊥}.

I 2^BL is the set of all Herbrand interpretations.

I ground(P) is the set of all ground instances of clauses in P.

I Immediate consequence operator TP : 2^BL → 2^BL:

TP(I) = {A | there is a clause A ← L1 ∧ . . . ∧ Ln ∈ ground(P) such that I |= L1 ∧ . . . ∧ Ln}.

I I is a supported model iff TP(I) = I.
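For a finite set of ground clauses, TP can be computed directly; a minimal sketch (clause representation as in the propositional part; the example program is illustrative):

```python
def tp(ground_program, I):
    """One application of T_P to a set of ground atoms I.
    A literal 'A' holds iff A in I; '~A' holds iff A not in I."""
    def holds(lit):
        return (lit.lstrip('~') in I) != lit.startswith('~')
    return {head for head, body in ground_program if all(map(holds, body))}

# Iterating from the empty interpretation, e.g. for P = {p, q <- p, r <- q ^ ~s}:
P = [('p', []), ('q', ['p']), ('r', ['q', '~s'])]
I = set()
for _ in range(4):
    I = tp(P, I)
print(sorted(I))   # ['p', 'q', 'r'], a supported model: tp(P, I) == I
```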


The Initial Approach

I Holldobler, Kalinke, Storr 1999: Can the core method be extended to first-order logic programs?

I Problem

. Given a logic program P over a first order language L together with TP : 2^BL → 2^BL.

. BL is countably infinite.

. The method used to relate propositional logic and connectionist systems is not applicable.

. How can the gap between the discrete, symbolic setting of logic and the continuous, real-valued setting of connectionist networks be closed?


The Goal

I Find R : 2^BL → R and fP : R → R such that the following conditions hold.

. TP(I) = I′ implies fP(R(I)) = R(I′), and fP(x) = x′ implies TP(R^−1(x)) = R^−1(x′).

fP is a sound and complete encoding of TP.

. TP is a contraction on 2^BL iff fP is a contraction on R.

The contraction property and fixed points are preserved.

. fP is continuous on R.

A connectionist network approximating fP is known to exist.


Acyclic Logic Programs

I Let P be a program over a first order language L.

I A level mapping for P is a function l : BL → N.

. We define l(¬A) = l(A).

I We can associate a metric dL with L and l. Let I, J ∈ 2^BL:

dL(I, J) = 0 if I = J, and dL(I, J) = 2^−n if n is the smallest level on which I and J differ.

I Proposition (Fitting 1994) (2^BL, dL) is a complete metric space.

I P is said to be acyclic wrt a level mapping l if for every A ← L1 ∧ . . . ∧ Ln ∈ ground(P) we find l(A) > l(Li) for all i.

I Proposition Let P be an acyclic logic program wrt l and dL the metric associated with L and l; then TP is a contraction on (2^BL, dL).
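Under a bijective level mapping, dL can be evaluated to any finite depth; a sketch (the truncation depth is an assumption):

```python
def d_L(I, J, atom_of_level, max_level=64):
    """2^-n for the smallest level n on which I and J differ, 0 if none is found."""
    for n in range(1, max_level + 1):
        if (atom_of_level(n) in I) != (atom_of_level(n) in J):
            return 2.0 ** -n
    return 0.0   # I and J agree on all levels up to max_level
```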


Mapping Interpretations to Real Numbers

I Let D = {r ∈ R | r = Σ_{i=1}^∞ a_i 4^−i, where a_i ∈ {0, 1} for all i}.

I Let l be a bijective level mapping.

I {>, ⊥} can be identified with {0, 1}.

I The set of all mappings BL → {>, ⊥} can be identified with the set of all mappings N → {0, 1}.

I Let IL be the set of all mappings from BL to {0, 1}.

I Let R : IL → D be defined as

R(I) = Σ_{i=1}^∞ I(l^−1(i)) 4^−i.

I Proposition R is a bijection.

We have a sound and complete encoding of interpretations.
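A finite-precision sketch of R, using the level mapping l(q(s^n(0))) = n + 1 from the example below; the truncation depth is an assumption:

```python
def encode(I, atom_of_level, depth=20):
    """R(I) = sum_{i>=1} I(l^{-1}(i)) 4^{-i}, truncated after `depth` levels."""
    return sum(4.0 ** -i for i in range(1, depth + 1)
               if atom_of_level(i) in I)

# l(q(s^n(0))) = n + 1, hence l^{-1}(i) = q(s^{i-1}(0)):
atom_of_level = lambda i: "q(" + "s(" * (i - 1) + "0" + ")" * (i - 1) + ")"
I = {atom_of_level(i) for i in range(1, 21)}   # all of B_L, up to depth 20
print(encode(I, atom_of_level))                # ~ 1/3 = R(B_L)
```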


Mapping Immediate Consequence Operators to Functions on the Reals

I We define fP : D → D : r ↦ R(TP(R^−1(r))).

[Diagram: the square commutes, i.e. r ↦ r′ under fP corresponds to I ↦ I′ under TP via the encoding R.]

We have a sound and complete encoding of TP.

I Proposition Let P be an acyclic program wrt a bijective level mapping. Then fP is a contraction on D.

Contraction property and fixed points are preserved.


Approximating Continuous Functions

I Corollary fP is continuous.

I Recall Funahashi’s theorem:

. Every continuous function f : K → R can be uniformly approximated byinput-output functions of 3-layer feed forward networks.

I Theorem fP can be uniformly approximated by input-output functions of 3-layer feed forward networks.

. TP can be approximated as well by applying R^−1.

A connectionist network approximating the immediate consequence operator exists.


An Example

I Consider P = {q(0), q(s(X)) ← q(X)} and let l(q(s^n(0))) = n + 1.

. P is acyclic wrt l, l is bijective, R(BL) = 1/3.

. fP(R(I)) = 4^−l(q(0)) + Σ_{q(X)∈I} 4^−l(q(s(X))) = 4^−l(q(0)) + Σ_{q(X)∈I} 4^−(l(q(X))+1) = (1 + R(I))/4.

I Approximation of fP to accuracy ε yields

f(x) ∈ [(1 + x)/4 − ε, (1 + x)/4 + ε].

I Starting with some x and iterating f yields in the limit a value

r ∈ [(1 − 4ε)/3, (1 + 4ε)/3].

I Applying R^−1 to r we find

q(s^n(0)) ∈ R^−1(r) if n < −log_4 ε − 1.
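The convergence claim can be checked numerically; a sketch with an illustrative accuracy ε:

```python
import math

eps = 1e-4                          # assumed approximation accuracy
f = lambda x: (1 + x) / 4 + eps     # worst-case approximation of f_P(x) = (1+x)/4

x = 0.0
for _ in range(50):                 # iterate the approximated operator
    x = f(x)
print(x, (1 + 4 * eps) / 3)         # both ~ 0.33347: the limit lies in the interval
print(-math.log(eps, 4) - 1)        # ~ 5.64: q(s^n(0)) is recovered for n <= 5
```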


Approximation of Interpretations

I Let P be a logic program over a first order language L and l a level mapping.

I An interpretation I approximates an interpretation J to a degree n ∈ N if for all atoms A ∈ BL with l(A) < n we find I(A) = > iff J(A) = >.

. I approximates J to a degree n iff dL(I, J) ≤ 2^−n.


Approximation of Supported Models

I Given an acyclic logic program P with bijective level mapping.

I Let TP be the immediate consequence operator associated with P and MP the least supported model of P.

I We can approximate TP by a 3-layer feed forward network.

I We can turn this network into a recurrent one.

Does the recurrent network approximate the supported model of P?

I Theorem For an arbitrary m ∈ N there exists a recurrent network with sigmoidal activation functions for the hidden layer units and linear activation functions for the input and output layer units computing a function fP such that there exists an n0 ∈ N such that for all n ≥ n0 and for all x ∈ [−1, 1] we find

dL(R^−1(fP^n(x)), MP) ≤ 2^−m.


First Order Core Method – Extensions

I Detailed study of the (topological) continuity of semantic operators, Hitzler, Seda 2003 and Hitzler, Holldobler, Seda 2004:

. many-valued logics,

. larger class of logic programs,

. other approximation theorems.

I A core method for reflexive reasoning Holldobler, Kalinke, Wunderlich 2000.

I The graph of fP is an attractor of some iterated function system, Bader 2003 and Bader, Hitzler 2004:

. representation theorems,

. fractal interpolation,

. core with units computing radial basis functions.

I Finitely determined sets of truth values Lane, Seda 2004.


Constructive Approaches: Fibring Artificial Neural Networks

I Fibring function Φ associated with neuron i maps some weights w of a network to new values depending on w and the input x of i (Garcez, Gabbay 2004).

[Figure: a network in which the fibring function Φ rewrites the weights w, depending on the input x and output y of the neuron.]

I Idea: approximate fP by computing the values of atoms with level n = 1, 2, . . .

[Figure: one subnetwork per clause (Clause 1, Clause 2, . . .), a fibring function Φ and a level counter n (incremented by +1), jointly mapping I to TP(I).]

I Works well for acyclic logic programs with bijective level mapping (Bader, Garcez, Hitzler 2004).


Constructive Approaches: Approximating Piecewise Constant Functions

I Consider the graph of fP.

I Approximate fP up to a given level l.

I Construct a core computing the piecewise constant function.

. Step activation functions.
. Sigmoidal activation functions.
. Radial basis functions.

[Figure: the graph of fP approximated by a piecewise constant function on [0, 0.5]; the final build additionally plots sigmoidal and radial basis activation functions on [−3, 3].]

I Bader, Hitzler, Witzel 2005.


Open Problems

I How can first order terms be represented and manipulated in a connectionist system? Pollack 1990, Holldobler 1990, Plate 1994.

I Can the mapping R be learned? Gust, Kuhnberger 2004.

I How can first order rules be extracted from a connectionist system?

I How can multiple instances of first order rules be represented in a connectionist system? Shastri 1990.

I What does a theory for the integration of logic and connectionist systems look like?

I Can such a theory be applied in real domains outperforming conventional approaches?

I How does the core method relate to model-based reasoning approaches in cognitive science (e.g. Barnden 1989, Johnson-Laird, Byrne 1993)?


Recommended