
The Principle of Presence: A Heuristic for Growing Knowledge Structured Neural Networks
Laurent Orseau, INSA/IRISA, Rennes, France

Neural Networks
Efficient at learning single problems
Fully connected; convergence in W³
Lifelong learning: specific cases can be important; more knowledge means more weights; catastrophic forgetting
-> Full connectivity is not suitable -> need locality

How can people learn so fast?
Focus, attention
Raw table storing? Frog and Car and Running woman
With generalization

What do people memorize? (1)
A memory: a set of « things »
Things are made of other, simpler things
Thing = concept
Basic concept = perceptual event

What do people memorize? (2)
Remember only what is present in mind at the time of memorization: what is seen, what is heard, what is thought, etc.

What do people memorize? (3)
Not what is not in mind!
Too many concepts are known.
What is present: few things, probably important
What is absent: many things, probably irrelevant
Good but not always true -> a heuristic

Presence in everyday life
Easy to see what is present, harder to tell what is missing
Infants lose attention to balls that have just disappeared
The number zero was invented long after the other digits
Etc.

The principle of presence
Memorization = create a new concept from only the active concepts
Independent of the number of known concepts
Few active concepts -> few variables -> fast generalization

Implications
A concept can be active or inactive; activity must reflect importance and be rare (~ an event in programming)
New concept = conjunction of the active ones
Concepts must be re-usable (lifelong): re-use = create a link from this concept
2 independent concepts = 2 units
-> More symbolic than an MLP, where a single neuron can represent too many things
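A minimal sketch of this memorization step in Python (illustrative only; the Concept class and memorize function are assumptions, not the author's code): the new concept is built from exactly the concepts active at memorization time, so its creation cost is independent of how many concepts are known overall.

    class Concept:
        def __init__(self, name, parts=()):
            self.name = name
            self.parts = tuple(parts)   # the simpler concepts this one is made of
            self.active = False

    def memorize(active_concepts, name):
        # Principle of presence: conjunction of only the currently active concepts.
        return Concept(name, parts=active_concepts)

    # Only A and B are active at memorization time, so the new concept
    # links to them alone, however many other concepts exist.
    A, B, C = Concept("A"), Concept("B"), Concept("C")
    A.active = B.active = True
    ab = memorize([c for c in (A, B, C) if c.active], "AB")
    print([p.name for p in ab.parts])   # ['A', 'B']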

Implementation: NN
Nonlinearity
Graph properties: local or global connectivity
Weights: smooth on-line generalization, resistant to noise
But more symbolic:
Inactivity: piecewise-continuous activation function
Knowledge not too distributed
Concepts not overlapping too much
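The slides only require that a unit can be truly inactive, i.e. output exactly 0 below some threshold, while remaining continuous on each piece; the exact shape is not given, so the linear ramp and the 2/3 default below are assumptions for illustration.

    def activation(x, threshold=2/3):
        # Piecewise-continuous activation: exactly 0 at or below the threshold
        # (the unit is truly inactive), then a linear ramp up to 1.
        # The ramp shape and the 2/3 default are illustrative assumptions.
        if x <= threshold:
            return 0.0
        return min(1.0, (x - threshold) / (1.0 - threshold))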

First implementation
Inputs: basic events
Output: target concept
No macro-concepts -> 3 layers
Hidden neuron = conjunction, unless made explicit (supervised learning) -> DNF (sketched below)
Output weights simulate priority
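A rough sketch of this 3-layer DNF structure, under assumptions suggested by the worked example below: each hidden neuron is a conjunction with weights 1/n over the inputs present at its creation and a threshold of 1 - 1/Ns, and the output concept behaves as a disjunction of its hidden neurons (output priority weights are simplified to a max). Class names are illustrative, not the author's.

    class ConjunctionNeuron:
        def __init__(self, inputs):
            n = len(inputs)
            self.weights = {name: 1.0 / n for name in inputs}   # e.g. 1/3 each for A, B, C
            self.threshold = 1.0 - 1.0 / n                      # e.g. 1 - 1/Ns = 2/3

        def activation(self, active_inputs):
            s = sum(w for name, w in self.weights.items() if name in active_inputs)
            return s if s > self.threshold else 0.0             # truly inactive below threshold

    class TargetConcept:
        def __init__(self):
            self.neurons = []                                   # hidden conjunctions (DNF terms)

        def activation(self, active_inputs):
            # Disjunction: active as soon as one conjunction neuron is active
            # (output "priority" weights are simplified away here).
            return max((n.activation(active_inputs) for n in self.neurons), default=0.0)

    # Presenting ABC creates one conjunction neuron over {A, B, C}:
    ab = TargetConcept()
    ab.neurons.append(ConjunctionNeuron({"A", "B", "C"}))
    print(ab.activation({"A", "B", "C"}))   # 1.0 -> active
    print(ab.activation({"A", "B"}))        # 0.0 -> 2/3 does not exceed the threshold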

Locality in learning
Only one neuron is modified at a time: nearest = most activated
If the target concept is not activated when it should be: generalize the nearest connected neuron, and add a neuron for that specific case
If the target is active, but not enough or too much: generalize the most activating neuron
(See the sketch below.)
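Continuing the previous sketch, a hedged reading of one local learning step: only the most activated ("nearest") neuron is modified, and a new neuron can also be added for the specific case, as in the AB example that follows. The slides do not give the exact update rule, so the weight shift in generalize is an assumption.

    def generalize(neuron, active_inputs, rate=0.25):
        # Move some weight mass from the absent inputs to the present ones
        # (keeping the sum at 1), so the neuron drifts toward a conjunction
        # of only the recurring inputs. The rate and the renormalizing shift
        # are assumptions, not the author's rule.
        absent = [k for k in neuron.weights if k not in active_inputs]
        present = [k for k in neuron.weights if k in active_inputs]
        if not absent or not present:
            return
        moved = sum(neuron.weights[k] * rate for k in absent)
        for k in absent:
            neuron.weights[k] *= (1 - rate)
        for k in present:
            neuron.weights[k] += moved / len(present)

    def learning_step(target, active_inputs, should_be_active):
        out = target.activation(active_inputs)
        if should_be_active and out == 0.0:
            if target.neurons:
                # Nearest = most activated: the only neuron that is modified.
                nearest = max(target.neurons,
                              key=lambda n: sum(w for k, w in n.weights.items()
                                                if k in active_inputs))
                generalize(nearest, active_inputs)
            # Also store the specific case as a new conjunction neuron.
            target.neurons.append(ConjunctionNeuron(set(active_inputs)))
        elif should_be_active and out < 1.0:
            # Target active but not strongly enough: generalize only the
            # most activating neuron; no new neuron is added.
            most = max(target.neurons, key=lambda n: n.activation(active_inputs))
            generalize(most, active_inputs)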

Learning: example (0)
Must learn AB. Examples presented: ABC, ABD, ABE, but never AB itself.
[Diagram: inputs A, B, C, D, E; the target concept AB already exists as an output.]

Learning: example (1) ABC:
[Diagram: a conjunction neuron N1 is created with weights 1/3 from A, B and C and threshold 1 - 1/Ns (= 2/3), connected to the disjunction/output concept AB.]
N1 is active when A, B and C are all active.

Learning: example (2) ABD:
[Diagram: N1's weights on A and B are raised above 1/3 and its weight on C is lowered below 1/3; a new conjunction neuron N2 is created with weights 1/3 from A, B and D.]

Learning: example (3) ABE: N1 is now slightly active for AB.
[Diagram: N1's weights on A and B have grown well above 1/3 and its weight on C has dropped well below 1/3.]

Learning: example (4) Final: N1 has generalized and is now active for AB.
[Diagram: N1's weights are 1/2 from A, 1/2 from B and 0 from C; N2 keeps weights 1/3 from A, B and D.]
The now-useless specific-case neuron is deleted by a criterion.

NETtalk task
TDNN: 120 neurons, 25,200 connections, 90%
Presence: 753 neurons, 6,024 connections, 74%; then learns by heart
If input activity is reversed -> catastrophic!

Many cognitive tasks heavily biased toward the principle of presence?

Advantages w.r.t. NNs
As many inputs as wanted; only the active ones are used
Lifelong learning: large-scale networks
Learns specific cases and generalizes, both quickly
Can lower weights without making wrong predictions -> imitation

But…
With few data and a limited number of neurons: not as good as backprop
Creates many neurons (but they can be deleted)
No negative weights

Work in progress
Negative cases, which must stay rare
Inhibitory links
Re-use of concepts
Macro-concepts: each concept can become an input