
Page 1:

Extending SpikeProp

Benjamin Schrauwen, Jan Van Campenhout

Ghent University, Belgium

IJCNN, July 27, 2004

Page 2:

Overview

● Introduction
● SpikeProp
● Improvements
● Results
● Conclusions

Page 3:

Introduction

● Spiking neural networks are receiving increased attention:

● Biologically more plausible
● Computationally stronger (W. Maass)
● Compact and fast implementation possible in hardware (analogue and digital)
● Have a temporal nature

● Main problem: supervised learning algorithms

Page 4:

SpikeProp

● Introduced by S. Bohte et al. in 2000
● An error-backpropagation learning algorithm
● Only for SNNs using “time-to-first-spike” coding

[Figure: time-to-first-spike coding, spike time t ~ 1/a: larger analogue values fire earlier]
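To make this coding concrete, here is a minimal Python sketch (not from the original slides; the exact mapping and the 16 ms ceiling are assumptions, only the ~1/a relation is suggested by the figure):

def time_to_first_spike(a, t_max=16.0, eps=1e-6):
    # Time-to-first-spike coding: a stronger analogue input a produces an
    # earlier (single) spike; spike time roughly ~ 1/a, clipped at t_max ms.
    return min(t_max, 1.0 / (abs(a) + eps))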

Page 5:

Architecture of SpikeProp

● Architecture originally introduced by Natschläger and Ruf
● Every connection consists of several synaptic terminals
● All 16 synaptic terminals have enumerated delays (1-16 ms) and different weights, but originally the same filter
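As a sketch of this architecture (not the authors' code; the uniform random weights are placeholders), one connection can be represented as a list of synaptic terminals, each pairing an enumerated delay of 1 to 16 ms with its own weight:

import random

def make_connection(n_terminals=16):
    # One connection = several synaptic terminals: (delay in ms, weight) pairs
    # with enumerated delays 1..16 ms, each carrying a different weight.
    return [(float(k), random.uniform(0.0, 1.0)) for k in range(1, n_terminals + 1)]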

Page 6:

SRM neuron

● Modified Spike Response Model (Gerstner)


Neuron reset is of no interest because only one spike is needed!
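A minimal sketch of this single-spike SRM variant is given below; it assumes the PSP kernel ε(t) = (t/τ)·e^(1 - t/τ) commonly used with SpikeProp and an illustrative time constant, both of which are assumptions on top of the slide:

import math

def epsilon(t, tau=7.0):
    # Spike response (PSP) kernel: zero before the (delayed) spike arrives,
    # then a rise-and-decay shape governed by the time constant tau.
    return (t / tau) * math.exp(1.0 - t / tau) if t > 0.0 else 0.0

def membrane_potential(t, inputs):
    # inputs: list of (weight, presynaptic spike time, delay) per synaptic terminal.
    # No reset term is modelled: with time-to-first-spike coding each neuron
    # only ever has to fire once.
    return sum(w * epsilon(t - t_i - d) for (w, t_i, d) in inputs)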

Page 7:

Idea behind SpikeProp

Minimise the sum-squared error (SSE) between the actual and desired output spike times

Change the weights along the negative direction of the error gradient
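In symbols (a reconstruction in the spirit of Bohte et al., since the slide's own equations are not in this transcript), with actual and desired output spike times t_j^a and t_j^d:

E = \frac{1}{2} \sum_{j \in \mathrm{outputs}} \left( t_j^a - t_j^d \right)^2, \qquad \Delta w_{ij}^k = -\eta \, \frac{\partial E}{\partial w_{ij}^k}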

Page 8:

Math of SpikeProp

Only the output-layer rule is shown here

Linearise around the threshold-crossing time
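A hedged reconstruction of that output-layer rule (the slide's equations are missing from this transcript; this follows Bohte et al.'s formulation, with y_i^k(t) the unweighted PSP of synaptic terminal k from neuron i). Linearising the membrane potential x_j(t) around the threshold-crossing time gives \partial t_j / \partial x_j(t_j^a) \approx -1 / \bigl( \partial x_j(t) / \partial t \bigr) \big|_{t_j^a}, so that

\delta_j = \frac{t_j^d - t_j^a}{\sum_i \sum_l w_{ij}^l \, \partial y_i^l(t_j^a) / \partial t}, \qquad \Delta w_{ij}^k = -\eta \, y_i^k(t_j^a) \, \delta_j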

Page 9:

Problems with SpikeProp

● Overdetermined architecture
● Tendency to get stuck when a neuron stops firing
● Problems with weight initialisation

Page 10:

Solving some of the problems

● Instead of enumerating parameters, learn them:

● Delays
● Synaptic time constants
● Thresholds

● We can then use a much more limited architecture
● Add a specific mechanism to keep neurons firing: decrease the threshold

Page 11:

Learn more parameters

● Quite similar to the weight update rule
● Gradient of the error with respect to the parameter
● Parameter-specific learning rate
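In general form (reconstructed here, as the slide's formulas are not in the transcript), every extra parameter p in {d_{ij}^k, \tau_{ij}^k, \theta_j} is trained by gradient descent with its own learning rate \eta_p:

\Delta p = -\eta_p \, \frac{\partial E}{\partial p}, \qquad p \in \{ d_{ij}^k, \ \tau_{ij}^k, \ \theta_j \}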

Page 12:

Math of the improvements - delays

The delta is the same as for the weight rule, thus there is a different delta formula for the output layer than for the inner layers.
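A hedged sketch of the structure of the delay rule (the slide's own equations did not survive the transcript): with y_i^k(t) = \varepsilon(t - t_i - d_{ij}^k) the unweighted PSP, the same \delta_j as in the weight rule reappears and only the leading factor changes,

\Delta d_{ij}^k = -\eta_d \, \delta_j \, w_{ij}^k \, \frac{\partial y_i^k(t_j^a)}{\partial d_{ij}^k} = \eta_d \, \delta_j \, w_{ij}^k \, \varepsilon'\!\left( t_j^a - t_i - d_{ij}^k \right)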

Page 13:

Math of the improvements - synaptic time constants

Page 14:

Math of the improvements - thresholds

Page 15:

What if training gets stuck?

● If one of the neurons in the network stops firing, the training rule stops working

● Solution: actively lower the threshold of a neuron whenever it stops firing (multiply it by 0.9)

● Equivalent to scaling all the incoming weights up
● Improves convergence
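A minimal Python sketch of this rescue mechanism (the 0.9 factor is from the slide; the data structures are assumptions):

SHRINK = 0.9  # multiplicative factor from the slide

def rescue_silent_neurons(thresholds, fired):
    # thresholds: per-neuron firing thresholds (mutable list)
    # fired: per-neuron booleans from the last forward pass
    for j, did_fire in enumerate(fired):
        if not did_fire:
            # Lowering the threshold is equivalent to scaling all incoming
            # weights up, so the neuron fires again and learning can resume.
            thresholds[j] *= SHRINK
    return thresholds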

Page 16:

What about weight initialisation

● Weight initialisation is a difficult problem
● The original publication gives only a vague description of the process
● S. M. Moore contacted S. Bohte personally to clarify the subject for his master's thesis
● Weight initialisation is done by a complex procedure
● Moore concluded that “weights should be initialized in such a way that every neuron initially fires, and that its membrane potential doesn’t surpass the threshold too much”

Page 17:

What about weight initialisation

● In this publication we chose a very simple initialisation procedure:

● Initialise all weights randomly
● Afterwards, set one weight such that the sum of all weights equals 1.5

● Convergence rates could be increased by using a more complex initialisation procedure
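A sketch of this simple recipe (only the "sum equals 1.5" step is from the slide; the uniform range of the random draw is an assumption):

import random

TARGET_SUM = 1.5  # total weight sum prescribed on the slide

def init_weights(n_synapses):
    # Draw random weights, then adjust the last one so that the weights
    # of all incoming synapses sum to the target value.
    w = [random.uniform(0.0, 1.0) for _ in range(n_synapses)]
    w[-1] = TARGET_SUM - sum(w[:-1])
    return w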

Page 18:

Problem with large delays

• During testing of the algorithm, a problem arose when the trained delays became very large: delay learning stopped

• If the (delayed) input spike is preceded by the neuron's output spike, i.e. it arrives only after the neuron has already fired, there is a problem
• Solved by constraining the delays

[Figure: the neuron's output spike precedes the delayed input spike]
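One way to realise such a constraint (a sketch; the exact bounds are assumptions, the slides do not give them):

def constrain_delay(d, d_min=1.0, d_max=16.0):
    # Clip a learned delay to a fixed range so the delayed input spike
    # cannot drift past the point where it no longer influences the
    # neuron's output spike.
    return max(d_min, min(d, d_max))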

Page 19:

Results

● Tested for binary XOR (MSE = 1 ms)

• Bohte:
• 3-5-1 architecture
• 16 synaptic terminals
• 20*16 = 320 weights
• 250 training cycles

• Improvements:
• 3-5-1 architecture
• 2 synaptic terminals
• 20*2 = 40 weights
• 130 training cycles
• 90% convergence

• 3-3-1 architecture
• 2 synaptic terminals
• 12*2 = 24 weights
• 320 training cycles
• 60% convergence

Page 20:

Results

● Optimal learning rates (found by experiment):

● Some rates seem very high, but that is because the values we work with are times expressed in ms

● The idea that the learning rate must be approximately 0.1 is only correct when inputs and weights are normalised!

Page 21:

Conclusions

● Because parameters can be learned, no enumeration is necessary, thus the architectures are much smaller

● For XOR:

● 8 times fewer weights needed
● Learning converges faster (50% of the original number of training cycles)
● No complex initialisation functions
● Positive and negative weights can be mixed
● But convergence deteriorates with a further reduction of the number of weights

Page 22:

Conclusions

● The technique has only been tested on a small problem; it should be tested on real-world applications

● But we are currently preparing a journal paper on a new backprop rule that:

● supports a multitude of coding hypotheses (population coding, convolution coding, ...)
● has better convergence
● has simpler weight initialisation
● ...