
Page 1:

Extending SpikeProp

Benjamin Schrauwen, Jan Van Campenhout

Ghent University, Belgium

IJCNN, July 27, 2004

Page 2:

Overview

● Introduction
● SpikeProp
● Improvements
● Results
● Conclusions

Page 3:

Introduction

● Spiking neural networks are receiving increased attention:

● Biologically more plausible
● Computationally stronger (W. Maass)
● Compact and fast implementation possible in hardware (analogue and digital)
● Have a temporal nature

● Main problem: supervised learning algorithms

Page 4:

SpikeProp

● Introduced by S. Bohte et al. in 2000
● An error-backpropagation learning algorithm
● Only for SNNs using “time-to-first-spike” coding

[Figure: time-to-first-spike coding, spike time t ~ 1/a: larger analogue values fire earlier]
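To make this coding concrete, here is a minimal Python sketch (not from the original slides; the exact mapping and the 16 ms ceiling are assumptions, only the ~1/a relation is suggested by the figure):

def time_to_first_spike(a, t_max=16.0, eps=1e-6):
    # Time-to-first-spike coding: a stronger analogue input a produces an
    # earlier (single) spike; spike time roughly ~ 1/a, clipped at t_max ms.
    return min(t_max, 1.0 / (abs(a) + eps))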

Page 5:

Architecture of SpikeProp

● Architecture originally introduced by Natschläger and Ruf
● Every connection consists of several synaptic terminals
● All 16 synaptic terminals have enumerated delays (1-16 ms) and different weights, but originally the same filter
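As a sketch of this architecture (not the authors' code; the uniform random weights are placeholders), one connection can be represented as a list of synaptic terminals, each pairing an enumerated delay of 1 to 16 ms with its own weight:

import random

def make_connection(n_terminals=16):
    # One connection = several synaptic terminals: (delay in ms, weight) pairs
    # with enumerated delays 1..16 ms, each carrying a different weight.
    return [(float(k), random.uniform(0.0, 1.0)) for k in range(1, n_terminals + 1)]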

Page 6:

SRM neuron

● Modified Spike Response Model (Gerstner)


Neuron reset is of no interest because only one spike is needed!
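A minimal sketch of this single-spike SRM variant is given below; it assumes the PSP kernel ε(t) = (t/τ)·e^(1 - t/τ) commonly used with SpikeProp and an illustrative time constant, both of which are assumptions on top of the slide:

import math

def epsilon(t, tau=7.0):
    # Spike response (PSP) kernel: zero before the (delayed) spike arrives,
    # then a rise-and-decay shape governed by the time constant tau.
    return (t / tau) * math.exp(1.0 - t / tau) if t > 0.0 else 0.0

def membrane_potential(t, inputs):
    # inputs: list of (weight, presynaptic spike time, delay) per synaptic terminal.
    # No reset term is modelled: with time-to-first-spike coding each neuron
    # only ever has to fire once.
    return sum(w * epsilon(t - t_i - d) for (w, t_i, d) in inputs)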

Page 7:

Idea behind SpikeProp

Minimise the sum-squared error (SSE) between the actual and desired output spike times

Change the weights along the negative direction of the error gradient
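In symbols (a reconstruction in the spirit of Bohte et al., since the slide's own equations are not in this transcript), with actual and desired output spike times t_j^a and t_j^d:

E = \frac{1}{2} \sum_{j \in \mathrm{outputs}} \left( t_j^a - t_j^d \right)^2, \qquad \Delta w_{ij}^k = -\eta \, \frac{\partial E}{\partial w_{ij}^k}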

Page 8:

Math of SpikeProp

Only the output-layer rule is shown here

Linearise around the threshold-crossing time
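A hedged reconstruction of that output-layer rule (the slide's equations are missing from this transcript; this follows Bohte et al.'s formulation, with y_i^k(t) the unweighted PSP of synaptic terminal k from neuron i). Linearising the membrane potential x_j(t) around the threshold-crossing time gives \partial t_j / \partial x_j(t_j^a) \approx -1 / \bigl( \partial x_j(t) / \partial t \bigr) \big|_{t_j^a}, so that

\delta_j = \frac{t_j^d - t_j^a}{\sum_i \sum_l w_{ij}^l \, \partial y_i^l(t_j^a) / \partial t}, \qquad \Delta w_{ij}^k = -\eta \, y_i^k(t_j^a) \, \delta_j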

Page 9:

Problems with SpikeProp

● Overdetermined architecture
● Tendency to get stuck when a neuron stops firing
● Problems with weight initialisation

Page 10:

Solving some of the problems

● Instead of enumerating parameters, learn them:

● Delays
● Synaptic time constants
● Thresholds

● We can then use a much more limited architecture
● Add a specific mechanism to keep neurons firing: decrease the threshold

Page 11:

Learn more parameters

● Quite similar to the weight update rule
● Gradient of the error with respect to the parameter
● Parameter-specific learning rate
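In general form (reconstructed here, as the slide's formulas are not in the transcript), every extra parameter p in {d_{ij}^k, \tau_{ij}^k, \theta_j} is trained by gradient descent with its own learning rate \eta_p:

\Delta p = -\eta_p \, \frac{\partial E}{\partial p}, \qquad p \in \{ d_{ij}^k, \ \tau_{ij}^k, \ \theta_j \}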

Page 12:

Math of the improvements - delays

The delta is the same as for the weight rule, thus there is a different delta formula for the output layer than for the inner layers.
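A hedged sketch of the structure of the delay rule (the slide's own equations did not survive the transcript): with y_i^k(t) = \varepsilon(t - t_i - d_{ij}^k) the unweighted PSP, the same \delta_j as in the weight rule reappears and only the leading factor changes,

\Delta d_{ij}^k = -\eta_d \, \delta_j \, w_{ij}^k \, \frac{\partial y_i^k(t_j^a)}{\partial d_{ij}^k} = \eta_d \, \delta_j \, w_{ij}^k \, \varepsilon'\!\left( t_j^a - t_i - d_{ij}^k \right)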

Page 13:

Math of the improvements - synaptic time constants

Page 14:

Math of the improvements - thresholds

Page 15:

What if training gets stuck?

● If one of the neurons in the network stops firing, the training rule stops working

● Solution: actively lower the threshold of a neuron whenever it stops firing (multiply it by 0.9)

● Equivalent to scaling all the incoming weights up
● Improves convergence
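A minimal Python sketch of this rescue mechanism (the 0.9 factor is from the slide; the data structures are assumptions):

SHRINK = 0.9  # multiplicative factor from the slide

def rescue_silent_neurons(thresholds, fired):
    # thresholds: per-neuron firing thresholds (mutable list)
    # fired: per-neuron booleans from the last forward pass
    for j, did_fire in enumerate(fired):
        if not did_fire:
            # Lowering the threshold is equivalent to scaling all incoming
            # weights up, so the neuron fires again and learning can resume.
            thresholds[j] *= SHRINK
    return thresholds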

Page 16:

What about weight initialisation

● Weight initialisation is a difficult problem
● The original publication gives only a vague description of the process
● S. M. Moore contacted S. Bohte personally to clarify the subject for his master's thesis
● Weight initialisation is done by a complex procedure
● Moore concluded that “weights should be initialized in such a way that every neuron initially fires, and that its membrane potential doesn’t surpass the threshold too much”

Page 17:

What about weight initialisation

● In this publication we chose a very simple initialisation procedure:

● Initialise all weights randomly
● Afterwards, set one weight such that the sum of all weights equals 1.5

● Convergence rates could be increased by using a more complex initialisation procedure
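A sketch of this simple recipe (only the "sum equals 1.5" step is from the slide; the uniform range of the random draw is an assumption):

import random

TARGET_SUM = 1.5  # total weight sum prescribed on the slide

def init_weights(n_synapses):
    # Draw random weights, then adjust the last one so that the weights
    # of all incoming synapses sum to the target value.
    w = [random.uniform(0.0, 1.0) for _ in range(n_synapses)]
    w[-1] = TARGET_SUM - sum(w[:-1])
    return w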

Page 18:

Problem with large delays

• During testing of the algorithm, a problem arose when the trained delays became very large: delay learning stopped

• If the (delayed) input spike is preceded by the neuron's output spike, i.e. it arrives only after the neuron has already fired, there is a problem
• Solved by constraining the delays

[Figure: the neuron's output spike precedes the delayed input spike]
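One way to realise such a constraint (a sketch; the exact bounds are assumptions, the slides do not give them):

def constrain_delay(d, d_min=1.0, d_max=16.0):
    # Clip a learned delay to a fixed range so the delayed input spike
    # cannot drift past the point where it no longer influences the
    # neuron's output spike.
    return max(d_min, min(d, d_max))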

Page 19:

Results

● Tested for binary XOR (MSE = 1 ms)

• Bohte:
• 3-5-1 architecture
• 16 synaptic terminals
• 20*16 = 320 weights
• 250 training cycles

• Improvements:
• 3-5-1 architecture
• 2 synaptic terminals
• 20*2 = 40 weights
• 130 training cycles
• 90% convergence

• 3-3-1 architecture
• 2 synaptic terminals
• 12*2 = 24 weights
• 320 training cycles
• 60% convergence

Page 20:

Results

● Optimal learning rates (found by experiment):

● Some rates seem very high, but that is because the values we work with are times expressed in ms

● The idea that the learning rate must be approximately 0.1 is only correct when inputs and weights are normalised!

Page 21:

Conclusions

● Because parameters can be learned, no enumeration is necessary, thus the architectures are much smaller

● For XOR:

● 8 times fewer weights needed
● Learning converges faster (50% of the original number of training cycles)
● No complex initialisation functions
● Positive and negative weights can be mixed
● But convergence deteriorates with a further reduction of the number of weights

Page 22:

Conclusions

● The technique has only been tested on a small problem; it should be tested on real-world applications

● But we are currently preparing a journal paper on a new backprop rule that:

● supports a multitude of coding hypotheses (population coding, convolution coding, ...)
● has better convergence
● has simpler weight initialisation
● ...