
Brain-like software architecture

Confessions of an ex-neuroscientist

Bill Softky

Which comes first: the problem or the solution?

• Reverse engineering starts with hardware, works backward

• Usually only succeeds if problem is understood

• “Forward” software engineering starts with the problem, and saves hardware for last

“Forward” software engineering

Question | UPS example | e*Trade example
1. What is the “truth” out there? | Package en route | John Doe, investor
2. How does input data approximate the truth? | Tracking #, destination | Customer #, portfolio owned, trades made
3. What do we want to do with the data? | Deliver to next station; display web tracking page | Log in, display net worth, trade stocks
4. What architecture? | Client-server, relational DB | Distributed servers, separate web front-end
5. What implementation? | Oracle, C++, cgi-script web, Solaris machines | MySQL, Java server pages, Linux/Intel clusters

“Reverse” engineering

Question | Neocortex vision | Neocortex audio
? What is the “truth” out there? | Moving objects | Speech
? How does input data approximate the truth? | Retinal “pixels”: contours, color, correlation, disparity | Sound pressure waveforms: frequencies, stereo, echoes
? What do we want to do with the data? | Find what and where an object is | Figure out who/where is talking, what they're saying, what they mean
1-2. What architecture? | Cortical columns, attractors, spikes, associative memory
1-2. What implementation? | Hebbian synapses, integrate-and-fire, shunting inhibition

From an engineering perspective, this is nuts!

Initial goals here

• Input: we need a generic description of sensory input (at least audio & visual)

• Processing: speculate on a generic, modular processing “API” that can untangle those correlations

• No neurons, synapses, spikes…yet.

Simple “truth” → tangled inputs

Hypothesis: each entangling transformation is fairly simple

Stepwise decorrelation → untangled truth

Hypothesis: a sequence of similar compressions will yield a useful representation

First toy problem: cocktail party with echoes

• Multiple independent speakers
• Multiple “ears” (mics)
• Multiple echoes/amplitudes for each speaker/mic combo
• Echo patterns constrained (3-D) and unchanging

Try to remove the echoes and separate the speakers (our brains can do this...)
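As a concrete picture of this toy problem, here is a minimal simulation sketch (all names and parameters are illustrative, not from the talk): each mic signal is the sum of the speakers' signals convolved with fixed, speaker/mic-specific echo kernels.

```python
# Minimal sketch of the cocktail-party toy problem (names illustrative):
# each mic hears every speaker convolved with a fixed, speaker/mic-specific
# echo kernel, and the kernels encode the static 3-D geometry.
import numpy as np

rng = np.random.default_rng(0)

n_speakers, n_mics = 2, 3
n_samples, kernel_len = 10_000, 64          # e.g. 1 s at 10 kHz

# "Pure" signals: independent speakers.
speakers = rng.standard_normal((n_speakers, n_samples))

# Static echo kernels: a few delayed, attenuated copies per speaker/mic pair.
kernels = np.zeros((n_speakers, n_mics, kernel_len))
for s in range(n_speakers):
    for m in range(n_mics):
        delays = rng.integers(0, kernel_len, size=3)
        kernels[s, m, delays] = rng.uniform(0.2, 1.0, size=3)

# "Entangled" signals: each mic sums all speakers passed through its kernels.
mics = np.zeros((n_mics, n_samples))
for m in range(n_mics):
    for s in range(n_speakers):
        mics[m] += np.convolve(speakers[s], kernels[s, m])[:n_samples]
```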

Echo kernels = location info

[Figure: two pure speaker signals (S1, S2) are convolved with per-pair echo kernels / transfer functions (“maps”) and summed into three entangled mic signals (M-a, M-b, M-c). Pure signals: 2 × 10 kHz; echo kernels: static, set by (x,y,z) geometry; entangled signals: 3 × 10 kHz.]

Second toy problem: video

• Moving “objects” (simple shapes)
• Constant velocity
• Spatiotemporal pixel pattern is just echoes from t=0 at the center
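A similarly minimal sketch of the video toy problem, with illustrative names and sizes: a single dot moving at constant velocity, so each pixel's time course is a fixed-delay “echo” of the object's time at the center.

```python
# Sketch of the video toy problem: a single "object" (a bright dot) moving at
# constant velocity across a pixel grid.  Each pixel's time course is a
# delayed "echo" of the moment the object passed the center (names illustrative).
import numpy as np

grid, n_frames = 5, 20
vx, vy = 1.0, 0.5                       # constant velocity, pixels per frame
x0, y0 = 0.0, 0.0                       # position at t = 0 (the "center" event)

frames = np.zeros((n_frames, grid, grid))
for t in range(n_frames):
    x, y = x0 + vx * t, y0 + vy * t     # object position at frame t
    ix, iy = int(round(x)) % grid, int(round(y)) % grid
    frames[t, iy, ix] = 1.0             # this pixel "echoes" the object now

# Each pixel fires at a delay fixed by geometry and velocity, so the
# spatiotemporal pattern is an echo kernel parameterized by {v, ...}.
pixel_traces = frames.reshape(n_frames, -1).T   # shape: (pixels, time)
```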

Echo kernels = location/shape/velocity

[Figure: one pure “time at center” signal is convolved with spatiotemporal pixel responses (echo kernels at pixel offsets (0,0), (0,1), ..., (4,4)) and summed into the entangled pixel signals. Pure signal: 1 kHz; echo kernels: {v, ...}, semi-static; entangled signals: 100 × 1 kHz.]

Generic entanglement

• Very few independent pure signals to track
• Echo kernels in a low-dimensional subspace give persistent structure
• Many entangled, correlated, high-bandwidth signals as inputs

Recap: echo-entanglement as a generic perceptual problem

• Very similar to early vision

• Just like audio echo-removal

• Structured “echoes” carry near-static info

• Associative memory and vector quantization are special cases

How to dis-entangle?

• Want to reveal original signals and structures

• Problem is hard (unsolved!). So…
– Skip the mere algorithms
– Skip the neurons and biology
– Focus on a module’s inputs & outputs
– Try to make modules work together

What would one disentangling module do?

• Note separate timescales:
– Many channels of high-BW input
– 1-3 independent channels of medium-BW output (time blurring)
– Many channels of near-static output & input
• Learn correlations (echoes) in the input
• Find a low-dim subspace for the echoes (e.g. {x,y,z}, or {v, ...})
• Reconstruct inputs all at once (batch)
• Minimize reconstruction error

(Assume typically at most one pure signal during learning.)
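One way to read the input/output description above is as an interface. The following sketch is a hypothetical rendering of that “API” in Python; the class, method, and field names are assumptions, not something specified in the talk.

```python
# A hypothetical interface for one disentangling module, following the slide's
# input/output description (class and method names are assumptions).
from dataclasses import dataclass
import numpy as np


@dataclass
class ModuleOutput:
    pure: np.ndarray      # (channels, time): 1-3 medium-BW, time-blurred signals
    kernel: np.ndarray    # near-static echo-kernel / subspace coords, e.g. (x, y, z)
    sigma: np.ndarray     # confidence estimate per output channel


class DisentanglingModule:
    """Many high-bandwidth inputs in, few slow 'pure' signals + static structure out."""

    def learn(self, batch: np.ndarray) -> None:
        """Learn input correlations (echoes); assume at most one pure signal at a time."""
        ...

    def disentangle(self, batch: np.ndarray) -> ModuleOutput:
        """Batch-reconstruct the inputs and return pure signals, kernels, and sigmas."""
        ...

    def reconstruction_error(self, batch: np.ndarray, out: ModuleOutput) -> float:
        """Error between the actual inputs and the inputs re-entangled from `out`."""
        ...
```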

Basic disentangling module

[Figure: float inputs (e.g. the “mics”) over a fine time window, T = -500 to +100 around “now”, feed a decorrelation & vector-quantization stage; a reconstruction & prediction stage produces the float outputs: a pure signal over a coarse window, T = -500 to +100 around “now”, plus the near-static (x,y,z). Example shown: cocktail-party decorrelation.]
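To make the reconstruction-and-prediction box concrete, here is one naive baseline, not the talk's proposal: if a single speaker's echo kernels were already known, the pure signal could be estimated by per-frequency least squares, and the estimate scored by re-entangling it and measuring reconstruction error.

```python
# One naive way to fill in the "reconstruction & prediction" box for a single
# speaker with *known* echo kernels: least-squares deconvolution in the
# frequency domain, scored by reconstruction error.  This is only an
# illustrative baseline, not the algorithm proposed here.
import numpy as np

def estimate_pure_signal(mics, kernels, n_fft):
    """mics: (n_mics, T), kernels: (n_mics, L) for one speaker; n_fft >= T."""
    M = np.fft.rfft(mics, n_fft, axis=1)          # spectra of entangled signals
    K = np.fft.rfft(kernels, n_fft, axis=1)       # spectra of echo kernels
    # Per-frequency least squares: s(f) = sum_m conj(K_m) M_m / sum_m |K_m|^2
    num = np.sum(np.conj(K) * M, axis=0)
    den = np.sum(np.abs(K) ** 2, axis=0) + 1e-9
    return np.fft.irfft(num / den, n_fft)

def reconstruction_error(mics, kernels, s_hat):
    """Re-entangle the estimate and compare with the actual mic signals."""
    T = mics.shape[1]
    recon = np.stack([np.convolve(s_hat, k)[:T] for k in kernels])
    return float(np.mean((recon - mics) ** 2))
```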

Add multiple, independent outputs

• Multiple speakers/objects → multiple outputs

• Each output represents one object (max 3)

• Output streams and mappings are independent

• An even harder disentangling task

• (complications too!....)

Module with multiple outputs

[Figure: one module with three independent output streams, mapping Speaker 1 → (x1,y1,z1), Speaker 2 → (x2,y2,z2), Speaker 3 → (x3,y3,z3).]

Add confidence estimates (sigmas)

• Disentangling is already a statistical-estimation task

• Confidence estimates come for free during reconstruction

• Propagate inputs’ sigmas forward

• Create output sigmas based on input sigmas and reconstruction error
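A minimal sketch of that sigma bookkeeping, assuming a simple add-the-variances rule (the actual combination rule is left open here): propagate the inputs' sigmas forward and widen them by the module's own reconstruction error.

```python
# Sketch of the sigma bookkeeping described above: propagate the inputs'
# uncertainties forward and widen them by the module's own reconstruction
# error.  The particular combination rule (adding variances) is an assumption.
import numpy as np

def output_sigma(input_sigmas, reconstruction_rmse):
    """input_sigmas: per-input-channel confidences; returns one output sigma."""
    propagated_var = np.mean(np.asarray(input_sigmas) ** 2)   # carried forward
    return float(np.sqrt(propagated_var + reconstruction_rmse ** 2))
```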

Module with sigmas

Add layers

• Pure signal outputs become inputs to the next layer
• Many modules below feed each module above
• Maybe, modules below can feed more than one above
• Whole upper layer uses a longer and coarser timescale
• Stackable indefinitely
• Top layers have huge input range, long memory, broad abstractions
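A sketch of the stacking rule, reusing the hypothetical module API from above: an upper module takes the pure-signal outputs of several lower modules as its input channels, decimated so that it runs on a longer, coarser timescale.

```python
# Sketch of the stacking rule: an upper-layer module treats the pure-signal
# outputs of several lower modules as its own (coarser, slower) input channels.
# Module and field names reuse the hypothetical API sketched earlier.
import numpy as np

def run_layer(upper_module, lower_outputs, decimate=5):
    """lower_outputs: list of ModuleOutput from the layer below."""
    # Stack the lower modules' pure-signal outputs into one multi-channel input...
    stacked = np.concatenate([out.pure for out in lower_outputs], axis=0)
    # ...and blur/decimate in time so the upper layer sees a coarser timescale.
    coarse = stacked[:, ::decimate]
    return upper_module.disentangle(coarse)
```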

Modules in layers

[Figure: stacked layers of modules; lower modules use T = -500 to +100, the upper layer uses the longer, coarser window T = -1000 to +200.]

Add feedback

• Upper layer reconstructions provide estimates to lower modules (might help, can’t hurt)

• Near-static channels provide cheap “prediction” of input interrelations

• Update all estimates frequently

• Predicted pure signals could help reconstruction below
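A sketch of one way the feedback could be used, assuming inverse-variance weighting (an assumption, in the “might help, can't hurt” spirit): fuse a module's own estimate with the top-down prediction from the layer above, weighting each by its confidence.

```python
# Sketch of the feedback step: fuse a module's own estimate with the
# prediction fed back from the layer above, weighting by confidence.
# Inverse-variance weighting is an assumption, not specified here.
import numpy as np

def fuse_with_feedback(local_est, local_sigma, top_down_est, top_down_sigma):
    w_local = 1.0 / (local_sigma ** 2 + 1e-9)
    w_top = 1.0 / (top_down_sigma ** 2 + 1e-9)
    fused = (w_local * local_est + w_top * top_down_est) / (w_local + w_top)
    fused_sigma = np.sqrt(1.0 / (w_local + w_top))
    return fused, fused_sigma
```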

Feedback between modules

Open problems

• How to do the decompression?
– Iterative? Monte Carlo? Low-dim subspace?
• Multiple objects/pure signals:
– Deciding how many objects from a module
– “Binding” problem across modules
– Which goes with which?
– Layers 2-N need “clones,” one clone per extra object

Summary: generic sensory model

• Assume inputs result from cascading a simple entangling transformation

• Entangling transformation is cocktail-party with echoes


Summary: stackable disentangling modules

• Assume one layer of disentangling can be learned and done somehow

• Separate time-series from static echo-kernel structure
• Disentangle time-series in batches
• Use reconstructions for error-checking and feedback
• Propose an “API” by which such modules can interact to solve multi-scale, multi-sensory problems
