
Information Theory and Neural Coding
PhD Oral Examination, November 29, 2001

Albert E. Parker
Complex Biological Systems, Department of Mathematical Sciences
Center for Computational Biology
Montana State University

Collaborators: Tomas Gedeon, Alexander Dimitrov, John P. Miller, Zane Aldworth

Outline

• The Problem
• Our Approach
  o Build a Model: Probability and Information Theory
  o Use the Model: Optimization
• Results
• Bifurcation Theory
• Future Work

Why are we interested in neural coding?

• We are computationalists: All computations underlying an animal's behavioral decisions are carried out within the context of neural codes.

• Neural prosthetics: to enable a silicon device (artificial retina, cochlea, etc.) to interface with the human nervous system.

Neural Coding and Decoding

The Problem: Determine a coding scheme: How does neural activity represent information about environmental stimuli?

Demands:
• An animal needs to recognize the same object on repeated exposures. Coding has to be deterministic at this level.
• The code must deal with uncertainties introduced by the environment and neural architecture. Coding is by necessity stochastic at this finer scale.

Major Obstacle: The search for a coding scheme requires large amounts of data

How to determine a coding scheme?

Idea: Model a part of a neural system as a communication channel using Information Theory. This model enables us to:

• Meet the demands of a coding scheme:
  o Define a coding scheme as a relation between stimulus and neural response classes.
  o Construct a coding scheme that is stochastic on the finer scale yet almost deterministic on the classes.
• Deal with the major obstacle:
  o Use whatever quantity of data is available to construct coarse but optimally informative approximations of the coding scheme.
  o Refine the coding scheme as more data becomes available.

Probability Framework (coding scheme ~ encoder)

(1) Want to find: the encoder Q(Y|X).

   X (environmental stimuli) --Q(Y|X)--> Y (neural responses)

The coding scheme between X and Y is defined by the conditional probability Q.

Probability Framework (elements of the respective probability spaces)

(2) We have data: realizations of the r.v.'s X and Y, related by Q(Y=y|X=x).

• Stimulus X = x: environmental stimuli, discretized in time over 25 ms windows.
• Neural response Y = y: binary words in {0,1}^k, over 10 ms windows of discretized time.

We assume that X_n(w) = X(T^n w) and Y_n(w) = Y(T^n w) are stationary ergodic r.v.'s, where T is a time shift.
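As a concrete illustration (mine, not from the slides), a recording can be cut into such windowed realizations as follows; the 25 ms / 10 ms window lengths and the 1 ms spike binning are assumptions stated above:

    import numpy as np

    def windowed_realizations(stimulus, spike_times, dt=0.001,
                              stim_win=0.025, resp_win=0.010):
        """Cut a recording into paired realizations (x, y) of the r.v.'s X and Y."""
        resp_bins = int(resp_win / dt)        # 10 bins of 1 ms: y in {0,1}^10
        stim_bins = int(stim_win / dt)        # 25 samples of stimulus
        pairs = []
        for n in range(len(stimulus) // resp_bins):
            t0 = n * resp_win
            y = np.zeros(resp_bins, dtype=int)
            for t in spike_times:             # mark the 1 ms bins containing spikes
                if t0 <= t < t0 + resp_win:
                    y[int((t - t0) / dt)] = 1
            i1 = n * resp_bins                # stimulus window preceding the response
            x = stimulus[max(0, i1 - stim_bins) : i1]
            pairs.append((x, tuple(y)))
        return pairs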

[Figure: joint scatter of neural responses vs. environmental stimuli, with classes labeled 1–4.]

Overview of Our Approach

How to determine stimulus/response classes? Given a joint probability p(X,Y):

The Stimulus and Response Classes

[Figure: the joint probability p(X,Y) over environmental stimuli X and neural responses Y, partitioned into distinguishable stimulus/response classes 1–4.]

Information Theory: The Foundation of the Model

• A signal x is produced by a source (r.v.) X with probability p(X=x). A signal y is produced by another source Y with probability p(Y=y).

• A communication channel is a relation between two r.v.'s X and Y. It is described by the (finite) conditional probability, or quantizer, Q(Y|X).

• Entropy, the uncertainty or self-information of a r.v.:

   H(X) = E_X log(1/p(X))

• Conditional entropy:

   H(Y|X) = -E_{X,Y} log Q(Y|X)

• Mutual information, the amount of information that one r.v. contains about another r.v.:

   I(X;Y) = E_{X,Y} log[ p(X,Y) / (p(X)p(Y)) ] = H(X) + H(Y) - H(X,Y)
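To make the definitions concrete, here is a small Python sketch (mine, not from the slides) that computes H(X), H(Y|X), and I(X;Y) from a finite joint distribution:

    import numpy as np

    def information_quantities(p_xy):
        """p_xy[i, j] = p(X=i, Y=j); entries sum to 1."""
        p_x = p_xy.sum(axis=1)
        p_y = p_xy.sum(axis=0)
        H = lambda p: -np.sum(p[p > 0] * np.log2(p[p > 0]))   # entropy in bits
        H_x, H_y, H_xy = H(p_x), H(p_y), H(p_xy)
        H_y_given_x = H_xy - H_x                 # H(Y|X) = H(X,Y) - H(X)
        I_xy = H_x + H_y - H_xy                  # I(X;Y) = H(X) + H(Y) - H(X,Y)
        return H_x, H_y_given_x, I_xy

    # e.g. a noisy channel: information_quantities(np.array([[.4, .1], [.1, .4]]))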

Why Information Theory?

The entropy and mutual information of the data asymptotically approach the true population entropy and mutual information, respectively.

Shannon-McMillan-Breiman Theorem (i.i.d. case)
If {X_i}_{i=1}^n are i.i.d., then

   lim_{n→∞} -(1/n) log p(X_0, X_1, ..., X_{n-1}) = H(X)  a.s.

Proof: The random variables Y_i = log p(X_i) are i.i.d., so the theorem follows from the Strong Law of Large Numbers. □

This result also holds if {X_i} is a stationary ergodic sequence. This is important for us, since our data are not i.i.d., but we do assume that X and Y are stationary ergodic.
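A quick numerical check of the theorem (my illustration): for an i.i.d. source, the sample quantity -(1/n) log p(X_0, ..., X_{n-1}) approaches H(X) as n grows.

    import numpy as np

    rng = np.random.default_rng(0)
    p = np.array([0.7, 0.2, 0.1])            # a 3-symbol i.i.d. source
    H = -np.sum(p * np.log2(p))              # true entropy, about 1.157 bits

    for n in (100, 10_000, 1_000_000):
        x = rng.choice(3, size=n, p=p)       # realizations X_0, ..., X_{n-1}
        estimate = -np.mean(np.log2(p[x]))   # -(1/n) log p(X_0, ..., X_{n-1})
        print(n, estimate, "vs", H)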

Conceptual Model

Major Obstacle: To determine a coding scheme, Q, between X and Y requires large amounts of data.

Idea: Determine the coding scheme, Q*, between X and YN, a quantization of Y, such that YN preserves as much mutual information with X as possible:

   X (environmental stimuli) --Q(Y|X)--> Y (neural responses) --q*(YN|Y)--> YN (quantized neural responses)

The composition of the two maps is Q*(YN|X).

New Goal: Find the quantizer q*(YN|Y) that maximizes I(X,YN).

Mathematical Model

The space of quantizers is the feasible region

   Δ := { q(yN|y) : Σ_{yN} q(yN|y) = 1 and q(yN|y) ≥ 0 for every y in Y },

which ensures that q*(YN|Y) is a true conditional probability.

We search for the maximizer q*(YN|Y) which satisfies

   max_q H(YN|Y) constrained by I(X,YN) = I_max, where I_max := max_q I(X,YN).

Δ is a product of simplices, one simplex for each y in Y (each simplex is a discrete probability space).

[Figure: Δ drawn as a product of simplices over y1, y2, y3.]

We begin our search for the maximizer q*(YN|Y) by solving:

① q* = argmax_q I(X,YN).

② If there are multiple solutions to ①, then, by Jaynes' maximum entropy principle, we take the one that maximizes the entropy:

   max_q H(YN|Y) constrained by I(X,YN) = I_max.

③ In order to solve ②, use the method of Lagrange multipliers to get

   max_q H(YN|Y) + β I(X,YN).

④ Annealing: In practice, we increment β in small steps from 0 toward ∞. For each β, we solve

   q*_β = argmax_q H(YN|Y) + β I(X,YN).

Note that lim_{β→∞} q*_β = q* from ②.
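A skeleton of the annealing loop in step ④ might look like the following sketch (mine; `inner_solver` stands for whichever fixed-β maximizer is used, e.g. the implicit fixed point shown later):

    import numpy as np

    def anneal(inner_solver, n_classes, n_y, beta_max=50.0, d_beta=0.1):
        """Deterministic annealing for max_q H(YN|Y) + beta * I(X,YN).

        inner_solver(q, beta) -> q maximizing the objective at this beta,
        warm-started from the previous solution.
        """
        q = np.full((n_classes, n_y), 1.0 / n_classes)   # start uniform: q*_0
        beta = 0.0
        while beta < beta_max:
            beta += d_beta                               # small increment in beta
            q = inner_solver(q, beta)                    # re-solve, warm-started
        return q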

Justification: some nice properties of the model

The information functions are nice.

Theorem 1: H(YN|Y) is concave and I(X,YN) is convex as functions of q.

Δ is really nice.

Lemma 2: Δ is the convex hull of vertices(Δ).

We can reformulate ② as two different optimization problems.

Theorem 3: An equivalent problem to ② is to solve

   q*(YN|Y) = argmax_{q in vertices(Δ)} I(X,YN).

Proof: This result follows from Theorem 1 and Lemma 2. □

Corollary 4: The extrema of ② lie on the vertices of Δ.

Theorem 5: If q*_M is the maximizer of

   max_{q in Δ_M} H(YN|Y) constrained by I(X,YN) = I_max,

where Δ_M := { q(yN|y) : Σ_{yN} q(yN|y) = M and q(yN|y) ≥ 0 }, then q* = (1/M) q*_M.

Proof: By Theorem 3 and the fact that Δ_M is the convex hull of vertices(Δ_M). □

Annealing:

   q*_β = argmax_q H(YN|Y) + β I(X,YN)

• Augmented Lagrangian Method.

• Implicit Solution: set the gradient of the Lagrangian to zero and solve implicitly for q, which yields the fixed-point iteration

   q_{n+1}(yN|y) = exp( (β/p(y)) ∂I/∂q(yN|y) ) / Σ_{yN} exp( (β/p(y)) ∂I/∂q(yN|y) ).

  Drawback: the current parameter choice is ad hoc.

• Vertex Search Algorithm:

   max_{q in vertices(Δ)} I(X,YN).

  Drawback: |vertices(Δ)| = N^|Y|.
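A Python sketch of the implicit fixed-point update (my reconstruction of the formula above, using ∂I/∂q(yN|y) = Σ_x p(x,y) log p(x|yN), which follows from differentiating I(X,YN) with p(x,yN) = Σ_y p(x,y) q(yN|y)):

    import numpy as np

    def implicit_step(q, p_xy, beta, eps=1e-12):
        """One update q -> q' for max_q H(YN|Y) + beta * I(X,YN).

        q    : (N, |Y|) array with q[k, j] = q(yN=k | y=j)
        p_xy : (|X|, |Y|) array, the joint distribution p(x, y)
        """
        p_y = p_xy.sum(axis=0)                     # p(y)
        p_x_yN = p_xy @ q.T                        # p(x, yN) = sum_y p(x,y) q(yN|y)
        p_yN = p_x_yN.sum(axis=0)                  # p(yN)
        log_x_given_yN = np.log(p_x_yN + eps) - np.log(p_yN + eps)
        dI = log_x_given_yN.T @ p_xy               # dI/dq(yN|y), shape (N, |Y|)
        expo = beta * dI / p_y                     # (beta / p(y)) * dI/dq
        expo -= expo.max(axis=0)                   # stabilize the exponentials
        q_new = np.exp(expo)
        return q_new / q_new.sum(axis=0)           # normalize over yN

Iterating implicit_step to convergence at each β, warm-started from the previous β, is one way to realize the inner solver in the annealing skeleton above.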

Optimization Schemes

Goal: Build a sequence {q_n}_{n≥1} → q*.

[Figure: schematic of the objective (H + βI)(q(yN|y)) over the product of simplices Δ, with iterates q_1, q_2, q_3 approaching the maximizer as β_k increases.]

Results: Application to synthetic data (p(X,Y) is known)

Four Blob Problem

   Algorithm          | Cost in MFLOPS, N = 2, 3, 4 | I(X,YN) in bits, N = 2, 3, 4
   Lagrangian         | 431    822    1220          | .8272   1.2925   1.6269
   Implicit Solution  |  38    106     124          | .8280   1.2942   1.6291
   Vertex Search      |   6     18      21          | .8280   1.2942   1.6291

[Figures: optimal quantizers q*(YN|Y) for N = 2, 3, 4, 5, and I(X,YN) vs. N.]

Modeling the cricket cercal sensory system as a communication channel

[Figure: a wind signal entering the cricket nervous system, drawn as a communication channel.]

Why the cricket?

• The structure and details of the cricket cercal sensory system are well known.

• All of the neural activity that encodes the wind stimuli (about 20 neurons) can be measured.

• Other sensory systems (e.g. mammalian visual cortex) consist of millions of neurons, which are impossible (today) to measure in totality.

Preparation: the cricket cercal sensory system.

[Figure: the cerci, a sensory afferent, and a sensory interneuron of the cricket cercal system.]

Wind stimulus and neural response in the cricket cercal system

[Figures: neural responses (over a 30 minute recording) caused by a white noise wind stimulus; neural responses (all doublets) for a 10 ms window; some of the air current stimuli preceding one of the neural responses. Time T is in ms; at T = 0, the first spike occurs.]

Quantization for real data:

A quantizer is any map f: Y → YN from Y to a set YN with finitely many elements. Quantizers can be deterministic or probabilistic, and can be refined as more data become available.
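For instance (my illustration), with |Y| = 4 responses and N = 2 classes, a quantizer is a column-stochastic matrix q[yN, y]; a deterministic quantizer is the special case with a single 1 in each column:

    import numpy as np

    # deterministic quantizer: each response y is assigned to exactly one class
    q_det = np.array([[1, 1, 0, 0],
                      [0, 0, 1, 1]], dtype=float)

    # probabilistic quantizer: responses may be split softly between classes
    q_prob = np.array([[0.9, 0.8, 0.3, 0.1],
                       [0.1, 0.2, 0.7, 0.9]])

    # each column is a conditional probability q(.|y)
    assert np.allclose(q_det.sum(axis=0), 1) and np.allclose(q_prob.sum(axis=0), 1)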

Application to real data (p(X,Y) is NOT known)

Idea: maximize a lower bound of I(X,YN).

• p(X,Y) cannot be estimated directly for rich stimulus sets; there is not enough data. Use the data to estimate a maximum entropy model.

• I(X,YN) = H(X) - H(X|YN). Only H(X|YN) depends on q(yN|y), so an upper bound on H(X|YN) produces a lower bound on I(X,YN).

• H(X|YN) = E_{yN} H(X|YN = yN) is bounded above by the entropy of a Gaussian:

   H(X|YN) ≤ HG(X|YN) := E_{yN} (1/2) log( (2πe)^n det C_{X|yN} ),

  where n is the stimulus dimension and C_{X|yN} is the covariance of the stimuli in class yN.

• We estimate the class-conditional means and covariances C_{X|yN} from the data.

• Over all yN, we have a Gaussian mixture model.
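A sketch of the Gaussian bound computation (my illustration; stim_by_class is assumed to collect, for each class yN, an (m × n) array of the stimulus vectors assigned to it):

    import numpy as np

    def gaussian_bound(stim_by_class):
        """H_G(X|YN) = sum_yN p(yN) * (1/2) log2( (2*pi*e)^n det C_{X|yN} )."""
        total = sum(len(s) for s in stim_by_class)
        H_G = 0.0
        for stims in stim_by_class:
            p_yN = len(stims) / total                # empirical class weight
            C = np.cov(stims, rowvar=False)          # class covariance C_{X|yN}
            _, logdet = np.linalg.slogdet(2 * np.pi * np.e * C)
            H_G += p_yN * 0.5 * logdet / np.log(2)   # in bits
        return H_G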

Optimization problem for real data:

   max_q H(YN|Y) constrained by H(X) - HG(X|YN) = I_max.

The equivalent annealing problem (H(X) does not depend on q, so it drops out):

   max_q H(YN|Y) - β HG(X|YN).

   Algorithm          | Cost in GFLOPS, N = 3, 4, 5 | I(X,YN) in bits, N = 3, 4, 5
   Implicit Solution  |  7     11     10            | .43   .80   1.14
   Vertex Search      | 31     84    141            | .44   .85   1.81

Applying the algorithm to cricket sensory data.

[Figure: optimal quantizers of the cricket responses Y into YN, for N = 2 (Classes 1–2) and N = 3 (Classes 1–3).]

Investigating the Bifurcation Structure

Goal: To efficiently solve q*_β = argmax_q H(YN|Y) + β I(X,YN) for each β as β → ∞.

Idea: Choose β wisely.

Method: Study the equilibria of the flow

   q̇ = g(q, β) := ∇_q ( H(YN|Y) + β I(X,YN) ),

which are precisely the maximizers q*_β. The first equilibrium, at β = 0, is the uniform quantizer q* = 1/N.

• Search for bifurcations of the equilibria.
• Use numerical continuation to choose β.

Conjecture: There are only pitchfork bifurcations.

Bifurcations of q*_β

[Figures: observed bifurcations of q*(YN|Y) vs. β for the 4 Blob Problem, and the conceptual bifurcation structure, both starting from the uniform branch q* = 1/N.]

Other Applications

Problems of the form

   x* = argmax H(Y|X) + βD

are common in many fields:

• Clustering
• Compression and communications (GLA)
• Pattern recognition (ISODATA, K-means)
• Regression

Future Work

• Bifurcation structure
  o Capitalize on the symmetries of q* (Singularity and Group Theory)
• Annealing algorithm improvement
  o Perform the optimization only at the values of β where bifurcations occur
  o Use numerical continuation to choose β
  o The Implicit Solution method q_{n+1} = f(q_n, β) converges reliably and quickly. Why? Investigate the superattracting directions.
• Perform the optimization over Δ_M, a product of scaled simplices (cf. Theorem 5)
• Joint quantization
  o Quantize X and Y simultaneously
• Better maximum entropy models for real data
• Compare our method to others