Information Theory and Neural Coding
PhD Oral Examination, November 29, 2001

Albert E. Parker
Complex Biological Systems, Department of Mathematical Sciences
Center for Computational Biology
Montana State University

Collaborators: Tomas Gedeon, Alexander Dimitrov, John P. Miller, Zane Aldworth
Outline

• The Problem
• Our Approach
  o Build a Model: Probability and Information Theory
  o Use the Model: Optimization
• Results
• Bifurcation Theory
• Future Work
Why are we interested in neural coding?
• We are computationalists: All computations underlying an animal's behavioral decisions are carried out within the context of neural codes.
• Neural prosthetics: to enable a silicon device (artificial retina, cochlea, etc.) to interface with the human nervous system.
Neural Coding and Decoding
The Problem: Determine a coding scheme: How does neural activity represent information about environmental stimuli?
Demands:
• An animal needs to recognize the same object on repeated exposures. Coding has to be deterministic at this level.
• The code must deal with uncertainties introduced by the environment and neural architecture. Coding is by necessity stochastic at this finer scale.

Major Obstacle: The search for a coding scheme requires large amounts of data.
How to determine a coding scheme?
Idea: Model a part of a neural system as a communication channel using Information Theory. This model enables us to:

• Meet the demands of a coding scheme:
  o Define a coding scheme as a relation between stimulus and neural response classes.
  o Construct a coding scheme that is stochastic on the finer scale yet almost deterministic on the classes.
• Deal with the major obstacle:
  o Use whatever quantity of data is available to construct coarse but optimally informative approximations of the coding scheme.
  o Refine the coding scheme as more data becomes available.
Probability Framework (coding scheme ~ encoder)

(1) Want to find: the encoder Q(Y|X)

[Diagram: environmental stimuli X -> Q(Y|X) -> neural responses Y]

The coding scheme between X and Y is defined by the conditional probability Q.
Probability Framework (elements of the respective probability spaces)

(2) We have data: realizations (X=x, Y=y) of the r.v.'s X and Y, governed by Q(Y=y|X=x).

[Diagram: environmental stimuli X -> Q(Y|X) -> neural responses Y;
stimuli X=x: k = 25 ms windows over discretized time;
neural responses Y=y in {0,1}^k: k = 10 ms windows over discretized time]

We assume that Xn(w) = X(T^n w) and Yn(w) = Y(T^n w) are stationary ergodic r.v.'s, where T is a time shift.
[Figure: joint probability over environmental stimuli X and neural responses Y, with four regions labeled 1-4]
Overview of Our Approach
How to determine stimulus/response classes?
Given a joint probability p(X,Y):
The Stimulus and Response Classes
[Figure: the joint space of environmental stimuli X and neural responses Y partitioned into distinguishable stimulus/response classes 1-4]
Information Theory
The Foundation of the Model
• A signal x is produced by a source (r.v.) X with a probability p(X=x). A signal y is produced by another source Y with probability p(Y=y).
• A communication channel is a relation between two r.v.’s X and Y. It is described by the (finite) conditional probability or quantizer: Q(Y | X).
• Entropy: the uncertainty or self-information of a r.v.

  H(X) = E_X log( 1/p(X) )

• Conditional Entropy:

  H(Y|X) = -E_{X,Y} log Q(Y|X)

• Mutual Information: the amount of information that one r.v. contains about another r.v.

  I(X;Y) = E_{X,Y} log [ p(X,Y) / (p(X) p(Y)) ] = H(X) + H(Y) - H(X,Y)
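These definitions are easy to check numerically. A minimal sketch, using a standard textbook joint distribution (illustrative only, not data from this work):

```python
import numpy as np

# Joint distribution p(X,Y) as a matrix: rows = values of X, columns = values of Y.
p_xy = np.array([[0.125,  0.0625, 0.03125, 0.03125],
                 [0.0625, 0.125,  0.03125, 0.03125],
                 [0.0625, 0.0625, 0.0625,  0.0625],
                 [0.25,   0.0,    0.0,     0.0]])

p_x = p_xy.sum(axis=1)   # marginal p(X)
p_y = p_xy.sum(axis=0)   # marginal p(Y)

def H(p):
    """Entropy in bits: H = E[log2(1/p)]."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

H_x, H_y, H_xy = H(p_x), H(p_y), H(p_xy.ravel())

# Mutual information I(X;Y) = H(X) + H(Y) - H(X,Y)
I_xy = H_x + H_y - H_xy

# Conditional entropy H(Y|X) = -E_{X,Y} log2 Q(Y|X); by the chain rule
# this equals H(X,Y) - H(X).
H_y_given_x = H_xy - H_x
```

For this table, H(X) = 2 bits, H(Y) = 1.75 bits, and I(X;Y) = 0.375 bits.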
The entropy and mutual information of the data asymptotically approach the true population entropy and mutual information respectively.
Shannon-McMillan-Breiman Theorem (i.i.d. case)

If {X_i}_{i=1}^n are i.i.d., then

  lim_n -(1/n) log p(X_0, X_1, ..., X_{n-1}) = H(X)  a.s.

Proof: The r.v.'s Y_i = -log p(X_i) are i.i.d., so the theorem follows from the Strong Law of Large Numbers. □

This result holds if {X_i}_{i=1}^n is a stationary ergodic sequence as well. This is important for us since our data is not i.i.d., but we do assume that X and Y are stationary ergodic.
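The i.i.d. case can be demonstrated in a few lines. A sketch with an illustrative source (the distribution, sample size, and seed are arbitrary choices, not data from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

# An illustrative i.i.d. source with known entropy.
p = np.array([0.5, 0.25, 0.125, 0.125])
H_true = -np.sum(p * np.log2(p))        # = 1.75 bits

# Realizations X_0, ..., X_{n-1}
n = 200_000
x = rng.choice(len(p), size=n, p=p)

# For an i.i.d. sequence, -(1/n) log2 p(X_0, ..., X_{n-1}) = -(1/n) sum_i log2 p(X_i),
# which the Strong Law of Large Numbers drives to H(X).
sample_entropy = -np.mean(np.log2(p[x]))
```

With n = 200,000 samples the empirical value sits within a few hundredths of a bit of H(X).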
Why Information Theory?
Conceptual Model

Major Obstacle: To determine a coding scheme, Q, between X and Y requires large amounts of data.

Idea: Determine the coding scheme, Q*, between X and YN, a quantization of Y, such that YN preserves as much mutual information with X as possible.

[Diagram: environmental stimuli X -> Q(Y|X) -> neural responses Y -> q*(YN|Y) -> quantized neural responses YN; the composite channel is Q*(YN|X)]

New Goal: Find the quantizer q*(YN|Y) that maximizes I(X,YN).
Mathematical Model
We search for the maximizer q*(YN|Y) which satisfies

  max_q H(YN|Y) constrained by I(X,YN) = Imax, where Imax := max_q I(X,YN),

over the feasible region

  Δ := { q(yN|y) : Σ_{yN ∈ YN} q(yN|y) = 1 and q(yN|y) ≥ 0, for each y ∈ Y }.

The feasible region Δ assures that q*(YN|Y) is a true conditional probability. Δ is a product of |Y| simplices (each simplex is a discrete probability space).
We begin our search for the maximizer q*(YN|Y) by solving:

① q* = argmax_q I(X,YN)

② If there are multiple solutions to ①, then, by Jaynes' maximum entropy principle, we take the one that maximizes the entropy:

  max_q H(YN|Y) constrained by I(X,YN) = Imax

③ In order to solve ②, use the method of Lagrange multipliers to get

  max_q H(YN|Y) + β I(X,YN)

④ Annealing: In practice, we increment β in small steps toward ∞. For each β, we solve

  q*_β = argmax_q H(YN|Y) + β I(X,YN)

Note that lim_{β→∞} q*_β = q* from ②.
Justification
Some nice properties of the model:

The information functions are nice.
Theorem 1: H(YN|Y) is concave and I(X,YN) is convex in q.

Δ is really nice.
Lemma 2: Δ is the convex hull of vertices(Δ).

We can reformulate the problem as two different optimization problems.
Theorem 3: An equivalent problem to ① is to solve

  q*(YN|Y) = argmax_{q ∈ vertices(Δ)} I(X,YN)

Proof: This result follows from Theorem 1 and Lemma 2. □

Corollary 4: The extrema of ① lie on the vertices of Δ.
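Theorem 3 can be checked by brute force on a tiny example: enumerate the vertices of Δ (the deterministic quantizers) and keep the one with the largest I(X,YN). A sketch with an arbitrary random joint distribution (illustrative only); note the loop visits exactly N^|Y| vertices, which is the cost named below:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)

def mutual_info(p_xy):
    """I in bits from a joint distribution given as a matrix."""
    px = p_xy.sum(axis=1, keepdims=True)
    py = p_xy.sum(axis=0, keepdims=True)
    nz = p_xy > 0
    return np.sum(p_xy[nz] * np.log2(p_xy[nz] / (px @ py)[nz]))

def quantized_joint(p_xy, q):
    """p(X, YN) induced by a quantizer q(yN|y): p(x, yN) = sum_y p(x,y) q(yN|y)."""
    return p_xy @ q

# Toy joint distribution p(X,Y): |X| = 4 stimuli, |Y| = 4 responses.
p_xy = rng.dirichlet(np.ones(16)).reshape(4, 4)

N = 2  # number of response classes
# Vertices of Delta = deterministic quantizers: each y goes to exactly one class.
best_I, best_assignment = -1.0, None
for assignment in product(range(N), repeat=p_xy.shape[1]):
    q = np.zeros((p_xy.shape[1], N))
    q[np.arange(p_xy.shape[1]), assignment] = 1.0
    I = mutual_info(quantized_joint(p_xy, q))
    if I > best_I:
        best_I, best_assignment = I, assignment
```

By the data processing inequality the best quantized value never exceeds I(X;Y) itself.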
Theorem 5: If q*_M is the maximizer of

  max_{q ∈ ΔM} H(YN|Y) constrained by I(X,YN) = Imax,

where

  ΔM := { q(yN|y) : Σ_{yN ∈ YN} q(yN|y) = M and q(yN|y) ≥ 0, for each y ∈ Y },

then q* = (1/M) q*_M.

Proof: By Theorem 3 and the fact that ΔM is the convex hull of vertices(ΔM). □
Optimization Schemes

Goal: Build a sequence {q_n} that converges to the maximizer of the annealing problem

  q*_β = argmax_q H(YN|Y) + β I(X,YN)

• Augmented Lagrangian Method.

• Implicit solution: Set ∇_q ( H(YN|Y) + β I(X,YN) ) = 0 and solve implicitly for q by iterating

  q_{n+1}(yN|y) = exp( (β/p(y)) ∂I(X,YN)/∂q_n(yN|y) ) / Σ_{y'N ∈ YN} exp( (β/p(y)) ∂I(X,YN)/∂q_n(y'N|y) )

  Drawback: the current choice of the β increments is ad hoc.

• Vertex Search Algorithm: solve

  max_{q ∈ vertices(Δ)} I(X,YN)

  Drawback: |vertices(Δ)| = N^|Y|.
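The implicit-solution annealing scheme can be sketched numerically. Everything below is illustrative: the joint distribution is random, and the β schedule and iteration counts are arbitrary choices, not those used in this work.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative joint distribution p(X,Y) with |X| = 6, |Y| = 4 (not the four-blob data).
p_xy = rng.dirichlet(np.ones(24)).reshape(6, 4)
p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)
N = 2                                    # number of classes yN

def dI_dq(q):
    """dI(X,YN)/dq(yN|y) = sum_x p(x,y) ln[ p(x,yN) / (p(x) p(yN)) ]."""
    p_xyn = p_xy @ q                     # p(x, yN) = sum_y p(x,y) q(yN|y)
    p_yn = p_y @ q                       # p(yN)
    return p_xy.T @ np.log(p_xyn / (p_x[:, None] * p_yn[None, :]))

# Start near the uniform quantizer q = 1/N; a tiny perturbation breaks the symmetry.
q = np.full((4, N), 1.0 / N) + 1e-3 * rng.standard_normal((4, N))
q = np.abs(q) / np.abs(q).sum(axis=1, keepdims=True)

for beta in np.linspace(0.1, 20.0, 60):  # increment beta in small steps
    for _ in range(500):                 # fixed-point iteration at this beta
        g = (beta / p_y[:, None]) * dI_dq(q)
        g -= g.max(axis=1, keepdims=True)        # stabilize the exponentials
        q_new = np.exp(g)
        q_new /= q_new.sum(axis=1, keepdims=True)
        if np.max(np.abs(q_new - q)) < 1e-12:
            q = q_new
            break
        q = q_new
```

Each inner update is exactly the implicit fixed-point equation above: exponentiate (β/p(y)) ∂I/∂q row by row and renormalize so every row of q stays on its simplex.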
Results: Application to synthetic data (p(X,Y) is known)

Four Blob Problem

Algorithm           Cost in MFLOPS           I(X,YN) in bits
                    N=2    N=3    N=4        N=2     N=3     N=4
Lagrangian          431    822    1220       .8272   1.2925  1.6269
Implicit Solution   38     106    124        .8280   1.2942  1.6291
Vertex Search       6      18     21         .8280   1.2942  1.6291

[Figures: optimal quantizers q*(YN|Y) for N = 2, 3, 4, 5; I(X,YN) vs. N]
Modeling the cricket cercal sensory system as a communication channel

[Diagram: the signal passes through the nervous system, which is modeled as a communication channel]
Why the cricket?
• The structure and details of the cricket cercal sensory system are well known.
• All of the neural activity which encodes the wind stimuli (about 20 neurons) can be measured.
• Other sensory systems (e.g. mammalian visual cortex) consist of millions of neurons, which are impossible (today) to measure in totality.
Wind stimulus and neural response in the cricket cercal system
Neural Responses (over a 30 minute recording) caused by white noise wind stimulus. [Figure axis: T, in ms]
Neural Responses (these are all doublets) for a 10 ms window
Some of the air current stimuli preceding one of the neural responses
Time in ms. At T=0, the first spike occurs.
Quantization for real data:
A quantizer is any map f: Y -> YN from Y to a space YN with finitely many elements. Quantizers can be deterministic or probabilistic (given by q(yN|y)), and a quantizer can be refined as more data becomes available.

[Diagram: X -> Y -> YN, with the quantization given by q(yN|y)]
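In matrix form, a quantizer is a |Y| x N matrix whose rows are probability vectors. A minimal sketch with hypothetical numbers:

```python
import numpy as np

# A quantizer on |Y| = 4 responses into N = 2 classes, written as a |Y| x N matrix
# whose rows are probability vectors (the entries are illustrative).

# Deterministic: each response y is sent to exactly one class; this is a 0/1
# matrix, i.e. a vertex of the feasible region Delta.
q_det = np.array([[1.0, 0.0],
                  [1.0, 0.0],
                  [0.0, 1.0],
                  [0.0, 1.0]])

# Probabilistic: each response is assigned to the classes with some probability.
q_prob = np.array([[0.9, 0.1],
                   [0.7, 0.3],
                   [0.2, 0.8],
                   [0.1, 0.9]])

# Both are valid conditional probabilities q(yN|y): rows are nonnegative and sum to 1.
for q in (q_det, q_prob):
    assert np.allclose(q.sum(axis=1), 1.0) and (q >= 0).all()
```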
Application to real data (p(X,Y) is NOT known)

Idea: maximize a lower bound of I(X,YN).

• p(X,Y) cannot be estimated directly for rich stimulus sets: there is not enough data. Use the data to estimate a maximum entropy model.
• I(X,YN) = H(X) - H(X|YN). Only H(X|YN) depends on q(yN|y). So an upper bound of H(X|YN) produces a lower bound of I(X,YN).
• H(X|YN) = E_{yN} H(X | YN = yN) is bounded by the entropy of a Gaussian:

  H(X|YN) ≤ HG(X|YN) := E_{yN} (1/2) log2 det( 2πe C_{X|yN} )

• We estimate the conditional mean and covariance C_{X|yN} for each class yN.
• Over all yN, we have a Gaussian mixture model.
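The Gaussian bound is cheap to evaluate once the class-conditional covariances are in hand. A sketch with hypothetical covariances and class weights (not estimates from the cricket data):

```python
import numpy as np

def gaussian_entropy_bits(C):
    """(1/2) log2 det(2 pi e C): entropy in bits of a Gaussian with covariance C.
    This upper-bounds the entropy of any distribution with that covariance."""
    sign, logdet = np.linalg.slogdet(2.0 * np.pi * np.e * C)
    assert sign > 0, "covariance must be positive definite"
    return 0.5 * logdet / np.log(2.0)

# Illustrative class-conditional covariances C_{X|yN} and class weights p(yN).
C_by_class = [np.diag([1.0, 0.5]), np.diag([0.25, 2.0])]
p_yn = np.array([0.6, 0.4])

# HG(X|YN) = E_{yN} HG(X | YN = yN), the Gaussian upper bound on H(X|YN);
# a lower bound on I(X,YN) is then H(X) - HG(X|YN).
HG = sum(w * gaussian_entropy_bits(C) for w, C in zip(p_yn, C_by_class))
```

Using `slogdet` rather than `det` keeps the computation stable when the covariances are high-dimensional or nearly singular.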
Optimization problem for real data

  max_q H(YN|Y) constrained by H(X) - HG(X|YN) = Imax

The equivalent annealing problem:

  max_q H(YN|Y) - β HG(X|YN)

(H(X) is constant in q, so it can be dropped from the annealing cost.)

Algorithm           Cost in GFLOPS        I(X,YN) in bits
                    N=3    N=4    N=5     N=3    N=4    N=5
Implicit Solution   7      11     10      .43    .80    1.14
Vertex Search       31     84     141     .44    .85    1.81
Investigating the Bifurcation Structure

Goal: To efficiently solve q*_β = argmax_q H(YN|Y) + β I(X,YN) for each β as β → ∞.

Idea: Choose the increments of β wisely.

Method: Study the equilibria of the flow

  dq/dt = g(q, β) := ∇_q ( H(YN|Y) + β I(X,YN) ),

which are precisely the maximizers q*_β. The first equilibrium, at β = 0, is the uniform quantizer q* = 1/N.

• Search for bifurcations of the equilibria.
• Use numerical continuation to choose β.

Conjecture: There are only pitchfork bifurcations.
Bifurcations of q*

[Figures: observed bifurcations of q*(YN|Y) for the 4 Blob Problem; conceptual bifurcation structure, branching from the uniform quantizer q* = 1/N]
Other Applications

Solving problems of the form

  x* = argmax H(Y|X) + β D

is common in many fields:

• Clustering
• Compression and communications (GLA)
• Pattern recognition (ISODATA, K-means)
• Regression
Future Work

• Bifurcation structure
  o Capitalize on the symmetries of q* (Singularity and Group Theory)
• Annealing Algorithm Improvement
  o Perform optimization only at the values of β where bifurcations occur
  o Use numerical continuation to choose β
  o The Implicit Solution method q_{n+1} = f(q_n, β) converges reliably and quickly. Why? Investigate the superattracting directions.
• Perform optimization over a product of M-simplices
• Joint Quantization
  o Quantize X and Y simultaneously
• Better maximum entropy models for real data
• Compare our method to others