Fig. 1. Different models for glutamate spillover in the hippocampus. (a) ‘Local’ spillover involving only a couple of synapses. Diffusion is fast enough that the time course of NMDA receptor activation (right) may be similar at the site of release (red) and in adjacent synapses (green). (b) More broadly distributed spillover, in which glutamate released from one synapse reaches a larger number of synapses, though at a much lower concentration. As a result, very few receptors are activated in any adjacent synapse (right). (c) Cooperation between active synapses could enhance activation of receptors at quiescent synapses in between, resulting in a larger postsynaptic response (right).
292 nature neuroscience • volume 5 no 4 • april 2002
the induction of LTP without violating Donald Hebb’s postulated requirement for coincident pre- and postsynaptic activity13. If nearest neighbors were to experience an intermediate influx of calcium, these synapses could undergo depression14, resulting in a center–surround arrangement that may even enhance synapse specificity. Other factors, such as alternate routes of calcium entry into the postsynaptic dendrite or a possible role for postsynaptic neuronal glutamate transporters8, are likely to complicate this simplified scenario. Although the evidence for spillover continues to mount, it remains to be seen whether the phenomenon, contrary to its unfortunate, accidental moniker, proves beneficial to neuronal information processing.

1. Arnth-Jensen, N., Jaboudon, D. & Scanziani, M. Nature Neurosci. 5, 325–331 (2002).
2. Asztely, F., Erdemli, G. & Kullmann, D. M. Neuron 18, 281–293 (1997).
3. Malenka, R. C. & Nicoll, R. A. Neuron 19, 473–476 (1997).
4. Choi, S., Klingauf, J. & Tsien, R. W. Nature Neurosci. 3, 330–336 (2000).
5. Rusakov, D. A. & Kullmann, D. M. J. Neurosci. 18, 3158–3170 (1998).
6. Barbour, B. J. Neurosci. 21, 7969–7984 (2001).
7. Lozovaya, N. A., Kopanitsa, M. V., Boychuk, Y. A. & Krishtal, O. A. Neuroscience 91, 1321–1330 (1999).
8. Diamond, J. S. J. Neurosci. 21, 8328–8338 (2001).
9. Lester, R. A., Clements, J. D., Westbrook, G. L. & Jahr, C. E. Nature 346, 565–567 (1990).
10. Bergles, D. E. & Jahr, C. E. J. Neurosci. 18, 7709–7716 (1998).
11. Clements, J. D. & Westbrook, G. L. Neuron 7, 605–613 (1991).
12. Denk, W., Yuste, R., Svoboda, K. & Tank, D. W. Curr. Opin. Neurobiol. 6, 372–378 (1996).
13. Hebb, D. O. The Organization of Behavior: A Neuropsychological Theory (Wiley, New York, 1949).
14. Cummings, J. A., Mulkey, R. M., Nicoll, R. A. & Malenka, R. C. Neuron 16, 825–833 (1996).
news and views
A new window on sound

Bruno A. Olshausen and Kevin N. O’Connor
Auditory filters must trade off frequency tuning against temporal precision. The compromise achieved by the mammalian cochlea seems well matched to the sounds of the natural environment.
In the vertebrate cochlea, sound is detected by an array of several thousand hair cells that transduce mechanical vibrations into electrical activity. The individual hair cells, and the auditory nerve fibers to which they are connected, are tuned to specific frequencies. The population of auditory nerve fibers thus provides us with a frequency analysis of sound waveforms in the environment, but there are as many ways to perform a frequency analysis as there are to build a house. Which one is the most appropriate for the auditory system? An article by Lewicki in this issue sheds new light on this question1. Using powerful new statistical methods, the author shows that the frequency analysis performed by the mammalian cochlea is well matched to the range of sounds encountered in the natural environment.
Each auditory nerve fiber may be considered as a filter that signals information about the temporal structure of stimuli within its preferred frequency range. As engineers have understood for years, the design of a filter involves an inevitable trade-off between the precision of frequency tuning and temporal tuning. A tone consists of cyclical fluctuations of air pressure, and to obtain an accurate frequency estimate, many cycles must be integrated. But a longer integration period means a decrease in the temporal accuracy of the filter—in other words, a filter cannot signal both the frequency and the timing of a sound with arbitrary precision. Yet discrimination of real-world sounds often requires accurate measurements of both frequency and timing. In human speech, for example, the difference between vowel sounds depends on the relative strengths of different frequencies (harmonics), whereas the distinction between certain consonant sounds (‘ta’ and ‘da’ or ‘ba’ and ‘pa’) is a matter of a difference in voice onset time of a few tens of milliseconds. Precise temporal information is also important for sound localization, which in many cases depends on time-of-arrival differences between the two ears. The challenge for the auditory system, then, is to find the right trade-off between timing and frequency analysis.
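This trade-off is the classical time–frequency uncertainty relation: for any filter, the product of RMS duration and RMS bandwidth is bounded below by 1/(4π), with equality attained by a Gaussian window. As a quick numerical illustration (a sketch, not part of Lewicki's analysis), one can measure both widths for Gaussian windows of different durations and watch the product stay pinned at the bound:

```python
import numpy as np

def time_bandwidth_product(sigma, fs=10_000.0, T=2.0):
    """RMS duration x RMS bandwidth of a Gaussian window with width sigma (s)."""
    t = np.arange(-T / 2, T / 2, 1.0 / fs)
    g = np.exp(-t**2 / (2 * sigma**2))          # Gaussian window
    p_t = g**2 / np.sum(g**2)                   # normalized energy density in time
    sigma_t = np.sqrt(np.sum(p_t * t**2))       # RMS duration
    G = np.fft.fft(g)
    f = np.fft.fftfreq(t.size, d=1.0 / fs)
    p_f = np.abs(G)**2 / np.sum(np.abs(G)**2)   # normalized energy density in frequency
    sigma_f = np.sqrt(np.sum(p_f * f**2))       # RMS bandwidth
    return sigma_t * sigma_f

# Short windows locate events precisely but smear frequency, and vice versa;
# the product stays at the lower bound 1/(4*pi) ~= 0.0796 for every width.
for sigma in (0.005, 0.02, 0.08):
    print(f"sigma = {sigma:.3f} s -> sigma_t * sigma_f = {time_bandwidth_product(sigma):.4f}")
```

That a Gaussian window saturates this bound is precisely what makes Gabor filters the natural reference point in what follows.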
The authors are at the Center for Neuroscience and the Department of Psychology (B.A.O.) and Section of Neurobiology, Physiology and Behavior (K.N.O.), University of California at Davis, 1544 Newton Court, Davis, California 95616, USA. e-mail: [email protected]

One way to think about the time–frequency tradeoff is in terms of a ‘tiling’ of the time–frequency plane (Fig. 1). At one extreme, frequency discrimination is sacrificed completely to maximize temporal discrimination. At the other extreme is the Fourier transform, in which temporal discrimination is ignored to extract the maximum information about the frequency composition of the stimulus. Between these extremes are many other possibilities. Two schemes that are well known to engineers are the Gabor filter and the wavelet filter. Gabor filters, which consist of Gaussian-windowed sinusoids, provide the optimal joint resolution in both time and frequency, and they tile the time–frequency plane with windows of equal temporal width at all frequencies. Wavelet filters can be viewed as variants of Gabor filters in which the temporal windows become narrower as frequency increases, giving the filters the property of self-similarity. They are popular for many signal-processing applications because they capture the self-similar structure that is present in many natural signals. But these are just examples, and there is no end to the variety of tiling schemes that can be imagined.
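The two schemes can be made concrete in a few lines. The sketch below (illustrative parameter choices, not taken from the paper) builds a Gabor filter with a fixed window width and a wavelet-style filter whose window holds a fixed number of carrier cycles, then measures each filter's RMS temporal width:

```python
import numpy as np

FS = 16_000.0  # sample rate (Hz), an arbitrary choice for illustration

def gabor_filter(f0, sigma=0.004, fs=FS, dur=0.06):
    """Gaussian-windowed sinusoid; the window width sigma is the same at every f0."""
    t = np.arange(-dur / 2, dur / 2, 1.0 / fs)
    return np.exp(-t**2 / (2 * sigma**2)) * np.cos(2 * np.pi * f0 * t)

def wavelet_filter(f0, cycles=6.0, fs=FS, dur=0.06):
    """Same shape, but the window holds a fixed number of carrier cycles,
    so it narrows as 1/f0 and the filter family is self-similar."""
    sigma = cycles / (2 * np.pi * f0)
    return gabor_filter(f0, sigma=sigma, fs=fs, dur=dur)

def rms_width(h, fs=FS):
    """RMS temporal width of a filter's energy envelope."""
    t = (np.arange(h.size) - h.size / 2) / fs
    p = h**2 / np.sum(h**2)
    return np.sqrt(np.sum(p * t**2))

# The Gabor width is constant across frequency; the wavelet width shrinks in
# proportion to 1/f0, buying temporal precision at the cost of frequency tuning.
for f0 in (500.0, 4000.0):
    print(f"{f0:6.0f} Hz: Gabor {rms_width(gabor_filter(f0)) * 1e3:.2f} ms, "
          f"wavelet {rms_width(wavelet_filter(f0)) * 1e3:.2f} ms")
```

Any other tiling scheme amounts to some other rule for how the window width varies with center frequency.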
Which scheme is ‘optimal’ would seem to depend on a number of factors: the relative behavioral importance of different types of information, neurobiological or biophysical constraints, and the statistical properties of signals present in the environment. Lewicki focuses on the last of these, attempting to find a time–frequency tiling scheme that maximizes statistical independence among the filters. The motivation for maximizing independence is related to ideas proposed long ago by Attneave2 and Barlow3, who argued that the nervous system should try to exploit the redundancies present in signals in order to form representations of the structure present in the environment. It is also related to principles of efficient coding, which aim to make the most use of limited neural resources4.
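The flavor of this independence-maximizing analysis is easy to sketch with off-the-shelf tools. The toy example below is a sketch only: it uses synthetic data and scikit-learn's FastICA rather than the algorithm and recorded sound ensembles Lewicki actually used, with arbitrary parameter choices. It learns a basis for short sound snippets; each column of the mixing matrix is one learned filter:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
fs = 16_000                       # sample rate (Hz); 8 ms -> 128-sample snippets
win = 128

# Synthetic stand-in for a natural-sound ensemble: damped tones at random
# frequencies and onset times.
signal = np.zeros(fs * 10)
t = np.arange(2 * win) / fs
for _ in range(2000):
    f0 = rng.uniform(200.0, 6000.0)
    start = rng.integers(0, signal.size - t.size)
    signal[start:start + t.size] += np.exp(-t / 0.003) * np.sin(2 * np.pi * f0 * t)

# Random 8 ms snippets form the rows of the data matrix.
starts = rng.integers(0, signal.size - win, size=5000)
X = np.stack([signal[s:s + win] for s in starts])
X = X - X.mean(axis=0)

ica = FastICA(n_components=32, max_iter=1000, random_state=0)
ica.fit(X)
basis = ica.mixing_               # shape (128, 32): columns are the learned filters
print(basis.shape)
```

Plotting each column against time, and its spectrum against frequency, gives exactly the kind of time–frequency tiling diagram discussed above; the question is what shape the tiles take for different sound ensembles.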
Lewicki’s method1 for deriving a set of optimal filters draws on a recent advance in signal analysis called ‘independent component analysis’ (ICA). ICA provides a method for extracting a linear decomposition of signals that minimizes not just correlations but many higher-order statistical dependencies as well5. Lewicki shows that when ICA is applied to different ensembles of natural sounds (using short samples of 8 ms duration), the time–frequency tiling patterns that emerge are strikingly different. For environmental sounds (such as crackling twigs), one obtains time–frequency windows similar to a wavelet, whereas for animal vocalizations (monkey coos), one obtains a tiling pattern similar to the Fourier transform. Speech, which (as noted above) contains a mixture of temporal and frequency cues, gives rise to an intermediate tiling pattern, somewhere between a Gabor and a wavelet; the temporal accuracy increases with frequency, but to a lesser extent than with wavelet filters.

The tiling pattern that is optimal for speech is thus intermediate between those optimized for environmental sounds and animal vocalizations. As Lewicki shows, a close match is obtained with a 2:1 mixture of environmental to animal sounds. Interestingly, this pattern is similar to what has been observed physiologically in cat auditory nerve fibers, and it also bears similarity to the auditory filters that have been characterized psychophysically in humans and other animals. Although auditory filters measured behaviorally are not necessarily determined by the cochlea6 (they could theoretically arise anywhere within the auditory system), these results, taken together, suggest that the cochlea and auditory nerve may be optimized to transmit a wide range of naturally occurring sounds to the brain. It is even possible, as the author suggests, that the acoustic properties of human speech have evolved to make efficient use of the pre-existing properties of the peripheral auditory system.

Lewicki’s analysis does not attempt to provide a comprehensive account of auditory coding. For example, it does not consider the effect of changing sound intensity. The tuning of Lewicki’s filters is independent of sound intensity, but this is not true of real auditory nerve fibers. Most fibers reach the limit of their dynamic range roughly 30–40 dB above threshold, meaning their firing rates saturate at even moderate intensities7. At these intensities, their frequency tuning also becomes considerably broader8. But these facts are not necessarily inconsistent with Lewicki’s results, because most physiological measurements are made using isolated pure tones. Less is known about how auditory nerve fibers behave in response to more ecologically realistic broadband stimuli, and it is possible that gain control mechanisms maintain frequency selectivity even with high-intensity stimuli9.

There are also a few peculiarities to Lewicki’s filters that arise from the particular way in which ICA was implemented. For example, the filters learned by the algorithm are fairly symmetric in time (the attack and decay occur at about the same rate), whereas the ‘gamma-tone’ filters that have been characterized physiologically are asymmetric in time (they rise more steeply than they decay). In addition, the algorithm produces filters with the same frequency response but shifted in time, whereas auditory nerve fibers do not show such delays. But it would be fairly straightforward to modify the algorithm so that the filters are constrained to be causal (that is, filter outputs are determined from present and past values of the input), in which case their temporal
[Tilings shown, left to right: best time localization; unnamed; wavelet; Gabor; unnamed; best frequency localization (Fourier transform). Axes: time (horizontal) versus frequency (vertical).]
Fig. 1. The time–frequency plane can be tiled in multiple ways, any of which provides a complete representation of a signal. Some of these possibilities have been named by engineers (Gabor, wavelet, Fourier transform). Each row within a tiling represents one filter. The vertical dimension of each row represents the frequency specificity of the filter. The width of each box within a row represents the temporal resolution of the filter. Note that although the shapes vary, the area of each box is the same; this area represents the lower bound imposed by the fact that it is impossible to achieve arbitrarily precise resolution of both timing and frequency. How the boxes within a tiling are shaped reflects the chosen trade-off between time and frequency.
envelopes would most likely become asymmetric, like gamma-tone filters. And if the filters were also assumed to be time-invariant, so that they are actually convolved with the input signal, then there would be no need for time-shifted filters. However, such modifications are unlikely to dramatically affect the time–frequency tiling scheme learned by the algorithm, which is really the main point of the paper.
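For reference, the gamma-tone form mentioned above is g(t) = t^(n−1) e^(−2πbt) cos(2πf0t), which is causal by construction and asymmetric in time. The sketch below (with an approximate ERB-style bandwidth rule as an illustrative assumption) confirms the asymmetry numerically: the envelope reaches its peak quickly and decays back much more slowly.

```python
import numpy as np

def gammatone_envelope(f0, n=4, fs=16_000.0, dur=0.05):
    """Envelope of a gamma-tone filter, t^(n-1) * exp(-2*pi*b*t), with the
    bandwidth b set by an approximate ERB rule (an illustrative choice)."""
    b = 24.7 + 0.108 * f0
    t = np.arange(0.0, dur, 1.0 / fs)
    env = t**(n - 1) * np.exp(-2 * np.pi * b * t)
    return t, env / env.max()

t, env = gammatone_envelope(1000.0)
peak = int(np.argmax(env))
rise_ms = 1e3 * t[peak]                                   # onset -> peak
fall = peak + int(np.argmax(env[peak:] < 0.1))
decay_ms = 1e3 * (t[fall] - t[peak])                      # peak -> 10% of peak
print(f"rise {rise_ms:.1f} ms, decay {decay_ms:.1f} ms")  # attack steeper than decay
```

A Gaussian-windowed (Gabor-like) filter, by contrast, is exactly symmetric about its peak, which is why the symmetry of the learned filters stands out.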
Lewicki’s results bear an intriguing similarity to recent work in vision. Neurons in the visual cortex encode both the location and the spatial frequency of visual stimuli, and the trade-off between these two variables is analogous to that between timing and frequency in auditory coding. It has been shown that maximizing statistical independence, or ‘sparseness’, of visual representations yields spatial receptive field properties similar to those of cortical neurons10–12. Curiously, the space–frequency tiling scheme of both the derived filters and those measured physiologically deviates from a wavelet in much the same way as Lewicki finds for the auditory system; bandwidth at high spatial frequencies is narrower than one would expect13. It is not yet clear whether this similarity is profound or simply coincidental.
Perhaps an even deeper question is why ICA accounts for neural response properties at the very earliest stage of analysis in the auditory system, whereas in the visual system ICA accounts for the response properties of cortical neurons, which are many synapses removed from photoreceptors. It seems likely that structural or neurobiological constraints are crucial in determining the stage of analysis appropriate for an independent component analysis of sensory signals. For example, the visual system is faced with an early bottleneck, where information from more than 100 million photoreceptors is funneled into 1 million optic nerve fibers. The representation is then expanded by a factor of 50 in the cortex. By contrast, in the auditory system, there is no early bottleneck, and the 3000 inner hair cells of the cochlea immediately fan out onto 30,000 auditory nerve fibers. Thus it seems that in both systems, ICA is applied at the point of expansion in the representation.

It is tempting to speculate about what additional insights may be gained regarding neural mechanisms higher up in the auditory stream—the midbrain, thalamus and cortex—by considering higher-order forms of structure over longer time scales and across multiple frequency bands. But this cannot be done by simply applying the same analysis yet again. To add descriptive power, any additional stage of analysis would have to be nonlinear. Divisive normalization9 or a signal power representation (as in a spectrogram) seem like obvious choices, and some preliminary work along these lines has produced promising results14. Conceivably, this type of approach could begin to account for more complex properties of auditory neurons such as multiple excitatory–inhibitory band structure or frequency modulation sensitivity15, or perhaps even predict heretofore unnoticed response properties of cortical neurons. One gets the feeling that these findings are really just the tip of the iceberg, and that ahead lies a vast territory ripe for investigation.

1. Lewicki, M. Nature Neurosci. 5, 356–363 (2002).
2. Attneave, F. Psychol. Rev. 61, 183–193 (1954).
3. Barlow, H. B. in Sensory Communication (ed. Rosenbluth, W. A.) 217–234 (MIT Press, Cambridge, MA, 1961).
4. Simoncelli, E. P. & Olshausen, B. A. Annu. Rev. Neurosci. 24, 1193–1215 (2001).
5. Hyvarinen, A., Karhunen, J. & Oja, E. Independent Component Analysis (Wiley, New York, 2001).
6. Ehret, G. in Advances in Hearing Research. Proceedings of the 10th International Symposium on Hearing (eds. Manley, G. A., Klump, G. M., Koppl, C., Fastl, H. & Oekinghaus, H.) 387–400 (World Scientific, London, 1995).
7. Evans, E. F. & Palmer, A. R. Exp. Brain Res. 40, 115–118 (1980).
8. Kiang, N. Y.-S., Watanabe, T., Thomas, E. C. & Clark, L. F. Discharge Patterns of Single Fibers in the Cat’s Auditory Nerve (MIT Press, Cambridge, Massachusetts, 1965).
9. Schwartz, O. & Simoncelli, E. P. Nature Neurosci. 4, 819–825 (2001).
10. Olshausen, B. A. & Field, D. J. Nature 381, 607–609 (1996).
11. Bell, A. J. & Sejnowski, T. J. Vision Res. 37, 3327–3338 (1997).
12. van Hateren, J. H. & van der Schaaf, A. Proc. R. Soc. Lond. B 265, 359–366 (1998).
13. De Valois, R. L., Albrecht, D. G. & Thorell, L. G. Vision Res. 22, 545–559 (1982).
14. Kording, K. P., Konig, P. & Klein, D. J. Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN ’02) (IEEE Press, New York, in press).
15. Schreiner, C. E., Read, H. L. & Sutter, M. L. Annu. Rev. Neurosci. 23, 501–529 (2000).
Color visions in the brain
A vivid perception of color is evoked by spoken words ("seven" is blue, for instance) in people with a condition called 'colored-hearing synesthesia'. This percept is associated with activity in an area of the brain that responds to color vision, report Julia Nunn and colleagues on page 371 of this issue. Because other visual areas are not activated, these results suggest that a conscious perception of color can be created by activation of the brain's 'color center' alone.

In a control experiment, normal subjects did not show activity in the color center in response to spoken words, even after they had been extensively trained to visualize particular colors in association with those words. The authors conclude that synesthesia is much more like a color hallucination than color imagery. Synesthesia may also have a genetic basis, as it runs in families and is strongly sex-linked (six times more common in women). The authors speculate that this condition may result from developmental errors in the formation or retraction of connections between auditory cortex and visual cortex.
Sandra Aamodt