Project Number: 145
Is the Function of the Auditory
System to Predict Future
Inputs?
Abstract
The brain evolved to process natural stimuli in order to rapidly make decisions about future actions. We
gathered a high-quality natural sound database for use in two studies related to this observation. This
database will also be useful in future studies as a complement to artificial stimuli, which have the
limitation that, being artificial, they may produce neural responses irrelevant to the natural function of
the nervous system. Our first study examined the hypothesis that the brain’s neural code is optimized to
best predict future inputs, and used this database to train a mixture density network. We compared the
resulting parameters of the network with past recordings from the auditory nerve and found some
similarities in their properties, suggesting there may be some value in this hypothesis. In our second
study, which was unrelated to the first, we compared the capacity of different artificial stimuli to
characterise the spectro-temporal response function (STRF) of auditory cortical neurons. We aimed to
assess this capacity by comparing how well the STRFs estimated from different artificial
stimuli could predict the neural response to our natural sounds.
Introduction
Understanding how the brain encodes natural sensory information is a key question in systems
neuroscience. Over the last twenty years, the field of computational neuroscience has developed models to test
ideas as to how the central nervous system is able to complete this processing, whilst in vivo studies have
investigated the response of sensory systems to stimuli. However, the artificial stimuli traditionally used in such
experiments are simple and unrepresentative of the natural environment [1]. Therefore, any studies using such
stimuli, and the conclusions they draw, are limited.
In this project, we gathered a database of 'natural' sounds, that is, sounds present in the environment of
evolutionary adaptedness: the environment that the auditory system of model animals would have
evolved to respond to. We sought to utilise this database in two separate studies. For the first study, we
used the database to train a model based on the idea that the brain evolved to predict future inputs. For
the second study we used the database to test the capacity of artificial stimuli used in past physiological
studies to characterise auditory cortical neurons.
Natural Sounds Database- Databases of natural sounds do presently exist [2,3], but few are of particularly
high quality: many suffer from recording problems such as clipping (Figure 1) and contain
microphone noise. Those which are high quality often feature only one kind of natural sound, such
as speech. Finally, all such databases have a sampling rate of 44.1kHz or less, leading to a Nyquist
frequency [23] of at most 22.05kHz. However, most neurophysiological studies are done on small mammals, many
of which hear frequencies well above 22.05kHz. For example, ferrets hear in the range of 0.016-44kHz [16, 23].
The sound database we gathered comprises both natural and anechoic clips of sounds
present in the environment of evolutionary adaptedness, for instance, vocalisations and foliage recorded
in areas without ambient mechanical sounds. It should be of high enough quality to remain
useful in a wide range of future physiological studies, especially in animals with hearing above the
range of humans.
Computational Modeling- This sound database was used to train a model based on an
artificial neural network. This network was trained to predict future values of sounds, given their
past values.
One of the few well tested principles of the nervous system is the efficient coding hypothesis [4]. This is the
idea that neural systems evolved to efficiently represent natural stimuli. It is important to note that most
instantiations of this idea consider only the representation of current stimuli, e.g. the currently observed
image. By comparing the predictions of models that embody this hypothesis with data from in vivo
experiments, the predictive power of such models can be tested and, if confirmed, can suggest what the
function (but not necessarily the mechanism) of the neural system in question is. This approach has so
far elucidated the function of different features of the nervous system, including visual [5,6] and auditory [7,8]
coding, by finding efficient ways to represent stimuli.
Figure 1: An example of clipping (taken from 'crunching leaves louder') in the Pittsburgh natural sound database
However, this approach has limitations. For instance, the brain discards some information [9] present in a
stimulus rather than encoding all of its features. Understanding what information is of importance to the
brain and must be processed is a key problem in neuroscience. One idea is that the brain encodes the
information within stimuli that is useful for predicting the future state of said stimuli [10]. From a
behavioral perspective this seems intuitive, as the key role of the nervous system is to make decisions
about future actions that will maximize the organism's chance of survival and reproduction. The
reaction time of humans to an auditory stimulus is on the order of 160ms [11], and so in order to bypass this
delay and make optimal decisions about future actions, the present and future state of the world must be
predicted by the auditory system from past states of the world.
One of our aims, therefore, was to use our sound database as input for a computational model (a mixture
density network) which embodies this concept of a prediction code. We trained the model to predict
future values of the natural sounds we gathered, given past values of these sounds. The results of this
model were compared against data from in vivo studies of the auditory nerves of model organisms,
and against past models [7,12] of the auditory system which have attempted to produce a set of filters that
minimise the statistical dependence amongst the filter outputs (i.e. efficiently represent the current
sound). These efficient coding models give filters that are back to front [8]: they are steep and high frequency in
the far past, and shallow and low frequency in the near past. We aimed to
overcome these problems with our predictive model.
Auditory Cortical Neuron Responses- We also sought to evaluate artificial stimuli commonly used in in
vivo auditory neuroscience studies. Such studies use these stimuli to characterise the spectro-temporal
receptive fields (STRFs) of neurons within the auditory cortex of animals. STRFs give the sensitivity of
neurons to different frequencies across time. The STRF can be seen as a linear model that transforms a
time-varying stimulus, such as a spectrogram, into a prediction of neural firing rate [13,14]. Studies have
previously used dynamic random chords (DRCs) [15,16] or temporally orthogonal ripple combinations
(TORCs) [17] to characterise the STRFs of neurons, though a cross-stimulus comparison has never been
carried out. In addition, we tested two novel stimuli: randomly modulated noise and variable
speed dynamic random chords (vsDRCs). We aimed to see which stimulus would give an STRF that
could best predict the peristimulus time histogram (PSTH) of ferret cortical neurons responding to the natural
sounds we had gathered. However, due to technical difficulties, we were limited in the amount of data
we could gather, and so this analysis was only preliminary.
Materials and Methods
Natural Sounds Database- In order to carry out our study (and for use in future physiological studies)
we required roughly an hour of sounds gathered from natural and anechoic environments. The sounds
were gathered using a Zoom H29 recorder in the anechoic chamber (a sonically insulated chamber with
walls lined with foam to absorb, and not reflect, noise from echoes) of the Auditory Neuroscience
laboratory in the Sherrington Building, Oxford and also from areas isolated from non-natural sounds in
the countryside of Oxfordshire. The gathered files were hand-edited in Audacity to remove any artifacts
or distortions, such as clipping or microphone noise, and were systematically filed in a provisional
database of 1-20s clips in the .wav format.
As we could only hold a neuron for a limited time, we could only play a small set of natural sounds in
our physiological experiment. Because of this, we needed to ensure that the sounds we used were a good
representation of the range of natural sounds. We therefore generated a sub-database of 1 second
sounds by the following method:
First, a set of ‘good’ 1 second sound segments was found from our gathered database. Each sound
file was first resampled to the sampling rate of the physiology sound hardware (97,656Hz).
Then a 5th-order Butterworth highpass filter was applied at 200 Hz to remove any low frequency
noise. Multiple 1 second long segments of sound were then taken from the file, with starting points
every 100 samples. In order to have sounds that come on and off within the 1 second, only
those segments were kept in which the root mean square (RMS) magnitude in both the first 10 ms and
the last 10 ms was at least 18 dB less than the RMS magnitude in the time between. To ensure that
sounds (such as rain) which did not show such isolated events were included, one randomly chosen
segment was also taken from each file. Next, overlapping segments were excluded, keeping only one
segment from each set of overlapping segments. The sounds were then placed in one of 12 categories
(Table 1). This process left 709 ‘good’ 1-second sound segments.
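The onset and offset criterion described above can be sketched as follows. This is a minimal illustration in NumPy, not the code used in the study: the function names `rms_db`, `is_isolated_segment` and `candidate_segments` are hypothetical, and the sound is assumed to have already been resampled to 97,656Hz and highpass filtered.

```python
import numpy as np

FS = 97656          # sampling rate of the physiology hardware (Hz), from the text
EDGE_MS = 10        # length of the onset and offset windows (ms)
MARGIN_DB = 18      # required level drop at the edges (dB)


def rms_db(x):
    """RMS level of a signal in dB (relative to an RMS of 1)."""
    rms = np.sqrt(np.mean(np.square(x)))
    return 20.0 * np.log10(max(rms, 1e-12))   # guard against the log of zero


def is_isolated_segment(segment, fs=FS):
    """True if the first and last 10 ms are both at least 18 dB below the middle."""
    edge = int(round(fs * EDGE_MS / 1000.0))
    start, middle, end = segment[:edge], segment[edge:-edge], segment[-edge:]
    mid_db = rms_db(middle)
    quiet_start = mid_db - rms_db(start) >= MARGIN_DB
    quiet_end = mid_db - rms_db(end) >= MARGIN_DB
    return bool(quiet_start and quiet_end)


def candidate_segments(sound, fs=FS, hop=100):
    """Start indices of 1 s segments (taken every `hop` samples) that pass the test."""
    n = int(fs)                                  # number of samples in 1 second
    starts = range(0, len(sound) - n + 1, hop)
    return [s for s in starts if is_isolated_segment(sound[s:s + n], fs)]
```

A tone burst with quiet edges passes the test, whereas a continuous tone does not. Removing overlapping segments and adding one random segment per file would follow as separate steps.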
We also carried out some preliminary analysis of the intensity of these sounds above 20kHz (approximately the
upper frequency limit of present natural sound databases). To do this we recorded silence in the anechoic
chamber using the Zoom H2. From this, we found the power spectral density at 1kHz. For each category of
sound, we then found the power spectral density at each frequency relative to the power spectral
density of silence at 1kHz. Next, for each category of sound, we plotted a histogram of the distribution
of power spectral densities at each frequency. We visually inspected the distribution of power spectral
densities for frequencies above 20kHz (Figure 5), compared to that for frequencies below 20kHz.
Category of sound files    Number of files (each 1 s in length)
Leaves                     32
Twigs                      14
Gravel                     42
Heavy Rain                 12
Water                      10
Birds                      6
Sheep                      13
Voice1                     150
Voice2                     213
Voice3                     89
Voice4                     50
Voice5                     77
Computational Modeling- The task of the model is to predict future sound inputs given past sound
inputs. The model was trained to do this task, and we then examined its parameters to see whether they
were similar to those found in physiological studies of the auditory nerve. We used a model related to
an artificial neural network (a.k.a. a multilayer perceptron, or MLP) to do this task. Although the
cochlea and auditory nerve are not just a network of neurons, they form an interconnected system which
can be modeled functionally as a network, implemented as an artificial neural network.
First we preprocessed the sounds. Because of limitations in computing power, we downsampled our sounds
to a sampling rate of 4000 Hz and high-pass filtered them at 400 Hz.
Table 1: The sound files in our refined natural sound database
Figure 2: a simple 3-layer backpropagation network (redrawn from ref. 30)
Then from each sound file we took many snippets of the sound. Each snippet is 60 samples long (15
ms). We took 200,000 snippets at random from across all the sound files. The first 50 samples (12.5ms)
of each snippet were used as input to a model, and the model was trained to predict the corresponding last
10 samples (2.5 ms) of each snippet.
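The snippet extraction can be illustrated with a short NumPy sketch. It is an assumption-laden example rather than the original code: `draw_snippets` is a hypothetical name, files are assumed to be chosen uniformly at random, random noise stands in for the downsampled sounds, and samples are kept in chronological order (the text indexes the input from the present back into the past).

```python
import numpy as np

N_PAST, N_FUTURE = 50, 10            # samples of input and of target, from the text
SNIPPET_LEN = N_PAST + N_FUTURE      # 60 samples = 15 ms at 4000 Hz


def draw_snippets(sounds, n_snippets, rng):
    """Draw random 60-sample snippets and split each into (past, future).

    `sounds` is a list of 1-D arrays already downsampled to 4000 Hz.
    Returns X with shape (n_snippets, 50) and T with shape (n_snippets, 10).
    """
    X = np.empty((n_snippets, N_PAST))
    T = np.empty((n_snippets, N_FUTURE))
    for n in range(n_snippets):
        sound = sounds[rng.integers(len(sounds))]            # pick a file at random
        start = rng.integers(len(sound) - SNIPPET_LEN + 1)   # pick a start point
        snippet = sound[start:start + SNIPPET_LEN]
        X[n] = snippet[:N_PAST]      # first 50 samples (12.5 ms): model input
        T[n] = snippet[N_PAST:]      # last 10 samples (2.5 ms): prediction target
    return X, T


rng = np.random.default_rng(0)
toy_sounds = [rng.standard_normal(4000) for _ in range(3)]   # stand-ins for 1 s clips
X, T = draw_snippets(toy_sounds, n_snippets=1000, rng=rng)
```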
Multilayer perceptron (MLP)
The first supervised learning technique we used was error backpropagation [18] (Figure 2),
which was used to train an MLP by minimizing a sum-of-squares error function. This
method comprises two distinct stages. The predicted output z is given by:
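$$z_k^{(n)} = \sum_{j=1}^{J} w_{kj}\, f\!\left(\sum_{i=1}^{I} w_{ji}\, x_i^{(n)} + b_j\right) \tag{1}$$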
where $x_i^{(n)}$ is component i of vector $x^{(n)}$, the past values of a sound snippet, where i=1 is the present,
going to i=I into the past, where I=50. n is the snippet number, running from n=1 to n=N,
with N=200,000. $w_{ji}$ is an I×J matrix of weights on the inputs $x_i$, where J=50. $b_j$ are the biases. f( ) is a
non-linearity (in our case tanh), and $w_{kj}$ is a J×K matrix of weights to the output $z^{(n)}$. $z_k^{(n)}$ is the kth
component of vector $z^{(n)}$, where k=1 is the present, going to k=K into the future, where K=10. $z^{(n)}$ is
the prediction by the network of the target $t^{(n)}$, the future values following $x^{(n)}$. Equation 1 is optimized to
predict $t^{(n)}$ over all snippets by minimizing the squared difference between the prediction and the
true values of the future sounds:
$$E = \sum_{n=1}^{N} \sum_{k=1}^{K} \left(z_k^{(n)} - t_k^{(n)}\right)^2 \tag{2}$$
The derivative of this error function is then taken with respect to the weights. In the second stage of
error backpropagation, these derivatives are used to adjust the weights in equation
(1) so as to minimize the difference between the predicted values of the future sound (z) and the actual values
(t). The results of our MLP are not shown.
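The two stages described above (computing the prediction and error, then using the error derivatives to adjust the weights) can be sketched in NumPy. This is an illustrative sketch only: the toy data are random, the learning rate and number of steps are arbitrary assumptions, and plain gradient descent is used purely to show how the derivatives enter the weight update.

```python
import numpy as np

I, J, K = 50, 50, 10      # input, hidden and output sizes, from the text


def forward(X, W_ji, b_j, W_kj):
    """Network prediction z for a batch X of shape (N, I)."""
    H = np.tanh(X @ W_ji + b_j)     # hidden activations, shape (N, J)
    return H @ W_kj, H              # predictions z with shape (N, K), and H


def sum_squared_error(Z, T):
    """Sum over snippets and output samples of the squared prediction error."""
    return np.sum((Z - T) ** 2)


def gradients(X, T, W_ji, b_j, W_kj):
    """Derivatives of the error with respect to the weights (backpropagation)."""
    Z, H = forward(X, W_ji, b_j, W_kj)
    dZ = 2.0 * (Z - T)                      # dE/dz
    dW_kj = H.T @ dZ                        # dE/dW_kj
    dA = (dZ @ W_kj.T) * (1.0 - H ** 2)     # back through tanh: f'(a) = 1 - tanh(a)^2
    return X.T @ dA, dA.sum(axis=0), dW_kj  # dE/dW_ji, dE/db_j, dE/dW_kj


rng = np.random.default_rng(0)
X = rng.standard_normal((200, I))            # toy "past" values
T = rng.standard_normal((200, K))            # toy "future" values
W_ji = 0.1 * rng.standard_normal((I, J))
b_j = np.zeros(J)
W_kj = 0.1 * rng.standard_normal((J, K))

error_before = sum_squared_error(forward(X, W_ji, b_j, W_kj)[0], T)
for _ in range(100):                         # plain gradient descent, for illustration
    g_ji, g_b, g_kj = gradients(X, T, W_ji, b_j, W_kj)
    W_ji -= 1e-4 * g_ji
    b_j -= 1e-4 * g_b
    W_kj -= 1e-4 * g_kj
error_after = sum_squared_error(forward(X, W_ji, b_j, W_kj)[0], T)
```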
Mixture Density Network
Error backpropagation is limited in its description of a highly variable, 'noisy' input, as it can only
predict the expected value of each future sample, rather than the distribution of possible future values, as a
mixture density network (MDN) can [19] (Figure 3). An MDN consists of a multilayer perceptron whose
output is used to parameterize a mixture of Gaussian distributions with a likelihood function of
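$$p(\mathbf{t} \mid \mathbf{x}) = \sum_{k} \pi_k(\mathbf{x})\, \mathcal{N}\!\left(\mathbf{t} \mid \boldsymbol{\mu}_k(\mathbf{x}),\, \sigma_k^2(\mathbf{x})\,\mathbf{I}\right) \tag{3}$$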
where $\pi_k(x)$ are the mixing coefficients, and $\mu_k(x)$ and $\sigma_k^2(x)$ are the means and variances of the
Gaussian components, respectively. These 3 sets of parameters are governed by the output of the multilayer perceptron, which
depends on the input (i.e. the past values, $x^{(n)}$). p(t|x) gives the conditional
probability distribution of t, given x. The likelihood of the target data (the future) under the above distribution,
conditional on the input data (the past), is maximized with respect to the weights of the multilayer
perceptron. This is equivalent to minimizing the negative logarithm of the likelihood function with
respect to w, over all the data, where w denotes all weight matrices, as given in equation 4:
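$$E(\mathbf{w}) = -\sum_{n=1}^{N} \ln \left\{ \sum_{k} \pi_k\!\left(\mathbf{x}^{(n)}, \mathbf{w}\right) \mathcal{N}\!\left(\mathbf{t}^{(n)} \mid \boldsymbol{\mu}_k\!\left(\mathbf{x}^{(n)}, \mathbf{w}\right),\, \sigma_k^2\!\left(\mathbf{x}^{(n)}, \mathbf{w}\right)\mathbf{I}\right) \right\} \tag{4}$$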
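As an illustration of the error function described above, the following NumPy sketch evaluates the negative log likelihood of a batch of targets under a mixture of isotropic Gaussians. The function name and argument layout are hypothetical, and the mixture parameters are passed in directly rather than produced by a multilayer perceptron; the log-sum-exp step is a standard numerical precaution and is not taken from the original work.

```python
import numpy as np


def mdn_negative_log_likelihood(T, log_pi, mu, var):
    """Negative log likelihood of targets under an isotropic Gaussian mixture.

    T      : (N, D) targets (the future samples)
    log_pi : (N, M) log mixing coefficients (each row sums to 1 after exp)
    mu     : (N, M, D) component means
    var    : (N, M) component variances (one per component: isotropic)
    """
    D = T.shape[1]
    sq_dist = np.sum((T[:, None, :] - mu) ** 2, axis=2)               # (N, M)
    log_gauss = -0.5 * (D * np.log(2.0 * np.pi * var) + sq_dist / var)
    log_joint = log_pi + log_gauss                                     # log(pi_k * N_k)
    # log-sum-exp over components, done stably by factoring out the row maximum
    top = log_joint.max(axis=1, keepdims=True)
    log_p = top[:, 0] + np.log(np.sum(np.exp(log_joint - top), axis=1))
    return -np.sum(log_p)


rng = np.random.default_rng(0)
N, M, D = 5, 30, 10                                # toy batch, 30 Gaussians, 10 dimensions
T = rng.standard_normal((N, D))
logits = rng.standard_normal((N, M))
log_pi = logits - np.log(np.sum(np.exp(logits), axis=1, keepdims=True))   # log softmax
mu = rng.standard_normal((N, M, D))
var = np.exp(rng.standard_normal((N, M)))          # exponentiate to keep variances positive
nll = mdn_negative_log_likelihood(T, log_pi, mu, var)
```

In a full MDN, `log_pi`, `mu` and `var` would be computed from the network outputs for each input, and this quantity would be minimised with respect to the network weights.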
In order to test the MDN, 100,000 snippets were taken from files in the sound database, each comprising
60 samples (15ms in total) after downsampling to a rate of 4kHz. The first 50 of these samples were
used as input to the network, which was trained to predict the last 10 samples (t).
The MDN also had 50 input units and 50 hidden units. As an output it had 30 10-dimensional isotropic
Gaussians parameterised by a total of 360 parameters: 30 means consisting of 10 values each, 30
variances, and 30 weightings. As described above, the weighted sum of these Gaussians gives the
probability distribution over the 10-sample sections to be predicted. The Gaussians were initialised by
first applying the k-means algorithm to the target data. K-means clusters the data into k clusters (in our
case k=30) by minimising the total squared distance between the cluster centres and the data. The
positions of the k means gave the initial centres of the 30 Gaussians, and the relative number of data
points associated with each of the centres gave the weighting. The variance of each Gaussian was set
to the distance to the nearest other data-point (although if this was 0 the variance was set to 1). The
input weights were initialised using Gaussian random variables with a variance of 1/2000. The MDN was
optimised using a scaled conjugate gradient algorithm for 100,000 iterations.
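The parameter count and the k-means initialisation of the centres and weightings can be sketched as below. This is a basic, assumed implementation of k-means in NumPy (the function name `kmeans_init`, the number of iterations and the random toy targets are all hypothetical); the variance initialisation is left out.

```python
import numpy as np

M, D = 30, 10                          # number of Gaussians and their dimension
n_params = M * D + M + M               # 300 mean values + 30 variances + 30 weightings


def kmeans_init(T, m, rng, n_iter=20):
    """Initial centres and mixing weights from a basic k-means on targets T (N, D)."""
    centres = T[rng.choice(len(T), size=m, replace=False)].copy()
    for _ in range(n_iter):
        # assign every target to its nearest centre (squared Euclidean distance)
        d2 = ((T[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for k in range(m):
            members = T[labels == k]
            if len(members) > 0:       # leave an empty cluster's centre where it is
                centres[k] = members.mean(axis=0)
    # final assignment against the final centres
    d2 = ((T[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    weights = np.bincount(labels, minlength=m) / len(T)   # relative cluster sizes
    return centres, weights


rng = np.random.default_rng(0)
toy_targets = rng.standard_normal((500, D))               # stand-in for the future samples
centres, weights = kmeans_init(toy_targets, M, rng)
```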
Auditory Cortical Neuron Responses-
Experimental Set-up
To obtain STRFs, electrophysiological data from a male ferret were recorded using silicon probe electrodes
(Neuronexus Technologies, Ann Arbor, Michigan) with 16 sites on a single probe, vertically spaced at 50μm or
150μm. Responses were elicited using Panasonic RPHV27
earphones (Bracknell, UK), coupled to otoscope specula, inserted into each ear canal and driven by Tucker-Davis
Technologies (Alachua, Florida) System III hardware with a 97.656kHz sample rate. Sounds
were played after being filtered through a simple cochleagram with 23 filters. Anesthesia was induced
using medetomidine hydrochloride (Domitor; 0.022 mg·kg⁻¹·h⁻¹) and ketamine (Ketaset; 5 mg·kg⁻¹·h⁻¹).
Anesthesia was maintained with an intravenous infusion (5 ml/h) of this mixture in physiological saline
containing 5% glucose, and animals also received a single subcutaneous dose of 0.06 mg·kg⁻¹·h⁻¹
atropine sulfate and subcutaneous doses of 0.5 mg/kg dexamethasone every 12 h to reduce bronchial
secretions and cerebral edema, respectively. All animal procedures were approved by the local ethical
review committee and performed under license from the United Kingdom Home Office.
Figure 3: a cartoon of a mixture density network (redrawn from ref. 13)
Stimuli
As stimuli we used DRCs, vsDRCs, TORCs, randomly modulated noise and our natural sounds. A
DRC consists of multiple simultaneous pure tones over a range of frequencies. Every epoch (5ms) the
sound intensity of each tone is picked from a uniform distribution from 10-70 dB. vsDRCs are the
same as standard dynamic random chords, except that each tone remains at the same intensity for 1-6
epochs, the number of epochs being chosen from a uniform distribution. A TORC is the sum of a set of
ripples: Gaussian white noise, modulated at a certain modulation frequency over time and a different
modulation frequency over sound frequency. For a full description see reference 31. For the randomly
modulated noise, Gaussian white noise was taken and then adjusted, using previously described methods, to have on
average a power spectrum that was pink (1/f) and a pink modulation spectrum over both time and
frequency. This produced a stochastic stimulus with a power spectrum and modulation spectrum
matching the average spectra of natural sounds. In comparison with the TORC stimuli, it has a
stochastic aspect without the ordered phase structure. For all of the artificial stimuli, two 30-second
long sound files were played, except for the TORCs, where we played three 30-second long sound files
each consisting of 10 consecutive TORCs. This was longer than the other sounds in order to be able to
compare with results from previous experiments with TORCs.
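The difference between a DRC and a vsDRC can be made concrete with a sketch that generates only the grid of tone levels (tone synthesis is omitted). The function names are hypothetical, and the number of tones (23 here) is an arbitrary assumption, since the text does not state it.

```python
import numpy as np

EPOCH_MS = 5                 # chord duration, from the text
LEVEL_RANGE_DB = (10, 70)    # tone levels are drawn uniformly from this range


def drc_levels(n_tones, n_epochs, rng):
    """Level (dB) of each tone in each 5 ms epoch for a standard DRC."""
    return rng.uniform(*LEVEL_RANGE_DB, size=(n_tones, n_epochs))


def vsdrc_levels(n_tones, n_epochs, rng, max_hold=6):
    """As drc_levels, but each tone holds its level for 1 to max_hold epochs."""
    levels = np.empty((n_tones, n_epochs))
    for tone in range(n_tones):
        epoch = 0
        while epoch < n_epochs:
            hold = rng.integers(1, max_hold + 1)       # uniform on 1..6 epochs
            levels[tone, epoch:epoch + hold] = rng.uniform(*LEVEL_RANGE_DB)
            epoch += hold
    return levels


rng = np.random.default_rng(0)
n_epochs = 30 * 1000 // EPOCH_MS          # a 30 s stimulus has 6000 epochs
drc = drc_levels(n_tones=23, n_epochs=n_epochs, rng=rng)        # 23 tones: assumed
vsdrc = vsdrc_levels(n_tones=23, n_epochs=n_epochs, rng=rng)
```

With hold lengths uniform on 1 to 6 epochs, a vsDRC tone changes level on average every 3.5 epochs (17.5 ms), compared with every epoch (5 ms) for the standard DRC.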
To generate our natural sound stimuli, 60 sounds were randomly chosen from the database of processed
sounds in Table 1 by first randomly selecting a category and then randomly selecting a file from within
that category. To ensure the chosen sounds were representative of natural sounds, this process was
repeated 2000 times, and the set of 60 sounds was chosen whose power spectrum and
modulation spectrum were closest to the power spectra and modulation spectra of a large database of
natural sound (i.e. 1/f and 1/f^1.5 respectively, formulae taken from Singh & Theunissen [20]). These 60
natural sounds were then placed in two sound files, each containing 30 randomly ordered natural sound
segments, with a 250ms gap between each.
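The two-level random draw (category first, then file) can be sketched with the Python standard library. The category sizes come from Table 1; everything else is an assumption made for illustration: the function names, the choice to avoid picking the same sound twice, and the `mismatch` argument, which stands in for a measure of how far a set's power and modulation spectra lie from the target spectra.

```python
import random

# Files per category, from Table 1
CATEGORY_SIZES = {
    "Leaves": 32, "Twigs": 14, "Gravel": 42, "Heavy Rain": 12, "Water": 10, "Birds": 6,
    "Sheep": 13, "Voice1": 150, "Voice2": 213, "Voice3": 89, "Voice4": 50, "Voice5": 77,
}


def draw_set(rng, n_sounds=60):
    """Draw sounds by picking a category uniformly, then a file within it."""
    chosen = []
    while len(chosen) < n_sounds:
        category = rng.choice(sorted(CATEGORY_SIZES))
        sound = (category, rng.randrange(CATEGORY_SIZES[category]))
        if sound not in chosen:          # assumption: no sound is used twice
            chosen.append(sound)
    return chosen


def best_of(n_draws, mismatch, rng):
    """Repeat the draw n_draws times and keep the set with the smallest mismatch."""
    return min((draw_set(rng) for _ in range(n_draws)), key=mismatch)


rng = random.Random(0)
example_set = draw_set(rng)
```

Selecting the category first keeps the large voice categories from dominating: voices make up 5 of the 12 categories (about 42% of draws), whereas they account for roughly 82% of the files listed in Table 1.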
All sound types were renormalized to have a final RMS of 80 dB SPL. The 11 sound files (2 DRC, 2
vsDRC, 3 TORC, 2 randomly modulated noise, and 2 natural sound files) were played in a random order. This was done 10
times, with a new random order each time.
Data Analysis
The raw neural response traces from the 32 electrodes were sorted into spike trains from putative
neurons using spikemonger, an in-house spike sorting algorithm. Then, for each neuron, the response to
each stimulus over time was obtained by counting spikes in 10 ms windows, placed every 5 ms
over the course of the sound. This was then averaged over the 10 repeats, and finally the average count
was converted to an average rate by dividing by the window size (10 ms). This produced a PSTH,
denoted by r, where $r_t$ is the spike rate at time t.
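The PSTH calculation described above can be written out as a short NumPy sketch. The function name `psth` and the input format (one list of spike times in seconds per repeat) are assumptions made for illustration.

```python
import numpy as np

WINDOW_S = 0.010     # 10 ms counting window
STEP_S = 0.005       # windows placed every 5 ms


def psth(spike_times_per_repeat, duration_s):
    """Mean firing rate (spikes/s) in overlapping windows, averaged over repeats."""
    # small tolerance so that floating-point rounding does not drop the last window
    starts = np.arange(0.0, duration_s - WINDOW_S + 1e-9, STEP_S)
    counts = np.zeros((len(spike_times_per_repeat), len(starts)))
    for rep, spikes in enumerate(spike_times_per_repeat):
        spikes = np.asarray(spikes)
        for i, t0 in enumerate(starts):
            counts[rep, i] = np.sum((spikes >= t0) & (spikes < t0 + WINDOW_S))
    return counts.mean(axis=0) / WINDOW_S      # average count -> average rate


# Two repeats of a 20 ms stimulus: spikes at 2 and 7 ms, then a single spike at 2 ms
example_rate = psth([[0.002, 0.007], [0.002]], duration_s=0.02)
```

Because the windows are 10 ms long but placed every 5 ms, neighbouring bins share half of their spikes, which smooths the resulting rate estimate.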
The stimuli were then processed into a cochleagram, giving the power of the sound as a function of
time and sound frequency. The cochleagram was constructed using a power spectral density STFT with a 10ms
Hamming window with 5ms overlap. The power spectral density was then
summed over frequency using a number of triangularly weighted bins corresponding to the equivalent
rectangular bandwidth (ERB) in the cat (ERB widths in the ferret are assumed to be similar).
Subsequently, the log10 was taken, and all values below -2 were set to -2. This gives a very simple cochleagram.
Then for each response bin at time t, the section of cochleagram preceding it was taken, extending τ steps
into the past. Thus each response $r_t$ has a corresponding preceding cochleagram segment $X_{f(t-\tau)}$.
The STRF is the matrix, W, which gives the best prediction, $y_t$, of the PSTH from the preceding cochleagram
segment, $X_{f(t-\tau)}$, where t is time, f is sound frequency and τ is time into the past. We want to find the W which
minimises the difference between the response rates and the product of the STRF and the preceding
cochleagram segment:
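$$E = \sum_{t} \left(r_t - y_t\right)^2 \tag{5}$$
where
$$y_t = \sum_{f} \sum_{\tau} W_{f\tau}\, X_{f(t-\tau)} \tag{6}$$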
However, as there are too many parameters in W to find a clear minimum, an
additional constraint is required: that only a few values of W are very large. This is reasonable, as it is expected that
most delays and frequencies will not influence the neuron. Thus, the error function becomes:
(7)
We then used the MATLAB function minFunc [27] to minimise E with respect to W. The data were divided into 3
parts: 80% was used to fit the STRF, 10% was used for cross-validation, and 10% was used to test how
well the STRF predicted the PSTH.
The value of λ was set by cross-validation; that is, the minimization was done for various values of λ,
and the value with the least error in predicting the cross-validation set was used. E, appropriately
scaled, is the measure of prediction error.
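The fitting and cross-validation procedure can be sketched in NumPy under explicit assumptions. The sparseness penalty is assumed here to be an L1 penalty weighted by λ, which is one common choice and may differ from the penalty actually used; the minimiser is iterative soft thresholding rather than minFunc; each cochleagram segment is assumed to include the current time bin; and all function names and the toy data are hypothetical.

```python
import numpy as np


def lagged_segments(coch, n_lags):
    """Row t holds the cochleagram (n_freq, n_time) over the n_lags bins up to t, flattened."""
    n_freq, n_time = coch.shape
    rows = [coch[:, t - n_lags + 1:t + 1].ravel() for t in range(n_lags - 1, n_time)]
    return np.array(rows)                       # shape (n_time - n_lags + 1, n_freq * n_lags)


def penalised_error(w, X, r, lam):
    """Squared prediction error plus an (assumed) L1 sparseness penalty on the STRF."""
    return np.sum((r - X @ w) ** 2) + lam * np.sum(np.abs(w))


def fit_strf(X, r, lam, n_iter=500):
    """Minimise penalised_error by iterative soft thresholding (ISTA)."""
    step = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2)     # 1 / Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        w = w - step * 2.0 * (X.T @ (X @ w - r))        # gradient step on the squared error
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)   # soft threshold
    return w


def choose_lambda(X, r, lambdas):
    """Fit on the first 80%, pick lambda on the next 10%, report error on the last 10%."""
    n = len(r)
    a, b = int(0.8 * n), int(0.9 * n)
    fits = [fit_strf(X[:a], r[:a], lam) for lam in lambdas]
    val_err = [np.mean((r[a:b] - X[a:b] @ w) ** 2) for w in fits]
    best = int(np.argmin(val_err))
    test_err = np.mean((r[b:] - X[b:] @ fits[best]) ** 2)
    return lambdas[best], fits[best], test_err


rng = np.random.default_rng(0)
coch = rng.standard_normal((5, 600))             # toy cochleagram: 5 bands, 600 time bins
X = lagged_segments(coch, n_lags=4)
w_true = np.zeros(20)
w_true[3], w_true[12] = 1.0, -0.5                # sparse toy STRF
r = X @ w_true + 0.1 * rng.standard_normal(len(X))
lam, w_fit, test_err = choose_lambda(X, r, [0.0, 1.0, 10.0, 1000.0])
```

On this toy problem the recovered weights are close to the sparse weights used to generate the response, and the largest candidate value of λ is rejected because it shrinks every weight to zero.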
Results
Natural Sounds Database- In total 8568 seconds of natural sounds were recorded, which were refined
to a database of 12 categories of natural sounds, totaling 709s of data (Table 1). Spectrograms that are
characteristic of each sound category from Table 1 are plotted in Figure 4.
By inspection, it is clear that there is information in our recordings above 20kHz (Figure 4) in the form
of brief transients. We examined this in more detail by taking the distribution of power spectral
density over all the sound files within one category (these are plotted in Figure 5). These show that all
stimuli have some information above 20kHz, though this is most pronounced in the sounds in the
'leaves', 'twigs' and 'birds' categories.
Figure 4: Spectrograms characteristic of each sound category from table 1
Computational Modeling- We began our analysis by looking at the input weights $w_{ji}$ of the optimised
mixture density network (Figure 6a). Each subfigure on this plot shows the input weights to a hidden
unit in our network. These are the weights we would expect to correspond to the frequency-tuned
cochlear filters, whose properties can be recorded from the auditory nerve. These subplots have been
reversed in time so that they appear as impulse responses, for comparison with impulse responses
recorded from the auditory nerve of cats [21,22]. The input weights $w_{ji}$ show an oscillatory form, and have
most of their power towards more recent time points (on the left of each subplot). These findings are in
line with previous physiological measurements of cochlear filters, and in contrast to the Lewicki model [7],
which produced back-to-front filters.
These weights were then examined in detail by looking at their magnitude spectrum under a Fourier
transform (Figure 7a). Each column is the input to a hidden unit in our network. We can see that most
units show a distinct frequency peak. In Figure 7b, the number of units with a best frequency in a given
octave above 125Hz is plotted. In the cochlea, we would expect a roughly equal number of units per
octave. However, in our model, low frequencies were overrepresented.
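The best-frequency analysis can be illustrated with NumPy's FFT routines. This is a sketch under assumptions: the function names are hypothetical, no zero-padding or windowing is applied (so with 50 samples at 4000 Hz the frequency resolution is 80 Hz), and the weight matrix is taken to have one column per hidden unit.

```python
import numpy as np

FS = 4000            # sampling rate of the model input (Hz)


def best_frequencies(W_ji, fs=FS):
    """Frequency of the largest Fourier magnitude of each column of W_ji (I, J)."""
    spectrum = np.abs(np.fft.rfft(W_ji, axis=0))         # one magnitude spectrum per unit
    freqs = np.fft.rfftfreq(W_ji.shape[0], d=1.0 / fs)   # 0 Hz up to fs / 2
    return freqs[spectrum.argmax(axis=0)]


def units_per_octave(best_freq, f_low=125.0, n_octaves=4):
    """Count units whose best frequency falls in each octave above f_low."""
    edges = f_low * 2.0 ** np.arange(n_octaves + 1)      # 125, 250, 500, 1000, 2000 Hz
    counts, _ = np.histogram(best_freq, bins=edges)
    return edges, counts


# Toy weight vectors: pure sinusoids at 400 Hz and 1200 Hz, 50 samples each
t = np.arange(50) / FS
toy_weights = np.stack([np.sin(2 * np.pi * 400 * t), np.sin(2 * np.pi * 1200 * t)], axis=1)
toy_best = best_frequencies(toy_weights)
toy_edges, toy_counts = units_per_octave(toy_best)
```

At a 4000 Hz sampling rate the Nyquist frequency is 2000 Hz, so only four octaves above 125Hz are available to the model.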
Each unit's frequency tuning was also examined in terms of its Q10dB (the center frequency divided
by the bandwidth, measured as the range of frequencies with intensity no less than 10dB below the
maximal intensity) which, in auditory nerve physiological data, has a fixed relationship with the best
frequency of the neuron. As Figure 8a shows, plotting the Q10dB against the center frequency shows
a positive correlation (R=0.3) which is significant (p=0.032). This was in line with previous data from cat
auditory nerve fibres [23,24], which are plotted against our data in the same figure. In the Lewicki model,
a similar relationship was found (Figure 8b).
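The Q10dB measure defined above can be computed from a magnitude spectrum as in the following sketch. The function name `q10db` is hypothetical, the spectrum is assumed to have a single main peak, and the bandwidth is measured on the sampled frequency grid without interpolation.

```python
import numpy as np


def q10db(freqs, magnitude):
    """Centre frequency divided by the bandwidth measured 10 dB below the peak.

    Assumes a single-peaked magnitude spectrum sampled at frequencies `freqs`.
    """
    magnitude = np.maximum(magnitude, 1e-12)       # avoid taking the log of zero
    level_db = 20.0 * np.log10(magnitude / magnitude.max())
    peak = int(np.argmax(magnitude))
    above = level_db >= -10.0                      # within 10 dB of the peak
    lo = peak
    while lo > 0 and above[lo - 1]:                # walk down from the peak
        lo -= 1
    hi = peak
    while hi < len(freqs) - 1 and above[hi + 1]:   # walk up from the peak
        hi += 1
    bandwidth = freqs[hi] - freqs[lo]
    return freqs[peak] / bandwidth if bandwidth > 0 else np.inf


# Toy tuning curve: Gaussian peak at 1000 Hz with a 100 Hz standard deviation
toy_freqs = np.arange(0.0, 2001.0, 10.0)
toy_curve = np.exp(-0.5 * ((toy_freqs - 1000.0) / 100.0) ** 2)
toy_q = q10db(toy_freqs, toy_curve)
```

For the toy curve the 10 dB bandwidth on this grid spans 850 Hz to 1150 Hz, giving a Q10dB of 1000/300, or about 3.3.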
Figure 5: The histogram of intensities (as measured by power spectral densities) across frequencies for the refined natural sound database
Figure 6: a) The weight vector to each hidden unit in our model b) The cochlear filters measured in cats in references 21 and 22
Figure 7: a) The magnitude spectrum of the weighting vectors to the hidden units under a Fourier transform; each column is a weight vector b) A histogram of the preferred frequency of the units as assessed by the maximum value of each column on plot a)
Figure 8: a) Our plot of center frequency vs. Q10dB for each hidden unit (black dots) and a line of best fit (black line). Overlaid on this are the best fit lines from past physiological studies (references 23 [blue] and 24 [red])
Auditory Cortical Neuron Responses- In Figures 9a and b, the STRFs for the 5 stimuli presented are
shown. Unfortunately, the neural responses from the experiment were too noisy to allow a
rigorous analysis of how well each stimulus type can predict the response to natural sounds. Only two
neurons showed sufficient regularity for even a preliminary analysis. However, we can still examine the
STRFs for each stimulus type, although any features are based on a very small sample size, and so may
not be representative.
From Figures 9a and 9b, the DRC responses look the noisiest but are of similar preferred frequency to
the STRFs from natural sounds. The responses to the TORCs and randomly modulated noise have a width of
frequency tuning more similar to that of the STRFs from natural sounds, but their preferred frequency
is slightly higher. The cleanest STRFs appear to come from the randomly modulated noise stimuli. The
natural sound STRFs also appear narrower over time than the STRFs from any of the artificial
stimuli.
Figure 9 a) and b) The STRFs, neural responsivity as a function of sound frequency and delay, for two neurons (a and b) as measured using 5 different sound types
Conclusions
Natural Sounds Database- In total 8568 seconds of natural sound were initially gathered, which
yielded 709 seconds of stimulus after filtering for snippets that complied with the conditions
outlined in our methods (Table 1). Most importantly, these files contained no distortions, and analysis
of the power spectra of these sounds showed at least some power above 20kHz (Figure 5), with
the categories for 'leaves', 'twigs' and 'birds' showing pronounced power above this frequency. This is
completely lacking in previous sound databases that have a sampling rate of 44.1kHz, reducing their
relevance to studies involving model animals which can hear frequencies above 20kHz.
We have also demonstrated two uses of our natural sounds database here with encouraging, though
preliminary, results.
Computational Modeling- Our modeling studies produced a mixture density network that could predict
the future values of a sound, but also (and more importantly) showed units whose tuning to frequency
had some similarity to data from the auditory nerve, suggesting that the idea of prediction may have
some value in describing the function of the cochlea. The weighting vector to each hidden unit showed
a degree of oscillation which was, to an extent, similar to that found in the auditory nerve. The steepest
rise of the envelope of our weighting filters was nearer to the present, the reverse of the weighting
filters produced by the Lewicki model, but similar to the experimental data we compared with. The
center frequency/Q10dB relationship per unit that the model produced showed a similar positive
slope to the same relationship found in past in vivo studies. However, the absolute values were
lower by a factor of approximately 2.
Our model had limitations. Firstly, our weighting vectors to each hidden unit did not oscillate over time
as much as those found in the physiological literature. Secondly, we saw an over-representation of low
frequencies compared to the even distribution found in the cochlea. In order to overcome this,
anisotropic Gaussians could be used (which might better allow the variance of the predictions to differ
as one moves into the future), or convolutional neural networks similar to those described by LeCun [25]
could be used.
Auditory Cortical Neuron Responses-Although our experiment did not yield enough data for rigorous
quantitative analysis, the results we did gather suggest that the randomly modulated noise might
produce the cleanest, most reliable, STRFs. However, which kind of stimulus would best predict the
responses to natural sounds remains unclear due to lack of clean data.
Although the results of our study were interesting, at this stage our data are still very preliminary, and
further experiments are needed to confirm the benefits both of our sound database and of our approach
to modeling the auditory system.
The hypothesis that the function of the auditory system [10] (as well as other systems [26,27]) is one of
prediction has only recently emerged. The use of an MDN model to investigate this is entirely novel. As
mentioned, one of the primary goals of this project was to create a sound database that could be used in
further investigation within the field, and so it is hoped not only that our findings might spur future
studies, but also that the database we gathered might be an integral part of these.
A similar paradigm of predictive coding is already emerging in the field of vision, and so it would seem
clear that adapting our model to other modalities would also be a worthwhile future task. As good
natural visual databases already exist [28,29], we could easily apply our model to these.
In conclusion, we have gathered a large database of natural sounds that will prove valuable in further
investigation into the auditory system and will complement artificial stimuli, which, while useful, have the
limitation that they may induce neural responses without physiological relevance.
References
1. Felsen, G. and Dan, Y. "A natural approach to studying vision." Nat Neurosci. 8, no. 12, 1643-1646,
(2005)
2. Pittsburgh Natural Sound Database http://www.cnbc.cmu.edu/cplab/data_NaturalSounds.html
3. Cornell Lab of Ornithology Macaulay Library of Natural Sounds
http://vivo.cornell.edu/display/individual5547
4. Barlow, HB, “Possible principles underlying the transformations of sensory messages.” from
Sensory Communication. (1961)
5. Olshausen, B, and Field, D, "Emergence of simple-cell receptive field properties by learning a
sparse code for natural images." Nature 381 no.6583 607-609, (1996)
6. Lewicki, M. and Olshausen, B. "Probabilistic framework for the adaptation and comparison of image
codes." J. Opt. Soc. Am. A. 16 no. 7, 1587, (1999)
7. Lewicki, M. "Efficient coding of natural sounds." Nat Neurosci. 5, no. 4, 356-363 (2002)
8. Dean, I. Harper, N. and McAlpine, D, "Neural population coding of sound level adapts to
stimulus statistics."Nature Neurosci. 8 no. 12, 1684-1689, (2005)
9. Mesgarani, N. and Chang, E. “Selective cortical representation of attended speaker in multi-talker
speech perception” Nature 485, 233–236 (2012)
10. Winkler, Denham and Nelken “Modeling the auditory scene: predictive regularity
representations and perceptual objects” Trends In Cognitive Science, 13 no.12, 532-40 (2009)
11. Welford, Reaction Times, 1980
12. Smith, E. and Lewicki, M. "Efficient auditory coding." Nature 439 no.7079, 978-982 (2006)
13. Zhao and Zhaoping, “Understanding Auditory Spectro-Temporal Receptive Fields and Their
Changes with Input Statistics by Efficient Coding Principles” PLoS Comp Bio, 7, e1002123,
(2011)
14. Theunissen, F.E. David, S.V. Singh, N.C. Hsu, A Vinje, W.E. and Gallant, J.L. “Estimating
spatio-temporal receptive fields of auditory and visual neurons from their responses to natural
stimuli” Network: Computation in Neural Systems, 12 no. 3, 289-316 (2001)
15. Rabinowitz, N. Willmore, B. Schnupp, J. and King, A. “Contrast Gain Control in Auditory
Cortex” Neuron, 70 no. 6, 1178-1191 (2011)
16. deCharms, C. Blake, D. Merzenich, M. “Optimising sound features for cortical neurons”
Science, 280 no. 53, 1439-1444 (1998)
17. Fritz, J. Shamma, S. Elhilali, M. Klein, D. “Rapid task-related plasticity of spectrotemporal
receptive fields in primary auditory cortex” Nature Neuroscience 6, 1216- 1223 (2003)
18. Rumelhart, D. Hinton, G. and Williams, R. “Learning representations by back propagating
errors” Nature, 323 no. 9, 533 (1986)
19. Bishop, eprint (1994)
20. Singh and Theunissen, “Modulation spectra of natural sounds and ethological theories of auditory
processing” J. Acoust. Soc. Am., 114 no.6, 3394-3411 (2003)
21. de Boer, E. and de Jongh, H. “On cochlear encoding: Potentialities and limitations of the
reverse‐correlation technique” J. Acoust. Soc. Am., 63 no. 1, 115-135 (1978)
22. Carney, L. and Yin, T. “Temporal Coding of Resonances By Low-Frequency Auditory Nerve
Fibers: Single-Fiber Responses and a Population Model” J. Neurophysiol. 60 no. 5, 1653-
1677 (1988)
23. Evans, E. F. Cochlear nerve and cochlear nucleus. in Handbook of Sensory Physiology Vol. 5/2
1–108 (1975).
24. Rhode, W. S. & Smith, P. H. “Characteristics of tone-pip response patterns in relationship to
spontaneous rate in cat auditory nerve fibers.” Hearing Res. 18, 159–168 (1985).
25. LeCun, Y. Bengio, Y. “Convolutional Networks for Images Speech and Time Series” from
Handbook of brain theory and neural networks (1995)
26. Bar, M. “The proactive brain: using analogies and associations to generate predictions” Trends
In Cognitive Science, 11 no.7, 280-289 (2007)
27. Summerfield, C and Egner, T. “Expectation (and attention) in visual cognition” Trends In
Cognitive Science, 11 no.7, 403-409 (2009)
28. Kyoto natural image database http://www.cnbc.cmu.edu/cplab/data_kyoto.html
29. Kayser, C. Einhauser, W. and Konig, P “Temporal correlations of orientations in natural scenes”
Computational Neuroscience: Trends in Research, 52, 117-123 (2003)
30. Depireux, Simon, Klein, Shamma, “Spectro-Temporal Response Field Characterization With
Dynamic Ripples in Ferret Primary Auditory Cortex” J. Neurophysiol. 85 no. 3, 1220-1234
(2001)
31. McDermott, J.H. & Simoncelli, E.P “Sound texture perception via statistics of the auditory
periphery: Evidence from sound synthesis”. Neuron, 71, 926-940 (2011)