
Chapter 4
Decoding Algorithms for Brain–Machine Interfaces

Austin J. Brockmeier and José C. Príncipe

Brain–machine interfaces (BMIs) are a bridge by which neural activity from the brain can interact with the external world, and sensory information may be returned to the brain. In the case of damage along the pathway between the brain and the peripheral muscles in the upper limbs, a motor BMI can bridge the gap and restore a level of interaction with the external world via direct neural control of the motion of computer cursors or robotic arms. Essentially, motor BMIs are constructed by learning a functional model that maps observed neural activity into the kinematic or dynamic variables of continuous movements. This chapter discusses the principles for model estimation relevant to BMIs as well as coverage of successful implementations. Although neural signals relevant to motor control range from populations of single units, local field potentials, electrocorticograms (ECoG), or electroencephalograms (EEG), this chapter specifically reviews the methods and models that use a population of single-unit activity, where action potential timings from single cells have been resolved, to estimate continuous kinematic variables. Chapters 2 and 3 provide coverage of other instances of brain–computer interfaces. After an introduction to the BMI setting, modeling approaches and challenges are emphasized, along with detailed coverage of the most common BMI approaches: from the population vector algorithm, based on individual neuron tuning, to optimal linear filters based on statistical methods such as Kalman and Bayesian filters.

A.J. Brockmeier • J.C. Príncipe (*)
Department of Electrical and Computer Engineering, College of Engineering, University of Florida, Gainesville, FL, USA
e-mail: [email protected]

B. He (ed.), Neural Engineering, DOI 10.1007/978-1-4614-5227-0_4, © Springer Science+Business Media New York 2013


1 The Role and Setting of Motor Brain–Machine Interfaces

The brain integrates sensory information with an internal representation of the world to plan and initiate actions in the environment. Although seemingly effortless to a professional soccer player, the combination of perception, prediction, movement planning, and execution needed to kick a moving ball is far from simple—as any roboticist can attest. The information processing for these tasks is shared between multiple brain areas and the spinal cord, and the final execution of muscle contraction is caused by the firing of motor neurons in the muscles. Damage from illness or disease along the pathways that lead from the brain to the motor neurons may prevent the actual movement execution, even though the brain is still capable of representing movement information. The role of motor BMIs is to extract and translate the information about the intention of movement from brain activity and use this information to control the movement of a computer cursor, robotic device, or even the users' own limbs (through functional electrical stimulation or an exoskeleton). Thus, the design of a motor BMI requires knowledge from both neuroscience—on the theory of motor representation in the cortex—and engineering—in terms of modeling and operation—to establish a communication channel by which the user can efficiently execute movements.

Since the representation of the intention of movement is dependent on the specific user and the sampling of neural signals, BMI designers build general models that can be trained for a specific user. The problem of extracting the movement intention from neural activity is known as decoding. In this chapter, the decoding is in terms of some subset of kinematic variables (position, speed, acceleration); alternatively, BMIs that translate discrete positional goals have also been investigated [1–3], but the discrete goal case will not be covered in this chapter. Chapters 2 and 3 provide coverage of other instances of brain–computer interfaces.

Initially, motor BMIs have been tested on animal models with access to able-bodied movements. Decoding models can be trained with the kinematics of natural movement and tested for their predictive performance. The task is more complicated when there is no initial training set based on natural movements (a characteristic of the target user population). Without natural movement, the underlying intention can be derived from an instruction signal [4] that guides the users' imagined movements and serves as surrogate kinematics. In other cases, a goal-oriented workspace aligns the user's intentions with the BMI's [5, 6].

Rewards can be considered the impetus for performing movements. The brain is able to predict the sequence of movements needed to maximize reward. These predictions are based on sensory input and internal motor models [7–9]. These processes form a perception–action–reward cycle as shown in Fig. 4.1. As the brain forms predictions about the result of motor actions in the environment, it may make corrective actions based on internal models and the incoming sensory information. BMIs serve the role of the bridge between the environment and the brain, and motor BMIs specifically bridge the brain's neural activity to action manifestation.


1.1 Brain–Machine Interfaces: A Case of Neural Decoding

BMIs operate by extracting movement or goal-related variables from neural modulation. This neural decoding can be posed as a system identification problem: the "unknown brain system" maps neural inputs to movement outputs. System identification problems are common in the fields of control theory and adaptive signal processing; thus, these are the engineering areas where many BMI algorithms originate.

System identification models the relationships between inputs and outputs; the technique is known as input–output modeling. By defining the architecture of the model, a BMI designer can impart as much or as little knowledge about the underlying system as desired. Furthermore, defining the inputs and the outputs is just as important; in this chapter, we cover methods that are applicable for an input composed of a population of spiking neurons with kinematic variable outputs.

Finding adequate decoding models for neural data to control movement is a challenge because of a number of factors: noise in neural data collection, irrelevant inputs (i.e., not all probed neurons contribute to the task), and coarseness (i.e., only a subset of the neurons involved in the task are probed), let alone changes in the user's operation. The challenges are highlighted in Table 4.1.

Fig. 4.1 Diagram of the perception–action–reward cycle. The brain does not interact directly with the environment, but is motivated by rewards and predicts the necessary actions. The effects of actions in the environment are perceived through the sensory organs, and the brain processes this information. The role of BMIs can be either of the two arrows that cross the environment/brain boundary


Many tools from statistical signal processing and machine learning have been employed to combat these challenges. One of the goals of this chapter is to cover the underlying terminology of system identification in the context of BMI research, highlight models that have been successfully applied in the BMI literature, and cover the challenges associated with transforming neural modulation into reliable control signals for movement.

1.2 Developing and Deploying BMI Decoding Algorithms

There are two primary phases in testing BMIs: off-line development and closed-loop deployment. The first phase is to develop algorithms and models with prerecorded data in fully able subjects with natural limb movements and test the performance of the algorithms (open-loop or off-line experiments) with quantitative metrics. The testing is typically a twofold validation where the dataset is divided into two parts, with one part used for training and the rest for testing. This phase is necessary to evaluate important details about the appropriateness of the models and the training algorithms (linear versus nonlinear models, robustness to noise, generalization ability, etc.) and to explore preprocessing and user-defined parameter choices. Moreover, it is necessary to accurately compare different algorithms on the same dataset, since in closed-loop deployment task execution may vary between trials. For this reason most quantitative comparisons in the decoding algorithm literature were obtained in off-line scenarios.

In general, off-line results may not be indicative of performance in the second phase, when the user is controlling the device from neural activity (closed-loop experiments). In this scenario, the initial performance may suffer as the device is dissociated from the natural movement and the BMI decoding takes over [10, 11]. Over time, the user may learn to change the neural activity to more effectively control the BMI. Conversely, the quantitative metric used for off-line analysis may penalize constant biases in position or velocity, whereas in closed-loop experiments the user may be able to compensate for these biases, and subsequently performance may actually improve.

During BMI operation, two changes can happen: (1) the user may change their neural modulation to compensate for errors, or (2) there may be functional changes in the biology of the brain as learning takes place (brain plasticity) [12]. Both represent a nonstationary system in engineering terms.

Table 4.1 Crucial challenges of BMI modeling and operation

Challenge                    Approach
Coarse sampling              Electrode design and positioning
Irrelevant inputs            Feature selection
Noise                        Modeling
Changing user operation      Adaptive and state space models


During online operation, the feedback that the user receives about the operation of the BMI during neural control is crucial. Most research studies have used animated cursors as visual feedback displayed on 2D screens or 3D headsets; however, a robotic arm that interacts with the user during self-feeding [13] may invoke different or additional neural processing.

A third stage is also possible: the BMI model is not fixed after initial training, but instead adapted online. In this scenario, there are actually two adaptive systems in the loop: the subject and the BMI. It is a cooperative situation: both the user and the decoding model need to learn co-adaptively to improve performance [10, 14, 15]. This last scenario is unique as the co-adaptation may be able to increase performance to the level of dexterity needed in complex movements.

2 A Review of Decoding Approaches

The advancement of BMIs to the level of science [16] in recent decades is landmarked by closed-loop experiments where the BMI directly controlled external machines (either computer cursors or robotics). The coverage of these is expanded in Appendix A. Much of this progress owes to pioneering early investigations. Single-unit recordings of motor cortex neurons were performed by Evarts [17]. Fetz [18] showed that the modulation of specific neurons could be learned by a monkey through reward reinforcement and visual feedback; some 40 years later Kennedy and Bakay [19] reported on human control of neural firing rate. Of particular relevance to this chapter is the decoding of single-joint kinematics and dynamics from simultaneous recordings of a population of motor cortex neurons by Humphrey et al. [20]. Furthermore, the contributions of researchers who found various correlates between neuron firing and kinematics created a fertile landscape for future realization in BMIs.

Often-highlighted examples of motor decoding in different subjects include closed-loop experiments with able rats [21] and John Donoghue's group with human subjects [4]. Open-loop real-time motor decoding [22] and closed-loop motor BMIs [10, 23] were established in able-bodied nonhuman primates and laid the foundation for ongoing development and experimentation.

Closed-loop experiments are more expensive in terms of cost, time, and labor than off-line testing. For a new algorithm to be tested in closed loop, it must first show substantial, proven gains in off-line performance. Closed-loop experiments have therefore tended to be conservative in choosing decoding techniques, relying heavily on linear approaches. Different nonlinear decoding techniques have also been used in off-line modeling and analysis to ascertain properties of the neural activity, such as the shape of the nonlinearity and the tuning strength.

Out of the experiments found in the literature, the majority used linear methods during the online portion of the experiment. Discriminative linear regression was used in [4, 11, 21, 23–26].


The population vector algorithm [27] is a generative linear model used in a number of experiments [5, 10, 13, 28, 29]. In addition, Chase et al. [30] compared the population vector algorithm to a more general linear generative model [31]. Bayesian state space generative models—especially the Kalman filter—have also been used [32–35], and the linear assumptions are compared in [36]. Nonlinear mapping via a kernel-based autoregressive moving average filter was tested in [37]. The rest of the chapter is dedicated to exploring some of the engineering necessary for off-line testing and closed-loop deployment of BMI decoding algorithms.

3 Neural Signals for Motor Brain–Machine Interfaces

Brain–machine interfaces based on cortical neural recordings stem from a century of neurophysiology experiments. However, it is in the past three decades that groundbreaking developments have occurred: the development of multielectrode arrays, instrumentation, and faster computing that allowed the simultaneous recording and processing of multiple neurons, and improved neuroscience understanding of how neural activity elicits motor movements. Microelectrode arrays implanted into the motor cortex and related areas can capture the activity of single neurons that have characteristics desired for motor BMIs.

Recording from a population of single units is both precise and invasive. Recordings are precise in the sense that, given two units recorded on a single electrode, one may have a firing rate that is highly correlated with the intention of movement, whereas the other may be completely independent. The overall population of recorded neurons may be quite heterogeneous in function.

Microelectrode arrays capture the time-varying extracellular potential at each recording location. Single-unit action potentials exist in the high portion of the spectrum (300 Hz to 6 kHz) of this signal, and the low-frequency portion of the spectrum (0.5–300 Hz) is often referred to as local field potential (LFP) activity. Single-unit action potentials, or spikes, are isolated from the high-frequency portion of the voltage trace in two steps: first, potential spike times are found by identifying when the signal crosses a threshold, and second, each specific neuron's spikes are discriminated based on the shape of the action potential waveform. This second process is called spike sorting; there has been substantial work on automatic spike sorting, and it remains an area of ongoing research. Sorting involves matching the waveforms surrounding a threshold crossing to a set of shape templates. Defining the shape templates for the different units can be done as either a supervised or unsupervised process. Since noise can also cause a spurious threshold crossing, waveforms that do not sufficiently match any of the templates are discarded. Whether sorted to separate the different units or not, spiking activity no longer contains amplitude information and is encoded solely in a sequence of times called a spike train. Spiking activity cannot be directly used in algorithms that require real-valued vectors or scalars.


More precisely, the sequence of spike times (the spike train) is a realization of a point process and can be modeled probabilistically, but most BMI algorithms still cannot use spike trains directly.

To use spike times as real-valued vectors, they are transformed into a discrete-time rate function by either smoothing (similar to kernel density estimation) or binning the spikes (using a histogram of non-overlapping windows with a prespecified bin width; see Fig. 4.2). A bin width or kernel width in the hundreds of milliseconds is often used for a smooth estimate of the firing rate. The conversion from spike times to continuous rates is representative of the conversion from motor neuron discharges into graded muscle contractions.

The exact form of binning and smoothing may differ; some generative models can have fine time resolution—essentially binless—because they directly treat the spike times as realizations from point processes. Models that use spike counts may make assumptions that may or may not be valid depending on the exact smoothing or binning properties. For instance, spike counts are more likely to approach a Gaussian distribution when the bin width is large. Thus, preprocessing steps such as picking the bin width or smoothing must be validated to match the algorithm requirements. Different bin widths may produce different results even when using the same algorithm. So these parameters must be part of the design process, and they can be chosen through cross-validation along with other model parameters [38].
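To make the preprocessing concrete, the following Python/NumPy sketch bins spike times into counts and optionally smooths them into rates. It is illustrative only; the function names, the 100 ms bin width, and the Gaussian kernel width are assumptions and are not taken from the chapter's published code.

```python
import numpy as np

def bin_spike_trains(spike_times, duration, bin_width=0.1):
    """Convert per-unit spike times (in seconds) into a (n_bins, n_units)
    matrix of counts using non-overlapping bins of bin_width seconds."""
    edges = np.arange(0.0, duration + bin_width, bin_width)
    return np.column_stack([np.histogram(st, bins=edges)[0] for st in spike_times])

def smooth_rates(counts, bin_width=0.1, kernel_sd=0.15):
    """Smooth binned counts with a Gaussian kernel (an alternative to plain
    binning) and convert them to rates in spikes per second."""
    half = int(np.ceil(3 * kernel_sd / bin_width))
    t = np.arange(-half, half + 1) * bin_width
    kernel = np.exp(-0.5 * (t / kernel_sd) ** 2)
    kernel /= kernel.sum()
    smoothed = np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"),
                                   0, counts)
    return smoothed / bin_width

# Synthetic example: two units firing randomly over 5 s, binned at 100 ms.
rng = np.random.default_rng(0)
spikes = [np.sort(rng.uniform(0, 5, 40)), np.sort(rng.uniform(0, 5, 25))]
X = bin_spike_trains(spikes, duration=5.0)
R = smooth_rates(X)
print(X.shape, R.shape)
```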

4 Input–Output Modeling: Motor Decoding from Neural Signals

4.1 System Architecture, Performance Criteria, and Adaptation

Translating input signals to output signals (neural modulation to kinematics) requires defining the system architecture.

Fig. 4.2 The preprocessing steps and signal flow. After initial amplifying, filtering, and sampling, the signal is low-pass filtered if destined for LFP analysis and high-pass filtered for spike detection. To resolve a spike train, the signal is thresholded and sorted. Binning the spike train is the final step


In classical signal processing, both the system architecture and the coefficients, or parameters, are defined a priori. This approach is not feasible in the BMI setting because the system characteristics that translate a specific user's recorded neural activity into motor movement trajectories are unknown; thus, the system must be identified. The identified system is assumed to be a function from a certain family. The family defines the architecture a priori, but leaves the specific parameters to be estimated from the statistics of available data. Statistics include empirical estimates of the mean, variance, covariance, auto-correlation function, cross-correlation function, or the full distribution. System identification learns the parameters of a model such that the output fits the desired (target) signal in training data, where fitting is defined by a statistical criterion.

In the subsequent coverage, system architectures are grouped into two classes: generative and discriminative. For the case of BMI data, let us call the neural activity x and the kinematics y. A generative model forms an estimate of the likelihood p(x|y)—that is, the probability of neural activity given the kinematics—then, during BMI operation, Bayes' rule is used to pick the most likely y to explain the neural activity. A discriminative model directly learns a mapping from the neural activity x to kinematics y that minimizes the prediction error. For a generative model the neural activity is a function of the kinematics, but for a discriminative model the kinematics are a function of the neural activity. In either case, the function or its inverse is learned so that the desired kinematics in online operation can be predicted from the ongoing neural activity.

Parameters are learned to fit the data in terms of a goodness-of-fit criterion. This criterion is the ruler for performance, and its choice depends in part on the system architecture. The most tractable criterion for discriminative models is based on the mean-squared error (MSE). For generative models the criterion is based on the distributions of the output data; maximum likelihood (ML) and maximum a posteriori (MAP) estimation are typically used. Given a criterion, there exists a set (not necessarily unique) of system parameters that minimizes the discrepancy (maximizes the fit) between the system output and the desired (target) signal on the training set. If one assumes that the system does not change and certain aspects of the optimization are fulfilled, the system output can then be treated as the estimated output in online operation, when the desired response is no longer available.

Various algorithms exist to analytically compute or incrementally update the system parameters to meet any given criterion. A generic diagram of the system identification problem is shown in Fig. 4.3.

4.2 Achieving Generalization Without Overfitting

There are a number of challenges with any model or decoding technique for BMIs; see Table 4.2 for an overview. One challenge of BMI systems is that they are multiple-input and multiple-output (MIMO) systems. The number of parameters in the system will scale with the dimensionality of the inputs and outputs.


Unless there is a priori knowledge about independence between parameters, the amount of data required to accurately fit the model also increases. For example, in discriminative models the dimensions of the kinematic output are estimated independently, so more output dimensions do not necessitate more data, but the correlation structure of the neural activity needs to be estimated, and the amount of required data grows with the number of inputs. With limits on data, this can lead to BMI models that perform well on the training data but suffer from poor performance on novel data, a condition known as overfitting. In addition, some analytic solutions involving matrix inversions may become unstable without sufficient data.

A general solution to ill-posed problems is adding constraints to the performance criterion. These constraints regularize the solution and avoid overfitting, but this approach biases the final solution away from the optimal. Alternatively, choosing a parsimonious architecture and avoiding large matrix inversions can improve results. For generative models, this is often achieved by two assumptions: conditionally independent neurons (which avoids the estimation of a large covariance matrix) and first-order Markov assumptions on kinematic state transitions—i.e., the current state is independent of all past states given just the previous state. Model selection criteria, namely the Akaike Information Criterion (AIC) [39] and the Bayesian Information Criterion (BIC) [40], quantitatively compare multiple models.

Model selection criteria balance two quantities: the training set performance and the number of parameters. In general, having many more parameters will lead to better performance on the training set but potentially poor generalization, whereas a model with fewer parameters may have moderate training set performance but better generalization; the second situation is preferred.
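As a worked illustration of this balance, the sketch below computes AIC and BIC for linear decoders of increasing filter order, assuming Gaussian residuals; the helper name and the synthetic data are illustrative assumptions, not material from the chapter.

```python
import numpy as np

def gaussian_aic_bic(y_true, y_pred, n_params):
    """AIC = 2k - 2 ln L and BIC = k ln N - 2 ln L for Gaussian residuals,
    with the noise variance replaced by its maximum-likelihood estimate."""
    resid = y_true - y_pred
    n = resid.size
    sigma2 = np.mean(resid ** 2)
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)
    return 2 * n_params - 2 * loglik, n_params * np.log(n) - 2 * loglik

# Compare filter orders on synthetic data: extra lags always improve the
# training fit, but the information criteria penalize the extra parameters.
rng = np.random.default_rng(1)
x = rng.standard_normal(500)
y = 0.8 * np.roll(x, 1) + 0.1 * rng.standard_normal(500)
for L in (1, 5, 10):
    Phi = np.column_stack([np.roll(x, lag + 1) for lag in range(L)])
    w = np.linalg.lstsq(Phi, y, rcond=None)[0]
    aic, bic = gaussian_aic_bic(y, Phi @ w, n_params=L)
    print(f"L={L:2d}  AIC={aic:9.1f}  BIC={bic:9.1f}")
```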

The heterogeneous nature of neurons and brain areas is such that many neural input signals are not associated with the desired movements at all; thus, they appear to add only noise to the system. System performance is improved by selecting only those channels that are most associated with the desired variables; likewise, performance is degraded in systems that incorporate large numbers of independent but widely varying inputs [5, 41].

Fig. 4.3 System identification. The system model has access to the original input and noisy observations of the true system output. The model is updated via an adaptation algorithm that uses the difference between the desired signal and the model-estimated output


Pruning the inputs can improve performance. Feature selection is the principled and automatic method of selecting the subset of inputs that yields the highest performance. The most common approach is to use a subset of neurons whose firing rate is highly correlated with the desired variable. This can be done in a greedy approach where the most correlated neurons are selected first, followed by less-correlated neurons, until a predetermined stopping criterion is met. As an alternative, information-theoretic criteria, such as the mutual information between input and output, can be used for feature selection. Other approaches include: dimensionality reduction on the input by methods such as principal component analysis (PCA) [42] (this is suboptimal for modeling since the principal component projection is agnostic to the desired signal), adding regularization terms as in ridge regression [26], or adding L1 costs on parameters as in LASSO [43, 44]. In principle, the concept of maximum relevance and minimum redundancy can guide feature selection. For BMIs this means picking a set of neurons that are most relevant for different portions of the movement space, yet are themselves uncorrelated [20], so that there is minimum redundancy between them.
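A minimal sketch of correlation-ranked neuron selection in Python/NumPy, under the assumption that the binned counts and the target kinematic variable are already aligned; the function name, the synthetic data, and the stopping criterion (a fixed number of neurons) are illustrative choices, not the chapter's.

```python
import numpy as np

def rank_neurons_by_correlation(X, y, max_neurons=10):
    """Rank neurons (columns of the binned-count matrix X) by the absolute
    correlation of their firing rate with the kinematic variable y, and keep
    the top max_neurons (a fixed-size stand-in for a stopping criterion)."""
    corrs = np.array([np.corrcoef(X[:, i], y)[0, 1] for i in range(X.shape[1])])
    corrs = np.nan_to_num(np.abs(corrs))   # silent neurons yield NaN correlation
    order = np.argsort(corrs)[::-1][:max_neurons]
    return order, corrs[order]

# Synthetic example: 30 Poisson-firing neurons, only the first 3 are tuned.
rng = np.random.default_rng(2)
y = np.cumsum(rng.standard_normal(1000))
X = rng.poisson(2.0, size=(1000, 30)).astype(float)
X[:, :3] += 0.5 * (y - y.min())[:, None] / np.ptp(y)
selected, scores = rank_neurons_by_correlation(X, y, max_neurons=5)
print(selected, np.round(scores, 2))
```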

Stationarity is also a prerequisite for generalization. Stationarity assumes that the system characteristics captured by the model do not change over time—or that they change slowly enough to track. Specific architectures (switching state space, mixture of experts, hidden states) are designed to be less restrictive on what aspects of the signal must stay stationary. The increased flexibility may allow the model to capture multiple regimes, each with distinct characteristics. This added flexibility inevitably comes at the cost of higher model complexity, which can also lead to overfitting; these approaches have been applied in BMI studies [45–47]. Alternatively, if the nonstationarity is not just a matter of proper modeling but is instead due to brain plasticity, then the best approach is an adaptive model that is updated so as to track and incorporate the changing system characteristics.

Table 4.2 Modeling challenges of BMIs

Challenge: Numerous, irrelevant, or noisy inputs
Approach: Feature selection (greedy, with cross-correlation or mutual information); dimensionality reduction (PCA); regularization (ridge regression, LASSO)

Challenge: Choosing the model size
Approach: Model selection (via AIC or BIC); independence assumptions (Markovian state space, conditionally independent neurons)

Challenge: Nonstationarity (multiple regimes)
Approach: Discrete mixture of continuous models (switching state space, competitive mixture of experts)

Challenge: Nonstationarity (brain plasticity)
Approach: Adaptive models


4.3 Experimental Dataset for Method Comparisons: Natural Reaching Task

To illustrate some of the decoding algorithms discussed in the following sections, a single dataset was used for comparing test performance. The data are from Dr. Nicolelis's primate laboratory at Duke University and were originally published in [22], with further off-line studies in [42, 45, 46, 48–50]. The data are recorded from an owl monkey's cortex while the animal was performing a food-reaching task with a single arm. Multiple microwire arrays record from 104 neural cells in multiple cortical areas: posterior parietal cortex, left and right primary motor cortex, and dorsal premotor cortex. Synchronous recordings provide the reaching hand's position in three dimensions. The spikes are binned at 100 ms, and the three dimensions (x, y, and z) of hand position are also sampled at 10 Hz. In this dataset, there are over 38 min of data during which the animal would reach for and eat food sitting on two different trays, while in between reaches the animal returns its hand to a resting location. In our experiments we used 200 s of training data and 35 min of testing data. The sections covering the Wiener filter, ridge regression, linear generative models, and Kalman filter use the same data. The code for the examples is made available at http://cnel.ufl.edu/bmi

The decoding algorithms are assessed in terms of the cross-correlation between the true hand position and the decoded hand position, with each dimension of position assessed individually. The cross-correlation here is the correlation coefficient, and the estimate approaches 0 for independence and 1 for linear correlation. Because of the long periods of no movement between food reaches, two values are given: the first is the cross-correlation during movement and the second is the cross-correlation during the entire test set.
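The evaluation described above might be computed as in the following sketch; the variable names and the idea of flagging movement with a speed threshold are illustrative assumptions rather than details taken from the chapter's code.

```python
import numpy as np

def decoding_correlation(y_true, y_pred, mask=None):
    """Correlation coefficient between true and decoded trajectories, computed
    per kinematic dimension; if mask is given, only those samples (e.g., bins
    where the hand is moving) are included."""
    if mask is not None:
        y_true, y_pred = y_true[mask], y_pred[mask]
    return np.array([np.corrcoef(y_true[:, d], y_pred[:, d])[0, 1]
                     for d in range(y_true.shape[1])])

# Hypothetical usage, with y_test and y_hat as (n_bins, 3) position traces and
# "moving" a boolean mask derived from a small speed threshold:
# cc_move = decoding_correlation(y_test, y_hat, mask=moving)
# cc_all  = decoding_correlation(y_test, y_hat)
```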

5 Applying Linear Modeling to BMIs

Linear modeling with parameters estimated via least squares is a straightforward black-box approach to BMI decoding, where the inputs are typically binned spike counts from a population of neurons and the outputs are a set of kinematic variables. The approach is a black-box model because, except for choosing the input, there is no prior knowledge added to the solution. Indeed, in most cases each one of the kinematic variables is estimated independently from the others. Most engineering textbooks consider the single-input case, but for BMIs it is relevant to capture the correlation structure among multiple neurons. As the number of inputs grows, the number of parameters grows at a quadratic rate. This may lead to poor estimation and generalization. Furthermore, linear modeling assumes a stationary correlation among the inputs and between the input and the output. Therefore, a thorough understanding of the assumptions and limitations of linear models is needed for proper application.


To cover the general linear equation, let us define some quantities illustrated in Fig. 4.4:

$y_c(n)$—the cth kinematic variable at time step n
$x_i(n)$—the spike count for the ith neuron (out of M) at time step n
$a_c$—an offset for the cth kinematic variable
$L$—the number of causal lags to use; the length of the discrete time-delay line
$\{w_{c,i}^{l}\}$—the set of weights, one weight for every combination of kinematic variable, lag, and neuron

A linear model predicts the cth kinematic variable at time step n from the lagged spike counts. The estimate is a weighted combination of firing rates across the L lags and the M neurons:

$$ \hat{y}_c(n) = a_c + \sum_{i=1}^{M} \sum_{l=1}^{L} x_i(n-l)\, w_{c,i}^{l}. $$

The minimum mean square error criterion (MMSE) is

$$ \{w_{c,i}^{l}\} = \arg\min\, E\!\left[\lVert e\rVert^{2}\right], \qquad \lVert e\rVert^{2} = e^{T}e, \qquad e = \left[\,(y_1 - \hat{y}_1)\;\cdots\;(y_C - \hat{y}_C)\,\right]^{T}. $$

For discrete time series prediction, the Wiener–Hopf solution is the analytic solution for finding the optimum weights based on a mean-square error cost; the Wiener–Hopf solution extends least-squares regression to time series [51–53] and is also referred to as the Wiener filter or the optimal linear decoder.


Fig. 4.4 (a) Finite impulse response (FIR) linear filter with a tapped-delay line of length L. The z-transform notation $z^{-1}$ indicates a single time delay. (b) Multiple-input and multiple-output FIR system. For each output (1…C) there is a vector of L weights for every neuron (1…M), resulting in a total of L · M · C weights. In the diagram only the cth output is shown for clarity


Analytic linear filters in various forms (both generative and discriminative; studied separately here) have been the workhorses of many BMI studies because of their ease of implementation. Analytic solutions compute the best parameters from all available data.

As an alternative to analytic solutions, a stochastic update rule that iteratively moves the system parameters toward the optimum from an initial setting may be desired. The incremental updates are less computationally intensive and allow the system to track changes over time [42]. The most common stochastic update is the least mean squares (LMS) algorithm, which uses online gradient descent to approach a neighborhood of the optimum. The recursive least squares (RLS) algorithm, which converges to the Wiener solution, has also been tested in BMI applications [54].

5.1 Wiener Filter for BMIs: Multiple-input and Multiple-output

As mentioned, the Wiener filter is the optimal set of weights for a finite impulse response (FIR) filter, optimal in the sense that it minimizes the mean-square error of the estimate. The estimate is constructed from a transfer function between the spike counts and the kinematic variable. This transfer function has a finite impulse response, as it uses only the L past values from each neuron.

One drawback of the Wiener filter is that it does not capture the dependencies between output dimensions; thus, it is necessary to estimate each orthogonal dimension (analogous to the x, y, and z dimensions) of a single kinematic quantity such as velocity or position. If both position and velocity are estimated, there is no natural way to combine the two predicted quantities. The choice of which kinematics to decode may vary depending on the movements required in the task.

The Wiener filter assumes that the modeled system is wide-sense stationary, so the mean and correlation between the kinematics and different neurons do not change in time. Unlike standard regression, where there are multiple independent observations of the same multivariate random variable, in the time domain there is one random variable for each time instance, and there are correlations amongst them that need to be quantified. The correlation between pairs of random variables at different lags is quantified with the cross-correlation function. For the case of the same signal this is the autocorrelation function. Since for nonperiodic signals these correlations decay with increasing lag, only a few need to be included in the analysis. The number of lags included is the filter order L, and this can be chosen by inspection of the cross-correlation between the spike counts and the desired movements or through cross-validation [38].

Estimating the correlation structure of the underlying random process from a single realization using multiple time windows requires an ergodic assumption. This allows the autocorrelation or cross-correlation to be estimated via time averages of the observations. We will denote two generic random processes as $u_i(t)$ and $u_j(t)$. These processes could represent either the spike count of a neuron or a kinematic variable. The theoretical analysis often assumes that the random processes have zero mean.


If the kinematic output has a nonzero mean, the mean should be removed and added back after regression. The mean spike count can also be removed from the rates, but for small bin sizes it typically does not affect the solution.

For a wide-sense stationary system, the cross-correlation is shift invariant: $R_{u_i u_j}(t_1, t_2) = E[u_i(t_1)u_j(t_2)] = E[u_i(t_1 - t_2)u_j(0)]\;\forall t_1, t_2$. This can be abbreviated as $R_{u_i u_j}(\tau) = R_{u_i u_j}(t_1, t_2)$ with $\tau = t_1 - t_2$. For a finite observation window of N samples, there are two common options for estimating this, which differ only in the normalization. The first assumes that the processes are zero outside of the window, and the second uses smaller windows that are subsets of the full observation size. Here we use only the first option.

$$
\hat{R}_{u_i u_j}(\tau) \approx
\begin{cases}
\frac{1}{N-1}\sum_{n=1}^{N-\tau} u_i(n)\,u_j(n+\tau), & \tau \ge 0,\\
\frac{1}{N-1}\sum_{n=1}^{N+\tau} u_i(n-\tau)\,u_j(n), & \tau < 0.
\end{cases}
$$

Given this estimator of correlation, the rest of the Wiener filter is matrix algebra. For the case of multiple inputs in BMI decoding, we build one large cross-correlation matrix blockwise from the individual neuron cross-correlation matrices. The cross-correlation matrix between the neurons should capture statistical correlation between neurons caused by synchronous firing [55].

Denote the cross-correlation matrix between the spike counts of neurons i and j over the L lags as

$$
r_{x_i x_j} =
\begin{bmatrix}
R_{x_i x_j}(0) & \cdots & R_{x_i x_j}(L-1)\\
\vdots & \ddots & \vdots\\
R_{x_i x_j}(1-L) & \cdots & R_{x_i x_j}(0)
\end{bmatrix}
$$

Form the multiple-input cross-correlation matrix for all M neurons blockwise as

$$
R =
\begin{bmatrix}
r_{x_1 x_1} & r_{x_1 x_2} & \cdots & r_{x_1 x_M}\\
r_{x_2 x_1} & r_{x_2 x_2} & \cdots & r_{x_2 x_M}\\
\vdots & \vdots & \ddots & \vdots\\
r_{x_M x_1} & r_{x_M x_2} & \cdots & r_{x_M x_M}
\end{bmatrix}
$$

The entries on the diagonal are the autocorrelation matrices for each of the neurons. Now the cross-correlation between each neuron and each kinematic variable, with a one-step delay for prediction, is calculated. Denote the cross-correlation vector for the spike count $x_i$ and the kinematic variable $y_c$ as

$$
p_{x_i y_c} =
\begin{bmatrix}
R_{x_i y_c}(-1)\\
R_{x_i y_c}(-2)\\
\vdots\\
R_{x_i y_c}(-L)
\end{bmatrix}
$$


Form the neuron–kinematic cross-correlation matrix P over all M neurons for all C kinematic variables as

$$
P =
\begin{bmatrix}
p_{x_1 y_1} & \cdots & p_{x_1 y_C}\\
\vdots & \ddots & \vdots\\
p_{x_M y_1} & \cdots & p_{x_M y_C}
\end{bmatrix}
$$

Then the optimal weights are computed by multiplying the inverse of the neuron correlation matrix by the neuron–kinematic correlation matrix:

$$
\begin{bmatrix}
w_{1,1}^{1} & \cdots & w_{C,1}^{1}\\
w_{1,1}^{2} & \cdots & w_{C,1}^{2}\\
\vdots & \ddots & \vdots\\
w_{1,1}^{L} & \cdots & w_{C,1}^{L}\\
\vdots & \ddots & \vdots\\
w_{1,M}^{1} & \cdots & w_{C,M}^{1}\\
\vdots & \ddots & \vdots\\
w_{1,M}^{L} & \cdots & w_{C,M}^{L}
\end{bmatrix}
= R^{-1} P.
$$

Each column of the resulting matrix corresponds to the weights for a single kinematic variable. In each column the weights are arranged for all lags of the first neuron, followed by all lags of the next neuron, and so on.
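A compact way to realize this solution in practice is to stack the lagged spike counts into a design matrix and solve the normal equations, which is the sample-based counterpart of forming R and P explicitly (up to how the edges of the window are handled). The Python/NumPy sketch below is an illustrative implementation under that assumption; the function names are hypothetical and this is not the chapter's published code.

```python
import numpy as np

def build_lagged_design(X, L):
    """Stack L causal lags of each neuron's spike counts. X is (N, M); the
    result is (N, M*L) with the columns ordered as all lags of neuron 1,
    then all lags of neuron 2, and so on (zeros before the first sample)."""
    N, M = X.shape
    Phi = np.zeros((N, M * L))
    for i in range(M):
        for lag in range(1, L + 1):
            Phi[lag:, i * L + lag - 1] = X[:-lag, i]
    return Phi

def wiener_filter(X, Y, L):
    """Least-squares (Wiener) MIMO FIR decoder. Solving the normal equations
    (Phi^T Phi) W = Phi^T Y on the training data plays the role of R^{-1} P."""
    Phi = build_lagged_design(X, L)
    a = Y.mean(axis=0)                                # per-output offsets
    W = np.linalg.solve(Phi.T @ Phi, Phi.T @ (Y - a))
    return W, a

def wiener_predict(X, W, a, L):
    return build_lagged_design(X, L) @ W + a
```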

On the basis of the construction of the spike count cross-correlation matrix, it can be inverted using Cholesky factorization, which reduces the computational cost relative to general Gaussian elimination. For large matrices with a small number of samples the matrix may be ill-conditioned; thus, it is best to regularize it by adding a scaled identity matrix—a technique known as ridge regression and employed in the BMI setting by [26]. The scaling parameter λ can be chosen through cross-validation—i.e., training multiple filters on a subset of the training data and choosing the filter that performs best on the remaining subset of the training data.

$$
\tilde{R} = R + \lambda
\begin{bmatrix}
1 & 0 & \cdots & 0\\
0 & 1 & & 0\\
\vdots & & \ddots & \vdots\\
0 & 0 & \cdots & 1
\end{bmatrix}
$$

Applying the Wiener Filter and Ridge Regression

Here the Wiener filter with and without ridge regression is applied to predict the hand position of a monkey during a natural reaching task. A segment of the three dimensions of the hand position vector is shown during the testing phase in Fig. 4.5. Using ridge regression (with cross-validated λ) improves the average cross-correlation during the moving sections of the dataset.
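A hedged sketch of how the ridge-regularized variant and the λ search might look, reusing build_lagged_design from the previous sketch; the candidate λ values and the simple hold-out split (in place of full cross-validation) are illustrative assumptions.

```python
import numpy as np

def ridge_wiener(X_train, Y_train, L, lambdas=(0.0, 0.1, 1.0, 10.0, 100.0)):
    """Lagged linear decoder with an L2 penalty; lambda is picked on a simple
    80/20 split of the training data as a stand-in for full cross-validation."""
    Phi = build_lagged_design(X_train, L)      # helper from the previous sketch
    a = Y_train.mean(axis=0)
    Yc = Y_train - a
    n_fit = int(0.8 * len(Phi))
    I = np.eye(Phi.shape[1])
    best_lam, best_err = None, np.inf
    for lam in lambdas:
        W = np.linalg.solve(Phi[:n_fit].T @ Phi[:n_fit] + lam * I,
                            Phi[:n_fit].T @ Yc[:n_fit])
        err = np.mean((Phi[n_fit:] @ W - Yc[n_fit:]) ** 2)
        if err < best_err:
            best_lam, best_err = lam, err
    # Refit on the full training set with the selected regularization.
    W = np.linalg.solve(Phi.T @ Phi + best_lam * I, Phi.T @ Yc)
    return W, a, best_lam
```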


5.2 Least Mean Squares

Iterative algorithms use online adaptations to incrementally update the parameters in system identification. This is in contrast to analytic solutions that use all available data in a window at one time. Iterative algorithms are useful when there is no initial training data but there is a known desired signal during operation to be used as a target for the error. A simple, robust, and widely used adaptive algorithm is normalized least mean squares (NLMS). For BMIs, NLMS has been compared in off-line analysis to the Wiener solution [42, 48]. The algorithm is derived from the least mean square (LMS) algorithm by Widrow and Hoff [56], which is in the family of algorithms based on gradient descent. At every sample/step, gradient descent corrects the system parameters by translating them in the opposite direction of the gradient of the cost function $\nabla J(n)$. This update is $w(n+1) = w(n) - \eta \nabla J(n)$, where $\eta$ is called the step size. When using the mean square error cost function, the gradient is still unknown and must be estimated.

Fig. 4.5 Comparison of the Wiener filter with and without ridge regression on the natural reaching task [22] during a segment of the test set. The filter length is 3 for both approaches. Visually, ridge regression not only decreases the variance during rest, but also underestimates the reach length. In terms of cross-correlation, ridge regression helps on average during the moving portions of the test set


The simplest approach is to estimate the mean square error with only the latest error: $J(n) = \lVert e(n)\rVert^{2} = \lVert d(n) - x(n)w(n)\rVert^{2}$. Then the instantaneous (noisy) gradient with respect to the last weights is the negative of the error times the input. The least mean square (LMS) algorithm updates the weights accordingly: $w(n+1) = w(n) + \eta\, e(n)\, x(n)$.

The simplicity of this algorithm—with two multiplications and an addition per weight parameter—allows embedding in hardware such as microcontrollers. The computational complexity is much less than that of an analytic solution; on the other hand, there is the addition of the free parameter $\eta$. LMS converges to a neighborhood of the optimal solution instead of the optimal solution; the size of this neighborhood is controlled by $\eta$ and is referred to as the misadjustment. The misadjustment is the normalized excess MSE—the additional MSE beyond the optimal least squares solution. There is an inherent tradeoff between faster adaptation and lower misadjustment. Choosing the step size for data that have unknown variance adds to the difficulty; the NLMS algorithm replaces the fixed step size with a normalizing factor based on the instantaneous power of the input. This increases the robustness of LMS in noisy situations. These issues and their application to BMIs are further discussed in [52].
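For concreteness, a minimal NLMS loop for a single output is sketched below (Python/NumPy); the step size and the small regularizing constant in the denominator are illustrative choices.

```python
import numpy as np

def nlms(X, d, eta=0.5, eps=1e-6):
    """Normalized LMS adaptation of a single-output linear decoder.
    X: (N, p) inputs (e.g., lagged spike counts); d: (N,) desired signal."""
    w = np.zeros(X.shape[1])
    y = np.zeros(len(d))
    for n in range(len(d)):
        x = X[n]
        y[n] = x @ w                          # prediction with current weights
        e = d[n] - y[n]                       # instantaneous error
        w += (eta / (eps + x @ x)) * e * x    # step size normalized by input power
    return w, y
```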

6 Nonlinear Discriminative Models

The combination of a linear model with a mean-square error cost function allows an analytic solution for optimization, and adaptive systems with no risk of getting stuck in a local minimum of the cost function (there still remains the risk of slow convergence). However, this comes at a cost: linear functions are only a small subset of all possible functions. For example, linear models are not able to capture the sigmoid shape of how the firing rate of some cells relates to velocity [57]. Yet finding an optimal nonlinear function is an ill-posed problem. To this end we explore two approaches: artificial neural networks and kernel methods.

Artificial neural networks define general networks of linear summations and nonlinear activation functions; for networks of sufficient size they can approximate any function to an arbitrary error (if properly trained). This is analogous to how any function can be approximated to a given error by a polynomial of sufficient degree. Artificial neural networks have an architecture defined a priori and use iterative algorithms to find the parameters that fit the data.

Kernel methods construct mapping functions from predefined kernel functions that operate on the data samples. With properly chosen kernels, a nonlinear mapping from the input space to the feature space is realized as a linear combination of the kernel function evaluated at each of the samples. The cost function is convex, so kernel methods realize a nonlinear mapping with no local minima. Mathematically motivated, this data-driven approach allows kernel methods to succeed in a variety of applications, including recent applications to BMIs. Due to the mathematics required we do not cover kernel methods in this chapter, but refer the interested reader to [37].


6.1 Artificial Neural Network

A time-delay neural network (TDNN) is a form of nonlinear model for system identification formed by concatenating a tapped-delay line with a multilayer perceptron (MLP). The basic building block of an ANN is the perceptron, a linear summation followed by a fixed nonlinearity, as shown in Fig. 4.6. An MLP is a multilayer feedforward network formed from multiple perceptrons, as shown in Fig. 4.7. In a TDNN, shown in Fig. 4.8, the delay line allows the system to work with past data (just like the FIR filter), but the nonlinear units in the MLP yield nonlinear (potentially universal) input–output mapping functions. The other advantage of the TDNN is that the weights (its parameters) can still be easily estimated using the back-propagation algorithm. Back-propagation is the application of the chain rule to find the gradient of the error with respect to each parameter in the network.
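One way to prototype such a decoder is to feed the lag-embedded spike counts (the build_lagged_design helper sketched in Sect. 5.1) into an off-the-shelf single-hidden-layer MLP trained by back-propagation, as in the Python sketch below; the scikit-learn estimator, layer size, and training settings are illustrative assumptions, not the chapter's implementation.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def fit_tdnn(X_train, Y_train, L=3, hidden_units=20):
    """Emulate a TDNN: a tapped-delay embedding of the spike counts feeds a
    single-hidden-layer MLP trained by back-propagation (via scikit-learn)."""
    Phi = build_lagged_design(X_train, L)   # lag embedding from the Sect. 5.1 sketch
    net = MLPRegressor(hidden_layer_sizes=(hidden_units,), activation="tanh",
                       solver="adam", max_iter=2000, random_state=0)
    net.fit(Phi, Y_train)
    return net

def tdnn_predict(net, X_test, L=3):
    return net.predict(build_lagged_design(X_test, L))
```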


Fig. 4.6 (a) Perceptron, formed by a linear combination of inputs and a fixed sigmoid nonlinearity. (b) Concise notation for a perceptron with L inputs

Fig. 4.7 Multilayer perceptron with L inputs, N hidden units in a single hidden layer, and 2 output units. In general, for C outputs the number of weights is (L · N) + (N · C). Notice that the separate outputs share the same hidden layer, making the training of the kinematic variables dependent on each other, in contrast to the linear multiple-input and multiple-output system where the weights are trained independently


When compared with the linear model the nonlinear mapping capability is an advantage, but the TDNN shares several of the same shortcomings as well as having some of its own: it is still an input–output model that is trained off-line and performs poorly if the data are nonstationary during the training interval. Moreover, the TDNN has many more parameters for the same memory and number of outputs when compared with the linear model, and since the parameter adaptation is not convex there are multiple local minima in the performance surface that may hinder performance. One approach is to divide the decoding into multiple linear filters with competitive gating such that only one linear filter is active at a time. This was applied to a natural reaching task where separate linear filters perform well on different segments of the movement [45]. Sanchez and Príncipe [52] give a full overview of TDNNs for BMI decoding. In general, TDNNs have been used in off-line analysis of movement decoding but have not been shown to be significantly better [22, 42].

There are several other artificial neural network topologies that have been applied in BMIs. In fact, the first online BMI [21] utilized a recurrent neural network (i.e., the hidden layer of the MLP has time-delayed feedback connections amongst the units), which was also used in further off-line analysis [50]. Recurrent neural networks are typically the most parsimonious in terms of parameters, since they do not need a memory layer at the input; their recurrent connections capture past information. However, the training is far more complicated since the topology has recurrent loops, and the gradients become dynamic as well. Therefore, special training algorithms such as back-propagation through time or real-time recurrent learning must be utilized [51].

Fig. 4.8 Time-delay neural network (TDNN) formed by concatenating M tapped-delay lines with a multilayer perceptron with a single hidden layer and N hidden units


7 Generative Models

Generative models create a probabilistic model of the neural activity, or of both the neural activity and kinematics. The observed neural activity is assumed to be realizations of this generating process, which is a function of the kinematics and other variables. Thus, the system to be identified is the inverse of the generative model; instead of finding a mapping from the neural data to the kinematics of interest, generative models seek the function that defines the random process by which the neural data are generated.

As an illustrative example—in a simplistic parametric model—one could maintain that binned spike counts are realizations from a Poisson distribution whose mean is a linear function of the kinematics. Furthermore, it is often assumed that different neurons are conditionally independent given the kinematics. In this case, one can estimate the coefficients of the linear function via maximum likelihood.

Additionally, one can assume that the observed kinematics are realizations from a random process. In discrete time, the kinematics evolve according to some function of the previous kinematic position and an innovation. An innovation is the difference between two successive state estimates or, equivalently, the distribution of the next state conditional on the current state. For kinematic variables, innovations are usually assumed to be independent and identically drawn from a distribution, commonly Gaussian. In this context the kinematics are referred to as a state. A prior model defines the probability of any state sequence (hand or arm movements). This prior model can improve the performance of the decoding. Imagine estimating the future path of a ball toss given two previous snapshots of its position and velocity. Continuing along the path would be the most likely assumption, and a similar naïve assumption on dynamics can be used for motor movements. For example, if there is a large velocity in one direction, it is unlikely that the next movement will be retrograde.

The combination of the prior model on the state sequence with the likelihood of the neural activity given the kinematics is Bayesian estimation. The Kalman filter is applicable when the neural activity is assumed to be produced from a Gaussian distribution whose mean is a linear function of the kinematics and when the kinematics themselves have linear updates with Gaussian innovations.
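Under those linear-Gaussian assumptions, the decoder reduces to the standard Kalman predict/update recursion sketched below (Python/NumPy); the variable names are generic, and the model matrices are assumed to have been fit from training data, as in the cited studies.

```python
import numpy as np

def kalman_decode(Z, A, Q, H, R, x0, P0):
    """Discrete-time Kalman filter as a neural decoder. State x_n holds the
    kinematics; observation z_n holds binned firing rates. The models are
    x_n = A x_{n-1} + w, w ~ N(0, Q), and z_n = H x_n + v, v ~ N(0, R),
    with A, Q, H, R assumed to be fit from training data by least squares."""
    x, P = x0.copy(), P0.copy()
    states = []
    for z in Z:
        # Predict: propagate the state estimate and its uncertainty.
        x = A @ x
        P = A @ P @ A.T + Q
        # Update: correct the prediction with the new neural observation.
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)        # Kalman gain
        x = x + K @ (z - H @ x)
        P = (np.eye(len(x)) - K @ H) @ P
        states.append(x.copy())
    return np.array(states)
```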

Generative models define the neural activity at a particular instant, as opposed to discriminative models that use a time embedding of neural activity. Therefore, it is necessary to align the neural activity with the kinematic variables. This differs from discriminative models with delay lines, where multiple delays are used. The optimal lag can be found by picking the lag at which the cross-correlation between the neural activity and the kinematics peaks, or it can be found by training models at different lags and picking the best performing one through a cross-validation scheme, where part of the off-line training data are left out of training and used for testing.
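A small sketch of the first option, picking the lag with the largest correlation magnitude; the bin-based lag range and the helper name are illustrative.

```python
import numpy as np

def best_lag(rate, kinematic, max_lag=10):
    """Return the lag (in bins) with the largest |correlation| between one
    neuron's rate and a kinematic variable; positive lags mean the rate
    leads the kinematics."""
    lags = list(range(-max_lag, max_lag + 1))
    scores = []
    for lag in lags:
        if lag >= 0:
            r, k = rate[:len(rate) - lag], kinematic[lag:]
        else:
            r, k = rate[-lag:], kinematic[:len(kinematic) + lag]
        scores.append(abs(np.corrcoef(r, k)[0, 1]))
    return lags[int(np.argmax(scores))]
```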

The breadth of the literature on generative models for movement decoding deals with a variety of cases for the type of linking functions and distributions for both the neural activity and kinematics, and for how exactly the model coefficients are estimated from data.


The linking functions can be linear, log-linear, or arbitrary linear–nonlinear functions, and the distributions are typically Gaussian or Poisson. As such, the likelihood function is typically a closed-form function with parameters estimated from the data, but the likelihood function can also be sequentially (recursively) estimated by simulation via particle filtering, also known as the sequential Monte Carlo method. This second option allows estimation where there are no tractable solutions to the parameter estimation. Particle filtering has been applied in [57–61].

One advantage of generative models over discriminative models is the ability to express the probability of observing the neural signal such that point process realizations—spike trains—can be used instead of time series of spike counts. Explicit point process models have been used in off-line analysis [25, 57, 60, 61]. Often, sequential Monte Carlo methods are employed to compute the parameters in these point-process models.

7.1 Population Vector Algorithm—Linear Gaussian Generative Models

The encoding model for the population vector algorithm is based on two assumptions: first, each neuron's firing rate is conditionally Gaussian distributed and is conditionally independent of the others given the kinematic variable, and second, the preferred directions are uniformly distributed among the sampled population of neurons. These encoding models may be referred to as linear Gaussian generative models, as they model the firing rate as a Gaussian distribution with mean equal to a linear function of the kinematics. It is assumed that the spiking activity correlates with subsequent kinematics; thus, the firing rate is a function of the kinematics that will occur. Unlike the discriminative models that model the kinematics as a function of past spiking, here the rate is a function of the kinematics at a certain time step in the future.

To cover linear Gaussian generative models, let us define some quantities:

$y_{n+D}$—kinematic vector, velocity in C dimensions, at time step n + D
$Y$—an N × C matrix formed from the kinematic vectors in the training set
$x_i^n$—spike count for the ith neuron (out of M) at time step n
$X$—an N × M matrix formed from the spike count vectors in the training set
$w_i$—unit vector pointing in the preferred direction of the ith neuron
$W$—an M × C matrix formed by stacking the preferred direction vectors
$\mathbf{1}$—an N × 1 vector of ones

The exact formulation of the "population vector algorithm" depends upon whether the velocity or just the direction of the movement is encoded, and upon a variety of normalization schemes for both the firing rates and the movement direction. Here the notation follows [36], except for some normalization that was not addressed.


Let the mean spike count for a neuron be a linear function of its preferred direction and of the velocity vector D time steps in the future:

$$ E[x_i^n] = f(y_{n+D};\, w_i) = b_i + a_i\, w_i^{T} y_{n+D}. $$

The unnormalized preferred directions $u_i$ and the vector of mean firing rates $b$ are estimated via least squares. This estimation does not force them to be unit vectors, so they must be normalized to length 1:

$$
Z = \begin{bmatrix} \mathbf{1} & Y \end{bmatrix}, \qquad
\begin{bmatrix} b^{T} \\ U \end{bmatrix} = (Z^{T}Z)^{-1}(Z^{T}X), \qquad
w_i = \frac{u_i}{\lVert u_i \rVert},
$$

where the ith column of U is the unnormalized preferred direction $u_i$.

In addition, the normalized spike count is used for each neuron, such that all neurons have the same variance. Denote the normalized spike rate, with a scaling factor involving the number of neurons M and the dimension of the velocity vector C, as

$$ z_i^n = \frac{M}{C\,\mathrm{std}[x_i]}\left(x_i^n - b_i\right). $$

Then the expected normalized firing rate is a linear combination of the kinematic variable, $E[z_n^i] = w_i^{T} y_{n+\Delta}$. Let the vector of normalized spike rates be denoted

$$z_n = \bigl[z_n^1 \;\cdots\; z_n^M\bigr]^{T}.$$

From these quantities, there are three different algorithms based on different assumptions on the distributions of the rates and preferred directions. For the case of nonuniform preferred directions and uncorrelated neural firing, the optimal linear estimate [31] of the kinematic variable is

$$\hat{y}_{n+\Delta} = (W^{T}W)^{-1} W^{T} z_n.$$

To take unequal variance into account, the cross-correlation matrix of the firing rates is needed:

$$\hat{y}_{n+\Delta} = (W^{T} R^{-1} W)^{-1} W^{T} R^{-1} z_n,$$

where R is the cross-correlation matrix between neurons. For a large number of neurons the cross-correlation estimate may need regularization, $R = Z^{T}Z + \lambda I$, where here the rows of Z are the normalized rate vectors $z_n^{T}$.

The simplest approach, often referred to as the population vector algorithm, avoids both of these matrix inversions. If the preferred directions are uniformly distributed, then the expectation of the inverse of the correlation matrix is a scaled identity matrix with its diagonal equal to the dimension of the kinematic vector divided by the number of neurons,

$$\hat{y}_{n+\Delta} = \frac{C}{M}\, W^{T} z_n.$$
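To make the three linear readouts concrete, the following is a minimal NumPy sketch (not from the original text) of fitting the preferred directions by least squares, normalizing the counts, and decoding with the population vector, the optimal linear estimator, and the correlation-aware estimator. All function and variable names are illustrative; the matrix shapes follow the definitions above.

```python
import numpy as np

def fit_preferred_directions(X, Y):
    """Least-squares fit of baseline rates b and unnormalized preferred
    directions U from the N x M spike counts X and N x C velocities Y;
    the rows of the returned W are unit preferred-direction vectors."""
    N = X.shape[0]
    Z = np.hstack([np.ones((N, 1)), Y])              # N x (1 + C)
    B = np.linalg.lstsq(Z, X, rcond=None)[0]         # (1 + C) x M
    b, U = B[0], B[1:].T                             # b: (M,), U: M x C
    W = U / np.linalg.norm(U, axis=1, keepdims=True)
    return b, W

def normalize_counts(X, b, C):
    """z_n^i = M / (C * std[x^i]) * (x_n^i - b_i); in practice the standard
    deviation would be taken from the training data."""
    M = X.shape[1]
    return (X - b) * (M / (C * X.std(axis=0)))

def decode(Zr, W, C, method="pva", R=None):
    """Decode kinematics from normalized rates Zr (T x M) with one of the
    three linear readouts described above."""
    M = W.shape[0]
    if method == "pva":          # uniform preferred directions assumed
        return (C / M) * Zr @ W
    if method == "ole":          # nonuniform directions, uncorrelated neurons
        return Zr @ W @ np.linalg.inv(W.T @ W)
    if method == "ole_corr":     # nonuniform directions, correlated neurons
        Rinv = np.linalg.inv(R)
        return Zr @ Rinv @ W @ np.linalg.inv(W.T @ Rinv @ W)
    raise ValueError(method)
```

For instance, `decode(normalize_counts(X_test, b, C), W, C, method="pva")` would give velocity estimates on a held-out block, with `X_test` a hypothetical test matrix binned like the training data.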

As the presented Gaussian generative models explicitly decode velocity, the position information is formed via a discrete summation that approximates integration. In an online environment, the user can adjust for bias errors, but in the off-line environment high-pass filtering with an infinite impulse response (IIR) filter with a low-frequency cutoff suffices. However, quantitative measures on the decoded position are generally poor. The three linear Gaussian generative models (uniformly distributed preferred directions and uncorrelated neurons, nonuniformly distributed preferred directions and uncorrelated neurons, and nonuniformly distributed preferred directions and correlated neurons) are compared on the natural reaching task as shown in Fig. 4.9.
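As a rough sketch of the off-line post-processing just described, the decoded velocity can be integrated and then detrended with a high-pass filter. The 100 ms decoding interval and 2.5 Hz first-order cutoff are taken from the Fig. 4.9 evaluation; the use of `scipy` here is only one convenient choice, not the chapter's implementation.

```python
import numpy as np
from scipy.signal import butter, lfilter

def velocity_to_position(v_hat, dt=0.1, cutoff_hz=2.5):
    """Integrate decoded velocity (T x C, one sample every dt seconds) by a
    running sum, then remove low-frequency drift with a first-order IIR
    high-pass filter with the given cutoff."""
    pos = np.cumsum(v_hat * dt, axis=0)              # discrete integration
    b, a = butter(1, cutoff_hz, btype="highpass", fs=1.0 / dt)
    return lfilter(b, a, pos, axis=0)
```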

Fig. 4.9 Linear generative models with different assumptions (uniform preferred directions, uncorrelated neurons; nonuniform, uncorrelated neurons; and nonuniform, correlated neurons) applied to the natural reaching task [22]. The discrete-time velocity is estimated at 100 ms intervals; the result is integrated and filtered with a first-order IIR high-pass filter with a cutoff of 2.5 Hz. Also shown are the results from the Wiener filter with three taps and ridge regression. In terms of cross-correlation, the linear generative models perform poorly (cross-correlation less than 0.15); however, qualitatively the generative models appear to be better at estimating the reaching segments than the Wiener filter with ridge regression


7.2 Maximum Likelihood Point Process

With generative models, it is possible to define the neural activity in terms of a point process. For the common case of a marginal intensity function that does not depend on past spikes (e.g., Poisson), the spike rate is estimated in bins, but the bin size can be decreased to the limit of at most one spike per bin. Therefore, the model can use more of the exact timing of the spikes from the single units at the cost of a much higher number of parameters. This is in contrast to encoding the rate using spike counts as observations, where multiple spikes per bin are the norm for smoothness.

The basic definition of a point process in time follows. Let N(t) be a counting process—N(t) ≥ 0, N(t) is an integer, and N(t) ≤ N(t + s) for every 0 ≤ t, 0 ≤ s—of the number of spike events that occur in the interval (0, t]; then the conditional intensity function is

$$\lambda(t \mid H_t) = \lim_{\Delta t \to 0} \frac{1}{\Delta t} \Pr\{N(t + \Delta t) - N(t) = 1 \mid H_t\},$$

where H_t is the history of the past spiking events. The conditional intensity function can be modeled as a function of the kinematic variables and, in addition, of past spikes to account for the history. As it is difficult to estimate the full conditional intensity function, the first assumption is to avoid the history term and estimate the marginal intensity function.

By using the marginal intensity function λ(t), each spike interval is independent of previous spikes, and the point process is known as an inhomogeneous Poisson process. For very small time steps Δt (using small time steps minimizes the chance of more than one spike, and we assume the rate is constant within the time step), the probability of a spike in the interval (t, t + Δt] is $\Pr\{N(t + \Delta t) - N(t) = 1\} = \lambda(t)\Delta t \exp[-\lambda(t)\Delta t]$. The marginal intensity function is often modeled as a cascade: a linear combination A of the input vector x_t followed by a fixed nonlinearity, $\lambda(t) = f(b + A x_t)$. This approach is known as the linear–nonlinear–Poisson (LNP) model [62].
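A minimal sketch of the LNP cascade and of the small-bin Bernoulli approximation of an inhomogeneous Poisson process is given below; the exponential nonlinearity, the array shapes, and all names are illustrative assumptions, not a prescription from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def lnp_intensity(x, A, b, nonlinearity=np.exp):
    """LNP cascade: lambda(t) = f(b + A x_t), for a T x C covariate matrix x,
    an M x C linear stage A, and an offset b per neuron."""
    return nonlinearity(b + x @ A.T)                 # T x M intensities

def sample_spikes(lam, dt=0.001):
    """Bernoulli approximation of an inhomogeneous Poisson process with at
    most one spike per small bin: P(spike in bin) ~= lambda * dt."""
    return rng.random(lam.shape) < np.clip(lam * dt, 0.0, 1.0)
```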

The LNP model assumes that all spikes are conditionally independent given the input; in order to relax this assumption, the past spiking activity of the unit and of the other units can be treated as inputs. Thus, the conditioning on spiking history can be approximated by a generalized linear model (GLM) for spike train modeling [63] that explicitly incorporates cross-coupled and self-spiking terms into the linear portion of the LNP model. A diagram of the GLM is shown in Fig. 4.10. The GLM recursively incorporates the post-spike effects and coupling effects to estimate the Poisson spiking rate by computing a nonlinear function of the sum of the filtered kinematics and spiking terms. See [63] for further details, and see [25] for further application.
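One way to realize such a GLM in practice is to augment each neuron's covariates with lagged spike counts from itself (self-spiking) and from the other units (cross-coupling) and fit a log-link Poisson regression. The sketch below uses scikit-learn's `PoissonRegressor` purely as one convenient tool; the lag length, regularization, and variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

def fit_glm(kin, spikes, i, lags=5, alpha=1e-3):
    """Fit a log-link Poisson GLM for neuron i. Covariates are the current
    kinematics plus `lags` bins of spiking history from all M units; the
    unit's own past spikes provide the self-spiking terms, the other units
    the cross-coupling terms. kin: T x C, spikes: T x M binned counts."""
    T = kin.shape[0]
    X_design = np.array([
        np.concatenate([kin[n], spikes[n - lags:n].ravel()])
        for n in range(lags, T)
    ])
    y = spikes[lags:, i]
    return PoissonRegressor(alpha=alpha).fit(X_design, y)
```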


7.3 Bayesian Models

Bayesian generative models gain efficiency by assuming a fixed probabilistic structure between the kinematics, neural activity, and any additional information. For a Bayesian generative model, the neural activity is ascribed to be dependent on the kinematic variables, as shown in Fig. 4.11. This is seemingly the opposite of the discriminative models, which define the kinematic variables to be a deterministic function of the neural activity with additive observation noise. Additionally, by using temporal state space models, there is an explicit modeling of how different kinematic variables—position, velocity, and acceleration—interact in time; this interaction is not available in the forward models that independently estimate these variables.

For the case of the popular and powerful Kalman filter, the innovations (errors) are assumed to be Gaussian distributed, and the update equations for both the state and observations are linear. Under these assumptions the update equations have closed-form solutions. The state model is normally a low-pass filter on the kinematics, and the observation model—which relates the input neural activity with the kinematics—is trained for each neuron.

To cover the Kalman filter with fixed updates [64], let us define some quantities:

y_n — kinematic vector, containing position, velocity, etc., with C dimensions, at time step n (here no explicit delay is noted, but a delay is added such that the spiking is a result of future kinematics)
y_{n+1} = A y_n + w_n — state update equation with innovation w_n ~ N(0, W)
A — C × C state update matrix, either predefined or fit by least squares
W — C × C covariance matrix of the innovations
x_n — vector of spike counts from the M neurons at time step n
x_n = H y_n + q_n — observation model with errors q_n ~ N(0, Q)
H — M × C observation matrix, fit by least squares
Q — M × M covariance matrix of the errors

Fig. 4.10 Generalized linear model: a discrete-time approximation of the conditional intensity of the point process is estimated as a nonlinear function applied to finite impulse response filters of the input, spike train realizations from the neuron itself (self-spiking), and spike train realizations from other neurons in the ensemble (cross-coupling)

The matrices A, W, H, Q are fit with training data. For a window of length N,

$$A = \left(\sum_{n=1}^{N-1} y_{n+1} y_n^{T}\right) \left(\sum_{n=1}^{N-1} y_n y_n^{T}\right)^{-1}$$

$$W = \frac{1}{N-1} \left(\sum_{n=1}^{N-1} y_{n+1} y_{n+1}^{T} - A \sum_{n=1}^{N-1} y_n y_{n+1}^{T}\right)$$

$$H = \left(\sum_{n=1}^{N} x_n y_n^{T}\right) \left(\sum_{n=1}^{N} y_n y_n^{T}\right)^{-1}$$

$$Q = \frac{1}{N} \left(\sum_{n=1}^{N} x_n x_n^{T} - H \sum_{n=1}^{N} y_n x_n^{T}\right).$$
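These closed-form least-squares fits translate directly into a few lines of NumPy. The sketch below assumes the training kinematics and spike counts are stacked row-wise as Y (N × C) and X (N × M), matching the definitions earlier in this section; the function name is illustrative.

```python
import numpy as np

def fit_kalman_parameters(Y, X):
    """Least-squares fit of the state and observation models from a training
    window: Y is N x C kinematics, X is N x M spike counts (time in rows)."""
    N = Y.shape[0]
    Y0, Y1 = Y[:-1], Y[1:]                           # y_n and y_{n+1}
    A = (Y1.T @ Y0) @ np.linalg.inv(Y0.T @ Y0)       # state transition
    W = (Y1.T @ Y1 - A @ Y0.T @ Y1) / (N - 1)        # innovation covariance
    H = (X.T @ Y) @ np.linalg.inv(Y.T @ Y)           # observation matrix
    Q = (X.T @ X - H @ Y.T @ X) / N                  # observation noise covariance
    return A, W, H, Q
```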

The equations for the operation of the Kalman filter to estimate the kinematic vector (state vector) follow:

0. Set the initial kinematic vector y_0 and error covariance matrix P_0 (use the mean and a scaled identity matrix, respectively, for the case of no prior information).

1. Prediction. A priori state estimate based on the previous kinematic vector,

$$y_n^- = A y_{n-1};$$

a priori error covariance estimate,

$$P_n^- = A P_{n-1} A^{T} + W.$$

2. Update. Kalman gain,

$$K_n = P_n^- H^{T} \left(H P_n^- H^{T} + Q\right)^{-1};$$

state update using the observed spike counts x_n,

$$y_n = y_n^- + K_n \left(x_n - H y_n^-\right);$$

error covariance update,

$$P_n = (I - K_n H) P_n^-.$$

Fig. 4.11 First-order Markov state space model where the neural activity x at time n is conditionally dependent on only the state y and the state update is a function of the current state
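A compact sketch of this predict/update recursion is given below; the names are illustrative, no decoding delay is handled, and in practice y0 and P0 would be initialized as in step 0 above.

```python
import numpy as np

def kalman_decode(X, A, W, H, Q, y0, P0):
    """Run the Kalman predict/update recursion over a T x M sequence of spike
    counts X, returning the decoded kinematic states (T x C)."""
    y, P = y0.copy(), P0.copy()
    Y_hat = []
    for x_n in X:
        # Prediction: a priori state and error covariance
        y_pred = A @ y
        P_pred = A @ P @ A.T + W
        # Update: Kalman gain, state estimate, and error covariance
        S = H @ P_pred @ H.T + Q
        K = P_pred @ H.T @ np.linalg.inv(S)
        y = y_pred + K @ (x_n - H @ y_pred)
        P = (np.eye(P.shape[0]) - K @ H) @ P_pred
        Y_hat.append(y)
    return np.asarray(Y_hat)
```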

See Fig. 4.12 for the performance of the Kalman filter trained and operating in this manner on the natural reaching task.

Fig. 4.12 Kalman filter applied to the natural reaching task [22]. The state has six dimensions, consisting of three dimensions each for position and velocity, but only the estimated position is displayed. On this particular dataset the Kalman filter has the best average performance on the test set, beating both the Wiener filter with and without regularization shown in Fig. 4.5 and the linear generative models shown in Fig. 4.9


For increased flexibility with unconstrained error distributions or nonlinear models (the most general formulation is known as the recursive Bayesian filter), there are methods such as particle filtering (also referred to as sequential Monte Carlo), which are still computationally tractable but lack the elegance of the Kalman filter with its closed-form update. In these models, the single-unit activity is usually assumed to be Poisson, but other distributions may also perform well [60]. For the case of Poisson, the linear–nonlinear–Poisson (LNP) model is a general model relating the linearly filtered stimulus (or kinematics) to the mean of Poisson-distributed spiking by an arbitrary nondecreasing function [62] (similar to the GLM shown in Fig. 4.10). For BMIs the nonlinearity can be an exponential [59] or an arbitrary nonlinear function [57, 58, 60, 62].
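For illustration only, a single step of a bootstrap particle filter with a Poisson observation model and exponential tuning (one of the choices mentioned above) might look like the sketch below. The linear state model, tuning parameters, and multinomial resampling are generic assumptions rather than the specific algorithms of [57–61].

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, weights, x_n, A, W, H, b, dt=0.1):
    """One sequential Monte Carlo step: propagate particles through a linear
    state model with Gaussian innovations, reweight by the Poisson likelihood
    of the observed counts x_n under exponential tuning, and resample.
    Assumed shapes: particles P x C, A C x C, W C x C, H M x C, b length M."""
    P, C = particles.shape
    particles = particles @ A.T + rng.multivariate_normal(np.zeros(C), W, size=P)
    lam = np.exp(b + particles @ H.T) * dt           # P x M expected counts
    loglik = (x_n * np.log(lam) - lam).sum(axis=1)   # Poisson log-likelihood (up to a constant)
    weights = weights * np.exp(loglik - loglik.max())
    weights /= weights.sum()
    est = weights @ particles                        # posterior-mean state estimate
    idx = rng.choice(P, size=P, p=weights)           # multinomial resampling
    return est, particles[idx], np.full(P, 1.0 / P)
```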

7.4 Closed-Loop Comparison of Generative Models

The end goal for motor BMIs is to give users the ability to interact through continuous movement. Any decoding algorithm is an artificial surrogate for natural motor control. Some characteristics of the decoding algorithm may not be apparent in off-line studies. In the study by Koyama et al. [36], three generative models with different frameworks were compared in pairwise sessions. Although the task was a simple 8-target, center-out reaching task, there were noticeable differences in the online trajectories; furthermore, the model with an explicit state space had the lowest reach time. The constrained nature of the task, with linear optimal trajectories, would seem to favor state space methods. For illustrative purposes, the decoded trajectories and task performance graph are reproduced in Figs. 4.13 and 4.14.

8 Conclusion

Brain–machine interfaces, especially those that control the continuous movement of robotics, are a particular challenge for neural decoding. Past studies have shown the successful application of numerous decoding approaches in both closed-loop and off-line decoding. When engineering a BMI system, one should carefully consider not only the choice of the decoding algorithm but also issues such as the feature space (inputs), regularization, and adaptation. This chapter has highlighted some of the basic concepts that must be considered when engineering solutions for neural decoding and has illustrated the most popular decoding algorithms.

BMI research has proved that decoding techniques can extract the intention of movement from neural activity and has brought new insights into the neuroscience behind motor control. As ongoing research continues to advance the capabilities and complexity of BMIs, there is a fundamental need to understand the concepts and challenges of system identification for neural decoding of motor control.


Fig. 4.13 Comparison of generative models in online cursor control. The generative models include the population vector algorithm (PVA), a linear Gaussian generative model with a boxcar filter on the spike trains (LGB), and a log-link function with Poisson-distributed spike counts coupled with a state space model (NPS). The task consists of a center-out reaching task with 8 targets. Each target is selected 15 times in pseudo-random order. Side-by-side algorithms were run during the same session. Modified from [36] with permission


The coverage of algorithm development in this chapter has been limited to supervised learning approaches, as this is the first phase of BMI operation. But the reader is challenged to think of training BMIs in the general case where the desired movement is not always known, and the required movements are of increasing complexity. Fundamentally, BMI research goes beyond decoding the neural systems as they exist; BMIs seek to fundamentally alter or augment how the users’ neural activity interacts with the external devices.

Appendix A. Review of Closed-Loop Experiments

In 1969, Fetz showed how a monkey could control a feeding apparatus by controlling the firing rate of a single cortical neuron in a closed-loop experiment. Some 30 years later the work by Chapin et al. [21] demonstrated the use of simultaneous recording of multiple neurons to drive a robotic device. This study seems to be the point in time when many research groups shifted from studying aspects of motor decoding to implementing their algorithms in real-time motor BMIs. In the study, rats were trained to operate a small lever with their paw that controlled a robotic watering device, whose one-dimensional movement retrieved a water droplet from a dispenser and delivered it to the rat’s mouth. The control of the device was switched from the physical lever movement to a signal derived from the firing rates of simultaneously recorded neurons in the motor cortex. This pioneering effort identified feature selection as a key component to improving the decoding: namely, the authors used the first component from principal component analysis (PCA), instantaneously applied across neuronal firing rates from the motor cortex and ventral lateral thalamus, to decrease the dimensionality of the input signal; they then showed that, after thresholding, this signal correlated with the movement of the lever. Incorporating further temporal information via the more powerful recurrent artificial neural network improved the results substantially.

Fig. 4.14 Comparison of average speed and target hitting time for the three generative models. The state space model (NPS) has visibly better hitting time. If the velocity decoded from the neural firing rate is correct, the state space model (NPS) helps keep the trajectories linear, which is necessary on a center-out task. PVA and NPS were used in 16 sessions and LGB was used in 12 sessions. Modified from [36] with permission

In March 2002, Serruya et al. [23] reported how a monkey trained to move a manipulandum to control a two-dimensional cursor for a target tracking task could continue the task when switched to BMI control with similar success. The decoder was formed via linear regression with fewer than 30 neurons in the primary motor cortex, using inputs of twenty 50-ms bins of spike counts.

Soon after, in June 2002, Taylor et al. [10] reported work that addressed two aspects of BMIs: first, the importance of visual feedback in closed-loop operation, which allows the user to adjust for discrepancies in the algorithmic mapping, and second, the need for “co-adaptation” of both the user and the decoder. The monkey performed a center-out reaching task in a three-dimensional virtual environment; the reaches were to stationary targets. The input was single-unit activity from a population of, on average, 40 motor cortex cells, binned at 90 ms, and the movement was updated every 30 ms (subsequent estimates overlapped by 60 ms). For decoding, the authors used a modified population vector algorithm: first, each neuron’s firing rate is transformed to a normalized deviation from a mean; second, the sign of the deviation controls which of two weights is multiplied by the deviation; and finally, the contributions over all neurons are summed. This value had its expected mean removed to keep the cursor from drifting under average firing rates.

In a conference paper in 2003, Helms Tillery et al. [28] reported using a similar adaptive training of initially random coefficients in a population vector algorithm to a level of proficiency such that a monkey could control portions of a self-feeding task with a robotic arm.

The report published by Carmena et al. in 2003 [11] solidified closed-loop operation of BMIs. Using linear models, the authors extracted hand position, velocity, and gripping force. With the addition of gripping force, the monkeys were enabled to grasp virtual objects in their task (estimated gripping was indicated by changing the diameter of the circular cursor, much as the Z-direction in [10] was indicated by the diameter of the sphere). This study used spike counts from a 1-s window organized in ten 100-ms bins from a large population of neurons in various brain areas—dorsal premotor cortex (PMd), supplementary motor area (SMA), and the primary motor cortex (M1)—in both hemispheres.


Wu et al. [32, 35] highlighted a few closed-loop trials of a Kalman filter in a conference paper presented in September 2004. The Kalman filter has been shown in numerous off-line studies to be a robust algorithm for inferring the kinematics from neural states [35, 54, 65].

In a study from May 2006, Wahnoun et al. [5] avoided using natural movements as initialization. Instead, the BMI was initialized by training data in which the monkey actively watched a cursor movement. This surrogate training was a step toward moving BMI applicability to the user population without the ability to make movements.

In July 2006, Hochberg et al. [4] reported a landmark study with a functioning BMI for a human with tetraplegia. In this work, the BMI decoder was trained while the human subject imagined making movements as instructed by a technician. The human user was able to control a 2D cursor using a linear filter [53] with twenty 50-ms binned spike counts. This was followed by additional studies with human users using the Kalman filter [33] for decoding along with a discriminator for cursor clicking [24, 34]. In a study performed in January 2008, Truccolo et al. [25] reported adding another human user with no peripheral sensation using the same online protocol as [4].

In June 2008, Velliste et al. [13] published a success story in which the use of gradually decreased assistive training enabled a monkey to perform a self-feeding task using a robotic arm and BMI. The BMI decoder was able to recreate movements using three dimensions of velocity and gripper aperture. The underlying decoder was a population vector algorithm similar to that of Taylor et al. [10].

In November 2008, Mulliken et al. [26] showed continuous trajectories decoded from the parietal reach region as opposed to the primary motor cortex. They used a linear filter with the regularization known as ridge regression.

In December 2008, Shpigelman et al. [37] demonstrated closed-loop operation with a nonlinear decoding algorithm over multiple days. The decoder was based on a kernel method approach.

Also in December 2008, Jarosiewicz et al. [29] demonstrated online closed-loop operation where subsets of the available neurons were removed as inputs to the population vector algorithm. This analysis is known as neuron dropping and is popular in off-line analysis, where the importance of neurons to the motor decoding performance is assessed. This was the first study to examine how the remaining neurons must compensate to maintain task performance in an online task.

Further online closed-loop comparisons were performed by Chase et al. [30] between the population vector algorithm [27] and optimal linear estimators [31], which take neuron correlation and variance into account. Different assumptions on distributions, tuning functions, and filtering were compared in consecutive closed-loop trials in a pairwise manner in a study by Koyama et al. [36].


References

1. Shenoy KV et al (2003) Neural prosthetic control signals from plan activity. Neuroreport 14(4):591
2. Musallam S (2004) Cognitive control signals for neural prosthetics. Science 305(5681):258–262
3. Santhanam G, Ryu SI, Yu BM, Afshar A, Shenoy KV (2006) A high-performance brain–computer interface. Nature 442(7099):195–198
4. Hochberg LR et al (2006) Neuronal ensemble control of prosthetic devices by a human with tetraplegia. Nature 442(7099):164–171
5. Wahnoun R, He J, Tillery SIH (2006) Selection and parameterization of cortical neurons for neuroprosthetic control. J Neural Eng 3(2):162–171
6. DiGiovanna J, Mahmoudi B, Fortes J, Principe JC, Sanchez JC (2009) Coadaptive brain–machine interface via reinforcement learning. IEEE Trans Biomed Eng 56(1):54–64
7. Fuster JM (1990) Prefrontal cortex and the bridging of temporal gaps in the perception-action cycle. Ann N Y Acad Sci 608(1):318–336
8. Wolpert DM, Ghahramani Z, Jordan MI (1995) An internal model for sensorimotor integration. Science 269(5232):1880
9. Miall R, Wolpert DM (1996) Forward models for physiological motor control. Neural Netw 9(8):1265–1279
10. Taylor DM (2002) Direct cortical control of 3D neuroprosthetic devices. Science 296(5574):1829–1832
11. Carmena JM et al (2003) Learning to control a brain–machine interface for reaching and grasping by primates. PLoS Biol 1(2):e2
12. Ganguly K, Carmena JM (2009) Emergence of a stable cortical map for neuroprosthetic control. PLoS Biol 7(7):e1000153
13. Velliste M, Perel S, Spalding MC, Whitford AS, Schwartz AB (2008) Cortical control of a prosthetic arm for self-feeding. Nature 453(7198):1098–1101
14. Sanchez JC, Mahmoudi B, DiGiovanna J, Principe JC (2009) Exploiting co-adaptation for the design of symbiotic neuroprosthetic assistants. Neural Netw 22(3):305–315
15. Mahmoudi B, Sanchez JC (2011) A symbiotic brain–machine interface through value-based decision making. PLoS One 6(3):e14760
16. Hatsopoulos NG, Donoghue JP (2009) The science of neural interface systems. Annu Rev Neurosci 32(1):249–266
17. Evarts EV (1968) Relation of pyramidal tract activity to force exerted during voluntary movement. J Neurophysiol 31(1):14–27
18. Fetz EE (1969) Operant conditioning of cortical unit activity. Science 163(3870):955
19. Kennedy PR, Bakay RAE (1998) Restoration of neural output from a paralysed patient by a direct brain connection. Neuroreport 9(8):1707
20. Humphrey DR, Schmidt E, Thompson W (1970) Predicting measures of motor performance from multiple cortical spike trains. Science 170(3959):758
21. Chapin JK, Moxon KA, Markowitz RS, Nicolelis MA (1999) Real-time control of a robot arm using simultaneously recorded neurons in the motor cortex. Nat Neurosci 2(7):664–670
22. Wessberg J et al (2000) Real-time prediction of hand trajectory by ensembles of cortical neurons in primates. Nature 408(6810):361–365
23. Serruya MD, Hatsopoulos NG, Paninski L, Fellows MR, Donoghue JP (2002) Brain–machine interface: instant neural control of a movement signal. Nature 416(6877):141–142
24. Kim S-P, Simeral JD, Hochberg LR, Donoghue JP, Friehs GM, Black MJ (2007) Multi-state decoding of point-and-click control signals from motor cortical activity in a human with tetraplegia. In: 3rd international IEEE/EMBS conference on neural engineering (CNE '07), pp 486–489
25. Truccolo W, Friehs GM, Donoghue JP, Hochberg LR (2008) Primary motor cortex tuning to intended movement kinematics in humans with tetraplegia. J Neurosci 28(5):1163–1178


26. Mulliken GH, Musallam S, Andersen RA (2008) Decoding trajectories from posterior parietal cortex ensembles. J Neurosci 28(48):12913–12926
27. Georgopoulos AP, Kalaska JF, Caminiti R, Massey JT (1982) On the relations between the direction of two-dimensional arm movements and cell discharge in primate motor cortex. J Neurosci 2(11):1527–1537
28. Helms Tillery SI, Taylor DM, Schwartz AB (2003) The general utility of a neuroprosthetic device under direct cortical control. In: Proceedings of the 25th annual international conference of the IEEE engineering in medicine and biology society, 2003, vol 3, pp 2043–2046
29. Jarosiewicz B, Chase SM, Fraser GW, Velliste M, Kass RE, Schwartz AB (2008) Functional network reorganization during learning in a brain–computer interface paradigm. Proc Natl Acad Sci USA 105(49):19486–19491
30. Chase SM, Schwartz AB, Kass RE (2009) Bias, optimal linear estimation, and the differences between open-loop simulation and closed-loop performance of spiking-based brain–computer interface algorithms. Neural Netw 22(9):1203–1213
31. Salinas E, Abbott LF (1994) Vector reconstruction from firing rates. J Comput Neurosci 1(1):89–107
32. Wu W, Shaikhouni A, Donoghue JP, Black MJ (2004) Closed-loop neural control of cursor motion using a Kalman filter. In: Proceedings of the 26th annual international conference of the IEEE engineering in medicine and biology society, 2004, vol 2, pp 4126–4129
33. Kim S-P, Simeral JD, Hochberg LR, Donoghue JP, Black MJ (2008) Neural control of computer cursor velocity by decoding motor cortical spiking activity in humans with tetraplegia. J Neural Eng 5(4):455–476
34. Kim S-P, Simeral JD, Hochberg LR, Donoghue JP, Friehs GM, Black MJ (2011) Point-and-click cursor control with an intracortical neural interface system by humans with tetraplegia. IEEE Trans Neural Syst Rehab Eng 19(2):193–203
35. Wu W, Black MJ, Mumford D, Gao Y, Bienenstock E, Donoghue JP (2004) Modeling and decoding motor cortical activity using a switching Kalman filter. IEEE Trans Biomed Eng 51(6):933–942
36. Koyama S, Chase SM, Whitford AS, Velliste M, Schwartz AB, Kass RE (2009) Comparison of brain–computer interface decoding algorithms in open-loop and closed-loop control. J Comput Neurosci 29(1):73–87
37. Shpigelman L, Lalazar H, Vaadia E (2009) Kernel-ARMA for hand tracking and brain–machine interfacing during 3D motor control. In: Koller D, Schuurmans D, Bengio Y, Bottou L (eds) Advances in neural information processing systems, 21:1489–1496
38. Hatsopoulos N (2004) Decoding continuous and discrete motor behaviors using motor and premotor cortical ensembles. J Neurophysiol 92(2):1165–1174
39. Akaike H (1974) A new look at the statistical model identification. IEEE Trans Automat Contr 19(6):716–723
40. Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464
41. Sanchez JC, Carmena JM, Lebedev MA, Nicolelis MAL, Harris JG, Principe JC (2004) Ascertaining the importance of neurons to develop better brain–machine interfaces. IEEE Trans Biomed Eng 51(6):943–953
42. Kim S-P et al (2006) A comparison of optimal MIMO linear and nonlinear models for brain–machine interfaces. J Neural Eng 3(2):145–161
43. Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Statist Soc Ser B 58(1):267–288
44. Efron B, Hastie T, Johnstone I, Tibshirani R (2004) Least angle regression. Ann Stat 32(2):407–451
45. Kim S-P et al (2003) Divide-and-conquer approach for brain machine interfaces: nonlinear mixture of competitive linear models. Neural Netw 16(5):865–871
46. Darmanjian S et al (2003) Bimodal brain–machine interface for motor control of robotic prosthetic. In: Proceedings of the 2003 IEEE/RSJ international conference on intelligent robots and systems (IROS 2003), vol 4, pp 3612–3617


47. Wood F, Prabhat, Donoghue JP, Black MJ (2005) Inferring attentional state and kinematics from motor cortical firing rates. In: Proceedings of the 27th annual international conference of the IEEE engineering in medicine and biology society, 2005, pp 149–152
48. Sanchez JC et al (2002) Input–output mapping performance of linear and nonlinear models for estimating hand trajectories from cortical neuronal firing patterns. In: Proceedings of the 2002 12th IEEE workshop on neural networks for signal processing, 2002, pp 139–148
49. Sanchez JC et al (2003) Interpreting neural activity through linear and nonlinear models for brain machine interfaces. In: Proceedings of the 25th annual international conference of the IEEE engineering in medicine and biology society, 2003, vol 3, pp 2160–2163
50. Sanchez JC, Erdogmus D, Nicolelis MAL, Wessberg J, Principe JC (2005) Interpreting spatial and temporal neural activity through a recurrent neural network brain–machine interface. IEEE Trans Neural Syst Rehab Eng 13(2):213–219
51. Principe JC, Euliano NR, Lefebvre WC (2000) Neural and adaptive systems. Wiley, New York, NY, p 656
52. Sanchez JC, Principe JC (2007) Brain–machine interface engineering. Morgan & Claypool, New York, NY
53. Warland DK, Reinagel P, Meister M (1997) Decoding visual information from a population of retinal ganglion cells. J Neurophysiol 78(5):2336–2350
54. Wu W, Hatsopoulos NG (2008) Real-time decoding of nonstationary neural activity in motor cortex. IEEE Trans Neural Syst Rehab Eng 16(3):213–222
55. Hatsopoulos NG, Ojakangas CL, Paninski L, Donoghue JP (1998) Information about movement direction obtained from synchronous activity of motor cortical neurons. Proc Natl Acad Sci USA 95(26):15706–15711
56. Widrow B (1960) Adaptive switching circuits. IRE WESCON convention record
57. Wang Y, Principe JC (2010) Instantaneous estimation of motor cortical neural encoding for online brain–machine interfaces. J Neural Eng 7(5):056010
58. Gao Y, Black MJ, Bienenstock E, Wu W, Donoghue JP (2003) A quantitative comparison of linear and non-linear models of motor cortical activity for the encoding and decoding of arm motions. In: 1st international IEEE EMBS conference on neural engineering, 2003, pp 189–192
59. Brockwell AE (2004) Recursive Bayesian decoding of motor cortical signals by particle filtering. J Neurophysiol 91(4):1899–1907
60. Shoham S, Paninski LM, Fellows MR, Hatsopoulos NG, Donoghue JP, Normann RA (2005) Statistical encoding model for a primary motor cortical brain–machine interface. IEEE Trans Biomed Eng 52(7):1312–1322
61. Wang Y, Paiva ARC, Principe JC, Sanchez JC (2007) A Monte Carlo sequential estimation of point process optimum filtering for brain machine interfaces. In: International joint conference on neural networks (IJCNN 2007), pp 2250–2255
62. Paninski L (2004) Superlinear population encoding of dynamic hand trajectory in primary motor cortex. J Neurosci 24(39):8551–8561

63. Truccolo W et al (2005) A point process framework for relating neural spiking activity to spiking history, neural ensemble, and extrinsic covariate effects. J Neurophysiol 93(2):1074–1089

64. Wu W, Gao Y, Bienenstock E, Donoghue JP, Black MJ (2006) Bayesian population decoding of motor cortical activity using a Kalman filter. Neural Comput 18(1):80–118

65. Wu W, Kulkarni JE, Hatsopoulos NG, Paninski L (2009) Neural decoding of hand motion using a linear state-space model with hidden states. IEEE Trans Neural Syst Rehab Eng 17(4):370–378
