
NEURAL DYNAMICS OF MOTOR PREPARATION AND

EXECUTION

A DISSERTATION

SUBMITTED TO THE DEPARTMENT OF ELECTRICAL

ENGINEERING

AND THE COMMITTEE ON GRADUATE STUDIES

OF STANFORD UNIVERSITY

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS

FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

Byron M. Yu

December 2006

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

UMI Number: 3242641


UMI Microform 3242641

Copyright 2007 by ProQuest Information and Learning Company.

All rights reserved. This microform edition is protected against

unauthorized copying under Title 17, United States Code.

ProQuest Information and Learning Company 300 North Zeeb Road

P.O. Box 1346 Ann Arbor, MI 48106-1346


(c) Copyright by Byron M. Yu 2007

All Rights Reserved



I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

(Krishna V. Shenoy) Principal Advisor




(Maneesh Sahani)




(William T. Newsome)




(Teresa H. Meng)

Approved for the University Committee on Graduate Studies.


Abstract

In everyday life, the brain takes in sensory inputs, processes that information, and sends motor commands to muscles. A key aspect of the brain’s ability to perform these functions is time. The brain takes in sensory inputs that vary with time; it takes time for the brain to process the sensory inputs; the brain sends time-varying control signals that guide the arm to an intended reach target. The time-varying aspect of these processes is referred to as dynamics, which is reflected in the corresponding neural activity. In this dissertation, we seek to study and characterize the dynamics of neural activity, in particular that related to motor behavior.

There are two core aims to this dissertation. The first is to improve the performance of neural prosthetic systems for assisting disabled patients. A key factor in determining the clinical viability of such systems is their accuracy in translating neural activity into desired movements. The decoder can attempt to estimate either the entire arm trajectory or simply its endpoint. Through the development of novel decoding algorithms and system design, this dissertation advances the state of the art under both modes of operation.

The second aim is to elucidate the process of motor preparation, or motor “planning”. Most studies to date of neural activity related to motor preparation have been descriptive, without attempting to uncover the mechanisms underlying that activity on a systems level. We test the hypothesis that neural activity in premotor cortex becomes optimized during motor preparation using a variability measure, which requires collapsing across hundreds of trials. To extend the characterization of this process, we develop latent variable techniques for studying motor preparation on single trials. Such techniques should enable a new class of questions in neuroscience to be studied that involve tracking one’s cognitive state.


Acknowledgements

As an undergraduate studying signal processing and communication systems at UC Berkeley, I wondered if there were applications of my training in areas broadly labeled “biological”. While I kept an eye out for research opportunities in biological areas, nearly all seemed to lie on the opposite side of a chasm that I could not, or perhaps did not necessarily want to, cross. What I needed was a bridge between my engineering training and biological problems.

Upon entering graduate school, I found that bridge: his name was Krishna Shenoy. He was the one who introduced me to neuroscience and convinced me that there was a place for engineers in neuroscience. More importantly, he offered me a chance to study the brain in the most concrete way imaginable. From rig construction to experimental design to animal training to data collection to algorithmic design, it was clear from day one that no facet of the Shenoy lab enterprise was off-limits to me. In fact, I was strongly encouraged, and even expected, to play an important part in all of these aspects. I have been extremely privileged in this respect.

Several years along, it became increasingly clear that novel techniques for data analysis were needed to most effectively study the body of scientific questions we were interested in. My initial forays into developing these techniques were met with little success. To Krishna’s credit, he was supportive of me putting more time into this, even though at the time it might have seemed a bit pie-in-the-sky. During the summer of 2004, Maneesh Sahani from the Gatsby Computational Neuroscience Unit at University College London visited our lab. He was kind enough to take me under his wing and introduce me to statistical and machine learning techniques that would eventually form the core of my thesis work. This was the start of a fruitful ongoing collaboration, which led me to make two extended visits to the Gatsby Unit in 2004-2005. Maneesh’s uncanny ability to select the correct heading direction in the face of uncertainty has saved me countless evenings of wasted pencil pushing. His steadying influence has positively shaped the face of my thesis work and led me to learn about techniques that have wide-ranging applications beyond neuroscience and engineering. I am grateful for Krishna and Maneesh’s continued support and guidance.

This thesis has been made possible by the efforts of all members of the Shenoy lab. From end to end, it has been a team effort. I have been privileged to have had the opportunity to work with such a talented, dedicated, and fun bunch. The following are members of the Shenoy lab with whom I have worked most closely over the years: Gopal Santhanam, Mark Churchland, Aaron Batista, Stephen Ryu, Caleb Kemere, Afsheen Afshar, and John Cunningham. While I won’t endeavor to enumerate here all the ways in which the members of the Shenoy lab have enriched my graduate studies and assisted me along the way, I would particularly like to acknowledge Gopal for first putting me in touch with Krishna, for his computing wizardry which has enabled and facilitated countless aspects of this thesis from data collection to paper writing, and for his uncompromising standards from which I have greatly learned and benefited; Mark for imparting his remarkable intuition to me; and Aaron for going out of his way on numerous occasions to help me more effectively communicate my work to others. I would also like to thank Sandy Eisensee for her first-rate administrative assistance and Missy Howard and Mackenzie Risch for their expert veterinary care.

I would like to express my gratitude towards the members of the Gatsby Unit for taking me in as one of their own during my visits there and providing me with a stimulating environment in which to learn and work. Much of the computational machinery contained in this thesis has roots in Zoubin Ghahramani’s Unsupervised Learning course and the Non-linear Dynamical Systems Journal Club held at the Gatsby Unit, both in fall 2004. It was during this time that I went from simply being able to do problem sets, to being able to draw from my basic education in linear algebra, probability, calculus, optimization, and control theory to solve research-grade problems.

Joseph Kahn and David Tse, my advisors at UC Berkeley, inspired me to pursue a PhD. Prof. Kahn offered me my first research position as an undergraduate and showed me the ropes of bringing a research result to publication. Prof. Tse’s courses in probability and stochastic processes captivated me, primarily due to his remarkable ability to appeal to intuition. This intuition has proven invaluable throughout my thesis work.

I would like to thank Bill Newsome and Teresa Meng for serving on my thesis committee, as well as Tirin Moore for serving as my orals chair. During my graduate studies, I have been generously supported by the National Science Foundation Graduate Research Fellowship and the National Defense Science and Engineering Graduate Fellowship, as well as by the NIH-NINDS-CRCNS-R01 through Krishna and Maneesh. Last, but not least, I am indebted to my parents and my brother for pointing me in the right directions over the years and being a seemingly unlimited source of love and support.


Contents

Abstract
Acknowledgements
1 Introduction
2 Decoding Goal-Directed Movements
    2.1 Mixture of trajectory models framework
        2.1.1 Mixture of linear-Gaussian trajectory models with Poisson observations
        2.1.2 Recursive Bayesian decoding
        2.1.3 Evaluating performance
    2.2 Random walk trajectory model
    2.3 Goal-directed reach task and neural recordings
    2.4 Model fitting
        2.4.1 Trajectory model
        2.4.2 Observation model
    2.5 Incorporating goal information from delay activity
    2.6 Results
    2.7 Discussion
3 A Real-Time Communication Prosthesis
    3.1 Offline system design
        3.1.1 Skip Time
        3.1.2 Integration Time
    3.2 Online performance
4 Neural Variability and Motor Preparation
    4.1 Measuring neural variability
    4.2 Timecourse of neural variability
    4.3 Relationship to reaction time variability
    4.4 Relationship to timecourse of motor preparation
5 Extracting Dynamical Structure Embedded in Neural Activity
    5.1 Latent variable models
    5.2 Hidden non-linear dynamical system
    5.3 Model fitting
    5.4 E-step: Trajectory estimation
    5.5 Moment estimation for EP
        5.5.1 Modal Gaussian approximation
        5.5.2 Gaussian quadrature
    5.6 M-step: Parameter estimation
    5.7 Application to simulated data
    5.8 Application to neural data
6 Conclusions and future work
A Details of Expectation Propagation
    A.1 Inference
    A.2 Data likelihood
B Details of M-step parameter updates
Bibliography


List of Tables

2.1 $E_\mathrm{rms}$ (mean ± SE in mm) comparison of unshuffled and shuffled trajectories. Shuffling was carried out on the decoded trajectories across trials with the same reach goal. The $E_\mathrm{rms}$ values for the unshuffled case are identical to those appearing in Figure 2.8.

3.1 BCI experiments with highest ITRC for monkeys H and G. Each row lists the experiment with highest performance (ITRC) for a given target layout. Other experiments yielded higher single-trial accuracy or involved faster cursor rates, but did not achieve the highest ITRC for the corresponding target layout (not shown).

5.1 Root-mean-square error (mean ± SEM) between the actual and estimated state trajectories.


List of Figures

2.1 Directed graphical model depicting the relationship between the arm states ($x_t$) and the peri-movement activity from multiple, simultaneously-recorded neural units ($y_t$).

2.2 Placement of electrode arrays in PMd of monkeys G and H. For both monkeys, the arrays were placed in a location that spans dorsal premotor and primary motor cortices. Intraoperative photographs of the array implanted in cerebral cortex are shown with sulci indicated. Overlapping diagram shows the relative array placement between monkeys. Monkey H’s sulcal pattern is reflected vertically and rotated to bring the sulci into alignment with those of monkey G. Ce.S.: central sulcus; S.Pc.D.: superior precentral dimple; Sp.A.S.: spur of the arcuate sulcus; A.S.: arcuate sulcus.

2.3 Delayed reach task and neural recordings. A: Task timeline (top), simultaneously-recorded spike trains (middle), and arm and eye position traces (bottom) are shown for a single trial. Blue and red lines correspond to horizontal and vertical position, respectively. The full range of scale for the arm and eye position is ±15 cm from the center target. Trial from experiment H20041106.1. B: Spatial arrangement of the eight reach goals and corresponding spike histograms for one representative unit (Unit H20041217.23.0). Bars below histograms indicate delay (hatched) and peri-movement (gray) activity. Dotted lines denote reach goal onset and movement onset.


2.4 Position trajectories (upper row) and speed profiles (lower row) A: collected empirically, B: generated by the RWM, C: generated by the STM, and D: generated by the MTM. Only 24 reaches (three to each reach goal) are shown in each column for clarity.

2.5 Non-directional firing rate modulations can be captured by including magnitude terms (in this case, position and velocity magnitudes) as explanatory variables. The empirical firing rate histograms (gray) are compared to firing rate profiles predicted by firing rate models with (blue) and without (red) magnitude terms. Vertical arrows denote movement onset. Each panel corresponds to one reach goal. (Unit G20040508.38.1)

2.6 A representative test trial in which the use of delay activity improved the MTM decoded trajectory. Upper panels: actual trajectory (thick black), STM decoded trajectory (thick green), MTM decoded trajectories without (left panel, thick red) and with (right panel, thick orange) delay activity, and three MTM component trajectory estimates with the largest weights (cyan, blue, magenta). Note that the actual trajectory, STM decoded trajectory, and MTM component trajectory estimates are identical in the two upper panels. Lower panels: the three corresponding MTM component weights as they evolve during the trial. Time zero corresponds to 60 ms before movement onset (i.e., one time step before we begin to decode movement). For this trial, $E_\mathrm{rms}$ was 17.4, 7.8, and 7.4 mm for STM, MTM without delay activity, and MTM with delay activity, respectively. Monkey G, 98 units. (Experiment G20040508, trial ID 686)

2.7 A representative test trial in which the peri-movement activity corrected an incorrect goal identification from delay activity. Figure conventions identical to those in Figure 2.6. Note that the thick red and orange traces in the upper panels are overlaid with the cyan trace. For this trial, $E_\mathrm{rms}$ was 16.7, 13.3, and 13.4 mm for STM, MTM without delay activity, and MTM with delay activity, respectively. Monkey G, 98 units. (Experiment G20040508, trial ID 676)

2.8 $E_\mathrm{rms}$ (mean ± SE) comparison for decoders using the RWM, STM, MTM without delay activity (MTM$_\mathrm{M}$), and MTM with delay activity (MTM$_\mathrm{DM}$). A: monkey G (98 units), B: monkey H (99 units).

2.9 Two-dimensional histogram of $E_\mathrm{rms}$ differences between pairs of decoders for A: monkey G (98 units), B: monkey H (99 units). Horizontal axis: $E_\mathrm{rms}$ difference between MTM$_\mathrm{M}$ and STM, vertical axis: $E_\mathrm{rms}$ difference between MTM$_\mathrm{DM}$ and STM, diagonal axis: $E_\mathrm{rms}$ difference between MTM$_\mathrm{DM}$ and MTM$_\mathrm{M}$. The grayscale intensity (log scale) indicates the number of trials lying in each bin. The red dotted lines represent the means of the $E_\mathrm{rms}$ differences along each axis. The letters (a, b, c, d) show where the trials in Figs. 2.6, 2.7, 2.10, and 2.11 lie on the histogram, respectively.

2.10 An outlying test trial in which the MTM$_\mathrm{M}$ decoded trajectory exhibited a snap-to effect. Figure conventions identical to those in Figure 2.6. For this trial, $E_\mathrm{rms}$ was 21.6, 43.8, and 9.3 mm for STM, MTM$_\mathrm{M}$, and MTM$_\mathrm{DM}$, respectively. Monkey G, 98 units. (Experiment G20040508, trial ID 1921)

2.11 An outlying test trial in which the peri-movement activity was not able to correct an incorrect reach goal identified by the delay activity. Figure conventions identical to those in Figure 2.6. For this trial, $E_\mathrm{rms}$ was 17.1, 11.6, and 48.8 mm for STM, MTM$_\mathrm{M}$, and MTM$_\mathrm{DM}$, respectively. Monkey G, 98 units. (Experiment G20040508, trial ID 1608)

2.12 $E_\mathrm{rms}$ (mean ± SE) comparison of STM (green), MTM$_\mathrm{M}$ (red), and MTM$_\mathrm{DM}$ (orange) decoders at different numbers of units. Dashed curves: monkey G, solid curves: monkey H. The vertical gray bar indicates the number of units used for the performance reported in Figure 2.8.

3.1 PMd latency analysis with the single-target instructed-delay task (one reach target was shown out of 8 possible locations and the remaining 7 locations were invisible) as a function of $T_\mathrm{skip}$. Performance was calculated by training a Poisson model on all trials in a dataset and computing the leave-one-out cross-validated performance on the same data. The shaded area denotes the 95% confidence interval (Bernoulli process) around the mean performance (embedded line). Dark curves correspond to monkey G (dataset G20040603) and light curves to monkey H (dataset H20041117). Performance was calculated for a constant $T_\mathrm{int}$ of 50 ms with varying $T_\mathrm{skip}$.

3.2 Performance curves investigating the dependence on $T_\mathrm{int}$ were calculated from offline experiment H20041118 (8-target configuration). The trial length was $T_\mathrm{skip} + T_\mathrm{int} + T_\mathrm{dec+rend}$ with $T_\mathrm{skip} = 150$ ms and $T_\mathrm{dec+rend} \approx 40$ ms. $T_\mathrm{int}$ was varied and performance was computed. Performance metrics were very consistent day after day and between monkeys (data not shown). The theoretical maximum ITRC in bps, assuming 100% accuracy regardless of $T_\mathrm{int}$, is plotted as the dotted red curve.

3.3 Chain of three prosthetic cursor trials followed by a standard instructed-delay reach trial. $T_\mathrm{skip}$ is denoted by the orange parts of the timeline. Neural activity was integrated ($T_\mathrm{int}$) during the purple shaded interval and used to predict the reach target location. After a short processing time ($T_\mathrm{dec+rend} \approx 40$ ms), a prosthetic cursor was briefly rendered and a new target was displayed. The dotted circles represent the reach target and prosthetic cursor from the previous trial, both of which were rapidly extinguished before the start of the trial indicated. Large ellipses draw attention to the increase in neural activity related to the peripheral reach target. Trials shown here are from experiment H20041106.1 with monkey H.

3.4 Performance measured during BCI experiments. Performance is plotted for each target configuration and across varying total trial lengths. Each data symbol represents performance calculated from one experiment (many hundreds of trials). Across target configurations, single-trial accuracy decreases and ITRC increases as more target locations are used.

3.5 ITRC as a function of number of neural units and $T_\mathrm{int}$. All data are from experiment H20041118, which used an 8-target configuration and contained over 1,300 trials. $T_\mathrm{skip}$ was fixed at 150 ms. The main panel shows contours of ITRC (bps) as a function of the number of neural units available and $T_\mathrm{int}$. The inset shows the value of $T_\mathrm{int}$ that achieves the maximum ITRC for each neural ensemble size that we tested. Similar results were obtained for data set G20040508 from monkey G.

4.1 Illustration of optimal subspace hypothesis. Each axis represents the underlying firing rate of one neuron; only three of them are drawn. Different movements have different optimal subspaces (shaded areas). For different trials, the process of motor preparation (arrows) may take place at different rates, along different paths, and from different starting points.

4.2 Simulations illustrating how an increasing consistency in across-trial firing rate could be detected using the NV metric. Simulations were based on the mean firing rate of one recorded neuron (solid black trace at top). Baseline activity was artificially extended (to the left) to allow longer simulations. For each of 10,000 simulated trials, spike trains were generated using Poisson statistics. Two versions of the simulation were run. For the first version, the underlying firing rate was identical (black trace at top) on all simulated trials. The resulting NV is shown by the black trace at the bottom. For the second version, each trial had a different underlying firing rate, generated by adding noise, filtered with a 30 ms SD Gaussian, to the mean. The magnitude of this noise decayed with an exponential time constant of 200 ms after target onset. Ten examples of the resulting underlying firing rates are shown in gray at top, and the resulting spike trains (computed with Poisson statistics, with the time-varying mean taken from the gray traces) are shown in the rasters. The NV computed from 10,000 such spike trains is shown by the gray trace at the bottom.

4.3 NV timecourse. Black trace: mean ± SEM across all isolations and target conditions. Gray traces: mean absolute hand speed. Two temporal epochs are shown, aligned to target and movement onset times (black arrows). The small solid histogram shows the distribution of go cue onset times, reflecting the fact that RTs are variable. Monkey G (816 trials, 47 isolations: 14 single- and 33 multi-unit).

4.4 Relationship of the NV to natural RT variability. A: A prediction of how RT might relate to firing rate given the optimal-subspace hypothesis. The shaded area represents the optimal subspace for the movement being prepared, as in Figure 4.1. Each dot corresponds to one trial and represents the configuration of firing rates at the time of the go cue. For some trials, that configuration may lie within the optimal subspace (green dots), leading to a short RT. For other trials, the configuration may lie outside (red dots), leading to a longer RT. B: Red and green traces show the NV, around the time of the go cue, for trials with RTs longer and shorter than the median. Traces at bottom show the mean percentage difference (short − long, ±SEM) in the NV (black). Data were pooled across the recordings from 7 days (monkey G), including all trials with delay periods >200 ms.

4.5 Analyses with three different delay-period durations: 30, 130 and 230 ms. Data are from one day using monkey G (39 isolations, 957 trials). A: Change in mean firing rate (±SEM) from baseline (top) and NV ± SEM (bottom) for each delay-period duration. Analysis was performed with data aligned to the go cue. B: Mean RT versus the change in firing rate from baseline, measured at the time of the go cue for the three delay-period durations. Black symbols plot the mean change averaged across all neurons and conditions. Gray symbols plot the same analysis but including only the preferred condition of each neuron. Note that the x-axis has been rescaled in the latter case. C: Mean RT versus the NV, measured at the time of the go cue for the three delay-period durations. Error bars show SEM.

5.1 Non-linear activation function $\mathrm{erf}(z) = \frac{2}{\sqrt{\pi}} \int_0^z e^{-t^2}\,dt$.

5.2 Link function $h(z) = \log(1 + e^z)$.

5.3 Illustration of an EP update during the forward pass. The left panel shows how $P(x_{t-1}, x_t)$ (red) is formed from its constituent factors (black). The right panel shows two possible Gaussian approximations (ellipsoidal contours) of $P(x_{t-1}, x_t)$, depending on whether Laplace-EP (blue) or GQ-EP (green) is used. This results in two possible updates of $\alpha_t(x_t)$, plotted as one-dimensional densities on the vertical axis.

5.4 Learning curves for the approximate EM algorithm developed in this chapter, with GQ-EP used for the E-step. Data likelihoods computed using sequential Monte Carlo techniques (2500 particles). Left panel: training data (70 trials). Right panel: test data (100 trials). Red traces: linear-Gaussian state model. Blue traces: recurrent network state model.

5.5 Model comparison between two dynamical models (linear and recurrent state dynamics) and two non-dynamical models (FAP and PSTHP). Left panel: training data (70 trials). Right panel: test data (100 trials).

5.6 Inferred state trajectories (black) in latent x space for 100 test trials, based on the model with recurrent state dynamics. Dots indicate 50 ms after target onset (blue) and 50 ms after the go cue (green). The radius of the green dots is logarithmically-related to delay period duration (200, 750, or 1000 ms).

5.7 Imputed trial-by-trial firing rates (blue) and empirical firing rates (red). Gray vertical line indicates the time of the go cue. Each panel corresponds to one unit. For clarity, only test trials with delay periods of 1000 ms (35 trials) are plotted for each unit.


Chapter 1

Introduction

In everyday life, the brain takes in sensory inputs, processes that information, and sends motor commands to muscles. For example, to pick up a cup, the visual scene is taken in, a cup is identified, a decision is made to reach to it, and motor commands are sent down the spinal cord which guide the arm to the cup. A key aspect of the brain’s ability to perform these functions is time. The brain takes in sensory inputs that vary with time; it takes time for the brain to process the sensory inputs; the brain sends time-varying control signals that guide the arm to an intended reach target. We refer to the time-varying aspect of these processes as dynamics, which is reflected in corresponding neural activity. In this dissertation, we seek to study and characterize the dynamics of neural activity, in particular that related to motor behavior.

To further narrow the scope of this study, we consider neural activity related to motor preparation and motor execution. In Chapter 2, we seek to improve the accuracy of decoding goal-directed movements from neural activity related to motor execution. A vital component of probabilistic decoders is the trajectory model, which builds into the decoder prior knowledge about the form of the trajectories. Most current trajectory models assume little or no knowledge about the form of movement, but instead provide a basic smoothness constraint on the decoded trajectory. When trajectories are goal-directed, however, stronger models can be defined that reflect the typical dynamics of movement, as well as the likely goal locations. To address this need, we develop a mixture of trajectory models (MTM) framework. We show that the MTM better captures the statistics of goal-directed movements than current trajectory models and, as a result, its decoded trajectories are more accurate. In addition, there is often prior information available about the endpoint of the upcoming movement, even before movement begins. This information is represented by a probability distribution across the possible goals. We show how the MTM framework can naturally incorporate this information to further improve decoding performance. Variants of the MTM framework were first proposed by Caleb Kemere, Gopal Santhanam, and myself. Animal training and neural recordings were performed jointly by Gopal Santhanam, Stephen Ryu, Afsheen Afshar, and myself. The surgical team for array implantation was led by Stephen Ryu. I was the primary contributor to the development of the MTM decoder and the corresponding analyses.
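As a rough illustration of how a probability distribution across goals can enter a mixture-based decoder, the following Python sketch combines a prior over goals with per-goal data likelihoods to form mixture weights, then averages per-goal position estimates using those weights. It is a minimal sketch under stated assumptions, not the decoder developed in Chapter 2: the function names (`mixture_weights`, `mixture_estimate`), the two-dimensional position estimates, and the idea of passing in precomputed log-likelihoods are all hypothetical choices made only for illustration.

```python
import math


def mixture_weights(prior, log_likelihoods):
    """Posterior probability of each goal (hypothetical helper).

    prior           -- prior probability of each goal (sums to 1)
    log_likelihoods -- log-likelihood of the observed data under each
                       goal's trajectory model (assumed precomputed)
    """
    # Work in the log domain: log posterior = log prior + log likelihood.
    log_post = [
        (math.log(p) if p > 0 else float("-inf")) + ll
        for p, ll in zip(prior, log_likelihoods)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_post)
    unnormalized = [math.exp(lp - peak) for lp in log_post]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def mixture_estimate(weights, component_estimates):
    """Weighted average of per-goal (x, y) position estimates."""
    x = sum(w * est[0] for w, est in zip(weights, component_estimates))
    y = sum(w * est[1] for w, est in zip(weights, component_estimates))
    return (x, y)


# Illustrative numbers only: two goals, a prior favoring the second goal,
# and data that favor the first goal.
weights = mixture_weights([0.25, 0.75], [math.log(3.0), 0.0])
position = mixture_estimate(weights, [(0.0, 0.0), (2.0, 4.0)])
print(weights, position)
```

In this toy setting a uniform prior reduces the weights to normalized likelihoods, while a sharper prior (for example, one derived from activity observed before movement) pulls the combined estimate toward the favored goal.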

Chapters 3-5 investigate the neural dynamics of motor preparation. W hat is

motor preparation? Voluntary movements are believed to be “prepared” before

they are executed. An important line of evidence comes from tasks where subjects are explicitly given time to prepare a movement before being instructed to move.

These studies have shown that reaction times (RT, defined as the duration of time between when a subject is instructed to move and when movement begins) tend

to be shorter when subjects are given time to prepare the movement (Rosenbaum, 1980; Riehle and Requin, 1989, 1993; Crammond and Kalaska, 2000). This suggests that there is some time-consuming preparatory process that needs to be carried out before a movement can be executed and, if given time, can be carried out well in advance of the movement itself. Furthermore, disrupting what is believed to be the neural substrate of motor preparation can erase the RT savings (Churchland

and Shenoy, 2006). Thus, motor preparation is often thought of as the process of “planning” a movement.

In Chapter 3, we first use a decoding approach to gain insights into the neural dynamics of motor preparation. We investigate how quickly the neural activity reflects a reach target after it is specified, how our ability to decode varies with the duration and placement of the time window over which neural activity is observed,



and how quickly the brain can change its motor “plan” from one reach target to

another. These findings are then used to design a real-time brain-computer interface (BCI). Rather than attempting to decode an entire reach trajectory, our BCI

decodes only the identity of the reach target and can be viewed as a key selection device. This reduces the target acquisition time and, consequently, increases the number of targets that can be decoded per second. The performance of the system is evaluated based on the accuracy of decoded targets and speed at which targets

are decoded, which are together captured by an information theoretic analysis. We

show that our BCI offers manyfold higher performance than current BCIs, thereby

increasing the clinical viability of BCIs in humans. Stephen Ryu initially conceived

of the key selection paradigm. The real-time decoding system was set up jointly by Gopal Santhanam, Stephen Ryu, and myself. Animal training and neural recordings were performed jointly by Gopal Santhanam, Stephen Ryu, Afsheen Afshar, and myself. The surgical team for array implantation was led by Stephen Ryu. Gopal Santhanam was the primary contributor to all analyses.

In Chapter 4, we study the neural dynamics of motor preparation through the development of a variability-based measure similar to the Fano factor, termed the

normalized variance (NV). Previous attempts to track the progress of motor preparation have focused on mean firing rate (Riehle and Requin, 1993; Bastian et al., 2003), whose effects appear to be inconsistent across different subjects (Crammond and Kalaska, 2000; Churchland et al., 2006b). Our hypothesis is that firing rates in premotor cortex become optimized during motor preparation, approaching their ideal values over time. Firing rates are initially variable across trials but are

brought over time to their “appropriate” values, becoming consistent in the process. This hypothesis predicts that i) the NV should drop after the reach target is presented, and ii) the NV at the time at which the movement is “triggered” should be predictive of RT. We find our data to be consistent with these predictions across

three monkeys and interpret the NV timecourse as a signature of motor preparation. Mark Churchland and I initially conceived of the variability-based measure. Multi-electrode recordings were performed jointly by Gopal Santhanam, Stephen



Ryu, and myself; single electrode recordings were performed by Mark Churchland. The surgical team for array implantation was led by Stephen Ryu. Mark Churchland was the primary contributor to all analyses.

While the NV appears to track the progress of motor preparation, it provides little insight into the timecourse of motor preparation on a single trial. Furthermore,

the NV is first computed per-neuron then averaged across neurons, ignoring any

structure that may be present in the correlated firing of neurons on a single trial. In Chapter 5, we develop statistical techniques that can provide single-trial characterizations and exploit the fact that the neurons are recorded simultaneously. A dynamical systems approach is taken, where the central idea is that the responses of different neurons reflect different views of a common dynamical process.

We present and validate latent variable methods that simultaneously i) extract

this dynamical process from neural activity on single trials, and ii) learn the rules

governing the system dynamics. An approximate expectation-maximization (EM) algorithm is developed for fitting a nonlinear dynamical systems model with underlying recurrent structure and stochastic point-process output. The approximate EM algorithm uses expectation-propagation (EP) for inference and approximate

gradients for learning. Techniques like these should enable a new class of questions in neuroscience to be studied that involve tracking one’s cognitive state. Maneesh Sahani, Afsheen Afshar, and I initially conceived of the approach. Animal training and neural recordings were performed jointly by Gopal Santhanam, Stephen Ryu,

Afsheen Afshar, and myself. The surgical team for array implantation was led by Stephen Ryu. I was the primary contributor to the technical development of the inference and learning algorithms, and associated analyses.


Chapter 2

Decoding Goal-Directed Movements

Neural activity from motor cortical areas has been shown to be related to various aspects of the corresponding arm reach, including movement direction (Tanji

and Evarts, 1976; Georgopoulos et al., 1986; Riehle and Requin, 1989; Ashe and

Georgopoulos, 1994; Moran and Schwartz, 1999), movement extent (Riehle and

Requin, 1989), position (Ashe and Georgopoulos, 1994; Paninski et al., 2004), velocity (Ashe and Georgopoulos, 1994; Paninski et al., 2004), acceleration (Ashe and Georgopoulos, 1994), posture (Caminiti et al., 1991; Scott and Kalaska, 1997), speed (Moran and Schwartz, 1999), joint angular velocity (Reina et al., 2001), force (Evarts, 1968; Sergio and Kalaska, 1998), and intended reach goal (Shen and Alexander, 1997; Messier and Kalaska, 2000). While the coding scheme employed by motor cortical areas is still incompletely understood (Fetz, 1992; Moran and

Schwartz, 2000; Scott, 2004), the regularities in the relationship between neural

activity and aspects of the arm reach have enabled the development of direct brain-controlled prosthetic devices (Serruya et al., 2002; Taylor et al., 2002; Carmena

et al., 2003; Musallam et al., 2004; Santhanam et al., 2006). These devices are designed to allow disabled patients to regain motor function through the use of prosthetic limbs, or computer cursors, that are controlled by neural activity.

One of the key components of a prosthetic device is its decoding algorithm,


which translates neural activity into arm reaches. Examples of decoding algorithms

that translate neural activity around the time of the movement (termed peri-movement activity) into continuous arm trajectories include population vectors (Taylor et al., 2002) and linear filters (Serruya et al., 2002; Carmena et al., 2003).

Both of these decoding algorithms assume a linear relationship between the neural activity and arm state. In general, the arm state may include, but is not limited to, arm position, velocity, and acceleration.

While these linear decoding algorithms are effective, recursive Bayesian decoders have been shown to provide more accurate trajectory estimates (Brown et al., 1998; Brockwell et al., 2004; Wu et al., 2003, 2004, 2006). Recursive Bayesian

decoders are based on the specification of a probabilistic model comprising (1) a

trajectory model, which describes how the arm state changes from one time step to the next, and (2) an observation model, which describes how the observed neural

activity relates to the time-evolving arm state. If the modeling assumptions are satisfied, then Bayesian estimation makes optimal use of the observed data. Unlike the aforementioned linear decoding algorithms, recursive Bayesian decoders provide confidence regions for the arm state estimates and allow for non-linear relationships between the neural activity and arm state.

The functionality of the trajectory model is to build into the recursive Bayesian

decoder prior knowledge about the form of the reaches. The model may reflect

(1) the hard, physical constraints of the limb (for example, the elbow can’t bend

backwards), (2) the soft, control constraints imposed by neural mechanisms (for example, the arm is more likely to move smoothly than in a jerky motion), as well as (3) the physical surroundings of the patient and his/her objectives in that environment. The degree to which the trajectory model reflects the dynamics

of the actual reaches directly affects the accuracy with which trajectories can be decoded from neural data. A commonly-used trajectory model is the random walk (Brown et al., 1998; Brockwell et al., 2004), which captures the fact that arm trajectories tend to be smooth. In other words, small changes in arm state from one time step to the next are more likely than large changes. An alternative

trajectory model is based on linear dynamics perturbed by Gaussian noise, termed



a linear-Gaussian model (Wu et al., 2004; Shoham et al., 2005; Wu et al., 2006).

These models have been successfully applied to decoding the path of a foraging rat

(Brown et al., 1998), as well as arm trajectories in ellipse-tracing (Brockwell et al.,

2004), pursuit-tracking (Wu et al., 2004; Shoham et al., 2005; Wu et al., 2006), and “pinball” tasks (Wu et al., 2004, 2006).

It is often the case that there are a finite number of distinct objects that a disabled patient may wish to reach for in his/her workspace. Examples include reaching for the lighting, bed, or temperature controls; typing on a keyboard; or picking up the phone. Natural reaching movements in such settings exhibit the

following three properties. First, many, though clearly not all, reaching movements in the workspace will be directed to this set of discrete goals. Second, multiple reaches to the same goal are not all identical. For example, there may be variability

in reach speed or curvature. Third, the trajectories generally start at rest, proceed

out to the reach goal, and end at rest. Current trajectory models, such as the random walk or linear-Gaussian models, are limited in their ability to capture all

three aforementioned properties. In particular, it is not possible to specify multiple discrete reach goals at which the trajectories are likely to come to rest. Thus, we

seek a trajectory model that better captures the dynamics of goal-directed reaches, which should in turn yield more accurate trajectory estimates.

In addition, on a given trial, there may be information available about the identity of the upcoming reach goal before the reach begins. For example, if the phone rings, there is a greater chance that the goal of the upcoming reach will be the

phone rather than the light switch. Even without such an external clue, information about the goal of an upcoming reaching movement can often be deduced before the reach begins from neural activity related to motor preparation (termed delay activity, since motor preparation is typically studied using a delayed-reach behavioral task) (e.g., Weinrich and Wise, 1982; Riehle and Requin, 1989; Kurata, 1993;

Shen and Alexander, 1997; Messier and Kalaska, 2000; Churchland et al., 2006b). The type of information conveyed by delay activity is categorically different from that provided by peri-movement activity. Whereas peri-movement activity specifies the moment-by-moment details of the arm trajectory (e.g., Schwartz, 1992;



Ashe and Georgopoulos, 1994; Moran and Schwartz, 1999; Paninski et al., 2004), delay activity has been shown to indicate the upcoming reach goal (Shenoy et al., 2003; Hatsopoulos et al., 2004; Yu et al., 2004; Musallam et al., 2004; Santhanam

et al., 2006). It should be possible to use this goal information, when available, to improve the accuracy of the decoded trajectory. We recently demonstrated how to

use delay and peri-movement activity together to select from a set of canonical trajectories (Kemere et al., 2002, 2004b). While effective for stereotyped trajectories, this approach did not take into account behavioral variability across reaches to the same goal.

We further showed that a linear filter could be employed in tandem with either a mixture of hidden Markov models (Kemere et al., 2003) or a set of canonical trajectories (Kemere et al., 2004a) to estimate reaches from neural data.

Here, we seek to unify our previous work, while taking into account reach

variability and eliminating the need for a linear filter. We present a mixture of

trajectory models (MTM) framework that provides (1) a suitable trajectory model for goal-directed reaches, and (2) a principled way to incorporate information about

the identity of the upcoming reach goal. We first describe the MTM framework

and show how it can be used to decode arm trajectories from neural activity. Then,

the behavioral task and neural recordings are described. Next, we detail how the

models were fit to the data and how goal information can be extracted from delay

activity. Finally, we compare the performance of the MTM decoder with those based on the random walk and linear-Gaussian trajectory models.

2.1 Mixture of trajectory models framework

Recursive Bayesian decoders require the specification of a trajectory model that

describes the statistics of arm trajectories we expect to observe. Ideally, we would like to construct a complete model of neural motor control that captures the hard physical constraints of the limb, the soft control constraints imposed by neural mechanisms, as well as the physical surroundings and context. One way to approximate such a complete model is to build a separate trajectory model for each group of movements with similar objectives (Kemere et al., 2002, 2003, 2004a,b).



As in our previous work, we group the movements by reach goal. At the onset of a

new movement the desired reach goal is unknown, or imperfectly known, and so the full trajectory model is composed of a mixture of the individual, goal-specific trajectory models. We develop here a recursive Bayesian decoder based on a mixture of trajectory models (MTM).

The task of decoding a continuous arm trajectory involves finding the likely sequences of arm states corresponding to the observed neural activity. At each time step $t$, we seek to compute the distribution of the arm state $\mathbf{x}_t$ given the peri-movement neural activity $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_t$ (denoted by $\{\mathbf{y}\}_1^t$) observed up to that time. This distribution is written $P(\mathbf{x}_t \mid \{\mathbf{y}\}_1^t)$ and termed the state posterior. Here, $\mathbf{y}_t$ is a vector of binned spike counts across the neural population at time step $t$, and $t = 1$ corresponds to the time at which we begin to decode movement.

If the desired reach goal $m^*$ is perfectly known before the reach begins, then we can compute the state posterior based only on the individual trajectory model corresponding to that reach goal. This distribution is written $P(\mathbf{x}_t \mid \{\mathbf{y}\}_1^t, m^*)$ and termed the conditional state posterior. However, in general, the desired reach goal is unknown or imperfectly known, so we need to compute $P(\mathbf{x}_t \mid \{\mathbf{y}\}_1^t, m)$ for each $m \in \{1, \ldots, M\}$, where $M$ is the number of possible reach goals.

To combine the $M$ conditional state posteriors, we can simply expand $P(\mathbf{x}_t \mid \{\mathbf{y}\}_1^t)$ by conditioning on the reach goal $m$

$$P(\mathbf{x}_t \mid \{\mathbf{y}\}_1^t) = \sum_{m=1}^{M} P(\mathbf{x}_t \mid \{\mathbf{y}\}_1^t, m) \, P(m \mid \{\mathbf{y}\}_1^t). \tag{2.1}$$

In other words, the state posterior is a weighted sum of the conditional state posteriors. The weights $P(m \mid \{\mathbf{y}\}_1^t)$ represent the probability that the desired reach goal is $m$, given the observed spike counts up to time $t$. Bayes’ rule can then be applied to these weights in (2.1), yielding the key equation for the MTM framework

$$P(\mathbf{x}_t \mid \{\mathbf{y}\}_1^t) = \sum_{m=1}^{M} P(\mathbf{x}_t \mid \{\mathbf{y}\}_1^t, m) \, \frac{P(\{\mathbf{y}\}_1^t \mid m) \, P(m)}{P(\{\mathbf{y}\}_1^t)}. \tag{2.2}$$



The conditional state posteriors $P(\mathbf{x}_t \mid \{\mathbf{y}\}_1^t, m)$ and data likelihoods $P(\{\mathbf{y}\}_1^t \mid m)$ in (2.2) can be computed or approximated using any of a number of different recursive Bayesian decoding techniques, including Bayes’ filter (Brown et al., 1998),

particle filters (Brockwell et al., 2004; Shoham et al., 2005), and Kalman filter

variants (Wu et al., 2004, 2006). If available, information about the identity of

the upcoming reach goal can be incorporated naturally into the MTM framework via P(m) in (2.2). This information must be available before the reach begins and may differ from trial-to-trial. If no such information is available, the same P(m) (e.g., a uniform distribution) can be used across all trials.
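As a rough illustration of how the weights in (2.2) could be evaluated numerically, the sketch below combines per-goal data log-likelihoods with a prior $P(m)$ in the log domain before normalizing. The function name `goal_posterior` and the example numbers are hypothetical and chosen only for illustration; working in the log domain is an implementation choice assumed here, not a detail taken from the text.

```python
import numpy as np

def goal_posterior(log_lik, prior):
    """Return P(m | {y}_1^t) given log P({y}_1^t | m) and P(m), as in (2.2).

    Both inputs have length M (one entry per reach goal). The normalizer
    P({y}_1^t) is obtained by summing the numerator over goals.
    """
    log_lik = np.asarray(log_lik, dtype=float)
    prior = np.asarray(prior, dtype=float)
    with np.errstate(divide="ignore"):       # allow prior entries equal to zero
        log_post = log_lik + np.log(prior)
    log_post -= log_post.max()               # guard against underflow in exp
    weights = np.exp(log_post)
    return weights / weights.sum()

# Hypothetical example with M = 3 goals and a uniform prior.
example_weights = goal_posterior([-1050.0, -1047.0, -1060.0], [1 / 3, 1 / 3, 1 / 3])
```

A trial-specific prior (for example, one derived from delay activity) would simply be passed in place of the uniform vector.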

2.1.1 Mixture of linear-Gaussian trajectory models with Poisson observations

The particular probabilistic model explored here is

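$$\mathbf{x}_t \mid \mathbf{x}_{t-1}, m \sim \mathcal{N}(A_m \mathbf{x}_{t-1} + \mathbf{b}_m, Q_m) \tag{2.3}$$

$$\mathbf{x}_1 \mid m \sim \mathcal{N}(\boldsymbol{\pi}_m, V_m) \tag{2.4}$$

$$s^i_{t-\mathrm{lag}_i} \mid \mathbf{x}_t \sim \mathrm{Poisson}\left(e^{\mathbf{c}_i' \mathbf{x}_t + d_i} \Delta\right), \tag{2.5}$$

where $m \in \{1, \ldots, M\}$ indexes reach goal and $M$ is the number of reach goals. The dynamical arm state at time step $t \in \{1, \ldots, T\}$ is $\mathbf{x}_t \in \mathbb{R}^{p \times 1}$, which includes position, velocity, and acceleration terms. The corresponding observation, $s^i_{t-\mathrm{lag}_i} \in \{0, 1, 2, \ldots\}$, is a peri-movement spike count for unit $i \in \{1, \ldots, q\}$ taken in a time bin of width $\Delta$, where $\mathrm{lag}_i$ is the time lag (in time steps) between the neural firing of the $i$th unit and the associated arm state. For notational convenience, the spike counts across the $q$ simultaneously-recorded units are assembled into a $q \times 1$ vector $\mathbf{y}_t$, whose $i$th element is $s^i_{t-\mathrm{lag}_i}$. This is the $\mathbf{y}_t$ that appears in (2.1) and (2.2). The parameters $A_m \in \mathbb{R}^{p \times p}$, $\mathbf{b}_m \in \mathbb{R}^{p \times 1}$, $Q_m \in \mathbb{R}^{p \times p}$, $\boldsymbol{\pi}_m \in \mathbb{R}^{p \times 1}$, $V_m \in \mathbb{R}^{p \times p}$, $\mathrm{lag}_i \in \mathbb{Z}$, $\mathbf{c}_i \in \mathbb{R}^{p \times 1}$, $d_i \in \mathbb{R}$ do not depend on time and are fit to training data, as described below.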

Equations (2.3) and (2.4) define the trajectory model, which describes how



the arm state $\mathbf{x}_t$ changes from one time step to the next. In this case, the full trajectory model is a mixture of linear-Gaussian trajectory models, each describing the trajectories toward a particular reach goal indexed by $m$. In other words, conditioned on the reach goal $m$, the trajectory model is a standard linear-Gaussian dynamic model.

Equation (2.5) defines the observation model, which describes how the observed

peri-movement spike counts $s^i_{t-\mathrm{lag}_i}$ relate to the arm state $\mathbf{x}_t$. In (2.5), the linear mapping $\mathbf{c}_i' \mathbf{x}_t + d_i$ is a cosine tuning model (Georgopoulos et al., 1982), where $\mathbf{c}_i$ is the “preferred state vector”. This linear mapping is then passed through an exponential to ensure that the mean firing rate of the $i$th unit at time $t - \mathrm{lag}_i$, $e^{\mathbf{c}_i' \mathbf{x}_t + d_i}$, is non-negative. Note that, whereas each mixture component indexed by

m in the trajectory model (2.3) and (2.4) can have different parameters leading to different arm state dynamics, the observation model (2.5) is the same for all m.

The directed graphical model in Figure 2.1 illustrates the conditional independence relationships defined by (2.3)-(2.5). Although the neural activity is known to be physically driving the trajectories, the model prescribes a relationship in

the opposite direction. Nevertheless, models with this “inverted” structure have

been shown to decode arm trajectories effectively (Brockwell et al., 2004; Shoham et al., 2005; Wu et al., 2004, 2006). The primary motivation for adopting such a structure is that there are established techniques for efficiently estimating an unobserved time series with known dynamics (in this case, the arm trajectory) from noisy observations (in this case, the neural spike counts). These techniques, which will be detailed in the next section, assume the relationships shown in Figure 2.1. One can think of a trajectory as a stimulus that “evokes” the corresponding neural response.
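The generative reading of (2.3)-(2.5), in which a trajectory “evokes” spike counts, can be made concrete with a small simulation. The sketch below is only illustrative: it assumes all lags are zero, uses a two-dimensional toy state rather than position, velocity, and acceleration, and every parameter value (as well as the name `sample_mtm`) is a hypothetical choice, not a fitted quantity from the data.

```python
import numpy as np

def sample_mtm(A, b, Q, pi, V, C, d, delta, T, goal_prior, rng):
    """Draw a goal m, an arm-state trajectory x, and spike counts y.

    A, Q, V: (M, p, p); b, pi: (M, p); C: (q, p) rows are c_i'; d: (q,).
    Assumes lag_i = 0 for every unit (a simplification).
    """
    m = int(rng.choice(len(goal_prior), p=goal_prior))    # goal drawn from P(m)
    x = np.empty((T, pi.shape[1]))
    x[0] = rng.multivariate_normal(pi[m], V[m])           # initial state, (2.4)
    for t in range(1, T):
        mean = A[m] @ x[t - 1] + b[m]
        x[t] = rng.multivariate_normal(mean, Q[m])        # dynamics, (2.3)
    rates = np.exp(x @ C.T + d)                           # firing rates, shape (T, q)
    y = rng.poisson(rates * delta)                        # spike counts, (2.5)
    return m, x, y

# Toy parameters: each component relaxes toward its own (hypothetical) goal state.
rng = np.random.default_rng(0)
M, p, q, T = 2, 2, 5, 50
goals = np.array([[1.0, 0.0], [-1.0, 0.0]])
A = np.stack([0.9 * np.eye(p)] * M)
b = np.stack([(np.eye(p) - A[k]) @ goals[k] for k in range(M)])
Q = np.stack([0.01 * np.eye(p)] * M)
pi = np.zeros((M, p))
V = np.stack([0.01 * np.eye(p)] * M)
C = 0.5 * rng.standard_normal((q, p))
d = np.full(q, np.log(20.0))                              # about 20 spikes/s baseline
m, x, y = sample_mtm(A, b, Q, pi, V, C, d, delta=0.01, T=T,
                     goal_prior=[0.5, 0.5], rng=rng)
```

Note that the observation parameters `C` and `d` are shared across goals, while the dynamics parameters are indexed by `m`, mirroring the structure described above.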

2.1.2 Recursive Bayesian decoding

Arm trajectories can be decoded from neural activity by applying Bayes’ rule to

the statistical relationships (2.3)-(2.5). Having observed the neural data, we seek the likely sequences of arm states that could have led to those neural observations.



Figure 2.1: Directed graphical model depicting the relationship between the arm states ($\mathbf{x}_t$) and the peri-movement activity from multiple, simultaneously-recorded neural units ($\mathbf{y}_t$).

For each $m$ and $t$, we need to compute the following two terms that appear in (2.2): the conditional state posteriors $P(\mathbf{x}_t \mid \{\mathbf{y}\}_1^t, m)$ and the data likelihoods $P(\{\mathbf{y}\}_1^t \mid m)$.

The conditional state posteriors can be obtained by iterating the following two updates. First, the one-step prediction is found by applying (2.3) to the conditional state posterior at the previous time step

$$P(\mathbf{x}_t \mid \{\mathbf{y}\}_1^{t-1}, m) = \int P(\mathbf{x}_t \mid \mathbf{x}_{t-1}, m) \, P(\mathbf{x}_{t-1} \mid \{\mathbf{y}\}_1^{t-1}, m) \, d\mathbf{x}_{t-1}. \tag{2.6}$$

Second, the conditional state posterior at the current time step is computed using Bayes’ rule

$$P(\mathbf{x}_t \mid \{\mathbf{y}\}_1^t, m) = \frac{P(\mathbf{y}_t \mid \mathbf{x}_t) \, P(\mathbf{x}_t \mid \{\mathbf{y}\}_1^{t-1}, m)}{P(\mathbf{y}_t \mid \{\mathbf{y}\}_1^{t-1}, m)}. \tag{2.7}$$

Note that $P(\mathbf{y}_t \mid \mathbf{x}_t, \{\mathbf{y}\}_1^{t-1}, m)$ has been replaced by $P(\mathbf{y}_t \mid \mathbf{x}_t)$ to obtain (2.7) since, given the current arm state $\mathbf{x}_t$, the current observation $\mathbf{y}_t$ does not depend on the previous observations $\{\mathbf{y}\}_1^{t-1}$ nor the reach goal $m$ (cf. (2.5) and Figure 2.1).

When the trajectory and observation models are both linear-Gaussian, all of the relevant distributions are Gaussian and the integral in (2.6) can be computed exactly. Taking the iterations defined by (2.6) and (2.7) is identical to applying the standard Kalman filter.

However, the observation model (2.5) is not linear-Gaussian. This leads to distributions that are difficult to manipulate, and the integral in (2.6) cannot be



computed analytically. We instead employ a modified Kalman filter that uses a Gaussian approximation during the measurement update step (2.7) (Brown et al., 1998). We approximate $P(\mathbf{x}_t \mid \{\mathbf{y}\}_1^t, m)$ as a Gaussian matched to the location and curvature of the mode of $P(\mathbf{x}_t \mid \{\mathbf{y}\}_1^t, m)$, as in Laplace’s method (MacKay, 2003). Since the conditional state posterior is strictly log-concave in $\mathbf{x}_t$, its unique

mode can easily be found by Newton’s method. This Gaussian approximation then

allows the integral in (2.6) to be computed exactly at the next time step, as in the standard Kalman filter.
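One time step of such a filter, for a single mixture component, might look like the following sketch. It is written under several assumptions that are not taken from the text: all lags are zero, the previous posterior is already Gaussian, and a simple step-halving safeguard is added to Newton's method. The function names are hypothetical.

```python
import numpy as np

def predict(mu_prev, P_prev, A, b, Q):
    """One-step prediction (2.6) when the previous posterior is Gaussian."""
    return A @ mu_prev + b, A @ P_prev @ A.T + Q

def log_posterior(x, y, mu_pred, P_inv, C, d, delta):
    """log[P(y_t | x_t) P(x_t | past, m)] up to a constant that is independent of x."""
    eta = C @ x + d
    r = x - mu_pred
    return y @ eta - delta * np.exp(eta).sum() - 0.5 * r @ P_inv @ r

def laplace_update(y, mu_pred, P_pred, C, d, delta, n_iter=50, tol=1e-10):
    """Gaussian approximation to (2.7): mode by Newton's method, covariance from curvature."""
    P_inv = np.linalg.inv(P_pred)
    x = mu_pred.copy()
    for _ in range(n_iter):
        lam = delta * np.exp(C @ x + d)                   # expected counts per bin
        grad = C.T @ (y - lam) - P_inv @ (x - mu_pred)
        neg_hess = C.T @ (lam[:, None] * C) + P_inv       # positive definite
        step = np.linalg.solve(neg_hess, grad)
        scale = 1.0
        f0 = log_posterior(x, y, mu_pred, P_inv, C, d, delta)
        # Halve the step if it would decrease the objective (safeguard, an assumption).
        while scale > 1e-8 and log_posterior(x + scale * step, y, mu_pred,
                                             P_inv, C, d, delta) < f0:
            scale *= 0.5
        x = x + scale * step
        if np.linalg.norm(scale * step) < tol:
            break
    lam = delta * np.exp(C @ x + d)
    cov = np.linalg.inv(C.T @ (lam[:, None] * C) + P_inv)
    return x, cov
```

Alternating `predict` and `laplace_update` over time steps, once per goal `m`, would yield the Gaussian approximations to the conditional state posteriors used in (2.2).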

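The data likelihoods $P(\{\mathbf{y}\}_1^t \mid m)$ can be expressed as

$$P(\{\mathbf{y}\}_1^t \mid m) = \prod_{\tau=1}^{t} P(\mathbf{y}_\tau \mid \{\mathbf{y}\}_1^{\tau-1}, m), \tag{2.8}$$

where

$$P(\mathbf{y}_t \mid \{\mathbf{y}\}_1^{t-1}, m) = \int P(\mathbf{y}_t \mid \mathbf{x}_t) \, P(\mathbf{x}_t \mid \{\mathbf{y}\}_1^{t-1}, m) \, d\mathbf{x}_t. \tag{2.9}$$

The integral in (2.9), which cannot be computed analytically, is approximated using Laplace’s method (MacKay, 2003). Note that this involves the same Gaussian approximation in $\mathbf{x}_t$ (i.e., the same mean and covariance) as above for $P(\mathbf{x}_t \mid \{\mathbf{y}\}_1^t, m)$.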
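For reference, the generic form of Laplace's method applied to the integral in (2.9) can be written out as below. The symbols $g$, $\mathbf{x}_t^{\ast}$, and $\Sigma_t$ are notation introduced here for illustration only; this is the textbook form of the approximation, not a transcription of a formula from the source.

```latex
% g is the integrand of (2.9); x_t^* is its mode; Sigma_t is the inverse
% of the negative Hessian of log g evaluated at the mode.
g(\mathbf{x}_t) = P(\mathbf{y}_t \mid \mathbf{x}_t)\,
                  P(\mathbf{x}_t \mid \{\mathbf{y}\}_1^{t-1}, m),
\qquad
\mathbf{x}_t^{\ast} = \arg\max_{\mathbf{x}_t} g(\mathbf{x}_t),
\qquad
\Sigma_t = \left[ -\nabla^2 \log g(\mathbf{x}_t)
           \Big|_{\mathbf{x}_t = \mathbf{x}_t^{\ast}} \right]^{-1}

P(\mathbf{y}_t \mid \{\mathbf{y}\}_1^{t-1}, m)
  = \int g(\mathbf{x}_t)\, d\mathbf{x}_t
  \approx g(\mathbf{x}_t^{\ast})\, (2\pi)^{p/2}\, |\Sigma_t|^{1/2}
```

Because $g$ differs from the conditional state posterior in (2.7) only by a factor that does not depend on $\mathbf{x}_t$, the mode and curvature are shared between the two approximations, and the logarithm of (2.8) can be accumulated as a running sum of the log of these per-step terms.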

2.1.3 Evaluating performance

The state posterior $P(\mathbf{x}_t \mid \{\mathbf{y}\}_1^t)$ in (2.1) is a multi-modal distribution. To compare the performance of different decoders and for prosthetic applications, we need to collapse this multi-modal distribution into a single decoded trajectory. In other words, we need to summarize the belief embodied in the state posterior with a single value $\hat{\mathbf{x}}_t$ at each time point. This can be done by first defining a loss function $L$, which specifies the loss incurred by the summary $\hat{\mathbf{x}}_t$ for each possible value of $\mathbf{x}_t$. The single decoded trajectory is then the $\hat{\mathbf{x}}_t$ that minimizes the average loss under the state posterior

$$\text{Average loss}(\hat{\mathbf{x}}_t) = \int L(\mathbf{x}_t, \hat{\mathbf{x}}_t) \, P(\mathbf{x}_t \mid \{\mathbf{y}\}_1^t) \, d\mathbf{x}_t. \tag{2.10}$$

Here, we choose to use the instantaneous sum of squared distance loss function

$$L(\mathbf{x}_t, \hat{\mathbf{x}}_t) = \| \mathbf{x}_t - \hat{\mathbf{x}}_t \|^2, \tag{2.11}$$

in which case the $\hat{\mathbf{x}}_t$ that minimizes the average loss (2.10) is the mean of the state posterior

$$\hat{\mathbf{x}}_t = E\left[ \mathbf{x}_t \mid \{\mathbf{y}\}_1^t \right]. \tag{2.12}$$

In particular, the mean of the state elements corresponding to arm position is taken to be the decoded position trajectory. To compare different decoders, we compute the root-mean-square position error (Erms) between the actual and decoded trajectories on a per-trajectory basis.

The expectation in (2.12) can be expanded by conditioning on the reach goal $m$ as in (2.1), yielding

$$\hat{\mathbf{x}}_t = \sum_{m=1}^{M} E\left[ \mathbf{x}_t \mid \{\mathbf{y}\}_1^t, m \right] P(m \mid \{\mathbf{y}\}_1^t). \tag{2.13}$$

The interpretation of (2.13) is similar to that of (2.1). If the desired reach goal $m^*$ is perfectly known before the reach begins, the decoded trajectory ($\hat{\mathbf{x}}_t$) is computed based only on the individual trajectory model (i.e., the mixture component) corresponding to that reach goal. The decoded trajectory, in this case, is simply the mean of the conditional state posterior corresponding to the known reach goal, written $E[\mathbf{x}_t \mid \{\mathbf{y}\}_1^t, m^*]$ and termed the component trajectory estimate for $m^*$. However, in general, the desired reach goal is unknown or imperfectly known, so we need to compute a component trajectory estimate $E[\mathbf{x}_t \mid \{\mathbf{y}\}_1^t, m]$ for each $m \in \{1, \ldots, M\}$. The final decoded trajectory ($\hat{\mathbf{x}}_t$) is a weighted sum of these component trajectory estimates, as shown in (2.13). As in (2.1), the weights $P(m \mid \{\mathbf{y}\}_1^t)$ represent the probability that the desired reach goal is $m$, given the

observed spike counts up to time t.
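The weighted sum in (2.13) and the per-trajectory error measure can be sketched as follows. The array layouts and function names are assumptions made for this example, and the RMS error is taken here to be the root of the time-averaged squared Euclidean position error, which is one plausible reading of the definition rather than a confirmed one.

```python
import numpy as np

def mtm_decode(component_means, goal_weights):
    """Weighted sum (2.13) of component trajectory estimates.

    component_means: (M, T, p) array of E[x_t | {y}_1^t, m].
    goal_weights:    (M, T) array of P(m | {y}_1^t); each column sums to 1.
    Returns the decoded trajectory with shape (T, p).
    """
    component_means = np.asarray(component_means, dtype=float)
    goal_weights = np.asarray(goal_weights, dtype=float)
    return np.einsum("mt,mtp->tp", goal_weights, component_means)

def rms_position_error(actual_pos, decoded_pos):
    """Root-mean-square position error for one trajectory (inputs shaped (T, dims))."""
    diff = np.asarray(actual_pos, dtype=float) - np.asarray(decoded_pos, dtype=float)
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))
```

When one goal's weight approaches one, the decoded trajectory reduces to that goal's component trajectory estimate, matching the perfectly-known-goal case described above.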

2.2 Random walk trajectory model

We will compare decoders based on different trajectory models. Most state-of-the-art trajectory models are special cases of the MTM defined by (2.3)-(2.5), including a single linear-Gaussian model (Wu et al., 2004, 2006) and a random walk model (Brown et al., 1998; Brockwell et al., 2004). The single linear-Gaussian model is simply the MTM with a single mixture component. The random walk model is a special case of the linear-Gaussian model with appropriately chosen parameters in (2.3) and (2.4). We implemented the random walk trajectory model with Poisson observations presented by Brockwell and colleagues (Brockwell et al., 2004)

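$$\mathbf{v}_t - \mathbf{v}_{t-1} = \mathbf{v}_{t-1} - \mathbf{v}_{t-2} + \boldsymbol{\epsilon}_t \tag{2.14}$$

$$\begin{bmatrix} \mathbf{v}_2 \\ \mathbf{v}_1 \end{bmatrix} \sim \mathcal{N}(\boldsymbol{\pi}, V) \tag{2.15}$$

$$s^i_{t-\mathrm{lag}_i} \mid \mathbf{v}_t \sim \mathrm{Poisson}\left(e^{\mathbf{c}_i' \tilde{\mathbf{v}}_t + d_i} \Delta\right), \tag{2.16}$$

where $\boldsymbol{\epsilon}_t \sim \mathcal{N}(\mathbf{0}, Q)$ in (2.14), $\mathbf{v}_t \in \mathbb{R}^{p \times 1}$ is the arm velocity at time $t$, $\tilde{\mathbf{v}}_t$ is defined to be $[\mathbf{v}_t' \;\; \|\mathbf{v}_t\|]'$ in (2.16), and $\|\mathbf{v}_t\|$ is the arm speed at time $t$. As in (2.5), $s^i_{t-\mathrm{lag}_i}$ is the peri-movement spike count of the $i$th unit at time $t - \mathrm{lag}_i$, where $\mathrm{lag}_i$ is the time lag between the neural firing of unit $i \in \{1, \ldots, q\}$ and the associated arm velocity. Spike counts are taken in time bins of width $\Delta$. The parameters $Q \in \mathbb{R}^{p \times p}$, $\boldsymbol{\pi} \in \mathbb{R}^{2p \times 1}$, $V \in \mathbb{R}^{2p \times 2p}$, $\mathrm{lag}_i \in \mathbb{Z}$, $\mathbf{c}_i \in \mathbb{R}^{(p+1) \times 1}$, $d_i \in \mathbb{R}$ are fit to training data, as described below.

Equations (2.14) and (2.15) define the random walk trajectory model that imposes smoothness in acceleration; Equation (2.16) defines the Poisson observation model. To decode arm trajectories using this probabilistic model, we followed Brockwell and colleagues (Brockwell et al., 2004) and implemented particle filtering with 2500 particles at each time step. This yielded a velocity estimate at each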



time step. To obtain a single decoded position trajectory, the means of these velocity estimates were integrated over time. Because the arm state does not include

positional variables in this model, we assumed the actual initial arm position was

known. Thus, the decoder based on the random walk trajectory model was given a slight advantage over the other decoders.
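The statement earlier in this section that the random walk is a special case of the linear-Gaussian model can be checked with a short sketch. One way to see it, assumed here for illustration, is to stack two consecutive velocities into a single state so that the second-order recursion (2.14) becomes first-order as in (2.3). The function name and the stacking order are hypothetical choices.

```python
import numpy as np

def random_walk_as_linear_gaussian(Q):
    """Return (A, b, Q_aug) so that z_t = [v_t; v_{t-1}] follows (2.3).

    (2.14) rearranges to v_t = 2 v_{t-1} - v_{t-2} + eps_t, which gives the
    first block row of A; the second block row copies v_{t-1} forward.
    """
    p = Q.shape[0]
    I, Z = np.eye(p), np.zeros((p, p))
    A = np.block([[2 * I, -I], [I, Z]])
    b = np.zeros(2 * p)
    Q_aug = np.block([[Q, Z], [Z, Z]])   # noise enters only the v_t block
    return A, b, Q_aug

A_rw, b_rw, Q_rw = random_walk_as_linear_gaussian(0.1 * np.eye(2))
```

Under this stacking, the initial-state distribution (2.15) plays the role of (2.4). The augmented noise covariance is singular, so a sampler would draw the noise for the first block only.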

2.3 Goal-directed reach task and neural recordings

Animal protocols were approved by the Stanford University Institutional Animal Care and Use Committee. We trained two adult male monkeys (Macaca mulatta, monkeys G and H) to perform delayed center-out reaches for juice rewards. As illustrated in Figure 2.3A, visual targets were back-projected onto a fronto-parallel

screen 30 cm in front of the monkey. The monkey touched a central target and

fixated his eyes on a crosshair at the upper right corner of the central target. After a center hold period of 500 (monkey G) or 400-600 ms (monkey H, selected

randomly and uniformly in this range), a pseudo-randomly chosen reach goal was

presented at one of eight possible radial locations (30, 70, 110, 150, 190, 230, 310, 350°) 10 cm away. After a pseudo-randomly chosen instructed delay period of 200, 750, or 1000 ms, the GO cue (signaled by both the enlargement of the reach goal and the disappearance of the central target) was given and the monkey reached to the goal. After a hold time of 250 (monkey G) or 200 ms (monkey H) at the reach goal, the monkey received a liquid reward. The next trial started 200 (monkey G) or 100 ms (monkey H) later.

Eye fixation at the crosshair was enforced throughout the delay period. Reaction times were enforced to be greater than 80 ms and less than 600 (monkey G) or 400 ms (monkey H). The following are the statistics for the actual reaction times (mean±SD in ms): 237±23 for monkey G and 248±22 for monkey H. Occasional trials with a 200 ms delay period were inserted to encourage the monkey to “plan” throughout the delay period and were not used in subsequent analyses.



During experiments, monkeys sat in a custom chair (Crist Instruments, Hagerstown, MD) with the head braced and the arm ipsilateral to the neural recording apparatus restrained. The presentation of the visual targets was controlled using the Tempo software package (Reflective Computing, St. Louis, MO). A custom photo-detector recorded the timing of the video frames with 1 ms resolution. The

position of the hand was measured in three dimensions using the Polaris optical tracking system (Northern Digital, Waterloo, Ontario, Canada; 60 Hz, 0.35 mm

accuracy), whereby a passive marker taped to the monkey’s fingertip reflected infrared light back to the position sensor. Eye position was tracked using an overhead infrared camera (Iscan, Burlington, MA; 240 Hz, estimated accuracy of 1°).

A 96-channel silicon electrode array (Cyberkinetics, Foxborough, MA) was implanted straddling dorsal premotor (PMd) and motor (M1) cortex, as estimated

visually from local landmarks (Figure 2.2; monkey G, right hemisphere; monkey H, left hemisphere). Surgical procedures have been described previously (Churchland et al., 2006b; Santhanam et al., 2006). Spike sorting was performed offline using techniques described in detail elsewhere (Sahani, 1999; Santhanam et al., 2004; Zumsteg et al., 2005). Briefly, neural signals were monitored on each channel during a two-minute period at the start of each recording session while the monkey

performed the behavioral task. Data were high-pass filtered, and a threshold level

of three times the RMS voltage was established for each channel. The portions

of the signals that did not exceed threshold were used to characterize the noise

on each channel. During experiments, snippets of the voltage waveform containing threshold crossings (0.3 ms pre-crossing to 1.3 ms post-crossing) were saved with 30 kHz sampling. After each experiment, the snippets were clustered as follows. First, they were noise-whitened using the noise estimate made at the start of the experiment. Second, the snippets were trough-aligned and projected into a four-dimensional space using a modified principal components analysis. Next, unsupervised techniques determined the optimal number and locations of the clusters in the principal components space. We then visually inspected each cluster, along

with the distribution of waveforms assigned to it, and assigned a score based on how well-separated it was from the other clusters. This score determined whether




Figure 2.2: Placement of electrode arrays in PMd of monkeys G and H. For both monkeys, the arrays were placed in a location that spans dorsal premotor and primary motor cortices. Intraoperative photographs of the array implanted in cerebral cortex are shown with sulci indicated. Overlapping diagram shows the relative array placement between monkeys. Monkey H’s sulcal pattern is reflected vertically and rotated to bring the sulci into alignment with those of monkey G. Ce.S.: central sulcus; S.Pc.D.: superior precentral dimple; Sp.A.S.: spur of the arcuate sulcus; A.S.: arcuate sulcus.

a cluster was labeled a single-neuron unit or a multi-neuron unit.
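The threshold-crossing step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the spike-sorting code used for these recordings: the function names are hypothetical, the crossing is assumed to be negative-going (the text states only that the threshold was three times the RMS voltage), and the toy signal is invented. At 30 kHz, the 0.3 ms and 1.3 ms windows correspond to 9 and 39 samples.

```python
import math

FS_HZ = 30_000                      # sampling rate stated in the text
PRE = round(0.3e-3 * FS_HZ)         # 9 samples before the crossing
POST = round(1.3e-3 * FS_HZ)        # 39 samples after the crossing


def rms(signal):
    """Root-mean-square value of a sequence of voltages."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))


def extract_snippets(signal, threshold):
    """Return (crossing_index, snippet) pairs for negative-going crossings.

    Assumption: a crossing is the first sample at or below -threshold that
    follows a sample above it; crossings too close to either end are skipped.
    """
    snippets = []
    for i in range(1, len(signal)):
        crossed = signal[i - 1] > -threshold >= signal[i]
        if crossed and i - PRE >= 0 and i + POST <= len(signal):
            snippets.append((i, signal[i - PRE:i + POST]))
    return snippets


# Toy signal: low-level alternating noise with one deflection at sample 50.
toy = [0.1 if k % 2 else -0.1 for k in range(200)]
toy[50] = -5.0
threshold = 3 * rms(toy)
found = extract_snippets(toy, threshold)
```

Noise whitening, trough alignment, and clustering would operate on the saved snippets and are not shown.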

Figure 2.3A shows neural and behavioral data for a single trial with a lower-right reach goal. The top row shows the delayed reach task timeline, where the time between reach goal onset and the GO cue is the instructed delay period, and the time between the GO cue and movement onset is the reaction time. The multiple, simultaneously-recorded spike trains are shown in the middle. The corresponding arm and eye position traces are plotted at the bottom. Figure 2.3B illustrates the spatial arrangement of the eight reach goals, as well as the corresponding spike histograms for one representative unit across the eight reach goals. Each spike



[Figure 2.3 image. A: timeline labels: Center hold, Reach goal onset, GO cue, Move onset; traces: Arm, Eye; scale bar: 200 ms. B: bar labels: Delay Activity, Peri-Movement Activity]

Figure 2.3: Delayed reach task and neural recordings. A: Task timeline (top), simultaneously-recorded spike trains (middle), and arm and eye position traces (bottom) are shown for a single trial. Blue and red lines correspond to horizontal and vertical position, respectively. The full range of scale for the arm and eye position is ±15 cm from the center target. Trial from experiment H20041106.1. B: Spatial arrangement of the eight reach goals and corresponding spike histograms for one representative unit (Unit H20041217.23.0). Bars below histograms indicate delay (hatched) and peri-movement (gray) activity. Dotted lines denote reach goal onset and movement onset.



histogram was obtained by averaging the spike trains across multiple trials to the

same reach goal. The hatched and gray bars below each spike histogram indicate

delay and peri-movement activity, respectively. In broad terms, delay activity occurs during the delay period (before movement onset), while peri-movement activity occurs around the time of the reach. The precise windows of delay and peri-movement used are defined in later sections.

The monkeys were trained over several months and multiple data sets of the same behavioral task were collected. Each data set was collected in one day’s

recording session. For each monkey, we chose to analyze the data set with the

largest number of successful trials. The selected data sets comprised 1456 successful trials for monkey G (experiment G20040508) and 1072 successful trials for monkey H (experiment H20041217), not including trials with 200 ms delay periods. The data set for monkey G (H) included 30 (56) single-neuron units and 68 (143) multi-neuron units, for a total of 98 (199) units.

The results reported in this chapter are cross-validated by randomly splitting the entire data set by trials into J roughly equal-sized parts. For each $j \in \{1, \ldots, J\}$, the jth part served as test data for a model trained on the other $J - 1$ parts. We used J = 9 (11) for the data set for monkey G (H). To evaluate decoder performance at different numbers of neural units, we further randomly split each part by units into K equal subparts. Each subpart contained the same number of trials and identical behavioral data as its parent, but with only 1/K of the neural data. To meaningfully compare the two data sets, we roughly equalized the number of units in each subpart. Unless otherwise specified, the results presented here assume K = 1 (98 units) for monkey G and K = 2 (99 units) for monkey H.
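The splitting scheme can be sketched as below. This is an illustrative sketch only: the helper names, the fixed random seed, and the round-robin way of dealing shuffled items into parts are assumptions, and the text does not state whether one unit split was shared across all parts (the sketch draws a single split).

```python
import random


def split_into_parts(items, n_parts, rng):
    """Shuffle items and deal them into n_parts roughly equal-sized lists."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    return [shuffled[p::n_parts] for p in range(n_parts)]


def cross_validation_folds(trial_ids, n_folds, rng):
    """Yield (train_ids, test_ids); each part is the test set exactly once."""
    parts = split_into_parts(trial_ids, n_folds, rng)
    for j, test_ids in enumerate(parts):
        train_ids = [t for k, part in enumerate(parts) if k != j for t in part]
        yield train_ids, test_ids


rng = random.Random(0)                                      # fixed seed (assumption)
folds = list(cross_validation_folds(range(1456), 9, rng))   # monkey G: J = 9
unit_subparts = split_into_parts(range(199), 2, rng)        # monkey H: K = 2
```

With 199 units and K = 2, the subparts hold 100 and 99 units, which is consistent with the 99 units quoted for monkey H.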



2.4 Model fitting

2.4.1 Trajectory model

We considered three trajectory models: a random walk model (RWM, (2.14) and (2.15)) in acceleration, a single linear-Gaussian trajectory model (STM, (2.3) and (2.4) for the special case of M = 1), and a mixture of linear-Gaussian trajectory models (MTM, (2.3) and (2.4)). Arm position data were taken from 50 ms before movement onset to the end of the trial. The data were then padded with the final arm position up to 1000 ms after movement end to emphasize the importance of stopping at the reach goal. In effect, this penalized trajectory models whose

trajectories simply pass through, rather than come to rest at, the reach goals.

Each of the trajectory models was fitted to the padded arm data with a time step

of dt = 10 ms. Although arm position was tracked in three dimensions, we only

analyzed movement in the plane of the fronto-parallel screen as there was relatively little movement perpendicular to the screen.

For the STM and MTM, the following physical quantities were included in the arm state vector $x_t$: position, velocity, acceleration, position magnitude, and velocity magnitude. As shown in (2.17), this eight-dimensional state vector included two dimensions each for position, velocity, and acceleration; and one dimension each for position magnitude and velocity magnitude. The magnitude terms had very little effect on the state dynamics, but were critical for fitting the observation model (2.5) to the neural data, as described below.
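As an illustration of the padding step and the eight-dimensional state vector, consider the sketch below. The function names and the toy reach are hypothetical, and the use of forward differences for velocity and acceleration is an assumption (it happens to agree with the position and velocity rows of (2.17), where the next position equals the current position plus dt times the current velocity).

```python
import math

DT_MS = 10  # time step used for the trajectory models


def pad_with_final_position(positions, n_pad_steps):
    """Append copies of the last (x, y) sample so the arm rests at the goal."""
    return list(positions) + [positions[-1]] * n_pad_steps


def arm_states(positions, dt=DT_MS):
    """Build 8-dimensional states from (x, y) positions sampled every dt ms.

    Order: horz/vert position, horz/vert velocity, horz/vert acceleration,
    position magnitude, velocity magnitude. Forward differences are used
    (an assumption), so the last two samples do not yield a state.
    """
    vel = [((x1 - x0) / dt, (y1 - y0) / dt)
           for (x0, y0), (x1, y1) in zip(positions, positions[1:])]
    acc = [((u1 - u0) / dt, (v1 - v0) / dt)
           for (u0, v0), (u1, v1) in zip(vel, vel[1:])]
    states = []
    for (px, py), (vx, vy), (ax, ay) in zip(positions, vel, acc):
        states.append([px, py, vx, vy, ax, ay,
                       math.hypot(px, py), math.hypot(vx, vy)])
    return states


# Toy reach: move right 30 mm per step for 3 steps, then hold for 100 steps
# (1000 ms of padding at dt = 10 ms).
reach = [(0.0, 0.0), (30.0, 0.0), (60.0, 0.0), (90.0, 0.0)]
padded = pad_with_final_position(reach, 1000 // DT_MS)
states = arm_states(padded)
```

The padded samples contribute many states with zero velocity at the final position, which is how the padding rewards models that come to rest at the goal.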

The parameters of all three trajectory models were fit using least squares. For the RWM, the fitted parameters were $\{Q, \pi, V\}$, where $Q$ was constrained to be diagonal (Brockwell et al., 2004). For the STM and MTM, the fitted parameters were $\{A_m, b_m, Q_m, \pi_m, V_m\}$ (STM: $m = 1$; MTM: $m \in \{1, \ldots, 8\}$). For the STM, a single linear-Gaussian trajectory model was shared across all goal locations. The STM is similar to the trajectory model used by Donoghue and colleagues (Wu et al., 2004, 2006), where it was applied to pursuit-tracking and “pinball” tasks. In contrast, for the MTM, a separate linear-Gaussian trajectory model was trained for each reach goal, based only on reaches to that goal.
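To make the least-squares fit concrete, the sketch below fits a scalar analogue of a linear-Gaussian dynamics model, x[t+1] = a x[t] + b + w[t]. It is a simplification under stated assumptions: the actual models are eight-dimensional with matrix $A_m$ and vector $b_m$, the function names are hypothetical, and the toy data are noiseless. For the MTM, the same fit would be repeated once per reach goal using only the reaches to that goal.

```python
def fit_scalar_dynamics(sequences):
    """Least-squares fit of x[t+1] = a * x[t] + b + w[t], w ~ N(0, q).

    Pools all consecutive pairs across sequences and returns (a, b, q),
    where q is the mean squared residual.
    """
    pairs = [(x0, x1) for seq in sequences for x0, x1 in zip(seq, seq[1:])]
    n = len(pairs)
    mean_x0 = sum(x0 for x0, _ in pairs) / n
    mean_x1 = sum(x1 for _, x1 in pairs) / n
    cov = sum((x0 - mean_x0) * (x1 - mean_x1) for x0, x1 in pairs)
    var = sum((x0 - mean_x0) ** 2 for x0, _ in pairs)
    a = cov / var
    b = mean_x1 - a * mean_x0
    q = sum((x1 - a * x0 - b) ** 2 for x0, x1 in pairs) / n
    return a, b, q


def simulate(a, b, x0, n_steps):
    """Noiseless rollout, used here only to make toy training data."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(a * xs[-1] + b)
    return xs


# Two noiseless toy "reaches" generated with a = 0.9 and b = 10.
training = [simulate(0.9, 10.0, 0.0, 50), simulate(0.9, 10.0, 5.0, 50)]
a_hat, b_hat, q_hat = fit_scalar_dynamics(training)
```

Because the toy data contain no noise, the fit recovers a and b up to rounding error and the residual variance is essentially zero.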



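For the STM and MTM, the fitted transition matrices $A_m$ and additive constants $b_m$ took on the form shown in (2.17), where $\ast$ denotes a non-zero entry. The elements of the state vector $x_t$ are included for visual reference.

$$
A_m = \begin{bmatrix}
1 & 0 & dt & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & dt & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & dt & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & dt & 0 & 0 \\
\ast & \ast & \ast & \ast & \ast & \ast & \ast & \ast \\
\ast & \ast & \ast & \ast & \ast & \ast & \ast & \ast \\
\ast & \ast & \ast & \ast & \ast & \ast & \ast & \ast \\
\ast & \ast & \ast & \ast & \ast & \ast & \ast & \ast
\end{bmatrix}, \quad
b_m = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ \ast \\ \ast \\ \ast \\ \ast \end{bmatrix}, \quad
x_t = \begin{bmatrix}
\text{horz pos} \\ \text{vert pos} \\ \text{horz vel} \\ \text{vert vel} \\
\text{horz acc} \\ \text{vert acc} \\ \text{mag pos} \\ \text{mag vel}
\end{bmatrix} \qquad (2.17)
$$

While not explicitly constrained as such in the fitting procedure, the fitted $A_m$ and $b_m$ took on this form due to the physical relationships of the state vector elements.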

The upper panel of Figure 2.4A shows position trajectories to each reach goal

collected empirically. The corresponding speed profiles are plotted in the lower

panel. Three properties of goal-directed reaches are seen in Figure 2.4A. First, the

trajectories lead to discrete reach goals rather than taking on arbitrary paths in the workspace. Second, multiple reaches to the same goal are not all identical. In particular, there is variability in the position traces and speed profiles. Third, the trajectories start at rest, proceed out to the reach goal, and end at rest. The degree to which the trajectory model captures the dynamics of the empirically-collected reaches directly affects the accuracy with which new trajectories can be decoded

from neural data. We therefore seek a trajectory model that can capture these

three properties of goal-directed reaches.

We can qualitatively assess the fitness of different trajectory models by generating sample trajectories from the fitted models and comparing them with the empirically-collected trajectories. We will quantitatively compare decoders based on the different trajectory models in later sections. Generative trajectories of the fitted RWM, STM, and MTM are shown in Figure 2.4B-D. Note that these are



[Figure 2.4 image. Panel titles: Actual, RWM, STM, MTM; horizontal axis: Horz pos (mm)]

Figure 2.4: Position trajectories (upper row) and speed profiles (lower row) A: collected empirically, B: generated by the RWM, C: generated by the STM, and D: generated by the MTM. Only 24 reaches (three to each reach goal) are shown in each column for clarity.

sample trajectories of the trajectory models and are not decoded trajectories; generating these trajectories did not involve neural data. The RWM (2.14) provides smoothness in acceleration, where the degree of smoothness is determined by the random walk covariance $Q$. We generated sample velocity trajectories according to (2.14) and (2.15) using a $Q$ fitted to training data, then integrated the velocities over time to obtain sample position trajectories (Figure 2.4B). On the other hand, the STM favors certain characteristic trajectory patterns in arm state space. One characteristic pattern that looks similar to Figure 2.4A has trajectories emanating radially outward from the origin (ignoring the non-position terms in the arm state vector for now). Such trajectories (not shown) extend outward indefinitely

and cannot stop at the reach goals. To minimize the average mismatch between the trajectory model and the empirically-collected trajectories (Figure 2.4A) over the entire duration of the padded arm data, the STM fitted to the training data



has sample position trajectories (Figure 2.4C) that proceed outwards very slowly. Other features seen in the sample trajectories in Figure 2.4C can be explained by the presence of non-position terms in the arm state vector and the noise covariance $Q_m$ in (2.3) and (2.4).

While the sample trajectories of the RWM and STM each reflect some aspects of arm dynamics, they are not flexible enough to capture the goal-directed nature of the actual reaches. The corresponding speed profiles also do not match those of the actual reaches very well. In contrast, as shown in Figure 2.4D, the sample trajectories of the MTM exhibit the three properties of goal-directed reaches; namely, the trajectories are directed toward the eight discrete reach goals, there is variability among trajectories to the same goal, and the trajectories start and end roughly at rest. Furthermore, these sample trajectories are similar to the empirically-collected trajectories in Figure 2.4A in terms of their bell-shaped speed profiles and the across-trial variability seen in the position traces and speed profiles. In essence, compared to the RWM and STM, the MTM better captures the dynamics of goal-directed reaches.

The following is the intuition behind how a model as simple as a mixture of linear-Gaussian models can capture the essential properties of goal-directed reaches. For each m, the fitted transition matrix $A_m$ in (2.3) defines a convergent linear-Gaussian model. In other words, in the noiseless case, its sample trajectories converge to a point in state space. If $b_m = 0$, this equilibrium point is the origin of the state space. For a non-zero $b_m$, the equilibrium point (in particular, those elements corresponding to arm position) can be shifted away from the origin and, in this case, lie at the mth reach goal. Regardless of where the sample trajectories start, they are directed by the mth mixture component toward the mth reach goal, where they come to rest. These trajectories are further constrained by the fitted $\pi_m$ and $V_m$ in (2.4) to start near the center of the workspace with nearly zero velocity. Thus, one can imagine a point mass, initially at rest at the center of the workspace, that is released and directed towards the mth reach goal, where it comes to rest.
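The role of the additive constant can be illustrated with a scalar analogue. Assuming the noiseless dynamics take the form x[t+1] = a x[t] + b with |a| < 1 (an assumption about the form of (2.3), which is not reproduced in this section), the equilibrium point solves x* = a x* + b, giving x* = b / (1 - a); in the matrix case the analogous expression is $(I - A_m)^{-1} b_m$ whenever that inverse exists. The names and numbers below are hypothetical.

```python
def equilibrium(a, b):
    """Fixed point of the noiseless scalar recursion x[t+1] = a * x[t] + b."""
    if abs(a) >= 1:
        raise ValueError("recursion is not convergent unless |a| < 1")
    return b / (1 - a)


def rollout(a, b, x0, n_steps):
    """Iterate the noiseless recursion n_steps times starting from x0."""
    x = x0
    for _ in range(n_steps):
        x = a * x + b
    return x


a = 0.9                                   # hypothetical contraction factor
offsets = {"right": 10.0, "left": -10.0}  # hypothetical per-goal constants b_m
goals = {m: equilibrium(a, b) for m, b in offsets.items()}
ends = {m: rollout(a, b, x0=0.0, n_steps=300) for m, b in offsets.items()}
```

Both components start at the origin, yet each settles at its own equilibrium (about +100 and -100 here), mirroring how each mixture component directs trajectories to its own reach goal.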

The trajectory model can be viewed, in the space of all possible trajectories,



as a specification of which trajectories are more likely than others and by how much. This information is encoded in the parametric form of the trajectory model (e.g., random walk or linear-Gaussian), as well as in the fitted values of the model

parameters. For the trajectory models considered in this chapter, there is a nonzero probability of generating any arbitrary trajectory in Figure 2.4B-D. However,

for the MTM fitted to the training data shown in Figure 2.4A, trajectories that

do not head towards one of the eight reach goals or those that do not have a bell-shaped speed profile are far less likely than those that do. While it is technically

possible to generate a trajectory in Figure 2.4D that looks very different from those shown, the chances are negligibly small. Thus, the MTM can be viewed as imposing a soft constraint on what trajectories are possible; how steeply the soft

constraint drops off depends on how tightly the training trajectories are clustered in Figure 2.4A.

2.4.2 Observation model

For each observation model ((2.5) and (2.16)), we sought the optimal lag for each unit and the parameters $\{c_i, d_i\}$, where i indexes unit. The optimal lag refers to the temporal relationship between the activity of a neural unit and the arm trajectory (Moran and Schwartz, 1999). For example, if a unit is causally related to motor execution, the unit’s firing would be expected to lead the arm movement in time. Wu and colleagues (Wu et al., 2006) used a greedy algorithm to find the set of lags that minimized the uncertainty of the position estimates. In contrast, Brockwell and colleagues (Brockwell et al., 2004) chose the best-fitting lag for each unit by comparing model deviances. The optimal lag could be found for each unit separately because the units were modeled to be independent given the arm state (cf. (2.16)).

Here, we obtain the optimal lags using Bayesian model selection (MacKay,

2003). Each choice of lag represents a different model class. The following is an overview of the approach, which will be formalized in the remainder of this section. For a given unit, we first determine the most likely lag given the training data.



This requires averaging over all possible settings of the parameters $\{c_i, d_i\}$. As in (2.16), the units in (2.5) are modeled to be independent given the arm state. Thus, the lag for each unit can be optimized separately. Then, we find the most likely parameter settings, given both the training data and the optimal lag just found.

We considered a fixed window of neural activity starting 200 ms before movement onset ($t_1$) and ending 150 ms after movement end ($t_2$). This was aligned to segments of arm trajectory data of the same duration, but offset by 26 possible lags ranging from 150 ms to -100 ms at 10 ms intervals. The convention here is that positive lags are causal (neural activity leads arm movement), while negative lags are acausal. Acausal lags in the context of prosthetic applications will be addressed in Sections 2.6 and 2.7.

For both observation models ((2.5) and (2.16)), we performed the following fitting procedure for each of the q units. For notational simplicity, the unit index i is omitted. Let $\{x\}$ denote $x_t$ at all times, $\{s\}_{t_1}^{t_2}$ denote the spike counts from time $t_1$ to $t_2$, and $\theta = \{c, d\}$. Spike counts were taken in $\Delta = 20$ ms bins. We first computed a posterior distribution on the lag at the 26 possible values

$$
P(\text{lag} \mid \{s\}_{t_1}^{t_2}, \{x\}) \propto P(\{s\}_{t_1}^{t_2} \mid \text{lag}, \{x\})\, P(\text{lag} \mid \{x\}) \qquad (2.18)
$$

$$
\propto \int P(\{s\}_{t_1}^{t_2} \mid \theta, \text{lag}, \{x\})\, P(\theta \mid \text{lag}, \{x\})\, d\theta. \qquad (2.19)
$$

(2.18) was obtained using Bayes’ rule. The first term on the right side of (2.18) was expanded by conditioning on the parameters $\theta$, yielding (2.19). The prior distribution on the lag, $P(\text{lag} \mid \{x\})$, was assumed to be uniform and independent of $\{x\}$.

We took the Laplace approximation (MacKay, 2003) of the integral in (2.19). The prior distribution on the parameters, $P(\theta \mid \text{lag}, \{x\})$, was set to be $\mathcal{N}(0, 10000)$ for each scalar element and independent of the lag and $\{x\}$. For both observation models ((2.5) and (2.16)), the log integrand was strictly concave in $\theta$ and a global optimum could be found. The lag with the highest posterior probability,



[Figure 2.5 image. Legend: without magnitude terms, with magnitude terms; scale bar: 200 ms]

Figure 2.5: Non-directional firing rate modulations can be captured by including magnitude terms (in this case, position and velocity magnitudes) as explanatory variables. The empirical firing rate histograms (gray) are compared to firing rate profiles predicted by firing rate models with (blue) and without (red) magnitude terms. Vertical arrows denote movement onset. Each panel corresponds to one reach goal. (Unit G20040508.38.1)

$P(\text{lag} \mid \{s\}_{t_1}^{t_2}, \{x\})$, and its corresponding modal $\theta$ obtained from

$$
P(\theta \mid \{s\}_{t_1}^{t_2}, \text{lag}, \{x\}) \propto P(\{s\}_{t_1}^{t_2} \mid \theta, \text{lag}, \{x\})\, P(\theta \mid \text{lag}, \{x\}) \qquad (2.20)
$$

were then used in (2.5) and (2.16). This optimal lag should be interpreted as the best-fitting temporal alignment between the neural activity and arm trajectories for the particular parametric observation model used. In general, different observation models yield different optimal lags. Thus, the optimal lag is model-dependent and only roughly reflects how the unit is temporally related to motor execution.
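The lag-selection procedure can be sketched for a deliberately reduced model. The assumptions are substantial and should be read as such: a single scalar covariate stands in for the arm state, the count in each bin is Poisson with rate exp(c x + d) (a stand-in for the exponential firing rate models referred to in the text, with the bin width absorbed into d), the mode is found by damped Newton ascent, and all names and toy data are hypothetical. With two parameters, the Laplace approximation to the log of the integral is the log joint at the mode plus log(2π) minus half the log determinant of the negative Hessian.

```python
import math

PRIOR_VAR = 10000.0   # variance of the zero-mean Gaussian prior on c and d


def log_joint(c, d, xs, counts):
    """log P(counts | c, d, xs) + log P(c, d) under a Poisson count model."""
    log_lik = sum(s * (c * x + d) - math.exp(c * x + d) - math.lgamma(s + 1)
                  for x, s in zip(xs, counts))
    log_prior = (-(c * c + d * d) / (2 * PRIOR_VAR)
                 - math.log(2 * math.pi * PRIOR_VAR))
    return log_lik + log_prior


def neg_hessian(c, d, xs):
    """Entries (h_cc, h_cd, h_dd) of minus the Hessian of log_joint."""
    lam = [math.exp(c * x + d) for x in xs]
    h_cc = sum(l * x * x for l, x in zip(lam, xs)) + 1 / PRIOR_VAR
    h_cd = sum(l * x for l, x in zip(lam, xs))
    h_dd = sum(lam) + 1 / PRIOR_VAR
    return h_cc, h_cd, h_dd


def laplace_log_evidence(xs, counts, max_iter=100):
    """Find the mode by damped Newton ascent, then apply Laplace's formula."""
    c, d = 0.0, math.log(sum(counts) / len(counts))
    for _ in range(max_iter):
        lam = [math.exp(c * x + d) for x in xs]
        g_c = sum((s - l) * x for s, l, x in zip(counts, lam, xs)) - c / PRIOR_VAR
        g_d = sum(s - l for s, l in zip(counts, lam)) - d / PRIOR_VAR
        h_cc, h_cd, h_dd = neg_hessian(c, d, xs)
        det = h_cc * h_dd - h_cd * h_cd
        step_c = (h_dd * g_c - h_cd * g_d) / det      # Newton step H^{-1} g
        step_d = (h_cc * g_d - h_cd * g_c) / det
        if abs(step_c) + abs(step_d) < 1e-10:
            break
        scale, current = 1.0, log_joint(c, d, xs, counts)
        while scale > 1e-6:
            trial = log_joint(c + scale * step_c, d + scale * step_d, xs, counts)
            if trial >= current:
                break
            scale /= 2                                # halve an overshooting step
        c, d = c + scale * step_c, d + scale * step_d
    h_cc, h_cd, h_dd = neg_hessian(c, d, xs)
    log_det = math.log(h_cc * h_dd - h_cd * h_cd)
    return log_joint(c, d, xs, counts) + math.log(2 * math.pi) - 0.5 * log_det


def lag_posterior(covariate, counts, start, lags):
    """Posterior over lags (in time steps) under a uniform prior on the lag."""
    n = len(counts)
    log_ev = [laplace_log_evidence(covariate[start + lag:start + lag + n], counts)
              for lag in lags]
    top = max(log_ev)
    weights = [math.exp(v - top) for v in log_ev]
    total = sum(weights)
    return [w / total for w in weights]


# Toy data: a bell-shaped covariate, and counts that lead it by 5 steps.
speed = [math.exp(-0.5 * ((t - 40) / 3) ** 2) for t in range(100)]
start, n_bins, true_lag = 15, 70, 5
counts = [round(math.exp(1.0 + 2.5 * speed[t + true_lag]))
          for t in range(start, start + n_bins)]
lags = list(range(15, -11, -1))     # 150 ms to -100 ms in 10 ms steps
posterior = lag_posterior(speed, counts, start, lags)
best_lag = lags[posterior.index(max(posterior))]
```

Because the log joint is concave in (c, d), the Newton direction is always an ascent direction, and the step halving only guards against overshooting far from the mode.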

We included magnitude terms in the arm state vector for the STM and MTM for the same reason that the velocity magnitude (i.e., the arm speed) appears in (2.16) for the RWM; namely, to allow the associated firing rate models to capture non-directional firing rate modulations. The firing rate models are the exponentials in (2.5) and (2.16), where each dimension of $x_t$ in (2.5) and $v_t$ in (2.16) is an explanatory variable. The importance of including arm speed as an explanatory variable for firing rate modulations was first recognized by Schwartz and colleagues (Schwartz, 1992). While the focus here is not to compare different parametric firing rate models, we demonstrate this point by comparing the firing rate profiles that result from including and excluding the magnitude terms in the arm state vector $x_t$ in (2.5) for one illustrative unit (Figure 2.5). Using the methods described above, we found the optimal lag and fitted $\{c_i, d_i\}$ based on the training data. Then, actual arm trajectories (from test data) were mapped to mean firing rates using the firing rate model in (2.5). These predicted mean firing rates were aligned on movement onset and averaged across test trials. Figure 2.5 shows the resulting firing rate profiles for this unit when including (blue) and excluding (red) the magnitude terms in the firing rate model. These firing rate profiles can be compared to the empirical firing rate histograms (gray) for the same test trials. In this case, the magnitude terms allowed firing rate peaks to be present in all reach directions, considerably improving the model fit for the lower reach goals. Non-directional firing rate modulations, like those shown in Figure 2.5, were common across the population of units recorded in both monkeys and were better captured by including magnitude terms as explanatory variables.

2.5 Incorporating goal information from delay activity

Up to this point, the neural activity discussed has been peri-movement activity, which takes place around the time of movement and specifies the moment-by-moment details of the arm trajectory. In the delayed reach task, there is also neural activity present during an instructed delay period that directly precedes the GO cue (termed delay activity). Rather than specifying the moment-by-moment details of the trajectory, delay activity has been shown to reliably indicate the upcoming reach goal (Shenoy et al., 2003; Hatsopoulos et al., 2004; Yu et al., 2004; Musallam et al., 2004; Santhanam et al., 2006). The data sets for both monkeys G and H contain both delay and peri-movement activity on each trial. Furthermore, both types of activity may be emitted by the same unit on a single trial, as can be seen in Figure 2.3.

The following describes how the reach goal can be decoded from delay activity by applying Bayes’ rule. Let $z$ be a $q \times 1$ vector of spike counts across the q simultaneously-recorded units in a pre-specified time window during the delay period on a single trial. The distribution of spike counts (from training data) for each reach goal m can be fit to either a product of Gaussians (Maynard et al., 1999; Yu et al., 2004)

$$
z \mid m \sim \prod_{i=1}^{q} \mathcal{N}(z_i \mid \mu_{i,m}, \sigma_{i,m}^2) \qquad (2.21)
$$

or a product of Poissons (Shenoy et al., 2003; Hatsopoulos et al., 2004)

$$
z \mid m \sim \prod_{i=1}^{q} \text{Poisson}(z_i \mid \lambda_{i,m}), \qquad (2.22)
$$

where $\mu_{i,m}$, $\sigma_{i,m}^2$, and $\lambda_{i,m}$ are the parameters of the ith unit for the mth reach goal. The $z_i$ notation in (2.21) and (2.22) specifies that the distribution is describing the ith element of the vector $z$. In both models, the units are assumed to be independent given the reach goal. It would be natural to introduce conditional dependencies between the units using a general multivariate Gaussian, but there are often difficulties in estimating an invertible covariance matrix for tens to hundreds of units with a limited number of training trials (Maynard et al., 1999).

For any test trial, the probability that the upcoming reach goal is m given the delay activity $z$ can be computed by applying Bayes’ rule

$$
P(m \mid z) = \frac{P(z \mid m)\, P(m)}{P(z)} = \frac{P(z \mid m)}{\sum_{m'} P(z \mid m')}, \qquad (2.23)
$$

where $P(m)$ in (2.23) is assumed to be uniform. The most likely reach goal (i.e., the one with the largest $P(m \mid z)$) is usually taken to be the decoded reach goal (Shenoy et al., 2003; Hatsopoulos et al., 2004; Yu et al., 2004; Musallam et al., 2004; Santhanam et al., 2006), as will be the case in Chapter 3.
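A goal decoder of this kind, using the product-of-Gaussians model and a uniform prior, might be sketched as follows. The function names, the variance floor, and the toy spike counts are assumptions made for illustration; they are not taken from the text.

```python
import math


def fit_gaussian_goal_model(train_counts):
    """train_counts[m] is a list of trials; each trial is a list of q counts.

    Returns {m: [(mean_i, var_i) for each unit i]}. A small variance floor
    (an assumption) avoids division by zero for units with constant counts.
    """
    model = {}
    for m, trials in train_counts.items():
        params = []
        for unit_counts in zip(*trials):
            mean = sum(unit_counts) / len(unit_counts)
            var = sum((z - mean) ** 2 for z in unit_counts) / len(unit_counts)
            params.append((mean, max(var, 1e-3)))
        model[m] = params
    return model


def goal_posterior(z, model):
    """P(m | z) under a uniform prior, with independent Gaussians per unit."""
    log_lik = {
        m: sum(-0.5 * math.log(2 * math.pi * var) - (zi - mean) ** 2 / (2 * var)
               for zi, (mean, var) in zip(z, params))
        for m, params in model.items()
    }
    top = max(log_lik.values())                 # subtract max for stability
    unnorm = {m: math.exp(v - top) for m, v in log_lik.items()}
    total = sum(unnorm.values())
    return {m: u / total for m, u in unnorm.items()}


# Toy training data: two goals, two units, three trials per goal.
train = {
    "goal_1": [[10, 2], [12, 3], [11, 1]],
    "goal_2": [[3, 9], [2, 11], [4, 10]],
}
model = fit_gaussian_goal_model(train)
posterior = goal_posterior([11, 2], model)
decoded = max(posterior, key=posterior.get)
```

Working with log likelihoods and subtracting the maximum before exponentiating keeps the normalization stable when many units are multiplied together.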

The accuracy of the goal decoder (2.23) varies with the duration and placement of the time window in which spikes are counted, as well as the precise spike count model $P(z \mid m)$ that is used (Hatsopoulos et al., 2004; Santhanam et al., 2006). Optimizing the goal decoder is treated in detail in Chapter 3 and the aforementioned references. We focus here on how to incorporate this goal information, if available, when decoding continuous arm trajectories. For this purpose, we choose to use the Gaussian model (2.21) with a 200 ms spike count window starting 150 ms after the appearance of the reach goal.

The goal information from delay activity, $P(m \mid z)$, can be incorporated naturally in the MTM framework in the place of $P(m)$ in (2.2). The distribution $P(m)$ in (2.2) represents the prior knowledge (i.e., prior to movement onset) that the upcoming reach goal is m. Because the delay activity entirely precedes movement onset and provides information about the upcoming reach goal, it can be used to set $P(m)$ in (2.2) on a per-trial basis.

It is important to note that the most likely goal from (2.23) is not simply assumed here to be the goal of the upcoming reach. On a given trial, the delay activity may not definitively indicate the goal of the upcoming reach (e.g., two different reach goals may have significant probability), or it may indicate an incorrect goal for the upcoming reach. In this case, we would like to allow the subsequent peri-movement activity to determine the goal of the reach, or even correct the mistake, “in-flight”. Instead of making a hard goal decode based on delay activity, the entire distribution $P(m \mid z)$ is retained and passed to the MTM framework. For simplicity, we make the approximation that delay activity is only informative of the upcoming reach goal and is independent of the peri-movement activity; in other words, we assume that $z$ is not directly coupled with $x_t$ or $y_t$.
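The effect of replacing a uniform prior with the delay-period posterior can be sketched numerically. The snippet assumes that the mixture weights are proportional to the product of the peri-movement data likelihood for each goal and the prior on that goal, and that the decoded state is the weighted sum of per-goal estimates; the function names and all numbers are hypothetical.

```python
import math


def mixture_weights(log_lik_by_goal, prior_by_goal):
    """P(m | y) proportional to P(y | m) * P(m), computed in the log domain.

    Every prior probability must be strictly positive.
    """
    log_post = {m: log_lik_by_goal[m] + math.log(prior_by_goal[m])
                for m in log_lik_by_goal}
    top = max(log_post.values())
    unnorm = {m: math.exp(v - top) for m, v in log_post.items()}
    total = sum(unnorm.values())
    return {m: u / total for m, u in unnorm.items()}


def mixture_estimate(weights, component_estimates):
    """Weighted sum of per-goal state estimates (each a list of floats)."""
    dim = len(next(iter(component_estimates.values())))
    return [sum(weights[m] * component_estimates[m][k] for m in weights)
            for k in range(dim)]


# Hypothetical numbers for one time step with three candidate goals.
log_lik = {"A": -10.0, "B": -10.0, "C": -14.0}   # peri-movement evidence
uniform = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}   # no delay activity
from_delay = {"A": 0.8, "B": 0.1, "C": 0.1}      # P(m | z) from delay activity
estimates = {"A": [100.0, 0.0], "B": [70.0, 70.0], "C": [0.0, 100.0]}

w_uniform = mixture_weights(log_lik, uniform)
w_delay = mixture_weights(log_lik, from_delay)
x_uniform = mixture_estimate(w_uniform, estimates)
x_delay = mixture_estimate(w_delay, estimates)
```

In this toy case the peri-movement evidence cannot separate goals A and B, and the delay-period prior tips the weights toward A without discarding B outright, which is the soft behavior described above.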

2.6 Results

In this section, we first illustrate how arm trajectories were decoded under the MTM framework by analyzing two representative trials in detail. Then, we compare the decoding performance using different trajectory models. For all decoders, we first fit the model parameters to training data. The test data for a single trial consisted of (1) the arm trajectory, taken from 50 ms before movement onset to 50 ms after movement end at dt = 10 ms time steps, (2) the peri-movement spike counts, taken in overlapping $\Delta = 20$ ms bins and temporally offset from the arm trajectory by the optimal lag found for each unit, and (3) the delay period spike counts, taken in a single 200 ms bin starting 150 ms after the appearance of the reach goal. Arm trajectories in the test phase were used to evaluate the accuracy of the trajectories estimated from neural data. Because neural data collection ended shortly after movement end, the arm trajectories were not padded as in the training phase.

Figures 2.6 and 2.7 demonstrate how the MTM framework was used to decode arm trajectories for two particular test trials. In Figure 2.6, the upper-left panel compares the actual position trajectory (thick black) with those decoded using the STM (thick green) and MTM (thick red). For the purposes of this plot, only the state elements corresponding to arm position are shown. From (2.13), the MTM decoded trajectory is a weighted sum of component trajectory estimates $E[x_t \mid \{y\}_1^t, m]$, one for each reach goal indexed by $m \in \{1, \ldots, 8\}$. The three component trajectory estimates with the largest weights for this trial are plotted in the upper-left panel (cyan, blue, magenta).

The lower-left panel in Figure 2.6 shows the time evolution of the corresponding weights $P(m \mid \{y\}_1^t)$ during the course of the trial. The values of these weights at time zero (t = 0) represent the probability that the upcoming reach goal is m, before any peri-movement neural activity had been observed. The distribution of



[Figure 2.6 image. Column titles: Without delay activity, With delay activity; upper panels, horizontal axis: Horz pos (mm); lower panels, horizontal axis: Time (ms)]

Figure 2.6: A representative test trial in which the use of delay activity improved the MTM decoded trajectory. Upper panels: actual trajectory (thick black), STM decoded trajectory (thick green), MTM decoded trajectories without (left panel, thick red) and with (right panel, thick orange) delay activity, and three MTM component trajectory estimates with the largest weights (cyan, blue, magenta). Note that the actual trajectory, STM decoded trajectory, and MTM component trajectory estimates are identical in the two upper panels. Lower panels: the three corresponding MTM component weights as they evolve during the trial. Time zero corresponds to 60 ms before movement onset (i.e., one time step before we begin to decode movement). For this trial, $E_{rms}$ was 17.4, 7.8, and 7.4 mm for STM, MTM without delay activity, and MTM with delay activity, respectively. Monkey G, 98 units. (Experiment G20040508, trial ID 686)

weights at t = 0 is precisely $P(m)$ in (2.2). In the left panels, we assumed that there was no information available about the identity of the upcoming reach goal before the reach began (i.e., no delay activity), so all eight goals were equiprobable (i.e., $P(m) = 1/8$ for $m \in \{1, \ldots, 8\}$). As time proceeded, these weights were updated as more and more peri-movement activity was observed. Recall that $P(m \mid \{y\}_1^t)$ represents the probability that the actual reach goal is m, given the observed neural



activity up to time t. During the first 200 ms, the actual reach goal (cyan) was more likely than the other seven reach goals at nearly every time step; however, there was some competition with a neighboring reach goal (blue). It was only after approximately 200 ms that the decoder became certain of the actual reach goal (i.e., $P(m \mid \{y\}_1^t)$ approached one) and remained certain for the rest of the trial. For clarity, the other five weights are not plotted since they went to zero shortly after t = 0 and did not contribute subsequently to the weighted sum. The weighted sum of the eight component trajectory estimates (of which three are plotted in the upper-left panel) using the weights shown in the lower-left panel yields the MTM decoded trajectory (thick red, $E_{rms}$: 7.8 mm) in the upper-left panel.

If delay activity is available, it can be used to set a non-uniform $P(m)$ in (2.2) on a per-trial basis, as previously discussed. The only difference between the left and right columns in Figure 2.6 is that the MTM decoder used delay activity in the latter, but not the former. In the lower-right panel, the weights at t = 0 represent the probabilities of each reach goal based only on delay activity, before any peri-movement activity was observed. In this case, the delay activity indicated that the actual reach goal (cyan) was more probable than the other goals. This prior knowledge of the identity of the upcoming reach goal was then taken into account when updating the weights $P(m \mid \{y\}_1^t)$ during the course of the trial as more and more peri-movement activity was observed. Note that using delay activity only affected $P(m)$ in (2.2); the conditional state posteriors $P(x_t \mid \{y\}_1^t, m)$ and the data likelihoods $P(\{y\}_1^t \mid m)$ remained unchanged. As the means of the conditional state posteriors, the component trajectory estimates therefore also remained unchanged, as can be verified by comparing the cyan, blue, and magenta traces in the two upper panels.

For the trial shown in Figure 2.6, using delay activity reduced the competition between the actual reach goal (cyan) and the neighboring goal (blue). Compared to the lower-left panel, the weight for the actual reach goal (cyan) in the lower-right panel was higher at every timepoint, the clearest effect seen during the first 200 ms. In other words, by using delay activity, the decoder was more certain of the actual reach goal throughout the trial. In both lower panels, the decoders



[Figure 2.7 image. Column titles: Without delay activity, With delay activity; upper panels, horizontal axis: Horz pos (mm); lower panels, horizontal axis: Time (ms)]

Figure 2.7: A representative test trial in which the peri-movement activity corrected an incorrect goal identification from delay activity. Figure conventions identical to those in Figure 2.6. Note that the thick red and orange traces in the upper panels are overlaid with the cyan trace. For this trial, $E_{rms}$ was 16.7, 13.3, and 13.4 mm for STM, MTM without delay activity, and MTM with delay activity, respectively. Monkey G, 98 units. (Experiment G20040508, trial ID 676)

definitively identified the actual reach goal after approximately 200 ms. As before, the weighted sum of the eight component trajectory estimates (of which three are plotted in the upper-right panel) using the weights shown in the lower-right panel yields the MTM decoded trajectory (thick orange, $E_{rms}$: 7.4 mm) in the upper-right panel. By comparing the MTM decoded trajectories in the two upper panels with the actual trajectory (thick black), we see that the use of delay activity decreased the error for this trial.

In contrast to Figure 2.6, Figure 2.7 shows a trial where the peri-movement activity alone was able to quickly determine the actual reach goal without much competition from neighboring goals. This can be seen in the lower-left panel of Figure 2.7, where the weight corresponding to the actual reach goal (cyan) rose to



unity after approximately 80 ms and stayed there for the remainder of the trial. As a result, compared to the actual trajectory (thick black) in the upper-left panel, the resulting MTM decoded trajectory (thick red, $E_{rms}$: 13.3 mm) was quite accurate. As in Figure 2.6, we can incorporate delay activity if it is available; however, in this case, the dominant weight at t = 0 (blue) did not correspond to the actual reach goal (cyan). In other words, the delay activity incorrectly indicated the identity of the upcoming reach goal. However, as these weights were updated by the observation of peri-movement activity, this “error” was soon corrected (within approximately 80 ms). From that point on, the weight corresponding to the actual reach goal dominated. Despite this error at the beginning of the trial, the MTM decoded trajectory (thick orange, $E_{rms}$: 13.4 mm) in the upper-right panel remained nearly identical to that in the upper-left panel. The reason is that the error occurred early on in the trial, when all eight component trajectory estimates were still near the origin of the workspace; the weighted sum of these component estimates lies near the origin no matter how they are weighted.

Figures 2.6 and 2.7 together illustrate the benefits of the joint use of peri-movement and delay activity. When one type of activity is unable to definitively identify (or incorrectly identifies) the actual reach goal, the MTM framework allows the other type of activity to strengthen (or overturn) the goal identification in a probabilistic manner. In Figure 2.6, the peri-movement activity alone was unable to definitively identify the actual reach goal during the first 200 ms, as there was competition with a neighboring goal. When prior goal information from delay activity was incorporated, the decoder was more certain of the actual reach goal throughout the trial. In Figure 2.7, the delay activity incorrectly indicated the identity of the upcoming reach goal. However, the peri-movement activity overturned this incorrect goal identification early on and rescued the decoder from incurring a large $E_{rms}$ on this trial. In all four upper panels in Figures 2.6 and 2.7, it is worth noting that the MTM decoded trajectories (thick red and orange) were closer than the STM decoded trajectories (thick green) to the actual trajectories (thick black), whether or not delay activity was used.

Having demonstrated how the MTM framework produces trajectory estimates,



[Figure 2.8 appears here. Panels: A, monkey G; B, monkey H. Vertical axis: Erms (mm); horizontal axis: decoder (RWM, STM, MTMm, MTMdm).]

Figure 2.8: Erms (mean ± SE) comparison for decoders using the RWM, STM, MTM without delay activity (MTMm), and MTM with delay activity (MTMdm). A: monkey G (98 units), B: monkey H (99 units).

we can now quantify and compare the performance of decoders based on different trajectory models. Figure 2.8 compares the trial-averaged decoding performance using the RWM, STM, MTM without delay activity (labeled MTMm, since only peri-movement activity is used), and MTM with delay activity (labeled MTMdm, since both delay and peri-movement activity are used). For each monkey, the trend was the same: Erms decreased when going from RWM to STM, from STM to MTMm, and from MTMm to MTMdm (Wilcoxon paired-sample test, p < 0.01). The superior performance of the MTMm compared to the RWM and STM can be attributed to the fact that the MTM better captures the dynamics of goal-directed reaches, as shown in Figure 2.4. If delay activity is available, this additional source of information can be naturally incorporated in the MTM framework to further improve decoding performance (MTMdm). The RWM can be seen as a restricted form of the STM, which explains the higher Erms of the RWM compared to the STM in Figure 2.8. For this reason, we will focus on comparing the STM, MTMm, and MTMdm in the remainder of this chapter.

The performance of these decoders can also be compared on a trial-by-trial basis. Figure 2.9 shows two-dimensional histograms of Erms differences between



[Figure 2.9 appears here. Panels: monkey G, monkey H. Horizontal axis: MTMm - STM (mm).]

Figure 2.9: Two-dimensional histogram of Erms differences between pairs of decoders for A: monkey G (98 units), B: monkey H (99 units). Horizontal axis: Erms difference between MTMm and STM; vertical axis: Erms difference between MTMdm and STM; diagonal axis: Erms difference between MTMdm and MTMm. The grayscale intensity (log scale) indicates the number of trials lying in each bin. The red dotted lines represent the means of the Erms differences along each axis. The letters (a, b, c, d) show where the trials in Figs. 2.6, 2.7, 2.10, and 2.11 lie on the histogram, respectively.

pairs of decoders for each monkey. The horizontal axis represents the Erms difference between the MTMm and STM, while the vertical axis represents the Erms difference between the MTMdm and STM. Both error differences are computed on a single-trial basis. The grayscale intensity of each bin indicates the number of trials whose Erms differences fall in that bin. The MTMm performed better than the STM for any trial to the left of the vertical zero axis, while the MTMdm performed better than the STM for any trial below the horizontal zero axis. We can also directly compare the MTMm and MTMdm using this two-dimensional histogram. By construction of the histogram, the MTMdm performed better than the MTMm for any trial that lies below the diagonal axis. The means of these error differences are shown as red dotted lines parallel to each of the three axes (horizontal, vertical, and diagonal). For both monkeys, all three means differ from




[Figure 2.10 appears here. Upper panels: decoded and actual trajectories, horizontal axis Horz pos (mm). Lower panels: weights over Time (ms).]

Figure 2.10: An outlying test trial in which the MTMm decoded trajectory exhibited a snap-to effect. Figure conventions identical to those in Figure 2.6. For this trial, Erms was 21.6, 43.8, and 9.3 mm for STM, MTMm, and MTMdm, respectively. Monkey G, 98 units. (Experiment G20040508, trial ID 1921)

zero (Wilcoxon paired-sample test, p < 0.01). The values of these means show that, on average, the MTMm performed better than the STM, the MTMdm performed better than the STM, and the MTMdm performed better than the MTMm. The same mean differences can be obtained by taking pairwise differences in bar heights in Figure 2.8.
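As a small worked example of how a single trial maps onto the three axes of the histogram, the sketch below computes the three Erms differences for the trial in Figure 2.10 (21.6, 43.8, and 9.3 mm for STM, MTMm, and MTMdm). The function name `erms_differences` is hypothetical; the only relation used is that the diagonal difference equals the vertical difference minus the horizontal one.

```python
# Hypothetical helper: locate one trial on the axes of the two-dimensional histogram.
def erms_differences(e_stm, e_mtm_m, e_mtm_dm):
    """Return (horizontal, vertical, diagonal) Erms differences in mm."""
    horizontal = e_mtm_m - e_stm      # MTMm - STM; negative means MTMm beat the STM
    vertical = e_mtm_dm - e_stm       # MTMdm - STM; negative means MTMdm beat the STM
    diagonal = vertical - horizontal  # equals MTMdm - MTMm
    return horizontal, vertical, diagonal


# Erms values reported for the trial in Figure 2.10 (STM, MTMm, MTMdm).
h, v, d = erms_differences(21.6, 43.8, 9.3)
print(h, v, d)
```

For this trial the horizontal difference is positive (MTMm worse than STM), while the vertical and diagonal differences are negative (MTMdm better than both STM and MTMm), which places it to the right of the vertical zero axis and below both the horizontal zero axis and the diagonal.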

The letters (a and b) in Figure 2.9A indicate where the trials shown in Figures 2.6 and 2.7 lie on the histogram. Both trials are taken from the dominant central region of the histogram and are thus considered to be representative trials. However, there are also outlying trials for which the STM performed better than the MTMm and/or the MTMdm. We consider two of these trials (labeled c and d in Figure 2.9A) in detail in Figures 2.10 and 2.11.

Figure 2.10 shows an outlying test trial for which the MTMm performed worse than the STM. While the MTM framework allows for soft weighting between the mixture components, the MTM decoded trajectories often transitioned rather abruptly from one component trajectory estimate to another (referred to as the snap-to effect). This effect is seen in the upper-left panel, where the MTM decoded trajectory (thick red, Erms: 43.8 mm) moved back and forth between the cyan and blue component trajectory estimates, rather than taking a path in between as did the STM (thick green). Eventually, the MTM decoded trajectory “snapped to” the incorrect blue component. From the perspective of weights, the snap-to effect corresponds to rapid weight changes with only a single dominant weight at most time points, as seen in the lower-left panel. At a given time point, the presence of a single dominant weight is related to the variability of the neural responses. The effect tends to arise if the neural variability across multiple reaches to a given goal (the “within-class scatter”) is small relative to the differences in mean neural responses across goals (the “between-class scatter”). When delay activity was incorporated in the MTM decoder, the competition between the two neighboring reach goals (cyan and blue) was suppressed and the weight corresponding to the actual reach goal (cyan) dominated throughout the reach, as shown in the lower-right panel. Notice that the delay activity strongly favored the actual reach goal (cyan), as indicated by the distribution of weights at t = 0. Thus, the incorporation of delay activity biased the choice of models towards the correct goal strongly enough to avoid the “snap” to the competing trajectory. The resulting MTM decoded trajectory (thick orange, Erms: 9.3 mm) is shown in the upper-right panel.
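The link between scatter and a single dominant weight can be seen in a deliberately simplified toy case. The sketch below assumes one scalar observation, two goals with equal priors, and Gaussian noise with a common standard deviation; none of these choices come from the models used in this chapter, and the function name `goal_posterior` and all numbers are hypothetical.

```python
import math


# Toy model (an assumption, not the chapter's observation model): scalar Gaussian classes.
def goal_posterior(obs, means, sigma):
    """Posterior over goals for one scalar observation, with equal priors."""
    log_lik = [-0.5 * ((obs - mu) / sigma) ** 2 for mu in means]
    peak = max(log_lik)  # subtract the maximum for numerical stability
    unnorm = [math.exp(value - peak) for value in log_lik]
    total = sum(unnorm)
    return [u / total for u in unnorm]


means = [10.0, 14.0]  # mean responses for two goals (between-class separation of 4)
obs = 11.5            # an observation lying between the two means

soft = goal_posterior(obs, means, sigma=4.0)   # large within-class scatter
sharp = goal_posterior(obs, means, sigma=0.5)  # small within-class scatter
print(soft, sharp)
```

When the within-class scatter is comparable to the separation between the means, the two weights stay close to 0.5; when it is small, almost all of the weight falls on one goal, so the weighted sum behaves like a hard switch between components.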

Figure 2.11 shows an outlying test trial for which the MTMdm performed worse than the STM. Without delay activity, the weight for the actual reach goal (cyan) rapidly rose from 1/8 to unity and remained there for the rest of the trial. This led to a fairly accurate MTM decoded trajectory (thick red, Erms: 11.6 mm) in the upper-left panel. As in Figure 2.7, the delay activity incorrectly indicated the identity of the upcoming reach goal, as shown in the lower-right panel of Figure 2.11. The dominant weight (blue) at t = 0 did not correspond to the actual reach goal (cyan). Unlike in Figure 2.7, however, the observed peri-movement activity was not able to correct this error and the resulting decoded trajectory (thick orange, Erms: 48.8 mm) in the upper-right panel headed to a neighboring




[Figure 2.11 appears here. Upper panels: decoded and actual trajectories, horizontal axis Horz pos (mm). Lower panels: weights over Time (ms).]

Figure 2.11: An outlying test trial in which the peri-movement activity was not able to correct an incorrect reach goal identified by the delay activity. Figure conventions identical to those in Figure 2.6. For this trial, Erms was 17.1, 11.6, and 48.8 mm for STM, MTMm, and MTMdm, respectively. Monkey G, 98 units. (Experiment G20040508, trial ID 1608)

goal.

The weights represent a probabilistic compromise between the reach goal indicated by the peri-movement activity and that indicated by the delay activity. This can be seen by comparing (2.1) and (2.2), where the weights P(m | {y}_1^t) are computed by multiplying a term P({y}_1^t | m) that depends only on peri-movement activity with a term P(m) that depends only on delay activity (if delay activity is available). The relative influence of the two types of neural activity is dependent not only on the neural data that is observed, but also on the particular forms of parametric models used ((2.3)-(2.5) and (2.21)). Figure 2.11 suggests that, for this particular trial, the influence of the delay activity was too strong relative to that of the peri-movement activity.
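A minimal sketch of this multiplication, assuming only that the weights are the normalized product of a peri-movement likelihood term and a goal prior: the function name `goal_weights`, the three-goal setup, and all numbers are hypothetical, and the log-domain arithmetic is an implementation choice made here for numerical stability rather than something taken from the text.

```python
import math


# Hypothetical helper: weights proportional to likelihood times prior, per goal m.
def goal_weights(log_likelihoods, prior):
    """Normalize likelihood * prior over goals, working in the log domain."""
    log_post = [ll + math.log(p) for ll, p in zip(log_likelihoods, prior)]
    peak = max(log_post)  # subtract the maximum before exponentiating
    unnorm = [math.exp(lp - peak) for lp in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Made-up peri-movement log-likelihoods for three goals; goal 0 fits best.
log_lik = [-10.0, -12.0, -15.0]

uniform_prior = [1.0 / 3.0] * 3   # no delay activity available
delay_prior = [0.05, 0.90, 0.05]  # delay activity strongly favors goal 1

w_m = goal_weights(log_lik, uniform_prior)
w_dm = goal_weights(log_lik, delay_prior)
print(w_m, w_dm)
```

In this made-up case the peri-movement term alone favors goal 0, but a sufficiently confident prior shifts the dominant weight to goal 1. This is the kind of interaction at play in Figures 2.7 and 2.11, where the outcome depends on the relative strength of the two terms.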

Because the number of neural units that can be isolated on a given day varies



[Figure 2.12 appears here. Horizontal axis: Number of units (0 to 200). Curves: STM, MTMm, and MTMdm for monkeys G and H.]

Figure 2.12: Erms (mean ± SE) comparison of STM (green), MTMm (red), and MTMdm (orange) decoders at different numbers of units. Dashed curves: monkey G, solid curves: monkey H. The vertical gray bar indicates the number of units used for the performance reported in Figure 2.8.

during the lifetime of an electrode array implant, we are interested in how the different decoders perform as the number of units varies. We started with 98 units for monkey G and 199 units for monkey H. The neural data in each data set were then randomly divided into two, three, and four disjoint subsets, as described in Section 2.3. Performance curves as a function of the number of units are plotted in Figure 2.12 for the STM, MTMm, and MTMdm decoders. The trends seen in Figure 2.8 were preserved across the range of unit counts tested for both monkeys. The ranking of the decoders in order of decreasing mean Erms remained STM, MTMm, and MTMdm. Except in one case (MTMm versus MTMdm for monkey H at 198 units), all pairwise comparisons between decoders for a particular monkey and unit count were statistically significant (Wilcoxon paired-sample test, p < 0.01). The vertical gray bar indicates the number of units used for the performance reported in Figure 2.8. As expected, in all cases, the error decreased as more units



                         STM            MTMm           MTMdm
Monkey G   Unshuffled    18.0 ± 0.19    13.6 ± 0.24    11.0 ± 0.16
           Shuffled      18.5 ± 0.20    14.3 ± 0.25    11.8 ± 0.17
Monkey H   Unshuffled    18.4 ± 0.17    11.5 ± 0.16    10.3 ± 0.14
           Shuffled      18.5 ± 0.17    12.2 ± 0.17    11.0 ± 0.14

Table 2.1: Erms (mean ± SE in mm) comparison of unshuffled and shuffled trajectories. Shuffling was carried out on the decoded trajectories across trials with the same reach goal. The Erms values for the unshuffled case are identical to those appearing in Figure 2.8.

were used.
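The random division into disjoint subsets can be sketched as follows. The exact procedure is the one described in Section 2.3; this version is only an assumption in which the units are shuffled and cut into equal-sized groups, with any leftover units discarded. The function name `split_units` and the seed are hypothetical.

```python
import random


# Hypothetical helper: randomly partition unit IDs into equal-sized disjoint subsets.
def split_units(unit_ids, n_subsets, seed=0):
    """Shuffle the unit IDs and cut them into n_subsets groups of equal size.

    Leftover units (when the count is not divisible) are dropped, which is an
    assumption made for this sketch.
    """
    rng = random.Random(seed)
    shuffled = list(unit_ids)
    rng.shuffle(shuffled)
    size = len(shuffled) // n_subsets
    return [shuffled[i * size:(i + 1) * size] for i in range(n_subsets)]


units_g = list(range(98))  # 98 units, as for monkey G
for k in (2, 3, 4):
    subsets = split_units(units_g, k)
    print(k, [len(s) for s in subsets])
```

Under this assumption, 98 units yield subsets of 49, 32, and 24 units when divided into two, three, and four groups.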

We previously demonstrated how goal-directed reaches can be decoded by defining a canonical trajectory to each goal and selecting among them based on neural activity (Kemere et al., 2002, 2004b). This can be seen as a special case of the MTM, where the trajectory model is a mixture of canonical trajectories. Because it defines a unique trajectory to each goal, this model does not capture behavioral variability, such as reach speed or curvature, across reaches to the same goal. In settings where only the goal identity is required, we can forgo decoding the continuous trajectory and simply place an icon on the decoded goal (Shenoy et al., 2003; Musallam et al., 2004; Santhanam et al., 2006), which will be discussed further in Chapter 3. In this chapter, we are concerned with both decoding the continuous trajectory and capturing trial-by-trial behavioral variability. For example, if the reach speed is faster than usual on a particular trial, this fact should also be reflected in the decoded trajectory. To verify that trial-by-trial behavioral variability was captured by the MTM decoder, we shuffled the decoded trajectories across trials with the same reach goal. If the decoded trajectories reflected the trial-by-trial variability of the actual reaches, then we would expect the Erms of the shuffled trajectories to be higher than that of the unshuffled trajectories. In cases where the duration of the actual and decoded trajectories differed due to shuffling, Erms was computed by either truncating or padding the decoded trajectory. Table 2.1 compares the Erms of the unshuffled and shuffled trajectories. For the MTMm and MTMdm in both monkeys, the shuffled trajectories yielded higher Erms than the unshuffled trajectories (Wilcoxon paired-sample test, p < 0.01). The effect of shuffling for the STM was largely washed out by the higher overall Erms, although it was still present in a weaker form for monkey G (Wilcoxon paired-sample test, p < 0.05). The absolute differences in means between the unshuffled and shuffled cases were rather modest since the actual reaches in the particular datasets used were fairly stereotyped (cf. Figure 2.4A). Nevertheless, these results show that the MTM decoder indeed captured trial-by-trial behavioral variability.
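The shuffling control can be sketched in a few lines. Several details here are assumptions rather than the procedure actually used: Erms is taken to be the root-mean-square Euclidean position error over time points, a short decoded trajectory is padded by repeating its last point, and the function names `erms` and `shuffle_within_goal` are hypothetical.

```python
import math
import random


def erms(actual, decoded):
    """RMS position error (mm) between two lists of (x, y) points.

    A decoded trajectory that is too long is truncated; one that is too short
    is padded by repeating its last point (the padding rule is an assumption).
    """
    n = len(actual)
    fitted = list(decoded[:n])
    fitted += [fitted[-1]] * (n - len(fitted))
    squared = [(ax - dx) ** 2 + (ay - dy) ** 2
               for (ax, ay), (dx, dy) in zip(actual, fitted)]
    return math.sqrt(sum(squared) / n)


def shuffle_within_goal(decoded_by_trial, goal_by_trial, seed=0):
    """Permute decoded trajectories among trials that share a reach goal."""
    rng = random.Random(seed)
    shuffled = list(decoded_by_trial)
    for goal in set(goal_by_trial):
        idx = [i for i, g in enumerate(goal_by_trial) if g == goal]
        perm = idx[:]
        rng.shuffle(perm)
        for src, dst in zip(idx, perm):
            shuffled[dst] = decoded_by_trial[src]
    return shuffled


# Toy check: a one-point decoded trajectory is padded to match a two-point reach.
print(erms([(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0)]))
```

Comparing the mean of `erms` over trials before and after `shuffle_within_goal` gives the kind of unshuffled versus shuffled comparison reported in Table 2.1.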

Although beyond the scope of the present report, we have also begun to explore how the MTM framework performs for larger numbers of reach goals. A data set with 16 reach goals was collected from monkey H. The goals were arranged in two rings of eight goals at radii of 70 and 120 mm. A total of 189 single- and multi-neuron units were isolated and 63 trials per reach goal were analyzed. The mean Erms values for the STM, MTMm, and MTMdm decoders were 20.5, 17.6, and 16.2 mm, respectively. Thus, the ranking of the decoders remained the same for 16 reach goals.

2.7 Discussion

We have presented a mixture of trajectory models framework that provides (1) a suitable trajectory model for goal-directed reaches, and (2) a principled way to incorporate information about the identity of the upcoming reach goal. In contrast to current trajectory models, a mixture of linear-Gaussian trajectory models (MTM) can capture the notion of goal-directed control, whereby trajectories start at rest, proceed out to one of M discrete reach goals, and end at rest (cf. Figure 2.4). Because the MTM better describes the dynamics of goal-directed reaches, its decoded trajectories were on average more accurate than those based on the random walk and linear-Gaussian (STM) trajectory models.



Using 98 (99) units for monkey G (H), the MTM decoder with only peri-movement activity yielded 24 (38)% lower Erms than the STM decoder. Incorporating 200 ms of delay activity further decreased Erms by 14 (7)%. In general, the accuracy of the goal information increases with the duration of delay activity observed (Santhanam et al., 2006), which will be shown in Section 3.1.2. Thus, the benefit of incorporating delay activity can be adjusted based on the desired tradeoff between the time required to obtain the goal information and its accuracy. Overall, the Erms improved by 39 (44)%. This is the main result of this chapter (cf. Figure 2.8).
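These percentages can be reproduced from the unshuffled means in Table 2.1. The sketch below assumes that all three reductions are expressed relative to the STM Erms; this is an inference from the numbers (the first two figures then sum to the overall figure) rather than something stated explicitly, and the function name `relative_improvements` is hypothetical.

```python
# Unshuffled mean Erms in mm (STM, MTMm, MTMdm), taken from Table 2.1.
table_2_1 = {"G": (18.0, 13.6, 11.0), "H": (18.4, 11.5, 10.3)}


def relative_improvements(stm, mtm_m, mtm_dm):
    """Return percentage reductions, all relative to the STM error (an assumption)."""
    peri = 100.0 * (stm - mtm_m) / stm       # STM -> MTMm
    delay = 100.0 * (mtm_m - mtm_dm) / stm   # additional gain from delay activity
    overall = 100.0 * (stm - mtm_dm) / stm   # STM -> MTMdm
    return peri, delay, overall


for monkey, means in table_2_1.items():
    peri, delay, overall = relative_improvements(*means)
    print(monkey, round(peri, 1), round(delay, 1), round(overall, 1))
```

Under this assumption the values come out at roughly 24.4, 14.4, and 38.9% for monkey G and 37.5, 6.5, and 44.0% for monkey H, in line with the rounded figures quoted above.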

Figure 2.12 shows a larger improvement in performance when incorporating delay activity for monkey G than for monkey H. Indeed, using the same window of delay activity as before, a maximum-likelihood classifier based on (2.23) correctly determined the actual reach goal on 91.6 (80.0)% of the trials for monkey G (H). This difference can be attributed to the fact that a greater proportion of the units from monkey G exhibited reach goal-selective delay activity. PMd and M1 are known to lie along a rostrocaudal gradient with more reach goal-selective “planning” activity present rostrally and more execution-related peri-movement activity present caudally (Riehle and Requin, 1989; Crammond and Kalaska, 2000; Hatsopoulos et al., 2004). Especially for implants that straddle PMd and M1 (as is the case for both monkeys G and H), the proportion of units showing reach goal-selective delay activity may vary from implant to implant. As shown in Figure 2.2, the locations at which the arrays were implanted in monkeys G and H were similar.

Figure 2.12 also suggests that the MTM decoder is more robust to a loss of units than the STM decoder. Consider the decline in performance when the number of units for monkey G was reduced by a factor of four from 98 to 24 units. The Erms for the STM decoder increased by 102%. In contrast, the Erms for the MTMm and MTMdm decoders increased by 58 and 41%, respectively. Because the number of units available on an implant decreases over time due to biological processes (Polikov et al., 2005), decoding algorithms that are robust to unit loss are critical for prosthetic applications.



The observed neural activity provides two categorically different types of information about the arm trajectory to be estimated. One type is informative of the moment-by-moment details of the arm trajectory (dynamic), while the other is informative of the identity of the upcoming reach goal (static). The former is typically extracted from neural activity from motor cortical areas, such as M1 and PMd, during movement (e.g., Moran and Schwartz, 1999; Hatsopoulos et al., 2004). The latter may be obtained from several possible sources. In the present work, the goal information was extracted from “planning” activity present in motor and pre-motor cortical areas preceding the reach. The posterior parietal cortex has also been shown to encode reach goals and could serve as a source of goal information (Batista et al., 1999; Shenoy et al., 2003). In addition, there may be events in a patient’s surroundings that could be indicative of the upcoming reach goal. For example, if the phone rings, the upcoming reach goal is likely to be the phone.

If the moment-by-moment details of the arm trajectory could be decoded perfectly using only neural activity present during movement, then there would be no need for goal information. However, the moment-by-moment details of the arm trajectory and the goal identity are each decoded with varying levels of uncertainty. When both types of information are available, it is desirable to combine them in a way that takes into account their relative uncertainty and yields a coherent arm trajectory estimate (Kemere et al., 2004a). Previous approaches either assumed that there was no variability in the moment-by-moment details of reaches to a given goal (Kemere et al., 2002, 2004b), or employed a switching scheme between the two types of information (Tkach et al., 2005). The MTM framework presented here unifies our previous work (Kemere et al., 2002, 2003, 2004a,b) and provides a principled way to combine the two types of information.

To date, the field of cortical prosthetics has largely been split based on which of the two types of information is being used (Pesaran et al., 2006). While motor prosthetics attempt to decode the moment-by-moment details of a trajectory (Serruya et al., 2002; Taylor et al., 2002; Carmena et al., 2003), communication (or cognitive) prosthetics seek to decode the intended reach goal (Shenoy et al., 2003; Musallam et al., 2004; Santhanam et al., 2006). By combining the two types of information, the MTM decoder can be viewed as a way to bridge differences in the design approach of cortical prosthetics.

Based on previous studies, both types of information can likely be extracted from neural activity present in paralyzed patients. First, motor cortical units can be activated (i.e., emit peri-movement activity) without physical movement and be used to control prosthetic cursors or limbs (Serruya et al., 2002; Taylor et al., 2002; Carmena et al., 2003). Recently, neural recordings from motor cortex in a tetraplegic patient were used to control a prosthetic cursor (Hochberg et al., 2006). In all of these studies, moment-by-moment details of the trajectory were estimated from the available neural activity. Second, it has been shown in PMd (Hatsopoulos et al., 2004; Musallam et al., 2004; Santhanam et al., 2006) and parietal areas (Shenoy et al., 2003; Musallam et al., 2004) that goal information can be reliably decoded from neural activity without physical movement. In addition, functional magnetic resonance imaging studies have revealed that motor cortical areas activate similarly in tetraplegics and in healthy humans (Shoham et al., 2001; Glidden et al., 2006). In the present work, we extracted both types of information from the same cortical areas: PMd and M1. The type of information being decoded depends on when the neural activity occurs relative to the reach, which we assumed to be known. In settings where the subject is free to decide when to reach, it will be necessary to implement a state machine (Shenoy et al., 2003; Afshar et al., 2005; Kemere et al., 2006) that determines the type of information that is being conveyed by the neural activity at each time point. Even without an instructed delay period, it has been shown that neural activity present in M1 and PMd just before movement onset has similar properties (directional tuning and, in some cases, the activity’s temporal profile) to the activity present during the instructed delay period (Crammond and Kalaska, 2000).

The computational requirements of the MTM decoder scale roughly with the number of reach goals M. Because the computations for each reach goal can theoretically be carried out in parallel, it is possible to set up the MTM decoder so that its running time remains nearly constant, regardless of the number of reach goals. Furthermore, the MTM framework is compatible with all current probabilistic decoding techniques, including the Bayes filter (Brown et al., 1998), particle filter (Brockwell et al., 2004; Shoham et al., 2005), and Kalman filter variants (Wu et al., 2004, 2006). It also preserves the real-time properties of its constituent estimators and, thus, is suitable for prosthetic applications.

Although activity in M1 and PMd generally precedes or coincides with movement, a minority of units show activity trailing the associated arm movement (e.g., Paninski et al., 2004). The optimal lags of 26.2 (33.0)% of the 98 (99) units for monkey G (H) were indeed negative (i.e., neural activity trails movement). These acausal units cannot be used for real-time prosthetic applications without incurring a decoding delay. If their activity is related to proprioception, the activity may altogether be unavailable in disabled patients. We thus excluded the units with acausal lags from our analyses and found the same trends as in Figure 2.8 across both monkeys. In particular, the ranking of the decoders in order of decreasing Erms remained STM, MTMm, and MTMdm (Wilcoxon paired-sample test, p < 0.01). These results indicate that the MTM framework could provide substantial performance benefits in real-time prosthetic applications.


Chapter 3

A Real-Time Communication Prosthesis

Most brain-computer interfaces (BCIs) translate neural activity into a continuous movement command, which guides a computer cursor to a desired visual target (Serruya et al., 2002; Taylor et al., 2002; Carmena et al., 2003; Kennedy et al., 2000; Leuthardt et al., 2004; Hochberg et al., 2006; Patil et al., 2004; Wolpaw and McFarland, 2004). This is the case for the decoders presented in Chapter 2, where we showed how goal information can be combined with moment-by-moment trajectory information to improve the accuracy of the decoded trajectory in an off-line setting. In this case, the neural activity is used to guide a computer cursor to acquire visual targets, each of which represents a discrete action. Examples include typing keys on a keyboard, turning on room lights, or moving a wheelchair in specific directions. For such applications, accurately decoding a continuous movement command may be of secondary importance. Of primary importance may be the speed and accuracy with which these discrete actions can be selected. This type of BCI is referred to as a communication prosthesis. In this chapter, we present the design and demonstration of a real-time communication prosthesis that provides manyfold higher performance than previously reported.

Human-operated BCIs are currently capable of communicating only a few letters per minute (~1 bit per second (bps) sustained rate; Wolpaw and McFarland, 2004), and monkey-operated systems can accurately select only one target every 1-3 sec (~1.6 bps sustained rate; Taylor et al., 2003), despite using invasive electrodes. Rather than decoding a continuous movement command, a potentially higher-performance approach is to translate neural activity into a prediction of the intended target and immediately place the cursor directly on that location. This type of control is appropriate for communication prostheses and benefits from not having to estimate unnecessary parameters such as continuous trajectory (Shenoy et al., 2003; Musallam et al., 2004). We conducted a series of experiments to investigate how quickly and accurately a BCI could operate under direct end-point control.

As described in Section 2.5, neural activity present during the delay period in dorsal premotor cortex (PMd) has been shown to reliably indicate the upcoming reach target (Yu et al., 2004; Hatsopoulos et al., 2004; Santhanam et al., 2006). Section 3.1 investigates, in an offline setting, how this delay activity might be used to provide fast and accurate targeting information. In Section 3.2, we describe how to build a real-time communication prosthesis and report our experimentally observed performance.

The work described in this chapter serves two primary aims. First, we seek to improve the speed and accuracy with which targets can be selected, thereby increasing the clinical viability of such systems in humans. Second, using a decoding approach, we would like to gain insights into the neural dynamics of motor preparation. In particular, we are interested in how quickly the neural activity reflects the desired target after the visual target is presented, how our ability to decode varies with the duration and placement of the time window over which neural activity is observed, and how quickly the brain can change its motor “plan” from one reach target to another. In Chapter 5, we will adopt a dynamical systems approach to characterizing the process of motor preparation on a single-trial basis.



3.1 Offline system design

The behavioral task and neural recordings described in Section 2.3 are also used here to investigate how the choice of time window over which neural activity is observed affects decoding performance. We took spike counts in a window of duration Tint during the delay period, starting Tskip after target presentation. For each target location, the distribution of spike counts for each trial was modeled using either a Poisson or multivariate Gaussian distribution, as described in Section 2.5. To decode a reach target, we used maximum-likelihood methods to choose the most probable reach target m* based on the observed spike counts z

$$ m^* = \arg\max_{m} P(m \mid z), \qquad (3.1) $$

where P(m | z) is given in (2.23).
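A minimal sketch of this decoding rule for the Poisson case is given below. It assumes independent Poisson units and equal prior probability for every target, so that maximizing P(m | z) reduces to maximizing the likelihood of the counts; the exact form of (2.23) may differ. The function names and the mean counts are made up for illustration.

```python
import math


def poisson_log_likelihood(counts, mean_counts):
    """log P(z | m) for independent Poisson units with the given mean counts (> 0)."""
    return sum(z * math.log(lam) - lam - math.lgamma(z + 1)
               for z, lam in zip(counts, mean_counts))


def decode_target(counts, mean_counts_by_target):
    """Return the index of the target whose Poisson model best explains the counts."""
    scores = [poisson_log_likelihood(counts, rates)
              for rates in mean_counts_by_target]
    return max(range(len(scores)), key=scores.__getitem__)


# Made-up mean spike counts per window for 2 targets and 3 units.
mean_counts = [[2.0, 8.0, 1.0],
               [6.0, 3.0, 4.0]]

print(decode_target([1, 9, 0], mean_counts))
print(decode_target([7, 2, 5], mean_counts))
```

In practice the mean counts for each target would be estimated from training trials, and they must be strictly positive for the logarithm to be defined.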

3.1.1 Skip Time

The first timing parameter we assessed relates to the time delay between when a visual target is presented on the screen and when PMd neurons have established a reach plan. This time, Tskip, includes i) the time for visual information about the target to arrive in PMd (50-70 ms), ii) the time for the subject to select among targets if more than one are present, and iii) the time for neural activity reflecting the desired target to be generated. Neural activity during these early periods is discarded in the present BCI design, although it will be of central scientific interest in Chapters 4 and 5. Some activity during this period may already be predictive of the desired target, but it is not yet clear how best to decode this information.

Before visual information is relayed to PMd, the measured neural activity in PMd is not target-related. We sought to estimate the time required for target-related information to reach PMd using a decoding approach. We computed single-trial accuracy as a function of Tskip, fixing Tint to 50 ms. Figure 3.1 demonstrates that the neural activity in PMd cannot be meaningfully decoded to predict the reach target until ~75 ms after the target is displayed. This estimate includes a



[Figure 3.1 appears here. Horizontal axis: Skip time (Tskip) (ms), 0 to 400.]

Figure 3.1: PMd latency analysis with the single-target instructed-delay task (one reach target was shown out of 8 possible locations and the remaining 7 locations were invisible) as a function of Tskip. Performance was calculated by training a Poisson model on all trials in a dataset and computing the leave-one-out cross-validated performance on the same data. The shaded area denotes the 95% confidence interval (Bernoulli process) around the mean performance (embedded line). Dark curves correspond to monkey G (dataset G20040603) and light curves to monkey H (dataset H20041117). Performance was calculated for a constant Tint of 50 ms with varying Tskip.

~16-33 ms delay between when the software sends a request to show the stimulus and when it is actually displayed by the CRT projector. Figure 3.1 also reveals that there is target-related information in PMd as early as 50-70 ms after the target is first cued. It would not be possible to decode the target with above-chance probability otherwise. This rough estimate of latency agrees with neural response plots from previous studies in PMd (e.g., Crammond and Kalaska, 2000; Kalaska and Crammond, 1995), where some neurons show a change in activity very soon after stimulus onset. This exact latency has further implications for BCI experiments where reach targets are presented in rapid succession. We often observed neurons that spike according to the target location of a previous trial for many tens of milliseconds after the start of a new trial.

A question that frequently arises in visually cued studies such as ours is whether the neural activity measured during the delay period is related to a reach plan, the visually cued stimulus, or a combination of both. For example, recording from primary visual cortex could provide excellent prospects for decoding the reach target in our single-target task, but a BCI operating on this neural activity would not represent the motor intentions of the subject. To address this issue, we performed a separate control experiment with both monkeys. We presented the monkey with a multi-target task where all of the eight possible reach locations were shown on every trial, but only one was colored yellow while the rest were colored green or blue. The monkey was trained to reach for the yellow target following the delay period. Compared to the single-target task, we found that it took longer in the multi-target task for delay activity to represent the desired target after visual target presentation. Further details can be found in the supplementary materials of Santhanam et al. (2006). Based on those results and the curves shown in Figure 3.1, we chose to use a Tskip of 150 ms.

3.1.2 Integration Time

The integration time, Tint, refers to the duration of the time window during which spikes are counted for target estimation. Given the Poisson-like noise in the spike timing of cortical neurons, a longer Tint will average away more noise and result in more accurate predictions of reach end-point. However, a longer Tint will also reduce the total number of cursor positionings that can be made per second. Herein lies the fundamental speed-accuracy trade-off that we must optimize in order to increase BCI performance.

To determine the best Tint to be used in BCI experiments, we analyzed the

effect of this parameter on two performance metrics. The first is single-trial accuracy, which is the percentage of targets correctly predicted. We found that accuracy rises and largely saturates around 85-90% as Tint increases to 200-250 ms. Figure 3.2 illustrates this effect as a function of total trial length, which is defined to be the sum of Tskip (150 ms), Tint (variable) and a small system overhead time associated with decoding and rendering the prosthetic cursor on the screen



[Figure 3.2 plot; x-axis: Trial length (ms), 200-400]

Figure 3.2: Performance curves investigating the dependence on Tint were calculated from offline experiment H20041118 (8-target configuration). The trial length was Tskip + Tint + Tdec+rend with Tskip = 150 ms and Tdec+rend ~ 40 ms. Tint was varied and performance was computed. Performance metrics were very consistent day after day and between monkeys (data not shown). The theoretical maximum ITRC in bps, assuming 100% accuracy regardless of Tint, is plotted as the dotted red curve.

(Tdec+rend ~ 40 ms). Should a minimum level of single-trial accuracy be required for a particular application, a corresponding minimum Tint can be chosen.

The second performance metric is information transfer rate capacity (ITRC, in bps). This quantity measures the rate at which information is conveyed from the subject, through the BCI, to the environment (Taylor et al., 2002). It is the information per trial, which is closely related to single-trial accuracy, divided by the total trial length. The information per trial is given by the well-known channel capacity introduced by Shannon (Shannon, 1948). The channel capacity is the maximum, taken over all distributions of the target presented, of the mutual information between the target presented and the target estimated. It provides a theoretical upper bound on the amount of information that can be transmitted through a communication system, based solely on the error pattern of the communication channel. In our experimental setup, the communication system consists of a source (PMd) planning a reach to a given target (the communication symbol to be transmitted). The receiver



(our prosthetic system) decodes the neural activity and produces an estimate of

which symbol was transmitted (our estimate of the reach target). To estimate the information transmission rates afforded by our prosthetic system, we calculated

the information capacity using the Blahut-Arimoto algorithm (Cover and Thomas, 1990).
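As a rough illustration of the capacity calculation described above, the following sketch implements the Blahut-Arimoto iteration for a discrete memoryless channel and converts bits per trial into an ITRC. It is not the code used for the experiments: the function names `blahut_arimoto` and `itrc_bps`, the convergence tolerance, and the example confusion matrix are all assumptions made for illustration. The input is a confusion matrix whose rows give the probability of each decoded target given the presented target.

```python
import numpy as np


def blahut_arimoto(P, tol=1e-9, max_iter=10_000):
    """Capacity in bits per trial of a discrete memoryless channel.

    P[x, y] is the probability that target y is decoded when target x was
    presented (each row sums to 1). Returns (capacity_bits, input_distribution).
    """
    P = np.asarray(P, dtype=float)
    p = np.full(P.shape[0], 1.0 / P.shape[0])    # start from uniform inputs
    lower = 0.0
    for _ in range(max_iter):
        q = p @ P                                # distribution of decoded targets
        with np.errstate(divide="ignore", invalid="ignore"):
            log_ratio = np.where(P > 0, np.log(P / q), 0.0)
        D = np.sum(P * log_ratio, axis=1)        # KL(P[x, :] || q) in nats
        lower = np.log(p @ np.exp(D))            # lower bound on capacity
        upper = D.max()                          # upper bound on capacity
        if upper - lower < tol:
            break
        p = p * np.exp(D)                        # favor the more informative inputs
        p /= p.sum()
    return lower / np.log(2), p


def itrc_bps(bits_per_trial, t_skip_ms, t_int_ms, t_overhead_ms):
    """Information per trial divided by total trial length (in seconds)."""
    return bits_per_trial / ((t_skip_ms + t_int_ms + t_overhead_ms) / 1000.0)


if __name__ == "__main__":
    # Hypothetical 4-target confusion matrix (illustrative values, not data).
    confusion = np.array([
        [0.85, 0.10, 0.00, 0.05],
        [0.10, 0.85, 0.05, 0.00],
        [0.00, 0.05, 0.85, 0.10],
        [0.05, 0.00, 0.10, 0.85],
    ])
    bits, _ = blahut_arimoto(confusion)
    print("bits per trial:", bits)
    print("ITRC (bps):", itrc_bps(bits, 150, 70, 40))
```

Because the capacity depends on the full error pattern and not only on the fraction of correct trials, two decoders with equal accuracy can yield different bits per trial under this calculation.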

Figure 3.2 shows the information transfer rate versus trial length. As expected, the bits per trial generally increases with increasing trial length since longer trials allow for larger integration windows and increased accuracy. However, the plot also demonstrates an interesting tradeoff in the ITRC. The optimal ITRC occurs at short trial lengths, despite relatively low single-trial accuracy at these trial lengths. In other words, it is critical to keep Tint brief, even at the expense of

accuracy. The highest ITRC is 7.7 bps at a total trial time of 260 ms, which

corresponds to a Tint of 70 ms (Tskip = 150 ms, Tdec+rend = 40 ms). This is due to

diminishing returns: beyond some optimal point, the subsequent gain in accuracy (and bits per trial) for each additional millisecond of Tint is so small that the ITRC

begins to decrease. The ITRC vs. trial length curve reveals the optimum Tint for maximum information transfer.
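The numbers quoted above can be checked with a short calculation. The symbols below follow the text; $I_{\mathrm{trial}}$ (bits per trial) and $T_{\mathrm{trial}}$ (total trial length) are shorthand introduced here for convenience, and the last line assumes the 8-target configuration of Figure 3.2 with perfect accuracy.

```latex
\begin{align*}
T_{\mathrm{trial}} &= T_{\mathrm{skip}} + T_{\mathrm{int}} + T_{\mathrm{dec+rend}}
                    = 150 + 70 + 40 = 260~\text{ms},\\
\mathrm{ITRC} &= \frac{I_{\mathrm{trial}}}{T_{\mathrm{trial}}}
  \quad\Rightarrow\quad
  I_{\mathrm{trial}} \approx 7.7~\text{bps} \times 0.260~\text{s} \approx 2.0~\text{bits},\\
\mathrm{ITRC}_{\max} &= \frac{\log_2 8}{0.260~\text{s}} \approx 11.5~\text{bps}.
\end{align*}
```

At the optimum, roughly 2 of the 3 available bits are conveyed on each trial, so the peak ITRC sits well below the 100%-accuracy ceiling at the same trial length.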

3.2 Online performance

The performance curves in Figure 3.2 are extrapolations using experimental data

from individual trials that had long delay periods and long times between trials.

The offline experiments as designed cannot accurately assess a sustained information rate. If targets are displayed rapidly, immediately after the preceding trial's reach target has been estimated, the performance of the system could be significantly compromised. For example, the reach plan region of the brain may simply not be able to keep up with such a fast pace of target presentation.

To measure directly the ITRC performance when actually presenting trials at high speeds, we conducted a series of BCI experiments using a real-time system

capable of rapidly decoding neural information. BCI experiments began with the



[Figure 3.3 timeline: Touch hold, Trial #1, Trial #2, Trial #3, Real reach]

Figure 3.3: Chain of three prosthetic cursor trials followed by a standard instructed-delay reach trial. Tskip is denoted by the orange parts of the time line. Neural activity was integrated (Tint) during the purple shaded interval and used to predict the reach target location. After a short processing time (Tdec+rend ~ 40 ms), a prosthetic cursor was briefly rendered and a new target was displayed. The dotted circles represent the reach target and prosthetic cursor from the previous trial, both of which were rapidly extinguished before the start of the trial indicated. Large ellipses draw attention to the increase in neural activity related to the peripheral reach target. Trials shown here are from experiment H20041106.1 with monkey H.

collection of delay-period activity preceding reaches to different target locations (Figure 2.3) and fitting statistical models to the activity (model training). Then, during BCI prosthetic cursor trials (Figure 3.3), the intended target was decoded and a circular cursor was rendered on the screen at the predicted location. If the

prediction was correct, an auditory tone was delivered and the next target was

displayed immediately. If the prediction was incorrect, the trial was considered a

failure and aborted, or the monkey was allowed to make a real reach to the target. Real reach trials were interspersed to ensure that the monkey remained engaged in the task. In this manner, a sequence of high-speed prosthetic cursor trials could be generated. Figure 3.3 illustrates three successful prosthetic cursor trials followed by a standard real reach trial.

Using this paradigm, we varied the number of locations at which a target could



Monkey   Number of targets   Accuracy (%)   Trials per second   ITRC (bps)
H         2                  94.3           3.5                 2.4
H         4                  94.5           2.8                 4.7
H         8                  68.9           3.5                 6.5
H        16                  51.1           2.9                 6.4
G         2                  84.2           3.6                 1.3
G         4                  93.0           2.5                 3.8
G         8                  76.8           2.5                 5.3
G        16                  26.4           2.2                 3.1

Table 3.1: BCI experiments with highest ITRC for monkeys H and G. Each row lists the experiment with highest performance (ITRC) for a given target layout. Other experiments yielded higher single-trial accuracy or involved faster cursor rates, but did not achieve the highest ITRC for the corresponding target layout (not shown).

appear on any given trial. This allowed task difficulty to be varied, which contributes to the ITRC metric. Performance values were calculated by averaging data from several hundred trials per condition. Table 3.1 lists the highest ITRC

results during BCI experiments with 2, 4, 8 or 16 targets. The best overall performance was achieved with the 8-target task (6.5 and 5.3 bps for monkeys H and G, respectively). This performance corresponds to typing ~15 words per minute with a basic alphanumeric keyboard.

Although a sustained performance rate of 6.5 bps is approximately four times greater than reported previously, it is lower than the extrapolated result (7.7 bps). Furthermore, the ITRC peak was expected at a total trial length of 260 ms, but our BCI experiments yielded 5 bps with this timing (monkey H). These discrepancies are due to the limitations inherent when using offline experiments to extrapolate performance for speeds at which the subject must quickly recognize new targets and rapidly change neural activity (that is, change reach plans).

Having confirmed that large BCI performance gains are possible with a direct

end-point control strategy, we investigated two additional performance aspects. First, we varied Tint in BCI experiments with monkey H to verify experimentally the trends seen in Figure 3.2. Figure 3.4 also demonstrates an increase in single-trial



[Figure 3.4 panels: 2 targets, 4 targets, 8 targets, 16 targets; x-axis of each panel: Trial length (ms), 200-500]

Figure 3.4: Performance measured during BCI experiments. Performance is plotted for each target configuration and across varying total trial lengths. Each data symbol represents performance calculated from one experiment (many hundreds of trials). Across target configurations, single-trial accuracy decreases and ITRC increases as more target locations are used.

accuracy with increasing trial length (black curves) as well as a peak in each ITRC

curve (red curves). These results reveal how two or four target tasks restrict ITRC

by virtue of the lower number of maximum bits per trial (1 and 2, respectively).

Furthermore, given the numbers of neural units available in these experiments, it appears that ITRC is approaching a saturation point beyond which adding more target locations may not produce an appreciable increase in performance (doubling targets from 8 to 16 does not increase ITRC, although the latter layout requires distance tuning, which is known to be weaker than direction tuning (Messier and Kalaska, 2000)). Additional target locations should improve ITRC when more neurons are available.
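The ceiling imposed by the number of targets can be made concrete with the values in Table 3.1. The sketch below divides each reported ITRC by the corresponding trial rate to recover the approximate bits per trial, and compares it with the maximum of log2(N) bits for N targets. The helper names are hypothetical, and because the table values are rounded, the recovered bits per trial are approximate.

```python
import math

# (monkey, number of targets, accuracy %, trials per second, ITRC in bps),
# transcribed from Table 3.1.
TABLE_3_1 = [
    ("H", 2, 94.3, 3.5, 2.4), ("H", 4, 94.5, 2.8, 4.7),
    ("H", 8, 68.9, 3.5, 6.5), ("H", 16, 51.1, 2.9, 6.4),
    ("G", 2, 84.2, 3.6, 1.3), ("G", 4, 93.0, 2.5, 3.8),
    ("G", 8, 76.8, 2.5, 5.3), ("G", 16, 26.4, 2.2, 3.1),
]


def max_bits_per_trial(n_targets):
    """Upper limit on information per trial with n equally likely targets."""
    return math.log2(n_targets)


def achieved_bits_per_trial(itrc_bps, trials_per_second):
    """Bits per trial implied by a reported ITRC and trial rate."""
    return itrc_bps / trials_per_second


def summarize_table(rows=TABLE_3_1):
    """Return (monkey, n_targets, achieved bits, ceiling bits) for each row."""
    return [
        (monkey, n, achieved_bits_per_trial(bps, rate), max_bits_per_trial(n))
        for monkey, n, _accuracy, rate, bps in rows
    ]


if __name__ == "__main__":
    for monkey, n, achieved, ceiling in summarize_table():
        print(f"monkey {monkey}, {n:2d} targets: "
              f"{achieved:.2f} of {ceiling:.0f} bits per trial")
```

For monkey H, the bits per trial rise only from about 1.9 to about 2.2 when going from 8 to 16 targets, while the trial rate falls, which is consistent with the saturation described above.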

Second, a common concern for BCIs such as ours is that as the electrode implant ages the number of recordable neurons declines, leading to a drop in overall performance (Polikov et al., 2005). To investigate the impact of neuronal loss,



we performed analyses of single-trial accuracy and ITRC using data from offline

experiments. As expected, single-trial accuracy falls as neuron ensemble size decreases. However, it is possible to compensate partially for this performance loss by increasing Tint; BCI speed may be compromised as a result, but single-trial accuracy can be preserved (data not shown). Figure 3.5 plots ITRC as a function of the number of neural units and Tint. For small ensembles (for example, 20 neurons), the ITRC peaks at Tint ≈ 120 ms but does not decline sharply as Tint is further increased; accuracy (and bits per trial) is increasing so as to offset the longer trial times. For larger ensembles, the information content at small Tint is relatively high such that further lengthening Tint has a dramatic effect on ITRC.

Finally, plotting the Tint value that maximizes ITRC (Figure 3.5 inset) for each

ensemble size illustrates that the maximum ITRC is achieved with small Tint (60-130 ms), over a broad range of ensemble sizes. Thus, high-performance BCIs may require far shorter trials than previously explored.

Using a direct end-point control strategy, we report here a greater than fourfold (6.5 versus 1.6 bps) increase in BCI performance compared to recent studies. Performance is calculated in a conservative fashion, as the entire trial time (Tskip + Tint + Tdec+rend) was used; had just Tint been used, as is often done, the maximum ITRC would have been 28.4 bps, but this does not reflect an achievable selection rate. As described previously, our system differs from continuous BCI approaches in several ways, which might account for the performance gain. Additionally, continuous BCIs attempt to move the cursor well enough, although at the expense of speed (1-3 sec per selection), to avoid making errors for a given

target selection. Conversely, the direct end-point control reported here need not

correct errors within a given selection because these errors can be rectified with

rapid follow-on selections.

Our performance results far exceed electroencephalogram (EEG)-based non-invasive system performance and help motivate the use of invasive, electrode-based systems in clinical BCIs. Although at its fastest, this direct end-point control BCI demonstrates selection speeds (~3.5 trials per second) on a par with saccadic eye movements, the ITRC for saccades is much higher due to their exceptional



[Figure 3.5 plot: contours of ITRC (bps); axes: Tint (ms) and number of neural units]

Figure 3.5: ITRC as a function of number of neural units and Tint. All data are from experiment H20041118, which used an 8-target configuration and contained over 1,300 trials. Tskip was fixed at 150 ms. The main panel shows contours of ITRC (bps) as a function of the number of neural units available and Tint. The inset shows the value of Tint that achieves the maximum ITRC for each neural ensemble size that we tested. Similar results were obtained for data set G20040508 from monkey G.

precision and accuracy. Whereas eye or even speech control may be effective in specific settings, BCIs attempting to restore lost motor function must rely on the natural neural signals if they are to avoid commandeering and interfering with another motor modality.

In Chapter 2, we considered the estimation of a continuous arm trajectory.

In the present chapter, we sought to predict the discrete reach endpoint. Under both modes of operation, we have shown how delay period neural activity can be used to advance the state-of-the-art in prosthetic system design. In both cases,

we have intentionally ignored the first 150 ms (Tskip) of delay activity after target presentation to allow i) the visual information about the target to arrive in PMd




and ii) the neural activity reflecting the desired target to be generated. The next

two chapters investigate how the motor plan forms during these first few hundred milliseconds after target presentation. This would not only contribute to our basic understanding of how the process of motor preparation unfolds over time, but may also allow us to design even higher performance prosthetic systems.


Chapter 4

Neural Variability and Motor Preparation

In Chapter 3, delay activity was used to drive a real-time BCI. The fact that the

delay activity was tuned for reach target allowed us to reliably decode the intended target. Figure 3.1 shows that our ability to decode the intended target increases shortly after target presentation. This indicates that, at the beginning of the delay period, the neural activity is changing in such a way so as to reflect the desired target. In this chapter, we investigate why motor preparation takes time and how

its progress might be reflected in the neural activity.

Motor preparation is often studied using an instructed delay behavioral task (cf. Section 2.3), where a variable-length “planning” period temporally separates

an instruction stimulus from a go cue (Tanji and Evarts, 1976; Weinrich et al., 1984; Godschalk et al., 1985; Riehle and Requin, 1989; Crammond and Kalaska, 2000; Messier and Kalaska, 2000). Longer delay periods typically lead to shorter reaction times (RT, defined as time between go cue and movement onset), and this has been interpreted as evidence for a motor preparation process that takes

time (Rosenbaum, 1980; Riehle and Requin, 1989, 1993; Crammond and Kalaska, 2000). In this view, the delay period allows for motor preparation to complete prior to the go cue, thus shortening the RT.

Neurons in a number of brain areas, including dorsal premotor cortex (PMd),



CHAPTER 4. NEURAL VARIABILITY AND MOTOR PREPARATION 62

exhibit activity during the delay (Tanji and Evarts, 1976; Weinrich and Wise, 1982; Weinrich et al., 1984; Godschalk et al., 1985; Kurata, 1989; Riehle and Requin,

1989; Snyder et al., 1997). Delay period activity is typically tuned for the instruction and can be predictive of RT (Riehle and Requin, 1993; Bastian et al., 2003).

For example, the delay activity for the unit shown in Figure 2.3B is greater before leftward than rightward reaches. Electrical disruption of that activity largely erases the RT savings earned during the delay (Churchland and Shenoy, 2006). It is therefore suspected that delay activity is the substrate of motor preparation occurring at that time (Wise, 1985; Riehle and Requin, 1993; Bastian et al., 2003).

It is currently unclear why the process of motor preparation takes time and how its progress is reflected in the neural activity. One possibility is that the neural

activity must rise above a threshold to trigger the movement, as seems likely for

saccades (Carpenter and Williams, 1995; Hanes and Schall, 1996; Roitman and

Shadlen, 2002). An instructed delay could allow activity to approach threshold, shortening the subsequent RT (Erlhagen and Schoner, 2002). Supporting this “rise-to-threshold” hypothesis, higher firing rates are often associated with shorter RTs (Riehle and Requin, 1993; Bastian et al., 1998, 2003), although Crammond and Kalaska (2000) found that peak firing rates after the go cue (when the movement is presumably triggered) were on average lower after an instructed delay.

An alternate hypothesis, illustrated in Figure 4.1, assumes that the movement produced is a function of the state of preparatory activity at the time some trigger is applied. For each possible movement, there would be an “optimal” subspace of firing rates, appropriate to generate a sufficiently accurate movement. Motor preparation might therefore be an optimization whereby firing rates are brought from their initial state to the appropriate subspace. Activity might drift somewhat while waiting to execute, but motor preparation would remain complete as long as firing rates remain within the optimal subspace. The most obvious predictions

of this hypothesis are trivially true: delay-period firing rates occupy a smallish subspace (of the total space possible), and this subspace is different for each instructed movement. However, is there evidence that the brain actively attempts to bring firing rates to that subspace? Is some penalty paid, perhaps a longer RT,



[Figure 4.1 plot: axes are the firing rates of neuron 1, neuron 2 and neuron 3; shaded regions labeled left reach and right reach; arrows labeled trial 1 and trial 2]

Figure 4.1: Illustration of optimal subspace hypothesis. Each axis represents the underlying firing rate of one neuron; only three of them are drawn. Different movements have different optimal subspaces (shaded areas). For different trials, the process of motor preparation (arrows) may take place at different rates, along different paths, and from different starting points.

if firing rates are elsewhere? We show that these questions can be addressed by

measuring the variability of firing rates.

4.1 Measuring neural variability

Many of our analyses rely on the measurement of neural variability, across trials of the same type, made as a function of time. A central assumption of this approach is that the measured variability is attributable to both cell-intrinsic variability in

spike production and to variability in the underlying firing rate on each trial. Our goal was to isolate the latter, as best as possible, by normalizing with respect to

the estimated contribution of the former. To do so, we compute the variance of firing rate across trials and normalize by the mean firing rate, all as a function of time. We term the resulting measurement the normalized variance (NV). The logic behind this metric is as follows. Intrinsic spiking variability is thought to be near Poisson for cortical neurons, so that its variance scales linearly with mean

firing rate. Thus, if the measured across-trial variability were attributable solely to



intrinsic spiking variability (i.e., the underlying firing rate were identical on each

trial), the NV should be unity. In the presence of variability in underlying firing rate, the NV should be greater than unity. In particular, we were interested in

whether variability in underlying firing rate declined during the course of the trial (Figure 4.1). In this case, the NV should decline from above one to near one. The

simulations in Figure 4.2 illustrate that the NV behaves as expected for a simulated neuron with Poisson spiking statistics. When the underlying firing rate is the same on every trial (black trace at top), the NV (black trace at bottom) remains near

unity throughout the trial and is largely unaffected by changes in mean firing rate.

When the underlying firing rate is initially variable across trials (gray traces at

top), the NV is initially elevated (gray trace at bottom) and declines to unity as firing rates become consistent.
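The behavior described above can be reproduced with a small simulation. The sketch below is a simplified stand-in for the simulations of Figure 4.2, not a reconstruction of them: the mean firing rate profile, the noise magnitude (`noise_sd`), the number of trials, the 2 ms time step and all function names are assumptions chosen for illustration. The 30 ms Gaussian smoothing and the 200 ms exponential decay of the rate noise after target onset follow the description in the text. The scale factor is computed from the smoothing kernel so that a Poisson neuron with a fixed underlying rate gives an NV of one.

```python
import numpy as np


def gaussian_kernel(sd_s, dt):
    """Unit-area Gaussian kernel sampled every dt seconds (cut at +/-4 SD)."""
    half = int(round(4 * sd_s / dt))
    t = np.arange(-half, half + 1) * dt
    k = np.exp(-0.5 * (t / sd_s) ** 2)
    return k / (k.sum() * dt)


def smooth(x, k):
    """Convolve each row (trial) of x with kernel k."""
    return np.array([np.convolve(row, k, mode="same") for row in x])


def normalized_variance(counts, dt, sd_s=0.030):
    """Across-trial variance of smoothed rate over its mean, scaled by c."""
    k = gaussian_kernel(sd_s, dt)
    rates = smooth(counts, k)              # spikes/s on every trial
    c = 1.0 / (np.sum(k ** 2) * dt)        # makes NV = 1 for fixed-rate Poisson
    return c * rates.var(axis=0, ddof=1) / rates.mean(axis=0)


def simulate(n_trials=3000, dt=0.002, noise_sd=8.0, tau=0.2, seed=0):
    """Return time axis and NV for fixed and initially variable rates."""
    rng = np.random.default_rng(seed)
    t = np.arange(-0.5, 1.0, dt)           # target onset at t = 0
    # Hypothetical mean rate: 20 spikes/s rising smoothly to 40 spikes/s.
    mean_rate = 20.0 + 20.0 / (1.0 + np.exp(-(t - 0.1) / 0.05))

    # Version 1: identical underlying rate on every trial.
    fixed = rng.poisson(mean_rate * dt, size=(n_trials, t.size))

    # Version 2: smoothed rate noise whose magnitude decays after target onset.
    noise = smooth(rng.standard_normal((n_trials, t.size)),
                   gaussian_kernel(0.030, dt))
    noise *= noise_sd / noise.std()
    envelope = np.where(t < 0, 1.0, np.exp(-t / tau))
    rates = np.clip(mean_rate + noise * envelope, 0.0, None)
    variable = rng.poisson(rates * dt)

    return t, normalized_variance(fixed, dt), normalized_variance(variable, dt)


if __name__ == "__main__":
    t, nv_fixed, nv_variable = simulate()
    pre = (t > -0.35) & (t < -0.05)
    late = (t > 0.65) & (t < 0.85)
    print("fixed rate:    pre", nv_fixed[pre].mean(), "late", nv_fixed[late].mean())
    print("variable rate: pre", nv_variable[pre].mean(), "late", nv_variable[late].mean())
```

Values within about 120 ms of either end of the simulated interval are affected by the truncated smoothing window and should be ignored.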

To compute the NV, the spikes of each trial were smoothed with a Gaussian (SD of 30 ms) to estimate rate (in spikes/s) as a function of time. The basic unit of analysis was the “set” of trials recorded from one isolation for one target condition (by target condition, we simply mean target location). For each such set, the NV was computed as follows

$$\mathrm{NV}(t) = c \cdot \frac{\frac{1}{n-1} \sum_{\mathrm{trial}=1}^{n} \left( r_{\mathrm{trial}}(t) - \bar{r}(t) \right)^{2}}{\bar{r}(t)} \qquad (4.1)$$

where $r_{\mathrm{trial}}(t)$ is the firing rate on that trial, $\bar{r}(t)$ is the mean firing rate across all trials in that set, and $n$ is the number of trials in the set. The NV was then averaged across all sets of trials. The numerator in (4.1) is the across-trial variance (spikes^2/sec^2) and the denominator is the mean firing rate (spikes/sec). The constant c scales the NV so that, like the Fano factor, it will be unity for a neuron with Poisson spiking statistics and the same underlying rate on every trial. The value of c depends on the filter used to smooth the spike trains (c = 0.109 for the 30 ms Gaussian filter). For a “box” filter, c is equal to the filter length in seconds, and the NV is mathematically identical to the Fano factor.
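The dependence of c on the smoothing filter can be sketched as follows. This derivation is an illustration, not taken from the original analysis: it assumes a homogeneous Poisson spike train with rate $r$ and spike times $t_i$, and a smoothing kernel $k$ normalized to unit area. The symbols $\hat{r}$, $k$, $\sigma$ and $T$ are introduced here.

```latex
\begin{align*}
\hat{r}(t) &= \sum_i k(t - t_i), \qquad \int k(s)\,ds = 1,\\
\mathrm{E}[\hat{r}(t)] &= r \int k(s)\,ds = r, \qquad
\mathrm{Var}[\hat{r}(t)] = r \int k(s)^2\,ds,\\
\mathrm{NV} = 1 \;&\Rightarrow\; c = \left( \int k(s)^2\,ds \right)^{-1},\\
\text{Gaussian, SD } \sigma:\quad \int k^2 &= \frac{1}{2\sigma\sqrt{\pi}}
  \;\Rightarrow\; c = 2\sigma\sqrt{\pi} \approx 0.106~\text{s for } \sigma = 30~\text{ms},\\
\text{box, length } T:\quad \int k^2 &= \frac{1}{T} \;\Rightarrow\; c = T.
\end{align*}
```

The box-filter result matches the statement that c equals the filter length in seconds. The idealized Gaussian value of about 0.106 is close to the quoted 0.109; the small difference may reflect implementation details such as discretization or truncation of the filter, which is a guess made here and not something stated in the text.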

Despite its similarity to NV, the Fano factor has typically been used to gauge intrinsic spiking variability (Tolhurst et al., 1983; Gur et al., 1997; Bair and O’Keefe,



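[Figure 4.2 plot: simulated firing rates and spike rasters (top) and NV (bottom), aligned to target onset; scale bar, 200 ms]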

Figure 4.2: Simulations illustrating how an increasing consistency in across-trial firing rate could be detected using the NV metric. Simulations were based on the mean firing rate of one recorded neuron (solid black trace at top). Baseline activity was artificially extended (to the left) to allow longer simulations. For each of 10,000 simulated trials, spike trains were generated using Poisson statistics. Two versions of the simulation were run. For the first version, the underlying firing rate was identical (black trace at top) on all simulated trials. The resulting NV is shown by the black trace at the bottom. For the second version, each trial had a different underlying firing rate, generated by adding noise, filtered with a 30 ms SD Gaussian, to the mean. The magnitude of this noise decayed with an exponential time constant of 200 ms after target onset. Ten examples of the resulting underlying firing rates are shown in gray at top, and the resulting spike trains (computed with Poisson statistics, with the time-varying mean taken from the gray traces) are shown in the rasters. The NV computed from 10,000 such spike trains is shown by the gray trace at the bottom.

1998; Averbeck and Lee, 2003); additional sources of variability are excluded if possible. Here, any additional variability is the quantity of interest. As with the Fano factor, the NV remains at unity regardless of the magnitude of firing rate, assuming that intrinsic spiking statistics are Poisson and that firing rates are consistent across trials. Thus, to the degree that this assumption holds, changes in the NV can be attributed to changes in the variability of firing rates across-trials. We also



note that the NV is somewhat forgiving of violations of the Poisson assumption.

It will remain constant regardless of firing rate as long as there is a constant ratio between spiking variance and spike rate. The slope need not be 1, as for a Poisson

process. Indeed, we did sometimes observe (especially around the time of movement onset) values of the NV < 1, indicating that spiking can be more regular

than Poisson.

4.2 Timecourse of neural variability

A typical assumption made during the analysis of extracellular recordings is that the spikes observed on a given trial provide a noisy measurement of an underlying process (an underlying firing rate) that is similar on every trial of that type. This assumption might seem to apply nicely to the delayed reach task (cf. Section 2.3) used here. During the time before target onset, we required that the hand be

held steady at the central touch point. From an outwards perspective, behavior

is essentially identical on every trial. Yet before target onset, no specific demands

have yet been placed on the circuits devoted to motor preparation. Activity related to motor preparation might therefore be quite variable across trials. If, after

target onset, firing rates are brought to a relatively compact optimal subspace, as suggested in Figure 4.1, then firing rate variability should be reduced. Thus, the first key prediction of the optimal subspace hypothesis is that firing rate variability, measured across trials, will decline after target onset. The rate of this decline should be approximately related to the rate at which firing rates reach the optimal subspace, i.e., the rate of motor preparation.

Figure 4.3 shows the NV applied to neural data recorded from monkey G (cf.

Section 2.3). These measurements were computed across all isolations (47) and target conditions. The NV (±SEM, computed across isolations/target locations) declined after target onset, remained at a rough plateau during the delay, and fell again after the go cue. From 200 ms before target onset to the median time of the go cue, the NV declined 24% (t-test, p < 10^-10). By movement onset it had declined 38% (p < 10^-10). The initial decline spanned ~119 ms. NV time courses



[Figure 4.3 plot: NV (axis from 1 to 1.5) and hand speed for monkey G, aligned to target onset and to movement onset, with go cue times marked; scale bar, 200 ms]

Figure 4.3: NV timecourse. Black trace: mean ± SEM across all isolations and target conditions. Gray traces: mean absolute hand speed. Two temporal epochs are shown, aligned to target and movement onset times (black arrows). The small solid histogram shows the distribution of go cue onset times, reflecting the fact that RTs are variable. Monkey G (816 trials, 47 isolations: 14 single- and 33 multi-unit).

from two other monkeys performing the same, or similar, tasks shared all essential

features and were quantitatively quite similar (Churchland et al., 2006b).

The NV reveals a previously unknown degree of temporal structure in the

variability of neural activity during a delayed reach task. Because the NV is a

measurement of across-trial firing-rate variability, the most natural interpretation

is that there is a decline in the across-trial variability of the underlying firing rates.

Might there be a trivial explanation for this result? Might the decline result from

saccadic behavior? Might the decline instead be related to small arm movements? Might it result from a change in intrinsic spiking variability? Finally, might the decline in the NV be due to more regular spiking caused by some unknown network mechanism, perhaps the locking of spiking to a central rhythm? We have thoroughly explored these potential concerns with a combination of control experiments and data analysis and ruled out these possibilities, to our satisfaction (Churchland et al., 2006b).

The initial decline in the NV consumed 98-198 ms depending on the monkey and dataset. This is similar to the timecourse of the decline in RT with delay-period duration (100-200 ms) (Churchland et al., 2006b), as well as the timecourse of the increase in decoding performance with Tskip (Figure 3.1). Our interpretation would be that the height of the NV indicates the approximate degree of motor preparation



yet to be accomplished. Shortly after target onset, firing rates are frequently far

from their mean. If the go cue arrives then, it will take time to correct these

“errors” and RTs will therefore be longer. By the time the NV has reached its plateau at a lower level, firing rates are consistently near their mean (which we presume is near the optimal subspace), and RTs will be shorter if the go cue arrives then.

Admittedly, this interpretation rests on some assumptions. First, it assumes

that the increasing consistency of firing rates with time reflects their increasing

accuracy (i.e., their increasing tendency to occupy the optimal subspace, whose

boundaries cannot be easily inferred using current methods). This will be discussed

in Section 4.3. Second, it assumes that there is a limit on the rate at which firing

rates approach their putatively optimal values, such that progress before the go cue shortens the subsequent RT. This will be tested in Section 4.4.

4.3 Relationship to reaction time variability

The results above indicate that firing rates become more consistent after target onset. We chose to look for this increase in consistency because we suspected it might reflect an increase in accuracy: an increase in the average occupancy of an optimal subspace of firing rates. This hypothesis makes the following prediction,

illustrated in Figure 4.4A. For some trials, the configuration of firing rates around

the time of the go cue will lie within the optimal subspace (green dots, motor

preparation is complete) and RTs will be short. For other trials, the configuration

of firing rates will lie outside the optimal subspace (red dots, motor preparation is “sloppy” or incomplete) and RTs will be longer. Because of both a lack of adequate theory regarding the “representation” in PMd and the difficulty of making sufficiently detailed and extensive measurements of the tuning properties of individual neurons, it is not currently possible to estimate the location or extent of

the optimal subspace with any degree of confidence. Nevertheless, we can exploit the fact that the variability of firing rates, as indexed by the NV, is expected to

be greater for the long RT trials. This is perhaps the most striking prediction of



[Figure 4.4, panels A and B: dots and traces labeled long RTs and short RTs; panel B aligned to the go cue; scale bar, 100 ms]

Figure 4.4: Relationship of the NV to natural RT variability. A: A prediction of how RT might relate to firing rate given the optimal-subspace hypothesis. The shaded area represents the optimal subspace for the movement being prepared, as in Figure 4.1. Each dot corresponds to one trial and represents the configuration of firing rates at the time of the go cue. For some trials, that configuration may lie within the optimal subspace (green dots), leading to a short RT. For other trials, the configuration may lie outside (red dots), leading to a longer RT. B: Red and green traces show the NV, around the time of the go cue, for trials with RTs longer and shorter than the median. Traces at bottom show the mean percentage difference (short - long, ±SEM) in the NV (black). Data were pooled across the recordings from 7 days (monkey G), including all trials with delay periods >200 ms.

the optimal-subspace hypothesis: that neural variability before the go cue ought to predict behavior after the go cue. Importantly, such a connection should be

detectable (given sufficient data) without needing to know the “representational scheme” used by PMd, as long as we can assume that neural activity is on average nearly optimal.

Figure 4.4B plots the NV, around the time of the go cue, for trials with RTs longer than the median (red, long RT trials) and shorter than the median (green, short RT trials). For statistical power, we collapsed data across all seven datasets from monkey G (36-60 isolations per day, yielding 174,725 total neuron trials; data collapsed after segregating long vs short RTs within each dataset). Consistent with



the above prediction, short RT trials had less variability in firing rate around the time of the go cue. The black trace at the bottom plots the mean percentage difference in the NV. This was computed by taking the percentage difference between the NV for the short and long RT trials for each isolation/condition and then computing the mean and SE.

Counter to the “rise-to-threshold” hypothesis, we found an inconsistent relationship between the magnitude of mean firing rate and RT. In Figure 4.4B, the

mean percent difference in firing rate (blue) is negative, indicating that RTs were shorter when firing rates were lower. This was a significant effect (p < 0.05, t test) when data were collapsed across the delay period, and also when they were collapsed across the 200 ms period after the go cue. For monkey B, the opposite effect was found in both periods, but neither case was statistically significant.

4.4 Relationship to timecourse of motor preparation

To test the assumption that there is a limit on the rate at which firing rates approach their putatively optimal values, we compare the rate of decline in the NV for trials with different delay durations. We collected a dataset from monkey G using

three discrete delay-period durations (30, 130, and 230 ms, randomly interleaved), intended to interrupt motor preparation at varying degrees of completeness. Figure 4.5A shows the NV computed for the three delays, aligned to the onset of the go cue. The rate of decline in the NV is similar for the three delay durations. As a consequence, at the time of the go cue, the NV for the 230 ms delay has dropped to

a plateau, whereas the NV for the 30 ms delay does not reach the same point until ~80 ms later, potentially explaining why mean RT is longer. Figure 4.5C plots RT versus the NV at the time of the go cue for the three delays. The relationship increases monotonically. Thus, the height of the NV at the time of the go cue is predictive of RT, as would be expected if it reflected the average degree of motor preparation yet to be accomplished.



[Figure 4.5, panels A to C. Labels recovered from the extracted figure: firing rate (spikes/s) and NV traces for the 30, 130, and 230 ms delays, with the target, go cue, and movement onset marked and a 100 ms scale bar; RT plotted against the change in rate (spikes/s) by the go cue; RT plotted against the NV at the go cue.]

Figure 4.5: Analyses with three different delay-period durations: 30, 130 and 230 ms. Data are from one day using monkey G (39 isolations, 957 trials). A: Change in mean firing rate (±SEM) from baseline (top) and NV±SEM (bottom) for each delay-period duration. Analysis was performed with data aligned to the go cue. B: Mean RT versus the change in firing rate from baseline, measured at the time of the go cue for the three delay-period durations. Black symbols plot the mean change averaged across all neurons and conditions. Gray symbols plot the same analysis but including only the preferred condition of each neuron. Note that the x-axis has been rescaled in the latter case. C: Mean RT versus the NV, measured at the time of the go cue for the three delay-period durations. Error bars show SEM.

Contrary to the “rise-to-threshold” hypothesis, Figure 4.5B shows that there was no simple relationship between RT and mean firing rate at the time of the go cue. This was true whether we considered all conditions (black) or just preferred conditions (gray). Note that this would also have been true had we considered firing rate at some fixed time (e.g., 100 ms) after the go cue. At that point, the 30 ms delay (which produced the longest RTs) produced the highest firing rates (Figure 4.5A, top). At no time after the go cue were firing rates highest for the

230 ms delay duration, although it produced the shortest RT.



The NV reveals a previously unknown degree of temporal structure in the variability of neural activity during a delayed reach task. Unlike most studies of motor preparatory neural activity to date, we attempted to uncover the mechanisms underlying that activity on a systems level. We have proposed an optimal subspace hypothesis in which the process of motor preparation is viewed as solving an optimization problem. While our data appear to be consistent with predictions that follow from the optimal subspace hypothesis, the NV does not allow us to produce plots like Figure 4.1 from real neural data. The reason is that the NV tracks the average progress of motor preparation across multiple trials. In Chapter 5, we

seek to develop the statistical tools that may enable the extraction of single-trial

trajectories and optimal subspaces directly from the neural data.


Chapter 5

Extracting Dynamical Structure Embedded in Neural Activity

In Chapter 4, the NV revealed an average process of settling during the delay

period by measuring the convergence of firing across different trials. However,

it provides little insight into the course of motor planning on a single trial. A gradual fall in trial-to-trial variance might reflect a gradual convergence on each trial, or might reflect rapid transitions that occur at different times on different trials. All the NV tells us about the dynamic properties of the underlying network is the basic fact of convergence from uncontrolled initial conditions to a consistent pre-movement preparatory state. The structure of any underlying attractors and corresponding basins of attraction is unobserved. Furthermore, the NV is first

computed per-unit and averaged across units, ignoring any structure that may be present in the correlated firing of units on a given trial.

In this chapter, we develop techniques that can be used to characterize the process of motor preparation on single trials. These techniques exploit the simultaneity of multi-electrode recordings and can also provide insights into the system dynamics (e.g., the existence of underlying attractors and the structure of the corresponding basins of attraction). In Section 5.1, the concept of latent variable models is introduced. Section 5.2 defines a non-linear dynamical model capable of expressing the rich behavior expected of neural systems. Sections 5.3-5.6 detail a


novel learning algorithm developed to fit this model to neural data. These techniques are then applied to simulated and real neural data in Sections 5.7 and 5.8, respectively.

5.1 Latent variable models

The usual way to characterize neural dynamics is to average responses from different trials, and study the evolution of the peri-stimulus time histogram (PSTH).

Unfortunately, such averaging can obscure important internal features of the response. For the delayed reach task described in Section 2.3, the presentation of

the reach target provides the trigger for the delay period neural activity, but the resulting timecourse of the response is internally regulated and may not be identical on each trial. In this case, the PSTH may not reflect the true trial-by-trial dynamics. For example, a sharp change in firing rate that occurs with varying latency might appear as a slow smooth transition in the average response.
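As a hedged illustration of this last point, the following minimal sketch (not part of the original analysis; the rates, bin count, and latency range are made-up values) builds noiseless single-trial rate traces that each contain an abrupt step at a jittered latency, then compares the largest bin-to-bin change on one trial with that of the trial average.

```python
import random

def step_trial(latency, n_bins=100, low=5.0, high=30.0):
    """Firing rate (spikes/s) that jumps from low to high at bin `latency`."""
    return [low if t < latency else high for t in range(n_bins)]

def trial_average(trials):
    """Average across trials, bin by bin (a noiseless stand-in for the PSTH)."""
    n = len(trials)
    return [sum(tr[t] for tr in trials) / n for t in range(len(trials[0]))]

def max_jump(rates):
    """Largest bin-to-bin change in a rate trace."""
    return max(abs(b - a) for a, b in zip(rates, rates[1:]))

def demo(n_trials=200, seed=0):
    """Return (largest jump on one trial, largest jump in the average)."""
    rng = random.Random(seed)
    # Hypothetical latencies: uniformly jittered between bins 30 and 70.
    trials = [step_trial(rng.randint(30, 70)) for _ in range(n_trials)]
    return max_jump(trials[0]), max_jump(trial_average(trials))
```

Every single trial contains the full 25 spikes/s step, whereas the average changes by only a small fraction of that per bin, so it looks like a gradual ramp even though no individual trial ramps.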

An alternative approach is to adopt latent variable methods and to identify

a hidden dynamical system that can summarize and explain the simultaneously-recorded spike trains. The central idea is that the responses of different neurons reflect different views of a common dynamical process in the network, whose effective dimensionality is much smaller than the total number of neurons in the network. While the underlying state trajectory may be slightly different on each trial, the commonalities among these trajectories can be captured by the network’s parameters, which are shared across trials. These parameters define how the network evolves over time, as well as how the observed spike trains relate to the network’s state at each time point.

Dimensionality reduction in a latent dynamical model is crucial and yields benefits beyond simple noise elimination. Some of these benefits can be illustrated by a simple physical example. Consider a set of noisy video sequences of a bouncing ball. The trajectory of the ball may not be identical in each sequence, and so simply averaging the sequences together would provide little information about the dynamics. Independently smoothing the dynamics of each pixel might identify



a dynamical process; however, correctly rejecting noise might be difficult, and in

any case this would yield an inefficient and opaque representation of the underlying physical process. In contrast, a hidden dynamical system account could capture the

video sequence data using a low-dimensional latent variable that represented only the ball’s position and momentum over time, with dynamical rules that captured the physics of ballistics and elastic collision. This representation would exploit shared information from all pixels, vastly simplifying the problem of noise rejection, and would provide a scientifically useful depiction of the process.

The example also serves to illustrate the two broad benefits of this type of

model. The first is to obtain a low dimensional summary of the dynamical trajectory in any one trial. Besides the obvious benefits of denoising, such a trajectory

can provide an invaluable representation for prediction of associated phenomena.

In the video sequence example, predicting the loudness of the sound on impact

might be easy given the estimate of the ball’s trajectory (and thus its speed), but would be difficult from the raw pixel trajectories, even if denoised. In the neural case, behavioral variables such as reaction time might similarly be most easily predicted from the reconstructed trajectory. The second broad goal is systems

identification: learning the rules that govern the dynamics. In the video example

this would involve discovery of various laws of physics, as well as parameters describing the ball such as its coefficient of elasticity. In the neural case this would

involve identifying the structure of dynamics available to the circuit: the number and relationship of attractors, appearance of oscillatory limit cycles and so on.

The use of latent variable models with hidden dynamics for neural data has, thus far, been limited. Small groups of neurons in the frontal cortex have been modeled using hidden Markov models, in which the latent dynamical system is assumed to transition between a set of discrete states (Abeles et al., 1995; Gat et al., 1997). In addition, a state space model with linear hidden dynamics and point-process outputs has been applied to simulated data (Smith and Brown, 2003). While this previous work lays the groundwork for extracting dynamical structure from spiking activity, the latent models used cannot capture the richness of dynamics that recurrent networks exhibit. In particular, systems that converge toward



point or line attractors, exhibit limit cycle oscillations, or even transition into

chaotic regimes have long been of interest in neural modeling. If such systems are

relevant to real neural data, we must seek to identify hidden models capable of reflecting this range of behaviors.

5.2 Hidden non-linear dynamical system

The results from Chapter 4 suggest that the network underlying motor preparation exhibits rich dynamics. Activity is initially variable across trials, but appears to settle during the delay period. A useful dynamical system model capable of expressing the rich behavior expected of neural systems is a fully-connected recurrent network (RN) with Gaussian perturbations

$$x_t \mid x_{t-1} \sim \mathcal{N}\left(f(x_{t-1}),\, Q\right)$$
$$f(x) = (1 - k) \cdot x + k \cdot W \cdot g(x), \tag{5.1}$$

where the state $x_t \in \mathbb{R}^{p \times 1}$ is a vector of the node values in the recurrent network at time $t = 1, \ldots, T$, $k \in \mathbb{R}$ is related to the time constant of the network, $W \in \mathbb{R}^{p \times p}$ is a connection weight matrix, and $Q \in \mathbb{R}^{p \times p}$ is a covariance matrix. The function $f : \mathbb{R}^{p \times 1} \to \mathbb{R}^{p \times 1}$ defines the non-linear state dynamics and $g$ is a non-linear activation function that acts element-by-element on its vector argument. In the following sections, $g$ is taken to be the error function (Figure 5.1) defined by

$$g(z) = \frac{2}{\sqrt{\pi}} \int_0^z e^{-t^2}\, dt. \tag{5.2}$$

It is one of a family of sigmoid activation functions that yield similar behavior in a RN. We chose the error function because it made parts of the fitting algorithm described in Section 5.6 analytically tractable. The initial state is Gaussian-distributed

$$x_1 \sim \mathcal{N}(p_1, V_1), \tag{5.3}$$

where $p_1 \in \mathbb{R}^{p \times 1}$ and $V_1 \in \mathbb{R}^{p \times p}$ are the mean vector and covariance matrix, respectively.

Figure 5.1: Non-linear activation function $\mathrm{erf}(z) = \frac{2}{\sqrt{\pi}} \int_0^z e^{-t^2}\, dt$.
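To make the generative model concrete, here is a minimal simulation sketch of the state equations (5.1) and (5.3). It is an illustration only, not the implementation used in this work: for brevity it assumes that Q and V_1 are diagonal (passed as per-node standard deviations `q_std` and `v1_std`), and the function names are hypothetical.

```python
import math
import random

def f(x, k, W):
    """State dynamics f(x) = (1 - k) x + k W g(x), with g = erf applied elementwise."""
    g = [math.erf(v) for v in x]
    p = len(x)
    return [(1.0 - k) * x[i] + k * sum(W[i][j] * g[j] for j in range(p))
            for i in range(p)]

def simulate(k, W, q_std, p1, v1_std, T, rng):
    """Draw one trajectory x_1, ..., x_T.

    Assumption for this sketch: Q and V_1 are diagonal, so each node
    receives independent Gaussian noise with the given standard deviation.
    """
    x = [rng.gauss(m, s) for m, s in zip(p1, v1_std)]   # x_1 ~ N(p_1, V_1)
    traj = [x]
    for _ in range(T - 1):
        mean = f(x, k, W)                                # f(x_{t-1})
        x = [rng.gauss(m, s) for m, s in zip(mean, q_std)]
        traj.append(x)
    return traj
```

Calling `simulate` repeatedly with the same parameters but fresh random draws yields a family of trajectories that share dynamics while differing from trial to trial, which is the situation the model is meant to capture.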

Models of this class have long been used, albeit generally without stochastic perturbation, to describe the dynamics of neuronal responses (e.g., Amari, 1977).

In this classical view, each node of the network represents a neuron or a column of neurons. Our use is more abstract. The RN is chosen for the range of dynamics it can exhibit, including convergence to point or surface attractors, oscillatory limit cycles, or chaotic evolution. Each node is simply an abstract dimension of latent

space which may couple to many or all of the observed neurons.

The output distribution is given by a generalized linear model that describes the relationship between all nodes in the state $x_t$ and the spike count $y_t^i \in \{0, 1, 2, \ldots\}$ of neuron $i = 1, \ldots, q$ in the $t$th time bin

$$y_t^i \mid x_t \sim \mathrm{Poisson}\left(h(c_i' x_t + d_i) \cdot \Delta\right), \tag{5.4}$$

where $c_i \in \mathbb{R}^{p \times 1}$ and $d_i \in \mathbb{R}$ define a linear function of the state and $\Delta \in \mathbb{R}_+$ is the time bin width. For notational compactness, the spike counts for all $q$ simultaneously-recorded neurons are assembled into a $q \times 1$ vector $y_t$, whose $i$th element is $y_t^i$.

The link function $h : \mathbb{R} \to \mathbb{R}_+$ is needed to ensure that mean firing rates are non-negative. The exponential link function $h(z) = e^z$ is typically used (Brown et al., 1998; Brockwell et al., 2004). However, this exponential mapping would distort the relationship between perturbations in the latent state (whose size is set by the covariance matrix $Q$) and the resulting fluctuations in firing rates. In




Figure 5.2: Link function $h(z) = \log(1 + e^z)$.

particular, the size of firing-rate fluctuations would grow exponentially with the mean, an effect that would then add to the usual linear increase in spike-count variance that comes from the Poisson output distribution. Since neural firing does not show such a severe scaling in variability, such a model would fit poorly.

Therefore, to maintain more even firing-rate fluctuations, we instead take

$$h(z) = \log(1 + e^z), \tag{5.5}$$

as plotted in Figure 5.2.
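The contrast between the two link functions, and the output model (5.4) itself, can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the helper names are hypothetical, and the Poisson sampler uses Knuth's simple multiplication method, which is only suitable for the small per-bin means typical of spike counts.

```python
import math
import random

def softplus(z):
    """h(z) = log(1 + e^z), arranged so that large |z| cannot overflow."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def rate_change(h, z, dz=0.1):
    """Change in h caused by a fixed perturbation dz of the latent input z."""
    return h(z + dz) - h(z)

def poisson_sample(mean, rng):
    """Knuth's method: count uniform draws until their product falls below e^{-mean}."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def spike_count(x, c, d, delta, rng):
    """One draw of y | x ~ Poisson(h(c'x + d) * delta) for a single neuron."""
    z = sum(ci * xi for ci, xi in zip(c, x)) + d
    return poisson_sample(softplus(z) * delta, rng)
```

With the exponential link, `rate_change(math.exp, z)` grows in proportion to the rate itself, whereas `rate_change(softplus, z)` never exceeds `dz`, which is the more even scaling of firing-rate fluctuations motivating (5.5).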

Related models, such as those presented in Chapter 2, have been explored for

motor decoding (Brown et al., 1998; Wu et al., 2004; Brockwell et al., 2004; Wu et al., 2006). In that setting, the dynamical system assumed to underlie the neural

activity is the motor effector itself, most commonly the arm, and is thus not hidden. The dynamics of the effector can be readily modeled, and the “output” mapping from dynamic state to neural activity can be learned by regression between simultaneously collected neural and behavioral data. The machinery of latent variable models is thus exploited only during the decoding stage when the trajectory of the motor effector is estimated from neural data.

5.3 Model fitting

As discussed in Section 5.1, the dynamical systems approach seeks to uncover

• an underlying state trajectory for each trial, and



• the rules governing the system dynamics.

If we knew the actual state trajectory on each trial, it would be relatively straightforward to estimate the rules governing the system dynamics. Conversely, if we knew the rules governing the system dynamics, we would have a much easier time inferring the single-trial state trajectories. This is a “chicken and egg” problem that can be naturally approached using the Expectation-Maximization (EM) algorithm (Dempster et al., 1977), which involves iterating the E-step and the M-step. In the E-step, the state trajectories for each trial are inferred given the most recent estimates of the system parameters. In the M-step, system parameters are updated given the most recent estimates of the single-trial state trajectories. For the system presented in Section 5.2, the system parameters are $\theta = \{k, W, Q, p_1, V_1, \{c_i\}, \{d_i\}\}$, where the inner brackets denote the set over all $i = 1, \ldots, q$. By iterating the E-step and the M-step, we are able to bootstrap our way out of the chicken and egg problem. In fact, the EM algorithm guarantees that the data likelihood $P(\{y\}_1^T)$ is non-decreasing as the system parameters are updated (Dempster et al., 1977).

If the state and output models were both linear with additive Gaussian noise (referred to as linear-Gaussian), the EM algorithm could be carried out exactly. However, the state model (5.1) involves a non-linear activation function g and the output model (5.4) uses a non-linear link function h with Poisson, rather than

Gaussian, noise. Consequently, an exact EM algorithm is not analytically tractable

in this case and approximations need to be made. Sections 5.4-5.6 detail how these approximations can be made.
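The alternation between the two steps, and the non-decreasing likelihood of exact EM, can be seen in a deliberately simple toy problem. The sketch below is not the model of Section 5.2: it assumes an equal-weight mixture of two unit-variance Gaussians with unknown means, chosen only because both steps are exact and fit in a few lines.

```python
import math
import random

def log_lik(data, mu):
    """Log-likelihood under an equal-weight mixture of N(mu[0], 1) and N(mu[1], 1)."""
    norm = math.sqrt(2 * math.pi)
    return sum(math.log(sum(0.5 * math.exp(-0.5 * (y - m) ** 2) / norm for m in mu))
               for y in data)

def e_step(data, mu):
    """Posterior probability that each point came from component 0."""
    resp = []
    for y in data:
        a = math.exp(-0.5 * (y - mu[0]) ** 2)
        b = math.exp(-0.5 * (y - mu[1]) ** 2)
        resp.append(a / (a + b))
    return resp

def m_step(data, resp):
    """Re-estimate both means from the soft assignments."""
    w0 = sum(resp)
    w1 = len(data) - w0
    mu0 = sum(r * y for r, y in zip(resp, data)) / w0
    mu1 = sum((1 - r) * y for r, y in zip(resp, data)) / w1
    return [mu0, mu1]

def run_em(data, mu, n_iter=20):
    """Iterate E- and M-steps, recording the log-likelihood after each update."""
    history = [log_lik(data, mu)]
    for _ in range(n_iter):
        mu = m_step(data, e_step(data, mu))
        history.append(log_lik(data, mu))
    return mu, history
```

In this toy case the recorded log-likelihoods never decrease. Once the E-step is approximated, as it must be for the non-linear model here, that guarantee no longer holds automatically, which is one reason for monitoring the likelihood during fitting.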

5.4 E-step: Trajectory estimation

In the E-step, we seek to infer the underlying hidden state trajectories given the most recent estimates of the system parameters. In other words, for each trial, we seek to recover a distribution over the hidden sequence $\{x\}_1^T$ corresponding to the observations $\{y\}_1^T$. While computing the full state posterior $P(\{x\}_1^T \mid \{y\}_1^T)$ may be intractable, it is often possible to obtain reasonable estimates of the marginal $P(x_t \mid \{y\}_1^T)$ and pairwise joint $P(x_{t-1}, x_t \mid \{y\}_1^T)$ state posteriors. As will be shown in Section 5.6, parameter estimation only requires knowing $P(x_t \mid \{y\}_1^T)$ and $P(x_{t-1}, x_t \mid \{y\}_1^T)$ for all $t$.

The recursive Bayesian decoding methods described in Section 2.1.2 are insufficient to compute these state posteriors. The recursive Bayesian techniques produce filtered posteriors that only take into account observations up to the current time t. Here, we seek smoothed state posteriors that take into account all past, present, and future observations. Taking into account future observations is not trivial since the state model (5.1) is non-linear. The extended Kalman smoother (EKS)

(Haykin, 1996) is commonly used for state estimation in non-linear dynamical

models. It transforms a non-linear time-invariant system into a linear time-variant

system using local linearization. Unfortunately, the EKS cannot be directly applied

here because the observation noise in (5.4) is not additive Gaussian.

A possible alternative is the unscented Kalman smoother (UKS) (Wan and

van der Merwe, 2001; Briers et al., 2004), which employs multi-dimensional quadrature techniques to approximate Gaussian integrals that are analytically intractable. For smoothing, the UKS requires that the state dynamics be run backwards in time, either exactly, or approximately using, for example, a neural network. However, inverting non-linear state dynamics is generally difficult and may not be possible without altering the behavior of the system. Furthermore, the UKS makes Gaussian approximations in the observation space. For discrete-valued observations as in (5.4), this approximation may not be appropriate.

Another technique for non-linear state estimation was recently developed (Heskes and Zoeter, 2002; Zoeter et al., 2004; Ypma and Heskes, 2005) using the expectation propagation (EP) framework (Minka, 2001b). In contrast to the UKS, the EP-based approach i) does not require inverting the state dynamics, ii) makes Gaussian approximations only in the state space and not in the observation space, and iii) allows state estimates to be refined iteratively using multiple forward-backward passes. We generally observe tens to hundreds of neurons simultaneously, and the number of spikes emitted by a neuron in a single time bin is most often



0 or 1. Thus, the observations are high-dimensional and distinctly non-Gaussian. In such settings, property ii) above is critical.

Expectation propagation (Minka, 2001b; Heskes and Zoeter, 2003) is a general

approach for approximate inference when exact belief propagation (Pearl, 1988) is

difficult or intractable. As with belief propagation, messages are passed between

nodes in the graph and can be combined to form belief states. The idea is to

retain only the expectations of the messages and belief states, for example their

first and second moments, and to iterate until these expectations are consistent throughout the graph. The exact belief states are approximated by exponential family distributions, which are fully specified by these expectations.

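For the standard dynamical systems model with chain-like structure (cf. Figure 2.1), the forward $\alpha_t$ and backward $\beta_t$ messages are defined as (Minka, 1999)

$$\alpha_t(x_t) = P(x_t, \{y\}_1^t) \tag{5.6}$$
$$\beta_t(x_t) = P(\{y\}_{t+1}^T \mid x_t). \tag{5.7}$$

It can easily be shown that the desired state posteriors, also known as the belief states, can be expressed in terms of these messages as

$$P(x_t \mid \{y\}_1^T) \propto \alpha_t(x_t)\, \beta_t(x_t) \tag{5.8}$$
$$P(x_{t-1}, x_t \mid \{y\}_1^T) \propto \alpha_{t-1}(x_{t-1})\, P(x_t \mid x_{t-1})\, P(y_t \mid x_t)\, \beta_t(x_t). \tag{5.9}$$

The message-passing rules are obtained by equating the marginal posterior (5.8) with a marginalized form of the joint posterior (5.9). The forward pass iteratively computes the $\alpha_t$ as follows

$$P(x_t \mid \{y\}_1^T) = \int P(x_{t-1}, x_t \mid \{y\}_1^T)\, dx_{t-1} \tag{5.10}$$
$$\alpha_t(x_t)\, \beta_t(x_t) = \int \alpha_{t-1}(x_{t-1})\, P(x_t \mid x_{t-1})\, P(y_t \mid x_t)\, \beta_t(x_t)\, dx_{t-1} \tag{5.11}$$
$$\alpha_t(x_t) = \int \alpha_{t-1}(x_{t-1})\, P(x_t \mid x_{t-1})\, P(y_t \mid x_t)\, dx_{t-1}. \tag{5.12}$$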



Similarly, by integrating the joint posterior instead over $x_t$, we get the backward pass, which iteratively computes the $\beta_{t-1}$ as follows

$$\beta_{t-1}(x_{t-1}) = \int P(x_t \mid x_{t-1})\, P(y_t \mid x_t)\, \beta_t(x_t)\, dx_t. \tag{5.13}$$

If the integrals in (5.12) and (5.13) can be computed exactly, the forward and backward messages do not interfere with each other and can be computed in parallel. Furthermore, a single forward and backward pass is sufficient to compute the desired state posteriors. This is the case when the state and observation equations are both linear-Gaussian, and the message passing scheme described above is equivalent to the standard Kalman smoother.
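Another setting in which the message integrals are exact is a chain with a finite number of states, where the integrals become sums. The sketch below uses that discrete analogue (an assumption made purely for illustration; the state in this chapter is continuous) to show the forward and backward recursions side by side and to check the resulting belief states against brute-force enumeration. All function names are hypothetical.

```python
from itertools import product

def forward(init, trans, lik):
    """alpha_t(x) = P(x_t = x, y_1..y_t); lik[t][x] = P(y_t | x_t = x)."""
    n = len(init)
    alpha = [[init[x] * lik[0][x] for x in range(n)]]
    for t in range(1, len(lik)):
        prev = alpha[-1]
        alpha.append([lik[t][x] * sum(prev[u] * trans[u][x] for u in range(n))
                      for x in range(n)])
    return alpha

def backward(trans, lik):
    """beta_t(x) = P(y_{t+1}..y_T | x_t = x), filled in from the last time step back."""
    n, T = len(trans), len(lik)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 1, 0, -1):
        beta[t - 1] = [sum(trans[u][x] * lik[t][x] * beta[t][x] for x in range(n))
                       for u in range(n)]
    return beta

def marginals(alpha, beta):
    """Belief states: normalize alpha_t * beta_t at each time step."""
    out = []
    for a, b in zip(alpha, beta):
        prod = [ai * bi for ai, bi in zip(a, b)]
        z = sum(prod)
        out.append([v / z for v in prod])
    return out

def brute_force_marginals(init, trans, lik):
    """Enumerate every state sequence; feasible only for tiny problems."""
    n, T = len(init), len(lik)
    out = [[0.0] * n for _ in range(T)]
    for seq in product(range(n), repeat=T):
        p = init[seq[0]] * lik[0][seq[0]]
        for t in range(1, T):
            p *= trans[seq[t - 1]][seq[t]] * lik[t][seq[t]]
        for t in range(T):
            out[t][seq[t]] += p
    return [[v / sum(row) for v in row] for row in out]
```

Here `trans[u][x]` denotes the probability of moving from state `u` to state `x`. Note that `forward` never reads `beta` and `backward` never reads `alpha`, which is the sense in which exact messages do not interfere.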

Due to the non-linearities in (5.1) and (5.4), the integrals in (5.12) and (5.13) cannot be computed analytically. To make inference tractable, we make the following two approximations. First, the full state posterior is decoupled across time

$$P(\{x\}_1^T \mid \{y\}_1^T) \approx \prod_{t=1}^{T} Q_t(x_t), \tag{5.14}$$

where

$$Q_t(x_t) \propto \alpha_t(x_t)\, \beta_t(x_t). \tag{5.15}$$

Second, the messages $\alpha_t$ and $\beta_t$ are assumed to belong to the exponential family, which is closed under multiplication and division and whose distributions are fully described by their canonical parameters. This allows us to easily manipulate and keep track of the messages and belief states. In particular, we choose a Gaussian approximating distribution

$$\alpha_t(x_t) \propto \mathcal{N}(m_t, V_t) \tag{5.16}$$

$$\beta_t(x_t) \propto \mathcal{N}(n_t, \Sigma_t). \tag{5.17}$$

This yields an approximate marginal posterior (5.15) that is also Gaussian.



The following is how EP is applied to a dynamical systems model with chain-like structure (cf. Figure 2.1) (Heskes and Zoeter, 2003). For a given $t$,

1. Replace $Q_{t-1}$ and $Q_t$ in the factored posterior (5.14) by the joint posterior (5.9)

$$\hat{P}(\{x\}_1^T) \propto \prod_{\tau=1}^{t-2} Q_\tau(x_\tau) \cdot \alpha_{t-1}(x_{t-1})\, P(x_t \mid x_{t-1})\, P(y_t \mid x_t)\, \beta_t(x_t) \cdot \prod_{\tau=t+1}^{T} Q_\tau(x_\tau). \tag{5.18}$$

2. Project $\hat{P}(\{x\}_1^T)$ back to the approximating family by finding a new factored distribution $\prod_{\tau=1}^{T} Q_\tau(x_\tau)$ that minimizes

$$\mathrm{KL}\left(\hat{P}(\{x\}_1^T) \,\Big\|\, \prod_{\tau=1}^{T} Q_\tau(x_\tau)\right). \tag{5.19}$$

This is equivalent to minimizing

$$\mathrm{KL}\left(\hat{P}(x_{t-1}, x_t) \,\big\|\, Q_{t-1}(x_{t-1})\, Q_t(x_t)\right) \tag{5.20}$$

with respect to $Q_{t-1}(x_{t-1})\, Q_t(x_t)$, where

$$\hat{P}(x_{t-1}, x_t) \propto \alpha_{t-1}(x_{t-1})\, P(x_t \mid x_{t-1})\, P(y_t \mid x_t)\, \beta_t(x_t). \tag{5.21}$$

In general, if $\mathcal{P}$ is the exponential family, the problem

$$q^{\mathrm{new}} = \underset{q \in \mathcal{P}}{\arg\min}\; \mathrm{KL}(p \,\|\, q) \tag{5.22}$$

is solved by matching the expected sufficient statistics of $q$ to those of $p$. If $q$ is further constrained to be Gaussian, then the problem reduces to moment matching. In other words, the parameters of $q$ are chosen such that its moments match those of $p$.
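The moment-matching property can be checked numerically in one dimension. For a Gaussian q, KL(p || q) differs from the cross-entropy -E_p[log q] only by the entropy of p, which does not depend on q, and the cross-entropy depends on p only through its mean and variance. The sketch below assumes, purely for illustration, that p is a two-component Gaussian mixture with made-up parameters.

```python
import math

def mixture_moments(weights, means, variances):
    """Mean and variance of a 1-D Gaussian mixture p."""
    mean = sum(w * m for w, m in zip(weights, means))
    second = sum(w * (v + m * m) for w, m, v in zip(weights, means, variances))
    return mean, second - mean * mean

def cross_entropy(p_mean, p_var, q_mean, q_var):
    """-E_p[log q] for Gaussian q.

    Equals KL(p || q) plus the entropy of p, a constant with respect to q,
    so minimizing it over q is the same as minimizing the KL divergence.
    """
    return (0.5 * math.log(2 * math.pi * q_var)
            + (p_var + (p_mean - q_mean) ** 2) / (2 * q_var))
```

Evaluating `cross_entropy` at the matched mean and variance and at any perturbed pair shows that the matched Gaussian always scores lowest, even though a bimodal p is poorly described by any single Gaussian.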



Due to the non-linearities in (5.1) and (5.4), the moments of $\hat{P}(x_{t-1}, x_t)$ cannot be computed analytically. We will consider two different techniques to approximate the moments of $\hat{P}(x_{t-1}, x_t)$ in Section 5.5. For now, assume that we’ve estimated the moments using one of the two techniques, yielding

$$\hat{P}(x_{t-1}, x_t) \approx \mathcal{N}(\hat{X}_t, \hat{V}_t), \tag{5.23}$$

where $\hat{X}_t \in \mathbb{R}^{2p \times 1}$ is the estimated mean and $\hat{V}_t \in \mathbb{R}^{2p \times 2p}$ is the estimated covariance. The submatrices of $\hat{X}_t$ and $\hat{V}_t$ will later be referred to using the following conventions

$$\hat{X}_t = \begin{bmatrix} \hat{X}_{t,1} \\ \hat{X}_{t,2} \end{bmatrix}, \qquad \hat{V}_t = \begin{bmatrix} \hat{V}_{t,11} & \hat{V}_{t,12} \\ \hat{V}_{t,21} & \hat{V}_{t,22} \end{bmatrix}.$$

According to (5.20), $Q_{t-1}(x_{t-1})\, Q_t(x_t)$ should be updated by matching its moments with those of $\hat{P}(x_{t-1}, x_t)$. Using a greedy approach (Heskes and Zoeter, 2003),

• the forward pass iteratively fixes $Q_{t-1}(x_{t-1})$ and updates $Q_t(x_t)$ by matching its moments with those of $\int \hat{P}(x_{t-1}, x_t)\, dx_{t-1}$,¹ and

• the backward pass iteratively fixes $Q_t(x_t)$ and updates $Q_{t-1}(x_{t-1})$ by matching its moments with those of $\int \hat{P}(x_{t-1}, x_t)\, dx_t$.

¹Note the parallel with (5.10) when performing exact inference.

From (5.15) and (5.23), the moment-matching rule for the forward pass is

$$\alpha_t(x_t)\, \beta_t(x_t) \propto \mathcal{N}(\hat{X}_{t,2}, \hat{V}_{t,22}), \tag{5.24}$$

where the right side of (5.24) is a function of $\alpha_{t-1}$ and $\beta_t$. Thus, $\alpha_t$ can be updated in order from $t = 1$ to $t = T$ while keeping $\beta_t$ fixed for all $t$. Similarly, the moment-matching rule for the backward pass is

$$\alpha_{t-1}(x_{t-1})\, \beta_{t-1}(x_{t-1}) \propto \mathcal{N}(\hat{X}_{t,1}, \hat{V}_{t,11}), \tag{5.25}$$

where the right side of (5.25) is a function of $\alpha_{t-1}$ and $\beta_t$. Thus, $\beta_{t-1}$ can be updated in reverse order from $t = T$ to $t = 2$ while keeping $\alpha_t$ fixed for all $t$. Expectation propagation derives its name from the fact that these updates only involve expectations of the relevant distributions, as specified by the moment-matching rules (5.24) and (5.25).

Figure 5.3 illustrates an $\alpha_t$ update during the forward pass for a one-dimensional state ($p = 1$). The left panel shows how $\hat{P}(x_{t-1}, x_t)$ (red) is formed from $\alpha_{t-1}(x_{t-1})$, $P(x_t \mid x_{t-1})\, P(y_t \mid x_t)$, and $\beta_t(x_t)$, as defined in (5.21). According to (5.23), we take a Gaussian approximation of $\hat{P}(x_{t-1}, x_t)$, which is represented by ellipsoidal contours in the right panel. There are two sets of ellipsoidal contours (blue and green), one for each of the Gaussian approximations that will be presented in Section 5.5. Using (5.24), the forward message $\alpha_t(x_t)$ can then be updated by dividing out the backward message $\beta_t(x_t)$, which is held fixed during the forward pass. The resulting $\alpha_t(x_t)$ (blue and green, one for each Gaussian approximation) are plotted on the vertical axis in the right panel.
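The "dividing out" step is simplest to see in natural parameters (precision and precision times mean), where multiplying Gaussians adds parameters and dividing subtracts them. The following one-dimensional sketch is an illustration under that standard parameterization; the helper names are hypothetical, and the guard against a non-positive precision reflects a general hazard of Gaussian division rather than a step described in the text.

```python
def to_natural(mean, var):
    """(mean, variance) -> (precision * mean, precision)."""
    return mean / var, 1.0 / var

def from_natural(h, lam):
    """(precision * mean, precision) -> (mean, variance)."""
    return h / lam, 1.0 / lam

def multiply(a, b):
    """Product of two unnormalized 1-D Gaussians, each given as (mean, variance)."""
    (ha, la), (hb, lb) = to_natural(*a), to_natural(*b)
    return from_natural(ha + hb, la + lb)

def divide(a, b):
    """Quotient a / b; a proper Gaussian results only if the precision stays positive."""
    (ha, la), (hb, lb) = to_natural(*a), to_natural(*b)
    lam = la - lb
    if lam <= 0:
        raise ValueError("quotient is not a normalizable Gaussian")
    return from_natural(ha - hb, lam)
```

For example, with a moment-matched belief of mean 1.0 and variance 0.5 and a fixed backward message of mean 2.0 and variance 2.0, `divide` returns the updated forward message, and `multiply` of that message with the backward message recovers the belief.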

For exact inference, the forward $\alpha_t$ and backward $\beta_t$ messages do not interfere, as shown in (5.12) and (5.13). In this case, a single forward and backward pass is sufficient to converge to the correct beliefs. In contrast, for approximate inference, the forward and backward messages do interfere due to the Gaussian approximation (5.23). By performing multiple forward and backward passes, it is often possible to improve the consistency of the estimated beliefs across the entire graph (cf. Figure 2.1). Although EP has no convergence guarantees, it often works well in practice. It has been shown that EP always has at least one fixed point when using exponential family approximations (Minka, 2001a). Convergent EP algorithms for dynamical system models are a topic of current research (Heskes and Zoeter, 2003).

Appendix A.1 summarizes how the messages are updated when taking forward and backward passes.

The data likelihood $P(\{y\}_1^T)$ is often needed for model comparison or tracking the progress of an EM algorithm. Data likelihoods can be approximated using either the belief states returned by EP or sequential Monte Carlo (i.e., particle) techniques (Doucet et al., 2001). The former technique is detailed in Appendix A.2;



Figure 5.3: Illustration of an EP update during the forward pass. The left panel shows how $\hat{P}(x_{t-1}, x_t)$ (red) is formed from its constituent factors (black). The right panel shows two possible Gaussian approximations (ellipsoidal contours) of $\hat{P}(x_{t-1}, x_t)$, depending on whether Laplace-EP (blue) or GQ-EP (green) is used. This results in two possible updates of $\alpha_t(x_t)$, plotted as one-dimensional densities on the vertical axis.

its principal drawback is that the likelihood computation is subject to the same approximations as those used for inference. It is often desirable to evaluate data

likelihoods independently of the approximations used for inference. In this case,

sequential Monte Carlo techniques can be applied, whereby $P(\{y\}_1^T)$ can, in theory, be computed exactly given enough particles. However, particle techniques tend to be computationally intensive.
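How a particle estimate of the data likelihood approaches the exact value can be illustrated on a model where the exact answer is available. The sketch below assumes a one-dimensional linear-Gaussian toy model (not the model of Section 5.2), for which a Kalman filter gives the exact log-likelihood, and compares it with a bootstrap particle filter using multinomial resampling. All names and parameter choices are hypothetical.

```python
import math
import random

def norm_logpdf(y, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (y - mean) ** 2 / var)

def kalman_loglik(ys, a, q, r, v1):
    """Exact log P(y_1..y_T) for x_t = a x_{t-1} + N(0, q), y_t = x_t + N(0, r), x_1 ~ N(0, v1)."""
    mean, var, total = 0.0, v1, 0.0
    for t, y in enumerate(ys):
        if t > 0:
            mean, var = a * mean, a * a * var + q        # predict
        total += norm_logpdf(y, mean, var + r)           # P(y_t | y_1..y_{t-1})
        gain = var / (var + r)                           # update
        mean, var = mean + gain * (y - mean), (1 - gain) * var
    return total

def particle_loglik(ys, a, q, r, v1, n_particles, rng):
    """Bootstrap particle filter estimate of the same quantity."""
    xs = [rng.gauss(0.0, math.sqrt(v1)) for _ in range(n_particles)]
    total = 0.0
    for t, y in enumerate(ys):
        if t > 0:
            xs = [a * x + rng.gauss(0.0, math.sqrt(q)) for x in xs]
        ws = [math.exp(norm_logpdf(y, x, r)) for x in xs]
        total += math.log(sum(ws) / n_particles)         # average weight
        xs = rng.choices(xs, weights=ws, k=n_particles)  # multinomial resampling
    return total
```

The particle estimate makes no Gaussian approximation of the posterior, so the same recipe carries over to non-linear, non-Gaussian models; its cost grows with the number of particles needed for a stable estimate.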

5.5 Moment estimation for EP

As described in Section 5.4, EP requires finding the moments of $\hat{P}(x_{t-1}, x_t)$, as defined in (5.21). As the exact moment computation is analytically intractable, we consider two approximations here. The first fits a Gaussian to a mode of $\hat{P}(x_{t-1}, x_t)$ (Ypma and Heskes, 2005), as in the Laplace approximation of an integral (MacKay, 2003). This will be described in Section 5.5.1. The second estimates the moments using Gaussian quadrature (Zoeter et al., 2004; Lerner, 2002), described in Section 5.5.2. Each type of approximation leads to a variant of EP, termed Laplace-EP and GQ-EP, respectively. The difference between these two methods is illustrated in Figure 5.3.
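A one-dimensional sketch of the first, mode-based approximation is given below. It follows the textbook Laplace recipe, in which the fitted Gaussian is centered on a mode and its variance is the negative inverse of the second derivative of the log density there; whether this matches the exact form used here is an assumption. Derivatives are taken by finite differences for brevity, and the example density (a Gaussian prior times a Poisson count with a softplus rate) is hypothetical.

```python
import math

def laplace_1d(log_p, x0, n_iter=50, eps=1e-4):
    """Newton ascent to a mode of log_p, then variance = -1 / (second derivative at the mode)."""
    x = x0
    for _ in range(n_iter):
        grad = (log_p(x + eps) - log_p(x - eps)) / (2 * eps)
        hess = (log_p(x + eps) - 2 * log_p(x) + log_p(x - eps)) / eps ** 2
        if hess >= 0:
            raise ValueError("log density is not locally concave here")
        step = grad / hess
        x -= step
        if abs(step) < 1e-10:
            break
    hess = (log_p(x + eps) - 2 * log_p(x) + log_p(x - eps)) / eps ** 2
    return x, -1.0 / hess

def softplus(z):
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def example_log_density(x, y=3, prior_var=1.0):
    """Unnormalized log density: N(0, prior_var) prior times Poisson(y | softplus(x))."""
    rate = softplus(x)
    return -0.5 * x * x / prior_var + y * math.log(rate) - rate
```

Because only local curvature at the mode is used, the fit can be poor when the density is skewed or multi-modal, which is the limitation that the quadrature-based alternative is meant to address.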

5.5.1 Modal Gaussian approximation

The mean of $\hat{P}(x_{t-1}, x_t)$ is approximated to be one of its modes $X_t^* \in \mathbb{R}^{2p \times 1}$. In other words, we take $\hat{X}_t \approx X_t^*$. A Gaussian is then locally fit to $\hat{P}(x_{t-1}, x_t)$ at that mode, whose covariance is defined as

(5.26)

5.5.2 Gaussian quadrature

While the modal Gaussian approximation often works well if $\hat{P}(x_{t-1}, x_t)$ is unimodal, it is often desirable to take into account more global properties of $\hat{P}(x_{t-1}, x_t)$, especially if it is multi-modal. This motivates the use of Gaussian quadrature. The moments of $\hat{P}(x_{t-1}, x_t)$ can be expressed as

(5.27)

for appropriate choices of the function $g$. For example, if $g(x_{t-1}, x_t) = x_t$, the mean of $x_t$ based on $\hat{P}(x_{t-1}, x_t)$ is obtained.² By introducing a proposal distribution $Q(x_{t-1}, x_t)$,³ (5.27) can be expressed as an integral over a known “weighting” function

²Note that this is not the same $g$ as in (5.1).

³$Q(x_{t-1}, x_t)$ determines the distribution of quadrature points and so is referred to as a proposal distribution by analogy to importance sampling.



Gaussian quadrature (Zoeter et al., 2004; Lerner, 2002) approximates this integral by

(5.29)

where $[(x_{t-1}^j)' \;\; (x_t^j)']'$ and $w_j$ are, respectively, the $j$th quadrature point and weight based on $Q(x_{t-1}, x_t)$. An example of a quadrature rule (i.e., a set of quadrature points and weights) based on a Gaussian proposal will be given below.

Proposal distributions for GQ-EP. Using the proposal distribution

$$Q(x_{t-1}, x_t) \propto \alpha_{t-1}(x_{t-1})\, \beta_{t-1}(x_{t-1})\, \alpha_t(x_t)\, \beta_t(x_t), \tag{5.30}$$

with Gaussian messages $\alpha_t$ and $\beta_t$, Zoeter and colleagues (Zoeter et al., 2004) reported that GQ-EP was sensitive to outlying observations. In particular, the quadrature points may lie in regions where $\hat{P}(x_{t-1}, x_t)$ has negligible density relative to $Q(x_{t-1}, x_t)$. As a result, covariance matrices estimated from (5.29) may be ill-conditioned, and GQ-EP becomes largely unusable. Outlying observations are common in the early stages of learning the model parameters, when the parameters are not a good match with the observed data. Even without outlying observations per se, quadrature point locations can be poorly chosen during the first forward pass if $Q(x_{t-1}, x_t)$ is determined without knowledge of the current observation $y_t$.

To overcome this problem, we choose $Q(x_{t-1}, x_t)$ to be a Gaussian matched to the location and curvature of a mode of $\hat{P}(x_{t-1}, x_t)$, as in the Laplace approximation of an integral (MacKay, 2003). Note that this is the same Gaussian described in Section 5.5.1, but it is used here as a proposal distribution for GQ-EP. With this choice of proposal distribution, the quadrature points are centered on a mode of $\hat{P}(x_{t-1}, x_t)$, making GQ-EP more robust to outlying observations.

A further advantage of choosing $Q(x_{t-1}, x_t)$ to be a modal Gaussian approximation of $\hat{P}(x_{t-1}, x_t)$ is that correlations between $x_{t-1}$ and $x_t$ can be taken into



account. A proposal distribution chosen according to (5.30) is necessarily axis- aligned due to the assumed factorization between x t_i and xt. As a result, some quadrature points may lie in regions where P ( x t_ i,x t) has neglible density relative to Q (xt_ i,x t). In contrast, a modal Gaussian approximation can be oriented along a direction of correlation between x 4_x and x t specified by P (xt_!, x 4).
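The idea of using a modal Gaussian as the proposal can be illustrated with a one-dimensional sketch. Everything in it is an assumption made for illustration: the toy target (a Gaussian prior multiplied by a Poisson likelihood with an exponential link), the constants `PRIOR_MEAN`, `PRIOR_VAR` and `Y`, and the use of a one-dimensional Gauss-Hermite rule in place of the multi-dimensional rules discussed in this chapter. The mode and curvature are found by Newton's method, and the moments are then estimated as weighted sums of the ratio of the unnormalized target to the proposal at the quadrature points.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

PRIOR_MEAN, PRIOR_VAR, Y = 0.5, 0.25, 2   # hypothetical toy values

def log_p_unnorm(x):
    """Unnormalized log target: Gaussian prior times Poisson(Y | exp(x))."""
    return -0.5 * (x - PRIOR_MEAN) ** 2 / PRIOR_VAR + Y * x - np.exp(x)

def grad_log_p(x):
    return -(x - PRIOR_MEAN) / PRIOR_VAR + Y - np.exp(x)

def hess_log_p(x):
    return -1.0 / PRIOR_VAR - np.exp(x)

def laplace_fit(x0=0.0, iters=50):
    """Newton's method for the mode; variance is -1 / (second derivative)."""
    x = x0
    for _ in range(iters):
        x = x - grad_log_p(x) / hess_log_p(x)
    return x, -1.0 / hess_log_p(x)

def moments_by_quadrature(q_mean, q_var, n_points=20):
    """Estimate normalizer, mean and variance of the target using proposal Q."""
    u, w = hermegauss(n_points)              # rule for the weight exp(-u^2 / 2)
    w = w / np.sqrt(2.0 * np.pi)             # weights of a standard normal
    x = q_mean + np.sqrt(q_var) * u          # quadrature points under Q
    log_q = -0.5 * u ** 2 - 0.5 * np.log(2.0 * np.pi * q_var)
    ratio = np.exp(log_p_unnorm(x) - log_q)  # unnormalized P / Q at the points
    z = np.sum(w * ratio)                    # normalizer of the target
    mean = np.sum(w * x * ratio) / z
    var = np.sum(w * (x - mean) ** 2 * ratio) / z
    return mean, var, z

mode, laplace_var = laplace_fit()
mean_q, var_q, z_q = moments_by_quadrature(mode, laplace_var)

# Brute-force reference on a dense grid, for comparison only.
grid = np.linspace(-6.0, 6.0, 120001)
dx = grid[1] - grid[0]
p_grid = np.exp(log_p_unnorm(grid))
z_ref = p_grid.sum() * dx
mean_ref = (grid * p_grid).sum() * dx / z_ref
var_ref = ((grid - mean_ref) ** 2 * p_grid).sum() * dx / z_ref
```

The normalizer `z` plays the role that the data likelihood plays when the target must be evaluated at the quadrature points. Because the proposal is centered on the mode, the ratio of target to proposal stays close to constant over the points that carry most of the weight.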

Quadrature rules with non-negative weights  Covariance matrices are formed in (5.29) by a weighted sum of outer products. If one or more of the quadrature weights $w_j$ is negative, the resulting covariance matrix may have negative eigenvalues. It is important to emphasize that this appearance of negative eigenvalues is not due to numerical instabilities; in particular, if a square-root filter (Wan and van der Merwe, 2001) is used, negative quadrature weights may lead to invalid Cholesky updates. Thus, quadrature rules with non-negative $w_j$ are necessary to stabilize quadrature-based EP.

Furthermore, evaluating $P(x_{t-1}, x_t)$ at the quadrature points in (5.29) requires computing the data likelihood

$$P(\{y\}_1^T) = \int\!\!\int \alpha_{t-1}(x_{t-1})\, P(x_t \mid x_{t-1})\, P(y_t \mid x_t)\, \beta_t(x_t)\, dx_{t-1}\, dx_t. \tag{5.31}$$

This integral is generally analytically intractable and must also be approximated by Gaussian quadrature, frequently using the same proposal distribution $Q(x_{t-1}, x_t)$. Once again, negative quadrature weights may lead to instability, here in the form of an impossible negative likelihood estimate.

We consider two quadrature rules with non-negative weights. For notational clarity, a Gaussian integral is approximated by Gaussian quadrature as follows

$$\int h(z)\, \mathcal{N}(z; \mu, \Sigma)\, dz \approx \sum_{j=0}^{n-1} w_j\, h(z_j), \tag{5.32}$$

where $z \in \mathbb{R}^{r \times 1}$, $\mu \in \mathbb{R}^{r \times 1}$, $\Sigma \in \mathbb{R}^{r \times r}$, $h$ is a deterministic nonlinear function⁴, $z_0, \ldots, z_{n-1}$ are the quadrature points, and $w_0, \ldots, w_{n-1}$ are the quadrature weights. The first quadrature rule is the classical precision 3 rule (Wan and van der Merwe, 2001; Lerner, 2002; Julier and Uhlmann, 2004), which prescribes the following points and weights

$$\begin{aligned}
z_0 &= \mu, & w_0 &= 1 - \frac{r}{\gamma^2} \\
z_i &= \mu + \gamma \left(\sqrt{\Sigma}\right)_i, & w_i &= \frac{1}{2\gamma^2} \\
z_{r+i} &= \mu - \gamma \left(\sqrt{\Sigma}\right)_i, & w_{r+i} &= \frac{1}{2\gamma^2}
\end{aligned} \tag{5.33}$$

where $i = 1, \ldots, r$ and $\gamma \in \mathbb{R}$ is a free parameter. $(\sqrt{\Sigma})_i$ is the $i$th column of $R \in \mathbb{R}^{r \times r}$, where $R R' = \Sigma$. The number of quadrature points is $n = 2r + 1$. This quadrature rule is exact if $h(z)$ in (5.32) is a monomial of degree 3 or less. Furthermore, as long as $\gamma$ is chosen such that $\gamma^2 \geq r$, the quadrature weights $w_j$ in (5.33) are non-negative.
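The classical precision 3 rule (5.33) is compact enough to sketch directly. The following is a minimal illustration, not the implementation used for the results in this chapter: the function names, the use of a Cholesky factor for $R$, and the example values of $\mu$ and $\Sigma$ are all assumptions. The rule is parametrized by $\gamma^2$ rather than $\gamma$ so that the boundary case $\gamma^2 = r$ gives a central weight of exactly zero in floating point.

```python
import numpy as np

def precision3_rule(mu, Sigma, gamma_sq):
    """Points and weights of the classical precision 3 rule (5.33)."""
    r = mu.shape[0]
    R = np.linalg.cholesky(Sigma)               # R @ R.T == Sigma
    offsets = np.sqrt(gamma_sq) * R.T           # row i is gamma * (column i of R)
    points = np.vstack([mu, mu + offsets, mu - offsets])
    weights = np.full(2 * r + 1, 1.0 / (2.0 * gamma_sq))
    weights[0] = 1.0 - r / gamma_sq             # central weight w_0
    return points, weights

def quadrature_expectation(h, points, weights):
    """Approximate the Gaussian integral in (5.32) as sum_j w_j h(z_j)."""
    return sum(w * h(z) for z, w in zip(points, weights))

# Hypothetical example values.
mu = np.array([0.5, -1.0, 2.0])
Sigma = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 0.5]])
r = mu.shape[0]

# gamma^2 = r is the smallest value that keeps every weight non-negative.
points, weights = precision3_rule(mu, Sigma, gamma_sq=float(r))
mean_est = quadrature_expectation(lambda z: z, points, weights)
second_moment_est = quadrature_expectation(lambda z: np.outer(z, z), points, weights)
```

Since the identity and the outer product are monomials of degree 1 and 2, the rule should reproduce the mean and the second moment $\Sigma + \mu\mu'$ up to rounding error.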

The second quadrature rule is a custom “precision 3” rule derived using Gaussian processes (GPs) under the constraint of non-negative weights. Whereas the classical rule achieves zero error for monomials of degree 3 or less and offers no guarantees for monomials of higher degree, the custom rule minimizes the average error across an entire family of functions. Under the GP approach, the task of selecting quadrature points and weights is formulated as an optimization problem (Minka, 2000). The details of how to derive quadrature rules in this way can be found in the aforementioned reference; here, we describe how this technique was applied to derive the custom “precision 3” rule. We first transformed the unconstrained optimization problem into a constrained optimization problem by introducing a non-negativity constraint on the quadrature weights. Assuming the same constellation of quadrature points as in (5.33) up to the scaling factor $\gamma$, the optimization problem was then solved to obtain $\gamma$ and a new set of quadrature weights $w_0, \ldots, w_{2r}$. Note that these optimized weights will not necessarily be the same as the classical weights of (5.33). A GP requires the specification of a covariance function. We chose the commonly-used radial basis function

$$K(z_j, z_k) = e^{-\,\cdots}, \tag{5.34}$$

where the free parameter $b$ sets the relative importance of monomials of varying degree. As $b \to 0$, monomials of lower degree have priority. This GP approach is general and can be used to derive other quadrature rules with non-negative weights.

⁴ Note that this is not the same $h$ as in (5.4).

In the classical precision 3 rule (5.33), only the central quadrature weight $w_0$ can be negative. Julier and colleagues (Julier et al., 2000) recognized that, if a covariance estimate is expanded about a point away from the estimated mean, positive semidefiniteness can be guaranteed even though $w_0 < 0$. To illustrate this, let $z \sim \mathcal{N}(\mu, \Sigma)$ and $h(z)$ be a column vector. The estimated covariance of $h(z)$ using Gaussian quadrature is

$$\sum_{j=0}^{n-1} w_j \left( h(z_j) - m \right) \left( h(z_j) - m \right)', \tag{5.35}$$

where $m$ is the estimated mean of $h(z)$ from (5.32). Julier and colleagues replaced $m$ with $h(z_0)$ in (5.35), thereby expanding the covariance about a point $h(z_0)$ away from the estimated mean $m$. As a result, the $j = 0$ term in the sum in (5.35) disappears and all remaining terms have positive quadrature weights. The UKS tested in Section 5.7 uses this expansion. While effective for the precision 3 rule, this expansion does not generalize to the precision 5 rule (Lerner, 2002; Julier and Uhlmann, 2004), where multiple quadrature weights can be negative. Furthermore, this technique cannot be used to estimate data likelihoods.

Another way to ensure positive semidefiniteness is to use a one-dimensional quadrature rule along each dimension of $z$, rather than a multi-dimensional rule such as (5.33). However, the number of quadrature points required would grow exponentially, rather than linearly, with $r$. In addition, Lerner showed how to project a covariance matrix with predominantly known components to the positive semidefinite cone (Lerner, 2002). However, applying this technique to problems discussed in this paper would require extension to covariance matrices whose elements are entirely unknown.
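The effect of a negative central weight, and of expanding the covariance about $h(z_0)$ instead of the estimated mean, can be checked on a small example. This is only a sketch under stated assumptions: the test function `h`, the choice $\gamma^2 = 1 < r$ (which makes $w_0$ negative), and all function names are hypothetical.

```python
import numpy as np

def precision3_rule(mu, Sigma, gamma_sq):
    """Points and weights of the classical precision 3 rule (5.33)."""
    r = mu.shape[0]
    R = np.linalg.cholesky(Sigma)
    offsets = np.sqrt(gamma_sq) * R.T           # row i is gamma * (column i of R)
    points = np.vstack([mu, mu + offsets, mu - offsets])
    weights = np.full(2 * r + 1, 1.0 / (2.0 * gamma_sq))
    weights[0] = 1.0 - r / gamma_sq
    return points, weights

def covariance_about(center, values, weights):
    """Weighted sum of outer products of (h(z_j) - center), as in (5.35)."""
    d = values - center
    return (weights[:, None] * d).T @ d

def h(z):
    """Hypothetical vector-valued nonlinearity used only for this demo."""
    return np.array([z @ z, z[0]])

mu = np.zeros(2)
Sigma = np.eye(2)
points, weights = precision3_rule(mu, Sigma, gamma_sq=1.0)   # w_0 = 1 - 2/1 = -1

values = np.array([h(z) for z in points])
m = weights @ values                                         # estimated mean of h(z)

cov_about_mean = covariance_about(m, values, weights)        # (5.35) as written
cov_about_h0 = covariance_about(values[0], values, weights)  # m replaced by h(z_0)

min_eig_mean = np.linalg.eigvalsh(cov_about_mean).min()
min_eig_h0 = np.linalg.eigvalsh(cov_about_h0).min()
```

In this example the estimate centered on $m$ has a negative eigenvalue, while the estimate centered on $h(z_0)$ is a sum of outer products with positive weights only and is therefore positive semidefinite.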

5.6 M-step: Parameter estimation

Based on the estimate of the state posterior $P(\{x\}_1^T \mid \{y\}_1^T)$ from the E-step, the M-step then updates the system parameters $\theta = \{k, W, Q, p_1, V_1, \{c_i\}, \{d_i\}\}$. We seek the $\theta$ that maximizes $\langle \log P(\{x\}_1^T, \{y\}_1^T \mid \theta) \rangle$, where the angle brackets denote the expectation over the state posterior from the E-step (Roweis and Ghahramani, 2001). In this section, we derive parameter updates for a single sequence. For multiple sequences that are independent conditioned on $\theta$, we use the sum of expectations over all sequences. The parameter updates for multiple sequences are a straightforward extension of those for a single sequence.

We first expand the log of the joint probability distribution between the hidden $\{x\}_1^T$ and observed $\{y\}_1^T$ variables

$$\mathcal{L}(\theta) = \log P(\{x\}_1^T, \{y\}_1^T \mid \theta) \tag{5.36}$$

$$= \sum_{t=1}^{T} \sum_{i=1}^{q} \log P(y_t^i \mid x_t) + \sum_{t=2}^{T} \log P(x_t \mid x_{t-1}) + \log P(x_1) \tag{5.37}$$

$$\begin{aligned}
= {} & \sum_{t=1}^{T} \sum_{i=1}^{q} \left( -h(c_i' x_t + d_i) \cdot \Delta + y_t^i \cdot \log\left( h(c_i' x_t + d_i) \cdot \Delta \right) - \log(y_t^i!) \right) \\
& - \sum_{t=2}^{T} \frac{1}{2} \left( x_t - (1-k)\, x_{t-1} - k W g(x_{t-1}) \right)' Q^{-1} (\,\cdot\,) - \frac{T-1}{2} \log |Q| \\
& - \frac{1}{2} (x_1 - p_1)' V_1^{-1} (\,\cdot\,) - \frac{1}{2} \log |V_1| - \frac{Tp}{2} \log 2\pi.
\end{aligned} \tag{5.38}$$

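For notational clarity, the conditioning on $\theta$ is suppressed in (5.37) and the quadratic terms in (5.38) are abbreviated using $(\,\cdot\,)$⁵. A local maximum of $\langle \mathcal{L}(\theta) \rangle$ with respect to $W$ must satisfy $\partial \langle \mathcal{L}(\theta) \rangle / \partial W = 0$, which yields

$$W = \frac{1}{k}\, A \left( \sum_{t=2}^{T} \left\langle g(x_{t-1})\, g(x_{t-1})' \right\rangle \right)^{-1} \tag{5.39}$$

where

$$A = \sum_{t=2}^{T} \left\langle x_t\, g(x_{t-1})' \right\rangle - (1-k) \sum_{t=2}^{T} \left\langle x_{t-1}\, g(x_{t-1})' \right\rangle. \tag{5.40}$$

Similarly, solving $\partial \langle \mathcal{L}(\theta) \rangle / \partial Q = 0$ for $Q$ yields

$$\begin{aligned}
Q = \frac{1}{T-1} \Bigg[ & \sum_{t=2}^{T} \left\langle x_t x_t' \right\rangle - (1-k) \left( \sum_{t=2}^{T} \left\langle x_t x_{t-1}' \right\rangle + \sum_{t=2}^{T} \left\langle x_{t-1} x_t' \right\rangle \right) \\
& + (1-k)^2 \sum_{t=2}^{T} \left\langle x_{t-1} x_{t-1}' \right\rangle - A \left( \sum_{t=2}^{T} \left\langle g(x_{t-1})\, g(x_{t-1})' \right\rangle \right)^{-1} A' \Bigg]
\end{aligned} \tag{5.41}$$

where $A$ is given in (5.40). Solving $\partial \langle \mathcal{L}(\theta) \rangle / \partial k = 0$ for $k$ yields

$$k = \frac{-b_1 + b_2 + b_3 - b_4}{b_2 - 2 b_4 + b_5} \tag{5.42}$$

where

$$\begin{aligned}
b_1 &= \mathrm{tr}\left( Q^{-1} \sum_{t=2}^{T} \left\langle x_t x_{t-1}' \right\rangle \right) \\
b_2 &= \mathrm{tr}\left( Q^{-1} \sum_{t=2}^{T} \left\langle x_{t-1} x_{t-1}' \right\rangle \right) \\
b_3 &= \mathrm{tr}\left( W' Q^{-1} \sum_{t=2}^{T} \left\langle x_t\, g(x_{t-1})' \right\rangle \right) \\
b_4 &= \mathrm{tr}\left( W' Q^{-1} \sum_{t=2}^{T} \left\langle x_{t-1}\, g(x_{t-1})' \right\rangle \right) \\
b_5 &= \mathrm{tr}\left( W' Q^{-1} W \sum_{t=2}^{T} \left\langle g(x_{t-1})\, g(x_{t-1})' \right\rangle \right).
\end{aligned}$$

⁵ For example, $(x_1 - p_1)' V_1^{-1} (\,\cdot\,)$ is shorthand for $(x_1 - p_1)' V_1^{-1} (x_1 - p_1)$.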

All but one angle-bracketed expression in the updates for $W$, $Q$, and $k$ can be computed analytically based on Gaussian estimates of the marginal $P(x_t \mid \{y\}_1^T)$ and pairwise joint $P(x_{t-1}, x_t \mid \{y\}_1^T)$ state posteriors from the E-step. This is made possible by judiciously choosing the non-linear activation function to be the error function (5.2). The only expectation that cannot be computed analytically is $\langle g(x_{t-1})\, g(x_{t-1})' \rangle$, which is approximated by numerical integration.

The updates for $W$ and $Q$ each depend on $k$; conversely, the update for $k$ depends on both $W$ and $Q$. Thus, the three parameters need to be updated together so that their new values satisfy (5.39), (5.41), and (5.42) simultaneously. This can be achieved through fixed-point iterations, whereby the three update equations (5.39), (5.41), and (5.42) are applied iteratively until convergence. Empirically, the fixed-point iterations have not failed to converge when applied in the present setting.
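The fixed-point iteration over (5.39), (5.41), and (5.42) can be sketched as follows. To keep the example self-contained, the expectations are replaced by sums over a single known state trajectory (that is, the angle brackets are dropped), which is a simplification rather than the E-step described above. The unscaled `math.erf` standing in for $g$, the simulated parameter values, and all function names are assumptions made for illustration.

```python
import math
import numpy as np

erf = np.vectorize(math.erf)   # assumed form of g; the scaling in (5.2) may differ

def simulate(W, Q, k, T, rng):
    """Draw x_t = (1 - k) x_{t-1} + k W g(x_{t-1}) + noise, noise ~ N(0, Q)."""
    p = W.shape[0]
    L = np.linalg.cholesky(Q)
    x = np.zeros((T, p))
    x[0] = rng.standard_normal(p)
    for t in range(1, T):
        x[t] = (1 - k) * x[t - 1] + k * W @ erf(x[t - 1]) + L @ rng.standard_normal(p)
    return x

def sufficient_stats(x):
    """Sums over t = 2..T that appear in (5.39)-(5.42), without expectations."""
    xt, xp = x[1:], x[:-1]
    g = erf(xp)
    return dict(n=len(xt), xx=xt.T @ xt, xxp=xt.T @ xp, xpxp=xp.T @ xp,
                xg=xt.T @ g, xpg=xp.T @ g, gg=g.T @ g)

def update_W(s, k):
    A = s["xg"] - (1 - k) * s["xpg"]                        # (5.40)
    return A @ np.linalg.inv(s["gg"]) / k                   # (5.39)

def update_Q(s, k):
    A = s["xg"] - (1 - k) * s["xpg"]
    S = s["xx"] - (1 - k) * (s["xxp"] + s["xxp"].T) + (1 - k) ** 2 * s["xpxp"]
    return (S - A @ np.linalg.inv(s["gg"]) @ A.T) / s["n"]  # (5.41)

def update_k(s, W, Q):
    Qi = np.linalg.inv(Q)
    b1 = np.trace(Qi @ s["xxp"])
    b2 = np.trace(Qi @ s["xpxp"])
    b3 = np.trace(W.T @ Qi @ s["xg"])
    b4 = np.trace(W.T @ Qi @ s["xpg"])
    b5 = np.trace(W.T @ Qi @ W @ s["gg"])
    return (-b1 + b2 + b3 - b4) / (b2 - 2 * b4 + b5)        # (5.42)

def objective(x, W, Q, k):
    """State-transition terms of (5.38), up to an additive constant."""
    resid = x[1:] - (1 - k) * x[:-1] - k * erf(x[:-1]) @ W.T
    _, logdet = np.linalg.slogdet(Q)
    return -0.5 * np.sum((resid @ np.linalg.inv(Q)) * resid) - 0.5 * len(resid) * logdet

rng = np.random.default_rng(0)
W_true = np.array([[1.5, -1.0], [1.0, 1.5]])                # hypothetical values
x = simulate(W_true, 0.05 * np.eye(2), k=0.1, T=500, rng=rng)
s = sufficient_stats(x)

k = 0.5                                                     # deliberately poor start
history = []
for _ in range(100):
    W = update_W(s, k)
    Q = update_Q(s, k)
    history.append(objective(x, W, Q, k))
    k = update_k(s, W, Q)
    history.append(objective(x, W, Q, k))
```

In this simplified setting each update is an exact maximization over one block of parameters with the others held fixed, so the recorded objective should not decrease from one step to the next.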

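The updates for $p_1$ and $V_1$ are

$$p_1 = \langle x_1 \rangle \tag{5.43}$$

$$V_1 = \langle x_1 x_1' \rangle - \langle x_1 \rangle \langle x_1 \rangle'. \tag{5.44}$$

To find the updates for $\{c_i\}$ and $\{d_i\}$, we first define an augmented state $\tilde{x}_t = [x_t' \;\; 1]' \in \mathbb{R}^{(p+1) \times 1}$ and $\tilde{c}_i = [c_i' \;\; d_i]' \in \mathbb{R}^{(p+1) \times 1}$ so that $c_i$ and $d_i$ can be optimized simultaneously. The gradient with respect to $\tilde{c}_i$ is

$$\frac{\partial \langle \mathcal{L}(\theta) \rangle}{\partial \tilde{c}_i} = \sum_{t=1}^{T} \left( -\Delta \cdot \frac{\partial}{\partial \tilde{c}_i} \left\langle h(\tilde{c}_i' \tilde{x}_t) \right\rangle + y_t^i \cdot \frac{\partial}{\partial \tilde{c}_i} \left\langle \log h(\tilde{c}_i' \tilde{x}_t) \right\rangle \right). \tag{5.45}$$

Unlike for the other parameters, it is not possible to solve $\partial \langle \mathcal{L}(\theta) \rangle / \partial \tilde{c}_i = 0$ and obtain a closed-form expression for $\tilde{c}_i$. Instead, we use conjugate gradients based on (5.45) to find a local optimum of $\langle \mathcal{L}(\theta) \rangle$.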



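The following describes how to compute (5.45) based on a Gaussian approximation $\mathcal{N}(x_t^T, V_t^T)$ of $P(x_t \mid \{y\}_1^T)$ from the E-step. Augmenting the state vector as above, the Gaussian approximation from the E-step becomes $P(\tilde{x}_t \mid \{y\}_1^T) \approx \mathcal{N}(\tilde{x}_t^T, \tilde{V}_t^T)$, where

$$\tilde{x}_t^T = \begin{bmatrix} x_t^T \\ 1 \end{bmatrix} \in \mathbb{R}^{(p+1) \times 1} \quad \text{and} \quad \tilde{V}_t^T = \begin{bmatrix} V_t^T & 0 \\ 0' & 0 \end{bmatrix} \in \mathbb{R}^{(p+1) \times (p+1)}.$$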

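The first partial derivative on the right side of (5.45) can be expanded as follows

$$\frac{\partial}{\partial \tilde{c}_i} \left\langle h(\tilde{c}_i' \tilde{x}_t) \right\rangle = \int_{-\infty}^{\infty} \left( \frac{(u^2 - 1)\, \tilde{V}_t^T \tilde{c}_i}{\tilde{c}_i' \tilde{V}_t^T \tilde{c}_i} + \frac{u\, \tilde{x}_t^T}{\sqrt{\tilde{c}_i' \tilde{V}_t^T \tilde{c}_i}} \right) \cdot h\!\left( u \cdot \sqrt{\tilde{c}_i' \tilde{V}_t^T \tilde{c}_i} + \tilde{c}_i' \tilde{x}_t^T \right) \mathcal{N}(u; 0, 1)\, du, \tag{5.46}$$

which is a Gaussian integral in $u \in \mathbb{R}$. This integral can then be approximated using one-dimensional Gaussian quadrature, also known as Hermite-Gauss quadrature (Press et al., 1992). The detailed derivation of (5.46) will be given in Appendix B.

The second partial derivative on the right side of (5.45) can be approximated as in (5.46), but replacing the function $h$ with $\log h$.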

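The following is an alternate expression that is entirely equivalent to (5.46), although it may appear improbable when comparing the integrals

$$\frac{\partial}{\partial \tilde{c}_i} \left\langle h(\tilde{c}_i' \tilde{x}_t) \right\rangle = \int_{-\infty}^{\infty} \left( \frac{u\, \tilde{V}_t^T \tilde{c}_i}{\sqrt{\tilde{c}_i' \tilde{V}_t^T \tilde{c}_i}} + \tilde{x}_t^T \right) \cdot h'\!\left( u \cdot \sqrt{\tilde{c}_i' \tilde{V}_t^T \tilde{c}_i} + \tilde{c}_i' \tilde{x}_t^T \right) \mathcal{N}(u; 0, 1)\, du. \tag{5.47}$$

While (5.46) involves the function $h$, (5.47) involves its derivative $h'$. The derivation of (5.47) will also be given in Appendix B.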
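The reduction of these expectations to one-dimensional Gaussian integrals lends itself to a short numerical check. The sketch below evaluates the gradient of $\langle h(\tilde{c}' \tilde{x}) \rangle$ in two ways, once using only $h$ and once using $h'$, and compares both against central finite differences. The softplus form of $h$, the 40-point Hermite-Gauss rule, the numerical values, and the function names are all assumptions made for illustration; the form using only $h$ is obtained here from the $h'$ form by integration by parts in $u$.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def softplus(z):
    """Assumed form of h, chosen only for illustration."""
    return np.logaddexp(0.0, z)

def softplus_deriv(z):
    return 1.0 / (1.0 + np.exp(-z))

U, WTS = hermegauss(40)                  # rule for the weight exp(-u^2 / 2)
WTS = WTS / np.sqrt(2.0 * np.pi)         # weights of a standard normal

def expected_h(c, x_mean, V):
    """<h(c'x)> for x ~ N(x_mean, V), as a one-dimensional integral over u."""
    s = np.sqrt(c @ V @ c)
    return np.sum(WTS * softplus(U * s + c @ x_mean))

def grad_via_hprime(c, x_mean, V):
    """Gradient of <h(c'x)> with respect to c, using the derivative h'."""
    s = np.sqrt(c @ V @ c)
    dvals = softplus_deriv(U * s + c @ x_mean)
    return np.sum(WTS * U * dvals) / s * (V @ c) + np.sum(WTS * dvals) * x_mean

def grad_via_h(c, x_mean, V):
    """The same gradient written in terms of h only."""
    s2 = c @ V @ c
    s = np.sqrt(s2)
    hvals = softplus(U * s + c @ x_mean)
    coef_Vc = np.sum(WTS * (U ** 2 - 1.0) * hvals) / s2
    coef_x = np.sum(WTS * U * hvals) / s
    return coef_Vc * (V @ c) + coef_x * x_mean

# Hypothetical posterior moments and parameters, augmented as in the text.
x_aug = np.array([0.3, -0.2, 1.0])       # [posterior mean; 1]
V_aug = np.zeros((3, 3))
V_aug[:2, :2] = [[0.5, 0.1], [0.1, 0.3]] # posterior covariance, zero-padded
c_aug = np.array([0.8, -0.5, 0.2])       # [c_i; d_i]

g_h = grad_via_h(c_aug, x_aug, V_aug)
g_hprime = grad_via_hprime(c_aug, x_aug, V_aug)

eps = 1e-5
g_fd = np.array([
    (expected_h(c_aug + eps * e, x_aug, V_aug)
     - expected_h(c_aug - eps * e, x_aug, V_aug)) / (2 * eps)
    for e in np.eye(3)
])
```

Both forms should agree with each other and with the finite-difference estimate to within quadrature and rounding error. The last component of the gradient corresponds to $d_i$.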

It is worth noting that the parameter updates derived in this section only require estimates of the marginal $P(x_t \mid \{y\}_1^T)$ and pairwise joint $P(x_{t-1}, x_t \mid \{y\}_1^T)$ state posteriors for all $t$, rather than the full state posterior $P(\{x\}_1^T \mid \{y\}_1^T)$. Thus, the E-step only needs to estimate the marginal and pairwise joint state posteriors.

5.7 Application to simulated data

E-step: Table 5.1 compares the state estimation accuracy of the UKS, GQ-EP, and Laplace-EP for state dimensionalities $p = 3, 10$ and observation dimensionality $q = 100$. We generated 50 state trajectories, each with 50 time points, and corresponding spike counts from (5.1), (5.3), and (5.4). The model parameters $\theta$ were randomly chosen within a range that provided biologically realistic spike counts (typically, 0 or 1 spike in each bin). This procedure was repeated three times for each state dimensionality. The time constant $k$ was set to 0.1.

For the UKS, we applied the classical precision 3 rule with $\gamma = \sqrt{3}$, which yields quadrature points that match some fourth order moments of a Gaussian distribution (Julier et al., 2000). The UKS requires computing and inverting a predicted observation covariance $P_{yy} \in \mathbb{R}^{q \times q}$ (Wan and van der Merwe, 2001; Briers et al., 2004; Julier and Uhlmann, 2004). Because the observations here are high-dimensional, with a large number of elements equal to 0, $P_{yy}$ was usually ill-conditioned. Thus, to make the inversion possible, we added a constant (0.5, which was determined by a systematic sweep) to the diagonal elements of $P_{yy}$, by analogy to ridge regression. For the UKS backward pass, we defined an artificial state prior and approximated the backward-time dynamics with a linear-Gaussian system (Briers et al., 2004).

For the EP-based estimators, the results are based on a single forward-backward pass. GQ-EP was tested using the modal Gaussian proposal distribution in tandem with each of the two quadrature rules from Section 5.5.2. For the classical rule (5.33), we set $\gamma^2 = r$ to ensure non-negative quadrature weights. Larger values of $\gamma$ led to higher estimation errors. The GP hyperparameter $b$ used to derive the custom quadrature rule was set to 0.1.

The UKS yielded higher estimation errors than the EP-based estimators because i) it makes Gaussian approximations in the observation space where the data are distinctly non-Gaussian, and ii) it approximates the backward-time dynamics of the non-linear system (5.1) using a linear-Gaussian system. For $p = 3$, the three EP-based estimators provide comparable performance. However, for $p = 10$, Laplace-EP is preferred and the custom quadrature rule that we derived using Gaussian processes outperforms the classical rule.

                      p = 3          p = 10
UKS                   1.94 ± 0.02    4.10 ± 0.03
GQ-EP, classical      0.93 ± 0.01    2.62 ± 0.02
GQ-EP, custom         0.93 ± 0.01    2.35 ± 0.02
Laplace-EP            0.94 ± 0.01    2.22 ± 0.01

Table 5.1: Root-mean-square error (mean ± sem) between the actual and estimated state trajectories.

Higher precision quadrature rules have been proposed (e.g., precision 5 rules (Lerner, 2002; Julier and Uhlmann, 2004)), but the techniques used to guarantee positive semidefinite covariances and non-negative data likelihoods for the classical precision 3 rule do not apply. In particular, there is no free parameter that can be chosen to keep weights non-negative. Furthermore, because more than one weight can be negative, it is not possible to guarantee valid covariances by expanding about a different point. Developing quadrature rules that further improve the estimation accuracy of GQ-EP, especially at higher state dimensionalities, is a topic of future research.

M-step: With a set of known model parameters $\theta$, state trajectories and spike counts can be generated as described above. Then, we can attempt to recover these model parameters using only the actual state trajectories, spike counts, and parameter update equations derived in Section 5.6. An arbitrarily small variance was added to the actual state trajectories, since the E-step typically estimates state trajectories with some uncertainty. This procedure returned reasonable estimates of the actual model parameters, thereby validating the parameter update equations derived in Section 5.6.

There are several known challenges related to the use of the EM algorithm to recover the actual parameters of specific non-linear systems (Roweis and Ghahramani, 2001). First, EM is a hill-climbing algorithm. As such, while exact EM is guaranteed to find a local maximum of the likelihood, it cannot be expected to find the global maximum in a non-convex problem. Thus, the initialization of the model parameters is critical. In Section 5.8, an initialization technique will be described that attempts to start EM in a reasonable region of parameter space. Second, different parameter settings within the non-linear dynamical systems model (5.1), (5.3), and (5.4) may result in not only the same data likelihood, but also the same marginal distribution on all possible spike trains. This is due to degeneracies in the model specification. For example, the dimensions of the hidden system are exchangeable, so that a simultaneous permutation of the elements of $p_1$, $V_1$, $W$, $Q$, and $\{c_i\}$ would result in an identical generative model for spike data. Unfortunately, in the case of a non-linear system, more subtle transformations can also lead to equivalent models. This is not in itself a problem; the issue is only to distinguish such equivalent models from genuine local optima. This can be achieved by studying the generative distributions of the models found at convergence. Where these are indistinguishable, we may conclude that any of the equivalent models is an equally suitable description of the data.

5.8 Application to neural data

As discussed in Section 5.7, good parameter initialization for EM is critical. Rather than initializing the parameters to randomly-chosen values, we propose a systematic way to find reasonable initial guesses of the parameters. The idea is to first fit a non-dynamical model to the training data, projecting the high-dimensional spike count vectors into a lower dimensional latent space without regard for dynamics. The fitted parameters of the non-dynamical model, along with the projected low-dimensional points, can then be used to initialize the parameters of the dynamical model.

The non-dynamical model, termed factor analysis with Poisson output (FAP), is defined as follows

$$x \sim \mathcal{N}(0, I) \tag{5.48}$$

$$y^i \mid x \sim \mathrm{Poisson}\left( h(c_i' x + d_i) \cdot \Delta \right), \tag{5.49}$$

where $x \in \mathbb{R}^{p \times 1}$ as in (5.1). Instead of a dynamical state model (5.1), the state model here (5.48) is a static Gaussian prior. The observation model (5.49) is identical to (5.4), except the time index is removed. The FAP model is fit to the training data using an approximate EM algorithm that employs the same approximations as for the dynamical model. The fitted parameters $\{c_i\}$ and $\{d_i\}$ are then used to initialize the corresponding parameters for the dynamical model in (5.4).
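The FAP generative model (5.48)-(5.49) can be sketched in a few lines. This is an illustration only: the softplus form of $h$, the parameter values, and the function names are assumptions and are not taken from the fits reported in this chapter.

```python
import numpy as np

def softplus(z):
    """Assumed form of h, chosen only for illustration."""
    return np.logaddexp(0.0, z)

def sample_fap(C, d, delta, n_samples, rng):
    """Draw latent states and spike counts from the FAP model.

    C: (q, p) matrix whose rows are c_i'; d: (q,) offsets; delta: bin width.
    Returns x with shape (n_samples, p) and counts y with shape (n_samples, q).
    """
    q, p = C.shape
    x = rng.standard_normal((n_samples, p))   # (5.48): x ~ N(0, I)
    rates = softplus(x @ C.T + d)             # h(c_i' x + d_i) for every unit
    y = rng.poisson(rates * delta)            # (5.49): Poisson spike counts
    return x, y

rng = np.random.default_rng(0)
q, p = 100, 3                                 # hypothetical sizes
C = rng.standard_normal((q, p))
d = np.full(q, 3.0)
x, y = sample_fap(C, d, delta=0.02, n_samples=200, rng=rng)
```

Because the latent states are drawn independently for each sample, the model carries no notion of time, which is what distinguishes it from the dynamical model.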

The parameters in the state model (5.1) are initialized by first taking the estimated means of $P(x \mid y)$ to be the projected low-dimensional points and labeling these points with their original time indices. Expressions for the maximum-likelihood parameters based on $P(\{x\}_1^T)$ can be derived. These expressions are virtually identical to the M-step parameter updates (cf. Section 5.6), but with the angle brackets removed. In other words, $W$, $Q$, $k$, $p_1$, and $V_1$ can be expressed in terms of the locations of the low-dimensional points. The maximum-likelihood parameters are then used to initialize EM for the dynamical model.

We fit the dynamical model (5.1), (5.3), and (5.4) with three latent dimensions ($p = 3$) to training data, consisting of delay activity preceding 70 reaches to the same target (30°). All results in this section are based on experiment G20040508. The behavioral task and neural recordings are detailed in Section 2.3. Spike counts were taken in non-overlapping $\Delta = 20$ ms bins at 20 ms time steps from 50 ms after target onset to 50 ms after the go cue. After every 20 EM iterations, the data likelihood $P(\{y\}_1^T)$ given the current parameter estimates was computed using sequential Monte Carlo (i.e., particle) techniques (Doucet et al., 2001). We used sequential Monte Carlo techniques so that the likelihood computation would not be subjected to the same approximations as the fitting algorithm. Figure 5.4 shows how the training and test data likelihoods change with EM iteration. The blue curves (“recurrent net”) correspond to the model defined by (5.1), (5.3), and (5.4). The red curves (“linear”) use a linear-Gaussian state model rather than (5.1), along with the initial state (5.3) and observation (5.4) models. For exact EM, the data likelihood is guaranteed to be non-decreasing with EM iteration for the training data. However, with the approximations detailed in Sections 5.4-5.6, this monotonicity is no longer guaranteed. In Figure 5.4 (left panel), the training likelihoods increase quickly then plateau. This near-monotonicity is encouraging given the numerous approximations made by the fitting algorithm. The intermediate parameter estimates can also be used to evaluate test data likelihoods, as shown in Figure 5.4 (right panel). The fact that the test likelihoods also increase quickly then plateau indicates that the fitted parameters generalize well to data not used in the fitting process. Note that the absolute training likelihoods should not be directly compared to the absolute test likelihoods since different data (and a different number of trials) are used in each case.

[Figure 5.4 plot. Panels: “Training data (70 trials)” and “Test data (100 trials)”. Horizontal axis: EM iteration (0 to 200). Traces: Linear, Recurrent net.]

Figure 5.4: Learning curves for the approximate EM algorithm developed in this chapter, with GQ-EP used for the E-step. Data likelihoods computed using sequential Monte Carlo techniques (2500 particles). Left panel: training data (70 trials). Right panel: test data (100 trials). Red traces: linear-Gaussian state model. Blue traces: recurrent network state model.

The appropriateness of different models was then evaluated by comparing data likelihoods. We compared the two dynamical models from Figure 5.4 to two non-dynamical models: the FAP model presented above and a peri-stimulus time histogram with Poisson noise (PSTHP). For the PSTHP, we first produced spike histograms by aligning trials on target onset and averaging the raw spike counts across trials. For each unit, an estimate of the time-varying mean firing rate was obtained. This served as the mean of a Poisson distribution at each timepoint, which was used to evaluate data likelihoods. Figure 5.5 compares the four models based on training and test data likelihoods. For the dynamical models and FAP, the estimated parameters after 200 EM iterations were used. On the training data (left panel), it is not surprising that the PSTHP yielded the highest likelihood, since it has the largest number of parameters. On the test data (right panel), however, the PSTHP gave the lowest likelihood, indicating that the PSTHP overfit the training data and was not able to generalize to the test data. The right panel shows that the two dynamical models do a better job than the two non-dynamical models at accounting for the test data.

[Figure 5.5 plot. Panels: “Training data (70 trials)” and “Test data (100 trials)”. Models compared in each panel: Linear, RN, FAP, PSTHP.]

Figure 5.5: Model comparison between two dynamical models (linear and recurrent state dynamics) and two non-dynamical models (FAP and PSTHP). Left panel: training data (70 trials). Right panel: test data (100 trials).
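The PSTHP construction described above amounts to estimating one Poisson mean per unit and time bin from the training trials, then scoring any set of trials under those means. The following sketch uses simulated counts; the array layout (trials × bins × units), the small `floor` that guards against taking the log of zero, the simulated rates, and the constant-rate comparison model are all assumptions made for illustration.

```python
import math
import numpy as np

log_factorial = np.vectorize(lambda n: math.lgamma(n + 1.0))

def fit_psth(train_counts, floor=1e-6):
    """Mean count per time bin and unit, averaged across trials.

    train_counts has shape (trials, bins, units). The floor is an assumption
    added here so that empty bins do not produce log(0) on held-out data.
    """
    return np.maximum(train_counts.mean(axis=0), floor)

def poisson_log_likelihood(counts, mean_counts):
    """Total Poisson log likelihood of counts under the given means."""
    ll = counts * np.log(mean_counts) - mean_counts - log_factorial(counts)
    return ll.sum()

# Simulated stand-in for aligned spike counts (hypothetical rates).
rng = np.random.default_rng(1)
n_bins, n_units = 30, 5
true_mean = 1.5 + np.sin(np.linspace(0, np.pi, n_bins))[:, None] * np.ones(n_units)
train = rng.poisson(true_mean, size=(70, n_bins, n_units))
test = rng.poisson(true_mean, size=(100, n_bins, n_units))

psth = fit_psth(train)                                 # one mean per bin and unit
flat = np.maximum(train.mean(axis=(0, 1)), 1e-6)       # one mean per unit only

train_ll_psth = poisson_log_likelihood(train, psth) / len(train)
train_ll_flat = poisson_log_likelihood(train, flat) / len(train)
test_ll_psth = poisson_log_likelihood(test, psth) / len(test)

# Small hand-checkable case: two trials, two bins, one unit.
toy = np.array([[[2], [0]], [[0], [2]]])
toy_ll = poisson_log_likelihood(toy, fit_psth(toy))
```

On the training trials the per-bin means are the maximum-likelihood estimates, so the PSTHP cannot score lower there than a model with fewer free rates. Whether that advantage carries over to held-out trials depends on how much of it reflects noise in the training set.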

The key comparison in Figure 5.5 is between dynamical and non-dynamical models. It is interesting to note, however, in Figures 5.4 and 5.5 that the model with linear state dynamics yielded slightly higher data likelihoods than the one with recurrent state dynamics. This can be explained by recalling that only reaches to one target are being considered in the current analysis. The optimal subspace hypothesis from Chapter 4 would therefore predict a single attractor (corresponding to the single reach target) underlying the neural activity. Given that systems with linear dynamics can exhibit a single attractor (e.g., all trajectories head toward the origin), it is not surprising that the two dynamical systems perform similarly on the test data. However, if we were to apply these models to neural data corresponding to reaches to different targets, each with its own attractor, we would expect the model with recurrent state dynamics to do a better job at accounting for the test data. Furthermore, although accurate systems identification may appear to be a prerequisite to trajectory reconstruction, this is not entirely the case. An approximate linear system model might still yield reasonable inferred trajectories if constrained well enough by the data and if the overall model has the correct output distribution. However, if we also desire to meaningfully interpret the learned latent space, it will be necessary to use a state model whose generative dynamics better match the hypothesized non-linear dynamics of the system being modeled.

We have thus far focused on the development of the statistical tools to fit the non-linear dynamical systems model to neural data. The next step is to use these tools to study the process of motor preparation. While this is the topic of future work, as will be discussed in Section 6, we present here some preliminary results that illustrate the types of analyses that can be performed and the types of insights that might be gained with the latent variable approach.

Recall that the central idea of this approach is that responses of different neurons reflect different views of a common dynamical process (in this case, the process of motor preparation) that underlies the neural activity. This common dynamical process can be extracted from the neural activity on a single-trial basis, represented as trajectories in the latent ($x$) space. Figure 5.6 shows the means of the marginal state posteriors $P(x_t \mid \{y\}_1^T)$ (black traces) for 100 test trials based on the dynamical model with recurrent state dynamics; note that a separate trajectory is inferred for each trial. The blue and green dots correspond to 50 ms after target presentation and 50 ms after the go cue, respectively. Despite the trial-to-trial variability in the delay period neural responses, the state evolves along a characteristic path on each trial. It could have been that the neural variability across trials would cause the state trajectory to evolve in markedly different ways on different trials. Even with the characteristic structure, however, the state trajectories are not all identical. This presumably reflects the fact that the motor planning process is internally regulated, and its timecourse may differ from trial to trial, even when the presented stimulus (in this case, the reach target) is identical. How these timecourses differ from trial to trial would have been obscured had we combined the neural data across trials, as with the NV in Chapter 4. From the results in Chapter 4, we would expect the trajectories to converge to a common subspace, as the hypothesized trajectories in Figure 4.1 suggest. While the trajectories in Figure 5.6 seem to all head towards a common region in latent space, the hypothesized convergence is not apparent. When attempting to relate Figure 5.6 to the NV results from Chapter 4, it is important to note that the activity of each neuron may be coupled to the latent state $x$ in a different way, depending on that neuron's $c_i$ and $d_i$ in (5.4). Thus, variability along a given dimension in latent space results in potentially different firing rate variability for each neuron. Even when this is taken into account, we are not able to reproduce the NV results from Chapter 4. Further investigation will be required to reconcile these results.

[Figure 5.6 plot. Legend: target onset + 50 ms; go cue + 50 ms.]

Figure 5.6: Inferred state trajectories (black) in latent $x$ space for 100 test trials, based on the model with recurrent state dynamics. Dots indicate 50 ms after target onset (blue) and 50 ms after the go cue (green). The radius of the green dots is logarithmically related to delay period duration (200, 750, or 1000 ms).

Is the low-dimensional description of the system dynamics adequate to describe the firing of all 75 recorded units? We can transform the inferred latent trajectories from Figure 5.6 into trial-by-trial inhomogeneous firing rates using the output relationship from (5.4)

$$\lambda_t^i = h(c_i' x_t + d_i), \tag{5.50}$$

where $\lambda_t^i$ is the imputed firing rate of the $i$th unit at time $t$. Figure 5.7 shows the imputed firing rates for the 40 most active units in our dataset. Each panel corresponds to a unit and each blue trace represents the imputed firing rate on a single trial. For comparison, the empirical firing rates obtained by directly averaging raw spike counts across the same test trials are shown in red. If the imputed firing rates truly reflect the rate functions underlying the observed spikes, then the mean behavior of the imputed firing rates should track the empirical firing rates. On the other hand, if the latent system were inadequate to describe the activity, we should expect to see dynamical features in the empirical firing that could not be captured by the imputed firing rates. The strong agreement observed in Figure 5.7 and across all 75 units suggests that this simple dynamical system is indeed capable of capturing significant components of the dynamics of this neural circuit. We can view the dynamical system approach as a form of non-linear dynamical embedding of point-process data. This is in contrast to most current embedding algorithms that rely on continuous data. Figure 5.6 effectively represents a three-dimensional manifold in the space of firing rates along which the dynamics unfold.

[Figure 5.7 plot. One panel per unit. Horizontal axis: time relative to target onset (ms), 0 to 1000.]

Figure 5.7: Imputed trial-by-trial firing rates (blue) and empirical firing rates (red). Gray vertical line indicates the time of the go cue. Each panel corresponds to one unit. For clarity, only test trials with delay periods of 1000 ms (35 trials) are plotted for each unit.

Beyond the model comparison from Figure 5.5 and the agreement of imputed means demonstrated by Figure 5.7, we would like to directly test the fit of the model to the neural spike data. Unfortunately, current goodness-of-fit methods for spike trains, such as those based on time-rescaling (Brown et al., 2002), cannot be applied directly to latent variable models. The difficulty arises because the average trajectory obtained from marginalizing over the latent variables in the system (by which we might hope to rescale the inter-spike intervals) is not designed to provide an accurate estimate of the trial-by-trial firing rate functions. Instead, each trial must be described by a distinct trajectory in latent space, which can only be inferred after observing the spike trains themselves. This could lead to overfitting. Developing an appropriate goodness-of-fit test is a topic of future research. One possibility is to infer latent trajectories using a subset of recorded neurons, and then test the quality of firing rate predictions for the remaining neurons.
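The mapping (5.50) from inferred latent trajectories to imputed firing rates, and the empirical rates they are compared against, can be sketched as follows. The softplus form of $h$, the array shapes, the placeholder values, and the function names are assumptions for illustration; in practice the latent trajectories would come from the E-step rather than from a random number generator.

```python
import numpy as np

def softplus(z):
    """Assumed form of h, chosen only for illustration."""
    return np.logaddexp(0.0, z)

def imputed_rates(x_traj, C, d):
    """(5.50): rate of unit i at time t on each trial, h(c_i' x_t + d_i).

    x_traj: (trials, time, p) latent means; C: (q, p); d: (q,).
    Returns an array of shape (trials, time, q).
    """
    return softplus(x_traj @ C.T + d)

def empirical_rates(counts, delta):
    """Trial-averaged spike counts converted to rates; counts is (trials, time, q)."""
    return counts.mean(axis=0) / delta

rng = np.random.default_rng(2)
trials, n_time, p, q = 35, 50, 3, 4           # hypothetical sizes
C = rng.standard_normal((q, p))
d = np.array([3.0, 2.5, 4.0, 3.5])
x_traj = 0.3 * rng.standard_normal((trials, n_time, p))   # placeholder trajectories

rates = imputed_rates(x_traj, C, d)           # one trace per trial and unit
mean_imputed = rates.mean(axis=0)             # mean behavior across trials

delta = 0.02
counts = rng.poisson(rates * delta)           # spike counts consistent with the rates
emp = empirical_rates(counts, delta)
rates_at_origin = imputed_rates(np.zeros((1, 1, p)), C, d)[0, 0]
```

Comparing `mean_imputed` with `emp` unit by unit corresponds to the comparison of blue and red traces described above. The held-out-neuron test suggested in the text would additionally require inferring `x_traj` from a subset of units, which is not shown here.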



In this chapter, we have presented a dynamical systems approach to studying the process of motor preparation. This approach is capable of providing single-trial characterizations of the neural activity, as well as insights into the types of dynamical behaviors exhibited by the underlying neural circuitry. We have proposed a non-linear dynamical systems model with recurrent state dynamics and Poisson outputs. Off-the-shelf techniques to fit such models to neural data are inadequate due to the non-linearities in both the state and observation models. Thus, we have developed an approximate EM algorithm for model fitting that uses EP for state estimation. While we are not the first to apply EP to dynamical systems models (Heskes and Zoeter, 2002, 2003), previous applications have typically involved one-dimensional states and observations (Ypma and Heskes, 2005; Zoeter et al., 2004). In contrast, our application involves higher dimensional states and observations (tens to hundreds). This leads to issues in model fitting that have not been encountered previously and we have presented solutions to these issues. Furthermore, we have proposed a novel approach to initializing the model parameters for the approximate EM algorithm. These developments will enable the study of motor preparation on single trials and allow us to further test the optimal subspace hypothesis proposed in Chapter 4.


Chapter 6

Conclusions and future work

In this dissertation, we sought to advance two primary areas. The first was to improve the performance of neural prosthetic systems for assisting disabled patients. The performance of such systems depends directly on factors such as the number of neurons recorded simultaneously, the decoding algorithm used, and the conceptual design of the system itself. While increasing the number of neurons generally increases system performance, the number of neurons available is usually limited by the electrode implant used, surgical risks, and biological processes that degrade the signal-to-noise ratio over time. The challenge, therefore, is to improve system performance with a fixed number of neurons.

Chapter 2 presented algorithmic advances to improve the accuracy of decoding goal-directed reaches. By defining a trajectory model that better captures the statistics of goal-directed movements, we obtained decoded trajectories that were more accurate than those based on existing trajectory models. We then showed how the accuracy could be further improved by incorporating prior information about the goal identity in a principled way. At the time of this dissertation, a manuscript is in revision (Yu, Kemere, Santhanam, Afshar, Ryu, Meng, Sahani, Shenoy, J Neurophysiol). In Chapter 3, we designed and implemented a real-time system that decodes only the intended reach endpoint and forgoes decoding the moment-by-moment details of the trajectory. By decoding sequences of intended reach endpoints in rapid succession using unprecedentedly brief neural recordings, we achieved a greater than fourfold increase in communication throughput compared to the current state of the art. These results have been reported as Santhanam, Ryu, Yu, Afshar, Shenoy, Nature (2006). Taken together, the work presented in these two chapters should substantially increase the clinical viability of neural prosthetic systems in humans.

Extensions of Chapter 2 include applying the MTM framework to settings with novel reach goals and larger numbers of reach goals. The ability of the MTM decoder to estimate trajectories to novel goals depends on how tightly the training trajectories are clustered. Because the trajectory model encodes the relative likelihoods of different trajectories, less tightly clustered training trajectories will in general lead to greater flexibility in estimating reaches to novel goals. The tradeoff, however, is that there will also be greater uncertainty in the estimated trajectories to known goals. As for increasing the number of reach goals, the primary analyses in Chapter 2 were based on eight reach goals. Our preliminary results based on 16 reach goals, as described in Section 2.6, are encouraging. Additional experiments will need to be conducted to further explore the performance of the MTM decoder with larger numbers of reach goals.

There are four other possible directions of future work. First, we would like to extend the mathematical framework of the MTM from M discrete reach goals to a continuum of goal locations (i.e., replace the summation in (2.2) with an integral). Second, the MTM framework can be implemented and tested in a real-time setting; all results in Chapter 2 were obtained in an off-line setting. Third, our work with 16 reach goals also suggests that, when larger numbers of goals are used, more sophisticated firing rate models may need to be developed to capture the firing rate profiles (cf. Figure 2.5) across an increased number of reach goals. Fourth, the MTM framework can be viewed as a discrete approximation to a complete model of neural motor control. In Chapter 2, we grouped trajectories by reach goal. In other contexts, the trajectories can be grouped by other criteria such as reach speed or reach curvature. As with goal identity, prior information about reach speed can be obtained from delay activity in PMd and M1 (Churchland et al., 2006a).
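To make the first direction concrete, the following sketch assumes that (2.2) has the usual mixture form, in which the trajectory estimate is a sum over goals weighted by the posterior probability of each goal. That form, and the symbols $g$ (a continuous goal location) and $\mathcal{G}$ (the workspace of possible goals), are assumptions introduced here for illustration only.

```latex
% Assumed discrete form of (2.2): M reach goals indexed by m
P(\mathbf{x}_t \mid \{\mathbf{y}\})
  = \sum_{m=1}^{M} P(\mathbf{x}_t \mid \{\mathbf{y}\}, m)\, P(m \mid \{\mathbf{y}\})

% Continuum analogue: goal location g ranging over the workspace \mathcal{G}
P(\mathbf{x}_t \mid \{\mathbf{y}\})
  = \int_{\mathcal{G}} P(\mathbf{x}_t \mid \{\mathbf{y}\}, g)\, p(g \mid \{\mathbf{y}\})\, dg
```

Under this reading, the per-goal trajectory models would need to vary smoothly with $g$ so that the integral can be evaluated or approximated.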



The following are two extensions to the work presented in Chapter 3. First, we chose the target locations in a somewhat ad hoc way (cf. Figure 3.4). We tried various target configurations and chose those that yielded the best target discriminability based on delay activity. It would be interesting to know if there is a systematic way of choosing the best target configurations given a certain number of targets. This would allow us to either confirm our current choice of target configurations or discover new configurations that outperform existing ones. We’ve already done some work in this area (Cunningham et al., 2006). Second, we assumed that the time during which the subject was planning was known. In other words, we knew exactly when to start and stop integrating the neural activity (cf. Figure 3.3, purple shaded interval). In a clinical setting, the subject will want to decide on his/her own when to select targets. It will be up to the prosthetic system to determine, based solely on the neural activity, whether the subject is planning at any given point in time. This may be done using either finite state machines (Afshar et al., 2005) or hidden Markov models (Kemere, 2006). Experiments are already underway to design such a system.
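As a rough illustration of the hidden Markov model option, the sketch below filters a two-state model (baseline versus planning) with Poisson spike counts. It is a minimal example under stated assumptions, not the system under development: the two-state structure, the firing rates, the transition probabilities, and the function name `forward_filter` are all hypothetical.

```python
import numpy as np

def forward_filter(counts, rates, trans, prior):
    """Filtered state probabilities P(state_t | counts_1..t) for a Poisson HMM.

    counts: (T, N) spike counts per time bin and neuron
    rates:  (S, N) expected count per bin for each state and neuron
    trans:  (S, S) with trans[i, j] = P(state_t = j | state_{t-1} = i)
    prior:  (S,) initial state probabilities
    """
    log_rates = np.log(rates)
    total_rate = rates.sum(axis=1)
    posterior = np.zeros((counts.shape[0], rates.shape[0]))
    belief = prior
    for t, y in enumerate(counts):
        predicted = prior if t == 0 else belief @ trans
        # Poisson log-likelihood per state; the log(y!) term is the same for all states
        loglik = log_rates @ y - total_rate
        unnorm = predicted * np.exp(loglik - loglik.max())
        belief = unnorm / unnorm.sum()
        posterior[t] = belief
    return posterior

rng = np.random.default_rng(1)
rates = np.array([np.full(20, 0.2),    # state 0: baseline (hypothetical values)
                  np.full(20, 0.8)])   # state 1: planning (hypothetical values)
states = np.repeat([0, 1], 50)         # 50 baseline bins, then 50 planning bins
counts = rng.poisson(rates[states])
trans = np.array([[0.95, 0.05], [0.05, 0.95]])
posterior = forward_filter(counts, rates, trans, prior=np.array([0.5, 0.5]))
```

Thresholding `posterior[:, 1]` would give one possible rule for deciding when to start integrating neural activity for target selection.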

The second primary area in which this dissertation seeks to contribute is to elucidate the process of motor preparation. Most studies to date of neural activity related to motor preparation have been descriptive, without attempting to uncover the mechanisms underlying that activity on a systems level. In Chapter 4, we tested the hypothesis that firing rates in premotor cortex become optimized during motor preparation, approaching their ideal values over time. We found that the across-trial variability of neural responses decreased during the delay period and was predictive of the reaction time of the impending movement, consistent with the hypothesis. This suggested that the timecourse of neural variability approximately tracks the progress of motor preparation. These results have been reported as Churchland, Yu, Ryu, Santhanam, Shenoy, J Neurosci (2006).
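The idea of tracking across-trial variability over time can be sketched as follows. The measure used here, the across-trial variance of windowed spike counts divided by their mean, is a simple stand-in chosen for illustration and is not claimed to match the exact definition used in Chapter 4; the simulated data and all parameter values are hypothetical.

```python
import numpy as np

def normalized_variance(spikes, window):
    """Across-trial variance of windowed spike counts divided by their mean.

    spikes: (trials, bins) array of 0/1 spike indicators for one unit.
    Returns one value per window position.
    """
    kernel = np.ones(window)
    counts = np.array([np.convolve(trial, kernel, mode="valid") for trial in spikes])
    mean = counts.mean(axis=0)
    var = counts.var(axis=0, ddof=1)
    return var / np.maximum(mean, 1e-12)

rng = np.random.default_rng(2)
trials, bins, p = 400, 1000, 0.04               # hypothetical: 1 ms bins, 40 spikes/s
gain = rng.uniform(0.2, 1.8, size=(trials, 1))  # trial-specific gain early in the delay
prob = np.full((trials, bins), p)
prob[:, :500] *= gain                           # rates differ across trials early, agree late
spikes = (rng.random((trials, bins)) < prob).astype(float)
nv = normalized_variance(spikes, window=100)
```

In this toy example the measure is high while firing rates differ from trial to trial and drops once they converge, which is the qualitative pattern described above.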

To extend the characterization of this process, in Chapter 5 we developed and validated latent variable methods that enable the study of motor preparation on single trials. These methods simultaneously estimate the system parameters and single-trial dynamical trajectories, as reported in Yu, Afshar, Santhanam, Ryu, Shenoy, Sahani, NIPS (2006) and Yu, Shenoy, Sahani, IEEE NSSPW (2006). Preliminary analyses indicated that the dynamics underlying delay period activity can be captured by a low-dimensional non-linear dynamical systems model, with underlying recurrent structure and stochastic point-process output. While promising, it is important to note that we haven’t yet learned anything about the brain using these latent variable methods. In this dissertation, we focused on the development of the statistical tools. The next steps are to determine whether what we’re seeing in the latent space is “real” and to show how these methods provide insights into the process of motor preparation that cannot be obtained using existing techniques. This will likely involve relating the single-trial trajectories to single-trial behavior (e.g., reaction time). We also need to compare models with different latent dimensionalities and possibly variants of the proposed dynamical model. For example, we may find it necessary to take into account refractory periods in the observation model (5.4). Furthermore, up to now, we have only considered neural data preceding reaches to a single reach target. The dynamical model should be fit to delay activity from trials with different reach targets. In the longer term, we seek to identify the structure of the dynamics underlying motor preparatory activity (i.e., the rules governing the system dynamics). We hope to gain insights into the computational properties of the underlying neural circuit by studying the number and relationship of attractors, the appearance of oscillatory limit cycles, etc. Such findings can then be fed into the work described in Chapters 2 and 3 to design higher-performance neural prosthetic systems.
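To illustrate the class of model summarized above (recurrent non-linear state dynamics with Poisson spike-count outputs), the following sketch simulates one trial. The specific state update, the tanh non-linearity, the softplus link, and every parameter value are assumptions made for this example; they are not the fitted model or the exact equations of Chapter 5.

```python
import numpy as np

def simulate(T, W, C, d, k=0.1, q=0.01, delta=0.01, rng=None):
    """Simulate latent states and spike counts.

    State update (assumed form): x_t = (1 - k) x_{t-1} + k W tanh(x_{t-1}) + noise
    Observation (assumed form): y_t ~ Poisson(softplus(C x_t + d) * delta)
    """
    rng = np.random.default_rng() if rng is None else rng
    p, n = W.shape[0], C.shape[0]
    x = np.zeros((T, p))
    y = np.zeros((T, n), dtype=int)
    state = rng.normal(scale=0.1, size=p)
    for t in range(T):
        noise = rng.normal(scale=np.sqrt(q), size=p)
        state = (1 - k) * state + k * (W @ np.tanh(state)) + noise
        rate = np.logaddexp(0.0, C @ state + d)   # softplus keeps rates positive
        x[t] = state
        y[t] = rng.poisson(rate * delta)
    return x, y

rng = np.random.default_rng(3)
W = rng.normal(size=(3, 3))    # hypothetical recurrent weights
C = rng.normal(size=(75, 3))   # hypothetical loading of 75 units on 3 latent dimensions
d = np.full(75, 3.0)           # hypothetical offsets
x, y = simulate(T=100, W=W, C=C, d=d, rng=rng)
```

Simulations of this kind are one way to check whether a fitting procedure recovers known attractors or limit cycles before applying it to recorded data.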

On the statistical end, we would like to continue to improve the approximate inference and learning techniques for model-fitting. There are three major avenues, among others, to explore. First, variational approaches to the EM algorithm (Jordan et al., 1999) for non-linear dynamical systems can be developed, where the state posterior is approximated as a factored distribution (Valpola and Karhunen, 2002). For such methods, it can be shown that a lower bound on the data likelihood is guaranteed not to decrease at each EM iteration. Second, while sequential Monte Carlo techniques have been shown to be effective for inference in non-linear dynamical systems (e.g., the particle smoother) (Doucet et al., 2000), they remain unproven for learning the model parameters. The difficulty is that the joint state posterior, which is needed for learning, is typically high-dimensional and requires an exponentially large number of samples to represent it. Developing sampling methods that can learn is an active research area (Doucet and Tadic, 2003; Neal et al., 2004). Third, we should explore hybrid approaches between deterministic techniques employing Gaussian approximations and stochastic sampling techniques for inference (Wan and van der Merwe, 2001). The idea is to try to combine the ability of deterministic techniques to learn model parameters with the ability of sampling techniques to represent arbitrary probability distributions. With all these techniques, including the one developed in Chapter 5, we should continue to investigate the impact of local optima and model degeneracies, as well as assess the stability of parameter learning.
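For the first avenue, the lower bound in question takes the standard form below. The symbols $\theta$ (model parameters), $q$ (the approximating distribution), and $\mathcal{F}$ (the bound) are introduced here for illustration, and the factorization across time steps is one possible choice assumed for this sketch.

```latex
\log P(\{\mathbf{y}\} \mid \theta)
  \;\ge\; \mathcal{F}(q, \theta)
  = \mathbb{E}_{q}\big[\log P(\{\mathbf{x}\}, \{\mathbf{y}\} \mid \theta)\big]
    - \mathbb{E}_{q}\big[\log q(\{\mathbf{x}\})\big],
\qquad
q(\{\mathbf{x}\}) = \prod_{t=1}^{T} q_t(\mathbf{x}_t)
```

The E-step maximizes $\mathcal{F}$ over $q$ within the factored family and the M-step maximizes it over $\theta$, so neither step can lower the bound.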

The techniques presented in Chapters 4 and 5 can be applied to behavioral tasks and brain areas other than those explored in this dissertation. Especially relevant would be other cognitive tasks, such as decision-making, where different “cognitive paths” may be taken to achieve the same ends and where there is no overt behavior while the corresponding neural activity is being played out over time. For these tasks, it would be interesting to study the rate at which the NV drops and the level at which it plateaus, and compare them to the results shown in Chapter 4. Using latent variable techniques similar to those developed in Chapter 5, we can attempt to track the subject’s cognitive state on a single-trial basis, which may offer insights into the cognitive process being studied. In designing these other experiments, it is useful to bear in mind that these latent variable methods exploit the trial-to-trial variability in cognitive paths to learn the structure of the underlying dynamics. If the cognitive path taken on each trial were the same, it would be difficult to uncover the rules governing the system dynamics. This is in contrast to many current experimental methodologies where we try to make each trial as identical as possible so that trial-averaging is possible. While this is not intended to imply that we should be sloppy with how our experiments are conducted, it does suggest that we should embrace trial-to-trial variability rather than going out of our way to try to eliminate it.



The following is my complete list of publications to provide easy reference for those interested in further reading.

Refereed journal papers

• Yu BM, Kemere C, Santhanam G, Afshar A, Ryu SI, Meng TH, Sahani M, Shenoy KV. Mixture of trajectory models for neural decoding of goal-directed movements. In revision, J Neurophysiol.

• Santhanam G, Ryu SI, Yu BM, Afshar A, Shenoy KV. (2006). A High-Performance Brain-Computer Interface. Nature, 442:195-198.

• Churchland MM, Yu BM, Ryu SI, Santhanam G, Shenoy KV. (2006). Neural variability in premotor cortex provides a signature of motor preparation. J Neurosci, 26(14):3697-3712.

Refereed conference papers

• Yu BM, Shenoy KV, Sahani M. (2006). Expectation propagation for inference in non-linear dynamical models with Poisson observations. IEEE Nonlinear Statistical Signal Processing Workshop, Cambridge, UK.

• Yu BM, Afshar A, Santhanam G, Ryu SI, Shenoy KV, Sahani M. (2006). Extracting dynamical structure embedded in neural activity. In Y. Weiss, B. Schölkopf, and J. Platt, eds., Adv. Neural Info. Processing Sys. 18, 1545-1552, Cambridge, MA. MIT Press.

• Cunningham JP, Yu BM, Shenoy KV. (2006). Optimal target placement for neural communication prostheses. IEEE EMBS 28th Annual Meeting: 2912-2915.

• Santhanam G, Ryu SI, Yu BM, Afshar A, Shenoy KV. (2005). A high performance neurally-controlled cursor positioning system. IEEE EMBS 2nd Annual Conference on Neural Engineering: 494-500.



• Yu BM, Ryu SI, Santhanam G, Churchland MM, Shenoy KV. (2004). Improving neural prosthetic system performance by combining plan and peri-movement activity. IEEE EMBS 26th Annual Meeting: 4516-4519.

• Kemere CT, Santhanam G, Yu BM, Ryu SI, Meng TH, Shenoy KV. (2004). Model-based decoding of reaching movements for prosthetic systems. IEEE EMBS 26th Annual Meeting: 4524-4528.

• Shenoy KV, Churchland MM, Santhanam G, Yu BM, Ryu SI. (2003). Influence of movement speed on plan activity in monkey pre-motor cortex and implications for high-performance neural prosthetic systems design. IEEE EMBS 25th Annual Meeting: 1897-1900.

• Kemere CT, Santhanam G, Yu BM, Shenoy KV, Meng TH. (2002). Decoding of plan and peri-movement neural signals in prosthetic systems. IEEE Workshop on Signal Processing Systems: 276-283.

Unrefereed conference papers / abstracts

2006

• Batista AP, Santhanam G, Yu BM, Ryu SI, Afshar A, Shenoy KV. (2006). Heterogeneous reference frames for reaching in macaque PMd. Neural Control of Movement (NCM) Annual Meeting. Abstract F-12.

• Batista AP, Yu BM, Santhanam G, Ryu SI, Afshar A, Shenoy KV. (2006). Influence of eye position on end-point decoding accuracy in dorsal premotor cortex. Soc. for Neurosci. abstracts. Program No. 148.8.

• Chestek CA, Batista AP, Yu BM, Santhanam G, Ryu SI, Afshar A, Shenoy KV. (2006). The relationship between PMd neural activity and reaching behavior is stable in highly trained macaques. Soc. for Neurosci. abstracts. Program No. 148.5.

• Cunningham JP, Yu BM, Shenoy KV. (2006). Optimal target placement for neural communication prostheses. Soc. for Neurosci. abstracts. Program No. 256.21.



• Kemere C, Yu BM, Santhanam G, Ryu SI, Afshar A, Meng TH, Shenoy KV. (2006). Hidden Markov models for spatial and temporal estimation for prosthetic control. Soc. for Neurosci. abstracts. Program No. 256.17.

• Shenoy KV, Santhanam G, Ryu SI, Afshar A, Yu BM, Gilja V, Linderman MD, Kalmar RS, Cunningham JP, Kemere CT, Batista AP, Churchland MM, Meng TH. (2006). Increasing the performance of cortically-controlled prostheses. IEEE EMBS 28th Annual Meeting.

2005

• Afshar A, Achtman N, Santhanam G, Ryu SI, Yu BM, Shenoy KV. (2005). Free-paced target estimation in a delayed reach task. Soc. for Neurosci. abstracts. Program No. 401.13.

• Batista AP, Santhanam G, Yu BM, Ryu SI, Afshar A, Shenoy KV. (2005). Heterogeneous coordinate frames for reaching in macaque PMd. Soc. for Neurosci. abstracts. Program No. 363.12.

• Churchland MM, Yu BM, Ryu SI, Santhanam G, Afshar A, Shenoy KV. (2005). Neural variability in premotor cortex provides a signature of motor preparation. Computational and Systems Neuroscience (Cosyne) Meeting. Abstract 13.

• Churchland MM, Yu BM, Ryu SI, Santhanam G, Shenoy KV. (2005). Motor preparation and settling activity in PMd. Neural Control of Movement (NCM) Annual Meeting. Abstract E-13.

• Gilja V, Kalmar RS, Santhanam G, Ryu SI, Yu BM, Afshar A, Shenoy KV. (2005). Trial-by-trial mean normalization improves plan period reach target decoding. Soc. for Neurosci. abstracts. Program No. 519.18.

• Kalmar RS, Gilja V, Santhanam G, Ryu SI, Yu BM, Afshar A, Shenoy KV. (2005). PMd delay activity during rapid sequential movement plans. Soc. for Neurosci. abstracts. Program No. 519.17.

• Sahani M, Yu BM, Afshar A, Santhanam G, Ryu SI, Shenoy KV. (2005). Extracting dynamical structure embedded in neural activity. Soc. for Neurosci. abstracts. Program No. 689.14.

• Santhanam G, Ryu SI, Yu BM, Afshar A, Shenoy KV. (2005). Intra-cortical communication prosthesis design. Soc. for Neurosci. abstracts. Program No. 519.19.

• Yu BM, Afshar A, Santhanam G, Ryu SI, Shenoy KV, Sahani M. (2005). Extracting dynamical structure embedded in motor preparatory activity. Computational and Systems Neuroscience (Cosyne) meeting. Abstract 290.

• Yu BM, Kemere C, Santhanam G, Afshar A, Ryu SI, Meng TH, Sahani M, Shenoy KV. (2005). Mixture of trajectory models for neural decoding of goal-directed movements. Soc. for Neurosci. abstracts. Program No. 520.18.

• Yu BM, Santhanam G, Ryu SI, Shenoy KV. (2005). Feedback-directed state transition for recursive estimation of goal-directed trajectories. Computational and Systems Neuroscience (Cosyne) meeting. Abstract 291.

2004

• Batista AP, Yu BM, Santhanam G, Ryu SI, Shenoy KV. (2004). Coordinate frames for reaching in macaque dorsal premotor cortex (PMd). Soc. for Neurosci. abstracts. Program No. 191.6.

• Churchland MM, Yu BM, Ryu SI, Santhanam G, Shenoy KV. (2004). Time-course of PMd processing predicts reaction time. Soc. for Neurosci. abstracts. Program No. 603.5.

• Kemere C, Santhanam G, Ryu SI, Yu BM, Meng TH, Shenoy KV. (2004). Reconstruction of arm trajectories from plan and peri-movement motor cortical activity. Soc. for Neurosci. abstracts. Program No. 884.12.

• Kemere C, Santhanam G, Ryu SI, Yu BM, Meng TH, Shenoy KV. (2004). Reconstruction of arm trajectories from plan and peri-movement motor cortical activity. Neural Prosthesis Program meeting, National Institutes of Health.

• Ryu SI, Santhanam G, Yu BM, Shenoy KV. (2004). High speed neural prosthetic icon positioning. Soc. for Neurosci. abstracts. Program No. 263.1.

• Ryu SI, Yu BM, Churchland MM, Shenoy KV. (2004). Premotor cortex plan activity used to decode upcoming reach speed for high-performance neural prosthetic system design. 72nd Annual Meeting American Association of Neurological Surgeons. Article ID 19873.

• Ryu SI, Santhanam G, Yu BM, Shenoy KV. (2004). The speed at which reach movement plans can be decoded from the cortex and its implications for high performance neural prosthetic arm systems. 54th Annual Meeting Congress of Neurological Surgeons. Article ID 785, 55(2):481.

• Santhanam G, Ryu SI, Yu BM, Shenoy KV. (2004). High information transmission rates in a neural prosthetic system. Soc. for Neurosci. abstracts. Program No. 263.2.

• Yu BM, Ryu SI, Santhanam G, Churchland MM, Shenoy KV. (2004). Improving classification performance of neural prosthetic systems by combining plan and peri-movement activity. Soc. for Neurosci. abstracts. Program No. 884.11.

• Yu BM, Ryu SI, Churchland MM, Shenoy KV. (2004). Improving neural prosthetic system performance for a fixed number of neurons. Computational and Systems Neuroscience (Cosyne) meeting. Abstract 219.


Appendix A

Details of Expectation Propagation

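A.1 Inference

The following summarizes how to update the $\alpha_t$ and $\beta_t$.

1. Initialize for $t = 1, \ldots, T$

$\mathbf{x}_t^t = \mathbf{0}, \qquad (V_t^t)^{-1} = [0]$

$\boldsymbol{\mu}_t = \mathbf{0}, \qquad (\Sigma_t)^{-1} = [0]$

2. Forward Pass

Summary: At a given $t$, fix $\alpha_{t-1}$, $\beta_t$ and update $\alpha_t$.

(a) For $t = 1$

i. Find $\hat{\mathbf{x}}_1 \in \mathbb{R}^{p \times 1}$ and $\hat{V}_1 \in \mathbb{R}^{p \times p}$ using one of the moment estimation techniques described in Section 5.5

$P(\mathbf{x}_1)\, P(\mathbf{y}_1 \mid \mathbf{x}_1)\, \beta_1(\mathbf{x}_1) \;\overset{\sim}{\propto}\; \mathcal{N}(\mathbf{x}_1;\, \hat{\mathbf{x}}_1, \hat{V}_1).$ (A.1)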


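ii. Matching the means and covariances in

$\alpha_1(\mathbf{x}_1)\, \beta_1(\mathbf{x}_1) \propto \mathcal{N}(\mathbf{x}_1;\, \hat{\mathbf{x}}_1, \hat{V}_1),$ (A.2)

update

$(V_1^1)^{-1} = (\hat{V}_1)^{-1} - \Sigma_1^{-1}$ (A.3)

$\mathbf{x}_1^1 = V_1^1 \left( (\hat{V}_1)^{-1} \hat{\mathbf{x}}_1 - \Sigma_1^{-1} \boldsymbol{\mu}_1 \right).$ (A.4)

(b) For $t = 2, \ldots, T$

i. Find $\hat{\mathbf{x}}_t$ and $\hat{V}_t$ in (5.23).

ii. Matching the means and covariances in (5.24), update

$(V_t^t)^{-1} = (\hat{V}_{t,22})^{-1} - \Sigma_t^{-1}$ (A.5)

$\mathbf{x}_t^t = V_t^t \left( (\hat{V}_{t,22})^{-1} \hat{\mathbf{x}}_{t,2} - \Sigma_t^{-1} \boldsymbol{\mu}_t \right).$ (A.6)

3. Backward Pass

Summary: At a given $t$, fix $\alpha_{t-1}$, $\beta_t$ and update $\beta_{t-1}$.

(a) For $t = T, \ldots, 2$

i. Find $\hat{\mathbf{x}}_t$ and $\hat{V}_t$ in (5.23).

ii. Matching the means and covariances in (5.25), update

$\Sigma_{t-1}^{-1} = (\hat{V}_{t,11})^{-1} - (V_{t-1}^{t-1})^{-1}$ (A.7)

$\boldsymbol{\mu}_{t-1} = \Sigma_{t-1} \left( (\hat{V}_{t,11})^{-1} \hat{\mathbf{x}}_{t,1} - (V_{t-1}^{t-1})^{-1} \mathbf{x}_{t-1}^{t-1} \right).$ (A.8)

4. Iterate Steps 2 and 3 until convergence.

For completeness, we define $\beta_T = 1$. Its proxy in the above moment-matching scheme is to set $\boldsymbol{\mu}_T = \mathbf{0}$ and $(\Sigma_T)^{-1} = [0]$, which, correctly, are never updated. Furthermore, it is not necessary to compute $\alpha_T$ until the final iteration, since no updates depend on it.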
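The forward and backward updates above share one operation: dividing one Gaussian by another in terms of precisions and precision-weighted means. The sketch below illustrates that operation in isolation. The function name `divide_gaussian` and the numerical values are hypothetical, and the code does not check whether the resulting precision is positive definite, which such a division does not guarantee in general.

```python
import numpy as np

def divide_gaussian(mean_num, cov_num, mean_den, prec_den):
    """Moments of the Gaussian proportional to N(mean_num, cov_num) / N(mean_den, prec_den^-1).

    The denominator is passed as a precision matrix so that the initial
    value (Sigma_t)^-1 = [0] can be used directly.
    """
    prec_num = np.linalg.inv(cov_num)
    cov = np.linalg.inv(prec_num - prec_den)                  # same pattern as (A.5)
    mean = cov @ (prec_num @ mean_num - prec_den @ mean_den)  # same pattern as (A.6)
    return mean, cov

# Check: multiply two Gaussians, then divide one factor back out.
m1, V1 = np.array([1.0, -0.5]), np.array([[2.0, 0.3], [0.3, 1.0]])
m2, V2 = np.array([0.2, 0.4]), np.array([[1.5, -0.2], [-0.2, 0.8]])
P1, P2 = np.linalg.inv(V1), np.linalg.inv(V2)
V12 = np.linalg.inv(P1 + P2)
m12 = V12 @ (P1 @ m1 + P2 @ m2)
m_back, V_back = divide_gaussian(m12, V12, m2, P2)
```

With a zero denominator precision, as at initialization, the function returns its numerator unchanged.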



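A.2 Data likelihood

Equations (5.8) and (5.9) cannot be directly used to find the data likelihood because the messages and beliefs are only known up to a proportionality constant, as shown in (5.16) and (5.17). Instead, we first express $\alpha_t$ and $\beta_t$ in terms of these constants. From (5.6) and (5.16),

$\alpha_t(\mathbf{x}_t) = P(\mathbf{x}_t \mid \{\mathbf{y}\}_1^t)\, P(\{\mathbf{y}\}_1^t)$

$= \mathcal{N}(\mathbf{x}_t;\, \mathbf{x}_t^t, V_t^t)\, P(\{\mathbf{y}\}_1^t),$ (A.9)

where the notation $\mathcal{N}(\mathbf{x}_t;\, \mathbf{x}_t^t, V_t^t)$ explicitly indicates that $\mathbf{x}_t$ is Gaussian-distributed with mean $\mathbf{x}_t^t$ and covariance $V_t^t$. Similarly, from (5.7) and (5.17),

$\beta_t(\mathbf{x}_t) = \mathcal{N}(\mathbf{x}_t;\, \boldsymbol{\mu}_t, \Sigma_t) \cdot \kappa_t,$ (A.10)

where

$\kappa_t = \int P(\{\mathbf{y}\}_{t+1}^T \mid \mathbf{x}_t)\, d\mathbf{x}_t.$ (A.11)

Substituting (A.9) and (A.10) into (5.8) and (5.9) and integrating,

$\frac{P(\{\mathbf{y}\}_1^t)\, \kappa_t}{P(\{\mathbf{y}\}_1^T)} \int \mathcal{N}(\mathbf{x}_t;\, \mathbf{x}_t^t, V_t^t)\, \mathcal{N}(\mathbf{x}_t;\, \boldsymbol{\mu}_t, \Sigma_t)\, d\mathbf{x}_t = 1$ (A.12)

$\frac{P(\{\mathbf{y}\}_1^{t-1})\, \kappa_t}{P(\{\mathbf{y}\}_1^T)} \iint \mathcal{N}(\mathbf{x}_{t-1};\, \mathbf{x}_{t-1}^{t-1}, V_{t-1}^{t-1})\, P(\mathbf{x}_t \mid \mathbf{x}_{t-1}) \cdot P(\mathbf{y}_t \mid \mathbf{x}_t)\, \mathcal{N}(\mathbf{x}_t;\, \boldsymbol{\mu}_t, \Sigma_t)\, d\mathbf{x}_{t-1}\, d\mathbf{x}_t = 1.$ (A.13)

Solving for $P(\{\mathbf{y}\}_1^t)$ in (A.12) and $P(\{\mathbf{y}\}_1^{t-1})$ in (A.13), these two quantities can then be used to compute the conditional likelihood

$P(\mathbf{y}_t \mid \{\mathbf{y}\}_1^{t-1}) = \frac{P(\{\mathbf{y}\}_1^{t-1}, \mathbf{y}_t)}{P(\{\mathbf{y}\}_1^{t-1})}$ (A.14)

$= \frac{\iint \mathcal{N}(\mathbf{x}_{t-1};\, \mathbf{x}_{t-1}^{t-1}, V_{t-1}^{t-1})\, P(\mathbf{x}_t \mid \mathbf{x}_{t-1})\, P(\mathbf{y}_t \mid \mathbf{x}_t)\, \mathcal{N}(\mathbf{x}_t;\, \boldsymbol{\mu}_t, \Sigma_t)\, d\mathbf{x}_{t-1}\, d\mathbf{x}_t}{\int \mathcal{N}(\mathbf{x}_t;\, \mathbf{x}_t^t, V_t^t)\, \mathcal{N}(\mathbf{x}_t;\, \boldsymbol{\mu}_t, \Sigma_t)\, d\mathbf{x}_t}.$ (A.15)

The denominator of (A.15) can be computed exactly, while its numerator can be approximated using either Laplace’s method (MacKay, 2003) or Gaussian quadrature. The complete data likelihood $P(\{\mathbf{y}\}_1^T)$ can then be computed by accumulating conditional likelihood terms

$P(\{\mathbf{y}\}_1^T) = \prod_{t=1}^{T} P(\mathbf{y}_t \mid \{\mathbf{y}\}_1^{t-1}).$ (A.16)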
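The exact computation of the denominator of (A.15) relies on the standard identity that the integral of a product of two Gaussian densities in the same variable is itself a Gaussian density evaluated at one of the means. The sketch below illustrates that identity and the accumulation of conditional terms in the log domain; it assumes finite covariances, and the function names and numerical values are hypothetical.

```python
import numpy as np

def log_gaussian_overlap(a, A, b, B):
    """log of the integral over x of N(x; a, A) N(x; b, B), which equals log N(a; b, A + B)."""
    S = A + B
    diff = a - b
    _, logdet = np.linalg.slogdet(2 * np.pi * S)
    return -0.5 * (logdet + diff @ np.linalg.solve(S, diff))

def log_likelihood(cond_log_liks):
    """Accumulate log P(y_t | y_1..t-1) over t, the log of the product in (A.16)."""
    return float(np.sum(cond_log_liks))

# One-dimensional check against a brute-force Riemann sum.
a, A = np.array([0.3]), np.array([[0.5]])
b, B = np.array([-0.2]), np.array([[2.0]])
x = np.linspace(-20.0, 20.0, 40001)
gauss = lambda x, m, v: np.exp(-0.5 * (x - m) ** 2 / v) / np.sqrt(2 * np.pi * v)
numeric = np.sum(gauss(x, a[0], A[0, 0]) * gauss(x, b[0], B[0, 0])) * (x[1] - x[0])
exact = np.exp(log_gaussian_overlap(a, A, b, B))
```

Working with sums of logarithms rather than the product in (A.16) avoids numerical underflow when T is large.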


Appendix B

Details of M-step parameter updates

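We derive here the expressions (5.46) and (5.47), which are used to update the model parameters $\{\mathbf{c}_i\}$ and $\{d_i\}$ during the M-step. For notational simplicity, the unit index $i$ and the tilde are omitted.

$\frac{\partial}{\partial \mathbf{c}} \langle h(\mathbf{c}'\mathbf{x}_t) \rangle = \frac{\partial}{\partial \mathbf{c}} \int_{-\infty}^{\infty} h(\mathbf{c}'\mathbf{x}_t)\, \mathcal{N}(\mathbf{x}_t;\, \mathbf{x}_t^T, V_t^T)\, d\mathbf{x}_t$ (B.1)

$= \frac{\partial}{\partial \mathbf{c}} \int_{-\infty}^{\infty} h(v_t)\, \mathcal{N}(v_t;\, \mathbf{c}'\mathbf{x}_t^T, \mathbf{c}' V_t^T \mathbf{c})\, dv_t$ (B.2)

$= \frac{\partial}{\partial \mathbf{c}} \int_{-\infty}^{\infty} h\!\left(u_t \sqrt{\mathbf{c}' V_t^T \mathbf{c}} + \mathbf{c}'\mathbf{x}_t^T\right) \mathcal{N}(u_t;\, 0, 1)\, du_t,$ (B.3)

where (B.2) is obtained by the change of variable $v_t = \mathbf{c}'\mathbf{x}_t$ and (B.3) is obtained by the change of variable $u_t = (v_t - \mathbf{c}'\mathbf{x}_t^T) / \sqrt{\mathbf{c}' V_t^T \mathbf{c}}$. Equations (5.46) and (5.47) will be derived starting from (B.2) and (B.3), respectively.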


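To obtain (5.46), we first move the partial derivative inside the integral in (B.2).

$\int_{-\infty}^{\infty} h(v_t)\, \frac{\partial}{\partial \mathbf{c}} \mathcal{N}(v_t;\, \mathbf{c}'\mathbf{x}_t^T, \mathbf{c}' V_t^T \mathbf{c})\, dv_t$

$= \int_{-\infty}^{\infty} \left( \frac{(\mathbf{c}' V_t^T \mathbf{c})(v_t - \mathbf{c}'\mathbf{x}_t^T)\, \mathbf{x}_t^T + (v_t - \mathbf{c}'\mathbf{x}_t^T)^2\, V_t^T \mathbf{c}}{(\mathbf{c}' V_t^T \mathbf{c})^2} - \frac{V_t^T \mathbf{c}}{\mathbf{c}' V_t^T \mathbf{c}} \right) \cdot h(v_t)\, \mathcal{N}(v_t;\, \mathbf{c}'\mathbf{x}_t^T, \mathbf{c}' V_t^T \mathbf{c})\, dv_t$ (B.4)

$= \int_{-\infty}^{\infty} \left( \frac{u_t\, \mathbf{x}_t^T}{\sqrt{\mathbf{c}' V_t^T \mathbf{c}}} + \frac{(u_t^2 - 1)\, V_t^T \mathbf{c}}{\mathbf{c}' V_t^T \mathbf{c}} \right) \cdot h\!\left(u_t \sqrt{\mathbf{c}' V_t^T \mathbf{c}} + \mathbf{c}'\mathbf{x}_t^T\right) \mathcal{N}(u_t;\, 0, 1)\, du_t,$ (B.5)

where (B.5) is obtained by the change of variable $u_t = (v_t - \mathbf{c}'\mathbf{x}_t^T) / \sqrt{\mathbf{c}' V_t^T \mathbf{c}}$. Equation (B.5) is identical to (5.46).

To obtain (5.47), we move the partial derivative inside the integral in (B.3).

$\int_{-\infty}^{\infty} \frac{\partial}{\partial \mathbf{c}} h\!\left(u_t \sqrt{\mathbf{c}' V_t^T \mathbf{c}} + \mathbf{c}'\mathbf{x}_t^T\right) \mathcal{N}(u_t;\, 0, 1)\, du_t$

$= \int_{-\infty}^{\infty} \left( \frac{u_t\, V_t^T \mathbf{c}}{\sqrt{\mathbf{c}' V_t^T \mathbf{c}}} + \mathbf{x}_t^T \right) h'\!\left(u_t \sqrt{\mathbf{c}' V_t^T \mathbf{c}} + \mathbf{c}'\mathbf{x}_t^T\right) \mathcal{N}(u_t;\, 0, 1)\, du_t,$ (B.6)

where $h'$ denotes the derivative of the function $h$. Equation (B.6) is identical to (5.47).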
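The one-dimensional form in (B.3) is convenient because expectations under a standard normal can be approximated with Gauss-Hermite quadrature. The sketch below evaluates the expectation of $h(\mathbf{c}'\mathbf{x}_t)$ in this way; the softplus choice of $h$, the quadrature order, and the numerical values are assumptions made for this example rather than settings taken from Chapter 5.

```python
import numpy as np

def expected_h(h, c, mean, cov, order=20):
    """Approximate <h(c'x)> for x ~ N(mean, cov) using the one-dimensional form in (B.3)."""
    nodes, weights = np.polynomial.hermite_e.hermegauss(order)  # weight function exp(-u^2 / 2)
    m = c @ mean                 # mean of the projection c'x
    s = np.sqrt(c @ cov @ c)     # standard deviation of the projection c'x
    return np.sum(weights * h(nodes * s + m)) / np.sqrt(2 * np.pi)

c = np.array([1.0, -2.0])
mean = np.array([0.5, 0.1])
cov = np.array([[1.0, 0.2], [0.2, 0.5]])
softplus = lambda z: np.logaddexp(0.0, z)   # illustrative choice of h
value = expected_h(softplus, c, mean, cov)
```

The integrands in (B.5) and (B.6) can be handled in the same way by multiplying the function values at the quadrature nodes by the corresponding vector-valued prefactors.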


Bibliography

Abeles, M., Bergman, H., Gat, I., Meilijson, I., Seidemann, E., Tishby, N., and Vaadia, E. (1995). Cortical activity flips among quasi-stationary states. Proc Natl Acad Sci USA, 92:8616-8620.

Afshar, A., Achtman, N., Santhanam, G., Ryu, S., Yu, B., and Shenoy, K. (2005). Free-paced target estimation in a delayed-reach task. Soc Neurosci Abstr.

Amari, S. (1977). Dynamics of pattern formation in lateral-inhibition type neural fields. Biol Cybern, 27(2):77-87.

Ashe, J. and Georgopoulos, A. (1994). Movement parameters and neural activity in motor cortex and area 5. Cereb Cortex, 4:590-600.

Averbeck, B. and Lee, D. (2003). Neural noise and movement-related codes in the macaque supplementary motor area. J Neurosci, 23:7630-7641.

Bair, W. and O’Keefe, L. (1998). The influence of fixational eye movements on the response of neurons in area MT of the macaque. Vis Neurosci, 15:779-786.

Bastian, A., Riehle, A., Erlhagen, W., and Schöner, G. (1998). Prior information preshapes the population representation of movement direction in motor cortex. NeuroReport, 9:315-319.

Bastian, A., Schöner, G., and Riehle, A. (2003). Preshaping and continuous evolution of motor cortical representations during movement preparation. Eur J Neurosci, 18:2047-2058.


Batista, A., Buneo, C., Snyder, L., and Andersen, R. (1999). Reach plans in eye-centered coordinates. Science, 285:257-260.

Briers, M., Doucet, A., and Maskell, S. (2004). Smoothing algorithms for state-space models. Technical Report CUED/F-INFENG/TR.498, Cambridge University Engineering Department.

Brockwell, A., Rojas, A., and Kass, R. (2004). Recursive Bayesian decoding of motor cortical signals by particle filtering. J Neurophysiol, 91(4):1899-1907.

Brown, E., Barbieri, R., Ventura, V., Kass, R., and Frank, L. (2002). The time-rescaling theorem and its application to neural spike train data analysis. Neural Comput, 14(2):325-346.

Brown, E., Frank, L., Tang, D., Quirk, M., and Wilson, M. (1998). A statistical paradigm for neural spike train decoding applied to position prediction from the ensemble firing patterns of rat hippocampal place cells. J Neurosci, 18(18):7411-7425.

Caminiti, R., Johnson, P., Galli, C., Ferraina, S., and Burnod, Y. (1991). Making arm movements within different parts of space: the premotor and motor cortical representation of a coordinate system for reaching to visual targets. J Neurosci, 11:1182-1197.

Carmena, J., Lebedev, M., Crist, R., O’Doherty, J., Santucci, D., Dimitrov, D., Patil, P., Henriquez, C., and Nicolelis, M. (2003). Learning to control a brain-machine interface for reaching and grasping by primates. PLoS Biology, 1(2).

Carpenter, R. and Williams, M. (1995). Neural computation of log likelihood in control of saccadic eye movements. Nature, 377:59-62.

Churchland, M., Santhanam, G., and Shenoy, K. (2006a). Preparatory activity in premotor and motor cortex reflects the speed of the upcoming reach. J Neurophysiol, 96(6):3130-3146.



Churchland, M. and Shenoy, K. (2006). Delay of movement caused by disruption of cortical preparatory activity. J Neurophysiol. In press.

Churchland, M., Yu, B., Ryu, S., Santhanam, G., and Shenoy, K. (2006b). Neural variability in premotor cortex provides a signature of motor preparation. J Neurosci, 26(14):3697-3712.

Cover, T. and Thomas, J. (1990). Elements of Information Theory. John Wiley, New York.

Crammond, D. and Kalaska, J. (2000). Prior information in motor and premotor cortex: activity during the delay period and effect on pre-movement activity. J Neurophysiol, 84:986-1005.

Cunningham, J., Yu, B., and Shenoy, K. (2006). Optimal target placement for neural communication prostheses. In Proc 28th Annual Conf IEEE EMBS, pages 2912-2915.

Dempster, A., Laird, N., and Rubin, D. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). J R Stat Soc Ser B, 39:1-38.

Doucet, A., de Freitas, N., and Gordon, N., editors (2001). Sequential Monte Carlo Methods in Practice. Springer, New York.

Doucet, A., Godsill, S., and Andrieu, C. (2000). On sequential Monte Carlo sampling methods for Bayesian filtering. Stat Comput, 10(3):197-208.

Doucet, A. and Tadic, V. (2003). Parameter estimation in general state-space models using particle methods. Ann Inst Stat Math, 55(2):409-422.

Erlhagen, W. and Schoner, G. (2002). Dynamic field theory of movement preparation. Psychol Rev, 109:545-572.

Evarts, E. (1968). Relation of pyramidal tract activity to force exerted during voluntary movement. J Neurophysiol, 31:14-27.



Fetz, E. (1992). Are movement parameters recognizably coded in the activity of single neurons? Behav Brain Sci, 15:679-690.

Gat, I., Tishby, N., and Abeles, M. (1997). Hidden Markov modelling of simultaneously recorded cells in the associative cortex of behaving monkeys. Network, 8(3):297-322.

Georgopoulos, A., Kalaska, J., Caminiti, R., and Massey, J. (1982). On the relations between the direction of two-dimensional arm movements and cell discharge in primate motor cortex. J Neurosci, 2:1527-1537.

Georgopoulos, A., Schwartz, A., and Kettner, R. (1986). Neuronal population coding of movement direction. Science, 233:1416-1419.

Glidden, H., Yozbatiran, N., Rizzuto, D., Cramer, S., and Andersen, R. (2006). fMRI during goal-directed movement planning in normal and spinal cord-injured subjects. Soc Neurosci Abstr.

Godschalk, M., Lemon, R., Kuypers, H., and van der Steen, J. (1985). The involvement of monkey premotor cortex neurones in preparation of visually cued arm movements. Behav Brain Res, 18:143-157.

Gur, M., Beylin, A., and Snodderly, D. (1997). Response variability of neurons in primary visual cortex (V1) of alert monkeys. J Neurosci, 17:2914-2920.

Hanes, D. and Schall, J. (1996). Neural control of voluntary movement initiation. Science, 274:427-430.

Hatsopoulos, N., Joshi, J., and O’Leary, J. (2004). Decoding continuous and discrete motor behaviors using motor and premotor cortical ensembles. J Neurophysiol, 92:1165-1174.

Haykin, S. (1996). Adaptive Filter Theory. Prentice-Hall.

Heskes, T. and Zoeter, O. (2002). Expectation propagation for approximate inference in dynamic Bayesian networks. In Darwiche, A. and Friedman, N., editors, Proceedings UAI-2002, pages 216-223.



Heskes, T. and Zoeter, 0 . (2003). Extended version of ‘Expectation propagation

for approximate inference in dynamic Bayesian networks’. Technical report, University of Nijmegen.

Hochberg, L., Serruya, M., Friehs, G., Mukand, J., Saleh, M., Caplan, A., Branner,

A., Chen, D., Penn, R., and Donoghue, J. (2006). Neuronal ensemble control of

prosthetic devices by a human with tetraplegia. Nature, 442:164-171.

Jordan, M., Ghahramani, Z., Jaakkola, T., and Saul, L. (1999). An introduction

to variational methods for graphical models. Mach Learning, 37(2):183-233.

Julier, S. and Uhlmann, J. (2004). Unscented filtering and nonlinear estimation. Proceedings of the IEEE , 92(3):401-422.

Julier, S., Uhlmann, J., and Durrant-Whyte, H. (2000). A new method for the nonlinear transformation of means and covariances in filters and estimators. IEEE Transactions on Automatic Control, 45(3):477-482.

Kalaska, J. and Crammond, D. (1995). Deciding not to GO: neuronal correlates of response selection in a GO/NOGO task in primate premotor and parietal cortex. Cereb Cortex, 5:410-428.

Kemere, C. (2006). Model-based decoding of neural signals for prosthetic interfaces. PhD thesis, Stanford University.

Kemere, C., Sahani, M., and Meng, T. (2003). Robust neural decoding of reaching movements for prosthetic systems. In Proc 25th Annual Conf IEEE EMBS, pages 2079-2082.

Kemere, C., Santhanam, G., Yu, B., Ryu, S., Meng, T., and Shenoy, K. (2004a). Model-based decoding of reaching movement for prosthetic systems. In Proc 26th Annual Conf IEEE EMBS, pages 4524-4528.

Kemere, C., Santhanam, G., Yu, B., Shenoy, K., and Meng, T. (2002). Decoding of plan and peri-movement neural signals in prosthetic systems. IEEE Workshop on Signal Processing Systems, pages 276-283.


Kemere, C., Shenoy, K., and Meng, T. (2004b). Model-based neural decoding of reaching movements: A maximum likelihood approach. IEEE Trans Biomed Eng, 51(6):925-932.

Kemere, C., Yu, B., Santhanam, G., Ryu, S., Afshar, A., Meng, T., and Shenoy, K. (2006). Hidden Markov models for spatial and temporal estimation for prosthetic control. Soc Neurosci Abstr.

Kennedy, P., Bakay, R., Moore, M., Adams, K., and Goldwaithe, J. (2000). Direct control of a computer from the human central nervous system. IEEE Trans Rehabil Eng, 8:198-202.

Kurata, K. (1989). Distribution of neurons with set- and movement-related activity before hand and foot movements in the premotor cortex of rhesus monkeys. Exp Brain Res, 77:245-256.

Kurata, K. (1993). Premotor cortex of monkeys: set- and movement-related activity reflecting amplitude and direction of wrist movements. J Neurophysiol, 69:187-200.

Lerner, U. (2002). Hybrid Bayesian Networks for Reasoning about Complex Systems. PhD thesis, Stanford University.

Leuthardt, E., Schalk, G., Wolpaw, J., Ojemann, J., and Moran, D. (2004). A brain-computer interface using electrocorticographic signals in humans. J Neural Eng, 1:63-71.

MacKay, D. (2003). Information Theory, Inference and Learning Algorithms. Cambridge University Press.

Maynard, E., Hatsopoulos, N., Ojakangas, C., Acuna, B., Sanes, J., Normann, R., and Donoghue, J. (1999). Neuronal interactions improve cortical population coding of movement direction. J Neurosci, 19:8083-8093.


Messier, J. and Kalaska, J. (2000). Covariation of primate dorsal premotor cell activity with direction and amplitude during a memorized-delay reaching task. J Neurophysiol, 84:152-165.

Minka, T. (1999). From hidden Markov models to linear dynamical systems. Technical Report TR-531, Massachusetts Institute of Technology.

Minka, T. (2000). Deriving quadrature rules from Gaussian processes. Technical report, http://research.microsoft.com/~minka/papers/quadrature.html.

Minka, T. (2001a). Expectation propagation for approximate Bayesian inference. In Proceedings of the 17th Conference on Uncertainty in Artificial Intelligence (UAI), pages 362-369.

Minka, T. (2001b). A Family of Algorithms for Approximate Bayesian Inference. PhD thesis, Massachusetts Institute of Technology.

Moran, D. and Schwartz, A. (1999). Motor cortical representation of speed and direction during reaching. J Neurophysiol, 82:2676-2692.

Moran, D. and Schwartz, A. (2000). One motor cortex, two different views. Nat Neurosci, 3:963-963.

Musallam, S., Corneil, B., Greger, B., Scherberger, H., and Andersen, R. (2004). Cognitive control signals for neural prosthetics. Science, 305:258-262.

Neal, R., Beal, M., and Roweis, S. (2004). Inferring state sequences for non-linear systems with embedded hidden Markov models. In Thrun, S., Saul, L., and Scholkopf, B., editors, Advances in Neural Information Processing Systems 16. MIT Press, Cambridge, MA.

Paninski, L., Fellows, M., Hatsopoulos, N., and Donoghue, J. (2004). Spatiotemporal tuning of motor cortical neurons for hand position and velocity. J Neurophysiol, 91(1):515-532.



Patil, P., Carmena, J., Nicolelis, M., and Turner, D. (2004). Ensemble recordings of human subcortical neurons as a source of motor control signals for a brain-machine interface. Neurosurgery, 55:27-38.

Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.

Pesaran, B., Musallam, S., and Andersen, R. (2006). Cognitive neural prosthetics. Curr Biol, 16:77-80.

Polikov, V., Tresco, P., and Reichert, W. (2005). Response of brain tissue to chronically implanted neural electrodes. J Neurosci Methods, 148:1-18.

Press, W., Teukolsky, S., Vetterling, W., and Flannery, B. (1992). Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press.

Reina, G., Moran, D., and Schwartz, A. (2001). On the relationship between joint angular velocity and motor cortical discharge during reaching. J Neurophysiol, 85:2576-2589.

Riehle, A. and Requin, J. (1989). Monkey primary motor and premotor cortex: single-cell activity related to prior information about direction and extent of an intended movement. J Neurophysiol, 61:534-549.

Riehle, A. and Requin, J. (1993). The predictive value for performance speed of preparatory changes in neuronal activity of the monkey motor and premotor cortex. J Behav Brain Res, 53:35-49.

Roitman, J. and Shadlen, M. (2002). Response of neurons in the lateral intraparietal area during a combined visual discrimination reaction time task. J Neurosci, 22:9475-9489.

Rosenbaum, D. (1980). Human movement initiation: specification of arm, direction, and extent. J Exp Psychol Gen, 109:444-474.


Roweis, S. and Ghahramani, Z. (2001). Learning nonlinear dynamical systems using the EM algorithm. In Haykin, S., editor, Kalman Filtering and Neural Networks, pages 175-220. Wiley.

Sahani, M. (1999). Latent Variable Models for Neural Data Analysis. PhD thesis, California Institute of Technology.

Santhanam, G., Ryu, S., Yu, B., Afshar, A., and Shenoy, K. (2006). A high-performance brain-computer interface. Nature, 442:195-198.

Santhanam, G., Sahani, M., Ryu, S., and Shenoy, K. (2004). An extensible infrastructure for fully automated spike sorting during online experiments. In Proc 26th Annual Conf IEEE EMBS, pages 4380-4384.

Schwartz, A. (1992). Motor cortical activity during drawing movements: Single-unit activity during sinusoid tracing. J Neurophysiol, 68(2):528-541.

Scott, S. (2004). Optimal feedback control and the neural basis of volitional motor control. Nat Rev Neurosci, 5:532-546.

Scott, S. and Kalaska, J. (1997). Reaching movements with similar hand paths but different arm orientations. I. Activity of individual cells in motor cortex. J Neurophysiol, 77:826-852.

Sergio, L. and Kalaska, J. (1998). Changes in the temporal pattern of primary motor cortex activity in a directional isometric force versus limb movement task. J Neurophysiol, 80:1577-1583.

Serruya, M., Hatsopoulos, N., Paninski, L., Fellows, M., and Donoghue, J. (2002). Instant neural control of a movement signal. Nature, 416:141-142.

Shannon, C. (1948). A mathematical theory of communication. Bell System Technical Journal, 27:379-423 and 623-656.

Shen, L. and Alexander, G. (1997). Preferential representation of instructed target location versus limb trajectory in dorsal premotor area. J Neurophysiol, 77:1195-1212.


Shenoy, K., Meeker, D., Cao, S., Kureshi, S., Pesaran, B., Mitra, P., Buneo, C., Batista, A., Burdick, J., and Andersen, R. (2003). Neural prosthetic control signals from plan activity. NeuroReport, 14:591-596.

Shoham, S., Halgren, E., Maynard, E., and Normann, R. (2001). Motor-cortical activity in tetraplegics. Nature, 413(6858):793.

Shoham, S., Paninski, L., Fellows, M., Hatsopoulos, N., Donoghue, J., and Normann, R. (2005). Statistical encoding model for a primary motor cortical brain-machine interface. IEEE Trans Biomed Eng, 52(7):1313-1322.

Smith, A. and Brown, E. (2003). Estimating a state-space model from point process observations. Neural Comput, 15(5):965-991.

Snyder, L., Batista, A., and Andersen, R. (1997). Coding of intention in the posterior parietal cortex. Nature, 386:167-170.

Tanji, J. and Evarts, E. (1976). Anticipatory activity of motor cortex neurons in relation to direction of an intended movement. J Neurophysiol, 39:1062-1068.

Taylor, D., Tillery, S. H., and Schwartz, A. (2002). Direct cortical control of 3D neuroprosthetic devices. Science, 296:1829-1832.

Taylor, D., Tillery, S. H., and Schwartz, A. (2003). Information conveyed through brain-control: cursor vs. robot. IEEE Trans Neural Syst Rehabil Eng, 11:195-199.

Tkach, D., Reimer, J., and Hatsopoulos, N. (2005). A hybrid neuromotor brain-machine interface using trajectory and goal state control modes. Soc Neurosci Abstr.

Tolhurst, D., Movshon, J., and Dean, A. (1983). The statistical reliability of signals in single neurons in cat and monkey visual cortex. Vision Res, 23:775-785.

Valpola, H. and Karhunen, J. (2002). An unsupervised ensemble learning method for nonlinear dynamic state-space models. Neural Comput, 14(11):2647-2692.


Wan, E. and van der Merwe, R. (2001). The unscented Kalman filter. In Haykin, S., editor, Kalman Filtering and Neural Networks, chapter 7. Wiley Publishing.

Weinrich, M. and Wise, S. (1982). The premotor cortex of the monkey. J Neurosci, 2:1329-1345.

Weinrich, M., Wise, S., and Mauritz, K. (1984). A neurophysiological study of the premotor cortex in the rhesus monkey. Brain, 107:385-414.

Wise, S. (1985). The primate premotor cortex: past, present, and preparatory. Ann Rev Neurosci, 8:1-19.

Wolpaw, J. and McFarland, D. (2004). Control of a two-dimensional movement signal by a noninvasive brain-computer interface in humans. Proc Natl Acad Sci USA, 101(51):17849-17854.

Wu, W., Black, M., Gao, Y., Bienenstock, E., Serruya, M., Shaikhouni, A., and Donoghue, J. (2003). Neural decoding of cursor motion using a Kalman filter. Advances in Neural Information Processing Systems 15.

Wu, W., Black, M., Mumford, D., Gao, Y., Bienenstock, E., and Donoghue, J. (2004). Modeling and decoding motor cortical activity using a switching Kalman filter. IEEE Trans Biomed Eng, 51(6):933-942.

Wu, W., Gao, Y., Bienenstock, E., Donoghue, J., and Black, M. (2006). Bayesian population decoding of motor cortical activity using a Kalman filter. Neural Comput, 18(1):80-118.

Ypma, A. and Heskes, T. (2005). Novel approximations for inference in nonlinear dynamical systems using expectation propagation. Neurocomputing, 69:85-99.

Yu, B., Ryu, S., Santhanam, G., Churchland, M., and Shenoy, K. (2004). Improving neural prosthetic system performance by combining plan and peri-movement activity. In Proc 26th Annual Conf IEEE EMBS, pages 4516-4519.


Zoeter, O., Ypma, A., and Heskes, T. (2004). Improved unscented Kalman smoothing for stock volatility estimation. In Barros, A., Principe, J., Larsen, J., Adali, T., and Douglas, S., editors, Proceedings of the IEEE Workshop on Machine Learning for Signal Processing.

Zumsteg, Z., Kemere, C., O’Driscoll, S., Santhanam, G., Ahmed, R., Shenoy, K., and Meng, T. (2005). Power feasibility of implantable digital spike sorting circuits for neural prosthetic systems. IEEE Trans Neural Syst Rehabil Eng, 13:272-279.

