
Dimensionality Reduction of Multi-Trial Neural Data by Canonical Polyadic Tensor Decomposition

Alex H. Williams3, Hyun Kim2, Fori Wang1, Mark J. Schnitzer1, Surya Ganguli3, Tamara G. Kolda4, Krishna V. Shenoy2, Saurabh Vyas2

Author Affiliations and Contact Information: Departments of Neurobiology1, Electrical Engineering2, & Applied Physics3, Stanford University, Stanford, CA. Data Sciences & Cyber Analytics Department4, Sandia National Labs, Livermore, CA. *Send correspondence to [email protected] and [email protected]

Summary

Modern technologies enable neuroscientists to record from many hundreds of neurons for weeks or even months. Commonly used methods for dimensionality reduction identify low-dimensional features of within-trial neural dynamics, but do not model changes in neural activity across trials. We represent multi-trial data as a three-dimensional data array (a third-order tensor), with each entry indicating the activity of a particular neuron at a particular time on a particular trial, and use canonical polyadic (CP) tensor decomposition to find low-dimensional signatures of within- and across-trial changes in neural dynamics. This approach produces more informative descriptions of experimental and synthetic multi-trial neural data than classic techniques such as principal and independent components analysis (PCA & ICA).

Canonical Polyadic (CP) Tensor Decomposition

[Schematic: multi-trial neural activity recordings (cell #1 ... cell #6; trial #1, ..., trial #50, ..., trial #100) form a neurons × time × trials array. The data tensor $\mathcal{X}$ is approximated as a sum of rank-1 tensors, each the outer product of a neuron factor, a time factor (trial start to trial end), and a trial factor (first trial to last trial).]

Denote the activity of neuron $n$ at time $t$ on trial $k$ as $x_{ntk}$. The full dataset is a third-order tensor $\mathcal{X}$, and the population activity on a given trial is a matrix $X_{::k}$. CP decomposition approximates the data as a sum of rank-one tensors, which is a direct generalization of PCA to higher-order data arrays.

Def: a tensor $\mathcal{Y}$ is rank-one if it is expressible as a vector outer product:

$$\mathcal{Y} = \mathbf{u} \circ \mathbf{v} \circ \mathbf{w} \iff y_{ijk} = u_i v_j w_k$$

The boxes below show two equivalent formulations of PCA and CP decomposition to highlight their similarities.

PCA on a single trial:

$$\operatorname*{minimize}_{A,\,B} \;\; \big\lVert X_{::k} - \hat{X} \big\rVert_F^2 \quad \text{subject to} \quad \hat{X} = \sum_{r=1}^{R} \mathbf{a}_{:r} \circ \mathbf{b}_{:r}$$

or, equivalently,

$$\operatorname*{minimize}_{A,\,B} \;\; \big\lVert X_{::k} - AB^\top \big\rVert_F^2$$

CP decomposition on all trials:

$$\operatorname*{minimize}_{A,\,B,\,C} \;\; \big\lVert \mathcal{X} - \hat{\mathcal{X}} \big\rVert_F^2 \quad \text{subject to} \quad \hat{\mathcal{X}} = \sum_{r=1}^{R} \mathbf{a}_{:r} \circ \mathbf{b}_{:r} \circ \mathbf{c}_{:r}$$

or, equivalently,

$$\operatorname*{minimize}_{A,\,B,\,C} \;\; \sum_{k=1}^{K} \big\lVert X_{::k} - A\,\mathrm{Diag}(\mathbf{c}_{k:})\,B^\top \big\rVert_F^2$$
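To make the CP objective concrete, here is a minimal NumPy sketch (ours, not the poster's demo code) that forms the rank-R reconstruction and evaluates the Frobenius-norm error; all sizes and variable names are hypothetical.

```python
import numpy as np

def cp_reconstruct(A, B, C):
    """X_hat[n, t, k] = sum_r A[n, r] * B[t, r] * C[k, r]."""
    # einsum sums the R outer products a_:r ∘ b_:r ∘ c_:r
    return np.einsum('nr,tr,kr->ntk', A, B, C)

# Hypothetical sizes: 50 neurons, 120 timepoints, 100 trials, rank 3.
rng = np.random.default_rng(0)
N, T, K, R = 50, 120, 100, 3
A, B, C = (rng.standard_normal((d, R)) for d in (N, T, K))
X = cp_reconstruct(A, B, C) + 0.1 * rng.standard_normal((N, T, K))

# CP objective: squared Frobenius-norm error of the reconstruction.
loss = np.sum((X - cp_reconstruct(A, B, C)) ** 2)
```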

Model Properties and Intuition

We respectively refer to $\mathbf{a}_{:r}$, $\mathbf{b}_{:r}$, and $\mathbf{c}_{:r}$ as the neuron factors, time factors, and trial factors of the CP decomposition. These vectors are collected into the columns of the matrices $A$, $B$, and $C$, which we call the factor matrices. A crucial disadvantage of PCA is that its objective function is invariant to any invertible linear transformation $F$, since

$$AB^\top = AF^{-1}FB^\top = A'B'^\top = \hat{X},$$

meaning that the latent factors can only be recovered up to a linear subspace. In contrast, due to a seminal result of Kruskal [1977], the factors of higher-order CP decomposition are essentially recoverable so long as they are linearly independent.
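A quick numerical check of this invariance (our illustration; variable names are hypothetical): any invertible $F$ can be absorbed into the factors without changing the PCA reconstruction.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, R = 50, 120, 3
A = rng.standard_normal((N, R))
B = rng.standard_normal((T, R))

F = rng.standard_normal((R, R))        # invertible with probability 1
A_prime = A @ np.linalg.inv(F)         # A' = A F^{-1}
B_prime = B @ F.T                      # B'^T = F B^T, i.e. B' = B F^T

# Identical reconstruction, different factors: PCA cannot tell them apart.
print(np.allclose(A @ B.T, A_prime @ B_prime.T))  # True
```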

Motivating Example (Linear 1-layer network)

CP decomposition can be interpreted as a 1-layer linear network with input signals that change in magnitude across trials (A). Simulated data with three inputs, 50 neurons, 100 trials, and additive Gaussian noise are shown in (B). These data are approximately rank-3, since the dynamics of each neuron are a linear combination of the three input signals. CP decomposition (C) identified the synaptic weights (left column), the input signal waveforms (middle column), and how the inputs changed in magnitude across trials (right column), up to a re-scaling and permutation of the factors (see the Model Properties and Intuition panel). Applying PCA (D) or ICA (E) to the three tensor unfoldings (matricizations) of the data did not recover this structure.

[Figure: (A) 1-layer linear network schematic, with input signals changing in magnitude across trials 1-100; (B) simulated rasters (neurons × time within trial); (C) CP decomposition factors: synaptic weights, input waveforms, and input magnitudes (model estimate vs. ground truth); (D) PCA on the neurons, time, and trials unfoldings; (E) ICA on the same unfoldings.]
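The generative model described in (A-B) is straightforward to simulate. The sketch below is our reconstruction of that simulation under assumed parameter choices (the waveform shapes and gain ranges are our choices, not the poster's).

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, K, R = 50, 120, 100, 3   # neurons, timepoints (assumed), trials, inputs

W = rng.standard_normal((N, R))                  # synaptic weights (neuron factors)
t = np.linspace(0, 1, T)
U = np.stack([np.sin(2 * np.pi * (r + 1) * t)    # input waveforms (time factors);
              for r in range(R)], axis=1)        # sinusoids are an arbitrary choice
G = rng.uniform(0.5, 1.5, size=(K, R))           # input magnitudes across trials

# x_ntk = sum_r W[n,r] * U[t,r] * G[k,r] + noise: an approximately rank-3 tensor.
X = np.einsum('nr,tr,kr->ntk', W, U, G) + 0.1 * rng.standard_normal((N, T, K))
```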

Case Study #1 (Artificial Recurrent Network)

We used the Python package pycog [Song et al., 2016, PLoS Comput Biol] to train a recurrent network with rectified linear hidden units (A) to perform a context-dependent sensory integration task [Mante et al., 2013, Nature]. On each trial, one of two context-indicating input signals (top, A) was active. Each contextual cue was matched to a pair of noisy evidence signals (left, A), and the network used two readout units (right, A) to indicate which of the two evidence signals associated with the activated context was larger. We simulated trials from 3 coherence conditions across both contexts; the trial-averaged population activity is shown in (B). A rank-4 CP decomposition captures roughly 90% of the variance in the data; the neuron and time factors shown in (C) indicate that a small fraction of neurons dominate and have ramping activity profiles. Each of these subpopulations is related to task-relevant variables, as can be seen by plotting the pairwise relationships between the trial factors (D-F). Within this reduced, 4-dimensional space, the trials clearly separate based on choice (D), trial context (E), and strength of evidence (F).

[Figure: (A) network schematic with evidence signals, context signals (active/inactive), and readout units (choice); (B) trial-averaged raster (neurons × time); (C) CP decomposition factors (neurons, time); (D-F) pairwise scatter plots of trial factors #1 and #2, colored by choice, by context (#1 vs. #2), and by coherence (-10, -3, -1, 1, 3, 10).]
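A note on the "roughly 90% of variance" statement: for tensor models this is typically computed from the Frobenius-norm reconstruction error. A minimal sketch, assuming the uncentered convention (the poster does not specify which convention it uses):

```python
import numpy as np

def variance_explained(X, X_hat):
    """Uncentered R^2 of a CP fit: 1 - ||X - X_hat||_F^2 / ||X||_F^2.

    Some authors instead mean-center X in the denominator.
    """
    return 1.0 - np.sum((X - X_hat) ** 2) / np.sum(X ** 2)
```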

Case Study #2 (Mouse Prefrontal Cortex)

Calcium dynamics of ~500 neurons in the medial prefrontal cortex were imaged with a miniature fluorescence microscope in mice performing a spatial navigation task. Mice began each trial in either the east or west arm of a four-armed maze, with the opposing arm blocked. Mice were trained to follow either an egocentric (A; e.g. turn right) or allocentric (B; e.g. go south) navigation strategy for a water reward. The rewarded strategy was periodically switched, prompting the mouse to adapt strategies.

Panels (C-E) show four illustrative sets of CP factors chosen from a rank-15 model. Each set of factors identified a neural subpopulation (C) with a particular within-trial (D) and across-trial (E) pattern of activity. The top factor has a consistent pattern of activity across trials (top row), while the others identify neural patterns that encode experimental condition (row 2), choice (row 3), and trial outcome (row 4).

CP decomposition identifies neural populations that are sensitive to mouse position, behavioral strategies, and rewards.

[Figure: (C) neuron factors (by neuron index); (D) time factors (trial start to reward delivery); (E) trial factors (first trial to last trial), labeled by start location (east/west), end location (north/south), and outcome (rewarded/error); the top row is a neutral factor with consistent activity across trials.]

Case Study #3 (Primate Motor/Premotor Cortex)

A Rhesus monkey was trained to make point-to-point reaches to visual targets in a 2D plane with a virtual cursor controlled by the contralateral arm (A). We recorded approximately 200 units using multi-electrode arrays (Utah arrays, Blackrock Microsystems) implanted in premotor and motor cortex. We first fit a velocity Kalman filter to decode cursor velocities from neural firing rates, and then perturbed this decoder by rotating the output cursor velocities by 30 degrees (a visuomotor rotation). Panel (B) shows the results of a rank-3 CP decomposition applied to this dataset (192 multi-units, 501 ms per trial, 1563 trials). The factors identify neural populations that are preferentially activated by the BMI perturbation.

[Figure: (A) BMI task schematic, with the decoder perturbation applied partway through the session; (B) rank-3 CP factors (#1-#3): neuron factors, time factors (smoothed), and trial factors (smoothed).]
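The perturbation itself is just a fixed 2×2 rotation applied to the decoder output. A minimal sketch (ours, not the experiment's decoder code; the rotation sign and the `decoded_velocity` input are hypothetical):

```python
import numpy as np

def perturb_velocity(v, angle_deg=30.0):
    """Apply a visuomotor rotation to a decoded 2D cursor velocity."""
    a = np.deg2rad(angle_deg)
    R = np.array([[np.cos(a), -np.sin(a)],
                  [np.sin(a),  np.cos(a)]])   # counterclockwise rotation
    return R @ v

decoded_velocity = np.array([1.0, 0.0])       # hypothetical Kalman-filter output
print(perturb_velocity(decoded_velocity))     # the cursor moves 30 degrees off-axis
```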

Demo Code: https://github.com/ahwillia/tensor-demo
MATLAB Toolbox: http://www.sandia.gov/~tgkolda/TensorToolbox/

We fit CP decompositions by alternating least-squares (ALS), as reviewed in [Kolda & Bader, 2009, SIAM Review]. This procedure may converge to local minima, but similar factors (after alignment by permutation and re-scaling) were consistently obtained from different random initializations in all applications.
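For intuition about what one ALS pass does, here is a compact NumPy sketch (a minimal illustration, not the Tensor Toolbox implementation): each factor matrix is updated by solving a least-squares problem against the corresponding unfolding of the data, with the Gram matrices combined by Hadamard products.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker product; output row (i * V.shape[0] + j) is U[i] * V[j]."""
    return (U[:, None, :] * V[None, :, :]).reshape(-1, U.shape[1])

def als_sweep(X, A, B, C):
    """One pass of CP alternating least squares on a 3-way array X (N x T x K)."""
    N, T, K = X.shape
    # Update A: solve min_A ||X_(1) - A M^T||_F^2 with M = khatri_rao(B, C).
    G = (B.T @ B) * (C.T @ C)                                  # R x R normal equations
    A = np.linalg.solve(G, (X.reshape(N, T * K) @ khatri_rao(B, C)).T).T
    # Update B against the mode-2 unfolding.
    G = (A.T @ A) * (C.T @ C)
    B = np.linalg.solve(G, (X.transpose(1, 0, 2).reshape(T, N * K) @ khatri_rao(A, C)).T).T
    # Update C against the mode-3 unfolding.
    G = (A.T @ A) * (B.T @ B)
    C = np.linalg.solve(G, (X.transpose(2, 0, 1).reshape(K, N * T) @ khatri_rao(A, B)).T).T
    return A, B, C
```

Iterating such sweeps from several random initializations until the reconstruction error plateaus, then aligning solutions by permutation and re-scaling, matches the procedure described above.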

Acknowledgements and Funding: This work was supported by grants from the Howard Hughes Medical Institute, the McKnight Foundation, the Burroughs Wellcome Fund, and the Department of Energy Computational Science Graduate Fellowship. We thank Jeffrey Seely (Columbia University) and Madeleine Udell (Cornell University) for helpful conversations, and Chris Stock (Stanford University) for providing simulated data from the Mante-Sussillo model network.