
Multivariate Convolutional Sparse Coding for Electromagnetic Brain Signals

Tom Dupré La Tour¹, Thomas Moreau², Mainak Jas¹, Alexandre Gramfort²

¹ LTCI, Télécom ParisTech, Université Paris Saclay, Paris, France; ² Parietal, INRIA, Université Paris Saclay, Saclay, France

Code available at: https://alphacsc.github.io
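The released package can be used roughly as follows. This is a hedged sketch based on our reading of the online documentation; the class and parameter names (BatchCDL, n_atoms, n_times_atom, rank1, reg) and the fitted attributes are assumptions to be checked against https://alphacsc.github.io.

```python
import numpy as np
from alphacsc import BatchCDL   # assumed entry point, see the documentation

# Hypothetical pre-processed epochs, shape (n_trials, n_channels, n_times)
X = np.load("meg_epochs.npy")

cdl = BatchCDL(
    n_atoms=25,        # K, number of patterns
    n_times_atom=150,  # L, pattern duration in samples
    rank1=True,        # enforce the rank-1 constraint of model (3) below
    reg=0.2,           # lambda, sparsity regularization
)
cdl.fit(X)
# Assumed attributes: spatial patterns u, temporal patterns v, activations z
u, v, z = cdl.u_hat_, cdl.v_hat_, cdl.z_hat_
```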

1. Convolutional Sparse Coding (CSC)

Convolutional linear model:

$$X = \sum_{k=1}^{K} z_k * D_k + \mathcal{E}, \quad \mathcal{E} \sim \mathcal{N}(0, \sigma I) \quad (1)$$

with signal $X \in \mathbb{R}^{P \times T}$ ($P$ sensors and $T$ samples), $K$ patterns $D_k \in \mathbb{R}^{P \times L}$ (duration $L$), and activations $z_k \in \mathbb{R}^{\tilde{T}}$ with $\tilde{T} = T - L + 1$.

[Figure: illustration of model (1), a signal X decomposed into activations z convolved with patterns D.]

Multivariate CSC

$$\min_{D_k,\, z_k^n} \sum_{n=1}^{N} \frac{1}{2} \Big\| X^n - \sum_{k=1}^{K} z_k^n * D_k \Big\|_2^2 + \lambda \sum_{k=1}^{K} \| z_k^n \|_1, \quad \text{s.t. } \| D_k \|_2^2 \le 1 \text{ and } z_k^n \ge 0 \quad (2)$$

Multivariate CSC with rank-1 constraint

$$\min_{u_k,\, v_k,\, z_k^n} \sum_{n=1}^{N} \frac{1}{2} \Big\| X^n - \sum_{k=1}^{K} z_k^n * (u_k v_k^\top) \Big\|_2^2 + \lambda \sum_{k=1}^{K} \| z_k^n \|_1, \quad \text{s.t. } \| u_k \|_2^2 \le 1,\ \| v_k \|_2^2 \le 1 \text{ and } z_k^n \ge 0 \quad (3)$$

One source in the brain is spread linearly and instantaneously over all sensors. The rank-1 hypothesis is particularly suited for MEG signals.
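For concreteness, here is a minimal NumPy sketch (illustrative, not from the paper) that generates a signal following model (1) with rank-1 atoms as in (3); all sizes and names are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
P, T, K, L = 5, 512, 3, 32       # sensors, samples, atoms, atom duration
T_valid = T - L + 1              # length of the activation signals

# Rank-1 atoms D_k = u_k v_k^T with unit-norm factors, as in (3)
u = rng.standard_normal((K, P))
u /= np.linalg.norm(u, axis=1, keepdims=True)
v = rng.standard_normal((K, L))
v /= np.linalg.norm(v, axis=1, keepdims=True)

# Sparse, non-negative activations z_k
z = rng.random((K, T_valid)) * (rng.random((K, T_valid)) < 0.02)

# X = sum_k z_k * D_k + noise, built channel by channel
sigma = 0.01
X = sigma * rng.standard_normal((P, T))
for k in range(K):
    for p in range(P):
        X[p] += u[k, p] * np.convolve(z[k], v[k])   # full convolution, length T
```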

2. Z-step: solving for the activations

The Z-step solves (2) or (3) for a fixed dictionary. We solve it using Greedy Coordinate Descent (GCD):

• The optimization problem for one coordinate has a closed form:

$$z_k'[t] = \max\left( \frac{\beta_k[t] - \lambda}{\| D_k \|_2^2},\ 0 \right) \quad (4)$$

with $\beta_k[t] = \left[ D_k^\top * \left( X - \sum_{l=1}^{K} z_l * D_l + z_k[t]\, e_t * D_k \right) \right][t]$.

• Greedily update the coefficient $(k_0, t_0) = \operatorname{argmax}_{(k,t)} | z_k[t] - z_k'[t] |$.

Locally Greedy Coordinate Descent (LGCD) [Moreau et al., 2018]:

• Select the best coordinate locally, on one of $M$ contiguous segments $\mathcal{C}_m$:

$$(k_0, t_0) = \operatorname*{argmax}_{(k,t) \in \mathcal{C}_m} | z_k[t] - z_k'[t] |$$

• For $M = \lfloor \tilde{T} / (2L - 1) \rfloor$, the computational complexity of choosing the coefficient matches the complexity of performing the update.
• It is efficient when the updates are weakly dependent and when the solution is sparse (see the sketch below).
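A minimal NumPy sketch of one LGCD pass over the activations, for a univariate signal (P = 1). For clarity it recomputes β on each segment, whereas the actual algorithm maintains β incrementally so that each update costs O(KL); all names are illustrative.

```python
import numpy as np

def lgcd_pass(X, D, z, lam, segments):
    """One pass of locally greedy coordinate descent on the activations z.
    X: (T,) signal, D: (K, L) univariate atoms, z: (K, T_valid) activations."""
    K, L = D.shape
    norms = (D ** 2).sum(axis=1)                      # ||D_k||_2^2
    for start, stop in segments:
        # Residual with the current activations
        R = X - sum(np.convolve(z[k], D[k]) for k in range(K))
        z_new = np.empty((K, stop - start))
        for k in range(K):
            # beta_k[t]: correlation of the atom with the residual, adding
            # back the contribution of the current coordinate z_k[t]
            corr = np.correlate(R, D[k], mode="valid")[start:stop]
            beta = corr + norms[k] * z[k, start:stop]
            z_new[k] = np.maximum((beta - lam) / norms[k], 0.0)   # eq. (4)
        # Locally greedy step: apply only the largest change in the segment
        k0, t0 = np.unravel_index(
            np.abs(z_new - z[:, start:stop]).argmax(), z_new.shape)
        z[k0, start + t0] = z_new[k0, t0]
    return z

# Segments of length 2L - 1, matching M = floor(T_valid / (2L - 1))
rng = np.random.default_rng(0)
T, K, L = 256, 3, 16
T_valid = T - L + 1
D = rng.standard_normal((K, L))
D /= np.linalg.norm(D, axis=1, keepdims=True)
X, z = rng.standard_normal(T), np.zeros((K, T_valid))
segments = [(s, min(s + 2 * L - 1, T_valid))
            for s in range(0, T_valid, 2 * L - 1)]
for _ in range(100):
    z = lgcd_pass(X, D, z, lam=0.1, segments=segments)
```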

3. D-step: solving for the atoms

The D-step solves (2) or (3) for fixed activations. We solve it using Projected Gradient Descent (PGD):

• Separate minimization over $\{u_k\}_k$ and $\{v_k\}_k$.
• The step size is set using an Armijo backtracking line-search.

Function and gradient computations:

• The gradient with respect to a full atom $D_k = u_k v_k^\top \in \mathbb{R}^{P \times L}$:

$$\nabla_{D_k} E = \sum_{n=1}^{N} (z_k^n)^\top * \Big( X^n - \sum_{l=1}^{K} z_l^n * D_l \Big) = \Phi_k - \sum_{l=1}^{K} \Psi_{k,l} * D_l, \quad (5)$$

where $\Phi_k \in \mathbb{R}^{P \times L}$ and $\Psi_{k,l} \in \mathbb{R}^{2L-1}$ are constant during a D-step and can be precomputed.

• The gradients with respect to $u_k$ and $v_k$ are obtained using the chain rule:

$$\nabla_{u_k} E = (\nabla_{D_k} E)\, v_k \in \mathbb{R}^{P}, \quad (6) \qquad \nabla_{v_k} E = u_k^\top (\nabla_{D_k} E) \in \mathbb{R}^{L}, \quad (7)$$

• The cost $E$ can be computed, up to a constant term $C$, as:

$$E = \sum_{k=1}^{K} u_k^\top (\nabla_{D_k} E)\, v_k + C. \quad (8)$$

⇒ Using this precomputation helps scaling with the number of channels, as the computations are in $O(P + L)$ instead of $O(PL)$ (see the sketch below).
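To illustrate the structure of the D-step, here is a hedged NumPy sketch for a single trial (N = 1). It uses a fixed step size instead of the Armijo line-search and recomputes the residual instead of precomputing Φ_k and Ψ_{k,l}, so it shows the gradients (5)-(7) rather than the paper's optimized implementation.

```python
import numpy as np

def grad_full_atoms(X, z, u, v):
    """Gradient of E w.r.t. each full atom D_k = u_k v_k^T, as in eq. (5),
    computed directly from the residual (no Phi/Psi precomputation)."""
    K, L = v.shape
    P, _ = X.shape
    R = X.astype(float).copy()
    for k in range(K):
        for p in range(P):
            R[p] -= u[k, p] * np.convolve(z[k], v[k])   # X - sum_l z_l * D_l
    G = np.empty((K, P, L))
    for k in range(K):
        for p in range(P):
            # z_k correlated with the residual, restricted to length L
            G[k, p] = -np.correlate(R[p], z[k], mode="valid")
    return G

def d_step_pgd(X, z, u, v, n_iter=20, step=1e-2):
    """Alternated projected gradient on {u_k} then {v_k}, with projection
    onto the unit balls ||u_k||_2 <= 1 and ||v_k||_2 <= 1."""
    for _ in range(n_iter):
        G = grad_full_atoms(X, z, u, v)
        u = u - step * np.einsum("kpl,kl->kp", G, v)    # eq. (6)
        u /= np.maximum(np.linalg.norm(u, axis=1, keepdims=True), 1.0)
        G = grad_full_atoms(X, z, u, v)
        v = v - step * np.einsum("kp,kpl->kl", u, G)    # eq. (7)
        v /= np.maximum(np.linalg.norm(v, axis=1, keepdims=True), 1.0)
    return u, v
```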

4. Multivariate speed benchmark

[Figure: computation time (s, log scale) for λ ∈ {0.3, 1.0, 3.0, 10.0}. Top: Garcia-Cardona et al. (2017), Jas et al. (2017) FISTA, Jas et al. (2017) LBFGS, Proposed (univariate). Bottom: Wohlberg (2016), Proposed (multivariate), Proposed (rank-1).]

Speed benchmark for univariate-CSC (top) and multivariate-CSC (bottom).

5. Simulated Signals

[Figure: recovered temporal atoms for P = 1 and P = 5 against the simulated atom (left); loss(v̂) as a function of the noise level σ, for P ∈ {1, 5, 25, 50} (right).]

• Signals are generated following (1) over $P$ channels.
• Recovered temporal patterns $\hat{v}_k$ are evaluated using:

$$\mathrm{loss}(\hat{v}) = \min_{s \in \mathfrak{S}(K)} \sum_{k=1}^{K} \min\Big( \| \hat{v}_{s(k)} - v_k \|_2^2,\ \| \hat{v}_{s(k)} + v_k \|_2^2 \Big),$$

where $\mathfrak{S}(K)$ is the set of permutations of the $K$ atoms, so that the loss is invariant to atom reordering and sign flips.

⇒ More channels improve the pattern recovery, as they disentangle superimposed patterns (see the sketch below).
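A direct NumPy sketch of this metric; it brute-forces the K! permutations, which is fine for small K (a linear assignment solver such as scipy.optimize.linear_sum_assignment would scale better).

```python
import numpy as np
from itertools import permutations

def pattern_recovery_loss(v_hat, v):
    """Distance between recovered and true temporal patterns, minimized
    over atom permutations s and sign flips (brute force over S(K))."""
    K = len(v)
    # cost[i, k]: squared distance between recovered atom i and true atom k,
    # up to a sign flip
    cost = np.array([[min(np.sum((vi - vk) ** 2), np.sum((vi + vk) ** 2))
                      for vk in v] for vi in v_hat])
    return min(sum(cost[s[k], k] for k in range(K))
               for s in permutations(range(K)))
```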

6. Experimental Signals

[Figure: (a) temporal waveform, (b) spatial pattern, (c) PSD (dB), (d) dipole fit.]

Atom learned on the MNE somatosensory dataset. The learned temporal pattern illustrates the mu-waveforms described for instance in [Cole and Voytek, 2017].

NIPS 2018, Montréal, Canada