Shallow Learning with Kernels
for Dictionary-Free Magnetic Resonance Fingerprinting
Gopal Nataraj∗, Mingjie Gao∗, Jakob Assländer†, Clayton Scott∗, & Jeffrey A. Fessler∗
ISMRM Workshop on Magnetic Resonance Fingerprinting
∗Dept. of Electrical Engineering and Computer Science, University of Michigan; †Center for Biomedical Imaging, NYU School of Medicine
Problem Statement
Given: at every voxel, a measurement vector y = s(x) + ε
[Figure: MRF "component" images y (a.u.; more on these later...), alongside estimated T1 (600–2000 ms) and T2 (50–200 ms) maps, illustrating the pipeline y → x̂(·) → x̂(y)]
Task: design a fast voxel-by-voxel estimator x̂(·)
that scales well with the number of unknowns per voxel, L (a toy sketch of this per-voxel setup follows below)
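To make the setup concrete, here is a minimal sketch, not the authors' code: a hypothetical stand-in signal model maps L = 2 parameters (T1, T2) to a D-dimensional measurement, noise is added, and an estimator is applied independently at every voxel. The function `toy_signal`, the dimensions, and all variable names are illustrative assumptions.

```python
import numpy as np

def toy_signal(x, D=6):
    # hypothetical stand-in for the MRF signal model s(x):
    # maps (T1, T2) in ms to a D-dimensional measurement
    t = np.linspace(0.1, 3.5, D)  # pseudo acquisition times, in seconds
    t1, t2 = x
    return (1 - np.exp(-1000 * t / t1)) * np.exp(-1000 * t / t2)

rng = np.random.default_rng(0)
x_true = np.array([800.0, 80.0])                         # (T1, T2) in ms
y = toy_signal(x_true) + 0.01 * rng.standard_normal(6)   # y = s(x) + eps

def apply_voxelwise(x_hat, Y):
    # apply a scalar estimator y -> x_hat(y) independently at each voxel;
    # Y: (n_voxels, D) component images
    return np.array([x_hat(y_n) for y_n in Y])
```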
Machine Learning at Different "Depths" for QMRI
Idea: "learn" separate scalar estimators x̂1(y), . . . , x̂L(y) from simulated training data
Deep Learning
• promising for QMRI [Cohen et al., 2017, Virtue et al., 2017]
• needs many training points to avoid overfitting
• trained via non-convex optimization
• limited theoretical basis
Shallow Learning
• simpler structure needs fewer training points
• fast training via convex optimization
Shallow Learning with Kernels for QMRI
Idea: "learn" separate scalar estimators x̂1(y), . . . , x̂L(y) from simulated training data
• sample (x1, ε1), . . . , (xN, εN) and simulate y1, . . . , yN via signal model s
• design nonlinear functions x̂l(·) := gl(·) + bl that seek to map each yn to xl,n; the plain least-squares fit is ill-posed, so regularize:
\[
(\hat{g}_l, \hat{b}_l) \in \arg\min_{g_l \in \mathcal{G},\; b_l \in \mathbb{R}} \; \frac{1}{N} \sum_{n=1}^{N} \big( g_l(y_n) + b_l - x_{l,n} \big)^2 + \rho_l \, \| g_l \|_{\mathcal{G}}^2 \tag{1}
\]
Solution: Parameter Estimation via Regression with Kernels (PERK) [Nataraj et al., 2017b, arXiv:1710.02441]
• restrict optimization to a certain rich function space $\mathcal{G}$ with kernel k
• the optimal $g_l \in \mathcal{G}$ takes the form $g_l(\cdot) = \sum_{n=1}^{N} a_{l,n} \, k(\cdot, y_n)$ [Schölkopf et al., 2001]
Fast, simple implementation: nonlinear lifting + high-dimensional linear regression (see the sketch below)
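Via the representer theorem, solving (1) over G reduces to a linear system in the coefficients a_{l,n}. Below is a minimal sketch of this reduction with a Gaussian kernel; the bias b_l is omitted for brevity, and the training data and all names here are placeholders, not the authors' code.

```python
import numpy as np

def gauss_kernel(A, B, bw):
    # Gaussian kernel matrix: K[i, j] = exp(-||A[i] - B[j]||^2 / (2 bw^2))
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bw ** 2))

rng = np.random.default_rng(0)
N, D = 500, 6
Y_train = rng.standard_normal((N, D))  # placeholder; would come from signal model s
x_l = rng.uniform(600, 2000, N)        # placeholder regressands, e.g. T1 in ms
rho, bw = 1e-3, 1.0                    # regularization weight and kernel bandwidth

# representer theorem: g_l(.) = sum_n a_{l,n} k(., y_n), so (1) becomes the
# finite-dimensional kernel ridge regression system (K + N rho I) a = x_l
K = gauss_kernel(Y_train, Y_train, bw)
a = np.linalg.solve(K + N * rho * np.eye(N), x_l)

# evaluate the learned estimator at a new measurement y
y = rng.standard_normal(D)
x_hat = gauss_kernel(y[None, :], Y_train, bw).ravel() @ a
```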
PERK for Magnetic Resonance Fingerprinting (MRF)
To control the lifting dimension, it is desirable for y to be low-dimensional.
[Figure: spiral k-space readouts (kx, ky) for flips 1, . . . , 840 of a pseudo-SSFP flip-angle train spanning 0–3500 ms [Assländer et al., 2017], and the resulting component images (a.u.)]
Data-sharing across flips, gridding, FFT, and PCA yield a temporal subspace $V \in \mathbb{C}^{840 \times 6}$; compressed component images then solve
\[
\hat{Y} \in \arg\min_{Y \in \mathbb{C}^{n_{\text{voxels}} \times 6}} \big\| k - A(Y V^H) \big\|_2^2
\]
(a sketch of the subspace construction follows below).
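A minimal sketch, under illustrative assumptions, of one way to build the temporal subspace V: take the leading 6 right singular vectors of an ensemble of simulated fingerprints. The system operator A and the undersampled k-space fit are beyond this sketch, so the compression is shown on fully sampled image-domain data; all arrays here are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# placeholder fingerprint ensemble: rows are time courses over 840 flips;
# in practice these would be simulated from the signal model
F = rng.standard_normal((1000, 840))

# temporal subspace: leading 6 right singular vectors of the ensemble
_, _, Vh = np.linalg.svd(F, full_matrices=False)
V = Vh[:6].conj().T                   # V in C^(840 x 6)

# compress fully sampled component images X (n_voxels x 840) to 6 dimensions;
# with undersampled data, Y would instead solve min_Y ||k - A(Y V^H)||_2^2
X = rng.standard_normal((4096, 840))  # placeholder component images
Y = X @ V                             # Y in C^(n_voxels x 6)
```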
In vivo results
[Figure: spiral k-space data (kx, ky) over flips 1, . . . , 840; T1 maps (600–2000 ms) and T2 maps (50–200 ms) from dictionary-based grid search vs. dictionary-free PERK]
Timing:
• dictionary-based grid search: 28 s/slice; 50 slices: ∼1400 s
• dictionary-free PERK: 4 s train, 0.2 s/slice test; 50 slices: 4 s train, ∼10 s test
Summary
Contributions:
• PERK: a fast, dictionary-free ML method for QMRI [Nataraj et al., 2017b]
• demonstrated PERK for in vivo MRF T1, T2 estimation
Future Work:
• QMRI problems involving more unknowns, for which we expect orders-of-magnitude computational gains [Nataraj et al., 2017a, #5076]
• comparison with other ML methods, e.g., deep learning
References
Assländer, J., Lattanzi, R., Sodickson, D. K., and Cloos, M. A. (2017). Relaxation in spherical coordinates: analysis and optimization of pseudo-SSFP based MR-fingerprinting. arXiv:1703.00481.
Cohen, O., Zhu, B., and Rosen, M. (2017). Deep learning for fast MR fingerprinting reconstruction. In Proc. Intl. Soc. Mag. Res. Med., page 0688.
Nataraj, G., Nielsen, J.-F., and Fessler, J. A. (2017a). Myelin water fraction estimation from optimized steady-state sequences using kernel ridge regression. In Proc. Intl. Soc. Mag. Res. Med., page 5076.
Nataraj, G., Nielsen, J.-F., Scott, C., and Fessler, J. A. (2017b). Dictionary-free MRI PERK: Parameter estimation via regression with kernels. IEEE Trans. Med. Imag. Submitted.
Rahimi, A. and Recht, B. (2007). Random features for large-scale kernel machines. In NIPS.
Schölkopf, B., Herbrich, R., and Smola, A. J. (2001). A generalized representer theorem. In Proc. Computational Learning Theory (COLT), pages 416–426. LNCS 2111.
Virtue, P., Yu, S. X., and Lustig, M. (2017). Better than real: Complex-valued neural nets for MRI fingerprinting. In Proc. IEEE Intl. Conf. on Image Processing. To appear.
PERK solution
Closed-form solution for each l ∈ {1, . . . , L}:
\[
\hat{x}_l(\cdot) = \mathbf{x}_l^T \left( \frac{1}{N} \mathbf{1}_N + \mathbf{M} \big( \mathbf{M} \mathbf{K} \mathbf{M} + N \rho_l \mathbf{I}_N \big)^{-1} \Big( \mathbf{k}(\cdot) - \frac{1}{N} \mathbf{K} \mathbf{1}_N \Big) \right) \tag{2}
\]
• $\mathbf{x}_l := [x_{l,1}, \ldots, x_{l,N}]^T$: training-point regressands
• $\mathbf{K} := [k(y_i, y_j)]_{i,j=1}^{N}$: Gram matrix
• $\mathbf{M} := \mathbf{I}_N - \frac{1}{N} \mathbf{1}_N \mathbf{1}_N^T$: de-meaning operator
• $\mathbf{k}(\cdot) := [k(\cdot, y_1), \ldots, k(\cdot, y_N)]^T$: nonlinear kernel embedding
Can we scale computation with L more gracefully?
• Yes: (2) is separable in l ∈ {1, . . . , L} by construction
• However, explicitly computing K may be undesirable... (a direct implementation of (2) is sketched below)
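A minimal sketch implementing (2) directly, again assuming a Gaussian kernel and placeholder names; since the system matrix is symmetric, $\mathbf{x}_l^T \mathbf{M} (\mathbf{M K M} + N \rho_l \mathbf{I}_N)^{-1}$ is computed as a single solve.

```python
import numpy as np

def gauss_kernel(A, B, bw):
    # K[i, j] = exp(-||A[i] - B[j]||^2 / (2 bw^2))
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bw ** 2))

def perk_fit(Y_train, x_l, rho, bw):
    # closed-form PERK per (2) for one parameter index l
    N = len(x_l)
    K = gauss_kernel(Y_train, Y_train, bw)    # Gram matrix
    M = np.eye(N) - np.ones((N, N)) / N       # de-meaning operator
    # alpha = (M K M + N rho I)^{-1} M x_l (symmetric system)
    alpha = np.linalg.solve(M @ K @ M + N * rho * np.eye(N), M @ x_l)
    return alpha, x_l.mean(), K.mean(axis=1)  # coefficients, bias, (1/N) K 1_N

def perk_eval(y, Y_train, alpha, bias, Kbar, bw):
    # x_hat_l(y) = bias + alpha^T (k(y) - (1/N) K 1_N), matching (2)
    ky = gauss_kernel(y[None, :], Y_train, bw).ravel()
    return bias + alpha @ (ky - Kbar)
```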
Backup: PERK as High-Dimensional Bayesian Linear Regression
Suppose there exists an "approximate feature mapping" $z : \mathcal{Y} \to \mathbb{R}^Z$ such that $\mathbf{Z} := [z(y_1), \ldots, z(y_N)]$ satisfies, for $\dim(\mathcal{Y}) \ll Z \ll N$,
\[
\mathbf{K} \approx \mathbf{Z}^T \mathbf{Z}. \tag{3}
\]
Plugging (3) into PERK solution (2) and rearranging gives
\[
\hat{x}_l(\cdot) \approx \frac{1}{N} \mathbf{x}_l^T \mathbf{1}_N + \frac{1}{N} \mathbf{x}_l^T \mathbf{M} \mathbf{Z}^T \Big( \frac{1}{N} \mathbf{Z} \mathbf{M} \mathbf{Z}^T + \rho_l \mathbf{I}_Z \Big)^{-1} \Big( \mathbf{z}(\cdot) - \frac{1}{N} \mathbf{Z} \mathbf{1}_N \Big) = \hat{m}_{x_l} + \hat{\mathbf{c}}_{x_l z}^T \big( \hat{\mathbf{C}}_{zz} + \rho_l \mathbf{I}_Z \big)^{-1} \big( \mathbf{z}(\cdot) - \hat{\mathbf{m}}_z \big), \tag{4}
\]
with sample means $\hat{m}_{x_l}, \hat{\mathbf{m}}_z$ and sample (cross-)covariances $\hat{\mathbf{c}}_{x_l z}, \hat{\mathbf{C}}_{zz}$: regularized ("ridge") Z-dimensional affine regression!
Does such a z exist and work well in practice?
• Yes, e.g., for kernels of the form k(y, y′) ≡ k(y − y′) [Rahimi and Recht, 2007] (see the sketch after this list)
• In such cases, computation reduces from ∼N² to ∼NZ
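A minimal sketch of the Rahimi–Recht random Fourier feature construction for a Gaussian kernel, followed by the ridge regression of (4). Dimensions, placeholder data, and all variable names are illustrative assumptions.

```python
import numpy as np

def rff_map(Y, W, phi):
    # random Fourier features: z(y) = sqrt(2/Z) cos(W y + phi), chosen so that
    # E[z(y)^T z(y')] ~ exp(-||y - y'||^2 / (2 bw^2)) [Rahimi and Recht, 2007]
    Zdim = W.shape[0]
    return np.sqrt(2.0 / Zdim) * np.cos(Y @ W.T + phi)

rng = np.random.default_rng(0)
N, D, Zdim, bw, rho = 10000, 6, 300, 1.0, 1e-3
Y_train = rng.standard_normal((N, D))  # placeholder simulated measurements
x_l = rng.uniform(600, 2000, N)        # placeholder regressands for one x_l

# draw the feature map once: rows of W ~ N(0, I / bw^2), phi ~ U[0, 2 pi)
W = rng.standard_normal((Zdim, D)) / bw
phi = rng.uniform(0, 2 * np.pi, Zdim)

# sample statistics for the Z-dimensional affine (ridge) regression of (4)
Z = rff_map(Y_train, W, phi)           # (N, Z) lifted training data
m_z, m_x = Z.mean(axis=0), x_l.mean()
Zc = Z - m_z
C_zz = Zc.T @ Zc / N                   # sample covariance of z
c_xz = Zc.T @ (x_l - m_x) / N          # sample cross-covariance
w = np.linalg.solve(C_zz + rho * np.eye(Zdim), c_xz)

def x_hat(y):
    # test-time evaluation per (4): cost per voxel scales with Z, not N
    return m_x + w @ (rff_map(y[None, :], W, phi).ravel() - m_z)
```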