Phase Retrieval And Cryo-Electron Microscopy
http://bicmr.pku.edu.cn/~wenzw/bigdata2020.html
Acknowledgement: these slides are based on Prof. Emmanuel Candès's and Prof. Amit Singer's lecture notes
1/125
2/125
Outline
1 Phase Retrieval
    Classical Phase Retrieval
    Ptychographic Phase Retrieval
    PhaseLift
    PhaseCut
    Wirtinger Flows
    Gauss-Newton Method
2 Cryo-Electron Microscopy
3/125
X-ray crystallography
Method for determining atomic structure within a crystal
10 Nobel Prizes in X-ray crystallography, and counting...
4/125
Missing phase problem
Detectors record intensities of diffracted rays =⇒ phaseless data only!
Fraunhofer diffraction =⇒ intensity of the electrical field ≈ Fourier transform

    |x̂(f_1, f_2)|² = | ∫ x(t_1, t_2) e^{−i2π(f_1 t_1 + f_2 t_2)} dt_1 dt_2 |²

Electrical field x = |x| e^{iφ} with intensity |x|²

Phase retrieval problem (inversion): how can we recover the phase (or the signal x(t_1, t_2)) from |x̂(f_1, f_2)|?
5/125
Phase and magnitude
Phase carries more information than magnitude
6/125
Phase retrieval in X-ray crystallography
Knowledge of phase crucial to build electron density map
Algorithmic means of recovering phase structure without sophisticated setups
Sayre (’52), Fienup (’78)
Initial success in certain cases by using very specific prior knowledge
7/125
X-ray imaging: now and then
Need for lens-less imaging. Resurgence: imaging with new X-ray sources (undulators and synchrotrons); e.g. Miao and collaborators ('99-present)
8/125
X-ray diffraction microscopy
Imaging non-crystalline objects by measuring X-ray diffraction patterns using extremely intense and ultrashort X-ray pulses

Consequences:
PR problem getting more important
Far less prior knowledge about the unknown signal
9/125
Ultrashort X-ray pulses
10/125
Other applications of phase retrieval
11/125
Outline
1 Phase Retrieval
    Classical Phase Retrieval
    Ptychographic Phase Retrieval
    PhaseLift
    PhaseCut
    Wirtinger Flows
    Gauss-Newton Method
2 Cryo-Electron Microscopy
12/125
Classical Phase Retrieval
Feasibility problem
find x ∈ S ∩M or find x ∈ S+ ∩M
given Fourier magnitudes:

    M := { x(r) : |x̂(ω)| = b(ω) },  where x̂(ω) = F(x(r)), F: Fourier transform

given a support estimate:

    S := { x(r) : x(r) = 0 for r ∉ D }

or

    S_+ := { x(r) : x(r) ≥ 0 and x(r) = 0 if r ∉ D }
13/125
Error Reduction
Alternating projection:
xk+1 = PSPM(xk)
projection onto S:

    P_S(x)(r) = x(r) if r ∈ D, and 0 otherwise

projection onto M:

    P_M(x) = F*(y),  where y(ω) = b(ω) x̂(ω)/|x̂(ω)| if x̂(ω) ≠ 0, and b(ω) otherwise
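The alternating projection above fits in a few lines (a minimal 1D sketch, assuming numpy; the signal length, the support D and the iteration count are illustrative choices, not from the slides). The residual ‖|Fx_k| − b‖ is non-increasing along the error-reduction iteration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
support = np.arange(12)                  # known support D
x_true = np.zeros(n)
x_true[support] = rng.random(12)         # nonnegative signal supported on D
b = np.abs(np.fft.fft(x_true))           # Fourier magnitude data

def P_S(x):
    """Projection onto S: real, zero outside D."""
    y = np.zeros(len(x))
    y[support] = x[support].real
    return y

def P_M(x):
    """Projection onto M: keep the phase of Fx, impose magnitude b."""
    X = np.fft.fft(x)
    phase = np.ones_like(X)
    nz = np.abs(X) > 0
    phase[nz] = X[nz] / np.abs(X[nz])
    return np.fft.ifft(b * phase)

res = lambda x: np.linalg.norm(np.abs(np.fft.fft(x)) - b)

x = rng.standard_normal(n)               # random start
r0 = res(x)
for _ in range(200):                     # x_{k+1} = P_S(P_M(x_k))
    x = P_S(P_M(x))
print(res(x) < r0)                       # residual decreased
```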
14/125
Summary of projection algorithms
Basic input-output (BIO)

    x_{k+1} = (P_S P_M + I − P_M)(x_k)

Hybrid input-output (HIO)

    x_{k+1} = ((1+β) P_S P_M + I − P_S − β P_M)(x_k)

Hybrid projection reflection (HPR)

    x_{k+1} = ((1+β) P_{S+} P_M + I − P_{S+} − β P_M)(x_k)

Relaxed averaged alternating reflection (RAAR)

    x_{k+1} = (2β P_{S+} P_M + β I − β P_{S+} + (1−2β) P_M)(x_k)

Difference map (DF)

    x_{k+1} = (I + β(P_S((1−γ_2) P_M − γ_2 I) + P_M((1−γ_1) P_S − γ_1 I)))(x_k)
15/125
ADMM
Consider problem
find x and y, such that x = y, x ∈ X and y ∈ Y
X is either S or S+, and Y isM.Augmented Lagrangian function
    L(x, y, λ) := λ^T(x − y) + (1/2)‖x − y‖²
ADMM:
    x_{k+1} = argmin_{x∈X} L(x, y_k, λ_k),
    y_{k+1} = argmin_{y∈Y} L(x_{k+1}, y, λ_k),
    λ_{k+1} = λ_k + β(x_{k+1} − y_{k+1}).
16/125
ADMM
ADMM
xk+1 = PX (yk − λk),
yk+1 = PY(xk+1 + λk),
λk+1 = λk + β(xk+1 − yk+1),
ADMM is equivalent to HIO or HPR if P_X(x + y) = P_X(x) + P_X(y):

    x_{k+2} + λ_{k+1} = [(1+β) P_X P_Y + (I − P_X) − β P_Y](x_{k+1} + λ_k),

which matches the hybrid input-output (HIO) iteration

    x_{k+1} = ((1+β) P_S P_M + I − P_S − β P_M)(x_k)

if β = 1.
17/125
ADMM
ADMM: updating the Lagrange multiplier twice

    x_{k+1} := P_X(y_k − π_k),
    π_{k+1} := π_k + β(x_{k+1} − y_k) = −(I − β P_X)(y_k − π_k),
    y_{k+1} := P_Y(x_{k+1} + λ_k),
    λ_{k+1} := λ_k + ν(x_{k+1} − y_{k+1}) = (I − ν P_Y)(x_{k+1} + λ_k).

ADMM is equivalent to ER if β = ν = 1:

    x_{k+1} := P_X(y_k) and y_{k+1} := P_Y(x_{k+1}).

ADMM is equivalent to BIO if β = ν = 1:

    x_{k+1} + λ_k = (P_X P_Y + I − P_Y)(x_k + λ_{k−1})
18/125
Numerical comparison
The parameter β in HPR and RAAR was updated dynamically with β_0 = 0.95. For ADMM, β = 0.5.

[Figure: ADM, HPR and RAAR reconstructions at iterations 10, 20 and 200.]
19/125
Numerical comparison
The parameter β in HPR and RAAR was updated dynamically with β_0 = 0.95. For ADMM, β = 0.5.

[Figure: ADM, HPR and RAAR reconstructions at iterations 20, 40 and 200.]
20/125
Numerical comparison
The parameter β was fixed at 0.6, 0.8 and 0.95 for the first, second and third rows, respectively.
ADM HPR RAAR
21/125
Numerical comparison
The parameter β was fixed at 0.6, 0.8 and 0.95 for the first, second and third rows, respectively.
ADM HPR RAAR
22/125
Numerical Results
Convergence behavior:

    err_k = ‖P_X(P_Y(x_k)) − P_Y(x_k)‖_F / ‖m‖_F

[Figure: err_k over 200 iterations for ADM, HPR and RAAR on (a) Satellite and (b) Lena; the errors are on the order of 10^{-3}.]
23/125
Outline
1 Phase Retrieval
    Classical Phase Retrieval
    Ptychographic Phase Retrieval
    PhaseLift
    PhaseCut
    Wirtinger Flows
    Gauss-Newton Method
2 Cryo-Electron Microscopy
24/125
Ptychographic Phase Retrieval
Given |F(Qiψ)| for i = 1, . . . , k, can we recover ψ?
Ptychographic imaging, along with advances in detectors and computing, has resulted in X-ray microscopes with increased spatial resolution without the need for lenses
25/125
Ptychographic Phase Retrieval
Applications: nanoporous structure of shale, toxicity of silver nanoparticles, nano-scale properties of concrete, bone and high-strength nano-composite materials, chemical structure of 3D polymeric materials for energy applications, nanoscale structure of rocks related to carbon sequestration
26/125
Ptychographic Phase Retrieval
27/125
Ptychographic Phase Retrieval
given an object ψ, the illuminated portion is

    x_i = Q_i ψ

Q_i is an m × n illumination matrix: it contains at least one nonzero row, and each row of Q_i contains at most one nonzero element.
measurements:

    b_i = |F x_i| = |F Q_i ψ|, i = 1, 2, ..., k

goal: retrieve the phases of x_i from b_i; the x_i are not completely independent due to the overlap
28/125
Ptychographic Phase Retrieval
Define x ≡ (x_1^*, x_2^*, . . . , x_k^*)^*, b ≡ (b_1^T, b_2^T, . . . , b_k^T)^T, Q ≡ (Q_1^*, Q_2^*, . . . , Q_k^*)^*, and F ≡ Diag(F, F, . . . , F).
projections:

    P_Q = Q (Q^* Q)^{-1} Q^*  and  P_F(x) = F^* ( (F x / |F x|) · b )

then we can apply the projection algorithms
29/125
Reformulation
reconstruct ψ from the measurements

    b_i = |F Q_i ψ|, i = 1, 2, ..., k,

where Q_i is an m × n illumination matrix.
optimization problem:

    min_ψ ρ(ψ) := Σ_{i=1}^k (1/2) ‖ |F Q_i ψ| − b_i ‖_2²

Reformulation:

    min Σ_{i=1}^k (1/2) ‖ |z_i| − b_i ‖_2²,  s.t. z_i = F Q_i ψ, i = 1, . . . , k
30/125
ADMM
Augmented Lagrangian function

    L(z_i, ψ, y_i) = Σ_{i=1}^k ( (1/2)‖|z_i| − b_i‖_2² + y_i^*(F Q_i ψ − z_i) + (α/2)‖F Q_i ψ − z_i‖_2² )

z-subproblem: with s_i = y_i + α F Q_i ψ, i = 1, ..., k,

    (z_i^+)(l) = ((|(s_i)(l)| + (b_i)(l)) / ((1+α)|(s_i)(l)|)) (s_i)(l),  if (s_i)(l) ≠ 0 and (b_i)(l) > 0;
    (z_i^+)(l) = ± (b_i)(l)/(1+α),  if (s_i)(l) = 0 and (b_i)(l) > 0;
    (z_i^+)(l) = 0,  otherwise.

ψ-subproblem:

    ψ^+ = (1/α) ( Σ_{i=1}^k Q_i^* Q_i )^{-1} Σ_{i=1}^k Q_i^* F^* (α z_i^+ − y_i)
31/125
Numerical Results
Q_i is a 64 × 64 matrix; the probe is translated by 16 pixels.

[Figure: (a) the amplitude of the "goldball" image; (b) the amplitude of the probe.]
32/125
Numerical Results
[Figure: (a) the ADMM reconstruction; (b) the magnitude of the error produced by ADMM, on the order of 10^{-5}.]
33/125
Numerical Results
relative residual norm and relative error:

    res = √( Σ_{i=1}^k ‖|z_i^j| − b_i‖² ) / √( Σ_{i=1}^k ‖b_i‖² ),   err = ‖c ψ^j − ψ‖ / ‖ψ‖,

where c is a constant phase factor chosen to minimize ‖c ψ^j − ψ‖.
[Figure: (a) change of the relative residual norm (res) and (b) change of the relative error (err) over 100 iterations, for ER, SD, CG, NT, HIO and ADM.]
34/125
Outline
1 Phase Retrieval
    Classical Phase Retrieval
    Ptychographic Phase Retrieval
    PhaseLift
    PhaseCut
    Wirtinger Flows
    Gauss-Newton Method
2 Cryo-Electron Microscopy
35/125
Discrete mathematical model
Phaseless measurements about x_0 ∈ C^n:

    b_k = |⟨a_k, x_0⟩|², k ∈ {1, . . . , m}

Phase retrieval as a feasibility problem:

    find x
    s.t. |⟨a_k, x⟩|² = b_k, k = 1, . . . , m

Solving quadratic equations is NP-complete in general
36/125
NP-complete stone problem
Given weights w_i ∈ R, i = 1, . . . , n, is there an assignment x_i = ±1 such that

    Σ_{i=1}^n w_i x_i = 0?

Formulation as a quadratic system:

    |x_i|² = 1, i = 1, . . . , n,    | Σ_{i=1}^n w_i x_i |² = 0
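The stone problem can be sketched by brute force (the weights below are illustrative; the exponential search is exactly what NP-completeness predicts for the worst case):

```python
from itertools import product

def balanced_assignment(w):
    """Search for x_i = +/-1 with sum_i w_i x_i = 0 by enumerating all signs."""
    for signs in product((1, -1), repeat=len(w)):
        if sum(wi * xi for wi, xi in zip(w, signs)) == 0:
            return signs
    return None

print(balanced_assignment([4, 3, 2, 1]))   # (1, -1, -1, 1): 4 - 3 - 2 + 1 = 0
print(balanced_assignment([3, 1, 1]))      # odd total weight, no solution: None
```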
37/125
PhaseLift (C., Eldar, Strohmer, Voroninski, 2011)
Lifting: X = x x^*

    b_k = |⟨a_k, x_0⟩|² = a_k^* x x^* a_k = ⟨a_k a_k^*, X⟩

Turns quadratic measurements into linear measurements b = A(X) about x x^*.

Phase retrieval problem:

    find X
    s.t. A(X) = b, X ⪰ 0, rank(X) = 1

PhaseLift:

    find X
    s.t. A(X) = b, X ⪰ 0

Connections: relaxation of quadratically constrained QPs
Shor (87) [lower bounds on nonconvex quadratic optimization problems]
Goemans and Williamson (95) [MAX-CUT]
Chai, Moscoso, Papanicolaou (11)
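The lifting identity b_k = ⟨a_k a_k^*, x x^*⟩ is easy to check numerically (a small sketch assuming numpy; the dimensions are arbitrary), and the lifted matrix X = x x^* is visibly PSD with rank one:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 12
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))  # row k: a_k

b = np.abs(A.conj() @ x) ** 2            # b_k = |<a_k, x>|^2 with <a, x> = a^* x
X = np.outer(x, x.conj())                # lifted variable X = x x^*

# <a_k a_k^*, X> = a_k^* X a_k: the measurements are linear in X
lin = np.real(np.einsum('ki,ij,kj->k', A.conj(), X, A))
evals = np.linalg.eigvalsh(X)
print(np.allclose(lin, b))                       # True
print(int(np.sum(evals > 1e-8 * evals.max())))   # rank(X) = 1
```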
38/125
Exact generalized phase retrieval via SDP
Phase retrieval problem:

    find x
    s.t. b_k = |⟨a_k, x⟩|²

PhaseLift:

    min tr(X)
    s.t. A(X) = b, X ⪰ 0

Theorem (C. and Li ('12); C., Strohmer and Voroninski ('11)). Assume
    a_k independently and uniformly sampled on the unit sphere,
    m ≳ n.
Then with prob. 1 − O(e^{−γm}), the only feasible point is x x^*:

    {X : A(X) = b, and X ⪰ 0} = {x x^*}
39/125
Extensions to physical setups
40/125
Coded diffraction
Collect diffraction patterns of modulated samples
    |F(w[t] x[t])|²,  w ∈ W

Makes the problem well-posed (for some choices of W)
41/125
Exact recovery
Figure: Recovery from 6 random binary masks
42/125
Numerical results: noiseless 2D images
43/125
Outline
1 Phase Retrieval
    Classical Phase Retrieval
    Ptychographic Phase Retrieval
    PhaseLift
    PhaseCut
    Wirtinger Flows
    Gauss-Newton Method
2 Cryo-Electron Microscopy
44/125
PhaseCut
Given A ∈ C^{m×n} and b ∈ R^m:

    find x,  s.t. |Ax| = b.

(Candès et al. 2011b, Alexandre d'Aspremont 2013)

An equivalent model:

    min_{x∈C^n, y∈R^m} (1/2)‖Ax − y‖_2²
    s.t. |y| = b.
45/125
PhaseCut
Reformulation:

    min_{x∈C^n, u∈C^m} (1/2)‖Ax − diag(b) u‖_2²
    s.t. |u_i| = 1, i = 1, . . . , m.

Given u, the signal variable is x = A† diag(b) u. Then

    min_{u∈C^m} u^* M u
    s.t. |u_i| = 1, i = 1, . . . , m,

where M = diag(b)(I − A A†) diag(b) is positive semidefinite.
The MAXCUT-type problem:

    min_{U∈S^m} Tr(U M)
    s.t. U_ii = 1, i = 1, · · · , m, U ⪰ 0.
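Two properties of M are worth verifying numerically (a sketch assuming numpy; sizes are arbitrary): M is PSD because I − A A† is an orthogonal projector, and the true phase vector u = Ax/|Ax| attains objective value zero, since diag(b) u = Ax lies in the range of A:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 10, 4
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)

Ax = A @ x
b = np.abs(Ax)
u_true = Ax / b                          # true phases (b > 0 almost surely)

M = np.diag(b) @ (np.eye(m) - A @ np.linalg.pinv(A)) @ np.diag(b)

eig_min = np.linalg.eigvalsh((M + M.conj().T) / 2).min()
obj = abs(u_true.conj() @ M @ u_true)    # (I - A A^+) annihilates Ax
print(eig_min > -1e-8, obj < 1e-8)       # M is PSD; the truth attains objective 0
```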
46/125
Outline
1 Phase Retrieval
    Classical Phase Retrieval
    Ptychographic Phase Retrieval
    PhaseLift
    PhaseCut
    Wirtinger Flows
    Gauss-Newton Method
2 Cryo-Electron Microscopy
47/125
Discrete mathematical model
Solve the equations:

    y_r = |⟨a_r, x⟩|², r = 1, 2, ..., m.    (1)

Gaussian model:

    a_r ∈ C^n i.i.d. ∼ N(0, I/2) + i N(0, I/2).

Coded diffraction model:

    y_r = | Σ_{t=0}^{n−1} x[t] d_l(t) e^{−i2πkt/n} |²,  r = (l, k), 0 ≤ k ≤ n−1, 1 ≤ l ≤ L.
48/125
Phase retrieval by non-convex optimization
Nonlinear least squares problem:

    min_{z∈C^n} f(z) = (1/(4m)) Σ_{k=1}^m (y_k − |⟨a_k, z⟩|²)²

Pro: operates over vectors and not matrices
Con: f is nonconvex, with many local minima
49/125
Wirtinger flow: C., Li and Soltanolkotabi (’14)
Strategies:
Start from a sufficiently accurate initialization
Make use of the Wirtinger derivative

    f(z) = (1/(4m)) Σ_{k=1}^m (y_k − |⟨a_k, z⟩|²)²

    ∇f(z) = (1/m) Σ_{k=1}^m (|⟨a_k, z⟩|² − y_k)(a_k a_k^*) z

Careful iterations to avoid local minima
50/125
Algorithm: Gaussian model
Spectral initialization:
1 Input measurements {a_r} and observations {y_r} (r = 1, 2, ..., m).
2 Calculate z_0 as the leading eigenvector of Y = (1/m) Σ_{r=1}^m y_r a_r a_r^*.
3 Normalize z_0 such that ‖z_0‖² = n Σ_r y_r / Σ_r ‖a_r‖².

Iteration via Wirtinger derivatives: for τ = 0, 1, . . .

    z_{τ+1} = z_τ − (μ_{τ+1}/‖z_0‖²) ∇f(z_τ)
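The whole scheme fits in a short script (a minimal sketch of the Gaussian-model iteration, assuming numpy; the sizes n and m, the constant step μ and the iteration count are illustrative, and eigh stands in for the power method used in the later experiments):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 60
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)        # ground truth
A = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
y = np.abs(A.conj() @ x) ** 2           # y_r = |<a_r, x>|^2; row r of A is a_r

def dist(z, x):
    """Distance up to a global phase: min_phi ||z - e^{i phi} x||."""
    c = np.vdot(x, z)                   # x^* z
    return np.linalg.norm(z - x * c / abs(c))

# Spectral initialization: leading eigenvector of Y = (1/m) sum_r y_r a_r a_r^*
Y = (A.T * y) @ A.conj() / m
z = np.linalg.eigh(Y)[1][:, -1]
z = z * np.sqrt(n * y.sum() / np.sum(np.abs(A) ** 2))   # set ||z_0||^2

# Wirtinger-flow iterations: z <- z - (mu/||z_0||^2) grad f(z)
nz0, mu = np.linalg.norm(z) ** 2, 0.2
for _ in range(2000):
    c = A.conj() @ z                    # a_r^* z
    z = z - (mu / nz0) * (A.T @ ((np.abs(c) ** 2 - y) * c)) / m

rel_err = dist(z, x) / np.linalg.norm(x)
print(rel_err)                          # small: x recovered up to a global phase
```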
51/125
Convergence property: Gaussian model
distance (up to a global phase):

    dist(z, x) = min_{φ∈[0,2π]} ‖z − e^{iφ} x‖

Theorem (convergence for the Gaussian model; C., Li and Soltanolkotabi ('14)). Assume
    number of samples m ≳ n log n,
    step size μ ≤ c/n (c > 0).
Then with probability at least 1 − 10e^{−γn} − 8/n² − m e^{−1.5n}, we have dist(z_0, x) ≤ (1/8)‖x‖, and after τ iterations

    dist(z_τ, x) ≤ (1/8)(1 − μ/4)^{τ/2} ‖x‖.

Here γ is a positive constant.
52/125
Algorithm: Coded diffraction model
Initialization via resampled Wirtinger flow:
1 Input measurements {a_r} and observations {y_r} (r = 1, 2, ..., m).
2 Divide the measurements and observations equally into B+1 groups of size m′. The measurements and observations in group b are denoted a_r^{(b)} and y_r^{(b)} for b = 0, 1, ..., B.
3 Obtain u_0 by conducting the spectral initialization on group 0.
4 For b = 0 to B−1, perform the update

    u_{b+1} = u_b − (μ/‖u_0‖²) (1/m′) Σ_{r=1}^{m′} (|⟨a_r^{(b+1)}, u_b⟩|² − y_r^{(b+1)}) (a_r^{(b+1)} (a_r^{(b+1)})^*) u_b.

5 Set z_0 = u_B.

Same iterations as the Gaussian model: for τ = 0, 1, . . .

    z_{τ+1} = z_τ − (μ_{τ+1}/‖z_0‖²) ∇f(z_τ)
53/125
Convergence property: coded diffraction model
Theorem (convergence for the CD model; C., Li and Soltanolkotabi ('14)). Assume
    L ≳ (log n)⁴,
    step size μ ≤ c (c > 0).
Then with probability at least 1 − (2L+1)/n³ − 1/n², we have dist(z_0, x) ≤ ‖x‖/(8√n), and after τ iterations

    dist(z_τ, x) ≤ (1/(8√n))(1 − μ/3)^{τ/2} ‖x‖.
54/125
Numerical results: 1D signals
Consider the following two kinds of signals:

• Random low-pass signals:

    x[t] = Σ_{k=−(M/2−1)}^{M/2} (X_k + i Y_k) e^{2πi(k−1)(t−1)/n},

with M = n/8 and X_k, Y_k i.i.d. N(0, 1).

• Random Gaussian signals: x ∈ C^n is a random complex Gaussian vector with i.i.d. entries of the form

    x[t] = X + i Y,

with X and Y distributed as N(0, 1/2).
55/125
Success rate
• Set n = 128.
• Apply 50 iterations of the power method as initialization.
• Set the step length parameter μ_τ = min(1 − exp(−τ/τ_0), 0.2), where τ_0 ≈ 330.
• Stop after 2500 iterations, and declare a trial successful if the relative error of the reconstruction dist(ẑ, x)/‖x‖ falls below 10^{-5}.
• The empirical probability of success is an average over 100 trials.
56/125
Success rate
57/125
Numerical results: natural images
• View the RGB image as an n_1 × n_2 × 3 array, and run the WF algorithm separately on each color band.
• Apply 50 iterations of the power method as initialization.
• Set the step length parameter μ_τ = min(1 − exp(−τ/τ_0), 0.4), where τ_0 ≈ 330. Stop after 300 iterations.
• One FFT unit is the amount of time it takes to perform a single FFT on an image of the same size.
58/125
Numerical results: natural images
Figure: Naqsh-e Jahan Square, Esfahan. Image size is 189 × 768 pixels; timing is 61.4 sec or about 21200 FFT units. The relative error is 6.2 × 10^{-16}.
59/125
Numerical results: natural images
Figure: Stanford main quad. Image size is 320 × 1280 pixels; timing is 181.8120 sec or about 20700 FFT units. The relative error is 3.5 × 10^{-14}.
60/125
Numerical results: natural images
Figure: Milky Way galaxy. Image size is 1080 × 1920 pixels; timing is 1318.1 sec or 41900 FFT units. The relative error is 9.3 × 10^{-16}.
61/125
Recall the main theorems
Theorem (convergence for the Gaussian model; C., Li and Soltanolkotabi ('14)). Assume
    number of samples m ≳ n log n,
    step size μ ≤ c/n (c > 0).
Then with probability at least 1 − 10e^{−γn} − 8/n² − m e^{−1.5n}, we have dist(z_0, x) ≤ (1/8)‖x‖, and after τ iterations

    dist(z_τ, x) ≤ (1/8)(1 − μ/4)^{τ/2} ‖x‖.

Here γ is a positive constant.
62/125
Recall the main theorems
Theorem (convergence for the CD model; C., Li and Soltanolkotabi ('14)). Assume
    L ≳ (log n)⁴,
    step size μ ≤ c (c > 0).
Then with probability at least 1 − (2L+1)/n³ − 1/n², we have dist(z_0, x) ≤ ‖x‖/(8√n), and after τ iterations

    dist(z_τ, x) ≤ (1/(8√n))(1 − μ/3)^{τ/2} ‖x‖.
63/125
Regularity condition
Definition. We say that the function f satisfies the regularity condition RC(α, β, ε) if for all vectors z ∈ E(ε) we have

    Re(⟨∇f(z), z − x e^{iφ(z)}⟩) ≥ (1/α) dist²(z, x) + (1/β) ‖∇f(z)‖².

• φ(z) := argmin_{φ∈[0,2π]} ‖z − e^{iφ} x‖.
• dist(z, x) := ‖z − e^{iφ(z)} x‖.
• E(ε) := {z ∈ C^n : dist(z, x) ≤ ε}.
64/125
Proof of convergence
Lemma 1. Assume that f obeys RC(α, β, ε) for all z ∈ E(ε). Furthermore, suppose z_0 ∈ E(ε), and assume 0 < μ ≤ 2/β. Consider the update

    z_{τ+1} = z_τ − μ ∇f(z_τ).

Then for all τ we have z_τ ∈ E(ε) and

    dist²(z_τ, x) ≤ (1 − 2μ/α)^τ dist²(z_0, x).
65/125
Proof of convergence
Proof. We prove that if z ∈ E(ε), then for all 0 < μ ≤ 2/β,

    z⁺ = z − μ ∇f(z)

obeys

    dist²(z⁺, x) ≤ (1 − 2μ/α) dist²(z, x).

The lemma then follows by applying this inequality inductively.
66/125
Proof of convergence
Simple algebraic manipulations together with the regularity condition give

    ‖z⁺ − x e^{iφ(z)}‖² = ‖z − x e^{iφ(z)} − μ∇f(z)‖²
        = ‖z − x e^{iφ(z)}‖² − 2μ Re(⟨∇f(z), z − x e^{iφ(z)}⟩) + μ² ‖∇f(z)‖²
        ≤ ‖z − x e^{iφ(z)}‖² − 2μ ( (1/α)‖z − x e^{iφ(z)}‖² + (1/β)‖∇f(z)‖² ) + μ² ‖∇f(z)‖²
        = (1 − 2μ/α) ‖z − x e^{iφ(z)}‖² + μ(μ − 2/β) ‖∇f(z)‖²
        ≤ (1 − 2μ/α) ‖z − x e^{iφ(z)}‖²,

which concludes the proof.
67/125
Proof of regularity condition
We will make use of the following lemma:

Lemma 2. Suppose
1 x is a solution obeying ‖x‖ = 1, and is independent of the sampling vectors;
2 m ≥ c(δ) n log n in the Gaussian model or L ≥ c(δ) log³ n in the CD model.
Then

    ‖∇²f(x) − E∇²f(x)‖ ≤ δ

holds with probability at least 1 − 10e^{−γn} − 8/n² and 1 − (2L+1)/n³ for the Gaussian and CD models, respectively.

• The concentration of the Hessian matrix at the optimizers.
68/125
Proof of regularity condition
Based on the lemma above with δ = 0.01, we prove the regularity condition by establishing the local curvature condition and the local smoothness condition.

Local curvature condition: f satisfies LCC(α, ε, δ) if for all vectors z ∈ E(ε),

    Re(⟨∇f(z), z − x e^{iφ(z)}⟩) ≥ ((1/α) + (1−δ)/4) dist²(z, x) + (1/(10m)) Σ_{r=1}^m |a_r^*(z − x e^{iφ(z)})|⁴.

The LCC states that the function curves sufficiently upwards along most directions near the curve of global optimizers.
For the CD model, LCC holds with α ≥ 30 and ε = 1/(8√n);
for the Gaussian model, LCC holds with α ≥ 8 and ε = 1/8.
69/125
Proof of regularity condition
Local smoothness condition: f satisfies LSC(β, ε, δ) if for all vectors z ∈ E(ε) we have

    ‖∇f(z)‖² ≤ β ( ((1−δ)/4) dist²(z, x) + (1/(10m)) Σ_{r=1}^m |a_r^*(z − x e^{iφ(z)})|⁴ ).

The LSC states that the gradient of the function is well behaved near the curve of global optimizers. Using δ = 0.01, LSC holds with
    β ≥ 550 for ε = 1/(8√n),
    β ≥ 550 + 3n for ε = 1/8.
70/125
Proof of regularity condition
In conclusion, when δ = 0.01, for the Gaussian model the regularity condition holds with

    α ≥ 8, β ≥ 550 + 3n, and ε = 1/8,

while for the CD model it holds with

    α ≥ 30, β ≥ 550, and ε = 1/(8√n).

Therefore, for the Gaussian model, linear convergence holds if the initial point satisfies dist(z_0, x) ≤ 1/8; for the CD model, linear convergence holds if dist(z_0, x) ≤ 1/(8√n).
71/125
Proof of initialization
Recall the initialization algorithm:
1 Input measurements {a_r} and observations {y_r} (r = 1, 2, ..., m).
2 Calculate z_0 as the leading eigenvector of Y = (1/m) Σ_{r=1}^m y_r a_r a_r^*.
3 Normalize z_0 such that ‖z_0‖² = n Σ_r y_r / Σ_r ‖a_r‖².

Idea:

    E[(1/m) Σ_{r=1}^m y_r a_r a_r^*] = I + 2 x x^*,

and any leading eigenvector of I + 2xx^* is of the form λx. Therefore, by the strong law of large numbers, the initialization step recovers the direction of x perfectly as long as there are enough samples.
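This heuristic is easy to confirm empirically (a sketch assuming numpy; the sample size is an illustrative choice): with many samples, the leading eigenvector of Y aligns with the direction of x:

```python
import numpy as np

rng = np.random.default_rng(6)
n, m = 4, 20000
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
x /= np.linalg.norm(x)                   # ||x|| = 1, as in the analysis
A = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
y = np.abs(A.conj() @ x) ** 2            # y_r = |<a_r, x>|^2; row r of A is a_r

Y = (A.T * y) @ A.conj() / m             # (1/m) sum_r y_r a_r a_r^*
v = np.linalg.eigh(Y)[1][:, -1]          # leading eigenvector

corr = abs(np.vdot(v, x))                # |<v, x>| close to 1: direction recovered
print(corr)
```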
72/125
Proof of initialization
In the detailed proof, we will use the following lemma:

Lemma 3. In the setup of Lemma 2,

    ‖ I − (1/m) Σ_{r=1}^m a_r a_r^* ‖ ≤ δ

holds with probability at least 1 − 2e^{−γm} for the Gaussian model and 1 − 1/n² for the CD model. On this event,

    (1 − δ)‖h‖² ≤ (1/m) Σ_{r=1}^m |a_r^* h|² ≤ (1 + δ)‖h‖²

holds for all h ∈ C^n.
73/125
Proof of initialization
Detailed proof: Lemma 2 gives

    ‖Y − (x x^* + ‖x‖² I)‖ ≤ ε := 0.001.

Let z̃_0 be the unit eigenvector corresponding to the top eigenvalue λ_0 of Y; then

    |λ_0 − (|z̃_0^* x|² + 1)| = |z̃_0^*(Y − (x x^* + I)) z̃_0| ≤ ‖Y − (x x^* + I)‖ ≤ ε.

Therefore |z̃_0^* x|² ≥ λ_0 − 1 − ε. Meanwhile, since λ_0 is the top eigenvalue of Y and ‖x‖ = 1, we have

    λ_0 ≥ x^* Y x = x^*(Y − (I + x x^*)) x + 2 ≥ 2 − ε.

Combining the two inequalities,

    |z̃_0^* x|² ≥ 1 − 2ε  ⇒  dist²(z̃_0, x) ≤ 2 − 2√(1 − 2ε) ≤ 1/256  ⇒  dist(z̃_0, x) ≤ 1/16.
74/125
Proof of initialization
Now consider the normalization. Recall that z_0 = (√((1/m) Σ_{r=1}^m |a_r^* x|²)) z̃_0. By Lemma 3, with high probability we have

    |‖z_0‖ − 1| ≤ |‖z_0‖² − 1| = |(1/m) Σ_{r=1}^m |a_r^* x|² − 1| ≤ δ < 1/16.

Therefore,

    dist(z_0, x) ≤ ‖z_0 − z̃_0‖ + dist(z̃_0, x) ≤ |‖z_0‖ − 1| + dist(z̃_0, x) ≤ 1/8.
75/125
Proof of initialization: Resampled WF
Recall the initialization step via resampled Wirtinger flow:
1 Input measurements {a_r} and observations {y_r} (r = 1, 2, ..., m).
2 Divide the measurements and observations equally into B+1 groups of size m′. The measurements and observations in group b are denoted a_r^{(b)} and y_r^{(b)} for b = 0, 1, ..., B.
3 Obtain u_0 by conducting the spectral initialization on group 0.
4 For b = 0 to B−1, perform the update

    u_{b+1} = u_b − (μ/‖u_0‖²) (1/m′) Σ_{r=1}^{m′} (|⟨a_r^{(b+1)}, u_b⟩|² − y_r^{(b+1)}) (a_r^{(b+1)} (a_r^{(b+1)})^*) u_b.

5 Set z_0 = u_B.
76/125
Proof of initialization: Resampled WF
Outline of the proof:
1 For step 3, by the earlier analysis, u_0 obeys dist(u_0, x) ≤ 1/8.
2 For step 4, define f(z; b) = (1/(2m′)) Σ_{r=1}^{m′} (|⟨a_r^{(b+1)}, z⟩|² − y_r^{(b+1)})²; then the update can be written as

    u_{b+1} = u_b − (μ/‖u_0‖²) ∇f(u_b; b).

We prove the linear convergence of this series of updates by verifying the regularity condition

    Re(⟨∇f(z; b), z − x e^{iφ(z)}⟩) ≥ (1/α) dist²(z, x) + (1/β) ‖∇f(z; b)‖².
77/125
Outline
1 Phase Retrieval
    Classical Phase Retrieval
    Ptychographic Phase Retrieval
    PhaseLift
    PhaseCut
    Wirtinger Flows
    Gauss-Newton Method
2 Cryo-Electron Microscopy
78/125
Nonlinear least squares problem

    min_{z∈C^n} f(z) = (1/(4m)) Σ_{k=1}^m (y_k − |⟨a_k, z⟩|²)²

Using the Wirtinger derivative, stack the variable and its conjugate:

    z̃ := [z; z̄];

    g(z̃) := ∇_c f(z) = (1/m) Σ_{r=1}^m (|a_r^* z|² − y_r) [ (a_r a_r^*) z ; conj((a_r a_r^*) z) ];

    J(z̃) := (1/√m) [v_1, . . . , v_m],  where v_r = [ (a_r^* z) a_r ; conj(a_r^* z) ā_r ];

    Ψ(z̃) := J(z̃) J(z̃)^* = (1/m) Σ_{r=1}^m [ |a_r^* z|² a_r a_r^*    (a_r^* z)² a_r a_r^T ;
                                              (conj(a_r^* z))² ā_r a_r^*    |a_r^* z|² ā_r a_r^T ].
79/125
The Modified LM method for Phase Retrieval
Levenberg-Marquardt iteration:

    z_{k+1} = z_k − (Ψ(z_k) + μ_k I)^{-1} g(z_k)

Algorithm
1 Input: measurements {a_r}, observations {y_r}. Set ε ≥ 0.
2 Construct z_0 using the spectral initialization algorithms.
3 While ‖g(z_k)‖ ≥ ε do:
    compute s_k by (approximately) solving the LM equation
        Ψ^{μ_k}_{z_k} s_k = (Ψ(z_k) + μ_k I) s_k = −g(z_k)
    until ‖Ψ^{μ_k}_{z_k} s_k + g(z_k)‖ ≤ η_k ‖g(z_k)‖;
    set z_{k+1} = z_k + s_k and k := k + 1.
4 Output: z_k.
80/125
Convergence of the Gaussian Model
Theorem. If the measurements follow the Gaussian model, the LM equation is solved accurately (η_k = 0 for all k), and the following conditions hold:
• m ≥ c n log n, where c is sufficiently large;
• if f(z_k) ≥ ‖z_k‖²/(900n), let μ_k = 70000 n √(n f(z_k)); otherwise, let μ_k = √(f(z_k)).
Then, with probability at least 1 − 15e^{−γn} − 8/n² − m e^{−1.5n}, we have dist(z_0, x) ≤ (1/8)‖x‖, and

    dist(z_{k+1}, x) ≤ c_1 dist(z_k, x).

Meanwhile, once f(z_s) < ‖z_s‖²/(900n), for any k ≥ s we have

    dist(z_{k+1}, x) < c_2 dist(z_k, x)².
81/125
Convergence of the Gaussian Model
In the theorem above,

    c_1 := 1 − ‖x‖/(4μ_k),  if f(z_k) ≥ ‖z_k‖²/(900n);
    c_1 := (4.28 + 5.56√n)/(9.89√n),  otherwise,

and

    c_2 = (4.28 + 5.56√n)/‖x‖.
82/125
Key to proof
Lower bound on the second smallest eigenvalue of the GN matrix: for any y, z ∈ C^n with Im(y^* z) = 0,

    y^* Ψ(z) y ≥ ‖y‖² ‖z‖²

holds with high probability. Consequently,

    Im(y^* z) = 0  ⇒  ‖(Ψ^μ_z)^{-1} y‖ ≤ (2/(‖z‖² + μ)) ‖y‖.
83/125
Key to proof
Local error bound property:

    (1/4) dist(z, x)² ≤ f(z) ≤ 8.04 dist(z, x)² + 6.06 n dist(z, x)⁴

holds for any z satisfying dist(z, x) ≤ 1/8.

Regularity condition:

    μ(z) h^* (Ψ^μ_z)^{-1} g(z) ≥ (1/16)‖h‖² + (1/(64100 n ‖h‖)) ‖g(z)‖²

holds for any z = x + h, ‖h‖ ≤ 1/8, and f(z) ≥ ‖z‖²/(900n).
84/125
Convergence for the inexact LM method
Theorem (convergence of the inexact LM method for the Gaussian model). Assume
• m ≳ n log n;
• μ_k takes the same value as in the exact LM method for the Gaussian model;
• η_k ≤ (1−c_1)μ_k/(25.55 n ‖z_k‖) if f(z_k) ≥ ‖z_k‖²/(900n); otherwise η_k ≤ (4.33√n − 4.28) μ_k ‖g_k‖ / (372.54 n² ‖z_k‖³).
Then, with probability at least 1 − 15e^{−γn} − 8/n² − m e^{−1.5n}, we have dist(z_0, x) ≤ (1/8)‖x‖, and

    dist(z_{k+1}, x) ≤ ((1 + c_1)/2) dist(z_k, x), for all k = 0, 1, ...,

    dist(z_{k+1}, x) ≤ ((9.89√n + c_2 ‖x‖)/(2‖x‖)) dist(z_k, x)², whenever f(z_k) < ‖z_k‖²/(900n).

Here c_1 and c_2 take the same values as in the exact algorithm for the Gaussian model.
85/125
Solving the LM Equation: PCG
Solve

    (Ψ_k + μ_k I) u = g_k

by the preconditioned conjugate gradient method:

    M^{-1}(Ψ_k + μ_k I) u = M^{-1} g_k,  M = Φ_k + μ_k I,

where

    Φ(z̃) := [ z z^*  2 z z^T ; 2 z̄ z^*  z̄ z^T ] + ‖z‖² I_{2n}.

Small condition number.
Easy to invert: M = (μ_k + ‖z_k‖²) I + M_1, where M_1 is a rank-2 matrix.
86/125
Solving the LM Equation: PCG
• Small condition number.

Lemma. Consider solving the equation (Φ^μ_z)^{-1} Ψ^μ_z s = (Φ^μ_z)^{-1} g(z) by the CG method from s_0 := −(Φ^μ_z)^{-1} g(z). Let s^* be the solution of the system. Define V := {x̃ : x̃ = [x^*, x^T]^*, x ∈ C^n}. Then V is an invariant subspace of (Φ^μ_z)^{-1} Ψ^μ_z, and s_0, s^* ∈ V. Meanwhile, choosing μ_k = K n √(f(z)), the eigenvalues of (Φ^μ_z)^{-1} Ψ^μ_z on V satisfy

    1 − 57/(K√n) ≤ λ ≤ 1 + 57/(K√n).
87/125
Solving the LM Equation: PCG
• Easy to invert.

Calculate via the Sherman-Morrison-Woodbury formula:

    (Φ^μ_z)^{-1} = a I_{2n} + b [z; z̄][z^*, z^T] + c [z; −z̄][z^*, −z^T]

where

    a = 1/(‖z‖² + μ),  b = −3/(2(‖z‖² + μ)(4‖z‖² + μ)),  c = 1/(2(‖z‖² + μ)μ).
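The closed-form inverse can be verified numerically (a sketch assuming numpy; z and μ are arbitrary), since [z; z̄] and [z; −z̄] are eigenvectors of Φ with eigenvalues 4‖z‖² + μ and μ:

```python
import numpy as np

rng = np.random.default_rng(9)
n, mu = 3, 0.7
z = rng.standard_normal(n) + 1j * rng.standard_normal(n)
zb, nz2 = z.conj(), np.linalg.norm(z) ** 2

# Phi^mu_z = [[z z*, 2 z z^T], [2 zbar z*, zbar z^T]] + (||z||^2 + mu) I_{2n}
I2n = np.eye(2 * n)
Phi_mu = np.block([[np.outer(z, zb), 2 * np.outer(z, z)],
                   [2 * np.outer(zb, zb), np.outer(zb, z)]]) + (nz2 + mu) * I2n

a = 1 / (nz2 + mu)
b = -3 / (2 * (nz2 + mu) * (4 * nz2 + mu))
c = 1 / (2 * (nz2 + mu) * mu)
u = np.concatenate([z, zb])              # [z; zbar]
w = np.concatenate([z, -zb])             # [z; -zbar]
inv = a * I2n + b * np.outer(u, u.conj()) + c * np.outer(w, w.conj())

print(np.allclose(inv @ Phi_mu, I2n))    # the closed-form inverse checks out
```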
88/125
Outline
1 Phase Retrieval
    Classical Phase Retrieval
    Ptychographic Phase Retrieval
    PhaseLift
    PhaseCut
    Wirtinger Flows
    Gauss-Newton Method
2 Cryo-Electron Microscopy
89/125
Single Particle Cryo-Electron Microscopy
Drawing of the imaging process:
90/125
Single Particle Cryo-Electron Microscopy
Projection images: P_i(x, y) = ∫_{−∞}^{∞} φ(x R_i^1 + y R_i^2 + z R_i^3) dz + "noise".

φ : R³ → R is the electric potential of the molecule.
Cryo-EM problem: find φ and R_1, . . . , R_n given P_1, . . . , P_n.
91/125
A Toy Example
92/125
E. coli 50S ribosomal subunit: sample images
Fred Sigworth, Yale Medical School
93/125
Algorithmic Pipeline
Particle picking: manual, automatic or experimental image segmentation.
Class averaging: classify images with similar viewing directions, register and average to improve their signal-to-noise ratio (SNR). S, Zhao, Shkolnisky, Hadani, SIIMS, 2011.
Orientation estimation: S, Shkolnisky, SIIMS, 2011.
Three-dimensional reconstruction: a 3D volume is generated by a tomographic inversion algorithm.
Iterative refinement.
94/125
What mathematics do we use to solve the problem?
Tomography
Convex optimization and semidefinite programming
Random matrix theory (in several places)
Representation theory of SO(3) (if viewing directions are uniformly distributed)
Spectral graph theory, (vector) diffusion maps
Fast randomized algorithms
. . .
95/125
Orientation Estimation: Fourier projection-slice
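The Fourier projection-slice theorem underlies this step: the Fourier transform of a projection is a central slice of the Fourier transform of the volume. Its 2D-to-1D analogue can be verified in a few lines (a sketch assuming numpy; the image is an arbitrary random array):

```python
import numpy as np

rng = np.random.default_rng(4)
img = rng.random((8, 8))

proj = img.sum(axis=0)                   # line integrals along one axis
central_slice = np.fft.fft2(img)[0, :]   # k_y = 0 slice of the 2D transform
print(np.allclose(np.fft.fft(proj), central_slice))   # True
```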
96/125
Angular Reconstruction
Van Heel 1987, Vainshtein and Goncharov 1986
97/125
Experiments with simulated noisy projections
Each projection is 129 × 129 pixels.

    SNR = Var(Signal)/Var(Noise)
98/125
Detection of Common-lines between images
common line between two images P_i and P_j:
radial resolution n_r, angular resolution n_θ
Fourier transformed images: (ℓ_0^k, ℓ_1^k, . . . , ℓ_{n_θ−1}^k)
compare ℓ_0^i, . . . , ℓ_{n_θ−1}^i and ℓ_0^j, . . . , ℓ_{n_θ−1}^j
maximum normalized cross-correlation:

    (m_{i,j}, m_{j,i}) = argmax_{0≤m_1<n_θ/2, 0≤m_2<n_θ} ⟨ℓ_{m_1}^i, ℓ_{m_2}^j⟩ / (‖ℓ_{m_1}^i‖ ‖ℓ_{m_2}^j‖),  for all i ≠ j
99/125
Least Squares Approach
The directions of the detected common lines between P_i and P_j:

    c⃗_ij = (c_ij^1, c_ij^2) = (cos(2π m_ij/n_θ), sin(2π m_ij/n_θ)),
    c⃗_ji = (c_ji^1, c_ji^2) = (cos(2π m_ji/n_θ), sin(2π m_ji/n_θ)).

R_i ∈ SO(3), i = 1, · · · , K: the orientations of the K images.
Fourier projection-slice theorem:

    R_i (c⃗_ij, 0)^T = R_j (c⃗_ji, 0)^T for 1 ≤ i < j ≤ K.

These are (K choose 2) linear equations for the 6K variables.
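The common-line identity can be checked on synthetic rotations (a sketch assuming numpy): the common line is the intersection of the two Fourier slices, i.e. the direction orthogonal to both third columns, and both sides of the identity reproduce it:

```python
import numpy as np

def random_rotation(rng):
    """Random element of SO(3) via QR of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.standard_normal((3, 3)))
    Q = Q * np.sign(np.diag(R))          # make the factorization unique
    if np.linalg.det(Q) < 0:
        Q[:, 2] *= -1                    # enforce det = +1
    return Q

rng = np.random.default_rng(5)
Ri, Rj = random_rotation(rng), random_rotation(rng)

u = np.cross(Ri[:, 2], Rj[:, 2])         # intersection of the two Fourier slices
u /= np.linalg.norm(u)
cij = Ri[:, :2].T @ u                    # common-line coordinates in image i
cji = Rj[:, :2].T @ u                    # common-line coordinates in image j

lhs = Ri @ np.append(cij, 0.0)           # R_i (c_ij, 0)^T
rhs = Rj @ np.append(cji, 0.0)           # R_j (c_ji, 0)^T
print(np.allclose(lhs, rhs), np.allclose(lhs, u))   # both sides equal u
```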
100/125
Least Squares Approach
Weighted LS approach:

    min_{R_1,...,R_K ∈ SO(3)} Σ_{i≠j} w_ij ‖R_i (c⃗_ij, 0)^T − R_j (c⃗_ji, 0)^T‖²

Since ‖R_i (c⃗_ij, 0)^T‖ = ‖R_j (c⃗_ji, 0)^T‖ = 1, we obtain

    max_{R_1,...,R_K ∈ SO(3)} Σ_{i≠j} w_ij ⟨R_i (c⃗_ij, 0)^T, R_j (c⃗_ji, 0)^T⟩

The solution may not be optimal due to the typically large proportion of outliers:

[Figure: histograms of the errors ‖R_i (c⃗_ij, 0)^T − R_j (c⃗_ji, 0)^T‖ for (a) SNR = 1/16, (b) SNR = 1/32, (c) SNR = 1/64.]
101/125
Least Unsquared Deviations
Sum of unsquared residuals:

    min_{R_1,...,R_K ∈ SO(3)} Σ_{i≠j} ‖R_i (c⃗_ij, 0)^T − R_j (c⃗_ji, 0)^T‖

or

    min_{R_1,...,R_K ∈ SO(3)} Σ_{i≠j} ‖(c⃗_ij, 0)^T − R_i^T R_j (c⃗_ji, 0)^T‖

Reweighted least unsquared problem:

    min_{R_1,...,R_K ∈ SO(3)} Σ_{i≠j} w_ij ‖R_i (c⃗_ij, 0)^T − R_j (c⃗_ji, 0)^T‖

Less sensitive to misidentifications of common lines (outliers).
102/125
Semidefinite Programming Relaxation (SDR)
Rotations:

    R_i R_i^T = I_3, det(R_i) = 1, for i = 1, . . . , K

The columns of the rotation matrix R_i:

    R_i = [ R_i^1  R_i^2  R_i^3 ], i = 1, . . . , K.

Define a 3 × 2K matrix R:

    R = [ R_1^1  R_1^2  · · ·  R_K^1  R_K^2 ]
103/125
Semidefinite Programming Relaxation (SDR)
Objective function:

    Σ_{i≠j} w_ij ⟨R_i (c⃗_ij, 0)^T, R_j (c⃗_ji, 0)^T⟩ = trace((W ∘ S) G)

Gram matrix:

    G = R^T R

G is a rank-3 positive semidefinite matrix whose 2 × 2 blocks are

    G_ij = [ (R_i^1)^T ; (R_i^2)^T ] [ R_j^1  R_j^2 ].
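The structure of G is easy to confirm numerically (a sketch assuming numpy; K is an illustrative choice): the diagonal blocks are I_2, the rank is 3, and G is PSD:

```python
import numpy as np

def random_rotation(rng):
    """Random element of SO(3) via QR of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.standard_normal((3, 3)))
    Q = Q * np.sign(np.diag(R))
    if np.linalg.det(Q) < 0:
        Q[:, 2] *= -1
    return Q

rng = np.random.default_rng(10)
K = 6
R = np.hstack([random_rotation(rng)[:, :2] for _ in range(K)])  # 3 x 2K
G = R.T @ R                                                     # Gram matrix

blocks_ok = all(np.allclose(G[2*i:2*i+2, 2*i:2*i+2], np.eye(2)) for i in range(K))
rank = np.linalg.matrix_rank(G)
psd_ok = bool(np.all(np.linalg.eigvalsh(G) > -1e-10))
print(blocks_ok, rank, psd_ok)           # True 3 True
```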
104/125
SDR for weighted LS
S = (S_ij)_{i,j=1,··· ,K} and W = (W_ij)_{i,j=1,··· ,K} are 2K × 2K matrices with 2 × 2 blocks:

    S_ij = c⃗_ij c⃗_ji^T,  and  W_ij = w_ij [ 1 1 ; 1 1 ].

orthogonality implies

    G_ii = I_2, i = 1, 2, · · · , K

SDR:

    max_{G∈R^{2K×2K}} trace((W ∘ S) G)
    s.t. G_ii = I_2, i = 1, 2, · · · , K, G ⪰ 0
105/125
SDR for LUD
Define a 3K × 3K matrix G = (G_ij)_{i,j=1,··· ,K} with G_ij = R_i^T R_j:

    min_{G⪰0} Σ_{i≠j} ‖(c⃗_ij, 0)^T − G_ij (c⃗_ji, 0)^T‖,  s.t. G_ii = I_3

If {R_i} is a solution, then {J R_i J} is also a solution, where

    J = diag(1, 1, −1).

Let G^J = (G_ij^J)_{i,j=1,··· ,K} with G_ij^J = (J R_i J)^T (J R_j J) = J R_i^T R_j J. Then (1/2)(G + G^J) is also a solution.
106/125
SDR for LUD
Using the fact that

    (1/2)(G_ij + G_ij^J) = [ G̃_ij  0 ; 0  0 ],  where G̃_ij is the top-left 2 × 2 block,

we obtain

    min_{G⪰0} Σ_{i≠j} ‖c⃗_ij^T − G_ij c⃗_ji^T‖,  s.t. G_ii = I_2.

Solved by an alternating direction augmented Lagrangian method.
107/125
Spectral Norm Constraint
presence of many "outliers" (i.e., a large proportion of misidentified common lines):

images whose viewing directions are parallel share many common lines, which results from the overlapping Fourier slices.

when the viewing directions of R_i and R_j are nearby, the fidelity term ‖R_i (c⃗_ij, 0)^T − R_j (c⃗_ji, 0)^T‖ can become small even when the common-line pair (c⃗_ij, c⃗_ji) is misidentified.

the Gram matrix G then has just two dominant eigenvalues, instead of three.
108/125
Spectral Norm Constraint
The dependency of the spectral norm of G (denoted as αK here)on the distribution of orientations of the images. Each red pointdenotes the viewing direction of a projection. The larger thespectral norm αK is, the more clustered the viewing directionsare.
109/125
Spectral Norm Constraint
add a constraint on the spectral norm of the Gram matrix G:

    G ⪯ αK I_{2K}  or  ‖G‖_2 ≤ αK

α ∈ [2/3, 1) controls the spread of the viewing directions.

If the true orientations are uniformly sampled from SO(3), then by the law of large numbers and the symmetry of the distribution of orientations, the spectral norm of the true Gram matrix G^true is approximately (2/3)K.

If the true viewing directions are highly clustered, then the spectral norm of the true Gram matrix G^true is close to K.
110/125
ADMM for relaxed weighted LS
Primal problem:

    min_{G⪰0} −⟨C, G⟩
    s.t. A(G) = b, ‖G‖_2 ≤ αK

where

    A(G) = ( G_ii^{11}, G_ii^{22}, (√2/2) G_ii^{12} + (√2/2) G_ii^{21} )_{i=1,2,...,K},
    b = ( b_i^1, b_i^2, b_i^3 )_{i=1,2,...,K},

with b_i^1 = b_i^2 = 1 and b_i^3 = 0 for all i.
111/125
ADMM for relaxed weighted LS
Dual problem:

    min_{y, X⪰0} −y^T b + αK ‖Z‖_*
    s.t. Z = C + X + A^*(y)

ADMM:

    y^{k+1} = −A(C + X^k − Z^k) − (1/μ)(A(G^k) − b),
    Z^{k+1} = U diag(z) U^T,  z = argmin_z (αK/μ)‖z‖_1 + (1/2)‖z − λ‖_2²,
    X^{k+1} = (C + X^k + A^*(y^{k+1}) + (1/μ) G^k)†,
    G^{k+1} = (1 − γ)G^k + γμ(X^{k+1} − H^k).
112/125
ADMM for relaxed LUD
Consider the LUD problem:

    min_{x_ij, G⪰0} Σ_{i<j} ‖x_ij‖  s.t. A(G) = b, x_ij = c⃗_ij^T − G_ij c⃗_ji^T, ‖G‖_2 ≤ αK

Dual problem:

    min_{θ_ij, y, X⪰0} −y^T b − Σ_{i<j} ⟨θ_ij, c⃗_ij^T⟩ + αK ‖Z‖_*
    s.t. Z = Q(θ) + X + A^*(y), and ‖θ_ij‖ ≤ 1.

Apply ADMM to the dual problem.
113/125
Iterative Reweighted Least Squares (IRLS)
SDR:

    min_{G∈R^{2K×2K}} F(G) = Σ_{i,j=1,...,K} √( 2 − 2 Σ_{p,q=1,2} G_ij^{pq} S_ij^{pq} )
    s.t. G_ii = I_2, i = 1, 2, · · · , K, G ⪰ 0,
         ‖G‖_2 ≤ αK (optional)

Smoothed version:

    min_{G∈R^{2K×2K}} F(G, ε) = Σ_{i,j=1,...,K} √( 2 − 2 Σ_{p,q=1,2} G_ij^{pq} S_ij^{pq} + ε² )
    s.t. G_ii = I_2, i = 1, 2, · · · , K, G ⪰ 0,
         ‖G‖_2 ≤ αK (optional)
114/125
Iterative Reweighted Least Squares (IRLS)
Main steps: compute G^{k+1} as the solution of

    min_{G⪰0} Σ_{i≠j} w_ij^k (2 − 2⟨G_ij, S_ij⟩ + ε²)  s.t. A(G) = b, (‖G‖_2 ≤ αK)

then update

    w_ij^k = 1/√( 2 − 2⟨G_ij^k, S_ij⟩ + ε² ), ∀ k > 0.

Convergence: the cost function sequence is monotonically non-increasing,

    F(G^{k+1}, ε) ≤ F(G^k, ε).

The sequence of iterates {G^k} of IRLS is bounded, and every cluster point of the sequence is a stationary point.
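The monotonicity mechanism is generic to IRLS. A toy robust-regression analogue (not the SDP above; a hedged sketch assuming numpy, with illustrative sizes and outlier fraction) shows that each weighted least-squares step never increases the smoothed cost:

```python
import numpy as np

rng = np.random.default_rng(7)
m, n, eps = 30, 3, 1e-2
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n) + np.where(rng.random(m) < 0.2, 5.0, 0.0)  # outliers

F = lambda x: np.sum(np.sqrt((A @ x - b) ** 2 + eps ** 2))  # smoothed unsquared cost

x = np.linalg.lstsq(A, b, rcond=None)[0]      # least-squares start
vals = [F(x)]
for _ in range(30):
    w = 1.0 / np.sqrt((A @ x - b) ** 2 + eps ** 2)   # IRLS weights
    Aw = A * w[:, None]                               # weighted LS step:
    x = np.linalg.solve(A.T @ Aw, Aw.T @ b)           # min sum_i w_i (a_i^T x - b_i)^2
    vals.append(F(x))

print(all(v2 <= v1 + 1e-10 for v1, v2 in zip(vals, vals[1:])))   # monotone: True
```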
115/125
Numerical Results
3D Fourier shell correlation (FSC). FSC measures the normalized cross-correlation coefficient between two 3D volumes over corresponding spherical shells in Fourier space:

    FSC(i) = Σ_{j∈Shell_i} F(V_1)(j) · conj(F(V_2)(j)) / √( Σ_{j∈Shell_i} |F(V_1)(j)|² · Σ_{j∈Shell_i} |F(V_2)(j)|² )

The reconstruction from the images with estimated orientations used the Fourier-based 3D reconstruction package FIRM.
The reconstructed volumes are shown using the visualization system Chimera.
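A minimal FSC implementation (a sketch assuming numpy; the shell binning is a simplified choice) returns 1 on every shell when a volume is compared with itself:

```python
import numpy as np

def fsc(v1, v2, n_shells=8):
    """Fourier shell correlation between two cubic volumes."""
    F1, F2 = np.fft.fftn(v1), np.fft.fftn(v2)
    freq = np.fft.fftfreq(v1.shape[0])
    kx, ky, kz = np.meshgrid(freq, freq, freq, indexing='ij')
    r = np.sqrt(kx**2 + ky**2 + kz**2)
    shell = np.minimum((r / 0.5 * n_shells).astype(int), n_shells - 1)
    out = np.empty(n_shells)
    for s in range(n_shells):
        m = shell == s
        num = np.abs(np.sum(F1[m] * np.conj(F2[m])))
        den = np.sqrt(np.sum(np.abs(F1[m])**2) * np.sum(np.abs(F2[m])**2))
        out[s] = num / den
    return out

v = np.random.default_rng(8).random((16, 16, 16))
print(np.allclose(fsc(v, v), 1.0))       # a volume is perfectly correlated with itself
```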
116/125
Numerical Results: simulated images
Clean SNR = 1/16 SNR = 1/32 SNR = 1/64
The first column shows three clean images of size 129 × 129 pixels generated from a 50S ribosomal subunit volume with different orientations. The other three columns show the corresponding noisy images with SNR = 1/16, 1/32 and 1/64, respectively.
117/125
Numerical Results: simulated images
The clean volume (top), the reconstructed volumes and the MSEs. Nospectral norm constraint was used (i.e., α = N/A) for all algorithms. Theresult using the IRLS procedure without α is not available due to the highlyclustered estimated projection directions.
118/125
Numerical Results: simulated images
The reconstructed volumes from images with SNR = 1/64 and the MSEs ofthe estimated rotations using spectral norm constraints (i.e., α = 0.90, 0.75,and 0.67) for all algorithms. The result from the IRLS procedure withα = 0.67 for the spectral norm constraint is best.
119/125
Numerical Results: simulated images
[Figure: FSCs of the reconstructed volumes against the clean volume for LUD (ADMM), LS (SDP/ADMM) and LUD (IRLS), plotted against spatial frequency (1/Å): (1) SNR = 1/16, (2) SNR = 1/32, (3) SNR = 1/64, (4) SNR = 1/64 with α = 0.90, (5) SNR = 1/64 with α = 0.75, (6) SNR = 1/64 with α = 0.67.]
120/125
Numerical Results: real dataset
Columns: raw image, 1st neighbor, 2nd neighbor, average image.

Noise reduction by image averaging. Three raw ribosomal images are shown in the first column. Their closest two neighbours (i.e., raw images having similar orientations after alignment) are shown in the second and third columns. The average images shown in the last column were obtained by averaging over 10 neighbours of each raw image.
121/125
Numerical Results: real dataset
The ab-initio models estimated by merging two independentreconstructions, each obtained from 1000 class averages. The resolutions ofthe models are 17.2Å, 16.7Å, 16.7Å, 16.7Å and 16.1Å using the FSC 0.143resolution cutoff.
122/125
Numerical Results: real dataset
The refined models corresponding to the ab-initio models. The resolutionsof the models are all 11.1Å.
123/125
Numerical Results: real dataset
The average cost time using different algorithms on 500 and 1000images in the two experimental subsections. The notation α = N/Ameans no spectral norm constraint ‖G‖2 ≤ αK is used.
            α = N/A                                 2/3 ≤ α ≤ 1
    K       LS (SDP)  LUD (ADMM)  LUD (IRLS)        LS (ADMM)  LUD (ADMM)  LUD (IRLS)
    500     7s        266s        469s              78s        454s        3353s
    1000    31s       1864s       3913s             619s       1928s       20918s
124/125
Numerical Results: real dataset
[Figure: FSC versus spatial frequency (1/Å) for the initial model (16.7Å) and refinement iterations 1-3 (12.8Å, 11.1Å, 11.1Å), with the 0.143 FSC criterion marked.]
LUD, ADMM: Convergence of the refinement process
125/125
Numerical Results: real dataset
[Figure: FSC versus spatial frequency (1/Å) for the initial model (16.1Å) and refinement iterations 1-4 (13.0Å, 11.4Å, 11.1Å, 11.1Å), with the 0.143 FSC criterion marked.]
LUD, IRLS: Convergence of the refinement process