Phase Retrieval And Cryo-Electron Microscopy
http://bicmr.pku.edu.cn/~wenzw/bigdata2020.html
Acknowledgement: these slides are based on Prof. Emmanuel Candès's and Prof. Amit Singer's lecture notes
1/125
2/125
Outline
1 Phase Retrieval
    Classical Phase Retrieval
    Ptychographic Phase Retrieval
    PhaseLift
    PhaseCut
    Wirtinger Flows
    Gauss-Newton Method
2 Cryo-Electron Microscopy
3/125
X-ray crystallography
Method for determining atomic structure within a crystal
10 Nobel Prizes in X-ray crystallography, and counting...
4/125
Missing phase problem
Detectors record intensities of diffracted rays =⇒ phaseless data only!
Fraunhofer diffraction =⇒ intensity of the electrical field ≈ Fourier transform

    |x̂(f_1, f_2)|² = | ∫ x(t_1, t_2) e^{−i2π(f_1 t_1 + f_2 t_2)} dt_1 dt_2 |²

Electrical field x = |x| e^{iφ} with intensity |x|²

Phase retrieval problem (inversion): how can we recover the phase (or the signal x(t_1, t_2)) from |x̂(f_1, f_2)|?
5/125
Phase and magnitude
Phase carries more information than magnitude
6/125
Phase retrieval in X-ray crystallography
Knowledge of phase crucial to build electron density map
Algorithmic means of recovering phase structure without sophisticated setups
Sayre (’52), Fienup (’78)
Initial success in certain cases by using very specific prior knowledge
7/125
X-ray imaging: now and then
Need for lens-less imaging. Resurgence: imaging with new X-ray sources (undulators and synchrotrons); e.g. Miao and collaborators ('99-present)
8/125
X-ray diffraction microscopy
Imaging non-crystalline objects by measuring X-ray diffraction patterns using extremely intense and ultrashort X-ray pulses

Consequences:
PR problem getting more important
Far less prior knowledge about the unknown signal
9/125
Ultrashort X-ray pulses
10/125
Other applications of phase retrieval
11/125
Outline
1 Phase Retrieval
    Classical Phase Retrieval
    Ptychographic Phase Retrieval
    PhaseLift
    PhaseCut
    Wirtinger Flows
    Gauss-Newton Method
2 Cryo-Electron Microscopy
12/125
Classical Phase Retrieval
Feasibility problem
find x ∈ S ∩M or find x ∈ S+ ∩M
given Fourier magnitudes:

    M := { x(r) : |x̂(ω)| = b(ω) },  where x̂(ω) = F(x(r)), F: Fourier transform

given a support estimate:

    S := { x(r) : x(r) = 0 for r ∉ D }

or

    S_+ := { x(r) : x(r) ≥ 0 and x(r) = 0 if r ∉ D }
13/125
Error Reduction
Alternating projection:
xk+1 = PSPM(xk)
projection onto S:

    P_S(x)(r) = x(r) if r ∈ D, and 0 otherwise

projection onto M:

    P_M(x) = F*(y),  where y(ω) = b(ω) x̂(ω)/|x̂(ω)| if x̂(ω) ≠ 0, and b(ω) otherwise
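The alternating projection above fits in a few lines (a minimal 1D sketch, assuming numpy; the signal length, the support D and the iteration count are illustrative choices, not from the slides). The residual ‖|Fx_k| − b‖ is non-increasing along the error-reduction iteration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
support = np.arange(12)                  # known support D
x_true = np.zeros(n)
x_true[support] = rng.random(12)         # nonnegative signal supported on D
b = np.abs(np.fft.fft(x_true))           # Fourier magnitude data

def P_S(x):
    """Projection onto S: real, zero outside D."""
    y = np.zeros(len(x))
    y[support] = x[support].real
    return y

def P_M(x):
    """Projection onto M: keep the phase of Fx, impose magnitude b."""
    X = np.fft.fft(x)
    phase = np.ones_like(X)
    nz = np.abs(X) > 0
    phase[nz] = X[nz] / np.abs(X[nz])
    return np.fft.ifft(b * phase)

res = lambda x: np.linalg.norm(np.abs(np.fft.fft(x)) - b)

x = rng.standard_normal(n)               # random start
r0 = res(x)
for _ in range(200):                     # x_{k+1} = P_S(P_M(x_k))
    x = P_S(P_M(x))
print(res(x) < r0)                       # residual decreased
```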
14/125
Summary of projection algorithms
Basic input-output (BIO)

    x_{k+1} = (P_S P_M + I − P_M)(x_k)

Hybrid input-output (HIO)

    x_{k+1} = ((1+β) P_S P_M + I − P_S − β P_M)(x_k)

Hybrid projection reflection (HPR)

    x_{k+1} = ((1+β) P_{S+} P_M + I − P_{S+} − β P_M)(x_k)

Relaxed averaged alternating reflection (RAAR)

    x_{k+1} = (2β P_{S+} P_M + β I − β P_{S+} + (1−2β) P_M)(x_k)

Difference map (DF)

    x_{k+1} = (I + β(P_S((1−γ_2) P_M − γ_2 I) + P_M((1−γ_1) P_S − γ_1 I)))(x_k)
15/125
ADMM
Consider problem
find x and y, such that x = y, x ∈ X and y ∈ Y
X is either S or S+, and Y isM.Augmented Lagrangian function
    L(x, y, λ) := λ^T(x − y) + (1/2)‖x − y‖²
ADMM:
    x_{k+1} = argmin_{x∈X} L(x, y_k, λ_k),
    y_{k+1} = argmin_{y∈Y} L(x_{k+1}, y, λ_k),
    λ_{k+1} = λ_k + β(x_{k+1} − y_{k+1}).
16/125
ADMM
ADMM
xk+1 = PX (yk − λk),
yk+1 = PY(xk+1 + λk),
λk+1 = λk + β(xk+1 − yk+1),
ADMM is equivalent to HIO or HPR if P_X(x + y) = P_X(x) + P_X(y):

    x_{k+2} + λ_{k+1} = [(1+β) P_X P_Y + (I − P_X) − β P_Y](x_{k+1} + λ_k),

which matches the hybrid input-output (HIO) iteration

    x_{k+1} = ((1+β) P_S P_M + I − P_S − β P_M)(x_k)

if β = 1.
17/125
ADMM
ADMM: updating the Lagrange multiplier twice

    x_{k+1} := P_X(y_k − π_k),
    π_{k+1} := π_k + β(x_{k+1} − y_k) = −(I − β P_X)(y_k − π_k),
    y_{k+1} := P_Y(x_{k+1} + λ_k),
    λ_{k+1} := λ_k + ν(x_{k+1} − y_{k+1}) = (I − ν P_Y)(x_{k+1} + λ_k).

ADMM is equivalent to ER if β = ν = 1:

    x_{k+1} := P_X(y_k) and y_{k+1} := P_Y(x_{k+1}).

ADMM is equivalent to BIO if β = ν = 1:

    x_{k+1} + λ_k = (P_X P_Y + I − P_Y)(x_k + λ_{k−1})
18/125
Numerical comparison
The parameter β in HPR and RAAR was updated dynamically with β_0 = 0.95. For ADMM, β = 0.5.

[Figure: ADM, HPR and RAAR reconstructions at iterations 10, 20 and 200.]
19/125
Numerical comparison
The parameter β in HPR and RAAR was updated dynamically with β_0 = 0.95. For ADMM, β = 0.5.

[Figure: ADM, HPR and RAAR reconstructions at iterations 20, 40 and 200.]
20/125
Numerical comparison
The parameter β was fixed at 0.6, 0.8 and 0.95 for the first, second and third rows, respectively.
ADM HPR RAAR
21/125
Numerical comparison
The parameter β was fixed at 0.6, 0.8 and 0.95 for the first, second and third rows, respectively.
ADM HPR RAAR
22/125
Numerical Results
Convergence behavior:

    err_k = ‖P_X(P_Y(x_k)) − P_Y(x_k)‖_F / ‖m‖_F

[Figure: err_k over 200 iterations for ADM, HPR and RAAR on (a) Satellite and (b) Lena; the errors are on the order of 10^{-3}.]
23/125
Outline
1 Phase Retrieval
    Classical Phase Retrieval
    Ptychographic Phase Retrieval
    PhaseLift
    PhaseCut
    Wirtinger Flows
    Gauss-Newton Method
2 Cryo-Electron Microscopy
24/125
Ptychographic Phase Retrieval
Given |F(Qiψ)| for i = 1, . . . , k, can we recover ψ?
Ptychographic imaging, along with advances in detectors and computing, has resulted in X-ray microscopes with increased spatial resolution without the need for lenses
25/125
Ptychographic Phase Retrieval
Applications: nanoporous structure of shale, toxicity of silver nanoparticles, nano-scale properties of concrete, bone and high-strength nano-composite materials, chemical structure of 3D polymeric materials for energy applications, nanoscale structure of rocks related to carbon sequestration
26/125
Ptychographic Phase Retrieval
27/125
Ptychographic Phase Retrieval
given an object ψ, the illuminated portion is

    x_i = Q_i ψ

Q_i is an m × n illumination matrix: it contains at least one nonzero row, and each row of Q_i contains at most one nonzero element.
measurements:

    b_i = |F x_i| = |F Q_i ψ|, i = 1, 2, ..., k

goal: retrieve the phases of x_i from b_i; the x_i are not completely independent due to the overlap
28/125
Ptychographic Phase Retrieval
Define x ≡ (x_1^*, x_2^*, . . . , x_k^*)^*, b ≡ (b_1^T, b_2^T, . . . , b_k^T)^T, Q ≡ (Q_1^*, Q_2^*, . . . , Q_k^*)^*, and F ≡ Diag(F, F, . . . , F).
projections:

    P_Q = Q (Q^* Q)^{-1} Q^*  and  P_F(x) = F^* ( (F x / |F x|) · b )

then we can apply the projection algorithms
29/125
Reformulation
reconstruct ψ from the measurements

    b_i = |F Q_i ψ|, i = 1, 2, ..., k,

where Q_i is an m × n illumination matrix.
optimization problem:

    min_ψ ρ(ψ) := Σ_{i=1}^k (1/2) ‖ |F Q_i ψ| − b_i ‖_2²

Reformulation:

    min Σ_{i=1}^k (1/2) ‖ |z_i| − b_i ‖_2²,  s.t. z_i = F Q_i ψ, i = 1, . . . , k
30/125
ADMM
Augmented Lagrangian function

    L(z_i, ψ, y_i) = Σ_{i=1}^k ( (1/2)‖|z_i| − b_i‖_2² + y_i^*(F Q_i ψ − z_i) + (α/2)‖F Q_i ψ − z_i‖_2² )

z-subproblem: with s_i = y_i + α F Q_i ψ, i = 1, ..., k,

    (z_i^+)(l) = ((|(s_i)(l)| + (b_i)(l)) / ((1+α)|(s_i)(l)|)) (s_i)(l),  if (s_i)(l) ≠ 0 and (b_i)(l) > 0;
    (z_i^+)(l) = ± (b_i)(l)/(1+α),  if (s_i)(l) = 0 and (b_i)(l) > 0;
    (z_i^+)(l) = 0,  otherwise.

ψ-subproblem:

    ψ^+ = (1/α) ( Σ_{i=1}^k Q_i^* Q_i )^{-1} Σ_{i=1}^k Q_i^* F^* (α z_i^+ − y_i)
31/125
Numerical Results
Q_i is a 64 × 64 matrix; the probe is translated by 16 pixels.

[Figure: (a) the amplitude of the "goldball" image; (b) the amplitude of the probe.]
32/125
Numerical Results
[Figure: (a) the ADMM reconstruction; (b) the magnitude of the error produced by ADMM, on the order of 10^{-5}.]
33/125
Numerical Results
relative residual norm and relative error:

    res = √( Σ_{i=1}^k ‖|z_i^j| − b_i‖² ) / √( Σ_{i=1}^k ‖b_i‖² ),   err = ‖c ψ^j − ψ‖ / ‖ψ‖,

where c is a constant phase factor chosen to minimize ‖c ψ^j − ψ‖.
[Figure: (a) change of the relative residual norm (res) and (b) change of the relative error (err) over 100 iterations, for ER, SD, CG, NT, HIO and ADM.]
34/125
Outline
1 Phase Retrieval
    Classical Phase Retrieval
    Ptychographic Phase Retrieval
    PhaseLift
    PhaseCut
    Wirtinger Flows
    Gauss-Newton Method
2 Cryo-Electron Microscopy
35/125
Discrete mathematical model
Phaseless measurements about x_0 ∈ C^n:

    b_k = |⟨a_k, x_0⟩|², k ∈ {1, . . . , m}

Phase retrieval as a feasibility problem:

    find x
    s.t. |⟨a_k, x⟩|² = b_k, k = 1, . . . , m

Solving quadratic equations is NP-complete in general
36/125
NP-complete stone problem
Given weights w_i ∈ R, i = 1, . . . , n, is there an assignment x_i = ±1 such that

    Σ_{i=1}^n w_i x_i = 0?

Formulation as a quadratic system:

    |x_i|² = 1, i = 1, . . . , n,    | Σ_{i=1}^n w_i x_i |² = 0
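The stone problem can be sketched by brute force (the weights below are illustrative; the exponential search is exactly what NP-completeness predicts for the worst case):

```python
from itertools import product

def balanced_assignment(w):
    """Search for x_i = +/-1 with sum_i w_i x_i = 0 by enumerating all signs."""
    for signs in product((1, -1), repeat=len(w)):
        if sum(wi * xi for wi, xi in zip(w, signs)) == 0:
            return signs
    return None

print(balanced_assignment([4, 3, 2, 1]))   # (1, -1, -1, 1): 4 - 3 - 2 + 1 = 0
print(balanced_assignment([3, 1, 1]))      # odd total weight, no solution: None
```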
37/125
PhaseLift (C., Eldar, Strohmer, Voroninski, 2011)
Lifting: X = x x^*

    b_k = |⟨a_k, x_0⟩|² = a_k^* x x^* a_k = ⟨a_k a_k^*, X⟩

Turns quadratic measurements into linear measurements b = A(X) about x x^*.

Phase retrieval problem:

    find X
    s.t. A(X) = b, X ⪰ 0, rank(X) = 1

PhaseLift:

    find X
    s.t. A(X) = b, X ⪰ 0

Connections: relaxation of quadratically constrained QPs
Shor (87) [lower bounds on nonconvex quadratic optimization problems]
Goemans and Williamson (95) [MAX-CUT]
Chai, Moscoso, Papanicolaou (11)
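The lifting identity b_k = ⟨a_k a_k^*, x x^*⟩ is easy to check numerically (a small sketch assuming numpy; the dimensions are arbitrary), and the lifted matrix X = x x^* is visibly PSD with rank one:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 12
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))  # row k: a_k

b = np.abs(A.conj() @ x) ** 2            # b_k = |<a_k, x>|^2 with <a, x> = a^* x
X = np.outer(x, x.conj())                # lifted variable X = x x^*

# <a_k a_k^*, X> = a_k^* X a_k: the measurements are linear in X
lin = np.real(np.einsum('ki,ij,kj->k', A.conj(), X, A))
evals = np.linalg.eigvalsh(X)
print(np.allclose(lin, b))                       # True
print(int(np.sum(evals > 1e-8 * evals.max())))   # rank(X) = 1
```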
38/125
Exact generalized phase retrieval via SDP
Phase retrieval problem:

    find x
    s.t. b_k = |⟨a_k, x⟩|²

PhaseLift:

    min tr(X)
    s.t. A(X) = b, X ⪰ 0

Theorem (C. and Li ('12); C., Strohmer and Voroninski ('11)). Assume
    a_k independently and uniformly sampled on the unit sphere,
    m ≳ n.
Then with prob. 1 − O(e^{−γm}), the only feasible point is x x^*:

    {X : A(X) = b, and X ⪰ 0} = {x x^*}
39/125
Extensions to physical setups
40/125
Coded diffraction
Collect diffraction patterns of modulated samples
    |F(w[t] x[t])|²,  w ∈ W

Makes the problem well-posed (for some choices of W)
41/125
Exact recovery
Figure: Recovery from 6 random binary masks
42/125
Numerical results: noiseless 2D images
43/125
Outline
1 Phase Retrieval
    Classical Phase Retrieval
    Ptychographic Phase Retrieval
    PhaseLift
    PhaseCut
    Wirtinger Flows
    Gauss-Newton Method
2 Cryo-Electron Microscopy
44/125
PhaseCut
Given A ∈ C^{m×n} and b ∈ R^m:

    find x,  s.t. |Ax| = b.

(Candès et al. 2011b, Alexandre d'Aspremont 2013)

An equivalent model:

    min_{x∈C^n, y∈R^m} (1/2)‖Ax − y‖_2²
    s.t. |y| = b.
45/125
PhaseCut
Reformulation:

    min_{x∈C^n, u∈C^m} (1/2)‖Ax − diag(b) u‖_2²
    s.t. |u_i| = 1, i = 1, . . . , m.

Given u, the signal variable is x = A† diag(b) u. Then

    min_{u∈C^m} u^* M u
    s.t. |u_i| = 1, i = 1, . . . , m,

where M = diag(b)(I − A A†) diag(b) is positive semidefinite.
The MAXCUT-type problem:

    min_{U∈S^m} Tr(U M)
    s.t. U_ii = 1, i = 1, · · · , m, U ⪰ 0.
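Two properties of M are worth verifying numerically (a sketch assuming numpy; sizes are arbitrary): M is PSD because I − A A† is an orthogonal projector, and the true phase vector u = Ax/|Ax| attains objective value zero, since diag(b) u = Ax lies in the range of A:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 10, 4
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)

Ax = A @ x
b = np.abs(Ax)
u_true = Ax / b                          # true phases (b > 0 almost surely)

M = np.diag(b) @ (np.eye(m) - A @ np.linalg.pinv(A)) @ np.diag(b)

eig_min = np.linalg.eigvalsh((M + M.conj().T) / 2).min()
obj = abs(u_true.conj() @ M @ u_true)    # (I - A A^+) annihilates Ax
print(eig_min > -1e-8, obj < 1e-8)       # M is PSD; the truth attains objective 0
```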
46/125
Outline
1 Phase Retrieval
    Classical Phase Retrieval
    Ptychographic Phase Retrieval
    PhaseLift
    PhaseCut
    Wirtinger Flows
    Gauss-Newton Method
2 Cryo-Electron Microscopy
47/125
Discrete mathematical model
Solve the equations:

    y_r = |⟨a_r, x⟩|², r = 1, 2, ..., m.    (1)

Gaussian model:

    a_r ∈ C^n i.i.d. ∼ N(0, I/2) + i N(0, I/2).

Coded diffraction model:

    y_r = | Σ_{t=0}^{n−1} x[t] d_l(t) e^{−i2πkt/n} |²,  r = (l, k), 0 ≤ k ≤ n−1, 1 ≤ l ≤ L.
48/125
Phase retrieval by non-convex optimization
Nonlinear least squares problem:

    min_{z∈C^n} f(z) = (1/(4m)) Σ_{k=1}^m (y_k − |⟨a_k, z⟩|²)²

Pro: operates over vectors and not matrices
Con: f is nonconvex, with many local minima
49/125
Wirtinger flow: C., Li and Soltanolkotabi (’14)
Strategies:
Start from a sufficiently accurate initialization
Make use of the Wirtinger derivative

    f(z) = (1/(4m)) Σ_{k=1}^m (y_k − |⟨a_k, z⟩|²)²

    ∇f(z) = (1/m) Σ_{k=1}^m (|⟨a_k, z⟩|² − y_k)(a_k a_k^*) z

Careful iterations to avoid local minima
50/125
Algorithm: Gaussian model
Spectral initialization:
1 Input measurements {a_r} and observations {y_r} (r = 1, 2, ..., m).
2 Calculate z_0 as the leading eigenvector of Y = (1/m) Σ_{r=1}^m y_r a_r a_r^*.
3 Normalize z_0 such that ‖z_0‖² = n Σ_r y_r / Σ_r ‖a_r‖².

Iteration via Wirtinger derivatives: for τ = 0, 1, . . .

    z_{τ+1} = z_τ − (μ_{τ+1}/‖z_0‖²) ∇f(z_τ)
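The whole scheme fits in a short script (a minimal sketch of the Gaussian-model iteration, assuming numpy; the sizes n and m, the constant step μ and the iteration count are illustrative, and eigh stands in for the power method used in the later experiments):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 60
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)        # ground truth
A = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
y = np.abs(A.conj() @ x) ** 2           # y_r = |<a_r, x>|^2; row r of A is a_r

def dist(z, x):
    """Distance up to a global phase: min_phi ||z - e^{i phi} x||."""
    c = np.vdot(x, z)                   # x^* z
    return np.linalg.norm(z - x * c / abs(c))

# Spectral initialization: leading eigenvector of Y = (1/m) sum_r y_r a_r a_r^*
Y = (A.T * y) @ A.conj() / m
z = np.linalg.eigh(Y)[1][:, -1]
z = z * np.sqrt(n * y.sum() / np.sum(np.abs(A) ** 2))   # set ||z_0||^2

# Wirtinger-flow iterations: z <- z - (mu/||z_0||^2) grad f(z)
nz0, mu = np.linalg.norm(z) ** 2, 0.2
for _ in range(2000):
    c = A.conj() @ z                    # a_r^* z
    z = z - (mu / nz0) * (A.T @ ((np.abs(c) ** 2 - y) * c)) / m

rel_err = dist(z, x) / np.linalg.norm(x)
print(rel_err)                          # small: x recovered up to a global phase
```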
51/125
Convergence property: Gaussian model
distance (up to a global phase):

    dist(z, x) = min_{φ∈[0,2π]} ‖z − e^{iφ} x‖

Theorem (convergence for the Gaussian model; C., Li and Soltanolkotabi ('14)). Assume
    number of samples m ≳ n log n,
    step size μ ≤ c/n (c > 0).
Then with probability at least 1 − 10e^{−γn} − 8/n² − m e^{−1.5n}, we have dist(z_0, x) ≤ (1/8)‖x‖, and after τ iterations

    dist(z_τ, x) ≤ (1/8)(1 − μ/4)^{τ/2} ‖x‖.

Here γ is a positive constant.
52/125
Algorithm: Coded diffraction model
Initialization via resampled Wirtinger flow:
1 Input measurements {a_r} and observations {y_r} (r = 1, 2, ..., m).
2 Divide the measurements and observations equally into B+1 groups of size m′. The measurements and observations in group b are denoted a_r^{(b)} and y_r^{(b)} for b = 0, 1, ..., B.
3 Obtain u_0 by conducting the spectral initialization on group 0.
4 For b = 0 to B−1, perform the update

    u_{b+1} = u_b − (μ/‖u_0‖²) (1/m′) Σ_{r=1}^{m′} (|⟨a_r^{(b+1)}, u_b⟩|² − y_r^{(b+1)}) (a_r^{(b+1)} (a_r^{(b+1)})^*) u_b.

5 Set z_0 = u_B.

Same iterations as the Gaussian model: for τ = 0, 1, . . .

    z_{τ+1} = z_τ − (μ_{τ+1}/‖z_0‖²) ∇f(z_τ)
53/125
Convergence property: coded diffraction model
Theorem (convergence for the CD model; C., Li and Soltanolkotabi ('14)). Assume
    L ≳ (log n)⁴,
    step size μ ≤ c (c > 0).
Then with probability at least 1 − (2L+1)/n³ − 1/n², we have dist(z_0, x) ≤ ‖x‖/(8√n), and after τ iterations

    dist(z_τ, x) ≤ (1/(8√n))(1 − μ/3)^{τ/2} ‖x‖.
54/125
Numerical results: 1D signals
Consider the following two kinds of signals:

• Random low-pass signals:

    x[t] = Σ_{k=−(M/2−1)}^{M/2} (X_k + i Y_k) e^{2πi(k−1)(t−1)/n},

with M = n/8 and X_k, Y_k i.i.d. N(0, 1).

• Random Gaussian signals: x ∈ C^n is a random complex Gaussian vector with i.i.d. entries of the form

    x[t] = X + i Y,

with X and Y distributed as N(0, 1/2).
55/125
Success rate
• Set n = 128.
• Apply 50 iterations of the power method as initialization.
• Set the step length parameter μ_τ = min(1 − exp(−τ/τ_0), 0.2), where τ_0 ≈ 330.
• Stop after 2500 iterations, and declare a trial successful if the relative error of the reconstruction dist(ẑ, x)/‖x‖ falls below 10^{-5}.
• The empirical probability of success is an average over 100 trials.
56/125
Success rate
57/125
Numerical results: natural images
• View the RGB image as an n_1 × n_2 × 3 array, and run the WF algorithm separately on each color band.
• Apply 50 iterations of the power method as initialization.
• Set the step length parameter μ_τ = min(1 − exp(−τ/τ_0), 0.4), where τ_0 ≈ 330. Stop after 300 iterations.
• One FFT unit is the amount of time it takes to perform a single FFT on an image of the same size.
58/125
Numerical results: natural images
Figure: Naqsh-e Jahan Square, Esfahan. Image size is 189 × 768 pixels; timing is 61.4 sec or about 21200 FFT units. The relative error is 6.2 × 10^{-16}.
59/125
Numerical results: natural images
Figure: Stanford main quad. Image size is 320 × 1280 pixels; timing is 181.8120 sec or about 20700 FFT units. The relative error is 3.5 × 10^{-14}.
60/125
Numerical results: natural images
Figure: Milky Way galaxy. Image size is 1080 × 1920 pixels; timing is 1318.1 sec or 41900 FFT units. The relative error is 9.3 × 10^{-16}.
61/125
Recall the main theorems
Theorem (convergence for the Gaussian model; C., Li and Soltanolkotabi ('14)). Assume
    number of samples m ≳ n log n,
    step size μ ≤ c/n (c > 0).
Then with probability at least 1 − 10e^{−γn} − 8/n² − m e^{−1.5n}, we have dist(z_0, x) ≤ (1/8)‖x‖, and after τ iterations

    dist(z_τ, x) ≤ (1/8)(1 − μ/4)^{τ/2} ‖x‖.

Here γ is a positive constant.
62/125
Recall the main theorems
Theorem (convergence for the CD model; C., Li and Soltanolkotabi ('14)). Assume
    L ≳ (log n)⁴,
    step size μ ≤ c (c > 0).
Then with probability at least 1 − (2L+1)/n³ − 1/n², we have dist(z_0, x) ≤ ‖x‖/(8√n), and after τ iterations

    dist(z_τ, x) ≤ (1/(8√n))(1 − μ/3)^{τ/2} ‖x‖.
63/125
Regularity condition
Definition. We say that the function f satisfies the regularity condition RC(α, β, ε) if for all vectors z ∈ E(ε) we have

    Re(⟨∇f(z), z − x e^{iφ(z)}⟩) ≥ (1/α) dist²(z, x) + (1/β) ‖∇f(z)‖².

• φ(z) := argmin_{φ∈[0,2π]} ‖z − e^{iφ} x‖.
• dist(z, x) := ‖z − e^{iφ(z)} x‖.
• E(ε) := {z ∈ C^n : dist(z, x) ≤ ε}.
64/125
Proof of convergence
Lemma 1. Assume that f obeys RC(α, β, ε) for all z ∈ E(ε). Furthermore, suppose z_0 ∈ E(ε), and assume 0 < μ ≤ 2/β. Consider the update

    z_{τ+1} = z_τ − μ ∇f(z_τ).

Then for all τ we have z_τ ∈ E(ε) and

    dist²(z_τ, x) ≤ (1 − 2μ/α)^τ dist²(z_0, x).
65/125
Proof of convergence
Proof. We prove that if z ∈ E(ε), then for all 0 < μ ≤ 2/β,

    z⁺ = z − μ ∇f(z)

obeys

    dist²(z⁺, x) ≤ (1 − 2μ/α) dist²(z, x).

The lemma then follows by applying this inequality inductively.
66/125
Proof of convergence
Simple algebraic manipulations together with the regularity condition give

    ‖z⁺ − x e^{iφ(z)}‖² = ‖z − x e^{iφ(z)} − μ∇f(z)‖²
        = ‖z − x e^{iφ(z)}‖² − 2μ Re(⟨∇f(z), z − x e^{iφ(z)}⟩) + μ² ‖∇f(z)‖²
        ≤ ‖z − x e^{iφ(z)}‖² − 2μ ( (1/α)‖z − x e^{iφ(z)}‖² + (1/β)‖∇f(z)‖² ) + μ² ‖∇f(z)‖²
        = (1 − 2μ/α) ‖z − x e^{iφ(z)}‖² + μ(μ − 2/β) ‖∇f(z)‖²
        ≤ (1 − 2μ/α) ‖z − x e^{iφ(z)}‖²,

which concludes the proof.
67/125
Proof of regularity condition
We will make use of the following lemma:

Lemma 2. Suppose
1 x is a solution obeying ‖x‖ = 1, and is independent of the sampling vectors;
2 m ≥ c(δ) n log n in the Gaussian model or L ≥ c(δ) log³ n in the CD model.
Then

    ‖∇²f(x) − E∇²f(x)‖ ≤ δ

holds with probability at least 1 − 10e^{−γn} − 8/n² and 1 − (2L+1)/n³ for the Gaussian and CD models, respectively.

• The concentration of the Hessian matrix at the optimizers.
68/125
Proof of regularity condition
Based on the lemma above with δ = 0.01, we prove the regularity condition by establishing the local curvature condition and the local smoothness condition.

Local curvature condition: f satisfies LCC(α, ε, δ) if for all vectors z ∈ E(ε),

    Re(⟨∇f(z), z − x e^{iφ(z)}⟩) ≥ ((1/α) + (1−δ)/4) dist²(z, x) + (1/(10m)) Σ_{r=1}^m |a_r^*(z − x e^{iφ(z)})|⁴.

The LCC states that the function curves sufficiently upwards along most directions near the curve of global optimizers.
For the CD model, LCC holds with α ≥ 30 and ε = 1/(8√n);
for the Gaussian model, LCC holds with α ≥ 8 and ε = 1/8.
69/125
Proof of regularity condition
Local smoothness condition: f satisfies LSC(β, ε, δ) if for all vectors z ∈ E(ε) we have

    ‖∇f(z)‖² ≤ β ( ((1−δ)/4) dist²(z, x) + (1/(10m)) Σ_{r=1}^m |a_r^*(z − x e^{iφ(z)})|⁴ ).

The LSC states that the gradient of the function is well behaved near the curve of global optimizers. Using δ = 0.01, LSC holds with
    β ≥ 550 for ε = 1/(8√n),
    β ≥ 550 + 3n for ε = 1/8.
70/125
Proof of regularity condition
In conclusion, when δ = 0.01, for the Gaussian model the regularity condition holds with

    α ≥ 8, β ≥ 550 + 3n, and ε = 1/8,

while for the CD model it holds with

    α ≥ 30, β ≥ 550, and ε = 1/(8√n).

Therefore, for the Gaussian model, linear convergence holds if the initial point satisfies dist(z_0, x) ≤ 1/8; for the CD model, linear convergence holds if dist(z_0, x) ≤ 1/(8√n).
71/125
Proof of initialization
Recall the initialization algorithm:
1 Input measurements {a_r} and observations {y_r} (r = 1, 2, ..., m).
2 Calculate z_0 as the leading eigenvector of Y = (1/m) Σ_{r=1}^m y_r a_r a_r^*.
3 Normalize z_0 such that ‖z_0‖² = n Σ_r y_r / Σ_r ‖a_r‖².

Idea:

    E[(1/m) Σ_{r=1}^m y_r a_r a_r^*] = I + 2 x x^*,

and any leading eigenvector of I + 2xx^* is of the form λx. Therefore, by the strong law of large numbers, the initialization step recovers the direction of x perfectly as long as there are enough samples.
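This heuristic is easy to confirm empirically (a sketch assuming numpy; the sample size is an illustrative choice): with many samples, the leading eigenvector of Y aligns with the direction of x:

```python
import numpy as np

rng = np.random.default_rng(6)
n, m = 4, 20000
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
x /= np.linalg.norm(x)                   # ||x|| = 1, as in the analysis
A = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
y = np.abs(A.conj() @ x) ** 2            # y_r = |<a_r, x>|^2; row r of A is a_r

Y = (A.T * y) @ A.conj() / m             # (1/m) sum_r y_r a_r a_r^*
v = np.linalg.eigh(Y)[1][:, -1]          # leading eigenvector

corr = abs(np.vdot(v, x))                # |<v, x>| close to 1: direction recovered
print(corr)
```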
72/125
Proof of initialization
In the detailed proof, we will use the following lemma:

Lemma 3. In the setup of Lemma 2,

    ‖ I − (1/m) Σ_{r=1}^m a_r a_r^* ‖ ≤ δ

holds with probability at least 1 − 2e^{−γm} for the Gaussian model and 1 − 1/n² for the CD model. On this event,

    (1 − δ)‖h‖² ≤ (1/m) Σ_{r=1}^m |a_r^* h|² ≤ (1 + δ)‖h‖²

holds for all h ∈ C^n.
73/125
Proof of initialization
Detailed proof: Lemma 2 gives

    ‖Y − (x x^* + ‖x‖² I)‖ ≤ ε := 0.001.

Let z̃_0 be the unit eigenvector corresponding to the top eigenvalue λ_0 of Y; then

    |λ_0 − (|z̃_0^* x|² + 1)| = |z̃_0^*(Y − (x x^* + I)) z̃_0| ≤ ‖Y − (x x^* + I)‖ ≤ ε.

Therefore |z̃_0^* x|² ≥ λ_0 − 1 − ε. Meanwhile, since λ_0 is the top eigenvalue of Y and ‖x‖ = 1, we have

    λ_0 ≥ x^* Y x = x^*(Y − (I + x x^*)) x + 2 ≥ 2 − ε.

Combining the two inequalities,

    |z̃_0^* x|² ≥ 1 − 2ε  ⇒  dist²(z̃_0, x) ≤ 2 − 2√(1 − 2ε) ≤ 1/256  ⇒  dist(z̃_0, x) ≤ 1/16.
74/125
Proof of initialization
Now consider the normalization. Recall that z_0 = (√((1/m) Σ_{r=1}^m |a_r^* x|²)) z̃_0. By Lemma 3, with high probability we have

    |‖z_0‖ − 1| ≤ |‖z_0‖² − 1| = |(1/m) Σ_{r=1}^m |a_r^* x|² − 1| ≤ δ < 1/16.

Therefore,

    dist(z_0, x) ≤ ‖z_0 − z̃_0‖ + dist(z̃_0, x) ≤ |‖z_0‖ − 1| + dist(z̃_0, x) ≤ 1/8.
75/125
Proof of initialization: Resampled WF
Recall the initialization step via resampled Wirtinger flow:
1 Input measurements {a_r} and observations {y_r} (r = 1, 2, ..., m).
2 Divide the measurements and observations equally into B+1 groups of size m′. The measurements and observations in group b are denoted a_r^{(b)} and y_r^{(b)} for b = 0, 1, ..., B.
3 Obtain u_0 by conducting the spectral initialization on group 0.
4 For b = 0 to B−1, perform the update

    u_{b+1} = u_b − (μ/‖u_0‖²) (1/m′) Σ_{r=1}^{m′} (|⟨a_r^{(b+1)}, u_b⟩|² − y_r^{(b+1)}) (a_r^{(b+1)} (a_r^{(b+1)})^*) u_b.

5 Set z_0 = u_B.
76/125
Proof of initialization: Resampled WF
Outline of the proof:
1 For step 3, by the earlier analysis, u_0 obeys dist(u_0, x) ≤ 1/8.
2 For step 4, define f(z; b) = (1/(2m′)) Σ_{r=1}^{m′} (|⟨a_r^{(b+1)}, z⟩|² − y_r^{(b+1)})²; then the update can be written as

    u_{b+1} = u_b − (μ/‖u_0‖²) ∇f(u_b; b).

We prove the linear convergence of this series of updates by verifying the regularity condition

    Re(⟨∇f(z; b), z − x e^{iφ(z)}⟩) ≥ (1/α) dist²(z, x) + (1/β) ‖∇f(z; b)‖².
77/125
Outline
1 Phase Retrieval
    Classical Phase Retrieval
    Ptychographic Phase Retrieval
    PhaseLift
    PhaseCut
    Wirtinger Flows
    Gauss-Newton Method
2 Cryo-Electron Microscopy
78/125
Nonlinear least squares problem

    min_{z∈C^n} f(z) = (1/(4m)) Σ_{k=1}^m (y_k − |⟨a_k, z⟩|²)²

Using the Wirtinger derivative, stack the variable and its conjugate:

    z̃ := [z; z̄];

    g(z̃) := ∇_c f(z) = (1/m) Σ_{r=1}^m (|a_r^* z|² − y_r) [ (a_r a_r^*) z ; conj((a_r a_r^*) z) ];

    J(z̃) := (1/√m) [v_1, . . . , v_m],  where v_r = [ (a_r^* z) a_r ; conj(a_r^* z) ā_r ];

    Ψ(z̃) := J(z̃) J(z̃)^* = (1/m) Σ_{r=1}^m [ |a_r^* z|² a_r a_r^*    (a_r^* z)² a_r a_r^T ;
                                              (conj(a_r^* z))² ā_r a_r^*    |a_r^* z|² ā_r a_r^T ].
79/125
The Modified LM method for Phase Retrieval
Levenberg-Marquardt iteration:

    z_{k+1} = z_k − (Ψ(z_k) + μ_k I)^{-1} g(z_k)

Algorithm
1 Input: measurements {a_r}, observations {y_r}. Set ε ≥ 0.
2 Construct z_0 using the spectral initialization algorithms.
3 While ‖g(z_k)‖ ≥ ε do:
    compute s_k by (approximately) solving the LM equation
        Ψ^{μ_k}_{z_k} s_k = (Ψ(z_k) + μ_k I) s_k = −g(z_k)
    until ‖Ψ^{μ_k}_{z_k} s_k + g(z_k)‖ ≤ η_k ‖g(z_k)‖;
    set z_{k+1} = z_k + s_k and k := k + 1.
4 Output: z_k.
80/125
Convergence of the Gaussian Model
Theorem. If the measurements follow the Gaussian model, the LM equation is solved accurately (η_k = 0 for all k), and the following conditions hold:
• m ≥ c n log n, where c is sufficiently large;
• if f(z_k) ≥ ‖z_k‖²/(900n), let μ_k = 70000 n √(n f(z_k)); otherwise, let μ_k = √(f(z_k)).
Then, with probability at least 1 − 15e^{−γn} − 8/n² − m e^{−1.5n}, we have dist(z_0, x) ≤ (1/8)‖x‖, and

    dist(z_{k+1}, x) ≤ c_1 dist(z_k, x).

Meanwhile, once f(z_s) < ‖z_s‖²/(900n), for any k ≥ s we have

    dist(z_{k+1}, x) < c_2 dist(z_k, x)².
81/125
Convergence of the Gaussian Model
In the theorem above,

    c_1 := 1 − ‖x‖/(4μ_k),  if f(z_k) ≥ ‖z_k‖²/(900n);
    c_1 := (4.28 + 5.56√n)/(9.89√n),  otherwise,

and

    c_2 = (4.28 + 5.56√n)/‖x‖.
82/125
Key to proof
Lower bound on the second smallest eigenvalue of the GN matrix: for any y, z ∈ C^n with Im(y^* z) = 0,

    y^* Ψ(z) y ≥ ‖y‖² ‖z‖²

holds with high probability. Consequently,

    Im(y^* z) = 0  ⇒  ‖(Ψ^μ_z)^{-1} y‖ ≤ (2/(‖z‖² + μ)) ‖y‖.
83/125
Key to proof
Local error bound property:

    (1/4) dist(z, x)² ≤ f(z) ≤ 8.04 dist(z, x)² + 6.06 n dist(z, x)⁴

holds for any z satisfying dist(z, x) ≤ 1/8.

Regularity condition:

    μ(z) h^* (Ψ^μ_z)^{-1} g(z) ≥ (1/16)‖h‖² + (1/(64100 n ‖h‖)) ‖g(z)‖²

holds for any z = x + h, ‖h‖ ≤ 1/8, and f(z) ≥ ‖z‖²/(900n).
84/125
Convergence for the inexact LM method
Theorem (convergence of the inexact LM method for the Gaussian model). Assume
• m ≳ n log n;
• μ_k takes the same value as in the exact LM method for the Gaussian model;
• η_k ≤ (1−c_1)μ_k/(25.55 n ‖z_k‖) if f(z_k) ≥ ‖z_k‖²/(900n); otherwise η_k ≤ (4.33√n − 4.28) μ_k ‖g_k‖ / (372.54 n² ‖z_k‖³).
Then, with probability at least 1 − 15e^{−γn} − 8/n² − m e^{−1.5n}, we have dist(z_0, x) ≤ (1/8)‖x‖, and

    dist(z_{k+1}, x) ≤ ((1 + c_1)/2) dist(z_k, x), for all k = 0, 1, ...,

    dist(z_{k+1}, x) ≤ ((9.89√n + c_2 ‖x‖)/(2‖x‖)) dist(z_k, x)², whenever f(z_k) < ‖z_k‖²/(900n).

Here c_1 and c_2 take the same values as in the exact algorithm for the Gaussian model.
85/125
Solving the LM Equation: PCG
Solve

    (Ψ_k + μ_k I) u = g_k

by the preconditioned conjugate gradient method:

    M^{-1}(Ψ_k + μ_k I) u = M^{-1} g_k,  M = Φ_k + μ_k I,

where

    Φ(z̃) := [ z z^*  2 z z^T ; 2 z̄ z^*  z̄ z^T ] + ‖z‖² I_{2n}.

Small condition number.
Easy to invert: M = (μ_k + ‖z_k‖²) I + M_1, where M_1 is a rank-2 matrix.
86/125
Solving the LM Equation: PCG
• Small condition number.

Lemma. Consider solving the equation (Φ^μ_z)^{-1} Ψ^μ_z s = (Φ^μ_z)^{-1} g(z) by the CG method from s_0 := −(Φ^μ_z)^{-1} g(z). Let s^* be the solution of the system. Define V := {x̃ : x̃ = [x^*, x^T]^*, x ∈ C^n}. Then V is an invariant subspace of (Φ^μ_z)^{-1} Ψ^μ_z, and s_0, s^* ∈ V. Meanwhile, choosing μ_k = K n √(f(z)), the eigenvalues of (Φ^μ_z)^{-1} Ψ^μ_z on V satisfy

    1 − 57/(K√n) ≤ λ ≤ 1 + 57/(K√n).
87/125
Solving the LM Equation: PCG
• Easy to invert.

Calculate via the Sherman-Morrison-Woodbury formula:

    (Φ^μ_z)^{-1} = a I_{2n} + b [z; z̄][z^*, z^T] + c [z; −z̄][z^*, −z^T]

where

    a = 1/(‖z‖² + μ),  b = −3/(2(‖z‖² + μ)(4‖z‖² + μ)),  c = 1/(2(‖z‖² + μ)μ).
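The closed-form inverse can be verified numerically (a sketch assuming numpy; z and μ are arbitrary), since [z; z̄] and [z; −z̄] are eigenvectors of Φ with eigenvalues 4‖z‖² + μ and μ:

```python
import numpy as np

rng = np.random.default_rng(9)
n, mu = 3, 0.7
z = rng.standard_normal(n) + 1j * rng.standard_normal(n)
zb, nz2 = z.conj(), np.linalg.norm(z) ** 2

# Phi^mu_z = [[z z*, 2 z z^T], [2 zbar z*, zbar z^T]] + (||z||^2 + mu) I_{2n}
I2n = np.eye(2 * n)
Phi_mu = np.block([[np.outer(z, zb), 2 * np.outer(z, z)],
                   [2 * np.outer(zb, zb), np.outer(zb, z)]]) + (nz2 + mu) * I2n

a = 1 / (nz2 + mu)
b = -3 / (2 * (nz2 + mu) * (4 * nz2 + mu))
c = 1 / (2 * (nz2 + mu) * mu)
u = np.concatenate([z, zb])              # [z; zbar]
w = np.concatenate([z, -zb])             # [z; -zbar]
inv = a * I2n + b * np.outer(u, u.conj()) + c * np.outer(w, w.conj())

print(np.allclose(inv @ Phi_mu, I2n))    # the closed-form inverse checks out
```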
88/125
Outline
1 Phase Retrieval
    Classical Phase Retrieval
    Ptychographic Phase Retrieval
    PhaseLift
    PhaseCut
    Wirtinger Flows
    Gauss-Newton Method
2 Cryo-Electron Microscopy
89/125
Single Particle Cryo-Electron Microscopy
Drawing of the imaging process:
90/125
Single Particle Cryo-Electron Microscopy
Projection images: P_i(x, y) = ∫_{−∞}^{∞} φ(x R_i^1 + y R_i^2 + z R_i^3) dz + "noise".

φ : R³ → R is the electric potential of the molecule.
Cryo-EM problem: find φ and R_1, . . . , R_n given P_1, . . . , P_n.
91/125
A Toy Example
92/125
E. coli 50S ribosomal subunit: sample images
Fred Sigworth, Yale Medical School
93/125
Algorithmic Pipeline
Particle picking: manual, automatic or experimental image segmentation.
Class averaging: classify images with similar viewing directions, register and average to improve their signal-to-noise ratio (SNR). S, Zhao, Shkolnisky, Hadani, SIIMS, 2011.
Orientation estimation: S, Shkolnisky, SIIMS, 2011.
Three-dimensional reconstruction: a 3D volume is generated by a tomographic inversion algorithm.
Iterative refinement.
94/125
What mathematics do we use to solve the problem?
Tomography
Convex optimization and semidefinite programming
Random matrix theory (in several places)
Representation theory of SO(3) (if viewing directions are uniformly distributed)
Spectral graph theory, (vector) diffusion maps
Fast randomized algorithms
. . .
95/125
Orientation Estimation: Fourier projection-slice
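The Fourier projection-slice theorem underlies this step: the Fourier transform of a projection is a central slice of the Fourier transform of the volume. Its 2D-to-1D analogue can be verified in a few lines (a sketch assuming numpy; the image is an arbitrary random array):

```python
import numpy as np

rng = np.random.default_rng(4)
img = rng.random((8, 8))

proj = img.sum(axis=0)                   # line integrals along one axis
central_slice = np.fft.fft2(img)[0, :]   # k_y = 0 slice of the 2D transform
print(np.allclose(np.fft.fft(proj), central_slice))   # True
```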
96/125
Angular Reconstruction
Van Heel 1987, Vainshtein and Goncharov 1986
97/125
Experiments with simulated noisy projections
Each projection is 129 × 129 pixels.

    SNR = Var(Signal)/Var(Noise)
98/125
Detection of Common-lines between images
common line between two images P_i and P_j:
radial resolution n_r, angular resolution n_θ
Fourier transformed images: (ℓ_0^k, ℓ_1^k, . . . , ℓ_{n_θ−1}^k)
compare ℓ_0^i, . . . , ℓ_{n_θ−1}^i and ℓ_0^j, . . . , ℓ_{n_θ−1}^j
maximum normalized cross-correlation:

    (m_{i,j}, m_{j,i}) = argmax_{0≤m_1<n_θ/2, 0≤m_2<n_θ} ⟨ℓ_{m_1}^i, ℓ_{m_2}^j⟩ / (‖ℓ_{m_1}^i‖ ‖ℓ_{m_2}^j‖),  for all i ≠ j
99/125
Least Squares Approach
The directions of the detected common lines between P_i and P_j:

    c⃗_ij = (c_ij^1, c_ij^2) = (cos(2π m_ij/n_θ), sin(2π m_ij/n_θ)),
    c⃗_ji = (c_ji^1, c_ji^2) = (cos(2π m_ji/n_θ), sin(2π m_ji/n_θ)).

R_i ∈ SO(3), i = 1, · · · , K: the orientations of the K images.
Fourier projection-slice theorem:

    R_i (c⃗_ij, 0)^T = R_j (c⃗_ji, 0)^T for 1 ≤ i < j ≤ K.

These are (K choose 2) linear equations for the 6K variables.
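The common-line identity can be checked on synthetic rotations (a sketch assuming numpy): the common line is the intersection of the two Fourier slices, i.e. the direction orthogonal to both third columns, and both sides of the identity reproduce it:

```python
import numpy as np

def random_rotation(rng):
    """Random element of SO(3) via QR of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.standard_normal((3, 3)))
    Q = Q * np.sign(np.diag(R))          # make the factorization unique
    if np.linalg.det(Q) < 0:
        Q[:, 2] *= -1                    # enforce det = +1
    return Q

rng = np.random.default_rng(5)
Ri, Rj = random_rotation(rng), random_rotation(rng)

u = np.cross(Ri[:, 2], Rj[:, 2])         # intersection of the two Fourier slices
u /= np.linalg.norm(u)
cij = Ri[:, :2].T @ u                    # common-line coordinates in image i
cji = Rj[:, :2].T @ u                    # common-line coordinates in image j

lhs = Ri @ np.append(cij, 0.0)           # R_i (c_ij, 0)^T
rhs = Rj @ np.append(cji, 0.0)           # R_j (c_ji, 0)^T
print(np.allclose(lhs, rhs), np.allclose(lhs, u))   # both sides equal u
```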
100/125
Least Squares Approach
Weighted LS approach:

    min_{R_1,...,R_K ∈ SO(3)} Σ_{i≠j} w_ij ‖R_i (c⃗_ij, 0)^T − R_j (c⃗_ji, 0)^T‖²

Since ‖R_i (c⃗_ij, 0)^T‖ = ‖R_j (c⃗_ji, 0)^T‖ = 1, we obtain

    max_{R_1,...,R_K ∈ SO(3)} Σ_{i≠j} w_ij ⟨R_i (c⃗_ij, 0)^T, R_j (c⃗_ji, 0)^T⟩

The solution may not be optimal due to the typically large proportion of outliers:

[Figure: histograms of the errors ‖R_i (c⃗_ij, 0)^T − R_j (c⃗_ji, 0)^T‖ for (a) SNR = 1/16, (b) SNR = 1/32, (c) SNR = 1/64.]
101/125
Least Unsquared Deviations
Sum of unsquared residuals:

    min_{R_1,...,R_K ∈ SO(3)} Σ_{i≠j} ‖R_i (c⃗_ij, 0)^T − R_j (c⃗_ji, 0)^T‖

or

    min_{R_1,...,R_K ∈ SO(3)} Σ_{i≠j} ‖(c⃗_ij, 0)^T − R_i^T R_j (c⃗_ji, 0)^T‖

Reweighted least unsquared problem:

    min_{R_1,...,R_K ∈ SO(3)} Σ_{i≠j} w_ij ‖R_i (c⃗_ij, 0)^T − R_j (c⃗_ji, 0)^T‖

Less sensitive to misidentifications of common lines (outliers).
102/125
Semidefinite Programming Relaxation (SDR)
Rotations:

    R_i R_i^T = I_3, det(R_i) = 1, for i = 1, . . . , K

The columns of the rotation matrix R_i:

    R_i = [ R_i^1  R_i^2  R_i^3 ], i = 1, . . . , K.

Define a 3 × 2K matrix R:

    R = [ R_1^1  R_1^2  · · ·  R_K^1  R_K^2 ]
103/125
Semidefinite Programming Relaxation (SDR)
Objective function:

    Σ_{i≠j} w_ij ⟨R_i (c⃗_ij, 0)^T, R_j (c⃗_ji, 0)^T⟩ = trace((W ∘ S) G)

Gram matrix:

    G = R^T R

G is a rank-3 positive semidefinite matrix whose 2 × 2 blocks are

    G_ij = [ (R_i^1)^T ; (R_i^2)^T ] [ R_j^1  R_j^2 ].
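The structure of G is easy to confirm numerically (a sketch assuming numpy; K is an illustrative choice): the diagonal blocks are I_2, the rank is 3, and G is PSD:

```python
import numpy as np

def random_rotation(rng):
    """Random element of SO(3) via QR of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.standard_normal((3, 3)))
    Q = Q * np.sign(np.diag(R))
    if np.linalg.det(Q) < 0:
        Q[:, 2] *= -1
    return Q

rng = np.random.default_rng(10)
K = 6
R = np.hstack([random_rotation(rng)[:, :2] for _ in range(K)])  # 3 x 2K
G = R.T @ R                                                     # Gram matrix

blocks_ok = all(np.allclose(G[2*i:2*i+2, 2*i:2*i+2], np.eye(2)) for i in range(K))
rank = np.linalg.matrix_rank(G)
psd_ok = bool(np.all(np.linalg.eigvalsh(G) > -1e-10))
print(blocks_ok, rank, psd_ok)           # True 3 True
```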
104/125
SDR for weighted LS
S = (S_ij)_{i,j=1,··· ,K} and W = (W_ij)_{i,j=1,··· ,K} are 2K × 2K matrices with 2 × 2 blocks:

    S_ij = c⃗_ij c⃗_ji^T,  and  W_ij = w_ij [ 1 1 ; 1 1 ].

orthogonality implies

    G_ii = I_2, i = 1, 2, · · · , K

SDR:

    max_{G∈R^{2K×2K}} trace((W ∘ S) G)
    s.t. G_ii = I_2, i = 1, 2, · · · , K, G ⪰ 0
105/125
SDR for LUD
Define a 3K × 3K matrix G = (G_ij)_{i,j=1,··· ,K} with G_ij = R_i^T R_j:

    min_{G⪰0} Σ_{i≠j} ‖(c⃗_ij, 0)^T − G_ij (c⃗_ji, 0)^T‖,  s.t. G_ii = I_3

If {R_i} is a solution, then {J R_i J} is also a solution, where

    J = diag(1, 1, −1).

Let G^J = (G_ij^J)_{i,j=1,··· ,K} with G_ij^J = (J R_i J)^T (J R_j J) = J R_i^T R_j J. Then (1/2)(G + G^J) is also a solution.
106/125
SDR for LUD
Using the fact that

    (1/2)(G_ij + G_ij^J) = [ G̃_ij  0 ; 0  0 ],  where G̃_ij is the top-left 2 × 2 block,

we obtain

    min_{G⪰0} Σ_{i≠j} ‖c⃗_ij^T − G_ij c⃗_ji^T‖,  s.t. G_ii = I_2.

Solved by an alternating direction augmented Lagrangian method.
107/125
Spectral Norm Constraint
presence of many "outliers" (i.e., a large proportion of misidentified common lines):

images whose viewing directions are parallel share many common lines, which results from the overlapping Fourier slices.

when the viewing directions of R_i and R_j are nearby, the fidelity term ‖R_i (c⃗_ij, 0)^T − R_j (c⃗_ji, 0)^T‖ can become small even when the common-line pair (c⃗_ij, c⃗_ji) is misidentified.

the Gram matrix G then has just two dominant eigenvalues, instead of three.
108/125
Spectral Norm Constraint
The dependency of the spectral norm of G (denoted as αK here)on the distribution of orientations of the images. Each red pointdenotes the viewing direction of a projection. The larger thespectral norm αK is, the more clustered the viewing directionsare.
109/125
Spectral Norm Constraint
add a constraint on the spectral norm of the Gram matrix G:

    G ⪯ αK I_{2K}  or  ‖G‖_2 ≤ αK

α ∈ [2/3, 1) controls the spread of the viewing directions.

If the true orientations are uniformly sampled from SO(3), then by the law of large numbers and the symmetry of the distribution of orientations, the spectral norm of the true Gram matrix G^true is approximately (2/3)K.

If the true viewing directions are highly clustered, then the spectral norm of the true Gram matrix G^true is close to K.
110/125
ADMM for relaxed weighted LS
Primal problem:

    min_{G⪰0} −⟨C, G⟩
    s.t. A(G) = b, ‖G‖_2 ≤ αK

where

    A(G) = ( G_ii^{11}, G_ii^{22}, (√2/2) G_ii^{12} + (√2/2) G_ii^{21} )_{i=1,2,...,K},
    b = ( b_i^1, b_i^2, b_i^3 )_{i=1,2,...,K},

with b_i^1 = b_i^2 = 1 and b_i^3 = 0 for all i.
111/125
ADMM for relaxed weighted LS
Dual problem:

    min_{y, X⪰0} −y^T b + αK ‖Z‖_*
    s.t. Z = C + X + A^*(y)

ADMM:

    y^{k+1} = −A(C + X^k − Z^k) − (1/μ)(A(G^k) − b),
    Z^{k+1} = U diag(z) U^T,  z = argmin_z (αK/μ)‖z‖_1 + (1/2)‖z − λ‖_2²,
    X^{k+1} = (C + X^k + A^*(y^{k+1}) + (1/μ) G^k)†,
    G^{k+1} = (1 − γ)G^k + γμ(X^{k+1} − H^k).
112/125
ADMM for relaxed LUD
Consider the LUD problem:

    min_{x_ij, G⪰0} Σ_{i<j} ‖x_ij‖  s.t. A(G) = b, x_ij = c⃗_ij^T − G_ij c⃗_ji^T, ‖G‖_2 ≤ αK

Dual problem:

    min_{θ_ij, y, X⪰0} −y^T b − Σ_{i<j} ⟨θ_ij, c⃗_ij^T⟩ + αK ‖Z‖_*
    s.t. Z = Q(θ) + X + A^*(y), and ‖θ_ij‖ ≤ 1.

Apply ADMM to the dual problem.
113/125
Iterative Reweighted Least Squares (IRLS)
SDR:

    min_{G∈R^{2K×2K}} F(G) = Σ_{i,j=1,...,K} √( 2 − 2 Σ_{p,q=1,2} G_ij^{pq} S_ij^{pq} )
    s.t. G_ii = I_2, i = 1, 2, · · · , K, G ⪰ 0,
         ‖G‖_2 ≤ αK (optional)

Smoothed version:

    min_{G∈R^{2K×2K}} F(G, ε) = Σ_{i,j=1,...,K} √( 2 − 2 Σ_{p,q=1,2} G_ij^{pq} S_ij^{pq} + ε² )
    s.t. G_ii = I_2, i = 1, 2, · · · , K, G ⪰ 0,
         ‖G‖_2 ≤ αK (optional)
114/125
Iterative Reweighted Least Squares (IRLS)
Main steps: compute G^{k+1} as the solution of

    min_{G⪰0} Σ_{i≠j} w_ij^k (2 − 2⟨G_ij, S_ij⟩ + ε²)  s.t. A(G) = b, (‖G‖_2 ≤ αK)

then update

    w_ij^k = 1/√( 2 − 2⟨G_ij^k, S_ij⟩ + ε² ), ∀ k > 0.

Convergence: the cost function sequence is monotonically non-increasing,

    F(G^{k+1}, ε) ≤ F(G^k, ε).

The sequence of iterates {G^k} of IRLS is bounded, and every cluster point of the sequence is a stationary point.
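The monotonicity mechanism is generic to IRLS. A toy robust-regression analogue (not the SDP above; a hedged sketch assuming numpy, with illustrative sizes and outlier fraction) shows that each weighted least-squares step never increases the smoothed cost:

```python
import numpy as np

rng = np.random.default_rng(7)
m, n, eps = 30, 3, 1e-2
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n) + np.where(rng.random(m) < 0.2, 5.0, 0.0)  # outliers

F = lambda x: np.sum(np.sqrt((A @ x - b) ** 2 + eps ** 2))  # smoothed unsquared cost

x = np.linalg.lstsq(A, b, rcond=None)[0]      # least-squares start
vals = [F(x)]
for _ in range(30):
    w = 1.0 / np.sqrt((A @ x - b) ** 2 + eps ** 2)   # IRLS weights
    Aw = A * w[:, None]                               # weighted LS step:
    x = np.linalg.solve(A.T @ Aw, Aw.T @ b)           # min sum_i w_i (a_i^T x - b_i)^2
    vals.append(F(x))

print(all(v2 <= v1 + 1e-10 for v1, v2 in zip(vals, vals[1:])))   # monotone: True
```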
115/125
Numerical Results
3D Fourier shell correlation (FSC). FSC measures the normalized cross-correlation coefficient between two 3D volumes over corresponding spherical shells in Fourier space:

    FSC(i) = Σ_{j∈Shell_i} F(V_1)(j) · conj(F(V_2)(j)) / √( Σ_{j∈Shell_i} |F(V_1)(j)|² · Σ_{j∈Shell_i} |F(V_2)(j)|² )

The reconstruction from the images with estimated orientations used the Fourier-based 3D reconstruction package FIRM.
The reconstructed volumes are shown using the visualization system Chimera.
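A minimal FSC implementation (a sketch assuming numpy; the shell binning is a simplified choice) returns 1 on every shell when a volume is compared with itself:

```python
import numpy as np

def fsc(v1, v2, n_shells=8):
    """Fourier shell correlation between two cubic volumes."""
    F1, F2 = np.fft.fftn(v1), np.fft.fftn(v2)
    freq = np.fft.fftfreq(v1.shape[0])
    kx, ky, kz = np.meshgrid(freq, freq, freq, indexing='ij')
    r = np.sqrt(kx**2 + ky**2 + kz**2)
    shell = np.minimum((r / 0.5 * n_shells).astype(int), n_shells - 1)
    out = np.empty(n_shells)
    for s in range(n_shells):
        m = shell == s
        num = np.abs(np.sum(F1[m] * np.conj(F2[m])))
        den = np.sqrt(np.sum(np.abs(F1[m])**2) * np.sum(np.abs(F2[m])**2))
        out[s] = num / den
    return out

v = np.random.default_rng(8).random((16, 16, 16))
print(np.allclose(fsc(v, v), 1.0))       # a volume is perfectly correlated with itself
```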
116/125
Numerical Results: simulated images
Clean SNR = 1/16 SNR = 1/32 SNR = 1/64
The first column shows three clean images of size 129 × 129 pixels generated from a 50S ribosomal subunit volume with different orientations. The other three columns show the corresponding noisy images with SNR = 1/16, 1/32 and 1/64, respectively.
117/125
Numerical Results: simulated images
The clean volume (top), the reconstructed volumes and the MSEs. Nospectral norm constraint was used (i.e., α = N/A) for all algorithms. Theresult using the IRLS procedure without α is not available due to the highlyclustered estimated projection directions.
118/125
Numerical Results: simulated images
The reconstructed volumes from images with SNR = 1/64 and the MSEs ofthe estimated rotations using spectral norm constraints (i.e., α = 0.90, 0.75,and 0.67) for all algorithms. The result from the IRLS procedure withα = 0.67 for the spectral norm constraint is best.
119/125
Numerical Results: simulated images
[Figure: FSCs of the reconstructed volumes against the clean volume for LUD (ADMM), LS (SDP/ADMM) and LUD (IRLS), plotted against spatial frequency (1/Å): (1) SNR = 1/16, (2) SNR = 1/32, (3) SNR = 1/64, (4) SNR = 1/64 with α = 0.90, (5) SNR = 1/64 with α = 0.75, (6) SNR = 1/64 with α = 0.67.]
120/125
Numerical Results: real dataset
Columns: raw image, 1st neighbor, 2nd neighbor, average image.

Noise reduction by image averaging. Three raw ribosomal images are shown in the first column. Their closest two neighbours (i.e., raw images having similar orientations after alignment) are shown in the second and third columns. The average images shown in the last column were obtained by averaging over 10 neighbours of each raw image.
121/125
Numerical Results: real dataset
The ab-initio models estimated by merging two independentreconstructions, each obtained from 1000 class averages. The resolutions ofthe models are 17.2Å, 16.7Å, 16.7Å, 16.7Å and 16.1Å using the FSC 0.143resolution cutoff.
122/125
Numerical Results: real dataset
The refined models corresponding to the ab-initio models. The resolutionsof the models are all 11.1Å.
123/125
Numerical Results: real dataset
The average cost time using different algorithms on 500 and 1000images in the two experimental subsections. The notation α = N/Ameans no spectral norm constraint ‖G‖2 ≤ αK is used.
            α = N/A                                 2/3 ≤ α ≤ 1
    K       LS (SDP)  LUD (ADMM)  LUD (IRLS)        LS (ADMM)  LUD (ADMM)  LUD (IRLS)
    500     7s        266s        469s              78s        454s        3353s
    1000    31s       1864s       3913s             619s       1928s       20918s
124/125
Numerical Results: real dataset
[Figure: FSC versus spatial frequency (1/Å) for the initial model (16.7Å) and refinement iterations 1-3 (12.8Å, 11.1Å, 11.1Å), with the 0.143 FSC criterion marked.]
LUD, ADMM: Convergence of the refinement process
125/125
Numerical Results: real dataset
[Figure: FSC versus spatial frequency (1/Å) for the initial model (16.1Å) and refinement iterations 1-4 (13.0Å, 11.4Å, 11.1Å, 11.1Å), with the 0.143 FSC criterion marked.]
LUD, IRLS: Convergence of the refinement process