Unbiasing Procedures for Scale-invariant Multi-reference Alignment

Matthew Hirn, Anna Little

Abstract—This article discusses a generalization of the 1-dimensional multi-reference alignment problem. The goal is to recover a hidden signal from many noisy observations, where each noisy observation includes a random translation and random dilation of the hidden signal, as well as high additive noise. We propose a method that recovers the power spectrum of the hidden signal by applying a data-driven, nonlinear unbiasing procedure, and thus the hidden signal is obtained up to an unknown phase. An unbiased estimator of the power spectrum is defined, whose error depends on the sample size and noise levels, and we precisely quantify the convergence rate of the proposed estimator. The unbiasing procedure relies on knowledge of the dilation distribution, and we implement an optimization procedure to learn the dilation variance when this parameter is unknown. Our theoretical work is supported by extensive numerical experiments on a wide range of signals.

Index Terms—Multi-reference alignment, method of invariants, dilations, signal processing.

M. Hirn is with the Department of Computational Mathematics, Science and Engineering, the Department of Mathematics, and the Center for Quantum Computing, Science and Engineering, Michigan State University, East Lansing, MI 48824 USA, e-mail: [email protected].

A. Little is with the Department of Mathematics and the Utah Center For Data Science, University of Utah, Salt Lake City, UT 84112 USA, e-mail: [email protected].

I. INTRODUCTION

In classic multi-reference alignment (MRA), one attempts to recover a hidden signal $f: \mathbb{R} \to \mathbb{R}$ from many noisy observations, where each noisy observation has been randomly translated and corrupted by additive noise, as described in the following model.

Model 1 (Classic MRA). The classic MRA data model consists of $M$ independent observations of a compactly supported, real-valued signal $f \in L^2(\mathbb{R})$:

$$y_j(x) = f(x - t_j) + \varepsilon_j(x), \quad 1 \le j \le M, \quad (1)$$

where:
(i) $\mathrm{supp}(y_j) \subseteq [-\frac{1}{2}, \frac{1}{2}]$ for $1 \le j \le M$.
(ii) $\{t_j\}_{j=1}^M$ are independent samples of a random variable $t \in \mathbb{R}$.
(iii) $\{\varepsilon_j(x)\}_{j=1}^M$ are independent white noise processes on $[-\frac{1}{2}, \frac{1}{2}]$ with variance $\sigma^2$.

This toy model is a first step towards more realistic models arising in cryo-electron microscopy, and is relevant in many other applications including structural biology [1]–[6]; radar [7], [8]; single cell genomic sequencing [9]; image registration [10]–[12]; and signal processing [7]. Some methods solve Model 1 via synchronization [13]–[22], i.e. the translation factors $\{t_j\}_{j=1}^M$ are explicitly recovered and the signals aligned.

Fig. 1: Model illustration: a hidden signal is randomly translated (first column), randomly dilated (second column), and then corrupted by additive noise (second and third columns). Column 2 shows corruption with $\sigma^2 = \frac{1}{2}$ and Column 3 with $\sigma^2 = 2$; the purple curves illustrate the noise level considered in the simulations reported in Section V.

Synchronization approaches will fail in the high noise regime when the signal-to-noise ratio (SNR) is low, but the hidden signal can still be recovered by methods which avoid alignment; these include the method of moments [23]–[25], which contains the method of invariants [26]–[28] as a special case, and expectation-maximization type algorithms [29], [30].
The method of invariants leverages translation invariant Fourier features such as the power spectrum and bispectrum, which are especially useful for solving Model 1. Recall the Fourier transform of a signal $f \in L^1(\mathbb{R})$ is defined as

$$\hat{f}(\omega) = \int f(x)e^{-ix\omega}\, dx,$$

and its power spectrum is then defined by $(Pf)(\omega) = |\hat{f}(\omega)|^2$.

In this article we analyze the following generalization of classic MRA, where signals are also corrupted by a random scale change (i.e. dilation) in addition to random translation and additive noise. See Figure 1.

Model 2 (Noisy dilation MRA data model). The noisy dilation MRA data model consists of $M$ independent observations of a compactly supported, real-valued signal $f \in L^2(\mathbb{R})$:

$$y_j(x) = f\big((1-\tau_j)^{-1}(x - t_j)\big) + \varepsilon_j(x), \quad 1 \le j \le M. \quad (2)$$

In addition, we assume:


(i) $\mathrm{supp}(y_j) \subseteq [-\frac{1}{2}, \frac{1}{2}]$ for $1 \le j \le M$.
(ii) $\{t_j\}_{j=1}^M$ are independent samples of a random variable $t \in \mathbb{R}$.
(iii) $\{\tau_j\}_{j=1}^M$ are independent samples from a uniformly distributed random variable $\tau$ satisfying $\tau \in \mathbb{R}$, $\mathbb{E}(\tau) = 0$, $\mathrm{Var}(\tau) = \eta^2 \le 1/12$.
(iv) $\{\varepsilon_j(x)\}_{j=1}^M$ are independent white noise processes on $[-\frac{1}{2}, \frac{1}{2}]$ with variance $\sigma^2$.
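Before proceeding, a minimal simulation sketch of Model 2 may help fix ideas. This is our own illustrative discretization, not the authors' code: the grid size, random seed, and translation law are assumptions (the model leaves the translation distribution unspecified).

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_model2(f, M, eta, sigma, N=1.0, n=256):
    """Draw M observations y_j(x) = f((1 - tau_j)^{-1}(x - t_j)) + eps_j(x)
    on an n-point grid covering [-N/2, N/2]."""
    x = np.linspace(-N / 2, N / 2, n, endpoint=False)
    dx = x[1] - x[0]
    half = np.sqrt(3.0) * eta                      # tau ~ Unif[-sqrt(3)eta, sqrt(3)eta]
    tau = rng.uniform(-half, half, size=M)
    t = rng.uniform(-0.1 * N, 0.1 * N, size=M)     # illustrative translation law
    y = np.stack([f((x - t[j]) / (1.0 - tau[j])) for j in range(M)])
    # discrete white noise of variance sigma^2: per-sample std sigma / sqrt(dx)
    return x, y + rng.normal(0.0, sigma / np.sqrt(dx), size=(M, n))
```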

Model 2 is a first step towards studying more general diffeomorphisms $f(\tau(x))$, since it considers the case when $\tau(x)$ is an affine function. This is relevant to molecular imaging applications, since the flexible regions of macromolecular structures create diffeomorphisms of the underlying shape [31]. Dilations are also highly relevant in imaging applications, for example [12], [32]–[36]. Note we consider $L^\infty(\mathbb{R})$ normalized dilations in Model 2 since this is natural for images; however, the method is easily modified to accommodate other normalizations such as $L^1(\mathbb{R})$. The latter is useful in the statistical context, where one may observe samples from a family of distributions which are shifts and rescalings of an underlying distribution.

Solving Model 2 is highly challenging. Dilations cause instabilities in the high frequencies of a signal, where even a small dilation can lead to a large perturbation of the frequency values. Ideally, one would like to compute a representation which (1) is both translation and dilation invariant, (2) allows for the additive noise to be removed by averaging, and (3) is invertible with a numerically stable algorithm. However, there is a tension between achieving (1) and (3): the more invariants are built into the representation, the harder it is to invert the representation and obtain the underlying signal. In this article we propose the following compromise: we do not define a dilation invariant representation, but propose a method for dilation unbiasing which can be achieved with a numerically stable algorithm; we learn the power spectrum of the hidden signal instead of the hidden signal itself, thus reducing to a phase retrieval problem.

Model 2 was considered in [37] for $L^1(\mathbb{R})$ normalized dilations. The authors define wavelet-based, translation invariant features and unbias for dilations by utilizing the first few moments of the dilation distribution. The method has two main shortcomings. First, although it can reduce the bias due to dilations, it cannot remove it entirely, i.e. the method of [37] does not define an unbiased estimator of the true features. Second, inverting the wavelet-based features to recover the power spectrum of the hidden signal is numerically unstable, as it is driven by the condition number of a low rank matrix. This article proposes a method which overcomes both of these challenges: by working directly on the power spectrum, we avoid a numerically unstable inversion, and we develop a new unbiasing procedure which yields an unbiased estimator of the power spectrum of the hidden signal; we refer to this unbiasing procedure as inversion unbiasing. To achieve this we assume explicit knowledge of the dilation distribution instead of knowledge of the first few moments. To illustrate inversion unbiasing, it is helpful to define the following model, in which signals are randomly translated and dilated, but not corrupted by additive noise.

Model 3 (Dilation MRA data model). The dilation MRA data model consists of $M$ independent observations of a compactly supported, real-valued signal $f \in L^2(\mathbb{R})$:

$$y_j(x) = f\big((1-\tau_j)^{-1}(x - t_j)\big), \quad 1 \le j \le M. \quad (3)$$

In addition, we assume (i)–(iii) of Model 2.

Since Model 3 lacks additive noise, it can in fact be trivially solved by first estimating $\|f\|_2$, and then dilating any observed signal to have the right norm (for further details see [37]). We use Model 3 to build a theory to solve Model 2, but note it is not of independent interest.

Remark 1. The box size in Models 1–3 is arbitrary; more generally, the signals may be supported on any finite interval $[-\frac{N}{2}, \frac{N}{2}]$. All results still hold with $\sigma\sqrt{N}$ replacing $\sigma$.

The remainder of the article is organized as follows. Section II motivates inversion unbiasing by first considering the infinite sample size case. Section III presents our main results for solving Models 2 and 3 in the finite sample regime. Section IV discusses how inversion unbiasing is implemented via an optimization algorithm. Section V reports simulation results testing the performance of inversion unbiasing. Section VI concludes the article and summarizes future research directions.

A. Notation

Let $f_j(x) = f\big((1-\tau_j)^{-1}(x - t_j)\big)$ denote the $j$th signal, which is dilated by $1 - \tau_j$. We note that $\hat{f}_j(\omega) = (1-\tau_j)\hat{f}((1-\tau_j)\omega)$, so that

$$(Pf_j)(\omega) = (1-\tau_j)^2 (Pf)((1-\tau_j)\omega).$$

We let $g = Pf$, and for Models 2 and 3 we define

$$g_\eta(\omega) := \mathbb{E}_\tau[(Pf_j)(\omega)]. \quad (4)$$

Note for Model 2, it is easy to show that

$$g_\eta(\omega) = \mathbb{E}_{\tau,\varepsilon}\big[(Py_j)(\omega) - \sigma^2\big];$$

see for example Proposition 3.1 in [37]. We also define the following constants, which depend on $\eta$:

$$C_0 = \frac{1 - \sqrt{3}\eta}{1 + \sqrt{3}\eta}, \quad C_1 = 2\sqrt{3}\eta, \quad C_2 = \frac{1}{1 + \sqrt{3}\eta}, \quad (5)$$

and we let $(L_C g)(\omega) = C^3 g(C\omega)$ be a dilation operator. We use $a^*$ to denote the complex conjugate of $a$. Finally, $a = O(b)$ denotes $a \le Cb$ for an absolute constant $C$.
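Since $L_C$ reappears in all of the estimators below, we note that on a discrete frequency grid it can be approximated by interpolation. The following one-line helper is our own sketch (linear interpolation with zero extension is our choice, not the authors'), and is reused in the optimization sketches of Section IV:

```python
import numpy as np

def dilate(g, omega, C):
    """(L_C g)(omega) = C^3 g(C * omega), approximated on an increasing
    frequency grid `omega` by linear interpolation with zero extension."""
    return C ** 3 * np.interp(C * omega, omega, g, left=0.0, right=0.0)
```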

II. INFINITE SAMPLE ESTIMATE

To motivate our finite sample procedure, we first consider how to define an unbiased estimator in the infinite sample limit. We can recover $Pf$ from $g_\eta$, as stated in the following proposition.

Proposition 1. Assume $Pf \in C^0(\mathbb{R})$ and $g_\eta$ as defined in (4). Then for $\omega \ne 0$:

$$(Pf)(\omega) = (I - L_{C_0})^{-1} C_1 L_{C_2}\big(3 g_\eta(\omega) + \omega g_\eta'(\omega)\big),$$

where $C_0, C_1, C_2$ are as defined in (5).

Proof of Proposition 1. Since $\tau$ has a uniform distribution with variance $\eta^2$, the pdf of $\tau$ has the form $p_\tau = \frac{1}{2\sqrt{3}\eta}\mathbf{1}_{[-\sqrt{3}\eta,\,\sqrt{3}\eta]}$. Thus:

$$g_\eta(\omega) := \mathbb{E}_\tau[(1-\tau)^2 g((1-\tau)\omega)] = \int (1-\tau)^2 g((1-\tau)\omega)\, p_\tau(\tau)\, d\tau = \frac{1}{2\sqrt{3}\eta}\int_{-\sqrt{3}\eta}^{\sqrt{3}\eta} (1-\tau)^2 g((1-\tau)\omega)\, d\tau = \frac{1}{2\sqrt{3}\eta}\int_{(1-\sqrt{3}\eta)\omega}^{(1+\sqrt{3}\eta)\omega} \frac{\tilde\tau^2}{\omega^2}\, g(\tilde\tau)\, \frac{1}{\omega}\, d\tilde\tau,$$

where we have applied the change of variable $\tilde\tau = (1-\tau)\omega$, $d\tau = -\frac{1}{\omega}\,d\tilde\tau$. Letting $h(x) = x^2 g(x)$ and $H(x)$ an antiderivative of $h$, by the Fundamental Theorem of Calculus we thus obtain:

$$2\sqrt{3}\eta\,\omega^3 g_\eta(\omega) = \int_{(1-\sqrt{3}\eta)\omega}^{(1+\sqrt{3}\eta)\omega} \tilde\tau^2 g(\tilde\tau)\, d\tilde\tau = \int_{(1-\sqrt{3}\eta)\omega}^{(1+\sqrt{3}\eta)\omega} h(\tilde\tau)\, d\tilde\tau = H\big((1+\sqrt{3}\eta)\omega\big) - H\big((1-\sqrt{3}\eta)\omega\big).$$

Differentiating with respect to $\omega$ yields:

$$2\sqrt{3}\eta\big(3\omega^2 g_\eta(\omega) + \omega^3 g_\eta'(\omega)\big) = (1+\sqrt{3}\eta)\,h\big((1+\sqrt{3}\eta)\omega\big) - (1-\sqrt{3}\eta)\,h\big((1-\sqrt{3}\eta)\omega\big),$$

and dividing by $\omega^2$ gives:

$$2\sqrt{3}\eta\big(3g_\eta(\omega) + \omega g_\eta'(\omega)\big) = (1+\sqrt{3}\eta)^3 g\big((1+\sqrt{3}\eta)\omega\big) - (1-\sqrt{3}\eta)^3 g\big((1-\sqrt{3}\eta)\omega\big).$$

Applying the dilation operator $L_{C_2}$ then gives:

$$C_1 L_{C_2}\big(3g_\eta + \omega g_\eta'(\omega)\big) = g(\omega) - \Big(\frac{1-\sqrt{3}\eta}{1+\sqrt{3}\eta}\Big)^3 g\Big(\frac{1-\sqrt{3}\eta}{1+\sqrt{3}\eta}\,\omega\Big) = (I - L_{C_0})g.$$

Since $C_0 < 1$, the series $I + L_{C_0} + L_{C_0}^2 + L_{C_0}^3 + \cdots$ converges, and $I - L_{C_0}$ is invertible. We thus obtain

$$g = (I - L_{C_0})^{-1} C_1 L_{C_2}\big(3g_\eta + \omega g_\eta'(\omega)\big),$$

which proves the proposition.

III. FINITE SAMPLE ESTIMATES

Since we are only given a finite sample, we do not have access to $g_\eta$, but for large $M$, $g_\eta$ is well approximated by:

$$\tilde{g}_\eta(\omega) := \frac{1}{M}\sum_{j=1}^M (Pf_j)(\omega). \quad (6)$$

For dilation MRA, $\tilde{g}_\eta$ can be computed exactly, and we describe the resulting estimator in Section III-A. For noisy dilation MRA, $\tilde{g}_\eta$ cannot be computed exactly due to the additive noise, but an unbiased estimator can still be defined, as described in Section III-B.

A. Results for Dilation MRA

Motivated by Proposition 1, we define the following estimator for dilation MRA:

$$(\widetilde{Pf})(\omega) := (I - L_{C_0})^{-1} C_1 L_{C_2}\big(3\tilde{g}_\eta(\omega) + \omega \tilde{g}_\eta'(\omega)\big), \quad (7)$$

where $\tilde{g}_\eta$ is as defined in (6). We note that in practice one does not have a closed-form formula for applying $(I - L_{C_0})^{-1}$, but $(\widetilde{Pf})(\omega)$ can be obtained by solving the following convex optimization problem:

$$\arg\min_g \big\|(I - L_{C_0})g - C_1 L_{C_2}\big(3\tilde{g}_\eta(\omega) + \omega\tilde{g}_\eta'(\omega)\big)\big\|_2^2.$$

We describe this optimization procedure in detail in Section IV, but first we analyze the statistical properties of the estimator $(\widetilde{Pf})(\omega)$. The key quantity we bound is the mean squared error (MSE) $\mathbb{E}\big[\|Pf - \widetilde{Pf}\|_2^2\big]$. The following lemma establishes that when $\tilde{g}_\eta, \tilde{g}_\eta'$ are good approximations of $g_\eta, g_\eta'$, $\widetilde{Pf}$ is a good approximation of $Pf$, so we can reduce the problem to controlling $\tilde{g}_\eta, \tilde{g}_\eta'$.

Lemma 1. Assume Model 3, $Pf \in C^1(\mathbb{R})$, and the estimator $(\widetilde{Pf})(\omega)$ defined in (7). Then:

$$\|Pf - \widetilde{Pf}\|_2^2 \lesssim \|g_\eta - \tilde{g}_\eta\|_2^2 + \|\omega(g_\eta'(\omega) - \tilde{g}_\eta'(\omega))\|_2^2.$$

Proof. From Proposition 1 and (7),

$$Pf - \widetilde{Pf} = (I - L_{C_0})^{-1} C_1 L_{C_2}\big[3(g_\eta - \tilde{g}_\eta) + \omega(g_\eta'(\omega) - \tilde{g}_\eta'(\omega))\big].$$

Letting $\|\cdot\|$ denote the spectral norm, we thus obtain:

$$\|Pf - \widetilde{Pf}\|_2^2 \le C_1^2\,\|(I - L_{C_0})^{-1}\|^2\,\|L_{C_2}\|^2\,\|3(g_\eta - \tilde{g}_\eta) + \omega(g_\eta' - \tilde{g}_\eta')\|_2^2 \le 2C_1^2\,\|(I - L_{C_0})^{-1}\|^2\,\|L_{C_2}\|^2\,\big(9\|g_\eta - \tilde{g}_\eta\|_2^2 + \|\omega(g_\eta' - \tilde{g}_\eta')\|_2^2\big).$$

We first observe that $\|L_C^i\| = C^{5i/2}$, since

$$\|L_C^i g\|_2^2 = \int \big(C^{3i} g(C^i\omega)\big)^2\, d\omega = \int C^{6i} g(\tilde\omega)^2\,\frac{d\tilde\omega}{C^i} = C^{5i}\|g\|_2^2 \qquad (\tilde\omega = C^i\omega).$$

Thus

$$\|(I - L_{C_0})^{-1}\| = \Big\|\sum_{i=0}^\infty L_{C_0}^i\Big\| \le \sum_{i=0}^\infty C_0^{5i/2} = \frac{1}{1 - C_0^{5/2}} = O(\eta^{-1}), \qquad \|L_{C_2}\| = C_2^{5/2} = O(1), \qquad C_1 = O(\eta),$$

so that

$$2C_1^2\,\|(I - L_{C_0})^{-1}\|^2\,\|L_{C_2}\|^2 = O(1)\,O(\eta^2)\,O(\eta^{-2}) = O(1),$$

and we obtain

$$\|Pf - \widetilde{Pf}\|_2^2 \lesssim \|g_\eta - \tilde{g}_\eta\|_2^2 + \|\omega(g_\eta'(\omega) - \tilde{g}_\eta'(\omega))\|_2^2.$$

Lemma 1 thus establishes that to bound $\mathbb{E}\big[\|Pf - \widetilde{Pf}\|_2^2\big]$, it is sufficient to bound $\mathbb{E}\big[\|g_\eta - \tilde{g}_\eta\|_2^2\big]$ and $\mathbb{E}\big[\|\omega(g_\eta'(\omega) - \tilde{g}_\eta'(\omega))\|_2^2\big]$. Utilizing Lemma 1 yields the following theorem, which bounds the MSE of (7) for dilation MRA. To control higher order terms we define:

$$\overline{(Pf)^{(k)}}(\omega) := \max_{\xi \in [\omega/2,\, 2\omega]} \big|(Pf)^{(k)}(\xi)\big|.$$

In general, for well-behaved functions $\overline{g^{(k)}}$ and $g^{(k)}$ have the same decay rate; for example, if $g^{(k)}$ is monotonic, $\overline{g^{(k)}}(\omega) = g^{(k)}(2\omega)$.

Theorem 1. Assume Model 3, the estimator $(\widetilde{Pf})(\omega)$ defined in (7), $Pf \in C^3(\mathbb{R})$, and that $\omega^k(Pf)^{(k)}(\omega) \in L^2(\mathbb{R})$ for $k = 2, 3$. Then:

$$\mathbb{E}\big[\|Pf - \widetilde{Pf}\|_2^2\big] \lesssim \frac{\eta^2}{M}\Big(\|(Pf)(\omega)\|_2^2 + \|\omega(Pf)'(\omega)\|_2^2 + \|\omega^2(Pf)''(\omega)\|_2^2\Big) + r,$$

where $r$ is a higher-order term satisfying

$$r \lesssim \frac{\eta^4}{M}\Big(\|\omega^2\,\overline{(Pf)''}(\omega)\|_2^2 + \|\omega^3\,\overline{(Pf)'''}(\omega)\|_2^2\Big).$$

Proof. By Lemma 1, it is sufficient to bound $\mathbb{E}\big[\|g_\eta - \tilde{g}_\eta\|_2^2\big]$ and $\mathbb{E}\big[\|\omega(g_\eta'(\omega) - \tilde{g}_\eta'(\omega))\|_2^2\big]$. Since $\tilde{g}_\eta(\omega) = \frac{1}{M}\sum_{j=1}^M (Pf_j)(\omega)$, we have

$$(g_\eta(\omega) - \tilde{g}_\eta(\omega))^2 \le \Big(\frac{1}{M}\sum_{j=1}^M (Pf_j)(\omega) - g_\eta(\omega)\Big)^2.$$

Let $X_j = (Pf_j)(\omega) - g_\eta(\omega) = (Pf_j)(\omega) - \mathbb{E}[(Pf_j)(\omega)]$. Because $\frac{1}{M}\sum_{j=1}^M X_j$ is a centered random variable, we have

$$\mathbb{E}\Big[\Big(\frac{1}{M}\sum_{j=1}^M X_j\Big)^2\Big] = \mathrm{var}\Big(\frac{1}{M}\sum_{j=1}^M X_j\Big) = \frac{\mathrm{var}(X_j)}{M}. \quad (8)$$

Note that we can write:

$$X_j = (Pf_j)(\omega) - (Pf)(\omega) + (Pf)(\omega) - \mathbb{E}[(Pf_j)(\omega)],$$
$$X_j^2 \le 2\big((Pf_j)(\omega) - (Pf)(\omega)\big)^2 + 2\big((Pf)(\omega) - \mathbb{E}[(Pf_j)(\omega)]\big)^2.$$

Since it is easy to check that

$$\mathbb{E}\big[\big((Pf)(\omega) - \mathbb{E}[(Pf_j)(\omega)]\big)^2\big] \le \mathbb{E}\big[\big((Pf_j)(\omega) - (Pf)(\omega)\big)^2\big],$$

we obtain

$$\mathbb{E}[X_j^2] \le 4\,\mathbb{E}\big[\big((Pf_j)(\omega) - (Pf)(\omega)\big)^2\big].$$

Taylor expanding $(Pf)((1-\tau_j)\omega)$ gives:

$$(Pf)((1-\tau_j)\omega) = (Pf)(\omega) + (Pf)'(\omega)\cdot\omega\tau_j \pm \tfrac{1}{2}\,\overline{(Pf)''}(\omega)\cdot\omega^2\tau_j^2.$$

Multiplying by $(1-\tau_j)^2$ and rearranging:

$$(1-\tau_j)^2(Pf)((1-\tau_j)\omega) - (Pf)(\omega) = (-2\tau_j + \tau_j^2)(Pf)(\omega) + (1-\tau_j)^2(Pf)'(\omega)\cdot\omega\tau_j \pm \tfrac{(1-\tau_j)^2}{2}\,\overline{(Pf)''}(\omega)\cdot\omega^2\tau_j^2.$$

Utilizing $a + b - c \le d \le a + b + c \implies d^2 \lesssim a^2 + b^2 + c^2$, we square and take expectations to obtain

$$\mathbb{E}\big[\big((Pf_j)(\omega) - (Pf)(\omega)\big)^2\big] \lesssim [(Pf)(\omega)]^2\eta^2 + [\omega(Pf)'(\omega)]^2\eta^2 + \big[\omega^2\,\overline{(Pf)''}(\omega)\big]^2\eta^4.$$

Thus

$$\mathrm{var}[X_j] = \mathbb{E}[X_j^2] \lesssim \big([(Pf)(\omega)]^2 + [\omega(Pf)'(\omega)]^2\big)\eta^2 + \big[\omega^2\,\overline{(Pf)''}(\omega)\big]^2\eta^4.$$

Utilizing (8), we obtain

$$\mathbb{E}\big[(g_\eta(\omega) - \tilde{g}_\eta(\omega))^2\big] \lesssim \frac{\eta^2}{M}\Big([(Pf)(\omega)]^2 + [\omega(Pf)'(\omega)]^2 + \big[\omega^2\,\overline{(Pf)''}(\omega)\big]^2\eta^2\Big),$$

so that

$$\mathbb{E}\big[\|g_\eta - \tilde{g}_\eta\|_2^2\big] = \int \mathbb{E}\big[(g_\eta(\omega) - \tilde{g}_\eta(\omega))^2\big]\, d\omega \lesssim \frac{\eta^2}{M}\Big(\|(Pf)(\omega)\|_2^2 + \|\omega(Pf)'(\omega)\|_2^2 + \|\omega^2\,\overline{(Pf)''}(\omega)\|_2^2\,\eta^2\Big).$$

We now bound $\mathbb{E}\big[\|\omega(g_\eta'(\omega) - \tilde{g}_\eta'(\omega))\|_2^2\big]$. Letting $g_j = Pf_j$, we have

$$\omega\tilde{g}_\eta'(\omega) - \omega g_\eta'(\omega) = \frac{1}{M}\sum_{j=1}^M \omega g_j'(\omega) - \omega g_\eta'(\omega) = \frac{1}{M}\sum_{j=1}^M Z_j, \qquad Z_j = \omega g_j'(\omega) - \omega g_\eta'(\omega).$$

We note $\mathbb{E}[Z_j] = 0$, and a similar argument as the one applied to $X_j$ gives

$$Z_j^2 \le 2\big(\omega g_j'(\omega) - \omega g'(\omega)\big)^2 + 2\big(\omega g'(\omega) - \omega g_\eta'(\omega)\big)^2, \qquad \mathbb{E}[Z_j^2] \le 4\,\mathbb{E}\big[\big(\omega g_j'(\omega) - \omega g'(\omega)\big)^2\big].$$

Taylor expanding $(Pf)'((1-\tau_j)\omega)$ gives

$$(Pf)'((1-\tau_j)\omega) = (Pf)'(\omega) + (Pf)''(\omega)\cdot\omega\tau_j \pm \tfrac{1}{2}\,\overline{(Pf)'''}(\omega)\cdot\omega^2\tau_j^2.$$

Since $\omega g_j'(\omega) = \omega(Pf_j)'(\omega) = (1-\tau_j)^3\,\omega\,(Pf)'((1-\tau_j)\omega)$, we multiply by $(1-\tau_j)^3\omega$ to obtain:

$$\omega(Pf_j)'(\omega) = (1-\tau_j)^3\,\omega(Pf)'(\omega) + \tau_j(1-\tau_j)^3\,\omega^2(Pf)''(\omega) \pm \tfrac{1}{2}\tau_j^2(1-\tau_j)^3\,\omega^3\,\overline{(Pf)'''}(\omega).$$

Rearranging:

$$\omega(Pf_j)'(\omega) - \omega(Pf)'(\omega) = (-3\tau_j + 3\tau_j^2 - \tau_j^3)\,\omega(Pf)'(\omega) + \tau_j(1-\tau_j)^3\,\omega^2(Pf)''(\omega) \pm \tfrac{1}{2}\tau_j^2(1-\tau_j)^3\,\omega^3\,\overline{(Pf)'''}(\omega).$$

Squaring and taking expectations:

$$\mathbb{E}\big[\big(\omega g_j'(\omega) - \omega g'(\omega)\big)^2\big] \lesssim [\omega(Pf)'(\omega)]^2\eta^2 + \big[\omega^2(Pf)''(\omega)\big]^2\eta^2 + \big[\omega^3\,\overline{(Pf)'''}(\omega)\big]^2\eta^4.$$

Having bounded $\mathrm{var}[Z_j]$, an identical argument as the one used to control $\mathbb{E}\big[\|g_\eta - \tilde{g}_\eta\|_2^2\big]$ gives

$$\mathbb{E}\big[\|\omega g_\eta'(\omega) - \omega\tilde{g}_\eta'(\omega)\|_2^2\big] \lesssim \frac{\eta^2}{M}\Big(\|\omega(Pf)'(\omega)\|_2^2 + \|\omega^2(Pf)''(\omega)\|_2^2 + \|\omega^3\,\overline{(Pf)'''}(\omega)\|_2^2\,\eta^2\Big),$$

which proves the theorem.

Figure 2d illustrates how much is gained from inversion unbiasing for a specific high frequency signal; the mean power spectrum under Model 3 is greatly perturbed due to large dilations, but $\widetilde{Pf}$ is still an accurate approximation of $Pf$. Although in general a signal is not uniquely defined by its power spectrum, if $f$ is real and positive as in Figure 2, $f$ can be recovered from $Pf$. Figures 2a–2c illustrate how in this case inversion unbiasing yields a signal which accurately approximates the target.

B. Results for Noisy Dilation MRA

Solving noisy dilation MRA presents several additional challenges which are lacking in dilation MRA. First of all, the MSE can only be controlled on a finite frequency interval due to the additive noise. We thus restrict to a finite frequency interval $\Omega$, and consider the MSE of an estimator $\widetilde{Pf}$ over the finite interval, i.e. $\mathbb{E}\big[\|Pf - \widetilde{Pf}\|_{L^2(\Omega)}^2\big]$. We note the residual error from working on $\Omega$ decays to zero as $|\Omega| \to \infty$. In addition, in any numerical implementation one is always restricted to a finite frequency interval.

Another challenge is that one does not have direct access to $\tilde{g}_\eta$; rather, one only has access to

$$\frac{1}{M}\sum_{j=1}^M Py_j - \sigma^2 = \tilde{g}_\eta + \tilde{g}_\sigma, \qquad \text{where} \qquad \tilde{g}_\sigma := \frac{1}{M}\sum_{j=1}^M \hat{f}_j\hat{\varepsilon}_j^* + \hat{f}_j^*\hat{\varepsilon}_j + P\varepsilon_j - \sigma^2.$$

Although the compact support of the hidden signal guarantees the smoothness of $\tilde{g}_\eta$, $\tilde{g}_\sigma$ is not smooth due to the additive noise. To extend the unbiasing procedure of Section III-A to the additive noise context, it is thus necessary to smooth the noisy power spectra.

Fig. 2: Power spectrum estimation and signal recovery for the high frequency Gabor signal $f_3(x) = C_3 e^{-5x^2}\cos(32x)$ under Model 3 with $\eta = 12^{-1/2}$ and $M = 100{,}000$. The mean power spectrum $\tilde{g}_\eta$ is greatly perturbed from the target power spectrum $Pf$, but applying inversion unbiasing to $\tilde{g}_\eta$ yields an approximation $\widetilde{Pf}$ which is quite close to $Pf$ (see Figure 2d). Figure 2a shows the target signal, and Figures 2b, 2c show the target signal approximations obtained by inverting $\widetilde{Pf}$ and $\tilde{g}_\eta$, respectively. Panels: (a) Target signal; (b) Signal recovered via PS; (c) Signal recovered via mean PS; (d) Power spectra.

We thus compute $(\tilde{g}_\eta + \tilde{g}_\sigma) * \phi_L$, where $\phi_L(\omega) = (2\pi L^2)^{-1/2} e^{-\omega^2/(2L^2)}$ is a Gaussian filter with width $L$, and then define the following estimator:

$$(\widetilde{Pf})(\omega) := (I - L_{C_0})^{-1} C_1 L_{C_2}\big[3(\tilde{g}_\eta + \tilde{g}_\sigma) * \phi_L(\omega) + \omega\big((\tilde{g}_\eta + \tilde{g}_\sigma) * \phi_L\big)'(\omega)\big]. \quad (9)$$
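As a concrete illustration of the smoothing step, the sketch below (our own discretization; the FFT conventions and the noise debiasing scaled per Remark 1 are our assumptions, not the authors' code) forms the smoothed data term $3(\tilde{g}_\eta + \tilde{g}_\sigma)*\phi_L + \omega\big((\tilde{g}_\eta + \tilde{g}_\sigma)*\phi_L\big)'$ entering (9):

```python
import numpy as np

def data_term(Y, x, sigma, L):
    """Smoothed data term d(w) = 3 (u * phi_L)(w) + w (u * phi_L)'(w),
    where u = (1/M) sum_j P y_j - sigma^2 N (debiasing per Remark 1)."""
    n = x.size
    dx = x[1] - x[0]
    N = n * dx                                           # box size
    w = 2 * np.pi * np.fft.fftshift(np.fft.fftfreq(n, d=dx))
    Yhat = np.fft.fftshift(np.fft.fft(Y, axis=1), axes=1) * dx
    u = (np.abs(Yhat) ** 2).mean(axis=0) - sigma ** 2 * N
    phi = np.exp(-(w ** 2) / (2 * L ** 2))
    phi /= phi.sum()                                     # unit-mass discrete Gaussian
    s = np.convolve(u, phi, mode="same")                 # (u * phi_L) on the grid
    return w, 3 * s + w * np.gradient(s, w)
```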

As $M \to \infty$ and $L \to 0$, (9) is an unbiased estimator of $Pf$. To quantify how the error of the estimator depends on $L$, we need the following two lemmas.

Lemma 2. Let $h \in L^2(\mathbb{R})$ and assume $|\hat{h}(\omega)|$ decays like $|\omega|^{-\alpha}$ for some integer $\alpha \ge 1$. Then for $L$ small enough:

$$\|h - h * \phi_L\|_2^2 \lesssim \|h\|_2^2 L^4 + L^{4\wedge(2\alpha-1)}.$$

Proof. The proof of Lemma 2 is given in Appendix A.

Lemma 3. Let $xh(x) \in L^2(\mathbb{R})$ and assume $|\widehat{xh}(\omega)|$ decays at least like $|\omega|^{-\alpha}$ for some integer $\alpha \ge 1$. Then for $L$ small enough:

$$\|x(h - h * \phi_L)\|_2^2 \lesssim (L^3\|h\|_2^2) \wedge (L^4\|h'\|_2^2) + \|xh\|_2^2 L^4 + L^{4\wedge(2\alpha-1)}.$$

Proof. The proof of Lemma 3 is given in Appendix B.

We now state the main result of the article.

Theorem 2. Assume Model 2, the estimator $(\widetilde{Pf})(\omega)$ defined in (9), $Pf \in C^3(\mathbb{R})$, and that $\omega^k(Pf)^{(k)}(\omega) \in L^2(\mathbb{R})$ for $k = 2, 3$. Then

$$\mathbb{E}\big[\|Pf - \widetilde{Pf}\|_{L^2(\Omega)}^2\big] \lesssim C_{f,\Omega}\Big(\frac{\eta^2}{M} + L^4 + \frac{\sigma^2 \vee \sigma^4}{L^2 M}\Big).$$

Proof. From Proposition 1 and a proof similar to Lemma 1,

$$\|Pf - \widetilde{Pf}\|_{L^2(\Omega)}^2 \lesssim \big\|g_\eta + \omega g_\eta'(\omega) - (\tilde{g}_\eta + \tilde{g}_\sigma) * \phi_L - \omega\big((\tilde{g}_\eta + \tilde{g}_\sigma) * \phi_L\big)'(\omega)\big\|_{L^2(\Omega)}^2.$$

By the triangle inequality,

$$\|Pf - \widetilde{Pf}\|_{L^2(\Omega)}^2 \lesssim \|g_\eta + \omega g_\eta'(\omega) - \tilde{g}_\eta - \omega\tilde{g}_\eta'(\omega)\|_2^2 + \big\|\tilde{g}_\eta + \omega\tilde{g}_\eta'(\omega) - (\tilde{g}_\eta + \tilde{g}_\sigma) * \phi_L - \omega\big((\tilde{g}_\eta + \tilde{g}_\sigma) * \phi_L\big)'(\omega)\big\|_{L^2(\Omega)}^2 := (A) + (B).$$

From the proof of Theorem 1,

$$\mathbb{E}[(A)] \lesssim \frac{\eta^2}{M}\Big(\|(Pf)(\omega)\|_2^2 + \|\omega(Pf)'(\omega)\|_2^2 + \|\omega^2(Pf)''(\omega)\|_2^2\Big) + r,$$

where $r = C_f\eta^4/M$ for a constant $C_f$ depending on $f$. It remains to control (B). We have

$$(B) \lesssim \|\tilde{g}_\eta - \tilde{g}_\eta * \phi_L\|_2^2 + \|\omega\tilde{g}_\eta' - \omega(\tilde{g}_\eta * \phi_L)'\|_2^2 + \|\tilde{g}_\sigma * \phi_L\|_{L^2(\Omega)}^2 + \|\omega(\tilde{g}_\sigma * \phi_L)'\|_{L^2(\Omega)}^2 := (I) + (II) + (III) + (IV).$$

We control (I) with Lemma 2 and (II) with Lemma 3; we note in both cases $\alpha$ can be chosen arbitrarily large, since the signals have compact support. By Lemma 2,

$$(I) = \|\tilde{g}_\eta - \tilde{g}_\eta * \phi_L\|_2^2 \lesssim L^4\|\tilde{g}_\eta\|_2^2 \lesssim L^4\|Pf\|_2^2,$$

since $\|Pf_j\|_2 = (1-\tau_j)^{3/2}\|Pf\|_2 \le \big(\tfrac{3}{2}\big)^{3/2}\|Pf\|_2$. By Lemma 3,

$$(II) = \|\omega\tilde{g}_\eta' - \omega(\tilde{g}_\eta * \phi_L)'\|_2^2 \lesssim L^4\|\tilde{g}_\eta''\|_2^2 + L^4\|\omega\tilde{g}_\eta'(\omega)\|_2^2 + L^4 \lesssim L^4\big(\|(Pf)''\|_2^2 + \|\omega(Pf)'(\omega)\|_2^2 + 1\big).$$

For (III), note that by Young's convolution inequality

$$\|\tilde{g}_\sigma * \phi_L\|_{L^2(\Omega)}^2 \le \|\phi_L\|_1^2\cdot\|\tilde{g}_\sigma\|_{L^2(\Omega)}^2 = \|\tilde{g}_\sigma\|_{L^2(\Omega)}^2 \lesssim \Big\|\frac{1}{M}\sum_{j=1}^M \hat{f}_j\hat{\varepsilon}_j^*\Big\|_2^2 + \Big\|\frac{1}{M}\sum_{j=1}^M P\varepsilon_j - \sigma^2\Big\|_{L^2(\Omega)}^2.$$

We have

$$\mathbb{E}\Big[\Big\|\frac{1}{M}\sum_{j=1}^M \hat{f}_j\hat{\varepsilon}_j^*\Big\|_2^2\Big] = \int \mathbb{E}\Big[\Big|\frac{1}{M}\sum_{j=1}^M \hat{f}_j(\omega)\hat{\varepsilon}_j^*(\omega)\Big|^2\Big]\, d\omega \le \int \frac{1}{M^2}\sum_{j=1}^M |\hat{f}_j(\omega)|^2\sigma^2\, d\omega \lesssim \frac{\sigma^2}{M}\|f\|_2^2.$$

Since $\mathbb{E}[P\varepsilon_j] = \sigma^2$ and $\mathbb{E}[(P\varepsilon_j)^2] \le 3\sigma^4$ (see Lemma D.1 in [37]), one has

$$\mathbb{E}\Big[\Big(\frac{1}{M}\sum_j P\varepsilon_j - \sigma^2\Big)^2\Big] = \frac{\mathrm{var}(P\varepsilon_j)}{M} \le \frac{3\sigma^4}{M},$$

which implies

$$\mathbb{E}\Big[\Big\|\frac{1}{M}\sum_j P\varepsilon_j - \sigma^2\Big\|_{L^2(\Omega)}^2\Big] \lesssim \frac{|\Omega|\sigma^4}{M}.$$

Thus

$$\mathbb{E}[(III)] \lesssim \frac{\sigma^2}{M}\big(\|f\|_2^2 + |\Omega|\sigma^2\big).$$

For (IV), note that since $\|\phi_L'\|_1^2 \sim L^{-2}$,

$$\|\omega(\tilde{g}_\sigma * \phi_L)'\|_{L^2(\Omega)}^2 \le |\Omega|^2\,\|\tilde{g}_\sigma * \phi_L'\|_{L^2(\Omega)}^2 \le |\Omega|^2\,\|\phi_L'\|_1^2\,\|\tilde{g}_\sigma\|_{L^2(\Omega)}^2 \lesssim \frac{|\Omega|^2}{L^2}\,\|\tilde{g}_\sigma\|_{L^2(\Omega)}^2,$$

so that utilizing our previous bound for $\mathbb{E}\big[\|\tilde{g}_\sigma\|_{L^2(\Omega)}^2\big]$ one obtains

$$\mathbb{E}[(IV)] \lesssim \frac{|\Omega|^2\sigma^2}{L^2 M}\big(\|f\|_2^2 + |\Omega|\sigma^2\big).$$

Adding up the error terms:

$$\mathbb{E}\big[\|Pf - \widetilde{Pf}\|_{L^2(\Omega)}^2\big] \lesssim \frac{\eta^2}{M}\Big(\|(Pf)(\omega)\|_2^2 + \|\omega(Pf)'(\omega)\|_2^2 + \|\omega^2(Pf)''(\omega)\|_2^2\Big) + r + L^4\big(\|Pf\|_2^2 + \|(Pf)''\|_2^2 + \|\omega(Pf)'(\omega)\|_2^2 + 1\big) + \frac{|\Omega|^2\sigma^2}{L^2M}\big(\|f\|_2^2 + |\Omega|\sigma^2\big) \lesssim C_{f,\Omega}\Big(\frac{\eta^2}{M} + L^4 + \frac{\sigma^2\vee\sigma^4}{L^2M}\Big),$$

which proves the theorem.

To minimize the error upper bound in Theorem 2, we balance the last two terms, i.e. we choose $L$ such that $L^4 \sim \frac{\sigma^2\vee\sigma^4}{L^2M}$. In the high noise regime where $\sigma \ge 1$, this gives $L \sim \big(\frac{\sigma^4}{M}\big)^{1/6}$, which yields the following important corollary.

Corollary 1. Let the assumptions of Theorem 2 hold, and in addition let $\sigma \ge 1$ and $L = \big(\frac{\sigma^4}{M}\big)^{1/6}$. Then:

$$\mathbb{E}\big[\|Pf - \widetilde{Pf}\|_{L^2(\Omega)}^2\big] \lesssim C_{f,\Omega}\Big[\frac{\eta^2}{M} + \Big(\frac{\sigma^4}{M}\Big)^{2/3}\Big].$$
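For concreteness, we spell out the balancing arithmetic behind this choice of $L$: with $L = (\sigma^4/M)^{1/6}$,

$$L^4 = \Big(\frac{\sigma^4}{M}\Big)^{2/3} \qquad\text{and}\qquad \frac{\sigma^4}{L^2 M} = \frac{\sigma^4}{M}\Big(\frac{\sigma^4}{M}\Big)^{-1/3} = \Big(\frac{\sigma^4}{M}\Big)^{2/3},$$

so the smoothing and additive noise terms match, and each contributes $(\sigma^4/M)^{2/3}$ to the bound.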

Remark 2. The inversion unbiasing procedure can also be directly applied to the wavelet-based features $(Sy)(\lambda) = \|y * \psi_\lambda\|_2^2$, where $\psi_\lambda(x) = \sqrt{\lambda}\,\psi(\lambda x)$ is a wavelet with frequency $\lambda$, proposed in [37] for solving Model 2. Because these features are smooth by design, no additional smoothing is necessary, and when $\sigma \ge 1$ this will yield an estimator $\widetilde{Sf}$ with error

$$\mathbb{E}\big[\|Sf - \widetilde{Sf}\|_{L^2(\Omega)}^2\big] \lesssim C_{f,\Omega}\Big[\frac{\eta^2}{M} + \frac{\sigma^4}{M}\Big].$$

The additive noise convergence rate for the wavelet-based features is slightly better than the convergence rate for the power spectrum given in Corollary 1. A power spectrum estimator $\widetilde{Pf}$ can then be obtained from $\widetilde{Sf}$, since the wavelet-based features are defined by an invertible operator on the power spectrum. However, this inversion process is highly unstable numerically, as its accuracy is governed by the smallest eigenvalue of a low rank matrix. In practice, applying inversion unbiasing directly to the power spectrum yielded a lower error in our numerical experiments.

IV. OPTIMIZATION

To actually compute the estimator (9), one must apply the inverse operator $(I - L_{C_0})^{-1}$. A simple formula for this inversion is unavailable; however, it is straightforward to compute the estimators by solving a convex optimization problem. In the infinite sample limit, one has access to the perfect data term

$$d(\omega) = 3g_\eta(\omega) + \omega g_\eta'(\omega),$$

and Proposition 1 guarantees that $g = Pf$ can be recovered from $d$ by

$$g = \arg\min_{\tilde{g}} \big\|(I - L_{C_0})\tilde{g} - C_1 L_{C_2} d\big\|_2^2,$$

where the constants $C_i$ depend on $\eta$. In practice the variation parameter $\eta$ may be unknown, so the relevant loss function is:

$$\mathcal{L}(\tilde{g}, \tilde{\eta}) = \big\|(I - L_{C_0(\tilde{\eta})})\tilde{g} - C_1(\tilde{\eta})L_{C_2(\tilde{\eta})}d\big\|_2^2.$$
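As a concrete sketch of this least-squares recovery with known $\eta$, the following is our own discretization (reusing the `dilate` helper from Section I-A); the choice of a generic L-BFGS-B solver is ours, and works here since the problem is convex in $g$:

```python
import numpy as np
from scipy.optimize import minimize

def loss(g, omega, d, eta):
    """Discrete loss ||(I - L_{C0}) g - C1 L_{C2} d||_2^2 for fixed eta."""
    s3 = np.sqrt(3.0) * eta
    C0, C1, C2 = (1 - s3) / (1 + s3), 2 * s3, 1 / (1 + s3)
    resid = g - dilate(g, omega, C0) - C1 * dilate(d, omega, C2)
    return np.sum(resid ** 2)

# usage sketch: recover g = Pf on the grid `omega` from the data term `d`
# res = minimize(loss, x0=np.maximum(d, 0.0), args=(omega, d, eta),
#                method="L-BFGS-B")
# g_hat = res.x
```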

The following proposition guarantees that the infinite sample loss function $\mathcal{L}$ has a unique critical point, and thus that $g = Pf$ can be recovered by minimizing $\mathcal{L}$.

Proposition 2. Let $g \in C^2(\mathbb{R})$, $\eta > 0$ be the true power spectrum and dilation standard deviation, and assume $g(0) \ne 0$, $g''(0) \ne 0$. Then $(g, \eta)$ is the only critical point of $\mathcal{L}(\tilde{g}, \tilde{\eta})$ in $(C^2(\mathbb{R}), \mathbb{R}_+)$.

Proof. We first compute $\nabla_{\tilde{g}}\mathcal{L}(\tilde{g},\tilde{\eta})$ and $\nabla_{\tilde{\eta}}\mathcal{L}(\tilde{g},\tilde{\eta})$. To compute $\nabla_{\tilde{g}}\mathcal{L}(\tilde{g},\tilde{\eta})$, we first view $\tilde{\eta}$ as fixed and compute the Fréchet derivative of $\mathcal{L}(\tilde{g})$. Let $A = I - L_{C_0}$; throughout the proof, $A$ and the constants $C_i$ depend on $\tilde{\eta}$, but for brevity we do not explicitly denote this dependence. Note

$$\mathcal{L}(\tilde{g}) = \|A\tilde{g} - C_1 L_{C_2}d\|_2^2 = N(A\tilde{g}),$$

where $Nf = \|f - C_1 L_{C_2}d\|_2^2$. Thus by the chain rule, the functional derivative at $\tilde{g}$ applied to a test function $h$ is

$$(D\mathcal{L})(\tilde{g})h = (DN)(A\tilde{g}) \circ D(A\tilde{g})h = (DN)(A\tilde{g}) \circ Ah,$$

since $A$ is a linear operator. To compute $DN$, note that

$$\frac{|N(f+h) - Nf - 2\langle f - C_1 L_{C_2}d,\, h\rangle|}{\|h\|_2} = \frac{\|h\|_2^2}{\|h\|_2} \to 0$$

as $\|h\|_2 \to 0$, so $(DN)(f)h = 2\langle f - C_1 L_{C_2}d,\, h\rangle$. Thus

$$(D\mathcal{L})(\tilde{g})h = 2\langle A\tilde{g} - C_1L_{C_2}d,\, Ah\rangle = \langle 2A^*(A\tilde{g} - C_1L_{C_2}d),\, h\rangle \implies \nabla\mathcal{L}(\tilde{g}) = 2A^*(A\tilde{g} - C_1L_{C_2}d).$$

We thus have

$$\nabla_{\tilde{g}}\mathcal{L}(\tilde{g},\tilde{\eta}) = 2A^*(A\tilde{g} - C_1L_{C_2}d),$$
$$\nabla_{\tilde{\eta}}\mathcal{L}(\tilde{g},\tilde{\eta}) = \int 2\big(A\tilde{g}(\omega) - C_1L_{C_2}d(\omega)\big)\,\partial_{\tilde{\eta}}\big(A\tilde{g}(\omega) - C_1L_{C_2}d(\omega)\big)\, d\omega.$$

Since, as demonstrated in the previous section, $Ag = C_1L_{C_2}d$ when $\tilde{\eta} = \eta$, we have $\nabla_{\tilde{g}}\mathcal{L}(g,\eta) = \nabla_{\tilde{\eta}}\mathcal{L}(g,\eta) = 0$, and $(g,\eta)$ is a critical point of $\mathcal{L}$. We now show $(g,\eta)$ is the only critical point.

Assume $(\tilde{g},\tilde{\eta})$ is a critical point. Then $2A^*(A\tilde{g} - C_1L_{C_2}d) = 0$. Since $C_0 < 1$, $A = I - L_{C_0}$ is invertible, as was previously argued; thus its adjoint $A^*$ is also invertible, and $A\tilde{g} = C_1L_{C_2}d$ in $L^2$. Since $L_{C_2}$ is a dilation operator and thus invertible, $B_{\tilde{\eta}}\tilde{g} = d$ in $L^2$, where $B_{\tilde{\eta}} = C_1^{-1}L_{C_2}^{-1}A = C_1^{-1}L_{C_2}^{-1}(I - L_{C_0})$. Next we show that if $B_{\tilde{\eta}}\tilde{g} = B_\eta g$ in $L^2$, we must have $(\tilde{g},\tilde{\eta}) = (g,\eta)$. It is easy to check from our definition of $C_0, C_1, C_2$ that

$$(B_{\tilde{\eta}}\tilde{g})(\omega) = \frac{(1+\sqrt{3}\tilde{\eta})^3}{2\sqrt{3}\tilde{\eta}}\,\tilde{g}\big((1+\sqrt{3}\tilde{\eta})\omega\big) - \frac{(1-\sqrt{3}\tilde{\eta})^3}{2\sqrt{3}\tilde{\eta}}\,\tilde{g}\big((1-\sqrt{3}\tilde{\eta})\omega\big).$$

Note that

$$(B_{\tilde{\eta}}\tilde{g})(0) = \frac{1}{2\sqrt{3}\tilde{\eta}}\Big((1+\sqrt{3}\tilde{\eta})^3 - (1-\sqrt{3}\tilde{\eta})^3\Big)\tilde{g}(0) = \big(3 + 3\tilde{\eta}^2\big)\tilde{g}(0),$$

and similarly for $(B_\eta g)(0)$. Since the functions are equal in $L^2$ and continuous, we must have

$$\big(3 + 3\tilde{\eta}^2\big)\tilde{g}(0) = \big(3 + 3\eta^2\big)g(0).$$

In addition, $(B_{\tilde{\eta}}\tilde{g})''(0)$ satisfies

$$(B_{\tilde{\eta}}\tilde{g})''(0) = \frac{1}{2\sqrt{3}\tilde{\eta}}\Big((1+\sqrt{3}\tilde{\eta})^5 - (1-\sqrt{3}\tilde{\eta})^5\Big)\tilde{g}''(0) = \big(5 + 30\tilde{\eta}^2 + 9\tilde{\eta}^4\big)\tilde{g}''(0),$$

and similarly for $(B_\eta g)''(0)$. Again since the functions are equal in $L^2$ and continuously differentiable, we must have

$$\big(5 + 30\tilde{\eta}^2 + 9\tilde{\eta}^4\big)\tilde{g}''(0) = \big(5 + 30\eta^2 + 9\eta^4\big)g''(0).$$

So

$$\tilde{g}(0) = K_1 g(0), \qquad \tilde{g}''(0) = K_2 g''(0)$$

for constants $K_1, K_2 > 0$ depending on $\eta, \tilde{\eta}$. We conclude we must have $K_1 = K_2$. So

$$K_1 = K_2 \iff \frac{3 + 3\eta^2}{3 + 3\tilde{\eta}^2} = \frac{5 + 30\eta^2 + 9\eta^4}{5 + 30\tilde{\eta}^2 + 9\tilde{\eta}^4} \iff \big(3 + 3\eta^2\big)\big(5 + 30\tilde{\eta}^2 + 9\tilde{\eta}^4\big) = \big(3 + 3\tilde{\eta}^2\big)\big(5 + 30\eta^2 + 9\eta^4\big).$$

Since $\tilde{\eta} = \eta$ is the only real, positive solution of the above, we conclude $\tilde{\eta} = \eta$. Thus $B_\eta\tilde{g} = B_\eta g$. Since $B_\eta$ is invertible, we conclude that $\tilde{g} = g$, and the proposition is proved.

In practice one only has access to the finite sample data term and loss function:

$$\tilde{d}(\omega) := 3(\tilde{g}_\eta + \tilde{g}_\sigma) * \phi_L(\omega) + \omega\big[(\tilde{g}_\eta + \tilde{g}_\sigma) * \phi_L'\big](\omega),$$
$$\tilde{\mathcal{L}}(\tilde{g},\tilde{\eta}) := \big\|(I - L_{C_0(\tilde{\eta})})\tilde{g} - C_1(\tilde{\eta})L_{C_2(\tilde{\eta})}\tilde{d}\big\|_2^2,$$

and the estimator (9) is computed by minimizing $\tilde{\mathcal{L}}$. As $M \to \infty$, Proposition 2 guarantees the optimization procedure has a unique critical point and is thus well behaved. For finite $M$, however, the optimization can be delicate: since $\tilde{\mathcal{L}}(\tilde{g}, 0) = 0$ for any $\tilde{g}$, there is a large plateau defined by $\tilde{\eta} = 0$ where loss values are small even for $\tilde{g}$ very far from $Pf$. It thus becomes necessary to constrain $\tilde{\eta}$ to be bounded away from 0; Section V describes specific implementation details.

Remark 3. If $\eta$ is known, so that the optimization is just over $\tilde{g}$, the optimization is convex.

Remark 4. In practice we define $p = \sqrt{\tilde{g}}$, optimize over $p$ to obtain the optimal $p$, and then define $\tilde{g} = p^2$; such a procedure ensures $\tilde{g}$ is nonnegative without constraining $\tilde{g}$ in the optimization. Note that to implement the minimization of $\tilde{\mathcal{L}}(\tilde{g},\tilde{\eta})$, one needs to compute $A^*$ for the operator $A = I - L_{C_0}$. A straightforward calculation shows $A^*h(\omega) = h(\omega) - C_0^2\, h(\omega/C_0)$.

V. SIMULATION RESULTS

In this section we investigate the proposed inversion unbiasing procedure on the following collection of synthetic signals, which capture a variety of features:

$$f_1(x) = C_1 e^{-5x^2}\cos(8x),$$
$$f_2(x) = C_2 e^{-5x^2}\cos(16x),$$
$$f_3(x) = C_3 e^{-5x^2}\cos(32x),$$
$$\hat{f}_4(\omega) = C_4\,[\operatorname{sinc}(0.2(\omega - 32)) + \operatorname{sinc}(0.2(-\omega - 32))],$$
$$f_5(x) = C_5 e^{-0.04x^2}\cos(30x + 1.5x^2),$$
$$\hat{f}_6(\omega) = C_6\,[\mathbf{1}(\omega \in [-38,-32]) + \mathbf{1}(\omega \in [32,38])],$$
$$\hat{f}_7(\omega) = C_7\,[\operatorname{zigzag}(0.2(\omega + 40)) + \operatorname{zigzag}(0.2(-\omega + 40))]^{1/2},$$
$$f_8(x) = 0.$$

The hidden signals were defined on $[-\frac{N}{4}, \frac{N}{4}]$ and the corresponding noisy signals on $[-\frac{N}{2}, \frac{N}{2}]$. The signals were sampled at rate $1/2^\ell$, resolving frequencies in the interval $[-2^\ell\pi, 2^\ell\pi]$; $N = 25$ and $\ell = 5$ were used for all simulations. As indicated above, $f_4, f_6, f_7$ were sampled directly in the frequency domain, while the rest were sampled in the spatial domain. The normalization constants $C_i$ were chosen so that all signals would have the same SNR for a fixed additive noise level, specifically $(\mathrm{SNR})^{-1} = \sigma^2$, where

$$\mathrm{SNR} = \Big(\frac{1}{N}\int_{-N/2}^{N/2} f(x)^2\, dx\Big)\Big/\sigma^2.$$
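The normalization just described amounts to scaling each signal so that its average power over the box is one; a minimal sketch of this step (our own helper, with an assumed grid):

```python
import numpy as np

def normalize_snr(f0, N=25.0, n=1024):
    """Return f = C * f0 with (1/N) * integral of f^2 equal to 1,
    so that SNR = 1 / sigma^2 for additive noise of variance sigma^2."""
    x = np.linspace(-N / 2, N / 2, n, endpoint=False)
    power = np.trapz(f0(x) ** 2, x) / N
    return lambda xs: f0(xs) / np.sqrt(power)
```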

The Gabor signals $f_1$–$f_3$ are smooth with a fast decay in both space and frequency; $f_4$ is discontinuous in space, with a smooth but slowly decaying Fourier transform; $f_5$ is a linear chirp with a non-constant instantaneous frequency; $f_6$ is discontinuous in frequency; $f_7$ is continuous but not smooth in frequency. The zero signal was included to investigate the effect of the inversion unbiasing procedure when applied directly to additive noise, i.e. in the absence of any signal.

Fig. 3: Error decay with standard error bars for Model 2 (oracle moment estimation). All plots show relative $L^2$ error and have the same axis limits, except Figure 3h, which shows absolute error. Reported slopes were computed by linear regression on the right half of the plot, i.e. for $12 \le \log_2(M) \le 20$. Each panel compares PS (No Dilation UB) with PS (Inversion UB). Panels: (a) $f_1$ (slope $= -0.2498$); (b) $f_2$ (slope $= -0.2473$); (c) $f_3$ (slope $= -0.2464$); (d) $f_4$ (slope $= -0.2306$); (e) $f_5$ (slope $= -0.2184$); (f) $f_6$ (slope $= -0.1071$); (g) $f_7$ (slope $= -0.2400$); (h) $f_8$ (slope $= -0.2320$).

Fig. 4: Plots explaining the small discrepancy between theoretical and empirical convergence rates. (a) Smoothing decay rate: the right side of the dashed line shows the $L$ values corresponding to $12 \le \log_2(M) \le 20$, i.e. the upper range of values used in our simulations. In the simulation regime, the slope in the log-log plot is 1.65; however, for small $L$ (left side of the dashed line), the slope is 1.96, which closely matches the $L^2$ rate given in Lemma 3. (b) Young's inequality: the additive noise term exhibits a decay rate of $-0.25$ in the range of $M$ values used for our simulations, while the upper bound due to Young's inequality decays at the faster rate of $-0.33$.

We investigate the ability of inversion unbiasing to solve Model 2 in the challenging regime of both low SNR and large dilations. Specifically, we choose $\mathrm{SNR} = \frac{1}{2}$ and $\tau$ uniform on $[-\frac{1}{2},\frac{1}{2}]$ (thus $\sigma = \sqrt{2}$ and $\eta = 12^{-1/2} \approx 0.2887$). For comparison, the simulations in [37] were restricted to $\eta \le 0.12$.

We first assume oracle knowledge of the additive noise and dilation variances $\sigma^2, \eta^2$. We let $M$ increase exponentially from 16 to 1,048,576, and for each value of $M$ we run 10 simulations of Model 2 and compute $\widetilde{Pf}$ as given in (9). The width of the Gaussian filter $L$ is chosen as in Corollary 1, and the inversion operator is applied by solving a convex optimization problem as described in Section IV. For each simulation, the relative error of the resulting power spectrum estimator is computed as

$$\mathrm{Error} := \frac{\|Pf - \widetilde{Pf}\|_2}{\|Pf\|_2},$$

and the mean error is then computed across simulations. Figure 3 shows the decay of the mean error as the sample size $M$ increases. All signals exhibit a linear error decay in the log-log plots; as the error decay does not plateau, the simulations confirm that $\widetilde{Pf}$ is an unbiased estimator of $Pf$, as shown in Theorem 2 and Corollary 1.
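For reference, the oracle experiment just described can be composed from the earlier sketches roughly as follows (all helper names are ours, not the authors'; `f` is assumed to be an already-normalized signal function):

```python
# end-to-end oracle sketch, assuming sample_model2, data_term, loss,
# dilate, and minimize from the earlier snippets are in scope
M, eta, sigma = 2 ** 14, 12 ** -0.5, np.sqrt(2.0)
L = (sigma ** 4 / M) ** (1 / 6)                    # bandwidth from Corollary 1
x, Y = sample_model2(f, M, eta, sigma, N=25.0, n=1024)
w, d = data_term(Y, x, sigma, L)
Pf_true = np.abs(np.fft.fftshift(np.fft.fft(f(x))) * (x[1] - x[0])) ** 2
res = minimize(loss, x0=np.maximum(d, 0.0), args=(w, d, eta), method="L-BFGS-B")
rel_err = np.linalg.norm(Pf_true - res.x) / np.linalg.norm(Pf_true)
```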

More specifically, for signals with a smooth power spectrum ($f_1, \dots, f_5, f_8$), Corollary 1 predicts that the error should decay like $M^{-1/3}$, i.e. we would expect to observe a slope of $-1/3$ in the log-log plots. In practice the error decay is slightly slower, with a slope of about $-1/4$ for the smooth signals. There are two reasons for the small mismatch between the theory and simulations. First, Lemmas 2 and 3 are based on Taylor expansions about $L = 0$, and so the decay rates in terms of $L$ are only sharp for $L$ small enough; the decay rate is slightly worse in the range of $L$ values used in our simulations; see Figure 4a. In practice, when the continuous theory is implemented on a computer, one can never take $L$ smaller than the discrete frequency resolution. Second, the proof of Theorem 2 applies Young's convolution inequality to control the additive noise terms, but simulations indicate that the actual decay rate of the additive noise terms is smaller than this upper bound for the simulation range of $M$ values. See Figure 4b; as $M \to \infty$, the decay rates do converge.

Fig. 5: Error decay with standard error bars for Model 2 (empirical moment estimation). All plots show relative $L^2$ error and have the same axis limits, except Figure 5f, which shows absolute error. Reported slopes were computed by linear regression on the right half of the plot, i.e. for $12 \le \log_2(M) \le 20$. Each panel compares PS (No Dilation UB) with PS (Inversion UB). Panels: (a) $f_1$ (slope $= -0.2355$); (b) $f_2$ (slope $= -0.2292$); (c) $f_3$ (slope $= -0.2126$); (d) $f_4$ (slope $= -0.2712$); (e) $f_5$ (slope $= -0.1709$); (f) $f_8$ (slope $= -0.2856$).

For the non-smooth signals, recall that $f_7$ has a power spectrum which is continuous but not differentiable, while $f_6$ has a discontinuous power spectrum. The decay rate of $f_7$ matches that of the smooth signals, even though $Pf_7 \notin C^1(\mathbb{R})$, indicating that perhaps $Pf \in C^3(\mathbb{R})$ is not required to achieve the rate in Theorem 2 but is an artifact of the proof technique. We note the infinite sample result (Proposition 1) holds under the much milder assumption $Pf \in C^0(\mathbb{R})$. In practice, the decay rate seems to be driven by the $\alpha$ appearing in Lemma 2; for $f_6$, Lemma 2 would apply with $\alpha = 1$ to give an error decay like $\sqrt{L}$ and a predicted slope of $-1/12 \approx -0.083$; we observe $-0.1071$ in Figure 3f.

We next investigate the ability of inversion unbiasing to solve Model 2 without oracle knowledge of the variances $\sigma^2, \eta^2$. The additive noise level can be reliably estimated from the signal tails. Estimating $\eta$ is more complex, and we implement a joint optimization procedure to simultaneously learn $\eta$ and $Pf$. The optimization to learn $\eta$ must be constrained, since $\tilde{\eta} = 0$ minimizes the loss function (recall Proposition 2 only applies to $\tilde{\eta} > 0$); $\tilde{\eta}$ is thus constrained to lie in the interval $[0.05, 0.40]$, and we initialize $\tilde{\eta}$ on a coarse grid ranging from 0.10 to 0.35. For each initialization, the learned $\tilde{\eta}$ value is recorded; a set of candidate $\tilde{\eta}$ values is obtained by discarding learned values which are close to the boundary, and $\tilde{\eta}$ is then selected as the candidate value with the smallest loss. Results are shown in Figure 5 for the signals with smooth power spectra; the error decay is similar to the oracle case but more variable. Note $\eta$ cannot be reliably learned with this gradient descent procedure when the power spectrum is not smooth.
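A minimal sketch of this grid-initialized, box-constrained joint minimization follows (the optimizer, grid, and discard thresholds are our illustrative choices; `loss` is the helper sketched in Section IV, and $g = p^2$ implements Remark 4):

```python
import numpy as np
from scipy.optimize import minimize

def joint_loss(theta, omega, d):
    """Loss over (p, eta) with g = p**2, keeping the power spectrum nonnegative."""
    p, eta = theta[:-1], theta[-1]
    return loss(p ** 2, omega, d, eta)

def learn_eta(omega, d, grid=(0.10, 0.15, 0.20, 0.25, 0.30, 0.35)):
    """Return (loss, eta_hat, g_hat) with the smallest loss over grid inits."""
    bounds = [(None, None)] * omega.size + [(0.05, 0.40)]  # eta bounded away from 0
    candidates = []
    for eta0 in grid:
        x0 = np.append(np.sqrt(np.maximum(d, 0.0)), eta0)
        res = minimize(joint_loss, x0, args=(omega, d),
                       method="L-BFGS-B", bounds=bounds)
        eta_hat = res.x[-1]
        if 0.06 < eta_hat < 0.39:                          # discard boundary solutions
            candidates.append((res.fun, eta_hat, res.x[:-1] ** 2))
    return min(candidates, key=lambda c: c[0])
```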

VI. CONCLUSION

This article considers a generalization of MRA which includes random dilations in addition to random translations and additive noise. The proposed method has several desirable properties compared with previous work. The bias due to dilations is eliminated (not just reduced, as in [37]) as the sample size increases. In addition, the method is numerically stable, as the unbiasing procedure operates directly on the power spectrum, rather than on features derived from the power spectrum.

There are many compelling directions for future research. By extending the inversion unbiasing procedure to operate on the bispectrum instead of the power spectrum, full signal recovery should be possible at an additional computational cost. In addition, preliminary work suggests that inversion unbiasing can be extended to a broad class of dilation distributions as long as their underlying density functions are known. Thus innovative methods for robustly learning the dilation distribution are critical for these methods to become competitive for real world applications. Another promising direction is to design a representation which is both translation and dilation invariant, and where the effect of the additive noise can be removed by an averaging procedure. However, it remains to be seen whether such a representation exists which is also invertible, i.e. is the hidden signal uniquely defined up to the desired invariants? Once these foundational questions are answered, extensions to 2-dimensional signals are also of interest.

APPENDIX A
PROOF OF LEMMA 2

Proof. Note by assumption there exist constants $C > 0$, $\omega_0 \ge 1$ such that $|\hat{h}(\omega)| \le C|\omega|^{-\alpha}$ for $|\omega| \ge \omega_0$. Also note that $\hat{\phi}_L(\omega) = e^{-L^2\omega^2/2}$, so that $1 - \hat{\phi}_L(\omega) = \frac{L^2\omega^2}{2} + O(L^3)$ for small $L$. We have:

$$\|h - h*\phi_L\|_2^2 = (2\pi)^{-1}\|\hat{h}(1 - \hat{\phi}_L)\|_2^2 \le \frac{1}{2\pi}\int_{|\omega|<\omega_0} |\hat{h}(\omega)|^2\,|1 - \hat{\phi}_L(\omega)|^2\, d\omega + \frac{1}{2\pi}\int_{|\omega|\ge\omega_0} C^2|\omega|^{-2\alpha}\,|1 - \hat{\phi}_L(\omega)|^2\, d\omega =: (I) + (II).$$

Note:

$$(I) \le \int_{|\omega|<\omega_0}|\hat{h}(\omega)|^2\,|1 - \hat{\phi}_L(\omega)|^2\, d\omega \le 2\int_0^{\omega_0}|\hat{h}(\omega)|^2\Big(\frac{L^2\omega^2}{2} + O(L^3)\Big)^2 d\omega \le 2\Big(\frac{L^4\omega_0^4}{4} + O(L^5)\Big)\int_0^{\omega_0}|\hat{h}(\omega)|^2\, d\omega \le \frac{\omega_0^4}{2}\,\|h\|_2^2\, L^4 + O(L^5).$$

To control the second term, note

$$(II) \le 2C^2\int_1^\infty \omega^{-2\alpha}\Big(1 - e^{-\frac{L^2\omega^2}{2}}\Big)^2 d\omega = 2C^2\int_L^\infty \Big(\frac{L}{\omega}\Big)^{2\alpha}\Big(1 - e^{-\frac{\omega^2}{2}}\Big)^2\,\frac{d\omega}{L} = 2C^2 L^{2\alpha-1}\int_L^\infty \omega^{-2\alpha}\Big(1 - e^{-\frac{\omega^2}{2}}\Big)^2 d\omega.$$

Explicit evaluation of the upper bound with a computer algebra system gives:

$$\alpha = 1:\ C_1 L + O(L^4), \qquad \alpha = 2:\ C_2 L^3 + O(L^4), \qquad \alpha = 3:\ C_3 L^4 + O(L^5).$$

Also, since

$$\frac{d}{d\alpha}\int_1^\infty \omega^{-2\alpha}\Big(1 - e^{-\frac{L^2\omega^2}{2}}\Big)^2 d\omega = \int_1^\infty -2\ln(\omega)\,\omega^{-2\alpha}\Big(1 - e^{-\frac{L^2\omega^2}{2}}\Big)^2 d\omega < 0,$$

the upper bound is decreasing in $\alpha$, and we can conclude $(II) \lesssim L^{4\wedge(2\alpha-1)}$, and the lemma is proved.

APPENDIX B
PROOF OF LEMMA 3

Proof. First observe:

$$\|x(h - h*\phi_L)\|_2^2 = (2\pi)^{-1}\Big\|\frac{d}{d\omega}\big(\hat{h} - \hat{h}\hat{\phi}_L\big)\Big\|_2^2 = (2\pi)^{-1}\|\hat{h}' - \hat{h}'\hat{\phi}_L - \hat{h}\hat{\phi}_L'\|_2^2 \lesssim \|\hat{h}' - \hat{h}'\hat{\phi}_L\|_2^2 + \|\hat{h}\hat{\phi}_L'\|_2^2.$$

To bound the first term, we apply Lemma 2 to the function $xh$ to obtain

$$\|\hat{h}' - \hat{h}'\hat{\phi}_L\|_2^2 = 2\pi\|xh - (xh)*\phi_L\|_2^2 \lesssim \|xh\|_2^2 L^4 + L^{4\wedge(2\alpha-1)}.$$

To bound the second term, note $\hat{\phi}_L'(\omega) = -L^2\omega e^{-L^2\omega^2/2}$, and that $\|\omega^2 e^{-L^2\omega^2}\|_\infty = (eL)^{-1}$. Thus

$$\|\hat{h}\hat{\phi}_L'\|_2^2 = L^4\int |\hat{h}(\omega)|^2\,\omega^2 e^{-L^2\omega^2}\, d\omega \le L^3\|h\|_2^2.$$

Note we could get a higher power of $L$ via

$$\|\hat{h}\hat{\phi}_L'\|_2^2 \le L^4\int \omega^2|\hat{h}(\omega)|^2\, d\omega = L^4\|\omega\hat{h}\|_2^2 \lesssim L^4\|h'\|_2^2,$$

which proves the lemma.

ACKNOWLEDGMENT

This work was supported by the National Science Foundation [grant DMS-1912906 to A.L. and M.H.; grant DMS-1845856 to M.H.], the National Institutes of Health [grant NIGMS-R01GM135929 to M.H.], and the Department of Energy [grant DE-SC0021152 to M.H.].

REFERENCES

[1] D. L. Theobald and P. A. Steindel, "Optimal simultaneous superpositioning of multiple structures with missing data," Bioinformatics, vol. 28, no. 15, pp. 1972–1979, 2012.
[2] R. Diamond, "On the multiple simultaneous superposition of molecular structures by rigid body transformations," Protein Science, vol. 1, no. 10, pp. 1279–1287, 1992.
[3] S. H. Scheres, M. Valle, R. Nunez, C. O. Sorzano, R. Marabini, G. T. Herman, and J.-M. Carazo, "Maximum-likelihood multi-reference refinement for electron microscopy images," Journal of Molecular Biology, vol. 348, no. 1, pp. 139–149, 2005.
[4] B. M. Sadler and G. B. Giannakis, "Shift- and rotation-invariant object reconstruction using the bispectrum," JOSA A, vol. 9, no. 1, pp. 57–69, 1992.
[5] W. Park, C. R. Midgett, D. R. Madden, and G. S. Chirikjian, "A stochastic kinematic model of class averaging in single-particle electron microscopy," The International Journal of Robotics Research, vol. 30, no. 6, pp. 730–754, 2011.
[6] W. Park and G. S. Chirikjian, "An assembly automation approach to alignment of noncircular projections in electron microscopy," IEEE Transactions on Automation Science and Engineering, vol. 11, no. 3, pp. 668–679, 2014.
[7] J. P. Zwart, R. van der Heiden, S. Gelsema, and F. Groen, "Fast translation invariant classification of HRR range profiles in a zero phase representation," IEE Proceedings-Radar, Sonar and Navigation, vol. 150, no. 6, pp. 411–418, 2003.
[8] R. Gil-Pita, M. Rosa-Zurera, P. Jarabo-Amores, and F. Lopez-Ferreras, "Using multilayer perceptrons to align high range resolution radar signals," in International Conference on Artificial Neural Networks. Springer, 2005, pp. 911–916.
[9] R. M. Leggett, D. Heavens, M. Caccamo, M. D. Clark, and R. P. Davey, "NanoOK: multi-reference alignment analysis of nanopore sequencing data, quality and error profiles," Bioinformatics, vol. 32, no. 1, pp. 142–144, 2015.
[10] H. Foroosh, J. B. Zerubia, and M. Berthod, "Extension of phase correlation to subpixel registration," IEEE Transactions on Image Processing, vol. 11, no. 3, pp. 188–200, 2002.
[11] L. G. Brown, "A survey of image registration techniques," ACM Computing Surveys (CSUR), vol. 24, no. 4, pp. 325–376, 1992.
[12] D. Robinson, S. Farsiu, and P. Milanfar, "Optimal registration of aliased images using variable projection with applications to super-resolution," The Computer Journal, vol. 52, no. 1, pp. 31–42, 2007.
[13] A. Singer, "Angular synchronization by eigenvectors and semidefinite programming," Applied and Computational Harmonic Analysis, vol. 30, no. 1, pp. 20–36, 2011.
[14] N. Boumal, "Nonconvex phase synchronization," SIAM Journal on Optimization, vol. 26, no. 4, pp. 2355–2377, 2016.
[15] A. Perry, A. S. Wein, A. S. Bandeira, and A. Moitra, "Message-passing algorithms for synchronization problems over compact groups," Communications on Pure and Applied Mathematics, vol. 71, no. 11, pp. 2275–2322, 2018.
[16] Y. Chen and E. J. Candès, "The projected power method: An efficient algorithm for joint alignment from pairwise differences," Communications on Pure and Applied Mathematics, vol. 71, no. 8, pp. 1648–1714, 2018.
[17] A. S. Bandeira, N. Boumal, and A. Singer, "Tightness of the maximum likelihood semidefinite relaxation for angular synchronization," Mathematical Programming, vol. 163, no. 1-2, pp. 145–167, 2017.
[18] Y. Zhong and N. Boumal, "Near-optimal bounds for phase synchronization," SIAM Journal on Optimization, vol. 28, no. 2, pp. 989–1016, 2018.
[19] A. Bandeira, Y. Chen, R. R. Lederman, and A. Singer, "Non-unique games over compact groups and orientation estimation in cryo-EM," Inverse Problems, 2020.
[20] A. S. Bandeira, M. Charikar, A. Singer, and A. Zhu, "Multireference alignment using semidefinite programming," in Proceedings of the 5th Conference on Innovations in Theoretical Computer Science. ACM, 2014, pp. 459–470.
[21] Y. Chen, L. J. Guibas, and Q.-X. Huang, "Near-optimal joint object matching via convex relaxation," in Proceedings of the 31st International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, vol. 32, no. 2, 2014, pp. 100–108.
[22] A. S. Bandeira, N. Boumal, and V. Voroninski, "On the low-rank approach for semidefinite programs arising in synchronization and community detection," in Conference on Learning Theory, 2016, pp. 361–382.
[23] L. P. Hansen, "Large sample properties of generalized method of moments estimators," Econometrica: Journal of the Econometric Society, pp. 1029–1054, 1982.
[24] Z. Kam, "The reconstruction of structure from electron micrographs of randomly oriented particles," in Electron Microscopy at Molecular Dimensions. Springer, 1980, pp. 270–277.
[25] N. Sharon, J. Kileel, Y. Khoo, B. Landa, and A. Singer, "Method of moments for 3-D single particle ab initio modeling with non-uniform distribution of viewing angles," Inverse Problems, vol. 36, no. 4, p. 044003, 2020.
[26] T. Bendory, N. Boumal, C. Ma, Z. Zhao, and A. Singer, "Bispectrum inversion with application to multireference alignment," IEEE Transactions on Signal Processing, vol. 66, no. 4, pp. 1037–1050, 2017.
[27] A. Bandeira, P. Rigollet, and J. Weed, "Optimal rates of estimation for multi-reference alignment," arXiv preprint arXiv:1702.08546, 2017.
[28] W. Collis, P. White, and J. Hammond, "Higher-order spectra: the bispectrum and trispectrum," Mechanical Systems and Signal Processing, vol. 12, no. 3, pp. 375–394, 1998.
[29] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society: Series B (Methodological), vol. 39, no. 1, pp. 1–22, 1977.
[30] E. Abbe, T. Bendory, W. Leeb, J. M. Pereira, N. Sharon, and A. Singer, "Multireference alignment is easier with an aperiodic translation distribution," IEEE Transactions on Information Theory, vol. 65, no. 6, pp. 3565–3584, 2018.
[31] M. Palamini, A. Canciani, and F. Forneris, "Identifying and visualizing macromolecular flexibility in structural biology," Frontiers in Molecular Biosciences, vol. 3, p. 47, 2016.
[32] V. Chandran and S. L. Elgar, "Position, rotation, and scale invariant recognition of images using higher-order spectra," in ICASSP'92: IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 5. IEEE, 1992, pp. 213–216.
[33] L. Capodiferro, R. Cusani, G. Jacovitti, and M. Vascotto, "A correlation based technique for shift, scale, and rotation independent object identification," in ICASSP'87: IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 12. IEEE, 1987, pp. 221–224.
[34] M. K. Tsatsanis and G. B. Giannakis, "Translation, rotation, and scaling invariant object and texture classification using polyspectra," in Advanced Signal Processing Algorithms, Architectures, and Implementations, vol. 1348. International Society for Optics and Photonics, 1990, pp. 103–115.
[35] K. Hotta, T. Mishima, and T. Kurita, "Scale invariant face detection and classification method using shift invariant features extracted from log-polar image," IEICE Transactions on Information and Systems, vol. 84, no. 7, pp. 867–878, 2001.
[36] D. Martinec and T. Pajdla, "Robust rotation and translation estimation in multiview reconstruction," in 2007 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2007, pp. 1–8.
[37] M. Hirn and A. Little, "Wavelet invariants for statistically robust multi-reference alignment," Information and Inference: A Journal of the IMA, 08 2020, iaaa016. [Online]. Available: https://doi.org/10.1093/imaiai/iaaa016

Matthew Hirn is an Associate Professor in the Department of Computational Mathematics, Science & Engineering and the Department of Mathematics at Michigan State University. At Michigan State he is the scientific leader of the ComplEx Data Analysis Research (CEDAR) team, which develops new tools in computational harmonic analysis, machine learning, and data science for the analysis of complex, high dimensional data. Hirn received his B.A. in Mathematics from Cornell University and his Ph.D. in Mathematics from the University of Maryland, College Park. Before arriving at MSU, he held postdoctoral appointments in the Applied Math Program at Yale University and in the Department of Computer Science at École Normale Supérieure, Paris. He is the recipient of the Alfred P. Sloan Fellowship (2016), the DARPA Young Faculty Award (2016), the DARPA Director's Fellowship (2018), and the NSF CAREER award (2019), and was designated a Kavli Fellow by the National Academy of Sciences (2017).

Anna Little received her PhD from Duke University in 2011, where she worked under Mauro Maggioni to develop a new multiscale method for estimating the intrinsic dimension of a data set. From 2012–2017 she was an Assistant Professor of Mathematics at Jacksonville University, a primarily undergraduate liberal arts institution where, in addition to teaching and research, she served as a statistical consultant. From 2018–2020 she was a research postdoc in the Department of Computational Mathematics, Science, and Engineering at Michigan State University, where she worked with Yuying Xie and Matthew Hirn on statistical and geometric analysis of high-dimensional data. She is currently an Assistant Professor in the Department of Mathematics at the University of Utah, as well as a member of the Utah Center for Data Science.