Universality in Polytope Phase Transitions and
Message Passing Algorithms
Mohsen Bayati∗, Marc Lelarge† and Andrea Montanari‡
July 31, 2012
Abstract
We consider a class of nonlinear mappings F_{A,N} in ℝ^N indexed by symmetric random matrices A ∈ ℝ^{N×N} with independent entries. Within spin glass theory, special cases of these mappings correspond to iterating the TAP equations and were studied by Erwin Bolthausen. Within information theory, they are known as "approximate message passing" algorithms.

We study the high-dimensional (large N) behavior of the iterates of F for polynomial functions F, and prove that it is universal, i.e. it depends only on the first two moments of the entries of A, under a subgaussian tail condition. As an application, we prove the universality of a certain phase transition arising in polytope geometry and compressed sensing. This solves, for a broad class of random projections, a conjecture by David Donoho and Jared Tanner.
1 Introduction and main results
Let A ∈ ℝ^{N×N} be a random Wigner matrix, i.e. a random matrix with i.i.d. entries A_{ij} satisfying E{A_{ij}} = 0 and E{A²_{ij}} = 1/N. Considerable effort has been devoted to studying the distribution of the eigenvalues of such a matrix [AGZ09, BS05, TV12]. The universality phenomenon is a striking recurring theme in these studies. Roughly speaking, many asymptotic properties of the joint eigenvalue distribution are independent of the entries' distribution, as long as the latter has the prescribed first two moments and satisfies certain tail conditions. We refer to [AGZ09, BS05, TV12] and references therein for a selection of such results. Universality is extremely useful because it allows one to compute asymptotics for one entries distribution (typically, Gaussian entries) and then export the results to a broad class of distributions.

In this paper we are concerned with random matrix universality, although we do not focus on eigenvalue properties. Given A ∈ ℝ^{N×N} and an initial condition x^0 ∈ ℝ^N independent of A, we consider the sequence (x^t)_{t≥0} defined by letting, for t ≥ 0,

    x^{t+1} = A f(x^t; t) − b_t f(x^{t−1}; t−1),   b_t ≡ (1/N) div f(x; t) |_{x=x^t}.   (1.1)
∗Graduate School of Business, Stanford University
†INRIA and ENS, Paris
‡Department of Electrical Engineering and Department of Statistics, Stanford University
Here, div denotes the divergence operator and, for each t ≥ 0, f(·; t) : ℝ^N → ℝ^N is a separable function, i.e. f(z; t) = (f_1(z_1; t), …, f_N(z_N; t)), where the functions f_i(·; t) : ℝ → ℝ are polynomials of bounded degree. In particular, b_t = N^{−1} ∑_{i=1}^N f′_i(x^t_i; t).
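To make the structure of Eq. (1.1) concrete, here is a minimal sketch of the iteration (function names and the driver below are our own; we use f = tanh for a bounded illustration, whereas the theorems below assume polynomial f):

```python
import numpy as np

def amp_iterate(A, x0, f, fprime, T):
    """Iterate (1.1): x^{t+1} = A f(x^t) - b_t f(x^{t-1}), with the
    memory coefficient b_t = N^{-1} sum_i f'(x^t_i).  The t = 0 step
    has no memory term (f(x^{-1}) is taken to be 0)."""
    x, f_prev = x0.copy(), np.zeros_like(x0)
    for _ in range(T):
        b = fprime(x).mean()                # b_t = (1/N) div f(x)|_{x = x^t}
        x, f_prev = A @ f(x) - b * f_prev, f(x)
    return x

# Wigner-type matrix: symmetric, zero diagonal, E A_ij = 0, E A_ij^2 = 1/N.
rng = np.random.default_rng(0)
N = 500
U = np.triu(rng.normal(size=(N, N)), 1) / np.sqrt(N)
A = U + U.T
x = amp_iterate(A, rng.normal(size=N), np.tanh,
                lambda v: 1.0 / np.cosh(v) ** 2, T=5)
```

The memory ("Onsager") term −b_t f(x^{t−1}; t−1) is what distinguishes this recursion from a naive power-type iteration.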
The present paper is concerned with the asymptotic distribution of x^t as N → ∞ with t fixed, and establishes the following results:
Universality. As N → ∞, the finite-dimensional marginals of the distribution of x^t are asymptotically insensitive to the distribution of the entries A_{ij}.
State evolution. The entries of x^t are asymptotically Gaussian with zero mean, and variance that can be explicitly computed through a one-dimensional recursion, which we will refer to as state evolution.
Phase transitions in polytope geometry. As an application, we use state evolution to prove universality of a phase transition in polytope geometry, with connections to compressed sensing. This solves, for a broad class of random matrices with independent entries, a conjecture put forward by David Donoho and Jared Tanner in [Don05a, DT11].
In order to illustrate the usefulness of the first two technical results, we start the presentation of our results from the third one.
1.1 Universality of polytope neighborliness
A polytope Q is said to be centrosymmetric if x ∈ Q implies −x ∈ Q. Following [Don05b, Don05a], we say that such a polytope is k-neighborly if the condition below holds:

(I) Every subset of k vertices of Q which does not contain an antipodal pair spans a (k−1)-dimensional face.
The neighborliness of Q is the largest value of k for which this condition holds. The prototype of a neighborly polytope is the ℓ₁ ball C^n ≡ {x ∈ ℝ^n : ‖x‖₁ ≤ 1}, whose neighborliness is indeed equal to n.
It was shown in a series of papers [Don05b, Don05a, DT05b, DT05a, DT09] that polytope neighborliness has tight connections with the geometric properties of random point clouds, and with sparsity-seeking methods for solving underdetermined systems of linear equations. The latter are in turn central in a number of applied domains, including model selection for data analysis and compressed sensing. For the reader's convenience, these connections will be briefly reviewed in Section 5.
Intuitive images of low-dimensional polytopes suggest that "typical" polytopes are not neighborly: already selecting k = 2 vertices leads to a segment that connects them and passes through the interior of Q. This conclusion is spectacularly wrong in high dimension. Natural random constructions lead to polytopes whose neighborliness scales linearly in the dimension. Motivated by the above applications, and following [Don05b, Don05a, DT05b, DT05a], we focus here on a weaker notion of neighborliness. Roughly speaking, this corresponds to the largest k such that most subsets of k vertices of Q span a (k−1)-dimensional face. In order to formalize this notion, we denote by F(Q; ℓ) the number of ⌊ℓ⌋-dimensional faces of Q.
Definition 1. Let Q = {Q_n}_{n≥0} be a sequence of centrosymmetric polytopes indexed by n, where Q_n has 2n vertices and dimension m = m(n): Q_n ⊆ ℝ^m. We say that Q has weak neighborliness ρ ∈ (0, 1) if for any ξ > 0,

    lim_{n→∞} F(Q_n; m(n)ρ(1−ξ)) / F(C^n; m(n)ρ(1−ξ)) = 1,
    lim_{n→∞} F(Q_n; m(n)ρ(1+ξ)) / F(C^n; m(n)ρ(1+ξ)) = 0.

If the sequence Q is random, we say that Q has weak neighborliness ρ (in probability) if the above limits hold in probability.
In other words, a sequence of polytopes {Q_n}_{n≥0} has weak neighborliness ρ if, for large n, the m-dimensional polytope Q_n has close to the maximum possible number of k-faces, for all k < mρ(1−ξ).
Note 1. Note that previously the neighborliness of a polytope was defined to be the largest integer k satisfying condition (I). However, in our definition, weak neighborliness refers to the fraction k/n. This is due to the fact that weak neighborliness is defined in the limit n → ∞.
The existence of weakly neighborly polytope sequences is clear when m(n) = n, since in this case we can take Q_n = C^n with ρ = 1; but the existence is highly non-trivial when m is only a fraction of n.
It comes indeed as a surprise that this is a generic situation, as demonstrated by the following construction. For a matrix A ∈ ℝ^{m×n} and S ⊆ ℝ^n, let AS ≡ {Ax ∈ ℝ^m : x ∈ S}. In particular, AC^n is the centrosymmetric m-dimensional polytope obtained by projecting the n-dimensional ℓ₁ ball to m dimensions. The following result was proved in [Don05a].
Theorem 1 (Donoho, 2005). There exists a function ρ_* : (0, 1) → (0, 1) such that the following holds. Fix δ ∈ (0, 1). For each n ∈ ℕ, let m(n) = ⌊nδ⌋ and define A(n) ∈ ℝ^{m(n)×n} to be a random matrix with i.i.d. Gaussian entries. Then the sequence of polytopes {A(n)C^n}_{n≥0} has weak neighborliness ρ_*(δ) in probability.
A characterization of the curve δ ↦ ρ_*(δ) was provided in [Don05a], but we omit it here since a more explicit expression will be given below.
The proof of Theorem 1 is based on exact expressions for the number of faces F(A(n)C^n; ℓ). These are in turn derived from earlier works in polytope geometry by Affentranger and Schneider [AS92] and by Vershik and Sporyshev [VS92]. This approach relies in a fundamental way on the invariance of the distribution of A(n) under rotations.
Motivated by applications to data analysis and signal processing, Donoho and Tanner [DT11] carried out extensive numerical simulations for random polytopes of the form A(n)C^n for several choices of the distribution of A(n). They formulated a universality hypothesis, according to which the conclusion of Theorem 1 holds for a far broader class of random matrices. The results of their numerical simulations were consistent with this hypothesis.
Here we establish the first rigorous result indicating universality of polytope neighborliness for a broad class of random matrices. Define the curve (δ, ρ_*(δ)), δ ∈ (0, 1), parametrically by letting, for α ∈ (0, ∞):

    δ = 2φ(α) / [α + 2(φ(α) − αΦ(−α))],   (1.2)
    ρ = 1 − αΦ(−α)/φ(α),   (1.3)

where φ(z) = e^{−z²/2}/√(2π) is the Gaussian density and Φ(x) ≡ ∫_{−∞}^x φ(z) dz is the Gaussian distribution function. Explicitly, if the functions on the right-hand sides of Eqs. (1.2), (1.3) are denoted by f_δ(α), f_ρ(α), then¹ ρ_*(δ) ≡ f_ρ(f_δ^{−1}(δ)).
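The parametric curve (1.2)-(1.3) is easy to evaluate numerically. The sketch below (function names are our own) inverts f_δ by bisection, using the monotonicity recalled in footnote 1:

```python
import math

def phi(z):
    """Standard Gaussian density."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def Phi(z):
    """Standard Gaussian distribution function, via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

def f_delta(a):
    """Right-hand side of (1.2)."""
    return 2 * phi(a) / (a + 2 * (phi(a) - a * Phi(-a)))

def f_rho(a):
    """Right-hand side of (1.3)."""
    return 1 - a * Phi(-a) / phi(a)

def rho_star(delta):
    """rho_*(delta) = f_rho(f_delta^{-1}(delta)); f_delta decreases from 1
    to 0 on [0, infinity), so bisection inverts it."""
    lo, hi = 1e-12, 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f_delta(mid) > delta:
            lo = mid          # f_delta(mid) too large: alpha must grow
        else:
            hi = mid
    return f_rho(0.5 * (lo + hi))
```

For instance, rho_star(delta) increases with the undersampling ratio δ, approaching 1 as δ → 1, consistent with the ℓ₁-ball case C^n.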
Here we extend the scope of Theorem 1 from Gaussian matrices to matrices with independent subgaussian² entries (not necessarily identically distributed).
Theorem 2. Fix δ ∈ (0, 1). For each n ∈ ℕ, let m(n) = ⌊nδ⌋ and define A(n) ∈ ℝ^{m(n)×n} to be a random matrix with independent subgaussian entries, with zero mean, unit variance, and common scale factor s independent of n. Further assume A_{ij}(n) = Ã_{ij}(n) + ν₀ G_{ij}(n), where ν₀ > 0 is independent of n and {G_{ij}(n)}_{i∈[m],j∈[n]} is a collection of i.i.d. N(0, 1) random variables independent of Ã(n). Then the sequence of polytopes {A(n)C^n}_{n≥0} has weak neighborliness ρ_*(δ) in probability.
It is likely that this theorem can be improved in two directions. First, a milder tail condition than subgaussianity is probably sufficient. Second, we are assuming that the distribution of A_{ij} has an arbitrarily small Gaussian component. This is not necessary for the upper bound on neighborliness, and appears to be an artifact of the proof of the lower bound.
The proof of Theorem 2 is provided in Section 5. By comparison, the most closely related result towards universality is by Adamczak, Litvak, Pajor, and Tomczak-Jaegermann [ALPTJ11]. For a class of matrices A(n) with i.i.d. columns, these authors prove that A(n)C^n has neighborliness scaling linearly with n. This, however, does not imply that a limit weak neighborliness exists and is universal, as established instead in Theorem 2.
At the other extreme, universality of compressed sensing phase transitions can be conjecturedfrom the results of the non-rigorous replica method [KWT09, RFG09].
1.2 Universality of iterative algorithms
We will consider here and below a setting that is somewhat more general than the one described by Eq. (1.1). Following the terminology of [DMM09], we will refer to such an iteration as the approximate message passing (AMP) iteration/algorithm.
We generalize the iteration (1.1) to take place in the vector space V_{q,N} ≡ (ℝ^q)^N ≅ ℝ^{N×q}. Given a vector x ∈ V_{q,N}, we shall most often regard it as an N-vector with entries in ℝ^q, namely x = (x_1, …, x_N), with x_i ∈ ℝ^q. Components of x_i ∈ ℝ^q will be indicated as (x_i(1), …, x_i(q)) ≡ x_i. Given a matrix A ∈ ℝ^{N×N}, we let it act on V_{q,N} in the natural way, namely for v′, v ∈ V_{q,N}, letting v′ = Av be given by v′_i = ∑_{j=1}^N A_{ij} v_j for all i ∈ [N]. Here and below, [N] ≡ {1, …, N} is the set of the first N integers. In other words, we identify A with the Kronecker product A ⊗ I_{q×q}.
¹It is easy to show that f_δ(α) is strictly decreasing in α ∈ [0, ∞), with f_δ(0) = 1 and lim_{α→∞} f_δ(α) = 0; hence f_δ^{−1} is well defined on [0, 1]. Further properties of this curve can be found in [DMM09, DMM11].
²See Eq. (1.7) for the definition of subgaussian random variables.
Definition 2. An AMP instance is a triple (A, F, x^0) where:

1. A ∈ ℝ^{N×N} is a symmetric matrix with A_{ii} = 0 for all i ∈ [N];

2. F = {f_k : k ∈ [N]} is a collection of mappings f_k : ℝ^q × ℕ → ℝ^q, (x, t) ↦ f_k(x, t), that are locally Lipschitz in their first argument;

3. x^0 ∈ V_{q,N} is an initial condition.
Given F = {f_k : k ∈ [N]}, we define f(·; t) : V_{q,N} → V_{q,N} by letting v′ = f(v; t) be given by v′_i = f_i(v_i; t) for all i ∈ [N].
Definition 3. The approximate message passing orbit corresponding to the instance (A, F, x^0) is the sequence of vectors {x^t}_{t≥0}, x^t ∈ V_{q,N}, defined as follows, for t ≥ 0:

    x^{t+1} = A f(x^t; t) − B_t f(x^{t−1}; t−1).   (1.4)

Here B_t : V_{q,N} → V_{q,N} is the linear operator defined by letting, for v′ = B_t v,

    v′_i = ( ∑_{j∈[N]} A²_{ij} (∂f_j/∂x)(x^t_j, t) ) v_i,   (1.5)

with ∂f_j/∂x denoting the Jacobian matrix of f_j(·; t) : ℝ^q → ℝ^q.
The above definition can also be summarized by the following expression for the evolution of a single coordinate under AMP:

    x^{t+1}_i = ∑_{j∈[N]} A_{ij} f_j(x^t_j, t) − ( ∑_{j∈[N]} A²_{ij} (∂f_j/∂x)(x^t_j, t) ) f_i(x^{t−1}_i, t−1).   (1.6)

Notice that Eq. (1.1) corresponds to the special case q = 1, in which we replaced A²_{ij} by E{A²_{ij}} = 1/N for simplicity of exposition.
Recall that a centered random variable X is subgaussian with scale factor σ² if, for all λ > 0, we have

    E(e^{λX}) ≤ e^{σ²λ²/2}.   (1.7)
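As a sanity check of definition (1.7): for a Rademacher variable X uniform on {−1, +1}, E e^{λX} = cosh(λ), and Hoeffding's lemma gives the subgaussian bound with σ² = 1. The snippet below (our own helpers) verifies cosh(λ) ≤ e^{λ²/2} on a grid:

```python
import math

# For X uniform on {-1, +1}, E[exp(lam * X)] = cosh(lam).  Hoeffding's
# lemma implies X is subgaussian in the sense of (1.7) with sigma^2 = 1,
# i.e. cosh(lam) <= exp(lam^2 / 2) for every lam > 0.
def mgf_rademacher(lam):
    return math.cosh(lam)

def subgaussian_bound(lam, sigma2=1.0):
    return math.exp(sigma2 * lam * lam / 2)

ok = all(mgf_rademacher(0.1 * k) <= subgaussian_bound(0.1 * k)
         for k in range(1, 201))
```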
Definition 4. Let {(A(N), F_N, x^{0,N})}_{N≥1} be a sequence of AMP instances indexed by the dimension N, with A(N) a random matrix and x^{0,N} a random vector. We say that the sequence is a (C, d)-regular (or, for short, regular) polynomial sequence if:

1. For each N, the entries (A_{ij}(N))_{1≤i<j≤N} are independent centered random variables. Further, they are subgaussian with common scale factor C/N.

2. For each N, the functions f_i(·; t) in F_N (possibly random, as long as they are independent of A(N), x^{0,N}) are polynomials with maximum degree d and coefficients bounded by C.

3. For each N, A(N) and x^{0,N} are independent. Further, we have ∑_{i=1}^N exp{‖x^{0,N}_i‖₂²/C} ≤ NC with probability converging to one as N → ∞.
We now state our universality result for the algorithm (1.4).
Theorem 3. Let (A(N), F_N, x^{0,N})_{N≥1} and (Ã(N), F_N, x^{0,N})_{N≥1} be any two (C, d)-regular polynomial sequences of instances that differ only in the distribution of the random matrices A(N) and Ã(N). Denote by {x^t}_{t≥0}, {x̃^t}_{t≥0} the corresponding AMP orbits. Assume further that, for all N and all i < j, E{A²_{ij}} = E{Ã²_{ij}}. Then, for any set of polynomials {p_{N,i}}_{N≥0, 1≤i≤N}, p_{N,i} : ℝ^q → ℝ, with degree bounded by D and coefficients bounded by B for all N and i ∈ [N], we have

    lim_{N→∞} (1/N) ∑_{i=1}^N { E p_{N,i}(x^t_i) − E p_{N,i}(x̃^t_i) } = 0.   (1.8)
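A quick numerical illustration of the kind of agreement Theorem 3 predicts (our own sketch, with f = tanh in place of a polynomial and p(x) = x² as the test function): running the same AMP iteration with Gaussian and Rademacher Wigner matrices, matched in their first two moments, yields empirical moments that agree up to finite-N fluctuations.

```python
import numpy as np

def amp(A, x0, T):
    """AMP (1.4), q = 1, f = tanh (bounded illustrative nonlinearity)."""
    x, f_prev = x0.copy(), np.zeros_like(x0)
    for _ in range(T):
        fx = np.tanh(x)
        b = (1.0 - fx ** 2).mean()          # Onsager coefficient
        x, f_prev = A @ fx - b * f_prev, fx
    return x

def wigner(N, rng, kind):
    """Symmetric matrix, zero diagonal, E A_ij = 0, E A_ij^2 = 1/N."""
    if kind == "gauss":
        G = rng.normal(size=(N, N))
    else:
        G = rng.choice([-1.0, 1.0], size=(N, N))   # Rademacher entries
    U = np.triu(G, 1) / np.sqrt(N)
    return U + U.T

rng = np.random.default_rng(7)
N = 1000
x0 = rng.normal(size=N)
m_gauss = np.mean(amp(wigner(N, rng, "gauss"), x0, 4) ** 2)
m_rad = np.mean(amp(wigner(N, rng, "rad"), x0, 4) ** 2)
# Theorem 3 predicts these empirical second moments agree as N grows.
```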
1.3 State evolution
Theorem 3 establishes that the behavior of the sequence {x^t}_{t≥0} is, in the high-dimensional limit, insensitive to the distribution of the entries of the random matrix A. In order to characterize this limit, we need to make some assumptions on the collection of functions F_N.
Definition 5. We say that the sequence of AMP instances {(A(N), F_N, x^{0,N})}_{N≥0} is polynomial and converging (or simply converging) if it is (C, d)-regular and there exist: (i) an integer k; (ii) a symmetric matrix W ∈ ℝ^{k×k} with non-negative entries; (iii) a function g : ℝ^q × ℝ^q × [k] × ℕ → ℝ^q, with g(x, Y, a, t) = (g_1(x, Y, a, t), …, g_q(x, Y, a, t)) and, for each r ∈ [q], a ∈ [k], t ∈ ℕ, g_r(·, ·, a, t) a polynomial with degree d and coefficients bounded by C; (iv) k probability measures P_1, …, P_k on ℝ^q, with P_a a finite mixture of (possibly degenerate) Gaussians for each a ∈ [k]; (v) for each N, a finite partition C^N_1 ∪ C^N_2 ∪ · · · ∪ C^N_k = [N]; (vi) k positive semidefinite matrices Σ^0_1, …, Σ^0_k ∈ ℝ^{q×q}, such that the following happens:

1. For each a ∈ [k], we have lim_{N→∞} |C^N_a|/N = c_a ∈ (0, 1).

2. For each N ≥ 0, each a ∈ [k] and each i ∈ C^N_a, we have f_i(x, t) = g(x, Y(i), a, t), where Y(1), …, Y(N) are independent random variables with Y(i) ∼ P_a whenever i ∈ C^N_a for some a ∈ [k].

3. For each N, the entries {A_{ij}(N)}_{1≤i<j≤N} are independent subgaussian random variables with scale factor C/N, E A_{ij} = 0 and, for i ∈ C^N_a and j ∈ C^N_b, E{A²_{ij}} = W_{ab}/N.

4. For each a ∈ [k], in probability,

    lim_{N→∞} (1/|C^N_a|) ∑_{i∈C^N_a} g(x^0_i, Y(i), a, 0) g(x^0_i, Y(i), a, 0)^T = Σ^0_a.   (1.9)
With a slight abuse of notation, we will sometimes denote a converging sequence by {(A(N), g, x^{0,N})}_{N≥0}. We use capital letters to denote the Y(i)'s to emphasize that they are random and do not change across iterations.
Our next result establishes that the low-dimensional marginals of {x^t} are asymptotically Gaussian. State evolution characterizes the covariance of these marginals. For each t ≥ 1, state evolution defines a set of k positive semidefinite matrices Σ^t = (Σ^t_1, Σ^t_2, …, Σ^t_k), with Σ^t_a ∈ ℝ^{q×q}. These are obtained by letting, for each t ≥ 1,

    Σ̂^t_a = ∑_{b=1}^k c_b W_{ab} Σ^{t−1}_b,   (1.10)

    Σ^t_a = E{ g(Z^t_a, Y_a, a, t) g(Z^t_a, Y_a, a, t)^T },   (1.11)

for all a ∈ [k]. Here Y_a ∼ P_a, Z^t_a ∼ N(0, Σ̂^t_a), and Y_a and Z^t_a are independent.
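In the scalar case q = k = 1, the recursion (1.10)-(1.11) reduces to a one-dimensional map on variances. The sketch below (our own helper, with g independent of Y as Remark 1 permits) approximates the expectation in (1.11) by Monte Carlo:

```python
import math
import random

def state_evolution(g, W=1.0, c=1.0, sigma2_0=1.0, T=5, n_mc=50_000, seed=0):
    """Scalar (q = k = 1) state evolution:
      hat_sigma2_t = c * W * sigma2_{t-1}                  (Eq. 1.10)
      sigma2_t     = E[g(Z)^2],  Z ~ N(0, hat_sigma2_t)    (Eq. 1.11)
    The Gaussian expectation is approximated by Monte Carlo."""
    rng = random.Random(seed)
    sigma2 = sigma2_0
    history = []
    for _ in range(T):
        hat = c * W * sigma2
        s = math.sqrt(hat)
        sigma2 = sum(g(rng.gauss(0.0, s)) ** 2 for _ in range(n_mc)) / n_mc
        history.append((hat, sigma2))
    return history

# With g(z) = z and c = W = 1 the variance map is (up to MC error) the identity.
hist = state_evolution(lambda z: z, T=4)
```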
Theorem 4. Let (A(N), F_N, x^0)_{N≥0} be a polynomial and converging sequence of AMP instances, and denote by {x^t}_{t≥0} the corresponding AMP sequence. Then for each t ≥ 1, each a ∈ [k], and each locally Lipschitz function ψ : ℝ^q × ℝ^q → ℝ such that |ψ(x, y)| ≤ K(1 + ‖y‖₂² + ‖x‖₂²)^K, we have, in probability,

    lim_{N→∞} (1/|C^N_a|) ∑_{j∈C^N_a} ψ(x^t_j, Y(j)) = E{ψ(Z_a, Y_a)},   (1.12)

where Z_a ∼ N(0, Σ̂^t_a) is independent of Y_a ∼ P_a.
We conclude by mentioning that, following [DMM09], generalizations of the algorithm (1.4) were studied by several groups [Sch10, Ran11, MAYB11] for a number of applications. Universality results analogous to the one proved here are expected to hold for such generalizations as well.
1.4 Outline of the paper
The paper is organized as follows. After some preliminary facts and notations in Section 2, Section 3 considers the AMP iteration (1.4) and proves Theorems 3 and 4. In order to achieve our goal, we introduce two different iterations whose analysis provides useful intermediate steps. We also prove a generalization of Theorem 4 that estimates functions of messages at two distinct times, ψ(x^t_i, x^s_i, Y(i)).
Section 4 proves a generalization of Theorem 4 to the case of rectangular (non-symmetric) matrices A. This is achieved by effectively embedding the rectangular matrix into a larger symmetric matrix and applying our results for symmetric matrices.
The generalization to rectangular matrices is finally used in Section 5 to prove our result on the universality of polytope neighborliness, Theorem 2. This is done via a correspondence with compressed sensing reconstruction established in [Don05a], and a sharp analysis of an AMP iteration that solves this reconstruction problem.
2 Notations and basic simplifications
We will always view vectors as column vectors. The transpose of a vector v is the row vector v^T; analogously, the transpose of a matrix M is denoted by M^T. For a vector v ∈ ℝ^m, we denote its ℓ_p norm, p ≥ 1, by ‖v‖_p ≡ (∑_{i=1}^m |v_i|^p)^{1/p}. This is extended in the usual way to p = ∞. We will often omit the subscript if p = 2. For a matrix M, we denote by ‖M‖_p the corresponding ℓ_p operator norm. The standard scalar product of u, v ∈ ℝ^m is denoted by ⟨u, v⟩ = ∑_{i=1}^m u_i v_i. Given v ∈ ℝ^m, w ∈ ℝ^n, we denote by [v, w] ∈ ℝ^{m+n} the (column) vector obtained by concatenating v and w. The identity matrix is denoted by I, or I_{m×m} if the dimensions need to be specified. The indicator function is 1(·). The set of the first m integers is indicated by [m] = {1, …, m}. Finally, given x = (x(1), x(2), …, x(q)) ∈ ℝ^q and m = (m(1), …, m(q)) ∈ ℕ^q, we write

    x^m ≡ ∏_{r=1}^q x(r)^{m(r)}.   (2.1)
Following common practice, degenerate Gaussian distributions will be considered Gaussian, without further qualification. In particular, any distribution with finite support in ℝ^k is a finite mixture of Gaussians.
In our proof of Theorem 4 we will make use of the following simplification, which lightens the notation somewhat.
Remark 1. For proving Theorem 4, it is sufficient to consider the case in which g : (x, Y, a, t) ↦ g(x, Y, a, t) is independent of Y.
Proof. We can assume without loss of generality that the measures P_a are Gaussian. Indeed if, for instance, P_a is a mixture of ℓ Gaussians, P_a = w_1 P_{a,1} + w_2 P_{a,2} + · · · + w_ℓ P_{a,ℓ}, then we can effectively replace the partition element C^N_a by a finer partition C^N_{a,1}, …, C^N_{a,ℓ}, whereby C^N_{a,1} ∪ · · · ∪ C^N_{a,ℓ} = C^N_a and |C^N_{a,1}|, …, |C^N_{a,ℓ}| are multinomial with parameters (w_1, …, w_ℓ). Notice that this finer partition is random, but |C^N_{a,i}|/N → c_a w_i almost surely, and therefore the theorem applies.

Assume therefore that the P_a are Gaussian. By replacing g(x, Y, a, t) by g′(x, Y, a, t) = g(x, Q_a Y + v_a, a, t) for suitable matrices Q_a and vectors v_a, we can always assume Y_a ∼ N(0, I_{q×q}) for all a. Assume therefore Y_a ∼ N(0, I_{q×q}). Enlarge the space by letting k′ = k + q, N′ = (q + 1)N and C^{N′}_a = {Nℓ + 1, …, N(ℓ + 1)} for a = k + ℓ > k, while C^{N′}_a = C^N_a for a ≤ k. We further let q′ = 2q and define new functions g′ : ℝ^{q′} × ℝ^q × [k′] × ℕ → ℝ^{q′}, independent of the second argument (Y), as follows. For x ∈ ℝ^q, x̃ ∈ ℝ^q, we let

    g′_r((x, x̃), Y, a, t) = g_r(x, x̃, a, t)   for r ∈ {1, …, q}, a ∈ {1, …, k},
    g′_r((x, x̃), Y, a, t) = 0   for r ∈ {q + 1, …, 2q}, a ∈ {1, …, k},
    g′_r((x, x̃), Y, a, t) = 0   for r ∈ {1, …, q}, a ∈ {k + 1, …, k + q},
    g′_{q+ℓ}((x, x̃), Y, k + ℓ′, t) = 1(ℓ = ℓ′)   for ℓ, ℓ′ ∈ {1, …, q}.

We further use a matrix A′ constructed as follows: A′_{ij} = A_{ij} for i, j ≤ N, and A′_{ij} ∼ N(0, 1/N) if i > N or j > N. (Notice that E{(A′_{ij})²} = (q + 1)/N′, but this amounts just to an overall rescaling and is of course immaterial.) Clearly the functions g′ do not depend on Y as claimed. Further, x̃ ∼ N(0, I_{q×q}) at all iterations. Hence the new iteration is identical to the original one when restricted to {x_i(r) : i ≤ N, r ≤ q}.
3 Proofs of Theorems 3 and 4
In this section we consider the AMP iteration (1.4), prove Theorem 3 and Theorem 4, and indeed generalize the latter.
We extend the state evolution (1.10)-(1.11) by defining, for each t ≥ s ≥ 0 and all a ∈ [k], a positive semidefinite matrix Σ^{t,s}_a ∈ ℝ^{(2q)×(2q)} as follows. For boundary conditions, we set

    Σ^{0,0}_a = [ Σ^0_a  Σ^0_a ; Σ^0_a  Σ^0_a ],   Σ^{t,0}_a = [ Σ^t_a  0 ; 0  Σ^0_a ],   Σ^{0,t}_a = [ Σ^0_a  0 ; 0  Σ^t_a ],   (3.1)

with Σ^t_a defined per Eqs. (1.10)-(1.11). For any s, t ≥ 1, we set recursively

    Σ̂^{t,s}_a = ∑_{b=1}^k c_b W_{ab} Σ^{t−1,s−1}_b,   (3.2)

    Σ^{t,s}_a = E{X_a X_a^T},   X_a ≡ [g(Z^t_a, Y_a, a, t), g(Z^s_a, Y_a, a, s)],   (Z^t_a, Z^s_a) ∼ N(0, Σ̂^{t,s}_a).   (3.3)

Recall that [g(Z^t_a, Y_a, a, t), g(Z^s_a, Y_a, a, s)] ∈ ℝ^{2q} is the vector obtained by concatenating g(Z^t_a, Y_a, a, t) and g(Z^s_a, Y_a, a, s). Note that taking s = t in (3.2)-(3.3), we recover the recursion for Σ^t_a given by Eqs. (1.10)-(1.11). Namely, for all t we have

    Σ^{t,t}_a = [ Σ^t_a  Σ^t_a ; Σ^t_a  Σ^t_a ].   (3.4)
Theorem 5. Let {(A(N), F_N, x^{0,N})}_{N≥1} be a polynomial and converging sequence of instances, and denote by {x^t}_{t≥0} the corresponding AMP orbit. Fix s, t ≥ 1. If s ≠ t, further assume that the initial condition x^{0,N} is obtained by letting x^{0,N}_i ∼ Q_a independently for i ∈ C^N_a, with Q_a a finite mixture of Gaussians for each a. Then, for each a ∈ [k] and each locally Lipschitz function ψ : ℝ^q × ℝ^q × ℝ^q → ℝ such that |ψ(x, x′, y)| ≤ K(1 + ‖y‖₂² + ‖x‖₂² + ‖x′‖₂²)^K, we have, in probability,

    lim_{N→∞} (1/|C^N_a|) ∑_{j∈C^N_a} ψ(x^t_j, x^s_j, Y(j)) = E[ψ(Z^t_a, Z^s_a, Y_a)],

where (Z^t_a, Z^s_a) ∼ N(0, Σ̂^{t,s}_a) is independent of Y_a ∼ P_a.
Throughout this section, we will assume that {(A(N), F_N, x^{0,N})}, {(Ã(N), F_N, x^{0,N})}, etc. are (C, d)-regular polynomial sequences of AMP instances. We will often omit explicit mention of this hypothesis. Notice that Theorem 3 holds per realization of the functions F_N. Because of this, and because of Remark 1, we will consider hereafter F_N to be non-random.

The rest of this section is organized as follows. In Subsection 3.1 we introduce two new iterations that are useful intermediate steps for our analysis. We show that the corresponding variables admit representations as sums over trees in Section 3.2, and use them to prove basic properties of these recursions in Sections 3.3, 3.4 and 3.5. Theorems 3 and 5 are then proved in Sections 3.6 and 3.7. Because of Eq. (3.4), Theorem 4 follows as a special case of Theorem 5. Indeed, we will show that both statements are equivalent through a reduction argument. Depending on the application, Theorem 5 might be a more convenient formulation of state evolution, and it will be used in Section 4.
3.1 Message passing iteration
We define two new message passing sequences corresponding to the instance (A, F, x^{0,N}). For each i ∈ [N], we use the shorthand [N] \ i for the set [N] \ {i}. We now define the sequence of vectors (z^t_{i→j})_{t∈ℕ}, where for each i ≠ j ∈ [N], z^t_{i→j} is a vector in ℝ^q; equivalently, for each t ∈ ℕ, we can view (z^t_{i→j}) as an N × N matrix with entries in ℝ^q (diagonal elements are never used). The initial condition z^0_{i→j} ∈ ℝ^q is independent of j, with z^0_{i→j} = x^{0,N}_i for all j ≠ i. The r-th coordinate of the vector z^{t+1}_{i→j} is defined by the following recursion, for t ≥ 0:

    z^{t+1}_{i→j}(r) = ∑_{ℓ∈[N]\j} A_{ℓi} f_{ℓ,r}(z^t_{ℓ→i}, t),   (3.5)

where f_{ℓ,r}(·, t) : ℝ^q → ℝ is the r-th coordinate of f_ℓ(·, t). We also define, for each i ∈ [N] and t ≥ 0, the vector z^{t+1}_i ∈ ℝ^q by

    z^{t+1}_i(r) = ∑_{ℓ∈[N]} A_{ℓi} f_{ℓ,r}(z^t_{ℓ→i}, t).   (3.6)
Our first result establishes universality of the moments of z^t_{i→j} for polynomial sequences of instances.
Proposition 6. Let (A(N), F_N, x^{0,N})_{N≥1} and (Ã(N), F_N, x^{0,N})_{N≥1} be any two (C, d)-regular polynomial sequences of AMP instances that differ only in the distribution of the random matrices A(N) and Ã(N). Assume that for all N and all i < j, E{A²_{ij}} = E{Ã²_{ij}}. Denote by z^t_i (respectively z̃^t_i) the orbit defined by (3.6) while iterating (3.5) with matrix A (respectively Ã). Then for any t ≥ 1 and any m = (m(1), …, m(q)) ∈ ℕ^q, there exists K independent of N such that, for any i ∈ [N]:

    | E[(z^t_i)^m] − E[(z̃^t_i)^m] | ≤ K N^{−1/2}.   (3.7)
The proof of this proposition is provided in Section 3.3.
Note 2. In this statement, and in the rest of this section, K is always understood to be a function of d, t, q, m, C which may vary from line to line but which is independent of N.
Our second message passing sequence is defined as follows: for a (C, d)-regular sequence of instances (A(N), F_N, x^{0,N})_{N≥1}, we define for each N an i.i.d. sequence of N × N random matrices {A^t}_{t∈ℕ} such that A^0 = A(N). Then we define (y^t_{i→j}) by y^0_{i→j} = x^{0,N}_i and, for t ≥ 0,

    y^{t+1}_{i→j}(r) = ∑_{ℓ∈[N]\j} A^t_{ℓi} f_{ℓ,r}(y^t_{ℓ→i}, t),   (3.8)

and

    y^{t+1}_i(r) = ∑_{ℓ∈[N]} A^t_{ℓi} f_{ℓ,r}(y^t_{ℓ→i}, t).   (3.9)
The asymptotic analysis of y^t is particularly simple because an independent random matrix A^t is used at each iteration. In particular, it is easy to establish state evolution for y^t. Our next result shows that y^t provides a good approximation of z^t.
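For q = 1, the message passing recursion (3.5)-(3.6) can be implemented with one message per ordered pair, i.e. O(N²) memory. A minimal sketch (our own, with a single nonlinearity f in place of the index-dependent f_ℓ):

```python
import numpy as np

def mp_iteration(A, x0, f, T):
    """Message passing (3.5)-(3.6) for q = 1.  Z[l, i] stores the message
    z_{l->i}; diagonal entries are never used."""
    N = A.shape[0]
    Z = np.tile(x0[:, None], (1, N))        # z^0_{i->j} = x^0_i for all j
    z = x0.copy()
    for _ in range(T):
        P = A * f(Z)                        # P[l, i] = A_{li} f(z^t_{l->i})
        z = P.sum(axis=0)                   # (3.6): sum over all l
        Z = z[:, None] - P.T                # (3.5): drop the l = j term
    return z, Z

rng = np.random.default_rng(1)
N = 300
U = np.triu(rng.normal(size=(N, N)), 1) / np.sqrt(N)
A = U + U.T                                 # symmetric, zero diagonal
z, Z = mp_iteration(A, rng.normal(size=N), np.tanh, T=3)
```

Note the full sum (3.6) is computed once per step, and each message (3.5) is obtained from it by subtracting the single excluded term.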
Proposition 7. Let (A(N), F_N, x^{0,N})_{N≥1} be a (C, d)-regular polynomial sequence of instances. Let z^t_i and y^t_i be the sequences of vectors obtained by iterating (3.5)-(3.6) and (3.8)-(3.9) respectively. Then for any t ≥ 1 and any m = (m(1), …, m(q)) ∈ ℕ^q, there exists K independent of N such that, for any i ∈ [N]:

    | E[(z^t_i)^m] − E[(y^t_i)^m] | ≤ K N^{−1/2}.
The proof of this proposition is provided in Section 3.4.

Finally, recall that we defined the sequences (x^t_i)_{t∈ℕ}, with x^t_i ∈ ℝ^q, by the initial condition x^0_i and, for t ≥ 0,

    x^{t+1}_i(r) = ∑_ℓ A_{ℓi} f_{ℓ,r}(x^t_ℓ, t) − ∑_ℓ A²_{ℓi} ∑_s f_{i,s}(x^{t−1}_i, t−1) (∂f_{ℓ,r}/∂x(s))(x^t_ℓ, t).
Proposition 8. Let (A(N), F_N, x^{0,N})_{N≥1} be a (C, d)-regular sequence of instances. Denote by {x^t}_{t≥0} the corresponding AMP sequence and by {z^t}_{t≥0} the sequence defined by (3.6) while iterating (3.5). Then for any t ≥ 1 and m(1), …, m(q) ≥ 0, there exists K independent of N such that, for any i ∈ [N]:

    | E[(x^t_i)^m] − E[(z^t_i)^m] | ≤ K N^{−1/2}.
The proof of this proposition is provided in Section 3.5.
3.2 Tree representation
By the assumption of Proposition 6, we have, for each ℓ ∈ [N] and r ∈ [q],

    f_{ℓ,r}(z, t) = ∑_{i_1+···+i_q≤d} c^ℓ_{i_1,…,i_q}(r, t) ∏_{s=1}^q z(s)^{i_s},   (3.10)

where each coefficient c^ℓ_{i_1,…,i_q}(r, t) belongs to ℝ and has absolute value bounded by C (uniformly in ℓ ∈ [N], i_1, …, i_q, and t ∈ ℕ).

We now introduce families of finite rooted labeled trees that will allow us to obtain a simple expression for the z^t_{i→j}(r)'s and z^t_i(r); see Lemma 1 below. For a vertex v in a rooted tree T different from the root, we denote by π(v) the parent of v in T. We denote the root of T by ∘. We consider the edges of T as directed towards the root, and write (u → v) ∈ E(T) if π(u) = v. The unlabeled trees that we consider are such that the root and the leaves have degree one; each other vertex has degree at most d + 1, i.e. has at most d children. We now describe the possible labels on such trees. The label of the root is in [N], the label of a leaf is in [N] × [q] × ℕ^q, and all other vertices have a label in [N] × [q]. For a vertex v different from the root or a leaf, we denote its label by (ℓ(v), r(v)) and call ℓ(v) its type and r(v) its mark. The label (or type) of the root is also denoted by ℓ(∘); the label of a leaf v is denoted by (ℓ(v), r(v), v[1], …, v[q]). For a vertex u ∈ T, we denote by |u| its generation in the tree, i.e. its graph-distance from the root. Also, for a vertex u ∈ T which is not a leaf, we denote by u[r] the number of children of u with mark r ∈ [q] (with the convention u[0] = 0). The children of such a node are ordered with respect to their marks: the labels of the children of u are then (ℓ_1, 1), …, (ℓ_{u[1]}, 1), (ℓ_{u[1]+1}, 2), …, (ℓ_{u[1]+···+u[q]}, q), where each (ℓ_{u[0]+···+u[i]}, …, ℓ_{u[0]+···+u[i+1]−1}) is a u[i+1]-tuple with coordinates in [N]. We denote by L(T) the set of leaves of a tree T, i.e. the set of vertices of T with no children. For v ∈ L(T), its label (ℓ(v), r(v), v[1], …, v[q]) is such that for all i ∈ [q], v[i] ∈ ℕ and v[1] + · · · + v[q] ≤ d. We will distinguish between two types of leaves: those with maximal depth t = max{|v| : v ∈ L(T)} and the remaining ones. If v ∈ L(T) and |v| ≤ t − 1, then we impose v[1] = · · · = v[q] = 0. This case corresponds to "natural" leaves, and since they have no children, the notation is consistent with the notation introduced for the other nodes of the tree. For all other leaves, we do not make this assumption, so that v[1] + · · · + v[q] can take any value in [d]. These leaves are "artificial" and can be thought of as resulting from cutting a larger tree after generation t, so that the vector of the v[r]'s keeps the information on the number of children with mark r in the original tree.
Definition 9. We denote by T^t the set of labeled trees T with t generations as above that satisfy the following conditions:

1. If v_1 = ∘, v_2, …, v_k is a path starting from the root (i.e. with π(v_{i+1}) = v_i for i ≥ 1), then the corresponding sequence of types ℓ(v_i) is non-backtracking, i.e., for any 1 ≤ i ≤ k − 2, the three labels ℓ(v_i), ℓ(v_{i+1}) and ℓ(v_{i+2}) are distinct.

2. If u ∈ L(T) and |u| ≤ t − 1 (i.e. u is a "natural" leaf), then we have u[1] + · · · + u[q] = 0.

3. If u ∈ L(T) and |u| = t (i.e. u is an "artificial" leaf), then we have u[1] + · · · + u[q] ≤ d.

We also denote by T̄^t the set of trees that satisfy conditions 2 and 3, but not necessarily the non-backtracking condition 1. Hence T^t ⊆ T̄^t.

We also let U^t be the same set of trees in which marks have been removed (i.e. we identify any two trees that differ in the marks but not in the types). Analogously, Ū^t is the set of trees in which marks have been removed, but which need not satisfy the non-backtracking condition 1.
For a labeled tree T ∈ T̄^t and a set of coefficients c = (c^ℓ_{i_1,…,i_q}(r, t)), we define three weights:

    A(T) = ∏_{(u→v)∈E(T)} A_{ℓ(u)ℓ(v)},

    Γ(T, c, t) = ∏_{(u→v)∈E(T)} c^{ℓ(u)}_{u[1],…,u[q]}(r(u), t − |u|),

    x(T) = ∏_{v∈L(T)} ∏_{s=1}^q ( x^{0,N}_{ℓ(v)}(s) )^{v[s]}.
We define:

(a) T^t_{i→j}(r) ⊆ T^t, the family of trees such that: (i) the root has type i; (ii) the root has only one child, call it v; (iii) the type of v is ℓ(v) ∉ {i, j} and its mark is r(v) = r.

(b) T^t_i(r) ⊆ T^t, the family of trees such that: (i) the root has type i; (ii) the root has only one child, call it v; (iii) the type of v is ℓ(v) ≠ i and its mark is r(v) = r.

The sets of trees U^t_i(r) and U^t_{i→j}(r) are obtained from T^t_i(r) and T^t_{i→j}(r) by removing marks.
Lemma 1. Let (A(N), F_N, x^{0,N})_{N≥1} be a polynomial sequence of AMP instances. Denote by z^t_i the orbit defined by (3.6) while iterating (3.5) with matrix A. Then

    z^t_{i→j}(r) = ∑_{T∈T^t_{i→j}(r)} A(T) Γ(T, c, t) x(T),   (3.11)

    z^t_i(r) = ∑_{T∈T^t_i(r)} A(T) Γ(T, c, t) x(T).   (3.12)
Proof. We first prove (3.11) by induction on t. For t = 1 we have, by definition,

z^1_{i→j}(r) = ∑_{ℓ∈[N]\{j}} ∑_{i_1+···+i_q≀d} A_{ℓi} c^ℓ_{i_1,...,i_q}(r, 0) ∏_{s=1}^q (x^{0,N}_ℓ(s))^{i_s}.

This expression corresponds exactly to equation (3.11), since trees in T^1_{i→j}(r) have a root with type i and one child with label (ℓ, r, i_1,...,i_q) for some ℓ ∉ {i, j} and i_1 + ··· + i_q ≀ d.

To prove the induction step, we start with Eq. (3.5), which yields
z^{t+1}_{i→j}(r) = ∑_{ℓ∈[N]\{j}} A_{ℓi} ∑_{i_1+···+i_q≀d} c^ℓ_{i_1,...,i_q}(r, t) ∏_{s=1}^q (z^t_{ℓ→i}(s))^{i_s}.
Using the induction hypothesis, we get

∏_{s=1}^q (z^t_{ℓ→i}(s))^{i_s} = ∏_{s=1}^q ( ∑_{T∈T^t_{ℓ→i}(s)} A(T) Γ(T, c, t) x(T) )^{i_s} = ∑_{[T^t_{ℓ→i}(s)]_{i_1+···+i_q}} ∏_{s=1}^q ∏_{k=1}^{i_s} A(T^s_k) Γ(T^s_k, c, t) x(T^s_k),

where the last expression is a sum over all (i_1 + ··· + i_q)-tuples of trees, with the first i_1 trees in T^t_{ℓ→i}(1), the following i_2 in T^t_{ℓ→i}(2), and so on.
Hence we get

z^{t+1}_{i→j}(r) = ∑_{ℓ∈[N]\{j}} ∑_{i_1,...,i_q} ∑_{[T^t_{ℓ→i}(s)]_{i_1+···+i_q}} A_{ℓi} c^ℓ_{i_1,...,i_q}(r, t) ∏_{s=1}^q ∏_{k=1}^{i_s} A(T^s_k) Γ(T^s_k, c, t) x(T^s_k).   (3.13)
The claim now follows by observing that the set of trees in T^{t+1}_{i→j}(r) is in bijection with the set of pairs formed by a label (ℓ, r), with ℓ ∉ {i, j}, and an (i_1 + ··· + i_q)-tuple of trees with exactly i_s trees belonging to T^t_{ℓ→i}(s) for s ∈ [q]. Indeed, take a root with type i and one child, say v, with label (ℓ, r) for some ℓ ∉ {i, j}, together with an (i_1 + ··· + i_q)-tuple of trees with exactly i_s trees belonging to T^t_{ℓ→i}(s) for s ∈ [q]. Now take v as the root of these (i_1 + ··· + i_q) trees, the order in the tuple giving the order of the subtrees of v. Note that the root of each subtree in T^t_{ℓ→i}(s) has type ℓ and, in the resulting tree, gets mark r. The proof of (3.12) follows by the same argument; the only change is that in the sum in (3.13) we now need to include ℓ = j.
3.3 Proof of Proposition 6
We are now in a position to prove Proposition 6.

Proof. For notational simplicity, we consider the case m(r) = m and m(s) = 0 for all s ∈ [q] \ {r}. Thanks to Lemma 1, we have
E[(z^t_i(r))^m] = ∑_{T_1,...,T_m∈T^t_i(r)} [ ∏_{ℓ=1}^m Γ(T_ℓ, c, t) ] E[ ∏_{ℓ=1}^m x(T_ℓ) ] E[ ∏_{ℓ=1}^m A(T_ℓ) ].   (3.14)
Since c is fixed in this section, we omit it and write Γ(T, t). Notice that the general case m = (m(1),...,m(q)) ∈ N^q admits a very similar representation, whereby the sum over T_1,...,T_m ∈ T^t_i(r) is replaced by sums over T_1,...,T_{m(1)} ∈ T^t_i(1), T_1,...,T_{m(2)} ∈ T^t_i(2), ..., T_1,...,T_{m(q)} ∈ T^t_i(q). The argument goes through essentially unchanged. We have Γ(T_ℓ, t) ≀ C^{d^{t+1}}.

We first concentrate on the term E[∏_{ℓ=1}^m A(T_ℓ)]. Recall the subgaussian property of the entries of A: E(e^{λ A_{ij}}) ≀ e^{Cλ^2/(2N)}. Using Lemma 12 from Appendix D, we get, for all i < j ∈ [N],

E[|A_{ij}|^s] ≀ 2 (s/e)^s λ^{−s} e^{Cλ^2/(2N)} ≀ 2 C^{s/2} (s/e)^{s/2} N^{−s/2},   (3.15)

obtained by taking λ = √(Ns/C).
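As a quick numerical sanity check of (3.15) (an illustration, not part of the proof), note that for Gaussian entries A_ij ∌ N(0, 1/N) the subgaussian condition holds with C = 1, and E|A_ij|^s = (s−1)!! N^{−s/2} for even s, so the two sides can be compared exactly once the common N^{−s/2} factor is cancelled:

```python
import math

def gaussian_abs_moment_even(s: int) -> float:
    """E|Z|^s for Z ~ N(0, 1) and even s: the double factorial (s-1)!!."""
    assert s % 2 == 0 and s >= 2
    result = 1
    for k in range(s - 1, 0, -2):
        result *= k
    return float(result)

def bound_315(s: int, C: float = 1.0) -> float:
    """Right-hand side of Eq. (3.15), with the common N^{-s/2} factor removed."""
    return 2.0 * C ** (s / 2) * (s / math.e) ** (s / 2)

# Compare both sides for small even moments.
for s in range(2, 14, 2):
    assert gaussian_abs_moment_even(s) <= bound_315(s)
```

For instance, s = 4 gives 3 ≀ 2(4/e)² ≈ 4.33.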
For a labeled tree T, we define ρ(T) = {ρ(T)_{ij} ∈ N : i ≀ j ∈ [N]}, where ρ(T)_{ij} is the number of occurrences in T of an edge (u→v) with endpoints having types ℓ(u), ℓ(v) ∈ {i, j}. Hence we have

A(T) = ∏_{i<j∈[N]} A_{ij}^{ρ(T)_{ij}}   and   E[ ∏_{ℓ=1}^m A(T_ℓ) ] = ∏_{i<j∈[N]} E[ A_{ij}^{∑_{ℓ=1}^m ρ(T_ℓ)_{ij}} ].   (3.16)
Since each entry of the matrix A has zero mean, in Equation (3.14) we can restrict the sum to T_1,...,T_m such that, for all i < j ∈ [N], ∑_{ℓ=1}^m ρ(T_ℓ)_{ij} < 2 implies ∑_{ℓ=1}^m ρ(T_ℓ)_{ij} = 0.

We now concentrate on the sum restricted to those T_1,...,T_m for which, in addition, there exists i < j ∈ [N] with ∑_{ℓ=1}^m ρ(T_ℓ)_{ij} ≄ 3. For such an m-tuple T_1,...,T_m, we write ” = ”(T_1,...,T_m) = ∑_{i<j} ∑_{ℓ=1}^m ρ(T_ℓ)_{ij}. Let G be the graph obtained by taking the union of the T_ℓ's and identifying vertices v with the same type ℓ(v). We define e(T_1,...,T_m) = ∑_{i<j} 1(∑_{ℓ=1}^m ρ(T_ℓ)_{ij} ≄ 1), the number of edges of G counted without multiplicity. Since there exists i < j with ∑_{ℓ=1}^m ρ(T_ℓ)_{ij} ≄ 3, we have 3 + 2(e(T_1,...,T_m) − 1) ≀ ”, i.e. e(T_1,...,T_m) ≀ (”−1)/2. Using Eq. (3.15), we get
| E[ ∏_{ℓ=1}^m A(T_ℓ) ] | ≀ ∏_{i<j∈[N]} E[ |A_{ij}|^{∑_{ℓ=1}^m ρ(T_ℓ)_{ij}} ] ≀ ( 2 C^{”/2} (”/e)^{”/2} )^{(”−1)/2} N^{−”/2},   (3.17)

since in the product on the right-hand side of (3.16) there are e(T_1,...,T_m) terms different from one, i.e. at most (”−1)/2 contributing terms.
We now compute an upper bound on

∑^{(”)}_{T_1,...,T_m} E[ | ∏_{ℓ=1}^m x(T_ℓ) | ],

where the sum ∑^{(”)} ranges over m-tuples of trees in T^t_i(r) such that ∑_{i<j} ∑_{ℓ=1}^m ρ(T_ℓ)_{ij} = ”. First note that, for any x ∈ R^q and any p ≄ 2, we have

‖x‖_p^p ≀ ‖x‖_2^p ≀ max( exp(‖x‖_2^2), p^p ).

Hence the condition (1/N) ∑_{i=1}^N exp(‖x^{0,N}_i‖_2^2/C) ≀ C ensures that, for any p ≄ 2,

(1/N) ∑_{i=1}^N ‖x^{0,N}_i‖_p^p ≀ C_p.
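The two elementary norm inequalities above are easy to spot-check numerically (illustration only; the first uses ‖x‖_p ≀ ‖x‖_2 for p ≄ 2, and the second the fact that t^p ≀ max(e^{t²}, p^p) for t ≄ 0):

```python
import math
import random

random.seed(0)
for _ in range(200):
    q = random.randint(1, 7)
    x = [random.gauss(0.0, 3.0) for _ in range(q)]   # arbitrary test vector in R^q
    n2 = math.sqrt(sum(v * v for v in x))
    for p in range(2, 9):
        norm_p_p = sum(abs(v) ** p for v in x)        # ||x||_p^p
        assert norm_p_p <= n2 ** p + 1e-9
        # exp argument capped at 700 to avoid overflow; the capped value
        # still dominates n2**p for the magnitudes generated here
        assert n2 ** p <= max(math.exp(min(n2 ** 2, 700.0)), float(p) ** p) + 1e-9
```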
Therefore,

∑^{(”)}_{T_1,...,T_m} ∏_{ℓ=1}^m |x(T_ℓ)| ≀ [ q^m ∑_{j=1}^N ∑_{s=1}^q ( 1 + |x^{0,N}_j(s)| + ··· + |x^{0,N}_j(s)|^{md} ) ]^{(”−1)/2}   (3.18)

= (q^m N)^{(”−1)/2} [ q + ∑_{k=1}^{md} (1/N) ∑_{j=1}^N ‖x^{0,N}_j‖_k^k ]^{(”−1)/2} ≀ ( q^m ( q + ∑_{k=1}^{md} C_k ) )^{(”−1)/2} N^{(”−1)/2},

where the last inequality is valid for N ≄ C. To see why (3.18) is true, note that the graph G is connected, since all trees T_1,...,T_m have the same type i at the root. Therefore the number of vertices in G is at most e(T_1,...,T_m) + 1 ≀ (”−1)/2 + 1. Since all the T_ℓ's have the same root, which has type i, G has at most (”−1)/2 distinct vertices other than the one associated to the root. In particular, all trees T_1,...,T_m together have at most (”−1)/2 distinct types among their leaves. The factor q^m comes from the fact that for each type j there are at most q^m choices for its m marks r corresponding to the m trees. Finally, each leaf with type j contributes a factor ∏_{s=1}^q (x^{0,N}_j(s))^{n_s} with
∑_s n_s ≀ md.

It is now easy to conclude, since we can decompose the sum in (3.14) into two terms: the first, S_1(A), consists of the contribution of the m-tuples T_1,...,T_m such that ∑_{ℓ=1}^m ρ(T_ℓ)_{ij} ∈ {0, 2} for all i, j, while the second, S_2(A), consists of the remaining contribution. We have S_1(A) = S_1(Ã), since S_1 depends on the entries of the matrix only through their first two moments, which are the same for A and Ã. Using (3.17) and (3.18), we get:
|S_2(A)| ≀ ∑_{”≀md^{t+1}} C^{d^{t+1} + (”−1)/2} C′ N^{(”−1)/2} N^{−”/2} = O(N^{−1/2}),   (3.19)

which concludes the proof of Proposition 6. Here we used the fact that the quantities ”, q, and {C_k}_{k=0}^{md} are all bounded independently of N.
We end this section by showing that the term S_1(A) can be further reduced. This result will be useful in the sequel, and we state it as the following lemma.

Lemma 2. Recall that we denoted by S_1(A) the term in the sum (3.14) consisting of the contribution of the m-tuples T_1,...,T_m such that ∑_{ℓ=1}^m ρ(T_ℓ)_{ij} ∈ {0, 2} for all i, j. We further decompose S_1(A) = T(A) + R(A) into two terms, where the first term T(A) corresponds to the sum over trees T_1,...,T_m such that the resulting graph G, obtained by taking the union of the T_ℓ's and identifying vertices v with the same type ℓ(v), is a tree (each edge having multiplicity two). Then there exists K (independent of N) such that:

| E[(z^t_i(r))^m] − T(A) | ≀ K N^{−1/2},   | E[(z^t_i(r))^m] | ≀ K,   | E[(z^t_{i→j}(r))^m] | ≀ K.
Proof. We have by definition E[(z^t_i(r))^m] = T(A) + R(A) + S_2(A), so that, thanks to (3.19), we only need to show that R(A) = O(N^{−1/2}).

For any m-tuple T_1,...,T_m such that ∑_{ℓ=1}^m ρ(T_ℓ)_{ij} ∈ {0, 2} for all i, j, we have, with the same notation as above, e(T_1,...,T_m) = ”/2. The number of vertices in G is at most 1 + e(T_1,...,T_m), with equality if and only if G is a tree (remember that G is always connected, as all the trees T_ℓ share the same root). Hence, when G is not a tree, it has at most ”/2 − 1 vertices that can serve as leaves of a tree among T_1,...,T_m. By the same argument as above, we get

|T(A)| ≀ ∑_{”≀md^{t+1}} K N^{”/2} N^{−”/2} = O(1),   (3.20)

|R(A)| ≀ ∑_{”≀md^{t+1}} K N^{”/2−1} N^{−”/2} = O(N^{−1}),   (3.21)

and the claim follows.
3.4 Proof of Proposition 7
The proof follows the same approach as for Proposition 6. For notational simplicity, we consider the case m(r) = m and m(s) = 0 for all s ∈ [q] \ {r}; the general case follows by the same argument. For y, a different matrix is used at each iteration, and we need to define a new weight associated to trees T ∈ T^t as follows:

A(T, t) = ∏_{(u→v)∈E(T)} A^{t−|u|}_{ℓ(u)ℓ(v)}.   (3.22)

In the particular case where the sequence {A^t}_{t∈N} is constant (i.e., equal to A), this expression reduces to the weight A(T) defined previously. In analogy with Lemma 1 for x, we now have

y^t_{i→j}(r) = ∑_{T∈T^t_{i→j}(r)} A(T, t) Γ(T, c, t) x(T),   y^t_i(r) = ∑_{T∈T^t_i(r)} A(T, t) Γ(T, c, t) x(T),
so that we get

E[(y^t_i(r))^m] = ∑_{T_1,...,T_m∈T^t_i(r)} [ ∏_{ℓ=1}^m Γ(T_ℓ, c, t) ] E[ ∏_{ℓ=1}^m x(T_ℓ) ] E[ ∏_{ℓ=1}^m A(T_ℓ, t) ].   (3.23)

For a labeled tree T, we define ρ(T) = {ρ(T)^g_{ij} ≄ 0 : i ≀ j ∈ [N], g ≄ 1}, where ρ(T)^g_{ij} is the number of occurrences in T of an edge (u→v) with endpoints having types ℓ(u), ℓ(v) ∈ {i, j} and with generation |u| = g. In particular, we have ∑_g ρ(T)^g_{ij} = ρ(T)_{ij}, which was defined in the proof of Proposition 6. Hence, with ” = ∑_{i<j} ∑_{ℓ=1}^m ρ(T_ℓ)_{ij}, we have
| E[ ∏_{ℓ=1}^m A(T_ℓ, t) ] | (a)= ∏_{i<j∈[N]} ∏_g | E[ A_{ij}^{∑_{ℓ=1}^m ρ(T_ℓ)^g_{ij}} ] | ≀ ∏_{i<j∈[N]} ∏_g E[ |A_{ij}|^{∑_{ℓ=1}^m ρ(T_ℓ)^g_{ij}} ] (b)≀ ( 2 C^{”/2} (”/e)^{”/2} )^{(”−1)/2} N^{−”/2},   (3.24)

where (a) holds since {A^t}_{t∈N} is an i.i.d. sequence with the same distribution as A(N), and (b) follows by the same argument as in (3.17). The inequality (3.24) implies that the bounds (3.19) and (3.21) are still valid with the weight of a tree given by (3.22) (the term E[∏_{ℓ=1}^m x(T_ℓ)] can be treated as in the previous section).

As in the proof of Proposition 6, we define the graph G obtained by taking the union of the T_ℓ's and identifying vertices v with the same type ℓ(v). By Lemma 2, we only need to concentrate on the term T(A) corresponding to m-tuples T_1,...,T_m such that each edge in G has multiplicity 2 and G is a tree. Indeed, the proposition will follow once we prove
T̃(A) = T(A),   (3.25)

where T(A) was defined in Lemma 2 and T̃(A) is the corresponding term with the weight of a tree given by (3.22). First note that, for any T_1,...,T_m such that E[∏_{ℓ=1}^m A(T_ℓ, t)] ≠ 0, we have

E[ ∏_{ℓ=1}^m A(T_ℓ, t) ] = E[ ∏_{ℓ=1}^m A(T_ℓ) ].

Now suppose that E[∏_{ℓ=1}^m A(T_ℓ)] ≠ 0 = E[∏_{ℓ=1}^m A(T_ℓ, t)]. This can only happen if an edge of G connecting types, say, i and j has multiplicity 2 but appears at different generations in the original trees T_ℓ. Suppose first that this edge appears twice in, say, T_1, on the same branch and at different generations, i.e. there exist (a→b), (c→d) ∈ E(T_1) with {ℓ(a), ℓ(b)} = {ℓ(c), ℓ(d)} = {i, j}, |a| < |c|, and the edge (a→b) on the path connecting c, d to the root. Thanks to the non-backtracking property, these two edges cannot be adjacent, i.e. a ≠ d. But then these edges create a cycle in G, a contradiction. Suppose now that the edge appears in T_1 and T_2 at different generations, i.e. there exist (a→b) ∈ E(T_1) and (c→d) ∈ E(T_2) with {ℓ(a), ℓ(b)} = {ℓ(c), ℓ(d)} = {i, j} and |a| < |c|. Then the same reasoning shows that they create a cycle in G, since b and d are connected to the roots of T_1 and T_2 respectively, and these roots are identified to a single vertex in G. The same argument applies when both edges belong to the same tree T_1 but lie on different branches. Hence we again obtain a contradiction.
3.5 Proof of Proposition 8
Proof. As in the proof of Proposition 6, we rely on a representation of x^t_i(r) based on labeled trees defined as in Section 3.2. In the present case it is however more convenient to work with trees from which marks have been removed, i.e. we identify any two trees in which the vertex marks are different but the types are the same. Notice that Eqs. (3.11), (3.12) imply

z^t_{i→j}(r) = ∑_{T∈U^t_{i→j}(r)} A(T) Γ′(T, c, t) x(T),   (3.26)

z^t_i(r) = ∑_{T∈U^t_i(r)} A(T) Γ′(T, c, t) x(T),   (3.27)

where Γ′(T, c, t) is obtained by summing Γ(·, c, t) over all marked trees that coincide with T once marks are removed. In the following, with a slight abuse of notation, we write Γ(T, c, t) instead of Γ′(T, c, t).

In a directed labeled graph, we define a backtracking path of length 3 as a path a→b→c→d such that ℓ(a) = ℓ(c) and ℓ(b) = ℓ(d). We define a backtracking star as a set of vertices a→b→c and a′ (≠ a) → b such that ℓ(a) = ℓ(a′) = ℓ(c). We define B^t as the set of rooted labeled trees T in Ū^t that satisfy the following condition:

‱ if u→v ∈ E(T), then ℓ(u) ≠ ℓ(v), and there exists in T at least one backtracking path of length 3 or one backtracking star.

Then we define B^t_i as the subset of trees in B^t whose root has type i and has only one child, with type ℓ ≠ i.

Lemma 3. Under the same assumptions as in Proposition 8, we have

x^t_i(r) = z^t_i(r) + ∑_{T∈B^t_i} A(T) Γ(T, t, r) x(T),

for some Γ(T, t, r) which is bounded uniformly as |Γ(T, t, r)| ≀ K(d, C, t).
Proof. Following the same argument as in Lemma 1, it is easy to prove by induction on t that we can find Γ(T, t, r) such that

x^t_i(r) = ∑_{T∈Ū^t_i} A(T) Γ(T, t, r) x(T),   (3.28)

with |Γ(T, t, r)| ≀ K(d, C, t). The terms A_{iℓ} f^ℓ_r(x^t_ℓ, t) can be handled exactly as in Lemma 1. Each term A^2_{iℓ} f^i_s(x^{t−1}_i, t−1) (∂f^ℓ_r/∂x(s))(x^t_ℓ, t) can be interpreted as a sum over the following trees in Ū: the root has type i and one child with type ℓ. This child has at most d−1 subtrees in Ū^t, coming from the term (∂f^ℓ_r/∂x(s))(x^t_ℓ, t) (which is a polynomial of degree at most d−1), and one child, say u, with type i. This child u is the root of at most d subtrees in Ū^{t−1}, coming from the term f^i_s(x^{t−1}_i, t−1). We see that the resulting tree is in Ū^{t+1}. Now, to see that |Γ(T, t, r)| ≀ K(d, C, t), note that each polynomial f^ℓ_r(·, t) (resp. (∂f^ℓ_r/∂x(s))(·, t)) has coefficients bounded by C (resp. dC), so that, taking into account the contribution of each term in the decomposition (3.28), we easily get

|Γ(T, t+1, r)| ≀ dC^2 [ K(d, C, t)^d + K(d, C, t)^{d−1} K(d, C, t−1)^d ].
It remains to prove that Γ(T, t, r) agrees with the expression in Lemma 1 (cf. Eqs. (3.26), (3.27)) for T ∈ U^t_i(r), and vanishes for every other tree not in B^t_i. The proof of this fact proceeds by induction on t. The cases t = 0, 1 are clear, since B^t_i = ∅. For t ≄ 1, we define

z^t_{ℓ,i}(r) = A_{iℓ} f^i_r(z^{t−1}_{i→ℓ}, t−1),   e^t_ℓ(r) = ∑_{T∈B^t_ℓ} A(T) Γ(T, t, r) x(T),   d^t_{ℓ,i}(r) = z^t_{ℓ,i}(r) + e^t_ℓ(r),

so that, by the induction hypothesis, x^t_ℓ = z^t_{ℓ→i} + z^t_{ℓ,i} + e^t_ℓ = z^t_{ℓ→i} + d^t_{ℓ,i}.
Since f^ℓ_r(·, t) is a polynomial, we have

f^ℓ_r(x^t_ℓ, t) = f^ℓ_r(z^t_{ℓ→i}, t) + ∑_s ( z^t_{ℓ,i}(s) + e^t_ℓ(s) ) (∂f^ℓ_r/∂x(s))(z^t_{ℓ→i}, t) + ∑_{n_1+···+n_q≄2} ∏_{s=1}^q [ (d^t_{ℓ,i}(s))^{n_s} / n_s! ] (∂^{n_1+···+n_q} f^ℓ_r / (∂x(1)^{n_1} ··· ∂x(q)^{n_q}))(z^t_{ℓ→i}, t),

where the last sum contains a finite number of non-zero terms. Multiplying by A_{iℓ} and summing over ℓ ∈ [N], the first term on the right-hand side gives exactly z^{t+1}_i(r). The second term gives:
∑_ℓ A^2_{ℓi} ∑_s f^i_s(z^{t−1}_{i→ℓ}, t−1) (∂f^ℓ_r/∂x(s))(z^t_{ℓ→i}, t) + ∑_ℓ A_{ℓi} ∑_s e^t_ℓ(s) (∂f^ℓ_r/∂x(s))(z^t_{ℓ→i}, t).

From now on, to lighten the notation, we omit the second argument of the functions f^ℓ_r. Hence we have
x^{t+1}_i(r) = z^{t+1}_i(r) − ∑_ℓ A^2_{ℓi} ∑_s ( f^i_s(x^{t−1}_i) (∂f^ℓ_r/∂x(s))(x^t_ℓ) − f^i_s(z^{t−1}_{i→ℓ}) (∂f^ℓ_r/∂x(s))(z^t_{ℓ→i}) ) + ∑_ℓ A_{ℓi} ∑_s e^t_ℓ(s) (∂f^ℓ_r/∂x(s))(z^t_{ℓ→i})   (3.29)

+ ∑_ℓ A_{ℓi} ∑_{n_1+···+n_q≄2} ∏_{s=1}^q [ (d^t_{ℓ,i}(s))^{n_s} / n_s! ] (∂^{n_1+···+n_q} f^ℓ_r / (∂x(1)^{n_1} ··· ∂x(q)^{n_q}))(z^t_{ℓ→i}).

We now show that each contribution on the right-hand side (except z^{t+1}_i(r)) can be written as a sum of terms A(T) Γ(T, t+1, r) x(T) over trees T ∈ B^{t+1}_i that we construct explicitly.
First consider the terms of the form A_{ℓi} e^t_ℓ(s) (∂f^ℓ_r/∂x(s))(z^t_{ℓ→i}). By definition, e^t_ℓ(s) can be written as a sum over trees in B^t_ℓ and, by Lemma 1, the r-th component of z^t_{ℓ→i} can be written as a sum over trees in U^t_{ℓ→i}(r). Hence, by the same argument as in the proof of Lemma 1, we see that A_{ℓi} e^t_ℓ(s) (∂f^ℓ_r/∂x(s))(z^t_{ℓ→i}) can be written as a sum over trees whose root has type i and one child, say v, with type ℓ. This vertex v is the root of a tree in B^t_ℓ (corresponding to the factor e^t_ℓ(s)) and of a set of trees in U^t_{ℓ→i}(1),...,U^t_{ℓ→i}(q) (corresponding to the factor (∂f^ℓ_r/∂x(s))(z^t_{ℓ→i})). This tree clearly belongs to B^{t+1}_i.

We now treat the terms in the first line. Again, we have

f^i_s(x^{t−1}_i) (∂f^ℓ_r/∂x(s))(x^t_ℓ) = f^i_s(z^{t−1}_{i→ℓ}) (∂f^ℓ_r/∂x(s))(z^t_{ℓ→i}) + g(d^{t−1}_{i,ℓ}, d^t_{ℓ,i}, z^{t−1}_{i→ℓ}, z^t_{ℓ→i}),
where g is a polynomial each of whose monomials contains a positive power of a component of d^{t−1}_{i,ℓ} or of d^t_{ℓ,i}. Hence we only need to construct trees in B^{t+1}_i(r) corresponding to terms of the following form, with ∑_s (a_s + b_s) ≄ 1:

A^2_{ℓi} ∏_s (d^{t−1}_{i,ℓ}(s))^{a_s} (d^t_{ℓ,i}(s))^{b_s} (z^{t−1}_{i→ℓ}(s))^{c_s} (z^t_{ℓ→i}(s))^{d_s}.

Let us first consider the term A^2_{ℓi} ∏_s (z^{t−1}_{i→ℓ}(s))^{c_s} (z^t_{ℓ→i}(s))^{d_s}. It can be interpreted as a sum over the following family of trees: the root has type i and one child with type ℓ. This child has d_s subtrees in U^t_{ℓ→i}(s) and one child, denoted u, with type i. This child u has c_s subtrees in U^{t−1}_{i→ℓ}(s). Note that the only backtracking in such a tree is the path from u to the root, with types i, ℓ, i. In particular, such a tree does not belong to B^t_i(r).

Assume now that there exists s with a_s ≄ 1. We need to interpret the multiplication by d^{t−1}_{i,ℓ}(s) = z^{t−1}_{i,ℓ}(s) + e^{t−1}_i(s). First consider the case of e^{t−1}_i(s): this corresponds to adding a subtree in B^{t−1}_i to the vertex u. As in the previous analysis, we clearly obtain a tree in B^{t+1}_i. The term z^{t−1}_{i,ℓ}(s) corresponds to adding to the vertex u a child of type ℓ which is the root of a subtree in U^{t−2}_{ℓ→i}(s); in particular, we introduce a backtracking path of length 3, so that again the resulting tree is in B^{t+1}_i. Similarly, if b_s ≄ 1, the multiplication by d^t_{ℓ,i}(s) corresponds to adding a subtree to the child of the root, resulting in either a backtracking path of length 3 or a backtracking star.

The last term, of the form

A_{ℓi} ∏_{s=1}^q [ (d^t_{ℓ,i}(s))^{n_s} / n_s! ] (∂^{n_1+···+n_q} f^ℓ_r / (∂x(1)^{n_1} ··· ∂x(q)^{n_q}))(z^t_{ℓ→i})

with n_1 + ··· + n_q ≄ 2, can be analyzed by the same kind of argument, noticing that a factor A_{iℓ} z^t_{ℓ,i}(s) z^t_{ℓ,i}(s′) corresponds to a backtracking star.
The proof of Proposition 8 now follows from the same arguments as in the proof of Proposition 6. Once more, for simplicity, we only consider the case m(r) = m and m(s) = 0 for s ≠ r, the general case m = (m(1),...,m(q)) ∈ N^q being completely analogous. We represent both moments E[(x^t_i(r))^m] and E[(z^t_i(r))^m] using Lemma 1 (in the form given in Eqs. (3.26), (3.27)) and Lemma 3. The expectation E[(x^t_i(r))^m] is represented as a sum over trees T_1,...,T_m ∈ U^t_i(r) ∪ B^t_i(r), while E[(z^t_i(r))^m] is given by a sum over trees T_1,...,T_m ∈ U^t_i(r). In order to complete the proof, we need to show that the contribution of the terms that have at least one tree in B^t_i(r) vanishes as N → ∞.
The factor ∏_{ℓ=1}^m Γ(T_ℓ, t, r) is bounded by K(d, C, t)^m, which is independent of N. Hence we only need to prove that

∑_{T_1∈B^t_i(r)} ∑_{T_j∈U^t_i(r_j)∪B^t_i(r_j), j∈[2,m]} E[ ∏_{ℓ=1}^m A(T_ℓ) x(T_ℓ) ] = O(N^{−1/2}).   (3.30)

This statement follows directly from the previous analysis: in the graph G obtained by taking the union of the T_ℓ's and identifying vertices v with the same type ℓ(v), there is at least one edge with multiplicity at least 3, due to the backtracking path of length 3 or the backtracking star in T_1. The previous analysis then shows that the term in (3.30) is of order O(N^{−1/2}).
3.6 Proof of Theorem 3
Let {p_{N,i}}_{N≄0, 1≀i≀N} be a collection of multivariate polynomials p_{N,i}: R^q → R, with degrees bounded by D and coefficients bounded in magnitude by B:

p_{N,i}(x) = ∑_{m(1)+···+m(q)≀D} c^{N,i}_{m(1),...,m(q)} x(1)^{m(1)} ··· x(q)^{m(q)}.   (3.31)

By Propositions 6 and 8, we have

| E p_{N,i}(x^t_i) − E p_{N,i}(x̃^t_i) | ≀ ∑_{m(1)+···+m(q)≀D} |c^{N,i}_{m(1),...,m(q)}| | E[(x^t_i)^m] − E[(x̃^t_i)^m] | ≀ K D^q B N^{−1/2},   (3.32)

whence the thesis follows.
3.7 Proof of Theorem 5
An important simplification is provided by the following.
Remark 2. It is sufficient to prove Theorem 5 for t = s. (Hence, Theorem 4 implies Theorem 5.)

Proof. Indeed, consider a converging sequence {(A(N), F_N, x^{0,N})}_{N≄1} and fix h = t − s > 0. For the sake of simplicity, and in view of Remark 1, we can assume F_N to be given by a polynomial function g: R^q × R^q × [k] × N → R^q, (x, Y, a, t) ↩ g(x, Y, a, t), that does not depend on the random variable Y. With an abuse of notation, we write g(x, a, t) in place of g(x, Y, a, t).
We will construct a new converging sequence of instances {(A(N), F̃_N, x̃^{0,N})}_{N≄1} with variables x̃^t_i ∈ R^{2q} such that, writing x̃^t_i = (u^t_i, v^t_i) with u^t_i, v^t_i ∈ R^q, the pair (u^t_i, v^t_i) is distributed as (x^t_i, x^{t−h}_i) asymptotically as N → ∞. The new sequence of instances is constructed as follows:

1. The initial condition is given by x̃^0_i = (0, 0).

2. The independent randomness is given by Ỹ(i) = x^0_i. Notice that, for i ∈ C^N_a, we have Ỹ(i) ∌_{i.i.d.} Q_a, and hence we let P̃_a = Q_a.

3. The partitions C^N_a, a ∈ [k], and the matrices A(N) are kept unchanged.

4. The collection of functions in F̃_N is determined by the polynomial function g̃: R^{2q} × R^q × [k] × N → R^{2q}, (x, Y, a, t) ↩ g̃(x, Y, a, t). Writing g̃(·) = [g̃^{(1)}(·), g̃^{(2)}(·)], with g̃^{(1)}(·), g̃^{(2)}(·) ∈ R^q, we let, for u, v ∈ R^q,

g̃^{(1)}((u, v), Y, a, t) = { g(Y, a, 0) if t = 0;  g(u, a, t) if t > 0 },   (3.33)

g̃^{(2)}((u, v), Y, a, t) = { g(Y, a, 0) if t ≀ h;  g(v, a, t−h) if t > h }.   (3.34)

As a consequence of this construction, u^t_i = x^t_i for all i ∈ [N], t ≄ 1, and v^t_i = x^{t−h}_i for all t ≄ h+1. This completes the reduction.
As a consequence of this remark, it is sufficient to prove Theorem 4, and by Remark 1 we can limit ourselves to the case in which g: (x, Y, a, t) ↩ g(x, Y, a, t) does not depend on Y; this argument will hence be dropped. We begin by considering the expectation of moments of x^t_i.

Proposition 10. Let (A(N), F_N, x^0)_{N≄0} be a polynomial and converging sequence of AMP instances, and denote by {x^t}_{t≄0} the corresponding AMP orbit. Then, for any i = i(N) ∈ C^N_a, t ≄ 1, and m = (m(1),...,m(q)) ∈ N^q, we have

lim_{N→∞} E[(x^t_i)^m] = E[(Z^t_a)^m],

where Z^t_a ∌ N(0, ÎŁ^t_a).
Proof. By Propositions 7 and 8, we only need to prove the statement for the AMP orbit y^t. We will indeed prove by induction on t that, for any i ∈ C^N_a and any j ≠ i,

lim_{N→∞} E[(y^t_{i→j})^m] = E[(Z^t_a)^m],   (3.35)

lim_{N→∞} (1/|C^N_a|) ∑_{i∈C^N_a} (y^t_{i→j})^m = E[(Z^t_a)^m]   in probability.   (3.36)
For t ≄ 1, let F_t be the σ-algebra generated by A^0,...,A^{t−1}. We will show, using the central limit theorem, that the random vector (y^{t+1}_{i→j}(1),...,y^{t+1}_{i→j}(q)), given F_t, converges in distribution to a centered Gaussian random vector. More precisely, by (3.8) and the induction hypothesis, the following limit holds in probability:

lim_{N→∞} E[ y^{t+1}_{i→j}(r) y^{t+1}_{i→j}(s) | F_t ] = lim_{N→∞} ∑_{b∈[k]} ∑_{ℓ∈C^N_b\{j}} E[(A^t_{ℓi})^2] g_r(y^t_{ℓ→i}, b, t) g_s(y^t_{ℓ→i}, b, t) = ∑_{b=1}^k c_b W_{ab} E[ g_r(Z^t_b, b, t) g_s(Z^t_b, b, t) ] = ÎŁ^{t+1}_a(r, s).

Since E[y^{t+1}_{i→j}(r)] = 0 for all r ∈ [q] by (3.8), it follows from the central limit theorem that y^{t+1}_{i→j} converges to a centered Gaussian vector with covariance ÎŁ^{t+1}_a. Since all the moments of y^{t+1}_{i→j} are bounded uniformly in N by Proposition 7 and Lemma 2, the induction claim, Eq. (3.35), follows for iteration t+1.
In the base case t = 0 the same conclusion holds, because

lim_{N→∞} E[ y^1_{i→j}(r) y^1_{i→j}(s) ] = lim_{N→∞} ∑_{b∈[k]} ∑_{ℓ∈C^N_b\{j}} E[(A^0_{ℓi})^2] g_r(y^0_{ℓ→i}, b, 0) g_s(y^0_{ℓ→i}, b, 0) = ∑_{b=1}^k c_b W_{ab} ÎŁ^0_b(r, s),

where the second identity holds by assumption.
Next consider the induction claim, Eq. (3.36). Recall the representation introduced in Section 3.4:

y^t_{i→j}(r) = ∑_{T∈T^t_{i→j}(r)} A(T, t) Γ(T, c, t) x(T),   A(T, t) = ∏_{(u→v)∈E(T)} A^{t−|u|}_{ℓ(u)ℓ(v)}.
Using this representation of y^t_{i→j} and y^t_{k→j}, it is easy to show that, for i ≠ k, i, k ∈ C^N_a,

| E[ (y^t_{i→j})^m (y^t_{k→j})^m ] − E[(y^t_{i→j})^m] E[(y^t_{k→j})^m] | ≀ Δ(N),   (3.37)

for some function Δ(N) → 0 as N → ∞ at m, C, d, t fixed. Indeed, the above expectations can be represented as sums over m = m(1) + m(2) + ··· + m(q) trees T_1,...,T_m ∈ T^t_{i→j} and m trees T′_1,...,T′_m ∈ T^t_{k→j}. Let G be the simple graph obtained by identifying vertices of the same type in T_1,...,T_m, T′_1,...,T′_m.
By Lemma 2 and the argument in the proof of Proposition 6, all the terms in which G has cycles, or in which an edge of G corresponds to more than 2 edges in the union of T_1,...,T_m, T′_1,...,T′_m, add up to a vanishing contribution in the N → ∞ limit. Further, all the terms in which G is the union of two disconnected components (one containing i, the other containing k) are identical in E[(y^t_{i→j})^m (y^t_{k→j})^m] and in E[(y^t_{i→j})^m] E[(y^t_{k→j})^m], and hence cancel out. We are therefore left with the sum over trees T_1,...,T_m, T′_1,...,T′_m such that G is itself a connected tree, with edges covered exactly twice. Assume, to be definite, that G has ” vertices and hence ” − 1 edges. The weight of such a term is bounded by

K E{ ∏_{i=1}^m A(T_i, t) ∏_{i=1}^m A(T′_i, t) } ≀ K N^{−”+1}.

On the other hand, the number of such terms is bounded by K N^{”−2} (because a type has to be assigned to each of the ” vertices, but two of these are fixed to i and k), and hence the overall contribution of these terms vanishes as well.
From Eq. (3.37), and using the fact that E[(y^t_{i→j})^{2m}] ≀ K (because of Lemma 2 and Proposition 7), we have

lim_{N→∞} Var{ (1/|C^N_a|) ∑_{i∈C^N_a} (y^t_{i→j})^m } ≀ lim_{N→∞} (1/|C^N_a|^2) ∑_{i,k∈C^N_a} | E[ (y^t_{i→j})^m (y^t_{k→j})^m ] − E[(y^t_{i→j})^m] E[(y^t_{k→j})^m] | = 0.

Equation (3.36) follows for iteration t+1 by applying Chebyshev's inequality to the sequence

{ (1/|C^N_a|) ∑_{i∈C^N_a} (y^t_{i→j})^m }_{N≄0},

and using (3.35).
We are now ready to prove Theorem 5 in the case in which ψ: R^q → R is a polynomial.

Proposition 11. Let (A(N), F_N, x^0)_{N≄0} be a polynomial and converging sequence of AMP instances, and denote by {x^t}_{t≄0} the corresponding AMP orbit. Then, for any t ≄ 1 and m = (m(1),...,m(q)) ∈ N^q, we have

lim_{N→∞} Var{ (1/|C^N_a|) ∑_{i∈C^N_a} (x^t_i)^m } = 0.   (3.38)
Proof. In order to prove (3.38), we fix t ≄ 1 and a ∈ [k], and construct a modified sequence of AMP instances as follows. The new sequence has N′ = 2N and k′ = k+1. The new partition of the variable indices {1,...,2N} is the same as in the original instances, with the addition of C^N_{k+1} = {N+1,...,2N = N′}. Further, for ψ: R^q → R a polynomial, we set:

1. For i, j ≀ N: A′_{ij} = A_{ij}; when i > N or j > N, we define A′_{ij} ∌ N(0, 1/N) independently.

2. g′(x, b, t′) = g(x, b, t′) for b ∈ [k], t′ ≀ t−1; g′(x, b, t) = 0 for b ∈ [k] \ {a}; g′_1(x, a, t) = ψ(x) and g′_r(x, a, t) = 0 for r ≄ 2; g′(x, k+1, t′) = 0 for all t′.

The definition of g′(x, a, t′) for t′ > t is irrelevant for our purposes.

Since g′(x, k+1, t′) = 0 for all t′, the orbit (x^{t′}_i : i ≀ N, t′ ≀ t) is not affected by the new variables. Further, by the general AMP equation (1.6), we have, for i ∈ C^N_{k+1},

x^{t+1}_i(1) = ∑_{j∈C^N_a} A_{ij} ψ(x^t_j).   (3.39)
Notice that the {A_{ij}}_{j∈C^N_a} in this equation are independent of the x^t_j. Hence

E{ x^{t+1}_i(1)^4 } = ∑_{j_1,...,j_4∈C^N_a} E{ A_{ij_1} A_{ij_2} A_{ij_3} A_{ij_4} } E{ ψ(x^t_{j_1}) ψ(x^t_{j_2}) ψ(x^t_{j_3}) ψ(x^t_{j_4}) }   (3.40)

= (3/N^2) ∑_{j_1,j_2∈C^N_a} E{ ψ(x^t_{j_1})^2 ψ(x^t_{j_2})^2 }.   (3.41)
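The passage from (3.40) to (3.41) is Wick's formula for the Gaussian row {A_ij}: the expectation E{A_{ij1} A_{ij2} A_{ij3} A_{ij4}} vanishes unless the four indices pair up, and each of the three pairings contributes N^{−2}. A Monte Carlo illustration (not part of the proof; the fixed vector c stands in for the values ψ(x^t_j), which are independent of this row):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50
c = rng.normal(size=N)                      # plays the role of psi(x^t_j)

# Right-hand side of (3.41): (3/N^2) * sum_{j1,j2} c_{j1}^2 c_{j2}^2
rhs = 3.0 / N**2 * (c**2).sum() ** 2

# Monte Carlo estimate of E[(sum_j A_j c_j)^4], with A_j i.i.d. N(0, 1/N)
samples = rng.normal(scale=1.0 / np.sqrt(N), size=(100_000, N)) @ c
lhs = float(np.mean(samples**4))

assert abs(lhs - rhs) / rhs < 0.1           # agreement within Monte Carlo error
```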
On the other hand, using Proposition 10 (once for iteration t+1 and i ∈ C^N_{k+1}, and once for iteration t and i ∈ C^N_a), we get

lim_{N→∞} E{ x^{t+1}_i(1)^4 } = E{ (Z^{t+1}_{k+1}(1))^4 } = 3 (ÎŁ^{t+1}_{k+1}(1,1))^2 = 3 c_a^2 E{ ψ(Z^t_a)^2 }^2,   i ∈ C^N_{k+1},   (3.42)

lim_{N→∞} E{ ψ(x^t_i)^2 } = E{ ψ(Z^t_a)^2 },   i ∈ C^N_a,   (3.43)

where Z^t_a ∌ N(0, ÎŁ^t_a). Comparing these equations with Eq. (3.41), we conclude that

lim_{N→∞} (1/N^2) ∑_{j_1,j_2∈C^N_a} E{ ψ(x^t_{j_1})^2 ψ(x^t_{j_2})^2 } = ( lim_{N→∞} (1/N) ∑_{j∈C^N_a} E[ ψ(x^t_j)^2 ] )^2.   (3.44)
Equivalently,

lim_{N→∞} Var{ (1/|C^N_a|) ∑_{i∈C^N_a} ψ(x^t_i)^2 } = 0.   (3.45)

Taking ψ(x) = x^k, we obtain Eq. (3.38) for m even. In order to establish Eq. (3.38) for general m, we take, for instance, ψ(x) = 1 + Δ x^m and use the fact that the limit must vanish for all Δ.
At this point we can prove Theorem 5.

Proof of Theorem 5. By Remark 1 and Remark 2, we have reduced ourselves to the case t = s and Y(i) = 0 (equivalently, Y(i) is absent). Consider the empirical measure on R^q given by

”^N_a = (1/|C^N_a|) ∑_{i∈C^N_a} ÎŽ_{x^t_i}.

Proposition 10 shows the convergence of the expected moments of ”^N_a to moments that determine the Gaussian distribution. Proposition 11, combined with Chebyshev's inequality, implies

lim_{N→∞} ”^N_a(x^m) = E[(Z^t_a)^m]

in probability. The proof follows using the relation between convergence in probability and almost sure convergence along subsequences, together with the moment method.
4 Non-symmetric matrices
In this section we consider a slightly different setting, which turns out to be a special case of the one introduced in Section 1.3.
Definition 12. A converging sequence of (polynomial) bipartite AMP instances {(A(n), f, h, x^{0,n})}_{n≄1} is defined by giving, for each n:

1. A matrix A(n) ∈ R^{m×n} with m = m(n) such that lim_{n→∞} m(n)/n = ÎŽ > 0. Further, A(n) = (A_{ij})_{i≀m, j≀n} is a matrix whose entries A_{ij} are independent subgaussian random variables with common scale factor C/n and first two moments E{A_{ij}} = 0, E{A^2_{ij}} = 1/m.

2. Two functions f: R^q × R^q × N → R^q and h: R^q × R^q × N → R^q such that, for each t ≄ 0, f(·, ·, t) and h(·, ·, t) are polynomials.
3. An initial condition x^{0,n} = (x^0_1,...,x^0_n) ∈ V_{q,n} ≃ (R^q)^n, with x^0_i ∈ R^q, such that, in probability,

∑_{i=1}^n exp{ ‖x^{0,n}_i‖_2^2 / C } ≀ nC,   (4.1)

lim_{n→∞} (1/m(n)) ∑_{i=1}^n f(x^0_i, Y(i), 0) f(x^0_i, Y(i), 0)^T = Î^0.   (4.2)
4. Two collections of i.i.d. random variables (Y(i), i ∈ [n]) and (W(j), j ∈ [m]) with Y(i) ∌_{i.i.d.} Q and W(j) ∌_{i.i.d.} P. Here Q and P are finite mixtures of Gaussians on R^q.

Throughout this section, we will refer to the non-bipartite AMP instances of Definition 5 as symmetric instances. With these ingredients, we define the AMP orbit as follows.
Definition 13. The approximate message passing orbit corresponding to the bipartite instance (A, f, h, x^0) is the sequence of vectors {x^t, z^t}_{t≄0}, x^t ∈ V_{q,n}, z^t ∈ V_{q,m}, defined as follows, for t ≄ 0:

z^t = A f(x^t, Y; t) − B_t h(z^{t−1}, W; t−1),   (4.3)

x^{t+1} = A^T h(z^t, W; t) − D_t f(x^t, Y; t),   (4.4)

where f(···) and h(···) are applied componentwise (see below for an explicit formulation). Here B_t: V_{q,m} → V_{q,m} is the linear operator defined by letting, for v′ = B_t v and any j ∈ [m],

v′_j = [ ∑_{k∈[n]} A^2_{jk} (∂f/∂x)(x^t_k, Y(k); t) ] v_j.   (4.5)

Analogously, D_t: V_{q,n} → V_{q,n} is the linear operator defined by letting, for v′ = D_t v and any i ∈ [n],

v′_i = [ ∑_{l∈[m]} A^2_{li} (∂h/∂z)(z^t_l, W(l); t) ] v_i.   (4.6)
For the sake of clarity, it is useful to rewrite the iteration (4.3), (4.4) explicitly, by components:

z^t_i = ∑_{j∈[n]} A_{ij} f(x^t_j, Y(j); t) − [ ∑_{k∈[n]} A^2_{ik} (∂f/∂x)(x^t_k, Y(k); t) ] h(z^{t−1}_i, W(i); t−1)   for all i ∈ [m],

x^{t+1}_j = ∑_{i∈[m]} A_{ij} h(z^t_i, W(i); t) − [ ∑_{l∈[m]} A^2_{lj} (∂h/∂z)(z^t_l, W(l); t) ] f(x^t_j, Y(j); t)   for all j ∈ [n].
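To make the componentwise equations concrete, here is a minimal numerical sketch for q = 1 (scalar variables; the Y, W dependence is dropped, and the polynomial choices f(x) = x³, h(z) = z are placeholders, not nonlinearities used in the paper):

```python
import numpy as np

def bipartite_amp_step(A, x, z_prev, f, df, h, dh, t):
    """One iteration of Eqs. (4.3)-(4.4) for q = 1, with the Onsager
    operators B_t, D_t of (4.5)-(4.6) reduced to entrywise multipliers."""
    A2 = A ** 2
    # z^t_i = sum_j A_ij f(x^t_j) - [sum_k A_ik^2 f'(x^t_k)] h(z^{t-1}_i)
    b = A2 @ df(x, t)
    z = A @ f(x, t) - b * h(z_prev, t - 1)
    # x^{t+1}_j = sum_i A_ij h(z^t_i) - [sum_l A_lj^2 h'(z^t_l)] f(x^t_j)
    d = A2.T @ dh(z, t)
    x_next = A.T @ h(z, t) - d * f(x, t)
    return x_next, z

# Placeholder polynomial nonlinearities and their derivatives (illustration only).
f = lambda x, t: x ** 3
df = lambda x, t: 3 * x ** 2
h = lambda z, t: z
dh = lambda z, t: np.ones_like(z)

rng = np.random.default_rng(2)
n, m = 40, 20                               # delta = m / n = 0.5
A = rng.normal(scale=1.0 / np.sqrt(m), size=(m, n))
x, z = 0.3 * rng.normal(size=n), np.zeros(m)
for t in range(3):
    x, z = bipartite_amp_step(A, x, z, f, df, h, dh, t)
assert x.shape == (n,) and z.shape == (m,) and np.all(np.isfinite(x))
```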
We will state and prove a state evolution result analogous to Theorem 5 for the present case. Since the proof is by reduction to the symmetric case, the same argument also implies a universality statement of the type of Theorem 3; however, we will not state such a universality result explicitly here. We begin by introducing the appropriate state evolution recursion. In analogy with Eq. (1.10), we introduce two sequences of positive semidefinite matrices {ÎŁ^t}_{t≄0} and {Î^t}_{t≄0} by letting Î^0 be given as per Eq. (4.2) and defining, for all t ≄ 1,

ÎŁ^t = E{ h(Z^{t−1}, W, t−1) h(Z^{t−1}, W, t−1)^T },   Z^{t−1} ∌ N(0, Î^{t−1}),   W ∌ P,   (4.7)

Î^t = (1/ÎŽ) E{ f(X^t, Y, t) f(X^t, Y, t)^T },   X^t ∌ N(0, ÎŁ^t),   Y ∌ Q.   (4.8)
We also define a two-times recursion analogous to Eqs. (3.2), (3.3). Namely, we introduce the boundary condition

Î^{0,0} = ( Î^0  Î^0 ; Î^0  Î^0 ),   Î^{t,0} = ( Î^t  0 ; 0  Î^0 ),   Î^{0,t} = ( Î^0  0 ; 0  Î^t ),   (4.9)
with Î^t defined per Eqs. (4.7), (4.8). For any s, t ≄ 1, we set recursively

ÎŁ^{t,s} = E{ Z_{t−1,s−1} Z_{t−1,s−1}^T },   (4.10)

Z_{t−1,s−1} ≡ [ h(Z^{t−1}, W, t−1), h(Z^{s−1}, W, s−1) ],   (Z^{t−1}, Z^{s−1}) ∌ N(0, Î^{t−1,s−1}),   (4.11)

Î^{t,s} = (1/ÎŽ) E{ X_{t,s} X_{t,s}^T },   (4.12)

X_{t,s} ≡ [ f(X^t, Y, t), f(X^s, Y, s) ],   (X^t, X^s) ∌ N(0, ÎŁ^{t,s}).   (4.13)

(Recall that [u, v] denotes the column vector obtained by concatenating u and v.)
Theorem 6. Let {(A(n), f, h, x^{0,n})}_{n≄1} be a polynomial and converging sequence of bipartite AMP instances, and denote by {x^t, z^t}_{t≄0} the corresponding AMP orbit.

Fix s, t ≄ 1. If s ≠ t, assume further that the initial condition x^{0,n} is obtained by letting the x^{0,n}_i ∌ R be independent and identically distributed, with R a finite mixture of Gaussians. Then, for each locally Lipschitz function ψ: R^q × R^q × R^q → R such that |ψ(x, x′, y)| ≀ K(1 + ‖y‖_2^2 + ‖x‖_2^2 + ‖x′‖_2^2)^K, we have, in probability,

lim_{n→∞} (1/n) ∑_{j∈[n]} ψ(x^t_j, x^s_j, Y(j)) = E[ ψ(X^t, X^s, Y) ],   (4.14)

lim_{n→∞} (1/m(n)) ∑_{j∈[m]} ψ(z^t_j, z^s_j, W(j)) = E[ ψ(Z^t, Z^s, W) ],   (4.15)

where (X^t, X^s) ∌ N(0, ÎŁ^{t,s}) is independent of Y ∌ Q, and (Z^t, Z^s) ∌ N(0, Î^{t,s}) is independent of W ∌ P.
Proof. The proof follows by constructing a suitable polynomial and converging sequence of symmetric instances, recognizing that a suitable subset of the resulting orbit corresponds to the orbit {x^t, z^t} of interest, and applying Theorem 5.

Specifically, given a converging sequence of bipartite instances (A(n), f, h, x^{0,n}), we construct a symmetric instance (A_s(N), g, x^{0,N}_s) with (below we use the subscript s to refer to the symmetric instance):

1. The symmetric instance has dimensions N = n + m and q_s = q.

2. We partition the index set into k = 2 subsets: [N] = C^N_1 ∪ C^N_2, with C^N_1 = {1,...,m} and C^N_2 = {m+1,...,m+n}. In particular, c_1 = ÎŽ/(1+ÎŽ) and c_2 = 1/(1+ÎŽ).

3. The symmetric random matrix A_s is given by

A_s = ( 0  A ; A^T  0 ).

In particular, W_{11} = W_{22} = 0 and W_{12} = W_{21} = (1+ÎŽ)/ÎŽ.

4. The vertex labels are Y_s(i) = W(i) for i ≀ m and Y_s(i) = Y(i−m) for i > m. In particular, these are independent random variables with distribution Y_s(i) ∌ P_1 = P if i ∈ C^N_1 and Y_s(i) ∌ P_2 = Q if i ∈ C^N_2.
27
5. The initial condition is given by x0,Ns,i = 0 for i â CN
1 and x0,Ns,i = x
0,niâm for i â CN
2 .
6. Finally, for any x â Rq, Y â R
q, t â„ 0, we let
g(x, Y, a = 1, 2t) = f(x, Y, t) , (4.16)
g(x, Y, a = 2, 2t+ 1) = h(x, Y, t), . (4.17)
The definition of g(x, Y, a = 1, 2t+ 1) and g(x, Y, a = 2, 2t) is irrelevant for our purposes.
The proof is concluded by recognizing that, for all t â„ 0,
x2t+1s,i = zt
i, for i â CN1 ,
x2ts,i = xt
iâm, for i â CN2 ,
We finish this section with a lemma that establishes continuity of the AMP trajectories with respect to Gaussian perturbations of the matrix A. This fact will be used in the next section. (Notice that an analogous lemma holds, by the same argument, for converging non-bipartite instances.)
Lemma 4. Let {(A(n), f, h, x^{0,n})}_{n≥1} be a polynomial converging sequence of bipartite AMP instances, and denote by {x^t, z^t}_{t≥0} the corresponding AMP orbit. For each n, let G(n) ∈ R^{m(n)×n} be a random matrix with i.i.d. entries G(n)_{ij} ∼ N(0, 1/m(n)), independent of A(n). Consider the perturbed sequence {(Ã(n) = A(n) + ν G(n), f, h, x^{0,n})}_{n≥1}, with ν ∈ R_+, and denote by {x̃^t, z̃^t}_{t≥0} the corresponding AMP orbit. Then for any t there exists a constant K, independent of n, such that

    E{ ‖x̃_i^t − x_i^t‖_2² } ≤ K( ν² + n^{−1/2} ) ,    E{ ‖z̃_i^t − z_i^t‖_2² } ≤ K( ν² + n^{−1/2} ) .
Proof. Consider the difference [x̃_i^t(r) − x_i^t(r)]. By the tree representation in Section 3.2 and Lemma 3, this difference can be written as a polynomial in A and G whereby each monomial has the form

    Γ(T, t) x(T) { ∏_{(u→v)∈E(T)} Ã_{ℓ(u)ℓ(v)} − ∏_{(u→v)∈E(T)} A_{ℓ(u)ℓ(v)} } .        (4.18)

Enumerating the edges of T as (u_1, v_1), . . . , (u_k, v_k), the quantity in parentheses reads

    Σ_{i=1}^k ∏_{j=1}^{i−1} Ã_{ℓ(u_j),ℓ(v_j)} · ν G_{ℓ(u_i),ℓ(v_i)} · ∏_{j=i+1}^k A_{ℓ(u_j),ℓ(v_j)} .        (4.19)

In other words, the sum over trees T is replaced by a sum over trees with one distinguished edge, the distinguished edge carrying weight ν G_{ℓ(u_i),ℓ(v_i)}. The expectation E{‖x̃_i^t − x_i^t‖_2²} is given by a sum over pairs of such marked trees. Using the fact that the entries of the matrix Ã(n) are still independent subgaussian, with scale factor C/n + ν²C/m(n) ≤ C′/n, it is easy to see that the argument in Lemma 2 and (3.30) are still valid. Hence, up to errors bounded by K n^{−1/2}, the only terms that contribute to this sum are those over pairs of trees such that the graph G obtained by identifying vertices of the same type has only double edges. In particular, for the distinguished edge we can use the following upper bound instead of (3.15): E[ |νG_{ij}|² ] = ν²/m(n) ≤ Kν²/n, and this yields a factor ν² (by the same argument as in the proof of Lemma 2 to get (3.20)).
5 Proof of universality of polytope neighborliness

In this section we prove Theorem 2, deferring several technical steps to the appendices.

Hypothesis 1. Throughout this section, {A(n)}_{n≥0} is a sequence of random matrices whereby A(n) ∈ R^{m×n} has independent entries that satisfy E{A(n)_{ij}} = 0, E{A(n)_{ij}²} = 1/m, and are subgaussian with scale factor s/m, with s independent of m, n.

Notice that these matrices differ by a factor 1/√m from the matrices in the statement of Theorem 2. Since neighborliness is invariant under scale transformations, this change is immaterial.

The approach we follow is based on the equivalence between weak neighborliness and compressed sensing reconstruction developed in [Don05b, Don05a, DT05b, DT05a]. Within compressed sensing, one considers the problem of reconstructing a vector x_0 ∈ R^n from a vector of linear "observations" y = Ax_0, with y ∈ R^m and m ≤ n. The measurement matrix A ∈ R^{m×n} is assumed to be known. An interesting approach to reconstructing x_0 from the linear observations y consists in solving the convex program

    x̂(y) = arg min{ ‖x‖_1 :  x ∈ R^n ,  y = Ax } .        (5.1)

One says that ℓ1 minimization succeeds if the above arg min is uniquely defined and x̂(y) = x_0. Remarkably, this event only depends on the support of x_0, supp(x_0) = { i ∈ [n] : x_{0,i} ≠ 0 } [Don05b]. This motivates the following abuse of terminology. We say that, for a given matrix A, ℓ1 minimization succeeds for a fraction f of vectors x_0 with³ ‖x_0‖_0 ≤ k if it succeeds for at least f·(n choose k) choices of supp(x_0) out of the (n choose k) possible ones. Analogously, ℓ1 minimization fails for a fraction f of vectors x_0 if it succeeds for at most (1 − f)·(n choose k) choices of supp(x_0).
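As a concrete aside (not part of the paper's argument), the program (5.1) can be solved as a linear program via the standard split x = u − v with u, v ≥ 0, minimizing Σ_i (u_i + v_i) subject to A(u − v) = y. A sketch using scipy:

```python
import numpy as np
from scipy.optimize import linprog

def l1_minimize(A, y):
    """Solve min ||x||_1 s.t. Ax = y via the LP split x = u - v, u, v >= 0."""
    m, n = A.shape
    c = np.ones(2 * n)                 # objective: sum(u) + sum(v) = ||x||_1
    A_eq = np.hstack([A, -A])          # equality constraint A(u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
    return res.x[:n] - res.x[n:]

# A 1-sparse x0 is typically recovered exactly from m = 6 of n = 8
# Gaussian measurements, since the sparsity is far below the threshold.
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 8)) / np.sqrt(6)
x0 = np.zeros(8)
x0[3] = 1.5
x_hat = l1_minimize(A, A @ x0)
```

The toy dimensions and the Gaussian instance here are illustrative choices; the exact-recovery phenomenon they display is the event "ℓ1 minimization succeeds" defined above.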
Success of ℓ1 minimization turns out to be intimately related to the neighborliness properties of the polytope AC^n.

Theorem 7 (Donoho, 2005). Fix δ ∈ (0, 1). For each n ∈ N, let m(n) = ⌊nδ⌋ and let A(n) ∈ R^{m(n)×n} be a random matrix. Then the sequence {A(n)C^n}_{n≥0} has weak neighborliness ρ in probability if and only if the following happens:

1. For any ρ_− < ρ, there exists ε_n → 0 such that, for a fraction larger than (1 − ε_n) of vectors x_0 with ‖x_0‖_0 = m(n)ρ_−, ℓ1 minimization succeeds with high probability (with respect to the choice of the random matrix A(n)).

2. Vice versa, for any ρ_+ > ρ, there exists ε_n → 0 such that, for a fraction larger than (1 − ε_n) of vectors x_0 with ‖x_0‖_0 = m(n)ρ_+, ℓ1 minimization fails with high probability (with respect to the choice of the random matrix A(n)).
This is indeed a rephrasing of Theorem 2 in [Don05b]. In view of this result, Theorem 2 follows from the following statement on compressed sensing with random sensing matrices.
³ As customary in this domain, we denote by ‖v‖_0 the number of non-zero entries of v ∈ R^q (which of course is not a norm).
Theorem 8. Fix δ ∈ (0, 1). For each n ∈ N, let m(n) = ⌊nδ⌋ and define A(n) ∈ R^{m(n)×n} to be a random matrix with independent subgaussian entries, with mean 0, variance 1/m and common scale factor s/m. Further assume A_{ij}(n) = Ã_{ij}(n) + ν_0 G_{ij}(n), where ν_0 > 0 is independent of n and {G_{ij}(n)}_{i∈[m],j∈[n]} is a collection of i.i.d. N(0, 1/m) random variables independent of Ã(n).

Consider either of the following two cases:

1. The matrix Ã(n) has i.i.d. entries and {x_0(n)}_{n≥1} is any fixed sequence of vectors with lim_{n→∞} ‖x_0(n)‖_0/m(n) = ρ.

2. The matrix Ã(n) has independent but not identically distributed entries. The vectors x_0(n) have i.i.d. entries independent of A(n), with P{ x_{0,i}(n) ≠ 0 } = ρδ.

Then the following holds. If ρ < ρ_*(δ), then ℓ1 minimization succeeds with high probability. Vice versa, if ρ > ρ_*(δ), then ℓ1 minimization fails with high probability. (Here probability is with respect to the realization of the random matrix A(n) and, where applicable, of x_0(n).)
The rest of this section is devoted to the proof of Theorem 8. Indeed, as shown next, Theorem 8 immediately implies Theorem 2.

Proof of Theorem 2. Take x_0(n) to be a sequence of independent vectors with independent entries such that P_ρ{ x_0(n)_i = 1 } = ρδ and P_ρ{ x_0(n)_i = 0 } = 1 − ρδ. Then, by the law of large numbers, we have lim_{n→∞} ‖x_0(n)‖_0/m(n) = ρ almost surely. Let A(n) ∈ R^{m(n)×n} be a matrix with i.i.d. entries as per Hypothesis 1 above, with m(n) = ⌊nδ⌋, and let y(n) = A(n)x_0(n). Applying Theorem 8, we have, for any ρ_− < ρ_*(δ) and ρ_+ > ρ_*(δ),

    lim_{n→∞} P_{ρ_−}{ x̂(y(n)) = x_0(n) } = 1 ,        (5.2)

    lim_{n→∞} P_{ρ_+}{ x̂(y(n)) = x_0(n) } = 0 ,        (5.3)

where P_{ρ_±}{·} denotes probability with respect to the law just described when ρ = ρ_±. Let V(ρ; m, n) be the fraction of vectors x_0 with ‖x_0‖_0 = ⌊mρ⌋ on which ℓ1 reconstruction succeeds. Since in Eqs. (5.2), (5.3) the support of x_0(n) is uniformly random given its size, and the probability of success is monotone decreasing in the support size [Don05b], the above equations imply

    lim_{n→∞} E{ V(ρ_−; m, n) } = 1 ,        (5.4)

    lim_{n→∞} E{ V(ρ_+; m, n) } = 0 .        (5.5)

By Markov's inequality, Eqs. (5.4), (5.5) yield (respectively) assumptions 1 and 2 of Theorem 7. The claim follows by applying that theorem.
Let us now turn to the proof of Theorem 8. The following lemma provides a useful sufficient condition for successful reconstruction. Here and below, for a convex function F : R^q → R, ∂F(x) denotes the subgradient of F at x ∈ R^q; in particular, ∂‖x‖_1 denotes the subgradient of the ℓ1 norm at x. Further, for R ⊆ [n], A_R denotes the submatrix of A formed by the columns with index in R. The singular values of a matrix M ∈ R^{d_1×d_2} are denoted by σ_max(M) ≡ σ_1(M) ≥ σ_2(M) ≥ · · · ≥ σ_{min(d_1,d_2)}(M) ≡ σ_min(M).

Lemma 5. For any c_1, c_2, c_3 > 0, there exists ε_0(c_1, c_2, c_3) > 0 such that the following happens. If x_0 ∈ R^n, A ∈ R^{m×n}, y = Ax_0 ∈ R^m are such that:

1. There exist v ∈ ∂‖x_0‖_1 and z ∈ R^m with v = A^Tz + w and ‖w‖_2 ≤ √n ε, with ε ≤ ε_0(c_1, c_2, c_3).

2. For c ∈ (0, 1), let S(c) ≡ { i ∈ [n] : |v_i| ≥ 1 − c }. Then, for any S′ ⊆ [n] with |S′| ≤ c_1n, the minimum singular value of A_{S(c_1)∪S′} satisfies σ_min(A_{S(c_1)∪S′}) ≥ c_2.

3. The maximum singular value of A satisfies c_3^{−1} ≤ σ_max(A)² ≤ c_3.

Then x_0 is the unique minimizer of ‖x‖_1 over x ∈ R^n such that y = Ax.
The proof of this lemma is deferred to Appendix B. The proof of Theorem 8 consists of two parts. For ρ > ρ_*(δ), we shall exhibit a vector x with ‖x‖_1 < ‖x_0‖_1 and y = Ax. For ρ < ρ_*(δ), we will show that the assumptions of Lemma 5 hold; in particular, we will construct a subgradient v as per assumption 1. For both tasks, we will use an iterative message passing algorithm analogous to the one in Section 4. The algorithm is defined by the following recursion, initialized with x^0 = 0:

    x^{t+1} = η(x^t + A^Tz^t; ασ_t) ,        (5.6)

    z^t = y − Ax^t + b_t z^{t−1} ,        (5.7)

where η(u; θ) = sign(u)(|u| − θ)_+ (applied componentwise), α is a non-negative constant, and b_t is a diagonal matrix whose precise definition is immaterial here and will be given in the proof of Proposition 14 below. Notice two important differences with respect to the treatment in Section 4:

• The iteration in Eqs. (5.6), (5.7) does not immediately take the form of Eqs. (4.3), (4.4). For instance, the nonlinear mapping η( · ; ασ_t) is applied after multiplication by A^T. This mismatch can be resolved by a simple change of variables.

• The nonlinear mapping η( · ; ασ_t) is not a polynomial. This point will be addressed by constructing suitable polynomial approximations of η.
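As an illustration only, the iteration (5.6)–(5.7) can be implemented directly. In the sketch below the diagonal matrix b_t is replaced by the scalar ‖x^t‖_0/m (the Onsager coefficient of [DMM09], used here as a stand-in), and σ_t is estimated empirically as ‖z^t‖_2/√m; both are simplifying assumptions:

```python
import numpy as np

def soft_threshold(u, theta):
    # eta(u; theta) = sign(u) (|u| - theta)_+
    return np.sign(u) * np.maximum(np.abs(u) - theta, 0.0)

def amp(A, y, alpha=2.0, t_max=30):
    m, n = A.shape
    x, z = np.zeros(n), y.copy()
    for _ in range(t_max):
        sigma = np.linalg.norm(z) / np.sqrt(m)      # empirical proxy for sigma_t
        x = soft_threshold(x + A.T @ z, alpha * sigma)
        b = np.count_nonzero(x) / m                 # scalar Onsager coefficient
        z = y - A @ x + b * z
    return x

rng = np.random.default_rng(1)
m, n = 300, 1000
A = rng.normal(size=(m, n)) / np.sqrt(m)
x0 = np.zeros(n)
x0[:15] = rng.normal(size=15)
x_hat = amp(A, A @ x0)
```

With ρ = ε/δ = 0.05 well below the reconstruction threshold, the effective noise level σ_t shrinks geometrically across iterations, mirroring the state evolution behavior described next.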
We refer to Appendix A for further details. For t ≥ 0, σ_t is defined by the one-dimensional recursion

    σ_{t+1}² = (1/δ) E{ [η(X + σ_tZ; ασ_t) − X]² } ,        (5.8)

where expectation is with respect to the independent random variables Z ∼ N(0, 1), X ∼ p_X, and σ_0² = E{X²}/δ.

Proposition 14. Let {(x_0(n), A(n), y(n))}_{n≥0} be a sequence of triples with A(n) random as per Hypothesis 1, {x_{0,i}(n) : i ∈ [n]} independent and identically distributed with x_{0,i}(n) ∼ p_X a finite mixture of Gaussians on R, and y(n) = A(n)x_0(n).
Then, for each n, there exists a sequence of vectors {x^t(n), z^t(n)}_{t≥0}, with x^t(n) = x^t ∈ R^n, z^t(n) = z^t ∈ R^m, such that the following happens for every t.

1. There exists a diagonal matrix b_t = b_t(n) such that

    z^t = y − Ax^t + b_t z^{t−1} ,        (5.9)

    lim_{n→∞} max_{i∈[m]} (b_t)_{ii} = lim_{n→∞} min_{i∈[m]} (b_t)_{ii} = (1/δ) P{ |X + σ_{t−1}Z| ≥ ασ_{t−1} } ,        (5.10)

where the limits hold in probability.
2. In probability,

    lim_{n→∞} (1/n) ‖x^{t+1} − η(x^t + A^Tz^t; ασ_t)‖_2² = 0 .        (5.11)

3. For any locally Lipschitz function ψ : R × R → R with |ψ(x, y)| ≤ C(1 + x² + y²), in probability,

    lim_{n→∞} (1/n) Σ_{i=1}^n ψ(x_{0,i}, x_i^t + (A^Tz^t)_i) = E ψ(X, X + σ_tZ) .        (5.12)

4. There exist two functions o(a; c) and o(a, b; c), with o(a; c) → 0 and o(a, b; c) → 0 as c → 0 at a, b fixed, such that the following holds. Assume A_{ij}(n) = Ã_{ij}(n) + ν G_{ij}(n), where ν > 0 is independent of n and {G_{ij}(n)}_{i∈[m],j∈[n]} is a collection of i.i.d. N(0, 1/m) random variables independent of Ã(n). Then there exists a sequence of vectors {x̃^t, z̃^t}_{t≥0} that is independent of G such that, for any t ≥ 0,

    (1/n) Σ_{i=1}^n E{ ((x^t + A^Tz^t)_i − (x̃^t + Ã^Tz̃^t)_i)² } ≤ o(t; ν) + o(t, ν; n^{−1}) ,        (5.13)

    (1/m) Σ_{i=1}^m E{ (z_i^t − z̃_i^t)² } ≤ o(t; ν) + o(t, ν; n^{−1}) .        (5.14)
The proof is deferred to Appendix A. We also need a generalization of the last proposition to functions of the estimates x^t, x^s at two distinct iteration numbers t ≠ s. To this end, we introduce a generalization of the state evolution equation (5.8). Namely, we define {R_{s,t}}_{s,t≥0} recursively for all s, t ≥ 0 by letting

    R_{s+1,t+1} = (1/δ) E{ [η(X + Z_s; ασ_s) − X][η(X + Z_t; ασ_t) − X] } .        (5.15)

Here the expectation is with respect to X ∼ p_X and the independent Gaussian vector [Z_s, Z_t] with zero mean and covariance given by E{Z_s²} = R_{s,s}, E{Z_t²} = R_{t,t} and E{Z_tZ_s} = R_{t,s}. The boundary condition is fixed by letting R_{0,0} = E{X²}/δ and defining, for each t ≥ 0,

    R_{0,t+1} = (1/δ) E{ [η(X + Z_t; ασ_t) − X][−X] } ,        (5.16)

with Z_t ∼ N(0, R_{t,t}). This uniquely determines the doubly infinite array {R_{t,s}}_{t,s≥0}. Notice in particular that R_{t,t} = σ_t² for all t ≥ 0 (this is easily checked by induction over t).
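The array {R_{s,t}} can be tabulated numerically by Monte Carlo, drawing for each pair (s, t) a bivariate Gaussian (Z_s, Z_t) with the covariance entries already computed. The sketch below uses the placeholder law p_X = (1 − ε)δ_0 + ε N(0, 1) (an illustrative assumption):

```python
import numpy as np

def soft(u, th):
    return np.sign(u) * np.maximum(np.abs(u) - th, 0.0)

def two_time_R(alpha, delta, eps, t_max, n_mc=200_000, seed=0):
    rng = np.random.default_rng(seed)
    # Placeholder signal law p_X = (1 - eps) delta_0 + eps N(0, 1).
    X = rng.normal(size=n_mc) * (rng.random(n_mc) < eps)
    R = np.zeros((t_max + 1, t_max + 1))
    R[0, 0] = np.mean(X ** 2) / delta                 # boundary condition
    for t in range(t_max):
        sig_t = np.sqrt(R[t, t])
        Zt = sig_t * rng.normal(size=n_mc)
        # boundary condition (5.16)
        R[0, t + 1] = R[t + 1, 0] = np.mean((soft(X + Zt, alpha * sig_t) - X) * (-X)) / delta
        for s in range(t + 1):
            sig_s = np.sqrt(R[s, s])
            U, V = rng.normal(size=(2, n_mc))
            Zs = sig_s * U
            c = R[s, t] / R[s, s]                     # coefficient of Z_t on Z_s
            Zt2 = c * Zs + np.sqrt(max(R[t, t] - c ** 2 * R[s, s], 0.0)) * V
            # recursion (5.15)
            R[s + 1, t + 1] = R[t + 1, s + 1] = np.mean(
                (soft(X + Zs, alpha * sig_s) - X) * (soft(X + Zt2, alpha * sig_t) - X)) / delta
    return R

R = two_time_R(alpha=2.0, delta=0.5, eps=0.2, t_max=3)
```

The diagonal entries R[t, t] of the output play the role of σ_t², consistent with the remark above.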
Proposition 15. Under the assumptions of Proposition 14, the sequence {x^t(n), z^t(n)}_{t≥0} constructed there further satisfies the following. For any fixed t, s ≥ 0 and any Lipschitz continuous functions ψ : R × R × R → R, φ : R × R → R, in probability,

    lim_{n→∞} (1/n) Σ_{i=1}^n ψ( x_{0,i}, x_i^s + (A^Tz^s)_i, x_i^t + (A^Tz^t)_i ) = E ψ(X, X + Z_s, X + Z_t) ,        (5.17)

    lim_{n→∞} (1/m) Σ_{i=1}^m φ(z_i^s, z_i^t) = E φ(Z_s, Z_t) ,        (5.18)

where the expectation is with respect to X ∼ p_X and the independent Gaussian vector (Z_s, Z_t) with zero mean and covariance given by E{Z_s²} = R_{s,s}, E{Z_t²} = R_{t,t} and E{Z_tZ_s} = R_{t,s}.
The proof of this proposition is in Appendix A. Finally, we need some analytical estimates on the recursions (5.8) and (5.15). Some of these estimates were already proved in [DMM09, DMM11, BM12], but we reproduce them here for the reader's convenience. Proofs of the others are provided in Appendix C.
Lemma 6. Let p_X be a probability measure on the real line such that p_X({0}) = 1 − ε and E_{p_X}{X²} < ∞, fix δ ∈ (0, 1), and set ρ = ε/δ. For this choice of parameters, consider the sequences {σ_t²}_{t≥0}, {R_{s,t}}_{s,t≥0} defined as per Eqs. (5.8), (5.15). If ρ < ρ_*(δ), then:

(a1) There exist α_1(ε, δ), α_2(ε, δ), α_*(ε) with 0 < α_1(ε, δ) < α_*(ε) < α_2(ε, δ) < ∞, and ω_*(ε, δ) ∈ (0, 1), such that the following happens. For each α ∈ (α_1, α_2), σ_t² = B ω^t(1 + o_t(1)) as t → ∞, with ω ∈ (0, 1).

Further, for each ω ∈ [ω_*(ε, δ), 1) there exist α_− ∈ (α_1, α_*] and α_+ ∈ [α_*, α_2) (distinct as long as ω > ω_*) such that, letting α ∈ {α_−, α_+}, σ_t² = B ω^t(1 + o_t(1)).

Finally, for all α ∈ [α_*, α_2), we have ε + 2(1 − ε)Φ(−α) < δ.

(a2) For any α ∈ [α_*(ε), α_2(ε, δ)), we have lim_{t→∞} R_{t,t−1}/(σ_tσ_{t−1}) = 1.

(a3) Assume p_X to be such that max(p_X((0, a)), p_X((−a, 0))) ≤ Ba^b for some B, b > 0 (in particular, this is the case if p_X has an atom at 0 and is absolutely continuous in a neighborhood of 0). Fixing again α ∈ [α_*(ε), α_2(ε, δ)) and c ∈ R_+,

    lim_{t_0→∞} sup_{t,s≥t_0} P{ |X + Z_s| ≥ cσ_s ; |X + Z_t| < cσ_t } = 0 ,        (5.19)

where (Z_s, Z_t) is a Gaussian vector with E{Z_s²} = σ_s², E{Z_t²} = σ_t², E{Z_sZ_t} = R_{s,t}.

Vice versa, if ρ > ρ_*(δ), then there exists α_0(δ, p_X) > α_min(δ) > 0 such that:

(b1) For any α > α_min(δ), we have lim_{t→∞} σ_t² = σ_∞² > 0 and, for α ≥ α_0, lim_{t→∞} [R_{t,t} − 2R_{t,t−1} + R_{t−1,t−1}] = 0.

(b2) Letting α = α_0(δ, p_X), we have P{ |X + σ_∞Z| ≥ ασ_∞ } = δ.

(b3) Consider the probability distribution p_X = (1 − ε)δ_0 + ε γ, with γ(dx) = e^{−x²/2}/√(2π) dx the standard Gaussian measure. Then, setting α = α_0(δ, p_X), we have lim_{t→∞} E{ |η(X + σ_tZ; ασ_t)| } < E{|X|}, where Z ∼ N(0, 1) is independent of X.
We are now in position to prove Theorem 8. For the reader's convenience, we distinguish the cases ρ < ρ_*(δ) and ρ > ρ_*(δ). Before considering these cases, we establish some common simplifications.
5.1 Proof of Theorem 8, common simplifications

Consider first case 1. By exchangeability of the columns of A(n), it is sufficient to prove the claim for the sequence of random vectors obtained by permuting the entries of x_0(n) uniformly at random. Hence x_0(n) is a vector with uniformly random support supp(x_0(n)) = S_n, with deterministic size |S_n| such that |S_n|/n → ε. Further, the success of ℓ1 minimization is an event that is monotone decreasing in the support supp(x_0(n)) [Don05b]. Therefore we can replace the deterministic support size with a random size |S_n| ∼ Binom(n, ε) (which concentrates tightly around nε).

Finally, since the success of ℓ1 minimization only depends on the support of x_0(n) [Don05b], we can replace the non-zero entries by arbitrary values. We will take advantage of this fact and assume that all the non-zero entries of x_0(n) are i.i.d. N(0, 1). We conclude that it is sufficient to prove that ℓ1 minimization succeeds/fails with high probability when the vectors x_0(n) have i.i.d. entries with distribution p_X = (1 − ε)δ_0 + ε γ, where γ(dx) = e^{−x²/2}/√(2π) dx.

Consider next case 2, in which the entries of x_0(n) are i.i.d. with P{ x_{0,i}(n) ≠ 0 } = ρδ = ε. Again, exploiting the fact that the success of ℓ1 minimization depends only on the support of x_0(n), we can assume that its entries have common distribution p_X = (1 − ε)δ_0 + ε γ.

Summarizing this discussion, in order to prove the theorem both in case 1 and in case 2, it is sufficient to do so in the following setting.

Remark 3. In the proof of Theorem 8, we can assume the vectors x_0(n) to be random with i.i.d. entries with common distribution p_X = (1 − ε)δ_0 + ε γ, independent of the matrices A(n).
5.2 Proof of Theorem 8, ρ < ρ_*(δ)

Fix ρ < ρ_*(δ). We will prove that hypotheses 1, 2, 3 of Lemma 5 hold with high probability for fixed c_1, c_2, c_3 > 0 and ε arbitrarily small. This implies the claim (i.e., that ℓ1 minimization succeeds) by applying the lemma. Notice that hypothesis 3 holds with high probability for some c_3 = c_3(δ) by classical estimates on the extreme eigenvalues of sample covariance matrices [BS98, BS05].

We next consider hypothesis 1 of Lemma 5. In order to construct the subgradient v used there, we consider the sequence of vectors {x^t, z^t}_{t≥0} defined as per Proposition 14. We fix α ∈ (α_1(ε, δ), α_2(ε, δ)) as per Lemma 6.(a1), so that σ_t² = B ω^t(1 + o(1)) with ω ∈ (0, 1) to be chosen close enough to 1. We also introduce the notation θ_t ≡ ασ_t. We let v^t ∈ R^n be defined by

    v_i^t = sign(x_{0,i})                                    if i ∈ S ,
    v_i^t = (1/θ_{t−1}) (x^{t−1} + A^Tz^{t−1} − x̂^t)_i       otherwise,        (5.20)

    x̂^t ≡ η(x^{t−1} + A^Tz^{t−1}; θ_{t−1}) .        (5.21)
Notice that, by definition of the function η( · ; · ), we have |(x^{t−1} + A^Tz^{t−1} − x̂^t)_i| ≤ θ_{t−1}, and hence v^t ∈ ∂‖x_0‖_1. We can write

    v^t = (1/θ_{t−1}) A^Tz^t + ξ^t + β^t + ζ^t ,        (5.22)

    ξ^t ≡ (1/θ_{t−1}) (x^{t−1} + A^Tz^{t−1} − x^t − A^Tz^t) ,        (5.23)

    β^t ≡ (1/θ_{t−1}) (x^t − x̂^t) ,        (5.24)

    ζ_i^t ≡ sign(x_{0,i}) − (1/θ_{t−1})(x^{t−1} + A^Tz^{t−1} − x̂^t)_i    if i ∈ S ,
    ζ_i^t ≡ 0    otherwise.        (5.25)

This part of the proof is completed by showing that there exists h(t) with lim_{t→∞} h(t) = 0 such that, for each t, with high probability we have ‖ξ^t‖_2²/n ≤ (1 − √ω)²/α² + h(t), ‖β^t‖_2²/n ≤ h(t), and ‖ζ^t‖_2²/n ≤ h(t). Indeed, if this is true, we can then choose t sufficiently large and α ∈ (α_*(ε), α_2(ε, δ)) so that ‖ξ^t + β^t + ζ^t‖_2² is small enough to satisfy condition 1 of Lemma 5.
First consider ξ^t. Applying Proposition 15 to ψ(x, y_1, y_2) = (y_1 − y_2)², we have, in probability,

    lim_{n→∞} (1/n) ‖ξ^t‖_2² = lim_{n→∞} (1/(nα²σ_{t−1}²)) ‖x^t + A^Tz^t − x^{t−1} − A^Tz^{t−1}‖_2²
        = (1/(α²σ_{t−1}²)) [ R_{t,t} − 2R_{t,t−1} + R_{t−1,t−1} ]
        = (1/(α²σ_{t−1}²)) [ σ_t² − 2σ_tσ_{t−1} + σ_{t−1}² ] + (2σ_t/(α²σ_{t−1})) [ 1 − R_{t,t−1}/(σ_tσ_{t−1}) ]
        = (1/α²)(1 − √ω)² + h(t) .

Here the last equality follows from the facts that σ_t²/σ_{t−1}² → ω by Lemma 6.(a1) and R_{t,t−1}/(σ_tσ_{t−1}) → 1 by Lemma 6.(a2). This implies the claim for ξ^t.

Next, consider β^t. By Proposition 14.2,

    lim_{n→∞} (1/n) ‖x^t − x̂^t‖_2² = lim_{n→∞} (1/n) ‖x^t − η(x^{t−1} + A^Tz^{t−1}; ασ_{t−1})‖_2² = 0 ,        (5.26)

and hence ‖β^t‖_2²/n ≤ h(t) with high probability, for any h(t) > 0.
Finally, consider ζ^t, and define R(y; θ) ≡ (y − η(y; θ))/θ. We have

    R(y; θ) =  1      for y ≥ θ ,
    R(y; θ) = y/θ     for −θ < y < θ ,
    R(y; θ) = −1      for y ≤ −θ .
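A quick numerical check of this piecewise form (note the 1/θ normalization, which makes R take values in [−1, 1], as required for comparison with sign(x_0)):

```python
import numpy as np

def eta(u, th):
    # soft threshold: sign(u) (|u| - th)_+
    return np.sign(u) * np.maximum(np.abs(u) - th, 0.0)

def R_direct(y, th):
    return (y - eta(y, th)) / th

def R_piecewise(y, th):
    return np.clip(y / th, -1.0, 1.0)

y = np.linspace(-5.0, 5.0, 1001)
```

The two definitions agree pointwise for any threshold θ > 0.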
Using Proposition 14.3, we can show that

    lim_{n→∞} (1/n) ‖ζ^t‖_2² = E{ [sign(X) − R(X + σ_{t−1}Z; ασ_{t−1})]² 1_{X≠0} } .        (5.27)

Notice that this apparently requires applying Proposition 14.3 to the function ψ(x, y) = [sign(x) − R(y; θ)]² 1_{x≠0}, which is non-Lipschitz in x. However, we can define a Lipschitz approximation, with parameter r > 0:

    ψ_r(x, y) = [x/r − R(y; θ)]² |x|/r     for |x| ≤ r ,
    ψ_r(x, y) = [sign(x) − R(y; θ)]²       for |x| > r .        (5.28)

Notice that ψ_r is bounded and Lipschitz continuous. We further have |ψ_r(x, y) − ψ(x, y)| ≤ 4·1(x ≠ 0; |x| ≤ r), whence

    lim sup_{n→∞} | (1/n)‖ζ^t‖_2² − (1/n) Σ_{i=1}^n ψ_r( x_{0,i}, x_i^{t−1} + (A^Tz^{t−1})_i ) | ≤ lim sup_{n→∞} (4/n) Σ_{i=1}^n 1( x_{0,i} ≠ 0; |x_{0,i}| ≤ r ) ≤ 8r .        (5.29)

The last inequality holds almost surely by the law of large numbers, using γ([−r, r]) < 2r. Analogously,

    | E ψ(X, X + σ_{t−1}Z) − E ψ_r(X, X + σ_{t−1}Z) | ≤ 4 P( X ≠ 0; |X| ≤ r ) ≤ 8r .        (5.30)

Hence the claim (5.27) follows by applying Proposition 14.3 to ψ_r(x, y), using Eqs. (5.29), (5.30), and letting r → 0.
We conclude by noting that the right-hand side of Eq. (5.27) converges to 0 as t → ∞ by dominated convergence, since σ_t → 0. Therefore

    lim_{n→∞} (1/n) ‖ζ^t‖_2² ≤ h(t)/2 .

This completes the proof that assumption 1 of Lemma 5 holds. We finally consider hypothesis 2. Let S_t(c) be defined as there, for the subgradient v^t, namely

    S_t(c) ≡ { i ∈ [n] : |v_i^t| ≥ 1 − c } = S ∪ { i ∈ [n] \ S : |(x^{t−1} + A^Tz^{t−1})_i| ≥ (1 − c)θ_{t−1} } .
Recall that by assumption A_{ij} = Ã_{ij} + ν G_{ij}, where G_{ij} ∼ N(0, 1/m), and (eventually redefining Ã_{ij}) we can freely choose ν ∈ [0, ν_0]. Let {x̃^t, z̃^t}_{t≥0} be the sequence of vectors given by Proposition 14.4, and define ṽ^t as v^t, but with x^t, z^t, A replaced by x̃^t, z̃^t, Ã:

    ṽ_i^t = sign(x_{0,i})    if i ∈ S ,
    ṽ_i^t = (1/θ_{t−1}) ( x̃^{t−1} + Ã^Tz̃^{t−1} − η(x̃^{t−1} + Ã^Tz̃^{t−1}; θ_{t−1}) )_i    otherwise.        (5.31)–(5.32)

We further define

    S̃_t(c) ≡ { i ∈ [n] : |ṽ_i^t| ≥ 1 − c } = S ∪ { i ∈ [n] \ S : |(x̃^{t−1} + Ã^Tz̃^{t−1})_i| ≥ (1 − c)θ_{t−1} } .
We claim that the following two statements hold for some t_* ≥ 0 independent of n:

Claim 1. There exist c_1, c_2 > 0 (independent of ν) such that, for all S′ ⊆ [n] with |S′| ≤ 2c_1n, the minimum singular value of A_{S̃_{t_*}(2c_1)∪S′} satisfies σ_min(A_{S̃_{t_*}(2c_1)∪S′}) ≥ c_2ν with probability converging to 1 as n → ∞.

Claim 2. For all t ≥ t_*,

    P{ |S_t(c_1) \ S̃_{t_*}(2c_1)| ≥ nc_1 } ≤ o_1(t_*; ν) + o_2(t_*, ν; n^{−1}) ,

where o_1(t_*; ν) vanishes as ν → 0 at t_*, c_1, c_2 fixed, and o_2(t_*, ν; n^{−1}) vanishes as n^{−1} → 0 at t_*, ν, c_1, c_2 fixed.

These claims immediately imply that hypothesis 2 of Lemma 5 holds with probability converging to one as n → ∞. Indeed, if |S′| ≤ nc_1, then (by Claim 2) S_t(c_1) ∪ S′ ⊆ S̃_{t_*}(2c_1) ∪ S′′, where |S′′| ≤ 2nc_1, with probability larger than 1 − o_1(t_*; ν) − o_2(t_*, ν; n^{−1}). By Claim 1, we hence have σ_min(A_{S_t(c_1)∪S′}) ≥ c̄_2 ≡ c_2ν. The thesis follows since ν can be chosen as small as we want. (Notice that once t_* is fixed so as to satisfy these claims, we can still choose t ≥ t_* arbitrarily to satisfy hypothesis 1 of Lemma 5, as per the argument above.)
In order to prove Claim 1, first notice that, for any b ≥ 0,

    P{ min_{S′⊆[n], |S′|≤2c_1n} σ_min(A_{S̃_{t_*}(2c_1)∪S′}) < c_2ν }
        ≤ P{ min_{S′⊆[n], |S′|≤2c_1n} σ_min(A_{S̃_{t_*}(2c_1)∪S′}) < c_2ν ; |S̃_{t_*}(2c_1)| ≤ bn } + P{ |S̃_{t_*}(2c_1)| > bn }
        ≤ e^{nH(2c_1)} max_{S′⊆[n], |S′|≤2c_1n} P{ σ_min(A_{S̃_{t_*}(2c_1)∪S′}) < c_2ν ; |S̃_{t_*}(2c_1)| ≤ bn } + P{ |S̃_{t_*}(2c_1)| > bn } ,        (5.33)

where in the last line H(c) denotes the binary entropy function and we used (n choose nc) ≤ exp{nH(c)}. We want to show that t_*, b, c_1, c_2, ν can be chosen so that both contributions vanish as n → ∞.

Consider any b ∈ (0, δ) and restrict c_1 ∈ (0, (δ − b)/2). Then the matrix A_{S̃_{t_*}(2c_1)∪S′} has nδ rows and at most (b + 2c_1)n = nδ − Θ(n) columns. Further, A = Ã + νG, with S̃_{t_*}(2c_1) (and hence S̃_{t_*}(2c_1) ∪ S′) independent of G. We can therefore use an upper bound on the condition number of randomly perturbed deterministic matrices, proved by Bürgisser and Cucker [BC10] (see also Appendix D), to show that

    P{ σ_min(A_{S̃_{t_*}(2c_1)∪S′}) < c_2ν ; |S̃_{t_*}(2c_1)| ≤ bn } ≤ (a_1c_2)^{n(δ−b−2c_1)+1} ,        (5.34)

with a_1 = a_1((b + 2c_1)/δ) bounded as long as (b + 2c_1)/δ < 1. We can therefore select c_2 = 1/(2a_1), and select c_1 small enough so that H(2c_1) ≤ (1/2)(δ − b − 2c_1) log 2. This ensures that the first term in Eq. (5.33) vanishes as n → ∞.
We are left with the task of selecting b ∈ (0, δ) and t_* ≥ 0 so that the second term vanishes as well, since then we can take c_1 ∈ (0, (δ − b)/2). To this end, notice that by Proposition 14 (and using the fact that X + σ_{t_*−1}Z has a density) we have, in probability,

    lim_{n→∞} (1/n) |S_{t_*}(c)| = P{ |X + σ_{t_*−1}Z| ≥ (1 − c)θ_{t_*−1} } ,

and further, since σ_t → 0 as t → ∞ (cf. Lemma 6.(a1)) and θ_t = ασ_t, we have

    lim_{t_*→∞} P{ |X + σ_{t_*−1}Z| ≥ (1 − c)θ_{t_*−1} } = ε + 2(1 − ε)Φ(−(1 − c)α) .

On the other hand, by Lemma 6.(a1), and since α ∈ [α_*, α_2), we have ε + 2(1 − ε)Φ(−α) < δ. Hence there exist b_0 ∈ (0, δ) and c_1 > 0 such that, for all t_* large enough, |S_{t_*}(3c_1)| ≤ nb_0 with high probability. Taking b ∈ (b_0, δ) and using Markov's inequality (with t′_* = t_* − 1),

    P{ |S̃_{t_*}(2c_1)| > bn } ≤ (1/((b − b_0)n)) E{ |S̃_{t_*}(2c_1) \ S_{t_*}(3c_1)| } + P{ |S_{t_*}(3c_1)| > b_0n }
        ≤ (1/((b − b_0)c_1²θ_{t′_*}² n)) Σ_{i=1}^n E{ ((x^{t′_*} + A^Tz^{t′_*})_i − (x̃^{t′_*} + Ã^Tz̃^{t′_*})_i)² } + P{ |S_{t_*}(3c_1)| > b_0n }
        ≤ o_1(t_*; ν) + o_2(t_*, ν; n^{−1}) + P{ |S_{t_*}(3c_1)| > b_0n } ,

where the second inequality holds because each i ∈ S̃_{t_*}(2c_1) \ S_{t_*}(3c_1) requires the two iterates to differ by at least c_1θ_{t′_*} at coordinate i, and the last inequality follows from Proposition 14.4. All terms can be made arbitrarily small by choosing ν small and n large enough.
In order to conclude the proof, we need to show that Claim 2 holds for a possibly larger t_*. First notice that, applying again Proposition 14.4, we get

    P{ |S_{t_*}(c_1) \ S̃_{t_*}(2c_1)| ≥ nc_1/2 } ≤ (2/(nc_1)) E{ |S_{t_*}(c_1) \ S̃_{t_*}(2c_1)| }
        ≤ (2/(nc_1³θ_{t′_*}²)) Σ_{i=1}^n E{ ((x^{t′_*} + A^Tz^{t′_*})_i − (x̃^{t′_*} + Ã^Tz̃^{t′_*})_i)² } ≤ o_1(t_*; ν) + o_2(t_*, ν; n^{−1}) .        (5.35)

By Proposition 15, and using the fact that the vector (X + Z_{t_*}, X + Z_t) has a density, we have, in probability,

    lim_{n→∞} (1/n) |S_t(c_1) \ S_{t_*}(c_1)| = P{ |X + Z_{t−1}| ≥ (1 − c_1)θ_{t−1} ; |X + Z_{t_*−1}| < (1 − c_1)θ_{t_*−1} } ≤ h(t_*) ,

where, by Lemma 6.(a3) (applied with c = (1 − c_1)α), h(t_*) vanishes as t_* → ∞. Given any c_1 > 0, we can therefore choose t_* so that, with high probability, |S_t(c_1) \ S_{t_*}(c_1)| ≤ nc_1/2. Since S_t(c_1) \ S̃_{t_*}(2c_1) ⊆ (S_t(c_1) \ S_{t_*}(c_1)) ∪ (S_{t_*}(c_1) \ S̃_{t_*}(2c_1)), combining with Eq. (5.35) we obtain the desired claim.
5.3 Proof of Theorem 8, ρ > ρ_*(δ)

Fix a small number h > 0. By Lemma 6.(b), there exists Δ = Δ(δ, ε) > 0, independent of h, such that, for α = α_0(δ, p_X) and t large enough,

    | (1/δ) P{ |X + σ_tZ| > ασ_t } − 1 | ≤ h ,        (5.36)

    | R_{t,t} − 2R_{t,t−1} + R_{t−1,t−1} | ≤ h² ,        (5.37)

    E{ |η(X + σ_tZ; ασ_t)| } < E{|X|} − 2Δ ,        (5.38)

as well as σ_{t−1}² ≤ 2σ_∞². By Propositions 14 and 15 (and noting that X + σ_tZ has a distribution that is absolutely continuous with respect to Lebesgue measure), we have, with high probability,

    max_{i∈[m]} | (b_t)_{ii} − 1 | ≤ 2h ,        (5.39)

    ‖z^t − z^{t−1}‖_2 ≤ 2h√n ,        (5.40)

    ‖x^t‖_1 ≤ ‖x_0‖_1 − nΔ ,        (5.41)

    ‖z^t‖_2 ≤ 2σ_∞√n .        (5.42)

Namely, Eq. (5.36) implies (5.39), Eq. (5.37) implies (5.40), Eq. (5.38) implies (5.41), and the assumption σ_{t−1}² ≤ 2σ_∞² implies (5.42).
Using Eq. (5.9) together with the above, we get

    ‖y − Ax^t‖_2 ≤ ‖z^t − z^{t−1}‖_2 + max_{i∈[m]} |(b_t)_{ii} − 1| ‖z^{t−1}‖_2 ≤ 2h√n (1 + 2σ_∞) .        (5.43)

Define x ≡ x^t + A^T(AA^T)^{−1}(y − Ax^t) (notice that the sample covariance matrix AA^T has full rank with high probability [BS98, BS05]). By construction, Ax = y. Then, with high probability,

    ‖x − x^t‖_2 ≤ σ_max(A) σ_min(A)^{−2} ‖y − Ax^t‖_2 ≤ C(δ)(1 + 2σ_∞) h√n ,        (5.44)

where σ_max(A), σ_min(A) are the maximum and minimum non-zero singular values of A. The second inequality holds with high probability for δ ∈ (0, 1) by standard estimates on the singular values of random matrices [BS98, BS05]. Using Eq. (5.41) together with the triangle inequality and ‖x − x^t‖_1 ≤ √n ‖x − x^t‖_2, we finally get

    ‖x‖_1 ≤ ‖x_0‖_1 − nΔ + C(δ)(1 + 2σ_∞) hn < ‖x_0‖_1 ,        (5.45)

where the second inequality follows from the fact that h > 0 can be taken arbitrarily small (by taking t large), while Δ, C and σ_∞ are fixed. We conclude that x_0 cannot be the solution of the ℓ1 minimization problem (5.1).
Acknowledgments
A.M. is grateful to Amir Dembo, David Donoho and Van Vu for stimulating conversations. This work was partially supported by the NSF CAREER award CCF-0743978, the NSF grant DMS-0806211, and the AFOSR grant FA9550-10-1-0360. M.L. acknowledges the support of the French Agence Nationale de la Recherche (ANR) under reference ANR-11-JS02-005-01 (GAP project).
A Proof of Propositions 14 and 15

In this appendix we prove Propositions 14 and 15 by a suitable application of Theorem 6. Before passing to these proofs, we establish a corollary of Theorem 6 that allows us to control iterations of the form (5.6), (5.7), with η( · ; · ) replaced by a general polynomial.
A.1 A general corollary

For x_0 = x_0(n) ∈ R^n and A = A(n) ∈ R^{m×n} as per Hypothesis 1 in Section 5, we define y = y(n) ∈ R^m by

    y = Ax_0 .        (A.1)

Let D ∈ R^{n×n} be the diagonal matrix whose diagonal entries are the squared column norms of A, that is, D_{ii} = Σ_{j∈[m]} A_{ji}², and D_{ij} = 0 for i ≠ j. Further define u_0 = u_0(n) ∈ R^n as follows:

    u_{0,i} = (D_{ii} − 1) x_{0,i} = ( Σ_{j∈[m]} A_{ji}² − 1 ) x_{0,i} .        (A.2)

Let x^0 = (I − D^{−1}) x_0 (notice that D is invertible with high probability) and define iteratively

    z^t = y − Ax^t + b_t z^{t−1} ,    (b_t)_{ii} = Σ_{j∈[n]} A_{ij}² η′_{t−1}( D_{jj} x_j^{t−1} + (A^Tz^{t−1})_j − u_{0,j} ) ,        (A.3)

    x^{t+1} = η_t( Dx^t + A^Tz^t − u_0 ) ,        (A.4)

where, for each t, η_t : R → R is a polynomial and, for v ∈ R^n, η_t(v) = (η_t(v_1), . . . , η_t(v_n)). Further, b_t ∈ R^{m×m} is a diagonal matrix with entries given as in Eq. (A.3).
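In code, one pass of the iteration (A.3)–(A.4) reads as follows; the polynomial η_t(u) = u − u³/3 and the Gaussian test instance are arbitrary illustrative choices (the paper takes the η_t to be polynomial approximations of the soft threshold; the time dependence of η_t is suppressed here):

```python
import numpy as np

def run(A, x0, t_max, eta, eta_prime):
    """Sketch of the iteration (A.3)-(A.4) with a time-independent polynomial eta."""
    m, n = A.shape
    y = A @ x0                         # Eq. (A.1)
    D = np.sum(A ** 2, axis=0)         # D_ii = squared norm of column i
    u0 = (D - 1.0) * x0                # Eq. (A.2)
    x = (1.0 - 1.0 / D) * x0           # x^0 = (I - D^{-1}) x0
    x_prev, z = x.copy(), np.zeros(m)
    for _ in range(t_max):
        b = (A ** 2) @ eta_prime(D * x_prev + A.T @ z - u0)   # diag of b_t, Eq. (A.3)
        z = y - A @ x + b * z                                 # Eq. (A.3)
        x_prev, x = x, eta(D * x + A.T @ z - u0)              # Eq. (A.4)
    return x, z

eta = lambda u: u - u ** 3 / 3.0       # placeholder polynomial eta_t
eta_p = lambda u: 1.0 - u ** 2         # its derivative
rng = np.random.default_rng(0)
A = rng.normal(size=(30, 60)) / np.sqrt(30)
x_out, z_out = run(A, 0.1 * rng.normal(size=60), t_max=3, eta=eta, eta_prime=eta_p)
```

Note that the Onsager coefficient b uses the previous iterate x^{t−1} and the previous residual, exactly as prescribed by the index pattern of (A.3).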
We next introduce the corresponding state evolution recursion. Namely, we define {R_{s,t}}_{s,t≥0} recursively for all s, t ≥ 0 by letting

    R_{s+1,t+1} = (1/δ) E{ [η_s(X + Z_s) − X][η_t(X + Z_t) − X] } .        (A.5)

Here the expectation is with respect to X ∼ p_X and the independent Gaussian vector [Z_s, Z_t] with zero mean and covariance given by E{Z_s²} = R_{s,s}, E{Z_t²} = R_{t,t} and E{Z_tZ_s} = R_{t,s}. The boundary condition is fixed by letting R_{0,0} = E{X²}/δ and defining, for each t ≥ 0,

    R_{0,t+1} = (1/δ) E{ [η_t(X + Z_t) − X][−X] } ,        (A.6)

with Z_t ∼ N(0, R_{t,t}). This uniquely determines the doubly infinite array {R_{t,s}}_{t,s≥0}.
Corollary 16. Let {(x_0(n), A(n), y(n))}_{n≥0} be a sequence of triples with A(n) having independent subgaussian entries with E{A_{ij}} = 0, E{A_{ij}²} = 1/m, {x_{0,i}(n) : i ∈ [n]} independent and identically distributed with x_{0,i}(n) ∼ p_X, and p_X a finite mixture of Gaussians. Define {x^t, z^t}_{t≥0} as per Eqs. (A.3), (A.4).

Then, for any fixed t, s ≥ 0 and any Lipschitz continuous functions ψ : R × R × R → R, φ : R × R → R, in probability,

    lim_{n→∞} (1/n) Σ_{i=1}^n ψ( x_{0,i}, x_i^s + (A^Tz^s)_i, x_i^t + (A^Tz^t)_i ) = E ψ(X, X + Z_s, X + Z_t) ,        (A.7)

    lim_{n→∞} (1/m) Σ_{i=1}^m φ(z_i^s, z_i^t) = E φ(Z_s, Z_t) ,        (A.8)

where the expectation is with respect to X ∼ p_X and the independent Gaussian vector [Z_s, Z_t] with zero mean and covariance given by E{Z_s²} = R_{s,s}, E{Z_t²} = R_{t,t} and E{Z_tZ_s} = R_{t,s}.
Proof. Define x̆^{t+1} ≡ A^Tz^t + Dx^t − Dx_0. Then Eqs. (A.3), (A.4) read

    z^t = A f(x̆^t, x_0; t) + b_t h(z^{t−1}; t − 1) ,        (A.9)

    x̆^{t+1} = A^T h(z^t; t) + d_t f(x̆^t, x_0; t) ,        (A.10)

where, for i ∈ [m], j ∈ [n],

    f(x, y; t) = y − η_{t−1}(y + x) ,    h(z; t) = z ,        (A.11)

    (b_t)_{ii} = −Σ_{j∈[n]} A_{ij}² f′(x̆_j^t, x_{0,j}; t) ,        (A.12)

    (d_t)_{jj} = −Σ_{i∈[m]} A_{ij}² h′(z_i^t; t) .        (A.13)

(Here f′(x, y; t), h′(x; t) denote derivatives with respect to the first argument.) The iteration takes the same form as in Eqs. (4.3), (4.4) with Y(i) = x_{0,i}, W(i) = 0, B_t = −b_t and D_t = −d_t. Further, the initial condition x^0 implies x̆^0 = −x_0. Notice that this is dependent on Y = x_0, but we can easily set the initial condition at x̆^{−1} = 0 and define f(x, y; t = 0) = −y. We can therefore apply Theorem 6 and conclude that, in probability,

    lim_{n→∞} (1/n) Σ_{i=1}^n ψ( x_{0,i}, D_{ii}(x_i^s − x_{0,i}) + (A^Tz^s)_i, D_{ii}(x_i^t − x_{0,i}) + (A^Tz^t)_i ) = E ψ(X, Z_s, Z_t) ,        (A.14)

    lim_{n→∞} (1/m) Σ_{i=1}^m φ(z_i^s, z_i^t) = E φ(Z_s, Z_t) ,        (A.15)

where the expectations are defined as in the statement of the corollary. The second of these equations coincides with Eq. (A.8). For the first one, note that E{D_{ii}} = 1 and, by a standard Chernoff bound,

    lim_{n→∞} max{ D_{ii} : i ∈ [n] } = 1 ,        (A.16)

    lim_{n→∞} min{ D_{ii} : i ∈ [n] } = 1 .        (A.17)

We therefore get

    lim_{n→∞} (1/n) Σ_{i=1}^n ψ( x_{0,i}, (x^s + A^Tz^s)_i − x_{0,i}, (x^t + A^Tz^t)_i − x_{0,i} ) = E ψ(X, Z_s, Z_t) ,        (A.18)

which coincides with Eq. (A.7) after a redefinition of the function ψ.
A.2 Proofs of Propositions 14 and 15
We will start by proving Proposition 14. Since Proposition 15 follows from the same construction, we will only point out the necessary modifications. Before presenting the proof, we recall a basic result in weighted polynomial approximation (stated here for a specific case); see e.g. [Lub07].

Theorem 9. Let f : R → R be a continuous function. Then for any κ, ξ > 0 there exists a polynomial p : R → R such that, for all x ∈ R,

|f(x) − p(x)| ≀ ξ e^{κx²/2}.   (A.19)
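A quick finite-interval illustration of approximating the (non-polynomial) soft threshold by a polynomial, in the spirit of Theorem 9. Note that the theorem gives a weighted bound on all of R; the bounded-interval least-squares fit below does not establish that, and the degree and interval are arbitrary choices:

```python
import numpy as np

# Soft threshold eta(x; 1), the kind of function the proof approximates.
soft = lambda x: np.sign(x) * np.maximum(np.abs(x) - 1.0, 0.0)

# Degree-25 Chebyshev least-squares fit on [-4, 4].
xs = np.linspace(-4.0, 4.0, 2001)
p = np.polynomial.chebyshev.Chebyshev.fit(xs, soft(xs), deg=25)
max_err = float(np.max(np.abs(p(xs) - soft(xs))))
mean_err = float(np.mean(np.abs(p(xs) - soft(xs))))
```

The fit is accurate away from the kinks at ±1, where the error concentrates; increasing the degree shrinks both errors.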
Proof of Proposition 14. Since the proposition holds as n → ∞ at t fixed, we shall assume throughout that t ∈ {0, 1, . . . , t_max} for some fixed, arbitrarily large t_max.

We claim that, for each β, t_max > 0, we can construct an orbit {x^{β,t}, z^{β,t}}_{t≄0} obeying Eqs. (A.3), (A.4) for suitable functions η_t = η^{(β)}_t such that the following holds (with a slight abuse of notation we will drop the parameter β from x^{β,t}, z^{β,t}). For all 0 ≀ t ≀ t_max, and all functions ψ as in the statement, we have z^t = y − A x^t + b_t z^{t−1} by construction. Further, in probability,

lim_{n→∞} max_{i∈[m]} | (b_t)_{ii} − (1/ÎŽ) P{ |X + σ_{t−1} Z| ≄ ασ_{t−1} } | ≀ β,   (A.20)

lim_{n→∞} (1/n) ‖ x^{t+1} − η(x^t + A^T z^t; ασ_t) ‖₂² ≀ β,   (A.21)

lim_{n→∞} | (1/n) ∑_{i=1}^n ψ( x_{0,i}, x^t_i + (A^T z^t)_i ) − E ψ(X, X + σ_t Z) | ≀ β.   (A.22)
Assuming this claim holds, let {β_ℓ}_{ℓ≄0} be a sequence such that lim_{ℓ→∞} β_ℓ = 0. Denote by {x^{ℓ,t}, z^{ℓ,t}}_{t≄0} the orbit satisfying Eqs. (A.20), (A.21), (A.22) with β = β_ℓ. Let η^ℓ_t = η^{(β_ℓ)}_t be the corresponding polynomial, and b^ℓ_t be given per Eq. (A.3). Fix an increasing sequence of instance sizes n_1 < n_2 < n_3 < . . . , and let x^t(n) = x^{ℓ,t}(n), z^t(n) = z^{ℓ,t}(n) for all n_ℓ ≀ n < n_{ℓ+1}. Choosing {n_ℓ}_{ℓ≄0} increasing rapidly enough, we can ensure that, for all n ≄ n_ℓ,

max_{i∈[m]} | (b^ℓ_t)_{ii} − (1/ÎŽ) P{ |X + σ_{t−1} Z| ≄ ασ_{t−1} } | ≀ 2β_ℓ,   (A.23)

(1/n) ‖ x^{ℓ,t+1} − η(x^{ℓ,t} + A^T z^{ℓ,t}; ασ_t) ‖₂² ≀ 2β_ℓ,   (A.24)

| (1/n) ∑_{i=1}^n ψ( x_{0,i}, x^{ℓ,t}_i + (A^T z^{ℓ,t})_i ) − E ψ(X, X + σ_t Z) | ≀ 2β_ℓ,   (A.25)

with probability larger than 1 − β_ℓ. Points 1, 2, 3 in the proposition then follow since β_ℓ → 0.

In order to prove Eqs. (A.20) to (A.22) we proceed as follows. It is easy to check that σ_t > 0 for
all t, cf. Eq. (5.8). We use Theorem 9 to construct polynomials η_t such that

| η(x; ασ_t) − η_t(x) | ≀ ξ exp{ x² / (16 max(σ²_t, s²)) },   (A.26)

for all x ∈ R. Here ξ > 0 is a small parameter to be chosen below, and s² is the smallest variance of the Gaussians that are combined in p_X. Let τ_t be defined by

τ²_{t+1} = (1/ÎŽ) E{ [η_t(X + τ_t Z) − X]² },   (A.27)

with Z ∌ N(0, 1) independent of X ∌ p_X, and τ²_0 = E{X²}/ÎŽ. Notice that τ²_t = R_{t,t}. From Eqs. (5.8), (A.26), and (A.27), it is then straightforward to show that |τ²_t − σ²_t| ≀ C ξ for some C = C(t).

Given polynomials as defined by (A.26), we define {x^t, z^t}_{t≄0} as per Eqs. (A.3), (A.4), with the initial condition given there. Equation (A.22) follows immediately from Corollary 16 for ξ sufficiently small. Equation (A.21) also follows from the same Corollary, by taking

ψ(x₁, x₂, x₃) = { η_t(x₃) − η(x₃; ασ_t) }²,   (A.28)

and then using once again Eq. (A.26) on the resulting expression.

Finally, consider Eq. (A.20). For economy of notation, we write
(b_t)_{ii} = ∑_{j∈[n]} A²_{ij} ρ_j,   ρ_j = η′_{t−1}( D_{jj} x^{t−1}_j + (A^T z^{t−1})_j − u_{0,j} ),   (A.29)

and further define

b^av_t = (1/m) ∑_{j∈[n]} ρ_j.   (A.30)
Then we have

E{ ( (b_t)_{ii} − b^av_t )^4 } = ∑_{j₁,j₂,j₃,j₄∈[n]} E{ (A²_{ij₁} − 1/m)(A²_{ij₂} − 1/m)(A²_{ij₃} − 1/m)(A²_{ij₄} − 1/m) ρ_{j₁} ρ_{j₂} ρ_{j₃} ρ_{j₄} }
  = ∑_{j₁,j₂,j₃,j₄∈[n]} E(j₁, j₂, j₃, j₄).
Using the tree representation in Section 3.2, it is not hard to prove that the expectation on the right-hand side is bounded as follows:

E(p, q, r, s) ≀ K/n^6,   p, q, r, s distinct,
E(q, q, r, s) ≀ K/n^5,   q, r, s distinct,
E(r, r, s, s) ≀ K/n^4,   r, s distinct,
E(r, r, r, s) ≀ K/n^4,   r, s distinct,
E(r, r, r, r) ≀ K/n^3.

Consider for instance the first case, p, q, r, s distinct. Using Lemma 3, each of ρ_p, ρ_q, ρ_r, ρ_s can be represented as a sum over trees with root type respectively p, q, r, s. The weight of these trees is as in Lemma 3, times the prefactor (A²_{ip} − m^{−1}) ⋯ (A²_{is} − m^{−1}). Let ÎŒ be the total number of edges in these trees, plus 8 (two for each of the additional factors). Then any non-vanishing contribution is of order n^{−Ό/2}. Let G be the graph obtained by identifying the vertices of the same type in these trees, and e(G) the number of its edges. Since each edge in G must be covered at least twice by the trees to get a non-zero expectation, and the edges (i, p), . . . , (i, s) at least once, we have 2e(G) + 4 ≀ ÎŒ. The number of vertices in G is at most e(G) + 1 (note that G is connected because it includes type i connected to p, q, r, s). Of these vertices all but 5 (whose types are i, p, q, r, s) can take an arbitrary type, yielding a combinatorial factor of order n^{e(G)+1−5} ≀ n^{ÎŒ/2−6}. Hence the sum over trees is of order n^{−Ό/2} n^{ÎŒ/2−6} = n^{−6}, as claimed.
Summing the above bounds over j₁, . . . , j₄, we obtain E{ ( (b_t)_{ii} − b^av_t )^4 } ≀ K/n² and therefore, by Markov's inequality,

lim_{n→∞} P{ max_{i∈[m]} |(b_t)_{ii} − b^av_t| ≄ n^{−1/5} } = 0.   (A.31)

Since by standard concentration bounds max_{i∈[n]} D_ii, min_{i∈[n]} D_ii → 1, we obtain, in probability,

lim_{n→∞} max_{i∈[m]} (b_t)_{ii} = lim_{n→∞} min_{i∈[m]} (b_t)_{ii} = lim_{n→∞} b^av_t
  = lim_{n→∞} (1/m) ∑_{j∈[n]} η′_{t−1}( x^{t−1}_j + (A^T z^{t−1})_j )
  = (1/ÎŽ) E{ η′_{t−1}(X + τ_{t−1} Z) },
where, in the last step, we applied Corollary 16 to the polynomial η′_{t−1}, with X ∌ p_X and Z ∌ N(0, 1) independent. We are left with the task of showing that, by taking ξ small enough in Eq. (A.26), we can ensure that

| E{ η′_{t−1}(X + τ_{t−1} Z) } − P{ |X + σ_{t−1} Z| ≄ ασ_{t−1} } | ≀ β ÎŽ.   (A.32)

Indeed, integrating by parts with respect to Z, the above difference can be written as (for K a finite constant that can depend on t and change from line to line)

| (1/τ_{t−1}) E{ Z η_{t−1}(X + τ_{t−1} Z) } − (1/σ_{t−1}) E{ Z η(X + σ_{t−1} Z; ασ_{t−1}) } |
  ≀ K E| Z η_{t−1}(X + τ_{t−1} Z) − Z η(X + τ_{t−1} Z; ασ_{t−1}) | + K |τ_{t−1} − σ_{t−1}|
  ≀ K ξ E{ exp{ (X² + τ²_{t−1} Z²) / (4 max(σ²_t, s²)) } } + K |τ_{t−1} − σ_{t−1}|
  ≀ K ξ + K |τ_{t−1} − σ_{t−1}|.

The claim follows by noting that, as argued above, |τ_{t−1} − σ_{t−1}| ≀ K′ ξ.
Consider finally point 4. First recall that we constructed the vectors {x^t, z^t}_{t≄0} using a sequence of orbits {x^{ℓ,t}, z^{ℓ,t}}_{t≄0}, indexed by ℓ ∈ N, that obey Eqs. (A.3), (A.4), letting

x^t(n) = x^{ℓ,t}(n),   z^t(n) = z^{ℓ,t}(n),   for all n with n_ℓ ≀ n < n_{ℓ+1}.   (A.33)
Claim 17. There exists a sequence {β_ℓ}_{ℓ∈N} with lim_{ℓ→∞} β_ℓ = 0 such that, for all ℓ′ ≄ ℓ,

lim_{n→∞} (1/n) ∑_{i∈[n]} E{ ( (x^{ℓ′,t} + A^T z^{ℓ′,t})_i − (x^{ℓ,t} + A^T z^{ℓ,t})_i )² } ≀ β_ℓ,   (A.34)

lim_{n→∞} (1/m) ∑_{i∈[m]} E{ ( z^{ℓ′,t}_i − z^{ℓ,t}_i )² } ≀ β_ℓ.   (A.35)
The proof of this claim is presented below. It follows from the claim that, by redefining n_{ℓ′} to be larger if necessary, we can ensure

E{ ( (x^{ℓ′,t} + A^T z^{ℓ′,t})_I − (x^{ℓ,t} + A^T z^{ℓ,t})_I )² } ≀ 2β_ℓ,
E{ ( z^{ℓ′,t}_J − z^{ℓ,t}_J )² } ≀ 2β_ℓ,

for all n ≄ n_{ℓ′}. Here and below the expectation is taken also with respect to I uniformly random in [n] and J uniformly random in [m]. By Eq. (A.33), for all n ≄ n_ℓ, we also have

E{ ( (x^t + A^T z^t)_I − (x^{ℓ,t} + A^T z^{ℓ,t})_I )² } ≀ 2β_ℓ,
E{ ( z^t_J − z^{ℓ,t}_J )² } ≀ 2β_ℓ.

Applying Lemma 4, we can then construct {x^t, z^t}_{t≄0} as in the statement of point 4, such that

E{ ( (x^t + A^T z^t)_I − (x^{ℓ,t} + A^T z^{ℓ,t})_I )² } ≀ K (Îœ² + n^{−1/2}),
E{ ( z^t_J − z^{ℓ,t}_J )² } ≀ K (Îœ² + n^{−1/2}),

where K depends on ℓ but not on Μ or n. The proof is finished by using the triangle inequality and selecting ℓ = ℓ(Μ, t) diverging slowly enough as Μ → 0.
We now prove Claim 17.

Proof of Claim 17. To be definite, we focus on Eq. (A.34). Fix ℓ, ℓ′ ∈ N (not necessarily distinct). By an immediate generalization of Corollary 16, we have, in probability,

lim_{n→∞} (1/n) ∑_{i∈[n]} E{ (x^{ℓ,t} + A^T z^{ℓ,t} − x_0)_i (x^{ℓ′,t} + A^T z^{ℓ′,t} − x_0)_i } = Q^t_{ℓ,ℓ′}.   (A.36)
Further, the quantities Q^t_{ℓ,ℓ′} satisfy the state evolution recursion

Q^{t+1}_{ℓ,ℓ′} = (1/ÎŽ) E{ [η^ℓ_t(X + Z_{t,ℓ}) − X] [η^{ℓ′}_t(X + Z_{t,ℓ′}) − X] },   (A.37)

with initial condition Q^0_{ℓ,ℓ′} = (1/ÎŽ) E{X²}. Here the expectation is taken with respect to X ∌ p_X and the independent centered Gaussian vector (Z_{t,ℓ}, Z_{t,ℓ′}) with covariance E{Z²_{t,ℓ}} = Q^t_{ℓ,ℓ}, E{Z²_{t,ℓ′}} = Q^t_{ℓ′,ℓ′}, E{Z_{t,ℓ} Z_{t,ℓ′}} = Q^t_{ℓ,ℓ′}. In order to prove the claim, it is therefore sufficient to show that
that
lim`ââ
sup`âČ: `âČâ„`
âŁâŁQt`,`âČ â Ï2
t
âŁâŁ = 0 , (A.38)
since this implies lim`ââ sup`âČ: `âČâ„`[Qt`,` â 2Qt
`,`âČ + Qt`âČ,`âČ ] = 0, which in turn implies the Claim, via
Eq. (A.36).Finally, recall that η`
t was constructed using Theorem 9, cf. Eq. (A.26), in such a way that, forall x â R,
âŁâŁÎ·(x;αÏt)â η`t (x)
âŁâŁ †Ο` exp
{x2
16max(Ï2t , s
2)
}, (A.39)
with Ο` â 0 as ` â â. The desired estimate (A.38) then follows by recalling that Ï2t+1 =
(1/ÎŽ)E{[η(X+ÏtZ)âX]2
}and using Eq. (A.39) inductively to show that
âŁâŁQt`,`âČâÏ2
t
âŁâŁ †K(t) Ο`.
We finally sketch the proof of Proposition 15.

Proof of Proposition 15. The sequence {x^t, z^t}_{t≄0} is constructed as in the previous statement. The proof then follows by using Corollary 16 and taking ξ small enough in Eq. (A.26), since we can ensure that the covariances R_{t,s} are within β′ of their state evolution counterparts, for any β′ > 0 and any t, s ≀ t_max (as shown above for the case t = s).
B Proof of Lemma 5
Throughout the proof we denote by C₁, C₂, C₃, etc., positive constants that depend only on c₁, . . . , c₃.

Consider the ℓ₁ minimization problem

minimize ‖x‖₁,
subject to Ax = y ≡ Ax₀,

and denote by x any minimizer. Further, let v be a subgradient as in the statement, and define, for some c ∈ (0, 1),

S(c) ≡ { i ∈ [n] : |v_i| ≄ 1 − c }.   (B.1)

Also, let S̄(c) = [n] \ S(c) be the complement of this set. Notice that, by definition of subgradient, we have v_i = sign(x_{0,i}) for all i ∈ S and |v_i| ≀ 1 for all i in S̄ ≡ [n] \ S. This implies that S ⊆ S(c).
We have

‖x‖₁ = ‖x₀‖₁ + ⟹v, x − x₀⟩ + R₁ + R₂,   (B.2)
R₁ = ‖x_{S(c)}‖₁ − ‖x_{0,S(c)}‖₁ − ⟹v_{S(c)}, (x − x₀)_{S(c)}⟩,   (B.3)
R₂ = ‖x_{S̄(c)}‖₁ − ‖x_{0,S̄(c)}‖₁ − ⟹v_{S̄(c)}, (x − x₀)_{S̄(c)}⟩.   (B.4)

Since S̄(c) ⊆ S̄, we have x_{0,S̄(c)} = 0 and hence

R₂ = ‖x_{S̄(c)}‖₁ − ⟹v_{S̄(c)}, x_{S̄(c)}⟩ = ∑_{i∈S̄(c)} ( |x_i| − v_i x_i ) ≄ ∑_{i∈S̄(c)} ( |x_i| − (1 − c)|x_i| ) = c ‖x_{S̄(c)}‖₁.   (B.5)

On the other hand, v_{S(c)} is in the subgradient of ‖x_{S(c)}‖₁ at x_{S(c)} = x_{0,S(c)}. Hence R₁ ≄ 0. It follows that Eq. (B.2) implies ‖x‖₁ ≄ ‖x₀‖₁ + ⟹v, x − x₀⟩ + c ‖x_{S̄(c)}‖₁. Since x is a minimizer, we thus get

‖x_{S̄(c)}‖₁ ≀ −(1/c) ⟹v, x − x₀⟩ = −(1/c) ⟹w, x − x₀⟩ ≀ (Δ/c) √n ‖x − x₀‖₂,   (B.6)

where in the last step we used Cauchy–Schwarz together with assumption 1. Hereafter we let r ≡ x − x₀.
where in the last step we used Cauchy-Schwarz together with assumption 1. Hereafter we let r âĄxâ x0.
Let S(c) = âȘK`=1S` be a partition such that nc/2 †|S`| †nc, and that |ri| †|rj| for each i â S`,
j â S`â1. If |S(c)| < nc/2, such a partition does not exist, but the argument follows by an obviousmodification of the one below. Further define S+ = âȘK
`=2S` â S(c) and S+ = [n] \ S+. We have
ârS+â22 =
Kâ
`=2
ârS`â22 â€
Kâ
`=2
|S`|(ârS`â1
â1
|S`â1|
)2
†4
nc
Kâ1â
`=1
ârS`â21 â€
4
ncârS(c)â2
1 . (B.7)
Fix c = c₁. Since S̄(c) ⊆ S̄, we have r_{S̄(c)} = x_{S̄(c)}, and using Eq. (B.6) we conclude that there exists C₁ ≀ 4/c₁³ such that

‖r_{S̄₊}‖₂² ≀ C₁ Δ² ‖r‖₂².   (B.8)

On the other hand, by definition Ar = 0, and hence A_{S₊} r_{S₊} + A_{S̄₊} r_{S̄₊} = 0. Since S̄(c) ⊆ S̄, we have S ⊆ S(c) ⊆ S₊. Further S₊ \ S(c) = S₁, whence |S₊ \ S(c)| ≀ nc = nc₁. By assumption 2, we have σ_min(A_{S₊}) ≄ c₂ and therefore

‖r_{S₊}‖₂ ≀ (1/c₂) ‖A_{S₊} r_{S₊}‖₂ = (1/c₂) ‖A_{S̄₊} r_{S̄₊}‖₂ ≀ (c₃/c₂) ‖r_{S̄₊}‖₂.

Combining this with Eq. (B.8), we deduce that ‖r‖₂ ≀ C₂ Δ ‖r‖₂ for some C₂ = C₂(c₁, c₂, c₃), which in turn implies r = 0 provided that C₂ Δ < 1. The claim hence follows for Δ₀ = 1/[2C₂(c₁, c₂, c₃)].
C Asymptotic analysis of state evolution: Proof of Lemma 6
Before proceeding, we introduce the following piece of notation (following [BM12]). Fix a probability distribution p_X on R, with p_X({0}) = 1 − ε, and ÎŽ > 0. For Ξ, σ² > 0, we define

F(σ², Ξ) ≡ (1/ÎŽ) E{ [η(X + σZ; Ξ) − X]² },   (C.1)

where the expectation is taken with respect to the independent random variables X ∌ p_X and Z ∌ N(0, 1). When necessary, we will indicate the dependence on p_X by F(σ², Ξ; p_X). With this notation the state evolution recursion reads σ²_{t+1} = F(σ²_t, ασ_t). The following properties of the function F were proved in [DMM09] (but see also [BM12], Appendix A, for a more explicit treatment).

Lemma 7 ([DMM09]). For any α > 0, the mapping σ² ↩ F(σ², ασ) is monotone increasing and concave, with F(0, 0) = 0 and

d/d(σ²) F(σ², ασ) |_{σ=0} = (1/ÎŽ) { ε(1 + α²) + 2(1 − ε) E[(Z − α)₊²] }.   (C.2)
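The recursion σ²_{t+1} = F(σ²_t, ασ_t) is easy to simulate. The sketch below (a standalone illustration; the mixture prior p_X = (1 − ε)ÎŽ₀ + ε N(0, 1) is our assumption, not fixed by the text) evaluates one step of F by Monte Carlo and exhibits the two regimes analyzed in Lemma 6: σ²_t → 0 above the phase boundary, and σ²_t → σ²* > 0 below it:

```python
import numpy as np

def se_step(sigma2, alpha, delta, eps, n=400000, seed=0):
    """Monte Carlo estimate of F(sigma^2, alpha*sigma) for
    p_X = (1-eps) delta_0 + eps N(0,1) (an illustrative choice)."""
    rng = np.random.default_rng(seed)
    X = np.where(rng.random(n) < eps, rng.standard_normal(n), 0.0)
    Z = rng.standard_normal(n)
    sigma = np.sqrt(sigma2)
    Y = X + sigma * Z
    theta = alpha * sigma
    eta = np.sign(Y) * np.maximum(np.abs(Y) - theta, 0.0)   # soft threshold
    return float(np.mean((eta - X) ** 2) / delta)

def run_se(alpha, delta, eps, t_max=40):
    sigma2 = eps / delta          # sigma_0^2 = E{X^2}/delta for this prior
    for t in range(t_max):
        sigma2 = se_step(sigma2, alpha, delta, eps, seed=t)
    return sigma2
```

With ε = 0.1 the boundary value ÎŽ*(ε) is about 0.33, so ÎŽ = 0.5 and ÎŽ = 0.25 land on opposite sides of it.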
It is also convenient to define

G_ε(α) ≡ ε(1 + α²) + 2(1 − ε) E{ (Z − α)₊² }   (C.3)
       = ε(1 + α²) + 2(1 − ε) [ (1 + α²)Ί(−α) − α φ(α) ].

The first two derivatives of α ↩ G_ε(α) will be used in the proof:

G′_ε(α) = 2αΔ + 4(1 − ε) [ −φ(α) + αΊ(−α) ],   (C.4)
G″_ε(α) = 2Δ + 4(1 − ε) Ί(−α).   (C.5)
In particular, we have the following.

Lemma 8. For any ε ∈ (0, 1), α ↩ G_ε(α) is strictly convex on R₊, with a unique minimizer α*(ε) ∈ (0, ∞). Further G_ε(0) = 1 and lim_{α→∞} G_ε(α) = ∞. Finally, the minimum value satisfies

G_ε(α*) = Δ + 2(1 − ε)Ί(−α*) = (1/2) G″_ε(α*) ∈ (0, 1).   (C.6)

Proof. By inspection of Eq. (C.5), G″_ε(α) > 0 for all α > 0, hence G_ε(α) is strictly convex. Further, from Eq. (C.4), we have G′_ε(0) = −4(1 − ε)φ(0) < 0 and G′_ε(α) = 2αΔ + O_α(1) > 0 as α → ∞. Hence α ↩ G_ε(α) has a unique minimizer α*(ε) ∈ (0, ∞).

Finally, Eq. (C.6) follows immediately by using the condition G′_ε(α*) = 0 in the expression (C.3).
In our proof it is more convenient to use the coordinates (ÎŽ, ε) instead of (ρ, ÎŽ). In terms of the latter, the phase boundary (1.2), (1.3) reads

ÎŽ*(ε) = 2φ(α*(ε)) / ( α*(ε) + 2[ φ(α*(ε)) − α*(ε)Ί(−α*(ε)) ] ),   (C.7)
where α*(ε) solves   Δα + 2(1 − ε) [ αΊ(−α) − φ(α) ] = 0.   (C.8)

Notice that the use of the symbol α*(ε) in the last equations is not an abuse of notation. Indeed, comparing Eq. (C.8) with (C.4), we conclude that α*(ε) is indeed the unique solution of G′_ε(α) = 0. Further, comparing Eq. (C.7) with Eq. (C.3), we obtain the following.
Lemma 9. Let (ÎŽ, ρ*(ÎŽ)) be the phase boundary defined by Eqs. (1.2), (1.3). Then, for ρ, ÎŽ ∈ [0, 1] and ε ∈ (0, 1), ÎŽ ∈ (ε, 1), we have ρ > ρ*(ÎŽ) if and only if

ÎŽ < ÎŽ*(ε) ≡ min_{α>0} G_ε(α).   (C.9)

Vice versa, ρ < ρ*(ÎŽ) if and only if ÎŽ > ÎŽ*(ε).
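As a standalone numerical sanity check, one can verify that the stationarity condition (C.8) and the explicit formula (C.7) reproduce the variational value ÎŽ*(ε) = min_{α>0} G_ε(α) of Eq. (C.9):

```python
import math

Phi = lambda x: 0.5 * math.erfc(-x / math.sqrt(2.0))               # normal CDF
phi = lambda x: math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)  # normal density

def G(eps, a):
    # G_eps(alpha), Eq. (C.3)
    return eps * (1 + a * a) + 2 * (1 - eps) * ((1 + a * a) * Phi(-a) - a * phi(a))

def Gp(eps, a):
    # G'_eps(alpha), Eq. (C.4)
    return 2 * a * eps + 4 * (1 - eps) * (a * Phi(-a) - phi(a))

def alpha_star(eps, lo=1e-8, hi=20.0):
    # unique root of G' (Lemma 8), located by bisection
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if Gp(eps, mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def delta_star(eps):
    a = alpha_star(eps)
    # explicit formula (C.7)
    return 2 * phi(a) / (a + 2 * (phi(a) - a * Phi(-a)))
```

At the stationary point the two expressions coincide, since substituting G′_ε(α*) = 0 into G_ε(α*) gives exactly 2(1 − ε)φ(α*)/α*, which equals the right-hand side of (C.7).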
C.1 Proof of Lemma 6.(a): ρ < ρ*(ÎŽ)

Proof of Lemma 6.(a1). We set α = α*(ε) ≡ arg min_{α≄0} G_ε(α). Hence we have, by Lemma 7 and Lemma 9,

d/d(σ²) F(σ², α*σ) |_{σ²=0} = (1/ÎŽ) min_{α>0} G_ε(α) = ÎŽ*(ε)/ÎŽ.   (C.10)

In particular, by Lemma 9, for ρ < ρ*(ÎŽ) we have d/d(σ²) F(σ², α*σ) |_{σ²=0} ≡ ω*(ε, ÎŽ) ∈ (0, 1). Since, by Lemma 7, σ² ↩ F(σ², α*σ) is concave, it follows that σ²_t = B ω*^t [1 + o_t(1)].

Let S ≡ { α ∈ R₊ : G_ε(α)/ÎŽ < 1 }. Since α ↩ G_ε(α) is strictly convex by Lemma 8, with G_ε(0), G_ε(∞) > ÎŽ, we have S = (α₁, α₂) with 0 < α₁ < α* < α₂ < ∞. Let ω(α) ≡ G_ε(α)/ÎŽ. Fixing α ∈ (α₁, α₂), by concavity of σ² ↩ F(σ², ασ), we have σ²_t = B ω(α)^t [1 + o_t(1)]. Finally, by continuity of α ↩ G_ε(α), we have { ω(α) : α ∈ (α₁, α₂) } = [ω*, 1), and hence any rate ω ∈ [ω*, 1) can be realized.

Finally, by Lemma 8, G_ε(α*) = Δ + 2(1 − ε)Ί(−α*) < ÎŽ. Since α ↩ Δ + 2(1 − ε)Ί(−α) is decreasing in α, the last claim follows.
In the proof of part (a2) we will make use of the following analytical result.

Lemma 10. For ε ∈ (0, 1), α ≄ α*(ε), consider the function F_{α,ε} : [0, 1] → R defined by

F_{α,ε}(Q) ≡ (1/G_ε(α)) E{ [η(X* + Z₁; α) − X*][η(X* + Z₂; α) − X*] },   (C.11)

where the expectation is taken with respect to X*, with P{X* = 0} = 1 − ε, P{X* ∈ {+∞, −∞}} = ε, and the independent Gaussian vector (Z₁, Z₂) with mean zero and covariance E{Z₁²} = E{Z₂²} = 1, E{Z₁Z₂} = Q. (The mapping x ↩ [η(x + a; b) − x] is here extended to x = +∞, −∞ by continuity, for any a, b bounded.)

Then F_{α,ε} is increasing and convex on [0, 1] with F_{α,ε}(1) = 1 and F′_{α,ε}(1) < 1. In particular, F_{α,ε}(Q) > Q for all Q ∈ [0, 1).
Proof. It is convenient to change variables and let Q = e^{−s}. Let {U_s}_{s∈R} denote the standard Ornstein–Uhlenbeck process, dU_s = −U_s ds + √2 dB_s, with {B_s}_{s∈R} the standard Brownian motion. Then F_{α,ε}(Q) = F̃_{α,ε}(−log Q), with

F̃_{α,ε}(s) ≡ (1/G_ε(α)) E{ [η(X* + U₀; α) − X*][η(X* + U_s; α) − X*] }.   (C.12)

A simple calculation yields

d/ds F̃_{α,ε}(s) = −(1/G_ε(α)) E{ η′(X* + U₀; α) η′(X* + U_s; α) } e^{−s},   (C.13)
where η′( · ; α) denotes the derivative of η with respect to its first argument. By the spectral decomposition of the Ornstein–Uhlenbeck process, we have, for any function ψ ∈ L²(R),

E{ ψ(U₀) ψ(U_s) } = ∑_{k=1}^∞ e^{−λ_k s} c_k(ψ)²,   (C.14)

for some non-negative {λ_k}_{k≄1}. In particular, e^s (d/ds) F̃_{α,ε}(s) is strictly negative and increasing in s. We therefore obtain

d/dQ F_{α,ε}(Q) = (1/G_ε(α)) E{ η′(X* + Z₁; α) η′(X* + Z₂; α) },   (C.15)

which is strictly positive and increasing in Q. Hence Q ↩ F_{α,ε}(Q) is increasing and strictly convex. Finally, since η′(y; α) = 1(|y| ≄ α), we have

d/dQ F_{α,ε}(Q) |_{Q=1} = (1/G_ε(α)) P{ |X* + Z| ≄ α } = (1/G_ε(α)) { ε + 2(1 − ε)Ί(−α) } = G″_ε(α) / (2G_ε(α)).   (C.16)

Since, by Lemma 8, α ↩ G_ε(α) is strictly increasing over (α*(ε), ∞) and, by Eq. (C.5), α ↩ G″_ε(α) is strictly decreasing over R₊, we have

d/dQ F_{α,ε}(Q) |_{Q=1} < G″_ε(α*(ε)) / (2G_ε(α*(ε))) = 1,   (C.17)

where the last equality follows again by Lemma 8. This concludes the proof.
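The qualitative content of Lemma 10 can also be checked by direct Monte Carlo, representing the ±∞ atoms of X* through the continuity limits η(x + Z; α) − x → Z ∓ α as x → ±∞ (a standalone sketch; the normalizer G_ε is the closed form of Eq. (C.3)):

```python
import numpy as np

def G(eps, a):
    # G_eps(alpha), Eq. (C.3), in closed form
    from math import erfc, exp, pi, sqrt
    Phi_m = 0.5 * erfc(a / sqrt(2.0))               # Phi(-a)
    phi = exp(-0.5 * a * a) / sqrt(2.0 * pi)
    return eps * (1 + a * a) + 2 * (1 - eps) * ((1 + a * a) * Phi_m - a * phi)

def F_mc(eps, alpha, Q, n=400000, seed=3):
    """Monte Carlo estimate of F_{alpha,eps}(Q), Eq. (C.11)."""
    rng = np.random.default_rng(seed)
    z1 = rng.standard_normal(n)
    z2 = Q * z1 + np.sqrt(1.0 - Q * Q) * rng.standard_normal(n)
    u = rng.random(n)
    soft = lambda z: np.sign(z) * np.maximum(np.abs(z) - alpha, 0.0)
    # X* = 0 w.p. 1-eps; X* = +inf w.p. eps/2; X* = -inf w.p. eps/2
    g1 = np.where(u < 1 - eps, soft(z1), np.where(u < 1 - eps / 2, z1 - alpha, z1 + alpha))
    g2 = np.where(u < 1 - eps, soft(z2), np.where(u < 1 - eps / 2, z2 - alpha, z2 + alpha))
    return float(np.mean(g1 * g2)) / G(eps, alpha)
```

For α above α*(ε), the estimates satisfy F_{α,ε}(1) ≈ 1 and F_{α,ε}(Q) > Q for Q < 1, as the lemma asserts.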
We are now in a position to prove part (a2) of Lemma 6.

Proof of Lemma 6.(a2). Throughout the proof we fix α ∈ (α*(ε, ÎŽ), α₂(ε, ÎŽ)). Let the sequence {σ²_t}_{t≄0} be given as per the state evolution equation (5.8). Define Q_t ≡ R_{t,t−1}/(σ_t σ_{t−1}). By Proposition 15, Q_t is the covariance of two Gaussian random variables of variance 1; hence |Q_t| ≀ 1. Using Eq. (5.15) we further have

Q_{t+1} = F_t(Q_t),   (C.18)

F_t(Q) = ( σ_{t−1} / (ÎŽ σ_{t+1}) ) E{ [ η(X/σ_t + Z₁; α) − X/σ_t ][ η(X/σ_{t−1} + Z₂; α) − X/σ_{t−1} ] },   (C.19)

where the expectation is taken with respect to X ∌ p_X and the independent Gaussian random vector (Z₁, Z₂) with zero mean and covariance E{Z₁²} = 1, E{Z₂²} = 1, E{Z₁Z₂} = Q_t. By induction it is easy to check that Q_t ≄ 0 for all t.

For α ∈ (α₁, α₂), by part (a1) we have σ_t → 0. Hence X/σ_t converges in distribution (over the completed real line) to a random variable X* ∌ (1 − ε)ÎŽ₀ + ε₊Ύ₊∞ + Δ₋Ύ₋∞, where ε₊ ≡ P{X > 0}, Δ₋ ≡ P{X < 0}, ε = ε₊ + Δ₋. Hence the expectation in Eq. (C.19) converges pointwise to

E{ [η(X* + Z₁; α) − X*][η(X* + Z₂; α) − X*] }.   (C.20)

(Notice that this expectation depends on the distribution of X* only through ε, because of the symmetry properties of the function η.)
Further, by the proof of part (a1), as t → ∞ we have σ²_t → 0 and

σ²_{t+1} = [ d/d(σ²) F(σ², ασ) |_{σ=0} ] σ²_t + o(σ²_t) = (1/ÎŽ) G_ε(α) σ²_t + o(σ²_t).   (C.21)

Hence

lim_{t→∞} σ_{t−1}/σ_{t+1} = ÎŽ / G_ε(α).   (C.22)

Comparing Eqs. (C.11) and (C.19), we conclude that, for any Q ∈ [0, 1],

lim_{t→∞} F_t(Q) = F_{α,ε}(Q).   (C.23)

Further, the convergence is uniform, since the functions F_t are uniformly Lipschitz (see the proof of Lemma 10 above).

Consider now the sequence {Q_t}_{t≄0} and let Q* = lim inf_{t→∞} Q_t. Since Q_t ∈ [0, 1] for all t, we have Q* ∈ [0, 1] as well. We claim that in fact Q* = 1, and therefore lim_{t→∞} Q_t = 1, which implies the thesis.

In order to prove the claim, let {Q_{t(k)}}_{k∈N} be a subsequence that converges to Q*. Then

Q* = lim_{k→∞} F_{t(k)−1}(Q_{t(k)−1}) = lim_{k→∞} F_{α,ε}(Q_{t(k)−1}) ≄ F_{α,ε}( lim inf_{k→∞} Q_{t(k)−1} ) ≄ F_{α,ε}(Q*),   (C.24)

where, in the last step, we used the fact that F_{α,ε}( · ) is monotone increasing. Since F_{α,ε}(q) > q for all q ∈ [0, 1) by Lemma 10, we conclude that Q* = 1.
Before proving part (a3) of Lemma 6, we establish one more technical result.

Lemma 11. Let p_X be a probability measure on the real line such that p_X({0}) = 1 − ε and E_{p_X}{X²} < ∞. Assume p_X to be such that max( p_X((0, a)), p_X((−a, 0)) ) ≀ B a^b for some B, b > 0. Then, letting X* ∌ (1 − ε)ÎŽ₀ + ε₊Ύ₊∞ + Δ₋Ύ₋∞ (with the notation introduced above, namely ε₊ = p_X((0, +∞)) and Δ₋ = p_X((−∞, 0))):

| E{ [ η(X/σ_t + Z₁; α) − X/σ_t ][ η(X/σ_{t−1} + Z₂; α) − X/σ_{t−1} ] }   (C.25)
  − E{ [η(X* + Z₁; α) − X*][η(X* + Z₂; α) − X*] } | ≀ B′ ( σ^b_t + σ^b_{t−1} ),

for an eventually different constant B′. Here the expectation is taken with respect to X ∌ p_X and the independent Gaussian random vector (Z₁, Z₂) with zero mean and covariance E{Z₁²} = 1, E{Z₂²} = 1, E{Z₁Z₂} = Q_t. Further,

F(σ², ασ) = [ dF/d(σ²) (σ²; ασ) |_{σ=0} ] σ² + O(σ^{2+b}).   (C.26)
Proof. By the triangle inequality, the left-hand side of Eq. (C.25) can be upper bounded by D₁ + D₂, where

D₁ ≡ E{ [ η(X/σ_t + Z₁; α) − X/σ_t − η(X* + Z₁; α) + X* ][ η(X/σ_{t−1} + Z₂; α) − X/σ_{t−1} ] },
D₂ ≡ E{ [ η(X* + Z₁; α) − X* ][ η(X/σ_{t−1} + Z₂; α) − X/σ_{t−1} − η(X* + Z₂; α) + X* ] }.
Here X and X* are coupled in such a way that X = 0 if and only if X* = 0, and the two variables have the same sign otherwise. We focus on bounding D₁, since D₂ can be treated along the same lines. Letting R(x; Ξ) ≡ η(x; Ξ) − x, we have

D₁ = E{ [ R(X/σ_t + Z₁; α) − R(X* + Z₁; α) ][ R(X/σ_{t−1} + Z₂; α) + Z₂ ] } = D_{1,a} + D_{1,b},
D_{1,a} = E{ [ R(X/σ_t + Z₁; α) − R(X* + Z₁; α) ] R(X/σ_{t−1} + Z₂; α) },
D_{1,b} = Q_t E{ [ R′(X/σ_t + Z₁; α) − R′(X* + Z₁; α) ] },

where in the last line we used Stein's lemma to integrate over Z₂, and R′ denotes the derivative with respect to the first argument. Once again the two terms are treated along the same lines, and we will only consider D_{1,a}. We have

|D_{1,a}| ≀ α E{ | R(X/σ_t + Z₁; α) − R(X* + Z₁; α) | }
  ≀ α ε₊ E{ | R(X₊/σ_t + Z₁; α) − R(+∞; α) | } + α Δ₋ E{ | R(X₋/σ_t + Z₁; α) − R(−∞; α) | },   (C.27)

where X₊ (resp. X₋) is distributed as X conditioned on X > 0 (resp. X < 0). The function x ↩ R(x; α) − R(∞; α) is monotone decreasing, equal to 2α for x ≀ −α and to 0 for x ≄ α. Hence R̄(x) ≡ E_{Z₁}{ |R(x + Z₁; α) − R(+∞; α)| } is monotone decreasing, takes values in (0, 2α), and is upper bounded by C e^{−x²/4} for x ≄ 0. Denoting by F₊ the distribution of X₊, we have

E{ | R(X₊/σ_t + Z₁; α) − R(+∞; α) | } = E R̄(X₊/σ_t) = ∫₀^∞ |R̄′(x)| F₊(xσ_t) dx ≀ B′ σ^b_t.

The other term in Eq. (C.27) is bounded by the same argument. This concludes the proof of Eq. (C.25).
The proof of Eq. (C.26) follows from Eq. (C.25) once we notice that

F(σ², ασ) = (σ²/ÎŽ) E{ [ η(X/σ + Z; α) − X/σ ]² },

dF/d(σ²) (σ²; ασ) |_{σ=0} = (1/ÎŽ) E{ [η(X* + Z; α) − X*]² }.
The last lemma has a useful consequence that we will exploit in the ensuing proof of Lemma 6.(a3).

Corollary 18. Let F_{α,ε}(Q) be defined as per Eq. (C.11) and F_t(Q) as per Eq. (C.19), with p_X, α, ε satisfying the conditions of Lemma 6.(a3). Then there exist constants B, B′, b > 0 depending on p_X such that

sup_{Q∈[0,1]} | F_t(Q) − F_{α,ε}(Q) | ≀ B σ^b_t ≀ B′ ω^{bt/2}.
Proof. The second inequality follows from the first one using Lemma 6.(a1). Using Eq. (C.26), we have

σ²_{t−1}/σ²_{t+1} = [ σ²_t / F(σ²_t; ασ_t) ] · [ σ²_{t−1} / F(σ²_{t−1}; ασ_{t−1}) ] = ( ÎŽ/G_ε(α) )² { 1 + O(σ^b_t + σ^b_{t−1}) }.

The proof of the corollary is obtained by noting that σ_t = Θ(σ_{t−1}) and applying Eq. (C.25) to the expectation in Eq. (C.19).
Proof of Lemma 6.(a3). Define, as in the proof of part (a2), Q_t ≡ R_{t,t−1}/(σ_t σ_{t−1}), and recall that

Q_{t+1} = F_t(Q_t).

By Corollary 18 and Lemma 10, it follows that Q_t ≄ 1 − A ϑ^t for some constants A > 0, ϑ ∈ (0, 1). Indeed,

Q_{t+1} ≄ F_{α,ε}(Q_t) − B′ ω^{bt/2} ≄ 1 − B′ ω^{bt/2} − F′_{α,ε}(1)(1 − Q_t),

and the claim follows by noting that F′_{α,ε}(1) ∈ (0, 1) by Lemma 10.
Next, consider a sequence of centered Gaussian random variables (Z_t)_{t≄0} with covariance E{Z_t Z_s} = R_{t,s}. By the triangle inequality, we have, for any t < s,

( 2 − 2 R_{t,s}/(σ_t σ_s) )^{1/2} = E{ (Z_t/σ_t − Z_s/σ_s)² }^{1/2} ≀ ∑_{k=t+1}^s E{ (Z_k/σ_k − Z_{k−1}/σ_{k−1})² }^{1/2} = ∑_{k=t+1}^s (2 − 2Q_k)^{1/2} ≀ A′ ϑ^{t/2}.   (C.28)
Next consider the quantity in Eq. (5.19). We have

sup_{t,s≄t₀} P{ |X + Z_s| ≄ cσ_s ; |X + Z_t| < cσ_t }
  ≀ sup_{t≄t₀} P{ |X + Z_t| < cσ_t ; X ≠ 0 } + sup_{t,s≄t₀} P{ |Z_s/σ_s| ≄ c ; |Z_t/σ_t| < c ; X = 0 }
  = sup_{t≄t₀} P{ |X/σ_t + Z̃_t| < c ; X ≠ 0 } + sup_{t,s≄t₀} P{ |Z̃_s| ≄ c ; |Z̃_t| < c },   (C.29)

where (Z̃_s, Z̃_t) are Gaussian with E{Z̃_t²} = E{Z̃_s²} = 1 and E{Z̃_s Z̃_t} = R_{t,s}/(σ_t σ_s). The first term in Eq. (C.29) vanishes as t₀ → ∞, since σ_t → 0 as t → ∞, and the second vanishes by Eq. (C.28).
C.2 Proof of Lemma 6.(b): ρ > ρ*(ÎŽ)

Proof of Lemma 6.(b1), (b2). First notice that, with the definitions given in the previous section,

lim_{σ²→∞} d/d(σ²) F(σ², ασ) = (2/ÎŽ) E{ (Z − α)₊² } = (2/ÎŽ) { (1 + α²)Ί(−α) − α φ(α) }.

Notice that the right-hand side is equal to 1/ÎŽ for α = 0, monotonically decreasing in α, and vanishing as α → ∞. Hence there exists α_min(ε, ÎŽ) such that the right-hand side is smaller than 1 if and only if
α > α_min(ε, ÎŽ). Further, σ² ↩ F(σ², ασ) is concave with F(0, 0) = 0 and first derivative larger than 1 at σ² = 0 (cf. Lemma 7). It follows that for α > α_min(ε, ÎŽ) there exists a unique σ*(ÎŽ, p_X) such that F(σ², ασ) > σ² for all σ ∈ (0, σ*) and F(σ², ασ) < σ² for σ ∈ (σ*, ∞). It follows that σ²_t → σ²* for any σ²_0 ≠ 0. This proves the first part of claim (b1).

Letting σ²* = σ²*(α), it is easy to check that α ↩ σ²*(α) is continuous for α ∈ (α_min, ∞), with lim_{α↓α_min} σ²*(α) = +∞ and lim_{α→∞} σ²*(α) = E{X²}/ÎŽ > 0. As a consequence,

lim_{α↓α_min} P{ |X + σ*Z| ≄ ασ* } = 2Ί(−α_min),   (C.30)
lim_{α→∞} P{ |X + σ*Z| ≄ ασ* } = 0.   (C.31)

Notice that, by the definition of α_min given above, we have

2Ί(−α_min) − 2α_min { φ(α_min) − α_min Ί(−α_min) } = ÎŽ.

Since φ(z) > zΊ(−z) for z > 0, it follows that lim_{α↓α_min} P{ |X + σ*Z| ≄ ασ* } > ÎŽ. We define

α₀(ÎŽ, p_X) ≡ sup{ α > α_min(ε, ÎŽ) : P{ |X + σ*Z| ≄ ασ* } ≄ ÎŽ }.   (C.32)

By the above, α₀ ∈ (α_min, ∞). Further, by continuity, for α = α₀ we have P{ |X + σ*Z| ≄ ασ* } = ÎŽ. We have thus proved claim (b2).
In order to prove the second statement in (b1), we proceed analogously to part (a2) and define Q_t ≡ R_{t,t−1}/(σ_t σ_{t−1}). This sequence satisfies the recursion (C.18), with F_t defined as per Eq. (C.19). As t → ∞ we have σ_t → σ*, and hence F_t converges uniformly to a limit that, with an abuse of notation, we denote by F_{α,ÎŽ,p_X}, where

F_{α,ÎŽ,p_X}(Q) ≡ (1/ÎŽ) E{ [ η(X/σ* + Z₁; α) − X/σ* ][ η(X/σ* + Z₂; α) − X/σ* ] }.   (C.33)
Proceeding as in the proof of Lemma 10, we conclude that Q ↩ F_{α,ÎŽ,p_X}(Q) is increasing and convex on [0, 1]. Further (for Z ∌ N(0, 1)),

F_{α,ÎŽ,p_X}(1) = (1/ÎŽ) E{ [ η(X/σ* + Z; α) − X/σ* ]² } = (1/σ²*) F(σ²*, ασ*) = 1.   (C.34)

Finally, for α ≄ α₀(ÎŽ, p_X),

d/dQ F_{α,ÎŽ,p_X}(Q) |_{Q=1} = (1/ÎŽ) P{ |X/σ* + Z| > α } ≀ 1,   (C.35)

and therefore F_{α,ÎŽ,p_X}(Q) > Q for all Q ∈ [0, 1). Hence, proceeding again as in the proof of part (a2), we conclude that lim_{t→∞} Q_t = 1 and therefore lim_{t→∞} R_{t,t−1} = σ²* as claimed.
Proof of Lemma 6.(b3). Throughout this proof we fix p_X = (1 − ε)ÎŽ₀ + ε Îł and ÎŽ ∈ (ε, ÎŽ*(ε)). By part (b1), we have lim_{t→∞} E{ |η(X + σ_t Z; ασ_t)| } = E{ |η(X + σ*Z; ασ*)| }. It is therefore sufficient to prove that E{ |η(X + σ*Z; ασ*)| } < E{|X|}.

Consider the function E : (σ², Ξ) ↩ E(σ², Ξ), defined on R₊ × R₊ by

E(σ², Ξ) ≡ −(1/2)(1 − ÎŽ) σ²/Ξ + E min_{s∈R} { (1/(2Ξ)) (s − X − σZ)² + |s| },   (C.36)
where the expectation is taken with respect to X ∌ p_X and Z ∌ N(0, 1). Notice that the minimum over s ∈ R is uniquely achieved at s = η(X + σZ; Ξ). It is not hard to compute the partial derivatives

∂E/∂Ξ (σ², Ξ) = −(ÎŽ/(2Ξ²)) { ( 1 − (2/ÎŽ) P{ |X + σZ| ≄ Ξ } ) σ² + F(σ², Ξ) },   (C.37)

∂E/∂σ² (σ², Ξ) = (ÎŽ/(2Ξ)) { 1 − (1/ÎŽ) P{ |X + σZ| ≄ Ξ } },   (C.38)

where F(σ², Ξ) is defined as per Eq. (C.1). Using these expressions in Eq. (C.36), we conclude that

∂E/∂Ξ (σ², Ξ) = ∂E/∂σ² (σ², Ξ) = 0   ⇒   E(σ², Ξ) = E{ |η(X + σZ; Ξ)| }.   (C.39)

In particular, one can check from Eqs. (C.37), (C.38) that a stationary point⁎ is given by setting σ = σ*(ÎŽ, p_X) and Ξ = Ξ*(ÎŽ, p_X) ≡ α₀(ÎŽ, p_X) σ*(ÎŽ, p_X).

Define Ä’(σ²) ≡ E(σ², α₀(ÎŽ, p_X) σ). Using again Eqs. (C.37), (C.38) we get

dÄ’/dσ² (σ²) = (ÎŽ/(4ασ³)) { σ² − F(σ², α₀σ) }.   (C.40)

In particular, as a consequence of Lemma 7 and of the analysis at point (b1), we have dÄ’/dσ² < 0 for σ² ∈ (0, σ²*). Therefore, setting α = α₀(ÎŽ, p_X), we have

E{ |η(X + σ*Z; ασ*)| } = Ä’(σ²*) < lim_{σ→0} Ä’(σ²)
  = − lim_{σ→0} (σ/(2α)) (1 − ÎŽ) + lim_{σ→0} (σ/(2α)) E{ [ η(X/σ + Z; α) − X/σ − Z ]² } + lim_{σ→0} E{ |η(X + σZ; ασ)| }
  = lim_{σ→0} (σ/(2α)) α² + E{|X|} = E{|X|}.

This concludes the proof.
D Reference results

The following calculus fact is used in the main text.

Lemma 12. For all s, x > 0 we have x^s ≀ (s/e)^s e^x.

Proof. Since f(x) = ln(x) is concave for x > 0, when x ≄ s we have

( ln(x) − ln(s) ) / (x − s) ≀ f′(s) = 1/s.   (D.1)

This is equivalent to (x/s)^s ≀ e^{x−s}, which proves the result. The case x < s is proved similarly.
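The inequality of Lemma 12 is easy to spot-check numerically; equality holds at x = s, since there (s/e)^s e^s = s^s:

```python
import math

def lemma12_rhs(s, x):
    # (s/e)^s * e^x, the right-hand side of Lemma 12
    return (s / math.e) ** s * math.exp(x)
```

Evaluating both sides on a small grid of s, x > 0 confirms x^s ≀ (s/e)^s e^x, with the bound tight exactly at x = s.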
We also use an estimate on the minimum singular value of perturbed rectangular matrices, which was proved in [BC10, Theorem 1.1].

Theorem 10. For M, N ∈ N with N ≀ (1 − a)M, let B ∈ R^{M×N}, ‖B‖₂ ≀ 1/a, be any deterministic matrix, and let G ∈ R^{M×N} be a matrix with i.i.d. entries G_ij ∌ N(0, 1/M). Then there exist constants a₁, a₂, depending only on a and bounded for a > 0, such that, for all z < a₂,

P{ σ_N( B + Îœ G ) ≀ Îœ z } ≀ (a₁ z)^{M−N+1}.   (D.2)
⁎ Indeed, this is the unique saddle point of the function (Ξ^{−1}, σ²) ↩ E(Ξ, σ²), as can be proved by the general minimax theorem.
References

[AGZ09] G. W. Anderson, A. Guionnet, and O. Zeitouni, An introduction to random matrices, Cambridge University Press, 2009.

[ALPTJ11] R. Adamczak, A. E. Litvak, A. Pajor, and N. Tomczak-Jaegermann, Restricted isometry property of matrices with independent columns and neighborly polytopes by random sampling, Constructive Approximation (2011), 61–88.

[AS92] R. Affentranger and R. Schneider, Random projections of regular simplices, Discrete Comput. Geom. 7 (1992), 219–226.

[BC10] P. BĂŒrgisser and F. Cucker, Smoothed analysis of Moore–Penrose inversion, SIAM J. Matrix Anal. Appl. 31 (2010), 2769–2783.

[BM12] M. Bayati and A. Montanari, The LASSO risk for Gaussian matrices, IEEE Trans. on Inform. Theory 58 (2012), 1997–2017.

[BS98] Z. Bai and J. Silverstein, No eigenvalues outside the support of the limiting spectral distribution of large-dimensional sample covariance matrices, Ann. Probab. 26 (1998), 316–345.

[BS05] ———, Spectral Analysis of Large Dimensional Random Matrices, Springer, 2005.

[DMM09] D. L. Donoho, A. Maleki, and A. Montanari, Message Passing Algorithms for Compressed Sensing, Proceedings of the National Academy of Sciences 106 (2009), 18914–18919.

[DMM11] D. L. Donoho, A. Maleki, and A. Montanari, The Noise Sensitivity Phase Transition in Compressed Sensing, IEEE Trans. on Inform. Theory 57 (2011), 6920–6941.

[Don05a] D. L. Donoho, High-dimensional centrally symmetric polytopes with neighborliness proportional to dimension, Discrete Comput. Geom. (2005), 617–652.

[Don05b] ———, Neighborly polytopes and sparse solution of underdetermined linear equations, Technical Report, Statistics Department, Stanford University, 2005.

[DT05a] D. L. Donoho and J. Tanner, Neighborliness of randomly-projected simplices in high dimensions, Proceedings of the National Academy of Sciences 102 (2005), no. 27, 9452–9457.

[DT05b] ———, Sparse nonnegative solution of underdetermined linear equations by linear programming, Proceedings of the National Academy of Sciences 102 (2005), no. 27, 9446–9451.

[DT09] ———, Counting faces of randomly projected polytopes when the projection radically lowers dimension, Journal of the American Mathematical Society 22 (2009), 1–53.

[DT11] D. L. Donoho and J. Tanner, Observed universality of phase transitions in high-dimensional geometry, with implications for modern data analysis and signal processing, Phil. Trans. R. Soc. A (2011), 4273–4293.
[KWT09] Y. Kabashima, T. Wadayama, and T. Tanaka, A typical reconstruction limit for compressed sensing based on lp-norm minimization, J. Stat. Mech. (2009), L09003.

[Lub07] D. S. Lubinsky, A survey of weighted polynomial approximation with exponential weights, Surveys in Approximation Theory 3 (2007), 1–105.

[MAYB11] A. Maleki, L. Anitori, A. Yang, and R. Baraniuk, Asymptotic Analysis of Complex LASSO via Complex Approximate Message Passing (CAMP), arXiv:1108.0477, 2011.

[Ran11] S. Rangan, Generalized Approximate Message Passing for Estimation with Random Linear Mixing, IEEE Intl. Symp. on Inform. Theory (St. Petersburg), August 2011.

[RFG09] S. Rangan, A. K. Fletcher, and V. K. Goyal, Asymptotic analysis of MAP estimation via the replica method and applications to compressed sensing, Neural Information Processing Systems (NIPS) (Vancouver), 2009.

[Sch10] P. Schniter, Turbo Reconstruction of Structured Sparse Signals, Proceedings of the Conference on Information Sciences and Systems (Princeton), 2010.

[TV12] T. Tao and V. Vu, Random matrices: The Universality phenomenon for Wigner ensembles, arXiv:1202.0068, 2012.

[VS92] A. M. Vershik and P. V. Sporyshev, Asymptotic behavior of the number of faces of random polyhedra and the neighborliness problem, Selecta Math. Soviet. 11 (1992), 181–201.