Convex Stochastic and Large-Scale Deterministic Programming via Robust Stochastic Approximation and its Extensions

Arkadi Nemirovski
H. Milton Stewart School of Industrial and Systems Engineering
Georgia Institute of Technology

Joint research with Anatoli Juditsky†, Guanghui Lan‡, and Alexander Shapiro§
†: Joseph Fourier University, Grenoble, France; ‡: ISyE, University of Florida; §: ISyE, Georgia Tech

BFG'09, Leuven, September 14-18, 2009
Robust Stochastic Approximation and Extensions
Overview

• Convex Stochastic Minimization and Convex-Concave Stochastic Saddle Point problems
• Classical Stochastic Approximation
• Robust Stochastic Approximation & Stochastic Mirror Prox
  – construction
  – efficiency estimates
  – how it works
• Acceleration via Randomization
  – Matrix Game & ℓ1-minimization with uniform fit
  – ℓ1-minimization with Least Squares fit
  – Minimizing maxima of convex polynomials over a simplex
Convex Stochastic Minimization Problem
♣ The basic Convex Stochastic Minimization problem is

    min_{x∈X} f(x)    (SM)

• X ⊂ R^n: convex compact set
• f(x): X → R: convex and Lipschitz continuous objective given by a Stochastic Oracle.
♠ At the i-th call to the oracle, the input being x ∈ X, the oracle returns a stochastic gradient G(x, ξ_i) ∈ R^n of f, with independent ξ_i ∼ P, i = 1, 2, ..., and

    E_{ξ∼P} G(x, ξ) ∈ ∂f(x)  ∀x ∈ X.

♣ In many cases (but not always!) G(x, ξ) ∈ ∂_x F(x, ξ), where F(x, ξ) is convex in x. Here (SM) is the problem of minimizing the expectation f(x) = E F(x, ξ) of an integrand F(x, ξ), convex in x, given by a first order oracle.
Convex-Concave Stochastic Saddle Point Problem
♣ The Convex Stochastic Minimization problem is a particular case of the Convex-Concave Stochastic Saddle Point problem

    min_{x∈X} max_{y∈Y} f(x, y)    (SP)

• Z = X × Y ⊂ R^{n_x} × R^{n_y} = R^n: convex compact set
• f(x, y): X × Y → R: convex in x ∈ X, concave in y ∈ Y, Lipschitz continuous, given by a Stochastic Oracle.
♠ At the i-th call to the oracle, the input being z = (x, y) ∈ Z, the Stochastic Oracle returns the "stochastic derivative"

    G(z, ξ_i) = [G_x(z, ξ_i); G_y(z, ξ_i)] ∈ R^{n_x} × R^{n_y}

of f, with independent ξ_i ∼ P, i = 1, 2, ..., and

    F(x, y) := E_{ξ∼P} G(x, y, ξ) ∈ ∂_x f(x, y) × ∂_y(−f(x, y)).
Classical Stochastic Approximation
    min_{x∈X} f(x)    (SM)
    min_{x∈X} max_{y∈Y} f(x, y)    (SP)

♣ The very first method for solving (SM) is the classical Stochastic Approximation (Robbins & Monro '51), which mimics the usual Subgradient Descent:

    x_{t+1} = argmin_{u∈X} { (1/2)‖u − x_t‖²₂ + γ_t⟨G(x_t, ξ_t), u⟩ }
    ⇔ x_{t+1} = the point of X closest to x_t − γ_t G(x_t, ξ_t),    γ_t = a/t

• When f is strongly convex on X, SA with properly chosen a ensures that E‖x_t − argmin_X f‖²₂ ≤ O(1/t).
• If, in addition, argmin_X f ∈ int X and f′ is Lipschitz continuous, then also E f(x_t) − min_X f ≤ O(1/t).
♠ A similar construction, with similar results, works for (SP) with strongly convex-concave f.
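As a concrete illustration (ours, not from the slides), here is a minimal sketch of classical SA on a toy strongly convex problem over the unit Euclidean ball; the function name `classical_sa`, the toy objective, and the value of a are illustrative choices.

```python
import numpy as np

def classical_sa(G, x0, project, a, T, rng):
    """Classical Robbins-Monro SA: x_{t+1} = Proj_X(x_t - gamma_t G(x_t, xi_t))
    with the 'short steps' gamma_t = a/t; the last search point is the answer."""
    x = x0.copy()
    for t in range(1, T + 1):
        x = project(x - (a / t) * G(x, rng))
    return x

# toy strongly convex instance: f(x) = (1/2) E||x - xi||^2, xi ~ N(c, 0.01 I),
# over the unit ball; the minimizer is c (here ||c|| < 1)
rng = np.random.default_rng(0)
c = np.array([0.3, -0.2])
G = lambda x, rng: x - (c + 0.1 * rng.standard_normal(2))   # stochastic gradient
project = lambda u: u / max(1.0, np.linalg.norm(u))         # projection onto the ball
x_T = classical_sa(G, np.zeros(2), project, a=1.0, T=2000, rng=rng)
```

With γ_t = 1/t this recursion is essentially a running average of the observations, which is why the O(1/t) rate is attainable here.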
Classical Stochastic Approximation (continued)
• Good news on Classical SA: with smooth strongly convex f attaining its minimum in int X, the rate of convergence

    E f(x_t) − min_X f ≤ O(1/t)

is the best allowed under the circumstances by the Laws of Statistics.
• Bad news on Classical SA: it requires strong convexity of the objective f and is highly sensitive to the scale factor a in the stepsize rule γ_t = a/t.
• To choose a properly, we need to know the modulus of strong convexity of f, and overestimating this modulus may result in dramatically slowing down the convergence.
⇒ The Stochastic Optimization community commonly treats SA as a highly unstable and impractical optimization technique.
Robust Stochastic Approximation
    x_{t+1} = argmin_{u∈X} { (1/2)‖u − x_t‖²₂ + γ_t⟨G(x_t, ξ_t), u⟩ },  γ_t = a/t    (CSA)

♣ It was discovered long ago [Nem.&Yudin '79] that SA admits a robust version which does not require strong convexity of the objective in (SM). To make SA robust, it suffices
• to replace the "short steps" γ_t = a/t with "long steps" γ_t = a/√t, and
• to add to (CSA) the averaging

    x̄_t = [∑_{τ=1}^t γ_τ]⁻¹ ∑_{τ=1}^t γ_τ x_τ

and to treat the averages x̄_t, rather than the search points x_t, as approximate solutions.
♠ Another important improvement (in its rudimentary form going back to [Nem.&Yudin '79]) is to replace the quadratic prox term (1/2)‖u − x_t‖²₂ in (CSA) with a more general prox term, which allows one to adjust, to some extent, the method to the "geometry" of the problem of interest.
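The two modifications above can be sketched as follows (our illustrative code, Euclidean prox): long steps γ_t = a/√t plus γ-weighted averaging, demonstrated on a toy objective that is not strongly convex, where classical SA has no guarantee.

```python
import numpy as np

def robust_sa(G, x0, project, a, T, rng):
    """Robust SA: 'long steps' gamma_t = a/sqrt(t) plus gamma-weighted averaging
    of the search points; the average, not the last iterate, is returned."""
    x = x0.copy()
    num = np.zeros_like(x0)      # running sum of gamma_t * x_t
    den = 0.0                    # running sum of gamma_t
    for t in range(1, T + 1):
        gamma = a / np.sqrt(t)
        num += gamma * x
        den += gamma
        x = project(x - gamma * G(x, rng))
    return num / den             # the average \bar x_T

# toy non-strongly-convex objective f(x) = E||x - xi||_1 over the unit ball,
# xi ~ N(c, 0.1^2 I): the minimizer is the componentwise median c
rng = np.random.default_rng(0)
c = np.array([0.3, -0.2])
G = lambda x, rng: np.sign(x - (c + 0.1 * rng.standard_normal(2)))  # subgradient
project = lambda u: u / max(1.0, np.linalg.norm(u))
x_bar = robust_sa(G, np.zeros(2), project, a=0.5, T=5000, rng=rng)
```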
Robust Stochastic Approximation [Jud.&Lan&Nem.&Shap.'09]

    min_{x∈X} max_{y∈Y} f(x, y)
    G(x, y, ξ):  E_ξ G(x, y, ξ) ∈ ∂_x f(x, y) × ∂_y[−f(x, y)]

♣ Setup for RSA: a norm ‖·‖ on R^n ⊃ Z = X × Y and a convex continuous distance-generating function ω(z): Z → R such that the set Z° = {z ∈ Z : ∂ω(z) ≠ ∅} is convex, and ω restricted to Z° is C¹ and strongly convex:

    ∀z, z′ ∈ Z°:  ⟨ω′(z) − ω′(z′), z − z′⟩ ≥ α‖z − z′‖²    [α > 0]

♠ Robust Stochastic Approximation:

    z_1 = z_ω := argmin_Z ω(·)
    z_{t+1} = argmin_{u∈Z} { [ω(u) − ⟨ω′(z_t), u⟩] + γ_t⟨G(z_t, ξ_t), u⟩ }
    z̄_t = [∑_{τ=1}^t γ_τ]⁻¹ ∑_{τ=1}^t γ_τ z_τ
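For the simplex with the entropy distance-generating function ω(z) = ∑_i z_i ln z_i, the prox step above has a multiplicative-weights closed form. A minimal sketch (our code and toy objective, not from the slides):

```python
import numpy as np

def rsa_entropy(G, n, a, T, rng):
    """RSA on the simplex with the entropy dgf omega(z) = sum_i z_i ln z_i:
    the prox step has the closed form z_{t+1} ∝ z_t * exp(-gamma_t * g)."""
    z = np.full(n, 1.0 / n)            # z_1 = z_omega = argmin omega = uniform
    num, den = np.zeros(n), 0.0
    for t in range(1, T + 1):
        gamma = a / np.sqrt(t)
        num += gamma * z
        den += gamma
        w = z * np.exp(-gamma * G(z, rng))
        z = w / w.sum()                # multiplicative-weights prox step
    return num / den                   # the weighted average \bar z_T

# toy stochastic linear objective f(z) = E[xi^T z], xi ~ N(c, I): the minimizer
# over the simplex is the vertex at argmin_i c_i (here index 1)
rng = np.random.default_rng(1)
c = np.array([0.5, 0.1, 0.9, 0.7])
z_bar = rsa_entropy(lambda z, rng: c + rng.standard_normal(4), 4, a=1.0, T=5000, rng=rng)
```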
Stochastic Mirror Prox [Jud.&Nem.&Tauvel'08]

    min_{x∈X} max_{y∈Y} f(x, y)
    G(x, y, ξ):  E_ξ G(x, y, ξ) ∈ ∂_x f(x, y) × ∂_y[−f(x, y)]

♠ Stochastic Mirror Prox algorithm:

    z_1 = z_ω := argmin_Z ω(·)
    w_t = argmin_{u∈Z} { [ω(u) − ⟨ω′(z_t), u⟩] + γ_t⟨G(z_t, ξ_{2t−1}), u⟩ }
    z_{t+1} = argmin_{u∈Z} { [ω(u) − ⟨ω′(z_t), u⟩] + γ_t⟨G(w_t, ξ_{2t}), u⟩ }
    z̄_t = [∑_{τ=1}^t γ_τ]⁻¹ ∑_{τ=1}^t γ_τ w_τ
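The extrapolation/update pattern of SMP can be sketched for a bilinear matrix game on a pair of simplices with entropy prox (our illustrative code; the game, the noise level, and the constant stepsize are assumptions):

```python
import numpy as np

def smp_simplex(Gx, Gy, n, m, gamma, T, rng):
    """Stochastic Mirror Prox on Delta_n x Delta_m with the entropy prox on each
    block: per iteration, an extrapolation prox step to w_t using one stochastic
    sample, then an update prox step from z_t using a fresh sample taken at w_t;
    the returned solution averages the intermediate points w_t."""
    def prox(z, g):
        w = z * np.exp(-gamma * g)
        return w / w.sum()
    x, y = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    x_sum, y_sum = np.zeros(n), np.zeros(m)
    for _ in range(T):
        wx = prox(x, Gx(x, y, rng))      # extrapolation (sample xi_{2t-1})
        wy = prox(y, -Gy(x, y, rng))     # ascent in y: flip the sign
        x = prox(x, Gx(wx, wy, rng))     # update (fresh sample xi_{2t})
        y = prox(y, -Gy(wx, wy, rng))
        x_sum += wx
        y_sum += wy
    return x_sum / T, y_sum / T

# noisy matrix game y^T A x: G = [A^T y; -Ax] plus small zero-mean noise
rng = np.random.default_rng(2)
A = rng.uniform(-1.0, 1.0, size=(8, 10))
Gx = lambda x, y, rng: A.T @ y + 0.01 * rng.standard_normal(10)
Gy = lambda x, y, rng: A @ x + 0.01 * rng.standard_normal(8)
x_bar, y_bar = smp_simplex(Gx, Gy, 10, 8, gamma=0.1, T=4000, rng=rng)
gap = (A @ x_bar).max() - (A.T @ y_bar).min()   # the duality gap ErrSad
```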
Saddle Point accuracy measure
    min_{x∈X} max_{y∈Y} f(x, y)    (SP)

♣ Accuracy measure for (SP):

    ErrSad(x, y) = max_{y′∈Y} f(x, y′) − min_{x′∈X} f(x′, y).

Explanation: (SP) gives rise to two optimization programs with equal optimal values:

    Primal: min_{x∈X} f̄(x),  f̄(x) = max_{y∈Y} f(x, y)
    Dual:   max_{y∈Y} f̲(y),  f̲(y) = min_{x∈X} f(x, y)

ErrSad(x, y) is the duality gap: the sum of the non-optimalities of x and y as approximate solutions to the respective problems.
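For a bilinear game over simplices, ErrSad is computable in closed form, since the inner max and min are attained at vertices. A small sketch (our code, using a matching-pennies toy game):

```python
import numpy as np

def err_sad(A, x, y):
    """ErrSad for the bilinear game min_{x in Delta_n} max_{y in Delta_m} y^T A x:
    over a simplex, max_{y'} y'^T (Ax) is the largest entry of Ax, and
    min_{x'} (A^T y)^T x' is the smallest entry of A^T y."""
    return (A @ x).max() - (A.T @ y).min()

# matching pennies: the uniform strategies form a saddle point, so the gap is 0
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x = y = np.array([0.5, 0.5])
```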
RSA & SMP: Efficiency Estimates
    min_{x∈X} max_{y∈Y} f(x, y)
    G(x, y, ξ):  E_ξ G(x, y, ξ) ∈ ∂_x f(x, y) × ∂_y[−f(x, y)]    (SP)

Theorem: Assume that E‖G(z, ξ)‖²_* ≤ M² for all z ∈ Z = X × Y, and let N ≥ 1. The N-step RSA with properly chosen constant steps γ_t = γ, 1 ≤ t ≤ N, ensures that

    E{ErrSad(z̄_N)} ≤ O(1) Θ M / √N
    [Θ = √(2 max_{z∈Z} [ω(z) − ω(z_ω) − ⟨ω′(z_ω), z − z_ω⟩] / α)]

♠ With "light tail" ‖G(z, ξ)‖_*/M, the probabilities of large deviations in RSA admit exponential bounds:

    E{exp{‖G(z, ξ)‖²_*/M²}} ≤ exp{1}  ∀z ∈ Z ⇒
    ∀Ω > 1:  Prob{ErrSad(z̄_N) > O(1) Ω Θ M / √N} ≤ 2 exp{−Ω}

• Similar results hold true for SMP.
Good Geometry case
♣ Let rint Z ⊂ rint Z⁺, where Z⁺ is the direct product of K "basic blocks":
• K_b unit Euclidean balls B_i ⊂ E_i = R^{n_i};
• K_s spectahedrons S_j ⊂ F_j;
  F_j: spaces of symmetric k_j × k_j matrices of a given block-diagonal structure, with the Frobenius inner product
  S_j: the set of all positive semidefinite matrices from F_j with unit trace
♠ A good setup for RSA/SMP is given by

    the norm ‖z‖ = √( ∑_{i=1}^{K_b} ‖z^i‖²₂ + ∑_{j=1}^{K_s} ‖z_j‖²_tr )

• z^i: ball components of z  • z_j: spectahedron components of z
• ‖·‖_tr: trace norm of a symmetric matrix;

    the dgf ω(z) = ∑_{i=1}^{K_b} (1/2)‖z^i‖²₂ + ∑_{j=1}^{K_s} Entropy(z_j)

• Entropy(u) = ∑_ℓ λ_ℓ(u) ln λ_ℓ(u), λ_ℓ(u) being the eigenvalues of a symmetric matrix u.
Good Geometry case (continued)
♣ With the above setup, we have

    Θ = O(1) √(K_b + ∑_{j=1}^{K_s} ln k_j)
    ⇒ E{ErrSad(z̄_N)} ≤ O(1) [K_b + ∑_{j=1}^{K_s} ln k_j]^{1/2} M / √N

Good news: When K = O(1), the rate of convergence is nearly dimension-independent and, moreover, nearly the best allowed under the circumstances by the Laws of Statistics.
♣ The computational cost of a step is dominated by the costs of a call to the SO and of a prox-step g ↦ argmin_{z∈Z} [g^T z + ω(z)].
♠ When Z = Z⁺, the cost C of a prox-step is

    C = O(1) ( ∑_i dim(E_i) + ∑_j C_j )

• C_j: the cost of an eigenvalue decomposition of A ∈ F_j.
♠ When Z is simple, a product of K balls and simplexes, we have C = O(1) dim Z.
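On a spectahedron block with the matrix-entropy dgf, the prox-step indeed reduces to eigenvalue decompositions. A hedged sketch (our code; it assumes the current point is positive definite so that log Z is well defined):

```python
import numpy as np

def spectahedron_prox(z, g, gamma):
    """One prox step for the matrix-entropy dgf Entropy(Z) = sum_l lambda_l ln lambda_l
    on the spectahedron {Z >= 0, tr Z = 1}. Since omega'(Z) = log Z + I, the step
    argmin_u { [omega(u) - <omega'(z), u>] + gamma <g, u> } has the closed form
    exp(log z - gamma*g) / tr(exp(log z - gamma*g)); the dominant cost is the
    eigenvalue decompositions (the C_j of the slide)."""
    def from_eigs(q, vals):
        return (q * vals) @ q.T
    lam_z, q_z = np.linalg.eigh(z)
    h = from_eigs(q_z, np.log(lam_z)) - gamma * g     # log z - gamma * g
    lam, q = np.linalg.eigh(h)
    w = np.exp(lam - lam.max())                       # stabilized matrix exponential
    return from_eigs(q, w / w.sum())

# sanity checks on the 4x4 spectahedron, starting at z_omega = I/4
z0 = np.eye(4) / 4
z1 = spectahedron_prox(z0, np.zeros((4, 4)), 0.5)          # g = 0 keeps z_omega fixed
z2 = spectahedron_prox(z0, np.diag([1.0, 0.0, 0.0, 0.0]), 1.0)
```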
How it works: Convex Stochastic Programming
    min_{x∈X⊂R^n} f(x) := E_{ξ∼P} F(x, ξ)
    G(x, ξ) ∈ ∂_x F(x, ξ),  E_{ξ∼P}‖G(x, ξ)‖²_* ≤ M²    (SM)

♣ Until now, the workhorse of Convex Stochastic Minimization has been the Sample Average Approximation method. In SAA, problem (SM) is replaced with its empirical counterpart

    min_x f_N(x),  f_N(x) = (1/N) ∑_{i=1}^N F(x, ξ_i)
    [ξ_1, ..., ξ_N: sample drawn from P]

♣ In the Good Geometry case, the SAA sample size N resulting, with probability close to 1, in an ε-solution to (SM) is O(1) n (M/ε)² [Shap.'03], which is by a factor of O(n) worse than the number of steps of RSA/SMP yielding a solution of the same quality.
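The structure of the SAA first order oracle, and why one call to it costs N stochastic-oracle calls, can be sketched as follows (our illustrative code and toy integrand):

```python
import numpy as np

def saa_oracle(x, sample, F, F_grad):
    """First order oracle for the SAA objective f_N(x) = (1/N) sum_i F(x, xi_i):
    a single call sweeps the whole sample, i.e. costs N stochastic-oracle calls."""
    f_N = np.mean([F(x, xi) for xi in sample])
    g_N = np.mean([F_grad(x, xi) for xi in sample], axis=0)
    return f_N, g_N

# toy integrand F(x, xi) = ||x - xi||^2 / 2, so F_grad(x, xi) = x - xi
rng = np.random.default_rng(3)
sample = rng.standard_normal((1000, 3)) + np.array([1.0, 2.0, 3.0])
F = lambda x, xi: 0.5 * np.sum((x - xi) ** 2)
F_grad = lambda x, xi: x - xi
f_N, g_N = saa_oracle(np.zeros(3), sample, F, F_grad)
```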
How it works (continued)
    min_{x∈X⊂R^n} f(x) := E_{ξ∼P} F(x, ξ)    (SM)
    ⇒ min_{x∈X} f_N(x) = (1/N) ∑_{i=1}^N F(x, ξ_i)    (SAA)

♣ A single call to the first order oracle for f_N is as costly as N calls to the SO.
⇒ RSA and SMP are much cheaper computationally than SAA. When Z is simple, the overall computational effort in the N-step RSA/SMP is of the same order as the effort to compute a single subgradient of f_N(·)!
♠ Our experimental results consistently demonstrate that while the solutions built by RSA and SAA with a common sample size N are of nearly the same quality, RSA outperforms SAA by orders of magnitude in terms of running time.
How it works: Typical experiment
    f_* = min_x { f(x) = E_{ξ∼N(a,I_n)} φ(ξ^T x) : x ≥ 0, ∑_i x_i = 1 },  ‖a‖_∞ = 1
    φ(·): piecewise linear convex loss function

                 n = 1000, f_* = −5.9439             n = 5000, f_* = −5.6066
    Method   f(z̄_4000) : f(z̄_4000) − f_*    CPU   f(z̄_4000) : f(z̄_4000) − f_*    CPU
    SAA      −5.9384 : 0.0055               253   −5.5967 : 0.0099              1283
    N-RSA    −5.9365 : 0.0074                12   −5.5935 : 0.0131                49
    E-RSA    −5.9193 : 0.0246                13   −5.5354 : 0.0712                77

• N-RSA: ‖·‖ = ‖·‖₁, ω(x) = ∑_i x_i ln x_i;  E f(z̄_N) − f_* ≤ O(1) ln n / √N
• E-RSA: ‖·‖ = ‖·‖₂, ω(x) = (1/2) x^T x;  E f(z̄_N) − f_* ≤ O(1) √n / √N
SMP vs RSA

    min_{x∈X} max_{y∈Y} f(x, y)    (SP)
    F(x, y) := E_ξ G(x, y, ξ) ∈ ∂_x f(x, y) × ∂_y[−f(x, y)]

Question: What, if any, are the advantages of SMP as compared to RSA?
Answer: SMP can utilize the fact that some components of F(x, y) are regular and well-observable.

♣ Assumption A:

    ‖F(z) − F(z′)‖_* ≤ L‖z − z′‖ + σ  ∀z, z′ ∈ Z = X × Y
    E_ξ{‖G(z, ξ) − F(z)‖²_*} ≤ σ²

Fact: Under A, the quantity M² = sup_{z∈Z} E_ξ{‖G(z, ξ)‖²_*} can be as large as O(1)(ΘL + σ)²
⇒ The efficiency of RSA cannot be better than

    E{ErrSad(z̄_N^RSA)} ≤ O(1) Θ (ΘL + σ) / √N

even when σ = 0, i.e., f ∈ C^{1,1}(L), no noise at all!
SMP vs RSA (continued)

    min_{x∈X} max_{y∈Y} f(x, y)
    F(x, y) := E_ξ G(x, y, ξ) ∈ ∂_x f(x, y) × ∂_y[−f(x, y)]
    ‖F(z) − F(z′)‖_* ≤ L‖z − z′‖ + σ,  E_ξ{‖G(z, ξ) − F(z)‖²_*} ≤ σ²

Theorem: The N-step SMP with properly chosen constant stepsizes ensures that

    E{ErrSad(z̄_N^SMP)} ≤ O(1) Θ [ΘL/N + σ/√N]    (!)

When E_ξ{exp{‖G(z, ξ) − F(z)‖²_*/σ²}} ≤ exp{1}, for all Ω > 0 one has

    Prob{ErrSad(z̄_N^SMP) > O(1)[Θ[ΘL/N + σ/√N] + ΩΘσ/√N]} ≤ exp{−Ω²/3} + exp{−ΩN}

Good news: When Z is the product of O(1) high-dimensional balls, (!) is the best efficiency estimate achievable under the circumstances.
Applying SMP: Stochastic Feasibility problem
♣ We want to solve a feasible system of "simple" and "difficult" convex constraints:

    S_μ(x) ≤ 0, 1 ≤ μ ≤ p
    D_ν(x) ≤ 0, 1 ≤ ν ≤ q
    x ∈ X = {x ∈ R^n : ‖x‖₂ ≤ 1}

• The S_μ are given by a precise First Order oracle and are smooth:

    ‖S′_μ(x)‖₂ ≤ L_μ,  ‖S′_μ(x) − S′_μ(x′)‖₂ ≤ √(ln(p + q)) L_μ ‖x − x′‖₂.

• The D_ν are given by a Stochastic Oracle. At the i-th call to the SO, x ∈ X being the input, the SO returns unbiased estimates h_ν(x, ξ_i), H_ν(x, ξ_i) of D_ν(x), D′_ν(x), with independent ξ_i ∼ P and

    E_ξ{max_ν [h²_ν(x, ξ)/M²_ν]} ≤ 1,  max_ν E_ξ{‖H_ν(x, ξ)‖²₂/M²_ν} ≤ ln(p + q).
Stochastic Feasibility problem (continued)

    S_μ(x) ≤ 0, 1 ≤ μ ≤ p;  D_ν(x) ≤ 0, 1 ≤ ν ≤ q;  x ∈ X = {x ∈ R^n : ‖x‖₂ ≤ 1}
    S_μ:  ‖S′_μ‖₂ ≤ L_μ,  ‖S′_μ(x) − S′_μ(x′)‖₂ ≤ √(ln(p + q)) L_μ ‖x − x′‖₂
    D_ν(x) = E_ξ h_ν(x, ξ),  E_ξ{max_ν h²_ν(x, ξ)/M²_ν} ≤ 1
    D′_ν(x) = E_ξ H_ν(x, ξ),  max_ν E_ξ{‖H_ν(x, ξ)‖²₂/M²_ν} ≤ ln(p + q)

♣ Properly applying the N-step SMP to the Saddle Point problem

    min_{x∈X} max_{y≥0, ∑_i y_i=1} [ ∑_{μ=1}^p y_μ α_μ S_μ(x) + ∑_{ν=1}^q y_{p+ν} α_{p+ν} D_ν(x) ]

we build a solution x_N ∈ X such that

    S_μ(x_N) ≤ (√(ln(p + q)) L_μ / N) ζ,  D_ν(x_N) ≤ (√(ln(p + q)) M_ν / √N) ζ
    ζ ≥ 0,  E ζ ≤ O(1).
Acceleration via Randomization
♣ When solving large-scale well-structured convex optimization problems, e.g., the ℓ1-minimization problem

    min_x { ‖x‖₁ : ‖Ax − b‖_∞ ≤ δ }

with a large-scale and possibly dense matrix A,
• polynomial time methods, which generate high accuracy solutions, become prohibitively time consuming;
• when medium accuracy solutions are sought, our best choice is computationally cheap first order methods.
♠ There are interesting and important cases where first order methods can be accelerated significantly by randomization: by passing from computationally demanding deterministic first order oracles to their computationally cheap stochastic counterparts.
Example 1: Matrix Game
    min_{x∈∆_n} max_{y∈∆_m} f(x, y) := y^T Ax,  ∆_k = {u ∈ R^k_+ : ∑_i u_i = 1}
    F(x, y) := [f′_x(x, y); −f′_y(x, y)] = [A^T y; −Ax]
    A := max_{i,j} |A_ij|    (MG)

• When A is large and dense, solving (MG) by polynomial time algorithms is prohibitively time consuming.
• The best deterministic First Order algorithms find an ε-solution to (MG) in O(1) ln(m + n)(A/ε) steps, with O(1) computations of F(·) per step. The total effort is O(1) ln(m + n) mn (A/ε) a.o.
♣ The matrix-vector multiplications ∆_m ∋ y ↦ A^T y, ∆_n ∋ x ↦ Ax required to compute F(x, y) are easy to randomize: to build an unbiased estimate of Bu = ∑_j u_j B_j (B_j: the columns of B), draw at random a column index ı, Prob{ı = j} = |u_j|/‖u‖₁, and return ‖u‖₁ sign(u_ı) B_ı.
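The randomized matrix-vector multiplication above can be sketched in a few lines (our illustrative code), with unbiasedness verified by enumerating the sampling distribution:

```python
import numpy as np

def random_matvec(B, u, rng):
    """Unbiased one-column estimate of B @ u: draw a column index j with
    probability |u_j| / ||u||_1 and return ||u||_1 * sign(u_j) * B[:, j]."""
    p = np.abs(u) / np.abs(u).sum()
    j = rng.choice(len(u), p=p)
    return np.abs(u).sum() * np.sign(u[j]) * B[:, j]

# unbiasedness: p_j * ||u||_1 * sign(u_j) = u_j, so the expectation is B @ u
B = np.array([[1.0, 2.0, -1.0], [0.0, 3.0, 4.0]])
u = np.array([0.5, -0.3, 0.2])
p = np.abs(u) / np.abs(u).sum()
expectation = sum(p[j] * np.abs(u).sum() * np.sign(u[j]) * B[:, j] for j in range(3))
```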
Matrix Game (continued)
    min_{x∈∆_n} max_{y∈∆_m} y^T Ax,  A := max_{i,j} |A_ij|    (MG)

♣ Equipping (MG) with a Stochastic Oracle and applying RSA, we get, with confidence ≥ 1 − β, an ε-solution to (MG) in O(1) ln(mn/β)(A/ε)² steps. A step reduces to extracting from A a randomly chosen row and a randomly chosen column, plus an O(m + n) a.o. overhead.
⇒ An ε-solution costs O(1) ln(mn/β)(m + n)(A/ε)² a.o.
♠ With ε, A fixed and m = O(n) large, RSA
• outperforms the best known deterministic alternatives by orders of magnitude;
• exhibits sublinear time behaviour: an ε-solution is found by inspecting just O(1) ln(mn/β)(m + n)(A/ε)² of the total of mn data entries, cf. Grigoriadis & Khachiyan '95.
Application: ℓ1-minimization with uniform fit

♣ Numerous sparsity-oriented Signal Processing problems reduce to (small series of) problems

    min_u { ‖Au − b‖_p : ‖u‖₁ ≤ 1 }  [A: m × n]    (ℓ1)

♠ When p = ∞, (ℓ1) reduces to the Matrix Game

    min_{x=[u;v]∈∆_{2n}} max_{y=[p;q]∈∆_{2m}} [p − q]^T [A − b1^T, −A − b1^T] [u; v]

⇒ RSA can be used to solve ℓ1-minimization problems.
• When m, n are really large (like 10⁴ and more) and A is a general-type dense analytically given matrix, (ℓ1) becomes prohibitively difficult for all known deterministic algorithms, and randomized algorithms become a kind of "last resort".
How it works

    Opt = min_u { ‖Au − b‖_∞ : ‖u‖₁ ≤ 1 }
    [A: dense analytically given m × n matrix;
     b: ‖Au_* − b‖_∞ ≤ δ = 1.e-3 with 16-sparse u_*, ‖u_*‖₁ = 1]

                                      Errors                        CPU
    m × n          Method  ‖u_* − u‖₁  ‖u_* − u‖₂  ‖u_* − u‖_∞   sec      Mult
    2048 × 4096    DMP     0.0014      0.00052     0.00036       122.8    1770
                   RSA     0.039       0.0079      0.0030        325.4    29.3
    8192 × 32768   DMP     1.006       0.319       0.184         3141.9   5
                   RSA     0.120       0.0196      0.00634       3000.5   4.7

• DMP: Deterministic Mirror Prox  • u: resulting solution
• Last column: (equivalent) # of matrix-vector multiplications.
How it works (continued)
[Two plots: ℓ1-recovery for the 2048 × 4096 and 8192 × 32768 problems. Blue: true signal, Cyan: MDSA recovery.]
Example 2: ℓ1-minimization with Least Squares fit

♣ The Least Squares ℓ1-minimization problem

    min_u { ‖Au − b‖₂ : ‖u‖₁ ≤ 1 }  [A: m × n]

reduces to the Saddle Point problem

    min_{x∈∆_n} max_{‖y‖₂≤1} y^T [Ax − b]  [F(x, y) = [A^T y; b − Ax]]

♠ Randomizing the matrix-vector multiplications and applying RSA, we get, with confidence ≥ 1 − β, an ε-solution with an overall effort of

    O(1) ln(mn/β)(√m C₂/ε)² a.o.    (∗)

• C₂: the maximum of the ‖·‖₂-norms of the columns of C = [A, b]

Bad news: The factor √m in (∗) reflects the high potential noisiness of the unbiased estimate (y, ‖y‖₂ ≤ 1) ↦ ‖y‖₁ sign(y_ı)[A_{ı,:}]^T of A^T y: it may happen that

    ‖ ‖y‖₁ sign(y_ı)[A_{ı,:}]^T ‖_∞ ∼ √m C₂
ℓ1-minimization with Least Squares fit (continued)

    min_{x∈∆_n} max_{‖y‖₂≤1} y^T [Ax − b]  [F(x, y) = [A^T y; b − Ax]]

Remedy: A randomized preprocessing [A, b] ↦ UD[A, b], with orthogonal U, |U_ij| ≤ O(m^{−1/2}), and random diagonal D with i.i.d. D_ii ∼ Uniform{−1; 1}, results in an equivalent problem, preserves C₂, and reduces the noise in the estimate of A^T y: now, with confidence 1 − β,

    ‖y‖₂ ≤ 1 ⇒ ‖ ‖y‖₁ sign(y_ı)[A_{ı,:}]^T ‖_∞ ≤ O(1) √(ln(mn/β)) C₂

♠ After the preprocessing, RSA finds, with confidence ≥ 1 − β, an ε-solution to the problem at the cost of

    O(1) ln²(mn/β)(C₂/ε)² a.o.

• With properly chosen U (e.g., a Hadamard or DFT matrix), the preprocessing costs just O(1) mn ln(m) a.o.
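A sketch of this preprocessing (our illustrative code): we use the unitary DFT via the FFT, since numpy has no built-in fast Hadamard transform; the DFT output is complex-valued, but both U and D preserve Euclidean norms, so the column-norm bound C₂ is unchanged.

```python
import numpy as np

def randomized_preprocess(C, rng):
    """Preprocessing C -> U D C with D a random +/-1 diagonal and U the DFT
    matrix scaled by 1/sqrt(m) (unitary, with |U_ij| = 1/sqrt(m)); via the FFT
    this costs O(m n log m) a.o."""
    m = C.shape[0]
    d = rng.choice([-1.0, 1.0], size=m)          # the random diagonal D
    return np.fft.fft(d[:, None] * C, axis=0) / np.sqrt(m)

# the transform preserves the column norms (Parseval), hence preserves C_2
rng = np.random.default_rng(4)
C = rng.standard_normal((64, 5))
C_hat = randomized_preprocess(C, rng)
```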
Example 3: Minimizing the maximum of convex polynomials over a simplex/ℓ1-ball

♣ Consider the optimization problem

    min_{x∈∆_n} max_{1≤k≤m} p_k(x),  p_k(x) = ∑_{ℓ=0}^d A_{kℓ}[x, ..., x]    (∗)

• A_{kℓ}[x¹, ..., x^ℓ]: symmetric ℓ-linear forms;  • p_k(·): convex.
♠ We can reduce (∗) to the Saddle Point problem

    min_{x∈∆_n} max_{y∈∆_m} ∑_k y_k p_k(x)

and randomize the computation of p_k(·), ∇p_k(·) in the aforementioned manner:
• To build unbiased estimates p̃_k(x), P̃_k(x) of p_k(x) and P_k(x) = ∇p_k(x), x ∈ ∆_n, we generate at random indices ı₁, ..., ı_d, Prob{ı = j} = x_j, and set

    p̃_k = ∑_{ℓ=0}^d A_{kℓ}[e_{ı₁}, ..., e_{ı_ℓ}]
    [P̃_k]_j = ∑_{ℓ=1}^d ℓ A_{kℓ}[e_{ı₁}, ..., e_{ı_{ℓ−1}}, e_j],  1 ≤ j ≤ n.
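For the simplest non-trivial case d = 2 with a single symmetric bilinear form A (so p(x) = x^T A x), the randomized oracle above reads off one entry of A for the value and one row for the gradient. A minimal sketch (our code), with unbiasedness checked by enumerating the index distribution:

```python
import numpy as np

def random_poly_oracle(A, x, rng):
    """Unbiased estimates of p(x) = x^T A x and grad p(x) = 2 A x (the case
    d = 2, A symmetric) on the simplex: draw independent indices i1, i2 with
    Prob{i = j} = x_j and read O(n) entries of A."""
    i1, i2 = rng.choice(len(x), size=2, p=x)
    p_est = A[i1, i2]              # E[A_{i1,i2}] = sum_{i,j} x_i x_j A_{ij} = p(x)
    grad_est = 2.0 * A[i1, :]      # E[2 A_{i1,j}] = 2 (A x)_j = dp/dx_j
    return p_est, grad_est

# unbiasedness, checked by enumeration
A = np.array([[1.0, 0.5, 0.0], [0.5, 2.0, -1.0], [0.0, -1.0, 3.0]])
x = np.array([0.2, 0.5, 0.3])
E_p = sum(x[i] * x[j] * A[i, j] for i in range(3) for j in range(3))
E_grad = sum(x[i] * 2.0 * A[i, :] for i in range(3))
```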
Minimizing maximum of convex polynomials (continued)

    min_{x∈∆_n} max_{1≤k≤m} ∑_{ℓ=0}^d A_{kℓ}[x, ..., x]    (∗)

• A: the maximum magnitude of the coefficients of the forms A_{kℓ}[·, ..., ·]

♠ Applying RSA, we find, with confidence ≥ 1 − β, an ε-solution to (∗) at the cost of O(1) ln(mn/β) d²(A/ε)² steps, with a step reducing to
• extracting O(1) d(m + n) coefficients of the forms A_{kℓ}[·, ..., ·], given the "addresses" k, ℓ, ı₁, ..., ı_ℓ of the coefficients, plus
• an O(m + dn) a.o. overhead.