AN ACCELERATED INEXACT PROXIMAL POINT METHOD FOR SOLVING NONCONVEX-CONCAVE MIN-MAX PROBLEMS
WEIWEI KONG AND RENATO D.C. MONTEIRO
Abstract. This paper presents smoothing schemes for obtaining approximate stationary points of unconstrained or linearly-constrained composite nonconvex-concave min-max (and hence nonsmooth) problems by applying well-known algorithms to composite smooth approximations of the original problems. More specifically, in the unconstrained (resp. constrained) case, approximate stationary points of the original problem are obtained by applying, to its composite smooth approximation, an accelerated inexact proximal point (resp. quadratic penalty) method presented in a previous paper by the authors. Iteration complexity bounds for both smoothing schemes are also established. Finally, numerical results are given to demonstrate the efficiency of the unconstrained smoothing scheme.
Key words. quadratic penalty method, composite nonconvex problem, iteration complexity, inexact proximal point method, first-order accelerated gradient method, minimax problem.
AMS subject classifications. 47J22, 90C26, 90C30, 90C47, 90C60, 65K10.
1. Introduction. The first goal of this paper is to present and study the complexity of an accelerated inexact proximal point smoothing (AIPP-S) scheme for finding approximate stationary points of the (potentially nonsmooth) min-max composite nonconvex optimization (CNO) problem
(1.1) $\min_{x \in X} \{\hat{p}(x) := p(x) + h(x)\},$

where $h$ is a proper lower-semicontinuous convex function, $X$ is a nonempty convex set, and $p$ is a max function given by

(1.2) $p(x) := \max_{y \in Y} \Phi(x, y) \quad \forall x \in X,$

for some nonempty compact convex set $Y$ and function $\Phi$ which, for some scalar $m > 0$ and open set $\Omega \supseteq X$, is such that: (i) $\Phi$ is continuous on $\Omega \times Y$; (ii) the function $-\Phi(x, \cdot) : Y \to \mathbb{R}$ is lower-semicontinuous and convex for every $x \in X$; and (iii) for every $y \in Y$, the function $\Phi(\cdot, y) + m\|\cdot\|^2/2$ is convex and differentiable, and its gradient is Lipschitz continuous on $X \times Y$. Here, the objective function is the sum of a convex function $h$ and the pointwise supremum of (possibly nonconvex) differentiable functions, which is generally a (possibly nonconvex) nonsmooth function.
When $Y$ is a singleton, the max term in (1.1) becomes smooth and (1.1) reduces to a smooth CNO problem for which many algorithms have been developed in the literature. In particular, accelerated inexact proximal point (AIPP) methods, i.e., methods which use an accelerated composite gradient variant to approximately solve the generated sequence of prox subproblems, have been developed for it (see, for example, [4, 15]). When $Y$ is not a singleton, (1.1) can no longer be directly solved by an AIPP method due to the nonsmoothness of the max term. The AIPP-S scheme developed in this paper is instead based on a perturbed version of (1.1) in which the max term in (1.1) is replaced by a smooth approximation and the resulting smooth CNO problem is solved by an AIPP method.
*School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA, 30332-0205 (E-mails: [email protected] & [email protected]). The works of these authors were partially supported by ONR Grant N00014-18-1-2077 and NSERC Grant PGSD3-516700-2018.
arXiv:1905.13433v4 [math.OC] 17 Jun 2021
Throughout our presentation, it is assumed that efficient oracles for evaluating the quantities $\Phi(x, y)$, $\nabla_x \Phi(x, y)$, and $h(x)$, and for obtaining exact solutions of the problems

(1.3) $\min_{x \in X} \left\{ \lambda h(x) + \frac{1}{2}\|x - x_0\|^2 \right\}, \qquad \max_{y \in Y} \left\{ \lambda \Phi(x_0, y) - \frac{1}{2}\|y - y_0\|^2 \right\}$

for any $(x_0, y_0)$ and $\lambda > 0$, are available. Throughout this paper, the terminology "oracle call" is used to refer to a collection of the above oracles of size $\mathcal{O}(1)$ in which each of them appears at least once. We refer to the computation of the solution of the first problem above as an $h$-resolvent evaluation. In this manner, the computation of the solution of the second one is a $[-\Phi(x_0, \cdot)]$-resolvent evaluation.
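To make the resolvent terminology concrete, here is a minimal sketch (our own illustration, not code from the paper) of an $h$-resolvent evaluation for the common case $h = \delta_C$ with $C$ a box, where the minimizer of $\lambda h(x) + \frac{1}{2}\|x - x_0\|^2$ is simply the Euclidean projection of $x_0$ onto $C$:

```python
import numpy as np

def h_resolvent_box(x0, lo, hi, lam=1.0):
    """h-resolvent for h = delta_C with C = [lo, hi]^n (a box).

    Solves min_x lam*h(x) + 0.5*||x - x0||^2; for an indicator function
    the solution is the projection of x0 onto C, independent of lam.
    """
    return np.clip(x0, lo, hi)

x0 = np.array([-2.0, 0.3, 5.0])
print(h_resolvent_box(x0, 0.0, 1.0))  # projection onto the box [0, 1]^3
```

For a general convex $h$, the $h$-resolvent is the proximal operator of $\lambda h$; the box indicator is just the case in which it has a closed form.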
We first develop an AIPP-S scheme that obtains a stationary point based on a primal-dual formulation of (1.1). More specifically, given a tolerance pair $(\rho_x, \rho_y) \in \mathbb{R}^2_{++}$, it is shown that an instance of this scheme obtains a quadruple $(u, v, x, y)$ such that

(1.4) $\begin{pmatrix} u \\ v \end{pmatrix} \in \begin{pmatrix} \nabla_x \Phi(x, y) \\ 0 \end{pmatrix} + \begin{pmatrix} \partial h(x) \\ \partial [-\Phi(x, \cdot)](y) \end{pmatrix}, \qquad \|u\| \le \rho_x, \quad \|v\| \le \rho_y,$

in $\mathcal{O}(\rho_x^{-2}\rho_y^{-1/2})$ oracle calls, where $\partial \psi(z)$ is the subdifferential of a convex function $\psi$ at a point $z$ (see (1.9) with $\varepsilon = 0$). We then show that another instance of this scheme can obtain an approximate stationary point based on the directional derivative of $\hat{p}$. More specifically, given a tolerance $\delta > 0$, it is shown that this instance computes a point $x \in X$ such that

(1.5) $\exists\, \hat{x} \in X \ \text{s.t.} \ \inf_{\|d\| \le 1} \hat{p}'(\hat{x}; d) \ge -\delta, \quad \|\hat{x} - x\| \le \delta,$

in $\mathcal{O}(\delta^{-3})$ oracle calls, where $\hat{p}'(\hat{x}; d)$ is the directional derivative of $\hat{p}$ at the point $\hat{x}$ along the direction $d$ (see (1.10)).
The second goal of this paper is to develop a quadratic penalty AIPP-S (QP-AIPP-S) scheme to obtain approximate stationary points of a linearly constrained version of (1.1), namely

(1.6) $\min_{x \in X} \{ p(x) + h(x) : Ax = b \},$

where $p$ is as in (1.2), $A$ is a linear operator, and $b$ is in the range of $A$. The scheme is a penalty-type method which approximately solves a sequence of penalty subproblems of the form

(1.7) $\min_{x \in X} \left\{ p(x) + h(x) + \frac{c}{2}\|Ax - b\|^2 \right\}$

for an increasing sequence of positive penalty parameters $c$. Similar to the approach used for the first goal of this paper, the method considers a perturbed variant of (1.7) in which the objective function is replaced by a smooth approximation, and the resulting problem is solved by the quadratic penalty AIPP (QP-AIPP) method proposed in [15]. For a given tolerance triple $(\rho_x, \rho_y, \eta) \in \mathbb{R}^3_{++}$, it is shown that the method computes a quintuple $(u, v, x, y, r)$ satisfying

(1.8) $\begin{pmatrix} u \\ v \end{pmatrix} \in \begin{pmatrix} \nabla_x \Phi(x, y) + A^* r \\ 0 \end{pmatrix} + \begin{pmatrix} \partial h(x) \\ \partial [-\Phi(x, \cdot)](y) \end{pmatrix}, \qquad \|u\| \le \rho_x, \quad \|v\| \le \rho_y, \quad \|Ax - b\| \le \eta,$

in $\mathcal{O}(\rho_x^{-2}\rho_y^{-1/2} + \rho_x^{-2}\eta^{-1})$ oracle calls.

Finally, it is worth mentioning that all of the above complexities are obtained under the mild assumption that the optimal value of each of the respective optimization problems, namely (1.1) and (1.6), is bounded below. Moreover, it is neither assumed that $X$ be bounded nor that (1.1) or (1.6) has an optimal solution.
Related works. Since the case where $\Phi(\cdot, \cdot)$ in (1.1) is convex-concave has been well-studied in the literature (see, for example, [1, 11, 13, 21, 22, 23, 27]), we will make no more mention of it here. Instead, we focus on papers that consider (1.1) where $\Phi(\cdot, y)$ is differentiable and nonconvex for every $y \in Y$ and there are mild conditions on $\Phi(x, \cdot)$ for every $x \in X$.

Letting $\delta_C$ denote the indicator function of a closed convex set $C \subseteq \mathcal{X}$ (see Subsection 1.1), $\overline{\text{Conv}}(\mathcal{X})$ denote the set of proper lower semicontinuous convex functions on $\mathcal{X}$, and $\rho := \min\{\rho_x, \rho_y\}$, Tables 1.1 and 1.2 compare the assumptions and iteration complexities obtained in this work with corresponding ones derived in the earlier papers [24, 26] and the subsequent works [17, 25, 30]. It is worth mentioning that the above works consider termination conditions that are slightly different from the ones in this paper. It is shown in Subsection 2.1 that they are actually equivalent to the ones in this paper up to multiplicative constants that are independent of the tolerances $\rho_x$, $\rho_y$, and $\delta$.
| Algorithm | Oracle Complexity | $D_h = \infty$ | $h \equiv 0$ | $h \equiv \delta_C$ | $h \in \overline{\text{Conv}}(\mathcal{X})$ |
| --- | --- | --- | --- | --- | --- |
| PGSF [24] | $\mathcal{O}(\rho^{-3})$ | ✗ | ✓ | ✓ | ✗ |
| Minimax-PPA [17] | $\mathcal{O}(\rho^{-2.5}\log^2(\rho^{-1}))$ | ✗ | ✓ | ✓ | ✗ |
| FNE Search [25] | $\mathcal{O}(\rho_x^{-2}\rho_y^{-1/2}\log(\rho^{-1}))$ | ✓ | ✓ | ✓ | ✗ |
| AIPP-S | $\mathcal{O}(\rho_x^{-2}\rho_y^{-1/2})$ | ✓ | ✓ | ✓ | ✓ |

Table 1.1: Comparison of iteration complexities and supported use cases under notions equivalent to (1.4), with $\rho := \min\{\rho_x, \rho_y\}$.
| Algorithm | Oracle Complexity | $D_h = \infty$ | $h \equiv 0$ | $h \equiv \delta_C$ | $h \in \overline{\text{Conv}}(\mathcal{X})$ |
| --- | --- | --- | --- | --- | --- |
| PG-SVRG [26] | $\mathcal{O}(\delta^{-6}\log \delta^{-1})$ | ✗ | ✓ | ✓ | ✓ |
| Minimax-PPA [17] | $\mathcal{O}(\delta^{-3}\log^2(\delta^{-1}))$ | ✗ | ✓ | ✓ | ✗ |
| Prox-DIAG [30] | $\mathcal{O}(\delta^{-3}\log^2(\delta^{-1}))$ | ✓ | ✓ | ✗ | ✗ |
| AIPP-S | $\mathcal{O}(\delta^{-3})$ | ✓ | ✓ | ✓ | ✓ |

Table 1.2: Comparison of iteration complexities and supported use cases under notions equivalent to (1.5).
To the best of our knowledge, this work is the first one to analyze the complexity of a smoothing scheme for finding approximate stationary points of (1.6).
Organization of the paper. Subsection 1.1 presents notation and some basic definitions that are used in this paper. Subsection 1.2 presents several motivating applications that are of the form in (1.1). Section 2 is divided into two subsections. The first one precisely states the assumptions underlying problem (1.1) and discusses four notions of stationary points. The second one presents a smooth approximation of the function $p$ in (1.1). Section 3 is divided into two subsections. The first one reviews the AIPP method in [15] and its iteration complexity. The second one presents the AIPP-S scheme and its iteration complexities for finding approximate stationary points as in (1.4) and (1.5). Section 4 is also divided into two subsections. The first one reviews the QP-AIPP method in [15] and its iteration complexity. The second one presents the QP-AIPP-S scheme and its iteration complexity for finding points satisfying (1.8). Section 5 presents some computational results. Section 6 gives some concluding remarks. Finally, several appendices at the end of this paper contain proofs of technical results needed in our presentation.
1.1. Notation and basic definitions. This subsection provides some basic notation and definitions.
The set of real numbers is denoted by $\mathbb{R}$. The sets of non-negative and positive real numbers are denoted by $\mathbb{R}_+$ and $\mathbb{R}_{++}$, respectively. The set of natural numbers is denoted by $\mathbb{N}$. For $t > 0$, define $\log_1^+(t) := \max\{1, \log(t)\}$. Let $\mathbb{R}^n$ denote an $n$-dimensional Euclidean space with standard norm $\|\cdot\|$. Given a linear operator $A : \mathbb{R}^n \to \mathbb{R}^p$, the operator norm of $A$ is denoted by $\|A\| := \sup\{\|Az\|/\|z\| : z \in \mathbb{R}^n, z \ne 0\}$.
The following notation and definitions are for a general complete inner product space $\mathcal{Z}$, whose inner product and associated induced norm are denoted by $\langle \cdot, \cdot \rangle$ and $\|\cdot\|$, respectively. Let $\psi : \mathcal{Z} \to (-\infty, \infty]$ be given. The effective domain of $\psi$ is denoted by $\text{dom}\,\psi := \{z \in \mathcal{Z} : \psi(z) < \infty\}$, and $\psi$ is said to be proper if $\text{dom}\,\psi \ne \emptyset$. The set of proper, lower semicontinuous, convex functions $\psi : \mathcal{Z} \to (-\infty, \infty]$ is denoted by $\text{Conv}(\mathcal{Z})$. Moreover, for a convex set $Z \subseteq \mathcal{Z}$, we denote by $\overline{\text{Conv}}(Z)$ the set of functions in $\text{Conv}(\mathcal{Z})$ whose effective domain is equal to $Z$. For $\varepsilon \ge 0$, the $\varepsilon$-subdifferential of $\psi \in \text{Conv}(\mathcal{Z})$ at $z \in \text{dom}\,\psi$ is denoted by

(1.9) $\partial_\varepsilon \psi(z) := \{ w \in \mathcal{Z} : \psi(z') \ge \psi(z) + \langle w, z' - z \rangle - \varepsilon, \ \forall z' \in \mathcal{Z} \},$

and we denote $\partial \psi \equiv \partial_0 \psi$. The directional derivative of $\psi$ at $z \in \mathcal{Z}$ in the direction $d \in \mathcal{Z}$ is denoted by

(1.10) $\psi'(z; d) := \lim_{t \downarrow 0} \frac{\psi(z + td) - \psi(z)}{t}.$

It is well known that if $\psi$ is differentiable at $z \in \text{dom}\,\psi$, then for a given direction $d \in \mathcal{Z}$ we have $\psi'(z; d) = \langle \nabla \psi(z), d \rangle$.

For a given $Z \subseteq \mathcal{Z}$, the indicator function of $Z$, denoted by $\delta_Z$, is defined as $\delta_Z(z) = 0$ if $z \in Z$ and $\delta_Z(z) = \infty$ if $z \notin Z$. Moreover, the closure, interior, and relative interior of $Z$ are denoted by $\text{cl}\,Z$, $\text{int}\,Z$, and $\text{ri}\,Z$, respectively. The support function of $Z$ at a point $z$ is $\sigma_Z(z) := \sup_{z' \in Z} \langle z, z' \rangle$.
1.2. Motivating applications. This subsection lists motivating applications that are of the form in (1.1). In Section 5, we examine the performance of our proposed smoothing scheme on some special instances of these applications.
1.2.1. Maximum of a finite number of nonconvex functions. Let $\{f_i\}_{i=1}^k$ be a family of functions that are continuously differentiable everywhere with Lipschitz continuous gradients, and let $C \subseteq \mathbb{R}^n$ be a closed convex set. The problem of interest is the minimization of $\max_{1 \le i \le k} f_i$ over the set $C$, i.e.,

$\min_{x \in C} \max_{1 \le i \le k} f_i(x),$

which is clearly an instance of (1.1) where $Y = \{y \in \mathbb{R}^k_+ : \sum_{i=1}^k y_i = 1\}$, $\Phi(x, y) = \sum_{i=1}^k y_i f_i(x)$, and $h(x) = \delta_C(x)$.
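As a quick check of this reformulation, the sketch below (a toy instance of our own choosing, with $k = 3$) confirms that the maximum of the linear function $y \mapsto \Phi(x, y)$ over the unit simplex is attained at a vertex and recovers $p(x) = \max_i f_i(x)$:

```python
import numpy as np

# Toy instance of Subsection 1.2.1 with k = 3 (our own choice of f_i).
fs = [lambda x: (x - 1)**2, lambda x: -x**2 + 2, lambda x: 0.5 * x]

def p(x):
    """p(x) = max_i f_i(x)."""
    return max(f(x) for f in fs)

def Phi(x, y):
    """Phi(x, y) = sum_i y_i f_i(x) with y in the unit simplex."""
    return sum(yi * f(x) for yi, f in zip(y, fs))

# The maximum of a linear function over the simplex is attained at a
# vertex, so max_y Phi(x, y) recovers p(x).
x = 0.7
vals = np.array([f(x) for f in fs])
y_star = np.eye(3)[np.argmax(vals)]  # all mass on the best index i
assert abs(Phi(x, y_star) - p(x)) < 1e-12
print(p(x))
```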
1.2.2. Robust regression. Given a set of observations $\omega := \{\omega_i\}_{i=1}^n$ and a compact convex set $\Theta \subseteq \mathbb{R}^k$, let $\{\ell_\theta(\cdot|\omega)\}_{\theta \in \Theta}$ be a family of nonconvex loss functions in which: (i) $\ell_\theta(x|\omega)$ is concave in $\theta$ for every $x \in \mathbb{R}^n$; and (ii) $\ell_\theta(x|\omega)$ is continuously differentiable in $x$ with Lipschitz continuous gradient for every $\theta \in \Theta$. The problem of interest is to minimize the worst-case loss over $\Theta$, i.e.,

$\min_{x \in \mathbb{R}^n} \max_{\theta \in \Theta} \ell_\theta(x|\omega),$

which is clearly an instance of (1.1), where $Y = \Theta$, $\Phi(x, y) = \ell_y(x|\omega)$, and $h(x) = 0$.
1.2.3. Min-max games with an adversary. Let $\{U_j(x_1, \ldots, x_k, y)\}_{j=1}^k$ be a set of utility functions in which: (i) each $U_j$ is nonconvex and continuously differentiable in its first $k$ arguments, but concave in its last argument; and (ii) $\nabla_{x_i} U_j(x_1, \ldots, x_k, y)$ is Lipschitz continuous for every $1 \le i \le k$. Given input constraint sets $\{B_i\}_{i=1}^k$ and $B_y$, the problem of interest is to maximize the total utility of the players (indices 1 to $k$) given that the adversary (index $k+1$) seeks to maximize his own utility, i.e.,

$\min_{x_1, \ldots, x_k} \max_{y \in B_y} \left\{ -\sum_{j=1}^k U_j(x_1, \ldots, x_k, y) : x_i \in B_i, \ i = 1, \ldots, k \right\},$

which is clearly an instance of (1.1) where $x = (x_1, \ldots, x_k)$, $Y = B_y$, $\Phi(x, y) = -\sum_{j=1}^k U_j(x_1, \ldots, x_k, y)$, and $h(x) = \delta_{B_1 \times \cdots \times B_k}(x)$.
2. Preliminaries. This section presents some preliminary material and is divided into two subsections. The first one precisely describes the assumptions and various notions of stationary points for problem (1.1) and briefly compares two approaches for obtaining them. The second one presents a smooth approximation of the max function $p$ in (1.1) and some of its properties.
2.1. Assumptions and notions of stationary points. This subsection describes the assumptions and four notions of stationary points for problem (1.1). It is worth mentioning that the complexities of the smoothing scheme of Section 3 are presented with respect to two of these notions. In order to understand how these results can be translated to the other two alternative notions, which have been used in a few papers dealing with problem (1.1), we present a few results discussing some useful relations between all these notions.
Throughout our presentation, we let $\mathcal{X}$ and $\mathcal{Y}$ be finite-dimensional inner product spaces. We also make the following assumptions on problem (1.1):

(A0) $X \subseteq \mathcal{X}$ and $Y \subseteq \mathcal{Y}$ are nonempty convex sets, and $Y$ is also compact;
(A1) there exists an open set $\Omega \supseteq X$ such that $\Phi(\cdot, \cdot)$ is finite and continuous on $\Omega \times Y$; moreover, $\nabla_x \Phi(x, y)$ exists and is continuous at every $(x, y) \in \Omega \times Y$;
(A2) $h \in \overline{\text{Conv}}(X)$ and $-\Phi(x, \cdot) \in \overline{\text{Conv}}(Y)$ for every $x \in \Omega$;
(A3) there exist scalars $(L_x, L_y) \in \mathbb{R}^2_{++}$ and $m \in (0, L_x]$ such that

(2.1) $\Phi(x, y) - [\Phi(x', y) + \langle \nabla_x \Phi(x', y), x - x' \rangle] \ge -\frac{m}{2}\|x - x'\|^2,$

(2.2) $\|\nabla_x \Phi(x, y) - \nabla_x \Phi(x', y')\| \le L_x \|x - x'\| + L_y \|y - y'\|,$

for every $x, x' \in X$ and $y, y' \in Y$;
(A4) $\hat{p}_* := \inf_{x \in X} \hat{p}(x)$ is finite, where $\hat{p}$ is as in (1.1).

We make three remarks about the above assumptions. First, it is well known that condition (2.2) implies that

(2.3) $\Phi(x', y) - [\Phi(x, y) + \langle \nabla_x \Phi(x, y), x' - x \rangle] \le \frac{L_x}{2}\|x' - x\|^2$

for every $(x', x, y) \in X \times X \times Y$. Second, functions satisfying the lower curvature condition in (2.1) are often referred to as weakly convex functions (see, for example, [5, 6, 7, 8]). Third, the aforementioned weak convexity condition implies that, for any $y \in Y$, the function $\Phi(\cdot, y) + m\|\cdot\|^2/2$ is convex, and hence $p + m\|\cdot\|^2/2$ is as well. Note that while $p$ is generally nonconvex and nonsmooth, it has the nice property that $p + m\|\cdot\|^2/2$ is convex.
We now discuss two stationarity conditions for (1.1) under assumptions (A0)–(A3). First, denoting

(2.4) $\hat{\Phi}(x, y) := \Phi(x, y) + h(x) \quad \forall (x, y) \in X \times Y,$

it is well known that (1.1) is related to the saddle-point problem which consists of finding a pair $(x_*, y_*) \in X \times Y$ such that

(2.5) $\hat{\Phi}(x_*, y) \le \hat{\Phi}(x_*, y_*) \le \hat{\Phi}(x, y_*)$

for every $(x, y) \in X \times Y$. More specifically, $(x_*, y_*)$ satisfies (2.5) if and only if $x_*$ is an optimal solution of (1.1), $y_*$ is an optimal solution of the dual of (1.1), and there is no duality gap between the two problems. Using the composite structure described above for $\hat{\Phi}$, it can be shown that a necessary condition for (2.5) to hold is that $(x_*, y_*)$ satisfy the stationarity condition

(2.6) $\begin{pmatrix} 0 \\ 0 \end{pmatrix} \in \begin{pmatrix} \nabla_x \Phi(x_*, y_*) \\ 0 \end{pmatrix} + \begin{pmatrix} \partial h(x_*) \\ \partial [-\Phi(x_*, \cdot)](y_*) \end{pmatrix}.$
When $m = 0$, the above condition also becomes sufficient for (2.5) to hold. Second, it can be shown that $\hat{p}'(x_*; d)$ is well-defined for every $d \in \mathcal{X}$ and that a necessary condition for $x_* \in X$ to be a local minimum of (1.1) is that it satisfies the stationarity condition

(2.7) $\inf_{\|d\| \le 1} \hat{p}'(x_*; d) \ge 0.$

When $m = 0$, the above condition also becomes sufficient for $x_*$ to be a global minimum of (1.1). Moreover, in view of Lemma 19 in Appendix D with $(u, v, x, y) = (0, 0, x_*, y_*)$, it follows that $x_*$ satisfies (2.7) if and only if there exists $y_* \in Y$ such that $(x_*, y_*)$ satisfies (2.6).
Note that finding points that satisfy (2.6) or (2.7) exactly is generally a difficult task. Hence, in this section and the next one, we only consider the approximate versions of (2.6) and (2.7) given by (1.4) and (1.5), respectively. For ease of future reference, we say that:

(i) a quadruple $(u, v, x, y)$ is a $(\rho_x, \rho_y)$-primal-dual stationary point of (1.1) if it satisfies (1.4);
(ii) a point $x$ is a $\delta$-directional stationary point of (1.1) if it satisfies the first inequality in (1.5).
It is worth mentioning that (1.5) is generally hard to verify for a given point $x \in X$. This is primarily because the definition requires us to check an infinite number of directional derivatives of a (potentially) nonsmooth function at points $\hat{x}$ near $x$. In contrast, the definition of an approximate primal-dual stationary point is generally easier to verify because the quantities $\|u\|$ and $\|v\|$ can be measured directly, and the inclusions in (1.4) are easy to verify when prox oracles for $h$ and $\Phi(x, \cdot)$, for every $x \in X$, are readily available.
The next result, whose proof is given in Appendix D, shows that a $(\rho_x, \rho_y)$-primal-dual stationary point, for small enough $\rho_x$ and $\rho_y$, yields a point $\hat{x}$ satisfying (1.5). Its statement makes use of the diameter of $Y$, defined as

(2.8) $D_y := \sup_{y, y' \in Y} \|y - y'\|.$

Proposition 1. If the quadruple $(u, v, x, y)$ is a $(\rho_x, \rho_y)$-primal-dual stationary point of (1.1), then there exists a point $\hat{x} \in X$ such that

$\inf_{\|d\| \le 1} \hat{p}'(\hat{x}; d) \ge -\rho_x - 2\sqrt{2 m D_y \rho_y}, \qquad \|\hat{x} - x\| \le \sqrt{\frac{2 D_y \rho_y}{m}}.$
The iteration complexities in this paper (see Section 3) are stated with respect to the two notions of stationary points in (1.4) and (1.5). However, it is worth discussing two other notions of stationary points that are common in the literature, as well as some results that relate all four notions.
Given $(\lambda, \varepsilon) \in \mathbb{R}^2_{++}$, a point $x$ is said to be a $(\lambda, \varepsilon)$-prox stationary point of (1.1) if the function $\hat{p} + \|\cdot\|^2/(2\lambda)$ is strongly convex and

(2.9) $\frac{1}{\lambda}\|x - x_\lambda\| \le \varepsilon, \qquad x_\lambda := \operatorname*{argmin}_{u \in X} \left\{ P_\lambda(u) := \hat{p}(u) + \frac{1}{2\lambda}\|u - x\|^2 \right\}.$
The above notion is considered, for example, in [17, 26, 30]. The result below, whose proof is given in Appendix D, shows how it is related to (1.5).
Proposition 2. For any given $\lambda \in (0, 1/m)$, the following statements hold:

(a) for any $\varepsilon > 0$, if $x \in X$ satisfies (1.5) and

(2.10) $0 < \delta \le \frac{\lambda^3 \varepsilon}{\lambda^2 + 2(1 - \lambda m)(1 + \lambda)},$

then $x$ is a $(\lambda, \varepsilon)$-prox stationary point;
(b) for any $\delta > 0$, if $x \in X$ is a $(\lambda, \varepsilon)$-prox stationary point for some $\varepsilon \le \delta \cdot \min\{1, 1/\lambda\}$, then $x$ satisfies (1.5) with $\hat{x} = x_\lambda$, where $x_\lambda$ is as in (2.9).

Note that for a fixed $\lambda \in (0, 1/m)$ such that $\max\{\lambda^{-1}, (1 - \lambda m)^{-1}\} = \mathcal{O}(1)$, the largest $\delta$ in part (a) is $\mathcal{O}(\varepsilon)$. Similarly, if $\lambda^{-1} = \mathcal{O}(1)$, then the largest $\varepsilon$ in part (b) is $\mathcal{O}(\delta)$. Combining these two observations, it follows that (2.9) and (1.5) are equivalent (up to a multiplicative factor) under the assumption that $\delta = \Theta(\varepsilon)$.
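As a concrete illustration of the prox stationarity condition (2.9), consider the convex case $\hat{p}(u) = |u|$ with $m = 0$ (a one-dimensional example of our own choosing): the prox point $x_\lambda$ is the classical soft-threshold of $x$, and the residual $\|x - x_\lambda\|/\lambda$ is small exactly when $x$ is near the minimizer:

```python
import numpy as np

# Prox stationarity (2.9) for the simple convex example p_hat(u) = |u|.
# The prox point argmin_u { |u| + (u - x)^2 / (2*lam) } is the classical
# soft-threshold of x at level lam.
def prox_abs(x, lam):
    return np.sign(x) * max(abs(x) - lam, 0.0)

lam, eps = 0.5, 0.1

# Near the minimizer 0: x_lam = 0 and (1/lam)|x - x_lam| <= eps, so x is
# a (lam, eps)-prox stationary point.
x = 0.04
x_lam = prox_abs(x, lam)
assert x_lam == 0.0 and abs(x - x_lam) / lam <= eps

# Far from the minimizer: the residual test fails.
x = 2.0
x_lam = prox_abs(x, lam)  # = 1.5
assert abs(x - x_lam) / lam > eps
print("residual near 0:", abs(0.04 - prox_abs(0.04, lam)) / lam)
```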
Given $(\rho_x, \rho_y) \in \mathbb{R}^2_{++}$, a pair $(x, y)$ is said to be a $(\rho_x, \rho_y)$-first-order Nash equilibrium point of (1.1) if

(2.11) $\inf_{\|d_x\| \le 1} \hat{S}_y'(x; d_x) \ge -\rho_x, \qquad \sup_{\|d_y\| \le 1} S_x'(y; d_y) \le \rho_y,$

where $\hat{S}_y := \Phi(\cdot, y) + h(\cdot)$ and $S_x := \Phi(x, \cdot)$. The above notion is considered, for example, in [17, 24, 25]. The next result, whose proof is given in Appendix D, shows that (2.11) is equivalent to (1.4).
Proposition 3. A pair $(x, y)$ is a $(\rho_x, \rho_y)$-first-order Nash equilibrium point if and only if there exists $(u, v) \in \mathcal{X} \times \mathcal{Y}$ such that $(u, v, x, y)$ satisfies (1.4).
We now end this subsection by briefly discussing some approaches for finding approximate stationary points of (1.1). One approach is to apply a proximal descent type method directly to problem (1.1), but this would lead to subproblems with nonsmooth convex composite objective functions. A second approach is based on first applying a smoothing method to (1.1) and then using a prox-convexifying descent method, such as the one in [15], to solve the perturbed smooth problem. An advantage of the second approach, which is the one pursued in this paper, is that it generates subproblems with smooth convex composite objective functions. The next subsection describes one possible way to smooth the (generally) nonsmooth function $p$ in (1.1).
2.2. Smooth approximation. This subsection presents a smooth approximation of the function $p$ in (1.1).
For every $\xi > 0$, consider the smoothed function $p_\xi$ defined by

(2.12) $p_\xi(x) := \max_{y \in Y} \left\{ \Phi_\xi(x, y) := \Phi(x, y) - \frac{1}{2\xi}\|y - y_0\|^2 \right\} \quad \forall x \in X,$

for some $y_0 \in Y$. The following proposition presents the key properties of $p_\xi$ and its related quantities.
Proposition 4. Let $\xi > 0$ be given and assume that the function $\Phi$ satisfies conditions (A0)–(A3). Let $p_\xi(\cdot)$ and $\Phi_\xi(\cdot, \cdot)$ be as in (2.12), and define

(2.13) $Q_\xi := \xi L_y + \sqrt{\xi(L_x + m)}, \qquad L_\xi := L_y Q_\xi + L_x \le \left( L_y \sqrt{\xi} + \sqrt{L_x} \right)^2, \qquad y_\xi(x) := \operatorname*{argmax}_{y' \in Y} \Phi_\xi(x, y'),$

for every $x \in X$. Then, the following properties hold:

(a) $y_\xi(\cdot)$ is $Q_\xi$-Lipschitz continuous on $X$;
(b) $p_\xi(\cdot)$ is continuously differentiable on $X$ and $\nabla p_\xi(x) = \nabla_x \Phi(x, y_\xi(x))$ for every $x \in X$;
(c) $\nabla p_\xi(\cdot)$ is $L_\xi$-Lipschitz continuous on $X$;
(d) for every $x, x' \in X$, we have

(2.14) $p_\xi(x) - [p_\xi(x') + \langle \nabla p_\xi(x'), x - x' \rangle] \ge -\frac{m}{2}\|x - x'\|^2.$

Proof. Note that the inequality in (2.13) follows from the fact that $m \le L_x$ and the bound

$L_\xi = L_y\left[ \xi L_y + \sqrt{\xi(L_x + m)} \right] + L_x \le \xi L_y^2 + 2 L_y \sqrt{\xi L_x} + L_x = \left( L_y \sqrt{\xi} + \sqrt{L_x} \right)^2.$

The conclusions of (a)–(c) follow from Lemma 13 and Proposition 14 in Appendix B with $(\Psi, q, y) = (\Phi_\xi, p_\xi, y_\xi)$. We now show that the conclusion of (d) is true. Indeed, if we consider (2.1) at $(y, x') = (y_\xi(x), x')$, use the definition of $\Phi_\xi$, and use the expression for $\nabla p_\xi$ in (b), then

$-\frac{m}{2}\|x - x'\|^2 \le \Phi(x', y_\xi(x)) - [\Phi(x, y_\xi(x)) + \langle \nabla_x \Phi(x, y_\xi(x)), x' - x \rangle] = \Phi_\xi(x', y_\xi(x)) - [p_\xi(x) + \langle \nabla p_\xi(x), x' - x \rangle] \le p_\xi(x') - [p_\xi(x) + \langle \nabla p_\xi(x), x' - x \rangle],$

where the last inequality follows from the fact that $p_\xi(x')$ is the maximum of $\Phi_\xi(x', \cdot)$ over $Y$. Swapping the roles of $x$ and $x'$ yields (2.14).
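For the finite-max instance of Subsection 1.2.1, the quantities in Proposition 4 can be computed exactly: maximizing $\Phi_\xi(x, \cdot)$ over the unit simplex amounts to a Euclidean projection of $y_0 + \xi f(x)$ onto the simplex. The sketch below (a toy instance of our own choosing, with $k = 3$ and one-dimensional $x$) computes $y_\xi(x)$ this way and checks the gradient formula of part (b) against a central finite difference:

```python
import numpy as np

def proj_simplex(v):
    """Euclidean projection of v onto the unit simplex (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / np.arange(1, len(v) + 1) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

# Toy data for Phi(x, y) = <y, f(x)> over the simplex (1-D x, k = 3).
f  = lambda x: np.array([(x - 1)**2, -x**2 + 2, 0.5 * x])
df = lambda x: np.array([2 * (x - 1), -2 * x, 0.5])
y0 = np.full(3, 1 / 3)
xi = 5.0

def y_xi(x):
    # argmax_y <y, f(x)> - ||y - y0||^2/(2*xi) = proj(y0 + xi*f(x)),
    # by completing the square in y.
    return proj_simplex(y0 + xi * f(x))

def p_xi(x):
    y = y_xi(x)
    return y @ f(x) - np.dot(y - y0, y - y0) / (2 * xi)

def grad_p_xi(x):
    # Proposition 4(b): grad p_xi(x) = grad_x Phi(x, y_xi(x)).
    return y_xi(x) @ df(x)

x, t = 0.7, 1e-6
fd = (p_xi(x + t) - p_xi(x - t)) / (2 * t)  # central difference
assert abs(fd - grad_p_xi(x)) < 1e-4
print(grad_p_xi(x))
```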
We now make two remarks about the above properties. First, the Lipschitz constants of $y_\xi$ and $\nabla p_\xi$ depend on the value of $\xi$, while the weak convexity constant $m$ in (2.14) does not. Second, as $\xi \to \infty$, it holds that $p_\xi \to p$ pointwise and $Q_\xi, L_\xi \to \infty$. These remarks are made more precise in the next result.
Lemma 5. For every $\xi > 0$, it holds that

$-\infty < p(x) - \frac{D_y^2}{2\xi} \le p_\xi(x) \le p(x)$

for every $x \in X$, where $D_y$ is as in (2.8).

Proof. The fact that $p(x) > -\infty$ follows immediately from assumption (A4). To show the other bounds, observe that for every $y_0 \in Y$, we have

$\Phi(x, y) + h(x) \ge \Phi(x, y) - \frac{1}{2\xi}\|y - y_0\|^2 + h(x) \ge \Phi(x, y) - \frac{D_y^2}{2\xi} + h(x)$

for every $(x, y) \in X \times Y$. Taking the supremum of these bounds over $y \in Y$, subtracting $h(x)$, and using the definitions of $p$ and $p_\xi$ yields the remaining bounds.
3. Unconstrained min-max optimization. This section presents our proposed AIPP-S scheme for solving the min-max CNO problem (1.1) and is divided into two subsections. The first one reviews an AIPP method for solving smooth CNO problems. The second one presents the AIPP-S scheme and its iteration complexity for finding stationary points as in (1.4) and (1.5).
Before proceeding, we briefly outline the idea of the AIPP-S scheme. The main idea is to apply the AIPP method described in the next subsection to the smooth CNO problem

(3.1) $\min_{x \in X} \{ \hat{p}_\xi(x) := p_\xi(x) + h(x) \},$

where $p_\xi$ is as in (2.12) and $\xi$ is a positive scalar that will depend on the tolerances in (1.4) and (1.5). The above smoothing scheme is similar to the one used in [23]; the approximation function $p_\xi$ used in both schemes is smooth, but the one here is nonconvex while the one in [23] is convex. Moreover, while [23] uses an ACG variant to approximately solve (3.1), the AIPP-S scheme uses the AIPP method discussed below for this purpose.
3.1. AIPP method for smooth CNO problems. This subsection describes the AIPP method studied in [15], and its corresponding iteration complexity result, for solving a class of smooth CNO problems.
We first describe the problem that the AIPP method is intended to solve. Let $\mathcal{X}$ be a finite-dimensional inner product space and consider the smooth CNO problem

(3.2) $\varphi_* := \inf_{x \in \mathcal{X}} [\varphi(x) := f(x) + h(x)],$

where $h : \mathcal{X} \to (-\infty, \infty]$ and the function $f$ satisfy the following assumptions:

(P1) $h \in \text{Conv}(\mathcal{X})$ and $f$ is differentiable on $\text{dom}\,h$;
(P2) for some $M \ge m > 0$, the function $f$ satisfies

(3.3) $-\frac{m}{2}\|x' - x\|^2 \le f(x') - [f(x) + \langle \nabla f(x), x' - x \rangle],$

(3.4) $\|\nabla f(x') - \nabla f(x)\| \le M\|x' - x\|,$

for any $x, x' \in \text{dom}\,h$;
(P3) $\varphi_*$ defined in (3.2) is finite.
We now make four remarks about the above assumptions. First, it is well known that a necessary condition for $x_* \in \text{dom}\,h$ to be a local minimum of (3.2) is that $x_*$ be a stationary point of $\varphi$, i.e., $0 \in \nabla f(x_*) + \partial h(x_*)$. Second, it is well known that (3.4) implies that (3.3) holds for any $m \in [-M, M]$. Third, it is easy to see from Proposition 4 that $p_\xi$ in (2.12) satisfies assumption (P2) with $(M, f) = (L_\xi, p_\xi)$, where $L_\xi$ is as in (2.13). Fourth, it is also easy to see that the function $p_\xi$ in (2.12) satisfies assumption (P3) with $\varphi_* = \inf_{x \in X} \hat{p}_\xi(x)$ in view of assumption (A4) and Lemma 5.
For the purpose of discussing future complexity results, we consider the following notion of an approximate stationary point of (3.2): given a tolerance $\rho > 0$, a pair $(\hat{x}, \hat{u}) \in \text{dom}\,h \times \mathcal{X}$ is said to be a $\rho$-approximate stationary point of (3.2) if

(3.5) $\hat{u} \in \nabla f(\hat{x}) + \partial h(\hat{x}), \qquad \|\hat{u}\| \le \rho.$
We now state the AIPP method for finding a pair $(\hat{x}, \hat{u})$ satisfying (3.5).
AIPP method

Input: a function pair $(f, h)$, a scalar pair $(m, M) \in \mathbb{R}^2_{++}$ satisfying (P2), scalars $\lambda \in (0, 1/(2m)]$ and $\sigma \in (0, 1)$, an initial point $x_0 \in \text{dom}\,h$, and a tolerance $\rho > 0$;
Output: a pair $(\hat{x}, \hat{u}) \in \text{dom}\,h \times \mathcal{X}$ satisfying (3.5);

(0) set $k = 1$ and define $\bar{\rho} := \rho/4$, $\bar{\varepsilon} := \rho^2/[32(M + \lambda^{-1})]$, and $M_\lambda := M + \lambda^{-1}$;
(1) call the accelerated composite gradient (ACG) method in Appendix A with inputs $z_0 = x_{k-1}$, $(\mu, L) = (1/2, \lambda M + 1/2)$, $\psi_s = \lambda f + \|\cdot - x_{k-1}\|^2/4$, and $\psi_n = \lambda h + \|\cdot - x_{k-1}\|^2/4$ in order to obtain a triple $(x, u, \varepsilon) \in \mathcal{X} \times \mathcal{X} \times \mathbb{R}_+$ satisfying

(3.6) $u \in \partial_\varepsilon \left( \lambda \varphi + \frac{1}{2}\|\cdot - x_{k-1}\|^2 \right)(x), \qquad \|u\|^2 + 2\varepsilon \le \sigma \|x_{k-1} - x + u\|^2;$

(2) if $\|x_{k-1} - x + u\| \le \lambda \bar{\rho}/5$, then go to (3); otherwise, set $(x_k, u_k, \varepsilon_k) = (x, u, \varepsilon)$, increment $k \leftarrow k + 1$, and go to (1);
(3) restart the previous call to the ACG method in step (1) to find a triple $(\tilde{x}, \tilde{u}, \tilde{\varepsilon})$ such that $\tilde{\varepsilon} \le \bar{\varepsilon}\lambda$ and $(x, u, \varepsilon) = (\tilde{x}, \tilde{u}, \tilde{\varepsilon})$ satisfies (3.6);
(4) compute

(3.7) $\hat{x} := \operatorname*{argmin}_{x' \in \mathcal{X}} \left\{ \langle \nabla f(\tilde{x}), x' - \tilde{x} \rangle + h(x') + \frac{M_\lambda}{2}\|x' - \tilde{x}\|^2 \right\},$

(3.8) $\hat{u} := M_\lambda(\tilde{x} - \hat{x}) + \nabla f(\hat{x}) - \nabla f(\tilde{x}),$

where $M_\lambda$ is as in step (0), and output the pair $(\hat{x}, \hat{u})$.
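To convey the overall structure, the sketch below strips the AIPP method down to its proximal-point outer loop, replacing the ACG inner solver with plain gradient descent on the strongly convex subproblem (3.9) and omitting the paper's inexactness bookkeeping (the tolerances $\bar{\rho}$, $\bar{\varepsilon}$ and steps (3)–(4)). The weakly convex test function $f(x) = x^4/4 - x^2/2$ with $m = 1$ and $h = 0$ is our own choice:

```python
import numpy as np

# Weakly convex test function (m = 1): f(x) = x^4/4 - x^2/2, h = 0.
f  = lambda x: 0.25 * x**4 - 0.5 * x**2
df = lambda x: x**3 - x

def aipp_sketch(x0, m=1.0, rho=1e-6, max_outer=200):
    lam = 1.0 / (2.0 * m)  # lam in (0, 1/(2m)], so (3.9) is strongly convex
    x = x0
    for _ in range(max_outer):
        # Inner solve of (3.9): min_u lam*f(u) + 0.5*(u - x)^2 by gradient
        # descent (a crude stand-in for the ACG method of Appendix A).
        u = x
        for _ in range(500):
            u -= 0.2 * (lam * df(u) + (u - x))
        # When the inner problem is solved accurately, the prox residual
        # (x - u)/lam approximates the gradient df(u).
        if abs(x - u) / lam <= rho:
            return u
        x = u
    return x

x_hat = aipp_sketch(0.5)
print(x_hat, df(x_hat))  # converges near the stationary point x = 1
```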
We now make four remarks about the above AIPP method. First, at the $k$-th iteration, step (1) invokes an ACG method, whose description is given in Appendix A, to approximately solve the strongly convex proximal subproblem

(3.9) $\min_{x \in \mathcal{X}} \left\{ \lambda \varphi(x) + \frac{1}{2}\|x - x_{k-1}\|^2 \right\}$

according to (3.6). Second, Lemma 12 shows that every ACG iterate $(z, u, \varepsilon)$ satisfies the inclusion in (3.6), and hence only the inequality in (3.6) needs to be verified. Third, note that (3.4) implies that the gradient of the function $\psi_s$ defined in step (1) of the AIPP method is $(\lambda M + 1/2)$-Lipschitz continuous. As a consequence, Lemma 12 with $L = \lambda M + 1/2$ implies that the triple $(z, u, \varepsilon)$ in step (1) of any iteration of the AIPP method can be obtained in $\mathcal{O}(\sqrt{[\lambda M + 1]/\sigma})$ ACG iterations.
Note that the above method differs slightly from the one presented in [15] in that it adds step (4) in order to directly output a $\rho$-approximate stationary point as in (3.5). The justification for the latter claim follows from [15, Lemma 12], [15, Theorem 13], and [15, Corollary 14], which also imply the following complexity result.
Proposition 6. The AIPP method terminates with a $\rho$-approximate stationary point of (3.2) in

(3.10) $\mathcal{O}\left( \sqrt{\lambda M + 1} \left[ \frac{R(\varphi; \lambda)}{\sigma(1 - \sigma)^2 \lambda^2 \rho^2} + \log_1^+(\lambda M) \right] \right)$

ACG iterations, where

(3.11) $R(\varphi; \lambda) := \inf_{x'} \left\{ \frac{1}{2}\|x_0 - x'\|^2 + \lambda [\varphi(x') - \varphi_*] \right\}.$
Note that scaling $R(\varphi; \lambda)$ by $1/\lambda$ and then shifting by $\varphi_*$ results in the $\lambda$-Moreau envelope¹ of $\varphi$ evaluated at $x_0$. Moreover, $R(\varphi; \lambda)$ admits the upper bound

(3.12) $R(\varphi; \lambda) \le \min\left\{ \frac{1}{2}d_0^2, \ \lambda[\varphi(x_0) - \varphi_*] \right\},$

where $d_0 := \inf\{\|x_0 - x_*\| : x_* \text{ is an optimal solution of (3.2)}\}$.
3.2. AIPP-S scheme for min-max CNO problems. We are now ready to state the AIPP-S scheme for finding approximate stationary points of the unconstrained min-max CNO problem (1.1).

It is stated in an incomplete manner in the sense that it does not specify how the parameter $\xi$ and the tolerance $\rho$ used in its step (1) are chosen. Two invocations of this scheme, with different choices of $\xi$ and $\rho$, are considered in Propositions 8 and 9, which describe the iteration complexities for finding approximate stationary points as in (1.4) and (1.5), respectively.
AIPP-S scheme

Input: a triple $(m, L_x, L_y) \in \mathbb{R}^3_{++}$ satisfying (A3), a smoothing constant $\xi > 0$, an initial point $(x_0, y_0) \in X \times Y$, and a tolerance $\rho > 0$;
Output: a pair $(\hat{x}, \hat{u}) \in X \times \mathcal{X}$;

(0) set $L_\xi$ as in (2.13), $\sigma = 1/2$, $\lambda = 1/(4m)$, and define $p_\xi$ as in (2.12);
(1) apply the AIPP method with inputs $(m, L_\xi)$, $(p_\xi, h)$, $\lambda$, $\sigma$, $x_0$, and $\rho$ to obtain a pair $(\hat{x}, \hat{u})$ satisfying

(3.13) $\hat{u} \in \nabla p_\xi(\hat{x}) + \partial h(\hat{x}), \qquad \|\hat{u}\| \le \rho;$

(2) output the pair $(\hat{x}, \hat{u})$.
We now give four remarks about the above scheme. First, the AIPP method invoked in step (1) terminates due to [15, Theorem 13] and the third and fourth remarks following assumptions (P1)–(P3). Second, since the AIPP-S scheme is a one-pass method (as opposed to an iterative method), the complexity of the AIPP-S scheme is essentially that of the AIPP method. Third, similar to the smoothing scheme of [23], which assumes $m = 0$, the AIPP-S scheme is also a smoothing scheme for the case in which $m > 0$. On the other hand, in contrast to the algorithm of [23], which uses an ACG variant, AIPP-S invokes the AIPP method to solve (3.1) due to its nonconvexity. Finally, while the AIPP method in step (1) is called with $(\sigma, \lambda) = (1/2, 1/(4m))$, it can also be called with any $\sigma \in (0, 1)$ and $\lambda \in (0, 1/(2m))$ to establish the desired termination of the AIPP-S scheme.

¹See [28, Chapter 1.G] for an exact definition.
For the remainder of this subsection, our goal is to show that a careful selection of the parameter $\xi$ and the tolerance $\rho$ allows the AIPP-S scheme to generate approximate stationary points as in (1.5) and (1.4).
Before proceeding, we first present a bound on the quantity $R(\hat{p}_\xi; \lambda)$ in terms of the data of problem (1.1). Its importance derives from the fact that the AIPP method applied to the smoothed problem (3.1) yields the bound (3.10) with $\varphi = \hat{p}_\xi$.

Lemma 7. For every $\xi > 0$ and $\lambda \ge 0$, it holds that

(3.14) $R(\hat{p}_\xi; \lambda) \le R(\hat{p}; \lambda) + \frac{\lambda D_y^2}{2\xi},$

where $R(\cdot; \cdot)$ and $D_y$ are as in (3.11) and (2.8), respectively.

Proof. Using Lemma 5 and the definitions of $\hat{p}$ and $\hat{p}_\xi$, it holds that

(3.15) $\hat{p}_\xi(x) - \inf_{x'} \hat{p}_\xi(x') \le \hat{p}(x) - \inf_{x'} \hat{p}(x') + \frac{D_y^2}{2\xi} \quad \forall x \in X.$

Multiplying the above expression by $\lambda$ and adding the quantity $\|x_0 - x\|^2/2$ to both sides yields

(3.16) $\frac{1}{2}\|x_0 - x\|^2 + \lambda\left[ \hat{p}_\xi(x) - \inf_{x'} \hat{p}_\xi(x') \right] \le \frac{1}{2}\|x_0 - x\|^2 + \lambda\left[ \hat{p}(x) - \inf_{x'} \hat{p}(x') \right] + \frac{\lambda D_y^2}{2\xi} \quad \forall x \in X.$

Taking the infimum of the above expression over $x$ and using the definition of $R(\cdot; \cdot)$ in (3.11) yields the desired conclusion.
We now show how the AIPP-S scheme generates a $(\rho_x, \rho_y)$-primal-dual stationary point, i.e., one satisfying (1.4). Recall the definition of "oracle call" in the paragraph containing (1.3).
Proposition 8. For a given tolerance pair $(\rho_x, \rho_y) \in \mathbb{R}^2_{++}$, let $(\hat{x}, \hat{u})$ be the pair output by the AIPP-S scheme with input parameter $\xi$ and tolerance $\rho$ satisfying $\xi \ge D_y/\rho_y$ and $\rho = \rho_x$. Moreover, define

(3.17) $(u, v) := \left( \hat{u}, \frac{y_0 - y_\xi(\hat{x})}{\xi} \right), \qquad (x, y) := (\hat{x}, y_\xi(\hat{x})),$

where $y_\xi$ is as in (2.13). Then, the following statements hold:

(a) the AIPP-S scheme performs

(3.18) $\mathcal{O}\left( \Omega_\xi \left[ \frac{m^2 R(\hat{p}; 1/(4m))}{\rho_x^2} + \frac{m D_y^2}{\xi \rho_x^2} + \log_1^+(\Omega_\xi) \right] \right)$

oracle calls, where $R(\cdot; \cdot)$ and $D_y$ are as in (3.11) and (2.8), respectively, and

(3.19) $\Omega_\xi := 1 + \frac{\sqrt{\xi} L_y + \sqrt{L_x}}{\sqrt{m}};$

(b) the quadruple $(u, v, x, y)$ is a $(\rho_x, \rho_y)$-primal-dual stationary point of (1.1).
(b) the quadruple (u, v, x, y) is a (Ïx, Ïy)âprimal-dual stationary point of (1.1).Proof. (a) Using the inequality in (2.13), it holds thatâ
LΟ4m + 1 †1 +
âLΟ4m †1 +
âΟLy +
âLx
2âm
= Î(ΩΟ).(3.20)
Moreover, using Proposition 6 with (Ï,M) = (pΟ, LΟ), Lemma 7, and bound (3.20),it follows that the number of ACG iterations performed by the AIPP-S scheme is onthe order given by (3.18). Since step 1 of the AIPP invokes once the ACG variant inAppendix A with a pair (Ïs, Ïn) of the form
Ïs = λpΟ + 14â · âzâ
2, Ïn = λh+ 14â · âzâ
2
for some z and each iteration of this ACG variant performs O(1) gradient evaluationsof Ïs, O(1) function evaluations of Ïs and Ïn, and O(1) Ïn-resolvent evaluations, itfollows from Proposition 4(b) and the definition of an âoracle callâ in the paragraphcontaining (1.3) that each one of the above ACG iterations requires O(1) oracle calls.Statement (a) now follows from the above two conclusions.
(b) It follows from the definitions of p_ξ and (ȳ, ū) in (2.12) and (3.17), the choices of ξ and ρ, Proposition 4(b), and the inclusion in (3.13) that ‖ū‖ ≤ ρ̂_x and

ū ∈ ∇p_ξ(x̄) + ∂h(x̄) = ∇_xΦ(x̄, y_ξ(x̄)) + ∂h(x̄) = ∇_xΦ(x̄, ȳ) + ∂h(x̄).

Hence, we conclude that the top inclusion and the upper bound on ‖ū‖ in (1.4) hold. Next, the optimality condition of ȳ = y_ξ(x̄) as a solution of (2.12) and the definition of v̄ in (3.17) give

(3.21)    0 ∈ ∂[−Φ(x̄, ·)](ȳ) + (ȳ − y₀)/ξ = ∂[−Φ(x̄, ·)](ȳ) − v̄.

Moreover, the choice of ξ implies that ‖v̄‖ = ‖ȳ − y₀‖/ξ ≤ D_y/(D_y/ρ̂_y) = ρ̂_y. Hence, combining (3.21) and the previous identity, we conclude that the bottom inclusion and the upper bound on ‖v̄‖ in (1.4) hold.

We now make three remarks about Proposition 8. First, recall that R(p; 1/(4m)) in the complexity (3.18) can be majorized by the rightmost quantity in (3.12) with (φ, λ) = (p, 1/(4m)). Second, under the assumption that ξ = D_y/ρ̂_y, the complexity of the AIPP-S scheme reduces to
(3.22)    O( m^{3/2} · R(p; 1/(4m)) · [ L_x^{1/2}/ρ̂_x² + L_y D_y^{1/2}/(ρ̂_x² ρ̂_y^{1/2}) ] )

under the reasonable assumption that the O(ρ̂_x^{−2} + ρ̂_x^{−2} ρ̂_y^{−1/2}) terms in (3.18) dominate the other terms. Third, recall from the last remark following the previous proposition that when Y is a singleton, (1.1) becomes a special instance of (3.2) and the AIPP-S scheme becomes equivalent to the AIPP method of Subsection 3.1. It similarly follows that the complexity in (3.22) reduces to O(ρ̂_x^{−2}) and, in view of this remark, the O(ρ̂_x^{−2} ρ̂_y^{−1/2}) term in (3.22) is attributed to the (possible) nonsmoothness in (1.1).
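To make the smoothing idea concrete, the following minimal one-dimensional sketch (illustrative code, not part of the paper's MATLAB implementation) instantiates the smoothed function with Φ(x, y) = xy, Y = [−1, 1], and y₀ = 0, so that p(x) = |x|; by Danskin's theorem, the gradient of the smoothed function is y_ξ(x), and the smoothing gap p − p_ξ is at most 1/(2ξ):

```python
# Minimal 1-D illustration of the smoothing in (2.12): with Phi(x, y) = x*y,
# Y = [-1, 1], and y0 = 0, we have p(x) = |x| and the smoothed function
#   p_xi(x) = max_{y in Y} { x*y - (y - y0)^2 / (2*xi) },
# whose argmax is y_xi(x) = clip(xi*x, -1, 1) and whose gradient (by
# Danskin's theorem) is p_xi'(x) = y_xi(x).  All names are illustrative.

def y_xi(x, xi):
    return max(-1.0, min(1.0, xi * x))

def p_xi(x, xi):
    y = y_xi(x, xi)
    return x * y - y * y / (2.0 * xi)

def grad_p_xi(x, xi):
    return y_xi(x, xi)

if __name__ == "__main__":
    xi = 10.0
    for x in (-0.3, 0.05, 0.7):
        # 0 <= p(x) - p_xi(x) <= 1/(2*xi) since sup_{y in Y} |y - y0| = 1
        gap = abs(x) - p_xi(x, xi)
        assert 0.0 <= gap <= 1.0 / (2.0 * xi) + 1e-12
        # finite-difference check of the Danskin gradient
        t = 1e-6
        fd = (p_xi(x + t, xi) - p_xi(x - t, xi)) / (2.0 * t)
        assert abs(fd - grad_p_xi(x, xi)) < 1e-4
    print("smoothing checks passed")
```

Note how a larger ξ makes p_ξ a tighter approximation of p at the cost of a larger gradient Lipschitz constant, which is exactly the trade-off driving the choice ξ = D_y/ρ̂_y above.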
We next show how the AIPP-S scheme generates a point that is near a δ-directional stationary point, i.e., one satisfying (1.5). Recall the definition of "oracle call" in the paragraph containing (1.3).

Proposition 9. Let a tolerance δ > 0 be given and consider the AIPP-S scheme with input parameter ξ and tolerance ρ satisfying ξ ≥ D_y/σ and ρ = δ/2, for some σ ≤ min{ mδ²/(2D_y), δ²/(32mD_y) }. Then, the following statements hold:

(a) the AIPP-S scheme performs

(3.23)    O( Ω_ξ [ R(p; λ)/(λ²δ²) + D_y²/(λξδ²) + log⁺₁(Ω_ξ) ] )

oracle calls, where Ω_ξ, R(·;·), and D_y are as in (3.19), (3.11), and (2.8), respectively;
(b) the first argument x̄ of the pair output by the AIPP-S scheme satisfies (1.5).

Proof. (a) Using Proposition 8 with (ρ̂_x, ρ̂_y) = (δ/2, σ) and our assumption on σ, it follows that the AIPP-S scheme stops in a number of ACG iterations bounded above by (3.23).

(b) Let (x̄, ū) be the ρ-approximate stationary point of (3.1) generated by the AIPP-S scheme (see step 2) under the given assumptions on ξ and ρ. Defining (v̄, ȳ) as in (3.17), it follows from Proposition 8 with (ρ̂_x, ρ̂_y) = (δ/2, σ) that (ū, v̄, x̄, ȳ) is a (δ/2, σ)-primal-dual stationary point of (1.1). As a consequence, it follows from Proposition 1 with (ρ̂_x, ρ̂_y) = (δ/2, σ) that there exists a point x̂ satisfying

(3.24)    ‖x̄ − x̂‖ ≤ √(2D_yσ/m),    inf_{‖d‖≤1} p̂′(x̂; d) ≥ −δ/2 − 2√(2mD_yσ).
Combining the above bounds with our assumption on σ yields the desired conclusion, in view of (1.5).

We now give four remarks about the above result. First, recall that R(p; 1/(4m)) in the complexity (3.23) can be majorized by the rightmost quantity in (3.12) with (φ, λ) = (p, 1/(4m)). Second, Proposition 9(b) states that, while x̄ is not a stationary point itself, it is near a δ-directional stationary point x̂. Third, under the assumption that the bounds on ξ and σ in Proposition 9 hold with equality, the complexity of the AIPP-S scheme reduces to

(3.25)    O( m^{3/2} · R(p; 1/(4m)) · [ L_x^{1/2}/δ² + L_yD_y/δ³ ] )

under the reasonable assumption that the O(δ^{−2} + δ^{−3}) terms in (3.23) dominate the other O(δ^{−1}) terms. Fourth, when Y is a singleton, it is easy to see that (1.1) becomes a special instance of (3.2), the AIPP-S scheme becomes equivalent to the AIPP method of Subsection 3.1, and the complexity in (3.25) reduces to O(δ^{−2}). In view of the last remark, the O(δ^{−3}) term in (3.25) is attributed to the (possible) nonsmoothness in (1.1).
4. Linearly-constrained min-max optimization. This section presents our proposed QP-AIPP-S scheme for solving the linearly-constrained min-max CNO problem (1.6), and it is divided into two subsections. The first one reviews a QP-AIPP method for solving smooth linearly-constrained CNO problems. The second one presents the QP-AIPP-S scheme and its iteration complexity for finding stationary points as in (1.8). Throughout our presentation, we let X, Y, and U be finite-dimensional inner product spaces.

Before proceeding, let us give the precise assumptions underlying the problem of interest and discuss the relevant notion of stationarity. For problem (1.6), suppose that assumptions (A0)–(A3) hold and that the linear operator A : X → U and the vector b ∈ U satisfy:

(A5) A ≢ 0 and F := {x ∈ X : Ax = b} ≠ ∅;
(A6) there exists ĉ ≥ 0 such that inf_{x∈X} { p̂(x) + ĉ‖Ax − b‖²/2 } > −∞.

Note that (A4) in Subsection 2.1 is replaced by (A6), which is required by the QP-AIPP method of the next subsection.
Analogous to the first remark following (2.6), it is known that if (x_*, y_*) satisfies (2.5) for every (x, y) ∈ F × Y and Φ as in (2.4), then there exists a multiplier r_* ∈ U such that

(4.1)    (0, 0) ∈ ( ∇_xΦ(x_*, y_*) + A*r_* , 0 ) + ( ∂h(x_*) , ∂[−Φ(x_*, ·)](y_*) )

holds. Hence, in view of the third remark in the paragraph following (2.7), we only consider the approximate version of (4.1), which is (1.8).
We now briefly outline the idea of the QP-AIPP-S scheme. The main idea is to apply the QP-AIPP method described in the next subsection to the smooth linearly-constrained CNO problem

(4.2)    min_{x∈X} { p_ξ(x) + h(x) : Ax = b },

where p_ξ is as in (2.12) and ξ is a positive scalar that depends on the tolerances in (1.8). This idea is similar to the one in Section 3 in that it applies an accelerated solver to a perturbed version of the problem of interest.
4.1. QP-AIPP method for constrained smooth CNO problems. This subsection describes the QP-AIPP method studied in [15], and its corresponding iteration complexity, for solving linearly-constrained smooth CNO problems.

We begin by describing the problem that the QP-AIPP method intends to solve. Consider the linearly-constrained smooth CNO problem

(4.3)    φ_* := inf_{x∈X} { φ(x) := f(x) + h(x) : Ax = b },

where h : X → (−∞, ∞] and the function f satisfy assumptions (P1)–(P3), the operator A : X → U is linear, b ∈ U, and the following additional assumptions hold:

(Q1) A ≢ 0 and F := {x ∈ dom h : Ax = b} ≠ ∅;
(Q2) there exists ĉ ≥ 0 such that φ̂_ĉ > −∞, where

(4.4)    φ̂_c := inf_{x∈X} { φ_c(x) := φ(x) + (c/2)‖Ax − b‖² },    ∀c ≥ 0.

We now give some remarks about the above assumptions. First, similar to problem (3.2), it is well known that a necessary condition for x_* ∈ dom h to be a local minimum of (4.3) is that x_* satisfy 0 ∈ ∇f(x_*) + ∂h(x_*) + A*r_* for some r_* ∈ U. Second, it is straightforward to verify that (p, h, A, b) in (1.6) satisfies (Q1)–(Q2) in view of assumptions (A5)–(A6). Third, since every feasible solution of (4.3) is also a feasible solution of (4.4), it follows from assumption (Q2) that φ_* ≥ φ̂_ĉ > −∞. Fourth, if inf_{x∈X} φ(x) > −∞ (e.g., when dom h is compact), then (Q2) holds with ĉ = 0.
Our interest in this subsection is in finding an approximate stationary point of (4.3) in the following sense: given a tolerance pair (ρ̂, η̂) ∈ R²₊₊, a triple (x̄, ū, r̄) ∈ dom h × X × U is said to be a (ρ̂, η̂)-approximate stationary point of (4.3) if

(4.5)    ū ∈ ∇f(x̄) + ∂h(x̄) + A*r̄,    ‖ū‖ ≤ ρ̂,    ‖Ax̄ − b‖ ≤ η̂.
We now state the QP-AIPP method for finding a triple (x̄, ū, r̄) satisfying (4.5).

QP-AIPP method

Input: a function pair (f, h), a scalar pair (m, M) ∈ R²₊₊ satisfying (3.3), scalars λ ∈ (0, 1/(2m)] and σ ∈ (0, 1), a scalar ĉ satisfying assumption (Q2), an initial point x₀ ∈ dom h, and a tolerance pair (ρ̂, η̂) ∈ R²₊₊;
Output: a triple (x̄, ū, r̄) ∈ dom h × X × U satisfying (4.5);

(0) set c = ĉ + M/‖A‖²;
(1) define the quantities

(4.6)    M_c := M + c‖A‖²,    f_c := f + (c/2)‖A(·) − b‖²,    φ_c = f_c + h,

and apply the AIPP method with inputs (m, M_c), (f_c, h), λ, σ, x₀, and ρ̂ to obtain a ρ̂-approximate stationary point (x̄, ū) of (3.2) with f = f_c;
(2) if ‖Ax̄ − b‖ > η̂, then set c = 2c and go to (1); otherwise, set r̄ = c(Ax̄ − b) and output the triple (x̄, ū, r̄).
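The outer penalty loop of steps (0)-(2) can be sketched as follows. This is a schematic only, not the authors' implementation: the inner AIPP call is replaced by a placeholder first-order solver (plain gradient descent on the penalized objective, which suffices for the toy strongly convex instance used in the example), and all problem data and names are made up.

```python
# Schematic of the QP-AIPP outer loop in steps (0)-(2): approximately minimize
# the penalized objective f(x) + (c/2)||Ax - b||^2 and double the penalty
# parameter c until the feasibility residual ||Ax - b|| drops below eta_hat,
# then recover the multiplier r = c(Ax - b).  Illustration only: the inner
# AIPP call is replaced by plain gradient descent (sufficient for the toy
# strongly convex instance below), and all names are made up.

def solve_penalized(grad_f, A, b, c, x, lip, rho_hat):
    """Gradient descent on f_c = f + (c/2)(<A, x> - b)^2 until the gradient
    norm is at most rho_hat; returns the final iterate and gradient."""
    step = 1.0 / (lip + c * sum(a * a for a in A))
    while True:
        resid = sum(a * xi for a, xi in zip(A, x)) - b
        g = [gi + c * resid * a for gi, a in zip(grad_f(x), A)]
        if sum(v * v for v in g) ** 0.5 <= rho_hat:
            return x, g
        x = [xi - step * v for xi, v in zip(x, g)]

def qp_penalty_loop(grad_f, A, b, x0, lip, rho_hat, eta_hat, c=1.0):
    x = x0
    while True:
        x, u = solve_penalized(grad_f, A, b, c, x, lip, rho_hat)
        resid = sum(a * xi for a, xi in zip(A, x)) - b
        if abs(resid) <= eta_hat:
            return x, u, c * resid      # r = c(Ax - b), as in step (2)
        c *= 2.0                        # step (2): double the penalty

if __name__ == "__main__":
    # toy instance: f(x) = 0.5*||x - (1, 1)||^2 subject to x1 + x2 = 1,
    # whose solution is (0.5, 0.5) with multiplier r = 0.5
    grad_f = lambda x: [x[0] - 1.0, x[1] - 1.0]
    x, u, r = qp_penalty_loop(grad_f, [1.0, 1.0], 1.0, [0.0, 0.0],
                              lip=1.0, rho_hat=1e-6, eta_hat=1e-3)
    print(x, r)
```

The warm start of each penalized solve from the previous iterate mirrors the scheme's reuse of x̄ across penalty updates.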
We now give two remarks about the above method. First, it is straightforward to see that the QP-AIPP method terminates, due to the results in [15, Section 4]. Second, in view of Proposition 6 with (φ, M) = (φ_c, M_c), it is easy to see that the number of ACG iterations executed in step 1 at any iteration of the method is

(4.7)    O( √(λM_c + 1) [ R(φ_c; λ)/(√σ (1−σ)² λ² ρ̂²) + log⁺₁(λM_c) ] )

and that the pair (x̄, ū) computed in step 1 satisfies the inclusion and the first inequality in (4.5).
We now focus on the iteration complexity of the QP-AIPP method. Before proceeding, we first define the useful quantity

(4.8)    R_c(φ; λ) := inf_{x′} { ½‖x₀ − x′‖² + λ[φ(x′) − φ̂_c] : x′ ∈ F },

for every c ≥ ĉ, where φ̂_c is as defined in (4.4). The quantity in (4.8) plays an analogous role to (3.11) in (3.10) and, similar to the discussion following Proposition 6, it is a scaled and shifted λ-Moreau envelope of φ + δ_F. Moreover, due to [15, Lemma 16], it also admits the upper bound

(4.9)    R_c(φ; λ) ≤ R_ĉ(φ; λ) ≤ min{ ½d₀², λ[φ_* − φ̂_ĉ] },
where φ_* is as defined in (4.3) and

d₀ := inf{ ‖x₀ − x_*‖ : x_* is an optimal solution of (4.3) }.

We now state the iteration complexity of the QP-AIPP method, whose proof may be adapted from [15, Lemma 12] and [15, Theorem 18].

Proposition 10. Let a constant ĉ as in assumption (Q2), a scalar σ ∈ (0, 1), a curvature pair (m, M) ∈ R²₊₊, and a tolerance pair (ρ̂, η̂) ∈ R²₊₊ be given. Moreover, define

(4.10)    T_η := 2R_ĉ(φ; λ)/(η̂²(1−σ)λ) + ĉ,    Θ_η := M + T_η‖A‖².

Then, the QP-AIPP method outputs a triple (x̄, ū, r̄) satisfying (4.5) in

(4.11)    O( √(λΘ_η + 1) [ R_ĉ(φ; λ)/(√σ (1−σ)² λ² ρ̂²) + log⁺₁(λΘ_η) ] )

ACG iterations.
4.2. QP-AIPP-S scheme for constrained min-max CNO problems. We are now ready to state the QP-AIPP smoothing scheme for finding an approximate primal-dual stationary point of the linearly-constrained min-max CNO problem (1.6).

QP-AIPP-S scheme

Input: a triple (m, L_x, L_y) ∈ R³₊₊ satisfying assumption (A3), a scalar ĉ satisfying assumption (A6), a smoothing constant ξ ≥ D_y/ρ̂_y, an initial point (x₀, y₀) ∈ X × Y, and a tolerance triple (ρ̂_x, ρ̂_y, η̂) ∈ R³₊₊;
Output: a quintuple (ū, v̄, x̄, ȳ, r̄) satisfying (1.8);

(0) set L_ξ as in (2.13), σ = 1/2, λ = 1/(4m), and define p_ξ as in (2.12);
(1) apply the QP-AIPP method of Subsection 4.1 with inputs (m, L_ξ), (p_ξ, h), λ, σ, ĉ, x₀, and (ρ̂_x, η̂) to obtain a triple (x̄, ū, r̄) satisfying

(4.12)    ū ∈ ∇p_ξ(x̄) + ∂h(x̄) + A*r̄,    ‖ū‖ ≤ ρ̂_x,    ‖Ax̄ − b‖ ≤ η̂;

(2) define (v̄, ȳ) as in (3.17) and output the quintuple (ū, v̄, x̄, ȳ, r̄).
Some remarks about the above method are in order. First, the QP-AIPP method invoked in step 1 terminates, due to the remarks following assumptions (Q1)–(Q2) and the results in Subsection 4.1. Second, since the QP-AIPP-S scheme is a one-pass algorithm (as opposed to an iterative algorithm), the complexity of the QP-AIPP-S scheme is essentially that of the QP-AIPP method. Finally, while the QP-AIPP method in step 1 is called with (σ, λ) = (1/2, 1/(4m)), it can also be called with any σ ∈ (0, 1) and λ ∈ (0, 1/(2m)) to establish the desired termination of the QP-AIPP-S scheme.
We now show how the QP-AIPP-S scheme generates a quintuple (ū, v̄, x̄, ȳ, r̄) satisfying (1.8). Recall the definition of "oracle call" in the paragraph containing (1.3).

Proposition 11. Let a tolerance triple (ρ̂_x, ρ̂_y, η̂) ∈ R³₊₊ be given and let the quintuple (ū, v̄, x̄, ȳ, r̄) be the output obtained by the QP-AIPP-S scheme. Then, the following properties hold:

(a) the QP-AIPP-S scheme terminates in

(4.13)    O( Ω_{ξ,η} [ m²R_ĉ(p; 1/(4m))/ρ̂_x² + mD_y²/(ξρ̂_x²) + log⁺₁(Ω_{ξ,η}) ] )

oracle calls, where

(4.14)    Ω_{ξ,η} := Ω_ξ + ( R_ĉ(p; 1/(4m)) + D_y²/(mξ) )^{1/2} ‖A‖/η̂
and Ω_ξ, R_ĉ(·;·), and D_y are as in (3.19), (4.8), and (2.8), respectively;
(b) the quintuple (ū, v̄, x̄, ȳ, r̄) satisfies (1.8).

Proof. (a) Let Θ_η be as in (4.10) with M = L_ξ. Using the same arguments as in Lemma 7, it is easy to see that R_ĉ(p_ξ; 1/(4m)) ≤ R_ĉ(p; 1/(4m)) + D_y²/(8mξ), and hence, using (3.20), we have

√(Θ_η/(4m) + 1) ≤ 1 + √(L_ξ/(4m)) + √( 4R_ĉ(p_ξ; 1/(4m))‖A‖²/η̂² )
    ≤ 1 + (√ξ L_y + √L_x)/(2√m) + 2( R_ĉ(p; 1/(4m)) + D_y²/(8mξ) )^{1/2} ‖A‖/η̂ = Θ(Ω_{ξ,η}).    (4.15)

The complexity in (4.13) now follows from the above bound and Proposition 10 with (φ, f, M) = (p, p_ξ, L_ξ).
(b) The top inclusion and the bounds involving ‖ū‖ and ‖Ax̄ − b‖ in (1.8) follow from Proposition 4(b), the definition of ȳ in step 2 of the algorithm, and Proposition 10 with f = p_ξ. The bottom inclusion and the bound involving ‖v̄‖ follow from arguments similar to those given for Proposition 8(b).

We now make three remarks about the above complexity bound. First, recall that R_ĉ(p; 1/(4m)) in the complexity (4.13) can be majorized by the rightmost quantity in (4.9) with λ = 1/(4m). Second, under the assumption that ξ = D_y/ρ̂_y, the complexity of the QP-AIPP-S scheme reduces to

(4.16)    O( m^{3/2} · R_ĉ(p; 1/(4m)) · [ L_x^{1/2}/ρ̂_x² + L_yD_y^{1/2}/(ρ̂_y^{1/2}ρ̂_x²) + m^{1/2}‖A‖R_ĉ^{1/2}(p; 1/(4m))/(η̂ρ̂_x²) ] ),

under the reasonable assumption that the O(ρ̂_x^{−2} + η̂^{−1}ρ̂_x^{−2} + ρ̂_y^{−1/2}ρ̂_x^{−2}) terms in (4.13) dominate the other terms. Third, when Y is a singleton, it is easy to see that (1.6) becomes a special instance of the linearly-constrained smooth CNO problem (4.3), the QP-AIPP-S scheme of this subsection becomes equivalent to the QP-AIPP method of Subsection 4.1, and the complexity in (4.16) reduces to O(η̂^{−1}ρ̂_x^{−2}). In view of the last remark, the O(ρ̂_x^{−2}ρ̂_y^{−1/2}) term in (4.16) is attributed to the (possible) nonsmoothness in (1.6).

Let us now conclude this section with a remark about the penalty subproblem
(4.17)    min_{x∈X} { p_ξ(x) + h(x) + (c/2)‖Ax − b‖² },

which is what the AIPP method considers every time it is called by the QP-AIPP-S scheme (see step 1). First, observe that (1.6) can be equivalently reformulated as

(4.18)    min_{x∈X} max_{y∈Y, r∈U} [ Ψ̂(x, y, r) := Φ(x, y) + h(x) + ⟨r, Ax − b⟩ ].

Second, it is straightforward to verify that problem (4.17) is equivalent to

(4.19)    min_{x∈X} { p̂_{c,ξ}(x) := p_{c,ξ}(x) + h(x) },

where the function p_{c,ξ} : X → R is given by

p_{c,ξ}(x) := max_{y∈Y, r∈U} { Φ(x, y) + ⟨r, Ax − b⟩ − ‖r‖²/(2c) − ‖y − y₀‖²/(2ξ) },    ∀x ∈ X.

As a consequence, problem (4.19) is similar to (3.1) in that a smooth approximation is used in place of the nonsmooth component of the underlying saddle function Ψ̂.
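The equivalence of (4.17) and (4.19) rests on the elementary identity max_r { ⟨r, t⟩ − ‖r‖²/(2c) } = (c/2)‖t‖², attained at r = ct, so that maximizing over the multiplier r recovers the quadratic penalty term. A quick numerical check of the scalar case (illustrative code only):

```python
# Numerical check of the identity behind the equivalence of (4.17) and
# (4.19): for any scalar t and c > 0,
#   max_r { r*t - r^2/(2c) } = (c/2)*t^2,   attained at r = c*t,
# so the r-maximization in p_{c,xi} produces the penalty (c/2)||Ax - b||^2.

def inner_max(t, c, grid):
    """Brute-force maximization of r*t - r^2/(2c) over a grid of r values."""
    return max(r * t - r * r / (2.0 * c) for r in grid)

if __name__ == "__main__":
    c, t = 4.0, 0.75
    grid = [i * 1e-3 for i in range(-10000, 10001)]   # r in [-10, 10]
    approx = inner_max(t, c, grid)
    exact = 0.5 * c * t * t                            # = 1.125 here
    assert abs(approx - exact) < 1e-5
    print(exact)
```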
On the other hand, observe that we cannot directly apply the smoothing scheme developed in Subsection 3.2 to (4.19), as the set U is generally unbounded. One approach that avoids this issue is to invoke the AIPP method of Subsection 3.1 to solve a sequence of subproblems of the form (4.19) for increasing values of c. Indeed, in view of the equivalence of (4.17) and (4.19), this is exactly the approach taken by the QP-AIPP-S scheme of this section.
5. Numerical experiments. This section presents numerical results that illustrate the computational efficiency of our proposed smoothing scheme. It contains three subsections, each of which presents computational results for a specific unconstrained nonconvex min-max optimization problem class.
Each unconstrained problem considered in this section is of the form in (1.1) and is such that the computation of the function y_ξ in (2.13) is easy. Moreover, for a given initial point x₀ ∈ X, three algorithms are run on each problem instance until a quadruple (ū, v̄, x̄, ȳ) satisfying the inclusions of (1.4) and

(5.1)    ‖ū‖/(‖∇p_ξ(z₀)‖ + 1) ≤ ρ̂_x,    ‖v̄‖ ≤ ρ̂_y,

is obtained, where ξ = D_y/ρ̂_y.

We now describe the three nonconvex-concave min-max methods that are being
compared in this section, namely: (i) the R-AIPP-S method; (ii) the accelerated gradient smoothing (AG-S) scheme; and (iii) the projected gradient step framework (PGSF). Both the AG-S and R-AIPP-S schemes are modifications of the AIPP-S scheme which, instead of using the AIPP method in step 1, use the AG method of [10] and the R-AIPP method of [16], respectively. The PGSF is a simplified variant of Algorithm 2 of [24, Subsection 4.1] which explicitly evaluates the argmax function α*(·) in [24, Section 4] instead of applying an ACG variant to estimate its evaluation.
Regarding the penalty solvers, the AG method is implemented as described in Algorithm 2 of [10], while the R-AIPP method follows the implementation described in [14, Section 5.3].

Note that, like the AIPP method, the R-AIPP method similarly: (i) invokes at each of its (outer) iterations an ACG method to inexactly solve the proximal subproblem (3.9); and (ii) outputs a ρ̂-approximate stationary point of (3.2). However, the R-AIPP method is more computationally efficient due to three key practical improvements over the AIPP method, namely: (i) it allows the stepsize λ to be significantly larger than the 1/(2m) upper bound in the AIPP method by using adaptive estimates of m; (ii) it uses a weaker ACG termination criterion compared to the one in (3.6); and (iii) it does not prespecify the minimum number of ACG iterations as the AIPP method does in its step 1.
We next state some additional details about the numerical experiments. First, each algorithm is run with a time limit of 4000 seconds. Second, the bold numbers in each of the computational tables in this section highlight the algorithm that performed the most efficiently in terms of iteration count or total runtime. Moreover, each table contains a column labeled p_ξ(x̄) that reports the smallest obtained value of the smoothed function in (3.1) across all of the tested algorithms. Third, the description of y_ξ and the choice of the constants m, L_x, and L_y for each of the considered optimization problems can be found in [14, Appendix I]. Fourth, y₀ is chosen to be 0 in all of the experiments. Finally, all algorithms described at the beginning of this section are implemented in MATLAB 2019a and are run on Linux 64-bit machines, each containing Xeon E5520 processors and at least 8 GB of memory.
Before proceeding, it is worth mentioning that the code for generating the results of this section is available online².
5.1. Maximum of a finite number of nonconvex quadratic forms. This subsection presents computational results for a min-max quadratic vector problem, which is based on a similar problem in [16].
We first describe the problem. Given a dimension triple (n, l, k) ∈ N³, a set of parameters {(α_i, β_i)}_{i=1}^k ⊂ R²₊₊, a set of vectors {d_i}_{i=1}^k ⊂ R^l, a set of diagonal matrices {D_i}_{i=1}^k ⊂ R^{n×n}, and matrices {C_i}_{i=1}^k ⊂ R^{l×n} and {B_i}_{i=1}^k ⊂ R^{n×n}, the problem of interest is the quadratic vector min-max (QVM) problem

min_{x∈R^n} max_{y∈R^k} { δ_{Δ^n}(x) + Σ_{i=1}^k y_i g_i(x) : y ∈ Δ^k },

where, for every index 1 ≤ i ≤ k, integer p ∈ N, and x ∈ R^n, we define

g_i(x) := α_i‖C_i x − d_i‖²/2 − β_i‖D_i B_i x‖²/2,    Δ^p := { z ∈ R^p : Σ_{i=1}^p z_i = 1, z ≥ 0 }.
We now describe the experiment parameters for the instances considered. First, the dimensions are set to (n, l, k) = (200, 10, 5), and only 5.0% of the entries of the submatrices B_i and C_i are nonzero. Second, the entries of B_i, C_i, and d_i (resp., D_i) are generated by sampling from the uniform distribution U[0, 1] (resp., U[1, 1000]). Third, the initial starting point is z₀ = 1_n/n, where 1_n is the n-dimensional vector of all ones. Fourth, with respect to the termination criterion, the inputs are, for every (x, y) ∈ R^n × R^k, Φ(x, y) = Σ_{i=1}^k y_i g_i(x), h(x) = δ_{Δ^n}(x), ρ̂_x = 10⁻², ρ̂_y = 10⁻¹, and Y = Δ^k. Finally, each problem instance considered is based on a specific curvature pair (m, M) satisfying m ≤ M, for which each scalar pair (α_i, β_i) ∈ R²₊₊ is selected so that M = λ_max(∇²g_i) and −m = λ_min(∇²g_i).
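As an illustration of the QVM objective (a sketch with made-up small scalar data, not the paper's MATLAB code), note that for a fixed x the inner maximization of Σ_i y_i g_i(x) over the simplex Δ^k is a linear program whose optimum is attained at a vertex, so p(x) = max_i g_i(x):

```python
# Toy QVM instance (illustrative data, scalar case n = l = 1): each
#   g_i(x) = alpha_i*(C_i*x - d_i)^2/2 - beta_i*(D_i*B_i*x)^2/2
# is a (possibly nonconvex) quadratic, and since y -> sum_i y_i*g_i(x) is
# linear in y, its maximum over the simplex Delta^k is attained at a vertex:
#   p(x) = max_i g_i(x).

def make_g(alpha, beta, C, d, D, B):
    return lambda x: alpha * (C * x - d) ** 2 / 2.0 - beta * (D * B * x) ** 2 / 2.0

def p(x, gs):
    return max(g(x) for g in gs)

if __name__ == "__main__":
    gs = [make_g(2.0, 0.5, 1.0, 0.3, 3.0, 0.2),
          make_g(1.0, 1.0, 0.8, -0.1, 2.0, 0.1)]
    x = 0.7
    # brute-force check over a grid of simplex points y = (t, 1 - t)
    brute = max(t * gs[0](x) + (1 - t) * gs[1](x)
                for t in [i / 100.0 for i in range(101)])
    assert abs(brute - p(x, gs)) < 1e-12
    print(p(x, gs))
```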
We now present the results in Table 5.1.
M      m      p_ξ(x̄)      Iteration count                     Runtime (seconds)
                          R-AIPP-S   AG-S     PGSF            R-AIPP-S   AG-S     PGSF
10^0   10^0   2.85E-01    23         294      1591            0.66       5.72     22.60
10^1   10^0   2.88E+00    86         1371     14815           1.37       25.96    209.62
10^2   10^0   2.85E+01    217        6270     150493          3.35       118.32   2122.93
10^3   10^0   2.85E+02    1417       28989    -               21.58      546.25   4000.00*

Table 5.1: Iteration counts and runtimes for QVM problems (a dash and an asterisked runtime of 4000.00 indicate that the method did not terminate within the time limit).
5.2. Truncated robust regression. This subsection presents computational results for the robust regression problem in [26].
2See the examples in ./examples/minmax/ from the GitHub repositoryhttps://github.com/wwkong/nc_opt/.
It is worth mentioning that [26] also presents a min-max algorithm for obtaining a stationary point as in (5.1). However, its iteration complexity, which is O(ρ⁻⁶) when ρ = ρ̂_x = ρ̂_y, is significantly worse than that of the other algorithms considered in this section and, hence, we choose not to include this algorithm in our benchmarks.
We now describe the problem. Given a dimension pair (n, k) ∈ N², a set of n data points {(a_j, b_j)}_{j=1}^n ⊂ R^k × {1, −1}, and a parameter α > 0, the problem of interest is the truncated robust regression (TRR) problem

min_{x∈R^k} max_{y∈R^n} { Σ_{j=1}^n y_j (φ_α ∘ ℓ_j)(x) : y ∈ Δ^n },

where Δ^n is as in Subsection 5.1 with p = n and where, for every (α, t, x) ∈ R₊₊ × R₊₊ × R^k,

φ_α(t) := α log(1 + t/α),    ℓ_j(x) := log(1 + e^{−b_j⟨a_j, x⟩}).
We now describe the experiment parameters for the instances considered. First, α is set to 10, and the data points {(a_j, b_j)} are taken from different datasets in the LIBSVM library³, on which each problem instance is based (see the "data name" column in the table below, which corresponds to a particular LIBSVM dataset). Second, the initial starting point is z₀ = 0. Third, with respect to the termination criterion, the inputs are, for every (x, y) ∈ R^k × R^n, Φ(x, y) = Σ_{j=1}^n y_j (φ_α ∘ ℓ_j)(x), h(x) = 0, ρ̂_x = 10⁻⁵, ρ̂_y = 10⁻³, and Y = Δ^n.

We now present the results in Table 5.2.
data name       p_ξ(x̄)     Iteration count                    Runtime (seconds)
                           R-AIPP-S   AG-S     PGSF           R-AIPP-S   AG-S      PGSF
heart           6.70E-01   425        1747     6409           6.37       15.54     32.76
diabetes        6.70E-01   852        1642     3718           8.61       24.12     52.77
ionosphere      6.70E-01   1197       8328     54481          8.26       63.82     320.72
sonar           6.70E-01   45350      96209    -              461.52     580.37    4000.00*
breast-cancer   1.11E-03   46097      -        -              476.59     4000.00*  4000.00*

Table 5.2: Iteration counts and runtimes for TRR problems.
5.3. Power control in the presence of a jammer. This subsection presents computational results for the power control problem in [18].
It is worth mentioning that [18] also presents a min-max algorithm for obtaining stationary points of the aforementioned problem. However, its termination criterion and notion of stationarity are significantly different from what is being considered in this paper and, hence, we choose not to include the algorithm of [18] in our benchmarks.
We now describe the problem. Given a dimension pair (N, K) ∈ N², a pair of parameters (σ, R) ∈ R²₊₊, a 3D tensor A ∈ R₊^{K×K×N}, and a matrix B ∈ R₊^{K×N}, the problem of interest is the power control (PC) problem

min_{X∈R^{K×N}} max_{y∈R^N} { Σ_{k=1}^K Σ_{n=1}^N f_{k,n}(X, y) : 0 ≤ X ≤ R, 0 ≤ y ≤ N/2 },
3See https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html.
where, for every (X, y) ∈ R^{K×N} × R^N,

f_{k,n}(X, y) := −log( 1 + A_{k,k,n}X_{k,n} / ( σ² + B_{k,n}y_n + Σ_{j=1, j≠k}^K A_{j,k,n}X_{j,n} ) ).
We now describe the experiment parameters for the instances considered. First, the scalar parameters are set to (σ, R) = (1/√2, K^{1/K}), and the quantities A and B are set to be the squared moduli of the entries of two Gaussian-sampled complex-valued arrays H ∈ C^{K×K×N} and P ∈ C^{K×N}. More precisely, the entries of H and P are sampled from the standard complex Gaussian distribution CN(0, 1), with A_{j,k,n} = |H_{j,k,n}|² and B_{k,n} = |P_{k,n}|² for every (j, k, n). Second, the initial starting point is z₀ = 0. Third, with respect to the termination criterion, the inputs are, for every (X, y) ∈ R^{K×N} × R^N, Φ(X, y) = Σ_{k=1}^K Σ_{n=1}^N f_{k,n}(X, y), h(X) = δ_{Q_R^{K×N}}(X), ρ̂_x = 10⁻¹, ρ̂_y = 10⁻¹, and Y = Q_{N/2}^{N×1}, where Q_T^{U×V} := { z ∈ R^{U×V} : 0 ≤ z ≤ T } for every T > 0 and (U, V) ∈ N². Fourth, each problem instance considered is based on a specific dimension pair (N, K).
We now present the results in Table 5.3.
N    K    p_ξ(x̄)      Iteration count                  Runtime (seconds)
                       R-AIPP-S   AG-S     PGSF         R-AIPP-S   AG-S      PGSF
5    5    -3.64E+00    37         322832   -            0.96       2371.27   4000.00*
10   10   -2.82E+00    54         33399    -            0.75       293.60    4000.00*
25   25   -4.52E+00    183        -        -            9.44       4000.00*  4000.00*
50   50   -4.58E+00    566        -        -            40.89      4000.00*  4000.00*

Table 5.3: Iteration counts and runtimes for PC problems.
6. Concluding Remarks. This section makes some concluding remarks.

We first make a final remark about the AIPP-S smoothing scheme. Recall that the main idea of AIPP-S is to call the AIPP method to obtain a pair satisfying (3.13), or equivalently⁴,

(6.1)    inf_{‖d‖≤1} (p̂_ξ)′(x̄; d) ≥ −ρ.

Moreover, using Proposition 8 with (ρ̂_x, ρ̂_y) = (ρ, D_y/ξ), it is straightforward to see that the number of oracle calls, in terms of (ξ, ρ), is O(ρ⁻²ξ^{1/2}), which reduces to O(ρ⁻²·⁵) if ξ is chosen so as to satisfy ξ = Θ(ρ⁻¹). The latter complexity bound improves upon the one obtained for an algorithm in [24], which obtains a point x̄ satisfying (6.1) with ξ = Θ(ρ⁻¹) in O(ρ⁻³) oracle calls.
We now discuss some possible extensions of this paper. First, it is worth investigating whether complexity results for the AIPP-S method can be derived in the case where Y is unbounded. Second, it is worth investigating whether the notions of stationary points in Subsection 2.1 are related to first-order stationary points⁵ of the related mathematical program with equilibrium constraints:

min_{(x,y)∈X×Y} { Φ(x, y) + h(x) : 0 ∈ ∂[−Φ(x, ·)](y) }.
⁴See Lemma 15 with f = p_ξ.
⁵See, for example, [19, Chapter 3].
Finally, it remains to be seen whether a similar prox-type smoothing scheme can be developed for the case in which assumption (A2) is relaxed to the condition that there exists m_y > 0 such that −Φ(x, ·) is m_y-weakly convex for every x ∈ X.
Appendix A. This appendix contains a description of, and a result about, an ACG variant used in the analysis of [15].

Part of the input of the ACG variant, which is described below, consists of a pair of functions (ψ_s, ψ_n) satisfying:

(i) ψ_n ∈ Conv(Z) is μ-strongly convex for some μ ≥ 0;
(ii) ψ_s is a convex differentiable function on dom ψ_n whose gradient is L-Lipschitz continuous for some L > 0.
ACG method

Input: a scalar pair (μ, L) ∈ R²₊₊, a function pair (ψ_n, ψ_s), and an initial point z₀ ∈ dom ψ_n;

(0) set y₀ = z₀, A₀ = 0, Γ₀ ≡ 0, and j = 0;
(1) compute

A_{j+1} = A_j + [ μA_j + 1 + √((μA_j + 1)² + 4L(μA_j + 1)A_j) ] / (2L),
z̃_j = (A_j/A_{j+1}) z_j + [(A_{j+1} − A_j)/A_{j+1}] y_j,
Γ_{j+1}(y) = (A_j/A_{j+1}) Γ_j(y) + [(A_{j+1} − A_j)/A_{j+1}] [ ψ_s(z̃_j) + ⟨∇ψ_s(z̃_j), y − z̃_j⟩ ],    ∀y,
y_{j+1} = argmin_y { Γ_{j+1}(y) + ψ_n(y) + ‖y − y₀‖²/(2A_{j+1}) },
z_{j+1} = (A_j/A_{j+1}) z_j + [(A_{j+1} − A_j)/A_{j+1}] y_{j+1};

(2) compute

u_{j+1} = (y₀ − y_{j+1})/A_{j+1},
ε_{j+1} = ψ(z_{j+1}) − Γ_{j+1}(y_{j+1}) − ψ_n(y_{j+1}) − ⟨u_{j+1}, z_{j+1} − y_{j+1}⟩,

where ψ := ψ_s + ψ_n;
(3) increment j = j + 1 and go to (1).
We now discuss some implementation details of the ACG method. First, a single iteration requires the evaluation of two distinct types of oracles, namely: (i) the evaluation of the functions ψ_n, ψ_s, and ∇ψ_s at any point in dom ψ_n; and (ii) the computation of the exact solution of subproblems of the form min_y { ψ_n(y) + ‖y − a‖²/(2α) } for any a ∈ Z and α > 0. In particular, the latter is needed in the computation of y_{j+1}. Second, because Γ_{j+1} is affine, an efficient way to store it is in terms of a normal vector and a scalar intercept that are updated recursively at every iteration. Indeed, if Γ_j = α_j + ⟨·, β_j⟩ for some (α_j, β_j) ∈ R × Z, then step 1 of the ACG method implies that Γ_{j+1} = α_{j+1} + ⟨·, β_{j+1}⟩, where

α_{j+1} := (A_j/A_{j+1}) α_j + [(A_{j+1} − A_j)/A_{j+1}] [ ψ_s(z̃_j) − ⟨∇ψ_s(z̃_j), z̃_j⟩ ],
β_{j+1} := (A_j/A_{j+1}) β_j + [(A_{j+1} − A_j)/A_{j+1}] ∇ψ_s(z̃_j).
The following result, whose proof is given in [15, Lemma 9], is used to establish the iteration complexity of obtaining the triple (z, u, ε) in step 1 of the AIPP method of Subsection 3.1.

Lemma 12. Let {(A_j, z_j, u_j, ε_j)} be the sequence generated by the ACG method. Then, for any σ > 0, the ACG method obtains a triple (z, u, ε) satisfying

(A.1)    u ∈ ∂_ε(ψ_s + ψ_n)(z),    ‖u‖² + 2ε ≤ σ‖z₀ − z + u‖²

in at most ⌈2√(2L(1 + √σ)/√σ)⌉ iterations.
Appendix B. This appendix contains results about functions that can be described as the maximum of a family of differentiable functions.

The technical lemma below, which is a special case of [9, Theorem 10.2.1], presents a key property of max functions.

Lemma 13. Assume that the triple (X, Y, Ψ) satisfies (A0)–(A1) in Subsection 2.1 with Φ = Ψ. Moreover, define

(B.1)    q(x) := sup_{y∈Y} Ψ(x, y),    Y(x) := { y ∈ Y : Ψ(x, y) = q(x) },    ∀x ∈ X.

Then, for every (x, d) ∈ X × X, it holds that

q′(x; d) = max_{y∈Y(x)} ⟨∇_xΨ(x, y), d⟩.

Moreover, if Y(x) reduces to a singleton, say Y(x) = {y(x)}, then q is differentiable at x and ∇q(x) = ∇_xΨ(x, y(x)).
Under assumptions (A0)–(A3) in Subsection 2.1, the next result establishes the Lipschitz continuity of the gradient of q. It is worth mentioning that it generalizes related results in [2, Theorem 5.26] (which covers the case where Ψ is bilinear) and [20, Proposition 4.1] (which makes the stronger assumption that Ψ(·, y) is convex for every y ∈ Y).

Proposition 14. Assume that the triple (X, Y, Ψ) satisfies (A0)–(A3) in Subsection 2.1 with Φ = Ψ and that, for some μ > 0, the function Ψ(x, ·) is μ-strongly concave on Y for every x ∈ X, and define

(B.2)    Q_μ := L_y/μ + √((L_x + m)/μ),    L_μ := L_yQ_μ + L_x,    y(x) := argmax_{y∈Y} Ψ(x, y)
for every x ∈ X. Then, the following properties hold:
(a) y(·) is Q_μ-Lipschitz continuous on X;
(b) ∇q(·) is L_μ-Lipschitz continuous on X, where q is as in (B.1).

Proof. (a) Let x, x̃ ∈ X be given and denote (y, ỹ) = (y(x), y(x̃)). Define α(u) := Ψ(u, y) − Ψ(u, ỹ) for every u ∈ X, and observe that the optimality conditions of y and ỹ imply that α(x) ≥ μ‖y − ỹ‖²/2 and −α(x̃) ≥ μ‖y − ỹ‖²/2. Using the previous inequalities, (2.1), (2.2), (2.3), and the Cauchy–Schwarz inequality, we conclude that

μ‖y − ỹ‖² ≤ α(x) − α(x̃) ≤ ⟨∇_xΨ(x, y) − ∇_xΨ(x, ỹ), x − x̃⟩ + (L_x + m)/2 · ‖x − x̃‖²
    ≤ ‖∇_xΨ(x, y) − ∇_xΨ(x, ỹ)‖ · ‖x − x̃‖ + (L_x + m)/2 · ‖x − x̃‖²
    ≤ L_y‖y − ỹ‖ · ‖x − x̃‖ + (L_x + m)/2 · ‖x − x̃‖².

Considering the above as a quadratic inequality in ‖y − ỹ‖ yields the bound

‖y − ỹ‖ ≤ (1/(2μ)) [ L_y‖x − x̃‖ + √( L_y²‖x − x̃‖² + 4μ(L_x + m)‖x − x̃‖² ) ]
    ≤ [ L_y/μ + √((L_x + m)/μ) ] ‖x − x̃‖ = Q_μ‖x − x̃‖,

which is the conclusion of (a).

(b) Let x, x̃ ∈ X be given and denote (y, ỹ) = (y(x), y(x̃)). Using part (a), Lemma 13, and (2.2), we have that

‖∇q(x) − ∇q(x̃)‖ = ‖∇_xΨ(x, y) − ∇_xΨ(x̃, ỹ)‖
    ≤ ‖∇_xΨ(x, y) − ∇_xΨ(x, ỹ)‖ + ‖∇_xΨ(x, ỹ) − ∇_xΨ(x̃, ỹ)‖
    ≤ L_y‖y − ỹ‖ + L_x‖x − x̃‖ ≤ (L_yQ_μ + L_x)‖x − x̃‖ = L_μ‖x − x̃‖,

which is the conclusion of (b).
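A quick numerical sanity check of part (a) on a toy strongly concave instance (illustrative only, and a degenerate case of the assumptions since Ψ(·, y) is linear here): for Ψ(x, y) = xy − (μ/2)y² on Y = [−10, 10], the maximizer is y(x) = clip(x/μ, −10, 10), whose Lipschitz constant 1/μ is consistent with the bound Q_μ = L_y/μ + √((L_x + m)/μ), since the cross-Lipschitz constant is L_y = 1 and L_x = m = 0:

```python
# Toy check of Proposition 14(a): for Psi(x, y) = x*y - (mu/2)*y^2 with
# Y = [-10, 10], the maximizer is y(x) = clip(x/mu, -10, 10).  Here L_y = 1
# and Psi(., y) is linear in x (L_x = m = 0), so the bound reads
#   Q_mu = L_y/mu + sqrt((L_x + m)/mu) = 1/mu.  The instance is illustrative.

def y_max(x, mu):
    return max(-10.0, min(10.0, x / mu))

if __name__ == "__main__":
    mu = 0.5
    Q = 1.0 / mu
    pts = [i * 0.1 for i in range(-80, 81)]
    worst = max(abs(y_max(a, mu) - y_max(b, mu)) / abs(a - b)
                for a in pts for b in pts if a != b)
    assert worst <= Q + 1e-9      # empirical Lipschitz ratio never exceeds Q
    print(worst)
```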
Appendix C. The main goal of this appendix is to prove Propositions 17 and 18, which are used in the proofs of Propositions 1, 2, and 3 given in Appendix D.

The following well-known result presents an important property of the directional derivative of a composite function f + h.

Lemma 15. Let h : X → (−∞, ∞] be a proper convex function and let f be a differentiable function on dom h. Then, for any x ∈ dom h, it holds that

(C.1)    inf_{‖d‖≤1} (f + h)′(x; d) = inf_{‖d‖≤1} [ ⟨∇f(x), d⟩ + σ_{∂h(x)}(d) ] = − inf_{u∈∇f(x)+∂h(x)} ‖u‖.

The proof of Lemma 15 can be found, for example, in [28, Exercise 8.8(c)]. An alternative and more direct proof is given in [14, Lemma F.1.2]. It is also worth mentioning that if we further assumed that dom h = X, then the above result would follow from [3, Lemma 5.1].
The next technical lemma, which can be found in [29, Corollary 3.3], presents a well-known min-max identity.

Lemma 16. Let a convex set D ⊆ X and a compact convex set Y ⊆ Y be given. Moreover, let ϕ : D × Y → R be a function such that ϕ(·, y) is convex and lower semicontinuous for every y ∈ Y, and ϕ(d, ·) is concave and upper semicontinuous for every d ∈ D. Then,

inf_{d∈D} sup_{y∈Y} ϕ(d, y) = sup_{y∈Y} inf_{d∈D} ϕ(d, y).
The next result establishes an identity similar to Lemma 15, but for the case where f is a max function.

Proposition 17. Assume that the quadruple (Ψ, h, X, Y) satisfies assumptions (A0)–(A3) of Subsection 2.1 with Φ = Ψ. Moreover, suppose that Ψ(·, y) is convex for every y ∈ Y, and let q and Y(·) be as in Lemma 13. Then, for every x ∈ X, it holds that

(C.2)    inf_{‖d‖≤1} (q + h)′(x; d) = − inf_{u∈Q(x)} ‖u‖,

where Q(x) := ∂h(x) + { ∇_xΨ(x, y) : y ∈ Y(x) }. Moreover, if ∂h(x) is nonempty, then the infimum on the right-hand side of (C.2) is achieved.

Proof. Let x ∈ X and define
(C.3)    ϕ(d, y) := (Ψ_y + h)′(x; d),    ∀(d, x, y) ∈ X × Ω × Y,

where Ψ_y := Ψ(·, y). We claim that ϕ in (C.3) satisfies the assumptions on ϕ in Lemma 16 with Y = Y(x) and D given by

D := { d ∈ X : ‖d‖ ≤ 1, d ∈ F_X(x) },
where FX(x) := t(x â x) : x â X, t â„ 0 is the set of feasible directions at x.Before showing this claim, we use it to show that (C.2) holds. First observe that (A1)and Lemma 13 imply that qâČ(x; d) = supyâY ΚâČy(x; d) for every d â X . Using thenLemma 16 with Y = Y (x), Lemma 15 with (f, x) = (Κy, x) for every y â Y (x), andthe previous observation, we have that
$\inf_{\|d\|\le 1} (q+h)'(x;d) = \inf_{d \in D} (q+h)'(x;d) = \inf_{d \in D} \sup_{y \in Y(x)} (\Psi_y + h)'(x;d)$
$= \inf_{d \in D} \sup_{y \in Y(x)} \phi(d,y) = \sup_{y \in Y(x)} \inf_{d \in D} \phi(d,y) = \sup_{y \in Y(x)} \inf_{\|d\|\le 1} (\Psi_y + h)'(x;d)$
$= \sup_{y \in Y(x)} \left[ -\inf_{u \in \nabla \Psi_y(x) + \partial h(x)} \|u\| \right] = -\inf_{u \in Q(x)} \|u\|.$ (C.4)
Let us now assume that $\partial h(x)$ is nonempty, and hence that $Q(x)$ is nonempty as well. Note that continuity of the function $\nabla_x \Psi(x, \cdot)$ from assumption (A1) and the compactness of $Y(x)$ imply that $Q(x)$ is closed. Moreover, since $\|u\| \ge 0$, any sequence $\{u_k\}_{k \ge 1}$ with $\lim_{k\to\infty} \|u_k\| = \inf_{u \in Q(x)} \|u\|$ is bounded. Combining the previous two remarks with the Bolzano–Weierstrass theorem, we conclude that $\inf_{u \in Q(x)} \|u\| = \min_{u \in Q(x)} \|u\|$, and hence (C.2) holds.
To complete the proof, we now justify the above claim on $\phi$. First, for any given $y \in Y(x)$, it follows from [27, Theorem 23.1] with $f(\cdot) = \Psi_y(\cdot)$ and the definitions of $q$ and $Y(x)$ that

(C.5) $\phi(d, y) = \Psi_y'(x; d) = \inf_{t>0} \frac{\Psi_y(x + td) - q(x)}{t} \quad \forall d \in \mathcal{X}.$

Since assumption (A2) implies that $\Psi(x, \cdot)$ is upper semicontinuous and concave on $Y$, it follows from (C.5), [27, Theorem 5.5], and [27, Theorem 9.4] that $\phi(d, \cdot)$ is upper semicontinuous and concave on $Y$ for every $d \in \mathcal{X}$. On the other hand, since $\Psi(\cdot, y)$ is assumed to be lower semicontinuous and convex on $X$ for every $y \in Y$, it follows from (C.5), the fact that $x \in \operatorname{int} \Omega$, and [27, Theorem 23.4], that $\phi(\cdot, y)$ is lower semicontinuous and convex on $\mathcal{X}$, and hence on $D \subseteq \mathcal{X}$, for every $y \in Y(x)$.
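Identity (C.2) can also be illustrated on a toy max function (an illustrative instance, not taken from the paper): write $q(x) = \max\{x_1, x_2\}$ as $\max_{y \in [0,1]} [\, y x_1 + (1-y) x_2 \,]$ with $h \equiv 0$. At $\bar{x} = (0,0)$ both pieces are active, so the optimal solution set is $Y(\bar{x}) = [0,1]$ and the corresponding $Q(\bar{x})$ is the segment $\{(y, 1-y) : y \in [0,1]\}$.

```python
import numpy as np

# q(x) = max(x1, x2) = max over y in [0,1] of y*x1 + (1-y)*x2, with h = 0.
# At xbar = (0, 0): Y(xbar) = [0, 1] and Q(xbar) = { (y, 1-y) : y in [0, 1] }.

# Left-hand side of (C.2): inf over the unit ball of q'(xbar; d) = max(d1, d2);
# the function is 1-homogeneous in d, so the infimum is attained on the sphere.
theta = np.linspace(0.0, 2.0 * np.pi, 400_001)
lhs = np.min(np.maximum(np.cos(theta), np.sin(theta)))

# Right-hand side of (C.2): -min ||u|| over the segment Q(xbar).
y = np.linspace(0.0, 1.0, 20_001)
rhs = -np.min(np.hypot(y, 1.0 - y))

print(lhs, rhs)  # both approximately -1/sqrt(2) = -0.7071...
```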
The last technical result is a specialization of the one given in [12, Theorem 4.2.1].

Proposition 18. Let a proper closed function $\phi : \mathcal{X} \to (-\infty,\infty]$ be given and assume that $\phi + \|\cdot\|^2/(2\lambda)$ is $\mu$-strongly convex for some scalars $\mu, \lambda > 0$. If a quadruple $(x^-, \bar{x}, \bar{u}, \bar{\varepsilon}) \in \mathcal{X} \times \operatorname{dom} \phi \times \mathcal{X} \times \mathbb{R}_+$ together with $\lambda$ satisfies the inclusion $\bar{u} \in \partial_{\bar{\varepsilon}} \left( \phi + \|\cdot - x^-\|^2/[2\lambda] \right)(\bar{x})$, then the point $\hat{x} \in \operatorname{dom} \phi$ given by

(C.6) $\hat{x} := \operatorname{argmin}_{x'} \left\{ \phi_\lambda(x') := \phi(x') + \frac{1}{2\lambda} \|x' - x^-\|^2 - \langle \bar{u}, x' \rangle \right\}$

satisfies

(C.7) $\inf_{\|d\|\le 1} \phi'(\hat{x}; d) \;\ge\; -\frac{1}{\lambda} \|x^- - \bar{x} + \lambda \bar{u}\| - \sqrt{\frac{2\bar{\varepsilon}}{\lambda^2 \mu}}, \qquad \|\hat{x} - \bar{x}\| \le \sqrt{\frac{2\bar{\varepsilon}}{\mu}}.$
Proof. We first observe that the assumed inclusion implies that $\phi_\lambda(x') \ge \phi_\lambda(\bar{x}) - \bar{\varepsilon}$ for every $x' \in \mathcal{X}$. Using the previous inequality at $x' = \hat{x}$, the optimality of $\hat{x}$, and the $\mu$-strong convexity of $\phi_\lambda$, we have that $\mu \|\hat{x} - \bar{x}\|^2/2 \le \phi_\lambda(\bar{x}) - \phi_\lambda(\hat{x}) \le \bar{\varepsilon}$, from which we conclude that $\|\hat{x} - \bar{x}\| \le \sqrt{2\bar{\varepsilon}/\mu}$, i.e., the second inequality in (C.7).

To show the other inequality, let $n_\lambda := x^- - \bar{x} + \lambda \bar{u}$. Using the definition of $\phi_\lambda$, the triangle inequality, and the previous bound on $\|\hat{x} - \bar{x}\|$, we obtain

(C.8) $0 \le \inf_{\|d\|\le 1} \phi_\lambda'(\hat{x}; d) = \inf_{\|d\|\le 1} \left[ \phi'(\hat{x}; d) - \frac{1}{\lambda} \langle d, n_\lambda \rangle - \frac{1}{\lambda} \langle d, \bar{x} - \hat{x} \rangle \right] \le \inf_{\|d\|\le 1} \phi'(\hat{x}; d) + \frac{\|n_\lambda\|}{\lambda} + \frac{\|\hat{x} - \bar{x}\|}{\lambda} \le \inf_{\|d\|\le 1} \phi'(\hat{x}; d) + \frac{\|n_\lambda\|}{\lambda} + \sqrt{\frac{2\bar{\varepsilon}}{\lambda^2 \mu}},$

which clearly implies the first inequality in (C.7).
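A one-dimensional instance makes Proposition 18 concrete. The sketch below uses illustrative values: $\phi(x) = x^2/2$, $\lambda = 1$, $x^- = 0$, so $\phi + \|\cdot\|^2/(2\lambda) = x^2$ is $\mu$-strongly convex with $\mu = 2$, and the chosen $(\bar{x}, \bar{u}, \bar{\varepsilon})$ makes the inclusion tight, so both inequalities of (C.7) hold here with equality.

```python
import numpy as np

# 1-D instance of Proposition 18: phi(x) = x^2/2, lam = 1, x_minus = 0, so
# g(x) = phi(x) + (x - x_minus)^2/(2*lam) = x^2 is mu-strongly convex, mu = 2.
lam, mu = 1.0, 2.0
x_minus, x_bar, u_bar, eps_bar = 0.0, 0.5, 0.8, 0.01

# Inclusion u_bar in the eps_bar-subdifferential of g at x_bar:
# g(x) >= g(x_bar) + u_bar*(x - x_bar) - eps_bar for all x.
xs = np.linspace(-5.0, 5.0, 100_001)
assert np.all(xs**2 >= x_bar**2 + u_bar * (xs - x_bar) - eps_bar - 1e-12)

# x_hat from (C.6): argmin of x'^2 - u_bar*x', i.e. x_hat = u_bar/2 = 0.4.
x_hat = u_bar / 2.0

# First inequality in (C.7): inf over |d| <= 1 of phi'(x_hat; d) equals
# -|x_hat| here, since phi'(x) = x.
lhs_dir = -abs(x_hat)
rhs_dir = -abs(x_minus - x_bar + lam * u_bar) / lam - np.sqrt(2 * eps_bar / (lam**2 * mu))
print(lhs_dir, rhs_dir)  # both approximately -0.4 (equality)

# Second inequality in (C.7).
print(abs(x_hat - x_bar), np.sqrt(2 * eps_bar / mu))  # both approximately 0.1
```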
Appendix D. This appendix presents the proofs of Propositions 1, 2, and 3.

The first technical result shows that an approximate primal-dual stationary point is equivalent to an approximate directional stationary point of a perturbed version of problem (1.1).
Lemma 19. Suppose the quadruple $(\Phi, h, X, Y)$ satisfies assumptions (A0)–(A3) of Subsection 2.1 and let $(\bar{x}, u, v) \in X \times \mathcal{X} \times \mathcal{Y}$ be given. Then, there exists $\bar{y} \in Y$ such that the quadruple $(u, v, \bar{x}, \bar{y})$ satisfies the inclusion in (1.4) if and only if

(D.1) $\inf_{\|d\|\le 1} (p_{u,v} + h)'(\bar{x}; d) \ge 0,$

where $p_{u,v}(x) := \max_{y \in Y} \left[ \Phi(x, y) + \langle v, y \rangle - \langle u, x \rangle + m\|x - \bar{x}\|^2 \right]$ for every $x \in \Omega$.

Proof. Let $(\bar{x}, u, v) \in X \times \mathcal{X} \times \mathcal{Y}$ be given, define

(D.2) $\Psi(x, y) := \Phi(x, y) + \langle v, y \rangle - \langle u, x \rangle + m\|x - \bar{x}\|^2 \quad \forall (x, y) \in \Omega \times Y,$

and let $q$ and $Y(\cdot)$ be as in Lemma 13. It is easy to see that $q = p_{u,v}$, that the function $\Psi$ satisfies the assumptions on $\Psi$ in Proposition 17, and that $\bar{x}$ satisfies (D.1) if and only if $\inf_{\|d\|\le 1} (q + h)'(\bar{x}; d) \ge 0$. The desired conclusion follows from Proposition 17, the previous observation, and the fact that $\bar{y} \in Y(\bar{x})$ if and only if $v \in \partial[-\Phi(\bar{x}, \cdot)](\bar{y})$.
We are now ready to give the proof of Proposition 1.

Proof of Proposition 1. Suppose $(u, v, \bar{x}, \bar{y})$ is a $(\rho_x, \rho_y)$-primal-dual stationary point of (1.1). Moreover, let $\Psi$, $q$, and $D_y$ be as in (D.2), (B.1), and (2.8), respectively, and define

$\tilde{q}(x) := q(x) + h(x) \quad \forall x \in X.$

Using Lemma 19, we first observe that $\inf_{\|d\|\le 1} \tilde{q}'(\bar{x}; d) \ge 0$. Since $\tilde{q}$ is convex from assumption (A3), it follows from the previous bound and Lemma 15 with $(f, h) = (0, \tilde{q})$ that $\min_{u' \in \partial \tilde{q}(\bar{x})} \|u'\| \le 0$, and hence $0 \in \partial \tilde{q}(\bar{x})$. Moreover, using the Cauchy-Schwarz inequality, the second inequality in (1.4), the previous inclusion, and the definitions of $\tilde{q}$ and $\Psi$, it follows that for every $x \in X$,

$\hat{p}(x) + D_y \rho_y - \langle u, x \rangle + m\|x - \bar{x}\|^2 \;\ge\; \tilde{q}(x) \;\ge\; \tilde{q}(\bar{x}) \;\ge\; \hat{p}(\bar{x}) - D_y \rho_y - \langle u, \bar{x} \rangle,$

and hence that $u \in \partial_\varepsilon (\hat{p} + m\|\cdot - \bar{x}\|^2)(\bar{x})$ where $\varepsilon = 2 D_y \rho_y$. Using now the first inequality in (1.4), Proposition 18 with $(\phi, \bar{x}, x^-, \bar{u}) = (\hat{p}, \bar{x}, \bar{x}, u)$ and also $(\bar{\varepsilon}, \lambda, \mu) = (D_y \rho_y, 1/(2m), m)$, we conclude that there exists $\hat{x}$ such that $\|\hat{x} - \bar{x}\| \le \sqrt{2 D_y \rho_y / m}$ and

$\inf_{\|d\|\le 1} \hat{p}'(\hat{x}; d) \;\ge\; -\|u\| - 2\sqrt{2 m D_y \rho_y} \;\ge\; -\rho_x - 2\sqrt{2 m D_y \rho_y}.$
We next give the proof of Proposition 2.

Proof of Proposition 2. (a) We first claim that $P_\lambda$ is $\alpha$-strongly convex, where $\alpha = 1/\lambda - m$. To see this, note that $\Phi(\cdot, y) + m\|\cdot\|^2/2$ is convex for every $y \in Y$ from assumption (A3). The claim now follows from assumption (A2), the fact that the supremum of a collection of convex functions is also convex, and the definition of $\hat{p}$ in (1.1).
Suppose the pair $(x, \delta)$ satisfies (1.5) and (2.10). If $\hat{x} = x_\lambda$ in (1.5), then clearly the second inequality in (1.5), the fact that $\lambda < 1/m$, and (2.10) imply the inequality in (2.9), and hence that $x$ is a $(\lambda, \varepsilon)$-prox stationary point. Suppose now that $\hat{x} \ne x_\lambda$. Using the convexity of $P_\lambda$, we first have that $P_\lambda'(\hat{x}; d) = \inf_{t>0} \left[ P_\lambda(\hat{x} + td) - P_\lambda(\hat{x}) \right]/t$ for every $d \in \mathcal{X}$. Denoting $n_\lambda := (x_\lambda - \hat{x})/\|x_\lambda - \hat{x}\|$ and using both inequalities in (1.5) and the previous identity, we then have that

$\frac{P_\lambda(x_\lambda) - P_\lambda(\hat{x})}{\|x_\lambda - \hat{x}\|} \;\ge\; \hat{p}'(\hat{x}; n_\lambda) + \left\langle \frac{n_\lambda}{\lambda}, \hat{x} - x \right\rangle \;\ge\; -\delta - \frac{\|\hat{x} - x\|}{\lambda} \;\ge\; -\delta \left( \frac{1+\lambda}{\lambda} \right).$
Using the optimality of $x_\lambda$, the $\alpha$-strong convexity of $P_\lambda$ (see our claim on $P_\lambda$ in the first paragraph), and the above bound, we conclude that

$\frac{\alpha}{2} \|\hat{x} - x_\lambda\|^2 \;\le\; P_\lambda(\hat{x}) - P_\lambda(x_\lambda) \;\le\; \delta \left( \frac{1+\lambda}{\lambda} \right) \|\hat{x} - x_\lambda\|.$

Thus, $\|\hat{x} - x_\lambda\| \le \frac{2\delta}{\alpha} \left( \frac{1+\lambda}{\lambda} \right)$. Using the previous bound, the second inequality in (1.5), and (2.10) yields

$\|x - x_\lambda\| \;\le\; \|x - \hat{x}\| + \|\hat{x} - x_\lambda\| \;\le\; \left( 1 + \frac{2}{\alpha} \left[ \frac{1+\lambda}{\lambda} \right] \right) \delta \;\le\; \lambda \varepsilon,$

which implies (2.9), and hence that $x$ is a $(\lambda, \varepsilon)$-prox stationary point.
which implies (2.9), and hence, that x is a (λ, Δ)-prox stationary point.(b) Suppose that the point x is a (λ, Δ)-prox stationary point with Δ †Ύ ·
min1, 1/λ. Then the optimality of xλ, the fact that Pλ is convex (see the be-ginning of part (a)), the inequality in (2.9), and the Cauchy-Schwarz inequality implythat
0 †infâdââ€1
[pâČ(xλ; d) + 1
λăd, xλ â xă
]†infâdââ€1
pâČ(xλ; d) + Δ †infâdââ€1
pâČ(xλ; d) + ÎŽ,
which, together with the fact that λΔ †Ύ, imply that x satisfies (1.5) with x = xλ.Finally, we give the proof of Proposition 3.Proof of Proposition 3. This follows by using Lemma 15 with (f, h) = (Ί(·, y), h)
and (f, h) = (0,âΊ(x, ·)).
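The two notions related by Proposition 2 can be illustrated on a toy instance (illustrative values, not taken from the paper), taking $\hat{p} = |\cdot|$ and $\lambda = 1/2$, and assuming (2.9) takes the form $\|x - x_\lambda\| \le \lambda \varepsilon$ as in the proof of part (a): the prox point is given by soft-thresholding, and optimality of $x_\lambda$ for the prox subproblem yields the approximate directional stationarity used in part (b).

```python
import numpy as np

# Toy instance for Proposition 2: p_hat(x) = |x| with lam = 0.5; the prox point
# x_lam = argmin over x' of |x'| + (x' - x)^2/(2*lam) is soft-thresholding of x.
lam, x = 0.5, 0.6
x_lam = np.sign(x) * max(abs(x) - lam, 0.0)  # = 0.1

# (lam, eps)-prox stationarity of x in the sense ||x - x_lam|| <= lam*eps.
eps = 1.0
assert abs(x - x_lam) <= lam * eps + 1e-12

# Optimality of x_lam makes the perturbed directional derivative nonnegative:
# inf over |d| <= 1 of [ p_hat'(x_lam; d) + (1/lam)*d*(x_lam - x) ] >= 0.
d = np.linspace(-1.0, 1.0, 10_001)
perturbed = np.sign(x_lam) * d + d * (x_lam - x) / lam
print(perturbed.min())  # approximately 0 (the two terms cancel for every d here)

# Hence inf over |d| <= 1 of p_hat'(x_lam; d) >= -eps, i.e. x_lam witnesses
# eps-directional stationarity, as in part (b).
print((np.sign(x_lam) * d).min())  # approximately -1.0, which is >= -eps
```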
REFERENCES
[1] K. Arrow, L. Hurwicz, and H. Uzawa, Studies in linear and non-linear programming, Cambridge Univ. Press, 1958.
[2] A. Beck, First-order methods in optimization, SIAM, 2017.
[3] J. V. Burke and J. J. Moré, On the identification of active constraints, SIAM Journal on Numerical Analysis, 25 (1988), pp. 1197–1211.
[4] Y. Carmon, J. Duchi, O. Hinder, and A. Sidford, Accelerated methods for non-convex optimization, Available on arXiv:1611.00756, (2017).
[5] D. Davis and D. Drusvyatskiy, Stochastic model-based minimization of weakly convex functions, SIAM Journal on Optimization, 29 (2019), pp. 207–239.
[6] D. Davis, D. Drusvyatskiy, K. J. MacPhee, and C. Paquette, Subgradient methods for sharp weakly convex functions, Journal of Optimization Theory and Applications, 179 (2018), pp. 962–982.
[7] D. Drusvyatskiy and C. Paquette, Efficiency of minimizing compositions of convex functions and smooth maps, Mathematical Programming, 178 (2019), pp. 503–558.
[8] J. C. Duchi and F. Ruan, Stochastic methods for composite and weakly convex optimization problems, SIAM Journal on Optimization, 28 (2018), pp. 3229–3259.
[9] F. Facchinei and J.-S. Pang, Finite-dimensional variational inequalities and complementarity problems, Springer Science & Business Media, 2007.
[10] S. Ghadimi and G. Lan, Accelerated gradient methods for nonconvex nonlinear and stochastic programming, Math. Program., 156 (2016), pp. 59–99.
[11] Y. He and R. D. C. Monteiro, An accelerated HPE-type algorithm for a class of composite convex-concave saddle-point problems, SIAM J. Optim., 26 (2016), pp. 29–56.
[12] J. Hiriart-Urruty and C. Lemaréchal, Convex Analysis and Minimization Algorithms II, Springer, Berlin, 1993.
[13] O. Kolossoski and R. D. C. Monteiro, An accelerated non-euclidean hybrid proximal extragradient-type algorithm for convex-concave saddle-point problems, Optim. Methods Softw., 32 (2017), pp. 1244–1272.
[14] W. Kong, Accelerated inexact first-order methods for solving nonconvex composite optimization problems, arXiv preprint arXiv:2104.09685, (2021).
[15] W. Kong, J. G. Melo, and R. D. C. Monteiro, Complexity of a quadratic penalty accelerated inexact proximal point method for solving linearly constrained nonconvex composite programs, SIAM Journal on Optimization, 29 (2019), pp. 2566–2593.
[16] W. Kong, J. G. Melo, and R. D. C. Monteiro, An efficient adaptive accelerated inexact proximal point method for solving linearly constrained nonconvex composite problems, Computational Optimization and Applications, 76 (2020), pp. 305–346.
[17] T. Lin, C. Jin, and M. Jordan, Near-optimal algorithms for minimax optimization, arXiv preprint arXiv:2002.02417, (2020).
[18] S. Lu, I. Tsaknakis, and M. Hong, Block alternating optimization for non-convex min-max problems: Algorithms and applications in signal processing and communications, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (2019), pp. 4754–4758.
[19] Z.-Q. Luo, J.-S. Pang, and D. Ralph, Mathematical programs with equilibrium constraints, Cambridge University Press, 1996.
[20] R. D. C. Monteiro and B. F. Svaiter, Convergence rate of inexact proximal point methods with relative error criteria for convex optimization, submitted to SIAM Journal on Optimization, (2010).
[21] A. Nemirovski, Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems, SIAM J. Optim., 15 (2004), pp. 229–251.
[22] A. Nemirovski and D. Yudin, Cesari convergence of the gradient method of approximating saddle points of convex-concave functions, in Dokl. Akad. Nauk, vol. 239, Russian Academy of Sciences, 1978, pp. 1056–1059.
[23] Y. Nesterov, Smooth minimization of non-smooth functions, Math. Program., 103 (2005), pp. 127–152.
[24] M. Nouiehed, M. Sanjabi, T. Huang, J. Lee, and M. Razaviyayn, Solving a class of non-convex min-max games using iterative first order methods, in Advances in Neural Information Processing Systems, 2019, pp. 14905–14916.
[25] D. M. Ostrovskii, A. Lowy, and M. Razaviyayn, Efficient search of first-order nash equilibria in nonconvex-concave smooth min-max problems, arXiv preprint arXiv:2002.07919, (2020).
[26] H. Rafique, M. Liu, Q. Lin, and T. Yang, Non-convex min-max optimization: Provable algorithms and applications in machine learning, arXiv e-prints, (2018).
[27] R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, 1970.
[28] R. T. Rockafellar and R. J.-B. Wets, Variational analysis, vol. 317, Springer Science & Business Media, 2009.
[29] M. Sion, On general minimax theorems, Pacific Journal of Mathematics, 8 (1958), pp. 171–176.
[30] K. K. Thekumparampil, P. Jain, P. Netrapalli, and S. Oh, Efficient algorithms for smooth minimax optimization, in Advances in Neural Information Processing Systems, 2019, pp. 12680–12691.