
AN ACTIVE SET METHOD FOR MATHEMATICAL PROGRAMS WITH LINEAR COMPLEMENTARITY CONSTRAINTS

LIFENG CHEN∗ AND DONALD GOLDFARB†

April, 2007

Abstract. We study mathematical programs with linear complementarity constraints (MPLCC) for which the objective function is smooth. Current nonlinear programming (NLP) based algorithms, including regularization methods and decomposition methods, generate only weak (e.g., C- or M-) stationary points that may not be first-order solutions to the MPLCC. Piecewise sequential quadratic programming methods enjoy stronger convergence properties, but need to solve expensive subproblems. Here we propose a primal-dual active set projected Newton method for MPLCCs that maintains the feasibility of all iterates. At every iteration the method generates a working set for predicting the active set. The projected step direction on the subspace associated with this working set is determined by the current dual iterate, while the other elements of the step direction are computed by a Newton system. The major cost of a subproblem involves one matrix factorization and is comparable to that of NLP based algorithms. Our method has strong convergence properties. In particular, under the MPLCC-linear independence constraint qualification, any accumulation point of the generated iterates is a B-stationary solution (i.e., a first-order solution) to the MPLCC. The asymptotic rate of convergence is quadratic under additional MPLCC-second-order sufficient conditions and strict complementarity.

Key words. mathematical programs with linear complementarity constraints, equilibrium constraints, active set method, B-stationary point, global convergence, local convergence

1. Introduction. A mathematical program with equilibrium constraints (MPEC) is an optimization problem whose constraint set includes variational inequalities. Specifically, an MPEC can be expressed as a bilevel programming problem of the form

min_{z,y} F(z,y)  s.t.  (z,y) ∈ F,  y ∈ Q(z),   (1.1)

where F(·) is the "first-level" objective function in the first-level variables z ∈ ℝ^n and "second-level" variables y ∈ ℝ^m, and F ⊆ ℝ^{n+m} is the joint feasible region. The set Q(z) is the solution set of a variational inequality parameterized by z with feasible set F̄(z), i.e.,

y ∈ Q(z) ⇔ {y ∈ F̄(z) | G(z,y)⊤(w − y) ≥ 0, ∀w ∈ F̄(z)},   (1.2)

where G : ℝ^{n+m} → ℝ^m is a mapping in y and z and ⊤ denotes the transpose of a vector. In particular, if G(z,y) is the gradient of a differentiable function g(z,y) with respect to the vector y, i.e., G(z,y) = ∇_y g(z,y), then the solution set Q(z) reduces to the first-order solution set of the optimization problem

min_y g(z,y)  s.t.  y ∈ F̄(z).   (1.3)

MPECs have received increasing interest in recent years because of their numerous applications in engineering and economics; see [7, 24, 26]. For example, in game theory the static Stackelberg game can be formulated as an MPEC in which the variational inequality constraints arise from a competitive equilibrium [3].

Problem (1.1), being typically non-convex and non-differentiable, is intrinsically hard to solve. For example, even in the linear case, i.e., when F(·) and G(·) are linear mappings and F and F̄(z) for all z are polyhedra, problem (1.1) is strongly NP-hard. Indeed, in this case problem (1.1) can be formulated as a mixed integer linear program, and determining whether a solution is optimal to (1.1) is strongly NP-hard.

∗IEOR Department, Columbia University, New York, NY 10027 ([email protected]). Research supported by the Presidential Fellowship of Columbia University.
†IEOR Department, Columbia University, New York, NY 10027 ([email protected]). Research supported in part by NSF Grant DMS 01-04282, DOE Grant DE-FG02-92EQ25126 and ONR Grant N00014-03-0514.

An attractive way to handle the underlying non-differentiability in the lower-level constraints of (1.1) is to replace the constraint y ∈ Q(z) by its first-order necessary conditions. Suppose the feasible sets F and every F̄(z) are polyhedra, i.e., F = {(z,y) | D1 z + D2 y ≤ d} and F̄(z) = {w | D̄1 z + D̄2 w ≤ d̄} for some matrices D1, D2, D̄1 and D̄2 and some vectors d and d̄ of corresponding dimensions. The necessary conditions for y ∈ Q(z) give rise to a system of equations and inequalities in the primal variables y and the dual variables u:

G(z,y) + D̄2⊤u = 0,  D̄1 z + D̄2 y ≤ d̄,  u ≥ 0,  (D̄1 z + D̄2 y − d̄)⊤u = 0.   (1.4)

Note that when G(z,y) = ∇_y g(z,y), (1.4) are the Karush-Kuhn-Tucker (KKT) optimality conditions for problem (1.3). These conditions become sufficient conditions for y ∈ Q(z) if the mapping G(z,w) is monotone in w. Replacing the constraint y ∈ Q(z) in (1.1) by (1.4), problem (1.1) becomes a mathematical program with complementarity constraints (MPCC) with variables (z, y, u):

min_{z,y,u} F(z,y)
s.t.  D1 z + D2 y ≤ d,
      D̄1 z + D̄2 y ≤ d̄,  u ≥ 0,
      G(z,y) + D̄2⊤u = 0,
      (D̄1 z + D̄2 y − d̄)⊤u = 0.   (1.5)

Our primary interest in this paper is the development of a solution method for problem (1.5) that possesses strong convergence properties. In particular, we restrict ourselves to the situation in which G(z,y) is a linear mapping in both y and z. In that case problem (1.5) reduces to a so-called mathematical program with linear complementarity constraints (MPLCC). To simplify notation, we will focus on MPLCCs in the standard form:

min f(x)
s.t.  Ax = b,
      x0 ≥ 0,
      0 ≤ x1 ⊥ x2 ≥ 0.   (1.6)

The vector of variables x has three parts, i.e., x = (x0⊤, x1⊤, x2⊤)⊤ with x0 ∈ ℝ^n and x1, x2 ∈ ℝ^m. The complementarity operator ⊥ requires that for each pair of nonnegative complementarity variables x1,i, x2,i ≥ 0, either x1,i = 0 or x2,i = 0, for all i = 1,...,m. Throughout the paper we assume that the feasible region of problem (1.6) is nonempty, that the objective function f : ℝ^{n+2m} → ℝ is twice continuously differentiable, and that the matrix A ∈ ℝ^{p×(n+2m)} and the vector b ∈ ℝ^p.
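To fix ideas, the following minimal sketch (ours, not from the paper) shows one way an instance of the standard form (1.6) might be stored and a candidate point tested for feasibility; the class name MPLCC and the tolerance tol are illustrative assumptions.

```python
# Hypothetical container for an MPLCC instance in the standard form (1.6):
#   min f(x)  s.t.  Ax = b,  x0 >= 0,  0 <= x1 ⊥ x2 >= 0,
# with x = (x0; x1; x2), x0 in R^n and x1, x2 in R^m.
import numpy as np

class MPLCC:
    def __init__(self, f, grad_f, hess_f, A, b, n, m):
        self.f, self.grad_f, self.hess_f = f, grad_f, hess_f
        self.A, self.b, self.n, self.m = A, b, n, m

    def split(self, x):
        # Split x into its three blocks (x0; x1; x2).
        n, m = self.n, self.m
        return x[:n], x[n:n + m], x[n + m:]

    def is_feasible(self, x, tol=1e-10):
        x0, x1, x2 = self.split(x)
        return (np.linalg.norm(self.A @ x - self.b) <= tol
                and (x0 >= -tol).all() and (x1 >= -tol).all()
                and (x2 >= -tol).all()
                and (np.minimum(x1, x2) <= tol).all())  # componentwise x1 ⊥ x2
```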

From a nonlinear programming (NLP) point of view, MPCCs are among the most highly degenerate problems, since they violate at every feasible point the Mangasarian-Fromovitz constraint qualification (MFCQ), a key ingredient for stability. This lack of regularity may impose difficulties for existing nonlinear programming solvers. Interior-point algorithms do not directly apply since there is no strictly feasible solution to (1.6). Sequential quadratic programming (SQP) methods can get stuck due to infeasible QP subproblems. Solution methods that overcome these difficulties are discussed in the next subsection.

1.1. Related work. Research on solution methods for the continuous variant (1.6) of MPECs has followed two main approaches: regularization and decomposition.

Although any smooth reformulation of the complementarity constraints in (1.6) violates the MFCQ, slightly relaxing these reformulations resolves this difficulty and results in problems satisfying the MFCQ. State-of-the-art NLP solvers can then be applied. The basic idea of regularization methods is to compute solutions for a sequence of relaxed problems controlled by some parameter, obtaining a stationary solution to problem (1.6) when the parameter goes beyond a certain threshold or goes to the limit.

For instance, the regularization framework [31] introduces a positive parameter φ and relaxes the complementarity constraints to {x1, x2 ≥ 0, x1⊤x2 ≤ φ}. Interior-point algorithms for NLPs have been shown to be able to solve the resulting relaxations [23, 27]. Global and fast local convergence to stationary solutions of problem (1.6) is analyzed by letting φ converge to zero. In [5] fast local convergence is also established allowing φ to be bounded away from zero. An alternative to this relaxation is to introduce an ℓ1-penalty for the complementarity constraints and add r x1⊤x2 to the objective function, where r is the penalty parameter. The resulting penalized problem is well behaved. Together with a suitable technique for updating r, the penalty scheme identifies a stationary solution of (1.6) as long as r is large enough or r goes to infinity [20]. Other regularization approaches that explore different relaxations include an elastic model [1, 2], smoothing schemes [6, 12, 17], as well as the approaches proposed in [16, 21, 32].

Much work has been done on understanding the limiting behavior of regularization methods for MPECs with increasingly tighter relaxations. It turns out that the convergence properties of all the proposed regularization schemes are surprisingly similar. In particular, any cluster point of the sequence of first-order solutions of these relaxations is at least a C-stationary point of the MPEC. Moreover, if these first-order solutions are also second-order solutions to the relaxed problems, their cluster points are at least M-stationary points. The assumption made to achieve these results is, in general, the MPEC-linear independence constraint qualification (MPEC-LICQ), also known as the primal non-degeneracy assumption. An important question that remains open is whether regularization methods can be strengthened to provide convergence to first-order (i.e., B-stationary) solutions of the MPEC, since there are examples showing that regularization methods may converge to weak C- or M-stationary points at which trivial first-order descent directions exist. Such points are really spurious stationary points that are not locally optimal.

Another approach for attacking problem (1.6) is to use decomposition methods. These methods explore the disjunctive structure of the complementarity constraints and are essentially an extension of pivoting algorithms for complementarity problems. At each iteration, these methods specify two subsets B1 and B2 with B1 ∪ B2 = {1, 2, ..., m}. By decomposing the complementarity constraints according to B1 and B2, they compute a stationary solution to the NLP subproblem

min f(x)
s.t.  Ax = b,  x0 ≥ 0,
      x1,i = 0, i ∈ B1;  x1,i ≥ 0, i ∈ B2\B1,
      x2,i = 0, i ∈ B2;  x2,i ≥ 0, i ∈ B1\B2.   (1.7)

Obviously, if the sets B1 and B2 correctly identify the active sets for x1 and x2, respectively, at an optimal solution, then solutions to problem (1.7) also solve problem (1.6). Therefore, the key point for decomposition methods lies in how to choose B1 and B2 effectively, especially when the iterates are far from the solution. One way is to determine these sets according to the current values of the dual variables. Recent developments in decomposition methods show that they have guaranteed convergence to M-stationary points of problem (1.6) assuming primal non-degeneracy [13, 15]. In [14] this result is strengthened to show convergence to B-stationary points under the additional assumption that every M-stationary point is isolated.
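As a rough illustration of the mechanics (our sketch, not a method from the paper), the code below turns a choice of B1 and B2 into the variable bounds of subproblem (1.7), together with one plausible dual-based rule for picking the sets; the rule in choose_sets_from_duals is an assumption, not the paper's.

```python
# Sketch: build variable bounds for subproblem (1.7) from index sets
# B1, B2 ⊆ {0,...,m-1} with B1 ∪ B2 = {0,...,m-1}.
import numpy as np

def subproblem_bounds(B1, B2, n, m):
    # x = (x0; x1; x2); all variables are nonnegative in (1.7).
    lb = np.zeros(n + 2 * m)
    ub = np.full(n + 2 * m, np.inf)
    for i in range(m):
        if i in B1:            # x1,i = 0 for i in B1
            ub[n + i] = 0.0
        if i in B2:            # x2,i = 0 for i in B2
            ub[n + m + i] = 0.0
    return lb, ub

# One assumed heuristic: put i into B1 when the dual estimate suggests
# x1,i is active, into B2 otherwise; ties go into both, so B1 ∪ B2 covers
# every index.
def choose_sets_from_duals(lam1, lam2):
    B1 = {i for i in range(len(lam1)) if lam1[i] >= lam2[i]}
    B2 = {i for i in range(len(lam2)) if lam2[i] >= lam1[i]}
    return B1, B2
```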

An interesting attempt to solve general MPECs is to directly apply state-of-the-art NLP active set solvers, which do not rely on the existence of a strict interior. Based on a filter-SQP solver, Fletcher and Leyffer [9] solve MPECs as NLPs by replacing the complementarity constraints by the inequalities x1 ≥ 0, x2 ≥ 0 and x1⊤x2 ≤ 0. Surprisingly, quadratic convergence is observed for almost all tested MPEC problems. The reason for this is studied in [10].

Piecewise SQP (PSQP) methods are the natural extension of NLP SQP methods for solving MPECs. The subproblem solved by PSQP methods at each iteration is a quadratic program with linear complementarity constraints (QPLCC). As a variant of SQP methods, it is not surprising that PSQP methods enjoy reasonably strong convergence properties. In particular, it has been shown that under rather weak conditions, PSQP methods converge quadratically in a small neighborhood of a second-order local solution [25, 28]. Unfortunately, since solving the QPLCC subproblem is much more expensive than solving a regular quadratic program, current PSQP methods are only conceptual. How to reduce this cost and still guarantee the strong convergence properties of PSQP methods is the subject of current research [22].

1.2. Contribution. Given the state of current solution methods for problem (1.6), this paper aims at an efficient method that not only is implementable with reasonable subproblem cost, but also enjoys strong global and local convergence to meaningful stationary solutions. In particular, we propose a primal-dual active set Newton method for problem (1.6). The method starts from a feasible point and maintains feasibility at every iterate. In many important applications, a feasible starting point for problem (1.6) is readily available through state-of-the-art codes for linear complementarity problems (LCP) or linear variational inequalities. Moreover, in many interesting cases such a feasible solution can be obtained in polynomial time, e.g., if the matrix defining the LCP is positive semidefinite, or the variational inequalities are strictly monotone.

At every iteration our method generates a working set for predicting the active set. The complementarity and inequality constraints in the working set are treated approximately as equality constraints depending on the current estimate of the dual variables, while those not in the set are temporarily ignored. In terms of projected Newton methods, the working set defines the projection space, in which the elements of the step direction are chosen according to the multipliers. The subproblem solved at each step is as inexpensive to solve as the subproblem solved in NLP based algorithms. Specifically, the major cost involves only one matrix factorization, besides the cost needed to guarantee the correct condition of the Newton system. The convergence properties of our algorithm are as strong as those of the PSQP methods and stronger than any that have been shown for NLP-based regularization or decomposition methods. In particular, we show that under the primal non-degeneracy condition, any accumulation point of the generated iterates is a B-stationary point (i.e., a first-order solution) of problem (1.6). Under additional MPLCC-second order sufficient conditions (SOSC) and strict complementarity, we show that the generated working set eventually identifies the final active set at the B-stationary solution. As a consequence, the convergence rate of our algorithm is locally quadratic.

As a first step in this research, in this paper we focus on the global and local convergence theory of the proposed method. The paper is organized as follows. Section 2 presents notation and some definitions and preliminary results relevant to MPECs. In Section 3, we describe our active set algorithm for solving problem (1.6). Global convergence to B-stationary solutions and fast local convergence are established in Sections 4 and 5, respectively. We conclude the paper in Section 6.

2. Preliminaries. We use standard notation from the literature on MPECs and NLPs. A letter with superscript or subscript k is related to the kth iteration. The subscript i denotes the ith element of a vector or the ith column of a matrix. ∅ stands for the empty set. ‖·‖ denotes the Euclidean norm unless otherwise specified. For a vector d of dimension n + 2m with d0 ∈ ℝ^n, d1 ∈ ℝ^m and d2 ∈ ℝ^m, we write d = (d0⊤, d1⊤, d2⊤)⊤ as (d0; d1; d2). Moreover, for an index subset D, d_{0,D} (d_{1,D}, d_{2,D}) denotes the sub-vector of d0 (d1, d2) with components d_{0,i} (d_{1,i}, d_{2,i}) for i ∈ D. For an index i, the vectors e_{0,i}, e_{1,i} and e_{2,i} are, respectively, the (n + 2m)-dimensional gradients of x_{0,i}, x_{1,i} and x_{2,i}; i.e., all elements of e_{0,i} (e_{1,i}, e_{2,i}) are zero except the one corresponding to the variable x_{0,i} (x_{1,i}, x_{2,i}), which is one. The matrices e0⊤, e1⊤ and e2⊤ denote the Jacobians of x0, x1 and x2, respectively. Given two vectors x and y of the same dimension q, we say x ≥ (>) y if and only if x_i ≥ (>) y_i, ∀i = 1,...,q, and min(x, y) is the vector whose ith element is min(x_i, y_i). We define the following index sets for the constraints in problem (1.6):

E = {1, ..., p},  I = {1, ..., n},  A = {1, ..., m}.

The active sets for the inequality and complementarity constraints at a feasible point x are defined by

I(x) = {i ∈ I | x_{0,i} = 0},  A1(x) = {i ∈ A | x_{1,i} = 0},  A2(x) = {i ∈ A | x_{2,i} = 0}.

In particular, the bi-active set A12(x) for the complementarity constraints is defined as the intersection of A1(x) and A2(x), i.e., A12(x) = A1(x) ∩ A2(x).

A local optimal solution to problem (1.6) must be a first-order optimal solution, also known as a B-stationary point, at which no feasible descent directions exist. The primal stationarity condition for B-stationary points is given by

Definition 2.1. A feasible point x is a B-stationary point of problem (1.6) if the following linear program with linear complementarity constraints is solved by Δx = (Δx0; Δx1; Δx2) = 0:

min_{Δx} ∇f(x)⊤Δx
s.t.  AΔx = 0,
      x0 + Δx0 ≥ 0,
      0 ≤ (x1 + Δx1) ⊥ (x2 + Δx2) ≥ 0.   (2.1)

Checking B-stationarity from Definition 2.1 is usually very difficult. Thus, it is more convenient to consider primal-dual stationarity conditions. For instance, under certain constraint qualifications, one can check whether a feasible point x is a weak stationary solution of problem (1.6); i.e., whether there exist dual variables y ∈ ℝ^p and λ = (λ0; λ1; λ2) ∈ ℝ^{n+2m} that along with x satisfy

∇_x L(x,y,λ) = 0,   (2.2)
λ0 ≥ 0,  λ_{0,i} x_{0,i} = 0, ∀i ∈ I,   (2.3)
λ_{1,i} x_{1,i} = 0,  λ_{2,i} x_{2,i} = 0, ∀i ∈ A,   (2.4)

where L(x,y,λ) is the Lagrangian function

L(x,y,λ) = f(x) + (Ax − b)⊤y − x0⊤λ0 − x1⊤λ1 − x2⊤λ2.

Unlike the simple multiplier rule for NLPs, there are several levels of stationarity conditions for MPLCCs, depending on the signs of the complementarity constraint multipliers. The stationarity conditions of primary interest in this paper include C-stationarity, M-stationarity and S-stationarity.

Definition 2.2. A feasible point x is called a C-stationary point of problem (1.6) if there exist y and λ such that (2.2), (2.3) and (2.4) hold and λ_{1,i}λ_{2,i} ≥ 0 for all i ∈ A.

Definition 2.3. A feasible point x is called an M-stationary point of problem (1.6) if there exist y and λ such that (2.2), (2.3) and (2.4) hold and either λ_{1,i} ≥ 0, λ_{2,i} ≥ 0 or λ_{1,i}λ_{2,i} = 0 for all i ∈ A12(x).

Definition 2.4. A feasible point x is called an S-stationary point of problem (1.6) if there exist y and λ such that (2.2), (2.3) and (2.4) hold and λ_{1,i} ≥ 0, λ_{2,i} ≥ 0 for all i ∈ A12(x).

Note that S-stationarity is the strongest of these stationarity conditions. An S-stationary point must be an M-stationary point, and an M-stationary point must be a C-stationary point. Moreover, an S-stationary point must be a B-stationary point. B-stationarity does not necessarily imply S-stationarity, and S-stationary points may not exist for problem (1.6). However, as is well known (see, e.g., [30]), B-stationarity is equivalent to S-stationarity under the following primal non-degeneracy condition:

Definition 2.5. (MPLCC-LICQ) A feasible point x is said to satisfy the MPLCC-LICQ if the following constraint gradients are linearly independent:

{A_i, i ∈ E} ∪ {e_{0,i}, i ∈ I(x)} ∪ {e_{1,i}, i ∈ A1(x)} ∪ {e_{2,i}, i ∈ A2(x)}.

The MPLCC-LICQ is important in both practice and theory. It allows the identification of a B-stationary solution as an S-stationary point, which is much easier than solving the LPLCC (2.1). In particular, given the validity of the MPLCC-LICQ at x, the unique multiplier vector (y, λ) that satisfies (2.2) and the complementarity conditions can be obtained by solving the linear system

∇f(x) + A⊤y − Σ_{i∈I(x)} λ_{0,i} e_{0,i} − Σ_{i∈A1(x)} λ_{1,i} e_{1,i} − Σ_{i∈A2(x)} λ_{2,i} e_{2,i} = 0.

For the other multipliers in λ, set λ_{0,i} = 0, ∀i ∈ I\I(x), λ_{1,i} = 0, ∀i ∈ A\A1(x) and λ_{2,i} = 0, ∀i ∈ A\A2(x). If furthermore the multipliers corresponding to the bi-active set A12(x) are all nonnegative, x is an S-stationary point and thus also a B-stationary point. Otherwise, x is not a B-stationary point.
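Under the MPLCC-LICQ this test is easy to carry out numerically. The sketch below (ours, not the paper's code) solves the linear system above in the least-squares sense, verifies that it is satisfied, and then checks the multiplier signs required by weak and S-stationarity; the tolerance tol and the dense identity-column construction are illustrative choices.

```python
# Sketch: check S-stationarity (hence B-stationarity under MPLCC-LICQ) at a
# feasible x, by solving  ∇f + A^T y - Σ λ_{0,i} e_{0,i} - Σ λ_{1,i} e_{1,i}
# - Σ λ_{2,i} e_{2,i} = 0  for (y, λ) and checking signs.
import numpy as np

def s_stationarity_check(grad_f, A, x, n, m, tol=1e-10):
    x0, x1, x2 = x[:n], x[n:n + m], x[n + m:]
    I_act = np.where(x0 <= tol)[0]
    A1 = np.where(x1 <= tol)[0]
    A2 = np.where(x2 <= tol)[0]
    E = np.eye(n + 2 * m)                 # e_{0,i}, e_{1,i}, e_{2,i} as columns
    cols = [A.T] + [-E[:, [i]] for i in I_act] \
         + [-E[:, [n + i]] for i in A1] + [-E[:, [n + m + i]] for i in A2]
    J = np.hstack(cols)
    sol, *_ = np.linalg.lstsq(J, -grad_f, rcond=None)
    if np.linalg.norm(J @ sol + grad_f) > 1e-8:
        return False                      # not even weakly stationary
    p = A.shape[0]
    lam0 = sol[p:p + len(I_act)]
    lam1 = sol[p + len(I_act):p + len(I_act) + len(A1)]
    lam2 = sol[p + len(I_act) + len(A1):]
    if np.any(lam0 < -tol):
        return False                      # (2.3) requires λ0 >= 0 on I(x)
    biactive = set(A1) & set(A2)          # signs only matter on A12(x)
    ok1 = all(lam1[k] >= -tol for k, i in enumerate(A1) if i in biactive)
    ok2 = all(lam2[k] >= -tol for k, i in enumerate(A2) if i in biactive)
    return ok1 and ok2
```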

We now define second-order sufficient conditions (SOSC) for a feasible point x to be a solution of problem (1.6).

Definition 2.6. The MPLCC-SOSC holds at a feasible point x if x is an S-stationary point of problem (1.6) and there is a ϱ > 0 such that d⊤∇²f(x)d ≥ ϱ‖d‖² for all

d ∈ S(x) = {s ∈ ℝ^{n+2m} | As = 0; s_{0,i} = 0, ∀i ∈ I(x); s_{1,i} = 0, ∀i ∈ A1(x); s_{2,i} = 0, ∀i ∈ A2(x)}.

Strict complementarity for problem (1.6) is defined as follows.

Definition 2.7. Let x be an S-stationary point of problem (1.6) and (y, λ) be a corresponding multiplier vector. Strict complementarity holds at (x, y, λ) if λ_{0,i} > 0, ∀i ∈ I(x), λ_{1,i} ≠ 0, ∀i ∈ A1(x) and λ_{2,i} ≠ 0, ∀i ∈ A2(x).

So far we have described stationarity conditions for problem (1.6) from the perspective of MPECs. Now let us look at problem (1.6) from the viewpoint of nonlinear optimization by writing it as:

min f(x)
s.t.  Ax = b,
      x0 ≥ 0, x1 ≥ 0, x2 ≥ 0,
      x1⊤x2 ≤ 0.   (2.5)

A feasible point x is a first-order optimal solution to problem (2.5) if there exist multipliers y ∈ ℝ^p, z0 ∈ ℝ^n, z1 ∈ ℝ^m, z2 ∈ ℝ^m and z3 ∈ ℝ such that the KKT conditions hold; i.e.,

R(x,y,z) = ( ∇_x L(x,y,z); min{x0, z0}; min{x1, z1}; min{x2, z2}; min{x1⊤x2, z3} ) = 0,   (2.6)

where z = (z0; z1; z2; z3) and L(x,y,z) is the Lagrangian function associated with the NLP (2.5),

L(x,y,z) = f(x) + (Ax − b)⊤y − x0⊤z0 − x1⊤z1 − x2⊤z2 + z3 x1⊤x2.   (2.7)
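The residual R in (2.6) is cheap to evaluate and is used later to size the working sets; the following sketch (ours, with assumed argument conventions) computes it for a given triple (x, y, z).

```python
# Sketch: evaluate the KKT residual R(x,y,z) of (2.6) for the NLP form (2.5).
import numpy as np

def kkt_residual(grad_f, A, x, y, z, n, m):
    x0, x1, x2 = x[:n], x[n:n + m], x[n + m:]
    z0, z1, z2, z3 = z[:n], z[n:n + m], z[n + m:n + 2 * m], z[-1]
    # ∇_x L from (2.7): ∇f + A^T y - (z0; z1; z2) + z3 * (0; x2; x1).
    grad_L = (grad_f + A.T @ y - np.concatenate([z0, z1, z2])
              + z3 * np.concatenate([np.zeros(n), x2, x1]))
    return np.concatenate([
        grad_L,
        np.minimum(x0, z0),
        np.minimum(x1, z1),
        np.minimum(x2, z2),
        [min(x1 @ x2, z3)],
    ])
```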

Proposition 4.1 in [10] shows the strong relation between a KKT point of problem (2.5) and an S-stationary solution to problem (1.6), which we describe as follows.

Proposition 2.8. Suppose x is a feasible point of (1.6) and (2.5). If (x, y, z) satisfies the KKT conditions (2.6), then x is an S-stationary point of (1.6) with multipliers y and λ, where λ is defined by

λ0 = z0,  λ1 = z1 − z3 x2,  λ2 = z2 − z3 x1.   (2.8)

If x is an S-stationary point of problem (1.6) with a corresponding multiplier vector (y, λ), then (x, y, z) satisfies the KKT conditions (2.6), where z is defined by

z0 = λ0,
z3 ∈ {θ ≥ 0 | λ1 + θx2 ≥ 0, λ2 + θx1 ≥ 0},
z1 = λ1 + z3 x2,
z2 = λ2 + z3 x1.   (2.9)
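Both directions of Proposition 2.8 are straightforward to implement. Below is a small sketch (ours) of the maps (2.8) and (2.9); for (2.9) it takes z3 to be the smallest feasible θ, one valid element of the set in (2.9), optionally capped by a bound z3_max (an assumption mirroring the cap used later in the algorithm's update step).

```python
# Sketch: convert between MPEC multipliers λ and NLP multipliers z per
# (2.8) and (2.9).  All vector arguments are length-m numpy arrays.
import numpy as np

def z_from_lambda(lam0, lam1, lam2, x1, x2, z3_max=None):
    # Smallest θ >= 0 with λ1 + θ x2 >= 0 and λ2 + θ x1 >= 0 (see (2.9)).
    # The set is nonempty at an S-stationary point; we ignore other cases.
    theta = 0.0
    for lam, xs in ((lam1, x2), (lam2, x1)):
        mask = (lam < 0) & (xs > 0)
        if mask.any():
            theta = max(theta, np.max(-lam[mask] / xs[mask]))
    z3 = theta if z3_max is None else min(theta, z3_max)
    return lam0.copy(), lam1 + z3 * x2, lam2 + z3 * x1, z3

def lambda_from_z(z0, z1, z2, z3, x1, x2):
    # Inverse map (2.8).
    return z0.copy(), z1 - z3 * x2, z2 - z3 * x1
```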

The SOSC for problem (2.5) follows from the standard definition of the SOSC for general NLPs. In particular, it is easy to check that under the MPLCC-LICQ (which implies the uniqueness of the multipliers) and the MPLCC-SOSC, the SOSC for problem (2.5) holds for any multipliers that satisfy (2.9).

3. Algorithm. Our algorithm is an active set Newton method which maintains the feasibility of all iterates. At every iteration, the algorithm computes a working set that approximates the final active set at a solution. Given an iterate x and a multiplier estimate (y, z), the indices of the working set that correspond, respectively, to x0, x1 and x2 are those in the sets

I(x,y,z;ε) = {i ∈ I | x_{0,i} ≤ min{ε, ‖R(x,y,z)‖^η}},
A1(x,y,z;ε) = {i ∈ A | x_{1,i} ≤ min{ε, ‖R(x,y,z)‖^η}},
A2(x,y,z;ε) = {i ∈ A | x_{2,i} ≤ min{ε, ‖R(x,y,z)‖^η}},   (3.1)

where ε > 0 and η ∈ (0, 1) are parameters and R(x,y,z) is defined by (2.6). The set

A12(x,y,z;ε) = A1(x,y,z;ε) ∩ A2(x,y,z;ε)   (3.2)

is called the bi-working set.

Let W denote the current working set; i.e., W ≡ (I(x,y,z;ε), A1(x,y,z;ε), A2(x,y,z;ε)). Also let the Jacobian of the inequality and complementarity constraints in this working set be denoted by

E(x,y,z;ε) = [E0(x,y,z;ε), E1(x,y,z;ε), E2(x,y,z;ε)],   (3.3)

where

E0(x,y,z;ε) = [e_{0,i}, i ∈ I(x,y,z;ε)],
E1(x,y,z;ε) = [e_{1,i}, i ∈ A1(x,y,z;ε)],
E2(x,y,z;ε) = [e_{2,i}, i ∈ A2(x,y,z;ε)].   (3.4)
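Concretely, the working sets and the working-set Jacobian are simple thresholding operations; the sketch below (ours) assumes x is stored as (x0; x1; x2) and that the norm ‖R(x,y,z)‖ from (2.6) is supplied as R_norm.

```python
# Sketch: compute the working sets (3.1)-(3.2) and the working-set
# Jacobian E of (3.3)-(3.4).
import numpy as np

def working_sets(x, R_norm, eps, eta, n, m):
    thresh = min(eps, R_norm ** eta)
    x0, x1, x2 = x[:n], x[n:n + m], x[n + m:]
    I_w = np.where(x0 <= thresh)[0]
    A1_w = np.where(x1 <= thresh)[0]
    A2_w = np.where(x2 <= thresh)[0]
    A12_w = np.intersect1d(A1_w, A2_w)      # bi-working set (3.2)
    return I_w, A1_w, A2_w, A12_w

def working_jacobian(I_w, A1_w, A2_w, n, m):
    # Columns e_{0,i}, e_{1,i}, e_{2,i} taken from the identity, per (3.4).
    E = np.eye(n + 2 * m)
    return np.hstack([E[:, I_w], E[:, n + A1_w], E[:, n + m + A2_w]])
```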

Using the notation dW = (d_{I(x,y,z;ε)}; d_{A1(x,y,z;ε)}; d_{A2(x,y,z;ε)}), several Newton systems are solved at each iteration, where each system is a linearization of the nonlinear system of equations

∇f(x) + A⊤y − E(x,y,z;ε)λW = 0,
Ax − b = 0,
xW − uW = 0

for some appropriate choice of uW. In particular, letting ϖ = uW − xW, the Newton systems are

M(H,x,y,z;ε)(Δx; y⁺; λ⁺W) = (−∇f(x); 0; ϖ),   (3.5)

where the Newton matrix is given by

M(H,x,y,z;ε) = [ H  A⊤  −E(x,y,z;ε) ;  A  0  0 ;  E(x,y,z;ε)⊤  0  0 ],   (3.6)

and H is the exact Hessian ∇²f(x) or an approximation to it.

and H is the exact Hessian ∇2f(x) or an approximation to it.At the start of each iteration, the system (3.5) is solved with $ = 0, which is equivalent to

keeping the components of xW fixed at their current values (i.e., setting uW = xW), to obtain anupdated estimate λ+

W of the dual variables. The values of the components of λ+W and xW are then

compared to determine which indices in the working set W are likely to be in an optimal active set,and $ is chosen so that a full Newton step will make the corresponding constraints active. If thisstep, which is modified by a step length parameter, keeps all variables nonnegative and satisfies anArmijo sufficient decrease condition, the modified step is taken, the multipliers are updated and theiteration is complete. Otherwise, the value of the tolerance ε used to define the working set in (3.1)is reduced to further refine the working set, and the above step computation procedure is repeated.
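For illustration, a dense version of one such Newton solve might look as follows (our sketch; the paper factors the matrix once per iteration, for which the dense solve below is a stand-in). The argument varpi plays the role of ϖ in (3.5).

```python
# Sketch: assemble the Newton matrix (3.6) and solve one system (3.5).
# H is the (possibly modified) Hessian, A the p x (n+2m) equality Jacobian,
# E the working-set Jacobian from (3.3)-(3.4).
import numpy as np

def solve_newton_system(H, A, E, grad_f, varpi):
    N, p, q = H.shape[0], A.shape[0], E.shape[1]
    M = np.block([
        [H,   A.T,               -E],
        [A,   np.zeros((p, p)),  np.zeros((p, q))],
        [E.T, np.zeros((q, p)),  np.zeros((q, q))],
    ])
    rhs = np.concatenate([-grad_f, np.zeros(p), varpi])
    sol = np.linalg.solve(M, rhs)     # in practice: reuse one factorization
    dx, y_plus, lam_plus = sol[:N], sol[N:N + p], sol[N + p:]
    return dx, y_plus, lam_plus
```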

We are now ready to present our active set algorithm for solving problem (1.6). The motivations underlying the computations performed by the algorithm are more fully explained in the remarks that follow it.

Algorithm 3.1.

Step 0. Initialization.
Parameters: η ∈ (0,1), z3^max > 0, κ ∈ (0,1), β ∈ (0,1), σ ∈ (0,1/2), γ > 1, ν ∈ (0,1).
Initial data: a feasible point x^0 ∈ ℝ^{n+2m}, y^0 ∈ ℝ^p, z^0 ∈ ℝ^{n+2m+1} with z3^0 ≥ 0, H^0 = ∇²f(x^0), ε^0 > 0. Set k ← 0.

Step 1. Factorization of the Newton matrix.
Step 1.1. If the Jacobian [A⊤, E(x^k, y^k, z^k; ε^k)] has full column rank, set ε^{k,0} = ε^k; otherwise, set ε^{k,0} = 0 and ε^{k+1} = κε^k.
Step 1.2. Modify H^k if necessary, so that

d⊤H^k d > 0, ∀d ∈ S^k, d ≠ 0,   (3.7)

where

S^k = {s ∈ ℝ^{n+2m} | As = 0; s_{0,i} = 0, ∀i ∈ I^k; s_{1,i} = 0, ∀i ∈ A_1^k; s_{2,i} = 0, ∀i ∈ A_2^k},   (3.8)

and

I^k = I(x^k, y^k, z^k; ε^{k,0}),  A_1^k = A1(x^k, y^k, z^k; ε^{k,0}),  A_2^k = A2(x^k, y^k, z^k; ε^{k,0}).   (3.9)

Step 2. Computation of a Newton direction.
Step 2.1. Solve

M^k (Δx^k; y^{k+1}; λ^{k+1}) = (−∇f(x^k); 0; 0),   (3.10)

where M^k = M(H^k, x^k, y^k, z^k; ε^{k,0}) and λ^{k+1} = (λ^{k+1}_{0,I^k}; λ^{k+1}_{1,A_1^k}; λ^{k+1}_{2,A_2^k}). Set λ^{k+1}_{0,i} = 0, ∀i ∈ I\I^k, λ^{k+1}_{1,i} = 0, ∀i ∈ A\A_1^k and λ^{k+1}_{2,i} = 0, ∀i ∈ A\A_2^k.
Step 2.2. If x^k and the multipliers (y^{k+1}, λ^{k+1}) satisfy the conditions in Definition 2.4, stop; x^k is an S-stationary point of problem (1.6). Otherwise, continue to Step 3.

Step 3. Line search.
Step 3.1. Set j ← 0 and

γ_k^max = γ max{1; x^k_{1,i}, i ∈ A_1^k; x^k_{2,i}, i ∈ A_2^k},   (3.11)
I_+^k = {i ∈ I^k | λ^{k+1}_{0,i} ≥ x^k_{0,i}},  I_−^k = {i ∈ I^k | λ^{k+1}_{0,i} < x^k_{0,i}}.   (3.12)

Step 3.2. Set A_1^{k,j} = A1(x^k, y^k, z^k; ε^{k,j}) and A_2^{k,j} = A2(x^k, y^k, z^k; ε^{k,j}). If j = 0, set α^{k,j} = 1; otherwise, set

α^{k,j} = min{θ | βα^{k,j−1} ≤ θ ≤ α^{k,j−1}; θ ≥ x^k_{1,i}/γ_k^max, ∀i ∈ A_1^{k,j}; θ ≥ x^k_{2,i}/γ_k^max, ∀i ∈ A_2^{k,j}}.   (3.13)

Let A_{12}^{k,j} = A_1^{k,j} ∩ A_2^{k,j} and

B_1^{k,j} = {i ∈ A_{12}^{k,j} | λ^{k+1}_{1,i} ≥ x^k_{1,i}, λ^{k+1}_{2,i} ≥ x^k_{2,i}},
B_2^{k,j} = {i ∈ A_{12}^{k,j} | x^k_{1,i} = 0, λ^{k+1}_{2,i} < x^k_{2,i}, x^k_{1,i} ≤ λ^{k+1}_{1,i}},
B_3^{k,j} = {i ∈ A_{12}^{k,j} | x^k_{1,i} = 0, λ^{k+1}_{2,i} < x^k_{2,i}, λ^{k+1}_{1,i} < x^k_{1,i}, λ^{k+1}_{2,i} ≤ λ^{k+1}_{1,i}},
B_4^{k,j} = {i ∈ A_{12}^{k,j} | x^k_{1,i} = 0, λ^{k+1}_{2,i} < x^k_{2,i}, λ^{k+1}_{1,i} < x^k_{1,i}, λ^{k+1}_{1,i} < λ^{k+1}_{2,i}},
B_5^{k,j} = {i ∈ A_{12}^{k,j} | x^k_{1,i} = 0, x^k_{2,i} ≤ λ^{k+1}_{2,i}, λ^{k+1}_{1,i} < x^k_{1,i}},
B_6^{k,j} = {i ∈ A_{12}^{k,j} | x^k_{2,i} = 0, λ^{k+1}_{1,i} < x^k_{1,i}, x^k_{2,i} ≤ λ^{k+1}_{2,i}},
B_7^{k,j} = {i ∈ A_{12}^{k,j} | x^k_{2,i} = 0, λ^{k+1}_{1,i} < x^k_{1,i}, λ^{k+1}_{2,i} < x^k_{2,i}, λ^{k+1}_{1,i} ≤ λ^{k+1}_{2,i}},
B_8^{k,j} = {i ∈ A_{12}^{k,j} | x^k_{2,i} = 0, λ^{k+1}_{1,i} < x^k_{1,i}, λ^{k+1}_{2,i} < x^k_{2,i}, λ^{k+1}_{2,i} < λ^{k+1}_{1,i}},
B_9^{k,j} = {i ∈ A_{12}^{k,j} | x^k_{2,i} = 0, x^k_{1,i} ≤ λ^{k+1}_{1,i}, λ^{k+1}_{2,i} < x^k_{2,i}},
B_{10}^{k,j} = {i ∈ A_1^k\A_1^{k,j} | x^k_{1,i} ≤ λ^{k+1}_{1,i}},
B_{11}^{k,j} = {i ∈ A_1^k\A_1^{k,j} | λ^{k+1}_{1,i} < x^k_{1,i}},
B_{12}^{k,j} = {i ∈ A_2^k\A_2^{k,j} | x^k_{2,i} ≤ λ^{k+1}_{2,i}},
B_{13}^{k,j} = {i ∈ A_2^k\A_2^{k,j} | λ^{k+1}_{2,i} < x^k_{2,i}}.   (3.14)

Set the components of the vector ϖ^{k,j} = (ϖ_0^{k,j}; ϖ_1^{k,j}; ϖ_2^{k,j}), where ϖ_0^{k,j} ∈ ℝ^{|I^k|}, ϖ_1^{k,j} ∈ ℝ^{|A_1^k|} and ϖ_2^{k,j} ∈ ℝ^{|A_2^k|}, as follows:

ϖ^{k,j}_{0,i} = −x^k_{0,i}, i ∈ I_+^k,
ϖ^{k,j}_{0,i} = −λ^{k+1}_{0,i}, i ∈ I_−^k,
ϖ^{k,j}_{1,i} = −x^k_{1,i} (= 0), i ∈ A_1^{k,j}\A_{12}^{k,j},
ϖ^{k,j}_{2,i} = −x^k_{2,i} (= 0), i ∈ A_2^{k,j}\A_{12}^{k,j},
ϖ^{k,j}_{1,i} = −x^k_{1,i}, ϖ^{k,j}_{2,i} = −x^k_{2,i}, i ∈ B_1^{k,j},
ϖ^{k,j}_{1,i} = −x^k_{1,i} (= 0), ϖ^{k,j}_{2,i} = −λ^{k+1}_{2,i}, i ∈ B_2^{k,j} ∪ B_3^{k,j},
ϖ^{k,j}_{1,i} = −λ^{k+1}_{1,i} + x^k_{2,i}/α^{k,j}, ϖ^{k,j}_{2,i} = −x^k_{2,i}/α^{k,j}, i ∈ B_4^{k,j} ∪ B_5^{k,j},
ϖ^{k,j}_{1,i} = −λ^{k+1}_{1,i}, ϖ^{k,j}_{2,i} = −x^k_{2,i} (= 0), i ∈ B_6^{k,j} ∪ B_7^{k,j},
ϖ^{k,j}_{1,i} = −x^k_{1,i}/α^{k,j}, ϖ^{k,j}_{2,i} = −λ^{k+1}_{2,i} + x^k_{1,i}/α^{k,j}, i ∈ B_8^{k,j} ∪ B_9^{k,j},
ϖ^{k,j}_{1,i} = −x^k_{1,i}, i ∈ B_{10}^{k,j},
ϖ^{k,j}_{1,i} = −λ^{k+1}_{1,i}, i ∈ B_{11}^{k,j},
ϖ^{k,j}_{2,i} = −x^k_{2,i}, i ∈ B_{12}^{k,j},
ϖ^{k,j}_{2,i} = −λ^{k+1}_{2,i}, i ∈ B_{13}^{k,j}.   (3.15)

Step 3.3. Compute a trial step; i.e., using the factorization of M^k computed in Step 2.1, solve

M^k (Δx^{k,j}; y^{k+1,j}; λ^{k+1,j}) = (−∇f(x^k); 0; ϖ^{k,j}),   (3.16)

where λ^{k+1,j} = (λ^{k+1,j}_{0,I^k}; λ^{k+1,j}_{1,A_1^k}; λ^{k+1,j}_{2,A_2^k}).

Step 3.4. Check the following conditions:

x^k + α^{k,j}Δx^{k,j} ≥ 0,   (3.17)
f(x^k + α^{k,j}Δx^{k,j}) ≤ f(x^k) + σα^{k,j}∇f(x^k)⊤Δx^{k,j}.   (3.18)

If both (3.17) and (3.18) hold, set j_k ← j, x^{k+1} = x^k + α^{k,j}Δx^{k,j} and go to Step 4. Otherwise, set ε^{k,j+1} = νε^{k,j}, j ← j + 1 and go back to Step 3.2.

Step 4. Update.
Set H^{k+1} = ∇²f(x^{k+1}). Compute z^{k+1} by

z_0^{k+1} = λ_0^{k+1},
z_3^{k+1} = min{z3^max, min{θ ≥ 0 | λ_1^{k+1} + θx_2^{k+1} ≥ 0, λ_2^{k+1} + θx_1^{k+1} ≥ 0}},
z_1^{k+1} = λ_1^{k+1} + z_3^{k+1} x_2^{k+1},
z_2^{k+1} = λ_2^{k+1} + z_3^{k+1} x_1^{k+1}.   (3.19)

Set k ← k + 1 and go back to Step 1.

Remark 3.1. As we have mentioned earlier, for many interesting applications a feasible starting point for problem (1.6) is readily available in polynomial time through state-of-the-art solvers for complementarity problems.

Remark 3.2. In Step 1.1 of Algorithm 3.1, we check whether the gradients of the equality constraints and the inequality constraints in the working set are linearly independent. Although this procedure may incur additional cost, in practice it can be accomplished as a by-product of factorizing the Newton matrix. If these gradients are dependent, the algorithm sets ε^{k,0} = 0, i.e., I^k = I(x^k), A_1^k = A1(x^k) and A_2^k = A2(x^k). Hence, if the MPLCC-LICQ holds at all feasible points, the gradients corresponding to the equality constraints and the inequality constraints in the working set are now independent. Moreover, if the Hessian estimate H is positive definite on the null space of the working set Jacobian, i.e., condition (3.7) holds, the Newton matrix M^k is nonsingular and, as will be shown later, the directions computed in Step 2 and Step 3 are descent directions for the objective function. Given that the Jacobian in M^k has full column rank, condition (3.7) holds if and only if the inertia of the matrix obtained by symmetrizing M^k is (n + 2m, p + |I^k| + |A_1^k| + |A_2^k|, 0); i.e., the matrix has n + 2m positive eigenvalues, p + |I^k| + |A_1^k| + |A_2^k| negative eigenvalues and no zero eigenvalues. The inertia of a symmetric indefinite matrix is readily available from several linear system solvers that perform an LBL⊤ factorization. Whenever condition (3.7) fails with the exact Hessian H = ∇²f(x), we can employ a common approach used in interior-point methods for non-convex NLPs of adding to H a multiple of the identity with increasing magnitude until (3.7) holds (see, e.g., [33, 34]). Note that (3.7) eventually holds with the exact Hessian as long as the iterates converge to an S-stationary solution of problem (1.6) at which the MPLCC-SOSC holds. Also note that since the linear systems (3.10) and (3.16) have the same coefficient matrix M^k, only one matrix factorization is needed to compute the step directions at each iteration.

Remark 3.3. The Newton step Δx^k computed in Step 2.1 is not directly used to update the current iterate x^k. Rather, it is used to compute new dual variable estimates y^{k+1} and λ^{k+1}, which are then used to actually compute the trial primal steps Δx^{k,j}.

Remark 3.4. Our goal in using the working set is to predict the final active set. The optimality error ‖R(x,y,z)‖^η used to define the working set in (3.1) is a superior error bound in the sense that if (x, y, z) is close enough to a second-order solution of problem (2.5), this optimality error is an upper bound on the distance of (x, y, z) to the solution. Therefore, the working set defined by (3.1) identifies the final active set once the iterates (x^k, y^k, z^k) are sufficiently close to a solution of problem (2.5) satisfying the SOSC. In this case we can deal with the constraints in the working set exactly as equality constraints and ignore the constraints outside the working set. Hence, the Newton equation yields Δx^k_{l,i} = −x^k_{l,i} (l = 0, 1, 2) for every constraint indexed by i in the working set.

Unfortunately, the error bound ‖R(x,y,z)‖^η may not be valid when the iterates are far from a solution. In this case the working set may not correspond to the final active set. Hence, we need to determine which constraints in the working set are likely to be in an optimal active set and which are not. In Algorithm 3.1, this is done by comparing the values of λ^{k+1}_{0,i} and x^k_{0,i} for i ∈ I^k and the values of λ^{k+1}_{l,i} and x^k_{l,i} for l = 1, 2 and i ∈ A_{12}^k, where λ^{k+1} is the multiplier estimate computed from (3.10). The motivation is based on a simple observation: for example, if x_{0,i} eventually becomes active, the corresponding dual variable λ_{0,i} will eventually become greater than or equal to x_{0,i}; otherwise, λ_{0,i} will eventually become smaller than x_{0,i} if strict complementarity holds. In Algorithm 3.1, the working set I^k is partitioned into two sets: I_−^k for those indices i such that λ^{k+1}_{0,i} < x^k_{0,i} and I_+^k for those indices i such that λ^{k+1}_{0,i} ≥ x^k_{0,i}. If i ∈ I_+^k, the algorithm for the moment recognizes x_{0,i} as likely to be an active variable and sets Δx^{k,j}_{0,i} = −x^k_{0,i} (since Δx^{k,j}_{0,i} = ϖ^{k,j}_{0,i} by (3.16) and (3.6), and ϖ^{k,j}_{0,i} = −x^k_{0,i} by (3.15)); otherwise, x_{0,i} is considered likely to be a basic variable and Δx^{k,j}_{0,i} = −λ^{k+1}_{0,i}. Note that setting Δx^{k,j}_{0,i} = −λ^{k+1}_{0,i} is not only important to our proof in Lemma 3.2 that Δx^{k,j} is a descent direction, but it also allows a full step to be taken, since x^k_{0,i} + Δx^{k,j}_{0,i} = x^k_{0,i} − λ^{k+1}_{0,i} ≥ 0. Also, by setting x_{0,i} in this manner, it may leave the working set in the next iteration if λ^{k+1}_{0,i} is very negative.

The situation becomes more complicated when handling the working set for the complementarity constraints. Those variables in the working set that are outside of the bi-working set are always considered as likely to be active variables; hence, the Newton equations yield Δx^{k,j}_{l,i} = −x^k_{l,i} (l = 1, 2) (see the third and fourth equations in (3.15)). For the bi-working set A_{12}^{k,j}, the underlying idea is the same as that described above for the nonnegativity constraints, except that several cases are needed when both variables x_{1,i} and x_{2,i} in the bi-working set are considered likely to be basic variables (i.e., λ^{k+1}_{1,i} < x^k_{1,i} and λ^{k+1}_{2,i} < x^k_{2,i}). Specifically, we cannot allow both x_{1,i} and x_{2,i} to become basic variables since Algorithm 3.1 maintains the feasibility of every iterate. To ensure this, Algorithm 3.1 further compares the values of λ^{k+1}_{1,i} and λ^{k+1}_{2,i} and defines the sets B_3^{k,j}, B_4^{k,j}, B_7^{k,j} and B_8^{k,j} in (3.14). If λ^{k+1}_{1,i} is less than λ^{k+1}_{2,i}, x_{2,i} is considered more likely to be an active variable than x_{1,i}; otherwise, x_{1,i} is chosen. The values for ϖ^{k,j}_{1,i} or ϖ^{k,j}_{2,i} are set in the same manner as those for ϖ^{k,j}_{0,i} except in two cases: i ∈ B_4^{k,j} ∪ B_5^{k,j} or i ∈ B_8^{k,j} ∪ B_9^{k,j}. In these cases, the variables that are considered likely to be active are greater than zero. To guarantee the feasibility of the next iterate, we need to ensure that these variables are driven to zero. To this end, given a step size α^{k,j}, Algorithm 3.1 computes a step Δx^{k,j} for which Δx^{k,j}_{2,i} = −x^k_{2,i}/α^{k,j} for i ∈ B_4^{k,j} ∪ B_5^{k,j} and Δx^{k,j}_{1,i} = −x^k_{1,i}/α^{k,j} for i ∈ B_8^{k,j} ∪ B_9^{k,j} (see (3.15)). Hence, if x^{k+1} = x^k + α^{k,j}Δx^{k,j} is chosen as the next iterate, x^{k+1}_{2,i} (i ∈ B_4^{k,j} ∪ B_5^{k,j}) and x^{k+1}_{1,i} (i ∈ B_8^{k,j} ∪ B_9^{k,j}) are all zero. The other (complementarity) variables in these sets are considered likely to be basic variables, and we set Δx^{k,j}_{1,i} = −λ^{k+1}_{1,i} + x^k_{2,i}/α^{k,j} for i ∈ B_4^{k,j} ∪ B_5^{k,j} and Δx^{k,j}_{2,i} = −λ^{k+1}_{2,i} + x^k_{1,i}/α^{k,j} for i ∈ B_8^{k,j} ∪ B_9^{k,j}. This, as will be shown later, guarantees that Δx^{k,j} is a descent direction for the objective function. Since Algorithm 3.1 maintains 0 ≤ x_1^k ⊥ x_2^k ≥ 0 for every k, it follows that x^k_{1,i} = 0 (respectively, x^k_{2,i} = 0) if for some j, i ∈ A_1^k\A_{12}^{k,j} (respectively, i ∈ A_2^k\A_{12}^{k,j}). Finally, we note that if x^k_{1,i} = 0 and x^k_{2,i} = 0, the index i could belong to two of the sets B_l^{k,j}, l = 2,...,9. In this case we can simply choose to put i into either of the two sets. This does not interfere with our analysis.

Remark 3.5. The line search procedure described in Step 3 is a variant of the traditional backtracking line search using the Armijo condition (3.18). In particular, as we have seen from Remark 3.4, the step direction Δx^{k,j} varies in a specified way as the step size α^{k,j} changes. This guarantees the feasibility of the generated trial iterate with respect to the complementarity constraints. If the trial point is feasible, i.e., (3.17) holds, and satisfies the Armijo condition (3.18), it is accepted as the next iterate. Otherwise, we decrease the working set parameter ε^{k,j} by a factor and update the step size using (3.13). Decreasing ε^{k,j} avoids the situation in which the Armijo condition is rejected due to an overestimation of the active set. Consequently, A_1^{k,j} and A_2^{k,j} are, respectively, subsets of A_1^k and A_2^k for all j. Those variables outside A_1^{k,j} or A_2^{k,j} but in A_1^k or A_2^k are handled in the same way as are the inequality variables x_{0,i} that belong to I^k (see the treatment related to B_{10}^{k,j}–B_{13}^{k,j} in (3.15) and (3.14)). In Algorithm 3.1, the step size always starts from one, and the update formula (3.13) tries to shorten the previous step size by a factor β unless the resulting step size is bigger than certain thresholds. These thresholds ensure that the vector ϖ^{k,j} defined by (3.15) is bounded, and hence that the step direction is bounded. Finally, the feasibility of the iterates with respect to the equality constraints follows trivially since x^0 is feasible and AΔx^{k,j} always equals zero for any k and j.

To simplify notation, let

E_0^k = E0(x^k, y^k, z^k; ε^{k,0}),
E_1^k = E1(x^k, y^k, z^k; ε^{k,0}),
E_2^k = E2(x^k, y^k, z^k; ε^{k,0}),
E^k = [E_0^k, E_1^k, E_2^k].   (3.20)

In the remainder of this section, we show that Algorithm 3.1 is well-defined. To this end, we need the following assumptions.

A1. Problem (1.6) is feasible.
A2. The MPLCC-LICQ holds on the feasible region of problem (1.6).

As we have discussed in Remarks 3.4 and 3.5, our line search procedure always maintains the feasibility of every iterate as long as the starting point x^0 is feasible. Moreover, from Remark 3.2, under the MPLCC-LICQ, the constraint gradients involved in the Newton matrix M^k are linearly independent. Since condition (3.7) holds, the next lemma is easy to prove using Sylvester's law of inertia and is an immediate consequence of Proposition 2 in [11].

Lemma 3.1. Under Assumptions A1 and A2, every iterate generated by Algorithm 3.1 is feasible and the Newton matrix M^k in Step 2.1 of Algorithm 3.1 is nonsingular at each iteration.

This shows that the proposed step directions for each iteration are all computable. The next lemma shows that these directions are non-ascent directions for the objective function f(x).

Lemma 3.2. Under Assumptions A1 and A2, the directions Δx^k and Δx^{k,j} (j = 0, 1, ...) satisfy

∇f(x^k)⊤Δx^k = −(Δx^k)⊤H^kΔx^k ≤ 0,   (3.21)
∇f(x^k)⊤Δx^{k,j} = ∇f(x^k)⊤Δx^k + (λ^{k+1})⊤ϖ^{k,j} ≤ ∇f(x^k)⊤Δx^k,  j = 0, 1, ....   (3.22)

Proof. Let ξ^{k+1} = ((y^{k+1})⊤, −(λ^{k+1})⊤)⊤, ξ^{k+1,j} = ((y^{k+1,j})⊤, −(λ^{k+1,j})⊤)⊤, A^k = [A⊤, E^k], and v^{k,j} = (0⊤, (ϖ^{k,j})⊤)⊤ with 0 ∈ ℝ^p. The linear systems (3.10) and (3.16) can be written as

[H^k, A^k; (A^k)⊤, 0] (Δx^k; ξ^{k+1}) = (−∇f(x^k); 0),   [H^k, A^k; (A^k)⊤, 0] (Δx^{k,j}; ξ^{k+1,j}) = (−∇f(x^k); v^{k,j}).   (3.23)

Premultiplying the first block of equations in the first system in (3.23) by (Δx^k)⊤ and noting the second block of equations and condition (3.7) yields (3.21). Since M^k is nonsingular, we can write

(M^k)⁻¹ = [H^k, A^k; (A^k)⊤, 0]⁻¹ ≡ [B, C; C⊤, D],   (Δx^k; ξ^{k+1}) = (−B∇f(x^k); −C⊤∇f(x^k)),

and

Δx^{k,j} = −B∇f(x^k) + Cv^{k,j} = Δx^k + Cv^{k,j}.

Hence,

∇f(x^k)⊤Δx^{k,j} = ∇f(x^k)⊤Δx^k + ∇f(x^k)⊤Cv^{k,j} = ∇f(x^k)⊤Δx^k − (ξ^{k+1})⊤v^{k,j},   (3.24)

which implies the equality in (3.22). Moreover, according to the definition (3.15) of ϖ^{k,j}, we have from (3.24) that

−(ξ^{k+1})⊤v^{k,j} = (λ^{k+1})⊤ϖ^{k,j}
  = −Σ_{i∈I_+^k} λ^{k+1}_{0,i} x^k_{0,i} − Σ_{i∈I_−^k} (λ^{k+1}_{0,i})²
    − Σ_{i∈B_1^{k,j}} (λ^{k+1}_{1,i} x^k_{1,i} + λ^{k+1}_{2,i} x^k_{2,i}) − Σ_{i∈B_2^{k,j}∪B_3^{k,j}} (λ^{k+1}_{2,i})²
    − Σ_{i∈B_4^{k,j}∪B_5^{k,j}} ((λ^{k+1}_{1,i})² + (1/α^{k,j})(λ^{k+1}_{2,i} − λ^{k+1}_{1,i}) x^k_{2,i}) − Σ_{i∈B_6^{k,j}∪B_7^{k,j}} (λ^{k+1}_{1,i})²
    − Σ_{i∈B_8^{k,j}∪B_9^{k,j}} ((λ^{k+1}_{2,i})² + (1/α^{k,j})(λ^{k+1}_{1,i} − λ^{k+1}_{2,i}) x^k_{1,i}) − Σ_{i∈B_{10}^{k,j}} λ^{k+1}_{1,i} x^k_{1,i} − Σ_{i∈B_{11}^{k,j}} (λ^{k+1}_{1,i})²
    − Σ_{i∈B_{12}^{k,j}} λ^{k+1}_{2,i} x^k_{2,i} − Σ_{i∈B_{13}^{k,j}} (λ^{k+1}_{2,i})².   (3.25)

From the definitions of I_+^k, I_−^k and B_l^{k,j} (l = 1,...,13) given by (3.12) and (3.14), it follows that

λ^{k+1}_{0,i} x^k_{0,i} ≥ (x^k_{0,i})², ∀i ∈ I_+^k,
λ^{k+1}_{1,i} x^k_{1,i} ≥ (x^k_{1,i})² and λ^{k+1}_{2,i} x^k_{2,i} ≥ (x^k_{2,i})², ∀i ∈ B_1^{k,j},
λ^{k+1}_{2,i} > λ^{k+1}_{1,i}, ∀i ∈ B_4^{k,j},
λ^{k+1}_{2,i} ≥ x^k_{2,i} ≥ x^k_{1,i} (= 0) > λ^{k+1}_{1,i}, ∀i ∈ B_5^{k,j},
λ^{k+1}_{1,i} > λ^{k+1}_{2,i}, ∀i ∈ B_8^{k,j},
λ^{k+1}_{1,i} ≥ x^k_{1,i} ≥ x^k_{2,i} (= 0) > λ^{k+1}_{2,i}, ∀i ∈ B_9^{k,j},
λ^{k+1}_{1,i} x^k_{1,i} ≥ (x^k_{1,i})², ∀i ∈ B_{10}^{k,j},
λ^{k+1}_{2,i} x^k_{2,i} ≥ (x^k_{2,i})², ∀i ∈ B_{12}^{k,j}.   (3.26)

Hence, every summation term on the right hand side of (3.25) is nonnegative, and thus it follows that ∇f(x^k)⊤Δx^{k,j} ≤ ∇f(x^k)⊤Δx^k ≤ 0.

Lemma 3.3. Under Assumptions A1 and A2, Δx^{k,j} is a descent direction for f(x) at x^k; i.e., if Algorithm 3.1 does not terminate at iteration k, then ∇f(x^k)⊤Δx^k < 0 and ∇f(x^k)⊤Δx^{k,j} < 0 for every j and k. Moreover, the line search step in Algorithm 3.1 is well-defined; i.e., a step size satisfying (3.17) and (3.18) can always be found in Step 3.

Proof. Suppose ∇f(x^k)⊤Δx^{k,j} = 0 for some iteration k and j. By (3.26) every summation term on the right hand side of (3.25) is nonnegative. Since Δx^k ∈ S^k by (3.10), Δx^k satisfies condition (3.7). Consequently, we obtain from (3.21), (3.25) and (3.14) that:

(a1): ∇f(x^k)⊤Δx^k = 0 ⇒ (Δx^k)⊤H^kΔx^k = 0 ⇒ Δx^k = 0;
(a2): Σ_{i∈I_+^k} λ^{k+1}_{0,i} x^k_{0,i} = 0 and λ^{k+1}_{0,i} ≥ x^k_{0,i}, ∀i ∈ I_+^k ⇒ λ^{k+1}_{0,i} ≥ 0, x^k_{0,i} = 0, ∀i ∈ I_+^k;
(a3): Σ_{i∈I_−^k} (λ^{k+1}_{0,i})² = 0 ⇒ λ^{k+1}_{0,i} = 0, ∀i ∈ I_−^k;
(a4): Σ_{i∈B_1^{k,j}} (λ^{k+1}_{1,i} x^k_{1,i} + λ^{k+1}_{2,i} x^k_{2,i}) = 0 and λ^{k+1}_{1,i} ≥ x^k_{1,i}, λ^{k+1}_{2,i} ≥ x^k_{2,i}, ∀i ∈ B_1^{k,j} ⇒ λ^{k+1}_{1,i} ≥ 0, x^k_{1,i} = 0, λ^{k+1}_{2,i} ≥ 0, x^k_{2,i} = 0, ∀i ∈ B_1^{k,j};
(a5): Σ_{i∈B_2^{k,j}} (λ^{k+1}_{2,i})² = 0 and λ^{k+1}_{1,i} ≥ x^k_{1,i}, ∀i ∈ B_2^{k,j} ⇒ λ^{k+1}_{2,i} = 0, λ^{k+1}_{1,i} ≥ 0, x^k_{1,i} = 0, ∀i ∈ B_2^{k,j};
(a6): Σ_{i∈B_3^{k,j}} (λ^{k+1}_{2,i})² = 0 and λ^{k+1}_{2,i} ≤ λ^{k+1}_{1,i} < x^k_{1,i} = 0, ∀i ∈ B_3^{k,j} ⇒ B_3^{k,j} = ∅;
(a7): Σ_{i∈B_4^{k,j}} ((λ^{k+1}_{1,i})² + (1/α^{k,j})(λ^{k+1}_{2,i} − λ^{k+1}_{1,i}) x^k_{2,i}) = 0 and λ^{k+1}_{1,i} < λ^{k+1}_{2,i} < x^k_{2,i}, ∀i ∈ B_4^{k,j} ⇒ λ^{k+1}_{1,i} = 0, λ^{k+1}_{2,i} = 0, ∀i ∈ B_4^{k,j};
(a8): Σ_{i∈B_5^{k,j}} ((λ^{k+1}_{1,i})² + (1/α^{k,j})(λ^{k+1}_{2,i} − λ^{k+1}_{1,i}) x^k_{2,i}) = 0 and λ^{k+1}_{1,i} < x^k_{1,i} = 0 ≤ x^k_{2,i} ≤ λ^{k+1}_{2,i}, ∀i ∈ B_5^{k,j} ⇒ B_5^{k,j} = ∅;
(a9): Σ_{i∈B_6^{k,j}} (λ^{k+1}_{1,i})² = 0 and λ^{k+1}_{2,i} ≥ x^k_{2,i}, ∀i ∈ B_6^{k,j} ⇒ λ^{k+1}_{1,i} = 0, λ^{k+1}_{2,i} ≥ 0, x^k_{2,i} = 0, ∀i ∈ B_6^{k,j};
(a10): Σ_{i∈B_7^{k,j}} (λ^{k+1}_{1,i})² = 0 and λ^{k+1}_{1,i} ≤ λ^{k+1}_{2,i} < x^k_{2,i} = 0, ∀i ∈ B_7^{k,j} ⇒ B_7^{k,j} = ∅;
(a11): Σ_{i∈B_8^{k,j}} ((λ^{k+1}_{2,i})² + (1/α^{k,j})(λ^{k+1}_{1,i} − λ^{k+1}_{2,i}) x^k_{1,i}) = 0 and λ^{k+1}_{2,i} < λ^{k+1}_{1,i} < x^k_{1,i}, ∀i ∈ B_8^{k,j} ⇒ λ^{k+1}_{2,i} = 0, λ^{k+1}_{1,i} = 0, ∀i ∈ B_8^{k,j};
(a12): Σ_{i∈B_9^{k,j}} ((λ^{k+1}_{2,i})² + (1/α^{k,j})(λ^{k+1}_{1,i} − λ^{k+1}_{2,i}) x^k_{1,i}) = 0 and λ^{k+1}_{2,i} < x^k_{2,i} = 0 ≤ x^k_{1,i} ≤ λ^{k+1}_{1,i}, ∀i ∈ B_9^{k,j} ⇒ B_9^{k,j} = ∅;
(a13): Σ_{i∈B_{10}^{k,j}} λ^{k+1}_{1,i} x^k_{1,i} = 0 and x^k_{1,i} ≤ λ^{k+1}_{1,i}, ∀i ∈ B_{10}^{k,j} ⇒ λ^{k+1}_{1,i} ≥ 0, x^k_{1,i} = 0, ∀i ∈ B_{10}^{k,j};
(a14): Σ_{i∈B_{11}^{k,j}} (λ^{k+1}_{1,i})² = 0 ⇒ λ^{k+1}_{1,i} = 0, ∀i ∈ B_{11}^{k,j};
(a15): Σ_{i∈B_{12}^{k,j}} λ^{k+1}_{2,i} x^k_{2,i} = 0 and x^k_{2,i} ≤ λ^{k+1}_{2,i}, ∀i ∈ B_{12}^{k,j} ⇒ λ^{k+1}_{2,i} ≥ 0, x^k_{2,i} = 0, ∀i ∈ B_{12}^{k,j};
(a16): Σ_{i∈B_{13}^{k,j}} (λ^{k+1}_{2,i})² = 0 ⇒ λ^{k+1}_{2,i} = 0, ∀i ∈ B_{13}^{k,j}.

Note that from (a13) and (a15) we know B_{10}^{k,j} = B_{12}^{k,j} = ∅, since otherwise B_{10}^{k,j} ⊆ A1(x^k) ⊆ A_1^{k,j} and B_{12}^{k,j} ⊆ A2(x^k) ⊆ A_2^{k,j}, contradicting the definitions of B_{10}^{k,j} and B_{12}^{k,j}, respectively. Moreover, since A1(x^k) ⊆ A_1^{k,j} and A2(x^k) ⊆ A_2^{k,j}, we have x^k_{2,i} > x^k_{1,i} = 0 for i ∈ A_1^{k,j}\A_{12}^{k,j} and x^k_{1,i} > x^k_{2,i} = 0 for i ∈ A_2^{k,j}\A_{12}^{k,j}. On the other hand, all the multipliers in λ^{k+1} for which the corresponding primal variables are not in the working set are set to zero according to Step 2.1 of Algorithm 3.1. Summarizing all the results above, we obtain that

λ_0^{k+1} ≥ 0; (λ_0^{k+1})⊤x_0^k = 0; λ^{k+1}_{1,i}, λ^{k+1}_{2,i} ≥ 0, ∀i ∈ A_{12}^{k,j}; λ^{k+1}_{1,i} = 0, ∀i ∈ A\A_1^{k,j};
λ^{k+1}_{2,i} = 0, ∀i ∈ A\A_2^{k,j}; λ^{k+1}_{1,i} x^k_{1,i} = 0, ∀i ∈ A_1^{k,j}; λ^{k+1}_{2,i} x^k_{2,i} = 0, ∀i ∈ A_2^{k,j}.

In view of the fact that A1(x^k) ⊆ A_1^{k,j} and A2(x^k) ⊆ A_2^{k,j}, we further obtain that

λ^{k+1}_{1,i}, λ^{k+1}_{2,i} ≥ 0, ∀i ∈ A12(x^k); λ^{k+1}_{1,i} = 0, ∀i ∈ A\A1(x^k); λ^{k+1}_{2,i} = 0, ∀i ∈ A\A2(x^k).

Hence, substituting Δx^k = 0 into the first equation of (3.10), we conclude from Definition 2.4 that x^k is an S-stationary point of problem (1.6) with the unique multiplier vector (y^{k+1}, λ^{k+1}). This means that Algorithm 3.1 would have terminated in Step 2.2. Therefore, we always have ∇f(x^k)⊤Δx^{k,j} < 0 for every j if the algorithm does not stop at Step 2.2 at iteration k.

Now we consider the second part of the lemma. Suppose that at some iteration k a step size satisfying (3.17) and (3.18) cannot be found. Then, by Step 3.4 of Algorithm 3.1, the parameters ε^{k,j} are driven to zero as j increases. As a consequence, the working sets A_1^{k,j} and A_2^{k,j} eventually become the active sets A1(x^k) and A2(x^k), respectively. Therefore, the sets defined in (3.14) and the vectors ϖ^{k,j} given by (3.15) are eventually fixed. This means that the step directions Δx^{k,j} are fixed for all large enough j. Let Δx̄^k = Δx^{k,j} for such j. Clearly, it follows from (3.15) and (3.14) that Δx̄^k_{0,i} ≥ 0, ∀i ∈ I(x^k), Δx̄^k_{1,i} ≥ 0, ∀i ∈ A1(x^k) and Δx̄^k_{2,i} ≥ 0, ∀i ∈ A2(x^k). Hence, there is an ᾱ > 0 such that x^k + αΔx̄^k ≥ 0 for all α ∈ (0, ᾱ]. Moreover, since ∇f(x^k)⊤Δx̄^k < 0 as shown above and f(x) is continuously differentiable, there is an α̂ > 0 with α̂ ≤ ᾱ such that for any α ∈ (0, α̂],

f(x^k + αΔx̄^k) ≤ f(x^k) + σα∇f(x^k)⊤Δx̄^k.

Hence, violation of (3.17) and (3.18) implies that α^{k,j} > α̂ for all j large enough. On the other hand, since A_1^{k,j} = A1(x^k) and A_2^{k,j} = A2(x^k) eventually, we know from (3.13) and the definition (3.11) of γ_k^max that α^{k,j} = βα^{k,j−1} for all large enough j. This implies that the step sizes α^{k,j} tend to zero in the limit. Hence, we get a contradiction.

So far we have established that under Assumptions A1 and A2, at every iteration Algorithm 3.1 either terminates at Step 2.2 with an S-stationary point of problem (1.6), or readily generates the next iterate and continues on to the next iteration. In particular, we have shown that the Newton matrix M^k is always nonsingular, all step directions are computable and the line search is well-defined. To conclude, we have

Proposition 3.4. Under Assumptions A1 and A2, Algorithm 3.1 is well defined.

4. Global convergence to B-stationarity. In this section we show that the iterates generated by Algorithm 3.1 converge to an S-stationary solution, and hence, under the MPLCC-LICQ, to a B-stationary solution of problem (1.6). In particular, we only consider the case in which Algorithm 3.1 generates an infinite sequence of iterates, as otherwise a B-stationary point is found in a finite number of steps. The next two assumptions are used in our analysis.

A3. The level set {x ∈ ℝ^{n+2m} | f(x) ≤ f(x^0)} is bounded on the feasible region of problem (1.6).
A4. The sequence {H^k} of Hessian estimates is bounded and there exists a µ > 0 such that for all k, d⊤H^k d ≥ µ‖d‖², ∀d ∈ S^k.

Since Algorithm 3.1 is a descent algorithm and f(x) is continuously differentiable, Assumption A3 guarantees that the generated iterates are bounded and hence that an accumulation point exists. Assumption A4, which can be ensured by careful modifications to the exact Hessian, ensures that the sequence {H^k} is uniformly positive definite on the null space defined by S^k. This will be used to show that the Newton system is nonsingular in the limit.

Lemma 4.1. Under Assumptions A1, A2 and A3, there is an ε̄ > 0 such that ε^{k,0} = ε^k = ε̄ for all large enough k.

Proof. Suppose {ε^k} tends to zero as k increases. Then by Step 1.1 of Algorithm 3.1, there is an infinite index set K on which the matrix [A⊤, E(x^k, y^k, z^k; ε^k)] does not have full column rank. Since the iterates x^k are bounded by Assumption A3 and I and A are finite sets, there is an infinite index set K̄ ⊆ K such that {x^k}_{K̄} → x̄ and the sets I^k, A_1^k and A_2^k defined by (3.9) are fixed for all k ∈ K̄. Suppose they are Ī, Ā1 and Ā2, respectively. Since the x^k are all feasible, x̄ is also feasible. Since {ε^k} → 0, it follows from (3.1) that Ī ⊆ I(x̄), Ā1 ⊆ A1(x̄) and Ā2 ⊆ A2(x̄). Hence, we have found a feasible point x̄ at which the gradients of the equality constraints and of a set of active inequality and complementarity constraints in Ī, Ā1 and Ā2, respectively, are linearly dependent. This contradicts Assumption A2. Thus, ε^k = ε̄ for some ε̄ > 0 and all k large enough. This implies that the matrix [A⊤, E^k] eventually has full column rank by Step 1.1 of Algorithm 3.1, where E^k is defined in (3.20). Hence, ε^{k,0} = ε^k = ε̄ for all k large enough.

Lemma 4.2. Under Assumptions A1–A4, the sequences {x^k}, {‖(M^k)⁻¹‖}, {Δx^k, y^{k+1}, λ^{k+1}}, {z^k} and {Δx^{k,j}, y^{k+1,j}, λ^{k+1,j}} for all j are bounded.

Proof. Suppose there exists an infinite index set K such that the sequence {‖(M^k)⁻¹‖} tends to infinity as k ∈ K tends to infinity. Since the iterates x^k are bounded by Assumption A3 and the sets I and A are finite, there exists an infinite index set K̄ ⊆ K such that {x^k}_{K̄} → x̄ and the working sets I^k, A_1^k and A_2^k defined by (3.9) are fixed for all k ∈ K̄. Suppose they are Ī, Ā1 and Ā2, respectively. Moreover, the Jacobian matrix E^k defined in (3.20) and the set S^k are also fixed for all k ∈ K̄. Denote them by Ē and S̄, respectively. By Assumption A4, there is an infinite index set K̂ ⊆ K̄ such that H^k → H̄ as k ∈ K̂ → ∞. Hence,

{M^k}_{K̂} → M̄ = [ H̄  A⊤  −Ē ;  A  0  0 ;  Ē⊤  0  0 ].   (4.1)

By Assumption A4, we have d⊤H̄d ≥ µ‖d‖² for all d ∈ S̄, i.e., for all d ∈ ℝ^{n+2m} such that Ad = 0 and Ē⊤d = 0. Since the columns of [A⊤, Ē] are independent by arguments similar to those used in the proof of Lemma 3.1, we know that M̄ is nonsingular. Hence, it follows that {(M^k)⁻¹}_{K̂} → M̄⁻¹. This contradicts the assumption that {‖(M^k)⁻¹‖}_K → ∞. Hence, the sequence {M^k} is uniformly nonsingular.

The boundedness of {x^k} implies that the right hand side of (3.10) is bounded. Hence, the solution sequence {Δx^k, y^{k+1}, λ^{k+1}} is bounded. Moreover, we have from (3.13) that for any j, x^k_{1,i}/α^{k,j} ≤ γ_k^max, ∀i ∈ A_1^{k,j} and x^k_{2,i}/α^{k,j} ≤ γ_k^max, ∀i ∈ A_2^{k,j}, where {γ_k^max} is bounded because of the boundedness of {x^k} and (3.11). Hence, we know from (3.15) that {ϖ^{k,j}} is bounded, and thus the right hand side of (3.16) is bounded for all j. This together with the uniform nonsingularity of {M^k} implies that the sequence {Δx^{k,j}, y^{k+1,j}, λ^{k+1,j}} is bounded for all j.

By the second equation in (3.19), {z_3^k} is bounded above by z3^max > 0. Hence, we have from the boundedness of {λ^k} and {x^k} and (3.19) that {z^k} is bounded as well.

Lemma 4.3. Suppose Assumptions A1–A4 hold. If there is an infinite index set K associated with a sequence of indices {j_k}_K such that min{‖R(x^k, y^k, z^k)‖^η, ε^{k,j_k}} ≥ ε̂ for some ε̂ > 0 and all k ∈ K, and {∇f(x^k)⊤Δx^{k,j_k}} → 0 as k ∈ K → ∞, then any limit point of the sequence {(x^k, y^{k+1}, λ^{k+1})}_K satisfies the S-stationarity conditions for problem (1.6).

Proof. First note that {(x^k, y^{k+1}, λ^{k+1})}_K is bounded by Assumption A3 and Lemma 4.2. Assume to the contrary that for some infinite index set K̄ ⊆ K, {(x^k, y^{k+1}, λ^{k+1})}_{K̄} → (x̄, ȳ, λ̄), but (x̄, ȳ, λ̄) does not satisfy the S-stationarity conditions for problem (1.6). Due to the boundedness of {Δx^k} by Lemma 4.2 and the boundedness of {H^k} by Assumption A4, there exists an infinite index set K̂ ⊆ K̄ such that {H^k} → H̄ and {Δx^k} → Δx̄ as k ∈ K̂ → ∞. Since the sets I and A are finite, there exists an infinite index set K̃ ⊆ K̂ such that I^k, I_−^k, I_+^k, A_1^k, A_2^k, A_1^{k,j_k}, A_2^{k,j_k}, A_{12}^{k,j_k} and B_l^{k,j_k} (l = 1,...,13) are all fixed for all k ∈ K̃. Suppose they are Ī, Ī−, Ī+, Ā1, Ā2, Â1, Â2, Â12 and B̄l (l = 1,...,13), respectively.

From (3.10) we have Δx^k ∈ S^k. Hence, by Assumption A4 and Lemma 3.2, it follows that ∇f(x^k)⊤Δx^{k,j_k} ≤ −µ‖Δx^k‖² for all k. By (3.26) the summation terms on the right hand side of (3.25) are all nonnegative. Since {∇f(x^k)⊤Δx^{k,j_k}}_{K̃} → 0, similarly to the analysis in Lemma 3.3, letting k ∈ K̃ → ∞ in (3.25) and using the definitions of B̄l (l = 1,...,13) given in (3.14) yields:

(b1): Δx̄ = 0;
(b2): λ̄_{0,i} ≥ x̄_{0,i} = 0, ∀i ∈ Ī+;
(b3): λ̄_{0,i} = 0, ∀i ∈ Ī−;
(b4): λ̄_{1,i} ≥ x̄_{1,i} = 0 and λ̄_{2,i} ≥ x̄_{2,i} = 0, ∀i ∈ B̄1;
(b5): λ̄_{2,i} = 0 and λ̄_{1,i} ≥ x̄_{1,i} = 0, ∀i ∈ B̄2;
(b6): x̄_{1,i} = 0 and λ̄_{1,i} ≥ λ̄_{2,i} = 0, ∀i ∈ B̄3;
(b7): λ̄_{1,i} = 0 and x̄_{2,i} ≥ λ̄_{2,i} = 0, ∀i ∈ B̄4;
(b8): λ̄_{1,i} = 0 and λ̄_{2,i} ≥ x̄_{2,i} = 0, ∀i ∈ B̄5;
(b9): λ̄_{1,i} = 0 and λ̄_{2,i} ≥ x̄_{2,i} = 0, ∀i ∈ B̄6;
(b10): x̄_{2,i} = 0 and λ̄_{2,i} ≥ λ̄_{1,i} = 0, ∀i ∈ B̄7;
(b11): λ̄_{2,i} = 0 and x̄_{1,i} ≥ λ̄_{1,i} = 0, ∀i ∈ B̄8;
(b12): λ̄_{2,i} = 0 and λ̄_{1,i} ≥ x̄_{1,i} = 0, ∀i ∈ B̄9;
(b13): λ̄_{1,i} ≥ x̄_{1,i} = 0, ∀i ∈ B̄10;
(b14): x̄_{1,i} ≥ λ̄_{1,i} = 0, ∀i ∈ B̄11;
(b15): λ̄_{2,i} ≥ x̄_{2,i} = 0, ∀i ∈ B̄12;
(b16): x̄_{2,i} ≥ λ̄_{2,i} = 0, ∀i ∈ B̄13.

By Step 2.1 of Algorithm 3.1, λ̄_{0,i} = 0 for i ∈ I\Ī. This together with (b2) and (b3) implies

λ̄0 ≥ 0,  (λ̄0)⊤x̄0 = 0.   (4.2)

From (b4)–(b12) we have

λ̄_{1,i}, λ̄_{2,i} ≥ 0,  λ̄_{1,i} x̄_{1,i} = 0,  λ̄_{2,i} x̄_{2,i} = 0, ∀i ∈ Â12.   (4.3)

Since min{‖R(x^k, y^k, z^k)‖^η, ε^{k,j_k}} ≥ ε̂ for all k ∈ K̃, it follows that x̄_{1,i} ≥ ε̂ for all i ∈ A\Â1, and hence for all i ∈ Ā1\Â1, and x̄_{2,i} ≥ ε̂ for all i ∈ A\Â2, and hence for all i ∈ Ā2\Â2. Since B̄10 ⊆ Ā1\Â1 and B̄12 ⊆ Ā2\Â2, we have from (b13) and (b15) that B̄10 = B̄12 = ∅. Hence, (b14) and (b16) imply that λ̄_{1,i} = 0 for i ∈ Ā1\Â1 and λ̄_{2,i} = 0 for i ∈ Ā2\Â2. Moreover, by Step 2.1 of Algorithm 3.1, λ̄_{1,i} = 0 for i ∈ A\Ā1 and λ̄_{2,i} = 0 for i ∈ A\Ā2. Hence,

λ̄_{1,i} = 0, ∀i ∈ A\Â1;  λ̄_{2,i} = 0, ∀i ∈ A\Â2.   (4.4)

Since min{‖R(x^k, y^k, z^k)‖^η, ε^{k,j_k}} ≥ ε̂, we have from (3.1) that x̄_{2,i} > ε̂ for i ∈ Â1\Â12 and x̄_{1,i} > ε̂ for i ∈ Â2\Â12. Moreover, since Algorithm 3.1 maintains the feasibility of every iterate, it follows that x̄_{1,i} x̄_{2,i} = 0 for every i ∈ A, and hence that x̄_{1,i} = 0 for i ∈ Â1\Â12 and x̄_{2,i} = 0 for i ∈ Â2\Â12. This together with (4.3) implies

λ̄_{1,i} x̄_{1,i} = 0, ∀i ∈ Â1;  λ̄_{2,i} x̄_{2,i} = 0, ∀i ∈ Â2.   (4.5)

Since A1(x̄) ⊆ Â1 and A2(x̄) ⊆ Â2, we obtain from (4.2)–(4.5) that

λ̄_{1,i}, λ̄_{2,i} ≥ 0, ∀i ∈ A12(x̄);  λ̄_{1,i} = 0, ∀i ∈ A\A1(x̄);  λ̄_{2,i} = 0, ∀i ∈ A\A2(x̄).   (4.6)

Finally, since Δx̄ = 0, letting k ∈ K̃ → ∞ in the first equation of (3.10) yields ∇_x L(x̄, ȳ, λ̄) = 0. Combining this with (4.2) and (4.6) shows that (x̄, ȳ, λ̄) satisfies S-stationarity. This is a contradiction.

Lemma 4.4. For any infinite index set K and any associated index sequence {j_k}_K, {ε^{k,j_k}}_K → 0 if and only if {α^{k,j_k}}_K → 0.

Proof. We have from (3.13) that α^{k,j} ≥ β^j for any j. Hence, if {α^{k,j_k}}_K → 0, it follows that {j_k}_K → ∞. Since ε^{k,j} = ν^j ε^{k,0} for any j, we have {ε^{k,j_k}}_K → 0.

Conversely, suppose {ε^{k,j_k}}_K → 0. Since ε^{k,0} = ε̄ > 0 for all k large enough by Lemma 4.1 and ε^{k,j} = ν^j ε^{k,0} for all j, it follows that {j_k}_K → ∞. By (3.11) we have γ^{max}_k ≥ γ > 1 for every k. By the definitions of A^{k,j}_1 and A^{k,j}_2, it follows that for any j, x^k_{1,i} ≤ ε^{k,j}, ∀i ∈ A^{k,j}_1, and x^k_{2,i} ≤ ε^{k,j}, ∀i ∈ A^{k,j}_2. Hence, ε^{k,j} ≥ x^k_{1,i}/γ^{max}_k, ∀i ∈ A^{k,j}_1, and ε^{k,j} ≥ x^k_{2,i}/γ^{max}_k, ∀i ∈ A^{k,j}_2. This together with (3.13) implies that α^{k,j} ≤ max{βα^{k,j−1}, ε^{k,j}}. Hence, it follows that

    max{ε^{k,j}, α^{k,j}} ≤ max{βα^{k,j−1}, ε^{k,j}}
                          = max{βα^{k,j−1}, νε^{k,j−1}}
                          ≤ max{β, ν} max{α^{k,j−1}, ε^{k,j−1}}
                          ≤ max{β^j, ν^j} max{α^{k,0}, ε^{k,0}}.

This implies {α^{k,j_k}}_K → 0, since ν, β ∈ (0, 1) and {j_k}_K → ∞.
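The mechanism of this proof is a purely geometric coupling of the two inner-loop sequences. A toy Python sketch of the recursions it uses (parameter values hypothetical; the actual update (3.13) also involves the fraction-to-the-boundary quantities of Algorithm 3.1) shows both quantities decaying together:

    # Toy illustration of Lemma 4.4: the bound max{eps, alpha} <= max(beta, nu)**j
    # drives eps^{k,j} and alpha^{k,j} to zero together. nu, beta, eps0 are
    # hypothetical parameter values with nu, beta in (0, 1).
    nu, beta, eps0 = 0.5, 0.6, 0.1
    eps, alpha = eps0, 1.0
    for j in range(1, 20):
        eps = nu * eps                      # eps^{k,j} = nu^j * eps^{k,0}
        alpha = max(beta * alpha, eps)      # upper-bound recursion from the proof
        assert max(eps, alpha) <= max(beta, nu) ** j + 1e-12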

Lemma 4.5. Suppose Assumptions A1-A4 hold. If there is an infinite index set K such that ‖R(x^k, y^k, z^k)‖^η ≥ ε for some ε > 0 and all k ∈ K, then there exists an ᾱ > 0 such that for all k ∈ K large enough, all j and all α ∈ (0, ᾱ]: (i) x^k_0 + αΔx^{k,j}_0 ≥ 0; (ii) x^k_{1,i} + αΔx^{k,j}_{1,i} ≥ 0, ∀i ∈ A\(B^{k,j}_8 ∪ B^{k,j}_9); and (iii) x^k_{2,i} + αΔx^{k,j}_{2,i} ≥ 0, ∀i ∈ A\(B^{k,j}_4 ∪ B^{k,j}_5).

Proof. Let ε̄ be defined as in Lemma 4.1. From the definitions of I^k, A^k_1 and A^k_2, we have that for all k ∈ K large enough,

    x^k_{0,i} > min{ε, ε̄}, ∀i ∈ I\I^k;  x^k_{1,i} > min{ε, ε̄}, ∀i ∈ A\A^k_1;  x^k_{2,i} > min{ε, ε̄}, ∀i ∈ A\A^k_2.

Since {Δx^{k,j}}_K is bounded for all j by Lemma 4.2, there is an ᾱ > 0 such that for all k ∈ K large enough, all j and all α ∈ (0, ᾱ],

    x^k_{0,i} + αΔx^{k,j}_{0,i} ≥ 0, ∀i ∈ I\I^k;  x^k_{1,i} + αΔx^{k,j}_{1,i} ≥ 0, ∀i ∈ A\A^k_1;  x^k_{2,i} + αΔx^{k,j}_{2,i} ≥ 0, ∀i ∈ A\A^k_2.

On the other hand, it follows from (3.16) that for any j, Δx^{k,j}_{0,i} = ϖ^{k,j}_{0,i}, ∀i ∈ I^k, Δx^{k,j}_{1,i} = ϖ^{k,j}_{1,i}, ∀i ∈ A^k_1, and Δx^{k,j}_{2,i} = ϖ^{k,j}_{2,i}, ∀i ∈ A^k_2. Hence, we have from (3.15) that for any j:

    (c1): Δx^{k,j}_{0,i} = −x^k_{0,i}, ∀i ∈ I^k_+;
    (c2): Δx^{k,j}_{0,i} = −λ^{k+1}_{0,i} ≥ −x^k_{0,i}, ∀i ∈ I^k_−;
    (c3): Δx^{k,j}_{1,i} = −x^k_{1,i}, ∀i ∈ (A^{k,j}_1\A^{k,j}_{12}) ∪ B^{k,j}_1 ∪ B^{k,j}_2 ∪ B^{k,j}_3 ∪ B^{k,j}_{10};
    (c4): Δx^{k,j}_{2,i} = −x^k_{2,i}, ∀i ∈ (A^{k,j}_2\A^{k,j}_{12}) ∪ B^{k,j}_1 ∪ B^{k,j}_6 ∪ B^{k,j}_7 ∪ B^{k,j}_{12};
    (c5): Δx^{k,j}_{1,i} = −λ^{k+1}_{1,i} > −x^k_{1,i}, ∀i ∈ B^{k,j}_6 ∪ B^{k,j}_7 ∪ B^{k,j}_{11};
    (c6): Δx^{k,j}_{2,i} = −λ^{k+1}_{2,i} > −x^k_{2,i}, ∀i ∈ B^{k,j}_2 ∪ B^{k,j}_3 ∪ B^{k,j}_{13};
    (c7): Δx^{k,j}_{1,i} = −λ^{k+1}_{1,i} + x^k_{2,i}/α^{k,j} ≥ −λ^{k+1}_{1,i} > −x^k_{1,i}, ∀i ∈ B^{k,j}_4 ∪ B^{k,j}_5;
    (c8): Δx^{k,j}_{2,i} = −λ^{k+1}_{2,i} + x^k_{1,i}/α^{k,j} ≥ −λ^{k+1}_{2,i} > −x^k_{2,i}, ∀i ∈ B^{k,j}_8 ∪ B^{k,j}_9.

Now the result follows immediately.

Lemma 4.6. Suppose Assumptions A1-A4 hold. If there is an infinite index set K such that (i) ‖R(x^k, y^k, z^k)‖^η ≥ ε for some ε > 0 and all k ∈ K, and (ii) {ε^{k,j_k}} → 0 as k ∈ K → ∞, where j_k is the index for which conditions (3.17) and (3.18) in Step 3.4 of Algorithm 3.1 are satisfied at iteration k, then any limit point of the sequence {(x^k, y^{k+1}, λ^{k+1})}_K satisfies the S-stationarity conditions for problem (1.6).

Proof. First note that {(x^k, y^{k+1}, λ^{k+1})}_K is bounded by Lemma 4.2. Since {ε^{k,j_k}}_K → 0, we have {j_k}_K → ∞ and, from Lemma 4.4, {α^{k,j_k}}_K → 0. Suppose there exists an infinite index set K̄ ⊆ K such that {(x^k, y^{k+1}, λ^{k+1})}_{K̄} → (x̄, ȳ, λ̄), but (x̄, ȳ, λ̄) does not satisfy the S-stationarity conditions for problem (1.6). Let

    τ = (1/2) min{ε, ε̄, min{x̄_{1,i}, i ∈ A\A_1(x̄)}, min{x̄_{2,i}, i ∈ A\A_2(x̄)}},

where ε̄ is given by Lemma 4.1. It is obvious that τ > 0. Let ĵ_k be the first index of the sequence j = 0, 1, ... for which ε^{k,j} ≤ τ. Clearly, ĵ_k < j_k for all k ∈ K̄ large enough, as {j_k}_{K̄} → ∞ and ε^{k,ĵ_k} = νε^{k,ĵ_k−1} > ντ, since ε^{k,ĵ_k−1} > τ. Therefore, we have τ ≥ ε^{k,ĵ_k} = ν^{ĵ_k} ε̄ > ντ, as ε^{k,0} = ε̄ > τ by Lemma 4.1. Clearly, by (3.13) there is an index j̄ such that ĵ_k ≤ j̄ and α^{k,ĵ_k} ≥ β^{j̄} for all k ∈ K̄. Also, since ε > τ ≥ ε^{k,ĵ_k}, it follows that A^{k,ĵ_k}_1 = A_1(x̄), A^{k,ĵ_k}_2 = A_2(x̄) and A^{k,ĵ_k}_{12} = A_{12}(x̄) for all k ∈ K̄ large enough. Since (x̄, ȳ, λ̄) does not satisfy the S-stationarity conditions and min{‖R(x^k, y^k, z^k)‖^η, ε^{k,ĵ_k}} ≥ ντ, we conclude from Lemma 4.3 that there is a ρ > 0 such that ∇f(x^k)^T Δx^{k,ĵ_k} ≤ −ρ for all k ∈ K̄.

Since {Δx^{k,ĵ_k}} is bounded by Lemma 4.2 and I and A are finite sets, there exists an infinite index set K̂ ⊆ K̄ such that {Δx^{k,ĵ_k}} → Δx̂ and I^k_−, I^k_+, A^k_1, A^k_2, A^{k,ĵ_k}_1, A^{k,ĵ_k}_2, A^{k,ĵ_k}_{12} and B^{k,ĵ_k}_l (l = 1, ..., 13) are all fixed on K̂. Suppose they are Ī_−, Ī_+, Ā_1, Ā_2, Ã_1, Ã_2, Ã_{12} and B̃_l (l = 1, ..., 13), respectively. Clearly, we have ∇f(x̄)^T Δx̂ ≤ −ρ. Let σ̄ ∈ (σ, 1). Due to the continuous differentiability of f(x), by standard arguments used for the Armijo condition (3.18), one can show that for all small enough α > 0,

    f(x̄ + αΔx̂) ≤ f(x̄) + σ̄α∇f(x̄)^T Δx̂ < f(x̄) + σα∇f(x̄)^T Δx̂ − α(σ̄ − σ)ρ.        (4.7)

Let α̂ ∈ (0, min{ᾱ, β^{j̄}}), where ᾱ is given by Lemma 4.5, be such that inequality (4.7) holds for all α ∈ [βα̂, α̂]. Since {α^{k,j_k}}_{K̄} → 0, it follows that α^{k,j_k} < βα̂ for all k ∈ K̂ large enough. By (3.13), there is therefore an index j̃_k for every large enough k ∈ K̂ such that α^{k,j̃_k} belongs to the interval [βα̂, α̂] and the line search conditions (3.17) and (3.18) fail to hold for this j̃_k. Since α^{k,ĵ_k} ≥ β^{j̄} > α̂ ≥ α^{k,j̃_k} ≥ β^{j̃_k}, it follows that ĵ_k ≤ j̄ ≤ j̃_k. Since {α^{k,j̃_k}}_{K̂} is bounded below by βα̂, we obtain from Lemma 4.4 that {ε^{k,j̃_k}}_{K̂} is bounded below by a strictly positive scalar. This implies that there is an index j̆ such that j̃_k ≤ j̆, and thus ε^{k,j̃_k} ≥ ν^{j̆} ε̄, for all k ∈ K̂ large enough. Moreover, since ε^{k,j̃_k} ≤ ν^{j̄} ε̄ ≤ ν^{ĵ_k} ε̄ = ε^{k,ĵ_k} ≤ τ, we have from the definition of τ that A^{k,j̃_k}_1 = Ã_1 = A_1(x̄), A^{k,j̃_k}_2 = Ã_2 = A_2(x̄) and A^{k,j̃_k}_{12} = Ã_{12} = A_{12}(x̄) for all k ∈ K̂ large enough. Hence, in view of (3.14), we know B^{k,j̃_k}_l = B^{k,ĵ_k}_l = B̃_l (l = 1, ..., 13) for all k ∈ K̂ large enough. On the other hand, according to (3.15), the elements of ϖ^{k,ĵ_k} and ϖ^{k,j̃_k} are identical except for those belonging to B̃_l, l = 4, 5, 8, 9. Since B̃_l ⊆ Ã_{12} = A_{12}(x̄) (l = 4, 5, 8, 9) and the sequences {α^{k,ĵ_k}} and {α^{k,j̃_k}} are bounded below by βα̂ on K̂, we obtain from (3.15) that, as k ∈ K̂ → ∞,

    lim ϖ^{k,ĵ_k}_{1,i} = lim ϖ^{k,j̃_k}_{1,i} = −λ̄_{1,i},  lim ϖ^{k,ĵ_k}_{2,i} = lim ϖ^{k,j̃_k}_{2,i} = 0,  ∀i ∈ B̃_4 ∪ B̃_5;
    lim ϖ^{k,ĵ_k}_{2,i} = lim ϖ^{k,j̃_k}_{2,i} = −λ̄_{2,i},  lim ϖ^{k,ĵ_k}_{1,i} = lim ϖ^{k,j̃_k}_{1,i} = 0,  ∀i ∈ B̃_8 ∪ B̃_9.

Therefore, we have proved that lim ϖ^{k,ĵ_k} = lim ϖ^{k,j̃_k} as k ∈ K̂ → ∞. This together with the linear system (3.16) confirms {Δx^{k,j̃_k}}_{K̂} → Δx̂.

From (3.16) and (3.15) we have

    Δx^{k,j̃_k}_{2,i} = ϖ^{k,j̃_k}_{2,i} = −x^k_{2,i}/α^{k,j̃_k}, ∀i ∈ B^{k,j̃_k}_4 ∪ B^{k,j̃_k}_5,
    Δx^{k,j̃_k}_{1,i} = ϖ^{k,j̃_k}_{1,i} = −x^k_{1,i}/α^{k,j̃_k}, ∀i ∈ B^{k,j̃_k}_8 ∪ B^{k,j̃_k}_9.

Therefore, it follows that

    x^k_{2,i} + α^{k,j̃_k} Δx^{k,j̃_k}_{2,i} = 0, ∀i ∈ B^{k,j̃_k}_4 ∪ B^{k,j̃_k}_5;  x^k_{1,i} + α^{k,j̃_k} Δx^{k,j̃_k}_{1,i} = 0, ∀i ∈ B^{k,j̃_k}_8 ∪ B^{k,j̃_k}_9.

Since α^{k,j̃_k} ∈ [βα̂, α̂] for all k ∈ K̂ large enough, there exists an infinite index set K̃ ⊆ K̂ such that {α^{k,j̃_k}}_{K̃} → α̃ for some α̃ ∈ [βα̂, α̂]. Then, since α^{k,j̃_k} ≤ α̂ ≤ ᾱ for k ∈ K̃, we obtain from Lemma 4.5 and the identities above that x^k + α^{k,j̃_k} Δx^{k,j̃_k} ≥ 0 for all k ∈ K̃ large enough. On the other hand, by the continuity of f(x) and ∇f(x) and the facts that {α^{k,j̃_k}}_{K̃} → α̃ ∈ [βα̂, α̂] and {Δx^{k,j̃_k}}_{K̃} → Δx̂, we have from (4.7) that for all k ∈ K̃ large enough,

    f(x^k + α^{k,j̃_k}Δx^{k,j̃_k}) − f(x^k) − σα^{k,j̃_k}∇f(x^k)^TΔx^{k,j̃_k}
      = f(x^k + α^{k,j̃_k}Δx^{k,j̃_k}) − f(x̄ + α̃Δx̂) − f(x^k) + f(x̄)
          − σ(α^{k,j̃_k}∇f(x^k)^TΔx^{k,j̃_k} − α̃∇f(x̄)^TΔx̂)
          + f(x̄ + α̃Δx̂) − f(x̄) − σα̃∇f(x̄)^TΔx̂
      = f(x̄ + α̃Δx̂) − f(x̄) − σα̃∇f(x̄)^TΔx̂ + O(‖(x^k − x̄, Δx^{k,j̃_k} − Δx̂, α^{k,j̃_k} − α̃)‖)
      < −α̃(σ̄ − σ)ρ + O(‖(x^k − x̄, Δx^{k,j̃_k} − Δx̂, α^{k,j̃_k} − α̃)‖)
      ≤ 0.                                                              (4.8)

Consequently, we have proved that conditions (3.17) and (3.18) hold with index j̃_k for all k ∈ K̃ large enough. Therefore, from Step 3.4 of Algorithm 3.1, we know that j_k ≤ j̃_k for all k ∈ K̃ large enough. Thus, the sequence {ε^{k,j_k}}_{K̃} is bounded away from zero, as ε^{k,j_k} ≥ ε^{k,j̃_k} ≥ ν^{j̆} ε̄ for all k ∈ K̃ large enough. This contradicts the assumption that {ε^{k,j_k}}_K → 0, where K ⊇ K̃.

Lemma 4.7. Suppose Assumptions A1-A4 hold. If there is an infinite index set K such that {‖R(x^k, y^k, z^k)‖} → 0 as k ∈ K → ∞, then any limit points of the sequences {(x^k, y^k, z^k)}_K and {(x^k, y^k, λ^k)}_K satisfy the KKT conditions for problem (2.5) and the S-stationarity conditions for problem (1.6), respectively.

Proof. First, by Lemma 4.2, the sequence {x^k, y^k, z^k, λ^k} is uniformly bounded. Suppose that for some infinite index set K̄ ⊆ K, {x^k, y^k, z^k, λ^k} → (x̄, ȳ, z̄, λ̄) as k ∈ K̄ → ∞. Clearly, (x̄, ȳ, z̄) satisfies the KKT conditions (2.6), since ‖R(·)‖ is the KKT error for problem (2.5). On the other hand, it follows from (3.19) that λ̄_0 = z̄_0, z̄_1 = λ̄_1 + z̄_3 x̄_2 and z̄_2 = λ̄_2 + z̄_3 x̄_1. Hence, we obtain from Proposition 2.8 that x̄ is an S-stationary point of problem (1.6) with multipliers ȳ and λ̄.
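The relations from (3.19) invoked at the end of this proof amount to a direct algebraic recovery of the MPLCC multipliers from the KKT multipliers z of problem (2.5). A minimal Python sketch (function and argument names are hypothetical; only the relations λ_0 = z_0, z_1 = λ_1 + z_3 x_2 and z_2 = λ_2 + z_3 x_1 come from the text):

    # Recover the MPLCC multipliers (lam0, lam1, lam2) from the KKT
    # multipliers (z0, z1, z2, z3) of problem (2.5), inverting the
    # relations lam0 = z0, z1 = lam1 + z3*x2, z2 = lam2 + z3*x1.
    def recover_multipliers(z0, z1, z2, z3, x1, x2):
        lam0 = z0
        lam1 = z1 - z3 * x2
        lam2 = z2 - z3 * x1
        return lam0, lam1, lam2

The same formulas work componentwise for numpy arrays, which is how an implementation would likely apply them.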

Now we are ready to state the global convergence properties of Algorithm 3.1. We ignore the case in which Algorithm 3.1 stops at an S-stationary point after a finite number of iterations.

Theorem 4.8. Suppose Assumptions A1-A4 hold and Algorithm 3.1 generates an infinite sequence of iterates. Then any limit point of either the sequence {x^k, y^k, λ^k} or the sequence {x^k, y^{k+1}, λ^{k+1}} satisfies the S-stationarity conditions for problem (1.6).

Proof. First note that the sequence {x^k, y^k, z^k, λ^k} is uniformly bounded by Lemma 4.2. Consider any infinite index set K on which {x^k} → x̄, {(y^k, λ^k)} → (ȳ, λ̄) and {(y^{k+1}, λ^{k+1})} → (ŷ, λ̂) as k ∈ K → ∞. There are two cases to consider.

Case 1. There is an infinite index set K̄ ⊆ K such that ‖R(x^k, y^k, z^k)‖^η ≥ ε for some ε > 0 and all k ∈ K̄ large enough. There are two subcases.

Case 1.1. There is an infinite index set K̂ ⊆ K̄ such that ε^{k,j_k} ≥ ε̃ for some ε̃ > 0 and all k ∈ K̂ large enough. Then it follows from Lemma 4.4 that α^{k,j_k} ≥ α̃ for some α̃ > 0 and all k ∈ K̂ large enough. Assumption A3 and the continuous differentiability of f(x) imply that f(x) is bounded below on the feasible region of problem (1.6). Therefore, we know from the Armijo condition (3.18) that {∇f(x^k)^TΔx^{k,j_k}} → 0 as k ∈ K̂ → ∞. Hence, it follows from Lemma 4.3 that (x̄, ŷ, λ̂) satisfies the S-stationarity conditions for problem (1.6).

Case 1.2. There is an infinite index set K̂ ⊆ K̄ such that {ε^{k,j_k}} → 0 as k ∈ K̂ → ∞. Then it follows from Lemma 4.6 that (x̄, ŷ, λ̂) satisfies the S-stationarity conditions for problem (1.6).

Case 2. There is an infinite index set K̄ ⊆ K such that {‖R(x^k, y^k, z^k)‖} → 0 as k ∈ K̄ → ∞. We know from Lemma 4.7 that (x̄, ȳ, λ̄) satisfies the S-stationarity conditions for problem (1.6).
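In an implementation, the trichotomy in this proof corresponds to what the backtracking loop of Steps 3.2-3.4 actually monitors: feasibility of the trial point (3.17) and the Armijo decrease (3.18). The following schematic Python line search is a hedged sketch in that spirit, not the paper's exact Steps 3.2-3.4; all argument names are hypothetical stand-ins.

    # Schematic backtracking in the spirit of Steps 3.2-3.4: accept the first
    # trial index j whose point is feasible (cf. (3.17)) and achieves the
    # Armijo decrease (3.18).
    def line_search(f, grad_f, feasible, x, trials, sigma):
        """trials yields (j, alpha, dx) triples; dx may change with j, just as
        the modified steps Delta x^{k,j} do in Algorithm 3.1."""
        for j, alpha, dx in trials:
            x_trial = x + alpha * dx
            if feasible(x_trial) and \
               f(x_trial) <= f(x) + sigma * alpha * (grad_f(x) @ dx):
                return j, alpha, x_trial
        raise RuntimeError("no trial step satisfied (3.17)-(3.18)")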

5. Quadratic local convergence. In this section, we assume that Algorithm 3.1 generates an infinite sequence of iterates and that for some infinite index set K, {x^k} → x∗, {(y^k, λ^k)} → (ȳ, λ̄) and {(y^{k+1}, λ^{k+1})} → (ŷ, λ̂) as k ∈ K → ∞. By Theorem 4.8, x∗ is an S-stationary point of problem (1.6). Under Assumption A2, there is a unique multiplier vector (y∗, λ∗) associated with x∗. Again we have from Theorem 4.8 that either (ȳ, λ̄) = (y∗, λ∗) or (ŷ, λ̂) = (y∗, λ∗). To establish the fast local convergence of Algorithm 3.1, we need the following assumptions in addition to A1-A4.

A5. The MPLCC-SOSC holds at x∗ – see Definition 2.6.
A6. Strict complementarity holds at (x∗, y∗, λ∗) – see Definition 2.7.

Assumptions A5 and A6 are usually required to prove fast local convergence of NLP based algorithms. In our case Assumption A5 has two purposes. First, it allows the KKT error ‖R(x, y, z)‖ to properly measure the distance from a nearby point to the solution. Second, it guarantees that the exact Hessian can eventually be used by Algorithm 3.1 without any modification. Moreover, in order for the final active set to be correctly identified using the error bound ‖R(x, y, z)‖^η, we need the parameter z^{max}_3 in Algorithm 3.1 to be chosen large enough so that

    z^{max}_3 ≥ z∗_3 = max{ 0, −λ∗_{1,i}/x∗_{2,i}, i ∈ A_1(x∗)\A_{12}(x∗); −λ∗_{2,i}/x∗_{1,i}, i ∈ A_2(x∗)\A_{12}(x∗) }.        (5.1)
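Read this way, z∗_3 is the smallest nonnegative scalar that makes the shifted multipliers z∗_1 = λ∗_1 + z∗_3 x∗_2 and z∗_2 = λ∗_2 + z∗_3 x∗_1 nonnegative on the singly active index sets. A hypothetical Python sketch of that computation (index sets and array names are assumptions, not taken from the paper):

    # Hypothetical computation of z3_star in (5.1): the smallest nonnegative
    # shift making lam1 + z3*x2 >= 0 on A1(x*)\A12(x*) and lam2 + z3*x1 >= 0
    # on A2(x*)\A12(x*). A1_only and A2_only are assumed index lists.
    def z3_star(lam1, lam2, x1, x2, A1_only, A2_only):
        ratios = [-lam1[i] / x2[i] for i in A1_only]
        ratios += [-lam2[i] / x1[i] for i in A2_only]
        return max([0.0] + ratios)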

Similarly to the NLP case, a first-order solution of problem (1.6) is a strict local minimizer under the MPLCC-SOSC and strict complementarity. This result can be proved in a straightforward way using the stability results in [29]. The next result is an easy consequence of the strict local optimality of x∗.

Lemma 5.1. Under Assumptions A1-A6, x∗ is an isolated accumulation point of the sequence {x^k} generated by Algorithm 3.1.

Proof. First, it follows from Theorem 7 in [30] that under Assumptions A5 and A6, x∗ is a strict local minimizer of problem (1.6). This implies that there is a small neighborhood of x∗ in which there are no other first-order solutions to problem (1.6). Since by Theorem 4.8 any accumulation point of {x^k} is a first-order solution to problem (1.6), the result follows immediately.

The original version of the next result is from [19]. Here we cite a slightly different version given by Proposition 5.4 in [18].

Proposition 5.2. Assume that ω∗ ∈ ℝ^t is an isolated accumulation point of a sequence {ω^k} ⊂ ℝ^t such that for every subsequence {ω^k}_K converging to ω∗, there is an infinite index set K̄ ⊆ K such that {‖ω^{k+1} − ω^k‖}_{K̄} → 0. Then the whole sequence {ω^k} converges to ω∗.

Let z∗_1 = λ∗_1 + z∗_3 x∗_2 and z∗_2 = λ∗_2 + z∗_3 x∗_1. Clearly, x∗ is a KKT point of problem (2.5) and (y∗, z∗) is an associated multiplier vector. The next result shows that the sequence {(x^k, y^k, z^k, λ^k)} converges to (x∗, y∗, z∗, λ∗).

Lemma 5.3. Suppose Assumptions A1-A6 and (5.1) hold. If there exists an infinite index set K̄ ⊆ K such that I(x∗) ⊆ I^k, A_1(x∗) ⊆ A^k_1 and A_2(x∗) ⊆ A^k_2 for all k ∈ K̄, then the entire sequence {(x^k, y^k, z^k, λ^k)} converges to (x∗, y∗, z∗, λ∗).

Proof. Since {H^k} is bounded by Assumption A4 and I and A are finite sets, there exists an infinite index set K̂ ⊆ K̄ such that {H^k}_{K̂} → H̄ and I^k, I^k_−, I^k_+, A^k_1, A^k_2, A^{k,j_k}_1, A^{k,j_k}_2, A^{k,j_k}_{12} and B^{k,j_k}_l (l = 1, ..., 13) are all fixed on K̂. Suppose they are Ī, Ī_−, Ī_+, Ā_1, Ā_2, Ã_1, Ã_2, Ã_{12} and B̃_l (l = 1, ..., 13), respectively. Let Ē = [e_{0,i}, i ∈ Ī; e_{1,i}, i ∈ Ā_1; e_{2,i}, i ∈ Ā_2]. Consider the following linear system in (d_x, d_y, d_λ):

    H̄ d_x + A^T d_y − Ē d_λ = −∇f(x∗),
    A d_x = 0,                                                          (5.2)
    Ē^T d_x = 0.

Since I(x∗) ⊆ I^k, A_1(x∗) ⊆ A^k_1 and A_2(x∗) ⊆ A^k_2 for all k ∈ K̄, it follows from (2.2), (2.3) and (2.4) that (0, y∗, Ē^T λ∗) is a solution of (5.2). On the other hand, by Step 1.1 of Algorithm 3.1 and Assumption A2, [A^T, Ē] has full column rank. Assumption A4 implies that H̄ is positive definite on the null space of [A^T, Ē]^T. Therefore, similarly to Lemma 3.1, we conclude that the coefficient matrix of (5.2) is nonsingular, and thus (5.2) has the unique solution (0, y∗, Ē^T λ∗). By continuity, taking the limit in (3.10) as k ∈ K̂ → ∞ yields (5.2). In particular, {(Δx^k, y^{k+1}, Ē^T λ^{k+1})}_{K̂} → (0, y∗, Ē^T λ∗). Notice that by Step 2.1 of Algorithm 3.1, the multipliers in λ^{k+1} whose corresponding primal variables are not in the working set are set to zero. Hence, it follows that {(Δx^k, y^{k+1}, λ^{k+1})}_{K̂} → (0, y∗, λ∗).

Now we show that {ϖ^{k,j_k}}_{K̂} → 0. First, it follows from (3.15) that {ϖ^{k,j_k}_{0,i}}_{K̂} → −x∗_{0,i}, ∀i ∈ Ī_+, and {ϖ^{k,j_k}_{0,i}}_{K̂} → −λ∗_{0,i}, ∀i ∈ Ī_−. From the definitions of I^k_+ and I^k_−, we know that x∗_{0,i} ≤ λ∗_{0,i} for i ∈ Ī_+ and x∗_{0,i} ≥ λ∗_{0,i} for i ∈ Ī_−. This together with the complementarity condition implies that x∗_{0,i} = 0 for i ∈ Ī_+ and λ∗_{0,i} = 0 for i ∈ Ī_−. Hence, it follows that {ϖ^{k,j_k}_0}_{K̂} → 0. Since every x^k is feasible for problem (1.6), we have from (3.15) that for all k ∈ K̂, ϖ^{k,j_k}_{1,i} = −x^k_{1,i} = 0, ∀i ∈ Ã_1\Ã_{12}, and ϖ^{k,j_k}_{2,i} = −x^k_{2,i} = 0, ∀i ∈ Ã_2\Ã_{12}. Since x∗_{1,i} ≤ λ∗_{1,i}, ∀i ∈ B̃_{10}, and x∗_{2,i} ≤ λ∗_{2,i}, ∀i ∈ B̃_{12}, in view of (3.14), we obtain from (3.15) and the complementarity condition that {ϖ^{k,j_k}_{1,i}}_{K̂} → −x∗_{1,i} = 0, ∀i ∈ B̃_{10}, and {ϖ^{k,j_k}_{2,i}}_{K̂} → −x∗_{2,i} = 0, ∀i ∈ B̃_{12}. Also from (3.15), it follows that {ϖ^{k,j_k}_{1,i}}_{K̂} → −λ∗_{1,i}, ∀i ∈ B̃_{11}, and {ϖ^{k,j_k}_{2,i}}_{K̂} → −λ∗_{2,i}, ∀i ∈ B̃_{13}.

Now we show that λ∗_{1,i} = 0, ∀i ∈ B̃_{11}. Notice that by the definition of B^{k,j_k}_{11} we have x^k_{1,i} > ε^{k,j_k} ≥ 0, ∀i ∈ B̃_{11}. Since every x^k is feasible, we have x^k_{2,i} = 0, ∀i ∈ B̃_{11}. Hence, if x∗_{1,i} = 0 for some i ∈ B̃_{11}, it follows that i ∈ A_{12}(x∗). This together with Definition 2.4 of S-stationarity implies that λ∗_{1,i} ≥ 0. However, the definition of B^{k,j_k}_{11} implies λ∗_{1,i} ≤ x∗_{1,i}, ∀i ∈ B̃_{11}. Hence, it follows that λ∗_{1,i} = 0. On the other hand, if x∗_{1,i} > 0 for some i ∈ B̃_{11}, we have from the complementarity condition that λ∗_{1,i} = 0. Therefore, we obtain {ϖ^{k,j_k}_{1,i}}_{K̂} → −λ∗_{1,i} = 0, ∀i ∈ B̃_{11}. In the same way we can derive {ϖ^{k,j_k}_{2,i}}_{K̂} → −λ∗_{2,i} = 0, ∀i ∈ B̃_{13} as well. There are two cases to consider in analyzing the behavior of the remaining elements of ϖ^{k,j_k}.

Case 1. {α^{k,j_k}}_{K̂} → 0. Then it follows from Lemma 4.4 that {ε^{k,j_k}}_{K̂} → 0. This implies Ã_1 ⊆ A_1(x∗) and Ã_2 ⊆ A_2(x∗). Strict complementarity (Assumption A6) gives that λ∗_{1,i} > x∗_{1,i} = 0 and λ∗_{2,i} > x∗_{2,i} = 0 for all i ∈ Ã_{12}. Therefore, it follows from (3.14) that B̃_1 = Ã_{12} and B̃_l = ∅, l = 2, ..., 9. Hence, we have from (3.15) that {ϖ^{k,j_k}_{1,i}}_{K̂} → −x∗_{1,i} = 0 and {ϖ^{k,j_k}_{2,i}}_{K̂} → −x∗_{2,i} = 0 for all i ∈ ∪_{l=1,...,9} B̃_l.

Case 2. α^{k,j_k} ≥ α̃ for some α̃ > 0 and all k ∈ K̂. Since f(x) is bounded below on the feasible region of problem (1.6) by Assumption A3, it follows from (3.18) that {∇f(x^k)^TΔx^{k,j_k}}_{K̂} → 0. By following the same analysis as in the proof of Lemma 4.3, we obtain relations (b5)-(b12) with x̄ and λ̄ in them replaced by x∗ and λ∗, respectively. Hence, it follows from (3.15) that {ϖ^{k,j_k}_{1,i}}_{K̂} → 0 and {ϖ^{k,j_k}_{2,i}}_{K̂} → 0 for all i ∈ B̃_1 ∪ B̃_2 ∪ B̃_3 ∪ B̃_6 ∪ B̃_7. If B̃_4 ∪ B̃_5 ≠ ∅, then for any i ∈ B̃_4 ∪ B̃_5 it follows from (3.14) and from (b7) and (b8) in the proof of Lemma 4.3 that x∗_{1,i} = 0 and λ∗_{1,i} = 0. This contradicts the strict complementarity assumption. Hence, B̃_4 ∪ B̃_5 = ∅. Similarly, we obtain B̃_8 ∪ B̃_9 = ∅.

Summarizing the results above, we conclude that {ϖ^{k,j_k}}_{K̂} → 0. Therefore, letting k ∈ K̂ → ∞, (3.10) and (3.16) attain the same limit (5.2). Thus, {Δx^{k,j_k}}_{K̂} → 0. By Lemma 5.1, x∗ is an isolated accumulation point of {x^k}. Hence, we have from Proposition 5.2 that the whole sequence {x^k} converges to x∗. Under the MPLCC-LICQ, (y∗, λ∗) is the unique multiplier vector associated with x∗. By Theorem 4.8, either {(y^k, λ^k)} or {(y^{k+1}, λ^{k+1})} converges to (y∗, λ∗); hence the entire sequence {(y^k, λ^k)} converges to (y∗, λ∗) as {x^k} → x∗. Moreover, since (5.1) holds, letting k → ∞ in (3.19) gives the convergence of {z^k} to z∗.

We now show that the working set eventually identifies the active set at x∗.

Lemma 5.4. Suppose Assumptions A1-A6 and (5.1) hold. Then the entire sequence {(x^k, y^k, z^k, λ^k)} converges to (x∗, y∗, z∗, λ∗) and, for all k large enough, I^k = I(x∗), A^k_1 = A_1(x∗) and A^k_2 = A_2(x∗).

Proof. Suppose there is an infinite index set K̄ ⊆ K such that ‖R(x^k, y^k, z^k)‖^η ≥ ε for some ε > 0 and all k ∈ K̄. Then it follows from Lemma 4.1 that min{‖R(x^k, y^k, z^k)‖^η, ε^{k,0}} ≥ min{ε, ε̄} > 0, where ε̄ is defined in Lemma 4.1. Hence, we have from (3.1) that I(x∗) ⊆ I^k, A_1(x∗) ⊆ A^k_1 and A_2(x∗) ⊆ A^k_2 for all k ∈ K̄ large enough. Thus, by Lemma 5.3, the entire sequence {x^k, y^k, z^k} converges to the optimal primal-dual solution (x∗, y∗, z∗) of problem (2.5). However, this contradicts the assumption that the KKT error ‖R(x^k, y^k, z^k)‖ is bounded away from zero on K̄. Thus, it follows that {‖R(x^k, y^k, z^k)‖} → 0.

Now by Lemma 4.7 we obtain the convergence of {y^k, λ^k}_K to (y∗, λ∗), which is the unique multiplier vector associated with x∗. Since (5.1) holds, letting k → ∞ in (3.19) yields {z^k}_K → z∗. Clearly, ‖R(x∗, y∗, z∗)‖ = 0. Notice that for any s ∈ S(x∗), where S(x∗) is given in Definition 2.6, we have

    s^T ∇^2((x∗_1)^T x∗_2) s = 2 ∑_{i∈A} s^T e_{1,i} e_{2,i}^T s = 2 ∑_{i∈A} s_{1,i} s_{2,i} = 0.

This implies that s^T ∇^2_{xx} L(x∗, y∗, z∗) s = s^T ∇^2 f(x∗) s for s ∈ S(x∗), where L(·) is given by (2.7). Notice that the null space of the active constraint gradients for problem (2.5) at x∗ is also S(x∗). Hence, given that the MPLCC-SOSC holds at x∗, the SOSC for problem (2.5) holds at (x∗, y∗, z∗). Thus, by Theorem 2 in [8], there is a constant χ > 0 such that

    ‖x^k − x∗‖ ≤ χ‖R(x^k, y^k, z^k)‖

for all k ∈ K large enough. This together with (3.1) and the fact that η ∈ (0, 1) implies that I^k = I(x∗), A^k_1 = A_1(x∗) and A^k_2 = A_2(x∗) for all k ∈ K large enough. Hence, we conclude from Lemma 5.3 that the entire sequence {x^k, y^k, z^k, λ^k} converges to (x∗, y∗, z∗, λ∗). Finally, we again have from Theorem 2 in [8] and (3.1) that I^k = I(x∗), A^k_1 = A_1(x∗) and A^k_2 = A_2(x∗) for all k large enough.
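The identification step hinges on combining the error bound ‖x^k − x∗‖ ≤ χ‖R^k‖ with the threshold ‖R^k‖^η for η ∈ (0, 1): since χr < r^η once r is small, the truly active components (of size O(r)) fall below the threshold while the inactive ones stay above it. A small numeric check in Python (χ and η are hypothetical values):

    # Why eta in (0,1) makes the threshold ||R||^eta separate active from
    # inactive variables: chi*r shrinks faster than r**eta as r -> 0.
    chi, eta = 10.0, 0.5
    for r in [1e-1, 1e-2, 1e-4, 1e-8]:
        print(f"r={r:.0e}  chi*r={chi*r:.1e}  r**eta={r**eta:.1e}  "
              f"separated={chi*r < r**eta}")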

Let

    E∗ = [e_{0,i}, i ∈ I(x∗); e_{1,i}, i ∈ A_1(x∗); e_{2,i}, i ∈ A_2(x∗)].        (5.3)

The next result shows the quadratic convergence of a full Newton step.

Lemma 5.5. Suppose Assumptions A1-A6 and (5.1) hold. Then for all k large enough: (i) ϖ^{k,0} = −(E∗)^T x^k; (ii) H^k = ∇^2 f(x^k); (iii) ‖x^k + Δx^{k,0} − x∗‖ = O(‖x^k − x∗‖^2); and (iv) there is a constant µ̂ > 0 such that ∇f(x^k)^T Δx^{k,0} ≤ −µ̂‖Δx^{k,0}‖^2.

Proof. First note that for every k, A^{k,0}_1 = A^k_1 and A^{k,0}_2 = A^k_2. Hence, it follows from (3.14) that B^{k,0}_l = ∅ (l = 10, 11, 12, 13) for every k. Since strict complementarity holds at x∗, we obtain from Lemma 5.4 and (3.14) that for all k large enough, I^k_+ = I(x∗), I^k_− = ∅, B^{k,0}_1 = A^k_{12} = A_{12}(x∗) and B^{k,0}_l = ∅ for l = 2, ..., 9. Now result (i) follows from (3.15).

From (3.8) and Lemma 5.4, we have S^k = S(x∗) for all k large enough. Hence, the satisfaction of the MPLCC-SOSC at x∗ implies that (3.7) eventually holds with the exact Hessian ∇^2 f(x^k). Thus, by Steps 1.2 and 4 of Algorithm 3.1, we have H^k = ∇^2 f(x^k) for all k large enough.

Combining the results above, we obtain from (3.16), (2.2), (2.3) and (2.4) that for j = 0 and all k large enough,

    M^k (x^k + Δx^{k,0} − x∗, y^{k+1,0} − y∗, λ^{k+1,0} − (E∗)^T λ∗)^T
      = (−∇f(x^k) + ∇^2 f(x^k)(x^k − x∗) − A^T y∗ + E∗(E∗)^T λ∗, A(x^k − x∗), (E∗)^T(x^k − x∗) + ϖ^{k,0})^T
      = (∇f(x∗) − ∇f(x^k) + ∇^2 f(x^k)(x^k − x∗), 0, 0)^T.              (5.4)

By Taylor's theorem, we have

    ‖∇f(x∗) − ∇f(x^k) + ∇^2 f(x^k)(x^k − x∗)‖ = O(‖x^k − x∗‖^2).        (5.5)

Hence, we know from Lemma 4.2 and (5.4) that result (iii) holds for all k large enough.

From (3.16) with j = 0 and k large enough we have

    ∇f(x^k)^T Δx^{k,0} = −(Δx^{k,0})^T ∇^2 f(x^k) Δx^{k,0} + (ϖ^{k,0})^T λ^{k+1,0}.        (5.6)

Also, from Definition 2.6 of the MPLCC-SOSC we have, using a modified version of a lemma in [4, pp. 296], that there exists a δ_0 > 0 such that for any µ̄ with 0 < µ̄ < ϱ, where ϱ is specified in Definition 2.6,

    d^T(∇^2 f(x∗) + δE∗(E∗)^T)d ≥ µ̄‖d‖^2        (5.7)

for any d such that Ad = 0, whenever δ > δ_0. Finally, for k large enough, since (E∗)^TΔx^{k,0} = (E^k)^TΔx^{k,0} = ϖ^{k,0} = −(E∗)^T x^k, {λ^{k+1,0}} → (E∗)^T λ∗ by (5.4) and (5.5), B^{k,0}_l = ∅ for l = 2, ..., 13, I^k_− = ∅, and ϖ^{k,0}_{1,i} = 0 for i ∈ A^{k,0}_1\A^{k,0}_{12} and ϖ^{k,0}_{2,i} = 0 for i ∈ A^{k,0}_2\A^{k,0}_{12}, we have

    (ϖ^{k,0})^T λ^{k+1,0} + δ(Δx^{k,0})^T E∗(E∗)^T Δx^{k,0}
      = −∑_{i∈I^k_+} (λ∗_{0,i} − δx^k_{0,i}) x^k_{0,i}
        − ∑_{i∈B^{k,0}_1} [(λ∗_{1,i} − δx^k_{1,i}) x^k_{1,i} + (λ∗_{2,i} − δx^k_{2,i}) x^k_{2,i}]
      ≤ 0.        (5.8)

Each term in the two sums above is nonnegative for k large enough; this follows from the facts that λ^k_{0,i} ≥ x^k_{0,i} for i ∈ I^k_+, and that λ^k_{1,i} ≥ x^k_{1,i} and λ^k_{2,i} ≥ x^k_{2,i} for i ∈ B^{k,0}_1, together with {x^k} → x∗. Combining (5.6), (5.7) and (5.8) and using the continuity of ∇^2 f(x) yields

    ∇f(x^k)^T Δx^{k,0}
      = −(Δx^{k,0})^T ∇^2 f(x∗) Δx^{k,0} + (ϖ^{k,0})^T λ^{k+1,0}
          + (Δx^{k,0})^T (∇^2 f(x∗) − ∇^2 f(x^k)) Δx^{k,0}
      ≤ −µ̄‖Δx^{k,0}‖^2 + δ(Δx^{k,0})^T E∗(E∗)^T Δx^{k,0} + (ϖ^{k,0})^T λ^{k+1,0}
          + (Δx^{k,0})^T (∇^2 f(x∗) − ∇^2 f(x^k)) Δx^{k,0}
      ≤ −(µ̄/2)‖Δx^{k,0}‖^2.

Part (iv) of the lemma follows by choosing µ̂ = µ̄/2.

Lemma 5.6. Suppose Assumptions A1-A6 and (5.1) hold. Then x^{k+1} = x^k + Δx^{k,0} for all k large enough.

Proof. Since α^{k,0} = 1 for every k by Step 3.2 of Algorithm 3.1, it suffices to show that (3.17) and (3.18) hold for j = 0 when k is large enough. Clearly, {Δx^{k,0}} → 0 by Lemma 5.5 (iii). Hence, for all large enough k, x^k_{0,i} + Δx^{k,0}_{0,i} > 0, ∀i ∈ I\I(x∗), x^k_{1,i} + Δx^{k,0}_{1,i} > 0, ∀i ∈ A\A_1(x∗), and x^k_{2,i} + Δx^{k,0}_{2,i} > 0, ∀i ∈ A\A_2(x∗). Moreover, by Lemma 5.5 (i), x^k_{0,i} + Δx^{k,0}_{0,i} = x^k_{0,i} + ϖ^{k,0}_{0,i} = 0, ∀i ∈ I(x∗), x^k_{1,i} + Δx^{k,0}_{1,i} = x^k_{1,i} + ϖ^{k,0}_{1,i} = 0, ∀i ∈ A_1(x∗), and x^k_{2,i} + Δx^{k,0}_{2,i} = x^k_{2,i} + ϖ^{k,0}_{2,i} = 0, ∀i ∈ A_2(x∗). Therefore, (3.17) holds for j = 0 and all large enough k.

Now consider (3.18). Using Taylor's theorem, we have

    f(x^k + Δx^{k,0}) = f(x^k) + ∇f(x^k)^TΔx^{k,0} + (1/2)(Δx^{k,0})^T∇^2 f(x^k)Δx^{k,0} + o(‖Δx^{k,0}‖^2)
                      = f(x^k) + (1/2)∇f(x^k)^TΔx^{k,0} + (1/2)(Δx^{k,0})^T E∗ λ^{k+1,0} + o(‖Δx^{k,0}‖^2)
                      ≤ f(x^k) + (1/2)∇f(x^k)^TΔx^{k,0} + o(‖Δx^{k,0}‖^2)
                      ≤ f(x^k) + σ∇f(x^k)^TΔx^{k,0} − µ̂((1/2) − σ)‖Δx^{k,0}‖^2 + o(‖Δx^{k,0}‖^2)
                      ≤ f(x^k) + σ∇f(x^k)^TΔx^{k,0},        (5.9)

where the second equality follows from (5.6), the first inequality follows from (5.8) and the fact that ϖ^{k,0} = (E∗)^TΔx^{k,0}, and the last two inequalities follow from Lemma 5.5 (iv) and the fact that σ ∈ (0, 1/2). By (5.9), (3.18) eventually holds for j = 0. Hence, the lemma follows.


Combining the results in Lemma 5.5 (iii) and Lemma 5.6, we directly obtain that Algorithm 3.1 converges quadratically.

Theorem 5.7. Suppose Assumptions A1-A6 and (5.1) hold. Then the sequence {x^k} generated by Algorithm 3.1 converges quadratically to x∗, i.e., ‖x^{k+1} − x∗‖ = O(‖x^k − x∗‖^2).
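Quadratic convergence means the number of correct digits roughly doubles per iteration once the iterates are close. The following toy Newton iteration on a one-dimensional equation (entirely hypothetical and unrelated to any MPLCC) displays the characteristic profile ‖x^{k+1} − x∗‖ ≈ C‖x^k − x∗‖^2:

    # Toy illustration of quadratic convergence: Newton's method on
    # g(x) = x - cos(x), whose root is x* ~ 0.7390851332151607.
    import math

    x, x_star = 1.0, 0.7390851332151607
    for k in range(5):
        x -= (x - math.cos(x)) / (1.0 + math.sin(x))   # Newton step
        print(k, abs(x - x_star))                      # error is roughly squared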

6. Conclusion. We have proposed an active set method for solving MPLCCs with smooth objective functions. Under the MPLCC-LICQ, we have proved that the proposed method converges to an S-stationary solution of the MPLCC. As far as we are aware, this result is stronger than any that has been obtained for NLP based regularization or decomposition methods, which only guarantee convergence to C- or M-stationary points under the MPLCC-LICQ. Assuming in addition the MPLCC-SOSC and strict complementarity, we have shown that the asymptotic rate of convergence of our method is quadratic.

Our method has several distinguishing features compared with existing methods. First, it employs a projected Newton step with respect to a working set that is maintained at every iteration to estimate the final active set. The working set defines a subspace for the primal variables. Elements of the step direction corresponding to this subspace are determined according to the current dual iterate; the other elements are computed through a Newton system. Our method is implementable at a reasonable cost, involving mainly one matrix factorization per major iteration. The line search used in our method is a variant of a traditional backtracking line search that not only guarantees the feasibility of every iterate, but also ensures that a sufficient reduction in the objective function is achieved at each iteration.

There are several issues that require further research. First, can fast local convergence occur without strict complementarity? Also, fast local convergence was established under condition (5.1). Although we can always choose the parameter z^{max}_3 large enough so that (5.1) holds, it is important in practice to update z^{max}_3 dynamically to ensure (5.1). Finally, we would like to extend the proposed active set approach to handle nonlinear constraints.

REFERENCES

[1] M. Anitescu, On solving mathematical programs with complementarity constraints as nonlinear programs, SIAM J. Optim., 15 (2005), pp. 1203-1236.
[2] M. Anitescu, Global convergence of an elastic mode approach for a class of mathematical programs with complementarity constraints, SIAM J. Optim., 16 (2005), pp. 120-145.
[3] Y. Chen, B. F. Hobbs, S. Leyffer and T. S. Munson, Leader-follower equilibria for electric power and NOx allowances markets, Computational Management Science, 3 (2006), pp. 307-330.
[4] G. Debreu, Definite and semidefinite quadratic forms, Econometrica, 20 (1952), pp. 295-300.
[5] A. Demiguel, M. Friedlander, F. Nogales and S. Scholtes, A two-sided relaxation scheme for mathematical programs with equilibrium constraints, SIAM J. Optim., 16 (2005), pp. 587-609.
[6] F. Facchinei, H. Jiang and L. Qi, A smoothing method for mathematical programs with equilibrium constraints, Math. Programming, 85 (1999), pp. 107-134.
[7] M. C. Ferris and J. S. Pang, Engineering and economic applications of complementarity problems, SIAM Review, 39 (1997), pp. 669-713.
[8] A. Fischer, Local behaviour of an iterative framework for generalized equations with nonisolated solutions, Math. Programming, 94 (2002), pp. 91-124.
[9] R. Fletcher and S. Leyffer, Solving mathematical programs with complementarity constraints as nonlinear programs, Optim. Methods Softw., 19 (2004), pp. 15-40.
[10] R. Fletcher, S. Leyffer, D. Ralph and S. Scholtes, Local convergence of SQP methods for mathematical programs with equilibrium constraints, SIAM J. Optim., 17 (2006), pp. 259-286.
[11] A. Forsgren, Inertia-controlling factorizations for optimization algorithms, Appl. Numer. Math., 43 (2002), pp. 91-107.
[12] M. Fukushima and J. S. Pang, Convergence of a smoothing continuation method for mathematical programs with complementarity constraints, in Ill-posed Variational Problems and Regularization Techniques, M. Thera and R. Tichatschke, eds., Springer, New York, NY, 2000, pp. 99-110.
[13] M. Fukushima and P. Tseng, An implementable active-set algorithm for computing a B-stationary point of a mathematical program with linear complementarity constraints, SIAM J. Optim., 12 (2002), pp. 724-739.
[14] M. Fukushima and P. Tseng, An implementable active-set algorithm for computing a B-stationary point of a mathematical program with linear complementarity constraints: erratum, to appear in SIAM J. Optim., 2006.
[15] G. Giallombardo and D. Ralph, Multiplier convergence in trust-region methods with application to convergence of decomposition methods for MPECs, Math. Programming, online first, 2006.
[16] X. M. Hu and D. Ralph, Convergence of a penalty method for mathematical programming with complementarity constraints, J. Optim. Theory Appl., 123 (2004), pp. 365-390.
[17] H. Jiang and D. Ralph, Smooth SQP methods for mathematical programs with nonlinear complementarity constraints, SIAM J. Optim., 10 (2000), pp. 779-808.
[18] C. Kanzow and H. Qi, A QP-free constrained Newton-type method for variational inequality problems, Math. Programming, 85 (1999), pp. 81-106.
[19] J. J. More and D. C. Sorensen, Computing a trust region step, SIAM J. Sci. Statist. Comput., 4 (1983), pp. 553-572.
[20] S. Leyffer, G. Lopez-Calva and J. Nocedal, Interior methods for mathematical programs with complementarity constraints, SIAM J. Optim., 17 (2006), pp. 52-77.
[21] G. Lin and M. Fukushima, New relaxation method for mathematical programs with complementarity constraints, J. Optim. Theory Appl., 118 (2003), pp. 81-116.
[22] G. Liu and J. J. Ye, A merit function piecewise SQP algorithm for solving mathematical programs with equilibrium constraints, to appear in J. Optim. Theory Appl., 2006.
[23] X. Liu and J. Sun, Generalized stationary points and an interior-point method for mathematical programs with equilibrium constraints, Math. Programming, 101 (2004), pp. 231-261.
[24] Z. Luo, J. S. Pang and D. Ralph, Mathematical Programs with Equilibrium Constraints, Cambridge University Press, Cambridge, UK, 1996.
[25] Z. Luo, J. S. Pang and D. Ralph, Piecewise sequential quadratic programming for mathematical programs with nonlinear complementarity constraints, in A. Migdalas, P. M. Pardalos and P. Varbrand, eds., Kluwer Academic Publishers, Dordrecht, The Netherlands, 1998, pp. 209-229.
[26] J. Outrata, M. Kocvara and J. Zowe, Nonsmooth Approach to Optimization Problems with Equilibrium Constraints, Kluwer Academic Publishers, Dordrecht, 1998.
[27] A. U. Raghunathan and L. T. Biegler, Interior point methods for mathematical programs with complementarity constraints, SIAM J. Optim., 15 (2005), pp. 720-750.
[28] D. Ralph, Sequential quadratic programming for mathematical programs with linear complementarity constraints, in Computational Techniques and Applications: CTAC95 (Melbourne, 1995), World Scientific, River Edge, NJ, 1996, pp. 663-668.
[29] S. M. Robinson, Strongly regular generalized equations, Math. Oper. Res., 5 (1980), pp. 43-62.
[30] H. Scheel and S. Scholtes, Mathematical programs with complementarity constraints: stationarity, optimality and sensitivity, Math. Oper. Res., 25 (2000), pp. 1-22.
[31] S. Scholtes, Convergence properties of a regularization scheme for mathematical programs with complementarity constraints, SIAM J. Optim., 11 (2001), pp. 918-936.
[32] S. Scholtes and M. Stohr, Exact penalization of mathematical programs with equilibrium constraints, SIAM J. Control Optim., 37 (1999), pp. 617-652.
[33] D. F. Shanno and R. J. Vanderbei, Interior-point methods for nonconvex nonlinear programming: orderings and high-order methods, Math. Programming, 87 (2000), pp. 303-316.
[34] A. Wachter and L. T. Biegler, On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming, Math. Programming, 106 (2006), pp. 25-57.