Optimization and Lagrange Multipliers:
Non-C¹ Constraints and "Minimal" Constraint Qualifications

by Leonid Hurwicz and Marcel K. Richter

Discussion Paper No. 280, March 1995
Center for Economic Research, Department of Economics, University of Minnesota, Minneapolis, MN 55455




ABSTRACT

When do Lagrange multipliers exist at constrained maxima? In this paper we establish:

a) Existence of multipliers, replacing C¹ smoothness of equality constraint functions by differentiability (for Jacobian constraint qualifications) or, for both equalities and inequalities, by the existence of partial derivatives (for path-type constraint qualifications). This unifies the treatment of equality and inequality constraints.

b) A notion of "minimal" Jacobian constraint qualifications. We give new Jacobian qualifications and prove they are minimal over certain classes of constraint functions.

c) A path-type constraint qualification, weaker than previous constraint qualifications, that is necessary and sufficient for existence of multipliers. (It only assumes existence of partial derivatives.)

A survey of earlier results, beginning with Lagrange's own multipliers for equality constraints, is contained in the last section. Among others, it notes contributions and formulations by Weierstrass; Bolza; Bliss; Caratheodory; Karush; Kuhn and Tucker; Arrow, Hurwicz, and Uzawa; Mangasarian and Fromovitz; and Gould and Tolle.

I Introduction

Constrained optimization is central to economics, and Lagrange multipliers are a basic tool in solving such problems, both in theory and in practice. In this paper we extend the applicability of Lagrange multipliers to a wider class of problems, by reducing smoothness hypotheses (for classical Lagrange equality constraints as well as modern inequality constraints), and by reducing constraint qualifications to minimal levels.

We focus on constrained maximization using calculus, and in particular on first order necessary conditions. While there have been important contributions that go beyond the use of calculus tools (subdifferentials, Clarke cones, etc.), calculus tools are important, especially for giving explicit results that are helpful in many economic applications.(1) First order necessary conditions are particularly valuable in problems where convexity conditions are not satisfied, as in economic models involving production sets with increasing returns to scale, and in characterizing the role of marginal cost pricing in such economies.

* We are indebted to Professor Kam-Chau Wong, Chinese University of Hong Kong, for valuable comments on an earlier version.

(1) For some of the economic applications of Lagrangean techniques see Hicks [23], Samuelson [41], Takayama [44, pp. 129-168], and many others.


Two centuries ago, Lagrange (with Euler as precursor) introduced "indeterminate" multipliers, placing the necessary consequences of constrained maximization in a general framework [19].(2) His result for equality constraints is known in most analysis texts today as the Lagrange Multiplier Theorem. Half a century ago, Karush [26] obtained an analogue for inequality constraints, and he was followed independently by Kuhn and Tucker's celebrated paper [30] a decade later.

To extend those, and several more recent Lagrange multiplier results, to a wider class of problems, we employ:

a) reduced smoothness requirements on constraints and maximands;

b) weaker constraint qualifications;

c) notions of minimal constraint qualifications.

Under (a), we weaken the differentiability hypotheses for proving existence of Lagrange multipliers. In the classical equality-constrained context, for example, we require only continuity and differentiability of the constraints, and only at the maximizer, instead of the usual continuous differentiability. Under (b), we provide new constraint qualifications that are weaker, yet still guarantee existence of Lagrange multipliers. These cover problems with inequality constraints and problems with mixed constraints. Under (c), we first introduce notions of minimal constraint qualifications of two types. The first type is defined by the Jacobian matrix at the maximizer, and the second is defined by more general properties of paths lying in, or related to, the constraint set. Using these notions, we prove that the Jacobian conditions introduced in (b) are minimal Jacobian conditions for "Lagrange regularity." We also prove that the path conditions introduced in (b) are necessary and sufficient for existence of Lagrange multipliers.

There are two mathematical bases on which our results depend. The first is a strong form of the Theorem of the Alternative from linear algebra. (This is closely related to the tools, such as Farkas' Lemma or Motzkin's Transposition Theorem, which others have used for solving linear equalities and inequalities.) Persistent exploitation of the algebraic result allows us to present simple proofs, to clarify the separate roles of algebra and analysis, and to make transparent the essential role of constraint qualifications.

The second base is a new implicit function theorem [24] with very weak differentiability hypotheses. The classical approach ([6], [7], [11]) to Lagrange multipliers with equality constraints used the classical Implicit Function Theorem (assuming C¹ functions), and for that reason it was necessary, in the Lagrangean theorems, to make sure that the equality constraints were C¹. (For inequality constraints, there was a breakthrough in Kuhn and Tucker [30], which already replaced C¹ by simple differentiability.)

(2) See QUATRIEME SECTION, paragraphs 1-8, pages 44-49 of Mechanique Analitique. A similar development is in [33], SECTION QUATRIEME, Sections 2-8, pages 77-83, which is an 1888 (fourth) edition of [19] under the name Mecanique Analytique; "Methode des multiplicateurs" occurs as the heading there, but not in the first edition. See also [32], SECONDE PARTIE, Chapter XI, pp. 291-292.

For mixed equality-inequality optimization problems, which are typical of economics, the C¹ assumption has always been retained for equality constraints, while in [35] it was relaxed to differentiability at the maximizing point for the inequality constraints. Here we will formulate more uniform conditions, where the C¹ hypothesis is dropped for the equality constraints as well as for the inequality constraints. We have been able to accomplish this by proving a generalization of the Implicit Function Theorem that reduces the C¹ hypothesis to continuity and differentiability.

Because we focus here on first order (necessary) differential conditions for maximization, our theorems do not introduce the type of convexity or concavity conditions used in [4] or [35]. However, one could easily adjoin convexity conditions to our hypotheses; that would yield something more general than the results in those contributions.

We do not shy away from redundancies or explanations that would be obvious to a seasoned mathematician, but may be helpful to a student.

Since we are weaving together many strands, the following outline may be helpful.


II. Notation and Terminology. Also includes an index of the major definitions.

III. Constrained Maximization. Defines the Lagrange constrained maximization problem and the role of Lagrange multipliers, leading to definitions of the basic types of Lagrange regularity. Proves the algebraic Fundamental Lemma, on which later proofs will be based, and which explains the need for constraint qualifications.

IV. The Jacobian Criterion. Defines Jacobian constraint qualifications, their sufficiency for Lagrange regularity, and the notion of minimal sufficiency. States the Jacobian Criterion for mixed, inequality, and equality problems.

V. The Jacobian Criterion Is Sufficient.

VI. The Jacobian Criterion Is Minimal.

VII. The Tangency-Path Criterion. Explains the need to go beyond Jacobian constraint qualifications. States the Tangency-Path Criterion for mixed, inequality, and equality problems, and proves it is necessary as well as sufficient for Lagrange regularity.

VIII. Comparison of Jacobian and Tangency-Path Conditions. Compares the verifiability and computability aspects of Jacobian and Tangency-Path Conditions.

IX. Appendix. The two mathematical results on which the main results are based: a Theorem of the Alternative and the Non-C¹ Implicit Function Theorem.

X. Historical Comments and Comparisons.

II Notation and Terminology

We denote the set of natural numbers by ℕ = {0, 1, 2, 3, …}, and the set of real numbers by ℝ. For u ∈ ℝⁿ = ℝᵏ × ℝᵐ we write u = (x, y), where x ∈ ℝᵏ and y ∈ ℝᵐ. When F : ℝᵏ × ℝᵐ → ℝᵖ and when all its partial derivatives ∂Fʲ/∂uᵢ(ū) exist at ū, then F′(ū) denotes the p × n Jacobian matrix:

F′(ū) = [ ∂F¹/∂u₁(ū)  ⋯  ∂F¹/∂uₙ(ū) ]
        [      ⋮               ⋮      ]
        [ ∂Fᵖ/∂u₁(ū)  ⋯  ∂Fᵖ/∂uₙ(ū) ]   (1)

When the function F not only has partial derivatives at ū, but has the stronger(3) property of possessing a Frechet derivative at ū, then we denote it by F_u(ū); of course in this case the linear transformation F_u(ū) is represented with respect to the standard bases of ℝⁿ and ℝᵖ by the matrix F′(ū),(4) and so, for any z ∈ ℝⁿ:

When F is Frechet differentiable at ū, then F′(ū)z is a matrix representation of the vector F_u(ū)z.   (2)

Similarly, when F(·, y) possesses a Frechet derivative at x̄, it is denoted by F_x(x̄, y); and when F(x, ·) possesses a Frechet derivative at ȳ, it is denoted by F_y(x, ȳ). For brevity, differentiability will always mean Frechet differentiability unless otherwise noted.

A function F : X → Y is said to be locally continuous at a point x ∈ X if F is continuous on some neighborhood of x.

If A and B are linear subspaces of ℝⁿ, and if every element u of ℝⁿ can be written uniquely as a sum of elements in A and B: u = x + y, where x ∈ A and y ∈ B, then we say that ℝⁿ is the direct sum of A and B, and we write:

ℝⁿ = A ⊕ B.   (3)

Because all norms in a finite-dimensional linear vector space lead to the same notions of convergence and differentiability, it will not matter which norm we use. As convenient, we will use three different norms on our basic spaces: for any v ∈ ℝˡ,

the Euclidean norm: ‖v‖ = (v₁² + ⋯ + v_l²)^(1/2)   (4a)

the maximum norm: ‖v‖ = max{|v₁|, …, |v_l|}   (4b)

the sum norm: for any normed subspaces A and B, if ℝⁿ = A ⊕ B, and v = a + b with a ∈ A and b ∈ B, then ‖v‖ = ‖a‖ + ‖b‖.   (4c)

For any v ∈ ℝˡ and any real γ, we denote the closed γ-ball about v by

B_γ(v) = {v + w ∈ ℝˡ : ‖w‖ ≤ γ}.   (5)

(3) Cf. [40, p. 240, Exercise 14].

(4) Cf. [40, Theorem 9.17, p. 215].

For x and y in ℝˡ:

x ≥ y means xᵢ ≥ yᵢ for all i = 1, …, l
x ≩ y means x ≥ y and x ≠ y
x > y means xᵢ > yᵢ for all i = 1, …, l.   (6)

For an open set U ⊆ ℝⁿ and for f = (f¹, …, f^q) : U → ℝ^q, the notation f ≥ 0 means fⁱ(x₁, …, xₙ) ≥ 0 for each i = 1, …, q.

For any subset S of ℝˡ:

ch(S) = the convex hull of S
      = the intersection of all convex sets T ⊇ S
      = {t₀x₀ + t₁x₁ + ⋯ + tₘxₘ : m ∈ ℕ & x₀, x₁, …, xₘ ∈ S & t₀, t₁, …, tₘ ≥ 0 & t₀ + t₁ + ⋯ + tₘ = 1}   (7a)

cl(S) = the topological closure of S   (7b)

cone(S) = the conical closure of S
        = {tx : x ∈ S & real t ≥ 0}   (7c)

wedge(S) = the wedge generated by S
         = the convex cone generated by S
         = cone(ch(S))   (7d)

span(S) = the linear subspace generated by S
        = {t₁x₁ + ⋯ + tₘxₘ : m ∈ ℕ & x₁, …, xₘ ∈ S & t₁, …, tₘ ∈ ℝ}.   (7e)

For any subsets S and T of ℝˡ:

S + T = {x + y : x ∈ S & y ∈ T}
S + ∅ = S = ∅ + S.   (8a)
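As a concrete illustration of (7d) (our own, not part of the paper), membership of a vector in the wedge generated by finitely many vectors amounts to finding nonnegative combination coefficients. When the generators form a square invertible system, a direct solve suffices; the generators and test vector below are hypothetical:

```python
import numpy as np

# For a finite S, wedge(S) is the set of nonnegative combinations
# t1*x1 + ... + tm*xm with all ti >= 0.  For S = {(1, 0), (1, 1)} in R^2
# we test whether v = (2, 1) lies in wedge(S) by solving the square
# linear system for the coefficients and checking their signs.
S = np.array([[1.0, 0.0],
              [1.0, 1.0]])           # rows are the generators x1, x2
v = np.array([2.0, 1.0])
t = np.linalg.solve(S.T, v)          # coefficients with v = t1*x1 + t2*x2
in_wedge = bool(np.all(t >= 0))
print(t, in_wedge)
```

With more generators than dimensions, or dependent generators, this becomes a linear feasibility problem rather than a square solve.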


Nonnegative orthants are denoted by:

ℝ₊ˡ = {x ∈ ℝˡ : x ≥ 0}.   (9)

When interpreted for matrix multiplication, elements x ∈ ℝˡ will be treated as column vectors. For any such x, we denote the transpose by xᵀ, which we treat as a row vector for matrix multiplication.

We collect here, for reference, some notions to be defined later: Lagrange regularity (Section III.A); Jacobian constraint qualifications, and when they are sufficient or minimal (Section IV); the Jacobian Criterion (Section IV); and the Tangency-Path Criterion (Section VII).


III Constrained Maximization

We are concerned with maximizing a real-valued function f on an open set U ⊆ ℝⁿ, subject to conditions of the form g¹ ≥ 0, …, gᵐ ≥ 0 and h¹ = 0, …, hᵏ = 0, where the gⁱ and hʲ are real-valued functions on U. Either g or h, or both, may be absent. Let g = (g¹, …, gᵐ) and h = (h¹, …, hᵏ), and define the constraint set by

C(g, h) = {x ∈ U : gⁱ(x) ≥ 0 for all i = 1, …, m & hʲ(x) = 0 for all j = 1, …, k};   (10)

i.e., when both g and h are present:

C(g, h) = {x ∈ U : g(x) ≥ 0 & h(x) = 0}.

A point ū ∈ U is said to maximize f on C(g, h) if:

ū ∈ C(g, h)   (11)

and

f(u) ≤ f(ū) for all u ∈ C(g, h).   (12)

When g is absent, this is a problem of classical mathematics; and when h is absent, it is well known in economics as a "Kuhn-Tucker" problem. We write C(g) when h is absent, and C(h) when g is absent, trusting the context will avoid ambiguity. When both g and h are absent, i.e., when there are no constraints, then C(g, h) = U.

III.A Lagrange Regularity

Suppose that ū ∈ U maximizes f on C(g, h) and that f, g, and h have partial derivatives at ū. We are interested in the existence of "Lagrange multipliers" λ ∈ ℝᵐ and μ ∈ ℝᵏ satisfying:

f′(ū) + λᵀ g′(ū) + μᵀ h′(ū) = 0.   (13)

In addition, we will also want λ to satisfy a nonnegativity condition. But simple examples show that, quite apart from any additional requirements, there may exist no λ and μ satisfying (13). Consider inequalities defined by:(5)

g¹(u₁, u₂) = u₂ − u₁² ≥ 0
g²(u₁, u₂) = −u₂ − u₁² ≥ 0,   (14)

(5) This is analogous to Slater's example in [42].

or equalities defined by:

h¹(u₁, u₂) = u₂ − u₁² = 0
h²(u₁, u₂) = −u₂ − u₁² = 0.   (15)

In each case the constraint set is the singleton {0}, and so any function f(u₁, u₂) has a constrained maximum there. But the function f(u₁, u₂) = u₁ does not admit any λ or μ satisfying (13) when the constraints are (14) or (15), since ∂f/∂u₁(0, 0) = 1.
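The failure of (13) for example (14) can be checked directly: at ū = (0, 0) the gradients are f′ = (1, 0), g¹′ = (0, 1), g²′ = (0, −1), so the first component of (13) reads 1 = 0 for every choice of multipliers. A small numerical sketch of this check (ours, for illustration only):

```python
import numpy as np

# Gradients at the maximizer u = (0, 0) for example (14):
# f(u1, u2) = u1,  g1 = u2 - u1**2,  g2 = -u2 - u1**2.
f_grad = np.array([1.0, 0.0])
G = np.array([[0.0,  1.0],     # g1'(0)
              [0.0, -1.0]])    # g2'(0)

# Equation (13) asks for lambda with G.T @ lam = -f_grad (and lam >= 0).
# The least-squares candidate shows even the equality part is unsolvable:
lam, *_ = np.linalg.lstsq(G.T, -f_grad, rcond=None)
print(np.allclose(G.T @ lam, -f_grad))  # False: no multipliers exist
```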

Special assumptions are therefore needed to guarantee the existence of Lagrange multipliers λ and μ. To state these conditions, as well as to sharpen the question, we first explain when an inequality constraint is "binding."

Binding Constraints. When ū maximizes f subject to g = (g¹, …, gᵐ) ≥ 0 and h = (h¹, …, hᵏ) = 0, some of the gⁱ(ū) may be positive, while others may equal 0. If gⁱ(ū) > 0 then small movements from ū will, by continuity, preserve the constraint property gⁱ ≥ 0; the same is true if gⁱ is constant throughout some neighborhood of ū. Since the condition (13) is a local condition, we can then restrict attention to a small enough neighborhood on which gⁱ ≥ 0 will hold throughout, and so we can effectively ignore such a constraint. To distinguish the constraints that cannot be so ignored from those that can, we say that gⁱ is a binding constraint at ū if gʲ(ū) ≥ 0 for all j = 1, …, m, and if ū belongs to the boundary of the "individual" constraint set {x ∈ ℝⁿ : gⁱ(x) ≥ 0}. For example, if gⁱ is continuous and if either gⁱ(ū) > 0 or gⁱ is constant over a neighborhood of ū, then gⁱ is not binding at ū.(6)

With U an open subset of ℝⁿ, with ū ∈ U, and with g : U → ℝᵐ, we partition M = {1, …, m} to distinguish the gⁱ that are binding at ū from the others:(7)

I = {i ∈ M : ū is in the boundary of {u ∈ U : gⁱ(u) ≥ 0}}
J = M ∖ I   (16)

(either I, J, or both may be empty). When I or J is nonempty, we write:

g_I : U → ℝᵖ
g_J : U → ℝᵐ⁻ᵖ
g = (g_I, g_J) : U → ℝᵖ × ℝᵐ⁻ᵖ = ℝᵐ.   (17)

(6) Note that if g¹ = g², then each constraint may be binding at ū.

(7) One could introduce an analogous distinction between binding and nonbinding equality constraints, but we shall not do so.
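In concrete cases with continuous constraints that are not locally constant, the partition (16) can be computed by evaluating each gⁱ at ū: gⁱ(ū) = 0 then places ū on the boundary of the individual constraint set, while gⁱ(ū) > 0 places it in the interior. A rough sketch under those assumptions (the example functions are our own):

```python
# Classify constraints as binding (I) or not (J) at u_bar, per (16),
# for continuous constraints that are not locally constant: g_i is
# binding exactly when g_i(u_bar) == 0.
u_bar = (1.0, 0.0)

g = [lambda u: 1.0 - u[0]**2 - u[1]**2,   # g1: zero at u_bar   -> binding
     lambda u: u[0] + 2.0]                # g2: positive there  -> not binding

I = [i for i, gi in enumerate(g) if abs(gi(u_bar)) < 1e-12]
J = [i for i, gi in enumerate(g) if i not in I]
print(I, J)  # [0] [1]
```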


Regularity. Now we seek conditions under which the fact that ū ∈ U maximizes f on C(g, h) implies the existence of a λ ∈ ℝᵐ and a μ ∈ ℝᵏ such that:

f′(ū) + λᵀ g′(ū) + μᵀ h′(ū) = 0,   (18a)

i.e.,

∂f/∂uᵢ(ū) + λ₁ ∂g¹/∂uᵢ(ū) + ⋯ + λₘ ∂gᵐ/∂uᵢ(ū) + μ₁ ∂h¹/∂uᵢ(ū) + ⋯ + μₖ ∂hᵏ/∂uᵢ(ū) = 0   (i = 1, …, n),   (18b)

and such that:(8)

λ ≥ 0   (20a)

and

λᵢ = 0 if gⁱ is not binding at ū (i.e., i ∈ J).   (20b)

Since (18) and (20) are linear in the λᵢ and the μⱼ, it follows that, when the values of the derivatives of f, the gⁱ, and the hʲ are known, the existence of λᵢ and μⱼ satisfying (18) and (20) can be determined algorithmically by "elimination of quantifiers" (e.g., by Fourier elimination).(9) But the question is often asked in a different form.
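The quantifier-elimination remark can be made concrete: feasibility of the linear system (18), (20) in (λ, μ) is decidable by Fourier elimination. A compact sketch (our own implementation, with each equality split into a pair of inequalities), applied to example (14):

```python
def eliminate(ineqs, var):
    """One Fourier elimination step on inequalities (c, b) meaning c.x <= b."""
    pos = [(c, b) for c, b in ineqs if c[var] > 0]
    neg = [(c, b) for c, b in ineqs if c[var] < 0]
    out = [(c, b) for c, b in ineqs if c[var] == 0]
    for cp, bp in pos:
        for cn, bn in neg:
            s, t = -cn[var], cp[var]          # both scales positive
            out.append(([s*a + t*d for a, d in zip(cp, cn)], s*bp + t*bn))
    return out

def feasible(ineqs, nvars):
    for var in range(nvars):
        ineqs = eliminate(ineqs, var)
    return all(b >= 0 for _, b in ineqs)      # only "0 <= b" rows remain

# System (18), (20) for example (14) at u = 0 with f = u1, unknowns (l1, l2):
#   0*l1 + 0*l2 = -1   (first component of (18b): 1 + 0 + 0 = 0)
#   1*l1 - 1*l2 =  0   (second component)
#   l1 >= 0, l2 >= 0
rows = [([0, 0], -1), ([0, 0], 1),            # equality split into <= pair
        ([1, -1], 0), ([-1, 1], 0),
        ([-1, 0], 0), ([0, -1], 0)]
print(feasible(rows, 2))  # False: no multipliers, matching example (14)
```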

Rather than seeking solvability of (18) and (20) for λ and μ given particular gⁱ, hʲ, and f, one often asks whether given functions gⁱ and hʲ allow solvability for all f in some family ℱ that are maximized on C(g, h) at ū. (We shall always assume that each f ∈ ℱ has partial derivatives at ū.) So we say that (g, h) is Lagrange mixed-regular for (ū, ℱ) if:(10)

i) all the gⁱ and hʲ have partial derivatives at ū;

ii) for every function f ∈ ℱ, if ū maximizes f on C(g, h), then (18) and (20) hold for some (λ, μ).

We have a corresponding definition when h is absent; we say that g is Lagrange inequality-regular for (ū, ℱ) if:

(8) It is tempting to suppose that (20) implies λᵢ > 0 whenever gⁱ is binding at ū. Simple examples show that is not true. Consider this example in ℝ¹:

f(x) = −x²
g(x) = x
ū = 0.

Here the constraint g is binding at the maximizer ū = 0, but λ = 0 is required by (18).
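The arithmetic behind footnote (8) is one line: f′(0) = 0 and g′(0) = 1, so (18) forces λ·1 = 0. A trivial check (ours):

```python
# Footnote (8) example: f(x) = -x**2, g(x) = x, maximizer u = 0.
# (18) reads f'(0) + lam * g'(0) = 0, i.e. 0 + lam * 1 = 0.
f_prime, g_prime = 0.0, 1.0        # derivatives at u = 0
lam = -f_prime / g_prime           # the unique multiplier
print(lam == 0.0, f_prime + lam * g_prime == 0.0)  # True True
```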

(9) Cf. [29]; [43, chapter 1].

(10) The terminology is modified from [4]. We really should call this Lagrange regularity for (U, ū, ℱ); for brevity, we omit the reference to U.


i) all the gⁱ have partial derivatives at ū;

ii) for every function f ∈ ℱ, if ū maximizes f on C(g) then:

f′(ū) + λᵀ g′(ū) = 0   (21)

and (20) hold for some λ.

And when g is absent we say that h is Lagrange equality-regular for (ū, ℱ) if:

i) all the hʲ have partial derivatives at ū;

ii) for every function f ∈ ℱ, if ū maximizes f on C(h) then:

f′(ū) + μᵀ h′(ū) = 0   (22)

for some μ.

We will often refer simply to Lagrange regularity, counting on the context to make the type clear. When both g and h are absent, we say that we have Lagrange regularity for (ū, ℱ) if, for every f ∈ ℱ, if ū maximizes f on U then:

f′(ū) = 0.

And now, seeking λ and μ that satisfy (18) and (20) for all such f, the answer is less trivial. We begin by restating the problem in a simpler form with a natural

Reduction. Let g : U → ℝᵐ, h : U → ℝᵏ, and f : U → ℝ have partial derivatives at ū.(11) Suppose that the binding constraints are g_I = (g¹, …, gᵖ). Then a solution λ ∈ ℝᵐ and μ ∈ ℝᵏ to (18) and (20) exists if and only if there exist λ̄ ∈ ℝᵖ and μ ∈ ℝᵏ such that:

f′(ū) + λ̄ᵀ g_I′(ū) + μᵀ h′(ū) = 0,   (23a)

i.e.,

∂f/∂uᵢ(ū) + λ̄₁ ∂g¹/∂uᵢ(ū) + ⋯ + λ̄ₚ ∂gᵖ/∂uᵢ(ū) + μ₁ ∂h¹/∂uᵢ(ū) + ⋯ + μₖ ∂hᵏ/∂uᵢ(ū) = 0   (i = 1, …, n),   (23b)

and such that:

λ̄ ≥ 0.   (24)

(11) Actually, the gᵖ⁺¹, …, gᵐ need not have partial derivatives, since they do not appear in (23), and they effectively do not appear in (18) since the corresponding λᵢ vanish according to (20b).

Remark. In defining Lagrange regularity, we assumed that partial derivatives of the gⁱ, hʲ, and f exist at ū, even for the nonbinding gⁱ, in order to give meaning to (18). It is possible to avoid assuming that the nonbinding gⁱ possess partial derivatives at ū, by using (23) and (24) in place of (18) and (20). While this would be more general, it would be less convenient to apply in situations where the structure of the constraint set is not known a priori.

III.B The Fundamental Lemma. Constraint Qualifications

Although our basic maximization question is one of analysis, it is useful to view it algebraically. For basically what it asks is whether the vector −f′(ū) is a nonnegative linear combination of the vectors gⁱ′(ū) plus a linear combination of the hʲ′(ū), i.e., whether it lies in the sum of the wedge (the convex cone) generated by the gⁱ′(ū) vectors plus the span of the hʲ′(ū) vectors. Thus separating out algebraic from analytical aspects of the problem will be quite helpful.

Our answers all rest on the following simple consequence of the Theorem of the Alternative.

Fundamental Lemma. Let U be an open subset of ℝⁿ, and let ū ∈ U.

A) Suppose f : U → ℝ, g : U → ℝᵖ, and h : U → ℝᵏ are functions on U that have partial derivatives at ū, with g(ū) = 0, and suppose there does not exist (λ, μ) ∈ ℝᵖ × ℝᵏ satisfying the following two conditions:

i) f′(ū) + λᵀ g′(ū) + μᵀ h′(ū) = 0,   (25a)

i.e., such that:

∂f/∂uᵢ(ū) + λ₁ ∂g¹/∂uᵢ(ū) + ⋯ + λₚ ∂gᵖ/∂uᵢ(ū) + μ₁ ∂h¹/∂uᵢ(ū) + ⋯ + μₖ ∂hᵏ/∂uᵢ(ū) = 0   (i = 1, …, n),   (25b)

and:

ii) λ ≥ 0.   (26a)

Then there exists a z ∈ ℝⁿ such that:

g′(ū)z ≥ 0   (27a)
h′(ū)z = 0   (27b)
f′(ū)·z > 0.   (27c)

B) When g is absent, part (A) holds if we delete references to, and terms and relations involving, λ or g.

C) When h is absent, part (A) holds if we delete references to, and terms and relations involving, μ or h.

Proof. This is almost an immediate application of the Theorem of the Alternative.(12)

Part A: By hypothesis there do not exist λᵢ and μⱼ satisfying:

λ₁ ∂g¹/∂u₁(ū) + ⋯ + λₚ ∂gᵖ/∂u₁(ū) + μ₁ ∂h¹/∂u₁(ū) + ⋯ + μₖ ∂hᵏ/∂u₁(ū) = −∂f/∂u₁(ū)
  ⋮
λ₁ ∂g¹/∂uₙ(ū) + ⋯ + λₚ ∂gᵖ/∂uₙ(ū) + μ₁ ∂h¹/∂uₙ(ū) + ⋯ + μₖ ∂hᵏ/∂uₙ(ū) = −∂f/∂uₙ(ū)   (28a)

λ₁·1 + λ₂·0 + ⋯ + λₚ·0 ≥ 0
  ⋮
λ₁·0 + ⋯ + λₚ₋₁·0 + λₚ·1 ≥ 0.   (28b)

So by part (II) of the Theorem of the Alternative there exist z̄ ∈ ℝⁿ and v ∈ ℝᵖ with v ≥ 0, such that:

z̄₁ ∂gʲ/∂u₁(ū) + ⋯ + z̄ₙ ∂gʲ/∂uₙ(ū) + vⱼ = 0   (j = 1, …, p)   (29a)

z̄₁ ∂hʲ/∂u₁(ū) + ⋯ + z̄ₙ ∂hʲ/∂uₙ(ū) = 0   (j = 1, …, k)   (29b)

−(z̄₁ ∂f/∂u₁(ū) + ⋯ + z̄ₙ ∂f/∂uₙ(ū)) > 0.   (29c)

Defining z = −z̄, and taking account of the fact that v ≥ 0, we have (27).

Parts (B, C): The proof is analogous to that for part (A) above, using the appropriate section of part (II) of the Theorem of the Alternative. ∎

(12) See the Appendix.
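For example (14) the lemma's alternative can be exhibited concretely: since no (λ ≥ 0) solves (25), some z satisfying (27) must exist, and z = (1, 0) works (h is absent, so (27b) drops). A quick check (our own illustration):

```python
import numpy as np

# Example (14) at u = (0, 0): g'(0) has rows (0, 1) and (0, -1),
# and f'(0) = (1, 0).  No multipliers solve (25), so (27) must hold.
G = np.array([[0.0, 1.0], [0.0, -1.0]])
f_grad = np.array([1.0, 0.0])
z = np.array([1.0, 0.0])
print(bool(np.all(G @ z >= 0)), bool(f_grad @ z > 0))  # True True
```

Of course (27) alone does not contradict maximality here: no path in C(g) has derivative z, which is exactly the gap the constraint qualifications discussed below are meant to close.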


We note that the lemma is a purely algebraic statement about derivatives. Though it does not involve maximization, it is basic to maximization theory. The following intuition shows why.

Suppose that f, g, and h are differentiable at ū, which maximizes f on the constraint set C(g, h). If (g, h) is not Lagrange regular, then the lemma implies that (27) holds. If there were a path u(·) with values u(t) lying in C(g, h), starting at ū and with derivative u′(0) = z, then(13) we could rewrite (27) as:

(d/dt) g(u(t))|ₜ₌₀ = g′(ū)z ≥ 0   (by (27a))   (30a)

(d/dt) h(u(t))|ₜ₌₀ = h′(ū)z = 0   (by (27b))   (30b)

(d/dt) f(u(t))|ₜ₌₀ = f′(ū)·z > 0   (by (27c)).   (30c)

So small movements along the path would keep us in the constraint set but increase the value of f. And that would contradict the assumption that ū maximizes f on the constraint set C(g, h).
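The chain-rule computation (30) is easy to see numerically in a case where a feasible path does exist. A toy example of our own (so ū = 0 is of course not a maximizer here, which is exactly the point of the contradiction):

```python
# g(u) = u1 >= 0, f(u) = u1, u_bar = (0, 0), feasible path u(t) = (t, 0)
# with derivative z = (1, 0): moving along the path stays feasible and
# raises f at rate f'(0) . z = 1 > 0, as in (30c).
f = lambda u: u[0]
g = lambda u: u[0]
path = lambda t: (t, 0.0)
eps = 1e-6
deriv = (f(path(eps)) - f(path(0.0))) / eps   # forward-difference estimate
print(g(path(eps)) >= 0, deriv > 0)  # True True
```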

Thus the algebra highlights the role of analysis. As we know from (14) and (15), some restrictions on the constraint functions g and h are required for Lagrange regularity. We now see that one such restriction is the existence of paths u(·) with u′(0) = z for any z satisfying (27). (In the non-Lagrange-regular examples (14) and (15), such paths did not exist.) For historical reasons, such hypotheses are typically called constraint qualifications.(14)

Two general types of constraint qualifications have been developed.(15) The first is a direct attack on the problem. It simply asserts that any z satisfying (27a, b) is indeed the derivative of some path lying in C(g, h). We will call these path conditions.

The second type is computationally oriented. These constraint qualifications are expressed as algebraic properties of the Jacobian matrix

[ g_I′(ū) ]
[ h′(ū)  ]   (31)

They may involve its nonsingularity, the span of its rows, or other algebraic aspects. We call these Jacobian conditions.

(13) If g, h, and f are differentiable at ū.

(14) Cf. [31].

(15) Slater [42] proposed a constraint qualification that does not fall into the two types we discuss. It was intended, however, for application to the Saddle Point Equivalence Theorem, assuming concavity of the constraint functions gⁱ, rather than for a proof of Lagrange regularity. Because of the convexity of the constraint set defined by such gⁱ, Slater's condition implies a path-type constraint qualification of the type we discuss below. (See the historical comments in Section X below.)

In simple examples, checking whether a path condition holds can be rather easy. That is the case with (14) and (15). In more complicated examples, it may be much harder. By contrast, the Jacobian conditions that we discuss are decidable in an algorithmic fashion, as we will see later.

IV The Jacobian Criterion

Our Jacobian condition will guarantee that every z satisfying (27a, b) (or the corresponding variant when g or h is absent) is the derivative of some path lying in C(g, h) (or C(g) or C(h)). In Theorem 1 we will prove the condition is sufficient for Lagrange regularity. And then in Theorem 2 we will prove it is as weak as possible in the class of sufficient Jacobian conditions. In order to make these notions precise, we use the following terminology.

Sufficient and minimal Jacobian constraint qualifications. Let ū ∈ U ⊆ ℝⁿ, let 𝒢 be a set of functions g : U → ℝᵐ, let ℋ be a set of functions h : U → ℝᵏ, and let ℱ be a set of functions f : U → ℝ. Assume that all functions in 𝒢, ℋ, and ℱ have partial derivatives at ū.

By a Jacobian mixed-constraint qualification for (ū, 𝒢, ℋ) we mean a property Q of some members of the set {(g′(ū); h′(ū)) : g ∈ 𝒢 & h ∈ ℋ} of (m + k) × n (real) matrices, where (g′(ū); h′(ū)) denotes the matrix with g′(ū) stacked above h′(ū). To each property Q corresponds the set 𝐐 of matrices with that property.

We say that a Jacobian mixed-constraint qualification Q for (ū, 𝒢, ℋ) is (ū, 𝒢, ℋ, ℱ)-sufficient for Lagrange mixed-regularity if: for all g ∈ 𝒢 and h ∈ ℋ,

(g′(ū); h′(ū)) ∈ 𝐐 ⇒ (g, h) is Lagrange mixed-regular for (ū, ℱ).   (32)

We say that a Jacobian constraint qualification Q for (ū, 𝒢, ℋ) is minimally (ū, 𝒢, ℋ, ℱ)-sufficient for Lagrange mixed-regularity if:

i) Q is (ū, 𝒢, ℋ, ℱ)-sufficient for Lagrange mixed-regularity in the sense of (32);

ii) no weaker Jacobian property (i.e., no proper superset of 𝐐) is (ū, 𝒢, ℋ, ℱ)-sufficient for Lagrange mixed-regularity: if (g′(ū); h′(ū)) ∉ 𝐐 then there are some ĝ ∈ 𝒢 and some ĥ ∈ ℋ with ĝ′(ū) = g′(ū) and ĥ′(ū) = h′(ū) for which (ĝ, ĥ) is not Lagrange mixed-regular for (ū, ℱ).

We may abbreviate "minimally sufficient" to "minimal."

Analogous definitions of sufficiency and minimality apply with functions g for the inequalities problem, and with functions h for the equalities problem.

It is clear that whether or not a Jacobian condition is minimally sufficient for Lagrange regularity depends on the set U, the element ū, and the classes 𝒢 and ℋ of constraint functions under consideration, as well as on the class ℱ of maximand functions. In what follows, the set U will always be an open subset of ℝⁿ. For any ū ∈ U we make these definitions:

𝒢_D(ū) is the set of functions g : U → ℝᵐ that are differentiable at ū.
𝒢_G(ū) is the set of functions g : U → ℝᵐ that have partial derivatives at ū and are Gateaux differentiable at ū in a dense set of directions.
𝒢_P(ū) is the set of functions g : U → ℝᵐ that have partial derivatives at ū.
ℋ_D(ū) is the set of functions h : U → ℝᵏ that are differentiable at ū.
ℋ_DC(ū) is the set of functions h : U → ℝᵏ that are differentiable at ū and locally continuous at ū.
ℋ_C¹(ū) is the set of functions h : U → ℝᵏ that are C¹ at ū.
ℱ_D(ū) is the set of functions f : U → ℝ that are differentiable at ū.
ℱ_G(ū) is the set of functions f : U → ℝ that have partial derivatives at ū and are Gateaux differentiable at ū in a dense set of directions.

Our main objective in the rest of this section is to state a new Jacobian constraint qualification, the Jacobian Criterion. The purpose of Section V is to prove it is sufficient for Lagrange regularity with respect to some important classes 𝒢, ℋ, and ℱ. The purpose of Section VI is to prove it is minimally sufficient with respect to those same classes.

We now state the main Jacobian condition of interest, in forms for the mixed, the inequalities, and the equalities problems.


The Jacobian Criterion. Suppose U is an open subset of ℝⁿ and g : U → ℝᵐ and h : U → ℝᵏ have partial derivatives at some ū ∈ U.(16) Let a(1), …, a(p) be the rows of the p × n Jacobian matrix g_I′(ū), and let b(1), …, b(k) be the rows of the k × n Jacobian matrix h′(ū).

A) (Mixed.) We say that (g, h) satisfies the (Mixed-Problem) Jacobian Criterion at ū if one of the following two mutually exclusive conditions holds:(17)

a) rank(h′(ū)) = k (i.e., rank(h′(ū)) is maximal) and there exists a ξ ∈ ℝⁿ such that:

g_I′(ū)ξ > 0
h′(ū)ξ = 0;   (33a)

b) wedge(a(1), …, a(p)) + span(b(1), …, b(k)) = ℝⁿ, i.e.:

{v ∈ ℝⁿ : v = t₁a(1) + ⋯ + tₚa(p) + z₁b(1) + ⋯ + zₖb(k) for some 0 ≤ t ∈ ℝᵖ and some z ∈ ℝᵏ} = ℝⁿ.   (33b)

B) (Inequalities.) We say that g satisfies the (Inequality-Problem) Jacobian Criterion at ū if one of the following two mutually exclusive conditions holds:(17)

a) I is nonempty and there exists a ξ ∈ ℝⁿ such that:

g_I′(ū)ξ > 0;   (34a)

b) wedge(a(1), …, a(p)) = ℝⁿ, i.e.:

{v ∈ ℝⁿ : v = t₁a(1) + ⋯ + tₚa(p) for some 0 ≤ t ∈ ℝᵖ} = ℝⁿ.   (34b)

C) (Equalities.) We say that h satisfies the (Equality-Problem) Jacobian Criterion at ū if the classical maximum rank condition

rank(h′(ū)) = min{k, n}   (35)

holds.

(16) For g we only need to assume existence of partial derivatives for the components of g_I.

(17) I is defined in (16).
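Condition (a) of the mixed Criterion is directly checkable once the Jacobians and a candidate direction ξ are in hand: verify the rank of h′(ū) and the sign conditions in (33a). A sketch with hypothetical Jacobian data of our own (finding a suitable ξ in general is itself a linear feasibility problem; the code below only verifies a proposed ξ):

```python
import numpy as np

# Hypothetical data in R^3: two binding inequality rows and one equality row.
G_I = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0]])     # g_I'(u), p x n
H = np.array([[0.0, 0.0, 1.0]])       # h'(u),   k x n
xi = np.array([1.0, 1.0, 0.0])        # candidate direction

rank_ok = np.linalg.matrix_rank(H) == H.shape[0]           # rank(h'(u)) = k
signs_ok = bool(np.all(G_I @ xi > 0)) and np.allclose(H @ xi, 0)
print(rank_ok and signs_ok)  # True: alternative (a) of the Criterion holds
```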


Remarks. (Mixed.) Alternative (33a) is Mangasarian's generalization(18) of the Arrow-Hurwicz-Uzawa Constraint Qualification.(19) The new alternative (33b) is natural in view of the fact that the basic Lagrange multiplier equation (18) says that −f′(ū) lies in the sum of the wedge and the span. We will see that the Criterion is sufficient for mixed Lagrange regularity. While it is not truly necessary for Lagrange mixed-regularity, we will show it is necessary in the sense of being a minimal Jacobian condition.

For the mixed case, the Jacobian Criterion requires that either the sum of the wedge and span is "small," with the wedge lying strictly on one side of some hyperplane and the span lying in it, or else it is "large," spanning the whole space. In fact, that observation proves that parts (a) and (b) are mutually exclusive.

Note that it is not sufficient to have two different vectors ξ satisfying respectively the two conditions of (33a); the same ξ must satisfy both. See the Remark following Corollary 1a below.

(Inequalities.) When only inequalities are present, the Jacobian Criterion requires that either the wedge generated by the gⁱ'(ū) is "small," lying strictly on one side of some hyperplane, or else it is "large," spanning the whole space. Again, that observation proves that parts (a) and (b) are mutually exclusive. What the Criterion rules out are the "borderline" cases where neither is true; i.e., where whenever the spanned convex cone lies on one side of some hyperplane, it does not lie strictly on one side. Examples (14) above and (90) below are such examples, with g¹'(0) and g²'(0) pointing in opposite directions. Indeed, the general exception to the Jacobian Criterion occurs when the gⁱ'(ū) all lie on one side of some hyperplane, but some gⁱ'(ū) points in the opposite direction from some convex combination of the other gⁱ'(ū).

(Equalities.) When only equality constraints are present, the Jacobian Criterion is the familiar hypothesis in the classical Lagrange Multiplier Theorem.

Removable Constraints. In many applications, Lagrange multipliers satisfying (13) are sought as a step in finding a constrained maximizing point u. If the Jacobian Criterion fails, all is not lost. For it may be that some constraint can be removed, without changing the constraint set C near u; and the Jacobian Criterion might be satisfied for the reduced set of constraints.

(18) The "modified Arrow-Hurwicz-Uzawa constraint qualification" in [35, pp. 172-173].

(19) Cf. the hypothesis of Theorem 3 in [4].

18

Page 21: Optimization and Lagrange Multipliers

In such situations, theorems using the Jacobian rank conditions should be applied to the rank of a "maximally reduced" set of constraints.(20)

Example 1: h¹ = h²;
Example 2: g¹ ≥ 0 ⇔ g² ≥ 0;
Example 3: g³ = h³.

In these examples, h², g², and g³ can be eliminated. Note also that a non-binding constraint is always removable.

This dependence of Lagrange regularity on the constraint functions, rather than on the constraint set, is illustrated in Remark (iii), page 39, for path conditions rather than Jacobian conditions.

V The Jacobian Criterion Is Sufficient

We now prove that the Jacobian Criterion is sufficient for Lagrange regularity - i.e., sufficient to guarantee the existence of Lagrange multipliers.

Theorem 1. Let U be an open subset of ℝⁿ, and let ū ∈ U.

A) (Equalities and inequalities.) The Jacobian Criterion (A)(21) is (ū, 𝒢_D(ū), ℋ_DC(ū), ℱ_D(ū))-sufficient for Lagrange mixed-regularity.(22) In other words:

Suppose g is differentiable at ū, and h is differentiable at ū and locally continuous at ū. If the Jacobian Criterion (33a,b) holds for (g, h) at ū, then (g, h) is Lagrange mixed-regular for (ū, ℱ_D(ū)).

In particular, under the assumptions just made: if ū maximizes f : U → ℝ subject to g ≥ 0 and h = 0, i.e., if:

    g(ū) ≥ 0    (36a)
    h(ū) = 0    (36b)
    ∀u (u ∈ U & g(u) ≥ 0 & h(u) = 0) ⟹ f(u) ≤ f(ū),    (36c)

(20) Cf. Condition R1 in [2], p. 8, where gt represents a reduced set of constraints. (Page 162 in [3].)

(21) Page 17(A).

(22) Sufficiency is defined on p. 15.

19

Page 22: Optimization and Lagrange Multipliers

and if f is differentiable at ū, then there exist a λ ∈ ℝᵐ and a μ ∈ ℝᵏ such that:

    f'(ū) + λᵀ g'(ū) + μᵀ h'(ū) = 0,    (37a)

i.e.,

    ∂f/∂uᵢ(ū) + λ₁ ∂g¹/∂uᵢ(ū) + ... + λₘ ∂gᵐ/∂uᵢ(ū) + μ₁ ∂h¹/∂uᵢ(ū) + ... + μₖ ∂hᵏ/∂uᵢ(ū) = 0    (i = 1, ..., n),    (37b)

and such that:

    λ ≥ 0    (38a)
    λⱼ = 0 if j ∈ J.    (38b)
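Computationally, once f'(ū), g'(ū), h'(ū) are known, (37)-(38) is a linear system in (λ, μ) with a sign constraint on λ. The sketch below is our own illustration, not the paper's method; the helper `multipliers` and the sample data are assumptions. It solves the system by nonnegative least squares, splitting the unrestricted μ into a difference of nonnegative parts.

```python
# Hypothetical solver for (37)-(38): find lambda >= 0 and free mu with
# f'(u) + lambda^T g'(u) + mu^T h'(u) = 0, via scipy's nonnegative least squares.
import numpy as np
from scipy.optimize import nnls

def multipliers(df, Dg, Dh, tol=1e-8):
    m, k = Dg.shape[0], Dh.shape[0]
    # columns: lambda (m entries), then mu = mu_plus - mu_minus (k + k entries)
    M = np.hstack([Dg.T, Dh.T, -Dh.T])
    coef, resid = nnls(M, -df)
    if resid > tol:
        return None                        # system (37) has no such solution
    return coef[:m], coef[m:m + k] - coef[m + k:]

# maximize f = x + y subject to g = 1 - x - y >= 0, binding along x + y = 1
lam, mu = multipliers(np.array([1.0, 1.0]),      # f'(u)
                      np.array([[-1.0, -1.0]]),  # g'(u)
                      np.zeros((0, 2)))          # no equality constraints
print(lam)   # lambda = [1.], since f'(u) + 1 * g'(u) = 0
```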

B) (Inequalities.)(23) The Jacobian Criterion (B)(24) is (ū, 𝒢_G(ū), ℱ_G(ū))-sufficient for Lagrange inequality-regularity. In other words:

Suppose g : U → ℝᵐ is Gateaux differentiable at ū for a dense set of directions including the coordinate directions. If the Jacobian Criterion (34) holds for g at ū, then g is Lagrange inequality-regular for (ū, ℱ_G(ū)).(25)

In particular, under the assumptions just made: if

    g(ū) ≥ 0    (39a)
    ∀u (u ∈ U & g(u) ≥ 0) ⟹ f(u) ≤ f(ū),    (39b)

and if f is Gateaux differentiable at ū, then there exists a λ ∈ ℝᵐ such that:

    f'(ū) + λᵀ g'(ū) = 0,    (40a)

i.e.:

    ∂f/∂uᵢ(ū) + λ₁ ∂g¹/∂uᵢ(ū) + ... + λₘ ∂gᵐ/∂uᵢ(ū) = 0    (i = 1, ..., n),    (40b)

and such that:(26)

    λ ≥ 0    (41a)
    λⱼ = 0 if j ∈ J.    (41b)

(23) In this case g is present and h is absent. Equivalently, we can set h = 0 in part (A) and delete references to, and terms involving, μ or h.

(24) Page 17(B).

(25) A fortiori, then, g is Lagrange inequality-regular for Fréchet differentiable functions, i.e., for (ū, ℱ_D(ū)). Note that the assumption implies that g has partial derivatives at ū.

(26) J is defined in (16).


C) (Equalities.)(27) The Jacobian Criterion (C)(28) is (ū, ℋ_DC(ū), ℱ_D(ū))-sufficient for Lagrange mixed-regularity. In other words:

Suppose that h is differentiable at ū and locally continuous at ū. If the Jacobian Criterion, the classical rank condition (35),

    rank(h'(ū)) = min{k, n}    (42)

holds for h at ū, then h is Lagrange regular for (ū, ℱ_D(ū)).

In particular, under the assumptions just made: if

    h(ū) = 0    (43a)
    ∀u (u ∈ U & h(u) = 0) ⟹ f(u) ≤ f(ū),    (43b)

and if f is differentiable at ū, then there exists a μ ∈ ℝᵏ such that:

    f'(ū) + μᵀ h'(ū) = 0,    (44a)

i.e., there exist real numbers μ₁, ..., μₖ such that:

    ∂f/∂uᵢ(ū) + μ₁ ∂h¹/∂uᵢ(ū) + ... + μₖ ∂hᵏ/∂uᵢ(ū) = 0    (i = 1, ..., n).    (44b)

Remarks. (Mixed Problem.) This theorem has weaker hypotheses than earlier results, in two respects. First, the Jacobian Criterion is weaker, and second, the differentiability conditions on the equality constraints hⁱ are weaker.(29) For details, see the Historical Comments and Comparisons.

The continuity hypothesis on h can be weakened significantly, as indicated in remark (iii) below on equalities; but it cannot be dispensed with altogether, as shown by the example below in remark (iv) on equalities.

(Inequalities Problem.) Part (B) solves a traditional problem, considered in Karush [26], in Kuhn and Tucker [30], and in Arrow, Hurwicz, and Uzawa [4]. The Jacobian constraint qualification we use, (34), is weaker than the Jacobian constraint qualifications in previous work because it provides a new alternative, (b). In addition, requiring only Gateaux differentiability is weaker than the usual Fréchet differentiability assumption.(30)

(27) In this case h is present and g is absent. Equivalently, we can set g = 0 in part (A) and delete references to, and terms involving, λ or g.

(28) Page 17(C).

(29) The theorem holds for ℋ_DC(ū) instead of just for ℋ_C1(ū).

(30) It holds for (𝒢_G(ū), ℱ_G(ū)) instead of just for (𝒢_D(ū), ℱ_D(ū)).


(Equalities Problem.) This is the classical Lagrange Multiplier Theorem except we have weakened the usual C¹ hypothesis on h to differentiability at ū and continuity in a neighborhood.(31) This allows, for example, h = 0 as an equality constraint, where:

    h(u₁, u₂) = { u₂ − u₁² sin(1/u₁), if u₁ ≠ 0    (45)
               { u₂,                 otherwise,

even though it is not C¹ at the origin. (See the figure.)

[Figure: the curve h = 0, i.e. u₂ = u₁² sin(1/u₁), oscillating near the origin; vertical axis from −0.005 to 0.005, horizontal axis from −0.1 to 0.1.]
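A quick numerical look at (45) confirms both claims. This is our own sketch, and it assumes the second branch of (45) reads h(0, u₂) = u₂: the difference quotients at the origin settle down to h'(0, 0) = (0, 1), while ∂h/∂u₁ keeps oscillating between values near +1 and −1 arbitrarily close to u₁ = 0.

```python
# Sketch: h of (45) is differentiable at (0,0) but not C^1 there.
import math

def h(u1, u2):
    return u2 - u1**2 * math.sin(1.0 / u1) if u1 != 0 else u2

def dh_du1(u1):
    # exact partial for u1 != 0: -2 u1 sin(1/u1) + cos(1/u1)
    return -2.0 * u1 * math.sin(1.0 / u1) + math.cos(1.0 / u1)

# difference quotients at the origin tend to (0, 1):
for t in (1e-2, 1e-4, 1e-6):
    print(h(t, 0.0) / t, (h(0.0, t) - h(0.0, 0.0)) / t)

# yet dh/du1 is near +1 at u1 = 1/(2*pi*k) and near -1 at u1 = 1/((2k+1)*pi):
print(dh_du1(1.0 / (2.0 * math.pi * 1e6)), dh_du1(1.0 / (math.pi * 2000001)))
```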

ii) When also k ≤ n, it is well known(32) that the Lagrange multiplier vector μ of (44) is unique. This can be seen by considering a linearly independent subset of k equations from the collection (44b), whose existence is guaranteed by the rank condition (35).

iii) The assumption that the constraint h is locally continuous is stronger than necessary.

First, if k = n, then the proof below for part (C) shows that no continuity is required. Second, if k < n, we can partition the vector of variables uᵢ into a k-vector y and an (n − k)-vector x, writing u = (x, y), where h_y(x̄, ȳ) is surjective. Then we can replace Theorem 1.C's hypothesis that h is locally continuous by the weaker assumption that, for every x near x̄, the function h(x, ·) is continuous;

(31) In particular, the theorem holds for constraints in ℋ_DC(ū) instead of just for those in ℋ_C1(ū). Although many textbooks assume that the maximand f is also C¹, it is well known (e.g., [5]) that mere differentiability of f suffices.

(32) Cf. Bliss [7, p. 210] for k < n.


and the Non-C¹ Implicit Function Theorem can still be applied as in the proof of part (C) that we give below.

iv) As we have just seen, we can weaken the continuity hypothesis on h. But we cannot drop it completely, as the following "nonsubstitution" example shows.

Example. The function h : ℝ² → ℝ:

    h(x, y) = { x + y,     if x + y ≠ 0    (46)
              { x² + y²,   if x + y = 0

is clearly differentiable at (0, 0), with h'(0, 0) = (1, 1), even though it is not locally continuous, being discontinuous at all other points on the line where x + y = 0. But h is not Lagrange regular. In particular, even though h is differentiable at the origin, its level set allows no "substitution," since the constraint set C(h) contains only the origin (0, 0). So if we define the function f : ℝ² → ℝ by f(x, y) = y, then f is maximized on C(h) at (0, 0), but no real λ can satisfy:

    ∂f/∂x(0, 0) + λ ∂h/∂x(0, 0) = 0
    ∂f/∂y(0, 0) + λ ∂h/∂y(0, 0) = 0,    (47)

since with f(x, y) = y and h'(0, 0) = (1, 1) these require λ = 0 and λ = −1.
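A direct computation (our sketch, not the paper's) makes the point concrete: the remainder in the definition of differentiability at (0, 0) vanishes even along the bad line x + y = 0, while h jumps at every other point of that line.

```python
# Sketch: h of (46) is differentiable at (0,0) with gradient (1,1), yet
# discontinuous at every other point of the line x + y = 0, and C(h) = {(0,0)}.
import math

def h(x, y):
    return x + y if x + y != 0 else x**2 + y**2

# remainder r(v) = h(v) - (1,1).v along the line x + y = 0:
for t in (1e-2, 1e-4, 1e-6):
    r = h(t, -t) - (t + (-t))
    print(r / math.hypot(t, t))      # tends to 0, so h'(0,0) = (1,1)

# discontinuity at (1,-1): on the line h = 2, just off the line h is near 0
print(h(1.0, -1.0), h(1.0, -1.0 + 1e-9))
```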

Proof of Theorem 1. (A) (Mixed.) If condition (33b) holds, then it is immediate that λ and μ can be found satisfying (37) and (38). So we will assume that condition (33a) holds. Thus the index set(33) I = {1, ..., p} of binding constraints is nonempty and rank(h'(ū)) = k.

By the Reduction principle (23), it suffices to find λ ∈ ℝᵖ and μ ∈ ℝᵏ satisfying (37) and (38a).

We complete the proof by contradiction. Suppose that, for some f ∈ ℱ_D(ū), there do not exist such λ ∈ ℝᵖ and μ ∈ ℝᵏ. Then by the Fundamental Lemma there exists a z̄ ∈ ℝⁿ such that:

    g_I'(ū) z̄ ≥ 0    (48a)
    h'(ū) z̄ = 0    (48b)
    f'(ū) · z̄ > 0.    (48c)

For all s ∈ (0, 1] define:

    z(s) = z̄ + s ξ,    (49)

(33) I is defined in (16).

for any ξ satisfying (33a). With (48a) we then have:

    g_I'(ū) z(s) > 0,    (50)

and from (48c) we also have:

    f'(ū) · z(s) = f'(ū) · z̄ + s f'(ū) · ξ > 0    for all small s.    (51)

And from (33a) and (48b) we have:

    h'(ū) z(s) = 0    for all s ∈ (0, 1].    (52)

We define z = z(s) for any such small s.

Since k ≥ 1, we can represent elements of ℝⁿ by u = (x, y) ∈ ℝⁿ⁻ᵏ × ℝᵏ = ℝⁿ, writing ū = (x̄, ȳ) and z = (x, y). Then applying (2) to (50), (52), and (51):

    g_I'(ū) z = g_Ix(x̄, ȳ) x + g_Iy(x̄, ȳ) y > 0    (53a)
    h_u(ū) z = h_x(x̄, ȳ) x + h_y(x̄, ȳ) y = 0    (53b)
    f_u(ū) · z = f_x(x̄, ȳ) x + f_y(x̄, ȳ) y > 0.    (53c)

Since rank(h_u(ū)) = k, the Non-C¹ Implicit Function Theorem(34) implies there exists an open neighborhood V ⊆ ℝⁿ⁻ᵏ of x̄ and a function y(·) : V → ℝᵏ that is differentiable at x̄, and such that:

    y(x̄) = ȳ    (54a)
    h(x, y(x)) = 0 for all x ∈ V    (54b)
    y'(x̄) = −(h_y(x̄, ȳ))⁻¹ h_x(x̄, ȳ).    (54c)

By (53b) we also have:

    y = −(h_y(x̄, ȳ))⁻¹ h_x(x̄, ȳ) x,    (55)

so:

    (d/dt) y(x̄ + tx)|t=0 = y'(x̄) x = y    (by (54c) and (55)).

Then the Chain Rule implies:

    (d/dt) g_I(x̄ + tx, y(x̄ + tx))|t=0 = g_Ix(x̄, ȳ) · x + g_Iy(x̄, ȳ) · y > 0    (by (53a))    (56a)
    (d/dt) f(x̄ + tx, y(x̄ + tx))|t=0 = f_x(x̄, ȳ) · x + f_y(x̄, ȳ) · y > 0    (by (53c)),    (56b)

(34) See the Appendix.


so for all small enough t > 0,

    g(x̄ + tx, y(x̄ + tx)) > 0    (57a)
    f(x̄ + tx, y(x̄ + tx)) > f(x̄, ȳ).    (57b)

By (54b) we also have h(x̄ + tx, y(x̄ + tx)) = 0, so for small enough t > 0 there are points (x̄ + tx, y(x̄ + tx)) ∈ C(g, h) at which f attains a value higher than f(x̄, ȳ). And that contradicts the maximality property (36c), completing the proof for the mixed case (A).

(B) (Inequalities.) This would be an immediate corollary of Theorem 1.A (with h ≡ 0), if we were willing to assume that g is Fréchet differentiable, rather than merely Gateaux differentiable. We will give a proof under the weaker Gateaux differentiability assumption by rewriting our proof of Theorem 1.A, avoiding all mention of h, and without applying the Non-C¹ Implicit Function Theorem.

If (34a) fails to hold, then the Jacobian Criterion implies that (34b) holds, and then we can take such a vector t as the desired λ. So we will assume that (34a) holds. Then the index set(35) I = {1, ..., p} of binding constraints is nonempty.

By the Reduction principle, it suffices that for any f ∈ ℱ_G(ū) we can find a λ ∈ ℝᵖ satisfying λᵀ g_I'(ū) = −f'(ū) and λ ≥ 0.(36) We complete the proof by contradiction. Suppose there does not exist such a λ. Then by the Fundamental Lemma there exists a z̄ ∈ ℝⁿ such that:

    g_I'(ū) z̄ ≥ 0    (58a)
    f'(ū) · z̄ > 0.    (58b)

For all s ∈ (0, 1] define:

    z(s) = z̄ + s ξ,    (59)

for any ξ satisfying (34a). By (58) and (34a) we have, for all small enough s:

    g_I'(ū) z(s) > 0
    f'(ū) · z(s) > 0.    (60)

Since these strict inequalities will not change if we slightly change the vector z(s), and since g and f are Gateaux differentiable in a dense set of directions,

(35) I is defined in (16).

(36) We are assured that the Jacobian g_I'(ū) is well defined since g is Gateaux differentiable in directions that include the coordinate axes.


there exists a vector z such that:

    g_I'(ū) z > 0
    f'(ū) · z > 0    (61)

and such that g is Gateaux differentiable in the direction z. We then have:

    (d/dt) g_I(ū + tz)|t=0 = g_I'(ū) z > 0
    (d/dt) f(ū + tz)|t=0 = f'(ū) · z > 0.    (62)

So for small movements from ū in the direction of z, we remain in the constraint set C(g) while increasing f. That contradiction of (39) completes the proof.

(C) (Equalities.) This result can be obtained as an immediate corollary of part (A) by setting g ≡ 0. However, because of the historical importance of the result, and the brevity allowed by our two mathematical pillars, we give a direct proof. If rank(h'(ū)) = n, then by standard linear algebra the system

    μ₁ ∂h¹/∂uᵢ(ū) + ... + μₖ ∂hᵏ/∂uᵢ(ū) = −∂f/∂uᵢ(ū)    (i = 1, ..., n)    (63)

is solvable for μ, given any vector f'(ū). So, in view of (35), it remains to deal with the case that rank(h'(ū)) = k < n.

The rest of our proof is again by contradiction. If, for some f ∈ ℱ_D(ū), there does not exist a μ ∈ ℝᵏ satisfying (63), then by the Fundamental Lemma (using (2) to translate its result from matrix terms into derivatives) there exists a z = (x, y) ∈ ℝⁿ⁻ᵏ × ℝᵏ = ℝⁿ such that:

    h_u(ū) z = h_x(x̄, ȳ) x + h_y(x̄, ȳ) y = 0    (64a)
    f_u(ū) · z = f_x(x̄, ȳ) · x + f_y(x̄, ȳ) y > 0.    (64b)

Because rank(h_u(ū)) = k < n, we can without loss of generality suppose that u = (x, y) ∈ ℝⁿ⁻ᵏ × ℝᵏ = ℝⁿ and

    rank(h_y(x̄, ȳ)) = k.    (65)

Then by the Non-C¹ Implicit Function Theorem,(37) there exists an open neighborhood V ⊆ ℝⁿ⁻ᵏ of x̄ and a function y(·) : V → ℝᵏ that is differentiable at x̄, and such that:

    y(x̄) = ȳ    (66a)
    h(x, y(x)) = 0 for all x ∈ V    (66b)
    y_x(x̄) = −(h_y(x̄, ȳ))⁻¹ h_x(x̄, ȳ).    (66c)

(37) See the Appendix.


By (64a) we also have:

    y = −(h_y(x̄, ȳ))⁻¹ h_x(x̄, ȳ) x,    (67a)

hence:

    y = y_x(x̄) x.    (67b)

Therefore:

    (d/dt) y(x̄ + tx)|t=0 = y_x(x̄) x = y.    (68)

Then the Chain Rule implies:

    (d/dt) f(x̄ + tx, y(x̄ + tx))|t=0 = f_x(x̄, ȳ) · x + f_y(x̄, ȳ) · y > 0    (by (64b)),    (69)

so for all small enough t > 0:

    f(x̄ + tx, y(x̄ + tx)) > f(x̄, ȳ).    (70)

Since h(x̄ + tx, y(x̄ + tx)) = 0 by (66b), this contradicts the maximality property (43b). ∎

As a special case of Theorem 1.A, we have:

Corollary 1a. (Equalities and inequalities.) Let U be an open subset of ℝⁿ, and let ū ∈ U. Suppose g ∈ 𝒢_D(ū) and h ∈ ℋ_DC(ū), i.e., g : U → ℝᵐ and h : U → ℝᵏ are differentiable at ū, and h is locally continuous at ū. Let g¹, ..., gᵖ be the binding inequality constraints at ū. If the (p + k) × n matrix obtained by stacking g_I'(ū) on h'(ū) satisfies:

    rank([g_I'(ū); h'(ū)]) = p + k,    (71)

then (g, h) is Lagrange mixed-regular for (ū, ℱ_D(ū)).

Remark (i). If m + k ≤ n and gʲ(ū) = 0 for j = 1, ..., m, then the Lagrange multiplier vector (λ, μ) whose existence is guaranteed by the corollary is unique. For in this case ū a fortiori maximizes f(u) subject to both g(u) = 0 and h(u) = 0, so remark (ii) on page 22 applies.(38)

(38) This also follows from [37, Theorem 3.1, p. 180], provided all constraints considered there vanish at ū.


Remark (ii). For (71) to hold it is not sufficient that rank(g_I'(ū)) = p and rank(h'(ū)) = k. Although (71) implies these two rank conditions, the following example shows that the converse is not true. Let:

    h(x₁, x₂) = −x₁ + x₂²    (72)
    g(x₁, x₂) = x₁ − 2x₂².

Then

    f(x₁, x₂) = x₁ + x₂    (73)

is maximized at (0, 0) subject to the constraints g ≥ 0 and h = 0, since those constraints force x₂² ≥ 2x₂², so that the constraint set is just {(0, 0)}. And rank(g'(ū)) = 1 and rank(h'(ū)) = 1, but the rank of the combined matrix is 1, not 1 + 1.
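A numerical restatement of the remark (our own sketch; the signs follow the reconstruction above):

```python
# Sketch: the individual rank conditions hold at the origin, but the stacked
# matrix in (71) has rank 1 < p + k = 2.
import numpy as np

Dg = np.array([[1.0, 0.0]])    # gradient of g = x1 - 2*x2^2 at (0,0)
Dh = np.array([[-1.0, 0.0]])   # gradient of h = -x1 + x2^2 at (0,0)

print(np.linalg.matrix_rank(Dg),
      np.linalg.matrix_rank(Dh),
      np.linalg.matrix_rank(np.vstack([Dg, Dh])))   # 1 1 1
```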

Proof of Corollary 1a. Under the rank condition (71), it follows that rank(h'(ū)) = k and also that there exists a solution ξ of

    g_I'(ū) ξ = c,    h'(ū) ξ = 0,    (74)

where c is any positive column vector in ℝᵖ. Thus condition (33a) holds, so Theorem 1.A guarantees Lagrange regularity. ∎

As a special case of Theorem 1.B, we have:

Corollary 1b. (Inequalities.) Let U be an open subset of ℝⁿ, and let ū ∈ U. Suppose g ∈ 𝒢_G(ū), i.e., g : U → ℝᵐ is Gateaux differentiable at ū; and suppose g¹, ..., gᵖ are the binding constraints at ū. If the rank condition

    rank(g_I'(ū)) = p    (75)

holds, then g is Lagrange inequality-regular for (ū, ℱ_G(ū)).

Remark (i). If m ≤ n and gʲ(ū) = 0 for j = 1, ..., m, then the Lagrange multiplier vector λ whose existence is guaranteed by Corollary 1b is unique. For then ū a fortiori maximizes f(u) subject to g(u) = 0, and hence remark (ii) on page 22 applies.(39)

(39) Again, this also follows from [37, Theorem 3.1, p. 180], provided all constraints considered there vanish at the maximizing point.


Remark (ii). The Jacobian g'(ū) may have full rank even though the Jacobian g_I'(ū) of the binding constraints has rank < p. So it is important to note that the rank condition (75) refers to the Jacobian of only the binding constraints. Consider, for example:(40)

    g¹(x₁, x₂) = (1 − x₁)³ − x₂
    g²(x₁, x₂) = x₁    (76)
    g³(x₁, x₂) = x₂.

Then for x̄ = (1, 0), the constraints g¹ and g³ are binding, while g² is not, so p = 2. And:

    g'(x̄) = [ 0  −1 ]
            [ 1   0 ]    (77)
            [ 0   1 ],

which has rank 2 = p; but the binding constraints contribute only the first and third rows, so rank(g_I'(x̄)) = 1 < 2 = p. So condition (75) fails, and Corollary 1b does not apply, even though rank(g'(x̄)) = p.
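The rank computation for example (76) can be replayed directly (our own sketch):

```python
# Sketch: at x = (1, 0) the full Jacobian of (76) has rank 2, but the
# Jacobian of the binding constraints g1, g3 has rank 1, so (75) fails.
import numpy as np

def jacobian(x1, x2):
    # rows: g1 = (1 - x1)^3 - x2,  g2 = x1,  g3 = x2
    return np.array([[-3.0 * (1.0 - x1)**2, -1.0],
                     [1.0, 0.0],
                     [0.0, 1.0]])

J = jacobian(1.0, 0.0)
binding = [0, 2]                              # g2(1,0) = 1 is not binding
print(np.linalg.matrix_rank(J))               # 2
print(np.linalg.matrix_rank(J[binding]))      # 1
```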

Remark (iii). Condition (75) is stronger than necessary for Lagrange regularity. Consider:

    g¹(x₁, x₂) = 12 − 3x₁ − 3x₂
    g²(x₁, x₂) = 12 − 4x₁ − 2x₂    (78)
    g³(x₁, x₂) = 12 − 2x₁ − 4x₂,

for which all constraints are binding at the point (2, 2), and:

    g'(2, 2) = [ −3  −3 ]
               [ −4  −2 ]    (79)
               [ −2  −4 ].

Clearly rank(g_I'(2, 2)) = 2 < 3 = p, so condition (75) does not hold. Yet g_I'(2, 2) ξ > 0 for ξ = (−1, −1), so g is Lagrange regular at (2, 2) by Theorem 1.B.
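Checking Remark (iii)'s example numerically (our own sketch, using the constants as reconstructed above so that all three constraints bind at (2, 2)):

```python
# Sketch: the binding Jacobian of (78) is rank-deficient, yet alternative
# (34a) of the Jacobian Criterion still holds with xi = (-1, -1).
import numpy as np

DgI = np.array([[-3.0, -3.0],
                [-4.0, -2.0],
                [-2.0, -4.0]])    # g_I'(2, 2)
xi = np.array([-1.0, -1.0])

print(np.linalg.matrix_rank(DgI))   # 2 < 3 = p, so (75) fails
print(DgI @ xi)                     # [6. 6. 6.], strictly positive
```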

Proof of Corollary 1b. Under the rank condition, there exists a solution ξ of g_I'(ū) ξ = e, where e = (1, ..., 1) > 0 ∈ ℝᵖ. Thus condition (34a) holds, so Theorem 1.B guarantees Lagrange regularity. ∎

(40) Cf. [30, p. 484].


(Equalities.) The rank condition for equalities parallel to those in (71) and (75) is the condition already stated in (35) of Theorem 1.C.

VI The Jacobian Criterion Is Minimal

In Section V we showed that the Jacobian Criterion is sufficient for Lagrange regularity. However, as shown by the Example below in the introduction to Section VII, p. 34, it is not necessary. Indeed, the Example shows that no Jacobian constraint qualification can be both necessary and sufficient for Lagrange regularity.

We will obtain a necessary and sufficient constraint qualification in Section VII (Theorems 3 and 4) below, but it will be a "path condition" rather than a Jacobian condition. Nevertheless, because Jacobian conditions typically have computability properties that make them useful in practical applications, while path conditions are often more difficult to deal with, Jacobian conditions are of special interest. So it is important to show that the Jacobian Criterion of the previous sections is as weak as possible among Jacobian conditions.

We now show that, if one restricts oneself to Jacobian conditions that are sufficient for Lagrange regularity, then the Jacobian Criterion is minimal (over an appropriate class of constraint functions). Theorem 2.A, together with Theorem 1.A, will show that the mixed-problem Jacobian Criterion is a minimal Jacobian constraint qualification for the mixed problem over the class of differentiable functions g and h, where h is locally continuous.

And Theorem 2.B, together with Theorem 1.B, will show that the inequalities-problem Jacobian Criterion is a minimal Jacobian constraint qualification for the inequalities problem over the class of differentiable functions g.

Finally, Theorem 2.C, together with Theorem 1.C, will show that the equalities-problem Jacobian Criterion is a minimal Jacobian constraint qualification for the equalities problem over the class of differentiable functions h that are locally continuous.

Theorem 2.A. (Equalities and inequalities.) Let U be an open subset of ℝⁿ and let ū ∈ U. The Jacobian Criterion (A) is minimally (ū, 𝒢_D(ū), ℋ_DC(ū), ℱ_D(ū))-sufficient for Lagrange mixed-regularity.(41)

(41) See pp. 15-16 for the definition of minimal sufficiency, and p. 10 for the definition of Lagrange mixed-regularity.


Remark. The theorem does not claim that the Mixed-Problem Jacobian Criterion is necessary for Lagrange mixed-regularity of a given function pair (g, h). It only claims the Criterion is minimal in the class of Jacobian constraint qualifications.(42) However, outside the class of Jacobian conditions, there are path-type conditions guaranteeing Lagrange regularity even in cases where the Jacobian Criterion is not satisfied. See, for example, the introduction to Section VII.

Proof of Theorem 2.A. By Theorem 1.A, the Jacobian Criterion (A)(43) is sufficient for Lagrange mixed-regularity. To see that the Jacobian Criterion (A) is minimal,(44) suppose there are some g ∈ 𝒢_D(ū) and h ∈ ℋ_DC(ū) that do not satisfy the Jacobian Criterion (A). We will show there are some ĝ ∈ 𝒢_D(ū) and ĥ ∈ ℋ_DC(ū) with ĝ'(ū) = g'(ū) and ĥ'(ū) = h'(ū) for which (ĝ, ĥ) is not Lagrange regular for (ū, ℱ_D(ū)).

Without loss of generality we can assume that U is a neighborhood of the origin 0 ∈ ℝⁿ, and that ū = 0. Define the matrix A = g'(0) and the matrix B = h'(0). Since we are supposing that (g, h) violates the Jacobian Criterion (A), we know that (A, B) cannot satisfy either of these two mutually exclusive conditions:

a) rank(B) = min{k, n} and there exists a ξ ∈ ℝⁿ such that:    (80a)

    A ξ > 0
    B ξ = 0.

b) wedge({a(1), ..., a(m)}) + span({b(1), ..., b(k)}) = ℝⁿ,    (80b)

i.e.:

    {t₁a(1) + ... + tₘa(m) + z₁b(1) + ... + zₖb(k) : real t₁, ..., tₘ ≥ 0 & z₁, ..., zₖ ∈ ℝ} = ℝⁿ.

It will suffice to find ĝ and ĥ as above and such that C(ĝ, ĥ) = {0}. For then the failure of (b) implies that there is some γ ∈ ℝⁿ with:

    γ ∉ wedge({a(1), ..., a(m)}) + span({b(1), ..., b(k)});    (81)

In fact, as seen from the proof below, Theorem 2.A remains true even if the class of maximands is as narrow as the class of linear ones, i.e., if we replace ℱ_D(ū) by the class of linear functions.

(42) Cf. p. 15.

(43) Page 17.

(44) Page 15.


so if we define f(x) = −γ · x, then f is maximized on C(ĝ, ĥ) at the origin 0, and f'(0) = −γ, which by (81) cannot satisfy the Lagrange regularity requirements (18) and (20).

We examine first the special case in which some row a(i) = 0 ∈ ℝⁿ. For x ∈ ℝⁿ, we define ĝⁱ(x) = −(x₁² + ... + xₙ²), so a(i) = ĝⁱ'(0), and ĝⁱ(x) ≥ 0 if and only if x = 0; we define ĝʲ = gʲ for j ≠ i. Thus ĝ ∈ 𝒢_D(ū), ĝ'(0) = A, and C(ĝ, h) = {0}.

Analogously, if some row b(j) = 0 ∈ ℝⁿ, then defining ĥʲ(x₁, ..., xₙ) = −(x₁² + ... + xₙ²) and ĥⁱ = hⁱ for i ≠ j, a similar argument shows that ĥ ∈ ℋ_DC(ū), ĥ'(0) = B, and C(g, ĥ) = {0}.

It remains to consider the case that:

    a(i) ≠ 0    for all i = 1, ..., m
    b(j) ≠ 0    for all j = 1, ..., k.    (82)

We know that (80a) fails. There are two ways it can fail: i) we might have rank(B) < min{k, n}, or ii) there might exist no ξ with A ξ > 0 and B ξ = 0.

(i) First consider the case that rank(B) < min{k, n}. Since rank(B) ≠ k, some row of B is a linear combination of the others. Without loss of generality, suppose that:

    b(1) = z₂b(2) + ... + zₖb(k)    (83)

for some real z₂, ..., zₖ. Since b(1) ≠ 0, we can also without loss of generality choose a basis so that

    b(1) = (1, 0, ..., 0).    (84)

Now define:

    ĝⁱ(x₁, ..., xₙ) = a(i) · x    (i = 1, ..., m)
    ĥ¹(x₁, ..., xₙ) = x₁ − (x₂² + ... + xₙ²)    (85)
    ĥⁱ(x₁, ..., xₙ) = b(i) · x    (i = 2, ..., k).

Then ĝ'(0) = A and ĥ'(0) = B, ĝ ∈ 𝒢_D(ū) and ĥ ∈ ℋ_DC(ū); and if ĝ(x) ≥ 0 & ĥ(x) = 0, it follows from (83), (84), and (85) that x₁ = ... = xₙ = 0, so again the constraint set contains just the origin: C(ĝ, ĥ) = {0}.

(ii) The other way that (80a) could fail is through absence of a ξ ∈ ℝⁿ with A ξ > 0 and B ξ = 0. In that case the Theorem of the Alternative(45) implies

(45) See the Appendix, applying the special case of part (II) when C and γ are absent.


existence of a t ∈ ℝᵐ and a v ∈ ℝᵏ such that:

    t ≥ 0, t ≠ 0
    tᵀA + vᵀB = 0 ∈ ℝⁿ.    (86a)

Without loss of generality we can rewrite this as:

    a(1) = −(w₂a(2) + ... + wₘa(m) + z₁b(1) + ... + zₖb(k)),    (86b)

for some real zᵢ and some real wⱼ ≥ 0. Because of (82) we can, also without loss of generality, choose a basis so that:

    a(1) = (1, 0, ..., 0) ∈ ℝⁿ.    (87)

Now define:

    ĝ¹(x₁, ..., xₙ) = x₁ − (x₂² + ... + xₙ²)
    ĝⁱ(x₁, ..., xₙ) = a(i) · x    (i = 2, ..., m)    (88)
    ĥⁱ(x₁, ..., xₙ) = b(i) · x    (i = 1, ..., k).

Then ĝ'(0) = A and ĥ'(0) = B, ĝ ∈ 𝒢_D(ū) and ĥ ∈ ℋ_DC(ū), and if ĝ(x) ≥ 0 & ĥ(x) = 0, it follows from (86), (87), and (88) that x₁ = ... = xₙ = 0; so again the constraint set contains just the origin: C(ĝ, ĥ) = {0}. ∎

Theorem 2.B. (Inequalities.) Let U be an open subset of ℝⁿ and let ū ∈ U. The Jacobian Criterion (B) is minimally (ū, 𝒢_G(ū), ℱ_G(ū))-sufficient for Lagrange inequality-regularity.(46)

Proof. This follows from obvious modifications of our proof for Theorem 2.A, or as a corollary of that theorem if we set h = 0. ∎

Finally, the Jacobian Criterion (C), the classical rank condition (35), is a minimally sufficient Jacobian constraint qualification for the classical Lagrange equality-constrained problem:

(46) See pp. 15-16 for the definition of minimal sufficiency, and p. 10 for the definition of Lagrange inequality-regularity.

In fact, as seen from the proof below, Theorem 2.B remains true even if the class of maximands is as narrow as the class of linear (hence infinitely differentiable) ones, i.e., if we replace ℱ_G(ū) by the class of linear functions.


Theorem 2.C. (Equalities.) Let U be an open subset of ℝⁿ and let ū ∈ U. The Jacobian Criterion (C),

    rank(h'(ū)) = min{k, n},    (89)

is minimally (ū, ℋ_DC(ū), ℱ_D(ū))-sufficient for Lagrange equality-regularity.(47)

Proof. This follows from obvious modifications of our proof for Theorem 2.A, or as a corollary of that theorem if we set g = 0. ∎

VII The Tangency-Path Criterion

Why consider more than the Jacobian? Because the Lagrange multiplier property (18) is a local property, one might expect that Jacobian properties would suffice to characterize conditions under which Lagrange regularity holds for (g, h). However, they cannot determine Lagrange regularity in all cases, as we see from the following example.

Example. Consider the inequalities g ≥ 0 where:

    g¹(u₁, u₂) = { u₂ − u₁² sin(1/u₁), if u₁ ≠ 0    (90)
                 { u₂,                 otherwise
    g²(u₁, u₂) = −u₂,

or the equalities h = 0 where:

    h¹(u₁, u₂) = { u₂ − u₁² sin(1/u₁), if u₁ ≠ 0    (91)
                 { u₂,                 otherwise
    h²(u₁, u₂) = −u₂.

It will be evident from Theorem 3 (or Theorem 5) that, for any function f that is maximized at ū = (0, 0) subject to the constraints (90) or (91), there do exist Lagrange multipliers satisfying (18) and (20). By contrast, the function g of (14)

(47) See pp. 15-16 for the definition of minimal sufficiency, and p. 11 for the definition of Lagrange equality-regularity.

Again, as seen from the proof below, Theorem 2.C remains true even if the class of maximands is as narrow as the class of linear (hence infinitely differentiable) ones, i.e., if we replace ℱ_D(ū) by the class of linear functions.


is not Lagrange inequality-regular, even though it has the same Jacobian g'(0, 0) as (90) above; and the function h of (15) is not Lagrange equality-regular, even though it has the same Jacobian h'(0, 0) as (91).(48)

So Jacobian conditions alone cannot characterize Lagrange regularity of (g, h); none can be both necessary and sufficient. Even though the Jacobian Criterion was minimal among Jacobian constraint qualifications, a finer tool than Jacobian conditions is needed for a complete characterization. So we turn to path conditions.

The tangent cone. To describe derivatives of paths, we use the tangent cone. For any subset S of U ⊆ ℝⁿ, we say that a vector v ∈ ℝⁿ is tangent to S at ū if either:

a) there exists a sequence of points uⁱ ∈ S such that:    (92a)

    i) uⁱ ≠ ū for all i = 1, 2, 3, ...
    ii) ‖uⁱ − ū‖ → 0 as i → ∞
    iii) (uⁱ − ū)/‖uⁱ − ū‖ → v/‖v‖ if v ≠ 0;

or:

b) ū is an isolated point of S and v = 0.    (92b)

We define:

    T_ū S = {rv ∈ ℝⁿ : v is tangent to S at ū & r is a nonnegative real number}.    (93)

As this is clearly a cone, T_ū S is called the tangent cone of S at ū.
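For intuition, the definition can be exercised numerically on the equality example (91) (our own sketch): the feasible points off the origin are (1/(nπ), 0), and their normalized difference quotients from ū = (0, 0) all equal (1, 0), exhibiting a tangent direction along the horizontal axis.

```python
# Sketch: approximating the tangent directions of (92a.iii) for the constraint
# set of example (91), whose nonzero feasible points are (1/(n*pi), 0).
import math

u_bar = (0.0, 0.0)
feasible = [(1.0 / (n * math.pi), 0.0) for n in range(1, 6)]

for u in feasible:
    d = (u[0] - u_bar[0], u[1] - u_bar[1])
    norm = math.hypot(*d)
    print((d[0] / norm, d[1] / norm))   # (1.0, 0.0) every time
```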

A path interpretation of the tangent cone. For an alternative view of the tangent cone concept, we now give an equivalent formulation in terms of derivatives of certain paths. This provides useful intuition. Also, when we later introduce a new constraint qualification based on the tangent cone, the path interpretation will help us link the new qualification to the traditional constraint qualifications of Karush [26], Kuhn and Tucker [30], and Arrow, Hurwicz and Uzawa [4], which are based on derivatives of paths.

(48) Cf. the discussion of (14) and (15), p. 9.


We characterize the tangent cone as a set of derivatives of paths as follows. First we say that a function φ : [0, 1] → ℝⁿ is a path at ū attending S if φ(0) = ū and φ(t) lies in S for arbitrarily small positive t:

    ∀ε > 0 ∃t, 0 < t < ε, φ(t) ∈ S.    (94)

Then we can view T_ū S as the set of derivatives of paths at ū attending S:

Proposition on Paths and Tangent Cones. a) If a path φ : [0, 1] → ℝⁿ at ū attends S and is differentiable at 0, then φ'(0) ∈ T_ū S.

b) Conversely, if v ∈ T_ū S, then there is a (not necessarily continuous) path φ : [0, 1] → ℝⁿ at ū attending S with v = φ'(0).

Proof. (a) We assume the conditions of (a). If φ is constant in a neighborhood of 0, then φ(t) = ū for all small t, so φ'(0) = 0, which belongs to T_ū S. If φ is not constant near 0, then there exist t → 0 with φ(t) ≠ φ(0), and then we have a sequence of t > 0 with t ↘ 0 and φ(t) ∈ S with:

    (φ(t) − φ(0))/‖φ(t) − φ(0)‖ = (φ'(0)t + o(t))/‖φ'(0)t + o(t)‖
                                = (φ'(0) + o(t)/t)/‖φ'(0) + o(t)/t‖
                                → φ'(0)/‖φ'(0)‖ as t ↘ 0,    (95)

so φ'(0) ∈ T_ū S.

(b) Suppose v ∈ T_ū S. If v = 0, then the constant path φ ≡ ū lies in S and satisfies v = φ'(0). If v ≠ 0, then there exist uⁱ ∈ S satisfying (92). Then on the path φ:

    φ(t) = { ū + t (uⁱ − ū)/‖uⁱ − ū‖, if ‖uⁱ⁺¹ − ū‖ < t ≤ ‖uⁱ − ū‖    (96)
           { ū,                       if t = 0,

we have φ(tᵢ) = uⁱ ∈ S for tᵢ = ‖uⁱ − ū‖, so φ attends S. And φ'(0) = v. ∎


We cannot strengthen the statement in part (b) of the Proposition that φ attends S to the requirement that the values φ(t) all lie in S. Simple examples rule that out.

Also, we cannot guarantee continuity of φ at 0 in part (b) of the Proposition. This is apparent from the example for (91) and the figure. There the tangent cone to the constraint set at ū = (0, 0) is just the horizontal axis. If we start away from the origin, then the only way to approach the origin from within the constraint set is to hop discontinuously from intersection to intersection, converging to the origin, on the horizontal axis, with a horizontal tangent vector.(49)

Conversion of equalities to inequalities. Our mixed maximization problem, as stated at the beginning of Section III, is concerned with maximizing a function f subject to both inequality constraints g^i ≥ 0 and equality constraints h^i = 0. It is sometimes convenient to convert each equality h^i = 0 into two inequalities g^{i1} ≥ 0 and g^{i2} ≥ 0, where

g^{i1} = h^i
g^{i2} = −h^i. (97)

We write the new inequalities ĝ ≥ 0 together with the original inequalities g ≥ 0 as g(h), which we call the associated inequalities for (g, h).(50) Clearly the original constrained maximization problem is now equivalent to maximizing a function f on U subject to g(h) ≥ 0.
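As a minimal programmatic sketch (the helper name associated_inequalities is ours, not the paper's), the conversion (97) simply appends the pair h^i, −h^i to the list of inequality constraint functions:

```python
def associated_inequalities(g, h):
    """Associated inequalities g(h) for (g, h): each equality h_i(u) = 0
    becomes the pair g_{i1} = h_i >= 0 and g_{i2} = -h_i >= 0,
    appended to the original inequalities g_j(u) >= 0."""
    gh = list(g)
    for hi in h:
        gh.append(hi)                       # g_{i1} = h_i
        gh.append(lambda u, hi=hi: -hi(u))  # g_{i2} = -h_i
    return gh

# feasibility check: u satisfies (g, h) iff all associated inequalities hold
g = [lambda u: u[1]]          # u_2 >= 0
h = [lambda u: u[0] - 1.0]    # u_1 = 1
gh = associated_inequalities(g, h)
feasible = lambda u: all(c(u) >= 0 for c in gh)
```

A point satisfies h = 0 exactly when both members of the appended pair are nonnegative, so feasibility for (g, h) and for g(h) coincide.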

We can now state a path criterion for Lagrange regularity in terms of the tangent cone concept.

Tangency-Path Criterion. To characterize Lagrange regularity, we state the following condition in terms of tangent cones; a path interpretation will be given later.

Suppose U is an open subset of R^n, and let (g, h) : U → R^m × R^k. We say that (g, h) satisfies the Tangency-Path Criterion at ū if:

(49) Therefore only requiring paths to have tangents at the origin allows more generality than in Kuhn and Tucker, where the definitions assumed differentiable paths. This answers affirmatively the question raised by Arrow and Hurwicz [4], who noted that their differentiability hypothesis was weaker than that in Kuhn and Tucker [30], and who wondered if this in fact provided extra generality.

(50) In particular, when h is absent, then g(h) is understood to be g, and when g is absent, then g(h) is understood to contain the inequalities specified in (97).


i) all the g(h)^i have partial derivatives at ū,(51)

ii) for any v ∈ R^n,(52)

g(h)_I′(ū)v ≥ 0 ⇒ v ∈ cl(ch(T_ū C(g(h)))). (98a)

We can also write (ii) as:

L(g(h)) ⊆ V(g(h)), (98b)

where we define:

L(g(h)) = {v ∈ R^n : g(h)_I′(ū)v ≥ 0}, (99)

and:

V(g(h)) = cl(ch(T_ū C(g(h)))). (100)

A path interpretation of the Criterion. In view of the Proposition on Paths and Tangent Cones, property (ii) in (98a) can also be stated in terms of paths. Paraphrased in terms of binding constraints g^i, (98a.ii) says that if a vector v has a nonnegative inner product g^i′(ū)·v with each g^i′(ū), then v is a limit of positive linear combinations of tangents of paths attending(53) the constraint set. In particular, if the g^i were differentiable,(54) then any direction v in which all the g^i had positive derivatives would be, if not a direction (i.e., tangent) of a "feasible" path, then at least a positive linear combination of directions of attending paths.

Remark (i). One might consider the simpler condition obtained from replacing cl(ch(T_ū C(g(h)))) by its subset ch(T_ū C(g(h))):(55)

L(g(h)) ⊆ ch(T_ū C(g(h))). (101)

However, that would impose a stronger requirement on constraints than does (98b). Consider, for example, the inequality constraints g^i : R³ → R defined by:

g¹(x, y, z) = z² − y² − (x − |z|)²
g²(x, y, z) = x. (102)

(51) I.e., all g^i and all h^j have partial derivatives at ū.

(52) I is defined in (16).

(53) See the definition page 36.

(54) And not merely possessing partial derivatives.

(55) Cf. [15].

Both g^i are binding at the origin (0,0,0), and have partial derivatives (in fact they are differentiable) at the origin:

g′(0,0,0) = | 0 0 0 |
            | 1 0 0 |. (103)

The constraint set C(g) is a closed cone,(56) so it is easy to see that T_{(0,0,0)}C(g) = C(g). But ch(C(g)) ⊊ cl(ch(C(g))), since the former set is the open half-space defined by x > 0, together with the line where x = y = 0, while the latter is the closed half-space defined by x ≥ 0.(57) Since (103) implies that L(g) is the latter closed half-space, the Tangency-Path condition is satisfied under our definition (98b), but not if we define it using the stronger requirement (101), i.e., using ch(T_{(0,0,0)}C(g)) rather than cl(ch(T_{(0,0,0)}C(g))).(58)
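The Jacobian (103) can be checked numerically. The sketch below (our own illustration, not part of the paper) estimates the partial derivatives of g¹ and g² at the origin by central differences and recovers the two rows of (103):

```python
def g1(x, y, z):
    # first constraint of (102)
    return z**2 - y**2 - (x - abs(z))**2

def g2(x, y, z):
    # second constraint of (102)
    return x

def partials_at_origin(f, eps=1e-6):
    """Central-difference estimates of the three partials of f at (0,0,0)."""
    out = []
    for i in range(3):
        p = [0.0, 0.0, 0.0]; p[i] = eps
        m = [0.0, 0.0, 0.0]; m[i] = -eps
        out.append((f(*p) - f(*m)) / (2 * eps))
    return out

row1 = partials_at_origin(g1)  # close to [0, 0, 0]
row2 = partials_at_origin(g2)  # close to [1, 0, 0]
```

Note that finite differences only probe the partial derivatives along the axes, which is exactly the weak hypothesis the Tangency-Path Criterion requires; they say nothing about full (Fréchet) differentiability.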

Remark (ii). If we replace g¹ by

g̃¹(x, y, z) = { g¹(x, y, z)/√(x² + y² + z²),  if (x, y, z) ≠ (0, 0, 0)
              { 0,                             otherwise,                (104)

then (g̃¹, g²) has the same partial derivatives at the origin (0,0,0) as in (103), but it is no longer Fréchet differentiable at the origin, even though it defines the same constraint set as (g¹, g²). This illustrates the fact that Theorem 3 guarantees Lagrange regularity even when full differentiability does not apply. The example still exhibits a discrepancy between cl(ch(T₀C(g))) and ch(T₀C(g)).

Remark (iii). Without g², the system (102) would not be Lagrange regular, even though the constraint set would be the same. This emphasizes that Lagrange regularity is a property of constraint functions, not just of constraint sets.(59)

(56) Cf. [16, p. 9].

(57) See [16], loc. cit.

(58) See also (163), p. 59.

(59) Cf. the comments on Removable Constraints above, p. 18, for Jacobian conditions. Also see [2, footnote 2, p. 2], where again the form of constraints influences whether the rank constraint qualification holds or not - indeed, whether the inequality constraint is Lagrange regular or not.


We will show that the Tangency-Path Criterion is necessary and sufficient for Lagrange regularity. By the conversion rule (15), it clearly suffices to show this for inequality constraints g^i ≥ 0. By the definition of the Tangency-Path Criterion, the functions g^i in the following theorem have partial derivatives; but it is not assumed that g is Fréchet, or even Gâteaux, differentiable.(60)

Theorem 3. (Inequalities.) Let U be an open subset of R^n, and let ū ∈ U. Suppose g : U → R^m. If the Tangency-Path Criterion(61) holds at ū, then g is Lagrange inequality-regular for (ū, F_D(ū)). In other words, if ū maximizes f : U → R subject to g ≥ 0, i.e., if:

g(ū) ≥ 0 (105a)
∀u [u ∈ U & g(u) ≥ 0 ⇒ f(u) ≤ f(ū)], (105b)

and if f is differentiable at ū, then there exists a λ ∈ R^m such that:

f′(ū) + λᵀ g′(ū) = 0, (106a)

i.e.:

∂f/∂u_i(ū) + λ₁ ∂g¹/∂u_i(ū) + ⋯ + λ_m ∂g^m/∂u_i(ū) = 0 (i = 1, ..., n), (106b)

and such that:

λ ≥ 0 (107a)
λ_j = 0 if j ∈ J. (107b)

Proof. For each j with j ∈ J, define:

λ_j = 0. (108)

Then (107b) is satisfied. If the index set(62) I of binding constraints is empty, we are done, since then ū is in the interior of the constraint set, hence the usual

(60) Gould and Tolle [18, p. 167], presupposing differentiability and continuity conditions that are not made in Theorems 3, 4, 5, or 6 below, state a criterion that is the dual of a condition stronger than the Tangency-Path Criterion. (See Section X below.)

Our Theorem 3 is a strengthening of Arrow, Hurwicz, and Uzawa's Theorem 1 [4], as well as of the "if" part of Gould and Tolle's theorem. Our Theorem 4 is a strengthening of Theorem 2 in Arrow, Hurwicz, and Uzawa [4]; it plays the same role, under weaker hypotheses, as the "only if" part of Gould and Tolle's theorem.

(61) Page 37.

(62) I is defined in (16).


calculus arguments imply:

∂f/∂u_j(ū) = 0 (j = 1, ..., n), (109)

which together with (108) satisfies (106). So we assume that I = {1, ..., p} is nonempty.

By the Reduction principle,(63) it suffices to find a λ ∈ R^p satisfying λᵀ g_I′(ū) = −f′(ū) and λ ≥ 0. Arguing by contradiction, suppose there does not exist such a λ. Then the Fundamental Lemma implies there exists a z ∈ R^n such that:

g_I′(ū)z ≥ 0 (110a)
f′(ū)·z > 0. (110b)

By the Tangency-Path Criterion,(64) property (110a) implies the existence of a sequence of w^i ∈ ch(T_ū C(g)) with:

w^i → z as i → ∞. (111)

For all large enough i, therefore, (110b) implies:

f′(ū)·w^i > 0. (112)

By (7a), for each i there exist nonnegative t_{i,1}, ..., t_{i,q} summing to 1, and there exist w^{i,1}, ..., w^{i,q} ∈ T_ū C(g) with:(65)

w^i = t_{i,1} w^{i,1} + ⋯ + t_{i,q} w^{i,q}. (113)

It follows from (112) and (113) that:

f′(ū)·w^{i,j} > 0 (114)

for some large i and some j = 1, ..., q. Since w^{i,j} ∈ T_ū C(g), and since w^{i,j} ≠ 0 by (114), it follows from the definition (93) that there exists a sequence of points u^k ∈ C(g) such that:

(63) Page 11.

(64) Page 37.

i) u^k ≠ ū for all k = 1, 2, 3, ...

ii) ‖u^k − ū‖ → 0 as k → ∞

iii) (u^k − ū)/‖u^k − ū‖ → w^{i,j}/‖w^{i,j}‖. (115)

(65) Actually, Carathéodory's Theorem ensures that we can choose q = n + 1 (cf. [39], p. 155 (Theorem 17.1)). In fact, since T_ū C(g) is a cone, we can choose q = n ([39], p. 156, (Theorem 17.1.2)).


Since f is differentiable,

f(u^k) = f(ū) + f′(ū)·(u^k − ū) + o(‖u^k − ū‖) (116)

for all k, so:

(f(u^k) − f(ū))/‖u^k − ū‖ = f′(ū)·(u^k − ū)/‖u^k − ū‖ + o(‖u^k − ū‖)/‖u^k − ū‖. (117)

Now by (115(iii)) the first term on the right-hand side converges to f′(ū)·w^{i,j}/‖w^{i,j}‖, which is positive by (114), and the second term converges to 0. So for all large k we have u^k ∈ C(g) and (applying (2) to f′(ū)·w^{i,j} > 0) we have f(u^k) > f(ū), which is a contradiction of the constrained maximization hypothesis (105b). ∎

Theorem 3 shows that the Tangency-Path Criterion(66) is strong enough for Lagrange regularity. Now we show it is not too strong.

Theorem 4. (Inequalities.) Let U be an open subset of R^n, and let ū ∈ U. Let g : U → R^m. If g is Lagrange inequality-regular for (ū, F_D(ū)),(67) then g satisfies the Tangency-Path Criterion.(68)

Remark. The necessity property in Theorem 4 is analogous to the minimality properties of the Jacobian Criterion (Theorems 2.A, 2.B, 2.C). However, the Path Criterion is strictly weaker than the Jacobian Criterion, as shown by the Example, page 34. In fact, in view of Theorem 4, no further weakening is possible for g ∈ G_P(ū) with constraint qualifications expressible in terms of initial derivatives of paths attending the constraint set C(g), and for maximand functions in F_D(ū).

(66) See definition page 37.

(67) See definition page 10.

(68) See definition page 37.


Proof. Suppose that g is Lagrange inequality-regular for (ū, F_D(ū)),(69) so the g^i have partial derivatives at ū. To verify the Tangency-Path Criterion (98), we must show that, for any z ∈ R^n:(70)

g_I′(ū)z ≥ 0 ⇒ z ∈ V(g). (118)

Without loss of generality, we will suppose that ū is the origin.

a) If there are no binding constraints g^i, then 0 ∈ B_ε(0) ⊆ C(g) for some ε > 0, so V(g) = R^n, and we are done. So we will suppose that there are binding constraints, i.e.,(71) the set I = {1, ..., p} is nonempty.

b) For a proof by contradiction, suppose the Tangency-Path Criterion (118) is not true, so there exists a z ∈ L(g) \ V(g). Now V(g) = cl(ch(T_ū C(g))) is a closed convex cone, not equal to R^n since it does not contain z; so by standard separating hyperplane theorems(72) there exists a q ∈ R^n such that:

q ≠ 0 (119a)

q·u < 0 for all u ∈ V(g) with −u ∉ V(g) (119b)

q·u = 0 for all u ∈ V(g) with −u ∈ V(g) (119c)

q·z > 0 (119d)

q·q = 1. (119e)

We define the hyperplane H_q by:

H_q = {u ∈ R^n : q·u = 0}, (120)

so q is orthogonal to H_q. And we define the subspace orthogonal to H_q:

Q = {tq : t ∈ R}, (121)

so R^n is the direct sum:

R^n = H_q ⊕ Q. (122)

c) In what follows we shall define a differentiable function f : R^n → R that is maximized on C(g) at ū = 0, and for which f′(0) = q. Then by Lagrange

(69) See definition page 10.

(70) I is defined in (16).

(71) See the definition of binding constraint, page 9, and the notation established in (16) and (17).

(72) Cf. [27], p. 315, Theorem 2.7.


inequality-regularity there must exist λ_i ≥ 0 such that f′(0) = −Σ_{i=1}^p λ_i g^i′(0), so:

q·z = f′(0)·z (123a)
    = −Σ_{i=1}^p λ_i g^i′(0)·z (123b)
    ≤ 0 (by (99), since z ∈ L(g)), (123c)

which contradicts (119d).

d) Finding a function f with the properties mentioned in (c) is equivalent to finding a function f : R^n → R that is differentiable at 0, has f_u(0) = q, and is maximized on C(g) at 0. That is because the matrix corresponding to the linear transformation f_u(0) is represented in the standard basis of R^n by the Jacobian matrix of f.(73)

e) Our intuition in defining f is simple. If all of C(g) lies "below" the hyperplane H_q, then f(x + tq) = t clearly satisfies the requirements of part (c) above. Now properties (119b,c) almost imply that V(g), hence its subset C(g), lies below the hyperplane H_q. However they do allow C(g) itself to rise "above" H_q, though only "gradually," much as the function y = x² rises above the x-axis. So we take as the graph of our function f (through the function γ̃ below) the "upper boundary" of C(g), and show that it has (like the function x²) a zero derivative.

f) Next, some definitions:

B_k = {u ∈ U : ‖u‖ ≤ 1/2^k}

C_k = C(g) ∩ B_k (124)

H_q^k = {u ∈ H_q : ‖u‖ ≤ 1/2^k}.

g) To define the function f we first define, for each k ∈ N, a function γ^k : H_q → R ∪ {−∞}; then we change γ^k into γ̃^k so that its values belong to the closure of the ball B_k; finally, we define f in terms of γ̃^k.

(73) Cf. [40, Theorem 9.17, page 215].


We define, for each k ∈ N, the function γ^k : H_q → R ∪ {−∞} by:(74)

γ^k(x) = { sup{t : x + tq ∈ C_k},  if ∃t [x + tq ∈ C_k]
         { −∞,                     otherwise.               (125)

Because C_k ⊆ B_k, the values of γ^k are either finite or −∞. We note that:

γ^k(x) ≥ t for all x + tq ∈ C_k. (126)

To bound the function below, we define:

γ̃^k(x) = max{0, γ^k(x)} for all x ∈ H_q, (127)

so:

γ̃^k ≥ γ^k and γ̃^k ≥ 0. (128)

h) Next we show that for all large enough k ∈ N:

γ̃^k(0) = 0. (129)

First we note that for large enough k ∈ N:

¬∃t [0 < t ∈ R & tq ∈ cl(C_k)]. (130)

Otherwise there would exist a sequence of points x^k + t^k q ∈ C_k with t^k > 0 and:

‖x^k‖/‖t^k q‖ < 1/2^k → 0 as k → ∞, (131)

and x^k + t^k q → 0 (since x^k + t^k q ∈ C_k). Then by definition (92a) and the

compactness of the unit ball, there is a subsequence (we use the same index k

for convenience) on which:

(x^k + t^k q)/‖x^k + t^k q‖ → some w ∈ T₀C(g) as k → ∞. (132)

Since w ∈ T₀C(g) ⊆ V(g), properties (119b,c) imply:

q·w ≤ 0. (133)

But using (131) we see that, in the maximum norm, for large enough k:

(x^k + t^k q)/‖x^k + t^k q‖ − (x^k + t^k q)/‖t^k q‖ → 0, and (x^k + t^k q)/‖t^k q‖ = x^k/‖t^k q‖ + q/‖q‖ → q/‖q‖, (134)

(74) We use the supremum rather than the maximum because the sets C_k are not necessarily closed. That is because the constraint functions g^i are only assumed to have partial derivatives, and only at the origin, so they are not necessarily continuous in any neighborhood of 0.


so w = q/‖q‖. Then (133) implies q·q ≤ 0, so q = 0, contradicting (119a) and completing the proof of (130). So for all large enough k ∈ N we have:

tq ∈ cl(C_k) ⇒ t ≤ 0, (135)

and therefore (129) holds.

i) Now we show that, for all sufficiently large k ∈ N:

γ̃^k is differentiable at 0 ∈ H_q and dγ̃^k(0) = 0. (136)

Let x^i ∈ H_q with 0 ≠ ‖x^i‖ → 0 as i → ∞. We must show that:

|γ̃^k(x^i) − γ̃^k(0)| / ‖x^i − 0‖ → 0 ∈ R, (137)

which by (128) and (129) means:

γ̃^k(x^i)/‖x^i‖ → 0 as i → ∞. (138)

It suffices to consider infinite subsequences of all the i's for which γ̃^k(x^i) > 0, hence γ̃^k(x^i) = γ^k(x^i). And it suffices to show that any such "positive" subsequence itself has a subsequence that satisfies (138). Now on any subsequence of a positive subsequence (we'll use "j" to remind us) we have:

0 < γ̃^k(x^j) = γ^k(x^j) → 0 as j → ∞, (139)

since otherwise this subsequence itself would have a subsequence converging to some t > 0; but then (by the definition (125)) there would exist x^j + t^j q ∈ C_k converging to tq ∈ cl(C_k), contradicting (130).

Now (139) implies:

0 ≠ ‖x^j + γ^k(x^j)q‖ → 0 as j → ∞, (140)

so by compactness of the unit ball, there is a subsequence of the j's on which we have convergence:

(x^j + γ^k(x^j)q)/‖x^j + γ^k(x^j)q‖ → some w as j → ∞. (141)

We will show that (138) holds on any such subsequence (141). First we note that:

w ∈ T₀C(g). (142)

In view of (140) and (141), that would be immediate from the definition (92) if x^j + γ^k(x^j)q ∈ C_k; but the definition (125) only requires that γ^k(x^j) be the sup


of t for which x^j + tq ∈ C_k.(75) Nevertheless, it follows from (125) and (141) that there are t^j with t^j > 0, x^j + t^j q ∈ C_k, ‖x^j + t^j q‖ → 0, and:

(x^j + t^j q)/‖x^j + t^j q‖ → w, (143)

so (142) holds.

From (142) and (119b,c) we see that:

0 ≥ q·w. (144)

Then (141) and (144) imply:

0 ≥ q·w = lim_{j→∞} q·(x^j + γ^k(x^j)q)/‖x^j + γ^k(x^j)q‖ (since x^j ∈ H_q)
        = lim_{j→∞} γ^k(x^j) q·q/‖x^j + γ^k(x^j)q‖
        = lim_{j→∞} γ^k(x^j)/(‖x^j‖ + ‖γ^k(x^j)q‖) (using (119e) and the sum norm)
        ≥ 0 (since γ^k(x^j) > 0), (145)

which implies (138).

j) Now we define f : R^n → R by:(76)

f(u) = t − γ̃^K(x) for all u ∈ R^n with u = x + tq & x ∈ H_q & t ∈ R, (147)

where K is any k so large that (129) and (136) hold, as in parts (h) and (i). Clearly 0 ∈ R^n maximizes f on C_K, since f(0) = 0 by (147) and (129), and if x + tq ∈ C_K ⊆ C(g) then f(x + tq) ≤ 0 by (147), (126), and (128).

By our argument in parts (c) and (d) above, it only remains to show that f is differentiable at the origin 0 ∈ R^n, with f_u(0) = q. And for that it suffices to show:

f(x + tq) − f(0) = f_u(0)·(x + tq) + o(‖x + tq‖)
               = q·(x + tq) + o(‖x + tq‖). (148)

(75) See footnote 74, page 45.

(76) If we wanted f to have a unique maximum at the origin, we could instead define it by:

f(u) = t − γ̃^K(x) − ‖x‖², where ‖·‖ is the Euclidean norm. (146)


Since f(0) = 0 and q·x = 0 for x ∈ H_q, this becomes:

f(x + tq) = t + o(‖x + tq‖), (149)

i.e.,

γ̃^K(x)/‖x + tq‖ → 0 as ‖x + tq‖ → 0. (150)

Using the sum norm,(77) this becomes:

γ̃^K(x)/(‖x‖ + ‖tq‖) → 0 as ‖x + tq‖ → 0, (151)

which follows immediately from (136). ∎

While one could combine Theorems 3 and 4 into an "if and only if" theorem, we have separated them to make the proofs more readable. Because the remaining results are corollaries of these theorems, we state them in symmetric fashion.

Theorem 5. (Equalities.) Let U be an open subset of R^n, and let ū ∈ U. Suppose h : U → R^k. Then h is Lagrange equality-regular for (ū, F_D(ū))(78) if and only if g(h) satisfies the Tangency-Path Criterion at ū. In other words, if ū maximizes f : U → R subject to h = 0:

h(ū) = 0 (152a)
∀u [u ∈ U & h(u) = 0 ⇒ f(u) ≤ f(ū)], (152b)

and if f is differentiable at ū, then there exists a μ ∈ R^k such that:

f′(ū) + μᵀ h′(ū) = 0, (153a)

i.e., there exist real numbers μ₁, ..., μ_k such that:

∂f/∂u_i(ū) + μ₁ ∂h¹/∂u_i(ū) + ⋯ + μ_k ∂h^k/∂u_i(ū) = 0 (i = 1, ..., n), (153b)

if and only if g(h) satisfies the Tangency-Path Criterion(79) at ū.

(77) See (4c), page 6.

(78) See definition page 11.

(79) Page 37.


Proof. This follows immediately from Theorems 3 and 4, by converting the equalities to inequalities, as in (97). ∎

And finally we have the most general path theorem:

Theorem 6. (Equalities and inequalities.) Let U be an open subset of R^n, and let ū ∈ U. Suppose g : U → R^m and h : U → R^k. Then (g, h) is Lagrange regular for (ū, F_D(ū)), i.e., for functions f differentiable at ū, if and only if g(h) satisfies the Tangency-Path Criterion(80) at ū. In other words, if ū maximizes f : U → R subject to g ≥ 0 and h = 0:

g(ū) ≥ 0 & h(ū) = 0 (154a)
∀u [u ∈ U & g(u) ≥ 0 & h(u) = 0 ⇒ f(u) ≤ f(ū)], (154b)

and if f is differentiable at ū, then there exist a λ ∈ R^m and a μ ∈ R^k such that:

f′(ū) + λᵀ g′(ū) + μᵀ h′(ū) = 0, (155a)

i.e.,

∂f/∂u_i(ū) + λ₁ ∂g¹/∂u_i(ū) + ⋯ + λ_m ∂g^m/∂u_i(ū) + μ₁ ∂h¹/∂u_i(ū) + ⋯ + μ_k ∂h^k/∂u_i(ū) = 0 (i = 1, ..., n), (155b)

and such that:

λ ≥ 0 (156a)
λ_i = 0 if i ∈ J, (156b)

if and only if g(h) satisfies the Tangency-Path Criterion(81) at ū.

Proof. This again follows from Theorems 3 and 4, by converting equalities to inequalities, as in (97). ∎

(80) Page 37.

(81) Page 37.


VIII Comparison of Jacobian and Tangency-Path Conditions

The Tangency-Path Criterion has the advantage of characterizing Lagrange regularity (Theorems 3, 4, 5, and 6). But it is an existential assertion - a statement that, for every vector z of a certain kind, there exists a path whose initial tangent is z, and which attends(82) the constraint set. And that may be hard to verify in particular instances.

On the other hand, the Jacobian Criteria have the advantage of being computable, in a sense we will describe below. But they do not completely characterize Lagrange regularity. (See the remarks concerning (90) and (91), page 34.) Being minimal conditions within the class of Jacobian conditions (as in Theorems 2.A, 2.B, and 2.C) is not the same as being necessary as well as sufficient for Lagrange regularity of constraint functions.

As to computability, let us explain what we mean when we say that the Jacobian criteria are computable. The idea is simple: the criteria can be applied using only simple rules and elementary arithmetic operations (addition, multiplication, and division). More specifically, beginning with a particular Jacobian matrix - h′(ū), for example - we know rules, or algorithms, from algebra that enable us to calculate the rank of the matrix. If the elements of the matrix were integers, then we could formalize the notion of algorithm and talk of a recursive function (or a Turing machine, or some other standard equivalent computability concept) yielding the result of the calculation. If the elements were rational numbers, we could represent them by pairs of integers, and again talk of computability in terms of recursive functions.
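For rational data the rank computation alluded to here is indeed elementary. A minimal Python sketch (our own illustration, not from the paper), using exact fraction arithmetic so that no roundoff issues arise:

```python
from fractions import Fraction

def rank(mat):
    """Exact rank of a matrix with rational entries, by Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in mat]
    rows, cols = len(m), len(m[0])
    r = 0
    for c in range(cols):
        # find a pivot in column c at or below row r
        piv = next((i for i in range(r, rows) if m[i][c] != 0), None)
        if piv is None:
            continue  # no pivot in this column
        m[r], m[piv] = m[piv], m[r]
        for i in range(r + 1, rows):
            factor = m[i][c] / m[r][c]
            for j in range(c, cols):
                m[i][j] -= factor * m[r][j]
        r += 1
    return r

jacobian_rank = rank([[0, 0, 0], [1, 0, 0]])  # the matrix of (103)
```

Only the field operations and equality tests appear, which is exactly the sense of computability discussed in the text.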

If, on the other hand, some of the matrix elements are irrational numbers, then we are in a context more general than basic recursion theory. While it is possible to formalize notions of computability in such a context,(83) we will be content with a few intuitive observations.

The first problem we face in explaining any notion of computability for real numbers is to determine how the numbers are presented. One approach would consider them to be presented simply as primitive entities sui generis. If we take that approach, and if we take as primitive notions the usual algebraic operations and relations (addition, multiplication, division, equality, and greater-than), then we can "compute" the rank of a matrix, and determine whether the rank has a given value. In any particular instance, then, we can determine, in a "computable" manner, whether or not our Jacobian criterion (35) is true.

(82) Page 36.

(83) Cf. [8] and [38], for example.


Similar remarks apply to the computability of (34a), for example. We know that the existence of a solution to such a system of inequalities can be determined in an algorithmic fashion by "elimination of quantifiers." That is a consequence of Tarski's celebrated theorem on the decidability of real closed fields [45]. In fact, because we are concerned with a system of linear inequalities, there are very simple algorithms (Fourier elimination) for eliminating the quantifiers.(84)

Similar remarks also apply to the computability of (34b). If we write g_I′(ū) as a p × n matrix A, then that wedge property is clearly equivalent to the statement that, for each vector e^i (i = 1, ..., 2n) that is either a unit coordinate vector or the negative of one, there exists a t ∈ R^p such that

Aᵀ t = e^i, t ≥ 0. (157)

Again, simple Fourier elimination provides an algorithm for determining whether or not (157) is solvable for t.
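To make the claim concrete, here is a small Python sketch (our own illustration; the function names are ours) that decides solvability of Aᵀt = e with t ≥ 0 by exactly this Fourier elimination, in exact rational arithmetic:

```python
from fractions import Fraction

def fm_feasible(ineqs, nvars):
    """Fourier-Motzkin: decide whether {t : c . t <= d for all (c, d)} is
    nonempty, eliminating variables one at a time (exact arithmetic)."""
    for k in range(nvars):
        pos = [r for r in ineqs if r[0][k] > 0]
        neg = [r for r in ineqs if r[0][k] < 0]
        new = [r for r in ineqs if r[0][k] == 0]
        for cp, dp in pos:
            for cn, dn in neg:
                a, b = cp[k], -cn[k]   # combine the pair so t_k cancels
                new.append(([b * cp[j] + a * cn[j] for j in range(nvars)],
                            b * dp + a * dn))
        ineqs = new
    # all variables eliminated: feasible iff every constant row reads 0 <= d
    return all(d >= 0 for _, d in ineqs)

def nonneg_solution_exists(A, e):
    """Is there t >= 0 with A^T t = e?  A is the p x n matrix of (157)."""
    p, n = len(A), len(A[0])
    ineqs = []
    for j in range(n):            # equality column j, split into two inequalities
        col = [Fraction(A[i][j]) for i in range(p)]
        ineqs.append((col, Fraction(e[j])))
        ineqs.append(([-c for c in col], -Fraction(e[j])))
    for i in range(p):            # t_i >= 0, i.e., -t_i <= 0
        c = [Fraction(0)] * p
        c[i] = Fraction(-1)
        ineqs.append((c, Fraction(0)))
    return fm_feasible(ineqs, p)
```

Applied to the matrix of (103), for instance, it reports that Aᵀt = (1,0,0) has a nonnegative solution while Aᵀt = (0,1,0) does not. (Fourier-Motzkin can blow up combinatorially on large systems, but for the small systems arising from (157) it is entirely practical.)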

The same kind of computability arguments we have applied to (35) and (34) can be applied to show that our other Jacobian criterion (33) is a computable condition. By contrast, it does not seem likely that such simple algorithms exist to determine whether the Tangency-Path Criterion holds. Of course, to prove that would require a formal definition of computability, taking us beyond the scope of the present paper.

(84) Cf. [29], [43].


IX Appendix

Our understanding of constraint qualifications, and our proofs of Lagrange regularity rest on one algebraic theorem and one analysis theorem. The algebraic theorem is a theorem of the alternative, or a transposition theorem.

Theorem of the Alternative. Let A, B, and C be matrices whose components are from R. Suppose that

A has a rows and n columns, and α ∈ R^a,
B has b rows and n columns, and β ∈ R^b,
C has c rows and n columns, and γ ∈ R^c,

where 0 < a ∈ N, 0 < b ∈ N, 0 < c ∈ N, and 0 < n ∈ N.

Then:

I) Exactly one of (1) or (2) is true:

1) There exists r = (r₁, ..., r_n) ∈ R^n solving:

Ar > α
Br ≥ β
Cr = γ.

2) There exist u ∈ R^a, v ∈ R^b, and z ∈ R^c, such that both (a) and (b) hold:

a) uᵀA + vᵀB + zᵀC = 0

b) either (i) or (ii) holds:

i) u ≥ 0 & u ≠ 0 & v ≥ 0 & uᵀα + vᵀβ + zᵀγ ≥ 0

or

ii) u = 0 & v ≥ 0 & vᵀβ + zᵀγ > 0.

II) When some, but not all, of (A, α) or (B, β) or (C, γ) are not present, then the same alternatives (1) and (2) in part (I) above hold with these modifications:

In (I,1): remove the first row if A and α are not present; remove the second row if B and β are not present; remove the third row if C and γ are not present.

In (2,a): set A = 0 and u = 0 if A and α are not present; set B = 0 and v = 0 if B and β are not present; set C = 0 and z = 0 if C and γ are not present.


In (2,b): only case (ii) occurs if A and α are not present; set β = 0 and v = 0 if B and β are not present; set γ = 0 and z = 0 if C and γ are not present.

The theorem can be proved from the Transposition Theorem of Motzkin [36], [35, pp. 28(2)-29], which is a homogeneous version (equivalent to setting α = 0, β = 0, and γ = 0). A proof by Fourier elimination, along the lines of [43, pp. 1-20] or [29], yields a proof of the remark above concerning ordered fields and subfields.

The other pillar of our approach is an implicit function theorem with weaker than standard differentiability hypotheses. This is a special case of Theorem 1 of [24].

A Non-C¹ Implicit Function Theorem. Let X × Y be an open subset of R^n × R^k and (x̄, ȳ) ∈ X × Y. Suppose ψ : X × Y → R^k is differentiable at (x̄, ȳ), and suppose that:

ψ(x̄, ȳ) = 0; (158a)

ψ(x, ·) is continuous on Y, for all x ∈ X; (158b)

ψ_y(x̄, ȳ) is surjective, i.e.,

det | ∂ψ¹/∂y₁(x̄, ȳ)  ⋯  ∂ψ¹/∂y_k(x̄, ȳ) |
    |       ⋮                 ⋮         | ≠ 0. (158c)
    | ∂ψ^k/∂y₁(x̄, ȳ) ⋯  ∂ψ^k/∂y_k(x̄, ȳ) |

Then:

a) There exist an open neighborhood X₀ × Y₀ ⊆ X × Y of (x̄, ȳ) and a function φ : X₀ → Y₀ such that:

ψ(x, φ(x)) = 0 for all x ∈ X₀ (159a)

φ(x̄) = ȳ. (159b)

b) Every function φ : X₀ → Y₀ satisfying (159) is differentiable at x̄, with:

φ′(x̄) = −(ψ_y(x̄, ȳ))⁻¹ ψ_x(x̄, ȳ). (160)
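Formula (160) is easy to sanity-check on a concrete smooth example (our own illustration, not from the paper): take ψ(x, y) = x² + y² − 1, so near (0.6, 0.8) the implicit function is φ(x) = √(1 − x²), with slope −x/y:

```python
def psi(x, y):
    # unit circle: psi(x, y) = 0 defines y implicitly as a function of x
    return x * x + y * y - 1.0

def implicit_slope(x0, y0, eps=1e-6):
    """phi'(x0) from (160): -(psi_y)^(-1) psi_x, here in one dimension,
    with the partials estimated by central differences."""
    psi_x = (psi(x0 + eps, y0) - psi(x0 - eps, y0)) / (2 * eps)
    psi_y = (psi(x0, y0 + eps) - psi(x0, y0 - eps)) / (2 * eps)
    return -psi_x / psi_y

slope = implicit_slope(0.6, 0.8)  # analytically -0.6/0.8 = -0.75
```

Of course this example is C¹, so it only illustrates the formula, not the theorem's weakened hypotheses (differentiability of ψ at the single point (x̄, ȳ) plus continuity of ψ(x, ·)).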


X Historical Comments and Comparisons

The beginnings. Lagrange illustrated for one equality constraint, and suggested for multiple constraints, a "general principle":(85)

When a function of several variables is to have a maximum or a min­imum, and when there are one or more equations among these variables, it will suffice to add to the proposed function the functions which must vanish, each multiplied by an undetermined quantity, and then to seek the maximum or minimum as if the variables were independent; the equa­tions that one will find, combined with the given equations, will serve to determine all the unknowns.

Lagrange refers to maximizing or minimizing what we now call the Lagrangean function, but his analytics deal only with first order conditions. Modern theory embodies the maximization (or minimization) view in duality saddle point theorems, when certain convexity assumptions hold. In particular, there is now a bifurcation in the theory of constrained maximization. On the one hand, assuming these convexity properties, theorems (such as Kuhn and Tucker's Theorem 3 [30]) assert that constrained maxima correspond to saddle points of the Lagrangean function. These theorems do not generally require differentiability hypotheses (cf. Uzawa's [46]). On the other hand, a second class of results obtains the existence of Lagrange multipliers through assuming differentiability properties but avoiding convexity assumptions. This second class of results provides an important working tool for applied mathematics, but is of particular interest to economists, since by avoiding convexity hypotheses it permits the analysis of such phenomena as increasing returns to scale.

Euler [14] has been credited ([9], [11], [17], [28]) with originating a principle ("Euler's rule"), precursor of the Lagrange approach, for extremizing functions subject to constraints. In the context of the isoperimetric problem of the calculus of variations, Euler's rule states that minimizing ∫_{t₀}^{t₁} (F + λG) dt for some multiplier λ yields the same first order conditions as minimizing ∫_{t₀}^{t₁} F dt subject to ∫_{t₀}^{t₁} G dt = constant. But according to Carathéodory [11, p. 177], Lagrange was the first to recognize the fundamental significance of the parameters that came to be known as the Lagrange multipliers.

(85) Our translation of the end of Section 58, in Chapter XI of the Second Part of [32]. An earlier exposition of the idea is contained in [19], QUATRIÈME SECTION, paragraphs 1-8, pages 44-49.


Modern treatments of the Lagrange Multiplier Theorem for equalities are found in Bolza [9],(86) Carathéodory [11], and Bliss [6], [7], as well as in many recent textbooks. An earlier proof, applicable to real analytic functions, is found in Weierstrass [49].

Classical, Karush, and Kuhn-Tucker approaches. In order to establish existence of Lagrange multipliers, modern treatments of equality-constrained optimization, such as those mentioned above, assume the constraint functions are C¹ and have a Jacobian of maximum rank. The C¹ hypothesis has been needed in order to apply the classical implicit function theorem.

Hancock [20, p. 150] suggested handling constrained maximization problems with inequality constraints ("limitations") by converting them into equality problems. A constraint g(x) ≤ 0 would be replaced by the equality g(x) + z² = 0.

The same conversion device was used by Valentine [47] in the treatment of a calculus of variations minimization problem with inequality side conditions. In Corollary 3:4, p. 9 (of the dissertation) and p. 415 (of the reprinting), he shows that the Lagrange coefficients for the inequality constraints are nonpositive (corresponding to nonnegative coefficients in maximization problems).

Karush [26] also used the squared slack variable device to prove one of his results (cf. his Theorem 3:1, pp. 11-13). His remarks noted that, with a "normality" condition and C² constraints, the Lagrange coefficients would be nonpositive in his minimization problem.(87)

Karush's work went unnoticed for many years, although it contains most of the basic concepts and many of the results of later work. His Theorem 3:2, for example, uses what he calls "Condition Q," today called the "Kuhn-Tucker Constraint Qualification," to obtain the existence of Lagrange multipliers of appropriate sign. Condition Q also marks a transition from Jacobian rank conditions to path conditions. And his Theorem 3:3 shows that a certain positivity Jacobian condition is sufficient for his constraint qualification to hold; this condition is essentially the same as our (34a) and as the inequality in Arrow-Hurwicz-Uzawa's Theorem 3. And he notes its computability by observing that a standard algebraic theorem "... provides a useful method for determining in a finite number of steps whether or not such an admissible vector ... does exist" [26, p. 11]. Although Karush assumes that his inequality constraint functions are

(86) Bolza's formulation makes clear that C¹ is a sufficient smoothness condition for Lagrange regularity. Earlier contributions appear to postulate real-analyticity.

(87) Neither Valentine nor Karush seems to refer to Hancock, and Karush does not refer to Valentine.


C1, it is clear from his proof that he only uses differentiability at the maximizing point.

Fritz John [25] formulated first order necessary conditions for a minimum of a C1 function f of finitely many variables subject to a (finite or infinite) set of C1 inequality constraints g_α ≥ 0 (α ∈ A). He showed that there exists a vector (λ_0, λ_{α_1}, ..., λ_{α_m}) ≠ 0 with λ_0 ≥ 0 and λ_{α_1}, ..., λ_{α_m} ≥ 0, such that, at a minimizing point x̄:

    ∂L/∂x_i (x̄) = 0        (i = 1, ..., n),        (161)

where L is defined by:(88)

    L(x) = λ_0 f - (λ_{α_1} g_{α_1} + ... + λ_{α_m} g_{α_m}).        (162)

Since he did not postulate any constraint qualifications, it may happen that λ_0 = 0. (By contrast, Karush and Kuhn and Tucker, assuming constraint qualifications, are able to obtain λ_0 = 1.)(89)
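A minimal numerical illustration (ours, not taken from John's paper) of why λ_0 may be forced to zero: minimize f(x) = x subject to g(x) = -x² ≥ 0, whose only feasible point is x = 0. There ∇f ≠ 0 but ∇g = 0, so the stationarity equation can only hold with λ_0 = 0.

```python
# Fritz John conditions for:  minimize f(x) = x  s.t.  g(x) = -x**2 >= 0.
# The feasible set is {0}, so x_bar = 0 is the minimizer.

def f_prime(x):
    return 1.0           # derivative of f(x) = x

def g_prime(x):
    return -2.0 * x      # derivative of g(x) = -x**2

x_bar = 0.0

# The multipliers (lambda0, lambda1) = (0, 1) are nonnegative, not both
# zero, and satisfy  lambda0 * f'(x_bar) - lambda1 * g'(x_bar) = 0:
lambda0, lambda1 = 0.0, 1.0
stationarity = lambda0 * f_prime(x_bar) - lambda1 * g_prime(x_bar)
print(stationarity)      # 0.0

# With lambda0 = 1 (the Karush / Kuhn-Tucker normalization) the equation
# 1 * f'(0) - lambda1 * g'(0) = 1 fails for every lambda1, so no
# constraint-qualified multiplier exists for this problem.
```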

As far as first order necessary consequences of constrained maximization are concerned, Kuhn and Tucker's paper [30] independently retraced part of the path traveled earlier by Karush. Their Theorem 1 is similar to Karush's Theorem 3:2, although Kuhn and Tucker are interested in nonnegative solutions of the inequality constraints, and they provide more information about the Lagrange multipliers (analogous to our condition (20b)); they also reduced the C1 hypothesis on constraints to differentiability. Kuhn later became aware of Karush's work and earlier references to it ([13], [37], [44], et al.), and related research by others. See [31] for his very informative historical account, and some general comments on the several quite divergent interests that converged in similar theorems. Because they use a constraint qualification, they are able to obtain a result for a Lagrangean of the form f(x) + λ_1 g_1(x) + ... + λ_m g_m(x), as distinct from λ_0 f(x) + λ_1 g_1(x) + ... + λ_m g_m(x), which would be the Fritz John analogue.

Kuhn and Tucker only assumed differentiability of the constraint functions, in contrast to the C1 assumptions of Karush and Fritz John. Their differentiability assumption remains stronger, however, than required by our Tangency-Path Criterion.

(88) For the case of equality constraints, Bolza [9, p. 547] attributes to Hilbert a similar Lagrangean expression, with λ_0 multiplying the maximand and equal to 1 or 0 according to the rank of the constraint's Jacobian.

(89) In his Theorem 3:1, Karush also had a similar result, in the absence of any constraint qualification.


Arrow, Hurwicz, and Uzawa's Constraint Qualification W [4] weakened the Kuhn-Tucker constraint qualification. Their Theorem 1 proved that W is sufficient for Lagrange regularity, and their Theorem 2 proved that W is necessary for Lagrange regularity when the constraint set is convex.

Pennisi [37] considered maximization constrained by both inequalities and equalities. He obtained what we call Lagrange regularity, implicitly assuming all inequality constraints to be effective (satisfied with equality at the maximizing point), and assuming "normality" in the sense of Bliss.

Mangasarian and Fromovitz [34] also addressed maximization problems with both equality and inequality constraints. Without a constraint qualification, they obtained(90) an analogue of the Fritz John result, in which the coefficient of the maximand is allowed to be zero. Then, they introduced a new Jacobian-type constraint qualification, combining a condition of Arrow, Hurwicz, and Uzawa for inequality problems with the classical rank condition for equality problems; their new condition guaranteed that the coefficient of the maximand could be taken equal to one. In both results, the constraints were assumed to be continuously differentiable.

In Mangasarian [35, p. 173, Theorem 6, part (iv)] the second result of [34] (under the "modified Arrow-Hurwicz-Uzawa constraint qualification") is obtained with the hypothesis on the inequality constraints reduced from continuous differentiability to differentiability. In our Theorem 1.A above it is reduced still further to differentiability and local continuity at the constrained maximizing point.

We have mentioned how our results compare with a few of those above. For more detailed comparisons, we will distinguish two categories: (a) Jacobian conditions, and (b) path conditions.

(a) Jacobian criteria. Here the main innovations are three-fold. First, we reduce the smoothness requirements on equality constraints from C1 in a neighborhood to differentiability at the maximizing point and continuity in a neighborhood (cf. Theorem 1(A,B), and Corollary 1A). In this way the hypotheses for equality-constrained problems become more like the hypotheses for inequality-constrained problems mentioned below.(91) The weaker smoothness condition is made possible by the Non-C1 Implicit Function Theorem.(92)

(90) See [34, pp. 41-42].

(91) This is in the spirit of the differentiability hypotheses of Theorem 3 and Corollary 6 of [4].

(92) Which, in fact, was developed with these applications in mind [24].


Second, we introduce the notion of minimal Jacobian conditions. And third, we provide new Jacobian criteria and prove they are minimal.

Theorem 1.A uses a Jacobian Criterion weaker than earlier Jacobian constraint qualifications, and it also relaxes the requirement that h be C1 to mere differentiability at the maximum point and continuity locally, as indicated above. It is a generalization of Mangasarian's Theorem 6(iv), page 173 of [35] in several respects. First, our Mixed-Problem Jacobian Criterion (33a) is weaker: part (a) of Theorem 1.A corresponds to Mangasarian's "modified Arrow-Hurwicz-Uzawa constraint qualification," but if (a) is not satisfied, then our part (b) provides another alternative (which, according to Theorem 2.A, makes the combination the weakest possible Jacobian condition).

Kuhn and Tucker are explicit about the need for a constraint qualification, giving an example ([30, pp. 483-484]) in which Lagrange regularity fails in its absence. Our examples (14) and (15), for which Lagrange multipliers may not exist, are similar in spirit. Slater [42] also had an example illustrating the need for a constraint qualification; although it was proposed in the context of concave programming and the Saddle Point Equivalence Theorem, rather than Lagrange regularity, it is applicable in the present setting as well.

Theorem 1.B uses a Jacobian Criterion weaker than earlier Jacobian constraint qualifications for inequality-constrained problems, and it only requires Gateaux differentiability at u. If we consider Mangasarian's Theorem 6(iv), [35, p. 173], when it is restricted to the case of inequality constraints, then our Theorem 1.B has a weaker constraint qualification (Mangasarian's "modified Arrow-Hurwicz-Uzawa constraint qualification" again amounts to part (a) of our Criterion).

Theorem 1.C generalizes the classical Lagrange Multiplier Theorem by reducing the traditional C1 hypothesis on the constraint functions to mere differentiability at the constrained maximizer.

The paper introduces the concept of Lagrange regularity for matrices, and uses it in Theorems 2.A, 2.B, and 2.C to establish that the Jacobian Criteria are "minimal" Jacobian constraint qualifications. We are not aware of earlier results along these lines.

(b) Path criteria. We developed our Tangency-Path Criterion(93) based in part on Hestenes' use of the tangent cone ([21, pp. 25 ff.], [22, pp. 203 ff.]) and in part by analogy with Constraint Qualification W of Arrow, Hurwicz, and Uzawa [4]. Subsequently we discovered that the paper by Gould and Tolle [18]

(93) Page 37.


contains a closely related, but stronger constraint qualification. Although their condition is stated in terms of dual (polar) cones, and although it is stated for both equalities and inequalities, if we state it in our terminology and express the equalities through pairs of inequalities, it amounts to L(g) = V(g), in contrast to our weaker condition L(g) ⊆ V(g). Furthermore, they assume that the constraint functions are continuous in a neighborhood of the constrained maximum and are differentiable at the constrained maximum, while we only require existence of partial derivatives at the constrained maximum and impose no continuity requirements beyond that.

By itself, the inclusion condition (98b) is weaker than Gould and Tolle's equality condition. This can be seen from the following example:(94)

    g^1(x_1, x_2, y) = { y,    for x_1 = 0
                       { -1,   otherwise

    g^2(x_1, x_2, y) = { -y,   for x_2 = 0
                       { -1,   otherwise                    (163)

This pair (g^1, g^2) fails to satisfy the Frechet differentiability and continuity conditions that Gould and Tolle impose, as well as their L(g) = V(g) constraint qualification. Nevertheless, it is partially differentiable at the origin (0,0,0), and satisfies our criterion since L(g) ⊆ V(g), and so by our Theorem 3 it is Lagrange inequality-regular (as a direct proof also shows). However, if one imposed Gould and Tolle's stronger continuity and differentiability assumptions(95) on constraints, then our subset condition would imply their equality condition (by [1, Lemma 4]).
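The key features of the example, as we have transcribed the display (163) from the original, can be checked numerically: each function is discontinuous at the origin (hence not Frechet differentiable there), yet the partial derivative in y exists.

```python
# The pair from (163), as transcribed: g1 = y on the plane x1 = 0 and
# -1 off it; g2 = -y on the plane x2 = 0 and -1 off it.

def g1(x1, x2, y):
    return y if x1 == 0 else -1.0

def g2(x1, x2, y):
    return -y if x2 == 0 else -1.0

# Discontinuity at the origin: g1(0,0,0) = 0, yet g1 jumps to -1 at
# arbitrarily close points with x1 != 0.
print(g1(0.0, 0.0, 0.0))     # 0.0
print(g1(1e-9, 0.0, 0.0))    # -1.0

# Along the y-axis, g1(0,0,t) = t, so the partial derivative in y at
# the origin exists and equals 1 (and similarly -1 for g2).
t = 1e-6
dq = (g1(0.0, 0.0, t) - g1(0.0, 0.0, 0.0)) / t
print(dq)                    # 1.0
```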

Despite the stronger hypotheses of Gould and Tolle, and the more demanding constraint qualification, the proofs we had developed for our Theorems 3 and 4 turned out to be similar to corresponding parts of Gould and Tolle's proof. Indeed, it appears that parts of their proofs could be used to yield alternative proofs of Theorems 3 and 4.

Reducing smoothness requirements and weakening the constraint qualifications are really two sides of the same coin. Under the weak partial derivative hypothesis, the weak constraint qualification (the Tangency-Path Criterion) is necessary and sufficient for Lagrange regularity (Theorems 3 and 4 above); under the stronger differentiability hypothesis, the stronger constraint qualification is necessary and sufficient for Lagrange regularity (Gould and Tolle's Theorem).

(94) See also (102), p. 39, and (104), p. 39.

(95) Stronger than the existence of partial derivatives postulated in the Tangency-Path Criterion.


The tangent cone was applied to optimization problems by Hestenes [21], [22], Abadie [1], and Varaiya [48]. It was defined by Bouligand [10, paragraph 68, pp. 65-66] (as the contingent set).

To the best of our knowledge, the Tangency-Path Criterion, in its present form, is new. As a sufficient condition (Theorem 3) for Lagrange regularity for inequalities (hence, by conversion, also equalities), it is weaker than any of the previously proposed constraint qualifications. Furthermore, the Criterion applies to a wider class of constraint functions, because it requires only the existence of partial derivatives, and only at the maximizing point.

Generalized derivative notions. Finally we mention another direction in which one can extend the notion of Lagrange regularity. Clarke uses the concept of generalized gradient to state a "Lagrange Multiplier Rule" [12, Theorem 6.1.1, page 228]. He does not assume the existence of partial derivatives, but he does assume that the functions f, g_i, h_j are "Lipschitz near any given point." This hypothesis is neither stronger nor weaker than ours, and the conclusion is weaker. Instead of the Lagrange Multiplier equality, he obtains the weaker condition 0 ∈ ∂_x L(x, λ, r, s, k), where ∂_x denotes the subdifferential with respect to x.
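A one-variable illustration (ours, not taken from Clarke) of the weaker conclusion:

```latex
% f(x) = |x| is Lipschitz near \bar{x} = 0 but has no derivative there;
% its generalized gradient at 0 is the interval [-1, 1], and the
% minimizer satisfies Clarke's inclusion rather than an equality:
\[
  \partial f(0) = [-1,\,1],
  \qquad
  0 \in \partial f(0),
\]
% whereas the classical stationarity equation f'(\bar{x}) = 0 is
% unavailable.
```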


References

1. J. Abadie, On the Kuhn-Tucker Theorem, in Nonlinear Programming, North-Holland, New York, NY, 1967.

2. K. J. Arrow and L. Hurwicz, Reduction of Constrained Maxima to Saddle-Point Problems, in Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, vol. V, University of California Press, Berkeley and Los Angeles, 1956, 1-20.

3. ———, eds., Studies in Resource Allocation Processes, Cambridge University Press, Cambridge, 1978.

4. K. J. Arrow, L. Hurwicz and H. Uzawa, Constraint Qualifications in Maximization Problems, Naval Research Logistics Quarterly 8 (1961), 175-191.

5. A. Avez, Differential Calculus, John Wiley & Sons, New York, NY, 1986.

6. G. A. Bliss, Normality and Abnormality in the Calculus of Variations, Transactions of the American Mathematical Society 43 (1938), 365-376.

7. , Lectures on the Calculus of Variations, University of Chicago Press, Chicago, 1946.

8. L. Blum, M. Shub and S. Smale, On a Theory of Computation and Complexity over the Real Numbers: NP-Completeness, Recursive Functions and Universal Machines, Bulletin of the American Mathematical Society 21 (1989), 1-46.

9. O. Bolza, Vorlesungen über Variationsrechnung, Chelsea Publishing Company, New York, NY, no date, author's preface dated 1909, second edition.

10. G. Bouligand, Introduction à la Géométrie Infinitésimale Directe, Librairie Vuibert, Paris, 1932.

11. C. Caratheodory, Calculus of Variations and Partial Differential Equations of the First Order, Chelsea Publishing Company, New York, NY, 1982, originally published as Variationsrechnung und Partielle Differentialgleichungen erster Ordnung, B. G. Teubner, Berlin, 1935.

12. F. H. Clarke, Optimization and Nonsmooth Analysis, John Wiley & Sons, New York, NY, 1983.

13. M. A. El-Hodiri, The Karush Characterization of Constrained Extrema of Functions of a Finite Number of Variables, UAR Ministry of Treasury Research Memoranda, Series A, No. 3, July, 1967.


14. L. Euler, Methodus Inveniendi Lineas Curvas, in Leonhardi Euleri Opera Omnia, series prima, XXIV, vol. 34, Swiss Society of Natural Sciences, Bern, 1952, reprinted from Methodus Inveniendi Lineas Curvas, Lausanne & Geneva, 1744.

15. J. P. Evans, On Constraint Qualifications in Nonlinear Programming, Center for Mathematical Studies in Business and Economics, Graduate School of Business, University of Chicago, Report 6917, Chicago, May, 1969.

16. W. Fenchel, Convex Cones, Sets and Functions, Princeton University Department of Mathematics, Princeton, 1953, from notes by D. W. Blackett of lectures at Princeton University, Spring term, 1951.

17. H. H. Goldstine, A History of the Calculus of Variations, Springer-Verlag, New York, NY, 1980.

18. F. J. Gould and J. W. Tolle, A Necessary and Sufficient Qualification for Constrained Optimization, SIAM Journal on Applied Mathematics 20 (1971), 164-172.

19. J. L. de la Grange, Mechanique Analitique, Chez la Veuve Desaint, Libraire, Paris, 1788. See also [33].

20. H. Hancock, Theory of Maxima and Minima, Ginn and Company, Boston, 1917.

21. M. R. Hestenes, Calculus of Variations and Optimal Control Theory, John Wiley & Sons, New York, NY, 1966.

22. , Optimization Theory, John Wiley & Sons, New York, NY, 1975.

23. J. R. Hicks, Value and Capital, Oxford University Press, Oxford, 1939.

24. L. Hurwicz and M. K. Richter, Implicit Functions and Diffeomorphisms without C1, Discussion Paper No. 279, Department of Economics, University of Minnesota, 1994.

25. F. John, Extremum Problems with Inequalities as Subsidiary Conditions, in Studies and Essays: Courant Anniversary Volume, K. O. Friedrichs, O. E. Neugebauer and J. J. Stoker, eds., Interscience Publishers, New York, 1948, 187-204.

26. W. Karush, Minima of Functions of Several Variables with Inequalities as Side Conditions, Master of Science Dissertation, Department of Mathematics, University of Chicago, 1939.

27. V. L. Klee, Jr., Separation Properties of Convex Cones, Proceedings of the American Mathematical Society 6 (1955), 313-318.

28. A. Kneser, Variationsrechnung, Teubner, Leipzig, 1899-1916, originally published as Heft 5, 1904.


29. H. W. Kuhn, Solvability and Consistency for Linear Equations and Inequalities, American Mathematical Monthly 63 (1956), 217-232.

30. H. W. Kuhn and A. W. Tucker, Nonlinear Programming, in Second Berkeley Symposium on Mathematical Statistics and Probability, J. Neyman, ed., University of California Press, 1951, 481-492.

31. H. W. Kuhn, Nonlinear Programming: A Historical View, in Nonlinear Programming, R. W. Cottle and C. E. Lemke, eds., American Mathematical Society, Providence, RI, 1976.

32. J. L. Lagrange, Théorie des Fonctions Analytiques, Courcier, Paris, 1813, Nouvelle édition.

33. ———, Mécanique Analytique, Gauthier-Villars et Fils, Bureau des Longitudes, de l'École Polytechnique, Paris, 1888, in volumes XI and XII of Oeuvres. See also [19].

34. O. L. Mangasarian and S. Fromovitz, The Fritz John Necessary Optimality Conditions in the Presence of Equality and Inequality Constraints, Journal of Mathematical Analysis and Applications 17 (1967), 37-47.

35. O. L. Mangasarian, Nonlinear Programming, McGraw-Hill, New York, NY, 1969.

36. T. S. Motzkin, Beiträge zur Theorie der linearen Ungleichungen, Basle, 1934, (Dissertation).

37. L. L. Pennisi, An Indirect Sufficiency Proof for the Problem of Lagrange with Differential Inequalities as Added Side Conditions, Transactions of the American Mathematical Society 74 (1953), 177-198.

38. M. B. Pour-El and J. I. Richards, Computability in Analysis and Physics, Springer-Verlag, New York, NY, 1989.

39. R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, 1970.

40. W. Rudin, Principles of Mathematical Analysis, McGraw-Hill, New York, NY, 1976, third edition.

41. P. A. Samuelson, Foundations of Economic Analysis, Harvard University Press, Cambridge, MA, 1947.

42. M. Slater, Lagrange Multipliers Revisited: A Contribution to Nonlinear Programming, Cowles Commission Discussion Paper, Math. 403, November, 1950.

43. J. Stoer and C. Witzgall, Convexity and Optimization in Finite Dimensions I, Springer-Verlag, New York, NY, 1970.


44. A. Takayama, Mathematical Economics, The Dryden Press, Hinsdale, IL, 1974.

45. A. Tarski, A Decision Method for Elementary Algebra and Geometry, University of California Press, Berkeley, 1951, Second Edition, Revised.

46. H. Uzawa, The Kuhn-Tucker Theorem in Concave Programming, in Studies in Linear and Non-Linear Programming, K. J. Arrow, L. Hurwicz and H. Uzawa, eds., Stanford University Press, Stanford, 1958, 32-37.

47. F. A. Valentine, The Problem of Lagrange with Differential Inequalities as Added Side Conditions, in Contributions to the Calculus of Variations 1933-1937: Theses Submitted to the Department of Mathematics of the University of Chicago, University of Chicago Press, Chicago, 1937, 403-447.

48. P. P. Varaiya, Nonlinear Programming in Banach Space, SIAM J. Appl. Math. 15 (1967), 284-293.

49. K. Weierstrass, Mathematische Werke, Akademische Verlagsgesellschaft, Leipzig, 1927, edited from notes of Weierstrass' 1875-188 lectures.
