
JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS: Vol. 99, No. 3, pp. 553-583, DECEMBER 1998

Convexity of Quadratic Transformations and Its Use in Control and Optimization¹,²

B. T. POLYAK³

Abstract. Quadratic transformations have the hidden convexity property, which allows one to deal with them as if they were convex functions. This phenomenon was encountered in various optimization and control problems, but it was not always recognized as a consequence of some general property. We present a theory on convexity and closedness of a 3D quadratic image of $\mathbb{R}^n$, $n \ge 3$, which explains many disjoint known results and provides some new ones.

Key Words. Quadratic forms, convexity, numerical range, S-procedure, nonconvex quadratic optimization, ellipsoidal bounding.

1. Introduction

The first assertion on convexity of quadratic maps is due to Toeplitz (Ref. 1). He considered a quadratic form $z^*Az$, where $A$ is an $n \times n$ complex matrix, and proved that the set of values of this form for $z$ belonging to the unit sphere in $\mathbb{C}^n$ (the numerical range of $A$) has a convex boundary; i.e., its intersection with any supporting hyperplane is convex. One year later, Hausdorff (Ref. 2) extended this result by proving that this set is convex. The Toeplitz-Hausdorff theorem can be reformulated as follows: given two Hermitian matrices $A_1$, $A_2$ and the vector function

$$f(z) = ((A_1 z, z), (A_2 z, z))^T,$$

the set $\{f(z) : \|z\| = 1\}$ is convex in $\mathbb{R}^2$. The strongest result in this direction is that such a set is convex for three Hermitian forms; see, e.g., Refs. 3-4. An application of this fact to the so-called $\mu$-analysis, or structured singular-value computation, can be found in Ref. 5. Two monographs (Refs. 6-7) contain various extensions of the numerical range and a wide list of references.

¹This work has been supported by Grants INTAS-93-1034ext and RFFI-96-01-00993.

²The author wishes to thank N. K. Tsing, P. S. Scherbakov, M. Fu, V. A. Yakubovich, A. L. Fradkov, A. Tits, M. Teboulle, A. Nemirovskii, and A. Ioffe for helpful discussions and numerous references.

³Principal Researcher, Institute for Control Sciences, Russian Academy of Sciences, Moscow, Russia.

0022-3239/98/1200-0553$15.00/0 © 1998 Plenum Publishing Corporation

In the present paper, we focus on the real, not complex, case. The first result in this field was obtained by Dines in 1941 (Ref. 8). Consider two real quadratic forms

$$f_i(x) = (A_i x, x), \quad i = 1, 2,$$

where $A_i$ are real symmetric matrices and $x \in \mathbb{R}^n$. Dines proved that the 2D image of $\mathbb{R}^n$, that is, the set

$$D = \{(f_1(x), f_2(x))^T : x \in \mathbb{R}^n\} \subset \mathbb{R}^2,$$

is convex, and it is closed as well under some additional assumptions. The next contribution is due to Brickman (Ref. 9): if $n \ge 3$, then the image of the unit sphere, that is, the set

$$B = \{(f_1(x), f_2(x))^T : \|x\| = 1\} \subset \mathbb{R}^2,$$

is convex. The same statement was obtained independently in Ref. 10, Corollary 3 to Lemma 3.3. Some deep links of this result can be found in Ref. 11.

In Section 2, we provide a general result which extends the Dines and Brickman theorems. Namely, we prove that the 3D quadratic image of $\mathbb{R}^n$,

$$F = \{(f_1(x), f_2(x), f_3(x))^T : x \in \mathbb{R}^n\} \subset \mathbb{R}^3,$$

is convex provided there exists a positive-definite linear combination of the quadratic forms. Another statement relates to the image of a space under nonhomogeneous quadratic maps. In Section 3, we discuss all the assumptions of the results presented and construct numerous examples and counterexamples. Section 4 is devoted to the S-procedure, which deals with the nonnegativity of a quadratic form on a set specified by quadratic inequalities. The first result of this sort is the Finsler theorem, sometimes also known as the Debreu lemma (Ref. 12); it claims that, if $(A_1 x, x) > 0$ for all $x \ne 0$ such that $(A_2 x, x) = 0$, then there exists $t$ such that $A_1 + tA_2$ is positive definite. Extensions to the case of $m$ quadratic equalities are due to Hestenes and McShane (Ref. 13); other extensions and references can be found in Ref. 14. The S-procedure was developed and exploited by Yakubovich and his students (Refs. 10, 15, 16); it turned out to be a powerful tool in system and control applications (Ref. 17). Yakubovich emphasized the role of convexity in the validation of the S-procedure. We show that the more general convexity results presented in this paper allow one to extend the applications of the S-procedure; for instance, the S-procedure can be validated for two quadratic forms under an additional assumption. Then, we exploit the technique for ellipsoidal approximation in Section 5. For instance, we provide necessary and sufficient conditions for an ellipsoid to contain the intersection or the sum of two ellipsoids and find the smallest ellipsoid. Such problems are encountered often in identification and state estimation with unknown-but-bounded noises (Refs. 17-20). Next, Section 6 is related to optimization applications. It has long been known that the minimization of a nonconvex quadratic function on an ellipsoid is a relatively simple problem and duality theory holds in this case (Refs. 16 and 21-25). Now, it becomes clear that this is a consequence of the convexity property. Finally, Section 7 contains conclusions and directions for further research.

2. Main Theorem

A set $K \subset \mathbb{R}^n$ is a cone if $x \in K$ implies $\lambda x \in K$ for all $\lambda > 0$. It is acute if $K$ contains no straight lines, i.e., $x \in K$, $x \ne 0$ imply $-x \notin K$.

Let us consider three quadratic forms on $\mathbb{R}^n$,

$$f_i(x) = (A_i x, x), \quad i = 1, 2, 3,$$

and the quadratic map $f: \mathbb{R}^n \to \mathbb{R}^3$ with components $f_1, f_2, f_3$. Without loss of generality, the matrices $A_i$ can be assumed to be symmetric. We define the image of $\mathbb{R}^n$ under this transformation,

$$F = \{f(x) : x \in \mathbb{R}^n\} \subset \mathbb{R}^3.$$

The sets $D$, $B$, $F$ defined above will be used elsewhere in the paper. As usual, the notation $A > 0$ [$A \ge 0$] means that a matrix $A$ is positive definite [nonnegative definite], and we write $g > 0$ for $g(x) = (Ax, x)$ with $A > 0$. Now, we can formulate the main result of this paper.

Theorem 2.1. For $n \ge 3$, the following assertions are equivalent:

(a) there exists $\mu \in \mathbb{R}^3$ such that

$$\mu_1 A_1 + \mu_2 A_2 + \mu_3 A_3 > 0; \qquad (1)$$

(b) the set $F$ is an acute closed convex cone and the quadratic forms $f_1(x)$, $f_2(x)$, $f_3(x)$ have no common zero except zero.

Proof. If $f \in F$, $f = f(x)$, $\lambda > 0$, then $\lambda f = f(\sqrt{\lambda}\, x) \in F$; thus, $F$ is a cone.

(a) $\Rightarrow$ (b). The properties of acuteness, closedness, and convexity are invariant with respect to linear transformation of a space. On the other hand, a linear combination of quadratic forms is a quadratic form. Hence, we can choose $g = Tf$ such that

$$g_3 = \mu_1 f_1 + \mu_2 f_2 + \mu_3 f_3 > 0,$$

and $F$ is an acute closed convex cone iff $G = \{g(x) : x \in \mathbb{R}^n\}$ shares these properties. Moreover, by making a nonsingular linear transformation of $\mathbb{R}^n$ (it does not change $G$), we can assume that $g_3 = \|x\|^2$. Due to the Brickman theorem, the set

$$H = \{(g_1(x), g_2(x))^T : \|x\| = 1\}$$

is convex, but

$$G = \{\lambda Q,\ \lambda \ge 0\},$$

where

$$Q = \{(h_1, h_2, 1)^T : h \in H\},$$

and hence $G$ is convex.

The set $H$ is closed as a continuous image of the unit sphere. This implies closedness of $Q$. A convex set is closed iff its intersection with a supporting hyperplane is nonempty and closed. But if an intersection of such a hyperplane with $G$ contains any nonzero point, this intersection is $\{\lambda S,\ \lambda \ge 0\}$, where $S$ is the intersection of the hyperplane with $Q$. The latter is closed; thus, $G$ itself is closed.

The set $G$ is acute, because $g_3(x) > 0$ for all $x \ne 0$. Indeed, if $g = g(x) \in G$, $g \ne 0$, then $g_3(x) > 0$ and $-g$ cannot belong to $G$.

Condition (1) implies that $f_i(x)$ have no common zero except 0.

(b) $\Rightarrow$ (a). If $\mu = c$, where $c$ is as in Lemma 8.2 (see Appendix), then $(\mu, f) > 0$ for all $f \in F$, $f \ne 0$. Since $f(x) = 0$ iff $x = 0$ due to the lack of common nontrivial zeros of $f_i(x)$, the condition

$$(\mu, f(x)) > 0 \quad \text{for all } x \ne 0$$

is equivalent to

$$\mu_1 A_1 + \mu_2 A_2 + \mu_3 A_3 > 0;$$

thus, (1) holds. This completes the proof of the theorem.

We relegate the discussion of the result to the next section and provide now a convexity property related to nonhomogeneous quadratic functions. Consider the functions

$$\varphi_i(x) = (A_i x, x) + 2(a_i, x) + \alpha_i, \quad i = 1, 2,$$

where $x \in \mathbb{R}^n$, $a_i \in \mathbb{R}^n$, $\alpha_i \in \mathbb{R}^1$, and $A_i$ are symmetric $n \times n$ matrices. Denote

$$\Phi = \{(\varphi_1(x), \varphi_2(x))^T : x \in \mathbb{R}^n\} \subset \mathbb{R}^2.$$



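Theorem 2.2. Suppose that $n \ge 2$ and there exists $\mu \in \mathbb{R}^2$ such that

$$\mu_1 A_1 + \mu_2 A_2 > 0. \qquad (2)$$

Then, $\Phi$ is closed and convex.

Proof. We construct the following homogeneous forms associated with $\varphi_i(x)$:

$$f_i(z) = (A_i x, x) + 2t(a_i, x) + \alpha_i t^2, \quad i = 1, 2, \qquad f_3(z) = t^2, \qquad z = (x, t)^T \in \mathbb{R}^{n+1},$$

and consider the 3D-image

$$F = \{f(z) : z \in \mathbb{R}^{n+1}\}.$$

Let us show that Theorem 2.1 is applicable. First, $\dim z \ge 3$. Second, the quadratic forms $f_i(z)$ are defined by the matrices

$$\begin{pmatrix} A_i & a_i \\ a_i^T & \alpha_i \end{pmatrix}, \quad i = 1, 2, \qquad \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}.$$

Their linear combination with weights $\mu_1$, $\mu_2$, $\mu_3$, with $\mu_3 > 0$ being large enough, will be positive definite. Thus, all the conditions of Theorem 2.1 hold and the set $F$ is closed and convex. The set

$$\Psi = \{(f_1(z), f_2(z))^T : f_3(z) = 1\}$$

is the cross section of $F$ by the hyperplane $f_3 = 1$, and thus it is also closed and convex. But we have $\Psi = \Phi$. Indeed, $f_3(z) = 1$ means that either $t = 1$ or $t = -1$. In the first case,

$$f_i(z) = \varphi_i(x), \quad i = 1, 2,$$

and the corresponding points in $\Psi$ and $\Phi$ coincide. In the second case, if $(x, -1)$ generates a point in $\Psi$, then $-x$ generates the same point in $\Phi$, because

$$f_i((x, -1)^T) = \varphi_i(-x), \quad i = 1, 2.$$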
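The homogenization step in the proof of Theorem 2.2 can be checked numerically. The following is a minimal sketch, assuming the convention $\varphi(x) = (Ax, x) + 2(a, x) + \alpha$ for a nonhomogeneous quadratic function; the helper names (`quad_form`, `phi`, `homogenize`) and the data are hypothetical and serve only as an illustration. It builds the block matrix of the homogeneous form and evaluates it at $z = (x, 1)$ and $z = (x, -1)$.

```python
# Homogenization used in the proof of Theorem 2.2 (illustrative sketch).
# Assumed convention: phi(x) = (A x, x) + 2 (a, x) + alpha.

def quad_form(M, v):
    """Return (M v, v) for a square matrix M given as a list of rows."""
    n = len(v)
    return sum(M[i][j] * v[i] * v[j] for i in range(n) for j in range(n))

def phi(A, a, alpha, x):
    """Nonhomogeneous quadratic function (A x, x) + 2 (a, x) + alpha."""
    return quad_form(A, x) + 2 * sum(ai * xi for ai, xi in zip(a, x)) + alpha

def homogenize(A, a, alpha):
    """Build the (n+1) x (n+1) block matrix [[A, a], [a^T, alpha]]."""
    rows = [list(row) + [a[i]] for i, row in enumerate(A)]
    rows.append(list(a) + [alpha])
    return rows

# Hypothetical data, chosen only for the demonstration.
A = [[2.0, 1.0], [1.0, -3.0]]
a = [0.5, -1.0]
alpha = 4.0
M = homogenize(A, a, alpha)
x = [1.5, -2.0]
neg_x = [-v for v in x]

f_t_plus = quad_form(M, x + [1.0])    # z = (x, 1)
f_t_minus = quad_form(M, x + [-1.0])  # z = (x, -1)
print(f_t_plus, phi(A, a, alpha, x))       # the two values agree
print(f_t_minus, phi(A, a, alpha, neg_x))  # the two values agree
```

The second printed pair illustrates why the points generated by $t = -1$ add nothing new to the cross section: they are already produced by $-x$.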

3. Discussion and Examples

3.1. Comparison with Known Results. A direct proof of the above results can be provided, but we chose the shortest way, which heavily relies on the Brickman theorem. At the same time, Theorem 2.1 can be viewed as an extension of the Brickman and Dines results. Indeed, if we have two arbitrary quadratic forms in $\mathbb{R}^n$, $n \ge 3$, then we can take

$$f_3(x) = \|x\|^2;$$

condition (a) of Theorem 2.1 is satisfied with $\mu = (0, 0, 1)$, and the 2D-set

$$B = \{(f_1(x), f_2(x))^T : \|x\| = 1\}$$

is a cross section of $F$ by the hyperplane $f_3 = 1$. On the other hand, the set

$$D = \{(f_1(x), f_2(x))^T : x \in \mathbb{R}^n\}$$

is a projection of $F$ on the hyperplane $f_3 = 0$. A projection of a convex set is convex; thus, we arrive at the Dines theorem for $n \ge 3$; the case $n = 2$ is discussed below.

If relation (2) holds for these two forms, then we can extend the space as in the proof of Theorem 2.2. Let $z = (x, t)^T$; take

$$f_i(z) = (A_i x, x), \quad i = 1, 2, \qquad f_3(z) = t^2;$$

then, by Theorem 2.1, the set

$$F = \{f(z) : z \in \mathbb{R}^{n+1}\}$$

is an acute closed convex cone for $n \ge 2$ and its cross section by the hyperplane $f_3 = 0$ is also an acute closed convex cone. Thus, we obtain that $D$ is acute and closed provided (2) holds. Dines (Ref. 8) proved that $D$ is closed if $f_1$, $f_2$ have no common zero except $x = 0$. This condition for $n \ge 3$ is equivalent to (2); see Section 3.3 below.

3.2. Counterexamples. Let us demonstrate that, if (1) or (2) does not hold, this may lead to the loss of convexity, closedness, and acuteness.

Example 3.1. Consider the following two nonhomogeneous quadratic functions in $\mathbb{R}^2$,

Then, direct calculations give

i.e., $\Phi$ is the set of points lying above the parabola

which is nonconvex.


Example 3.2. Consider the following three quadratic forms in $\mathbb{R}^3$:

Then, the set

is the same as $\Phi$ in Example 3.1; thus, $B$ is nonconvex. But it is a cross section of $F$ by the hyperplane $f_3 = 1$; hence, $F$ is also nonconvex.

Example 3.3. Let

Then, $F$ is not closed: the point $(0, 1, 0)^T$ lies on the boundary of $F$, but does not belong to $F$.

Example 3.4. Let

Then,

it is a closed convex cone, but not an acute one.

Examples 3.2 to 3.4 confirm also that strict inequality in condition (1) cannot be relaxed to nonstrict inequality. However, the following relaxed version of Theorem 2.1 is true: if $F$ is convex, then either $F = \mathbb{R}^3$ or there exists $0 \ne \mu \in \mathbb{R}^3$ such that (1) holds as a nonstrict inequality. This is a direct corollary of Lemma 3.1 below.

Example 3.5. Take

Then,

i.e., $F$ is an acute closed convex cone; however, (1) does not hold. This example shows the necessity of the second part of condition (b): the lack of common nontrivial zeros for given quadratic forms.

3.3. Positive Definiteness of a Linear Combination of Matrices. A natural question is how to check if conditions (1), (2) hold for given matrices $A_i$.


It is not hard to verify that this function cannot be nonnegative for all $v_i$ for any $t$ if $t_1 > t_2$. Similarly, it is nonnegative (positive) under conditions (b) and (c).

An analogous result can be obtained for a linear combination expressed in the form

Hence, any $x \in \mathbb{R}^n$ can be expanded as $x = \sum_i v_i x^i$. After simple calculations, we obtain

MATLAB software can be exploited for numerical implementation. If there are complex eigenvalues $\lambda_i$, then (2) fails. Vice versa, if (2) holds, then all the generalized eigenvalues $\lambda_i$ are real; we can use the following result, which provides a complete solution for the nondegenerate case.

Proposition 3.1. Suppose that all $\lambda_i$ are real and distinct and that $(A_2 x^i, x^i) \ne 0$ for all $i$. Scale $x^i$ so that $|(A_2 x^i, x^i)| = 1$ for all $i$ and denote

$$I_1 = \{i : (A_2 x^i, x^i) = 1\}, \qquad I_2 = \{i : (A_2 x^i, x^i) = -1\}.$$

Compute $t_2 = \min_{i \in I_2}(-\lambda_i)$, $t_1 = -\min_{i \in I_1} \lambda_i$, with $t_2 = \infty$ if $I_2$ is empty, and $t_1 = -\infty$ if $I_1$ is empty. Denote $A(t) = A_1 + tA_2$. Then:

(a) $A(t) \not\ge 0$ for all $t$, if $t_1 > t_2$;
(b) $A(t) \ge 0$ for $t_1 \le t \le t_2$ (and only for such $t$), if $t_1 \le t_2$;
(c) $A(t) > 0$ for $t_1 < t < t_2$ (and only for such $t$), if $t_1 < t_2$.
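The recipe of Proposition 3.1 can be sketched for the smallest nontrivial case. The code below is an illustration under stated assumptions, not the author's implementation: it handles only $2 \times 2$ symmetric matrices with $\det A_2 \ne 0$, obtains the generalized eigenvalues of $A_1 x = \lambda A_2 x$ from the quadratic $\det(A_1 - \lambda A_2) = 0$, and uses the sign of $(A_2 x, x)$ to sort the indices into $I_1$ and $I_2$. The function names and the sample matrices are hypothetical.

```python
import math

def quad(M, x):
    """Return (M x, x) for a 2 x 2 matrix M and a vector x."""
    return sum(M[i][j] * x[i] * x[j] for i in range(2) for j in range(2))

def pencil_interval(A1, A2):
    """Return (t1, t2) of Proposition 3.1 for 2 x 2 symmetric A1, A2.

    Assumes det(A2) != 0, real distinct generalized eigenvalues, and
    (A2 x, x) != 0 for every generalized eigenvector x.
    """
    (a, b), (_, c) = A1
    (p, q), (_, r) = A2
    # det(A1 - lam * A2) = k2 * lam**2 + k1 * lam + k0
    k2 = p * r - q * q
    k1 = -(a * r + c * p - 2.0 * b * q)
    k0 = a * c - b * b
    disc = k1 * k1 - 4.0 * k2 * k0
    if k2 == 0 or disc <= 0:
        raise ValueError("degenerate pencil: outside the assumptions")
    lams = [(-k1 + s * math.sqrt(disc)) / (2.0 * k2) for s in (1.0, -1.0)]
    t1, t2 = -math.inf, math.inf
    for lam in lams:
        m11, m12, m22 = a - lam * p, b - lam * q, c - lam * r
        # Two candidate null vectors of A1 - lam * A2; keep the longer one.
        cands = [(m12, -m11), (m22, -m12)]
        x = max(cands, key=lambda v: v[0] ** 2 + v[1] ** 2)
        s = quad(A2, x)        # only the sign matters (scaling to +1 or -1)
        if s > 0:              # index in I1: A(t) needs t > -lam
            t1 = max(t1, -lam)
        elif s < 0:            # index in I2: A(t) needs t < -lam
            t2 = min(t2, -lam)
        else:
            raise ValueError("(A2 x, x) = 0: outside the assumptions")
    return t1, t2

def is_pos_def_2x2(M):
    """Sylvester criterion for a symmetric 2 x 2 matrix."""
    return M[0][0] > 0 and M[0][0] * M[1][1] - M[0][1] * M[1][0] > 0

def combo(A1, A2, t):
    """Return A(t) = A1 + t * A2."""
    return [[A1[i][j] + t * A2[i][j] for j in range(2)] for i in range(2)]

# Hypothetical data: A(t) = diag(-1 + t, 2 - t / 2).
A1 = [[-1.0, 0.0], [0.0, 2.0]]
A2 = [[1.0, 0.0], [0.0, -0.5]]
t1, t2 = pencil_interval(A1, A2)
print(t1, t2)
```

For this diagonal example the interval can be read off directly, since $A(t) > 0$ requires $t > 1$ and $t < 4$, which gives a simple cross-check of the computed $(t_1, t_2)$. For larger $n$ a numerical generalized eigenvalue solver would replace the closed-form quadratic.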

Proof. Because all $\lambda_i$ do not coincide, $x^i$ are $A_2$-orthogonal,

For two matrices, there are several useful criteria. One of them is due to Finsler (Ref. 12): for $n \ge 3$, condition (2) is equivalent to the lack of a common zero of $f_1(x)$, $f_2(x)$ other than $x = 0$. The excellent survey by Uhlig (Ref. 14) is devoted to various proofs and generalizations of this result. Another criterion can be traced back to Weierstrass (see Ref. 14): if (2) holds, then the matrices $A_1$, $A_2$ can be diagonalized simultaneously.

From a computational point of view, the following approach looks attractive. Solve the generalized eigenvalue problem for the matrix pencil $A_1$, $A_2$; i.e., find $\lambda_i$, $x^i$ such that

$$A_1 x^i = \lambda_i A_2 x^i.$$


However, this gives no explicit solution to the problem. There are particular classes of matrices where (1) can be checked easily. For instance, if $A_i$ are all diagonal, then (1) is a system of $n$ linear inequalities in two-dimensional space (we can normalize $\mu$) and it can be solved graphically. In the general case, condition (1) can be considered as a linear matrix inequality (LMI) with vector variables, and existing LMI tools (see references in Ref. 17) can be employed.

However, in most applications below, checking (1) or (2) is a trivial problem, because either one of the matrices $A_i$ is positive definite or $\mu$ can be found by some additional arguments.
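For the diagonal case mentioned above, condition (1) becomes $n$ linear inequalities in the normalized vector $\mu$. A crude way to look for a feasible $\mu$ is a grid search over the unit sphere; the sketch below does exactly that. It is only an illustration: the function name, the grid resolution, and the sample diagonals are assumptions, and a coarse grid can miss a very thin feasible cone, so a `None` result is not a proof of infeasibility in general.

```python
import math

def find_mu_diagonal(diags, steps=60):
    """Grid search for a unit vector mu with sum_i mu_i * A_i > 0.

    `diags` holds the diagonals of three diagonal matrices A_1, A_2, A_3,
    so condition (1) reduces to n linear inequalities in mu.
    Returns the grid point with the largest margin, or None.
    """
    n = len(diags[0])
    best, best_margin = None, 0.0
    for i in range(steps + 1):
        theta = math.pi * i / steps            # polar angle in [0, pi]
        for j in range(2 * steps):
            phi = math.pi * j / steps          # azimuth in [0, 2 pi)
            mu = (math.sin(theta) * math.cos(phi),
                  math.sin(theta) * math.sin(phi),
                  math.cos(theta))
            # Smallest diagonal entry of mu_1 A_1 + mu_2 A_2 + mu_3 A_3.
            margin = min(sum(mu[k] * diags[k][r] for k in range(3))
                         for r in range(n))
            if margin > best_margin + 1e-12:
                best, best_margin = mu, margin
    return best

# Hypothetical data: A_3 is the identity, so mu = (0, 0, 1) is feasible.
diags_ok = [[1.0, -1.0, 0.0], [0.0, 1.0, -1.0], [1.0, 1.0, 1.0]]
# Hypothetical data: the third diagonal entry vanishes for every mu.
diags_bad = [[1.0, -1.0, 0.0], [-1.0, 1.0, 0.0], [0.0, 0.0, 0.0]]

mu_ok = find_mu_diagonal(diags_ok)
mu_bad = find_mu_diagonal(diags_bad)
print(mu_ok, mu_bad)
```

In the general nondiagonal case the same question is an LMI feasibility problem, which is better handed to a dedicated solver than to a grid.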

3.4. Parametric Description of the Boundary. The sets $F$, $B$, $D$, $\Phi$ introduced above have simple one-parametric descriptions of their boundary. It is based on the following well-known fact from convex analysis.

Lemma 3.1. If $K \subset \mathbb{R}^m$ is a convex cone, then for any boundary point $f^0 \in \partial K$, there exists $0 \ne \tau \in \mathbb{R}^m$ such that $(\tau, f) \ge 0$, $\forall f \in K$, $(\tau, f^0) = 0$.

Combining this with Theorem 2.1, we obtain the following proposition.

Proposition 3.2. Suppose that $n \ge 3$ and (1) holds. Then, for any $f^0 \in \partial F$, there exists $0 \ne \tau \in \mathbb{R}^3$ such that $A_\tau = \sum_{i=1}^{3} \tau_i A_i \ge 0$ and $f^0 = f(x^0)$, where $A_\tau x^0 = 0$.

In fact, this is a one-dimensional description of $\partial F$. Indeed, we have two conditions for $\tau \in \mathbb{R}^3$: the scaling condition $\|\tau\| = 1$ and the condition that $A_\tau \ge 0$ has a zero eigenvalue.


In particular, we can conclude that, if (2) holds, then $A(t) > 0$ if and only if $t_1 < t < t_2$ for some $t_1$, $t_2$.

Another approach to checking (2) is a cut-and-try method. If we have several points $x^i$, $i = 1, \ldots, k$, and if the corresponding values $f(x^i)$ do not lie in one half-plane of $\mathbb{R}^2$ (for instance, if they lie in all four orthants of $\mathbb{R}^2$ or the origin belongs to the interior of their convex hull), then (2) is violated. If they do lie in a half-plane, the normal to the half-plane is a candidate for $\mu$.
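The cut-and-try test can be sketched as follows. The code is a minimal illustration under stated assumptions: the helper names are hypothetical, the trial points are the coordinate vectors, and the half-plane test is done by sorting the polar angles of the values $f(x^i)$ and looking for an angular gap larger than $\pi$. A returned vector is only a candidate for $\mu$ in (2); it still has to be verified on the matrices themselves.

```python
import math

def quad(M, x):
    """Return (M x, x) for a square matrix M given as a list of rows."""
    n = len(x)
    return sum(M[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def half_plane_normal(points):
    """Return a unit mu with (mu, p) > 0 for all nonzero points p in R^2,
    or None if the points do not fit in an open half-plane."""
    angles = sorted(math.atan2(p[1], p[0]) for p in points
                    if p[0] != 0.0 or p[1] != 0.0)
    if not angles:
        return None
    m = len(angles)
    best_gap, start = -1.0, 0.0
    for i in range(m):
        nxt = angles[(i + 1) % m]
        gap = nxt - angles[i]
        if i == m - 1:
            gap += 2.0 * math.pi   # wrap-around gap
        if gap > best_gap:
            best_gap, start = gap, nxt
    if best_gap <= math.pi:
        return None                # values surround the origin: (2) fails
    # Middle of the occupied arc, which is shorter than pi.
    mid = start + (2.0 * math.pi - best_gap) / 2.0
    return (math.cos(mid), math.sin(mid))

def diag(d):
    """Diagonal matrix with the given diagonal."""
    return [[d[i] if i == j else 0.0 for j in range(len(d))]
            for i in range(len(d))]

def unit_vectors(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

# Hypothetical pair for which mu = (1, 0) works, since A1 > 0.
A1, A2 = diag([1.0, 2.0, 3.0]), diag([1.0, -1.0, 0.0])
pts_ok = [(quad(A1, x), quad(A2, x)) for x in unit_vectors(3)]
mu_ok = half_plane_normal(pts_ok)

# Hypothetical pair whose values hit all four orthants, so (2) fails.
B1, B2 = diag([1.0, -1.0, 1.0, -1.0]), diag([1.0, 1.0, -1.0, -1.0])
pts_bad = [(quad(B1, x), quad(B2, x)) for x in unit_vectors(4)]
mu_bad = half_plane_normal(pts_bad)
print(mu_ok, mu_bad)
```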

For the case of three matrices, there are few analytical characterizations of (1). The result in Ref. 13 is not of a constructive nature. Dines (Ref. 26) suggested another criterion: (1) does not hold if and only if there exists A > 0 such that

Let us show how a similar description works for the set $B$. Denote

let $\lambda(t)$ be the minimal eigenvalue of $A(t)$, and let $x(t)$ be the associated eigenvector,

Proposition 3.3. Suppose that $n \ge 3$ and $\lambda(t)$ is a simple eigenvalue for all $t \in [0, 2\pi)$. Then, $B$ is strictly convex and $\partial B = \{f(x(t)),\ t \in [0, 2\pi)\}$; that is, the boundary of $B$ is generated by the points $x(t)$.

Indeed, due to the Brickman theorem, $B$ is convex and closed, and its support function is

If $A(t)$ has a multiple smallest eigenvalue for some $t$, the situation may be different, as the following example shows.

Example 3.6. Let

Then, $B$ is a triangle with vertices $(0, 0)^T$, $(1, 1)^T$, $(1, -1)^T$; each edge of the triangle corresponds to a single value of $t$, while each vertex is generated by $x(t)$ for values of $t$ belonging to a line segment or a ray.

A similar explicit description of $D$ can be provided.

Proposition 3.4. Let the assumptions of Proposition 3.1 hold and $t_1 < t_2$. Denote by $x^1$, $x^2$ those eigenvectors $x^i$ which are associated with the eigenvalues $-t_1$, $-t_2$. Then, the set $D$ is an acute sector on the plane $\mathbb{R}^2$ generated by two vectors $d^1 = f(x^1)$, $d^2 = f(x^2)$.

3.5. 2D-Case. The case $n = 2$ can be considered separately; we provide a result related to $m$ quadratic forms. Suppose that $f: \mathbb{R}^2 \to \mathbb{R}^m$ is a quadratic transformation with components

and let

where $f_0(x) = (A_0 x, x)$ and $A_i$, $i = 0, 1, \ldots, m$, are $2 \times 2$ symmetric matrices. The set $H$ is nonempty if $A_0$ has a positive eigenvalue; denote it by $\lambda_1$, and denote the second eigenvalue by $\lambda_2$. If we have a second-order curve in $\mathbb{R}^2$ (an ellipse, a parabola, or a branch of a hyperbola), then its image in $\mathbb{R}^m$ under a linear mapping will be referred to as a 2D-ellipse, 2D-parabola, and 2D-hyperbola.

Proposition 3.5. For $m \ge 2$, $H$ is a 2D-ellipse, 2D-parabola, or 2D-hyperbola provided $\lambda_2 > 0$, $\lambda_2 = 0$, or $\lambda_2 < 0$, respectively.

Proof. We introduce the scalar product

in the space $S$ of $2 \times 2$ real symmetric matrices; then,

Any matrix in $S$ can be written as

Then, matrices of the form $xx^T$ are characterized by the conditions

that is, they are represented by the so-called ice cream cone $K$. The set

is a linear image of this cone in $\mathbb{R}^m$, because any component of $f(x)$ is linear,

Now, without loss of generality, we can assume that $A_0$ has one of the following forms:

In the first case, the condition $f_0(x) = 1$ is equivalent to

The intersection of $K$ with this plane is a circle. Its linear image $H$ is a 2D-ellipse. Similarly, in the second case, $f_0(x) = 1$ implies $k_1 - k_2 = 1$. The intersection of this plane with $K$ is a parabola; hence, $H$ is a 2D-parabola. Finally, in the third case, we obtain the plane $-2k_2 = 1$, and its intersection with $K$ is one branch of a 2D-hyperbola.

Note that, for all nondegenerate situations (i.e., when F is 2D, not 1D), the set $H$ is nonconvex. This confirms the necessity of the condition $n \ge 3$ in Theorem 2.1.

3.6. Location of Eigenvalues. The sets $B$, $D$, $F$ possess many remarkable properties. We provide just one of them, which is the analog of the Toeplitz-Hausdorff statement that the eigenvalues of a complex matrix lie in its numerical range.

Proposition 3.6. The set $B \subset \mathbb{C}$ (i.e., with $\mathbb{R}^2$ identified with the complex plane) contains all eigenvalues of the matrix $A_1 + iA_2$.

Proof. Let

be an eigenvector and an eigenvalue of

This implies that

and

Suppose that

and denote

Then,

But

and $B$ is convex. Hence,

this is the desired statement. Obviously, it is also true if $\alpha = 0$ or $\beta = 0$.

3.7. Case of m Quadratic Forms. The problem of extending the results to more than three quadratic forms remains challenging. So, we consider $m$ quadratic forms

$$f_i(x) = (A_i x, x), \quad i = 1, \ldots, m,$$

and the question is when the set

$$F_m = \{(f_1(x), \ldots, f_m(x))^T : x \in \mathbb{R}^n\} \subset \mathbb{R}^m$$

is convex. The following example shows that the existence of a linear positive-definite combination of the forms does not guarantee convexity.

Example 3.7. Let $n \ge 2$, $m = 4$, and

Then,

and the intersection of $F_4$ with the hyperplanes $f_3 = 1$, $f_4 = 1$ is the circle

that is, $F_4$ is nonconvex.

On the other hand, there exist situations for arbitrary $m$, where convexity does take place. The following result is known (see, e.g., Ref. 16).

Proposition 3.7. If the matrices $A_1, \ldots, A_m$ commute, then $F_m$ is a closed convex cone for all $m$, $n$.

Indeed, in this case the matrices can be diagonalized simultaneously and, after linear transformation of the space, we can suppose that

Taking

we obtain

where $D$ is the matrix with entries $d_{ik}$ and $y$ is the vector with components $y_k$. Obviously, this is a closed convex cone.

Note that this set is not necessarily acute and that, under the conditions of Theorem 2.1, the matrices are not guaranteed to be diagonalized simultaneously.

4. S-Procedure

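4.1. Quadratic Forms. Given two quadratic forms

$$f_i(x) = (A_i x, x), \quad i = 1, 2,$$

in $\mathbb{R}^n$ and real numbers $\alpha_i$, $i = 1, 2$, the problem is how to characterize all $f_0(x) = (A_0 x, x)$, $\alpha_0$ such that

$$f_0(x) \le \alpha_0 \quad \text{for all } x : f_1(x) \le \alpha_1,\ f_2(x) \le \alpha_2. \qquad (3)$$

Such a problem arises in numerous control applications (Ref. 17), ellipsoidal approximation techniques (see Section 5 below), and various optimization problems. For instance, for $f_0(x)$ fixed, $\alpha_0$ is the optimal value of the objective function in the nonconvex quadratic program

$$\max f_0(x), \quad f_1(x) \le \alpha_1, \quad f_2(x) \le \alpha_2,$$

and the result below can be viewed as a necessary and sufficient optimality condition. Note that we do not assume normality (Refs. 13, 27) of this optimization problem, and the result also can be treated as a second-order optimality condition for abnormal and singular nonquadratic mathematical programming problems; for recent results in this direction, see Ref. 28.

Theorem 4.1. Suppose that $n \ge 3$ and there exist $\mu \in \mathbb{R}^2$, $x^0 \in \mathbb{R}^n$ such that

$$\mu_1 A_1 + \mu_2 A_2 > 0, \qquad (4)$$

$$f_1(x^0) < \alpha_1, \quad f_2(x^0) < \alpha_2. \qquad (5)$$

Then, (3) holds if and only if there exist $\tau_1 \ge 0$, $\tau_2 \ge 0$ such that

$$A_0 \le \tau_1 A_1 + \tau_2 A_2, \qquad (6)$$

$$\alpha_0 \ge \tau_1 \alpha_1 + \tau_2 \alpha_2. \qquad (7)$$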

Proof. Consider

All the assumptions of Theorem 2.1 hold; hence, $F$ is convex. Take another set in $\mathbb{R}^3$,

it is also convex. Then, condition (3) means that $F \cap S$ is empty. Thus, a separating hyperplane exists, i.e.,

Note that we take 0 in the right-hand side of the inequalities, because a separating hyperplane for a cone can be chosen to pass through the origin. The second inequality implies

and

while the first one implies

Substituting $x = x^0$ and taking into account the inequalities above and condition (5), we conclude that $c_0 \ne 0$. Dividing these inequalities by $c_0$ and denoting

we arrive at (6), (7). The opposite inclusion is trivial: if $f_i(x) \le \alpha_i$, $i = 1, 2$, and if (6), (7) hold, then for $\tau_i \ge 0$, $i = 1, 2$, we have

$$f_0(x) \le \tau_1 f_1(x) + \tau_2 f_2(x) \le \tau_1 \alpha_1 + \tau_2 \alpha_2 \le \alpha_0;$$

thus (6), (7) imply (3).

The examples below confirm that all the conditions of this theorem are necessary.

Example 4.1. Let $n = 2$, $\alpha_i = 0$, $i = 0, 1, 2$, and

Then, (4) and (5) are satisfied,


and

It is obvious that

However, there exist no $\tau_1 \ge 0$, $\tau_2 \ge 0$ such that (6) holds. Indeed, substituting $x = (1, 0)^T$, $x = (0, 1)^T$ in an equivalent inequality,

we obtain linear inequalities for $\tau$, the only solution of which is $\tau = 0$. But of course,

Thus, the condition $n \ge 3$ in Theorem 4.1 is essential.

Example 4.2. For $n = 3$, take

Then, all the conditions of Theorem 4.1 except (4) are satisfied, and it is easy to check that (3) holds. However, the inequality (6) fails to be correct for all $x$ with any $\tau$. For instance, taking $x = (1, 0, 0)^T$, $x = (0, 1, 0)^T$, we obtain contradictory inequalities,

Hence, the lack of condition (4) makes the result of Theorem 4.1 incorrect. Note that, for the example in Ref. 15 (constructed to show the impossibility of an extension of the S-procedure to two constraints), condition (4) does not hold.

Finally, condition (5) is also necessary, as the following example shows.

Example 4.3. Take $n = 3$ and consider the quadratic forms

Then, the inequalities

define a single point $x = 0$ and $f_0(0) = 0$; thus, (3) holds. But it is easy to check that (6) fails for any $\tau$.


Sometimes, we need a version of Theorem 4.1 where one of the inequalities $f_i(x) \le \alpha_i$ is replaced by an equality. This version is provided below.

Proposition 4.1. Suppose $n \ge 3$, $\alpha_2 \ne 0$; there exist $\mu \in \mathbb{R}^2$ satisfying (4), and $x^0 \in \mathbb{R}^n$ such that

Then,

if and only if there exists $\tau_1 \ge 0$ such that (6), (7) hold.

The proof remains the same, but

The new condition $\alpha_2 \ne 0$ is needed to avoid the situation $c_0 = c_1 = 0$, in which case we obtain $c_2 \alpha_2 = 0$; indeed, for $\alpha_2 = 0$ we cannot conclude that $c = 0$. Nevertheless, the case $\alpha_2 = 0$ can be treated. Instead of condition (8), we need the following: there exist $x^1$, $x^2$ such that

Then, to complete the proof, we substitute either $x^1$ or $x^2$ in the inequality, depending on the sign of $c_2$, and conclude that $c_0 = 0$ implies $c = 0$ for all cases.

4.2. Nonhomogeneous Quadratic Functions. Now, we provide a version of the theorem above for nonhomogeneous quadratic functions. Let

the question of interest is when

Introduce the $(n + 1) \times (n + 1)$ matrices

Theorem 4.2. Suppose that $n \ge 2$, $\alpha_i < 0$, $i = 1, 2$, and there exists $\mu \in \mathbb{R}^2$ satisfying (4). Then, (11) holds if and only if there exist $\tau_1 \ge 0$, $\tau_2 \ge 0$ such that



Remark 4.1. The condition $\alpha_i < 0$ just means that

Any other point $x^0$ can be taken instead of 0. Condition (12) is equivalent to two inequalities, one of which is

if we use the Schur criterion for the nonnegative definiteness of a $2 \times 2$ block matrix; see, e.g., Ref. 17.

Proof of Theorem 4.2. As in the proof of Theorem 2.2, we extend the space of variables and pass to the quadratic forms

Let us prove that, if (11) holds, then a similar condition holds for $f_i(z)$; i.e.,

If $t \ne 0$, then

This implies that

If $t = 0$, then the conditions

are equivalent to

Consider two one-dimensional quadratic functions,

They have nonnegative quadratic terms and negative zero-order terms. It is not hard to understand by considering all possible cases that

for all $y$ large enough (positive or negative). Thus by (11), we have

But this is possible only if

On the other hand, for $z = (x, 0)^T$, we have

So, the proposition above is proved for all situations. Now, we are able to apply Theorem 4.1 for $f_i(z)$; all its assumptions are satisfied. Indeed,

and (4) holds. Thus, we obtain (12). The opposite inclusion is obvious.

4.3. Strict Inequality. Finally, we provide a version of the S-procedure with strict inequality. Now, the problem is when the following condition holds:

Note that we deal with quadratic forms in $\mathbb{R}^n$ as in Theorem 4.1, but with strict inequality and $\alpha_i = 0$, $i = 0, 1, 2$. We consider simultaneously a version of the problem where one of the inequalities is replaced by an equality,

Theorem 4.3. Suppose that (4) holds, $n \ge 3$, and there exists $x^0$: $f_1(x^0) < 0$, $f_2(x^0) < 0$ [the last condition should be $f_2(x^0) = 0$ in the case (14)]. Then, (13) [or (14)] holds if and only if there exist $\tau_1 \ge 0$, $\tau_2 \ge 0$ [$\tau_2$ of an arbitrary sign for the case (14)] such that

Proof. Construct the set $F$ as in Theorem 4.1 and

with the last condition replaced by $f_2 = 0$ in the case (14). By Theorem 2.1, $F$ is an acute closed convex cone; obviously, $S$ shares the same properties. Condition (13) or (14) means that the intersection of $F$ and $S$ contains a single point 0. Such two cones can be strictly separated; see Lemma 8.1 in the Appendix. Thus, there exists $c \in \mathbb{R}^3$ such that

Decoding these conditions as in the proof of Theorem 4.1, we arrive at (15).


However, there is no $\tau_1$ such that

Finally, from Theorem 4.3 [case (13)], we extract the S-procedure for strict inequalities (Ref. 10). The condition $n \ge 3$ below is not necessary.

Here,

Then, we can set $\mu_1 = 0$, $\mu_2 = -1$, and from Theorem 4.2 we obtain the following proposition.

Proposition 4.2. If there exists $x^0$: $\varphi_1(x^0) < 0$, then $\varphi_0(x) \le 0$ for all $x$: $\varphi_1(x) \le 0$ if and only if $\varphi_0(x) \le \tau_1 \varphi_1(x)$ for all $x$ with some $\tau_1 \ge 0$.

This result is due to Yakubovich (Ref. 15); for the case of homogeneous quadratic forms, it was known earlier (Ref. 10).
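A one-dimensional instance of Proposition 4.2 can be checked by hand or in code. The sketch below is an illustration only: it assumes the representation $\varphi(x) = Ax^2 + 2ax + \alpha$ by the symmetric matrix $[[A, a], [a, \alpha]]$, so that $\varphi_0(x) \le \tau \varphi_1(x)$ for all $x$ is equivalent to $M_0 - \tau M_1 \le 0$. The function names, the sample functions, and the grid of multipliers are hypothetical.

```python
def s_lemma_multiplier(M0, M1, taus):
    """Return the first tau >= 0 in `taus` with M0 - tau * M1 <= 0, or None.

    M0, M1 are symmetric 2 x 2 matrices [[A, a], [a, alpha]] representing
    one-dimensional quadratic functions A x^2 + 2 a x + alpha.
    """
    for tau in taus:
        m11 = M0[0][0] - tau * M1[0][0]
        m12 = M0[0][1] - tau * M1[0][1]
        m22 = M0[1][1] - tau * M1[1][1]
        # A symmetric 2 x 2 matrix is nonpositive definite iff both
        # diagonal entries are <= 0 and the determinant is >= 0.
        if tau >= 0 and m11 <= 0 and m22 <= 0 and m11 * m22 - m12 * m12 >= 0:
            return tau
    return None

def phi(M, x):
    """Evaluate A x^2 + 2 a x + alpha for M = [[A, a], [a, alpha]]."""
    return M[0][0] * x * x + 2 * M[0][1] * x + M[1][1]

# Hypothetical data.
M1 = [[1.0, 0.0], [0.0, -1.0]]   # phi_1(x) = x^2 - 1, phi_1(0) < 0
M0 = [[1.0, 1.0], [1.0, -8.0]]   # phi_0(x) = x^2 + 2 x - 8
tau = s_lemma_multiplier(M0, M1, [0.25 * k for k in range(41)])
print(tau)
```

Here $\varphi_0 \le 0$ on the set $\{x : \varphi_1(x) \le 0\} = [-1, 1]$, and the search returns a multiplier certifying this implication on the whole line.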

Now, the same trick applied to Theorem 4.3 [with condition (14)] provides the following result.

Proposition 4.3. If $(A_0 x, x) > 0$ for all $x \in \mathbb{R}^n$, $n \ge 3$, $x \ne 0$ such that $(A_1 x, x) = 0$, then $A_0 + \tau_1 A_1 > 0$ for some $\tau_1 \in \mathbb{R}$.

This is the famous Finsler theorem or Debreu lemma; see Refs. 8, 14, 16; it has a broad range of applications. Actually, the condition $n \ge 3$ can be omitted.
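The statement of Proposition 4.3 can be illustrated on a small example. The following is a sketch under stated assumptions: the matrices are hypothetical, positive definiteness is tested with the Sylvester criterion (leading principal minors), and the multiplier is found by scanning a fixed grid rather than by any method from the paper.

```python
def det(M):
    """Determinant by Laplace expansion (fine for small matrices)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def is_pos_def(M):
    """Sylvester criterion: all leading principal minors are positive."""
    return all(det([row[:k] for row in M[:k]]) > 0
               for k in range(1, len(M) + 1))

def finsler_multiplier(A0, A1, taus):
    """Return the first tau in `taus` with A0 + tau * A1 > 0, or None."""
    n = len(A0)
    for tau in taus:
        S = [[A0[i][j] + tau * A1[i][j] for j in range(n)] for i in range(n)]
        if is_pos_def(S):
            return tau
    return None

# Hypothetical data: (A1 x, x) = x3^2 vanishes only when x3 = 0, and then
# (A0 x, x) = x1^2 + x2^2 > 0 for x != 0, so the hypothesis holds.
A0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0], [0.0, 1.0, -1.0]]
A1 = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
tau = finsler_multiplier(A0, A1, [0.5 * k for k in range(-10, 11)])
print(tau)
```

Note that $A_0$ itself is indefinite here; only the combination $A_0 + \tau A_1$ with a sufficiently large $\tau$ is positive definite.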

It is interesting to note that Theorem 4.2 for nonhomogeneous quadratic functions cannot be extended to the case of equality as was done in Proposition 4.1 for quadratic forms; see the example below.

Example 4.4. Let $n = 3$ and

then,

4.4. Discussion and Examples. Now, we describe various consequences of the above theorems and compare them with known results.

Let us show that most of the known results on the S-procedure follow from Theorems 4.1 to 4.3. Take


Proposition 4.4. If $(A_0 x, x) > 0$ for all $x \in \mathbb{R}^n$, $n \ge 3$, $x \ne 0$ such that $(A_1 x, x) < 0$, then $A_0 - \tau_1 A_1 > 0$ for some $\tau_1 > 0$.

Thus, the known results on the S-procedure with one constraint can be extracted easily from the theorems of Section 4. Theorem 4.1 can be considered as a validation of the S-procedure for two quadratic forms. Theorems 4.2 and 4.3 also look like extensions of this sort. However, they contain some restrictive assumptions. For instance, Theorem 4.2 requires that condition (4) holds for the matrices $A_i$, not for the original matrices $B_i$. We cannot guarantee that this condition is satisfied, even for positive-definite matrices $B_i$ (this prevents one from applying Theorem 4.2 in Section 5.1 below for ellipsoids with distinct centers). Another difficulty is met if we try to exploit Theorem 4.3 for two constraints. If (4) holds, then $f_1(x)$, $f_2(x)$ have no common nontrivial zero; hence, one of the sets $\{x : f_i(x) < 0\}$ contains the other, and we deal actually with one constraint.

5. Applications to Ellipsoidal Techniques

In Ref. 17, one can find numerous examples of system and control problems which can be described in terms of ellipsoids and operations with them; the main tool here is the S-procedure and the conversion to linear matrix inequalities (LMI). We present some new results in this direction, which can be obtained by use of the above extensions of the S-procedure. In this section, we denote by

$$E(a_i, A_i) = \{x : (A_i(x - a_i), x - a_i) \le 1\}$$

an ellipsoid with center $a_i$ and matrix $A_i$. Usually, we assume that $A_i > 0$; sometimes, unbounded ellipsoids with $A_i \ge 0$ can also be considered.

5.1. Intersection of Two Ellipsoids with Common Center. Consider two ellipsoids

$$E_i = E(0, A_i), \quad i = 1, 2,$$

centered at 0; the problem is to describe all ellipsoids $E_0 = E(0, A_0)$ containing their intersection.

Theorem 5.1. Suppose that $n \ge 3$, $A_1 > 0$, $A_2 \ge 0$. Then, $E_0 \supset E_1 \cap E_2$ if and only if there exist $\tau_1 \ge 0$, $\tau_2 \ge 0$, $\tau_1 + \tau_2 \le 1$ such that

$$A_0 \le \tau_1 A_1 + \tau_2 A_2. \qquad (16)$$

Indeed, this is a direct corollary of Theorem 4.1 with $\alpha_i = 1$, $i = 0, 1, 2$. All assumptions of the theorem are satisfied with $x^0 = 0$, $\mu = (1, 0)^T$. A previous result (Ref. 17, p. 44) addressed (16) as just a sufficient condition.

If one looks for the minimal ellipsoid containing the intersection of two ellipsoids, there is no need to solve the optimization problem with the LMI (16) as constraints; it suffices to solve a one-dimensional optimization problem on the unit line segment. The qualifier "minimal" can be understood in different meanings; let us deal with

as an objective function.

Proposition 5.1. If $n \ge 3$, $A_1 > 0$, $A_2 \ge 0$, then the solution of the problem

is given by $A_0 = A(\tau)$, where $A(\tau) = \tau A_1 + (1 - \tau) A_2$, and $\tau$ is a solution of the 1D-problem

The inequality $0 \le \tau$ is strict if $A_2$ is not positive definite.

Proof. By Theorem 5.1, any $A_0$ containing the intersection of $E_1$ and $E_2$ satisfies inequality (16). But if $A \le B$ for two matrices $A > 0$, $B > 0$, then

which means that a solution of the matrix optimization problem is achieved if we take equalities in (16).

Note that the claim is the same if we deal with any optimality criterion having this monotonicity property; for instance,

can be taken as well. A one-dimensional optimization problem enjoys many nice properties. For instance, the function

is well defined on $[0, 1]$ or on $[0, 1)$ if $\det A_2 = 0$; it is convex and smooth with derivatives (Ref. 20)

Moreover, if $A_2$ is of rank 1,

then the optimization problem can be solved explicitly; this leads to widely used recurrent algorithms in set-membership identification (Refs. 18-20).
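The one-dimensional search over the family $A(\tau) = \tau A_1 + (1 - \tau) A_2$ can be sketched as follows. This is a minimal illustration under explicit assumptions: the objective is taken to be the volume criterion (maximizing $\log \det A(\tau)$, since the volume of $\{x : (Ax, x) \le 1\}$ is proportional to $(\det A)^{-1/2}$), which may differ from the objective displayed in the original; the matrices are diagonal and positive definite so that the determinant is a product; and ternary search is used because $\log \det$ is concave along a line of positive-definite matrices. All names and data are hypothetical.

```python
import math

def log_det_diag(d):
    """log det of a positive diagonal matrix given by its diagonal."""
    return sum(math.log(v) for v in d)

def best_tau(d1, d2, iters=200):
    """Maximize log det A(tau), A(tau) = tau*A1 + (1 - tau)*A2, on [0, 1]
    by ternary search; d1, d2 are the diagonals of A1 > 0, A2 > 0."""
    def g(tau):
        return log_det_diag([tau * u + (1.0 - tau) * v
                             for u, v in zip(d1, d2)])
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if g(m1) < g(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

# Hypothetical data: two ellipsoids in R^3 with swapped semi-axes.
d1 = [1.0, 0.25, 1.0]
d2 = [0.25, 1.0, 1.0]
tau = best_tau(d1, d2)
d0 = [tau * u + (1.0 - tau) * v for u, v in zip(d1, d2)]
print(tau, d0)
```

For this symmetric example the optimum is at the midpoint of the segment, which follows from setting the derivative of $\log(0.25 + 0.75\tau) + \log(1 - 0.75\tau)$ to zero. Containment of the intersection is automatic for any $\tau \in [0, 1]$, because $(A(\tau)x, x) \le \tau + (1 - \tau) = 1$ whenever $(A_1 x, x) \le 1$ and $(A_2 x, x) \le 1$.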

Attempts to formulate a counterpart of Theorem 5.1 for ellipsoids with different centers relying on Theorem 4.2 for nonhomogeneous quadratic functions fail because assumption (4) is not satisfied. Of course, the sufficient part of such a proposition holds (this is a well-known fact, Ref. 17, p. 44), but not the necessary one. Moreover, the optimal ellipsoid found for a parametrized family is just suboptimal among all circumscribed ellipsoids.

5.2. Sum of Two Ellipsoids. Given two ellipsoids

in $\mathbb{R}^n$, their sum is

$$E_1 + E_2 = \{x^1 + x^2 : x^1 \in E_1,\ x^2 \in E_2\}.$$

We wish to describe all ellipsoids

The sufficient condition for $E_0 \supset E_1 + E_2$ is well known (Ref. 17, p. 46). We show that it is also necessary.

Without loss of generality, we can put

Indeed,

and if

then we also have


We now take x = (x 1 , x2)TeR2 n; the condition

is equivalent to

This can be written as

for corresponding 2x2 block matrices Ai. Thus, we can apply Theorem 4.1to this problem; note that A1 + A2>0.

Proposition 5.2. An ellipsoid E(0, B0) contains the sum of ellipsoidsE(0, B1) and E(0, B2) in Rn, n>2, with Bi,>0, i= 1, 2, if and only if thereexist T1>0, T2>0, T1 + T2<1 such that

The "if part" of this result is known (Ref. 17, p. 46).

Instead of solving the LMI (17), one can consider a simple two-parameter family of solutions.

Lemma 5.1. If $\tau_{1,2} > 0$, $B_0^{-1} = \tau_1^{-1} B_1^{-1} + \tau_2^{-1} B_2^{-1}$, then inequality (17) holds.

The proof is based on standard conditions for block matrices to be nonnegative definite. We have

thus,

The second condition is

this equality can be checked by direct calculations; cf. Ref. 18, identity (A.3) in Appendix A.
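The family in Lemma 5.1 can be spot-checked numerically. The sketch below assumes the convention $E(0, B) = \{x : (Bx, x) \le 1\}$ (the definition did not survive in this text) and takes $\tau_1 + \tau_2 = 1$; the helper names `sum_bounding_matrix` and `boundary_points` and the matrices are hypothetical. It samples boundary points of both ellipsoids and reports the largest value of $(B_0 s, s)$ over the sampled sums $s = x_1 + x_2$, which should not exceed 1 if the bounding ellipsoid contains the sum.

```python
import numpy as np

def sum_bounding_matrix(B1, B2, tau1):
    """B0 with B0^{-1} = B1^{-1}/tau1 + B2^{-1}/tau2, where tau2 = 1 - tau1."""
    tau2 = 1.0 - tau1
    B0_inv = np.linalg.inv(B1) / tau1 + np.linalg.inv(B2) / tau2
    return np.linalg.inv(B0_inv)

def boundary_points(B, count, rng):
    """Random points x with (Bx, x) = 1, built from the Cholesky factor B = L L^T."""
    n = B.shape[0]
    u = rng.normal(size=(count, n))
    u /= np.linalg.norm(u, axis=1, keepdims=True)   # unit vectors
    L = np.linalg.cholesky(B)
    return np.linalg.solve(L.T, u.T).T               # x = L^{-T} u

rng = np.random.default_rng(0)
B1 = np.diag([1.0, 4.0, 9.0])
B2 = np.array([[2.0, 0.5, 0.0],
               [0.5, 1.0, 0.0],
               [0.0, 0.0, 3.0]])

B0 = sum_bounding_matrix(B1, B2, tau1=0.4)
s = boundary_points(B1, 2000, rng) + boundary_points(B2, 2000, rng)
worst = np.max(np.einsum("ij,jk,ik->i", s, B0, s))
print(worst)
```

Sampling is of course not a proof; it only illustrates the containment that the lemma, combined with Proposition 5.2, guarantees.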

It follows that finding the smallest ellipsoid containing the sum can be converted to a one-dimensional optimization problem (we can take $\tau_1 + \tau_2 = 1$) similarly to the case of the intersection above,


6. Applications to Optimization

6.1. Quadratic Form Subject to Two Quadratic Constraints. Consider the following quadratic optimization problem with quadratic constraints:

where, as usual,

Such a problem can arise, for instance, if we look for the most remote point in the intersection of two ellipsoids or the smallest ellipsoid of a given shape containing the intersection (all ellipsoids are centered at the origin). A similar analysis can be performed if the optimization is subject to two inequalities instead of the equalities in problem (18); the obvious changes should be made in the results below, and we disregard this modification.

We introduce the two-dimensional optimization problem

We call this problem dual, while the original problem (18) is primal. Note that the dual problem contains matrix inequalities and scalar variables, so it is a particular case of linear programming with LMI constraints. The primal problem is nonconvex (the set $Q$ is always nonconvex, because it contains a quadratic equality and we made no assumptions on the negative-definiteness of matrices). Nevertheless, duality theory holds in this case.

Theorem 6.1. Suppose that


Then:

(a) for all $x \in Q$, $\tau \in T$, the inequality $\psi(\tau) \geq f_0(x)$ holds;

(b) the equality $\psi(\tau^*) = f_0(x^*)$ holds if $x^*$, $\tau^*$ are solutions of the corresponding problems, and in this case the following complementarity conditions are satisfied:

Proof. (a) By introducing the function

for any $x \in Q$, $\tau \in T$, we obtain

and equality holds only for

On the other hand,

But $A(\tau) \leq 0$ for any feasible $\tau$; thus, $L(x, \tau) \leq \psi(\tau)$ with equality for $A(\tau)x = 0$ [because, if $(Ax, x) = 0$, $A \leq 0$, then $Ax = 0$]. So, we conclude that (a) holds.
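The bracketed implication used above is a standard fact about semidefinite matrices; for completeness, a short derivation is sketched here, assuming $A$ is symmetric (the auxiliary matrix $B = -A$ is introduced only for this argument).

```latex
Let $A = A^T \leq 0$ and put $B = -A \geq 0$, so that $B$ has a symmetric
square root $B^{1/2} \geq 0$. Then
\[
  (Ax, x) = -(Bx, x) = -\bigl(B^{1/2}x,\, B^{1/2}x\bigr) = -\|B^{1/2}x\|^2 .
\]
Hence $(Ax, x) = 0$ implies $B^{1/2}x = 0$, and therefore
\[
  Ax = -Bx = -B^{1/2}\bigl(B^{1/2}x\bigr) = 0 .
\]
```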

(b) Denote by $\alpha_0$ the supremum of $f_0(x)$ in the primal problem; then,

$f_0(x) \leq \alpha_0$, for all $x \in Q$.

Thus, we are in the framework of the S-procedure and Proposition 4.1 is applicable. It states that there exists $\tau$, which we now denote as $\tau^*$, such that

This means that $\tau^* \in T$, and from (a) we have

hence,

We conclude that $\tau^*$ is a solution of the dual problem, and if $x^*$ is a solution of the primal problem, then

By considering the conditions for equalities, which were proved in (a), we obtain the complementarity conditions.

The problem of the existence of solutions in primal and dual problems is worthy of special consideration. It is a well-known fact in quadratic programming that, if a quadratic objective function is bounded from above on a polyhedral set, then it attains its maximum on this set. One can expect a similar effect in problem (18), but this is not the case.

Example 6.1. For arbitrary $n > 2$, take


Then, $\alpha_0 = 0$; however, there is no $x \in Q$ with $f_0(x) = 0$.

The existence of a solution of the primal problem can be guaranteed if $\mu_1 > 0$; then, the set $Q$ is compact.

The duality result of Theorem 6.1 suggests the following approach to the solution of (18): first, solve the dual problem and, for its solution $\tau^*$, find an eigenvector $x^*$ associated with the zero eigenvalue of $A(\tau^*)$; it will be a solution of the primal problem. To verify such an approach rigorously, several details should be clarified. First, it is not obvious how to solve the dual problem. Of course, it is just 2D-optimization, and some brute force method can be exploited [e.g., for a set of gridding points in $\{\tau \in R^2 : \tau_1 > 0\}$, check the condition $A(\tau) \leq 0$ and, among such admissible points, find $\min \psi(\tau)$]. On the other hand, (19) is a linear programming problem with LMI constraints and general LMI solvers can be used. Some specially tailored methods can also be constructed. Second, we get only an approximate solution of the dual problem; how will it affect the accuracy of the primal solution? Third, it may happen that the matrix $A(\tau^*)$ has a multiple zero eigenvalue. How does one distinguish the desired eigenvector? These questions are important, but they are out of the scope of the present paper.

6.2. Quadratic Function Subject to One Quadratic Constraint. The nonhomogeneous counterpart of (18) can be treated if we have one quadratic constraint,

Such problems, with


were the subject of intensive investigations (Refs. 22-25). Recently, Ben-Tal and Teboulle (Ref. 21) have considered the general form of (25), and they have underlined the role of condition (2) in such problems. Most researchers understood that simple methods available for solving (25) in spite of its nonconvexity are closely related to some convexity characteristics (cf. the title of Ref. 21). We can explain this hidden convexity property in view of Theorem 2.2. Indeed, (25) is equivalent to the 2D-optimization problem

which is convex by Theorem 2.2. Thus, we obtain the following necessary and sufficient optimality condition.

Theorem 6.2. Suppose that $n > 2$ and there exist $x^0 \in R^n$: $\varphi_1(x^0) < 0$, $\mu > 0$: $\mu_0 A_0 + \mu_1 A_1 > 0$. Then, $x^* \in Q$ is a solution of (25) if and only if there exists $\tau^* > 0$ such that

This statement leads to numerical methods for solving (25), but we do not discuss them here. For the particular case

$\varphi_1(x) = \|x\|^2 - 1$,

such methods were known; see the references cited above.
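For the unit-ball case, one classical method from the trust-region literature can be sketched as follows. This is not claimed to be the method of the present paper or of the cited references. The sketch assumes that (25) then reads: minimize $(A_0 x, x) + 2(b_0, x)$ subject to $\|x\|^2 \le 1$, and it uses the well-known conditions $(A_0 + \tau I)x = -b_0$, $\tau \ge 0$, $A_0 + \tau I \ge 0$, $\tau(\|x\|^2 - 1) = 0$. The function name `min_on_unit_ball`, the symbol `b0`, and the data are hypothetical, and the degenerate "hard case" is deliberately not handled.

```python
import numpy as np

def min_on_unit_ball(A0, b0, iters=200):
    """Minimize (A0 x, x) + 2 (b0, x) over ||x||^2 <= 1 (easy case only).

    Returns (x, tau) with (A0 + tau I) x = -b0, tau >= 0, A0 + tau I >= 0.
    The hard case, where b0 is orthogonal to the eigenspace of the smallest
    eigenvalue of A0, is outside the scope of this sketch.
    """
    n = A0.shape[0]
    lam_min = np.linalg.eigvalsh(A0)[0]      # eigenvalues come in ascending order

    def x_of(tau):
        return np.linalg.solve(A0 + tau * np.eye(n), -b0)

    # Interior solution: A0 > 0 and the unconstrained minimizer lies in the ball.
    if lam_min > 0:
        x = x_of(0.0)
        if x @ x <= 1.0:
            return x, 0.0

    # Boundary solution: ||x(tau)|| decreases for tau > max(0, -lam_min),
    # so bisection locates the tau with ||x(tau)|| = 1.
    lo = max(0.0, -lam_min)
    hi = lo + 1.0
    while np.linalg.norm(x_of(hi)) > 1.0:
        hi = lo + 2.0 * (hi - lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(x_of(mid)) > 1.0:
            lo = mid
        else:
            hi = mid
    return x_of(hi), hi

# Hypothetical nonconvex instance in R^3 (A0 is indefinite).
A0 = np.diag([-1.0, 2.0, 3.0])
b0 = np.array([1.0, 0.0, 0.0])

x_star, tau_star = min_on_unit_ball(A0, b0)
value = x_star @ A0 @ x_star + 2.0 * b0 @ x_star
print(x_star, tau_star, value)
```

In this instance the objective restricted to the first coordinate axis is $-x_1^2 + 2x_1$, whose minimum over $[-1, 1]$ is $-3$ at $x_1 = -1$, with multiplier $\tau = 2$; the sketch should reproduce these values.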

7. Conclusions

Convexity of the 3D-image or 2D-image of $R^n$ under quadratic mappings is established by Theorems 2.1 and 2.2. These results provide a powerful tool toward a unified validation of known results related to the S-procedure, ellipsoidal techniques, quadratic optimization, as well as their extensions. We confined ourselves to just a few applications, but there are other problems in robust control with real parameter uncertainties, robust identification and parameter estimation, and numerical linear analysis which can be investigated with the use of the proposed techniques.

In this paper, we dealt only with finite-dimensional problems. But in recent years there were valuable works by Yakubovich, Matveev, Megretskii, Savkin, Petersen, and others devoted to the infinite-dimensional case. The situation is significantly different here. For instance, Matveev (Ref. 29) has proved that, under some assumptions on the spectrum of self-adjoint operators $A_i$, $i = 1, \ldots, m$, in a Hilbert space $H$, the quadratic image


is almost convex in $R^m$ for arbitrary $m$; here, $f(x)$ is a quadratic map from $H$ to $R^m$ with components $f_i(x) = (A_i x, x)$. This is in sharp contrast to the finite-dimensional case (see Sections 2 and 3 of the present paper), where only $m = 1, 2$ possess convexity. Probably, some connections between these separated cases can be found.

8. Appendix

The following two lemmas relating to finite-dimensional convex cones were exploited in the text.

Lemma 8.1. Suppose that $K_1$, $K_2$ are two acute closed convex cones in $R^n$ with no common nonzero elements: $K_1 \cap K_2 = \{0\}$. Then, there exists a hyperplane strictly separating their nonzero elements: $\exists c \in R^n$: $(c, x) > 0$, $0 \neq x \in K_1$; $(c, x) < 0$, $0 \neq x \in K_2$.

Lemma 8.2. Let $K$ be an acute closed convex cone in $R^n$. Then, there exists a strictly positive linear functional on $K \setminus 0$: $\exists c \in R^n$, $(c, x) > 0$, $x \in K$, $x \neq 0$.

These results are particular cases of Ref. 30, Theorem 2.7, though their direct proofs can be obtained by standard tools of convex analysis.

References

1. TOEPLITZ, O., Das algebraische Analogon zu einem Satz von Fejér, Mathematische Zeitschrift, Vol. 2, pp. 187-197, 1918.

2. HAUSDORFF, F., Der Wertvorrat einer Bilinearform, Mathematische Zeitschrift, Vol. 3, pp. 314-316, 1919.

3. AU-YEUNG, Y. H., and TSING, N. K., An Extension of the Hausdorff-Toeplitz Theorem on the Numerical Range, Proceedings of the American Mathematical Society, Vol. 89, pp. 215-218, 1983.

4. FAN, M. K. H., and TITS, A. L., On the Generalized Numerical Range, Linear and Multilinear Algebra, Vol. 21, pp. 313-320, 1987.

5. FAN, M. K. H., and TITS, A. L., m-Form Numerical Range and the Computation of the Structured Singular Value, IEEE Transactions on Automatic Control, Vol. 33, pp. 284-289, 1988.

6. HORN, R. A., and JOHNSON, C. R., Topics in Matrix Analysis, Cambridge University Press, Cambridge, England, 1991.

7. GUSTAFSON, K. E., and RAO, D. K. M., Numerical Range: The Field of Values of Linear Operators and Matrices, Springer, New York, New York, 1997.

8. DINES, L. L., On the Mapping of Quadratic Forms, Bulletin of the American Mathematical Society, Vol. 47, pp. 494-498, 1941.

9. BRICKMAN, L., On the Field of Values of a Matrix, Proceedings of the American Mathematical Society, Vol. 12, pp. 61-66, 1961.

10. YAKUBOVICH, V. A., The S-Procedure in Nonlinear Control Theory, Vestnik Leningradskogo Universiteta, No. 1, pp. 62-77, 1971 (in Russian).

11. VERSHIK, A. M., Quadratic Forms Positive on a Cone and Quadratic Duality, Zapiski Nauchnogo Seminara LOMI, Vol. 134, pp. 59-83, 1984 (in Russian).

12. FINSLER, P., Über das Vorkommen definiter und semidefiniter Formen in Scharen quadratischer Formen, Commentarii Mathematici Helvetici, Vol. 9, pp. 188-192, 1937.

13. HESTENES, M. R., and MCSHANE, E. J., A Theorem on Quadratic Forms and Its Application in the Calculus of Variations, Transactions of the American Mathematical Society, Vol. 47, pp. 501-512, 1940.

14. UHLIG, F., A Recurring Theorem about Pairs of Quadratic Forms and Extensions: A Survey, Linear Algebra and Its Applications, Vol. 25, pp. 219-237, 1979.

15. YAKUBOVICH, V. A., Minimization of Quadratic Functionals under Quadratic Constraints and the Necessity of a Frequency Condition in the Quadratic Criterion for Absolute Stability of Nonlinear Control Systems, Soviet Mathematics Doklady, Vol. 14, pp. 593-596, 1973.

16. FRADKOV, A. L., Duality Theorems for Certain Nonconvex Extremum Problems, Siberian Mathematical Journal, Vol. 14, pp. 247-264, 1973.

17. BOYD, S., EL GHAOUI, L., FERON, E., and BALAKRISHNAN, V., Linear Matrix Inequalities in System and Control Theory, SIAM Publications, Philadelphia, Pennsylvania, 1994.

18. SCHWEPPE, F., Uncertain Dynamic Systems, Prentice Hall, Englewood Cliffs, New Jersey, 1973.

19. FOGEL, E., and HUANG, Y. F., On the Value of Information in System Identification: Bounded Noise Case, Automatica, Vol. 18, pp. 229-238, 1982.

20. DURIEU, C., POLYAK, B. T., and WALTER, E., Trace versus Determinant in Ellipsoidal Outer-Bounding, with Application to State Estimation, Proceedings of the 13th World IFAC Congress, San Francisco, California, Vol. 1, pp. 43-48, 1996.

21. BEN-TAL, A., and TEBOULLE, M., Hidden Convexity in Some Nonconvex Quadratically Constrained Quadratic Programming Problems, Mathematical Programming, Vol. 72, pp. 51-63, 1996.

22. FU, M., LUO, Z. Q., and YE, Y., Approximation Algorithms for Quadratic Programming, International Symposium on Optimization and Computation, Hayama, Japan, 1996.


23. GOLUB, G. H., and VON MATT, U., Quadratically Constrained Least Squares and Quadratic Problems, Numerische Mathematik, Vol. 59, pp. 561-580, 1991.

24. STERN, R. J., and WOLKOWICZ, H., Indefinite Trust Region Subproblems and Nonsymmetric Eigenvalue Perturbations, SIAM Journal on Optimization, Vol. 5, pp. 286-313, 1995.

25. YE, Y., An Affine Scaling Algorithm for Nonconvex Quadratic Programming, Mathematical Programming, Vol. 56, pp. 285-300, 1992.

26. DINES, L. L., On Linear Combinations of Quadratic Forms, Bulletin of the American Mathematical Society, Vol. 49, pp. 388-393, 1943.

27. HESTENES, M. R., Application of the Theory of Quadratic Forms in Hilbert Space to the Calculus of Variations, Pacific Journal of Mathematics, Vol. 1, pp. 525-581, 1951.

28. ARUTJUNOV, A. V., Extremum Conditions: Abnormal and Singular Problems, Factorial, Moscow, Russia, 1997 (in Russian).

29. MATVEEV, A. S., Lagrange Duality in the Theory of Nonconvex Optimization and Extensions of the Toeplitz-Hausdorff Theorem, Algebra and Analysis, Vol. 7, pp. 143-181, 1995.

30. KLEE, V., Separation Theorems for Convex Cones, Proceedings of the American Mathematical Society, Vol. 6, pp. 313-316, 1955.
