Lecture: Quadratic optimization
1. Positive definite and semidefinite matrices
2. LDL^T factorization
3. Quadratic optimization without constraints
4. Quadratic optimization with constraints
5. Least-squares problems
Quadratic optimization without constraints
minimize (1/2) x^T H x + c^T x + c_0
where H = H^T ∈ R^{n×n}, c ∈ R^n, c_0 ∈ R.
• Common in applications, e.g.,
– linear regression (fitting of models)
– minimization of physically motivated objective functions such as
minimization of energy, variance etc.
– quadratic approximation of nonlinear optimization problems.
The quadratic term
Let
f(x) = (1/2) x^T H x + c^T x + c_0
where H = H^T ∈ R^{n×n}, c ∈ R^n, c_0 ∈ R.
We can assume that the matrix H is symmetric. If H is not symmetric it holds that
x^T H x = (1/2) x^T H x + (1/2)(x^T H x)^T = (1/2) x^T (H + H^T) x,
i.e., x^T H x = x^T H̄ x, where H̄ = (1/2)(H + H^T) is a symmetric matrix.
Positive definite and positive semidefinite matrices
Definition 1 (Positive semidefinite). A symmetric matrix H = H^T ∈ R^{n×n} is called positive semidefinite if
x^T H x ≥ 0 for all x ∈ R^n
Definition 2 (Positive definite). A symmetric matrix H = H^T ∈ R^{n×n} is called positive definite if
x^T H x > 0 for all x ∈ R^n \ {0}
How do you determine whether a symmetric matrix H = H^T ∈ R^{n×n} is positive (semi)definite?
• LDL^T factorization (suitable for calculations by hand). Chapter 27.6 in the book.
• Use the spectral factorization of the matrix H = QΛQ^T, where Q^T Q = I and Λ = diag(λ_1, . . . , λ_n), with λ_k the eigenvalues of the matrix (a scripted version of this test is sketched after this list). From this it follows that
– H is positive semidefinite if and only if λ_k ≥ 0, k = 1, . . . , n
– H is positive definite if and only if λ_k > 0, k = 1, . . . , n
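As a numerical complement to the two hand methods, the spectral test is easy to script. Below is a minimal sketch assuming NumPy; the helper name definiteness and the tolerance are our own choices, and the test matrix is the one factorized by hand later in this lecture.

```python
import numpy as np

def definiteness(H, tol=1e-10):
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    lam = np.linalg.eigvalsh(H)        # eigenvalues of a symmetric matrix
    if np.all(lam > tol):
        return "positive definite"
    if np.all(lam >= -tol):
        return "positive semidefinite"
    return "indefinite"

H = np.array([[2.0, 4, 6, 8],
              [4, 9, 17, 22],
              [6, 17, 44, 61],
              [8, 22, 61, 118]])
print(definiteness(H))                 # -> positive definite
```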
LDL^T factorization
Theorem 1. A symmetric matrix H = H^T ∈ R^{n×n} is positive semidefinite if and only if there is a factorization H = LDL^T where

L = [ 1    0    0    . . .  0 ]        D = [ d1  0   0   . . .  0  ]
    [ l21  1    0    . . .  0 ]            [ 0   d2  0   . . .  0  ]
    [ l31  l32  1    . . .  0 ]            [ 0   0   d3  . . .  0  ]
    [ ...             ...     ]            [ ...            ...    ]
    [ ln1  ln2  ln3  . . .  1 ]            [ 0   0   0   . . .  dn ]

and d_k ≥ 0, k = 1, . . . , n.
H is positive definite if and only if d_k > 0, k = 1, . . . , n, in the LDL^T factorization.
Proof: See Chapter 27.9 in the book.
The LDL^T factorization can be determined by one of the following methods:
• elementary row and column operations,
• completing the square.
LDL^T factorization - Example
Determine the LDL^T factorization of

H = [ 2  4   6   8  ]
    [ 4  9  17  22  ]
    [ 6  17 44  61  ]
    [ 8  22 61  118 ]

The idea is to perform symmetric row and column operations from the left and right in order to form the diagonal matrix D.
The row operations can be performed with lower triangular invertible matrices; the product of lower triangular matrices is lower triangular, and so is the inverse of a lower triangular matrix.
Perform row operations to get zeros in the first column:

       [  1 0 0 0 ] [ 2  4   6   8  ]   [ 2 0  0  0 ]
ℓ1 H = [ -2 1 0 0 ] [ 4  9  17  22  ] = [ 0 1  5  6 ]
       [ -3 0 1 0 ] [ 6  17 44  61  ]   [ 0 5 26 37 ]
       [ -4 0 0 1 ] [ 8  22 61  118 ]   [ 0 6 37 86 ]

and the corresponding column operations:

            [ 2 4  6  8 ] [ 1 -2 -3 -4 ]   [ 2 0  0  0 ]
ℓ1 H ℓ1^T = [ 0 1  5  6 ] [ 0  1  0  0 ] = [ 0 1  5  6 ]
            [ 0 5 26 37 ] [ 0  0  1  0 ]   [ 0 5 26 37 ]
            [ 0 6 37 86 ] [ 0  0  0  1 ]   [ 0 6 37 86 ]
Perform row operations to get zeros in the second column:

               [ 1  0 0 0 ] [ 2 0  0  0 ]   [ 2 0 0  0 ]
ℓ2 ℓ1 H ℓ1^T = [ 0  1 0 0 ] [ 0 1  5  6 ] = [ 0 1 5  6 ]
               [ 0 -5 1 0 ] [ 0 5 26 37 ]   [ 0 0 1  7 ]
               [ 0 -6 0 1 ] [ 0 6 37 86 ]   [ 0 0 7 50 ]

and the corresponding column operations:

                    [ 2 0 0  0 ] [ 1 0  0  0 ]   [ 2 0 0  0 ]
ℓ2 ℓ1 H ℓ1^T ℓ2^T = [ 0 1 5  6 ] [ 0 1 -5 -6 ] = [ 0 1 0  0 ]
                    [ 0 0 1  7 ] [ 0 0  1  0 ]   [ 0 0 1  7 ]
                    [ 0 0 7 50 ] [ 0 0  0  1 ]   [ 0 0 7 50 ]
Perform row operations to get zeros in the third column:

                  [ 1 0  0 0 ] [ 2 0 0  0 ]   [ 2 0 0 0 ]
ℓ3 ℓ2 ℓ1 H ℓ1^T ℓ2^T = [ 0 1  0 0 ] [ 0 1 0  0 ] = [ 0 1 0 0 ]
                  [ 0 0  1 0 ] [ 0 0 1  7 ]   [ 0 0 1 7 ]
                  [ 0 0 -7 1 ] [ 0 0 7 50 ]   [ 0 0 0 1 ]

and the corresponding column operations:

                       [ 2 0 0 0 ] [ 1 0 0  0 ]   [ 2 0 0 0 ]
ℓ3 ℓ2 ℓ1 H ℓ1^T ℓ2^T ℓ3^T = [ 0 1 0 0 ] [ 0 1 0  0 ] = [ 0 1 0 0 ] = D
                       [ 0 0 1 7 ] [ 0 0 1 -7 ]   [ 0 0 1 0 ]
                       [ 0 0 0 1 ] [ 0 0 0  1 ]   [ 0 0 0 1 ]
We finally got the diagonal matrix

D = [ 2 0 0 0 ]
    [ 0 1 0 0 ]
    [ 0 0 1 0 ]
    [ 0 0 0 1 ]
Since all elements on the diagonal of D are positive the matrix H is
positive definite.
The values on the diagonal are not the eigenvalues of H, but they have
the same sign as the eigenvalues, and that is all we need to know.
How do we determine the L matrix?
ℓ3 ℓ2 ℓ1 H ℓ1^T ℓ2^T ℓ3^T = D, i.e., H = LDL^T if L = (ℓ3 ℓ2 ℓ1)^{-1} = ℓ1^{-1} ℓ2^{-1} ℓ3^{-1}:

    [  1 0 0 0 ]^{-1} [ 1  0 0 0 ]^{-1} [ 1 0  0 0 ]^{-1}
L = [ -2 1 0 0 ]      [ 0  1 0 0 ]      [ 0 1  0 0 ]
    [ -3 0 1 0 ]      [ 0 -5 1 0 ]      [ 0 0  1 0 ]
    [ -4 0 0 1 ]      [ 0 -6 0 1 ]      [ 0 0 -7 1 ]

    [ 1 0 0 0 ] [ 1 0 0 0 ] [ 1 0 0 0 ]   [ 1 0 0 0 ]
  = [ 2 1 0 0 ] [ 0 1 0 0 ] [ 0 1 0 0 ] = [ 2 1 0 0 ]
    [ 3 0 1 0 ] [ 0 5 1 0 ] [ 0 0 1 0 ]   [ 3 5 1 0 ]
    [ 4 0 0 1 ] [ 0 6 0 1 ] [ 0 0 7 1 ]   [ 4 6 7 1 ]
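The same elimination can be scripted. The following is a minimal sketch assuming NumPy; the function name ldlt and its interface are our own, and no pivoting is attempted, so it assumes the pivots d_k stay nonzero (as for a positive definite H).

```python
import numpy as np

def ldlt(H):
    """Return (L, d) with H = L @ diag(d) @ L.T for symmetric H."""
    H = np.array(H, dtype=float)
    n = H.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for k in range(n):
        # pivot: what remains of H[k, k] after the first k columns
        d[k] = H[k, k] - (L[k, :k] ** 2) @ d[:k]
        for i in range(k + 1, n):
            L[i, k] = (H[i, k] - (L[i, :k] * L[k, :k]) @ d[:k]) / d[k]
    return L, d

H = np.array([[2.0, 4, 6, 8],
              [4, 9, 17, 22],
              [6, 17, 44, 61],
              [8, 22, 61, 118]])
L, d = ldlt(H)
print(d)   # -> [2. 1. 1. 1.], matching the hand calculation
print(L)   # -> the matrix L above
```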
Unbounded problems
Let f(x) = (1/2) x^T H x + c^T x + c_0.
Assume that H is not positive semidefinite. Then there is a d ∈ R^n such that d^T H d < 0. It follows that
f(td) = (1/2) t^2 d^T H d + t c^T d + c_0 → −∞ when t → ∞
Conclusion: A necessary condition for f(x) to have a minimum is that
H is positive semidefinite.
Assume that H is positive semidefinite. Then we can use the factorization H = LDL^T, which gives

minimize (1/2) x^T H x + c^T x + c_0
  = minimize (1/2) x^T L D L^T x + c^T (L^T)^{-1} L^T x + c_0
  = minimize Σ_{k=1}^{n} [ (1/2) d_k y_k^2 + c̄_k y_k ] + c_0
  = c_0 + Σ_{k=1}^{n} minimize [ (1/2) d_k y_k^2 + c̄_k y_k ]

where we have introduced the new coordinates y = L^T x and c̄ = L^{-1} c.
The optimization problem separates into n independent scalar quadratic optimization problems, which are easy to solve.
Scalar quadratic optimization without constraints
The scalar quadratic optimization problem
minimize_{x ∈ R} (1/2) h x^2 + c x + c_0
has a finite solution if and only if
• h = 0 and c = 0, or
• h > 0.
In the first case there is no unique optimum.
In the second case (1/2) h x^2 + c x + c_0 = (1/2) h (x + c/h)^2 + c_0 − c^2/(2h), and the optimum is achieved for x = −c/h.
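These two cases translate directly into a few lines of code. A minimal sketch, assuming plain Python; the function name and the (x, fmin) return convention are our own.

```python
import math

def minimize_scalar_quadratic(h, c, c0):
    """Minimize (1/2) h x^2 + c x + c0 over x in R.

    Returns (x, fmin); x is None when no unique minimizer exists, and
    fmin is -inf when the problem is unbounded from below."""
    if h > 0:
        # completing the square: (1/2)h(x + c/h)^2 + c0 - c^2/(2h)
        return -c / h, c0 - c**2 / (2 * h)
    if h == 0 and c == 0:
        return None, c0          # every x is optimal, no unique optimum
    return None, -math.inf       # h < 0, or h == 0 with c != 0

print(minimize_scalar_quadratic(2.0, -4.0, 1.0))   # -> (2.0, -3.0)
```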
Solving the minimization problems separately shows that y = [y_1 . . . y_n]^T is a minimizer if
(i)′ d_k ≥ 0, k = 1, . . . , n
(ii)′ d_k y_k = −c̄_k
which is equivalent to
(i) H is positive semidefinite
(ii) D L^T x = −L^{-1} c
Constraint (ii) can be replaced with L D L^T x = −c, i.e., Hx = −c.
Furthermore, f(x) = (1/2) x^T H x + c^T x + c_0 is unbounded (from below) if
(i)′ some d_k < 0, or
(ii)′ one of the equations d_k y_k = −c̄_k has no solution, i.e., d_k = 0 and c̄_k ≠ 0,
which is equivalent to
(i) H is not positive semidefinite, or
(ii) the equation Hx = −c has no solution.
The above arguments can be formalized as the following theorem.
Theorem 2. Let f(x) = (1/2) x^T H x + c^T x + c_0. Then x is a minimizer if
(i) H is positive semidefinite
(ii) Hx = −c
If H is positive definite, then x = −H^{-1} c.
If H is not positive semidefinite, or Hx = −c has no solution, then f is not bounded from below.
Comment 1. The condition that Hx = −c has a solution is equivalent to c ∈ R(H). Therefore, f has a minimizer if and only if H is positive semidefinite and c ∈ R(H).
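A numerical sketch of Theorem 2 for the positive definite case, assuming NumPy; the helper name is our own and the example data are made up. The minimizer is the unique solution of Hx = −c.

```python
import numpy as np

def solve_unconstrained_qp(H, c):
    """Minimizer of (1/2) x^T H x + c^T x for positive definite H."""
    if np.any(np.linalg.eigvalsh(H) <= 0):
        raise ValueError("H must be positive definite for a unique minimizer")
    return np.linalg.solve(H, -c)      # solves Hx = -c

H = np.array([[2.0, 0.0],
              [0.0, 4.0]])
c = np.array([-2.0, -8.0])
print(solve_unconstrained_qp(H, c))    # -> [1. 2.]
```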
Quadratic optimization with linear constraints
minimize (1/2) x^T H x + c^T x + c_0
s.t. Ax = b                                   (1)
where H = H^T ∈ R^{n×n}, A ∈ R^{m×n}, c ∈ R^n, b ∈ R^m, and c_0 ∈ R.
• We assume that b ∈ R(A) and n > m, i.e., N(A) ≠ {0}.
• Assume that N(A) = span{z_1, . . . , z_k} and define the nullspace matrix Z = [z_1 . . . z_k].
• If Ax̄ = b, it holds that an arbitrary solution to the linear constraint has the form x = x̄ + Zv, for some v ∈ R^k. This follows since
A(x̄ + Zv) = Ax̄ + AZv = b + 0 = b
With x = x̄ + Zv inserted in f(x) = (1/2) x^T H x + c^T x + c_0 we get
f(x) = (1/2) v^T Z^T H Z v + (Z^T(Hx̄ + c))^T v + f(x̄).
The minimization problem (1) is thus equivalent to
minimize (1/2) v^T Z^T H Z v + (Z^T(Hx̄ + c))^T v + f(x̄).
We can now apply Theorem 2 to obtain the following result:
Theorem 3. x is an optimal solution to (1) if
(i) Z^T H Z is positive semidefinite,
(ii) x = x̄ + Zv where Z^T H Z v = −Z^T(Hx̄ + c).
Comment 2. The second condition in the theorem is equivalent to the existence of a u ∈ R^m such that

[ H  −A^T ] [ x ]   [ −c ]
[ A   0   ] [ u ] = [  b ]

Proof: The second condition in the theorem can be written
Z^T(H(x̄ + Zv) + c) = 0
Since N(A) = R(Z) this is equivalent to
H(x̄ + Zv) + c ∈ N(A)^⊥ = R(A^T),   where x = x̄ + Zv,
⇔ Hx + c = A^T u, for some u ∈ R^m
Linear constrained QP: Example
Let f(x) = (1/2) x^T H x + c^T x + c_0, where

H = [ −2 0 ]      c = [ 1 ]      c_0 = 0.
    [  0 1 ] ,        [ 1 ] ,

Consider minimization of f over the feasible region F = {x | Ax = b} where
A = [ 2 1 ] ,   b = 2.
I.e., the minimization problem
min  −x_1^2 + (1/2) x_2^2 + x_1 + x_2
s.t. 2x_1 + x_2 = 2
Linear constrained QP: Example - Nullspace method
The nullspace of A is spanned by Z and the reduced Hessian is given by

Z = [ −1 ]      H̄ = Z^T H Z = 2
    [  2 ] ,

Since the reduced Hessian is positive definite the problem is strictly convex. The unique solution is then given by taking an arbitrary x̄ ∈ F, e.g. x̄ = (1 0)^T, and solving
H̄ v = −c̄ = −Z^T(Hx̄ + c) = −3   ⇒   v = −3/2
Then
x = x̄ + Zv = [ 1 ] + [ −1 ] (−3/2) = [ 5/2 ]
             [ 0 ]   [  2 ]          [ −3  ]
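The same steps in code: a sketch assuming NumPy, with names following the slides (Z, and x̄ as x_bar); nothing here is library-specific beyond basic linear algebra.

```python
import numpy as np

H = np.array([[-2.0, 0.0], [0.0, 1.0]])
c = np.array([1.0, 1.0])
A = np.array([[2.0, 1.0]])

Z = np.array([[-1.0], [2.0]])      # basis for N(A): A @ Z == 0
x_bar = np.array([1.0, 0.0])       # a particular solution of Ax = b

H_red = Z.T @ H @ Z                # reduced Hessian, here [[2.0]]
rhs = -Z.T @ (H @ x_bar + c)       # here [-3.0]
v = np.linalg.solve(H_red, rhs)    # v = [-1.5]
x = x_bar + Z @ v                  # x = [2.5, -3.0]
print(x)
```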
Linear constrained QP: Example - Geometric illustration
[Figure: the feasible line F in the (x_1, x_2) plane (axes from −4 to 4), a particular solution x̄, the nullspace direction Z, and the step (−3/2)Z from x̄ to the optimal x.]
Linear constrained QP: Example - Lagrange method
Assume that we know the problem is convex; then

[ H  −A^T ] [ x ]   [ −c ]          [ −2 0 −2 ] [ x_1 ]   [ −1 ]
[ A   0   ] [ u ] = [  b ] ,  i.e.,  [  0 1 −1 ] [ x_2 ] = [ −1 ]
                                    [  2 1  0 ] [  u  ]   [  2 ]

Solving
−2x_1 − 2u = −1        x_1 = 1/2 − u
 x_2 −  u  = −1   ⇒    x_2 = −1 + u,

then 2x_1 + x_2 = 2 implies 2(1/2 − u) + (−1 + u) = −u = 2, so u = −2 and then x_1 = 5/2 and x_2 = −3.
This agrees with the nullspace method.
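The same example via the block system, as a sketch assuming NumPy; np.block assembles the KKT matrix from Comment 2.

```python
import numpy as np

H = np.array([[-2.0, 0.0], [0.0, 1.0]])
c = np.array([1.0, 1.0])
A = np.array([[2.0, 1.0]])
b = np.array([2.0])

n, m = H.shape[0], A.shape[0]
KKT = np.block([[H, -A.T],
                [A, np.zeros((m, m))]])   # [[H, -A^T], [A, 0]]
sol = np.linalg.solve(KKT, np.concatenate([-c, b]))
x, u = sol[:n], sol[n:]
print(x, u)                               # -> [ 2.5 -3. ] [-2.]
```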
Linear constrained QP: Example - Sensitivity analysis
From the Lagrange method we know that u = −2. To first order, the multiplier tells us how the objective function changes due to variations in the right-hand side b of the linear constraint.
Assume that b := b + δ. Then, from the previous calculations, u = −(2 + δ), and then x_1 = 5/2 + δ and x_2 = −3 − δ. Then

f(x) = (1/2) [ 5/2 + δ ]^T [ −2 0 ] [ 5/2 + δ ]   [ 1 ]^T [ 5/2 + δ ]
             [ −3 − δ ]    [  0 1 ] [ −3 − δ ] +  [ 1 ]   [ −3 − δ ]

     = −9/4 − 2δ − (1/2)δ^2
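A quick numerical check of this first-order interpretation, assuming NumPy; f_opt below reuses the optimizer expression derived above. The central difference of the optimal value at δ = 0 should reproduce the multiplier u = −2.

```python
import numpy as np

H = np.array([[-2.0, 0.0], [0.0, 1.0]])
c = np.array([1.0, 1.0])

def f_opt(delta):
    """Optimal objective value when the constraint is 2x1 + x2 = 2 + delta."""
    x = np.array([5/2 + delta, -3.0 - delta])   # optimizer from above
    return 0.5 * x @ H @ x + c @ x

eps = 1e-6
print((f_opt(eps) - f_opt(-eps)) / (2 * eps))   # -> approx -2.0 = u
```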
Example 2: Analysis of a resistance circuit
[Figure: a resistor network with nodes 1-5; the resistor r_ij between nodes i and j carries current x_ij with voltage drop v_ij; the node potentials are u_1, . . . , u_5, and an external current I enters at node 1 and leaves at node 5. Edges: (1,2), (1,3), (2,3), (2,4), (3,4), (3,5), (4,5).]
• Let a current I go through the circuit from node 1 to node 5.
• What is the solution to the circuit equations?
The current x_ij goes from node i to node j if x_ij > 0. Kirchhoff's current law gives (with the sign convention "current out of the node minus current into the node")
 x12 + x13 = I
−x12 + x23 + x24 = 0
−x13 − x23 + x34 + x35 = 0
−x24 − x34 + x45 = 0
−x35 − x45 = −I
This can be written Ax = b where x = [x12 x13 x23 x24 x34 x35 x45]^T,

A = [  1  1  0  0  0  0  0 ]        [  I ]
    [ −1  0  1  1  0  0  0 ]        [  0 ]
    [  0 −1 −1  0  1  1  0 ]    b = [  0 ]
    [  0  0  0 −1 −1  0  1 ]        [  0 ]
    [  0  0  0  0  0 −1 −1 ] ,      [ −I ]
• The node potentials are denoted u_j, j = 1, . . . , 5.
• The voltage drop over the resistor r_ij is denoted v_ij and is given by the equations
v_ij = u_i − u_j
v_ij = r_ij x_ij   (Ohm's law)
This can also be written
v = A^T u
v = Dx
where
u = [u1 u2 u3 u4 u5]^T
v = [v12 v13 v23 v24 v34 v35 v45]^T
D = diag(r12, r13, r23, r24, r34, r35, r45)
The circuit equations:
Ax = b            [ D  −A^T ] [ x ]   [ 0 ]
v = A^T u    ⇔    [ A   0   ] [ u ] = [ b ]
v = Dx
Since D is positive definite (r_ij > 0) this is, see Comment 2, equivalent to the optimality conditions for the following optimization problem
minimize (1/2) x^T D x
s.t. Ax = b
Comment
In the lecture on network optimization we saw that N(A) ≠ {0} (there the matrix was called A) and that the nullspace corresponds to cycles in the circuit.
The laws of electricity (Ohm's law + Kirchhoff's law) determine the current that minimizes the loss of power. This follows since the objective function can be written
(1/2) x^T D x = (1/2) Σ_{ij} r_ij x_ij^2
Comment 3. It is common to connect one of the nodes to earth. Let us, e.g., connect node 5 to earth; then u5 = 0.
[Figure: the same resistor network as before, now with node 5 connected to earth so that u5 = 0.]
With the potential in node 5 fixed at u5 = 0, the last column in the voltage equation v = A^T u is eliminated. The voltage equation and the current balance can be replaced with v = A^T u and Ax = b where

A = [  1  1  0  0  0  0  0 ]        [ I ]        [ u1 ]
    [ −1  0  1  1  0  0  0 ]    b = [ 0 ]    u = [ u2 ]
    [  0 −1 −1  0  1  1  0 ]        [ 0 ]        [ u3 ]
    [  0  0  0 −1 −1  0  1 ] ,      [ 0 ] ,      [ u4 ]

The advantage of this is that the rows of A are linearly independent. This facilitates the solution of the optimization problem.
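A numerical sketch of the whole example, assuming NumPy; the resistance values and the current I below are made-up illustration numbers. It assembles the grounded system and solves the KKT system from Comment 2 for the QP minimize (1/2) x^T D x s.t. Ax = b.

```python
import numpy as np

# reduced (grounded) current-balance matrix and right-hand side
A = np.array([[ 1,  1,  0,  0,  0,  0,  0],
              [-1,  0,  1,  1,  0,  0,  0],
              [ 0, -1, -1,  0,  1,  1,  0],
              [ 0,  0,  0, -1, -1,  0,  1]], dtype=float)
I = 1.0
b = np.array([I, 0.0, 0.0, 0.0])
r = np.array([1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0])  # assumed resistances
D = np.diag(r)

# KKT system [[D, -A^T], [A, 0]] [x; u] = [0; b]
m, n = A.shape
KKT = np.block([[D, -A.T],
                [A, np.zeros((m, m))]])
sol = np.linalg.solve(KKT, np.concatenate([np.zeros(n), b]))
x, u = sol[:n], sol[n:]   # branch currents and node potentials (u5 = 0)
print(x)                  # currents satisfying Kirchhoff's current law
print(u)                  # potentials relative to the earthed node
```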
Least-Squares problems
Application: Linear regression (model fitting)
[Figure: a system with output y(t), measurement noise e(t), and measured observations s(t).]
Problem: Fit a linear regression model to measured data.
Regression model: y(t) = Σ_{j=1}^{n} α_j ψ_j(t)
• ψ_k(t), k = 1, . . . , n are the regressors (known functions)
• α_k, k = 1, . . . , n are the model parameters (to be determined)
• e(t) is the measurement noise (not known)
• s(t) are the observations.
Idea for fitting the model: minimize the quadratic sum of the prediction errors

minimize (1/2) Σ_{i=1}^{m} ( Σ_{j=1}^{n} α_j ψ_j(t_i) − s(t_i) )^2
 = minimize (1/2) Σ_{i=1}^{m} ( ψ(t_i)^T x − s(t_i) )^2
 = minimize (1/2) (Ax − b)^T (Ax − b)                  (2)

where

ψ(t) = [ ψ_1(t) ]     x = [ α_1 ]     A = [ ψ(t_1)^T ]     b = [ s(t_1) ]
       [  ...   ]         [ ... ]         [   ...    ]         [  ...   ]
       [ ψ_n(t) ] ,       [ α_n ] ,       [ ψ(t_m)^T ] ,       [ s(t_m) ]
The solution of the Least-Squares problem (LSQ)
The Least-Squares problem (2) can equivalently be written
minimize (1/2) x^T H x + c^T x + c_0
where H = A^T A, c = −A^T b, c_0 = (1/2) b^T b. We note that
• H = A^T A is positive semidefinite since x^T A^T A x = ‖Ax‖^2 ≥ 0.
• c ∈ R(H) since c = −A^T b ∈ R(A^T) = R(A^T A) = R(H).
The conditions in Theorem 2 and Comment 1 are satisfied and it follows that the LSQ estimate is given by
Hx = −c ⇔ A^T A x = A^T b   (the normal equations)
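A small end-to-end sketch, assuming NumPy; the regressors ψ_1(t) = 1, ψ_2(t) = t and the synthetic data are made-up illustration choices.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 20)     # sample times t_1, ..., t_m
s = 1.0 + 2.0 * t + 0.05 * rng.standard_normal(t.size)  # observations s(t_i)

# rows of A are psi(t_i)^T with psi_1(t) = 1 and psi_2(t) = t
A = np.column_stack([np.ones_like(t), t])
b = s

# solve the normal equations A^T A x = A^T b
x = np.linalg.solve(A.T @ A, A.T @ b)
print(x)   # -> approximately [1.0, 2.0]

# in practice np.linalg.lstsq(A, b, rcond=None) is numerically preferable
```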
Linear algebra interpretation
[Figure: the four fundamental subspaces of A. In R^n: R(A^T) and N(A); in R^m: R(A) and N(A^T). Ax lies in R(A), the residual b − Ax lies in N(A^T), and the minimum-norm solution has the form x = A^T u in R(A^T).]
• The normal equations can be written
A^T(b − Ax) = 0 ⇔ b − Ax ∈ N(A^T)
This orthogonality property has a geometric interpretation which is illustrated in the figure above.
• The Fundamental Theorem of Linear Algebra explains the LSQ solution geometrically.
Uniqueness of the LSQ solution
Case 1: If the columns of A are linearly independent, i.e., N(A) = {0}, it holds that H = A^T A is positive definite and hence invertible. The normal equations then have the unique solution
x = (A^T A)^{-1} A^T b
Case 2: If A has linearly dependent columns, i.e., N(A) ≠ {0}, then it is natural to choose the solution of smallest length. From the figure on the previous slide it is clear that we should let x = A^T u, where A A^T u = Ax̄, and where x̄ is an arbitrary solution to the LSQ problem.
Our choice of x in Case 2 can equivalently be interpreted as the solution to the quadratic optimization problem
minimize (1/2) x^T x
s.t. Ax = Ax̄
According to Comment 2 the solution to this optimization problem is given by the solution to

[ I  −A^T ] [ x ]   [  0  ]
[ A   0   ] [ u ] = [ Ax̄ ]

i.e., x = A^T u, where A A^T u = Ax̄.
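A sketch of Case 2, assuming NumPy; the rank-deficient A and the data b are made up. It computes x̄ with lstsq, then the minimum-norm solution x = A^T u from A A^T u = Ax̄ (here A A^T is also singular, so lstsq is used for that consistent system as well).

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 1.0, 2.0]])      # column 3 = column 1 + column 2
b = np.array([1.0, 2.0, 1.0])

# x_bar: an arbitrary least-squares solution of min ||Ax - b||
x_bar, *_ = np.linalg.lstsq(A, b, rcond=None)

# minimum-norm solution: x = A^T u where A A^T u = A x_bar
u, *_ = np.linalg.lstsq(A @ A.T, A @ x_bar, rcond=None)
x = A.T @ u
print(x)                             # same as np.linalg.pinv(A) @ b
```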
Reading instructions
• Quadratic optimization without constraints: Chapters 9 and 27 in the book.
• Quadratic optimization with constraints: Chapter 10 in the book.
• Least-Squares method: Chapter 11 in the book.