
Lecture 2 Theory of Linear Programming – imag.fr



Lecture 2

Theory of Linear Programming

(Linear Programming program, Theorem on Alternative, Linear Programming duality)

2.1 Linear Programming: basic notions

A Linear Programming (LP) program is an optimization problem of the form

c^T x → min | Ax − b ≥ 0   (2.1.1)

where

• x ∈ R^n is the design vector;

• c ∈ R^n is a given objective;

• A is a given m × n constraint matrix, and b ∈ R^m is a given right hand side of the constraints.

As any other optimization problem, (2.1.1) is called
– feasible, if its feasible set {x | Ax − b ≥ 0} is nonempty; a point from the latter set is called a feasible solution to (2.1.1);
– below bounded, if it is either infeasible or its objective c^T x is below bounded on the feasible set.

For a feasible below bounded problem, the lower bound of the objective on the feasible set – the quantity

c_* = inf_{x: Ax−b≥0} c^T x

– is called the optimal value of the problem. For an infeasible problem the optimal value is, by definition, +∞, while for a feasible below unbounded problem the optimal value, by definition, is −∞. Finally, (2.1.1) is called
– solvable, if it is feasible, below bounded and the optimal value is attained: there exists a feasible x with c^T x = c_*. An x of this type is called an optimal solution to (2.1.1).
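As a concrete (made-up) illustration of these notions, a program of the form (2.1.1) can be handed to an off-the-shelf LP solver. SciPy's linprog minimizes c^T x subject to A_ub x ≤ b_ub, so the constraint Ax − b ≥ 0 is passed as (−A)x ≤ −b, and the default nonnegativity bounds are lifted, since (2.1.1) places no sign restrictions on x. A minimal sketch:

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative data: minimize x1 + 2*x2 subject to
# x1 >= 1, x2 >= 0, x1 + x2 <= 3, encoded as Ax - b >= 0.
c = np.array([1.0, 2.0])
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [-1.0, -1.0]])
b = np.array([1.0, 0.0, -3.0])

# Ax - b >= 0  <=>  (-A) x <= -b; all variables are free.
res = linprog(c, A_ub=-A, b_ub=-b,
              bounds=[(None, None)] * len(c), method="highs")
print(res.status)  # 0: solved; 2: infeasible; 3: below unbounded
print(res.fun)     # the optimal value c_* (here 1, attained at x = (1, 0))
```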


A priori it is unclear whether a feasible and below bounded LP program is solvable: why should the infimum be attained? It turns out, however, that a feasible and below bounded LP program is always solvable.

2.2 An example: Tchebychev approximation and its applications

In the majority of optimization textbooks, examples of LP programs deal with economics, production planning, etc., and indeed the major applications of LP are in these areas. In our course, however, we prefer to use, as a basic example, a problem related to applied mathematics/engineering. Let us start with a mathematical formulation.

2.2.1 The best uniform approximation

Problem [Tchebychev approximation] Given an M × N matrix A with rows a_1^T, ..., a_M^T, and a vector b ∈ R^M, solve the problem

min_{x∈R^N} ‖Ax − b‖_∞,   where ‖Ax − b‖_∞ = max_{i=1,...,M} |a_i^T x − b_i|.   (2.2.2)

As stated, problem (2.2.2) is not an LP program – its objective is nonlinear. We can, however, immediately convert (2.2.2) into an equivalent LP program:

t → min | −t ≤ a_i^T x − b_i ≤ t,  i = 1, ..., M.   (2.2.3)
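Program (2.2.3) can be handed directly to an LP solver in the variables z = (x, t); a minimal sketch (the helper name and the random test data are ours, not from the text):

```python
import numpy as np
from scipy.optimize import linprog

def tchebychev(A, b):
    """Solve min_x ||Ax - b||_inf via the LP (2.2.3) in variables z = (x, t)."""
    M, N = A.shape
    c = np.r_[np.zeros(N), 1.0]        # objective: t -> min
    ones = np.ones((M, 1))
    A_ub = np.r_[np.c_[A, -ones],      #  a_i^T x - b_i <= t
                 np.c_[-A, -ones]]     # -(a_i^T x - b_i) <= t
    b_ub = np.r_[b, -b]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (N + 1), method="highs")
    return res.x[:N], res.fun          # optimal x and optimal value t

rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 3)), rng.standard_normal(20)
x, t = tchebychev(A, b)
# At the optimum, t coincides with the uniform residual of x:
assert abs(np.max(np.abs(A @ x - b)) - t) < 1e-7
```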

Thus (2.2.2) is equivalent to an LP program.

A typical origin of the Tchebychev problem is as follows: we want to approximate as well as possible a given “target” function β(t) on, say, the unit segment [0, 1] of values of t by a linear combination ∑_{j=1}^N x_j α_j of N given functions α_j(t); the quality of approximation is measured by its uniform distance from β, i.e., by the quantity

‖β − ∑_{j=1}^N x_j α_j‖_∞ ≡ sup_{0≤t≤1} |β(t) − ∑_{j=1}^N x_j α_j(t)|.   (2.2.4)

While problem (2.2.4) is important for several engineering applications, from the computational viewpoint its drawback is that the objective is “implicit” – it involves maximization with respect to a continuously varying variable. As a result, even the related analysis problem – given a vector of coefficients x, evaluate the quality of the corresponding approximation – can be quite difficult numerically. The simplest way to


overcome this drawback is to approximate in (2.2.4) the maximum over t running through [0, 1] by the maximum over t running through a “fine finite grid”, e.g., through the finite set

T_M = { t_i = i/M : i = 1, ..., M }.

With this approximation, the objective in problem (2.2.4) becomes

max_{i=1,...,M} |β(t_i) − ∑_{j=1}^N x_j α_j(t_i)| ≡ ‖Ax − b‖_∞,

where the columns of A are the restrictions of the functions α_j(·) to the grid T_M, and b is the restriction of β(·) to the grid. Consequently, the optimization problem (2.2.2) can be viewed as a discrete version of problem (2.2.4).

2.2.2 Application example: synthesis of filters

As has already been mentioned, problem (2.2.4) arises in a number of engineering applications. Consider, e.g., the problem of synthesizing a linear time-invariant (LTI) dynamic system (a “filter”) with a given impulse response.

A (continuous time) time-invariant linear dynamic system S is, mathematically, a transformation from the space of “signals” – functions on the axis – to the same space, given by convolution with a certain fixed function:

u(t) → y(t) = ∫_{−∞}^{∞} u(s) h(t − s) ds,

u(·) being an input, and y(·) the corresponding output of S. The convolution kernel h(·) is a characteristic function of the system, called the impulse response of S.

Consider the simplest synthesis problem

Problem [Filter Synthesis, I] Given a desired impulse response h_*(t) along with N “building blocks” – standard systems S_j with impulse responses h_j(·), j = 1, ..., N – assemble these building blocks in parallel, with amplification coefficients x_1, ..., x_N, into a system S in such a way that the impulse response of the latter system is as close as possible to the desired impulse response h_*(·).
Note that the structure of S is given, and all we can play with are the amplification coefficients x_j, j = 1, ..., N. The impulse response of our structure clearly is

h(t) = ∑_{j=1}^N x_j h_j(t).

Assuming further that h_* and all h_j vanish outside [0, 1] 1 and that we are interested in

1 Assumptions of this type have a quite natural interpretation. Namely, the fact that the impulse response vanishes to the left of the origin means that the corresponding system is causal – its output up to any time instant t depends solely on the input up to the same instant and is independent of what happens with the input after the instant t. The fact that the impulse response vanishes after a certain T > 0 means that the memory of the corresponding system is at most T: the output at a time instant t depends only on the input starting with the time instant t − T.


the best possible uniform approximation on [0, 1] of the desired impulse response h_*, we can pose our synthesis problem as (2.2.4) and further approximate it by (2.2.2). As we remember, the latter problem is equivalent to the LP program (2.2.3) and can therefore be solved by Linear Programming tools.

2.3 Duality in Linear Programming

The most important and interesting feature of Linear Programming as a mathematical entity (i.e., aside from computations and applications) is the wonderful LP duality theory we are about to discuss. The question we are interested in now is:

Given an LP program

c^T x → min | Ax − b ≥ 0,   (2.3.5)

find a systematic way to bound its optimal value from below.

Why this is an important question, and how the answer to it helps to deal with LP programs, will be seen in the sequel. For the time being, let us just believe that the question is worth an effort. A trivial answer to the posed question is: solve (2.3.5) and look at the optimal value. There is, however, a smarter and much more instructive way to answer our question. Just to get an idea of this smart way, let us look at the following example:

x_1 + x_2 + ... + x_{2015} → min |
x_1 + 2x_2 + ... + 2014 x_{2014} + 2015 x_{2015} − 1 ≥ 0,
2015 x_1 + 2014 x_2 + ... + 2 x_{2014} + x_{2015} − 100 ≥ 0,
... ... ...

We claim that the optimal value of the problem is ≥ 101/2016. If one asks how we obtained this bound, the answer is very simple: add the first two constraints to get the inequality

2016 (x_1 + x_2 + ... + x_{2015}) − 101 ≥ 0,

and divide the result by 2016. The LP duality is nothing but a straightforward generalization of this simple trick.
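The bound is easy to check numerically (assuming, as the displayed terms suggest, that the j-th coefficients of the two constraints are j and 2016 − j; the constraints elided by the dots are ignored here, which can only decrease the optimal value, so the bound holds a fortiori):

```python
import numpy as np
from scipy.optimize import linprog

n = 2015
j = np.arange(1, n + 1)
c = np.ones(n)                              # sum of the x_j -> min
A = np.array([j, 2016 - j], dtype=float)    # the two displayed constraints
b = np.array([1.0, 100.0])

# Ax >= b passed as (-A) x <= -b; variables are free.
res = linprog(c, A_ub=-A, b_ub=-b,
              bounds=[(None, None)] * n, method="highs")
assert abs(res.fun - 101 / 2016) < 1e-6   # the bound is attained here
```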

2.3.1 Certificates for solvability and insolvability

Consider a (finite) system of scalar inequalities with n unknowns. To be as general as possible, for the time being we do not assume the inequalities to be linear, and we allow for both non-strict and strict inequalities in the system, as well as for equations. Since an equation can be represented by a pair of non-strict inequalities, our system can always be written down as

f_i(x) Ω_i 0,  i = 1, ..., m,   (2.3.6)


where Ω_i, for every i, is either the relation “>” or the relation “≥”. The basic question about (2.3.6) is

(?) Whether (2.3.6) has a solution or not.

Knowing how to answer question (?), we are able to answer many other questions.

E.g., to verify that a given real a is a lower bound on the optimal value of an LP program (LP) is the same as to verify that the system

c^T x < a,  Ax − b ≥ 0

or, what is the same, the system

−c^T x + a > 0,  Ax − b ≥ 0

has no solutions. Let us consider a seemingly simpler question:

(??) How to certify that (2.3.6) has, or does not have, a solution.

Imagine, e.g., that you are very smart and know the correct answer to (?); how could you convince somebody that your answer is correct? What could be an “evident for everybody” certificate of the validity of your answer? If your claim is that (2.3.6) is solvable, a certificate could be very simple: it suffices to point out a solution x_* to (2.3.6). Given this certificate, one can substitute x_*

into the system and check whether x_* indeed is a solution.

Now assume that your claim is that (2.3.6) has no solutions. What could be a “simple certificate” of this claim? How could one certify a negative statement? This is a highly nontrivial problem which goes far beyond the bounds of mathematics. Fortunately, in some cases there exist “simple certificates” of negative statements. E.g., in order to certify that (2.3.6) has no solutions, it suffices to demonstrate that one can obtain, as a consequence of the system (2.3.6), the contradictory inequality

−1 ≥ 0.

For example, assume that λ_i, i = 1, ..., m, are nonnegative weights. Combining the inequalities from (2.3.6) with these weights, we come to the inequality

∑_{i=1}^m λ_i f_i(x)  Ω  0   (2.3.7)

where Ω is either “>” (this is the case when the weight of at least one strict inequality from (2.3.6) is positive), or “≥” (otherwise). The resulting inequality, due to its origin, is a consequence of the system (2.3.6) – it is for sure satisfied by every solution to (2.3.6). Thus, if (2.3.7) happens to be contradictory – has no solutions at all – we may be sure that (2.3.6) has no solutions; whenever this is the case, we may treat the corresponding vector λ as a “simple certificate” of the fact that (2.3.6) is infeasible. Let us look at what the outlined approach means when (2.3.6) is comprised of linear inequalities:

a_i^T x  Ω_i  b_i,  i = 1, ..., m,   where each Ω_i is “>” or “≥”.


Here the “combined inequality” also is linear:

( ∑_{i=1}^m λ_i a_i )^T x  Ω  ∑_{i=1}^m λ_i b_i

(Ω is “>” whenever λ_i > 0 for at least one i with Ω_i = “>”, and Ω is “≥” otherwise). Now, when can a linear inequality

d^T x  Ω  e

be contradictory? Of course, this can happen only in the case when the left hand side of the inequality is trivial – identically zero, i.e., only if d = 0. Whether in this latter case the inequality is contradictory depends on the relation Ω: in the case Ω = “>” the inequality is contradictory if and only if e ≥ 0, and in the case Ω = “≥” it is contradictory if and only if e > 0. We have established the following simple

Proposition 2.3.1 Consider a system of linear inequalities (2.3.6) with an n-dimensional vector of unknowns x, where for every i the relation Ω_i is either “>” or “≥”. To simplify notation, assume that Ω_i is “>” for i = 1, ..., m_s and Ω_i is “≥” for i = m_s + 1, ..., m. Let us associate with (2.3.6) two systems of linear inequalities and equations with an m-dimensional vector of unknowns λ:

T_I :
(a) λ ≥ 0;
(b) ∑_{i=1}^m λ_i a_i = 0;
(c_I) ∑_{i=1}^m λ_i b_i ≥ 0;
(d_I) ∑_{i=1}^{m_s} λ_i > 0.

T_II :
(a) λ ≥ 0;
(b) ∑_{i=1}^m λ_i a_i = 0;
(c_II) ∑_{i=1}^m λ_i b_i > 0.

Assume that at least one of the systems T_I, T_II is solvable. Then the system (2.3.6) is infeasible.
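For a system with no strict inequalities (m_s = 0) only T_II is relevant, and a certificate λ can itself be found by an auxiliary LP: maximize ∑ λ_i b_i over λ ≥ 0 with ∑ λ_i a_i = 0, normalized by ∑ λ_i ≤ 1 to keep the problem bounded. A sketch with made-up data (the normalization trick is ours):

```python
import numpy as np
from scipy.optimize import linprog

# An infeasible system in one unknown: x >= 1 and -x >= 0 (i.e. x <= 0).
A = np.array([[1.0],
              [-1.0]])     # rows are the a_i^T
b = np.array([1.0, 0.0])
m, n = A.shape

# Maximize b^T lam s.t. A^T lam = 0, lam >= 0, sum(lam) <= 1.
res = linprog(-b, A_ub=np.ones((1, m)), b_ub=[1.0],
              A_eq=A.T, b_eq=np.zeros(n),
              bounds=[(0, None)] * m, method="highs")
lam = res.x
# A strictly positive value of b^T lam means lam solves T_II and
# certifies infeasibility of the original system.
assert b @ lam > 1e-9
```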

Proposition 2.3.1 says that in some cases it is easy to certify infeasibility of a linear system of inequalities: a “simple certificate” is a solution to another system of linear inequalities. Note, however, that the existence of a certificate of this latter type is, so far, only a sufficient, not necessary, condition for the infeasibility of (2.3.6). A fundamental result in the theory of linear inequalities is that the sufficient condition in question is in fact also necessary:

Theorem 2.3.1 [General Theorem on Alternative] In the notation of Proposition 2.3.1, system (2.3.6) has no solution if and only if either T_I, or T_II, or both of these systems, is/are solvable.

The proof of the Theorem on Alternative, as well as a number of useful particular cases of it, forms one of the topics of the exercises to this lecture. It makes sense to formulate explicitly the two most useful principles following from the theorem:


A. A system of linear inequalities

a_i^T x  Ω_i  b_i,  i = 1, ..., m

is infeasible if and only if one can combine the inequalities of the system in a linear fashion (i.e., multiply the inequalities by nonnegative weights and add the results) to get a contradictory inequality, namely, either the inequality 0^T x ≥ 1 or the inequality 0^T x > 0.

B. A linear inequality

a_0^T x  Ω_0  b_0

is a consequence of a solvable system of linear inequalities

a_i^T x  Ω_i  b_i,  i = 1, ..., m,

if and only if it can be obtained by combining, in a linear fashion, the inequalities of the system and the trivial inequality 0 > −1.

It should be stressed that the above principles are very nontrivial and deep. Consider, e.g., the following system of 4 linear inequalities in two variables u and v:

−1 ≤ u ≤ 1

−1 ≤ v ≤ 1.

From these inequalities it follows that

u^2 + v^2 ≤ 2,

which in turn implies, by the Cauchy inequality, the linear inequality u + v ≤ 2:

u + v = 1 × u + 1 × v ≤ √(1^2 + 1^2) · √(u^2 + v^2) ≤ (√2)^2 = 2.

The concluding inequality is linear and is a consequence of the original system, but both steps of the demonstration of this fact are “highly nonlinear”. It is absolutely unclear a priori why the same consequence can, as stated by Principle A, be derived from the system in a linear manner as well [of course it can – it suffices just to add the two inequalities u ≤ 1 and v ≤ 1]. Note that the Theorem on Alternative and its corollaries A and B heavily exploit the fact that we are speaking about linear inequalities. E.g., consider the following 2 quadratic and 2 linear inequalities in two variables:

(a) u^2 ≥ 1;

(b) v^2 ≥ 1;

(c) u ≥ 0;

(d) v ≥ 0;

along with the quadratic inequality

(e) uv ≥ 1.


The inequality (e) clearly is a consequence of (a) – (d). However, if we try to extend the system of inequalities (a) – (d) by all “trivial” (identically true) linear and quadratic inequalities in 2 variables, like 0 > −1, u^2 + v^2 ≥ 0, u^2 + 2uv + v^2 ≥ 0, u^2 − uv + v^2 ≥ 0, etc., and ask whether (e) can be derived in a linear fashion from the inequalities of the extended system, the answer will be negative. Thus, Principle A fails to be true already for quadratic inequalities (which is a great sorrow – otherwise there would be no difficult problems at all!).

We are about to use the Theorem on Alternative to obtain the basic results of the LP Duality Theory.

2.3.2 Dual to an LP program: the origin

As was already mentioned, the motivation for constructing the problem dual to the LP program

c^T x → min | Ax − b ≥ 0,   A = [a_1^T; a_2^T; ...; a_m^T] ∈ R^{m×n}   (2.3.8)

is the desire to get a systematic way to generate lower bounds on the optimal value in (2.3.8). Now, a real a is a lower bound on the optimal value if and only if c^T x ≥ a whenever Ax ≥ b, or, which is the same, if and only if the system of linear inequalities

−c^T x > −a,  Ax ≥ b,   (2.3.9)

has no solutions. And we already know that the latter fact means that some other system of linear inequalities (more exactly, at least one of a certain pair of systems) does have a solution. Namely, in view of the Theorem on Alternative,

(*) (2.3.9) has no solutions if and only if at least one of the following two systems with m + 1 unknowns:

T_I :
(a) λ = (λ_0, λ_1, ..., λ_m) ≥ 0;
(b) −λ_0 c + ∑_{i=1}^m λ_i a_i = 0;
(c_I) −λ_0 a + ∑_{i=1}^m λ_i b_i ≥ 0;
(d_I) λ_0 > 0;

or

T_II :
(a) λ = (λ_0, λ_1, ..., λ_m) ≥ 0;
(b) −λ_0 c + ∑_{i=1}^m λ_i a_i = 0;
(c_II) −λ_0 a + ∑_{i=1}^m λ_i b_i > 0

has a solution.


Now assume that (2.3.8) is feasible. Our claim is that, under this assumption, (2.3.9) has no solutions if and only if T_I has a solution.

Indeed, the implication “T_I has a solution ⇒ (2.3.9) has no solution” is readily given by the above remarks. All we should verify is the inverse implication. Thus, assume that (2.3.9) has no solutions and that the system Ax ≥ b has a solution, and let us prove that then T_I has a solution. By (*), at least one of the systems T_I, T_II has a solution; assuming that the solvable system is not T_I, we conclude that T_II is solvable, and λ_0 = 0 for (every) solution to T_II (since a solution to the latter system with λ_0 > 0 solves T_I as well). But the fact that T_II has a solution λ with λ_0 = 0 is independent of the values of a and c. If that were true, it would mean, by the Theorem on Alternative, that, e.g., the following modified version of (2.3.9):

0^T x ≥ −1,  Ax ≥ b

has no solutions. In other words, a solution to T_II with λ_0 = 0 would certify that already the system Ax ≥ b has no solutions, which is not the case by our assumption.

Now, if T_I has a solution, then it has a solution with λ_0 = 1 as well (to see this, pass from a solution λ to λ/λ_0; this construction is well-defined, since λ_0 > 0 for every solution to T_I). Now, an (m + 1)-dimensional vector λ = (1, y) is a solution to T_I if and only if the m-dimensional vector y solves the system of linear inequalities and equations

y ≥ 0;   A^T y ≡ ∑_{i=1}^m y_i a_i = c;   b^T y ≥ a.   (2.3.10)

Summarizing our observations, we come to the following result.

Proposition 2.3.2 Assume that the system (2.3.10) associated with the LP program (2.3.8) has a solution (y, a). Then a is a lower bound on the optimal value in (2.3.8). Vice versa, if a is a lower bound on the optimal value of the feasible LP program (2.3.8), then a can be extended, by a properly chosen m-dimensional vector y, to a solution to (2.3.10).

We see that the entity responsible for lower bounds on the optimal value of (2.3.8) is the system (2.3.10): every solution to the latter system induces a bound of this type, and in the case when (2.3.8) is feasible, all lower bounds can be obtained from solutions to (2.3.10). Now note that if (y, a) is a solution to (2.3.10), then the pair (y, b^T y) also is a solution to the same system, and the lower bound on c_* given by the latter solution – i.e., b^T y – is not worse than the lower bound a yielded by the former solution. Thus, as far as lower bounds on c_* are concerned, we lose nothing by restricting ourselves to the solutions (y, a) of (2.3.10) with a = b^T y. The best lower bound on c_* given by (2.3.10) is therefore the optimal value in the problem

b^T y → max | A^T y = c, y ≥ 0.   (2.3.11)

The problem we end up with is called the problem dual to the primal problem (2.3.8). Note that this problem also is a Linear Programming program. All we know about the dual problem so far is the following:


Proposition 2.3.3 Whenever y is a feasible solution to (2.3.11), the corresponding value of the dual objective b^T y is a lower bound on the optimal value c_* in (2.3.8). If (2.3.8) is feasible, then for every lower bound a on the optimal value of (2.3.8) there exists a feasible solution y to (2.3.11) with b^T y ≥ a (i.e., a feasible solution y which yields, via the corresponding value of the dual objective b^T y, a lower bound not worse than a).
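A tiny numerical illustration of Proposition 2.3.3 (data made up): solving the primal (2.3.8) and its dual (2.3.11) side by side, the dual optimal value never exceeds the primal one – and, as the Duality Theorem below asserts, the two in fact coincide:

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 1.0])
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([4.0, 6.0])

# Primal (2.3.8): c^T x -> min, Ax - b >= 0, x free.
p = linprog(c, A_ub=-A, b_ub=-b, bounds=[(None, None)] * 2, method="highs")
# Dual (2.3.11): b^T y -> max, A^T y = c, y >= 0.
d = linprog(-b, A_eq=A.T, b_eq=c, bounds=[(0, None)] * 2, method="highs")

assert p.success and d.success
assert -d.fun <= p.fun + 1e-9      # b^T y <= c^T x (Proposition 2.3.3)
assert abs(-d.fun - p.fun) < 1e-7  # equality of the optimal values
```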

2.3.3 LP Duality Theorem

Proposition 2.3.3 is in fact equivalent to the following

Theorem 2.3.2 (Duality Theorem in Linear Programming) Consider a linear program

c^T x → min | Ax ≥ b,   (2.3.12)

along with its dual

b^T y → max | A^T y = c; y ≥ 0.   (2.3.13)

Then

1. the duality is symmetric: the problem dual to dual is equivalent to the primal;

2. the value of the dual objective at every dual feasible solution is ≤ the value of the primal objective at every primal feasible solution;

3. The following 5 properties are equivalent to each other:

(i) The primal is feasible and below bounded.
(ii) The dual is feasible and above bounded.
(iii) The primal is solvable.
(iv) The dual is solvable.
(v) Both primal and dual are feasible.

Whenever (i) ≡ (ii) ≡ (iii) ≡ (iv) ≡ (v) is the case, the optimal values in the primal and the dual problems are equal to each other.

Proof. 1) is quite straightforward: writing the dual problem (2.3.13) in our standard form, we get

−b^T y → min | ( I_m ; A^T ; −A^T ) y − ( 0 ; c ; −c ) ≥ 0,

where the parentheses denote vertical stacking of the blocks,


I_m being the m-dimensional unit matrix. Applying the duality transformation to the latter problem, we come to the problem

0^T ξ + c^T η + (−c)^T ζ → max | ξ ≥ 0, η ≥ 0, ζ ≥ 0, ξ + Aη − Aζ = −b,

which is clearly equivalent to (2.3.12) (set x = ζ − η).
2) is readily given by Proposition 2.3.3.
3):
(i) ⇒ (iv): if the primal is feasible and below bounded, its optimal value c_* (which of course is a lower bound on itself) can, by Proposition 2.3.3, be (non-strictly) majorized by a lower bound on c_* of the type b^T y_* given by a feasible solution y_* to (2.3.13). In the situation in question, of course, b^T y_* = c_*. On the other hand, in view of the same Proposition 2.3.3, the optimal value in the dual is ≤ c_*. We conclude that the optimal value in the dual is attained and is equal to the optimal value in the primal.

(iv) ⇒ (ii): evident;
(ii) ⇒ (iii): this implication, in view of the primal-dual symmetry (see 1)), follows from the already shown implication (i) ⇒ (iv);
(iii) ⇒ (i): evident.
We have seen that (i) ≡ (ii) ≡ (iii) ≡ (iv) and that the first (and consequently each) of these 4 equivalent properties implies that the optimal value in the primal problem is equal to the optimal value in the dual one. It remains to prove the equivalence between (i)–(iv), on one hand, and (v), on the other hand. This is immediate: (i)–(iv), of course, imply (v); vice versa, in the case of (v) the primal is not only feasible but also below bounded (this is an immediate consequence of the feasibility of the dual problem, see 2)), and (i) follows.

An immediate corollary of the LP Duality Theorem is the following necessary and sufficient optimality condition in LP:

Theorem 2.3.3 (NS optimality conditions in Linear Programming) Consider an LP program (2.3.12) along with its dual (2.3.13), and let (x, y) be a pair of primal and dual feasible solutions. The pair is comprised of optimal solutions to the respective problems if and only if

y_i [Ax − b]_i = 0,  i = 1, ..., m,   [complementary slackness]

as well as if and only if

c^T x − b^T y = 0.   [zero duality gap]

Indeed, the “zero duality gap” optimality condition is an immediate consequence of the fact that the value of the primal objective at every primal feasible solution is ≥ the value of the dual objective at every dual feasible solution, while the optimal values in the primal and the dual are equal to each other; see Theorem 2.3.2. The equivalence between


the “zero duality gap” and the “complementary slackness” optimality conditions is given by the following computation: whenever x is primal feasible and y is dual feasible, the products y_i [Ax − b]_i, i = 1, ..., m, are nonnegative, while the sum of these products is nothing but the duality gap:

y^T [Ax − b] = (A^T y)^T x − b^T y = c^T x − b^T y.

Thus, the duality gap can vanish at a primal-dual feasible pair (x, y) if and only if all the products y_i [Ax − b]_i for this pair are zero.
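Both optimality conditions of Theorem 2.3.3 are easy to observe numerically; a sketch on made-up data:

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([2.0, 3.0])
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 0.0, 2.0])

# Optimal primal and dual solutions of (2.3.12)/(2.3.13).
x = linprog(c, A_ub=-A, b_ub=-b,
            bounds=[(None, None)] * 2, method="highs").x
y = linprog(-b, A_eq=A.T, b_eq=c,
            bounds=[(0, None)] * 3, method="highs").x

slack = A @ x - b
assert np.all(np.abs(y * slack) < 1e-7)  # complementary slackness
assert abs(c @ x - b @ y) < 1e-7         # zero duality gap
```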

2.3.4 Illustration: the problem dual to the Tchebychev approximation problem

Let us look at the program dual to the (LP form of the) Tchebychev approximation problem. Our primal LP program is

t → min | t − [b_i − a_i^T x] ≥ 0,  t − [−b_i + a_i^T x] ≥ 0,  i = 1, ..., M   (2.3.14)

Consequently, the dual problem is the LP program

∑_{i=1}^M b_i [η_i − ζ_i] → max |  η_i, ζ_i ≥ 0, i = 1, ..., M;  ∑_{i=1}^M [η_i + ζ_i] = 1;  ∑_{i=1}^M [η_i − ζ_i] a_i = 0.

In order to simplify the dual problem, let us pass from the variables η_i, ζ_i to the variables p_i = η_i + ζ_i, q_i = η_i − ζ_i. With respect to the new variables the problem becomes

∑_{i=1}^M b_i q_i → max |  p_i ± q_i ≥ 0, i = 1, ..., M;  ∑_{i=1}^M p_i = 1;  ∑_{i=1}^M a_i q_i = 0.

In the resulting problem one can easily eliminate the p-variables, thus coming to the problem

∑_{i=1}^M b_i q_i → max |  ∑_{i=1}^M a_i q_i = 0;  ∑_{i=1}^M |q_i| ≤ 1.   (2.3.15)

The primal-dual pair (2.3.14) – (2.3.15) admits a nice geometric interpretation. Geometrically, the primal problem (2.3.14) is:

Given a vector b ∈ R^M and the linear subspace L in R^M spanned by N given vectors a_1, ..., a_N (the columns of A), find the element of L closest to b in the norm

‖z‖_∞ = max_{i=1,...,M} |z_i|

on R^M.

The dual problem (2.3.15) is


Given the same data as in (2.3.14), find a linear functional z → q^T z on R^M of ‖·‖_1-norm

‖q‖_1 = ∑_{i=1}^M |q_i|

not exceeding 1 which best of all separates the point b from the linear subspace L, i.e., which is identically 0 on L and is as large as possible at b.

The Duality Theorem says, in particular, that the optimal values in (2.3.14) and in (2.3.15) are equal to each other; in other words,

the ‖·‖_∞-distance from a point b ∈ R^M to a linear subspace L ⊂ R^M is always equal to the maximum quantity by which b can be separated from L by a linear functional of ‖·‖_1-norm 1.
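This identity is easy to test numerically by solving the pair (2.3.14) – (2.3.15) on random data; in the dual we split q = q⁺ − q⁻ with q⁺, q⁻ ≥ 0 to turn ‖q‖_1 ≤ 1 into linear constraints (the splitting trick and the data are ours):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
M, N = 12, 4
A = rng.standard_normal((M, N))      # its columns span the subspace L
b = rng.standard_normal(M)

# Primal (2.3.14): min t s.t. -t <= a_i^T x - b_i <= t, variables (x, t).
c = np.r_[np.zeros(N), 1.0]
ones = np.ones((M, 1))
dist = linprog(c, A_ub=np.r_[np.c_[A, -ones], np.c_[-A, -ones]],
               b_ub=np.r_[b, -b],
               bounds=[(None, None)] * (N + 1), method="highs").fun

# Dual (2.3.15): max b^T q s.t. A^T q = 0, ||q||_1 <= 1, with q = qp - qm.
sep = -linprog(np.r_[-b, b],
               A_ub=np.ones((1, 2 * M)), b_ub=[1.0],
               A_eq=np.c_[A.T, -A.T], b_eq=np.zeros(N),
               bounds=[(0, None)] * (2 * M), method="highs").fun

assert abs(dist - sep) < 1e-7   # distance equals the best separation
```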

This is the simplest case of a very general and useful statement (a version of the Hahn-Banach Theorem):

The distance from a point b in a normed linear space (E, ‖·‖) to a linear subspace L ⊂ E is equal to the supremum of the quantities by which b can be separated from L by a linear functional of norm 1 in the norm conjugate to ‖·‖.

2.4 What can be expressed as an LP problem?

Despite their simple appearance, LP problems represent an important class of optimization problems not only from the theoretical but also from the practical point of view. In fact, numerous problems in signal processing, network optimization or graph theory may be routinely expressed as Linear Programming problems. However, these problems are not normally in their “catalogue” forms (2.3.12) or (2.3.13), and thus an important skill required from those interested in applications of optimization is the ability to recognize the fundamental structure underneath the original formulation. The latter is frequently of the form

min_x { f(x) | x ∈ X }

where f is a “loss function”, and the set X of admissible design vectors is typically given as

X = ∩_{i=1}^m X_i,

where every X_i is a set of vectors admissible for a particular design restriction, which in many cases is given by

X_i = {x ∈ R^n | g_i(x) ≤ 0},

where g_i is the i-th constraint function.


It is well known that the objective f can always be assumed linear: otherwise we could move the original objective to the list of constraints, passing to the equivalent problem

min_{t,x} { t | (x, t) ∈ X^+ },  where X^+ = {(x, t) | x ∈ X, t ≥ f(x)}.

In other words, we may assume that the original problem is of the form

min_x { c^T x | x ∈ X = ∩_{i=1}^m X_i }   (2.4.16)

In order to recognize that X is in our “catalogue form” (2.3.12), we may act as follows:

look for a finite system S of linear inequalities A (x; u) − b ≥ 0 in the variables x ∈ R^n and additional variables u (here (x; u) stands for the concatenation of x and u), such that X is the projection of the solution set of S onto the x-space. In other words, x ∈ X if and only if one can extend x to a solution (x, u) of the system S:

x ∈ X ⇔ ∃u : A (x; u) − b ≥ 0.

Every such system S is called an LP representation (LPr) of the set X.

Note that if we are able to exhibit such a system S then, indeed, the optimization problem (2.4.16) is an LP program:

min_{x,u} { c^T x | (x, u) satisfy S }.

Observe that this is exactly what we have done when reformulating the problem of Tchebychev approximation as the Linear Programming program (2.2.3).

Let us consider one less evident example of such reformulation.

Example 2.4.1 For x ∈ R^n, let x_{(1)}, ..., x_{(n)} be the entries of x sorted in decreasing order of absolute values. We denote

‖x‖_{k,p} = ( ∑_{i=1}^k |x_{(i)}|^p )^{1/p},

with, by definition, ‖x‖_{k,∞} = ‖x‖_∞ = |x_{(1)}|.

Let us show that the set S = {(x, t) ∈ R^n × R | ‖x‖_{k,1} ≤ t} is LPr; specifically, it is the projection onto the (x, t)-variables of the solution set of the following system of linear inequalities:

set of the following system of linear inequalities:

(a) t− ks−∑ni=1 zi ≥ 0,

(b) z ≥ 0,(c) z − x+ s ≥ 0,(d) z − x+ s ≥ 0,

(2.4.17)

where z ∈ Rn and s ∈ R are additional variables.


We should prove that

(i) if a given pair (x, t) can be extended, by properly chosen (s, z), to a solution of the system (2.4.17), then ‖x‖_{k,1} ≤ t;

(ii) vice versa, if ‖x‖_{k,1} ≤ t, then the pair (x, t) can be extended, by properly chosen (s, z), to a solution of (2.4.17).

Let us prove (i). Assuming that (x, t, s, z) is a solution to (2.4.17), we get, due to (2.4.17.c,d), |x| ≤ z + s1, where 1 = (1, ..., 1)^T and |x| = (|x_1|, ..., |x_n|)^T. Thus

Σ_{i=1}^{k} |x(i)| ≤ Σ_{i=1}^{k} |z(i)| + sk ≤ Σ_{i=1}^{n} |z_i| + sk

(the second ≤ is due to the nonnegativity of z, see (2.4.17.b)). The latter inequality, in view of (2.4.17.a), implies ‖x‖_{k,1} ≤ t, and (i) is proved.

To prove (ii), assume that we are given x, t with ‖x‖_{k,1} ≤ t, and let us set s = |x(k)|. Then the k largest entries of the vector |x| − s1 are nonnegative, and the remaining ones are nonpositive. Let z be the vector whose entries at the positions of the k largest entries of |x| are |x(1)| − s, ..., |x(k)| − s, all other entries being zero. We clearly have z ≥ 0 and z − |x| + s1 ≥ 0, so that the vector z and the real s we have built satisfy (2.4.17.b,c,d). In order to see that (2.4.17.a) is satisfied as well, note that by construction Σ_i z_i = ‖x‖_{k,1} − sk, whence

t − sk − Σ_i z_i = t − ‖x‖_{k,1} ≥ 0.
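To make the example concrete, here is a small numerical check (my own sketch, not from the text; scipy assumed): for a fixed x, minimizing t over (t, s, z) subject to (2.4.17) should return exactly the sum of the k largest |x_i|.

```python
# Numeric check that system (2.4.17) represents ||x||_{k,1}: for a fixed x,
# the minimal t compatible with (2.4.17) equals the sum of the k largest |x_i|.
# scipy is an assumed dependency.
import numpy as np
from scipy.optimize import linprog

def k_norm_via_lp(x, k):
    n = len(x)
    # variable vector v = (t, s, z_1, ..., z_n)
    c = np.zeros(n + 2)
    c[0] = 1.0                                         # minimize t
    rows, rhs = [], []
    # (a)  t - k s - sum(z) >= 0   ->   -t + k s + sum(z) <= 0
    rows.append(np.concatenate(([-1.0, k], np.ones(n)))); rhs.append(0.0)
    for i in range(n):
        e = np.zeros(n + 2)
        e[1] = -1.0                                    # coefficient of s
        e[2 + i] = -1.0                                # coefficient of z_i
        rows.append(e.copy()); rhs.append(-x[i])       # (c)  z_i + s >= x_i
        rows.append(e);        rhs.append(x[i])        # (d)  z_i + s >= -x_i
    bounds = [(None, None), (None, None)] + [(0, None)] * n   # (b)  z >= 0
    res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs), bounds=bounds)
    return res.fun

x = np.array([3.0, -1.0, 4.0, 1.5, -5.0])
k = 3
direct = np.sort(np.abs(x))[::-1][:k].sum()            # |x_(1)|+|x_(2)|+|x_(3)|
lp_val = k_norm_via_lp(x, k)
```

Here `direct` is 5 + 4 + 3 = 12, and the LP value agrees with it, in accordance with (i) and (ii) above.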


2.5 Exercises: Linear Programming

2.5.1 Around the Theorem on Alternative

The goal of the subsequent exercises is to prove the General Theorem on Alternative.

From Homogeneous Farkas Lemma to Theorem on Alternative

Consider the very particular case of the Theorem on Alternative: the one where we ask when a specific system

a^T x < 0,
a_i^T x ≥ 0, i = 1, ..., m    (2.5.18)

of homogeneous linear inequalities in R^n has no solutions. The answer is given by the following result, which was the subject of Exercise 1.4.3.

Lemma 2.5.1 (Homogeneous Farkas Lemma) System (2.5.18) has no solutions if and only if the vector a is a linear combination with nonnegative coefficients of the vectors a_1, ..., a_m:

(2.5.18) is infeasible ⇔ ∃λ ≥ 0 : a = Σ_{i=1}^{m} λ_i a_i.

Exercise 2.5.1 Prove that Lemma 2.5.1 is exactly what is said by the Theorem on Alternative as applied to the particular system (2.5.18).
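Before deriving the general statement, it may help to see the lemma numerically. The sketch below (illustrative data of mine; scipy assumed) checks both sides: a nonnegative-combination certificate recovered by nonnegative least squares, and boundedness of the homogeneous LP min { a^T x : a_i^T x ≥ 0 }, which expresses exactly the infeasibility of (2.5.18).

```python
# Homogeneous Farkas Lemma, numerically:  a = sum_i lambda_i a_i, lambda >= 0,
# iff the system  a^T x < 0, a_i^T x >= 0  has no solution.  Infeasibility is
# tested by minimizing a^T x over {x : Ax >= 0}: since x = 0 is feasible, the
# system is infeasible exactly when this LP is bounded, with optimal value 0.
# scipy is an assumed dependency; the data are illustrative.
import numpy as np
from scipy.optimize import linprog, nnls

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0]])             # rows are a_1, a_2, a_3
lam_true = np.array([2.0, 0.0, 1.0])
a = A.T @ lam_true                           # a lies in the cone of the a_i

# Certificate side: recover some lambda >= 0 with A^T lambda = a.
lam, residual = nnls(A.T, a)

# Alternative side: min a^T x  s.t.  Ax >= 0  (i.e.  -Ax <= 0).
res = linprog(a, A_ub=-A, b_ub=np.zeros(3), bounds=[(None, None)] * 3)
```

Since `a` was built as a nonnegative combination of the rows of `A`, the recovered `lam` has zero residual and the LP is bounded with optimal value 0, i.e., (2.5.18) is infeasible, in agreement with the lemma.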

We will now demonstrate that the General Theorem on Alternative can be easily obtained from the Homogeneous Farkas Lemma.

Exercise 2.5.2 Consider the same system of linear inequalities as in the Theorem on Alternative:

(S):
  a_i^T x > b_i, i = 1, ..., m_s;
  a_i^T x ≥ b_i, i = m_s + 1, ..., m.

Prove that this system has no solutions if and only if this is the case for the following homogeneous system of the type (2.5.18):

(S∗):
  −s < 0;
  t − s ≥ 0;
  a_i^T x − b_i t − s ≥ 0, i = 1, ..., m_s;
  a_i^T x − b_i t ≥ 0, i = m_s + 1, ..., m,

the unknowns in (S∗) being x and two additional real variables s and t.

Derive from the above observation and the Homogeneous Farkas Lemma the General Theorem on Alternative.

The next exercise presents several useful consequences of the General Theorem on Alternative.


Exercise 2.5.3 Prove the following statements:

1. [Gordan's Theorem on Alternative] One of the inequality systems

(I) Ax < 0, x ∈ R^n,

(II) A^T y = 0, 0 ≠ y ≥ 0, y ∈ R^m,

A being an m × n matrix, has a solution if and only if the other one has no solutions.

2. [Inhomogeneous Farkas Lemma] A linear inequality

a^T x ≤ p    (2.5.19)

is a consequence of a solvable system of inequalities

Ax ≤ b

if and only if

a = A^T ν

for some nonnegative vector ν such that

ν^T b ≤ p.

3. [Motzkin's Theorem on Alternative] The system

Sx < 0, Nx ≤ 0

has no solutions if and only if the system

S^T σ + N^T ν = 0, σ ≥ 0, ν ≥ 0, σ ≠ 0

has a solution.
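A small numerical companion to item 2 (illustrative data of mine; scipy assumed): given a certificate ν ≥ 0 with a = A^T ν and ν^T b ≤ p, the inequality a^T x ≤ p must hold on the entire solution set of Ax ≤ b, which we confirm by maximizing a^T x over that set.

```python
# Inhomogeneous Farkas Lemma, numerically: a certificate nu >= 0 with
# a = A^T nu and nu^T b <= p guarantees a^T x <= p whenever Ax <= b.
# scipy is an assumed dependency; the data are illustrative.
import numpy as np
from scipy.optimize import linprog

# Ax <= b cuts out the box [0,1]^2:  x <= 1 and -x <= 0 componentwise.
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([1.0, 1.0, 0.0, 0.0])

nu = np.array([2.0, 3.0, 0.0, 0.0])          # nonnegative multipliers
a = A.T @ nu                                  # a = (2, 3)
p = nu @ b                                    # p = 5

# Maximize a^T x subject to Ax <= b (linprog minimizes, so flip the sign).
res = linprog(-a, A_ub=A, b_ub=b, bounds=[(None, None)] * 2)
max_val = -res.fun
```

Here the maximum of a^T x over the box is 2 + 3 = 5, attained at x = (1, 1), so the consequence inequality a^T x ≤ p holds with equality; taking a larger p would make it a strict consequence.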

2.5.2 Around uniform approximation

(Exercises of this section are not obligatory!)

As we have already indicated, the Tchebychev approximation problem normally arises as a "discrete version" of the best uniform approximation problem on a segment:

Given a segment ∆ = [a, b], N basic functions f_1, ..., f_N on the segment and a target function f_0 on it, find the best, in the uniform norm on the segment, approximation of f_0 by a linear combination of the f_j:

‖f_0 − Σ_{j=1}^{N} x_j f_j‖_∞ → min.    (2.5.20)


The discrete version of the latter problem is obtained by replacing ∆ with a finite set T ⊂ ∆:

‖f_0 − Σ_{j=1}^{N} x_j f_j‖_{T,∞} = sup_{t∈T} |f_0(t) − Σ_{j=1}^{N} x_j f_j(t)| → min.    (2.5.21)

Whenever this indeed is the origin of the Tchebychev approximation problem, the following two questions are of primary interest:

A. What is the "quality of approximation" of (2.5.20) by (2.5.21)? Specifically, may we write down an inequality

‖f_0 − Σ_{j=1}^{N} x_j f_j‖_∞ ≤ κ ‖f_0 − Σ_{j=1}^{N} x_j f_j‖_{T,∞}    (2.5.22)

with a given κ? If this is the case, then κ can be seen as a natural measure of the quality of the approximation of the original problem by its discrete version: the closer κ is to 1, the better the quality.

B. Given the total number M of points in the finite set T, how should we choose these points to get the best possible quality of approximation?

The goal of the subsequent series of problems is to provide some information on these two questions. The answers will be given in terms of properties of functions from the linear space L spanned by f_0, f_1, ..., f_N:

L = { f = Σ_{j=0}^{N} ξ_j f_j | ξ ∈ R^{N+1} }.

Given a finite set T ⊂ ∆, let us say that T is L-dense if there exists κ < ∞ such that

‖f‖_∞ ≤ κ ‖f‖_{T,∞} ∀f ∈ L.

The minimum value of the κ's with the latter property will be denoted by κ_L(T). If T is not L-dense, we set κ_L(T) = ∞. Note that κ_L(T) majorizes the quality of approximating the problem (2.5.20) by (2.5.21), and this is the quantity we will focus on.

Exercise 2.5.4 Let L be a finite-dimensional space comprised of continuous functions on a segment ∆, and let T be a finite subset of ∆. Prove that T is L-dense if and only if the only function from L which vanishes on T is ≡ 0.

Exercise 2.5.5 Let α < ∞, and assume L is α-regular, i.e., the functions from L are continuously differentiable and

‖f′‖_∞ ≤ α ‖f‖_∞ ∀f ∈ L.

Assume that T ⊂ ∆ is such that the distance from any point of ∆ to the closest point of T does not exceed β < α^{−1}. Prove that under these assumptions

κ_L(T) ≤ 1 / (1 − αβ).


Solution: Let f ∈ L, M = ‖f‖_∞, and let a ∈ ∆ be a point where |f(a)| = M. There exists a point t ∈ T such that |t − a| ≤ β; since L is α-regular, we have |f(a) − f(t)| ≤ Mαβ, whence |f(t)| ≥ M(1 − αβ) and, consequently, ‖f‖_{T,∞} ≥ |f(t)| ≥ M(1 − αβ).

Exercise 2.5.6 Let L be a k-dimensional linear space comprised of continuously differentiable functions on a segment ∆. Prove that L is α-regular for some α; consequently, choosing a fine enough finite grid T ⊂ ∆, we can ensure a given quality of approximating (2.5.20) by (2.5.21).

To use the simple result stated in Exercise 2.5.5, we should know something about regular linear spaces L of functions. The most useful result of this type known to us is the following fundamental fact:

Theorem 2.5.1 (Bernstein's theorem on trigonometric polynomials) Let ∆ = [0, 2π], and let f be a trigonometric polynomial of degree k on ∆:

f(t) = a_0 + Σ_{l=1}^{k} [a_l cos(lt) + b_l sin(lt)]

with real or complex coefficients. Then

‖f′‖_∞ ≤ k ‖f‖_∞.

Note that the inequality stated in Bernstein's theorem is exact: for the trigonometric polynomial

f(t) = cos(kt)

of degree k the inequality becomes an equality. We see that the space of trigonometric polynomials of degree ≤ k on [0, 2π] is k-regular. What about the space of algebraic polynomials of degree ≤ k on a segment, say, [−1, 1]?
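Bernstein's inequality is easy to probe numerically; the snippet below (a sketch of mine, plain numpy) evaluates random degree-k trigonometric polynomials and their derivatives on a dense grid and compares max |f′| with k · max |f|.

```python
# Numeric sanity check of Bernstein's inequality: for trigonometric
# polynomials of degree k on [0, 2pi], max|f'| <= k * max|f|.
# Random coefficients; the maxima are approximated on a dense grid.
import numpy as np

rng = np.random.default_rng(1)
k = 7
t = np.linspace(0.0, 2.0 * np.pi, 4001)
a = rng.standard_normal(k + 1)               # a_0, ..., a_k
b = rng.standard_normal(k + 1)               # b_1, ..., b_k (b[0] unused)

f = a[0] * np.ones_like(t)
fp = np.zeros_like(t)
for l in range(1, k + 1):
    f += a[l] * np.cos(l * t) + b[l] * np.sin(l * t)
    fp += -l * a[l] * np.sin(l * t) + l * b[l] * np.cos(l * t)

ratio = np.max(np.abs(fp)) / np.max(np.abs(f))   # should not exceed k
```

For the extremal polynomial f(t) = cos(kt) the same computation gives a ratio of (nearly) exactly k, reflecting the sharpness of the bound.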

Consider the Tchebychev polynomial of degree k, defined on the segment ∆ = [−1, 1] by the relation

T_k(t) = cos(k arccos(t))

(check that this indeed is a polynomial in t of degree k). This polynomial possesses the following property:

‖T_k‖_∞ = 1, and there are k + 1 points of alternance of T_k, namely the points t_l = cos(π(k − l)/k) ∈ ∆, l = 0, 1, ..., k, at which the absolute value of the polynomial equals ‖T_k‖_∞ = 1 and the signs of the values alternate.

Note that the derivative of T_k at the point t = 1 is k²; thus, the factor α in the inequality

‖T_k′‖_∞ ≤ α ‖T_k‖_∞

is at least k². We conclude that the space L_k of real algebraic polynomials of degree ≤ k on the segment [−1, 1] is not α-regular for α < k². Is this space k²-regular? We guess that the answer is positive, but we were too lazy to find out whether it indeed is the case. What we will demonstrate is that L_k is 2k²-regular.
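The quoted properties of T_k are straightforward to verify numerically (a sketch of mine, using numpy's Chebyshev class):

```python
# Check the stated properties of the Tchebychev polynomial T_k on [-1, 1]:
# |T_k| = 1 at the k+1 alternance points t_l = cos(pi (k - l)/k) with
# alternating signs, and T_k'(1) = k^2.
import numpy as np
from numpy.polynomial import chebyshev as C

k = 6
Tk = C.Chebyshev.basis(k)                    # T_k as a numpy Chebyshev object
Tk_prime = Tk.deriv()

t_alt = np.cos(np.pi * (k - np.arange(k + 1)) / k)   # l = 0, ..., k
vals = Tk(t_alt)                             # values +-1 with alternating signs
deriv_at_1 = Tk_prime(1.0)                   # expected: k**2
```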


Exercise 2.5.7 Prove that if f ∈ L_k and ‖f‖_∞ = 1, then

|f′(1)| ≤ k² = T_k′(1).

Derive that |f′(t)| ≤ 2k² for all t ∈ [−1, 1], and conclude that L_k is 2k²-regular.

Hint: Assuming that f′(1) > T_k′(1), consider the polynomial

p(t) = T_k(t) − (T_k′(1)/f′(1)) f(t).

Verify that the values of this polynomial at the points of alternance of T_k are of the same signs as those of T_k, so that p has at least k distinct zeros on [−1, 1]. Taking into account the fact that p′(1) = 0, count the zeros of p′(t).

Now let us apply the information collected so far to investigating questions (A) and (B) in the simple cases where L is comprised of the trigonometric or algebraic polynomials, respectively.

Exercise 2.5.8 Assume that ∆ = [0, 2π], and let L be the linear space of functions on ∆ comprised of all trigonometric polynomials of degree ≤ k. Let also T be the equidistant M-point grid on ∆:

T = { (2l + 1)π/M }_{l=0}^{M−1}.

1) Prove that if M > kπ, then T is L-dense with

κ_L(T) ≤ M/(M − kπ).

2) Prove that the above inequality remains valid if we replace T with an arbitrary "shift of T modulo 2π", i.e., treat ∆ as the unit circumference and rotate T by an angle.

3) Prove that if T is an arbitrary M-point subset of ∆ with M ≤ k, then κ_L(T) = ∞.

Solution:

1) It suffices to apply the result of Exercise 2.5.5: in the case in question β = π/M, and by Bernstein's theorem α = k.

2) follows from 1) due to the fact that the space of trigonometric polynomials is invariant with respect to a "cyclic shift" of the argument by any angle.

3) Let T = {t_i}_{i=1}^{M}. The function

f(t) = Π_{i=1}^{M} sin(t − t_i)

is a trigonometric polynomial of degree M ≤ k; this function vanishes on T (i.e., ‖f‖_{T,∞} = 0), although its uniform norm on ∆ is positive.
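The bound of item 1) can be spot-checked numerically; in the sketch below (mine, plain numpy) no random trigonometric polynomial ever violates ‖f‖_∞ ≤ M/(M − kπ) ‖f‖_{T,∞}.

```python
# Empirical check of Exercise 2.5.8.1): for trigonometric polynomials of
# degree k and the M-point equidistant grid with M > k*pi, the ratio
# ||f||_inf / ||f||_{T,inf} never exceeds M/(M - k*pi).
import numpy as np

rng = np.random.default_rng(2)
k, M = 4, 20                                  # M = 20 > 4*pi ~ 12.57
bound = M / (M - k * np.pi)

T = (2 * np.arange(M) + 1) * np.pi / M        # the grid of the exercise
dense = np.linspace(0.0, 2.0 * np.pi, 8001)   # proxy for the sup over Delta

def trig_poly(ca, cb, t):
    # evaluate a_0 + sum_l [a_l cos(lt) + b_l sin(lt)]
    f = ca[0] * np.ones_like(t)
    for l in range(1, k + 1):
        f += ca[l] * np.cos(l * t) + cb[l] * np.sin(l * t)
    return f

worst = 0.0
for _ in range(50):
    ca, cb = rng.standard_normal(k + 1), rng.standard_normal(k + 1)
    ratio = np.max(np.abs(trig_poly(ca, cb, dense))) / \
            np.max(np.abs(trig_poly(ca, cb, T)))
    worst = max(worst, ratio)
```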


Exercise 2.5.9 Let ∆ = [−1, 1], and let L be the space of all algebraic polynomials of degree ≤ k.

1) Assume that 2M ≥ πk and T is the M-point set on ∆ comprised of the points

t_l = cos( (2l + 1)π/(2M) ), l = 0, ..., M − 1.

Then T is L-dense with κ_L(T) ≤ 2M/(2M − πk).

2) Let T be an M-point set on ∆ with M ≤ k. Then κ_L(T) = ∞.

Solution:

1) Let us pass from the functions f ∈ L to the functions f⁺(φ) = f(cos(φ)), φ ∈ [0, 2π]. Note that f⁺ is a trigonometric polynomial of degree ≤ k. Let

T⁺ = { φ_l = (2l + 1)π/(2M) }_{l=0}^{2M−1}.

According to the result of Exercise 2.5.8.1), for every f ∈ L we have

‖f‖_∞ = ‖f⁺‖_∞ ≤ (2M/(2M − πk)) max_{0≤l≤2M−1} |f⁺(φ_l)|
              = (2M/(2M − πk)) max_{0≤l≤2M−1} |f(cos(φ_l))|
              = (2M/(2M − πk)) max_{0≤l≤M−1} |f(t_l)|

(note that when φ runs through T⁺, the quantity cos(φ) runs through T).

2) Whenever the cardinality of T is ≤ k, L contains a nontrivial polynomial

f(t) = Π_{t′∈T} (t − t′)

which vanishes on T.

The result stated in Exercise 2.5.9 says that when L_k is comprised of all real algebraic polynomials of degree not exceeding k on [−1, 1] and we want to ensure κ_L(T) = O(1), it suffices to take M ≡ card(T) = O(k). Note, however, that the corresponding grid is not uniform. Whether we may achieve similar results with a regular grid is still a question. In fact, the answer is "no":
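Here is an analogous spot check of mine for the algebraic case (plain numpy): random polynomials of degree ≤ k never violate the bound 2M/(2M − πk) on the Tchebychev-type nodes above.

```python
# Empirical check of Exercise 2.5.9.1): on the M nodes
# t_l = cos((2l+1)pi/(2M)), polynomials of degree <= k satisfy
# ||f||_inf <= 2M/(2M - pi*k) * max_l |f(t_l)|  once 2M >= pi*k.
import numpy as np

rng = np.random.default_rng(3)
k, M = 5, 16                                  # 2M = 32 >= pi*k ~ 15.7
bound = 2 * M / (2 * M - np.pi * k)

nodes = np.cos((2 * np.arange(M) + 1) * np.pi / (2 * M))
dense = np.linspace(-1.0, 1.0, 8001)          # proxy for the sup over [-1, 1]

worst = 0.0
for _ in range(50):
    p = np.polynomial.Polynomial(rng.standard_normal(k + 1))  # degree <= k
    ratio = np.max(np.abs(p(dense))) / np.max(np.abs(p(nodes)))
    worst = max(worst, ratio)
```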

Exercise 2.5.10 Prove that if T = {−1 + 2l/M}_{l=0}^{M} is the equidistant (M + 1)-point grid on ∆ = [−1, 1], then

κ_L(T) ≥ c_1(M) exp{ c_2 k/√M },

with some positive c_1(M) and an absolute positive constant c_2. Thus, in order to get κ_L(T) = O(1) for an equidistant grid T, the cardinality of the grid should be nearly quadratic in k.


Hint: Let t_0 = −1, t_1 = −1 + 2/M, ..., t_M = 1 be the points of T. Reduce the question to the following one:

Given a polynomial f(t) of degree k which is ≤ 1 in absolute value on [−1, t_{M−1}] and equal to 0 at 1, how large can the polynomial be at the point (t_{M−1} + 1)/2?

To answer the latter question, look at the Tchebychev polynomial T_k, which grows outside the segment [−1, 1] as T_k(t) = cosh(k arccosh(t)).