
On infinite-time nonlinear quadratic optimal control




Available online at www.sciencedirect.com

Systems & Control Letters 51 (2004) 259–268, www.elsevier.com/locate/sysconle

On infinite-time nonlinear quadratic optimal control

Yue Chen (a), Thomas Edgar (b), Vasilios Manousiouthakis (a, ∗, 1)

(a) Chemical Engineering Department, UCLA, Los Angeles, CA 90095-1592, USA
(b) Chemical Engineering Department, University of Texas at Austin, TX 78712-1062, USA

Received 21 November 2002; received in revised form 17 July 2003; accepted 11 August 2003

Abstract

This work presents an approximate solution method for the infinite-time nonlinear quadratic optimal control problem. The method is applicable to a large class of nonlinear systems and involves solving a Riccati equation and a series of algebraic equations. The conditions for uniqueness and stability of the resulting feedback policy are established. It is shown that the proposed approximation method is useful in determining the region in which the constrained and unconstrained optimal control problems are identical. A reactor control problem is used to illustrate the method.
© 2003 Elsevier B.V. All rights reserved.

Keywords: Nonlinear; Optimal control; HJB equation; Approximate solution; Constraints

1. Introduction

The infinite-time nonlinear optimal control problem has been the subject of intense research efforts for a long time. Al'Brecht [1] first considered this problem for analytic objective functions and analytic systems. He demonstrated that the optimal control could be obtained in the form of a power series, the terms of which could be sequentially obtained through solution of a quadratic optimal control problem for the linearized system and subsequent solution of a series of linear differential equations. He was also able to establish the convergence of this power series for single-input systems of the form ẋ = f(x) + Bu. Lee and Markus [12, p. 299] again employed the aforementioned analyticity assumption to establish that the analytic feedback controller u = u∗(x), stabilizing the system ẋ = f(x, u), giving rise to the finite objective function J(x₀, u∗) = ∫₀^∞ G(x, u∗(x)) dt, and satisfying the functional equation (∂J/∂x)(x, u∗)(∂f/∂u)(x, u∗(x)) + (∂G/∂u)(x, u∗(x)) = 0 near the origin, is unique and optimal. Lukes, a student of Markus, later relaxed the analyticity condition to second-order differentiability, in [13]. Werner and Cruz [18] considered an optimally adaptive control problem, expanded the optimal control as a Taylor series, and proposed a method to identify the coefficients through solution of a series of linear differential equations. Garrard [7] presented a small-ε perturbation procedure to identify sub-optimal control laws, as power series in ε, for systems of the form ẋ = Ax + εφ(ε, x) + Bu. He also demonstrated that a k-order truncation of the ε power series provides a (2k + 1)-order approximation of the optimal control. Nishikawa et al. [15] established a similar result for time-varying systems using an induction proof.

∗ Corresponding author. Tel.: +1-3102060300; fax: +1-3102064207.
E-mail addresses: [email protected] (Y. Chen), [email protected] (T. Edgar), [email protected] (V. Manousiouthakis).
1 Correspondence may also be addressed to this author.

0167-6911/$ - see front matter © 2003 Elsevier B.V. All rights reserved. doi:10.1016/j.sysconle.2003.08.006


Later, in [8], Garrard and Jordan studied systems of the form ẋ = Ax + φ(x) + Bu, with φ(x) polynomial. For objective functions of the form J(x₀, u) = ∫₀^∞ (xᵀ(t)Qx(t) + uᵀ(t)Ru(t)) dt, they were able to employ the Hamilton–Jacobi equation [12, p. 348] to express the optimal control in terms of a power series of the value function. For a flight control application involving a third-order aircraft model, they were able to evaluate up to the third-order terms of this power series, using a general procedure involving sequential solution of a Riccati equation and a number of linear algebraic equations. Halme and Hamalainen [9] studied the same problem as Garrard and Jordan and presented a solution based on an integral equation formulation of the two-point boundary value problem arising from the necessary conditions of optimality. Freeman and Kokotovic [5,6] employed inverse optimality to establish that control Lyapunov functions are solutions of Hamilton–Jacobi equations associated with sensible cost functionals. In [16], Saridis and Lee proposed a recursive algorithm that was shown to converge to the optimum control law. In [2,3], Beard et al. proposed a Galerkin-based approximation for the solution of a so-called general HJB equation. They reduced the HJB equation to a sequence of linear partial differential equations that they then approximated using the Galerkin spectral method, and established regions of convergence and stability for their solution method. In this same paper, a comprehensive review is given of HJB solution methodologies based on Taylor series. The need for solution of a linear partial differential equation for the evaluation of terms beyond third order, and the difficulties associated with estimating the regions of closed-loop stability and power series convergence, are listed as major shortcomings of these techniques. More recently, Manousiouthakis and Chmielewski [14] employed an inverse optimality framework to provide an exact solution for an appropriately defined constrained infinite-time nonlinear optimal control problem.
This paper is organized as follows. In Section 2, the infinite-time nonlinear quadratic optimal control problem

is presented, and a Taylor series based solution method is discussed. Conditions are then established that help identify regions of stability for the approximate optimal control strategies. In Section 3, the technique is used to evaluate the region in which the constrained and unconstrained infinite-time optimal control problems have the same solution. Throughout the work, the method is illustrated on a chemical reactor control problem.

2. Unconstrained infinite-time nonlinear quadratic optimal control

In this section, we consider the unconstrained infinite-time nonlinear quadratic optimal control (ITNQOC) problem described by

V(ξ) = inf_{x,u} ∫₀^∞ ( x(t)ᵀ Q(x(t)) x(t) + u(t)ᵀ R(x(t)) u(t) ) dt,
s.t. ẋ(t) = f(x(t)) + g(x(t)) u(t),  x(0) = ξ,   (1)

where x(t) ∈ Rⁿ, u(t) ∈ Rᵐ, t ∈ [0, ∞).
Throughout this work, the following assumptions are employed:

(A1) f(0) = 0, g(0) ≠ 0;
(A2) Q(·), R(·), R⁻¹(·), f(·), g(·), V(·) are analytic, infinitely differentiable functions in Rⁿ;
(A3) V_1(x) = 0, V_0(x) = 0 ∀x ∈ Rⁿ, where the power series of V(x) is V(x) = ∑_{i=0}^{∞} V_i(x);
(A4) (1) admits an optimal control;
(A5) Q_0(x) = 0, Q_1(x) = 0 ∀x ∈ Rⁿ, where the power series of xᵀQ(x)x is xᵀQ(x)x = ∑_{i=0}^{∞} Q_i(x).

Under (A4), the optimal feedback control input is

u(x) = −(1/2) R⁻¹(x) gᵀ(x) ∂V(x)/∂x,   (2)

where V(·) is the value function of (1) that satisfies the Hamilton–Jacobi–Bellman (HJB) equation [11, p. 418]

xᵀQ(x)x + (∂V(x)/∂x)ᵀ f(x) − (1/4) (∂V(x)/∂x)ᵀ g(x) R⁻¹(x) gᵀ(x) (∂V(x)/∂x) = 0.   (3)


Based on assumptions (A1)–(A5), the functions (·)ᵀQ(·)(·), G(·) = (1/4) g(·) R⁻¹(·) gᵀ(·), f(·), V(·) can be expanded into a power (Taylor) series around the origin, i.e.

xᵀQ(x)x = ∑_{i=2}^{∞} Q_i(x),  G(x) = ∑_{i=0}^{∞} G_i(x),  f(x) = ∑_{i=1}^{∞} f_i(x),  f_0(x) = 0,   (4)

V(x) = ∑_{i=2}^{∞} V_i(x),  V_0(x) = 0,  V_1(x) = 0,   (5)

where Q_i(x), G_i(x), f_i(x), V_i(x) denote scalar or matrix ith-order polynomials, as appropriate. As an illustration, for a two-dimensional system, the third-order term in (5) can be written as V_3(x) = V_30 x₁³ + V_21 x₁²x₂ + V_12 x₁x₂² + V_03 x₂³. Substituting (4) and (5) into the HJB equation (3) results in

∑_{i=2}^{∞} Q_i(x) + ( ∑_{i=2}^{∞} ∂V_i(x)/∂x )ᵀ ∑_{j=1}^{∞} f_j(x) − ( ∑_{i=2}^{∞} ∂V_i(x)/∂x )ᵀ ∑_{j=0}^{∞} G_j(x) ( ∑_{k=2}^{∞} ∂V_k(x)/∂x ) = 0.   (6)

To make (6) hold for all x, it is necessary and sufficient that terms of any order be zero. Considering that (A3) holds, the zeroth-order and the first-order terms are automatically satisfied. In turn this implies:

• second-order term of (6):

Q_2(x) + (∂V_2(x)/∂x)ᵀ f_1(x) − (∂V_2(x)/∂x)ᵀ G_0(x) (∂V_2(x)/∂x) = 0;   (7)

• ℓth-order (ℓ ≥ 3) term of (6):

Q_ℓ(x) + ∑_{i=2}^{ℓ} (∂V_i(x)/∂x)ᵀ [ f_{ℓ−i+1}(x) − ∑_{j=0}^{ℓ−i} G_j(x) ∂V_{ℓ−i−j+2}(x)/∂x ] = 0.   (8)

Isolating the highest-order term ∂V_ℓ(x)/∂x in (8) then yields

(∂V_ℓ(x)/∂x)ᵀ ( f_1(x) − 2 G_0(x) ∂V_2(x)/∂x )
= −Q_ℓ(x) − ∑_{i=3}^{ℓ−1} (∂V_i(x)/∂x)ᵀ [ f_{ℓ−i+1}(x) − ∑_{j=0}^{ℓ−i} G_j(x) ∂V_{ℓ−i−j+2}(x)/∂x ]
− (∂V_2(x)/∂x)ᵀ [ f_{ℓ−1}(x) − ∑_{j=1}^{ℓ−2} G_j(x) ∂V_{ℓ−j}(x)/∂x ] ≜ S_ℓ(x).   (9)

The structure of Eqs. (7) and (9) can best be appreciated by considering the optimal control problem

ξᵀV̄ξ = inf_{x,u} ∫₀^∞ ( x(t)ᵀ Q̄ x(t) + u(t)ᵀ R̄ u(t) ) dt,  s.t. ẋ(t) = Ax(t) + Bu(t),  x(0) = ξ,   (10)

where x(t) ∈ Rⁿ, u(t) ∈ Rᵐ ∀t ∈ [0, ∞); V̄, Q̄, R̄ are constant symmetric matrices such that Q̄ = Q(0), R̄ = R(0), and

xᵀV̄x = V_2(x),  xᵀQ̄x = Q_2(x) ∀x ∈ Rⁿ,   (11)

and A, B are constant matrices of appropriate dimensions such that

Ax = f_1(x) ∀x ∈ Rⁿ,  B = g(0).   (12)


Considering the additional assumption:

(A6) Q̄ > 0, R̄ > 0, (A, B) controllable,

the optimal control policy for (10) exists, and is stabilizing, unique and equal to u(x) = −R̄⁻¹BᵀV̄x, where V̄ is the unique positive definite solution of the Riccati equation (16). Indeed, based on (10)–(12), it holds that

∂V_2(x)/∂x = 2V̄x,   (13)

G_0(x) = (1/4) g(0) R⁻¹(0) gᵀ(0) = (1/4) B R̄⁻¹ Bᵀ.   (14)

Thus, substituting (11)–(14) into (7), we obtain

xᵀ( Q̄ + V̄A + AᵀV̄ − V̄BR̄⁻¹BᵀV̄ )x = 0.   (15)

Since (15) must hold for all x ≠ 0, it is equivalent to (16):

Q̄ + V̄A + AᵀV̄ − V̄BR̄⁻¹BᵀV̄ = 0.   (16)

This implies that (7) is equivalent to the Riccati equation corresponding to the optimal control problem (10) for the linearization of (1). This same linearization (10) also plays an important role in (9). Indeed, it is important to note that f_1(x) − 2G_0(x)(∂V_2(x)/∂x) = (A − BR̄⁻¹BᵀV̄)x = Āx remains the same for all ℓ, and that it is equal to the vector field that determines the closed-loop dynamics for the linearization of (1), with constant objective function weights, i.e. problem (10). Examining (9) closely reveals that both its left- and right-hand sides are polynomials of degree ℓ. Thus, equating the coefficients of corresponding terms in (9) gives rise to a set of linear equations in terms of the coefficients of V_ℓ(x), which is denoted as

Λ_ℓ V_ℓ = S_ℓ,   (17)

where Λ_ℓ denotes a known square matrix whose elements are known linear functions of the entries of Ā, V_ℓ is a vector whose elements are all the coefficients of V_ℓ(x), and S_ℓ is a known vector whose entries depend on the coefficients of the value function polynomials V_i(x) of order i lower than ℓ. The following theorem can then be stated.

Theorem 1. If (A1)–(A6) hold, then the HJB equation (3) admits a unique solution V(x) > 0 ∀x ∈ Rⁿ − {0} that is a positive definite function near the origin. Furthermore, the coefficients of the power series expansion of this solution can be identified through first solution of (7) and then sequential solution of (9) (or (17)) for ever increasing values of ℓ. Finally, if the Taylor series of the value function V(x) is truncated at the ℓ-order term, then the (ℓ − 1)-order terms of the obtained solution for this ℓ-order approximation are the same as the (ℓ − 1)-order terms of both the obtained solution for the (ℓ − 1)-order approximation and of the exact solution of the HJB equation.

Proof. We first establish that (9) admits a unique solution. To that purpose, we show that V_ℓ(x) is unique for any ℓ. We proceed by contradiction. Let ℓ be the lowest order such that V̄_ℓ(x), Ṽ_ℓ(x) are two solutions of (9) that are different at least at some point x₀ ∈ Rⁿ. Clearly, ℓ must be greater than or equal to 3, since V_2(x) is unique based on (A6), (13) and (16). It then holds

( ∂(V̄_ℓ(x) − Ṽ_ℓ(x))/∂x )ᵀ ( f_1(x) − 2G_0(x) ∂V_2(x)/∂x ) = 0 ⇔ ( ∂(V̄_ℓ(x) − Ṽ_ℓ(x))/∂x )ᵀ Āx = 0.


Table 1
The number of coefficients of the ℓth-order term of the value function V_ℓ(x)

n \ ℓ   1     2      3       4        5
4       4     10     20      35       56
10      10    55     220     715      2002
30      30    465    4960    40,920   278,256
50      50    1275   22,100  292,825  3,162,510

Applying the above equation to the linearized closed-loop system trajectory x(t) starting at x₀ (see (10), with ξ = x₀) then yields

( ∂(V̄_ℓ(x) − Ṽ_ℓ(x))/∂x )ᵀ dx/dt = 0 ⇔ d( V̄_ℓ(x(t)) − Ṽ_ℓ(x(t)) )/dt = 0 ⇔ V̄_ℓ(x(t)) − Ṽ_ℓ(x(t)) = constant ∀t.

However, the linearized closed-loop system dx/dt = Āx, x(0) = x₀ is asymptotically stable, because of (A6). Therefore, lim_{t→∞} ‖x(t)‖ = 0, and thus the above constant is zero, since lim_{t→∞} [V̄_ℓ(x(t)) − Ṽ_ℓ(x(t))] = V̄_ℓ(0) − Ṽ_ℓ(0) = 0. Thus, V̄_ℓ(x(t)) = Ṽ_ℓ(x(t)) ⇒ V̄_ℓ(x₀) = Ṽ_ℓ(x₀), which is a contradiction.
To establish positive definiteness of V(x) near the origin, we recollect that under assumptions (A1)–(A6),

solving the Riccati equation (16) leads to a positive definite solution V̄. Since all higher order terms of V(x) can be neglected in a small enough neighborhood of the origin, V(x) is a positive definite function near the origin.
As stated above, V_2(x) is unique based on (A6), (13) and (16). Then, the coefficients of V_3(x) can be solved for uniquely using (17). Carrying on this procedure iteratively until (ℓ − 1) order, the coefficients of V_3(x) through V_{ℓ−1}(x) can be solved for uniquely based on (17). Based on the above iterative solution procedure, it is also easy to verify that if the value function V(x) is truncated with an ℓ-order polynomial, it is sufficient to just expand f(x), G(x) and xᵀQ(x)x in (4) up to at most an ℓ-order polynomial, since any terms of f(x), G(x) and Q(x) higher than ℓ-order will not appear in (9).

In order to assess the growth of computational complexity with the order of approximation and the dimension of the system, we calculate the total number of variables involved in all terms, and in the ℓ-order term, of the value function. The number of coefficients involved in the ℓ-order term of the value function (i.e. the dimension of V_ℓ in (17)) is N_ℓ = ∑_{i=1}^{K} C(n, i) C(ℓ−1, i−1), K = min(n, ℓ), where C(·,·) denotes the binomial coefficient. N_ℓ is tabulated for some n, ℓ in Table 1, where the row index is n and the column index is ℓ. The total number of coefficients involved in the first ℓ terms of the value function (i.e. effectively from 2 to ℓ, since V_0(x) = 0, V_1(x) = 0) is N = ∑_{i=0}^{K} C(n, i) C(ℓ, i).

Having outlined a systematic method for the computation of the ℓth-order term of the value function V_ℓ(x), we now proceed to quantify the closed-loop stability region for the associated control law

u_ℓ(x) = −(1/2) R⁻¹(x) gᵀ(x) ∑_{i=2}^{ℓ} ∂V_i(x)/∂x.   (18)

To establish stability, we first define the following two sets.

Definition 1. D^ℓ is the set in which

( ∑_{i=2}^{ℓ} ∂V_i(x)/∂x )ᵀ f(x) − (1/2) ( ∑_{i=2}^{ℓ} ∂V_i(x)/∂x )ᵀ g(x) R⁻¹(x) gᵀ(x) ( ∑_{i=2}^{ℓ} ∂V_i(x)/∂x ) < 0

and

∑_{i=2}^{ℓ} V_i(x) > 0 ∀x ≠ 0.

Definition 2. The a-level set of the ℓth-order approximately optimal value function is the connected set that contains the origin and is defined as

D^ℓ_a − {0} = { x ∈ Rⁿ | 0 < ∑_{i=2}^{ℓ} V_i(x) ≤ a },  a > 0.

Then the following holds.

Theorem 2. If assumptions (A1)–(A6) hold, then the closed-loop system under the feedback control given by the above approximation method is asymptotically stable for all initial conditions in the set D^ℓ_a, as long as D^ℓ_a ⊂ D^ℓ.

Proof. Employing the ℓ-order approximate control u_ℓ(x) = −(1/2) R⁻¹(x) gᵀ(x) (∑_{i=2}^{ℓ} ∂V_i(x)/∂x) (Eq. (18)) gives rise to the following closed-loop system:

ẋ = f(x) − (1/2) g(x) R⁻¹(x) gᵀ(x) ( ∑_{i=2}^{ℓ} ∂V_i(x)/∂x ).   (19)

The derivative of ∑_{i=2}^{ℓ} V_i(x) along the trajectories of (19), denoted as ∑_{i=2}^{ℓ} V̇_i(x), is given by

∑_{i=2}^{ℓ} V̇_i(x) = ( ∑_{i=2}^{ℓ} ∂V_i(x)/∂x )ᵀ f(x) − (1/2) ( ∑_{i=2}^{ℓ} ∂V_i(x)/∂x )ᵀ g(x) R⁻¹(x) gᵀ(x) ( ∑_{i=2}^{ℓ} ∂V_i(x)/∂x ).   (20)

But then ∑_{i=2}^{ℓ} V_i(0) = 0, ∑_{i=2}^{ℓ} V_i(x) > 0 in D^ℓ_a − {0}, and ∑_{i=2}^{ℓ} V̇_i(x) < 0 in D^ℓ_a − {0}. Since the origin is an equilibrium point of (19) and D^ℓ_a contains the origin, application of Theorem 3.1 [10, p. 100] yields that the origin is asymptotically stable, with D^ℓ_a as a region of attraction [10, p. 109].

The proposed approximation procedure and associated stability results are illustrated on a chemical reactorcontrol problem.

Example 1 (Manousiouthakis and Chmielewski [14]). Consider a continuously stirred tank reactor (CSTR) governed by the following system of equations:

[ẋ₁]   [−0.01x₁² − 0.338x₁ + 0.02x₂]   [0.02]
[ẋ₂] = [ 0.05x₁² + 0.159x₁ − 0.03x₂] + [0   ] u.   (21)

The performance to be optimized is in the form of (1) with weight matrices

Q = [10  0]
    [ 0  1],   R = 1.

First, the positive definite solution of the Riccati equation (16) is identified. The resulting second-order approximate value function and corresponding control are

V_2(x) = 19.6527x₁² + 21.6336x₁x₂ + 23.0978x₂²,   (22)

u_2(x) = −0.3931x₁ − 0.2163x₂.   (23)
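The numbers in (22)–(23) can be reproduced from the linearization of (21) at the origin: A and B are read off the Jacobian and input matrix, V̄ solves the Riccati equation (16), and the gain is R̄⁻¹BᵀV̄. A sketch using SciPy's ARE solver (the solver choice is ours, not part of the paper):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Linearization of the CSTR model (21) at the origin, with the weights of Example 1.
A = np.array([[-0.338, 0.02],
              [ 0.159, -0.03]])
B = np.array([[0.02],
              [0.0]])
Q = np.diag([10.0, 1.0])
R = np.array([[1.0]])

# V_bar solves Q + V A + A^T V - V B R^{-1} B^T V = 0  (Eq. (16)).
V_bar = solve_continuous_are(A, B, Q, R)

# V2(x) = x^T V_bar x, so the x1*x2 coefficient is 2*V_bar[0, 1].
print(V_bar[0, 0], 2 * V_bar[0, 1], V_bar[1, 1])  # ~ 19.6527, 21.6336, 23.0978

# u2(x) = -R^{-1} B^T V_bar x  (Eq. (23)).
K = np.linalg.solve(R, B.T @ V_bar)
print(-K)  # gains ~ -0.3931, -0.2163
```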


[Figure: three panels plotting x₁, x₂ and u versus time over t ∈ [0, 300].]

Fig. 1. Closed-loop simulations for fifth-order approximate nonlinear optimal controller (initial condition x₁ = −1.5, x₂ = 3).
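The behaviour in Fig. 1 can be approximated with a short simulation. The sketch below integrates the nonlinear plant (21) under the second-order control (23) rather than the fifth-order control (25), simply to keep it compact; the trajectories are qualitatively the same:

```python
import numpy as np

def f(x):
    """Drift vector field of the CSTR model (21)."""
    x1, x2 = x
    return np.array([-0.01 * x1**2 - 0.338 * x1 + 0.02 * x2,
                      0.05 * x1**2 + 0.159 * x1 - 0.03 * x2])

g = np.array([0.02, 0.0])                        # input vector field
u2 = lambda x: -0.3931 * x[0] - 0.2163 * x[1]    # Eq. (23)

# Forward-Euler integration from the initial condition of Fig. 1.
x, dt = np.array([-1.5, 3.0]), 0.01
for _ in range(int(300 / dt)):
    x = x + dt * (f(x) + g * u2(x))

print(x)  # both states settle close to the origin by t = 300
```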

Then sequential solution of a series of algebraic equations allows computation of any required order approximate value function and control. The fifth-order approximate value function and corresponding control are

∑_{i=2}^{5} V_i(x) = 19.6527x₁² + 21.6336x₁x₂ + 23.0978x₂² + 1.1399x₁³ + 3.1079x₁²x₂ + 0.3018x₁x₂² + 0.05256x₂³ + 0.0833x₁⁴ − 0.0295x₁³x₂ − 0.00164x₁²x₂² − 0.00134x₁x₂³ − 0.0003x₂⁴ − 0.0029x₁⁵ + 8.07×10⁻⁵x₁⁴x₂ − 8.12×10⁻⁵x₁³x₂² − 3.63×10⁻⁵x₁²x₂³ + 0.304×10⁻⁵x₁x₂⁴ + 0.86×10⁻⁶x₂⁵,   (24)

u_5(x) = −0.3931x₁ − 0.2163x₂ − 0.0342x₁² − 0.0622x₁x₂ − 3.018×10⁻³x₂² − 0.00333x₁³ + 0.000885x₁²x₂ + 0.0000328x₁x₂² + 0.0000135x₂³ + 1.45×10⁻⁴x₁⁴ − 0.32×10⁻⁵x₁³x₂ + 0.24×10⁻⁵x₁²x₂² + 0.727×10⁻⁶x₁x₂³ − 0.304×10⁻⁷x₂⁴.   (25)

Closed-loop simulations under the fifth-order approximately optimal nonlinear controller (Eq. (25)) are shown in Fig. 1. It can be seen that the closed-loop system is driven to the origin through use of this controller. Based on Theorem 2, stability regions D^ℓ_a are identified in Fig. 2 for these approximately optimal control strategies. Furthermore, as discussed in Theorem 1, the (ℓ − 1)-order coefficients obtained for the (ℓ − 1)-order approximation are the same as the (ℓ − 1)-order coefficients for the ℓ-order approximation.
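For this example, the first sequential step (ℓ = 3) of the procedure can be written out explicitly. Since Q is constant (Q_3 = 0) and g, R are constant (G_1 = 0), Eq. (9) reduces to (∂V_3/∂x)ᵀĀx = −(∂V_2/∂x)ᵀf_2(x), with f_2(x) = (−0.01x₁², 0.05x₁²)ᵀ; matching the four cubic monomial coefficients gives a 4 × 4 instance of (17). A numerical sketch (the explicit layout of the coefficient matrix is our rendering):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Linearization of (21) and the weights of Example 1.
A = np.array([[-0.338, 0.02], [0.159, -0.03]])
B = np.array([[0.02], [0.0]])
Q = np.diag([10.0, 1.0])
R = np.array([[1.0]])

V = solve_continuous_are(A, B, Q, R)        # V_bar from Eq. (16)
Abar = A - B @ np.linalg.solve(R, B.T @ V)  # closed-loop matrix A_bar
(a1, a2), (b1, b2) = Abar                   # rows of A_bar

# Unknowns: V3(x) = v30*x1^3 + v21*x1^2*x2 + v12*x1*x2^2 + v03*x2^3.
# Rows match the coefficients of x1^3, x1^2*x2, x1*x2^2, x2^3
# in (dV3/dx)^T Abar x.
Lam3 = np.array([
    [3 * a1,      b1,           0.0,          0.0],
    [3 * a2,  2 * a1 + b2,  2 * b1,           0.0],
    [0.0,     2 * a2,       a1 + 2 * b2,  3 * b1],
    [0.0,         0.0,          a2,       3 * b2],
])

# S3 = -(dV2/dx)^T f2(x) with dV2/dx = 2 V x:
# only x1^3 and x1^2*x2 monomials appear on the right-hand side.
S3 = np.array([-2 * (V[0, 0] * -0.01 + V[0, 1] * 0.05),
               -2 * (V[1, 0] * -0.01 + V[1, 1] * 0.05),
               0.0, 0.0])

v3 = np.linalg.solve(Lam3, S3)
print(v3)  # ~ [1.1399, 3.1079, 0.3018, 0.05256], the cubic terms of Eq. (24)
```

The same coefficient-matching pattern, with one linear solve per order, produces the quartic and quintic terms of (24).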

3. Constrained infinite-time nonlinear optimal control

In this section, we consider the constrained infinite-time nonlinear quadratic optimal control problem (CITNQOC):

Φ(ξ) = inf_{x,u} ∫₀^∞ ( x(t)ᵀ Q(x(t)) x(t) + u(t)ᵀ R(x(t)) u(t) ) dt
s.t. ẋ(t) = f(x(t)) + g(x(t)) u(t),  x(0) = ξ,  C_i(x(t), u(t)) ≤ 0,  i = 1, …, p,  ∀t ≥ 0.   (26)


[Figure: stability regions in the (x₁, x₂) plane; legend: 2-order, a = 978.14; 3-order, a = 467.23; 4-order, a = 1892; 5-order, a = 2279.4.]

Fig. 2. Stability regions D^ℓ_a for different orders of approximation.

Let u(x) denote the optimal feedback control law (Eq. (2)) of the unconstrained problem (ITNQOC) defined by (1), while u_ℓ(x) (Eq. (18)) denotes the ℓth-order approximate solution of ITNQOC obtained by the approximation method given earlier. Similarly, x(t, ξ) indicates the trajectory of the unconstrained system with control u(x) given by (2) and initial state x(0) = ξ, while x_ℓ(t, ξ) denotes the trajectory of the unconstrained system with the ℓth-order approximate optimal control u_ℓ(x) and initial state x(0) = ξ. Then the following sets are defined.

Definition 3. The set of initial conditions such that the unconstrained solution violates no constraints is defined as

O_∞ = { ξ ∈ Rⁿ | C_i(x(t, ξ), u(x(t, ξ))) ≤ 0 ∀t ≥ 0 }.

Definition 4. The set of initial conditions such that the unconstrained approximate solution violates no constraints is defined as

O^ℓ_∞ = { ξ ∈ Rⁿ | C_i(x_ℓ(t, ξ), u_ℓ(x_ℓ(t, ξ))) ≤ 0 ∀t ≥ 0 }.

Definition 5. The set of initial conditions which satisfy the constraints is defined as O_0 = { ξ ∈ Rⁿ | C_i(ξ, u(ξ)) ≤ 0 }.

It is shown in [4,14,17] that Φ(ξ) < ∞ in problem (26) implies that there exists a sufficiently large but finite T such that the optimum solution satisfies x∗(T) ∈ O_∞ for system (26). So, the CITNQOC problem can be converted to a constrained finite-time quadratic nonlinear optimal control (CFTNQOC) problem and an unconstrained infinite-time optimal control problem. Thus, when the solution of the CITNQOC problem enters O_∞ (or a subset of O_∞), the CITNQOC problem can be solved as an unconstrained problem with initial state x∗(T).
For the proposed approximate method of unconstrained nonlinear optimal control and the aforementioned CITNQOC problem, the following then holds:

Theorem 3. If for some a > 0, D^ℓ_a ⊂ (O_0 ∩ D^ℓ), where D^ℓ and D^ℓ_a are the sets defined in Section 2 (Definitions 1 and 2), then D^ℓ_a ⊂ O^ℓ_∞.


[Figure: (a) the regions O^2_∞ through O^5_∞ in the (x₁, x₂) plane; (b) comparison of the O^ℓ_∞, showing the 2-order region distinct from the practically coincident 3-, 4- and 5-order regions.]

Fig. 3. O^ℓ_∞ for different orders of approximation and comparison.

Proof. Let z ∈ D^ℓ_a ⊂ D^ℓ. Then ∑_{i=2}^{ℓ} V_i(z) ≤ a, ∑_{i=2}^{ℓ} V_i(z) > 0 and ∑_{i=2}^{ℓ} V̇_i(z) < 0, from Theorem 2. This implies that x_ℓ(t, z) will remain in D^ℓ_a for all t ≥ 0. Since D^ℓ_a ⊂ O_0, this implies that the constraints will be satisfied from then on, and thus z ∈ O^ℓ_∞.

Remark 1. The above theorem provides a way to identify a subset of O^ℓ_∞. However, the approximate HJB solution methodology outlined above can also be employed to identify O^ℓ_∞ and, as the order of approximation increases, O_∞ itself (since O^ℓ_∞ converges to O_∞ as ℓ → ∞). This fact is illustrated in Example 2.

Example 2 (Manousiouthakis and Chmielewski [14]). Consider the same CSTR model as Example 1, with the addition of the inequality constraints −1.59 ≤ x₁ ≤ 0.16, −4.21 ≤ x₂, −10 ≤ u ≤ 10.
We are interested in identifying O^ℓ_∞ and in demonstrating that, as ℓ increases, the O^ℓ_∞ become indistinguishable from one another. The area surrounded by the dashed lines in Fig. 3(a) is the intersection of the constraint sets imposed on the system states. The constraint on the control input does not appear in these figures, since it is far from the surrounded area at the scale of these figures.
It is observed in Fig. 3(b) that the constraint satisfaction region O^2_∞ for the controller corresponding to the

second-order approximation of the optimal value function is different than the region O^3_∞ corresponding to the third-order approximation. However, O^ℓ_∞ begins to converge for approximations with order higher than 3. The differences among the regions O^3_∞, O^4_∞, O^5_∞, corresponding to the third-, fourth- and fifth-order approximations, are practically indistinguishable. This observation is important when dealing with the CITNQOC problem. It implies that, for this example, we can use a low-order value function approximation to accurately approximate the set O_∞ corresponding to the actual nonlinear model (1) and the optimal nonlinear feedback law (2). Of course, there is no guarantee that, for other nonlinear systems, a low-order approximation will work as well.
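A membership test for O^2_∞ along these lines is straightforward to sketch: integrate the closed loop under u_2 and check the constraints of Example 2 along the trajectory, truncating the horizon once the state has settled near the origin. The horizon T and step dt below are our choices, not from the paper:

```python
def in_O2_inf(xi, T=500.0, dt=0.01):
    """Approximate test of xi in O^2_inf for the CSTR (21) under u2 (Eq. (23)):
    the constraints of Example 2 are checked along a forward-Euler trajectory."""
    x1, x2 = float(xi[0]), float(xi[1])
    for _ in range(int(T / dt)):
        u = -0.3931 * x1 - 0.2163 * x2
        if not (-1.59 <= x1 <= 0.16 and -4.21 <= x2 and -10 <= u <= 10):
            return False
        dx1 = -0.01 * x1**2 - 0.338 * x1 + 0.02 * x2 + 0.02 * u
        dx2 = 0.05 * x1**2 + 0.159 * x1 - 0.03 * x2
        x1, x2 = x1 + dt * dx1, x2 + dt * dx2
    return True

print(in_O2_inf((-1.0, 1.0)))  # True: trajectory stays inside the constraint set
print(in_O2_inf((0.5, 0.0)))   # False: x1 = 0.5 already violates x1 <= 0.16
```

The same loop with the coefficients of (25) substituted for u_2 gives the corresponding test for O^5_∞.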

4. Conclusions

In this paper, the unconstrained infinite-time nonlinear quadratic optimal control problem is studied for a general class of nonlinear systems. A power series-based approximation method is proposed to solve the associated HJB equation. The method involves solution of the Riccati equation for the linearized problem, followed by sequential solution of a series of linear algebraic equations. Uniqueness of the solution and a region of stability are established. The constrained infinite-time nonlinear quadratic optimal control problem is also studied. The aforementioned HJB approximation method is employed to establish regions in which the constrained and unconstrained optimal control problems have identical solutions. An example is employed throughout this work to illustrate the proposed approximation method, and to demonstrate its use in identifying constraint-satisfying regions as well as regions of stability for the approximate optimal feedback laws obtained.

References

[1] E.G. Al'Brekht, On the optimal stabilization of nonlinear systems, J. Appl. Math. Mech. (PMM) 25 (1961) 836–844 (in Russian).
[2] R.W. Beard, G.N. Saridis, J.T. Wen, Galerkin approximation of the generalized Hamilton–Jacobi–Bellman equation, Automatica 33 (1997) 2159–2177.
[3] R.W. Beard, G.N. Saridis, J.T. Wen, Approximate solutions to the time-invariant Hamilton–Jacobi–Bellman equation, J. Optim. Theory Appl. 96 (1998) 589–626.
[4] D. Chmielewski, V. Manousiouthakis, Constrained infinite-time quadratic optimal control: the linear stochastic and nonlinear deterministic cases, Proceedings of the American Control Conference, Philadelphia, PA, 1998, pp. 2093–2097.
[5] R.A. Freeman, P.V. Kokotovic, Optimal nonlinear controllers for feedback linearizable systems, Proceedings of the American Control Conference, Seattle, WA, 1995, pp. 2722–2726.
[6] R.A. Freeman, P.V. Kokotovic, Robust Nonlinear Control Design: State-space and Lyapunov Techniques, Birkhauser, Boston, 1996.
[7] W.L. Garrard, Additional results on sub-optimal feedback control of non-linear systems, Internat. J. Control 10 (1969) 657–663.
[8] W.L. Garrard, J.M. Jordan, Design of nonlinear automatic flight control systems, Automatica 13 (1977) 497–505.
[9] A. Halme, R.P. Hamalainen, On the nonlinear regulator problem, J. Optim. Theory Appl. 16 (1975) 255–275.
[10] H.K. Khalil, Nonlinear Systems, 2nd Edition, Prentice-Hall, Englewood Cliffs, NJ, 1996.
[11] D. Kirk, Optimal Control Theory: An Introduction, Prentice-Hall, Englewood Cliffs, NJ, 1970.
[12] E.B. Lee, L. Markus, Foundations of Optimal Control Theory, Wiley, New York, 1967.
[13] D.L. Lukes, Optimal regulation of nonlinear dynamical systems, SIAM J. Control Optim. 7 (1969) 75–100.
[14] V. Manousiouthakis, D.J. Chmielewski, On constrained infinite-time nonlinear optimal control, Chem. Eng. Sci. 57 (2002) 105–114.
[15] Y. Nishikawa, N. Sannomiya, H. Itakura, A method for suboptimal design of nonlinear feedback systems, Automatica 7 (1971) 703–712.
[16] G.N. Saridis, C.G. Lee, An approximation theory of optimal control for trainable manipulators, IEEE Trans. Automat. Control 13 (1968) 621–629.
[17] M. Sznaier, J. Cloutier, Receding horizon control Lyapunov function approach to suboptimal regulation of nonlinear systems, J. Guidance Control Dyn. 23 (2000) 399–405.
[18] R.A. Werner, J.B. Cruz, Feedback control which preserves optimality for systems with unknown parameters, IEEE Trans. Automat. Control 13 (1968) 621–629.