

IEEE TRANSACTIONS ON AUTOMATIC CONTROL, APRIL 1969 193

Correspondence

On Stochastic Optimal Control

Abstract-It is shown that, for a class of stochastic systems, i.e., those in which the cost increases as the distance between the stochastic and the deterministic controls increases, the optimal stochastic control is the conditional expectation of the deterministic control, given the measurement history.

STATEMENT OF THE PROBLEM

Consider the stochastic system

$$\dot{x}(t) = f\big(x(t),\, u_0(t) + \xi(t),\, t\big) \tag{1}$$

$$y(t) = g\big(x(t),\, \eta(t)\big) \tag{2}$$

where $x$ is an $n \times 1$ vector of states, $u_0$ is an $m \times 1$ vector of ordered controls, $t$ is the time, $y$ is a $p \times 1$ vector of observations, and $\xi$ and $\eta$ are $m \times 1$ and $n \times 1$ vectors, respectively, of measurable random functions with zero means. Note that in (1) the disturbance is additive. In this correspondence, the notation $x(t)$ will denote the value of the function $x$ at time $t$, while the symbol $x$ will denote the entire time function, i.e., $x(t)\ \forall t \in [t_0, t_f]$. Here $t_0$ is the initial time, and $t_f$ is the final time, which may be either fixed or free. The control $u_0$ is to be chosen from a set of admissible controls satisfying the constraint $\Omega$, so as to transfer the system from an initial state $x_0$, which may be a vector of random variables, to a target set $S$ and minimize the performance functional

$$P = E[J(u, t) \mid y] \tag{3}$$

where $u = u_0 + \xi$.

RELATION TO THE DETERMINISTIC CONTROL

Suppose an optimal control $u^*$ exists and can be found for the deterministic system formed by setting $\xi$ and $\eta$ equal to zero in (1) and (2). In this case, the performance functional is a deterministic quantity, and the expected value operation may be dropped. In the deterministic version of the problem, $x_0$ is assumed known. Here, $u^*$ may be either an open-loop or a closed-loop solution. Returning to a consideration of the stochastic system for a fixed, but unknown initial state, the cost of a solution cannot be less than the cost incurred by the deterministic system starting in the same initial state at the same time. This follows from the definition of optimality. Therefore, the cost of the deterministic solution is a lower bound on the cost incurred by the stochastic system. In light of this, an “excess cost” might be defined by

$$\Delta P = P - (\text{cost of the deterministic solution}) \tag{4}$$

which represents the increase in cost due to the presence of uncertainties.

Manuscript received September 28, 1968.

The principal result of this correspondence is contained in the following theorem.

Theorem

If for the system under consideration it is true that

$$\Delta P = E\big[L(\|u^* - u\|) \mid y\big] \tag{5}$$

where $L(\cdot)$ satisfies

$$L(0) = 0; \qquad L(\epsilon) \ge 0 \ \text{when } \epsilon \ne 0; \qquad L(\epsilon_2) \ge L(\epsilon_1) \ \text{when } \epsilon_2 \ge \epsilon_1 \tag{6}$$

and the conditional distribution function

$$\Pr\big[u_1^*(t) \le \epsilon_1, \ldots, u_m^*(t) \le \epsilon_m \mid y\big] \tag{7}$$

is symmetric with respect to the $m$ variables $(u_1^*(t) - \bar{u}_1), \ldots, (u_m^*(t) - \bar{u}_m)$ and convex in the region where all of these variables are negative. Here $\bar{u}_i$ denotes the mean of $u_i^*(t)$. Then the control that minimizes (4) and hence (3) also is

$$u_0(t) = E[u^*(t) \mid y], \quad \forall t \in [t_0, t_f]. \tag{8}$$

Proof: Assuming (5) is true, Sherman [5] shows that to minimize (4), it is necessary that

$$E\big[\|u^* - u\| \mid y\big] = 0. \tag{9}$$

But (9) implies that

$$E\big[\|u^*(t) - u(t)\| \mid y\big] = 0, \quad \forall t \tag{10}$$

except perhaps on a set of measure zero. Equation (10) implies that

$$E[u^*(t) - u(t) \mid y] = 0 \tag{11}$$

which, since $u = u_0 + \xi$ and $u_0$ is determined by the measurement history, can be rewritten

$$u_0(t) = E[u^*(t) - \xi(t) \mid y]. \tag{12}$$

Recognizing that the mean value of $\xi$ was assumed to be zero leads to the result (8).
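The content of the theorem can be checked numerically in the scalar case. The sketch below is only an illustration under stated assumptions: the conditional distribution of $u^*(t)$ given $y$ is taken to be Gaussian (symmetric about its mean, with a distribution function that is convex below the mean), and three loss functions satisfying (6) are tried. The Gaussian model, the particular losses, and all names (`expected_loss`, `compare_controls`) are hypothetical choices, not taken from the correspondence.

```python
import random


def expected_loss(candidate, samples, loss):
    """Monte Carlo estimate of E[L(|u* - u|) | y] for a candidate control u."""
    return sum(loss(abs(s - candidate)) for s in samples) / len(samples)


# Illustrative losses; each has L(0) = 0 and is nondecreasing in the error, as in (6).
LOSSES = {
    "absolute": lambda e: e,
    "quadratic": lambda e: e * e,
    "saturating": lambda e: min(e, 1.0),
}


def compare_controls(mean=2.0, sigma=1.0, n=20000, seed=0):
    """Compare the loss at the conditional mean with the loss at shifted controls.

    Assumption: samples of u* given y are drawn from a Gaussian distribution.
    Returns {loss name: (loss at the mean, smallest loss among shifted controls)}.
    """
    rng = random.Random(seed)
    samples = [rng.gauss(mean, sigma) for _ in range(n)]
    cond_mean = sum(samples) / n  # sample estimate of E[u* | y]
    results = {}
    for name, loss in LOSSES.items():
        at_mean = expected_loss(cond_mean, samples, loss)
        shifted = [
            expected_loss(cond_mean + d, samples, loss)
            for d in (-1.0, -0.5, 0.5, 1.0)
        ]
        results[name] = (at_mean, min(shifted))
    return results


if __name__ == "__main__":
    for name, (at_mean, best_shifted) in compare_controls().items():
        print(f"{name}: at mean {at_mean:.4f}, best shifted {best_shifted:.4f}")
```

For each of the three losses, the estimated cost at the conditional mean should come out smaller than at any of the shifted controls, which is the behavior that (8) predicts under these assumptions.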

EXAMPLE

As an application of (8), consider a continuous version of the problem treated by Joseph and Tou [2]. The plant is described by

$$\dot{x}(t) = A(t)x(t) + B(t)[u_0(t) + \xi(t)] \tag{13}$$

$$y(t) = C(t)x(t) \tag{14}$$

where $\xi(t)$ is a white Gaussian disturbance with zero mean. The performance functional is

$$P = E\Big\{ x^T(t_f) F x(t_f) + \int_{t_0}^{t_f} \big[ x^T(t) Q(t) x(t) + u^T(t) R(t) u(t) \big]\, dt \;\Big|\; y \Big\}. \tag{15}$$

The optimal control for the deterministic problem is given by [1]

$$u^*(t) = G(t)x(t) \tag{16}$$

where $G(t)$ is a gain matrix which can be determined. According to (8), the optimal stochastic control is given by

$$u_0(t) = E[G(t)x(t) \mid y] \tag{17}$$

or

$$u_0(t) = G(t)\,E[x(t) \mid y]. \tag{18}$$

Since the disturbance is Gaussian and the system is linear, $E[x(t) \mid y]$ can be obtained as the output of a Kalman-Bucy filter [4]. Then the result becomes

$$u_0(t) = G(t)\hat{x}(t) \tag{19}$$

which exhibits the familiar separation property.
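The structure of (16)-(19) can be sketched in code. The following is a scalar, discrete-time analog rather than the continuous-time problem treated above, and it rests on several assumptions that are not in the correspondence: an additive measurement noise of variance `v` is included so that the filter is well posed, the numerical values are arbitrary, and the function names `lqr_gains` and `simulate` are hypothetical. The deterministic gain comes from a backward Riccati recursion, the state estimate from a Kalman filter, and the ordered control is the gain applied to the estimate.

```python
import random


def lqr_gains(a, b, q, r, f, steps):
    """Backward Riccati recursion for x[k+1] = a*x[k] + b*u[k].

    Returns the gains G_k of the deterministic control u* = G_k * x_k and the
    initial cost-to-go coefficient S_0 (deterministic optimal cost = S_0 * x0**2).
    """
    s = f
    gains = [0.0] * steps
    for k in reversed(range(steps)):
        g = -(a * b * s) / (r + b * b * s)
        gains[k] = g
        s = q + a * a * s + a * b * s * g
    return gains, s


def simulate(a=1.1, b=1.0, c=1.0, q=1.0, r=1.0, f=1.0,
             w=0.1, v=0.1, m0=1.0, p0=0.5, steps=50, seed=0):
    """One run with u0 = G * x_hat; returns (stochastic cost, deterministic cost)."""
    rng = random.Random(seed)
    gains, s0 = lqr_gains(a, b, q, r, f, steps)
    x = rng.gauss(m0, p0 ** 0.5)      # true initial state, unknown to the controller
    deterministic_cost = s0 * x * x   # noise-free optimal cost from the same state
    x_hat, p = m0, p0                 # prior mean and variance of the state
    cost = 0.0
    for k in range(steps):
        # Measurement update (assumed noisy observation y = c*x + noise).
        y = c * x + rng.gauss(0.0, v ** 0.5)
        kalman_gain = p * c / (c * c * p + v)
        x_hat += kalman_gain * (y - c * x_hat)
        p *= 1.0 - kalman_gain * c
        # Ordered control: deterministic gain applied to the estimate, as in (19).
        u0 = gains[k] * x_hat
        u = u0 + rng.gauss(0.0, w ** 0.5)   # applied control u = u0 + disturbance
        cost += q * x * x + r * u * u
        # True state propagation and filter time update.
        x = a * x + b * u
        x_hat = a * x_hat + b * u0
        p = a * a * p + b * b * w
    cost += f * x * x
    return cost, deterministic_cost


if __name__ == "__main__":
    runs = [simulate(seed=s) for s in range(200)]
    excess = sum(cost - det for cost, det in runs) / len(runs)
    print(f"average excess cost over {len(runs)} runs: {excess:.3f}")
```

Averaged over many runs, the difference between the two returned costs gives an empirical counterpart of the excess cost (4) for this toy problem. The gain computation never uses the noise statistics and the filter never uses the cost weights, which is the separation the example points to.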

CONCLUSIONS

The theorem proved shows that, for an important class of stochastic systems, the best performance is obtained by staying as close as possible to the deterministic control.

G. PARKINS
University of Denver
Denver, Colo.

REFERENCES

[1] M. Athans and P. L. Falb, Optimal Control: An Introduction to the Theory and Its Applications. New York: McGraw-Hill, 1966.

[2] P. D. Joseph and J. T. Tou, “On linear control theory,” AIEE Trans. (Applications and Industry), vol. 80, pp. 193-196, September 1961.

[3] R. E. Kalman, “A new approach to linear filtering and prediction problems,” Trans. ASME, J. Basic Engrg., ser. D, vol. 82, pp. 35-45, March 1960.

[4] R. E. Kalman and R. S. Bucy, “New results in linear filtering and prediction theory,” Trans. ASME, J. Basic Engrg., ser. D, vol. 83, pp. 95-108, March 1961.

[5] S. Sherman, “Non-mean-square error criteria,” IRE Trans. Information Theory, vol. IT-4, pp. 125-126, September 1958.


Unbiased Estimator for Identification of Constant Linear Discrete Systems

Abstract-An unbiased estimator for real-time parameter identification and state estimation for constant linear discrete systems is presented. Bias correction is accomplished by taking account of the mean value of second-order terms which are generally neglected. The inclusion of the bias correction results in better performance for a quadratic cost criterion.

The problem of real-time parameter identification has been considered elsewhere [2]-[5]. The general procedure followed has been to linearize the system equations about the expected (mean) values of parameters and state variables, neglecting all second- and higher order terms. The unknown parameters are then appended to the state matrix, thereby forming a new state matrix which is recursively estimated by means of a

Manuscript received September 28, 1968.