
SIAM REVIEW Vol. 11, No. 4, October 1969

OPTIMAL CONTINUOUS-PARAMETER STOCHASTIC CONTROL*

WENDELL H. FLEMING†

* Received by the editors April 11, 1969. This invited paper was prepared and published in part under Contract DA-49-092-ARO-110 with the U.S. Army Research Office while the author was visiting at Stanford University, Stanford, California.

† Department of Mathematics, Brown University, Providence, Rhode Island 02912.

TABLE OF CONTENTS

1. Background
2. Stochastic problems of Pontryagin type
3. Stochastic differential equations
4. Forward and backward operators
5. Controlled continuous Markov processes
6. Dynamic programming (completely observable states)
7. Principle of optimal evolution
8. Control of autonomous processes
9. Open loop controls
10. Optimal stopping problems
11. Methods of approximate solution
12. Partially observable problems
13. Some open questions
References

1. Background. Stochastic optimization theories deal with mathematical models of random phenomena for which certain parameters are to be chosen in some "best" way. Frequently the random phenomena occur over time, in which case the optimization model deals with stochastic processes. In some problems the parameters to be optimized are constants; in others these parameters may be taken as unknown functions of time, or more generally functions of certain data available at each instant of time.

Under the broad heading "mathematical optimization", the term "optimal control" has generally been applied to problems suggested by physical models drawn from the field of automatic control of engineering systems. However, when optimal control problems are formulated in some generality the distinction between them and problems drawn from other fields (for example, mathematical economics) becomes fuzzy. Parts of optimal stochastic control are closely related to statistical decision theory. It therefore seems to be a matter of taste just how broadly one interprets the phrase optimal stochastic control.

In this survey we consider a class of problems which can be treated by methods of partial differential equations, of parabolic or elliptic type. In such problems the process to be optimally controlled is modelled by some continuous Markov process (generally vector-valued). When the states of the Markov process being controlled are completely observable by the controller, the relevant partial differential equation and boundary data can be formally deduced by dynamic programming. Using rather recent developments in the theories of continuous Markov processes and partial differential equations, the dynamic programming formalism has been put on a rigorous basis provided the partial differential operators appearing satisfy a uniform ellipticity condition (3.11). See §§ 6, 8, 10.

In many applications the states of the process being controlled are only partially observable. This leads to further difficulties both theoretically and for possible numerical calculations. In certain situations a separation principle holds which reduces the stochastic control problem to two separate problems, one of statistical estimation and the other a control problem with completely observable states. This principle applies, in particular, to a stochastic version of the linear regulator problem, for which a rather explicit solution is known (§§ 6, 12).

A dynamic programming formalism, when applied to the general partially observable problem, yields a functional-partial differential equation instead of a partial differential equation for a function of finitely many variables. No rigorous justification has as yet been given. There are several ways to approximate this general problem by ones susceptible to partial differential equation methods (§ 13).

We consider controlled processes with continuous time- and state-parameters. There is an extensive literature on optimal control of discrete parameter processes. See for example [1], [4], [36], [44], [48], [106]. In many problems either a discrete- or continuous-parameter model is reasonable. The dynamic programming equation for the continuous-parameter model takes the simpler and more elegant form, although more mathematical background is needed to treat the continuous-parameter model rigorously.

We shall also say little about continuous-parameter systems with data sampled at discrete instants of time, although they are of considerable practical interest in guidance problems. See for example [37], [78], [116].

References [1]-[14] are books on stochastic optimization and control, [15]-[20] on other control theory, [21]-[23] on parabolic and elliptic partial differential equations, [24]-[28] on Markov processes. References [29]-[131] are research articles with [46], [94], [97], [106], [124] and [125] being primarily survey articles.

1.1. Filtering and prediction. The subject of optimal stochastic control has roots in work on statistical methods for dealing with certain tracking and signal estimation problems. These estimation problems are of the following kind. Suppose that a desired signal ξ(t) is received at each instant of time t corrupted by an additive noise signal v(t). The problem is to best estimate the unobservable quantity ξ(t) from observations ξ(r) + v(r) for r ≤ t₁. "Best" is taken in the mean square sense. This problem is called one of prediction if t > t₁, filtering if t = t₁, and data-smoothing if t < t₁. The signal often is assumed to have known form, but to depend on a finite number of parameters which must be statistically estimated. A related class of problems of practical interest concerns the detection by statistical tests of the presence of noise-corrupted signals of known form. See for example [7], [12].

Early work on these problems was done in the 1940's by Wiener [13] and Kolmogorov [77]. For linear systems with stationary signal and noise processes, and with infinitely long memory, Wiener reduced the estimation problem to the solution of a Wiener-Hopf integral equation. He solved the latter by Laplace transforms and a method of spectral factorization. This pioneering work stimulated a great deal of interest. A large literature devoted to estimation problems of practical interest in communications and control engineering has evolved. During the 1950's methods were developed for treating problems which may be nonstationary or have finite memory. These involve the solution, in some way or other, of the nonstationary Wiener-Hopf integral equation for the weight function of the optimal linear estimator. Some nonlinear problems are also amenable to these treatments. See for example [2], [7], [8].

Gradually, the importance was recognized of methods which treat the estimation problem in the time domain and thereby do not lose its dynamical features. An important step in this direction was the filtering and prediction theory (1960-61) of Kalman and Bucy [74], [41]. In their model the signal ξ(t) is a Gaussian random vector which evolves according to linear stochastic differential equations driven by a white noise. (Definitions are given in § 3.) At each time t a white noise-corrupted linear observation of ξ(t) is made. It turns out that not only ξ(t) but also its conditional mean ξ̂(t) is Gaussian and evolves according to linear stochastic differential equations (ξ̂(t) equals the mean square optimal estimate for ξ(t) given the observations up to time t). The conditional covariance matrices evolve according to Riccati type ordinary differential equations (§ 12).
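For orientation, the scalar special case of these equations can be written down explicitly; the symbols below are chosen for illustration and are not the notation of § 12. With signal and observation models

$$d\xi=\alpha\,\xi\,dt+\sigma\,dw,\qquad dz=\xi\,dt+\rho\,d\tilde w$$

(w, w̃ independent Brownian motions), the conditional mean ξ̂(t) and conditional variance P(t) evolve according to

$$d\hat\xi=\alpha\,\hat\xi\,dt+\frac{P(t)}{\rho^{2}}\bigl(dz-\hat\xi\,dt\bigr),\qquad
\frac{dP}{dt}=2\alpha P+\sigma^{2}-\frac{P^{2}}{\rho^{2}},$$

the second being the Riccati type equation referred to above.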

The success of the Kalman-Bucy linear model was followed by work of several authors on mean square optimal filtering for systems which evolve according to nonlinear stochastic differential equations. In the nonlinear case one still wishes to calculate the conditional mean ξ̂(t) given the observed data, but in order to do so the conditional distribution of ξ(t) is needed. This evolves according to a stochastic differential equation in a function space, first derived formally by Stratonovich and Kushner, and later treated rigorously. See [10], [47], [64], [84], [100]. This equation is generally intractable for actual computation; however methods have been proposed for finding the first few moments of the conditional distribution [50], [85].

When the signal process ξ is a finite state Markov chain with N + 1 states, the space of probability distributions is N-dimensional. The stochastic differential equations for the conditional distributions then operate in a space of finite dimension N. See [105], [106], [120], also § 12.

As is often done in statistical problems where mean square estimation is difficult to apply, maximum likelihood procedures have also been considered. In some situations the maximum likelihood estimate is much easier to calculate. See, for example, an application to system identification problems by Balakrishnan [30], [32].

1.2. Deterministic control theory. Since the 1950's a large literature has appeared on optimal deterministic control problems treated in the time domain. Among the early work in this period we mention Bellman's dynamic programming approach [16] and LaSalle's treatment of the linear time-optimal control problem [89]. Pontryagin and colleagues (1956) formulated a rather general control problem which includes deterministic models of phenomena which occur in a variety of applications. He obtained a necessary condition for a minimum now known as Pontryagin's principle [20].


In the Pontryagin problem the state vector ξ(t) evolves according to a system of ordinary differential equations, which we write using vector notation as

(1.1)    dξ/dt = f(t, ξ(t), u(t)),

where u(t) is a control vector applied to the system at time t. As criterion of performance of the system, an expression is taken of the form

(1.2)    J(u) = ∫_s^τ L(t, ξ(t), u(t)) dt + Φ(τ, ξ(τ)),

where s is an initial time and τ a stopping time for the system. In Pontryagin's problem J(u) is to be minimized subject to: fixed initial data (s, ξ(s)), variable terminal data (τ, ξ(τ)) constrained to lie on a given set Σ, and constraints on the control vector of the form u(t) ∈ K for s ≤ t ≤ τ, where K is also given.

Without such constraints on u(t) the Pontryagin problem would be one of Bolza type in calculus of variations. Recently much more general deterministic control problems than Pontryagin's have been treated as mathematical programming problems in infinite-dimensional spaces. See [69], [102].

The dynamic programming approach to the Pontryagin problem is the one closest in spirit to the methods we shall describe for the stochastic case. In dynamic programming, the minimum value of (1.2) is regarded as a function of the initial data. Suppose that ξ(s) = x; let

(1.3)    φ(s, x) = min J(u),

the minimum being taken among all admissible control functions u. Then φ satisfies (at least formally) the Hamilton-Jacobi equation

(1.4)    φ_s + min_{v∈K} [L + φ_x·f] = 0,

where the expression in brackets is evaluated at (s, x, v), φ_s = ∂φ/∂s and φ_x is the gradient vector in the space-like variables x. On the terminal set Σ we have

(1.5)    φ(s, x) = Φ(s, x) for (s, x) ∈ Σ.
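Equation (1.4) can be obtained formally from Bellman's principle of optimality; the following one-line derivation is a sketch added here for the reader and is not part of the original text. For small h > 0,

$$\phi(s,x)=\min_{v\in K}\Bigl\{h\,L(s,x,v)+\phi\bigl(s+h,\,x+hf(s,x,v)\bigr)\Bigr\}+o(h)
=\phi(s,x)+h\Bigl(\phi_{s}+\min_{v\in K}\bigl[L+\phi_{x}\!\cdot\!f\bigr]\Bigr)+o(h),$$

and subtracting φ(s, x), dividing by h and letting h → 0+ gives (1.4).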

The characteristic curves of the first order partial differential equation (1.4) satisfy, under suitable assumptions, the necessary condition for a minimum in Pontryagin's principle. See for example [75].

As is well known, the method of characteristics guarantees only locally a smooth solution to (1.4)-(1.5), near points of Σ not tangent to the characteristic ground curves. Sometimes the method can be applied globally, using only those portions of characteristics which actually minimize. The solution φ so constructed is only piecewise smooth. Generally speaking, the difficulties with (1.4) occur on some lower dimensional "singular set", for example a hypersurface across which the partial derivatives of φ are discontinuous. This singular set can often be described in principle. For example, in the simplest problem of calculus of variations with τ fixed and ξ(τ) free, φ is (totally) differentiable at (s, x) if and only if there is a unique minimizing curve for initial data (s, x) (see [57]). Equation (1.4) holds wherever φ is differentiable [53, Section 5]. Unfortunately, the location of those points where φ fails to be differentiable is not known in advance. This is an obstacle to globally applying the method of characteristics to solve (1.4).

2. Stochastic problems of Pontryagin type. Many systems occurring in practice can be better modelled stochastically than by the deterministic equations (1.1). This is so when the system is subject to imperfectly known disturbances which may be taken as random, or when the states of the system are measured with certain random errors. For a stochastic optimal control model let us take a system of stochastic differential equations

(1.1′)    dξ/dt = f(t, ξ(t), u(t)) + v(t),

where the term v(t) is a white noise, perhaps multiplied by a coefficient matrix σ depending on the states of the system. See §§ 3, 5. As performance criterion we take the average (i.e., expected value) of (1.2), which is to be minimized in the class of controls admitted. An important feature of the problem is that data obtained during the operation of the system may be used to continually control the system equations (1.1′) through choice of u(t). This is in contrast, for example, with filtering and optimal stopping problems in which the system equations do not contain the control parameter u. (The terms active vs. passive information storage are used in this regard by Fel'dbaum [3].)

In § 6 we consider problems for which the states ξ(t) of the system (1.1′) are completely observable. The controller may then as well use controls based on the current states. We call these Markov control policies. Given a Markov control policy U, (1.1′) determines a continuous Markov process. The differential generators A^U of these processes are second order partial differential operators, which we assume to be uniformly elliptic. Dynamic programming now yields a second order nonlinear parabolic partial differential equation (6.2). Unlike equation (1.4), uniform ellipticity implies that all partial derivatives appearing in (6.2) are continuous functions. Using that fact, the dynamic programming formalism is made rigorous.

An alternative approach (§ 7) is to regard the problem as one of optimally controlling the coefficients of the operators A^U. There results a necessary condition, resembling formally Pontryagin's principle. We call it the principle of optimal evolution. This approach is also of interest in problems with partial information.

In § 8 we mention several autonomous problems, treated by dynamic programming. The partial differential equation corresponding to (6.2) is then elliptic. The open loop problem (no data) is mentioned in § 9. In § 10 we review some results on the problem of optimally stopping a Markov process. It corresponds to a free boundary problem for a linear elliptic (or parabolic) equation. A number of procedures have been proposed for solving numerically the nonlinear partial differential equations obtained by dynamic programming (§§ 6, 8). We list several in § 11. A general partially observable problem is formulated in § 12, and results about filtering and the separation principle are summarized. In § 13 we list some unsolved problems.

2.1. Notation. Rⁿ denotes the space of n-tuples of real numbers. Its elements are typically denoted by x, y, ···. The letters r, s, t, ··· denote time-like variables. We shall usually have T₀ ≤ s ≤ r ≤ t ≤ T, where T₀, T are fixed throughout the paper. We use xy = Σᵢ xᵢyᵢ for the scalar product of x, y ∈ Rⁿ. If a = (aᵢⱼ) is an n × m matrix, then for x ∈ Rⁿ, xa is in Rᵐ with entries Σᵢ xᵢaᵢⱼ; while for z ∈ Rᵐ, az is in Rⁿ with entries Σⱼ aᵢⱼzⱼ. We write |x| = (xx)^{1/2}, |a| = (Σᵢⱼ |aᵢⱼ|²)^{1/2}.

A function of finitely many variables is of class C^(l) if it is continuous together with its partial derivatives of orders ≤ l. We write Q̄, ∂Q for the closure, boundary of a set Q.

The following symbols which occur throughout the paper are introduced in the following sections:

§ 3: Ω, ξ, b, w, σ, a, 𝓔(Q);  § 4: B, Q (a cylinder), ∂′Q, 𝓔₁(Q);  § 5: f, L, K, π_s, J(U), Φ, 𝒰₁, ψ^U, A^U;  § 6: 𝒰₀, A(s, x, v), φ.

3. Stochastic differential equations. In this section we review some facts about continuous Markov vector processes which are solutions of stochastic differential equations. A concise reference to this subject is [28]. Other standard references on Markov processes are [24]-[27].

Let [s, T] be a time interval, and Ω = (Ω, ℱ, P) a probability space. Here ℱ is a σ-algebra (sometimes called a Borel field) of subsets of Ω and P a probability measure on ℱ. An Rⁿ-valued stochastic process ξ on the time interval [s, T] is a function from [s, T] × Ω into Rⁿ such that ξ(t, ·) is an ℱ-measurable function for s ≤ t ≤ T. The process ξ is continuous if ξ(·, ω) is continuous on [s, T] with probability 1 (i.e., for P-almost all ω ∈ Ω). From now on we usually do not show the dependence of processes on ω, writing ξ(t) for ξ(t, ω).

The Markov property states that past and future are independent, conditioned on the present. Thus we say that ξ is a Markov (vector) process if for s ≤ t₁ ≤ t ≤ T and any Borel set Γ ⊂ Rⁿ

P{ξ(t) ∈ Γ | ξ(r) on [s, t₁]} = P{ξ(t) ∈ Γ | ξ(t₁)}

with probability 1 [28, p. 7]. The left side denotes conditional probability with respect to the σ-algebra ℱ_{t₁} generated by ξ(r) for s ≤ r ≤ t₁ (the least σ-algebra with respect to which these ξ(r) are measurable functions), and the right side conditional probability with respect to the σ-algebra generated by ξ(t₁).

A real-valued process w on [s, T] is a one-dimensional Brownian motion if (i) w is a continuous process; (ii) w(t) - w(s) has Gaussian distribution with mean 0 and variance t -s;

and (iii) w has independent increments. Condition (iii) means that the random variables w(s), w(tⱼ₊₁) − w(tⱼ) are independent for j = 1, ···, N − 1 if s < t₁ < t₂ < ··· < t_N ≤ T. If w is known to be a separable process, then (ii) and (iii) imply (i). There are various other characterizations of Brownian motion. See for example [25, p. 289], [28, p. 9], [111, Section 3]. In the usual constructions of Brownian motion Ω is the space of real-valued continuous functions on an interval containing [s, T] and P is Wiener measure.


A process w with values in Rᵐ is an m-dimensional Brownian motion if w = (w₁, ···, w_m), where w₁, ···, w_m are independent one-dimensional Brownian motions. Brownian motions have rather simple probabilistic structure. Stochastic differential equations provide a way of representing a large class of continuous Markov processes in terms of Brownian motions. This is done as follows. Suppose that there exist functions bᵢ, aᵢⱼ on R^{n+1} for i, j = 1, ···, n such that

(3.1)    lim_{h→0+} h⁻¹ E{δξᵢ | ξ(t)} = bᵢ(t, ξ(t)),

         lim_{h→0+} h⁻¹ E{δξᵢ δξⱼ | ξ(t)} = 2aᵢⱼ(t, ξ(t)),

with probability 1 for each t. Here E{· | ·} denotes conditional expectation and δξᵢ = ξᵢ(t + h) − ξᵢ(t).

The vector function b = (b₁, ···, bₙ) is called the local drift and the matrix 2a = 2(aᵢⱼ) the local covariance of the vector Markov process. Suppose that

(3.2)    2a = σσ*,

where * denotes matrix transpose and σ is an n × m matrix-valued function, m ≤ n. Then under suitable conditions described below, a process with given local drift and local covariance can be constructed as the solution of a system of stochastic differential equations together with initial data for ξ(s). When written in vector-matrix notation this system is

(3.3)    dξ = b(t, ξ(t)) dt + σ(t, ξ(t)) dw,

where w is an m-dimensional Brownian motion independent of ξ(s). The stochastic differentials dξ, dw are to be interpreted according to the calculus of K. Ito, not as in ordinary calculus. In particular see the important formula (3.7) for differentials of composites.

In order to explain the meaning of (3.3) and to construct a process with the required local drift and covariance, these equations are rewritten with the initial data as stochastic integral equations:

(3.3′)    ξ(t) = ξ(s) + ∫_s^t b(r, ξ(r)) dr + ∫_s^t σ(r, ξ(r)) dw(r),    s ≤ t ≤ T.

In the special case when σ does not involve the state of the ξ process (i.e., σ = σ(r)) and σ is of bounded variation on [s, T], the integral ∫ σ dw can be interpreted as an ordinary Riemann-Stieltjes integral. However, this cannot be done if the states ξ(r) appear in σ, since with probability 1 both ξ and w are continuous but nowhere differentiable functions of time. Instead, ∫ σ dw is to be interpreted as a stochastic integral in Ito's sense:

(3.4)    ∫_s^t σ(r, ξ(r)) dw(r) = l.i.m._{h→0} Σ_{j=1}^{N−1} σ(rⱼ, ξ(rⱼ))(w(rⱼ₊₁) − w(rⱼ)),

where

s = r₁ < r₂ < ··· < r_N = t,    h = max (rⱼ₊₁ − rⱼ),

and l.i.m. is in the mean square sense on Ω. Unlike the Riemann-Stieltjes integral, the stochastic integral (3.4) depends essentially on the fact that the coefficient σ is evaluated at the left endpoint of the interval [rⱼ, rⱼ₊₁]. If it is evaluated at the midpoint one gets instead the Stratonovich integral. Formulas relating the Ito and Stratonovich calculi are given in [109]; see also (3.3″) below. Other treatments of stochastic integration have recently been given by McShane [95], who allows σ to be evaluated at a point slightly to the left of rⱼ, and by Young [126].

A Picard iteration technique proves the existence of a solution to (3.3') under the following Ito conditions.

(3.5)
(a) ξ(s) is independent of the Brownian motion w and E|ξ(s)|² < ∞. The functions b, σ are Borel measurable on [T₀, T] × Rⁿ. Moreover, there exist constants C₁, C₂ such that
(b) |b(t, y)| ≤ C₁(1 + |y|),    |σ(t, y)| ≤ C₁(1 + |y|),
(c) |b(t, x) − b(t, y)| ≤ C₂|x − y|,
(d) |σ(t, x) − σ(t, y)| ≤ C₂|x − y|

for all t ∈ [T₀, T] and x, y ∈ Rⁿ. Inequalities (c), (d) are a uniform Lipschitz condition on b(t, ·), σ(t, ·). The solution ξ to (3.3′) is a continuous Markov process with E|ξ(t)|² < ∞ for s ≤ t ≤ T. Given the Brownian motion w, any two solutions of (3.3′) in this class agree on [s, T] with probability 1.

We shall later assume that σ is bounded (see (4.6b)) in order to quote certain theorems about parabolic partial differential equations. However, for optimal control problems we would like to know that the process ξ is well-defined under weaker assumptions on the local drift b. Let |ξ(s)| ≤ ρ₀ with probability 1. If instead of the global Lipschitz condition (c), b satisfies a Lipschitz condition on any compact set, then ξ is determined up to the first time t when |ξ(t)| = ρ by the values b(t, y), σ(t, y) for |y| ≤ ρ, ρ > ρ₀ (see [26, Chap. 11, Section 3]). By letting ρ → ∞ and using the growth estimate (b) it follows that there is a version of the process ξ satisfying (3.3′) on [s, T], again unique in the sense described above. Of perhaps more interest in control problems is the fact that ξ can often be defined even for discontinuous b. See the end of this section.
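As a concrete illustration of (3.3′), the following minimal sketch simulates one sample path by an Euler (Euler-Maruyama) discretization; the particular drift and noise coefficients are assumptions chosen only for the example and are not from the paper.

```python
import numpy as np

# Minimal Euler-Maruyama sketch for the Ito equation (3.3'):
#   xi(t) = xi(s) + int_s^t b(r, xi(r)) dr + int_s^t sigma(r, xi(r)) dw(r).
# The coefficients below are illustrative and satisfy the growth and Lipschitz
# conditions (3.5).
def b(t, x):
    return -x                      # linear drift toward the origin

def sigma(t, x):
    return 0.5 * np.cos(x)         # state-dependent, bounded, Lipschitz

rng = np.random.default_rng(0)
s, T, n = 0.0, 1.0, 1000
dt = (T - s) / n
xi = 2.0                           # initial datum xi(s)
for k in range(n):
    t = s + k * dt
    dw = np.sqrt(dt) * rng.normal()        # Brownian increment over [t, t + dt]
    xi += b(t, xi) * dt + sigma(t, xi) * dw
print(xi)                          # one sample of xi(T)
```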

Let us consider the second order partial differential operator

n 0~~2 na (3.6) A = E a1j.s, x) a + bi(s, x)

A is called the differential generator of the process. An important formula of Ito is the following [28, p. 24]. Let ψ(s, x) have continuous partial derivatives ψ_s, ψ_{x_i}, ψ_{x_ix_j}, i, j = 1, ···, n. If ξ satisfies (3.3′) as above, then

(3.7)    dψ(t, ξ(t)) = (ψ_s + Aψ) dt + ψ_x σ dw,

where ψ_s, Aψ, etc. are evaluated at (t, ξ(t)). Note that (3.7) contains a term Σ aᵢⱼψ_{x_ix_j} which would be missing if stochastic differentials obeyed the rules of ordinary calculus. For example,

d(ξᵢξⱼ) = ξᵢ dξⱼ + ξⱼ dξᵢ + 2aᵢⱼ dt,

unlike the ordinary product rule.
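To see where the extra term comes from, one can apply (3.7) to the particular function ψ(x) = xᵢxⱼ; this short check is added here and is not part of the original text. Then

$$\psi_{x_k}=\delta_{ki}x_j+\delta_{kj}x_i,\qquad \psi_{x_kx_l}=\delta_{ki}\delta_{lj}+\delta_{kj}\delta_{li},\qquad A\psi=2a_{ij}+b_ix_j+b_jx_i,$$

so (3.7) gives

$$d(\xi_i\xi_j)=\bigl(2a_{ij}+b_i\xi_j+b_j\xi_i\bigr)\,dt+\bigl(\xi_j\,\sigma_{i\cdot}+\xi_i\,\sigma_{j\cdot}\bigr)\,dw
=\xi_i\,d\xi_j+\xi_j\,d\xi_i+2a_{ij}\,dt,$$

where σ_{i·} denotes the ith row of σ; the term 2aᵢⱼ dt is exactly the departure from the ordinary product rule.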

3.1. Stopping times. Frequently we are interested in ξ(t) only for s ≤ t ≤ τ, where τ is some random time. It is required that the random time not anticipate the future evolution of the process ξ. This idea is expressed in the following.

DEFINITION. A function τ on Ω is a stopping time for the Markov process ξ if s ≤ τ ≤ T and the event {τ ≤ t} is ℱ_t measurable for each t ∈ [s, T].

As above, ℱ_t is the σ-algebra generated by ξ(r) for s ≤ r ≤ t. An example of a stopping time for a continuous Markov process is: let τ be the first time t such that (t, ξ(t)) ∉ Q, where Q is an open subset of R^{n+1}, or τ = T if (t, ξ(t)) ∈ Q for s ≤ t ≤ T. It is called the exit time from Q.

By integrating (3.7) from s to τ and taking expected values one arrives at the formula

(3.8)    Eψ(τ, ξ(τ)) − Eψ(s, ξ(s)) = E ∫_s^τ [ψ_s + Aψ] dt.

If we let τ ∧ t = min (τ, t), then ∫_s^{τ∧t} ψ_x σ dw is a martingale and its expected value is 0. Formula (3.8) will furnish an important connection between the Markov process and a boundary problem for its backward operator ((4.3)-(4.4) below). It is valid under the following conditions. Let Q be an open subset of the strip [T₀, T] × Rⁿ, and let 𝓔(Q) denote the set of all real-valued functions ψ such that:

(3.9)
(a) ψ is continuous and bounded on the closure Q̄.
(b) ψ_s, ψ_{x_i}, ψ_{x_ix_j} are continuous on Q, i, j = 1, ···, n.
(c) ψ_s + Aψ is bounded on Q.

Moreover, suppose that (t, ξ(t)) ∈ Q for s ≤ t < τ. Then (3.8) holds. In particular, we shall take for Q in §§ 4, 6, 7 a cylinder and τ the exit time from Q. If Q = (T₀, T) × Rⁿ, we have τ = T; in that case we write 𝓔 for 𝓔(Q).

Different authors define Markov processes in slightly different ways, which are for our purposes equivalent. The Markov property, stopping times, etc. are often defined using some increasing family {𝒢_t} of σ-algebras such that ℱ_t ⊂ 𝒢_t and 𝒢_t is independent of Brownian increments for times ≥ t (see [24], [26]). For example, 𝒢_t might be the σ-algebra generated by w(r) − w(s) for s ≤ r ≤ t, or might be obtained by completing the probability measures involved. In many instances the initial state ξ(s) is a known vector, x. The process ξ is then said to start at (s, x). This fact is indicated by denoting expectations and probabilities with E_sx{ · } and P_sx{ · }. We shall be interested in the process starting at any x in some open set B and s in (T₀, T). The idea of considering not a single process but rather a family of processes obtained by varying the initial point figures in a widely used definition of Markov process [26, Chap. 3].

3.2. Equations (3.3) as a model for physical systems. Suppose that some physical process, if unaffected by random disturbances, can be described by a system of ordinary differential equations dξ = b dt. If, however, such disturbances enter the system in an additive way, then we may take as a model

(3.10)    dξ = b(t, ξ(t)) dt + v(t) dt,

where v(t) is a vector representing the disturbance (which we call noise) entering at time t. The exact nature of the actual noise process is usually not known, but the model should reflect fairly well its qualitative features.

Example 3.1. Stationary, wide band noise. Assume that the noise process is stationary with mean 0 and known covariance matrix V = (Vᵢⱼ):

0 = E{vᵢ(t)},    Vᵢⱼ(r) = E{vᵢ(t)vⱼ(t + r)},    i, j = 1, ···, n.

If V(r) is nearly 0 except in a small time interval 0 ≤ r ≤ h, then the spectral density V̂(λ) (the Fourier transform of V) is nearly constant over a wide λ-interval. In that case the noise is termed wide band. Stationary white noise corresponds to the ideal case when Vᵢⱼ is a constant 2aᵢⱼ times the Dirac delta function; V(r) = 0 for r ≠ 0. For stationary white noise the matrix σ in (3.3) is constant, and (3.2) holds. Formally, stationary white noise is often written σ dw/dt, where w is an m-dimensional Brownian motion. By replacing wide band noise v by white noise one gets a Markov model for a possibly non-Markovian physical process.

If v(t) is the product of a stationary wide band noise with a coefficient matrix depending on the states of the process ξ, then (3.3) is no longer a correct approximation to (3.10). One should instead have (3.3″) below.

Example 3.2. Rational spectral density. Suppose that v is a stationary process, which for simplicity we take scalar valued (n = 1). Suppose also that the spectral density has the form

V̂(λ) = 1/|P(iλ)|²,

where P is a real polynomial of degree l ≥ 1 with no real zeros. Then v can be described as the output of a linear constant coefficient stochastic differential equation of order l:

c_l v^(l) + ··· + c₁ v̇ + c₀ v = ẇ,

where ˙ = d/dt. See [25, p. 546]. This can be rewritten as a system of first order equations for a vector process ζ = (ζ₁, ···, ζ_l) by setting

ζⱼ = v^(j−1),    j = 1, ···, l.

Then (3.10) becomes the (l + 1)-dimensional system

dξ = b dt + ζ₁ dt,

dζⱼ = ζⱼ₊₁ dt,    j = 1, ···, l − 1,

c_l dζ_l = −(c_{l−1}ζ_l + ··· + c₀ζ₁) dt + dw,

which has the form (3.3). With probability 1, ζ₁ is a C^(l−1) function of time. This fact can be used to construct smoother approximations to Brownian motion. For instance, let l = 1 and v^c obey

c dv^c + v^c dt = dw,    c > 0.

For small c, v^c has large bandwidth as in Example 3.1. With probability 1, v^c is continuous and

w^c(t) = w(s) + ∫_s^t v^c(r) dr

is of class C^(1). As c → 0+, w^c tends uniformly on [s, T] to w with probability 1, no matter what (fixed) initial data v^c(s) are given.

The process w was first constructed by Wiener and provided a model for the Einstein-Smoluchowski theory of Brownian motion. Objections were raised to this model for the Brownian movement of actual particles, in which the velocity ẇ does not exist. The process w^c furnishes another model, due to Ornstein and Uhlenbeck [103], for which the velocity ẇ^c = v^c is continuous. The process w^c is not Markov, but the pair (w^c, v^c) is a two-dimensional Markov process.
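A small numerical sketch (values assumed for illustration, not from the paper) shows the uniform convergence of w^c to w described above:

```python
import numpy as np

# Simulate a Brownian path w, then for several c solve  c dv_c + v_c dt = dw
# by an Euler step and integrate w_c(t) = w(s) + int_s^t v_c(r) dr.
# As c -> 0+ the smooth path w_c should track w uniformly on [s, T].
rng = np.random.default_rng(0)
T, n = 1.0, 200_000
dt = T / n
dw = rng.normal(0.0, np.sqrt(dt), n)            # Brownian increments
w = np.concatenate(([0.0], np.cumsum(dw)))      # Brownian path, w(0) = 0

for c in (0.1, 0.01, 0.001):
    v, wc = 0.0, np.zeros(n + 1)                # fixed initial datum v_c(0) = 0
    for k in range(n):
        wc[k + 1] = wc[k] + v * dt              # w_c has continuous derivative v_c
        v += (-v * dt + dw[k]) / c              # Euler step for the OU velocity
    print(f"c = {c:5.3f}   sup |w_c - w| = {np.max(np.abs(wc - w)):.4f}")
```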

One can generalize Example 3.2 to include any system dξ = b dt + dζ, where

dζ = b̃(t, ζ(t)) dt + σ̃(t, ζ(t)) dw.

These equations together give a system of stochastic differential equations for the vector process (ξ, ζ). This procedure has the drawback that dimensionality is increased to n + m if ζ is m-dimensional. Also, the uniform ellipticity condition (3.11) cannot hold for the process (ξ, ζ), since the local covariance matrices for this process can have rank at most m.

Example 3.3. Narrow band noise. If the spectral density V̂(λ) is nearly 0 except in a small neighborhood of a fixed frequency λ₀, then v is called narrow band. An example is provided by taking v a solution of

v̈ + εv̇ + v = ẇ

for small positive ε. This noise process is discussed in detail in [11, Volume I, Chap. 7]. The noise amplitude is approximately Rayleigh distributed.

Example 3.4. Shot noise. See for example [11, Chap. 6], [14]. In this case v(t) is a sum of randomly placed Dirac delta functions. The solution ξ of (3.10) exhibits jump discontinuities. These are outside the scope of the present article, which deals with continuous processes.

In analogue computations the computer simulates not the Brownian motion w but instead w^c, where dw^c = v^c dt and v^c is some wide band process (for instance, the one in Example 3.2). Consider the equations

(3.3c)    dξ^c = b(t, ξ^c(t)) dt + σ(t, ξ^c(t)) dw^c.

As c → 0+ we may not expect that ξ^c tends to a solution of (3.3), but rather to a solution of

(3.3″)    dξ = (b + ½σ_xσ) dt + σ dw,

where the ith component of the vector σ_xσ is Σ_{j,k} (∂σᵢⱼ/∂x_k)σ_{kj}, and as usual b, σ, σ_x are evaluated in (3.3″) at (t, ξ(t)). This has been proved by Wong and Zakai [117] under certain assumptions.
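A scalar example, added here for illustration, makes the correction term concrete. Take b = 0 and σ(x) = x, so that dξ^c = ξ^c dw^c. Since w^c is smooth, ordinary calculus gives

$$\xi^{c}(t)=\xi(s)\,\exp\bigl(w^{c}(t)-w^{c}(s)\bigr)\;\longrightarrow\;\xi(s)\,\exp\bigl(w(t)-w(s)\bigr)\qquad(c\to0+),$$

and by Ito's formula (3.7) this limit satisfies dξ = ½ξ dt + ξ dw, i.e. (3.3″) with the correction ½σ_xσ = ½ξ, and not the Ito equation dξ = ξ dw.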


If stochastic differentials and integrals are interpreted in the Stratonovich (rather than Ito) calculus, then ξ is a solution of (3.3) in Stratonovich's sense. Another feature of the Stratonovich calculus is that the usual rules for differentials of composites and their behavior under coordinate changes are obeyed [109].

3.3. Discontinuous local drift coefficients. The control problems we shall consider can be regarded as problems of controlling the local drift of a Markov process in some optimal way. In order to achieve the optimum it is often necessary to admit discontinuous drift coefficients bᵢ in (3.3). The Lipschitz condition (3.5c) is then violated and the process ξ need not be well-defined. Using recent work of Stroock and Varadhan [111] this difficulty can be avoided if A is uniformly elliptic. Previous work in the same direction is due to Krylov [79].

In [111] it is assumed that:

(3.5c′)    b is bounded, Borel measurable,

and that aᵢⱼ is continuous, bounded, and satisfies

(3.11)    c|λ|² ≤ Σ_{i,j=1}^{n} aᵢⱼ(s, x)λᵢλⱼ,    c > 0,

for all s ∈ [T₀, T], x ∈ Rⁿ, λ ∈ Rⁿ. This says that A is a uniformly elliptic operator. In [111] Ω is the space of continuous Rⁿ-valued functions on [s, T] and ℱ the σ-algebra generated by ξ(r) for s ≤ r ≤ T. Here ξ(r) is short for ξ(r, ω) = ω(r), where ω ∈ Ω. Given the coefficients b = (b₁, ···, bₙ), a = (aᵢⱼ) and (s, x), there is shown to exist a unique probability measure P_sx on ℱ such that

exp{ θ·[ξ(t) − ξ(s) − ∫_s^t b(r, ξ(r)) dr] − ∫_s^t θ·a(r, ξ(r))θ dr }

is a P_sx martingale for each θ ∈ Rⁿ. Then ξ is a vector Markov process on (Ω, ℱ, P_sx) with P_sx{ξ(s) = x} = 1. Moreover (3.8) holds for any ψ ∈ 𝓔. If the Ito conditions (3.5) hold, then P_sx is also the probability law of the solution of (3.3′) constructed by the Picard method. Thus the two approaches lead to processes identical in law under the Ito conditions.

In [111] the case b ≡ 0 is treated first. Let P⁰_sx denote the corresponding measure on ℱ. It turns out that P_sx is absolutely continuous with respect to P⁰_sx and the Radon-Nikodym derivative is

(3.12)    dP_sx/dP⁰_sx = exp{ ½ ∫_s^T [ b a⁻¹ dξ − ½ b a⁻¹ b dr ] },

where b, a⁻¹ are evaluated at (r, ξ(r)). Formula (3.12) is due, under somewhat different assumptions, to Girsanov [65]. It is of interest in likelihood ratio methods for signal detection [73].

4. Forward and backward operators. With a continuous Markov process given by the stochastic integral equations (3.3′) is associated its backward operator ∂/∂s + A, which appeared in (3.8), and also its forward operator −∂/∂t + A*. These operators are formally adjoint. They provide a means of turning various questions about the process into problems in partial differential equations, which can be treated by analytical and numerical methods.


4.1. Transition densities. Under suitable assumptions on the coefficients of A (for example, those in (4.6) below) the process has a transition density p(s, x; t, y), characterized by

P_sx{ξ(t) ∈ Γ} = ∫_Γ p(s, x; t, y) dy

for any Borel set Γ ⊂ Rⁿ. See [26, Appendix], [21, Chap. 1], [23, Chap. IV, Section 11]. As a function of (s, x), p satisfies the backward equation

∂p/∂s + Ap = 0.

If the coefficients are smooth enough, then p satisfies in (t, y) the forward equation

−∂p/∂t + A*p = 0,    A*p = Σ_{i,j} ∂²(aᵢⱼp)/∂yᵢ∂yⱼ − Σᵢ ∂(bᵢp)/∂yᵢ.

Without such extra smoothness conditions, the forward equation can be interpreted in some weak sense; see Remarks in § 7. If equations (3.3) take the special linear form

(4.1)    dξ = α(t)ξ(t) dt + σ(t) dw,

and the initial distribution of ξ(s) is Gaussian, then ξ(t) is also Gaussian for t > s. Its distribution is determined by the mean μ(t) and covariance matrix Q(t) = (qᵢⱼ(t)):

μ(t) = E{ξ(t)},    qᵢⱼ(t) = E{(ξᵢ(t) − μᵢ(t))(ξⱼ(t) − μⱼ(t))}.

These evolve according to the ordinary differential equations

(4.2)    dμ = αμ dt,

         dQ = [αQ + Qα* + 2a] dt.

The initial data μ(s), Q(s) are obtained from the initial distribution. In particular, if ξ(s) = x is given, then μ(s) = x, Q(s) = 0 and we get the transition density p. If the coefficients α, σ are constants, then p can also be found by Fourier transforms. There are formulas, due to Kalman and Bucy, for the conditional means and covariances of ξ(t) given linear noisy observations of the solution of (4.1). See § 12.
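As a check on (4.2), the following sketch (coefficients and values assumed only for illustration) integrates the moment equations for a scalar linear equation and compares them with the closed-form mean and variance of the transition density:

```python
import numpy as np

# For the scalar linear equation d(xi) = alpha*xi dt + sigma dw with xi(s) = x,
# integrate (4.2): d(mu)/dt = alpha*mu, dq/dt = 2*alpha*q + sigma**2 (since 2a = sigma**2),
# and compare with the exact mean and variance of p(s, x; t, .).
alpha, sigma, x, s, t = -1.5, 0.8, 2.0, 0.0, 1.0
n = 10_000
dt = (t - s) / n
mu, q = x, 0.0                        # initial data mu(s) = x, Q(s) = 0
for _ in range(n):
    mu += alpha * mu * dt
    q  += (2 * alpha * q + sigma**2) * dt
mu_exact = x * np.exp(alpha * (t - s))
q_exact = sigma**2 / (2 * alpha) * (np.exp(2 * alpha * (t - s)) - 1.0)
print(mu, mu_exact)                   # mean of the transition density
print(q, q_exact)                     # variance of the transition density
```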

If the process is not Gaussian, then calculation of the transition density p is more difficult and has generally been done under such additional assumptions as quite low dimension n, near linearity of the system, or smallness of the coefficient σ in (3.3). For related discussions in a control theory setting see [11], [18], [33], [35], [91].

4.2. Boundary problems for the backward operator. Of more direct interest for the kind of stochastic control problems we shall formulate in § 5 is the connection between the Markov process and certain boundary value problems for the backward operator. Formula (3.8) is the link needed to make this connection. Consider a parabolic equation of the form

(4.3)    ψ_s + Aψ + l(s, x) = 0

in a cylindrical region

Q = (T₀, T) × B,

where B ⊂ Rⁿ is open. Let

∂′Q = ([T₀, T] × ∂B) ∪ ({T} × B).

∂′Q is the "essential" part of the boundary ∂Q, in the sense that if (t, ξ(t)) starts in Q then it must exit from Q through ∂′Q. Equation (4.3) is to be considered with boundary data

(4.4)    ψ(s, x) = Φ(s, x) for (s, x) ∈ ∂′Q.

Let ξ(s) = x ∈ B. If ψ is a solution in 𝓔(Q) of (4.3)-(4.4) and τ the exit time from Q, then from (3.8) we get

(4.5)    ψ(s, x) = E_sx{ ∫_s^τ l(t, ξ(t)) dt + Φ(τ, ξ(τ)) }.

This is the desired probabilistic formula for ψ. It remains to state conditions under which such a solution ψ of the boundary

value problem exists. For this the following assumptions suffice [21, Chap. 3] or [23, Chap. IV, Section 5]:

(4.6)
(a) The functions bᵢ, l are bounded and satisfy a Hölder condition on Q.
(b) The functions aᵢⱼ are bounded and Lipschitz on Q, and uniform ellipticity (3.11) holds.
(c) ∂B is compact and B has the strong exterior sphere property.
(d) Φ is continuous and bounded on ∂′Q.

Uniform ellipticity guarantees a certain amount of smoothness of ψ, under

fairly weak smoothness assumptions on the coefficients. For control theory purposes it would be interesting to weaken this assumption as suggested in § 13.

The strong exterior sphere property is as follows: for each x ∈ ∂B there is a closed spherical ball S_x of fixed radius ρ₀ > 0 such that S_x ∩ B̄ consists of the single point x.

Instead of a cylinder one can consider more general regions Q, of the sort described in [21, p. 61]. Probabilistically this amounts to stopping the process at the first time when ξ(t) ∉ B_t, where B_t may vary with t. For a cylinder, B_t = B.

In place of (a) we shall often have in later sections the weaker assumption:

(4.6a′)    bᵢ, l are bounded and Borel measurable on Q.

Let 𝓔₁(Q) denote the space of functions ψ satisfying (3.9a), (3.9c) and

(3.9b′)    ψ_{x_i} is continuous on Q, and ψ_s, ψ_{x_ix_j} are square integrable on any compact subset of Q.


Then (4.3)-(4.4) has a unique solution in 𝓔₁(Q). This can be proved by approximating bᵢ, l by a sequence of functions satisfying (a) and using a priori estimates [23, Chap. V, Section 3] for the partial derivatives appearing in (3.9b′). It can also be shown that formula (3.8) connecting (4.3) with the Markov process constructed by the method in [111] is still correct.

Under stronger smoothness assumptions on ∂B and Φ, statements can be made about continuity of partial derivatives of ψ up to ∂Q. See [21], [23]. The relevant facts are also collected in [54, Appendix].
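The representation (4.5) also suggests a simple Monte Carlo check. The sketch below uses problem data chosen only for illustration (not from the paper): dξ = dw on B = (−1, 1), l ≡ 1, Φ ≡ 0 and T large, so that ψ(s, x) is essentially the expected exit time from B and should be close to 1 − x², the solution of ½ψ″ + 1 = 0 with ψ(±1) = 0.

```python
import numpy as np

# Monte Carlo estimate of psi(s, x) via (4.5) for d(xi) = dw (so a = 1/2),
# B = (-1, 1), l = 1, Phi = 0 and a long horizon T.
rng = np.random.default_rng(1)

def psi_mc(x, n_paths=20_000, dt=1e-3, T=50.0):
    xi = np.full(n_paths, float(x))
    t_exit = np.full(n_paths, T)            # tau = T if the path never leaves B
    alive = np.ones(n_paths, dtype=bool)
    t = 0.0
    while alive.any() and t < T:
        xi[alive] += np.sqrt(dt) * rng.normal(size=alive.sum())   # Euler step
        t += dt
        exited = alive & (np.abs(xi) >= 1.0)
        t_exit[exited] = t                  # record the exit time tau
        alive &= ~exited
    return t_exit.mean()                    # E_sx{ int_s^tau 1 dt } since l = 1

for x in (0.0, 0.5):
    print(x, psi_mc(x), 1 - x**2)
```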

4.3. Unbounded cases. In many interesting situations the boundedness assumptions on b, l, Φ in (4.6) hold only on bounded sets. In such cases the existence of a solution to problem (4.3)-(4.4) can still often be shown by the following technique. For simplicity let l ≥ 0, Φ ≥ 0. Suppose that B is unbounded and ρ an integer large enough that |x| < ρ for all x ∈ ∂B. Let

B_ρ = B ∩ {|x| < ρ},    Q_ρ = (T₀, T) × B_ρ,

and Φ^ρ a continuous function on ∂′Q_ρ such that

0 ≤ Φ^ρ ≤ Φ^{ρ+1} ≤ Φ,

Φ^ρ(s, x) = Φ(s, x) if |x| ≤ ρ − 1,

Φ^ρ(s, x) = 0 if |x| = ρ.

Let ψ^ρ be the solution of (4.3) in Q_ρ with

ψ^ρ = Φ^ρ on ∂′Q_ρ.

Then ψ^ρ ≥ 0. From the maximum principle for parabolic equations, the sequence ψ^ρ is increasing. Let ψ denote the limit as ρ → ∞. If ψ is bounded on any Q_ρ, then by the a priori estimates cited above for partial derivatives of solutions of (4.3), ψ will also be a solution to (4.3), and its restriction to each Q_ρ is in 𝓔₁(Q_ρ).

One can define, using a technique described for instance in [26, Chap. 11, Section 3], a solution ξ of (3.3′) for s ≤ t < τ′, where τ′ is a "blowup time". If τ_ρ is the exit time from Q_ρ, then τ_ρ is increasing in ρ and has a limit τ ≤ τ′. If we suppose that:

(4.7)
(a) ψ(s, x) is bounded on each Q_ρ;
(b) τ′ = T with probability 1 (the process does not blow up on [s, T]); and
(c) lim_{ρ→∞} E_sx{ ψ(τ_ρ, ξ(τ_ρ)); |ξ(τ_ρ)| = ρ } = 0,

then ψ is the unique solution of (4.3)-(4.4) for which (4.5) is correct. Sufficient conditions for (4.7) are that b satisfy the growth condition (3.5b), and polynomial growth for l, Φ:

(4.8)    0 ≤ l(t, y) ≤ C(1 + |y|)^γ,
         0 ≤ Φ(T, y) ≤ C(1 + |y|)^γ,

for suitable positive C, γ. See [58], [6, Chap. 4] for discussions of the technique just indicated for unbounded cases which occur in control theory.

5. Controlled continuous Markov processes. In this survey we are concerned with families of Markov processes which can be controlled. A control is to be applied at each instant of time t, based on an observation of the current state ξ(t) of the process. Controls of this type will be called Markov control policies. From a practical standpoint a Markov control policy may be rather sophisticated, since it depends on information being continually updated and supplied to some possibly quite nonlinear controlling device. On the other hand, Markov control policies are considerably simpler than controls depending in an arbitrary way on past data about the process ξ.

We consider continuous processes described by stochastic differential equations (5.1) containing certain control parameters. If the control is a Markov policy, then these processes are also Markov. By excluding discontinuous Markov processes from discussion we are omitting, for instance, the interesting situation when the Brownian motion w in (5.1) is replaced by a Poisson process. For discontinuous processes the backward operator is no longer a partial differential operator, and the mathematical treatment consequently would have to differ from the one we outline in the sections to follow. Formulas for representing rather general (possibly discontinuous) Markov processes by stochastic integrals are given in [28, Chap. 3]. One class of problems for discontinuous processes reduces to the kind we consider. This is the problem of controlled finite-state Markov chains whose states are observed with an additive white noise error; see [106], also § 12.

To formulate in a precise way an optimal control problem we must specify the state equations (or plant) which describe the system, the class of control policies admitted, a stopping rule for the state process, and the performance criterion. Following the notation in §§ 3, 4 we consider processes on a time interval [s, T].

5.1. State equations. The state of the system at time t is a vector ξ(t) ∈ Rⁿ, and the initial state ξ(s) has probability distribution π_s. The states evolve according to

(5.1)    dξ = f(t, ξ(t), u(t)) dt + σ(t, ξ(t)) dw,

where w is a Brownian motion independent of ξ(s), and u(t) is a vector in R^d for some d > 0 representing the control applied to the system at time t.

5.2. Control policies. Let K ⊂ R^d be a closed convex set; K is called the control set. It is usually described by specifying a finite number of inequality constraints on the control parameters in the problem. For instance, if d = 1 and it is required that −1 ≤ u(t) ≤ 1, then K = [−1, 1].

A Markov control policy is a Borel measurable function U from [T₀, T] × Rⁿ into K. For any such U let

(5.2)    f^U(t, y) = f(t, y, U(t, y)).

If we let b = f^U and ξ be the corresponding Markov process which solves (3.3) with given initial data ξ(s), then the control applied at time t is

(5.3)    u(t) = U(t, ξ(t)).

We shall impose below conditions on f and U guaranteeing that ξ is well-defined.
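The following sketch (all problem data assumed only for illustration) shows how a Markov control policy enters the simulation of (5.1) through (5.3), and how the expected cost of the kind defined in (5.4) below can be estimated by averaging sample paths:

```python
import numpy as np

# Closed-loop simulation of (5.1) under a Markov control policy U, the control
# being applied via u(t) = U(t, xi(t)) as in (5.3).  Illustrative data:
#   f(t, y, u) = u,  sigma = 0.3,  K = [-1, 1],  L = y**2 + u**2,  Phi = 0,
#   B = R (no state constraint), so tau = T.
rng = np.random.default_rng(2)
sigma, s, T, n = 0.3, 0.0, 1.0, 1000
dt = (T - s) / n

def U(t, y):                         # a Markov control policy with values in K
    return np.clip(-y, -1.0, 1.0)

def J_estimate(x, n_paths=2000):
    cost = 0.0
    for _ in range(n_paths):
        xi, run = x, 0.0
        for k in range(n):
            t = s + k * dt
            u = U(t, xi)             # (5.3): control based on the current state
            run += (xi**2 + u**2) * dt
            xi += u * dt + sigma * np.sqrt(dt) * rng.normal()
        cost += run                  # Phi = 0, so no terminal cost
    return cost / n_paths            # Monte Carlo estimate of the expected cost

print(J_estimate(1.0))
```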


Besides Markov control policies it is sometimes convenient to consider controls which are stochastic processes depending in an arbitrary way on the data available (§ 12).

5.3. Stopping rule. Consider B and the cylinder Q as in § 4. Assume that ξ(s) ∈ B with probability 1, and that s ∈ (T₀, T). Let τ denote the exit time of (t, ξ(t)) from Q. Both ξ and τ depend on the control policy U used, but the notation does not show this fact.

5.4. Performance criterion. We seek to minimize, among all Markov control policies U in some class 𝒰₁, the expected value of an expression like (1.2):

(5.4)    J(U) = E{ ∫_s^τ L(t, ξ(t), u(t)) dt + Φ(τ, ξ(τ)) }.

If the state process is completely observable, then we take for 𝒰₁ the class of all U satisfying certain analytic conditions (§ 6). However, for partially observable problems it is interesting to consider smaller classes (§ 7).

We assume that the functions f, σ in the system equations (5.1) are known to the controller. Thus the work described in this survey concerns system identification problems only in a peripheral way. [A number of such problems can be included under the general partially observable control problem formulated in § 12.] In applications the function f is determined from physical properties of the system being modelled, subject to simplifying assumptions to make the problem more tractable. On the other hand, the performance criterion J is frequently a matter of taste. It is often chosen for mathematical convenience, subject to certain qualitative features any reasonable criterion J should have. The interpretation of the noise term σ dw in (5.1) as a model for a physical system was mentioned in § 3.

Example 5.1. Stochastic linear regulator. Suppose that: B = Rⁿ (no state constraints, τ = T), K = R^d (no control constraints), we have linear system equations, namely

f(t, y, u) = α(t)y + β(t)u,    σ = σ(t),

and quadratic performance criterion

L(t, y, u) = yM(t)y + uN(t)u,

Φ(y) = yPy,

where M(t), N(t), P are positive definite symmetric matrices. With either complete information or linear noise-corrupted partial observations this problem has a rather explicit solution. See §§ 6, 12.

Example 5.2. Let L ≡ 1, Φ ≡ 0. Then the problem is one of minimum expected time to reach ∂′Q.

Example 5.3. Let L ≡ 0 and

Φ(t, y) = 1 for y ∈ ∂B, T₀ ≤ t < T,

Φ(T, y) = 0 for y ∈ B.

Then the problem is to minimize the probability that ξ(t) leaves B during the time interval [s, T]. This is of interest in applications where the important thing is that certain bounds on ξ(t) not be exceeded. [In this example, Φ is discontinuous at (T, y), y ∈ ∂B, contrary to assumption (4.6d). This minor difficulty can be avoided by allowing ψ^U, φ defined below to be discontinuous at such points.]

Let us rewrite the performance criterion. The reader better versed in partial differential equations than in probability might wish to take the formula (5.4′) so obtained as the definition of J(U). Most of §§ 6-11 can then be read without difficulty. Let

A^U = Σ_{i,j=1}^{n} aᵢⱼ(s, x) ∂²/∂xᵢ∂xⱼ + Σ_{i=1}^{n} fᵢ^U(s, x) ∂/∂xᵢ

denote the differential generator of the Markov process corresponding to the control policy U. Consider the boundary problem

(5.5)    ∂ψ^U/∂s + A^Uψ^U + L^U = 0 in Q,

(5.6)    ψ^U = Φ on ∂′Q,

where

L^U(s, x) = L(s, x, U(s, x)).

From (3.8) we get

(5.4′)    J(U) = ∫ ψ^U(s, x) dπ_s(x).

In particular, if the initial state ξ(s) = x is given, then J(U) = ψ^U(s, x) is to be minimized over 𝒰₁.

5.5. Assumptions. Let us now list assumptions under which various results (about conditions for a minimum, existence of optimal controls, etc.) will be stated in sections to follow. We consider two sets of assumptions, corresponding to what we call bounded and unbounded cases.

Bounded case. The assumptions are:

(5.7)
(a) The control set K is convex and compact.
(b) The functions f, L are of class C^(2) and are bounded on [T₀, T] × B̄ × K.
(c) Conditions (4.6b), (4.6c), (4.6d) hold.

Let 𝒰₁ consist of all Markov control policies U. Then ψ^U ∈ 𝓔₁(Q) and (5.4′) is satisfied for any U ∈ 𝒰₁. From (5.4′), values of U outside Q do not affect the performance J(U).

Unbounded case. The boundedness assumptions (5.7) simplify the statements of various theorems to follow. However, they do not hold in many examples of interest, including the stochastic linear regulator. Using the discussion at the end of § 4, formula (5.4′) can still be justified if polynomial-like growth conditions are allowed. For the unbounded case we assume:

(5.8)
(a) K is closed and convex.
(b) f, L are of class C^(2) and there exist positive constants C, γ, p such that
    |f(t, y, u)| ≤ C(1 + |y| + |u|),
    0 ≤ L(t, y, u) ≤ C(1 + |y|^γ + |u|^p).
(c) Conditions (4.6b), (4.6c) hold.
(d) Φ is continuous and
    0 ≤ Φ(T, y) ≤ C(1 + |y|^γ).

We now let 𝒰₁ consist of all Markov control policies satisfying the growth condition

(5.9)    |U(t, y)| ≤ C(1 + |y|).

If K is compact, as in (5.7), then (5.9) is automatic.

The discussion of necessary conditions for a control policy U⁰ to minimize J(U) is quite similar under either assumptions (5.7) or (5.8). However, the existence of a minimizing U⁰ is more delicate in the unbounded case. For this one generally needs p < γ as well as growth assumptions on partial derivatives of the functions f, L, Φ. See [58].

We survey in § 8 the autonomous case, in which the parabolic equation (5.5) is replaced by the corresponding elliptic equation. There is another interesting class of problems in which the controller chooses not U but rather the region B. These are the problems of optimal stopping (§ 10).

6. Dynamic programming (completely observable states). Let us assume in this section that the state ξ(t) at each time t is known by the controller. In particular, the initial state x = ξ(s) is known. Following the basic dynamic programming idea of Bellman [16], consider the value of the minimum (or infimum) as a function of the initial data: let

(6.1)    φ(s, x) = inf_{U∈𝒰₁} ψ^U(s, x).

By using formally Bellman's principle of optimality the following nonlinear parabolic partial differential equation for φ is derived:

(6.2)    φ_s + min_{v∈K} [A(s, x, v)φ + L(s, x, v)] = 0,

A(s, x, v) = Σ_{i,j=1}^{n} aᵢⱼ(s, x) ∂²/∂xᵢ∂xⱼ + Σ_{i=1}^{n} fᵢ(s, x, v) ∂/∂xᵢ.

Equation (6.2) is to be solved in the cylinder Q with the boundary data

(6.3)    φ(s, x) = Φ(s, x) for (s, x) ∈ ∂′Q.

In the dynamic programming formalism the optimal policy U⁰ is characterized by the property that A(s, x, v)φ + L(s, x, v) is minimum on K when v = U⁰(s, x). Thus if φ and U⁰ can be calculated, the minimum problem is solved simultaneously for all initial data (s, x) ∈ Q. See the verification theorem below.

Various analytical difficulties are encountered in putting this formalism on a precise basis. The partial derivatives in (6.2) need not exist everywhere. For instance, in the deterministic Pontryagin problem (σ ≡ 0) (6.2) becomes (1.4). By assuming the uniform ellipticity condition (3.11) this difficulty is avoided. However, it still often happens that U⁰ is discontinuous. For that reason we have admitted discontinuous control policies U in the class 𝒰₁.

Only the first order partial derivatives Xx enter nonlinearly in (6.2), since a does not depend on 0 and L aijo_i_j can then be taken outside the min sign. See (6.2') below. If in (5.1) one allowed the noise coefficient a to depend on the control parameters, then second order derivatives 0_i_j would also occur nonlinearly in (6.2). Equation (6.2) is still formally correct according to dynamic programming, but it is so strongly nonlinear that few results are known about it.

THEOREM 6.1 (Verification theorem). Let (5.8) hold. Suppose that φ′ is a solution of (6.2)-(6.3) such that:
(a) φ′ is continuous on Q̄ = [T₀, T] × B̄;
(b) the partial derivatives φ′_s, φ′_{x_i}, φ′_{x_i x_j} are continuous on Q, i, j = 1, ···, n;
(c) in case B is unbounded, there exist positive constants C, γ such that |φ′(s, x)| ≤ C(1 + |x|)^γ.
Moreover, suppose that U^0 ∈ 𝒲₁ is a control policy such that, for almost all (s, x) ∈ Q, A(s, x, v)φ′ + L(s, x, v) is minimum on K when v = U^0(s, x). Then φ′(s, x) = φ(s, x) = ψ_{U^0}(s, x) for all (s, x) ∈ Q.
Outline of proof. Consider any U ∈ 𝒲₁ and ψ_U given by (5.5)-(5.6). By setting v = U(s, x) in A(s, x, v) one gets the operator A_U. Thus from (6.2)

(6.4)   φ′_s + A_Uφ′ + L_U ≥ φ′_s + A_{U^0}φ′ + L_{U^0} = 0.

By applying (3.8) to φ′ and the operator A_U, one gets φ′ ≤ ψ_U, with equality when U = U^0. This is the conclusion of the verification theorem. [In place of (3.8) one can equivalently appeal to the maximum principle for parabolic equations.]

The uniform ellipticity assumption included in (5.8) is not essential in the verification theorem. Without it one needs to restrict U and U^0 to some subclass of 𝒲₁ for which we are certain that J(U) makes sense and that (3.8) can be applied. The following subclass 𝒲₀ would suffice. 𝒲₀ consists of all U ∈ 𝒲₁ with the following property: for any ρ > 0 and T′ < T there exist positive M, δ (perhaps depending on ρ and T′) such that

|U(t, x) − U(t, y)| ≤ M|x − y|,

|U(t, y) − U(s, y)| ≤ M(t − s)^δ,

whenever |x|, |y| ≤ ρ, T₀ ≤ s ≤ t ≤ T′.
Example 6.1. Completely observable stochastic linear regulator (see Example 5.1). This was originally studied by Florentin [61]. It turns out that φ is quadratic in the state variables:

φ(s, x) = xY(s)x + Z(s),

where Y(s) is to be a positive definite matrix. Then φ will satisfy (6.2) with the data


φ(T, x) = xPx if Y, Z satisfy

dY/ds + α*Y + Yα + M − YβN^{−1}β*Y = 0,   Y(T) = P,

dZ/ds + Σ_{i,j=1}^n a_{ij}Y_{ij} = 0,   Z(T) = 0.

The first of these equations is a matrix equation of Riccati type. It has a solution on some (largest) open interval I containing T. By the verification theorem, φ has the desired property (6.1) if s ∈ I. In fact, I contains the entire interval [T₀, T]. This is seen from the bound 0 ≤ φ ≤ ψ₀, where ψ₀ is obtained using the control U ≡ 0.

In this example the optimal U^0 is linear in the states:

U^0(s, x) = −N^{−1}(s)β*(s)Y(s)x.

Moreover, U^0 does not depend on σ in (5.1). In particular, for the stochastic linear regulator the optimal policy U^0 is the same as for the deterministic linear regulator. For an extension to partially observable states see § 12. The fact that φ remains bounded on any finite interval depends essentially on the positivity of the quadratic function L. In the principle of least action in mechanics one often has L = uN(t)u − yM(t)y, where M(t), N(t) are positive definite matrices. In that case φ may exist only on some sufficiently short time interval.
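As a concrete illustration of Example 6.1, the Riccati equation above can be integrated backward from Y(T) = P by any standard ODE method. The sketch below is a minimal illustration only: constant matrices α, β, M, N, P and a crude Euler scheme are assumed, and the names are not from the text.

```python
import numpy as np

# Integrate dY/ds + alpha^T Y + Y alpha + M - Y beta N^{-1} beta^T Y = 0,
# Y(T) = P, backward in s, then form U0(s, x) = -N^{-1} beta^T Y(s) x.
def riccati_backward(alpha, beta, M, N, P, T, num_steps=1000):
    dt = T / num_steps
    Ninv = np.linalg.inv(N)
    Y = P.copy()
    Ys = [Y.copy()]
    for _ in range(num_steps):
        dY_ds = -(alpha.T @ Y + Y @ alpha + M - Y @ beta @ Ninv @ beta.T @ Y)
        Y = Y - dt * dY_ds            # step from T toward the initial time
        Ys.append(Y.copy())
    return Ys[::-1]                    # Ys[0] now corresponds to the earliest time

def linear_policy(Y, beta, N):
    gain = -np.linalg.inv(N) @ beta.T @ Y
    return lambda x: gain @ x          # U0(s, x) for the time slice with matrix Y
```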

In order to state conditions under which a function φ′ exists with the properties required in the verification theorem let us for simplicity consider the bounded case (5.7). Let us rewrite (6.2) as

(6.2′)   φ_s + Σ_{i,j=1}^n a_{ij}(s, x)φ_{x_i x_j} + H(s, x, φ_x) = 0,

where for any (s, x, p)

(6.5)   H(s, x, p) = min_{v ∈ K} [L(s, x, v) + p·f(s, x, v)].

By (5.7a), (5.7b) there is a positive C such that

(6.6)   |H(s, x, p)| ≤ C(1 + |p|).

As a special case of existence theorems for nonlinear parabolic equations [23, Chap. VI, Section 4] we have the following.

THEOREM 6.2 (Existence theorem). If (5.7) holds, then the boundary problem (6.2)-(6.3) has a (unique) solution φ′ in 𝔖(Q).

Actually [23] considers equations of a more general form than (6.2′). For our purposes the estimates for linear parabolic equations in [21, Chap. 4, Chap. 7, Section 2] suffice, at least if ∂B and Φ are of class C^{(3)}. See remarks in [51], [54, Appendix].

The function φ′ has properties (a), (b), (c) required in the verification theorem; and the existence of U^0 with the required property was proved in [51, Lemma 1]. By the verification theorem, φ′ = φ.


Instead of Markov control policies one might admit any control process u with values in K which does not anticipate the Brownian future. By modifying the proof of the verification theorem it can be shown that U^0 minimizes in comparison with such controls [53, p. 263]. This is to be expected; when the states are completely observable the current state ξ(t) contains all relevant information about the past.

An existence theorem for the unbounded case (5.8) is proved in [58].

6.1. Uniqueness of U^0. If L(s, x, v) + φ_x(s, x)·f(s, x, v) has a unique minimum on K for almost every (s, x) ∈ Q, then U^0 is uniquely determined almost everywhere. In most examples occurring in practice this condition is satisfied. In particular, it clearly holds if (6.7) is satisfied.

6.2. Continuity of U^0. In general U^0 will be discontinuous. For instance, in the time-optimal problem (Example 5.2) with polyhedral control set K and linear f, let u^1, ···, u^l be the vertices of K. Then U^0(s, x) = u^j at those (s, x) where φ_x·f has a unique minimum on K at v = u^j. Switching from one vertex to another occurs at boundaries of such regions. However, if we suppose

(6.7)   f(t, y, u) = g(t, y) + M(t, y)u,

Σ_{i,j=1}^d L_{u_i u_j}(t, y, u)θ_iθ_j ≥ c|θ|²,   c > 0,

for all t, y, u, θ ∈ R^d, then U^0 is continuous. In fact, U^0 ∈ 𝒲₀ [53, p. 261].

7. Principle of optimal evolution. Let us now suppose that the controller may not use all Markov control policies. Instead he may use only U ∈ 𝒲 for some class 𝒲 ⊂ 𝒲₁. The specific choice we shall make for 𝒲 is motivated by partially observable control problems. The results described in this section are proved in [54]. For a closely related approach, see Mieri [99].

The principle of optimal evolution gives a necessary condition for a control U^0 to minimize J(U) in 𝒲. For completely observable problems the necessary condition is also sufficient; and the approach in the present section is then equivalent to the one in § 6.

To simplify matters let us again assume (5.7), the bounded case. Let us also take Φ(t, y) = 0 for y ∈ ∂B, and assume that ∂B is a manifold of class C^{(3)}. Fix an initial time s and let Q_s = (s, T) × B. The backward equation (5.5) describes the evolution of ψ_U as time decreases. We have to specify ψ_U at the final time T in order to solve the backward boundary problem. Consider also the forward boundary problem

(7.1)   −∂q_U/∂t + (A_U)*q_U = 0 in Q_s,

q_U(t, y) = 0 for s < t < T, y ∈ ∂B,

and initial data at time s provided by the probability distribution π of ξ(s). This problem has a unique solution q_U in the following (weak) sense. The function q_U is continuous on Q̄_{s+h} if h > 0, positive on Q_s, and integrable over Q_s. Moreover, the partial derivatives q_{U y_i} are square integrable over compact subsets of Q_s.


Finally,

∫∫_{Q_s} (f_t + A_U f) q_U dt dy = −∫ f(s, x) dπ(x)

for all f ∈ 𝔖(Q) such that f = 0 on ∂*Q. Probabilistically, q_U(t, ·) represents the density of ξ̃(t), where ξ̃ is the process obtained by killing ξ at the exit time τ from Q.

7.1. The class 𝒲. Let 0 ≤ k ≤ n, and Ū any Borel measurable function from [T₀, T] × R^k into K. Let 𝒲 consist of all U of the form

U(t, y) = Ū(t, y₁, ···, y_k).

Thus we require that the control at time t uses only information about the first k components of the state vector ξ(t). For k = 0 this means that U = U(t), the open-loop control case. For k = n we have the completely observable case.

Let us write y = (ŷ, ỹ), where

ŷ = (y₁, ···, y_k),   ỹ = (y_{k+1}, ···, y_n).

Consider the conditional density

q̂_U(t, y) = q_U(t, y) / ∫ q_U(t, ŷ, ỹ) dỹ.

Conditional expectations are denoted by

E{G(t, ·) | ŷ} = ∫ G(t, y) q̂_U(t, y) dỹ.

Let B̂ = {ŷ : y ∈ B}, Q̂_s = (s, T) × B̂.

THEOREM 7.1 (Principle of optimal evolution). Let U^0 (= Ū^0(t, ŷ)) minimize J(U) in 𝒲. Then, for almost all (t, ŷ) ∈ Q̂_s,

(7.2)   E{[A(t, ·, v)ψ_{U^0}(t, ·) + L(t, ·, v)] | ŷ}

is minimum on K when v = U^0(t, ŷ).
This theorem is proved by considering certain control policies U ∈ 𝒲 such that U = U^0 except in a set of small measure, and using Sobolev's lemma together with estimates for solutions of parabolic equations [54, Section 5].

In order to compute (7.2) one needs the solution ψ_{U^0} of the backward equation (5.5) with data at time T and the solution q_{U^0} of the forward equation (7.1) with data at time s. In this respect, the principle of optimal evolution resembles Pontryagin's principle [20]. The ordinary differential equations for states and costates in Pontryagin's principle are replaced by (7.1), (5.5).

7.2. Existence of U^0. Let us assume that L is convex in u and f linear in u (weakened form of (6.7) with c = 0). Using properties of parabolic equations, lower semicontinuity of J under weak* convergence in the space L^∞(Q̂_s) can be proved. This implies the existence of a minimizing U^0.

It is difficult to calculate explicitly examples to illustrate the necessary condition (7.2) when the process is partially observable. An example of Witsenhausen [117] for a corresponding discrete-time problem shows that in the linear regulator problem controls linear in ŷ need not be optimal.

Boundary control problems for parabolic partial differential equations have been considered by Lions [19], Friedman [63] and Fattorini [130]. There the operator is fixed, with the boundary data chosen by the controller. We consider the opposite kind of problem: fixed boundary data, but operator A_U chosen by the controller. It would be interesting to find a unified treatment of both kinds of problems.

8. Control of autonomous processes. Let us now suppose that the system (5.1) is autonomous:

(8.1)   dξ = f(ξ(t), u(t)) dt + σ(ξ(t)) dw,   t ≥ 0,

and consider autonomous Markov control policies. Such a policy is a Borel measurable function U from R^n into K, and (corresponding to (5.3)) we have in (8.1)

u(t) = U(ξ(t)).

Then ξ is an autonomous (also called time-homogeneous) process. We shall assume that the states are completely observable; in particular, the initial state x = ξ(0) is known. We also assume the uniform ellipticity condition (3.11).

Various kinds of optimal control problems for autonomous processes have been treated using methods of dynamic programming and second order elliptic partial differential equations (rather than parabolic equations as in § 6). We mention four types of such problems.

(a) An autonomous form of the problem in § 5. Let B be bounded and K compact. Take J(U) as in (5.4) with s = 0 and τ the exit time from B. The problem is to minimize J(U) in the class 𝒲 of all autonomous Markov control policies. The boundary problem corresponding to (6.2)-(6.3) is

(8.2)   min_{v ∈ K} [A(x, v)φ + L(x, v)] = 0 in B,

(8.3)   φ(x) = Φ(x) on ∂B.

This problem has a unique solution φ in the class of functions C^{(2)} on B and continuous on B̄ [22, Chap. VI, Theorem 3.3]. Moreover

φ(x) = min_U ψ_U(x),

where ψ_U(x) = J(U) for initial point x = ξ(0) and ψ_U solves the elliptic boundary problem corresponding to (5.5)-(5.6).

Example 8.1. Let f(x, u) = u, L ≡ 1, K the unit ball |u| ≤ 1 in R^n, and a_{ij} = aδ_{ij} where (δ_{ij}) is the identity matrix. The problem is minimum expected time to reach ∂B starting at x. The dynamic programming equation (8.2) takes the form

aΔφ − |φ_x| + 1 = 0 in B,

with φ = 0 on ∂B. For a = 0, φ(x) is the minimum time for a particle to reach ∂B from x travelling with speed ≤ 1; the equation for φ is then the eikonal equation |φ_x| = 1 of geometric optics.
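For orientation, the one-dimensional case of this example can be solved in closed form. The following computation is supplied only as an illustration (it is not part of the original survey), assuming B = (−1, 1) and constant a > 0. By symmetry φ is even and nonincreasing on (0, 1), so |φ_x| = −φ′ there and the equation becomes

\[ a\varphi'' + \varphi' + 1 = 0, \qquad \varphi'(0) = 0, \quad \varphi(1) = 0, \]

whence

\[ \varphi'(x) = e^{-x/a} - 1, \qquad \varphi(x) = 1 - x + a\left(e^{-1/a} - e^{-x/a}\right), \qquad 0 \le x \le 1, \]

with φ(x) = φ(−x) for x < 0. As a → 0+ this tends to 1 − |x|, the deterministic minimum time, in agreement with the eikonal limit.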


Kushner [6, Chap. 4] studied the problem for unbounded B, by considering for large ρ the corresponding problem in the bounded region B ∩ {|x| < ρ}. A solution φ of (8.2)-(8.3) need no longer correspond to a solution of the problem of minimizing J(U); but this is true under some additional conditions like (4.7).

(b) Discounted optimal control problems (Kushner [83]). Let B = R^n and consider the performance criterion

J(U) = E_x ∫_0^∞ e^{−βt} L[ξ(t), u(t)] dt,   β > 0,

where E_x indicates that ξ(0) = x. The equation corresponding to (8.2) is now

(8.4)   min_{v ∈ K} [A(x, v)φ + L(x, v)] − βφ = 0.

Suppose that L ≥ 0, K is compact, and that some U₁ exists such that J(U₁) < ∞ for all initial states x. Then, writing J(U) = ψ_U(x) as before,

(8.5)   φ(x) = min_U ψ_U(x)

is a C^{(2)} solution of (8.4) in all of R^n and 0 ≤ φ ≤ ψ_{U₁}. If, for instance, ψ_{U₁}(x) grows no faster than |x|^γ for some γ as |x| → ∞, then φ is the unique such solution.

A verification theorem [83, Theorem 2] for a positive solution φ to (8.4) to be the function in (8.5) is proved under less restrictive growth assumptions, by an approach like that in [6, Chap. 4] just mentioned above.

(c) Optimal stationary control (Wonham [121]). The problem is formulated as follows. Let 𝒲_e denote the class of autonomous U such that:
(i) U is Lipschitzian on R^n.
(ii) The solution ξ of (8.1) with any initial x is ergodic, and the equilibrium measure μ_U has finite second moment.
The steady state performance

J(U) = ∫_{R^n} L(x, U(x)) dμ_U(x)

is to be minimized in 𝒲_e. The equation corresponding to (8.2) in this problem is

(8.6)   min_{v ∈ K} [A(x, v)φ + L(x, v)] = λ,

where λ is the optimal steady state performance. The following verification theorem holds [121, Theorem 4.1]. Let φ be a C^{(2)} function on R^n and λ a positive number such that (8.6) holds. Moreover, suppose that

∫_{R^n} (1 + |x| |φ_{x_i}| + |x|² |φ_{x_i x_j}|) dμ_U < ∞

for all U ∈ 𝒲_e and i, j = 1, ···, n. If U^0 ∈ 𝒲_e and the minimum in (8.6) occurs when v = U^0(x) for all x, then U^0 is optimal and J(U^0) = λ. This result is applied to a linear-quadratic problem with L = xMx + uNu and

dξ = (αξ + βu) dt + σ₁ dw₁ + σ₂u dw₂,

where w₁, w₂ are independent Brownian motions. If |σ₂| is sufficiently small, then an optimal linear U^0 exists. However, for larger values of |σ₂| it may happen that no linear U is in 𝒲_e.


(d) Optimal stationary control of processes on a 1-dimensional interval (Mandl [96]). Let n = 1 and [x₀, x₁] a finite interval on R^1. If ξ(t) is the position of the 1-dimensional Markov process being controlled, then (roughly speaking) ξ(t) moves in the interior (x₀, x₁) according to (8.1) but its behavior at the endpoints is determined according to boundary conditions of the general Feller-Ventzel' type. Let g^i(x, T) denote the number of times ξ(t) jumps from the endpoint x_i into the interval [x₀, x] for 0 < t ≤ T. As performance criterion an expression is taken of the form

J(U) = lim_{T→∞} T^{−1} E[ ∫_0^T L(ξ(t), u(t)) dt + ∫_{x₀}^{x₁} (h₀(x) dg^0(x, T) + h₁(x) dg^1(x, T)) ].

The differential equation to be solved is again (8.6), the solution φ now being subject to a certain (nonlocal) condition corresponding to the boundary conditions on the process. As before, λ is the minimum value of J(U).

In [98] Mandl considered a different kind of optimal control problem. Let dξ = u dt + dw, where u(t) = ±β and −1 < ξ(0) < 1. The drift coefficient switches from β to −β according to the local intensity function V₀, i.e., the probability of a switch during (t, t + h) given ξ(t) is approximately V₀(ξ(t))h for small h. Let V₁ be the local intensity of switching from −β to β. The problem is to choose V₀, V₁ to maximize the expected exit time from the interval (−1, 1), given a bound on the expected number of switchings. If it is in addition required that 0 ≤ V_i ≤ Z, where Z is a given number, then the optimal V₀ has the form V₀(x) = Z on some subinterval of (0, 1), V₀(x) = 0 otherwise, and V₁(x) = V₀(−x). What happens as Z → ∞ is investigated.

9. Open loop controls. Let us return to the problem in § 5 on a finite time interval [s, T]. If the controller receives no information about the states ξ(t), then U = U(t) must be a function of time only. Such controls are called open loop. Several methods have been used to derive necessary conditions for an open loop control U^0 to be optimal. One is stochastic linear programming [116]. If the method in § 7 is followed, then conditional expectations in (7.2) become expectations since k = 0. A necessary condition for U^0 to minimize J(U) is then, according to the principle of optimal evolution:

(9.1)   E{L(t, ξ̃(t), v) + ψ_{U^0 x}(t, ξ̃(t))·f(t, ξ̃(t), v)} = minimum when v = U^0(t),

for almost all t ∈ [s, T], where ξ̃ is the process obtained by killing ξ at time τ. [If B = R^n, i.e., τ ≡ T, then ξ̃ = ξ.]

Still another way to derive a necessary condition, without recourse to partial differential equations, is to proceed as in the treatment of Pontryagin's principle and in classical calculus of variations. Consider a fixed stopping time τ ≡ T. [No extension of the method to variable τ seems to be in sight; the difficulty is apparently related to the absence of a stochastic transversality condition corresponding to the one in the deterministic problem.] In this approach, costate processes are defined which satisfy differential equations dual to the linearized system equations (5.1). When σ = σ(t) the costate equations are ordinary differential equations [81]. However, if σ depends on the state ξ(t), then the costate equations are stochastic. The coefficients depend on the Brownian past, but the natural sense of time is reversed for the costate process. A different method is then needed to show that the costates p(t) are well-defined [55]. Once the costate process p is defined the necessary condition

(9.2)   E{L(t, ξ(t), v) + p(t)·f(t, ξ(t), v)} = minimum when v = U^0(t),

almost everywhere on [s, T] is derived (Kushner [81]). The same method applies to the case when nonanticipative control processes u are admitted such that u(t) is 𝒜_t-measurable, where {𝒜_t} is a given family of σ-algebras increasing with t. In (9.2) expectation is replaced by conditional expectation with respect to 𝒜_t [81].

The multipliers in (9.1) and (9.2) are related by

ψ_{U^0 x}(t, ξ(t)) = E{p(t) | 𝒟_t},

where 𝒟_t is generated by ξ(r) for s ≤ r ≤ t. This is proved, at least when ∂K is a C^{(1)} manifold, by reasoning in [54, Section 6] together with [75, Section 5].

The existence of an optimal U^0 can be proved in the bounded case (5.7), by methods of the author and Nisio [60]. As in § 7 it is assumed in the existence theorem that f is linear in u, L convex in u. The methods of [60] also prove existence of a minimum among nonanticipative control processes related to a given family {𝒜_t} as above.

For partially observable processes (§ 12) such 𝒜_t can be introduced to describe in a measure theoretic way the data observed up to time t. Unfortunately, these 𝒜_t generally vary with the control process u chosen, which limits the applicability to partially observable processes of the results just described.

10. Optimal stopping problems. We have seen that stochastic control problems of Pontryagin type lead to nonlinear parabolic or elliptic equations with given (Dirichlet) boundary data. Another class of boundary problems arises in some sequential decision questions of control theory and statistics. A typical problem of this type is the optimal stopping of a Markov process. For sequential decision questions with a large number of stages it is useful to formulate the problem with continuous time. By using a dynamic programming approach the problem is then (at least formally) equivalent to a free boundary problem for the backward operator, or the differential generator in the autonomous case. See Chernoff [43], Grigelionis and Shiryaev [67], Stratonovich [10, Chap. 10]. Under some further assumptions it has been shown that a solution to the free boundary problem actually solves the optimal stopping problem [43], [67]. This was applied in [43] to explicitly solve some simplified sequential analysis and rocket control models which reduce to optimally stopping a 1-dimensional Brownian motion. The solutions are in some cases approximate, involving asymptotic expansions of solutions of the heat equation [43, Section 9].

In [68] a combined optimal control and stopping problem is considered; the controller selects not only the stopping time but also times to switch from one to another among a finite number of Markov processes.


For definiteness let us describe an autonomous version of the optimal stopping problem, under somewhat more special assumptions than in [67]. Consider an autonomous Markov process ξ given by

dξ = b(ξ(t)) dt + σ(ξ(t)) dw,   t ≥ 0,

with ξ(0) = x; and let A = Σ a_{ij}(x) ∂²/∂x_i∂x_j + Σ b_j(x) ∂/∂x_j be its differential generator. Let x ∈ B, where B ⊂ R^n is open; and let τ_B be the exit time from B. Let G be a continuous bounded function on B̄. The optimal stopping problem is to find, among all stopping times τ, 0 ≤ τ ≤ τ_B, one for which E_x G(ξ(τ)) is minimum. Let

(10.1)   φ(x) = inf_τ E_x G(ξ(τ)).

Clearly φ(x) ≤ G(x). Let

(10.2)   D = {x ∈ B : φ(x) = G(x)}.

The function φ is upper semicontinuous on B, and hence B − D is an open set. By a theorem of Dynkin [67, Theorem 4], φ is the largest A-subharmonic function ψ on B such that ψ ≤ G. [ψ is called A-subharmonic if ψ(x) ≤ E_x ψ(ξ(τ_V)) for all x and open V ⊂ B with x ∈ V; for ψ of class C^{(2)} this is equivalent to Aψ ≥ 0.] If A is uniformly elliptic, then φ is C^{(2)} on B − D and

(10.3)   Aφ = 0 for x ∈ B − D.

Under certain local smoothness assumptions on G, φ, and ∂D it is shown [67, Theorem 8] that the gradients of G and φ must match at boundary points of D in B:

(10.4)   G_x(x) = φ_x(x) for x ∈ B ∩ ∂D.

Conditions (10.3)-(10.4) define a free boundary problem, since D is not known in advance.

The question remains: when must a solution φ′ of the free boundary problem be the function φ in (10.1)? This is answered in [67, Theorem 9] under some fairly stringent further assumptions. See [43, Section 8] for corresponding results in case of the nonautonomous optimal stopping problem for 1-dimensional Brownian motions. It also turns out that, under these assumptions, the exit time from B − D is optimal; in case x ∈ D, τ = 0 is optimal. This is what intuition suggests; however, the general problem of whether an optimal stopping time exists is fairly difficult. See [67, pp. 549, 556]. In some examples D consists of a single point which, with probability 1, ξ(t) never reaches. In such cases an approximate solution is to stop at the first time τ_ε when G(ξ(τ_ε)) ≤ φ(ξ(τ_ε)) + ε, for small ε > 0.
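A discrete-time, discrete-state caricature of the problem makes the structure of D transparent and is easy to compute. The sketch below is an illustration only (the grid, the payoff G, and the symmetric random walk are assumptions, not from the text): it iterates φ ← min(G, E φ(next state)) to a fixed point, and the set where φ = G is the approximate stopping region.

```python
import numpy as np

# Stop a symmetric random walk on a grid so as to minimize E G(xi(tau)),
# with forced stopping at the ends of the interval (exit from B).
# The value function satisfies phi = min(G, E_x phi(next)), and
# D = {phi == G} is the stopping set.
x = np.linspace(-1.0, 1.0, 201)
G = np.cos(3 * x) + 0.5 * x              # an arbitrary continuous payoff
phi = G.copy()                           # boundary values phi = G are kept fixed

for _ in range(20000):                   # value iteration to a fixed point
    continuation = 0.5 * (phi[:-2] + phi[2:])        # E_x phi(next state)
    phi_new = phi.copy()
    phi_new[1:-1] = np.minimum(G[1:-1], continuation)
    if np.max(np.abs(phi_new - phi)) < 1e-10:
        phi = phi_new
        break
    phi = phi_new

D = np.isclose(phi, G)                   # approximate stopping set
```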

At least in some cases, rather complete information about the smoothness of the function φ in (10.1) can be obtained using the theory of variational inequalities. For instance, let ξ be an n-dimensional Brownian motion, A the Laplace operator. Suppose also that B is bounded with ∂B a C^{(2)} manifold and that G is C^{(2)} on the closure B̄. For any stopping time τ, E_x G(ξ(τ)) is a Lipschitz function of x, with Lipschitz constant independent of τ. Hence φ is also Lipschitz. Let S denote the set of ψ in the Sobolev space H^1(B) such that ψ ≤ G in B and ψ = G on ∂B. Then φ is the unique member of S which satisfies the variational inequality

∫_B φ_x·(ψ_x − φ_x) dx ≥ 0

for all ψ ∈ S [90, Lemma 1]. Using this fact it can be shown [40, Corollary II.3] or [90, Theorem III.1] that φ_x is Hölder continuous with any exponent < 1 and φ_{x_i x_j} is pth power integrable on B for any p < ∞. Property (10.4) also holds. Some results about smoothness of ∂D are proved for n = 2 in [90, Part IV].

11. Methods of approximate solution. Let us return to the completely observable optimal control problems considered in §§ 6, 8. The nonlinear partial differential equation (6.2) or (8.2) obtained by dynamic programming can rarely be solved explicitly. [The linear regulator problem in § 6 is the best known example for which this is possible.] However, various methods for approximate solution have been proposed and used with some success. We now list some of these.

(a) Equations (6.2) and (8.2) are particular kinds of second order nonlinear equations, of parabolic or elliptic type. One can try to apply general methods for the numerical solution of boundary problems for such equations. Work in this direction was done by Van Mellaert [115] using a computational method of Samarskii [104].

The remaining methods which we mention use explicitly the connection between the optimization problem and the nonlinear partial differential equation.

(b) Quasilinearization. This is a technique of Bellman for expressing the solution of an equation with a convex (or concave) nonlinearity as the limit of a monotone sequence of solutions to linear equations [17]. It is an extension of Newton's method in calculus.

The method applies to (6.2)-(6.3) as follows. Define U₁, U₂, ···, φ₁, φ₂, ··· by: U₁ is arbitrary, φ_j = ψ_{U_j}, and A(s, x, v)φ_j + L(s, x, v) = minimum on K when v = U_{j+1}(s, x), for almost all (s, x) ∈ Q. For each j = 1, 2, ···, φ_j satisfies the linear parabolic equation ∂φ_j/∂s + A_jφ_j + L_j = 0 in Q, φ_j = Φ on ∂*Q, where A_j = A_{U_j}, L_j = L_{U_j}. It was shown in [51] that, at least in the bounded case (5.7), φ₁ ≥ φ₂ ≥ ··· and the limit as j → ∞ is the desired function φ in (6.1).

This approximation scheme corresponds to a method of Howard [4] for optimal control of Markov chains.
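For a finite-state, finite-action chain, Howard's procedure alternates solving a linear system (policy evaluation, the analogue of the linear parabolic step above) with a pointwise minimization (policy improvement). A minimal sketch follows, for a discounted cost and with all data (P, c, gamma) assumed purely for illustration.

```python
import numpy as np

# Howard's policy iteration for a controlled Markov chain:
# P[a] is the transition matrix under action a, c[:, a] the running cost.
def policy_iteration(P, c, gamma=0.95, max_iter=100):
    n_states, n_actions = c.shape
    U = np.zeros(n_states, dtype=int)                     # arbitrary initial policy
    for _ in range(max_iter):
        # Policy evaluation: solve (I - gamma P_U) psi = c_U.
        P_U = np.array([P[U[i]][i] for i in range(n_states)])
        c_U = c[np.arange(n_states), U]
        psi = np.linalg.solve(np.eye(n_states) - gamma * P_U, c_U)
        # Policy improvement: pointwise minimization over the action set.
        Q = np.stack([c[:, a] + gamma * P[a] @ psi for a in range(n_actions)], axis=1)
        U_new = np.argmin(Q, axis=1)
        if np.array_equal(U_new, U):
            break
        U = U_new
    return psi, U
```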

(c) A method which seems to have advantages over (b) was considered by Kleinman and Kushner [76]. They consider the elliptic problem (8.2)-(8.3) in a bounded region B (actually, degenerate elliptic operators are also allowed). The problem is discretized by replacing B with a finite grid and (8.2) with a corresponding nonlinear difference equation. The discretized problem corresponds to an optimal control problem for Markov chains, whose states are the points of the grid. An iterative scheme for finding the optimum is given. It corresponds to the Gauss-Seidel method for solving linear equations, and converges at least as fast as the schemes of Howard and Eaton-Zadeh.
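To indicate the flavor of this discretization, the sketch below applies it to the one-dimensional instance of Example 8.1, where the answer can be checked against the closed form given earlier. Upwind difference quotients turn (8.2) into the dynamic programming equation of a controlled chain on the grid, which is then solved by Gauss-Seidel sweeps. The grid spacing, the value of a, and the three-point sample of K are assumptions made only for this illustration; this is not the scheme of [76] in detail.

```python
import numpy as np

# 1-D instance of Example 8.1:  a*phi'' + v*phi' + 1 = 0 on (-1, 1),
# |v| <= 1, phi(+-1) = 0, discretized as a controlled Markov chain.
a, h = 0.1, 0.01
x = np.arange(-1.0, 1.0 + h, h)
phi = np.zeros_like(x)
controls = np.array([-1.0, 0.0, 1.0])                 # sample of K = [-1, 1]

for sweep in range(5000):
    change = 0.0
    for i in range(1, len(x) - 1):
        best = np.inf
        for v in controls:
            Q = 2 * a + h * abs(v)                    # normalizer
            p_up = (a + h * max(v, 0.0)) / Q          # P(x -> x + h)
            p_dn = (a + h * max(-v, 0.0)) / Q         # P(x -> x - h)
            dt = h * h / Q                            # holding time (running cost 1 * dt)
            best = min(best, dt + p_up * phi[i + 1] + p_dn * phi[i - 1])
        change = max(change, abs(best - phi[i]))
        phi[i] = best                                 # Gauss-Seidel: update in place
    if change < 1e-10:
        break
```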

(d) In [42] Cashman and Wonham considered a nonlinear variant of the stationary linear regulator problem. The nonlinearity appears in the form of the control constraint |u(t)| ≤ 1. The partial differential equation to be solved is (8.6) with L positive and quadratic, f linear. To find reasonably good suboptimal controls, the unknown function φ in (8.6) is replaced by a quadratic function φ(x) = xYx. Let U be determined by the optimality condition L + φ_x·f = minimum on K = {|v| ≤ 1} when v = U(x). By a state-space version of statistical linearization U is replaced by a corresponding linear control Û(x) = kx. A system of algebraic equations is obtained for Y, k, and an approximate covariance matrix Q. These equations are solved by successive approximations, using quasilinearization.

(e) Methods applicable if the noise intensity |σ| is small [45], [59], [80], [108]. The objective of these methods is to solve approximately the stochastic optimal control problem using the solution of the corresponding deterministic problem (with σ ≡ 0). Let us consider completely observable problems of Pontryagin type (§§ 5, 6). Let u^0 be an optimal (open loop) control for the deterministic problem, ξ^0 the solution of (1.1) when u = u^0, ξ^0(s) = x; and let

γ^0 = {(t, ξ^0(t)) : s ≤ t ≤ T}

be the corresponding trajectory. Assume that γ^0 is the unique optimal trajectory with initial point (s, x). Let D be a neighborhood of γ^0. It is plausible that the optimal trajectories for the stochastic problem lie in D with probability tending to 1 as |σ| → 0. This has been proved under fairly general assumptions [57], [59]; in fact, explicit estimates for the probability in question have been given.

A rather good approximate solution to the stochastic problem is to use the optimal (closed loop) control policy U^0 for the deterministic problem. Thus, if at time t₁ > s the stochastic control system is observed to be in state ξ(t₁), then the control applied at time t₁ is one which is optimal in the deterministic problem with (t₁, ξ(t₁)) as initial state:

u(t₁) = U^0(t₁, ξ(t₁)).

The following kind of estimate for the goodness of this approximation is proved in [59]. For simplicity let σ = (2ε)^{1/2} I, where I is the identity matrix. The second order terms in (6.2) then appear as εΔ_xφ, where Δ_x is the Laplace operator in the variables x₁, ···, x_n. To indicate the role of ε, write J^ε(U) in (5.4) and φ^ε in (6.1). In particular, φ^0 is as in (1.3). The goodness of the approximation is measured by J^ε(U^0) − φ^ε(s, x). Assume that φ^0 is smooth in D; more precisely, that φ^0 is C^{(1)} in D, and C^{(2)} there except perhaps for jumps in the second derivatives across switching surfaces for U^0. In [59] the following estimates are found to hold near γ^0:

(11.1)   φ^ε = φ^0 + εθ^0 + o(ε),   J^ε(U^0) = φ^0 + εθ^0 + o(ε),

where ε^{−1}o(ε) → 0 as ε → 0 and

(11.2)   θ^0(s, x) = ∫_s^T Δ_xφ^0(t, ξ^0(t)) dt.

From (11.1), ε^{−1}[J^ε(U^0) − φ^ε(s, x)] → 0 as ε → 0. By examples, it is shown that the estimates (11.1) may fail if φ^0 is not smooth in a neighborhood of γ^0.


If there are no control constraints, then the existence of a neighborhood D of γ^0 in which φ^0 is smooth is the classical problem in the calculus of variations of the existence of a field of extremals containing γ^0. For instance, suppose that L, f are as in (6.7) and the stopping time is fixed (τ ≡ T). If (s, x) is not a conjugate point, then φ^0 is C^{(2)} in some such neighborhood D. Moreover, good approximations to U^0 and Δ_xφ^0 near γ^0 can be found in terms of the optimal open loop control u^0 and a solution to the accessory minimum problem [38]. The accessory problem involves the second variation; it is a quadratic minimum problem of the linear regulator type.

Let U^ε be the optimal control policy for the stochastic problem. A first approximation to U^ε is U^0. In seeking better approximations it suffices to find good approximations for φ^ε_x, since U^ε(s, x) minimizes L + φ^ε_x·f on the control set K (see the verification theorem, Theorem 6.1). In case L, f are as in (6.7), then [59]

(11.3)   φ^ε_x = φ^0_x + εθ^0_x + o(ε)

in some neighborhood of γ^0.
Conditions (6.7) exclude switching surfaces for U^ε, since they imply strict convexity of L + φ_x·f on K. In other types of control problems U^ε is piecewise constant; for instance, this is generally true for linear time-optimal problems. In that case one may seek approximations to the switching surfaces for U^ε as perturbations of the corresponding surfaces for U^0. Some calculations of this nature have been made using formal expansions of φ^ε in powers of ε [45], [108]. A rigorous treatment of this matter seems to be hampered by the fact that, while φ^ε and φ^0 are everywhere continuous, φ^0_x is discontinuous across switching surfaces for U^0.

12. Partially observable problems. Let us begin with the problem of mean square optimal filtering. Consider a system described by

(3.3)   dξ = b(t, ξ(t)) dt + σ(t, ξ(t)) dw,   s ≤ t ≤ T,

with ξ(s) independent of the Brownian motion w and E|ξ(s)|² < ∞. Suppose that the states ξ(t) are not themselves observable. Instead, noisy observations of some function of the states are made. Specifically, let us assume that the observations are the states η(t) of a k-dimensional process (k ≤ n) governed by

(12.1)   dη = b₁(t, ξ(t)) dt + σ₁(t) dw₁,   s ≤ t ≤ T,

with η(s) = 0, where w₁ is a Brownian motion independent of w and ξ(s). [Another possibility, which we shall not treat, is that noisy observations of b₁(t, ξ(t)) are made at discrete instants of time.] Let 𝒜_t denote the σ-algebra generated by η(r), s ≤ r ≤ t. This σ-algebra describes measure theoretically the observed data available at time t. Consider estimates e(t) for ξ(t) such that e(t) is 𝒜_t-measurable. The mean square optimal estimate is the conditional expectation ξ̂(t):

ξ̂(t) = E{ξ(t) | 𝒜_t}

with probability 1 for each t. Bucy and Kalman [41] considered linear system and observation equations:

(12.2)   dξ = α(t)ξ dt + σ(t) dw,

(12.3)   dη = α₁(t)ξ dt + σ₁(t) dw₁.

If ξ(s) is Gaussian, then the conditional distribution of ξ(t) given the observed data is also Gaussian with mean ξ̂(t) and covariance matrix Q(t). These conditional covariance matrices do not depend on the observed data. They satisfy the Riccati type ordinary differential equation

(12.4)   dQ/dt = αQ + Qα* + 2a − Qα₁*(2a₁)^{−1}α₁Q,

with Q(s) the covariance of the initial data ξ(s). Here 2a = σσ*, 2a₁ = σ₁σ₁* are the covariance coefficients for the noises entering (12.2), (12.3). It is assumed that a^{−1}, a₁^{−1} are bounded functions of t, corresponding to the uniform ellipticity condition previously imposed. The conditional expectations evolve according to

(12.5)   dξ̂ = αξ̂ dt + Qα₁*(2a₁)^{−1}σ₁ dw̄,

where σ₁ dw̄ = dη − α₁ξ̂ dt and w̄ is a Brownian motion. The process ∫σ₁ dw̄ is called an innovation; see Kailath [73]. The original derivation of (12.4), (12.5) in [41] was formal. A rigorous treatment may be found in [14, Chap. III].
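For scalar processes, equations (12.2)-(12.5) can be simulated directly. The following sketch is an illustration only, with assumed constant coefficients and a crude Euler-Maruyama discretization; it is not a reproduction of [41].

```python
import numpy as np

# Scalar Kalman-Bucy filter:  d xi = alpha xi dt + sigma dw,
#                             d eta = alpha1 xi dt + sigma1 dw1.
rng = np.random.default_rng(0)
alpha, sigma, alpha1, sigma1 = -0.5, 0.4, 1.0, 0.3
dt, n_steps = 1e-3, 5000
a, a1 = 0.5 * sigma**2, 0.5 * sigma1**2      # 2a = sigma^2, 2a1 = sigma1^2

xi = 1.0                      # true (unobserved) state
xi_hat, Q = 0.0, 1.0          # filter mean and conditional variance

for _ in range(n_steps):
    dw, dw1 = rng.normal(scale=np.sqrt(dt), size=2)
    d_eta = alpha1 * xi * dt + sigma1 * dw1              # observation increment
    xi += alpha * xi * dt + sigma * dw                   # true state step

    # (12.4): dQ/dt = 2 alpha Q + 2a - (alpha1 Q)^2 / (2a1)
    Q += dt * (2 * alpha * Q + 2 * a - (alpha1 * Q) ** 2 / (2 * a1))
    # (12.5): d xi_hat = alpha xi_hat dt + Q alpha1 (2a1)^{-1} (d eta - alpha1 xi_hat dt)
    xi_hat += alpha * xi_hat * dt + Q * alpha1 / (2 * a1) * (d_eta - alpha1 * xi_hat * dt)
```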

For the nonlinear mean square optimal filtering problem one generally needs to know the conditional distribution of ξ(t) to find ξ̂(t). Let q(t) denote the density of the conditional distribution; actually q = q(t, y, ω), where y ∈ R^n, ω ∈ Ω. Under appropriate assumptions on the functions b, b₁, the conditional density evolves according to the stochastic differential integral equation in function space:

(12.6)   dq = A*q dt + q(b₁ − b̂₁)*(2a₁)^{−1}σ₁ dw̄.

Here A* is adjoint to the differential generator A of the process ξ, b̂₁ = E{b₁ | 𝒜_t}, and σ₁ dw̄ = dη − b̂₁ dt. See [47], [64], [84] or [100].

Besides continuous state parameter processes, the case when ξ is a finite state Markov chain has been treated. Let white noise corrupted observations of b₁(t, ξ(t)) be made according to (12.1); and let q_i(t) denote the conditional probability that ξ(t) is in state i given the observed data. Then (12.6) is replaced by a system of stochastic differential equations [106, Chap. 2], [120] for the q_i. For some related optimal control problems involving Markov chains see [106, Chap. 3], [112].

12.1. System identification problems. These concern situations where the system equations (3.3) are not completely known. Suppose that the function b depends on a finite number of unknown parameters μ₁, ···, μ_m. Taking a Bayesian viewpoint, let (a priori) distributions for these parameters be given initially. By treating the μ_i as additional (constant) states and adjoining the m equations dμ_i = 0 to (3.3), the problem of estimating the parameters can be treated as a filtering problem. In certain situations a maximum likelihood technique leads to results when the mean square optimal filter is too complicated to implement.


For instance, if (3.3) is deterministic (σ ≡ 0) and the initial distributions of the μ_i are Gaussian, then the maximum likelihood technique leads to relatively simple minimum problems of the type in calculus of variations [30, Section 4]. A computational procedure for dealing with these problems is treated in [32].
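The simplest instance of the state-augmentation device is worth recording here as an illustration (it is not in the original text): estimating a single unknown constant drift from noisy observations is already a linear filtering problem, so (12.4)-(12.5) apply verbatim. Take n = 1, let the unknown parameter μ itself be the state, dμ = 0, and observe dη = μ dt + σ₁ dw₁. Then α = 0, 2a = 0, α₁ = 1, 2a₁ = σ₁², and (12.4) becomes

\[ \frac{dQ}{dt} = -\frac{Q^2}{\sigma_1^2}, \qquad Q(t) = \frac{Q(s)}{1 + Q(s)(t-s)/\sigma_1^2}, \]

while (12.5) reads

\[ d\hat{\mu} = \frac{Q(t)}{\sigma_1^2}\bigl(d\eta - \hat{\mu}\,dt\bigr). \]

The conditional variance decays like σ₁²/(t − s) for large t − s, as one expects from averaging white-noise-corrupted observations of a constant.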

12.2. Partially observable optimal control problems. Let us now suppose that the system can be controlled, according to the stochastic differential equations (5.1), with initial data ξ(s) as in § 5. The data available to the controller is assumed to be η(r) for s ≤ r ≤ t, where η obeys (12.1). The problem is to minimize (5.4) among all controls which are, in some sense or other, functions of the available data. Two ways have been suggested to formulate this idea.

(a) Let u be a nonanticipative process with values in K and E ∫ |u(t)|² dt < ∞. Nonanticipative means that u is measurable on [s, T] × Ω and that u(t) is ℱ_t-measurable for each t, where {ℱ_t} is an increasing family of σ-algebras such that the Brownian increments w(r₂) − w(r₁) for t ≤ r₁ ≤ r₂ are independent of ℱ_t. Then (5.1), (12.1) with the initial data for ξ(s) and η(s) = 0 have a unique solution ξ, η, which are continuous nonanticipative processes. Let 𝒜_t^u be the σ-algebra generated by η(r) for s ≤ r ≤ t. The process u is admitted if u(t) is 𝒜_t^u-measurable. Actually, admissibility is a condition on the triple (u, ξ, η), not merely on the control process u.

(b) Following Wonham [123] or [14, Chap. 4], consider control policies which are functions on the space of past observations. More precisely, let 𝒞 denote the space of continuous functions from [s, T] into R^k, with the usual sup norm. A control policy is now a function V from [s, T] × 𝒞 into K, such that V(t, g) = V(t, ḡ) if g(r) = ḡ(r) for s ≤ r ≤ t. For g ∈ 𝒞 let

(π_t g)(r) = g(r),   s ≤ r ≤ t,
           = g(t),   r > t.

The control applied at time t is now to be

(12.7)   u(t) = V(t, π_t η).

Then (5.1) becomes a stochastic functional-differential equation; the existence and uniqueness of a solution is known if V is continuous and V(t, ·) satisfies a Lipschitz condition uniformly in t. Moreover, the process u in (12.7) is admissible in sense (a).

For technical reasons both definitions (a) and (b) have drawbacks. Quite possibly some modification of one or the other would be more appropriate.

12.3. The separation principle. It has been difficult to treat partially observable control problems unless the control aspects and statistical estimation aspects can somehow be separated. The best known example where such separation occurs is the partially observable linear regulator [62]. A somewhat more general result is the following separation principle, treated in [123]. Consider system equations with state and control entering linearly:

(12.8)   dξ = (α(t)ξ + β(t)u) dt + σ(t) dw,

with linear observation equations (12.3). Consider control policies V of type (b).


The conditional expectation ξ̂(t) obeys a slight modification of (12.5):

(12.5′)   dξ̂ = (αξ̂ + βu) dt + Qα₁*(2a₁)^{−1}σ₁ dw̄.

Moreover, it happens that 𝒜_t^u = 𝒜_t independent of the choice of V, where 𝒜_t was defined above when u ≡ 0. The conditional covariance is again given by (12.4), independent of V.

Suppose that the stopping time is fixed (τ ≡ T) and for simplicity take Φ = 0 in (5.4). Let

L̂(t, x̂, u) = ∫_{R^n} L(t, y, u) Γ(t, y − x̂) dy,

where Γ(t, ·) is the conditional density of ξ(t) − ξ̂(t), which is Gaussian with mean 0 and covariance Q(t). The separation principle states that, under certain assumptions, ξ̂(t) is a sufficient statistic for the optimal control problem. More precisely, regard ξ̂(t) as a new (completely observable) state which evolves according to (12.5′). Then an optimal Markov policy U^0 for the problem of minimizing

Ĵ(U) = E_{s,x̂} ∫_s^T L̂(t, ξ̂(t), U(t, ξ̂(t))) dt,

where x̂ = E{ξ(s)}, also gives a solution to the original partially observable problem. The corresponding optimal V^0 is related to U^0 by

V^0(t, π_t η) = U^0(t, ξ̂(t)).

Among the assumptions made in [123] are the strict convexity condition (6.7) on L and compactness of K. The latter assumption can be omitted if L satisfies certain growth conditions [58]. These growth conditions hold for instance in the linear regulator problem, for which L is positive and quadratic (see §§ 5, 6). In that problem

L̂ = x̂M(t)x̂ + uN(t)u + tr M(t)Q(t).
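A minimal numerical sketch of the separation principle for a scalar regulator is given below; all coefficients, the horizon, and the Euler-Maruyama discretization are assumptions made for illustration, and the code is not from [123] or [62]. The Riccati function Y of Example 6.1 is computed offline, and the certainty-equivalence control −N^{−1}βY(s)ξ̂(s) is applied to the filtered state produced by (12.5′).

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta, sigma, alpha1, sigma1 = 0.2, 1.0, 0.3, 1.0, 0.2
M, N, P, T, dt = 1.0, 0.5, 1.0, 2.0, 1e-3
n_steps = int(T / dt)

# Backward Riccati pass (Example 6.1), stored on the time grid.
Y = np.empty(n_steps + 1)
Y[-1] = P
for k in range(n_steps, 0, -1):
    Y[k - 1] = Y[k] + dt * (2 * alpha * Y[k] + M - (beta * Y[k]) ** 2 / N)

# Forward pass: true state, filter (12.5'), and certainty-equivalence control.
xi, xi_hat, Q = 1.0, 0.0, 1.0
a1 = 0.5 * sigma1 ** 2
for k in range(n_steps):
    u = -(beta / N) * Y[k] * xi_hat
    dw, dw1 = rng.normal(scale=np.sqrt(dt), size=2)
    d_eta = alpha1 * xi * dt + sigma1 * dw1
    xi += (alpha * xi + beta * u) * dt + sigma * dw
    Q += dt * (2 * alpha * Q + sigma ** 2 - (Q * alpha1) ** 2 / (2 * a1))
    xi_hat += (alpha * xi_hat + beta * u) * dt \
              + Q * alpha1 / (2 * a1) * (d_eta - alpha1 * xi_hat * dt)
```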

13. Some open questions. During the last 10 years considerable progress has been made in understanding the nature of stochastic optimal control problems. Nevertheless, the mathematical theory is not in completely satisfactory form. On the applied side, results of practical interest have been obtained only in special instances (for example the linear regulator); for nonlinear problems an additional assumption such as quite low dimension n or smallness of the noise coefficient σ seems necessary.

Let us list some open questions suggested by the discussion in the previous sections. The list is not intended to be complete.

Question 13.1. Approximate methods for computing optimal controls. This is currently an area of active research. One open question is to study the convergence from discrete to continuous parameters of the method mentioned in § 11(c). Another is to describe in more detail perturbations of optimal switching surfaces when σ is small (§ 11, method (e)).

Question 13.2. Completely observable, nonuniformly elliptic problems. It would be desirable to weaken some assumptions made for technical reasons in § 5. Perhaps the most crucial of these is uniform ellipticity (3.11). When uniform ellipticity does not hold, the boundary problem (6.2)-(6.3) may still have a solution φ′ in some generalized sense. However, it is not usually known whether φ′ is the function φ in (6.1). See [52], [53, Section 5].

An interesting class of problems is obtained if one replaces uniform ellipticity by the assumption that the backward operator ∂/∂s + A_U satisfies a condition of Hörmander [71, Theorem 1.1]. This condition guarantees that the backward operator is hypoelliptic if A_U has C^{(∞)} coefficients. It is satisfied, for instance, if (5.1) has the form

(13.1)   dξ = αξ(t) dt + β[u(t) dt + σ(t) dw]

and α, β are constant matrices satisfying the following complete controllability condition: there exists v such that the vectors βv, αβv, ···, α^{n−1}βv are linearly independent. For equations of type (13.1) Kushner [87] also obtained results on the existence of a fundamental transition density.

Question 13.3. The general partially observable problem. This was formulated in § 12 but has not been rigorously treated. In the dynamic programming approach the role of "state at time t" is taken by the conditional density q (or perhaps some function equivalent to q which obeys a slightly simpler equation than (12.6)). Thus the partially observable problem is essentially an optimal control problem with infinite-dimensional state space. Mortensen [100] derived a nonlinear integral-functional differential equation which corresponds to (6.2). This equation involves not only partial derivatives but Frechet-Volterra derivatives of orders 1 and 2. For the open loop case the necessary condition in § 9 can also be derived from Mortensen's formalism. It would be interesting to obtain general verification and existence theorems for Mortensen's equation corresponding to those in § 6.

There are various ways to approximate the general partially observable problem by problems with finite- (but sufficiently high) dimensional state spaces. One such approximation is to replace the continuous state process by a Markov chain [106, Chap. 3]. Another is to replace the continuous observation process η in (12.1) by data sampled at discrete instants of time; a treatment of this problem using partial differential equations was given in [54, Section 7]. Still a third way is to find some process ζ which obeys stochastic differential equations

(13.2)   dζ = b₂(t, ξ(t), ζ(t)) dt + σ₂(t) dw₂

such that ζ(t) is a fairly good sufficient statistic for the data η(r), s ≤ r ≤ t. By adjoining (13.2) to the system equations (5.1) we get a problem of the type in § 7, for which the states are (ξ(t), ζ(t)). When the separation principle holds one may take ζ(t) = ξ̂(t).

Another question of some interest is to weaken certain technical assumptions made in the rigorous treatment of the separation principle (Section 12).

Question 13.4. Discontinuous state processes. It would be interesting to give a systematic treatment of completely observable controlled Markov processes ξ which may be discontinuous. For general representations of such processes by stochastic integrals see [28]. The dynamic programming equation is still formally (6.2), at least if there are no lateral boundary conditions (B = R^n, τ ≡ T). However, A(s, x, v) need not be a partial differential operator.


Question 13.5. Other infinite-dimensional state space problems. One could consider distributed parameter stochastic control; see [88], [129] for linear-quadratic problems of that type. Another possibility is to replace (5.1) by stochastic differential equations with delays. The existence of solutions to such equations is treated in [60].

REFERENCES

[1] MASANAO AOKI, Optimization of Stochastic Systems, Academic Press, New York, 1967.
[2] N. I. ANDREYEV, Correlation Theory of Statistically Optimal Systems, W. B. Saunders, Philadelphia, Pennsylvania, 1969.
[3] A. A. FEL'DBAUM, Optimal Control Systems, Academic Press, New York, 1965.
[4] R. A. HOWARD, Dynamic Programming and Markov Processes, John Wiley, New York, 1960.
[5] HERMAN F. KARREMAN, ed., Stochastic Optimization and Control, Proc. Adv. Seminar, University of Wisconsin, 1967, John Wiley, New York, 1968.
[6] HAROLD J. KUSHNER, Stochastic Stability and Control, Academic Press, New York, 1967.
[7] J. H. LANING AND R. H. BATTIN, Random Processes in Automatic Control, McGraw-Hill, New York, 1956.
[8] V. PUGACHEV, Theory of Random Functions and its Application to Control Problems, Pergamon, Oxford, 1965.
[9] Y. SAWARAGI, Y. SUNAHARA AND T. NAKAMIZO, Statistical Decision Theory in Adaptive Control Systems, Academic Press, New York, 1967.
[10] R. L. STRATONOVICH, Conditional Markov Processes and their Application to the Theory of Optimal Control, American Elsevier, New York, 1968.
[11] R. L. STRATONOVICH, Topics in the Theory of Random Noise, vols. 1, 2, Gordon and Breach, New York, 1963.
[12] L. A. WAINSTEIN AND V. ZUBAKOV, Extraction of Signals from Noise, Prentice-Hall, Englewood Cliffs, New Jersey, 1962.
[13] N. WIENER, Extrapolation, Interpolation, and Smoothing of Stationary Time Series, John Wiley, New York, 1949.
[14] W. M. WONHAM, Random Differential Equations in Control Theory, in Probabilistic Methods in Applied Math., vol. II, A. T. Bharucha-Reid, ed., Academic Press, New York, 1969.
[15] A. V. BALAKRISHNAN AND LUCIEN W. NEUSTADT, eds., Mathematical Theory of Control, Proc. Symp., University of Southern California, 1967, Academic Press, New York, 1967.
[16] R. BELLMAN, Dynamic Programming, Princeton University Press, Princeton, 1957.
[17] RICHARD BELLMAN AND R. E. KALABA, Quasilinearization and Nonlinear Boundary-Value Problems, American Elsevier, New York, 1965.
[18] AUSTIN BLAQUIERE, Nonlinear System Analysis, Academic Press, New York, 1966.
[19] J. L. LIONS, Contrôle optimal de systèmes gouvernés par des équations aux dérivées partielles linéaires, Dunod, Paris, 1968.
[20] L. S. PONTRYAGIN, V. G. BOLTYANSKII, R. V. GAMKRELIDZE AND E. F. MISHCHENKO, The Mathematical Theory of Optimal Processes, John Wiley, New York, 1962.
[21] AVNER FRIEDMAN, Partial Differential Equations of Parabolic Type, Prentice-Hall, Englewood Cliffs, New Jersey, 1964.
[22] O. A. LADYZHENSKAYA AND N. N. URAL'TSEVA, Linear and Quasilinear Elliptic Equations, Academic Press, New York, 1968.
[23] O. A. LADYZHENSKAYA, V. A. SOLONNIKOV AND N. N. URAL'TSEVA, Linear and Quasilinear Equations of Parabolic Type, Izd-vo "Nauka", Moscow, 1967. (In Russian.)
[24] R. M. BLUMENTHAL AND R. K. GETOOR, Markov Processes and Potential Theory, Academic Press, New York, 1968.
[25] J. L. DOOB, Stochastic Processes, John Wiley, New York, 1953.
[26] E. B. DYNKIN, Markov Processes, Springer-Verlag, Berlin, 1965.
[27] I. I. GIKHMAN AND A. V. SKOROKHOD, Introduction to the Theory of Random Processes, Nauka, Moscow, 1965. (In Russian.)
[28] A. V. SKOROKHOD, Studies in the Theory of Random Processes, Addison-Wesley, Reading, Massachusetts, 1965.


[29] A. V. BALAKRISHNAN, A general theory of nonlinear estimation problems in control systems, J. Math. Anal. Appl., 8 (1964), pp. 4-30.

[30] A. V. BALAKRISHNAN, Stochastic system identification techniques, in [5], pp. 65-89.
[31] A. V. BALAKRISHNAN, A new computing technique in optimal control, SIAM J. Control, 6 (1968), pp. 149-173.
[32] A. V. BALAKRISHNAN, A new computing technique in system identification, to appear.
[33] J. F. BARRETT, Application of Kolmogorov's equations to randomly disturbed automatic control systems, Proc. First Intern. Congr. Intern. Federation Autom. Control, Moscow, 1960, vol. 2, Butterworth, London, 1961, pp. 724-729.
[34] L. D. BERKOVITZ AND H. POLLARD, A nonclassical variational problem arising from an optimal filter problem, Arch. Rational Mech. Anal., 26 (1967), pp. 281-304.
[35] N. M. BLACHMAN, On the effect of noise in a non-linear control system, Proc. First Intern. Congr. Intern. Federation Autom. Control, Moscow, 1960, vol. 2, Butterworth, London, 1961, pp. 770-773.
[36] D. BLACKWELL, Discrete dynamic programming, Ann. Math. Statist., 33 (1962), pp. 719-726.
[37] JOHN V. BREAKWELL, Stochastic optimization problems in space guidance, in [5], pp. 91-100.
[38] J. V. BREAKWELL, J. L. SPEYER AND A. E. BRYSON, Optimization and control of nonlinear systems using the second variation, SIAM J. Control, 1 (1963), pp. 193-223.
[39] J. V. BREAKWELL AND F. TUNG, Minimum effort control of several terminal components, SIAM J. Control, 2 (1965), pp. 295-316.
[40] H. BREZIS AND G. STAMPACCHIA, Sur la régularité de la solution d'inéquations elliptiques, Bull. Soc. Math. France, 96 (1968), pp. 153-180.
[41] R. S. BUCY AND R. E. KALMAN, New results in linear filtering and prediction theory, ASME Transactions, Part D (J. of Basic Engr.), 83 (1961), pp. 95-108.
[42] W. F. CASHMAN AND W. M. WONHAM, A computational approach to optimal control of stochastic saturating systems, Internat. J. Control, 10 (1969), pp. 77-98.
[43] HERMAN CHERNOFF, Optimal stochastic control, Sankhya, Ser. A, 30 (1968), pp. 221-252.
[44] C. DERMAN, Markovian sequential control processes-denumerable state space, J. Math. Anal. Appl., 10 (1965), pp. 295-302.
[45] PETER DORATO, CHANG-MING HSIEH AND PRENTISS N. ROBINSON, Optimal bang-bang control of linear stochastic systems with a small noise parameter, IEEE Trans. Automatic Control, AC-12 (1967), pp. 682-689.
[46] STUART E. DREYFUS, Introduction to stochastic optimization and control, in [5], pp. 3-23.
[47] T. E. DUNCAN, Probability densities for diffusion processes with applications to nonlinear filtering theory and detection theory, Tech. Rep. TR 7001-4, Stanford Univ. Systems Theory Lab., Stanford, California, 1967.
[48] E. B. DYNKIN, Controlled random sequences, Theor. Probability Appl., 10 (1965), pp. 1-14.
[49] E. B. DYNKIN, Sufficient statistics for the optimal stopping problem, Teor. Verojatnost. i Primenen., 13 (1968), no. 1, pp. 150-152.
[50] J. R. FISHER, Optimal nonlinear filtering, Advances in Control Systems, vol. 5, C. T. Leondes, ed., Academic Press, New York, 1967, pp. 197-300.
[51] W. H. FLEMING, Some Markovian optimization problems, J. Math. Mech., 12 (1963), pp. 131-140.
[52] W. H. FLEMING, The Cauchy problem for degenerate parabolic equations, Ibid., 13 (1964), pp. 987-1008.
[53] W. H. FLEMING, Duality and a priori estimates in Markovian optimization problems, J. Math. Anal. Appl., 16 (1966), pp. 254-279; Erratum, Ibid., 19 (1966), p. 204.
[54] W. H. FLEMING, Optimal control of partially observable diffusions, SIAM J. Control, 6 (1968), pp. 194-214.
[55] W. H. FLEMING, Stochastic Lagrange multipliers, in [15], pp. 433.
[56] W. H. FLEMING, Some problems of optimal stochastic control, in [5], pp. 59-64.
[57] W. H. FLEMING, The Cauchy problem for a nonlinear first-order partial differential equation, J. Differential Equations, 5 (1969), pp. 515-530.
[58] W. H. FLEMING, Controlled diffusions under polynomial growth conditions, in [127].
[59] W. H. FLEMING, Stochastic control for small noise intensities, SIAM J. Control, submitted.
[60] W. H. FLEMING AND MAKIKO NISIO, On the existence of optimal stochastic controls, J. Math. and Mech., 15 (1966), pp. 777-794.
[61] J. J. FLORENTIN, Optimal control of continuous time Markov stochastic systems, J. Electronics Control, 10 (1961), pp. 473-488.
[62] J. J. FLORENTIN, Partial observability and optimal control, Ibid., 13 (1962), pp. 263-279.


[63] A. FRIEDMAN, Optimal controlfor parabolic equations, J. Math. Anal. Appl., 18 (1967), pp. 479- 491.

[64] P. A. FROST, Nonlinear estimation in continuous time systems, Tech. Rep. TR 6304-4 Stanford Univ. Systems Theory Lab., Stanford, California, 1968.

[65] I. V. GIRSANOV, On transforming a certain class of stochastic processes by absolutely continuous substitution of measures, Theor. Probability Appl., 5 (1960), pp. 285-301.

[66] , Some minimax problems in the theory of controlled Markov processes, Ibid., 7 (1962), p. 223.

[67] B. I. GRIEGELIONIS AND A. N. SHIRYAEV, On Stefan's problem and optimal stopping rules for Markov processes, Ibid., 11 (1966), pp. 541-558.

[68] , On controlled Markov processes and the Stefan problem, Problemy Peredachi Informacii, 4 (1968), pp. 60-72.

[69] H. HALKIN, Nonlinear nonconvex programming in an infinite dimensional space, in [15], pp. 10-25. [70] Y. C. Ho AND R. C. K. LEE, A Bayesian approach to problems in stochastic estimation and control,

Proc. Joint Autom. Controls Conf., Stanford University, 1964, pp. 382-387. [71] LARs HORMANDER, Hypoelliptic second order differential equations, Acta Math., 119 (1968),

pp. 147-171. [72] DONALD E. JOHANSEN, Optimal control of linear stochastic systems with complexity constraints,

Advances in Control Systems, vol. 4, C. T. Leondes, ed., Academic Press, New York, 1966, pp. 181-278.

[73] T. KAILATH, A general likelihood-ratio formula for random signals in gaussian noise, IEEE Trans. Information Theory, 5 (1969), pp. 350-361.

[74] R. E. KALMAN, A new approach to linear filtering and prediction problems, ASME Transactions, Part D (J. of Basic Engr.), 82 (1960), pp. 35-45.

[75] , The theory of optimal control and calculus of variations, Mathematical Optimization Techniques, University of California Press, Berkeley, California, 1963, pp. 309-331.

[76] A. J. KLEINMAN AND H. J. KUSHNER, Numerical methods for the solution of degenerate nonlinear elliptic equations arising in optimal stochastic control theory, IEEE Trans. Automatic Control, AC-13 (1968), pp. 344-353.

[77] A. N. KOLMOGOROV, Interpolation and extrapolation of stationary random sequences, Bull. Acad. Sci. USSR Math., 5 (1941), pp. 3-14.

[78] N. N. KRASOVSKII, On optimum control with discrete feedback signals, J. Differential Equations, 1 (1965), pp. 1111-1121.

[79] N. V. KRYLOV, On quasidiffusion processes, Theor. Probability Appl., 11 (1966), pp. 373-389.

[80] HAROLD J. KUSHNER, Near optimal control in the presence of small stochastic perturbations, ASME Transactions, Part D (J. of Basic Engr.), 87 (1965), pp. 103-108.

[81] HAROLD J. KUSHNER, On the stochastic maximum principle: fixed time of control, J. Math. Anal. Appl., 11 (1965), pp. 78-92.

[82] HAROLD J. KUSHNER, On the existence of optimal stochastic controls, SIAM J. Control, 3 (1966), pp. 463-474.

[83] HAROLD J. KUSHNER, Optimal discounted stochastic control for diffusion processes, SIAM J. Control, 5 (1967), pp. 520-531.

[84] HAROLD J. KUSHNER, Dynamical equations for optimal nonlinear filtering, J. Differential Equations, 3 (1967), pp. 179-190.

[85] HAROLD J. KUSHNER, Approximations to optimal nonlinear filters, IEEE Trans. Automatic Control, AC-12 (1967), pp. 546-556.

[86] HAROLD J. KUSHNER, The concept of invariant set for stochastic dynamical systems and applications to stochastic stability, in [5], pp. 47-57.

[87] HAROLD J. KUSHNER, The Cauchy problem for a class of degenerate parabolic equations and asymptotic properties of the related diffusion processes, J. Differential Equations, 6 (1969), pp. 209-231.

[88] HAROLD J. KUSHNER, On the optimal control of linear distributed parameter systems with white noise input, SIAM J. Control, 6 (1968), pp. 596-614.

[89] J. P. LA SALLE, The time-optimal control problem, Contributions to the Theory of Nonlinear Oscillations, vol. V, Princeton University Press, Princeton, 1960, pp. 1-24.

[90] H. LEWY AND G. STAMPACCHIA, On the regularity of the solution of a variational inequality, Comm. Pure Appl. Math., 22 (1969), pp. 153-188.

[91] WM. C. LINDSEY AND CHAS. L. WEBER, On the theory of automatic phase control, in [5], pp. 101-132.

[92] R. SH. LIPTSER, A comparison of linear and nonlinear filtration of some Markov processes, Theor. Probability Appl., 11 (1966), pp. 467-472.

[93] R. SH. LIPTSER AND A. N. SHIRYAEV, Extrapolation of multi-dimensional Markov processes with incomplete data, Teor. Verojatnost i Primenen, 13 (1968), pp. 17-38.

[94] E. J. MCSHANE, Integrals devised for special purposes, Bull. Amer. Math. Soc., 69 (1963), pp. 597-627.

[95] E. J. MCSHANE, Stochastic functional equations: continuity properties and relation to ordinary equations, in [127].

[96] P. MANDL, On the control of non-terminating diffusion processes, Theor. Probability Appl., 9 (1964), pp. 591-603.

[97] P. MANDL, Analytical methods in the theory of controlled Markov processes, Trans. Fourth Prague Conf. on Info. Theory, Stat. Decision Fns., Random Processes, Prague, 1965, Academia, Prague, 1967, pp. 45-53.

[98] P. MANDL, On the control of the Wiener process with restricted number of switchings, Theor. Probability Appl., 12 (1967), pp. 68-76.

[99] A. Z. MIERI, A new approach to the general problem of optimal filtering and control of stochastic systems, Doctoral thesis, University of California, Berkeley, California, 1967.

[100] R. E. MORTENSEN, Stochastic optimal control with noisy observations, Internat. J. Control, 4 (1966), pp. 455-464.

[101] R. E. MORTENSEN, Optimal control of continuous-time stochastic systems, Rep. ERL66-1, Electronic Research Laboratory, Berkeley, California, 1966.

[102] L. W. NEUSTADT, An abstract variational theory with applications to a broad class of optimization problems, SIAM J. Control, 4 (1966), pp. 505-525.

[103] L. S. ORNSTEIN AND G. E. UHLENBECK, On the theory of Brownian motion, Phys. Rev., 36 (1930), pp. 823-841; reprinted in: Selected Papers on Noise and Stochastic Processes, N. Wax, ed., Dover, New York, 1954.

[104] A. A. SAMARSKII, On an economical method for the solution of a multidimensional parabolic equation in an arbitrary region, USSR Comput. Math. and Math. Phys., 2 (1963), pp. 894-926.

[105] A. N. SHIRYAEV, Stochastic equations for nonlinear filtering of Markov step processes, Problemy Peredachi Informatsii, 2 (1966), no. 3, pp. 3-22.

[106] , Some new results in the theory of controlled random processes, Trans. Fourth Prague Conf. on Info. Theory, Stat. Decision Fns., Random Processes, Prague, 1965, Academia, Prague, 1967, pp. 113-203. (In Russian.)

[107] A. N. SHIRYAEV, On two problems of sequential analysis, Kibernetika, (1967), no. 2, pp. 79-86.

[108] R. L. STRATONOVICH, On the theory of optimal control. An asymptotic method for solving the diffusive alternative equation, Automat. Remote Control, 23 (1962), pp. 1352-1360.

[109] R. L. STRATONOVICH, A new representation for stochastic integrals and equations, SIAM J. Control, 4 (1966), pp. 362-371.

[110] C. STRIEBEL, Sufficient statistics in the optimum control of stochastic systems, J. Math. Anal. Appl., 12 (1965), pp. 576-592.

[111] D. W. STROOCK AND S. R. S. VARADHAN, Diffusion processes with continuous coefficients, Comm. Pure Appl. Math., to appear.

[112] D. D. SWORDER, On the control of stochastic systems. I, II, Internat. J. Control, 6 (1967), pp. 179-188.

[113] N. S. TRUDINGER, The Dirichlet problem for quasilinear uniformly elliptic equations in n variables, Arch. Rational Mech. Anal., 27 (1968), pp. 109-119.

[114] N. S. TRUDINGER, Pointwise estimates and quasilinear parabolic equations, Comm. Pure Appl. Math., 21 (1968), pp. 205-226.

[115] L. VAN MELLAERT, Inclusion Probability Optimal Control, Res. Rep. PIBMRI 1364-67, Polytechnic Institute of Brooklyn, New York, 1967.

[116] R. VAN SLYKE AND R. WETS, Programming under uncertainty and stochastic optimal control, SIAM J. Control, 4 (1966), pp. 179-193.

[117] H. S. WITSENHAUSEN, A counterexample in stochastic optimum control, SIAM J. Control, 6 (1968), pp. 131-147.

[118] E. WONG AND M. ZAKAI, On the convergence of ordinary integrals to stochastic integrals, Ann. Math. Statist., 36 (1965), pp. 1560-1564.

[119] E. WONG AND M. ZAKAI, On the relation between ordinary and stochastic differential equations, Internat. J. Engrg. Sci., 3 (1965), pp. 213-229.

[120] W. M. WONHAM, Some applications of stochastic differential equations to optimal nonlinear filtering, SIAM J. Control, 2 (1965), pp. 347-369.

[121] W. M. WONHAM, Optimal stationary control of a linear system with state-dependent noise, SIAM J. Control, 5 (1967), pp. 486-500.

[122] W. M. WONHAM, Stochastic control, Lecture notes, Center for Dynamical Systems, Brown University, Providence, Rhode Island, 1967.

[123] W. M. WONHAM, On the separation theorem of stochastic control, SIAM J. Control, 6 (1968), pp. 312-326.

[124] W. M. WONHAM, Optimal stochastic control, Automatica, 5 (1969), pp. 113-118.

[125] Stochastic Problems in Control, Proc. IBM Sci. Comput. Symp. Control Theor. Appl., Yorktown Heights, New York, 1964.

[126] L. C. YOUNG, Stochastic integrals for nigh-martingales, MRC Tech. Summ. Rep. 937, University of Wisconsin, Madison, 1968.

[127] A. V. BALAKRISHNAN, ed., Calculus of Variations and Control Theory, Academic Press, New York, 1969.

[128] A. BENSOUSSAN, Sur l'identification et le filtrage de systèmes gouvernés par des équations aux dérivées partielles, Cahiers de l'IRIA, 1969, no. 1, [also Paris thesis].

[129] A. BENSOUSSAN, Contrôle optimal stochastique de systèmes gouvernés par des équations aux dérivées partielles, Rend. Mat. Rome, 2 (1969), pp. 135-173.

[130] H. O. FATTORINI, Time optimal control of solutions of operational differential equations, SIAM J. Control, 2 (1964), pp. 54-59.

[131] J. P. YVON, Application de la pénalisation à la résolution d'un problème de contrôle optimal, Cahiers de l'IRIA, 1968, [also Paris thesis].
