INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL
Int. J. Robust Nonlinear Control 2008; 18:218–247
Published online 16 May 2007 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/rnc.1193
Stochastic multi-player pursuit–evasion differential games
Dongxu Li1, Jose B. Cruz Jr1,*,† and Corey J. Schumacher2
1Department of Electrical and Computer Engineering, The Ohio State University, 205 Dreese Lab, 2015 Neil Ave., Columbus, OH 43202, U.S.A.
2Air Force Research Laboratory (AFRL/VACA), Wright-Patterson AFB, OH 45433, U.S.A.
SUMMARY
Autonomous aerial vehicles play an important role in military applications such as in search, surveillance and reconnaissance. Multi-player stochastic pursuit–evasion (PE) differential game is a natural model for such operations involving intelligent moving targets with uncertainties. In this paper, some fundamental issues of stochastic PE games are addressed. We first model a general stochastic multi-player PE differential game with perfect state information. To avoid the difficulty of multiplicity of the players, we extend the iterative method for deterministic multi-player PE games to the stochastic case. Starting from certain suboptimal solutions with an improving property, the optimization based on limited look-ahead can be used for improvement. The process converges when this improvement is applied iteratively. Furthermore, we introduce a hierarchical approach that can determine a valid starting point of the iterative process. As a basis for multi-player games, stochastic two-player PE games are also addressed. We also briefly discuss the games with imperfect state information and propose a suboptimal approach from a practical point of view. Finally, we demonstrate the usefulness and the feasibility of the method through simulations. Copyright © 2007 John Wiley & Sons, Ltd.
Received 6 June 2006; Revised 23 February 2007; Accepted 26 February 2007
KEY WORDS: pursuit–evasion; differential game; stochastic process; look-ahead
1. INTRODUCTION
Autonomous aerial vehicles (AAVs) have shown great potential value in reducing human workload in future military operations. Co-operation among multiple AAVs is a key factor. Important applications such as intelligent search, co-operative surveillance and reconnaissance of potential threats, and persistent area denial have drawn much attention [1–7]. Usually, the (potential) targets and threats in a battlefield are intelligent and mobile, and can employ counter-strategies to avoid being detected, tracked or destroyed. These action and counter-action behaviours can be naturally formulated in a game setting or, more specifically, by pursuit–evasion (PE) differential games (with multiple players). On the other hand, information sources in military applications are usually limited and involve uncertainties. In this paper, we study stochastic multi-player‡ PE differential games.

*Correspondence to: Jose B. Cruz Jr, Department of Electrical and Computer Engineering, The Ohio State University, 205 Dreese Lab, 2015 Neil Ave., Columbus, OH 43202, U.S.A.
†E-mail: [email protected], [email protected]

Contract/grant sponsor: Air Force Research Laboratory (AFRL/VA); contract/grant number: F33615-01-2-3154
Contract/grant sponsor: Air Force Office of Scientific Research (AFOSR)
The study of differential games was initiated by Rufus P. Isaacs when he investigated PE problems [8]. In a general PE game problem, one or a group of pursuers go after one or more evaders, and it is usually formulated as a zero-sum game in which pursuers try to minimize a prescribed cost functional while evaders try to maximize the same functional. In the literature, a number of formal solutions regarding optimal strategies have been achieved [8, 9]. However, most theoretical results mainly focus on two-player games with a single pursuer and a single evader, which are no longer adequate in dealing with the newly emergent situations involving multiple players.
Recently, the increasing use of autonomous assets and robots has led to renewed interest in PE games [1–3, 10, 11]. Hespanha et al. [2, 10] formulated PE games in discrete time under a probabilistic framework, in which greedy and one-step Nash equilibrium strategies are solved, respectively. PE strategies were also studied by Antoniades et al. [3], where several heuristic solutions were attempted and compared. Furthermore, the system structure and various issues of implementation of PE strategies by AAV teams are discussed in [1, 11]. These approaches all deal with discrete-time problems, in which the problems of search and pursuit are intertwined. On the other hand, general (deterministic) multi-player PE games in continuous time are studied in [4, 5, 12, 13]. In [4], a suboptimal solution is obtained by a hierarchical decomposition method, and it is further generalized as a class of ‘structured’ suboptimal methods such that the resulting suboptimal solution has an improving property [5, 13]. In [5, 13], the optimization based on limited look-ahead is used to improve the suboptimal solution iteratively, and an optimal solution can be approached in the limit. In addition, the performance enhancement by limited look-ahead is further analysed in [12].
In this paper, we extend the previous results in deterministic multi-player PE games to the stochastic case, in which we show that the iterative method is still applicable. Under the framework of the hierarchical method, an analytical solution is derived for a two-player stochastic PE differential game using a simplified Dubins car model, and the finite expectation of capture time is analysed. This paper is mainly concerned with games with the perfect state information pattern, i.e. the players can measure the state variables perfectly. For games with imperfect state information, we provide a suboptimal approach from a practical point of view.

The paper is organized as follows. In Section 2, a stochastic multi-player PE game with perfect state information is formulated based on [14]. In Section 3, we first point out the difficulty due to multiplicity of the players. Then, to extend the relevant results in deterministic games [5, 13], we use suboptimal ‘structured’ control methods with sample-set-time-consistency (SSTC) to initiate our improvement iteration. Specifically, a hierarchical method is introduced. The resulting suboptimal solution can be improved by the optimization based on limited look-ahead. This improvement may be applied iteratively, and the process converges. We further investigate two-player stochastic games based on conventional stochastic control theory. Finally, problems with an imperfect information pattern are briefly discussed. In Section 4, the usefulness and the feasibility of the limited look-ahead method are demonstrated through two selected stochastic PE scenarios. Finally, conclusions are drawn in Section 5.

‡Here, ‘multi-player’ means multiple pursuers and multiple evaders.
2. STOCHASTIC GAME FORMULATION
Consider a general PE differential game with $N$ pursuers and $M$ evaders in an $n_0$-dimensional space $S \subseteq \mathbb{R}^{n_0}$. Denote by $x_p^i$ ($x_e^j$) the state variable of pursuer $i$, $i = 1, \dots, N$ (evader $j$, $j = 1, \dots, M$), where $x_p^i \in \mathbb{R}^{n_p^i}$ ($x_e^j \in \mathbb{R}^{n_e^j}$). Here, $n_p^i, n_e^j, n_0 \in \mathbb{Z}^+ \triangleq \{n \in \mathbb{Z} : n > 0\}$, and $n_p^i, n_e^j \ge n_0$.
The first $n_0$ elements in state $x_p^i$ ($x_e^j$) denote the physical position of pursuer $i$ (evader $j$) in the common space $S$. The dynamics of pursuer $i$ and evader $j$ are described by (1), where, for simplicity, we use the subscript $t$ to represent time.

$$dx_{pt}^i = f_p^i(x_{pt}^i, a_{it})\,dt + \sigma_p^i(x_{pt}^i)\,dw_{pt}^i \quad \text{with } x_p^i(t_0) = x_{p0}^i \tag{1a}$$

$$dx_{et}^j = f_e^j(x_{et}^j, b_{jt})\,dt + \sigma_e^j(x_{et}^j)\,dw_{et}^j \quad \text{with } x_e^j(t_0) = x_{e0}^j \tag{1b}$$
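In (1), $x_{pt}^i \in \mathbb{R}^{n_p^i}$ and $x_{et}^j \in \mathbb{R}^{n_e^j}$ for $t \ge t_0$; $a_{it} \in A_a^i$ and $b_{jt} \in B_a^j$, where $A_a^i \subset \mathbb{R}^{m_p^i}$ and $B_a^j \subset \mathbb{R}^{m_e^j}$ are compact sets with $m_p^i, m_e^j \in \mathbb{Z}^+$; $w_{pt}^i \in W_p^i \subseteq \mathbb{R}^{k_p^i}$ ($w_{et}^j \in W_e^j \subseteq \mathbb{R}^{k_e^j}$) is a standard Wiener process; $\sigma_p^i$ ($\sigma_e^j$) is a map $\sigma_p^i : \mathbb{R}^{n_p^i} \to \mathbb{R}^{n_p^i \times k_p^i}$ ($\sigma_e^j : \mathbb{R}^{n_e^j} \to \mathbb{R}^{n_e^j \times k_e^j}$). In (1), $f_p^i : \mathbb{R}^{n_p^i} \times A_a^i \to \mathbb{R}^{n_p^i}$ ($f_e^j : \mathbb{R}^{n_e^j} \times B_a^j \to \mathbb{R}^{n_e^j}$) and $\sigma_p^i(\cdot)$ ($\sigma_e^j(\cdot)$) are bounded and Lipschitz in $x_p^i$ ($x_e^j$), uniformly with respect to $a_{it}$ ($b_{jt}$) [14]. The additive disturbance in this model is used mainly for mathematical convenience; however, in practice, it may stand for the pursuers’ ignorance of the evaders’ motions [6] or the disturbance on the movement of small AAVs, e.g. random wind effects.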
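A sample path of a model of the form (1) can be approximated numerically with the Euler–Maruyama scheme. The following is a minimal sketch under stated assumptions, not part of the original formulation: the function `euler_maruyama_step`, the example `drift` (a unit-speed planar vehicle whose control is its heading) and the constant `diffusion` matrix are hypothetical choices made only for illustration.

```python
import math
import random


def euler_maruyama_step(x, u, drift, diffusion, dt, rng):
    """Advance x by one step of dx = f(x, u) dt + sigma(x) dw."""
    f = drift(x, u)
    sigma = diffusion(x)                     # n-by-k matrix, given as a list of rows
    k = len(sigma[0])
    # Wiener increments over a step of length dt are N(0, dt).
    dw = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(k)]
    return [
        x[r] + f[r] * dt + sum(sigma[r][c] * dw[c] for c in range(k))
        for r in range(len(x))
    ]


# Hypothetical player: planar position, unit speed, heading u as the control.
def drift(x, u):
    return [math.cos(u), math.sin(u)]


# Hypothetical constant diffusion matrix (assumption, not from the paper).
def diffusion(x):
    return [[0.1, 0.0], [0.0, 0.1]]
```

Calling `euler_maruyama_step` repeatedly with a seeded `random.Random` instance yields one reproducible trajectory; smaller values of `dt` give a closer approximation of the continuous-time process.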
For pursuer $i$, define the projection operator $P : \mathbb{R}^{n_p^i} \to \mathbb{R}^{n_0}$ as $P(x_p^i) = [x_{p1}^i, x_{p2}^i, \dots, x_{pn_0}^i]^T$. Clearly, $P(x_p^i)$ represents the physical position of pursuer $i$. A similar operator can be defined for every pursuer and every evader, and we use the same notation $P(\cdot)$ for each of them. Evader $j$ is considered captured if there exists a pursuer $i$ such that $\|P(x_p^i(t)) - P(x_e^j(t))\| \le \epsilon$ for some $t \ge t_0$. Here, $\epsilon > 0$ is predefined. The capture time of evader $j$ is defined as

$$T_j = \min\{t \ge t_0 \mid \exists i,\ 1 \le i \le N, \text{ such that } \|P(x_p^i(t)) - P(x_e^j(t))\| \le \epsilon\}$$

In a multi-player PE game, the evaders are generally not captured simultaneously. A game terminates when all the evaders are captured, and the terminal time $T$ can be defined as $T = \max_j \{T_j\}$.
We use a discrete vector $z \in Z^M = Z \times Z \times \cdots \times Z$ with $Z = \{0, 1\}$ as additional states to indicate whether evader $j$ ($j = 1, \dots, M$) is captured. Here, the $j$th element of vector $z$ is $z_j = 1$ if evader $j$ is not captured and $z_j = 0$ if it is captured. According to the definition of the capture of an evader, $z_j$ is governed by the function $g_j : Z \times X \to Z$ that is defined as

$$g_j(0, x) = 0, \qquad g_j(1, x) = \begin{cases} 0 & \text{if } \|P(x_p^i(t)) - P(x_e^j(t))\| \le \epsilon \text{ for some } i,\ i = 1, \dots, N \\ 1 & \text{otherwise} \end{cases}$$

Here, $x$ is the aggregate state including all $x_p^i$ and $x_e^j$, and $X$ is the set of all possible $x$, which will be defined shortly.
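The capture rule and the latching behaviour of $z$ can be sketched in a few lines. The code below is an illustrative assumption about how one might evaluate $g_j$, $T_j$ and $T$ on states sampled at discrete times; the function names and the `snapshots` layout (a list of `(pursuer_states, evader_states)` pairs) are hypothetical, and sampling can only detect capture at the sampled instants.

```python
import math


def update_capture_flags(z, pursuers, evaders, n0, eps):
    """Apply g_j to every evader: a flag at 0 stays 0; a flag at 1 drops to 0 on capture."""
    new_z = []
    for z_j, x_e in zip(z, evaders):
        # P(.) keeps the first n0 state components, i.e. the physical position.
        captured = z_j == 0 or any(
            math.dist(x_p[:n0], x_e[:n0]) <= eps for x_p in pursuers
        )
        new_z.append(0 if captured else 1)
    return new_z


def capture_times(times, snapshots, n0, eps):
    """Return ([T_1, ..., T_M], T); T_j is None if evader j is never captured."""
    num_evaders = len(snapshots[0][1])
    z = [1] * num_evaders
    T = [None] * num_evaders
    for t, (pursuers, evaders) in zip(times, snapshots):
        new_z = update_capture_flags(z, pursuers, evaders, n0, eps)
        for j in range(num_evaders):
            if z[j] == 1 and new_z[j] == 0:
                T[j] = t                      # first sampled time at which evader j is caught
        z = new_z
    terminal = max(T) if all(t_j is not None for t_j in T) else None
    return T, terminal
```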
Assumption 1
Every pursuer can proceed to other evaders after it captures an evader, while every evader stops moving when it is captured.

With Assumption 1, the dynamics of evader $j$ in (1) is equivalent to $dx_{et}^j = z_{jt} \cdot f_e^j(x_{et}^j, b_{jt})\,dt + z_{jt} \cdot \sigma_e^j(x_{et}^j)\,dw_{et}^j$. Clearly, control $b_j$ is meaningful only when $z_{jt} = 1$. For simplicity, let

$$x_p = [x_p^{1T}, x_p^{2T}, \dots, x_p^{NT}]^T, \qquad x_e = [x_e^{1T}, x_e^{2T}, \dots, x_e^{MT}]^T$$

$$a = [a_1^T, a_2^T, \dots, a_N^T]^T, \qquad b = [b_1^T, b_2^T, \dots, b_M^T]^T$$

$$f_p = [f_p^{1T}, f_p^{2T}, \dots, f_p^{NT}]^T, \qquad f_e = [z_1 f_e^{1T}, z_2 f_e^{2T}, \dots, z_M f_e^{MT}]^T$$

$$w_p = [w_p^{1T}, w_p^{2T}, \dots, w_p^{NT}]^T, \qquad w_e = [w_e^{1T}, w_e^{2T}, \dots, w_e^{MT}]^T$$

$$\sigma_p(x_p) = \mathrm{Diag}(\sigma_p^1(x_p^1), \dots, \sigma_p^N(x_p^N)) \quad \text{and} \quad \sigma_e(x_e) = \mathrm{Diag}(z_1 \sigma_e^1(x_e^1), \dots, z_M \sigma_e^M(x_e^M))$$

where $\sigma_p(x_p)$ and $\sigma_e(x_e)$ are block diagonal matrices with the $\sigma_p^i$ and $z_j \sigma_e^j$ as the diagonal elements in a generalized sense;§ define $x = [x_p^T, x_e^T]^T$, $f(x, a, b) = [f_p^T(x, a), f_e^T(x, b)]^T$, $\sigma(x) = \mathrm{Diag}(\sigma_p(x_p), \sigma_e(x_e))$ and $w = [w_p^T, w_e^T]^T$. Then, Equation (1) can be rewritten as

$$dx_t = f(x_t, a_t, b_t)\,dt + \sigma(x_t)\,dw_t \quad \text{with } x(t_0) = x_0 \tag{2}$$

Let $X \triangleq \left(\prod_{i=1}^{N} \mathbb{R}^{n_p^i}\right) \times \left(\prod_{j=1}^{M} \mathbb{R}^{n_e^j}\right)$. Denote by $A_a = \prod_{i=1}^{N} A_a^i$ the set of possible controls of all the pursuers and similarly for the evaders, i.e. $B_a = \prod_{j=1}^{M} B_a^j$. In addition, let $g \triangleq [g_1, \dots, g_M]^T$, and the algebraic equation that governs state $z$ becomes

$$z(t) = z(t^+) = g(z(t^-), x(t)) \tag{3}$$

Here, $z(t^+)$ and $z(t^-)$ denote the right and the left limit of $z$ at time $t$.
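Definition 1
Let $(\Omega, \mathcal{F}, P)$ be a probability space, $\{\mathcal{F}_t\}$ a monotone family of $\sigma$-fields $\mathcal{F}_t \subseteq \mathcal{F}$ with $\mathcal{F}_{t_1} \subseteq \mathcal{F}_{t_2}$ for any $t_0 \le t_1 \le t_2$, and $X$ a complete separable metric space. A stochastic process $x$ defined on the time interval $[t_0, T]$,¶ $x(t) : \Omega \to X$ with $t_0 \le t \le T$, is said to be $\{\mathcal{F}_t\}_{t \ge t_0}$-progressively measurable if for all $t \in [t_0, T]$, the map $(t, \omega) \mapsto x(t, \omega)$ is $\mathcal{B}[t_0, t] \times \mathcal{F}_t / \mathcal{B}(X)$ measurable, where $\mathcal{B}[t_0, t]$ is the Borel $\sigma$-field on $[t_0, t]$ and $\mathcal{B}(X)$ is the Borel $\sigma$-field on $X$ (cf. [14]).

Define $K = \sum_{i=1}^{N} k_p^i + \sum_{j=1}^{M} k_e^j$. The sample space for the stochastic process in (2) belongs to the set

$$\Omega_t^{\omega} = \{\omega \in C([t, T]);\ \omega : [t, T] \to \mathbb{R}^K \text{ with } \omega_t = 0\}$$

for every $t \in [t_0, T]$ ($T > 0$), where $C([t, T])$ is the set of continuous functions on $[t, T]$. Denote by $\mathcal{F}_{t,s}^{\omega} \subseteq \mathcal{F}_s$ the $\sigma$-field generated by paths from time $t$ to $s$ (for any $t \le s \le T$) in $\Omega_t^{\omega}$. Provided with the Wiener measure $P_t^{\omega}$ on $\mathcal{F}_{t,T}^{\omega}$, $\Omega_t^{\omega}$ becomes the canonical sample space for the stochastic process of (2). Similarly, for $t \le s \le T$, define

$$\Omega_{t,s}^{\omega} = \{\omega \in C([t, s]);\ \omega : [t, s] \to \mathbb{R}^K \text{ with } \omega_t = 0\}$$

§Note that $\sigma_p(x_p)$ and $\sigma_e(x_e)$ are not necessarily square matrices.
¶Here, $T$ may be infinity, in which case the interval is right open.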
Information pattern in games is crucial to the players, especially in a stochastic game. In a PE game, the simplest case is that both pursuers and evaders can access the state variables perfectly, in which case a value function can be well defined. Situations become much more difficult when the players’ measurements are noisy and distinct, and in such cases the existence of solutions can be problematic. In this paper, we focus on problems with perfect state information; and the imperfect state information case is briefly discussed. Readers can refer to [15, 16] for more discussion on information patterns in stochastic games with imperfect state information.
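In a stochastic PE game, suppose that the players can measure the state perfectly.∥ We consider the following objective functional:

$$J(a, b, x_t, z_t) = E_{\omega,t}\left\{\int_t^T G(x_\tau, z_\tau, a_\tau, b_\tau)\,d\tau + Q(x_T)\right\} \quad \text{subject to (2) and (3)} \tag{4}$$

In (4), $\omega \in \Omega_{t,T}^{\omega}$; the terminal cost $Q : X \to \mathbb{R}_{\ge 0} \triangleq \{r \ge 0 \mid r \in \mathbb{R}\}$ and the cost rate $G : X \times Z^M \times A_a \times B_a \to \mathbb{R}_{\ge 0}$ are bounded and Lipschitz continuous in $x_t$, uniformly with respect to $a_t$ and $b_t$ [14]. Here, $E_{\omega,t}$ denotes the expectation taken with respect to the stochastic process $\omega$ starting from $t$. Henceforth, we will use $E_\omega$ for an abbreviated notation. An admissible control of the pursuers and that of the evaders, $a(\cdot)$ and $b(\cdot)$, are defined, respectively, as

$$a(\cdot) \in \mathcal{A}(t) \triangleq \{\phi : [t, T] \to A_a \mid \phi(\cdot) \text{ is } \mathcal{F}_{t,s}\text{-progressively measurable}\}$$

$$b(\cdot) \in \mathcal{B}(t) \triangleq \{\varphi : [t, T] \to B_a \mid \varphi(\cdot) \text{ is } \mathcal{F}_{t,s}\text{-progressively measurable}\}$$

for $t \le s \le T$. We say that $a(\cdot), \tilde{a}(\cdot) \in \mathcal{A}(t)$ are the same on the interval $[t, s]$, denoted by $a \approx \tilde{a}$ on $[t, s]$, if the probability $P_t^{\omega}(a = \tilde{a} \text{ a.e. in } [t, s]) = 1$. The same holds for $b(\cdot), \tilde{b}(\cdot) \in \mathcal{B}(t)$. Before defining the value function, we first define a strategy of the pursuers at time $t \ge t_0$ as a map $\alpha : \mathcal{B}(t) \to \mathcal{A}(t)$, where the pursuers’ control is a function of the input that evaders exploit. Similarly, the evaders’ strategy $\beta$ can be defined as $\beta : \mathcal{A}(t) \to \mathcal{B}(t)$.

Definition 2
A non-anticipative strategy $\alpha$ (or $\beta$) of the pursuers (or evaders) on $[t, T]$ is

$$\alpha \in \Gamma(t) \triangleq \{\alpha : \mathcal{B}(t) \to \mathcal{A}(t) \mid \text{for any } b(\cdot), \tilde{b}(\cdot) \in \mathcal{B}(t),\ b \approx \tilde{b} \text{ on } [t, s] \text{ implies } \alpha[b] \approx \alpha[\tilde{b}] \text{ on } [t, s] \text{ for every } s \in [t, T]\}$$

$$(\text{or } \beta \in \Delta(t) \triangleq \{\beta : \mathcal{A}(t) \to \mathcal{B}(t) \mid \text{for any } a(\cdot), \tilde{a}(\cdot) \in \mathcal{A}(t),\ a \approx \tilde{a} \text{ on } [t, s] \text{ implies } \beta[a] \approx \beta[\tilde{a}] \text{ on } [t, s] \text{ for every } s \in [t, T]\})$$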
This definition implies that the decision-making of the players only depends on the information up to the current time. More importantly, under the definitions of admissible control and non-anticipative strategy, the stochastic differential equation (2) has a unique solution [14]. More details about the formulation of a stochastic game problem can be found in [9, 17, 18].

∥Information is shared by all the pursuers or all the evaders assuming sufficient communication on either side.
A stochastic PE game is formulated as a zero-sum game. For any $x \in X$ and $z \in Z^M$, the lower value of a game $\underline{V}(x, z)$ is defined as

$$\underline{V}(x, z) = \inf_{\alpha \in \Gamma(t)} \sup_{b(\cdot) \in \mathcal{B}(t)} \{J(\alpha[b], b, x, z)\} = \inf_{\alpha \in \Gamma(t)} \sup_{b(\cdot) \in \mathcal{B}(t)} E_\omega\left\{\int_t^T G(x_\tau, z_\tau, \alpha[b]_\tau, b_\tau)\,d\tau + Q(x_T)\right\} \tag{5}$$

Similarly, the upper value $\overline{V}(x, z)$ is

$$\overline{V}(x, z) = \sup_{\beta \in \Delta(t)} \inf_{a(\cdot) \in \mathcal{A}(t)} \{J(a, \beta[a], x, z)\} = \sup_{\beta \in \Delta(t)} \inf_{a(\cdot) \in \mathcal{A}(t)} E_\omega\left\{\int_t^T G(x_\tau, z_\tau, a_\tau, \beta[a]_\tau)\,d\tau + Q(x_T)\right\} \tag{6}$$

It turns out that in (5), pursuers have an informational advantage and similarly for evaders in (6). In general, $\underline{V}(x, z) \le \overline{V}(x, z)$ [19, p. 434], and if $\underline{V}(x, z) = \overline{V}(x, z)$, we say that the value of the game (saddle-point equilibrium) exists, which is denoted by $V(x, z)$.** This is called the Isaacs condition. With these definitions, optimality of solutions can be interpreted according to $\underline{V}$, $\overline{V}$ or $V$. Without assuming the Isaacs condition, our study is focused on the upper value defined in (6).†† Hereafter, we use the capitalized ‘Value’ to stand for the (lower/upper) Value functions of a multi-player game.
3. SOLUTION TECHNIQUES
3.1. Difficulty of conventional differential game theory
Dynamic programming (DP) is a general method for solving stochastic differential games, in which the underlying idea is the state rollback, i.e. an optimal state trajectory is traced backwards starting from a certain terminal state. Specifically, the Value function is characterized by a Hamilton–Jacobi–Isaacs (HJI) equation, and initial conditions on terminal states are generally needed. In contrast to a two-player PE game, in a multi-player game, if pursuer $i$ catches evader $j$, we say that both players are engaged. Clearly, the possible terminal states depend on the specific engagement between the pursuers and the evaders, which is hard to specify when $N$ and $M$ are large. Furthermore, the treatment by the corresponding HJI equation for a multi-player game becomes more difficult because the additional discrete state $z$ is involved. In general, DP approaches cannot be directly applied to multi-player PE games [13].
3.2. An iterative method based on a suboptimal solution
As in deterministic PE games [13], we start with a class of suboptimal methods, where an additional structure $S$ is imposed on the pursuers’ controls. Given any states $x \in X$ and $z \in Z^M$, we denote by $\mathcal{A}_{x,z}^S(t)$ a non-empty ‘structured’ control set of the pursuers, with $\mathcal{A}_{x,z}^S(t) \subseteq \mathcal{A}(t)$. Here, the superscript and the subscript in $\mathcal{A}_{x,z}^S$ indicate the dependence on the structure $S$ and on the states $x$ and $z$. Suppose that for any $x \in X$, $z \in Z^M$ at time $t \ge t_0$, problem (6) is solvable with respect to $\mathcal{A}_{x,z}^S(t)$, i.e.

$$\tilde{V}(x, z) = \sup_{\beta \in \Delta(t)} \inf_{a(\cdot) \in \mathcal{A}_{x,z}^S(t)} E_\omega\left\{\int_t^T G(x_\tau, z_\tau, a_\tau, \beta[a]_\tau)\,d\tau + Q(x_T)\right\} \tag{7}$$

where $\omega \in \Omega_{t,T}^{\omega}$ and $\mathcal{A}_{x,z}^S(t)$ is the restricted control set under structure $S$ at time $t$. Since $\mathcal{A}_{x,z}^S(t) \subseteq \mathcal{A}(t)$, $\overline{V} \le \tilde{V}$. Given any $\tilde{a}^t(\cdot) \in \mathcal{A}_{x_t,z_t}^S(t)$, $x_t \in X$, $z_t \in Z^M$, $\beta \in \Delta(t)$ and $\omega \in \Omega_{t,T}^{\omega}$, denote by $x_{s;x_t,\tilde{a}^t,\beta[\tilde{a}^t],\omega}$ the trajectory of $x$ for $s \ge t$ starting from $x_t$ under $\tilde{a}^t$ and $\beta[\tilde{a}^t]$ corresponding to $\omega$. For short, let $\tilde{x}(s) = x_{s;x_t,\tilde{a}^t,\beta[\tilde{a}^t],\omega}$. Denote by $z_{s;z_t,\tilde{x}}$ the trajectory of $z$ corresponding to $\tilde{x}$ and use $\tilde{z}(s) = z_{s;z_t,\tilde{x}}$ for an abbreviated notation.

**The definition of the Value above is due to Varaiya [20], Roxin [21] and Elliott and Kalton [22, 23].
††This represents a worst case from the pursuers’ perspective, assuming that the pursuers’ control input is known by the evaders at each time $t$. This may not be true in practice, and if so the pursuers will have a better performance.
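Definition 3 (sample-set-time-consistency (SSTC))
A suboptimal control structure $S$ is said to be sample-set-time-consistent, or SST-consistent for short, if given any $\omega \in \Omega_{t,T}^{\omega}$, for any $\tilde{a}^t(\cdot) \in \mathcal{A}_{x_t,z_t}^S(t)$ at any $t$ with $t_0 \le t < T$ and the corresponding trajectories $\tilde{x}_s$ and $\tilde{z}_s$ under $\tilde{a}^t(\cdot)$ and any $\beta \in \Delta(t)$, there exists $\tilde{a}^s(\cdot) \in \mathcal{A}_{\tilde{x}_s,\tilde{z}_s}^S(s)$ such that $\tilde{a}^t(\tau) = \tilde{a}^s(\tau)$ for $s \le \tau < T$, where $\mathcal{A}_{\tilde{x}_s,\tilde{z}_s}^S(s)$ is associated with $\tilde{x}_s$ and $\tilde{z}_s$ at any $s$ for $t \le s < T$.

Remark 1
SSTC says that at any state $\tilde{x}(s)$, $\tilde{z}(s)$ along the trajectories for $t \le s < T$ given $\omega \in \Omega_{t,T}^{\omega}$, the later portion of a structured control $\tilde{a}^t(\cdot) \in \mathcal{A}_{x_t,z_t}^S(t)$, which is determined at time $t$, from time $s$ to $T$ ($\tilde{a}^t(\tau)$, $s \le \tau < T$), belongs to the structured control set $\mathcal{A}_{\tilde{x}_s,\tilde{z}_s}^S(s)$ determined at time $s$ associated with $\tilde{x}_s$ and $\tilde{z}_s$. Clearly, if structure $S$ is independent of the state $x$, $z$ and time $t$, it is SST-consistent.

Theorem 1
Under an SST-consistent control structure $S$, the function $\tilde{V}(x, z)$ in (7) satisfies (8) for any $x \in X$, $z \in Z^M$ with any $\Delta t > 0$ under (2) with (3).

$$\tilde{V}(x, z) \ge \sup_{\beta \in \Delta(t)} \inf_{a(\cdot) \in \mathcal{A}(t)} E_\omega\left\{\int_t^{t+\Delta t} G(x_\tau, z_\tau, a_\tau, \beta[a]_\tau)\,d\tau + \tilde{V}\left(x_{t+\Delta t;x,a,\beta[a],\omega},\ z_{t+\Delta t;z,x}\right)\right\} \tag{8}$$

Here, the upper limit $t + \Delta t$ is understood as $\min\{t + \Delta t, T\}$.

Proof
Refer to Appendix A.1. □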
Theorem 1 states that the suboptimal upper Value $\tilde{V}$ can be improved by the optimization over a limited look-ahead interval. On the right-hand side of (8), a simpler game problem is solved with a possibly shorter horizon and $\tilde{V}$ as a cost-to-go function.‡‡ The optimization is subject to the players’ dynamics (2) with (3).§§ It should be noted that, as in (6), here in the optimization over limited look-ahead intervals, the evaders have the informational advantage over the pursuers.

‡‡The term ‘cost-to-go’ states that it stands for the corresponding cost of a certain dynamic process from $t + \Delta t$ to $T$.
§§Hereafter, without explicit description, optimization problems similar to (8) are subjected to the same constraints.

Next, we define a transformation $H[W]$ for any function $W \in \mathcal{W} \triangleq \{W : X \times Z^M \to \mathbb{R}\}$ as

$$H[W](x, z) = \sup_{\beta \in \Delta(t)} \inf_{a(\cdot) \in \mathcal{A}(t)} E_\omega\left\{\int_t^{t+\Delta t} G(x_\tau, z_\tau, a_\tau, \beta[a]_\tau)\,d\tau + W\left(x_{t+\Delta t;x,a,\beta[a],\omega},\ z_{t+\Delta t;z,x}\right)\right\} \tag{9}$$

subject to (2) and (3). Then, a sequence of functions $W_k$ can be generated by $W_{k+1} = H[W_k]$ starting from some $W_0 \in \mathcal{W}$. The following theorem shows that the sequence converges if $W_0 = \tilde{V}$ in (7).

Theorem 2
(i) If $W_0(x, z)$ satisfies (8) for any $x \in X$ and $z \in Z^M$, then the sequence $\{W_k\}_{k=0}^{\infty}$ converges pointwise; (ii) the limit $W_\infty(x, z) \triangleq \lim_{k \to \infty} W_k(x, z)$ satisfies $W_\infty(x, z) = H[W_\infty](x, z)$.

Proof
Refer to Appendix A.2. □

Theorem 2 states that a suboptimal upper Value in (7) that results from ‘structured’ controls of SSTC can be improved iteratively by the optimization based on limited look-ahead. The limit of this process, $W_\infty$, is the best upper bound of $\overline{V}$ that can be approached by this scheme. A natural question is then whether $W_\infty$ is the true upper Value, which needs further investigation. The starting point can be [13, 14, 18].
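The monotone improvement behind Theorems 1 and 2 can be illustrated on a small discrete-time, discrete-state analogue. The sketch below is not the continuous-time operator (9) and is not taken from the paper: the one-dimensional separation model, the step sizes, the noise distribution and all names are assumptions chosen for illustration. The pursuer's action is minimized on the outside and the evader's on the inside, which mirrors the information pattern of the upper Value, and the starting function is the worst-case expected capture time under a restricted (‘structured’) pursuer control set.

```python
# Toy discrete analogue of W_{k+1} = H[W_k]; every number here is an illustrative assumption.
D = 10                                      # largest separation; separation 0 means captured
A_FULL = (1, 2)                             # pursuer step sizes (full control set)
A_STRUCT = (1,)                             # restricted "structured" control set
B = (0, 1)                                  # evader step sizes
NOISE = ((-1, 0.25), (0, 0.5), (1, 0.25))   # disturbance values and their probabilities


def lookahead(W, pursuer_actions):
    """One-step look-ahead: min over a, max over b of 1 + E[W(next separation)]."""
    new_W = [0.0] * (D + 1)                 # W(0) = 0 because the game is over at capture
    for d in range(1, D + 1):
        new_W[d] = min(
            max(
                1.0 + sum(p * W[min(D, max(0, d - a + b + n))] for n, p in NOISE)
                for b in B
            )
            for a in pursuer_actions
        )
    return new_W


def structured_value(tol=1e-10, max_iter=200_000):
    """Worst-case expected capture time with the pursuer restricted to A_STRUCT (plays W_0)."""
    W = [0.0] * (D + 1)
    for _ in range(max_iter):
        nxt = lookahead(W, A_STRUCT)
        if max(abs(u - v) for u, v in zip(nxt, W)) < tol:
            return nxt
        W = nxt
    raise RuntimeError("value iteration did not converge")


def improve(W0, steps):
    """Generate W_0, W_1, ..., W_steps using the full pursuer control set."""
    seq = [W0]
    for _ in range(steps):
        seq.append(lookahead(seq[-1], A_FULL))
    return seq
```

In this toy model each application of `lookahead` with the full control set can only lower the cost-to-go (up to the tolerance used to compute the starting function), so the iteration may be stopped at any step and still return the best bound found so far.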
3.3. The hierarchical (structured) suboptimal method
With the iterative method, a multi-player game reduces to finding a valid suboptimal upper Value function with the improving property in (8). In what follows, we will briefly introduce a hierarchical decomposition method [4, 13] that can determine a valid starting point for the iterative process. To illustrate the idea, we consider a class of games with the following objective functional¶¶ and the players’ dynamics in (2) with (3).

$$J(a, b, x, z) = \int_t^T \left[\sum_{j=1}^{M} z_j(\tau)\right] d\tau \tag{10}$$

The objective in (10) stands for the sum of the capture times of the evaders.

¶¶In general, in a game with an objective $J$, if there exist distributed objectives $J_j$ ($j = 1, \dots, M$) associated with each evader $j$ such that $J \le \sum J_j$, then the hierarchical method is applicable.

There are two levels in the hierarchical approach: the upper level is to determine a proper engagement scheme between the pursuers and the evaders, such that a multi-player PE game can be decomposed into distributed two-player PE games; at the lower level, the decoupled two-player games are solved [4]. The basic assumption of this approach is that the underlying two-player stochastic PE games are solvable, i.e. the upper Value $\overline{V}_{i_j j}(x_{i_j j})$ of the two-player game between any evader $j$ and its engaged pursuer $i_j$ may be solved and is available to the upper level, where $x_{i_j j} \triangleq [x_p^{i_j T}, x_e^{jT}]^T$. By (10), the objective functional for a decoupled two-player game is the expected capture time, i.e. $J = E\{\int d\tau\}$. Suppose that $M \le N$ and each evader (pursuer) can be engaged with no more than one pursuer (evader). Assume that there exists an engagement scheme $E$ such that $\overline{V}_{i_j j}(x_{i_j j}) < \infty$ for the game between any evader $j$ and its engaged pursuer $i_j$. Then, a combinatorial optimization problem to determine an optimal engagement between the pursuers and the evaders at the upper level can be formulated as

$$\tilde{V}^h(x, z) = \min_{\{\sigma_{ij}\}} \left\{\sum_{i=1}^{N} \sum_{j=1}^{M} \sigma_{ij} \overline{V}_{ij}(x_{ij})\right\} \quad \text{subject to } \sigma_{ij} \in \{0, 1\},\ \sum_{i=1}^{N} \sigma_{ij} = 1,\ \sum_{j=1}^{M} \sigma_{ij} \le 1 \tag{11}$$

In (11), the superscript $h$ in $\tilde{V}^h$ indicates that it is determined by the hierarchical approach; the assignment variables $\sigma_{ij}$ are binary, where $\sigma_{ij} = 1$ indicates that pursuer $i$ is engaged with evader $j$ and $\sigma_{ij} = 0$ if it is not.
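For small teams, the upper-level problem (11) can be solved by direct enumeration. The sketch below assumes that the two-player upper Values have already been computed and stored in a table `V[i][j]` (pursuer $i$ against evader $j$), that $M \le N$, and that each selected pairing contributes its Value to the total; the function name and the numbers in the usage example are hypothetical. Enumeration grows factorially with the team size, so a dedicated assignment algorithm would be preferable for larger problems.

```python
from itertools import permutations


def best_engagement(V):
    """Brute-force assignment: each evader gets exactly one pursuer, each pursuer at most one evader.

    V[i][j] is the (assumed known) two-player Value of pursuer i against evader j, with N >= M.
    Returns (total_cost, assignment), where assignment[j] is the pursuer engaged with evader j.
    """
    num_pursuers, num_evaders = len(V), len(V[0])
    best_cost, best_assignment = float("inf"), None
    # Each ordered selection of M distinct pursuers is one feasible engagement scheme.
    for assignment in permutations(range(num_pursuers), num_evaders):
        cost = sum(V[assignment[j]][j] for j in range(num_evaders))
        if cost < best_cost:
            best_cost, best_assignment = cost, assignment
    return best_cost, best_assignment


# Hypothetical Values for N = 3 pursuers and M = 2 evaders.
example_V = [[4.0, 2.0], [3.0, 5.0], [6.0, 1.0]]
```

With `example_V`, the call `best_engagement(example_V)` pairs pursuer 1 with evader 0 and pursuer 2 with evader 1, leaving pursuer 0 unengaged.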
The hierarchical method is a natural way of dealing with multiplicity of the players, in which an additional hierarchical structure is imposed on the pursuers’ controls, and thus $\overline{V} \le \tilde{V}^h$. It is also worth noting that the (best) strategy of the evaders determined ‘locally’ at the lower level against their engaged pursuers is optimal with respect to (7) if each evader can only be captured by its engaged pursuer. More importantly, the hierarchical approach is SST-consistent because the ‘structure’ (the set of possible engagements between the pursuers and the ‘alive’ evaders) remains the same for any state $x$ and time $t$ given any $z \in Z^M$. Therefore, $\tilde{V}^h$ is a valid starting point for the iterative method.

Problem (11) is formulated for the case $M \le N$. If $M > N$, some pursuer $i$ must be engaged with more than one evader, and pursuer $i$ proceeds to them sequentially. In this case, the (assignment) problem becomes a multi-stage allocation problem where each stage is associated with one engaged evader for pursuer $i$. Denote by $j_1^i, j_2^i, \dots, j_{n_i}^i$ the evaders engaged with pursuer $i$. Although the problem is similar, it entails intensive computation because the starting points of the subsequent games between pursuer $i$ and evaders $j_2^i, \dots, j_{n_i}^i$ depend on the terminal states of the games between pursuer $i$ and its previous evaders $j_1^i, j_2^i, \dots, j_{n_i - 1}^i$. These are random events, and the expectation embedded in the calculation of $\overline{V}_{i j_k^i}$ is difficult to compute. In practice, approximations may be taken such that a suboptimal engagement is solved instead.∥∥
In summary, we show that given a proper suboptimal solution to a stochastic multi-player PE game, the optimization based on limited look-ahead can be used to improve the solution iteratively. It should be noted that this method is closely related to DP methods, and it suffers the curse of dimensionality. Thus, scalability is an issue when the dimension of the states or the number of the players is large. Practical algorithms still need to be further investigated. To this end, those numerical methods for solving DP equations, which have been extensively studied [24–27], may benefit the implementation of the iterative method. On the other hand, despite lack of efficient algorithms, the iterative method still has its practical value in performance enhancement. The iterative process may stop at any step to provide the best suboptimal solution to date due to the monotonicity. In practice, this method can provide a satisfactory solution based on a carefully chosen cost-to-go function. We will later demonstrate the usefulness of this method through simulations.
3.4. Solution to a two-player pursuit–evasion game
Under the framework of the iterative method with the hierarchical approach, the solution to two-player games becomes a basis for multi-player games. In this section, we present the results on two-player games based on stochastic differential game (optimal control) theory.

∥∥In this case, the improving property in (8) may not hold.
3.4.1. Preliminary: Stochastic differential game theory. For the reader’s convenience, we first briefly review stochastic differential game theory based on [17]. Consider the following players’ dynamic equation:

$$dx_t = f(t, x_t, a_t, b_t)\,dt + \sigma(x_t)\,d\omega \tag{12}$$

Here, $x \in \mathbb{R}^n$; $a_t \in A_a$ and $b_t \in B_a$ are the controls of player 1 (minimizer) and player 2 (maximizer), where $A_a$ and $B_a$ are compact sets; $\omega$ is a standard Wiener process with proper dimension. Let the objective functional be

$$\hat{J}(a, b, t, x_t) = E_\omega\left\{\int_t^T G(\tau, x_\tau, a_\tau, b_\tau)\,d\tau + Q(x_T)\right\}$$

Here, $G : \mathbb{R} \times \mathbb{R}^n \times A_a \times B_a \to \mathbb{R}$ represents the cost rate; $Q : \mathbb{R}^n \to \mathbb{R}$ is the terminal cost; $T$ is the exit time, which is defined as $T = \min\{t \mid (t, x_t) \notin \mathcal{Q}\}$ with some open set $\mathcal{Q} \triangleq \mathcal{T} \times X$. Here, $X \subseteq \mathbb{R}^n$ is open and $\mathcal{T} = [0, T] \subset \mathbb{R}$. The non-anticipative strategies of player 1 and 2 are denoted by $\alpha \in \Gamma(t)$ and $\beta \in \Delta(t)$, respectively. Let $C^2(X)$ be the set of functions with continuous second-order derivatives on $X$. Denote by $C^{1,2}(\mathcal{Q})$ the set of functions $\phi(t, x)$ that have continuous first- and second-order partial derivatives $\phi_t$, $\phi_x$ and $\phi_{xx}$. Henceforth, we use $C^2$ and $C^{1,2}$ for simplicity. Define $C(X) \triangleq \{\psi : X \to \mathbb{R} \mid \psi \text{ is measurable on } X \text{ and bounded}\}$. Denote by $P(s, x_s, t, x_t)$ the transition probability density of $x_t \in X$ at $t$ given the state $x_s$ at time $s$ with $s < t$. For any $\psi \in C(X)$, define the operator $S_{s,t}$ as

$$S_{s,t}[\psi](x_s) = E\{\psi(x_t) \mid x_s\} = \int_X \psi(x) P(s, x_s, t, x)\,dx$$

Define the operator $\mathcal{X}(t)$ on $C(X)$ as

$$\mathcal{X}(t)[\psi](x_t) = \lim_{h \to 0} h^{-1}\left(S_{t,t+h}[\psi](x_t) - \psi(x_t)\right)$$

For the stochastic process $x_t$ defined in (12), which is a diffusion process [17], $\mathcal{X}(t)$ is a second-order partial differential operator on $C^2$, i.e.

$$\mathcal{X}(t)[\psi] = \psi_x \cdot f(t, x_t, a, b) + \tfrac{1}{2}\operatorname{tr}\left(\psi_{xx}\,\sigma(x_t)\sigma^T(x_t)\right) \tag{13}$$

Here, $\operatorname{tr}(M)$ denotes the trace of a square matrix $M$. Consider any function $\phi(t, x) \in C^{1,2}$; by Itô’s differential rule [17, 18],

$$\begin{aligned} d\phi(t, x) &= \phi_t(t, x)\,dt + \phi_x(t, x)\,dx + \tfrac{1}{2}\operatorname{tr}\left(\phi_{xx}\,\sigma(x_t)\sigma^T(x_t)\right)dt \\ &= \phi_t(t, x)\,dt + \mathcal{X}(t)\phi(t, x)\,dt + \phi_x\,\sigma(x_t)\,d\omega \end{aligned} \tag{14}$$

Lemma 3
Assume that

(i) functions $f$ and $\sigma$ in (12) satisfy

$$\|f(t, x, a, b)\| \le C(1 + \|x\|) \quad \text{and} \quad \|\sigma(x)\| \le C(1 + \|x\|)$$

for some constant $C > 0$, any $x \in X$, $a \in A_a$ and $b \in B_a$;
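(ii) the function $V \in C^{1,2}$, and there exist constants $D$ and $k$ such that

$$\|V(t, x)\| \le D(1 + \|x\|^k) \quad \text{for any } (t, x) \in \mathcal{Q}$$

(iii) $V$ is continuous on $\bar{\mathcal{Q}}$, the closure of $\mathcal{Q}$;
(iv) $V_t + \mathcal{X}(t)V + G(t, x) \ge 0$ for all $(t, x) \in \mathcal{Q}$, where $E\{\int_s^T |G(\tau, x_\tau)|\,d\tau\} < \infty$ for any $(s, x_s) \in \mathcal{Q}$ and some function $G : \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}$.

Then,

$$V(s, x_s) \le E\left\{\int_s^T G(\tau, x_\tau)\,d\tau + V(T, x_T)\right\}$$

Proof
Refer to Theorem 5.1 on page 124 in [17]. □

Denote by $\hat{\overline{V}}$, $\hat{\underline{V}}$ and $\hat{V}$ the upper, the lower and the Value of the game.

Theorem 4
Suppose that functions $f$, $\sigma$ satisfy condition (i), and $\hat{\overline{V}}$ satisfies (ii) and (iii) in Lemma 3. Assume that there exists $a(\cdot) \in \mathcal{A}(s)$ such that

$$E_\omega\left\{\int_s^T |G(\tau, x_\tau, a_\tau, \beta[a]_\tau)|\,d\tau\right\} < \infty \tag{15}$$

for any $(s, x_s) \in \mathcal{Q}$ and $\beta \in \Delta(s)$. Let $\hat{\overline{V}}$ be a solution of the following HJI equation:

$$\hat{\overline{V}}_t(t, x_t) + \min_{a_t \in A_a} \max_{b_t \in B_a} \{\mathcal{X}(t)[\hat{\overline{V}}](t, x_t) + G(t, x_t, a_t, b_t)\} = 0 \tag{16}$$

for any $(t, x_t) \in \mathcal{Q}$ with the boundary condition $\hat{\overline{V}}(T, x_T) = Q(x_T)$ where $(T, x_T) \in \partial\mathcal{Q}$ (the boundary of $\mathcal{Q}$), such that $\hat{\overline{V}} \in C^{1,2}(\mathcal{Q})$ and is continuous on $\bar{\mathcal{Q}}$ (the closure). Then,

(i) $\hat{\overline{V}}(s, x_s) \le \sup_{\beta \in \Delta(s)} \{\hat{J}(a, \beta[a], s, x_s)\}$ for any $a(\cdot) \in \mathcal{A}(s)$ and $(s, x_s) \in \mathcal{Q}$;
(ii) if there exists $a^*(\cdot) \in \mathcal{A}(s)$ such that $a_t^* \in A_a$ for $s \le t < T$ and it satisfies

$$\max_{b_t \in B_a} \{\mathcal{X}(t)\hat{\overline{V}}(t, x_t) + G(t, x_t, a_t^*, b_t)\} = \min_{a_t \in A_a} \max_{b_t \in B_a} \{\mathcal{X}(t)\hat{\overline{V}}(t, x_t) + G(t, x_t, a_t, b_t)\} \tag{17}$$

for any $(t, x_t) \in \mathcal{Q}$, then $\hat{\overline{V}}(s, x_s) = \sup_{\beta \in \Delta(s)} \{\hat{J}(a^*, \beta[a^*], s, x_s)\}$ for any $(s, x_s) \in \mathcal{Q}$.

Proof
The theorem can be easily proved by extending Theorem 4.1 on page 159 in [17]. □

Remark 2
The conclusion in Theorem 4 can be extended to the lower Value $\hat{\underline{V}}$ if it satisfies conditions similar to (16) and (17) but with the order of ‘minimization’ and ‘maximization’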
reversed, e.g.,
#Vtðt;xtÞ þmaxbt2Ba
minat2Aa
fXðtÞ½ #V �ðt;xtÞ þ Gðt; xt; at; btÞg ¼ 0 ð18Þ
Furthermore, if the HJI equations (16) and (18) coincide, then #V ¼ #V ¼ #V:
3.4.2. Solution to a two-player pursuit–evasion game. In this section, we introduce an analytic result for a specific two-player differential PE game. Consider the following dynamics of the players in $\mathbb{R}^2$:
$$d\bar x_B = v_B\cos\theta_B\,dt + \sigma_B\,d\omega_{B\bar x} \quad\text{with } \bar x_B(0) = \bar x_{B0} \tag{19a}$$
$$d\bar y_B = v_B\sin\theta_B\,dt + \sigma_B\,d\omega_{B\bar y} \quad\text{with } \bar y_B(0) = \bar y_{B0} \tag{19b}$$
This is a noise-corrupted version of the simplified Dubins car model, where the subscript $B \in \{p,e\}$ stands for the pursuer or the evader; $\bar x_B$ and $\bar y_B$ are the state variables (displacement); $v_B$ is the velocity; $\theta_B$ is the control input; $\omega_{B\bar x}$ ($\omega_{B\bar y}$) is a standard Wiener process; $\sigma_B$ is constant. Assume that $\omega_{B\bar x}$ and $\omega_{B\bar y}$ are independent, and so are $\omega_{p\bar x}$ ($\omega_{p\bar y}$) and $\omega_{e\bar x}$ ($\omega_{e\bar y}$). The objective is the capture time, i.e. $J = E\{\int_t^T d\tau\}$. Define $x_B \triangleq [\bar x_B,\bar y_B]^{\mathrm T}$, and we rewrite the dynamics in (19) as $dx_B = f_B(x_B,\theta_B)\,dt + \sigma_B\,d\omega_B$. The game ends when $\|x_p - x_e\| \le \varepsilon$ for some $\varepsilon > 0$.
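The dynamics in (19) can be simulated with a simple Euler–Maruyama discretisation. The following is a minimal sketch under stated assumptions: the function name `simulate_player`, the time step, the constant heading and the numerical values are illustrative choices, not part of the original paper.

```python
import math
import random


def simulate_player(x0, y0, speed, sigma, heading, dt, n_steps, rng):
    """Euler-Maruyama discretisation of (19a)-(19b) for one player.

    heading is a function t -> theta (the control input); this interface
    is a hypothetical choice made only for illustration.
    Returns the list of (x, y) positions, including the initial one.
    """
    x, y = x0, y0
    path = [(x, y)]
    sqrt_dt = math.sqrt(dt)
    for k in range(n_steps):
        theta = heading(k * dt)
        # Independent Wiener increments for the two co-ordinates.
        dwx = rng.gauss(0.0, sqrt_dt)
        dwy = rng.gauss(0.0, sqrt_dt)
        x += speed * math.cos(theta) * dt + sigma * dwx
        y += speed * math.sin(theta) * dt + sigma * dwy
        path.append((x, y))
    return path


rng = random.Random(0)
# Noise-free run: the player simply moves along the x-axis at speed 4.
noise_free = simulate_player(0.0, 0.0, 4.0, 0.0, lambda t: 0.0, 0.01, 100, rng)
# Noisy run with sigma^2 = 0.5 (assumed value).
noisy = simulate_player(0.0, 0.0, 4.0, math.sqrt(0.5), lambda t: 0.0, 0.01, 100, rng)
print(noise_free[-1], noisy[-1])
```

With `sigma = 0` the scheme reduces to the deterministic kinematics, which gives a quick sanity check of the drift term.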
Theorem 5
Assume that $E\{T\} < \infty$ and $v_p > v_e$ with $\varepsilon > (\sigma_p^2+\sigma_e^2)/2(v_p-v_e)$; the Value function of the PE game $V(x_p,x_e)$ is given by
$$V(x_p,x_e) = \frac{\sqrt{(\bar x_p-\bar x_e)^2+(\bar y_p-\bar y_e)^2}}{v_p-v_e} + \frac{\sigma_p^2+\sigma_e^2}{4(v_p-v_e)^2}\ln\left((\bar x_p-\bar x_e)^2+(\bar y_p-\bar y_e)^2\right) + C(\varepsilon) \tag{20}$$
where the constant
$$C(\varepsilon) = -\frac{\varepsilon}{v_p-v_e} - \frac{(\sigma_p^2+\sigma_e^2)\cdot\ln(\varepsilon^2)}{4(v_p-v_e)^2}$$
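The closed form in (20) is straightforward to evaluate numerically. The sketch below is illustrative only: the function name `capture_value`, its argument layout and the sample numbers are assumptions introduced here, and the input check simply mirrors the hypotheses of Theorem 5.

```python
import math


def capture_value(xp, xe, vp, ve, sigma_p2, sigma_e2, eps):
    """Value (20): expected capture time for pursuer at xp and evader at xe.

    xp, xe are (x, y) tuples; sigma_p2, sigma_e2 are the variances
    sigma_p^2 and sigma_e^2; eps is the capture radius.
    """
    v = vp - ve
    s2 = sigma_p2 + sigma_e2
    if v <= 0 or eps <= s2 / (2.0 * v):
        raise ValueError("need vp > ve and eps > (sp^2 + se^2) / (2 (vp - ve))")
    d2 = (xp[0] - xe[0]) ** 2 + (xp[1] - xe[1]) ** 2
    c_eps = -eps / v - s2 * math.log(eps ** 2) / (4.0 * v ** 2)
    return math.sqrt(d2) / v + s2 / (4.0 * v ** 2) * math.log(d2) + c_eps


# Assumed sample values: vp - ve = 1, sigma_p^2 + sigma_e^2 = 0.3, eps = 0.5.
print(capture_value((1.0, 1.0), (0.0, 0.0), 4.0, 3.0, 0.15, 0.15, 0.5))
```

By construction the value vanishes on the capture circle, i.e. when the distance between the players equals `eps`.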
Proof
The proof is an application of Theorem 4. We need to show that $V$ is both an upper and a lower Value function. Here, we only show that $V$ is an upper Value, i.e. it satisfies the corresponding HJI equation as in (16), i.e. $\min_{\theta_p}\max_{\theta_e}\{\mathcal{A}(t)[V](x_p,x_e) + 1\} = 0$. First of all, it is easy to check that $f$, $\sigma$ in (19) and $V$ satisfy conditions (i)–(iii) in Lemma 3. The assumption that $E\{T\} < \infty$ implies that condition (15) holds. Note that here $G = 1$ and $V = 0$ when $\sqrt{(\bar x_p-\bar x_e)^2+(\bar y_p-\bar y_e)^2} = \varepsilon$. By (13) and the independence of $\omega_{B\bar x}$ and $\omega_{B\bar y}$,
$$\mathcal{A}(t)V = \underbrace{\frac{\partial V}{\partial x_p}\cdot f_p + \frac{\partial V}{\partial x_e}\cdot f_e}_{\Delta_1} + \underbrace{\frac12\sum_{i=1}^{2}\frac{\partial^2 V}{\partial x_{pi}^2}\sigma_p^2 + \frac12\sum_{j=1}^{2}\frac{\partial^2 V}{\partial x_{ej}^2}\sigma_e^2}_{\Delta_2} \tag{21}$$
Here, $i = 1,2$ ($j = 1,2$) stands for $\bar x$ and $\bar y$, respectively. Substitute (19) and (20) into (21), and the terms $\Delta_1$ and $\Delta_2$ in (21) become
$$\Delta_1 = \frac{1}{v_p-v_e}\left(\frac{(\bar x_p-\bar x_e)}{\sqrt{(\bar x_p-\bar x_e)^2+(\bar y_p-\bar y_e)^2}}v_p\cos(\theta_p) + \frac{(\bar y_p-\bar y_e)}{\sqrt{(\bar x_p-\bar x_e)^2+(\bar y_p-\bar y_e)^2}}v_p\sin(\theta_p) + \frac{(\bar x_e-\bar x_p)}{\sqrt{(\bar x_p-\bar x_e)^2+(\bar y_p-\bar y_e)^2}}v_e\cos(\theta_e) + \frac{(\bar y_e-\bar y_p)}{\sqrt{(\bar x_p-\bar x_e)^2+(\bar y_p-\bar y_e)^2}}v_e\sin(\theta_e)\right)$$
$$\qquad + \frac{\sigma_p^2+\sigma_e^2}{4(v_p-v_e)^2}\left(\frac{2(\bar x_p-\bar x_e)}{(\bar x_p-\bar x_e)^2+(\bar y_p-\bar y_e)^2}v_p\cos(\theta_p) + \frac{2(\bar y_p-\bar y_e)}{(\bar x_p-\bar x_e)^2+(\bar y_p-\bar y_e)^2}v_p\sin(\theta_p) + \frac{2(\bar x_e-\bar x_p)}{(\bar x_p-\bar x_e)^2+(\bar y_p-\bar y_e)^2}v_e\cos(\theta_e) + \frac{2(\bar y_e-\bar y_p)}{(\bar x_p-\bar x_e)^2+(\bar y_p-\bar y_e)^2}v_e\sin(\theta_e)\right) \tag{22}$$
$$\Delta_2 = \tfrac12\sigma_p^2\left(V_{\bar x_p\bar x_p}+V_{\bar y_p\bar y_p}\right) + \tfrac12\sigma_e^2\left(V_{\bar x_e\bar x_e}+V_{\bar y_e\bar y_e}\right)$$
$$\quad = \frac{1}{2(v_p-v_e)}\left(\frac{(\bar x_p-\bar x_e)^2}{[(\bar x_p-\bar x_e)^2+(\bar y_p-\bar y_e)^2]^{3/2}}\sigma_p^2 + \frac{(\bar y_p-\bar y_e)^2}{[(\bar x_p-\bar x_e)^2+(\bar y_p-\bar y_e)^2]^{3/2}}\sigma_p^2 + \frac{(\bar x_p-\bar x_e)^2}{[(\bar x_p-\bar x_e)^2+(\bar y_p-\bar y_e)^2]^{3/2}}\sigma_e^2 + \frac{(\bar y_p-\bar y_e)^2}{[(\bar x_p-\bar x_e)^2+(\bar y_p-\bar y_e)^2]^{3/2}}\sigma_e^2\right)$$
$$\qquad + \frac{\sigma_p^2+\sigma_e^2}{4(v_p-v_e)^2}\left(\frac{(\bar y_p-\bar y_e)^2-(\bar x_p-\bar x_e)^2}{[(\bar x_p-\bar x_e)^2+(\bar y_p-\bar y_e)^2]^{2}}\sigma_p^2 + \frac{(\bar x_p-\bar x_e)^2-(\bar y_p-\bar y_e)^2}{[(\bar x_p-\bar x_e)^2+(\bar y_p-\bar y_e)^2]^{2}}\sigma_p^2 + \frac{(\bar y_p-\bar y_e)^2-(\bar x_p-\bar x_e)^2}{[(\bar x_p-\bar x_e)^2+(\bar y_p-\bar y_e)^2]^{2}}\sigma_e^2 + \frac{(\bar x_p-\bar x_e)^2-(\bar y_p-\bar y_e)^2}{[(\bar x_p-\bar x_e)^2+(\bar y_p-\bar y_e)^2]^{2}}\sigma_e^2\right) \tag{23}$$
By inspection of (21)–(23), only the term $\Delta_1$ in (22) involves the controls $\theta_p$ and $\theta_e$. Clearly,
$$\min_{\theta_p}\max_{\theta_e}\{\Delta_1\} = \frac{1}{v_p-v_e}(-v_p+v_e) + \frac{\sigma_p^2+\sigma_e^2}{4(v_p-v_e)^2}\left(-\frac{2(\bar x_p-\bar x_e)^2}{[(\bar x_p-\bar x_e)^2+(\bar y_p-\bar y_e)^2]^{3/2}}v_p - \frac{2(\bar y_p-\bar y_e)^2}{[(\bar x_p-\bar x_e)^2+(\bar y_p-\bar y_e)^2]^{3/2}}v_p + \frac{2(\bar x_p-\bar x_e)^2}{[(\bar x_p-\bar x_e)^2+(\bar y_p-\bar y_e)^2]^{3/2}}v_e + \frac{2(\bar y_p-\bar y_e)^2}{[(\bar x_p-\bar x_e)^2+(\bar y_p-\bar y_e)^2]^{3/2}}v_e\right)$$
$$\quad = -1 - \frac12\,\frac{\sigma_p^2+\sigma_e^2}{v_p-v_e}\,\frac{1}{\sqrt{(\bar x_p-\bar x_e)^2+(\bar y_p-\bar y_e)^2}} \tag{24}$$
On the other hand, $\Delta_2$ in (23) can be simplified as
$$\Delta_2 = \frac12\,\frac{\sigma_p^2+\sigma_e^2}{v_p-v_e}\,\frac{1}{\sqrt{(\bar x_p-\bar x_e)^2+(\bar y_p-\bar y_e)^2}} \tag{25}$$
By (24) and (25), $\min_{\theta_p}\max_{\theta_e}\{\mathcal{A}(t)V(x_p,x_e) + 1\} = 0$. Thus, $V$ is a solution of the HJI equation. Furthermore, it can be shown similarly that $V$ is also a lower Value. Hence, $V$ is a Value function. □
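The cancellation between (24) and (25) can also be checked numerically. The sketch below evaluates the HJI residual of the Value in (20) with central finite differences; the parameter values, step sizes and function names are assumptions made for illustration. Because the drift term is linear in the unit heading vectors, the inner minimisation and maximisation are replaced by the norms of the two gradient blocks.

```python
import math

VP, VE = 4.0, 3.0        # speeds (assumed test values)
SP2, SE2 = 0.15, 0.15    # sigma_p^2 and sigma_e^2 (assumed test values)
EPS = 0.5                # capture radius, EPS > (SP2 + SE2) / (2 (VP - VE))


def value(state):
    """Value (20) as a function of the stacked state (xp, yp, xe, ye)."""
    xp, yp, xe, ye = state
    v, s2 = VP - VE, SP2 + SE2
    d2 = (xp - xe) ** 2 + (yp - ye) ** 2
    c_eps = -EPS / v - s2 * math.log(EPS ** 2) / (4 * v ** 2)
    return math.sqrt(d2) / v + s2 / (4 * v ** 2) * math.log(d2) + c_eps


def shifted(state, i, h):
    out = list(state)
    out[i] += h
    return out


def partial(state, i, h=1e-4):
    return (value(shifted(state, i, h)) - value(shifted(state, i, -h))) / (2 * h)


def second_partial(state, i, h=1e-3):
    return (value(shifted(state, i, h)) - 2 * value(state)
            + value(shifted(state, i, -h))) / h ** 2


def hji_residual(state):
    g = [partial(state, i) for i in range(4)]
    # min over theta_p of vp * (grad_p . u) is -vp * |grad_p|;
    # max over theta_e of ve * (grad_e . u) is +ve * |grad_e|.
    drift = -VP * math.hypot(g[0], g[1]) + VE * math.hypot(g[2], g[3])
    diffusion = (0.5 * SP2 * (second_partial(state, 0) + second_partial(state, 1))
                 + 0.5 * SE2 * (second_partial(state, 2) + second_partial(state, 3)))
    return drift + diffusion + 1.0


print(hji_residual((0.0, 0.0, 4.0, 0.0)), hji_residual((1.0, 2.0, -1.0, 0.5)))
```

The residual is zero up to finite-difference error at any state outside the capture circle, which is what the proof establishes analytically.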
3.4.3. On finite expectation of the capture time. In this section, we examine the condition $E\{T\} < \infty$ in Theorem 5. Let us first consider a simplified game in a one-dimensional space as shown in Figure 1. The dynamics of the players are described by
$$dx_B = v_B u_B\,dt + \sigma_B\,d\omega_B$$
Here, $B \in \{p,e\}$; $v_B$ is the velocity; $u_B \in \{1,-1\}$ is the control variable; $\omega_B$ is a one-dimensional standard Wiener process. To force the capture, the pursuer must move towards the evader, while the evader escapes in the same direction, e.g. $u_p = u_e = 1$ as in Figure 1. Let $x = x_e - x_p$ and the dynamic equation becomes
$$dx = v\,dt + \sigma\,d\omega \tag{26}$$
where $v = v_e - v_p$ and $\sigma = \sqrt{\sigma_p^2+\sigma_e^2}$.
Lemma 6
For any $\lambda$ with $1 < \lambda < \infty$ and $k \in \mathbb{Z}_{\ge 0}$, $\sum_{n=1}^{\infty} n^{k}\lambda^{-n} < \infty$.
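A short justification of the lemma by the ratio test may be helpful. It is written in terms of the common ratio $q$ of the geometric factor, a symbol introduced here only for illustration; the series converges exactly when this ratio satisfies $0 < q < 1$.

```latex
% Ratio test for a_n = n^k q^n with 0 < q < 1 and k a non-negative integer.
\[
  \frac{a_{n+1}}{a_n}
  = \left(1 + \frac{1}{n}\right)^{k} q
  \;\longrightarrow\; q < 1
  \qquad (n \to \infty),
\]
% Hence \sum_{n \ge 1} n^k q^n converges. In the proof of Lemma 7 the
% geometric factor is q = \exp\!\left(-\bar v^{2}\,\delta t / (4\sigma^{2})\right),
% which lies in (0,1) whenever \bar v \ne 0 and \delta t > 0.
```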
Lemma 7
If $v_p > v_e$ and $\varepsilon \ge 0$, then $E\{T\} < \infty$, where $T \triangleq \inf\{t \mid |x(t)| \le \varepsilon\}$.
Figure 1. The simplified stochastic PE game in a one-dimensional space.
Proof
We first find an upper bound of $P(T > t)$, the probability that the evader has not been captured by time $t$. Without loss of generality, assume that the game starts at time 0 and $x(0) > \varepsilon$. By the property of the Wiener process, the state $x_t$ at time $t > 0$ is a Gaussian random variable, i.e. $x_t \sim \mathcal{N}(\mu_{x_t},\sigma_{x_t}^2)$. Here, the mean is $\mu_{x_t} = x(0) + vt$ and the variance is $\sigma_{x_t}^2 = (\sigma_p^2+\sigma_e^2)t$. Let $\bar v = -v = v_p - v_e > 0$. The fact that $T > t$ implies that $x_t > \varepsilon$ at least at time $t$. It satisfies
$$P(T > t) \le P(x_t > \varepsilon) = \int_{x>\varepsilon}\frac{1}{\sigma\sqrt{2\pi t}}\exp\left(-\frac{(x-\mu_{x_t})^2}{2\sigma_{x_t}^2}\right)dx = \int_{x>\varepsilon}\frac{1}{\sigma\sqrt{2\pi t}}\exp\left(-\frac{(x-x(0)+\bar v t)^2}{2\sigma^2 t}\right)dx \tag{27}$$
Define $t_0 = (x(0)-\varepsilon)/\bar v$, $\tilde t = t - t_0$ and $r = x + \bar v\tilde t$, such that
$$P(x_t > \varepsilon) = \int_{x>0}\frac{1}{\sigma\sqrt{2\pi(t_0+\tilde t)}}\exp\left(-\frac{(x+\bar v\tilde t)^2}{2\sigma^2(t_0+\tilde t)}\right)dx = \int_{r>\bar v\tilde t}\frac{1}{\sigma\sqrt{2\pi(t_0+\tilde t)}}\exp\left(-\frac{r^2}{2\sigma^2(t_0+\tilde t)}\right)dr$$
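Choose some $t_1$ such that $\bar v(t_1 - t_0) > 1$, and define $t_2 = \max\{2t_0, t_1\}$. Consider the time $t$ when $t > t_2$, i.e. $\tilde t > t_0$ and $\bar v\tilde t > 1$, and then
$$P(x_t > \varepsilon) < \frac{1}{\sigma\sqrt{4\pi t_0}}\int_{r>\bar v\tilde t}\exp\left(-\frac{r^2}{4\sigma^2\tilde t}\right)dr < \frac{1}{\sigma\sqrt{4\pi t_0}}\int_{r>\bar v\tilde t}\exp\left(-\frac{\bar v\cdot\tilde t\cdot r}{4\sigma^2\tilde t}\right)dr = \frac{2\sigma}{\bar v\sqrt{\pi t_0}}\exp\left(-\frac{\bar v^2}{4\sigma^2}\tilde t\right) = \frac{2\sigma}{\bar v\sqrt{\pi t_0}}\exp\left(-\frac{\bar v^2}{4\sigma^2}(t-t_0)\right) \tag{28}$$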
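The bound in (28) can be compared with the exact Gaussian tail probability in (27), which is available through the complementary error function. The following is a sketch with assumed numerical values ($x(0) = 2$, $\varepsilon = 0.5$, $\bar v = 1$, $\sigma^2 = 0.3$); the function names are hypothetical.

```python
import math


def gaussian_tail(t, x0, eps, v_bar, sigma):
    """P(x_t > eps) for x_t ~ N(x0 - v_bar * t, sigma^2 * t), as in (27)."""
    z = (eps - x0 + v_bar * t) / (sigma * math.sqrt(t))
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def tail_bound(t, x0, eps, v_bar, sigma):
    """Right-hand side of (28); intended for t > t2 = max(2 * t0, t1)."""
    t0 = (x0 - eps) / v_bar
    return (2.0 * sigma / (v_bar * math.sqrt(math.pi * t0))
            * math.exp(-v_bar ** 2 * (t - t0) / (4.0 * sigma ** 2)))


x0, eps, v_bar, sigma = 2.0, 0.5, 1.0, math.sqrt(0.3)
t0 = (x0 - eps) / v_bar
t1 = t0 + 1.0 / v_bar + 0.1   # any t1 with v_bar * (t1 - t0) > 1
t2 = max(2.0 * t0, t1)
for t in (t2 + 0.1, t2 + 1.0, t2 + 5.0):
    print(t, gaussian_tail(t, x0, eps, v_bar, sigma), tail_bound(t, x0, eps, v_bar, sigma))
```

The bound is deliberately loose: its decay rate is $\bar v^2/4\sigma^2$, half of the rate of the exact tail, which is all that the summability argument requires.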
Denote by $p_T(t)$ the probability density of the capture time $T$. Then,
$$E\{T\} = \int_0^\infty t\cdot p_T(t)\,dt = \int_0^{t_2} t\cdot p_T(t)\,dt + \int_{t_2}^\infty t\cdot p_T(t)\,dt \tag{29}$$
Next, we show that the second term on the right-hand side of (29) is finite, which implies that $E\{T\}$ is finite. Choose a small $\delta t > 0$, and then
$$\int_{t_2}^\infty t\cdot p_T(t)\,dt = \sum_{k=0}^{\infty}\left\{\int_{t_2+k\delta t}^{t_2+(k+1)\delta t} t\cdot p_T(t)\,dt\right\} \tag{30}$$
which is illustrated in Figure 2. In (30), each term in the summation satisfies
$$\int_{t_2+k\delta t}^{t_2+(k+1)\delta t} t\cdot p_T(t)\,dt < (t_2+(k+1)\delta t)\int_{t_2+k\delta t}^{t_2+(k+1)\delta t} p_T(t)\,dt < (t_2+(k+1)\delta t)\int_{t_2+k\delta t}^{\infty} p_T(t)\,dt \tag{31}$$
Note that $P(T > t_2+k\delta t) = \int_{t_2+k\delta t}^{\infty} p_T(t)\,dt$, and by (27) and (28),
$$\int_{t_2+k\delta t}^{\infty} p_T(t)\,dt < \frac{2\sigma}{\bar v\sqrt{\pi t_0}}\exp\left(-\frac{\bar v^2}{4\sigma^2}(t_2+k\delta t-t_0)\right) \tag{32}$$
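Substitute (32) into (30), and by Lemma 6,
$$\int_{t_2}^\infty t\cdot p_T(t)\,dt < \sum_{k=0}^{\infty}\left\{(t_2+(k+1)\delta t)\frac{2\sigma}{\bar v\sqrt{\pi t_0}}\exp\left(-\frac{\bar v^2}{4\sigma^2}(t_2+k\delta t-t_0)\right)\right\} < \infty \tag{33}$$
By (29) and (33), $E\{T\} < \infty$. □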
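A quick Monte Carlo experiment illustrates the finiteness of $E\{T\}$ for the one-dimensional model (26). This is only a sketch: the Euler–Maruyama step, the number of paths, the seed and the numerical values are assumptions. The reference value $(x(0)-\varepsilon)/\bar v$ used for comparison is the standard mean first-passage time of a Brownian motion with drift, quoted here as an outside fact rather than a result of the paper; the discretised paths slightly overestimate it.

```python
import math
import random


def capture_time(x0, eps, v_bar, sigma, dt, rng, t_max=1e3):
    """First time the discretised path of (26) satisfies |x| <= eps."""
    x, t = x0, 0.0
    step_sd = sigma * math.sqrt(dt)
    while abs(x) > eps and t < t_max:
        # Drift v = -v_bar < 0 moves the relative position towards zero.
        x += -v_bar * dt + rng.gauss(0.0, step_sd)
        t += dt
    return t


def mean_capture_time(n_paths, x0, eps, v_bar, sigma, dt, seed=0):
    rng = random.Random(seed)
    total = sum(capture_time(x0, eps, v_bar, sigma, dt, rng) for _ in range(n_paths))
    return total / n_paths


estimate = mean_capture_time(2000, 2.0, 0.5, 1.0, math.sqrt(0.3), 0.01)
reference = (2.0 - 0.5) / 1.0
print(estimate, reference)
```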
Remark 3
Although Lemma 7 can be proved in a simpler way, the proof presented here is useful in the following theorem for the game in $\mathbb{R}^2$.
Now, we examine the game in $\mathbb{R}^2$ associated with Theorem 5. First of all, change the variables as $\tilde x = \bar x_p - \bar x_e$ and $\tilde y = \bar y_p - \bar y_e$. According to (22) and (24), the optimal control of the pursuer coincides with that of the evader, namely, $\theta_p^* = \theta_e^*$. Suppose that both the pursuer and the evader use the same control and denote it by $\theta$. Then the dynamic equation of the players becomes
$$d\tilde x = (v_p-v_e)\cos\theta(t)\,dt + \sigma_p\,d\omega_{p\bar x} - \sigma_e\,d\omega_{e\bar x} \quad\text{with } \tilde x(0) = \tilde x_0 \tag{34a}$$
$$d\tilde y = (v_p-v_e)\sin\theta(t)\,dt + \sigma_p\,d\omega_{p\bar y} - \sigma_e\,d\omega_{e\bar y} \quad\text{with } \tilde y(0) = \tilde y_0 \tag{34b}$$
In (34a), $\omega_{p\bar x}$ and $\omega_{e\bar x}$ are independent, such that the term $\sigma_p\,d\omega_{p\bar x} - \sigma_e\,d\omega_{e\bar x}$ is equivalent to $\sigma\,d\omega_{\tilde x}$, where $\sigma = \sqrt{\sigma_p^2+\sigma_e^2}$ and $\omega_{\tilde x}$ is a standard Wiener process; and in (34b), $\sigma_p\,d\omega_{p\bar y} - \sigma_e\,d\omega_{e\bar y}$ can be treated similarly as a Wiener process $\sigma\,d\omega_{\tilde y}$. Define $v = v_p - v_e$, $x = [\tilde x,\tilde y]^{\mathrm T}$ and $\rho(x) = \sqrt{\tilde x^2+\tilde y^2}$.
Complete characterization of the distribution of the state $x$ with time $t$ requires a solution of a partial differential equation called the Fokker–Planck equation (FPE) [28] as
$$\frac{\partial p(t,x)}{\partial t} = -\sum_{i=1}^{2}\frac{\partial}{\partial x_i}[p(t,x)f_i(x,\theta)] + \frac12\sum_{i=1}^{2}\frac{\partial^2 p(t,x)}{\partial x_i^2}\sigma^2 \quad\text{with } p(0,x) = \delta(x-x_0) \tag{35}$$
where $\delta(\cdot)$ is the Dirac delta function; $f_i$ is the $i$th element of function $f$ in (34), i.e. $i = 1,2$ stand for $\tilde x$ and $\tilde y$, respectively. Since the analytical solution of Equation (35) is formidable, in the following, we construct an upper bound of $P(T > t)$ and verify it using a numerical solution to (35). Note that $P(T > t) \le P(\rho(x_t) > \varepsilon)$, and the following discussion is focused on $P(\rho(x_t) > \varepsilon)$.

Figure 2. The probability density of the capture time $p_T(t)$.
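To construct an upper bound of $P(\rho(x_t) > \varepsilon)$, we first consider a simplified situation where the control $\theta$ in (35) is fixed, i.e. $\theta_t = \theta_0$ for $t \ge 0$. Based on the property of the Wiener process, the probability distribution under $\theta_0$, $p_x^{\theta_0}(t,x)$, is
$$p_x^{\theta_0}(t,x) = \frac{1}{2\pi\sigma^2 t}\exp\left(-\frac{(\tilde x-\mu_{\tilde x}(t))^2+(\tilde y-\mu_{\tilde y}(t))^2}{2\sigma^2 t}\right) \tag{36}$$
where $\mu_{\tilde x}(t) = \tilde x_0 + vt\cos\theta_0$ and $\mu_{\tilde y}(t) = \tilde y_0 + vt\sin\theta_0$. Here, the superscript in $p_x^{\theta_0}$ indicates the fixed control $\theta_0$ and the subscript $x$ implies the $x$ co-ordinates. Now, we change the co-ordinates by transformation $\Gamma_t$ at time $t$:
$$\hat x \triangleq \begin{bmatrix} x' \\ y' \end{bmatrix} = \Gamma_t\begin{bmatrix} \tilde x \\ \tilde y \end{bmatrix} \quad\text{with } \Gamma_t = \begin{bmatrix} \cos\beta_t & \sin\beta_t \\ -\sin\beta_t & \cos\beta_t \end{bmatrix} \tag{37}$$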
where $\beta_t$ is the angle between the ‘line of sight’*** from the evader to the pursuer and the $\tilde x$-axis, which is illustrated in Figure 3.
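In the new $x'$–$y'$ co-ordinates, the states $x'$ and $y'$ are jointly Gaussian with the mean
$$\begin{bmatrix} \mu_{x'} \\ \mu_{y'} \end{bmatrix} = \begin{bmatrix} \cos\beta_t & \sin\beta_t \\ -\sin\beta_t & \cos\beta_t \end{bmatrix}\begin{bmatrix} \mu_{\tilde x} \\ \mu_{\tilde y} \end{bmatrix} = \begin{bmatrix} \sqrt{\mu_{\tilde x}^2+\mu_{\tilde y}^2} \\ 0 \end{bmatrix} \tag{38}$$
and the covariance $\mathrm{Cov}_t$,
$$\mathrm{Cov}_t = \Gamma_t\begin{bmatrix} \sigma^2 t & 0 \\ 0 & \sigma^2 t \end{bmatrix}\Gamma_t^{\mathrm T} = \begin{bmatrix} \sigma^2 t & 0 \\ 0 & \sigma^2 t \end{bmatrix}$$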
Figure 3. Change of the $\tilde x$–$\tilde y$ co-ordinates to the $x'$–$y'$ co-ordinates.
***Here, the ‘line of sight’ is drawn according to the expected positions of both the pursuer and the evader.
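The effect of the rotation in (37) on the mean and covariance can be reproduced in a few lines. In this sketch the angle $\beta_t$ is taken to be the polar angle of the expected relative position, which is an assumption consistent with the footnote; the function names and numerical values are hypothetical.

```python
import math


def line_of_sight_rotation(mu_x, mu_y):
    """Rotation matrix of (37), with beta the polar angle of the mean (mu_x, mu_y)."""
    beta = math.atan2(mu_y, mu_x)
    return [[math.cos(beta), math.sin(beta)],
            [-math.sin(beta), math.cos(beta)]]


def mat_vec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def rotate_covariance(m, var):
    """m * (var * I) * m^T, written out for a 2x2 matrix m."""
    return [[var * sum(m[i][k] * m[j][k] for k in range(2)) for j in range(2)]
            for i in range(2)]


# Assumed sample values: start at (1, 1), fixed heading theta0 = pi, v = 1.
x0, y0, v, theta0, t, sigma2 = 1.0, 1.0, 1.0, math.pi, 0.5, 0.3
mu = (x0 + v * t * math.cos(theta0), y0 + v * t * math.sin(theta0))
rot = line_of_sight_rotation(*mu)
rotated_mean = mat_vec(rot, mu)
rotated_cov = rotate_covariance(rot, sigma2 * t)
print(rotated_mean, rotated_cov)
```

The rotated mean lies on the first axis and the isotropic covariance is unchanged, which is why the two new co-ordinates are independent.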
Thus, the random variables in the new $x'$–$y'$ co-ordinates are independent, and the distribution $p_{\hat x}^{\theta_0}$ is
$$p_{\hat x}^{\theta_0}(t,\hat x) = \frac{1}{2\pi\sigma^2 t}\exp\left(-\frac{\left(x'-\sqrt{\mu_{\tilde x}^2(t)+\mu_{\tilde y}^2(t)}\right)^2 + y'^2}{2\sigma^2 t}\right) \tag{39}$$
Next, we change the $x'$–$y'$ co-ordinates to a $\rho$–$\varphi$ (polar) co-ordinate system as
$$\rho(\hat x) = \sqrt{x'^2+y'^2}, \qquad \varphi(\hat x) = \angle(\vec{\hat x},\vec{x'}) \quad\text{with } \varphi(\hat x) \in [0,2\pi)$$
Here, $\angle(\vec{\hat x},\vec{x'})$ is the angle between the vector $\vec{\hat x}$ and the $x'$-axis and its range is $[0,2\pi)$. Denote this transformation by $\bar\Gamma_t$. At each $\hat x$, the absolute value of the Jacobian $J_{\bar\Gamma_t}(\hat x)$ of the transformation $\bar\Gamma_t$ is $|J_{\bar\Gamma_t}(\hat x)| = 1/\rho(\hat x)$. The probability density $p_{\rho,\varphi}^{\theta_0}(t,\rho,\varphi)$ in the $\rho$–$\varphi$ co-ordinates is
$$p_{\rho,\varphi}^{\theta_0}(t,\rho,\varphi) = |J_{\bar\Gamma_t}(\hat x)|^{-1}p_{\hat x}^{\theta_0}(t,\hat x) = \frac{\rho}{2\pi\sigma^2 t}\exp\left(-\frac{\left(\rho\cos\varphi-\sqrt{\mu_{\tilde x}^2(t)+\mu_{\tilde y}^2(t)}\right)^2 + (\rho\sin\varphi)^2}{2\sigma^2 t}\right)$$
Clearly, $p_{\rho,\varphi}(t,\rho,\varphi) < p_{\rho,\varphi}(t,\rho,0)$ for any $\varphi \ne 0$. Let $\bar\rho_{\mu_{\tilde x},\mu_{\tilde y}} = \sqrt{\mu_{\tilde x}^2+\mu_{\tilde y}^2}$. The probability that $\rho > r$ ($r \ge 0$) under control $\theta_0$ at time $t$, $P^{\theta_0}(\rho(\hat x_t) > r)$, satisfies
$$P^{\theta_0}(\rho(\hat x_t) > r) < \int_0^{2\pi}d\varphi\int_r^\infty p_{\rho,\varphi}(t,\rho,\varphi=0)\,d\rho = \int_0^{2\pi}d\varphi\int_r^\infty \frac{\rho}{2\pi\sigma^2 t}\exp\left(-\frac{(\rho-\bar\rho_{\mu_{\tilde x},\mu_{\tilde y}})^2}{2\sigma^2 t}\right)d\rho \tag{40}$$
Next, we consider the case when both the pursuer and the evader exploit their optimal control $\theta^*$ determined in (24). Intuitively, the optimal control $\theta^*$ (in state feedback) drives each state $(\tilde x,\tilde y)$ towards the origin, such that it outperforms the (static) $\theta_0$ in terms of capture. Similar to (40), we assume that Equation (41) is true. Denote by $\mu^*_{\tilde x}(t)$ and $\mu^*_{\tilde y}(t)$ the mean of $\tilde x$ and $\tilde y$ at time $t \ge 0$ under $\theta^*$ and $\bar\rho_{\mu^*_{\tilde x},\mu^*_{\tilde y}}(t) = \sqrt{\mu^*_{\tilde x}(t)^2+\mu^*_{\tilde y}(t)^2}$.
$$P^{\theta^*}(\rho(x_t) > r) < \int_r^\infty \frac{C\rho}{\sigma^2 t}\exp\left(-\frac{(\rho-\bar\rho_{\mu^*_{\tilde x},\mu^*_{\tilde y}}(t))^2}{2\sigma^2 t}\right)d\rho \quad\text{for some constant } C > 0 \tag{41}$$
Here, $P^{\theta^*}(\rho(x_t) > r)$ is the probability of $\rho > r$ under $\theta^*$ at time $t$. Due to the convexity of $\rho = \sqrt{\tilde x^2+\tilde y^2}$ and by Jensen’s inequality,
$$\bar\rho_{\mu^*_{\tilde x},\mu^*_{\tilde y}} = \sqrt{\mu^{*2}_{\tilde x}+\mu^{*2}_{\tilde y}} \le E\left\{\sqrt{\tilde x_{\theta^*}^2+\tilde y_{\theta^*}^2}\right\} \triangleq \bar\rho_{\theta^*}$$
Thus, inequality (41) still holds when $\bar\rho_{\mu^*_{\tilde x},\mu^*_{\tilde y}}$ is replaced by $\bar\rho_{\theta^*}$, i.e.,
$$P^{\theta^*}(\rho(x_t) > r) < \int_r^\infty \frac{C\rho}{\sigma^2 t}\exp\left(-\frac{(\rho-\bar\rho_{\theta^*}(t))^2}{2\sigma^2 t}\right)d\rho \tag{42}$$
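Let $\rho_{\theta^*} = \sqrt{\tilde x_{\theta^*}^2+\tilde y_{\theta^*}^2}$ along the trajectory under the optimal control $\theta^*$. The evolution of $\rho_{\theta^*}(t)$ can be derived based on Itô’s rule. According to $\mathcal{A}(t)$ in (13), the evolution of $\rho_{\theta^*}(t)$ under the control $\theta^*$ satisfies
$$\dot\rho_{\theta^*}(t) = \frac{\partial\rho_{\theta^*}(t)}{\partial\tilde x}v\cos\theta^* + \frac{\partial\rho_{\theta^*}(t)}{\partial\tilde y}v\sin\theta^* + \frac12\frac{\partial^2\rho_{\theta^*}(t)}{\partial\tilde x^2}\sigma^2 + \frac12\frac{\partial^2\rho_{\theta^*}(t)}{\partial\tilde y^2}\sigma^2 = -v + \frac{1}{2\rho_{\theta^*}(t)}\sigma^2 \tag{43}$$
Here, $(\partial\rho_{\theta^*}(t)/\partial\tilde x)\cos\theta^* + (\partial\rho_{\theta^*}(t)/\partial\tilde y)\sin\theta^* = -1$. Next, our study is focused on the set $S = \{(\tilde x,\tilde y) \mid \sqrt{\tilde x^2+\tilde y^2} > \varepsilon\}$. Since $\varepsilon > \sigma^2/2v$ (cf. Theorem 5) and $\rho_{\theta^*} > \varepsilon$, according to (43),
$$\dot\rho_{\theta^*}(t) = -v + \frac{1}{2\rho_{\theta^*}(t)}\sigma^2 < -v + \frac{1}{2\varepsilon}\sigma^2 = -k^2 < 0 \tag{44}$$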
It indicates that the decreasing rate of $\rho$ at any point in $S$ is greater than $k^2$, and thus the decreasing rate of $\bar\rho_{\theta^*} \triangleq E\{\rho_{\theta^*}(x) \mid \rho_{\theta^*}(x) > \varepsilon\}$, the expectation of $\rho_{\theta^*}$ given $\rho_{\theta^*} > \varepsilon$, is bigger than $k^2$. Namely, $\bar\rho_{\theta^*}(t) < \rho_0 - k^2 t$, where $\rho_0 = \sqrt{\tilde x_0^2+\tilde y_0^2}$. The inequality of (42) holds with $\bar\rho_{\theta^*}(t)$ replaced by $\rho_0 - k^2 t$.
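Claim 1
$$P^{\theta^*}(\rho(x_t) > \varepsilon) < \int_\varepsilon^\infty \frac{C\rho}{\sigma^2 t}\exp\left(-\frac{(\rho-\rho_0+k^2 t)^2}{2\sigma^2 t}\right)d\rho \tag{45}$$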
Theorem 8
In a PE game with the dynamics specified in (19), if $v_p > v_e$, $\varepsilon > (\sigma_p^2+\sigma_e^2)/2(v_p-v_e)$ and the probability $P(\rho(x_t) > \varepsilon)$ under the optimal control $\theta^*$ satisfies (45), then $E\{T\} < \infty$.
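Proof
Let $t_0 = (\rho_0-\varepsilon)/k^2$, $\bar t = t - t_0$ and $r = \rho + k^2\bar t$. By (45),
$$P(\rho(x_t) > \varepsilon) < \int_\varepsilon^\infty \frac{C\rho}{\sigma^2 t}\exp\left(-\frac{(\rho-\rho_0+k^2 t)^2}{2\sigma^2 t}\right)d\rho = \int_0^\infty \frac{C\rho}{\sigma^2(\bar t+t_0)}\exp\left(-\frac{(\rho+k^2\bar t)^2}{2\sigma^2(\bar t+t_0)}\right)d\rho$$
$$\quad = \int_{k^2\bar t}^\infty \frac{C(r-k^2\bar t)}{\sigma^2(\bar t+t_0)}\exp\left(-\frac{r^2}{2\sigma^2(\bar t+t_0)}\right)dr \qquad (\text{because } r = \rho + k^2\bar t)$$
Choose some $t_1$ such that $k^2(t_1-t_0) > 1$ and let $t_2 = \max\{2t_0,t_1\}$. Consider $t > t_2$, such that $\bar t > t_0$ and $k^2\bar t > 1$. Then,
$$P(\rho(x_t) > \varepsilon) < \int_{k^2\bar t}^\infty \frac{C(r-k^2\bar t)}{\sigma^2(\bar t+t_0)}\exp\left(-\frac{r^2}{2\sigma^2(\bar t+t_0)}\right)dr < \int_{k^2\bar t}^\infty \frac{C(r-k^2\bar t)}{2\sigma^2 t_0}\exp\left(-\frac{k^2\bar t\cdot r}{4\sigma^2\bar t}\right)dr$$
$$\quad = \int_{k^2\bar t}^\infty \frac{C(r-k^2\bar t)}{2\sigma^2 t_0}\exp\left(-\frac{k^2 r}{4\sigma^2}\right)dr = \frac{8C\sigma^2}{k^4 t_0}\exp\left(-\frac{k^4(t-t_0)}{4\sigma^2}\right) \tag{46}$$
for any $t > t_2$. The rest of the proof follows the proof of Lemma 7 from (29). □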
Finally, we verify Claim 1 by solving the FPE in (35) numerically using the finite difference method. Here, set $\varepsilon = 0.5$, $v = v_p - v_e = 1$, $\sigma^2 = 0.3$. Choose the initial positions of the players such that $(\tilde x_0,\tilde y_0) = (1,1)$, and $\theta^*$ is determined in (24). In addition, by (44), let $k^2 = v - \sigma^2/2\varepsilon = 0.7$, and choose the constant $C$ as $1/6$. The evolution of $P^{\theta^*}(\rho(x_t) > \varepsilon)$ with time, obtained by solving the FPE, is plotted in Figure 4 together with the analytical upper bound on the right-hand side of (45), which is denoted by $P_u$. The result verifies the claim in (45).
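The analytical bound $P_u$ on the right-hand side of (45) can be evaluated directly for the parameter values quoted above. The sketch below computes it by the trapezoidal rule and, as a cross-check, by a closed form obtained from the Gaussian antiderivative. The function names, the truncation of the integration range and the grid size are assumptions; the FPE solution itself is not reproduced here.

```python
import math


def upper_bound(t, rho0, eps, k2, sigma2, c, n=20000):
    """Trapezoidal evaluation of the right-hand side of (45)."""
    var = sigma2 * t
    mean = rho0 - k2 * t
    upper = max(eps, mean) + 12.0 * math.sqrt(var)   # truncate the Gaussian tail
    h = (upper - eps) / n

    def integrand(rho):
        return c * rho / var * math.exp(-(rho - mean) ** 2 / (2.0 * var))

    total = 0.5 * (integrand(eps) + integrand(upper))
    total += sum(integrand(eps + i * h) for i in range(1, n))
    return total * h


def upper_bound_closed_form(t, rho0, eps, k2, sigma2, c):
    """Same integral, split into (rho - mean) and mean parts and integrated exactly."""
    var = sigma2 * t
    mean = rho0 - k2 * t
    sd = math.sqrt(var)
    tail = 0.5 * math.erfc((eps - mean) / (sd * math.sqrt(2.0)))
    return c / var * (var * math.exp(-(eps - mean) ** 2 / (2.0 * var))
                      + mean * sd * math.sqrt(2.0 * math.pi) * tail)


eps, v, sigma2, c = 0.5, 1.0, 0.3, 1.0 / 6.0
rho0 = math.hypot(1.0, 1.0)
k2 = v - sigma2 / (2.0 * eps)
for t in (0.5, 1.0, 2.0, 5.0):
    print(t, upper_bound(t, rho0, eps, k2, sigma2, c),
          upper_bound_closed_form(t, rho0, eps, k2, sigma2, c))
```

Since (45) is only an upper bound, the computed value is not itself a probability and can exceed 1 for small $t$.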
3.5. Stochastic PE games with imperfect state information
In this section, PE games with imperfect state information are briefly introduced. We only discuss the difficulty of such problems and suggest a potential suboptimal solution technique. A complete solution still needs further in-depth investigation.
In games where the players’ measurements are imperfect, the information available to each player is no longer symmetric, and thus the information structure can raise fundamental challenges [15, 16, 29]. In general, this is a very difficult problem, and theoretical results in this field are still largely unavailable. To avoid the difficulty in the information structure, we focus on a class of PE problems where pursuers have noisy measurements but evaders can still measure the states perfectly. This represents a worst-case scenario for the pursuers. Under this setting, we may approach the problem by optimization from the pursuers’ perspective.
Figure 4. Probability $P^{\theta^*}(\rho > \varepsilon)$ vs the upper bound $P_u$. (Left panel: $P^{\theta^*}(\rho > \varepsilon)$ and the analytical upper bound $P_u(\rho > \varepsilon)$ against time (s); right panel: the difference $P_u - P^{\theta^*}$ against time (s).)

Consider the players’ dynamic equation in (2) with (3) and the objective functional in (4). The pursuers’ measurement is described by
$$y_t = h(x_t,\xi)$$
where $y$ is the measurement, $h$ is the measurement function, and $\xi$ is the disturbance with known statistics. In addition, since $z$ is a logic state, it is known to the pursuers. Let $I_t^p$ be the ‘information set’ of the pursuers at time $t$, which includes all the information available up to time $t$. Here, $I_t^p = \{y_{[t_0,t]}, z_{[t_0,t]}, a_{[t_0,t)}\}$, where $y_{[t_0,t]} \triangleq \{y_\tau, t_0 \le \tau \le t\}$ and $z_{[t_0,t]}$ is defined similarly; $a_{[t_0,t)} \triangleq \{a_\tau, t_0 \le \tau < t\}$. The admissible control input of the pursuers is a map $a : [0,T] \to A_a$, which is progressively measurable with respect to not only the $\sigma$-field associated with $\omega$ but also that associated with the measurement disturbance $\xi$. Denote by $A(t)$ the set of all admissible $a$. Suppose that the evaders exploit the optimal strategies $\beta$ that are determined in the corresponding problem with perfect state information. Let $\beta \in \Delta(t)$ be a non-anticipative strategy of the evaders. The dynamic optimization problem (for closed-loop strategies) of the pursuers can be
$$\inf_{a \in A(t)} E_{x_t,\xi}\left\{\sup_{b_\beta \in B(t)} J(a,b_\beta;x_t,z_t) \,\middle|\, I_t^p\right\}$$
Here, $J$ is similar to (4) in the perfect information case; $b_\beta$ is the evaders’ control input associated with some strategy $\beta$; the expectation is taken over $x_t$, $\omega$ and $\xi$ given $I_t^p$. In this formulation, since the evaders have perfect state information, they are assumed to exploit the optimal strategies in the corresponding problem with perfect information. Thus, optimization associated with the evaders is inside the expectation with respect to $x_t$ and $\xi$. In addition, the closed-loop control of the pursuers should be adapted to their future measurements $y_\tau$ taken at time $\tau$ ($t \le \tau \le T$), whereas the evaders’ decision-making depends on the perfect states all the time. So far, this problem has been analysed conceptually, and the discussion may shed light on why stochastic game (control) problems are hard. Even the problem formulation of a special case is difficult. The issues of existence and uniqueness of solutions are still largely open.
In what follows, the authors only point out a heuristic approach to such problems from a practical point of view. Suppose that measurements are taken by the pursuers at each sample time $t_0 + k\Delta t$ for $k \in \mathbb{Z}_+$. The information set of the pursuers at time $t$ is $I_t^p = \{I_{t_0}^p, z_{[t_0,t]}, a_{[t_0,t)}, y_{t_0+k\Delta t}$ for $k \in \mathbb{Z}_+$ and $t_0 + k\Delta t \le t\}$, where $I_{t_0}^p$ is the initial information set at $t_0$. Clearly, the information sets at consecutive sample times satisfy
$$I^p_{t_0+(k+1)\Delta t} = \{I^p_{t_0+k\Delta t},\, z_{(t_0+k\Delta t,\,t_0+(k+1)\Delta t]},\, a_{[t_0+k\Delta t,\,t_0+(k+1)\Delta t)},\, y_{t_0+(k+1)\Delta t}\} \tag{47}$$
Let us choose a cost-to-go function as $\tilde V_{C2G}(I_t^p) = E_{x_t}\{\tilde V^h(x_t) \mid I_t^p\}$, where $\tilde V^h$ is an approximate upper Value function determined in (11) by the hierarchical approach given the perfect state information. Then, at each sample time $t = t_0 + k\Delta t$, the pursuers solve the following optimization problem:
$$\inf_{a(\cdot) \in A(t)} E_{x_t,\omega}\left\{\int_t^{t+\Delta t} G(\tau,x_\tau,z_\tau,a_\tau,\beta^*[a]_\tau)\,d\tau\right\} + E_{y_{t+\Delta t}}\left\{\tilde V_{C2G}(I^p_{t+\Delta t}) \,\middle|\, I_t^p\right\} \tag{48}$$
where $\beta^*$ solves the limited look-ahead optimization problem in (8) with cost-to-go function $\tilde V^h$ given perfect state information. Note that here, the pursuers’ measurement $y_{t+\Delta t}$ at the next sample time is taken into account. By (47), the expectation with respect to $y_{t+\Delta t}$ is equivalent to the expectation with respect to $I^p_{t+\Delta t}$ given $I_t^p$, $a_{[t,t+\Delta t)}$ and $z_{(t,t+\Delta t]}$. Finally, to reduce the computation, certainty equivalence can be used in the cost-to-go function $\tilde V_{C2G}$, i.e. $\tilde V_{C2G}(I_t^p) = \tilde V^h(\hat x_t)$, where $\hat x_t$ is the expectation of state $x$ given $I_t^p$. Note that here the evaders are assumed to exploit the strategy $\beta^*$. In practice, this may not be true, such that information about the evaders’ control is needed for the pursuers to estimate the states.
In implementation, the optimization problem in (48) is solved at every $t_0 + k\Delta t$, and the resulting strategy of the pursuers is implemented during the next $\Delta t$ interval. This repetitive implementation of one-step look-ahead optimization is similar to model predictive control in procedure, with $\tilde V_{C2G}$ as the terminal cost. In this suboptimal method, the feature of feedback is crucial to stochastic games and the optimization based on look-ahead may be beneficial due to the improving property of $\tilde V^h$ with perfect state information.
3.6. On finite expectation of the capture time
Finally, we explore the finite expected capture time of the evader in a two-player game with imperfect state information. A specific model of measurement is used to demonstrate the effect of the measurement accuracy on the capture time of the evader. Consider the two-player PE game in Section 3.4.3. Let $x_B = [\bar x_B,\bar y_B]^{\mathrm T}$ for $B \in \{p,e\}$. Again, we assume that the evader can measure the states ($x_p$, $x_e$) perfectly, while the pursuer knows its own state $x_p$ perfectly but can only access a noisy measurement of $x_e$, which is described by
$$y_e = x_e + \xi_e(x_e) \tag{49}$$
Here, $y_e$ is the measurement; $\xi_e(x_e)$ is a random vector representing the measurement disturbance.
Since the evader can access the perfect states, we assume that the evader exploits the optimal control $\theta_e^*$ determined in (24). Note that $\theta_e^*$ is still optimal here if the pursuer’s measurement does not depend on the evader’s control input. Suppose that the pursuer calculates its optimal control $\tilde\theta_p$ according to (24) but with the perfect state $x_e$ replaced by its measurement $y_e$. Let $\tilde x \triangleq \bar x_p - \bar x_e$ and $\tilde y \triangleq \bar y_p - \bar y_e$. Define $\rho = \sqrt{\tilde x^2+\tilde y^2}$. According to (13), the evolution of $\rho$ under the controls $\theta_e^*$ and $\tilde\theta_p$ can be described as follows:
$$\dot\rho = \left(\frac{\partial\rho}{\partial\bar x_p}v_p\cos\tilde\theta_p + \frac{\partial\rho}{\partial\bar y_p}v_p\sin\tilde\theta_p\right) + v_e + \frac{1}{2\rho}\sigma^2 \tag{50}$$
Note that (50) reduces to (43) when $y_e = x_e$. Assume that the disturbance $\xi_e$ is bounded. In the following, we provide a bound $\Xi_e$ of $\xi_e$, such that if $\|\xi_e\| \le \Xi_e$, $\bar\rho \triangleq E\{\rho \mid \rho > \varepsilon\}$ has a positive decreasing rate.
Construct a co-ordinate system whose origin is the current position of the pursuer. Denote by $\vartheta(x)$ the angle between the line of sight from the evader to the pursuer and the $\bar x$-axis, as illustrated in Figure 5. Define $\gamma(\rho) = \cos^{-1}((v_e + \sigma^2/2\rho)/v_p)$, where $\sigma^2 = \sigma_p^2+\sigma_e^2$. Assume that $\varepsilon > (\sigma_p^2+\sigma_e^2)/2(v_p-v_e)$, such that $\gamma(\rho)$ is well defined when $\rho \ge \varepsilon$, and $0 < \gamma(\rho) < \pi/2$. Define the set $D_\varepsilon \triangleq \{\delta \mid \gamma(\varepsilon) > \delta > 0\}$. Let $\gamma_\delta(\rho) = \gamma(\rho) - \delta$ for some $\delta \in D_\varepsilon$.
Figure 5. Illustration of the pursuer’s control based on a noisy measurement.
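Proposition 9
If there exists $\delta \in D_\varepsilon$ such that $\Xi_e(\rho) \le \rho\tan(\gamma_\delta(\rho))$ for $\rho \ge \varepsilon$ at any $x_e$ and any time $t \ge 0$, then $\dot{\bar\rho}(t) < -k^2$ at any $t \ge 0$ for some $k \ne 0$, where $\bar\rho = E\{\rho(x) \mid \rho(x) > \varepsilon\}$.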
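Proof
Since $\|\xi_e(x)\| \le \Xi_e(\rho)$, according to Figure 5, clearly, $y_e$ falls within the region $\Theta_e$, which is a circle of radius $\Xi_e(\rho)$ centred at $x_e$. Let $\hat{\tilde x} = \bar x_p - y_{e1}$ and $\hat{\tilde y} = \bar y_p - y_{e2}$. By (24), $\tilde\theta_p$ satisfies
$$\cos\tilde\theta_p = -\frac{\hat{\tilde x}}{\sqrt{\hat{\tilde x}^2+\hat{\tilde y}^2}} \quad\text{and}\quad \sin\tilde\theta_p = -\frac{\hat{\tilde y}}{\sqrt{\hat{\tilde x}^2+\hat{\tilde y}^2}}$$
Namely, $\tilde\theta_p \in \Theta_x^\delta \triangleq \{\theta \mid \vartheta(x)+\pi-\gamma_\delta(\rho) \le \theta \le \vartheta(x)+\pi+\gamma_\delta(\rho)\}$. Rewrite Equation (50) as
$$\dot\rho = \left(\frac{\partial\rho}{\partial\bar x_p}v_p\cos\tilde\theta_p + \frac{\partial\rho}{\partial\bar y_p}v_p\sin\tilde\theta_p\right) + v_e + \frac{1}{2\rho}\sigma^2 \tag{51}$$
In (51), $\partial\rho/\partial\bar x_p = \cos\vartheta(x)$ and $\partial\rho/\partial\bar y_p = \sin\vartheta(x)$. Thus, $(\partial\rho/\partial\bar x_p)\cos\tilde\theta_p + (\partial\rho/\partial\bar y_p)\sin\tilde\theta_p = \cos(\vartheta(x)-\tilde\theta_p)$. The fact that $\tilde\theta_p \in \Theta_x^\delta$ implies $|\tilde\theta_p - \vartheta(x) - \pi| \le \gamma_\delta(\rho) < \pi/2$, i.e. $\cos(\tilde\theta_p - \vartheta(x) - \pi) \ge \cos(\gamma_\delta(\rho))$. Thus,
$$\cos(\vartheta(x)-\tilde\theta_p) = -\cos(\vartheta(x)-\tilde\theta_p-\pi) \le -\cos(\gamma_\delta(\rho)) = -\cos(\gamma(\rho)-\delta) \tag{52}$$
Substitute (52) into (51),
$$\dot\rho \le -v_p\cos(\gamma(\rho)-\delta) + v_e + \frac{1}{2\rho}\sigma^2 = -v_p\left[\cos(\gamma(\rho)-\delta) - \left(v_e+\frac{\sigma^2}{2\rho}\right)\Big/v_p\right]$$
$$\quad = -v_p[\cos(\gamma(\rho)-\delta) - \cos\gamma(\rho)] = -2v_p\sin(\gamma(\rho)-\delta/2)\sin(\delta/2) \le -2v_p\sin(\gamma(\varepsilon)-\delta/2)\sin(\delta/2) = -k^2 < 0 \tag{53}$$
Since (53) holds for any $\rho > \varepsilon$, it also holds for $\dot{\bar\rho}$ if the evader is not captured, i.e. $\dot{\bar\rho} < -k^2$. □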
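The quantities appearing in Proposition 9 are easy to tabulate. The sketch below computes $\gamma(\rho)$, the admissible disturbance bound $\rho\tan(\gamma_\delta(\rho))$ and the rate $k^2$ from (53); the function names and the sample values ($v_p = 4$, $v_e = 3$, $\sigma^2 = 0.3$, $\varepsilon = 0.5$, $\delta = 0.2$) are assumptions chosen only for illustration.

```python
import math


def gamma(rho, vp, ve, sigma2):
    """gamma(rho) = arccos((ve + sigma^2 / (2 rho)) / vp)."""
    return math.acos((ve + sigma2 / (2.0 * rho)) / vp)


def noise_bound(rho, delta, vp, ve, sigma2):
    """Largest disturbance norm allowed by Proposition 9: rho * tan(gamma(rho) - delta)."""
    return rho * math.tan(gamma(rho, vp, ve, sigma2) - delta)


def decrease_rate(eps, delta, vp, ve, sigma2):
    """k^2 = 2 vp sin(gamma(eps) - delta / 2) sin(delta / 2), as in (53)."""
    g = gamma(eps, vp, ve, sigma2)
    if not 0.0 < delta < g:
        raise ValueError("delta must lie in (0, gamma(eps))")
    return 2.0 * vp * math.sin(g - delta / 2.0) * math.sin(delta / 2.0)


vp, ve, sigma2, eps, delta = 4.0, 3.0, 0.3, 0.5, 0.2
k2 = decrease_rate(eps, delta, vp, ve, sigma2)
print(gamma(eps, vp, ve, sigma2), k2)
for rho in (0.5, 1.0, 2.0, 4.0):
    print(rho, noise_bound(rho, delta, vp, ve, sigma2))
```

A larger margin $\delta$ gives a faster guaranteed decrease but a smaller tolerated measurement error, which is the trade-off behind the proposition.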
Proposition 9 states that if $\|\xi_e\| \le \rho\tan(\gamma_\delta(\rho))$, the distance between the pursuer and the evader decreases on average. Following the analysis in Section 3.4.3, it is expected that the capture time of the evader has a finite expectation. Although the bounded measurement disturbance considered in (49) is restrictive and is unlikely to hold in real-world problems, Proposition 9 still has practical value because it sheds light on the relationship between the measurement accuracy and the capturability as well as the capture range of the pursuer (indicated by $\varepsilon$). The results can be extended to a more general case where the measurement noise is unbounded but with a known probability distribution.
4. SIMULATION RESULTS
In the following, two examples are simulated to illustrate the usefulness and the feasibility of thelimited look-ahead method for stochastic multi-player PE differential games.
4.1. Performance enhancement by limited look-ahead
In this section, the usefulness of limited look-ahead for performance enhancement is demonstrated in stochastic co-operative pursuit problems. Consider a simple PE game involving two pursuers and two evaders with the players’ dynamics in (19) and the objective in (10), which is the sum of the capture times of all the evaders. Consider a specific scenario, where the players’ initial positions and velocities are specified in Table I. Let $\sigma_p^2 = \sigma_e^2 = 0.5$ and $\varepsilon = 0.5$. Here, we assume that both the pursuers and the evaders can access the state variables perfectly.
We first apply the hierarchical method, where the objective of the optimization problem at theupper level is (11) with the Value function of the distributed two-player games specified in (20).The resulting optimal engagement is that pursuer 1 (2) is engaged with evader 1 (2). Under thisengagement, the strategies of the pursuers and the evaders are determined by solving thedistributed two-player games at the lower level. Typical sample trajectories of co-operativepursuit under such an engagement are shown in Figure 6(a), where the arrows indicate theexpected instantaneous moving directions of the players when the snapshots are taken. We use acircle to indicate the capture range of the pursuer as well as the capture of the correspondingevader inside.
Now, let $x = [\bar x_p,\bar y_p,\bar x_e,\bar y_e]^{\mathrm T}$ and denote by $\tilde V^h(x,z)$ the suboptimal upper Value obtained by the hierarchical approach as in (11). Suppose that the game starts at $t_0$. Given $\Delta t > 0$, at each sample time $t = t_0 + k\Delta t$ ($k \in \mathbb{Z}_{\ge 0}$), we implement the optimization based on limited look-ahead as in (9) but with certainty equivalence, i.e.
$$\max_{\beta \in \Delta(t)}\min_{a(\cdot) \in A(t)}\left\{\int_t^{t+\Delta t}\left(\sum_{j=1}^{2} z_j(\tau)\right)d\tau + \tilde V^h\left(\hat x_{t+\Delta t;\,x_t,a,\beta[a]},\; z_{t+\Delta t;\,z_t,\hat x}\right)\right\} \tag{54}$$
where $\hat x_{t+\Delta t;\,x_t,a,\beta[a]} \triangleq E\{x_{t+\Delta t;\,x_t,a,\beta[a]}\}$. In this example, we choose $\Delta t = 0.1$, such that the ‘minimax’ problem in (54) can be approximated by a static optimization problem with $a$, $b = \beta[a]$ fixed during $\Delta t$ intervals. The optimal strategies ($a_t^*$, $b_t^*$) solved are utilized during the next $\Delta t$ interval. By repetition of this procedure, the trajectories of the players can be generated, and one of the samples is illustrated in Figure 6(b).
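The static approximation of (54) can be illustrated on the simplest case of one pursuer and one evader, using the two-player Value (20) as the cost-to-go. This is a sketch under several assumptions that go beyond the text: the logic state $z$ and the multi-player engagement are ignored, headings are restricted to a uniform grid, and the function names and numerical values are hypothetical.

```python
import math


def value(xp, xe, v, s2, eps):
    """Two-player Value (20) with v = vp - ve and s2 = sigma_p^2 + sigma_e^2."""
    d2 = (xp[0] - xe[0]) ** 2 + (xp[1] - xe[1]) ** 2
    c_eps = -eps / v - s2 * math.log(eps ** 2) / (4 * v ** 2)
    return math.sqrt(d2) / v + s2 / (4 * v ** 2) * math.log(d2) + c_eps


def look_ahead_headings(xp, xe, vp, ve, s2, eps, dt, n_angles=72):
    """Grid search for max over theta_e of min over theta_p of dt + V(expected next state)."""
    angles = [2.0 * math.pi * i / n_angles for i in range(n_angles)]

    def step(pos, speed, theta):
        # Expected position after dt: the Wiener increments have zero mean
        # (certainty equivalence).
        return (pos[0] + speed * math.cos(theta) * dt,
                pos[1] + speed * math.sin(theta) * dt)

    best = None
    for theta_e in angles:
        xe_next = step(xe, ve, theta_e)
        cost, theta_p = min(
            (dt + value(step(xp, vp, th), xe_next, vp - ve, s2, eps), th)
            for th in angles)
        if best is None or cost > best[0]:
            best = (cost, theta_p, theta_e)
    return best


cost, theta_p, theta_e = look_ahead_headings(
    (0.0, 0.0), (4.0, 0.0), 4.0, 3.0, 0.3, 0.5, 0.1)
print(cost, theta_p, theta_e)
```

In this two-player setting the look-ahead step recovers pure pursuit and direct flight along the line of sight; the co-operative behaviour reported for Figure 6(b) only appears once several pursuers and evaders are coupled through the cost-to-go.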
In Figure 6(b), we draw the dashed arrows to emphasize the players’ movement. Clearly, the evolution of the game in Figure 6(b) better resembles reality than that in Figure 6(a). Specifically, when the players are close, both pursuers can move co-operatively to force the evaders to change their escaping directions. In such a way, the performance is improved. It should be noted here that the trajectories in Figure 6 are sample runs. To further justify the limited look-ahead method, we have simulated the same game 1000 times using both methods, and the average cost (accumulative capture time) according to (10) under the hierarchical method is 7.79 (s) while it is 6.53 (s) by the limited look-ahead method. Clearly, the performance by the hierarchical approach can be improved by the limited look-ahead method.

Table I. The necessary parameters of the players in a PE game scenario.

            Initial position   Velocity
Pursuer 1   (0, 0)             4
Pursuer 2   (9, 2)             4
Evader 1    (4, 0)             3
Evader 2    (5, 2)             3
[Figure 6: panels (a) and (b) plot the trajectories of Pursuer 1, Pursuer 2, Evader 1 and Evader 2 from the beginning stage to the end of the game, with the capture points of Evader 1 and Evader 2 marked; horizontal axis 0 to 10, vertical axis -4 to 4.]

Figure 6. Performance enhancement by limited look-ahead: (a) pursuit trajectories by the hierarchical approach and (b) pursuit trajectories by repetition of the optimization based on look-ahead.

4.2. Cooperative pursuit game with imperfect state information

In this section, we demonstrate the feasibility of the limited look-ahead approach through a multi-player stochastic PE game with imperfect state information. Consider a game with three pursuers and three evaders, with the dynamics of the players given in (19). The objective is the sum of the capture times of the evaders. Assume that the evaders have perfect state measurement; the pursuers can measure their own states perfectly, but their measurement of the evaders is noisy. The pursuers' measurement of evader $j$ at each time $t \triangleq t_0 + k\Delta t$ is described by

$$
y^j(t) = \begin{bmatrix} \bar{x}_e^j \\ \bar{y}_e^j \end{bmatrix} + \begin{bmatrix} \xi_{\bar{x}}^j \\ \xi_{\bar{y}}^j \end{bmatrix} \qquad (55)
$$

where $\xi_{\bar{x}}^j$ and $\xi_{\bar{y}}^j$ are independent Gaussian random variables with zero mean and a common variance, i.e. $\xi_{\bar{x}}^j, \xi_{\bar{y}}^j \sim N(0, \sigma_j^2)$. Define $y = [y^{1\mathrm{T}}, \ldots, y^{M\mathrm{T}}]^{\mathrm{T}}$. In this example, let $\sigma_j^2 = 0.5$ for $j = 1, 2, 3$ and $\Delta t = 0.1$. The necessary parameters of the players are given in Table II.

As in the previous example, we implemented the optimization based on limited look-ahead repetitively at each sample time $t_0 + k\Delta t$ as in (54), but with $\hat{x}^{t+\Delta t;\, x_t, a, \beta[a]}$ replaced by $\hat{x}^{t+\Delta t;\, \tilde{x}_t, a, \beta[a]}$, where $\tilde{x} \triangleq [x_p^{\mathrm{T}}, y^{\mathrm{T}}]^{\mathrm{T}}$. By doing this, the noisy measurement $y_t$ is used in the calculation as a substitute for the true state $x_e$. Note that $\hat{x}^{t+\Delta t;\, \tilde{x}_t, a, \beta[a]}$ is the expectation of the state $x$ at $t + \Delta t$ given the initial state $\tilde{x}$ at $t$ under $a(\cdot)$ and $\beta$.
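As a rough illustration of the measurement model (55) and of the substituted state $\tilde{x}$, the sketch below draws noisy evader measurements and stacks them with the pursuers' own positions. It relies only on Python's standard `random` module. The function names are hypothetical, the initial positions are those of Table II, the variance 0.5 is the value chosen in this example, and the fixed seed is only there to make the run repeatable.

```python
import math
import random


def measure_evaders(evaders, variance, rng):
    """Measurement model (55): true evader position plus independent
    zero-mean Gaussian noise on each coordinate.
    Note that rng.gauss expects a standard deviation, not a variance."""
    sigma = math.sqrt(variance)
    return [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))
            for x, y in evaders]


def substituted_state(pursuers, measurements):
    """Stack the pursuers' exact positions and the noisy evader measurements
    into one flat vector, mirroring x~ = [x_p^T, y^T]^T."""
    flat = []
    for px, py in pursuers:
        flat += [px, py]
    for mx, my in measurements:
        flat += [mx, my]
    return flat


rng = random.Random(0)  # fixed seed, for repeatability only
pursuers = [(0.0, -12.0), (8.0, 8.0), (-8.0, 8.0)]   # Table II
evaders = [(0.0, 7.0), (-3.0, -2.0), (4.0, -3.0)]    # Table II
y = measure_evaders(evaders, variance=0.5, rng=rng)
x_tilde = substituted_state(pursuers, y)
print(x_tilde)
```

The vector `x_tilde` would then be passed to the look-ahead optimization in place of the true state, which is the substitution described above.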
The resulting (sample) pursuit trajectories at various stages are illustrated in Figure 7, where 'P' stands for pursuer and 'E' for evader. They confirm the phenomenon observed in Figure 6(b): the pursuers move co-operatively at the beginning stages and, once they get close enough, engage with specific evaders. The reason is that, according to the objective in (10), all the evaders are equally important, and every evader tends to avoid potential capture by each of the pursuers. However, with the limited look-ahead, an evader may not 'detect' which pursuer is actually after it. Hence, the pursuers have the advantage of concealing their true intent, which turns out to be the extra strength of co-operative pursuit with multiple pursuers compared with the simple summation of the individual pursuits in the hierarchical approach. This advantage is due to the relaxation of the (hierarchical) structure in the optimization over limited look-ahead intervals [12]. Finally, note that in the simulated pursuit trajectories the evaders use the strategies calculated in the same limited look-ahead optimization problem as the pursuers. In practice, this may not be true; if it is not, the pursuers may achieve a better performance with the same limited look-ahead strategies, because the pursuers optimize against their worst case.
Table II. Simulation parameters.

Pursuers                                     P1         P2         P3
$(\bar{x}_{p0}, \bar{y}_{p0})$               (0, -12)   (8, 8)     (-8, 8)
$v_p$ (1/s)                                  5          5          5
$\sigma_p^2$                                 0.5        0.5        0.5

Evaders                                      E1         E2         E3
$(\bar{x}_{e0}, \bar{y}_{e0})$               (0, 7)     (-3, -2)   (4, -3)
$v_e$ (1/s)                                  3          2          2
$\sigma_e^2$                                 0.5        0.5        0.5

5. CONCLUSION

In this paper, a general stochastic multi-player PE differential game with additive Gaussian noise in the dynamics has been formulated. To avoid the difficulty of multiplicity of the players in conventional DP methods, a class of suboptimal approaches is specified, such that the resulting suboptimal solution has an improving property based on the optimization with limited look-ahead. A hierarchical method that decomposes a multi-player game into two-player games belongs to this class. Starting from a proper suboptimal solution, the improvement based on limited look-ahead can be applied iteratively, and the process converges. Furthermore, we derive an analytical solution for a two-player game using the Dubins car model, and the conditions on finite expectation of the capture time are specified. The usefulness and the feasibility of limited look-ahead methods are demonstrated through selected simulation scenarios. One appealing advantage of this limited look-ahead approach is that the pursuers' real intentions can be concealed from the evaders.
The iterative method provides a natural framework for general multi-player (zero-sum) games. However, due to its close relation to DP methods, scalability is an issue. Practical algorithms need further investigation.
[Figure 7: four panels, 'Cooperative Pursuit: Stage 1' to 'Cooperative Pursuit: Stage 4', plotting $\bar{y}$ against $\bar{x}$ for pursuers P1, P2, P3 and evaders E1, E2, E3, with the capture points of E1, E2 and E3 marked.]

Figure 7. Co-operative pursuit trajectories of the players.

APPENDIX A

Lemma 10
Given any $a(\cdot) \in A(t)$, define

$$
W_a(x, z) \triangleq \sup_{\beta \in \Delta(t)} E_\omega \left\{ \int_t^T G(x_\tau, z_\tau, a_\tau, \beta[a]_\tau)\, d\tau + Q(x_T) \right\} \qquad (A1)
$$

For any $\Delta t > 0$,

$$
\begin{aligned}
W_a(x, z) \ge \sup_{\beta \in \Delta(t)} E_\omega \Big\{ & P_{a,\beta,\omega}(T \le t + \Delta t) \left[ \int_t^T G(x_\tau, z_\tau, a_\tau, \beta[a]_\tau)\, d\tau + Q(x_T) \right] \\
& + P_{a,\beta,\omega}(T > t + \Delta t) \left[ \int_t^{t+\Delta t} G(x_\tau, z_\tau, a_\tau, \beta[a]_\tau)\, d\tau + W_a\big(x^{t+\Delta t;\, x, a, \beta[a], \omega},\, z^{t+\Delta t;\, z, x}\big) \right] \Big\}
\end{aligned}
\qquad (A2)
$$

Here, $P_{a,\beta,\omega}(T \le t + \Delta t)$ represents the probability of $T \le t + \Delta t$ given $a(\cdot)$, $\beta$ and $\omega$.

Proof
Inequality (A2) is clear by the definition of $W_a$ in (A1). $\square$
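A.1. Proof of Theorem 1

Proof
Given any $\varepsilon > 0$ and for any $x \in X$ and $z \in Z^M$, there exists $\tilde{a}^t(\cdot) \in A^S_{x,z}(t)$ such that

$$
\tilde{V}(x, z) \ge \sup_{\beta \in \Delta(t)} E_\omega \left\{ \int_t^T G(x_\tau, z_\tau, \tilde{a}^t_\tau, \beta[\tilde{a}^t]_\tau)\, d\tau + Q(x_T) \right\} - \varepsilon = W_{\tilde{a}^t}(x, z) - \varepsilon \qquad (A3)
$$

By Lemma 10, given any $\Delta t > 0$,

$$
\begin{aligned}
W_{\tilde{a}^t}(x, z) \ge{} & \sup_{\beta \in \Delta(t)} E_\omega \Big\{ P_{\tilde{a}^t,\beta,\omega}(T \le t + \Delta t) \left[ \int_t^T G(x_\tau, z_\tau, \tilde{a}^t_\tau, \beta[\tilde{a}^t]_\tau)\, d\tau + Q(x_T) \right] \\
& \quad + P_{\tilde{a}^t,\beta,\omega}(T > t + \Delta t) \left[ \int_t^{t+\Delta t} G(x_\tau, z_\tau, \tilde{a}^t_\tau, \beta[\tilde{a}^t]_\tau)\, d\tau + W_{\tilde{a}^t}\big(x^{t+\Delta t;\, x, \tilde{a}^t, \beta[\tilde{a}^t], \omega},\, z^{t+\Delta t;\, z, x}\big) \right] \Big\} \\
={} & \sup_{\beta \in \Delta(t)} E_\omega \left\{ \int_t^{t+\Delta t} G(x_\tau, z_\tau, \tilde{a}^t_\tau, \beta[\tilde{a}^t]_\tau)\, d\tau + W_{\tilde{a}^t}\big(x^{t+\Delta t;\, x, \tilde{a}^t, \beta[\tilde{a}^t], \omega},\, z^{t+\Delta t;\, z, x}\big) \right\}
\end{aligned}
\qquad (A4)
$$

Note that $W_a \ge \tilde{V}$ for any $a \in A(t)$, and then by (A3) and (A4),

$$
\tilde{V}(x, z) \ge \sup_{\beta \in \Delta(t)} E_\omega \left\{ \int_t^{t+\Delta t} G(x_\tau, z_\tau, \tilde{a}^t_\tau, \beta[\tilde{a}^t]_\tau)\, d\tau + \tilde{V}\big(x^{t+\Delta t;\, x, \tilde{a}^t, \beta[\tilde{a}^t], \omega},\, z^{t+\Delta t;\, z, x}\big) \right\} - \varepsilon \qquad (A5)
$$

In addition, $\tilde{a}^t(\cdot) \in A(t)$. It follows that

$$
\tilde{V}(x, z) \ge \sup_{\beta \in \Delta(t)} \inf_{a \in A(t)} E_\omega \left\{ \int_t^{t+\Delta t} G(x_\tau, z_\tau, a_\tau, \beta[a]_\tau)\, d\tau + \tilde{V}\big(x^{t+\Delta t;\, x, a, \beta[a], \omega},\, z^{t+\Delta t;\, z, x}\big) \right\} - \varepsilon
$$

Since $\varepsilon$ is arbitrary, the proof is completed. $\square$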
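A.2. Proof of Theorem 2

Proof

(i) We first prove that $W_k \ge H[W_k]$ for $k \in \mathbb{Z}^+$ by induction. Suppose that $W_k \ge H[W_k]$ ($k \in \mathbb{Z}_{\ge 0}$); we want to show that $W_{k+1} \ge H[W_{k+1}]$. By definition,

$$
W_{k+1}(x, z) = \sup_{\beta \in \Delta(t)} \inf_{a(\cdot) \in A(t)} E_\omega \left\{ \int_t^{t+\Delta t} G(x_\tau, z_\tau, a_\tau, \beta[a]_\tau)\, d\tau + W_k\big(x^{t+\Delta t;\, x, a, \beta[a], \omega},\, z^{t+\Delta t;\, z, x}\big) \right\} \qquad (A6)
$$

Note that $W_k \ge H[W_k] = W_{k+1}$. Replacing $W_k$ by $W_{k+1}$ in (A6) cannot increase the right-hand side, so $W_{k+1}(x, z) \ge H[W_{k+1}](x, z)$. Since $W_0 \ge H[W_0]$, by induction, $W_k \ge H[W_k]$ for $k \in \mathbb{Z}^+$. Considering that $\{W_k\}_{k=0}^{\infty}$ is decreasing at any $x \in X$ and $z \in Z^M$, $\{W_k\}_{k=0}^{\infty}$ converges pointwise because $W_k$ is bounded from below for any $k \in \mathbb{Z}^+$.

(ii) For any $k \in \mathbb{Z}^+$, $W_k(x, z) \ge W_\infty(x, z)$, and it follows that $W_k(x, z) \ge H[W_\infty](x, z)$. Let $k \to \infty$; then $W_\infty(x, z) \ge H[W_\infty](x, z)$. On the other hand, $W_\infty(x, z) \le H[W_k](x, z)$ for any $k \in \mathbb{Z}^+$. Similarly, let $k \to \infty$, such that $W_\infty(x, z) \le H[W_\infty](x, z)$. Thus, $W_\infty(x, z) = H[W_\infty](x, z)$. $\square$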
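The monotone convergence argued in (i) and (ii) can be observed numerically on a toy problem. The sketch below is an illustration under strong simplifying assumptions and is not the game studied in the paper. The state is an integer gap $d \in \{0, \ldots, D\}$ between one pursuer and one evader on a line, the pursuer can close the gap by up to 3 cells per step, the evader can change it by at most 1 cell, a small random disturbance is added, and each step before capture ($d = 0$) costs one unit of time. The hypothetical function `bellman` plays the role of $H$, with the evader reacting to the pursuer's move. The starting function $W_0(d) = d$ is chosen so that $W_0 \ge H[W_0]$, which the code checks.

```python
import math


def bellman(W, noise, a_max=3, b_max=1):
    """One application of a toy minimax operator H on gap states 0..D."""
    D = len(W) - 1
    out = [0.0] * (D + 1)                 # state 0 = captured, zero cost-to-go
    for d in range(1, D + 1):
        best = math.inf
        for a in range(a_max + 1):                  # pursuer closes the gap by a
            worst = -math.inf
            for b in range(-b_max, b_max + 1):      # evader responds to a
                expected = sum(p * W[min(max(d - a + b + w, 0), D)]
                               for w, p in noise.items())
                worst = max(worst, expected)
            best = min(best, worst)
        out[d] = 1.0 + best                         # one time unit per step
    return out


def iterate(W0, noise, tol=1e-12, max_iter=1000):
    """Apply W_{k+1} = H[W_k] until successive iterates agree within tol."""
    history = [list(W0)]
    for _ in range(max_iter):
        W_next = bellman(history[-1], noise)
        history.append(W_next)
        if max(abs(u - v) for u, v in zip(history[-2], W_next)) < tol:
            break
    return history


D = 12
W0 = [float(d) for d in range(D + 1)]       # assumed starting upper bound
noise = {-1: 0.25, 0: 0.5, 1: 0.25}         # assumed disturbance distribution

# Improving property required of the starting point: W0 >= H[W0]
assert all(h <= w for h, w in zip(bellman(W0, noise), W0))

history = iterate(W0, noise)
print("iterations:", len(history) - 1)
print("limit:", history[-1])
```

In this toy setting the iterates decrease pointwise and settle at a fixed point of `bellman`, mirroring the statements $W_k \ge H[W_k]$ and $W_\infty = H[W_\infty]$. With the disturbance removed (`noise = {0: 1.0}`), the limit is the closed form $\lceil d/2 \rceil$.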
ACKNOWLEDGEMENTS
This work was supported by the Collaborative Center of Control Science at the Ohio State University under Grant F33615-01-2-3154 from the Air Force Research Laboratory (AFRL/VA) and the Air Force Office of Scientific Research (AFOSR).
REFERENCES
1. Vidal R, Shakernia O, Kim H, Shim D, Sastry S. Probabilistic pursuit–evasion games: theory, implementation, and experimental evaluation. IEEE Transactions on Robotics and Automation 2002; 18(5):662–669.
2. Hespanha J, Kim H, Sastry S. Multiple-agent probabilistic pursuit–evasion games. Proceedings of the 38th IEEE Conference on Decision and Control, Phoenix, AZ, 1999; 2432–2437.
3. Antoniades A, Kim H, Sastry S. Pursuit–evasion strategies for teams of multiple agents with incomplete information. Proceedings of the 42nd IEEE Conference on Decision and Control, Maui, HI, 2003; 756–761.
4. Li D, Cruz Jr JB, Chen G, Kwan C, Chang M. A hierarchical approach to multi-player pursuit–evasion differential games. Proceedings of the 44th Joint Conference of CDC-ECC05, Seville, Spain, December 2005; 5674–5679.
5. Li D, Cruz Jr JB. Better cooperative control with limited look-ahead. Proceedings of American Control Conference, Minneapolis, MN, June 2006; 4914–4919.
6. Kanchanavally S, Ordonez R, Layne J. Mobile target tracking by networked uninhabited autonomous vehicles via hospitability maps. Proceedings of the American Control Conference, Boston, MA, 2004; 5570–5575.
7. Liu Y, Cruz Jr JB, Sparks A. Coordinating networked uninhabited air vehicles for persistent area denial. Proceedings of the 43rd IEEE Conference on Decision and Control, Paradise Island, Bahamas, 2004; 3351–3356.
8. Isaacs R. Differential Games: A Mathematical Theory with Applications to Warfare and Pursuit. Wiley: New York, 1965.
9. Basar T, Olsder G. Dynamic Noncooperative Game Theory (2nd edn). SIAM: Philadelphia, 1998.
10. Hespanha J, Prandini M, Sastry S. Probabilistic pursuit–evasion games: a one-step Nash approach. Proceedings of the 39th IEEE Conference on Decision and Control, Sydney, Australia, 2000; 2272–2277.
11. Schenato L, Oh S, Sastry S. Swarm coordination for pursuit evasion games using sensor networks. Proceedings of the International Conference on Robotics and Automation, Barcelona, Spain, 2005; 2493–2498.
12. Li D, Cruz Jr JB. Improvement with look-ahead on cooperative pursuit games. Proceedings of the 44th IEEE Conference on Decision and Control, San Diego, CA, December 2006.
13. Li D, Cruz Jr JB. General multi-player pursuit–evasion differential games. IEEE Transactions on Automatic Control, submitted for publication.
14. Fleming WH, Souganidis PE. On the existence of value functions of two-player, zero-sum stochastic differential games. Indiana University Mathematics Journal 1989; 38(2):293–314.
15. Rhodes I, Luenberger D. Differential games with imperfect state information. IEEE Transactions on Automatic Control 1969; 14(1):29–38.
16. Kian AR, Cruz Jr JB, Simaan MA. Stochastic discrete-time Nash games with constrained state estimators. Journal of Optimization Theory and Applications 2002; 114(1):171–188.
17. Fleming W, Rishel R. Deterministic and Stochastic Optimal Control. Springer: New York, 1975.
18. Yong J, Zhou X. Stochastic Controls: Hamiltonian Systems and HJB Equations. Springer: New York, 1999.
19. Bardi M, Capuzzo-Dolcetta I. Optimal Control and Viscosity Solutions of Hamilton–Jacobi–Bellman Equations. Birkhauser: Boston, 1997.
20. Varaiya P. On the existence of solutions to a differential game. SIAM Journal of Control 1967; 5:153–162.
21. Roxin E. Axiomatic approach in differential games. Journal of Optimization Theory and Applications 1969; 3(3):153–163.
22. Elliott R, Kalton N. The existence of value in differential games. Memoirs of the American Mathematical Society, vol. 126. American Mathematical Society: Providence, Rhode Island, 1972.
23. Elliott R, Kalton N. Cauchy problems for certain Isaacs–Bellman equations and games of survival. Transactions of the American Mathematical Society 1974; 198:45–72.
24. Kushner H, Dupuis P. Numerical Methods for Stochastic Control Problems in Continuous Time. Springer: New York, 2001.
25. Barto A, Bradtke S, Singh S. Learning to act using real-time dynamic programming. Artificial Intelligence 1995; 72:81–138.
26. Roy B. Neuro-dynamic programming: overview and recent trends. Handbook of Learning and Approximate Dynamic Programming. Kluwer: Dordrecht, 2001; 431–460.
27. Si J, Barto A, Powell W, Wunsch D. Handbook of Learning and Approximate Dynamic Programming. Wiley: New York, 2004.
28. Risken H. The Fokker–Planck Equation: Methods of Solution and Applications (2nd edn). Springer: Berlin, 1996.
29. Rhodes I, Luenberger D. Stochastic differential games with constrained state estimators. Transactions on Automatic Control 1969; 14(5):476–481.