INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL
Int. J. Robust Nonlinear Control 2008; 18:218–247
Published online 16 May 2007 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/rnc.1193

Stochastic multi-player pursuit–evasion differential games

Dongxu Li¹, Jose B. Cruz Jr¹,*,† and Corey J. Schumacher²

¹Department of Electrical and Computer Engineering, The Ohio State University, 205 Dreese Lab, 2015 Neil Ave., Columbus, OH 43202, U.S.A.

²Air Force Research Laboratory (AFRL/VACA), Wright-Patterson AFB, OH 45433, U.S.A.

SUMMARY

Autonomous aerial vehicles play an important role in military applications such as search, surveillance and reconnaissance. A multi-player stochastic pursuit–evasion (PE) differential game is a natural model for such operations involving intelligent moving targets with uncertainties. In this paper, some fundamental issues of stochastic PE games are addressed. We first model a general stochastic multi-player PE differential game with perfect state information. To avoid the difficulty of multiplicity of the players, we extend the iterative method for deterministic multi-player PE games to the stochastic case. Starting from certain suboptimal solutions with an improving property, the optimization based on limited look-ahead can be used for improvement. The process converges when this improvement is applied iteratively. Furthermore, we introduce a hierarchical approach that can determine a valid starting point of the iterative process. As a basis for multi-player games, stochastic two-player PE games are also addressed. We also briefly discuss the games with imperfect state information and propose a suboptimal approach from a practical point of view. Finally, we demonstrate the usefulness and the feasibility of the method through simulations. Copyright © 2007 John Wiley & Sons, Ltd.

Received 6 June 2006; Revised 23 February 2007; Accepted 26 February 2007

KEY WORDS: pursuit–evasion; differential game; stochastic process; look-ahead

1. INTRODUCTION

Autonomous aerial vehicles (AAVs) have shown great potential value in reducing human workload in future military operations. Co-operation among multiple AAVs is a key factor. Important applications such as intelligent search, co-operative surveillance and reconnaissance of potential threats, and persistent area denial have drawn much attention [1–7]. Usually, the (potential) targets and threats in a battlefield are intelligent and mobile, and can employ counter-strategies to avoid being detected, tracked or destroyed. These action and counter-action behaviours can be naturally formulated in a game setting, or more specifically, by pursuit–evasion (PE) differential games (with multiple players). On the other hand, information sources in military applications are usually limited and involve uncertainties. In this paper, we study stochastic multi-player‡ PE differential games.

*Correspondence to: Jose B. Cruz Jr, Department of Electrical and Computer Engineering, The Ohio State University, 205 Dreese Lab, 2015 Neil Ave., Columbus, OH 43202, U.S.A.
†E-mail: [email protected], [email protected]

Contract/grant sponsor: Air Force Research Laboratory (AFRL/VA); contract/grant number: F33615-01-2-3154
Contract/grant sponsor: Air Force Office of Scientific Research (AFOSR)

The study of differential games was initiated by Rufus P. Isaacs when he investigated PE problems [8]. In a general PE game problem, one or a group of pursuers go after one or more evaders, and it is usually formulated as a zero-sum game in which pursuers try to minimize a prescribed cost functional while evaders try to maximize the same functional. In the literature, a number of formal solutions regarding optimal strategies have been achieved [8, 9]. However, most theoretical results mainly focus on two-player games with a single pursuer and a single evader, which are no longer adequate in dealing with the newly emergent situations involving multiple players.

Recently, the increasing use of autonomous assets and robots has led to renewed interest in PE games [1–3, 10, 11]. Hespanha et al. [2, 10] formulated PE games in discrete time under a probabilistic framework, in which greedy and one-step Nash equilibrium strategies are solved, respectively. PE strategies were also studied by Antoniades et al. [3], where several heuristic solutions were attempted and compared. Furthermore, the system structure and various issues of implementation of PE strategies by AAV teams are discussed in [1, 11]. These approaches all deal with discrete-time problems, in which the problems of search and pursuit are intertwined. On the other hand, general (deterministic) multi-player PE games in continuous time are studied in [4, 5, 12, 13]. In [4], a suboptimal solution is obtained by a hierarchical decomposition method, and it is further generalized as a class of 'structured' suboptimal methods such that the resulting suboptimal solution has an improving property [5, 13]. In [5, 13], the optimization based on limited look-ahead is used to improve the suboptimal solution iteratively, and an optimal solution can be approached in the limit. In addition, the performance enhancement by limited look-ahead is further analysed in [12].

In this paper, we extend the previous results in deterministic multi-player PE games to the stochastic case, in which we show that the iterative method is still applicable. Under the framework of the hierarchical method, an analytical solution is derived for a two-player stochastic PE differential game using a simplified Dubins car model, and the finite expectation of capture time is analysed. This paper is mainly concerned with games with the perfect state information pattern, i.e. the players can measure the state variables perfectly. For games with imperfect state information, we provide a suboptimal approach from a practical point of view.

The paper is organized as follows. In Section 2, a stochastic multi-player PE game with perfect state information is formulated based on [14]. In Section 3, we first point out the difficulty due to multiplicity of the players. Then, to extend the relevant results in deterministic games [5, 13], we use suboptimal 'structured' control methods with sample-set-time-consistency (SSTC) to initiate our improvement iteration. Specifically, a hierarchical method is introduced. The resulting suboptimal solution can be improved by the optimization based on limited look-ahead. This improvement may be applied iteratively, and the process converges. We further investigate two-player stochastic games based on conventional stochastic control theory. Finally, problems with an imperfect information pattern are briefly discussed. In Section 4, the usefulness and the feasibility of the limited look-ahead method are demonstrated through two selected stochastic PE scenarios. Finally, conclusions are drawn in Section 5.

‡ Here, 'multi-player' means multiple pursuers and multiple evaders.

2. STOCHASTIC GAME FORMULATION

Consider a general PE differential game with $N$ pursuers and $M$ evaders in an $n_0$-dimensional space $S \subseteq \mathbb{R}^{n_0}$. Denote by $x_p^i$ ($x_e^j$) the state variable of pursuer $i$, $i = 1, \ldots, N$ (evader $j$, $j = 1, \ldots, M$), where $x_p^i \in \mathbb{R}^{n_p^i}$ ($x_e^j \in \mathbb{R}^{n_e^j}$). Here, $n_p^i, n_e^j, n_0 \in \mathbb{Z}^+ \triangleq \{n \in \mathbb{Z},\ n > 0\}$, and $n_p^i, n_e^j \ge n_0$. The first $n_0$ elements in state $x_p^i$ ($x_e^j$) denote the physical position of pursuer $i$ (evader $j$) in the common space $S$. The dynamics of pursuer $i$ and evader $j$ are described by (1), where for simplicity, we use the subscript $t$ to represent time.

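$$dx_{pt}^i = f_p^i(x_{pt}^i, a_{it})\,dt + \sigma_p^i(x_{pt}^i)\,dw_{pt}^i \quad \text{with } x_p^i(t_0) = x_{p0}^i \tag{1a}$$

$$dx_{et}^j = f_e^j(x_{et}^j, b_{jt})\,dt + \sigma_e^j(x_{et}^j)\,dw_{et}^j \quad \text{with } x_e^j(t_0) = x_{e0}^j \tag{1b}$$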

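In (1), $x_{pt}^i \in \mathbb{R}^{n_p^i}$, $x_{et}^j \in \mathbb{R}^{n_e^j}$ for $t \ge t_0$; $a_{it} \in A_a^i$, $b_{jt} \in B_a^j$, where $A_a^i \subset \mathbb{R}^{m_p^i}$ and $B_a^j \subset \mathbb{R}^{m_e^j}$ are compact sets with $m_p^i, m_e^j \in \mathbb{Z}^+$; $w_{pt}^i \in W_p^i \subseteq \mathbb{R}^{k_p^i}$ ($w_{et}^j \in W_e^j \subseteq \mathbb{R}^{k_e^j}$) is a standard Wiener process; $\sigma_p^i$ ($\sigma_e^j$) is a map $\sigma_p^i : \mathbb{R}^{n_p^i} \to \mathbb{R}^{n_p^i \times k_p^i}$ ($\sigma_e^j : \mathbb{R}^{n_e^j} \to \mathbb{R}^{n_e^j \times k_e^j}$). In (1), $f_p^i : \mathbb{R}^{n_p^i} \times A_a^i \to \mathbb{R}^{n_p^i}$ ($f_e^j : \mathbb{R}^{n_e^j} \times B_a^j \to \mathbb{R}^{n_e^j}$) and $\sigma_p^i(\cdot)$ ($\sigma_e^j(\cdot)$) are bounded and Lipschitz in $x_p^i$ ($x_e^j$), uniformly with respect to $a_{it}$ ($b_{jt}$) [14]. The additive disturbance in this model is used mainly for mathematical convenience; however, in practice, it may stand for the pursuers' ignorance of the evaders' motions [6] or the disturbance on the movement of small AAVs, e.g. random wind effects.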
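As a minimal sketch of how one player's dynamics in (1) might be simulated, the following Python snippet applies the Euler–Maruyama scheme. It is an illustration under simplifying assumptions, not the method used in the paper: the diffusion term is taken to be a constant scalar `sigma` acting on every state component (a special case of the matrix $\sigma_p^i(x)$), and the names `euler_maruyama` and `unicycle_drift`, as well as the step size, are hypothetical choices.

```python
import math
import random


def euler_maruyama(drift, sigma, x0, control, t0, t_end, dt, rng):
    """Simulate dx = drift(x, u) dt + sigma dw on [t0, t_end] (Euler-Maruyama).

    Assumption: sigma is a constant scalar applied to each state component,
    a special case of the state-dependent matrix sigma(x) in (1).
    """
    x, t = list(x0), t0
    path = [(t, tuple(x))]
    n_steps = int(round((t_end - t0) / dt))
    for _ in range(n_steps):
        u = control(t, x)   # control value applied at time t
        f = drift(x, u)     # drift vector f(x, u)
        # Wiener increments are independent N(0, dt) samples.
        x = [xi + fi * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
             for xi, fi in zip(x, f)]
        t += dt
        path.append((t, tuple(x)))
    return path


def unicycle_drift(x, heading, speed=1.0):
    """Hypothetical planar kinematics: constant speed, heading as the control."""
    return [speed * math.cos(heading), speed * math.sin(heading)]


if __name__ == "__main__":
    rng = random.Random(0)
    noisy_path = euler_maruyama(unicycle_drift, 0.2, (0.0, 0.0),
                                lambda t, x: 0.0, 0.0, 1.0, 0.01, rng)
    print(noisy_path[-1])
```

Setting `sigma` to zero recovers the deterministic trajectory, which is a convenient sanity check before adding noise.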

For pursuer $i$, define the projection operator $P : \mathbb{R}^{n_p^i} \to \mathbb{R}^{n_0}$ as $P(x_p^i) = [x_{p1}^i, x_{p2}^i, \ldots, x_{pn_0}^i]^T$. Clearly, $P(x_p^i)$ represents the physical position of pursuer $i$. A similar operator can be defined for every pursuer and every evader, and we use the same notation $P(\cdot)$ for each of them. Evader $j$ is considered captured if there exists pursuer $i$ such that $\|P(x_p^i(t)) - P(x_e^j(t))\| \le \varepsilon$ for some $t \ge t_0$. Here, $\varepsilon > 0$ is predefined. The capture time of evader $j$ is defined as

$$T_j = \min\{t \ge t_0 \mid \exists i,\ 1 \le i \le N, \text{ such that } \|P(x_p^i(t)) - P(x_e^j(t))\| \le \varepsilon\}$$

In a multi-player PE game, the evaders are generally not captured simultaneously. A game terminates when all the evaders are captured, and the terminal time, $T$, can be defined as $T = \max_j \{T_j\}$.
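The definitions of $T_j$ and $T$ can be illustrated on sampled trajectories. The sketch below assumes that all positions are already projected into the common space $S$ and sampled on a shared time grid; the function names and the data layout are hypothetical, and the result is only a discrete-time approximation of the continuous-time capture time.

```python
import math


def capture_times(pursuer_paths, evader_paths, times, eps):
    """Return, for each evader, the first sampled time it is within eps of any pursuer.

    pursuer_paths[i][k] and evader_paths[j][k] are positions in the common
    space S at time times[k]. An evader that is never captured gets math.inf.
    """
    result = []
    for e_path in evader_paths:
        t_capture = math.inf
        for k, t in enumerate(times):
            if any(math.dist(p_path[k], e_path[k]) <= eps
                   for p_path in pursuer_paths):
                t_capture = t
                break
        result.append(t_capture)
    return result


def terminal_time(t_captures):
    """Terminal time of the game, T = max_j T_j."""
    return max(t_captures)
```

For example, with one pursuer moving along the x-axis at unit speed and two stationary evaders at distances 2 and 3, `capture_times` returns the two arrival times and `terminal_time` returns the later one.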

We use a discrete vector, $z \in Z^M = Z \times Z \times \cdots \times Z$ with $Z = \{0, 1\}$, as additional states to indicate whether evader $j$ ($j = 1, \ldots, M$) is captured. Here, the $j$th element of vector $z$ is $z_j = 1$ if evader $j$ is not captured and $z_j = 0$ if it is captured. According to the definition of the capture of an evader, $z_j$ is governed by the function $g_j : Z \times X \to Z$ that is defined as

$$g_j(0, x) = 0, \qquad g_j(1, x) = \begin{cases} 0 & \|P(x_p^i(t)) - P(x_e^j(t))\| \le \varepsilon \text{ for some } i,\ i = 1, \ldots, N \\ 1 & \text{otherwise} \end{cases}$$

(Here, $x$ is the aggregate state including all $x_p^i$ and $x_e^j$, and $X$ is the set of all possible $x$'s, which will be defined shortly.)


Assumption 1
Every pursuer can proceed to other evaders after it captures an evader, while every evader stops moving when it is captured.

With Assumption 1, the dynamics of evader $j$ in (1) is equivalent to $dx_{et}^j = z_{jt} \cdot f_e^j(x_{et}^j, b_{jt})\,dt + z_{jt} \cdot \sigma_e^j(x_{et}^j)\,dw_{et}^j$. Clearly, control $b_j$ is meaningful only when $z_{jt} = 1$. For simplicity, let

$$x_p = [x_p^{1T}, x_p^{2T}, \ldots, x_p^{NT}]^T, \qquad x_e = [x_e^{1T}, x_e^{2T}, \ldots, x_e^{MT}]^T$$

$$a = [a_1^T, a_2^T, \ldots, a_N^T]^T, \qquad b = [b_1^T, b_2^T, \ldots, b_M^T]^T$$

$$f_p = [f_p^{1T}, f_p^{2T}, \ldots, f_p^{NT}]^T, \qquad f_e = [z_1 f_e^{1T}, z_2 f_e^{2T}, \ldots, z_M f_e^{MT}]^T$$

$$w_p = [w_p^{1T}, w_p^{2T}, \ldots, w_p^{NT}]^T, \qquad w_e = [w_e^{1T}, w_e^{2T}, \ldots, w_e^{MT}]^T$$

$$\sigma_p(x_p) = \mathrm{Diag}(\sigma_p^1(x_p^1), \ldots, \sigma_p^N(x_p^N)) \quad \text{and} \quad \sigma_e(x_e) = \mathrm{Diag}(z_1 \sigma_e^1(x_e^1), \ldots, z_M \sigma_e^M(x_e^M))$$

where $\sigma_p(x_p)$ and $\sigma_e(x_e)$ are block diagonal matrices with the $\sigma_p^i$'s and $z_j \sigma_e^j$'s as the diagonal elements in a generalized sense;§ define $x = [x_p^T, x_e^T]^T$, $f(x, a, b) = [f_p^T(x, a), f_e^T(x, b)]^T$, $\sigma(x) = \mathrm{Diag}(\sigma_p(x_p), \sigma_e(x_e))$ and $w = [w_p^T, w_e^T]^T$. Then, Equation (1) can be rewritten as

$$dx_t = f(x_t, a_t, b_t)\,dt + \sigma(x_t)\,dw_t \quad \text{with } x(t_0) = x_0 \tag{2}$$

Let $X \triangleq \left(\prod_{i=1}^N \mathbb{R}^{n_p^i}\right) \times \left(\prod_{j=1}^M \mathbb{R}^{n_e^j}\right)$. Denote by $A_a = \prod_{i=1}^N A_a^i$ the set of possible controls of all the pursuers and similarly for the evaders, i.e. $B_a = \prod_{j=1}^M B_a^j$. In addition, let $g \triangleq [g_1, \ldots, g_M]^T$, and the algebraic equation that governs state $z$ becomes

$$z(t) = z(t^+) = g(z(t^-), x(t)) \tag{3}$$

Here, $z(t^+)$ and $z(t^-)$ denote the right and the left limit of $z$ at time $t$.
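The update (3) can be read as a small piece of bookkeeping: a captured evader stays captured, and a free evader becomes captured as soon as some pursuer is within $\varepsilon$. The following sketch applies that rule once at a given time instant; the function name and the tuple-based position format are assumptions made for illustration.

```python
import math


def update_capture_flags(z, pursuer_positions, evader_positions, eps):
    """Apply z(t) = g(z(t-), x(t)) component-wise.

    z[j] is 1 while evader j is free and 0 once it has been captured.
    Positions are points in the common space S (already projected by P).
    """
    new_z = []
    for z_j, e_pos in zip(z, evader_positions):
        if z_j == 0:
            new_z.append(0)   # g_j(0, x) = 0: capture is irreversible
        elif any(math.dist(p_pos, e_pos) <= eps for p_pos in pursuer_positions):
            new_z.append(0)   # some pursuer is within the capture radius
        else:
            new_z.append(1)
    return new_z
```

Calling this function after every simulation step keeps the discrete state $z$ consistent with the continuous state $x$.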

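Definition 1
Let $(\Omega, \mathcal{F}, P)$ be a probability space, $\{\mathcal{F}_t\}$ a monotone family of $\sigma$-fields $\mathcal{F}_t \subseteq \mathcal{F}$ with $\mathcal{F}_{t_1} \subseteq \mathcal{F}_{t_2}$ for any $t_0 \le t_1 \le t_2$, and $X$ a complete separable metric space. A stochastic process $x$ defined on the time interval $[t_0, T]$,¶ $x(t) : \Omega \to X$ with $t_0 \le t \le T$, is said to be $\{\mathcal{F}_t\}_{t \ge t_0}$-progressively measurable if for all $t \in [t_0, T]$, the map $(t, \omega) \mapsto x(t, \omega)$ is $\mathcal{B}[t_0, t] \times \mathcal{F}_t / \mathcal{B}(X)$ measurable, where $\mathcal{B}[t_0, t]$ is the Borel $\sigma$-field on $[t_0, t]$ and $\mathcal{B}(X)$ is the Borel $\sigma$-field on $X$ (cf. [14]).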

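Define $K = \sum_{i=1}^N k_p^i + \sum_{j=1}^M k_e^j$. The sample space for the stochastic process in (2) belongs to the set

$$\Omega_t^\omega = \{\omega \in C([t, T]);\ \omega : [t, T] \to \mathbb{R}^K \text{ with } \omega_t = 0\}$$

for every $t \in [t_0, T]$ ($T > 0$), where $C([t, T])$ is the set of continuous functions on $[t, T]$. Denote by $\mathcal{F}_{t,s}^\omega \subseteq \mathcal{F}_s$ the $\sigma$-field generated by paths from time $t$ to $s$ (for any $t \le s \le T$) in $\Omega_t^\omega$. Provided with the Wiener measure $P_t^\omega$ on $\mathcal{F}_{t,T}^\omega$, $\Omega_t^\omega$ becomes the canonical sample space for the stochastic process of (2). Similarly, for $t \le s \le T$, define

$$\Omega_{t,s}^\omega = \{\omega \in C([t, s]);\ \omega : [t, s] \to \mathbb{R}^K \text{ with } \omega_t = 0\}$$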

§ Note that $\sigma_p(x_p)$ and $\sigma_e(x_e)$ are not necessarily square matrices.
¶ Here, $T$ may be infinity, in which case the interval is right open.




The information pattern in games is crucial to the players, especially in a stochastic game. In a PE game, the simplest case is that both pursuers and evaders can access the state variables perfectly, in which case a value function can be well defined. Situations become much more difficult when the players' measurements are noisy and distinct, and in such cases the existence of solutions can be problematic. In this paper, we focus on problems with perfect state information; the imperfect state information case is briefly discussed. Readers can refer to [15, 16] for more discussion on information patterns in stochastic games with imperfect state information.

In a stochastic PE game, suppose that the players can measure the state perfectly.‖ We consider the following objective functional:

$$J(a, b; x_t, z_t) = E_{\omega,t}\left\{\int_t^T G(x_\tau, z_\tau, a_\tau, b_\tau)\,d\tau + Q(x_T)\right\} \quad \text{subject to (2) and (3)} \tag{4}$$

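In (4), $\omega \in \Omega_{t,T}^\omega$; the terminal cost $Q : X \to \mathbb{R}_{\ge 0} \triangleq \{r \ge 0 \mid r \in \mathbb{R}\}$ and the cost rate $G : X \times Z^M \times A_a \times B_a \to \mathbb{R}_{\ge 0}$ are bounded and Lipschitz continuous in $x_t$, uniformly with respect to $a_t$ and $b_t$ [14]. Here, $E_{\omega,t}$ denotes the expectation taken with respect to the stochastic process $\omega$ starting from $t$. Henceforth, we will use $E_\omega$ as an abbreviated notation. An admissible control of the pursuers and that of the evaders, $a(\cdot)$ and $b(\cdot)$, are defined, respectively, as

$$a(\cdot) \in A(t) \triangleq \{\phi : [t, T] \to A_a \mid \phi(\cdot) \text{ is } \mathcal{F}_{t,s}\text{-progressively measurable}\}$$

$$b(\cdot) \in B(t) \triangleq \{\varphi : [t, T] \to B_a \mid \varphi(\cdot) \text{ is } \mathcal{F}_{t,s}\text{-progressively measurable}\}$$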

for $t \le s \le T$. We say that $a(\cdot), \tilde{a}(\cdot) \in A(t)$ are the same on interval $[t, s]$, denoted by $a \approx \tilde{a}$ on $[t, s]$, if the probability $P_t^\omega(a = \tilde{a} \text{ a.e. in } [t, s]) = 1$. The same holds for $b(\cdot), \tilde{b}(\cdot) \in B(t)$. Before defining the value function, we first define a strategy of the pursuers at time $t \ge t_0$ as a map $\alpha : B(t) \to A(t)$, where the pursuers' control is a function of the input that evaders exploit. Similarly, the evaders' strategy $\beta$ can be defined as $\beta : A(t) \to B(t)$.

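Definition 2
A non-anticipative strategy $\alpha$ (or $\beta$) of the pursuers (or evaders) on $[t, T]$ is

$$\alpha \in \Gamma(t) \triangleq \{\alpha : B(t) \to A(t) \mid \text{for any } b(\cdot), \tilde{b}(\cdot) \in B(t),\ b \approx \tilde{b} \text{ on } [t, s] \text{ implies } \alpha[b] \approx \alpha[\tilde{b}] \text{ on } [t, s] \text{ for every } s \in [t, T]\}$$

$$(\text{or } \beta \in \Delta(t) \triangleq \{\beta : A(t) \to B(t) \mid \text{for any } a(\cdot), \tilde{a}(\cdot) \in A(t),\ a \approx \tilde{a} \text{ on } [t, s] \text{ implies } \beta[a] \approx \beta[\tilde{a}] \text{ on } [t, s] \text{ for every } s \in [t, T]\})$$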

This definition implies that the decision-making of the players only depends on the information up to the current time. More importantly, under the definitions of admissible control and non-anticipative strategy, the stochastic differential equation (2) has a unique solution [14]. More details about the formulation of a stochastic game problem can be found in [9, 17, 18].

‖ Information is shared by all the pursuers or all the evaders assuming sufficient communication on either side.

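A stochastic PE game is formulated as a zero-sum game. For any $x \in X$ and $z \in Z^M$, the lower value of a game $\underline{V}(x, z)$ is defined as

$$\underline{V}(x, z) = \inf_{\alpha \in \Gamma(t)} \sup_{b(\cdot) \in B(t)} \{J(\alpha[b], b; x, z)\} = \inf_{\alpha \in \Gamma(t)} \sup_{b(\cdot) \in B(t)} E_\omega\left\{\int_t^T G(x_\tau, z_\tau, \alpha[b]_\tau, b_\tau)\,d\tau + Q(x_T)\right\} \tag{5}$$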

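Similarly, the upper value $\bar{V}(x, z)$ is

$$\bar{V}(x, z) = \sup_{\beta \in \Delta(t)} \inf_{a(\cdot) \in A(t)} \{J(a, \beta[a]; x, z)\} = \sup_{\beta \in \Delta(t)} \inf_{a(\cdot) \in A(t)} E_\omega\left\{\int_t^T G(x_\tau, z_\tau, a_\tau, \beta[a]_\tau)\,d\tau + Q(x_T)\right\} \tag{6}$$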

It turns out that in (5), pursuers have an informational advantage and similarly for evaders in (6). In general, $\underline{V}(x, z) \le \bar{V}(x, z)$ [19, p. 434], and if $\underline{V}(x, z) = \bar{V}(x, z)$, we say that the value of the game (saddle-point equilibrium) exists, which is denoted by $V(x, z)$.** This is called the Isaacs condition. With these definitions, optimality of solutions can be interpreted according to $\underline{V}$, $\bar{V}$ or $V$. Without assuming the Isaacs condition, our study is focused on the upper value defined in (6).†† Hereafter, we use the capitalized 'Value' to stand for the (lower/upper) Value functions of a multi-player game.
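The ordering between the lower and upper values can be illustrated with a static analogue: a finite matrix game in which the pursuer picks a row and the evader picks a column. This is only a hedged illustration of the informational asymmetry, not the differential game itself; the cost matrices and function names below are hypothetical.

```python
def lower_value(cost):
    """Pursuer reacts to the evader's choice b: max over b of min over a."""
    n_cols = len(cost[0])
    return max(min(row[b] for row in cost) for b in range(n_cols))


def upper_value(cost):
    """Evader reacts to the pursuer's choice a: min over a of max over b."""
    return min(max(row) for row in cost)


if __name__ == "__main__":
    # cost[a][b]: pursuer (row player) minimizes, evader (column player) maximizes.
    no_saddle = [[1, 3],
                 [4, 2]]
    saddle = [[2, 3],
              [1, 4]]
    print(lower_value(no_saddle), upper_value(no_saddle))
    print(lower_value(saddle), upper_value(saddle))
```

In the first matrix the two values differ, so whoever moves second gains from the extra information; in the second they coincide, which is the static counterpart of the value of the game existing.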

3. SOLUTION TECHNIQUES

3.1. Difficulty of conventional differential game theory

Dynamic programming (DP) is a general method for solving stochastic differential games, in which the underlying idea is the state rollback, i.e. an optimal state trajectory is traced backwards starting from a certain terminal state. Specifically, the Value function is characterized by a Hamilton–Jacobi–Isaacs (HJI) equation, and initial conditions on terminal states are generally needed. In contrast to a two-player PE game, in a multi-player game, if pursuer $i$ catches evader $j$, we say that both players are engaged. Clearly, the possible terminal states depend on a specific engagement between the pursuers and the evaders, which is hard to specify when $N$ and $M$ are large. Furthermore, the treatment by the corresponding HJI equation for a multi-player game becomes more difficult because the additional discrete state $z$ is involved. In general, DP approaches cannot be directly applied to multi-player PE games [13].

3.2. An iterative method based on a suboptimal solution

As in deterministic PE games [13], we start with a class of suboptimal methods, where an additional structure $S$ is imposed on the pursuers' controls. Given any states $x \in X$ and $z \in Z^M$, we denote by $A^S_{x,z}(t)$ a non-empty 'structured' control set of the pursuers, with $A^S_{x,z}(t) \subseteq A(t)$. Here, the superscript and the subscript in $A^S_{x,z}$ indicate the dependence on the structure $S$ and on the states $x$ and $z$. Suppose that for any $x \in X$, $z \in Z^M$ at time $t \ge t_0$, problem (6) is solvable with respect to $A^S_{x,z}(t)$, i.e.

$$\tilde{V}(x, z) = \sup_{\beta \in \Delta(t)} \inf_{a(\cdot) \in A^S_{x,z}(t)} E_\omega\left\{\int_t^T G(x_\tau, z_\tau, a_\tau, \beta[a]_\tau)\,d\tau + Q(x_T)\right\} \tag{7}$$

where $\omega \in \Omega_{t,T}^\omega$ and $A^S_{x,z}(t)$ is the restricted control set under structure $S$ at time $t$. Since $A^S_{x,z}(t) \subseteq A(t)$, $\bar{V} \le \tilde{V}$. Given any $\tilde{a}^t(\cdot) \in A^S_{x_t,z_t}(t)$, $x_t \in X$, $z_t \in Z^M$, $\beta \in \Delta(t)$ and $\omega \in \Omega_{t,T}^\omega$, denote by $x_{s;\,x_t, \tilde{a}^t, \beta[\tilde{a}^t], \omega}$ the trajectory of $x$ for $s \ge t$ starting from $x_t$ under $\tilde{a}^t$ and $\beta[\tilde{a}^t]$ corresponding to $\omega$. For short, let $\tilde{x}(s) = x_{s;\,x_t, \tilde{a}^t, \beta[\tilde{a}^t], \omega}$. Denote by $z_{s;\,z_t, \tilde{x}}$ the trajectory of $z$ corresponding to $\tilde{x}$ and use $\tilde{z}(s) = z_{s;\,z_t, \tilde{x}}$ as an abbreviated notation.

** The definition of the Value above is due to Varaiya [20], Roxin [21] and Elliott and Kalton [22, 23].
†† This represents a worst case from the pursuers' perspective, assuming that the pursuers' control input is known by the evaders at each time $t$. This may not be true in practice, and if so the pursuers will have a better performance.

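Definition 3 (sample-set-time-consistency (SSTC))
A suboptimal control structure $S$ is said to be sample-set-time-consistent, or SST-consistent for short, if given any $\omega \in \Omega_{t,T}^\omega$, for any $\tilde{a}^t(\cdot) \in A^S_{x_t,z_t}(t)$ at any $t$ with $t_0 \le t < T$ and the corresponding trajectories $\tilde{x}_s$ and $\tilde{z}_s$ under $\tilde{a}^t(\cdot)$ and any $\beta \in \Delta(t)$, there exists $\tilde{a}^s(\cdot) \in A^S_{\tilde{x}_s,\tilde{z}_s}(s)$ such that $\tilde{a}^t(\tau) = \tilde{a}^s(\tau)$ for $s \le \tau < T$, where $A^S_{\tilde{x}_s,\tilde{z}_s}(s)$ is associated with $\tilde{x}_s$ and $\tilde{z}_s$ at any $s$ for $t \le s < T$.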

Remark 1
SSTC says that at any state $\tilde{x}(s)$, $\tilde{z}(s)$ along the trajectories for $t \le s < T$ given $\omega \in \Omega_{t,T}^\omega$, the later portion of a structured control $\tilde{a}^t(\cdot) \in A^S_{x_t,z_t}(t)$, which is determined at time $t$, from time $s$ to $T$ ($\tilde{a}^t(\tau)$, $s \le \tau < T$), belongs to the structured control set $A^S_{\tilde{x}_s,\tilde{z}_s}(s)$ determined at time $s$ associated with $\tilde{x}_s$ and $\tilde{z}_s$. Clearly, if structure $S$ is independent of the state $x$, $z$ and time $t$, it is SST-consistent.

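Theorem 1
Under an SST-consistent control structure $S$, function $\tilde{V}(x, z)$ in (7) satisfies (8) for any $x \in X$, $z \in Z^M$ with any $\Delta t > 0$ under (2) with (3).

$$\tilde{V}(x, z) \ge \sup_{\beta \in \Delta(t)} \inf_{a(\cdot) \in A(t)} E_\omega\left\{\int_t^{t+\Delta t} G(x_\tau, z_\tau, a_\tau, \beta[a]_\tau)\,d\tau + \tilde{V}\left(x_{t+\Delta t;\,x, a, \beta[a], \omega},\ z_{t+\Delta t;\,z, x}\right)\right\} \tag{8}$$

Here, $t + \Delta t$ is understood as $\min\{t + \Delta t, T\}$.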

Proof
Refer to Appendix A.1. □

Theorem 1 states that the suboptimal upper Value $\tilde{V}$ can be improved by the optimization over a limited look-ahead interval. On the right-hand side of (8), a simpler game problem is solved with a possibly shorter horizon and $\tilde{V}$ as a cost-to-go function.‡‡ The optimization is subject to the players' dynamics (2) with (3).§§ It should be noted that, as in (6), here in the optimization over limited look-ahead intervals, the evaders have the informational advantage over the pursuers. Next, we define a transformation $H[W]$ for any function $W \in \mathcal{W} \triangleq \{W : X \times Z^M \to \mathbb{R}\}$ as

$$H[W](x, z) = \sup_{\beta \in \Delta(t)} \inf_{a(\cdot) \in A(t)} E_\omega\left\{\int_t^{t+\Delta t} G(x_\tau, z_\tau, a_\tau, \beta[a]_\tau)\,d\tau + W\left(x_{t+\Delta t;\,x, a, \beta[a], \omega},\ z_{t+\Delta t;\,z, x}\right)\right\} \tag{9}$$

‡‡ The term 'cost-to-go' states that it stands for the corresponding cost of a certain dynamic process from $t + \Delta t$ to $T$.
§§ Hereafter, without explicit description, optimization problems similar to (8) are subjected to the same constraints.

subject to (2) and (3). Then, a sequence of functions $W_k$ can be generated by $W_{k+1} = H[W_k]$ starting from some $W_0 \in \mathcal{W}$. The following theorem shows that the sequence converges if $W_0 = \tilde{V}$ in (7).

Theorem 2
(i) If $W_0(x, z)$ satisfies (8) for any $x \in X$ and $z \in Z^M$, then the sequence $\{W_k\}_{k=0}^\infty$ converges point-wise; (ii) the limit $W_\infty(x, z) \triangleq \lim_{k \to \infty} W_k(x, z)$ satisfies $W_\infty(x, z) = H[W_\infty](x, z)$.

Proof
Refer to Appendix A.2. □

Theorem 2 states that a suboptimal upper Value in (7) that results from 'structured' controls of SSTC can be improved iteratively by the optimization based on limited look-ahead. The limit of this process, $W_\infty$, is the best upper bound of $\bar{V}$ that can be approached by this scheme. A natural question is then whether $W_\infty$ is the true upper Value, which needs further investigation. A starting point for such an investigation can be found in [13, 14, 18].
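The monotone improvement described by Theorems 1 and 2 can be sketched on a toy problem. The example below is a discrete-time, one-dimensional stand-in and not the continuous-time game of the paper: the state is the integer distance between one pursuer and one evader, the cost is the expected number of steps to capture, the evader's move succeeds with a hypothetical probability `q`, and the 'structure' restricts the pursuer to a fixed closing speed. Because the evader observes the pursuer's move in the upper Value, the one-step look-ahead is written as a min over pursuer moves of a max over evader moves. All names and numbers are assumptions.

```python
def transition(d, a, b, q, d_max):
    """Next-distance distribution: the evader's move b succeeds with probability q."""
    moved = min(d_max, max(0, d - a + b))
    stayed = max(0, d - a)
    return [(q, moved), (1.0 - q, stayed)]


def expected(W, dist):
    """Expected cost-to-go under a next-state distribution."""
    return sum(p * W[d_next] for p, d_next in dist)


def structured_value(q, d_max, a_fixed=2, evader_moves=(0, 1)):
    """Expected capture time when the pursuer is restricted to closing speed a_fixed."""
    W = [0.0] * (d_max + 1)
    # With a_fixed > max(evader_moves) the distance strictly decreases,
    # so a single sweep in increasing d evaluates the structured control exactly.
    for d in range(1, d_max + 1):
        W[d] = 1.0 + max(expected(W, transition(d, a_fixed, b, q, d_max))
                         for b in evader_moves)
    return W


def look_ahead(W, q, d_max, pursuer_moves=(0, 1, 2, 3), evader_moves=(0, 1)):
    """One-step look-ahead over the full pursuer move set with W as cost-to-go."""
    new_W = [0.0] * (d_max + 1)   # distance 0 means captured: no further cost
    for d in range(1, d_max + 1):
        new_W[d] = 1.0 + min(
            max(expected(W, transition(d, a, b, q, d_max)) for b in evader_moves)
            for a in pursuer_moves)
    return new_W


def iterate(W0, q, d_max, tol=1e-12, max_iter=1000):
    """Generate W_{k+1} = H[W_k] until successive iterates agree within tol."""
    history = [W0]
    for _ in range(max_iter):
        history.append(look_ahead(history[-1], q, d_max))
        if max(abs(u - v) for u, v in zip(history[-1], history[-2])) < tol:
            break
    return history


if __name__ == "__main__":
    q, d_max = 0.5, 10
    W0 = structured_value(q, d_max)
    history = iterate(W0, q, d_max)
    print("structured:", W0[d_max], "after iteration:", history[-1][d_max])
```

In this toy setting the restricted control set does not depend on the state or time, so the starting function satisfies the discrete counterpart of (8), and each iterate is no larger than the previous one. The iteration can be stopped at any step and still return a valid improvement over the structured solution.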

3.3. The hierarchical (structured) suboptimal method

With the iterative method, a multi-player game reduces to finding a valid suboptimal upper Value function with the improving property in (8). In what follows, we will briefly introduce a hierarchical decomposition method [4, 13] that can determine a valid starting point for the iterative process. To illustrate the idea, we consider a class of games with the following objective functional¶¶ and the players' dynamics in (2) with (3).

$$J(a, b; x, z) = \int_t^T \left[\sum_{j=1}^M z_j(\tau)\right] d\tau \tag{10}$$

The objective in (10) stands for the sum of the capture time of each evader.

¶¶ In general, in a game with an objective $J$, if there exist distributed objectives $J_j$ ($j = 1, \ldots, M$) associated with each evader $j$ such that $J \le \sum J_j$, then the hierarchical method is applicable.

There are two levels in the hierarchical approach: the upper level is to determine a proper engagement scheme between the pursuers and the evaders, such that a multi-player PE game can be decomposed into distributed two-player PE games; at the lower level, the decoupled two-player games are solved [4]. The basic assumption of this approach is that the underlying two-player stochastic PE games are solvable, i.e. the upper Value $\bar{V}_{i_j j}(x_{i_j j})$ of the two-player game between any evader $j$ and its engaged pursuer $i_j$ may be solved and is available to the upper level, where $x_{i_j j} \triangleq [x_p^{i_j T}, x_e^{jT}]^T$. By (10), the objective functional for a decoupled two-player game is the expected capture time, i.e. $J = E\{\int d\tau\}$. Suppose that $M \le N$ and each evader (pursuer) can be engaged with no more than one pursuer (evader). Assume that there exists an engagement scheme $E$ such that $\bar{V}_{i_j j}(x_{i_j j}) < \infty$ for the game between any evader $j$ and its engaged pursuer $i_j$. Then, a combinatorial optimization problem to determine an optimal engagement between the pursuers and the evaders at the upper level can be formulated as

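$$\tilde{V}^h(x, z) = \min_{\{\sigma_{ij}\}} \left\{\sum_{i=1}^N \sum_{j=1}^M \sigma_{ij}\,\bar{V}_{ij}(x_{ij})\right\}$$

$$\text{subject to } \sigma_{ij} \in \{0, 1\}, \quad \sum_{i=1}^N \sigma_{ij} = 1, \quad \sum_{j=1}^M \sigma_{ij} \le 1 \tag{11}$$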

In (11), the superscript $h$ in $\tilde{V}^h$ indicates that it is determined by the hierarchical approach; the assignment variables $\sigma_{ij}$ are binary, where $\sigma_{ij} = 1$ indicates that pursuer $i$ is engaged with evader $j$ and $\sigma_{ij} = 0$ if it is not.
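For small teams, the upper-level problem (11) can be solved by direct enumeration. The sketch below assumes that the two-player upper Values have already been computed and are supplied as a table `V[i][j]`; the numbers in the example are hypothetical. Brute force is used only for transparency, and a dedicated assignment solver would be the natural replacement for larger $N$.

```python
from itertools import permutations


def best_engagement(V):
    """Solve the assignment in (11) by enumeration, assuming M <= N.

    V[i][j] is the two-player upper Value for pursuer i against evader j.
    Returns (total cost, assignment), where assignment[j] is the pursuer
    engaged with evader j and no pursuer is used twice.
    """
    n_pursuers, n_evaders = len(V), len(V[0])
    best_cost, best_assignment = float("inf"), None
    for assignment in permutations(range(n_pursuers), n_evaders):
        cost = sum(V[i][j] for j, i in enumerate(assignment))
        if cost < best_cost:
            best_cost, best_assignment = cost, assignment
    return best_cost, best_assignment


if __name__ == "__main__":
    # Hypothetical expected capture times: 3 pursuers (rows), 2 evaders (columns).
    V = [[4.0, 2.0],
         [3.0, 7.0],
         [5.0, 1.0]]
    print(best_engagement(V))
```

Each tuple produced by `permutations` engages every evader with exactly one pursuer and every pursuer with at most one evader, which matches the constraints in (11).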

The hierarchical method is a natural way of dealing with multiplicity of the players, in which there is an additional hierarchical structure that is imposed on pursuers' controls, and thus $\bar{V} \le \tilde{V}^h$. It is also worth noting that the (best) strategy of the evaders determined 'locally' at the lower level against their engaged pursuers is optimal with respect to (7) if each evader can only be captured by its engaged pursuer. More importantly, the hierarchical approach is SST-consistent because the 'structure' (the set of possible engagements between the pursuers and the 'alive' evaders) remains the same for any state $x$ and time $t$ given any $z \in Z^M$. Therefore, $\tilde{V}^h$ is a valid starting point for the iterative method.

Problem (11) is formulated for the case $M \le N$. If $M > N$, some pursuer $i$ must be engaged with more than one evader, and pursuer $i$ proceeds to them sequentially. In this case, the (assignment) problem becomes a multi-stage allocation problem where each stage is associated with one engaged evader for pursuer $i$. Denote by $j_1^i, j_2^i, \ldots, j_{n_i}^i$ the evaders engaged with pursuer $i$. Although the problem is similar, it entails intensive computation because the starting points of the subsequent games between pursuer $i$ and evaders $j_2^i, \ldots, j_{n_i}^i$ depend on the terminal states of the games between pursuer $i$ and their previous evaders $j_1^i, j_2^i, \ldots, j_{n_i - 1}^i$. These are random events and the expectation embedded in the calculation of $\bar{V}_{i j_k^i}$ is difficult to compute. In practice, approximations may be taken such that a suboptimal engagement is solved instead.‖‖

In summary, we show that given a proper suboptimal solution to a stochastic multi-player PE game, the optimization based on limited look-ahead can be used to improve the solution iteratively. It should be noted that this method is closely related to DP methods, and it suffers from the curse of dimensionality. Thus, scalability is an issue when the dimension of the states or the number of the players is large. Practical algorithms still need to be further investigated. To this end, the numerical methods for solving DP equations, which have been extensively studied [24–27], may benefit the implementation of the iterative method. On the other hand, despite the lack of efficient algorithms, the iterative method still has practical value in performance enhancement. Because of the monotonicity, the iterative process may stop at any step and provide the best suboptimal solution to date. In practice, this method can provide a satisfactory solution based on a carefully chosen cost-to-go function. We will later demonstrate the usefulness of this method through simulations.

3.4. Solution to a two-player pursuit–evasion game

Under the framework of the iterative method with the hierarchical approach, the solution to two-player games becomes a basis for multi-player games. In this section, we present the results on two-player games based on stochastic differential game (optimal control) theory.

‖‖ In this case, the improving property in (8) may not hold.

3.4.1. Preliminary: Stochastic differential game theory. For the reader's convenience, we first briefly review stochastic differential game theory based on [17]. Consider the following players' dynamic equation:

$$dx_t = f(t, x_t, a_t, b_t)\,dt + \sigma(x_t)\,d\omega \tag{12}$$

Here, $x \in \mathbb{R}^n$; $a_t \in A_a$ and $b_t \in B_a$ are the controls of player 1 (minimizer) and player 2 (maximizer), where $A_a$ and $B_a$ are compact sets; $\omega$ is a standard Wiener process with proper dimension. Let the objective functional be

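$$\hat{J}(a, b; t, x_t) = E_\omega\left\{\int_t^T G(\tau, x_\tau, a_\tau, b_\tau)\,d\tau + Q(x_T)\right\}$$

Here, $G : \mathbb{R} \times \mathbb{R}^n \times A_a \times B_a \to \mathbb{R}$ represents the cost rate; $Q : \mathbb{R}^n \to \mathbb{R}$ is the terminal cost; $T$ is the exit time, which is defined as $T = \min\{t \mid (t, x_t) \notin \mathcal{Q}\}$ with some open set $\mathcal{Q} \triangleq \mathcal{T} \times X$. Here, $X \subseteq \mathbb{R}^n$ is open and $\mathcal{T} = [0, T] \subseteq \mathbb{R}$. The non-anticipative strategies of player 1 and 2 are denoted by $\alpha \in \Gamma(t)$ and $\beta \in \Delta(t)$, respectively. Let $C^2(X)$ be the set of functions with continuous second-order derivatives on $X$. Denote by $C^{1,2}(\mathcal{Q})$ the set of functions $\phi(t, x)$ that have continuous first- and second-order partial derivatives, $\phi_t$, $\phi_x$ and $\phi_{xx}$. Henceforth, we use $C^2$ and $C^{1,2}$ for simplicity. Define $C(X) \triangleq \{\psi : X \to \mathbb{R} \mid \psi \text{ is measurable on } X \text{ and bounded}\}$. Denote by $P(s, x_s, t, x_t)$ the transition probability density of $x_t \in X$ at $t$ given the state $x_s$ at time $s$ with $s < t$. For any $\psi \in C(X)$, define the operator $S_{s,t}$ as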

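$$S_{s,t}[\psi](x_s) = E\{\psi(x_t) \mid x_s\} = \int_X \psi(x) P(s, x_s, t, x)\,dx$$

Define the operator $\mathcal{X}(t)$ on $C(X)$ as

$$\mathcal{X}(t)[\psi](x_t) = \lim_{h \to 0} h^{-1}\left(S_{t,t+h}[\psi](x_t) - \psi(x_t)\right)$$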

For the stochastic process $x_t$ defined in (12), which is a diffusion process [17], $\mathcal{X}(t)$ is a second-order partial differential operator on $C^2$, i.e.

$$\mathcal{X}(t)[\psi] = \psi_x \cdot f(t, x_t, a, b) + \tfrac{1}{2}\,\mathrm{tr}\left(\psi_{xx}\,\sigma(x_t)\sigma^T(x_t)\right) \tag{13}$$

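Here, $\mathrm{tr}(M)$ denotes the trace of a square matrix $M$. Consider any function $\phi(t, x) \in C^{1,2}$; by Itô's differential rule [17, 18],

$$d\phi(t, x) = \phi_t(t, x)\,dt + \phi_x(t, x)\,dx + \tfrac{1}{2}\,\mathrm{tr}\left(\phi_{xx}\,\sigma(x_t)\sigma^T(x_t)\right) dt = \phi_t(t, x)\,dt + \mathcal{X}(t)\phi(t, x)\,dt + \phi_x\,\sigma(x_t)\,d\omega \tag{14}$$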

Lemma 3
Assume that

(i) functions $f$ and $\sigma$ in (12) satisfy

$$\|f(t, x, a, b)\| \le C(1 + \|x\|) \quad \text{and} \quad \|\sigma(x)\| \le C(1 + \|x\|)$$

for some constant $C > 0$, any $x \in X$, $a \in A_a$ and $b \in B_a$;

(ii) for a function $V \in C^{1,2}$, there exist constants $D$ and $k$ such that

$$\|V(t, x)\| \le D(1 + \|x\|^k) \quad \text{for any } (t, x) \in \mathcal{Q}$$

(iii) $V$ is continuous on $\bar{\mathcal{Q}}$, the closure of $\mathcal{Q}$;

(iv) $V_t + \mathcal{X}(t)V + G(t, x) \ge 0$ for all $(t, x) \in \mathcal{Q}$, where $E\{\int_s^T |G(t, x_t)|\,dt\} < \infty$ for any $(s, x_s) \in \mathcal{Q}$ and some function $G : \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}$.

Then,

$$V(s, x_s) \le E\left\{\int_s^T G(t, x_t)\,dt + V(T, x_T)\right\}$$

Proof
Refer to Theorem 5.1 on page 124 in [17]. □

Denote by $\hat{\bar{V}}$, $\hat{\underline{V}}$ and $\hat{V}$ the upper, the lower and the Value of the game.

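Theorem 4
Suppose that functions $f$, $\sigma$ satisfy condition (i), and $\hat{\bar{V}}$ satisfies (ii) and (iii) in Lemma 3. Assume that there exists $a(\cdot) \in A(s)$ such that

$$E_\omega\left\{\int_s^T |G(t, x_t, a_t, \beta[a]_t)|\,dt\right\} < \infty \tag{15}$$

for any $(s, x_s) \in \mathcal{Q}$ and $\beta \in \Delta(s)$. Let $\hat{\bar{V}}$ be a solution of the following HJI equation:

$$\hat{\bar{V}}_t(t, x_t) + \min_{a_t \in A_a} \max_{b_t \in B_a} \{\mathcal{X}(t)[\hat{\bar{V}}](t, x_t) + G(t, x_t, a_t, b_t)\} = 0 \tag{16}$$

for any $(t, x_t) \in \mathcal{Q}$ with the boundary condition $\hat{\bar{V}}(T, x_T) = Q(x_T)$ where $(T, x_T) \in \partial\mathcal{Q}$ (the boundary of $\mathcal{Q}$), such that $\hat{\bar{V}} \in C^{1,2}(\mathcal{Q})$ and is continuous on $\bar{\mathcal{Q}}$ (the closure). Then,

(i) $\hat{\bar{V}}(s, x_s) \le \sup_{\beta \in \Delta(s)} \{\hat{J}(a, \beta[a]; s, x_s)\}$ for any $a(\cdot) \in A(s)$ and $(s, x_s) \in \mathcal{Q}$;

(ii) if there exists $a^*(\cdot) \in A(s)$ such that $a_t^* \in A_a$ for $s \le t < T$ and it satisfies

$$\max_{b_t \in B_a} \{\mathcal{X}(t)\hat{\bar{V}}(t, x_t) + G(t, x_t, a_t^*, b_t)\} = \min_{a_t \in A_a} \max_{b_t \in B_a} \{\mathcal{X}(t)\hat{\bar{V}}(t, x_t) + G(t, x_t, a_t, b_t)\} \tag{17}$$

for any $(t, x_t) \in \mathcal{Q}$, then $\hat{\bar{V}}(s, x_s) = \sup_{\beta \in \Delta(s)} \{\hat{J}(a^*, \beta[a^*]; s, x_s)\}$ for any $(s, x_s) \in \mathcal{Q}$.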

Proof
The theorem can be proved by extending Theorem 4.1 on page 159 in [17]. □

Remark 2
The conclusion in Theorem 4 can be extended to the lower Value $\hat{\underline{V}}$ if it satisfies conditions similar to (16) and (17) but with the order of 'minimization' and 'maximization' reversed, e.g.,

$$\hat{\underline{V}}_t(t, x_t) + \max_{b_t \in B_a} \min_{a_t \in A_a} \{\mathcal{X}(t)[\hat{\underline{V}}](t, x_t) + G(t, x_t, a_t, b_t)\} = 0 \tag{18}$$

Furthermore, if the HJI equations (16) and (18) coincide, then $\hat{\bar{V}} = \hat{\underline{V}} = \hat{V}$.

3.4.2. Solution to a two-player pursuit–evasion game. In this section, we introduce an analytic result for a specific two-player differential PE game. Consider the following dynamics of the players in $\mathbb{R}^2$:

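$$d\bar{x}_\iota = v_\iota \cos\theta_\iota\,dt + \sigma_\iota\,d\omega_{\iota\bar{x}} \quad \text{with } \bar{x}_\iota(0) = \bar{x}_{\iota 0} \tag{19a}$$

$$d\bar{y}_\iota = v_\iota \sin\theta_\iota\,dt + \sigma_\iota\,d\omega_{\iota\bar{y}} \quad \text{with } \bar{y}_\iota(0) = \bar{y}_{\iota 0} \tag{19b}$$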

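This is a noise-corrupted version of the simplified Dubins car model, where the subscript $\iota \in \{p, e\}$ stands for the pursuer or the evader; $\bar{x}_\iota$ and $\bar{y}_\iota$ are the state variables (displacement); $v_\iota$ is the velocity; $\theta_\iota$ is the control input; $\omega_{\iota\bar{x}}$ ($\omega_{\iota\bar{y}}$) is a standard Wiener process; $\sigma_\iota$ is constant. Assume that $\omega_{\iota\bar{x}}$ and $\omega_{\iota\bar{y}}$ are independent, and so are $\omega_{p\bar{x}}$ ($\omega_{p\bar{y}}$) and $\omega_{e\bar{x}}$ ($\omega_{e\bar{y}}$). The objective is the capture time, i.e. $J = E\{\int_t^T d\tau\}$. Define $x_\iota \triangleq [\bar{x}_\iota, \bar{y}_\iota]^T$, and we rewrite the dynamics in (19) as $dx_\iota = f_\iota(x_\iota, \theta_\iota)\,dt + \sigma_\iota\,d\omega_\iota$. The game ends when $\|x_p - x_e\| \le \varepsilon$ for some $\varepsilon > 0$.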

Theorem 5
Assume that $E\{T\} < \infty$ and $v_p > v_e$ with $\varepsilon > (\sigma_p^2 + \sigma_e^2)/(2(v_p - v_e))$. Then the Value function of the PE game, $V(x_p, x_e)$, is given by

$$V(x_p, x_e) = \frac{\sqrt{(\bar x_p - \bar x_e)^2 + (\bar y_p - \bar y_e)^2}}{v_p - v_e} + \frac{\sigma_p^2 + \sigma_e^2}{4(v_p - v_e)^2}\ln\left((\bar x_p - \bar x_e)^2 + (\bar y_p - \bar y_e)^2\right) + C(\varepsilon) \tag{20}$$

where the constant

$$C(\varepsilon) = -\frac{\varepsilon}{v_p - v_e} - \frac{(\sigma_p^2 + \sigma_e^2)\ln(\varepsilon^2)}{4(v_p - v_e)^2}$$
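As a quick illustration of (20), the following minimal sketch evaluates the Value function numerically. The function name `capture_value` and the particular split $v_p = 2$, $v_e = 1$, $\sigma_p^2 = \sigma_e^2 = 0.15$ are assumptions made only for this example; they are chosen so that $v_p - v_e = 1$ and $\sigma_p^2 + \sigma_e^2 = 0.3$.

```python
import math

def capture_value(xp, xe, vp, ve, sigma_p2, sigma_e2, eps):
    """Evaluate the Value function (20) for pursuer position xp and evader position xe.

    capture_value is a hypothetical helper name used only in this sketch.
    """
    v = vp - ve
    s2 = sigma_p2 + sigma_e2
    # Theorem 5 assumes vp > ve and eps > (sp^2 + se^2) / (2 (vp - ve)).
    if v <= 0 or eps <= s2 / (2.0 * v):
        raise ValueError("conditions of Theorem 5 are not met")
    rho2 = (xp[0] - xe[0]) ** 2 + (xp[1] - xe[1]) ** 2
    c_eps = -eps / v - s2 * math.log(eps ** 2) / (4.0 * v ** 2)
    return math.sqrt(rho2) / v + s2 * math.log(rho2) / (4.0 * v ** 2) + c_eps

# Illustrative (assumed) parameters: relative position (1, 1), eps = 0.5.
value = capture_value((1.0, 1.0), (0.0, 0.0), vp=2.0, ve=1.0,
                      sigma_p2=0.15, sigma_e2=0.15, eps=0.5)
print(round(value, 4))
```

By construction of $C(\varepsilon)$, the function returns zero when the players are exactly a distance $\varepsilon$ apart.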

Proof
The proof is an application of Theorem 4. We need to show that $V$ is both an upper and a lower Value function. Here, we only show that $V$ is an upper Value, i.e. it satisfies the corresponding HJI equation as in (16), namely $\min_{\theta_p}\max_{\theta_e}\{X(t)[V](x_p, x_e) + 1\} = 0$. First of all, it is easy to check that $f$, $\sigma$ in (19) and $V$ satisfy conditions (i)–(iii) in Lemma 3. The assumption that $E\{T\} < \infty$ implies that condition (15) holds. Note that here $G = 1$ and $V = 0$ when $\sqrt{(\bar x_p - \bar x_e)^2 + (\bar y_p - \bar y_e)^2} = \varepsilon$. By (13) and the independence of $\omega_{B\bar x}$ and $\omega_{B\bar y}$,

$$X(t)V = \underbrace{\frac{\partial V}{\partial x_p}\cdot f_p + \frac{\partial V}{\partial x_e}\cdot f_e}_{D_1} + \underbrace{\frac{1}{2}\sum_{i=1}^{2}\frac{\partial^2 V}{\partial x_{p_i}^2}\sigma_p^2 + \frac{1}{2}\sum_{j=1}^{2}\frac{\partial^2 V}{\partial x_{e_j}^2}\sigma_e^2}_{D_2} \tag{21}$$




Here, $i = 1, 2$ ($j = 1, 2$) stands for $\bar x$ and $\bar y$, respectively. Substitute (19) into (21) and write $\rho = \sqrt{(\bar x_p - \bar x_e)^2 + (\bar y_p - \bar y_e)^2}$ for the distance between the players; the terms $D_1$ and $D_2$ in (21) become

$$\begin{aligned}
D_1 ={}& \frac{1}{v_p - v_e}\left(\frac{\bar x_p - \bar x_e}{\rho}v_p\cos\theta_p + \frac{\bar y_p - \bar y_e}{\rho}v_p\sin\theta_p + \frac{\bar x_e - \bar x_p}{\rho}v_e\cos\theta_e + \frac{\bar y_e - \bar y_p}{\rho}v_e\sin\theta_e\right)\\
&+ \frac{\sigma_p^2 + \sigma_e^2}{4(v_p - v_e)^2}\left(\frac{2(\bar x_p - \bar x_e)}{\rho^2}v_p\cos\theta_p + \frac{2(\bar y_p - \bar y_e)}{\rho^2}v_p\sin\theta_p + \frac{2(\bar x_e - \bar x_p)}{\rho^2}v_e\cos\theta_e + \frac{2(\bar y_e - \bar y_p)}{\rho^2}v_e\sin\theta_e\right)
\end{aligned} \tag{22}$$

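$$\begin{aligned}
D_2 ={}& \tfrac{1}{2}\sigma_p^2\left(V_{\bar x_p\bar x_p} + V_{\bar y_p\bar y_p}\right) + \tfrac{1}{2}\sigma_e^2\left(V_{\bar x_e\bar x_e} + V_{\bar y_e\bar y_e}\right)\\
={}& \frac{1}{2(v_p - v_e)}\left(\frac{(\bar x_p - \bar x_e)^2}{\rho^3}\sigma_p^2 + \frac{(\bar y_p - \bar y_e)^2}{\rho^3}\sigma_p^2 + \frac{(\bar x_p - \bar x_e)^2}{\rho^3}\sigma_e^2 + \frac{(\bar y_p - \bar y_e)^2}{\rho^3}\sigma_e^2\right)\\
&+ \frac{\sigma_p^2 + \sigma_e^2}{4(v_p - v_e)^2}\left(\frac{(\bar y_p - \bar y_e)^2 - (\bar x_p - \bar x_e)^2}{\rho^4}\sigma_p^2 + \frac{(\bar x_p - \bar x_e)^2 - (\bar y_p - \bar y_e)^2}{\rho^4}\sigma_p^2\right.\\
&\qquad\left. + \frac{(\bar y_p - \bar y_e)^2 - (\bar x_p - \bar x_e)^2}{\rho^4}\sigma_e^2 + \frac{(\bar x_p - \bar x_e)^2 - (\bar y_p - \bar y_e)^2}{\rho^4}\sigma_e^2\right)
\end{aligned} \tag{23}$$

where $\rho = \sqrt{(\bar x_p - \bar x_e)^2 + (\bar y_p - \bar y_e)^2}$, so that $\rho^3 = [(\bar x_p - \bar x_e)^2 + (\bar y_p - \bar y_e)^2]^{3/2}$ and $\rho^4 = [(\bar x_p - \bar x_e)^2 + (\bar y_p - \bar y_e)^2]^2$.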




By inspection of (21)–(23), only the term $D_1$ in (22) involves the controls $\theta_p$ and $\theta_e$. Clearly, with $\rho = \sqrt{(\bar x_p - \bar x_e)^2 + (\bar y_p - \bar y_e)^2}$,

$$\begin{aligned}
\min_{\theta_p}\max_{\theta_e}\{D_1\} ={}& \frac{1}{v_p - v_e}(-v_p + v_e) + \frac{\sigma_p^2 + \sigma_e^2}{4(v_p - v_e)^2}\left(-\frac{2(\bar x_p - \bar x_e)^2}{\rho^3}v_p - \frac{2(\bar y_p - \bar y_e)^2}{\rho^3}v_p + \frac{2(\bar x_p - \bar x_e)^2}{\rho^3}v_e + \frac{2(\bar y_p - \bar y_e)^2}{\rho^3}v_e\right)\\
={}& -1 - \frac{1}{2}\,\frac{\sigma_p^2 + \sigma_e^2}{v_p - v_e}\,\frac{1}{\sqrt{(\bar x_p - \bar x_e)^2 + (\bar y_p - \bar y_e)^2}}
\end{aligned} \tag{24}$$

On the other hand, $D_2$ in (23) can be simplified as

$$D_2 = \frac{1}{2}\,\frac{\sigma_p^2 + \sigma_e^2}{v_p - v_e}\,\frac{1}{\sqrt{(\bar x_p - \bar x_e)^2 + (\bar y_p - \bar y_e)^2}} \tag{25}$$

By (24) and (25), $\min_{\theta_p}\max_{\theta_e}\{X(t)V(x_p, x_e) + 1\} = 0$. Thus, $V$ is a solution of the HJI equation. Furthermore, it can be shown similarly that $V$ is also a lower Value. Hence, $V$ is a Value function. □
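The cancellation between (24) and (25) can also be checked numerically. The sketch below is only an illustration under stated assumptions: it approximates the derivatives in (21) by central finite differences, uses the fact that minimizing (maximizing) a gradient term over a heading angle gives minus (plus) the speed times the gradient norm, and evaluates the residual of the HJI equation at one arbitrarily chosen state. The names `value` and `hji_residual` and all parameter values are hypothetical.

```python
import math

def value(z, vp, ve, sp2, se2, eps):
    """Value function (20) at the state z = (xp, yp, xe, ye)."""
    xp, yp, xe, ye = z
    v, s2 = vp - ve, sp2 + se2
    rho2 = (xp - xe) ** 2 + (yp - ye) ** 2
    c = -eps / v - s2 * math.log(eps ** 2) / (4.0 * v ** 2)
    return math.sqrt(rho2) / v + s2 * math.log(rho2) / (4.0 * v ** 2) + c

def hji_residual(z, vp, ve, sp2, se2, eps, h=1e-3):
    """Finite-difference estimate of min max {D1} + D2 + 1, which should vanish."""
    def shifted(i, d):
        w = list(z)
        w[i] += d
        return value(w, vp, ve, sp2, se2, eps)

    centre = value(z, vp, ve, sp2, se2, eps)
    grad = [(shifted(i, h) - shifted(i, -h)) / (2.0 * h) for i in range(4)]
    hess = [(shifted(i, h) - 2.0 * centre + shifted(i, -h)) / h ** 2 for i in range(4)]
    # Pursuer minimizes its gradient term, evader maximizes its own.
    d1 = -vp * math.hypot(grad[0], grad[1]) + ve * math.hypot(grad[2], grad[3])
    d2 = 0.5 * sp2 * (hess[0] + hess[1]) + 0.5 * se2 * (hess[2] + hess[3])
    return d1 + d2 + 1.0

# Assumed illustrative values, not taken from the paper.
residual = hji_residual((1.0, 1.0, 0.0, 0.0), vp=2.0, ve=1.0, sp2=0.2, se2=0.1, eps=0.5)
print(abs(residual) < 1e-4)
```

The residual is not exactly zero because of the finite-difference error, which is why the check uses a tolerance.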

3.4.3. On finite expectation of the capture time. In this section, we examine the condition $E\{T\} < \infty$ in Theorem 5. Let us first consider a simplified game in a one-dimensional space as shown in Figure 1. The dynamics of the players are described by

$$dx_B = v_B u_B\,dt + \sigma_B\,d\omega_B$$

Here, $B \in \{p, e\}$; $v_B$ is the velocity; $u_B \in \{1, -1\}$ is the control variable; $\omega_B$ is a one-dimensional standard Wiener process. To force the capture, the pursuer must move towards the evader, while the evader escapes in the same direction, e.g. $u_p = u_e = 1$ as in Figure 1. Let $x = x_e - x_p$ and the dynamic equation becomes

$$dx = v\,dt + \sigma\,d\omega \tag{26}$$

where $v = v_e - v_p$ and $\sigma = \sqrt{\sigma_p^2 + \sigma_e^2}$.

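Lemma 6
For any $\lambda > 1$ and $k \in \mathbb{Z}_{\ge 0}$, $\sum_{n=1}^{\infty} n^k\lambda^{-n} < \infty$.

Lemma 7
If $v_p > v_e$ and $\varepsilon \ge 0$, then $E\{T\} < \infty$, where $T \triangleq \inf\{t \mid |x(t)| \le \varepsilon\}$.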

Figure 1. The simplified stochastic PE game in a one-dimensional space.

Proof
We first find an upper bound of $P(T > t)$, the probability that the evader has not been captured by time $t$. Without loss of generality, assume that the game starts at time 0 and $x(0) > \varepsilon$. By the property of the Wiener process, the state $x_t$ at time $t > 0$ is a Gaussian random variable, i.e. $x_t \sim N(\mu_{x_t}, \sigma_{x_t}^2)$. Here, the mean is $\mu_{x_t} = x(0) + vt$ and the variance is $\sigma_{x_t}^2 = (\sigma_p^2 + \sigma_e^2)t$. Let $\bar v = -v = v_p - v_e > 0$. The fact that $T > t$ implies that $x_t > \varepsilon$ at least at time $t$. It satisfies

$$P(T > t) \le P(x_t > \varepsilon) = \int_{x > \varepsilon}\frac{1}{\sigma\sqrt{2\pi t}}\exp\left(-\frac{(x - \mu_{x_t})^2}{2\sigma_{x_t}^2}\right)dx = \int_{x > \varepsilon}\frac{1}{\sigma\sqrt{2\pi t}}\exp\left(-\frac{(x - x(0) + \bar v t)^2}{2\sigma^2 t}\right)dx \tag{27}$$

Define $t_0 = (x(0) - \varepsilon)/\bar v$, $\tilde t = t - t_0$ and $r = x + \bar v\tilde t$, such that

$$P(x_t > \varepsilon) = \int_{x > 0}\frac{1}{\sigma\sqrt{2\pi(t_0 + \tilde t)}}\exp\left(-\frac{(x + \bar v\tilde t)^2}{2\sigma^2(t_0 + \tilde t)}\right)dx = \int_{r > \bar v\tilde t}\frac{1}{\sigma\sqrt{2\pi(t_0 + \tilde t)}}\exp\left(-\frac{r^2}{2\sigma^2(t_0 + \tilde t)}\right)dr$$

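Choose some $t_1$ such that $\bar v(t_1 - t_0) > 1$, and define $t_2 = \max\{2t_0, t_1\}$. Consider the time $t$ when $t > t_2$, i.e. $\tilde t > t_0$ and $\bar v\tilde t > 1$, and then

$$\begin{aligned}
P(x_t > \varepsilon) &< \frac{1}{\sigma\sqrt{4\pi t_0}}\int_{r > \bar v\tilde t}\exp\left(-\frac{r^2}{4\sigma^2\tilde t}\right)dr < \frac{1}{\sigma\sqrt{4\pi t_0}}\int_{r > \bar v\tilde t}\exp\left(-\frac{\bar v\,\tilde t\,r}{4\sigma^2\tilde t}\right)dr\\
&= \frac{2\sigma}{\bar v\sqrt{\pi t_0}}\exp\left(-\frac{\bar v^2}{4\sigma^2}\tilde t\right) = \frac{2\sigma}{\bar v\sqrt{\pi t_0}}\exp\left(-\frac{\bar v^2}{4\sigma^2}(t - t_0)\right)
\end{aligned} \tag{28}$$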
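The bound (28) can be compared with the exact Gaussian tail in (27), which is available through the complementary error function. The following sketch is a numerical illustration only; the function names and the parameter values ($x(0) = 2$, $\varepsilon = 0.5$, $\bar v = 1$, $\sigma^2 = 0.3$) are assumptions chosen for the example, and $t_1$ is taken with $\bar v(t_1 - t_0) = 1$ so that every $t > t_2$ satisfies the conditions used in the proof.

```python
import math

def tail_exact(t, x0, eps, vbar, sigma):
    """P(x_t > eps) for x_t ~ N(x0 - vbar * t, sigma^2 * t), cf. (27)."""
    return 0.5 * math.erfc((eps - x0 + vbar * t) / (sigma * math.sqrt(2.0 * t)))

def tail_bound(t, x0, eps, vbar, sigma):
    """Right-hand side of (28); derived in the proof only for t > t2."""
    t0 = (x0 - eps) / vbar
    prefactor = 2.0 * sigma / (vbar * math.sqrt(math.pi * t0))
    return prefactor * math.exp(-vbar ** 2 * (t - t0) / (4.0 * sigma ** 2))

def t2_threshold(x0, eps, vbar):
    """t2 = max{2 t0, t1} with t1 chosen so that vbar * (t1 - t0) = 1."""
    t0 = (x0 - eps) / vbar
    return max(2.0 * t0, t0 + 1.0 / vbar)

x0, eps, vbar, sigma = 2.0, 0.5, 1.0, math.sqrt(0.3)  # assumed example values
t2 = t2_threshold(x0, eps, vbar)
for t in (t2 + 0.01, t2 + 1.0, t2 + 5.0):
    print(t, tail_exact(t, x0, eps, vbar, sigma) <= tail_bound(t, x0, eps, vbar, sigma))
```

Both quantities decay exponentially in $t$, which is the property exploited in the remainder of the proof.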

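Denote by $p_T(t)$ the probability density of the capture time $T$. Then,

$$E\{T\} = \int_0^{\infty} t\,p_T(t)\,dt = \int_0^{t_2} t\,p_T(t)\,dt + \int_{t_2}^{\infty} t\,p_T(t)\,dt \tag{29}$$

Next, we show that the second term on the right-hand side of (29) is finite, which implies that $E\{T\}$ is finite. Choose a small $\delta t > 0$, and then

$$\int_{t_2}^{\infty} t\,p_T(t)\,dt = \sum_{k=0}^{\infty}\left(\int_{t_2 + k\delta t}^{t_2 + (k+1)\delta t} t\,p_T(t)\,dt\right) \tag{30}$$

which is illustrated in Figure 2. In (30), each term in the summation satisfies

$$\int_{t_2 + k\delta t}^{t_2 + (k+1)\delta t} t\,p_T(t)\,dt < (t_2 + (k+1)\delta t)\int_{t_2 + k\delta t}^{t_2 + (k+1)\delta t} p_T(t)\,dt < (t_2 + (k+1)\delta t)\int_{t_2 + k\delta t}^{\infty} p_T(t)\,dt \tag{31}$$

Note that $P(T > t_2 + k\delta t) = \int_{t_2 + k\delta t}^{\infty} p_T(t)\,dt$, and by (27) and (28),

$$\int_{t_2 + k\delta t}^{\infty} p_T(t)\,dt < \frac{2\sigma}{\bar v\sqrt{\pi t_0}}\exp\left(-\frac{\bar v^2}{4\sigma^2}(t_2 + k\delta t - t_0)\right) \tag{32}$$

Substitute (32) into (30), and by Lemma 6,

$$\int_{t_2}^{\infty} t\,p_T(t)\,dt < \sum_{k=0}^{\infty}\left\{(t_2 + (k+1)\delta t)\frac{2\sigma}{\bar v\sqrt{\pi t_0}}\exp\left(-\frac{\bar v^2}{4\sigma^2}(t_2 + k\delta t - t_0)\right)\right\} < \infty \tag{33}$$

By (29) and (33), $E\{T\} < \infty$. □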
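Lemma 7 can be illustrated with a small Monte Carlo experiment on (26). The sketch below is a minimal simulation under stated assumptions: it uses an Euler–Maruyama discretization with step `dt`, a fixed random seed, and example parameters that are not taken from the paper. As a reference it uses the standard first-passage result for Brownian motion with drift, $E\{T\} = (x(0) - \varepsilon)/\bar v$, which applies here because a path started above $\varepsilon$ is captured when it first reaches $\varepsilon$.

```python
import math
import random

def simulate_capture_time(x0, eps, vbar, sigma, dt, rng, t_max=1e3):
    """One Euler-Maruyama path of dx = -vbar dt + sigma dW, stopped when |x| <= eps.

    The discrete check needs eps to be much larger than one noise step.
    """
    x, t = x0, 0.0
    step_std = sigma * math.sqrt(dt)
    while abs(x) > eps and t < t_max:
        x += -vbar * dt + rng.gauss(0.0, step_std)
        t += dt
    return t

def mean_capture_time(n_paths, x0, eps, vbar, sigma, dt=0.01, seed=0):
    """Sample mean of the capture time over n_paths independent paths."""
    rng = random.Random(seed)
    total = sum(simulate_capture_time(x0, eps, vbar, sigma, dt, rng)
                for _ in range(n_paths))
    return total / n_paths

# Assumed example values: x(0) = 2, eps = 0.5, vbar = vp - ve = 1, sigma^2 = 0.3.
estimate = mean_capture_time(4000, x0=2.0, eps=0.5, vbar=1.0, sigma=math.sqrt(0.3))
reference = (2.0 - 0.5) / 1.0
print(estimate, reference)
```

The sample mean carries both sampling error and a small positive bias from checking capture only at the grid times, so the two numbers agree only approximately.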

Remark 3
Although Lemma 7 can be proved in a simpler way, the proof presented here is useful in the following theorem for the game in $\mathbb{R}^2$.

Now, we examine the game in $\mathbb{R}^2$ associated with Theorem 5. First of all, change the variables as $\tilde x = \bar x_p - \bar x_e$ and $\tilde y = \bar y_p - \bar y_e$. According to (22) and (24), the optimal control of the pursuer coincides with that of the evader, namely, $\theta_p^* = \theta_e^*$. Suppose that both the pursuer and the evader use the same control and denote it by $\theta$. Then the dynamic equation of the players becomes

$$d\tilde x = (v_p - v_e)\cos\theta(t)\,dt + \sigma_p\,d\omega_{p\bar x} - \sigma_e\,d\omega_{e\bar x} \quad\text{with}\quad \tilde x(0) = \tilde x_0 \tag{34a}$$

$$d\tilde y = (v_p - v_e)\sin\theta(t)\,dt + \sigma_p\,d\omega_{p\bar y} - \sigma_e\,d\omega_{e\bar y} \quad\text{with}\quad \tilde y(0) = \tilde y_0 \tag{34b}$$

In (34a), $\omega_{p\bar x}$ and $\omega_{e\bar x}$ are independent, such that the term $\sigma_p\,d\omega_{p\bar x} - \sigma_e\,d\omega_{e\bar x}$ is equivalent to $\sigma\,d\omega_{\tilde x}$, where $\sigma = \sqrt{\sigma_p^2 + \sigma_e^2}$ and $\omega_{\tilde x}$ is a standard Wiener process; in (34b), $\sigma_p\,d\omega_{p\bar y} - \sigma_e\,d\omega_{e\bar y}$ can be treated similarly as a Wiener process $\sigma\,d\omega_{\tilde y}$. Define $v = v_p - v_e$, $x = [\tilde x, \tilde y]^T$ and $\rho(x) = \sqrt{\tilde x^2 + \tilde y^2}$.
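The relative dynamics (34) can be simulated directly, which gives an independent estimate of the expected capture time to compare with the Value function (20). The sketch below is an illustration under stated assumptions: it uses an Euler–Maruyama scheme, a feedback heading that always points from the pursuer to the evader (so that the drift of the relative position is directed at the origin), a fixed seed, and the example values $\varepsilon = 0.5$, $v = 1$, $\sigma^2 = 0.3$, $(\tilde x_0, \tilde y_0) = (1, 1)$. The function names are hypothetical.

```python
import math
import random

def value_relative(rho, v, sigma2, eps):
    """Value function (20) in terms of rho, v = vp - ve and sigma2 = sp^2 + se^2."""
    return (rho - eps) / v + sigma2 * (math.log(rho ** 2) - math.log(eps ** 2)) / (4.0 * v ** 2)

def capture_time_2d(x0, y0, v, sigma2, eps, dt, rng, t_max=1e3):
    """One Euler-Maruyama path of (34) under the pursuit heading."""
    x, y, t = x0, y0, 0.0
    step_std = math.sqrt(sigma2 * dt)
    rho = math.hypot(x, y)
    while rho > eps and t < t_max:
        # cos(theta) = -x / rho and sin(theta) = -y / rho.
        x += -v * x / rho * dt + rng.gauss(0.0, step_std)
        y += -v * y / rho * dt + rng.gauss(0.0, step_std)
        t += dt
        rho = math.hypot(x, y)
    return t

def mean_capture_time_2d(n_paths, x0, y0, v, sigma2, eps, dt=0.005, seed=1):
    """Sample mean of the capture time over n_paths independent paths."""
    rng = random.Random(seed)
    total = sum(capture_time_2d(x0, y0, v, sigma2, eps, dt, rng) for _ in range(n_paths))
    return total / n_paths

estimate = mean_capture_time_2d(3000, 1.0, 1.0, v=1.0, sigma2=0.3, eps=0.5)
predicted = value_relative(math.hypot(1.0, 1.0), v=1.0, sigma2=0.3, eps=0.5)
print(estimate, predicted)
```

The agreement is only approximate because of sampling error and time discretization.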

Complete characterization of the distribution of the state $x$ with time $t$ requires a solution of a partial differential equation called the Fokker–Planck equation (FPE) [28]:

$$\frac{\partial p(t, x)}{\partial t} = -\sum_{i=1}^{2}\frac{\partial}{\partial x_i}[p(t, x)f_i(x, \theta)] + \frac{1}{2}\sum_{i=1}^{2}\frac{\partial^2 p(t, x)}{\partial x_i^2}\sigma^2 \quad\text{with}\quad p(0, x) = \delta(x - x_0) \tag{35}$$

where $\delta(\cdot)$ is the Dirac delta function; $f_i$ is the $i$th element of the function $f$ in (34), i.e. $i = 1, 2$ stand for $\tilde x$ and $\tilde y$, respectively. Since an analytical solution of Equation (35) is formidable, in the following, we construct an upper bound of $P(T > t)$ and verify it using a numerical solution to (35). Note that $P(T > t) \le P(\rho(x_t) > \varepsilon)$, and the following discussion is focused on $P(\rho(x_t) > \varepsilon)$.

Figure 2. The probability density of the capture time $p_T(t)$.

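To construct an upper bound of $P(\rho(x_t) > \varepsilon)$, we first consider a simplified situation where the control $\theta$ in (35) is fixed, i.e. $\theta_t = \theta_0$ for $t \ge 0$. Based on the property of the Wiener process, the probability distribution under $\theta_0$, $p_x^{\theta_0}(t, x)$, is

$$p_x^{\theta_0}(t, x) = \frac{1}{2\pi\sigma^2 t}\exp\left(-\frac{(\tilde x - \mu_{\tilde x}(t))^2 + (\tilde y - \mu_{\tilde y}(t))^2}{2\sigma^2 t}\right) \tag{36}$$

where $\mu_{\tilde x}(t) = \tilde x_0 + vt\cos\theta_0$ and $\mu_{\tilde y}(t) = \tilde y_0 + vt\sin\theta_0$. Here, the superscript in $p_x^{\theta_0}$ indicates the fixed control $\theta_0$ and the subscript $x$ implies the $x$ co-ordinates. Now, we change the co-ordinates by the transformation $G_t$ at time $t$:

$$\hat x \triangleq \begin{bmatrix} x' \\ y' \end{bmatrix} = G_t\begin{bmatrix} \tilde x \\ \tilde y \end{bmatrix} \quad\text{with}\quad G_t = \begin{bmatrix} \cos\beta_t & \sin\beta_t \\ -\sin\beta_t & \cos\beta_t \end{bmatrix} \tag{37}$$

where $\beta_t$ is the angle between the 'line of sight'*** from the evader to the pursuer and the $\tilde x$-axis, which is illustrated in Figure 3.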

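In the new $x'$–$y'$ co-ordinates, the states $x'$ and $y'$ are jointly Gaussian with the mean

$$\begin{bmatrix} \mu_{x'} \\ \mu_{y'} \end{bmatrix} = \begin{bmatrix} \cos\beta_t & \sin\beta_t \\ -\sin\beta_t & \cos\beta_t \end{bmatrix}\begin{bmatrix} \mu_{\tilde x} \\ \mu_{\tilde y} \end{bmatrix} = \begin{bmatrix} \sqrt{\mu_{\tilde x}^2 + \mu_{\tilde y}^2} \\ 0 \end{bmatrix} \tag{38}$$

and the covariance $\mathrm{Cov}_t$,

$$\mathrm{Cov}_t = G_t\begin{bmatrix} \sigma^2 t & 0 \\ 0 & \sigma^2 t \end{bmatrix}G_t^T = \begin{bmatrix} \sigma^2 t & 0 \\ 0 & \sigma^2 t \end{bmatrix}$$

Figure 3. Change of the $\tilde x$–$\tilde y$ co-ordinates to the $x'$–$y'$ co-ordinates.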

***Here, the ‘line of sight’ is drawn according to the expected positions of both the pursuer and the evader.




Thus, the random variables in the new $x'$–$y'$ co-ordinates are independent, and the distribution $p_{\hat x}^{\theta_0}$ is

$$p_{\hat x}^{\theta_0}(t, \hat x) = \frac{1}{2\pi\sigma^2 t}\exp\left(-\frac{\left(x' - \sqrt{\mu_{\tilde x}^2(t) + \mu_{\tilde y}^2(t)}\right)^2 + y'^2}{2\sigma^2 t}\right) \tag{39}$$

Next, we change the $x'$–$y'$ co-ordinates to a $\rho$–$\varphi$ (polar) co-ordinate system as

$$\rho(\hat x) = \sqrt{x'^2 + y'^2}, \qquad \varphi(\hat x) = \angle(\vec{\hat x}, \vec{x}') \quad\text{with}\quad \varphi(\hat x) \in [0, 2\pi)$$

Here, $\angle(\vec{\hat x}, \vec{x}')$ is the angle between the vector $\vec{\hat x}$ and the $x'$-axis and its range is $[0, 2\pi)$. Denote this transformation by $\bar G_t$. At each $\hat x$, the absolute value of the Jacobian $J_{\bar G_t}(\hat x)$ of the transformation $\bar G_t$ is $|J_{\bar G_t}(\hat x)| = 1/\rho(\hat x)$. The probability density $p_{\rho,\varphi}^{\theta_0}(t, \rho, \varphi)$ in the $\rho$–$\varphi$ co-ordinates is

$$p_{\rho,\varphi}^{\theta_0}(t, \rho, \varphi) = |J_{\bar G_t}(\hat x)|^{-1}p_{\hat x}^{\theta_0}(t, \hat x) = \frac{\rho}{2\pi\sigma^2 t}\exp\left(-\frac{\left(\rho\cos\varphi - \sqrt{\mu_{\tilde x}^2(t) + \mu_{\tilde y}^2(t)}\right)^2 + (\rho\sin\varphi)^2}{2\sigma^2 t}\right)$$

Clearly, $p_{\rho,\varphi}(t, \rho, \varphi) < p_{\rho,\varphi}(t, \rho, 0)$ for any $\varphi \ne 0$. Let $\bar\rho_{\mu_{\tilde x},\mu_{\tilde y}} = \sqrt{\mu_{\tilde x}^2 + \mu_{\tilde y}^2}$. The probability that $\rho > r$ ($r \ge 0$) under the control $\theta_0$ at time $t$, $P^{\theta_0}(\rho(\hat x_t) > r)$, satisfies

$$P^{\theta_0}(\rho(\hat x_t) > r) < \int_0^{2\pi}d\varphi\int_r^{\infty}p_{\rho,\varphi}(t, \rho, \varphi = 0)\,d\rho = \int_0^{2\pi}d\varphi\int_r^{\infty}\frac{\rho}{2\pi\sigma^2 t}\exp\left(-\frac{(\rho - \bar\rho_{\mu_{\tilde x},\mu_{\tilde y}})^2}{2\sigma^2 t}\right)d\rho \tag{40}$$

Next, we consider the case when both the pursuer and the evader exploit their optimal control $\theta^*$ determined in (24). Intuitively, the optimal control $\theta^*$ (in state feedback) drives each state $(\tilde x, \tilde y)$ towards the origin, such that it outperforms the (static) $\theta_0$ in terms of capture. Similar to (40), we assume that Equation (41) is true. Denote by $\mu^*_{\tilde x}(t)$ and $\mu^*_{\tilde y}(t)$ the means of $\tilde x$ and $\tilde y$ at time $t \ge 0$ under $\theta^*$, and let $\bar\rho_{\mu^*_{\tilde x},\mu^*_{\tilde y}}(t) = \sqrt{\mu^*_{\tilde x}(t)^2 + \mu^*_{\tilde y}(t)^2}$.

$$P^{\theta^*}(\rho(x_t) > r) < \int_r^{\infty}\frac{C\rho}{\sigma^2 t}\exp\left(-\frac{(\rho - \bar\rho_{\mu^*_{\tilde x},\mu^*_{\tilde y}}(t))^2}{2\sigma^2 t}\right)d\rho \quad\text{for some constant } C > 0 \tag{41}$$

Here, $P^{\theta^*}(\rho(x_t) > r)$ is the probability of $\rho > r$ under $\theta^*$ at time $t$. Due to the convexity of $\rho = \sqrt{\tilde x^2 + \tilde y^2}$ and by Jensen's inequality,

$$\bar\rho_{\mu^*_{\tilde x},\mu^*_{\tilde y}} = \sqrt{\mu^{*2}_{\tilde x} + \mu^{*2}_{\tilde y}} \le E\left\{\sqrt{\tilde x^2_{\theta^*} + \tilde y^2_{\theta^*}}\right\} \triangleq \bar\rho_{\theta^*}$$

Thus, inequality (41) still holds when $\bar\rho_{\mu^*_{\tilde x},\mu^*_{\tilde y}}$ is replaced by $\bar\rho_{\theta^*}$, i.e.

$$P^{\theta^*}(\rho(x_t) > r) < \int_r^{\infty}\frac{C\rho}{\sigma^2 t}\exp\left(-\frac{(\rho - \bar\rho_{\theta^*}(t))^2}{2\sigma^2 t}\right)d\rho \tag{42}$$




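Let $\rho_{\theta^*} = \sqrt{\tilde x^2_{\theta^*} + \tilde y^2_{\theta^*}}$ along the trajectory under the optimal control $\theta^*$. The evolution of $\rho_{\theta^*}(t)$ can be derived based on Itô's rule. According to $X(t)$ in (13), the evolution of $\rho_{\theta^*}(t)$ under the control $\theta^*$ satisfies

$$\dot\rho_{\theta^*}(t) = \frac{\partial\rho_{\theta^*}(t)}{\partial\tilde x}v\cos\theta^* + \frac{\partial\rho_{\theta^*}(t)}{\partial\tilde y}v\sin\theta^* + \frac{1}{2}\frac{\partial^2\rho_{\theta^*}(t)}{\partial\tilde x^2}\sigma^2 + \frac{1}{2}\frac{\partial^2\rho_{\theta^*}(t)}{\partial\tilde y^2}\sigma^2 = -v + \frac{1}{2\rho_{\theta^*}(t)}\sigma^2 \tag{43}$$

Here, $(\partial\rho_{\theta^*}(t)/\partial\tilde x)\cos\theta^* + (\partial\rho_{\theta^*}(t)/\partial\tilde y)\sin\theta^* = -1$. Next, our study is focused on the set $S = \{(\tilde x, \tilde y) \mid \sqrt{\tilde x^2 + \tilde y^2} > \varepsilon\}$. Since $\varepsilon > \sigma^2/2v$ (cf. Theorem 5) and $\rho_{\theta^*} > \varepsilon$, according to (43)

$$\dot\rho_{\theta^*}(t) = -v + \frac{1}{2\rho_{\theta^*}(t)}\sigma^2 < -v + \frac{1}{2\varepsilon}\sigma^2 = -k^2 < 0 \tag{44}$$

It indicates that the decreasing rate of $\rho$ at any point in $S$ is greater than $k^2$, and thus the decreasing rate of $\bar\rho_{\theta^*} \triangleq E\{\rho_{\theta^*}(x) \mid \rho_{\theta^*}(x) > \varepsilon\}$, the expectation of $\rho_{\theta^*}$ given $\rho_{\theta^*} > \varepsilon$, is bigger than $k^2$. Namely, $\bar\rho_{\theta^*}(t) < \rho_0 - k^2 t$, where $\rho_0 = \sqrt{\tilde x_0^2 + \tilde y_0^2}$. The inequality of (41) holds with $\bar\rho_{\theta^*}(t)$ replaced by $\rho_0 - k^2 t$: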

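Claim 1

$$P^{\theta^*}(\rho(x_t) > \varepsilon) < \int_{\varepsilon}^{\infty}\frac{C\rho}{\sigma^2 t}\exp\left(-\frac{(\rho - \rho_0 + k^2 t)^2}{2\sigma^2 t}\right)d\rho \tag{45}$$

Theorem 8
In a PE game with the dynamics specified in (19), if $v_p > v_e$, $\varepsilon > (\sigma_p^2 + \sigma_e^2)/(2(v_p - v_e))$ and the probability $P(\rho(x_t) > \varepsilon)$ under the optimal control $\theta^*$ satisfies (45), then $E\{T\} < \infty$.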

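Proof
Let $t_0 = (\rho_0 - \varepsilon)/k^2$, $\bar t = t - t_0$ and $r = \rho + k^2\bar t$. By (45),

$$\begin{aligned}
P(\rho(x_t) > \varepsilon) &< \int_{\varepsilon}^{\infty}\frac{C\rho}{\sigma^2 t}\exp\left(-\frac{(\rho - \rho_0 + k^2 t)^2}{2\sigma^2 t}\right)d\rho\\
&= \int_0^{\infty}\frac{C\rho}{\sigma^2(\bar t + t_0)}\exp\left(-\frac{(\rho + k^2\bar t)^2}{2\sigma^2(\bar t + t_0)}\right)d\rho\\
&= \int_{k^2\bar t}^{\infty}\frac{C(r - k^2\bar t)}{\sigma^2(\bar t + t_0)}\exp\left(-\frac{r^2}{2\sigma^2(\bar t + t_0)}\right)dr \qquad (\text{because } r = \rho + k^2\bar t)
\end{aligned}$$

Choose some $t_1$ such that $k^2(t_1 - t_0) > 1$ and let $t_2 = \max\{2t_0, t_1\}$. Consider $t > t_2$, such that $\bar t > t_0$ and $k^2\bar t > 1$. Then,

$$\begin{aligned}
P(\rho(x_t) > \varepsilon) &< \int_{k^2\bar t}^{\infty}\frac{C(r - k^2\bar t)}{\sigma^2(\bar t + t_0)}\exp\left(-\frac{r^2}{2\sigma^2(\bar t + t_0)}\right)dr < \int_{k^2\bar t}^{\infty}\frac{C(r - k^2\bar t)}{2\sigma^2 t_0}\exp\left(-\frac{k^2\bar t\,r}{4\sigma^2\bar t}\right)dr\\
&= \int_{k^2\bar t}^{\infty}\frac{C(r - k^2\bar t)}{2\sigma^2 t_0}\exp\left(-\frac{k^2 r}{4\sigma^2}\right)dr = \frac{8C\sigma^2}{k^4 t_0}\exp\left(-\frac{k^4(t - t_0)}{4\sigma^2}\right)
\end{aligned} \tag{46}$$

for any $t \ge 0$. The rest of the proof follows the proof of Lemma 7 from (29). □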




Finally, we verify Claim 1 by solving the FPE in (35) numerically using the finite difference method. Here, set $\varepsilon = 0.5$, $v = v_p - v_e = 1$, $\sigma^2 = 0.3$. Choose the initial positions of the players such that $(\tilde x_0, \tilde y_0) = (1, 1)$, and $\theta^*$ is determined in (24). In addition, by (44), let $k^2 = v - \sigma^2/2\varepsilon = 0.7$, and choose the constant $C$ as $1/6$. The evolution of $P^{\theta^*}(\rho(x_t) > \varepsilon)$ with time, obtained by solving the FPE, is plotted in Figure 4 together with the analytical upper bound on the right-hand side of (45), which is denoted by $P_u$. The result verifies the claim in (45) for this example.
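The analytical bound $P_u$ on the right-hand side of (45) can be evaluated without solving the FPE. The sketch below is an illustration only: it integrates the right-hand side of (45) in closed form by splitting $\rho = (\rho - m) + m$ with $m = \rho_0 - k^2 t$, and cross-checks the result with a plain trapezoidal rule. The function names, the quadrature settings and the sample times are assumptions; the parameters are those quoted in the text ($\varepsilon = 0.5$, $\sigma^2 = 0.3$, $k^2 = 0.7$, $C = 1/6$, $\rho_0 = \sqrt{2}$).

```python
import math

def upper_bound_pu(t, rho0, eps, k2, sigma2, c):
    """Right-hand side of (45), integrated in closed form."""
    m = rho0 - k2 * t              # centre of the Gaussian factor
    s = math.sqrt(sigma2 * t)      # its standard deviation
    z = (eps - m) / s
    gauss_part = math.exp(-0.5 * z * z)
    tail_part = (m / s) * math.sqrt(math.pi / 2.0) * math.erfc(z / math.sqrt(2.0))
    return c * (gauss_part + tail_part)

def upper_bound_pu_quadrature(t, rho0, eps, k2, sigma2, c, n=20000, width=12.0):
    """The same integral by the trapezoidal rule, as an independent check."""
    s = math.sqrt(sigma2 * t)
    hi = max(eps, rho0 - k2 * t) + width * s   # truncation point of the integral
    h = (hi - eps) / n

    def g(rho):
        return c * rho / (sigma2 * t) * math.exp(-(rho - rho0 + k2 * t) ** 2 / (2.0 * sigma2 * t))

    inner = sum(g(eps + i * h) for i in range(1, n))
    return h * (0.5 * g(eps) + inner + 0.5 * g(hi))

params = dict(rho0=math.sqrt(2.0), eps=0.5, k2=0.7, sigma2=0.3, c=1.0 / 6.0)
for t in (0.5, 2.0, 5.0):
    print(t, upper_bound_pu(t, **params), upper_bound_pu_quadrature(t, **params))
```

For small $t$ the expression exceeds one, so it is informative as a probability bound only at later times.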

3.5. Stochastic PE games with imperfect state information

In this section, PE games with imperfect state information are briefly introduced. We only discuss the difficulty of such problems and suggest a potential suboptimal solution technique. A complete solution still needs in-depth investigation.

In games where the players' measurements are imperfect, the information available to each player is no longer symmetric, and thus the information structure can raise fundamental challenges [15, 16, 29]. In general, this is a very difficult problem, and theoretical results in this field are still largely unavailable. To avoid the difficulty in the information structure, we focus on a class of PE problems where pursuers have noisy measurements but evaders can still measure the states perfectly. This represents a worst-case scenario to the pursuers. Under this setting, we may approach the problem by optimization from the pursuers' perspective.

Consider the players' dynamic equation in (2) with (3) and the objective functional in (4). The pursuers' measurement is described by

$$y_t = h(x_t, \xi)$$

where $y$ is the measurement, $h$ is the measurement function, and $\xi$ is the disturbance with known statistics. In addition, since $z$ is a logic state, it is known to the pursuers. Let $I_t^p$ be the 'information set' of the pursuers at time $t$, which includes all the information available up to time $t$. Here, $I_t^p = \{y_{[t_0,t]}, z_{[t_0,t]}, a_{[t_0,t)}\}$, where $y_{[t_0,t]} \triangleq \{y_\tau, t_0 \le \tau \le t\}$ and $z_{[t_0,t]}$ is defined similarly; $a_{[t_0,t)} \triangleq \{a_\tau, t_0 \le \tau < t\}$. The admissible control input of the pursuers is a map $a : [0, T] \to A_a$, which is progressively measurable with respect to not only the $\sigma$-field associated with $\omega$ but also that associated with the measurement disturbance $\xi$. Denote by $A(t)$ the set of all admissible $a$. Suppose that the evaders exploit the optimal strategies $\beta$ that are determined in the corresponding problem with perfect state information. Let $\beta \in \Delta(t)$ be a non-anticipative strategy of the evaders. The dynamic optimization problem (for closed-loop strategies) of the pursuers can be written as

$$\inf_{a \in A(t)} E_{x_t,\xi}\left\{\sup_{b^\beta \in B(t)} J(a, b^\beta; x_t, z_t) \,\Big|\, I_t^p\right\}$$

Here, $J$ is similar to (4) in the perfect information case; $b^\beta$ is the evaders' control input associated with some strategy $\beta$; the expectation is taken over $x_t$, $\omega$ and $\xi$ given $I_t^p$. In this formulation, since the evaders have perfect state information, they are assumed to exploit the optimal strategies in the corresponding problem with perfect information. Thus, the optimization associated with the evaders is inside the expectation with respect to $x_t$ and $\xi$. In addition, the closed-loop control of the pursuers should be adapted to their future measurements $y_\tau$ taken at time $\tau$ ($t \le \tau \le T$), whereas the evaders' decision-making depends on the perfect states all the time. So far, this problem has been analysed only conceptually, and the discussion may shed light on why stochastic game (control) problems are hard: even the problem formulation of a special case is difficult. The issues of existence and uniqueness of solutions are still largely open.

Figure 4. Probability $P^{\theta^*}(\rho > \varepsilon)$ vs the upper bound $P_u$. (Left panel: $P^{\theta^*}(\rho > \varepsilon)$ and the analytical upper bound $P_u(\rho > \varepsilon)$ against time (s); right panel: $P_u - P^{\theta^*}$ against time (s).)

In what follows, the authors only point out a heuristic approach to such problems from a practical point of view. Suppose that measurements are taken by the pursuers at each sample time $t_0 + k\Delta t$ for $k \in \mathbb{Z}^+$. The information set of the pursuers at time $t$ is $I_t^p = \{I_{t_0}^p, z_{[t_0,t]}, a_{[t_0,t)}, y_{t_0 + k\Delta t} \text{ for } k \in \mathbb{Z}^+ \text{ and } t_0 + k\Delta t \le t\}$, where $I_{t_0}^p$ is the initial information set at $t_0$. Clearly, the information sets at consecutive sample times satisfy

$$I^p_{t_0 + (k+1)\Delta t} = \{I^p_{t_0 + k\Delta t}, z_{(t_0 + k\Delta t,\, t_0 + (k+1)\Delta t]}, a_{[t_0 + k\Delta t,\, t_0 + (k+1)\Delta t)}, y_{t_0 + (k+1)\Delta t}\} \tag{47}$$

Let us choose a cost-to-go function as $\tilde V_{C2G}(I_t^p) = E_{x_t}\{\tilde V^h(x_t) \mid I_t^p\}$, where $\tilde V^h$ is an approximate upper Value function determined in (11) by the hierarchical approach given the perfect state information. Then, at each sample time $t = t_0 + k\Delta t$, the pursuers solve the following optimization problem:

$$\inf_{a(\cdot) \in A(t)} E_{x_t,\omega}\left\{\int_t^{t + \Delta t} G(\tau, x_\tau, z_\tau, a_\tau, \beta^*[a]_\tau)\,d\tau\right\} + E_{y_{t + \Delta t}}\left\{\tilde V_{C2G}(I^p_{t + \Delta t}) \mid I_t^p\right\} \tag{48}$$

where $\beta^*$ solves the limited look-ahead optimization problem in (8) with the cost-to-go function $\tilde V^h$ given perfect state information. Note that here, the pursuers' measurement $y_{t + \Delta t}$ at the next sample time is taken into account. By (47), the expectation with respect to $y_{t + \Delta t}$ is equivalent to the expectation with respect to $I^p_{t + \Delta t}$ given $I_t^p$, $a_{[t, t + \Delta t)}$ and $z_{(t, t + \Delta t]}$. Finally, to reduce the computation, certainty equivalence can be used in the cost-to-go function $\tilde V_{C2G}$, i.e. $\tilde V_{C2G}(I_t^p) = \tilde V^h(\hat x_t)$, where $\hat x_t$ is the expectation of the state $x$ given $I_t^p$. Note that here the evaders are assumed to exploit the strategy $\beta^*$. In practice, this may not be true, such that information about the evaders' control is needed for the pursuers to estimate the states.

In implementation, the optimization problem in (48) is solved at every $t_0 + k\Delta t$, and the resulting strategy of the pursuers is implemented during the next $\Delta t$ interval. This repetitive implementation of one-step look-ahead optimization is similar in procedure to model predictive control with $\tilde V_{C2G}$ as the terminal cost. In this suboptimal method, the feature of feedback is crucial to stochastic games and the optimization based on look-ahead may be beneficial due to the improving property of $\tilde V^h$ with perfect state information.

3.6. On finite expectation of the capture time

Finally, we explore the finite expected capture time of the evader in a two-player game with imperfect state information. A specific model of measurement is used to demonstrate the effect of the measurement accuracy on the capture time of the evader. Consider the two-player PE game in Section 3.4.3. Let $x_B = [\bar x_B, \bar y_B]^T$ for $B \in \{p, e\}$. Again, we assume that the evader can measure the states ($x_p$, $x_e$) perfectly, while the pursuer knows its own state $x_p$ perfectly but can only access a noisy measurement of $x_e$, which is described by

$$y_e = x_e + \xi_e(x_e) \tag{49}$$

Here, $y_e$ is the measurement; $\xi_e(x_e)$ is a random vector representing the measurement disturbance.

Since the evader can access the perfect states, we assume that the evader exploits the optimal control $\theta_e^*$ determined in (24). Note that $\theta_e^*$ is still optimal here if the pursuer's measurement does not depend on the evader's control input. Suppose that the pursuer calculates its optimal control $\tilde\theta_p$ according to (24) but with the perfect state $x_e$ replaced by its measurement $y_e$. Let $\tilde x \triangleq \bar x_p - \bar x_e$ and $\tilde y \triangleq \bar y_p - \bar y_e$. Define $\rho = \sqrt{\tilde x^2 + \tilde y^2}$. According to (13), the evolution of $\rho$ under the controls $\theta_e^*$ and $\tilde\theta_p$ can be described as follows:

$$\dot\rho = \left(\frac{\partial\rho}{\partial\bar x_p}v_p\cos\tilde\theta_p + \frac{\partial\rho}{\partial\bar y_p}v_p\sin\tilde\theta_p\right) + v_e + \frac{1}{2\rho}\sigma^2 \tag{50}$$

Note that (50) reduces to (43) when $y_e = x_e$. Assume that the disturbance $\xi_e$ is bounded. In the following, we provide a bound $X_e$ of $\xi_e$, such that if $\|\xi_e\| \le X_e$, $\bar\rho \triangleq E\{\rho \mid \rho > \varepsilon\}$ has a positive decreasing rate.

Construct a co-ordinate system whose origin is the current position of the pursuer. Denote by $W(x)$ the angle between the line of sight from the evader to the pursuer and the $\bar x$-axis, as illustrated in Figure 5. Define $\gamma(\rho) = \cos^{-1}((v_e + \sigma^2/2\rho)/v_p)$, where $\sigma^2 = \sigma_p^2 + \sigma_e^2$. Assume that $\varepsilon > (\sigma_p^2 + \sigma_e^2)/(2(v_p - v_e))$, such that $\gamma(\rho)$ is well defined when $\rho \ge \varepsilon$, and $0 < \gamma(\rho) < \pi/2$. Define the set $\Delta_\varepsilon \triangleq \{\delta \mid \gamma(\varepsilon) > \delta > 0\}$. Let $\gamma_\delta(\rho) = \gamma(\rho) - \delta$ for some $\delta \in \Delta_\varepsilon$.

Figure 5. Illustration of the pursuer's control based on a noisy measurement.

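Proposition 9
If there exists $\delta \in \Delta_\varepsilon$ such that $X_e(\rho) \le \rho\tan(\gamma_\delta(\rho))$ for $\rho \ge \varepsilon$ at any $x_e$ and any time $t \ge 0$, then $\dot{\bar\rho}(t) < -k^2$ at any $t \ge 0$ for some $k \ne 0$, where $\bar\rho = E\{\rho(x) \mid \rho(x) > \varepsilon\}$.

Proof
Since $\|\xi_e(x)\| \le X_e(\rho)$, according to Figure 5, clearly, $y_e$ falls within the region $Y_e$, which is a circle of radius $X_e(\rho)$ centred at $x_e$. Let $\hat{\tilde x} = \bar x_p - y_{e1}$ and $\hat{\tilde y} = \bar y_p - y_{e2}$. By (24), $\tilde\theta_p$ satisfies

$$\cos\tilde\theta_p = -\frac{\hat{\tilde x}}{\sqrt{\hat{\tilde x}^2 + \hat{\tilde y}^2}} \quad\text{and}\quad \sin\tilde\theta_p = -\frac{\hat{\tilde y}}{\sqrt{\hat{\tilde x}^2 + \hat{\tilde y}^2}}$$

Namely, $\tilde\theta_p \in Y_x^\delta \triangleq \{\theta \mid W(x) + \pi - \gamma_\delta(\rho) \le \theta \le W(x) + \pi + \gamma_\delta(\rho)\}$. Rewrite Equation (50) as

$$\dot\rho = \left(\frac{\partial\rho}{\partial\bar x_p}v_p\cos\tilde\theta_p + \frac{\partial\rho}{\partial\bar y_p}v_p\sin\tilde\theta_p\right) + v_e + \frac{1}{2\rho}\sigma^2 \tag{51}$$

In (51), $\partial\rho/\partial\bar x_p = \cos W(x)$ and $\partial\rho/\partial\bar y_p = \sin W(x)$. Thus, $(\partial\rho/\partial\bar x_p)\cos\tilde\theta_p + (\partial\rho/\partial\bar y_p)\sin\tilde\theta_p = \cos(W(x) - \tilde\theta_p)$. The fact that $\tilde\theta_p \in Y_x^\delta$ implies $|\tilde\theta_p - W(x) - \pi| \le \gamma_\delta(\rho) < \pi/2$, i.e. $\cos(\tilde\theta_p - W(x) - \pi) \ge \cos(\gamma_\delta(\rho))$. Thus,

$$\cos(W(x) - \tilde\theta_p) = -\cos(W(x) - \tilde\theta_p - \pi) \le -\cos(\gamma_\delta(\rho)) = -\cos(\gamma(\rho) - \delta) \tag{52}$$

Substitute (52) into (51),

$$\begin{aligned}
\dot\rho &\le -v_p\cos(\gamma(\rho) - \delta) + v_e + \frac{1}{2\rho}\sigma^2 = -v_p\left[\cos(\gamma(\rho) - \delta) - \left(v_e + \frac{\sigma^2}{2\rho}\right)\Big/ v_p\right]\\
&= -v_p[\cos(\gamma(\rho) - \delta) - \cos\gamma(\rho)] = -2v_p\sin(\gamma(\rho) - \delta/2)\sin(\delta/2)\\
&\le -2v_p\sin(\gamma(\varepsilon) - \delta/2)\sin(\delta/2) = -k^2 < 0
\end{aligned} \tag{53}$$

Since (53) holds for any $\rho > \varepsilon$, it also holds for $\dot{\bar\rho}$ if the evader is not captured, i.e. $\dot{\bar\rho} < -k^2$. □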

Proposition 9 states that if $\|\xi_e\| \le \rho\tan(\gamma_\delta(\rho))$, the distance between the pursuer and the evader decreases on average. Following the analysis in Section 3.4.3, it is expected that the capture time of the evader has a finite expectation. Although the bounded measurement disturbance considered in (49) is restrictive and unlikely to hold in real-world problems, Proposition 9 still has practical value because it sheds light on the relationship between the measurement accuracy and the capturability as well as the capture range of the pursuer (indicated by $\varepsilon$). The results can be extended to a more general case where the measurement noise is unbounded but with a known probability distribution.
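To give a feel for the quantities in Proposition 9, the sketch below computes $\gamma(\rho)$, the admissible noise bound $\rho\tan(\gamma_\delta(\rho))$ and the rate $k^2$ from (53). It is an illustration only: the function names are hypothetical, and the values $v_p = 5$, $v_e = 3$, $\sigma^2 = 1$, $\varepsilon = 0.5$ and $\delta = 0.2$ are assumed for the example.

```python
import math

def gamma(rho, vp, ve, sigma2):
    """gamma(rho) = arccos((ve + sigma^2 / (2 rho)) / vp)."""
    return math.acos((ve + sigma2 / (2.0 * rho)) / vp)

def noise_bound(rho, delta, vp, ve, sigma2):
    """Largest measurement error allowed by Proposition 9: rho * tan(gamma(rho) - delta)."""
    return rho * math.tan(gamma(rho, vp, ve, sigma2) - delta)

def decrease_rate(eps, delta, vp, ve, sigma2):
    """k^2 in (53): 2 vp sin(gamma(eps) - delta / 2) sin(delta / 2)."""
    g = gamma(eps, vp, ve, sigma2)
    return 2.0 * vp * math.sin(g - delta / 2.0) * math.sin(delta / 2.0)

vp, ve, sigma2, eps, delta = 5.0, 3.0, 1.0, 0.5, 0.2   # assumed example values
assert eps > sigma2 / (2.0 * (vp - ve))                 # gamma is well defined
assert 0.0 < delta < gamma(eps, vp, ve, sigma2)         # delta lies in Delta_eps

k2 = decrease_rate(eps, delta, vp, ve, sigma2)
for rho in (0.5, 1.0, 2.0, 5.0):
    print(rho, noise_bound(rho, delta, vp, ve, sigma2))
print(k2)
```

In this example the admissible error grows with the distance $\rho$, so the measurement has to be most accurate close to the capture range.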

4. SIMULATION RESULTS

In the following, two examples are simulated to illustrate the usefulness and the feasibility of the limited look-ahead method for stochastic multi-player PE differential games.

4.1. Performance enhancement by limited look-ahead

In this section, the usefulness of limited look-ahead in the performance enhancement is demonstrated in stochastic co-operative pursuit problems. Consider a simple PE game involving two pursuers and two evaders with the players' dynamics in (19) and the objective in (10), which is the sum of the capture times of all the evaders. Consider a specific scenario, where the players' initial positions and velocities are specified in Table I. Let $\sigma_p^2 = \sigma_e^2 = 0.5$ and $\varepsilon = 0.5$. Here, we assume that both the pursuers and the evaders can access the state variables perfectly.

We first apply the hierarchical method, where the objective of the optimization problem at the upper level is (11) with the Value function of the distributed two-player games specified in (20). The resulting optimal engagement is that pursuer 1 (2) is engaged with evader 1 (2). Under this engagement, the strategies of the pursuers and the evaders are determined by solving the distributed two-player games at the lower level. Typical sample trajectories of co-operative pursuit under such an engagement are shown in Figure 6(a), where the arrows indicate the expected instantaneous moving directions of the players when the snapshots are taken. We use a circle to indicate the capture range of the pursuer as well as the capture of the corresponding evader inside.

Now, let $x = [\bar x_p, \bar y_p, \bar x_e, \bar y_e]^T$ and denote by $\tilde V^h(x, z)$ the suboptimal upper Value obtained by the hierarchical approach as in (11). Suppose that the game starts at $t_0$. Given $\Delta t > 0$, at each sample time $t = t_0 + k\Delta t$ ($k \in \mathbb{Z}_{\ge 0}$), we implement the optimization based on limited look-ahead as in (9) but with certainty equivalence, i.e.

$$\max_{\beta \in \Delta(t)}\min_{a(\cdot) \in A(t)}\left\{\int_t^{t + \Delta t}\left(\sum_{j=1}^{2} z_j(\tau)\right)d\tau + \tilde V^h\left(\hat x_{t + \Delta t}^{x_t, a, \beta[a]},\, z_{t + \Delta t}^{z_t, \hat x}\right)\right\} \tag{54}$$

where $\hat x_{t + \Delta t}^{x_t, a, \beta[a]} \triangleq E\{x_{t + \Delta t}^{x_t, a, \beta[a]}\}$. In this example, we choose $\Delta t = 0.1$, such that the 'minimax' problem in (54) can be approximated by a static optimization problem with $a$, $b = \beta[a]$ fixed during $\Delta t$ intervals. The optimal strategies ($a_t^*$, $b_t^*$) solved are utilized during the next $\Delta t$ interval. By repetition of this procedure, the trajectories of the players can be generated, and one of the samples is illustrated in Figure 6(b).
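The static approximation of (54) can be sketched for the simplest case of one pursuer and one evader. This is not the multi-player implementation used in the simulations; it is a minimal illustration under stated assumptions: headings are restricted to a uniform grid, the evader best-responds to each candidate pursuer heading, the expected next positions are obtained by dropping the zero-mean noise (certainty equivalence), and the two-player Value function (20) serves as the terminal cost. All function names are hypothetical, and the numbers are loosely based on Table I.

```python
import math

def value_pair(xp, xe, vp, ve, sigma2, eps):
    """Two-player Value function (20), used here as the terminal cost."""
    v = vp - ve
    rho = math.hypot(xp[0] - xe[0], xp[1] - xe[1])
    return (rho - eps) / v + sigma2 * (math.log(rho ** 2) - math.log(eps ** 2)) / (4.0 * v ** 2)

def step(pos, speed, heading, dt):
    """Expected position after dt when the heading is held fixed."""
    return (pos[0] + speed * dt * math.cos(heading),
            pos[1] + speed * dt * math.sin(heading))

def one_step_lookahead(xp, xe, vp, ve, sigma2, eps, dt=0.1, n_headings=72):
    """Grid search: pursuer heading minimizing the evader's best-response cost."""
    headings = [2.0 * math.pi * i / n_headings for i in range(n_headings)]
    best = None
    for th_p in headings:
        xp_next = step(xp, vp, th_p, dt)
        # Running cost dt (one uncaptured evader) plus terminal cost.
        worst_cost, worst_th_e = max(
            (dt + value_pair(xp_next, step(xe, ve, th_e, dt), vp, ve, sigma2, eps), th_e)
            for th_e in headings
        )
        if best is None or worst_cost < best[0]:
            best = (worst_cost, th_p, worst_th_e)
    return best

cost, theta_p, theta_e = one_step_lookahead((0.0, 0.0), (4.0, 0.0),
                                            vp=4.0, ve=3.0, sigma2=1.0, eps=0.5)
print(cost, theta_p, theta_e)
```

In this two-player case the search recovers the pursuit geometry of (24): the pursuer heads along the line of sight and the evader flees in the same direction. With several players, the terminal cost would instead be the hierarchical Value $\tilde V^h$, which is what allows co-operative behaviour to emerge.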

In Figure 6(b), we draw the dashed arrows to emphasize the players' movement. Clearly, the evolution of the game in Figure 6(b) resembles reality better than that in Figure 6(a). Specifically, when the players are close, both pursuers can move co-operatively to force the evaders to change their escaping directions. In such a way, the performance is improved. It should be noted here that the trajectories in Figure 6 are sample runs. To further justify the limited look-ahead method, we have simulated the same game 1000 times using both methods: the average cost (accumulative capture time) according to (10) is 7.79 s under the hierarchical method and 6.53 s under the limited look-ahead method, a reduction of about 16%. Clearly, the performance of the hierarchical approach can be improved by the limited look-ahead method.

Table I. The necessary parameters of the players in a PE game scenario.

            Initial position   Velocity
Pursuer 1   (0, 0)             4
Pursuer 2   (9, 2)             4
Evader 1    (4, 0)             3
Evader 2    (5, 2)             3

4.2. Cooperative pursuit game with imperfect state information

In this section, we demonstrate the feasibility of the limited look-ahead approach through a multi-player stochastic PE game with imperfect state information. Consider a game with three pursuers and three evaders, and the dynamics of the players are given in (19). The sum of the capture times of the evaders is the objective. Assume that the evaders have perfect state measurement; the pursuers can measure their own states perfectly but their measurement of the evaders is noisy. The pursuers' measurement (of evader $j$) at each time $t \triangleq t_0 + k\Delta t$ is described by

$$y^j(t) = \begin{bmatrix} \bar x_e^j \\ \bar y_e^j \end{bmatrix} + \begin{bmatrix} \xi^j_{\bar x} \\ \xi^j_{\bar y} \end{bmatrix} \tag{55}$$

Figure 6. Performance enhancement by limited look-ahead: (a) pursuit trajectories by the hierarchical approach and (b) pursuit trajectories by repetition of the optimization based on look-ahead.

where $\xi^j_{\bar x}$ and $\xi^j_{\bar y}$ are independent Gaussian random variables, with zero mean and a common variance, i.e. $\xi^j_{\bar x}$ ($\xi^j_{\bar y}$) $\sim N(0, \sigma_j^2)$. Define $y = [y^{1T}, \ldots, y^{MT}]^T$. In this example, let $\sigma_j^2$ be 0.5 for $j = 1, 2, 3$ and $\Delta t = 0.1$. The necessary parameters of the players are given in Table II.

As in the previous example, we implemented the optimization based on limited look-ahead repetitively at each sample time $t_0 + k\Delta t$ as in (54) but with $\hat x_{t + \Delta t}^{x_t, a, \beta[a]}$ replaced by $\hat x_{t + \Delta t}^{\tilde x_t, a, \beta[a]}$ with $\tilde x \triangleq [x_p^T, y^T]^T$. By doing this, the noisy measurement $y_t$ is used in the calculation as the substitute of the true state $x_e$. It should be noted that $\hat x_{t + \Delta t}^{\tilde x_t, a, \beta[a]}$ is the expectation of the state $x$ at $t + \Delta t$ given the initial state $\tilde x$ at $t$ under $a(\cdot)$ and $\beta$.

The resulting (sample) pursuit trajectories at various stages are illustrated in Figure 7, where 'P' stands for pursuer and 'E' for evader. It confirms the phenomenon observed in Figure 6(b) that the pursuers move co-operatively at the beginning stages and, when the pursuers get close enough, they become engaged with specific evaders. The reason is that, according to the objective in (10), all the evaders are equally important. Every evader tends to avoid potential capture by each of the pursuers. However, with the limited look-ahead, each evader may not 'detect' which pursuer is actually after it. Hence, the pursuers have the advantage of concealing their true intent, which turns out to be the extra strength of co-operative pursuit with multiple pursuers compared to the simple summation of the individual pursuits as in the hierarchical approach. This advantage is due to the (hierarchical) structure relaxation in the optimization over limited look-ahead intervals [12]. Finally, it should be noted that in the pursuit trajectories of the simulations, the evaders exploit the strategies calculated in the same optimization problem based on limited look-ahead as that for the pursuers. In practice, this may not be true, and if not, the pursuers may have a better performance with the same limited look-ahead strategies because the pursuers optimize their worst case.

Table II. Simulation parameters.

| Pursuer P | 1 | 2 | 3 | Evader E | 1 | 2 | 3 |
|---|---|---|---|---|---|---|---|
| $(\bar{x}_{p0}, \bar{y}_{p0})$ | (0, -12) | (8, 8) | (-8, 8) | $(\bar{x}_{e0}, \bar{y}_{e0})$ | (0, 7) | (-3, -2) | (4, -3) |
| $v_p$ (1/s) | 5 | 5 | 5 | $v_e$ (1/s) | 3 | 2 | 2 |
| $\sigma_p^2$ | 0.5 | 0.5 | 0.5 | $\sigma_e^2$ | 0.5 | 0.5 | 0.5 |

5. CONCLUSION

In this paper, a general stochastic multi-player PE differential game with additive Gaussian noise in the dynamics has been formulated. To avoid the difficulty of multiplicity of the players in conventional DP methods, a class of suboptimal approaches is specified, such that the resulting suboptimal solution has an improving property based on the optimization with limited look-ahead. A hierarchical method that decomposes a multi-player game into two-player games belongs to this class. Starting from a proper suboptimal solution, the improvement based on limited look-ahead can be applied iteratively and the process converges. Furthermore, we derive an analytical solution for a two-player game using the Dubins car model, and the conditions for finite expectation of the capture time are specified. The usefulness and the feasibility of limited look-ahead methods are demonstrated through selected simulation scenarios. One appealing advantage of this limited look-ahead approach is that the pursuers' real intentions can be concealed from the evaders.

The iterative method provides a natural framework for general multi-player (zero-sum) games. However, due to its close relation to DP methods, scalability is an issue. Practical algorithms need further investigation.

Figure 7. Co-operative pursuit trajectories of the players. [Four panels, Cooperative Pursuit: Stage 1 to Stage 4, each plotting $\bar{y}$ against $\bar{x}$ and showing pursuers P1 to P3 and evaders E1 to E3; the captures of E3, E1 and E2 are annotated.]

APPENDIX A

Lemma 10
Given any $a(\cdot) \in A(t)$, define

$$W_a(x, z) \triangleq \sup_{\beta \in \Delta(t)} E_\omega \left\{ \int_t^T G(x_\tau, z_\tau, a_\tau, \beta[a]_\tau)\, d\tau + Q(x_T) \right\} \tag{A1}$$

For any $\Delta t > 0$,

$$\begin{aligned} W_a(x, z) \ge \sup_{\beta \in \Delta(t)} E_\omega \Bigg\{ & P_{a,\beta,\omega}(T \le t + \Delta t) \left[ \int_t^T G(x_\tau, z_\tau, a_\tau, \beta[a]_\tau)\, d\tau + Q(x_T) \right] \\ & + P_{a,\beta,\omega}(T > t + \Delta t) \left[ \int_t^{t+\Delta t} G(x_\tau, z_\tau, a_\tau, \beta[a]_\tau)\, d\tau + W_a\big(x_{t+\Delta t;\, x, a, \beta[a], \omega},\, z_{t+\Delta t;\, z, x}\big) \right] \Bigg\} \end{aligned} \tag{A2}$$

Here, $P_{a,\beta,\omega}(T \le t + \Delta t)$ represents the probability of $T \le t + \Delta t$ given $a(\cdot)$, $\beta$ and $\omega$.

Proof
Inequality (A2) is clear by the definition of $W_a$ in (A1). □

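A.1. Proof of Theorem 1

Proof
Given any $\varepsilon > 0$ and for any $x \in X$ and $z \in Z^M$, there exists $\tilde{a}^t(\cdot) \in A^S_{x,z}(t)$, such that

$$\widetilde{V}(x, z) \ge \sup_{\beta \in \Delta(t)} E_\omega \left\{ \int_t^T G(x_\tau, z_\tau, \tilde{a}^t_\tau, \beta[\tilde{a}^t]_\tau)\, d\tau + Q(x_T) \right\} - \varepsilon = W_{\tilde{a}^t}(x, z) - \varepsilon \tag{A3}$$

By Lemma 10, given any $\Delta t > 0$,

$$\begin{aligned} W_{\tilde{a}^t}(x, z) \ge{} & \sup_{\beta \in \Delta(t)} E_\omega \Bigg\{ P_{\tilde{a}^t,\beta,\omega}(T \le t + \Delta t) \left[ \int_t^T G(x_\tau, z_\tau, \tilde{a}^t_\tau, \beta[\tilde{a}^t]_\tau)\, d\tau + Q(x_T) \right] \\ & \quad + P_{\tilde{a}^t,\beta,\omega}(T > t + \Delta t) \left[ \int_t^{t+\Delta t} G(x_\tau, z_\tau, \tilde{a}^t_\tau, \beta[\tilde{a}^t]_\tau)\, d\tau + W_{\tilde{a}^t}\big(x_{t+\Delta t;\, x, \tilde{a}^t, \beta[\tilde{a}^t], \omega},\, z_{t+\Delta t;\, z, x}\big) \right] \Bigg\} \\ ={} & \sup_{\beta \in \Delta(t)} E_\omega \left\{ \int_t^{t+\Delta t} G(x_\tau, z_\tau, \tilde{a}^t_\tau, \beta[\tilde{a}^t]_\tau)\, d\tau + W_{\tilde{a}^t}\big(x_{t+\Delta t;\, x, \tilde{a}^t, \beta[\tilde{a}^t], \omega},\, z_{t+\Delta t;\, z, x}\big) \right\} \end{aligned} \tag{A4}$$

Note that $W_a \ge \widetilde{V}$ for any $a \in A(t)$, and then by (A3) and (A4),

$$\widetilde{V}(x, z) \ge \sup_{\beta \in \Delta(t)} E_\omega \left\{ \int_t^{t+\Delta t} G(x_\tau, z_\tau, \tilde{a}^t_\tau, \beta[\tilde{a}^t]_\tau)\, d\tau + \widetilde{V}\big(x_{t+\Delta t;\, x, \tilde{a}^t, \beta[\tilde{a}^t], \omega},\, z_{t+\Delta t;\, z, x}\big) \right\} - \varepsilon \tag{A5}$$

In addition, $\tilde{a}^t(\cdot) \in A(t)$. It follows that

$$\widetilde{V}(x, z) \ge \sup_{\beta \in \Delta(t)} \inf_{a \in A(t)} E_\omega \left\{ \int_t^{t+\Delta t} G(x_\tau, z_\tau, a_\tau, \beta[a]_\tau)\, d\tau + \widetilde{V}\big(x_{t+\Delta t;\, x, a, \beta[a], \omega},\, z_{t+\Delta t;\, z, x}\big) \right\} - \varepsilon$$

Since $\varepsilon$ is arbitrary, the proof is completed. □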




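A.2. Proof of Theorem 2

Proof

(i) We first prove that $W_k \ge H[W_k]$ for $k \in \mathbb{Z}^+$ by induction. Suppose that $W_k \ge H[W_k]$ ($k \in \mathbb{Z}_{\ge 0}$), and we want to show that $W_{k+1} \ge H[W_{k+1}]$. By definition,

$$W_{k+1}(x, z) = \sup_{\beta \in \Delta(t)} \inf_{a(\cdot) \in A(t)} E_\omega \left\{ \int_t^{t+\Delta t} G(x_\tau, z_\tau, a_\tau, \beta[a]_\tau)\, d\tau + W_k\big(x_{t+\Delta t;\, x, a, \beta[a], \omega},\, z_{t+\Delta t;\, z, x}\big) \right\} \tag{A6}$$

Note that $W_k \ge H[W_k] = W_{k+1}$. Replace $W_k$ by $W_{k+1}$ in (A6), and then $W_{k+1}(x, z) \ge H[W_{k+1}](x, z)$. Since $W_0 \ge H[W_0]$, by induction, $W_k \ge H[W_k]$ for $k \in \mathbb{Z}^+$. Considering that $\{W_k\}_{k=0}^{\infty}$ is decreasing at any $x \in X$ and $z \in Z^M$, $\{W_k\}_{k=0}^{\infty}$ converges pointwise because $W_k$ is bounded from below for any $k \in \mathbb{Z}^+$.

(ii) For any $k \in \mathbb{Z}^+$, $W_k(x, z) \ge W_\infty(x, z)$ and it follows that $W_k(x, z) \ge H[W_\infty](x, z)$. Let $k \to \infty$, and $W_\infty(x, z) \ge H[W_\infty](x, z)$. On the other hand, $W_\infty(x, z) \le H[W_k](x, z)$ for any $k \in \mathbb{Z}^+$. Similarly, let $k \to \infty$, such that $W_\infty(x, z) \le H[W_\infty](x, z)$. Thus, $W_\infty(x, z) = H[W_\infty](x, z)$. □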
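The monotone iteration in this proof can be illustrated numerically on a toy problem. The sketch below is not the game studied in the paper: it assumes a one-dimensional integer distance between one pursuer and one evader, a unit running cost (so the value is an expected capture time), hypothetical move sets, a three-point zero-mean disturbance standing in for $\omega$, and an evader that responds to the pursuer's move as a discrete stand-in for $\sup_\beta \inf_a$ with $\beta[a]$. The start $W_0(d) = 2d$ is simply one function that satisfies $W_0 \ge H[W_0]$ in this toy model, which the code checks before iterating.

```python
# Toy discrete analogue of W_{k+1} = H[W_k]; every modelling choice here is an assumption.
N = 10                      # largest pursuer-evader distance on the grid
PURSUER_MOVES = (2, 3)      # hypothetical pursuer control set
EVADER_MOVES = (0, 1)       # hypothetical evader control set
NOISE = ((-1, 0.25), (0, 0.5), (1, 0.25))  # zero-mean disturbance (value, probability)


def expected_next(W, d, p, e):
    """E_omega[W(d')] with d' = clip(d - p + e + omega, 0, N)."""
    return sum(prob * W[min(N, max(0, d - p + e + w))] for w, prob in NOISE)


def H(W):
    """One-step look-ahead: unit cost plus min over pursuer of max over evader response."""
    out = [0.0] * (N + 1)   # d = 0 means capture, so no further cost accrues
    for d in range(1, N + 1):
        out[d] = 1.0 + min(
            max(expected_next(W, d, p, e) for e in EVADER_MOVES)
            for p in PURSUER_MOVES
        )
    return out


def iterate(W0, tol=1e-12, max_iter=1000):
    """Apply H repeatedly and return the iterates W_0, W_1, ... until they stop changing."""
    history = [list(W0)]
    for _ in range(max_iter):
        W_next = H(history[-1])
        history.append(W_next)
        if max(abs(a - b) for a, b in zip(history[-2], W_next)) < tol:
            break
    return history


W0 = [2.0 * d for d in range(N + 1)]            # candidate starting point
assert all(w >= h for w, h in zip(W0, H(W0)))   # hypothesis W_0 >= H[W_0]
history = iterate(W0)
W_limit = history[-1]                           # numerical stand-in for W_infinity
```

Each iterate in `history` is pointwise no larger than the previous one, and `W_limit` agrees with `H(W_limit)` up to the tolerance, mirroring parts (i) and (ii) of the proof in this simplified setting.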

ACKNOWLEDGEMENTS

This work was supported by the Collaborative Center of Control Science at the Ohio State University under Grant F33615-01-2-3154 from the Air Force Research Laboratory (AFRL/VA) and the Air Force Office of Scientific Research (AFOSR).

REFERENCES

1. Vidal R, Shakernia O, Kim H, Shim D, Sastry S. Probabilistic pursuit–evasion games: theory, implementation, and experimental evaluation. IEEE Transactions on Robotics and Automation 2002; 18(5):662–669.

2. Hespanha J, Kim H, Sastry S. Multiple-agent probabilistic pursuit–evasion games. Proceedings of the 38th IEEE Conference on Decision and Control, Phoenix, AZ, 1999; 2432–2437.

3. Antoniades A, Kim H, Sastry S. Pursuit–evasion strategies for teams of multiple agents with incomplete information. Proceedings of the 42nd IEEE Conference on Decision and Control, Maui, HI, 2003; 756–761.

4. Li D, Cruz Jr JB, Chen G, Kwan C, Chang M. A hierarchical approach to multi-player pursuit–evasion differential games. Proceedings of the 44th Joint Conference of CDC-ECC05, Seville, Spain, December 2005; 5674–5679.

5. Li D, Cruz Jr JB. Better cooperative control with limited look-ahead. Proceedings of the American Control Conference, Minneapolis, MN, June 2006; 4914–4919.

6. Kanchanavally S, Ordonez R, Layne J. Mobile target tracking by networked uninhabited autonomous vehicles via hospitability maps. Proceedings of the American Control Conference, Boston, MA, 2004; 5570–5575.

7. Liu Y, Cruz Jr JB, Sparks A. Coordinating networked uninhabited air vehicles for persistent area denial. Proceedings of the 43rd IEEE Conference on Decision and Control, Paradise Island, Bahamas, 2004; 3351–3356.

8. Isaacs R. Differential Games: A Mathematical Theory with Applications to Warfare and Pursuit. Wiley: New York, 1965.

9. Basar T, Olsder G. Dynamic Noncooperative Game Theory (2nd edn). SIAM: Philadelphia, 1998.

10. Hespanha J, Prandini M, Sastry S. Probabilistic pursuit–evasion games: a one-step Nash approach. Proceedings of the 39th IEEE Conference on Decision and Control, Sydney, Australia, 2000; 2272–2277.

11. Schenato L, Oh S, Sastry S. Swarm coordination for pursuit evasion games using sensor networks. Proceedings of the International Conference on Robotics and Automation, Barcelona, Spain, 2005; 2493–2498.

12. Li D, Cruz Jr JB. Improvement with look-ahead on cooperative pursuit games. Proceedings of the 45th IEEE Conference on Decision and Control, San Diego, CA, December 2006.

13. Li D, Cruz Jr JB. General multi-player pursuit–evasion differential games. IEEE Transactions on Automatic Control, submitted for publication.

14. Fleming WH, Souganidis PE. On the existence of value functions of two-player, zero-sum stochastic differential games. Indiana University Mathematics Journal 1989; 38(2):293–314.

15. Rhodes I, Luenberger D. Differential games with imperfect state information. IEEE Transactions on Automatic Control 1969; 14(1):29–38.

16. Kian AR, Cruz Jr JB, Simaan MA. Stochastic discrete-time Nash games with constrained state estimators. Journal of Optimization Theory and Applications 2002; 114(1):171–188.

17. Fleming W, Rishel R. Deterministic and Stochastic Optimal Control. Springer: New York, 1975.

18. Yong J, Zhou X. Stochastic Controls: Hamiltonian Systems and HJB Equations. Springer: New York, 1999.

19. Bardi M, Capuzzo-Dolcetta I. Optimal Control and Viscosity Solutions of Hamilton–Jacobi–Bellman Equations. Birkhäuser: Boston, 1997.

20. Varaiya P. On the existence of solutions to a differential game. SIAM Journal of Control 1967; 5:153–162.

21. Roxin E. Axiomatic approach in differential games. Journal of Optimization Theory and Applications 1969; 3(3):153–163.

22. Elliott R, Kalton N. The existence of value in differential games. Memoirs of the American Mathematical Society, vol. 126. American Mathematical Society: Providence, Rhode Island, 1972.

23. Elliott R, Kalton N. Cauchy problems for certain Isaacs–Bellman equations and games of survival. Transactions of the American Mathematical Society 1974; 198:45–72.

24. Kushner H, Dupuis P. Numerical Methods for Stochastic Control Problems in Continuous Time. Springer: New York, 2001.

25. Barto A, Bradtke S, Singh S. Learning to act using real-time dynamic programming. Artificial Intelligence 1995; 72:81–138.

26. Roy B. Neuro-dynamic programming: overview and recent trends. Handbook of Learning and Approximate Dynamic Programming. Kluwer: Dordrecht, 2001; 431–460.

27. Si J, Barto A, Powell W, Wunsch D. Handbook of Learning and Approximate Dynamic Programming. Wiley: New York, 2004.

28. Risken H. The Fokker–Planck Equation: Methods of Solution and Applications (2nd edn). Springer: Berlin, 1996.

29. Rhodes I, Luenberger D. Stochastic differential games with constrained state estimators. IEEE Transactions on Automatic Control 1969; 14(5):476–481.


