10
Nash Strategies for Pursuit-Evasion Differential Games Involving Limited Observations WEI LIN, Member, IEEE Western Digital Corporation Irvine, CA, USA ZHIHUA QU, Fellow, IEEE MARWAN A. SIMAAN, Life Fellow, IEEE University of Central Florida Orlando, FL, USA A linear-quadratic N-pursuer single-evader differential game is considered. The evader can observe all the pursuers but pursuers have limited observations of themselves and the evader. The evader implements the conventional feedback Nash strategy and the pursuers implement Nash strategies based on a novel concept of best achievable performance indices. This problem has potential applications in situations where a well-equipped unmanned vehicle is evading several weakly equipped pursuing vehicles. An illustrative example is solved, and several scenarios are presented. Manuscript received August 26, 2013; revised June 12, 2014, August 10, 2014; released for publication October 12, 2014. DOI. No. 10.1109/TAES.2014.130569. Refereeing of this contribution was handled by T. Robertazzi. This work is partially supported by US National Science Foundation under grants ECCS-1308928 and CCF-0956501. A preliminary version of this paper was presented at the 2013 American Control Conference, June 2013, Washington, D.C., and is referenced in [16]. Authors’ addresses: W. Lin, Western Digital Corporation, 3355 Michelson Drive, Suite 100, Irvine, CA 92612; Z. Qu, M. A. Simaan, University of Central Florida, Dept. of Electrical Engineering and Computer Science, Harris Engineering Center, Orlando, FL 32816, E-mail: ([email protected]). 0018-9251/15/$26.00 C 2015 IEEE I. INTRODUCTION Differential game theory has attracted considerable attention in aerospace systems and optimal control literature since the original work of Isaacs [11] in 1965. Since then, there has been a variety of applications of this theory, including pursuit-evasion games [1, 8, 10, 1416, 19, 2124] as well as military and aerospace applications [2, 6, 7, 9, 17, 18]. Basically, the pursuit-evasion problem models the process where one or more pursuers tries to chase one or more evaders while the evaders try to escape. Solving a pursuit-evasion game essentially involves developing strategies for both the pursuers and evaders such that their prescribed performance indices are optimized. Historically, pursuit-evasion games were first investigated by Isaacs [11] in the 1960s. Saddle point solutions for a type of zero-sum single-pursuer single-evader games were considered in [10]. Nonzero-sum pursuit-evasion games were introduced and investigated as an example of the Nash strategies in [23] and as an example of the leader-follower Stackelberg strategies in [21, 22]. Two pursuers and one evader games were studied in [8]. In recent years, multiplayer pursuit evasion differential games involving a number of pursuers and a single evader have received considerable attention in the literature [1, 16, 18, 24]. These types of games present numerous interesting challenges when compared to the original single-pursuer single-evader game. For the pursuers, the challenge revolves around designing coordinated strategies to accomplish their common goal of capturing the evader. For the evader, who is clearly outnumbered by the pursuers, the challenge is to maneuver around the pursuers so as to avoid being captured. Conventional multiplayer pursuit-evasion games usually assume that every player has unlimited observations of all the other players at all times and yields feedback strategies that are dependent on the initial states. While this unlimited observations assumption may hold in some cases, a realistic challenge in solving these problems is when some of the players may not have access to unlimited observations of other players. This condition might occur due to factors such as limited sensing capabilities, obstacles in the environment, or severe weather conditions. The conventional differential games approach for solving problems with limited observations yields feedback strategies that depend on the initial states of all the players [12, 13], but the requirement of all pursuers having access to the full information of initial states may not be met in certain applications. This paper considers an N-pursuer single-evader game over a finite time horizon in which only the evader is assumed to have unlimited sensing capability that allows it to observe all the pursuers at all times. Each pursuer, on the other hand, has limited sensing capabilities that allows it to observe the evader and/or other pursuers only if they fall within its sensing range. A practical example of such a situation occurs when a well-equipped unmanned aerial vehicle (UAV) with a very wide range of sensing IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS VOL. 51, NO. 2 APRIL 2015 1347

Differential game theory has attracted considerable Nash …ece.ucf.edu/~qu/Journals/2015 IEEE TAES.pdf · 2015-07-16 · I. INTRODUCTION Differential game theory has attracted considerable

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Differential game theory has attracted considerable Nash …ece.ucf.edu/~qu/Journals/2015 IEEE TAES.pdf · 2015-07-16 · I. INTRODUCTION Differential game theory has attracted considerable

Nash Strategies forPursuit-Evasion DifferentialGames Involving LimitedObservations

WEI LIN, Member, IEEEWestern Digital CorporationIrvine, CA, USA

ZHIHUA QU, Fellow, IEEEMARWAN A. SIMAAN, Life Fellow, IEEEUniversity of Central FloridaOrlando, FL, USA

A linear-quadratic N-pursuer single-evader differential game isconsidered. The evader can observe all the pursuers but pursuershave limited observations of themselves and the evader. The evaderimplements the conventional feedback Nash strategy and thepursuers implement Nash strategies based on a novel concept of bestachievable performance indices. This problem has potentialapplications in situations where a well-equipped unmanned vehicle isevading several weakly equipped pursuing vehicles. An illustrativeexample is solved, and several scenarios are presented.

Manuscript received August 26, 2013; revised June 12, 2014, August 10,2014; released for publication October 12, 2014.

DOI. No. 10.1109/TAES.2014.130569.

Refereeing of this contribution was handled by T. Robertazzi.

This work is partially supported by US National Science Foundationunder grants ECCS-1308928 and CCF-0956501. A preliminary versionof this paper was presented at the 2013 American Control Conference,June 2013, Washington, D.C., and is referenced in [16].

Authors’ addresses: W. Lin, Western Digital Corporation, 3355Michelson Drive, Suite 100, Irvine, CA 92612; Z. Qu, M. A. Simaan,University of Central Florida, Dept. of Electrical Engineering andComputer Science, Harris Engineering Center, Orlando, FL 32816,E-mail: ([email protected]).

0018-9251/15/$26.00 C© 2015 IEEE

I. INTRODUCTION

Differential game theory has attracted considerableattention in aerospace systems and optimal controlliterature since the original work of Isaacs [11] in 1965.Since then, there has been a variety of applications of thistheory, including pursuit-evasion games [1, 8, 10, 14–16,19, 21–24] as well as military and aerospace applications[2, 6, 7, 9, 17, 18]. Basically, the pursuit-evasion problemmodels the process where one or more pursuers tries tochase one or more evaders while the evaders try to escape.Solving a pursuit-evasion game essentially involvesdeveloping strategies for both the pursuers and evaderssuch that their prescribed performance indices areoptimized. Historically, pursuit-evasion games were firstinvestigated by Isaacs [11] in the 1960s. Saddle pointsolutions for a type of zero-sum single-pursuersingle-evader games were considered in [10].Nonzero-sum pursuit-evasion games were introduced andinvestigated as an example of the Nash strategies in [23]and as an example of the leader-follower Stackelbergstrategies in [21, 22]. Two pursuers and one evader gameswere studied in [8]. In recent years, multiplayer pursuitevasion differential games involving a number of pursuersand a single evader have received considerable attention inthe literature [1, 16, 18, 24]. These types of games presentnumerous interesting challenges when compared to theoriginal single-pursuer single-evader game. For thepursuers, the challenge revolves around designingcoordinated strategies to accomplish their common goal ofcapturing the evader. For the evader, who is clearlyoutnumbered by the pursuers, the challenge is tomaneuver around the pursuers so as to avoid beingcaptured. Conventional multiplayer pursuit-evasion gamesusually assume that every player has unlimitedobservations of all the other players at all times and yieldsfeedback strategies that are dependent on the initial states.While this unlimited observations assumption may hold insome cases, a realistic challenge in solving these problemsis when some of the players may not have access tounlimited observations of other players. This conditionmight occur due to factors such as limited sensingcapabilities, obstacles in the environment, or severeweather conditions. The conventional differential gamesapproach for solving problems with limited observationsyields feedback strategies that depend on the initial statesof all the players [12, 13], but the requirement of allpursuers having access to the full information of initialstates may not be met in certain applications.

This paper considers an N-pursuer single-evader gameover a finite time horizon in which only the evader isassumed to have unlimited sensing capability that allows itto observe all the pursuers at all times. Each pursuer, onthe other hand, has limited sensing capabilities that allowsit to observe the evader and/or other pursuers only if theyfall within its sensing range. A practical example of such asituation occurs when a well-equipped unmanned aerialvehicle (UAV) with a very wide range of sensing

IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS VOL. 51, NO. 2 APRIL 2015 1347

Page 2: Differential game theory has attracted considerable Nash …ece.ucf.edu/~qu/Journals/2015 IEEE TAES.pdf · 2015-07-16 · I. INTRODUCTION Differential game theory has attracted considerable

capability must evade several (possibly a large number of)weakly equipped pursuing UAVs. Because of the limitedsensing capabilities of the pursuers, these types ofproblems need to be treated using an approach differentfrom the probabilistic or artificial intelligence approachesthat currently exist in the literature [1, 7, 9, 14, 18, 24].This paper considers a differential game approach informulating and solving this problem. This approachbuilds upon previous results obtained in [15, 16] andovercomes the requirement of full observations of theinitial states by all pursuers in implementing theirfeedback strategies. In [15], a class of distributed gamestrategies for the pursuers and evader were shown to beinversely optimal with respect to certain performanceindices that are typically different from the originalindices. These results were then extended in [16] using anoptimization approach to derive optimal strategies withinthis class. In this paper, we further explore the controlstructure of feedback Nash strategies for multipursuersingle-evader differential games with unlimitedobservations and extend it to the case where only thepursuers have limited observation capabilities. In such acase, the novel concept of best achievable performanceindices that utilize the inverse optimality is proposed todetermine feedback Nash strategies that can beimplemented for the given control structure and areindependent of the initial states. This is a very importantproperty of our approach in overcoming a long-standingproblem of the dependence on the initial states of thefeedback Nash strategies that are derived using theconventional approach.

The remainder of the paper is organized as follows.Problem formulation is presented in Section II. Thefeedback Nash strategies with unlimited observations arepresented in Section III. Pursuers’ Nash strategies withlimited observations are analyzed and obtained in SectionIV. An illustrative example and simulation results withdifferent scenarios are shown in Section V. Concludingremarks are made in Section VI.

II. PROBLEM FORMULATION

In this paper, we consider a differential game problembetween a single evader with unlimited sensing capabilityand N pursuers with limited sensing capabilities. Thedisplacement vector zi between pursuer i and the evader eas shown in Fig. 1 is defined as

zi = xe − xi ∀i ∈ {1, · · · , N}, (1)

where xe ∈ Rn is the evader’s position vector and xi ∈ R

n

is pursuer i’s position vector.We assume that a collective objective of the pursuers is

to minimize the sum of the weighted distances betweenthe evader and themselves at a terminal time tf > 0 whileat the same time minimizing these distances and theircontrol efforts over the time interval [0, tf]. Hence, thegroup of pursuers tries to minimize the following

Fig. 1. Displacement vectors.

performance index:

Jp =N∑

j=1

fpj

2

∥∥zj (tf )∥∥2

+∫ tf

0

N∑j=1

(qpj

2

∥∥zj

∥∥2 + rj

2

∥∥uj

∥∥2)

dt, (2)

where uj is pursuer j’s velocity control input, ‖·‖ is theEuclidean norm, and scalarsfpj

, qpj, and rj are positive

weights for j = 1, . . . , N. On the other hand, we assumethat the evader’s objective is to maximize the sum of theweighted terminal distances between the pursuers anditself while at the same time maximizing these distancesand minimizing its control effort over the time interval[0, tf]. Hence, the evader will try to minimize theperformance index:

Je = −N∑

j=1

fej

2

∥∥zj (tf )∥∥2

+∫ tf

0

N∑j=1

(−qej

2

∥∥zj

∥∥2)

+ re

2‖ue‖2 dt, (3)

where ue is the evader’s velocity control input andscalarsfej

, qej, and re are positive weights for j = 1, . . . ,

N. To express the system dynamics more compactly, wedefine the vector z = [zT

1 ; · · · zTN ]T which, along with (1),

yields

z = Beue + Bpup. (4)

where matrix Be = 1N ⊗ In, 1N ∈ RN×1 is a vector with

all the entries equal to 1, In ∈ Rn×n is the identity matrix,

⊗ is the Kronecker product, up = [uT1 · · · uT

N ]T , and Bp

= −IN ⊗ In. The performance indices (2) and (3) can berewritten as

Jp = 1

2

∥∥z(tf )∥∥2

Fp+ 1

2

∫ tf

0(‖z‖2

Qp+ ‖up‖2

Rp) dt, (5a)

Je = 1

2

∥∥z(tf )∥∥2

Fe+ 1

2

∫ tf

0(‖z‖2

Qe+ ‖ue‖2

Re)dt, (5b)

1348 IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS VOL. 51, NO. 2 APRIL 2015

Page 3: Differential game theory has attracted considerable Nash …ece.ucf.edu/~qu/Journals/2015 IEEE TAES.pdf · 2015-07-16 · I. INTRODUCTION Differential game theory has attracted considerable

where ‖z‖2F = zT Fz and

Fp = diag{fp1, · · · , fpN} ⊗ In,

Fe = −diag{fe1, · · · , feN} ⊗ In

Qp = diag{qp1, · · · , qpN} ⊗ In,

Qe = −diag{qe1, · · · , qeN} ⊗ In

Rp = diag{r1, · · · , rN } ⊗ In,

Re = re ⊗ In,

and “diag” stands for “diagonal matrix.” Hence, given thesystem dynamics in (4) and performance indices in (5), adifferential nonzero-sum game between the group ofpursuers and the evader is formed. A Nash equilibriumsolution (u∗

p, u∗e ) for this game is defined by the following

inequalities [23]:

Jp(u∗p, u∗

e ) ≤ Jp(up, u∗e ), ∀up ∈ Up (6a)

Je(u∗p, u∗

e ) ≤ Je(u∗p, ue), ∀ue ∈ Ue, (6b)

where Up and Ue are the admissible strategy sets for thepursuers and evader, respectively.

The admissible strategy sets Up and Ue are closelyrelated to the information structures of the pursuers andevader. We assume that both the group of pursuers andevader have feedback information structures, which meansthat their admissible strategies will be functions of thedisplacement vector z at every instant of time t duringthe game. Furthermore, since the evader’s observationrange is assumed to be sufficiently wide, its admissiblestrategy can be implemented as a full feedback of z1, . . . ,zN. However, since each pursuer has only a limitedobservation range, its admissible strategy can only beimplemented as a partial feedback of z1, . . . , zN. We callthese strategies as “structured” and refer to them as us

pi

for i = 1, . . . , N.Mathematically, to accurately model the limited

observation capabilities of the pursuers, we assume thatpursuer i has a sensing range defined by a sensingradius ri > 0. If the Euclidean distance between pursueri and the evader is less than or equal to ri, i.e.,‖zi‖ = ‖xi − xe‖ ≤ ri , then pursuer i is able to observethe evader, otherwise, pursuer i cannot observe the evader.Consequently, we define a binary scalar function hi(t) torepresent pursuer i’s ability to observe the evader at time tas follows:

hi(t) ={

1 if ‖zi‖ ≤ ri

0 if ‖zi‖ > ri

. (7)

Similarly, if the Euclidean distance between pursuer i andpursuer j is less than or equal to ri, i.e., ‖zi – zj‖ =‖xi – xj‖ ≤ ri, then pursuer i is able to observe pursuer j,otherwise, pursuer i cannot observe pursuer j.Consequently, we can use the Laplacian matrix [5] (awidely used tool in cooperative control theory [3, 20]) to

describe the observations among the pursuers at everyinstant of time t. The Laplacian matrix is denoted byL(t) = [Lij (t)] ∈ R

N×N where

Lij (t) =

⎧⎪⎪⎪⎪⎪⎪⎨⎪⎪⎪⎪⎪⎪⎩

−1 if∥∥zi − zj

∥∥ ≤ ri for j = i

0 if∥∥zi − zj

∥∥ > ri forj = i

−N∑

l=1,l =i

Lil if j = i

(8)

for i, j = 1, . . . , N.It is important to note at this point that although the

evader has sufficiently wide observation range to observeall of the pursuers at every instant of time, we will assumethat it has no information on the individual pursuers’observation radii r1, . . . , rN or the capabilities of thepursuers to obverse each other. Therefore, we assume thatduring the game process, the evader has no knowledge ofthe existence of limited observations among the pursuersand the overall information topology. On the other hand,for the pursuers, we assume that all are aware of theirlimited observation capabilities as well as the evader’sunlimited observation capability.

III. FEEDBACK NASH STRATEGIES WITH UNLIMITEDOBSERVATIONS

For the pursuit-evasion problem described in (4) andperformance indices (5), the standard feedback Nashstrategies for all players can be determined as [23]:

u∗p = −R−1

p BTp Ppz (9a)

u∗e = −R−1

e BTe Pez, (9b)

where matrices Pp and Pe are solutions to the coupleddifferential Riccati equations

Pp + Qp − PpBpR−1p BT

p Pp − PpBeR−1e BT

e Pe

− PeBeR−1e BT

e Pp = 0 (10a)

Pe + Qe − PeBpR−1p BT

p Pp − PpBpR−1p BT

p Pe

− PeBeR−1e BT

e Pe = 0. (10b)

with boundary condition Pp(tf) = Fp and Pe(tf) = Fe.To implement the strategies in (9a) and (9b), both theevader and pursuers require unlimited observationsbecause solutions Pe and Pp are matrices of full nonzeroentries.

In this paper, the evader is assumed to have fullinformation about the pursuers, hence, it can employstrategy (9a). On the other hand, the pursuers are assumedto have only limited observations and hence cannotimplement the strategies in (9b). To see how strategy (9b)can be modified to match the limited observations, we

LIN ET AL.: NASH STRATEGIES FOR PURSUIT-EVASION DIFFERENTIAL GAMES 1349

Page 4: Differential game theory has attracted considerable Nash …ece.ucf.edu/~qu/Journals/2015 IEEE TAES.pdf · 2015-07-16 · I. INTRODUCTION Differential game theory has attracted considerable

note that (9b) can be rewritten as:

u∗pi = −R−1

i (−BTi )P T

piz

= R−1i BT

i

N∑j=1

Pij zj

= R−1i BT

i

⎡⎣ N∑

j=1

Pij

⎤⎦ zi + R−1

i BTi

N∑j=1

Pij (zj − zi)

(11)

for all i = 1, . . . , N, where Ppi is the ith nN by n blockcolumn in Pp, i.e., Pp = [Pp1 . . . PpN], and Pij is the jth n byn block matrix in Ppi. The full-information pursuerstrategy (11) consists of two parts. The first part representsa feedback of zi (which is the displacement betweenpursuer i and the evader), and the second part is the sum offeedbacks (z1 − zi) up to (zN − zi) (which are thedisplacement vectors between pursuer i and the rest ofpursuers). Essentially, expression (11) reveals that pursueri will chase the evader not only along its own line of sighttoward the evader but also it takes into account all the linesof sight from all the pursues toward the evader. The reasonfor the pursuers to follow such a strategy is because theevader considers in its strategy the line-of-sightinformation about all the pursuers.

Inspired by expression of (11), a pursuer with limitedobservations can only choose a strategy based on theavailable information but in the form of (11). If, e.g.,pursuer i can observe the evader (i.e., ‖zi‖ = ‖xi − xe‖≤ ri), the first part in (11) can be implemented and henceshould be included. Similarly, if pursuer i can observesome of the other pursuers, the corresponding terms in thesecond part of (11) are implementable and hence shouldbe included for all j satisfying ‖zi − zj‖ = ‖xi − xj‖ ≤ ri.Such a modification represents the best possible effort bythe pursuers from the information perspective, and, asmentioned earlier, the resulting strategy becomes“structured” (with respect to the available information)and will be discussed in details in the next section. Clearly,this structured strategy no longer forms a Nashequilibrium with evader’s strategy (9b). Therefore, adifferent concept or approach is needed to analyze anddesign pursuers’ strategy under limited observations sothat they do form a Nash equilibrium with the evader’sstrategy (9b), which is the subject of the following section.

IV. FEEDBACK NASH STRATEGIESFOR THE PURSUERS

Motivated by the discussions in the previous section,the following structured feedback strategies are proposedfor the pursuers:

uspi

�= hi(t)Kie(t)zi + Kip(t)N∑

j=1

Lij (t)[zj − zi],

which, using the zero-row-sum property of the Laplacianmatrix L, can be reduced to

uspi = hi(t)Kie(t)zi︸ ︷︷ ︸

a

+ Kip(t)N∑

j=1

Lij (t)zj

︸ ︷︷ ︸b

∀i = 1, · · · , N (12)

where the superscript s in uspi means that the strategy is

structured, scalar hi(t) is defined in (7), scalar Lij(t) isdefined in (8), and matrices Kie ∈ R

n×n and Kip ∈ Rn×n

are feedback gains to be determined. Term a in (12)represents the component for pursuer i to chase the evaderdirectly if it observes the evader (i.e., when hi(t) = 1).Term b in (12) is also known as a cooperative protocol,and it ensures that the pursuer also uses all the availableinformation about the rest of pursuers that it can observein addition to the evader. When pursuer i is unable toobserve the evader (i.e., when hi(t) = 0), it has no choicebut to merely collaborate with the nearby pursuers.Strategy (12) can be expressed in a more compactform as

uspi = Kie[0 · · · 0 (hiIn) 0 · · · 0]z

+Kip[(Li1In) · · · (LiNIn)]z

�= MiCiz, (13)

where Mi = [Kie Kip] ∈ Rn×2n and

Ci =[

0 · · · 0 (hiIn) 0 · · · 0

(Li1In) · · · · · · (LiNIn)

],

where hi and Lij are defined in (7) and (8), respectively.Therefore, the pursuers’ control vector us

p =[(us

p1)T · · · (uspN )T ]T can be written as us

p = Mpz,

where

Mp = [(M1C1)T · · · (MNCN )T ]T . (14)

The problem now reduces to finding a set of matricesM∗

1 , · · · , M∗N that are independent of the initial states such

that feedback gain M∗p = [(M∗

1 C1)T · · · (M∗NCN )T ]T and

the resulting pursuers’ strategy

us∗p = M∗

pz (15)

can still form a Nash equilibrium with the evader’sstrategy u∗

e in (9b). Clearly, since the structure of usp is in

the output feedback type with N output feedback channelsC1z, . . . , CNz, one possible approach to finding matricesM∗

1 , · · · , M∗N that ensure that (9b) and (15) form a Nash

equilibrium is to adopt the widely used optimal outputfeedback or structured control design approach in [12, 13],which is also applicable to differential games. The basicidea of this approach when applied to the optimal controlcase is to parameterize the control inputs with givenstructure and optimize the feedback gain directly withrespect to the given performance index. However, for agame problem, because every player’s control input has

1350 IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS VOL. 51, NO. 2 APRIL 2015

Page 5: Differential game theory has attracted considerable Nash …ece.ucf.edu/~qu/Journals/2015 IEEE TAES.pdf · 2015-07-16 · I. INTRODUCTION Differential game theory has attracted considerable

influence on other players’ performance indices values, allthe players’ structured controls and the correspondingfeedback gains need to be simultaneously parameterizedand optimized with respect to the given set of performanceindices to obtain the Nash equilibrium. This cannot beimplemented in our game setup because, as mentionedearlier, the evader will be implementing the Nash strategy(9b) and hence it is not possible to simultaneouslyparameterize and optimize it along with the strategyof the pursuers to form a Nash equilibrium. In whatfollows, we propose a novel Nash strategy designapproach based on a concept of “best achievableperformance indices.”

Because the evader’s Nash strategy in (9b) is fixed, ourapproach is to find a class of performance indicesparameterized by the feedback gain Mp with respect towhich the evader’s strategy in (9b) and pursuers’ strategyin (15) form a Nash equilibrium. Within this class, we thenfind one set of performance indices called the bestachievable performance indices such that they are as closeas possible to the original indices given in (5b). Thepursuers’ Nash strategy is then chosen to be the one with amatrix Mp = M∗

p corresponding to the best achievableperformance indices. To achieve this goal, we first need tofind the class of performance indices corresponding to thestrategies (9b) and (15). Therefore, we propose thefollowing theorem.

THEOREM 1 For the pursuit-evasion game described bysystem dynamics (4) and performance indices (5), for anarbitrary set of matrices M1, . . . , MN, the strategies u∗

e in(9b) and us

p = Mpz form a Nash equilibrium:

J sp(us

p, u∗e ) ≤ J s

p(up, u∗e ), ∀up ∈ Up (16a)

J se (us

p, u∗e ) ≤ J s

e (usp, ue), ∀ue ∈ Ue, (16b)

with respect to performance indices

J sp = 1

2

∥∥z(tf )∥∥2

Fp+ 1

2

∫ tf

0(‖z‖2

Qsp− uT

pSz − zT ST up

+‖up‖2Rp

)dt, (17a)

J se = 1

2

∥∥z(tf )∥∥2

Fe+ 1

2

∫ tf

0(‖z‖2

Qse+ ‖ue‖2

Re)dt, (17b)

where

Qsp = MT

p RpMp − PpBpR−1p BT

p Pp + Qp (18a)

S = BTp Pp + RpMp, (18b)

Qse = −PeBpR−1

p (RpMp + BTp Pp)

−(RpMp + BTp Pp)T R−1

p BTp Pe + Qe (18c)

matrices Pp and Pe are the solutions to (10) and matrix Mp

is as given in (14).

PROOF Consider Lyapunov functions

Vp = 1

2zT Ppz and Ve = 1

2zT Pez. (19)

Differentiating Vp in (19) with respect to t and integratingit from 0 to tf yield

Vp(tf ) − Vp(0)

= 1

2

∫ tf

0[− ‖z‖2

Qsp− ‖u‖2

Rp+ uT

pSz

+ zT ST up + 2zT PpBe(ue + R−1e BT

e Pez)

+ ∥∥up − Mpz∥∥2

Rp]dt.

Hence,

J sp = Vp(0) +

∫ tf

0

1

2

∥∥up − Mpz∥∥2

Rp

+ zT PsBe(ue + R−1e BT

e Pez)dt, (20)

where J sp is defined in (17a). Similarly, we can show that

J se = Ve(0) +

∫ tf

0

1

2

∥∥ue + R−1e BT

e Pez∥∥2

Re

− zT PeBp(up − Mpz)dt, (21)

where J se is defined in (17b). Because Rp and Re are

positive definite, it is obvious from (20) and (21) thatgiven performance indices defined in (17), the inequalitiesin (6) holds for up = Mpz and u∗

e = −R−1e BT

e Pe. Hence,strategies (9b) and (15) form a Nash equilibrium withrespect to performance indices in (17). Clearly, if Mp canbe written asMp = −R−1

p BTp Pp, then J s

p in (17a) becomesidentical to (5b) and J s

e in (17b) becomes identical to (5b).The evader’s strategy (9b) and pursuers’ strategy (15)

form a Nash equilibrium with respect to (17) for anymatrix Mp. The next step is to find a matrix M∗

p such thatthe corresponding pair of performance indices in (17) withMp = M∗

p being closest to the original performanceindices Jp and Je in (5). Toward that end, we propose thefollowing concept of best achievable performance indices.

DEFINITION 1 Given the class of performance indices in(17), if the matrix norms of (Qs

p − Qp), S, and (Qse − Qe)

are simultaneously minimized by a feedback matrix M∗p(t)

for all t, then the set of performance indices J s∗p and J s∗

e

corresponding to this M∗p(t) are called the best achievable

performance indices.

To find the optimal matrix M∗p(t) corresponding to the

best achievable performance indices, we need to solve amultiobjective optimization problem of minimizing∥∥Qs

p − Qp

∥∥ , ‖S‖, and∥∥Qs

e − Qe

∥∥ simultaneously. Oneway to accomplish this is to minimize a convexcombination of these three terms over [0, tf]. Let

φ(Mp(t)) =∫ tf

0H (t)dt, (22)

LIN ET AL.: NASH STRATEGIES FOR PURSUIT-EVASION DIFFERENTIAL GAMES 1351

Page 6: Differential game theory has attracted considerable Nash …ece.ucf.edu/~qu/Journals/2015 IEEE TAES.pdf · 2015-07-16 · I. INTRODUCTION Differential game theory has attracted considerable

where

H (t) = α1

∥∥Qsp − Qp

∥∥2

f+ α2 ‖S‖2

f + α3

∥∥Qse − Qe

∥∥2f

= α1Tr[(Qsp − Qp)2] + α2Tr(ST S)

+ α3Tr[(Qse − Qe)2] (23)

where ‖·‖f is the Frobenius norm with the property‖S‖2

f = Tr(ST S), 0 < αj < 1 for j = 1, 2, 3, and∑3j=1 αj = 1. The minimization problem reduces to

finding a matrix M∗p(t) such that

φ(M∗p(t)) ≤ φ(Mp(t)) ∀Mp(t). (24)

Because Mp is as defined in (14), the minimization in (24)is actually done with respect to M1(t), . . . , MN(t). Thisminimization problem is generally quite difficult to solveanalytically. A possible numerical approach is to usegradient-based iterative algorithms [4] to find matricesM∗

1 (t), · · · , M∗N (t). These algorithms will require an

expression for the gradient of H(t) with respect to Mi(t) fori = 1, . . . , N. This expression can be determined from (23)as follows:

∇MiH = (dT

i ⊗ In)[4α1RpMp(Qsp − Qp)

+ 2α2RpS − 4α3BTp Pe(Qs

e − Qe)]CTi (25)

for all i = 1, . . . , N, where Mp is defined in (14) anddi ∈ R

N is a vector with the ith entry equal to 1 and theother entries equal to 0. Note that using this approach, thederived optimal matrices M∗

1 (t), · · · , M∗N (t) that define the

Nash feedback strategies of the pursuers (15) will beindependent of the initial states. As mentioned earlier inthe introduction section, this is an important property ofour approach because any dependence of these matriceson the initial states would render their use as feedbackgains impractical. Also note that by varying thecoefficients α1, α2, and α3 in (23), a noninferior set of thesolutions can be generated. An appropriate choice of thesecoefficients can be made to place a desired emphasis onthe importance of minimizing each of the three terms in(23) as compared to the other two.

It is important to note at this stage that the algorithmcan be implemented in a distributed manner. This is clearsince it follows from (13) that gradient (25) does notrequire any state information but only the knowledge ofL(t), which, according to (8), has a finite number of values.Optimization algorithm (25) can be performed offline withrespect to possible values of the Laplacian matrix L(t).The proposed method can then be implemented in asetting of distributed communication and coordinationwith low-rate changes of L(t) being relayed to all thepursuers. This can be accomplished because, as the gamebegins and proceeds, distributed information of relativedistance [zi(t) − zj(t)] becomes available, the row entriesof L(t) reflect evader/other-pursuers moving into/out of thesensor range of individual pursuers, and thesemechanical-motion-driven changes in L(t) are muchslower than the sampling rate of locally measuring therelative distances.

Fig. 2. Initial positions of three pursuers and single evader.

V. ILLUSTRATIVE EXAMPLE

In order to illustrate the Nash strategy obtained usingthe best achievable performance indices approach, let usconsider a three-pursuer single-evader differential gametaking place in a planar environment and defined over atime interval [0, 3]. Suppose that xi = [xT

i1 xTi2]T ∈ R

2

represents player i’s position and upi = [uTpi1 uT

pi2]T ∈ R2

represents player i’s velocity control. Hence, in equation(4), we have

Be =

⎡⎢⎣

I2

I2

I2

⎤⎥⎦ and Bp = −

⎡⎢⎣

I2 0 0

0 I2 0

0 0 I2

⎤⎥⎦ .

The performance indices are given by (5) with tf = 3,Fp = Qp = qI6, Rp = I6, Fe = Qe = I6, and Re = I2, whereq is a positive scalar that can be varied to analyze differentscenarios. As shown in Fig. 2, we assume that thepursuers’ initial positions are x1(0) = [–3 0]T, x2(0) =[3 0]T, x3(0) = [5 1]T, the evader’s initial position is xe(0)= [0 1]T, and the pursuers’ sensing radii are the same andequal to 4. Clearly, at t = 0, pursuer 1 can only observe theevader, pursuer 2 can observe the evader and pursers 3,and pursuer 3 can only observe pursuer 2. Further, weassume that the evader is captured if the minimumdistance between the pursuers and evader is less than acapture radius σ = 0.1, which is shown as a light blackcircle centered at the evader in Fig. 2. In this example, wewill consider two different scenarios:

1. Scenario 1: q = 1. Pursuers put equal emphasis onminimizing their distances to the evader and onminimizing their control effort.

2. Scenario 2: q = 5. Pursuers put more emphasis onminimizing their distances to the evader than onminimizing their control effort.

A. Evader’s Strategy

The evader solves the coupled differential Riccatiequations (10) and implements the corresponding Nashstrategy. This yields solutions Pp and Pe in the following

1352 IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS VOL. 51, NO. 2 APRIL 2015

Page 7: Differential game theory has attracted considerable Nash …ece.ucf.edu/~qu/Journals/2015 IEEE TAES.pdf · 2015-07-16 · I. INTRODUCTION Differential game theory has attracted considerable

Fig. 3. Motion trajectories of pursuers and evader for Scenario 1 (seeanimation in Supplemental Video S1).

form:

Pp =

⎡⎢⎣

Pp1 Pp2 Pp2

Pp2 Pp1 Pp2

Pp2 Pp2 Pp1

⎤⎥⎦ ⊗ I2,

Pe =

⎡⎢⎣

Pe1 Pe2 Pe2

Pe2 Pe1 Pe2

Pe2 Pe2 Pe1

⎤⎥⎦ ⊗ I2.

Hence, the evader’s feedback Nash strategies (9b) in termsof z1, z2, and z3 can be expressed as

u∗e = (Pe1 + 2Pe2)(z1 + z2 + z3).

B. Pursuers’ Strategies

To derive the pursuers’ strategy, we assume that forimplementation purpose, the pursuers perform sensingonly at discrete instants of time t0, t1, . . . , t299, where t0 =0, t300 = tf = 3. Since tj − tj–1 = 0.01 is quite small for all j= 1, . . . , 300, we assume that the observations among theplayers can be regarded to be constant within such a smalltime interval (tj − tj–1). We also assume that the pursuerswill carry out the proposed best achievable performanceindices approach with the following (arbitrary) choice ofcoefficients in (23): α1 = 1/4, α2 = 1/2, and α3 = 1/4.

1) Scenario 1: In this scenario, the motion trajectoriesof the pursuers and evader over time are shown in Fig. 3.The distances between the pursuers and evader over timeare shown in Fig. 4 where the capture radius σ = 0.1 isshown in terms of a dashed black horizontal line. Clearly,in this scenario, none of the pursuer is able to capture theevader when the final time tf = 3 is reached. Furthermore,the change in the observations among the players isreflected in the changes of hi(t) in (7) and Laplacianmatrix with Lij(t) defined in (8). In this scenario, the valuesof h1(t), h2(t), h3(t) and the value of the Laplacian matrix

Fig. 4. Distances between pursuers and evader for Scenario 1.

L(t) have changed as follows:

h1(t) ={

1 0 ≤ t ≤ 2.910 2.91 < t ≤ 3

h2(t) ={

1 0 ≤ t ≤ 2.910 2.91 < t ≤ 3

h3(t) = 0 0 < t ≤ 3

and

L(t) =

⎧⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎨⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎩

⎡⎣ 0 0 0

0 1 −10 −1 1

⎤⎦ 0 ≤ t ≤ 0.16

⎡⎣ 1 −1 0

−1 2 −10 −1 1

⎤⎦ 0.16 < t ≤ 0.52

⎡⎣ 1 −1 0

−1 1 00 0 0

⎤⎦ 0.52 < t ≤ 3

.

The change in hi(t) means that pursuers 1 and 2 loseobservation of the evader after t = 2.91 while pursuer 3was never able to observe the evader for the entire game.The change in the Laplacian matrix essentially means thatonly pursuers 2 and 3 can observe each other for t ∈[0, 0.16], pursuer 2 can observe pursuers 1 and 3 for t ∈(0.16, 0.52] while pursuers 1 and 3 cannot observe eachother at this time interval, and only pursuers 1 and 2 canobserve each other for t ∈ (0.52, 3].

2) Scenario 2: In this scenario, the motion trajectoriesof the pursuers and evader are shown in Fig. 5. Thedistances between the pursuers and evader are shown inFig. 6 where the capture radius σ = 0.1 is shown in termsof a dashed black horizontal line. Clearly, in this scenario,pursuer 2 is the first one to capture the evader at t = 1.2.During the entire game, the values of h1(t), h2(t), h3(t) andthe value of the Laplacian matrix L(t) have changed as

LIN ET AL.: NASH STRATEGIES FOR PURSUIT-EVASION DIFFERENTIAL GAMES 1353

Page 8: Differential game theory has attracted considerable Nash …ece.ucf.edu/~qu/Journals/2015 IEEE TAES.pdf · 2015-07-16 · I. INTRODUCTION Differential game theory has attracted considerable

Fig. 5. Motion trajectories of pursuers and evader for Scenario 2 (Seeanimation in Supplemental Video S2).

Fig. 6. Distances between pursuers and evader for Scenario 2.

follows:

h1(t) = 10 < t ≤ 3

h2(t) = 10 < t ≤ 3

h3(t) ={

0 0 ≤ t ≤ 0.29

1 0.29 < t ≤ 3

and

L(t) =

⎧⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎨⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎩

⎡⎢⎣

0 0 0

0 1 −1

0 −1 1

⎤⎥⎦ 0 ≤ t ≤ 0.11

⎡⎢⎣

1 −1 0

−1 2 −1

0 −1 1

⎤⎥⎦ 0.11 < t ≤ 0.41

⎡⎢⎣

2 −1 −1

−1 2 −1

−1 −1 2

⎤⎥⎦ 0.41 < t ≤ 3

Fig. 7. Trajectories of pursuers and evader when q = qc = 1.38 (Seeanimation in Supplemental Video S3).

Fig. 8. Distances between pursuers and evader when q = qc = 1.38.

The change of hi(t) means that after t = 0.29, all thepursuers are able to observe the evader. The change of theLaplacian matrix means that only pursuers 2 and 3 canobserve each other for t ∈ [0, 0.11], pursuer 2 can observepursuers 1 and 3 for t ∈ (0.11, 0.41] while pursuers 1 and 3cannot observe each other at this time interval, and all thepursuers are able to observe each other for t ∈ (0.41, 3].

C. Remark

It would be interesting to determine a critical value qc

of q that separates the escape and capture regions of theevader. That is, if q < qc, the evader escapes and if q ≥ qc,the evader is captured at a time instant t ∈ [0, 3]. For thisgame, the critical value of q has been determined to be qc

= 1.38. Fig. 7 shows the motion trajectories of thepursuers and evader when q = qc = 1.38. Fig. 8 shows thedistances between the pursuers and evader when q = qc =1.38, where the capture time occurs at t = 1.55.

VI. CONCLUSION

In this paper, the problem of deriving feedback Nashstrategies that are independent of the initial states for an

1354 IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS VOL. 51, NO. 2 APRIL 2015

Page 9: Differential game theory has attracted considerable Nash …ece.ucf.edu/~qu/Journals/2015 IEEE TAES.pdf · 2015-07-16 · I. INTRODUCTION Differential game theory has attracted considerable

N-pursuer single-evader differential game with quadraticperformance indices over a finite time horizon isconsidered. The evader is assumed to have unlimitedobservations while each pursuer has limited observationsbased on its own sensing radius. Because the evader isable to observe all the pursuers at all times, it implementsthe standard feedback Nash strategy derived from thewell-known coupled differential Riccati equationsapproach. However, for the pursuers whose observationsare limited, we propose a novel approach for them toimplement a collective strategy based on the concept ofbest achievable performance indices. This approach yieldsinitial-states-independent strategies for the pursuers thatsatisfy a Nash equilibrium with the evader’s strategy withrespect to the best achievable performance indices. Anillustrative example involving three pursuers and oneevader is solved and simulation results corresponding todifferent scenarios are presented.

REFERENCES

[1] Bopardikar, S. D., Bullo, F., and Hespanha, J. P.On discrete-time pursuit-evasion games with sensinglimitations.IEEE Transactions on Robotics, 24, 6 (2008),1429–1439.

[2] Bugnon, F. J., and Mohler, R. R.A controllability study for a team linear quadratic game.IEEE Transactions on Aerospace and Electronic Systems, 23,5 (1987), 686–690.

[3] Cao, Y., Yu, W., Ren, W., and Chen, G.An overview of recent progress in the study of distributedmulti-agent coordination.IEEE Transactions on Industrial Informatics, 9, 1 (2013),427–438.

[4] Chong, E. K., and Zak, S. H.An Introduction to Optimization. New York: John Wiley &Sons, 2013.

[5] Chung, F. R.Spectral Graph Theory, Vol. 92. Providence, RI: AMSBookstore, 1997.

[6] Cruz, J., Simaan, M. A., Gacic, A., and Liu, Y.Moving horizon Nash strategies for a military air operation.IEEE Transactions on Aerospace and Electronic Systems, 38,3 (2002), 989–999.

[7] Cruz, J. B., Simaan, M. A., Gacic, A., Jiang, H., Letelliier, B., Li,M., and Liu, Y.Game-theoretic modeling and control of a military airoperation.IEEE Transactions on Aerospace and Electronic Systems, 37,4 (2001), 1393–1405.

[8] Foley, M., and Schmitendorf, W.A class of differential games with two pursuers versus oneevader.IEEE Transactions on Automatic Control, 19, 3 (1974),239–243.

[9] Galati, D. G., and Simaan, M. A.Effectiveness of the Nash strategies in competitive multi-teamtarget assignment problems.

IEEE Transactions on Aerospace and Electronic Systems, 43,1 (2007), 126–134.

[10] Ho, Y., Bryson, A., and Baron, S.Differential games and optimal pursuit-evasion strategies.IEEE Transactions on Automatic Control, AC-10, 4 (1965),385–389.

[11] Isaacs, R.Differential Games. New York: John Wiley and Sons, 1965.

[12] Levine, W., Johnson, T., and Athans, M.Optimal limited state variable feedback controllers for linearsystems.IEEE Transactions on Automatic Control, 16, 6 (1971),785–793.

[13] Lewis, F., Vrabie, D., and Syrmos, V.Optimal Control, 3rd ed. New York: Wiley Interscience,2012.

[14] Li, D., and Cruz, J.Defending an asset: A linear quadratic game approach.IEEE Transactions on Aerospace and Electronic Systems, 47,2 (Apr. 2011), 1026–1044.

[15] Lin, W., Qu, Z., and Simaan, M. A.A design of distributed nonzero-sum Nash strategies.In 49th IEEE Conference on Decision and Control, Atlanta,GA, 2010, 6305–6310.

[16] Lin, W., Qu, Z., and Simaan, M. A.Multi-pursuer single-evader differential games with limitedobservations.In Proceedings of the American Control Conference,Washington, D.C., 2013, 2711–2716.

[17] Liu, Y., Cruz, J. B., and Schumacher, C. J.Pop-up threat models for persistent area denial.IEEE Transactions on Aerospace and Electronic Systems, 43,2 (2007), 509–521.

[18] Liu, Y., Qi, N., and Tang, Z.Linear quadratic differential game strategies with two-pursuitversus single-evader.Chinese Journal of Aeronautics, 25, 6 (2012), 896–905.

[19] Pachter, M.Simple-motion pursuit-evasion in the half plane.Computers & Mathematics with Applications, 13, 1 (1987),69–82.

[20] Qu, Z.Cooperative Control of Dynamical Systems: Applications toAutonomous Vehicles. London: Springer Verlag, 2009.

[21] Simaan, M., and Cruz, J.On the Stackelberg solution in the nonzero-sum games.Journal of Optimization Theory and Applications, 11, 5(1973), 533–555.

[22] Simaan, M., and Cruz, Jr., J. B.Additional aspects of the Stackelberg strategy in nonzero-sumgames.Journal of Optimization Theory and Applications, 11, 6(1973), 613–626.

[23] Starr, A., and Ho, Y.Nonzero-sum differential games.Journal of Optimization Theory and Applications, 3, 3 (1969),184–206.

[24] Wei, M., Chen, G., Cruz, J. B., Haynes, L. S., Pham, K., andBlasch, E.Multi-pursuer multi-evader pursuit-evasion games withjamming confrontation.AIAA Journal of Aerospace Computing, Information, andCommunication, 4, 3 (2007), 693–706.

LIN ET AL.: NASH STRATEGIES FOR PURSUIT-EVASION DIFFERENTIAL GAMES 1355

Page 10: Differential game theory has attracted considerable Nash …ece.ucf.edu/~qu/Journals/2015 IEEE TAES.pdf · 2015-07-16 · I. INTRODUCTION Differential game theory has attracted considerable

Wei Lin (S’10–M’14) received his B.Eng. degree in automation from TongjiUniversity, Shanghai, China, in 2007, M.S. degree in electrical engineering fromBoston University, Boston, MA, in 2009, and Ph.D. degree in electrical engineeringfrom University of Central Florida, Orlando, FL, in 2013 in the area of control systems.He is currently working at Western Digital Corp. in Irvine, CA.

Zhihua Qu (M’90–SM’93–F’09) is the SAIC endowed professor in College ofEngineering and Computer Science, professor and chair of Electrical and ComputerEngineering at the University of Central Florida, and the director of the FEEDERCenter (one of the Department of Energy-funded centers on distributed technologiesand smart grid). His recent research focuses on networked systems and cooperativecontrol, distributed optimization, and their applications to energy/power systems andautonomous vehicles. He is a Fellow of the IEEE and AAAS. He served on Board ofGovernors of IEEE Control Systems Society and as an associate editor for IEEETransactions on Automatic Control and International Journal of Robotics andAutomation (since 1995). Currently, he serves as an associate editor for Automatica andIEEE ACCESS.

Marwan A. Simaan (S’69C–M’72C–SM’79C–F’88C–LF’12) received his Ph.D.degree in electrical engineering from the University of Illinois at Urbana-Champaign.He joined the University of Central Florida in 2008 as the Florida 21st Century Chairand distinguished professor of Electrical Engineering and Computer Science. In 2009,he served as interim dean and then dean of the College of Engineering and ComputerScience until 2012. Prior to joining the University of Central Florida, he was the Bell ofPA/Bell Atlantic Professor and former chairman of the Department of ElectricalEngineering at the University of Pittsburgh. His research covers a broad spectrum oftopics in game theory, control, optimization, and signal processing. He has publishedfive books, more than 125 refereed journal papers and book chapters, and more than210 papers in conference proceedings. Dr. Simaan is a registered Professional Engineerin Pennsylvania. He is a member of US National Academy of Engineering (NAE) and aFellow of the American Society for Engineering Education (ASEE), the AmericanAssociation for the Advancement of Science (AAAS), and the American Institute forMedical and Biological Engineering (AIMBE). He currently serves or has served onnumerous IEEE boards and committees, including the IEEE Fellow Committee, theIEEE Education Medal Committee, the IEEE Access Editorial Board, the IEEEProceedings Editorial Board, and others. He was the recipient of two IEEE best paperawards from the Geoscience and Remote Sensing Society (1985) and the IndustryApplications Society (1999). In 1995, he was named a Distinguished Alumnus of theDepartment of Electrical and Computer Engineering, University of Illinois. In 2007, hereceived the IEEE William E. Sayle II Award for Achievement in Education. In 2008 hereceived the University of Illinois, College of Engineering Award for DistinguishedService in Engineering.

1356 IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS VOL. 51, NO. 2 APRIL 2015