
Risk-Sensitive Mean-Field Games

Hamidou Tembine, Quanyan Zhu, Tamer Başar

Abstract—In this paper, we study a class of risk-sensitive mean-field stochastic differential games. We show that, under appropriate regularity conditions, the mean-field value of the stochastic differential game with exponentiated integral cost functional coincides with the value function satisfying a Hamilton-Jacobi-Bellman (HJB) equation with an additional quadratic term. We provide an explicit solution of the mean-field best response when the instantaneous cost functions are log-quadratic and the state dynamics are affine in the control. An equivalent mean-field risk-neutral problem is formulated, and the corresponding mean-field equilibria are characterized in terms of backward-forward macroscopic McKean-Vlasov equations, Fokker-Planck-Kolmogorov equations, and HJB equations. We provide numerical examples of the mean-field behavior to illustrate both linear and McKean-Vlasov dynamics.

I. INTRODUCTION

Most formulations of mean-field (MF) models, such as anonymous sequential population games [22], [8], MF stochastic control [20], [18], [39], MF optimization, MF teams [36], MF stochastic games [37], [1], [36], [34], MF stochastic difference games [16], and MF stochastic differential games [26], [15], [35], have been of risk-neutral type, where the cost (or payoff, utility) functions to be minimized (or to be maximized) are the expected values of stage-additive loss functions.

Not all behavior, however, can be captured by risk-neutral cost functions. One way of capturing risk-seeking or risk-averse behavior is by exponentiating loss functions before taking the expectation (see [3], [21] and the references therein).

The particular risk-sensitive mean-field stochastic differential game that we consider in this paper involves an exponential term in the stochastic long-term cost function. This approach was first taken by Jacobson in [21], when considering the risk-sensitive Linear-Quadratic-Gaussian (LQG) problem with state feedback. Jacobson demonstrated a link between the exponential cost criterion and deterministic linear-quadratic differential games. He showed that the risk-sensitive approach provides a method for varying the robustness of the controller, and noted that in the case of no risk, or the risk-neutral case, the well-known LQR solution would result (see, for follow-up work on risk-sensitive stochastic control problems with noisy state measurements, [38], [7], [30]).

We are grateful to many seminar and conference participants, such as those in the Workshop on Mean Field Games (Rome, Italy, May 2011) and the IFAC World Congress (Milan, Italy, August-September 2011), for their valuable comments and suggestions on preliminary versions of this work. An earlier version of this work appeared in the Proceedings of the 18th IFAC World Congress (Milan, Italy; August 29 - September 2, 2011). Research of the second and third authors was supported in part by the U.S. Air Force Office of Scientific Research (AFOSR) under the MURI Grant FA9550-10-1-0573. The first author acknowledges the financial support from the CNRS mean-field game project "MEAN-MACCS".

H. Tembine is with Ecole Supérieure d'Electricité (SUPELEC), France. E-mail: [email protected]

Q. Zhu and T. Başar are with the Coordinated Science Laboratory and the Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, USA. {zhu31, basar1}@illinois.edu

We examine here the risk-sensitive stochastic differential game in a regime of large population of players. We first present a mean-field stochastic differential game model where the players are coupled not only via their risk-sensitive cost functionals but also via their states. The main coupling term is the mean-field process, also called the occupancy process or population profile process. Each player reacts to the mean field or a subset of the mean field generated by the states of some groups of players, and at the same time the mean field evolves according to a controlled Kolmogorov forward equation.

Our contribution can be summarized as follows. Using a particular structure of state dynamics, we derive the mean-field limit of the individual state dynamics, leading to a non-linear controlled macroscopic McKean-Vlasov equation [24]. Combining this with a limiting risk-sensitive cost functional, we arrive at a framework where the mean-field response can be characterized, and establish its compatibility with the density distribution using the controlled Fokker-Planck-Kolmogorov forward equation. The mean-field equilibria are then characterized by coupled backward-forward equations. In general, a backward-forward system may not have a solution (a simple example is provided in Section III-D). But in the case of the affine-exponentiated-Gaussian mean-field game, we provide an explicit solution of the corresponding Hamilton-Jacobi-Bellman (HJB) equation. We further formulate an equivalent risk-neutral mean-field problem (in terms of the value function), and characterize the solution of the mean-field game. Finally, we provide a sufficiency condition for having at most one smooth solution to the risk-sensitive mean-field system in the local sense.

The rest of the paper is organized as follows. In Section II, we first describe the general model adopted in the paper (Subsection II-A), then provide an overview of the mean-field convergence result (Subsection II-B), and finally introduce the cost (Subsection II-C). In Section III, we introduce the risk-sensitive mean-field stochastic differential game formulation and its equivalences. In Section IV, we analyze a special class of risk-sensitive mean-field games where the state dynamics are linear and independent of the mean field. In Section V, we provide a numerical example, and Section VI concludes the paper. Two appendices provide background on the important property of indistinguishability, and proofs of some of the main results in the main body of the paper. We summarize some of the notation used in the paper in Table I.

TABLE I
SUMMARY OF NOTATIONS

Symbol      Meaning
f           drift function (finite dimensional)
σ           diffusion function (finite dimensional)
x^n_j(t)    state of Player j in a population of size n
x̄_j(t)      solution of the macroscopic McKean-Vlasov equation
x_j(t)      limit of the state process x^n_j(t)
U_j         space of feasible control actions of Player j
γ̃_j         state-feedback strategy of Player j
γ_j         individual state-feedback strategy of Player j
Γ̃_j         set of admissible state-feedback strategies of Player j
Γ_j         set of admissible individual state-feedback strategies of Player j
u_j         control action of Player j under a generic control strategy
c           instantaneous cost function
g           terminal cost function
δ           risk-sensitivity index
B_j         standard Brownian motion process for Player j's dynamics
E           expectation operator
L           risk-sensitive cost functional
∂_x         partial derivative with respect to x (gradient)
∂²_xx       second partial derivative (Hessian operator) with respect to x
x′          transpose of x
m^n_t       empirical measure of the states of the players
m_t         limit of m^n_t as n → ∞
m^n         limit of m^n_t as t → ∞
tr(M)       trace of a square matrix M, i.e., tr(M) := Σ_i M_ii
A ≻ B       A − B is positive definite, where A, B are square symmetric matrices of the same dimension

II. THE PROBLEM SETTING

A. General description of the game dynamics

We consider a class of n-person stochastic differential games, where Player j's individual state, x^n_j, evolves according to the Itô stochastic differential equation (S) as follows, for t ≥ 0:

\[
dx_j^n(t) = \frac{1}{n}\sum_{i=1}^n f_{ji}\big(t, x_j^n(t), u_j^n(t), x_i^n(t)\big)\, dt + \frac{\sqrt{\varepsilon}}{n}\sum_{i=1}^n \sigma_{ji}\big(t, x_j^n(t), u_j^n(t), x_i^n(t)\big)\, dB_j(t),
\]
\[
x_j^n(0) = x_{j,0} \in \mathcal{X} \subseteq \mathbb{R}^k, \quad k \ge 1, \quad j \in \{1,\dots,n\}, \tag{S}
\]

where x^n_j(t) is the k-dimensional state of Player j; u^n_j(t) ∈ U_j is the control of Player j at time t, with U_j being a subset of the p_j-dimensional Euclidean space R^{p_j}; B_j(t) are mutually independent standard Brownian motion processes in R^k; and ε is a small positive parameter, which will play a role in the analysis in later sections. We will assume in (S) that there is some symmetry in f_{ji} and σ_{ji}, in the sense that there exist f and σ (conditions on which will be specified shortly) such that for all j and i,

\[
f_{ji}\big(t, x_j^n(t), u_j^n(t), x_i^n(t)\big) \equiv f\big(t, x_j^n(t), u_j^n(t), x_i^n(t)\big)
\]
and
\[
\sigma_{ji}\big(t, x_j^n(t), u_j^n(t), x_i^n(t)\big) \equiv \sigma\big(t, x_j^n(t), u_j^n(t), x_i^n(t)\big).
\]

The system (S) is a controlled McKean-Vlasov dynamics. Historically, the McKean-Vlasov stochastic differential equation (SDE) is essentially a mean-field forward SDE suggested by Kac in 1956 as a stochastic toy model for the Vlasov kinetic equation of plasma [23]. Its study was initiated by McKean in 1966, and since then several papers have addressed McKean-Vlasov type SDEs and related applications; see, for example, [11].

The uncontrolled version of the state dynamics (S) captures many interesting problems involving interactions between agents. We list below a few examples.

Example 1 (Stochastic Kuramoto model). Consider n oscillators where each of the oscillators (players) is considered to have its own intrinsic natural frequency ω_j, and each is coupled symmetrically to all other oscillators. With f_{ji}(x_i, x_j) = f(x_i, u_i, x_j) = K sin(x_i − x_j) + ω_j and σ_{ji} a constant in (S), what we have is a (stochastic) Kuramoto oscillator [25]. The goal here is convergence to some common value (consensus) or alignment of the players' parameters. The stochastic Kuramoto model is thus given by (replacing x with θ, as is conventional in this context):

\[
d\theta_j(t) = \Big(\omega_j(t) + \frac{K}{n}\sum_{i=1}^n \sin\big(\theta_i(t) - \theta_j(t)\big)\Big)\, dt + D\, dB_j(t),
\]

where D, K > 0.
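For concreteness, a minimal Euler-Maruyama simulation of this stochastic Kuramoto model might look as follows. This is a sketch: the step size dt, horizon T, and the values of K, D and the frequencies ω_j are illustrative choices, not taken from the paper.

```python
import numpy as np

def simulate_kuramoto(n=100, K=1.0, D=0.5, T=10.0, dt=0.01, seed=0):
    """Euler-Maruyama for the stochastic Kuramoto model:
    dtheta_j = (omega_j + (K/n) * sum_i sin(theta_i - theta_j)) dt + D dB_j."""
    rng = np.random.default_rng(seed)
    omega = rng.normal(0.0, 1.0, n)           # intrinsic natural frequencies (assumed Gaussian)
    theta = rng.uniform(0.0, 2 * np.pi, n)    # initial phases
    for _ in range(int(T / dt)):
        # mean-field coupling term (K/n) * sum_i sin(theta_i - theta_j), for each j
        coupling = (K / n) * np.sin(theta[None, :] - theta[:, None]).sum(axis=1)
        theta += (omega + coupling) * dt + D * np.sqrt(dt) * rng.standard_normal(n)
    return theta

# The order parameter |(1/n) sum_j exp(i theta_j)| measures phase alignment (consensus).
theta = simulate_kuramoto()
print(abs(np.exp(1j * theta).mean()))
```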

Example 2 (Stochastic Cucker-Smale dynamics). Consider a population, say a flock of birds (or a swarm of fish), that moves in a three-dimensional space. It has been observed that, for some initial conditions, for example on their positions and velocities, the state of the flock converges to one in which all birds fly with the same velocity. See, for example, the Cucker-Smale flocking dynamics [10], [9], where each vector x_i = (y_i, v_i) is composed of the position dynamics and velocity dynamics of the corresponding bird (player). With f(x_j, u_j, x_i) = (ε² + ‖x_j − x_i‖²)^{−α} c(x_j − x_i) in (S), where ε > 0, α > 0, and c(·) is an odd continuous function, one arrives at a generic class of consensus algorithms developed for flocking problems.

Example 3 (Temperature dynamics for energy-efficient buildings). Consider a heating system serving a finite number of zones. In each zone, the goal is to maintain the temperature at a certain level. Denote by T_j the temperature of zone j, and by T^ext the ambient temperature. The law of conservation of energy can be written down as described by the following equation for zone j:

\[
dT_j(t) = \sigma\, dB_j(t) + \Big[ r_j(t) + \frac{\gamma}{\beta}\big(T^{\mathrm{ext}}(t) - T_j(t)\big) + \sum_{i\neq j} \alpha_{ij}(t)\big(T_i(t) - T_j(t)\big) \Big]\, dt,
\]

where r_j denotes the input rate of the heater in zone j, γ, β > 0, α_{ij} is the thermal conductance between zones i and j, and σ is a small variance term. The evolution of the temperature has a McKean-Vlasov structure of the type in system (S). We can actually introduce a control variable into r_j such that the heater can be turned on and off in each zone.

The three examples above can be viewed as special cases of the system (S). The controlled dynamics in (S) allows one to address several interesting questions, such as: How does one control the flocking dynamics and consensus algorithms in the first two examples above toward a certain target? How does one control the temperature in the third example in order to achieve a specific thermal comfort level while minimizing energy cost? In order to define the controlled dynamical system in precise terms, we have to specify the nature of information that players are allowed to use in the choice of their control at each point in time. This brings us to the first definition below.

Definition 1. A state-feedback strategy for Player j is a mapping γ̃_j : R_+ × (R^k)^n → U_j, whereas an individual state-feedback strategy for Player j is a mapping γ_j : R_+ × R^k → U_j.

Note that the individual state-feedback strategy involves only the self state of a player, whereas the state-feedback strategy involves the entire nk-dimensional state vector. The individual strategy spaces in each case have to be chosen in such a way that the resulting system of stochastic differential equations (S) admits a unique solution (in the sense specified shortly) when the players pick their strategies independently; furthermore, as standard, we take the strategy sets to be time invariant and independent of the controls. We denote by Γ_j the set of such admissible control laws γ_j : [0,T] × R^k → U_j for Player j; a similar set, Γ̃_j, is defined for state-feedback strategies γ̃_j of Player j.

We assume the following standard conditions on f, σ, γ_j, and the action sets U_j, for all j = 1, 2, ..., n.

Assumption (i): f(t,x,u,y) is C¹ in (t,x,u,y), and Lipschitz in (x,u,y).
Assumption (ii): The entries of the matrix σ are C², and σσ′ is strictly positive.
Assumption (iii): f and ∂_x f are uniformly bounded.
Assumption (iv): U_j is non-empty, closed, and bounded.
Assumption (v): γ_j : [0,T] × R^k → U_j is piecewise continuous in t and Lipschitz in x.

Normally, when we have a cost function for Player j which depends also on the state variables of the other players, either directly or implicitly through the coupling of the state dynamics (as in (S)), then any state-feedback Nash equilibrium solution will generally depend not only on self states but also on the other states, i.e., it will not be in the set Γ_j, j = 1,...,n. However, since this paper aims to characterize the solution in the high-population regime (i.e., as n → ∞), the dependence on other players' states will be only through the distribution of the player states. Hence each player will respond (in an optimal, cost-minimizing manner) to the behavior of the mass population and not to behaviors of individual players. Validity of this claim will be established later in Section III, but in anticipation of this, we first introduce the quantity

\[
m_t^n = \frac{1}{n}\sum_{j=1}^n \delta_{x_j^n(t)}, \tag{1}
\]

as an empirical measure of the collection of states of the players, where δ is a Dirac measure on the state space. This enables us to introduce the long-term cost function of Player j (to be minimized by him) in terms of only the self variables (x_j and u_j) and the mass m^n_t, t ≥ 0, where the latter can be viewed as an exogenous process (not directly influenced by Player j). But we first introduce a mean-field representation of the dynamics (S), which depends on m^n_t, and will be used in the description of the cost.

B. Mean-field representation

The system (S) can be written into a measure representation using the formula

\[
\int \phi(w)\Big[\sum_{i=1}^n \omega_i \delta_{x_i}\Big](dw) = \sum_{i=1}^n \omega_i\, \phi(x_i),
\]

where δ_z, z ∈ X, is a Dirac measure concentrated at z, φ is a measurable bounded function defined on the state space, and ω_i ∈ R. Then, the system (S) reduces to the system

\[
dx_j^n(t) = \Big(\int_w f\big(t,x_j^n(t),u_j^n(t),w\big)\Big[\frac{1}{n}\sum_{i=1}^n \delta_{x_i^n(t)}\Big](dw)\Big)\, dt + \sqrt{\varepsilon}\Big(\int_w \sigma\big(t,x_j^n(t),u_j^n(t),w\big)\Big[\frac{1}{n}\sum_{i=1}^n \delta_{x_i^n(t)}\Big](dw)\Big)\, dB_j(t),
\]
\[
x_j^n(0) = x_{j,0} \in \mathbb{R}^k, \quad k \ge 1, \quad j \in \{1,\dots,n\},
\]

which, by (1), is equivalent to the following system (SM):

\[
dx_j^n(t) = \Big(\int_w f\big(t,x_j^n(t),u_j^n(t),w\big)\, m_t^n(dw)\Big)\, dt + \sqrt{\varepsilon}\Big(\int_w \sigma\big(t,x_j^n(t),u_j^n(t),w\big)\, m_t^n(dw)\Big)\, dB_j(t),
\]
\[
x_j^n(0) = x_{j,0} \in \mathbb{R}^k, \quad k \ge 1, \quad j \in \{1,\dots,n\}. \tag{SM}
\]

The above representation of the system (SM) can be seen as a controlled interacting-particles representation of a macroscopic McKean-Vlasov equation, where m^n_t represents the discrete density of the population. Next, we address the mean-field convergence of the population profile process m^n. To do so, we introduce the key notion of indistinguishability, which is a basic property of our model. For the benefit of the reader, we discuss below the existence of a limiting measure and mean-field convergence in the framework of de Finetti-Hewitt-Savage [12], [17], [2] (see Appendix I).
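To make the measure representation concrete, the coefficients in (SM) are simply empirical averages of f and σ over the current population states. A minimal sketch follows; the particular coefficient functions f and sigma below are hypothetical stand-ins, chosen only for illustration.

```python
import numpy as np

def sm_drift_diffusion(t, x, u, states, f, sigma):
    """Evaluate the (SM) coefficients at (t, x, u): integrals of f and sigma
    against the empirical measure m^n_t = (1/n) sum_i delta_{x_i(t)}."""
    drift = np.mean([f(t, x, u, w) for w in states], axis=0)
    diffu = np.mean([sigma(t, x, u, w) for w in states], axis=0)
    return drift, diffu

# Hypothetical coefficients: attraction toward the other states, plus the control.
f = lambda t, x, u, w: (w - x) + u
sigma = lambda t, x, u, w: 1.0

states = np.array([0.0, 1.0, 2.0])     # current population states
print(sm_drift_diffusion(0.0, 0.5, 0.1, states, f, sigma))
```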

Definition 2 (Indistinguishability). We say that a family of processes (x^n_1, x^n_2, ..., x^n_n) is indistinguishable (or exchangeable) if the law of x^n is invariant by permutation over the index set {1,...,n}, i.e., for any permutation π over {1,2,...,n}, one has L(x^n_{j_1},...,x^n_{j_n}) = L(x^n_{π(j_1)},...,x^n_{π(j_n)}), where L(X) denotes the law of the random variable X.

Observing that the index does not matter in the functions f and σ, the key term that drives the dynamics is the process m^n_t, whose distribution remains unchanged by permutation. The next lemma follows immediately.

Lemma 1. A solution of (S) (and hence of (SM)) obtained under homogeneous control strategies γ_j, i.e., γ_j = γ for all j ∈ N, generates indistinguishable processes.

For indistinguishable (exchangeable) processes, the convergence of the empirical measure has been studied extensively. We have provided in Appendix I the fundamental result as Theorem 4; see also [32] and the references therein. To preserve this property for the controlled system, we restrict ourselves to admissible homogeneous control strategies. Then, from Theorem 4, the mean-field convergence becomes equivalent to the existence of a random measure µ such that the system is µ-chaotic, i.e.,

\[
\lim_n \int \prod_{l=1}^L \phi_l\big(x_{j_l}^n\big)\, \mu^n(dx^n) = \prod_{l=1}^L \Big(\int \phi_l\, d\mu\Big),
\]

for any fixed natural number L ≥ 2 and a collection of measurable bounded functions {φ_l}_{1≤l≤L} defined over the state space X.

Following the indistinguishability property, one has that the law of x^n_j = (x^n_j(t), t ≥ 0) is E[m^n]. Knowing that m^n goes to µ in law, the individual state process x^n_j conditioned on µ becomes independent of the system size n, and hence has a limiting process x_j as n grows. The distribution of x_j(t) is obtained through the weak convergence of the individual state dynamics to a macroscopic McKean-Vlasov equation (see later Proposition 5). Then, when the initial states are i.i.d. and given some homogeneous control actions u, the solution of the state dynamics generates an indistinguishable random process, and the weak convergence of the population profile process m^n to µ is equivalent to µ-chaoticity.

Note that the processes depend implicitly on the strategies used by the players, and that an admissible control law γ may depend on time t, the value of the individual state x_j(t), and the mean-field process m_t. The weak convergence of the process m^n implies the weak convergence of its marginal m^n_t to m_t, and one can characterize the distribution of m_t by the Fokker-Planck-Kolmogorov (FPK) equation:

\[
\partial_t m_t + D^1_x\Big(m_t \int_w f(t,x,u(t),w)\, m_t(dw)\Big) = \frac{\varepsilon}{2}\, D^2_{xx}\Big(m_t\Big(\int_w \sigma'(t,x,u(t),w)\, m_t(dw)\Big)\cdot\Big(\int_w \sigma(t,x,u(t),w)\, m_t(dw)\Big)\Big). \tag{2}
\]

Here f(·) ∈ R^k, which we write as (f_{k′}(·))_{1≤k′≤k}, where each f_{k′} is scalar. We let

\[
\sigma[t,x,u(t),m_t] := \int_w \sigma(t,x,u(t),w)\, m_t(dw),
\]

and Γ(·) := σ(·)σ′(·) is a square matrix of dimension k × k. The term D^1_x(·) denotes

\[
\sum_{k'=1}^k \partial_{x_{k'}}\Big(m_t \int_w f_{k'}(t,x,u(t),w)\, m_t(dw)\Big),
\]

and the term D^2_{xx}(·) is

\[
\sum_{k''=1}^k \sum_{k'=1}^k \frac{\partial^2}{\partial x_{k'}\,\partial x_{k''}}\big(m_t\, \Gamma_{k'k''}(\cdot)\big).
\]

In the one-dimensional case, the terms D¹, D² reduce to the divergence operator "div" and the Laplacian operator Δ, respectively.

It is important to note that the existence of a unique rest point (distribution) of the FPK equation does not automatically imply that the mean field converges to the rest point as t goes to infinity. This is because the rest point may not be stable.

Remark 1. In mathematical physics, convergence to an independent and identically distributed system is sometimes referred to as chaoticity [31], [32], [13], and the fact that chaoticity at the initial time leads to chaoticity at later times is known as propagation of chaos. In our setting, however, the chaoticity property needs to be studied together with the controls of the players. In general, the chaoticity property may not hold. One particular case should be mentioned, which is when the rest point m* is related to δ_{m*}-chaoticity. If the mean-field dynamics has a unique global attractor m*, then the propagation of chaos property holds for the measure δ_{m*}. Beyond this particular case, one may have multiple rest points, and also the double limit, lim_n lim_t m^n_t, may depend on the order of the limiting operations, leading to a non-commutative diagram; for an instance of this, see [33].

C. Cost Structure

We now introduce the cost functions associated with the differential game. Risk-sensitive behaviors can be captured by cost functions which exponentiate loss functions before the expectation operator. For each t ∈ [0,T], and m^n_t, x_j initialized at a generic feasible pair m, x at t, the risk-sensitive cost function for Player j is given by

\[
L(\gamma_j, m^n_{[t,T]};\, t,x,m) = \delta \log \mathbb{E}\Big( e^{\frac{1}{\delta}\big[g(x_T) + \int_t^T c(s,\, x_j^n(s),\, u_j^n(s),\, m^n(s))\, ds\big]} \,\Big|\, x_j(t) = x,\; m_t^n = m \Big), \tag{3}
\]

where c(·) is the instantaneous cost at time s; g(·) is the terminal cost; δ > 0 is the risk-sensitivity index; m^n_{[t,T]} denotes the process {m^n_s, t ≤ s ≤ T}; and u^n_j(s) = γ_j(s, x^n_j(s), m^n(s)), with γ_j ∈ Γ_j. Note that because of the symmetry assumption across players, the cost function of Player j is not indexed by j, since it is in the same structural form for all players. This is still a game problem (and not a team problem), however, because each such cost function depends only on the self variables (indexed by j for Player j) as well as the common population variable m^n.

We invoke the following standard conditions on c and g.

Assumption (vi): c is C¹ in (t,x,u,m); g is C² in x; c, g are non-negative.
Assumption (vii): c, ∂_x c, g, ∂_x g are uniformly bounded.

The cost function (3) is called a risk-sensitive cost functional or an exponentiated integral cost, which measures risk-sensitivity for the long run, and not at each instant of time (see [21], [38], [7], [3]). We note that the McKean-Vlasov mean-field game considered here differs from the model in [19]; specifically, in this paper, the volatility term in (SM) is a function of state, control, and the mean field, and further, the cost functional is of the risk-sensitive type.


Remark 2 (Connection with mean-variance cost). Consider the function c_λ : λ ↦ (1/λ) log(E e^{λC}). It is obvious that the risk-sensitive cost c_λ takes into consideration all the moments of the cost C, and not only its mean value. Around zero, the Taylor expansion of c_λ is given by

\[
c_\lambda \underset{\lambda \sim 0}{\approx} \mathbb{E}(C) + \frac{\lambda}{2}\,\mathrm{var}(C) + o(\lambda),
\]

where, for small λ, the dominating terms are the mean value and the variance of the cost. Hence the risk-sensitive cost entails a weighted sum of the mean and variance of the cost, to some level of approximation.

With the dynamics (SM) and cost functionals as introduced, we seek an individual state-feedback non-cooperative Nash equilibrium {γ*_j, j ∈ {1,...,n}}, satisfying the set of inequalities

\[
L(\gamma_j^*, m^n_{[0,T]};\, 0, x_{j,0}, m) \le L(\gamma_j, \tilde m^{n,j}_{[0,T]};\, 0, x_{j,0}, m), \tag{4}
\]

for all γ_j ∈ Γ_j, j ∈ {1,2,...,n}, where m^n_{[0,T]} is generated by the γ*_j's, and m̃^{n,j}_{[0,T]} by (γ_j, γ*_{−j}), γ*_{−j} = {γ*_i, i = 1,2,...,n, i ≠ j}; u*_j and u_j are control actions generated by the control laws γ*_j and γ_j, respectively, i.e., u*_j = γ*_j(t, x_j) and u_j = γ_j(t, x_j); the laws of m^n_t = m^n_t[u*] are given by the forward FPK equation under the strategy γ*, and m̃^{n,j}_t = m̃^{n,j}_t[u_j, u*_{−j}] is the induced measure under the strategy (γ_j, γ*_{−j}).

A more stringent equilibrium solution concept is that of strongly time-consistent individual state-feedback Nash equilibrium, satisfying

\[
L(\gamma_j^*, m^n_{[t,T]};\, t, x_j, m) \le L(\gamma_j, \tilde m^{n,j}_{[t,T]};\, t, x_j, m), \tag{5}
\]

for all x_j ∈ X, t ∈ [0,T), γ_j ∈ Γ_j, j ∈ {1,2,...,n}.

Note that the two measures m^n_t and m̃^{n,j}_t differ only in the component j, and have a common term, (1/n) Σ_{j′≠j} δ_{x^n_{j′}(t)}, which converges in distribution to some measure whose distribution is a solution of the forward FPK partial differential equation.

III. RISK-SENSITIVE BEST RESPONSE TO MEAN FIELD, AND EQUILIBRIUM

In this section, we present the risk-sensitive mean-field results. We first provide an overview of the mean-field (feedback) best response for a given mean-field trajectory m^n = (m^n(s), s ≥ 0). A mean-field best-response strategy of a generic Player j to a given mean field m^n_t is a measurable mapping γ*_j satisfying: for all γ_j ∈ Γ_j, with x_j and m^n_t initialized at x_{j,0}, m, respectively,

\[
L(\gamma_j^*, m^n_{[0,T]};\, 0, x_{j,0}, m) \le L(\gamma_j, m^n_{[0,T]};\, 0, x_{j,0}, m),
\]

where the law of m^n_t is given by the forward FPK equation in the whole space X^n, and is an exogenous process. Let v^n(t, x_j, m) = inf_{u_j} L(u_j, m^n_{[0,T]};\, t, x_j, m). The next proposition establishes the risk-sensitive Hamilton-Jacobi-Bellman (HJB) equation satisfied by a smooth optimal value function of a generic player. The main difference from the standard HJB equation is the presence of the term (ε/2δ)‖σ ∂_{x_j} v^n‖².

Proposition 1. Suppose that the trajectory of m^n_t is given. If v^n is twice continuously differentiable, then v^n solves the risk-sensitive HJB equation

\[
\partial_t v^n + \inf_{u_j \in U_j}\Big\{ f\cdot \partial_{x_j} v^n + \frac{\varepsilon}{2}\,\mathrm{tr}\big(\sigma\sigma'\, \partial^2_{x_j x_j} v^n\big) + \frac{\varepsilon}{2\delta}\,\|\sigma\, \partial_{x_j} v^n\|^2 + c \Big\} = 0, \qquad v^n(T, x_j) = g(x_j).
\]

Moreover, any strategy satisfying

\[
\gamma_j^n(\cdot) \in \arg\inf_{u_j\in U_j}\Big\{ f\cdot\partial_{x_j} v^n + \frac{\varepsilon}{2}\,\mathrm{tr}\big(\sigma\sigma'\,\partial^2_{x_j x_j} v^n\big) + \frac{\varepsilon}{2\delta}\,\|\sigma\,\partial_{x_j} v^n\|^2 + c \Big\}, \tag{6}
\]

constitutes a best-response strategy to the mean field m^n.

Proof of Proposition 1: For feasible initial conditions x and m, we define

\[
\phi^n(t,x,m) := \inf_{\gamma_j \in \Gamma_j} \mathbb{E}\Big( e^{\frac{1}{\delta}\big[g(x_T) + \int_t^T c(s,\, x^n(s),\, u_j(s),\, m^n_s)\, ds\big]} \,\Big|\, x_j(t) = x,\; m^n_t = m \Big). \tag{7}
\]

It is clear that v^n(t, x_j, m) = inf L = δ log φ^n(t, x_j, m). Under the regularity assumptions of Section II, the function φ^n is C¹ in t and C² in x. Using Itô's formula,

\[
d\phi^n(t,x_j) = \Big[\partial_t \phi^n(t,x_j) + f\cdot\partial_{x_j}\phi^n + \frac{\varepsilon}{2}\,\mathrm{tr}\big(\sigma\sigma'\,\partial^2_{x_j x_j}\phi^n\big)\Big]\, dt.
\]

Using the Itô-Dynkin formula (see [29], [7], [30]), we obtain

\[
\inf_{u_j\in U_j}\Big\{ d\phi^n + \frac{1}{\delta}\, c\,\phi^n\, dt \Big\} = 0,
\]

and thus

\[
\partial_t \phi^n + \inf_{u_j\in U_j}\Big\{ f\cdot\partial_{x_j}\phi^n + \frac{\varepsilon}{2}\,\mathrm{tr}\big(\sigma\sigma'\,\partial^2_{xx}\phi^n\big) + \frac{1}{\delta}\, c\,\phi^n \Big\} = 0, \qquad \phi^n(T,x_j) = e^{\frac{1}{\delta} g(x_j)}.
\]

To establish the connection with the risk-sensitive cost value, we use the relation φ^n = e^{v^n/δ}. One can compute the partial derivatives:

\[
\partial_t \phi^n = (\partial_t v^n)\,\frac{1}{\delta}\,\phi^n, \qquad \partial_{x_j}\phi^n = (\partial_{x_j} v^n)\,\frac{1}{\delta}\,\phi^n,
\]

and

\[
\partial^2_{x_j x_j}\phi^n = \big(\partial^2_{x_j x_j} v^n\big)\,\frac{1}{\delta}\,\phi^n + \frac{1}{\delta^2}\,\big(\partial_{x_j} v^n\big)'\big(\partial_{x_j} v^n\big)\,\phi^n,
\]

where the latter immediately yields

\[
\mathrm{tr}\big(\partial^2_{x_j x_j}\phi^n\, \sigma\sigma'\big) = \mathrm{tr}\big(\partial^2_{x_j x_j} v^n\, \sigma\sigma'\big)\,\frac{1}{\delta}\,\phi^n + \frac{1}{\delta^2}\,\|\sigma\,\partial_{x_j} v^n\|^2\, \phi^n.
\]

Collecting terms together, and dividing by φ^n/δ, we arrive at the HJB equation (6). ∎

Remark 3. Let us introduce the Hamiltonian H as

\[
H(t,x,p,M) = \inf_u\Big\{ p\cdot f + \frac{\varepsilon}{2}\,\mathrm{tr}(\sigma\sigma' M) + \frac{\varepsilon}{2\delta}\,\|\sigma p\|^2 + c \Big\},
\]

for a vector p and a matrix M, the latter playing the role of the Hessian of v^n.

If σ does not depend on the control variable(s), then the above expression reduces to

\[
\inf_u\{\, p\cdot f + c\,\} + \frac{\varepsilon}{2}\,\mathrm{tr}(\sigma\sigma' M) + \frac{\varepsilon}{2\delta}\,\|\sigma p\|^2,
\]

and the term to be minimized is H₂(t,x,p,M) = inf_u{p·f + c}, which is related to the Legendre-Fenchel transform for linear dynamics, i.e., the case where f is linear in the control u. In that case,

\[
\partial_p H_2(t,x,p,M) = \alpha\, u^*
\]

for some non-singular matrix α of proper dimensions. This says that the derivative of the modified Hamiltonian is related to the optimal feedback control. Now, for a non-linear drift f, the same technique can be used, but the function f needs to be inverted to obtain a generic closed-form expression for the optimal feedback control, which is given by

\[
u_j^* = g^{-1}\big(\partial_p H_2(t,x,p,M)\big),
\]

where g^{-1} is the inverse of the map u ↦ f(t,x,u,m). This generic expression of the optimal control will play an important role in non-linear McKean-Vlasov mean-field games.

The next proposition provides the best-response control for the mean-field game with dynamics affine in u, cost quadratic in u under the exponentiated integral, and Gaussian noise; the proposition that follows it deals with the case of an affine-quadratic structure (in both u and x).

Proposition 2. Suppose that σ(t,x) = σ(t), and

\[
f(t,x_j,u_j,m) = \bar f(t,x_j,m) + B(t,x_j,m)\, u_j, \qquad c(t,x_j,u_j,m) = \bar c(t,x_j,m) + \|u_j\|^2.
\]

Then, the best-response control of Player j is γ^{n,*}_j = −(1/2) B′ ∂_{x_j} v^n.

Proof: Following Proposition 1, we know that

\[
u_j^{n,*} = \gamma_j^{n,*}(\cdot) \in \arg\min_{u_j}\big\{ c(t,x_j(t),u_j(t),m_t) + f(t,x_j(t),u_j,m_t)\cdot\partial_{x_j} v^n \big\}.
\]

With the assumptions on σ, f, c, g, the condition reduces to

\[
\arg\min_{u_j}\big\{ [\bar f + B u_j]\cdot\partial_{x_j} v^n + \bar c + \|u_j\|^2 \big\},
\]

and hence we obtain γ^{n,*}_j = −(1/2) B′ ∂_{x_j} v^n by convexity and coercivity of the mapping u_j ↦ [\bar f + Bu_j]·∂_{x_j}v^n + \bar c + ‖u_j‖². ∎

Proposition 3 (Explicit optimal control and cost, [3]). Consider the risk-sensitive mean-field stochastic game described in Proposition 2, with \bar f = A(t)x, B a constant matrix, \bar c = x′Q(t)x, Q(t) ≥ 0, g(x) = x′Q_T x, Q_T ≥ 0, where the symmetric matrix Q(·) has continuous entries. Then, the solution to the HJB equation in Proposition 1 (whenever it exists) is given by v^n(t,x) = x′Z(t)x + ε∫_t^T tr(Z(s)σσ′) ds, where Z(s) is the nonnegative-definite solution of the generalized Riccati differential equation

\[
\dot Z + A'Z + ZA + Q - Z\Big(BB' - \frac{1}{\rho^2}\,\sigma\sigma'\Big)Z = 0, \qquad Z(T) = Q_T,
\]

where ρ = (δ/2ε)^{1/2}, and the optimal response strategy is

\[
u_j^*(t) = \gamma_j^*(x) = -B'Z x. \tag{8}
\]
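When no closed form is available, the generalized Riccati equation can be integrated backward from the terminal condition Z(T) = Q_T. A minimal sketch using scipy follows; the matrices A, B, Q, Q_T, σ and the value of ρ² below are illustrative placeholders, not data from the paper.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative problem data (placeholders).
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
B = np.eye(2); Q = np.eye(2); QT = 0.1 * np.eye(2)
sigma = 0.5 * np.eye(2)
rho2 = 4.0          # rho^2 = delta / (2 eps)
T = 5.0

def riccati_rhs(t, z_flat):
    # Zdot = -(A'Z + ZA + Q - Z (BB' - sigma sigma'/rho^2) Z)
    Z = z_flat.reshape(2, 2)
    S = B @ B.T - (sigma @ sigma.T) / rho2
    return (-(A.T @ Z + Z @ A + Q - Z @ S @ Z)).ravel()

# Integrate backward in time, from t = T down to t = 0.
sol = solve_ivp(riccati_rhs, (T, 0.0), QT.ravel())
Z0 = sol.y[:, -1].reshape(2, 2)
print(Z0)           # Z(0); the feedback is u* = -B' Z(t) x, cf. (8)
```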

Using Proposition 3, one has the following result for any given trajectory (m^n_t)_{t≥0}, which enters the cost function in a particular way.

Proposition 4. If \bar c is of the form \bar c = x′(Q(t) − Λ(t, m^n_t))x, where Λ is symmetric and continuous in (t,m), then the generalized Riccati equation becomes

\[
\dot Z^* + A'Z^* + Z^*A + Q - \Lambda(t, m^n_t) - Z^*\Big(BB' - \frac{1}{\rho^2}\,\sigma\sigma'\Big)Z^* = 0, \qquad Z^*(T) = Q_T,
\]

and

\[
v^n(t,x) = x'Z^*x + \varepsilon\int_t^T \mathrm{tr}\big(Z^*(s)\,\sigma\sigma'\big)\, ds.
\]

A. Macroscopic McKean-Vlasov equation

Since the controls used by the players influence the mean-field limit via the state dynamics, we need to characterize the evolution of the mean-field limit as a function of the controls. The law of m_t is the solution of the Fokker-Planck-Kolmogorov equation given by (2), and the individual state dynamics follow the so-called macroscopic McKean-Vlasov equation

\[
d\bar x_j(t) = \Big(\int_w f\big(t, \bar x_j(t), u_j^*(t), w\big)\, m_t(dw)\Big)\, dt + \sqrt{\varepsilon}\Big(\int_w \sigma\big(t, \bar x_j(t), u_j^*(t), w\big)\, m_t(dw)\Big)\, dB_j(t). \tag{9}
\]

In order to obtain an error bound, we introduce the following:

Definition 3 (Monge-Kantorovich metric). Given two measures µ and ν, the Monge-Kantorovich metric (also called the Wasserstein metric) between µ and ν is

\[
W_1(\mu,\nu) = \inf_{X\sim\mu,\, Y\sim\nu} \mathbb{E}\,|X - Y|.
\]

In other words, if E(µ,ν) is the set of probability measures P on the product space such that the image of P under the projection on the first argument (resp. on the second argument) is µ (resp. ν), then

\[
W_1(\mu,\nu) = \inf_{P\in E(\mu,\nu)} \int\!\!\int |z - z'|\, P(dz, dz'). \tag{10}
\]

The Monge-Kantorovich metric is indeed a distance measure (it can be checked that the separation, triangle inequality, and positivity properties are satisfied), and it metrizes the weak topology.
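In one dimension, the Monge-Kantorovich (Wasserstein-1) distance between empirical measures is available off the shelf; a small sketch, with two arbitrary Gaussian samples standing in for the measures:

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 500)    # samples from mu
y = rng.normal(0.5, 1.0, 500)    # samples from nu

# W1 between the two empirical measures; approximately |0.5 - 0.0| here.
print(wasserstein_distance(x, y))
```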

Proposition 5. Under the conditions (i)-(vii), the following holds: For any t, if the control law γ*_j(·) is used, then there exists y_t > 0 such that

\[
\mathbb{E}\big(\| x_j^n(t) - \bar x_j(t) \|\big) \le \frac{y_t}{\sqrt n}.
\]

Moreover, for any T < ∞, there exists C_T > 0 such that

\[
W_1\Big(\mathcal{L}\big((x_j^n(t))_{t\in[0,T]}\big),\, \mathcal{L}\big((\bar x_j(t))_{t\in[0,T]}\big)\Big) \le \frac{C_T}{\sqrt n}, \tag{11}
\]

where L(X_t) denotes the law of the random variable X_t.

The last inequality in the proposition says that the error bound is at most of order O(1/√n) on any fixed compact interval. The proof proceeds in the following steps (details are given in Appendix II): let x^n_j(t) and x̄_j(t) be the solutions of the two SDEs with initial gap less than 1/√n. Then, taking the difference between the two solutions, using the triangle inequality for norms, taking expectations, and applying the Gronwall inequality, one arrives at the result.
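Schematically, the key estimate behind this argument reads as follows (a sketch; the constants C, C′ are generic):

\[
\mathbb{E}\,\|x_j^n(t)-\bar x_j(t)\| \;\le\; \mathbb{E}\,\|x_j^n(0)-\bar x_j(0)\| \;+\; C\int_0^t \mathbb{E}\,\|x_j^n(s)-\bar x_j(s)\|\, ds \;+\; \frac{C'}{\sqrt n},
\]

where the Lipschitz continuity of f and σ controls the difference of the drift and diffusion terms, and the C′/√n term bounds the fluctuation of the empirical average around its mean-field limit. Since the initial gap is at most 1/√n, Gronwall's inequality yields E‖x^n_j(t) − x̄_j(t)‖ ≤ ((1 + C′)/√n) e^{Ct} =: y_t/√n.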

1) Risk-sensitive mean-field cost: Based on the fact that m^n_t converges weakly to m_t under the admissible controls (u^n_j(s), s ≥ 0) → (u_j(s), s ≥ 0) as n goes to infinity, the weak convergence of the risk-sensitive cost function (3) follows, under the regularity conditions (vi) and (vii) on the functions c and g, i.e., as n → ∞,

\[
L(\gamma_j, m^n_{[t,T]};\, t,x,m) \;\to\; L(u_j, m_{[t,T]};\, t,x,m) = \delta\log\mathbb{E}\Big( e^{\frac{1}{\delta}\big[g(x_j(T)) + \int_t^T c(s,\, x_j(s),\, u_j(s),\, m_s)\, ds\big]} \,\Big|\, x_j(t)=x,\; m_t = m \Big).
\]

Based on this limiting cost, we can construct the best response to the mean field in the limit. Given {m_s}_{s∈[t,T]}, we minimize L(u_j, m_{[t,T]};\, t,x,m) subject to the state-dynamics constraints.

B. Fixed-point problem

We now define the mean-field equilibrium game as the following fixed-point problem.

Definition 4. The mean-field equilibrium game problem (P) is one where each player solves the optimal control problem

\[
\inf_{u_j}\; \delta\log\mathbb{E}\Big( e^{\frac{1}{\delta}\big[g(x_j(T)) + \int_t^T c(s,\, x_j(s),\, u_j(s),\, m_s^*)\, ds\big]} \,\Big|\, x_j(t)=x,\; m_t = m \Big),
\]

subject to the dynamics of x_j(t) given by the dynamics in Section III-A, where the mean field m_t is replaced by m*_t, the optimal mean-field trajectory. The optimal feedback control u*_j[t,x,m*] depends on m*, and m* is the mean field reproduced by all the u*_j, i.e., m*_t = m_t[u*] is a solution of the Fokker-Planck-Kolmogorov forward equation (2). The equilibrium is called an individual feedback mean-field equilibrium if every player adopts an individual state-feedback strategy.

Note that this problem differs from the risk-sensitive mean-field stochastic optimal control problem, where the objective is

\[
\delta\log\mathbb{E}\Big( e^{\frac{1}{\delta}\big[g(x_j(T)) + \int_t^T c(s,\, x_j(s),\, u_j(s),\, m_s[u])\, ds\big]} \,\Big|\, x_j(t)=x,\; m_t=m \Big),
\]

with m_s[u] the distribution of the state dynamics x_j(s) driven by the control u_j.

C. Risk-sensitive FPK-McV equations

The regular solutions to problem (P) introduced above are solutions to the HJB backward equation combined with the FPK equation and the macroscopic McKean-Vlasov version of the limiting individual dynamics, i.e.,

\[
dx_j(t) = \Big(\int_w f\big(t,x_j(t),u_j^*(t),w\big)\, m_t(dw)\Big)\, dt + \sqrt{\varepsilon}\Big(\int_w \sigma\big(t,x_j(t),u_j^*(t),w\big)\, m_t(dw)\Big)\, dB_j(t), \qquad x_j(0) = x_{j,0} = x,
\]
\[
0 = \partial_t v + \inf_{u_j}\Big\{ f\cdot\partial_x v + \frac{\varepsilon}{2}\,\mathrm{tr}\big(\sigma\sigma'\,\partial^2_{xx} v\big) + \frac{\varepsilon}{2\delta}\,\|\sigma\,\partial_x v\|^2 + c \Big\}, \qquad x_j := x, \quad v(T,x) = g(x),
\]
\[
\partial_t m_t = -D^1_x\Big(m_t\int_w f(t,x,u^*,w)\, m_t(dw)\Big) + \frac{\varepsilon}{2}\, D^2_{xx}\Big(m_t\Big(\int_w \sigma'(t,x,u^*,w)\, m_t(dw)\Big)\cdot\Big(\int_w \sigma(t,x,u^*,w)\, m_t(dw)\Big)\Big), \qquad m_0(\cdot)\ \text{fixed}.
\]

Then, the relevant question that arises is the existence of a solution to the above system. This is a backward-forward system, and little is known about the existence of solutions to such systems. In general, a solution may not exist, as we next demonstrate.

D. Possibility of non-existence of a solution to backward-forward boundary value problems

There are many examples of systems of backward-forward equations which do not admit solutions. As a very simple example from [40], consider the system:

\[
\dot v = m, \qquad \dot m = -v, \qquad m(0) = m_0; \quad v(T) = m_T.
\]

It is obvious that the coefficients of this pair of backward-forward differential equations are all (trivially) uniformly Lipschitz. However, depending on T, the system may not be solvable for m_0 ≠ 0. We can easily show that for T = kπ + 3π/4 (k a nonnegative integer), the above two-point boundary value problem does not admit a solution for any m_0 ≠ 0, and it admits infinitely many solutions for m_0 = 0.
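To see where the obstruction comes from, one can write out the general solution explicitly (a short verification of the claim above):

\[
v(t) = v_0\cos t + m_0\sin t, \qquad m(t) = \dot v(t) = -v_0\sin t + m_0\cos t,
\]

which satisfies ṁ = −v and m(0) = m_0, with v_0 = v(0) free. The terminal condition v(T) = m_T then reads

\[
v_0(\cos T + \sin T) = m_0(\cos T - \sin T).
\]

For T = kπ + 3π/4 the left-hand side vanishes for every v_0, while the right-hand side equals ∓√2 m_0; hence no solution exists when m_0 ≠ 0, and every v_0 yields a solution when m_0 = 0.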

Following essentially the same ideas, one can show that the system of stochastic differential equations

\[
dv = m\, dt + \sigma\, dB(t), \qquad dm = -v\, dt + \nu\, dB(t),
\]

where B(t) is the standard Brownian motion in R, with the boundary conditions

\[
m(0) = m_0 \neq 0; \qquad v(T) = m_T,
\]

and with T = 7π/4, does not admit any solution.

This example demonstrates that the system needs to be normalized and the boundary conditions need to be properly picked. In view of this, we will introduce the notion of a reduced mean-field system in Section IV to establish the existence of equilibria for a specific class of risk-sensitive games. We first provide below a key result covering the most general case.


E. Risk-sensitive mean-field equilibrium

Theorem 1. Consider a risk-sensitive mean-field stochastic differential game as formulated above. Assume that σ = σ(t) and that there exists a unique pair (u*, m*) such that

(i) the coupled backward-forward PDEs

\[
\partial_t v^* + \inf_{u_j}\Big\{ f^*\cdot\partial_x v^* + \frac{\varepsilon}{2}\,\mathrm{tr}\big(\sigma\sigma'\,\partial^2_{xx}v^*\big) + \frac{\varepsilon}{2\delta}\,\|\sigma\,\partial_x v^*\|^2 + c^* \Big\} = 0, \qquad v^*(T,x) = g(x), \quad m^*_0(x)\ \text{fixed},
\]
\[
\partial_t m^*_t + D^1_x\Big(m^*_t \int_w f^*(t,x,u^*,w)\, m^*_t(dw)\Big) = \frac{\varepsilon}{2}\, D^2_{xx}\Big(m^*_t\Big(\int_w \sigma'\, m^*_t(dw)\Big)\Big(\int_w \sigma\, m^*_t(dw)\Big)\Big),
\]

admit a pair of bounded nonnegative solutions (v*, m*); and

(ii) u* minimizes the Hamiltonian H = f(t,x,u,m*)·∂_x v* + c(t,x,u,m*).

Then, the pair (u*, m*) is a strongly time-consistent mean-field equilibrium, and L(t,u*,m*) = v*. In addition, if c = x′(Q(t) − Λ(t, m^n_t))x, where Λ(t,·) is a measurable symmetric matrix-valued function, then any convergent subsequence of optimal control laws γ^{α(n)}_j leads to a best-response strategy to m.

Proof: See Appendix II.

Remark 4. This result can be extended to finitely many classes of players (see [28], [4], [26] for discussions). To do so, consider a finite number of classes indexed by θ ∈ Θ. The individual dynamics are then indexed by θ, i.e., the function f becomes f_θ, and σ becomes σ_θ. This means that the indistinguishability property is no longer satisfied. The law depends on θ (it is not invariant under permutation of indices). However, the invariance property holds within each class. This allows us to establish a weak convergence of the individual dynamics of each generic player within each class, and we obtain x̄_θ(t). The multi-class mean-field equilibrium is then defined by a system for each class, and the classes are interdependent via the mean field and the value functions per class.

Limiting behavior with respect to ε: We scale the parameters δ, ε, and ρ such that δ = 2ερ². The PDE given in Proposition 1 then becomes

\[
\partial_t v + \inf_u\Big\{ f^*\cdot\partial_x v + \frac{\varepsilon}{2}\,\mathrm{tr}\big(\sigma\sigma'\,\partial^2_{xx}v\big) + \frac{1}{4\rho^2}\,\|\sigma\,\partial_x v\|^2 + c^* \Big\} = 0, \qquad v(T,x) = g(x).
\]

When the parameter ε goes to zero, one arrives at a deterministic PDE. This situation captures the large deviation limit:

\[
\partial_t v + \inf_u\Big\{ f^*\cdot\partial_x v + \frac{1}{4\rho^2}\,\|\sigma\,\partial_x v\|^2 + c^* \Big\} = 0, \qquad v(T,x) = g(x).
\]

F. Equivalent stochastic mean-field problem

In this subsection, we formulate an equivalent (n+1)-player game in which the state dynamics of the n players are given by the system (ESM) as follows:

\[
dx_j^n(t) = \Big(\int_w f\big(t,x_j^n(t),u_j^n(t),w\big)\, m_t^n(dw) + \sigma\,\zeta(t)\Big)\, dt + \sqrt{\varepsilon}\,\sigma\, dB_j(t), \qquad x_j^n(0) = x_{j,0}\in\mathbb{R}^k, \quad k\ge 1, \quad j\in\{1,\dots,n\}, \tag{ESM}
\]

where ζ(t) is the control parameter of the "fictitious" (n+1)-th player. In parallel to (3), we define the risk-neutral cost function of the n players as follows:

\[
L(\gamma_j, \zeta, x_j^n, m^n_{[0,T]};\, t,x,m) = \mathbb{E}\Big( g(x_j^n(T)) + \int_t^T c\big(s,x_j^n(s),u_j^n(s),m_s^n\big)\, ds - \rho^2\int_t^T \|\zeta(s)\|^2\, ds \,\Big|\, x_j(t)=x,\; m_t^n=m \Big), \tag{12}
\]

where ζ : [0,T] × R^k → U_{n+1} is the individual feedback control strategy of the fictitious Player n+1, which yields an admissible control action ζ(t) in a set of feasible actions U_{n+1}.

Every player j ∈ {1,2,...,n} minimizes L under the worst choice of feedback strategy ζ by Player n+1, which is piecewise continuous in t and Lipschitz in x_j. We refer to this game described by (ESM) and (12) as the robust mean-field game. In the following proposition, we describe the connection between the mean-field risk-sensitive game problem described by (SM) and (3) and the robust mean-field game problem described by (ESM) and (12).

Proposition 6. Under the regularity assumptions (i)-(vii), given a mean field m^n_t, the value functions of the risk-sensitive game and the robust game problems are identical, and the mean-field best-response control strategy of the risk-sensitive stochastic differential game is identical to the one for the corresponding robust mean-field game.

Proof: Let ṽ^n = inf_{u_j} sup_ζ L(u_j, ζ, x^n_j, m^n_{[0,T]};\, t, x_j, m) denote the upper-value function associated with this robust mean-field game. Then, under the regularity assumptions (i)-(vii), if ṽ^n is C¹ in t and C² in x, it satisfies the Hamilton-Jacobi-Isaacs (HJI) equation

\[
\inf_u \sup_\zeta \Big\{ \partial_t \tilde v^n + (f + \sigma\zeta)\cdot\partial_{x_j}\tilde v^n + c - \rho^2\,\|\zeta\|^2 + \frac{\varepsilon}{2}\,\mathrm{tr}\big(\partial^2_{x_j x_j}\tilde v^n\, \sigma\sigma'\big) \Big\} = 0, \qquad \tilde v^n(T,x_j) = g(x_j). \tag{13}
\]

Note that (13) can be rewritten as inf_u sup_ζ H₃ = 0, where

\[
H_3 := H + (\sigma\zeta)\cdot\partial_{x_j}\tilde v^n - \rho^2\,\|\zeta\|^2
\]

is the Hamiltonian associated with this robust game. Since the dependences on u and ζ above are separable, the Isaacs condition (see [5]) holds, i.e.,

\[
\inf_u \sup_\zeta H_3 = \sup_\zeta \inf_u H_3.
\]

The inner maximization is strictly concave in ζ and yields the best response ζ* = (1/2ρ²) σ′ ∂_{x_j} ṽ^n, for which (σζ*)·∂_{x_j}ṽ^n − ρ²‖ζ*‖² = (1/4ρ²)‖σ′ ∂_{x_j} ṽ^n‖². Hence the function ṽ^n satisfies, after substituting back this best-response strategy for ζ:

\[
-\partial_t \tilde v^n = \inf_u\Big\{ f\cdot\partial_{x_j}\tilde v^n + c + \frac{1}{4\rho^2}\,\|\sigma'\,\partial_{x_j}\tilde v^n\|^2 + \frac{\varepsilon}{2}\,\mathrm{tr}\big(\partial^2_{x_j x_j}\tilde v^n\, \sigma\sigma'\big) \Big\}, \qquad \tilde v^n(T,x_j) = g(x_j). \tag{14}
\]

Note that the two PDEs, (14) and the one given in Proposition 1, are identical, with ρ² = (δ/2ε). Moreover, the optimal cost and the optimal control laws in the two problems are the same. ∎

Remark 5. The FPK forward equation will have to be modified to account for the control of the fictitious player in the robust mean-field game formulation, by including the term σζ in (ESM). Hence the mean-field equilibrium solutions to the two games are not necessarily identical.

IV. LINEAR STATE DYNAMICS

In this section, we study a special class of risk-sensitive games where the state dynamics are linear and do not depend explicitly on the mean field. We first state a related result from [27], [14] for the risk-neutral case.

Theorem 2 ([27]). Consider the reduced mean-field system (rMFG):

\[
\partial_t v + H\big(x, \nabla_x v, m_t(x)\big) + \frac{\sigma^2}{2}\,\partial^2_{xx} v = 0,
\]
\[
\partial_t m_t + \mathrm{div}\Big(m_t\, \partial_p H\big(x, \nabla_x v, m_t(x)\big)\Big) - \frac{\sigma^2}{2}\,\partial^2_{xx} m_t = 0,
\]
\[
m_0(\cdot)\ \text{fixed}, \quad v(T,\cdot)\ \text{fixed}, \quad v, m\ \text{1-periodic}, \quad x \in (0,1)^d := \mathcal{X},
\]

where H is the Legendre transform (with respect to the control) of the instantaneous cost function.

Suppose that (x,p,z) ↦ H(x,p,z) is twice continuously differentiable with respect to (p,z), and for all (x,p,z) ∈ X × R^p × R*_+,

\[
\begin{pmatrix} \partial^2_{pp}H(x,p,z) & \tfrac{1}{2}\,\partial^2_{pz}H(x,p,z) \\[2pt] \tfrac{1}{2}\,\big[\partial^2_{pz}H(x,p,z)\big]' & -\tfrac{1}{z}\,\partial_z H(x,p,z) \end{pmatrix} \succ 0.
\]

Then, there exists at most one smooth solution to (rMFG).

Remark 6. We have a number of observations and notes.
• The Hamiltonian function H in the result above requires a special structure. Instead of a direct dependence on the entire mean-field distribution m_t, its dependence on the mean field is through the value of m_t evaluated at the state x.
• For global dependence on m, a sufficiency condition for uniqueness can be found in [26] for the case where the Hamiltonian is separable, i.e., H(x,p,m) = ξ(x,p) + f(x,m), with f monotone in m and ξ strictly convex in p.
• The solution of (rMFG) can be unique even if the above conditions are violated. Further, the uniqueness condition is independent of the horizon of the game.
• For the linear-quadratic mean-field case, it has been shown in [4] that the normalized system may have a unique i.i.d. solution or infinitely many solutions, depending on the system parameters. See also [6] for recent analysis of risk-neutral linear-quadratic mean-field games using the stochastic maximum principle.

The next result provides the counterpart of Theorem 2 in the risk-sensitive case. It provides sufficient conditions for having at most one smooth solution to the risk-sensitive mean-field system by exploiting the presence of the additive quadratic term (which is strictly convex in p).

Theorem 3. Consider the risk-sensitive (reduced) mean-field system (RS-rMFG). Let δ > 0, and let H(x,p,z) be twice continuously differentiable in (p,z) ∈ R^d × R_+, satisfying the following conditions:
• H is strictly convex in p,
• H is decreasing in z,
• \[
\Big(-\frac{\partial_z H}{z}\Big)\cdot\big(\partial^2_{pp}H\big) \;\succ\; \Big(\partial^2_{pz}H - \frac{\varepsilon\sigma^2}{2\delta}\, p/z\Big)'\cdot\Big(\partial^2_{pz}H - \frac{\varepsilon\sigma^2}{2\delta}\, p/z\Big).
\]

Then, (RS-rMFG) has at most one smooth solution.

Proof: See Appendix II.

Remark 7. We observe that, in contrast to Theorem 2 (the risk-neutral case), the sufficiency condition for having at most one smooth solution to (RS-rMFG) now depends on the variance term.

V. NUMERICAL ILLUSTRATION

In this section, we provide two numerical examples to illustrate the risk-sensitive mean-field game under affine state dynamics and McKean-Vlasov dynamics.

A. Affine state dynamics

We let Player j's state evolution be described by the decoupled stochastic differential equation

\[
dx_j^n(t) = u_j(t)\, dt + \sqrt{\varepsilon}\,\sigma\, dB_j(t).
\]


The risk-sensitive cost functional is given by

\[
L(\gamma_j, m^n;\, t,x,m) = \delta \log \mathbb{E}_{x,m}\Big\{ \exp\Big[\frac{1}{\delta}\Big( Q\,\big(x_j^n(T)\big)^2 + \int_0^T \big(q - \mathbb{E}(m_t^n)\big)\,\big(x_j^n\big)^2(t) + u_j^2(t)\, dt \Big)\Big]\Big\}, \tag{15}
\]

where δ, Q, q are positive parameters; hence the coupling among the players is only through the cost. The optimal strategy of Player j has the form

\[
u_j^*(t) = -z(t)\, x, \tag{16}
\]

where z(t) is a solution of the Riccati differential equation

\[
\dot z(t) + q - \mathbb{E}(m^n) - z^2(t)\,\big(1 - \sigma^2/\rho^2\big) = 0,
\]

with boundary condition z(T) = Q. An explicit solution is given by

\[
z(t) = -\frac{\sqrt{q-M}}{\sqrt{L}}\,\tan\Big[\sqrt{L}\,\sqrt{q-M}\,(t-T) + \arctan\Big(\frac{\sqrt{L}\,Q}{\sqrt{q-M}}\Big)\Big], \qquad 0\le t\le T,
\]

where L := 1 − σ²/ρ² and M := E(m^n). The FPK-McV equation reduces to

\[
\partial_t m^*_t + \partial_x\big(u_j^*(t)\, m^*_t\big) = \frac{\varepsilon}{2}\,\partial^2_{xx} m^*_t, \qquad u_j^*(t) = -z(t)\,x.
\]

We set the parameters as follows: q = 1.2, Q = 0.1, δ = 100,000, σ = 2.0, T = 5, and ε = 5.0. Let m*_0(x) be a normal distribution N(1,1), and for every t ∈ [0,T], let m*_t vanish at infinity. In Figure 1, we show the evolution of the distribution m*_t, and in Figures 2 and 3, we show the mean and the variance of the distribution, which affect the optimal strategies in (16). The optimal linear feedback z(t) is illustrated in Figure 4. We note that the mean value E(m*_t) monotonically decreases from 1.0, and hence the unit cost on the state is monotonically increasing. As the state cost increases, the control effort becomes relatively cheaper, and therefore we can observe an increase in the magnitude of z(t). However, when the unit cost goes beyond 1.08, we observe that the control effort reduces to avoid undershooting in the state.

Fig. 1. The evolution of the distribution m*_t, 0 ≤ t ≤ 5, −19 ≤ x ≤ 21.
Fig. 2. Mean value E(m*_t) as a function of time, 0 ≤ t ≤ 5.
Fig. 3. Variance of the distribution m*_t as a function of time, 0 ≤ t ≤ 5.
Fig. 4. z(t) as a function of time, 0 ≤ t ≤ T.
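For reference, the closed-form gain can be evaluated numerically. The sketch below uses the parameters of this subsection, but freezes the mean at M = E(m^n) = 1.0 as a simplifying assumption, whereas E(m*_t) is time varying in the actual coupled system.

```python
import numpy as np

q, Q, T = 1.2, 0.1, 5.0
delta, eps, sigma = 100_000.0, 5.0, 2.0
rho2 = delta / (2 * eps)        # rho^2 = delta / (2 eps)
L = 1 - sigma**2 / rho2         # L := 1 - sigma^2 / rho^2
M = 1.0                         # simplifying assumption: frozen mean E(m^n)

def z(t):
    """Closed-form feedback gain from Section V-A (as printed)."""
    w = np.sqrt(L) * np.sqrt(q - M)
    phi = np.arctan(np.sqrt(L) * Q / np.sqrt(q - M))
    return -np.sqrt(q - M) / np.sqrt(L) * np.tan(w * (t - T) + phi)

ts = np.linspace(0.0, T, 6)
print(np.round(z(ts), 4))       # the feedback is u*_j(t) = -z(t) x, cf. (16)
```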

B. McKean-Vlasov dynamics

We let the dynamics of an individual player be

\[
dx_j^n(t) = \Big(\frac{\beta}{n}\sum_{i=1}^n x_i^n(t) + u_j^n(t)\Big)\, dt + \sqrt{\varepsilon}\,\sigma\, dB_j(t), \tag{17}
\]

and take the risk-sensitive cost function to be

\[
L = \delta \log \mathbb{E}\Big\{ \exp\Big[\frac{1}{\delta}\int_0^T q\,\big(x_j^n(t)\big)^2 + r\,\big(u_j^n(t)\big)^2\, dt\Big]\Big\}.
\]

Note that in this case the cost function is independent of the other players' controls or states. As n → ∞, under regularity conditions,

\[
\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^n x_i^n(t) = M(t),
\]

where M(t) is the mean of the population. The optimal feedback control u_j in response to the mean field M(t) is characterized by

\[
u_j(t) = -z(t)\, x_j(t) - k(t),
\]

where

\[
\dot z(t) + q - z^2(t)\,\big(1-\sigma^2/\rho^2\big) = 0, \quad z(T) = 0; \qquad \dot k(t) - z(t)\,k(t) + z(t)\,M(t) = 0, \quad k(T) = 0,
\]

with ρ² = (δ/2ε) and M(t) = ∫_{x∈X} x\, m(x,t)\, dx. The Fokker-Planck-Kolmogorov equation is

\[
\partial_t m(x,t) + \partial_x\Big( m(x,t)\big(-z(t)\,x - k(t)\big) + \beta\, m(x,t)\int_w w\, m(w,t)\, dw \Big) = \frac{\varepsilon}{2}\,\partial^2_{xx} m(x,t).
\]

By solving the ODEs, we obtain

\[
z(t) = -\sqrt{\bar q}\,\tan\big(\sqrt{\tilde q}\,(t-T)\big), \qquad 0\le t\le T,
\]

where q̄ = q/(1−σ²/ρ²) and q̃ = q(1−σ²/ρ²). Letting q = r = 1, we compute the solution to be

\[
k(t) = \cos(t-T)\Big( \int_1^T M(\tau)\sec(T-\tau)\tan(T-\tau)\, d\tau - \int_t^T M(\tau')\sec(T-\tau')\tan(T-\tau')\, d\tau' \Big).
\]

We let σ = 1, ρ = 2, β = 1, and depict in Figure 5 the evolution of the probability density function m(x,t). The evolutions of the mean M(t) and the variance are shown in Figure 6 and Figure 7, respectively.

Fig. 5. Evolution of the probability density function m(x,t).
Fig. 6. The mean M(t) under the equilibrium solution.
Fig. 7. Variance over time under the equilibrium solution.
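An interacting-particle (Euler-Maruyama) simulation of (17) under a linear feedback illustrates the mean-field coupling. This is a sketch: the constant gains z0 and k0 below are stand-ins for the time-varying solutions z(t), k(t) of the ODEs above, and the remaining values are illustrative.

```python
import numpy as np

def simulate_mkv(n=1000, beta=1.0, sigma=1.0, eps=1.0, T=5.0, dt=0.01,
                 z0=0.8, k0=0.0, seed=0):
    """Euler-Maruyama for dx_j = (beta * mean(x) + u_j) dt + sqrt(eps) sigma dB_j,
    with the linear feedback u_j = -z0 * x_j - k0 (constant-gain stand-in)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(1.0, 1.0, n)          # initial states ~ N(1, 1)
    means = [x.mean()]
    for _ in range(int(T / dt)):
        u = -z0 * x - k0
        x += (beta * x.mean() + u) * dt \
             + np.sqrt(eps) * sigma * np.sqrt(dt) * rng.standard_normal(n)
        means.append(x.mean())
    return np.array(means)

M = simulate_mkv()
print(M[::100])   # population mean M(t) sampled over time, cf. Figure 6
```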

VI. CONCLUDING REMARKS

We have studied risk-sensitive mean-field stochastic differential games with state dynamics described by an Itô stochastic differential equation and a cost function that is the expected value of an exponentiated integral.

Using a particular structure of the state dynamics, we have shown that the mean-field limit of the individual state dynamics leads to a controlled macroscopic McKean-Vlasov equation. We have formulated a risk-sensitive mean-field response framework, and established its compatibility with the density distribution using the controlled Fokker-Planck-Kolmogorov forward equation. The risk-sensitive mean-field equilibria are characterized in terms of coupled backward-forward equations. For the general case, obtaining a solution to the resulting mean-field system (numerically or analytically) is very challenging, even though the number of equations has been reduced. We have, however, provided generic explicit forms in the particular case of the affine-exponentiated-Gaussian mean-field problem. In addition, we have shown that the risk-sensitive problem can be transformed into a risk-neutral mean-field game with the introduction of an additional fictitious player. This allows one to study a novel class of mean-field games, robust mean-field games, under the Isaacs condition.

An interesting direction that we leave for future research is to extend the model to accommodate multiple classes of players and a drift function which may depend on the other players' controls. Another direction would be to soften the conditions under which Proposition 5 is valid, such as boundedness and Lipschitz continuity, and extend the result to games with non-smooth coefficients. In this context, one could address a mean-field central limit question on the asymptotic behavior of the process √n E(‖x^n_j(t) − x̄_j(t)‖). Yet another extension would be to the time-average risk-sensitive cost functional. Finally, the approach needs to be compared with other risk-sensitive approaches, such as the mean-variance criterion, and extended to the case where the drift is a function of the state mean field and the control mean field.

REFERENCES

[1] S. Adlakha, R. Johari, G. Weintraub, and A. Goldsmith. Obliviousequilibrium for large-scale stochastic games with unbounded costs. Proc.IEEE CDC, Cancun, Mexico, pages 5531–5538, 2008.

[2] D. Aldous. Exchangeability and related topics. In Hennequin, P., editor,Ecole d’ Ete de Probabilites de Saint-Flour XIII - 1983, Springer-Verlag, Heidelberg. Lecture Notes in Mathematics 1117, pages 1–198,1985.

[3] T. Basar. Nash equilibria of risk-sensitive nonlinear stochastic differen-tial games. J. of Optimization Theory and Applications, 100(3):479–498,1999.

[4] M. Bardi. Explicit solutions of some linear-quadratic mean field games.Workshop on Mean Field Games, Roma, 2011.

[5] T. Basar and G. J. Olsder. Dynamic noncooperative game theory,volume 23. Society for Industrial and Applied Mathematics (SIAM),1999.

12

[6] A. Bensoussan, K. C. J. Sung, S. C. P. Yam, and S. P. Yung. Linear-quadratic mean field games. Available at http://www.sta.cuhk.edu.hk/scpy/Preprints/, 2011.

[7] A. Bensoussan and J. H. van Schuppen. Optimal control of partially observable stochastic systems with an exponential-of-integral performance index. SIAM J. Control and Optimization, 23:599-613, 1985.

[8] J. Bergin and D. Bernhardt. Anonymous sequential games with aggregate uncertainty. J. Mathematical Economics, 21:543-562, 1992.

[9] F. Cucker and S. Smale. Emergent behavior in flocks. IEEE Trans. Automat. Control, 52, 2007.

[10] F. Cucker and S. Smale. On the mathematics of emergence. Japan. J. Math., 2:197-227, 2007.

[11] D. A. Dawson. Critical dynamics and fluctuations for a mean-field model of cooperative behavior. Journal of Statistical Physics, 31:29-85, 1983.

[12] B. de Finetti. Funzione caratteristica di un fenomeno aleatorio. Atti della R. Accademia Nazionale dei Lincei, Serie 6, Memorie, Classe di Scienze Fisiche, Matematiche e Naturali, 4:251-299, 1931.

[13] C. Graham. Chaoticity on path space for a queueing network with selection of the shortest queue among several. Journal of Applied Probability, 37:198-211, 2000.

[14] O. Gueant. Mean field games - uniqueness result. Course notes, 2011.

[15] O. Gueant, J.-M. Lasry, and P.-L. Lions. Mean field games and applications. In R. Carmona and N. Touzi, editors, Paris-Princeton Lectures on Mathematical Finance. Springer, 2010.

[16] H. Tembine and M. Huang. Mean field stochastic difference games: McKean-Vlasov dynamics. CDC-ECC, 50th IEEE Conference on Decision and Control and European Control Conference, Orlando, Florida, December 12-15, 2011.

[17] E. Hewitt and L. J. Savage. Symmetric measures on Cartesian products. Transactions of the American Mathematical Society, 80:470-501, 1955.

[18] M. Huang, P. E. Caines, and R. P. Malhame. Large-population cost-coupled LQG problems with nonuniform agents: Individual-mass behavior and decentralized ε-Nash equilibria. IEEE Trans. Automat. Control, 52:1560-1571, 2007.

[19] M. Huang, R. P. Malhame, and P. E. Caines. Large population stochastic dynamic games: closed-loop McKean-Vlasov systems and the Nash certainty equivalence principle. Commun. Inf. Syst., 6(3):221-252, 2006.

[20] M. Y. Huang, P. E. Caines, and R. P. Malhame. Individual and mass behaviour in large population stochastic wireless power control problems: Centralized and Nash equilibrium solutions. IEEE Conference on Decision and Control, HI, USA, pages 98-103, December 2003.

[21] D. H. Jacobson. Optimal stochastic linear systems with exponential performance criteria and their relation to deterministic differential games. IEEE Trans. Automat. Contr., 18(2):124-131, 1973.

[22] B. Jovanovic and R. W. Rosenthal. Anonymous sequential games. Journal of Mathematical Economics, 17:77-87, 1988.

[23] M. Kac. Foundations of kinetic theory. Proc. Third Berkeley Symp. on Math. Statist. and Prob., 3:171-197, 1956.

[24] P. Kotelenez and T. Kurtz. Macroscopic limits for stochastic partial differential equations of McKean-Vlasov type. Probability Theory and Related Fields, 146(1):189-222, 2010.

[25] Y. Kuramoto. Chemical Oscillations, Waves, and Turbulence. Springer, 1984.

[26] J.-M. Lasry and P.-L. Lions. Mean field games. Japan. J. Math., 2:229-260, 2007.

[27] P.-L. Lions. Cours jeux a champ moyens et applications. College de France, 2010.

[28] M. Y. Huang, R. P. Malhame, and P. E. Caines. Large population stochastic dynamic games: Closed-loop McKean-Vlasov systems and the Nash certainty equivalence principle. Special issue in honour of the 65th birthday of Tyrone Duncan, Communications in Information and Systems, 6(3):221-252, 2006.

[29] B. Oksendal. Stochastic Differential Equations: An Introduction with Applications (Universitext). Springer, 6th edition, 2003.

[30] Z. Pan and T. Basar. Model simplification and optimal control of stochastic singularly perturbed systems under exponentiated quadratic cost. SIAM J. Control and Optimization, 34(5):1734-1766, September 1996.

[31] A. S. Sznitman. Topics in propagation of chaos. In P.-L. Hennequin, editor, Ecole d'Ete de Probabilites de Saint-Flour XIX - 1989, Lecture Notes in Mathematics 1464, pages 165-251. Springer-Verlag, 1991.

[32] Y. Tanabe. The propagation of chaos for interacting individuals in a large population. Mathematical Social Sciences, 51:125-152, 2006.

[33] H. Tembine. Mean field stochastic games. Notes, Supelec, October 2010.

[34] H. Tembine. Mean field stochastic games: Convergence, Q/H-learning and optimality. In Proc. American Control Conference (ACC), San Francisco, California, USA, 2011.

[35] H. Tembine, S. Lasaulce, and M. Jungers. Joint power control-allocation for green cognitive wireless networks using mean field theory. Proc. 5th IEEE Intl. Conf. on Cognitive Radio Oriented Wireless Networks and Communications (CROWNCOM), pages 1-5, 2010.

[36] H. Tembine, J. Y. Le Boudec, R. ElAzouzi, and E. Altman. Mean field asymptotic of Markov decision evolutionary games and teams. Proc. International Conference on Game Theory for Networks (GameNets), Istanbul, Turkey, pages 140-150, May 2009.

[37] G. Y. Weintraub, L. Benkard, and B. Van Roy. Oblivious equilibrium: A mean field approximation for large-scale dynamic games. Advances in Neural Information Processing Systems, 18, 2005.

[38] P. Whittle. Risk-sensitive linear quadratic Gaussian control. Advances in Applied Probability, 13:764-777, 1981.

[39] H. Yin, P. G. Mehta, S. P. Meyn, and U. V. Shanbhag. Synchronization of coupled oscillators is a game. Proc. American Control Conference (ACC), Baltimore, MD, pages 1783-1790, 2010.

[40] J. Yong and X. Y. Zhou. Stochastic Controls. Springer, 1999.

APPENDIX I: BACKGROUND ON INDISTINGUISHABILITY

In this appendix, we provide an overview of the fundamental result on mean-field convergence of indistinguishable processes in a general setup. Let $\mathcal{X}$ be a complete and separable metric space, i.e., a Polish space. Let $S_n$ be the set of permutations of $\{1,2,\ldots,n\}$. The collection $\{x_1,\ldots,x_n\}$ is indistinguishable (or exchangeable) if, for every permutation $\pi \in S_n$, the new collection $\{x_{\pi(1)},\ldots,x_{\pi(n)}\}$ has the same distribution. The infinite sequence $x_1, x_2, \ldots$ is indistinguishable if every finite subcollection $\{x_{j_1},\ldots,x_{j_n}\}$ is indistinguishable. For example, any i.i.d. sequence is exchangeable, whereas an exchangeable sequence need not be independent (e.g., a sequence that is i.i.d. only conditionally on a random parameter).

For indistinguishable (or exchangeable) processes, the convergence of the empirical measure $\frac{1}{n}\sum_{j=1}^{n} \delta_{x_j}$ has been widely studied. This topic sits at the intersection of group theory and probability theory: symmetry-group properties have been used to derive properties of the distributions of the processes. The next theorem provides the mean-field convergence of such a process.

Theorem 4 (de Finetti-Hewitt-Savage). Let $x_1, x_2, \ldots$ be an indistinguishable sequence of $\mathcal{X}$-valued random variables. Then there is a $\mathcal{P}(\mathcal{X})$-valued random variable $\mu$ such that
\[
\mu = \lim_{n} \frac{1}{n}\sum_{j=1}^{n} \delta_{x_j}, \quad \text{almost surely},
\]
where $\mathcal{P}(\mathcal{X})$ denotes the space of probability measures on $\mathcal{X}$. Moreover, conditioned on the random measure $\mu$, one has
\[
\lim_{n} \int \prod_{l=1}^{L} \phi_l(x_{j_l})\, \mu^n(dx^n) = \prod_{l=1}^{L} \Big( \int \phi_l \, d\mu \Big),
\]
for any fixed natural number $L \ge 2$ and any collection of measurable bounded functions $\{\phi_l\}_{1\le l\le L}$ defined on the state space $\mathcal{X}$.

Theorem 4 was proved by de Finetti (1931, [12]) for infinite binary sequences and extended by Hewitt and Savage (1955, [17]) to continuous and compact state spaces. A simple and elegant proof for the general state space can be found in Aldous (1985, [2]), pp. 18-22.
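As a quick illustration of Theorem 4, the following minimal numerical sketch (with hypothetical parameters, not taken from the paper) simulates an exchangeable but non-i.i.d. Gaussian sequence, obtained by first drawing a random mean and then sampling conditionally i.i.d.; the empirical measure converges to the random limit law, as the theorem predicts.

```python
# Minimal sketch of Theorem 4 (hypothetical parameters).
# Draw a random mixing variable theta once, then sample x_j ~ N(theta, 1)
# conditionally i.i.d.; the sequence is exchangeable but not independent.
# The empirical measure then converges to the *random* limit N(theta, 1).
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(0.0, 2.0)            # de Finetti mixing variable
for n in [10, 100, 1000, 10000]:
    x = theta + rng.normal(size=n)      # exchangeable sequence of length n
    # Empirical mean/variance should approach those of N(theta, 1)
    print(n, abs(x.mean() - theta), abs(x.var() - 1.0))
```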

APPENDIX II: PROOFS

Proof of Proposition 5: Under the stated standard assumptions on the drift $f$ and variance $\sigma$, the forward stochastic differential equation has a unique solution adapted to the filtration generated by the Brownian motions. We want to show that
\[
\mathbb{E}\Big( \sup_{t\in[0,T]} \| x^n_j(t) - \bar{x}_j(t) \| \Big) \le \frac{C_T}{\sqrt{n}},
\]

where $C_T$ is a positive constant which depends only on the bounds, on $T$, and on the Lipschitz constants of the coefficients of the drift and the variance term. First, we observe that for a fixed control $u$, the averaging terms $\frac{1}{n}\sum_{i=1}^{n} f(t,x_j,u,x_i)$ and $\frac{1}{n}\sum_{i=1}^{n} \sigma(t,x_j,u,x_i)$ are measurable, bounded, and Lipschitz with respect to the state, uniformly with respect to time.

Second, we observe that the bounds on the Lipschitz constants of the coefficients do not depend on the population size $n$.

Hence, $\int f(t,x,u,x')\, m_t(dx')$ and $\int \sigma(t,x,u,x')\, m_t(dx')$ are bounded and Lipschitz, uniformly with respect to $t$. Moreover, these coefficients are deterministic. This means that there is a unique solution to the limiting SDE and that the solution is measurable with respect to the filtration generated by the mutually independent Brownian motions.

Third, we evaluate the gap between the coefficients in order to obtain an estimate of the distance between the two processes. We start by evaluating the gap
\[
\mathbb{E}\,\bigg\| \frac{1}{n}\sum_{i=1}^{n} f(t,x,u,x_i) - \int f(t,x,u,x')\, m_t(dx') \bigg\|^2.
\]
Notice that $f$ returns a $k$-dimensional vector and $x$ belongs to $\mathbb{R}^k$. By reordering the above expression (in $2$-norm), we obtain

\[
\sum_{l=1}^{k} \mathrm{var}\Big( \frac{1}{n}\sum_{i=1}^{n} f_l(t,x_j,u,x_i) \Big) \le \frac{k}{n}\Big(1+\max_{l} b_l\Big)^2 \le \frac{C_T}{n}, \tag{18}
\]
where $\mathrm{var}(X)$ denotes the variance of $X$ and $b_l$ is a bound on the $l$-th component of the drift term (this exists because we have assumed boundedness conditions on the coefficients).
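To make the elementary step behind (18) explicit (a sketch, assuming the summands entering the average are independent with $|f_l| \le b_l$, as under the limiting coupling):
\[
\mathrm{var}\Big( \frac{1}{n}\sum_{i=1}^{n} f_l(t,x_j,u,x_i) \Big) = \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{var}\big( f_l(t,x_j,u,x_i) \big) \le \frac{b_l^2}{n} \le \frac{(1+\max_l b_l)^2}{n},
\]
and summing over the $k$ components yields the factor $k/n$ in (18).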

Following a similar reasoning, we obtain the bounds on thesecond term in σ , i.e.,

∑l,l′

var

(1n

n

∑i=1

σll′(t,x j,u,xi)

)≤ k

n(1+max

ll′cll′)

2 ≤ CT

n,

where $c_{ll'}$ is a bound on the entry $(l,l')$ of the matrix $\sigma$. The difference between $\bar{x}_j$ and $x^n_j$ can be expressed in integral form as
\[
\int_0^t \bigg[ \frac{1}{n}\sum_{i=1}^{n} f(s,x^n_j,u,x^n_i) - \int f(s,\bar{x}_j,u,x')\, m_s(dx') \bigg] ds
\]

and the deviation in terms of the variance terms can be written as
\[
\int_0^t \frac{1}{n}\sum_{i=1}^{n} \sigma(s,x^n_j,u,x^n_i)\, dB_s - \int_0^t \int \sigma(s,x^n_j,u,x')\, m_s(dx')\, dB_s.
\]
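The stochastic-integral deviation is controlled by the same variance estimate; as a sketch of this step (using the Ito isometry and the boundedness of $\sigma$):
\[
\mathbb{E}\,\bigg\| \int_0^t \Big[ \frac{1}{n}\sum_{i=1}^{n} \sigma(s,x^n_j,u,x^n_i) - \int \sigma(s,x^n_j,u,x')\, m_s(dx') \Big] dB_s \bigg\|^2 = \int_0^t \mathbb{E}\,\Big\| \frac{1}{n}\sum_{i=1}^{n} \sigma(s,x^n_j,u,x^n_i) - \int \sigma(s,x^n_j,u,x')\, m_s(dx') \Big\|^2 ds \le \frac{C_T\, t}{n}.
\]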

We now apply the following standard decomposition to estimate the SDEs:
\begin{align}
& \frac{1}{n}\sum_{i=1}^{n} f(t,x^n_j,u,x^n_i) - \int f(t,\bar{x}_j,u,x')\, m_t(dx') \tag{19}\\
&= \frac{1}{n}\sum_{i=1}^{n} f(t,x^n_j,u,x^n_i) - \frac{1}{n}\sum_{i=1}^{n} f(t,\bar{x}_j,u,x^n_i) \tag{20}\\
&\quad + \frac{1}{n}\sum_{i=1}^{n} f(t,\bar{x}_j,u,x^n_i) - \frac{1}{n}\sum_{i=1}^{n} f(t,\bar{x}_j,u,\bar{x}_i) \tag{21}\\
&\quad + \frac{1}{n}\sum_{i=1}^{n} f(t,\bar{x}_j,u,\bar{x}_i) - \int f(t,\bar{x}_j,u,x')\, m_t(dx'). \tag{22}
\end{align}

Next, using the Lipschitz conditions and the boundedness assumptions, we arrive at:
\begin{align}
& \bigg\| \frac{1}{n}\sum_{i=1}^{n} f(t,x^n_j,u,x^n_i) - \int f(t,\bar{x}_j,u,x')\, m_t(dx') \bigg\| \tag{24}\\
&\le L_f \| x^n_j - \bar{x}_j \| + \frac{L_f}{n}\sum_{i=1}^{n} \| x^n_i - \bar{x}_i \| \tag{25}\\
&\quad + \bigg\| \frac{1}{n}\sum_{i=1}^{n} f(t,\bar{x}_j,u,\bar{x}_i) - \int f(t,\bar{x}_j,u,x')\, m_t(dx') \bigg\|, \tag{26}
\end{align}

where $L_f > 0$ is a Lipschitz constant of $f$. We take the sum over $j$ and integrate over time $s \in [0,t]$. Then, using the Cauchy-Schwarz inequality and the boundedness property of $\sigma$, we obtain a recursive inequality for $\mathbb{E}(\| x^n_j(t) - \bar{x}_j(t) \|)$. Using the standard Gronwall estimates, we deduce that the mean quadratic gap between the two stochastic processes (starting from $x$ at time $0$) is of order $\frac{1}{n}$.
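To spell out the final step (a sketch under the stated assumptions): letting $g(t) := \mathbb{E}\big( \| x^n_j(t) - \bar{x}_j(t) \|^2 \big)$, the estimates above combine into
\[
g(t) \le a \int_0^t g(s)\, ds + \frac{C_T}{n}, \qquad t \in [0,T],
\]
for some constant $a > 0$ depending on $L_f$ and the bounds, and Gronwall's lemma then gives $g(t) \le \frac{C_T}{n} e^{aT}$; the claimed $O(1/\sqrt{n})$ rate follows after taking square roots (with a maximal inequality handling the supremum over $[0,T]$).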

Proof of Theorem 1: Under the stated regularity and boundedness assumptions, there is a solution to the McKean-Vlasov equation. Suppose that (i) and (ii) are satisfied. Then, $m_t = m^*(t,u^*(t))$ is the solution of the mean-field limit state dynamics, i.e., the macroscopic McKean-Vlasov PDE, when $m$ is substituted into the HJB equation. By fixing $f^*, c^*, \sigma$, we obtain a novel HJB equation for the mean-field stochastic game. Since the new PDE admits a solution according to (ii), the control $u^*(t) = u(t,x)$ minimizing $\partial_x v \cdot f + c$ is a best response to $m^*$ at time $t$. The optimal response of the individual player generates a mean-field limit which, in law, is a solution of the FPK PDE, and the players compute their controls as a function of this mean field. Thus, the consistency between the control, the state, and the mean field is guaranteed by assumption (i). It follows that $(u^*,m^*)$ is a solution to the fixed-point problem, i.e., a mean-field equilibrium, and a strongly time-consistent one.

Now, we look at the quadratic instantaneous cost case. In that case, we obtain the risk-sensitive equations provided in Proposition 3. The fact that any convergent subsequence of best responses to $m^n$ is a best response to $m^*$, and the fact that $u^*$ is an $\epsilon^*$-best response to the mean-field limit $m^*$, follow from the mean-field convergence of order $O\big(\frac{1}{\sqrt{n}}\big)$ and the continuity of the risk-sensitive quadratic cost functional.
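The consistency argument in the proof suggests a simple numerical fixed-point scheme: alternate a best-response computation given the mean field with a mean-field update given the control, until the pair stabilizes. Below is a minimal discrete toy sketch of this loop; the dynamics, costs, and sizes are hypothetical stand-ins, not the paper's model or algorithm.

```python
# Hypothetical toy fixed-point iteration capturing the consistency in Theorem 1:
# (i) given a mean field m, compute a best-response feedback by dynamic programming;
# (ii) given that feedback, regenerate the mean field as the law of the controlled chain;
# repeat until (u, m) is (approximately) a fixed point.
import numpy as np

S, A, T = 10, 3, 20                          # states, actions, horizon (toy sizes)
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] = next-state law (hypothetical)

def cost(s, a, m):
    return (s / S) ** 2 + 0.1 * a + 2.0 * m[s]   # congestion-type cost (hypothetical)

m = np.full(S, 1.0 / S)                      # initial mean field: uniform
for it in range(100):
    # (i) best response to m: finite-horizon backward induction
    v = np.zeros(S)
    for _ in range(T):
        q = np.array([[cost(s, a, m) + P[s, a] @ v for a in range(A)]
                      for s in range(S)])
        pol, v = q.argmin(axis=1), q.min(axis=1)
    # (ii) mean-field update: push the uniform initial law through the policy
    # (stationary-policy simplification of the time-varying flow)
    m_new = np.full(S, 1.0 / S)
    for _ in range(T):
        m_new = sum(m_new[s] * P[s, pol[s]] for s in range(S))
    if np.abs(m_new - m).max() < 1e-8:       # consistency: m reproduces itself
        break
    m = 0.5 * m + 0.5 * m_new                # damped update to aid convergence
print("fixed-point residual:", np.abs(m_new - m).max())
```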


Proof of Theorem 3: We provide a sufficient condition for the risk-sensitive mean-field game to have at most one smooth solution. Suppose $\delta > 0$, and $\sigma$ is a positive constant. Let $\bar{H}$ be the Hamiltonian associated with the risk-neutral mean-field system. Then the Hamiltonian for the risk-sensitive mean-field system is $H(x,p,m) = \bar{H} + \big(\frac{\epsilon\sigma^2}{2\delta}\big) \| p \|^2$. Assume that the dependence on $m$ is local, i.e., $H$ is a function of $m(x)$. The generic expression for the optimal control is given by $u^* = \partial_p H(x,\partial_x v, m_t(x))$ (note that the generic feedback control is expressed in terms of $H$, and not of $\bar{H}$).
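Concretely, under the convention above, the quadratic term shifts the risk-neutral feedback by a gradient term (a worked one-line consequence of the definition of $H$):
\[
u^* = \partial_p H(x,\partial_x v, m_t(x)) = \partial_p \bar{H}(x,\partial_x v, m_t(x)) + \frac{\epsilon\sigma^2}{\delta}\, \partial_x v,
\]
so the risk-sensitive best response is the risk-neutral one perturbed by $\frac{\epsilon\sigma^2}{\delta}\partial_x v$.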

Suppose that there exist two smooth solutions $(v_1, m_1)$, $(v_2, m_2)$ to the (normalized) risk-sensitive mean-field system. Consider the function $t \longmapsto \int_{\mathcal{X}} (v_2(x)-v_1(x))(m_2(x)-m_1(x))\, dx$. Observe that this function is $0$ at time $t=0$ because the measures coincide initially, and it is equal to $0$ at time $t=T$ because the final values coincide. Therefore, the function will be identically $0$ on $[0,T]$ if we can show that it is monotone. This will imply that the integrand is zero, and hence that one of the two terms, $(v_2(x)-v_1(x))$ or $(m_{2,t}(x)-m_{1,t}(x))$, must be $0$. Then, if the measures are identical, we use the HJB equation to obtain the result. If the value functions are identical, we use the FPK equation to show the uniqueness of the measure. Thus, it remains to find a sufficient condition for monotonicity, that is, a sufficient condition under which the quantity $\int_{\mathcal{X}} (v_2(x)-v_1(x))(m_2(x)-m_1(x))\, dx$ is monotone in time. We compute the time derivative
\[
S(t) := \frac{d}{dt}\bigg[ \int_{\mathcal{X}} (v_2(x)-v_1(x))(m_2(x)-m_1(x))\, dx \bigg].
\]

We interchange the order of integration and differentiation, and use the time derivative of a product, to arrive at
\[
S(t) = \int_{\mathcal{X}} (\partial_t v_2 - \partial_t v_1)(m_2(x)-m_1(x))\, dx + \int_{\mathcal{X}} (v_2 - v_1)(\partial_t m_2(x) - \partial_t m_1(x))\, dx.
\]

Now we expand the first term, $A := \int_{\mathcal{X}} (\partial_t v_2 - \partial_t v_1)(m_2(x)-m_1(x))\, dx$. Consider the two HJB equations:
\[
\partial_t v_1 + H(x,\partial_x v_1, m_1(x)) + \frac{1}{2}\sigma^2 \partial^2_{xx} v_1 = 0,
\]
\[
\partial_t v_2 + H(x,\partial_x v_2, m_2(x)) + \frac{1}{2}\sigma^2 \partial^2_{xx} v_2 = 0.
\]

To compute A, we take the difference between the two HJBequations above and multiply by m2− m1, which leads to

∂t v2−∂t v1

=−H(x,∂xv2, m2)+ H(x,∂xv1, m1)−12

σ2∂

2xxv2 +

12

σ2∂

2xxv1

Hence,
\begin{align*}
A := \int_{\mathcal{X}} [\partial_t v_2 - \partial_t v_1](m_2(x)-m_1(x))\, dx &= -\int_{\mathcal{X}} H(x,\partial_x v_2, m_2)(m_2(x)-m_1(x))\, dx \\
&\quad + \int_{\mathcal{X}} H(x,\partial_x v_1, m_1)(m_2(x)-m_1(x))\, dx \\
&\quad - \int_{\mathcal{X}} \frac{1}{2}\sigma^2 \partial^2_{xx}(v_2)(m_2(x)-m_1(x))\, dx \\
&\quad + \int_{\mathcal{X}} \frac{1}{2}\sigma^2 \partial^2_{xx}(v_1)(m_2(x)-m_1(x))\, dx.
\end{align*}

Next we expand the second term, $B := \int_{\mathcal{X}} (\partial_t m_2 - \partial_t m_1)(v_2 - v_1)\, dx$, using the FPK equations satisfied by $m_1$ and $m_2$. Note that the Laplacian terms cancel by integration by parts in the expression $A+B$. By collecting all the terms in $A+B$, we obtain
\begin{align*}
A+B &= -\int_{\mathcal{X}} H(x,\partial_x v_2, m_2)(m_2(x)-m_1(x))\, dx \\
&\quad + \int_{\mathcal{X}} H(x,\partial_x v_1, m_1)(m_2(x)-m_1(x))\, dx \\
&\quad + \int_{\mathcal{X}} m_2(x)\,[\partial_p H(x,\partial_x v_2, m_2)](\partial_x v_2 - \partial_x v_1)\, dx \\
&\quad - \int_{\mathcal{X}} m_1(x)\,[\partial_p H(x,\partial_x v_1, m_1)](\partial_x v_2 - \partial_x v_1)\, dx.
\end{align*}

Letting $S(t) = A+B$, we introduce
\[
m_\lambda := (1-\lambda) m_1 + \lambda m_2 = m_1 + \lambda (m_2 - m_1).
\]
The measure $m_\lambda$ starts from $m_1$ for the parameter $\lambda = 0$ and yields the measure $m_2$ for $\lambda = 1$. Similarly, define
\[
v_\lambda := (1-\lambda) v_1 + \lambda v_2.
\]
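Since the interpolations are affine in $\lambda$, the identities used repeatedly below follow immediately from the definitions:
\[
\partial_\lambda m_\lambda = m_2 - m_1, \qquad \partial_\lambda v_\lambda = v_2 - v_1, \qquad m_\lambda - m_1 = \lambda (m_2 - m_1), \qquad v_\lambda - v_1 = \lambda (v_2 - v_1).
\]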

Introduce an auxiliary integral parameterized by $\lambda$:
\begin{align*}
C_\lambda &:= -\int_{\mathcal{X}} H(x,\partial_x v_\lambda, m_\lambda)(m_\lambda(x)-m_1(x))\, dx \\
&\quad + \int_{\mathcal{X}} H(x,\partial_x v_1, m_1)(m_\lambda(x)-m_1(x))\, dx \\
&\quad + \int_{\mathcal{X}} m_\lambda(x)\,[\partial_p H(x,\partial_x v_\lambda, m_\lambda)](\partial_x v_\lambda - \partial_x v_1)\, dx \\
&\quad - \int_{\mathcal{X}} m_1(x)\,[\partial_p H(x,\partial_x v_1, m_1)](\partial_x v_\lambda - \partial_x v_1)\, dx.
\end{align*}

Substituting the relations $v_\lambda - v_1 = \lambda (v_2 - v_1)$ and $m_\lambda - m_1 = \lambda (m_2 - m_1)$, we obtain
\begin{align*}
\frac{C_\lambda}{\lambda} &= -\int_{\mathcal{X}} H(x,\partial_x v_\lambda, m_\lambda)(m_2(x)-m_1(x))\, dx \\
&\quad + \int_{\mathcal{X}} H(x,\partial_x v_1, m_1)(m_2(x)-m_1(x))\, dx \\
&\quad + \int_{\mathcal{X}} m_\lambda(x)\,[\partial_p H(x,\partial_x v_\lambda, m_\lambda)](\partial_x v_2 - \partial_x v_1)\, dx \\
&\quad - \int_{\mathcal{X}} m_1(x)\,[\partial_p H(x,\partial_x v_1, m_1)](\partial_x v_2 - \partial_x v_1)\, dx.
\end{align*}

Using the continuity of the terms on the right-hand side above and the compactness of $\mathcal{X}$, we deduce that
\[
\lim_{\lambda \to 0} \frac{C_\lambda}{\lambda} = 0.
\]

We next find a condition under which the one-dimensional function $\lambda \longmapsto \frac{C_\lambda}{\lambda}$ is monotone in $\lambda$. We need to compute the variation $\frac{d}{d\lambda}\big(\frac{C_\lambda}{\lambda}\big)$.

Suppose that $(x,p,m) \longmapsto H(x,p,m)$ is twice continuously differentiable with respect to $(p,m)$. Then,
\begin{align*}
\frac{d}{d\lambda}\Big(\frac{C_\lambda}{\lambda}\Big) &= -\int_{\mathcal{X}} \big[ \partial_p H(x,\partial_x v_\lambda, m_\lambda)(\partial_x v_2 - \partial_x v_1) \big](m_2(x)-m_1(x))\, dx \\
&\quad - \int_{\mathcal{X}} \big[ \partial_m H(x,\partial_x v_\lambda, m_\lambda)(m_2(x)-m_1(x)) \big](m_2(x)-m_1(x))\, dx \\
&\quad + \int_{\mathcal{X}} \partial_\lambda \big( m_\lambda(x)\,[\partial_p H(x,\partial_x v_\lambda, m_\lambda)] \big)(\partial_x v_2 - \partial_x v_1)\, dx.
\end{align*}


Expanding the $\partial_\lambda$ term,
\begin{align*}
\frac{d}{d\lambda}\Big(\frac{C_\lambda}{\lambda}\Big) &= -\int_{\mathcal{X}} \partial_p H(x,\partial_x v_\lambda, m_\lambda)(\partial_x v_2 - \partial_x v_1)(m_2(x)-m_1(x))\, dx \\
&\quad - \int_{\mathcal{X}} \partial_m H(x,\partial_x v_\lambda, m_\lambda)(m_2(x)-m_1(x))^2\, dx \\
&\quad + \int_{\mathcal{X}} (m_2 - m_1)\,[\partial_p H(x,\partial_x v_\lambda, m_\lambda)](\partial_x v_2 - \partial_x v_1)\, dx \\
&\quad + \int_{\mathcal{X}} m_\lambda\, \partial_\lambda [\partial_p H(x,\partial_x v_\lambda, m_\lambda)](\partial_x v_2 - \partial_x v_1)\, dx.
\end{align*}

Computation of the term $m_\lambda(x)\, \partial_\lambda \big([\partial_p H(x,\partial_x v_\lambda, m_\lambda)]\big)$ yields
\[
D_\lambda = \partial_\lambda [\partial_p H(x,\partial_x v_\lambda, m_\lambda)] = \partial^2_{pp} H \cdot (\partial_x v_2 - \partial_x v_1) + \partial^2_{mp} H \cdot (m_2 - m_1),
\]

and we obtain
\begin{align*}
\frac{d}{d\lambda}\Big(\frac{C_\lambda}{\lambda}\Big) &= -\int_{\mathcal{X}} \partial_p H(x,\partial_x v_\lambda, m_\lambda)(\partial_x v_2 - \partial_x v_1)(m_2(x)-m_1(x))\, dx \\
&\quad - \int_{\mathcal{X}} \partial_m H(x,\partial_x v_\lambda, m_\lambda)(m_2(x)-m_1(x))^2\, dx \\
&\quad + \int_{\mathcal{X}} (m_2 - m_1)\,[\partial_p H(x,\partial_x v_\lambda, m_\lambda)](\partial_x v_2 - \partial_x v_1)\, dx \\
&\quad + \int_{\mathcal{X}} m_\lambda\, \partial^2_{pp} H \cdot (\partial_x v_2 - \partial_x v_1)^2 + m_\lambda\, \partial^2_{mp} H \cdot (m_2 - m_1)(\partial_x v_2 - \partial_x v_1)\, dx.
\end{align*}

The first and the third lines differ by
\[
-\int_{\mathcal{X}} \Big( \frac{\epsilon\sigma^2}{\delta} \langle \cdot, \nabla_x v \rangle \Big)(\partial_x v_2 - \partial_x v_1)(m_2(x)-m_1(x))\, dx.
\]

Hence, we obtain
\[
\frac{d}{d\lambda}\Big(\frac{C_\lambda}{\lambda}\Big) = \int_{\mathcal{X}} m_\lambda\, \big( \partial_x v_2 - \partial_x v_1,\; m_2 - m_1 \big) \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} \partial_x v_2 - \partial_x v_1 \\ m_2 - m_1 \end{pmatrix} dx,
\]

where
\[
a_{11} := \partial^2_{pp} H, \qquad
a_{21} := \frac{1}{2}\partial^2_{mp} H = \frac{1}{2}\partial^2_{mp} \bar{H} - \frac{\epsilon\sigma^2}{2\delta}\, p/m,
\]
\[
a_{12} := \frac{1}{2}(\partial^2_{pm} H)' = \frac{1}{2}(\partial^2_{pm} \bar{H})' - \frac{\epsilon\sigma^2}{2\delta}\, p/m, \qquad
a_{22} := -\frac{\partial_m H}{m}.
\]

Suppose that for all $(x,p,m) \in \mathcal{X} \times \mathbb{R}^d \times \mathbb{R}_+$, we have
\[
\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \succeq 0.
\]
Then the monotonicity follows, and this completes the proof.
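In practice, the sufficient condition above can be checked pointwise. The following minimal sketch samples $(x,p,m)$ on a grid and tests positive semidefiniteness numerically; the Hamiltonian derivatives below are hypothetical stand-ins, to be replaced by those of the model at hand.

```python
# Hypothetical numerical check of the sufficient condition: sample (x, p, m)
# on a grid and verify that the 2x2 matrix [[a11, a12], [a21, a22]] is
# positive semidefinite (all eigenvalues >= 0). The entries below are
# stand-ins for the second derivatives of a convex-in-p Hamiltonian.
import numpy as np

eps, sigma, delta = 0.5, 1.0, 1.0

def entries(x, p, m):
    a11 = 1.0 + eps * sigma**2 / delta           # stand-in for d^2 H / dp^2
    a12 = a21 = -eps * sigma**2 / (2 * delta) * p / m
    a22 = 1.0 / m                                # stand-in for -d_m H / m
    return np.array([[a11, a12], [a21, a22]])

ok = all(np.linalg.eigvalsh(entries(x, p, m)).min() >= -1e-12
         for x in np.linspace(-1.0, 1.0, 5)
         for p in np.linspace(-1.0, 1.0, 5)
         for m in np.linspace(0.1, 2.0, 5))
print("PSD on the sampled grid:", ok)
```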