Proportional and double imitation rules for spectrum ...chen/index_fichiers/publications/comnet2013.pdf · Cognitive radio [1], with its capability to ﬂexibly conﬁg-ure its transmission

Computer Networks 57 (2013) 1863–1879

Contents lists available at SciVerse ScienceDirect

Computer Networks

journal homepage: www.elsevier .com/ locate/comnet

Proportional and double imitation rules for spectrum accessin cognitive radio networks q

1389-1286/$ - see front matter � 2013 Elsevier B.V. All rights reserved.http://dx.doi.org/10.1016/j.comnet.2013.03.008

q This work is supported by the project TEROPP (technologies forTERminal in OPPortunistic radio applications) funded by the FrenchNational Research Agency (ANR).⇑ Corresponding author. Tel.: +33 1 45 81 75 88; fax: +33 1 45 81 31 19.

E-mail addresses: [email protected] (S. Iellamo),[email protected] (L. Chen), [email protected] (M.Coupechoux).

Stefano Iellamo a, Lin Chen b, Marceau Coupechoux a,⇑a Telecom ParisTech, CNRS LTCI, 46 Rue Barrault, Paris 75013, Franceb University Paris-Sud XI, CNRS, LRI, 91405 Orsay, France

a r t i c l e i n f o a b s t r a c t

Article history:Received 29 June 2012Received in revised form 7 March 2013Accepted 8 March 2013Available online 26 March 2013

Keywords:Cognitive radio networksResource allocationPopulation gameLearning

In this paper, we tackle the problem of opportunistic spectrum access in large-scale cogni-tive radio networks, where the unlicensed Secondary Users (SUs) access the frequencychannels partially occupied by the licensed Primary Users (PUs). Each channel is character-ized by an availability probability unknown to the SUs. We apply population game theoryto model the spectrum access problem and develop distributed spectrum access policiesbased on imitation, a behavior rule widely applied in human societies consisting of imitat-ing successful behaviors. We develop two imitation-based spectrum access policies basedon the basic Proportional Imitation (PI) rule and the more advanced Double Imitation (DI)rule given that a SU can only imitate the other SUs operating on the same channel. A sys-tematic theoretical analysis is presented for both policies on the induced imitation dynam-ics and the convergence properties of the proposed policies to the Nash equilibrium. Simpleand natural, the proposed imitation-based spectrum access policies can be implementeddistributedly based on solely local interactions and thus is especially suited in decentral-ized adaptive learning environments as cognitive radio networks.

� 2013 Elsevier B.V. All rights reserved.

1. Introduction access mechanism is crucial to achieve efficient spectrum

Cognitive radio [1], with its capability to flexibly config-ure its transmission parameters, has emerged in recentyears as a promising paradigm to enable more efficientspectrum utilization. Spectrum access models in cognitiveradio networks can be classified into three categories,namely exclusive use (or operator sharing), commonsand shared use of primary licensed spectrum [2]. In the lastmodel, unlicensed Secondary Users (SUs) are allowed toaccess the spectrum of licensed Primary Users (PUs) in anopportunistic way. In this case, a well-designed spectrum

usage.In this paper, we focus on the generic model of cogni-

tive networks consisting of multiple frequency channels,each characterized by a channel availability probabilitydetermined by the activity of PUs on it. In such a model,from the SUs perspective, a challenging problem is to coor-dinate with other SUs in order to opportunistically accessthe unused spectrum of PUs to maximize its own payoff(e.g., throughput); at the system level, a crucial research is-sue is to design efficient spectrum access protocols achiev-ing optimal spectrum usage and load balancing on theavailable channels.

We tackle the spectrum access problem in large-scalecognitive radio networks from an evolutionary game theo-retic angle. We formulate the spectrum access problem,show the existence of a Nash Equilibrium (NE) and developdistributed spectrum access policies based on imitation, abehavior rule widely applied in human societies consistingof imitating successful behavior. We study the system

http://crossmark.dyndns.org/dialog/?doi=10.1016/j.comnet.2013.03.008&domain=pdf

http://dx.doi.org/10.1016/j.comnet.2013.03.008

mailto:[email protected]



http://dx.doi.org/10.1016/j.comnet.2013.03.008

http://www.sciencedirect.com/science/journal/13891286

http://www.elsevier.com/locate/comnet

1 Imitate If Better (IIB) is a rule consisting in picking a player andmigrating to its strategy if the latter has yielded a higher payoff than theachieved one. IIB is called Random Local Search in [18].

1864 S. Iellamo et al. / Computer Networks 57 (2013) 1863–1879

dynamics and the convergence of the proposed policies tothe NE when the SU population is large. Simple andnatural, the proposed spectrum access policies can beimplemented distributedly based on solely local interac-tions and thus is especially suited in decentralized adap-tive learning environments as cognitive radio networks.

In our analysis, we develop imitation-based spectrumaccess policies where a SU can only imitate the other SUsoperating on the same channel. More specifically, we pro-pose two spectrum access policies based on the followingtwo imitation rules: the Proportional Imitation (PI) rulewhere a SU can sample one other SU; the more advancedadjusted proportional imitation rule with double sampling(Double Imitation, DI) where a SU can sample two otherSUs. Under both imitation rules, each SU strives to improveits individual payoff by imitating other SUs with higherpayoff. A systematic theoretical analysis is presented forboth policies on the induced imitation dynamics and theconvergence properties of the proposed policies to the NE.

The key contribution of our work in this paper lies in thesystematical application of the natural imitation behaviorto address the spectrum access problem in cognitive radionetworks, the design of distributed imitation-based channelaccess policies, and the theoretic analysis on the induced imi-tation dynamics and the convergence to an efficient and sta-ble system equilibrium. In this paper, we extend the resultsof [3], where it is assumed that SUs are able to immediatelyand uniformly imitate any other SU. This assumption makesthe theoretical analysis straightforward from the literatureon imitation. We assume here that SUs can only imitateSUs on the same channel and obtain a delayed information,as a result of which significant changes should be done interms of policy design and theoretical analysis.

The rest of the paper is structured as follows. Section 2 dis-cusses related work in the literature. Section 3 presents thesystem model and Section 4 presents the formulation of thespectrum access game. Section 5 describes the proposed imi-tation-based spectrum access policies and motivates thechoices of proportional and double imitation rules as basis ofour policies. In Section 6, we study the system dynamics andthe convergence of our algorithms. Section 7 discusses theassumptions of our network model. Section 8 presents simula-tion based performance evaluation, where our schemes arecompared to another decentralized approach called Trial andError. Section 9 concludes the paper.

2. Related Work

The problem of distributed spectrum access in cognitiveradio networks (CRN) has been widely addressed in the lit-erature. A first set of papers assumes that the number ofSUs is smaller than the number of channels. In this case,the problem is closely related to the classical Multi-ArmedBandit (MAB) problem [4]. Some recent work has investi-gated the issue of adapting traditional MAB approachesto the CRN context, among which Anandkumar et al. pro-posed two algorithms with logarithmic regret, where thenumber of SUs is known or estimated by each SU [5].Contrary to this literature, we assume in our paper a largepopulation of SUs, able to share the available bandwidthwhen settling on the same channel.

Another important thrust consists of applying gametheory to model the competition and cooperation amongSUs and the interactions between SUs and PUs (see [6]for a review). Several papers propose for example algo-rithms based on no-regret learning (e.g. [7,8]), which arenot guaranteed to converge to the NE. Besides, due to theperceived fairness and allocation efficiency, auction tech-niques have also attracted considerable research attentionand resulted in a number of auction-based spectrum allo-cation mechanisms (cf. [9] and references therein). Thesolution proposed in this paper differs from the existingapproaches in that it requires only local interactionsamong SUs and is thus naturally adapted in the distributedenvironments as CRNs.

Due to the success of applying evolutionary [10] andpopulation game theories [11] in the study of biologicaland economic problems [11], a handful of recent studieshave applied these tools to study resource allocation prob-lems arisen from wired and wireless networks (see e.g.[12,13]), among which Shakkottai et al. addressed theproblem of non-cooperative multi-homing of users toWLANs access points by modeling it as a population game[14]. Authors however focus on the system dynamicsrather than on the distributed algorithms as we do in thispaper. Niyato et al. studied the dynamics of network selec-tion in a heterogeneous wireless network using the theoryof evolutionary game [15]. The proposed algorithm leadingto the replicator dynamics is however based on a central-ized controller able to broadcast to all users the averagepayoff. Our algorithms are on the contrary fully distrib-uted. Coucheney et al. studied the user-network associa-tion problem in wireless networks with multi-technologyand proposed an algorithm based on Trial and Error mech-anisms to achieve the fair and efficient solution [13].

Several theoretical works focus on imitation dynamics.Ackermann et al. investigated the concurrent imitationdynamics in the context of finite population symmetriccongestion games by focusing on the convergence proper-ties [16]. Berenbrik et al. applied the Proportional ImitationRule to load-balance system resources by focusing on theconvergence speed [17]. Ganesh et al. applied the ImitateIf Better rule1 (see [19] for a review on imitation rules) in or-der to load-balance the service rate of parallel server sys-tems [18]. Contrary to our work, it is assumed in [17,18]that a user is able to observe the load of another resource be-fore taking its decision to switch to this resource.

As it is supposed to model human behavior, imitation ismostly studied in economics. In the context of CRN, spe-cific protocol or hardware constraints may however ariseso that imitation dynamics are modified, as we show it inthis paper. Two very recent works in the context of CRNare [20,21], which have the same goals as ours. In [20],authors propose a distributed learning algorithm for spec-trum access. User decisions are based on their accumulatedexperience and they are using a mixed strategy. In [21],imitation is also used for distributed spectrum access.However, the proposed scheme relies on the existence of

Fig. 1. Network model: N SUs try to opportunistically access thespectrum left free by a PU. The spectrum is made of C slotted frequencychannels.

S. Iellamo et al. / Computer Networks 57 (2013) 1863–1879 1865

a common control channel for the sampling procedure.Double imitation is moreover not considered.

3. System model

In this section, we present the system model of ourwork with the notations used.

3.1. System model and PU operation

We consider the network model shown in Fig. 1 made ofa primary network and a secondary network. In the former,a primary transmitter is using on the downlink a set C of Cfrequency channels, each with bandwidth B. The primaryreceivers are operated in a synchronous time-slotted fash-ion. The secondary network is made of a set N of N SUs,which try to opportunistically access the channels whenthey are left free by the PU. We assume that SUs can per-form a perfect sensing of PU transmissions, i.e., no collisioncan occur between the PU and SUs. This assumption isadopted in the literature focusing on resource allocation(see e.g. [22,23,5]). The secondary network is also sup-posed to be sufficiently small so that every SU can receiveand decode packets sent on the same channel.

Let Zi(k) be the random variable equal to 1 when ofchannel i is unoccupied by any PU at slot k and 0 otherwise.We assume that the process {Zi(k)} is stationary and inde-pendent for each i and k, i.e., the Zi(k) are i.i.d. random vari-ables for all (i, k). We also assume that at each time slot,channel i is free with probability li, i.e., E½ZiðkÞ� ¼ li. With-out loss of generality, we assume l1 P l2 P � � �P lC. Thechannel availability probabilities l , {li} are a priori notknown by SUs.

Fig. 2. SU operation: Nb slots of a frequency channel form a block, SUs use a MACa throughput and strategy indication.

3.2. SU operation

We describe in this section the SU operation and capa-bilities. As shown in Fig. 2 for a given frequency channel j,time-slots of the primary network are organized intoblocks of Nb slots. All SUs are assumed to be synchronized,stay on the same channel during a block and may changetheir channel at block boundary. Let ni be the number ofSUs operating on channel i.

Assuming perfect sensing of the cognitive users, there isno secondary transmission during slots occupied by the PU(gray slots on the figure). When the PU is idle, SUs sharethe available bandwidth using a decentralized random ac-cess MAC protocol (hatched slots on the figure). The waythis MAC protocol is implemented is out of the scope ofthe paper. Mini-slots can for example be used at the begin-ning of each slot in order to perform CSMA, as assumed in[20], or CSMA/CA can be used, as assumed in [24]. Formathematical convenience, we will assume in Sections 5and 6 that the MAC protocol is perfect and operates likeTDMA. Our motivation of such an assumption is to concen-trate the analysis on the interactions between SUs and theresulting structural properties of the system equilibria. Theresults give an upper-bound on the performance of thedeveloped policies. A similar asymptotic analysis has beencarried on in [25].

In our work, each SU j is modeled as a rational decisionmaker, striking to maximize the throughput it can achieve,denoted as Ti

j when j operates on channel i. Assuming a fairMAC protocol and invoking symmetry reasons, all SUs onchannel i obtain the same expected throughput, whichcan be expressed as a function of ni as piðniÞ ¼ E½Ti

j� forall j operating on channel i. It should be noted that pi(ni)depends on the MAC protocol implemented at the cogni-tive users. An example is pi(ni) = Bli/ni in the case of a per-fect MAC protocol operating like TDMA, where B is aconstant standing for the channel bandwidth. Generically,pi(ni) can be rewritten as pi(ni) = BliS(ni) where S(ni) de-notes the throughput of a channel of unit bandwidth with-out PU. Without loss of generality, we will now assumethat B = 1. The assumption that SUs on the same channelobtain the same expected throughput can be found in theliterature using evolutionary game theory to study spec-trum access, see e.g. [15,26,20].

Channel availabilities, li, are estimated in the long termby SUs, while the expected throughput pi and the numberof SUs ni are estimated at the end of each block. In all theirtransmissions in block b, SUs include in the header of theirpackets the throughput pi(ni) obtained in block b � 1 and

protocol for their transmissions and include in the header of their packets


the corresponding channel (or strategy) i. We further as-sume that every SU can overhear at random one or twopackets transmitted by SUs on the same channel and de-code the throughput and strategy indications. The over-hearing of packets is called a sampling procedure and wewrite i [ j when SU i samples SU j. Sampling is supposedto be symmetric, i.e., the probabilities P(i [ j) and P(j [ i)are identical. After the sampling, SU transmitters commu-nicate to their receiver a channel change order to be exe-cuted at the next block boundary.

4. Spectrum access game formulation

To study the interactions among autonomous SUs and toderive distributed channel access policies, we formulate inthis section the channel selection problem as a spectrum ac-cess game where the players are the SUs and we show theuniqueness of the Nash Equilibrium (NE) when the numberof SUs is large. The game is defined formally as follows:

Definition 1. The spectrum access game G is a 3-tuple(N ; C; fUjg), whereN is the player set, C is the strategy set ofeach player. Each player j chooses its strategy sj 2 C tomaximize its payoff function Uj defined as Uj ¼ psj ðnsj Þ ¼E½Tsj

j �.The solution of the spectrum access game G is charac-

terized by a Nash equilibrium [27], a strategy profile fromwhich no player has incentive to deviate unilaterally.

Lemma 1. For the spectrum access game G, there exists atleast one Nash equilibrium.

Proof. Given the form of the SU payoff function, it followsfrom [28] that the spectrum access game is a congestiongame and a potential game with potential function:Pðn1; . . . ;nCÞ ¼

Pi2CPni

k¼1piðniÞ, where ni is the number ofSUs on channel i and

Pini ¼ N. This function takes only a

finite set of values and thus achieves a maximum value. h

We now consider the population game G, where (1)the number of SUs is large, (2) SUs are small, (3) SUsinteract anonymously and (4) payoffs are continuous(see [11] for the discussion on these assumptions). Inthis model, we focus on the system state x , fxi; i 2 Cgwhere xi denotes the proportion of SUs choosing channeli. In such context, by regarding xi as a continuous vari-able, we make the following assumption on the through-put function S(xiN).

Assumption 1. S(xiN) is strictly monotonously decreasingand it holds that S(xiN) 6 1/(Nxi).

We can now establish the uniqueness of the NE in thespectrum access game G for the asymptotic case in the fol-lowing lemma and theorem.

Lemma 2. For N sufficiently large, there is no empty channelat NE.

Proof. Assume, by contradiction, that at a NE, there are noSUs on channel i. Since there are C channels, at a NE, there

exists at least one channel where there are at least N/C SUs.Assume that this channel is channel j, i.e., nj P N/C. Con-sider a SU on channel j, its payoff is pj(nj) = ljS(N/C). FromAssumption 1, pj(nj) 6 ljC/N. Now let a SU in channel jswitch to channel i, its payoff becomes pi(1) = liS(1). Itholds straightforwardly that pj(nj) < pi(1) when N >

ljCliSð1Þ

.Hence there is no empty channel at NE. h

Theorem 1. For N sufficiently large, G admits a unique NE,where all SUs get the same payoff. Let y denote the root ofP

i2CS�1 y

li

� �¼ N, at the NE, there are S�1 y

li

� �SUs operating

on channel i.

Proof. This theorem is a classical result of populationgames. See Appendix A for more details. h

We can observe two desirable properties of the uniqueNE derived in Theorem 1: (1) the NE is optimal from thesystem perspective as the total throughput of the networkachieves its optimum at the NE and (2) at NE, all SUs obtainexactly the same throughput. Note that any state such thatxi > 0 for all i 2 C is also system optimal, the NE is one ofthem. Note also that when N grows indefinitely and asplayers are symmetric, the NE approaches the Wardropequilibrium of the system [29].

One critical challenge in the analyzed spectrum accessgame is the design of distributed spectrum access strate-gies for rational SUs to converge to the NE. In response tothis challenge, we develop in the sequel sections of this pa-per an efficient spectrum access policy.

5. Imitation-based spectrum access policies

The spectrum access policy we develop is based on imita-tion. As a behavior rule widely observed in human societies,imitation captures the behavior of a bounded rational playerthat mimics the actions of other players with higher payoff inorder to improve its own payoff from one block to the next,while ignoring the effect of its strategy on the future evolu-tions of the system and forgetting its past experience. The in-duced imitation dynamics model the spreading of successfulstrategies under imitation [30]. In this section, we developtwo spectrum access policies based on the proportional imi-tation rule and the double imitation rule. For tractability rea-sons, we assume in the next sections that pi(ni) = li/ni on achannel i, i.e., a perfect MAC protocol for SUs.

5.1. Motivation

In this first part, we recall some useful definitions givenin [30], we introduce new notations and we provide ourmotivations.

Definition 2. A behavioral rule with single sampling (resp.with double sampling) is a function F : C2 ! DðCÞ (resp.F : C3 ! DðCÞ), where DðCÞ is the set of probability distri-butions on C and Fk

i;j;8i; j; k 2 C (resp. Fki;j;l;8i; j; k; l 2 C) is the

probability of choosing channel k in the next iteration(block) after operating on channel i and sampling a SU withstrategy j (resp. sampling two SUs with strategies j and l).

1. Initialization: Set the parameters x, a and r = 1/(x � a).

Define [A]+,max{0, A} and QðUÞ , 2� U�a

x�a.2. At t = 0 and t = 1, randomly choose a channel and

store the payoff Uj(0).3. while at each iteration t P 2 do4. Let i and Uj be resp. the channel and the payoff

of j at t � 1.5. Randomly sample two SUs j1 and j2 (with

channels i1 and i2 and with payoffs Uj1and Uj2

resp.at t � 1). Suppose w.l.o.g. that Uj1

6 Uj2

6. if j{i, i1, i2}j = 1, i.e., i = i1 = i2 then7. Go to channel i.8. else if j{i, i1, i2}j = 2 then9. if i = i1, i – i2 and Uj 6 Uj2

then10. pj2

¼ r2 QðUjÞðUj2

� UjÞ.Switch to channel i2 w.p. pj2

and go tochannel i w.p. 1� pj2

.11. else if i1 = i2, i – i1 and Uj 6 Uj1

¼ Uj2then

12. pj1¼ r

2 ðQðUj1Þ þ QðUjÞÞðUj1

� UjÞ.Switch to channel i1 w.p. pj1

and go to


Definition 3. A behavioral rule with single sampling is imi-tating if Fk

i;j ¼ 0 when k R {i, j}. A behavioral rule with dou-ble sampling is imitating if Fk

i;j;l ¼ 0 when k R {i, j, l}.In this paper, we assume that all SUs adopt the same

behavioral rule, i.e., the population is monomorphic in thesense of Schlag [30] (see e.g. [31–33] for other papers usingthis notion).

Schlag has shown in [30] that the Proportional ImitationRule (PIR) is an improving rule, i.e., in any state of the sys-tem the expected average payoff is increasing after an iter-ation of the rule. He has also shown that it is a dominantrule, i.e., it always achieves a higher expected payoffimprovement than any other improving rule. PIR ismoreover the unique dominant rule that never imitates astrategy that achieved a lower payoff and that minimizesthe probability of switching among the set of dominantrules.

Schlag has also shown in [34] that the Double Imitation(DI) rule is the rule that causes less SUs to change theirstrategy after each iteration among the set of improvingbehavioral rules with double sampling. As switching mayrepresent a significant cost for today’s wireless devices interms of delay, packet loss and protocol overhead, thisproperty makes PIR and DI particularly attractive. Theseproperties motivate the design of spectrum access policiesbased on PIR and DI.

5.2. Spectrum access policy based on proportional imitation

Algorithm 1 presents our proposed spectrum accesspolicy based on the proportional imitation rule, termedas PISAP. The core idea is: At each iteration t, each SU(say j) randomly selects another SU (say j0) on the samechannel; if the payoff at t � 1 of the selected SU (denotedUj0 ðt � 1Þ) is higher than its own payoff at t � 1 (denotedUj(t � 1)), the SU imitates the strategy of the selected SUat the next iteration with a probability proportional tothe payoff difference, with coefficient the imitation factorr.2 The payoff and the strategy at t � 1 of the sampled SUare read from the packet header.

channel i w.p. 1� pj1.

13. end if14. else if j{i, i1, i2}j = 3 then15. if Uj 6 Uj1

6 Uj2then

16. pj1 ¼r2 ½QðUjÞðUj1

� Uj2Þ þ QðUj2

ÞðUj1� UjÞ�þ.

pj2¼ r

2 ½QðUj1ÞðUj2

� UjÞ þ QðUj2ÞðUj1

� UjÞ� � pj1.

Switch to channel i1 w.p. pj1, to channel

i2 w.p. pj2and go to channel i w.p. 1� pj1

� pj2.

17. else if Uj16 Uj 6 Uj2

then18. pj2 ¼

r2 ½QðUj1

ÞðUj2� UjÞ þ QðUj2

ÞðUj1� UjÞ�þ.

Switch to channel i2 w.p. pj2and go to

channel i w.p. 1� pj2.

19. end if

Algorithm 1. PISAP: Executed at each SU j

1: Initialization: Set the imitation factor r2: At t = 0, randomly choose a channel to stay and

store the payoff Uj(0).3: while at each iteration t P 1 do4: Randomly select a SU j0

5: if Ujðt � 1Þ < Uj0 ðt � 1Þ then6: Migrate to the channel sj0 ðt � 1Þ with

probability p ¼ rðUj0 ðt � 1Þ � Ujðt � 1ÞÞ7: endif8: endwhile

2 One way of setting r is to set r = 1/(x � a), where x and a are twoexogenous parameters such that Uj 2 ½a;x�;8j 2 C. In our case, x = 1 anda = 0 can be chosen.

5.3. Spectrum access policy based on double imitation

In this subsection, we turn to a more advanced imita-tion rule, the double imitation rule [34] and propose theDI-based spectrum access policy, termed as DISAP. UnderDISAP, each SU randomly samples two SUs on the samechannel by decoding two packet headers. It then imitatesthem with a certain probability determined by the payoffdifferences. The spectrum access policy based on the dou-ble imitation is detailed in Algorithm 2, in which each SU j(with payoff Uj and strategy i at t � 1) randomly samplestwo other SUs j1 and j2 (operating at t � 1 on channel i1and i2 respectively and, without loss of generality, withutilities Uj1 6 Uj2 ) and updates the probabilities of switch-ing to channels i1 and i2, denoted as pj1

and pj2respectively.

Algorithm 2. DISAP: Executed at each SU j.

20. else21. Go to channel i.22. end if23. end while


5.4. Discussion

As pointed in [34], double imitation may seem compli-cated compared to the proportional imitation rule. We canhowever extract the following properties [34]: DI is an imi-tating rule, i.e., a SU never chooses a channel that is not inhis sample; switching probabilities are continuous in thesampled payoffs and increase with payoff differences; fora joint sample with three different channels, the most suc-cessful channel is chosen more likely; a SU never imitatesanother SU that obtains a lower payoff.

Note that PISAP is clearly different from the propor-tional imitation rule presented in [30] in the sense that aSU is not able to uniformly sample another SU across thenetwork. The radio constraint indeed forces it to samplea SU on the same channel. As in this case, current strategiesare identical, imitation is based on the previous iteration.

If every SU is able to uniformly sample another SU inthe network and to imitate the current strategy, the systemdynamics is straightforward to obtain. It is indeed showne.g. in [35] that in the asymptotic case (assuming continu-ous time for simplicity), the proportional imitation rulegenerates a population dynamics described by the follow-ing set of differential equations:

_xiðtÞ ¼ rxiðtÞ½piðtÞ � pðtÞ�; 8i 2 C; ð1Þ

where �p ,P

i2Cxipi denotes the expected payoff of all SUsin the network. This equation can be easily solved as:

xiðtÞ ¼ xið0Þ �liPl2Cll

� �e�P

l2Cll

� �rt þ liP

l2Cll; 8i 2 C:

ð2Þ

The imitation dynamics induced by PIR thus convergesexponentially in time to an evolutionary equilibrium,which is also the NE of G.

With the same assumption, the double imitation rulegenerates in the asymptotic case an aggregate monotonedynamics [34,36], which is defined as follows:

_xi ¼xi

x� a1þx� �p

x� a

� ðpi � �pÞ; 8i 2 C; ð3Þ

whose solution is

xiðtÞ ¼ xið0Þ �liPl2Cll

� �e�

�px�a 1þx��p

x�að Þt þ liPl2Cll

; 8i 2 C:

ð4Þ

As a consequence, DI converges exponentially in time tothe NE of the spectrum access game G, however at a higherrate than PIR because by definition r ¼ 1

x�a and x and abeing upper and lower bounds on payoffs, x��p

x�a P 0. From(2) and (4), it turns out that the aggregate monotonedynamics is a time-rescaled version of the replicatordynamics, as pointed in [10]. We will see in the next sec-tion that both dynamics continue to play an important rolein our model.

As desirable properties, the proposed imitation-basedspectrum access policies (both PISAP and DISAP) are state-less, incentive-compatible for selfish autonomous SUs andrequires no central computational unit. The spectrumassignment is achieved by local interactions among auton-

omous SUs. The autonomous behavior and decentralizedimplementation make the proposed policies especiallysuitable for large scale cognitive radio networks. The imita-tion factor r controls the tradeoff between the conver-gence speed and the channel switching frequency in thatlarger r represents more aggressiveness in imitation andthus leads to fast convergence, at the price of more fre-quent channel switching for the SUs.

6. Imitation dynamics and convergence

We have seen that proportional imitation and doubleimitation rules generate a replicator dynamics and anaggregate monotone dynamics. In the sequel analysis, westudy the induced imitation dynamics and the convergenceof the proposed spectrum access policies PISAP and DISAP,which take into account the constraint imposed by SUradios.

6.1. System dynamics

In this subsection, we first derive in Theorem 2 thedynamics for a generic imitation rule F with large popula-tion. We then derive in Lemma 3, Theorems 3 and 4 thedynamics of the proposed proportional imitation policy PI-SAP and study its convergence. The counterpart analysisfor the double imitation policy DISAP is explored in Lemma4, Theorems 5 and 6.

We start by introducing the notation used in our analysis.At an iteration, we label all SUs performing strategy i (chan-nel i in our case) as SUs of type i and we refer to the SUs on sj

as neighbors of SU j. We denote nliðtÞ the number of SUs on

channel i at iteration t and operating on channel l at t � 1.It holds that

Pl2Cn

liðtÞ ¼ niðtÞ and

Pi2Cn

liðtÞ ¼ nlðt � 1Þ. For

a given state sðtÞ , fsjðtÞ; j 2 Ng of the system at iteration tand for a finite population of size N, we denote pi(t) , ni(t)/N the proportion of SUs of type i and pl

iðtÞ , nliðtÞ=N the pro-

portion of SUs migrating from channel l to i. We use x insteadof p to denote these proportions when N is large. It holds thatp ? x when N ? +1.

Denote by F a generic imitation rule under the channelconstraint. In the case of a simple imitation rule (e.g. PI-SAP), F is characterized by the probability set fFi

j;kg, whereFi

j;k denotes the probability that a SU choosing strategy j atthe precedent iteration imitates another SU choosing strat-egy k at the precedent iteration and then switches to chan-nel i at next iteration after imitation. Instead, by applying adouble imitation rule (e.g. DISAP), we can characterize F bythe probability set fFi

j;k;lg, where Fij;k;l denotes the probabil-

ity that a SU choosing strategy j at the precedent iterationimitates two neighbors choosing respectively strategy kand strategy l at the precedent iteration and then switchesto channel i at the next iteration after imitation. In bothcases the only way to switch to a channel i is to imitate aSU that was on channel i. That means Fi

j;k ¼ 0;8k – i (PI-SAP) and Fi

j;k;l ¼ 0, "k, l – i (DISAP).At the initialization phase (iterations 0 and 1), each SU

randomly chooses its strategy with uniform distribution.In the asymptotic case, we thus have 8i 2 C; xið0Þ > 0 andxi(1) > 0 a.s. After that, the system state at iteration t + 1,


denoted as p(t + 1) (x(t + 1) in the asymptotic case), de-pends on the states at iteration t and t � 1.

We have now the following theorem that relates the fi-nite and asymptotic cases.

Theorem 2. For any imitation rule F, if the imitation amongSUs of the same type occurs randomly and independently,then "d > 0, � > 0 and any initial state f~xið0Þg; f~xið1Þg suchthat 8i 2 C; ~xið0Þ > 0 and ~xið1Þ > 0, there exists N0 2 N suchthat if N > N0;8i 2 C, the event jpi(t) � xi(t)j > d occurswith probability less than � for all t, wherepið0Þ ¼ xið0Þ ¼ ~xið0Þ; pið1Þ ¼ xið1Þ ¼ ~xið1Þ. In the case of asimple imitation policy it holds that

xiðt þ 1Þ ¼X

j;l;k2C

xljðtÞxk

j ðtÞxjðtÞ

Fil;k 8i 2 C:

Differently, a double imitation policy yields:

xiðt þ 1Þ ¼X

j;l;k;z2C

xljðtÞxk

j ðtÞxzj ðtÞ

½xjðtÞ�2Fi

l;fk;zg 8i 2 C:

Proof. The proof consists of first showing the theoremholds for iteration t = 2 and then proving the case t P 3by induction. The detail is in Appendix B. h

Theorem 2 is a result on the short run adjustments oflarge populations under any generic imitation rule F: theprobability that the behavior of a large population differsfrom the one of an infinite population is arbitrarily smallwhen N is sufficiently large. In what follows, we studythe convergence of PISAP and DISAP specifically.

6.2. PISAP dynamics and convergence

In this section, we now focus on PISAP and derive theinduced imitation dynamics in the following analysis.

Lemma 3. On the proportional imitation policy PISAP underchannel constraint, it holds that

xjiðt þ 1Þ ¼

Xl;k2C

xljðtÞxk

j ðtÞxjðtÞ

Fil;k 8i; j 2 C: ð5Þ

2 4 6 8 100

0.1

0.2

0.3

0.4

0.5

0.6

0.7

tim

type

pro

porti

on x

i = m

i/N

Fig. 3. PISAP dynamics and its approxima

Proof. The proof is straightforward from the analysis inthe proof of Theorem 2. h

Theorem 3. The proportional imitation policy PISAP underchannel constraint generates the following dynamics in theasymptotic case:

xiðtþ1Þ¼ xiðt�1Þþrpiðt�1Þxiðt�1Þ�rXj;l2C

plðt�1Þxi

jðtÞxljðtÞ

xjðtÞ;

ð6Þ

where pi(t) denotes the expected payoff of an individual SU onchannel i at iteration t.

Proof. See Appendix C. h

Although we are not able to prove it theoretically, weobserve via extensive numerical experiments that (6) con-verges to the NE. The formal proof is left for future work. Toget more in-depth insight on the dynamics (6), we noticethat under the following approximation:

Xl2C

plðt � 1Þxl

jðtÞxjðtÞ

� �pðt � 1Þ; ð7Þ

where �pðt � 1Þ is the average individual payoff for thewhole system at iteration t � 1, noticingP

jxijðtÞ ¼ xiðt � 1Þ, (6) can be written as:

xiðt þ 1Þ ¼ xiðt � 1Þ þ rxiðt � 1Þ½piðt � 1Þ � �pðt � 1Þ�: ð8Þ

Note that the approximation (7) states that in any channel jat iteration t, the proportions of SUs coming from anychannel l are representative of the whole population.

Under the approximation (7), given the initial state{xi(0)}, {xi(1)}, we can decompose (8) into the followingtwo independent discrete-time replicator dynamics:

xiðuÞ ¼ xiðu� 1Þ þ rxiðu� 1Þ½piðu� 1Þ � �pðu� 1Þ�;xiðvÞ ¼ xiðv � 1Þ þ rxiðv � 1Þ½piðv � 1Þ � �pðv � 1Þ�;

ð9Þ

where u = 2t, v = 2t + 1. The two equations in (9) illustratethe underlying system dynamics hinged behind PISAP un-der the approximation (7): it can be decomposed into two

12 14 16 18 20

e

PISAP dynamic approx. with double replicators

x1

x3

x2

tion by double replicator dynamics.


independent delayed replicator dynamics that alterna-tively occur at the odd and even iterations, respectively.The following theorem establishes the convergence of (9)to a unique fixed point, which is also the NE of the spec-trum access game G.

Theorem 4. Starting from any initial point, the systemdescribed by (9) converges to a unique fixed point which isalso the NE of the spectrum access game G.

Proof. The proof, of which the detail is provided in Appen-dix D, consists of showing that the mapping described by(9) is a contraction mapping. h

As an illustrative example, Fig. 3 shows that the doublereplicator dynamics provides an accurate approximation ofthe system dynamics induced by PISAP.

6.3. DISAP dynamics and convergence

We now focus on DISAP and derive the induced imita-tion dynamics.

Lemma 4. On the double imitation policy DISAP underchannel constraint, it holds that

xjiðt þ 1Þ ¼

Xl;k;z2C

xljðtÞxk

j ðtÞxzj ðtÞ

½xjðtÞ�2Fi

l;k;z 8i; j 2 C: ð10Þ

Proof. The proof is straightforward from the analysis inthe proof of Theorem 2. h

Theorem 5. The double imitation policy DISAP under channelconstraint generates the following dynamics in the asymptoticcase:

xiðt þ 1Þ ¼ xiðt � 1Þ þX

j

xijðtÞQð�pjðt � 1ÞÞðpiðt � 1Þ

� �pjðt � 1ÞÞ; ð11Þ

where �pjðt � 1Þ ¼P

k

xkjðtÞ

xjðtÞpkðt � 1Þ and QðUÞ , 2� U�a

x�a.

Proof. See Appendix E. h

1 2 3 4 5 6

0.2

0.25

0.3

0.35

0.4

0.45

0.5

ti

type

pro

porti

on x

i=mi/N

DI ap

Fig. 4. DISAP dynamics and its approximation b

Again, we are not able to prove it analytically and leavethe formal proof as future work. However, we observe viaextensive numerical experiments that (11) converges tothe NE and, as shown in Fig. 5, is also characterized by asmoother and faster convergence with respect to the pro-portional imitation dynamics (Eq. (6)).

By performing the approximation �pjðt � 1Þ � �pðt � 1Þfor all j, (11) can be written as:

xiðt þ 1Þ ¼ xiðt � 1Þ þ xiðt � 1ÞQð�pðt � 1ÞÞðpiðt � 1Þ � �pðt � 1ÞÞ:ð12Þ

Given the initial state {xi(0)}, {xi(1)}, we can nowdecompose (12) into the following two independent dis-crete-time aggregate monotone dynamics:

xiðuÞ ¼ xiðu� 1Þ þ xiðu� 1Þ½2� �pðu� 1Þ�½piðu� 1Þ � �pðu� 1Þ�;xiðvÞ ¼ xiðv � 1Þ þ xiðv � 1Þ½2� �pðv � 1Þ�½piðv � 1Þ � �pðv � 1Þ�;

ð13Þ

where u = 2t, v = 2t + 1. The underlying system dynamicscan thus be decomposed into two independent delayedaggregate monotone dynamics that alternatively occur atthe odd and even iterations, respectively. The followingtheorem establishes the convergence of (13) to a uniquefixed point which is also the NE of the spectrum accessgame G. The proof follows exactly the same analysis as thatof Theorem 4.

Theorem 6. Starting from any initial point, the systemdescribed by (13) converges to a unique fixed point which isalso the NE of the spectrum access game G.

As an illustrative example, Fig. 4 shows that the doubleaggregate dynamics provides an accurate approximation ofthe system dynamics induced by DISAP.

7. Discussion

This paper is a first step to systematically apply imita-tion rules to cognitive radio networks. There are severalpoints to be tackled in order to make the model morerealistic.

� It is assumed in this paper that a SU can capture anotherSU packet for sampling with probability 1. Assuming acapture probability less than 1 would have the same

7 8 9 10 11 12me

SAP dynamicprox. with double aggregate monotone

y double aggregate monotone dynamics.


effect as decreasing the value of r in PISAP, i.e., it wouldslow down the convergence speed of the proposedalgorithms.� It is assumed that all SUs can all hear each other on the

same channel. A more realistic setting would consider agraph of possible communications between SUs. In thiscase, our algorithms are not any more ensured to con-verge. This point is left for further work. A promisingapproach is to use the results of the literature on ‘learn-ing from neighbors’, which studies the conditions underwhich efficient actions are adopted by a population ifagents receive information only from their neighbors(see e.g. [37]).� In this paper, SUs are supposed to provide in their

packet header the exact average throughput that canbe obtained on a given channel. We have investigatedin [24] the effect of providing only an estimate of theaverage throughput. Assuming the use of CSMA/CA asSU MAC protocol, we have shown by simulations thatour algorithms continue to converge in this more realis-tic context.� For mathematical convenience, we have assumed in

this paper that the SU MAC protocol was perfect andcould act as TDMA. Although unrealistic, this approachgives an upper bound on the performance of our poli-cies. Also, the analysis can be extended with other morerealistic MAC protocols by adapting the utility func-tions. Particularly, we have investigated in [24] theuse of CSMA/CA and shown by simulations the conver-gence of our policies.� It is assumed that a generic PU transmits with a certain

probability in TDMA-like mode. If there are multiple PUtransmitters it is possible to distinguish two cases:

1. The transmission of each PU covers the totality ofSU receivers. This scenario boils down to the caseof a unique generic PU transmitter.

2. The transmission of one or more PUs covers a sub-set of the SU receivers. In this case, different SUsmay have different perceptions of the environmentand a further analysis, based on the fact that thechannel availability probabilities are now depen-dent on both channel i and SU j, should be carriedon. This point is left for future work.

2 4 6 8 100

0.1

0.2

0.3

0.4

0.5

0.6

0.7

time

type

pro

porti

on x

i=mi/N

Fig. 5. PISAP and DISAP system dyn

8. Performance evaluation

In this section, we conduct simulations to evaluate theperformance of the proposed imitation-based channel ac-cess policies (PISAP and DISAP) and demonstrate someintrinsic properties of the policies, which are not explicitlyaddressed in the analytical part of the paper.

For performance comparison, we also show the resultsobtained by simulating Trial and Error [38] (shortened intoT&E in the following). The latter has been chosen as it is, tothe best of our knowledge, one of the best existing mecha-nisms that (1) applies to our model and (2) is guaranteed toconverge to a NE. In T&E, players locally implement a statemachine, so that at each iteration each player is character-ized by a state, which is defined by the triplet {cur-rent_mood, benchmark_mood, benchmark_strategy}. Playerscurrent mood (the four possible moods are: content, watch-ful, hopeful and discontent) reflects the machine reaction toits experience of the environment. A NE is reached wheneverybody is in state content.

8.1. Simulation settings

We simulate two cognitive radio networks, termed Net-work 1 and Network 2. We study the performance of ouralgorithms on Network 1, and compare their convergencebehaviors and fairness to the ones obtained by T&E on Net-work 2.

� Network 1: We consider N = 50 SUs, C = 3 channels char-acterized by the availability probabilitiesl = [0.3,0.5,0.8].� Network 2: We set N = 10, C = 2 and l = [0.2,0.8].

Note that the introduction of Network 2 has been nec-essary as the dynamics induced by T&E turns out to be veryslow to converge on the bigger Network 1 (after 105 itera-tions convergence is still not achieved).

We assume that the block duration is long enough, sothat the SUs, regardless of the occupied channel, can eval-uate their payoff without errors. T&E learning parameters(i.e., experimentation probability and benchmark moodacceptance ratio) are set at each iteration according to [39].

12 14 16 18 20

PISAP dynamic DISAP dynamic

amics in the asymptotic case.

0 500 1000 1500 2000 2500 3000 3500 4000 4500 50000

5

10

15

20

25

30

iteration t

num

ber o

f SU

s pe

r cha

nnel

channel 1

channel 2

channel 3

Fig. 6. PISAP on Network 1: number of SUs per channel as a function of the number of iterations.

500 1000 1500 2000 2500 3000 3500 4000 4500 50005

10

15

20

25

30

iteration t

num

ber o

f SU

s pe

r cha

nnel

Fig. 7. DISAP on Network 1: number of SUs per channel as a function of the number of iterations.


8.2. System dynamics

In Fig. 5, the trajectories described by (6) and (11) arecompared. The first part of the curves is characterized byimportant variations. This can be interpreted by the over-lap of two replicator/aggregate monotone dynamics atodd and even instants, as explained in Section 6. We ob-serve that, in the asymptotic case, DISAP outperforms PI-SAP as it is characterized by less pronounced waveletsand a faster convergence. However, both dynamics cor-rectly converge to an evolutionary equilibrium. It is easyto check that the converged equilibrium is also the NE ofG and the system optimum, which confirms our theoreticanalysis. The dynamics presented in Fig. 5 are valid in anasymptotic case, when the number of SUs is large. Wenow turn our attention to small size scenarios.

8.3. Convergence with finite number of SUs

We study in this section the convergence of PISAP andDISAP on Network 1 (N = 50, C = 3). Figs. 6 and 7 show arealization of our algorithms. We notice that an imita-tion-stable equilibrium is achieved progressively followingthe dynamics characterized by (6) and (11). The equilib-rium is furthermore very close to the system optimum:we can in fact check that, according to Theorem 1, the pro-

portion of SUs choosing channels 1, 2 and 3 at the systemoptimum is 0.1875, 0.3125 and 0.5 respectively; in thesimulation results we observe that there are 9, 16 and 25SUs settling on channels 1, 2 and 3 respectively. We alsonotice on this example that DISAP convergence is fasterthan PISAP convergence.

We now focus on Network 2 (N = 10, C = 2) and compareT&E convergence behavior (Fig. 8)) to the trends of PISAP(Fig. 9) and DISAP (Fig. 10). It is easy to notice that T&Econverges in a much slower and more chaotic way with re-spect to PISAP and DISAP. With T&E, the search of a NE mayturn out to be extremely long (in the realization depictedin Fig. 8, e.g., convergence is achieved within 3.5 � 103 iter-ations). On the contrary, PISAP and DISAP converge within75 and 32 iterations respectively.

8.4. System fairness

We now turn to the analysis of the fairness of the pro-posed spectrum access policies. To this end, we adopt theJain’s fairness index [40], which varies in [0,1] andreaches its maximum, when the resources are equallyshared amongst users. Figs. 11 and 12, whose curves rep-resent an average over 103 independent realizations onNetwork 2 of our algorithms and of T&E respectively,show that PISAP and DISAP clearly outperform T&E in

20 40 60 80 100 120 140 160 180 2001

2

3

4

5

6

7

8

9

iteration t

num

ber o

f SU

s pe

r cha

nnel channel 2

channel 1

Fig. 9. PISAP on Network 2: number of SUs per channel as a function of the number of iterations.

0 1000 2000 3000 4000 5000 6000 7000 8000 9000 100000

1

2

3

4

5

6

7

8

9

10

iteration t

num

ber o

f SU

s pe

r cha

nnel

channel 1

channel 2

Fig. 8. T&E on Network 2: number of SUs per channel as a function of the number of iterations.

10 20 30 40 50 60 70 80 90 1000

2

4

6

8

10

iteration t

num

ber o

f SU

s pe

r cha

nnel channel 2

channel 1

Fig. 10. DISAP on Network 2: number of SUs per channel as a function of the number of iterations.


terms both of fairness and convergence speed. Infact, while our system turns out to be very fair fromthe early iterations, T&E needs 6 � 103 iterations to getits system to reach a fairness value of 0.85. FromFig. 11, one can further infer that indeed DISAP convergesmore rapidly than PISAP: for example, a fairness index of0.982 is reached at t = 100 by DISAP and at t = 200 byPISAP.

8.5. Switching cost

At last, we concentrate on the switching frequency ofthe three algorithms because switching may represent asignificant cost for today’s wireless devices in terms of de-lay, packet loss and protocol overhead. In Fig. 13, we definethe switching cost at iteration t as the number of strategyswitches between 0 and t. After 200 iterations, the

0 50 100 150 2000

100

200

300

400

500

600

iteration t

switc

hing

cos

t

PISAP DISAP T&E

Fig. 13. Switching cost of PISAP, DISAP and T&E in Network 2.

0 50 100 150 200 250 3000.85

0.9

0.95

1

iteration t

Jain

’s fa

irnes

s in

dex PISAP

DISAP

Fig. 11. PISAP and DISAP on Network 1: Jain’s fairness index as a function of the number of iterations (average over 103 realizations).

1000 2000 3000 4000 5000 6000 7000 8000 9000 100000.4

0.5

0.6

0.7

0.8

0.9

iteration t

Jain

’s fa

irnes

s in

dex

Fig. 12. T&E on Network 2: Jain’s fairness index as a function of the number of iterations (average over 103 realizations).


switching cost of DISAP and PISAP has stabilized becauseconvergence has been reached. On the contrary, T&E exhib-its a fast growing cost.

8.6. Imperfect observations of PU activity

We assumed so far that cognitive radio users observethe channel activity of the primary user without errors.In this section, we investigate the performance of the pro-posed algorithms when the PU activity is imperfectly ob-

served by the SUs. We denote by Pe the SU probability oferror in detecting the PU activity and by Qe the miss detec-tion probability of an idle PU (probability of false alarm).The expected value of the payoff experienced by SUs onchannel i can be written as follows:

E½piðniÞ� ¼Xni�1

m¼0

ni � 1m

� �ð1� Q eÞmQ ni�1�m

e ð1� QeÞli

mþ 1;

ð14Þ

2 3 4 5 6 7

0.1

0.15

0.2

0.25

number of SUs

estim

ated

thro

ughp

ut

Qe=0.3

Qe=0

Qe=0.1

Qe=0.6

Fig. 14. Expected throughput deviation under sensing errors (li = 0.6).


which does not depend on Pe because a miss detectionof the PU activity does not affect the throughput ofany SU.

We now want to evaluate the impact of Qe on the ex-pected throughput estimates. To this end, we calculatethe values taken by (14) for different values of Qe and fordifferent numbers of SUs. Results are shown in Fig. 14. Sur-prisingly, we see that the estimates under sensing errorsrapidly converge to the values calculated for the ideal casewith no miss detections (i.e., Qe = 0). This is due to the factthat a trade-off arises. On the one hand, a SU, which is un-able to detect a free slot experiences a penalty in itsthroughput. On the other hand, there are less SUs in aver-age accessing free slots, which results in a higher through-put. As shown in Fig. 14, the two effects counterweightwhen the number of SUs gets larger. Hence, one can inferthat in practice the impact of miss detections of the PUactivity/inactivity is limited for a number of SUs on thesame channel greater than 5.

9. Conclusion and further work

In this paper, we address the spectrum access problemin cognitive radio networks by applying population gametheory and develop two imitation-based spectrum accesspolicies. In our model, a SU can only imitate the otherSUs operating on the same channel. This constraint makesthe basic proportional imitation and double imitation rulesirrelevant in our context. These two imitation rules arethus adapted to propose PISAP, a proportional imitationspectrum access policy, and DISAP, a double imitationspectrum access policy. A systematic theoretical analysisis presented on the induced imitation dynamics and theconvergence properties of the proposed policies to theNash equilibrium. Simulation results show the efficiencyof our algorithms even for small size scenarios. It is alsoshown that PISAP and DISAP outperform Trial and Errorin terms of convergence speed and fairness. As an impor-tant direction of the future work, we plan to investigatethe imitation-based channel access problem in the moregeneric multi-hop scenario where SUs can imitate theirneighbors and derive the relevant channel access policies.

Appendix A. Proof of Theorem 1

We first show that any point x ¼ ðxiÞi2C cannot be a NE ifthere exists i1 and i2 such that pi1 ðNxi1 Þ < pi2 ðNxi2 Þ. Other-wise, consider the strategy profile x0 where �N SUs movefrom channel i1 to i2. For N large and with sufficient small�, it follows from the continuity of pi(xi) thatpi1 ðNðxi1 � �ÞÞ < pi2 ðNðxi2 þ �ÞÞ, which indicates that byswitching from i1 to i2, one can increase its payoff. We thenproceed to show the second part of the theorem. To this end,let y denote the payoff of any SU at the NE, we have:liSðxiNÞ ¼ y;8i 2 C. It follows that xiN ¼ S�1 y

li

� �. Noticing

thatP

ixi ¼ 1, at the NE, we have:

Xi2C

S�1 yli

� �¼ N: ðA:1Þ

Since a NE is ensured to exist, (A.1) admits at least a solu-tion y. Moreover, it follows from the strict monotonicity ofS in Assumption 1 that its inverse function S�1 is alsostrictly monotonous. Hence (A.1) admits a unique solution.We thus complete the proof.

Appendix B. Proof of Theorem 2

We prove the statement for t = 2. The case for t P 3 isanalogous to [30], which can be shown by induction andis therefore omitted.

Define the random variable wjiðcÞ such that

wjiðcÞ ¼

1 if SU c is on channel j at iteration t ¼ 1and migrates to channel i at t ¼ 2;

0 otherwise:

8><>:

ðB:1Þ

We now distinguish two cases: proportional and doubleimitation.

B.1. Proportional imitation

By definition, if j – sc(1), it holds that wjiðcÞ ¼ 0. Other-

wise, c imitates with probabilitynk

sc ð1Þnsc ð1Þ

a SU that was using

channel k at t = 0 and that is currently (t = 1) on the same


channel as c (sc(1)), and then migrates to channel i with

probability Fiscð0Þ;k. Note that we allow for self-imitation in

our algorithm. At initial states, all strategies are supposedto be chosen by at least one SU (N is large), so thatnscð1Þ – 0. We thus have:

P wjiðcÞ ¼ 1

h i¼

0 if j – scð1Þ;Xk2C

nksc ð1Þ

nsc ð1ÞFi

scð0Þ;k otherwise:

8><>: ðB:2Þ

We can now derive the population proportions at itera-tion t = 2 as:

pjið2Þ ¼

1N

Xc2N

wjiðcÞ 8i; j 2 C: ðB:3Þ

The expectations of these proportions can now be writtenas (using the Kronecker delta di, j):

E½pjið2Þ� ¼

1N

Xc2N

P½wjiðcÞ ¼ 1� ðB:4Þ

¼ 1N

Xc2N ;k2C

nkscð1Þð1ÞF

iscð0Þ;kdj;scð1Þ

nscð1Þð1ÞðB:5Þ

¼ 1N

Xh;l;k2C

nlhð1Þnk

hð1ÞFil;kdj;h

nhð1ÞðB:6Þ

¼ 1N

Xl;k2C

nljð1Þnk

j ð1ÞFil;k

njð1ÞðB:7Þ

¼Xl;k2C

~xljð1Þ~xk

j ð1Þ~xjð1Þ

Fil;k: ðB:8Þ

It follows that

E½pið2Þ� ¼Xj2C

E½pjið2Þ� ¼

Xj;l;k2C

~xljð1Þ~xk

j ð1Þ~xjð1Þ

Fil;k: ðB:9Þ

As wjiðcÞ and wj

iðdÞ are independent random variables forc – d and since the variance of wj

iðcÞ is less than 1, the var-iance of pj

ið2Þ and pi(2) for any i; j 2 C are less than 1/N andC/N, respectively. It then follows the Bienaymé–Chebychevinequality that

8i 2 C; P½fjpið2Þ � E½pið2Þ�j > dg� < C

ðNdÞ2: ðB:10Þ

Choosing N0 such that CðN0dÞ2

< � concludes the proof fort = 2. The proof can then be induced to any t as in [30].

B.2. Double imitation

If j – sc(1), it holds that wjiðcÞ ¼ 0. Otherwise, c imitates

with probabilitynk

sc ð1ÞNsc ð1Þ

nzsc ð1Þ

nsc ð1Þtwo SUs that were using respec-

tively channel k and channel z at t = 0 and that are cur-rently (t = 1) on the same channel as c (sc(1)), and then

migrates to channel i with probability Fiscð0Þ;k;z.

The proof follows in the steps of the proportional imita-tion and only the main passages will be sketched out. We al-low a SU to sample twice the same SU on the channel, so that:

P wjiðcÞ ¼ 1

h i¼

0 if j – scð1Þ;Xk;z2C

nksc ð1Þ

nsc ð1Þ

nzsc ð1Þ

nsc ð1ÞFi

scð0Þ;k;z otherwise:

8><>:

ðB:11Þ

We then derive the proportions expectations:

E pjið2Þ

h i¼ 1

N

Xc2N

P½wjiðcÞ ¼ 1� ðB:12Þ

¼ 1N

Xc2N ;k;z2C

nksc ð1Þð1Þ

nsc ð1Þð1Þnz

scð1Þð1Þnscð1Þð1Þ

Fiscð0Þ;k;zdj;scð1Þ ðB:13Þ

¼X

l;k;z2C

~xljð1Þ~xk

j ð1Þ~xzj ð1Þ

½~xjð1Þ�2Fi

l;k;z: ðB:14Þ

It follows that:

E pið2Þ½ � ¼Xj2C

E pjið2Þ

h iðB:15Þ

¼X

j;l;k;z2C

~xljð1Þ~xk

j ð1Þ~xzj ð1Þ

½~xjð1Þ�2Fi

l;k;z: ðB:16Þ

The rest of the proof for the double imitation follows thesame way as that of proportional imitation.

Appendix C. Proof of Theorem 3

Recall the analysis in [30]. In this reference, Eq. (10)states that Fj

i;j ¼ Fij;i þ r½pj � pi�. We can now characterize

fFil;kg for PISAP as:

Fil;k ¼

0 if l;k – i;

Fli;lþr½piðt�1Þ�plðt�1Þ� if k¼ i and l – i;

1� Fik;i�r½pkðt�1Þ�piðt�1Þ� if l¼ i and k – i;

1 if l¼ k¼ i:

8>>>><>>>>:

The above four equations state that: (1) if none of the in-volved channels is i then the probability to switch to chan-nel i is null (F is imitating); (2) the switching probability isproportional to the payoff difference; (3) if a SU does notimitate, it stays on the same channel; (4) if a SU imitatesanother SU with the same strategy, its strategy is not mod-ified. Eq. (5) can now be written as follows:

xjiðtþ1Þ ¼

Xl–i

xljðtÞxi

jðtÞxjðtÞ

ðFli;lþr½piðt�1Þ�plðt�1Þ�Þ

þXk–i

xijðtÞxk

j ðtÞxjðtÞ

ð1� Fii;k�r½pkðt�1Þ�piðt�1Þ�Þþ

xi2j

xj

¼Xl–i

xljðtÞxi

jðtÞxjðtÞ

ð1þr½piðt�1Þ�plðt�1Þ�Þþxi2

j

xj

¼Xl–i

xljðtÞxi

jðtÞxjðtÞ

r½piðt�1Þ�plðt�1Þ�þXl2C

xljðtÞxi

jðtÞxjðtÞ

¼ xijðtÞþ

Xl2C

xljðtÞxi

jðtÞxjðtÞ

r½piðt�1Þ�plðt�1Þ�:

This concludes the proof.

Appendix D. Proof of Theorem 4

We prove the convergence of (9) by showing that themapping described by (9) is a contraction. A contractionmapping is defined [41] as follows: let (X, d) be a metricspace, f: X ? X is a contraction if there exists a constant

3 In this reference, Eq. (3) of Theorem 1 is wrong. The correct formula ishowever given in the proof of the theorem in Appendix.


k 2 [0,1) such that "x, y 2 X, d(f(x), f(y)) 6 kd(x, y), whered (x, y) = kx � yk = maxi jxi � yij. Such an f is called a con-traction and admits a unique fixed point, to which themapping described by f converges.

Noticing that

dðf ðxÞ; f ðyÞÞ ¼ kf ðxÞ � f ðyÞk 6 @f@x

��dðx; yÞ; ðD:1Þ

it suffices to show that the Jacobian @f@x

�� 6 k. In our case, itsuffices to show that kJk1 6 k, where J ¼ ðJijÞi;j2C is the Jaco-bian of the mapping described by one of the equation in(9), defined by Jij ¼ @xiðuÞ

@xjðu�1Þ.

Recall that pi ¼ liNxi

and �p ¼P

lllN , (9) can be rewritten

as:

xiðuÞ ¼ xiðu� 1Þ þ r li

N� xiðu� 1Þ

Xl

ll

N

" #: ðD:2Þ

It follows that

Jij ¼1�

Xl

llN if j ¼ i;

0 otherwise:

8><>: ðD:3Þ

Hence

kJk1 ¼maxi2N

Xj2NjJijj ¼ 1�

Xl

ll

N< 1; ðD:4Þ

which shows that the mapping described by (9) is a con-traction. It is further easy to check that the fixed point of(9) is x� ¼ liP

l2Nll

, which is also the unique NE of G.

Appendix E. Proof of Theorem 5

We start from the following equation (we skip the ref-erence to time on the right hand side after the first linefor the sake of clarity):

xjiðt þ 1Þ ¼

Xl;k;z2C

xljðtÞxk

j ðtÞxzj ðtÞ

½xjðtÞ�2Fi

l;k;z ðE:1Þ

¼xi

j

x2j

Xk–i

xk2j ½F

ii;k;k þ Fi

k;i;k þ Fik;k;i�

þxi2

j

x2j

Xk–i

xkj ½F

ik;i;i þ Fi

i;k;i þ Fii;i;k�

þxi

j

x2j

Xk–i

XlRfk;ig

xljx

kj ½F

ii;k;l þ Fi

k;i;l þ Fik;l;i� þ

xi3j

x2j

: ðE:2Þ

The second equality can be understood as follows. Fil;k;z – 0

only if at least one of the indices l, k, or z is equal to i. Thefirst sum of the right hand side (RHS) is obtained when twoindices are equal and different from i, the third one is equalto i. The second sum is obtained when one index is differ-ent from i and the two others are equal to i. The third sumis obtained when one index is equal to i and the two othersare different and different from i. The last term corre-sponds to the case where all indices are equal to i (in thiscase, obviously, Fi

i;i;i ¼ 1). Now we have:

Fii;k;k þ Fi

k;i;k þ Fik;k;i ¼ 2Fi

k;i;k þ 1� Fki;k;k; ðE:3Þ

Fik;i;i þ Fi

i;k;i þ Fii;i;k ¼ Fi

k;i;i þ 2ð1� Fki;i;kÞ; ðE:4Þ

Fii;k;l þ Fi

k;i;l þ Fik;l;i ¼ 1� Fk;l

i;k;l þ Fil;i;k þ Fi

k;i;l; ðE:5Þ

where Fk;li;k;l ¼ Fk

i;k;l þ Fli;k;l. Above, we used the fact that

8ði; j; k; lÞ; Fli;j;k ¼ Fl

i;k;j (i.e., there is no order in the samplingof two individuals) and Fi

i;k;l þ Fki;k;l þ Fl

i;k;l ¼ 1 (i.e., withprobability one, the SU goes to channel i, j or k at the nextiteration).

Moreover, we note that:

xij

x2j

Xk–i

xk2j þ 2xi

j

Xk–i

xkj þ

Xk–i

XlRfk;ig

xljx

kj þ xi2

j

" #

¼xi

j

x2j

Xk

xk2j þ 2xi

j

Xk–i

xkj þ

Xk–l

xljx

kj � xi

j

Xl–i

xlj � xi

j

Xk–i

xkj

" #

¼xi

j

x2j

Xk;l

xljx

kj ¼ xi

j:

ðE:6Þ

We used here the fact thatP

kxkj ¼ xj.

Eq. (E.2) can now be written (we skip the reference totime on the RHS, all xi

j are functions of t):

xjiðt þ 1Þ ¼ xi

j þxi

j

x2j

Xk–i

xk2j 2Fi

k;i;k � Fki;k;k

h i

þxi2

j

x2j

Xk–i

xkj Fi

k;i;i � 2Fki;i;k

h i

þxi

j

x2j

Xk–i

XlRfk;ig

xljx

kj Fi

l;i;k þ Fik;i;l � Fk;l

i;k;l

h i: ðE:7Þ

We now use the following property of the double imitation[34]3 for i R {j, k}:

Fj;ki;j;k � Fi

j;i;k � Fik;i;j ¼

12

Qðpkðt � 1ÞÞðpjðt � 1Þ � piðt � 1ÞÞ

þ 12

QðpjÞðpkðt � 1Þ � piðt � 1ÞÞ: ðE:8Þ

Payoffs p are functions of t � 1 because imitation is basedon the payoff obtained at the previous iteration). In partic-ular, for j = k, we obtain:

Fki;k;k � 2Fi

k;i;k ¼ Qðpkðt � 1ÞÞðpkðt � 1Þ � piðt � 1ÞÞ: ðE:9Þ

From these equations, we can simplify (E.7) into (skippingagain reference to time on the RHS):

xjiðt þ 1Þ ¼ xi

j þxi

j

x2j

Xk

xk2j QðpkÞðpi � pkÞ

þxi2

j

x2j

Xk

xkj QðpiÞðpi � pkÞ

þxi

j

x2j

Xk

XlRfk;ig

xljx

kj QðplÞðpi � pkÞ: ðE:10Þ


The term in the last double summation has been obtained byusing (E.8), separating the expression in two double sumsand interchanging indices j and k in the first double sum.Note also that all terms of the involved sums are null for k = i.

We now obtain:

xjiðtþ1Þ ¼ xi

jþxi

j

x2j

Xk

xkj ðpi�pkÞ xk

j QðpkÞþ xijQðpiÞþ

XlRfk;ig

xljQðplÞ

" #;

¼ xijþ xi

j

Xk

xkj

xjðpi�pkÞ

Xl

xlj

xjQðplÞ ¼ xi

jþ xijQð�pjÞðpi� �pjÞ;

ðE:11Þ

where �pjðt � 1Þ ¼P

k

xkjðtÞ

xjðtÞpkðt � 1Þ can be interpreted as

the average payoff at the previous iteration of SUs settlingnow on channel j. We now have:

xiðt þ 1Þ ¼X

j

xjiðt þ 1Þ ¼

Xj

xijðtÞ þ xi

jðtÞQð�pjðt � 1ÞÞðpiðt � 1Þ � �pjðt � 1ÞÞh i

¼ xiðt � 1Þ þX

j

xijðtÞQð�pjðt � 1ÞÞðpiðt � 1Þ � �pjðt � 1ÞÞ: ðE:12Þ

We used the fact thatP

jxijðt þ 1Þ ¼ xiðtÞ. This concludes

the proof.

References

[1] S. Haykin, Cognitive radio: brain-empowered wirelesscommunications, IEEE Journal on Selected Areas inCommunications 23 (2) (2005) 201–220.

[2] M. Buddhikot, Understanding dynamic spectrum access: models,taxonomy and challenges, in: Proc. IEEE Dynamic Spectrum AccessNetworks (DySPAN), Dublin, Ireland, 2007.

[3] S. Iellamo, L. Chen, M. Coupechoux, A.V. Vasilakos, Imitation-basedspectrum access policy for cognitive radio networks, in: Proc.International Symposium on Wireless Communication Systems(ISWCS), Paris, France, 2012.

[4] A. Mahajan, D. Teneketzis, Foundations and Applications of SensorManagement, Springer-Verlag, 2007. pp. 121–151 (Ch. Multi-armedBandit Problems).

[5] A. Anandkumar, N. Michael, A. Tang, Opportunistic spectrum accesswith multiple users: learning under competition, in: Proc. IEEEInternational Conference on Computer Communication (INFOCOM),San Diego, CA USA, 2010.

[6] J. Neel, J. Reed, R. Gilles, Convergence of cognitive radio networks, in:Proc. IEEE Wireless Communications and Networking Conference(WCNC), Atlanta, Georgia USA, 2004.

[7] N. Nie, C. Comaniciu, Adaptive channel allocation spectrum etiquettefor cognitive radio networks, ACM Mobile Networks andApplications (MONET) 11 (6) (2006) 779–797.

[8] Z. Han, C. Pandana, K. Liu, Distributive opportunistic spectrum accessfor cognitive radio using correlated equilibrium and no-regretlearning, in: Proc. IEEE Wireless Communications and NetworkingConference (WCNC), Hong Kong, China, 2007.

[9] L. Chen, S. Iellamo, M. Coupechoux, P. Godlewski, An auctionframework for spectrum allocation with interference constraint incognitive radio networks, in: Proc. IEEE International Conference onComputer Communication (INFOCOM), San Diego, California USA,2010.

[10] J. Hofbauer, K. Sigmund, Evolutionary game dynamics, Bulletin of theAmerican Mathematical Society 40 (4) (2003) 479–519.

[11] W.H. Sandholm, Population Games and Evolutionary Dynamics, TheMIT Press, 2010.

[12] H. Tembine, E. Altman, R. El-Azouzi, Y. Hayel, Bio-inspired delayedevolutionary game dynamics with networking applications,Telecommunication Systems 47 (1) (2011) 137–152.

[13] P. Coucheney, C. Toutati, B. Gaujal, Fair and efficient user-networkassociation algorithm for multi-technology wireless networks, in:Proc. IEEE International Conference on Computer Communication(INFOCOM), Rio de Janeiro, Brazil, 2009.

[14] S. Shakkottai, E. Altman, A. Kumar, Multihoming of users to accesspoints in WLANs: a population game perspective, IEEE Journal onSelected Areas in Communications 25 (6) (2007) 1207–1215.

[15] D. Niyato, E. Hossain, Dynamics of network selection inheterogeneous wireless networks: an evolutionary game approach,IEEE Transactions on Vehicular Technology 58 (4) (2009) 2008–2017.

[16] H. Ackermann, P. Berenbrink, S. Fischer, M. Hoefer, Concurrentimitation dynamics in congestion games, in: Proc. ACM Symposiumon Principles of Distributed Computing (PODC), Calgary, Canada,2009.

[17] P. Berenbrik, T. Friedetzky, L. Goldberg, P. Goldberg, Distributedselfish load balancing, in: Proc. ACM-SIAM Symposium on DiscreteAlgorithms (SODA), San Francisco, CA, 2011.

[18] A. Ganesh, S. Lilienthal, D. Manjunath, A. Proutiere, F. Simatos, Loadbalancing via random local search in closed and open systems, in:

Proc. ACM SIGMETRICS International Conference on Measurementand Modeling of Computer Systems, New York, NY, 2010.

[19] C. Alos-Ferrer, K.H. Schlag, The Handbook of Rational and SocialChoice, Oxford University Press, 2009 (Ch. Imitation and Learning).

[20] X. Chen, J. Huang, Evolutionarily Stable Spectrum Access, IEEETransactions on Mobile Computing, in press. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6185553&tag=1.

[21] X. Chen, J. Huang, Imitative spectrum access, in: Proc. InternationalSymposium on Modeling and Optimization in Mobile, Ad Hoc, andWireless Networks (WiOpt), Paderborn, Germany, 2012.

[22] S. Huang, X. Liu, Z. Ding, Opportunistic spectrum access incognitive radio networks, in: Proc. IEEE International Conferenceon Computer Communication (INFOCOM), Phoenix, Arizona USA,2008.

[23] M. Maskery, V. Krishnamurthy, Q. Zhao, Decentralized dynamicspectrum access for cognitive radios: cooperative design of a non-cooperative game, IEEE Transactions on Communications 57 (2)(2009) 459–469.

[24] S. Iellamo, L. Chen, M. Coupechoux, Imitation-based spectrum accesspolicy for CSMA/CA-based cognitive radio networks, in: Proc. IEEEWireless Communications and Networking Conference (WCNC),Paris, France, 2012.

[25] X. Chen, J. Huang, Spatial spectrum access game: Nash equilibria anddistributed learning, in: Proc. ACM SIGMOBILE InternationalSymposium on Mobile Ad Hoc Networking and Computing(MobiHoc), Hilton Head Island, South Carolina USA, 2012.

[26] L. Lai, H.E. Gamal, H. Jiang, H.V. Poor, Cognitive medium access:exploration, exploitation, and competition, IEEE Transactions onMobile Computing 10 (2) (2011) 239–253.

[27] R.B. Myerson, Game Theory: Analysis of Conflict, Harvard UniversityPress, Cambridge, MA, 1991.

[28] I. Milchtaich, Congestion games with player-specific payofffunctions, Games and Economic Behavior 13 (1) (1996) 111–124.

[29] A. Haurie, P. Marcotte, On the relationship between Nash, Cournotand Wardrop equilibria, Networks 15 (3) (1985) 295–308.

[30] K.H. Schlag, Why imitate, and if so, how? A boundedly rationalapproach to multi-armed bandits, Journal of Economic Theory 78 (1)(1998) 130–156.

[31] C. Alos-Ferrer, F. Shi, Imitation with asymmetric memory, EconomicTheory 49 (1) (2012) 193–215.

[32] B. Pradelski, H. Young, Learning efficient Nash equilibria indistributed systems, Games and Economic Behavior 75 (2) (2012)882–897.

[33] J. Elias, F. Martignon, E. Altman, Joint pricing and cognitive radionetwork selection: a game theoretical approach, in: Proc.International Symposium on Modeling and Optimization in Mobile,Ad Hoc, and Wireless Networks (WiOpt), Paderborn, Germany, 2012.

http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6185553&tag=1

http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6185553&tag=1


[34] K.H. Schlag, Which one should i imitate?, Journal of MathematicalEconomics 31 (4) (1999) 493–522

[35] W.H. Sandholm, Local stability under evolutionary game dynamics,Theoretical Economics 5 (1) (2010) 27–50.

[36] L. Sammuelson, J. Zhang, Evolutionary stability in asymmetricgames, Journal of Economic Theory 57 (2) (1992) 363–391.

[37] C. Alós-Ferrer, S. Weidenholzer, Contagion and efficiency, Journal ofEconomic Theory 143 (1) (2008) 251–274.

[38] H.P. Young, Learning by trial and error, Games and EconomicBehavior 65 (2) (2009) 626–643.

[39] B.S. Pradelski, H.P. Young, Efficiency and Equilibrium in Trial andError Learning, Discussion Paper, Department of Economics,University of Oxford, 2010.

[40] R. Jain, D. Chiu, W. Hawe, A Quantitative Measure of Fairness andDiscrimination for Resource Allocation in Shared Computer Systems,Research Report TR-301, DEC, 1984.

[41] R. Abraham, J. Marsden, T. Ratiu, Manifolds, Tensor Analysis, andApplications, Springer-Verlag, 1988.

Stefano Iellamo is currently a Ph.D. studentat Telecom ParisTech. He completed his Mas-ter’s degree in Telecommunications Engi-neering at Politecnico di Milano in 2009. Hisresearch activity, carried on at the Computerand Network Science department of TelecomParisTech, mainly focuses on resources allo-cation for Cognitive Radio Networks.

Lin Chen received his B.E. degree in RadioEngineering from Southeast University, Chinain 2002 and the Engineer Diploma fromTelecom ParisTech, Paris in 2005. He alsoholds a M.S. degree ofNetworking from theUniversity of Paris 6. He currently works asassociate professor in the department ofcomputer science of the University of Paris-Sud XI. His main research interests includemodeling and control for wireless networks,security and cooperation enforcement inwireless networks and game theory.

Marceau Coupechoux is Associate Professorat Telecom ParisTech since 2005. He obtainedhis master from Telecom ParisTech in 1999and from University of Stuttgart, Germany in2000, and his Ph.D. from Institut Eurecom,Sophia-Antipolis, France, in 2004. From 2000to 2005, he was with Alcatel-Lucent (in BellLabs former Research & Innovation and thenin the Network Design department). In theComputer and Network Science departmentof Telecom ParisTech, he is working on cellu-lar networks, wireless networks, ad hoc net-

works, cognitive networks, focusing mainly on layer 2 protocols,scheduling and resource management.

Documents

Proportional and double imitation rules for spectrum ...chen/index_fichiers/publications/comnet2013.pdf · Cognitive radio [1], with its capability to ﬂexibly conﬁg-ure its transmission