A Game Theoretic Approach to Distributed Opportunistic ... · Banchs et al.: A Game Theoretic Approach to Distributed Opportunistic Scheduling Abstract—Distributed Opportunistic

1

A Game Theoretic Approach to DistributedOpportunistic Scheduling

Albert Banchs, Senior Member, IEEE, Andres Garcia-Saavedra, Pablo Serrano, Member, IEEE, andJoerg Widmer, Senior Member, IEEE.

Banchs et al.: A Game Theoretic Approach to DistributedOpportunistic Scheduling

Abstract—Distributed Opportunistic Scheduling (DOS) is in-herently more difficult than conventional opportunistic schedul-ing due to the absence of a central entity that knows the channelstate of all stations. With DOS, stations use random access tocontend for the channel and, upon winning a contention, theymeasure the channel conditions. After measuring the channelconditions, a station only transmits if the channel quality isgood; otherwise, it gives up the transmission opportunity. Thedistributed nature of DOS makes it vulnerable to selfish users:by deviating from the protocol and using more transmissionopportunities, a selfish user can gain a greater share of wirelessresources at the expense of “well-behaved” users. In this paper,we address the problem of selfishness in DOS from a gametheoretic standpoint. We propose an algorithm that satisfiesthe following properties: (i) when all stations implement thealgorithm, the wireless network is driven to the optimal pointof operation, and (ii) one or more selfish stations cannot obtainany gain by deviating from the algorithm. The key idea ofthe algorithm is to react to a selfish station by using a moreaggressive configuration that (indirectly) punishes this station. Webuild on multivariable control theory to design a mechanism forpunishment that is sufficiently severe to prevent selfish behavioryet not so severe as to render the system unstable. We conducta game theoretic analysis based on repeated games to show thealgorithm’s effectiveness against selfish stations. These results areconfirmed by extensive simulations.

Index Terms—Contention-based channel access, distributedopportunistic scheduling, game theory, multivariable controltheory, repeated games, selfish stations, wireless networks

I. INTRODUCTION

OPPORTUNISTIC scheduling techniques have beenshown to significantly improve performance in wireless

networks. These techniques take advantage of the fluctuationsin the channel conditions of different wireless stations overtime; by selecting the station with the best instantaneouschannel for data transmission, opportunistic scheduling canutilize wireless resources more efficiently. A key assumptionof most opportunistic scheduling techniques [1], [2] is thatthe scheduler is centralized and has knowledge of the instan-taneous channel conditions of all stations.

Distributed Opportunistic Scheduling (DOS) techniques[3]–[6] have been proposed only recently. In contrast tocentralized schemes, with DOS each station has to makescheduling decisions without knowing the channel conditionsof the other stations. Stations contend for the channel usingrandom access with a given access probability. After suc-cessful contention, a station measures the channel and, if the

channel conditions are poor (i.e., the instantaneous transmis-sion rate is below a given threshold), the station gives up thetransmission opportunity. This allows all stations to contendfor the channel again, letting a station with better conditionswin the contention, which increases the overall throughput.DOS techniques thus exploit both multi-user diversity acrossstations and time diversity across slots.

The lack of global channel information makes DOS systemsvery vulnerable to selfish users. By deviating from the aboveprotocol and using a more aggressive configuration, a selfishuser can easily gain a greater share of wireless resources atthe expense of the other, well-behaved users. In this paper,we address the problem of selfishness in DOS from a gametheoretic standpoint. In our formulation of the problem, theplayers are wireless stations that implement DOS and striveto obtain as great a share of resources as possible from thewireless network. We show that, in the absence of penalties,the wireless network naturally tends to either great unfairnessor network collapse. Building on this result, we design apenalty mechanism in which any player who misbehaves willbe punished by other players in such a way that there isno incentive to misbehave. A key challenge when designingsuch a penalty scheme is to carefully adjust the punishmentinflicted on a misbehaving station. If the punishment is toolight, a selfish station may still benefit from misbehaving. Ifit is too excessive, however, the punishment itself could beinterpreted as misbehavior and trigger punishment from otherstations, leading to an endless spiral of increasing punishmentsand ultimately throughput collapse. We address this challengethrough a combination of game theory and control theory.

The most relevant prior work on DOS by Zheng et al. [3]lays the basic foundations of distributed opportunistic schedul-ing. The authors propose a mechanism based on optimal stop-ping theory and analyze its performance with well-behaved aswell as selfish users. The aim of the algorithm is to maximizethe total throughput of the network. [4]–[6] extend the basicmechanism of [3] by analyzing the case of imperfect channelinformation [4], improving channel estimation through two-level channel probing [5], and incorporating delay constraints[6]. While our algorithm deals with the basic DOS mechanismof [3], it can be extended to incorporate the enhancements of[4]–[6]. The key contributions of our work are as follows:

1) We perform a joint optimization of both the transmissionrate thresholds and the access probabilities, whereas [3]only optimizes the thresholds.

2) We provide a proportionally fair allocation that achievesa good tradeoff between total throughput and fairness.

2

In contrast, [3] maximizes the total throughput of thenetwork and, as a result, it risks starving stations withpoor channel conditions.

3) We propose a simple algorithm based on control theorythat guarantees stability and quick convergence to theoptimal point of operation, in contrast to the compara-tively complex heuristics of [3].

4) Our game theoretic analysis considers that users canselfishly configure both their access probability andtransmission rate threshold, whereas the analysis of [3]assumes that selfish users only have control over thethresholds.

5) We use a penalty mechanism to force an optimal Nashequilibrium, whereas [3] introduces a pricing mechanismwhich may not be practical in many scenarios; further-more, the performance of the pricing mechanism reliesheavily on the cost parameter and it is suboptimal evenfor the best parameter setting.

Some of the concepts and tools used in this paper buildon our previous works of [7] and [8]. In [7] we proposedan algorithm based on control theory to optimally adjust theconfiguration of DOS. In contrast to [7], in this paper ouraim is to prevent selfish users from obtaining any benefit bydeviating from the optimal configuration, which is a muchmore difficult problem. In [8] we designed an algorithm basedon multivariable control theory to adjust the contention param-eters of a WLAN. In this paper, we obtain the same linearizedsystem as [8], and hence we use the corresponding part of theanalysis from that paper; however, the purpose, algorithm andmost of the analysis of this paper differ substantially from [8].

The remainder of the paper is organized as follows. InSection II, we present an analysis of our system and derivethe optimal configuration of access probabilities and trans-mission rate thresholds. In Section III, we show that, inthe absence of penalties, the wireless network tends to ahighly undesirable resource allocation. We then propose analgorithm called Distributed Opportunistic scheduling withdistributed Control (DOC) that avoids this by implementinga decentralized penalty mechanism to control selfish users.Section IV shows by means of control theory, that when all thestations implement DOC, the system is stable and convergesto the optimal point of operation derived in Section II. InSection V, we conduct a game theoretic analysis of DOC toshow that stations cannot obtain any gain by behaving selfishly.The performance of the proposed scheme is extensively eval-uated through simulations in Section VI. Finally, Section VIIprovides some concluding remarks.

II. ANALYSIS AND OPTIMAL CONFIGURATION

In the following, we present our system model and analyzethe throughput as a function of the access probabilities andtransmission rate thresholds. We then compute the optimalconfiguration of these parameters for a proportionally fairthroughput allocation, which is well known to provide a good

Idlesuccessful

contentioncollision

data

transmission

iiRR )(

iiRR )(

Fig. 1. Example of channel contention.

tradeoff between total throughput and fairness.1

A. System model

Our system model follows that of [3]–[6]. We consider asingle-hop wireless network with N stations, where station icontends for the channel with an access probability pi. Weassume a collision model for the channel access, where astation contends successfully for the channel if no other stationcontends at the same time. Let τ denote the duration of a minislot for channel contention, which can either be empty, containa successful contention, or a collision.

As in [3]–[6], we assume that a station i obtains its localchannel conditions after a successful contention. Let Ri(θ)denote the corresponding transmission rate at time θ. If Ri(θ)is small (indicating a poor channel), station i gives up thistransmission opportunity and lets all the stations contend forthe channel again. Otherwise, it transmits for a duration ofT . Fig. 1 depicts an example of such channel contention.Our model, like that of [3]–[6], assumes that Ri(θ) remainsconstant for the duration of a data transmission and thatdifferent observations of Ri(θ) are independent.2 From [3],we have that the optimal transmission policy is a thresholdpolicy: for a given threshold R̄i, station i only transmits aftera successful contention if Ri(θ) ≥ R̄i.

B. Throughput analysis

The throughput ri achieved by station i is a function of theparameters pi and R̄i. Let li be the average number of bitsthat station i transmits following a successful contention andTi be the average time it holds the channel (including the timespent in contention). Then, the throughput of station i is

ri =ps,ili∑

j ps,jTj + (1− ps)τ(1)

where ps,i is the probability that a mini slot contains asuccessful contention of station i,

ps,i = pi∏j ̸=i

(1− pj) (2)

1The notion of proportional fairness was originally proposed by F. Kelly [9]and has been later applied to opportunistic scheduling [2]. According to [9],an allocation {r1, . . . , rN} is proportionally fair if (i) it is feasible, and(ii) for any other feasible allocation, the aggregate of proportional changes iszero or negative. [9] shows that the proportionally fair allocation maximizes∑

i log(ri).2The assumption that Ri(θ) remains constant during a data transmission is

a standard assumption for the block-fading channel in wireless communica-tions [10], [11]. The assumption that different observations are independentis justified in [3] through numerical calculations which show that in manypractical scenarios the channel correlation between two adjacent successfulcontentions of a station is very small with a very high probability.

3

and ps is the probability that the minislot contains a successfulcontention of any station in the system

ps =∑i

ps,i. (3)

Both li and Ti depend on R̄i. When a station contendssuccessfully, it holds the channel for a time T +τ if it transmitsdata and τ if it gives up the transmission opportunity. Thus,Ti can be computed as

Ti = Prob(Ri(θ) < R̄i)τ + Prob(Ri(θ) ≥ R̄i)(T + τ). (4)

When the station uses the transmission opportunity, it trans-mits a number of bits given by Ri(θ)Ti, which yields

li =

∫ ∞

R̄i

rTifRi(r)dr (5)

where fRi(r) is the pdf of Ri(θ).Based on the above, we can compute ri from p =

{p1, . . . , pN} and R̄ = {R̄1, . . . , R̄N}. In the following, weobtain the optimal configuration of these parameters to achieveproportional fairness.

C. Optimal pi configuration

The problem of determining the configuration that providesproportional fairness can be formulated as the unconstrainedoptimization problem of finding the p and R̄ configurationthat maximizes

∑i log(ri). We start by computing the optimal

configuration of p. Let us define wi as

wi =ps,ips,1

(6)

where we take station 1 as reference. From the above equationwe have that ps,i = wips/

∑j wj . Substituting this into (1)

yields

ri =wipsli∑

j wjpsTj +∑

j wj(1− ps)τ. (7)

Following the results of [12], we approximate the optimalsuccess probability ps by P = (1 − 1/N)N−1, which ismore accurate than the well-known approximation of 1/efor slotted random access.3 The numerical results providedin Section VI-A confirm the accuracy of the approximation.Substituting ps by the constant value P in (7) gives

ri =wili∑

j wjTj +∑

j wj(1/P − 1)τ. (8)

The problem of determining the optimal p configuration isequivalent to finding the wi values that maximize

∑i log(ri),

for ri defined in (8). To obtain these wi values, we impose

∂∑

j log(rj)

∂wi= 0 (9)

which yields

1

wi−N

Ti + (1/P − 1)τ∑j wjTj +

∑j wj(1/P − 1)τ

= 0. (10)

3[13] shows that the approximation of 1/e holds both for symmetric andasymmetric access probabilities, and it further shows that this approximationis accurate as long as the number of stations is sufficiently large and theaccess probabilities are sufficiently small.

Combining this expression for wi and wj , we obtain

wi

wj=

Tj + (1/P − 1)τ

Ti + (1/P − 1)τ. (11)

From the above, the values of p that solve the optimizationproblem are those that satisfy both ps = P and (11). Thesevalues can be obtained by solving the following system ofequations: ∑

i

pi∏j ̸=i

(1− pj) = P (12)

pi∏

j ̸=i 1− pj

p1∏

j ̸=1 1− pj=

T1 + τ(1/P − 1)

Ti + τ(1/P − 1), i = 2, . . . , N.

(13)As P is only an approximation to the optimal ps, the above

system of equations has in fact two solutions. This can beseen as follows. From (13) we can express {pi}i=2,...,N asa function of p1. With this, (12) becomes an equation withonly one unknown (p1). The left-hand side of this equationincreases from 0 (for p1 = 0) to a maximum value that isgreater than P and then decreases to 0 (for p1 = 1). Hence,there are two distinct values of p1 that solve (12). Takingthese two values of p1 and computing the corresponding valuesof {pi}i=2,...,N in each case, we obtain the two solutionsof the system of equations. For one of the solutions, allof the access probabilities are larger than the correspondingones from the other solution; we select the solution withthe larger access probabilities. As an exception to this, whenall access probabilities are equal, the optimal ps is exactlyP and the system has only one solution; in this case, weselect this unique solution. We denote the selected solutionby p∗ = {p∗1, . . . , p∗N}, and refer to these probabilities as theoptimal access probabilities.

To determine p∗ above, the Ti values have to be computedfor all stations. These depend on the optimal configuration ofthe thresholds R̄. In the following section, we compute theoptimal R̄, which we denote by R̄∗ = {R̄∗

1, . . . , R̄∗N}.

D. Optimal R̄i configuration

In order to obtain the optimal configuration of R̄, we needto find the transmission threshold of each station that, giventhe p∗ computed above, optimizes the overall performance interms of proportional fairness. This is given by the followingtheorem.

Theorem 1: Consider a station k that is alone in the networkand contends for the channel with pk = P . Let R̄1

k bethe transmission rate threshold that optimizes the throughputof this station under the assumption that different channelobservations are independent. Then, R̄∗

k = R̄1k.

Proof: The proof is by contradiction. Assume there existsa configuration R̄∗ with R̄∗

k ̸= R̄1k for some station k that

provides proportional fairness.Let l1k and T 1

k be the values of lk and Tk for the thresholdR̄1

k and l∗k and T ∗k the corresponding values for R̄∗

k. Since R̄1k

maximizes rk when station k is alone:

l1kT 1k + (1/P − 1)τ

>l∗k

T ∗k + (1/P − 1)τ

. (14)

4

Consider a network with N stations that use configurationR̄∗. Given R̄∗, the p∗ that maximizes

∑i log(ri) is given

by (12) and (13). This results in the following throughput forstation k:

r∗k =p∗s,kl

∗k∑

j p∗s,j(T

∗j + (1/P − 1)τ)

=l∗k

N(T ∗k + (1/P − 1)τ)

(15)and for the other stations:

r∗i =l∗i

N(T ∗i + (1/P − 1)τ)

∀i ̸= k. (16)

Let us now consider the alternative configuration R̄1k for

station k and R̄∗i for the other stations. If we take the p1k

and p1i configuration that satisfies (12) and (13) with thisalternative configuration, we obtain the following throughputfor station k:

r1k =l1k

N(T 1k + (1/P − 1)τ)

> r∗k (17)

and for the other stations:

r∗i =l∗i

N(T ∗i + (1/P − 1)τ)

∀i ̸= k. (18)

With the above, we have found an alternative configurationthat provides a higher throughput to station k and the samethroughput to all other stations. This alternative configura-tion thus increases

∑i log(ri), which contradicts the initial

assumption that the configuration R̄∗ provides proportionalfairness.

Following the above theorem, the optimal configuration ofthe thresholds R̄∗ can be computed using optimal stoppingtheory. This is done in [3], which finds that the optimalthreshold R̄∗

i can be obtained by solving the following fixedpoint equation:

E(Ri(θ)− R̄∗

i

)+=

R̄∗i τ

T P. (19)

The above concludes the search for the optimal configura-tion. The key advantage of this configuration is that it allowseach station to compute its R̄∗

i based on local informationonly, and thus decouples the computation of R̄∗

i from that ofp∗i . We use this finding to design a distributed mechanism forcomputing the optimal configuration, where each station usesa fixed R̄i = R̄∗

i obtained locally, together with an adaptivealgorithm to determine the optimal p∗i .

III. DOC ALGORITHM

In this section we propose an adaptive algorithm that satis-fies the following properties: (i) when all stations implementthe algorithm, it leads to the optimal configuration computedabove, and (ii) a selfish station cannot obtain any gain bydeviating from the algorithm. We first motivate our algorithmby showing that, in the absence of punishments, the systemwill naturally tend to a highly undesirable point of operation.We then present our algorithm, which uses punishments todrive the system to the optimal point of operation derived inthe previous section.

A. Motivation

If no constraints are imposed on the wireless network andstations are allowed to configure their {pi, R̄i} parameters toselfishly maximize their own benefit, the network will notnaturally tend to the optimum configuration derived above.To show this, we model the wireless system as a static gamein which each station can choose its configuration withoutsuffering any penalty. The following theorem characterizes theNash equilibria of this game.

Theorem 2: In the absence of penalties, there is at least onestation that plays pi = 1 in any Nash equilibrium.

Proof: The proof is by contradiction. Let us assume thatthere is a Nash equilibrium such that pj ̸= 1 ∀j.

If we consider one player i and take the partial derivativeof its throughput ri, we obtain

∂ri∂pi

=

∏j ̸=i (1− pj)liT̂−i(

piT̂i + (1− pi)T̂−i

)2 > 0 (20)

where T̂i is the average duration during which the channelis occupied when station i transmits and T̂−i is the averageduration of a transmission or an empty mini slot when stationi does not transmit.

From the above, it can be seen that the throughput ri isa strictly increasing function of pi. It follows from this that{pi, R̄i}, with pi ̸= 1, is not the best strategy for player igiven the configuration of the other stations, since station icould obtain a higher throughput by increasing pi to 1 andusing the same R̄i. The configuration {pi, R̄i}, with pi ̸= 1, istherefore not a Nash equilibrium, which contradicts our initialassumption.

Any of the above Nash equilibria are highly undesirable.If station i is the only one that plays pi = 1, then player iachieves non-zero throughput while all other players have zerothroughput. Conversely, if some other station j also plays pj =1, the result is a network collapse with all players obtainingzero throughput.

We conclude from the above that, in the absence of punish-ments, selfish behavior will severely degrade the performanceof the wireless system. In the following, we propose analgorithm that addresses this problem by implementing adistributed punishment mechanism.

B. Rationale behind the algorithm

Before presenting the algorithm, we first discuss the ratio-nale behind its design. This rationale relies heavily on thenotion of channel time that a station obtains over a certaininterval Θ, defined as

ti(Θ) =

ni(Θ)∑k=1

(T ki (Θ) + (1/P − 1)τ

)(21)

where ni(Θ) is the number of successful contentions of stationi in that period and T k

i (Θ) is the duration of the kth successfulcontention of the station in the interval. The above definitioncomprises the aggregate transmission time of the station plusa fixed overhead of (1/P − 1)τ that is added every time thestation accesses the channel.

5

An important observation that drives the design of our algo-rithm is that, with the configuration of Section II, all stationsreceive the same channel time on average, i.e., ti = tj ∀i, j(where ti = E[ti(Θ)]). This can be seen as follows. From (21)we have

titj

=E[ni(Θ)]

(E[T k

i (Θ)] + (1/P − 1)τ)

E[nj(Θ)](E[T k

j (Θ)] + (1/P − 1)τ)

=ps,i(Ti + (1/P − 1)τ)

ps,j(Tj + (1/P − 1)τ). (22)

Furthermore, from (13) we have ps,i(Ti + (1/P − 1)τ) =ps,j(Tj + (1/P − 1)τ) and thus ti = tj .

When all stations use the optimal configuration, the over-head in the definition of channel time, (1/P − 1)τ , coincideswith the average time between two successes. As a result, foran interval Θ of duration Tinterval it holds that

∑i E[ti(Θ)] =

Tinterval. From this and ti = tj , we have that with the optimalconfiguration, all stations receive an average channel time of

E[ti(Θ)] = Tinterval/N ∀i. (23)

We define this average channel time as the optimal channeltime and denote it by t∗ (i.e., t∗ = Tinterval/N ).

The last observation upon which our algorithm relies is thatas long as a selfish station does not receive more channel timethan t∗, it cannot increase its throughput. The throughput ofa station with a given channel time and R̄i is equal to thethroughput it would obtain if it were alone in the channelduring this time with pi = P and the same R̄i. From Theorem1, we have that this throughput is maximized for the optimaltransmission rate threshold R̄∗

i . Therefore, as long as thestation does not receive extra channel time, it will not be ableto achieve a higher throughput.

Given these observations, we base our algorithm on thefollowing principles: (i) if a given station i detects that anotherstation k is receiving more channel time than itself, it considersstation k to be selfish and indirectly punishes it by using amore aggressive configuration, and (ii) when punishing stationk, the punishment needs to be severe enough to keep stationk’s channel time below t∗ so that station k does not benefitfrom misbehaving.

C. Algorithm design

The objective of DOC is to drive the system to the optimalconfiguration {p∗, R̄∗} obtained in Section II. As discussedin Section II-D, each station can locally compute its optimalconfiguration of R̄i independently of the configuration of theother stations. Therefore, with DOC each station maintainsa fixed R̄i (equal to the optimal value) and implements anadaptive algorithm to configure its access probability pi.

Time is divided into intervals of fixed length Tinterval, andeach station updates its access probability pi at the beginningof every interval. We use the discrete variable Θ to refer tothe different intervals, and pi(Θ) to denote the value of pi ina given interval Θ. The central idea behind DOC is that whena misbehaving station is detected, the other stations increasetheir access probabilities in subsequent intervals to prevent theselfish station from benefiting from its misbehavior.

PI

controller

station 1

Wireless

network

E1( )Eq.(25)

P1( ) p1( )

PI

controllerEq.(25)

PN( ) pN( )EN( )

station N

Fig. 2. DOC control system.

A key challenge in DOC is to determine the appropriatereaction against a selfish station. If the reaction is not severeenough, a selfish station may benefit from misbehaving. How-ever, if the reaction is too severe, the system may becomeunstable by entering an endless loop where all stations indef-initely increase their pi to punish each other.

Control theory is a particularly suitable tool to address thischallenge, as it helps guarantee the convergence and stabilityof adaptive algorithms. We use techniques from multi-variablecontrol theory [14] for the design of the DOC algorithm. Thealgorithm is based on the classic system illustrated in Fig. 2,where each station runs an independent controller to computeits configuration. The controller that we have chosen for thispaper is a proportional-integral (PI) controller, a well-knowncontroller from classic control theory.

As shown in the figure, the PI controller of station i takesas input the error signal measured over an interval Θ, Ei(Θ),and provides as output the control signal Pi(Θ) for the nextinterval. The error signal indicates how far the system is fromthe desired point of operation. If the system is operating asdesired, the error signals of all stations are zero; otherwise, theerror signals are non-zero and the state of the system needs tochange from its current point of operation to the desired one.To do this, the PI controller adjusts the control signal Pi(Θ),increasing it if Ei(Θ) > 0 and decreasing it otherwise. In thefollowing, we address the design of Pi(Θ) and Ei(Θ).

D. Control signal Pi

The DOC algorithm needs to adjust the access probabilitypi(Θ) based on the control signal. To do this, there needs to bea one-to-one mapping between the control signal Pi(Θ) givenby the controller and pi(Θ). In addition, we design the systemsuch that the Pi(Θ) values are the same for all stations at theoptimal point of operation. This latter requirement is necessaryto derive the conditions for stability in Section IV.

Based on the above requirements, we design Pi(Θ) as

Pi(Θ) =pi(Θ)

1− pi(Θ)(Ti + (1/P − 1)τ) . (24)

A station can therefore compute its pi(Θ) from the controlsignal Pi(Θ) as

pi(Θ) =Pi(Θ)

Ti + (1/P − 1)τ + Pi(Θ). (25)

6

E. Error signal Ei

The design of the error signal Ei(Θ) has the followingtwo goals: (i) selfish stations should not be able to obtainextra channel time from the wireless network by using aconfiguration different from the optimal, and (ii) as long asthere are no selfish stations, p(Θ) should converge to theoptimal p∗.

For the design of the error signal, DOC relies (like [15],[16]) on the broadcast nature of the wireless medium, whichenables stations to overhear the transmissions of the otherstations. In particular, in every interval Θ, each stationmeasures (i) the channel time used by the other stations,tj(Θ), and (ii) the average time (over the interval) that theyhold the channel upon a successful contention, Tj(Θ) =∑nj(Θ)

k=1 T kj (Θ)/nj(Θ). Based on this, station i computes the

error signal at the end of the interval as

Ei(Θ) =∑j ̸=i

(tj(Θ)− ti(Θ))− Fi(Θ) (26)

where Fi(Θ) is a function that we design below. The errorsignal Ei(Θ) consists of the following two components:

• The first component,∑

j ̸=i tj(Θ)− ti(Θ), punishes self-ish stations. If a station i receives less channel time thanthe other stations, this component will be positive andhence station i will increase its access probability pi(Θ).

• The second component, Fi(Θ), drives the system tothe desired point of operation in the absence of selfishbehavior (i.e., when all stations receive the same channeltime).

We next address the design of the function Fi(Θ). In orderto drive p(Θ) to the desired p∗ when all stations receive thesame channel time, we need Fi(Θ) > 0 for pi(Θ) > p∗i ,such that in this case pi(Θ) decreases, and Fi(Θ) < 0 forpi(Θ) < p∗i .

The design of Fi(Θ) should also prevent selfish stationsfrom obtaining more channel time than t∗. In the following,we derive the conditions that Fi(Θ) needs to meet in order tosatisfy this requirement. To derive these conditions, we assumethat the system is in steady state, which implies that selfishstations play with a static configuration. (In the analysis ofSection V we show that DOC is also effective against selfishstrategies that change the configuration over time.)

We first consider the case where one station k is selfish andall others are well-behaved and run the DOC algorithm. Sincethe PI controller drives the error signal Ei(Θ) to 0 in steadystate, the following holds for all well-behaved stations:

Fi(Θ) =∑j ̸=i

tj(Θ)− ti(Θ). (27)

Summing Fi(Θ) over all stations except the selfish oneyields:∑i̸=k

Fi(Θ) = (N−1)tk(Θ)−∑i ̸=k

ti(Θ) = Ntk(Θ)−∑i

ti(Θ).

(28)If we combine the above with the requirement that the

selfish station cannot gain, i.e., tk(Θ) ≤ t∗, we obtain the

following inequality, ∑i ̸=k

Fi(Θ) ≤ D(Θ) (29)

where D(Θ) is defined as the difference between the sum ofchannel times in optimal operation and the sum of channeltimes in the current interval, i.e., D(Θ) = Nt∗ −

∑i ti(Θ).

Note that, if the current access probabilities are not optimal,∑i ti(Θ) will be smaller than Nt∗. Hence, D(Θ) reflects the

channel time lost due to non-optimal access probabilities.The following upper bound on Fi(Θ) guarantees that (29)

is satisfied, and thus ensures that a selfish station does notbenefit from misbehaving:

Fi(Θ) ≤ 1

N − 1D(Θ). (30)

The intuition behind this upper bound is as follows. When aselfish station misbehaves, it receives more channel time thanthe well-behaved stations. This, however, moves the point ofoperation away from the optimal access probabilities, reducingthe overall efficiency in terms of aggregate channel time. Theabove upper bound ensures that the additional channel timereceived by the selfish station does not outweigh the channeltime it loses due to the overall loss of aggregate channel time.This guarantees that the selfish station does not receive morechannel time and hence does not benefit from misbehaving.

We next consider the case of multiple selfish stations. Inthis case, the aggregate channel time received by the selfishstations must not exceed the aggregate channel time that theywould receive in optimal operation, i.e.,

∑mi=1 ti(Θ) ≤ mt∗

(where {1, . . . ,m} is the set of selfish stations). Followingsimilar reasoning to that above, we obtain the upper bound

Fi(Θ) ≤ m

N −mD(Θ). (31)

Given all the above requirements, we design Fi(Θ) as:

Fi(Θ) =min((N− 1)D(Θ), D(Θ)

N

), pi(Θ) > pmin

i

min((N− 1)D(Θ),−D(Θ)

N , (N− 1)∆), pi(Θ) ≤ pmin

i

(32)

where pmin = {pmin1 , . . . , pmin

N } are the access probabilitiesthat minimize D = E[D(Θ)] subject to ti = tj ∀i, j, and ∆is the value that D takes at this point,

∆ = D|p=pmin . (33)

In order to compute pmin and ∆, the Tj of all stations arerequired. For these, we use the Tj(Θ) values measured overthe current interval.

The above design satisfies all of our previous requirements:• The term D(Θ)/N ensures that (30) and (31) are satisfied

when D(Θ) > 0 and the term (N −1)D(Θ) ensures thatthey are satisfied when D(Θ) < 0. This provides therequired protection against (one or more) selfish stations.

• As illustrated in Fig. 3, when all stations have the sameexpected channel time, the expected value of Fi(Θ) is

7

pi( )

Fi

pi*pimin

(N-1)(N-1)

-D/N

(N-1)D

D/N

Fig. 3. Fi as a function of pi(Θ) when ti = tj ∀i, j.

positive for pi(Θ) > p∗i and negative otherwise. Thisensures that p(Θ) is driven to the desired p∗.

The above design of the DOC algorithm is based onthe assumption that the number of stations in the wirelessnetwork is fixed. In the following, we address the case ofstations joining and leaving the network. With DOC, eachstation only keeps the state maintained by the PI controller,∑

Θ

∑j ̸=i

(tj(Θ)− ti(Θ)

)+ Fi(Θ), which accounts for the

deficit or surplus of the station’s channel time over the otherstations in the network. When a new station joins the wirelessnetwork, this station does not have a surplus or deficit, andtherefore the other stations keep their state. The new stationinitializes the state of its PI controller such that its initialpi corresponds to the optimal p∗i . When a station leaves, theremaining stations keep their state: this ensures that the deficitaccumulated by a selfish station is not reset if it leaves andrejoins the network.

This concludes the design of the algorithm. In the followingtwo sections, we analytically evaluate its performance when allstations are well-behaved (Section IV) and when some stationsmisbehave (Section V).

IV. DOC ANALYSIS

In this section we analyze the performance of DOC whenall stations are well-behaved. As stations do not obtain anybenefit from misbehaving, it is to be expected that they will allplay DOC, and therefore this is the most meaningful scenariofor the performance analysis of the system. We first analyzethe wireless system under steady state conditions and showthat it is driven to the desired point of operation obtained inSection II. We then conduct a transient analysis and derivesufficient conditions for stability.

A. Steady state analysis

Our analysis is based on the system model of Fig. 4.In this model, C represents the function implemented bythe controllers, which computes the control signals Pi(Θ),taking the error signals Ei(Θ) as input. H represents thewireless system which provides the error signals Ei(Θ) basedon the control signals Pi(Θ). In line with standard controltheory [17], we model the randomness of the channel withthe noise signals Wi(Θ) and let Ei(Θ) represent the expectedvalue of the error signal for the given control signals Pi(Θ).Since the controller includes an integrator, there is no steady

H CE P

System Controller

z-1

+

W

Fig. 4. Control system.

state error [17] and the steady state solution can be obtainedfrom

Ei(Θ) = 0 ∀i. (34)

Using (26) and (32), Ei(Θ) can be computed from p(Θ).This enables (34) to be expressed as a system of equations inp(Θ). The following theorem guarantees that the the solutionof this system of equations is unique and shows that the uniquestable point in steady state is the desired point of operationfrom Section II.

Theorem 3: The unique stable point of operation of thesystem in steady state is p(Θ) = p∗.

Proof: Let us consider two stations i and j. From (34)we have Ei(Θ)− Ej(Θ) = 0, which yields

Ntj(Θ) + Fj(Θ)−Nti(Θ)− Fi(Θ) = 0. (35)

Note that tj(Θ) > ti(Θ) implies Fj(Θ) ≥ Fi(Θ), and viceversa. This can be seen as follows: If pj(Θ) > pmin

j andpi(Θ) > pmin

i , then Fj(Θ) = Fi(Θ). If pj(Θ) ≤ pminj and

pi(Θ) ≤ pmini , then also Fj(Θ) = Fi(Θ). If pj(Θ) > pmin

j

and pi(Θ) ≤ pmini , then Fj(Θ) ≥ Fi(Θ). When tj(Θ) >

ti(Θ), we are in one of these three cases, and hence Fj(Θ) ≥Fi(Θ). Combining this with (35) yields ti(Θ) = tj(Θ) ∀i, j.Substituting this into Ei(Θ) = 0 yields Fi(Θ) = 0. Giventi(Θ) = tj(Θ), Fi(Θ) is an increasing function of pi(Θ) thatcrosses 0 at pi(Θ) = p∗i . Hence, the only pi(Θ) that satisfiesFi(Θ) = 0 is p∗i . Since this holds for all i, the unique stablepoint of operation is pi(Θ) = p∗i ∀i.

B. Stability analysis

We now conduct a stability analysis of DOC to configurethe parameters of the PI controller. According to the definitionof a PI controller [17], station i computes the value of Pi

at interval Θ′ as a function of the error values measured bythe station in the current and previous intervals based on thefollowing equation:

Pi(Θ′) = KpEi(Θ

′) +Ki

Θ′−1∑Θ=0

Ei(Θ) (36)

where Kp and Ki are the parameters of the controller that weneed to configure.

In order to analyze our system from a control theoreticstandpoint, we need to characterize the transfer functions Cand H in the system model of Fig. 4. The control and errorsignals in the figure are given by the following vectors in thez-domain [17]:

P(z) = (P1(z), . . . , PN (z))T (37)

8

andE(z) = (E1(z), . . . , EN (z))

T. (38)

Our control system consists of one PI controller in eachstation i that takes Ei(z) as input and provides Pi(z) as output.We can therefore express the relationship between E(z) andP(z) as follows

P(z) = C ·E(z) (39)

where

C =

CPI(z) 0 0 . . . 0

0 CPI(z) 0 . . . 00 0 CPI(z) . . . 0...

......

. . ....

0 0 0 . . . CPI(z)

(40)

with CPI(z) being the z-transform of a PI controller [17],

CPI(z) = Kp +Ki

z − 1. (41)

In order to characterize our wireless system with a transferfunction H that takes P(z) as input and has E(z) as output,we proceed as follows. Equation (26) provides a nonlinearrelationship between E(Θ) and P(Θ). To express this rela-tionship as a transfer function, we linearize it at the optimalpoint of operation.4 We then study the linearized model andensure its stability through appropriate choice of parameters.Note that the stability of the linearized model guarantees thatour system is locally stable.5

We express the perturbations around the stable point ofoperation as follows:

P(Θ) = P∗ + δP(Θ) (42)

where P∗ is the stable point of operation as given by (24)with p(Θ) = p∗.

With the above, the perturbations of E can be approximatedby

δE(Θ) = H · δP(Θ) (43)

where

H =

∂E1(Θ)∂P1(Θ)

∂E1(Θ)∂P2(Θ) . . . ∂E1(Θ)

∂PN (Θ)∂E2(Θ)∂P1(Θ)

∂E2(Θ)∂P2(Θ) . . . ∂E2(Θ)

∂PN (Θ)

......

. . ....

∂EN (Θ)∂P1(Θ)

∂EN (Θ)∂P2(Θ) . . . ∂EN (Θ)

∂PN (Θ)

. (44)

To compute these partial derivatives we proceed as follows.The error signal Ei(Θ) can be expressed as

Ei(Θ) = Tinterval

∑j ̸=i

(ps,j(Θ)

(Tj +

(1P − 1

)τ)∑

k ps,k(Θ)Tk + (1− ps(Θ))τ

−ps,i(Θ)

(Ti +

(1P − 1

)τ)∑

k ps,k(Θ)Tk + (1− ps(Θ))τ

)− Fi(Θ).

(45)

4This linearization provides a good approximation of the behavior of thesystem when it suffers small perturbations around the stable point of operation.

5A similar approach was used in [18] to analyze RED from a controltheoretic standpoint.

The above can be rewritten as a function of P(Θ) given by

Ei(Θ) = Tinterval

∑j ̸=i (Pj(Θ)− Pi(Θ))∑

j Pj(Θ)− ps(Θ)pe(Θ) (

1P − 1)τ + 1−ps(Θ)

pe(Θ) τ

− Fi(Θ) (46)

where pe(Θ) =∏

j 1− pj(Θ).We start by showing that ∂Fi(Θ)/∂Pi(Θ) = 0 at the stable

point of operation. It follows from (32) that

∂Fi(Θ)

∂Pi(Θ)= 0 ⇐⇒ ∂D(Θ)

∂Pi(Θ)= 0. (47)

D(Θ) can be expressed as

D(Θ) = Nt∗ − Tinterval

∑i ps,i(Θ)Ti + ps(Θ)(1/P − 1)τ∑

i ps,i(Θ)Ti + (1− ps(Θ))τ.

(48)The partial derivative of D(Θ) can be computed as

∂D(Θ)

∂Pi(Θ)=

∂D(Θ)

∂pi(Θ)

∂pi(Θ)

∂Pi(Θ). (49)

Taking the partial derivative of (48) with respect to pi(Θ)and evaluating it at the stable point of operation yields

∂D(Θ)

∂pi(Θ)= Tinterval

(τ/P∑

i ps,i(Θ)Ti + (1/P − 1)τ

)∂ps(Θ)

∂pi(Θ).

(50)Since ps(Θ) has a maximum value at the stable point of

operation, we have that ∂ps(Θ)/∂pi(Θ) = 0, which yields∂D(Θ)/∂Pi(Θ) = 0 and hence

∂Fi(Θ)

∂Pi(Θ)= 0. (51)

The partial derivative of Ei(Θ) evaluated at the stable pointof operation can then be computed from (46) as

∂Ei(Θ)

∂Pi(Θ)

∣∣∣∣P(Θ)=P∗

= −(N − 1)Tinterval1∑j P

∗j

. (52)

Using similar reasoning, we can see that

∂Ej(Θ)

∂Pj(Θ)

∣∣∣∣P(Θ)=P∗

= Tinterval1∑j P

∗j

. (53)

Substituting these expressions in matrix H yields

H = KH

−(N − 1) 1 . . . 1

1 −(N − 1) . . . 1...

.... . .

...1 1 . . . −(N − 1)

(54)

whereKH = Tinterval

1∑j Pj

∗ . (55)

Thus, the linearized system is fully characterized by thematrices C and H . The next step is to configure the Kp

and Ki parameters. The following theorem provides sufficientconditions which {Kp,Ki} must meet to ensure stability:

Theorem 4: The linearized system is guaranteed to be stableas long as Kp and Ki meet the following conditions:

Ki < Kp +1

NKH, Ki > 2Kp −

1

NKH. (56)

9

Proof: The reader is referred to [8] for the proof of thetheorem. Since [8] uses the same linearized system as thispaper, the proof follows very closely that of [8].

In addition to guaranteeing stability, our goal in the config-uration of the {Kp,Ki} parameters is to find the right balancebetween reaction time in transients and oscillations in steadystate. To this end, we use the Ziegler-Nichols rules [17], whichhave been designed for this purpose. Following these rules (see[8] for a detailed description), we obtain the configuration:

Kp =0.4

2NKH, Ki =

(1

0.85 · 2

)0.4

2NKH. (57)

The stability of the resulting configuration is guaranteed bythe following corollary:

Corollary 1: The Kp and Ki configuration given by (57)is stable.

Proof: The proof follows from the fact that the configu-ration of (57) meets the conditions of Theorem 4.

Note that the above control theoretic analysis guaranteesthat the system will always converge to the desired point ofoperation regardless of the initial state. This implies that thesystem remains stable in the presence of any kind of per-turbation. Such perturbations include, among others, transientselfish behavior or stations joining and leaving the network.

V. GAME THEORETIC ANALYSIS

In the previous section we have shown that, when all stationsimplement the DOC algorithm, they all play with pi = p∗i andR̄i = R̄∗

i , which leads to the optimal throughput allocationr∗i obtained in Section II.6 In this section we conduct a gametheoretic analysis to show that one or more stations cannotobtain any gain by deviating from DOC. In what follows, wesay that a station is honest or well-behaved when it implementsthe DOC algorithm to configure its pi and R̄i parameters,while we say that it is selfish or misbehaving when it plays astrategy different from DOC to configure these parameters inorder to obtain a greater share of wireless resources.

The game theoretic analysis conducted in this section as-sumes that users are rational and want to maximize theirown benefit or utility, which is given by the throughput.Furthermore, it is reasonable to assume that the game is non-cooperative in that no binding agreements can be reachedbetween the players as to their their future play [16]. Themodel is based on the theory of repeated games [19]. Inrepeated games, time is divided into stages and a playercan take new decisions at each stage based on the observedbehavior of the other players in the previous stages. Thismatches our algorithm, where time is divided into intervalsand stations update their configuration at each interval.7 Likeprevious analyses on repeated games [15], [16], we consider

6Since the throughput allocation {r∗1 , . . . , r∗N} maximizes∑

i log(ri), itis Pareto optimal. This follows from the fact that if there existed anotherfeasible allocation that provided all stations with more throughput than r∗i ,this allocation would yield a larger

∑i log(ri).

7Note that the game theoretic study conducted in Section III-A was basedon static games instead of repeated ones. The reason is that we considered asystem without penalties where a user does not react to the behavior of otherusers. Hence, we could model it as a static game where all players only makea single move at the beginning of the game.

an infinitely repeated game, which is a common assumptionwhen the players do not know when the game will end.

A. Single selfish station

While the design of the DOC algorithm in Section IIIguarantees that a station cannot benefit from playing with afixed selfish configuration, selfish stations might still benefitby varying their configuration over time. As an example, letus consider a naive algorithm that only takes into account thestations’ behavior in the previous stage. While this algorithmmay be effective against a fixed selfish configuration, it couldeasily be defeated by a selfish station that alternates betweena selfish configuration (pk = 1, R̄k = 0) and an honest one(pk = p∗k, R̄k = R̄∗

k) at every other stage. Since this stationwould play selfish when all the others play honest, it wouldachieve a significantly higher throughput every other interval,thus benefiting from its misbehavior.

The above example shows that it is important to ensure thata selfish station cannot obtain any gain no matter how it variesits configuration over time. The following theorem confirmsthe effectiveness of DOC against any (fixed or variable) selfishstrategy. The proof of the theorem relies on the integratorcomponent of the PI controller, which keeps track of theaggregate channel time received by all stations and can thus beused to guarantee that this aggregate does not exceed a givenamount.

Theorem 5: Let us consider a selfish station that uses apk(Θ) and R̄k(Θ) configuration that can vary over time.If all the other stations implement the DOC algorithm, thethroughput received by this station will be no larger than r∗k.

Proof: The PI controller computes Pi at a given intervalΘ′ according to the following expression:

Pi(Θ′) = P initial

i +Kp

(∑j ̸=i

(tj(Θ′)− ti(Θ

′))− Fi(Θ′)

)

+Ki

Θ′−1∑Θ=0

(∑j ̸=i

(tj(Θ)− ti(Θ))− Fi(Θ)

). (58)

With the above, Pi(Θ′) stays between 0 and a given

maximum value Pmaxi . If at some point Pi reaches a Pmax

i

value such that pi = 1, this will result in tj = 0 for j ̸= iand Fi > −(N − 1)ti, which yields Ei < 0, and thereforePi will decrease. Similarly, if at any time Pi reaches 0, thenti = 0 and Fi ≤ 0, which yields Ei > 0, and therefore Pi

will increase.Considering that 0 ≤ Pi(Θ

′) ≤ Pmaxi , the above equation

can be expressed as

Θ′∑Θ=0

(∑j ̸=i

(tj(Θ)− ti(Θ))− Fi(Θ)

)= Bi (59)

where Bi is a bounded value: Bi = (Pmaxi −P initial

i +(Ki−Kp)Ei(Θ

′))/Ki.Let us consider the case in which there is a selfish station

that changes its configuration over time and receives a channel

10

time tk(Θ). Equation (59) can be written as∑Θ

tk(Θ) =∑Θ

((N − 1)ti(Θ)−

∑j ̸=i,k

tj(Θ) + Fi(Θ))+Bi.

(60)Let us now consider a given interval Θ. From (30), we have

Fj(Θ) ≤ 1

N − 1

(Nt∗ −

∑i

ti(Θ)). (61)

Summing the above expression over all j ̸= k, we have∑i ti(Θ) +

∑j ̸=k Fj(Θ) ≤ Nt∗. As this satisfied for all Θ,∑

Θ

(∑i

ti(Θ) +∑j ̸=k

Fj(Θ))≤

∑Θ

Nt∗. (62)

Furthermore, by summing (60) over all j ̸= k,

(N−1)∑Θ

tk(Θ) =∑j ̸=k

∑Θ

(tj(Θ) + Fj(Θ))+∑j ̸=k

Bj . (63)

Adding the above two equations yields N∑

Θ tk(Θ) ≤N

∑Θ t∗ +

∑j ̸=k Bj . If we consider a long period of time,

the constant term∑

j ̸=k Bj can be neglected, resulting in∑Θ tk(Θ) ≤

∑Θ t∗. This means that the selfish station cannot

receive more channel time using a selfish strategy than byplaying DOC. Following the argument of Section III-B, thisimplies that it cannot obtain more throughput than it wouldby playing DOC, i.e., rk ≤ r∗k, which proves the theorem.

The above theorem leads to Corollary 2.Corollary 2: A state in which all stations play DOC (All-

DOC) is a Nash equilibrium of the game.Proof: According to Theorem 5, if all stations but one

play DOC, the best response of this station is to play DOC aswell since it cannot benefit from playing a different strategy.Thus, All-DOC is a Nash equilibrium.

The above shows that if all stations start playing with noprevious history, none of them can benefit by deviating fromDOC. In addition to this, in repeated games it is also importantto ensure that, if at some point the game has a given history,a selfish station cannot exploit knowledge of this history byplaying a different strategy from DOC. The following theoremconfirms that All-DOC is a Nash equilibrium of any subgame(where a subgame is defined as the game resulting fromstarting to play with a certain history). Therefore, a selfishstation cannot benefit by deviating from DOC for any previoushistory of the game.

Theorem 6: All-DOC is a subgame perfect Nash equilib-rium of the game.

Proof: Since the proof of Theorem 5 is not dependenton past history and can therefore be applied to any subgame,All-DOC is a Nash equilibrium of any subgame.

Note that, even though the above analysis assumes a fixednumber of stations in the wireless network, it also holds forthe case when the number of stations changes over time, aslong as these changes occur over sufficiently long periods suchthat the constant term

∑j ̸=k Bj is not significant.

B. Multiple selfish stations

The above results show the effectiveness of DOC against asingle selfish station. In the following, we focus on the caseof multiple selfish stations.

The following theorem shows that, by following a differentstrategy from DOC, multiple stations cannot gain any aggre-gate channel time.

Theorem 7: Let us consider a scenario with m selfishstations. If all other stations play DOC, the selfish stationscannot gain any aggregate channel time.

Proof: Without loss of generality, let us consider thatstations i = {1, . . . ,m} are selfish. Applying a reasoningsimilar to Theorem 5 leads to

∑mi=1

∑Θ ti(Θ) ≤ m

∑Θ t∗.

As the left-hand side of this inequality is the aggregate channeltime obtained by the selfish stations, and the right-hand sideis the aggregate channel time that they would obtain if theyplayed DOC, the theorem is proven.

According to the above theorem, it is possible for a selfishstation to obtain some gain, but this will be at the expenseof another selfish station that receives less channel time.Corollary 3 follows from this.

Corollary 3: Let us consider a scenario with m selfishstations. If all other stations play DOC and a selfish station kreceives a throughput larger than r∗k, then there exists anotherselfish station l that receives a throughput smaller than r∗l .

Proof: If there is some station k ∈ {1, . . . ,m} for whichrk > r∗k, this station must necessarily receive more channeltime than it would if all stations played DOC. Since (accordingto Theorem 7) the selfish stations cannot gain any aggregatechannel time, there must then exist some other station l ∈{1, . . . ,m} that receives less channel time. For this station, itholds that rl < r∗l , which proves the corollary.

Based on the above, we argue that DOC is effective againstmultiple selfish stations, since two or more selfish stationscannot simultaneously benefit and therefore do not have anyincentive to play a coordinated strategy different from DOC.

VI. PERFORMANCE EVALUATION

In this section we evaluate DOC by means of simulationto show that (i) in the absence of selfish stations, DOCprovides optimal performance while remaining stable andreacting quickly to changes, and (ii) selfish stations cannotbenefit by following a strategy different from DOC. Unlessotherwise stated, we assume that different observations ofthe channel conditions are independent. We also assume thatthe available transmission rate for a given SNR is given bythe Shannon channel capacity: R(h) = Wlog2(1 + ρ|h|2)bits/s, where W is the channel bandwidth, ρ is the normalizedaverage SNR and h is the random gain of Rayleigh fading.We implemented the DOC algorithm in OMNET++. In thesimulations, we set W = 107, T /τ = 10 and the interval ofthe controller Ttotal = 105τ . For all results, 95% confidenceintervals are below 0.5%.

A. Validation of the optimal configuration

In order to assess the accuracy of the analysis of Section II,we compare the performance of the configuration computed

11

5

10

15

20

25

30

35

40

2 4 6 8 10 12 14 16 18 20

Tot

al T

hrou

ghpu

t [M

bps]

N

optimal configuration (ρ2=1)optimal configuration (ρ2=5)

optimal configuration (ρ2=20)optimal configuration (ρ2=100)

exhaustive search

Fig. 5. Validation of the optimal configuration of Section II for differentnumbers of stations and levels of heterogeneity.

in that section (‘optimal configuration’) against the result ofperforming an exhaustive search over all possible configu-rations of {p, R̄} and selecting the best one (‘exhaustivesearch’). We perform the following two experiments. First,we consider a wireless network with N stations, N/2 ofthem with a normalized SNR equal to ρ1 and the otherN/2 with a normalized SNR equal to ρ2. Fig. 5 shows thetotal throughput obtained by both approaches for differentnumbers of stations (N = {2, 4, . . . , 20}) and levels ofheterogeneity (ρ1 = 1, ρ2 = {1, 5, 20, 100}). We observethat both approaches perform very closely, with a differencewell below 0.5% in all cases. Next, we consider a wirelessnetwork with N stations, N = {2, 4, . . . , 20}, where the ρiof each station is randomly chosen in the range (ρ1, ρ2), forρ1 = 1 and ρ2 = {1, 5, 20, 100}. The results (not shown in agraph) confirm that also here the difference between the twoapproaches is well below 0.5% in all cases. We conclude fromthese two experiments that the analysis is very accurate.

The key approximation of the analysis is to assume thatps is equal to (1 − 1/N)N−1, which corresponds to thevalue of the optimal success probability with symmetric accessprobabilities [12]. By analyzing the access probabilities in theabove experiments, we observe that all stations use similaraccess probabilities regardless of their channel conditions,which makes this approximation particularly accurate. Forinstance, even for the extreme case of N = 2 and ρ2 = 100,the difference between the access probabilities of the twostations is only around 18%. This is a consequence of (11),whereby the ratio of the access probabilities of two stationsdepends on their Ti values; since the thresholds R̄i are setso that all stations have a similar probability of using atransmission opportunity, this implies that all Ti values aresimilar, and as a result, the access probabilities are also similar.

B. Throughput evaluation

For the throughput evaluation, we compare the performanceof DOC to the following approaches: (i) the static optimalconfiguration obtained in Section II (‘optimal configuration’),(ii) the configuration proposed in [3] (‘DOS’), and (iii)

134

136

138

140

142

144

1 2 3 4 5 6 7 8 9 10

Σ(lo

g(r i)

)

ρ2

optimal configurationDOCDOSnon-opportunistic

Fig. 6. Proportional fairness as a function of SNR (ρ1 = 1, 1 ≤ ρ2 ≤ 10).

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

2.2

2.4

2.6

optimalconfiguration

DOC DOS non-opportunistic

Thr

ough

put [

Mbp

s]

r1r2

Fig. 7. Throughput for heterogeneous SNRs (ρ1 = 1, ρ2 = 4).

an approach that does not perform opportunistic schedul-ing but always transmits after successful contention (‘non-opportunistic’). We consider a scenario with N = 10 stations,half of them with a normalized SNR of ρ1 = 1 and the otherhalf with a normalized SNR ρ2 that varies from 1 to 10.Fig. 6 shows the proportional fairness metric,

∑i log(ri), as

a function of ρ2. We observe that DOC performs at the samelevel as the benchmark given by the optimal configuration,while the other two approaches (DOS and non-opportunistic)provide a substantially lower performance.

For the above scenario with ρ2 = 4, Fig. 7 depicts theindividual throughput allocation of two stations (where r1 isthe throughput of a station with ρ1 and r2 that of a station withρ2). DOC is effective in driving the system to the optimalpoint of operation and provides the same throughput as theoptimal configuration. In contrast, DOS exhibits a high degreeof unfairness as it provides a much higher throughput to thestation with high SNR. The non-opportunistic approach pro-vides a reasonable degree of fairness but has lower throughputdue to the lack of opportunistic scheduling. In conclusion, theproposed DOC algorithm provides a good tradeoff betweenoverall throughput and fairness.

12

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

0.01 0.1 1

Thr

ough

put [

Mbp

s]

pk

Using DOCR-

k=R- *

kR-

k=0R-

k=2·R- *

kR-

k=1.5·R- *

kR-

k=0.5·R- *

k

Fig. 8. Throughput of a selfish station for fixed configurations of {pk, R̄k}.

0

2

4

6

8

10

12

14

2 4 6 8 10 12 14 16 18 20

Thr

ough

put [

Mbp

s]

N

Selfish configuration (ρ2=8)




DOC

Fig. 9. Selfish station with fixed configuration for different N and ρ2 values.

C. Selfish station with fixed configuration

We verify that a station cannot obtain more throughput witha selfish configuration than by playing DOC in a scenario withN = 10 stations, half of them with ρ1 = 1 and the other half(including the selfish station) with ρ2 = 4. The selfish stationuses a fixed configuration and all other stations implementDOC. Fig. 8 shows the throughput of the selfish station fordifferent configurations {pk, R̄k} of the selfish station. This iscompared to the throughput that the station would obtain ifit played DOC, given by the horizontal line. We observe thatnone of the selfish configurations provides greater throughputthan DOC.

Fig. 9 analyzes the impact of fixed selfish configurationsfor a range of different N and ρ2 values. It shows the largestthroughput that a selfish station can receive with a fixed config-uration, which is obtained by performing an exhaustive searchover the {pk, R̄k} space. This throughput is compared to thatwhich station would receive if it played DOC. Again, weobserve that the station never benefits from playing selfishly,which validates the design of the DOC algorithm.

0

1

2

3

4

5

N=4 N=8 N=12 N=16 N=20

Thr

ough

put [

Mbp

s]

DOCAdaptive pk strategy

Adaptive R-

k strategyAdaptive pk and R

-k strategy

Fig. 10. Throughput of selfish station with different adaptive strategies.

D. Selfish station with variable configuration

According to Theorem 5, a selfish station cannot benefitfrom changing its configuration over time. To verify this,we evaluate the throughput obtained by a selfish station withdifferent adaptive strategies. These strategies are inspired bythe schemes used in [15] for a similar purpose. The underlyingprinciple of all of them is that the cheating station uses a selfishconfiguration to gain more throughput and, when it realizesthat it is not obtaining more throughput, it assumes that ithas been detected as selfish and switches back to the honestconfiguration to avoid being punished.

In particular, we consider the following strategies. The‘adaptive pk strategy’ fixes the R̄k configuration of the selfishstation to its optimal value, R̄k = R̄∗

k, and modifies the pkconfiguration as follows: the station uses a selfish configura-tion of pk = 1 as long as it obtains some gain, i.e. rk > r∗k.When rk drops below r∗k, the station switches to the honestconfiguration, pk = p∗k, and stays with this configuration aslong as rk remains below 0.95r∗k. It switches back to pk = 1when rk exceeds 0.95r∗k. The ‘adaptive R̄k strategy’ fixes thepk configuration to the optimal value, pk = p∗k, and modifiesthe R̄k configuration following a strategy similar to the oneabove: the station uses a selfish configuration of R̄k = 0 (i.e.,it uses all transmission opportunities) as long as it obtainssome gain and switches to the honest configuration when itstops benefiting. Finally, the ‘adaptive pk and R̄k strategy’follows a similar behavior to the previous ones but adapts theconfiguration of both pk and R̄k.

Fig. 10 compares the throughput obtained with each of theabove strategies to that obtained with DOC for different valuesof N . As expected, when all other stations play DOC, a givenstation maximizes its payoff by playing DOC as well, as thisresults in a larger throughput for the station than any of theother strategies. This confirms the result of Theorem 5.

E. Multiple selfish stations

Corollary 3 states that all of the selfish stations cannotsimultaneously benefit by deviating from DOC: if one or moreof the selfish stations experience throughput gains, there mustbe other selfish stations that suffer some loss. To validate this

13

0.5

1.5

2.5

3.5

Thr

ough

put [

Mbp

s]Kp,Ki

0.5

1.5

2.5

3.5

0 2000 4000 6000 8000 10000

Interval

Kp*10,Ki*10

Fig. 11. Stability analysis of the parameters of the PI controller.

result, we consider a network with N = 10 stations (twoselfish), half of them (including one of the selfish stations) withρ1 = 1 and the other half (including the other selfish station)with ρ2 = 4. We perform an exhaustive search over a widerange of {pi, R̄i} configurations of the two selfish stations. Theresults of this experiment show that there is no configurationthat simultaneously improves the throughput of the two selfishstations, which confirms the result of Corollary 3.

F. Parameter setting of the PI controller

The main objective in the configuration of the Kp andKi parameters proposed in Section IV is to achieve a goodtradeoff between stability and reaction time.

To validate that our system guarantees stable behavior, weanalyze the evolution of the throughput received by a stationover time in a wireless network with N = 10 stations. Fig. 11shows the throughput for the chosen setting (labeled “Kp,Ki”)and for a configuration of these parameters 10 times larger(labeled “Kp∗10,Ki∗10”). We observe that with the proposedsetting, the throughput only suffers minor deviations aroundits average value. In contrast, for a larger setting, it exhibitshighly oscillatory, unstable behavior.

To investigate the speed with which the system reactsagainst selfish stations, we consider the following scenario.In a wireless network with N = 10 stations, initially allstations play DOC. After 50 intervals, one station becomesselfish and changes its access probability to pk = 1. Fig. 12shows the evolution of the throughput of the selfish stationover time. We observe from the figure that with our setting(labeled “Kp,Ki”), the system reacts quickly, and after a fewtens of intervals the selfish station no longer benefits from itsbehavior. In contrast, for a parameter setting 10 times smaller(labeled “Kp/10,Ki/10”) the reaction is very slow and ittakes almost 2000 intervals for the station to stop benefitingfrom its misbehavior.8

The results show that with a larger setting of {Kp,Ki}the system suffers from instability while with a smaller one it

8We note that, while the analysis of Section IV guarantees stability whenall stations run DOC, our system is also stable when some of the stations areselfish. This is shown by the experiment of Fig. 12 where, after one of thestations turns selfish, the others increase their access probability to a value thatensures the selfish station does not have any gain. The system then remainsstable at this point of operation.

2

6

10

14

Thr

ough

put [

Mbp

s]

Kp,Ki

2

6

10

14

0 500 1000 1500 2000

Interval

Kp/10,Ki/10

Fig. 12. Speed of reaction provided by the parameters of the PI controller.

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

2.2

2.4

2.6

optimalconfiguration

DOC DOS non-opportunistic

Thr

ough

put [

Mbp

s]

r1r2

Fig. 13. Performance with Jakes’ channel model.

reacts too slowly. Hence, the proposed setting provides a goodtradeoff between stability and reaction time.

G. Impact of channel coherence time

Our channel model is based on the assumption that differentobservations of the channel conditions are independent. Inorder to understand the impact of this assumption, we repeatthe experiment of Fig. 7 using Jakes’ channel model [20] toobtain the different channel observations. The results, for aDoppler frequency of fD = 2π/100τ , are given in Fig. 13.We observe that the throughput obtained is slightly smallerthan that of Fig. 7. This is due to the fact that when thechannel is bad, a station does not transmit after a successfulcontention and therefore it takes (on average) a shorter timeuntil the next successful contention of this station. As a result,a station accesses the channel more often when it is bad thanwhen it is good, which introduces a bias that slightly reducesthe throughput. Overall, the results are sufficiently similar tothose of Fig. 7 to conclude that our assumption on the channelmodel only has a minor impact on the resulting performance.

We further investigate whether, in the above scenario, astation with ρ2 = 4 could obtain more throughput by using aselfish configuration. While the station obtains 1.752 Mbpswith DOC, it can obtain up to 1.757 Mbps with a selfishconfiguration. Note that this increase is not due to the DOCdesign, as no other configuration provides the selfish stationwith more channel time, but rather due to the fact that thetransmission rate threshold of [3] is not truly optimal under

14

0

0.5

1

1.5

2

2.5T

hrou

ghpu

t [M

bps]

Number of intervals (I)

DOCSelfish

500020001000500200100

Fig. 14. No throughput gain for the selfish station with stations joining andleaving the network.

Jakes’ channel model. In any case, the throughput gain of theselfish station is negligible.

H. Stations joining and leaving the network

To assess the effectiveness of DOC with stations joining andleaving the network, we perform the following experiment. Weconsider a wireless network with 5 stations, one of which is aselfish station. After 1000 intervals, 5 additional stations jointhe wireless network, stay for I intervals and then leave. Theinitial 5 stations stay for another 1000 intervals. The selfishstation plays with a configuration {p1, R̄1} when there are5 stations in the network and a configuration {p2, R̄2} whenthere are 10 stations. We obtain these configurations by per-forming an exhaustive search over all possible configurationsand selecting the {p1, R̄1} and {p2, R̄2} values that providethe selfish station with the largest average throughput. Fig. 14shows the average throughput obtained by the station with thisselfish strategy compared to the throughput it would obtain ifit played DOC. The results confirm that the selfish stationcannot obtain any gain by deviating from DOC.

VII. CONCLUSIONS

Recently proposed Distributed Opportunistic Scheduling(DOS) techniques provide throughput gains in wireless net-works that do not have a centralized scheduler. One of theproblems of these techniques, however, is that they are vul-nerable to malicious users who may configure their parametersto obtain a greater share of the wireless resources. In this paperwe address this problem by proposing a novel algorithm thatprevents such throughput gains from selfish behavior. Withour approach, upon detecting a selfish user, stations react byusing a more aggressive parameter configuration that indirectlypunishes the selfish station. Such an adaptive algorithm has tocarefully adjust the reaction against a selfish station in order toprevent the system from becoming unstable. A key aspect ofthe paper is that we use tools from control theory combinedwith game theory to design our algorithm: by conducting acontrol theoretic analysis, we show that when all stations runDOC the system converges to the desired configuration, andby conducting game theoretic analysis, we show that selfishstations cannot benefit from playing a different strategy.

ACKNOWLEDGEMENTS

The authors would like to thank the anonymous reviewersand Dr. S. Murphy for their valuable feedback.

REFERENCES

[1] M. Andrews et al., “Providing quality of service over a shared wirelesslink,” vol. 39, no. 2, pp. 150–154, Feb. 2001.

[2] P. Viswanath, D. Tse, and R. Laroia, “Oppurtunistic beamforming usingdumb antennas,” IEEE Trans. Inf. Theory, vol. 48, no. 6, pp. 1277–1294,Jun. 2002.

[3] D. Zheng, W. Ge, and J. Zhang, “Distributed opportunistic scheduling forad hoc networks with random access: An optimal stopping approach,”IEEE Trans. Inf. Theory, vol. 55, no. 1, pp. 205–222, Jan. 2009.

[4] D. Zheng et al., “Distributed opportunistic scheduling for ad hoc com-munications with imperfect channel information,” IEEE Trans. WirelessCommun., vol. 7, no. 12, pp. 5450–5460, Dec. 2008.

[5] P. S. C. Thejaswi et al., “Distributed opportunistic scheduling with two-level probing,” IEEE/ACM Trans. Netw., vol. 18, no. 5, pp. 1464 –1477,Oct. 2010.

[6] S. Tan, D. Zheng, J. Zhang, and J. R. Zeidler, “Distributed opportunisticscheduling for ad-hoc communications under delay constraints,” in Proc.IEEE INFOCOM, Mar. 2010, pp. 2874–2882.

[7] A. Garcia-Saavedra, A. Banchs, P. Serrano, and J. Widmer, “Distributedopportunistic scheduling: A control theoretic approach,” in Proc. IEEEINFOCOM, Mar. 2012, pp. 540–548.

[8] P. Patras, A. Banchs, P. Serrano, and A. Azcorra, “A control-theoreticapproach to distributed optimal configuration of 802.11 wlans,” IEEETrans. Mob. Comput., vol. 10, no. 6, pp. 897–910, Jun. 2011.

[9] F. Kelly, “Charging and rate control for elastic traffic,” Eur. Trans.Telecommun., vol. 8, no. 1, pp. 33–37, Jan. 1997.

[10] G. Holland, N. Vaidya, and P. Bahl, “A rate-adaptive mac protocol formulti-hop wireless networks,” in Proc. ACM MOBICOM, Jul. 2001, pp.236–251.

[11] B. Sadhegi, V. Kanodia, A. Sabharwal, and E. Knightly, “Opportunisticmedia access for multirate ad hoc networks,” in Proc. ACM MOBICOM,Sep. 2002, pp. 24–35.

[12] I. Menache and N. Shimkin, “Rate-based equilibria in collision channelswith fading,” IEEE J. Sel. Areas Commun., vol. 26, no. 7, pp. 1070–1077, Sep. 2008.

[13] P. Gupta, Y. Sankarasubramaniam, and A. Stolyar, “Random-accessscheduling with service differentiation in wireless networks,” in Proc.IEEE INFOCOM, Mar. 2005, pp. 1815–1825.

[14] T. Glad and L. Ljung, Control Theory: Multivariable and NonlinearMethods. London: Taylor & Francis, 2000.

[15] M. Cagalj, S. Ganeriwal, I. Aad, and J.-P. Hubaux, “On selfish behaviorin csma/ca networks,” in Proc. IEEE INFOCOM, Mar. 2005, pp. 2513–2524.

[16] J. Konorski, “A game-theoretic study of csma/ca under a backoff attack,”IEEE/ACM Trans. Netw., vol. 16, no. 6, pp. 1167–1178, Dec. 2006.

[17] G. F. Franklin, J. D. Powell, and M. L. Workman. Reading, MA:Addison-Wesley, 1990.

[18] C. Hollot, V. Misra, D. Towsley, and W. B. Gong, “A control theoreticanalysis of red,” in Proc. IEEE INFOCOM, Apr. 2001, pp. 1510–1519.

[19] D. Fudenberg and J. Tirole, Game Theory. Cambridge, MA: MIT Press,1991.

[20] W. C. Jakes, Microwave Mobile Communications. New York: Wiley& Sons, 1975.

15

Albert Banchs (M’04–SM’12) received his degreein telecommunications engineering from the Poly-technic University of Catalonia in 1997, and hisPh.D. degree from the same university in 2002. Hereceived a national award for the best Ph.D. thesis onbroadband networks. He was a visiting researcher atICSI, Berkeley, in 1997, worked for Telefonica I+D,Spain, in 1998, and for NEC Europe Ltd., Germany,from 1998 to 2003. He has been with the UniversityCarlos III of Madrid since 2003. Since 2009, he alsohas a double affiliation as Deputy Director of the

institute IMDEA Networks.Albert Banchs has authored over 80 publications in peer-reviewed journals

and conferences and holds six patents. He is area editor for ComputerCommunications and has been senior and associate editor for IEEE Communi-cations Letters and guest editor for IEEE Wireless Communications, ComputerNetworks and Computer Communications. He has served on the TPC of anumber of conferences and workshops including IEEE INFOCOM, IEEE ICCand IEEE GLOBECOM, and has been TPC chair for European Wireless 2010,IEEE HotMESH 2010 and IEEE WoWMoM 2012. He is senior member ofIEEE.

Andres Garcia-Saavedra received his B.Sc. degreein Telecommunications Engineering from Universityof Cantabria in 2009, and his M.Sc. degree inTelematics Engineering from University Carlos IIIof Madrid (UC3M) in 2010. He currently holdsthe position of Teaching Assistant and pursues hisPh.D. in the Telematics Department of UC3M. Hiswork focuses on performance evaluation and energyefficiency of wireless networks.

Pablo Serrano (M’09) received his degree intelecommunication engineering and his Ph.D. degreefrom University Carlos III of Madrid (UC3M) in2002 and 2006, respectively. He has been with theTelematics Department of UC3M since 2002, wherehe currently holds the position of associate professor.In 2007 he was a visiting researcher at the ComputerNetwork Research Group at the University of Mas-sachusetts, Amherst. His current work focuses onperformance evaluation of wireless networks. He hasauthored over 40 scientific papers in peer-reviewed

international journals and conferences. He also serves as TPC member ofseveral international conferences, including IEEE GLOBECOM and IEEEINFOCOM.

Joerg Widmer (M’06–SM’10) is a Chief Re-searcher at Institute IMDEA Networks in Madrid,Spain. His research expertise covers computer net-works and distributed systems, ranging from MAClayer design, sensor networking, and network codingto transport protocols and Future Internet architec-tures. From June 2005 to July 2010, he was managerof the Ubiquitous Networking Research Group atDOCOMO Euro-Labs in Munich, Germany, leadingseveral projects in the area of mobile and cellularnetworks. Before joining DOCOMO Euro-Labs, he

worked as post-doctoral researcher at EPFL, Switzerland, focusing on ultra-wide band communication and network coding.

Joerg Widmer received his M.S. and Ph.D. degrees in computer sciencefrom the University of Mannheim, Germany in 2000 and 2003, respectively.In 1999 and 2000 he was a visiting researcher at the International Com-puter Science Institute in Berkeley, CA, USA. He has authored more than100 conference and journal papers and three IETF RFCs. He also holdsseveral patents, serves on the editorial board of IEEE TRANSACTIONS ONCOMMUNICATIONS, and regularly participates in the program committees ofseveral major conferences. He is senior member of IEEE and ACM.

Documents

A Game Theoretic Approach to Distributed Opportunistic ... · Banchs et al.: A Game Theoretic Approach to Distributed Opportunistic Scheduling Abstract—Distributed Opportunistic