15
Mathematical Social Sciences 56 (2008) 395–409 www.elsevier.com/locate/econbase Codification schemes and finite automata Pen´ elope Hern´ andez * , Amparo Urbano Departamento de An ´ alisis Econ´ omico and ERI-CES, Universidad de Valencia, Campus dels Tarongers, Edificio Departamental Oriental, Avda. dels Tarongers, s/n. 46022 Valencia, Spain Received 1 October 2005; received in revised form 1 November 2006; accepted 1 January 2008 Available online 16 February 2008 Abstract This paper is a note on how Information Theory and Codification Theory are helpful in the computational design of both communication protocols and strategy sets in the framework of finitely repeated games played by bounded rational agents. More precisely, we show the usefulness of both theories to improve the existing automata bounds on the work of Neyman (1998) Finitely repeated games with finite automata, Mathematics of Operations Research, 23 (3), 513–552. c 2008 Elsevier B.V. All rights reserved. JEL classification: C73; C72 Keywords: Complexity; Codification; Repeated games; Finite automata 1. Introduction This paper is a note on how Information Theory and Codification Theory are helpful in the computational design of both communication protocols and strategy sets in the framework of finitely repeated games played by boundedly rational agents. More precisely, we will show their usefulness for improving the existing automaton bounds of Neyman (1998) work on finitely repeated games played by finite automata. Until quite recently, Economics and Game Theory did not consider computational complexity as a valuable resource. Therefore, it has sometimes been predicted that rational agents will invest a large amount of computation even for small payoffs and for all kinds of deviations, in * Corresponding author. E-mail addresses: [email protected] (P. Hern´ andez), [email protected] (A. Urbano). 0165-4896/$ - see front matter c 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.mathsocsci.2008.01.007

Codification schemes and finite automata

Embed Size (px)

Citation preview

Mathematical Social Sciences 56 (2008) 395–409www.elsevier.com/locate/econbase

Codification schemes and finite automata

Penelope Hernandez∗, Amparo Urbano

Departamento de Analisis Economico and ERI-CES, Universidad de Valencia, Campus dels Tarongers, EdificioDepartamental Oriental, Avda. dels Tarongers, s/n. 46022 Valencia, Spain

Received 1 October 2005; received in revised form 1 November 2006; accepted 1 January 2008Available online 16 February 2008

Abstract

This paper is a note on how Information Theory and Codification Theory are helpful in the computationaldesign of both communication protocols and strategy sets in the framework of finitely repeated gamesplayed by bounded rational agents. More precisely, we show the usefulness of both theories to improvethe existing automata bounds on the work of Neyman (1998) Finitely repeated games with finite automata,Mathematics of Operations Research, 23 (3), 513–552.c© 2008 Elsevier B.V. All rights reserved.

JEL classification: C73; C72

Keywords: Complexity; Codification; Repeated games; Finite automata

1. Introduction

This paper is a note on how Information Theory and Codification Theory are helpful in thecomputational design of both communication protocols and strategy sets in the framework offinitely repeated games played by boundedly rational agents. More precisely, we will show theirusefulness for improving the existing automaton bounds of Neyman (1998) work on finitelyrepeated games played by finite automata.

Until quite recently, Economics and Game Theory did not consider computational complexityas a valuable resource. Therefore, it has sometimes been predicted that rational agents willinvest a large amount of computation even for small payoffs and for all kinds of deviations, in

∗ Corresponding author.E-mail addresses: [email protected] (P. Hernandez), [email protected] (A. Urbano).

0165-4896/$ - see front matter c© 2008 Elsevier B.V. All rights reserved.doi:10.1016/j.mathsocsci.2008.01.007

396 P. Hernandez, A. Urbano / Mathematical Social Sciences 56 (2008) 395–409

particular, those from cooperative behavior. However, if we limit players’ computational power,then new solutions might emerge, thus providing one of the main motivations for theories ofbounded rationality in Economics (see, Simon (1955)). Theories of bounded rationality assumethat decision makers are rational in a broad sense, but not omniscient: they are rational but withlimited abilities. One way to model computational capacity bounds is by assuming that decisionmakers use machines (automata) to implement their strategies. The machine size, or the numberof states of the automaton, determines the bounds on the agents’ rationality.

Since the eighties there have been several papers on the repeated games literature whichhave studied the conditions under which the set of feasible and rational payoffs are equilibriumoutcomes, when there are bounds (possibly very large) in the number of strategies that playersmay use. In the context of strategies implemented by finite automata, these bounds havebeen related to the complexity of the automata playing the equilibrium strategy (see Neyman(1985), Rubinstein (1986), Abreu and Rubinstein (1988), Ben-Porath (1993), Neyman (1997),Papadrimitriou and Yannakakis (1994), Neyman and Okada (1999), Neyman and Okada(2000a,b), Gossner and Hernandez (2003), Zemel (1989), among others).

The number of strategies in finitely repeated games is finite but huge, hence full rationalityin this context can be understood as the ability to implement any strategy of such games. In thecontext of finite automata, this implies that if an agent used an automaton of exponential size withrespect to the length of the game, then she could implement all the strategies and, in particular,the one entailing knowledge of the last stage of the game. Alternatively, if the automaton sizeis polynomial in the length of the game, then there will be several strategies which cannot beimplemented, for instance defection in the last stage of the game. Bounds on automaton sizes areimportant for they give a measure of how far agents are from rational behavior. The bigger thebounds the closer to full rationality. Therefore, improving the automata bounds means gettingcloser to fully rational behavior while still achieving the Folk Theorem payoffs.

It has been shown in Neyman (1998) that players in a long but finite interaction can getarbitrarily close to the cooperative payoffs even if they choose large automata, provided that theyare allowed to randomize on their choices of machines. Specifically, the key idea is that if aplayer randomizes among strategies, then her opponent will be forced to fill up his capacity inorder to be able to respond to all the strategies in the support of such a random strategy. Thestructure of Neyman’s equilibrium strategy is twofold: first, the stronger player (the one with thebigger automaton) specifies the finite set of plays which forms her mixed strategy (uniformlydistributed over the above set); and second, she communicates a play through a message to theweaker player.

Neyman’s proof of such an equilibrium strategy is by construction and the framework offinite automata imposes two additional requirements, apart from strategic considerations. Onerefers to the computational work, which entails measuring the number of plays and buildingup a communication protocol between the two machines. The other is related to the problemof the automata design in the framework of game theory. These requirements jointly determinethe complexity needed to support the equilibrium strategy in games played by finite automata.Specifically, the complexity of the set of plays is related to the number of states of the weakerplayer. These equilibrium strategies can be implemented by finite automata whose size is asubexponential function of the approximation to the targeted payoff and of the length of therepeated game. Neyman’s result is very important for two reasons. The first is that it suggests thatalmost rational agents may follow a cooperative behavior in finitely repeated games as opposedto their behavior under full rationality. The second is that it enables to quantify the distance fromfull rationality through a bound on the automaton size.

P. Hernandez, A. Urbano / Mathematical Social Sciences 56 (2008) 395–409 397

Our contribution focuses on the computational work, i.e. on the design of both the set of playsand the communication protocol. Thus, Information Theory is firstly applied to construct the setof equilibrium plays. Plays differ from each other only at a set of stages called the verificationphase. In our approach, the complexity of the mixed strategy is related to the entropy of theempirical distribution of each verification sequence. This captures, from an Information Theoryviewpoint, the cardinality of the set of equilibrium plays and hence its complexity. Secondly, byCodification Theory an optimal communication scheme, mapping plays into messages, is offeredwhich produces “short words” guaranteeing the efficiency of the process. A set of messagesis proposed with the properties that each of them has the same empirical distribution and thatall of them are of the minimal length. The former property implies that all plays induce thesame payoff and the latter ensures the efficiency of the process. Since the size of the weakerplayer’s automaton implementing the equilibrium strategy is determined by the complexity ofthe set of plays, this set is expanded as much as possible while maintaining the same payoff perplay. Moreover, the length of the communication sequences is the shortest one given the aboveset. Both procedures allow us to obtain an equilibrium condition which improves upon that ofNeyman. Under his strategic approach, our result offers the largest upper bound on the weakerplayer’s automaton implementing the cooperative outcome: this bound is an exponential functionof both the entropy of the approximation to the targeted payoff and the number of repetitions.

The paper is organized as follows. The standard model of finite automata playing finitelyrepeated games is reviewed in Section 2. Section 3 offers our main result and states Neyman’sTheorem, with Section 4 explaining the key features behind his construction. The tools ofInformation Theory and Codification Theory needed for our approach are presented in Section 5.This section also shows the construction of the verification and communication sets as well as theconsequences of our construction on the measure of the weaker player’s size bound. Concludingremarks close the paper.

2. The model

For k ∈ R, let dke denote the superior integer part of k, i.e. k ≤ dke < k + 1. Given a finiteset Q, |Q| denotes the cardinality of Q.

Let G = ({1, 2}, (Ai )i∈{1,2}, (ri )i∈{1,2}) be a game where {1, 2} is the set of players. Ai is afinite set of actions for player i (or pure strategies for player i) and ri : A = A1 × A2 −→ R isthe payoff function of player i and r = (r1, r2).

Denote by ui (G) the individually rational payoff of player i in pure strategies, i.e., ui (G) =min max ri (ai , a−i ) where the max ranges over all pure strategies of player i , and the minranges over all pure strategies of the other player. For any finite set B denote by ∆(B) the set ofall probability distributions on B. An equilibrium of G is a pair σ = (σ1, σ2) ∈ ∆(A1)×∆(A2)

such that for every i and any strategy of player i , τi ∈ Ai , ri (τi , σ−i ) ≤ ri (σ1, σ2), wherer(σ ) = E σ (r(ai , a−i )), where Eσ denotes the expectation operator with respect to σ . If σ is anequilibrium, the vector payoff r(σ ) is called an equilibrium payoff and let E(G) be the set of allequilibrium payoffs of G.

From G define a new game in strategic form GT which models a sequence of T plays of G,called stages. By choosing actions at stage t , players are informed of actions chosen in previousstages of the game. Denote the set of all pure strategies of player i in GT by Σi (T ).

Any pair σ = (σ1, σ2) ∈ ×Σi (T ) of pure strategies induces a play ω(σ) =

(ω1(σ ), . . . , ωT (σ )) with ωt (σ ) = (ωt1(σ ), ω

t2(σ )) defined by ω1(σ ) = (σ1(�), σ2(�)) and

by the induction relation ωti (σ ) = σi (ω

1(σ ), . . . , ωt−1(σ )).

398 P. Hernandez, A. Urbano / Mathematical Social Sciences 56 (2008) 395–409

Let rT (σ ) =1T

∑Tt=1 r(ωt (σ )) be the average vector payoff during the first T stages induced

by the strategy profile σ.A finite automaton Mi of player i is a finite machine which implements pure strategies

for player i in strategic games in general, and in repeated games in particular. Formally,Mi =< Qi , q0

i , fi , gi >, where, Qi is the set of states; q0i is the initial state; fi is the action

function, fi : Qi → Ai and gi is the transition function from state to state, gi : Qi × A−i → Qi .The size of a finite automaton is the number of its states, i.e. |Qi | = mi .

Finally, define a new game in strategic form GT (m1,m2) which denotes the T stage repeatedversion of G, with the average payoff as the evaluation criterion and with all the finite automataof size mi as the pure strategies of player i , i = 1, 2. Let Σi (T,mi ) be the set of pure strategiesin GT (m1,m2) that are induced by an automaton of size mi .

A finite automaton for player i can be viewed as a prescription for this player to chooseher action in each stage of the repeated game. If at state q the other player chooses theaction tuple a−i , then the automaton’s next state is qi = gi (q, a−i ) and the associatedaction is fi (qi ) = fi (gi (q, a−i )). More generally, define inductively, gi (q, b1, . . . , bt ) =

gi (gi (q, b1, . . . , bt−1), bt ); the action prescribed by the automaton for player i at stage j isfi (gi (qi , a1

−i , . . . , at−1−i )), where a j

−i ∈ A−i .For every automaton Mi , consider the strategy σM

i , of player i in GT (m1,m2), such thatthe action at stage t is the one dictated by the action function of the automaton, after theupdating of the transition function for the history of length t − 1. Namely, σM

i (a1, . . . , at−1) =

fi (gi (q, a1−i , . . . , at−1

−i )). The strategy σi for player i in GT is implementable by the automatonMi if both strategies generate the same play for any strategy τ of the opponent, i.e. σi is equivalentto σM

i for every τ ∈ Σ−i (T ): ω(σi , τ ) = ω(σMi , τ ). Denote by E(GT (m1,m2)) the set of

equilibrium payoffs of GT (m1,m2).Define the complexity of σi denoted by comp(σi ), as the size of the smallest automaton that

implements σi . See Kalai and Standford (1988) and Neyman (1998), for a fuller explanation.Any pair σ = (σ1, σ2) of pure strategies GT (m1,m2) induces a play ω(σ1, σ2) of length T (ω,

henceforth). Conversely, since a finite play of actions, ω, can be regarded as a finite sequence ofpairs of actions, i.e. ω = (a1, . . . , at ), then the play ω and a pure strategy σi are compatible, if forevery 1 ≤ s ≤ T , σi (a1, . . . , as−1) = as

i . Now, given a play ω, define player i’s complexity ofω, compi (ω), as the smallest complexity of a strategy σi of player i which is compatible with ω.

compi (ω) = inf {comp(σi ) : σi ∈ Σi is compatible with ω} .

Let Q be a finite set of plays. A pure strategy σi of player i is conformable to Q if it iscompatible with any ω ∈ Q. The complexity of player i of a set of plays Q is defined as thesmallest complexity of a strategy σi of player i that is conformable to Q.

compi (Q) = inf{compi (σ ) : σ ∈ Σi is conformable to Q

}.

Assume without any loss of generality that player 2 has a bigger size than player 1, i.e. m2 >

m1, therefore player 1 will be referred to as the “weaker player” (he) and player 2 as the“stronger” player (she).

3. Main result

The main result improves upon the weaker player’s automaton upper bound – m1 – whichguarantees the existence of an equilibrium payoff of GT (m1,m2), ε-close to a feasible andrational payoff.

P. Hernandez, A. Urbano / Mathematical Social Sciences 56 (2008) 395–409 399

The idea is that in the context of finitely repeated games, deviations in the last stages couldbe precluded if players did not know the end of the game. This may be achieved if playersimplement their strategies by playing with finite automata which cannot count until the last stageof the game. On the contrary, player i will deviate if she is able to implement cycles whoselength is at least the number of repetitions. Hence, if player i had to respond to different playsof a length smaller than the number of repetitions, then she could exhaust her capacity and at thesame time would be unable to count until the end of the game. Therefore, a player could fill upthe rival’s capacity by requiring her to conform with distinct plays of sufficiently large length.

With this idea in mind, Neyman (1998) establishes the existence of an equilibrium payoffof GT (m1,m2) which is ε-close to a feasible and rational payoff even if the size of player 1’sautomaton (the smaller automaton) is quite big, i.e. a subexponential function of ε and of thenumber of repetitions, T :

Theorem (Neyman, 1998). Let G = ({1, 2}, A, r) be a two-person game in strategic form.Then, for every ε sufficiently small, there exist positive integers T0 and m0, such that if T ≥ T0,and x ∈ co(r(A)) with xi > ui (G) and m0 ≤ m1 ≤ exp(ε3T ) and m2 > T then there existsy ∈ E(GT (m1,m2)) with |yi − xi | < ε.

Basically, the structure of Neyman’s equilibrium strategies is as follows:

• The stronger player selects a sequence that leads approximately to the targeted payoff. Thesequence is selected uniformly from a set of plays denoted by Q that all yield the same payoff.• This player codifies the selected play into a message, transmitted to the other player in the

communication phase.• There exists a one-to-one correspondence between the set Q and the set of messages M .• The difference among the distinct plays is a small portion of each play, which is called the

verification phase. The set of sequences of the verification phase is denoted by V .

The above design permits the filling up of the weaker player’s capacity, since this player hasto respond to any possible play selected by the stronger player.

Similarly to Neyman’s result, we offer the equilibrium conditions in terms of the size of theweaker player’s automaton implementing the equilibrium play. The main difference is that ourupper bound includes previous bound domains: the size of m1 is much bigger. This is due to theapplication of Information Theory to the optimal construction of the set of verification sequencesand the associated communication scheme.

More specifically, to establish Theorem 1, we construct the set V of verification sequencesand the associated set M of communication sequences. Set M is designed such that its sequencesare as short as possible for a given V . Each of both sets is characterized by the length andby the empirical distribution of the sequences belonging to each of them. Namely, the lengthof the verification sequences is k, which depends on the parameters of the game T and m1.The empirical distribution of such sequences, denoted by r, depends on the ε-approximationto the targeted payoff. The length of the communication sequences is k, and it is related tothe parameters of V . More precisely, k is bounded by dk H(r)e < k ≤ 2k H(r), where H(r)corresponds to the entropy1of the empirical distribution of the sequences in V . Finally, theempirical distribution of the sequences in M is the uniform distribution.

1 The entropy function H is given by H(x) = −x log2 x − (1− x) log2(1− x) for 0 < x < 1.

400 P. Hernandez, A. Urbano / Mathematical Social Sciences 56 (2008) 395–409

The above conditions on set M show that, under our construction, the communicationsequences are the shortest ones for constructing the equilibrium strategies. The formal statementof this claim is presented in Section 5 below (see Lemma 1) after surveying the basic tools ofInformation Theory needed to prove it.

Our theorem is stated as follows:

Theorem 1. Let G = ({1, 2}, A, r) be a two-person game in strategic form. Then for everyε sufficiently small, there exist positive integers T0 and m0, such that if T ≥ T0, and x ∈co(r(A)) with xi > ui (G) and m0 ≤ m1 ≤ exp( T

27 H(ε)) and m2 > T then there existsy ∈ E(GT (m1,m2)) with |yi − xi | < ε.

Both Theorems follow from conditions on: (1) a feasible payoff x ∈ co(r(A)); (2) a positiveconstant ε > 0; (3) the number of repetitions T , and (4) the bounds of the automata sizes m1and m2, that guarantee the existence of an equilibrium payoff y of the game GT (m1,m2) that isε-close to x .

One of the conditions is specified in terms of the inequalities mi ≥ m0, where m0 issufficiently large. This lower bound m0 does not depend on the communication structure andtherefore it remains the same as that of Neyman. Another condition requires the bound ofone or both automata sizes to be subexponential in the number of repetitions, i.e. a conditionthat asserts that (log mi )/T is sufficiently small. Neyman’s upper bound on m1 is written interms of exp(ε3T ). In contrast, our upper bound on m1 is written in terms of exp( T

27 H(ε)).This condition comes from the use of codification tools to construct the communication andverification sets and hence it is expressed in terms of the entropy function, as will be made clearlater on. Therefore, the improvement upon Neyman’s subexponential condition is connected tothe codification schemes to be explained in Section 5.

4. The scheme of Neyman’s equilibrium play

We present an informal description of the equilibrium strategy construction. Since thestructure of the equilibrium strategies as well as their formalization are as that of Neyman, weoffer a sketch of the construction (for more details see Neyman (1998, pages 534–549)). For thesake of simplicity, we assume that only two action profiles are needed to generate the targetedpayoff. Therefore the following analysis depends on the parameters of this subcase.

Communication phase: Neyman’s Theorem is stated in terms of an upper bound, m1 ≤

exp(ε3T ), of the weaker player’s size which is related to the length of the game T , and the ε-approximation to the targeted payoff. This bound gives the constraints on a new parameter kwhich determines both the cardinality of the set of plays Q and the length of the communicationphase. Following Neyman, let k be the unique integer such that:

2k−1≤

m1 − l

l< 2k (1)

where l is the length of the cyclic part of the proposed play.Fix two actions for each player and rename them 0 and 1. Knowing player 1’s automaton

size, player 2 chooses the set of messages. Each message is a sequence of the above two actions.Player 1 responds to any message by playing a fixed action (for instance 0), independently of themessage sent by player 2. The communication sequences consist of messages of length 2k. Thefirst k stages coincide with an element of {0, 1}k and the remaining stages start with a string of

P. Hernandez, A. Urbano / Mathematical Social Sciences 56 (2008) 395–409 401

zeros followed by a string of ones such that the total number of ones (or that of zeros) equals k.This parameter also gives the cardinality of the set of plays Q, which by Eq. (1), is lower than 2k .

Play phase: After the communication phase the equilibrium play enters into the play phase,which consists of a cycle repeated a finite number of times L ≥ 4. The remaining stages until T ,d = T − 2k− Ll, are completed with the repetitions of the action pairs (0, 0), with the last stagecorresponding to (0, b2), where b2 denotes the best response of player 2 to the action 0 of player1. The length l of the cycle does not depend on the signal sent by player 2 and it is at most T

4 .The cycle has two parts: the verification play and the regular play. The regular play is commonfor every signal and it consists of a sequence of pairs of actions ε-close to the targeted payoff x .

Players follow a verification play that consists of a coordinated sequence of actions from setV , with length 2k and coinciding with the signal sent in the communication phase. This sequenceof actions can be understood as a coordination process which determines each play compatiblewith player 1’s equilibrium strategy σ . The cardinality of set V is the same as that of set Q.

The relationship between the communication phase and the verification play is summarizedas follows: (1) Set M coincides with set V , (2) both phases are of the same length 2k and, (3) themap between V and M is the identity.

The number of states needed to follow the equilibrium strategies corresponds to the numberof states for each play times the number of possible plays Q. As |Q| < 2k , then the upper boundof player 1 has to be lower than 2kl in order to implement both the communication phase and theplay phase, i.e. m1 < 2kl. It is straightforward to check that player 1 is able to implement bothphases for any k satisfying Eq. (1). In particular, for k ≤ d ε

3Tln 2 −

ln Tln 2 e,

m1 < 2kl ≤T

42k

=T

4ln 2 exp(k)

≤T

4exp

(ε3T

ln 2−

ln T

ln 2

)≈ exp(ε3T ).

The above analysis uncovers the two computational features to design the equilibriumstructure. The first one is the design of the set of possible plays given the parameters of the game.The difference among the plays is given by the verification play which satisfies the followingproperties. First, each sequence has the same empirical distribution to deter player 2’s deviationsby selecting the best payoff sequence. Second, player 2 fills up player 1’s capacity by generatingenough different plays so that the number of remaining states is sufficiently small. In this way,player 1’s deviations from the proposed play by counting up until the last stage of the game areavoided. Therefore, the cardinality of this set gives the upper bound on the weaker player’s size,supporting the equilibrium strategies.

The second computational feature is concerned with the communication phase: given the setof plays, player 2 selects one of these possible plays and communicates it by a message. Sinceplayer 2 proposes the plays, messages have to be independent of the associated payoffs to eachof them.2

2 These two features have to be combined with the implementation requirements of finite automata. To avoidrepetitions, this issue is explained in more detail in the next section.

402 P. Hernandez, A. Urbano / Mathematical Social Sciences 56 (2008) 395–409

The above are the main points of Neyman’s construction. It should be clear that not all 3-tupleof sets Q, M , and V can be implemented as equilibrium strategies by finite automata, but thoseimplementable provide the upper bound on m1. Our query is whether it is possible to improveupon such a bound in order to get automata that are closer to fully rational behavior. To do that,we stick to both the same equilibrium structure and the same implementation procedure as thatof Neyman and focus on the computational features of the equilibrium strategies, namely on thedesign of sets Q, M and V .

5. Codification schemes and the equilibrium play

The size of player 1 (the set of states) has to be allocated in such a way that he first identifiesand then plays a specific path. We can view this task as the construction of a matrix of automatonstates with O(T ) columns – the length of the cycle l – and about m1

l rows — the number ofpossible paths which equal the cardinality of set V . The rows then correspond to the set of playswhich player 1 has to conform with, thus filling up his capacity.

A solution to this problem is equivalent to solving a codification problem in InformationTheory, since the verification sequences have to be codified in the communication phase. Tocodify means to describe a phenomenon. The realization of this phenomenon can be viewed asthe representation of a random variable. A codification problem is then just a one-to-one mapping(denoted the source code) from a finite set – the range of a random variable or input – to anotherset of sequences of finite length or output sequences, with specific properties. The most importantone is that the expected length of the source code of the random variable is as short as possible.With this requirement we achieve an optimal data compression.

We proceed to construct the set of verification sequences and the associated communicationscheme under the framework of finite automata. The key points of the construction are: (1) thecharacterization of such sequences by their empirical distribution and (2) the design of the set ofcommunication sequences by the optimal codification of the verification set. The above sets ofsequences have to satisfy the following requirements:

• R1: Each message determines a unique play, therefore a unique verification sequence.• R2: Each communication sequence and each verification sequence has the same associated

payoff in order to preclude any strategic deviation.• R3: The length of each signal is the smallest positive integer such that it can be implemented

by a finite automaton and therefore generates the lowest distortion from the targeted payoff.• R4: The end of both the communication and the verification phases has to be signalled in

order to properly compute the complexity associated with any equilibrium strategy.

In our setting the set of verification sequences is the input set, the set of messages is the outputset and the random variable is the uniform distribution over the input set. All the sequences ofthe verification set V have the same empirical distribution, finite length k + 1, and a fixed lastcomponent to signal the end of such a phase. The codification of this set V results in the setof messages for the communication phase. This is the output set consisting of strings of finitelength from the binary alphabet and of optimal length. The communication sequences have thesame empirical distribution to deter payoff deviations and are of minimal length by the optimalcodification theory results.

The formal details of our construction are presented next. Firstly, by the methodology oftypes, the class of sequences with the same empirical distribution is considered. This class givesus the verification sequences. Secondly, the information properties of the verification sequences

P. Hernandez, A. Urbano / Mathematical Social Sciences 56 (2008) 395–409 403

are analyzed by the entropy concept. Finally, the minimal length of the optimal codification ofsuch sequences is offered. This codification generates the communication sequences.

5.1. Information theory and codification

Some basic results from Information Theory are presented here. For a more completeexplanation, consult Cover and Thomas (1991).

Let X be a random variable over a finite set Θ , whose distribution is p ∈ ∆(Θ), i.e.p(θ) = Pr(X = θ) for each θ ∈ Θ . The entropy H(X) of X is defined by H(X) =−Σθ∈Θ p(θ) log(p(θ)) = −EX

[log p(X)

], where 0 log 0 = 0 by convention. With some abuse

of notation, denote the entropy of a binary random variable X by H(p), where p ∈ [0, 1].Let x = x1, . . . , xn be a sequence of n symbols from a finite alphabet Θ . The type Px (or

empirical probability distribution) of x is the relative proportion of occurrences of each symbolof Θ , i.e. Px(a) =

N (a|x)n for all a ∈ Θ , where N (a | x) is the number of times that a occurs in the

sequence x ∈ Θn . The set of types of sequences of length n is denoted by Pn = {Px | x ∈ Θn}.If P ∈ Pn , then the set of sequences of length n and type P is called the type class of P , denotedby Sn(P) = {x ∈ Θn

: Px = P} .The cardinality of a type class Sn(P) is related to the type entropy:

1

(n + 1)|Θ |2nH(P)

≤ |Sn(P)| ≤ 2nH(P). (2)

Some notions of codification and data compression are presented next.A source code C for a random variable X is a mapping from Θ , the range of X , to D∗ ,

the set of finite length strings of symbols from a D-ary alphabet. Denote by C(x) the codewordcorresponding to x and let l(x) be the length of C(x). A code is said to be non-singular if everyelement of the range of X maps into a different string in D∗, i.e. xi 6= x j ⇒ C(xi ) 6= C(x j ).

The expected length L(C) of a source C(x) for a random variable X with a probability massfunction p(x) is given by L(C) =

∑x∈Θ p(x)l(x), where l(x) is the length of the codeword

associated with x . An optimal code is a source code of the minimum expected length.The necessary and sufficient condition for the existence of a uniquely decodable code with

a specified set of codeword lengths is known as the Kraft inequality: for any code over analphabet of size D, the codeword lengths l1, l2, . . . , lm must satisfy the inequality Σ D−li ≤ 1.Given this condition we ask what the optimal coding among a feasible code is. The optimalcoding is found by minimizing L =

∑pi li subject to Σ D−li ≤ 1. By the Lagrangian

multipliers, the optimal code lengths are l∗i = − logD pi . Therefore, the expected length isL∗ =

∑pi l∗i = −

∑pi logD pi which coincides with HD(X). Since codeword lengths must

be integers, it is not always possible to construct a code with a minimal length exactly equal tol∗i = − logD pi . The next remark summarizes these issues.

Remark 1. The expected length of a source C(X) for a random variable X is greater than orequal to the entropy of X . Therefore, given an optimal code of X denoted by C∗(X), its expectedlength verifies that H(X) ≤ L(C∗(X)) ≤ H(X)+ 1.

5.2. Construction of the verification and the communication sets

We first construct the verification set V , which is the set of sequences of length k + 1,belonging to the same type set over a binary alphabet and with the last component equal to

404 P. Hernandez, A. Urbano / Mathematical Social Sciences 56 (2008) 395–409

1; i.e. V ⊂ Sk(P) × {1} for some3 P ∈ Pk . The parameters k and P depend on those of thegame, T , ε and m1. Without any loss of generality, let us assume that (1− ε, ε) belongs4 to Pk ,for a family of k’s. Let k be the largest integer satisfying

2k H(ε)

(k + 1)2≤

m1 − l

l< 2k H(ε) (3)

where l is the length of the cyclic part of the proposed play, which has to be repeated at leastfour times5 (l ≤ T

4 ).Next, the communication set M is constructed. Let k = log2 |V | and consider a source code

C of a uniformly distributed random variable X over a finite alphabet of size |V |. The associated

entropy of X is H = −∑2k

i=1 2−k log 2−k= k. By Remark 1, the expected length of an optimal

code of X is greater than or equal to the entropy of X , which coincides with the logarithm of thecardinality of V , i.e. k, and it also provides the shortest length of the communication sequences.Notice that k need not be an integer. Let K be the set of integers greater than or equal to k.

By the above remark and the four requirements summarized in Section 5, we construct theoptimal set of messages M , implementing the communication phase, with the properties:

• By Remark 1 and requirement R3, M ⊆ {0, 1}k for some k ∈ K ;• By R2 and R4, M ⊂ Sk(r)× 1 for some r ∈ Pk ;• By requirement R1, |M | ≥ |V |.

Therefore, to completely describe the set of signals for the communication phase, we haveto establish both the length k ∈ K and the empirical distribution r of the sequences belongingto M . Let r = (r, 1 − r) = (Pr(0),Pr(1)), where r ∈ [0, 1

2 ] and let Φ(k, r) = {(k, r) ∈

(K × Pk)|1

(k+1)22k H(r) > 2k

} be the set of pairs (k, r) characterizing the communication set.

Define k as the smallest positive even integer such that there exists some r ∈ [0, 12 ] verifying that

(k, r) ∈ Φ(k, r). Fixing k, let rk ∈ [0, 12 ] be the smallest r such that (k, rk) ∈ Φ(k, r).

The next Lemma states that on fixing k, the corresponding rk becomes unique and equal to12 . It also provides the bounds of k. As a consequence, the set of signals for the communicationphase consists of the set of sequences with the same number of zeros and ones6 and of length k,which might be shorter than the length of the sequences for the verification play.

Lemma 1. • (k, r) 6∈ Φ(k, r) for any r < 12 .

• rk =12 .

• dke < k ≤ 2k for k > 6.694.

Proof. Suppose that (k, r) ∈ Φ(k, r) for r ∈ [0, 12 ).

Let k = dk H(r)e. Assume without any loss of generality that k < k.For k even, consider the pair (k, 1

2 ) and the corresponding type set Sk(12 ). By property (2) we

get:

3 Recall that the last component “1” is added in order to signal the end of the verification phase.4 Otherwise, we can also find a rational distribution which is close enough to distribution (1− ε, ε).5 Following Neyman’s construction on page 534, the specification of length l could be the integer part of [ 2T

9 ].6 Neyman’s construction gives the same kind of balanced sequences for the communication phase but their length is

not the shortest one.

P. Hernandez, A. Urbano / Mathematical Social Sciences 56 (2008) 395–409 405

1(k+1)2

2k H( 12 ) ≤ |Sk(r)| ≤ 2k H( 1

2 ), and then

1

(k + 1)22k H( 1

2 ) =1

(k + 1)22dk H(r)eH

(12

)>

1

(k + 1)22k H(r)

>1

(k + 1)22k H(r)

> 2k .

Therefore (k, 12 ) ∈ Φ(k, r). This is a contradiction because k < k since H(r) < 1 for

r ∈ [0, 12 ).

When k is odd, then k ≤ k − 1 and it suffices to prove the statement for k − 1. Consider the

pair (k, k−12k) = (dk H(r)e, dk H(r)e−1

2dk H(r)e). Then,

1

(k + 1)22k H( k−1

2k)=

1

(k + 1)22dk H(r)eH( dk H(r)e−1

2dk H(r)e)

≥1

(k + 1)22dk H(r)e( dk H(r)e−1

dk H(r)e+

1dk H(r)e

)

=1

(k + 1)22dk H(r)e

>1

(k + 1)22k H(r)

> 2k

where the first inequality follows from the fact that H( dk H(r)e−12dk H(r)e

) is greater than 2( dk H(r)e−12dk H(r)e

+

12dk H(r)e

); the second inequality is given by the definition of the upper integer part of a number

and the last one comes from the assumption that the pair (k, H(r)) ∈ Φ(k, r). We obtain acontradiction again because k < k

Moreover, from the above arguments we get (k, 12 ) ∈ Φ(k, r). In other words, rk =

12 .

Finally, dke < k since 1(dke+1)2

2dkeH(r) < 1(k+1)2

2k H( 12 ) for any r ∈ [0, 1

2 ). In order to prove

that k ≤ 2k for k > 6.694 it suffices to show that

1

(2k + 1)222k H( 1

2 ) ≥ 2k

for k > 6.694. �

The above construction of sets V , M and Q satisfy the computational requirementsto implement the equilibrium strategies. These features have to be combined with theimplementation requirements of finite automata since codification results are not enough bythemselves to guarantee the improvement of automata bounds. In particular, additional conditionsmay be needed to guarantee that players do not have profitable deviations from the equilibriumplay.

406 P. Hernandez, A. Urbano / Mathematical Social Sciences 56 (2008) 395–409

On the one hand, player 1 has to follow the specific play selected by player 2, and therefore hehas to store all possible cycles. Paralleling lemmata 9 and 10 of Neyman (1998, pages 537–539),any strategy σ in

∑1 of player 1 improving the equilibrium payoff against player 2 ’s equilibrium

mixed strategy τ ∗, needs more than m1 states. The key to prove these lemmata is that in order fordeviations to be profitable, they should take place in the last repetitions of the play. Therefore,the fact that the cycle has to be repeated at least four times is enough to fill up player 1’s capacity.

On the other hand, player 1 wants to design a “smart” automaton making good use of hiscapacity. His information processing is minimized by using the same states, called reducedstates, to process the signal and to follow the regular part of the different cycles. However,this introduces a difficulty since these states of player 1’s automaton admit more than oneaction, giving rise to the existence of deviations by player 2 that might remain unpunished.To avoid this problem, player 1 uses a mixed strategy whose support consists of a subsetof pure strategies, conformable to all the proposed plays, and with different locations of thereduced states. Equilibrium requirements impose a condition on the design of player 1’s mixedequilibrium strategy. Namely, it has to be constructed such that the reused states are randomlyallocated in a window7 inside his matrix of states with |Q| rows and l columns. The idea is thatthe number of reused states in this window is small enough for them to remain hidden fromplayer 2. We can assume that the size of such a window is a submatrix with l

6 columns and |Q|

rows. For each message sent in the communication phase there are at most k2 reused states, and

hence there are at most k2 |Q| in the window, where k is the length of the communication phase.

As length k depends on the length of the verification phase k, then for a k ≤ T27 , there is no

profitable deviation of player 2, since there is enough dispersion to guarantee the equilibriumcondition for this player. This argument parallels Lemma 11 of Neyman (1998, page 541).

To summarize, the set of plays Q has the same cardinality as that of V . This cardinality isapproximately 2k H(ε) (i.e. the number of rows of the matrix of player 1’s automaton), where ksatisfies Eq. (3).

The set of messages M is the typical set with empirical distribution ( 12 ,

12 ) and length k

satisfying:

dk H(ε)e < k ≤ 2k H(ε).

Then, for k ≤ T27 the upper bound of m1 is at most |Q|l, or more precisely

m1 ≤

[T

4

]2

T27 H(ε)

≈ exp(

T

27H(ε)

).

This construction, as well as that of Neyman, relies on three key parameters: the cardinality ofQ, the length of the messages belonging to M and the ε-approximation to the targeted payoff x .We use an N superscript to refer to Neyman’s parameters and an HU superscript to denote ours.We compare the upper bound of our smaller automaton denoted by m HU

1 , with that of Neyman,

denoted by m N1 . Let us recall that the set of plays in Neyman’s construction has cardinality 2k N

7 A window is a submatrix of states where player 1 processes the signal sent by player 2. Notice that after thecommunication phase, player 1 has to follow the row -in the general matrix- corresponding to the play chosen by player 2.

P. Hernandez, A. Urbano / Mathematical Social Sciences 56 (2008) 395–409 407

Fig. 1. Range of ε with respect to log m1T .

where k N satisfies Eq. (1) and, in particular,

k N≤

⌈ε3T

ln 2−

ln T

ln 2

⌉and that in our construction, the set of plays has cardinality which is at most |V | ≤ |S T

27(1−ε, ε)|.

Therefore k HU is bounded by

k HU≤

T

27H(ε)− 2 log

(T

27+ 1

).

Fig. 1 plots the exponents of the size of m1 (i.e. log m1T ) in the y-axis, both in Neyman’s and

ours, as functions of ε (x-axis). Neyman’s approach is represented by the dotted curve ε3T , andours by the solid curve T

27 H(ε). For small values of ε, the increase in m N1 is very slow, while the

increase in m HU1 is very fast, strongly approaching the exponential size of m1 (full rationality).

More precisely, the concave shape of H(ε) versus the convex shape of ε3, captures the usefulnessof Information Theory and Codification Theory to solve computational designs in the frameworkof repeated games played by finite automata.

6. Concluding remarks

This paper has applied a codification procedure to the construction of the computationalfeatures of equilibrium strategies in finitely repeated games played by finite automata. As alreadymentioned, we have mainly focused on the subcase where the targeted payoff only depends ona pair of pure actions. Our construction is general enough to cover all the subcases in Neyman’sproof. The extension can be undertaken by appropriately modifying the length parameter of theverification phase, which in turn depends on the length of the cycle.

In addition, we have assumed that punishment levels were in pure strategies. Differentpunishments, either in pure strategies for player 2 and mixed strategies for player 1, or in mixedstrategies for both players convey additional relationships between the automata complexitybounds, between m1 and m2, which depend on the bound on m1, already obtained in the purestrategy construction. For example, the former case is in Theorem 2 in Neyman (1998, page522), where m1 ≤ min(m2, exp(ε3T )), and the latter case in Theorem 3 in Neyman, on the same

408 P. Hernandez, A. Urbano / Mathematical Social Sciences 56 (2008) 395–409

page, with m1 ≤ m2 ≤ exp(ε3 min(T,m1)). As easily seen, the above conditions depend on theconstraint stated in Neyman’s Theorem 1 (page 521), i.e. m1 ≤ exp(ε3T ). Thus, the bounds inboth these Theorems can also be improved by our construction.

As already shown, the bound improvement depends directly on the design of both theverification and communication sets. There are several ways to construct both sets andcharacterize the equilibrium strategies, and therefore the equilibrium bounds. Neyman presentedone of them and we have proven that ours is optimal with respect to the communication design.This optimality comes from the use of codification tools applied to Neyman’s equilibrium design(see Lemma 1).

Codification tools and/or Information Theory have been applied for solving different questionsin the framework of long strategic interactions. For instance, Gossner et al. (2003) design theoptimal strategies in a three-player zero-sum game by the Type methodology and Gossner et al.(2006) apply Codification Theory to construct strategies in repeated games with asymmetricinformation about a dynamic state of nature. Their main theorem shows how to constructequilibrium strategies by means of the existence of a code-book, ensuring an appropriatecommunication level in order to reduce the inefficiencies due to information asymmetriesbetween players. Tools of Information Theory, in particular, the entropy concept, the Kullbackdistance and typical sets have also been applied by Gossner and Vieille (2002), when constructingε-optimal strategies in two-person repeated zero-sum games and in Gossner and Tomala (2006),to characterize the set of limits of the expected empirical distribution of a stochastic process withrespect to the conditional beliefs of the observer.

Acknowledgements

The first author thanks the Spanish Ministry of Education and science for partial financialsupport under project SE J2004-02172. The second author thanks the Spanish Ministry ofEducation and Science under projects SEJ2004-07554 and SEJ2007-66581 and the “GeneralitatValenciana” under project GRUPOS04/13. The authors also wish to thank the “InstitutoValenciano de Investigaciones Economicas” (IVIE). A previous version of this paper appearsas IVIE Working Paper WP-AD, 2006–28.

References

Abreu, D., Rubinstein, A., 1988. The structure of Nash equilibrium in repeated games with finite automata. Econometrica56, 1259–1281.

Ben-Porath, E., 1993. Repeated games with finite automata. Journal of Economic Theory 59, 17–32.Cover, T.M., Thomas, J.A., 1991. Elements of Information Theory. Wiley Series in Telecommunications. Wiley, New

York.Gossner, O., Hernandez, P., 2003. On the complexity of coordination. Mathematics of Operations Research 28 (1),

127–140.Gossner, O., Hernandez, P., Neyman, A., 2003. Online matching pennies. Discussion Papers of Center for Rationality

and Interactive Decision Theory, DP 316.Gossner, O., Hernandez, P., Neyman, A., 2006. Optimal use of communication resources. Econometrica 74 (6),

1603–1636.Gossner, O., Tomala, T., 2006. Empirical distributions of beliefs under imperfect observation. Mathematics of Operations

Research 31, 13–30.Gossner, O., Vieille, N., 2002. How to play with a biased coin? Games and Economic Behavior 41, 206–222.Kalai, E., Standford, W., 1988. Finite rationality and interpersonal complexity in repeated games. Econometrica 56 (2),

397–410.

P. Hernandez, A. Urbano / Mathematical Social Sciences 56 (2008) 395–409 409

Neyman, A., 1985. Bounded complexity justifies cooperation in the finitely repeated prisoner’s dilemma. EconomicsLetters 19, 227–229.

Neyman, A., 1997. Cooperation, repetition, and automata. In: Hart, S., Mas Colell, A. (Eds.), Cooperation: Game-Theoretic Approaches. In: NATO ASI Series F, 155. Springer-Verlag, pp. 233–255.

Neyman, A., 1998. Finitely repeated games with finite automata. Mathematics of Operations Research 23 (3), 513–552.Neyman, A., Okada, D., 1999. Strategic entropy and complexity in repeated games. Games and Economic Behavior 29,

191–223.Neyman, A., Okada, D., 2000. Repeated games with bounded entropy. Games and Economic Behavior 30, 228–247.Neyman, A., Okada, D., 2000. Two-person repeated games with finite automata. International Journal of Game Theory

29, 309–325.Papadrimitriou, C.H., Yannakakis, M., 1994. On complexity as bounded rationality. STOC 726–733.Rubinstein, A., 1986. Finite Automata play the repeated prisoners’ dilemma. Journal of Economic Theory 39, 183–196.Simon, H., 1955. A behavioral model of rational choice. The Quarterly Journal of Economics 69 (1), 99–118.Zemel, E., 1989. Small talk and cooperation: A note on bounded rationality. Journal of Economic Theory 49 (1), 1–9.