



Physica A xx (xxxx) xxx–xxx

Contents lists available at ScienceDirect

Physica A

journal homepage: www.elsevier.com/locate/physa

Strategies generalization and payoff fluctuation optimization in the iterated ultimatum game

Enock Almeida a, Roberto da Silva b,∗, Alexandre Souto Martinez a

a Departamento de Física e Matemática, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, Av. Bandeirantes, 3900 - CEP 14040-901, Ribeirão Preto, São Paulo, Brazil
b Instituto de Fisica, Universidade Federal do Rio Grande do Sul, Av. Bento Gonçalves, 9500 - CEP 91501-970, Porto Alegre, Rio Grande do Sul, Brazil

h i g h l i g h t s

• We propose a generalization of strategies in the iterated ultimatum game.
• Our analysis is separated into two cases: no-memory players and one-step-memory players.
• Acceptance of the proposals follows two prescriptions: (a) fixed probabilities and (b) probabilities dependent on the proposals.
• A detailed analysis of the optimization of payoff fluctuations was performed.
• Monte Carlo simulations corroborate the analytical results.

a r t i c l e i n f o

Article history:
Received 19 June 2014
Available online xxxx

a b s t r a c t

An iterated version of the ultimatum game, based on generalized probabilistic strategies that are mathematically modeled by accepting-proposal functions, is presented. These strategies account for the behavior of the players by mixing levels of altruism and greed. We obtained analytically the moments of the payoff of the players under such a generalization. Our analysis is divided into two cases: (i) no-memory players, where players do not remember previous decisions, and (ii) one-step-memory players, where the offers depend on the players' last decision. We start by considering the former case. We show that when the combination of the proposer's altruism and responder's greed levels balances the combination of the proposer's greed and responder's altruism levels, the average and variance of the payoff of both players are the same. Our analysis is carried out considering that the acceptance of an offer depends on: (a) a fixed probability p or (b) the value offered. The combination of cases (i) and (a) shows that there exists a p value that maximizes the cumulative gain after n iterations. Moreover, we show n × p diagrams with ''iso-average'' and ''iso-variance'' of the cumulative payoff. Our analytical results are validated by Monte Carlo simulations. For case (b) with no-memory players (i), we show that there are cutoff values for which the variance of the proposer's cumulative payoff presents local maximum and minimum values, while for the responder the same quantity presents a global maximum. For case (b) combined with one-step-memory players (ii), we verified via MC simulations that, for the same number of iterations, the responder obtains different cumulative payoffs by setting different cutoff values. This result composes an interesting pattern of stripes in the cutoff × n diagrams. Simultaneously, looking at the variance of this quantity for the responder in a similar diagram, we observe regions of iso-variance in non-trivial patterns which depend on the initial

∗ Corresponding author. Tel.: +55 51 8196 0903.
E-mail addresses: [email protected] (E. Almeida), [email protected], [email protected] (R. da Silva), [email protected] (A.S. Martinez).

http://dx.doi.org/10.1016/j.physa.2014.06.032
0378-4371/© 2014 Published by Elsevier B.V.


value of the proposal. Our contributions, detailed by analytical results and MC simulations, are useful to design new experiments in the ultimatum game in stochastic scenarios.

© 2014 Published by Elsevier B.V.


1. Introduction

Game theory plays an important role in explaining the interaction between living creatures in the biological sciences or the social features of stock markets, among other examples. In these systems one considers individuals composing homogeneous or heterogeneous populations, with or without spatial structure, who negotiate/combat/collaborate via some protocol of the game-theoretic framework. The full comprehension of cooperation between individuals as an emergent collective behavior is a challenge [1–3]. Cooperation emerges as a stable strategy in the spatial prisoner's dilemma [4–8].

The ultimatum game plays an important role in mimicking bargaining aspects that emerge in real situations. In this game, first proposed by Güth et al. [9], two players must divide an amount (a sum of money). One of the players (the proposer) proposes a division and the second player can either accept or reject it. If the second player accepts it, the values are distributed according to the division established by the proposer. Otherwise, no earnings are distributed to the players.

Even in a single turn, the ultimatum game can be interesting. Although it is better for the responder to accept any offer, offers below one third are often rejected [10]. In the iterated game, the responder punishes the proposer up to the balance between proposing and accepting. In general, values around half of the total amount are accepted [10,11]. If played iteratively, for example in n turns, the iterated ultimatum game is suitable to explain the emergence of the players' cooperation [10]. On one hand, the authors of Refs. [12,13] have shown that, in a linear lattice with periodic boundary conditions, players who offer and accept the smallest values can spread their strategies throughout their neighbors. On the other hand, Szolnoki et al. [10] noticed that, in a square lattice, fair players get larger payoffs. This altruistic behavior has also been observed in humans, and there is evidence that unequal offers activate brain regions related to pain and affliction [11]. Uncomfortable feelings lead the responder to sacrifice his own gain to punish the proposer.

The authors of Refs. [14,15] have calculated the players' payoff statistics and have shown the necessary conditions for a strategy to dominate the others. Their results have been corroborated by Monte Carlo numerical simulations.

The statistical fluctuations of the payoff in this game have been addressed in different versions of the model: (i) the spatial ultimatum game (see, for example, Refs. [16,17]) and (ii) populations of players in matching graphs and in complete graphs (mean-field regimes) [14,15]. Nevertheless, strategy generalizations and optimization in the ultimatum game remain poorly explored, even in its iterated version without topology effects.

In this paper, we call attention to the iterated ultimatum game, disregarding the effects of spatial structure. We focus on the statistical moments of the payoff of generalized strategies, searching for optimal parameter values. We generalize strategies for the proposer and the responder in the iterated ultimatum game. This generalization allows us to go beyond basic strategies, whose only effect is to lead to the fifty-fifty proposal/acceptance as the known punishment mechanism. We also address optimal strategies related to the maximization/minimization of the payoff and its variance when the players face stochastic scenarios. First, we consider players without memory of previous decisions. Next, we consider players who recall the values and decisions of the preceding turn. For memoryless players, we have obtained analytical probability distributions for the proposal and for the response, with parameters tuning player decisions from altruism to greed. For players with one-step memory, the proposer adjusts the offered values depending on the rejection/acceptance of the adversary in the preceding turn. We have analytically calculated the first and second statistical moments of the payoff.

Our paper is organized as follows. In Section 2, we develop a generalized approach to calculate the statistical payoff moments considering static strategies for memoryless players. General results, ranging from altruistic to greedy behaviors, are obtained for the gain fluctuations in the iterated version of this game. In Section 3, we address evolutionary strategies, with players having one-step memory in this iterated game, based on the idea of offer increments/decrements depending on the previous result. In Section 4, we present our main results for the static and evolutionary versions of the iterated game. The optimization of strategies is studied in detail. Finally, in Section 5, we present a summary and some conclusions.

2. No-memory players (static strategies) with generalized strategies

In the iterated ultimatum game, consider the probabilities: (i) pp(y), that the proposer offers y ∈ [0, x] to her/his opponent, with (x − y) being the proposer's corresponding part, and (ii) pr(y), that the responder accepts the deal. The kth statistical moments of the gain (per match) for the two players are, respectively:

$$\left\langle g_p^k \right\rangle(n) = \frac{1}{n}\sum_{j=1}^{n} (x - y_j)^k \, p_p^{(j)}(y_j)\, p_r^{(j)}(y_j), \qquad (1)$$

$$\left\langle g_r^k \right\rangle(n) = \frac{1}{n}\sum_{j=1}^{n} y_j^k \, p_p^{(j)}(y_j)\, p_r^{(j)}(y_j),$$


where n is the number of rounds (iterations). In each round, the proposer offers and the responder accepts the amounts with different probability distributions, indexed by (j) in p_p^{(j)}(y_j) and p_r^{(j)}(y_j). Notice that the term relative to zero gain, which occurs with probability 1 − p_r^{(j)}, has not been written. For numerous rounds (n ≫ 1), one can make use of the continuous approximation. Considering that the offers and acceptances are identically distributed, these statistical moments become:

$$\left\langle g_p^k \right\rangle = \int_0^1 (1 - y)^k f_p(y)\, f_r(y)\, dy, \qquad (2)$$

$$\left\langle g_r^k \right\rangle = \int_0^1 y^k f_p(y)\, f_r(y)\, dy, \qquad (3)$$

where, without loss of generality, we consider x = 1. The discrete-case probability pp(y) becomes the probability density function (pdf) fp(y). Nevertheless, the probability pr(y) does not lead directly to a pdf. For technical purposes, it only needs to be a number in the interval [0, 1] for each given y ∈ [0, x], since the responder accepts with probability pr(y) and rejects with probability 1 − pr(y).
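For a concrete feel of these moment definitions, the estimator behind Eqs. (2) and (3) can be sketched by a short Monte Carlo routine. The choice below, uniform offers with a fixed acceptance probability (f_p = 1, f_r = p), is only an illustrative special case, not the paper's general strategy; the function name and parameters are ours.

```python
import random

def payoff_moments(n_rounds, accept_prob, k=1, seed=0):
    """Monte Carlo estimate of the k-th per-round payoff moments
    (Eqs. (2)-(3) with x = 1): offers y are uniform on [0, 1] and the
    responder accepts with a fixed probability (illustrative case)."""
    rng = random.Random(seed)
    g_p = g_r = 0.0
    for _ in range(n_rounds):
        y = rng.random()                  # proposer offers y, keeps 1 - y
        if rng.random() < accept_prob:    # deal accepted
            g_p += (1.0 - y) ** k
            g_r += y ** k
        # rejected rounds contribute zero gain, as in the text
    return g_p / n_rounds, g_r / n_rounds

# For f_p uniform and f_r = p, both first moments converge to p/2.
gp, gr = payoff_moments(200_000, accept_prob=0.5)
```

With p = 0.5 both estimates approach 0.25, matching the closed form ⟨g⟩p = ⟨g⟩r = p/2 recovered analytically in Section 4.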

To cover a multitude of different situations in the iterated ultimatum game, let us propose the use of the generalized multi-parametric functions fp(y) and fr(y):

$$f_p(y) = \frac{\Gamma(q_1 + q_2 + 2)}{\Gamma(q_1 + 1)\,\Gamma(q_2 + 1)}\, y^{q_1} (1 - y)^{q_2}, \qquad (4)$$

for 0 ≤ y ≤ 1, so that $\int_0^1 f_p(y)\, dy = 1$. The function relative to pr(y) is:

$$f_r(y) = \begin{cases} \dfrac{c_{q_3,q_4}}{q_5\, q_6^{q_4}}\, g(y) & \text{for } 0 \le y \le q_6 \\[4pt] 1 & \text{for } q_6 < y \le 1, \end{cases} \qquad (5)$$

$$c_{q_3,q_4} = \frac{(q_3 + q_4)^{q_3 + q_4}}{q_3^{q_3}\, q_4^{q_4}}, \qquad (6)$$

$$g(y) = (1 - y)^{q_3}\, y^{q_4}, \qquad (7)$$

so that 0 ≤ fr(y) ≤ 1. The parameters q1, …, q6 control the behavior of the players. On one hand, the proposer can offer larger amounts to the responder with larger probabilities, following a distribution based on ''complete altruism'' (q2 = 0, q1 > 0). On the other hand, the probabilities can follow a distribution based on ''complete greed'' (q2 > 1, q1 = 0), where the proposer offers smaller amounts to the responder with larger probabilities. The intermediate cases (q1 > 0, q2 > 0) mimic players with mixed strategies, which interpolate between the extreme situations. The case q1 = q2 = 0 represents a very special situation: the proposer offers amounts carelessly, i.e., all offers are equally probable. The accepting process is significantly different from the proposing one. Mathematically, fr(y) is not a pdf and depends on four parameters: q5 ≥ 1 only controls the magnitude of the offer acceptance, and q6 ≤ 1 is a cutoff to be used in some specific strategies.

For example, first consider q3 = q4 = 0. For q5 = q6 = 1, the responder accepts any amount, independently of the offer. The responder accepts the offers according to a non-biased coin for q5 = 2, or according to a biased coin for q5 > 2.

Now, let us turn our attention to the payoffs by justifying the term c_{q3,q4}. Consider Eq. (7): for q3 = q4 = 0, g(y) is constant (equal to unity) in the domain. For q3 > 0 and q4 = 0, g is a monotonically decreasing function, with g(0) = 1. For q3 = 0 and q4 > 0, g is a monotonically increasing function, with g(1) = 1. If q3, q4 > 0, then g(0) = g(1) = 0. Since g(y) ≥ 0, it has a maximum value at some y ∈ ]0, 1[. Since g′(y) = [q4/y − q3/(1 − y)] g(y), the condition g′(y) = 0 holds for 0 < y < 1 only at y∗ = q4/(q3 + q4), where g(y∗) = 1/c_{q3,q4}, which leads to the normalization in Eq. (5). Some examples are depicted in Fig. 1.
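Numerically, Eq. (4) is just a Beta(q1 + 1, q2 + 1) density, and the constant c_{q3,q4} of Eq. (6) rescales g(y) so that its peak at y∗ = q4/(q3 + q4) equals one. The following self-check is ours; the parameter values are arbitrary illustrations.

```python
import math

def f_p(y, q1, q2):
    """Offer pdf of Eq. (4): a Beta(q1 + 1, q2 + 1) density."""
    norm = math.gamma(q1 + q2 + 2) / (math.gamma(q1 + 1) * math.gamma(q2 + 1))
    return norm * y**q1 * (1.0 - y)**q2

def c(q3, q4):
    """Normalization constant of Eq. (6)."""
    return (q3 + q4) ** (q3 + q4) / (q3**q3 * q4**q4)

def g(y, q3, q4):
    """Shape function of Eq. (7)."""
    return (1.0 - y) ** q3 * y**q4

q1, q2, q3, q4 = 1.5, 0.7, 2.0, 3.0   # arbitrary illustrative values
m = 50_000
# Midpoint rule: f_p integrates to one over [0, 1] ...
total = sum(f_p((i + 0.5) / m, q1, q2) for i in range(m)) / m
# ... and c * g attains exactly 1 at y* = q4 / (q3 + q4).
y_star = q4 / (q3 + q4)
peak = c(q3, q4) * g(y_star, q3, q4)
```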

For the sake of simplicity, let us start considering q6 = 1. The cases with cutoff will be explored in Section 4.

Substituting Eqs. (4) and (5) into Eq. (2), we obtain:

$$\left\langle g_p^k \right\rangle = \frac{c_{q_3,q_4}}{q_5}\, \frac{\Gamma(q_1+q_2+2)}{\Gamma(q_1+1)\,\Gamma(q_2+1)} \cdot \frac{\Gamma(q_1+q_4+1)\,\Gamma(q_2+q_3+k+1)}{\Gamma(q_1+q_2+q_3+q_4+k+2)} \qquad (8)$$

and

$$\left\langle g_r^k \right\rangle = \frac{c_{q_3,q_4}}{q_5}\, \frac{\Gamma(q_1+q_2+2)}{\Gamma(q_1+1)\,\Gamma(q_2+1)} \cdot \frac{\Gamma(q_1+q_4+k+1)\,\Gamma(q_2+q_3+1)}{\Gamma(q_1+q_2+q_3+q_4+k+2)}, \qquad (9)$$

which leads to the general ratio between the moments:

$$\frac{\left\langle g_p^k \right\rangle}{\left\langle g_r^k \right\rangle} = \frac{\Gamma(q_1+q_4+1)\,\Gamma(q_2+q_3+k+1)}{\Gamma(q_1+q_4+k+1)\,\Gamma(q_2+q_3+1)}. \qquad (10)$$

A simple expression is obtained for k = 1:

$$\frac{\langle g \rangle_p}{\langle g \rangle_r} = \frac{\Gamma(q_1+q_4+1)\,\Gamma(q_2+q_3+2)}{\Gamma(q_1+q_4+2)\,\Gamma(q_2+q_3+1)} = \frac{q_2+q_3+1}{q_1+q_4+1}. \qquad (11)$$


Fig. 1. Examples of different parameter values in the accepting functions (Eq. (5)) without cutoff effects (i.e., q6 = 1). (a) The cases with q3 = q4 = 0, which correspond to accepting an offer with probability 1/q5: q5 = 1 (all offers are accepted), q5 = 2 (50% of the offers are accepted, independently of their value), and q5 = 3. The case q5 → ∞ corresponds to a player who simply rejects all offers. In plots (b), (c), (d), (e), and (f) we set q5 = 1, which guarantees that max[fr(y)] = 1. Plots (b) and (c) explore, respectively, q3 (q4) fixed at 0.1 and q4 (q3) assuming the values 0, 0.1, 0.5, 1, and 4. Plots (d) and (e) correspond to the same study but fixing q3 (q4) at 1. Finally, plot (f) corresponds to the case q3 = q4 = q assuming the values 0.1, 0.3, 1, 2, and 10. These plots show that the parameters cover the many gradations between altruistic and greedy behavior.


If q2 + q3 = q1 + q4, then ⟨g⟩p = ⟨g⟩r. This means that the proposer and the responder have the same average gain when the altruism level of the proposer combined with the greed level of the responder balances the altruism level of the responder combined with the greed level of the proposer.
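This balance condition can be checked directly by numerical quadrature of Eqs. (2) and (3), without invoking the closed forms. The sketch below is our own check, with q5 = q6 = 1 and the arbitrary balanced choice q1 = 1, q2 = 2, q3 = 1, q4 = 2, so that q2 + q3 = q1 + q4 = 3.

```python
import math

def f_p(y, q1, q2):
    # Eq. (4): Beta(q1 + 1, q2 + 1) density
    return (math.gamma(q1 + q2 + 2) / (math.gamma(q1 + 1) * math.gamma(q2 + 1))
            * y**q1 * (1.0 - y)**q2)

def f_r(y, q3, q4):
    # Eq. (5) with q5 = q6 = 1
    cc = (q3 + q4) ** (q3 + q4) / (q3**q3 * q4**q4)
    return cc * (1.0 - y) ** q3 * y**q4

def first_moment(who, q1, q2, q3, q4, m=50_000):
    """Midpoint quadrature of Eqs. (2)-(3) for k = 1."""
    s = 0.0
    for i in range(m):
        y = (i + 0.5) / m
        w = (1.0 - y) if who == "p" else y
        s += w * f_p(y, q1, q2) * f_r(y, q3, q4)
    return s / m

gp = first_moment("p", 1, 2, 1, 2)   # q2 + q3 = q1 + q4 = 3
gr = first_moment("r", 1, 2, 1, 2)
```

For this balanced choice the two integrands are mirror images under y → 1 − y, so the averages coincide; unbalancing any parameter breaks the equality.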

The variances for the proposer and the responder are Var(g)p = ⟨g²⟩p − ⟨g⟩p² and Var(g)r = ⟨g²⟩r − ⟨g⟩r², respectively:

$$\mathrm{Var}(g)_p = \left[ \frac{(q_2+q_3+2)(q_2+q_3+1)}{q_1+q_2+q_3+q_4+3} - (q_2+q_3+1)^2\, \Phi \right] \Phi \qquad (12)$$

and

$$\mathrm{Var}(g)_r = \left[ \frac{(q_1+q_4+2)(q_1+q_4+1)}{q_1+q_2+q_3+q_4+3} - (q_1+q_4+1)^2\, \Phi \right] \Phi, \qquad (13)$$

where

$$\Phi(q_1,q_2,q_3,q_4,q_5) = \frac{c_{q_3,q_4}}{q_5}\, \frac{\Gamma(q_1+q_2+2)}{\Gamma(q_1+1)\,\Gamma(q_2+1)} \cdot \frac{\Gamma(q_1+q_4+1)\,\Gamma(q_2+q_3+1)}{\Gamma(q_1+q_2+q_3+q_4+3)}. \qquad (14)$$

Thus,

$$\frac{\mathrm{Var}(g)_p}{\mathrm{Var}(g)_r} = \frac{(q_2+q_3+2)(q_2+q_3+1) - (q_1+q_2+q_3+q_4+3)\,(q_2+q_3+1)^2\, \Phi}{(q_1+q_4+2)(q_1+q_4+1) - (q_1+q_2+q_3+q_4+3)\,(q_1+q_4+1)^2\, \Phi}. \qquad (15)$$

Independently of the value of q5, if q2 + q3 = q1 + q4, then Var(g)p = Var(g)r, just as occurred for the average values.

In Section 4, we present some results for no-memory players in two specific situations with static strategies. In each round, the proposer offers an amount uniformly distributed in [0, 1] and the responder accepts it with (i) a fixed probability p, independent of the offer, or (ii) a probability that depends on the offer. In the latter case, an analysis of the cutoff parameter is performed.

3. Players with one-step memory

Now, let us consider the case where the proposer's offer can be incremented or decremented depending only on the immediately preceding responder decision. In our model, in the first round, the proposer offers y0. If the responder accepts it, in the following round the proposer decreases the offer by Δy. However, if the responder declines it, in the following round the proposer increases the offer by Δy.

Consider the case where the responder always accepts the offer with a fixed probability, f(y) = p ∈ [0, 1], and rejects it with probability 1 − p. This assumption allows us to obtain analytical results in the one-step-memory iterated game. Given Δy and p, in the ith round the average offer is:

$$y_i = y_0 + i\, \Delta y\, (1 - 2p), \qquad (16)$$

where i = 0, 1, …, n − 1, since in each round the average offer is modified by ⟨(Δy)_i⟩ = (1 − p)Δy − pΔy = Δy(1 − 2p). In the ith round, the responder's average payoff is g_i = p y_i = p y0 + i p Δy(1 − 2p). Thus, after n iterations, the average cumulative payoff is

$$G_r = \sum_{i=0}^{n-1} g_i = n p y_0 + \frac{n(n-1)}{2}\, p\, (1 - 2p)\, \Delta y. \qquad (17)$$

Similarly, the proposer has an average cumulative payoff:

$$G_p = n p (1 - y_0) - \frac{n(n-1)}{2}\, p\, (1 - 2p)\, \Delta y. \qquad (18)$$

The probability p that, for a given n, maximizes the responder's cumulative gain G_r is:

$$p^* = \frac{1}{4}\left[ \frac{2 y_0}{(n-1)\, \Delta y} + 1 \right]. \qquad (19)$$

After a certain number of rounds, the value offered by the proposer would become negative or greater than 1. Such a critical number can be calculated by setting y_i = 0 or 1 in Eq. (16), yielding two possibilities (with n0 ≡ y0/Δy):

$$n_c = \begin{cases} \left\lceil \dfrac{n_0}{2p - 1} \right\rceil & \text{for } p > 1/2 \\[6pt] \left\lceil \dfrac{(1 - y_0)\, n_0}{y_0\, (1 - 2p)} \right\rceil & \text{for } p < 1/2, \end{cases} \qquad (20)$$


Fig. 2. Existence of a value of p that maximizes the cumulative payoff (Eq. (17)) as a function of p, for different values of y0. For these experiments, n = 10³ and Δy = 6 · 10⁻⁴ have been used.

where ⌈x⌉ denotes the smallest integer greater than x. Naturally, y0 and 1 − y0 are non-negative, therefore, if p > 1/2, n_c = ⌈y0/[Δy(2p − 1)]⌉, while for p < 1/2, n_c = ⌈(1 − y0)/[Δy(1 − 2p)]⌉. As p → 1/2, n_c → ∞. Eq. (17) holds only for n < n_c. For example, for n − 1 = n0, p∗ ≡ 3/4, showing that there is a non-trivial optimal value. There is a probability p that maximizes the responder's gain, given y0, n and Δy. This maximization is depicted in Fig. 2.
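The averages of Eqs. (16)–(17) are straightforward to reproduce by simulation. The sketch below is our own illustration; the parameter values are arbitrary and kept well below the critical n_c of Eq. (20), so that the offers stay inside [0, 1].

```python
import random

def avg_cumulative_G_r(n, p, y0, dy, trials=20_000, seed=1):
    """MC average of the responder's cumulative payoff in the
    one-step-memory game: an accepted offer is decreased by dy in the
    next round, a rejected one is increased by dy."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        y, G = y0, 0.0
        for _ in range(n):
            if rng.random() < p:   # accepted: responder gains y
                G += y
                y -= dy
            else:                  # rejected: proposer raises the offer
                y += dy
        total += G
    return total / trials

n, p, y0, dy = 50, 0.3, 0.5, 0.001   # n well below n_c ~ 1250 here
theory = n * p * y0 + 0.5 * n * (n - 1) * p * (1 - 2 * p) * dy   # Eq. (17)
mc = avg_cumulative_G_r(n, p, y0, dy)
```

The agreement between `mc` and `theory` is the same comparison made over the whole n × p plane in the color diagrams of Section 4.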

In the nth round, the variance of the responder's cumulative gain is:

$$\mathrm{Var}_r(G) = n p (1-p) y_0^2 + 4 n(n-1)\, p (p-1) \left( p - \tfrac{1}{4} \right) y_0\, \Delta y + \left[ 2 n(n-1)(2n-3)\, p^3 (1-p) - 2 n(n-1)(n-2)\, p^2 (1-p) + \frac{n(n-1)(2n-1)\, p (1-p)}{6} \right] \Delta y^2 \qquad (21)$$

and similarly, for the proposer:

$$\mathrm{Var}_p(G) = n p (1-p) (1-y_0)^2 - 4 n(n-1)\, p (p-1) \left( p - \tfrac{1}{4} \right) (1-y_0)\, \Delta y + \left[ 2 n(n-1)(2n-3)\, p^3 (1-p) - 2 n(n-1)(n-2)\, p^2 (1-p) + \frac{n(n-1)(2n-1)\, p (1-p)}{6} \right] \Delta y^2. \qquad (22)$$

In this section we have shown that accepting offers with a fixed probability p leads, after n iterations, to a cumulative gain with a maximum at the p∗ given by Eq. (19). In the next section (Main results), we show color maps in the n × p plane that illustrate patterns of iso-payoff and iso-variance, since the cumulative payoff and its variance can assume equal values for different combinations of n and p. We also show results for the case where no analytical formulas are available, only MC simulations: acceptance that depends on the offer. In this case, we verify that an interesting pattern of iso-payoff stripes appears in the n × q6 diagrams, with a crossover at values of q6 that depend on y0. In other, similar diagrams, we verified that the regions of maximal variance are larger for small values of y0 and concentrated at q6 > 0.5. However, the magnitude of the maximal variance values is larger for greater values of y0.

4. Main results

Here, we present results for some specific situations. In each round, the proposer offers an amount uniformly distributed in [0, 1]. The responder accepts this amount with (i) a fixed probability p, independent of the offer value, (ii) a probability y, which is larger the larger the offer is (greedy player), or (iii) a probability that grows with the offer up to y = q6 (controlled greedy player), remaining unitary for greater values. Our main results concern fixed strategies and evolutionary strategies (increment/decrement of the offers). In the first part, we study some cases by setting specific parameter values to explore the analytical formulas proposed above. In the second part, we compare the analytical results with MC simulations.

4.1. Static probabilistic strategies on the offers (no-memory game)

In the case of static probabilistic strategies on the offers, it is important to calculate the average gain per round, since the history of the game is not available: the gain in the following iteration does not depend on previous decisions. Let us choose an important particular case: consider offers distributed uniformly in the interval [0, 1]. In this case, let us consider the moments given by Eqs. (8) and (9). The emphasis of our analysis is on cutoff values (q6 < 1), which have been disregarded previously.


Fig. 3. Average and variance of the payoff of the responder (surface) and the proposer (patch) as a function of the pair (q4, q6).

First, consider a fixed probability for the responder to accept an offer, so that q1 = q2 = q3 = q4 = 0, q6 = 1 and p = 1/q5 in Eqs. (8) and (9), leading to ⟨g⟩p = ⟨g⟩r = p/2 and Var_r(g) = Var_p(g) = p/3 − p²/4. In particular, for p = 1/2, Var_r = Var_p = 5/48.

Still considering the case of no cutoff (q6 = 1), let us address the case where the responder accepts the offer with probability proportional to the offer value (greedy player). In this case, we choose q1 = q2 = q3 = 0 and q4 > 0, q5 = q6 = 1, so that ⟨g⟩p = Γ(q4 + 1)/Γ(q4 + 3) and ⟨g⟩r = Γ(q4 + 2)/Γ(q4 + 3). For an arbitrary q4 value, one has ⟨g⟩r/⟨g⟩p = Γ(q4 + 2)/Γ(q4 + 1) = q4 + 1. For example, for the linear case (q4 = 1), i.e., fr(y) = y, we have ⟨g⟩p = 1/6 = ⟨g⟩r/2, which shows that the responder's average gain is double that of the proposer.

Before analyzing the variance, let us generalize the study of the fluctuations to any pair (q4, q6) and then take q6 → 1. We now consider the case with cutoff (q6 ≤ 1), to explore the effects of the cutoff on an offer-dependent acceptance in a particularly interesting case. Offers are uniformly distributed in the interval [0, 1] (q1 = q2 = q3 = 0) and q5 = 1, so that:

$$f_r(y) = \begin{cases} \left( \dfrac{y}{q_6} \right)^{q_4} & \text{if } 0 < y \le q_6 \\[4pt] 1 & \text{if } q_6 < y \le 1. \end{cases} \qquad (23)$$

Here it is important to mention that our analysis concerns 0 ≤ q4 ≤ 1, where the maximal response is the linear one. In this case, Eqs. (8) and (9) are not valid, since q6 < 1. The average values are given by:

$$\langle g \rangle_p = \frac{1}{q_6^{q_4}} \int_0^{q_6} (1-y)\, y^{q_4}\, dy + \int_{q_6}^1 (1-y)\, dy = \frac{q_4 q_6^2}{2(2+q_4)} - \frac{q_4 q_6}{1+q_4} + \frac{1}{2} \qquad (24)$$

and

$$\langle g \rangle_r = \frac{1}{q_6^{q_4}} \int_0^{q_6} y^{q_4+1}\, dy + \int_{q_6}^1 y\, dy = \frac{1}{2}\left[ 1 - \frac{q_4 q_6^2}{q_4+2} \right]. \qquad (25)$$

As expected, in the limit q6 → 1 we recover ⟨g⟩r/⟨g⟩p = Γ(q4 + 2)/Γ(q4 + 1), in accordance with the previous results.
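Since Eqs. (24) and (25) follow from elementary integration, they can be cross-checked against a direct quadrature of the defining integrals. The minimal sketch below is ours, with arbitrary illustrative values q4 = 0.7 and q6 = 0.6.

```python
def cutoff_averages(q4, q6, m=100_000):
    """Midpoint quadrature of <g>_p and <g>_r under the cutoff
    acceptance function of Eq. (23) (uniform offers, q5 = 1)."""
    gp = gr = 0.0
    for i in range(m):
        y = (i + 0.5) / m
        fr = (y / q6) ** q4 if y <= q6 else 1.0   # Eq. (23)
        gp += (1.0 - y) * fr
        gr += y * fr
    return gp / m, gr / m

q4, q6 = 0.7, 0.6
gp_num, gr_num = cutoff_averages(q4, q6)
gp_th = q4 * q6**2 / (2 * (2 + q4)) - q4 * q6 / (1 + q4) + 0.5   # Eq. (24)
gr_th = 0.5 * (1 - q4 * q6**2 / (q4 + 2))                        # Eq. (25)
```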

An interesting case is the linear response (q4 = 1) up to q6, with the responder accepting all offers of greater value, i.e., fr(y) = y/q6 for 0 < y < q6 and fr(y) = 1 for q6 ≤ y ≤ 1. In this case, ⟨g⟩p = q6²/6 − q6/2 + 1/2 and ⟨g⟩r = −q6²/6 + 1/2. Obviously, argmax_{0<q6<1} {⟨g⟩r} = 0, which corresponds to a responder who accepts all random proposals made by the proposer. Naturally, argmin_{0<q6<1} {⟨g⟩p} = 3/2 > 1, i.e., the unconstrained minimizer of the proposer's gain lies outside the domain, since the responder cannot fail to accept all offers with unitary probability there. There are no candidates for local extrema, since ∂⟨g⟩r/∂q6 = 0 leads to q6∗ ≡ 0 and ∂⟨g⟩r/∂q4 = 0 also results in q6∗ ≡ 0. On the other hand, ∂⟨g⟩p/∂q6 = 0 results in q6∗ = (2 + q4)/(1 + q4), but under the restriction 0 ≤ q6∗ ≤ 1 this requires q4∗ ∈ ]−∞, −2]. Since the analysis is restricted to 0 ≤ q4 ≤ 1, there are no candidates for extremal values. Similarly, ∂⟨g⟩p/∂q4 = 0 leads to q6∗ = [(q4 + 2)/(q4 + 1)]², and under the same restriction q4∗ ∈ ]−∞, −2], so again there are no candidates for extremal values.

In Fig. 3 (first plot), we observe the monotonic behavior of the average gains of the responder (surface) and the proposer (patch). The responder's average gain outperforms the proposer's for any set of parameters (q6, q4). The second moments are calculated as follows:

$$\left\langle g^2 \right\rangle_p = \frac{1}{q_6^{q_4}} \int_0^{q_6} (1-y)^2 y^{q_4}\, dy + \int_{q_6}^1 (1-y)^2\, dy = \frac{q_6}{q_4+1} - \frac{2 q_6^2}{q_4+2} + \frac{q_6^3}{q_4+3} - \frac{(q_6-1)^3}{3} \qquad (26)$$


Fig. 4. Variance of the responder (a) and the proposer (b) for different values of q4 as a function of q6 .

Fig. 5. Variance of the responder as a function of q6 for the specific case q4 = 0.47.

and

$$\left\langle g^2 \right\rangle_r = \frac{1}{q_6^{q_4}} \int_0^{q_6} y^{q_4+2}\, dy + \int_{q_6}^1 y^2\, dy = \frac{q_6^3}{q_4+3} + \frac{1 - q_6^3}{3}. \qquad (27)$$

The variances are Var(g)r = ⟨g²⟩r − ⟨g⟩r² and Var(g)p = ⟨g²⟩p − ⟨g⟩p², according to Eqs. (24)–(27). In Fig. 3 (second plot), we show the behaviors of Var(g)r (surface) and Var(g)p (patch), which motivate some questions to be explored. First of all, we analyze the candidates for critical points. Setting ∂Var(g)r/∂q6 = 0 results in two nonzero possibilities:

$$q_6^* = \frac{-(2+q_4)^2 \pm \sqrt{\Delta}}{2 q_4 (3+q_4)}, \qquad (28)$$


Fig. 6. Validation of our analytical results with Monte Carlo numerical simulations. The color diagrams show the average values of the cumulative payoff of the responder as a function of n (iterations) and p (accepting probability). The left side presents the results obtained by Monte Carlo simulations, while the right side presents the analytical results (Eq. (17)). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

where Δ = 16 + 104 q4 + 108 q4² + 40 q4³ + 5 q4⁴. On the other hand, ∂Var(g)r/∂q4 = 0 also leads to two nonzero candidates:

$$q_6^* = \frac{-(2+q_4)^3 \pm \sqrt{\Delta_2}}{2 q_4 (3+q_4)^2}, \qquad (29)$$

where Δ₂ = 64 + 840 q4 + 1428 q4² + 1024 q4³ + 372 q4⁴ + 68 q4⁵ + 5 q4⁶.

The only simultaneous solution of Eqs. (28) and (29) corresponds to q4∗ = −2, which does not belong to [0, 1]. So there are no extremal candidates for Var(g)r, nor for Var(g)p. Similarly, we can conclude that there are also no candidates for 0 < q6 < 1. However, it is important to analyze the following marginal optimization question: for a fixed q4 value, what is the value of q6 that locally maximizes/minimizes Var(g)? Let us start with Var(g)r. In this case, we must analyze Var(g)r using Eq. (28), where the acceptable alternative (q6∗ ≥ 0) is the one with +√Δ. Substituting it into the formula for Var(g)r, we have a function that depends only on q4. Numerically, we can observe that 0 < q6∗ < 1 when √3 − 1 = 0.73205… < q4 < 1, so extrema of Var(g)r conditioned on values of q4 must be found only in this interval. The candidates for extrema of

so extremal of Var(g)r conditioned to values of q4 must be found only for this interval of q4. The candidates to extremes of 10


Fig. 7. Validation of our analytical results with Monte Carlo numerical simulations. The color diagrams show the average values of the cumulative payoff variance of the responder as a function of n (iterations) and p (accepting probability). The left side presents the results obtained by Monte Carlo simulations, while the right side presents the analytical results (Eq. (21)). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Var(g)_r are concentrated at q_6* > 0.94..., as can be numerically verified. In Fig. 4(a), the plots show the maximum of Var(g)_r as a function of q_6, for different values of q_4.

We can also verify that maximum points appear only in the interval q_4 ∈ [√3 − 1, 1]. Now, let us analyze Var(g)_p. The condition ∂Var(g)_p/∂q_6 = 0 results in:

q_6* = [q_4^3 + 5 q_4^2 + 5 q_4 − 2 ± √∆_3] / [q_4 (1 + q_4)(3 + q_4)]    (30)

where ∆_3 = 4 − q_4^4 − 6 q_4^3 − 10 q_4^2 − 2 q_4. On the one hand, if one considers +√∆_3, there is a branch of acceptable solutions in the interval 0.682... < q_6* < 1, which corresponds to 0.4142... < q_4 < 0.481.... On the other hand, if one considers −√∆_3, the branch of acceptable solutions in the complementary interval 0 < q_6* < 0.682... corresponds to the same interval 0.4142... < q_4 < 0.481.... Outside [0.4142..., 0.481...], there are no extremal values. In Fig. 4(b), the plots validate the behavior of Var(g)_p as a function of q_6, for different values of q_4. We can observe that, for values of q_4 belonging to the referred interval, extrema exist in the q_6-values, respectively for the first and the second branch of solutions. In Fig. 5


Fig. 8. Validation of our analytical results with Monte Carlo numerical simulations. The color diagrams show the average (left side) and variance (right side) values of the cumulative payoff of the responder as a function of n (iterations) and q_6 (cutoff), via MC simulations. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

we choose the specific value q_4 = 0.47 to show that a maximum corresponds to the first branch and a minimum to the second one.

We have presented a detailed study of the fluctuations in the players' gains, considering acceptances that depend on the offer, with and without cutoff effects.
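The intervals quoted for Eq. (30) can be checked numerically from the closed form above. A minimal sketch in Python (the probe values of q_4 are arbitrary; only the formula and the quoted intervals come from the text):

```python
import math

def q6_candidates(q4):
    # Eq. (30): q6* = (q4^3 + 5 q4^2 + 5 q4 - 2 +/- sqrt(D3)) / (q4 (1+q4)(3+q4)),
    # with discriminant D3 = 4 - q4^4 - 6 q4^3 - 10 q4^2 - 2 q4.
    d3 = 4 - q4**4 - 6*q4**3 - 10*q4**2 - 2*q4
    if d3 < 0:
        return None  # negative discriminant: no real candidates
    s = math.sqrt(d3)
    num = q4**3 + 5*q4**2 + 5*q4 - 2
    den = q4 * (1 + q4) * (3 + q4)
    return (num + s) / den, (num - s) / den

# Inside the reported q4 interval, the +sqrt branch lands in (0.682..., 1)
# and the -sqrt branch in (0, 0.682...)
for q4 in (0.42, 0.45, 0.47):
    plus, minus = q6_candidates(q4)
    assert 0.67 < plus <= 1.0 and 0.0 < minus < 0.69

# Below q4 = 0.4142... (lower end quoted in the text), both candidates
# leave the physical range [0, 1]
plus, minus = q6_candidates(0.40)
assert plus > 1.0 and minus < 0.0

# Above q4 ~ 0.481, the discriminant turns negative: no extremum at all
assert q6_candidates(0.50) is None
```

The two branches merge as ∆_3 → 0 at the upper end of the interval, consistent with the single extremal pair described above.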

4.2. Evolutionary strategies on the offers (game with unitary memory)

Now, let us study the case where the offers are incremented/decremented: the proposer can increase/decrease the offer according to the responder's previous decision. Our results are presented in two steps.

First, we calculate the average responder's gain, as well as its variance, when acceptance is randomly chosen with a probability p (q_3 = q_4 = 0, q_6 = 1 and p = 1/q_5). In Figs. 6 and 7, the p × n diagrams show, respectively, results for the average and the variance of the responder's cumulative gain for different values of the initial offer. The results from Monte Carlo simulations are presented on the left hand side of both figures and those of Eqs. (18) and (21) on the right hand side. Our Monte Carlo simulations simply mimic the sequence of results between two players over a certain number n_run of repetitions in


order to obtain the average and variance of the cumulative gain. Our analytical results are corroborated by the Monte Carlo simulations. Gain optimization is not a trivial issue: the cumulative payoff (Fig. 6) and its variance (Fig. 7) can take equal values for different combinations of n and p. In such n × p color maps, we can observe patterns of iso-payoff and iso-variance.
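Such a Monte Carlo estimate of the responder's cumulative gain under a fixed accepting probability p can be sketched as follows. The protocol details here (unit stake, a constant offer y, independent acceptance each round) are simplifying assumptions for illustration, not the paper's exact setup; under them the number of acceptances is binomial, so the mean and variance of the gain are n p y and n p (1 − p) y^2:

```python
import random

def mc_cumulative_gain(p, n, y=0.5, n_run=10_000, seed=42):
    """Mean and variance of the responder's cumulative gain over n rounds,
    with each unit-stake offer y accepted independently with probability p."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_run):
        totals.append(sum(y for _ in range(n) if rng.random() < p))
    mean = sum(totals) / n_run
    var = sum((g - mean) ** 2 for g in totals) / n_run
    return mean, var

mean, var = mc_cumulative_gain(p=0.3, n=20)
assert abs(mean - 20 * 0.3 * 0.5) < 0.1            # exact mean: n p y = 3.0
assert abs(var - 20 * 0.3 * 0.7 * 0.5**2) < 0.15   # exact variance: n p (1-p) y^2 = 1.05
```

In this simplified model, iso-payoff lines in the n × p plane correspond to level sets of n p y, illustrating how distinct (n, p) pairs can yield the same cumulative gain.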

Next, consider that the accepting decision depends on the offer value. This situation is studied only via Monte Carlo simulations. To study the effects of the cutoff q_6, with 0 ≤ q_6 ≤ 1, let us consider the linear accepting function (q_4 = 1)

f_r(y) = { y/q_6  if 0 < y ≤ q_6
           1      if q_6 < y ≤ 1 }    (31)

In this case, an interesting pattern of iso-payoff stripes appears in the n × q_6 diagrams (Fig. 8, left side), with crossovers at values of q_6 that depend on y_0. For the variance (Fig. 8, right side), we observe that the regions of maximal dispersion are wider for small values of y_0 and concentrated at q_6 > 0.5. However, the magnitude of the maximal variance is larger for greater values of y_0.
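The cutoff experiment can be sketched in the same spirit. The acceptance ramp below is Eq. (31); the one-step-memory offer update (raise the offer by a fixed step delta after a rejection, lower it after an acceptance), as well as the values y0 = 0.1 and delta = 0.05, are illustrative assumptions rather than the paper's exact rule:

```python
import random

def f_r(y, q6):
    # Eq. (31): linear ramp y/q6 up to the cutoff q6, then certain acceptance
    return min(y / q6, 1.0) if q6 > 0 else 1.0

def mc_responder_gain(q6, n, y0=0.1, delta=0.05, n_run=5_000, seed=7):
    """Average cumulative responder gain over n rounds with cutoff acceptance
    and an assumed one-step-memory offer update."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_run):
        y = y0
        for _ in range(n):
            if rng.random() < f_r(y, q6):
                total += y
                y = max(0.0, y - delta)  # accepted: offer less next round
            else:
                y = min(1.0, y + delta)  # rejected: offer more next round
    return total / n_run

# A higher cutoff makes small offers less acceptable, driving the offers
# (and hence the responder's cumulative gain) upward
assert mc_responder_gain(q6=0.8, n=50) > mc_responder_gain(q6=0.2, n=50)
```

Sweeping q_6 at fixed n with such a routine is the kind of scan that produces the n × q_6 diagrams of Fig. 8.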

5. Conclusions

Our paper presents results for the iterated and probabilistic ultimatum game based on generalized offering/accepting strategies. The proposer can offer random independent values (memoryless) or values that depend on previous results (one-step memory). Also, the responder can accept the offer randomly, or the acceptance can depend on the offered value. For memoryless players, we calculated the average payoff and the average cumulative payoff, as well as their variances, using very general choices of offering/accepting functions. Our analytical calculations have been corroborated by numerical validations. The possible situations where the average and variance of the payoff of the proposer and the responder are optimized have been found. For one-step memory, the system evolution is described by analytical results in some cases, and Monte Carlo simulations have been performed for all of them. Interesting patterns of iso-variance and iso-payoff were observed. Finally, our study of the generalized ultimatum game under different proposal and accepting functions is the basis for future studies on the emergent behavior of many players, with or without topological structures, since it may still retain many of the optimization possibilities of the iterated two-player game.

Acknowledgment

The authors thank CNPq (The Brazilian National Research Council) (305738/2010-0, 476722/2010-1) for its financial support.
