Kristoffer Arnsfelt Hansen, Rasmus Ibsen-Jensen and Peter Bro Miltersen. The complexity of solving reachability games using value and strategy iteration
The complexity of solving reachability games using value and strategy iteration
Kristoffer Arnsfelt Hansen, Rasmus Ibsen-Jensen, Peter Bro Miltersen
Aarhus University, Denmark
CSR 2011, 14th June
Overview
What are concurrent reachability games?
Two standard algorithms for solving concurrent reachability games: the value iteration algorithm and the strategy iteration algorithm
Exemplifying important facts used in the proof of the time lower bound for both algorithms
Matrix games (von Neumann 1928)
 0 -1  1
 1  0 -1
-1  1  0
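A minimal sketch of what the value of this matrix game means, in Python with exact fractions (the function names are ours, not the talk's): a mixed strategy for the row player guarantees the minimum over columns of its expected payoff, and a mixed strategy for the column player caps the maximum over rows. For this matrix (rock-paper-scissors) the uniform strategy makes both bounds 0, so the value is 0.

```python
from fractions import Fraction

# Payoff matrix from the slide: A[i][j] is what the row player receives.
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]


def row_guarantee(A, x):
    """Smallest expected payoff the row player gets with mixed strategy x, over all columns."""
    return min(sum(x[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0])))


def column_guarantee(A, y):
    """Largest expected payoff the row player can get against column strategy y."""
    return max(sum(A[i][j] * y[j] for j in range(len(A[0]))) for i in range(len(A)))


uniform = [Fraction(1, 3)] * 3
lower = row_guarantee(A, uniform)     # the row player can secure at least this
upper = column_guarantee(A, uniform)  # the column player can hold the row player to this
print(lower, upper)                   # equal bounds pin down the value of the game
```

A pure strategy does worse: playing only the first row guarantees just -1.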
Concurrent reachability games (Everett 1957; de Alfaro, Henzinger, Kupferman 1998)
Dante* vs. Lucifer*
The game consists of matrices, one for each position. In each position Dante picks a row and Lucifer picks a column, simultaneously. Each entry can be either 0, 1 or a pointer to a matrix.
Example: four 3×3 matrices, with S marking the start matrix and an entry S being a pointer back to it. One of the matrices is
1 S S
0 1 S
0 0 1
and the other three have 0 below the diagonal, S above it and pointers (drawn as arrows) on the diagonal.
* Naming convention from Hansen, Koucky and Miltersen, 2009
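To make the definition concrete, here is one possible encoding, a sketch under our own assumptions: a game is a list of matrices (one per position), and an entry is the integer 0, the integer 1, or a hypothetical pointer tuple ("goto", k) to position k. The function `one_step` gives the outcome distribution of a single simultaneous move when Dante picks a row and Lucifer a column from given distributions. The example is the one-matrix game used at the end of the talk (first row: 1 and a pointer back to itself; second row: 0 and 1).

```python
from fractions import Fraction


def check_game(game):
    """Validate the (hypothetical) encoding: entries are 0, 1 or ("goto", k)."""
    for matrix in game:
        width = len(matrix[0])
        for row in matrix:
            assert len(row) == width, "every row of a matrix needs the same length"
            for e in row:
                is_pointer = isinstance(e, tuple) and e[0] == "goto" and 0 <= e[1] < len(game)
                assert is_pointer or e in (0, 1), f"bad entry {e!r}"


def one_step(game, i, x, y):
    """Outcome distribution of one move at position i.

    x: Dante's distribution over rows, y: Lucifer's distribution over columns.
    Keys are "win" (entry 1), "lose" (entry 0) or a pointer ("goto", k).
    """
    out = {}
    for r, p in enumerate(x):
        for c, q in enumerate(y):
            e = game[i][r][c]
            key = e if isinstance(e, tuple) else ("win" if e == 1 else "lose")
            out[key] = out.get(key, 0) + p * q
    return out


one_matrix = [[[1, ("goto", 0)],
               [0, 1]]]
check_game(one_matrix)
half = Fraction(1, 2)
print(one_step(one_matrix, 0, [half, half], [half, half]))
```

With both players uniform, Dante wins immediately with probability 1/2, loses with probability 1/4, and is sent back to the same matrix with probability 1/4.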
Histories and strategies
History: a sequence of positions together with the choices of each player in each position.
Strategy: a map from histories to probability distributions over the choices in the position reached after the history.
S1: the set of strategies for Dante
S2: the set of strategies for Lucifer
H1/H2: the sets of stationary strategies for Dante/Lucifer (strategies that depend only on the position reached after the history)
Payoffs
v(i,σ,π): the probability of eventually reaching a 1, starting from position i, when Dante plays strategy σ and Lucifer plays strategy π.
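For stationary strategies (H1/H2 above), v(i,σ,π) is a reachability probability in a finite Markov chain. A minimal sketch, assuming the hypothetical encoding in which an entry is 0, 1 or a pointer ("goto", k): starting from x = 0 and repeatedly applying one move gives the probability of reaching a 1 within k moves, which increases towards v(i,σ,π). Strategies are lists of probabilities per position; the iteration count is an arbitrary choice for the example.

```python
def reach_probability(game, sigma, pi, iters=1000):
    """Approximate v(i, sigma, pi) for stationary sigma (rows) and pi (columns).

    After k rounds, x[i] is the probability of reaching a 1 from position i
    within k moves; this sequence increases to v(i, sigma, pi).
    """
    x = [0.0] * len(game)
    for _ in range(iters):
        new_x = []
        for i, matrix in enumerate(game):
            total = 0.0
            for r, p in enumerate(sigma[i]):
                for c, q in enumerate(pi[i]):
                    e = matrix[r][c]
                    # A pointer contributes the current estimate for its target.
                    total += p * q * (x[e[1]] if isinstance(e, tuple) else e)
            new_x.append(total)
        x = new_x
    return x


one_matrix = [[[1, ("goto", 0)], [0, 1]]]
print(reach_probability(one_matrix, [[0.5, 0.5]], [[0.5, 0.5]]))  # close to 2/3
print(reach_probability(one_matrix, [[1.0, 0.0]], [[0.0, 1.0]]))  # play loops forever: 0
```

The second call shows why the definition says "eventually": if Dante always picks the first row and Lucifer always the second column, the play never ends and no 1 is ever reached.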
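Everett 1957
For every position i:
$\sup_{\sigma\in S_1}\inf_{\pi\in S_2} v(i,\sigma,\pi) = \inf_{\pi\in S_2}\sup_{\sigma\in S_1} v(i,\sigma,\pi) = v_i$, the value of i
$\sup_{\sigma\in H_1}\inf_{\pi\in S_2} v(i,\sigma,\pi) = \inf_{\pi\in H_2}\sup_{\sigma\in S_1} v(i,\sigma,\pi) = v_i$
In words: the value exists, and both players can get arbitrarily close to it with stationary strategies.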
Algorithmic problems
Quantitatively solving a game: given the game, compute the value of every position.
Strategically solving a game: given the game and ε > 0, compute a strategy σ for Dante such that $v(i,\sigma,\pi) > v_i - \varepsilon$ for all strategies π of Lucifer and all positions i.
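Checking the strategic condition for a stationary σ amounts to computing Lucifer's best reply. Once σ is fixed, Lucifer faces a Markov decision process, and (a standard MDP fact) the smallest reachability probability he can force is the limit of value iteration started at 0, taking the minimum over columns in each position. A sketch under the hypothetical encoding with entries 0, 1 or ("goto", k); the function names and the fixed iteration count are our own choices. The example is the one-matrix game with first row 1 and a pointer back to itself, and second row 0 and 1; its value is 1, since Dante can guarantee 1 - δ for every δ > 0.

```python
def guaranteed_value(game, sigma, iters=1000):
    """Approximate min over Lucifer's strategies of v(i, sigma, pi), for stationary sigma."""
    u = [0.0] * len(game)
    for _ in range(iters):
        # For each position: Lucifer picks the column that minimizes Dante's expected value.
        u = [min(sum(p * (u[matrix[r][c][1]] if isinstance(matrix[r][c], tuple) else matrix[r][c])
                     for r, p in enumerate(sigma[i]))
                 for c in range(len(matrix[0])))
             for i, matrix in enumerate(game)]
    return u


def is_eps_optimal(game, sigma, values, eps):
    """True if sigma guarantees more than values[i] - eps from every position i."""
    return all(g > v - eps for g, v in zip(guaranteed_value(game, sigma), values))


one_matrix = [[[1, ("goto", 0)], [0, 1]]]
print(guaranteed_value(one_matrix, [[0.5, 0.5]]))              # Lucifer can hold Dante to 1/2
print(is_eps_optimal(one_matrix, [[2 / 3, 1 / 3]], [1.0], 0.4))  # True: 2/3 > 1 - 0.4
```

The uniform strategy is not 0.4-optimal here (it only guarantees 1/2), while putting probability 2/3 on the first row is.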
Value iteration (Shapley 1953)
In iteration t, value iteration computes the value of each position in G_t from the values of the positions in G_{t-1}.
G_t: a modified version of G in which Dante loses after t moves.
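The computation can be written as a recurrence; this is a sketch in our own notation, not the talk's. Let A_i be the matrix of position i and let A_i[v] be A_i with every pointer to a position k replaced by the number v_k (entries 0 and 1 stay as they are). Writing val for the value of a matrix game, the value of position i in G_t is v^t_i, where

```latex
v^{0}_i = 0, \qquad
v^{t}_i = \operatorname{val}\bigl(A_i[v^{t-1}]\bigr) \quad \text{for } t \ge 1 .
```

For example, in G_1 every pointer becomes 0, so a 3×3 matrix with 1s on the diagonal, 0s below it and pointers above it turns into the identity matrix, whose value is 1/3.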
Our results: lower bound for value iteration
There exists a concurrent reachability game G, with N matrices and m rows and columns in each matrix, such that $\mathrm{val}(G) = 1$ and $\mathrm{val}(G_t) = 3m^{-N/2}$ for $t = 2^{m^{N/2}}$.
Our results: upper bound for value iteration
For any concurrent reachability game G, $\mathrm{val}(G) - \mathrm{val}(G_t) < \varepsilon$ for $t = (1/\varepsilon)^{m^{O(N)}}$.
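To get a feel for the lower bound, the snippet below only evaluates the two expressions in it, the iteration count t = 2^(m^(N/2)) and the value 3m^(-N/2), for even N (so that N/2 is an integer). It is plain arithmetic on the stated formulas, not a construction of the game G.

```python
from fractions import Fraction


def vi_lower_bound_instance(N, m):
    """Return (t, value) = (2^(m^(N/2)), 3 * m^(-N/2)) from the lower-bound statement, N even."""
    assert N % 2 == 0, "use an even N so that m^(N/2) is an integer"
    k = m ** (N // 2)
    return 2 ** k, Fraction(3, k)


for N, m in [(4, 2), (8, 2), (4, 3), (10, 2)]:
    t, value = vi_lower_bound_instance(N, m)
    print(f"N={N}, m={m}: t = {t} ({len(str(t))} digits), value of G_t = {float(value):.5f}")
```

Already for m = 2 and N = 10 the formula gives t = 2^32 = 4294967296 iterations, with the value of G_t still only 3/32.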
Value iteration example: G_0 to G_9
Values of the four positions of the example game in G_t, from the start matrix S to the matrix with 1s on the diagonal. In G_1 every pointer is replaced by 0, so the matrix with 1s on the diagonal becomes the 3×3 identity matrix, whose value is 0.33333.

t   start (S)   second    third     last (1s on the diagonal)
0   0           0         0         0
1   0           0         0         0.33333
2   0           0         0.11111   0.33333
3   0           0.03704   0.11111   0.33333
4   0.01235     0.03704   0.11111   0.33333
5   0.01754     0.04147   0.11533   0.33748
6   0.02172     0.04493   0.11855   0.33925
7   0.02519     0.04772   0.12064   0.34068
8   0.02815     0.04991   0.12388   0.34187
9   0.03070     0.05129   0.12517   0.34378
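The first values of value iteration on this example can be reproduced with a short sketch. Assumptions (ours, not from the talk): the example is Generalized Purgatory P(4, 3) as defined later in the talk, with rows as Dante's guesses, columns as Lucifer's numbers, a correct guess pointing to the next matrix and an undershooting guess pointing back to S; pointers are encoded as ("goto", k). Matrix games are solved exactly with the classical pivoting (simplex) method for matrix games, using Bland's rule and Python fractions; any LP solver would do instead.

```python
from fractions import Fraction


def matrix_game_value(A):
    """Exact value of the zero-sum game A (rows maximize), by the pivot method.

    Shift A so every entry is >= 1, then run simplex on the tableau
    [A | 1; -1 ... -1 | 0]; the value is 1/corner - shift.
    Bland's rule (smallest label first) rules out cycling.
    """
    m, n = len(A), len(A[0])
    shift = 1 - min(min(row) for row in A)
    T = [[Fraction(a) + shift for a in row] + [Fraction(1)] for row in A]
    T.append([Fraction(-1)] * n + [Fraction(0)])
    left = [(0, i) for i in range(m)]      # row labels
    top = [(1, j) for j in range(n)]       # column labels
    while True:
        negative = [j for j in range(n) if T[m][j] < 0]
        if not negative:
            return 1 / T[m][n] - shift
        c = min(negative, key=lambda j: top[j])
        r = min((i for i in range(m) if T[i][c] > 0),
                key=lambda i: (T[i][n] / T[i][c], left[i]))
        p = T[r][c]
        T = [[1 / p if (i, j) == (r, c)
              else T[r][j] / p if i == r
              else -T[i][c] / p if j == c
              else T[i][j] - T[r][j] * T[i][c] / p
              for j in range(n + 1)] for i in range(m + 1)]
        left[r], top[c] = top[c], left[r]


def purgatory(N, m):
    """Hypothetical encoding of P(N, m): position 0 is S, position N - 1 has 1s on the diagonal."""
    game = []
    for k in range(N):
        correct = 1 if k == N - 1 else ("goto", k + 1)
        game.append([[correct if guess == number
                      else 0 if guess > number          # overshoot: hell
                      else ("goto", 0)                   # undershoot: back to S
                      for number in range(m)] for guess in range(m)])
    return game


def value_iteration(game, t_max):
    """Return [v^0, ..., v^t_max], where v^t lists the values of all positions in G_t."""
    v = [Fraction(0)] * len(game)        # G_0: no moves left, so every value is 0
    history = [v]
    for _ in range(t_max):
        # Replace each pointer by the previous value of its target, then solve each matrix.
        v = [matrix_game_value([[v[e[1]] if isinstance(e, tuple) else e for e in row]
                                for row in matrix])
             for matrix in game]
        history.append(v)
    return history


for t, values in enumerate(value_iteration(purgatory(4, 3), 9)):
    print(t, [f"{float(x):.5f}" for x in values])
```

Each printed row lists the positions from S to the matrix with 1s on the diagonal and can be compared with the values on the slides. The first rows agree exactly (1/3, 1/9, 1/27, 1/81), and t = 5 gives 6561/19441 ≈ 0.33748 for the matrix with 1s on the diagonal.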
Strategy iteration (Chatterjee, de Alfaro, Henzinger 2006)
Strategy iteration was conjectured to be fast.
Our results: upper bound for strategy iteration
An ε-optimal strategy is computed after $t = (1/\varepsilon)^{m^{O(N)}}$ iterations of strategy iteration.
This follows from the corresponding result for value iteration.
Our results: lower bound for strategy iteration
There exists a concurrent reachability game G, with N matrices (for large N) and m rows and columns in each matrix, such that $\mathrm{val}(G) = 1$ and the strategy obtained by strategy iteration after $t = 2^{m^{N/4}}$ iterations guarantees a winning probability of at most $4m^{-N/2}$.
Strategy iteration, m = 2
N   Number of iterations needed to get over 1/2
7   18446744073709551617
8   340282366920938463463374607431768211457
9   115792089237316195423570985008687907853269984665640564039457584007913129639937
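The three iteration counts listed for m = 2 are exactly 2^(2^(N-1)) + 1. Python's arbitrary-precision integers make this easy to check:

```python
# Iteration counts listed for strategy iteration with m = 2.
iterations_needed = {
    7: 18446744073709551617,
    8: 340282366920938463463374607431768211457,
    9: 115792089237316195423570985008687907853269984665640564039457584007913129639937,
}

for N, t in iterations_needed.items():
    formula = 2 ** (2 ** (N - 1)) + 1
    print(N, t == formula, len(str(t)), "digits")
```

So in these instances, adding one matrix squares the number of iterations (up to the +1): 20, 39 and 78 digits.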
Strategy iteration: before iteration 1
1. Start strategy for Dante := uniform (each row of each matrix has probability 0.33333)
Strategy iteration: iteration 1
1. Best response for Lucifer
2. Calculate values from those strategies
3. Update strategy for Dante
Figure: the numbers on the edges are the probabilities that the edges are used; edges without a number are used with probability 0.33333.
After step 2 the four positions have values 0.01235 (start S), 0.03704, 0.11111 and 0.33333 (the matrix with 1s on the diagonal).
In step 3 the S entries are replaced by the current value of the start position, 0.01235, and each matrix is solved again. Dante's updated strategy, as row probabilities for each matrix starting with the matrix with 1s on the diagonal, is (0.33748, 0.33332, 0.32920), (0.34599, 0.33317, 0.32084), (0.37327, 0.33180, 0.29493) and (0.47368, 0.31579, 0.21053).
Strategy iteration: iterations 2 to 9
Each iteration repeats the three steps. Values of the positions after step 2:

iteration   start (S)   second    third     last (1s on the diagonal)
2           0.02065     0.04359   0.11677   0.33748
3           0.02676     0.04825   0.12067   0.34031
4           0.03154     0.05185   0.12360   0.34241
5           0.03544     0.05476   0.12593   0.34407
6           0.03873     0.05721   0.12786   0.34543
7           0.04156     0.05932   0.12950   0.34658
8           0.04404     0.06118   0.13093   0.34758
9           0.04624     0.06283   0.13219   0.34845

Dante's strategy after step 3 (row probabilities for each matrix, starting with the matrix with 1s on the diagonal):
iteration 2: (0.34031, 0.33329, 0.32640), (0.35458, 0.33289, 0.31253), (0.39987, 0.33180, 0.32917), (0.55453, 0.29186, 0.15361)
iteration 3: (0.34241, 0.33325, 0.32434), (0.36097, 0.33259, 0.30644), (0.41947, 0.32646, 0.25407), (0.60831, 0.27098, 0.12071)
iteration 4: (0.34407, 0.33322, 0.32271), (0.36601, 0.33230, 0.30169), (0.43486, 0.32390, 0.24125), (0.64720, 0.25350, 0.09930)
iteration 5: (0.34543, 0.33319, 0.32138), (0.37015, 0.33202, 0.29783), (0.44745, 0.32152, 0.23103), (0.67692, 0.23882, 0.08426)
iteration 6: (0.34658, 0.33316, 0.32026), (0.37366, 0.33177, 0.29457), (0.45807, 0.31933, 0.22260), (0.70055, 0.22633, 0.07312)
iteration 7: (0.34758, 0.33313, 0.31929), (0.37670, 0.33153, 0.29177), (0.46723, 0.31730, 0.21547), (0.71988, 0.21557, 0.06455)
iteration 8: (0.34845, 0.33311, 0.31844), (0.37937, 0.33130, 0.28933), (0.47527, 0.31541, 0.20932), (0.73606, 0.20618, 0.05776)
iteration 9: (0.34923, 0.33309, 0.31768), (0.38176, 0.33109, 0.28715), (0.48241, 0.31366, 0.20393), (0.74985, 0.19791, 0.05224)
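The three steps of each iteration (best response for Lucifer, calculate values, update Dante's strategy) can be sketched as follows. This is a simplified sketch for m = 2 only, under our own assumptions: Dante's stationary strategy is stored as the probability p[i] of the first row at position i; steps 1-2 compute the values Lucifer can hold Dante to (value iteration on the resulting MDP, started at 0); step 3 solves every 2×2 matrix with the pointers replaced by those values and keeps Dante's optimal first-row probability. It omits any safeguards the published algorithm may use, and `purgatory_2` (position 0 is S) is a hypothetical encoding of P(N, 2). Run on P(3, 2) from the uniform start, it gives the values 0.5, 0.25, 0.125 and then 0.53333, 0.30476, 0.20317, the numbers shown for strategy iteration on 3 matrices at the end of the talk.

```python
def purgatory_2(N):
    """Hypothetical encoding of P(N, 2): position 0 is S, pointers are ("goto", k)."""
    def correct(k):
        return 1 if k == N - 1 else ("goto", k + 1)
    return [[[correct(k), ("goto", 0)],     # guess 1: right, or too low (back to S)
             [0, correct(k)]]               # guess 2: too high (hell), or right
            for k in range(N)]


def entry(e, u):
    """Numeric value of an entry, with pointers replaced by the current values u."""
    return u[e[1]] if isinstance(e, tuple) else float(e)


def solve_2x2(M):
    """(value, probability of row 0) for a 2x2 zero-sum game; rows maximize."""
    (a, b), (c, d) = M
    for r in range(2):
        for k in range(2):
            e = M[r][k]
            if e == min(M[r]) and e == max(M[0][k], M[1][k]):   # pure saddle point
                return e, 1.0 if r == 0 else 0.0
    den = a + d - b - c
    return (a * d - b * c) / den, (d - c) / den


def best_response_values(game, p, iters=2000):
    """Steps 1-2: values when Lucifer best-responds to Dante's stationary strategy p."""
    u = [0.0] * len(game)
    for _ in range(iters):
        u = [min(p[i] * entry(M[0][k], u) + (1 - p[i]) * entry(M[1][k], u)
                 for k in range(2))
             for i, M in enumerate(game)]
    return u


def update_strategy(game, u):
    """Step 3: solve each matrix with pointers replaced by u; keep Dante's optimal p."""
    return [solve_2x2([[entry(e, u) for e in row] for row in M])[1] for M in game]


def strategy_iteration(game, rounds):
    p = [0.5] * len(game)                  # start strategy for Dante: uniform
    trace = []
    for _ in range(rounds):
        u = best_response_values(game, p)
        trace.append((p, u))
        p = update_strategy(game, u)
    return trace


for p, u in strategy_iteration(purgatory_2(3), 3):
    print([round(x, 5) for x in p], [round(x, 5) for x in u])
```

On the one-matrix game, `strategy_iteration(purgatory_2(1), 3)` gives the values 0.5, 2/3 and 3/4, the same as value iteration on that matrix.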
Generalized Purgatory P(N,m)
Lucifer repeatedly hides a number between 1 and m, and Dante must try to guess it. If he guesses correctly N times in a row, he goes to heaven. If he ever guesses incorrectly by overshooting Lucifer's number, he goes to hell. (In the example matrices, a guess that undershoots sends Dante back to the start S.)
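A sketch of how P(N, m) can be written down as a concurrent reachability game, in a hypothetical encoding where position 0 is the start S, rows are Dante's guesses, columns are Lucifer's numbers and pointers are ("goto", k). The undershoot rule (back to S) is read off the example matrices rather than stated in the definition. Drawing the matrices the way the slides do (S for the pointer back to the start, -> for the pointer to the next matrix) recovers the matrix with 1s on the diagonal that appears throughout the talk.

```python
def purgatory(N, m):
    """P(N, m): N matrices of size m x m; matrix N - 1 leads to heaven on a correct guess."""
    game = []
    for k in range(N):
        correct = 1 if k == N - 1 else ("goto", k + 1)   # N correct guesses in a row win
        game.append([[correct if guess == number
                      else 0 if guess > number            # overshooting: hell
                      else ("goto", 0)                     # undershooting: back to S
                      for number in range(m)]
                     for guess in range(m)])
    return game


def render(matrix):
    """Draw a matrix like the slides: S = pointer to the start, -> = pointer onwards."""
    def cell(e):
        if isinstance(e, tuple):
            return "S" if e[1] == 0 else "->"
        return str(e)
    return [" ".join(cell(e) for e in row) for row in matrix]


for k, matrix in enumerate(purgatory(4, 3)):
    print(f"position {k}:", " / ".join(render(matrix)))
```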
Interesting fact
The probability that Dante goes to heaven from purgatory can be made arbitrarily close to 1 if he plays well enough.
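The smallest case already shows why "close to 1" is the right phrase. In P(1, 2) there is a single matrix, with first row 1 and a pointer S back to itself, and second row 0 and 1. Suppose Dante uses the stationary strategy that picks the first row with probability p, and let u be the smallest probability of reaching heaven that Lucifer can force; u is the least solution of

```latex
u = \min\bigl(p,\; p\,u + (1 - p)\bigr).
```

For p < 1 this forces u = p: the second term is at least p when u = p, since p·p + (1 - p) - p = (1 - p)² ≥ 0, and u = pu + (1 - p) would give u = 1 > p. So Dante can guarantee 1 - δ for every δ > 0, and the value is 1. With p = 1, however, Lucifer always picks the second column, the play never ends, and Dante reaches heaven with probability 0.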
Exemplifying important facts
Three panels: value iteration on 1 matrix, strategy iteration on 1 matrix, and strategy iteration on 3 matrices. The single matrix has first row 1 and a pointer back to itself, and second row 0 1. At t = 0 the value-iteration value is 0 and all of Dante's strategies are uniform, (0.5, 0.5).

Values after iteration t:
t   value iteration, 1 matrix   strategy iteration, 1 matrix   strategy iteration, 3 matrices
1   0.5                         0.5                            0.5, 0.25, 0.125
2   0.66667                     0.66667                        0.53333, 0.30476, 0.20317
3   0.75                        0.75                           0.55654, 0.34374, 0.25781

Dante's updated strategies (probabilities of the two rows) in strategy iteration:
t   1 matrix             3 matrices
1   (0.66667, 0.33333)   (0.66667, 0.33333), (0.57143, 0.42857), (0.53333, 0.46667)
2   (0.75, 0.25)         (0.75, 0.25), (0.61765, 0.38235), (0.55654, 0.44346)
3   (0.8, 0.2)           (0.8, 0.2), (0.65072, 0.34928), (0.57399, 0.42601)
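The value iteration numbers for the single matrix (0.5, 0.66667, 0.75) follow a simple pattern. With the pointer replaced by the previous value v, the matrix becomes [[1, v], [0, 1]], which has no saddle point for v < 1 and value 1/(2 - v); starting from 0 this gives v_t = t/(t + 1), so getting within ε of the value 1 takes about 1/ε iterations. A small exact check (the 2×2 formula is the standard one for games without a saddle point; the function names are ours):

```python
from fractions import Fraction


def value_2x2(a, b, c, d):
    """Exact value of the zero-sum game [[a, b], [c, d]]; the row player maximizes."""
    M = [[a, b], [c, d]]
    for r in range(2):
        for k in range(2):
            e = M[r][k]
            if e == min(M[r]) and e == max(M[0][k], M[1][k]):
                return e                                   # pure saddle point
    return (a * d - b * c) / (a + d - b - c)               # fully mixed equilibrium


def value_iteration_one_matrix(t_max):
    """Values v_0, ..., v_t_max for the one-matrix game [[1, pointer to itself], [0, 1]]."""
    v = Fraction(0)
    values = [v]
    for _ in range(t_max):
        v = value_2x2(Fraction(1), v, Fraction(0), Fraction(1))
        values.append(v)
    return values


print([str(v) for v in value_iteration_one_matrix(4)])  # ['0', '1/2', '2/3', '3/4', '4/5']
```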
The end
Open problem: find a fast algorithm for the problem. There is a PSPACE algorithm for the problem, but it is not fast.
Thanks for listening