Randomized pivoting rules for the simplex algorithm: Lower bounds
Uri Zwick, Tel Aviv University
MDS Summer School: The Combinatorics of Linear and Semidefinite Programming
August 14-16, 2012
Slide 3
Deterministic pivoting rules: Largest improvement, Largest slope, Dantzig's rule (largest modified cost), Bland's rule (avoids cycling), Lexicographic rule (also avoids cycling).
All known to require an exponential number of steps in the worst case:
Klee-Minty (1972), Jeroslow (1973), Avis-Chvátal (1978), Goldfarb-Sit (1979), ..., Amenta-Ziegler (1996)
Slide 4
Klee-Minty cubes (1972) Taken from a paper by
Gärtner-Henk-Ziegler
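The construction can be written down concretely. A minimal sketch, not from the slides, assuming the standard formulation: maximize x_n subject to 0 ≤ x_1 ≤ 1 and ε·x_{j-1} ≤ x_j ≤ 1 − ε·x_{j-1} for j = 2..n, with 0 < ε < 1/2 (ε = 1/3 here is an arbitrary choice):

```python
import numpy as np

def klee_minty(n, eps=1/3):
    """Constraints A @ x <= b of the Klee-Minty cube:
    0 <= x_1 <= 1 and eps*x_{j-1} <= x_j <= 1 - eps*x_{j-1}.
    Maximizing x_n, the simplex method with Dantzig's rule started
    at the origin is known to visit all 2^n vertices."""
    A, b = [], []
    for j in range(n):
        lo = np.zeros(n)          # -x_j + eps*x_{j-1} <= 0
        hi = np.zeros(n)          #  x_j + eps*x_{j-1} <= 1
        lo[j], hi[j] = -1.0, 1.0
        if j > 0:
            lo[j - 1] = eps
            hi[j - 1] = eps
        A += [lo, hi]
        b += [0.0, 1.0]
    return np.array(A), np.array(b)

A, b = klee_minty(3)
# The origin and the optimal vertex (0, 0, 1) are both feasible.
for x in (np.zeros(3), np.array([0.0, 0.0, 1.0])):
    assert np.all(A @ x <= b + 1e-12)
```

The deformation parameter ε is what makes the 2n facets of the cube "tilt" so that a path through all 2^n vertices is strictly improving.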
Slide 5
Randomized pivoting rules:
Random-Edge: choose a random improving edge.
Random-Facet: described in the previous lecture.
Random-Facet is sub-exponential! [Kalai (1992)] [Matoušek-Sharir-Welzl (1996)]
Are Random-Edge and Random-Facet polynomial?
Slide 6
Abstract objective functions (AOFs): every face should have a unique sink.
Acyclic Unique Sink Orientations (AUSOs)
Slide 7
USOs and AUSOs: Stickney-Watson (1978), Morris (2001), Szabó-Welzl (2001), Gärtner (2002).
AUSOs of n-cubes: 2n facets, 2^n vertices.
The directed diameter is exactly n. Exercise: Prove it.
Slide 8
AUSO results: Random-Facet is sub-exponential [Kalai (1992)] [Matoušek-Sharir-Welzl (1996)].
Sub-exponential lower bound for Random-Facet [Matoušek (1994)].
Sub-exponential lower bound for Random-Edge [Matoušek-Szabó (2006)].
These lower bounds do not correspond to actual linear programs. Can geometry help?
Slide 9
Random-Edge and Random-Facet are not polynomial for LPs. Consider
LPs that correspond to Markov Decision Processes (MDPs): Simplex ↔
Policy iteration. Obtain sub-exponential lower bounds for the
Random-Edge and Random-Facet variants of the Policy Iteration
algorithm for MDPs.
Slide 10
Randomized pivoting rules (upper and lower bounds):
Algorithms: Random-Edge, Random-Facet
Upper bounds: [Kalai 92] [Matoušek-Sharir-Welzl 92]
Lower bounds: [Friedmann-Hansen-Z 11]
Lower bounds obtained for LPs whose diameter is n.
Slide 11
3-bit counter
Slide 12
Turn-based 2-Player Stochastic Games [Shapley 53] [Gillette 57] [Condon 92]:
limiting average version, discounted version, total reward version.
Both players have optimal positional strategies.
Can optimal strategies be found in polynomial time?
Slide 13
Stopping condition: for the total reward version, assume that no
matter what the players do, the game stops with probability 1.
Exercise: Show that discounted games correspond directly to
stopping total reward games.
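One way to see the correspondence (a sketch of the standard argument, not taken from the slides): the discount factor can be absorbed into the transition probabilities.

```latex
% Discounted value equations:
\[
v(u) = r(u) + \gamma \sum_{w} p(w \mid u)\, v(w)
\]
% Define a stopping game with the same rewards, transition probabilities
% $\tilde p(w \mid u) = \gamma\, p(w \mid u)$, and the remaining
% probability $1-\gamma$ routed to a zero-reward terminating sink.
% Each step stops with probability $1-\gamma$, so the game stops with
% probability 1, and its total-reward equations
\[
v(u) = r(u) + \sum_{w} \tilde p(w \mid u)\, v(w)
\]
% coincide term by term with the discounted equations above.
```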
Slide 14
Strategies / Policies:
A deterministic strategy specifies which action to take given every possible history.
A memoryless strategy is a strategy that depends only on the current state.
A positional strategy is a deterministic memoryless strategy.
A mixed strategy is a probability distribution over deterministic strategies.
Slide 15
Values: Both players have positional optimal strategies; there are
positional strategies that are optimal for every starting position.
Slide 16
Markov Decision Processes [Shapley 53] [Bellman 57] [Howard 60]:
limiting average version, discounted version, total reward version.
Optimal positional policies can be found using LP.
Is there a strongly polynomial time algorithm?
Slide 17
Stochastic shortest paths (SSPs) Minimize the expected cost of
getting to the target
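As a toy illustration (the instance data below is invented for this sketch), the optimal expected costs-to-target of an SSP satisfy a Bellman equation and can be approximated by value iteration:

```python
import numpy as np

# A tiny stochastic shortest path instance (hypothetical data).
# State 0: action A reaches the target directly at cost 3;
#          action B moves to state 1 at cost 0.5.
# State 1: a single action of cost 1 that reaches the target
#          with probability 0.5 and otherwise stays in state 1.
# actions[s] = list of (cost, prob_to_state0, prob_to_state1);
# leftover probability mass goes to the absorbing target.
actions = {
    0: [(3.0, 0.0, 0.0), (0.5, 0.0, 1.0)],
    1: [(1.0, 0.0, 0.5)],
}

x = np.zeros(2)              # expected cost-to-target estimates
for _ in range(200):         # value iteration; converges since the
    x = np.array([           # process stops with probability 1
        min(c + p0 * x[0] + p1 * x[1] for c, p0, p1 in acts)
        for acts in (actions[0], actions[1])
    ])

# By hand: x1 solves x1 = 1 + 0.5*x1, so x1 = 2,
# and then x0 = min(3, 0.5 + 2) = 2.5.
```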
Slide 18
Turn-based non-Stochastic Games [Ehrenfeucht-Mycielski (1979)]:
limiting average version, discounted version, total reward version.
Both players have optimal positional strategies (easy).
Still no polynomial time algorithms known!
Slide 19
Long-term planning:
Turn-based Stochastic Games (SGs): stochastic, adversarial (2 players)
Mean Payoff Games (MPGs): non-stochastic, adversarial (2 players)
Markov Decision Processes (MDPs): stochastic, non-adversarial (1 player)
Deterministic MDPs (DMDPs): non-stochastic, non-adversarial (1 player)
Slide 20
Parity Games (PGs): a simple example. Each vertex carries a priority;
EVEN wins if the largest priority seen infinitely often is even.
Slide 21
Parity Games (PGs): EVEN wins if the largest priority seen infinitely
often is even. Equivalent to many interesting problems in automata
and verification: non-emptiness of ω-tree automata, modal μ-calculus
model checking.
Slide 22
Parity Games (PGs) → Mean Payoff Games (MPGs): replace priority k by
payoff (−n)^k; move payoffs to outgoing edges.
[Stirling (1993)] [Puri (1995)]
Slide 23
Let's focus on MDPs
Slide 24
Evaluating a policy: MDP + policy → Markov chain. Values of a fixed
policy can be found by solving a system of linear equations.
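Concretely, for a discounted MDP, fixing a policy π induces a Markov chain with transition matrix P_π and reward vector r_π, and the values solve (I − γP_π)v = r_π. A minimal numpy sketch (the two-state instance is invented for illustration):

```python
import numpy as np

gamma = 0.9
# Transition matrix and rewards of the Markov chain induced
# by a fixed policy (hypothetical two-state example).
P_pi = np.array([[0.0, 1.0],
                 [1.0, 0.0]])
r_pi = np.array([1.0, 2.0])

# v = r + gamma * P v   <=>   (I - gamma*P) v = r
v = np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)
```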
Slide 25
Improving a policy (using a single switch)
Slide 26
Slide 27
Policy iteration for MDPs [Howard 60]
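Howard's algorithm alternates policy evaluation and greedy improvement until no improving switch remains. A sketch on a small discounted MDP (the instance data is made up; the improvement step is Switch-All, performing every improving switch at once):

```python
import numpy as np

gamma = 0.9
# actions[s] = list of (reward, transition distribution over states)
actions = [
    [(0.0, [1.0, 0.0]), (1.0, [0.0, 1.0])],
    [(0.0, [0.0, 1.0]), (2.0, [1.0, 0.0])],
]
n = len(actions)

def evaluate(pi):
    """Values of the Markov chain induced by policy pi."""
    P = np.array([actions[s][pi[s]][1] for s in range(n)])
    r = np.array([actions[s][pi[s]][0] for s in range(n)])
    return np.linalg.solve(np.eye(n) - gamma * P, r)

def policy_iteration(pi):
    while True:
        v = evaluate(pi)
        # Switch-All: in every state take the greedy action w.r.t. v
        new = [max(range(len(actions[s])),
                   key=lambda a: actions[s][a][0]
                   + gamma * np.dot(actions[s][a][1], v))
               for s in range(n)]
        if new == pi:                 # no improving switch left
            return pi, v
        pi = new

pi, v = policy_iteration([0, 0])
```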
Slide 28
Dual LP formulation for MDPs
Slide 29
Basic solution ↔ (positional) policy.
Action a is not an improving switch.
Slide 30
Primal LP formulation for MDPs. Vertex ↔ complement of a policy.
Slide 31
TB2SG ∈ NP ∩ co-NP. TB2SG ∈ P ???
Slide 32
Policy iteration variants
Slide 33
Random-Facet for MDPs Choose a random action not in the current
policy and ignore it. Solve recursively without this action. If the
ignored action is not an improving switch with respect to the
returned policy, we are done. Otherwise, switch to the ignored
action and solve recursively.
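The recursion above can be sketched in code. This is a toy rendering on a concrete two-state discounted MDP (instance data and helper names invented for illustration; `allowed` plays the role of the facets still available):

```python
import numpy as np

rng = np.random.default_rng(1)
gamma = 0.9
# actions[s] = list of (reward, transition distribution over states)
actions = [
    [(0.0, [1.0, 0.0]), (1.0, [0.0, 1.0])],
    [(0.0, [0.0, 1.0]), (2.0, [1.0, 0.0])],
]
n = len(actions)

def evaluate(pi):
    P = np.array([actions[s][pi[s]][1] for s in range(n)])
    r = np.array([actions[s][pi[s]][0] for s in range(n)])
    return np.linalg.solve(np.eye(n) - gamma * P, r)

def improving(pi, switch):
    """Is (state, action) an improving switch w.r.t. policy pi?"""
    s, a = switch
    v = evaluate(pi)
    r, p = actions[s][a]
    return r + gamma * np.dot(p, v) > v[s] + 1e-9

def random_facet(pi, allowed):
    """allowed: set of usable (state, action) pairs; always
    contains the actions of the current policy pi."""
    extra = [f for f in sorted(allowed) if pi[f[0]] != f[1]]
    if not extra:
        return pi
    f = extra[rng.integers(len(extra))]       # random action to ignore
    sol = random_facet(pi, allowed - {f})     # solve without it
    if not improving(sol, f):                 # f not improving: done
        return sol
    s, a = f
    new_pi = list(sol)
    new_pi[s] = a                             # switch to the ignored action
    return random_facet(new_pi, allowed)      # and solve again with f back

all_pairs = {(s, a) for s in range(n) for a in range(len(actions[s]))}
pi = random_facet([0, 0], all_pairs)
```

Termination follows because every performed switch strictly improves the value vector, so no policy is ever revisited.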
Slide 34
Policy iteration for 2-player games Keep a strategy of player 1
and an optimal counter-strategy of player 2. Perform improving
switches for player 1 and recompute an optimal counter-strategy for
player 2. Exercise: Does it really work? Random-Facet yields a
sub-exponential algorithm for turn-based 2-player stochastic
games!
Slide 35
Lower bounds for Policy Iteration Switch-All for Parity Games
is exponential [Friedmann 09] Switch-All for MDPs is exponential
[Fearnley 10] Random-Facet for Parity Games is sub-exponential
[Friedmann-Hansen-Z 11] Random-Facet and Random-Edge for MDPs and
hence for LPs are sub-exponential [FHZ11]
Slide 36
Lower bound for Random-Facet: implement a randomized counter.
Slide 37
Lower bound for Random-Edge: implement a standard counter.
Slide 38
Upper bounds for Policy Iteration:
Dantzig's pivoting rule, and the standard policy iteration algorithm,
Switch-All, are polynomial for discounted MDPs with a fixed discount
factor [Ye 10]. Switch-All is almost linear for discounted MDPs and
discounted turn-based 2-player Stochastic Games with a fixed discount
factor [Hansen-Miltersen-Z 11].
Slide 39
Deterministic algorithms:
Algorithms: Switch-Best, Switch-All
Discounted: [Ye 10] [Hansen-Miltersen-Z 11]
Non-discounted: [Friedmann 09] [Fearnley 10] [Condon 93]
Slide 40
3-bit counter (N) 15
Slide 41
3-bit counter 010
Slide 42
3-bit counter Improving switches 010 Random-Edge can choose
either one of these improving switches
Slide 43
Cycle gadgets Cycles close one edge at a time Shorter cycles
close faster
Slide 44
Cycle gadgets Cycles open simultaneously
Slide 45
3-bit counter 010
Slide 46
From b to b+1 in seven phases: the B_k-cycle closes; the C_k-cycle
closes; the U-lane realigns; the A_i-cycles and B_i-cycles for i