Uri Zwick – Tel Aviv Univ. Randomized pivoting rules for the simplex algorithm: Lower bounds


  • Slide 1
  • Slide 2
  • Uri Zwick, Tel Aviv Univ. Randomized pivoting rules for the simplex algorithm: Lower bounds. MDS summer school: The Combinatorics of Linear and Semidefinite Programming, August 14-16, 2012
  • Slide 3
  • Deterministic pivoting rules: Largest improvement; Largest slope; Dantzig's rule (largest modified cost); Bland's rule (avoids cycling); Lexicographic rule (also avoids cycling). All are known to require an exponential number of steps in the worst case: Klee-Minty (1972), Jeroslow (1973), Avis-Chvátal (1978), Goldfarb-Sit (1979), …, Amenta-Ziegler (1996)
  • Slide 4
  • Klee-Minty cubes (1972). Taken from a paper by Gärtner-Henk-Ziegler
  • Slide 5
  • Randomized pivoting rules: Random-Edge – choose a random improving edge (sketched below); Random-Facet – described in the previous lecture. Random-Facet is sub-exponential! [Kalai (1992)] [Matoušek-Sharir-Welzl (1996)] Are Random-Edge and Random-Facet polynomial???
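
Random-Edge is simple enough to state as a minimal sketch. The oracles improving_switches and apply_switch below are hypothetical placeholders, not anything from the talk:

```python
import random

def random_edge(vertex, improving_switches, apply_switch):
    """Follow a uniformly random improving edge until none is left.

    improving_switches(v) -> list of improving edges at vertex v (hypothetical oracle)
    apply_switch(v, e)    -> the neighbouring vertex reached by taking edge e
    """
    while True:
        candidates = improving_switches(vertex)
        if not candidates:            # no improving edge: the current vertex is optimal
            return vertex
        edge = random.choice(candidates)
        vertex = apply_switch(vertex, edge)
```
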
  • Slide 6
  • Abstract objective functions (AOFs). Every face should have a unique sink: Acyclic Unique Sink Orientations (AUSOs)
  • Slide 7
  • USOs and AUSOs of n-cubes: 2n facets, 2^n vertices. The directed diameter is exactly n (Exercise: prove it). Stickney-Watson (1978), Morris (2001), Szabó-Welzl (2001), Gärtner (2002)
  • Slide 8
  • AUSO results: Random-Facet is sub-exponential [Kalai (1992)] [Matoušek-Sharir-Welzl (1996)]; sub-exponential lower bound for Random-Facet [Matoušek (1994)]; sub-exponential lower bound for Random-Edge [Matoušek-Szabó (2006)]. These lower bounds do not correspond to actual linear programs. Can geometry help?
  • Slide 9
  • Random-Edge and Random-Facet are not polynomial for LPs. Consider LPs that correspond to Markov Decision Processes (MDPs), where Simplex corresponds to Policy iteration. Obtain sub-exponential lower bounds for the Random-Edge and Random-Facet variants of the Policy Iteration algorithm for MDPs
  • Slide 10
  • Randomized pivoting rules (Random-Edge, Random-Facet): upper bounds [Kalai 92] [Matousek-Sharir-Welzl 92]; lower bounds [Friedmann-Hansen-Z 11]. The lower bounds are obtained for LPs whose diameter is n
  • Slide 11
  • 3-bit counter
  • Slide 12
  • Turn-based 2-Player Stochastic Games [Shapley 53] [Gillette 57] [Condon 92]: limiting average, discounted and total reward versions (sketched below). Both players have optimal positional strategies. Can optimal strategies be found in polynomial time?
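
For reference, the three objectives in their standard form (a sketch in my notation; the slides' exact formulas may differ). With rewards r_0, r_1, ... along the play and discount factor 0 < γ < 1:

```latex
\text{limiting average:}\quad \liminf_{N\to\infty} \frac{1}{N}\sum_{t=0}^{N-1} r_t,
\qquad
\text{discounted:}\quad \sum_{t=0}^{\infty} \gamma^{t} r_t,
\qquad
\text{total reward:}\quad \sum_{t=0}^{\infty} r_t .
```
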
  • Slide 13
  • Stopping condition. For the total reward version, assume: no matter what the players do, the game stops with probability 1. Exercise: show that discounted games correspond directly to stopping total reward games (a hint follows below)
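
A hint for the exercise (the standard argument, stated in my notation, not spelled out on the slide): a discount factor γ can be simulated by adding, from every state, a move to an absorbing zero-reward state taken with probability 1−γ. The stopping time T is then geometric, and for any fixed run with rewards r_0, r_1, ...

```latex
\mathbb{E}\!\left[\sum_{t=0}^{T-1} r_t\right]
  \;=\; \sum_{t=0}^{\infty} \Pr[T > t]\, r_t
  \;=\; \sum_{t=0}^{\infty} \gamma^{t} r_t .
```
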
  • Slide 14
  • Strategies / Policies. A deterministic strategy specifies which action to take given every possible history. A mixed strategy is a probability distribution over deterministic strategies. A memoryless strategy is a strategy that depends only on the current state. A positional strategy is a deterministic memoryless strategy
  • Slide 15
  • Values. Both players have positional optimal strategies: the value is already attained by positional strategies, not only by general ones, and there are positional strategies that are optimal for every starting position
  • Slide 16
  • Markov Decision Processes [Shapley 53] [Bellman 57] [Howard 60]: limiting average, discounted and total reward versions. Optimal positional policies can be found using LP. Is there a strongly polynomial time algorithm?
  • Slide 17
  • Stochastic shortest paths (SSPs) Minimize the expected cost of getting to the target
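
The usual optimality equations for SSPs (a standard textbook formulation, included as a sketch; the notation is mine, not the slides'):

```latex
v(s) \;=\; \min_{a \in A(s)} \Big( c(s,a) + \sum_{s'} P(s' \mid s,a)\, v(s') \Big),
\qquad v(\text{target}) = 0 .
```
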
  • Slide 18
  • Turn-based non-Stochastic Games [Ehrenfeucht-Mycielski (1979)]: limiting average, discounted and total reward versions. Both players have optimal positional strategies. Still no polynomial time algorithms known! (Easy)
  • Slide 19
  • Turn-based Stochastic Games (SGs): 2 players, stochastic, adversarial (long-term planning in a stochastic and adversarial environment). Non-Stochastic Games (MPGs): 2 players, adversarial, non-stochastic. Markov Decision Processes (MDPs): 1 player, stochastic, non-adversarial. Deterministic MDPs (DMDPs): 1 player, non-stochastic, non-adversarial
  • Slide 20
  • Parity Games (PGs): a simple example (figure showing vertex priorities). EVEN wins if the largest priority seen infinitely often is even
  • Slide 21
  • Parity Games (PGs). EVEN wins if the largest priority seen infinitely often is even. Equivalent to many interesting problems in automata and verification: non-emptiness of ω-tree automata, modal μ-calculus model checking
  • Slide 22
  • Parity Games (PGs) to Mean Payoff Games (MPGs): replace priority k by payoff (−n)^k and move the payoffs to the outgoing edges [Stirling (1993)] [Puri (1995)] (a sanity check follows below)
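
A quick sanity check for this choice of payoffs (my reasoning, not spelled out on the slide): on a simple cycle with at most n edges, an edge carrying the largest priority k dominates all the others, since their total absolute payoff is at most

```latex
(n-1)\, n^{\,k-1} \;<\; n^{k},
```

so the sign of the cycle's mean payoff is (−1)^k, i.e. it matches the parity of the largest priority seen infinitely often on that cycle.
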
  • Slide 23
  • Let's focus on MDPs
  • Slide 24
  • Evaluating a policy: MDP + policy = Markov chain. The values of a fixed policy can be found by solving a system of linear equations (see the sketch below)
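
A minimal NumPy sketch of this step, assuming the discounted setting (the names and the discount factor are my choices, not the slides'): given the transition matrix and reward vector induced by a fixed policy, the values solve one linear system.

```python
import numpy as np

def evaluate_policy(P_pi, r_pi, gamma=0.9):
    """Values of a fixed policy in a discounted MDP.

    P_pi  : (n, n) transition matrix of the induced Markov chain
    r_pi  : (n,)   expected one-step reward under the policy
    gamma : discount factor (assumed < 1, so I - gamma*P_pi is invertible)

    Solves v = r_pi + gamma * P_pi @ v, i.e. (I - gamma*P_pi) v = r_pi.
    """
    n = len(r_pi)
    return np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)
```
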
  • Slide 25
  • Improving a policy (using a single switch)
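
In the discounted setting, the single-switch test can be stated as follows (a standard condition, in my notation): an action a at state s is an improving switch with respect to the current values v^π iff

```latex
r(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, v^{\pi}(s') \;>\; v^{\pi}(s).
```
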
  • Slide 26
  • Slide 27
  • Policy iteration for MDPs [Howard 60]
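
A compact sketch of Howard-style Switch-All policy iteration for a discounted MDP. The array layout, the discount factor and the greedy tie-breaking are my choices; this is an illustration, not the talk's exact algorithm:

```python
import numpy as np

def policy_iteration(P, r, gamma=0.9):
    """Switch-All policy iteration for a discounted MDP.

    P : (A, n, n) transition probabilities, P[a, s, s'] = Pr[s' | s, a]
    r : (A, n)    rewards, r[a, s] = reward for playing action a in state s
    Returns an optimal policy (array of actions) and its value vector.
    """
    A, n, _ = P.shape
    pi = np.zeros(n, dtype=int)                          # start from an arbitrary policy
    while True:
        # Evaluate the current policy: solve (I - gamma * P_pi) v = r_pi
        P_pi = P[pi, np.arange(n), :]
        r_pi = r[pi, np.arange(n)]
        v = np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)
        # Switch-All: at every state, switch to the best one-step lookahead action
        q = r + gamma * (P @ v)                          # q[a, s] = play a once, then follow pi
        new_pi = np.argmax(q, axis=0)
        if np.all(q[new_pi, np.arange(n)] <= v + 1e-12): # no improving switch anywhere: optimal
            return pi, v
        pi = new_pi
```
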
  • Slide 28
  • Dual LP formulation for MDPs
  • Slide 29
  • Basic solution ↔ (positional) policy. Action a is not an improving switch
  • Slide 30
  • Primal LP formulation for MDPs. Vertex ↔ complement of a policy
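
For concreteness, here is one standard primal/dual pair of LPs for a discounted MDP (a textbook sketch in my notation; which of the two the talk labels "primal", and the exact vertex/policy correspondence, may differ):

```latex
\text{(values)}\quad
\min \sum_{s} v(s)
\quad\text{s.t.}\quad
v(s) \;\ge\; r(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, v(s')
\quad \forall s,a ;
```

```latex
\text{(fluxes)}\quad
\max \sum_{s,a} r(s,a)\, x(s,a)
\quad\text{s.t.}\quad
\sum_{a} x(s,a) \;-\; \gamma \sum_{s',a} P(s \mid s',a)\, x(s',a) \;=\; 1
\quad \forall s, \qquad x \ge 0 .
```
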
  • Slide 31
  • TB2SG ∈ NP ∩ co-NP. Is TB2SG ∈ P ???
  • Slide 32
  • Policy iteration variants
  • Slide 33
  • Random-Facet for MDPs Choose a random action not in the current policy and ignore it. Solve recursively without this action. If the ignored action is not an improving switch with respect to the returned policy, we are done. Otherwise, switch to the ignored action and solve recursively.
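
A recursive sketch of the rule as described on this slide. The helpers is_improving and switch, and the bookkeeping of the candidate set, are hypothetical stand-ins for the actual MDP machinery and follow one natural reading of the slide:

```python
import random

def random_facet(policy, candidates, is_improving, switch):
    """Sketch of the Random-Facet rule for MDPs.

    policy                  : the current policy (any representation)
    candidates              : frozenset of actions not in the current policy
    is_improving(a, policy) -> is action a an improving switch w.r.t. policy?
    switch(policy, a)       -> the policy obtained by switching to action a
    """
    if not candidates:
        return policy
    a = random.choice(tuple(candidates))       # choose a random action not in the policy...
    rest = candidates - {a}
    best = random_facet(policy, rest, is_improving, switch)   # ...ignore it and solve recursively
    if not is_improving(a, best):               # ignored action is not improving: we are done
        return best
    return random_facet(switch(best, a), rest, is_improving, switch)   # else switch and recurse
```
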
  • Slide 34
  • Policy iteration for 2-player games Keep a strategy of player 1 and an optimal counter-strategy of player 2. Perform improving switches for player 1 and recompute an optimal counter-strategy for player 2. Exercise: Does it really work? Random-Facet yields a sub-exponential algorithm for turn-based 2-player stochastic games!
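
One way to phrase the scheme on this slide as code (a sketch with hypothetical oracles best_response, improving_switches and apply_switches; whether and why it converges is exactly the point of the exercise):

```python
def strategy_iteration(sigma, best_response, improving_switches, apply_switches):
    """Policy iteration for turn-based 2-player games, as sketched on the slide.

    sigma                        : current strategy of player 1
    best_response(sigma)         -> an optimal counter-strategy of player 2 against sigma
    improving_switches(sigma, tau) -> player-1 switches that improve against tau
    apply_switches(sigma, sw)    -> the strategy obtained by performing those switches
    """
    while True:
        tau = best_response(sigma)                  # recompute player 2's optimal reply
        switches = improving_switches(sigma, tau)   # improving switches for player 1
        if not switches:                            # no improvement: (sigma, tau) is optimal
            return sigma, tau
        sigma = apply_switches(sigma, switches)
```
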
  • Slide 35
  • Lower bounds for Policy Iteration: Switch-All for Parity Games is exponential [Friedmann 09]; Switch-All for MDPs is exponential [Fearnley 10]; Random-Facet for Parity Games is sub-exponential [Friedmann-Hansen-Z 11]; Random-Facet and Random-Edge for MDPs, and hence for LPs, are sub-exponential [FHZ 11]
  • Slide 36
  • Lower bound for Random-Facet: implement a randomized counter
  • Slide 37
  • Lower bound for Random-Facet: implement a randomized counter. Lower bound for Random-Edge: implement a standard counter
  • Slide 38
  • Upper bounds for Policy Iteration: Dantzig's pivoting rule, and the standard policy iteration algorithm Switch-All, are polynomial for discounted MDPs with a fixed discount factor [Ye 10]. Switch-All is almost linear for discounted MDPs and discounted turn-based 2-player Stochastic Games with a fixed discount factor [Hansen-Miltersen-Z 11]
  • Slide 39
  • Deterministic algorithms (Switch-Best, Switch-All), discounted vs. non-discounted case: [Ye 10] [Hansen-Miltersen-Z 11] [Friedmann 09] [Fearnley 10] [Condon 93]
  • Slide 40
  • 3-bit counter (figure)
  • Slide 41
  • 3-bit counter at state 010 (figure)
  • Slide 42
  • 3-bit counter at state 010: improving switches. Random-Edge can choose either one of these improving switches
  • Slide 43
  • Cycle gadgets: cycles close one edge at a time; shorter cycles close faster
  • Slide 44
  • Cycle gadgets: cycles open simultaneously
  • Slide 45
  • 3-bit counter at state 010 (figure)
  • Slide 46
  • From b to b+1 in seven phases: the B_k-cycle closes; the C_k-cycle closes; the U-lane realigns; the A_i-cycles and B_i-cycles for i …