Uri Zwick – Tel Aviv Univ. Randomized pivoting rules for the simplex algorithm: Lower bounds


  • Slide 1
  • Slide 2
  • Uri Zwick, Tel Aviv Univ. Randomized pivoting rules for the simplex algorithm: Lower bounds. MDS summer school: The Combinatorics of Linear and Semidefinite Programming, August 14-16, 2012
  • Slide 3
  • Deterministic pivoting rules: Largest improvement; Largest slope; Dantzig's rule (largest modified cost); Bland's rule (avoids cycling); Lexicographic rule (also avoids cycling). All are known to require an exponential number of steps in the worst case: Klee-Minty (1972), Jeroslow (1973), Avis-Chvátal (1978), Goldfarb-Sit (1979), …, Amenta-Ziegler (1996)
  • Slide 4
  • Klee-Minty cubes (1972). Taken from a paper by Gärtner-Henk-Ziegler
  • Slide 5
  • Randomized pivoting rules: Random-Edge – choose a random improving edge (sketched below); Random-Facet – described in the previous lecture. Random-Facet is sub-exponential! [Kalai (1992)] [Matoušek-Sharir-Welzl (1996)] Are Random-Edge and Random-Facet polynomial???
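
Random-Edge is simple enough to state as a minimal sketch. The oracles improving_switches and apply_switch below are hypothetical placeholders, not anything from the talk:

```python
import random

def random_edge(vertex, improving_switches, apply_switch):
    """Follow a uniformly random improving edge until none is left.

    improving_switches(v) -> list of improving edges at vertex v (hypothetical oracle)
    apply_switch(v, e)    -> the neighbouring vertex reached by taking edge e
    """
    while True:
        candidates = improving_switches(vertex)
        if not candidates:            # no improving edge: the current vertex is optimal
            return vertex
        edge = random.choice(candidates)
        vertex = apply_switch(vertex, edge)
```
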
  • Slide 6
  • Abstract objective functions (AOFs). Every face should have a unique sink: Acyclic Unique Sink Orientations (AUSOs)
  • Slide 7
  • USOs and AUSOs of n-cubes: 2n facets, 2^n vertices. The directed diameter is exactly n (Exercise: prove it). Stickney-Watson (1978), Morris (2001), Szabó-Welzl (2001), Gärtner (2002)
  • Slide 8
  • AUSO results: Random-Facet is sub-exponential [Kalai (1992)] [Matoušek-Sharir-Welzl (1996)]; sub-exponential lower bound for Random-Facet [Matoušek (1994)]; sub-exponential lower bound for Random-Edge [Matoušek-Szabó (2006)]. These lower bounds do not correspond to actual linear programs. Can geometry help?
  • Slide 9
  • Random-Edge and Random-Facet are not polynomial for LPs. Consider LPs that correspond to Markov Decision Processes (MDPs), where Simplex corresponds to Policy iteration. Obtain sub-exponential lower bounds for the Random-Edge and Random-Facet variants of the Policy Iteration algorithm for MDPs
  • Slide 10
  • Randomized pivoting rules (Random-Edge, Random-Facet): upper bounds [Kalai 92] [Matousek-Sharir-Welzl 92]; lower bounds [Friedmann-Hansen-Z 11]. The lower bounds are obtained for LPs whose diameter is n
  • Slide 11
  • 3-bit counter
  • Slide 12
  • Turn-based 2-Player Stochastic Games [Shapley 53] [Gillette 57] [Condon 92]: limiting average, discounted and total reward versions (sketched below). Both players have optimal positional strategies. Can optimal strategies be found in polynomial time?
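
For reference, the three objectives in their standard form (a sketch in my notation; the slides' exact formulas may differ). With rewards r_0, r_1, ... along the play and discount factor 0 < γ < 1:

```latex
\text{limiting average:}\quad \liminf_{N\to\infty} \frac{1}{N}\sum_{t=0}^{N-1} r_t,
\qquad
\text{discounted:}\quad \sum_{t=0}^{\infty} \gamma^{t} r_t,
\qquad
\text{total reward:}\quad \sum_{t=0}^{\infty} r_t .
```
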
  • Slide 13
  • Stopping condition. For the total reward version, assume: no matter what the players do, the game stops with probability 1. Exercise: show that discounted games correspond directly to stopping total reward games (a hint follows below)
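
A hint for the exercise (the standard argument, stated in my notation, not spelled out on the slide): a discount factor γ can be simulated by adding, from every state, a move to an absorbing zero-reward state taken with probability 1−γ. The stopping time T is then geometric, and for any fixed run with rewards r_0, r_1, ...

```latex
\mathbb{E}\!\left[\sum_{t=0}^{T-1} r_t\right]
  \;=\; \sum_{t=0}^{\infty} \Pr[T > t]\, r_t
  \;=\; \sum_{t=0}^{\infty} \gamma^{t} r_t .
```
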
  • Slide 14
  • Strategies / Policies. A deterministic strategy specifies which action to take given every possible history. A mixed strategy is a probability distribution over deterministic strategies. A memoryless strategy is a strategy that depends only on the current state. A positional strategy is a deterministic memoryless strategy
  • Slide 15
  • Values. Both players have positional optimal strategies: the value is already attained by positional strategies, not only by general ones, and there are positional strategies that are optimal for every starting position
  • Slide 16
  • Markov Decision Processes [Shapley 53] [Bellman 57] [Howard 60]: limiting average, discounted and total reward versions. Optimal positional policies can be found using LP. Is there a strongly polynomial time algorithm?
  • Slide 17
  • Stochastic shortest paths (SSPs) Minimize the expected cost of getting to the target
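
The usual optimality equations for SSPs (a standard textbook formulation, included as a sketch; the notation is mine, not the slides'):

```latex
v(s) \;=\; \min_{a \in A(s)} \Big( c(s,a) + \sum_{s'} P(s' \mid s,a)\, v(s') \Big),
\qquad v(\text{target}) = 0 .
```
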
  • Slide 18
  • Turn-based non-Stochastic Games [Ehrenfeucht-Mycielski (1979)]: limiting average, discounted and total reward versions. Both players have optimal positional strategies. Still no polynomial time algorithms known! (Easy)
  • Slide 19
  • Turn-based Stochastic Games (SGs): 2 players, stochastic, adversarial (long-term planning in a stochastic and adversarial environment). Non-Stochastic Games (MPGs): 2 players, adversarial, non-stochastic. Markov Decision Processes (MDPs): 1 player, stochastic, non-adversarial. Deterministic MDPs (DMDPs): 1 player, non-stochastic, non-adversarial
  • Slide 20
  • Parity Games (PGs): a simple example (figure showing vertex priorities). EVEN wins if the largest priority seen infinitely often is even
  • Slide 21
  • Parity Games (PGs). EVEN wins if the largest priority seen infinitely often is even. Equivalent to many interesting problems in automata and verification: non-emptiness of ω-tree automata, modal μ-calculus model checking
  • Slide 22
  • Parity Games (PGs) to Mean Payoff Games (MPGs): replace priority k by payoff (−n)^k and move the payoffs to the outgoing edges [Stirling (1993)] [Puri (1995)] (a sanity check follows below)
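
A quick sanity check for this choice of payoffs (my reasoning, not spelled out on the slide): on a simple cycle with at most n edges, an edge carrying the largest priority k dominates all the others, since their total absolute payoff is at most

```latex
(n-1)\, n^{\,k-1} \;<\; n^{k},
```

so the sign of the cycle's mean payoff is (−1)^k, i.e. it matches the parity of the largest priority seen infinitely often on that cycle.
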
  • Slide 23
  • Let's focus on MDPs
  • Slide 24
  • Evaluating a policy: MDP + policy = Markov chain. The values of a fixed policy can be found by solving a system of linear equations (see the sketch below)
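
A minimal NumPy sketch of this step, assuming the discounted setting (the names and the discount factor are my choices, not the slides'): given the transition matrix and reward vector induced by a fixed policy, the values solve one linear system.

```python
import numpy as np

def evaluate_policy(P_pi, r_pi, gamma=0.9):
    """Values of a fixed policy in a discounted MDP.

    P_pi  : (n, n) transition matrix of the induced Markov chain
    r_pi  : (n,)   expected one-step reward under the policy
    gamma : discount factor (assumed < 1, so I - gamma*P_pi is invertible)

    Solves v = r_pi + gamma * P_pi @ v, i.e. (I - gamma*P_pi) v = r_pi.
    """
    n = len(r_pi)
    return np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)
```
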
  • Slide 25
  • Improving a policy (using a single switch)
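
In the discounted setting, the single-switch test can be stated as follows (a standard condition, in my notation): an action a at state s is an improving switch with respect to the current values v^π iff

```latex
r(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, v^{\pi}(s') \;>\; v^{\pi}(s).
```
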
  • Slide 26
  • Slide 27
  • Policy iteration for MDPs [Howard 60]
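
A compact sketch of Howard-style Switch-All policy iteration for a discounted MDP. The array layout, the discount factor and the greedy tie-breaking are my choices; this is an illustration, not the talk's exact algorithm:

```python
import numpy as np

def policy_iteration(P, r, gamma=0.9):
    """Switch-All policy iteration for a discounted MDP.

    P : (A, n, n) transition probabilities, P[a, s, s'] = Pr[s' | s, a]
    r : (A, n)    rewards, r[a, s] = reward for playing action a in state s
    Returns an optimal policy (array of actions) and its value vector.
    """
    A, n, _ = P.shape
    pi = np.zeros(n, dtype=int)                          # start from an arbitrary policy
    while True:
        # Evaluate the current policy: solve (I - gamma * P_pi) v = r_pi
        P_pi = P[pi, np.arange(n), :]
        r_pi = r[pi, np.arange(n)]
        v = np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)
        # Switch-All: at every state, switch to the best one-step lookahead action
        q = r + gamma * (P @ v)                          # q[a, s] = play a once, then follow pi
        new_pi = np.argmax(q, axis=0)
        if np.all(q[new_pi, np.arange(n)] <= v + 1e-12): # no improving switch anywhere: optimal
            return pi, v
        pi = new_pi
```
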
  • Slide 28
  • Dual LP formulation for MDPs
  • Slide 29
  • Basic solution ↔ (positional) policy. Action a is not an improving switch
  • Slide 30
  • Primal LP formulation for MDPs. Vertex ↔ complement of a policy
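
For concreteness, here is one standard primal/dual pair of LPs for a discounted MDP (a textbook sketch in my notation; which of the two the talk labels "primal", and the exact vertex/policy correspondence, may differ):

```latex
\text{(values)}\quad
\min \sum_{s} v(s)
\quad\text{s.t.}\quad
v(s) \;\ge\; r(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, v(s')
\quad \forall s,a ;
```

```latex
\text{(fluxes)}\quad
\max \sum_{s,a} r(s,a)\, x(s,a)
\quad\text{s.t.}\quad
\sum_{a} x(s,a) \;-\; \gamma \sum_{s',a} P(s \mid s',a)\, x(s',a) \;=\; 1
\quad \forall s, \qquad x \ge 0 .
```
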
  • Slide 31
  • TB2SG ∈ NP ∩ co-NP. Is TB2SG ∈ P ???
  • Slide 32
  • Policy iteration variants
  • Slide 33
  • Random-Facet for MDPs Choose a random action not in the current policy and ignore it. Solve recursively without this action. If the ignored action is not an improving switch with respect to the returned policy, we are done. Otherwise, switch to the ignored action and solve recursively.
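
A recursive sketch of the rule as described on this slide. The helpers is_improving and switch, and the bookkeeping of the candidate set, are hypothetical stand-ins for the actual MDP machinery and follow one natural reading of the slide:

```python
import random

def random_facet(policy, candidates, is_improving, switch):
    """Sketch of the Random-Facet rule for MDPs.

    policy                  : the current policy (any representation)
    candidates              : frozenset of actions not in the current policy
    is_improving(a, policy) -> is action a an improving switch w.r.t. policy?
    switch(policy, a)       -> the policy obtained by switching to action a
    """
    if not candidates:
        return policy
    a = random.choice(tuple(candidates))       # choose a random action not in the policy...
    rest = candidates - {a}
    best = random_facet(policy, rest, is_improving, switch)   # ...ignore it and solve recursively
    if not is_improving(a, best):               # ignored action is not improving: we are done
        return best
    return random_facet(switch(best, a), rest, is_improving, switch)   # else switch and recurse
```
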
  • Slide 34
  • Policy iteration for 2-player games Keep a strategy of player 1 and an optimal counter-strategy of player 2. Perform improving switches for player 1 and recompute an optimal counter-strategy for player 2. Exercise: Does it really work? Random-Facet yields a sub-exponential algorithm for turn-based 2-player stochastic games!
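
One way to phrase the scheme on this slide as code (a sketch with hypothetical oracles best_response, improving_switches and apply_switches; whether and why it converges is exactly the point of the exercise):

```python
def strategy_iteration(sigma, best_response, improving_switches, apply_switches):
    """Policy iteration for turn-based 2-player games, as sketched on the slide.

    sigma                        : current strategy of player 1
    best_response(sigma)         -> an optimal counter-strategy of player 2 against sigma
    improving_switches(sigma, tau) -> player-1 switches that improve against tau
    apply_switches(sigma, sw)    -> the strategy obtained by performing those switches
    """
    while True:
        tau = best_response(sigma)                  # recompute player 2's optimal reply
        switches = improving_switches(sigma, tau)   # improving switches for player 1
        if not switches:                            # no improvement: (sigma, tau) is optimal
            return sigma, tau
        sigma = apply_switches(sigma, switches)
```
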
  • Slide 35
  • Lower bounds for Policy Iteration: Switch-All for Parity Games is exponential [Friedmann 09]; Switch-All for MDPs is exponential [Fearnley 10]; Random-Facet for Parity Games is sub-exponential [Friedmann-Hansen-Z 11]; Random-Facet and Random-Edge for MDPs, and hence for LPs, are sub-exponential [FHZ 11]
  • Slide 36
  • Lower bound for Random-Facet: implement a randomized counter
  • Slide 37
  • Lower bound for Random-Facet: implement a randomized counter. Lower bound for Random-Edge: implement a standard counter
  • Slide 38
  • Upper bounds for Policy Iteration: Dantzig's pivoting rule, and the standard policy iteration algorithm Switch-All, are polynomial for discounted MDPs with a fixed discount factor [Ye 10]. Switch-All is almost linear for discounted MDPs and discounted turn-based 2-player Stochastic Games with a fixed discount factor [Hansen-Miltersen-Z 11]
  • Slide 39
  • Deterministic algorithms (Switch-Best, Switch-All), discounted vs. non-discounted case: [Ye 10] [Hansen-Miltersen-Z 11] [Friedmann 09] [Fearnley 10] [Condon 93]
  • Slide 40
  • 3-bit counter (figure)
  • Slide 41
  • 3-bit counter at state 010 (figure)
  • Slide 42
  • 3-bit counter at state 010: improving switches. Random-Edge can choose either one of these improving switches
  • Slide 43
  • Cycle gadgets: cycles close one edge at a time; shorter cycles close faster
  • Slide 44
  • Cycle gadgets: cycles open simultaneously
  • Slide 45
  • 3-bit counter at state 010 (figure)
  • Slide 46
  • From b to b+1 in seven phases: the B_k-cycle closes; the C_k-cycle closes; the U-lane realigns; the A_i-cycles and B_i-cycles for i …