Combinatorial Problems I: Finding Solutions

Ashish Sabharwal

Cornell University

March 3, 2008

2nd Asian-Pacific School on Statistical Physics and Interdisciplinary Applications KITPC/ITP-CAS, Beijing, China

[Diagram: cross-fertilization of ideas for the study and design of Intelligent Systems, connecting Computer Science, Mathematics, Operations Research, Physics, Cognitive Science, Economics, and Engineering; "Phase transition" appears as a label in the diagram]

Research part of Cornell’s Intelligent Information Systems Institute (IISI). Director: Carla Gomes

Combinatorial Problems

Examples

• Routing: Given a partially connected network on N nodes, find the shortest path between X and Y

• Traveling Salesperson Problem (TSP): Given a partially connected network on N nodes, find a path that visits every node of the network exactly once [much harder!!]

• Scheduling: Given N tasks with earliest start times, completion deadlines, and set of M machines on which they can execute, schedule them so that they all finish by their deadlines

Problem Instance, Algorithm

• Problem instance: a specific instantiation of the problem

• E.g. the slide shows three instances of the routing problem with N=8 nodes

• Algorithm: a sequence of steps, a “recipe”

• Objective: a single, generic algorithm for the problem that can solve any instance of that problem

Measuring the Effectiveness of Algorithms

• Capture scaling with input size N, rather than runtime on specific instances

• The most common notion in Computer Science is worst-case complexity: What is the longest time (or number of steps) the algorithm might take on any input of size N?

Perhaps only N steps, or 100 N + 5 N: linear time, O(N)

Maybe N^2 steps, or N^2 + 4 N + 6: quadratic, O(N^2)

Maybe N^3 + 1000 log N: cubic, O(N^3)

… … …

Maybe 2^N, or 2^N + N^1000: exponential, O(2^N)
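To make the gap between these growth rates concrete, here is a minimal Python sketch that tabulates the dominant terms for a few input sizes. The function name `step_counts` and the chosen sizes are illustrative assumptions, not part of the lecture.

```python
def step_counts(n):
    """Return (N, N^2, N^3, 2^N) for input size n."""
    return (n, n ** 2, n ** 3, 2 ** n)


if __name__ == "__main__":
    # Print one row per input size; the last column outgrows the others quickly.
    for n in (10, 20, 40, 80):
        linear, quadratic, cubic, exponential = step_counts(n)
        print(f"N={linear:3d}  N^2={quadratic:6d}  N^3={cubic:8d}  2^N={exponential}")
```

Lower-order terms such as 4 N + 6 or N^1000 are left out on purpose: for large enough N only the dominant term decides the complexity class.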

Polynomial vs. Exponential Complexity

[Plot: exponential versus polynomial growth curves]

Polynomial time: “tractable”, can hope to solve very large problems with enough computing power

E.g. known routing / shortest path algorithms [O(N^3)]

Exponential time: quickly run into scalability issues as N increases

E.g. best known algorithms for TSP

Are some problems inherently harder than others?

A large amount of work on answering this question: computational complexity theory

Computational Complexity Hierarchy

From Easy to Hard:

• P. In P: sorting, shortest path, … P-complete: circuit-value, …
• NP. NP-complete: SAT, scheduling, graph coloring, puzzles, …
• PH
• P^#P. #P-complete/hard: #SAT, sampling, probabilistic inference, …
• PSPACE. PSPACE-complete: QBF, adversarial planning, chess (bounded), …
• EXP. EXP-complete: games like Go, …

Note: widely believed hierarchy; know P≠EXP for sure

NP-Completeness

• P : class of problems for which a solution can be found in poly time

e.g. can find a shortest path in poly time

• NP: class of problems for which a solution can be verified in poly time

e.g. can’t find a TSP solution in poly time (as far as we know) but, given a candidate solution (a “witness”), can verify the correctness of the witness in poly time

“N”: non-deterministic, with the power of “guessing”
“P”: polynomial time

• NP-complete: the “hardest” problems within NP
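The "verify in poly time" idea can be sketched for the TSP variant described earlier (a path that visits every node exactly once). This is only an illustration: the function name `is_valid_tour`, the node numbering 0..N-1, and the undirected edge-list input format are assumptions made for the example.

```python
def is_valid_tour(num_nodes, edges, path):
    """Check a candidate witness in polynomial time.

    The path must visit every node 0..num_nodes-1 exactly once and may only
    move along edges of the (undirected) network.
    """
    if sorted(path) != list(range(num_nodes)):
        return False  # some node is missing or repeated
    edge_set = {frozenset(edge) for edge in edges}
    # Every consecutive pair of nodes on the path must be joined by an edge.
    return all(frozenset((u, v)) in edge_set for u, v in zip(path, path[1:]))
```

Checking a witness takes one pass over the path, whereas no polynomial-time method is known for finding such a path in the first place.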

NP-Completeness

One of the biggest discoveries in Computer Science:

All NP-complete problems are equally hard! [worst-case complexity]

• An algorithm for any one NP-complete problem can be used to solve any other NP-complete problem with only a polynomial overhead!

• There are catalogues of 10,000’s of such problems

e.g. “Boolean satisfiability” or SAT, TSP, scheduling, (bounded) planning, chip verification, 0-1 integer programming, graph coloring, logical inference, …

[Similarly for PSPACE-complete, #P-complete, etc.]

Can one design a single algorithm that can efficiently solve thousands of different problems of interest?

The Quest for Machine Reasoning

A cornerstone of Artificial Intelligence

Objective: Develop foundations and technology to enable effective, practical, large-scale automated reasoning.

Machine Reasoning (1960-90s): computational complexity of reasoning appears to severely limit real-world applications

Current reasoning technology: revisiting the challenge. Significant progress with new ideas / tools for dealing with complexity (scale-up), uncertainty, and multi-agent reasoning

General Automated Reasoning

[Diagram: a domain-specific problem instance (e.g. logistics, chess, planning, scheduling, ...) goes to a Model Generator (Encoder), whose output goes to a generic General Inference Engine, applicable to all domains within range of the modeling language, which produces a Solution]

Research objective: better reasoning and modeling technology

Impact: faster solutions in several domains

Reasoning Complexity

• EXPONENTIAL COMPLEXITY: inherent A^N worst case (N = no. of variables/objects, A = no. of object states)

• TIME/SPACE granularity; object states

• Current implementations trade time with soundness

Simple Example: Knowledge Base

Variables (binary):
X1 = email_received
X2 = in_meeting
X3 = urgent
X4 = respond_to_email
X5 = near_deadline
X6 = postpone
X7 = air_ticket_info_request
X8 = travel_request
X9 = info_request

Rules:
1. X1 & (not X2) & X3 ⇒ X4
2. X2 ⇒ not X4
3. X5 ⇒ X3 or X6
4. X7 ⇒ X8
5. X8 ⇒ X9
6. X8 ⇒ X5
7. X6 ⇒ not X9

Question: Given X1 = true, X2 = false, X7 = true, what is X4?

Answer Development: Inference Chain (at each step: search for rules to apply, check contradictions)

Step 1: X7 ⇒ X8 (rule 4)
Step 2: X8 ⇒ X5 (rule 6)
Step 3: X5 ⇒ X3 or X6 (rule 3) [branch point M]

Case A: X6 = true
Step 4: X6 ⇒ not X9 (rule 7)
Step 5: not X9 ⇒ not X8 (contrapositive of rule 5)
Step 6: Contradiction with X8 from Step 1. Backtrack to M

Case B: X3 = true
Step 7: X1 & (not X2) & X3 ⇒ X4, so X4 = true (rule 1)

For N variables: 2^N cases drive complexity!
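The inference chain can be cross-checked by brute force: enumerate all 2^9 assignments, keep those consistent with the seven rules and the given facts, and see whether X4 takes the same value in all of them. The sketch below is an illustration under assumptions, not the lecture's method: the names `RULES`, `consistent_models`, and `entailed_value` are hypothetical, and each rule "A ⇒ B" is read as "(not A) or B".

```python
from itertools import product

# Rules 1-7 of the knowledge base; x maps a variable index (1..9) to True/False.
RULES = [
    lambda x: not (x[1] and not x[2] and x[3]) or x[4],  # 1. X1 & (not X2) & X3 => X4
    lambda x: not x[2] or not x[4],                      # 2. X2 => not X4
    lambda x: not x[5] or x[3] or x[6],                  # 3. X5 => X3 or X6
    lambda x: not x[7] or x[8],                          # 4. X7 => X8
    lambda x: not x[8] or x[9],                          # 5. X8 => X9
    lambda x: not x[8] or x[5],                          # 6. X8 => X5
    lambda x: not x[6] or not x[9],                      # 7. X6 => not X9
]


def consistent_models(given):
    """Return every assignment that agrees with `given` and satisfies all rules."""
    models = []
    for values in product([False, True], repeat=9):
        x = dict(zip(range(1, 10), values))
        agrees = all(x[var] == val for var, val in given.items())
        if agrees and all(rule(x) for rule in RULES):
            models.append(x)
    return models


def entailed_value(given, var):
    """Return the value of `var` if it is the same in every model, else None."""
    values = {model[var] for model in consistent_models(given)}
    return values.pop() if len(values) == 1 else None


if __name__ == "__main__":
    facts = {1: True, 2: False, 7: True}
    print(entailed_value(facts, 4))
```

With only nine variables the enumeration is instant; the 2^N growth is what makes the same approach hopeless for large knowledge bases.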

Exponential Complexity Growth: The Challenge of Complex Domains

[Chart: case complexity (exponential) plotted against the number of variables and the number of rules (constraints)]

Domain: variables / rules (constraints) / case complexity
• Car repair diagnosis: 100 / 200 / 10^30
• Deep space mission control: 10K / 50K / 10^3010
• Chess (20 steps deep): 20K / 100K / 10^6020
• Military Logistics: 100K / 450K
• VLSI Verification: 0.5M / 1M / 10^150,500
• War Gaming: 1M / 5M / 10^301,020

Reference points marked on the complexity axis: no. of atoms on the earth (10^47), seconds until heat death of sun, protein folding calculation (petaflop-year)

Note: rough estimates, for propositional reasoning

[Credit: Kumar, DARPA; Cited in Computer World magazine]

Progress in Last 15 Years

Focus: Combinatorial Search Spaces

Specifically, the Boolean satisfiability problem, SAT

Significant progress since the 1990’s.

How much?

• Problem size: We went from 100 variables, 200 constraints (early 90’s) to 1,000,000 variables and 5,000,000 constraints in 15 years.

Search space: from 2^100 ≈ 10^30 to 2^1,000,000 ≈ 10^300,000. [Aside: “one can encode quite a bit in 1M variables.”]

• Tools: 50+ competitive SAT solvers available

Overview of the state of the art: Plenary talk at IJCAI-05 (Selman); Discrete App. Math. article (Kautz-Selman ’06)

How Large are the Problems?

A bounded model checking problem:

SAT Encoding (automatically generated from problem specification)

i.e., ((not x1) or x7) and ((not x1) or x6) and … etc.

x1, x2, x3, etc. are our Boolean variables (to be set to True or False)

Should x1 be set to False??

10 Pages Later:

…

i.e., (x177 or x169 or x161 or x153 … x33 or x25 or x17 or x9 or x1 or (not x185))

clauses / constraints are getting more interesting…

Note x1 …

4,000 Pages Later:

Finally, 15,000 Pages Later:

Search space of truth assignments:

Current SAT solvers solve this instance in under 30 seconds!

SAT Solver Progress

Instance           Posit '94   Grasp '96   Sato '98   Chaff '01
ssa2670-136        40.66s      1.20s       0.95s      0.02s
bf1355-638         1805.21s    0.11s       0.04s      0.01s
pret150_25         >3000s      0.21s       0.09s      0.01s
dubois100          >3000s      11.85s      0.08s      0.01s
aim200-2_0-no-1    >3000s      0.01s       <0.01s     <0.01s
2dlx_..._bug005    >3000s      >3000s      >3000s     2.90s
c6288              >3000s      >3000s      >3000s     >3000s

Source: Marques-Silva 2002

Solvers have continually improved over time

How do SAT Solvers Keep Improving?

From academically interesting to practically relevant.

We now have regular SAT solver competitions.

(Germany ’89, DIMACS ’93, China ’96, SAT-02, SAT-03, …, SAT-07)

E.g. at SAT-2006 (Seattle, Aug ’06):

• 35+ solvers submitted, most of them open source

• 500+ industrial benchmarks

• 50,000+ benchmark instances available on the www

This constant improvement in SAT solvers is the key to making, e.g., SAT-based planning very successful.

Current Automated Reasoning Tools

Most-successful fully automated methods: based on Boolean Satisfiability (SAT) / Propositional Reasoning

– Problems modeled as rules / constraints over Boolean variables
– “SAT solver” used as the inference engine

Applications: single-agent search

• AI planning: SATPLAN-06, fastest optimal planner; ICAPS-06 competition (Kautz & Selman ’06)

• Verification – hardware and software: Major groups at Intel, IBM, Microsoft, and universities such as CMU, Cornell, and Princeton. SAT has become the dominant technology.

• Many other domains: Test pattern generation, Scheduling, Optimal Control, Protocol Design, Routers, Multi-agent systems, E-Commerce (E-auctions and electronic trading agents), etc.

Recall: General Automated Reasoning

[Diagram, as before: domain-specific problem instance → Model Generator (Encoder) → generic General Inference Engine → Solution]

Automated Reasoning with SAT

• A simple but useful modeling language: Boolean formulas

• Corresponding inference engine: Satisfiability or SAT algorithm (e.g. complete search, local search, message passing)

• Numerous applications: hardware and software verification, planning, scheduling, e-commerce, circuit design, open problems in algebra, …

Boolean Logic

Defined over Boolean (binary) variables a, b, c, …

Each of these can be True (1, T) or False (0, F)

Variables connected together with logic operators: and, or, not (denoted ∧, ∨, ¬)

E.g. ((c ∧ ¬d) ∨ f) is True iff either c is True and d is False, or f is True

Fact: All other Boolean logic operators can be expressed with and, or, not. E.g. (a ⇒ b) is the same as ((not a) or b)

Boolean formula, e.g. F = (a or b) and not (a and (b or c))

(Truth) Assignment: any setting of the variables to True or False

Satisfying assignment: assignment where the formula evaluates to True

E.g. F has 3 satisfying assignments: (0,1,0), (0,1,1), (1,0,0)

Boolean Logic: Example

F = (a or b) and not (a and (b or c))

Note: True often written as 1, False as 0

• There are 2^3 = 8 possible truth assignments to a, b, c
– (a=0,b=1,c=0) representing (a=False, b=True, c=False)
– (a=0,b=0,c=1)
– …

Truth Table for F

a b c | F
0 0 0 | 0
0 0 1 | 0
0 1 0 | 1
0 1 1 | 1
1 0 0 | 1
1 0 1 | 0
1 1 0 | 0
1 1 1 | 0

• Exactly 3 truth assignments satisfy F
– (a=0,b=1,c=0)
– (a=0,b=1,c=1)
– (a=1,b=0,c=0)
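The truth table can be regenerated mechanically. The sketch below assumes the reading F = (a or b) and not (a and (b or c)), which is the reading consistent with the table; the helper name `satisfying_assignments` is illustrative.

```python
from itertools import product


def F(a, b, c):
    """F = (a or b) and not (a and (b or c))."""
    return (a or b) and not (a and (b or c))


def satisfying_assignments(formula, num_vars):
    """Enumerate all 2^num_vars assignments (as 0/1 tuples) that make formula true."""
    return [bits for bits in product([0, 1], repeat=num_vars) if formula(*bits)]


if __name__ == "__main__":
    # Print the full truth table, one row per assignment.
    for a, b, c in product([0, 1], repeat=3):
        print(a, b, c, int(bool(F(a, b, c))))
```

This is exactly the truth-table method: fine for 3 variables, hopeless once the number of variables grows.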

Boolean Logic: Expressivity

All discrete single-agent search problems can be cast as a Boolean formula

Variables a, b, c, … often represent “states” of the system, “events”, “actions”, etc. (more on this later, using Planning as an example)

Very general encoding language. E.g. can handle

• Numbers (k-bit binary representation)

• Floating-point numbers

• Arithmetic operators like +, x, exp(), log()

• …

SAT encodings (generated automatically from high level languages) routinely used in domains like planning, scheduling, verification, e-commerce, network design, …

Recall Example (the slide annotates individual variables as a “state”, an “action”, or an “event”, and a rule as a constraint):

Variables:
X1 = email_received
X2 = in_meeting
X3 = urgent
X4 = respond_to_email
X5 = near_deadline
X6 = postpone
X7 = air_ticket_info_request
X8 = travel_request
X9 = info_request

Rules:
1. X1 & (not X2) & X3 ⇒ X4
2. X2 ⇒ not X4
3. X5 ⇒ X3 or X6
4. X7 ⇒ X8
5. X8 ⇒ X9
6. X8 ⇒ X5
7. X6 ⇒ not X9

Boolean Logic: Standard Representations

Each problem constraint typically specified as (a set of) clauses (only “or”, “not”):

E.g. (a or b), (c or d or f), (a or c or d), …

Formula in conjunctive normal form, or CNF: a conjunction of clauses

E.g. F = (a or b) and not (a and (b or c)) changes to

F_CNF = (a or b) and (not a or not b) and (not a or not c)

Alternative [useful for QBF]: specify each constraint as a term (only “and”, “not”):

E.g. (a and d), (b and a and f), (b and d and e), …

Formula in disjunctive normal form, or DNF: a disjunction of terms

E.g. F_DNF = (not a and b) or (a and not b and not c)
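A quick way to confirm that a CNF or DNF rewrite preserves meaning is to compare the formulas on every assignment. The sketch below assumes the negations are placed so that all three forms agree with the truth table of F, i.e. F = (a or b) and not (a and (b or c)); the function names are illustrative.

```python
from itertools import product


def f(a, b, c):
    """Original formula: (a or b) and not (a and (b or c))."""
    return (a or b) and not (a and (b or c))


def f_cnf(a, b, c):
    """Conjunction of clauses: (a or b), (not a or not b), (not a or not c)."""
    return (a or b) and (not a or not b) and (not a or not c)


def f_dnf(a, b, c):
    """Disjunction of terms: (not a and b), (a and not b and not c)."""
    return (not a and b) or (a and not b and not c)


def equivalent(g, h, num_vars=3):
    """True if g and h agree on all 2^num_vars truth assignments."""
    return all(bool(g(*v)) == bool(h(*v))
               for v in product([False, True], repeat=num_vars))
```

Each DNF term corresponds directly to a block of satisfying rows of the truth table, while each CNF clause rules out a block of falsifying rows.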

Boolean Satisfiability Testing

• A wide range of applications
• Relatively easy to test for small formulas (e.g. with a Truth Table)
• However, very quickly becomes hard to solve
– Search space grows exponentially with formula size (more on this next)

SAT technology has been very successful in taming this exponential blow up!

The Boolean Satisfiability Problem, or SAT:

Given a Boolean formula F,

• find a satisfying assignment for F

• or prove that no such assignment exists.

SAT Search Space

SAT Problem: Find a path to a True leaf node.

For N Boolean variables, the raw search space is of size 2^N

• Grows very quickly with N
• Brute-force exhaustive search unrealistic without efficient heuristics, etc.

[Diagram: binary search tree. Root: all vars free; each level fixes one more variable (a 1st, another, a 3rd, a 4th) to True or False; leaves are labeled True or False]

k-CNF, 3-CNF

k-CNF: all clauses have k literals

1-CNF SAT: trivial

2-CNF SAT: solvable in O(N^2) time [N = num. of variables]

3-CNF SAT: NP-complete

4-CNF SAT: NP-complete

…

Note: Any Boolean formula can be converted into CNF, with or without extra variables; introducing extra variables keeps the CNF size linear in the size of the original formula.
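The role of extra variables in CNF conversion can be seen on a DNF input such as (x1 and x2) or (x3 and x4) or …. The sketch below is an illustration, not taken from the slides: it contrasts plain distribution with a Tseitin-style encoding that introduces one fresh variable per term. Literals are nonzero integers (+v for v, -v for not v), and the function names and the `first_fresh` parameter are assumptions of the example.

```python
from itertools import product


def cnf_by_distribution(terms):
    """DNF -> CNF without extra variables.

    One clause per way of picking one literal from every term, so n terms of
    two literals each produce 2^n clauses.
    """
    return [list(choice) for choice in product(*terms)]


def cnf_with_extra_variables(terms, first_fresh):
    """DNF -> CNF with one fresh variable z per term (Tseitin-style).

    Encodes z <=> (all literals of the term) and then requires some z to hold.
    """
    clauses, fresh = [], []
    for offset, term in enumerate(terms):
        z = first_fresh + offset
        fresh.append(z)
        clauses.extend([-z, lit] for lit in term)      # z => each literal of the term
        clauses.append([z] + [-lit for lit in term])   # all literals of the term => z
    clauses.append(fresh)                              # at least one term holds
    return clauses


if __name__ == "__main__":
    for n in (4, 10):
        terms = [[2 * i + 1, 2 * i + 2] for i in range(n)]
        print(n, len(cnf_by_distribution(terms)),
              len(cnf_with_extra_variables(terms, first_fresh=2 * n + 1)))
```

The second encoding is satisfiable exactly when the original formula is, at the price of n additional variables.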

Worst-Case Complexity

SAT is an NP-complete problem

• Worst-case believed to be exponential (roughly 2^N for N variables)

• 10,000+ problems in CS are NP-complete (e.g. planning, scheduling, protein folding, reasoning)

• P vs. NP --- $1M Clay Prize

However, real-world instances are usually not pathological and can often be solved very quickly with the latest technology!

Typical-case complexity provides a more detailed understanding and a more positive picture.

Exponential Complexity Growth

Planning (single-agent): find the right sequence of actions

HARD: 10 actions, 10! ≈ 3.6 x 10^6 possible plans

Contingency planning (multi-agent): actions may or may not produce the desired effect!

REALLY HARD: 10 x 9^2 x 8^4 x 7^8 x … x 2^256 ≈ 10^224 possible contingency plans!

[Diagram: branching tree of choices labeled “1 out of 10”, “2 out of 9”, “4 out of 8”, …, with exponential and polynomial growth curves for comparison]
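Both counts are easy to reproduce with exact integer arithmetic. The sketch below assumes the product continues the visible pattern, with the base dropping by one and the exponent doubling at every step (10^1, 9^2, 8^4, …, 2^256); the function names are illustrative.

```python
import math


def sequential_plans(num_actions=10):
    """Number of orderings of the actions: num_actions!"""
    return math.factorial(num_actions)


def contingency_plans(num_actions=10):
    """10 x 9^2 x 8^4 x ... x 2^256: the exponent doubles at each step."""
    total = 1
    for step, choices in enumerate(range(num_actions, 1, -1)):
        total *= choices ** (2 ** step)
    return total


if __name__ == "__main__":
    print(sequential_plans())
    # Number of decimal digits of the contingency-plan count.
    print(len(str(contingency_plans())))
```

A 225-digit result corresponds to a value between 10^224 and 10^225, in line with the estimate quoted above.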

Typical-Case Complexity

A key hardness parameter for k-SAT: the ratio of clauses to variables

[Diagram: adding constraints raises the ratio; deleting constraints lowers it]

Problems that are not critically constrained tend to be much easier in practice than the relatively few critically constrained ones

[Mitchell, Selman, and Levesque ’92; Kirkpatrick and Selman – Science ’94]
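The effect of the clause-to-variable ratio can be explored on small random 3-SAT instances. The following is a rough sketch under stated assumptions: the generator picks 3 distinct variables per clause and negates each with probability 1/2, satisfiability is decided by brute force (so N must stay small), and all function names and default parameters are illustrative.

```python
import random
from itertools import product


def random_3sat(num_vars, num_clauses, rng):
    """Random 3-SAT: literals are nonzero ints (+v means v True, -v means v False)."""
    clauses = []
    for _ in range(num_clauses):
        variables = rng.sample(range(1, num_vars + 1), 3)
        clauses.append([v if rng.random() < 0.5 else -v for v in variables])
    return clauses


def is_satisfiable(clauses, num_vars):
    """Brute force over all 2^N assignments; only feasible for small N."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True
    return False


def fraction_satisfiable(num_vars, ratio, trials, seed=0):
    """Fraction of random instances at the given clause/variable ratio that are SAT."""
    rng = random.Random(seed)
    num_clauses = round(ratio * num_vars)
    hits = sum(is_satisfiable(random_3sat(num_vars, num_clauses, rng), num_vars)
               for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for ratio in (1.0, 3.0, 4.0, 5.0, 8.0):
        print(ratio, fraction_satisfiable(num_vars=10, ratio=ratio, trials=20))
```

Under-constrained instances (low ratio) are almost always satisfiable and over-constrained ones (high ratio) almost never; with so few variables the change is gradual rather than a sharp threshold.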

Typical-Case Complexity

[Figure: Random 3-SAT as of 2004, marking the phase transition and the regions handled by each method: linear time algs., Random Walk, DP, DP’, GSAT, Walksat, SP]

SAT solvers are getting steadily closer to tackling problems in the hardest region!

SP (survey propagation) now handles 1,000,000 variables very near the phase transition region

Tractable Sub-Structure Can Dominate and Drastically Reduce Solution Cost!

2+p-SAT model: mix 2-SAT (tractable) and 3-SAT (intractable) clauses

[Plot: median runtime versus number of variables]

> 40% 3-SAT: exponential scaling

≤ 40% 3-SAT: linear scaling!

(Monasson, Selman et al. – Nature ’99; Achlioptas ’00)

How are other NP-complete problems translated into SAT instances?

“SAT encoding”

SAT Encoding Example: Planning Domain

Planning Problem ⇒ Propositional CNF formula, by axiom schemas

Logistics planning: think of a number of trucks and planes that need to transport a bunch of packages from their origin to their destination

Discrete time, modeled by integers

• state predicates: indexed by time at which they hold
E.g. at_location(x,loc,i), free(x,i+1), route(cityA,cityB,i)

• action predicates: indexed by time at which action begins
E.g. fly(cityA,cityB,i), pickup(x,loc,i), drive_truck(loc1,loc2,i)

– each action takes 1 time step
– many actions may occur at the same step

Encoding Rules

• Actions imply preconditions and effects

fly(x,y,i) ⇒ at(x,i) and route(x,y,i) and at(y,i+1)

• Conflicting actions cannot occur at same time (A deletes a precondition of B)

fly(x,y,i) and y≠z ⇒ not fly(x,z,i)

• If something changes, an action must have caused it (Explanatory Frame Axioms)

at(x,i) and not at(x,i+1) ⇒ ∃y . route(x,y) and fly(x,y,i)

• Initial and final states hold

at(NY,0) and ... and at(LA,9) and ...
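Instantiating an axiom schema means substituting concrete cities and time steps and writing the result as clauses. The sketch below handles only the first two schemas (preconditions/effects and conflicting actions), reading "A ⇒ B and C" as the clauses (not A or B), (not A or C). The function name `encode_fly_axioms`, the string naming of atoms, and the (atom, polarity) literal format are all assumptions made for illustration.

```python
from itertools import permutations


def encode_fly_axioms(cities, horizon):
    """Instantiate two axiom schemas for every city pair and time step.

    A literal is a pair (atom, polarity); a clause is a list of literals.
    """
    clauses = []
    for i in range(horizon):
        for x, y in permutations(cities, 2):
            fly = f"fly({x},{y},{i})"
            # fly(x,y,i) => at(x,i) and route(x,y,i) and at(y,i+1): three binary clauses.
            for consequence in (f"at({x},{i})", f"route({x},{y},{i})", f"at({y},{i + 1})"):
                clauses.append([(fly, False), (consequence, True)])
            # fly(x,y,i) and y != z => not fly(x,z,i): one clause per other destination z.
            for z in cities:
                if z != x and z != y:
                    clauses.append([(fly, False), (f"fly({x},{z},{i})", False)])
    return clauses


if __name__ == "__main__":
    for clause in encode_fly_axioms(["NY", "LA", "SF"], horizon=1)[:4]:
        print(clause)
```

Each conflict clause is produced twice (once from each of the two flights); a real encoder would deduplicate and map atom names to integer variable indices before calling a solver.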

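Using SAT Solvers for Planning

Modeling and Solving a Planning Problem

[Diagram: a problem description in a high-level language is instantiated, using the axiom schemas and a plan length, into instantiated propositional clauses; the SAT engine(s) return a satisfying model; the model is interpreted through a mapping to yield the plan. The stages are labeled “(manual)” and “(fully automatic)”.]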

Planning Benchmark Complexity

Logistics domain – a complex, highly-parallel transportation domain

E.g. logistics.d problem:

o 2,165 possible actions per time slot

o optimal solution contains 74 distinct actions over 14 time slots (out of 5 x 10^46 possible sequential plans of length 14)

Satplan [Selman et al.] is currently the fastest optimal planning approach: winner of the ICAPS-05 & ICAPS-06 international planning competitions.

Solution Approaches to SAT

Solving SAT: Systematic Search

One possibility: enumerate all truth assignments one-by-one, test whether any satisfies F

– Note: testing is easy!
– But too many truth assignments (e.g. for N=1000 variables, have 2^1000 ≈ 10^300 truth assignments)

[Diagram: the 2^N assignments listed in order: 00000000, 00000001, 00000010, 00000011, …, 11111111]
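The enumeration approach fits in a few lines. In this sketch a CNF formula is a list of clauses and each clause a list of nonzero integers (+v for variable v, -v for its negation); that representation and the name `brute_force_sat` are assumptions made for the example.

```python
from itertools import product


def brute_force_sat(clauses, num_vars):
    """Try all 2^num_vars assignments in order.

    Returns the first satisfying assignment as {variable: bool}, or None if
    the formula is unsatisfiable.
    """
    for bits in product([False, True], repeat=num_vars):
        # Testing one assignment is cheap: every clause needs one true literal.
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return {var: bits[var - 1] for var in range(1, num_vars + 1)}
    return None


if __name__ == "__main__":
    # (a or b) and (not a or not b) and (not a or not c), with a=1, b=2, c=3
    print(brute_force_sat([[1, 2], [-1, -2], [-1, -3]], 3))
```

Each individual test is fast; the cost lies entirely in the number of assignments to try.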

Solving SAT: Systematic Search

Smarter approach: the “DPLL” procedure [1960’s]

(Davis, Putnam, Logemann, Loveland)

1. Assign values to variables one at a time (“partial” assignments)

2. Simplify F

3. If contradiction (i.e. some clause becomes False), “backtrack”, flip last unflipped variable’s value, and continue search

• Extended with many new techniques -- 100’s of research papers, yearly conference on SAT
e.g., extremely efficient data-structures (representation), randomization, restarts, learning “reasons” of failure

• Provides proof of unsatisfiability if F is unsat. [“complete method”]
• Forms the basis of dozens of very effective SAT solvers!
e.g. minisat, zchaff, relsat, rsat, … (open source, available on the www)
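The three DPLL steps (assign, simplify, backtrack on contradiction) can be sketched as a short recursive procedure. This is a bare-bones illustration, not the code of any of the solvers named above: it omits unit propagation, clause learning, restarts, and the data structures that make real solvers fast. Clauses are lists of nonzero integers (+v / -v), and the names `simplify` and `dpll` are assumptions of the example.

```python
def simplify(clauses, literal):
    """Set `literal` to True: drop satisfied clauses, delete the opposite literal.

    Returns None if some clause becomes empty, i.e. False (a contradiction).
    """
    result = []
    for clause in clauses:
        if literal in clause:
            continue  # clause already satisfied
        reduced = [lit for lit in clause if lit != -literal]
        if not reduced:
            return None  # every literal of this clause is now False
        result.append(reduced)
    return result


def dpll(clauses, assignment=None):
    """Return a (possibly partial) satisfying assignment {var: bool}, or None."""
    assignment = {} if assignment is None else assignment
    if not clauses:
        return assignment  # all clauses satisfied
    literal = clauses[0][0]  # naive branching choice
    for choice in (literal, -literal):  # try one value, then flip it on failure
        reduced = simplify(clauses, choice)
        if reduced is not None:
            result = dpll(reduced, {**assignment, abs(choice): choice > 0})
            if result is not None:
                return result
    return None  # both values fail: backtrack to the caller
```

Variables missing from the returned assignment are unconstrained and may take either value. A `None` result means both branches failed at every level, which is why the method is complete.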

Solving SAT: Local Search

• Search space: all 2^N truth assignments for F

• Goal: starting from an initial truth assignment A_0, compute assignments A_1, A_2, …, A_s such that A_s is a satisfying assignment for F

A_{i+1} is computed by a “local transformation” to A_i

e.g. (each step “flips” one bit; on the slide the flipped bit changes from green to red)
A_1 = 000110111
A_2 = 001110111
A_3 = 001110101
A_4 = 101110101
…
A_s = 111010000   solution found!

• No proof of unsatisfiability if F is unsat. [“incomplete method”]
• Several SAT solvers based on this approach, e.g. Walksat
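A local search of this kind can be sketched as follows. This is a simplified WalkSAT-style procedure, not the actual Walksat implementation: the greedy move here minimizes the total number of unsatisfied clauses, and the parameters `max_flips`, `noise`, and `seed` are illustrative assumptions. Clauses are lists of nonzero integers (+v / -v).

```python
import random


def walksat(clauses, num_vars, max_flips=10_000, noise=0.5, seed=0):
    """Return a satisfying assignment as a list of bools, or None if none was found."""
    rng = random.Random(seed)
    assign = [None] + [rng.random() < 0.5 for _ in range(num_vars)]  # 1-indexed

    def satisfied(clause):
        return any(assign[abs(lit)] == (lit > 0) for lit in clause)

    def cost_after_flip(var):
        # Number of unsatisfied clauses if `var` were flipped (flip, count, undo).
        assign[var] = not assign[var]
        broken = sum(not satisfied(clause) for clause in clauses)
        assign[var] = not assign[var]
        return broken

    for _ in range(max_flips):
        unsat = [clause for clause in clauses if not satisfied(clause)]
        if not unsat:
            return assign[1:]
        clause = rng.choice(unsat)  # focus on one violated clause
        if rng.random() < noise:
            var = abs(rng.choice(clause))  # random-walk move
        else:
            var = min((abs(lit) for lit in clause), key=cost_after_flip)  # greedy move
        assign[var] = not assign[var]  # the "local transformation": flip one bit
    return assign[1:] if all(satisfied(c) for c in clauses) else None
```

Returning `None` only means the flip budget ran out; it says nothing about unsatisfiability, which is what makes the method incomplete.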

Solving SAT: Decimation

• “Search” space: all 2^N truth assignments for F
• Goal: attempt to construct a solution in “one-shot” by very carefully setting one variable at a time

• Survey Inspired Decimation:
– Estimate certain “marginal probabilities” of each variable being True, False, or ‘undecided’ in each solution cluster using Survey Propagation
– Fix the variable that is the most biased to its preferred value
– Simplify F and repeat

• A method rarely used by computer scientists
• But has had tremendous success in the physics community on random k-SAT; can easily solve random instances with 1M+ variables!

• No searching for solution
• No proof of unsatisfiability [“incomplete method”]
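The decimation loop itself (estimate marginals, fix the most biased variable, repeat) can be illustrated on tiny formulas. Note the substitution: the sketch below replaces Survey Propagation with exact marginals obtained by enumerating all solutions, which is only possible for very small N and is not how Survey Inspired Decimation works in practice. Instead of simplifying F, it conditions on the variables fixed so far. The name `decimate` and the integer clause format (+v / -v) are assumptions.

```python
from itertools import product


def decimate(clauses, num_vars):
    """Fix one variable at a time, always the one with the most biased marginal.

    Marginals are computed exactly by enumeration (a stand-in for Survey
    Propagation). Returns {var: bool}, or None if there is no solution.
    """
    fixed = {}
    while len(fixed) < num_vars:
        # All solutions consistent with the variables fixed so far.
        models = [bits for bits in product([False, True], repeat=num_vars)
                  if all(bits[var - 1] == val for var, val in fixed.items())
                  and all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
                          for clause in clauses)]
        if not models:
            return None

        def prob_true(var):
            return sum(model[var - 1] for model in models) / len(models)

        free = [var for var in range(1, num_vars + 1) if var not in fixed]
        most_biased = max(free, key=lambda var: abs(prob_true(var) - 0.5))
        fixed[most_biased] = prob_true(most_biased) >= 0.5  # its preferred value
    return fixed
```

With exact marginals the procedure never needs to backtrack; with estimated marginals, as in Survey Propagation, a wrong early decision cannot be undone.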

The Next Two Lectures

• Problems beyond SAT / searching for a single solution

• #P-complete: count the number of solutions of a SAT instance
• #P-hard: sample a solution uniformly at random for a SAT instance

• PSPACE-complete: quantified Boolean formula (QBF)

Thank you for attending!

Slides: http://www.cs.cornell.edu/~sabhar/tutorials/kitpc08-combinatorial-problems-I.ppt

Ashish Sabharwal : http://www.cs.cornell.edu/~sabhar

Bart Selman : http://www.cs.cornell.edu/selman
