Towards Efficient Sampling: Exploiting Random Walk Strategy

Wei Wei, Jordan Erenrich, and Bart Selman



Motivations

Recent years have seen tremendous improvements in SAT solving: from formulas with up to 300 variables (1992) to formulas with one million variables.

Various techniques exist for answering “does a satisfying assignment exist for a formula?” But there are harder questions to be answered:

“how many satisfying assignments does a formula have?” Or, closely related, “can we sample from the satisfying assignments of a formula?”


Complexity

SAT is NP-complete. 2-SAT is solvable in linear time.

Counting satisfying assignments (even for 2-CNF) is #P-complete, and is NP-hard to approximate (Valiant, 1979).

Approximate counting and sampling are equivalent if the problem is “downward self-reducible”.


Challenge

Can we extend SAT techniques to solve harder counting/sampling problems?

Such an extension would lead us to a wide range of new applications.

SAT testing → counting/sampling

logic inference → probabilistic reasoning


Standard Methods for Sampling - MCMC

Based on setting up a Markov chain with a predefined stationary distribution.

Draw samples from the stationary distribution by running the Markov chain for sufficiently long.

Problem: for interesting problems, the Markov chain takes exponential time to converge to its stationary distribution.


Simulated Annealing

Simulated Annealing uses the Boltzmann distribution as the stationary distribution.

At low temperature, the distribution concentrates around minimum energy states.

For the satisfiability problem, every satisfying assignment (cost 0) gets the same probability.

Again, reaching such a stationary distribution takes exponential time for interesting problems, as shown in a later slide.
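A minimal sketch of one fixed-temperature SA (Metropolis) move on a CNF formula, assuming DIMACS-style integer literals and an energy equal to the number of unsatisfied clauses; the names `num_unsat` and `sa_step` are hypothetical. Because the proposal (flip a uniformly random variable) is symmetric, the Metropolis acceptance rule makes the Boltzmann distribution exp(-E/T) stationary, so all zero-cost assignments get equal probability.

```python
import math
import random

def num_unsat(clauses, assign):
    """Energy of an assignment: number of clauses with no true literal.
    Literals are nonzero ints (DIMACS style); assign[v] is the value of variable v."""
    return sum(
        not any(assign[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

def sa_step(clauses, assign, n, temp, rng):
    """One Metropolis move at fixed temperature temp > 0: flip a uniformly random
    variable and keep the flip with probability min(1, exp(-delta / temp))."""
    v = rng.randint(1, n)
    before = num_unsat(clauses, assign)
    assign[v] = not assign[v]
    delta = num_unsat(clauses, assign) - before
    if delta > 0 and rng.random() >= math.exp(-delta / temp):
        assign[v] = not assign[v]  # rejected: undo the flip
    return assign

# Usage: (x1 OR x2) AND (NOT x1 OR x2); index 0 of assign is unused.
clauses = [[1, 2], [-1, 2]]
rng = random.Random(0)
assign = [None, False, False]
for _ in range(1000):
    if num_unsat(clauses, assign) == 0:
        break
    sa_step(clauses, assign, 2, temp=0.3, rng=rng)
```

At fixed temperature the chain keeps moving after it hits a solution; the loop above simply stops at the first one.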


Standard Methods for Counting

Current solution counting procedures extend DPLL methods with component analysis.

Two counting procedures are available: relsat (Bayardo and Pehoushek, 2000) and cachet (Sang, Beame, and Kautz, 2004). Both count the exact number of solutions.
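To make exact counting concrete, here is a minimal counter that branches on variables DPLL-style. It is only a sketch: relsat and cachet add component analysis on top of this kind of search (cachet also caches components), which the code deliberately omits. The name `count_models` is hypothetical.

```python
def count_models(clauses, n):
    """Exact number of satisfying assignments over variables 1..n,
    by DPLL-style branching without component analysis or caching."""

    def assign_literal(clauses, lit):
        # Drop clauses satisfied by lit; delete the now-false literal -lit elsewhere.
        return [[l for l in c if l != -lit] for c in clauses if lit not in c]

    def count(clauses, free):
        if any(not c for c in clauses):
            return 0                    # empty clause: conflict
        if not clauses:
            return 2 ** len(free)       # all clauses satisfied: remaining vars are free
        v = abs(clauses[0][0])          # branch on a variable of the first clause
        rest = free - {v}
        return (count(assign_literal(clauses, v), rest)
                + count(assign_literal(clauses, -v), rest))

    return count([list(c) for c in clauses], frozenset(range(1, n + 1)))

print(count_models([[1, 2]], 2))            # → 3
print(count_models([[1, 2], [-1, 2]], 3))   # → 4 (x2 forced true, x1 and x3 free)
```

The `2 ** len(free)` shortcut is what lets branching count without enumerating every assignment once all clauses are satisfied.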


Question: Can state-of-the-art local search procedures be used for SAT sampling/counting (as alternatives to standard Markov Chain Monte Carlo and DPLL methods)?

Yes! As shown in this talk.


Our approach – biased random walk

Biased random walk = greedy bias + pure random walk. Example: WalkSat (Selman et al., 1994), effective on SAT.

Can we use it to sample from solution space?

– Does WalkSat reach all solutions?

– How uniform is the sampling?
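A minimal WalkSat sketch, assuming DIMACS-style literals. It follows the commonly described variant: pick a random unsatisfied clause; if some variable in it can be flipped without breaking a satisfied clause, flip it; otherwise, with probability `noise` flip a random variable of the clause (pure random walk) and else one with the smallest break count (greedy bias). Parameter names and defaults are illustrative, not those of the original implementation.

```python
import random

def walksat(clauses, n, noise=0.5, max_flips=100_000, seed=None):
    """Return a satisfying assignment (list, index 0 unused) or None."""
    rng = random.Random(seed)
    assign = [False] + [rng.random() < 0.5 for _ in range(n)]

    def satisfied(clause):
        return any(assign[abs(lit)] == (lit > 0) for lit in clause)

    def break_count(v):
        # A clause containing v that is unsatisfied after flipping v must have been
        # satisfied before, so this counts exactly the clauses the flip would break.
        assign[v] = not assign[v]
        broken = sum(1 for c in clauses
                     if any(abs(lit) == v for lit in c) and not satisfied(c))
        assign[v] = not assign[v]
        return broken

    for _ in range(max_flips):
        unsat = [c for c in clauses if not satisfied(c)]
        if not unsat:
            return assign
        variables = [abs(lit) for lit in rng.choice(unsat)]
        breaks = {v: break_count(v) for v in variables}
        best = min(breaks.values())
        if best > 0 and rng.random() < noise:
            v = rng.choice(variables)                                  # pure random walk
        else:
            v = rng.choice([u for u in variables if breaks[u] == best])  # greedy bias
        assign[v] = not assign[v]
    return assign if all(satisfied(c) for c in clauses) else None

print(walksat([[1, 2, -3], [-1, 3], [2, 3], [-2, -3]], 3, seed=1))
```

WalkSat stops at the first solution it finds, which is why the questions above (coverage and uniformity) are not answered by the algorithm itself.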


WalkSat

[Figure: WalkSat solution frequencies; x-axis: Hamming distance. Annotations: one solution visited 500,000 times, another visited 60 times.]


Probability Ranges in Different Domains

Instance    Runs        Hits (rarest)   Hits (most common)   Common-to-rare ratio
Random      50 × 10^6   53              9 × 10^5             1.7 × 10^4
Logistics   1 × 10^6    84              4 × 10^3             50
Verif.      1 × 10^6    45              318                  7
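The last column is the ratio of the two hit counts. A small helper (hypothetical name `common_to_rare_ratio`) computes it from a list of sampled solutions; note that solutions never hit do not appear in the counts at all, so the ratio only covers solutions the sampler actually reached.

```python
from collections import Counter

def common_to_rare_ratio(samples):
    """Hits of the most frequently sampled solution divided by hits of the rarest one.
    Samples must be hashable, e.g. tuples of variable values."""
    hits = Counter(samples)
    return max(hits.values()) / min(hits.values())

# Checking the table: hits of the most common solution / hits of the rarest one.
print(round(9e5 / 53))   # Random:    16981, about 1.7 × 10^4
print(round(4e3 / 84))   # Logistics: 48, reported as 50
print(round(318 / 45))   # Verif.:    7
```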


Improving the Uniformity of Sampling

SampleSat:
– With probability p, the algorithm makes a biased random walk move.
– With probability 1 - p, the algorithm makes an SA (simulated annealing) move.

WalkSat: nonergodic; quickly reaches sinks.
SA: ergodic; slow convergence.
SampleSat = WalkSat + SA: ergodic; does not satisfy the detailed balance condition (DBC).
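A minimal sketch of the SampleSat move described above, assuming DIMACS-style literals. The WalkSat part is simplified (a random unsatisfied clause, then either a random variable or the one whose flip leaves the fewest unsatisfied clauses), and the SA part is a fixed-temperature Metropolis move. The stopping rule in `samplesat` (run a fixed number of moves, return the final assignment if it is a solution) and all parameter values are assumptions for illustration, not taken from the slides.

```python
import math
import random
from collections import Counter

def satisfied(clause, assign):
    return any(assign[abs(lit)] == (lit > 0) for lit in clause)

def cost_after_flip(clauses, assign, v):
    """Number of unsatisfied clauses if variable v were flipped (assign is restored)."""
    assign[v] = not assign[v]
    cost = sum(not satisfied(c, assign) for c in clauses)
    assign[v] = not assign[v]
    return cost

def samplesat_step(clauses, assign, n, rng, p=0.5, noise=0.5, temp=0.1):
    unsat = [c for c in clauses if not satisfied(c, assign)]
    if unsat and rng.random() < p:
        # Biased random walk move: only possible while some clause is unsatisfied.
        variables = [abs(lit) for lit in rng.choice(unsat)]
        if rng.random() < noise:
            v = rng.choice(variables)
        else:
            v = min(variables, key=lambda u: cost_after_flip(clauses, assign, u))
        assign[v] = not assign[v]
    else:
        # Fixed-temperature SA (Metropolis) move on a uniformly random variable.
        v = rng.randint(1, n)
        delta = cost_after_flip(clauses, assign, v) - len(unsat)
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            assign[v] = not assign[v]

def samplesat(clauses, n, steps=2000, seed=None):
    """Run `steps` moves from a random assignment; return the final assignment
    as a tuple (variables 1..n) if it is a solution, else None."""
    rng = random.Random(seed)
    assign = [False] + [rng.random() < 0.5 for _ in range(n)]
    for _ in range(steps):
        samplesat_step(clauses, assign, n, rng)
    return tuple(assign[1:]) if all(satisfied(c, assign) for c in clauses) else None

samples = [samplesat([[1, 2]], 2, seed=s) for s in range(300)]
print(Counter(samples))  # the three solutions of (x1 OR x2) should appear about equally often
```

Once every clause is satisfied, the `unsat` list is empty and only SA moves are taken, which mirrors the two-stage behaviour described later in the talk.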


Comparison Between WalkSat and SampleSat

[Figure: solution frequencies under WalkSat and SampleSat; annotated values 10^4 (WalkSat) and 10 (SampleSat).]


SampleSat

[Figure: SampleSat solution frequencies; x-axis: Hamming distance.]


Instance    Runs        Hits (rarest)   Hits (most common)   Common-to-rare ratio (WalkSat)   Common-to-rare ratio (SampleSat)
Random      50 × 10^6   53              9 × 10^5             1.7 × 10^4                       10
Logistics   1 × 10^6    84              4 × 10^3             50                               17
Verif.      1 × 10^6    45              318                  7                                4


Analysis

c1 c2 c3 … cn a b
F  F  F  … F  F F
F  F  F  … F  F T


Property of F*

Proposition 1: SA with a fixed temperature takes exponential time to find a solution of F*.

This shows that, even for some simple 2-CNF formulas, SA cannot reach a solution in polynomial time.


Analysis, cont.

c1 c2 c3 … cn a
T  T  T  … T  T
F  F  F  … F  T
F  F  F  … F  F

Proposition 2: A pure random walk reaches this solution with exponentially small probability.


SampleSat

In the SampleSat algorithm, we can divide the search into two stages. Before SampleSat reaches its first solution, it behaves like WalkSat.

Instance       WalkSat      SampleSat     SA
random         382          677           24667
logistics      5.7 × 10^4   15.5 × 10^5   > 10^9
verification   36           65            10821


SampleSat, cont.

After SampleSat reaches a solution, the random walk component is turned off because all clauses are satisfied, and SampleSat behaves like SA.

Proposition 3: SA at zero temperature samples all solutions within a cluster uniformly.

This 2-stage model explains why SampleSat samples more uniformly than random walk algorithms alone.
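A small illustration of Proposition 3, under stated assumptions: a "cluster" is taken to be a set of solutions connected by single-variable flips, and the zero-temperature SA move proposes flipping a uniformly random variable and accepts it only if the result is still a solution. Because the proposal is symmetric and rejected moves stay put, the transition matrix is symmetric, so its stationary distribution is uniform over the cluster. The function name `zero_temp_walk` is hypothetical.

```python
import random
from collections import Counter

def is_solution(clauses, assign):
    return all(any(assign[abs(lit)] == (lit > 0) for lit in c) for c in clauses)

def zero_temp_walk(clauses, start, n, steps, seed=0):
    """Zero-temperature SA started at a solution; returns visit counts per solution."""
    rng = random.Random(seed)
    assign = list(start)
    visits = Counter()
    for _ in range(steps):
        v = rng.randint(1, n)
        assign[v] = not assign[v]
        if not is_solution(clauses, assign):
            assign[v] = not assign[v]  # uphill move: rejected at zero temperature
        visits[tuple(assign[1:])] += 1
    return visits

# (x1 OR x2) has three solutions, all connected by single flips through (True, True).
visits = zero_temp_walk([[1, 2]], [None, True, True], n=2, steps=60_000)
for sol, count in sorted(visits.items()):
    print(sol, round(count / 60_000, 2))  # each frequency should be close to 1/3
```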


Verification on Larger Formulas - ApproxCount

Small formulas -> figures of solution frequencies. How can we verify on large formulas? ApproxCount.

ApproxCount approximates the number of solutions of Boolean formulas, based on the SampleSat algorithm.

Besides serving to justify the accuracy of our sampling approach, ApproxCount is interesting in its own right.


Algorithm

The algorithm works as follows (Jerrum and Valiant, 1986):

1. Pick a variable X in the current formula.
2. Draw K samples from the solution space.
3. Set variable X to its most sampled value t; the multiplier for X is K/#(X=t). Note that 1 ≤ multiplier ≤ 2.
4. Repeat steps 1-3 until all variables are set.
5. The estimated number of solutions of the original formula is the product of all multipliers.
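A minimal sketch of the steps above. Since SampleSat is not reproduced here, step 2 uses a brute-force uniform sampler over the solutions (enumerating all 2^n assignments), which only works for tiny formulas; in ApproxCount that role is played by SampleSat. Variables are picked in index order, and the names `solutions` and `approx_count` are hypothetical.

```python
import itertools
import random

def solutions(clauses, n, fixed):
    """All satisfying assignments (tuples; position v-1 holds variable v) that agree
    with the partial assignment `fixed` (dict: variable -> bool). Brute force."""
    return [
        bits for bits in itertools.product([False, True], repeat=n)
        if all(bits[v - 1] == val for v, val in fixed.items())
        and all(any(bits[abs(lit) - 1] == (lit > 0) for lit in c) for c in clauses)
    ]

def approx_count(clauses, n, k=1000, seed=0):
    rng = random.Random(seed)
    fixed, estimate = {}, 1.0
    for x in range(1, n + 1):                           # 1. pick a variable X
        sols = solutions(clauses, n, fixed)
        if not sols:
            return 0.0                                  # unsatisfiable formula
        samples = [rng.choice(sols) for _ in range(k)]  # 2. draw K samples (uniform here)
        n_true = sum(s[x - 1] for s in samples)
        t = n_true >= k - n_true                        # 3. most sampled value t of X
        estimate *= k / (n_true if t else k - n_true)   #    multiplier K/#(X=t), in [1, 2]
        fixed[x] = t                                    # 4. set X = t and repeat
    return estimate                                     # 5. product of all multipliers

print(approx_count([[1, 2]], 3, k=20_000))  # close to the exact count, 6
```

Each multiplier estimates 1/P(X=t) under the uniform distribution on the current solutions, which is why sampling quality directly controls counting accuracy.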


Accumulation of Errors

#variables   Sample error   Overall error
200          10%            1.9 × 10^8
200          1%             7.3
400          10%            3.6 × 10^16
400          1%             53.5
800          10%            1.3 × 10^33
800          1%             2865
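The table shows how per-variable sampling errors compound over the n multipliers. Its numbers follow a worst-case model (our reading, not stated on the slide) in which every multiplier is off by a factor of (1 + e) in the same direction, giving an overall error factor of (1 + e)^n; the 1% rows, for instance, are 1.01^200 ≈ 7.3, 1.01^400 ≈ 53.5 and 1.01^800 ≈ 2865.

```python
def overall_error(n_vars, sample_error):
    """Worst-case error factor when each of n_vars multipliers is off by (1 + sample_error)."""
    return (1 + sample_error) ** n_vars

for n in (200, 400, 800):
    for e in (0.10, 0.01):
        print(n, f"{e:.0%}", f"{overall_error(n, e):.3g}")
```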


Within the Capacity of Exact Counters

We compare the results of ApproxCount with those of the exact counters.

Instance           #variables   Exact count   ApproxCount   Average error
prob004-log-a      1790         2.6 × 10^16   1.4 × 10^16   0.03%
wff.3.200.810      200          3.6 × 10^12   3.0 × 10^12   0.09%
dp02s02.shuffled   319          1.5 × 10^25   1.2 × 10^25   0.07%
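The "Average Error" column appears to be a per-variable figure rather than the error of the final count (the estimates here are off by up to a factor of about 2). Under that reading, which is our interpretation, it is the geometric-mean error of the n multipliers, (exact/approx)^(1/n) - 1. The sketch below (hypothetical name `per_variable_error`) reproduces the reported 0.03%, 0.09% and 0.07%, and also the 0.4% and 0.6% reported for P(30,20) and P(20,10) on a later slide.

```python
import math

def per_variable_error(exact, approx, n_vars):
    """Geometric-mean relative error per multiplier: (exact/approx)^(1/n_vars) - 1."""
    return math.exp(abs(math.log(exact / approx)) / n_vars) - 1

rows = [("prob004-log-a", 1790, 2.6e16, 1.4e16),
        ("wff.3.200.810", 200, 3.6e12, 3.0e12),
        ("dp02s02.shuffled", 319, 1.5e25, 1.2e25)]
for name, n, exact, approx in rows:
    print(name, f"{per_variable_error(exact, approx, n):.2%}")
```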


And beyond …

We developed a family of formulas whose solutions are hard to count:
– The formulas are based on SAT encodings of the following combinatorial problem.
– Given n different items, choose from them a list (order matters) of m items (m ≤ n). Let P(n,m) be the number of different lists that can be constructed; then P(n,m) = n!/(n-m)!.
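The slides do not spell out the encoding. One natural choice, which is an assumption here but matches the variable counts on the next slide (m·n, i.e. 200 for P(20,10) and 600 for P(30,20)), uses one Boolean variable per (position, item) pair: every position holds exactly one item, and no item is used twice. The names `permutation_cnf` and `brute_force_count` are hypothetical; `math.perm` needs Python 3.8+.

```python
import itertools
import math

def permutation_cnf(n, m):
    """Hypothetical CNF for ordered lists of m distinct items chosen from n items.
    Variable var(i, j) is true iff position i (0..m-1) holds item j (0..n-1)."""
    def var(i, j):
        return i * n + j + 1

    clauses = []
    for i in range(m):
        clauses.append([var(i, j) for j in range(n)])      # position i holds an item...
        for j, k in itertools.combinations(range(n), 2):
            clauses.append([-var(i, j), -var(i, k)])       # ...and at most one item
    for j in range(n):
        for i, h in itertools.combinations(range(m), 2):
            clauses.append([-var(i, j), -var(h, j)])       # item j appears at most once
    return clauses, m * n

def brute_force_count(clauses, num_vars):
    """Count satisfying assignments by enumeration (tiny instances only)."""
    return sum(
        all(any(bits[abs(lit) - 1] == (lit > 0) for lit in c) for c in clauses)
        for bits in itertools.product([False, True], repeat=num_vars)
    )

clauses, num_vars = permutation_cnf(4, 2)
print(num_vars, brute_force_count(clauses, num_vars), math.perm(4, 2))  # 8 12 12
print(math.perm(20, 10))  # 670442572800, i.e. about 7 × 10^11
```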


Hard Instances

The encoding of P(20,10) has only 200 variables, but neither cachet nor relsat was able to count its solutions within 5 days in our experiments.

On the other hand, ApproxCount finishes in 2 hours, and it also estimates the number of solutions of even larger instances.

Instance    #variables   #solutions   ApproxCount   Average error
P(30,20)    600          7 × 10^25    7 × 10^24     0.4%
P(20,10)    200          7 × 10^11    2 × 10^11     0.6%


Summary

Small formulas -> complete analysis of the search space

Larger formulas -> compare ApproxCount results with results of exact counting procedures

Harder formulas -> handcrafted formulas, compared with analytic results


Conclusion and Future Work

These results show a good opportunity to extend SAT solvers to develop algorithms for sampling and counting tasks.

Next step: Use our methods in probabilistic reasoning and Bayesian inference domains.