1
IS703:Decision Support and Optimization
Week 10: Local Search
Lau Hoong Chuin, School of Information Systems
2
Reference:
• Johnson and McGeoch, The Traveling Salesman Problem: A Case Study in Local Optimization. In Local Search in Combinatorial Optimization, E. H. L. Aarts and J. K. Lenstra (eds), John Wiley and Sons, 1997
Objectives:
• To learn the basic local search paradigm for solving (NP-hard) optimization problems. Why?
• To design LS-based meta-heuristics (tabu search, simulated annealing, genetic algorithms, ACO, etc.)
Local Search
3
Heuristics
• Optimization problem: min_{x ∈ S} f(x), where S = the space of feasible solutions (find a feasible x such that f(x) is minimized)
• Heuristic algorithms do not guarantee to find an optimal solution (as opposed to exact algorithms)
• However, they are usually designed to find a "reasonably good" solution quickly
• Solutions to optimization problems:
– Global Optimum: equal to or better than all other solutions
– Local Optimum: equal to or better than all solutions in a certain neighborhood
4
Global vs. Local Optimum
5
Why Do We Need Heuristics?
• Many real-world applications are too large to be solved by exact methods (e.g. Branch and Bound, Cutting Planes, etc.)
• Solutions are usually needed fast (sometimes even in real time)
– for hard problems, the time to find an optimal solution is usually much greater than the time to find a near-optimal solution
• Optimal solutions, although desirable, are often not needed in the real world
6
Local Search
Definitions:
• N (neighborhood): S → 2^S
• The size of a neighborhood is the number of solutions in the neighborhood set

Algorithm:
1. Pick a starting solution s  // by another algorithm
2. If f(s) ≤ min_{s' ∈ N(s)} f(s'), stop  // what is the time complexity?
3. Else set s = s' such that f(s') = min_{s'' ∈ N(s)} f(s'')
4. Go to Step 2
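The four steps above can be sketched as a short Python function (an illustrative sketch, not part of the slides; the names `local_search`, `f` and `neighbors` are assumptions):

```python
def local_search(start, f, neighbors):
    """Steepest-descent local search: move to the best neighbor
    until no neighbor improves on the current solution."""
    s = start
    while True:
        best = min(neighbors(s), key=f)   # scan the whole neighborhood
        if f(best) >= f(s):               # s is a local optimum: stop
            return s
        s = best                          # move and repeat

# Toy usage: minimize f(x) = x^2 over the integers, neighbors are x-1 and x+1.
result = local_search(7, lambda x: x * x, lambda x: [x - 1, x + 1])
```

Evaluating the whole neighborhood at every iteration is exactly what makes Step 2 cost O(|N(s)|) evaluations of f.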
7
Local Search
An iteration in a local search:
Current Solution → Generate Neighborhood → Evaluate Neighborhood → Choose Neighbor
8
Example: Traveling Salesman Problem
Given:
• A set of n cities {c1, c2, …, cn}
• A cost matrix, containing the cost to travel between each pair of cities
Find a minimum-cost tour.
9
Solving TSP Heuristically
3 Broad Steps:
1. Construction Heuristics
– Constructing a tour from scratch
– Nearest neighbor, insertion heuristics
2. Local Search Algorithms
– Improving an existing tour
3. Extensions
– "Escaping" from locally optimal tours
– Meta-heuristics
10
Construction Heuristics (Nearest Neighbor)
Step 1: Choose a starting city p at random
Step 2: Scan the remaining cities for the city NNp nearest to p
Step 3: Add edge (p, NNp) to the sequence
Step 4: Set p = NNp
Step 5: Repeat from Step 2 until all cities are added
Step 6: Add the last edge to complete the tour
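Steps 1–6 translate directly into Python (a sketch under the assumption that cities are 2-D points; `nearest_neighbor_tour` is a made-up name):

```python
import math

def nearest_neighbor_tour(cities, start=0):
    """Greedy nearest-neighbor TSP construction.
    `cities` is a list of (x, y) points; returns a tour as a list of
    indices; the closing edge tour[-1] -> tour[0] completes the cycle."""
    unvisited = set(range(len(cities))) - {start}
    tour = [start]
    while unvisited:                                   # Steps 2-5
        p = tour[-1]
        nearest = min(unvisited,
                      key=lambda c: math.dist(cities[p], cities[c]))
        tour.append(nearest)
        unvisited.remove(nearest)
    return tour
```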
11
Some Variants of Nearest Neighbour
• Double-Sided Nearest Neighbour
– Grow the nearest-neighbour sequence from both ends
• Randomized Nearest Neighbour
– Instead of using the nearest neighbour of a city, pick a neighbour at random from the set of the remaining k nearest neighbours of the city and add it to the sequence
12
Multiple Fragment Heuristic
Basic Idea: Build a tour one edge at a time, by repeatedly adding the shortest remaining available edge to the tour edge set.

Step 1: Order the edges in a list by increasing length
Step 2: Pick the first available edge on the list (one that neither gives a city degree 3 nor closes a subtour prematurely)
Step 3: Remove the edge from the list and add it to the tour edge set
Step 4: Repeat from Step 2 until there are no more available edges on the list

Note: similarity with Kruskal's algorithm for MSTs
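A sketch of the multiple-fragment heuristic in Python; as in Kruskal's algorithm, a union-find structure rejects edges that would close a subtour early, and a degree check keeps every city on at most two chosen edges (the point representation and names are assumptions):

```python
import math
from itertools import combinations

def greedy_edge_tour(cities):
    """Multiple-fragment (greedy-edge) TSP construction: scan edges in
    increasing length, keeping one only if both endpoints still have
    degree < 2 and it does not close a subtour early."""
    n = len(cities)
    parent = list(range(n))          # union-find over tour fragments

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    degree = [0] * n
    chosen = []
    edges = sorted(combinations(range(n), 2),
                   key=lambda e: math.dist(cities[e[0]], cities[e[1]]))
    for u, v in edges:
        # the last edge (len(chosen) == n - 1) is allowed to close the cycle
        if degree[u] < 2 and degree[v] < 2 and (
                find(u) != find(v) or len(chosen) == n - 1):
            chosen.append((u, v))
            degree[u] += 1
            degree[v] += 1
            parent[find(u)] = find(v)
    return chosen                    # n edges forming one Hamiltonian cycle
```

An edge is "available" in the slide's sense exactly when it passes both checks.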
13
Multiple Fragment Tour:
14
Insertion Heuristics
• Basic Idea: Starting with a subtour made of 2 arbitrarily chosen cities, grow the tour by inserting cities that fulfill some criterion.
• The criterion usually depends on the cities already in the tour.

Step 1: Form a subtour by choosing 2 cities at random
Step 2: Scan the remaining cities for the city that fulfills the insertion criterion
Step 3: Insert that city into the subtour
Step 4: Repeat from Step 2 until all cities are in the tour
15
Some Insertion Criteria
• Nearest Insertion:
– Insert the city that has the shortest distance to a tour city
• Farthest Insertion:
– Insert the city whose shortest distance to a tour city is maximized
• Cheapest Insertion:
– Choose the city whose insertion causes the least increase in the length of the resulting subtour
• Random Insertion:
– Select the city to be inserted at random
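The insertion scheme with the nearest-insertion criterion might look like this in Python (an illustrative sketch; it also splices the chosen city at the cheapest position, a detail the slides leave open, and `nearest_insertion_tour` is a made-up name):

```python
import math

def nearest_insertion_tour(cities):
    """Nearest-insertion TSP construction: repeatedly pick the outside
    city closest to any tour city, and splice it into the tour at the
    position of least added length."""
    d = lambda a, b: math.dist(cities[a], cities[b])
    tour = [0, 1]                        # arbitrary 2-city starting subtour
    outside = set(range(2, len(cities)))
    while outside:
        # nearest-insertion criterion: city closest to the current tour
        c = min(outside, key=lambda x: min(d(x, t) for t in tour))
        # insert c where it increases the tour length the least
        i = min(range(len(tour)),
                key=lambda i: d(tour[i], c) + d(c, tour[(i + 1) % len(tour)])
                              - d(tour[i], tour[(i + 1) % len(tour)]))
        tour.insert(i + 1, c)
        outside.remove(c)
    return tour
```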
16
Nearest vs Cheapest Insertion
17
Analysis
Number of steps: n − 2 cities are to be inserted
For each insertion, the set of remaining cities has to be scanned: O(n)
Worst-case run time: O(n^2)
18
Local Search Algorithm for TSP
• Create an initial tour  // discussed earlier
• Define a local search neighborhood
• Two tours are neighbors if they share most of their edges
• Edge-exchange neighborhood: delete k edges in the current tour, then add k edges that form a new feasible tour. This neighborhood is called k-opt.
19
2-opt
• Delete 2 edges, add 2 edges to restore the tour
• 2-opt removes the “crossings” of edges in a tour
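A 2-opt local search in Python: deleting two edges and reversing the segment between them is the unique way to reconnect the tour (a first-improvement sketch; the function names are assumptions):

```python
import math

def tour_length(cities, tour):
    """Total length of the closed tour, including the closing edge."""
    return sum(math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def two_opt(cities, tour):
    """2-opt: delete edges (i, i+1) and (j, j+1), reconnect by reversing
    the segment between them; repeat until no improving exchange exists."""
    improved = True
    while improved:
        improved = False
        for i in range(len(tour) - 1):
            for j in range(i + 2, len(tour)):
                if i == 0 and j == len(tour) - 1:
                    continue                 # would delete the same edge pair
                cand = tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]
                if tour_length(cities, cand) < tour_length(cities, tour) - 1e-12:
                    tour, improved = cand, True
    return tour
```

On the unit square, starting from the crossing tour [0, 2, 1, 3], a single exchange removes the crossing.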
20
2-opt Analysis
• How many possible exchanges? Size of the 2-opt neighborhood:
– Number of edge pairs = n(n − 3)/2
– Exactly 1 way to restore the tour for each pair
• At each step of Local Search, evaluate O(n^2) new tours and choose the best available.
Worst-case run time per step: O(n^2)
21
3-opt
• Delete 3 edges, add 3 edges to restore the tour
• 3-opt can remove subsequences of the tour
22
3-opt Analysis
• Size of the 3-opt neighborhood:
– Number of edge triples = C(n, 3) = n(n − 1)(n − 2)/6
– Each triple has 8 possible ways to be reconnected
– Some neighbors may not be valid
• At each step of Local Search, evaluate O(n^3) new tours
Worst-case run time per step: O(n^3)
23
k-opt
• It may seem sensible to increase k even further.
• However, once k edges of a tour have been deleted, there are O(n^k) new tours. Expensive!
• Observe that some 3-opt moves are equivalent to applying two 2-opt moves.
24
Lin-Kernighan Heuristic
• First proposed by Lin and Kernighan
• One of the most successful heuristics for combinatorial optimization problems
• "Adaptive" k-opt
• Builds k-opt moves (with variable k) from a sequence of 2-opt moves
• "Tabu lists" for edges added and deleted
• Multiple restarts
25
Vehicle Routing Problem with Time Windows
Given
– k vehicles, each with a fixed capacity
– a set of customers, each having a location, time window, service duration and demand
– a cost matrix [cij]
Find min-cost vertex-disjoint feasible routes to cover all customers.
Finding a feasible solution is NP-hard, even for k = 1!
26
Vehicle Routing Problem with Time Windows
[Figure: a depot serving customers with time windows and demands: 11:00~12:00 / 10 units, 11:30~12:30 / 20 units, 10:30~12:00 / 10 units, 11:00~11:30 / 30 units]
27
Vehicle Routing Problem with Time Windows
Neighborhood moves (illustrated):
• Relocate
• Distribute
• Exchange
28
Vehicle Routing Problem with Time Windows
Neighborhood moves (illustrated):
• 2-Opt
Question: What is the time complexity of each neighborhood?
29
Iterated Local Search
• Problems with Local Search:
– may get stuck in a local optimum of poor quality
– very dependent on the starting tour
• Solutions:
– apply Local Search to more than one tour
– search diversification
30
Multi-start
• Step 1: Create a feasible tour (randomly or by using a construction heuristic)
• Step 2: Apply Local Search until local optimality
• Step 3: Repeat from Step 1 until some stopping criterion is met
• Step 4: Output the best tour found during this process
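Steps 1–4 amount to a small driver loop (a generic sketch; `make_tour`, `improve` and `cost` stand for any construction heuristic, local search, and objective function):

```python
import random

def multi_start(make_tour, improve, cost, restarts=10, seed=0):
    """Multi-start local search: repeatedly construct a solution,
    run local search on it, and keep the best local optimum found."""
    random.seed(seed)
    best = None
    for _ in range(restarts):
        s = improve(make_tour())               # Steps 1-2
        if best is None or cost(s) < cost(best):
            best = s                           # remember the best tour
    return best
```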
31
Multi-start: Random vs Constructive Restarts
• Random:
– creating a random tour is fast: O(n)
– Local Search is likely to be slow
– Local Search usually does not perform very well
• Construction heuristics:
– creating a new tour is slow (best case O(n^2))
– Local Search is usually quite fast
– Local Search usually performs better when started from good tours
32
Chained Local Search
Motivation: Why abandon a tour that is already locally optimal?

Basic concept: Instead of restarting local search from a new tour, modify the current tour (a "kick") and restart local search on the modified tour.
33
Chained Local Search
Step 1: Create an initial tour
Step 2: Apply Local Search
Step 3: Modify the locally optimal tour in a way that Local Search cannot repair
Step 4: Apply Local Search only to the modified parts of the tour
Step 5: Repeat from Step 3 until a stopping criterion is met
Step 6: Output the best tour found
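A classic kick for Step 3 on the TSP is the double bridge: cut the tour into four segments A|B|C|D and reconnect them as A|C|B|D. 2-opt cannot undo this move in one step, which is exactly what Step 3 asks for (the name `double_bridge` is standard in the literature, but the code is an illustrative sketch):

```python
import random

def double_bridge(tour, rng=random):
    """Double-bridge kick: split the tour into segments A|B|C|D at three
    random cut points and reconnect them as A|C|B|D."""
    n = len(tour)
    i, j, k = sorted(rng.sample(range(1, n), 3))
    return tour[:i] + tour[j:k] + tour[i:j] + tour[k:]
```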
34
Local Search for Satisfiability

• WalkSat(C)
  Guess an initial assignment
  While unsatisfied and not timed out do
    Select from C an unsatisfied clause c = ±Xi ∨ ±Xj ∨ ±Xk
    Select a variable v in the unsatisfied clause c
    Flip v

• GSat(C)
  Guess an initial assignment
  While unsatisfied and not timed out do
    Flip the value assigned to the variable that yields the greatest number of satisfied clauses (note: flip even if there is no improvement)
  If a satisfying assignment is not found, repeat the entire process, starting from a different initial random assignment

3SAT: Input: a 3CNF formula. Output: Yes/No, whether the input can be satisfied.
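The WalkSat loop translates almost line for line into Python (a sketch; clauses are lists of nonzero ints, with −3 meaning "X3 negated", and the names are assumptions):

```python
import random

def walksat(clauses, n_vars, max_flips=10000, seed=0):
    """WalkSat sketch: while some clause is unsatisfied, pick one such
    clause at random and flip a random variable in it.
    Returns a satisfying assignment dict, or None on timeout."""
    rng = random.Random(seed)
    assign = {v: rng.choice([True, False]) for v in range(1, n_vars + 1)}
    sat = lambda lit: assign[abs(lit)] == (lit > 0)
    for _ in range(max_flips):
        unsat = [c for c in clauses if not any(sat(l) for l in c)]
        if not unsat:
            return assign
        clause = rng.choice(unsat)            # an unsatisfied clause
        v = abs(rng.choice(clause))           # a variable in it
        assign[v] = not assign[v]             # flip
    return assign if all(any(sat(l) for l in c) for c in clauses) else None

# (X1 v ~X2) ^ (X2 v X3) ^ (~X1 v ~X3)
model = walksat([[1, -2], [2, 3], [-1, -3]], 3)
```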
35
Example:
36
Mixing Random Walk with Greedy
• With probability p, make a random-walk move; otherwise make a greedy move
• The value of p is determined empirically, by finding the best setting for the problem class
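A sketch of the mixture (assuming, as the following slides suggest, that "MixedWalkSat[p]" makes a WalkSat-style random flip with probability p and a GSAT-style greedy flip otherwise; all names here are assumptions):

```python
import random

def mixed_walksat(clauses, n_vars, p=0.5, max_flips=10000, seed=0):
    """With probability p flip a random variable of a random unsatisfied
    clause (walk step); otherwise flip the variable that maximizes the
    number of satisfied clauses (greedy step)."""
    rng = random.Random(seed)
    assign = {v: rng.choice([True, False]) for v in range(1, n_vars + 1)}
    sat = lambda lit: assign[abs(lit)] == (lit > 0)

    def num_sat():
        return sum(any(sat(l) for l in c) for c in clauses)

    for _ in range(max_flips):
        if num_sat() == len(clauses):
            return assign
        if rng.random() < p:                   # random-walk step
            unsat = [c for c in clauses if not any(sat(l) for l in c)]
            v = abs(rng.choice(rng.choice(unsat)))
        else:                                  # greedy (GSAT-style) step
            def score(v):
                assign[v] = not assign[v]      # try the flip ...
                s = num_sat()
                assign[v] = not assign[v]      # ... and undo it
                return s
            v = max(range(1, n_vars + 1), key=score)
        assign[v] = not assign[v]
    return assign if num_sat() == len(clauses) else None
```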
37
Finding the best value of p
• Q: What value for p?
• Let Q[p, c] be the quality of using MixedWalkSat[p] on problem c:
– Q[p, c] = time to return an answer, or
– Q[p, c] = 1 if MixedWalkSat[p] returns a (correct) answer within 5 minutes and 0 otherwise, or
– perhaps some combination of both
• Then find the p that optimizes the average performance of MixedWalkSat[p] on a set of challenge problems.
38
Experimental Results: "Hard" Random 3CNF
• Time in seconds
• Effectiveness: probability that a random initial assignment leads to a solution
• Test instances with up to 400 variables
– MixedWalkSat better than Simulated Annealing
– MixedWalkSat better than Basic GSAT
Source: Selman and Kautz, 1993
39
Advanced Local Search Paradigms (Meta-heuristics)
• "An iterative master process that guides and modifies the operations of a subordinate heuristic to efficiently produce high quality solutions" (Voss et al., 2002)
• Different philosophies and forms
• Single-point
– Tabu Search
– Simulated Annealing
– Greedy Randomized Adaptive Search Procedure
• Population-based
– Evolutionary (Genetic) Algorithms
– Ant Colony Optimization
– Particle Swarm
• Hybrids, e.g. Hyper-Heuristics
40
What can go wrong with Local Search?
• Caught in a local optimum
– Diversify: multiple (random) restarts, etc.
• Cycling: revisiting the same solutions over and over
41
What can go wrong with Local Search? Analogy
42
Tabu Search

• A tabu list of forbidden (tabu) moves is maintained, to avoid cycling
• Tabu moves are based on the short- and long-term history of the search process
• An aspiration criterion is a condition that allows the tabu status of a move to be overridden, so that the move can be considered at that iteration
• The next move is the best move among the feasible moves from the neighborhood of the current solution. A tabu move is taken only if it satisfies the aspiration criterion.
Fred Glover (1986)
43
Tabu Search Algorithm

1. construct an initial solution s
2. while not finished
3.   compute N(s), T(s), A(s)
4.   choose s' ∈ (N(s) − T(s)) ∪ A(s) s.t. cost(s') is minimum
5.   s = s'
6. endwhile

• s: current solution
• N(s): neighborhood set of s, where N(s) ⊆ S
• T(s): tabu set of s, where T(s) ⊆ N(s)
• A(s): aspiration set of s, where A(s) ⊆ T(s)
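The algorithm can be sketched in Python, with T(s) kept implicitly as a dict from moves to the iteration at which their tabu status expires (an illustrative sketch; here the aspiration set A(s) is taken to be "neighbors better than the best solution so far", one common choice):

```python
def tabu_search(start, cost, neighbors, max_iters=100, tenure=5):
    """Tabu-search sketch: each iteration moves to the cheapest neighbor
    that is either non-tabu or aspired (better than the best so far).
    `neighbors(s)` yields (neighbor, move) pairs; the move is what gets
    declared tabu for `tenure` iterations."""
    s = best = start
    tabu = {}                                    # move -> iteration it expires
    for it in range(max_iters):
        candidates = [(cost(n), n, m) for n, m in neighbors(s)
                      if tabu.get(m, -1) < it    # non-tabu ...
                      or cost(n) < cost(best)]   # ... or aspired
        if not candidates:
            break
        _, s, m = min(candidates, key=lambda t: t[0])
        tabu[m] = it + tenure                    # forbid this move for a while
        if cost(s) < cost(best):
            best = s
    return best
```

Here the "move" is simply the value moved to, so recently visited solutions are forbidden, which is what breaks cycling.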
44
Tabu Search Algorithm
[Flowchart:
Step 1: Set the current solution
Step 2: Define the neighborhood
Step 3: The objective function is used to evaluate each of the neighbors; the tabu list and aspiration criteria are consulted
Step 4: A move operates on the best-picked neighbor to generate a new solution
Step 5: Update the tabu list and trigger any related events
Step 6: If the stopping conditions are met, tabu search stops]
45
Example: Constrained MST Problem
[Figure: a graph on edges x1, …, x7 with weights 6, 2, 0, 8, 12, 18, 9]
Constraints:
(1) x1 + x2 + x6 ≤ 1
(2) x1 ≤ x3
Penalty for each constraint violation = 50
46
Example
• Neighborhood: standard "edge swap"
• An edge is tabu if it was added within the last two iterations
• The aspiration criterion is satisfied if the tabu move would create a tree that is better than the best tree so far
47
Example
• Iteration 1 (initial MST)
[Figure: the initial tree, cost = 16 + 100 (two constraint violations); one edge is added and one dropped]
48
Example
• Iteration 2
[Figure: cost = 28; one edge is now tabu; another add/drop swap is applied]
49
Example
• Iteration 3
[Figure: cost = 32; two edges are tabu; a drop/add swap is applied. Edge x3 is aspired.]
50
Example
• Iteration 4
[Figure: cost = 23; two edges are tabu]
51
Tabu Search for TSP: Tabu List
Proposal 1: Tabu-ing the solution
• Record the complete tour
• Strictly prevents solution cycling
• Verification requires checking all rotations
Proposal 2: Tabu-ing the move
• Record the recently made moves
• Easy to verify
• Less restrictive: does not prevent cycling entirely
Proposal 3: Tabu-ing the objective value
• Record the solution's objective value
• Easy to verify
• More restrictive: may accidentally miss out good solutions
52
Tabu Search: Advanced Strategies
Reactive Tabu List
• Tabu tenure varies according to the search history
• Lengthen the tenure if a series of poor solutions is encountered
• Shorten the tenure if a good (elite) solution is encountered
Intensification
• Focus on solutions with "good" characteristics
• The idea is to find better solutions around elite solutions
• Usually applied at local optima
Diversification
• Force the search to move away from specific characteristics
• The idea is to explore new regions of the search space
• Usually applied when the potential of finding a better solution is low
53
Simulated Annealing

Let T be non-negative
Loop:
  pick a random neighbor s' in N(s)
  let D = f(s) − f(s')  (the improvement)
  if D > 0 (better), s = s'
  else, with probability e^{D/T}, set s = s'

As T tends to 0: hill climbing
As T tends to ∞: random walk
In general, start with a large T and decrease it slowly.

Kirkpatrick et al. (1983); Metropolis et al. (1953)
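The loop above, with a geometric cooling schedule, in Python (an illustrative sketch; the parameter values and names are arbitrary assumptions):

```python
import math
import random

def simulated_annealing(start, f, neighbors, t0=10.0, alpha=0.95,
                        steps=2000, seed=0):
    """Simulated annealing for minimizing f: always accept an improving
    neighbor; accept a worsening one with probability e^(D/T) where
    D = f(s) - f(s') <= 0, while T cools geometrically."""
    rng = random.Random(seed)
    s = best = start
    t = t0
    for _ in range(steps):
        s2 = rng.choice(neighbors(s))
        d = f(s) - f(s2)                 # D > 0 means s2 is an improvement
        if d > 0 or rng.random() < math.exp(d / t):
            s = s2
        if f(s) < f(best):
            best = s                     # track the best solution seen
        t = max(t * alpha, 1e-9)         # geometric cooling schedule
    return best

# Toy run: minimize (x - 3)^2 over the integers with +/-1 moves.
best = simulated_annealing(40, lambda x: (x - 3) ** 2,
                           lambda x: [x - 1, x + 1])
```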
54
Intuition Behind Simulated Annealing
• A stochastic variation on hill climbing in which downhill moves can be made
• The probability of making a downhill move decreases with time (the length of the exploration path from the start state)
• Modeled after the physical process of annealing metals (cooling molten metal to a solid, minimal-energy state)
– solutions = states
– cost of a solution = energy of a state
55
Intuition Behind Simulated Annealing
• During the annealing process in metals, there is a probability p that a transition to a higher-energy (sub-optimal) state occurs. This probability is e^{dE/T}, where
– dE = (energy of previous state) − (energy of current state) (it is < 0)
– T = temperature of the metal
• p is higher when T is higher; movement to higher-energy states becomes less likely as the temperature comes down
• The rate at which the system is cooled is called the annealing schedule
56
Effect of Varying dE for fixed T = 10

dE    e^{dE/10}
0     1.00
-13   0.27
-43   0.01

The smaller (more negative) the value of dE, the bigger the downhill step, and the lower the probability of taking this downhill move.
57
Effect of Varying T for fixed dE = -13
0.9999…1010
0.5650
0.0000021e -13/TT
The greater thevalue of T, the smaller the relativeimportance of dEand the higher theprobability of choosing a downhillmove.
58
Logistic (Sigmoid) Function
• Select any move (uphill or downhill) with probability 1/(1 + e^{−dE/T})
[Figure: p plotted against dE from −20 to 20, with p = 0.5 at dE = 0; for T very high the curve flattens, for T near 0 it approaches a step function]
59
Annealing Schedule

After each step, how should T be updated?
• Logarithmic: T_i = γ / log(i + 2)
– provable properties, slow
• Geometric: T_i = T_0 · γ^⌊i/L⌋
– lots of parameters to tune
• Adaptive
– change T_i dynamically
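The first two schedules are one-liners (a sketch; γ, T_0 and L are tuning parameters, and the values used here are arbitrary):

```python
import math

def logarithmic(i, gamma=10.0):
    """T_i = gamma / log(i + 2): provably convergent, but cools very slowly."""
    return gamma / math.log(i + 2)

def geometric(i, t0=10.0, gamma=0.95, length=100):
    """T_i = t0 * gamma^(i // L): hold T for L steps, then multiply by gamma."""
    return t0 * gamma ** (i // length)
```

An adaptive schedule would instead update T from the state of the search, e.g. from the observed acceptance rate.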
60
Properties of Simulated Annealing
• If the annealing schedule lowers T slowly enough, the algorithm can find the global optimum
• If a large running time is allowed, SA generally outperforms all other meta-heuristics with respect to the quality of solutions
61
Population-Based Meta-HeuristicsHybrid Meta-Heuristics
Next Week…