
Search algorithms for discrete optimization


Page 1: Search algorithms for discrete optimization

Search Algorithms for Discrete Optimization

Chapter 13, from the book Introduction to Parallel Computing, Second Edition

Eng. sallysalem@gmail.com, 2013

Page 2: Search algorithms for discrete optimization

Agenda
• Definitions
• Sequential Search Algorithms
  – Depth-First Search
  – Best-First Search
  – Example of DFS
• Parallel Search Algorithms
  – Depth-First Search
  – Best-First Search
  – Example

Page 3: Search algorithms for discrete optimization

Definitions

• A discrete optimization problem can be expressed as a tuple (S, f). The set S is a finite or countably infinite set of all solutions that satisfy specified constraints.

• The function f is the cost function that maps each element in set S onto the set of real numbers R.

• The objective of a DOP is to find a feasible solution xopt such that f(xopt) ≤ f(x) for all x ∈ S.

f : S → R

Page 4: Search algorithms for discrete optimization

Discrete Optimization: Example
• The 8-puzzle problem consists of a 3×3 grid containing eight tiles, numbered one through eight.

• One of the grid segments (called the "blank") is empty. A tile can be moved into the blank position from a position adjacent to it, thus creating a blank in the tile's original position.

• The goal is to move from a given initial position to the final position in a minimum number of moves.

Page 5: Search algorithms for discrete optimization

Discrete Optimization: Example

(Figure: the initial configuration, the final configuration, and a sequence of moves leading from the initial to the final configuration.)

Page 6: Search algorithms for discrete optimization

Discrete Optimization: Example

The set S for this problem is the set of all sequences of moves that lead from the initial to the final configurations.

Page 7: Search algorithms for discrete optimization

Discrete Optimization: Example

The cost function f of an element in S is defined as the number of moves in the sequence.

Page 8: Search algorithms for discrete optimization

Discrete Optimization Basics
• heuristic estimate h(x): an estimate of the cost to reach the goal state from an intermediate state x
• heuristic function: h
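For concreteness, a minimal Python sketch of one common heuristic for the 8-puzzle: the sum of Manhattan distances of the tiles from their goal positions (the tuple encoding of the board, with 0 for the blank, is an assumption made for illustration):

    def manhattan_h(state, goal):
        """Sum of Manhattan distances of each tile (blank excluded) from its goal
        cell. Boards are tuples of 9 ints in row-major order; 0 is the blank."""
        dist = 0
        for tile in range(1, 9):
            i, j = divmod(state.index(tile), 3)
            gi, gj = divmod(goal.index(tile), 3)
            dist += abs(i - gi) + abs(j - gj)
        return dist

    # Example: a board one move away from the goal.
    goal = (1, 2, 3, 4, 5, 6, 7, 8, 0)
    start = (1, 2, 3, 4, 5, 6, 7, 0, 8)
    print(manhattan_h(start, goal))  # -> 1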

Page 9: Search algorithms for discrete optimization

Sequential Search Algorithms

Page 10: Search algorithms for discrete optimization

Depth-First Search Algorithms
• Applies to search spaces that are trees.

• DFS begins by expanding the initial node and generating its successors. In each subsequent step, DFS expands one of the most recently generated nodes.

• If a node has no unexplored successors, DFS backtracks to the parent and explores an alternate child.

• Often, successors of a node are ordered based on their likelihood of reaching a solution. This is called directed DFS.

• The main advantage of DFS is that its storage requirement is linear in the depth of the state space being searched.
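A minimal Python sketch of stack-based DFS (the successors and is_goal functions are assumed, not from the slides); the stack of untried nodes is what keeps the storage linear in the search depth:

    def dfs(initial, successors, is_goal):
        """Depth-first search: always expand the most recently generated node."""
        stack = [initial]          # untried nodes, most recent on top
        visited = {initial}        # needed for graphs; redundant for trees
        while stack:
            node = stack.pop()     # most recently generated node
            if is_goal(node):
                return node
            for child in successors(node):
                if child not in visited:
                    visited.add(child)
                    stack.append(child)
        return None                # search space exhausted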

Page 11: Search algorithms for discrete optimization

Depth-First Search Algorithms

(Figure: states resulting from the first three steps of depth-first search applied to an instance of the 8-puzzle.)

Page 12: Search algorithms for discrete optimization

Depth-First Search Algorithms
1. Simple Backtracking

(Diagram: from the start node, the search descends branch by branch, backtracking from several dead ends until one branch ends in success.)

Page 13: Search algorithms for discrete optimization

Depth-First Search Algorithms
1. Simple Backtracking
• Simple backtracking performs DFS until it finds the first feasible solution, then terminates.
• It is not guaranteed to find a minimum-cost solution.

Page 14: Search algorithms for discrete optimization

Depth-First Search Algorithms
2. Depth-First Branch-and-Bound (DFBB)
• Exhaustively searches the state space: it continues to search even after finding a solution path.
• Whenever it finds a new, better solution path, it updates the current best solution path.
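A minimal Python sketch of DFBB, assuming caller-supplied successors, is_goal, and cost functions, and assuming cost never decreases along a path so that bounding is safe:

    def dfbb(initial, successors, is_goal, cost):
        """Depth-first branch-and-bound. Assumes cost never decreases along a
        path, so any node at or above the current best cost can be pruned."""
        best, best_cost = None, float("inf")
        stack = [initial]
        while stack:
            node = stack.pop()
            if cost(node) >= best_cost:
                continue               # bound: this subtree cannot improve the best
            if is_goal(node):
                best, best_cost = node, cost(node)  # better solution; keep searching
            else:
                stack.extend(successors(node))
        return best, best_cost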

Page 15: Search algorithms for discrete optimization

Depth-First Search Algorithms
3. Iterative Deepening Depth-First Search (ID-DFS)

With depth bound 0, only the root F is examined. F is not the goal, so the search backs up and the depth bound is increased.

Page 16: Search algorithms for discrete optimization

Depth-First Search Algorithms
3. Iterative Deepening Depth-First Search (ID-DFS)

With depth bound 1, B is examined; B is not the goal, so the search moves on to G. When no node within the bound is the goal, the search backs up and increases the depth bound again, and so on until the goal is reached.

Page 17: Search algorithms for discrete optimization

Depth-First Search Algorithms
3. Iterative Deepening Depth-First Search (ID-DFS)
• Used for trees with deep search spaces: we impose a bound on the depth to which the DFS algorithm searches.
• If a solution is not found, the entire state space is searched again using a larger depth bound.
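A minimal Python sketch of ID-DFS under the same assumed successors/is_goal interface; a depth-limited DFS is simply restarted with a growing bound:

    def iddfs(initial, successors, is_goal, max_depth=50):
        """Iterative deepening: repeat depth-limited DFS with a growing bound."""
        def dls(node, limit):
            if is_goal(node):
                return node
            if limit == 0:
                return None            # bound reached: back up
            for child in successors(node):
                found = dls(child, limit - 1)
                if found is not None:
                    return found
            return None
        for depth in range(max_depth + 1):   # depth = 0, 1, 2, ...
            found = dls(initial, depth)
            if found is not None:
                return found
        return None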

Page 18: Search algorithms for discrete optimization

Depth-First Search Algorithms

Representing a DFS tree: (a) the DFS tree, in which successor nodes shown with dashed lines have already been explored; (b) the stack storing untried alternatives only; and (c) the stack storing untried alternatives along with their parents. The shaded blocks represent the parent state, and the block to the right represents successor states that have not been explored.

Page 19: Search algorithms for discrete optimization

Best-First Search Algorithms
• Each step has a weight, and there is a total weight bound; which node is selected next depends on these weights.

(Diagram walkthrough: initially only F is available, with no accumulated weight, so F is expanded, generating B (weight W0) and G (weight W1). If neither B nor G is the goal and both weights are below the total weight, B and G are expanded in turn. The path with weight W0 + W2 stays below the total weight, so the search moves on to I; the path with weight W0 + W3 exceeds the total weight, so node D is removed.)

Page 20: Search algorithms for discrete optimization

Best-First Search Algorithms
• Can search both graphs and trees.
• BFS algorithms use a heuristic to guide the search.
• BFS maintains two lists: open and closed.
• At the beginning, the initial node is placed on the open list.
• The open list is sorted according to a heuristic evaluation function.
• In each step of the search, the most promising node is removed from the open list.
• If this node is a goal node, the algorithm terminates.
• Otherwise, the node is expanded and placed on the closed list.
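A minimal Python sketch of best-first search, with the open list kept as a heap and the heuristic h assumed to be supplied by the caller:

    import heapq, itertools

    def best_first(initial, successors, is_goal, h):
        """Best-first search with an open list (priority queue) and a closed list."""
        tiebreak = itertools.count()           # avoids comparing nodes on equal h
        open_list = [(h(initial), next(tiebreak), initial)]
        closed = set()
        while open_list:
            _, _, node = heapq.heappop(open_list)   # most promising node
            if is_goal(node):
                return node
            if node in closed:
                continue                       # already expanded via another path
            closed.add(node)                   # expanded nodes go on the closed list
            for child in successors(node):
                if child not in closed:
                    heapq.heappush(open_list, (h(child), next(tiebreak), child))
        return None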

Page 21: Search algorithms for discrete optimization

Search Overhead Factor
• Let W be the amount of work done by a single processor and Wp the total amount of work done by p processors.

• The search overhead factor of the parallel system is defined as the ratio of the work done by the parallel formulation to that done by the sequential formulation: Wp/W.

• The upper bound on speedup is p × (W/Wp).
  – The actual speedup may be lower due to other parallel processing overheads.
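As a worked example (illustrative numbers, not from the book): if the sequential formulation does W = 1,000 units of work and p = 4 processors together do Wp = 1,200 units, the search overhead factor is 1,200/1,000 = 1.2, and the speedup is bounded above by 4 × (1,000/1,200) ≈ 3.33.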

Page 22: Search algorithms for discrete optimization

Example for DFS

Page 23: Search algorithms for discrete optimization

Parallel Search Algorithms

Page 24: Search algorithms for discrete optimization

Parallel Depth-First Search
• The critical issue in parallel depth-first search algorithms is the distribution of the search space among the processors.

• With two processors (A and B), static partitioning can leave one processor idle for a significant amount of time, reducing efficiency.

• With a larger number of processors (adding C, D, E, F), static partitioning of unstructured trees yields poor performance.

Page 25: Search algorithms for discrete optimization

Parallel Depth-First Search
• Dynamic load balancing: when a processor runs out of work, it gets more work from another processor that has work.

Page 26: Search algorithms for discrete optimization

Parallel Depth-First Search: Dynamic Load Balancing

A generic scheme for dynamic load balancing.

Page 27: Search algorithms for discrete optimization

Parallel Depth-First Search: Dynamic Load Balancing

• Steps:
  – Each processor can be in one of two states: active or idle.
  – In message-passing architectures, an idle processor selects a donor processor and sends it a work request.
  – If the idle processor receives work (part of the state space to be searched) from the donor processor, it becomes active.
  – If it receives a reject message (because the donor has no work), it selects another donor and sends a work request to that donor.
  – This process repeats until the processor gets work or all processors become idle.

Page 28: Search algorithms for discrete optimization

Important Parameters of Parallel DFS
• Two characteristics of parallel DFS are critical to determining its performance:
  1. The method for splitting work at a processor.
  2. The scheme for determining the donor processor when a processor becomes idle.

Page 29: Search algorithms for discrete optimization

Parameters in Parallel DFS: Work Splitting

(Figure: splitting the DFS tree; the two subtrees, along with their stack representations, are shown in (a) and (b).)

Page 30: Search algorithms for discrete optimization

Parameters in Parallel DFS: Work Splitting

• Work is split by splitting the stack into two.

• If too little work is sent, the recipient quickly becomes idle; if too much, the donor becomes idle.

• Ideally, we do not want either of the split pieces to be small.

• Half-split: the stack is split into two equal pieces.

• Cutoff depth: to avoid sending very small amounts of work, nodes beyond a specified stack depth are not given away.
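A minimal Python sketch of a half-split with a cutoff depth; representing the stack as a list of levels, each holding the untried sibling nodes at that depth, is an assumption made for illustration:

    CUTOFF_DEPTH = 20   # assumed value; tuned per problem in practice

    def half_split(stack):
        """Half-split sketch: donate roughly half of the untried alternatives at
        each level, but never from levels beyond the cutoff depth."""
        donated = []
        for depth, level in enumerate(stack):
            if depth >= CUTOFF_DEPTH:
                break                        # nodes this deep represent too little work
            keep = len(level) - len(level) // 2
            donated.append(level[keep:])     # give away about half of this level
            del level[keep:]
        return donated                       # the recipient builds its stack from this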

Page 31: Search algorithms for discrete optimization

Parameters in Parallel DFS: Load-Balancing Schemes

1. Asynchronous round robin
2. Global round robin
3. Random polling

• Each of these schemes can be coded for message passing as well as shared address space machines.

Page 32: Search algorithms for discrete optimization

Parameters in Parallel DFS: Load-Balancing Schemes

• Asynchronous round robin (ARR): Each processor maintains a counter and makes requests in a round-robin fashion.

• Global round robin (GRR): The system maintains a global counter and requests are made in a round-robin fashion, globally.

• Random polling (RP): an idle processor requests work from a randomly selected processor.
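Minimal Python sketches of the three donor-selection rules (p is the number of processors and me is the local rank; in a real system the GRR counter would live in shared memory and need synchronization):

    import random

    def arr_donor(counter, p):
        """Asynchronous round robin: each processor advances its own private
        counter (here a one-element list) in round-robin order."""
        donor = counter[0]
        counter[0] = (counter[0] + 1) % p
        return donor

    def grr_donor(global_counter, p):
        """Global round robin: one counter shared by all processors; it needs
        synchronization and becomes a contention point."""
        donor = global_counter[0]
        global_counter[0] = (global_counter[0] + 1) % p
        return donor

    def rp_donor(me, p):
        """Random polling: pick any processor other than this one, uniformly."""
        return random.choice([q for q in range(p) if q != me])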

Page 33: Search algorithms for discrete optimization

A General Framework for Analysis of Parallel DFS

• For each load-balancing strategy, we define V(p) as the total number of work requests after which each processor has received at least one work request (note that V(p) ≥ p).

• Assume that the largest piece of work at any point is W, and that each work transfer moves at least a fraction α of the donor's work.

• After V(p) requests, the maximum work remaining at any processor is less than (1 − α)W; after 2V(p) requests, it is less than (1 − α)²W.

• After (log_{1/(1−α)}(W/ε)) V(p) requests, the maximum work remaining at any processor is below a threshold value ε.

• The total number of work requests is O(V(p) log W).

Page 34: Search algorithms for discrete optimization

A General Framework for Analysis of Parallel DFS

• If tcomm is the time required to communicate a piece of work, then the communication overhead To is given by

To = tcomm V(p) log W

• The corresponding efficiency E is given by

E = 1 / (1 + To/W)

Page 35: Search algorithms for discrete optimization

A General Framework for Analysis of Parallel DFS

• To depends on two values: tcomm and V(p). The value of tcomm is determined by the underlying architecture, and the function V(p) is determined by the load-balancing scheme.

• Next, we compute V(p) for the various load-balancing schemes.

Page 36: Search algorithms for discrete optimization

A General Framework for Analysis of Parallel DFS

• Asynchronous round robin: in the worst case, all processors issue work requests at the same time to the same processor, so V(p) = O(p²).

• Global round robin: all processors receive requests in sequence, so V(p) = p.

• Random polling: the worst-case V(p) is unbounded; the average case is V(p) = O(p log p).

Page 37: Search algorithms for discrete optimization

Analysis of Load-Balancing Schemes: Conclusions

• Asynchronous round robin has poor performance because it makes a large number of work requests.

• Global round robin has poor performance because of contention at the shared counter, although it makes the fewest requests.

• Random polling strikes a desirable compromise.

Page 38: Search algorithms for discrete optimization

Termination Detection
• How do you know when everyone's done?
• A number of algorithms have been proposed.
• Techniques:
  1. Dijkstra's token termination detection
  2. Tree-based termination detection

Page 39: Search algorithms for discrete optimization

Dijkstra's Token Termination Detection

(Diagram: processors P0 through P4 arranged in a ring.)

When P0 goes idle, it colors itself green and initiates a green token. If Pi is idle, the token moves on to the next processor. The algorithm terminates when processor P0 receives a green token and is itself idle.

Page 40: Search algorithms for discrete optimization

Dijkstra's Token Termination Detection • Now, let us do away with the restriction on work transfers.

• When processor P0 goes idle, it colors itself green and initiates a green token.

• If processor Pj sends work to processor Pi and j > i then processor Pj becomes red.

• If processor Pi has the token and Pi is idle, it passes the token to Pi+1. If Pi is red, then the color of the token is set to red before it is sent to Pi+1. If Pi is green, the token is passed unchanged.

• After Pi passes the token to Pi+1, Pi becomes green.

• The algorithm terminates when processor P0 receives a green token and is itself idle.
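A minimal Python sketch encoding just these token rules (the surrounding machinery for detecting idleness and transferring work is abstracted away):

    GREEN, RED = "green", "red"

    def on_send_work(color, j, i):
        """Rule: if Pj sends work to Pi with j > i, Pj becomes red."""
        if j > i:
            color[j] = RED

    def on_pass_token(color, i, token):
        """An idle Pi forwards the token to P(i+1): the token is dyed red if Pi
        is red, and Pi becomes green after passing it. Returns the token color."""
        if color[i] == RED:
            token = RED
        color[i] = GREEN
        return token

    def p0_may_terminate(token, p0_is_idle):
        """P0 declares termination when it is idle and receives a green token."""
        return p0_is_idle and token == GREEN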

Page 41: Search algorithms for discrete optimization

Tree-Based Termination Detection
• Associate weights with individual pieces of work. Initially, processor P0 has all the work and a weight of one.

• Whenever work is partitioned, the weight is split in half and sent along with the work.

• When a processor finishes its work, it sends the associated weight back to its parent (the processor it received the work from).

• Termination is signaled when the weight at processor P0 becomes 1 again.

• Note that underflow and finite precision are important factors associated with this scheme.
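A minimal Python sketch of the weight bookkeeping; using exact rational weights is one way to sidestep the underflow and finite-precision issue noted above (an assumption for illustration, not the book's prescription):

    from fractions import Fraction

    weight = {0: Fraction(1)}        # P0 starts with all the work and weight 1

    def transfer(donor, recipient):
        """Partition work: half the donor's weight travels with the work."""
        weight[donor] /= 2
        weight[recipient] = weight.get(recipient, Fraction(0)) + weight[donor]

    def finish(worker, parent):
        """A finished processor returns its weight to its parent."""
        weight[parent] += weight[worker]
        weight[worker] = Fraction(0)

    # Example: P0 gives half its work to P1; P1 finishes and returns the weight.
    transfer(0, 1)
    finish(1, 0)
    print(weight[0] == 1)            # True -> termination is signaled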

Page 42: Search algorithms for discrete optimization

Tree-Based Termination Detection

Tree-based termination detection. Steps 1–6 illustrate the weights at the various processors after each work transfer.

Page 43: Search algorithms for discrete optimization

Parallel Best-First Search
• The core data structure is the open list (typically implemented as a priority queue).

• Each processor locks this queue, extracts the best node, unlocks it.

• Successors of the node are generated, their heuristic values estimated, and the nodes inserted into the open list as necessary, after appropriate locking.

• Termination is signaled when we find a solution whose cost is better than the best heuristic value on the open list.

• Since we expand more than one node at a time, we may expand nodes that would not be expanded by a sequential algorithm.
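A minimal shared-memory Python sketch of the centralized strategy, with one lock serializing access to the shared open list (successors, is_goal, and h are assumed as before, and termination handling is deliberately simplified):

    import heapq, itertools, threading

    def parallel_best_first(initial, successors, is_goal, h, n_threads=4):
        """Centralized parallel best-first search: one shared open list
        guarded by a lock."""
        tiebreak = itertools.count()
        open_list = [(h(initial), next(tiebreak), initial)]
        closed, lock, found = set(), threading.Lock(), []

        def worker():
            while not found:
                with lock:                     # serialize open-list access
                    if not open_list:
                        return                 # simplification: real code must
                                               # distinguish "empty" from "all idle"
                    _, _, node = heapq.heappop(open_list)
                    if node in closed:
                        continue
                    closed.add(node)
                if is_goal(node):
                    found.append(node)
                    return
                children = successors(node)
                with lock:                     # lock again to insert successors
                    for c in children:
                        if c not in closed:
                            heapq.heappush(open_list, (h(c), next(tiebreak), c))

        threads = [threading.Thread(target=worker) for _ in range(n_threads)]
        for t in threads: t.start()
        for t in threads: t.join()
        return found[0] if found else None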

Page 44: Search algorithms for discrete optimization

Parallel Best-First Search

A general schematic for parallel best-first search using a centralized strategy. The locking operation is used here to serialize queue access by the various processors.

Page 45: Search algorithms for discrete optimization

Parallel Best-First Tree Search
Three strategies for tree search:
1. Random communication strategy
2. Ring communication strategy
3. Blackboard communication strategy

Page 46: Search algorithms for discrete optimization

Parallel Best-First Tree Search
1. Random communication strategy
• Each processor periodically sends some of its best nodes to the open list of a randomly selected processor.

• This strategy ensures that, if a processor stores a good part of the search space, the others get part of it.

• The cost of communication determines the node transfer frequency.

Page 47: Search algorithms for discrete optimization

Parallel Best-First Tree Search
2. Ring communication strategy
• In the ring communication strategy, the processors are mapped onto a virtual ring.

• Each processor periodically exchanges some of its best nodes with the open lists of its neighbors in the ring.

• This strategy can be implemented on message passing as well as shared address space machines with the processors organized into a logical ring.

• The cost of communication determines the node transfer frequency.

Page 48: Search algorithms for discrete optimization

Parallel Best-First Tree Search
2. Ring communication strategy

A message-passing implementation of parallel best-first search using the ring communication strategy.

Page 49: Search algorithms for discrete optimization

Parallel Best-First Tree Search
3. Blackboard communication strategy
• There is a shared blackboard through which nodes are exchanged among processors, as follows.

• After selecting the best node from its local open list, a processor expands the node only if its value is within a tolerable limit of the best node on the blackboard.

• If the selected node is much better than the best node on the blackboard, the processor sends some of its best nodes to the blackboard before expanding the current node.

• If the selected node is much worse than the best node on the blackboard, the processor retrieves some good nodes from the blackboard and reselects a node for expansion.

• The blackboard strategy is suited only to shared-address-space computers, because the value of the best node in the blackboard has to be checked after each node expansion.

Page 50: Search algorithms for discrete optimization

Parallel Best-First Tree Search
3. Blackboard communication strategy

An implementation of parallel best-first search using the blackboard communication strategy.

Page 51: Search algorithms for discrete optimization

Speedup Anomalies in Parallel Search
• Since the search space explored by the processors is determined dynamically at runtime, the actual work performed can vary significantly.

• Executions yielding speedups greater than p by using p processors are referred to as acceleration anomalies. Speedups of less than p using p processors are called deceleration anomalies.

• Speedup anomalies also manifest themselves in best-first search algorithms.

• If the heuristic function is good, the work done in parallel best-first search is typically more than that in its serial counterpart.

Page 52: Search algorithms for discrete optimization

Speedup Anomalies in Parallel Search

The difference in the number of nodes searched by sequential and parallel formulations of DFS. For this example, parallel DFS reaches a goal node after searching fewer nodes than sequential DFS.

Page 53: Search algorithms for discrete optimization

Speedup Anomalies in Parallel Search

A parallel DFS formulation that searches more nodes than its sequential counterpart.

Page 54: Search algorithms for discrete optimization

Example

Page 55: Search algorithms for discrete optimization

Questions

Page 56: Search algorithms for discrete optimization

Questions
1. How is the search overhead factor calculated? Ans: Page 21
2. What are the termination detection techniques? Ans: Pages 38, 39, 40, 41, 42
3. Describe the effect of speedup anomalies in parallel search, with examples. Ans: Pages 59, 60, 61
4. Write a sequential search program. Ans: Page 22
5. Write a parallel search program. Ans: Page 54

Page 57: Search algorithms for discrete optimization

Thank you