Massively Parallel Seesaw Search for Minimum Vertex Cover
by
Mansi Nahar
A Project Report Submitted in
Partial Fulfillment of the Requirements for the Degree of
Master of Science in
Computer Science
Supervised by
Prof. Alan Kaminsky
Department of Computer Science
B. Thomas Golisano College of Computing and Information Sciences
Rochester Institute of Technology
Rochester, New York
May 2017
Dedication
I wouldn't have come this far in life without the constant support, both financial and emotional, of my parents and my brother. So, this one is to them!
Acknowledgments
The completion of this project wouldn’t have been possible without the guidance provided
by Prof. Alan Kaminsky. I am also grateful to my friends and family members for their
constant support and understanding.
Abstract
Massively Parallel Seesaw Search for Minimum Vertex Cover
Mansi Nahar
Supervising Professor: Prof. Alan Kaminsky
Graph theory has always been a key research area in computer science, but recently, with the increase in the amount of data, there has been more focus on graph problems for mining data from large-scale/massive graphs. These graphs include web networks, social media user-interaction networks, road traffic networks, etc. A lot of interesting approaches and results have been obtained in the last few decades. In the area of research on various NP-hard problems, Minimum Vertex Cover is one of the fundamental problems due to its applications in computational biochemistry, scheduling problems, surveillance system deployment, etc. Minimum Vertex Cover can also be directly related to other NP-hard problems, namely Maximum Independent Set and Maximum Clique, which have their own applications. Although over the years many different approaches have been tried for this problem, not a lot of effort has been put toward making use of the high computational power of parallel systems. In this paper, we propose a massively parallel seesaw search for Minimum Vertex Cover.
Contents
Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
3 Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
   3.0.1 Branch and Bound . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
   3.0.2 Seesaw Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
4 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
   4.0.1 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
   5.1 Lessons Learned . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
   5.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
List of Tables
4.1 Results for various BHOSLIB [16] graphs . . . . . . . . . . . . . . . . . . 25
4.2 Branch & Bound and Intelligent Seesaw search results for real world graphs . 26
4.3 Intelligent Seesaw search results for real world graphs . . . . . . . . . . . . 26
List of Figures
1.1 Vertex Covers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
3.1 A graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.2 Branch and bound . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.3 Branch and bound with pruning . . . . . . . . . . . . . . . . . . . . . . . . 10
4.1 Weak Scaling for Naive Seesaw search with 100000 steps . . . . . . . . . . 20
4.2 Weak Scaling for Naive Seesaw search with 1000000 steps . . . . . . . . . . 21
4.3 Weak Scaling for Seesaw Intelligent with 1000 steps . . . . . . . . . . . . . 22
4.4 Weak Scaling for Seesaw Intelligent with 10000 steps . . . . . . . . . . . . 23
Chapter 1
Introduction
For the purpose of this paper, we use the following vocabulary. Consider an undirected graph G = (V,E), where V is the set of vertices and E is the set of edges in the graph. A vertex cover of a graph is a subset of vertices in that graph such that each edge is incident to at least one vertex in that subset. If a vertex cover is such that removing any single vertex from it would make it no longer a vertex cover, then it is known as a minimal vertex cover, whereas a minimum vertex cover is a vertex cover with the minimum number of vertices in it. Consider Figure 1.1. It shows three vertex covers for the same undirected graph; in each, the vertices in blue are the vertices in the vertex cover. The first represents a vertex cover for the graph, the second a minimal vertex cover, and the third a minimum vertex cover.
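These definitions can be illustrated with a short Java sketch. The graph below is a hypothetical 4-vertex example (a square with one diagonal), not the one in Figure 1.1, and the class and method names are illustrative:

```java
import java.util.*;

class VertexCoverDefs {
    // Edges of a small example graph: a 4-cycle 0-1-2-3 plus the diagonal 0-2.
    static final int[][] EDGES = {{0, 1}, {1, 2}, {2, 3}, {3, 0}, {0, 2}};

    // A set is a vertex cover if every edge has at least one endpoint in it.
    static boolean isCover(Set<Integer> s) {
        for (int[] e : EDGES)
            if (!s.contains(e[0]) && !s.contains(e[1]))
                return false;               // edge e is uncovered
        return true;
    }

    // A cover is minimal if removing any single vertex uncovers some edge.
    static boolean isMinimal(Set<Integer> s) {
        if (!isCover(s)) return false;
        for (int v : s) {
            Set<Integer> smaller = new HashSet<>(s);
            smaller.remove(v);
            if (isCover(smaller)) return false;
        }
        return true;
    }
}
```

On this graph {0, 2} is a minimum vertex cover, while {0, 1, 3} is minimal but not minimum: removing any one of its three vertices uncovers an edge, yet a smaller cover exists.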
Figure 1.1: Vertex Covers
Minimum Vertex Cover is a combinatorial optimization problem and is closely related to other NP-hard problems such as the Maximum Independent Set problem and the Maximum Clique problem. The Maximum Independent Set problem is the problem of finding a largest subset of vertices in a graph such that no two vertices in that subset are connected by an edge. The Maximum Clique problem, on the other hand, is to find a largest subset of vertices in the graph such that all the vertices in the subset are connected to each other. A Maximum Independent Set in a graph G is the complement of the vertices that are in a Minimum Vertex Cover of G, and a Maximum Clique in a graph G is the complement of the vertices that are in a Minimum Vertex Cover of the complement of G.
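The first of these complement relationships can be checked directly in code. The sketch below reuses a hypothetical graph and verifies that the vertices outside a minimum vertex cover form an independent set (names are illustrative):

```java
import java.util.*;

class ComplementRelations {
    static final int N = 4;
    // Same hypothetical graph: a 4-cycle plus the diagonal 0-2.
    static final int[][] EDGES = {{0, 1}, {1, 2}, {2, 3}, {3, 0}, {0, 2}};

    // An independent set contains no two vertices joined by an edge.
    static boolean isIndependentSet(Set<Integer> s) {
        for (int[] e : EDGES)
            if (s.contains(e[0]) && s.contains(e[1]))
                return false;       // both endpoints inside: not independent
        return true;
    }

    // The vertices of the graph that are NOT in the given set.
    static Set<Integer> complement(Set<Integer> cover) {
        Set<Integer> rest = new HashSet<>();
        for (int v = 0; v < N; v++)
            if (!cover.contains(v)) rest.add(v);
        return rest;
    }
}
```

Here {0, 2} is a minimum vertex cover, and its complement {1, 3} is an independent set.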
Minimum Vertex Cover and these related problems have real-world applications; one of them is in computational biochemistry [11]. In computational biochemistry there often occurs a situation where conflicts between sequences in a sample need to be resolved by excluding some of the sequences. What constitutes a conflict between sequences is assumed to be precisely defined in the biochemical context. In such a case, the sequences can be represented as the vertices of a graph, and an edge is added between two vertices if there exists a conflict between the two sequences represented by those vertices. A Minimum Vertex Cover can then be found in that graph to resolve all the conflicts by getting rid of the minimum number of sequences in the sample. Another very simple and commonly used example is the surveillance system. Suppose there is a building where a surveillance system needs to be set up for security reasons. The question is how many cameras are needed, and where should they be placed, such that each and every corner of the building is covered? If all corners of the building are represented as vertices and all lobbies as edges, then finding a minimum vertex cover for the graph gives the minimum number of cameras needed to make sure that all the lobbies of the building are covered by the surveillance system.
In this paper we discuss various algorithms for finding a Minimum Vertex Cover and implement them in a parallel fashion, making use of multiple cores to get better and faster results.
The remainder of the paper is organized as follows. In the next chapter we discuss what work has already been done in this area. In Chapter 3 we describe a branch and bound algorithm, talk a little about its parallel implementation, and also describe two seesaw search algorithms. In Chapter 4 we provide experimental evaluations, and finally, in Chapter 5, we provide some concluding remarks.
Chapter 2
Related Work
There are two different types of algorithms for solving any combinatorial optimization problem: exact algorithms and approximation algorithms. While exact algorithms guarantee to find an optimal solution, they often cannot do so in reasonable time. Approximation algorithms, on the other hand, which mainly include local search algorithms usually based on heuristics or randomness, do not guarantee an optimal solution. However, they can find optimal or satisfactory sub-optimal solutions in much less time, and for this reason local search algorithms are more appealing for solving Minimum Vertex Cover and related problems in large-scale graphs. The considerable interest in these local search algorithms can be seen in [13][4][6][5][3]. All of these papers introduce different heuristics and local search techniques for the Minimum Vertex Cover problem. For example, [13] proposes a stochastic local search algorithm for the Minimum Vertex Cover problem that solves the decision problem of finding a k-sized vertex cover; it performs a number of iterations, and in each iteration it exchanges two vertices. [4] proposes another local search algorithm known as EWLS (Edge Weighting Local Search) that makes use of edge weighting and other search techniques to improve the quality of local optima. EWLS has the drawback that it requires an instance-dependent parameter. [6] proposes a solution that uses EWLS with EWCC (Edge Weighted Configuration Checking) to get rid of the instance-dependent parameters by using induced subgraph information. These earlier local search algorithms have some drawbacks, such as selecting a pair of vertices to exchange simultaneously, which is very time consuming, or using edge weighting to diversify the search without any mechanism for decreasing those weights. In order to overcome these drawbacks, [5] proposes a solution with a two-stage vertex exchanging mechanism and edge weighting with forgetting. The edge weighting not only increases the weights of uncovered edges but also reduces the weights of all the edges periodically. Although all of these algorithms are good, the most popular one is the FastVC algorithm proposed in [3]. It is a simple and fast local search algorithm that uses two low-complexity heuristics and focuses on solving the Minimum Vertex Cover problem for massive graphs.
A lot of research has also gone into developing exact algorithms for such NP-hard problems [14][12][10][9][7]. All of this research focuses on solving the Maximum Clique problem, but with different approaches. [7] and [14] propose solutions that use an approximate coloring algorithm to provide upper bounds on the size of the maximum clique. [14] also makes use of vertex sorting to further improve the algorithm. On the other hand, [9] uses Maximum Satisfiability technology to provide upper bounds on the maximum clique. [10] proposes a solution that focuses on pruning strategies, using previously introduced methods in combination with some new ones. These improvements make it suitable for massive sparse graphs. [12] makes use of constraint programming to solve the Maximum Clique problem. Of all these various approaches, one of the most popular is the branch-and-bound method. This method considers the full configuration space as a tree. The algorithm then recursively explores the configuration space by deciding, at each level of the tree, the presence or absence of a vertex in the vertex cover. The algorithm stops when either a vertex cover is found or a bound condition is met. Some research has also been done on developing GPU-based algorithms for Minimum Vertex Cover [15], but unfortunately this work focuses on finding a minimal vertex cover rather than the minimum. Even though minimal vertex covers are useful, they are not the optimal solution.
In this paper we propose a parallel version of the FastVC algorithm, mentioned above,
to further improve its performance.
Chapter 3
Design
3.0.1 Branch and Bound
Branch and bound is an exact search design paradigm that is usually used to find optimal solutions for combinatorial optimization problems. It is essentially a systematic enumeration of all candidate solutions. The set of all solutions is thought of as a rooted tree, with the root being the full set and the branches being subsets of the solution set. The algorithm traverses each branch of the tree so as to go through every possible solution, but before enumerating the candidate solutions of a branch, it first checks the branch against the upper and lower bounds of the optimal solution. If the bound check fails, the branch is discarded and the algorithm continues with the rest of the branches. Since the branch and bound algorithm goes through every possible candidate solution, it is known as a complete algorithm; in other words, it guarantees to find an optimal solution.
The branch and bound algorithm enumerates the full configuration space by deciding, at each level, whether a vertex is in the vertex cover or not. The full configuration space can be thought of as a tree where each node decides on the presence or absence of one of the vertices. Therefore, each node in the tree has two branches: one corresponding to keeping the vertex under consideration in the vertex cover, and the other corresponding to removing it. The algorithm starts with the entire vertex set at the root. The full set is obviously a vertex cover. Then, at each level, it considers one of the vertices in the vertex cover that can be removed. It creates one branch where it keeps the vertex in the vertex cover; for the other branch, it first checks whether the set obtained by removing the vertex is still a cover. If not, there is no point in going down that branch, but if it is, it creates another branch where it removes the vertex from the vertex cover.
We make a few changes to this basic branch and bound technique to get the most out of it. At every level of the tree, along with the vertex cover, we maintain one more set, known as VerticesAllowedToRemove. This set contains the vertices that are allowed to be removed at any level. At every level, when a vertex is either kept or removed, that vertex is deleted from the VerticesAllowedToRemove set so that the same vertex is not considered again at deeper levels. Also, at any level, if a vertex is removed from the vertex cover, then the VerticesAllowedToRemove set is updated by deleting from it all of the removed vertex's adjacent vertices that are also present in the vertex cover. This can be done because, for a vertex that has been removed from a vertex cover, removing any of its adjacent vertices from the vertex cover would mean that the edge between those two vertices is no longer covered. Consider the graph in Figure 3.1. Figure 3.2 shows how the branch and bound tree would look for that graph. In each node, the first set represents the vertex cover under consideration and the second set represents the vertices that are allowed to be removed, i.e., the second set is the VerticesAllowedToRemove set.
The bound condition of the branch and bound algorithm is based on keeping track of the best (minimum) vertex cover seen so far. At every level, a question can be asked: even after removing all the vertices in the VerticesAllowedToRemove set, is there still a possibility of getting a better solution than what has already been seen? If not, then there is again no point in going down that branch. This check can be performed by subtracting the number of vertices in the VerticesAllowedToRemove set from the number of vertices in the vertex cover and then checking whether that difference is smaller than the best vertex cover size seen so far. Figure 3.3 shows how the branch and bound tree would look with the bound condition added. It can be seen from Figure 3.3 that the nodes in the red box are not visited, because the bound condition shows that enumerating those nodes cannot result in a better solution than what has already been seen.
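The bound check itself is a one-line predicate. The sketch below isolates it (the names are illustrative, not taken from the actual implementation):

```java
class BoundCheck {
    // Even if every vertex still allowed to be removed were removed,
    // could this branch produce a cover smaller than the best seen so far?
    static boolean worthExploring(int coverSize, int allowedToRemoveSize,
                                  int bestCoverSize) {
        return coverSize - allowedToRemoveSize < bestCoverSize;
    }
}
```

For example, a cover of size 10 with 4 removable vertices can at best shrink to size 6, so the branch is worth exploring only if the best cover seen so far has more than 6 vertices.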
Figure 3.1: A graph
Implementation Details of the Branch and Bound Algorithm
The branch and bound method requires to explore the entire solution tree. This tree can
be traversed using two approaches: breadth first search (BFS) and depth first search(DFS).
BFS is easy to parallelize. The search starts with the full set and then at every level it
creates two new subproblems by considering a vertex, where one path decides to keep the
vertex and other decides to remove (if after removing the vertex, the vertex set is still a
cover) the vertex from the previous set. It then adds these subproblems to a queue. After
that, each thread can now dequeue these subproblems, create new subproblems and add
them to the queue, until all the subproblems have been explored. While doing this, the BFS
search would keep track of the best solution. Although, this has a problem. Each node in
the tree has two child nodes and therefore, the entire tree would have 2V nodes, where V is
the number of vertices in the graph. With the bounding condition of the branch and bound
method this might reduce but in worse case it can still be 2V. Therefore, the amount of
storage required by BFS to store all possible subproblems in the queue is an exponential
function of V.
The other approach to explore the search space is DFS. DFS would start with the full set too, but at every level it would choose one path and go down the tree until the end of that path, then backtrack to the previous node, take a different path, and continue doing so until all the possible paths have been visited. Unlike BFS, DFS doesn't require exponential storage. In fact, it requires storage just proportional to V, but DFS is very difficult to parallelize. So, in order to explore the entire search tree, the best strategy is to use both, as proposed in [8].

Figure 3.2: Branch and bound
• Sequential approach

A sequential program would do this by performing a BFS search up to a certain threshold, i.e., starting with the full set and adding subproblems to the queue by considering some vertex in the full set and deciding whether that vertex should be part of the vertex cover or not. It would then continue to take subproblems out of the queue. If the threshold has not been reached yet, further subproblems are created and added to the queue, and the program continues doing this as long as the threshold has not been reached. Once the threshold has been reached, every time a subproblem is removed from the queue, the program performs a DFS search on that subproblem rather than BFS. Once done, it removes the next subproblem from the queue, performs DFS on it, and continues doing so until the queue is empty and there are no more subproblems to be examined. Performing a BFS search up to a certain threshold and then DFS on the rest of the subproblems ensures that the queue doesn't consume an excessive amount of storage space. The threshold to be used depends on the particular problem. Algorithm 1 lists the sequential search() function of the branch and bound algorithm in detail.

Figure 3.3: Branch and bound with pruning
Algorithm 1 Sequential branch and bound search
queue.add(V, V)
while not queue.empty() do
    vertexSet, verticesAllowedToRmv ← queue.dequeue()
    level ← |V| − |verticesAllowedToRmv|
    if level ≤ threshold then
        BFS(vertexSet, verticesAllowedToRmv)
    else
        DFS(vertexSet, verticesAllowedToRmv)
    end if
end while
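A minimal, self-contained Java sketch of this search is shown below. It implements only the DFS portion, with the VerticesAllowedToRemove sets and the bound condition, using int bitmasks for vertex sets (so it assumes |V| ≤ 32). The BFS/queue stage and the PJ2 machinery are elided, and the graph is a small hypothetical example:

```java
class BranchAndBoundSketch {
    static final int N = 5;
    // A small hypothetical graph: triangle 0-1-2 plus path 1-3-4-2.
    static final int[][] EDGES = {{0,1},{0,2},{1,2},{1,3},{2,4},{3,4}};
    static int bestCover = (1 << N) - 1;       // start from the full vertex set

    static boolean isCover(int set) {
        for (int[] e : EDGES)
            if (((set >> e[0]) & 1) == 0 && ((set >> e[1]) & 1) == 0)
                return false;                   // edge e is uncovered
        return true;
    }

    static boolean adjacent(int u, int v) {
        for (int[] e : EDGES)
            if ((e[0] == u && e[1] == v) || (e[0] == v && e[1] == u))
                return true;
        return false;
    }

    static void dfs(int cover, int allowed) {
        if (allowed == 0) return;
        int v = Integer.numberOfTrailingZeros(allowed);  // decide on vertex v
        int rest = allowed & ~(1 << v);
        // Branch 1: keep v in the cover. Prune if even removing every
        // remaining allowed vertex cannot beat the best cover seen so far.
        if (Integer.bitCount(cover) - Integer.bitCount(rest)
                < Integer.bitCount(bestCover))
            dfs(cover, rest);
        // Branch 2: remove v, provided the smaller set is still a cover.
        int smaller = cover & ~(1 << v);
        if (isCover(smaller)) {
            int rest2 = rest;
            for (int u = 0; u < N; u++)         // v's neighbours must now stay
                if (adjacent(u, v)) rest2 &= ~(1 << u);
            if (Integer.bitCount(smaller) < Integer.bitCount(bestCover))
                bestCover = smaller;
            dfs(smaller, rest2);
        }
    }

    static int minimumVertexCoverSize() {
        bestCover = (1 << N) - 1;
        dfs(bestCover, bestCover);
        return Integer.bitCount(bestCover);
    }
}
```

On this 5-vertex example the minimum vertex cover has size 3 (for instance {1, 2, 4}).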
• Parallel approach

A parallel program, on the other hand, can do this by using a parallel workQueue. A parallel workQueue is used for the parallel implementation of the branch and bound algorithm rather than the normal queue. The workQueue starts with the full set as the initial problem. The parallelFor loop goes through each subproblem in the workQueue. One of the threads picks up each of these subproblems, explores it, and either adds more subproblems to the workQueue or performs DFS to explore that entire branch, depending on whether the level of the search has reached the threshold or not. The search function is called as long as there are subproblems in the queue. Algorithm 2 lists the parallel search() function of the branch and bound algorithm in detail.

Algorithm 2 Parallel branch and bound search
workQueue.add(V, V)
parallelFor (workQueue)
    vertexSet, verticesAllowedToRmv ← workQueue.dequeue()
    level ← |V| − |verticesAllowedToRmv|
    if level ≤ threshold then
        BFS(vertexSet, verticesAllowedToRmv)
    else
        DFS(vertexSet, verticesAllowedToRmv)
    end if
End parallelFor
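The work-queue pattern itself can be sketched with plain java.util.concurrent standing in for PJ2's workQueue and parallelFor. The "work" below is deliberately trivial (counting the leaves of a full binary tree), but the structure mirrors the description above: subproblems are enqueued until a threshold level, each deep subproblem is finished with an in-place DFS, and workers run until nothing is pending. All names and constants are illustrative:

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.*;

class WorkQueueSketch {
    static final int DEPTH = 10;       // full binary tree of depth 10
    static final int THRESHOLD = 4;    // enqueue (BFS) until this level

    static long countLeaves(int nThreads) {
        ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<>();
        AtomicInteger pending = new AtomicInteger(1); // items not yet finished
        AtomicLong leaves = new AtomicLong();
        queue.add(0);                                 // root subproblem
        Runnable worker = () -> {
            while (pending.get() > 0) {
                Integer level = queue.poll();
                if (level == null) { Thread.yield(); continue; }
                if (level < THRESHOLD) {              // BFS: enqueue children
                    pending.addAndGet(2);
                    queue.add(level + 1);
                    queue.add(level + 1);
                } else {                              // DFS: finish in place
                    leaves.addAndGet(dfs(level));
                }
                pending.decrementAndGet();
            }
        };
        Thread[] ts = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) { ts[i] = new Thread(worker); ts[i].start(); }
        for (Thread t : ts) {
            try { t.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return leaves.get();
    }

    static long dfs(int level) {        // sequential exploration of a subtree
        if (level == DEPTH) return 1;
        return dfs(level + 1) + dfs(level + 1);
    }
}
```

A depth-10 tree has 2^10 = 1024 leaves, whichever thread count is used; the threshold only controls how much of the tree lives in the queue.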
Algorithm 3 Branch and Bound: BFS
function BFS(vertexSet, verticesAllowedToRmv)
    vertexToRemove ← verticesAllowedToRmv[0]
    verticesAllowedToRmv ← verticesAllowedToRmv \ vertexToRemove
    if |vertexSet| − |verticesAllowedToRmv| < |bestVertexCover| then
        queue.add(vertexSet, verticesAllowedToRmv)
    end if
    expVertexSet ← vertexSet \ vertexToRemove
    if isCover(expVertexSet) then
        updateVerticesAllowedToRemove(vertexToRemove, verticesAllowedToRmv)
        if |expVertexSet| − |verticesAllowedToRmv| < |bestVertexCover| then
            queue.add(expVertexSet, verticesAllowedToRmv)
        end if
        if |expVertexSet| < |bestVertexCover| then
            bestVertexCover ← expVertexSet
        end if
    end if
end function
Algorithm 4 Branch and Bound: DFS
function DFS(vertexSet, verticesAllowedToRmv)
    vertexToRemove ← verticesAllowedToRmv[0]
    verticesAllowedToRmv ← verticesAllowedToRmv \ vertexToRemove
    if |vertexSet| − |verticesAllowedToRmv| < |bestVertexCover| then
        DFS(vertexSet, verticesAllowedToRmv)
    end if
    expVertexSet ← vertexSet \ vertexToRemove
    if isCover(expVertexSet) then
        updateVerticesAllowedToRemove(vertexToRemove, verticesAllowedToRmv)
        if |expVertexSet| − |verticesAllowedToRmv| < |bestVertexCover| then
            if |expVertexSet| < |bestVertexCover| then
                bestVertexCover ← expVertexSet
            end if
            DFS(expVertexSet, verticesAllowedToRmv)
        end if
    end if
end function
Algorithm 5 Branch and Bound: Update Vertices Allowed to Remove
function updateVerticesAllowedToRemove(vertexToRemove, verticesAllowedToRmv)
    for each vertex in verticesAllowedToRmv do
        if vertexToRemove is adjacent to vertex then
            verticesAllowedToRmv ← verticesAllowedToRmv \ vertex
        end if
    end for
end function
3.0.2 Seesaw Search
The other type of algorithm for combinatorial optimization problems is the local search algorithm. Local search algorithms start at some point in the search space and, by exploring the neighbors of that point, move to one of those neighboring points. The decision made for each move is based on local knowledge only. Stochastic local search is a type of local search that makes decisions based on randomness when selecting or generating candidate solutions. Seesaw search is a type of stochastic local search for combinatorial optimization problems.
Seesaw search consists of two phases: an optimizing phase and a constraining phase. In the optimizing phase, the search focuses on optimizing the solution as much as possible without caring about the constraints, whereas in the constraining phase, the search focuses on making sure that all the constraints are met without caring whether the solution is optimal or not. The search keeps alternating between these two phases to find the best solution. For Minimum Vertex Cover, the optimizing phase keeps removing vertices from the vertex cover as long as the resulting vertex set is still a cover, and the constraining phase keeps adding vertices until the vertex set becomes a vertex cover.
Implementation Details of the Seesaw Search Algorithm
In this paper, we define two types of seesaw search algorithms: the Naive Seesaw Search algorithm and the Intelligent Seesaw Search algorithm.
Naive Seesaw Search Algorithm
• Sequential Approach
The Naive Seesaw search algorithm starts with an empty vertex set, and at each step it either adds or removes a vertex, depending on whether the current vertex set is a vertex cover or not. It chooses the vertex to be added or removed at random and keeps doing this for a specified number of steps. At each step, in addition to adding or removing a vertex, it also checks whether the new vertex set is a vertex cover and keeps track of the best vertex cover found. These steps are performed for a specified number of repetitions, such that each repetition uses a different seed and thus explores a different part of the search space. After all the repetitions, the best vertex cover across all the repetitions is returned. Algorithm 6 lists the sequential version of the Naive Seesaw search algorithm.
Algorithm 6 Sequential Naive Seesaw search
bestVertexSet ← V
for (1, reps) do
    vertexSet ← ∅
    for (1, steps) do
        if isCover(vertexSet) then
            vertexSet ← vertexSet \ randomVertexSetElement
        else
            vertexSet ← vertexSet ∪ randomVertexSetElement
        end if
        if isCover(vertexSet) and |vertexSet| < |bestVertexSet| then
            bestVertexSet ← vertexSet
        end if
    end for
end for
return bestVertexSet
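Algorithm 6 can be sketched as a runnable Java method for a single repetition. The graph below is hypothetical, and a parallel version would simply run each repetition in its own thread with a different seed:

```java
import java.util.*;

class NaiveSeesaw {
    static final int N = 4;
    // A hypothetical graph (assumed non-empty): 4-cycle plus diagonal 0-2.
    static final int[][] EDGES = {{0, 1}, {1, 2}, {2, 3}, {3, 0}, {0, 2}};

    static boolean isCover(Set<Integer> s) {
        for (int[] e : EDGES)
            if (!s.contains(e[0]) && !s.contains(e[1])) return false;
        return true;
    }

    static Set<Integer> search(long seed, int steps) {
        Random rng = new Random(seed);
        Set<Integer> set = new HashSet<>();           // start from empty set
        Set<Integer> best = new HashSet<>();
        for (int v = 0; v < N; v++) best.add(v);      // full set is a cover
        for (int i = 0; i < steps; i++) {
            if (isCover(set)) {
                // optimizing phase: remove a random vertex from the set
                List<Integer> in = new ArrayList<>(set);
                set.remove(in.get(rng.nextInt(in.size())));
            } else {
                // constraining phase: add a random vertex not yet in the set
                List<Integer> out = new ArrayList<>();
                for (int v = 0; v < N; v++) if (!set.contains(v)) out.add(v);
                set.add(out.get(rng.nextInt(out.size())));
            }
            if (isCover(set) && set.size() < best.size())
                best = new HashSet<>(set);            // track the best cover
        }
        return best;
    }
}
```

With a reasonable number of steps the walk quickly finds small covers of this graph (the minimum here has size 2).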
• Parallel Approach
For the parallel implementation of the Naive Seesaw search, a parallelFor loop is used to loop over the repetitions so that each repetition is executed by a different thread. Algorithm 7 lists the parallel version of the Naive Seesaw search algorithm.
Intelligent Seesaw Search Algorithm

A better, more intelligent version of this search is proposed as FastVC in [3]. We use a parallel version of the FastVC algorithm and call it the Intelligent Seesaw search. FastVC focuses on low-complexity approximate heuristics rather than heuristics that are accurate but have very high complexity. FastVC solves the MVC problem by iteratively solving its decision version, i.e., searching for a k-sized vertex cover, where k is a positive integer.
Algorithm 7 Parallel Naive Seesaw search
bestVertexSet ← V
parallelFor (1, reps)
    vertexSet ← ∅
    for (1, steps) do
        if isCover(vertexSet) then
            vertexSet ← vertexSet \ randomVertexSetElement
        else
            vertexSet ← vertexSet ∪ randomVertexSetElement
        end if
        if isCover(vertexSet) and |vertexSet| < |bestVertexSet| then
            bestVertexSet ← vertexSet
        end if
    end for
End parallelFor
• Sequential approach

FastVC starts by constructing an initial candidate solution C. The process of constructing this candidate solution involves two phases: an extending phase and a shrinking phase. It starts with an empty set C and, in the extending phase, goes through all the edges in the graph; if an edge is uncovered, it adds the endpoint with the higher degree to C. At the end of this phase, a vertex cover is obtained. In the shrinking phase, loss values are calculated for all the vertices in C, where the loss of a vertex u is the number of edges that would become uncovered if u were removed from the vertex cover. FastVC then goes through all the vertices in C, removes any vertex that has a loss value of 0, and updates the loss values of all its neighbors. This is done iteratively until all the vertices in C have a loss value greater than 0. This procedure provides a minimal vertex cover, i.e., if any vertex were removed from C, then C would no longer be a cover. The complexity of this construction is O(m), where m is the number of edges in the graph; in most cases this is better than O(n^2), where n is the number of vertices. After constructing the candidate solution C this way, FastVC performs a number of steps until the elapsed time exceeds the specified cutoff time, where in each step it chooses a vertex u belonging to C to remove. This is done using a technique called Best from Multiple Selections (BMS). BMS selects k random vertices with replacement from C, where k is a parameter, and chooses the one with the best (lowest) loss value. The algorithm iteratively removes vertices from C until C is no longer a cover. Then it chooses a random edge that is uncovered by C and adds the endpoint with the higher gain to C, where the gain of a vertex u is the number of edges that would become covered if u were added to the cover. The loss and gain values of the vertex and its neighbors are updated when removing and adding a vertex in each step. The complexity of the BMS heuristic is O(1), since k is a constant. Algorithm 8 lists the sequential version of the FastVC algorithm, which is the same as that in [3].
Algorithm 8 Sequential Intelligent Seesaw search [3]
Input: graph G = (V,E), the cutoff time
Output: vertex cover of G
C ← ConstructVC()
gain(v) ← 0 for each vertex v ∉ C
while elapsed time < cutoff do
    if C covers all edges then
        C* ← C
        remove a vertex with minimum loss from C
        continue
    end if
    u ← ChooseRmVertex(C)
    C ← C \ u
    e ← a random uncovered edge
    v ← the endpoint of e with greater gain, breaking ties in favor of the older one
    C ← C ∪ v
end while
return C*
• Parallel approach

For the parallel implementation of the FastVC algorithm, the algorithm starts similarly to the sequential version: a minimal vertex cover is found using the ConstructVC() method. Then, unlike the sequential version, which performs steps as long as the cutoff time has not elapsed, in the parallel version the number of repetitions and the number of steps are user-specified. Each repetition performs the specified number of steps, and each repetition uses a different seed to select vertices for the vertex exchanging step described above. This ensures that each repetition explores a different part of the solution space. The parallel version of FastVC is listed in Algorithm 9.
Algorithm 9 Parallel Intelligent Seesaw search [3]
Input: graph G = (V,E), reps: no. of repetitions, steps: no. of steps
Output: vertex cover of G
C ← ConstructVC()
gain(v) ← 0 for each vertex v ∉ C
bestVertexCover ← V
parallelFor (1, reps)
    for (1, steps) do
        if C covers all edges then
            C* ← C
            remove a vertex with minimum loss from C
            continue
        end if
        u ← ChooseRmVertex(C)
        C ← C \ u
        e ← a random uncovered edge
        v ← the endpoint of e with greater gain, breaking ties in favor of the older one
        C ← C ∪ v
    end for
    if |C*| < |bestVertexCover| then
        bestVertexCover ← C*
    end if
End parallelFor
return bestVertexCover
Algorithm 10 ConstructVC(G) [3]
Input: graph G = (V,E)
Output: vertex cover of G
C ← ∅
// extend C to cover all edges
for e ∈ E do
    if e is uncovered by C then
        C ← C ∪ {the endpoint of e with higher degree}
    end if
end for
// compute loss values
loss(v) ← 0 for each vertex v ∈ C
for e ∈ E do
    if only one endpoint v of e belongs to C then
        loss(v) ← loss(v) + 1
    end if
end for
// remove redundant vertices
for v ∈ C do
    if loss(v) = 0 then
        C ← C \ v, update the loss of all of v's neighbors
    end if
end for
return C
Algorithm 11 Best from Multiple Selections (BMS) Heuristic [3]
Input: a set S, a parameter k, a comparison function f
// assume an element is better than another if it has a smaller f value
Output: an element of S
best ← a random element from S
for 1, k−1 do
    r ← a random element from S
    if f(r) < f(best) then
        best ← r
    end if
end for
return best
Chapter 4
Experiments
For the experiments we have used a set of undirected graphs from the Network Data Repository online [2]. These graphs include the BHOSLIB benchmark graphs and a number of real-world graphs from the areas of biological networks, collaboration networks, interaction networks, miscellaneous networks, etc. The BHOSLIB [16] benchmark graphs are generated by transforming forced satisfiable SAT [1] benchmarks, where the set of vertices and the set of edges respectively correspond to the set of variables and the set of binary clauses in SAT. These are graphs with 'hidden optimal solutions' and are designed to be hard to solve. All the BHOSLIB graphs are expressed in the ASCII DIMACS graph format, which is as follows:

• Zero or more comment lines at the top of the file; these start with c and can be ignored.

• After the comment lines, there is a line that contains the size of the graph, in the form: p edge V E, where V is the number of vertices and E is the number of edges.

• The following lines list the edges, one per line, in the form: e vertex no vertex no, where each vertex no is a number between 1 and V.
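A reader for this format can be sketched in a few lines of Java (comment and problem lines are skipped here, and only the edge lines are kept; names are illustrative):

```java
import java.util.*;

class DimacsReader {
    // Parse ASCII DIMACS lines into a list of (u, v) edge pairs.
    static List<int[]> parse(List<String> lines) {
        List<int[]> edges = new ArrayList<>();
        for (String line : lines) {
            String[] tok = line.trim().split("\\s+");
            if (tok[0].equals("c")) continue;       // comment line
            if (tok[0].equals("p")) continue;       // problem line: p edge V E
            if (tok[0].equals("e"))                 // edge line: e u v
                edges.add(new int[]{Integer.parseInt(tok[1]),
                                    Integer.parseInt(tok[2])});
        }
        return edges;
    }
}
```

A real reader would also record V and E from the p line and validate that every vertex number lies between 1 and V.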
These BHOSLIB [16] benchmark graphs are used to evaluate the weak scaling performance of the Naive Seesaw search algorithm and the Intelligent Seesaw search algorithm and their ability to find high quality covers, where quality is defined by equation 4.1:
Quality of Cover = 1 − (|Found Vertex Cover| − |Minimum Vertex Cover|) / |Minimum Vertex Cover|    (4.1)
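Equation 4.1 is straightforward to compute; as a small sketch in Java (a hypothetical helper, not from the report's code):

```java
public class CoverQuality {
    // Equation 4.1: a cover of exactly minimum size scores 1.0, and the score
    // drops by the relative excess of the found cover over the minimum one.
    public static double quality(int foundSize, int minimumSize) {
        return 1.0 - (double) (foundSize - minimumSize) / minimumSize;
    }
}
```

For frb30-15-1 in table 4.1, for example, quality(435, 420) = 1 − 15/420 ≈ 0.964, which rounds to the reported 0.96.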
For these experiments we have used the RIT Computer Science parallel computers champ, nessie, and kraken. Champ and nessie each have one Intel Xeon E5-2690 processor with 8 dual hyper-threaded CPU cores, and kraken has 4 Intel Xeon E7-8850 processors with 10 dual hyper-threaded CPU cores per processor. All three algorithms, Branch and Bound, Naive Seesaw search, and Intelligent Seesaw search, have been written using Java's PJ2 library [8].
Figure 4.1: Weak Scaling for Naive Seesaw search with 100000 steps
The BHOSLIB [16] graphs are used to test the weak scaling performance of the Naive Seesaw search and Intelligent Seesaw search algorithms. In order to test the weak scaling performance of a program, we increase the number of cores that the program uses in proportion to the problem size. In our case, neither the number of vertices nor the number of edges affects the complexity of the program. Rather, the time taken by the program is proportional to the number of repetitions and the number of steps the program performs. So, in order to test weak scaling, we increase the number of repetitions as we increase the
Figure 4.2: Weak Scaling for Naive Seesaw search with 1000000 steps
number of cores. Note that we can't increase the number of steps to test weak scaling because the steps are not performed in parallel. So, for the Naive Seesaw search algorithm, we perform the search using 100000 and 1000000 steps on a set of graphs. For these program runs, we go from 10 repetitions to 160 repetitions and from 1 core to 16 cores, with the number of cores increased in proportion to the number of repetitions. The weak scaling performance of Naive Seesaw search when run with 100000 steps can be seen in figure 4.1 and when run with 1000000 steps in figure 4.2. We do the same for the Intelligent Seesaw search algorithm, but in that case we perform the program runs with 1000 and 10000 steps. The weak scaling performance of Intelligent Seesaw search with 1000 steps can be seen in figure 4.3 and with 10000 steps in figure 4.4. From the graphs it can be seen that as the number of cores increases, the efficiency decreases. This is because the RIT CS parallel machines used have dual hyper-threaded
cores. With dual hyper-threading, the operating system addresses two virtual cores for every physical core, so threads may end up sharing the resources of a physical core, leading to the decrease in efficiency.
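Concretely, because the workload (the number of repetitions) is scaled in proportion to the core count, ideal weak scaling keeps the running time flat, and efficiency can be computed as below. The timings in the usage note are made-up examples, not the measured values behind the figures:

```java
public class WeakScaling {
    // Weak-scaling efficiency on K cores: t1 is the 1-core time on the base
    // workload and tK is the K-core time on a workload scaled K-fold.
    // Ideal scaling gives tK == t1, i.e. an efficiency of 1.0; values below
    // 1.0 reflect overheads such as hyper-threads sharing a physical core.
    public static double efficiency(double t1, double tK) {
        return t1 / tK;
    }
}
```

For example, if 10 repetitions on 1 core take 100 seconds and 160 repetitions on 16 cores take 125 seconds, the weak-scaling efficiency is 100/125 = 0.8.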
Figure 4.3: Weak Scaling for Seesaw Intelligent with 1000 steps
4.0.1 Results
The results of the various experiments are shown in tables 4.1, 4.2 and 4.3. These tables are organized as follows:
• Graph Name: The name of the graph as in BHOSLIB [16] or the network repository [2].
• V: Number of vertices in the graph.
• E: Number of edges in the graph.
Figure 4.4: Weak Scaling for Seesaw Intelligent with 10000 steps
• Known MVC: Known minimum vertex cover size for the graph.
• Algorithm: Name of the algorithm which was run on the graph to find the minimum
vertex cover.
• Reps: Number of repetitions performed by the algorithm.
• Steps: Number of steps performed by the algorithm.
• Cores: Number of cores the algorithm was run on.
• Minimal VC or threshold: For Intelligent Seesaw search, this is the minimal vertex cover found before the seesaw search is started. For Branch and Bound, this is the threshold up to which BFS needs to be performed.
• MVC found: Size of the vertex cover found by the algorithm.
• Quality of cover: Quality of the vertex cover found by the algorithm.
• Time: Time taken by the algorithm in milliseconds.
The BHOSLIB [16] graphs are used to compare the quality of the vertex covers found by both the Naive Seesaw search and Intelligent Seesaw search algorithms. The results of both algorithms can be seen in table 4.1 and the following observations can be made:
• Naive Seesaw search is unable to find the minimum vertex cover for any of the graphs even after using 10 million steps, whereas Intelligent Seesaw search is able to find the minimum vertex cover using just 10000 steps, and in some cases even just 1000 steps. Of course, various numbers of repetitions have been tried for all of these graphs, and the run with the best solution is reported.
• The minimal vertex cover that Intelligent Seesaw search starts from is already smaller than the final vertex cover found by the Naive Seesaw search algorithm. This shows that one of the major reasons for the efficiency of the Intelligent Seesaw algorithm is that it starts its search very close to the true optimum solution and thus reaches it faster than the Naive Seesaw search algorithm, which starts its search from the full set.
• The table also shows the quality of the covers found by each of the algorithms.
Table 4.1: Results for various BHOSLIB [16] graphs

Graph Name   V    E      Known MVC  Algorithm           Reps  Steps     Cores  Minimal VC  MVC found  Quality of cover  Time (ms)
frb30-15-1   450  17827  420        Seesaw Naive        400   10000000  40     NA          435        0.96              339196
frb30-15-1   450  17827  420        Seesaw Intelligent  10    10000     1      430         420        1                 1504
frb35-17-1   595  27856  560        Seesaw Naive        800   10000000  80     NA          578        0.96              778565
frb35-17-1   595  27856  560        Seesaw Intelligent  160   10000     16     569         560        1                 2800
frb40-19-1   760  41314  720        Seesaw Naive        200   10000000  20     NA          743        0.96              507204
frb40-19-1   760  41314  720        Seesaw Intelligent  20    300000    16     732         720        1                 11580
frb45-21-1   945  59186  900        Seesaw Naive        400   10000000  40     NA          927        0.97              681985
frb45-21-1   945  59186  900        Seesaw Intelligent  160   100000    16     913         900        1                 27499

A number of real world graphs from the network repository [2] are used to test the performance of the Branch and Bound algorithm versus the Intelligent Seesaw search algorithm. Table 4.2 shows the results of these tests and the following observations can be made from it:

• For most of the graphs, the minimal vertex cover found by the Intelligent Seesaw search is already the minimum vertex cover, so it finds the optimal solution very quickly, whereas the Branch and Bound algorithm takes thousands of times longer to find the optimal solution.

• These are very small graphs, and it can easily be seen that for larger graphs Branch and Bound will take a very long time to find the solution, whereas the Intelligent Seesaw search algorithm will take far less time to find an optimal solution or a solution that is very close to the optimal solution.
Table 4.3 shows the results of the Intelligent Seesaw search algorithm on various real world graphs selected randomly from the network repository [2]. These graphs belong to the biological and collaboration networks in the network repository [2].
Table 4.2: Branch & Bound and Intelligent Seesaw search results for real world graphs

Graph Name       V    E     Known MVC  Algorithm           Reps  Steps  Cores  Minimal VC or threshold  MVC found  Quality of cover  Time (ms)
GD95 b           73   96    23         Branch and Bound    NA    NA     1      10                       23         1                 2065048
GD95 b           73   96    23         Seesaw Intelligent  1     1      1      23                       23         1                 161
GD95 c           62   287   38         Branch and Bound    NA    NA     1      10                       38         1                 484176
GD95 c           62   287   38         Seesaw Intelligent  1     1      1      38                       38         1                 166
GD96 b           111  193   20         Branch and Bound    NA    NA     1      10                       20         1                 404584
GD96 b           111  193   20         Seesaw Intelligent  1     1      1      20                       20         1                 170
ia-infect-hyper  113  2196  90         Branch and Bound    NA    NA     1      80                       90         1                 3975
ia-infect-hyper  113  2196  90         Seesaw Intelligent  1     150    1      93                       90         1                 106
Table 4.3: Intelligent Seesaw search results for real world graphs

Graph Name   V      E       Known MVC  Reps  Steps    Cores  Minimal VC  MVC found  Quality of cover  Time (ms)
bio-dmela    7393   25569   2630       1     1000000  1      2723        2632       0.99              452080
bio-yeast    1458   1948    456        50    500      16     464         456        1                 836
ca-AstroPh   17903  196972  11483      50    50000    16     11512       11483      1                 3005084
ca-CondMat   21363  91286   12480      1     50000    1      12499       12480      1                 641315
ca-CSphd     1882   1740    550        1     600      1      554         550        1                 295
ca-Erdos992  6100   7515    461        1     1        1      461         461        1                 201
ca-GrQc      4158   13422   2208       1     3000     1      2213        2208       1                 943
ca-HepPh     11204  117619  6555       1     25000    1      6567        6555       1                 33178
Chapter 5
Conclusions
5.1 Lessons Learned
All three algorithms, Branch and Bound, Naive Seesaw search, and Intelligent Seesaw search, can find the optimal solution, but as the graph size increases, the time taken by the Branch and Bound algorithm increases exponentially. In the case of the Naive Seesaw search algorithm, even though its time complexity doesn't increase with the problem size, a large number of steps and repetitions have to be performed in order to find the optimal solution. Unlike the Naive Seesaw search algorithm, the Intelligent Seesaw search algorithm starts its search by first finding a minimal vertex cover, thus starting the search close to the optimal solution. Due to this, the Intelligent Seesaw search is able to find the optimal solution very quickly even for larger graphs. When run on various other real world graphs, it was also observed that increasing the number of repetitions often doesn't help much, but increasing the number of steps does. This can be seen in table 4.3: for most of the graphs, the Intelligent Seesaw search algorithm was able to find the optimal solution in only a single repetition. Therefore, the Intelligent Seesaw search algorithm not only provides better solutions than the Naive Seesaw search algorithm but also takes less time than both of the other algorithms.
5.2 Future Work
As observed from the results, the Naive Seesaw search algorithm is not able to find solutions even as good as the one the Intelligent Seesaw search starts with. This happens because the Intelligent Seesaw search starts with a minimal vertex cover whereas the Naive Seesaw search starts with an empty set. In the future, it would be interesting to examine the quality of the covers found by the Naive Seesaw search if it too starts its search by finding a minimal vertex cover, as the Intelligent Seesaw search does. This would provide a true comparison of the Naive Seesaw search and Intelligent Seesaw search algorithms and tell us whether it was only starting the search closer to the optimal solution that produced such better results for the Intelligent Seesaw search, or whether the heuristics it uses for its search are important too.
Additionally, this work takes a first step towards implementing parallel programs for the Minimum Vertex Cover problem on massive graphs using local search techniques. In the future, additional work can be done to solve other NP-hard problems using the seesaw search technique while leveraging the performance of parallel computing.
Bibliography
[1] Boolean satisfiability problem. https://en.wikipedia.org/wiki/Boolean_satisfiability_problem.

[2] Network repository. http://networkrepository.com/.

[3] Shaowei Cai. Balance between complexity and quality: Local search for minimum vertex cover in massive graphs. In Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI'15, pages 747–753. AAAI Press, 2015.

[4] Shaowei Cai, Kaile Su, and Qingliang Chen. EWLS: A new local search for minimum vertex cover. In Maria Fox and David Poole, editors, Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI-10). AAAI Press, 2010.

[5] Shaowei Cai, Kaile Su, Chuan Luo, and Abdul Sattar. NuMVC: An efficient local search algorithm for minimum vertex cover. Journal of Artificial Intelligence Research, 46(1):687–716, January 2013.

[6] Shaowei Cai, Kaile Su, and Abdul Sattar. Local search with edge weighting and configuration checking heuristics for minimum vertex cover. Artificial Intelligence, 175(9-10):1672–1696, June 2011.

[7] Torsten Fahle. Simple and Fast: Improving a Branch-and-Bound Algorithm for Maximum Clique, pages 485–498. Springer Berlin Heidelberg, Berlin, Heidelberg, 2002.

[8] Alan Kaminsky. Big CPU, Big Data: Solving the World's Toughest Computational Problems with Parallel Computing. CreateSpace Independent Publishing Platform, USA, 1st edition, 2016.

[9] Chu-Min Li and Zhe Quan. An efficient branch-and-bound algorithm based on MaxSAT for the maximum clique problem. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, AAAI'10, pages 128–133. AAAI Press, 2010.

[10] Patric R. J. Ostergard. A fast algorithm for the maximum clique problem. Discrete Applied Mathematics, 120(1-3):197–207, August 2002.
[11] S. Pirzada. Applications of graph theory. Proceedings in Applied Mathematics and Mechanics, 7(1):2070013–2070013, 2007.

[12] Jean-Charles Regin. Using Constraint Programming to Solve the Maximum Clique Problem, pages 634–648. Springer Berlin Heidelberg, Berlin, Heidelberg, 2003.

[13] Silvia Richter, Malte Helmert, and Charles Gretton. A Stochastic Local Search Approach to Vertex Cover, pages 412–426. Springer Berlin Heidelberg, Berlin, Heidelberg, 2007.

[14] Etsuji Tomita and Toshikatsu Kameda. An efficient branch-and-bound algorithm for finding a maximum clique with computational experiments. Journal of Global Optimization, 37(1):95–111, 2007.

[15] K. Toume, D. Kinjo, and M. Nakamura. A GPU algorithm for minimum vertex cover problems. In American Institute of Physics Conference Series, volume 1618, pages 724–727, October 2014.

[16] Ke Xu. BHOSLIB: Benchmarks with hidden optimum solutions for graph problems. http://www.nlsde.buaa.edu.cn/~kexu/benchmarks/graph-benchmarks.htm.