Massively Parallel Seesaw Search for Minimum Vertex Cover
by
Mansi Nahar
A Project Report Submitted in
Partial Fulfillment of the Requirements for the Degree of
Master of Science in
Computer Science
Supervised by
Prof. Alan Kaminsky
Department of Computer Science
B. Thomas Golisano College of Computing and Information Sciences
Rochester Institute of Technology
Rochester, New York
May 2017
Dedication
I wouldn't have come this far in life without the constant support, both financial and emotional, of my parents and my brother. So, this one is to them!
Acknowledgments
The completion of this project wouldn’t have been possible without the guidance provided
by Prof. Alan Kaminsky. I am also grateful to my friends and family members for their
constant support and understanding.
Abstract
Massively Parallel Seesaw Search for Minimum Vertex Cover
Mansi Nahar
Supervising Professor: Prof. Alan Kaminsky
Graph theory has always been a key research area in computer science, but recently, with the increase in the amount of data, there has been more focus on graph problems for mining data from large-scale/massive graphs. These graphs include web networks, social media user-interaction networks, road traffic networks, etc. A lot of interesting approaches and results have been obtained in the last few decades. In the area of research on various NP-hard problems, Minimum Vertex Cover is one of the fundamental problems due to its applications in computational biochemistry, scheduling problems, surveillance system deployment, etc. Minimum Vertex Cover can also be directly related to other NP-hard problems, namely Maximum Independent Set and Maximum Clique, which have their own applications. Although over the years many different approaches have been tried for this problem, not a lot of effort has been put toward making use of the high computational power of parallel systems. In this paper, we propose a massively parallel seesaw search for Minimum Vertex Cover.
Contents
Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
3 Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
   3.0.1 Branch and Bound . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
   3.0.2 Seesaw Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
4 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
   4.0.1 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
   5.1 Lessons Learned . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
   5.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
List of Tables
4.1 Results for various BHOSLIB [16] graphs . . . . . . . . . . . . . . . . . . 25
4.2 Branch & Bound and Intelligent Seesaw search results for real world graphs . 26
4.3 Intelligent Seesaw search results for real world graphs . . . . . . . . . . . . 26
List of Figures
1.1 Vertex Covers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
3.1 A graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.2 Branch and bound . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.3 Branch and bound with pruning . . . . . . . . . . . . . . . . . . . . . . . . 10
4.1 Weak Scaling for Naive Seesaw search with 100000 steps . . . . . . . . . . 20
4.2 Weak Scaling for Naive Seesaw search with 1000000 steps . . . . . . . . . . 21
4.3 Weak Scaling for Seesaw Intelligent with 1000 steps . . . . . . . . . . . . . 22
4.4 Weak Scaling for Seesaw Intelligent with 10000 steps . . . . . . . . . . . . 23
Chapter 1
Introduction
For the purpose of this paper, we use the following vocabulary. Consider an undirected graph G = (V,E), where V is the set of vertices and E is the set of edges in the graph. A vertex cover of a graph is a subset of vertices in that graph such that each edge is incident to at least one vertex in that subset. If a vertex cover is such that removing any single vertex from it would make it no longer a vertex cover, then it is known as a minimal vertex cover, whereas a minimum vertex cover is a vertex cover with the minimum number of vertices in it. Consider Figure 1.1. It shows three vertex covers for the same undirected graph; in each, the vertices in blue are the vertices in the vertex cover. The first represents a vertex cover for the graph, the second a minimal vertex cover, and the third a minimum vertex cover.
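These definitions can be illustrated with a short Java sketch. The graph below is a hypothetical 4-vertex example (a square with one diagonal), not the one in Figure 1.1, and the class and method names are illustrative:

```java
import java.util.*;

class VertexCoverDefs {
    // Edges of a small example graph: a 4-cycle 0-1-2-3 plus the diagonal 0-2.
    static final int[][] EDGES = {{0, 1}, {1, 2}, {2, 3}, {3, 0}, {0, 2}};

    // A set is a vertex cover if every edge has at least one endpoint in it.
    static boolean isCover(Set<Integer> s) {
        for (int[] e : EDGES)
            if (!s.contains(e[0]) && !s.contains(e[1]))
                return false;               // edge e is uncovered
        return true;
    }

    // A cover is minimal if removing any single vertex uncovers some edge.
    static boolean isMinimal(Set<Integer> s) {
        if (!isCover(s)) return false;
        for (int v : s) {
            Set<Integer> smaller = new HashSet<>(s);
            smaller.remove(v);
            if (isCover(smaller)) return false;
        }
        return true;
    }
}
```

On this graph {0, 2} is a minimum vertex cover, while {0, 1, 3} is minimal but not minimum: removing any one of its three vertices uncovers an edge, yet a smaller cover exists.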
Figure 1.1: Vertex Covers
Minimum Vertex Cover is a combinatorial optimization problem and is closely related to other NP-hard problems such as the Maximum Independent Set problem and the Maximum Clique problem. The Maximum Independent Set problem is the problem of finding a largest subset of vertices in a graph such that no two vertices in that subset are connected by an edge. The Maximum Clique problem, on the other hand, is to find a largest subset of vertices in the graph such that all the vertices in the subset are connected to each other. A Maximum Independent Set in a graph G is the complement of the vertices that are in a Minimum Vertex Cover of G, and a Maximum Clique in a graph G is the complement of the vertices that are in a Minimum Vertex Cover of the complement of G.
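The first of these complement relationships can be checked directly in code. The sketch below reuses a hypothetical graph and verifies that the vertices outside a minimum vertex cover form an independent set (names are illustrative):

```java
import java.util.*;

class ComplementRelations {
    static final int N = 4;
    // Same hypothetical graph: a 4-cycle plus the diagonal 0-2.
    static final int[][] EDGES = {{0, 1}, {1, 2}, {2, 3}, {3, 0}, {0, 2}};

    // An independent set contains no two vertices joined by an edge.
    static boolean isIndependentSet(Set<Integer> s) {
        for (int[] e : EDGES)
            if (s.contains(e[0]) && s.contains(e[1]))
                return false;       // both endpoints inside: not independent
        return true;
    }

    // The vertices of the graph that are NOT in the given set.
    static Set<Integer> complement(Set<Integer> cover) {
        Set<Integer> rest = new HashSet<>();
        for (int v = 0; v < N; v++)
            if (!cover.contains(v)) rest.add(v);
        return rest;
    }
}
```

Here {0, 2} is a minimum vertex cover, and its complement {1, 3} is an independent set.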
Minimum Vertex Cover and these related problems have real-world applications; one of them is in computational biochemistry [11]. In computational biochemistry there often occurs a situation where conflicts between sequences in a sample need to be resolved by excluding some of the sequences. What constitutes a conflict between sequences is assumed to be precisely defined in the biochemical context. In such a case, the sequences can be represented as the vertices of a graph, and an edge is added between two vertices if there exists a conflict between the two sequences represented by those vertices. A Minimum Vertex Cover can then be found in that graph to resolve all the conflicts by getting rid of the minimum number of sequences in the sample. Another very simple and commonly used example is the surveillance system. Suppose there is a building where a surveillance system needs to be set up for security reasons. The question is how many cameras are needed, and where should they be placed, such that each and every corner of the building is covered? If all corners of the building are represented as vertices and all lobbies as edges, then finding a minimum vertex cover for the graph gives the minimum number of cameras needed to make sure that all the lobbies of the building are covered by the surveillance system.
In this paper we discuss various algorithms for finding a Minimum Vertex Cover and implement them in a parallel fashion, making use of multiple cores to get better and faster results.
The remainder of the paper is organized as follows. In the next chapter we discuss what work has already been done in this area. In Chapter 3 we describe a branch and bound algorithm, talk a little about its parallel implementation, and also describe two seesaw search algorithms. In Chapter 4 we provide experimental evaluations, and finally, in Chapter 5, we provide some concluding remarks.
Chapter 2
Related Work
There are two different types of algorithms for solving any combinatorial optimization problem: exact algorithms and approximation algorithms. While exact algorithms guarantee to find an optimal solution, they often cannot do so in reasonable time. Approximation algorithms, on the other hand, which mainly include local search algorithms usually based on heuristics or randomness, do not guarantee an optimal solution. However, they can find optimal or satisfactory sub-optimal solutions in much less time, and for this reason local search algorithms are more appealing for solving Minimum Vertex Cover and related problems in large-scale graphs. The considerable interest in these local search algorithms can be seen in [13][4][6][5][3]. All of these papers introduce different heuristics and local search techniques for the Minimum Vertex Cover problem. For example, [13] proposes a stochastic local search algorithm for the Minimum Vertex Cover problem that solves the decision problem of finding a k-sized vertex cover; it performs a number of iterations, and in each iteration it exchanges two vertices. [4] proposes another local search algorithm known as EWLS (Edge Weighting Local Search) that makes use of edge weighting and other search techniques to improve the quality of local optima. EWLS has the drawback that it requires an instance-dependent parameter. [6] proposes a solution that uses EWLS with EWCC (Edge Weighted Configuration Checking) to get rid of the instance-dependent parameters by using induced subgraph information. These earlier local search algorithms have some drawbacks, such as selecting a pair of vertices to exchange simultaneously, which is very time consuming, or using edge weighting to diversify the search without any mechanism for decreasing those weights. In order to overcome these drawbacks, [5] proposes a solution with a two-stage vertex exchanging mechanism and edge weighting with forgetting. The edge weighting not only increases the weights of uncovered edges but also reduces the weights of all the edges periodically. Although all of these algorithms are good, the most popular one is the FastVC algorithm proposed in [3]. It is a simple and fast local search algorithm that uses two low-complexity heuristics and focuses on solving the Minimum Vertex Cover problem for massive graphs.
A lot of research has also gone into developing exact algorithms for such NP-hard problems [14][12][10][9][7]. All of this research focuses on solving the Maximum Clique problem, but with different approaches. [7] and [14] propose solutions that use an approximate coloring algorithm to provide upper bounds on the size of the maximum clique. [14] also makes use of vertex sorting to further improve the algorithm. On the other hand, [9] uses Maximum Satisfiability technology to provide upper bounds on the maximum clique. [10] proposes a solution that focuses on pruning strategies, using previously introduced methods in combination with some new ones. These improvements make it suitable for massive sparse graphs. [12] makes use of constraint programming to solve the Maximum Clique problem. Of all these various approaches, one of the most popular is the branch-and-bound method. This method considers the full configuration space as a tree. The algorithm then recursively explores the configuration space by deciding, at each level of the tree, the presence or absence of a vertex in the vertex cover. The algorithm stops when either a vertex cover is found or a bound condition is met. Some research has also been done on developing GPU-based algorithms for Minimum Vertex Cover [15], but unfortunately this work focuses on finding a minimal vertex cover rather than the minimum. Even though minimal vertex covers are useful, they are not the optimal solution.
In this paper we propose a parallel version of the FastVC algorithm, mentioned above,
to further improve its performance.
Chapter 3
Design
3.0.1 Branch and Bound
Branch and bound is an exact search design paradigm that is usually used to find optimal solutions for combinatorial optimization problems. It is essentially a systematic enumeration of all candidate solutions. The set of all solutions is thought of as a rooted tree, with the root being the full set and the branches being subsets of the solution set. The algorithm traverses each branch of the tree so as to go through every possible solution, but before enumerating the candidate solutions of a branch, it first checks the branch against the upper and lower bounds of the optimal solution. If the bound check fails, the branch is discarded and the algorithm continues with the rest of the branches. Since the branch and bound algorithm goes through every possible candidate solution, it is known as a complete algorithm; in other words, it guarantees to find an optimal solution.
The branch and bound algorithm enumerates the full configuration space by deciding, at each level, whether a vertex is in the vertex cover or not. The full configuration space can be thought of as a tree where each node decides on the presence or absence of one of the vertices. Therefore, each node in the tree has two branches: one corresponding to keeping the vertex under consideration in the vertex cover, and the other corresponding to removing it. The algorithm starts with the entire vertex set at the root. The full set is obviously a vertex cover. Then, at each level, it considers one of the vertices in the vertex cover that can be removed. It creates one branch where it keeps the vertex in the vertex cover; for the other branch, it first checks whether the set obtained by removing the vertex is still a cover. If not, there is no point in going down that branch, but if it is, it creates another branch where it removes the vertex from the vertex cover.
We make a few changes to this basic branch and bound technique to get the most out of it. At every level of the tree, along with the vertex cover, we maintain one more set, known as VerticesAllowedToRemove. This set contains the vertices that are allowed to be removed at any level. At every level, when a vertex is either kept or removed, that vertex is deleted from the VerticesAllowedToRemove set so that the same vertex is not considered again at deeper levels. Also, at any level, if a vertex is removed from the vertex cover, then the VerticesAllowedToRemove set is updated by deleting from it all of the removed vertex's adjacent vertices that are also present in the vertex cover. This can be done because, for a vertex that has been removed from a vertex cover, removing any of its adjacent vertices from the vertex cover would mean that the edge between those two vertices is no longer covered. Consider the graph in Figure 3.1. Figure 3.2 shows how the branch and bound tree would look for that graph. In each node, the first set represents the vertex cover under consideration and the second set represents the vertices that are allowed to be removed, i.e., the second set is the VerticesAllowedToRemove set.
The bound condition of the branch and bound algorithm is based on keeping track of the best (minimum) vertex cover seen so far. At every level, a question can be asked: even after removing all the vertices in the VerticesAllowedToRemove set, is there still a possibility of getting a better solution than what has already been seen? If not, then there is again no point in going down that branch. This check can be performed by subtracting the number of vertices in the VerticesAllowedToRemove set from the number of vertices in the vertex cover and then checking whether that difference is smaller than the best vertex cover size seen so far. Figure 3.3 shows how the branch and bound tree would look with the bound condition added. It can be seen from Figure 3.3 that the nodes in the red box are not visited, because the bound condition shows that enumerating those nodes cannot result in a better solution than what has already been seen.
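The bound check itself is a one-line predicate. The sketch below isolates it (the names are illustrative, not taken from the actual implementation):

```java
class BoundCheck {
    // Even if every vertex still allowed to be removed were removed,
    // could this branch produce a cover smaller than the best seen so far?
    static boolean worthExploring(int coverSize, int allowedToRemoveSize,
                                  int bestCoverSize) {
        return coverSize - allowedToRemoveSize < bestCoverSize;
    }
}
```

For example, a cover of size 10 with 4 removable vertices can at best shrink to size 6, so the branch is worth exploring only if the best cover seen so far has more than 6 vertices.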
Figure 3.1: A graph
Implementation Details of the Branch and Bound Algorithm
The branch and bound method requires to explore the entire solution tree. This tree can
be traversed using two approaches: breadth first search (BFS) and depth first search(DFS).
BFS is easy to parallelize. The search starts with the full set and then at every level it
creates two new subproblems by considering a vertex, where one path decides to keep the
vertex and other decides to remove (if after removing the vertex, the vertex set is still a
cover) the vertex from the previous set. It then adds these subproblems to a queue. After
that, each thread can now dequeue these subproblems, create new subproblems and add
them to the queue, until all the subproblems have been explored. While doing this, the BFS
search would keep track of the best solution. Although, this has a problem. Each node in
the tree has two child nodes and therefore, the entire tree would have 2V nodes, where V is
the number of vertices in the graph. With the bounding condition of the branch and bound
method this might reduce but in worse case it can still be 2V. Therefore, the amount of
storage required by BFS to store all possible subproblems in the queue is an exponential
function of V.
The other approach to explore the search space is DFS. DFS would start with the full set too, but at every level it would choose one path and go down the tree until the end of that path, then backtrack to the previous node, take a different path, and continue doing so until all the possible paths have been visited. Unlike BFS, DFS doesn't require exponential storage. In fact, it requires storage just proportional to V, but DFS is very difficult to parallelize. So, in order to explore the entire search tree, the best strategy is to use both, as proposed in [8].

Figure 3.2: Branch and bound
• Sequential approach

A sequential program would do this by performing a BFS search up to a certain threshold, i.e., starting with the full set and adding subproblems to the queue by considering some vertex in the full set and deciding whether that vertex should be part of the vertex cover or not. It would then continue to take subproblems out of the queue. If the threshold has not been reached yet, further subproblems are created and added to the queue, and the program continues doing this as long as the threshold has not been reached. Once the threshold has been reached, every time a subproblem is removed from the queue, the program performs a DFS search on that subproblem rather than BFS. Once done, it removes the next subproblem from the queue, performs DFS on it, and continues doing so until the queue is empty and there are no more subproblems to be examined. Performing a BFS search up to a certain threshold and then DFS on the rest of the subproblems ensures that the queue doesn't consume an excessive amount of storage space. The threshold to be used depends on the particular problem. Algorithm 1 lists the sequential search() function of the branch and bound algorithm in detail.

Figure 3.3: Branch and bound with pruning
Algorithm 1 Sequential branch and bound search
queue.add(V, V)
while not queue.empty() do
    vertexSet, verticesAllowedToRmv ← queue.dequeue()
    level ← |V| − |verticesAllowedToRmv|
    if level ≤ threshold then
        BFS(vertexSet, verticesAllowedToRmv)
    else
        DFS(vertexSet, verticesAllowedToRmv)
    end if
end while
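A minimal, self-contained Java sketch of this search is shown below. It implements only the DFS portion, with the VerticesAllowedToRemove sets and the bound condition, using int bitmasks for vertex sets (so it assumes |V| ≤ 32). The BFS/queue stage and the PJ2 machinery are elided, and the graph is a small hypothetical example:

```java
class BranchAndBoundSketch {
    static final int N = 5;
    // A small hypothetical graph: triangle 0-1-2 plus path 1-3-4-2.
    static final int[][] EDGES = {{0,1},{0,2},{1,2},{1,3},{2,4},{3,4}};
    static int bestCover = (1 << N) - 1;       // start from the full vertex set

    static boolean isCover(int set) {
        for (int[] e : EDGES)
            if (((set >> e[0]) & 1) == 0 && ((set >> e[1]) & 1) == 0)
                return false;                   // edge e is uncovered
        return true;
    }

    static boolean adjacent(int u, int v) {
        for (int[] e : EDGES)
            if ((e[0] == u && e[1] == v) || (e[0] == v && e[1] == u))
                return true;
        return false;
    }

    static void dfs(int cover, int allowed) {
        if (allowed == 0) return;
        int v = Integer.numberOfTrailingZeros(allowed);  // decide on vertex v
        int rest = allowed & ~(1 << v);
        // Branch 1: keep v in the cover. Prune if even removing every
        // remaining allowed vertex cannot beat the best cover seen so far.
        if (Integer.bitCount(cover) - Integer.bitCount(rest)
                < Integer.bitCount(bestCover))
            dfs(cover, rest);
        // Branch 2: remove v, provided the smaller set is still a cover.
        int smaller = cover & ~(1 << v);
        if (isCover(smaller)) {
            int rest2 = rest;
            for (int u = 0; u < N; u++)         // v's neighbours must now stay
                if (adjacent(u, v)) rest2 &= ~(1 << u);
            if (Integer.bitCount(smaller) < Integer.bitCount(bestCover))
                bestCover = smaller;
            dfs(smaller, rest2);
        }
    }

    static int minimumVertexCoverSize() {
        bestCover = (1 << N) - 1;
        dfs(bestCover, bestCover);
        return Integer.bitCount(bestCover);
    }
}
```

On this 5-vertex example the minimum vertex cover has size 3 (for instance {1, 2, 4}).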
• Parallel approach

A parallel program, on the other hand, can do this by using a parallel workQueue. A parallel workQueue is used for the parallel implementation of the branch and bound algorithm rather than the normal queue. The workQueue starts with the full set as the initial problem. The parallelFor loop goes through each subproblem in the workQueue. One of the threads picks up each of these subproblems, explores it, and either adds more subproblems to the workQueue or performs DFS to explore that entire branch, depending on whether the level of the search has reached the threshold or not. The search function is called as long as there are subproblems in the queue. Algorithm 2 lists the parallel search() function of the branch and bound algorithm in detail.

Algorithm 2 Parallel branch and bound search
workQueue.add(V, V)
parallelFor (workQueue)
    vertexSet, verticesAllowedToRmv ← workQueue.dequeue()
    level ← |V| − |verticesAllowedToRmv|
    if level ≤ threshold then
        BFS(vertexSet, verticesAllowedToRmv)
    else
        DFS(vertexSet, verticesAllowedToRmv)
    end if
End parallelFor
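The work-queue pattern itself can be sketched with plain java.util.concurrent standing in for PJ2's workQueue and parallelFor. The "work" below is deliberately trivial (counting the leaves of a full binary tree), but the structure mirrors the description above: subproblems are enqueued until a threshold level, each deep subproblem is finished with an in-place DFS, and workers run until nothing is pending. All names and constants are illustrative:

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.*;

class WorkQueueSketch {
    static final int DEPTH = 10;       // full binary tree of depth 10
    static final int THRESHOLD = 4;    // enqueue (BFS) until this level

    static long countLeaves(int nThreads) {
        ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<>();
        AtomicInteger pending = new AtomicInteger(1); // items not yet finished
        AtomicLong leaves = new AtomicLong();
        queue.add(0);                                 // root subproblem
        Runnable worker = () -> {
            while (pending.get() > 0) {
                Integer level = queue.poll();
                if (level == null) { Thread.yield(); continue; }
                if (level < THRESHOLD) {              // BFS: enqueue children
                    pending.addAndGet(2);
                    queue.add(level + 1);
                    queue.add(level + 1);
                } else {                              // DFS: finish in place
                    leaves.addAndGet(dfs(level));
                }
                pending.decrementAndGet();
            }
        };
        Thread[] ts = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) { ts[i] = new Thread(worker); ts[i].start(); }
        for (Thread t : ts) {
            try { t.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return leaves.get();
    }

    static long dfs(int level) {        // sequential exploration of a subtree
        if (level == DEPTH) return 1;
        return dfs(level + 1) + dfs(level + 1);
    }
}
```

A depth-10 tree has 2^10 = 1024 leaves, whichever thread count is used; the threshold only controls how much of the tree lives in the queue.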
Algorithm 3 Branch and Bound: BFS
function BFS(vertexSet, verticesAllowedToRmv)
    vertexToRemove ← verticesAllowedToRmv[0]
    verticesAllowedToRmv ← verticesAllowedToRmv \ vertexToRemove
    if |vertexSet| − |verticesAllowedToRmv| < |bestVertexCover| then
        queue.add(vertexSet, verticesAllowedToRmv)
    end if
    expVertexSet ← vertexSet \ vertexToRemove
    if isCover(expVertexSet) then
        updateVerticesAllowedToRemove(vertexToRemove, verticesAllowedToRmv)
        if |expVertexSet| − |verticesAllowedToRmv| < |bestVertexCover| then
            queue.add(expVertexSet, verticesAllowedToRmv)
        end if
        if |expVertexSet| < |bestVertexCover| then
            bestVertexCover ← expVertexSet
        end if
    end if
end function
Algorithm 4 Branch and Bound: DFS
function DFS(vertexSet, verticesAllowedToRmv)
    vertexToRemove ← verticesAllowedToRmv[0]
    verticesAllowedToRmv ← verticesAllowedToRmv \ vertexToRemove
    if |vertexSet| − |verticesAllowedToRmv| < |bestVertexCover| then
        DFS(vertexSet, verticesAllowedToRmv)
    end if
    expVertexSet ← vertexSet \ vertexToRemove
    if isCover(expVertexSet) then
        updateVerticesAllowedToRemove(vertexToRemove, verticesAllowedToRmv)
        if |expVertexSet| − |verticesAllowedToRmv| < |bestVertexCover| then
            if |expVertexSet| < |bestVertexCover| then
                bestVertexCover ← expVertexSet
            end if
            DFS(expVertexSet, verticesAllowedToRmv)
        end if
    end if
end function
Algorithm 5 Branch and Bound: Update Vertices Allowed to Remove
function updateVerticesAllowedToRemove(vertexToRemove, verticesAllowedToRmv)
    for each vertex in verticesAllowedToRmv do
        if vertexToRemove is adjacent to vertex then
            verticesAllowedToRmv ← verticesAllowedToRmv \ vertex
        end if
    end for
end function
3.0.2 Seesaw Search
The other type of algorithm for combinatorial optimization problems is the local search algorithm. Local search algorithms start at some point in the search space and, by exploring the neighbors of that point, move to one of those neighboring points. The decision made for each move is based on local knowledge only. Stochastic local search is a type of local search that makes decisions based on randomness when selecting or generating candidate solutions. Seesaw search is a type of stochastic local search for combinatorial optimization problems.
Seesaw search consists of two phases: an optimizing phase and a constraining phase. In the optimizing phase, the search focuses on optimizing the solution as much as possible without caring about the constraints, whereas in the constraining phase, the search focuses on making sure that all the constraints are met without caring whether the solution is optimal or not. The search keeps alternating between these two phases to find the best solution. For Minimum Vertex Cover, the optimizing phase keeps removing vertices from the vertex cover as long as the resulting vertex set is still a cover, and the constraining phase keeps adding vertices until the vertex set becomes a vertex cover.
Implementation Details of the Seesaw Search Algorithm
In this paper, we define two types of seesaw search algorithms: the Naive Seesaw Search algorithm and the Intelligent Seesaw Search algorithm.
Naive Seesaw Search Algorithm
• Sequential Approach
The Naive Seesaw search algorithm starts with an empty vertex set, and at each step it either adds or removes a vertex, depending on whether the current vertex set is a vertex cover or not. It chooses the vertex to be added or removed at random and keeps doing this for a specified number of steps. At each step, in addition to adding or removing a vertex, it also checks whether the new vertex set is a vertex cover and keeps track of the best vertex cover found. These steps are performed for a specified number of repetitions, such that each repetition uses a different seed and thus explores a different part of the search space. After all the repetitions, the best vertex cover across all the repetitions is returned. Algorithm 6 lists the sequential version of the Naive Seesaw search algorithm.
Algorithm 6 Sequential Naive Seesaw search
bestVertexSet ← V
for (1, reps) do
    vertexSet ← ∅
    for (1, steps) do
        if isCover(vertexSet) then
            vertexSet ← vertexSet \ randomVertexSetElement
        else
            vertexSet ← vertexSet ∪ randomVertexSetElement
        end if
        if isCover(vertexSet) and |vertexSet| < |bestVertexSet| then
            bestVertexSet ← vertexSet
        end if
    end for
end for
return bestVertexSet
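Algorithm 6 can be sketched as a runnable Java method for a single repetition. The graph below is hypothetical, and a parallel version would simply run each repetition in its own thread with a different seed:

```java
import java.util.*;

class NaiveSeesaw {
    static final int N = 4;
    // A hypothetical graph (assumed non-empty): 4-cycle plus diagonal 0-2.
    static final int[][] EDGES = {{0, 1}, {1, 2}, {2, 3}, {3, 0}, {0, 2}};

    static boolean isCover(Set<Integer> s) {
        for (int[] e : EDGES)
            if (!s.contains(e[0]) && !s.contains(e[1])) return false;
        return true;
    }

    static Set<Integer> search(long seed, int steps) {
        Random rng = new Random(seed);
        Set<Integer> set = new HashSet<>();           // start from empty set
        Set<Integer> best = new HashSet<>();
        for (int v = 0; v < N; v++) best.add(v);      // full set is a cover
        for (int i = 0; i < steps; i++) {
            if (isCover(set)) {
                // optimizing phase: remove a random vertex from the set
                List<Integer> in = new ArrayList<>(set);
                set.remove(in.get(rng.nextInt(in.size())));
            } else {
                // constraining phase: add a random vertex not yet in the set
                List<Integer> out = new ArrayList<>();
                for (int v = 0; v < N; v++) if (!set.contains(v)) out.add(v);
                set.add(out.get(rng.nextInt(out.size())));
            }
            if (isCover(set) && set.size() < best.size())
                best = new HashSet<>(set);            // track the best cover
        }
        return best;
    }
}
```

With a reasonable number of steps the walk quickly finds small covers of this graph (the minimum here has size 2).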
• Parallel Approach
For the parallel implementation of the Naive Seesaw search, a parallelFor loop is used to loop over the repetitions so that each repetition is executed by a different thread. Algorithm 7 lists the parallel version of the Naive Seesaw search algorithm.
Intelligent Seesaw Search Algorithm

A better, more intelligent version of this search is proposed as FastVC in [3]. We use a parallel version of the FastVC algorithm and call it the Intelligent Seesaw search. FastVC focuses on low-complexity approximate heuristics rather than heuristics that are accurate but have very high complexity. FastVC solves the MVC problem by iteratively solving its decision version, i.e., searching for a k-sized vertex cover, where k is a positive integer.
Algorithm 7 Parallel Naive Seesaw search
bestVertexSet ← V
parallelFor (1, reps)
    vertexSet ← ∅
    for (1, steps) do
        if isCover(vertexSet) then
            vertexSet ← vertexSet \ randomVertexSetElement
        else
            vertexSet ← vertexSet ∪ randomVertexSetElement
        end if
        if isCover(vertexSet) and |vertexSet| < |bestVertexSet| then
            bestVertexSet ← vertexSet
        end if
    end for
End parallelFor
• Sequential approach

FastVC starts by constructing an initial candidate solution C. The process of constructing this candidate solution involves two phases: an extending phase and a shrinking phase. It starts with an empty set C and, in the extending phase, goes through all the edges in the graph; if an edge is uncovered, it adds the endpoint with the higher degree to C. At the end of this phase, a vertex cover is obtained. In the shrinking phase, loss values are calculated for all the vertices in C, where the loss of a vertex u is the number of edges that would become uncovered if u were removed from the vertex cover. FastVC then goes through all the vertices in C, removes any vertex that has a loss value of 0, and updates the loss values of all its neighbors. This is done iteratively until all the vertices in C have a loss value greater than 0. This procedure provides a minimal vertex cover, i.e., if any vertex were removed from C, then C would no longer be a cover. The complexity of this construction is O(m), where m is the number of edges in the graph; in most cases this is better than O(n^2), where n is the number of vertices. After constructing the candidate solution C this way, FastVC performs a number of steps until the elapsed time exceeds the specified cutoff time, where in each step it chooses a vertex u belonging to C to remove. This is done using a technique called Best from Multiple Selections (BMS). BMS selects k random vertices with replacement from C, where k is a parameter, and chooses the one with the best (lowest) loss value. The algorithm iteratively removes vertices from C until C is no longer a cover. Then it chooses a random edge that is uncovered by C and adds the endpoint with the higher gain to C, where the gain of a vertex u is the number of edges that would become covered if u were added to the cover. The loss and gain values of the vertex and its neighbors are updated when removing and adding a vertex in each step. The complexity of the BMS heuristic is O(1), since k is a constant. Algorithm 8 lists the sequential version of the FastVC algorithm, which is the same as that in [3].
Algorithm 8 Sequential Intelligent Seesaw search [3]
Input: graph G = (V,E), the cutoff time
Output: vertex cover of G
C ← ConstructVC()
gain(v) ← 0 for each vertex v ∉ C
while elapsed time < cutoff do
    if C covers all edges then
        C* ← C
        remove a vertex with minimum loss from C
        continue
    end if
    u ← ChooseRmVertex(C)
    C ← C \ u
    e ← a random uncovered edge
    v ← the endpoint of e with greater gain, breaking ties in favor of the older one
    C ← C ∪ v
end while
return C*
• Parallel approach

For the parallel implementation of the FastVC algorithm, the algorithm starts similarly to the sequential version: a minimal vertex cover is found using the ConstructVC() method. Then, unlike the sequential version, which performs steps as long as the cutoff time has not elapsed, in the parallel version the number of repetitions and the number of steps are user-specified. Each repetition performs the specified number of steps, and each repetition uses a different seed to select vertices for the vertex exchanging step described above. This ensures that each repetition explores a different part of the solution space. The parallel version of FastVC is listed in Algorithm 9.
Algorithm 9 Parallel Intelligent Seesaw search [3]
Input: graph G = (V,E), reps: no. of repetitions, steps: no. of steps
Output: vertex cover of G
C ← ConstructVC()
gain(v) ← 0 for each vertex v ∉ C
bestVertexCover ← V
parallelFor (1, reps)
    for (1, steps) do
        if C covers all edges then
            C* ← C
            remove a vertex with minimum loss from C
            continue
        end if
        u ← ChooseRmVertex(C)
        C ← C \ u
        e ← a random uncovered edge
        v ← the endpoint of e with greater gain, breaking ties in favor of the older one
        C ← C ∪ v
    end for
    if |C*| < |bestVertexCover| then
        bestVertexCover ← C*
    end if
End parallelFor
return bestVertexCover
Algorithm 10 ConstructVC(G) [3]
Input: graph G = (V,E)
Output: vertex cover of G
C ← ∅
// extend C to cover all edges
for e ∈ E do
    if e is uncovered by C then
        C ← C ∪ {the endpoint of e with higher degree}
    end if
end for
// compute loss values
loss(v) ← 0 for each vertex v ∈ C
for e ∈ E do
    if only one endpoint v of e belongs to C then
        loss(v) ← loss(v) + 1
    end if
end for
// remove redundant vertices
for v ∈ C do
    if loss(v) = 0 then
        C ← C \ v, update the loss of all of v's neighbors
    end if
end for
return C
Algorithm 11 Best from Multiple Selections (BMS) Heuristic [3]
Input: a set S, a parameter k, a comparison function f
// assume an element is better than another if it has a smaller f value
Output: an element of S
best ← a random element from S
for 1, k−1 do
    r ← a random element from S
    if f(r) < f(best) then
        best ← r
    end if
end for
return best
Chapter 4
Experiments
For the experiments we have used a set of undirected graphs from the Network Data Repository online [2]. These graphs include the BHOSLIB benchmark graphs and a number of real-world graphs from the areas of biological networks, collaboration networks, interaction networks, miscellaneous networks, etc. The BHOSLIB [16] benchmark graphs are generated by transforming forced satisfiable SAT [1] benchmarks, where the set of vertices and the set of edges respectively correspond to the set of variables and the set of binary clauses in SAT. These are graphs with 'hidden optimal solutions' and are designed to be hard to solve. All the BHOSLIB graphs are expressed in the ASCII DIMACS graph format, which is as follows:

• Zero or more comment lines at the top of the file; these start with c and can be ignored.

• After the comment lines, there is a line that contains the size of the graph, in the form: p edge V E, where V is the number of vertices and E is the number of edges.

• The following lines list the edges, one per line, in the form: e vertex no vertex no, where each vertex no is a number between 1 and V.
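A reader for this format can be sketched in a few lines of Java (comment and problem lines are skipped here, and only the edge lines are kept; names are illustrative):

```java
import java.util.*;

class DimacsReader {
    // Parse ASCII DIMACS lines into a list of (u, v) edge pairs.
    static List<int[]> parse(List<String> lines) {
        List<int[]> edges = new ArrayList<>();
        for (String line : lines) {
            String[] tok = line.trim().split("\\s+");
            if (tok[0].equals("c")) continue;       // comment line
            if (tok[0].equals("p")) continue;       // problem line: p edge V E
            if (tok[0].equals("e"))                 // edge line: e u v
                edges.add(new int[]{Integer.parseInt(tok[1]),
                                    Integer.parseInt(tok[2])});
        }
        return edges;
    }
}
```

A real reader would also record V and E from the p line and validate that every vertex number lies between 1 and V.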
These BHOSLIB [16] benchmark graphs are used to evaluate the weak scaling performance of the Naive Seesaw search algorithm and the Intelligent Seesaw search algorithm and their ability to find high quality covers, where quality is defined by equation 4.1:
Quality of Cover = 1 − (|Found Vertex Cover| − |Minimum Vertex Cover|) / |Minimum Vertex Cover|    (4.1)
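Equation 4.1 is straightforward to compute; as a small sketch in Java (a hypothetical helper, not from the report's code):

```java
public class CoverQuality {
    // Equation 4.1: a cover of exactly minimum size scores 1.0, and the score
    // drops by the relative excess of the found cover over the minimum one.
    public static double quality(int foundSize, int minimumSize) {
        return 1.0 - (double) (foundSize - minimumSize) / minimumSize;
    }
}
```

For frb30-15-1 in table 4.1, for example, quality(435, 420) = 1 − 15/420 ≈ 0.964, which rounds to the reported 0.96.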
For these experiments we have used the RIT Computer Science parallel computers champ, nessie, and kraken. Champ and nessie each have one Intel Xeon E5-2690 processor with 8 dual hyper-threaded CPU cores, and kraken has 4 Intel Xeon E7-8850 processors with 10 dual hyper-threaded CPU cores per processor. All three algorithms, Branch and Bound, Naive Seesaw search, and Intelligent Seesaw search, have been written using Java's PJ2 library [8].
Figure 4.1: Weak Scaling for Naive Seesaw search with 100000 steps
The BHOSLIB [16] graphs are used to test the weak scaling performance of the Naive Seesaw search and Intelligent Seesaw search algorithms. In order to test the weak scaling performance of a program, we increase the number of cores that the program uses in proportion to the problem size. In our case, neither the number of vertices nor the number of edges affects the complexity of the program. Rather, the time taken by the program is proportional to the number of repetitions and the number of steps the program performs. So, in order to test weak scaling, we increase the number of repetitions as we increase the
Figure 4.2: Weak Scaling for Naive Seesaw search with 1000000 steps
number of cores. Note that we can't increase the number of steps to test weak scaling because the steps are not performed in parallel. So, for the Naive Seesaw search algorithm, we perform the search using 100000 and 1000000 steps on a set of graphs. For these program runs, we go from 10 repetitions to 160 repetitions and from 1 core to 16 cores, with the number of cores increased in proportion to the number of repetitions. The weak scaling performance of Naive Seesaw search when run with 100000 steps can be seen in figure 4.1 and when run with 1000000 steps in figure 4.2. We do the same for the Intelligent Seesaw search algorithm, but in that case we perform the program runs with 1000 and 10000 steps. The weak scaling performance of Intelligent Seesaw search with 1000 steps can be seen in figure 4.3 and with 10000 steps in figure 4.4. From the graphs it can be seen that as the number of cores increases, the efficiency decreases. This is because the RIT CS parallel machines used have dual hyper-threaded
cores. With dual hyper-threading, the operating system addresses two virtual cores for every physical core, so threads may end up sharing the resources of a physical core, leading to the decrease in efficiency.
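Concretely, because the workload (the number of repetitions) is scaled in proportion to the core count, ideal weak scaling keeps the running time flat, and efficiency can be computed as below. The timings in the usage note are made-up examples, not the measured values behind the figures:

```java
public class WeakScaling {
    // Weak-scaling efficiency on K cores: t1 is the 1-core time on the base
    // workload and tK is the K-core time on a workload scaled K-fold.
    // Ideal scaling gives tK == t1, i.e. an efficiency of 1.0; values below
    // 1.0 reflect overheads such as hyper-threads sharing a physical core.
    public static double efficiency(double t1, double tK) {
        return t1 / tK;
    }
}
```

For example, if 10 repetitions on 1 core take 100 seconds and 160 repetitions on 16 cores take 125 seconds, the weak-scaling efficiency is 100/125 = 0.8.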
Figure 4.3: Weak Scaling for Seesaw Intelligent with 1000 steps
4.0.1 Results
The results of the various experiments are shown in tables 4.1, 4.2 and 4.3. These tables are organized as follows:
• Graph Name: The name of the graph as in BHOSLIB [16] or the network repository [2].
• V: Number of vertices in the graph.
• E: Number of edges in the graph.
Figure 4.4: Weak Scaling for Seesaw Intelligent with 10000 steps
• Known MVC: Known minimum vertex cover size for the graph.
• Algorithm: Name of the algorithm which was run on the graph to find the minimum
vertex cover.
• Reps: Number of repetitions performed by the algorithm.
• Steps: Number of steps performed by the algorithm.
• Cores: Number of cores the algorithm was run on.
• Minimal VC or threshold: For Intelligent Seesaw search, this is the minimal vertex cover found before the seesaw search is started. For Branch and Bound, this is the threshold up to which BFS needs to be performed.
• MVC found: Size of the vertex cover found by the algorithm.
• Quality of cover: Quality of the vertex cover found by the algorithm.
• Time: Time taken by the algorithm in milliseconds.
The BHOSLIB [16] graphs are used to compare the quality of the vertex covers found by both the Naive Seesaw search and Intelligent Seesaw search algorithms. The results of both algorithms can be seen in table 4.1 and the following observations can be made:
• Naive Seesaw search is unable to find the minimum vertex cover for any of the graphs even after using 10 million steps, whereas Intelligent Seesaw search is able to find the minimum vertex cover using just 10000 steps, and in some cases even just 1000 steps. Of course, various numbers of repetitions have been tried for all of these graphs, and the run with the best solution is reported.
• The minimal vertex cover that Intelligent Seesaw search starts from is already smaller than the final vertex cover found by the Naive Seesaw search algorithm. This shows that one of the major reasons for the efficiency of the Intelligent Seesaw algorithm is that it starts its search very close to the true optimum solution and thus reaches it faster than the Naive Seesaw search algorithm, which starts its search from the full set.
• The table also shows the quality of the covers found by each of the algorithms.
Table 4.1: Results for various BHOSLIB [16] graphs

Graph Name   V    E      Known MVC  Algorithm           Reps  Steps     Cores  Minimal VC  MVC found  Quality of cover  Time (ms)
frb30-15-1   450  17827  420        Seesaw Naive        400   10000000  40     NA          435        0.96              339196
frb30-15-1   450  17827  420        Seesaw Intelligent  10    10000     1      430         420        1                 1504
frb35-17-1   595  27856  560        Seesaw Naive        800   10000000  80     NA          578        0.96              778565
frb35-17-1   595  27856  560        Seesaw Intelligent  160   10000     16     569         560        1                 2800
frb40-19-1   760  41314  720        Seesaw Naive        200   10000000  20     NA          743        0.96              507204
frb40-19-1   760  41314  720        Seesaw Intelligent  20    300000    16     732         720        1                 11580
frb45-21-1   945  59186  900        Seesaw Naive        400   10000000  40     NA          927        0.97              681985
frb45-21-1   945  59186  900        Seesaw Intelligent  160   100000    16     913         900        1                 27499

A number of real world graphs from the network repository [2] are used to test the performance of the Branch and Bound algorithm versus the Intelligent Seesaw search algorithm. Table 4.2 shows the results of these tests and the following observations can be made from it:

• For most of the graphs, the minimal vertex cover found by the Intelligent Seesaw search is already the minimum vertex cover, so it finds the optimal solution very quickly, whereas the Branch and Bound algorithm takes thousands of times longer to find the optimal solution.

• These are very small graphs, and it can easily be seen that for larger graphs Branch and Bound will take a very long time to find the solution, whereas the Intelligent Seesaw search algorithm will take far less time to find an optimal solution or a solution that is very close to the optimal solution.
Table 4.3 shows the results of the Intelligent Seesaw search algorithm on various real world graphs selected randomly from the network repository [2]. These graphs belong to the biological and collaboration networks in the network repository [2].
Table 4.2: Branch & Bound and Intelligent Seesaw search results for real world graphs

Graph Name       V    E     Known MVC  Algorithm           Reps  Steps  Cores  Minimal VC or threshold  MVC found  Quality of cover  Time (ms)
GD95 b           73   96    23         Branch and Bound    NA    NA     1      10                       23         1                 2065048
GD95 b           73   96    23         Seesaw Intelligent  1     1      1      23                       23         1                 161
GD95 c           62   287   38         Branch and Bound    NA    NA     1      10                       38         1                 484176
GD95 c           62   287   38         Seesaw Intelligent  1     1      1      38                       38         1                 166
GD96 b           111  193   20         Branch and Bound    NA    NA     1      10                       20         1                 404584
GD96 b           111  193   20         Seesaw Intelligent  1     1      1      20                       20         1                 170
ia-infect-hyper  113  2196  90         Branch and Bound    NA    NA     1      80                       90         1                 3975
ia-infect-hyper  113  2196  90         Seesaw Intelligent  1     150    1      93                       90         1                 106
Table 4.3: Intelligent Seesaw search results for real world graphs

Graph Name   V      E       Known MVC  Reps  Steps    Cores  Minimal VC  MVC found  Quality of cover  Time (ms)
bio-dmela    7393   25569   2630       1     1000000  1      2723        2632       0.99              452080
bio-yeast    1458   1948    456        50    500      16     464         456        1                 836
ca-AstroPh   17903  196972  11483      50    50000    16     11512       11483      1                 3005084
ca-CondMat   21363  91286   12480      1     50000    1      12499       12480      1                 641315
ca-CSphd     1882   1740    550        1     600      1      554         550        1                 295
ca-Erdos992  6100   7515    461        1     1        1      461         461        1                 201
ca-GrQc      4158   13422   2208       1     3000     1      2213        2208       1                 943
ca-HepPh     11204  117619  6555       1     25000    1      6567        6555       1                 33178
Chapter 5
Conclusions
5.1 Lessons Learned
All three algorithms, Branch and Bound, Naive Seesaw search, and Intelligent Seesaw search, can find the optimal solution, but as the graph size increases, the time taken by the Branch and Bound algorithm increases exponentially. In the case of the Naive Seesaw search algorithm, even though its time complexity doesn't increase with the problem size, a large number of steps and repetitions have to be performed in order to find the optimal solution. Unlike the Naive Seesaw search algorithm, the Intelligent Seesaw search algorithm starts its search by first finding a minimal vertex cover, thus starting the search close to the optimal solution. Due to this, the Intelligent Seesaw search is able to find the optimal solution very quickly even for larger graphs. When run on various other real world graphs, it was also observed that increasing the number of repetitions often doesn't help much, but increasing the number of steps does. This can be seen in table 4.3: for most of the graphs, the Intelligent Seesaw search algorithm was able to find the optimal solution in only a single repetition. Therefore, the Intelligent Seesaw search algorithm not only provides better solutions than the Naive Seesaw search algorithm but also takes less time than both of the other algorithms.
5.2 Future Work
As observed from the results, the Naive Seesaw search algorithm is not able to find solutions even as good as the one the Intelligent Seesaw search starts with. This happens because the Intelligent Seesaw search starts with a minimal vertex cover whereas the Naive Seesaw search starts with an empty set. In the future, it would be interesting to examine the quality of the covers found by the Naive Seesaw search if it too starts its search by finding a minimal vertex cover, as the Intelligent Seesaw search does. This would provide a true comparison of the Naive Seesaw search and Intelligent Seesaw search algorithms and tell us whether it was only starting the search closer to the optimal solution that produced such better results for the Intelligent Seesaw search, or whether the heuristics it uses for its search are important too.
Additionally, this work takes a first step towards implementing parallel programs for the Minimum Vertex Cover problem on massive graphs using local search techniques. In the future, additional work can be done to solve other NP-hard problems using the seesaw search technique while leveraging the performance of parallel computing.
Bibliography
[1] Boolean satisfiability problem. https://en.wikipedia.org/wiki/Boolean_satisfiability_problem.

[2] Network repository. http://networkrepository.com/.

[3] Shaowei Cai. Balance between complexity and quality: Local search for minimum vertex cover in massive graphs. In Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI'15, pages 747–753. AAAI Press, 2015.

[4] Shaowei Cai, Kaile Su, and Qingliang Chen. EWLS: A new local search for minimum vertex cover. In Maria Fox and David Poole, editors, Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI-10). AAAI Press, 2010.

[5] Shaowei Cai, Kaile Su, Chuan Luo, and Abdul Sattar. NuMVC: An efficient local search algorithm for minimum vertex cover. Journal of Artificial Intelligence Research, 46(1):687–716, January 2013.

[6] Shaowei Cai, Kaile Su, and Abdul Sattar. Local search with edge weighting and configuration checking heuristics for minimum vertex cover. Artificial Intelligence, 175(9-10):1672–1696, June 2011.

[7] Torsten Fahle. Simple and Fast: Improving a Branch-and-Bound Algorithm for Maximum Clique, pages 485–498. Springer Berlin Heidelberg, Berlin, Heidelberg, 2002.

[8] Alan Kaminsky. Big CPU, Big Data: Solving the World's Toughest Computational Problems with Parallel Computing. CreateSpace Independent Publishing Platform, USA, 1st edition, 2016.

[9] Chu-Min Li and Zhe Quan. An efficient branch-and-bound algorithm based on MaxSAT for the maximum clique problem. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, AAAI'10, pages 128–133. AAAI Press, 2010.

[10] Patric R. J. Ostergard. A fast algorithm for the maximum clique problem. Discrete Applied Mathematics, 120(1-3):197–207, August 2002.
[11] S. Pirzada. Applications of graph theory. Proceedings in Applied Mathematics and Mechanics, 7(1):2070013–2070013, 2007.

[12] Jean-Charles Regin. Using Constraint Programming to Solve the Maximum Clique Problem, pages 634–648. Springer Berlin Heidelberg, Berlin, Heidelberg, 2003.

[13] Silvia Richter, Malte Helmert, and Charles Gretton. A Stochastic Local Search Approach to Vertex Cover, pages 412–426. Springer Berlin Heidelberg, Berlin, Heidelberg, 2007.

[14] Etsuji Tomita and Toshikatsu Kameda. An efficient branch-and-bound algorithm for finding a maximum clique with computational experiments. Journal of Global Optimization, 37(1):95–111, 2007.

[15] K. Toume, D. Kinjo, and M. Nakamura. A GPU algorithm for minimum vertex cover problems. In American Institute of Physics Conference Series, volume 1618, pages 724–727, October 2014.

[16] Ke Xu. BHOSLIB: Benchmarks with hidden optimum solutions for graph problems. http://www.nlsde.buaa.edu.cn/~kexu/benchmarks/graph-benchmarks.htm.