Page 1 PASSWORD RECOVERY BY USING MPI AND CUDA
WALCHAND COLLEGE OF ENGINEERING, SANGLI
AN AUTONOMOUS INSTITUTION
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
YEAR 2013-14
PROJECT REPORT
ON
Password Recovery by using MPI and CUDA
UNDER GUIDANCE OF
Prof. M. A. Shah
SUBMITTED BY:
Sr. No  NAME                      Roll No  Seat No
1       SHWETA CHAUDHARI          125      2011BCS216
2       BHIMASHANKAR DHAPPADHULE  130      W5097102
3       CHAITANYA BHUSARE         132      W53551
CERTIFICATE
2013 – 2014
WALCHAND COLLEGE OF ENGINEERING, SANGLI
AN AUTONOMOUS INSTITUTION
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
This is to certify that the project report entitled
“Password Recovery by using MPI and CUDA”
is a bona fide record of the work carried out by the students listed below, in partial
fulfillment of the requirements for the award of the B.Tech. degree in Computer Science and
Engineering, under my guidance, during the academic year 2013-14. The project report has been
approved as it satisfies the academic requirements with respect to the project work prescribed
for the above course.
Submitted By:
Prof. M. A. Shah.
(External Examiner) (Project Guide)
Sr. No  NAME                      Roll No  Seat No
1       SHWETA CHAUDHARI          125      2011BCS216
2       BHIMASHANKAR DHAPPADHULE  130      W5097102
3       CHAITANYA BHUSARE         132      W53551
ACKNOWLEDGEMENT
We are immensely glad to present the project report entitled
“Password Recovery by using MPI and CUDA”.
Realizing our foremost duty, we express our deep gratitude and respect to our guide, Prof.
M. A. Shah, for her constant encouragement and for inspiring us to take up and complete this
project successfully.
We are also grateful to Dr. B. F. Momin, Head of the Department of Computer Science &
Engineering, for providing all the facilities necessary to carry out the project work; his
encouragement has been a perpetual source of motivation.
We are grateful to the library staff for all their help in completing the project work.
Last but not least, we are thankful to our colleagues and to all those who helped us directly
or indirectly throughout this project work.
DECLARATION
We hereby declare that this project report entitled “Password Recovery by using MPI
and CUDA” is the record of authentic work carried out by us during the academic year 2013-
2014.
Date:
Place:
Sr. No  NAME               Seat No      Signature
1       SHWETA CHAUDHARI   2011BCS216
2       BHIMASHANKAR D     W5097102
3       CHAITANYA BHUSARE  W53551
INDEX                                                   PAGE NO
1. Introduction                                               6
   1.1. Existing System and its Drawbacks                     6
   1.2. Literature Survey                                     7
   1.3. Motivation                                            7
   1.4. Scope                                                 8
   1.5. Problem Statement                                     8
2. Objectives of the Project                                  9
   2.1. Significance of Undertaking the Project               9
3. Project Overview                                          10
   3.1. System Architecture                                  10
   3.2. MPI                                                  10
   3.3. CUDA                                                 13
   3.4. MPI + CUDA                                           13
   3.5. Platforms Used                                       14
   3.6. Inputs to the Algorithms                             14
   3.7. Flow Layout                                          16
   3.8. Control Layout                                       17
   3.9. Flowcharts of Algorithms                             17
      3.9.1. Divided Dictionary Algorithm                    17
      3.9.2. Divided Password Database Algorithm             18
      3.9.3. Minimal Memory Algorithm                        19
   3.10. MPI Functions Used                                  20
   3.11. CUDA Functions Used                                 21
4. Implementation                                            22
   4.1. Formation of a Cluster                               22
   4.2. Installation of CUDA                                 23
   4.3. Implementation of Algorithms                         25
5. Experiments and Result Analysis                           29
6. Conclusion                                                34
7. References                                                35
1. INTRODUCTION:
Electronic authentication is the process of confirming a user's identity as presented to an
information system. A widely deployed method for confirming a user's identity is based on
passwords. A person submits her userid and password to a computer system. The computer
system processes the password with a hash function and compares the hash function output with
a hash value associated with that userid from a password database. If the two hash values match,
she is granted access to the system. Millions of people follow this protocol to confirm their
identity multiple times each day.
The hash functions used to authenticate users are called cryptographic hash functions.
These functions read a string of characters of any length, perform a hash operation on the
string, and then return the hashed value as a fixed-length string. A good cryptographic hash
function is one-way, which means the original input string cannot be recovered by reversing the
function. This one-way characteristic has led to passwords being stored as hashed values instead
of in their plain-text format. Accordingly, even if the password database is compromised, the
plain-text passwords are still somewhat secure.
1.1. Existing System and its Drawbacks:
A dictionary-based password recovery application performs a large number of hash and
string comparison operations. A password dictionary is filled with familiar names, common
words, and simple character sequences. With a large dictionary or a large number of entries in a
password database, password recovery is computationally expensive. However, this type of
analysis has a high level of parallelism; testing one word from the dictionary is independent of
testing another. This makes dictionary-based password recovery well suited to an HPC
environment, such as MPI or CUDA. The parallelism can be taken even further and implemented in
a hybrid MPI+CUDA environment, which allows for an initial coarse-grained division of the
problem using MPI and a fine-grained division of the problem using CUDA. In the realm of
password recovery, several approaches have been deployed, each with its own
advantages and disadvantages.
One problem with password authentication is that some passwords are weak; they contain
little randomness from one character to the next. When people create their own passwords, it is
natural for them to choose words that are easy to remember. Passwords are sometimes based on
familiar names, common words, or are simple sequences like “12345.” Those types of passwords
are easily guessed by attackers.
1.2. Literature Survey:
There has been significant research relating password strength to system security,
improving the performance of password recovery systems, and applying high-performance
distributed computing to a variety of problems including password recovery. We therefore
surveyed literature on passwords and system security, serial dictionary recovery systems,
coarse-grained HPC, fine-grained HPC, and hybrid HPC, and summarize the relevant findings
below.
Regarding passwords and system security, one area of research focuses on improving the
strength of systems that use password-based authentication protocols. System strength can be
improved by understanding what makes one password better than another. Prior work has
measured password strength in terms of the search space required for attackers to guess the
password. It has been shown that dictionary attacks are most effective at discovering weak
passwords; dictionary word mangling is useful after the dictionary has been fully tested.
Finally, Markov-based techniques were most useful in breaking strong passwords.
Performance of password recovery can also be improved by increasing the number of
processors used. One prior effort used a loosely coupled architecture based on volunteer
computing to implement a dictionary attack against a private key ring generated by GnuPG
software.
MPI and OpenMP provide an efficient way of coarsely dividing work between multiple
computers, cores, or threads, where each individual unit performs calculations on a subset of
the data. CUDA, on the other hand, is better suited to fine-grained parallelism, as
demonstrated by studies comparing multiple versions of the LU benchmark on multiple
architectures.
Coarse- and fine-grained HPC have normally been used independently of each other.
However, they may be combined in a hybrid framework of parallelism. A coarse division of the
problem is done at the CPU level, using MPI or OpenMP. This level divides the data or
calculations into smaller subsets, handling distribution of data and gathering of results. At
the next level, GPUs can then apply fine-grained parallelism to the smaller subsets of the
problem.
1.3. Motivation:
High Performance Computing (HPC) is advancing rapidly in the world of technology:
• Energy costs of HPC systems are of growing importance.
• Existing power-saving technologies are largely unused in HPC.
• Intelligent use of existing technologies has the potential to significantly reduce power
  consumption while retaining performance.
• Efficient programming of clusters of shared-memory (SMP) nodes:
  – Hierarchical system layout.
  – Hybrid programming seems natural.
• MPI between the nodes.
• Shared-memory programming inside each SMP node (e.g. MPICH2, or MPI-3 shared-memory
  programming).
CUDA has the following advantages that lead us to the motivation:
• High-level, C/C++/Fortran language extensions
• Scalability
• Thread-level abstractions
• Runtime library
1.4. Scope:
This MPI+CUDA approach allows for the division of a large dictionary or a password
database between multiple MPI nodes, which is then further divided among the GPUs of each
node. This allows for multiple GPUs to perform calculations at the same time. An analysis of the
algorithms was done to identify the different strengths and weaknesses with respect to execution
time and memory usage and help determine the most efficient data distribution strategy for a
hybrid MPI+CUDA computing environment.
1.5. Problem Statement
Design and implementation of multiple password recovery algorithms using a hybrid MPI and
CUDA approach.
2. OBJECTIVES OF THE PROJECT:
Objectives for undertaking the project are:
We compare the memory usage and run times of multiple password recovery algorithms. We
designed and implemented these algorithms using an MPI+CUDA hybrid approach, with each
algorithm based on the dictionary attack method of password recovery.
2.1 Significance of undertaking the project:
To determine the most efficient use of a hybrid MPI+CUDA system for password recovery,
three different algorithms were developed. The algorithms are similar in that they are all
based on a dictionary approach to
password recovery. In the dictionary approach hash values for a list of words are computed.
Those computed hash values are then compared against the hash values stored in a password
database. If the computed hash value for a dictionary word matches the stored hash value for an
account in the database, then the dictionary word is the password for the account.
The algorithms differ primarily in how the dictionary and password database are
distributed among the GPU devices. The remainder of this section will describe the hash function
used, our dictionary and password database data, and the three algorithms.
3. PROJECT OVERVIEW:
3.1. System Architecture: [System architecture diagram omitted from this copy.]
3.2. MPI:
MPI operates via function calls. C and C++ users should use something like this to make
use of MPI:
#include <mpi.h>
// ...
int rc;
// ...
rc = MPI_Bcast( ... );
Unlike OpenMP, you shouldn’t think of “parallel regions.” Your entire code is running in
parallel. You should exploit this. To initialize the MPI environment, you need to include some
function calls in your code. The following are typically used:
/* add in MPI startup routines */
/* 1st: launch the MPI processes on each node */
MPI_Init(&argc,&argv);
/* 2nd: request a thread id, sometimes called a "rank" from
the MPI master process, which has rank or tid == 0
*/
MPI_Comm_rank(MPI_COMM_WORLD, &tid);
/* 3rd: this is often useful, get the total number of threads
or processes launched by MPI
*/
MPI_Comm_size(MPI_COMM_WORLD, &nthreads);
When the MPI program reaches that first MPI_Init, it passes the arguments it received
on the command line to the MPI launcher process, which starts new copies of the binary running
on the processors.
At this point in time, you have gone from a single process to NCPU (or N Cores)
coordinated processes. These processes do not share variables. Your job is to figure out what
process needs what data, when they need it, and get it to them.
The MPI_Comm_rank function call returns the rank of the calling process. A process id of 0
is the “master” process, the original process which launched all the others. The process
id or “rank” is used to identify the different processes in your program.
The MPI_Comm_size function call returns the number of threads or processes being
used for this program.
With this information, your program is now aware of the parallel environment. This is
explicitly the case. Note that you could run your code on NCPU=1, in which case you have
effectively serial operation.
A way to visualize this is to imagine an implicit loop around your program, where you
have NCPU iterations of the loop. These iterations all occur at the same time, unlike an explicit
loop. The number of CPUs or cores is controlled by specifying a value to the mpirun or mpiexec
command using the -np argument. If it is not set, it could default to 1 or the number of
CPUs/Cores on your machine.
With that done, let’s parallelize hello.c. We put in the explicit function calls for the
MPI processes, and add some additional “sanity checking”, so we know where and how the code
ran. The original code looks like this:
#include <stdio.h>
int main(int argc, char *argv[])
{
printf("hello MPI user!\n");
return(0);
}
The MPI version of the code looks like this:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h> /* for gethostname() */
#include <mpi.h>
int main(int argc, char *argv[])
{
int tid,nthreads;
char *cpu_name;
/* add in MPI startup routines */
/* 1st: launch the MPI processes on each node */
MPI_Init(&argc,&argv);
/* 2nd: request a thread id, sometimes called a "rank" from
the MPI master process, which has rank or tid == 0
*/
MPI_Comm_rank(MPI_COMM_WORLD, &tid);
/* 3rd: this is often useful, get the total number of threads
or processes launched by MPI
*/
MPI_Comm_size(MPI_COMM_WORLD, &nthreads);
/* on EVERY process, allocate space for the machine name */
cpu_name = (char *)calloc(80,sizeof(char));
/* get the machine name of this particular host ... well
at least the first 80 characters of it ... */
gethostname(cpu_name,80);
printf("hello MPI user: from process = %i on machine=%s, of NCPU=%i processes\n",
tid, cpu_name, nthreads);
MPI_Finalize();
return(0);
}
Unlike OpenMP, there is no “parallel region” to enclose; the entire code is a “parallel
region”, but nothing is shared. A simple Makefile (included with the source code) builds a
working binary executable called hello-mpi.exe. To run the binary, use the mpirun command
below. In this example we start eight processes; each process runs hello-mpi.exe and they
communicate with each other using MPI.
/home/joe/local/bin/mpirun -np 8 -hostfile hostfile ./hello-mpi.exe
hello MPI user: from process = 0 on machine=pegasus-i, of NCPU=8 processes
hello MPI user: from process = 1 on machine=pegasus-i, of NCPU=8 processes
hello MPI user: from process = 3 on machine=pegasus-i, of NCPU=8 processes
hello MPI user: from process = 4 on machine=pegasus-i, of NCPU=8 processes
hello MPI user: from process = 5 on machine=pegasus-i, of NCPU=8 processes
hello MPI user: from process = 2 on machine=pegasus-i, of NCPU=8 processes
hello MPI user: from process = 6 on machine=pegasus-i, of NCPU=8 processes
hello MPI user: from process = 7 on machine=pegasus-i, of NCPU=8 processes
There are some additional arguments worth discussing. First, the -hostfile argument provides a
filename that tells mpirun the host or hosts on which to run the program. For this tutorial, we will
use a single hostfile with “localhost” as our only host. You can use any name for this file.
Second, since we have multiple cores we can adjust the number of processes to match the
number of cores by using the -np parameter.
3.3. CUDA
CUDA (Compute Unified Device Architecture) is a parallel computing platform and
programming model created by NVIDIA and implemented by the graphics processing
units (GPUs) that they produce. CUDA gives program developers direct access to the
virtual instruction set and memory of the parallel computational elements in CUDA GPUs.
CUDA provides both a low level API and a higher level API. CUDA works with all
Nvidia GPUs from the G8x series onwards, including GeForce, Quadro and the Tesla line.
CUDA is compatible with most standard operating systems. Nvidia states that programs
developed for the G8x series will also work without modification on all future Nvidia video
cards, due to binary compatibility.
3.4. MPI + CUDA:
MPI, the Message Passing Interface, is a standard API for communicating data via
messages between distributed processes that is commonly used in HPC to build applications that
can scale to multi-node computer clusters. As such, MPI is fully compatible with CUDA, which
is designed for parallel computing on a single computer or node. There are many reasons for
wanting to combine the two parallel programming approaches of MPI and CUDA. A common
reason is to enable solving problems with a data size too large to fit into the memory of a single
GPU, or that would require an unreasonably long compute time on a single node. Another reason
is to accelerate an existing MPI application with GPUs or to enable an existing single-node
multi-GPU application to scale across multiple nodes. With CUDA-aware MPI these goals can
be achieved easily and efficiently.
3.5. Platforms Used:
Language Used: C, CUDA C, C++.
Environment: two Ubuntu 12.04 machines and two GeForce GT 525M NVIDIA GPUs having
96 cores each.
The environment allows for an initial coarse-grained division of the problem using MPI
and a fine-grained division of the problem using CUDA.
3.6. Inputs to the Algorithms:
The password recovery program assumes that some preprocessing of the input files has
been done. The dictionary file and user files require the first two lines of each file to be
the number of entries in the file and the maximum string length of the entries, respectively.
For the user file, the maximum string length is the maximum length of the user account names,
i.e.:
------------------------------------------------
<number of dictionary words>
<maximum string length of the dictionary words>
word1
word2
...
------------------------------------------------
The dictionary file is formatted so that each line of the file after the first two lines is a
single dictionary word, as shown above.
The user file is formatted as "<userName> : <X X X X X>", where <userName> is the user account
name and <X X X X X> is the user account’s SHA-1 hash value in hex format, separated into
32-bit blocks by spaces, as shown below:
----------------------------------------------------
...
user1: a9993e36 4706816a ba3e2571 7850c26c 9cd0d89d
...
----------------------------------------------------
Secure hash algorithm input: The Secure Hash Standard’s SHA-1 algorithm produces a 160-bit
message digest for a given data stream. In theory, it is highly improbable that two messages
will produce the same message digest. Therefore, the algorithm can serve as a means of
providing a "fingerprint" for a message.
3.7. Flow Layout:
Installation of MPICH2-1.4.1 → Formation of an MPI cluster of 2 nodes → Installation of
CUDA 5.5 → Formation of a GPU cluster → Implementation of the algorithms
(a. Divided Dictionary Algorithm, b. Divided Password Database Algorithm,
c. Minimal Memory Algorithm) → Testing → Result Analysis → End
3.8. Control Layout:
1. First we have to install MPICH2-1.4.1 on the machine having an Nvidia GeForce GT
540M.
2. Then we have to form a cluster of MPI of all the nodes.
3. After that, we have to install the CUDA 5.5.
4. Formation of GPU cluster is the next step.
5. Implement the algorithms.
a. Divided Dictionary Algorithm
b. Divided Password Database Algorithm
c. Minimal Memory Algorithm
6. Testing is a necessary step for all projects.
7. Analysis of the result is carried out.
3.9. Flowcharts of Algorithms:
Start → Input the mode, dictionary file name, and password database file name → If an
argument is incorrect, generate an error → Generate/divide the dictionary or password
database file according to the user input → (continue at A)
3.9.1 Divided Dictionary Algorithm:
A → depending on the mode, call the Divided Dictionary Algorithm (1), the Divided Password
Database Algorithm (2), or the Minimal Memory Algorithm (3).
(1):
1. Read the number of dictionary words and password database entries.
2. MPI divides the dictionary file according to the user’s specification.
3. Load the divided files onto each GPU.
4. Copy the entire password database file onto each GPU.
5. Initialize threads and allocate GPU memory.
6. Calculate the hash value for each dictionary word and store it in global memory.
7. Compare the stored values with the password database hash values.
8. Display the matched results, the time required for loading, generating, and comparing
   hashes, and the memory usage.
9. Exit.
3.9.2 Divided Password Database Algorithm:
(2):
1. Read the number of dictionary words and password database entries.
2. MPI divides the password database file according to the user’s specification.
3. Load the divided files onto each GPU.
4. Copy the entire dictionary file onto each GPU.
5. Initialize threads and allocate GPU memory.
6. Calculate the hash value for each dictionary word and store it in global memory.
7. Compare the stored values with the password database hash values.
8. Display the matched results, the time required for loading, generating, and comparing
   hashes, and the memory usage.
9. Exit.

3.9.3 Minimal Memory Algorithm:
(3):
1. Initialize a thread and allocate memory for a single dictionary word. (continue at B)
3.10. MPI Functions Used:

SR. NO.  Function Name   Description
1        MPI_Init        Initializes the MPI execution environment.
2        MPI_Comm_size   Returns the total number of MPI processes in the specified
                         communicator.
3        MPI_Comm_rank   Returns the rank of the calling MPI process within the specified
                         communicator.
4        MPI_Send        Basic blocking send operation.
5        MPI_Recv        Receives a message, blocking until the requested data is available
                         in the application buffer of the receiving task.
6        MPI_Barrier     Synchronization operation.
7        MPI_Bcast       Data movement operation; broadcasts (sends) a message from the
                         process with rank "root" to all other processes in the group.
8        MPI_Scatter     Data movement operation; distributes distinct messages from a
                         single source task to each task in the group.
(B) For each dictionary word:
1. Load the dictionary word and the hash values from the password database file.
2. Generate the hash value for the dictionary word.
3. Compare the two values; if they do not match (NOT FOUND), continue with the next word.
4. If they match (FOUND), display the matched results, the time required for loading,
   generating, and comparing hashes, and the memory usage.
5. Exit.
3.11. CUDA Functions Used:

SR. NO.  Function Name              Description
1        cudaGetDeviceCount()       Returns the number of devices with compute capability.
2        cudaGetDeviceProperties()  Returns the properties of a device.
3        cudaSetDevice()            Sets the device on which the host thread executes.
4        cudaGetDevice()            Returns the device currently in use by the calling host
                                    thread.
5        cudaMalloc()               Allocates GPU memory.
6        cudaMemcpy()               Copies count bytes from the memory area pointed to by
                                    src to dest.
7        cuDeviceGetCount()         Returns the number of devices on the node.
8        cuDeviceGet()              Returns a handle to a device.
9        cuDeviceGetName()          Returns the name of a device.
10       cudaGetLastError()         Returns the last error from a runtime call.
4. IMPLEMENTATION:
4.1 Formation of a Cluster:
Instructions for building MPICH2
1. Download mpich2-1.4.1p1.tar.gz to your account on Shale. 1.4.1p1 is the current stable
release. You can download it from
http://www.mcs.anl.gov/research/projects/mpich2/
2. Uncompress and extract the archive.
$ tar xvf mpich2-1.4.1p1.tar.gz
3. Make an installation directory where the final MPI system will reside.
$ mkdir /home/<your userid>/mpich2-install/
4. At this point you should change to one of the CUDA nodes: node1,
node2
5. Run the MPI configuration script from the archive directory. The following disables the
FORTRAN compilers, specifies g++ as the C++ compiler, enables shared libraries, and
specifies the location of the installation directory. The command should be on one line.
$ ./configure --disable-f77 --disable-fc CXX=g++ --enable-shared --prefix=/home/<your
userid>/mpich2-install 2>&1 | tee c.txt
The options for CXX and enable-shared were needed to get the system to run on Shale.
6. Run make to build MPICH2.
$ make 2>&1 | tee m.txt
7. Run the install script.
$ make install 2>&1 | tee mi.txt
8. Add the mpich2-install/bin/ directory to your path.
9. Test that your path is configured correctly. The output of the following commands should
refer to the bin directory of your installation directory. If they do not, return to step 8 and
move the mpich2-install/bin directory to the beginning of your path.
$ which mpicc
$ which mpiexec
10. Test that you can ssh to the MPI nodes (node1, node2) without having to enter a
password.
11. Create a hosts file. This should be a text file like the mpd.hosts
file we used at the beginning of the semester. The file should contain:
node1
node2
12. Run a test program at mpich2-1.4.1p1/examples/cpi.
$ mpiexec -n 5 -f <path to hosts file> <path to cpi program>
13. Compile and run mpi_hello.cxx
$ mpicxx -g -Wall -o mpi_hello mpi_hello.cxx
$ mpiexec -n 4 -f hosts ./mpi_hello
4.2. Installation of CUDA:
STEP I – Driver installation
Download the Nvidia drivers for Linux from their website, making sure to select 32 or 64-bit
Linux based on your system.
Make sure the requisite tools are installed using the following command -
sudo apt-get install freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev
libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev
Next, blacklist the required modules (so that they don’t interfere with the driver installation) -
sudo nano /etc/modprobe.d/blacklist.conf
Add the following lines to the end of the file, one per line, and save it when done -
blacklist amd76x_edac
blacklist vga16fb
blacklist nouveau
blacklist rivafb
blacklist nvidiafb
blacklist rivatv
In order to get rid of any Nvidia residuals, run the following command in a terminal -
sudo apt-get remove --purge nvidia*
Press Ctrl+Alt+F1 to switch to a text-based login, and switch to the directory which contains the
downloaded driver.
Run the following commands -
sudo service lightdm stop
chmod +x NVIDIA*.run
where NVIDIA*.run is the full name of your driver. Next, start the installation with -
sudo ./NVIDIA*.run
Reboot once the installation completes.
STEP II – CUDA toolkit installation
Download the CUDA toolkit. Navigate to the directory that contains the downloaded CUDA
toolkit package, and run the following commands –
chmod +x cuda*.run
sudo ./cuda*.run
where cuda*.run is the full name of the downloaded CUDA toolkit. Accept the license that
appears.
To make sure the necessary environment variables (PATH and LD_LIBRARY_PATH) are
modified every time you access a terminal, add the requisite lines (from the summary screen)
to the end of ~/.bashrc as follows -
32 bit systems -
export PATH=$PATH:/usr/local/cuda-5.0/bin
export LD_LIBRARY_PATH=/usr/local/cuda-5.0/lib
64 bit systems -
export PATH=$PATH:/usr/local/cuda-5.0/bin
export LD_LIBRARY_PATH=/usr/local/cuda-5.0/lib64:/lib
STEP III – CUDA samples installation and troubleshooting
While the installation of the samples should be straightforward (simply run the all in one
toolkit installer), it’s often not all that easy.
Determine if variants of libglut.so are present as follows -
sudo find /usr -name libglut\*
On my 64-bit installation of Ubuntu 12.04, this output the following text -
/usr/lib/x86_64-linux-gnu/libglut.so.3
/usr/lib/x86_64-linux-gnu/libglut.so.3.9.0
/usr/lib/x86_64-linux-gnu/libglut.a
/usr/lib/x86_64-linux-gnu/libglut.so
4.3. Implementation of Algorithms:
Our hypothesis was that each algorithm would require different amounts of memory and
processing time due to differences in how they divide the dictionary and password
database data. The scalability of the different algorithms was also of interest.
1) Divided Dictionary Algorithm: The divided dictionary algorithm (shown in Figure 1)
divides the larger dictionary file between the MPI nodes which then further divide the
dictionary words between the GPUs on each MPI node. Each GPU has a different portion of
the dictionary. The full password database file is then loaded onto each GPU. A kernel
function is repeatedly called for each of the user accounts to test for matches between the
password database hash and the dictionary word hashes. Any matches represent the
recovered password for the user account. The main steps of this algorithm are described
below.
a) Evenly divide the dictionary words among the MPI nodes. Each MPI node further
divides its words among its available GPU devices.
b) The GPU devices calculate hash values for each of its dictionary words and store
them in global memory.
c) The contents of the password database are copied to each GPU device. Each GPU
has the same account data.
d) Finally, the GPU tests for matches between the calculated hash value of each
dictionary word with the stored hash value from the password database.
Fig. 1. Divided Dictionary Algorithm. The dictionary of potential passwords is divided
among the MPI nodes. Each MPI node subdivides its words among its available GPU nodes. The
same password database is loaded onto each GPU.
2) Divided Password Database Algorithm: The divided password database algorithm
(shown in Figure 2) evenly divides the password database file between the MPI nodes which
then further divide the user account data between the GPUs on each MPI node. The full
dictionary file is then loaded onto each GPU and a hash value for each word is calculated. A
kernel function is then repeatedly called for each of the user accounts to test for matches
between the password database hash value and the dictionary word hashes. Any matches
represent the recovered password for a given user account as outlined below.
a) Distribute all of the dictionary words to each MPI node and its available GPU
devices.
b) The GPU devices calculate hash values for each dictionary word.
c) Evenly divide the contents of the password database among each MPI node. Each
MPI node then divides its account data among the GPU devices.
d) Finally, the GPU tests for matches between the calculated hash value of each
dictionary word and the stored hash value from the password database.
Fig. 2. Divided Password Database Algorithm. The password database is divided among
the MPI nodes. Each MPI node subdivides that user account data among its available GPU
devices. The entire dictionary is loaded onto each GPU.
3) Minimal Memory Algorithm: The minimal memory algorithm (shown in Figure 3)
repeatedly calculates the hash value for a dictionary word and then tests for a match in the
password database. The intent of this algorithm is to be a low memory solution; it differs
greatly when compared to the other algorithms. The minimal memory algorithm divides the
password database file between the MPI nodes which then further divide the password
database between the GPUs on each MPI node. The choice to divide the password database
between the MPI nodes instead of the dictionary file is based on the assumption that the
password database will be significantly smaller than the dictionary file. The algorithm
processes the data as outlined below.
a) Evenly divide the password database among the MPI nodes. Each MPI node
further divides its user accounts among the GPU devices.
b) For each dictionary word,
i. Load and distribute the word among the GPUs.
ii. Calculate the word’s hash value.
iii. Compare the calculated hash value against the loaded user account hash
values.
Fig. 3. Minimal Memory Algorithm. The password database is divided evenly among the
MPI nodes and then subdivided among the GPU devices. One by one, a single word from the
dictionary is loaded onto the GPUs, hashed, and compared with the user account hash values.
5. EXPERIMENTAL RESULTS:
[Result charts omitted from this copy; the measured panels were:]
A. Divided Dictionary Algorithm
B. Minimal Memory Algorithm
C. Divided Password Algorithm
D. Divided Dictionary Algorithm on 2 GPUs
E. Divided Password Algorithm on 2 GPUs
F. Minimal Memory Algorithm on 2 GPUs
I. Execution Time of Program
J. Average GPU Memory Usage
6. CONCLUSION
This project demonstrates that significant scalability and performance gains are possible with
a hybrid HPC approach to password recovery using MPI+CUDA. These tools can be easy to use
and will likely be adopted by both sides in the arms race between system administrators and
hackers.
Of the three algorithms analyzed, the divided dictionary algorithm was the best approach for
distributing data to each GPU. Dividing the larger dictionary, rather than the smaller
password database, among the GPUs ensured performance gains as the number of GPUs increased.
In general, this project shows that MPI and CUDA can be combined to develop an efficient
password recovery system that can scale to multiple compute nodes with multiple GPUs.
Further, this work opens up many avenues for improving performance and scalability for large
scale hybrid computing systems, as well as utilizing and testing modern improvements in GPU
technology.
7. REFERENCES:
[1] MPICH2. [Online]. Available: http://www.mcs.anl.gov/research/projects/mpich2/
[2] NVIDIA Corporation. CUDA. [Online]. Available:
http://www.nvidia.com/object/cuda_home_new.html
[3] M. de la Asunción, J. M. Mantas, M. J. Castro, and E. Fernández-Nieto, “An MPI-CUDA
implementation of an improved Roe method for two-layer shallow water systems.”
[4] S. Pellicer, Y. Pan, and M. Guo, “Distributed MD4 password hashing with grid computing
package BOINC.”
[5] S. Pennycook, S. Hammond, S. Jarvis, and G. Mudalige, “Performance analysis of a hybrid
MPI/CUDA implementation of the NAS-LU benchmark.”