Parallelisation project report
General Project Information
Name Project : Parallelisation of the DirectMRDCI code in GAMESS
Project code : N/A
Project members : Remco Havenith & Joop van Lenthe
Theoretical Chemistry
Utrecht University
Fokke Dijkstra
SARA
Start date project : January 1, 2005
End date project : August 31, 2006
Estimated time : 18 weeks (programming) + 3 weeks (documentation)
Parallelisation of the DirectMRDCI code in GAMESS
Abstract
In this report we describe the integration of the parallel direct selected multi-reference CI
code Diesel into GAMESS-UK, carried out in order to introduce a parallel version of this
method into GAMESS-UK. Most of the time was spent on updating the C++ code in which Diesel
is written. The scaling of the resulting code is shown to be satisfactory, and is expected to
be very good for large calculations. The project was funded by NCF, project number
NRG2004.06.
Introduction
Goal of the project
The goal of this project was the parallelisation of the direct selected multi-reference CI code
in GAMESS-UK. The Direct CI code [1], as implemented in GAMESS-UK [2], had already been
successfully parallelised. This code is able to perform multi-reference CI calculations for
large Configuration Interaction problems, without any approximation, employing all singly and
doubly excited configurations. This results in a highly efficient algorithm. CIs with
100-1000 million configurations or more are quite feasible.
However, taking all configurations into account leads to an $N^4$ dependency (roughly
$N_{occ}^2 \cdot N_{virt}^2 \cdot N_{ref}$, where $N_{ref}$ is the number of reference
configurations) of the number of configurations on the size of the basis set. Thus for a
molecule with, say, 100 occupied orbitals and 400 virtuals, not an extreme molecule for current
Hartree-Fock and DFT programs, using 30 reference configurations would result in
$10000 \cdot 160000 \cdot 30 \approx 5 \cdot 10^{10}$ configurations and (depending on the
spin state) even more states. This is a rather large number, yet the example concerns a system
well within the range of molecules that we like to study in our collaborative effort
(Theoretical Chemistry and Physical Organic Chemistry, cf. project SG032 B) on the
opto-electrical properties of large organic molecules. For this type of molecule a selection
mechanism is called for, allowing different parts of the molecule to be correlated using
different combinations of occupied and virtual orbitals.
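The estimate above can be reproduced with a few lines of C++. This is only an illustration of the arithmetic; the function name is ours and does not occur in GAMESS-UK or Diesel.

```cpp
#include <cstdint>

// Leading-order estimate N_occ^2 * N_virt^2 * N_ref of the number of
// singly and doubly excited configurations when no selection is applied.
// 64-bit arithmetic is used because the result easily exceeds 2^32.
std::uint64_t estimate_configurations(std::uint64_t n_occ,
                                      std::uint64_t n_virt,
                                      std::uint64_t n_ref) {
    return n_occ * n_occ * n_virt * n_virt * n_ref;
}
```

For 100 occupied orbitals, 400 virtuals and 30 reference configurations the function returns 48000000000, i.e. the roughly 5·10^10 configurations quoted in the text.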
This kind of scheme is implemented in the semi-direct MRDCI program of Engels et al. [3],
which was already implemented in the GAMESS-UK program during a previous Cray Research
Grant (CRG 96.18). In order to keep up with current computational demands a parallelisation
of this code is required. There also existed a parallel C++ Diesel-CI program [4,5], which is
more capable in terms of the number of states allowed, but was not yet integrated within
GAMESS-UK. In the past it was also less capable in terms of the properties that are available,
but that has been improved. The C++ code was also found to have portability problems. It is
further noteworthy that Diesel makes use of standard System V IPC for communication between
the different parallel processes.
As a first step in the parallelisation effort we decided to look at the Diesel code before
trying to parallelise the MRDCI code within GAMESS-UK, because the latter is much more
difficult.
A further point of interest was the parallelisation of the 4-index transformation and integral
adaptation prior to the actual CI calculation. This had been done already, but the performance
is lacking and will constitute a bottleneck in the total calculation.
Description of the programs
GAMESS-UK
GAMESS-UK is a general purpose ab initio molecular electronic structure program for
performing SCF, DFT and MCSCF-gradient calculations, together with a variety of
techniques for post-Hartree-Fock calculations.
The program is derived from the original GAMESS code by Michel Dupuis in 1981, then at the
National Resource for Computational Chemistry, NRCC. This code was also used as the
basis for the US version of GAMESS, which is a different program and should not be
confused with GAMESS-UK (although you would certainly not be the first to make this
mistake...). Both GAMESS-UK and the US GAMESS have been extensively modified and
enhanced since they branched from the original GAMESS code.
The work on GAMESS-UK has included contributions from numerous authors [2], and has
been conducted largely at the CCLRC Daresbury Laboratory, under the auspices of the UK's
Collaborative Computational Project No. 1 (CCP1). Other major sources that have assisted in
the ongoing development and support of the program include various academic funding
agencies in the Netherlands, and ICI plc.
Diesel
Diesel is an object-oriented direct individually selecting multi-reference CI program,
developed by Michael Hanrath and Bernd Engels [4,5]. The code builds on earlier work by
Volker Pleß [6]. One of the novel aspects of the program is that it has been written almost
completely in an object-oriented fashion using C++.
The program consists of several subprograms that each perform a specific task. A driver
program called diesel is also available. Documentation of the code is currently lacking. The
best overview of the code can be found in the thesis by Michael Hanrath [5], especially
chapter 4.
The diagonalisation part of the code is able to run in parallel, using a shared memory
communication model based on System V IPC. The rest of the code, however, is not able to run
in parallel.
Parallelisation of the program
Integration of the Diesel code into GAMESS-UK
As stated, the parallelisation effort started with an investigation of the Diesel code by
Michael Hanrath et al. The most recent version of the code was obtained from Bernd Engels,
and permission was granted to incorporate the code into GAMESS-UK.
The integration of the code into GAMESS-UK consisted of several steps. The first was getting
the standalone Diesel code compiled using modern compilers, the second was the integration
into the GAMESS-UK build system, and the final step was the integration into the GAMESS
runtime system.
Compilation of the code
We started by installing the code on the Lisa cluster at SARA, which contains dual Xeon
nodes running the Debian Sarge Linux distribution. The default GNU compiler on the system
was version 3.3.5.
The Diesel code did not compile with this version of the compiler; the compilation generated
many errors. Some investigation showed that the problems were caused by the age of the
code. The Diesel code was developed in the late 1990s, using versions 2.7 and 2.8 of the GNU
compilers. At that time the GNU compilers did not adhere to the current ISO C++ standard.
Furthermore these compilers came with a special class library called libg++, which is used
extensively within the Diesel code. Unfortunately the library has been abandoned in favour of
the ISO C++ standard template library (STL). Since the code was developed using the GNU
compilers, testing with other compilers did not seem to be a solution. Since past experience
with the Diesel code was quite good, we decided to still try to create a working version of
the Diesel code.
In order to get the Diesel code compiled a lot of changes had to be made to the program. In
the following list we describe some of the problems encountered and their solutions.
1. The Diesel code depended heavily on the libg++ library, which is now obsolete. The
code of the library is still available though. Two solutions were considered: rewriting
the needed functionality, or adapting the existing code to get it working with recent
compilers. The first solution was deemed too much work. Therefore all needed parts were
taken from the libg++ library and incorporated into the Diesel code. This code has been
updated in the same way as the rest of the Diesel code, as described in the following
items.
2. Templates have to be defined more strictly in the newer GNU compilers. A lot of the
template definitions, especially those redefining the input and output operators, had to
be made stricter. This required a lot of research into the meaning of the specific
code, and the way those templates have to be defined.
3. The standard includes have all been changed in the newer C++ standards. For a lot
of include files this only meant a change of the name of the header file. Some library
code has been changed as well, however, most notably the string stream libraries. The
Diesel code has been updated to use the modern sstream class library.
Another point worth noting is that all standard libraries are now in the namespace std,
requiring the use of this namespace where needed.
4. The use of the STL was fixed and brought up to date, by using the correct includes.
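To give an impression of the kind of change described in items 2 and 3, the following sketch shows a class template with an output operator declared in the form required by ISO C++, together with the use of std::ostringstream from the sstream header. The class Coefficients and the function to_text are hypothetical names chosen for this illustration; they are not taken from the Diesel source, and the actual changes made there may differ in detail.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Forward declarations, so that the friend declaration below can refer to
// the operator as a function template.
template <class T> class Coefficients;
template <class T>
std::ostream& operator<<(std::ostream& os, const Coefficients<T>& c);

template <class T>
class Coefficients {
public:
    void add(const T& value) { values_.push_back(value); }

    // The "<>" makes the friend a specialisation of the function template
    // declared above. Without it, a separate non-template function would be
    // declared, which no longer matches the template definition below.
    friend std::ostream& operator<< <>(std::ostream& os,
                                       const Coefficients<T>& c);

private:
    std::vector<T> values_;
};

template <class T>
std::ostream& operator<<(std::ostream& os, const Coefficients<T>& c) {
    for (std::size_t i = 0; i < c.values_.size(); ++i) {
        if (i > 0) os << ' ';
        os << c.values_[i];
    }
    return os;
}

// std::ostringstream (header <sstream>, namespace std) takes the place of
// the pre-standard string stream classes and manages its own buffer.
std::string to_text(const Coefficients<double>& c) {
    std::ostringstream out;
    out << c;
    return out.str();
}
```

All standard names carry the explicit std:: prefix here, which corresponds to the namespace requirement mentioned in item 3.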
After basic changes like these the code could be compiled on the Lisa cluster, using the GNU
3.3.5 compilers. For fixing the bugs that then appeared, a reference version was invaluable.
We managed to install the GNU 2.8 compilers on the Origin 3800 system, Teras, at SARA. Using
these compilers we were able to create a working version of the code that was very useful in
debugging the version compiled on Lisa with the newer compiler. In this way we have been able
to remove some bugs from the code that only appeared with the newer compilers. Installing the
GNU 2.8 compilers on Lisa was not possible, because of changes made to the C library since
the 2.8 compilers were released.
In order to increase the portability, we decided to test on other platforms and compilers as
well. The following platforms have been used during the project:
• The Dell dual Xeon cluster, Lisa, using the GNU compilers, versions 3.3.5 and 3.4.0,
and the Intel compilers, versions 8.1 and 9.0.
• The SGI Origin 3800 system, Teras, at SARA, using the MIPSpro 7.4.2 compilers.
• The SGI Altix 3700 system, Aster, at SARA, using the Intel compilers, version 8.1.
• A dual G5 PowerMac at the Theoretical Chemistry group of Utrecht University, using
the GNU 3.x compilers and the IBM Fortran compiler.
• The IBM p690 system, Solo, at SARA. On this system the IBM compilers were used.
We will now list some of the changes that were required for the machines mentioned.
• For most compilers further compliance with the ISO C++ standard was needed. This
includes no variable sizes for arrays (MIPSpro), explicit typing at some places (GNU
3.4.0), and explicit references to variables using e.g. the this pointer (GNU 3.4.0).
• On the Macintosh system ranlib was needed besides ar to create archives.
• Explicit references to functions in the GNU C library were removed to increase
portability.
• Some of the changes needed to get the code compiled on the IBM system resulted in
errors when using the Intel compilers. Since the IBM system at SARA has been taken
out of service, priority has been given to the Intel compilers. This needs some future
work.
• On the Origin system special prelinking of archive files is needed. Archive files have
to be created in a single pass. We still did not manage to get all dependencies fixed,
however. Since Origin systems are at the end of their lifetime, no priority has been
given to solving this problem.
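Two of the ISO C++ compliance issues in the list above, the this pointer and array sizes, can be illustrated with a small example. The classes Storage and Matrix and the function sum_of_squares are hypothetical and only serve this illustration; they are not part of Diesel.

```cpp
#include <cstddef>
#include <vector>

template <class T>
class Storage {
protected:
    std::vector<T> data_;
};

template <class T>
class Matrix : public Storage<T> {
public:
    Matrix(std::size_t rows, std::size_t cols) : rows_(rows), cols_(cols) {
        // data_ is a member of a base class that depends on T. A compiler
        // that follows the standard lookup rules only finds it when it is
        // named through this-> (or qualified with the base class name).
        this->data_.assign(rows * cols, T());
    }
    T& at(std::size_t r, std::size_t c) { return this->data_[r * cols_ + c]; }
    std::size_t rows() const { return rows_; }
    std::size_t size() const { return this->data_.size(); }

private:
    std::size_t rows_;
    std::size_t cols_;
};

// ISO C++ does not allow 'double tmp[n];' when n is only known at run time;
// that form is a compiler extension. A std::vector is a portable replacement.
double sum_of_squares(std::size_t n) {
    std::vector<double> tmp(n);
    for (std::size_t i = 0; i < n; ++i) {
        tmp[i] = static_cast<double>(i) * static_cast<double>(i);
    }
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) total += tmp[i];
    return total;
}
```

Replacing a variable-size array by std::vector moves the storage from the stack to the heap, which may matter in inner loops; whether Diesel used std::vector or another fix in such places is not stated in this report.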
Improvement of the code
After we had working versions of the code there were still some problems to be solved. The
most important problem was that the Diesel code used an old format for the integral files it
obtained from GAMESS-UK. The Diesel code has been changed to accept the new integral
files.
Another problem was the use of 4-byte integers on 64-bit machines. The integer size is now
defined in a header file. In this way, the program can also be made compatible with
GAMESS-UK versions with 8-byte integers.
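A minimal sketch of how such a header could look is given below. The macro name FORTRAN_INT_BYTES, the type name FortranInt and the helper read_fortran_int are assumptions made for this illustration; the report does not state the names used in the actual Diesel header.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical configuration macro: 4 for builds with default Fortran
// integers, 8 for builds with 8-byte integers. It would normally be set by
// the build system rather than in the source.
#ifndef FORTRAN_INT_BYTES
#define FORTRAN_INT_BYTES 4
#endif

#if FORTRAN_INT_BYTES == 8
typedef std::int64_t FortranInt;
#else
typedef std::int32_t FortranInt;
#endif

// Fail at compile time if the typedef and the macro ever disagree.
static_assert(sizeof(FortranInt) == FORTRAN_INT_BYTES,
              "FortranInt does not have the configured width");

// Read one Fortran integer from a raw record buffer. The number of bytes
// consumed follows the configured integer size automatically.
FortranInt read_fortran_int(const unsigned char* bytes) {
    FortranInt value;
    std::memcpy(&value, bytes, sizeof value);
    return value;
}
```

With the width fixed in one place, code that exchanges integers with the Fortran side only has to use FortranInt instead of int or long.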
It was found that the Diesel code uses a lot of memory to store the integrals. The memory
requirements could, however, be halved by removing an extraneous copy operation in the
Fortran code of Diesel. Now the memory allocated in the C++ part is used directly in the
Fortran part.
Another problem was found in the make system. The original code created a special directory
for include files, into which all include files were linked. This happened every time make was
run. Because of this, all code was recompiled at every build of Diesel, wasting a lot of
compilation time when only a few files had changed. Now all files are included from the
directory in which they reside. The only problem this causes is that the list of directories
to search for include files has become very large.
Finally the code has been run using the Valgrind[7] tool to check for memory leaks and
addressing errors. This resulted in a lot of fixes to remove memory leaks where memory was
not freed. There were also some uses of uninitialised variables that have been fixed.
Integration into GAMESS-UK
The final goal of the work on the Diesel code was to integrate it into GAMESS-UK. Because
most of the Diesel code is written in C++, it is not easy to integrate directly with the
Fortran code of GAMESS. Since this would change the GAMESS-UK structure drastically, we
opted to integrate the code in a different way.
The Diesel code has been put in a separate subdirectory in the GAMESS-UK repository. We
have added a special diesel option to the GAMESS build system. This option will perform a
make within the Diesel directory. The options for Diesel are now specified in the GAMESS
configuration files, and stored at compilation time in a Diesel configuration file. This
allows for easy configuration and compilation of the Diesel programs.
The Diesel make system made use of absolute paths. This has been abandoned in favour of
relative paths, mainly because GAMESS makes use of relative paths as well. In order to
achieve this, all include files are now explicitly specified in the code using the full relative path.
This has reduced the number of search paths in the makefiles tremendously.
Finally, Diesel now makes use of the GAMESS definitions of the BLAS and LAPACK libraries.
When necessary the GAMESS BLAS and LAPACK code is put into a special archive file that is
linked with Diesel.
A diesel option has been added to the GAMESS input options as well. All lines between the
keywords “diesel” and “dieselend” are now treated as input for Diesel programs. This input is
written to a special file. When GAMESS decides that it is time to run the CI code, this input
file is parsed again. The first line of this input consists of the name of the Diesel module,
followed by lines with input. The input is finished by the keyword ‘end’; after this ‘end’
keyword, input for another Diesel program may be provided in the same way. After the input of
a Diesel program has been read, it is copied into another file, and the selected Diesel
program is called through a system call. While this may seem a rather superficial way of
integrating Diesel into GAMESS, it is still useful because only one input file is needed. Of
course anyone inclined to do so may still use the Diesel programs separately from GAMESS.
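The splitting of the Diesel input into one block per program can be sketched as follows. This is an illustration of the scheme described above, not the routine used in GAMESS-UK; the names DieselJob, trim and split_diesel_input are hypothetical, and the sketch assumes that the lines between “diesel” and “dieselend” have already been extracted.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One Diesel program invocation: the module name and its input lines.
struct DieselJob {
    std::string module;
    std::vector<std::string> lines;
};

// Strip leading and trailing white space from a line.
std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const std::string::size_type b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    const std::string::size_type e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Split the text found between "diesel" and "dieselend" into jobs.
// The first non-empty line of a job names the module; the job is closed
// by a line containing only "end".
std::vector<DieselJob> split_diesel_input(std::istream& in) {
    std::vector<DieselJob> jobs;
    DieselJob current;
    bool in_job = false;
    std::string line;
    while (std::getline(in, line)) {
        const std::string text = trim(line);
        if (text.empty()) continue;
        if (!in_job) {
            current = DieselJob();
            current.module = text;
            in_job = true;
        } else if (text == "end") {
            jobs.push_back(current);
            in_job = false;
        } else {
            current.lines.push_back(text);
        }
    }
    return jobs;  // a trailing job without a closing "end" is dropped
}
```

Each resulting job would then be written to its own input file, after which the program named in module is started, for instance through std::system. The function can be tried on a std::istringstream holding a few input lines.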
Open ends
Of course there is still more work to do on Diesel. The code can for example be ported to
more platforms than are currently supported. The code has been shown to run on IBM Power
systems, but currently some of the changes needed on IBM systems break the code for the Intel
compilers.
Another problem is that only part of the code, the diagonalisation, is parallel. Another often
used, and possibly time-consuming, part of the program, namely the selection, is currently not
able to run in parallel. In principle parallelising this part should not be very difficult.
Other goals
Because the integration of the Diesel code into GAMESS-UK took a lot of time, mainly due to
the sheer size of the code (more than 440 files and more than 90000 lines of code) and the
amount of work involved in bringing it all up to date, we did not have time left to improve
the 4-index transformation. Delivering a useful Diesel code integrated into GAMESS-UK was
deemed more important.
Benchmark results and scalability
Scaling of the code
In order to test the efficiency of the parallel Diesel code we performed some benchmarks on
the SGI Altix system (Aster) at SARA. The system contains 1.3 GHz Intel Itanium II
processors with 3 MB cache each. The system contains a total of 416 processors, but has
been divided into smaller partitions, the largest being 128 processors. The amount of
memory is 2 GB per processor.
We performed the benchmark using a selected multi-reference CI on the
3,5,7,9-tetraphenylhexaazaacridine molecule [8]. The calculation was done using a 6-31G**
basis set at an optimised geometry. The CI was done using an automatic selection of
reference configurations. The resulting timings for the calculation are shown in Table I.
Table I – Results for scaling benchmark on Aster
Number of processors    Wallclock time (s)
1                       1951.198
2                       999.234
4                       527.745
8                       277.41
16                      161.471
32                      111.985
64                      132.097
The resulting speedup of the calculation with respect to the run on a single processor is
shown in Figure 1.
[Figure 1: plot titled "Speedup of Diesel"; horizontal axis: number of processors (0 to 70);
vertical axis: speedup (0 to 20)]
Figure 1 – Scaling of diag on Aster. The speedup of the calculation with respect to running
on a single processor is shown.
As can be seen from the table and the figure the parallel speedup is not very impressive,
although the program can be run efficiently with up to 32 processors. Note that the calculation
used is not very large and the wallclock time using 32 processors is just under two minutes.
We think that a larger diagonalisation will result in better scaling.
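The speedup and the parallel efficiency follow directly from the wallclock times in Table I. The two helper functions below are a small sketch written for this report; their names are ours.

```cpp
#include <cassert>

// Speedup with respect to the single-processor run.
double speedup(double t_one, double t_p) {
    assert(t_p > 0.0);
    return t_one / t_p;
}

// Parallel efficiency: the speedup divided by the number of processors.
double efficiency(double t_one, double t_p, int processors) {
    assert(processors > 0);
    return speedup(t_one, t_p) / processors;
}
```

With the times of Table I, speedup(1951.198, 111.985) gives about 17.4 on 32 processors, an efficiency of about 0.54. On 64 processors the wallclock time rises to 132.097 seconds, so the speedup drops to about 14.8.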
Another point worth noting is that only the diagonalisation is run in parallel. The generation of
the integral files, and the selection have been run serially. The generation of the integral files
took 2211.189 seconds, and the selection took 360.442 seconds. When these numbers are
taken into account as well the scaling becomes much poorer.
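The effect of the serial steps can be quantified with a simple Amdahl-type estimate, in which the serial time is added to both the single-processor and the parallel wallclock time. The function names below are ours, and the sketch assumes that the serial steps take the same time in every run.

```cpp
// Overall speedup when a serial part of t_serial seconds precedes a
// parallel part that takes t_par_one seconds on one processor and
// t_par_p seconds on p processors.
double total_speedup(double t_serial, double t_par_one, double t_par_p) {
    return (t_serial + t_par_one) / (t_serial + t_par_p);
}

// Upper limit of the overall speedup, reached when the parallel part
// takes no time at all.
double speedup_limit(double t_serial, double t_par_one) {
    return (t_serial + t_par_one) / t_serial;
}
```

The serial steps take 2211.189 + 360.442 = 2571.631 seconds. Using the single-processor and 32-processor diagonalisation times of 1951.198 and 111.985 seconds, total_speedup gives roughly 1.7 for the complete calculation, and speedup_limit shows that it cannot exceed about 1.76 as long as the integral generation and the selection remain serial.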
Conclusion
We have successfully integrated the Diesel direct selecting multi-reference CI code into
GAMESS-UK. In this way a parallel version of such a code has been introduced into
GAMESS-UK. The scaling of the code is not extremely good, but it allows calculations using
up to 32 processors. Unfortunately only part of the code has been parallelised. In principle a
parallelisation of the selection part of the code should be feasible as well. The integral
transformations can in principle already be done in parallel, but that code does not scale
very well. Here some further work remains.
References
[1] V. R. Saunders, J. H. van Lenthe, Mol. Phys. 48 (1983) 923
[2] GAMESS-UK (2004) is a package of ab initio programs written by M.F. Guest, J.H. van
Lenthe, J. Kendrick, K. Schoffel, and P. Sherwood, with contributions from R.D. Amos,
R.J. Buenker, H.J.J. van Dam, M. Dupuis, N.C. Handy, I.H. Hillier, P.J. Knowles, V.
Bonacic-Koutecky, W. von Niessen, R.J. Harrison, A.P. Rendell, V.R. Saunders, A.J.
Stone, D.J. Tozer, and A.H. de Vries. The package is derived from the original
GAMESS code due to M. Dupuis, D. Spangler and J. Wendoloski, NRCC Software
Catalog, Vol. 1, Program No. QG01 (GAMESS), 1980.
http://www.dl.ac.uk/CFS/docs/gamess6\_manual/
[3] B. Engels, V. Pless, H.U. Sutter, Direct MRDCI, University of Bonn, Germany.
[4] M. Hanrath, B. Engels, Chem. Phys. 225 (1997) 197.
[5] M. Hanrath, “Ein individuell selektierendes, intern-extern-separiertes
Multireferenz-Konfigurationswechselwirkungsverfahren”, Ph.D. Thesis, University of Bonn,
1999 (see http://www.tc.unikoeln.de/people/hanrath for more information)
[6] V. Pleß, “Ein direktes, individuell selektierendes
Multireferenz-Konfigurationswechselwirkungsverfahren”, Ph.D. Thesis, University of Bonn, 1994
[7] http://www.valgrind.org
[8] P. Langer, A. Bodtke, N.N.R. Saleh, H. Görls, P Schreiner, Angew. Chem. Int. Ed. 44 (2005) 25
Appendix A: User information
Compiling the code
The code has been compiled on the platforms described earlier (Intel Xeon architecture
running Linux, Intel Itanium architecture running Linux, and PowerPC G5 architecture running
Mac OS X).
When compiling under GAMESS-UK a special configuration file is used on each architecture.
We have added special Diesel build options to some of the configuration files. The options
have only been added to files for architectures on which we are rather certain that Diesel can
be built. For the other architectures not enough information is available yet. The existing
configuration files can be used as templates for the changes needed, however.
Currently compilation of Diesel only works when started from the m4 subdirectory of
GAMESS. This is something that should be fixed. Starting the GAMESS compilation from the m4
directory is the usual procedure, however.
Flex library
The Diesel code makes use of the flex library for input parsing. We have run into some
problems with this library, because some versions of it are not compatible with the current
C++ standard. We have worked around this by using a patched version from RedHat on
systems that did not supply a compatible version of the flex library.
GAMESS configuration files
In the directory config one can find configuration files for a number of architectures on
which GAMESS-UK can be built. To a number of these files the options for building Diesel have
been added. We have only added the options for architectures on which we are fairly sure that
Diesel will build in its current state. Here is an example of the definitions that have been
added to the files. Using the example one can easily add the options to other files as well.
There is also room for further improvement in e.g. compiler flags.
Here the C++ compiler is defined:
CXX = icc
Here the C++ compiler flags are defined:
##if debug#
CXXFLAGS = ${CFLAGSTMP} -g -c
##else#
CXXFLAGS = ${CFLAGSTMP} -c -O2
##endif#
The Diesel makefiles need some extra information. This information is copied to Makefile.conf
and config.h in the Diesel directory:
# Diesel build options
LD_DIESEL = ${CXX}
##if mkl#
DIESEL_LIBS = ${LBLAS} ${lBLAS} -lifcore
##elseif goto#
DIESEL_LIBS = ${LBLAS} ${lBLAS} -lifcore
##else#
DIESEL_LIBS = -lifcore
##endif#
FLEX = /usr/bin/flex
__UNDERBAR = 1
SIZEOF_VOID_P = 4
LONG_LONG_INT = long long int
LONG_INT = long int
INT = int
SHORT_INT = short int
Running the code
Currently there is not much documentation on Diesel available. The most complete
description of the program can be found in the thesis of Michael Hanrath [5], which is
available on the web. For most people it will be unfortunate that this thesis has been written
in German.
Running the code from within GAMESS-UK means specifying “runtype ci” and CI type
“diesel”. GAMESS expects to find a keyword “diesel”, followed by a keyword “dieselend”.
Anything in between is supposed to be input for one or more Diesel subprograms.
The input for a Diesel subprogram consists of the name of the subprogram followed by its
input. The input for a subprogram is closed with “end”. Multiple Diesel subprograms can be
specified in one run of GAMESS-UK.
An example input may look like:
title
h2co 321g closed shell SCF
zmatrix angstrom
c
o 1 1.203
h 1 1.099 2 121.8
h 1 1.099 2 121.8 3 180.0
end
runtype ci
diesel
sel
NumberOfElectrons = 16
MOIntegralFileFormat = New
ExcitationLevel = 2
Multiplicity = 1
IrRep = 0
SelectionThresholds = { 1.0e3 1.0e9 }
Roots = { 1 2 }
RefConfs = {
0 15 13 1718
0 15 1314 17
2 1314 15 1718
2 1819 15 13 17