Use of distributed FFT for writing fully distributed N-body code for cosmological applications. Supervisors: Dr. S. Sanyal, IIIT Allahabad and Dr. J. S. Bagla, HRI Allahabad. Kalpana Roy, R200513.


Page 1: Writing distributed N-body code using distributed FFT - 1

Use of distributed FFT for writing fully distributed N-body code for cosmological applications

Supervisors: Dr. S. Sanyal, IIIT Allahabad and Dr. J. S. Bagla, HRI Allahabad

Kalpana Roy
R200513

Page 2

Motivation

• The classical N-body problem simulates the evolution of a system of N bodies, where the force exerted on each body arises from its interaction with all the other bodies in the system. It is used in cosmology to study structure-formation processes such as the dynamical evolution of star clusters under the influence of physical forces.

• Given the initial conditions of the bodies, i.e. their initial masses, positions and velocities, an N-body code calculates their current positions and motions by evaluating the intermediate values over timesteps and updating them.

• The particle-particle interactions require on the order of N² calculations, which is prohibitively expensive in practice.

Page 3

• Hence the need for optimisation: Fast Fourier Transforms are used, which reduce the time required for the calculation to the order of N log N.
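The N² → N log N reduction comes from the divide-and-conquer structure of the FFT. As a language-neutral illustration (the project itself uses FFTW's C routines, not this code), a minimal pure-Python radix-2 Cooley-Tukey sketch compared against the direct DFT sum:

```python
import cmath

def fft(a):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two.
    Splitting into even- and odd-indexed halves at every level gives
    O(N log N) work, versus O(N^2) for the direct DFT sum below."""
    n = len(a)
    if n == 1:
        return a[:]
    even = fft(a[0::2])
    odd = fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def dft(a):
    """Direct O(N^2) DFT, for comparison."""
    n = len(a)
    return [sum(a[x] * cmath.exp(-2j * cmath.pi * k * x / n) for x in range(n))
            for k in range(n)]
```

Both functions produce the same coefficients; only the operation count differs, which is what makes mesh-based force calculation feasible for large N.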

• Even then, large volumes of data are generated, and an N-body calculation takes an excessively long time even on the fastest of computers [2].

• As a solution, the computations are done on distributed systems. The task is divided among the available processors/systems, each of which performs calculations on its local data. As the calculations occur in parallel, the time required decreases.

• Hence, using a distributed FFT to write a fully distributed N-body code provides the advantage of faster calculations at a comparatively lower cost.

Page 4

Problem Definition

• Each N-body code has two basic modules: one calculates the total force acting on each body given the configuration of particles, and the other moves the particles in this force field.

• The project deals with the calculation of the force field from the initial conditions and the movement of the particles under that force.

• The data will be decomposed, stored in the local memory of each distributed machine, and processed there.

• Then the processed local data of all the machines will be combined and the desired result will be obtained.

Page 5

[Flowchart: the N-body loop. Initial conditions are set up for the model of interest; forces are computed for the given particle positions; the particles are moved by one step; if t = t_fin the output is written to file, otherwise the loop returns to the force computation.]
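The loop on this slide can be sketched as follows. This is a schematic in Python; the function names and the simple kick-drift update are illustrative assumptions, not the project's actual routines:

```python
def run_nbody(positions, velocities, masses, dt, t_fin, compute_forces):
    """Schematic N-body driver: compute forces for the current particle
    positions, move the particles by one step, and repeat until t reaches
    t_fin, at which point the final state would be written to file."""
    t = 0.0
    while t < t_fin:
        forces = compute_forces(positions, masses)  # e.g. via the PM/FFT solver
        for i in range(len(positions)):
            # simple kick-drift update over one timestep (illustrative only)
            velocities[i] += dt * forces[i] / masses[i]
            positions[i] += dt * velocities[i]
        t += dt
    return positions, velocities  # "write output to file" step goes here
```

A real cosmological code would use a higher-order or symplectic integrator and periodic boundary conditions, but the control flow matches the flowchart: force computation, particle update, and a termination check against t_fin.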

Page 6

Technologies Used

• FFTW (the Fastest Fourier Transform in the West) is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data. The FFTW package was developed at MIT by Matteo Frigo and Steven G. Johnson. The FFTW libraries can be used for writing code in C, C++ and Fortran.

• It is used here for solving the Poisson equation for the gravitational potential and for calculating the force via the Fourier transform.

• By default, both the forward and inverse Fourier transforms are done out-of-place. FFTW also provides in-place transforms, where the same array serves as both input and output.

Page 7

• The FFTW routines store multi-dimensional arrays in row-major format.

• FFTW does not normalise the data implicitly; hence, if we perform a forward transform of some data and then an inverse transform of the result, we get the original data multiplied by the size of the array.
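The missing normalisation can be demonstrated with a tiny pure-Python DFT pair. This sketch is an illustration of the convention, not FFTW code, but FFTW's unnormalised transforms behave the same way:

```python
import cmath

def dft(a, sign):
    """Unnormalised DFT (sign=-1 forward, sign=+1 inverse), mirroring the
    convention of applying no 1/N factor in either direction."""
    n = len(a)
    return [sum(a[x] * cmath.exp(sign * 2j * cmath.pi * k * x / n)
                for x in range(n))
            for k in range(n)]

data = [1.0, 2.0, 3.0, 4.0]
round_trip = dft(dft(data, -1), +1)
# forward followed by inverse returns the original data scaled by N = 4
```

To recover the original values, the user must divide the round-trip result by the array size, exactly as the project does after its FFTW forward/inverse pair.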

• FFTW also supports MPI (Message Passing Interface) operations, allowing for distributed-memory parallelism, where each CPU has its own separate memory and which can scale up to clusters of many thousands of processors. This is desirable for the project, as the data are huge and will not fit in the memory of a single processor.

• In MPI, the data are divided among a set of "processes", each of which runs in its own memory address space.
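The slab decomposition used for distributing a multi-dimensional array over MPI processes can be sketched as follows. This is a pure-Python illustration of one common contiguous-block scheme; `local_slab` is a hypothetical helper playing the role that a query like FFTW's `fftw_mpi_local_size_2d` plays in real code, and FFTW's own default blocking may differ in detail:

```python
def local_slab(n0, n_procs, rank):
    """Return (local_n0, start0): how many of the n0 rows of a distributed
    array this rank owns, and where its slab starts. Rows are dealt out in
    contiguous blocks, with the first (n0 % n_procs) ranks getting one extra
    row so the slabs cover the array exactly."""
    base, extra = divmod(n0, n_procs)
    local_n0 = base + (1 if rank < extra else 0)
    start0 = rank * base + min(rank, extra)
    return local_n0, start0
```

For example, splitting 10 rows over 4 ranks yields slabs of 3, 3, 2, 2 rows starting at rows 0, 3, 6, 8: each process then allocates and transforms only its own slab.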

Page 8

• PMFAST is a particle-mesh N-body code, written in Fortran 90 and aimed at large-scale-structure cosmological simulations [5].

• It offers support for distributed-memory systems through MPI, as well as a parallel initial-condition generator.

Page 9

Plan of Work

The project comprises writing an N-body code that takes the input conditions, solves the potential equation in k-space, calculates the force, and simulates over timesteps, calculating the intermediate positions and other attributes. As the major task here is the solution of the equation in k-space using the Fourier transform, the following steps are followed:

• The force and the gravitational potential are related to each other as F = −∇Φ.

• Finding the potential Φ is easy because of the Poisson equation, ∇²Φ = 4πGρ, where G is Newton's constant and ρ is the density (the number of particles at the mesh points).

Page 10

• It is trivial to solve for Φ by using the fast Fourier transform to go to the frequency domain, where the Poisson equation has the simple form −k² Φ(k) = 4πG ρ(k), i.e. Φ(k) = −4πG ρ(k)/k².

• The gravitational field can now be found by multiplying Φ(k) by −ik (since F = −∇Φ) and computing the inverse Fourier transform.

• The first step of the project was taking 1-dimensional real data values and calculating the error obtained by using FFTW for a forward and then a subsequent inverse transform, followed by normalisation.
– g(x) = exp(−(x − N/2)²/(2σ²)), x ranging from 1 to N
– ∂²g/∂x² = ((x − N/2)²/σ² − 1) · g(x)/σ² = f(x), say
– f(x) → F(k) [forward Fourier transform]
– F(k)/(−k²) → g(x) [inverse Fourier transform]
  where k² = kx² + ky² + kz² for 3-dimensional data
– in the current 1-d case, k² = kx²
– kx = 2π/N · i, for i ≤ N/2
–    = 2π/N · (N − i), for i > N/2

Page 11

• Calculated the dependence of the error on the values of σ and N, with Error = Σ(i=1 to N) (g_obtained(i) − g(i))²/g(i)².
– Error(N = 256) = 0.077926
– Error(N = 512) = 0.043631
– Error(N = 1024) = 0.0264835, keeping σ = 5 constant.

Page 12

– Error(σ = 5) = 0.0264835
– Error(σ = 10) = 0.043631
– Error(σ = 15) = 0.0607785, keeping N = 1024 constant.

Hence it is deduced that the error value increases with increasing σ but decreases as N increases.

• Performed multi-dimensional fast Fourier transforms of real and complex data. In this case the real part of the complex data was set equal to the real data and the imaginary part was left at zero, so that both the real and the complex transforms were done on the same data.

Page 13

[Figures: 2-d complex transform (above) and real transform (below)]

Page 14

• After the successful completion of out-of-place transforms, in-place transforms were done, as they are useful in the project.

• The next step is to perform the in-place transforms using distributed-memory parallelism.

The afore-mentioned work was completed before mid-semester.

• The work to be done now is to run the same MPI programs with very large N values on a 32-node cluster, each node having 16 GB RAM and a quad-core processor. The task will be to plot time against the number of processes for a particular N value and find the optimal number of processes for which the execution time is minimised.

Page 15

• The next step is to store the data required by each process in the local memory of the process itself and then repeat the above. This will reduce the storage requirements, and the data size can now be extremely large, as it will not depend on the storage of one processor alone.

• After the optimisation of the Fourier-transform functions, a particle-mesh-based N-body code, PMFAST, will be used, and the force computations will be done using the developed distributed-memory Fourier-transform codes.

• With the help of the force computations, the particles will be moved accordingly, and subsequent calculations will be done iteratively over timesteps to obtain the final attributes of the particles.

Page 16

References

1. J. S. Bagla, 2001, Cosmological N-Body Simulations, Resource Summary, Khagol 48, 5
2. J. S. Bagla, Cosmological N-Body Simulations: Gravitational Clustering in an Expanding Universe - http://www.hri.res.in/~jasjeet/thesis.html
3. FFTW - Fastest Fourier Transform in the West - http://www.fftw.org
4. The Message Passing Interface (MPI) standard - http://www-unix.mcs.anl.gov/mpi/
5. PMFAST - http://www.cita.utoronto.ca/~merz/pmfast/
6. Wikipedia - the online encyclopedia