Parallel Computing Project (OpenMP using Linux for a Parallel Application)
Summer 2008 Group Project
Instructor: Prof. Nagi Mekhiel
August 12th, 2008
Ravi Illapani, Kyunghee Ko, Lixiang Zhang
OpenMP Parallel Computing Solution Stack
Recall Basic Idea of OpenMP
The program generated by the compiler is executed by multiple threads
One thread per processor or core
Each thread performs part of the work
Parallel parts executed by multiple threads
Sequential parts executed by single thread
Dependences in parallel parts require synchronization between threads
Recall Basic Idea: How OpenMP Works
User must decide what is parallel in the program
Makes any changes needed to the original source code
E.g. to remove any dependences in parts that should run in parallel
User inserts directives telling the compiler how statements are to be executed
What parts of the program are parallel
How to assign code in parallel regions to threads
Specifies data-sharing attributes: shared, private, threadprivate...
How The User Interacts with Compiler
Compiler generates explicit threaded code
Shields user from many details of the multithreaded code
Compiler figures out the details of the code each thread needs to execute
Compiler does not check that programmer directives are correct!!!
Programmer must be sure the required synchronization is inserted
The result is a multithreaded object program
OpenMP Compilers and Platforms
Intel C++ and Fortran compilers from Intel: Intel IA-32 and Itanium-based Linux/Windows systems
Fujitsu/Lahey Fortran, C and C++: Intel Linux systems, Fujitsu Solaris systems
HP Fortran/C/C++: HP-UX PA-RISC/Itanium, HP Tru64 Unix
IBM XL Fortran and C from IBM: IBM AIX systems
Guide Fortran and C/C++ from Intel's KAI Software Lab: Intel Linux/Windows systems
PGF77/PGF90 compilers from The Portland Group (PGI): Intel Linux/Solaris/Windows/NT systems
Freeware: Omni, OdinMP, OMPi, OpenUH...
Check information at http://www.compunity.org
Structure of a Compiler
Front End
Read in the source program, ensure that it is error-free, build the intermediate representation (IR)
Middle End
Analyze and optimize program as much as possible. “Lower” IR to machine-like form
Back End
Determine layout of program data in memory. Generate object code for the target architecture and optimize it
OpenMP Implementation
OpenMP Implementation (cont'd)
If the program is compiled sequentially
OpenMP comments and pragmas are ignored
If the code is compiled for parallel execution
Comments and/or pragmas are read, and
Drive the translation into a parallel program
Ideally, one source serves both the sequential and the parallel program (a big maintenance plus)
Usually this is accomplished by choosing a specific compiler option
OpenMP Implementation (cont'd)
Transforms OpenMP programs into multi-threaded code
Figures out the details of the work to be performed by each thread
Arranges storage for different data and performs their initializations: shared, private...
Manages threads: creates, suspends, wakes up, terminates threads
Implements thread synchronization
Implementation-Defined Issues
OpenMP leaves some issues to the implementation
Default number of threads
Default schedule and the default for schedule(runtime)
Number of threads to execute nested parallel regions
Behaviour in case of thread exhaustion
And many others....
Despite many similarities, each implementation is a little different from all others
Butterfly effect
The butterfly effect is a phrase that encapsulates the more technical notion of sensitive dependence on initial conditions in chaos theory: small variations in the initial condition of a dynamical system may produce large variations in the long-term behavior of the system.
As the butterfly effect describes, we gave the parameters a small change and got totally different results.
System Overview
The classical model assumes a magnetic pendulum that is attracted by three magnets, each with a distinct color.
The magnets are located underneath the pendulum on a circle centered at the pendulum mount point. They are strong enough to attract the pendulum so that it will not come to rest in the center position.
System Overview (cont'd)
Beeman Integration Algorithm
The formula used to compute the positions at time t + Δt is (the original slide images are lost; these are the standard Beeman updates, per reference [5]):
x(t + Δt) = x(t) + v(t)Δt + (1/6)[4a(t) − a(t − Δt)]Δt²
and this is the formula used to update the velocities:
v(t + Δt) = v(t) + (1/6)[2a(t + Δt) + 5a(t) − a(t − Δt)]Δt
Simulation results
Exp 1:
Single core vs. dual core
Performance with respect to the number of threads
Serial vs. parallel
32 tests were conducted
Exp 2:
Simulation when the number of magnets is changed
Simulation of the behavior of the pendulum
5 tests were conducted
Exp 3
In this experiment, we simulate the pendulum in a field of 2 magnets with varying values of friction and gravitation forces.
A total of 63 simulations were run.
Exp 4
In this experiment, we simulate the pendulum in a field of 3 magnets with varying values of friction and gravitation forces.
A total of 63 simulations were run.
Exp 5
In this experiment, we simulate the pendulum in a field of 8 magnets with varying values of friction and gravitation forces.
A total of 26 simulations were run.
Conclusion
Even though the hardware is available, effective programming is required to maximize code efficiency.
Complex simulations can be performed faster using a parallel architecture.
OpenMP helps!!
Simple: everybody can learn it in 11 weeks
Not so simple: don't stop learning! Keep learning it for better performance
References
[1] Michael Resch, Edgar Gabriel, Alfred Geiger (1999). An Approach for MPI Based Metacomputing. Proceedings of the 8th IEEE International Symposium on High Performance Distributed Computing, p. 17. Retrieved from the ACM website, August 2008.
http://portal.acm.org/citation.cfm?id=823264
[2] William Gropp, Ewing Lusk, Rajeev Thakur (1998). A case for using MPI's derived datatypes to improve I/O performance. Proceedings of the 1998 ACM/IEEE Conference on Supercomputing, pp. 1-10. Retrieved from the ACM website, August 2008.
http://portal.acm.org/citation.cfm?id=509059
[3] Michael Kagan (2006). Application acceleration through MPI overlap. Proceedings of the 2006 ACM/IEEE Conference on Supercomputing. Retrieved from the ACM website, August 2008.
http://portal.acm.org/citation.cfm?id=1188736
[4] Kai Shen, Hong Tang, Tao Yang (1999). Compile/run-time support for threaded MPI execution on multiprogrammed shared memory machines. ACM SIGPLAN Notices, Volume 34, Issue 8, pp. 107-118. Retrieved from the ACM website, August 2008.
http://portal.acm.org/citation.cfm?id=301114
[5] Wikipedia and web references, retrieved August 2008:
http://en.wikipedia.org/wiki/Beeman's_algorithm
http://www.bugman123.com/Fractals/Fractals.html
http://www.inf.ethz.ch/personal/muellren/pendulum/index.html#simulation
http://en.wikipedia.org/wiki/Chaos_theory
http://en.wikipedia.org/wiki/Butterfly_effect
[6] Software installation, compiler, and code references, retrieved August 2008:
http://www.openmp.org/wp/
http://www.intel.com/cd/software/products/asmo-na/eng/compilers/277618.htm
http://www.codeproject.com/KB/recipes/MagneticPendulum.aspx