Parallel Computing Project (OpenMP using Linux for a Parallel Application)
Summer 2008 Group Project
Instructor: Prof. Nagi Mekhiel
August 12th, 2008
Ravi Illapani, Kyunghee Ko, Lixiang Zhang
OpenMP Parallel Computing Solution Stack
Recall Basic Idea of OpenMP
The program generated by the compiler is executed by multiple threads
One thread per processor or core
Each thread performs part of the work
Parallel parts executed by multiple threads
Sequential parts executed by single thread
Dependences in parallel parts require synchronization between threads
Recall Basic Idea: How OpenMP Works
User must decide what is parallel in the program
Makes any changes needed to the original source code
E.g. to remove any dependences in parts that should run in parallel
User inserts directives telling the compiler how statements are to be executed
What parts of the program are parallel
How to assign code in parallel regions to threads
Specifies data-sharing attributes: shared, private, threadprivate...
How The User Interacts with Compiler
Compiler generates explicit threaded code
Shields user from many details of the multithreaded code
Compiler figures out the details of the code each thread needs to execute
Compiler does not check that programmer directives are correct!!!
Programmer must be sure the required synchronization is inserted
The result is a multithreaded object program
OpenMP Compilers and Platforms
Intel C++ and Fortran compilers from Intel: Intel IA-32 and Itanium-based Linux/Windows systems
Fujitsu/Lahey Fortran, C and C++: Intel Linux systems, Fujitsu Solaris systems
HP Fortran/C/C++: HP-UX PA-RISC/Itanium, HP Tru64 Unix
IBM XL Fortran and C from IBM: IBM AIX systems
Guide Fortran and C/C++ from Intel's KAI Software Lab: Intel Linux/Windows systems
PGF77/PGF90 compilers from The Portland Group (PGI): Intel Linux/Solaris/Windows/NT systems
Freeware: Omni, OdinMP, OMPi, OpenUH...
Check information at http://www.compunity.org
Structure of a Compiler
Front End
Read in the source program, ensure that it is error-free, build the intermediate representation (IR)
Middle End
Analyze and optimize program as much as possible. “Lower” IR to machine-like form
Back End
Determine layout of program data in memory. Generate object code for the target architecture and optimize it
OpenMP Implementation
OpenMP Implementation (cont'd)
If the program is compiled sequentially
OpenMP comments and pragmas are ignored
If the code is compiled for parallel execution
Comments and/or pragmas are read, and
Drive the translation into a parallel program
Ideally, one source serves both the sequential and the parallel program (a big maintenance plus)
Usually this is accomplished by choosing a specific compiler option
OpenMP Implementation (cont'd)
Transforms OpenMP programs into multi-threaded code
Figures out the details of the work to be performed by each thread
Arranges storage for different data and performs their initializations: shared, private...
Manages threads: creates, suspends, wakes up, terminates threads
Implements thread synchronization
Implementation-Defined Issues
OpenMP leaves some issues to the implementation
Default number of threads
Default schedule and the default for schedule(runtime)
Number of threads to execute nested parallel regions
Behaviour in case of thread exhaustion
And many others....
Despite many similarities, each implementation is a little different from all others
Butterfly effect
The butterfly effect is a phrase that encapsulates the more technical notion of sensitive dependence on initial conditions in chaos theory: small variations in the initial condition of a dynamical system may produce large variations in the long-term behavior of the system.
As the butterfly effect describes, we gave the parameters a small change and got totally different results.
System Overview
The classical model assumes a magnetic pendulum that is attracted by three magnets, each with a distinct color.
The magnets are located underneath the pendulum on a circle centered at the pendulum mount point. They are strong enough to attract the pendulum so that it will not come to rest in the center position.
System Overview (cont'd)
Beeman Integration Algorithm
The formula used to compute the positions at time t + Δt is (the original slide images are lost; these are the standard Beeman updates, per reference [5]):
x(t + Δt) = x(t) + v(t)Δt + (1/6)[4a(t) − a(t − Δt)]Δt²
and this is the formula used to update the velocities:
v(t + Δt) = v(t) + (1/6)[2a(t + Δt) + 5a(t) − a(t − Δt)]Δt
Simulation results
Exp 1:
Single core vs. dual core
Performance with respect to the number of threads
Serial vs. parallel
32 tests were conducted
Exp 2:
Simulation when the number of magnets is changed
Simulation of the behavior of the pendulum
5 tests were conducted
Exp 3
In this experiment, we simulate the pendulum in a field of 2 magnets with varying values of friction and gravitation forces.
A total of 63 simulations were run.
Exp 4
In this experiment, we simulate the pendulum in a field of 3 magnets with varying values of friction and gravitation forces.
A total of 63 simulations were run.
Exp 5
In this experiment, we simulate the pendulum in a field of 8 magnets with varying values of friction and gravitation forces.
A total of 26 simulations were run.
Conclusion
Even though the hardware is available, effective programming is required to maximize code efficiency.
Complex simulations can be performed faster using a parallel architecture.
OpenMP helps!!
Simple: everybody can learn it in 11 weeks
Not so simple: don't stop learning! Keep learning it for better performance
References
[1] Michael Resch, Edgar Gabriel, Alfred Geiger (1999). An Approach for MPI Based Metacomputing. Proceedings of the 8th IEEE International Symposium on High Performance Distributed Computing, p. 17. Retrieved from the ACM website, August 2008.
http://portal.acm.org/citation.cfm?id=823264
[2] William Gropp, Ewing Lusk, Rajeev Thakur (1998). A case for using MPI's derived datatypes to improve I/O performance. Proceedings of the 1998 ACM/IEEE Conference on Supercomputing, pp. 1-10. Retrieved from the ACM website, August 2008.
http://portal.acm.org/citation.cfm?id=509059
[3] Michael Kagan (2006). Application acceleration through MPI overlap. Proceedings of the 2006 ACM/IEEE Conference on Supercomputing. Retrieved from the ACM website, August 2008.
http://portal.acm.org/citation.cfm?id=1188736
[4] Kai Shen, Hong Tang, Tao Yang (1999). Compile/run-time support for threaded MPI execution on multiprogrammed shared memory machines. ACM SIGPLAN Notices, Volume 34, Issue 8, pp. 107-118. Retrieved from the ACM website, August 2008.
http://portal.acm.org/citation.cfm?id=301114
[5] Wikipedia and web references, retrieved August 2008:
http://en.wikipedia.org/wiki/Beeman's_algorithm
http://www.bugman123.com/Fractals/Fractals.html
http://www.inf.ethz.ch/personal/muellren/pendulum/index.html#simulation
http://en.wikipedia.org/wiki/Chaos_theory
http://en.wikipedia.org/wiki/Butterfly_effect
[6] Software installation, compiler, and code references, retrieved August 2008:
http://www.openmp.org/wp/
http://www.intel.com/cd/software/products/asmo-na/eng/compilers/277618.htm
http://www.codeproject.com/KB/recipes/MagneticPendulum.aspx