MSE Presentation 3
By Lakshmikanth Ganti
Under the Guidance of
Dr. Virgil Wallentine – Major Professor
Dr. Paul Smith – Committee Member
Dr. Mitch Neilsen – Committee Member
Outline: Introduction, Overview, Revised Artifacts, Component Design, Assessment Evaluation, Project Evaluation, User Manual, Conclusion
Overview
Goals:
To develop a parallel program for the simulation of a group of molecules using Molecular Dynamics Simulation.
To implement various parallel algorithms and compare their performance.
To produce good documentation of the design and the overall system.
Revised Artifacts: Architecture Design
The architecture design was revised to include a design description for each of the parallel programming paradigms used.
Revised Artifacts: Object Model
Component Design: Classes
Atom, Barrier, ObjBuf, EnergyWriter, ParThread, MdPar, MdConstants
LineReader, IO_Utils, Semaphore, BinarySemaphore, CountingSemaphore
Assessment Evaluation: Feature Testing
Read data from files
Read program arguments
Format values for output
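Assessment Evaluation: Functional Testing
The program was executed with different numbers of threads.
Velocities were read from a file on each run instead of being generated from a random Gaussian distribution, so that runs with different thread counts start from the same initial conditions and their results can be compared.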
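A minimal sketch of the two velocity sources, assuming a plain-text file with one `vx vy vz` triple per atom per line. The class `VelocitySource`, its methods, the `sigma` parameter and the file format are all hypothetical; the slides do not show the project's actual input format or how its `LineReader` and `IO_Utils` classes are used.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper; the real program's classes and file format are not shown in the slides.
class VelocitySource {

    // Production path (assumed): each component drawn from a Gaussian with mean 0, std. dev. sigma.
    static double[][] gaussian(int n, double sigma, Random rng) {
        double[][] v = new double[n][3];
        for (int i = 0; i < n; i++) {
            for (int d = 0; d < 3; d++) {
                v[i][d] = sigma * rng.nextGaussian();
            }
        }
        return v;
    }

    // Test path: one "vx vy vz" triple per line, blank lines skipped,
    // so every run starts from identical velocities.
    static double[][] read(BufferedReader in) throws IOException {
        List<double[]> rows = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) continue;
            String[] f = line.split("\\s+");
            if (f.length != 3) throw new IOException("expected 3 values: " + line);
            rows.add(new double[] {
                Double.parseDouble(f[0]), Double.parseDouble(f[1]), Double.parseDouble(f[2])
            });
        }
        return rows.toArray(new double[0][]);
    }

    public static void main(String[] args) throws IOException {
        String data = "0.1 -0.2 0.3\n\n1.0 0.0 -1.5\n";
        double[][] v = read(new BufferedReader(new StringReader(data)));
        System.out.println(v.length + " atoms, v[1][2] = " + v[1][2]);
    }
}
```

Reading from a file trades realism for determinism: every thread count sees exactly the same starting state, so any difference in output points to the parallel code rather than to the random input.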
Assessment Evaluation: Performance Evaluation
Initial Design
3-D grid-shaped pattern of thread creation
Message passing through bounded buffers
512 threads, each assigned one partition
No speed-up achieved
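Message passing through a bounded buffer can be sketched as a classic monitor: `put` blocks while the buffer is full and `take` blocks while it is empty. This `BoundedBuffer` is hypothetical and uses Java's built-in `wait`/`notifyAll`; the project's own `ObjBuf` and semaphore classes may implement the same idea differently.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical bounded buffer; the project's ObjBuf class may differ.
class BoundedBuffer<T> {
    private final Deque<T> items = new ArrayDeque<>(); // does not accept null items
    private final int capacity;

    BoundedBuffer(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
    }

    // Blocks while the buffer is full.
    synchronized void put(T item) throws InterruptedException {
        while (items.size() == capacity) wait();
        items.addLast(item);
        notifyAll(); // wake consumers waiting on an empty buffer
    }

    // Blocks while the buffer is empty; items come out in FIFO order.
    synchronized T take() throws InterruptedException {
        while (items.isEmpty()) wait();
        T item = items.removeFirst();
        notifyAll(); // wake producers waiting on a full buffer
        return item;
    }

    public static void main(String[] args) throws InterruptedException {
        BoundedBuffer<int[]> toNeighbour = new BoundedBuffer<>(2);
        // Producer: stands in for a thread sending boundary data to a neighbour.
        Thread producer = new Thread(() -> {
            try {
                for (int step = 0; step < 5; step++) toNeighbour.put(new int[] {step});
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        int sum = 0;
        for (int i = 0; i < 5; i++) sum += toNeighbour.take()[0];
        producer.join();
        System.out.println("received sum = " + sum); // 0+1+2+3+4 = 10
    }
}
```

Every exchange costs a lock acquisition and possibly a blocked thread, which is one plausible reason the one-partition-per-thread design, with 512 threads all exchanging messages, saw no speed-up.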
Design I
3-D grid-shaped pattern of thread creation
Message passing through bounded buffers
The number of threads can be 2x2x2 (8) or 4x4x4 (64)
A 3-D array of partitions is assigned to each thread
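A sketch of Design I's block decomposition, assuming the 512 partitions form an 8x8x8 grid (consistent with the initial design's 512 threads, one partition each) and periodic boundaries. Under these assumptions each of 8 threads owns 4x4x4 = 64 partitions, and each of 64 threads owns 2x2x2 = 8. `BlockDecomposition` and its methods are hypothetical, not the project's code.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; assumes an 8 x 8 x 8 grid of partitions and periodic boundaries.
class BlockDecomposition {
    static final int P = 8; // partitions per axis (assumed: 8^3 = 512)

    // Half-open partition ranges {x0, x1, y0, y1, z0, z1} owned by the thread
    // at grid position (tx, ty, tz) in a t x t x t thread grid.
    static int[] ownedBlock(int t, int tx, int ty, int tz) {
        if (t <= 0 || P % t != 0) throw new IllegalArgumentException("t must divide " + P);
        int b = P / t; // partitions per thread along each axis
        return new int[] {tx * b, (tx + 1) * b, ty * b, (ty + 1) * b, tz * b, (tz + 1) * b};
    }

    // Linear thread id; coordinates wrap around (periodic boundaries assumed).
    static int threadId(int t, int tx, int ty, int tz) {
        return (Math.floorMod(tz, t) * t + Math.floorMod(ty, t)) * t + Math.floorMod(tx, t);
    }

    // Number of distinct other threads whose blocks touch this one (faces, edges, corners).
    static int neighbourCount(int t, int tx, int ty, int tz) {
        Set<Integer> ids = new HashSet<>();
        for (int dz = -1; dz <= 1; dz++)
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++)
                    ids.add(threadId(t, tx + dx, ty + dy, tz + dz));
        ids.remove(threadId(t, tx, ty, tz)); // exclude the thread itself
        return ids.size();
    }

    public static void main(String[] args) {
        for (int t : new int[] {2, 4}) {
            int[] b = ownedBlock(t, 0, 0, 0);
            int owned = (b[1] - b[0]) * (b[3] - b[2]) * (b[5] - b[4]);
            System.out.println(t * t * t + " threads: " + owned + " partitions each, "
                + neighbourCount(t, 0, 0, 0) + " neighbouring threads");
        }
    }
}
```

Under these assumptions, a thread in the 4x4x4 grid borders 26 other threads, so each time step can involve many buffer exchanges per thread.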
Design I, Fine Grained
Number of Threads   Time Taken   Speed-up   Efficiency (%)
1                   179625       --         --
8                   174393       1.03       25.75
64                  216415       0.83       20.75
Design I, Coarse Grained
Number of Threads   Time Taken   Speed-up   Efficiency (%)
1                   1726549      --         --
8                   1676261      1.03       25.75
64                  2105547      0.82       20.5
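The Speed-up column is T(1) / T(n). The Efficiency column in every table matches the rounded speed-up divided by 4, times 100 (for example, Design II fine grained: 1.62 / 4 × 100 = 40.5), which suggests efficiency was measured against 4 processors. That divisor is inferred from the tables, not stated in the slides. This sketch recomputes the Design I fine-grained rows under that assumption; `SpeedupTable` is hypothetical.

```java
// Assumption: Efficiency (%) = rounded speed-up / PROCESSORS x 100, with
// PROCESSORS = 4 inferred from the tables (not stated in the slides).
class SpeedupTable {
    static final int PROCESSORS = 4;

    // Speed-up relative to the single-thread run: T(1) / T(n).
    static double speedup(long t1, long tn) {
        return (double) t1 / tn;
    }

    // The tables report speed-up to two decimals and derive efficiency from that value.
    static double round2(double x) {
        return Math.round(x * 100.0) / 100.0;
    }

    static double efficiencyPercent(double speedup) {
        return speedup / PROCESSORS * 100.0;
    }

    public static void main(String[] args) {
        long t1 = 179625;                 // Design I, fine grained, 1 thread
        int[] threads = {8, 64};
        long[] times = {174393, 216415};
        for (int i = 0; i < threads.length; i++) {
            double s = round2(speedup(t1, times[i]));
            System.out.printf("%d threads: speed-up %.2f, efficiency %.2f%%%n",
                threads[i], s, efficiencyPercent(s));
        }
    }
}
```

Dividing by the processor count rather than the thread count keeps efficiency meaningful when there are more threads than processors, as with 64 threads here.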
Figure: Design I, speed-up vs. number of threads (fine grained and coarse grained)
Performance Evaluation: Design II
Vertical pipeline-shaped pattern of thread creation
Message passing through bounded buffers
Layers of partitions, rather than a 3-D array of partitions, are assigned to each thread
The number of threads created can be 1, 2, 4 or 8
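A sketch of Design II's layer assignment, assuming 8 layers of partitions along the pipeline axis and periodic wrap-around between the top and bottom threads. Under that assumption the thread count must divide 8, which is consistent with the allowed counts of 1, 2, 4 and 8. Each thread then exchanges boundary layers with at most two neighbouring threads, whereas a 3-D block decomposition can have up to 26. `SlabDecomposition` is hypothetical.

```java
// Hypothetical sketch of a layer ("slab") assignment.
// Assumes 8 layers of partitions along the pipeline axis and periodic wrap-around.
class SlabDecomposition {
    static final int LAYERS = 8; // assumed

    // Half-open range [first, last) of layers owned by thread `id` out of `n`.
    static int[] ownedLayers(int n, int id) {
        if (n <= 0 || LAYERS % n != 0) throw new IllegalArgumentException("n must divide " + LAYERS);
        if (id < 0 || id >= n) throw new IllegalArgumentException("bad thread id " + id);
        int per = LAYERS / n;
        return new int[] {id * per, (id + 1) * per};
    }

    // Pipeline neighbours: the thread below and the thread above (wrapping around).
    // With 1 or 2 threads these may coincide, so there are at most 2 distinct neighbours.
    static int[] neighbours(int n, int id) {
        return new int[] {Math.floorMod(id - 1, n), Math.floorMod(id + 1, n)};
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 2, 4, 8}) {
            int[] r = ownedLayers(n, 0);
            System.out.println(n + " threads: thread 0 owns layers " + r[0] + ".." + (r[1] - 1));
        }
    }
}
```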
Design II, Fine Grained
Number of Threads   Time Taken   Speed-up   Efficiency (%)
1                   170062       --         --
2                   158936       1.07       26.75
4                   104796       1.62       40.5
8                   126911       1.34       33.5
Design II, Coarse Grained
Number of Threads   Time Taken   Speed-up   Efficiency (%)
1                   1699526      --         --
2                   1559198      1.09       27.25
4                   982384       1.73       43.25
8                   1196849      1.42       35.5
Figure: Design II, speed-up vs. number of threads (fine grained and coarse grained)
Performance Evaluation: Final Design
Vertical pipeline-shaped pattern of thread creation
Synchronization by barrier; no message passing
Layers of partitions, rather than a 3-D array of partitions, are assigned to each thread
The number of threads created can be 1, 2, 4 or 8
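The final design's structure (shared memory, no messages, a barrier between steps) can be sketched as follows. A 1-D three-point smoothing stencil stands in for the molecular dynamics force and position updates, and the JDK's `java.util.concurrent.CyclicBarrier` stands in for the project's own `Barrier` class, whose interface the slides do not show. Each thread updates its own contiguous slab, reading neighbouring values directly from shared arrays; the barrier action swaps the read and write buffers once every slab is done. `BarrierStepper` is hypothetical.

```java
import java.util.Arrays;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

// Hypothetical sketch: a 1-D smoothing stencil stands in for the MD force and
// position updates; CyclicBarrier stands in for the project's Barrier class.
class BarrierStepper {
    private final int nThreads;
    private double[] cur;  // values read during a step, shared by all threads
    private double[] next; // values written during a step
    private final CyclicBarrier barrier;

    BarrierStepper(double[] init, int nThreads) {
        this.nThreads = nThreads;
        this.cur = init.clone();
        this.next = new double[init.length];
        // The barrier action runs once per step, after all threads arrive and
        // before any is released, so every thread sees the swapped buffers.
        this.barrier = new CyclicBarrier(nThreads, () -> {
            double[] t = cur; cur = next; next = t;
        });
    }

    // Same update rule for the parallel and the sequential version (periodic ends).
    static double smooth(double[] c, int i) {
        int n = c.length;
        return (c[Math.floorMod(i - 1, n)] + c[i] + c[(i + 1) % n]) / 3.0;
    }

    double[] run(int steps) throws InterruptedException {
        int n = cur.length;
        Thread[] workers = new Thread[nThreads];
        for (int w = 0; w < nThreads; w++) {
            final int lo = w * n / nThreads, hi = (w + 1) * n / nThreads; // this thread's slab
            workers[w] = new Thread(() -> {
                try {
                    for (int s = 0; s < steps; s++) {
                        double[] c = cur, nx = next;
                        // Neighbouring slabs are read straight from shared memory:
                        // no messages are passed between threads.
                        for (int i = lo; i < hi; i++) nx[i] = smooth(c, i);
                        barrier.await(); // wait until every slab has been updated
                    }
                } catch (InterruptedException | BrokenBarrierException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[w].start();
        }
        for (Thread t : workers) t.join();
        return cur.clone();
    }

    static double[] sequential(double[] init, int steps) {
        double[] c = init.clone(), nx = new double[init.length];
        for (int s = 0; s < steps; s++) {
            for (int i = 0; i < c.length; i++) nx[i] = smooth(c, i);
            double[] t = c; c = nx; nx = t;
        }
        return c;
    }

    public static void main(String[] args) throws InterruptedException {
        double[] init = new double[16];
        init[0] = 48.0; // a single spike that spreads out over the steps
        double[] par = new BarrierStepper(init, 4).run(10);
        System.out.println("matches sequential: " + Arrays.equals(par, sequential(init, 10)));
    }
}
```

Because every element is computed with the same expression as in `sequential`, the parallel result is bit-for-bit identical to the sequential one for any thread count, which gives a convenient functional check. Per the `CyclicBarrier` documentation, writes made before `await()` are visible to all threads after the barrier trips, so no extra locking is needed between steps.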
Final Design, Fine Grained
Number of Threads   Time Taken   Speed-up   Efficiency (%)
1                   162653       --         --
2                   137841       1.18       29.5
4                   62800        2.59       64.75
8                   66935        2.43       60.75
Final Design, Coarse Grained
Number of Threads   Time Taken   Speed-up   Efficiency (%)
1                   1684963      --         --
2                   1306172      1.29       32.25
4                   591215       2.85       71.25
8                   640670       2.63       65.75
Figure: Final Design, speed-up vs. number of threads (fine grained and coarse grained)
Project Evaluation: Problems Encountered
JPF
Debugging parallel programs
Limited processing power of available systems
Project Evaluation: Accuracy of Estimates
Estimated duration of the project: ~8 months
Actual duration of the project: ~7 months
Version      Estimated LOC   Actual LOC
Sequential   1435            504
Parallel     1545            1271
Project Evaluation: Lessons Learnt
Methodology
Reviews
User Manual
Data formats
Program usage
User commands
System configuration
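Conclusion
Several parallel algorithms were implemented, differing in (1) the synchronization mechanism, (2) the pattern of thread creation and (3) granularity.
These implementations were compared for speed-up and efficiency; the final design (vertical pipeline with barrier synchronization) performed best, reaching a speed-up of 2.85 with 4 threads (coarse grained).
Documentation of the design and the overall system was produced.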
Questions/Comments