Science Advisory Committee Meeting 19, September 4, 2009 • Stanford University
04-Parallel Processing
Parallel Processing
Majid AlMeshari
John W. Conklin
Outline
• The Challenge
• Available Parallelization Resources
• Requirements
• Our Approach
• Status of Parallelization
• Plan & Next Steps
The Challenge
1. An Extended Kalman Filter with sigma-point technology is used as our nonlinear estimation filter
   – Running it on a 2-second time step is computationally intensive (takes up to 5 days)
2. The data set is huge, ~1 Terabyte
   – RAM may not be sufficient to carry out the computation tasks
   – Swapping to disk delays processing
In a nutshell: a computationally intensive algorithm is applied to a huge data set, and a single test case takes several days, which delays verifying the correctness of the algorithm.
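To see why the 2-second step is expensive: the filter must update at every step, so even a single year of data implies tens of millions of updates. A rough count (Python; one year of continuous data is an illustrative assumption, not the actual mission span):

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s

# Number of 2-second filter steps in one illustrative year of data
steps = SECONDS_PER_YEAR // 2
print(steps)  # 15768000 filter updates
```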
Available Parallelization Resources
• Hardware
  – 2 computer clusters (Regelation & Nivation)
    » Regelation: 44 64-bit CPUs
      • 64-bit addressing lets us use a memory space beyond 4 GB
      • We use this cluster because (a) we will likely not need more than 44 processors and (b) it is the same platform as our Linux boxes
    » Nivation: 150 32-bit CPUs
    » Both clusters run Linux and Matlab 2007b (same as our in-house Linux boxes)
• Software
  – MatlabMPI
    » Matlab parallel computing library created by MIT Lincoln Laboratory
    » Eliminates the need to port our code to another language
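MatlabMPI passes messages through files on a shared disk rather than over sockets. A minimal Python stand-in for that pattern (not MatlabMPI's actual API; function and file names are illustrative): the sender writes the payload, then creates a small "ready" flag file last, so a receiver that sees the flag knows the payload file is complete.

```python
import os, pickle, time, tempfile

def send(comm_dir, tag, payload):
    """Write the message to shared disk, then signal readiness via a flag file."""
    with open(os.path.join(comm_dir, f"msg_{tag}.pkl"), "wb") as f:
        pickle.dump(payload, f)
    # Create the flag only after the payload is fully written
    open(os.path.join(comm_dir, f"msg_{tag}.ready"), "w").close()

def recv(comm_dir, tag, poll=0.01):
    """Poll for the flag file, then read the completed message."""
    flag = os.path.join(comm_dir, f"msg_{tag}.ready")
    while not os.path.exists(flag):
        time.sleep(poll)
    with open(os.path.join(comm_dir, f"msg_{tag}.pkl"), "rb") as f:
        return pickle.load(f)
```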
Challenges of Parallelization in GP-B
• Our filter does not fit the embarrassingly parallel model
  – A 44x speed-up cannot be achieved
• Inter-processor communication can only be done through disk reads/writes
  – Parallelization overhead can be costly
• Simultaneous debugging on multiple CPUs
  – Bugs on remote machines are hard to find
Requirements
• Achieve a 5-10x speed-up over the serial version
  – Minimize overhead by reducing inter-processor communication time
  – Minimize the idle time each processor spends waiting for others to finish
  – Achieve reasonable scalability
    » Manage the trade-off between the number of processors and the communication & idle time
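The scalability trade-off above is the familiar Amdahl's-law effect: any serial fraction of the run (here, disk-based communication and idle time) caps the achievable speed-up no matter how many processors are added. A quick sketch with illustrative numbers (not measured GP-B fractions):

```python
def amdahl_speedup(parallel_fraction, n_procs):
    """Ideal speed-up when a fixed fraction of the work stays serial."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_procs)

# Even 95%-parallel code tops out near ~10x on 20 CPUs,
# and nowhere near 44x on 44 CPUs
print(round(amdahl_speedup(0.95, 20), 2))  # 10.26
print(round(amdahl_speedup(0.95, 44), 2))  # 13.97
```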
Serial 2-Sec Filter Code Structure

Initialization
Loop over iterations
  Loop over segments
    Loop over orbits
      Loop over gyros
        Build h(x) (the right-hand side)
        Build Jacobian, J(x)
      Load data (Z), perform fit
  Bayesian estimation, display results

The h(x), J(x), and fit steps are the computationally intensive components (the targets for parallelization).
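The same structure as a runnable skeleton (a Python stand-in for the Matlab code, with the compute steps stubbed out; loop counts and names are illustrative, and the nesting follows the reconstruction above):

```python
def run_serial_filter(n_iterations=2, segments=(5, 6), n_orbits=3, n_gyros=4):
    """Skeleton of the serial 2-sec filter: the nested loops dominate cost."""
    log = []
    for it in range(n_iterations):
        for seg in segments:
            for orbit in range(n_orbits):
                for gyro in range(n_gyros):
                    log.append(("h_and_J", it, seg, orbit, gyro))  # expensive
                log.append(("fit", it, seg, orbit))  # load data Z, perform fit
        log.append(("bayes", it))  # Bayesian estimation, display results
    return log

calls = run_serial_filter()
# 2 iters * 2 segs * 3 orbits * 4 gyros = 48 expensive h/J builds
print(sum(1 for c in calls if c[0] == "h_and_J"))  # 48
```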
Parallel Code Structure

Initialization
Loop over iterations
  Loop over segments
    Loop over orbits
      Loop over gyros
        Build h(x)
        Build Jacobian, J(x)        (in parallel, split up by columns of J(x))
    Distribute J(x) to all nodes    (custom data distribution code required: overhead)
    Loop over orbits
      Load data (Z), perform fit    (in parallel, split up by orbits)
    Send information matrices to master & combine
  Bayesian estimation, display results
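The "combine" step works because, in a least-squares fit, each orbit's data contributes additively to the normal equations: every worker can form its own J_oᵀJ_o and J_oᵀz_o independently, and the master just sums them and solves. A serial numpy sketch of that reduction (illustrative, not the GP-B code; the orbit split here is three row blocks):

```python
import numpy as np

def local_information(J_o, z_o):
    """Per-orbit contribution: information matrix and information vector."""
    return J_o.T @ J_o, J_o.T @ z_o

def combine_and_solve(contributions):
    """Master node: sum per-orbit contributions, then solve the normal equations."""
    Lam = sum(c[0] for c in contributions)
    b = sum(c[1] for c in contributions)
    return np.linalg.solve(Lam, b)

rng = np.random.default_rng(0)
J = rng.standard_normal((90, 4))            # stacked rows from 3 "orbits"
x_true = np.array([1.0, -2.0, 0.5, 3.0])
z = J @ x_true
parts = [local_information(J[i:i + 30], z[i:i + 30]) for i in range(0, 90, 30)]
x_hat = combine_and_solve(parts)
print(np.allclose(x_hat, x_true))  # True
```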
Our Approach – Sequence Diagram

(Figure: sequence diagram of the parallel approach, built up across several animation slides; not reproduced in this text.)
Parallelization Techniques
• Inter-Processor Communication
  – To minimize communication time, use MatlabMPI for one-to-one communication only
  – Use a custom approach for one-to-many communication
    » Each processor saves its message to a file, then creates a confirmation file when the message is ready
    » Upon seeing the confirmation file, the other waiting processors read the message
• Load Balancing
  – Maintain an even processing load across all processors
  – CPUn Load = …, k: {RT, Tel, Cg, S0}
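The one-to-many scheme above can be sketched as a file-based broadcast: the sender writes the message once and creates the confirmation file as its last step, so any number of waiting processors can safely read the same completed file, avoiding one MatlabMPI send per receiver. A Python stand-in (file names and polling interval are illustrative):

```python
import os, json, time, tempfile

def broadcast(comm_dir, name, payload):
    """Writer: save the message once, then signal readiness via a flag file."""
    with open(os.path.join(comm_dir, name + ".msg"), "w") as f:
        json.dump(payload, f)
    # The confirmation file is created last, so readers never see a partial message
    open(os.path.join(comm_dir, name + ".ok"), "w").close()

def wait_and_read(comm_dir, name, poll=0.01):
    """Reader (any number of processors): poll for the flag, then read."""
    while not os.path.exists(os.path.join(comm_dir, name + ".ok")):
        time.sleep(poll)
    with open(os.path.join(comm_dir, name + ".msg")) as f:
        return json.load(f)
```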
Parallel Code Timings

(Figure: stacked bar chart of time (sec), 0 to 20,000, vs. number of processors (1, 4, 8, 16, 20); components: initialization, h(x), compute J, distribute J, estimation (fit), compile info, Bayesian estimation & finalize. Run: segments 5 & 6, gyros 1, 2, 3, 4, one iteration.)

7.2x speed-up, per iteration, with 20 processors
Next Steps
• Incorporate the new changes to the model implemented by the development team
• Move the development/testing environment to the cluster instead of the in-house Linux boxes (October 2009)
• Continue working on minimizing the overhead
  – Focus on new ways to improve communication