Science Advisory Committee Meeting 19, September 4, 2009 • Stanford University
04-Parallel Processing
Parallel Processing
Majid AlMeshari
John W. Conklin
Outline
• The Challenge
• Available Parallelization Resources
• Requirements
• Our Approach
• Status of Parallelization
• Plan & Next Steps
The Challenge
1. An Extended Kalman Filter with sigma-point technology is used as our nonlinear estimation filter
   – Running it on a 2-second time step is computationally intensive (takes up to 5 days)
2. The data set is huge, ~1 Terabyte
   – RAM may not be sufficient to carry out the computation tasks
   – Swapping to disk delays processing
In a nutshell: a computationally intensive algorithm is applied to a huge data set, and a single test case takes several days, which delays verifying the correctness of the algorithm.
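To see why the 2-second step is expensive: the filter must update at every step, so even a single year of data implies tens of millions of updates. A rough count (Python; one year of continuous data is an illustrative assumption, not the actual mission span):

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s

# Number of 2-second filter steps in one illustrative year of data
steps = SECONDS_PER_YEAR // 2
print(steps)  # 15768000 filter updates
```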
Available Parallelization Resources
• Hardware
  – 2 computer clusters (Regelation & Nivation)
    » Regelation: 44 64-bit CPUs
      • 64-bit addressing lets us use a memory space beyond 4 GB
      • We use this cluster because (a) we will likely not need more than 44 processors and (b) it is the same platform as our Linux boxes
    » Nivation: 150 32-bit CPUs
    » Both clusters run Linux and Matlab 2007b (same as our in-house Linux boxes)
• Software
  – MatlabMPI
    » Matlab parallel computing library created by MIT Lincoln Laboratory
    » Eliminates the need to port our code to another language
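MatlabMPI passes messages through files on a shared disk rather than over sockets. A minimal Python stand-in for that pattern (not MatlabMPI's actual API; function and file names are illustrative): the sender writes the payload, then creates a small "ready" flag file last, so a receiver that sees the flag knows the payload file is complete.

```python
import os, pickle, time, tempfile

def send(comm_dir, tag, payload):
    """Write the message to shared disk, then signal readiness via a flag file."""
    with open(os.path.join(comm_dir, f"msg_{tag}.pkl"), "wb") as f:
        pickle.dump(payload, f)
    # Create the flag only after the payload is fully written
    open(os.path.join(comm_dir, f"msg_{tag}.ready"), "w").close()

def recv(comm_dir, tag, poll=0.01):
    """Poll for the flag file, then read the completed message."""
    flag = os.path.join(comm_dir, f"msg_{tag}.ready")
    while not os.path.exists(flag):
        time.sleep(poll)
    with open(os.path.join(comm_dir, f"msg_{tag}.pkl"), "rb") as f:
        return pickle.load(f)
```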
Challenges of Parallelization in GP-B
• Our filter does not fit the embarrassingly parallel model
  – A 44x speed-up cannot be achieved
• Inter-processor communication can only be done through disk reads/writes
  – Parallelization overhead can be costly
• Simultaneous debugging on multiple CPUs
  – Bugs on remote machines are hard to find
Requirements
• Achieve a 5-10x speed-up over the serial version
  – Minimize overhead by reducing inter-processor communication time
  – Minimize the idle time each processor spends waiting for others to finish
  – Achieve reasonable scalability
    » Manage the trade-off between the number of processors and the communication & idle time
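The scalability trade-off above is the familiar Amdahl's-law effect: any serial fraction of the run (here, disk-based communication and idle time) caps the achievable speed-up no matter how many processors are added. A quick sketch with illustrative numbers (not measured GP-B fractions):

```python
def amdahl_speedup(parallel_fraction, n_procs):
    """Ideal speed-up when a fixed fraction of the work stays serial."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_procs)

# Even 95%-parallel code tops out near ~10x on 20 CPUs,
# and nowhere near 44x on 44 CPUs
print(round(amdahl_speedup(0.95, 20), 2))  # 10.26
print(round(amdahl_speedup(0.95, 44), 2))  # 13.97
```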
Serial 2-Sec Filter Code Structure

Initialization
Loop over iterations
  Loop over segments
    Loop over orbits
      Loop over gyros
        Build h(x) (the right-hand side)
        Build Jacobian, J(x)
      Load data (Z), perform fit
  Bayesian estimation, display results

The h(x), J(x), and fit steps are the computationally intensive components (the targets for parallelization).
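The same structure as a runnable skeleton (a Python stand-in for the Matlab code, with the compute steps stubbed out; loop counts and names are illustrative, and the nesting follows the reconstruction above):

```python
def run_serial_filter(n_iterations=2, segments=(5, 6), n_orbits=3, n_gyros=4):
    """Skeleton of the serial 2-sec filter: the nested loops dominate cost."""
    log = []
    for it in range(n_iterations):
        for seg in segments:
            for orbit in range(n_orbits):
                for gyro in range(n_gyros):
                    log.append(("h_and_J", it, seg, orbit, gyro))  # expensive
                log.append(("fit", it, seg, orbit))  # load data Z, perform fit
        log.append(("bayes", it))  # Bayesian estimation, display results
    return log

calls = run_serial_filter()
# 2 iters * 2 segs * 3 orbits * 4 gyros = 48 expensive h/J builds
print(sum(1 for c in calls if c[0] == "h_and_J"))  # 48
```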
Parallel Code Structure

Initialization
Loop over iterations
  Loop over segments
    Loop over orbits
      Loop over gyros
        Build h(x)
        Build Jacobian, J(x)        (in parallel, split up by columns of J(x))
    Distribute J(x) to all nodes    (custom data distribution code required: overhead)
    Loop over orbits
      Load data (Z), perform fit    (in parallel, split up by orbits)
    Send information matrices to master & combine
  Bayesian estimation, display results
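The "combine" step works because, in a least-squares fit, each orbit's data contributes additively to the normal equations: every worker can form its own J_oᵀJ_o and J_oᵀz_o independently, and the master just sums them and solves. A serial numpy sketch of that reduction (illustrative, not the GP-B code; the orbit split here is three row blocks):

```python
import numpy as np

def local_information(J_o, z_o):
    """Per-orbit contribution: information matrix and information vector."""
    return J_o.T @ J_o, J_o.T @ z_o

def combine_and_solve(contributions):
    """Master node: sum per-orbit contributions, then solve the normal equations."""
    Lam = sum(c[0] for c in contributions)
    b = sum(c[1] for c in contributions)
    return np.linalg.solve(Lam, b)

rng = np.random.default_rng(0)
J = rng.standard_normal((90, 4))            # stacked rows from 3 "orbits"
x_true = np.array([1.0, -2.0, 0.5, 3.0])
z = J @ x_true
parts = [local_information(J[i:i + 30], z[i:i + 30]) for i in range(0, 90, 30)]
x_hat = combine_and_solve(parts)
print(np.allclose(x_hat, x_true))  # True
```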
Our Approach – Sequence Diagram

(Figure: sequence diagram of the parallel approach, built up across several animation slides; not reproduced in this text.)
Parallelization Techniques
• Inter-Processor Communication
  – To minimize communication time, use MatlabMPI for one-to-one communication only
  – Use a custom approach for one-to-many communication
    » Each processor saves its message to a file, then creates a confirmation file when the message is ready
    » Upon seeing the confirmation file, the other waiting processors read the message
• Load Balancing
  – Maintain an even processing load across all processors
  – CPUn Load = …, k: {RT, Tel, Cg, S0}
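The one-to-many scheme above can be sketched as a file-based broadcast: the sender writes the message once and creates the confirmation file as its last step, so any number of waiting processors can safely read the same completed file, avoiding one MatlabMPI send per receiver. A Python stand-in (file names and polling interval are illustrative):

```python
import os, json, time, tempfile

def broadcast(comm_dir, name, payload):
    """Writer: save the message once, then signal readiness via a flag file."""
    with open(os.path.join(comm_dir, name + ".msg"), "w") as f:
        json.dump(payload, f)
    # The confirmation file is created last, so readers never see a partial message
    open(os.path.join(comm_dir, name + ".ok"), "w").close()

def wait_and_read(comm_dir, name, poll=0.01):
    """Reader (any number of processors): poll for the flag, then read."""
    while not os.path.exists(os.path.join(comm_dir, name + ".ok")):
        time.sleep(poll)
    with open(os.path.join(comm_dir, name + ".msg")) as f:
        return json.load(f)
```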
Parallel Code Timings

(Figure: stacked bar chart of time (sec), 0 to 20,000, vs. number of processors (1, 4, 8, 16, 20); components: initialization, h(x), compute J, distribute J, estimation (fit), compile info, Bayesian estimation & finalize. Run: segments 5 & 6, gyros 1, 2, 3, 4, one iteration.)

7.2x speed-up, per iteration, with 20 processors
Next Steps
• Incorporate the new changes to the model implemented by the development team
• Move the development/testing environment to the cluster instead of the in-house Linux boxes (October 2009)
• Continue working on minimizing the overhead
  – Focus on new ways to improve communication