Upload
doankhanh
View
234
Download
2
Embed Size (px)
Citation preview
High-Performance Inverse Modelingwith Reverse Monte Carlo Simulations
Abhinav Sarje1
Xiaoye Sherry Li Alexander Hexemer1Computational Research Advanced Light Source
Lawrence Berkeley National Laboratory
09.10.14
International Conference for Parallel Processing 2014, Minneapolis
Introduction The RMC Methodology A High-Performance RMC Parallel RMC Performance Conclusions
Background and Motivation
• X-ray scattering is an important tool for scientists to probe structuralproperties of materials at nano-scales.
• SAXS, Small Angle X-ray scattering, is a widely used type.
• SAXS is primarily used for non-crystalline/amorphous materials.
• Data challenge: Currently about 25-50 TB / month of raw data fromONE beamline, and growing nearly exponentially. A Synchrotron has10s-100 beamlines.
• Beamline scientists have not had HPC resources available. Need tobridge the gap.
• Many materials are too complex to use sophisticated modelingtechniques.
• Reverse Monte Carlo simulation works well with SAXS data.
High-Performance Inverse Modeling with Reverse Monte Carlo Simulations. ICPP’14 asarje @ lbl.gov
Introduction The RMC Methodology A High-Performance RMC Parallel RMC Performance Conclusions
The General RMC Modeling Algorithm
High-Performance Inverse Modeling with Reverse Monte Carlo Simulations. ICPP’14 asarje @ lbl.gov
Introduction The RMC Methodology A High-Performance RMC Parallel RMC Performance Conclusions
Faster Updates of Fourier Transform
High-Performance Inverse Modeling with Reverse Monte Carlo Simulations. ICPP’14 asarje @ lbl.gov
Introduction The RMC Methodology A High-Performance RMC Parallel RMC Performance Conclusions
A Scaled RMC Modeling Algorithm
High-Performance Inverse Modeling with Reverse Monte Carlo Simulations. ICPP’14 asarje @ lbl.gov
Introduction The RMC Methodology A High-Performance RMC Parallel RMC Performance Conclusions
A Scaled RMC Modeling Algorithm
High-Performance Inverse Modeling with Reverse Monte Carlo Simulations. ICPP’14 asarje @ lbl.gov
Introduction The RMC Methodology A High-Performance RMC Parallel RMC Performance Conclusions
A Scaled RMC Modeling Algorithm
High-Performance Inverse Modeling with Reverse Monte Carlo Simulations. ICPP’14 asarje @ lbl.gov
Introduction The RMC Methodology A High-Performance RMC Parallel RMC Performance Conclusions
A Scaled RMC Modeling Algorithm
High-Performance Inverse Modeling with Reverse Monte Carlo Simulations. ICPP’14 asarje @ lbl.gov
Introduction The RMC Methodology A High-Performance RMC Parallel RMC Performance Conclusions
A Scaled RMC Modeling Algorithm
High-Performance Inverse Modeling with Reverse Monte Carlo Simulations. ICPP’14 asarje @ lbl.gov
Introduction The RMC Methodology A High-Performance RMC Parallel RMC Performance Conclusions
Autotuned Model Temperature
High-Performance Inverse Modeling with Reverse Monte Carlo Simulations. ICPP’14 asarje @ lbl.gov
Introduction The RMC Methodology A High-Performance RMC Parallel RMC Performance Conclusions
Distributed-Memory Parallelization over Multiple Tiles
• MPI for distributed memory level.
High-Performance Inverse Modeling with Reverse Monte Carlo Simulations. ICPP’14 asarje @ lbl.gov
Introduction The RMC Methodology A High-Performance RMC Parallel RMC Performance Conclusions
Distributed-Memory Parallelization over Multiple Tiles
High-Performance Inverse Modeling with Reverse Monte Carlo Simulations. ICPP’14 asarje @ lbl.gov
Introduction The RMC Methodology A High-Performance RMC Parallel RMC Performance Conclusions
Distributed-Memory Parallelization of a Single Tile
High-Performance Inverse Modeling with Reverse Monte Carlo Simulations. ICPP’14 asarje @ lbl.gov
Introduction The RMC Methodology A High-Performance RMC Parallel RMC Performance Conclusions
Subtile Implementations
• Two main types of kernels:1 Data parallel.2 Reduction.
• Multicore CPU implementation with OpenMP.
• Graphics Processor (GPU) implementation with Nvidia CUDA.
High-Performance Inverse Modeling with Reverse Monte Carlo Simulations. ICPP’14 asarje @ lbl.gov
Introduction The RMC Methodology A High-Performance RMC Parallel RMC Performance Conclusions
Validation with Model Reconstruction
ActualModels
High-Performance Inverse Modeling with Reverse Monte Carlo Simulations. ICPP’14 asarje @ lbl.gov
Introduction The RMC Methodology A High-Performance RMC Parallel RMC Performance Conclusions
Validation with Model Reconstruction
ActualModels
Start Models
High-Performance Inverse Modeling with Reverse Monte Carlo Simulations. ICPP’14 asarje @ lbl.gov
Introduction The RMC Methodology A High-Performance RMC Parallel RMC Performance Conclusions
Validation with Model Reconstruction
ActualModels
Start Models
ReconstructedModels
High-Performance Inverse Modeling with Reverse Monte Carlo Simulations. ICPP’14 asarje @ lbl.gov
Introduction The RMC Methodology A High-Performance RMC Parallel RMC Performance Conclusions
Model Reconstruction
Simulated and Experimental Patterns Computed Model
High-Performance Inverse Modeling with Reverse Monte Carlo Simulations. ICPP’14 asarje @ lbl.gov
Introduction The RMC Methodology A High-Performance RMC Parallel RMC Performance Conclusions
Experimental Performance: Environment
• All computations are in double precision.
1 Single node platform:• Dual-socket 2.2 GHz 8-core Intel Xeon E5-2660 (Sandy Bridge)• 16 cores and 32 hardware threads with 2 NUMA regions, and 64 GB RAM• Nvidia K20X (Kepler) graphics processor.
High-Performance Inverse Modeling with Reverse Monte Carlo Simulations. ICPP’14 asarje @ lbl.gov
Introduction The RMC Methodology A High-Performance RMC Parallel RMC Performance Conclusions
Experimental Performance: Number of iterations
101
102
103
104
105
106
107
108
5 10
20
40
80
160
320
500
Tim
e [m
s]
Number of iterations [x1000]
TotalFFT Update
ReductionMiscMPI
Multicore CPU
101
102
103
104
105
106
107
108
5 10
20
40
80
160
320
500
Tim
e [m
s]
Number of iterations [x1000]
TotalFFT Update
ReductionMiscMPI
Kepler GPU
• Tile size of 512 × 512.
High-Performance Inverse Modeling with Reverse Monte Carlo Simulations. ICPP’14 asarje @ lbl.gov
Introduction The RMC Methodology A High-Performance RMC Parallel RMC Performance Conclusions
Experimental Performance: Scaling with number of tiles
101
102
103
104
105
106
107
108
1 2 4 8 16
32
64
Tim
e [m
s]
Number of tiles
TotalFFT Update
ReductionMiscMPI
Multicore CPU
101
102
103
104
105
106
107
108
1 2 4 8 16
32
64
Tim
e [m
s]
Number of tiles
TotalFFT Update
ReductionMiscMPI
Kepler GPU
• Total of 10,000 iterations.
High-Performance Inverse Modeling with Reverse Monte Carlo Simulations. ICPP’14 asarje @ lbl.gov
Introduction The RMC Methodology A High-Performance RMC Parallel RMC Performance Conclusions
Experimental Performance: Scaling with tile size
101
102
103
104
105
106
107
108
64
128
256
512
102
4
204
8
409
6
Tim
e [m
s]
Tile size
TotalFFT Update
ReductionMiscMPI
Multicore CPU
101
102
103
104
105
106
107
108
64
128
256
512
102
4
204
8
409
6
Tim
e [m
s]
Tile size
TotalFFT Update
ReductionMiscMPI
Kepler GPU
• Total of 40,000 iterations.
High-Performance Inverse Modeling with Reverse Monte Carlo Simulations. ICPP’14 asarje @ lbl.gov
Introduction The RMC Methodology A High-Performance RMC Parallel RMC Performance Conclusions
Experimental Performance: Environment
1 Multicore CPU cluster:• Cray XE6 supercomputer (‘Hopper’ at NERSC/LBNL)• Each node is a dual-socket 2.1 GHz 12-core AMD MagnyCours processors• 24 cores per node with 4 NUMA domains, and 32 GB RAM
2 GPU cluster:• Cray XK7 supercomputer (‘Titan’ at OLCF/ORNL)• Each node is a 2.2 GHz 16-core AMD Opteron 6274 (Interlagos)• 16 cores per node with 2 NUMA domains, and 32 GB RAM• One Nvidia K20X (Kepler) GPU on each node.
High-Performance Inverse Modeling with Reverse Monte Carlo Simulations. ICPP’14 asarje @ lbl.gov
Introduction The RMC Methodology A High-Performance RMC Parallel RMC Performance Conclusions
Experimental Performance: Weak scaling
104
105
106
107
24
48
96
192
384
768
153
6
307
2
614
4
122
88
Tim
e [m
s]
Number of Cores
t=2p, s=2048t=p, s=2048t=p, s=512
t=p/2, s=512
Hopper
104
105
106
1 2 4 8 16
32
64
128
256
512
102
4
Tim
e [m
s]
Number of Nodes (GPUs)
t=2p, s=2048t=p, s=2048t=2p, s=512t=p, s=512
Titan
High-Performance Inverse Modeling with Reverse Monte Carlo Simulations. ICPP’14 asarje @ lbl.gov
Introduction The RMC Methodology A High-Performance RMC Parallel RMC Performance Conclusions
Experimental Performance: Strong scaling
104
105
106
107
108
109
1010
24
48
96
192
384
768
153
6
307
2
614
4
122
88
Tim
e [m
s]
Number of Cores
t=1024, s=2048t=1024, s=512t=128, s=2048t=128, s=512
Hopper
103
104
105
106
107
108
1 2 4 8 16
32
64
128
256
512
Tim
e [m
s]
Number of Nodes (GPUs)
t=1024, s=2048t=1024, s=512t=128, s=512
Titan
High-Performance Inverse Modeling with Reverse Monte Carlo Simulations. ICPP’14 asarje @ lbl.gov
Introduction The RMC Methodology A High-Performance RMC Parallel RMC Performance Conclusions
Conclusions
1 Synchrotron beamlines need HPC for data processing to be efficient.
2 Monte-Carlo and Reverse Monte-Carlo simulations are highly usedmethods in scientific computing, and same parallelization techniquescan be applied.
3 RMC works when all other sophisticated methods fail.
4 Reduction operations are nearly one order of magnitude slower onGPUs compared to multicore CPUs. Better caching and lesssynchronizations on CPUs are two primary factors.
5 Even with large number of reduction kernels in the application, at scalethe overall performance on GPUs is about an order of magnitudebetter than on multicore CPUs!
High-Performance Inverse Modeling with Reverse Monte Carlo Simulations. ICPP’14 asarje @ lbl.gov
Introduction The RMC Methodology A High-Performance RMC Parallel RMC Performance Conclusions
Future Work
1 Non-binary model reconstruction.
2 3-D structures reconstruction.
3 Time-series fitting.
4 Realtime?
High-Performance Inverse Modeling with Reverse Monte Carlo Simulations. ICPP’14 asarje @ lbl.gov
Introduction The RMC Methodology A High-Performance RMC Parallel RMC Performance Conclusions
Acknowledgements
• Supported by the Director, Office of Science, of the U.S. Department ofEnergy under Contract No. DE-AC02-05CH11231, and DOE EarlyCareer Research grant awarded to Alexander Hexemer.
• Resource usage at National Energy Research Scientific Computing Center(NERSC) at the Lawrence Berkeley National Laboratory, supported bythe Office of Science of the U.S. Department of Energy under ContractNo. DE-AC02- 05CH11231.
• Resource usage at Oak Ridge Leadership Computing Facility (OLCF) at theOak Ridge National Laboratory, supported by the Office of Science of theU.S. Department of Energy under Contract No. DE-AC05-00OR22725.
High-Performance Inverse Modeling with Reverse Monte Carlo Simulations. ICPP’14 asarje @ lbl.gov
Introduction The RMC Methodology A High-Performance RMC Parallel RMC Performance Conclusions
Thank you!
High-Performance Inverse Modeling with Reverse Monte Carlo Simulations. ICPP’14 asarje @ lbl.gov