Application of High Performance Computing to Situation Awareness Simulations
Amit Majumdar
Group Leader, Scientific Computing, San Diego Supercomputer Center
Associate Professor, Dept of Radiation Oncology
University of California San Diego
Application of High Performance Computing to Near-Real Time Simulations
Outline
Academic High Performance Computing
Applications
Event-driven Science
Online Adaptive Cancer Radiotherapy
Dynamic Data Driven Image-guided Neurosurgery
Summary
Academic High Performance Computing
TeraGrid
NSF – National Science Foundation funds TeraGrid
TeraGrid: NSF-funded supercomputer centers in the US, connected by high-bandwidth (BW) links
HPC machines ranging from Teraflop (TF), 10^12 floating point operations/sec, to Petaflop (PF), 10^15 floating point operations/sec
11 Resource Providers, One Facility
NSF - TeraGrid
TeraGrid is a facility that integrates computational, information, and analysis resources at the San Diego Supercomputer Center, the Texas Advanced Computing Center, the University of Chicago / Argonne National Laboratory, the National Center for Supercomputing Applications, Purdue University, Indiana University, Oak Ridge National Laboratory, the Pittsburgh Supercomputing Center, LSU, and the National Center for Atmospheric Research.
[Map of TeraGrid sites: SDSC, TACC, UC/ANL, NCSA, ORNL, PU, IU, PSC, NCAR, Caltech, USC-ISI, Utah, Iowa, Cornell, Buffalo, UNC-RENCI, Wisc, LSU]
Top 5 Top500 HPC Machines
[Tables: top 5 systems on the Top500 list, November 2009 and November 2008]
NSF HPC Perspective: Tflop to Pflop
Track2 awards: two plus one (3 awards)
Track2-A/B: $30M for the machine plus ~$8-10M/year operating cost; ~500 TF to 1 PF range (peak)
―Ranger at TACC, U Texas (579 TF, ~62K cores)
―Kraken at NICS, ORNL (1 PF, ~99K cores)
Track2-D: three different machines: data intensive, experimental, grid research
Other awards for visualization and data systems
Track1 award: one award, a ~$200M multi-PF system with sustained PF performance on scientific applications
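A quick way to read the Track2 numbers is to divide each system's peak by its core count. The sketch below uses only the figures quoted on the slide (579 TF on ~62K cores, 1 PF on ~99K cores) and the TF/PF definitions above; because the core counts are approximate, the per-core results are approximate too.

```python
TF = 1e12  # flop/s, as defined on the slide
PF = 1e15  # flop/s

# (peak flop/s, approximate core count) as quoted on the slide
systems = {
    "Ranger (TACC)": (579 * TF, 62_000),
    "Kraken (NICS)": (1 * PF, 99_000),
}

def per_core_gflops(peak_flops: float, cores: int) -> float:
    """Average peak GFLOP/s per core: system peak divided by core count."""
    return peak_flops / cores / 1e9

for name, (peak, cores) in systems.items():
    print(f"{name}: ~{per_core_gflops(peak, cores):.1f} GFLOP/s per core")
```

Both machines come out at roughly 9-10 GFLOP/s of peak per core, so the jump from ~500 TF to 1 PF comes mostly from the larger core count.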
Event-driven Science
On-demand Earthquake-induced Ground Wave Simulation
http://shakemovie.caltech.edu/
Prof Jeroen Tromp (at Caltech when we collaborated, currently at Princeton)
Caltech’s near-real-time simulation of southern California seismic events using the SPECFEM3D software
Simulates SoCal seismic wave propagation based on the spectral element method (SEM); SPECFEM3D is a parallel MPI code
The movies illustrate the up (red) and down (blue) velocity of Earth’s surface
Events
Every time an earthquake of magnitude > 3.5 occurs in SoCal, 1000s of seismograms are recorded at 100s of seismic stations (epicenter, depth, intensity)
Automatically collect these seismic recordings from the SCSN via the internet
Simulate the seismic waves generated by the earthquake in a 3-D southern CA seismic velocity model using the SCSN data
After the full 3-D wave simulation, collect the surface motion data (displacement, velocity, acceleration) and map it onto the topography
Render the data and generate movies
Earthquake movies are approved by a geophysicist at Caltech
Movies are published within ~45 min of the earthquake
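The event-driven workflow above amounts to a trigger (magnitude > 3.5) followed by a fixed sequence of stages. The sketch below shows that control flow only; the `Event` type and the stage names are hypothetical labels for the steps listed on the slide, not the actual Caltech ShakeMovie software.

```python
from dataclasses import dataclass

MAGNITUDE_THRESHOLD = 3.5  # from the slide: events with magnitude > 3.5 trigger a run

@dataclass
class Event:
    event_id: str
    magnitude: float
    lat: float
    lon: float

# Hypothetical stage names mirroring the steps listed above, in order.
STAGES = [
    "collect_seismograms",  # fetch recordings from the SCSN
    "simulate_3d_waves",    # SPECFEM3D run in the 3-D SoCal velocity model
    "map_surface_motion",   # displacement/velocity/acceleration on topography
    "render_movies",
    "review_and_publish",   # geophysicist approval, then publication
]

def should_trigger(event: Event) -> bool:
    """Strictly greater than the threshold, as stated on the slide."""
    return event.magnitude > MAGNITUDE_THRESHOLD

def plan_pipeline(event: Event) -> list[str]:
    """Ordered stage list for this event, or an empty list if it is below threshold."""
    if not should_trigger(event):
        return []
    return [f"{event.event_id}:{stage}" for stage in STAGES]
```

In a real deployment each stage would be a separate job submission; keeping the stage list as data makes it easy to log which step of the ~45 min budget an event is in.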
On-demand HPC
An earthquake can happen at any time, so on-demand HPC resources are needed for fast simulation
The code uses 144 cores (nodes with dual-socket, dual-core Intel Woodcrest processors at 2.3 GHz) to complete a simulation in about 20 min
HPC resources set up at SDSC, called Ondemand HPC, provide a special queue into which Caltech shakemovie jobs can arrive automatically at any time
The batch software kills other jobs to guarantee that this job gets resources
Results are sent back to Caltech, all with no human intervention
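The on-demand queue behaves like a preemptive scheduler: an urgent job is started immediately and ordinary batch jobs are killed until enough cores are free. The sketch below assumes a hypothetical `Cluster` model in which only batch jobs are running and victims are chosen most-recently-started first; the slide does not say which policy the SDSC batch software actually uses. With 4 cores per node (dual-socket, dual-core), the 144-core run needs 36 nodes.

```python
from dataclasses import dataclass, field

CORES_PER_NODE = 2 * 2  # dual-socket, dual-core nodes (from the slide)

@dataclass
class Cluster:
    total_cores: int
    running: dict = field(default_factory=dict)  # job name -> cores held (insertion order = start order)

    def free_cores(self) -> int:
        return self.total_cores - sum(self.running.values())

    def submit_batch(self, name: str, cores: int) -> bool:
        """Ordinary batch job: starts only if enough cores are already free."""
        if cores <= self.free_cores():
            self.running[name] = cores
            return True
        return False

    def submit_on_demand(self, name: str, cores: int) -> list[str]:
        """Urgent job: kill batch jobs (most recently started first) until it fits.

        Returns the names of the killed jobs (assumed policy, for illustration).
        """
        if cores > self.total_cores:
            raise ValueError("request exceeds machine size")
        killed = []
        for victim in reversed(list(self.running)):
            if self.free_cores() >= cores:
                break
            self.running.pop(victim)
            killed.append(victim)
        self.running[name] = cores
        return killed
```

For example, on a 160-core machine running two 64-core batch jobs, submitting the 144-core shakemovie job kills both batch jobs and leaves 16 cores free.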
Shake Movies
Implications
Emergency preparedness/response
Tsunami warning
Work is being extended to do global simulation
Event: Sun Apr 11, 2010, 16:42:07; Lat: 32.5285; Long: -115.3433
[Movies: 10602453-socalorange-small.mpg, 10602453-laorange-small.mpg]
Online Adaptive Cancer Therapy
http://radonc.ucsd.edu/Research/CART
Conventional Radiotherapy
Treatment simulation: build a virtual patient model
Treatment planning: perform a virtual treatment using a virtual machine on the virtual patient
Treatment delivery: the same treatment is repeated for many fractions
Basic assumption: the human body is a static system
[Workflow: Simulation → (days) → Planning → (days) → Treatment, with treatment repeated for each fraction]
Human Body Is a Dynamic System
[Figure: tumor at week 1 and week 3 of treatment (Van de Bunt et al. ‘06)]
Tumor volume shrinkage in response to the treatment
Tumor shape deformation due to changes in the filling state of neighboring organs
Relative position change between the tumor and normal organs
Consequence of Patient Anatomical Variation
An optimal treatment plan may become less optimal or not optimal at all:
Dose to tumor ↓
Dose to normal tissues ↑
Dose to tumor ↓ → Tumor control ↓
Dose to normal tissues ↑ → Toxicity ↑
Toxicity ↑ → Prescribed tumor dose ↓ → Tumor control ↓
Solution
Develop a new treatment plan that is optimal for the patient’s new geometry
Adaptive radiation therapy (ART)
[Workflow: Simulation → (days) → Planning → (days) → On-board Imaging → Re-planning → Treatment, repeated for each fraction; the online steps are labeled 5-8 min]
Online ART
On-board volumetric imaging has recently become available
Major technical obstacles to the clinical realization of online ART:
Real-time re-planning
Imaging dose
Clinical workflow
Our Solution to the Real-time Re-planning Problem
Development of GPU-based computational tools
SCORE: Supercomputing On-line Re-planning Environment
Project Goal
To develop real-time re-planning tools based on GPUs
Funded by a UC Lab Research Grant
A collaboration with SDSC and Lawrence Livermore National Laboratory
Online Re-planning Process
[Flowchart of the online re-planning process. Blocks: Treatment Planning System (Planning CT w/ Contours, Beam Setup, Initial Plan); CBCT Reconstruction; Deformable Image Registration (Deformed pCT and Contours); Dose Calculation (Dose Deposition Coefficients); Plan Re-optimization (New Plan, Dose Distribution)]
Development of GPU-based Real-time Deformable Image Registration
Gu et al Phys Med Biol 55(1): 207-219, 2010
Deformable Image Registration
Morphing one image into another with correct correspondence
[Movie: revbrad_0001.wmv]
Deformable Image Registration with ‘Demons’
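[Flowchart: GPU implementation of Demons registration
Start: the static image $I_s(\mathbf{r})$ and the moving image $I_m(\mathbf{r})$ are transferred from the CPU to the GPU
Compute the gradients $\nabla I_s(\mathbf{r}_n)$ and $\nabla I_m(\mathbf{r}_n)$
Passive force? / Active force? (selects which image gradient drives the update)
Compute the moving vector $d\mathbf{r}_n$ and update $\mathbf{r}_{n+1} = \mathbf{r}_n + d\mathbf{r}_n$
Compare the static and deformed moving images and test the stopping criteria; if not met, iterate
End: results are transferred from the GPU back to the CPU]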
Gu et al Phys Med Biol 55(1): 207-219, 2010
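The Demons loop above alternates a force computation, a displacement update and a smoothing step. Below is a minimal 1-D NumPy sketch of the passive-force variant (Thirion's update using the static-image gradient), assuming Gaussian smoothing of the displacement field as the regularizer; the function names and parameters are illustrative, and this is not the GPU code from Gu et al.

```python
import numpy as np

def gaussian_kernel(sigma: float) -> np.ndarray:
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def demons_1d(static, moving, iters=200, sigma=2.0, eps=1e-9):
    """Passive-force demons in 1-D.

    Returns a displacement field u such that moving(x + u(x)) ~ static(x).
    """
    static = np.asarray(static, dtype=float)
    moving = np.asarray(moving, dtype=float)
    x = np.arange(static.size, dtype=float)
    grad_s = np.gradient(static)  # passive force: gradient of the static image
    kernel = gaussian_kernel(sigma)
    u = np.zeros_like(static)
    for _ in range(iters):
        warped = np.interp(x + u, x, moving)  # deform the moving image with the current u
        diff = warped - static
        denom = grad_s ** 2 + diff ** 2
        du = -diff * grad_s / np.maximum(denom, eps)  # demons force (moving vector)
        u = np.convolve(u + du, kernel, mode="same")  # Gaussian smoothing regularizes u
    return u
```

For two Gaussian bumps offset by 3 pixels, the recovered displacement is positive around the bump and the warped moving image matches the static image much better than the original did. An active-force variant would use the gradient of the warped moving image in place of `grad_s`.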
Results for GPU-based Demons Algorithms
3D spatial error (mm) / GPU time (s); image size 256×256×100
Method  Case 1      Case 2      Case 3      Case 4      Case 5      Average
PF      1.11/6.80   1.04/7.18   1.36/7.39   2.51/6.49   1.84/7.24   1.57/7.02
ePF     1.10/6.82   1.00/7.20   1.32/7.42   2.42/6.56   1.82/7.08   1.53/7.02
AF      1.15/8.29   1.05/9.24   1.39/8.79   2.34/7.75   1.81/8.44   1.55/8.50
DF      1.19/7.71   1.16/8.65   1.48/8.02   2.59/8.30   1.91/8.44   1.66/8.22
aDF     1.11/8.36   1.02/8.69   1.35/8.97   2.27/7.77   1.80/8.70   1.51/8.50
IC      1.24/11.07  1.28/11.47  1.42/11.54  3.27/10.46  1.67/10.98  1.78/11.10
~100x speedup compared to an Intel Xeon 2.27 GHz CPU
Development of GPU-based Real-time Dose Calculation
Gu et al Phys Med Biol 54(20) 6287-97, 2009
Jia et al Phys Med Biol 2010 (in print)
Finite-size Pencil Beam (FSPB) Model
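[Equation (garbled in extraction): the FSPB dose $D$ is written as a sum over $i = 1$ to $3$ of amplitude terms $A_i$ multiplied by error-function ($\mathrm{erf}$) terms in $(a, x)$ and $(b, z)$, with the depth $d$ and a factor of $4$ also appearing; the exact form is not recoverable from the extracted text]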
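A common way to obtain a finite-size pencil beam of this shape is to integrate a 2-D Gaussian over a rectangular a-by-b beamlet, which gives a product of erf sums with an overall factor 1/4. The sketch below assumes that form, with three Gaussian terms whose amplitudes `amplitudes` and widths `sigmas` (hypothetical names) are already evaluated at the depth of interest; it is an illustration consistent with the symbols that survive on the slide, not a reconstruction of the authors' exact kernel.

```python
import math

def fspb_kernel(x, z, a, b, amplitudes, sigmas):
    """Dose at lateral offsets (x, z) from the centre of an a-by-b beamlet.

    Assumes each term i is a 2-D Gaussian of width sigmas[i] integrated over
    the rectangle [-a/2, a/2] x [-b/2, b/2], which yields
    A_i / 4 * [erf((a/2 - x)/(sqrt2*s)) + erf((a/2 + x)/(sqrt2*s))]
            * [erf((b/2 - z)/(sqrt2*s)) + erf((b/2 + z)/(sqrt2*s))].
    """
    dose = 0.0
    for A_i, s_i in zip(amplitudes, sigmas):
        w = math.sqrt(2.0) * s_i
        fx = math.erf((a / 2 - x) / w) + math.erf((a / 2 + x) / w)
        fz = math.erf((b / 2 - z) / w) + math.erf((b / 2 + z) / w)
        dose += A_i / 4.0 * fx * fz
    return dose
```

Because each term is a closed-form product of erf values, every voxel/beamlet pair can be evaluated independently, which is what makes this kind of model map well onto GPU threads.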
Results for GPU-based FSPB Algorithm
Voxel size (cm³)  Beamlet size (cm²)  # Voxels (10⁶)  # Beamlets  CPU time (s)  GPU time (s)  Speedup
0.50×0.50×0.50    0.20×0.20           0.22            2500        21.22         0.06          373
0.37×0.37×0.37    0.20×0.20           0.51            2500        42.80         0.10          409
0.30×0.30×0.30    0.20×0.20           1.00            2500        78.27         0.18          419
0.25×0.25×0.25    0.20×0.20           1.73            2500        124.54        0.30          421
0.25×0.25×0.25    0.25×0.25           1.73            1600        120.14        0.29          415
0.25×0.25×0.25    0.33×0.33           1.73            900         112.78        0.27          416
0.25×0.25×0.25    0.50×0.50           1.73            400         100.77        0.24          417
~400x speedup compared to an Intel Xeon 2.27 GHz CPU
< 1 sec for a 9-field prostate IMRT plan
Monte Carlo Dose Calculation on GPU
Directly map the DPM code onto the GPU
Treat a GPU card as a CPU cluster
[Flowchart: GPU Monte Carlo loop
Start
Transfer data to the GPU, including random number seeds, cross sections, pre-generated e- tracks, etc.
On each GPU thread in parallel: a) clean the local counter; b) simulate one MC history; c) put the dose into the global counter
Reached the preset number of histories? If no, repeat the thread step; if yes, continue
Transfer data from the GPU to the CPU
End]
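The loop structure above (per-thread random streams, a local counter cleared before each pass, accumulation into a global counter, and a stop test on the number of histories) can be sketched on the CPU with NumPy. The physics here is a deliberate placeholder, assumed for illustration only: each history deposits one unit of energy at an exponentially distributed depth with a hypothetical attenuation coefficient `mu`. This is not DPM, and each simulated "thread" processes a batch of histories per pass rather than one.

```python
import numpy as np

def mc_depth_dose(n_histories, n_threads=8, batch=1000, n_bins=50, mu=0.2, seed=0):
    """Toy Monte Carlo depth-dose tally mirroring the GPU loop structure."""
    # Independent random streams per 'thread', analogous to per-thread seeds.
    rngs = [np.random.default_rng(s) for s in np.random.SeedSequence(seed).spawn(n_threads)]
    global_dose = np.zeros(n_bins)
    done = 0
    while done < n_histories:  # reached the preset number of histories?
        for rng in rngs:
            n = min(batch, n_histories - done)
            if n <= 0:
                break
            local = np.zeros(n_bins)  # a) clean the local counter
            depth = rng.exponential(1.0 / mu, size=n)  # b) simulate histories (toy physics)
            hits = depth[depth < n_bins].astype(int)  # 1-cm depth bins; escapes are dropped
            np.add.at(local, hits, 1.0)
            global_dose += local  # c) put dose into the global counter
            done += n
    return global_dose, done
```

On a GPU the global accumulation step needs atomic adds (or a reduction), since many threads write to the same dose voxels.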
Results for GPU-based MC Dose Calculation
Case #  Source type  # of histories  Std dev CPU (%)  Std dev GPU (%)  T_CPU (min)  T_GPU (min)  T_CPU/T_GPU
1       Electron     10^7            0.66             0.65             8.3          1.8          4.5
2       Photon       10^9            0.41             0.41             94           17           5.5
~5x speedup compared to an Intel Xeon 2.27 GHz CPU
< 3 min for 1% sigma for photon beams
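The "< 3 min for 1% sigma" figure can be checked against the photon row of the table, assuming the usual Monte Carlo scaling that the relative standard deviation falls as 1/√N and run time grows linearly with the number of histories N, so time scales as 1/σ².

```python
def time_for_target_sigma(sigma_ref: float, t_ref: float, sigma_target: float) -> float:
    """Scale a measured MC run to a target relative standard deviation.

    Assumes sigma ~ 1/sqrt(N) and run time ~ N, hence t ~ 1/sigma**2.
    """
    return t_ref * (sigma_ref / sigma_target) ** 2

histories_ref = 1e9  # photon case from the table
sigma_ref = 0.41     # % standard deviation on the GPU
t_ref = 17.0         # GPU time in minutes

t_1pct = time_for_target_sigma(sigma_ref, t_ref, 1.0)
n_1pct = histories_ref * (sigma_ref / 1.0) ** 2
print(f"~{n_1pct:.2e} histories, ~{t_1pct:.1f} min for 1% sigma")
```

This gives about 1.7×10^8 histories and roughly 2.9 min, consistent with the stated bound.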
Development of GPU-based Real-time Plan Re-optimization
Men et al Phys Med Biol 54(21):6565-6573, 2009
Men et al Phys Med Biol 2010 (under review)
Men et al Med Phys 2010 (to be submitted)
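Once the dose deposition coefficients are known, a fluence map optimization (FMO) re-plan reduces to finding non-negative beamlet weights x whose dose D·x matches a target. The sketch below uses a plain least-squares objective with projected gradient descent as a generic illustration; the objective, step size and names are assumptions and do not reproduce the clinical objectives in the papers cited above.

```python
import numpy as np

def fmo_projected_gradient(D, d_presc, iters=500):
    """Minimise ||D x - d_presc||^2 subject to x >= 0 (beamlet weights).

    D: dose deposition coefficients (voxels x beamlets).
    Fixed step 1/L with L = 2 * ||D||_2^2, the Lipschitz constant of the gradient.
    """
    x = np.zeros(D.shape[1])
    L = 2.0 * np.linalg.norm(D, 2) ** 2
    for _ in range(iters):
        grad = 2.0 * D.T @ (D @ x - d_presc)
        x = np.maximum(x - grad / L, 0.0)  # gradient step, then project onto x >= 0
    return x
```

Each iteration is two dense matrix-vector products, which is why this family of methods parallelizes well on a GPU.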
Results of Real-time Re-planning
We have developed GPU-based computational tools for real-time treatment re-planning
For a typical 9-field prostate case:
―The deformable registration can be done in 7 seconds
―The dose calculation takes less than 2 seconds
―The plan re-optimization takes less than 1 second (FMO), 2 seconds (DAP), or 30 seconds (VMAT)
A new plan can be developed in about 10-40 seconds
Online ART may substantially improve local tumor control while reducing normal tissue complications
Tools can be used to solve other radiotherapy problems
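The "about 10-40 seconds" total follows from adding the per-stage times for the 9-field prostate case. The sketch treats the "less than" figures as upper bounds.

```python
# Per-stage times (seconds) for the 9-field prostate case, from the bullets above.
REGISTRATION_S = 7.0
DOSE_CALC_S = 2.0                                 # "less than 2 seconds", used as an upper bound
REOPT_S = {"FMO": 1.0, "DAP": 2.0, "VMAT": 30.0}  # upper bounds per re-optimization mode

def replanning_time(mode: str) -> float:
    """Upper-bound wall time for one online re-plan: registration + dose + re-optimization."""
    return REGISTRATION_S + DOSE_CALC_S + REOPT_S[mode]

for mode in REOPT_S:
    print(mode, replanning_time(mode), "s")
```

FMO gives 10 s and VMAT 39 s, matching the quoted range.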
Dynamic Data Driven Image-guided Neurosurgery
A Majumdar¹, A Birnbaum¹, D Choi¹, A Trivedi², S. K. Warfield³, K. Baldridge¹, and Petr Krysl²
¹ San Diego Supercomputer Center, University of California San Diego
² Structural Engineering Dept, University of California San Diego
³ Computational Radiology Lab, Brigham and Women’s Hospital, Harvard Medical School
Grants: NSF: ITR 0427183, 0426558; NIH: P41 RR13218, P01 CA67165, LM0078651; I3 grant (IBM)
Neurosurgery Challenge
Challenges:
Remove as much tumor tissue as possible
Minimize the removal of healthy tissue
Avoid the disruption of critical anatomical structures
Know when to stop the resection process
These challenges are compounded by intra-operative brain shape deformation caused by the surgical process, which diminishes the value of the preoperative plan
It is important to quantify and correct for these deformations while surgery is in progress, by dynamically updating pre-operative images in a way that allows surgeons to react to the changing conditions
The simulation pipeline must meet the real-time constraints of neurosurgery: provide updated images approximately once per hour, each within a few minutes, during surgery lasting 6 to 8 hours
Intraoperative MRI Scanner at BWH
Brain Shape Deformation
[Figure: brain shape before surgery and after surgery]
Example of visualization: Intra-op Brain Tumor with Pre-op fMRI
Overall Process
[Flowchart: overall process
Before image-guided neurosurgery: preoperative data acquisition; segmentation and visualization of the preoperative data; preoperative planning of the surgical trajectory
During image-guided neurosurgery: intraoperative MRI; segmentation; registration; surface matching; solve the biomechanical model for volumetric deformation; visualization; surgical process]
Timing During Surgery
[Chart: time axis 0-40 min. Before surgery: preop segmentation. During surgery: intraop MRI, segmentation, registration; surface displacement; biomechanical simulation, visualization; surgical progress]
Current Prototype DDDAS Inside Hospital
[Diagram: pre- and intra-op 3D MRI (once/hr) → local computer at BWH (segmentation, registration, surface matching for BC; crude linear elastic FEM solution) → merged pre- and intra-op visualization → intra-op surgical decision and steering]
Runs once every hour or two for a 6 or 8 hour surgery
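In the prototype, surface matching supplies displacements on the brain surface and a linear elastic FEM solve propagates them into the volume. A 1-D analogue shows the idea: a bar of linear elastic elements with prescribed displacements at both ends (Dirichlet boundary conditions), solved for the interior nodes. The function name and the 1-D setting are illustrative assumptions; RTBM solves the 3-D problem.

```python
import numpy as np

def solve_bar(n_elems, length, E, u_left, u_right):
    """Linear elastic 1-D bar with prescribed end displacements (Dirichlet BCs).

    E may be a scalar or one stiffness per element (inhomogeneous tissue).
    Returns the nodal displacements.
    """
    E = np.broadcast_to(np.asarray(E, dtype=float), (n_elems,))
    h = length / n_elems
    n_nodes = n_elems + 1
    K = np.zeros((n_nodes, n_nodes))
    for e in range(n_elems):  # assemble the 2x2 element stiffness matrices
        K[e:e + 2, e:e + 2] += E[e] / h * np.array([[1.0, -1.0], [-1.0, 1.0]])
    f = np.zeros(n_nodes)  # no body forces
    u = np.zeros(n_nodes)
    u[0], u[-1] = u_left, u_right  # boundary displacements, e.g. from surface matching
    interior = slice(1, n_nodes - 1)
    # Move the known boundary terms to the right-hand side and solve for the interior.
    rhs = f[interior] - K[interior, 0] * u_left - K[interior, -1] * u_right
    u[interior] = np.linalg.solve(K[interior, interior], rhs)
    return u
```

With a homogeneous material the interior displacement is a linear interpolation of the boundary values; stiffer elements take up less of the deformation, which is the effect an inhomogeneous brain model is meant to capture.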
Two Research Aspects
Grid architecture: grid scheduling, on-demand remote access to multi-teraflop machines, data transfer
Data transfer from BWH to SDSC, solution of the detailed advanced biomechanical model, and transfer of results back to BWH for visualization must all be performed within a few minutes
Development of a detailed, advanced, non-linear, scalable viscoelastic biomechanical model to capture detailed intraoperative brain deformation
End-to-end Timing of RTBM
• Timing of transferring ~20 MB files from BWH to SDSC, running simulations on 16 nodes (32 procs), and transferring files back to BWH = 9* + (60** + 7***) + 50* = 126 sec.
• This shows that the grid infrastructure can deliver biomechanical brain deformation simulation results (using the linear elastic model) to surgery rooms at BWH within ~2 min using TeraGrid (TG) machines
• This satisfies the tight time constraint set by the neurosurgeons
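Adding the four components quoted in the end-to-end timing gives 126 s, about 2.1 min. The slide does not define the *, ** and *** markers, so the stage labels below only follow the order of the sentence and are descriptive guesses.

```python
# Components of the end-to-end RTBM time quoted on the slide (seconds).
# Labels follow the sentence order; the footnote markers are not defined in the source.
stages = [
    ("transfer BWH -> SDSC (*)", 9),
    ("simulation on 16 nodes (**)", 60),
    ("simulation on 16 nodes (***)", 7),
    ("transfer SDSC -> BWH (*)", 50),
]

total_s = sum(t for _, t in stages)
print(f"total = {total_s} s = {total_s / 60:.1f} min")
```

Data transfer accounts for 59 of the 126 s, so faster links matter about as much as faster solvers for this pipeline.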
Current and New Biomechanical Model
Current linear elastic material model: RTBM
Advanced model under development: FAMULS
The advanced model is based on a conforming adaptive refinement method, the FAMULS package (AMR)
Inspired by the theory of wavelets, this refinement produces globally compatible meshes by construction
The first task was to replicate the linear elastic result produced by the RTBM code using FAMULS
Advanced Biomechanical Model
The current solver is based on small-strain, isotropic linear elasticity
The new biomechanical model will be an inhomogeneous, scalable, non-linear viscoelastic model with AMR
We also want to increase the resolution close to the level of MRI voxels, i.e. millions of finite elements
Since this complex model must still meet the real-time constraint of neurosurgery, it requires fast access to remote multi-teraflop systems
Summary
HPC resources can enable near real time simulations for various scientific, engineering, and medical applications
The architecture has to plan which HPC resources are right, how to access them, how to handle data transfer, etc.
Overall, this can facilitate:
Natural or man-made event-driven rapid response and preparedness
Adaptive simulations that provide new capabilities
Dynamic data-driven simulations that enhance quality