Upload
mary-lawson
View
220
Download
4
Tags:
Embed Size (px)
Citation preview
155 South 1452 East Room 380 Salt Lake City, Utah 84112
1-801-585-1233
This research was sponsored by the National Nuclear Security Administration under the Accelerating Development of Retrofitable CO2 Capture Technologies through Predictivity program through DOE Cooperative Agreement DE-NA0000740
Integration of Reverse Monte-Carlo Ray Tracing within Uintah
Todd HarmanDepartment of Mechanical Engineering
Jeremy Thornock
Department of Chemical Engineering
Isaac HunsakerGraduate Student
Department of Chemical Engineering
• Year 2: Demonstration of a fully-coupled problem using RMCRT within ARCHES.
Scalability demonstration.
DeliverablesDeliverables
ApproachApproach
CFD:
Finest level, (always)
RMCRT:
1 Level: CFD
2 Level: coarsest level
“Data Onion”: finest level,
Research Topic: Region of Interest (ROI)
2 Levels2 Levels
2 Levels
Data OnionData Onion
3-Levels
Data Onion: ROIData Onion: ROI Implemented
Research Topic: ROI location?
Static:
• User defined region?
Data Onion: ROIData Onion: ROI Implemented
Research Topic: ROI location
Dynamic: • ROI computed every timestep? (abskg sigmaT4)
• ROI proportional to the size of fine level patches?
Status: CompletedStatus: Completed
80% Complete: Data Onion, dynamic & static region of interests.
Testing phase, need benchmarks.
90% Complete: Integration of RMCRT tasks within ARCHES
(2 level)
Status: Work in ProgressStatus: Work in Progress• Single Level
Verification Order of accuracy
# rays (old)
grid resolution
Scalability studies, new mixed scheduler.
• 2 Levels verification
Errors associated with coarsening
Benchmark ProblemBenchmark Problem
S. P. Burns and M.A Christon. Spatial domain-based parallelism in large-scale, participating-media, radiative transport applications. Numerical Heat Transfer, Part B, 31(4):401-421, 1997.
Initial Conditions:
- Uniform temperature field
- Analytical function for absorption coefficient
Verification: 1LVerification: 1L
S. P. Burns and M.A Christon. Spatial domain-based parallelism in large-scale, participating-media, radiative transport applications. Numerical Heat Transfer, Part B, 31(4):401-421, 1997.
Verification: 1LVerification: 1L
Verification: 2LVerification: 2L
4X error from coarsening abskg
Verification: 2LVerification: 2L
Coarsening: smoothing filter
Error
Abskg
CollaborationCollaboration
Leverage the work of Dr. Berzin’s team
Hybrid MPI-threaded Task Scheduler (Qingyu Meng)
GPU-RMCRT (Alan Humphrey)
Hybrid MPI-threaded Task SchedulerHybrid MPI-threaded Task Scheduler
Hybrid MPI-threaded Task Scheduler*:
• Memory reduction!
• 13.5Gb -> 1GB per node (12 cores/node)*.
(2 material CFD problem, 20483 cells, on 110592 cores of Jaguar)
• Interconnect drivers and MPI software must be threadsafe.
• RMCRT requires an MPI environmental variable expert!
*Q. Meng, M. Berzins, and J. Schmidt, Using hybrid parallelism to improve memory use in uintah. In Proceeding of the Teragrid 2011.
MPI-threaded Task SchedulerMPI-threaded Task Scheduler Kraken
100 rays per cell
MPI-threaded Task SchedulerMPI-threaded Task Scheduler
Difficult to run on Kraken, crashing in mvapich
Further testing needed on bigger machines?
GPU-RMCRT
Motivation - Utilize all available hardware
Uintah’s asynchronous task-based approach is well suited to take
advantage of GPUs
RMCRT is ideal for GPUs
Keeneland Initial Delivery System360 GPUs
DoE Titan1000s of GPUs
Nvidia M2070/90 Tesla GPU
Multi-core CPU
+
GPU-RMCRT
• Offload Ray Tracing and RNG to GPU(s)
Available CPU cores can perform other computation.
• Uintah infrastructure supports GPU task scheduling and execution:
Can access multiple GPUs on-node
Uses Nvidia CUDA C/C++
• Using NVIDIA cuRAND Library
GPU-accelerated random number generation (RNG)
Uintah Hybrid CPU/GPU Scheduler
• Create & schedule CPU & GPU tasks
• Enables Uintah to “pre-fetch” GPU data
• Uintah infrastructure manages:
• Queues of CUDA Stream and Event handles
• Device memory allocation and transfers
• Utilize all available: CPU cores and GPUs
Uintah GPU Scheduler Abilities
• Capability jobs run on:
Keeneland Initial Delivery System (NICS)
1440 CPU cores & 360 GPUs simultaneously
Jaguar - GPU partition (OLCF)
15360 CPU cores & 960 GPUs simultaneousl
• Development of GPU RMCRT prototype underway.
Status: PendingStatus: Pending• Head-to-head comparison of RMCRT with Discrete Ordinates Method.
Single level.
Accuracy versus computational cost.
• 2 Levels:
Coarsening error for variable temperature and radiative properties.
• Data Onion:
Serial performance
Accuracy versus number of levels, refinement ratio, dynamic/static ROI.
Scalability Studies
SummarySummary
• Order of Accuracy: # rays0.5
, grid Cells1
• Accuracy issues related to coarsening data.
• Cost = f( #rays, Grid Cells1.4-1.5 communication….)
Doubling the grid resolution = 20ish X increase in cost.
• Good scalability characteristics
Year 2: Demonstration of a fully-coupled problem using RMCRT within ARCHES.
Scalability demonstration.
SummarySummary
Acknowledgements:
DoE for funding the CSAFE project from 1997-2012, DOE NETL, DOE NNSA, INCITE
NSF for funding via SDCI and PetaApps
Keeneland Computing Facility, supported by NSF under Contract OCI-0910735 Oak Ridge Leadership Computing Facility – DoE Jaguar XK6 System (GPU partition)
http://www.uintah.utah.edu
GPU RMCRTGPU RMCRT
PhysicsPhysics
• Isotropic scattering added to the model
• Verification testing performed using an exact solution (Siegel, 1987)
• Grid convergence analysis performed
• Discrepancy diminishes with increased mesh refinement
Isotropic Scattering:VerificationIsotropic Scattering:Verification
Seigel, R. “Transient Radiative Cooling of a droplet-filled layer,” ASME Journal of Heat Transfer,109:159-164, 1987.
Benchmark Case of Seigel 1987
• Cube (1m3)
• Uniform Temperature 64.7K
• Mirror surface on all sides
• Black top and bottom walls
• Computed surface fluxes on top & bottom walls
• 10 rays per cell (low)
Isotropic Scattering:VerificationIsotropic Scattering:Verification
Radiative Flux vs Optical Thickness
Seigel, R. “Transient Radiative Cooling of a droplet-filled layer,” ASME Journal of Heat Transfer,109:159-164, 1987.
RMCRT (dots)Exact solution (lines)
Isotropic Scattering:VerificationIsotropic Scattering:Verification
Grid convergence of the L1 error norms where the scattering coefficient is 8 m-1, and the absorption coefficient is 2m-1.
DOM vs RMCRTDOM vs RMCRT
IFRF burner simulation (production size run)
• 1344 processors/cores
• Initial conditions taken from a previous run with DOM.
• Domain: (1m x 4.11 m x 1m)
• Resolution: (4.4mm x 8.8mm x 4.4mm) 24 million cells
DOM vs RMCRTDOM vs RMCRT