The Astrophysics Simulation Collaboratory Portal
Case Study of a Grid-Enabled Application Environment
HPDC-10 San Francisco
Michael Russell, Gabrielle Allen,
Greg Daues, Ian Foster,
Tom Goodale, Edward Seidel,
Jason Novotny, John Shalf,
Wai-Mo Suen, Gregor von Laszewski
Collaboratory: Fully-integrated one-stop-shop for
your research community
CACTUS is a freely available, modular,
portable and manageable environment for collaboratively developing parallel, high-performance multi-dimensional simulations
Cactus Community
• DLR
• Geophysics (Bosl)
• Numerical Relativity Community
• Cornell
• Crack prop.
• NASA NS GC
• Livermore
• SDSS (Szalay)
• Intel
• Microsoft
• Clemson
• “Egrid”
• NCSA, ANL, SDSC
• AEI Cactus Group (Allen)
• NSF KDI (Suen)
• EU Network (Seidel)
• Astrophysics (Zeus)
• US Grid Forum
• DFN Gigabit (Seidel)
• “GRADS” (Kennedy, Foster, Dongarra, et al)
• ChemEng (Bishop)
• San Diego, GMD, Cornell
• Berkeley (Shalf)
NSF-KDI Project: Astrophysics Simulation Collaboratory
• Physics: Accretion-induced collapse of neutron stars
• Numerics:
  • Full evolution of Einstein’s equations in 3D (AEI)
  • Relativistic Hydrodynamics (WashU)
  • Zeus MHD (NCSA)
  • Adaptive Mesh Refinement (Rutgers)
• Resources: Distributed all over the US and Europe (NCSA, SDSC, Rutgers, ANL, UNM, Golm, Garching)
• People: Also distributed worldwide
  • Different fields of expertise
  • Different levels of expertise
Complex Workflow
• Acquire Code Modules
• Configure And Build
• Bugs? (Y/N)
• Report/Fix bugs
• Set Params, Initial Data
• Run Many Test Jobs
• Steer, Kill, Or restart
• Correct? (Y/N)
• Select largest resource and run for a week
• Remove vis and steer
• Novel Results? (Y/N)
• Archive TBs of Data
• Select and Stage data to Storage array
• Regression
• Remote Vis
• Data Mine
• Observation
• Papers
• Nobel Prizes
User’s View Has To Be Easy!
ASC Portal (Major Components)
• Web-based portal interface
  • Simplicity for users
  • Available e-commerce infrastructure
  • Central location for collaborative interactions
• Globus Grid Services
  • Widely deployed at HPC sites in US and Europe
  • Uniform security model (single-sign-on)
  • Uniform access to essential services (batch queues, files, information)
• Cactus Computational ToolKit
  • Modular/parallel multi-physics framework
  • Multi-platform (NT, Unix, Linux; Alpha, x86, IA64, MIPS, SR8000)
  • Multilanguage (C, C++, F90, F77, Java, Python, Perl)
  • Robust toolkit (Numerics, Parallel I/O, Remote Steering and Vis, PETSc)
  • AMR Integration (GrACE)
  • Revision control and active bug tracking support infrastructure
ASC Portal (Web Components)
• Apache
  • SSL/Secure HTTP
• Tomcat
  • Java Server Pages Automation
• Java CoG
  • 100% Pure Java implementation of Globus Services and APIs
• GPDK
  • Wraps CoG in JSP-amenable Java Beans
  • Provides higher-level services like connection pooling
• SQL/RDBMS
  • Portal internal state information (rather than serialization or flat files)
  • Proven scalability for server replication
  • Independent of implementation of portal automation
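To make the SQL/RDBMS point concrete, here is a minimal sketch of keeping portal state in relational tables instead of serialized objects or flat files. It uses Python's built-in `sqlite3` module purely for illustration; the portal itself is built on JSP/Java, and the table names, column names, and host name below are hypothetical rather than taken from the ASC portal's schema.

```python
import sqlite3


def open_portal_db(path=":memory:"):
    """Open a database and create the (hypothetical) portal state tables."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS users (
            id   INTEGER PRIMARY KEY,
            name TEXT UNIQUE NOT NULL
        );
        CREATE TABLE IF NOT EXISTS machines (
            id       INTEGER PRIMARY KEY,
            hostname TEXT UNIQUE NOT NULL
        );
        CREATE TABLE IF NOT EXISTS profile_machines (
            user_id    INTEGER NOT NULL REFERENCES users(id),
            machine_id INTEGER NOT NULL REFERENCES machines(id),
            PRIMARY KEY (user_id, machine_id)
        );
        """
    )
    return conn


def add_machine_to_profile(conn, user, hostname):
    """Record that a user has added a machine to their profile (idempotent)."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT OR IGNORE INTO users(name) VALUES (?)", (user,))
        conn.execute(
            "INSERT OR IGNORE INTO machines(hostname) VALUES (?)", (hostname,)
        )
        conn.execute(
            """
            INSERT OR IGNORE INTO profile_machines(user_id, machine_id)
            SELECT u.id, m.id FROM users u, machines m
            WHERE u.name = ? AND m.hostname = ?
            """,
            (user, hostname),
        )


def machines_for(conn, user):
    """Return the host names in a user's profile, sorted alphabetically."""
    rows = conn.execute(
        """
        SELECT m.hostname
        FROM machines m
        JOIN profile_machines pm ON pm.machine_id = m.id
        JOIN users u ON u.id = pm.user_id
        WHERE u.name = ?
        ORDER BY m.hostname
        """,
        (user,),
    )
    return [row[0] for row in rows]
```

Because the state lives in the database rather than in the web server's memory, any replica of the portal front end could answer `machines_for(conn, "alice")` the same way, which is one reading of the "server replication" bullet above.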
Inside the Portal Webserver
Manage Compute Resources
1. Select machines in ASC Grid
2. Add to user’s profile
Cactus Software
1. Select project and thorns
2. Run cvs checkout
Build Applications
1. Edit configuration
2. Run “make…”
Parameter Files
1. Edit parameter file
Cactus Simulations
1. Edit simulation
2. Run simulation
3. Steer simulation
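The panels above can be read as an ordered checklist that takes a user from choosing machines to steering a run. The sketch below encodes that reading in Python. The panel and step names are copied from the slide, but the strict ordering rule and the `PortalSession` class are assumptions made for illustration, not a description of how the portal's JSP pages are implemented.

```python
from dataclasses import dataclass, field

# Panel and step names as listed on the slide; the "make…" step is kept
# exactly as elided there.
PANELS = [
    ("Manage Compute Resources",
     ["Select machines in ASC Grid", "Add to user's profile"]),
    ("Cactus Software",
     ["Select project and thorns", "Run cvs checkout"]),
    ("Build Applications",
     ["Edit configuration", 'Run "make…"']),
    ("Parameter Files",
     ["Edit parameter file"]),
    ("Cactus Simulations",
     ["Edit simulation", "Run simulation", "Steer simulation"]),
]


def flatten(panels):
    """Turn the nested panel list into one ordered list of (panel, step)."""
    return [(panel, step) for panel, steps in panels for step in steps]


@dataclass
class PortalSession:
    """Hypothetical per-user record of which steps have been completed."""
    completed: list = field(default_factory=list)

    def next_step(self):
        """Return the next (panel, step) pair, or None when all are done."""
        steps = flatten(PANELS)
        if len(self.completed) < len(steps):
            return steps[len(self.completed)]
        return None

    def complete(self, panel, step):
        """Mark a step done; assumed rule: steps must be taken in order."""
        expected = self.next_step()
        if (panel, step) != expected:
            raise ValueError(f"expected {expected}, got {(panel, step)}")
        self.completed.append((panel, step))
```

A real portal would likely let users revisit panels (for example, editing a parameter file again after a test run); the strict ordering here only keeps the example short.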
Thorn HTTPD
• Thorn which allows any simulation to act as its own web server
• Connect to simulation from any browser anywhere
• Monitor run: parameters, basic visualization, ...
• Change steerable parameters
• See running example at www.CactusCode.org
• Wireless remote viz, monitoring and steering
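The idea of a simulation acting as its own web server can be sketched with Python's standard `http.server` module. This is an analogy only: the actual HTTPD thorn is part of Cactus, and the `/status` and `/steer` endpoints, the `SimulationState` class, and the parameter names below are hypothetical. The sketch shows the two behaviours listed above: monitoring parameters from any browser, and changing only those parameters marked steerable.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class SimulationState:
    """Hypothetical stand-in for a running simulation's parameters."""

    def __init__(self):
        self.lock = threading.Lock()
        self.iteration = 0
        self.params = {"grid_size": 64, "output_every": 10}
        self.steerable = {"output_every"}  # only these may change at run time

    def snapshot(self):
        with self.lock:
            return {
                "iteration": self.iteration,
                "parameters": dict(self.params),
                "steerable": sorted(self.steerable),
            }

    def steer(self, name, value):
        """Change a parameter if it is steerable; report whether it changed."""
        with self.lock:
            if name not in self.steerable:
                return False
            self.params[name] = value
            return True


def make_handler(state):
    class Handler(BaseHTTPRequestHandler):
        def _send_json(self, code, payload):
            body = json.dumps(payload).encode("utf-8")
            self.send_response(code)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def do_GET(self):
            # Monitoring: report the current iteration and parameters.
            if self.path == "/status":
                self._send_json(200, state.snapshot())
            else:
                self._send_json(404, {"error": "not found"})

        def do_POST(self):
            # Steering: accept {"name": ..., "value": ...} as JSON.
            if self.path != "/steer":
                self._send_json(404, {"error": "not found"})
                return
            length = int(self.headers.get("Content-Length", 0))
            try:
                request = json.loads(self.rfile.read(length))
                name, value = request["name"], request["value"]
            except (ValueError, KeyError, TypeError):
                self._send_json(400, {"error": "expected JSON name and value"})
                return
            if state.steer(name, value):
                self._send_json(200, state.snapshot())
            else:
                self._send_json(403, {"error": "parameter is not steerable"})

        def log_message(self, format, *args):
            pass  # keep the simulation's own output free of request logs

    return Handler


def start_server(state, port=0):
    """Serve the simulation state in a background thread (port 0 = any free port)."""
    server = ThreadingHTTPServer(("127.0.0.1", port), make_handler(state))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

In this sketch the simulation loop would keep running in the main thread and update `state.iteration` under the lock, while the server thread answers browser requests; call `server.shutdown()` when the run ends.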
Remote Visualization
• Isosurfaces and geodesics
• Contour plots (download)
• Grid functions
• Streaming HDF5
• Visualization clients: Amira, Isoview, LCA Vision, OpenDX
Dynamic Grid Computing
Can already do some of this!
• Go!
• Clone job with steered parameter
• Queue time over, find new machine
• Add more resources
• Found a horizon, try out excision
• Look for horizon
• Calculate/Output Grav. Waves
• Calculate/Output Invariants
• Find best resources
• Free CPUs!!
• Archive data
• Track!
• Sites: NCSA, SDSC, RZG, LRZ
GridLab: New Paradigms for Dynamic Grids
• Dynamic Distributed apps with Grid-threads (gthreads)
• Code should be aware of its environment
  • What resources are out there NOW, and what is their current state?
  • What is my allocation?
  • What is the bandwidth/latency between sites?
• Code should be able to make decisions on its own
  • A slow part of my simulation can run asynchronously…spawn it off!
  • New, more powerful resources just became available…migrate there!
  • Machine went down…reconfigure and recover!
  • Need more memory…get it by adding more machines!
• Code should be able to publish this information to Portal for tracking, monitoring, steering…
  • Unexpected event…notify users!
  • Collaborators from around the world all connect, examine simulation.
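The "make decisions on its own" bullets can be sketched as a small decision function. Everything here is an assumption made for illustration: the `Resource` and `JobState` records, the action names, and the rule that a migration is worthwhile when another site offers at least twice the job's current CPUs. It is not the GridLab API, only one way such a policy could look.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Resource:
    """Hypothetical snapshot of one site, as an information service might report it."""
    name: str
    free_cpus: int
    free_memory_gb: float
    up: bool = True


@dataclass
class JobState:
    """Hypothetical view the running code has of itself."""
    host: str
    cpus: int
    memory_needed_gb: float
    memory_available_gb: float


def decide(job: JobState, resources: List[Resource],
           speedup_threshold: float = 2.0) -> Tuple[str, Optional[str]]:
    """Return (action, target site) for the job's next move."""
    by_name = {r.name: r for r in resources}
    current = by_name.get(job.host)
    candidates = [r for r in resources if r.up and r.name != job.host]

    # "Machine went down…reconfigure and recover!"
    if current is None or not current.up:
        best = max(candidates, key=lambda r: r.free_cpus, default=None)
        return ("recover", best.name) if best else ("notify_users", None)

    # "Need more memory…get it by adding more machines!"
    if job.memory_needed_gb > job.memory_available_gb:
        extra = max(candidates, key=lambda r: r.free_memory_gb, default=None)
        if extra is not None and extra.free_memory_gb > 0:
            return ("add_machine", extra.name)
        return ("notify_users", None)

    # "New, more powerful resources just became available…migrate there!"
    best = max(candidates, key=lambda r: r.free_cpus, default=None)
    if best is not None and best.free_cpus >= speedup_threshold * job.cpus:
        return ("migrate", best.name)

    return ("continue", None)
```

The returned pair is also what the code could publish to the portal for tracking, for example `("migrate", "site_b")`, so that collaborators see why a run moved.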
GridLab: Enabling Dynamic Grid Applications
• Large EU Project Under Negotiation with EC…
• AEI, Lecce, Poznan, Brno, Amsterdam, ZIB-Berlin, Cardiff, Paderborn, SZTAKI, Compaq, Sun, Chicago, ISI, Wisconsin
• Grid Application Toolkit for both Apps and Infrastructure
• 20 positions opening up!! Come join us!