SciDAC Software Infrastructure for Lattice Gauge Theory
Richard C. Brower
Annual Progress Review
JLab, May 14, 2007
Code distribution: see http://www.usqcd.org/software.html
Software Committee
• Rich Brower (chair) [email protected]
• Carleton DeTar [email protected]
• Robert Edwards [email protected]
• Don Holmgren [email protected]
• Bob Mawhinney [email protected]
• Chip Watson [email protected]
• Ying Zhang [email protected]
SciDAC-2 Minutes/Documents & Progress Report
http://super.bu.edu/~brower/scc.html
Major Participants in SciDAC Project
Arizona: Doug Toussaint, Dru Renner
BU: Rich Brower *, James Osborn, Mike Clark
BNL: Chulwoo Jung, Enno Scholz, Efstratios Efstathiadis
Columbia: Bob Mawhinney *
DePaul: Massimo DiPierro
FNAL: Don Holmgren *, Jim Simone, Jim Kowalkowski, Amitoj Singh
MIT: Andrew Pochinsky, Joy Khoriaty
North Carolina: Rob Fowler, Ying Zhang *
JLab: Chip Watson *, Robert Edwards *, Jie Chen, Balint Joo
IIT: Xian-He Sun
Indiana: Steve Gottlieb, Subhasish Basak
Utah: Carleton DeTar *, Ludmila Levkova
Vanderbilt: Ted Bapty

* Software Committee; participants funded in part by the SciDAC grant
QCD Software Infrastructure Goals: Create a unified software environment that enables the US lattice community to achieve very high efficiency on diverse high-performance hardware.

Requirements:
1. Build on the 20-year investment in MILC/CPS/Chroma
2. Optimize critical kernels for peak performance
3. Minimize the software effort to port to new platforms and to create new applications
Solution for Lattice QCD
(Perfect) Load balancing: uniform periodic lattices & identical sublattices per processor.
(Complete) Latency hiding: overlap computation/communication.
Data parallel: operations on small 3x3 complex matrices per link.
Critical kernels: Dirac solver, HMC forces, etc. (70%-90% of execution time)
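The per-link unit of work behind these kernels can be sketched in a few lines. This is an illustrative plain-C++ version (not the optimized SciDAC code, which is written in assembly/SSE at Level 3): a 3x3 complex link matrix applied to a color vector.

```cpp
#include <array>
#include <complex>

using Complex = std::complex<double>;
using ColorMatrix = std::array<std::array<Complex, 3>, 3>;  // 3x3 link matrix U
using ColorVector = std::array<Complex, 3>;                 // color vector at a site

// w = U * v : the innermost data-parallel operation applied per link,
// the building block of the Dirac hopping term.
ColorVector multiply(const ColorMatrix& u, const ColorVector& v) {
    ColorVector w{};  // zero-initialized accumulator
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            w[i] += u[i][j] * v[j];
    return w;
}
```

Since this tiny matrix-vector product is executed once per link per sweep, it is exactly the loop body that the Level 3 libraries hand-optimize.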
Lattice Dirac operator:
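The operator itself appeared as a figure on the original slide. For reference, the standard Wilson hopping term has the form below; this is a sketch of the conventional discretization, and the slide may have shown a different (e.g. staggered or domain-wall) variant:

```latex
(D\psi)(x) \;=\; \sum_{\mu=1}^{4}\Big[\,U_\mu(x)\,(1-\gamma_\mu)\,\psi(x+\hat\mu)
\;+\; U_\mu^\dagger(x-\hat\mu)\,(1+\gamma_\mu)\,\psi(x-\hat\mu)\,\Big]
```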
SciDAC-1 QCD API

Level 3: Optimized Dirac operators, inverters
Level 2: QDP (QCD Data Parallel): lattice-wide operations, data shifts; exists in C/C++
Level 1: QMP (QCD Message Passing): implemented over MPI, native QCDOC, M-VIA GigE mesh
         QLA (QCD Linear Algebra): optimized for P4 and QCDOC
         QIO: binary/XML metadata files (ILDG collab)
Data Parallel QDP/C, C++ API
• Hides architecture and layout
• Operates on lattice fields across sites
• Linear algebra tailored for QCD
• Shifts and permutation maps across sites
• Reductions
• Subsets
• Entry/exit: attach to existing codes
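Two of these operations, shifts and reductions, can be illustrated with a scalar 1-D toy model. This is only an analogy for QDP's behavior (QDP itself handles 4-D layouts, subsets, and MPI boundary exchange under the hood):

```cpp
#include <cstddef>
#include <vector>

// Nearest-neighbor shift on a 1-D periodic lattice: every site receives
// the value of its +1 neighbor, the scalar analogue of a lattice-wide
// data shift across sites.
std::vector<double> shift_forward(const std::vector<double>& f) {
    const std::size_t n = f.size();
    std::vector<double> g(n);
    for (std::size_t x = 0; x < n; ++x)
        g[x] = f[(x + 1) % n];  // periodic wrap-around
    return g;
}

// Global reduction: sum a field over all sites, the pattern behind
// norms and inner products in the data-parallel API.
double sum_over_sites(const std::vector<double>& f) {
    double s = 0.0;
    for (double v : f) s += v;
    return s;
}
```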
Example of QDP++ Expression

Typical for the Dirac operator. The QDP/C++ code uses the Portable Expression Template Engine (PETE): temporaries are eliminated and expressions are optimized.
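A minimal self-contained sketch of the expression-template idea behind PETE (not PETE's actual implementation): the right-hand side of a field assignment like `r = a + b + c` is captured as a tree of lightweight nodes, and a single fused loop evaluates it on assignment, so no temporary lattice-sized fields are ever created.

```cpp
#include <cstddef>
#include <vector>

struct Field {
    std::vector<double> d;
    explicit Field(std::size_t n, double v = 0.0) : d(n, v) {}
    double operator[](std::size_t i) const { return d[i]; }

    template <class Expr>
    Field& operator=(const Expr& e) {  // one loop, no temporaries
        for (std::size_t i = 0; i < d.size(); ++i) d[i] = e[i];
        return *this;
    }
};

// Expression node: holds references, computes element-wise on demand.
template <class L, class R>
struct Sum {
    const L& l; const R& r;
    double operator[](std::size_t i) const { return l[i] + r[i]; }
};

Sum<Field, Field> operator+(const Field& a, const Field& b) { return {a, b}; }

template <class L, class R>
Sum<Sum<L, R>, Field> operator+(const Sum<L, R>& a, const Field& b) { return {a, b}; }
```

With this machinery, `r = a + b + c` compiles into a single site loop computing `a[i] + b[i] + c[i]`, which is the optimization PETE performs for full QCD expressions.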
SciDAC-2 QCD API

Application codes: MILC / CPS / Chroma / roll your own
Level 4: Workflow and data analysis tools; QCD Physics Toolbox (shared algorithms, building blocks, visualization, performance tools); uniform user environment (runtime, accounting, grid)
Level 3: QOP (optimized in asm): Dirac operator, inverters, forces, etc.
Level 2: QDP (QCD Data Parallel): lattice-wide operations, data shifts
Level 1: QMP (QCD Message Passing), QLA (QCD Linear Algebra), QIO (binary/XML files & ILDG), QMC (QCD Multi-core interface)

(SciDAC-1/SciDAC-2 components were shown in gold/blue on the original slide, alongside the PERI and TOPS SciDAC centers.)
[Chart: Level 3 Domain Wall CG inverter performance, Mflops/node vs. local lattice size (up to ~30,000 sites), comparing JLab 3G and 4G clusters at Level II and Level III against FNAL Myrinet at Level III, on up to 32 nodes. Ls = 16; "4G" is a P4 at 2.8 GHz with an 800 MHz FSB.]
Asqtad Inverter on Kaon cluster @ FNAL

[Chart: Mflop/s per core vs. L, comparing MILC C code against SciDAC/QDP on L^4 sub-volumes for 16- and 64-core partitions of Kaon; curves: SciDAC 16, SciDAC 64, non-SciDAC 16, non-SciDAC 64.]
Level 3 on QCDOC

[Chart: Asqtad CG on L^4 subvolumes, Mflop/s vs. L for L = 3, 4, 6, 8, 10.]

Asqtad RHMC kernels: 24^3 x 32 lattice with subvolumes 6^3 x 18
DW RHMC kernels: 32^3 x 64 x 16 lattice with subvolumes 4^3 x 8 x 16
New SciDAC-2 Goals: Building on SciDAC-1

Common Runtime Environment
1. File transfer, batch scripts, compile targets
2. Practical 3-laboratory "metafacility"

Fuller use of the API in application code
1. Integrate QDP into MILC & QMP into CPS
2. Universal use of QIO, file formats, QLA, etc.
3. Level 3 interface standards

Porting the API to INCITE platforms
1. BG/L & BG/P: QMP and QLA using XLC & Perl script
2. Cray XT4 & Opteron clusters

Workflow and Cluster Reliability
1. Automated campaign to merge lattices and propagators and extract physics (FNAL & Illinois Institute of Technology): http://lqcd.fnal.gov/workflow/WorkflowProject.html
2. Cluster reliability (FNAL & Vanderbilt)

Tool Box: shared algorithms / building blocks
1. RHMC, eigenvector solvers, etc.
2. Visualization and performance analysis (DePaul & PERC)
3. Multi-scale algorithms (QCD/TOPS Collaboration): http://www.yale.edu/QCDNA/

Exploitation of Multi-core
1. Multi-core, not Hertz, is the new paradigm
2. Plans for a QMC API (JLab & FNAL & PERC)

See the SciDAC-2 kickoff workshop, Oct 27-28, 2006: http://super.bu.edu/~brower/workshop
QMC – QCD Multi-Threading

• General evaluation
  – OpenMP vs. explicit thread library (Chen)
  – An explicit thread library can do better than OpenMP, but OpenMP performance is compiler dependent
• Simple threading API: QMC, based on the older smp_lib (Pochinsky)
  – Uses pthreads; investigating barrier synchronization algorithms
• Evaluate threads for SSE-Dslash
• Consider a threaded version of QMP (Fowler and Porterfield at RENCI)
[Diagram: fork-join execution model. Serial code forks threads over blocks of parallel sites (sites 0..7, sites 8..15, remaining threads idle), then a finalize / thread join returns to serial code.]
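This fork-join pattern over site blocks can be sketched as follows. This is an illustration of the execution model only, not the QMC API itself; std::thread is used here for a self-contained example, whereas QMC builds on pthreads as smp_lib did.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Serial control code forks worker threads, each sweeping a contiguous
// block of lattice sites, then joins them (a barrier) before the next
// serial section.
template <class SiteKernel>
void parallel_over_sites(std::vector<double>& field, int nthreads,
                         SiteKernel kernel) {
    std::vector<std::thread> pool;
    const std::size_t per = field.size() / nthreads;
    for (int t = 0; t < nthreads; ++t) {
        const std::size_t lo = t * per;
        const std::size_t hi = (t == nthreads - 1) ? field.size() : lo + per;
        // Fork: each thread owns sites [lo, hi).
        pool.emplace_back([&field, lo, hi, &kernel] {
            for (std::size_t i = lo; i < hi; ++i) kernel(field[i]);
        });
    }
    // Join: barrier back to serial code.
    for (auto& th : pool) th.join();
}
```

A barrier like this join is exactly the synchronization point whose cost the QMC evaluation (pthreads vs. OpenMP) set out to measure.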
Conclusions

Progress has been made using a common QCD API & libraries for communication, linear algebra, I/O, optimized inverters, etc.

But full implementation, optimization, documentation & maintenance of shared code is a continuing challenge, and there is much work to do to keep up with changing hardware and algorithms.

Still, NEW users (young and old) with no prior lattice experience have initiated new lattice QCD research using SciDAC software!

The bottom line: PHYSICS is being well served.