CCS machine development plan for post-petascale computing
and the Japanese next-generation supercomputer project
Mitsuhisa Sato
CCS, University of Tsukuba
2010.2.22
[Figure: node block diagram showing groups of cores with their memory, IO interfaces, and the network (DDR Infiniband x 4)]
PACS-CS spec:
• #node: 2560 nodes (Intel Xeon 2.8GHz, single core/node)
• peak performance: 14.34 TF
• memory: 5 TB
• network: 250MB/s/link x 3 (3D-HXB by GbE)
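The quoted figures for the 2560-node system can be cross-checked with a little arithmetic. The sketch below derives the floating-point operations per clock cycle implied by the peak figure, and the aggregate link bandwidth per node. All inputs are the slide's numbers; the function and variable names are illustrative only.

```python
# Figures quoted on the slide for the 2560-node system.
NODES = 2560
CLOCK_HZ = 2.8e9        # Intel Xeon 2.8 GHz, single core per node
PEAK_FLOPS = 14.34e12   # 14.34 TF
LINK_MB_S = 250         # bandwidth per link
LINKS_PER_NODE = 3      # "x 3" in the network line (3D-HXB)


def flops_per_cycle(peak_flops, nodes, clock_hz):
    """Floating-point operations per clock cycle per node implied by the peak."""
    return peak_flops / (nodes * clock_hz)


def node_link_bandwidth_mb_s(link_mb_s, links):
    """Aggregate link bandwidth per node in MB/s."""
    return link_mb_s * links


implied_fpc = flops_per_cycle(PEAK_FLOPS, NODES, CLOCK_HZ)
node_bw = node_link_bandwidth_mb_s(LINK_MB_S, LINKS_PER_NODE)

print(f"implied flops per cycle per node: {implied_fpc:.2f}")
print(f"aggregate link bandwidth per node: {node_bw} MB/s")
```

The implied value is very close to 2 operations per cycle, which is consistent with the peak being computed as nodes x clock x 2.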
[Figure: full bi-sectional FAT-tree network. Nodes connect to L1 switches, which connect to L2 and L3 switches. Legend: node with 4 links; 24-port IB switch.]
Item              #
Level 1 switch    232
Level 2 switch    240
Level 3 switch    144
Total switch      616
Node              696
Detail View for one network unit
x 20 network units
※ The total of 696 nodes includes 4 online spare nodes.
Designed by T2K Open Supercomputer Alliance (U. Tokyo and Kyoto U)
Spec:
• 648 nodes (quad Opteron, 4 sockets/node)
• 10000 cores
• Peak performance 95.4TF
• Total memory 20TB
• Total disk capacity 800TB
(20th in Top 500, June 2008)
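As a rough consistency check on the T2K spec, the core count and the per-core peak implied by the quoted numbers can be computed directly. This is a sketch under one stated assumption: "quad Opteron" is read here as four cores per socket, which the slide does not spell out.

```python
# Figures quoted on the slide for the T2K system.
NODES = 648
SOCKETS_PER_NODE = 4
PEAK_FLOPS = 95.4e12     # 95.4 TF

# Assumption (not stated explicitly on the slide): "quad Opteron"
# means a quad-core processor, i.e. 4 cores per socket.
CORES_PER_SOCKET = 4

cores = NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET
gflops_per_core = PEAK_FLOPS / cores / 1e9

print(f"total cores: {cores}")
print(f"implied peak per core: {gflops_per_core:.2f} GFLOPS")
```

Under that assumption the total comes to 10368 cores, which matches the rounded "10000 cores" on the slide.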
A special-purpose system for astrophysics simulation by hybrid computation of radiation and N-body dynamics.
Each node is equipped with GRAPE-6, an accelerator specialized for N-body gravity calculation.
Performance of 256 nodes: cluster 3.5 TFLOPS + GRAPE-6 35 TFLOPS
PACS-CS (2006 ~)
FIRST with GRAPE-6 (2007 ~)
T2K-tsukuba (2008 ~)
Computing resources in CCS
System installation and future plans
[Figure: installation timeline, 2004 to 2012 (H16 to H24). Systems shown: CP-PACS, PACS-CS, FIRST, T2K, FCS-IV, FCS-V, HA-PACS (planned), NGS (10PF), and the next system to T2K. Other labels: "2011-2013", "2013", "VPP suspended".]
FCS: Front-end system
Issues for Post-peta scale systems (not exa?)
• A system to enable strong scaling
  • The current petascale systems were enabled by weak scaling
  • We need more powerful nodes and network
  • GPGPU is one solution
• More specialized architecture
  • We need a sharp science target
  • Not all applications can use it
• More difficult to program
  • Needs support from the CS side
  • Collaboration between computer science and computational science
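The distinction between strong and weak scaling can be illustrated with the two textbook speedup models, Amdahl's law (fixed problem size) and Gustafson's law (problem size grows with the node count). These models are not on the slide; they are brought in here only as a minimal sketch, and the serial fraction of 0.001 is a hypothetical value.

```python
def strong_scaling_speedup(serial_fraction, n):
    """Amdahl's law: speedup on n nodes for a fixed total problem size."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)


def weak_scaling_speedup(serial_fraction, n):
    """Gustafson's law: scaled speedup when the problem grows with n."""
    return serial_fraction + (1.0 - serial_fraction) * n


SERIAL_FRACTION = 0.001  # hypothetical value, for illustration only

for n in (10, 1000, 100000):
    strong = strong_scaling_speedup(SERIAL_FRACTION, n)
    weak = weak_scaling_speedup(SERIAL_FRACTION, n)
    print(f"n={n:>6}: strong {strong:10.1f}   weak {weak:10.1f}")
```

With a fixed problem size the speedup saturates near 1 / serial fraction no matter how many nodes are added, which is one way to read the call for more powerful nodes and network rather than simply more nodes.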
[Figure: peak flops (1 GFlops = 10^9 to 1 EFlops = 10^18) versus #node (1 to 10^6). Labels: PACS-CS (14TF), NGS > 10PF, target of HA-PACS, Exaflops system, limitation of #node, CCS's mission.]
HA-PACS: Highly Accelerated Parallel Advanced system for Computational Sciences (planned)
Objective: to investigate acceleration technologies for post-petascale computing and the associated software, algorithms, and computational science applications, and to demonstrate them by building a prototype system
• Design and deploy a GPGPU-based cluster system
• Research on programming models, languages, and environments for parallel systems with accelerators
• Design of algorithms and applications for parallel systems with accelerators
• Research on architectures for parallel systems with accelerators
[Figure: example cluster configuration. Each node has two 8-core CPUs, two GPGPUs, and 2 Infiniband QDR ports. Node groups (labeled "18 node" and "12 node") sit under pairs of IB switches; 18 groups are connected by a 2-stage Fat-Tree (Infiniband QDR).]
• Node configuration: 8-core CPU x 2 + GPU x 2
• Network configuration: Infiniband QDR x 2 / node, full-bisection B/W Fat-Tree
• Peak performance: 2 TFLOPS/node x 324 = 648 TFLOPS
Total #node = 18 x 18 = 324
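The system totals for this example configuration follow directly from the per-node figures. A minimal sketch of the arithmetic, using only the slide's numbers (the variable names are illustrative):

```python
# Example configuration from the slide.
GROUPS = 18              # number of node groups
NODES_PER_GROUP = 18
TFLOPS_PER_NODE = 2      # 8-core CPU x 2 + GPU x 2
IB_PORTS_PER_NODE = 2    # Infiniband QDR x 2 per node

total_nodes = GROUPS * NODES_PER_GROUP
peak_tflops = total_nodes * TFLOPS_PER_NODE
total_ib_ports = total_nodes * IB_PORTS_PER_NODE

print(f"total nodes: {total_nodes}")
print(f"peak performance: {peak_tflops} TFLOPS")
print(f"Infiniband QDR ports on nodes: {total_ib_ports}")
```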
examples
HA-PACS/NG powered by PEARL Link
[Figure: each node contains a CPU, a GPU, and a PEACH chip. PEACH connects over PCIe to the neighbor nodes and to an external PCI-e switch through PEARL Link, giving a direct connection between GPUs. Groups of 12 nodes are also connected through IB switches (Infiniband QDR).]
PEARL: PCI-Express Adaptive and Reliable Link
• Use PCI-Express as a high-speed link
• Connect CPU and devices including GPGPU through a router chip, PEACH (PCI-Express Adaptive Communication Hub)
Strategic target computational sciences of HA-PACS
① Biophysics: high-performance QM/MM hybrid simulation for the mechanisms of high-efficiency enzymatic reactions and the electronic and 3D structures of biomacromolecules
Speedup of QM is key for these simulations
② Astrophysics: full hydrodynamics and radiative-transfer simulation of the Universe and the formation of astronomical objects
Full 6-dimensional simulation is required
③ Particle physics: full lattice QCD simulation
The Japanese next-generation supercomputer project
Background: Japanese government plan
• The 3rd Science and Technology Basic Plan (FY2006-FY2010)
  • “Next-generation super computing technology” is selected as one of the key technologies of national importance
  • Development and installation of the advanced high performance supercomputer system (10 petaflops) → the Next-Generation Supercomputer
  • Development of application software
  • Establishment of “Advanced Computational Science and Technology Center” (tentative name)
• The 4th Science and Technology Basic Plan (FY2011-FY2015) (now under discussion)
  • Exaflops-class HPC technology
  • New chip device, software, hardware…
After the election of the House of Representatives last summer,….
In November of last year, the new governing party decided to freeze the development plan at the screening of government projects!
In January of this year, the cabinet decided to resume the supercomputer project.
The System Overview of NGS
• Ultra high-speed, highly reliable CPU
  • Advanced 45nm process technology
  • 8 cores/CPU, 128 GFLOPS
  • Error recovery (ECC, instruction retry, etc.)
• High-performance, highly reliable network
  • Direct interconnection network by multi-dimensional mesh/torus network
  • Expandability and reliability
• System software
  • Linux OS
  • Fortran, C, and MPI libraries
  • Distributed parallel file system
【 Massively Parallel/Distributed Memory Supercomputer 】
Logical 3-dimensional torus network
Courtesy of FUJITSU
Configuration of Compute Nodes
• Number of nodes > 80k
• Number of CPUs > 80k
• Number of cores > 640k
• Peak performance > 10 PFLOPS
• Total memory capacity > 1 PB (16 GB/node)
• Multi-dimensional mesh/torus network
• Peak bandwidth: 5 GB/s x 2 for each direction of the logical 3-dimensional torus network
• Peak bi-sectional bandwidth: > 30 TB/s
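The lower bounds quoted for the whole system follow from the per-node figures. The sketch below takes the lower bound of 80k nodes, one 8-core 128 GFLOPS CPU per node, and 16 GB of memory per node, all from the slides; decimal prefixes (1 PB = 10^6 GB) are an assumption made here for simplicity.

```python
# Per-node figures from the slides; NODES is the quoted lower bound.
NODES = 80_000
CPUS_PER_NODE = 1        # nodes > 80k and CPUs > 80k
CORES_PER_CPU = 8
GFLOPS_PER_CPU = 128
MEMORY_GB_PER_NODE = 16

total_cores = NODES * CPUS_PER_NODE * CORES_PER_CPU
gflops_per_core = GFLOPS_PER_CPU / CORES_PER_CPU

# Assumption: decimal prefixes, so 1 PFLOPS = 1e6 GFLOPS and 1 PB = 1e6 GB.
peak_pflops = NODES * CPUS_PER_NODE * GFLOPS_PER_CPU / 1e6
total_memory_pb = NODES * MEMORY_GB_PER_NODE / 1e6

print(f"cores: {total_cores}")
print(f"peak per core: {gflops_per_core} GFLOPS")
print(f"peak performance: {peak_pflops} PFLOPS")
print(f"total memory: {total_memory_pb} PB")
```

At exactly 80k nodes this gives 640k cores, 10.24 PFLOPS, and 1.28 PB, consistent with the "greater than" bounds on the slide.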
[Figure: node diagram. CPU: 128 GFLOPS (8 cores); each core has SIMD (4 FMA) and delivers 16 GFLOPS; L2$: 5MB; MEM: 16GB; 64GB/s. Each node links to its neighbors at 5GB/s x 2 along the x, y, and z axes of the logical 3-dimensional torus network for programming.]
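From the programmer's point of view, a logical 3-dimensional torus means every node has six neighbors, with coordinates wrapping around at the edges. A minimal sketch of that addressing follows; the function name and the 4 x 4 x 4 shape are hypothetical, and the code models only the logical view, not the physical multi-dimensional mesh/torus network.

```python
def torus_neighbors(coord, dims):
    """Return the six neighbors of `coord` in a 3D torus of shape `dims`.

    Coordinates wrap around, so the node at the edge of an axis is
    adjacent to the node at the opposite edge of that axis.
    """
    x, y, z = coord
    nx, ny, nz = dims
    return [
        ((x + 1) % nx, y, z), ((x - 1) % nx, y, z),  # +x, -x
        (x, (y + 1) % ny, z), (x, (y - 1) % ny, z),  # +y, -y
        (x, y, (z + 1) % nz), (x, y, (z - 1) % nz),  # +z, -z
    ]


# Hypothetical 4 x 4 x 4 torus, for illustration only.
for neighbor in torus_neighbors((0, 0, 0), (4, 4, 4)):
    print(neighbor)
```

The wraparound is what distinguishes a torus from a mesh: in a mesh, the corner node (0, 0, 0) would have only three neighbors.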
The Next-Generation Supercomputer Project
[Figure: project schedule, FY2006 to FY2012, ending with "open to users".
System: conceptual design; detailed design; prototype and evaluation; production, installation, and adjustment; tuning and improvement.
Applications (Next-Generation Integrated Nanoscience Simulation, Next-Generation Integrated Life Simulation): development, production, and evaluation; verification.
Buildings (computer building, research building): design; construction.]
The categories of users of NGS
1. Strategic Use: MEXT selected 5 strategic fields from a national viewpoint.
  • Field 1: Life science/Drug manufacture
  • Field 2: New material/energy creation
  • Field 3: Global change prediction for disaster prevention/mitigation
  • Field 4: Mono-zukuri (Manufacturing technology)
  • Field 5: The origin of matters and the universe
2. General Use: use for the needs of researchers in many science and technology fields, including industrial and educational use
Organization for NGS
• “Advanced Computational Science and Technology Center” (ACSTC) (tentative name) will be organized at NGS.
• MEXT selects 5 core organizations that lead research activities in the 5 strategic fields
ACSTC → Core research center
• Conducts advanced and basic R&D in computational science
• Leads cooperation among strategic fields
• Provides key knowledge to the 5 organizations in the strategic fields and to other research organizations
5 core organizations → Research center in each field
• Conducts advanced R&D in each field
• CCS was selected as a core organization for "Field 5: The origin of matters and the universe"
  • Particle physics, astrophysics, nuclear physics
  • Collaboration with KEK and National Observatory