CCS machine development plan for post-petascale computing and the Japanese Next-Generation Supercomputer project

Mitsuhisa Sato
CCS, University of Tsukuba

2010.2.22

[Figure: node diagram showing cores, IO interfaces, and the network connection (DDR Infiniband x 4)]

• Nodes: 2560 (Intel Xeon 2.8 GHz, single core per node)
• Peak performance: 14.34 TF
• Memory: 5 TB
• Network: 250 MB/s/link x 3 (3D-HXB by GbE)
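The peak figure for this 2560-node cluster can be reproduced from the node count and clock rate. The sketch below is only a consistency check: the value of 2 double-precision floating-point operations per cycle per core is an assumption that does not appear in the slides, and the function name `peak_tflops` is illustrative.

```python
# Consistency check for the quoted peak performance.
# Assumption (not stated in the slides): each core completes
# 2 double-precision floating-point operations per clock cycle.

NODES = 2560             # from the slide
CORES_PER_NODE = 1       # "single core /node"
CLOCK_GHZ = 2.8          # from the slide
FLOPS_PER_CYCLE = 2      # assumed


def peak_tflops(nodes, cores_per_node, clock_ghz, flops_per_cycle):
    """Return the theoretical peak in TFLOPS."""
    gflops = nodes * cores_per_node * clock_ghz * flops_per_cycle
    return gflops / 1000.0


if __name__ == "__main__":
    peak = peak_tflops(NODES, CORES_PER_NODE, CLOCK_GHZ, FLOPS_PER_CYCLE)
    print(f"peak = {peak:.2f} TF")
```

Under that assumption the product matches the 14.34 TF quoted on the slide to two decimal places.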

[Figure: full bi-sectional fat-tree network with three switch levels (L1, L2, L3 SWs); detail view for one network unit, x 20 network units. Legend: nodes with 4 links; 24-port IB switches.]

• Nodes: 696
• Total switches: 616
• Level 3 switches: 144
• Level 2 switches: 240
• Level 1 switches: 232

※ The total of 696 nodes includes 4 online spare nodes.

Designed by T2K Open Supercomputer Alliance (U. Tokyo and Kyoto U)

Spec:
• 648 nodes (quad Opteron, 4 sockets/node)
• 10000 cores
• Peak performance: 95.4 TF
• Total memory: 20 TB
• Total disk capacity: 800 TB
(20th in the TOP500, June 2008)
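As a rough cross-check of the T2K-Tsukuba numbers, the core count and peak can be recomputed from the node configuration. This is a sketch under stated assumptions: "quad Opteron" is read as quad-core sockets, and the 2.3 GHz clock and 4 floating-point operations per cycle per core are assumed values that are not given in the slides.

```python
# Cross-check of the T2K-Tsukuba core count and peak performance.
# From the slide: 648 nodes, 4 sockets per node.
# Assumptions (not in the slides): 4 cores per socket, 2.3 GHz clock,
# 4 double-precision floating-point operations per cycle per core.

NODES = 648
SOCKETS_PER_NODE = 4
CORES_PER_SOCKET = 4      # assumed reading of "quad Opteron"
CLOCK_GHZ = 2.3           # assumed
FLOPS_PER_CYCLE = 4       # assumed


def total_cores(nodes, sockets_per_node, cores_per_socket):
    """Return the total number of cores in the system."""
    return nodes * sockets_per_node * cores_per_socket


def peak_tflops(cores, clock_ghz, flops_per_cycle):
    """Return the theoretical peak in TFLOPS."""
    return cores * clock_ghz * flops_per_cycle / 1000.0


if __name__ == "__main__":
    cores = total_cores(NODES, SOCKETS_PER_NODE, CORES_PER_SOCKET)
    print(f"cores = {cores}")
    print(f"peak  = {peak_tflops(cores, CLOCK_GHZ, FLOPS_PER_CYCLE):.1f} TF")
```

With these assumptions the core count comes out slightly above the rounded "10000 cores" on the slide, and the peak agrees with the quoted 95.4 TF to one decimal place.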

A special-purpose system for astrophysics simulation by hybrid computation of radiation and N-body dynamics.

Each node is equipped with GRAPE-6, an accelerator specialized for N-body gravity calculation.

256 nodes; performance: cluster 3.5 TFLOPS + GRAPE-6 35 TFLOPS

[Figure labels: PACS-CS (2006～), FIRST (2007～), T2K-Tsukuba (2008～); GRAPE-6]

Computing resources in CCS


System installation and future plans

[Figure: timeline of system installation, 2004 to 2012 (H16 to H24)]

Systems shown: CP-PACS; VPP (suspended); FCS-IV (FCS: front-end system); PACS-CS; HA-PACS (planned); the next system to T2K; NGS (10 PF). The label "(planned) 2011-2013" also appears in the figure.


Issues for post-petascale systems (not exa?)

• A system to enable strong scaling: the current petascale systems are enabled by weak scaling. We need more powerful nodes and networks.
  • GPGPU is one solution
• More specialized architecture: we need a sharp science target. Not all applications can use it.
• More difficult to program: needs support from the CS side. Collaboration between computer science and computational science.
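The contrast between strong and weak scaling can be made concrete with the two textbook speedup models, Amdahl's law (fixed problem size) and Gustafson's law (problem size grows with the node count). The sketch below is illustrative only: the serial fraction of 0.001 and the node counts are hypothetical values, not measurements from any CCS system.

```python
# Strong scaling vs. weak scaling with the standard textbook models.
# The serial fraction and node counts below are hypothetical.


def amdahl_speedup(serial_fraction, n):
    """Strong scaling: fixed total problem size on n nodes (Amdahl's law)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)


def gustafson_speedup(serial_fraction, n):
    """Weak scaling: problem size grows in proportion to n (Gustafson's law)."""
    return serial_fraction + (1.0 - serial_fraction) * n


if __name__ == "__main__":
    s = 0.001  # hypothetical non-parallelizable fraction
    for n in (10, 1_000, 100_000):
        print(f"n={n:>7}  strong={amdahl_speedup(s, n):10.1f}  "
              f"weak={gustafson_speedup(s, n):10.1f}")
```

In the strong-scaling model the speedup saturates near 1/s however many nodes are added, which is one way to read the slide's call for more powerful nodes and networks rather than simply more nodes.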

[Figure: peak flops (1 GFlops = 10^9, 1 TFlops = 10^12, 1 PFlops = 10^15, 1 EFlops = 10^18) plotted against an axis running from 1 to 10^6. Marked: PACS-CS (14 TF), NGS (> 10 PF), target of HA-PACS, exaflops system, limitation of #node.]

CCS's mission

HA-PACS: Highly Accelerated Parallel Advanced system for Computational Sciences (planned)

Objective: to investigate acceleration technologies for post-petascale computing and the associated software, algorithms, and computational science applications, and to demonstrate them by building a prototype system

• Design and deploy a GPGPU-based cluster system
• Research on programming models, languages, and environments for parallel systems with accelerators
• Design of algorithms and applications for parallel systems with accelerators
• Research on architectures for parallel systems with accelerators

[Figure: HA-PACS configuration. Groups of nodes (labeled "18 node" and "12 node") attach to pairs of IB switches; 18 groups are connected by a 2-stage fat tree (Infiniband QDR). Each node has 8-core CPUs, GPGPUs, and Infiniband QDR x 2 ports.]

• Node configuration: 8-core CPU x 2 + GPU x 2
• Network configuration: Infiniband QDR x 2 / node, full-bisection bandwidth fat tree
• Peak performance: 2 TFLOPS/node x 324 = 648 TFLOPS

Total #node = 18 x 18 = 324
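The totals for this example configuration follow directly from the per-node figures. The short sketch below only restates that arithmetic; the variable names are illustrative and every input value is taken from the slide.

```python
# Totals for the example HA-PACS configuration, using only slide values.

GROUPS = 18              # groups in the 2-stage fat tree
NODES_PER_GROUP = 18     # nodes per group
TFLOPS_PER_NODE = 2      # peak per node (2 CPUs + 2 GPUs)
IB_PORTS_PER_NODE = 2    # Infiniband QDR ports per node

total_nodes = GROUPS * NODES_PER_GROUP
peak_tflops = total_nodes * TFLOPS_PER_NODE
total_ib_ports = total_nodes * IB_PORTS_PER_NODE

if __name__ == "__main__":
    print(f"nodes    = {total_nodes}")
    print(f"peak     = {peak_tflops} TFLOPS")
    print(f"IB ports = {total_ib_ports}")
```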

examples


HA-PACS/NG powered by PEARL Link

[Figure: groups of 12 nodes attached to IB switches (Infiniband QDR). Within each node, CPUs and GPUs connect over PCIe through PEACH chips; each PEACH has links to neighbor nodes (PEARL Link) and to an external PCI-e switch, giving a direct connection between GPUs.]

PEARL: PCI-Express Adaptive and Reliable Link
• Uses PCI-Express as a high-speed link
• Connects CPUs and devices, including GPGPUs, through a router chip, PEACH (PCI-Express Adaptive Communication Hub)


Strategic target computational sciences of HA-PACS

① Biophysics: high-performance QM/MM hybrid simulation of the mechanisms of high-efficiency enzymatic reactions and of the electronic and 3D structures of biomacromolecules

Speedup of QM is key for these simulations

② Astrophysics: full hydrodynamics and radiative-transfer simulation of the Universe and the formation of astronomical objects

Full 6-dimensional simulation is required

③ Particle physics: full lattice QCD simulation

The Japanese Next-Generation Supercomputer project

Background: Japanese government plan

• The 3rd Science and Technology Basic Plan (FY2006-FY2010)
  • "Next-generation supercomputing technology" was selected as one of the key technologies of national importance
  • Development and installation of the advanced high-performance supercomputer system (10 petaflops) → the Next-Generation Supercomputer
  • Development of application software
  • Establishment of the "Advanced Computational Science and Technology Center" (tentative name)
• The 4th Science and Technology Basic Plan (FY2011-FY2015) (now under discussion)
  • Exaflops-class HPC technology
  • New chip devices, software, hardware…

After the election of the House of Representatives last summer, …

In November of last year, the new governing party decided to freeze the development plan at the screening of government projects!

In January of this year, the cabinet decided to resume the supercomputer project.


The System Overview of NGS

• Ultra-high-speed, highly reliable CPU
  • Advanced 45 nm process technology
  • 8 cores/CPU, 128 GFLOPS
  • Error recovery (ECC, instruction retry, etc.)
• High-performance, highly reliable network
  • Direct interconnection network by multi-dimensional mesh/torus network
  • Expandability and reliability
• System software
  • Linux OS
  • Fortran, C, and MPI libraries
  • Distributed parallel file system

【 Massively Parallel/Distributed Memory Supercomputer 】

Logical 3-dimensional torus network

Courtesy of FUJITSU


Configuration of compute nodes
• Number of nodes: > 80k
• Number of CPUs: > 80k
• Number of cores: > 640k
• Peak performance: > 10 PFLOPS
• Total memory capacity: > 1 PB (16 GB/node)

Multi-dimensional mesh/torus network
• Peak bandwidth: 5 GB/s x 2 for each direction of the logical 3-dimensional torus network
• Peak bi-sectional bandwidth: > 30 TB/s
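The system-level lower bounds are consistent with the per-CPU figures given in the system overview (8 cores and 128 GFLOPS per CPU, 16 GB per node). The sketch below multiplies them out for the minimum node count of 80k, assuming one CPU per node (as the equal node and CPU counts suggest) and decimal units (1 PB = 10^6 GB); the variable names are illustrative.

```python
# Lower-bound totals for NGS from the per-node figures on the slides.
# Assumptions: one CPU per node, decimal units (1 PB = 10**6 GB).

MIN_NODES = 80_000        # "> 80k"
CORES_PER_CPU = 8
GFLOPS_PER_CPU = 128
MEMORY_GB_PER_NODE = 16

min_cores = MIN_NODES * CORES_PER_CPU
min_peak_pflops = MIN_NODES * GFLOPS_PER_CPU / 1e6
min_memory_pb = MIN_NODES * MEMORY_GB_PER_NODE / 1e6
gflops_per_core = GFLOPS_PER_CPU / CORES_PER_CPU

if __name__ == "__main__":
    print(f"cores      >= {min_cores}")
    print(f"peak       >= {min_peak_pflops:.2f} PFLOPS")
    print(f"memory     >= {min_memory_pb:.2f} PB")
    print(f"per core    = {gflops_per_core:.0f} GFLOPS")
```

Each product lands at or just above the corresponding "greater than" figure on the slide (640k cores, 10 PFLOPS, 1 PB).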

[Figure: compute node. CPU: 128 GFLOPS (8 cores); each core: SIMD (4 FMA), 16 GFLOPS; L2 cache: 5 MB; memory: 16 GB, 64 GB/s; links of 5 GB/s x 2 on the logical 3-dimensional torus network for programming.]
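From the programmer's side, a logical 3-dimensional torus means each node has six neighbors, with coordinates wrapping around at the edges. The sketch below illustrates that addressing scheme in general terms; the function names and the 4 x 4 x 4 shape are hypothetical and say nothing about the actual NGS dimensions or its system software.

```python
# Neighbor addressing and hop distance in a generic 3-D torus.
# The torus shape used in the demo is hypothetical.


def torus_neighbors(coord, dims):
    """Return the six neighbors of coord in a 3-D torus of shape dims."""
    x, y, z = coord
    nx, ny, nz = dims
    return [
        ((x + 1) % nx, y, z),
        ((x - 1) % nx, y, z),
        (x, (y + 1) % ny, z),
        (x, (y - 1) % ny, z),
        (x, y, (z + 1) % nz),
        (x, y, (z - 1) % nz),
    ]


def torus_distance(a, b, dims):
    """Return the minimum hop count between a and b, using wraparound links."""
    hops = 0
    for ai, bi, n in zip(a, b, dims):
        d = abs(ai - bi)
        hops += min(d, n - d)
    return hops


if __name__ == "__main__":
    shape = (4, 4, 4)  # hypothetical torus shape
    print(torus_neighbors((0, 0, 0), shape))
    print(torus_distance((0, 0, 0), (3, 2, 1), shape))
```

In MPI programs the same structure is usually expressed through a periodic Cartesian communicator rather than hand-written modular arithmetic.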


The Next-Generation Supercomputer Project

○ Schedule

[Figure: project schedule, FY2006 to FY2012. Rows: System; Next-Generation Integrated Nanoscience Simulation; Next-Generation Integrated Life Simulation; Computer building; Research building. Phases shown: conceptual design; detailed design; prototype and evaluation; production, installation, and adjustment; tuning and improvement; development, production, and evaluation; verification; design; construction; open to users.]


The categories of users of NGS

1. Strategic use: MEXT selected 5 strategic fields from a national viewpoint.
   Field 1: Life science / drug manufacture
   Field 2: New material / energy creation
   Field 3: Global change prediction for disaster prevention/mitigation
   Field 4: Mono-zukuri (manufacturing technology)
   Field 5: The origin of matter and the universe

2. General use: use for the needs of researchers in many science and technology fields, including industrial and educational use


Organization for NGS

• The "Advanced Computational Science and Technology Center" (ACSTC) (tentative name) will be organized at NGS.
• MEXT selects 5 core organizations that lead research activities in the 5 strategic fields.

ACSTC → Core research center
• Conducts advanced and basic R&D in computational science
• Leads cooperation among strategic fields
• Provides key knowledge to the 5 organizations in the strategic fields and to other research organizations

5 core organizations → Research center in each field
• Conducts advanced R&D in each field
• CCS was selected as a core organization for "Field 5: The origin of matter and the universe"
  • Particle physics, astrophysics, nuclear physics
  • Collaboration with KEK and the National Observatory
