All Rights Reserved. Copyright © 2000 Hitachi Europe Ltd. SR8000 Concept Tim Lanfear Hitachi Europe...

SR8000 Concept

Tim Lanfear

Hitachi Europe GmbH.

t-lanfear@hpcc.hitachi-eu.co.uk

SR8000 Model Range

System

Number of Nodes               4      8      16     32     64     128      256      512
Peak Performance (GFlops)
  SR8000                      32     64     128    256    512    1,024    -        -
  SR8000 Model E1             38.4   76.8   153.6  307.2  614.4  1,228.8  2,457.6  4,915.2
  SR8000 Model F1             48     96     192    384    768    1,536    3,072    6,144
Max. Memory Capacity (GB)
  SR8000                      32     64     128    256    512    1,024    -        -
  SR8000 Models E1, F1        64     128    256    512    1,024  2,048    4,096    8,192

Inter-node Network
  SR8000: One Dimensional, Two Dimensional or 3-D Crossbar, depending on the number of nodes
  SR8000 Models E1, F1: One Dimensional, Two Dimensional or Three Dimensional Crossbar, depending on the number of nodes

Inter-node Transfer Speed
  SR8000: 1 GB/s (single direction) x2
  SR8000 Model E1: 1.2 GB/s (single direction) x2
  SR8000 Model F1: 1 GB/s (single direction) x2

External Interfaces
  Ultra SCSI, Ethernet/Fast Ethernet, Gigabit Ethernet, ATM, HIPPI, Fibre Channel

Per node

                              SR8000       Model E1         Model F1
Peak Performance (GFlops)     8            9.6              12
Memory Capacity (GB)          2 / 4 / 8    2 / 4 / 8 / 16   2 / 4 / 8 / 16
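The system figures in the model range appear to be the per-node figures multiplied by the number of nodes. A minimal Python sketch of that arithmetic follows; the dictionary and function names are illustrative and not part of any Hitachi tool, and the per-node values are the ones listed for each model above.

```python
# Per-node peak performance in GFlops, as listed for each model.
NODE_PEAK_GFLOPS = {
    "SR8000": 8.0,
    "SR8000 Model E1": 9.6,
    "SR8000 Model F1": 12.0,
}

# Largest per-node memory option in GB, as listed for each model.
NODE_MAX_MEMORY_GB = {
    "SR8000": 8,
    "SR8000 Model E1": 16,
    "SR8000 Model F1": 16,
}


def system_peak_gflops(model, nodes):
    """Peak performance of a system built from `nodes` nodes."""
    return NODE_PEAK_GFLOPS[model] * nodes


def system_max_memory_gb(model, nodes):
    """Maximum memory of a system with the largest memory option per node."""
    return NODE_MAX_MEMORY_GB[model] * nodes
```

For example, `system_peak_gflops("SR8000 Model F1", 512)` reproduces the 6,144 GFlops entry and `system_max_memory_gb("SR8000", 128)` the 1,024 GB entry.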

SR8000 Appearance

Compact Model

Hardware

Model                         Model A     Model B     Model C
Peak Performance              4 GFlops    8 GFlops    12 GFlops
Memory Capacity               2GB / 4GB / 8GB
External Interfaces           Ultra SCSI, Fibre Channel, Ethernet/Fast Ethernet, Gigabit Ethernet, HIPPI, ATM
Number of I/O Interfaces      8 (maximum)
System Expandability          Available

Physical Planning

Input Current (Single Phase 200V-240V)
Power Consumption             3.0 kW, 3.3 kW
Noise Level
Dimension (mm) <W x D x H>    500 x 910 x 1500

Software

Operating System              HI-UX/MPP for SR8000
Operation Management          NQS, NFS, RealTime Monitor, ADSM (ADSTAR Distributed Storage Manager)
Languages                     FORTRAN77, FORTRAN90, Parallel FORTRAN, C, C++, Kuck and Associates C++
Development Support           Application Development Environment, Parallel Debugger
Matrix Calculation Library    MATRIX/MPP, MATRIX/MPP/SSS, MSL2
Graphics                      X Window System, OSF/Motif, GKS, PHIGS, PEXlib

Vector vs SMP vs MPP

Feature                     Vector    SMP    MPP
Single Node Performance
Scalability
Programming Effort
Development Cycle
Energy Requirements

System Architecture

[Figure: nodes attached to the cross-bar inter-node network. Labels: Node (PRN), Node (ION), CPU, Main Memory, Network Control; Ether, ATM, HIPPI and RAID Disk on the ION; System Control with Service Processor and Console]

Programming Models

Hardware          Programming Model                                    Example
Single CPU        Pseudo-vector processing                             Vector application
Single Node       Independent processing on each IP                    Compilation, parallel make
                  Message passing                                      MPP application
                  DO loop distribution with COMPAS                     Vector application
                  Parallel processing of independent blocks of code
Multiple Nodes    Message passing                                      MPP application
                  COMPAS and message passing                           Vector parallel application

CPU Architecture

• 16 bytes/cycle memory BW
• 128 Kbyte L1 cache
• Pre-fetch and pre-load instructions
• 160 f.p. registers
• 2 f.p. pipelines
• 4 flops/cycle

[Figure: data path of one CPU. Labels: Main Memory, Memory Switch, Pre-fetch, Pre-load, Cache, Floating Point Registers, Arithmetic Unit]

Slide Window Registers

• Registers for all instructions
• Registers for extended instructions only
• Fixed registers: 4, 8, 16, 32 (16 illustrated)
• Fixed + sliding = 128

Physical
  Sliding part: 0 to 127
  Global part: 128 to 159

Logical
  Base=2: 0 to 15, 32 to 125, 16 to 31, 126-7
  Base=4: 0 to 15, 32 to 123, 16 to 31, 124-7

Instruction Set Extensions

• Load and store with extended registers

• Floating point arithmetic with extended registers

• Slide window control

• Pre-fetch and pre-load

• Thread start-up and finish

• Predicate instructions

SR8000 Programming

Instruction Level Parallelism

(Pseudo-vector Processing: PVP)

Pre-fetch and Pre-load

• Pre-fetch: load cache line from memory to cache

• Pre-load: load one word from memory to register

• 16 streams

[Figure: data path of one CPU. Labels: Main Memory, Memory Switch, Pre-fetch, Pre-load, Cache, Floating Point Registers, Arithmetic Unit]

Pre-fetch

[Figure: timeline by iteration. Each pre-fetch (PF) is followed by its latency, then LD and use of the data]

• Pre-fetch 128 bytes to cache
• Follow by LD to register

Pre-load

[Figure: timeline by iteration. Each pre-load (PL) is followed by its latency, then use of the data]

• Pre-load 8 bytes to register
• LD not required
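The transfer sizes quoted for the two instructions (128 bytes per pre-fetch, 8 bytes per pre-load) suggest how the choice might depend on access stride. The Python sketch below only counts requests and bytes moved; it assumes 8-byte words and a line-aligned array, and the function names are hypothetical. It is a reading of the numbers on the slide, not a description of how the compiler chooses between the two.

```python
CACHE_LINE_BYTES = 128  # one pre-fetch moves 128 bytes to cache
WORD_BYTES = 8          # one pre-load moves 8 bytes to a register


def prefetch_requests(n, stride_words=1):
    """Distinct 128-byte lines touched by n words at the given stride.

    Assumes the array starts on a line boundary.
    """
    lines = {(i * stride_words * WORD_BYTES) // CACHE_LINE_BYTES for i in range(n)}
    return len(lines)


def preload_requests(n):
    """One pre-load per word, whatever the stride."""
    return n


def bytes_moved(n, stride_words=1):
    """Bytes transferred from memory under each scheme."""
    return {
        "pre-fetch": prefetch_requests(n, stride_words) * CACHE_LINE_BYTES,
        "pre-load": preload_requests(n) * WORD_BYTES,
    }
```

With unit stride, 1024 words need 64 pre-fetches and both schemes move the same 8192 bytes. With a stride of 16 words, every element sits on its own line, so pre-fetch moves 16 times more data than pre-load.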

Software Pipelining

[Figure: schedules of loop iterations I=1, I=2, I=3 in three cases: No SWPL, Infinite resource, Finite resource. Labels: Recurrence, Initiation interval]

Resources: registers, f.p. units, instruction issue, memory bandwidth etc
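The effect of the initiation interval can be put into numbers with the usual textbook model of software pipelining, sketched here in Python. The model and the function names are assumptions brought in for illustration, not taken from the slides: without pipelining each iteration waits for the previous one to finish, and with pipelining a new iteration starts every `ii` cycles, where `ii` is limited by both resources and recurrences.

```python
def cycles_without_swpl(n_iter, latency):
    """Iterations run back to back, each taking `latency` cycles."""
    return n_iter * latency


def initiation_interval(resource_ii, recurrence_ii):
    """A new iteration can start only as often as both limits allow."""
    return max(resource_ii, recurrence_ii)


def cycles_with_swpl(n_iter, latency, ii):
    """First iteration takes `latency` cycles; each later one adds `ii`."""
    if n_iter == 0:
        return 0
    return latency + (n_iter - 1) * ii
```

For a hypothetical loop body of 12 cycles and 100 iterations, the unpipelined loop takes 1200 cycles, while an initiation interval of 2 gives 210 cycles.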

Pseudo-vector Processing

A(:) = A(:) + N

[Figure: execution timelines compared. Vector: VLD, VADD. Pseudo-Vector: PF, latency, then repeated LD + ST]

Effect of PVP

[Figure: dot product S = SUM(A(1:N)*B(1:N)) plotted against loop length N (1 to 100000, log scale); curves: PVP off, PVP on]

SR8000 Programming

Multi-thread Parallelism

(Cooperative Microprocessors in a Single Address Space: COMPAS)

[Figure: levels of parallelism. Labels: Multi-dimensional Crossbar Network, process, COMPAS, thread; Pre-fetch, Load, Arithmetic, Store, Branch]

Automatic Parallel Processing

[Figure: nodes, each with several IPs sharing main memory. The IPs of a node run in parallel between COMPAS (Start Inst.) and COMPAS (End Inst.)]

IP: Instruction Processor

COMPAS: Co-operative Micro-Processors in single Address Space

[Figure: execution timeline on four IPs. One IP runs the Scalar Part and issues the Start Parallel Inst.; the other IPs are waiting for startup. All IPs then run the Loop Part up to the End Parallel Inst. Labels: Hardware Support, Software, Barrier Synchronization Mechanism]

IP: Instruction Processor
SC: Storage Controller
MS: Main Storage
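The start/end pattern can be imitated in software with threads and barriers. The Python sketch below is only an analogy: on the SR8000 the start-up and the barrier are supported by hardware, whereas here `threading.Barrier` stands in for both. The count of four IPs follows the figure, and all names are hypothetical.

```python
import threading

N_IPS = 4  # assumed number of instruction processors, as drawn in the figure


def run_fork_join(b, c):
    """Compute a[i] = b[i] + c[i] with one thread per IP."""
    n = len(b)
    a = [0.0] * n
    start = threading.Barrier(N_IPS)  # stands in for the start parallel instruction
    end = threading.Barrier(N_IPS)    # stands in for the end parallel instruction

    def loop_part(ip):
        # Each IP takes one contiguous block of iterations.
        lo = ip * n // N_IPS
        hi = (ip + 1) * n // N_IPS
        for i in range(lo, hi):
            a[i] = b[i] + c[i]

    def worker(ip):
        start.wait()   # waiting for startup
        loop_part(ip)
        end.wait()     # barrier synchronisation at the end of the loop

    workers = [threading.Thread(target=worker, args=(ip,)) for ip in range(1, N_IPS)]
    for w in workers:
        w.start()

    # IP 0 would run the scalar part here, then starts the parallel loop.
    start.wait()
    loop_part(0)
    end.wait()

    for w in workers:
        w.join()
    return a
```

Calling `run_fork_join([1, 2, 3, 4, 5], [10, 20, 30, 40, 50])` returns the element-wise sums; the scalar code after the loop sees all of them because every thread has passed the end barrier.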

Loop Parallelisation

DO i=1,N
  A(i)=B(i)+C(i)
ENDDO

[fork]
DO i=start,end
  A(i)=B(i)+C(i)
ENDDO
[join]

i loop parallelisation
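Each thread in the forked loop runs its own `start` to `end` range. How the SR8000 compiler actually chooses those bounds is not stated here; the Python sketch below assumes a plain block distribution with 1-based inclusive bounds, and `block_bounds` is a hypothetical helper.

```python
def block_bounds(n, n_threads, thread):
    """Return the 1-based inclusive (start, end) range for one thread.

    `thread` is numbered from 0. Any remainder iterations go to the first
    threads, one each. An empty range is returned as start > end, which a
    Fortran DO loop executes zero times.
    """
    base, extra = divmod(n, n_threads)
    start = thread * base + min(thread, extra) + 1
    end = start + base - 1 + (1 if thread < extra else 0)
    return start, end
```

For N = 10 on four threads this gives the ranges 1 to 3, 4 to 6, 7 to 8 and 9 to 10, which together cover every iteration exactly once.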

DO j=1,M
  W(j)=C(j)+D(j)
  DO i=1,N
    A(i,j)=B(i,j)+W(j)
  ENDDO
ENDDO

[fork]
DO j=start,end
  W(j)=C(j)+D(j)
  DO i=1,N
    A(i,j)=B(i,j)+W(j)
  ENDDO
ENDDO
[join]

j loop parallelisation

DO j=2,M
  DO i=1,N
    A(i,j) = A(i,j-1)+A(i,j)
  ENDDO
ENDDO

[fork]
DO j=2,M
  DO i=start,end
    A(i,j) = A(i,j-1)+A(i,j)
  ENDDO
ENDDO
[join]
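In this loop nest each value depends on the one at j-1 for the same i, so the j iterations must run in order while the i iterations are independent of one another. The Python sketch below checks that splitting the i range into blocks gives the same result as the serial loop. It uses 0-based nested lists indexed `a[j][i]`, runs the blocks one after another to stand in for threads, and its function names are illustrative only.

```python
def serial(a):
    """Running sum along j for every i, on a copy of a[j][i]."""
    a = [row[:] for row in a]
    for j in range(1, len(a)):
        for i in range(len(a[0])):
            a[j][i] = a[j - 1][i] + a[j][i]
    return a


def split_over_i(a, n_threads):
    """Same computation with the i range split into blocks.

    Each pass of the outer loop plays the role of one thread, which runs the
    full j loop over its own block of i values.
    """
    a = [row[:] for row in a]
    n = len(a[0])
    for t in range(n_threads):
        lo = t * n // n_threads
        hi = (t + 1) * n // n_threads
        for j in range(1, len(a)):
            for i in range(lo, hi):
                a[j][i] = a[j - 1][i] + a[j][i]
    return a
```

Because no block reads a value written by another block, the order in which the blocks run does not matter, which is what makes the i loop safe to distribute.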

DO i=1,N
  A(i) = B(i)+C(i)
ENDDO
DO j=1,M
  D(j) = E(j)*F(j)
ENDDO

[fork]
DO i=start,end
  A(i) = B(i)+C(i)
ENDDO
DO j=start,end
  D(j) = E(j)*F(j)
ENDDO
[join]

j loop parallelisation

*poption parallel            force parallelisation
*poption tlocal(a,b,i)       thread local variables
DO i = 1,N
  CALL sub(a,b,i)
ENDDO

[fork]
DO i = 1,N
  CALL sub(a,b,i)
ENDDO
[join]

Section Parallelisation

*poption parallel_sections

*poption section

CALL SUB1

*poption section

CALL SUB2

*poption end_parallel_sections

Execution of independent blocks of code in different threads

(sections are always single threaded)
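The same idea, independent blocks of code each running in a single thread, can be sketched in Python with a thread pool. This is an analogy rather than what the `*poption` directives compile to; `sub1` and `sub2` are placeholder routines with made-up bodies.

```python
from concurrent.futures import ThreadPoolExecutor


def sub1():
    """Placeholder for the first independent block."""
    return sum(range(1, 101))


def sub2():
    """Placeholder for the second independent block."""
    return max(3, 1, 4, 1, 5)


def parallel_sections(*sections):
    """Run each section in its own thread and wait for all of them."""
    with ThreadPoolExecutor(max_workers=len(sections)) as pool:
        futures = [pool.submit(section) for section in sections]
        # Collecting the results acts as the join at the end of the sections.
        return [f.result() for f in futures]
```

`parallel_sections(sub1, sub2)` returns the results in the order the sections were given, regardless of which thread finishes first.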

Effect of COMPAS

[Figure: dot product S = SUM(A(1:N)*B(1:N)) plotted against loop length N (1 to 100000, log scale); curves: COMPAS off, COMPAS on]

SR8000 Programming

Message Passing

Remote DMA

Normal Transfer

• Program to Send Buffer (memory copy), across the Crossbar Network, then Receive Buffer to Program (memory copy)
• Protocol Processing
• Context Switch
• Interrupt Handling

Remote DMA Transfer

• No Buffering in Kernel
• No OS System Call

Inter-node MPI

One MPI process per node; RDMA transfer possible

[Figure: one MPI process on each node]

Intra-node MPI

One MPI process per IP; RDMA transfer not possible

[Figure: MPI processes within one node communicating through shared memory]

MPI Ping-pong

[Figure: ping-pong results plotted against message length (1 to 1000000, log scale); vertical axis from 0 to 1.40E+03; curves: Intra-node, Inter-node RDMA, Inter-node no RDMA]
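A common way to read a ping-pong curve is the latency plus bandwidth model sketched below in Python. The model is an assumption brought in for illustration, not something the slides state. The 1 GB/s peak comes from the inter-node transfer speed quoted earlier, while the 10 microsecond latency in the usage note is purely hypothetical.

```python
def transfer_time(n_bytes, latency_s, bandwidth_bytes_per_s):
    """One-way time for a message: fixed start-up cost plus streaming time."""
    return latency_s + n_bytes / bandwidth_bytes_per_s


def effective_bandwidth(n_bytes, latency_s, bandwidth_bytes_per_s):
    """Bandwidth actually seen by the application for this message length."""
    return n_bytes / transfer_time(n_bytes, latency_s, bandwidth_bytes_per_s)


def half_performance_length(latency_s, bandwidth_bytes_per_s):
    """Message length at which half of the peak bandwidth is reached."""
    return latency_s * bandwidth_bytes_per_s
```

With a hypothetical latency of 10 microseconds and a 1 GB/s link, a 10000-byte message reaches about half of the peak bandwidth. Lowering the start-up cost, which is what avoiding kernel buffering and system calls aims at, moves that point towards shorter messages.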

SR8000 Parallelism

Instruction level (PVP)

Multi-thread (COMPAS)

Node 1 Node 2

Message passing (MPI)

SR8000 Programming

Memory Architecture

Memory Hierarchy

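fp registers (128+32)

L1 cache (128 Kbyte, 4-way)

Store buffer (16 entries)

Switch (also connects to other IPs)

Memory (2 to 16 GB, 512 banks)

[Figure: the data paths between these levels are labelled 16 bytes/cycle, 16 bytes/cycle and 32 bytes/cycle]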

Address Translation

[Figure: a virtual address split into virtual page number and page offset, translated through the page table in memory]

Cache recently used entries of page table in TLB

Large TLB

[Figure: a virtual address split into virtual page number and page offset, translated through the large TLB]

Large TLB covers whole address space with 256 entries.

Page size 16 MB to 128 MB
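The reach of the TLB follows from the two numbers above: 256 entries times the page size. The Python sketch below works this out. The list of intermediate page sizes (32 MB and 64 MB) is an assumption, since only the two end points are given, and the function names are illustrative.

```python
MB = 2**20
GB = 2**30
TLB_ENTRIES = 256

# Only 16 MB and 128 MB are quoted; the power-of-two steps between are assumed.
ASSUMED_PAGE_SIZES = [16 * MB, 32 * MB, 64 * MB, 128 * MB]


def tlb_reach_bytes(page_size_bytes, entries=TLB_ENTRIES):
    """Amount of memory that can be mapped without a TLB miss."""
    return entries * page_size_bytes


def smallest_covering_page(memory_bytes):
    """Smallest assumed page size whose TLB reach covers the given memory."""
    for page in ASSUMED_PAGE_SIZES:
        if tlb_reach_bytes(page) >= memory_bytes:
            return page
    return None
```

With 16 MB pages the reach is 4 GB, and with 128 MB pages it is 32 GB, which is more than the largest 16 GB node memory.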

Memory Address Hashing

[Figure: address bits 32 to 63. Labels: memory controller data path, storage controller data path]
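The slides do not give the hash function, so the Python sketch below uses a generic XOR fold purely to show why hashing addresses across banks helps. Everything in it apart from the 512-bank count is hypothetical, including the granularity at which memory is interleaved.

```python
N_BANKS = 512   # bank count quoted for the memory
BANK_BITS = 9   # 2**9 == 512


def bank_plain(unit):
    """Bank chosen from the low address bits only."""
    return unit % N_BANKS


def bank_hashed(unit):
    """Bank chosen after XOR-folding higher bits onto the low bits.

    Illustrative only: this is not the SR8000 hash function.
    """
    folded = unit ^ (unit >> BANK_BITS) ^ (unit >> (2 * BANK_BITS))
    return folded % N_BANKS


def banks_used(bank_fn, stride, count=512):
    """Number of distinct banks hit by `count` accesses at a fixed stride."""
    return len({bank_fn(k * stride) for k in range(count)})
```

With a stride of 512 units the plain mapping sends every access to the same bank, while the folded mapping spreads the same accesses over all 512 banks. Unit-stride access uses all banks in both cases.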

Key Features of SR8000

• High performance RISC CPU with PVP

• High performance node with COMPAS

• High sustained memory bandwidth

• High scalability with fast network

• Low energy and space requirements

SR8000 Programming

Performance

Top 500 – June 2000

Rank  Manufacturer  Computer             Rmax (GFlops)  Installation Site
1     Intel         ASCI Red             2379           Sandia National Lab
2     IBM           ASCI Blue Pacific    2144           Lawrence Livermore National Lab
3     SGI           ASCI Blue Mountain   1608           Los Alamos National Lab
4     IBM           SP Power3 375 MHz    1417           NAVOCEANO
5     Hitachi       SR8000-F1/112        1035           LRZ Munich
6     Hitachi       SR8000-F1/100        917            KEK Tsukuba
7     Cray Inc      T3E/1200             891            US Government
8     Cray Inc      T3E/1200             891            US Army HPC Research Center
9     Hitachi       SR8000/128           873            University of Tokyo
10    Cray Inc      T3E/1200             815            US Government
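Combining the Rmax values of the three Hitachi entries with the per-node peak figures from the model range (8 GFlops for SR8000, 12 GFlops for Model F1) gives the fraction of peak reached on Linpack. The Python sketch below does this arithmetic. It assumes the number after the slash in each computer name is the node count, and the names used in the code are illustrative.

```python
# Per-node peak in GFlops, from the model range table.
NODE_PEAK_GFLOPS = {"SR8000": 8.0, "SR8000-F1": 12.0}

# (model, nodes, Rmax in GFlops) for the Hitachi entries in the list.
HITACHI_ENTRIES = [
    ("SR8000-F1", 112, 1035.0),
    ("SR8000-F1", 100, 917.0),
    ("SR8000", 128, 873.0),
]


def linpack_efficiency(model, nodes, rmax_gflops):
    """Rmax as a fraction of the theoretical peak of the installation."""
    return rmax_gflops / (NODE_PEAK_GFLOPS[model] * nodes)


def efficiencies():
    """Efficiency of every listed Hitachi entry, in list order."""
    return [linpack_efficiency(m, n, r) for m, n, r in HITACHI_ENTRIES]
```

Under these assumptions the three systems reach roughly 77%, 76% and 85% of their peak performance.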

Linpack Performance

[Figure: Linpack performance in Gflops against number of nodes, 0 to 120]

Number of nodes    Linpack (Gflops)
1                  10.88
2                  20.50
8                  80.25
16                 159.51
32                 313.32
60                 577.49
64                 605.30
100                917.15
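The Linpack figures can be turned into a scaling efficiency relative to the single-node result, as in the Python sketch below. Note that Linpack runs normally use a larger problem on more nodes, so this is not a fixed-size speed-up; the function name is illustrative.

```python
# Linpack results in Gflops, keyed by number of nodes, as quoted above.
LINPACK_GFLOPS = {
    1: 10.88,
    2: 20.50,
    8: 80.25,
    16: 159.51,
    32: 313.32,
    60: 577.49,
    64: 605.30,
    100: 917.15,
}


def scaling_efficiency(nodes):
    """Measured performance divided by `nodes` times the one-node result."""
    return LINPACK_GFLOPS[nodes] / (nodes * LINPACK_GFLOPS[1])
```

The efficiency falls gradually from about 94% on 2 nodes to about 84% on 100 nodes.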

NAS Parallel FT

[Figure: NAS Parallel FT results on 1, 2, 4 and 8 nodes for Class A, Class B and Class C; labelled values 5.14 and 7.92]

NAS Parallel CG

[Figure: NAS Parallel CG results for Class B and Class C; labelled values 6.35 and 3.67]
