Better answers Compaq HPTC Solutions Bruce Foster, Ph.D., MBA bruce.foster@compaq.com

Page 1: Better answers Compaq HPTC Solutions Bruce Foster, Ph.D., MBA bruce.foster@compaq.com

Better answers

Compaq HPTC Solutions

Bruce Foster, Ph.D., MBA

bruce.foster@compaq.com

Page 2: Top100 SuperComputer Architectures (June 1999)

[Bar chart: percent of CPUs, percent of GFlops, and percent of sites among the Top100 supercomputers, by processor architecture (Alpha, Pentium, RS6000, MIPS, Hitachi, Fujitsu, NEC, SPARC, PA-RISC). Vertical axis runs from 0% to 60%.]

Page 3: The Barriers to Performance Scaling

[Diagram: CPU cycle time in nsec (1, 10, 100) plotted against number of CPUs (10, 100, 1K, 10K, 100K). Regions labeled SC (PVP), SMP Cluster, Farm, and MPP are bounded by Physical Limits and Complexity Limits, with a 100 to 200 GFLOP limit marked.]

Page 4: Clusters of SMPs Are Breaking Through to the TeraFlop Level

[Diagram: CPU cycle time in nsec (1, 10, 100) plotted against number of CPUs (10, 100, 1K, 10K, 100K), with regions labeled SC (PVP), SMP Cluster, Farm, and MPP, bounded by Physical Limits and Complexity Limits.]

Fastest microprocessors with best interconnects for SMP clusters yield maximum application performance (TeraFLOP level).

Page 5: High Performance Computing Systems

HPTC Solutions
• AlphaServer Systems
• Interconnects
• Software
• Services

Page 6: Compaq is in it for the long haul!

• Alpha roadmap committed to 10 years and beyond of performance leadership.
• Tandem will use Alpha in its next-generation systems.
    • Tandem owns 36 of the top 38 stock markets worldwide.
• Over 50% of Compaq’s revenue is from Enterprise Systems.

Page 7: Wide Presence in HPTC market

• Intel/ServerNet clusters at NCSA
• Alpha Linux/ServerNet at Caltech
• Alpha Tru64 UNIX/Fast Ethernet at Swinburne
• Alpha Linux/Myrinet “C-Plant” at Sandia (#44 on Top500 list)
• HPTi win at FSL (Alpha Linux/Myrinet), 4 TFlop system
• Compaq Visual Fortran for W95/NT
• Compaq Compilers for Alpha/Linux
• Several very large SC systems (#34 on Top500 list)
• Celera: 300 x 4-CPU ES40s (1.2 TFlop)
• ASCI PathForward and ASCI Turquoise

Page 8: 1999 Small and Medium AlphaServers

• Compaq DS10
• Compaq DS20 System
    • 2 CPUs, small PC tower
    • 5.13 GB/s peak, 1.3 GB/s single-CPU McCalpin memory bandwidth
• Compaq ES40 System
    • 4 CPUs, bigger cabinet
    • EV67 systems: 2.5 GB/s 4-CPU McCalpin bandwidth
    • Double the I/O bandwidth and more slots

Page 9: Next Generation DS/ES AlphaServers: Designed to Protect Your Investment

[Roadmap diagram with rows for Processor, Architecture, Memory, and Storage across First, Second, and Third Generation systems.]

First Generation
• Alpha 21264, 500 MHz, 4 MB L2 cache
• 16 GB of memory, 83 MHz data bus
• 2 64-bit PCI busses, 33 MHz PCI
• Ultra2

Second and Third Generation upgrades shown on the diagram
• EV67 600+ MHz, EV68 800+ MHz
• 8 MB L2 cache
• 32 GB of memory
• 125 MHz data bus
• 4 PCI busses, 66 MHz PCI, AGP
• Ultra2 64-bit RAID, Ultra3 SCSI, DVD

Note: Feature set varies between AlphaServer DS and ES products based on customer needs

Page 10: SC’99

• 16x4 ES40 => 64 CPUs
• Quadrics Interconnect
• 1.7 TB Storage

Page 11: LINPACK NxN Rmax (GFlops)

[Chart: Rmax in GFlops (log scale, 1 to 1,000) against number of CPUs. Data labels, paired in order with the axis ticks:]

Number of CPUs    Rmax (GFlops)
16                10.70
32                21.53
64                42.04
128               85.41
256               154.42
512               271.40
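One way to read the LINPACK chart on this slide is as a scaling curve. The sketch below assumes the six Rmax labels pair up, in order, with the six CPU counts on the horizontal axis (the slide does not state the pairing explicitly). It computes efficiency relative to perfect linear scaling from the 16-CPU result. The names `RMAX_GFLOPS` and `scaling_efficiency` are illustrative only.

```python
# Rmax values (GFlops) read from the chart, keyed by CPU count.
# Assumption: labels and axis ticks correspond in left-to-right order.
RMAX_GFLOPS = {16: 10.70, 32: 21.53, 64: 42.04, 128: 85.41, 256: 154.42, 512: 271.40}


def scaling_efficiency(rmax, base_cpus=16):
    """Return {cpus: efficiency} relative to linear scaling from base_cpus."""
    base_rate = rmax[base_cpus] / base_cpus  # GFlops per CPU at the baseline
    return {
        cpus: (gflops / cpus) / base_rate
        for cpus, gflops in sorted(rmax.items())
    }


if __name__ == "__main__":
    for cpus, eff in scaling_efficiency(RMAX_GFLOPS).items():
        print(f"{cpus:4d} CPUs: {RMAX_GFLOPS[cpus]:7.2f} GFlops, efficiency {eff:.2f}")
```

Under that pairing, the 512-CPU result retains roughly 79% of the per-CPU rate measured at 16 CPUs.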

Page 12: MPI 8-Byte Ping Pong

[Chart: ping-pong time in usecs (vertical axis, 5.25 to 5.70) for each node paired with node 0 (horizontal axis, nodes 1 to 126). Linear fit: y = 0.0005x + 5.5267, R² = 0.1728.]
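To put the fitted line on the ping-pong chart in perspective, the sketch below evaluates y = 0.0005x + 5.5267 at both ends of the plotted node range. Only the two coefficients come from the slide; the function names are illustrative, and the raw measurements are not reproduced here.

```python
# Coefficients of the linear fit printed on the chart: y = 0.0005x + 5.5267
SLOPE_USEC_PER_NODE = 0.0005
INTERCEPT_USEC = 5.5267


def fitted_latency_usec(node):
    """Fitted 8-byte ping-pong time (usec) for node 0 paired with `node`."""
    return SLOPE_USEC_PER_NODE * node + INTERCEPT_USEC


def fitted_spread_usec(first_node=1, last_node=126):
    """Difference in fitted time between the two ends of the node range."""
    return fitted_latency_usec(last_node) - fitted_latency_usec(first_node)


if __name__ == "__main__":
    print(f"node 1:   {fitted_latency_usec(1):.4f} usec")
    print(f"node 126: {fitted_latency_usec(126):.4f} usec")
    print(f"spread:   {fitted_spread_usec():.4f} usec")
```

The fitted trend amounts to about 0.06 usec across all 126 partner nodes, close to 1% of the 5.5 usec base time, and the low R² of 0.1728 indicates that the partner node index explains little of the measured variation.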

Page 13: Cluster and Parallel File System

Cluster File System
• File system mounted on any node is visible to all nodes without race conditions
• Each node is both a CFS server and CFS client
• Coherency is maintained by exchanging tokens
• Semantics are POSIX and X/OPEN compliant
• Performance depends on access type and pattern

Parallel File System
• Aggregates CFS files into a single parallel file
• Enables striping a single logical file across multiple underlying local files
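The striping idea can be illustrated with a small address calculation. The slide does not say how the Parallel File System lays out stripes, so the sketch below assumes a plain round-robin layout with a fixed stripe size; `locate_stripe`, `stripe_size`, and `n_files` are hypothetical names, not part of any Compaq interface.

```python
def locate_stripe(offset, stripe_size, n_files):
    """Map a byte offset in the logical file to (file index, offset in that file).

    Assumes round-robin striping: stripe 0 goes to file 0, stripe 1 to file 1,
    and so on, wrapping around after n_files stripes.
    """
    if offset < 0 or stripe_size <= 0 or n_files <= 0:
        raise ValueError("offset must be >= 0; stripe_size and n_files must be > 0")
    stripe_index, within_stripe = divmod(offset, stripe_size)
    file_index = stripe_index % n_files
    # Each full pass over all files adds one stripe to every underlying file.
    local_offset = (stripe_index // n_files) * stripe_size + within_stripe
    return file_index, local_offset


if __name__ == "__main__":
    for offset in (0, 64, 256, 300):
        print(offset, locate_stripe(offset, stripe_size=64, n_files=4))
```

With this layout, consecutive stripes land on different underlying files, so a large sequential transfer can be serviced by several files at once.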

Page 14: Compilers & Tools

• Compaq F90, C, C++, Java, …
• Shared memory
    • Parallelization within SMP node by OpenMP
    • 3rd party decomposition tools (KAI)
• Cray T3D/E-compatible Shmem library
• MPI (MPI 2, MPI-I/O, thread-safe)
• Debugger: TotalView (Etnus, Inc.)
• Performance analysis: Vampir (PALLAS GmbH)
• Load balancing: LSF (Platform Computing)

Page 15: Our Capability Machine is Here

• A 16-CPU AlphaServer at SC’99
    • 16-way GS160 AlphaServer
    • 16 * 1.46 GF/CPU = 23.4 GFLOPS
    • High sustainable memory bandwidth
• 32-way:
    • 32 CPUs: 46.8 GFLOPS
    • Very high sustainable memory bandwidth
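The GFLOPS figures on this slide are theoretical peaks. A minimal sketch of the arithmetic follows, assuming the 731 MHz EV67 clock quoted on the high-end architecture slide (Page 18) and two floating-point operations per cycle. The slide does not state the per-cycle figure; two is an assumption that reproduces the quoted 1.46 GF/CPU. The function name is illustrative.

```python
def peak_gflops(n_cpus, clock_mhz, flops_per_cycle=2):
    """Theoretical peak in GFLOPS: CPUs x clock (MHz) x flops per cycle / 1000.

    flops_per_cycle=2 is an assumption, chosen because it matches the
    1.46 GF/CPU quoted on the slide for a 731 MHz part.
    """
    return n_cpus * clock_mhz * flops_per_cycle / 1000.0


if __name__ == "__main__":
    for cpus in (1, 16, 32):
        print(f"{cpus:2d} CPUs at 731 MHz: {peak_gflops(cpus, 731):.1f} GFLOPS peak")
```

Peak figures of this kind are upper bounds; sustained application performance depends on memory bandwidth and on the code being run.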

Page 16: Alpha Microprocessor Summary

• EV6 (21264)
    • .35 µm, 466 - 500 MHz
    • 4-wide superscalar
    • Out-of-order execution
• EV67 (21264a)
    • .25 µm, 667 - 730 MHz
    • 8MB L2 cache
• EV68 (21264b)
    • .18 µm, 800 - 1042 MHz
• EV7 (21364)
    • .18 µm, ~1200 MHz
    • L2 cache on-chip
    • RAMBUS
    • Glueless MP
• EV8 (21464)
    • .13 µm, ~1500 MHz
    • 8-wide superscalar
    • SMT

. . . Future Alpha Microprocessors planned through to 2025 !

Page 17: EV67/667MHz Preliminary HPTC Applications Results

• 30 to 45% improvement over ES40 EV6/500 MHz
• Competitive leadership
    • 1.15 to over 2 times HP N4000
        • Better than an 8 CPU N4000
    • Over 2 times SGI Origin 2000
        • Better than an 8 CPU Origin 2000
    • Over 2 times Sun UE3000
    • 2 to 4 times Intel Xeon III

Page 18: New High-end AlphaServer Architecture: A new way of looking at Servers

[Diagram: a Global Switch connecting eight quad building blocks, each with four CPUs, four memory arrays, and an I/O switch.]

• Each Quad Building Block
    • 4 EV67 CPUs (731 MHz, 1.46 GFlops)
    • 4 Memory Arrays (total of 16GB, 32-way)
    • 6.4 GB/s Local Switch
    • 28 PCI slots
• Quads aggregate via a Global Switch (8 ports)
    • Combines up to 8 quads
    • High Bandwidth, Low Latency
    • Preserves SMP programming model
• Up to 8 System Partitions
    • Hardware firewalls provide software fault isolation between partitions
    • Can be dynamically reconfigured
    • Support multiple instances and versions of the same O/S, or completely different O/Ss (Tru64 UNIX, OpenVMS, and soon Linux)
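The quad building block figures compose into system totals by simple multiplication. The sketch below uses the per-quad numbers from this slide (4 CPUs, 16 GB, 28 PCI slots) and the 8-quad limit of the Global Switch; the class and function names are illustrative, not a Compaq configuration tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuadBuildingBlock:
    """Per-quad figures as quoted on the slide."""
    cpus: int = 4
    memory_gb: int = 16
    pci_slots: int = 28


MAX_QUADS = 8  # the Global Switch has 8 ports


def system_totals(n_quads, quad=QuadBuildingBlock()):
    """Aggregate CPUs, memory, and PCI slots for a system of n_quads quads."""
    if not 1 <= n_quads <= MAX_QUADS:
        raise ValueError(f"n_quads must be between 1 and {MAX_QUADS}")
    return {
        "cpus": n_quads * quad.cpus,
        "memory_gb": n_quads * quad.memory_gb,
        "pci_slots": n_quads * quad.pci_slots,
    }


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, "quads:", system_totals(n))
```

A fully populated 8-quad system comes out at 32 CPUs, 128 GB, and 224 PCI slots, which agrees with the GS Series figures quoted on the "Complete Suite of HPTC Systems" slide.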

Page 19: Overview of CY2000

• CPUs/SMP
    • DS10 (1 CPU), DS20 (2 CPUs), ES40 (4 CPUs), and GS80 (8), GS160 (16) and GS320 (32)
• Systems up to 4096 CPUs
    • 128-way
• Microprocessor speed
    • Around 1 GHz at end-2000

Page 20: Systems Area Network: FAST Message Passing

• Quadrics
    • Backbone of our AlphaServer SC systems.
    • High Bandwidth, Low Latency, High Node/CPU Count
    • It’s a PCI Card; this allows systems of both small and big servers.
• ServerNet
    • Engineered for low per-node SAN cost.
    • Brings Tandem Non-Stop technology to Alpha Linux Beowulfs
• Myrinet
    • Ties together hundreds of Alphas on Sandia’s C-Plant.
• Ethernet/Fast Ethernet
    • Low cost interconnect for medium size systems (Alpha at Swinburne, Sydney Uni (Gordon Bell winner), CSIRO multiple divisions)

Page 21: Customer Comments: Alpha and Red Hat

• Comments from "The Center for the Neural Basis of Cognition":
    • It runs about six times faster on that {DS20} machine than on a Pentium II 400.
• Comments from West Coast University math department:

    System                         Compiler    Time
    PII-450, 512k cache            g77 -O3     75:02
    Celeron 450A, 128K cache       g77 -O3     74:44
    Alpha 21164-600, 4 MB cache    g77 -O3     29:27
    Alpha 21264-500, 4 MB cache    g77 -O3     17:16
    Alpha 21264-500, 4 MB cache    fort -O3    8:42

    • I'm impressed (both with the AlphaServer 21264 and Compaq Fortran). It's a 5 mesh fluid flow used for modeling blood flows.
• Comments from Canadian University:
    • With your Fortran compiler the DS20 is about 3.5x the speed of an SGI Origin 200 with a 180 MHz R10K CPU, pretty impressive.

[Callouts on the slide: 9 times! 6 times! 3.5 times!]
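The timings quoted by the math department can be turned into speedup ratios directly. The sketch below parses each `a:b` timing into a single unit count (the slide does not say whether the values are minutes:seconds or hours:minutes, and the ratios are the same either way). The dictionary keys and function names are illustrative labels.

```python
# Run times exactly as quoted on the slide, all compiled with -O3.
TIMINGS = {
    "PII-450 g77": "75:02",
    "Celeron 450A g77": "74:44",
    "Alpha 21164-600 g77": "29:27",
    "Alpha 21264-500 g77": "17:16",
    "Alpha 21264-500 fort": "8:42",
}


def to_units(timing):
    """Convert 'a:b' to a single count of the smaller unit (60 per larger unit)."""
    major, minor = timing.split(":")
    return int(major) * 60 + int(minor)


def speedup(baseline, candidate):
    """How many times faster `candidate` ran than `baseline`."""
    return to_units(TIMINGS[baseline]) / to_units(TIMINGS[candidate])


if __name__ == "__main__":
    for name in TIMINGS:
        print(f"{name:22s} {speedup('PII-450 g77', name):5.2f}x vs PII-450 g77")
```

On these numbers the Alpha 21264 with Compaq Fortran is about 8.6 times faster than the PII-450 with g77, which is presumably the source of the "9 times" callout. Switching from g77 to Compaq Fortran on the same Alpha accounts for roughly a factor of two.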

Page 22: Complete Suite of HPTC Systems

Switch-based system - 64-bit PCI I/O subsystems - Very Large Memory

Scalable clusters on DIGITAL UNIX, OpenVMS and Linux

Modular system packaging - advanced systems management

DS Series
• 1-2 Processors
• Up to 4GB of memory
• 6 PCI slots

ES Series
• 1-4 Processors
• Up to 16GB of memory
• Up to 10 PCI slots

GS Series
• 1-32 Processors
• Up to 128+GB of memory
• Up to 224 PCI slots

SC Series
• EV67 667MHz
• 64-512 Processors
• Up to 2 TB memory
• Up to 1.2K I/O slots

[Availability labels on the slide: Apr, Feb, May, Coming Soon, Announcing]

Page 23: Thank You!

Please visit our HPTC Web Site or send eMail to Steve Tolnai or myself

http://www.compaq.com/hpc

eMail: [email protected]