Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute) Anne Weill – Zrahia Technion,Computer Center October 2008

Page 1: Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute) Anne Weill – Zrahia Technion,Computer Center October 2008

Page 2: Resources needed for applications arising from nanotechnology

Large memory: Tbytes

High floating-point computing speed: Tflops

High data throughput: state of the art ...

Page 3: SMP architecture

[Diagram: several processors (P) sharing a single memory]

Page 4: Cluster architecture

[Diagram: nodes connected through an interconnection network]

Page 5: Why not a cluster

A single SMP system is easier to purchase and maintain

Ease of programming in SMP systems

Page 6: Why a cluster

Scalability

Total available physical RAM

Reduced cost

But ...

Page 7

Having an application which exploits the parallel capabilities

Studying the application or applications which will run on the cluster

Page 8: Things to include in design

Property of code                    | Essential component
CPU bound                           | Fast computing unit
Memory bound                        | Large memory, fast access
Global flow of data in parallel app | Fast interconnect

Page 9: Our choices

Property of code               | Essential component       | Choice
Computationally intensive, FP  | Fast computing unit       | 64-bit dual-core Opteron, Rev. F
Large matrices                 | Large memory, fast access | 8 GB/node
Finite element, spectral codes | Fast interconnect         | Infiniband DDR (20 Gb/s, low latency)

Page 10: Other requirements

Space, power and cooling constraints, strength of floors

Software configuration:

1. Operating system

2. Compilers and application development tools

3. Load balancing and job scheduling

4. System management tools

Page 11: Configuration

[Diagram: nodes, each with processors (P) sharing a memory (M), connected through an Infiniband switch]

Page 12: Before finalizing our choice ...

One should check, on a similar system:

Single-processor peak performance

Infiniband interconnect performance

SMP behaviour

Behaviour of non-commercial parallel applications

Page 13: Parallel applications issues

Execution time

Parallel speedup: Sp = T1/Tp

Scalability
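
The speedup defined on this slide can be written out together with parallel efficiency, a companion quantity often used when judging scalability. Efficiency is not named on the slide; it is added here as the standard textbook definition, with T_1 the execution time on one process and T_p the time on p processes.

```latex
S_p = \frac{T_1}{T_p}, \qquad
E_p = \frac{S_p}{p} = \frac{T_1}{p\,T_p}
```

In the ideal case S_p = p and E_p = 1; an application scales well when E_p stays close to 1 as p grows.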

Page 14: Benchmark design

Must give a good estimate of the performance of your application

Acceptance test: should match all its components

Page 15: Comparison of performance

Computer               | Carmel     | Nanco
Lapack program, N=9000 | 487 Mflops | 3826.4 Mflops

Ratio of 7.8!

Page 16: Execution time of Monte-Carlo parallel code (MPI)

Processes | Carmel          | Nanco
1         | 22042 (~6 hrs!) | 4389 (~1 hr)
2         | 12246           | 1739
4         | 4809            | 1154.8
8         | 3540            | 642.12
16        |                 | 282.5
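
As a rough illustration, the timings in this table can be turned into speedup figures with a few lines of Python. This is a sketch and not part of the original talk: the names `carmel`, `nanco`, `speedup` and `efficiency` are made up for the example, the times are assumed to be in seconds (suggested by the "~6 hrs" note next to 22042), and efficiency is taken as speedup divided by the number of processes.

```python
# Execution times as read from the slide, keyed by number of processes.
# Assumption: values are seconds. Carmel has no entry for 16 processes.
carmel = {1: 22042.0, 2: 12246.0, 4: 4809.0, 8: 3540.0}
nanco = {1: 4389.0, 2: 1739.0, 4: 1154.8, 8: 642.12, 16: 282.5}


def speedup(times, p):
    """Parallel speedup Sp = T1 / Tp."""
    return times[1] / times[p]


def efficiency(times, p):
    """Parallel efficiency Ep = Sp / p (standard definition, not from the slide)."""
    return speedup(times, p) / p


for name, times in (("Carmel", carmel), ("Nanco", nanco)):
    for p in sorted(times):
        print(f"{name:6s} p={p:2d}  Sp={speedup(times, p):6.2f}  Ep={efficiency(times, p):5.2f}")
```

With these numbers the speedup on Nanco at 16 processes comes out at roughly 15.5, that is, close to linear.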

Page 17

[Chart: Speedup of Parallel Monte Carlo. X axis: n of processes (2, 4, 8, 16, 32, 64). Y axis: labelled "Execution time" (0 to 120). Series: MILC.]

Page 18: What did work

Running MPI code interactively

Running a serial job through the queue

Compiling C code with MPI

Page 19: What did not work

Compiling F90 or C++ code with MPI

Running MPI code through the queue

Queues do not do accounting per CPU

Page 20: Parallel performance results

Theoretical peak: 2.1 Tflops

Nanco performance on HPL: 0.58 Tflops
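
The two figures on this slide give the HPL efficiency, the fraction of theoretical peak actually achieved. The snippet below is only a small worked calculation; the function name `hpl_efficiency` is hypothetical and nothing here comes from the HPL package itself.

```python
def hpl_efficiency(measured_tflops, peak_tflops):
    """Fraction of theoretical peak reached by the measured HPL result."""
    if peak_tflops <= 0:
        raise ValueError("peak performance must be positive")
    return measured_tflops / peak_tflops


# Values from the slide: 0.58 Tflops measured, 2.1 Tflops theoretical peak.
print(f"HPL efficiency: {hpl_efficiency(0.58, 2.1):.1%}")  # prints "HPL efficiency: 27.6%"
```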

Page 21: Comparison with Sun Benchmark

[Chart: Comparison Sunbench vs nanco (pathscale), 2ppn. X axis: n of processes (2, 4, 8, 16). Y axis: 0 to 3.5. Series: MVH1, MILC, IGOR.]

Page 22: Execution time, comparison of compilers

[Chart: MILC small, 2th/n. X axis: processes (1, 2, 4, 8, 16, 32). Y axis: 0 to 2500. Series: Sun-bench, Nanco-gcc3, Nanco-sunc, Nanco-path, Nanco-gcc4.]

Page 23

[Chart: Parallel Speedup for MILC (2th/n). X axis: processes (2, 4, 8, 16, 32, 64). Y axis: 0 to 120. Series: SUN-bench, Nanco-sun, Nanco-path.]

Page 24: Performance with different optimizations

[Chart: Execution time of MVH1 on nanco with 32 threads. X axis: type of optimization. Y axis: execution time (0 to 300). Series: VoltaireMPI+Pathscale, OpenMPI+opt.plac., OpenMPI+opt.plac.+tmp disk.]

Page 25: Conclusions from acceptance tests

New gcc (gcc4) is faster than Pathscale for some applications

MPI collective communication functions are implemented differently in various MPI versions

Disk access times are crucial: use attached storage when possible

Page 26: Scheduling decisions

Assessing priorities between user groups

Assessing parallel efficiency of different job types (MPI, serial, OpenMP) and of commercial software, and designing special queues for them

Avoiding starvation by giving weight to the urgency parameter
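
One way to picture how an urgency weight can prevent starvation is a priority that grows with waiting time. The sketch below is a generic illustration and does not reproduce the scheduler actually configured on Nanco: the `Job` fields, the linear formula and the default weight of 0.5 are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    group_priority: float  # hypothetical static priority of the user group
    wait_hours: float      # time the job has already spent in the queue


def effective_priority(job, urgency_weight=0.5):
    """Static group priority plus a term that grows with waiting time."""
    return job.group_priority + urgency_weight * job.wait_hours


def dispatch_order(jobs, urgency_weight=0.5):
    """Return job names, highest effective priority first."""
    ranked = sorted(jobs, key=lambda j: effective_priority(j, urgency_weight), reverse=True)
    return [j.name for j in ranked]


jobs = [
    Job("high_group_new", group_priority=10.0, wait_hours=0.0),
    Job("low_group_old", group_priority=1.0, wait_hours=24.0),
]

# Without an urgency term the low-priority job always loses.
print(dispatch_order(jobs, urgency_weight=0.0))
# With the urgency term, a long enough wait moves it to the front.
print(dispatch_order(jobs, urgency_weight=0.5))
```

The weight sets the trade-off: a larger value shortens the longest wait for low-priority groups, at the cost of weakening the priorities agreed between groups.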

Page 27: Observations during production mode

Assessing users' understanding of the machine: support in writing scripts and in efficient parallelization

Lack of visualization tools: writing of a script to show current usage of the cluster

Page 28: Utilization of cluster

Page 29: Utilization of nanco sep08

[Chart: Utilization (daily) sep 08. X axis: date. Y axis: utilization (0 to 120). Series: Series1.]

Page 30: Nanco jobs by type

[Chart: Nanco, feb 2008, by job type. Categories: Scalar, Fullwave, Self dev. code.]

Page 31: Conclusion

Correct benchmark design is crucial to test the capabilities of the proposed architecture

Acceptance tests make it possible to negotiate with vendors and give insights into future choices

Only after several weeks of running the cluster at full capacity can we make informed decisions on management of the cluster