
Page 1: HPC ACTIVITIES AT CPTEC

CENTER FOR WEATHER FORECAST AND CLIMATE STUDIES

JAIRO PANETTA (CPTEC)
SAULO BARROS (USP)

and many colleagues (…)

ECMWF, OCT 2006

www.cptec.inpe.br

Page 2: Installed Top Speed (GFlops)

[Chart: installed top speed (GFlops, log scale from 1 to 100,000) by year, 1993–2005, at ECMWF, NCEP, UKMO, DWD, JMA and Canada. Source: Top500]

Page 3: Installed Top Speed (GFlops), Trend

[Chart: the same Top500 series, 1993–2005, with a least-squares fit: installed top speed grows by 1.8×/year, extrapolating to 1 PFlops around 2012. Source: Top500]

Page 4: Supercomputing at CPTEC

[Table: the machines installed in 1994, 1998 and 2004, compared by machine, number of nodes, processors, top speed, memory and disk (up to 1 PByte).]

Page 5: Operational Suite

Global Spectral Model T213L42 (64 km) up to 10 days, twice a day (NCEP analysis and GPSAS assimilation system)

Regional Eta model (20 km, L38) up to 5 days, twice a day (RPSAS assimilation system with CPTEC AGCM fields)

Coupled ocean/atmosphere global model (T126L28 + MOM3) up to 30 days, twice a day

CATT-BRAMS environmental model up to 3 days

Global (T126L28, 15 members, 15 days) and regional (40 km, L38, 5 members, 5 days) ensembles, twice a day

Wave model, monthly climate runs, etc.

Page 6: HPC Group Activities

Software aspects of production models: parallelism, efficiency, ease of use, ease of modification

Provide user support on all software aspects

Transform successful research into production

Probe future technologies: hardware and software

Page 7: Global Model

Spectral Eulerian or Semi-Lagrangian, Full, Reduced, Quadratic or Linear Grid

Dynamically configurable; Fortran 90, OpenMP, MPI (sketched below)

Binary reproducible, portable, efficient on production machine

Easy to insert new physical parametrizations

Souza's shallow cumulus, Grell ensemble convection, CLIRAD radiation

About 15 man-years of modernization effort
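
A minimal sketch of the hybrid MPI + OpenMP pattern these bullets describe, not the AGCM's actual source: MPI ranks own contiguous latitude blocks, and OpenMP threads share the loop inside each rank (program name, grid size and the per-latitude "physics" are illustrative).

```fortran
! Illustrative hybrid MPI + OpenMP driver (not the CPTEC model code):
! MPI decomposes latitudes across ranks; OpenMP threads the local loop.
program hybrid_sketch
   use mpi
   implicit none
   integer, parameter :: nlat = 320      ! a T213-like Gaussian grid
   integer :: ierr, rank, nprocs, j, j0, j1, nloc
   real    :: local_sum, global_sum

   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
   call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

   nloc = nlat / nprocs                  ! contiguous latitude block per rank
   j0 = rank*nloc + 1
   j1 = merge(nlat, j0 + nloc - 1, rank == nprocs - 1)

   local_sum = 0.0
   ! Note: a floating-point reduction is not binary reproducible across
   ! thread counts; a production model would fix the summation order.
!$omp parallel do reduction(+:local_sum)
   do j = j0, j1
      local_sum = local_sum + real(j)    ! stand-in for per-latitude physics
   end do
!$omp end parallel do

   call MPI_Allreduce(local_sum, global_sum, 1, MPI_REAL, MPI_SUM, &
                      MPI_COMM_WORLD, ierr)
   if (rank == 0) print *, 'global diagnostic:', global_sum
   call MPI_Finalize(ierr)
end program hybrid_sketch
```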

Page 8: Efficiency under OpenMP

[Chart: Eulerian dynamics, full grid; percentage of machine top speed (25% to 50%) versus number of processors (1 to 8), for T062L28 (210 km), T126L28 (105 km), T170L42 (79 km), T213L42 (63 km), T254L64 (53 km) and T319L64 (42 km).]

Page 9: MPI + OpenMP Speed-up

[Chart: speed-up (0 to 32) versus number of processors (0 to 32) for T213L42, Eulerian dynamics, full grid.]

Page 10: SL Reduced Grid

A T341L64 semi-Lagrangian, reduced-grid, OpenMP + MPI run executes on 4 full nodes (32 processors) at 80.67 GFlops (31.5% of top speed)
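
As a check on that figure (assuming the NEC SX-6 peak of 8 GFlops per CPU): 32 CPUs × 8 GFlops = 256 GFlops aggregate peak, and 80.67 / 256 ≈ 31.5%.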

Page 11: BRAMS

Limited area forecast model for regional weather centers

BRAMS = RAMS + tropical parametrizations + software quality + binary reproducibility + higher efficiency

Contributions from multiple sources: IAG/USP, IME/USP, UFCG, …

Fortran 90, MPI

Tailored for PC clusters

About 20 man-years of effort

Page 12: CATT-BRAMS

Air pollution due to biomass burning and urban areas

Page 13: Experimenting with OLAM

[Figures: OLAM runs at 150 km global and 75 km regional resolution.]

Page 14: OLAM

Ocean Land and Atmosphere Model; global version of RAMS

Developed at Duke University by Robert Walko and Roni Avissar

Global triangulation based on the icosahedron, shaved-eta vertical coordinate

Non-hydrostatic, finite volume formulation

Prototype version: sound results (daily test runs at CPTEC and IAG/USP), but requires software enhancement (a long effort)
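
(For reference, a property of such grids: recursively bisecting the edges of the icosahedron's 20 triangles n times yields 20 × 4^n cells, so each refinement level halves the grid spacing.)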

Page 15: HPC Group Role at CPTEC

CREATION, DISTRIBUTION, MAINTENANCE AND SUPPORT OF MODERN, EFFICIENT, UP-TO-DATE OPEN SOURCE SOFTWARE FOR METEOROLOGICAL AND ENVIRONMENTAL SCIENCES

Page 16: Web Pages

[Diagram: web pages for BRAMS, the Global Model and OLAM, ranging from prototype and unpublished to published.]

Page 17: BRAMS Dissemination

[Figures: one week of web site accesses; sites with daily production runs within Brazil.]

Page 18: Effective Portability

Given that CATT-BRAMS runs on the NEC SX-6 at CPTEC, BRAMS runs on PC clusters all over Brazil, and a single-source physics for BRAMS and OLAM is desirable:

Is it possible to write a single-source physics that is efficient on a wide range of architectures?

An elusive goal over the last 30 years

Page 19: First Attempt

Combine vector instructions with cache reuse

Vector instructions on SX and PC

"Unstructured" blocked physics: from the (k, i, j) formulation into (ij, k); block on ij; tailor the block size to the architecture (see the sketch below)
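
A minimal sketch of that layout change, assuming a trivial column-wise update in place of real physics (subroutine names, the update and the block size are illustrative, not CPTEC code):

```fortran
! Original (k, i, j) layout: the inner loop runs over short vertical
! columns, giving short vectors and poor cache behavior.
subroutine physics_kij(nk, ni, nj, t)
   integer, intent(in)    :: nk, ni, nj
   real,    intent(inout) :: t(nk, ni, nj)
   integer :: i, j, k
   do j = 1, nj
      do i = 1, ni
         do k = 1, nk
            t(k, i, j) = t(k, i, j) + 1.0   ! stand-in for real physics
         end do
      end do
   end do
end subroutine physics_kij

! Blocked (ij, k) layout: all columns collapsed into one horizontal
! dimension and processed in blocks; the block size is tuned per machine
! (long, e.g. 256, to fill SX vector registers; short on IA32 for cache).
subroutine physics_ij_k(nij, nk, blk, t)
   integer, intent(in)    :: nij, nk, blk
   real,    intent(inout) :: t(nij, nk)
   integer :: ijs, ije, ij, k
   do ijs = 1, nij, blk
      ije = min(ijs + blk - 1, nij)
      do k = 1, nk
         do ij = ijs, ije                   ! long, stride-1 inner loop
            t(ij, k) = t(ij, k) + 1.0
         end do
      end do
   end do
end subroutine physics_ij_k
```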

Page 20: First Attempt: Radiation, ij × k Formulation

[Chart: execution time (s, 0 to 9,000) of the radiation code on the SX-6 (vector size 256) and on IA32 (vector size 4), comparing the k formulation against the ij formulation.]

Page 21: Second Attempt: Advection, Original (k, i, j) Formulation

[Chart: MFlops (0 to 1,200) versus problem size (0 to 60) on Xeon and SX-6 for the original (k, i, j) formulation.]

Page 22: Second Attempt: Advection, Current Formulation

[Chart: MFlops (0 to 4,500) versus problem size (0 to 60) on Xeon and SX-6 for the current formulation.]

Page 23: Second Attempt: Advection

Development history:
1. (k, i, j)
2. (ij, k)
3. Blocked (ij, k)
4. (ijBlk, k, nBlk)
5. Vector (nBlk) of all fields on a type holding (ijBlk, k) arrays (see the sketch below)

Single source, efficient on both vector and microprocessor-based architectures

There is still a long way to go: the full code, not just a single module; the cost of type conversion
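
A minimal sketch of the step-5 data structure, assuming each block carries all its fields as (ijBlk, k) arrays (type and field names are illustrative):

```fortran
! Illustrative step-5 layout: a vector of nBlk blocks, each holding every
! field as an (ijBlk, k) array, keeping the inner ij loop long and stride-1.
module block_fields
   implicit none
   type :: field_block
      real, allocatable :: u(:,:)        ! (ijBlk, k) wind component
      real, allocatable :: scalar(:,:)   ! (ijBlk, k) advected scalar
   end type field_block
contains
   subroutine advect_blocks(blocks, dt_by_dx)
      type(field_block), intent(inout) :: blocks(:)   ! size nBlk
      real,              intent(in)    :: dt_by_dx
      integer :: n, k, ij
      do n = 1, size(blocks)                          ! loop over blocks
         do k = 1, size(blocks(n)%u, 2)
            do ij = 1, size(blocks(n)%u, 1)           ! vectorizable loop
               ! stand-in for a real advection update
               blocks(n)%scalar(ij, k) = blocks(n)%scalar(ij, k) &
                    - dt_by_dx * blocks(n)%u(ij, k)
            end do
         end do
      end do
   end subroutine advect_blocks
end module block_fields
```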

Page 24: Second Attempt: Advection, Single Performance Parameter

[Chart: speed gain (0 to 7) versus vector length (0 to 40,000) on Xeon and SX-6.]

Page 25: Grid Computing

Three distinct PC clusters spread over Brazil, driven by a portal and scheduled using three distinct grid middlewares, one middleware at a time

BRAMS climatology generation: partition Brazil into three regions; use three starting dates (members); split the 10-year climatology into one-year runs, giving 3 × 3 × 10 = 90 independent jobs (enumerated in the sketch below)

Stressing the Grid concept with a larger-than-usual computational grain
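
A minimal sketch of that enumeration (region labels and the job descriptor are illustrative):

```fortran
! Illustrative enumeration of the 3 regions x 3 members x 10 years = 90
! independent one-year BRAMS runs to be submitted through the portal.
program enumerate_jobs
   implicit none
   character(len=5), parameter :: region(3) = (/ 'north', 'centr', 'south' /)
   integer :: r, m, y, njobs
   njobs = 0
   do r = 1, 3           ! region of Brazil
      do m = 1, 3        ! ensemble member (starting date)
         do y = 1, 10    ! one-year slice of the 10-year climatology
            njobs = njobs + 1
            print *, 'job', njobs, ': region ', region(r), &
                     ', member', m, ', year', y
         end do
      end do
   end do
end program enumerate_jobs
```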

Page 26: Domain Partition

40 km resolution, but uneven grid sizes and computational effort

Page 27: Results

1991/93 (7 days), 1994/96 (5 days), 1997/99 (4.5 days)

2.6× speed-up

Page 28: Future Plans - I

Formal procurement of a 1,000-processor machine, for research purposes

IA32 processors, fast interconnect; proposals due Nov 6th

Goal: "massively parallel" versions of the Global Model, CATT-BRAMS, mesoscale models (at least BRAMS), Local Ensemble Kalman Filter based data assimilation, OLAM, …

Page 29: Future Plans - II

Central computing facility replacement: scheduled for 2007/2008, depending upon funding; in the 20 to 40 TFlops range

Production goals: higher-resolution AGCM (20 km), higher-resolution environmental model, higher-resolution mesoscale models, Kalman-filter-based data assimilation, climate change

Page 30

THANK YOU