Automatic Optimisation of Parallel Linear Algebra Routines in Systems with Variable Load

Javier Cuenca, Domingo Giménez, José González, Jack Dongarra, Kenneth Roche


Page 1: Automatic Optimisation of Parallel Linear Algebra Routines in Systems with Variable Load

Automatic Optimisation of Parallel Linear Algebra Routines in Systems with Variable Load

Javier Cuenca, Domingo Giménez, José González

Jack Dongarra, Kenneth Roche

Page 2: Automatic Optimisation of Parallel Linear Algebra Routines in Systems with Variable Load

Optimisation of Linear Algebra Routines

• Traditional method: hand-optimisation for each platform
  › Time-consuming
  › Incompatible with hardware evolution
  › Incompatible with changes in the system (architecture and basic libraries)
  › Unsuitable for systems with variable load
  › Misuse by non-expert users

Page 3: Automatic Optimisation of Parallel Linear Algebra Routines in Systems with Variable Load

Our Approach

DESIGN: Modelling the Linear Algebra Routine (LAR):

Texec = f(SP, AP, n)

SP: System Parameters
AP: Algorithmic Parameters
n: Problem size

INSTALLATION: Estimation of SP

RUN-TIME: Selection of AP values, then Execution of LAR

Page 5: Automatic Optimisation of Parallel Linear Algebra Routines in Systems with Variable Load

Our Approach

LARs
  • Jacobi methods for the symmetric eigenvalue problem
  • Gauss elimination
  • LU factorisation
  • QR factorisation

Platforms
  • Cluster of Workstations
  • Cluster of PCs
  • SGI Origin 2000
  • IBM SP2

Static Model of LAR: situation of the platform at installation time.

Dynamic Model of LAR: situation of the platform at run-time.

Page 6: Automatic Optimisation of Parallel Linear Algebra Routines in Systems with Variable Load

DESIGN PROCESS

LAR: Linear Algebra Routine, made by the LAR Designer.

Example of LAR: Parallel Block LU factorisation

Page 8: Automatic Optimisation of Parallel Linear Algebra Routines in Systems with Variable Load

Modelling the LAR

[Flow diagram, DESIGN phase: LAR → Modelling the LAR → MODEL]

MODEL: Texec = f(SP, AP, n)

SP: System Parameters
AP: Algorithmic Parameters
n: Problem size

Made by the LAR Designer, only once per LAR.

Page 9: Automatic Optimisation of Parallel Linear Algebra Routines in Systems with Variable Load

Modelling the LAR

MODEL for the LAR Parallel Block LU factorisation:

SP: k3, k2, ts, tw
AP: p, b
n: Problem size
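The slide names the parameters of the model, but the extracted text does not carry the formula itself. As a rough illustration of what a function Texec = f(SP, AP, n) can look like for a block LU factorisation, the Python sketch below uses a deliberately simplified cost expression. The cost terms, the reading of k3 and k2 as per-operation costs of level-3 and level-2 BLAS kernels, the reading of ts as a message start-up time, and the names `SystemParams` and `texec_lu` are all assumptions of this sketch, not the authors' model.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class SystemParams:
    k3: float  # assumed: cost of one arithmetic operation in level-3 BLAS kernels
    k2: float  # assumed: cost of one arithmetic operation in level-2 BLAS kernels
    ts: float  # assumed: start-up time of one message
    tw: float  # word-sending time


def texec_lu(sp: SystemParams, n: int, p: int, b: int) -> float:
    """Illustrative (not the authors') estimate of Texec = f(SP, AP, n)."""
    # Arithmetic part: bulk of the work in level-3 kernels, panel work in level-2.
    arith = (2.0 / 3.0) * n**3 / p * sp.k3 + n**2 * b / p * sp.k2
    if p == 1:
        return arith  # a single process does not communicate
    # Communication part: one round of messages per block step plus data volume.
    steps = n / b
    comm = steps * sqrt(p) * sp.ts + n**2 / sqrt(p) * sp.tw
    return arith + comm
```

With a function of this shape, changing p or b changes the balance between the arithmetic and the communication terms, which is what makes the choice of AP values depend on the SP values.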

Page 11: Automatic Optimisation of Parallel Linear Algebra Routines in Systems with Variable Load

Implementation of SP-Estimators

[Flow diagram, DESIGN phase: LAR → Modelling the LAR → MODEL → Implementation of SP-Estimators → SP-Estimators]

Estimators of Arithmetic-SP: computation kernel of the LAR
  • Similar storage scheme
  • Similar quantity of data

Estimators of Communication-SP: communication kernel of the LAR
  • Similar kind of communication
  • Similar quantity of data
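An arithmetic SP-estimator of the kind described above can be sketched by timing a small kernel that resembles the computation in the LAR and dividing the time by its operation count. The sketch below times a naive b x b matrix product in pure Python, whereas a real estimator would call the platform's BLAS kernel with the same storage scheme and data size as the LAR. The name `estimate_k3` and the 2·b³ operation count are assumptions of this sketch.

```python
import time


def matmul(a, b_mat):
    """Naive square matrix product, standing in for the LAR's computation kernel."""
    cols = list(zip(*b_mat))  # columns of the right operand
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def estimate_k3(b: int, repeats: int = 3) -> float:
    """Return the measured time per arithmetic operation for block size b."""
    a = [[float(i + j) for j in range(b)] for i in range(b)]
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        matmul(a, a)
        best = min(best, time.perf_counter() - start)
    return best / (2.0 * b**3)  # about 2*b^3 operations in a b x b product
```

Taking the best of a few repetitions is one simple way to reduce the influence of background activity on the installation-time measurement.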

Page 12: Automatic Optimisation of Parallel Linear Algebra Routines in Systems with Variable Load

INSTALLATION PROCESS

[Flow diagram: DESIGN phase (LAR, Modelling the LAR, MODEL, Implementation of SP-Estimators, SP-Estimators) followed by the INSTALLATION phase]

Installation process: only once per platform, done by the System Manager.

Page 14: Automatic Optimisation of Parallel Linear Algebra Routines in Systems with Variable Load

Estimation of Static-SP

[Flow diagram, INSTALLATION phase: SP-Estimators, Basic Libraries and Installation-File → Estimation of Static-SP → Static-SP-File]

Basic Libraries
  • Basic Communication Library: MPI, PVM
  • Basic Linear Algebra Library: reference-BLAS, machine-specific-BLAS, ATLAS

Installation File
  • SP values are obtained using the information (n and AP values) of this file.

Page 15: Automatic Optimisation of Parallel Linear Algebra Routines in Systems with Variable Load

Estimation of Static-SP

Platform: Cluster of Pentium III + Fast Ethernet

Basic Libraries: ATLAS and MPI

Estimation of the Static-SP tw-static (in sec)

Message size (Kbytes)   32      256     1024    2048
tw-static               0.700   0.690   0.680   0.675

Estimation of the Static-SP k3-static (in sec)

Block size   16       32       64       128
k3-static    0.0038   0.0033   0.0030   0.0027
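The two tables above are, in effect, the contents of a Static-SP-File: SP values measured at a few sample points. A minimal sketch of how such a file could be queried at run-time is given below. The values are copied from the tables; the nearest-point lookup policy and the names `TW_STATIC`, `K3_STATIC` and `lookup` are assumptions of this sketch, since the slides do not say how intermediate sizes are handled.

```python
# Values copied from the tables above (Pentium III cluster, ATLAS and MPI).
TW_STATIC = {32: 0.700, 256: 0.690, 1024: 0.680, 2048: 0.675}  # message size (Kbytes) -> tw-static
K3_STATIC = {16: 0.0038, 32: 0.0033, 64: 0.0030, 128: 0.0027}  # block size -> k3-static


def lookup(table: dict, key: float) -> float:
    """Return the value measured at the sample point closest to `key` (assumed policy)."""
    nearest = min(table, key=lambda measured: abs(measured - key))
    return table[nearest]
```

For example, `lookup(K3_STATIC, 64)` returns the value measured for block size 64, and a block size of 100 falls back to the entry for 128.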

Page 16: Automatic Optimisation of Parallel Linear Algebra Routines in Systems with Variable Load

RUN-TIME PROCESS

[Flow diagram: DESIGN phase (LAR, Modelling the LAR, MODEL, Implementation of SP-Estimators, SP-Estimators), INSTALLATION phase (Basic Libraries, Installation-File, Estimation of Static-SP, Static-SP-File), followed by the RUN-TIME phase]

Page 18: Automatic Optimisation of Parallel Linear Algebra Routines in Systems with Variable Load

RUN-TIME PROCESS: Static approach

[Flow diagram, RUN-TIME phase: MODEL and Static-SP-File → Selection of Optimum AP → Optimum-AP → Execution of LAR]
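In the static approach, the selection step amounts to evaluating the model with the SP values from the Static-SP-File for each candidate AP combination and keeping the cheapest one. The sketch below does this by exhaustive search over candidate values of p and b. The search strategy and the name `select_optimum_ap` are assumptions of this sketch; the slides do not state how the optimum is located.

```python
from itertools import product


def select_optimum_ap(model, sp, n, p_candidates, b_candidates):
    """Return the (p, b) pair that minimises model(sp, n, p, b)."""
    return min(
        product(p_candidates, b_candidates),
        key=lambda ap: model(sp, n, ap[0], ap[1]),
    )
```

Any callable with the signature `model(sp, n, p, b)` can be plugged in, so the same selection routine can serve every LAR that has a model.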

Page 19: Automatic Optimisation of Parallel Linear Algebra Routines in Systems with Variable Load

RUN-TIME PROCESS: Dynamic Approach

[Flow diagram: DESIGN and INSTALLATION phases completed; RUN-TIME phase still empty]

Page 20: Automatic Optimisation of Parallel Linear Algebra Routines in Systems with Variable Load

Call to NWS

[Flow diagram, RUN-TIME phase: Call to NWS → NWS Information]

Page 21: Automatic Optimisation of Parallel Linear Algebra Routines in Systems with Variable Load

Call to NWS

The NWS (Network Weather Service) is called and it reports:
  • the fraction of available CPU (fCPU)
  • the current word-sending time (tw-current) for specific n and AP values (n0, AP0).

Then the fraction of available network is calculated:

[Equation not preserved in the extracted text]
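The equation for the fraction of available network did not survive extraction. One plausible reading, stated here purely as an assumption, is the ratio between the word-sending time measured at installation and the one currently reported, both taken for the same (n0, AP0). The name `network_fraction` and the cap at 1 are also choices of this sketch.

```python
def network_fraction(tw_static: float, tw_current: float) -> float:
    """Assumed definition: f_net = tw-static / tw-current, capped at 1."""
    if tw_static <= 0 or tw_current <= 0:
        raise ValueError("word-sending times must be positive")
    return min(1.0, tw_static / tw_current)
```

Under this reading, an unloaded network gives a fraction of 1 and a network twice as slow as at installation gives 0.5.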

Page 23: Automatic Optimisation of Parallel Linear Algebra Routines in Systems with Variable Load

Dynamic Adjustment of SP

[Flow diagram, RUN-TIME phase: Static-SP-File and NWS Information → Dynamic Adjustment of SP → Current-SP]

Page 24: Automatic Optimisation of Parallel Linear Algebra Routines in Systems with Variable Load

Dynamic Adjustment of SP

[Flow diagram, RUN-TIME phase: Static-SP-File and NWS Information → Dynamic Adjustment of SP → Current-SP]

The values of the SP are adjusted according to the current situation:

[Equation not preserved in the extracted text]
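The adjustment formula itself is missing from the extracted text. A simple rule that is consistent with the quantities named on the slides, and that is only an assumption here, is to divide the arithmetic parameters by the fraction of available CPU and the communication parameters by the fraction of available network. The name `adjust_sp` and the grouping of k3 and k2 as arithmetic parameters are hypothetical.

```python
def adjust_sp(static_sp: dict, f_cpu: float, f_net: float) -> dict:
    """Scale static SP values to the current load (assumed scaling rule)."""
    if not (0 < f_cpu <= 1 and 0 < f_net <= 1):
        raise ValueError("fractions must lie in (0, 1]")
    arithmetic = {"k3", "k2"}  # assumed to depend on the available CPU
    return {
        name: value / (f_cpu if name in arithmetic else f_net)
        for name, value in static_sp.items()
    }
```

With both fractions equal to 1 the Current-SP values coincide with the Static-SP values, so the static approach appears as a special case of the dynamic one.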

Page 26: Automatic Optimisation of Parallel Linear Algebra Routines in Systems with Variable Load

Selection of Optimum AP

[Flow diagram, RUN-TIME phase: MODEL and Current-SP → Selection of Optimum AP → Optimum-AP]

Page 27: Automatic Optimisation of Parallel Linear Algebra Routines in Systems with Variable Load

Execution of LAR

[Complete flow diagram.
DESIGN: LAR → Modelling the LAR → MODEL → Implementation of SP-Estimators → SP-Estimators.
INSTALLATION: SP-Estimators, Basic Libraries and Installation-File → Estimation of Static-SP → Static-SP-File.
RUN-TIME: Call to NWS → NWS Information → Dynamic Adjustment of SP → Current-SP → Selection of Optimum AP → Optimum-AP → Execution of LAR]

Page 28: Automatic Optimisation of Parallel Linear Algebra Routines in Systems with Variable Load

Platform load: different situations studied

Situation  Measure      nodo1  nodo2  nodo3  nodo4  nodo5  nodo6  nodo7  nodo8
A          CPU avail.   100%   100%   100%   100%   100%   100%   100%   100%
           tw-current   0.7sec (all nodes)
B          CPU avail.   80%    80%    80%    80%    100%   100%   100%   100%
           tw-current   0.8sec (nodo1 to nodo4), 0.7sec (nodo5 to nodo8)
C          CPU avail.   60%    60%    60%    60%    100%   100%   100%   100%
           tw-current   1.8sec (nodo1 to nodo4), 0.7sec (nodo5 to nodo8)
D          CPU avail.   60%    60%    60%    60%    100%   100%   80%    80%
           tw-current   1.8sec (nodo1 to nodo4), 0.7sec (nodo5, nodo6), 0.8sec (nodo7, nodo8)
E          CPU avail.   60%    60%    60%    60%    100%   100%   50%    50%
           tw-current   1.8sec (nodo1 to nodo4), 0.7sec (nodo5, nodo6), 4.0sec (nodo7, nodo8)

Page 29: Automatic Optimisation of Parallel Linear Algebra Routines in Systems with Variable Load

Optimum AP for the different situations studied

Block size, by situation of the platform load

n      A    B    C    D    E
1024   32   32   64   64   64
2048   64   64   64   128  128
3072   64   64   128  128  128

Number of nodes to use (p = r × c), by situation of the platform load

n      A     B     C     D     E
1024   4×2   4×2   2×2   2×2   2×1
2048   4×2   4×2   2×2   2×2   2×1
3072   4×2   4×2   2×2   2×2   2×1

Page 30: Automatic Optimisation of Parallel Linear Algebra Routines in Systems with Variable Load

Experimental Time: deviations from the Optimum (n = 1024)

[Bar chart: deviation from the optimum, 0% to 100%, for the situations of platform load A to E; series: Static Model and Dynamic Model]

Page 31: Automatic Optimisation of Parallel Linear Algebra Routines in Systems with Variable Load

Experimental Time: deviations from the Optimum (n = 2048)

[Bar chart: deviation from the optimum, 0% to 160%, for the situations of the platform load A to E; series: Static Model and Dynamic Model]

Page 32: Automatic Optimisation of Parallel Linear Algebra Routines in Systems with Variable Load

Experimental Time: deviations from the Optimum (n = 3072)

[Bar chart: deviation from the optimum, 0% to 100%, for the situations of the platform load A to E; series: Static Model and Dynamic Model]
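The charts report deviations from the optimum as percentages, but the slides do not define the measure. A natural definition, assumed here, is the relative excess of the execution time obtained with the selected AP values over the best execution time found experimentally. The name `deviation_from_optimum` is hypothetical.

```python
def deviation_from_optimum(t_measured: float, t_optimum: float) -> float:
    """Percentage by which a measured time exceeds the optimum (assumed definition)."""
    if t_optimum <= 0:
        raise ValueError("optimum time must be positive")
    return 100.0 * (t_measured - t_optimum) / t_optimum
```

Under this definition a run that takes 15 time units when the best possible is 10 has a deviation of 50%, and values above 100% mean the run took more than twice the optimum time.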

Page 33: Automatic Optimisation of Parallel Linear Algebra Routines in Systems with Variable Load

Conclusions and Future Work

• The proposed methodology is viable both in systems where the load is stable and in systems where it is variable.

• Software like NWS is suitable for adjusting the system parameter values obtained at installation time.

• The heterogeneous load case offers many more possibilities than the one studied.