The Jacquard Programming Environment Mike Stewart NUG User Training, 10/3/05

The Jacquard Programming Environment

Mike Stewart

NUG User Training, 10/3/05


Outline

• Compiling and Linking.
• Optimization.
• Libraries.
• Debugging.
• Porting from Seaborg and other systems.


Pathscale Compilers

• Default compilers: Pathscale Fortran 90, C, and C++.

• Module “path” is loaded by default and points to the current default version of the Pathscale compilers (currently 2.2.1).

• Other versions available: module avail path.

• Extensive vendor documentation available on-line at http://pathscale.com/docs.html.

• Commercial product: well supported and optimized.


Compiling Code

• Compiler invocation:
  – No MPI: pathf90, pathcc, pathCC.
  – MPI: mpif90, mpicc, mpicxx.

• The MPI compiler invocations use the currently loaded compiler version.

• The MPI and non-MPI compiler invocations accept the same options and arguments.


Compiler Optimization Options

• 4 numeric levels, -On, where n ranges from 0 (no optimization) to 3.

• Default level: -O2 (unlike IBM).

• -g without a -O option changes the default to -O0.


-O1 Optimization

• Minimal impact on compilation time compared to a -O0 compile.

• Applies only optimizations that work on straight-line code (basic blocks), such as instruction scheduling.


-O2 Optimization

• Default when no optimization arguments given.

• Optimizations that always increase performance.

• Can significantly increase compilation time.

• -O2 optimization examples:
  – Loop nest optimization.
  – Global optimization within a function scope.
  – 2 passes of instruction scheduling.
  – Dead code elimination.
  – Global register allocation.


-O3 Optimization

• More extensive optimizations that may in some cases slow down performance.

• Optimizes loop nests rather than just inner loops, i.e. interchanges loop indices, etc.

• “Safe” optimizations: produces answers identical to those produced by -O0.

• NERSC recommendation, based on experience with benchmarks.


-Ofast Optimization

• Equivalent to -O3 -ipa -fno-math-errno -OPT:roundoff=2:Olimit=0:div_split=ON:alias=typed.

• ipa: interprocedural analysis.
  – Optimizes across function boundaries.
  – Must be specified both at compile and link time.

• Aggressive “unsafe” optimizations:
  – Changes order of evaluation.
  – Deviates from IEEE 754 standard to obtain better performance.

• There are some known problems with this level of optimization in the current release, 2.2.1.
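
Why a changed order of evaluation counts as “unsafe” can be seen in a small C sketch. It is an illustration only, not taken from the Pathscale documentation: the function names and the operand values are made up for the example, and the volatile temporaries are there solely to stop a compiler from regrouping the sums itself.

```c
#include <stdio.h>

/* Sum three doubles left to right: (a + b) + c.
   The volatile temporary keeps the compiler from reassociating the sum. */
double sum_left_to_right(double a, double b, double c)
{
    volatile double t = a + b;
    return t + c;
}

/* Same operands regrouped as a + (b + c), one ordering that an
   optimizer permitted to reassociate might choose. */
double sum_regrouped(double a, double b, double c)
{
    volatile double t = b + c;
    return a + t;
}

/* Print both groupings for operands of very different magnitude. */
void show_reordering(void)
{
    double a = 1.0e20, b = -1.0e20, c = 1.0;

    printf("(a + b) + c = %g\n", sum_left_to_right(a, b, c)); /* prints 1 */
    printf("a + (b + c) = %g\n", sum_regrouped(a, b, c));     /* prints 0 */
}
```

Calling show_reordering() from main gives two different answers for the same three operands, because 1.0 is lost when it is added to -1.0e20 first. Codes that are sensitive to this kind of rounding difference are the ones most likely to change their answers under -Ofast.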


NAS B Serial Benchmarks Performance (MOP/S)

Benchmark  Seaborg Best    -O0    -O1    -O2    -O3  -Ofast
BT                 99.6  157.2  348.1  633.6  739.8  750.9
CG                 46.3  101.2  128.3  236.9  223.1  224.5
EP                  3.7   15.1   17.5   21.9   21.8  21.8
FT                130.1  186.2  231.5  572.4  592.7  did not compile
IS                  5.8   16.9   22.0   25.6   27.0  26.8
LU                169.8  129.0  342.4  700.0  809.9  903.2
MG                163.3  109.0  257.9  747.7  518.5  530.0
SP                 78.2  104.7  225.7  507.3  462.9  516.6


NAS B Serial Benchmarks Compile Times (seconds)

Benchmark   -O0   -O1   -O2   -O3  -Ofast
BT          2.1   9.0   4.9   9.1  30.7
CG          0.4   0.4   0.7   0.9  1.5
EP          0.4   0.4   0.5   0.6  0.9
FT          0.4   0.5   0.8   1.5  did not compile
IS          0.3   0.4   0.4   0.7  0.9
LU          2.1   4.2   5.7  11.4  17.4
MG          0.5   0.7   1.1   2.2  2.9
SP          1.6   2.0   3.2  10.0  14.4


NAS B Optimization Arguments Used by LNXI Benchmarkers

Benchmark Arguments

BT -O3 -ipa -WOPT:aggstr=off

CG -O3 -ipa -CG:use_movlpd=on -CG:movnti=1

EP -LNO:fission=2 -O3 -LNO:vintr=2

FT -O3 -LNO:opt=0

IS -Ofast -DUSE_BUCKETS

LU -Ofast -LNO:fusion=2:prefetch=0:full_unroll=10:ou_max=5 -OPT:ro=3:fold_unsafe_relops=on:fold_unsigned_relops=on:unroll_size=256:unroll_times_max=16:fast_complex -CG:cflow=off:p2align_freq=1 -fno-exceptions

MG -O3 -ipa -WOPT:aggstr=off -CG:movnti=0

SP -Ofast


NAS C FT (32 Proc)

Optimization  Mops/Proc  Compile Time (seconds)
Seaborg Best       86.5  N/A
-O0               148.8  0.7
-O1               180.6  0.9
-O2               356.5  1.4
-O3               347.4  2.4
-Ofast            346.0  3.4


SuperLU MPI Benchmark

• Based on the SuperLU general purpose library for the direct solution of large, sparse, nonsymmetric systems of linear equations.

• Mostly C with some Fortran 90 routines.

• Run on 64 processors/32 nodes.

• Uses BLAS routines from ACML.


SLU (64 procs)

Optimization  Elapsed run time (seconds)  Compile Time (seconds)
Seaborg Best                       742.5  N/A
-O0                                276.7  5.8
-O1                                241.5  7.1
-O2                                213.5  10.6
-O3                                212.1  14.6
-Ofast                               N/A  Did not compile


Jacquard Applications Acceptance Benchmarks

Benchmark                  Seaborg (sec)  Jacquard (sec)  Jacquard Optimizations
NAMD (32 proc)                      2384             554  -O3 -ipa -fno-exceptions
Chombo Serial                       1036             138  -O3 -OPT:Ofast -OPT:Olimit=80000 -fno-math-errno -finline
Chombo Parallel (32 proc)            773             161  -O3 -OPT:Ofast -OPT:Olimit=80000 -fno-math-errno -finline
CAM Serial                        1174.4             264  -O2
CAM Parallel (32 proc)                75            13.2  -O2
SuperLU (64 proc)                  742.5             212  -O3 -OPT:Ofast -fno-math-errno
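
The acceptance numbers are easier to compare as speedups. The following C sketch divides each Seaborg time by the matching Jacquard time, on the assumption that the Jacquard column is also in seconds (the slide only labels the Seaborg values). The struct, array, and function names are made up for the example.

```c
#include <stdio.h>

/* Speedup of Jacquard over Seaborg as a ratio of elapsed times.
   Returns 0.0 for a non-positive Jacquard time, treated here as invalid. */
double speedup(double seaborg_seconds, double jacquard_seconds)
{
    if (jacquard_seconds <= 0.0)
        return 0.0;
    return seaborg_seconds / jacquard_seconds;
}

/* One row of the acceptance benchmark table. */
struct benchmark_result {
    const char *name;
    double seaborg_seconds;
    double jacquard_seconds; /* assumed to be seconds, like the Seaborg column */
};

static const struct benchmark_result acceptance_results[] = {
    { "NAMD (32 proc)",            2384.0, 554.0 },
    { "Chombo Serial",             1036.0, 138.0 },
    { "Chombo Parallel (32 proc)",  773.0, 161.0 },
    { "CAM Serial",                1174.4, 264.0 },
    { "CAM Parallel (32 proc)",      75.0,  13.2 },
    { "SuperLU (64 proc)",          742.5, 212.0 },
};

/* Print one speedup per benchmark row. */
void print_speedups(void)
{
    size_t count = sizeof acceptance_results / sizeof acceptance_results[0];
    size_t i;

    for (i = 0; i < count; i++)
        printf("%-26s %.1fx\n", acceptance_results[i].name,
               speedup(acceptance_results[i].seaborg_seconds,
                       acceptance_results[i].jacquard_seconds));
}
```

Under that assumption the ratios range from roughly 3.5x for SuperLU to roughly 7.5x for Chombo Serial.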


ACML Library

• AMD Core Math Library: a set of numerical routines tuned specifically for AMD64 platform processors.
  – BLAS
  – LAPACK
  – FFT

• To use with pathscale:
  – module load acml (built with pathscale compilers)
  – Compile and link with $ACML

• To use with gcc:
  – module load acml_gcc (built with pathscale compilers)
  – Compile and link with $ACML


Matrix Multiply Optimization Example

• 3 ways to multiply 2 dense matrices:
  – Directly in Fortran with nested loops
  – Matmul F90 intrinsic
  – dgemm from ACML

• Example: two 1000 by 1000 double precision matrices.

• Order of indices: ijk means
  – do i=1,n
    – do j=1,n
      – do k=1,n
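
The idea of loop order can be sketched in C for two of the six orderings. This is an illustration rather than the benchmark code from the talk: the function names are made up, and the matrices are stored as flat row-major arrays. C is row-major while Fortran is column-major, so the ordering that gives unit-stride access in the inner loop is ikj here, whereas in Fortran it would be jki.

```c
#include <stddef.h>

/* c = a * b for n-by-n row-major matrices, loops nested i, j, k.
   The inner loop strides through b by n elements at a time. */
void matmul_ijk(size_t n, const double *a, const double *b, double *c)
{
    size_t i, j, k;

    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            double sum = 0.0;
            for (k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Same product with loops nested i, k, j.
   The inner loop now walks b and c with unit stride. */
void matmul_ikj(size_t n, const double *a, const double *b, double *c)
{
    size_t i, j, k;

    for (i = 0; i < n * n; i++)
        c[i] = 0.0;

    for (i = 0; i < n; i++) {
        for (k = 0; k < n; k++) {
            double aik = a[i * n + k];
            for (j = 0; j < n; j++)
                c[i * n + j] += aik * b[k * n + j];
        }
    }
}
```

Both functions compute the same product; only the memory access pattern differs, which is what a loop nest optimizer is free to change on the programmer's behalf.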

Page 19: The Jacquard Programming Environment Mike Stewart NUG User Training, 10/3/05

19

Fortran Matrix Multiply MFLOPs

Order   Seaborg Best   -O0   -O1   -O2   -O3  -Ofast
ijk              693    65   117   148  1640    1706
jik              691    63   100   139  1640    1802
ikj              691    53    51    52  1471    1579
kij              691    48    48    53  1619    1706
jki              691    72   236   598  1153    1802
kji              691    72   183   385  1599    1706
matmul           946   561   554   564  1683    1706
dgemm           1310  3876  3763  3877  3877    3877
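
The slides do not say how the MFLOPs figures were derived. A common convention, assumed here, counts 2n³ floating-point operations for an n by n dense multiply (n multiplies and n adds for each of the n² results) and divides by the elapsed time. The function names below are made up for the sketch.

```c
/* Floating-point operations in an n-by-n dense matrix multiply,
   assuming the usual count of n multiplies and n adds per result. */
double matmul_flops(double n)
{
    return 2.0 * n * n * n;
}

/* Rate in MFLOPs for an n-by-n multiply that took the given time.
   Returns 0.0 if the elapsed time is not positive. */
double matmul_mflops(double n, double elapsed_seconds)
{
    if (elapsed_seconds <= 0.0)
        return 0.0;
    return matmul_flops(n) / elapsed_seconds / 1.0e6;
}
```

Under this convention a 1000 by 1000 multiply is 2 × 10⁹ operations, so a rate of 1706 MFLOPs would correspond to about 1.17 seconds of run time.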


Debugging

• Etnus Totalview debugger has been installed on the system.

• Still in testing mode, but it should be available to users soon.


Porting codes

• Jacquard is a Linux system, so GNU tools like gmake are the defaults.

• Pathscale compilers are good, but new, so please report any evident compiler bugs to consult.