AMD intel - PGI Compilers and Tools · 2006-11-20


AMD

intel


AMD vs. Intel

The Compiler as Referee

• pgf95 -tp k8-64

• pgf95 -tp p7-64

[Chart: Parallel Ocean Program, relative performance (axis 75 to 100) on Woodcrest and Opteron; series: -tp k8-64, -tp p7-64]


AMD vs. Intel

The Compiler as Referee

• pgf95 -tp k8-64

• pgf95 -tp x64

• pgf95 -tp p7-64

[Chart: Parallel Ocean Program, relative performance (axis 75 to 100) on Woodcrest and Opteron; series: -tp k8-64, -tp x64, -tp p7-64]


PGI Unified Binary™

• Generate two versions of a function

• Select at run time which to run

• simple method

-tp x64

• targeted method

-tp k8-64,p7-64,core2-64

• tuned method

#pragma pgi routine tp k8-64 core2-64
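The two steps on this slide (generate two versions of a function, select at run time which to run) can be sketched by hand in C. This is only an illustration of the idea, not the code the PGI compiler emits: the names `dot_amd64`, `dot_em64t`, and `select_dot` are hypothetical, the two bodies are hand-written stand-ins for compiler-tuned code, and the vendor string is passed in as a parameter rather than read from the processor. The strings `"AuthenticAMD"` and `"GenuineIntel"` are the vendor identifiers that the CPUID instruction reports.

```c
#include <string.h>

/* Signature shared by every version of the routine. */
typedef double (*dot_fn)(const double *a, const double *b, int n);

/* Hypothetical stand-in for the AMD64-tuned version: a plain loop. */
static double dot_amd64(const double *a, const double *b, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Hypothetical stand-in for the EM64T-tuned version: unrolled by two. */
static double dot_em64t(const double *a, const double *b, int n) {
    double s0 = 0.0, s1 = 0.0;
    int i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += a[i] * b[i];
        s1 += a[i + 1] * b[i + 1];
    }
    if (i < n)              /* odd-length tail */
        s0 += a[i] * b[i];
    return s0 + s1;
}

/* Run-time selection: pick a version from the CPU vendor string.
   Assumption: the caller obtains the vendor string elsewhere. */
static dot_fn select_dot(const char *vendor) {
    if (strcmp(vendor, "GenuineIntel") == 0)
        return dot_em64t;
    return dot_amd64;       /* "AuthenticAMD" and anything unrecognized */
}
```

A caller would write something like `select_dot(vendor)(a, b, n)`. Both versions must return the same result for the same input; only the instruction sequence differs.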


SPEC FP2006 Relative Performance

[Chart: relative performance on Opteron, axis 80 to 100; series: -tp k8-64, -tp p7-64; benchmarks: 410 bwaves, 416 gamess, 433 milc, 434 zeusmp, 435 gromacs, 436 cactusadm, 437 leslie3d, 444 namd, 447 dealII, 450 soplex, 453 povray, 454 calculix, 459 gemsFDTD, 465 tonto, 470 lbm, 481 wrf, 482 sphinx3]


SPEC FP2006 Relative Performance

[Chart: relative performance on Opteron, axis 80 to 100; series: -tp k8-64, -tp x64, -tp p7-64; benchmarks: 410 bwaves, 416 gamess, 433 milc, 434 zeusmp, 435 gromacs, 436 cactusadm, 437 leslie3d, 444 namd, 447 dealII, 450 soplex, 453 povray, 454 calculix, 459 gemsFDTD, 465 tonto, 470 lbm, 481 wrf, 482 sphinx3]


SPEC FP2006 Relative Performance

[Chart: relative performance on Woodcrest, axis 0 to 100; series: -tp k8-64, -tp p7-64; benchmarks: 410 bwaves, 416 gamess, 433 milc, 434 zeusmp, 435 gromacs, 436 cactusadm, 437 leslie3d, 444 namd, 447 dealII, 450 soplex, 453 povray, 454 calculix, 459 gemsFDTD, 465 tonto, 470 lbm, 481 wrf, 482 sphinx3]


SPEC FP2006 Relative Performance

[Chart: relative performance on Woodcrest, axis 0 to 100; series: -tp k8-64, -tp x64, -tp p7-64; benchmarks: 410 bwaves, 416 gamess, 433 milc, 434 zeusmp, 435 gromacs, 436 cactusadm, 437 leslie3d, 444 namd, 447 dealII, 450 soplex, 453 povray, 454 calculix, 459 gemsFDTD, 465 tonto, 470 lbm, 481 wrf, 482 sphinx3]


Overhead and Cost

• Compile time

– from 10% to 90%

• Disk space

– from 10% to 70%

• Memory

– from 10% to 90%

• Run time

– selection cost

[Charts: first chart, axis 0 to 120, series %compile; second chart, axis 0 to 100, series space]


Overhead and Cost

• Compile time

– from 10% to 90%

• Disk space

– from 10% to 70%

• Memory

– from 10% to 90%

• Run time

– selection cost

[Charts: first chart, axis 0 to 120, series %compile and %functions; second chart, axis 0 to 100, series space]


Overhead and Cost

• Compile time

– from 10% to 90%

• Disk space

– from 10% to 70%

• Memory

– from 10% to 90%

• Run time

– selection cost

[Charts: first chart, axis 0 to 120, series %compile and %functions; second chart, axis 0 to 100, series space and .text]


Convergence or Divergence?

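[Diagram: processor features grouped under three headings, AMD64, Both, and EM64T. Feature labels: x86/x87, Add'l Regs, HyperThread, Shared Cache, SSE1/2/3, L2 Cache, Multi-Core, 64-bit, Hypertransport, AMD64 Arch, MMX, Clock Rate, Dual Issue SSE, Core2 Arch, Mobile, Embedded, MNI, L3 Cache]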


PGI Unified Binary™ Executables

• A single x64 binary including optimized code sequences for both AMD64 and EM64T/Core2

• SW-enabled binary compatibility

• Reduce development, tuning, manufacturing, maintenance costs

• Maximize flexibility for end-users to deploy applications across platforms

• Exploit the innovations of both AMD and Intel without fear of losing binary compatibility


Future Work

• Reduce I-cache and Virtual Memory pollution

• Tune with profile feedback

• Reduce function selection cost
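One generic way to keep function selection cheap is to resolve the choice once and cache it in a function pointer, so that later calls pay only an indirect call. The sketch below illustrates that general technique; it is not a description of what PGI does or plans to do. All names (`sum`, `resolve_sum`, `current_vendor`, `select_calls`) are hypothetical, and `current_vendor` is a fixed stand-in for a real processor query.

```c
#include <stddef.h>
#include <string.h>

typedef int (*sum_fn)(const int *x, int n);

/* Counts how often selection runs, for illustration only. */
static int select_calls = 0;

/* Hypothetical stand-ins for two processor-specific versions. */
static int sum_amd64(const int *x, int n) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += x[i];
    return s;
}

static int sum_em64t(const int *x, int n) {
    int s = 0;
    for (int i = n - 1; i >= 0; i--)   /* same result, different loop shape */
        s += x[i];
    return s;
}

/* Assumption: stands in for a CPUID vendor query. */
static const char *current_vendor = "AuthenticAMD";

/* The comparatively expensive step: decide which version fits this CPU. */
static sum_fn resolve_sum(const char *vendor) {
    select_calls++;
    return strcmp(vendor, "GenuineIntel") == 0 ? sum_em64t : sum_amd64;
}

/* Public entry point: resolve on first use, then reuse the cached pointer. */
static sum_fn cached_sum = NULL;

static int sum(const int *x, int n) {
    if (cached_sum == NULL)
        cached_sum = resolve_sum(current_vendor);
    return cached_sum(x, n);
}
```

After the first call to `sum`, `select_calls` stays at 1 no matter how many more calls follow. This version is not thread-safe; a real implementation would resolve at program start or guard the first-use check.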

The Portland Group, PGI Unified Binary and PGF95 are trademarks and PGI is a registered trademark of STMicroelectronics. Other brands and names are the property of their respective owners.