David Chase 2005-10-25 High-Productivity Languages for HPC: Compiler Challenges

High-Productivity Languages for HPC: Compiler Challenges

Page 1: High-Productivity Languages for HPC: Compiler Challenges

David Chase

2005-10-25

High-Productivity Languages for HPC: Compiler Challenges

Page 2: High-Productivity Languages for HPC: Compiler Challenges

Page

Compiler Challenges

Fortress

• New language
• Designed for productivity, high performance, abundant parallelism.

Contributors: Guy Steele, Jan-Willem Maessen, Eric Allen, David Chase, Sukyoung Ryu, Victor Luchangco, Christine Flood, Sam Tobin-Hochstadt, Yossi Lev, Cheryl McCosh, Joe Hallett, Carl Eastlund, Joao Dias

Page 3: High-Productivity Languages for HPC: Compiler Challenges

High productivity

• Speed of coding to scale (speed, fault tolerance)
• Ease of reuse
• Ease of debugging
• Ease of maintenance
• Portable performance (1P, CMT, SMP, NUMA, MPP)
• Ease of deployment (fragile dependence on DLLs?)
• Ease of system maintenance
• Larger pool of programmers
• Domain-specific extensions

Page 4: High-Productivity Languages for HPC: Compiler Challenges

High productivity features (that present compiler challenges)

• Garbage collection
• Transactional memory
• Fault tolerance
• Trustworthy compilers
• (Support for) cache-oblivious/work-stealing style
• Programming-by-contract
• Better “human factors”

Page 5: High-Productivity Languages for HPC: Compiler Challenges

GC, TM, FT, and the compiler

• GC and the compiler is pretty well understood
  > Compiler can help with safepoints, barrier optimization, logging optimization, and pointer maps.
  > Runtime and compiler can be co-designed.
  > Compiler must be aware of runtime, concurrency, and memory-model issues.
• GC, TM, and FT are similar in many ways
  > Make copies of data
  > Monitor reads and writes
  > Profit from locality information
  > Tend to use read/write barriers, logging, and safepoints
  > Can we combine these? How can the optimizer help?

Page 6: High-Productivity Languages for HPC: Compiler Challenges

Example: Card-mark design/optimization

• Generational GC uses write-barriers to enforce old-young partition. Pointers from old to young must be treated specially.

• Traditional software write-barrier maps “card” X to heap addresses [X*256, X*256+256).

• A pointer written to address Y requires mark of card Y/256.

• Garbage collection looks for dirty cards, finds corresponding objects, and records actual old-to-young pointers.
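
The card-marking scheme above can be sketched in Java roughly as follows. This is a minimal illustration, not the design of any particular collector: the class name `CardTable`, the byte-per-card layout, and the use of `long` values to stand in for heap addresses are assumptions made for the example. Only the 256-byte card size and the Y/256 mapping come from the slide.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical card table: one dirty byte per 256-byte card of a simulated heap. */
final class CardTable {
    static final int CARD_SHIFT = 8;   // 2^8 = 256 bytes per card, as on the slide
    private final byte[] cards;

    CardTable(long heapBytes) {
        // Round up so a partially filled last card still has an entry.
        cards = new byte[(int) ((heapBytes + 255) >>> CARD_SHIFT)];
    }

    /** Card index for address y: y / 256. */
    static int cardIndex(long y) {
        return (int) (y >>> CARD_SHIFT);
    }

    /** Write barrier: called after a pointer is stored at address y. */
    void mark(long y) {
        cards[cardIndex(y)] = 1;
    }

    /** GC side: list the dirty cards and clear them. A collector would then scan
     *  addresses [x*256, x*256+256) of each card x for old-to-young pointers. */
    List<Integer> drainDirtyCards() {
        List<Integer> dirty = new ArrayList<>();
        for (int x = 0; x < cards.length; x++) {
            if (cards[x] != 0) {
                dirty.add(x);
                cards[x] = 0;
            }
        }
        return dirty;
    }
}
```

The barrier itself is a shift and a byte store, which is why the later slides concentrate on removing whole barriers rather than making each one cheaper.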

Page 7: High-Productivity Languages for HPC: Compiler Challenges

Example: Card-marks and safe points

• Marks for multiple writes to the same address are redundant, provided no GC can intervene.

• (Non-concurrent) GC can only occur at safepoints; therefore, if two writes to Y are not separated by a safepoint, one card mark can be eliminated.

Before:

    o.f = p
    mark(&o.f)
    o.f = q
    mark(&o.f)

After:

    o.f = p
    o.f = q
    mark(&o.f)
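
One way a compiler might implement this elimination is a backward scan over a straight-line block. The sketch below is illustrative only: the string-based instruction format (`store <addr>`, `mark <addr>`, `safepoint`) and the class name `MarkElimination` are invented for the example and do not come from the talk.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical straight-line IR: "store <addr>", "mark <addr>", or "safepoint". */
final class MarkElimination {
    /** Removes every card mark that is followed, with no safepoint in between,
     *  by another mark of the same address. */
    static List<String> eliminate(List<String> block) {
        List<String> kept = new ArrayList<>();
        // Addresses already marked later in the current safepoint-free region.
        Set<String> markedLater = new HashSet<>();
        // Walk backwards so the last mark in each region is the one that survives.
        for (int i = block.size() - 1; i >= 0; i--) {
            String insn = block.get(i);
            if (insn.equals("safepoint")) {
                markedLater.clear();          // a GC may run here; earlier marks are needed
            } else if (insn.startsWith("mark ")) {
                if (!markedLater.add(insn.substring(5))) {
                    continue;                 // redundant: a later mark covers it
                }
            }
            kept.add(insn);
        }
        Collections.reverse(kept);
        return kept;
    }
}
```

Applied to the four-instruction sequence on the slide, the pass keeps both stores and only the final mark. A real compiler would work on its own IR and would need alias information to decide when two addresses are the same.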

Page 8: High-Productivity Languages for HPC: Compiler Challenges

Example: Card-marks, per-object

• Scanning often requires access to object header; might as well scan the whole object. Objects (unlike arrays) are usually small-ish.

• Marks for writes to different fields can be redundant, provided no safepoint intervenes.

Before:

    o.f1 = p
    mark(&o.f1)
    o.f2 = q
    mark(&o.f2)

After:

    o.f1 = p
    o.f2 = q
    mark(&o)

Page 9: High-Productivity Languages for HPC: Compiler Challenges

Example: Card-mark loop optimization

• When scanning cards, extend the range of addresses if the ending object is an array of pointers: card X then maps to [X*256-64, X*256+256).

• In the compiler, make writeBarrier( A[i] ) redundant with writeBarrier( A[i+K] ) for 0 <= K < 8.

• When a loop writing into an array of objects is unrolled by 8, all but one of the write-barriers are removed (provided no safepoint intervenes).
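
The arithmetic behind this trick can be checked mechanically. The sketch below assumes 8-byte pointer slots, so that 8 array elements span the 64-byte extension. That slot size is an inference from the numbers on the slide, not something the slide states, and the class and method names are hypothetical.

```java
/** Checks the arithmetic behind the unroll-by-8 trick under assumed 8-byte pointer slots. */
final class ExtendedCardCheck {
    static final int CARD_BYTES = 256;   // from the slide
    static final int EXTENSION = 64;     // from the slide: scanning starts 64 bytes early
    static final int SLOT_BYTES = 8;     // assumption: 64-bit pointers

    /** True if addr is scanned when card x is dirty: [x*256-64, x*256+256). */
    static boolean scannedBy(long x, long addr) {
        return addr >= x * CARD_BYTES - EXTENSION && addr < x * CARD_BYTES + CARD_BYTES;
    }

    /** True if marking only the card of A[i+k] also covers a write to A[i]. */
    static boolean covered(long addrOfAi, int k) {
        long markedCard = (addrOfAi + (long) k * SLOT_BYTES) / CARD_BYTES;
        return scannedBy(markedCard, addrOfAi);
    }

    /** Exhaustively checks every slot-aligned address below limit for a given k. */
    static boolean holdsForAll(long limit, int k) {
        for (long a = 0; a < limit; a += SLOT_BYTES) {
            if (!covered(a, k)) {
                return false;
            }
        }
        return true;
    }
}
```

Under these assumptions, an unrolled loop body that writes A[i] through A[i+7] needs only the card mark for A[i+7]: every earlier element lies at most 56 bytes below it, and so falls inside the extended range of the marked card.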

Page 10: High-Productivity Languages for HPC: Compiler Challenges

Example: card-mark youth optimization.

• Card marking is used to record creation of pointers from “old” objects to “young” objects.

• Newly-allocated objects are guaranteed young until a GC occurs.

• Until a safepoint intervenes, stores into a fresh object require no card marking.

Before:

    o = new O()
    o.f1 = p
    mark(&o.f1)
    o.f2 = q
    mark(&o.f2)

After:

    o = new O()
    o.f1 = p
    o.f2 = q
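
A forward scan is enough to sketch this optimization. As before, the string-based instruction format (`new <obj>`, `store <obj>.<field>`, `mark <obj>.<field>`, `safepoint`) and the class name are invented for illustration. The sketch also assumes that each object name is assigned only once in the block, which a real compiler would have to establish for itself.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical straight-line IR: "new <obj>", "store <obj>.<field>",
 *  "mark <obj>.<field>", or "safepoint". */
final class FreshObjectMarks {
    /** Drops card marks for stores into objects allocated since the last safepoint. */
    static List<String> eliminate(List<String> block) {
        List<String> kept = new ArrayList<>();
        Set<String> fresh = new HashSet<>();   // objects known to be young right now
        for (String insn : block) {
            if (insn.equals("safepoint")) {
                fresh.clear();                 // a GC here may promote any object
            } else if (insn.startsWith("new ")) {
                fresh.add(insn.substring(4));
            } else if (insn.startsWith("mark ")) {
                String target = insn.substring(5);
                int dot = target.indexOf('.');
                String obj = dot < 0 ? target : target.substring(0, dot);
                if (fresh.contains(obj)) {
                    continue;                  // store into a young object: no mark needed
                }
            }
            kept.add(insn);
        }
        return kept;
    }
}
```

On the slide's example this removes both marks. If a safepoint sits between the allocation and the stores, the marks stay.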

Page 11: High-Productivity Languages for HPC: Compiler Challenges

The OS is in the way. Do we trust our compilers enough to replace it?

• OS threads are slow and clunky.
• OS traps are slow and clunky.
• OS virtual memory is inflexible.
• Do we trust our compilers enough to let them take the place of the kernel/user boundary?
  > We lack consensus on “correctness” for parallel programs.
    + Many correct answers; optimization may shrink that set, but that’s OK.
  > Only an option for safe languages.

What about C and C++?

Page 12: High-Productivity Languages for HPC: Compiler Challenges

The OS is in the way; synchronization

[Chart: JavaGrande Sync Method (larger is better). X axis: number of threads, 0 to 256. Y axis: total syncs per second, 0 to 6000000. Legend text: NB, Sun 1.3.1, IBM, Windows. Annotations: OS threads, User threads.]

Page 13: High-Productivity Languages for HPC: Compiler Challenges

The OS is in the way; wait/notifyAll

[Chart: JavaGrande Barrier Simple (larger is better). X axis: number of threads, 0 to 256. Y axis: total barriers per second, 0 to 400000. Legend text: NB, Sun 1.3.1, IBM, Windows. Annotations: OS threads, User threads.]

Page 14: High-Productivity Languages for HPC: Compiler Challenges

Cache-oblivious computing

• Subdivide a problem on its largest dimension.
  > Good in theory, often good in practice
  > Automatically exploits size of caches, TLBs, working set
  > Good for work-stealing, work-dealing
  > Minimizes area of boundary between subproblems
  > Also N-processors oblivious
• But...
  > code generated by inlining at leaves seems to be “ugly”
  > work-stealing is spatial-locality-ignorant
  > can we pipeline between recursive nests?
  > can we map it to a cluster?
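
As a concrete instance of "subdivide on the largest dimension", here is a sketch of a recursive matrix transpose on Java's fork/join framework, whose pool uses work stealing. The example problem, the class name `TransposeTask`, and the leaf size of 16 are choices made for illustration and are not taken from the talk. The leaf cutoff only amortizes recursion overhead and is not tuned to any cache size.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

/** Out-of-place transpose that halves the larger dimension until the block is small. */
final class TransposeTask extends RecursiveAction {
    static final int LEAF = 16;            // assumed leaf size; not tuned to any cache
    private final double[][] src, dst;
    private final int r0, r1, c0, c1;      // half-open row and column ranges of src

    TransposeTask(double[][] src, double[][] dst, int r0, int r1, int c0, int c1) {
        this.src = src; this.dst = dst;
        this.r0 = r0; this.r1 = r1; this.c0 = c0; this.c1 = c1;
    }

    @Override
    protected void compute() {
        int rows = r1 - r0, cols = c1 - c0;
        if (rows <= LEAF && cols <= LEAF) {
            // Leaf: a small block that is copied directly.
            for (int r = r0; r < r1; r++)
                for (int c = c0; c < c1; c++)
                    dst[c][r] = src[r][c];
        } else if (rows >= cols) {
            // Subdivide on the largest dimension (rows here).
            int mid = r0 + rows / 2;
            invokeAll(new TransposeTask(src, dst, r0, mid, c0, c1),
                      new TransposeTask(src, dst, mid, r1, c0, c1));
        } else {
            int mid = c0 + cols / 2;
            invokeAll(new TransposeTask(src, dst, r0, r1, c0, mid),
                      new TransposeTask(src, dst, r0, r1, mid, c1));
        }
    }

    /** Transposes a rectangular matrix using the common fork/join pool. */
    static double[][] transpose(double[][] src) {
        int rows = src.length, cols = rows == 0 ? 0 : src[0].length;
        double[][] dst = new double[cols][rows];
        ForkJoinPool.commonPool().invoke(new TransposeTask(src, dst, 0, rows, 0, cols));
        return dst;
    }
}
```

The code never mentions a cache size or a processor count. Both are handled by the recursion and the scheduler, which is the sense in which the style is cache-oblivious and N-processors oblivious.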

Page 15: High-Productivity Languages for HPC: Compiler Challenges

Programming-by-contract

• Said to help productivity, seems like it should.
• How should the optimizer use contract information?
  > Can we optimize contracts?
  > Can contracts help with library-level optimizations?
  > Should we only generate code for contracts that the compiler finds useful?
• Does the contract language allow us to say the right things?

Page 16: High-Productivity Languages for HPC: Compiler Challenges

Example contract for Vector

    class Vector {
        Object[] items;
        int size;
        invariant { size <= items.length }

        int size() { return size; }

        void put(int i, Object o)
            requires { 0 <= i < size() }
        { items[i] = o; }

        Object get(int i)
            requires { 0 <= i < size() }
        { return items[i]; }
    }
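
The contract syntax above is not Java. To make the optimization question concrete, here is one possible rendering in plain Java with the contracts written out as runtime checks. The class name `CheckedVector`, the constructor, and the choice of exception types are assumptions added for the example; the slide specifies none of them.

```java
/** Plain-Java rendering of the slide's Vector, with contracts as explicit runtime checks. */
final class CheckedVector {
    private final Object[] items;
    private int size;

    /** Hypothetical constructor; the slide does not show one. */
    CheckedVector(int capacity, int size) {
        this.items = new Object[capacity];
        this.size = size;
        checkInvariant();
    }

    // invariant { size <= items.length }
    private void checkInvariant() {
        if (size > items.length) {
            throw new IllegalStateException("invariant violated: size > items.length");
        }
    }

    // requires { 0 <= i < size() }
    private void requireIndex(int i) {
        if (i < 0 || i >= size) {
            throw new IllegalArgumentException("requires 0 <= i < size()");
        }
    }

    int size() { return size; }

    void put(int i, Object o) {
        requireIndex(i);
        // Precondition plus invariant give 0 <= i < size <= items.length,
        // so the array bounds check on the next line can never fail.
        items[i] = o;
    }

    Object get(int i) {
        requireIndex(i);
        return items[i];
    }
}
```

The comment in `put` is the kind of fact an optimizer could take from a contract: if callers are known to satisfy the precondition, both the explicit check and the implicit array bounds check are candidates for removal.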

Page 17: High-Productivity Languages for HPC: Compiler Challenges

Human factors

• Error-reporting must be as informative as possible
  > Parsing
  > Type-inference
  > Exception stack traces (like Java, or Python)
• Error-reporting should be as early as possible
• Observability
  > Why is it slow?
  > Which threads have problems?
• How hard is it to say what I mean?
  > Expressive type systems can be a pain.

Page 18: High-Productivity Languages for HPC: Compiler Challenges

Our rotten tools; excerpt from a rant

I am betting that this is the name of some C++ function, run through a one-way hash to yield a unique name.

__ZNSt15basic_streambufIcSt11char_traitsIcEED4Ev

(Sarcastically) I assume a one-way hash, because it would just be so incredibly stupid not to fix the Unix™ linker for A DOZEN WHOLE YEARS to make it demangle C++ identifiers into something meaning something to the programmer....

How disgusted do I need to be, that my mail agent makes a swoosh sound IN STEREO, and my windows pour into their miniaturized form, but my linker still hands me missing symbols that look like line noise?

Page 19: High-Productivity Languages for HPC: Compiler Challenges

David Chase

High-Productivity Languages for HPC: Compiler Challenges