40
HPX – The Futurization of Computing Thomas Heller ([email protected]) February 28, 2014 Department Computer Science 3 Friedrich-Alexander-Universität Erlangen-Nürnberg HPX

HPX - Center for Computation and Technology

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: HPX - Center for Computation and Technology

HPX – The Futurization of Computing

Thomas Heller ([email protected])

February 28, 2014

Department Computer Science 3

Friedrich-Alexander-Universität Erlangen-Nürnberg

HPX

Page 2: HPX - Center for Computation and Technology

HPX – The Futurization of ComputingFebruary 28, 2014 | Thomas Heller ([email protected]) | Department Computer Science 3Friedrich-Alexander-Universität Erlangen-Nürnberg

2

The HPX Programming Model

Towards a C++ compliant Interface and beyond

Case Studies

HPX Thread Granularity

LibGeoDecomp

ZERPA

... and more

HPX

Page 3: HPX - Center for Computation and Technology

The HPX Programming Model

HPX

Page 4: HPX - Center for Computation and Technology

HPX – The Futurization of ComputingFebruary 28, 2014 | Thomas Heller ([email protected]) | Department Computer Science 3Friedrich-Alexander-Universität Erlangen-Nürnberg

4

The 4 Horsemen of the Apocalypse: SLOW

Starvation

Latency

Overhead

Waiting for contention

HPX

Page 5: HPX - Center for Computation and Technology

HPX – The Futurization of ComputingFebruary 28, 2014 | Thomas Heller ([email protected]) | Department Computer Science 3Friedrich-Alexander-Universität Erlangen-Nürnberg

5

Fighting SLOWness:

Governing Principles

• Active global address space (AGAS), instead of PGAS

• Message driven, instead of Message Passing

• Lightweight Control Objects, instead of Global Barriers

• Adaptive locality control, instead of Static Data Distribution

• Moving work to data, instead of Moving Data to Work

• Fine grained parallelism of lightweight threads, instead of Communication

Sequential Processes (CSP/MPI)

HPX

Page 6: HPX - Center for Computation and Technology

HPX – The Futurization of ComputingFebruary 28, 2014 | Thomas Heller ([email protected]) | Department Computer Science 3Friedrich-Alexander-Universität Erlangen-Nürnberg

6

HPX – A general purpose runtime system

A uniform, standards-oriented API for ease of programming parallel and

distributed applications.

⇒ Standard C++ codebase

⇒ Fully C++11/14 compliant

HPX

Page 7: HPX - Center for Computation and Technology

HPX – The Futurization of ComputingFebruary 28, 2014 | Thomas Heller ([email protected]) | Department Computer Science 3Friedrich-Alexander-Universität Erlangen-Nürnberg

7

HPX – A general purpose runtime system

Exposing new and unexpected forms of parallelism.

⇒ Enables programmer to write fully asynchronous code using hundreds

of millions of threads.

⇒ Provides unified syntax and semantics for local and remote operations.

⇒ Makes concurrency manageable with dataflow and future based

synchronization.

HPX

Page 8: HPX - Center for Computation and Technology

HPX – The Futurization of ComputingFebruary 28, 2014 | Thomas Heller ([email protected]) | Department Computer Science 3Friedrich-Alexander-Universität Erlangen-Nürnberg

8

HPX – A general purpose runtime system

Implements a rich set of runtime services supporting a broad range of use

cases.

⇒ Introspect the state of your parallel computer at any time

⇒ Performance Counters

⇒ Debugger?

HPX

Page 9: HPX - Center for Computation and Technology

HPX – The Futurization of ComputingFebruary 28, 2014 | Thomas Heller ([email protected]) | Department Computer Science 3Friedrich-Alexander-Universität Erlangen-Nürnberg

9

HPX – A general purpose runtime system

Has been designed and developed for systems of any scale.

⇒ Currently running on ARM, x86, Xeon Phi, BlueGeneQ

⇒ Supporting Windows, Linux, OSX, Android, CNK

⇒ Ranging from large scale Clusters over Desktop Computers to

Handheld devices

⇒ Existing performant implementation

HPX

Page 10: HPX - Center for Computation and Technology

HPX – The Futurization of ComputingFebruary 28, 2014 | Thomas Heller ([email protected]) | Department Computer Science 3Friedrich-Alexander-Universität Erlangen-Nürnberg

10

HPX – A general purpose runtime system

Is the first fully functional implementation of the ParalleX execution model.

⇒ ParalleX is the theoretic foundation on which HPX was built

HPX

Page 11: HPX - Center for Computation and Technology

HPX – The Futurization of ComputingFebruary 28, 2014 | Thomas Heller ([email protected]) | Department Computer Science 3Friedrich-Alexander-Universität Erlangen-Nürnberg

11

HPX – A general purpose runtime system

Is published under a liberal open-source license and has a open, active, and

thriving developer community.

⇒ Boost License Version 1.0

⇒ Over 30 contributors from all over the world

⇒ Development started 2005

HPX

Page 12: HPX - Center for Computation and Technology

Towards a C++ compliant Interface and beyond

HPX

Page 13: HPX - Center for Computation and Technology

HPX – The Futurization of ComputingFebruary 28, 2014 | Thomas Heller ([email protected]) | Department Computer Science 3Friedrich-Alexander-Universität Erlangen-Nürnberg

13

Why C++?

• Widely accepted industry standard

• C is a subset of C++!

• Powerful mechanisms to build abstractions while maintaining high

performance

HPX

Page 14: HPX - Center for Computation and Technology

HPX – The Futurization of ComputingFebruary 28, 2014 | Thomas Heller ([email protected]) | Department Computer Science 3Friedrich-Alexander-Universität Erlangen-Nürnberg

14

Parallelism in C++

• Threads (thread)

• Futures (future<T>, shared_future<T>)

• Asynchronous Tasks (async)

• Synchronization primitives (mutex, condition_variable)

• Atomics

• Memory Model

HPX

Page 15: HPX - Center for Computation and Technology

HPX – The Futurization of ComputingFebruary 28, 2014 | Thomas Heller ([email protected]) | Department Computer Science 3Friedrich-Alexander-Universität Erlangen-Nürnberg

15

Utilities

• Partial function application (bind)

• Heterogeneous tuples (tuple<T...>)

• Generic (type erased) function objects (function<R(...)>)

HPX

Page 16: HPX - Center for Computation and Technology

HPX – The Futurization of ComputingFebruary 28, 2014 | Thomas Heller ([email protected]) | Department Computer Science 3Friedrich-Alexander-Universität Erlangen-Nürnberg

16

HPX Extensions: Actions

HPX

C++ Standard Library

C++

R f(p...) Synchronous Asynchronous Fire & Forget(returns R) (returns future<R>) (returns void)

Functions f(p...) async(f, p...) apply(f, p...)(direct)

Functions bind(f, p...)(...) async(bind(f, p...), ...) apply(bind(f, p...), ...)(lazy)

Actions HPX_ACTION(f, a) HPX_ACTION(f, a) HPX_ACTION(f, a)(direct) a()(id, p...) async(a(), id, p...) apply(a(), id, p...)

Actions HPX_ACTION(f, a) HPX_ACTION(f, a) HPX_ACTION(f, a)(lazy) bind(a(), id, p...)

(...)async(bind(a(), id, p...),...)

apply(bind(a(), id, p...),...)

HPX

Page 17: HPX - Center for Computation and Technology

HPX – The Futurization of ComputingFebruary 28, 2014 | Thomas Heller ([email protected]) | Department Computer Science 3Friedrich-Alexander-Universität Erlangen-Nürnberg

17

HPX Extensions: Components

• C++ Objects in the Global Address space

• Referencable through unique GIDs

• Referencable through meaningful symbolic names

• Member functions callable through actions

• Migratable

HPX

Page 18: HPX - Center for Computation and Technology

HPX – The Futurization of ComputingFebruary 28, 2014 | Thomas Heller ([email protected]) | Department Computer Science 3Friedrich-Alexander-Universität Erlangen-Nürnberg

18

HPX Extensions: More power to futures

Composable futures

• hpx::when_all, hpx::when_any, hpx::when_n

• hpx::future<T>::then

• hpx::dataflow

Expressing locality

• Executors let you specify where your tasks run and how they are

scheduled

HPX

Page 19: HPX - Center for Computation and Technology

HPX – The Futurization of ComputingFebruary 28, 2014 | Thomas Heller ([email protected]) | Department Computer Science 3Friedrich-Alexander-Universität Erlangen-Nürnberg

19

The HPX Programming Model

Key features

• Fully asynchronous

• ’Fire & Forget’ semantics (no result available)

• ’Pure’ asynchronous semantics (result available through hpx::future)

• Can be used ’synchronously’, but does not block

• Fully type safe remote operations

• Extending the notion of a callable to remote case (action)

• Everything you can do with functions, can be done with actions

• Data types can be used in remote contexts

• Can be sent over the wire (hpx::function, hpx::bind, hpx::any)

• Can be used with actions (hpx::bind, hpx::async, hpx::function)

HPX

Page 20: HPX - Center for Computation and Technology

Case Studies

HPX

Page 21: HPX - Center for Computation and Technology

HPX – The Futurization of ComputingFebruary 28, 2014 | Thomas Heller ([email protected]) | Department Computer Science 3Friedrich-Alexander-Universität Erlangen-Nürnberg

21

Homogeneous thread spawning in HPX (weak scaling)

Courtesy of Bryce of Lelbach (LSU) and Patricia Grubel (NMSU)HPX

Page 22: HPX - Center for Computation and Technology

HPX – The Futurization of ComputingFebruary 28, 2014 | Thomas Heller ([email protected]) | Department Computer Science 3Friedrich-Alexander-Universität Erlangen-Nürnberg

22

LibGeoDecomp

• C++ Auto-parallelizing framework

• Open Source

• High scalability

• Wide range of platform support

• http://www.libgeodecomp.org

HPX

Page 23: HPX - Center for Computation and Technology

HPX – The Futurization of ComputingFebruary 28, 2014 | Thomas Heller ([email protected]) | Department Computer Science 3Friedrich-Alexander-Universität Erlangen-Nürnberg

23

LibGeoDecomp

Futurizing the Simulation Flow

Basic Simulation flow:

for(Region r: innerRegion) {

update(r, oldGrid , newGrid , step);

}

swap(oldGrid , newGrid);

++step;

for(Region r: outerGhostZoneRegion) {

notifyPatchProviders(r, oldGrid);

}

for(Region r: outerGhostZoneRegion) {

update(r, oldGrid , newGrid , step);

}

for(Region r: innerGhostZoneRegion) {

notifyPatchAccepters(r, oldGrid);

}

HPX

Page 24: HPX - Center for Computation and Technology

HPX – The Futurization of ComputingFebruary 28, 2014 | Thomas Heller ([email protected]) | Department Computer Science 3Friedrich-Alexander-Universität Erlangen-Nürnberg

24

LibGeoDecomp

Futurizing the Simulation Flow

Futurized Simulation flow:

parallel for(Region r: innerRegion) {

update(r, oldGrid , newGrid , step);

}

swap(oldGrid , newGrid); ++step;

parallel for(Region r: outerGhostZoneRegion) {

notifyPatchProviders(r, oldGrid);

}

parallel for(Region r: outerGhostZoneRegion) {

update(r, oldGrid , newGrid , step);

}

parallel for(Region r: innerGhostZoneRegion) {

notifyPatchAccepters(r, oldGrid);

}

Continuation

Continuation

Continuation

HPX

Page 25: HPX - Center for Computation and Technology

HPX – The Futurization of ComputingFebruary 28, 2014 | Thomas Heller ([email protected]) | Department Computer Science 3Friedrich-Alexander-Universität Erlangen-Nürnberg

25

LibGeoDecomp

Performance Results

0

10

20

30

40

50

60

70

1 2 4 8 16

Tim

e [

s]

Number of Cores, on one Node

Execution Times of HPX and MPI N-Body Codes(SMP, Weak Scaling)

Sim HPX

Sim MPI

Comm HPX

Comm MPI

HPX

Page 26: HPX - Center for Computation and Technology

HPX – The Futurization of ComputingFebruary 28, 2014 | Thomas Heller ([email protected]) | Department Computer Science 3Friedrich-Alexander-Universität Erlangen-Nürnberg

26

LibGeoDecomp

Performance Results

HPX

Page 27: HPX - Center for Computation and Technology

HPX – The Futurization of ComputingFebruary 28, 2014 | Thomas Heller ([email protected]) | Department Computer Science 3Friedrich-Alexander-Universität Erlangen-Nürnberg

27

LibGeoDecomp

Performance Results

0

200

400

600

800

1000

1200

1400

1600

0 10 20 30 40 50 60

Pe

rfo

rma

nce

in G

FLO

PS

Number of Cores

Weak Scaling Results for HPX N-Body Code(Single Xeon Phi, Futurized)

1 Thread/Core

2 Threads/Core

3 Threads/Core

4 Threads/Core

HPX

Page 28: HPX - Center for Computation and Technology

HPX – The Futurization of ComputingFebruary 28, 2014 | Thomas Heller ([email protected]) | Department Computer Science 3Friedrich-Alexander-Universität Erlangen-Nürnberg

28

LibGeoDecomp

Performance Results

0

5

10

15

20

25

30

0 2 4 6 8 10 12 14 16

Pe

rfo

rma

nce

in

TF

LOP

S

Number of Nodes, 16 Cores on Host, Full Xeon Phi

Weak Scaling Results for HPX N-Body Codes(Host Cores and Xeon Phi Accelerator)

HPX

Peak

HPX

Page 29: HPX - Center for Computation and Technology

HPX – The Futurization of ComputingFebruary 28, 2014 | Thomas Heller ([email protected]) | Department Computer Science 3Friedrich-Alexander-Universität Erlangen-Nürnberg

29

ZERPA: Taking futurization to the next level

• An EDSL for describing interdependant operators on Volumes

• Heavy usage of the HPX dataflow programming capabilities

• http://www3.cs.fau.de/DE/Research/ZERPA/

HPX

Page 30: HPX - Center for Computation and Technology

HPX – The Futurization of ComputingFebruary 28, 2014 | Thomas Heller ([email protected]) | Department Computer Science 3Friedrich-Alexander-Universität Erlangen-Nürnberg

30

ZERPA: Taking futurization to the next level

cast_part = loader("cast_part")

, mask = loader("mask")

, filtered = cross_median(cast_part)

, cleaned = special_median(filtered , mask)

HPX

Page 31: HPX - Center for Computation and Technology

HPX – The Futurization of ComputingFebruary 28, 2014 | Thomas Heller ([email protected]) | Department Computer Science 3Friedrich-Alexander-Universität Erlangen-Nürnberg

31

ZERPA: Taking futurization to the next level

mask = loader gussteil = loader

filtered = cross_median<1,1,1>

cleaned = special_median<9,9,9>

Requesting result Pushing completed result

HPX

Page 32: HPX - Center for Computation and Technology

HPX – The Futurization of ComputingFebruary 28, 2014 | Thomas Heller ([email protected]) | Department Computer Science 3Friedrich-Alexander-Universität Erlangen-Nürnberg

32

ZERPA: Taking futurization to the next level

HPX

Page 33: HPX - Center for Computation and Technology

HPX – The Futurization of ComputingFebruary 28, 2014 | Thomas Heller ([email protected]) | Department Computer Science 3Friedrich-Alexander-Universität Erlangen-Nürnberg

33

HPX OpenCL integration

• Seamless integration of calling OpenCL kernels inside of HPX

• An HPX backend for pocl

• https://github.com/STEllAR-GROUP/hpxcl

HPX

Page 34: HPX - Center for Computation and Technology

HPX – The Futurization of ComputingFebruary 28, 2014 | Thomas Heller ([email protected]) | Department Computer Science 3Friedrich-Alexander-Universität Erlangen-Nürnberg

34

Other recent results

Tiled LU-Decomposition

Courtesy of Antoine Tran Tan (LRI, Université Paris-Sud XI, INRIA - Orsay, France)HPX

Page 35: HPX - Center for Computation and Technology

HPX – The Futurization of ComputingFebruary 28, 2014 | Thomas Heller ([email protected]) | Department Computer Science 3Friedrich-Alexander-Universität Erlangen-Nürnberg

35

Other recent results

Migration path: OpenMP Backend implemented on top of HPX

Courtesy of Jeremy A. Kemp (UH)

HPX

Page 36: HPX - Center for Computation and Technology

HPX – The Futurization of ComputingFebruary 28, 2014 | Thomas Heller ([email protected]) | Department Computer Science 3Friedrich-Alexander-Universität Erlangen-Nürnberg

36

Other recent results

OpenMP vs. HPX

Barriers vs. Dataflow

for(int i = 0; i < num_iterations; ++i)

{

#pragma omp parallel for

for(j = 0; j < N; ++j)

{

// compute something depending on i-1, i

and j

}

}

std::vector <future <T>> results [];

for(int i = 0; i < num_iterations; ++i)

{

for(j = 0; j < N; ++j)

{

results[i][j] =

when_all(dependencies).then(

// compute something depending on

i-1, i and j

);

}

}

HPX

Page 37: HPX - Center for Computation and Technology

HPX – The Futurization of ComputingFebruary 28, 2014 | Thomas Heller ([email protected]) | Department Computer Science 3Friedrich-Alexander-Universität Erlangen-Nürnberg

37

Other recent results

OpenMP vs. HPX: Jacobi Smoother

Barriers vs. Dataflow

HPX

Page 38: HPX - Center for Computation and Technology

HPX – The Futurization of ComputingFebruary 28, 2014 | Thomas Heller ([email protected]) | Department Computer Science 3Friedrich-Alexander-Universität Erlangen-Nürnberg

38

• github.com/STEllAR-GROUP/hpx/wiki/GSoC-2014-Project-Ideas

• www.google-melange.com/gsoc/homepage/google/gsoc2014

HPX

Page 39: HPX - Center for Computation and Technology

HPX – The Futurization of ComputingFebruary 28, 2014 | Thomas Heller ([email protected]) | Department Computer Science 3Friedrich-Alexander-Universität Erlangen-Nürnberg

39

Summary: Revolution through Evolution

• HPX is a general purpose runtime system

• Standards oriented API

• Unified local and remote semantics

• Existing performant implementation

HPX

Page 40: HPX - Center for Computation and Technology

HPX – The Futurization of ComputingFebruary 28, 2014 | Thomas Heller ([email protected]) | Department Computer Science 3Friedrich-Alexander-Universität Erlangen-Nürnberg

40

Get in touch!

• Blog: http://stellar.cct.lsu.edu

• Code: https://github.com/STEllAR-GROUP/hpx

• Mailing List: [email protected]

• IRC: #ste||ar @ irc.freenode.org

HPX