Page 1:

Distributed Galois

Andrew Lenharth, 2/27/2015

Page 2:

Goals

• An implementation of the operator formulation for distributed memory
– Ideally forward-compatible where possible

• Both a simple programming model and a fast implementation
– Like Galois, may need restrictions or structure for highest performance

Page 3:

Overview

• PGAS (using fat pointers)
• Implicit, asynchronous communication
• Default execution mode:
– Galois compatible
– Implicit locking and data movement
– Pluggable schedulers
– Speculative execution

• All D-Galois programs are valid Galois programs
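
For illustration: a fat pointer in this PGAS setting pairs a host ID with a host-local address. The sketch below is a minimal C++ rendering with hypothetical names (FatPointer, resolveLocal), not the actual D-Galois type.

#include <cstdint>

// Minimal fat-pointer sketch for a PGAS address space: a host ID plus
// an address that is only meaningful on that host. Illustrative only.
struct FatPointer {
  std::uint32_t host;   // which host owns the object
  std::uintptr_t addr;  // object address, valid only on `host`

  bool isLocal(std::uint32_t myHost) const { return host == myHost; }
};

// Local dereference is a cast; a remote dereference would instead go
// through the directory and network layers (implicit communication).
template <typename T>
T* resolveLocal(const FatPointer& p, std::uint32_t myHost) {
  return p.isLocal(myHost) ? reinterpret_cast<T*>(p.addr) : nullptr;
}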

Page 4:

Support

[Diagram: the Galois implementation stack. Components: User Code, User Context, Graph, Parallel Loop, Contention Manager, Memory Management, Statistics, Topology, Scheduler, Barrier, Termination, etc.]

Page 5:

Support

[Diagram: the Distributed Galois implementation stack. The same components as Galois (User Code, User Context, Graph, Parallel Loop, Contention Manager, Memory Management, Statistics, Topology, Scheduler, Barrier, Termination, etc.), plus Network, Directory, and Remote Store.]

Page 6:

Current Status

• Working implementation of the baseline
– Asynchronous, speculative

Page 7:

Interesting Problems

• Livelock
• Asynchronous directory
• Abstractions for building data structures
• Network hardware
• Network software
• Remote updates
• Scheduling

Page 8:

Solved: Livelock

• Source: object state transitions are more complex, asynchronous, and may require multiple steps (hence interruptible)

• Solution: a scheme to ensure forward progress of at least one host

• Alternative: if this happens a lot for your application, coordinated scheduling (or relaxed consistency) may be more appropriate
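
The deck does not detail the progress scheme; one common approach (an assumption here, not necessarily the D-Galois one) is a fixed global priority among contending hosts, so any chain of preemptions terminates:

#include <cstdint>

// Illustrative tie-breaking rule for contended objects whose transfer
// is multi-step and interruptible: a request may preempt an in-flight
// transfer only if it comes from a lower-ranked host. The lowest-ranked
// contender therefore always completes, guaranteeing forward progress
// of at least one host.
struct Request {
  std::uint32_t host;      // rank of the requesting host
  std::uint64_t objectId;  // contended object
};

bool mayPreempt(const Request& incoming, const Request& current) {
  // Deterministic priority: no two hosts can cancel each other forever.
  return incoming.host < current.host;
}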

Page 9:

Asynchronous Directory

• Source: communication and workers interleave access to the directory (and directly to objects stored in it)

• Solution: mostly just a pain.

Page 10:

Abstractions for building data structures

• Source: distributed data structures are hard (so are shared-memory ones).

• Solution: a set of abstractions
• Federated object: a different instance on each host/thread; pointers resolve locally.
• Federation is bootstrapped by the runtime.
• Federated objects don't have any notion of exclusive behavior.
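
A minimal sketch of the federated-object idea; Federated, registerLocal, and the single-process myHostId stub are hypothetical stand-ins for the runtime bootstrap:

#include <cassert>
#include <vector>

inline unsigned myHostId() { return 0; }  // stub: single-process toy

// One instance per host, all reachable through the same handle, which
// always resolves to the local instance. No exclusive ownership and no
// migration, matching the "no exclusive behavior" point above.
template <typename T>
class Federated {
  std::vector<T*> instances;  // one slot per host, filled at bootstrap
public:
  explicit Federated(unsigned numHosts) : instances(numHosts, nullptr) {}
  void registerLocal(unsigned host, T* obj) { instances[host] = obj; }
  T& local() {
    T* p = instances[myHostId()];
    assert(p && "federation not bootstrapped on this host");
    return *p;  // pointers resolve locally
  }
};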

Page 11:

Remote Updates

• Directory synchronization is really bad when it is not needed (and essential when it is)

• Many algorithms have an update-and-schedule behavior for their neighbors

• Treat this behavior as a task type
– Multiple task types per loop
– Quite similar to nested parallelism

Page 12:

Remote Updates – PageRank

Original operator:

self.value += self.residual
for n : neighbors
    n.residual += f(self.residual)
    schedule (operator type on) {n}

With remote updates:

self.value += self.residual
for n : neighbors
    schedule (update type on) {n, f(self.residual)}

With a new operator:

self.residual += update
schedule (operator type on) {self}
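
As a single-host, illustrative rendering of this transformation, the C++ sketch below drains both task types from one worklist. The Node/Task types, the residual reset, and the eps termination threshold are assumptions needed for a runnable toy, not the Galois API:

#include <deque>
#include <variant>
#include <vector>

struct Node {
  double value = 0.0;
  double residual = 0.0;
  std::vector<Node*> neighbors;
};

struct OperatorTask { Node* n; };               // the PageRank operator
struct UpdateTask   { Node* n; double delta; }; // a pushed residual
using Task = std::variant<OperatorTask, UpdateTask>;

void run(std::deque<Task>& work, double alpha, double eps) {
  while (!work.empty()) {
    Task t = work.front();
    work.pop_front();
    if (auto* op = std::get_if<OperatorTask>(&t)) {
      Node* s = op->n;
      double r = s->residual;
      s->residual = 0.0;  // consume (standard push-style PageRank)
      s->value += r;
      if (r > 0 && !s->neighbors.empty()) {
        double out = alpha * r / s->neighbors.size();  // out = f(residual)
        // Instead of writing n->residual directly (which would drag the
        // neighbor through the directory), schedule an update task on n.
        for (Node* n : s->neighbors)
          work.push_back(UpdateTask{n, out});
      }
    } else {
      auto& up = std::get<UpdateTask>(t);
      up.n->residual += up.delta;     // the "new operator" above
      if (up.n->residual > eps)       // threshold so the toy terminates
        work.push_back(OperatorTask{up.n});
    }
  }
}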

Page 13:

Scheduling

• Source: imagine SSSP using the existing (host-unaware) schedulers on distributed memory

• Need a scheduler with a way to anchor work to a data-structure element
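
A sketch of what anchoring could look like: each task is bound to a graph element, and the scheduler routes it to the queue of the host that owns that element. The modulo partition in ownerOf and the queue layout are assumptions, not the actual D-Galois scheduler:

#include <cstdint>
#include <deque>
#include <vector>

struct Task { std::uint64_t nodeId; };  // work anchored to a graph node

// Assumed partition: node IDs are distributed round-robin over hosts.
std::uint32_t ownerOf(std::uint64_t nodeId, std::uint32_t numHosts) {
  return static_cast<std::uint32_t>(nodeId % numHosts);
}

// Host-aware push: local work stays local; remote work is queued for
// the owning host instead of being executed wherever it was generated.
void push(Task t, std::uint32_t myHost, std::uint32_t numHosts,
          std::deque<Task>& localQ,
          std::vector<std::deque<Task>>& remoteQ /* one per host */) {
  std::uint32_t h = ownerOf(t.nodeId, numHosts);
  if (h == myHost) localQ.push_back(t);
  else             remoteQ[h].push_back(t);  // sent over the network
}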

Page 14:

Network hardware

Page 15:

Page 16:

Page 17:

Networks

• Small asynchronous messages are bad for throughput

• Scale-free graphs stress throughput
• Large messages are bad for latency
• Find the optimal point
– Sometimes latency is critical

Page 18:

Nagle’s algorithm

• If you don’t have a large message, wait a while to get more data

• Bad for latency
• Also keeps MPI in its broken-behavior range
• Also requires O(P) memory for communication buffers (assuming direct point-to-point channels)
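
A minimal sketch of this buffering policy, assuming one send buffer per destination (the O(P) memory noted above) and illustrative thresholds; flushToNetwork is a stub standing in for the real transport:

#include <chrono>
#include <cstddef>
#include <vector>

using Clock = std::chrono::steady_clock;

struct SendBuffer {
  std::vector<char> bytes;
  Clock::time_point firstEnqueue{};
};

constexpr std::size_t kFlushBytes = 64 * 1024;              // size trigger
constexpr auto kMaxDelay = std::chrono::microseconds(100);  // latency bound

void flushToNetwork(unsigned dest, std::vector<char>& bytes) {
  (void)dest;     // stub: hand the aggregated buffer to MPI / the NIC
  bytes.clear();
}

void enqueue(unsigned dest, const char* msg, std::size_t len,
             std::vector<SendBuffer>& bufs /* one per host: O(P) */) {
  SendBuffer& b = bufs[dest];
  if (b.bytes.empty()) b.firstEnqueue = Clock::now();
  b.bytes.insert(b.bytes.end(), msg, msg + len);
  // Flush when large enough, or when the oldest message has waited too
  // long (a periodic poll elsewhere would also apply the age check).
  if (b.bytes.size() >= kFlushBytes ||
      Clock::now() - b.firstEnqueue >= kMaxDelay)
    flushToNetwork(dest, b.bytes);
}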

Page 19:

Communication pattern

Page 20:

Communication pattern

Page 21:

Software Routing

• Pros: single communication channel
– Scales with hosts
– Aggregates all messages

• Cons: 2 hops (or more)
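
The deck does not say which topology the routing uses; as one classic illustration, a 2-hop scheme over a √P × √P host grid gives each host channels only to its own row and column while still aggregating traffic per channel:

#include <cstdint>

// Route from source to destination in two hops: first along the
// source's row to the host in the destination's column (the "turn"
// host), then down that column. Illustrative, not the D-Galois router.
struct HostGrid {
  std::uint32_t side;  // side = ceil(sqrt(P)) for P hosts

  std::uint32_t row(std::uint32_t h) const { return h / side; }
  std::uint32_t col(std::uint32_t h) const { return h % side; }

  std::uint32_t nextHop(std::uint32_t here, std::uint32_t dest) const {
    std::uint32_t turn = row(here) * side + col(dest);
    return (here == turn) ? dest : turn;  // 2 hops total
  }
};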