Application-specific Topology-aware Mapping for Three Dimensional Topologies

Abhinav Bhatelé, Laxmikant V. Kalé



Outline

• Motivation
• The Mapping Problem
• Static Mapping: 3D Stencil
• Load Balancing: NAMD
• Future Work


The network latency for wormhole routing is

$(L_f / B) \times D + L / B$

where $L_f$ = length of each flit, $B$ = bandwidth, $D$ = number of hops, and $L$ = length of message.

Lionel M. Ni and Philip K. McKinley, “A Survey of Wormhole Routing Techniques in Direct Networks”, Computer, Volume 26, Issue 2, pages 62-76, 1993
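The latency expression above can be evaluated directly. The following is a minimal sketch; the function name, the parameter names, and the sample numbers are illustrative assumptions, not values from the talk.

```python
def wormhole_latency(flit_len, bandwidth, hops, msg_len):
    """Wormhole-routing latency: (Lf / B) * D + L / B."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    return (flit_len / bandwidth) * hops + msg_len / bandwidth


# Illustrative numbers only (not measurements): 32-byte flits,
# a bandwidth of 1 byte per time unit, and a 1024-byte message.
near = wormhole_latency(flit_len=32, bandwidth=1.0, hops=1, msg_len=1024)
far = wormhole_latency(flit_len=32, bandwidth=1.0, hops=20, msg_len=1024)
```

Only the first term depends on the hop count D, so in this model distance matters most when the message length L is small relative to the flit length times the number of hops.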


Message Latencies

[Figure: two log-scale plots of time (ms) versus number of processors (512 to 4096). The first shows near-neighbor series labelled 10, 100, 1000, 10000, and 100000; the second shows near-neighbor and random series for the same labels.]

NN = Near Neighbor, RND = Random


Hardware Latencies

• Blue Gene/L
  – Near neighbor: < 1 µs
  – Worst case: 7 µs
• Blue Gene/P
  – Near neighbor: < 1 µs
  – Worst case: 5 µs
• Corresponding differences for MPI messages


Topology-aware mapping

• Problem: Given an object communication graph and a processor graph, find an optimal mapping that
  – Minimizes communication
  – Ensures load balance
• Metric for communication traffic
  – Hop-bytes = number of links (hops) traversed × message size
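The hop-bytes metric can be computed for a given placement as sketched below. This assumes the processor graph is a 3D torus with shortest-path routing; the function names, object names, and message sizes are hypothetical and chosen only for illustration.

```python
def torus_hops(a, b, dims):
    """Shortest hop distance between coordinates a and b on a torus."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def hop_bytes(messages, placement, dims):
    """Sum of hops * bytes over all (src_obj, dst_obj, nbytes) messages."""
    return sum(
        torus_hops(placement[src], placement[dst], dims) * nbytes
        for src, dst, nbytes in messages
    )


# Hypothetical example: three objects placed on a 4 x 4 x 4 torus.
dims = (4, 4, 4)
placement = {"a": (0, 0, 0), "b": (1, 0, 0), "c": (3, 3, 3)}
messages = [("a", "b", 100), ("a", "c", 10)]
total = hop_bytes(messages, placement, dims)
```

Because of the wraparound links, "a" and "c" are one hop apart in each dimension, so the second message contributes 3 hops × 10 bytes.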


Machine Topology

• Information required at runtime
  – No. of processors in the allocated partition
  – No. of processors along each dimension
  – Physical coordinates of each processor
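The three pieces of runtime information listed above can be held in a small structure, sketched here. The class name and the X-fastest rank ordering are assumptions made for illustration; a real machine reports its own coordinates through its system interface, and its ordering may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """Hypothetical view of an allocated 3D partition."""

    dims: tuple  # processors along X, Y, Z

    @property
    def size(self):
        """Number of processors in the partition."""
        x_dim, y_dim, z_dim = self.dims
        return x_dim * y_dim * z_dim

    def coords(self, rank):
        """Physical (x, y, z) of a rank, assuming X varies fastest."""
        if not 0 <= rank < self.size:
            raise ValueError("rank outside partition")
        x_dim, y_dim, _ = self.dims
        return (rank % x_dim, (rank // x_dim) % y_dim, rank // (x_dim * y_dim))

    def rank(self, coords):
        """Inverse of coords() under the same assumed ordering."""
        x, y, z = coords
        x_dim, y_dim, _ = self.dims
        return x + x_dim * (y + y_dim * z)


part = Partition(dims=(8, 8, 16))
```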


Communication Graph

• Static
  – 3D Stencil: regular communication graph
• Dynamic
  – Molecular dynamics application
  – Changes as atoms migrate from one processor to another


Static Graph - 3D Stencil
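A regular 3D stencil graph and the effect of placement on hop count can be sketched as follows. This is an illustration under stated assumptions, not the mapping used in the talk: the stencil is taken to be a 7-point stencil that is periodic in every dimension, the machine is a torus of the same shape, and all function and variable names are hypothetical.

```python
import random
from itertools import product


def torus_hops(a, b, dims):
    """Shortest hop distance between two coordinates on a torus."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def stencil_edges(dims):
    """Neighbor pairs of a 7-point stencil, periodic in every dimension (assumed)."""
    edges = set()
    for cell in product(*(range(d) for d in dims)):
        for axis in range(3):
            nbr = list(cell)
            nbr[axis] = (cell[axis] + 1) % dims[axis]
            edges.add(frozenset((cell, tuple(nbr))))
    return edges


def total_hops(edges, placement, dims):
    """Sum of torus hops over all communicating pairs under a placement."""
    total = 0
    for edge in edges:
        a, b = edge
        total += torus_hops(placement[a], placement[b], dims)
    return total


dims = (4, 4, 4)
cells = list(product(*(range(d) for d in dims)))
edges = stencil_edges(dims)

# One simple topology-aware placement: cell (x, y, z) runs on processor (x, y, z).
aligned = {cell: cell for cell in cells}

# Random placement: the same processors, shuffled with a fixed seed.
shuffled = cells[:]
random.Random(0).shuffle(shuffled)
scattered = dict(zip(cells, shuffled))

aligned_hops = total_hops(edges, aligned, dims)
scattered_hops = total_hops(edges, scattered, dims)
```

Under the aligned placement every communicating pair is exactly one hop apart, which is the lower bound for this graph; a random placement cannot do better.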


Performance

[Figure: time per iteration (secs) versus number of processors (512 to 8192) for the Random, Round-Robin, and Topology mappings.]


Hop counts

[Figure: hop count (in millions) versus number of processors (512 to 8192) for the Random, Round-robin, and Topology mappings.]


Dynamic Graph - NAMD

• Molecular Dynamics (MD) application

• Simulation box is a 3D cell full of atoms

[Figure labels: Patches, Computes]


Load Balancing in NAMD

• Measurement-based (Charm++)
  – Principle of persistence
• Patches are statically mapped
  – Orthogonal recursive bisection
• Computes can be migrated
• Load balancing framework gathers the communication information
• Goal
  – Minimize communication
  – Maximize load balance


[Figure: 3D diagram with X, Y, and Z axes, labelled Patches, Bonded Computes, and Non-bonded Computes.]


Old strategy

• Greedy approach
• Pick the heaviest compute
• Place it on a processor with one of the patches, OR
• On a processor which already has a compute for this patch
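The greedy steps above can be sketched as follows. This is an illustration, not NAMD's load balancer: the tie-break (least-loaded candidate, then lowest processor number), the absence of any overload threshold, and all names and loads are assumptions.

```python
def greedy_place(computes, patch_home, num_procs):
    """Place computes heaviest-first.

    computes:   {name: (load, (patch, ...))}
    patch_home: {patch: processor that holds the patch}
    Returns {name: processor}.
    """
    load = [0.0] * num_procs
    # Processors that already host a compute for each patch.
    procs_with_compute = {patch: set() for patch in patch_home}
    placement = {}

    heaviest_first = sorted(computes.items(), key=lambda kv: -kv[1][0])
    for name, (cost, patches) in heaviest_first:
        candidates = set()
        for patch in patches:
            candidates.add(patch_home[patch])        # has one of the patches
            candidates |= procs_with_compute[patch]  # has a compute for it
        # Assumed tie-break: least-loaded candidate, then lowest number.
        target = min(candidates, key=lambda p: (load[p], p))
        placement[name] = target
        load[target] += cost
        for patch in patches:
            procs_with_compute[patch].add(target)
    return placement


# Hypothetical input: three patches on processors 0 to 2 of a 4-processor run.
patch_home = {"P0": 0, "P1": 1, "P2": 2}
computes = {
    "c0": (5.0, ("P0", "P1")),
    "c1": (3.0, ("P1", "P2")),
    "c2": (2.0, ("P0", "P1")),
}
placement = greedy_place(computes, patch_home, num_procs=4)
```

Note that the candidate set ignores where processors sit in the network: two candidates are treated alike even if one is many hops from the patches.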


[Figure: 3D torus showing Patch 1, Patch 2, an inner brick, and an outer brick.]


Hop-bytes

[Figure: two plots of hop-bytes (MB) versus number of processors (512 to 8192), each comparing the Old and Topology strategies: NAMD on Blue Gene/L (VN mode) and NAMD on Blue Gene/L (CO mode). An annotation reads ~17 %.]


Future Work

• Reason for contention
  – Heavy communication exceeding bandwidth
  – Link contention (such as in deterministic routing)
• Use UPC/PAPI on Blue Gene/L and P


Future Work

• Automatic Mapping
  – Initial Static Mapping
  – Use case: meshing applications
• Extend work on the Charm++ load balancers
  – Section-multicast aware load balancers
  – Useful in matrix multiplication


Future Work

• Optimization on other topologies
  – SiCortex (Kautz Graph)
  – InfiniBand clusters (Fat-tree)


Summary

• Topology mapping helps!
  – Especially for heavily communication-bound applications
• Static mapping
• Dynamic mapping during load balancing
• Automatic mapping to relieve the user