The Bulk Synchronous Parallel Model (PSC §1.2)

Lecture 1.2 Bulk Synchronous Parallel Model


What is a parallel computer?

[Figure: a cluster of PCs connected through switches.]

A parallel computer consists of a set of processors (such as a cluster of PCs) that work together on solving a computational problem.


Moore’s law has broken down

- Single-processor speed cannot continue to improve. Moore’s Law (‘Speed doubles every 18 months’) broke down in 2007 because power consumption grows as O(f^3), where f is the clock frequency. Therefore, improvements can only come from using multiple processors or processor cores.

- Current supercomputers are all parallel computers. The Chinese Tianhe-2 (Milkyway-2) has 3.12 million processor cores and is the fastest supercomputer on earth (Top 500, June 2014). It can reach a speed of 33.8 Petaflop/s; 1 Petaflop = 10^15 floating-point operations. Its power consumption is 17.8 MW.


Why parallel computing?

- It is almost as cheap to put two processor cores onto a chip (giving a dual-core chip) as to put just one. The doubled speed of a dual-core PC makes for fine advertising.

- Two cores running at clock frequency f/2 use only 1/4 of the power of one core running at frequency f.

- But it is hard to achieve the same total computation speed in practice.
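The 1/4 figure follows from the O(f^3) power model: each of the two cores uses (1/2)^3 = 1/8 of the power, so together they use 2/8. A minimal sketch of that arithmetic, assuming power per core is exactly proportional to f^3 (the function name `relative_power` is made up for illustration):

```python
def relative_power(cores):
    """Power of `cores` cores at frequency f/cores, relative to one core at f.

    Assumption: power per core is proportional to f**3.
    """
    per_core = (1.0 / cores) ** 3   # each core runs at f/cores
    return cores * per_core         # total over all cores


for k in (1, 2, 4):
    print(k, relative_power(k))
```

Under this model the aggregate clock rate stays the same while the power drops as 1/k^2.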


Why not?

- It is more difficult to write parallel programs than to write sequential ones (i.e. for one processor). The work has to be distributed evenly over the processors and the amount of communication between the processors has to be kept within limits.

- But not much more difficult. That’s why we have this course.

- Parallel programs may run fast on certain architectures, but surprisingly slow on others.

- But not if you program in a portable fashion. That’s what we try to teach.


Parallel computer: abstract model

[Figure: processors P, each with its own memory M, connected by a communication network.]

Bulk synchronous parallel (BSP) computer. Proposed by Leslie Valiant, 1989.


BSP computer

- A BSP computer consists of a collection of processors, each with its own memory. It is a distributed-memory computer.

- Access to own memory is fast, to remote memory slower.

- Uniform-time access to all remote memories.

- No need to open the black box of the communication network. Algorithm designers should not worry about network details, only about global performance.

- Algorithms designed for a BSP computer are portable: they can be run efficiently on many different parallel computers.


Parallel algorithm: supersteps

[Figure: supersteps of a parallel algorithm on processors P(0) to P(4).]


BSP algorithm

- A BSP algorithm consists of a sequence of supersteps.

- A computation superstep consists of many small steps, such as the floating-point operations (flops) addition, subtraction, multiplication, division. In scientific computing, flops are the common unit for expressing computation cost.

- A communication superstep consists of many basic communication operations, each transferring a data word such as a real or integer from one processor to another.

- In our theoretical algorithms, we distinguish between the two types of supersteps. This helps in the design and analysis of parallel algorithms.

- In our practical programs, we drop the distinction and mix computation and communication freely in each superstep.


Communication superstep: h-relation

[Figure: two examples of 2-relations, panels (a) and (b).]

- An h-relation is a communication superstep in which every processor sends and receives at most h data words: h = max{h_s, h_r}.

- h_s is the maximum number of data words sent by a processor.

- h_r is the maximum number of data words received by a processor.
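To make the definition concrete, here is a minimal sketch that computes h_s, h_r and h from a list of messages. The triple format (source, destination, words) and the function name `h_relation` are assumptions made for illustration; they are not part of any BSP library.

```python
from collections import Counter


def h_relation(messages):
    """Return (h_s, h_r, h) for a list of (source, destination, words) triples."""
    sent = Counter()
    received = Counter()
    for src, dst, words in messages:
        sent[src] += words
        received[dst] += words
    h_s = max(sent.values(), default=0)      # most words sent by any processor
    h_r = max(received.values(), default=0)  # most words received by any processor
    return h_s, h_r, max(h_s, h_r)


# Hypothetical pattern: P(0) sends one word each to P(1) and P(2);
# P(1) sends one word to P(2).
pattern = [(0, 1, 1), (0, 2, 1), (1, 2, 1)]
print(h_relation(pattern))
```

In this pattern P(0) sends two words and P(2) receives two words, so it is a 2-relation.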


Cost of communication superstep

- T(h) = hg + l, where g is the time per data word and l the global synchronisation time.

- Motivation for hg: h determines the communication time, since entry/exit of a processor is the bottleneck.

- Motivation for l: it contains fixed overhead such as start-up costs of sending data, costs of checking whether all data have arrived at their destination, and costs of the synchronisation mechanism itself.
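A small sketch of the formula shows how the fixed cost l is amortised as h grows. The values of g and l below are hypothetical and chosen only for illustration.

```python
def comm_cost(h, g, l):
    """Cost T(h) = h*g + l of an h-relation, in the same time unit as g and l."""
    return h * g + l


# Hypothetical machine parameters, not taken from any measurement.
g, l = 2.0, 100.0

for h in (1, 10, 50, 1000):
    total = comm_cost(h, g, l)
    # The cost per word approaches g as h grows, because l is paid only once.
    print(h, total, total / h)
```

For small h the synchronisation term dominates; only when h is well above l/g does the per-word cost come close to g.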


Time of an h-relation on an 8-processor IBM SP2

[Figure: measured time of an h-relation with a least-squares fit.]

r = 212 Mflop/s, p = 8, g = 187 flop (0.88 µs), l = 148212 flop (698 µs). Year 2000.
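The values in parentheses are g and l converted from flop time units to seconds by dividing by the computing rate r. A minimal sketch of that conversion, using the SP2 numbers quoted above (the helper name `flops_to_seconds` is made up for illustration):

```python
def flops_to_seconds(flops, r):
    """Convert a cost in flop time units to seconds, given rate r in flop/s."""
    return flops / r


r = 212e6    # 212 Mflop/s
g = 187      # flop time units per data word
l = 148212   # flop time units per synchronisation

g_us = flops_to_seconds(g, r) * 1e6   # microseconds
l_us = flops_to_seconds(l, r) * 1e6
print(g_us, l_us)
```

The results agree with the quoted 0.88 µs and 698 µs up to the rounding of r.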


Time of an h-relation on a 4-core Apple iMac desktop

[Figure: measured time of an h-relation with a least-squares fit.]

r = 6.8 Gflop/s, p = 4, g = 304 flop (44 ns), l = 5796 flop (0.85 µs). Year 2014. 3.4 GHz Intel Core i5, 4 cores, running MulticoreBSP for C.
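A straight-line fit of this kind yields g as the slope and l as the intercept. The sketch below uses ordinary least squares in plain Python; the timings are synthetic and noise-free, generated from the quoted g and l, and are not measurements. The function name `fit_g_l` is an assumption for illustration.

```python
def fit_g_l(hs, times):
    """Ordinary least-squares fit of times ~ g*h + l; returns (g, l)."""
    n = len(hs)
    mean_h = sum(hs) / n
    mean_t = sum(times) / n
    sxx = sum((h - mean_h) ** 2 for h in hs)
    sxy = sum((h - mean_h) * (t - mean_t) for h, t in zip(hs, times))
    g = sxy / sxx              # slope: time per data word
    l = mean_t - g * mean_h    # intercept: synchronisation time
    return g, l


# Synthetic timings (NOT measured data), generated from g = 304, l = 5796.
hs = list(range(0, 256, 16))
times = [304 * h + 5796 for h in hs]
g, l = fit_g_l(hs, times)
print(round(g), round(l))
```

With real measurements the points scatter around the line, and the fit gives the g and l that minimise the squared error.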


Cost of computation superstep

- T = w + l, where w is the maximum number of flops of a processor in the superstep.

- Processors with fewer than w flops have to wait. This waiting time is called idle time.

- To measure T, a wall clock is needed, giving the elapsed time. Straightforwardly using a CPU timer will not work, since it does not measure idle time.

- Synchronising the processors before every time measurement helps, but it takes time to synchronise!

- For simplicity, l is the same as in the communication superstep.
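The cost and the idle time per processor can be sketched as follows. The workload and the value of l are hypothetical, and `computation_superstep` is a name invented for this illustration.

```python
def computation_superstep(flops_per_processor, l):
    """Return (T, idle) for a computation superstep.

    T = w + l, where w is the maximum flop count over all processors;
    idle[s] = w - (flops of processor s) is the waiting time of P(s).
    """
    w = max(flops_per_processor)
    idle = [w - f for f in flops_per_processor]
    return w + l, idle


# Hypothetical workload on 4 processors, with l = 20 flop time units.
T, idle = computation_superstep([60, 45, 60, 30], l=20)
print(T, idle)
```

Only the slowest processor determines w, so an uneven work distribution shows up directly as idle time on the others.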


Cost of algorithm

- The cost of a BSP algorithm is an expression of the form

  a + bg + cl.

  This cost is obtained by adding the costs of all the supersteps.

- Note that g = g(p) and l = l(p) are in general functions of the number of processors p.

- The parameters a, b, c depend in general on p and on a problem size n.


Parallel algorithm: supersteps

[Figure: supersteps of a parallel algorithm on processors P(0) to P(4).]

Cost of BSP algorithm for p = 5, g = 2.5, l = 20 is 320 flops. First computation superstep costs 60 + 20 = 80 flops. First communication superstep costs 4 · 5 · 2.5 + 20 = 70 flops.
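The two superstep costs quoted above can be reproduced by accumulating the coefficients of a + bg + cl superstep by superstep. This is a minimal sketch: the tuple encoding of supersteps and the name `bsp_cost` are assumptions for illustration, and only the first two supersteps are included because the remaining ones appear only in the figure.

```python
def bsp_cost(supersteps):
    """Sum superstep costs into coefficients (a, b, c) of a + b*g + c*l.

    Each superstep is ("comp", w), costing w + l,
    or ("comm", h), costing h*g + l.
    """
    a = b = c = 0
    for kind, amount in supersteps:
        if kind == "comp":
            a += amount
        elif kind == "comm":
            b += amount
        else:
            raise ValueError(f"unknown superstep kind: {kind!r}")
        c += 1  # every superstep contributes one synchronisation
    return a, b, c


# First computation superstep (w = 60) and first communication
# superstep (h = 4 * 5 = 20), as quoted in the text.
a, b, c = bsp_cost([("comp", 60), ("comm", 4 * 5)])
g, l = 2.5, 20
print((a, b, c), a + b * g + c * l)
```

Keeping a, b and c separate until the end lets the same algorithm cost be evaluated for any machine's g and l.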


Summary

- An abstract BSP machine is just a BSP(p, r, g, l) computer. This is all we need to know about the machine for developing algorithms. The parameters are:
    p  number of processors
    r  computing rate (in flop/s)
    g  communication cost per data word (in flop time units)
    l  global synchronisation cost (in flop time units)

- The BSP model consists of
    - a distributed-memory architecture with a black box communication network providing uniform-time access to remote memories;
    - an algorithmic framework formed by a sequence of supersteps;
    - a cost model giving cost expressions of the form a + bg + cl.
