
Stale Synchronous Parallel Iterations on Flink

TRAN Nam-Luc / Engineer @EURA NOVA Research & Development

FLINK FORWARD 2015, BERLIN, GERMANY, OCTOBER 2015

Our people:

40 employees from business engineers to data scientists

7 freelancers, 3 founding partners

EURA NOVA? OUR INNOVATION-DRIVEN MODEL & DISRUPTIVE CULTURE

KEY FIGURES

“EURA NOVA is a team of passionate IT experts devoted to providing knowledge & skills to people with great ideas”

Data Science, Distributed computing, Software engineering, Big Data.

Our research, since 2009:

2 PhD theses & 18 master's theses with 4 renowned universities

20 publications in conferences as lecturer

4 large R&D projects

3 open-source products

How not to synchronize workers

[Diagram: ten workers (Worker 1 to Worker 10) at a synchronization barrier, all waiting for the one STRAGGLER]

Bulk Synchronous Parallelism synchronizes threads after each iteration.

THE BIG PICTURE


There are always stragglers in a cluster.

In large clusters, this leaves many workers waiting!

Gonna dig me a hole (gonna dig me a hole),

Gonna put a nerd in it (gonna put a nerd in it),

Gonna take a firecracker (gonna take a firecracker)…


CONTRIBUTION


1. STALE SYNCHRONOUS PARALLEL ITERATIONS

Tackling the straggler problem within Flink

2. DISTRIBUTED FRANK-WOLFE ALGORITHM

Applied to LASSO regression as a use case

PART 1: STALE SYNCHRONOUS PARALLEL ITERATIONS ON FLINK

There are stragglers in distributed processing frameworks …

→ Hardware heterogeneity
→ Skewed data distribution
→ Garbage collection

THE STRAGGLER PROBLEM

Distribution of iterative-convergent algorithms:

Iteration time: not predictable, costly to reschedule!

… especially in the context of data center operating systems.


BULK VS STALE SYNCHRONOUS

[Diagram: classic BSP with an explicit synchronization barrier after every iteration vs SSP, where fast workers run ahead and read STALE values within a bounded staleness]


PARAMETER SERVER

How to keep workers up-to-date?

[Diagram: without an explicit synchronization barrier, workers hold stale copies of the shared value x; a parameter server keeps them up to date]

1. SSP iteration control model

2. Parameter server


INTEGRATION WITH FLINK

What does Flink need to enable SSP?

if clock_i <= cluster-wide clock + staleness:
    do iteration
    ++clock_i, then send clock_i to the synchronization sink
else:
    wait until clock_i <= cluster-wide clock + staleness
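As a rough illustration of that rule, a worker-side guard could look like the following Java sketch; the class and field names are invented for illustration and are not the actual classes from the Flink pull request.

// Hypothetical sketch of the worker-side SSP clock check described above.
public class SspClockGuard {
    private final int staleness;     // maximum allowed gap between clock_i and the cluster-wide clock
    private int clusterWideClock;    // last value broadcast by the synchronization sink
    private int localClock;          // clock_i of this worker

    public SspClockGuard(int staleness) {
        this.staleness = staleness;
    }

    // Called when the synchronization sink broadcasts a new cluster-wide clock.
    public synchronized void onClusterClock(int newClusterWideClock) {
        this.clusterWideClock = newClusterWideClock;
        notifyAll();
    }

    // Blocks until clock_i <= cluster-wide clock + staleness, i.e. this worker may iterate.
    public synchronized void awaitPermission() throws InterruptedException {
        while (localClock > clusterWideClock + staleness) {
            wait();
        }
    }

    // Advances clock_i after an iteration; the caller then sends it to the synchronization sink.
    public synchronized int tick() {
        return ++localClock;
    }
}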


ITERATION CONTROL MODEL IN FLINK

Worker p_i sends its clock_i to the Clock Synchronization Sink, which:

→ stores clock_i in C
→ sets cluster-wide clock = min(C)
→ broadcasts the cluster-wide clock if it changed
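A minimal sketch of that sink-side bookkeeping, assuming a plain map from worker index to its last reported clock; this is illustrative only, not the actual synchronization task code.

import java.util.HashMap;
import java.util.Map;
import java.util.function.IntConsumer;

// Plain-Java version of the bookkeeping described above.
public class ClockSyncSink {
    private final Map<Integer, Integer> clocks = new HashMap<>(); // C: worker index -> last clock_i
    private final IntConsumer broadcaster;                        // sends the cluster-wide clock to all heads
    private int clusterWideClock = 0;

    public ClockSyncSink(IntConsumer broadcaster) {
        this.broadcaster = broadcaster;
    }

    // Called for every clock_i received from worker p_i.
    public synchronized void onClock(int workerIndex, int clock) {
        clocks.put(workerIndex, clock);                                   // store clock_i in C
        int min = clocks.values().stream().mapToInt(Integer::intValue).min().orElse(0);
        if (min != clusterWideClock) {                                    // cluster-wide clock = min(C)
            clusterWideClock = min;
            broadcaster.accept(clusterWideClock);                         // broadcast only if it changed
        }
    }
}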

ITERATION CONTROL MODEL IN FLINK

[Diagram: in the existing model, each IterationHead → IterationIntermediate → IterationTail pipeline with its backchannel sends a "worker done" event to the IterationSynchronizationTask, which answers "all workers done" to every head before the next superstep]

ITERATION CONTROL MODEL IN FLINK

[Diagram: in the SSP model, each IterationHead → IterationIntermediate → IterationTail pipeline with its backchannel sends its Clock p_i as a ClockEvent to the ClockSynchronizationTask, which broadcasts the cluster-wide clock back to every head]

ITERATION CONTROL MODEL IN FLINK

SuperstepBarrier → ClockHolder

IterationHeadPACTTask → SSPIterationHeadPACTTask

SyncEventHandler → ClockSyncEventHandler

IterationSynchronizationTask → ClockSynchronizationTask

CONVERGENCE CHECK

BULK SYNCHRONOUS PARALLEL: convergence is determined at the synchronization barrier.

STALE SYNCHRONOUS PARALLEL: convergence is reached when no worker can further improve the solution.

STALE SYNCHRONOUS API

dataSet.Iterate(nIterations)

dataSet.IterateWithSSP(nIterations, staleness)

Simple API:

RichMapFunctionWithParameterServer extends RichMapFunction {
    update(id, clock, parameter)
    get(id)
}
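To make the shape of that API concrete, here is a self-contained toy sketch; the ParameterServerClient interface below merely mirrors the update(id, clock, parameter) / get(id) calls shown above and is not the actual class added by the pull request.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy sketch of the parameter-server calls a map function would make.
public class ParameterServerSketch {

    interface ParameterServerClient {
        void update(String id, int clock, double[] parameter); // push a new value together with the worker's clock
        double[] get(String id);                                // read the current (possibly stale) value
    }

    // Trivial in-memory stand-in for the distributed key-object store.
    static class InMemoryParameterServer implements ParameterServerClient {
        private final Map<String, double[]> store = new ConcurrentHashMap<>();
        public void update(String id, int clock, double[] parameter) { store.put(id, parameter); }
        public double[] get(String id) { return store.getOrDefault(id, new double[0]); }
    }

    public static void main(String[] args) {
        ParameterServerClient ps = new InMemoryParameterServer();
        ps.update("alpha", 0, new double[] {0.0, 0.0, 0.0});

        // What a RichMapFunctionWithParameterServer would typically do inside map():
        double[] alpha = ps.get("alpha");   // read the shared coefficients
        alpha[0] += 0.5;                    // apply a local update
        ps.update("alpha", 1, alpha);       // push it back with the worker's clock
        System.out.println(alpha[0]);
    }
}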


PARAMETER SERVER

Architecture

[Diagram: the workers read and update a SHARED MODEL held in a distributed DATA GRID]

PART 2: DISTRIBUTED FRANK-WOLFE ALGORITHM

DISTRIBUTED FRANK-WOLFE ALGORITHM

Solving the following optimization problem: the model is a linear combination of atoms with sparse coefficients.
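For reference, the formulation used by Bellet et al. (2015), with the atoms as the columns of a matrix A and the sparse coefficient vector α, is (up to notation):

\min_{\alpha \in \mathbb{R}^n} f(A\alpha) \quad \text{s.t.} \quad \lVert \alpha \rVert_1 \le \beta

For LASSO, f(A\alpha) = \tfrac{1}{2} \lVert y - A\alpha \rVert_2^2, so the solution A\alpha is a sparse linear combination of atoms.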

Distributed version (Bellet et al. 2015):

DISTRIBUTED FRANK-WOLFE ALGORITHM

[Diagram: atoms 1…n partitioned across workers W1, W2, W3; the model is a linear combination of atoms with sparse coefficients]


Distributed version (Bellet et al. 2015):

DISTRIBUTED FRANK-WOLFE ALGORITHM

1. Local selection of atoms

[Diagram: atoms 1…n partitioned across workers W1, W2, W3]

Distributed version (Bellet et al. 2015):

DISTRIBUTED FRANK-WOLFE ALGORITHM

2. Global consensus

[Diagram: atoms 1…n partitioned across workers W1, W2, W3]

Distributed version (Bellet et al. 2015):

DISTRIBUTED FRANK-WOLFE ALGORITHM

3. α coefficients update

[Diagram: atoms 1…n partitioned across workers W1, W2, W3]
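For reference, the α update in step 3 is the classical Frank-Wolfe step over the ℓ1 ball (standard form, with step size γ_k = 2/(k+2) and j* the atom chosen by the global consensus):

\alpha^{(k+1)} = (1 - \gamma_k)\,\alpha^{(k)} + \gamma_k\, s^{(k)}, \qquad s^{(k)} = \pm \beta\, e_{j^*}

where the sign is chosen opposite to the gradient coordinate \nabla f(A\alpha^{(k)})_{j^*}.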

Stale synchronous version:

DISTRIBUTED FRANK-WOLFE ALGORITHM

1. Get α coefficients from the parameter server

[Diagram: atoms 1…n partitioned across workers W1, W2, W3, connected to the Parameter Server]

Stale synchronous version:

DISTRIBUTED FRANK-WOLFE ALGORITHM

2. Local selection of atoms

[Diagram: atoms 1…n partitioned across workers W1, W2, W3, connected to the Parameter Server]

Stale synchronous version:

DISTRIBUTED FRANK-WOLFE ALGORITHM

3. Compute α coefficients from the locally selected atoms

[Diagram: atoms 1…n partitioned across workers W1, W2, W3, connected to the Parameter Server]

Stale synchronous version:

DISTRIBUTED FRANK-WOLFE ALGORITHM

4. Update α coefficients to the parameter server

[Diagram: atoms 1…n partitioned across workers W1, W2, W3, connected to the Parameter Server]

Stale synchronous version:

DISTRIBUTED FRANK-WOLFE ALGORITHM

Repeat while within the staleness bounds

[Diagram: atoms 1…n partitioned across workers W1, W2, W3, connected to the Parameter Server]
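Putting the four steps together, here is a rough, self-contained Java sketch of one worker's stale synchronous loop on a toy least-squares objective; the in-memory map stands in for the parameter server, and nothing here is code from the paper or the Flink pull requests.

import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy objective: f(alpha) = 0.5 * ||alpha - target||^2, minimized over the l1 ball.
public class SspFrankWolfeWorkerSketch {

    static final Map<String, double[]> parameterServer = new ConcurrentHashMap<>();

    public static void main(String[] args) {
        double[] target = {0.4, -0.3, 0.2, 0.0, -0.1};     // toy problem data
        double beta = 1.0;                                  // radius of the l1 ball
        parameterServer.put("alpha", new double[target.length]);

        int localFrom = 0, localTo = target.length;         // this worker's partition of the atoms

        for (int k = 0; k < 200; k++) {                      // repeat while within the staleness bounds
            double[] alpha = parameterServer.get("alpha").clone();   // 1. get alpha (possibly stale)

            // 2. local selection: the atom with the largest absolute gradient coordinate
            int best = localFrom;
            for (int j = localFrom; j < localTo; j++) {
                if (Math.abs(alpha[j] - target[j]) > Math.abs(alpha[best] - target[best])) {
                    best = j;
                }
            }
            double direction = -Math.signum(alpha[best] - target[best]) * beta;

            // 3. compute the new coefficients with the classical Frank-Wolfe step
            double gamma = 2.0 / (k + 2);
            for (int j = 0; j < alpha.length; j++) {
                alpha[j] *= (1 - gamma);
            }
            alpha[best] += gamma * direction;

            parameterServer.put("alpha", alpha);             // 4. update alpha on the parameter server
        }
        System.out.println(Arrays.toString(parameterServer.get("alpha")));
    }
}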

See our full paper for:

∙ full implementation details
∙ properties
∙ application to LASSO regression
∙ convergence proof

N.-L. Tran, T. Peel, S. Skhiri, "Distributed Frank-Wolfe under Pipelined Stale Synchronous Parallelism", Proceedings of IEEE BigData 2015, Santa Clara, November 2015.

DISTRIBUTED FRANK-WOLFE ALGORITHM


EXPERIMENTS

Application to LASSO regression

Random sparse 1,000 × 10,000 matrices, sparsity ratio = 0.001

5 nodes, 2 GHz, 3 GB RAM

Generated load: at any time, one random node is put under 100% load for 12 seconds


RESULTS

Convergence of the objective function

RECAP

Stragglers in a cluster are an issue.

Mitigate them with Stale Synchronous Parallel iterations.

WANNA TRY IT OUT?

Pull request #967: Stale Synchronous Parallel iterations + API

Pull request #1101: Frank-Wolfe algorithm + LASSO regression

THANK YOU! Do you have any questions?

[email protected]

AGENDA

1. STALE SYNCHRONOUS PARALLEL ITERATIONS
∙ The straggler problem
∙ BSP vs SSP
∙ Integration with Flink
∙ Iteration control model
∙ API

2. DISTRIBUTED FRANK-WOLFE ALGORITHM
∙ Problem statement
∙ Application: LASSO regression
∙ Experiments


RESULTS

Sparsity of the coefficients

PARAMETER SERVER

The parameter server keeps track of the intermediate results:

→ Key-object store
→ Distributed, with local caching
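As a sketch of what "key-object store, distributed with local caching" can mean, here is an invented stand-in; it is not the data grid used in the actual implementation, only an illustration of the idea.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Key-object client with a per-worker local cache in front of the remote store.
public class CachingParameterClient {
    private final Map<String, Object> remoteStore;                        // stands in for the distributed grid
    private final Map<String, Object> localCache = new ConcurrentHashMap<>();

    public CachingParameterClient(Map<String, Object> remoteStore) {
        this.remoteStore = remoteStore;
    }

    // Reads prefer the local cache; a miss falls back to the (slower) distributed store.
    public Object get(String id) {
        return localCache.computeIfAbsent(id, remoteStore::get);
    }

    // Writes go to the distributed store and refresh the local copy.
    public void update(String id, Object value) {
        remoteStore.put(id, value);
        localCache.put(id, value);
    }
}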