Reactive On-line Machine Learning with Akka Streams, Jan Pustelnik & Kamil Owczarek, 1 March 2017

1 March 2017

Reactive On-line Machine Learning with Akka Streams

Jan Pustelnik & Kamil Owczarek

We all love retro, don’t we?

Reactive Streams Made Easy

[Diagram: SOURCE → FLOW → SINK. DATA flows downstream; BACKPRESSURE flows upstream.]
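The source/flow/sink picture can be imitated without Akka. The sketch below is plain Python, not the Akka Streams API: generators form a pull-based pipeline in which the sink's demand is the only thing that makes the source produce, which is the core idea behind backpressure. The names `source`, `flow`, `sink`, and `produced` are illustrative assumptions.

```python
from itertools import islice

produced = []  # records every element the source was actually asked to emit


def source():
    """Unbounded source: emits 0, 1, 2, ... but only when pulled."""
    n = 0
    while True:
        produced.append(n)
        yield n
        n += 1


def flow(upstream):
    """Flow: transforms each element, pulling from upstream on demand."""
    for x in upstream:
        yield x * 10


def sink(upstream, demand):
    """Sink: signals demand for `demand` elements and collects them."""
    return list(islice(upstream, demand))


result = sink(flow(source()), demand=3)
print(result)    # elements that reached the sink
print(produced)  # elements the source had to produce
```

Although the source is infinite, it produces only as many elements as the sink asks for, so a slow consumer never gets flooded.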

Reactive Fast Data with Akka!

Akka allows you to put your on-line / streaming data structure / algorithm in context. You don’t need to think about how to take care of data flow and backpressure.

Akka's well-thought-out architecture lets you concentrate on the stuff relevant to your problem domain.

Akka is geared towards high performance and can be used in an IoT setup, where e.g. Spark Streaming would not fit.

It is easy to port e.g. machine learning algorithms from other streaming setups (like Spark Streaming) to Akka.

Backpressure so retro! (1981)

Retro is future-proof! (2001)

SEDA is fast because it is asynchronous, has buffers, and its stages are single-threaded

On-line algorithms / data structures (1992)

Example: Kadane algorithm (on-line, streaming, 1984)

Source: https://en.wikipedia.org/wiki/Maximum_subarray_problem

(…) the maximum subarray problem is the task of finding the contiguous subarray within a one-dimensional array of numbers which has the largest sum. For example, for the sequence of values −2, 1, −3, 4, −1, 2, 1, −5, 4; the contiguous subarray with the largest sum is 4, −1, 2, 1, with sum 6.
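As a minimal sketch of why Kadane's algorithm suits streaming, the Python generator below keeps only two numbers of state and emits the best subarray sum seen so far after every incoming element. The function name `kadane_stream` is an assumption made for illustration; it is not code from the slides.

```python
def kadane_stream(values):
    """Yield the best contiguous-subarray sum seen so far after each element."""
    best_ending_here = None  # best sum of a subarray ending at the current element
    best_so_far = None       # best sum of any subarray seen so far
    for x in values:
        if best_so_far is None:
            best_ending_here = x
            best_so_far = x
        else:
            # Either extend the previous subarray or start a new one at x.
            best_ending_here = max(x, best_ending_here + x)
            best_so_far = max(best_so_far, best_ending_here)
        yield best_so_far


running = list(kadane_stream([-2, 1, -3, 4, -1, 2, 1, -5, 4]))
print(running)  # [-2, 1, 1, 4, 4, 5, 6, 6, 6]
```

The final value, 6, matches the example quoted above, and each element is processed once in constant time and memory.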

KadaneFlowStage

[Diagram: a stage with IN and OUT ports; the Akka plumbing wraps the stateful Kadane logic.]
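The split between plumbing and stateful logic can be sketched in plain Python. This is not the Akka `GraphStage` API; `KadaneLogic` and `stage` are hypothetical names. The logic class knows nothing about streams, while the plumbing function moves elements from IN to OUT and surfaces failures instead of swallowing them.

```python
class KadaneLogic:
    """Stateful part: plain algorithm state, no knowledge of streams."""

    def __init__(self):
        self.best_ending_here = None
        self.best_so_far = None

    def update(self, x):
        if self.best_so_far is None:
            self.best_ending_here = self.best_so_far = x
        else:
            self.best_ending_here = max(x, self.best_ending_here + x)
            self.best_so_far = max(self.best_so_far, self.best_ending_here)
        return self.best_so_far


def stage(logic, inlet):
    """Plumbing part: pull from IN, push to OUT, and fail loudly on bad input."""
    for element in inlet:
        try:
            out = logic.update(element)
        except TypeError as err:
            # Propagate the failure downstream instead of dropping the element.
            raise RuntimeError(f"stage failed on element {element!r}") from err
        yield out


outputs = list(stage(KadaneLogic(), [1, -2, 3]))
print(outputs)  # [1, 1, 3]
```

Keeping the state in its own class means the same logic can be unit-tested without any stream running.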

Flow Shape (plumbing)

Flow Shape – output handler (plumbing)

Better not fail silently.

Flow stage – “Business logic”

Proper Kadane algo logic here

Let the flow flow…

Bloom filter (on-line, streaming, 1970)

[Diagram: BLOOM vs. DICT. Both are fed the items 1, 5 and asked "7 ?" and "X ?"; the possible answers shown are √ / X? / X.]
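A Bloom filter can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not the one from the talk: the class name, the default sizes, and the use of salted SHA-256 as the hash family are all choices made here for clarity.

```python
import hashlib


class BloomFilter:
    """Set-like structure: no false negatives, a small chance of false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(item))


bloom = BloomFilter()
bloom.add(1)
bloom.add(5)
print(bloom.might_contain(1), bloom.might_contain(5))
```

Unlike a dictionary, the filter never stores the items themselves, so its memory use is fixed up front regardless of how many elements stream through.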

It is easy to create new shapes but you can (re)use existing

Tripod? Just like in the old days

Remember your Topology class? A shape is just a shape…

Bloom filter – CrossShape!

[Diagram: a cross-shaped stage around BLOOM with two crossing flows: DATA in → DATA out, and QUERIES in → ANSWERS out.]

Remember, a shape is just a shape

              In2
               |
               v
          +---------+
  In1 ~>  |  cross  |  ~> Out1
          +---------+
               |
               v
              Out2

=

BloomFilterCrossStage, ftw!

Two crossing flows… Common shared state… Single thread!
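The idea of two flows crossing over one piece of shared state can be sketched in plain Python. This is a hypothetical model, not the `BloomFilterCrossStage` from the talk and not the Akka API: `BloomCrossStage`, `on_data`, `on_query`, and `run` are assumed names, and the single-threaded loop stands in for the guarantee that a stage handles one event at a time.

```python
import hashlib


class BloomCrossStage:
    """Two inlets (data, queries), two outlets (data, answers), one shared filter."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # shared state, touched by both flows

    def _positions(self, item):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def on_data(self, item):
        """Data inlet: remember the item, then pass it through to the data outlet."""
        for pos in self._positions(item):
            self.bits[pos] = 1
        return item

    def on_query(self, item):
        """Query inlet: emit (item, possibly_present) on the answers outlet."""
        return item, all(self.bits[pos] for pos in self._positions(item))


def run(stage, events):
    """Single-threaded event loop: one handler runs at a time, so no locks are needed."""
    data_out, answers_out = [], []
    for port, item in events:
        if port == "data":
            data_out.append(stage.on_data(item))
        elif port == "query":
            answers_out.append(stage.on_query(item))
        else:
            raise ValueError(f"unknown port: {port!r}")
    return data_out, answers_out


data_out, answers_out = run(
    BloomCrossStage(),
    [("query", 1), ("data", 1), ("data", 5), ("query", 1), ("query", 5)],
)
print(data_out)
print(answers_out)
```

The first query arrives before any data, so it is answered "definitely absent"; the same query after the data has passed is answered "possibly present".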

Machine Learning with Akka streams!

Online ML models

[Diagram: ON-LINE MACHINE LEARNING splits into ADVERSARIAL MODELS and STATISTICAL MODELS.]

Statistical Models

Idea: the input variable (X) and the predicted variable (Y) come from a probability distribution p(X, Y)

Aim: predict Y as well as possible: Pr(Y)

Cost function: the cost of an error: V(Y, Pr(Y))

Generalized solution: minimize $E[V(Y, \mathrm{Pr}(Y))] = \int V(Y, \mathrm{Pr}(Y)) \, dp(X, Y)$

Plugging in different V functions gives familiar ML algorithms: Linear Regression, SVM etc.
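To illustrate how the choice of V selects the algorithm, here is a minimal Python sketch of an on-line learner whose loss gradient is pluggable. Stochastic gradient descent is a technique chosen here for illustration and is not named on the slide; the one-dimensional model p = w · x and all function names are assumptions.

```python
def sgd(samples, grad, lr=0.1, epochs=1):
    """One-dimensional on-line learner: w is updated after every single sample."""
    w = 0.0
    for _ in range(epochs):
        for x, y in samples:
            w -= lr * grad(w, x, y)
    return w


def squared_loss_grad(w, x, y):
    # V(y, p) = (y - p)^2 / 2 with p = w * x, which leads to linear regression.
    return (w * x - y) * x


def hinge_loss_grad(w, x, y):
    # V(y, p) = max(0, 1 - y * p) with y in {-1, +1}: an unregularized linear SVM.
    return -y * x if y * w * x < 1 else 0.0


regression_data = [(1, 2), (2, 4), (3, 6)]             # generated by y = 2x
w_reg = sgd(regression_data, squared_loss_grad, epochs=20)

classification_data = [(1, 1), (-1, -1), (2, 1), (-2, -1)]  # label = sign of x
w_svm = sgd(classification_data, hinge_loss_grad, epochs=50)
print(w_reg, w_svm)
```

Only the gradient function changes between the two runs; the update loop, which is the part that lives inside a stream stage, stays the same.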

Adversarial Models

• Rarely mentioned outside the scientific community and conferences!

• The problem is framed as a game between the learner and nature:

1. Learner sees input X(i)

2. Learner “makes its move”: predicts output Pr[Y(i)]

3. Nature sees X(i) and Pr[Y(i)] and “makes a move”, emitting the actual output Y(i)

4. Learner “suffers a loss”: V[Y(i), Pr[Y(i)]]

• Important element: nature’s reaction can depend on the prediction

• Applications: actual games, asset trading, varying cost evaluation
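The four steps of the game can be written as a short loop. The Python sketch below is illustrative only; `play`, the toy learner that repeats the last outcome, and the `contrarian_nature` opponent are assumptions made to show that nature's move may depend on the prediction.

```python
def play(predict, update, nature, loss, inputs):
    """Run the prediction game and return the learner's cumulative loss."""
    total = 0.0
    for x in inputs:                        # step 1: learner sees the input
        prediction = predict(x)             # step 2: learner makes its move
        outcome = nature(x, prediction)     # step 3: nature answers, seeing the prediction
        total += loss(outcome, prediction)  # step 4: learner suffers a loss
        update(x, outcome)                  # learner adapts before the next round
    return total


state = {"last": 0}  # toy learner: always predicts the previous outcome


def predict(x):
    return state["last"]


def update(x, outcome):
    state["last"] = outcome


def contrarian_nature(x, prediction):
    # Adversarial nature: always emits the opposite of a binary prediction.
    return 1 - prediction


def zero_one_loss(outcome, prediction):
    return 0.0 if outcome == prediction else 1.0


total = play(predict, update, contrarian_nature, zero_one_loss, range(5))
print(total)  # 5.0
```

Against this opponent the deterministic learner loses every round, which is why adversarial analysis looks at loss relative to the best fixed strategy rather than at absolute loss.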

Recursive Least Squares

• We all know the “least squares” metric from school, right?

• O(dn³) memory complexity

• The formula is recursive:

• Recursive = on-line = O(dn²) memory complexity
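The recursive update can be sketched in pure Python. This is the textbook recursive least squares recursion (a Sherman-Morrison rank-one update of the inverse covariance matrix), which may differ in notation from the formula on the original slide; the class name and the initial value `delta` are assumptions. With d features the state is one d × d matrix plus one weight vector, independent of how many samples have been seen.

```python
class RecursiveLeastSquares:
    """On-line least squares: each sample updates the model and is then discarded."""

    def __init__(self, dim, delta=1e6):
        self.w = [0.0] * dim
        # P approximates the inverse of X^T X; a large delta means weak regularization.
        self.P = [[delta if i == j else 0.0 for j in range(dim)] for i in range(dim)]

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x, y):
        d = len(x)
        Px = [sum(self.P[i][j] * x[j] for j in range(d)) for i in range(d)]
        denom = 1.0 + sum(x[i] * Px[i] for i in range(d))
        gain = [v / denom for v in Px]
        error = y - self.predict(x)
        self.w = [self.w[i] + gain[i] * error for i in range(d)]
        # P <- P - gain * (P x)^T, valid because P stays symmetric.
        self.P = [[self.P[i][j] - gain[i] * Px[j] for j in range(d)] for i in range(d)]


rls = RecursiveLeastSquares(dim=2)
for x in [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]:
    rls.update(x, 2.0 * x[0] + 3.0 * x[1])  # samples generated by y = 2*x1 + 3*x2
print(rls.w)
```

Each update costs a fixed number of operations on the d × d matrix, so the model can sit inside a stream stage and refine itself element by element.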

Follow The Leader

Adversarial online ML algorithm

Not very complex

Pick the hypothesis that has performed best so far

Paradoxically: good for bounded loss

Careful investment, medical costs evaluation etc.

Can be regularized for a broader set of applications
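A minimal Python sketch of the unregularized rule follows, assuming a finite set of candidate hypotheses and breaking ties by the lowest index; both are assumptions made here, as are the function and variable names. In each round the learner commits to the hypothesis with the smallest cumulative loss so far, then every hypothesis is charged for its own error.

```python
def follow_the_leader(hypotheses, loss, rounds):
    """Return the index of the hypothesis followed in each round."""
    cumulative = [0.0] * len(hypotheses)
    picks = []
    for x, y in rounds:
        # Leader = hypothesis with the lowest cumulative loss so far.
        leader = min(range(len(hypotheses)), key=cumulative.__getitem__)
        picks.append(leader)
        # After the outcome is revealed, update every hypothesis's score.
        for i, h in enumerate(hypotheses):
            cumulative[i] += loss(y, h(x))
    return picks


hypotheses = [lambda x: 0, lambda x: x, lambda x: 2 * x]


def squared_loss(y, p):
    return (y - p) ** 2


picks = follow_the_leader(hypotheses, squared_loss, [(1, 2), (2, 4), (3, 6)])
print(picks)  # [0, 2, 2]
```

With no history the first pick is arbitrary; once one round of losses is in, the learner switches to the hypothesis that matches the data and stays with it.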

That’s it…

https://en.wikipedia.org/wiki/Banner_Mania