[Tokyo Scala User Group] Akka Streams & Reactive Streams (0.7)

DESCRIPTION

This is a work-in-progress version of a talk for the Scala User Group in Tokyo. It touches on the basics and some of the ideas behind Reactive Streams, as well as the implementation shipped by Akka.

Konrad 'ktoso' Malawski GeeCON 2014 @ Kraków, PL

Konrad `@ktosopl` Malawski

Akka Streams

hAkker @

typesafe.com geecon.org

Java.pl / KrakowScala.pl sckrk.com / meetup.com/Paper-Cup @ London

GDGKrakow.pl meetup.com/Lambda-Lounge-Krakow

You?

Streams

“You cannot enter the same river twice” ~ Heraclitus

http://en.wikiquote.org/wiki/Heraclitus

Streams: Real Time Stream Processing

When you attach “late” to a Publisher, you may miss initial elements: it’s a river of data.
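The “river” behaviour can be sketched with plain Scala, without any streams library. `HotSource` below is a hypothetical stand-in for a Publisher that keeps no history; it is not part of Akka or Reactive Streams, and it only illustrates why a subscriber that attaches late misses what has already flowed past.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical "hot" source: it pushes each element to whoever is attached
// at that moment and keeps no history for listeners that attach later.
final class HotSource[T] {
  private val listeners = ArrayBuffer.empty[T => Unit]

  def attach(listener: T => Unit): Unit = listeners += listener

  def emit(element: T): Unit = listeners.foreach(listener => listener(element))
}

object LateSubscriberDemo {
  // Returns what the early and the late subscriber each observed.
  def run(): (List[Int], List[Int]) = {
    val source = new HotSource[Int]
    val early = ArrayBuffer.empty[Int]
    val late = ArrayBuffer.empty[Int]

    source.attach { x => early += x; () } // attached before anything flows
    source.emit(1)
    source.emit(2)
    source.attach { x => late += x; () }  // attached "late": 1 and 2 are already gone
    source.emit(3)

    (early.toList, late.toList)
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Here the early subscriber sees 1, 2, 3 while the late one only sees 3.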

Reactive Streams

Back-pressured Asynchronous

Stream processing Standardised (!)

Reactive Streams: Goals

1. Back-pressured Asynchronous Stream processing

2. Standard implemented by many libraries

Reactive Streams - Specification & TCK

http://reactive-streams.org

Reactive Streams - Who?

http://reactive-streams.org

• Kaazing Corp.
• rxJava @ Netflix
• reactor @ Pivotal (SpringSource)
• vert.x @ Red Hat
• Twitter
• akka-streams @ Typesafe
• spray @ Spray.io
• Oracle
• java (?) – Doug Lea – SUNY Oswego
• …

Reactive Streams - Inter-op

http://reactive-streams.org

We want to make different implementations co-operate with each other.

The different implementations “talk to each other” using the Reactive Streams protocol.

The Reactive Streams SPI is NOT meant to be a user-facing API. You should use one of the implementing libraries.
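To give a feel for how small that SPI is, here is a rough sketch of its shape. The traits below are local stand-ins written for this example; the real interfaces live in the `org.reactivestreams` package and differ in details (for instance variance on `subscribe`). `SpiShapeDemo` and `single` are hypothetical helpers, not library functions.

```scala
import scala.collection.mutable.ListBuffer

// Local stand-ins that mirror the shape of the Reactive Streams interfaces.
trait Subscription {
  def request(n: Long): Unit // signal demand for up to n more elements
  def cancel(): Unit         // stop the stream
}

trait Subscriber[T] {
  def onSubscribe(s: Subscription): Unit
  def onNext(element: T): Unit
  def onError(cause: Throwable): Unit
  def onComplete(): Unit
}

trait Publisher[T] {
  def subscribe(subscriber: Subscriber[T]): Unit
}

object SpiShapeDemo {
  // Publishes a single element once it has been requested, then completes.
  def single[T](element: T): Publisher[T] = new Publisher[T] {
    def subscribe(subscriber: Subscriber[T]): Unit = {
      var done = false
      subscriber.onSubscribe(new Subscription {
        def request(n: Long): Unit =
          if (!done && n > 0) {
            done = true
            subscriber.onNext(element)
            subscriber.onComplete()
          }
        def cancel(): Unit = done = true
      })
    }
  }

  // Records the order of signals that a subscriber observes.
  def run(): List[String] = {
    val log = ListBuffer.empty[String]
    single("hello").subscribe(new Subscriber[String] {
      def onSubscribe(s: Subscription): Unit = { log += "onSubscribe"; s.request(1) }
      def onNext(element: String): Unit = log += s"onNext($element)"
      def onError(cause: Throwable): Unit = log += "onError"
      def onComplete(): Unit = log += "onComplete"
    })
    log.toList
  }
}
```

Nothing is emitted until the subscriber calls `request`, which is the hook that the back-pressure discussion below builds on.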

Back-pressure, what is it?

Back-pressure? Example Without

Publisher[T] Subscriber[T]

Fast Publisher Slow Subscriber

Back-pressure? Push + NACK model

Subscriber usually has some kind of buffer

What if the buffer overflows?

Back-pressure? Push + NACK model (a)

Use bounded buffer, drop messages + require re-sending

Kernel does this! Routers do this!

(TCP)
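Option (a) can be sketched in a few lines of plain Scala. `BoundedReceiver` is a hypothetical class invented for this illustration: it accepts elements while there is room, and answers `false` (a NACK) when it has to drop one, so the sender knows it must re-send.

```scala
import scala.collection.mutable

// Hypothetical receiver with a bounded buffer: when the buffer is full, a new
// element is dropped and reported back as a NACK so the sender can re-send it.
final class BoundedReceiver[T](capacity: Int) {
  private val buffer = mutable.Queue.empty[T]

  // Returns true when the element was accepted, false when it was dropped (NACK).
  def offer(element: T): Boolean =
    if (buffer.size < capacity) { buffer.enqueue(element); true }
    else false

  // The slow consumer takes one element out, freeing one slot.
  def poll(): Option[T] =
    if (buffer.isEmpty) None else Some(buffer.dequeue())
}

object PushNackDemo {
  // Pushes 1..5 into a buffer of size 3, then re-sends the dropped elements
  // after the consumer has freed a single slot.
  def run(): (List[Int], List[Int]) = {
    val receiver = new BoundedReceiver[Int](capacity = 3)
    val dropped = (1 to 5).toList.filterNot(n => receiver.offer(n))
    receiver.poll() // consumer processes one element
    val resent = dropped.filter(n => receiver.offer(n))
    (dropped, resent)
  }
}
```

With a buffer of 3, elements 4 and 5 are dropped; after one `poll()` only 4 fits on re-send, so 5 has to be sent yet again.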

Back-pressure? Push + NACK model (b)

Increase buffer size… Well, while you have memory available!
Back-pressure? Why NACKing is NOT enough

Back-pressure? Example NACKing

This is trouble! Buffer overflow is imminent!

Telling the Publisher to slow down / stop sending…

NACK did not make it in time, because M was in-flight!

Back-pressure? speed(publisher) < speed(subscriber)

Back-pressure? Fast Subscriber, No Problem

No problem!

Back-pressure? Reactive-Streams = “Dynamic Push/Pull”

Just push: not safe when the Subscriber is slow.
Just pull: too slow when the Subscriber is fast.

Solution: Dynamic adjustment (Reactive Streams)

Back-pressure? RS: Dynamic Push/Pull

The Slow Subscriber sees that its buffer can take 3 elements, so it requests 3. The Publisher will never blow up the Subscriber’s buffer.

The Fast Publisher will send at most 3 elements. This is pull-based back-pressure.

A Fast Subscriber can issue more Request(n) before more data arrives!

Back-pressure? RS: Accumulate demand

The Publisher accumulates total demand per Subscriber.

The total demand of elements is safe to publish. The Subscriber’s buffer will not overflow.
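The demand bookkeeping can be sketched as follows. `RangePublisher` is a hypothetical, single-threaded toy written for this example; it is not how Akka Streams implements it, and it ignores asynchrony, completion and cancellation. It only shows that Request(n) calls add up and that the publisher never emits more than the outstanding demand.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical publisher over a fixed range. It tracks the demand that the
// subscriber has signalled and never emits more than that.
final class RangePublisher(elements: Range, onNext: Int => Unit) {
  private val it = elements.iterator
  private var demand = 0L

  // Request(n): new demand is added to whatever is still outstanding.
  def request(n: Long): Unit = {
    require(n > 0, "demand must be positive")
    demand += n
  }

  // Emits as many elements as the accumulated demand allows;
  // returns how many were sent.
  def pushAvailable(): Int = {
    var sent = 0
    while (demand > 0 && it.hasNext) {
      onNext(it.next())
      demand -= 1
      sent += 1
    }
    sent
  }

  def outstandingDemand: Long = demand
}

object DemandDemo {
  def run(): (List[Int], Long) = {
    val received = ArrayBuffer.empty[Int]
    val publisher = new RangePublisher(1 to 100, n => { received += n; () })

    publisher.pushAvailable() // no demand yet: nothing is sent
    publisher.request(3)      // the subscriber's buffer has room for 3
    publisher.request(2)      // more room before any data arrived: demand is now 5
    publisher.pushAvailable() // sends exactly 5, although 100 are available
    publisher.request(1)      // demand left waiting for the next push
    (received.toList, publisher.outstandingDemand)
  }
}
```

A subscriber that requests a very large `n` up front turns this into what is effectively plain push, while a cautious one that requests 1 at a time gets plain pull; the same protocol covers both.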

Back-pressure? RS: Requesting “a lot”

A Fast Subscriber can request “a lot” from the Publisher. This is effectively “publisher push”, and is really fast. The buffer size is known, so this is safe.

Back-pressure? RS: Dynamic Push/Pull

Safe! Will never overflow!

What is Akka?

Akka

Akka is a high-performance concurrency library for Scala and Java.

At its core it focuses on the Actor Model:

An Actor can only:
• Send / receive messages
• Create Actors
• Change its behaviour

Akka

Akka has multiple modules:

• Akka-camel: integration
• Akka-remote: remote actors
• Akka-cluster: clustering
• Akka-persistence: CQRS / Event Sourcing
• Akka-streams: stream processing
• …

Akka Streams 0.7 early preview

Akka Streams – Linear Flow

FlowFrom[Double].map(_.toInt). [...]

No Source attached yet. “Pipe ready to work with Doubles”.

Akka Streams – Linear Flow

implicit val sys = ActorSystem("tokyo-sys")

The ActorSystem is the world in which Actors live. Akka Streams uses Actors, so it needs an ActorSystem.

implicit val mat = FlowMaterializer()

The FlowMaterializer contains the logic on HOW to materialise the stream. It can be pure Actors or, in the future, Apache Spark. You can configure its buffer sizes etc., or implement your own materialiser (“run on Spark”).

val foreachSink = ForeachSink[Int](println)
val mf = FlowFrom(1 to 3).withSink(foreachSink).run()

run() uses the implicit FlowMaterializer. It can also be passed explicitly:

val mf = FlowFrom(1 to 3).withSink(foreachSink).run()(mat)

Akka Streams – Linear Flow

val mf = FlowFrom[Int].
  map(_ * 2).
  withSink(ForeachSink(println)) // needs source,
                                 // can NOT run

Akka Streams – Linear Flow

val f = FlowFrom[Int].
  map(_ * 2).
  withSink(ForeachSink(i => println(s"i = $i")))
  // needs Source to run!

f.withSource(IterableSource(1 to 10)).run()

Akka Streams – Flows are reusable

f.withSource(IterableSource(1 to 10)).run()
f.withSource(IterableSource(1 to 100)).run()
f.withSource(IterableSource(1 to 1000)).run()

Akka Streams <-> Actors – Advanced

val subscriber = system.actorOf(Props[SubStreamParent], "parent")

FlowFrom(1 to 100).
  map(_.toString).
  filter(_.length == 2).
  drop(2).
  groupBy(_.last).
  publishTo(ActorSubscriber(subscriber))

Each “group” is a stream too! “Stream of Streams”.

groupBy(_.last) puts “21” into group 1, “12” into group 2, etc. (“10” and “11” were already removed by drop(2).)

It offers (groupKey, subStreamFlow) to the Subscriber.

The Subscriber can then start children to handle the sub-flows, for example one child for each group.

The Subscriber is a regular Akka Actor, which will consume the SubStream offers.
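To see which groups that pipeline produces, the same steps can be applied to an ordinary in-memory collection. This is only an analogy using the Scala standard library (`GroupByDemo` is a name made up for this sketch): a collection `groupBy` yields a finished `Map`, whereas the streaming `groupBy` hands out sub-streams while elements are still flowing.

```scala
object GroupByDemo {
  // Same steps as the flow, applied to an in-memory collection.
  def groups(): Map[Char, Vector[String]] =
    (1 to 100).toVector
      .map(_.toString)
      .filter(_.length == 2) // keeps "10" .. "99"
      .drop(2)               // drops "10" and "11"
      .groupBy(_.last)       // key: last character of the String, a Char

  def main(args: Array[String]): Unit =
    groups().toSeq.sortBy(_._1).foreach { case (key, members) =>
      println(s"$key -> ${members.mkString(", ")}")
    }
}
```

Two details are worth noticing: the key is a `Char` (because `_.last` is applied to a `String`), and groups 0 and 1 are one element shorter than the rest because of `drop(2)`.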

Akka Streams <-> Actors – Advanced

class SubStreamParent extends ActorSubscriber
  with ImplicitFlowMaterializer
  with ActorLogging {

  override def requestStrategy = OneByOneRequestStrategy

  override def receive = {
    // the group key is a Char, because groupBy(_.last) runs on Strings
    case OnNext((groupId: Char, subStream: FlowWithSource[_, _])) =>
      // one child Actor per group consumes that group's sub-stream
      val subSub = context.actorOf(Props[SubStreamSubscriber],
        s"sub-$groupId")
      subStream.publishTo(ActorSubscriber(subSub))
  }
}

Akka Streams <-> Actors – Advanced

// the child started by SubStreamParent for each group
class SubStreamSubscriber extends ActorSubscriber {

  override def requestStrategy = OneByOneRequestStrategy

  override def receive = {
    case OnNext(n: String) => println(s"n = $n")
  }
}

Akka Streams – GraphFlow

Linear Flows or non-akka pipelines. Could be another RS implementation!

Fan-out elements and Fan-in elements.

Now you need a FlowGraph.

Akka Streams – GraphFlow

// first define some pipeline pieces
val f1 = FlowFrom[Input].map(_.toIntermediate)
val f2 = FlowFrom[Intermediate].map(_.enrich)
val f3 = FlowFrom[Enriched].filter(_.isImportant)
val f4 = FlowFrom[Intermediate].mapFuture(_.enrichAsync)

// then add input and output placeholders
val in = SubscriberSource[Input]
val out = PublisherSink[Enriched]

Akka Streams – GraphFlow

val b3 = Broadcast[Int]("b3")
val b7 = Broadcast[Int]("b7")
val b11 = Broadcast[Int]("b11")
val m8 = Merge[Int]("m8")
val m9 = Merge[Int]("m9")
val m10 = Merge[Int]("m10")
val m11 = Merge[Int]("m11")
val in3 = IterableSource(List(3))
val in5 = IterableSource(List(5))
val in7 = IterableSource(List(7))

Akka Streams – GraphFlow

// First layer
in7 ~> b7
b7 ~> m11
b7 ~> m8

in5 ~> m11

in3 ~> b3
b3 ~> m8
b3 ~> m10

Akka Streams – GraphFlow

// Second layer
m11 ~> b11
b11 ~> FlowFrom[Int].grouped(1000) ~> resultFuture2
b11 ~> m9
b11 ~> m10

m8 ~> m9

Akka Streams – GraphFlow

// Third layer
m9 ~> FlowFrom[Int].grouped(1000) ~> resultFuture9
m10 ~> FlowFrom[Int].grouped(1000) ~> resultFuture10

Akka Streams – GraphFlow

val resultFuture2 = FutureSink[Seq[Int]]
val resultFuture9 = FutureSink[Seq[Int]]
val resultFuture10 = FutureSink[Seq[Int]]

val g = FlowGraph { implicit b =>
  // ...
  m10 ~> FlowFrom[Int].grouped(1000) ~> resultFuture10
  // ...
}.run()

Await.result(g.getSinkFor(resultFuture2), 3.seconds).sorted
  should be(List(5, 7))

Sinks and Sources are “keys” which can be addressed within the graph.
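The expected `List(5, 7)` can be checked by tracing the wiring by hand. The sketch below replays the three layers with plain lists instead of streams, under two simplifying assumptions: a Broadcast simply hands the same elements to every downstream, and a Merge is modelled as concatenation (a real merge gives no ordering guarantee, which is why the results are sorted). `GraphTraceDemo` is a name invented for this example.

```scala
object GraphTraceDemo {
  // Returns what resultFuture2, resultFuture9 and resultFuture10 would
  // collect, each sorted to hide the non-deterministic merge order.
  def run(): (List[Int], List[Int], List[Int]) = {
    val in3 = List(3)
    val in5 = List(5)
    val in7 = List(7)

    // First layer
    val b7 = in7        // in7 ~> b7
    val b3 = in3        // in3 ~> b3
    val m11 = b7 ++ in5 // b7 ~> m11, in5 ~> m11
    val m8 = b7 ++ b3   // b7 ~> m8,  b3 ~> m8

    // Second layer
    val b11 = m11       // m11 ~> b11
    val m9 = b11 ++ m8  // b11 ~> m9, m8 ~> m9
    val m10 = b3 ++ b11 // b3 ~> m10, b11 ~> m10

    // Third layer: contents of the three sinks
    (b11.sorted, m9.sorted, m10.sorted)
  }
}
```

In this trace the sink behind `b11` collects 5 and 7, the one behind `m9` collects 3, 5, 7, 7 (7 arrives over two paths), and the one behind `m10` collects 3, 5, 7.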

Akka Streams – GraphFlow

val g = FlowGraph {}

FlowGraph is immutable and safe to share and re-use! Think of it as “the description” which then gets “run”.
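The “description that gets run” idea does not depend on Akka and can be sketched with an immutable value. `Blueprint` below is a hypothetical type made up for this illustration; it is synchronous and has nothing to do with how a FlowMaterializer works internally. It only shows why an immutable description is safe to share and to run many times.

```scala
// Hypothetical blueprint: an immutable description of processing steps.
// Nothing is executed until run() is called with a concrete source.
final case class Blueprint[A, B](transform: Iterator[A] => Iterator[B]) {
  // Each combinator returns a NEW blueprint; the original is left untouched.
  def map[C](f: B => C): Blueprint[A, C] =
    Blueprint[A, C](in => transform(in).map(f))

  def filter(p: B => Boolean): Blueprint[A, B] =
    Blueprint[A, B](in => transform(in).filter(p))

  // "Materialise" the description against a concrete source.
  def run(source: Iterable[A]): List[B] = transform(source.iterator).toList
}

object Blueprint {
  def from[A]: Blueprint[A, A] = Blueprint[A, A](in => in)
}

object BlueprintDemo {
  val doubled: Blueprint[Int, Int] = Blueprint.from[Int].map(_ * 2)

  // The same description is run twice against different sources.
  def run(): (List[Int], List[Int]) =
    (doubled.run(1 to 3), doubled.run(List(10, 20)))
}
```

Because `doubled` holds no state from a previous run, each call to `run` starts from scratch, which mirrors attaching different Sources to the same Flow.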

Available Elements 0.7 early preview

Available Sources
• FutureSource
• IterableSource
• IteratorSource
• PublisherSource
• SubscriberSource
• ThunkSource
• TickSource (timer based)
• … easy to add your own!

Available operations
• buffer
• collect
• concat
• conflate
• drop / dropWithin
• take / takeWithin
• filter
• fold
• foreach
• groupBy
• grouped
• map
• onComplete
• prefixAndTail
• broadcast
• merge / “generalised merge”
• zip
• … possible to add your own!

Available Sinks
• BlackHoleSink
• FoldSink
• ForeachSink
• FutureSink
• OnCompleteSink
• PublisherSink / FanoutPublisherSink
• SubscriberSink
• … easy to add your own!

Links
1. http://akka.io
2. http://reactive-streams.org
3. https://groups.google.com/group/akka-user

Thank you very much! Questions? http://akka.io

ktoso @ typesafe.com twitter: ktosopl github: ktoso team blog: letitcrash.com

©Typesafe 2014 – All Rights Reserved