Apache Flink®: State of the Union and What's Next

Kostas Tzoumas (@kostas_tzoumas)

Strata + Hadoop World NYC 2016, September 29, 2016


What I'd like to talk about

Some highlights from Flink Forward 2016

Streaming ecosystem evolution and Flink

What's coming up in Flink


Original creators of Apache Flink®

Providers of the dA Platform, the supported Flink distribution


Flink Forward 2016



7 sponsors


Speaker organizations


Retail, e-commerce
• Better product recommendations
• Process monitoring
• Inventory management

Finance
• Differentiation via tech
• Push-based products
• Fraud detection

Telco, IoT, Infrastructure
• Infrastructure monitoring
• Anomaly detection

Internet & mobile
• Personalization
• User behavior monitoring
• Analytics


30 Flink applications in production for more than one year. 10 billion events (2TB) processed daily

Complex jobs of > 30 operators running 24/7, processing 30 billion events daily, maintaining state of 100s of GB with exactly-once guarantees

Largest job has > 20 operators, runs on > 5000 vCores in 1000-node cluster, processes millions of events per second


Streaming ecosystem and Flink


Streaming technology is enabling the obvious: continuous processing on data that is continuously produced

Hint: you already have streaming data


[Diagram labels: collect, log, analyze, query; app state; history log]


(Aside: streaming and "batch")

[Figure: a continuously produced log divided into partitions by time, with timestamps running from 2016-3-1 12:00 am, 1:00 am, 2:00 am through 2016-3-11 10:00 pm, 11:00 pm and 2016-3-12 12:00 am, 1:00 am, 2:00 am, 3:00 am, … Labels: Stream (low latency); Stream (high latency); Batch (bounded stream)]
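The idea that a batch is simply a bounded stream can be illustrated with a minimal Python sketch. It is not Flink code: `running_totals` is a hypothetical operator invented for this illustration, and the input values are made up. The point is only that one operator definition serves both a finite input and an endless one.

```python
from itertools import count, islice
from typing import Iterable, Iterator


def running_totals(events: Iterable[int]) -> Iterator[int]:
    """Emit the running sum after each event.

    The operator consumes one event at a time, so it never needs to know
    whether the input will end.
    """
    total = 0
    for value in events:
        total += value
        yield total


# Bounded input ("batch"): the stream ends, so the last emitted value is final.
batch_result = list(running_totals([3, 1, 4]))

# Unbounded input ("stream"): same operator, results are read incrementally.
# count(1) never terminates, so we only take the first four results.
stream_prefix = list(islice(running_totals(count(1)), 4))
```

In this sketch the only difference between the two cases is whether the caller can wait for the input to finish.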


What is Flink's unique contribution in the streaming data ecosystem?


Before Flink, users had to make hard choices between volume, latency, and accuracy


Flink eliminates these tradeoffs

10s of millions of events per second for stateful applications

Sub-second latency, as low as single-digit milliseconds

Accurate computation results


A broader definition of accuracy: the results that I want when I want them

1. Accurate under failures and downtime
2. Accurate under out of order data
3. Results when you need them
4. Accurate modeling of the world


1. Failures and downtime
• Checkpoints & savepoints
• Exactly-once guarantees

2. Out of order and late data
• Event time support
• Watermarks

3. Results when you need them
• Low latency
• Triggers

4. Accurate modeling
• True streaming engine
• Sessions and flexible windows
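To make event time and watermarks concrete, here is a minimal Python sketch of counting events in fixed-size event-time windows when events arrive out of order. It does not use Flink's API. The function name `tumbling_counts`, its parameters, and the sample timestamps are assumptions made for this illustration, and the watermark rule (largest timestamp seen minus a fixed bound) is one common heuristic rather than the only option.

```python
from collections import defaultdict


def tumbling_counts(event_times, window_size, max_out_of_orderness):
    """Count events per event-time window, tolerating bounded disorder.

    event_times: timestamps in ARRIVAL order (may be out of order).
    Returns (results, late): results is a list of (window_start, count)
    in the order windows fired; late holds timestamps whose window had
    already fired when they arrived.
    """
    open_windows = defaultdict(int)  # window_start -> count so far
    results, late = [], []
    watermark = float("-inf")        # "no event older than this is expected"

    for t in event_times:
        start = t - t % window_size
        if start + window_size <= watermark:
            late.append(t)           # window already closed by the watermark
        else:
            open_windows[start] += 1

        # Advance the watermark; it never moves backwards.
        watermark = max(watermark, t - max_out_of_orderness)

        # Fire every window whose end is at or below the watermark.
        ready = sorted(s for s in open_windows if s + window_size <= watermark)
        for s in ready:
            results.append((s, open_windows.pop(s)))

    # Bounded input ended: flush whatever is still open.
    for s in sorted(open_windows):
        results.append((s, open_windows.pop(s)))
    return results, late


arrivals = [1, 4, 12, 8, 15, 23, 9, 27]   # 8 and 9 arrive out of order
results, late = tumbling_counts(arrivals, window_size=10, max_out_of_orderness=3)
```

In this run the event at time 8 arrives after 12 but is still counted in window [0, 10), because the watermark has not yet passed 10. The event at time 9 arrives after that window has fired and is reported as late.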


5. Batch + streaming
• One engine
• Dedicated APIs

6. Reprocessing
• High throughput, event time support, and savepoints

7. Ecosystem
• Rich connector ecosystem and 3rd party packages

8. Community support
• One of the most active projects with over 200 contributors


flink -s <savepoint> <job>


Having a dependable framework enables more stateful applications to run as streaming applications


What's coming up in Flink


• Provide state of the art streaming capabilities (✔)
• Operate in the largest infrastructures of the world
• Open up to a wider set of enterprise users
• Broaden the scope of stream processing


Flink's unique combination of features

[Feature diagram. Group labels: Performance; Event Time; APIs, Libraries; Stateful Streaming. Features shown:]
• Low latency
• High throughput
• Well-behaved flow control (back pressure)
• Consistency
• Works on real-time and historic data
• Savepoints (replays, A/B testing, upgrades, versioning)
• Exactly-once semantics for fault tolerance
• Windows & user-defined state
• Flexible windows (time, count, session, roll-your-own)
• Complex Event Processing
• Fluent API
• Out-of-order events
• Fast and large out-of-core state


Flink v1.1

• Connectors
• Metric System
• (Stream) SQL
• Session Windows
• Library enhancements


Flink v1.1 + current threads

Flink v1.1:
• Connectors
• Session Windows
• (Stream) SQL
• Library enhancements
• Metric System

Current threads:
• Metrics & Visualization
• Dynamic Scaling
• Savepoint compatibility
• Checkpoints to savepoints
• More connectors
• Stream SQL
• Windows
• Large state maintenance
• Fine grained recovery
• Side in-/outputs
• Window DSL
• Security
• Mesos & others
• Dynamic Resource Management
• Authentication
• Queryable State

[Same diagram of Flink v1.1 features and current threads, with four group labels added: Operations; Ecosystem; Application Features; Broader Audience]


Security / Authentication

No unauthorized data access

Secured clusters with Kerberos-based authentication
• Kafka, ZooKeeper, HDFS, YARN, HBase, …

No unencrypted traffic between Flink processes
• RPC, Data Exchange, Web UI

Largely contributed by [logo not preserved]

Prevent malicious users from hooking into Flink jobs


Checkpoints / Savepoints

Recover a running job into a new job

Recover a running job onto a new cluster

Application state backwards compatibility
• Flink 1.0 made the APIs backwards compatible
• Now making the savepoints backwards compatible
• Applications can be moved to newer versions of Flink even when state backends or internals change

[Diagram labels: v1.x, v1.y, v2.0]


Dynamic scaling

Changing load bears changing resource requirements
• Need to adjust parallelism of running streaming jobs

Re-scaling stateless operators is trivial

Re-scaling stateful operators is hard (windows, user state)
• Efficiently re-shard state

[Chart: workload and resources over time]

Re-scaling Flink jobs preserves exactly-once guarantees
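One general way to make re-sharding of keyed state cheap is to hash keys into a fixed number of groups up front and move whole groups between operator instances when parallelism changes. The Python sketch below illustrates that idea only. It is not Flink's implementation, and `MAX_KEY_GROUPS`, `key_group`, `owner`, and `shard` are hypothetical names chosen for this example.

```python
import zlib

MAX_KEY_GROUPS = 128  # hypothetical fixed upper bound on parallelism


def key_group(key: str) -> int:
    """Map a key to a group with a stable hash; independent of parallelism."""
    return zlib.crc32(key.encode("utf-8")) % MAX_KEY_GROUPS


def owner(group: int, parallelism: int) -> int:
    """Assign contiguous ranges of groups to operator instances."""
    return group * parallelism // MAX_KEY_GROUPS


def shard(state: dict, parallelism: int) -> list:
    """Split keyed state into one dict per operator instance."""
    shards = [{} for _ in range(parallelism)]
    for key, value in state.items():
        shards[owner(key_group(key), parallelism)][key] = value
    return shards


# Made-up keyed state: one counter per user.
state = {f"user-{i}": i for i in range(1000)}

before = shard(state, 2)  # job running with parallelism 2
after = shard(state, 3)   # same state re-sharded for parallelism 3
```

Because a key's group never changes, re-scaling in this sketch only reassigns group ranges to instances. No key is lost or duplicated, and each instance can fetch its groups as contiguous ranges instead of scanning individual keys.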


Cluster management

Series of improvements to seamlessly interoperate with various cluster managers
• YARN, Mesos, Docker, Standalone, …

Driven by [logo not preserved]

Mesos integration contributed by [logos not preserved]


Stream SQL

SQL is the standard high-level query language

A natural way to open up streaming to more people

Problem: There is no Streaming SQL standard
• At least beyond the basic operations
• Challenging: Incorporate windows and time semantics

Flink community working with Apache Calcite to draft a new model


State in stream processing

Stateless Streaming (Apache Storm)

Stateful Streaming (Apache Samza)

Accurate Stateful Streaming (Apache Flink)

State sizes in Flink today: 10s of gigabytes per operator

How to scale this to many terabytes?
• Queryable State
• Data driven triggers over large state


Large-state streaming

How to scale the stream processor state?

… and maintain fast checkpoint intervals?

… and have very fast recovery on machine failures?

More and more database techniques coming into Flink


@kostas_tzoumas | @ApacheFlink | @dataArtisans

Thank you! We are hiring!