STREAM PROCESSING WITH APACHE FLINK IN ZALANDO’S WORLD OF MICROSERVICES JAVIER LOPEZ MIHAIL VIERU

Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

Page 1: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

STREAM PROCESSING WITH APACHE FLINK IN ZALANDO’S WORLD OF MICROSERVICES

JAVIER LOPEZ

MIHAIL VIERU

Page 2: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

AGENDA

● Zalando’s Microservices Architecture

● Saiki - Data Integration and Distribution at Scale

● Flink in a Microservices World

● Stream Processing Use Cases:

o Business Process Monitoring

o Continuous ETL

● Future Work

Page 3: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

ABOUT US

Mihail Vieru, Big Data Engineer, Business Intelligence

Javier López, Big Data Engineer, Business Intelligence

Page 5: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

One of Europe's largest online fashion retailers

15 countries

~19 million active customers

~3 billion € revenue 2015

1,500 brands

150,000+ products

11,000+ employees in Europe

Page 6: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

ZALANDO TECHNOLOGY

1500+ TECHNOLOGISTS

Rapidly growing international team

http://tech.zalando.com

Page 7: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

VINTAGE ARCHITECTURE

Page 8: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

VINTAGE BUSINESS INTELLIGENCE

Classical ETL process

[Diagram: four applications, each made of Business Logic and a Database. Other labels: Dev, DBA, Data Warehouse (DWH), BI]

Page 9: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

VINTAGE BUSINESS INTELLIGENCE

DWH: Oracle, Exasol

Page 10: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

RADICAL AGILITY

Page 11: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

RADICAL AGILITY

AUTONOMY

MASTERY

PURPOSE

Page 12: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

RADICAL AGILITY - AUTONOMY

Technologies Operations Teams

Page 13: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

SUPPORTING AUTONOMY: MICROSERVICES

[Diagram: five microservices, each made of Business Logic and a Database, exposed through a REST API]

Page 14: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

SUPPORTING AUTONOMY: MICROSERVICES

[Diagram: Team A and Team B each own an application made of Business Logic and a Database, exposed through a REST API; the two applications are separated by the public Internet]

Applications communicate using REST APIs

Databases hidden behind the walls of AWS VPC

Page 15: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

SUPPORTING AUTONOMY: MICROSERVICES

[Diagram: Team A and Team B each own an application made of Business Logic and a Database, exposed through a REST API; the two applications are separated by the public Internet]

Classical ETL process is impossible!

Page 16: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

SUPPORTING AUTONOMY: MICROSERVICES

[Diagram: App A, App B, App C and App D, each made of Business Logic and a Database, exposed through a REST API. Business Intelligence is shown separately from the applications]

Page 17: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

SAIKI

Page 18: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

SAIKI DATA PLATFORM

[Diagram: App A, App B, App C and App D on one side, the BI Data Warehouse on the other, with SAIKI in between]

Page 19: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

SAIKI — DATA INTEGRATION & DISTRIBUTION

[Diagram: App A, App B, App C and App D connect to SAIKI. SAIKI components: REST API, Stream Processing via Apache Flink, Data Lake (AWS S3), Exporter. Consumers: BI Data Warehouse and others, e.g. Forecast DB]

Page 20: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

SAIKI — SUMMARY

[Table: BEFORE vs. AFTER comparison with four rows: data source technologies, data source connections, data source extraction, data delivery. Labels visible in the cells: JDBC, REST, Δ]

Page 21: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

FLINK IN A MICROSERVICES WORLD

Page 22: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

OPPORTUNITIES FOR NEXT GEN BI

Cloud Computing
- Distributed ETL
- Scale

Access to Real Time Data
- All teams publish data to central event bus

Hub for Data Teams
- Data Lake provides distributed access and fine grained security
- Data can be transformed (aggregated, joined, etc.) before delivering it to data teams

Semi-Structured Data
“General-purpose data processing engines like Flink or Spark let you define own data types and functions.”
- Fabian Hueske, dataArtisans

Page 23: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

THE RIGHT FIT

STREAM PROCESSING

Page 24: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

THE RIGHT FIT — STREAM PROCESSING ENGINE

Candidates:

Storm & Samza ruled out because of batch processing requirement

Page 25: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

SPARK VS. FLINK DIFFERENCES

| Feature | Apache Spark 1.5.2 | Apache Flink 0.10.1 |
| Processing mode | micro-batching | tuple at a time |
| Temporal processing support | processing time | event time, ingestion time, processing time |
| Latency | seconds | sub-second |
| Back pressure handling | manual configuration | implicit, through system architecture |
| State access | full state scan for each microbatch | value lookup by key |
| Operator library | neutral | ++ (split, windowByCount..) |
| Support | neutral | ++ (mailing list, direct contact & support from data Artisans) |
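
The event time vs. processing time distinction in the comparison can be illustrated with a minimal sketch. It is plain Python, not Flink or Spark API code; the event fields (`id`, `event_time`) and the function name are assumptions made for illustration. Windows are derived from the timestamp carried by each event, so a late arrival still lands in the window it belongs to.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_size_s):
    """Count events per tumbling window, keyed by the window start time.

    The window is computed from each event's own timestamp (event time),
    so the result does not depend on the order in which events arrive.
    """
    counts = defaultdict(int)
    for event in events:
        window_start = (event["event_time"] // window_size_s) * window_size_s
        counts[window_start] += 1
    return dict(counts)


# Hypothetical events in arrival order; "b" arrives late (out of order).
arrived = [
    {"id": "a", "event_time": 61},
    {"id": "b", "event_time": 59},
    {"id": "c", "event_time": 65},
]

counts = tumbling_window_counts(arrived, window_size_s=60)
print(counts)
```

With processing time, all three events would be counted in whatever window was open when they arrived. A real engine additionally needs a rule for when a window may be closed (for example watermarks), which this sketch leaves out.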

Page 26: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

SPARK VS. FLINK PERFORMANCE

source: Benchmarking Streaming Computation Engines at Yahoo!

Page 27: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

APACHE FLINK

• true stream processing framework

• process events at a consistently high rate with low latency

• scalable

• great community and on-site support from Berlin/Europe

• university graduates with Flink skills

https://tech.zalando.com/blog/apache-showdown-flink-vs.-spark/

Page 28: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

FLINK ON AWS - OUR APPLIANCE

[Diagram: a MASTER ELB in front of two EC2 instances running Docker (Flink Master, Flink Shadow Master); a WORKERS ELB in front of three EC2 instances running Docker (one Flink Worker each)]

Page 29: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

USE CASES

Page 30: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

BUSINESS PROCESS MONITORING

Page 31: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

BUSINESS PROCESS

A business process is, in its simplest form, a chain of correlated events:

start event: ORDER_CREATED

completion event: ALL_PARCELS_SHIPPED

Business Events from the whole Zalando platform flow through Saiki => opportunity to process those streams in near real time

Page 32: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

REAL-TIME BUSINESS PROCESS MONITORING

• Check if business processes in the Zalando platform work

• Analyze data on the fly:

o Order velocities

o Delivery velocities

o Control SLAs of correlated events, e.g. parcel sent out after order
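
A minimal sketch of such an SLA check, in plain Python rather than Flink code. The event field names (`type`, `order_id`, `ts`), the SLA value and the function name are assumptions made for illustration; only the event type names ORDER_CREATED and PARCEL_SHIPPED come from the slides.

```python
def find_sla_violations(events, sla_seconds, now):
    """Return ids of orders whose first parcel was not shipped within the SLA.

    An order violates the SLA if its earliest PARCEL_SHIPPED event is later
    than ORDER_CREATED + sla_seconds, or if no parcel has been shipped and
    the deadline has already passed at time `now`.
    """
    created = {}        # order_id -> timestamp of ORDER_CREATED
    first_shipped = {}  # order_id -> earliest PARCEL_SHIPPED timestamp
    for event in events:
        order_id = event["order_id"]
        if event["type"] == "ORDER_CREATED":
            created[order_id] = event["ts"]
        elif event["type"] == "PARCEL_SHIPPED":
            previous = first_shipped.get(order_id, event["ts"])
            first_shipped[order_id] = min(previous, event["ts"])

    violations = []
    for order_id, created_ts in created.items():
        deadline = created_ts + sla_seconds
        shipped_ts = first_shipped.get(order_id)
        if shipped_ts is None:
            if now > deadline:
                violations.append(order_id)
        elif shipped_ts > deadline:
            violations.append(order_id)
    return sorted(violations)


# Hypothetical events; timestamps are seconds.
events = [
    {"type": "ORDER_CREATED", "order_id": "o1", "ts": 0},
    {"type": "PARCEL_SHIPPED", "order_id": "o1", "ts": 100},
    {"type": "ORDER_CREATED", "order_id": "o2", "ts": 0},
    {"type": "PARCEL_SHIPPED", "order_id": "o2", "ts": 5000},
    {"type": "ORDER_CREATED", "order_id": "o3", "ts": 0},
    {"type": "ORDER_CREATED", "order_id": "o4", "ts": 5000},
]

violations = find_sla_violations(events, sla_seconds=3600, now=6000)
print(violations)
```

In a streaming job the same comparison would run continuously per order key instead of over a finished list, but the correlation logic stays the same.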

Page 33: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

ARCHITECTURE BPM

[Diagram: Operational Systems (App A, App B, App C) publish to the Nakadi Event Bus. Saiki BPM components: Kafka2Kafka, Unified Log, Stream Processing, Elasticsearch, Alert Svc, UI, Cfg Service. Boundary labels: PUBLIC INTERNET, OAUTH]

Page 34: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

HOW WE USE FLINK IN BPM

• 1000+ Event Types; 1 Event Type -> 1 Kafka topic

• Analyze processes with correlated event types (Join & Union)

• Enrich data based on business rules

• Sliding Windows (1min to 48hrs) for Platform Snapshots

• State for alert metadata

• Generation and processing of Complex Events (CEP lib)
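
The sliding windows mentioned above can be sketched as follows. This is plain Python rather than the Flink windowing API, and the function name and sample timestamps are assumptions made for illustration. Each event is counted in every window of length `size` that covers it, with a new window starting every `slide` seconds.

```python
from collections import Counter


def sliding_window_counts(timestamps, size, slide):
    """Count events per sliding window, keyed by window start.

    A window covers [start, start + size) and a new window starts every
    `slide` seconds, so each event falls into up to size / slide windows.
    """
    counts = Counter()
    for ts in timestamps:
        # Latest window start that is not after the event.
        start = (ts // slide) * slide
        # Walk back through all earlier windows that still cover the event.
        while start > ts - size:
            counts[start] += 1
            start -= slide
    return dict(counts)


# Hypothetical event timestamps in seconds; 60 s windows sliding by 30 s.
window_counts = sliding_window_counts([100, 130, 170], size=60, slide=30)
print(window_counts)
```

A platform snapshot over 48 hours follows the same pattern with a larger `size`; the cost is that every event is held in several overlapping windows at once.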

Page 35: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

STREAMING ETL

Page 36: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

Extract Transform Load (ETL)

Traditional ETL process:

• Batch processing

• No real time

• ETL tools

• Heavy processing on the storage side

Page 37: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

WHAT CHANGED WITH RADICAL AGILITY?

• Data comes in a semi-structured format (JSON payload)

• Data is distributed in separate Kafka topics

• There are peak times, during which the data flow increases by several factors

• The number of data sources increased by several factors

Page 38: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

ARCHITECTURE STREAMING ETL

[Diagram: Operational Systems (App A, App B, App C) publish to the Nakadi Event Bus. Saiki Streaming ETL components: Kafka2Kafka, Unified Log, Stream Processing, Exporter. Target side: Importer, Oracle DWH]

Page 39: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

HOW WE (WOULD) USE FLINK IN STREAMING ETL

• Transformation of complex payloads into simple ones for easier consumption in Oracle DWH

• Combine several topics based on Business Rules (Union, Join)

• Pre-Aggregate data to improve performance in the generation of reports (Windows, State)

• Data cleansing

• Data validation
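
The first item, turning a complex payload into a simple one, can be sketched as flattening a nested JSON document into a single-level record that maps onto table columns. This is plain Python, not the team's actual transformation; the payload shape, field names and the underscore naming rule are assumptions made for illustration.

```python
import json


def flatten(payload, prefix=""):
    """Flatten nested JSON objects into one level of column-like keys.

    Nested object keys are joined with "_"; non-object values (including
    lists) are copied unchanged.
    """
    flat = {}
    for key, value in payload.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "_"))
        else:
            flat[name] = value
    return flat


# Hypothetical event payload as it might arrive on a topic.
raw = (
    '{"order_id": "o1", "total": 59.9, '
    '"customer": {"country": "DE", "address": {"zip": "10117"}}}'
)

row = flatten(json.loads(raw))
print(row)
```

Cleansing and validation steps would typically run on the flattened record, for example rejecting rows with missing keys before they reach the warehouse.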

Page 40: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

FUTURE USE CASES

Page 41: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

COMPLEX EVENT PROCESSING FOR BPM

Continuing the example business process:

• Multiple PARCEL_SHIPPED events per order

• Generate the complex event ALL_PARCELS_SHIPPED when all PARCEL_SHIPPED events have been received (CEP lib, State)
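
A minimal sketch of this pattern with per-order state, in plain Python rather than the Flink CEP library. It assumes that the ORDER_CREATED event carries the number of parcels to expect in a hypothetical `parcel_count` field and that each PARCEL_SHIPPED event has a hypothetical `parcel_id`; neither field is confirmed by the slides.

```python
class AllParcelsShippedDetector:
    """Emit ALL_PARCELS_SHIPPED once every parcel of an order has shipped."""

    def __init__(self):
        self._expected = {}  # order_id -> parcel count from ORDER_CREATED
        self._shipped = {}   # order_id -> set of parcel ids seen so far

    def on_event(self, event):
        """Process one event; return a complex event dict or None."""
        order_id = event["order_id"]
        if event["type"] == "ORDER_CREATED":
            self._expected[order_id] = event["parcel_count"]
        elif event["type"] == "PARCEL_SHIPPED":
            # A set makes duplicate deliveries of the same event harmless.
            self._shipped.setdefault(order_id, set()).add(event["parcel_id"])
        else:
            return None

        expected = self._expected.get(order_id)
        shipped = self._shipped.get(order_id, set())
        if expected is not None and len(shipped) >= expected:
            # The order is complete: release its state and emit the event.
            del self._expected[order_id]
            self._shipped.pop(order_id, None)
            return {"type": "ALL_PARCELS_SHIPPED", "order_id": order_id}
        return None


detector = AllParcelsShippedDetector()
results = [
    detector.on_event({"type": "ORDER_CREATED", "order_id": "o1", "parcel_count": 2}),
    detector.on_event({"type": "PARCEL_SHIPPED", "order_id": "o1", "parcel_id": "p1"}),
    detector.on_event({"type": "PARCEL_SHIPPED", "order_id": "o1", "parcel_id": "p1"}),
    detector.on_event({"type": "PARCEL_SHIPPED", "order_id": "o1", "parcel_id": "p2"}),
]
print(results)
```

The sketch keeps state in memory and never expires incomplete orders; a production job would keep it in fault-tolerant keyed state and add a timeout.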

Page 42: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

DEPLOYMENTS FROM OTHER BI TEAMS

Flink Jobs from other BI Teams

Requirements:

• manage and control deployments

• isolation of data flows

o prevent different jobs from writing to the same sink

• resource management in Flink

o share cluster resources among concurrently running jobs

StreamSQL would significantly lower the entry barrier

Page 43: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

OTHER FUTURE TOPICS

• New use cases for Real Time Analytics / BI

o Sales monitoring

o Price monitoring

• Fraud detection for payments (evaluation)

• Contact customer according to variable event pattern (evaluation)

Page 44: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

CONCLUSION

Flink proved to be the right fit for our current stream processing use cases. It enables us to build Zalando’s Next Gen BI platform.

https://tech.zalando.de/blog/?tags=Saiki

Page 45: Stream Processing using Apache Flink in Zalando's World of Microservices - Reactive Summit

THANK YOU
