Flink in Zalando's World of Microservices


Agenda

• Zalando's Microservices Architecture
• Saiki - Data Integration and Distribution at Scale
• Flink in a Microservices World
• Stream Processing Use Cases:
  • Business Process Monitoring
  • Continuous ETL
• Future Work

About us

Mihail Vieru, Big Data Engineer, Business Intelligence

Javier López, Big Data Engineer, Business Intelligence

One of Europe's largest online fashion retailers:

• 15 countries
• 4 fulfillment centers
• ~18 million active customers
• 2.9 billion € revenue in 2015
• 150,000+ products
• 10,000+ employees

Zalando Technology

1100+ technologists in a rapidly growing international team

http://tech.zalando.com

VINTAGE ARCHITECTURE

Vintage Business Intelligence

Classical ETL process

[Diagram: several applications, each with its own Business Logic and Database (operated by Dev and DBA), feeding a central Data Warehouse (DWH) used by BI]

Vintage Business Intelligence

Very stable architecture that is still in use in the oldest components.

Classical ETL process:

• Use-case specific
• Usually outputs data into a Data Warehouse
  • well structured
  • easy to use by the end user (SQL)

RADICAL AGILITY

Radical Agility

Autonomy, Mastery, Purpose

Autonomy

Autonomous teams:

• can choose their own technology stack
  • including the persistence layer
• are responsible for operations
• should use isolated AWS accounts

Supporting autonomy — Microservices

[Diagram: many microservices, each a Business Logic component with its own Database, exposing a REST API]

Supporting autonomy — Microservices

[Diagram: Team A and Team B each own a service (Business Logic + Database) exposing a REST API over the public Internet]

Applications communicate using REST APIs.

Databases are hidden behind the walls of an AWS VPC.

Supporting autonomy — Microservices

[Diagram: Team A's and Team B's services communicating via REST APIs over the public Internet]

Classical ETL process is impossible!

Supporting autonomy — Microservices

[Diagram: Apps A, B, C and D, each a Business Logic component with its own Database behind a REST API]

Supporting autonomy — Microservices

[Diagram: Apps A, B, C and D (Business Logic + Database, REST API) alongside Business Intelligence]

Supporting autonomy — Microservices

[Diagram: App A, App B, App C and App D on one side, BI with its Data Warehouse on the other, connected only by a question mark]

SAIKI

Saiki Data Platform

[Diagram: the Saiki Data Platform sits between Apps A-D and BI's Data Warehouse]

Saiki — Data Integration & Distribution

[Diagram, built up over several slides: events from Apps A-D flow into Saiki; the Buku and Tukang components, backed by AWS S3, deliver data via a REST API towards BI's Data Warehouse]

Saiki — Data Integration & Distribution

Old Load Process                                     | New Load Process
relied on Delta Loads                                | relies on Event Stream
JDBC connection                                      | RESTful HTTPS connections
Data Quality could be controlled by BI independently | trust for Correctness of Data in the delivery teams
PostgreSQL dependent                                 | independent of the source technology stack
N to 1 data stream                                   | N to M streams, no single data sink

Saiki — Data Integration & Distribution

[Diagram, built up over several slides: Apps A-D publish into Saiki; Buku and Tukang (REST API, AWS S3) deliver data to BI consumers such as the Data Warehouse or e.g. a Forecast DB; Stream Processing via Apache Flink and a Data Lake complete the platform]

FLINK IN A MICROSERVICES WORLD

Current state in BI

• Centralized ETL tools (Pentaho DI/Kettle and Oracle Stored Procedures)
• Reporting data does not arrive in real time
• Data for data science & analytical teams is provided using an Exasol DB
• All data comes from SQL databases

Opportunities for Next Gen BI

• Cloud Computing
• Distributed ETL
• Scale
• Access to real-time data
• All teams publishing data to a central bus (Kafka)

Opportunities for Next Gen BI

• Hub for data teams
  • The Data Lake provides distributed access and fine-grained security
  • Data can be transformed (aggregated, joined, etc.) before delivering it to data teams
• Non-structured data
  • "General-purpose data processing engines like Flink or Spark let you define own data types and functions." - Fabian Hueske

The right fit

Stream Processing


The right fit - Data processing engine (I)

Feature                     | Apache Spark 1.5.2                   | Apache Flink 0.10.1
processing mode             | micro-batching                       | tuple at a time
temporal processing support | processing time                      | event time, ingestion time, processing time
batch processing            | yes                                  | yes
latency                     | seconds                              | sub-second
back pressure handling      | through manual configuration         | implicit, through system architecture
state storage               | distributed dataset in memory        | distributed key/value store
state access                | full state scan for each micro-batch | value lookup by key
state checkpointing         | yes                                  | yes
high availability mode      | yes                                  | yes

The right fit - Data processing engine (II)

Feature                    | Apache Spark 1.5.2      | Apache Flink 0.10.1
event processing guarantee | exactly once            | exactly once
Apache Kafka connector     | yes                     | yes
Amazon S3 connector        | yes                     | yes
operator library           | neutral                 | ++ (split, windowByCount, ...)
language support           | Java, Scala, Python     | Java, Scala, Python
deployment                 | standalone, YARN, Mesos | standalone, YARN
support                    | neutral                 | ++ (mailing list, direct contact & support from data Artisans)
documentation              | +                       | neutral
maturity                   | ++                      | neutral

Flink in AWS - Our appliance

[Diagram: a Master ELB in front of EC2/Docker instances running the Flink Master and a Flink Shadow Master; a Workers ELB in front of EC2/Docker instances, each running a Flink Worker]

USE CASES

BUSINESS PROCESS MONITORING

Business Process

A business process is, in its simplest form, a chain of correlated events:

start event: ORDER_CREATED -> completion event: ALL_SHIPMENTS_DONE

Business Events from the whole Zalando platform flow through Saiki => opportunity to process those streams in near real-time.
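The correlation of start and completion events can be sketched in plain Python (not the actual Flink job): pair each ORDER_CREATED with its ALL_SHIPMENTS_DONE per order and flag processes that exceed an SLA. The event shape and the 24-hour threshold are illustrative assumptions, not values from the deck.

```python
# Sketch: correlate a business process's start and completion events
# and report SLA breaches. Event shape (event_type, order_id, timestamp)
# and the 24-hour SLA are assumptions for illustration.
from datetime import datetime, timedelta

SLA = timedelta(hours=24)  # illustrative threshold

def check_process_durations(events):
    """Pair ORDER_CREATED with ALL_SHIPMENTS_DONE per order; return breached order ids."""
    started = {}
    breaches = []
    for event_type, order_id, ts in events:
        if event_type == "ORDER_CREATED":
            started[order_id] = ts
        elif event_type == "ALL_SHIPMENTS_DONE" and order_id in started:
            if ts - started.pop(order_id) > SLA:
                breaches.append(order_id)
    return breaches

events = [
    ("ORDER_CREATED", "o1", datetime(2016, 3, 1, 10, 0)),
    ("ORDER_CREATED", "o2", datetime(2016, 3, 1, 11, 0)),
    ("ALL_SHIPMENTS_DONE", "o1", datetime(2016, 3, 1, 20, 0)),  # within SLA
    ("ALL_SHIPMENTS_DONE", "o2", datetime(2016, 3, 3, 11, 0)),  # breach
]
```

In a real Flink job this state would live in the keyed state backend rather than a plain dict.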

Near Real-Time Business Process Monitoring

• Check whether the Zalando platform works on a technical level
• 1000+ event types
• Analyze data on the fly:
  • Order velocities
  • Delivery velocities
• Control SLAs of correlated events, e.g. parcel sent out after order

Architecture BPM

[Diagram: Apps A, B and C (operational systems) publish events to the Nakadi Event Bus over the public Internet, secured with OAuth; Kafka2Kafka copies events into the Unified Log; Saiki BPM performs the stream processing, alongside Tokoh, a Cfg Service, ZMON and Elasticsearch]

How we use Flink in BPM

• 1 event type -> 1 Kafka topic
• Define a granularity level of 1 minute (TumblingWindows)
• Analyze processes with multiple event types (Join & Union)
• Generation and processing of Complex Events (CEP & State)
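The 1-minute granularity can be sketched as a tumbling-window count in plain Python (in the actual job this is Flink's TumblingWindows on a stream keyed by event type); the (event_type, epoch_seconds) event shape is a simplifying assumption.

```python
# Sketch: count events per event type in 1-minute tumbling windows.
# Event shape (event_type, epoch_seconds) is assumed for illustration.
from collections import defaultdict

WINDOW_SECONDS = 60  # 1-minute granularity, as on the slide

def tumbling_counts(events):
    """Count events per (event_type, window_start) bucket."""
    counts = defaultdict(int)
    for event_type, ts in events:
        window_start = ts - ts % WINDOW_SECONDS  # align to window boundary
        counts[(event_type, window_start)] += 1
    return dict(counts)

events = [("ORDER_CREATED", 5), ("ORDER_CREATED", 59),
          ("PARCEL_SHIPPED", 61), ("ORDER_CREATED", 120)]
```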

CONTINUOUS ETL

Extract Transform Load (ETL)

Traditional ETL process:

• Batch processing
• Not real time
• ETL tools
• Heavy processing on the storage side

What changed with Radical Agility?

• Data comes in a semi-structured format (JSON payload)
• Data is distributed across separate Kafka topics
• There are peak times, during which the data flow increases by several factors
• The number of data sources increased by several factors

Architecture Continuous ETL

[Diagram: Apps A, B and C (operational systems) publish to the Nakadi Event Bus; Kafka2Kafka feeds the Unified Log; Saiki Continuous ETL performs the stream processing, with the Tukang and Orang components loading results into the Oracle DWH]

How we (would) use Flink in Continuous ETL

• Transformation of complex payloads into simple ones for easier consumption in the Oracle DWH
• Combine several topics based on business rules (Union, Join)
• Pre-aggregate data to improve performance in the generation of reports (Windows, State)
• Data cleansing
• Data validation
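The payload transformation could look roughly like this plain-Python sketch: flatten a nested JSON event into a flat record for the DWH, dropping invalid events along the way. All field names are invented for illustration; they are not the real Zalando event schema.

```python
# Sketch: flatten a nested JSON order event into a flat DWH-friendly record,
# with basic validation. Field names are hypothetical.
import json

def flatten_order(payload: str):
    """Validate and flatten a nested order event; return None if invalid."""
    event = json.loads(payload)
    order = event.get("order")
    if not order or "id" not in order:
        return None  # data validation: drop malformed events
    return {
        "order_id": order["id"],
        "customer_id": order.get("customer", {}).get("id"),
        "item_count": len(order.get("items", [])),
        "total": sum(i.get("price", 0) for i in order.get("items", [])),
    }

raw = '{"order": {"id": "o1", "customer": {"id": "c7"}, "items": [{"price": 10}, {"price": 5}]}}'
```

In Flink this would be a map/flatMap over the topic's DataStream, emitting rows for Tukang to load into the Oracle DWH.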

FUTURE WORK

Complex Event Processing for BPM

Continuing the example business process:

• Multiple PARCEL_SHIPPED events per order
• Generate the complex event ALL_SHIPMENTS_DONE when all PARCEL_SHIPPED events have been received (CEP lib, State)
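A minimal stateful sketch of this complex-event generation, in plain Python rather than Flink's CEP library: per-order state counts PARCEL_SHIPPED events and emits ALL_SHIPMENTS_DONE once the expected number has arrived. It assumes the expected parcel count per order is known upfront (e.g. from the order event), which is an illustration choice.

```python
# Sketch: emit the complex event ALL_SHIPMENTS_DONE once every expected
# PARCEL_SHIPPED event for an order has been seen. In Flink, the counter
# would be keyed state and the pattern could be expressed with the CEP lib.
def complex_events(events, expected_parcels):
    """Yield (order_id, 'ALL_SHIPMENTS_DONE') when all parcels are shipped."""
    shipped = {}
    for event_type, order_id in events:
        if event_type != "PARCEL_SHIPPED":
            continue
        shipped[order_id] = shipped.get(order_id, 0) + 1
        if shipped[order_id] == expected_parcels[order_id]:
            yield (order_id, "ALL_SHIPMENTS_DONE")

stream = [("PARCEL_SHIPPED", "o1"), ("PARCEL_SHIPPED", "o2"), ("PARCEL_SHIPPED", "o1")]
```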

Deployments from other BI Teams

Flink jobs from other BI teams. Requirements:

• manage and control deployments
• isolation of data flows
  • prevent different jobs from writing to the same sink
• resource management in Flink
  • share cluster resources among concurrently running jobs

StreamSQL would significantly lower the entry barrier.

Replace Kafka2Kafka

Kafka2Kafka is a Python app that:

• extracts events from the REST API of the Nakadi Event Bus
• writes them to our Kafka cluster

Idea: replace the Python app with Flink jobs, which would write directly to Saiki's Kafka cluster (under discussion).

Other Future Topics

• CD integration for the Flink appliance
• Monitoring data flows via heartbeats
  • Flink <-> Kafka
  • Flink <-> Elasticsearch
• New use cases for Real-Time Analytics / BI
  • Sales monitoring

THANK YOU
