Apache Kafka
Scalable Message Processing and more!

Guido Schmutz
@gschmutz guidoschmutz.wordpress.com

BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENF HAMBURG KOPENHAGEN LAUSANNE MÜNCHEN STUTTGART WIEN ZÜRICH

Trivadis TechEvent 2016 Apache Kafka - Scalable Message Processing and more! by Guido Schmutz


Agenda

1. Introduction & Motivation
2. Kafka Core
3. Kafka Connect
4. Kafka Streams
5. Kafka and “Big Data” / “Fast Data” Ecosystem
6. Confluent Data Platform
7. Kafka in Architecture
8. Summary


Introduction & Motivation


A little story of a “real-life” customer situation

A traditional system interacts with its clients and does its work. It is implemented using legacy technologies (e.g. PL/SQL).

New requirement:
• Offer a notification service to notify customers when goods are shipped
• Subscription, and notification over different channels
• Existing technology doesn’t fit

[Diagram: Logistic System with Logic (PL/SQL) on an Oracle DB, used by Mobile Apps, Rich (Web) Client Apps and a Sensor; events shown: ship, sort, schedule, delivery]

A little story of a “real-life” customer situation

Events are “owned” by traditional application (as well as the channels they are transported over)

Implement notification as a new Java-based application/system

But we need the events! => So let’s integrate

[Diagram: Logistic System with Logic (PL/SQL) on an Oracle DB, used by Mobile Apps, Rich (Web) Client Apps and a Sensor (ship, sort, schedule, delivery); a new Notification system with Logic (Java) receives delivery events and notifies via SMS and Email]

A little story of a “real-life” customer situation

Integrate in order to get the information! Oracle Service Bus was already there.

• Rule Engine implemented in Java and invoked from the OSB message flow
• Notification system informed via queue
• Higher latency introduced (good enough in this case)

[Diagram: the Logistic System (Logic (PL/SQL), Oracle DB) passes ship and delivery events via AQ to Oracle Service Bus; a Filter and the RuleEngine (Java) select deliveries (delivery = true), which reach the Notification system (Logic (Java)) via JMS and are sent out as SMS and Email]

A little story of a “real-life” customer situation

Treat events as first-class citizens

Events belong to the “enterprise” and not to an individual system => Catalog of Events, similar to a Catalog of Services/APIs!

Event (stream) processing can be introduced, which reduces latency!

[Diagram: the Logistic System (Logic (PL/SQL), Oracle DB) passes ship and delivery events via AQ to Oracle Service Bus; a Filter and the RuleEngine (Java) select deliveries (delivery = true), which reach the Notification system (Logic (Java)) via JMS and are sent out as SMS and Email]

Treat Events as Events and share them!

[Diagram: the Logistic System (Logic (PL/SQL), Oracle DB), Oracle Service Bus with Filter and RuleEngine (Java), JMS and the Notification system (Logic (Java), SMS, Email), now with an Event Bus/Hub and Stream/Event Processing added; ship and delivery events are shared through the Event Bus/Hub]

Treat Events as Events, share and make use of them!

[Diagram: the Logistic System (Logic (PL/SQL), Oracle DB) publishes delivery events to the Event Bus/Hub; Stream/Event Processing with Filter and RuleEngine (Java) derives notifiableDelivery events from them; the Notification system (Logic (Java)) consumes notifiableDelivery and sends SMS and Email. Oracle Service Bus and JMS no longer appear]
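The step from delivery events to notifiableDelivery events can be pictured as a small stream-processing filter. The following is a minimal sketch in plain Python, not Kafka code and not the project's actual rule engine; the event fields (`customer_id`, `status`), the subscription table and the function name are assumptions made for illustration.

```python
from typing import Dict, Iterable, Iterator, List

# Hypothetical subscription table: customer id -> channels to notify.
SUBSCRIPTIONS: Dict[str, List[str]] = {
    "c-1": ["sms"],
    "c-2": ["email", "sms"],
}


def notifiable_deliveries(deliveries: Iterable[dict]) -> Iterator[dict]:
    """Turn raw delivery events into notifiableDelivery events.

    Only shipped deliveries of subscribed customers are passed on,
    enriched with the channels the customer subscribed to.
    """
    for event in deliveries:
        channels = SUBSCRIPTIONS.get(event["customer_id"])
        if event["status"] == "shipped" and channels:
            yield {**event, "channels": channels}


# Hypothetical input stream of delivery events.
delivery_stream = [
    {"delivery_id": 1, "customer_id": "c-1", "status": "shipped"},
    {"delivery_id": 2, "customer_id": "c-9", "status": "shipped"},  # not subscribed
    {"delivery_id": 3, "customer_id": "c-2", "status": "sorted"},   # not shipped yet
    {"delivery_id": 4, "customer_id": "c-2", "status": "shipped"},
]

notifiable = list(notifiable_deliveries(delivery_stream))
for event in notifiable:
    print(event["delivery_id"], event["channels"])
```

Because the filter only reads from one stream and writes to another, the Notification system needs no knowledge of the Logistic System.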


Kafka Stream Data Platform

Source: Confluent

Kafka Core


Apache Kafka - Overview

Distributed publish-subscribe messaging system

Designed for processing real-time activity stream data (logs, metrics collections, social media streams, …)

Initially developed at LinkedIn, now part of Apache

Does not use JMS API and standards

Kafka maintains feeds of messages in topics


Apache Kafka - Motivation

LinkedIn’s motivation for Kafka was:

• “A unified platform for handling all the real-time data feeds a large company might have.”

Must haves

• High throughput to support high volume event feeds.

• Support real-time processing of these feeds to create new, derived feeds.

• Support large data backlogs to handle periodic ingestion from offline systems.

• Support low-latency delivery to handle more traditional messaging use cases.

• Guarantee fault-tolerance in the presence of machine failures.


Kafka High Level Architecture

The who is who
• Producers write data to brokers.
• Consumers read data from brokers.
• All this is distributed.

The data
• Data is stored in topics.
• Topics are split into partitions, which are replicated.
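How a topic splits into partitions can be sketched with a keyed producer. The snippet below is a plain-Python illustration under stated assumptions: the truck ids used as message keys, the partition count and the CRC32 hash are all made up for the example (Kafka's own default partitioner uses a different hash function). It only shows the principle that equal keys land in the same partition and keep their order there.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # assumed partition count for the illustration


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition (stand-in hash, not Kafka's own)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# One topic, modelled as partition number -> ordered list of messages.
movement_topic = defaultdict(list)

# Hypothetical (key, value) messages produced by three trucks.
produced = [("truck-1", "A"), ("truck-2", "B"), ("truck-1", "C"),
            ("truck-3", "D"), ("truck-2", "E")]

for truck_id, position in produced:
    movement_topic[partition_for(truck_id)].append((truck_id, position))

for partition, messages in sorted(movement_topic.items()):
    print(partition, messages)
```

Ordering is only preserved within a partition, which is why the choice of key matters when consumers depend on per-entity order.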

[Diagram: Kafka Cluster with Broker 1, Broker 2 and Broker 3 plus a Zookeeper Ensemble; three Producers and three Consumers connect to the cluster]

Apache Kafka - Architecture

[Diagram: a Truck publishes to a Kafka Broker holding a Movement Topic and an Engine-Metrics Topic (messages 1 to 6 each); a Movement Processor and an Engine Processor consume them]

Apache Kafka - Architecture

[Diagram: a Truck publishes to a Kafka Broker holding a Movement Topic and an Engine-Metrics Topic, now shown with partitions (Partition 0, Partition 1; messages 1 to 6 each); Movement Processor instances and an Engine Processor consume them]

Apache Kafka

[Diagram: a Truck publishes to the Movement Topic, whose partitions P0, P1 and P2 (messages 1 to 5 each) are each stored on two of three brokers: Kafka Broker 1 holds P0 and P2, Kafka Broker 2 holds P2 and P1, Kafka Broker 3 holds P0 and P1; three Movement Processor instances consume]

Apache Kafka - Architecture

• Write Ahead Log / Commit Log

• Producers always append to tail

• think append to file
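The "append to file" idea can be sketched in a few lines. This is a toy in-memory model, not Kafka's storage code; the class and method names are assumptions for illustration. Records are only ever added at the tail, and the position at which a record was written serves as its id.

```python
class CommitLog:
    """Toy append-only log: records are added at the tail and never changed."""

    def __init__(self):
        self._records = []

    def append(self, record) -> int:
        """Append a record at the tail and return its position in the log."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 10) -> list:
        """Read up to max_records records starting at the given position."""
        return self._records[offset:offset + max_records]


log = CommitLog()
offsets = [log.append(f"movement-{i}") for i in range(1, 7)]
print(offsets)
print(log.read(offset=4))
```

Reading never removes anything from the log, so several readers can go through the same records independently.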

[Diagram: a Truck appends message 6 to the tail of the Movement Topic log (1 2 3 4 5) on the Kafka Broker]

Durability Guarantees

Producer can configure acknowledgements

Value         Impact                                               Durability

0             • Producer doesn’t wait for leader                   weak

1 (default)   • Producer waits for leader                          medium
              • Leader sends ack when message is written to log
              • No wait for followers

all           • Producer waits for leader                          strong
              • Leader sends ack when all In-Sync Replicas
                have acknowledged
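The three acknowledgement settings can be read as a decision rule for when the producer considers a write done. The function below is a simplified model of that rule, not broker or client code; its name and parameters are hypothetical, and `replicas_acked` is assumed to count the leader as well.

```python
def producer_gets_ack(acks, leader_has_written: bool,
                      in_sync_replicas: int, replicas_acked: int) -> bool:
    """Simplified model: is the write complete from the producer's view?"""
    if acks == 0:
        # Fire and forget: the producer never waits for the leader.
        return True
    if acks == 1:
        # The leader acknowledges once the message is in its own log.
        return leader_has_written
    if acks == "all":
        # The leader acknowledges only after every in-sync replica has it.
        return leader_has_written and replicas_acked >= in_sync_replicas
    raise ValueError(f"unsupported acks value: {acks!r}")


# Scenario: the leader has written the message, the two followers have not yet.
for setting in (0, 1, "all"):
    done = producer_gets_ack(setting, leader_has_written=True,
                             in_sync_replicas=3, replicas_acked=1)
    print(setting, done)
```

In this scenario a leader failure right after the write would lose the message under the first two settings, which is the durability trade-off the table describes.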


Apache Kafka - Partition offsets

Offset: messages in the partitions are each assigned a unique (per partition) and sequential id called the offset

• Consumers track their pointers via (offset, partition, topic) tuples
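A small sketch of this bookkeeping, assuming a single in-memory partition and hypothetical group names: each consumer group keeps its own next offset per (topic, partition), so two groups read the same messages without affecting each other. The `poll` helper is an illustration, not the Kafka consumer API.

```python
from typing import Dict, List, Tuple

# A single partition of a hypothetical "movement" topic.
partition_log: List[str] = ["m0", "m1", "m2", "m3", "m4"]

# (group, topic, partition) -> next offset to read.
committed: Dict[Tuple[str, str, int], int] = {}


def poll(group: str, topic: str, partition: int, max_records: int) -> List[str]:
    """Return the next records for a group and advance only its own offset."""
    key = (group, topic, partition)
    start = committed.get(key, 0)
    records = partition_log[start:start + max_records]
    committed[key] = start + len(records)
    return records


first_a = poll("group-a", "movement", 0, max_records=3)
first_b = poll("group-b", "movement", 0, max_records=1)
second_a = poll("group-a", "movement", 0, max_records=3)

print(first_a, first_b, second_a)
print(committed)
```

Resetting an entry in `committed` to an earlier offset would let that group re-read old messages, since the log itself is unchanged by consumption.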

[Diagram: Consumer Group A and Consumer Group B reading the same partition, each at its own offset. Source: Apache Kafka]

Data Retention – 4 options

1. Never

2. Time based (TTL) log.retention.{ms | minutes | hours}

3. Size based log.retention.bytes

4. Log compaction based (older entries with the same key are removed, the latest one is kept)

kafka-topics.sh --zookeeper localhost:2181 \
    --create --topic customers \
    --replication-factor 1 --partitions 1 \
    --config cleanup.policy=compact
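The difference between time-based retention and log compaction can be illustrated on a toy log. This is a simplified sketch: the record layout, the sample customer data and both function names are assumptions, and real brokers apply retention to whole log segments rather than to individual records.

```python
from typing import List, Tuple

# (offset, key, value, timestamp in seconds); layout assumed for the sketch.
Record = Tuple[int, str, str, float]


def retain_by_time(log: List[Record], now: float,
                   retention_seconds: float) -> List[Record]:
    """Time-based retention: drop records older than the retention period."""
    return [r for r in log if now - r[3] <= retention_seconds]


def compact(log: List[Record]) -> List[Record]:
    """Log compaction: keep only the latest record for each key."""
    latest_offset = {}
    for offset, key, _value, _ts in log:
        latest_offset[key] = offset
    return [r for r in log if latest_offset[r[1]] == r[0]]


customers: List[Record] = [
    (0, "c-1", "Basel", 0.0),
    (1, "c-2", "Bern", 10.0),
    (2, "c-1", "Zürich", 20.0),
    (3, "c-3", "Genf", 30.0),
    (4, "c-2", "Wien", 40.0),
]

compacted = compact(customers)
recent = retain_by_time(customers, now=45.0, retention_seconds=20.0)

print([r[:3] for r in compacted])
print([r[:3] for r in recent])
```

Compaction keeps one current value per customer regardless of age, whereas time-based retention would eventually drop customers that are simply not updated.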


Apache Kafka – Some numbers

Kafka at LinkedIn => 1800+ broker machines / 79K+ topics

• 1.3 trillion messages per day
• 330 terabytes in / day
• 1.2 petabytes out / day
• Peak load for a single cluster: 2 million messages/sec, 4.7 gigabits/sec inbound, 15 gigabits/sec outbound

Kafka performance on our own infrastructure => 6 brokers (VM) / 1 cluster

• 445’622 messages/second
• 31 MB/second
• 3.0405 ms average latency between producer and consumer
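A quick back-of-the-envelope check puts these figures in relation. The arithmetic below assumes 1 MB = 1,000,000 bytes, which the slide does not state, and the variable names are only for illustration.

```python
# Own infrastructure: throughput figures from the slide.
messages_per_second = 445_622
megabytes_per_second = 31

# Assumption: 1 MB = 1,000,000 bytes.
avg_message_bytes = megabytes_per_second * 1_000_000 / messages_per_second

# LinkedIn: 1.3 trillion messages per day, averaged over 86,400 seconds.
linkedin_messages_per_day = 1.3e12
linkedin_avg_per_second = linkedin_messages_per_day / 86_400

print(f"average message size: {avg_message_bytes:.1f} bytes")
print(f"LinkedIn average rate: {linkedin_avg_per_second:,.0f} messages/second")
```

The first result shows that the benchmark used small messages (roughly 70 bytes); the second is an average across all LinkedIn clusters and is therefore not comparable to the single-cluster peak of 2 million messages/sec.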

http://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines

https://engineering.linkedin.com/kafka/running-kafka-scale


Kafka Connect


Kafka Connect Architecture

Source: Confluent

Kafka Connector Hub – Certified Connectors

Source: http://www.confluent.io/product/connectors

Kafka Connector Hub – Additional Connectors

Source: http://www.confluent.io/product/connectors

Kafka Streams


Kafka Streams

• Designed as a simple and lightweight library in Apache Kafka

• no external dependencies on systems other than Apache Kafka

• Leverages Kafka as its internal messaging layer

• agnostic to resource management and configuration tools

• Supports fault-tolerant local state

• Event-at-a-time processing (not microbatch) with millisecond latency

• Windowing with out-of-order data using a Google DataFlow-like model
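Event-at-a-time processing with event-time windows can be sketched as follows. This is plain Python, not the Kafka Streams API; the window size, the event fields and the function names are assumptions, and the sketch leaves out fault-tolerant state and any limit on how late an event may arrive.

```python
from collections import defaultdict

WINDOW_SIZE_MS = 1_000  # assumed tumbling window size

# Local state: (truck, window start) -> number of events seen so far.
counts = defaultdict(int)


def window_start(event_time_ms: int) -> int:
    """Start of the tumbling window that contains the event time."""
    return event_time_ms - (event_time_ms % WINDOW_SIZE_MS)


def process(event: dict):
    """Handle one event at a time and update the window it belongs to."""
    key = (event["truck"], window_start(event["event_time_ms"]))
    counts[key] += 1
    return key, counts[key]


# Hypothetical events in arrival order; the third one arrives out of order.
events = [
    {"truck": "t1", "event_time_ms": 100},
    {"truck": "t1", "event_time_ms": 1200},
    {"truck": "t1", "event_time_ms": 900},   # late, belongs to the first window
    {"truck": "t2", "event_time_ms": 950},
]

updates = [process(e) for e in events]
for update in updates:
    print(update)
```

Because windows are assigned by the event's own timestamp rather than by arrival time, the late event still updates the first window instead of being counted in the wrong one.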


Kafka Streams Architecture

Source: Confluent

Kafka and ”Big Data” / ”Fast Data” Ecosystem


Kafka and the Big Data / Fast Data ecosystem

Kafka integrates with many popular products / frameworks

• Apache Spark Streaming

• Apache Flink

• Apache Storm

• Apache NiFi

• Streamsets

• Apache Flume

• Oracle Stream Analytics

• Oracle Service Bus

• Oracle GoldenGate

• Spring Integration Kafka Support

• …

Storm: built-in Kafka Spout to consume events from Kafka

Confluent Platform


Confluent Data Platform 3.0

Source: Confluent

Kafka in Architecture


Customer Event Hub – taking Velocity into account

[Diagram, components shown: data sources (Location, Social, Clickstream, Sensor Data, Weather Data, Mobile Apps, Billing & Ordering, CRM / Profile, Marketing Campaigns, Call Center); Event Hub; File Import / SQL Import; Streaming Analytics (Stream Analytics, NoSQL, Reference / Models); Batch Analytics on a Hadoop Cluster (Distributed Filesystem, Parallel Processing, NoSQL, SQL, Search); consumers (Dashboard, BI Tools, Enterprise Data Warehouse, Search, Online & Mobile Apps)]

Customer Event Hub – mapping of technologies

[Diagram, components shown: data sources (Location, Social, Clickstream, Sensor Data, Weather Data, Mobile Apps, Billing & Ordering, CRM / Profile, Marketing Campaigns, Call Center); Event Hub; SQL Import; Streaming Analytics (Stream Analytics, NoSQL, Reference / Models); Batch Analytics on a Hadoop Cluster (Distributed Filesystem, Parallel Processing, NoSQL, SQL, Search); consumers (Dashboard, BI Tools, Enterprise Data Warehouse, Search, Online & Mobile Apps); the technology logos mapped onto these components are not recoverable]

Summary


Summary

• Kafka can scale to millions of messages per second, and more

• Easy to start with for a PoC

• Setting up a production environment takes a bit more investment

• Monitoring is key

• Vibrant community and ecosystem

• Fast-paced technology

• Confluent provides a Kafka distribution


Guido Schmutz
Technology Manager

[email protected]


@gschmutz guidoschmutz.wordpress.com