Apache Kafka Lesson Learned

Guozhang Wang | Kafka Meetup Beijing, April 15, 2017

Apache Kafka Development Experience and Lesson Learned

A Short History of Kafka

• 2010.10: First commit of Kafka

• 2011.07: Enters Apache Incubator

• Release 0.7.0: compression, mirror-maker

• 2012.10: Graduated to top-level project

• Release 0.8.0: intra-cluster replication

• 2014.11: Confluent founded

• Release 0.8.2: new producer, quota

• Release 0.9.0: Kafka Connect, new consumer, security

• Release 0.10.0: Kafka Streams, timestamps, rack awareness

What is Kafka, Really?

a scalable pub-sub messaging system.. [NetDB 2011]

Example: Pub-Sub Messaging

[Diagram: Tracking Logs / Metrics, Apache Kafka, Hadoop / DW]

What is Kafka, Really?

a scalable pub-sub messaging system.. [NetDB 2011]

a real-time data pipeline.. [Hadoop Summit 2013]

Example: Centralized Data Pipeline

[Diagram: KV-Store, Doc-Store, RDBMS, Tracking Logs / Metrics; Apache Kafka; Hadoop / DW, Monitoring, Rec. Engine, Social Graph, Searching, Security]

What is Kafka, Really?

a scalable pub-sub messaging system.. [NetDB 2011]

a real-time data pipeline.. [Hadoop Summit 2013]

a distributed and replicated log.. [VLDB 2015]

Example: Data Store Geo-Replication

[Diagram: Region 1 and Region 2, each with User Apps, Local Stores, and Apache Kafka; labels: write, read, append log, mirroring, apply log]
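The write / append log / mirroring / apply log / read flow named in this example can be sketched as follows. This is only one reading of the diagram labels: the `Region` class, the `mirror` function, and the in-memory list standing in for a Kafka topic are all hypothetical, and no real Kafka or mirroring API is used.

```python
class Region:
    """One region: a log (stand-in for a Kafka topic) plus a local store."""

    def __init__(self, name):
        self.name = name
        self.log = []       # append-only list of (key, value) entries
        self.store = {}     # local key-value store served to user apps
        self._applied = 0   # offset of the next log entry to apply

    def write(self, key, value):
        # A write is only appended to the log; the store changes on apply.
        self.log.append((key, value))

    def apply_log(self):
        # Replay entries that have not yet been applied to the local store.
        for key, value in self.log[self._applied:]:
            self.store[key] = value
        self._applied = len(self.log)

    def read(self, key):
        return self.store.get(key)


def mirror(source, target, from_offset):
    """Copy source log entries from from_offset on; return the next offset."""
    target.log.extend(source.log[from_offset:])
    return len(source.log)


region1, region2 = Region("region-1"), Region("region-2")
region1.write("user:1", "alice")
region1.apply_log()
next_offset = mirror(region1, region2, 0)
region2.apply_log()
print(region2.read("user:1"))  # alice
```

Because both regions apply the same log in the same order, their local stores converge without the stores talking to each other directly.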

What is Kafka, Really?

a scalable pub-sub messaging system.. [NetDB 2011]

a real-time data pipeline.. [Hadoop Summit 2013]

a distributed and replicated log.. [VLDB 2015]

a unified data integration stack.. [CIDR 2015]

Example: Async. Micro-Services

What is Kafka, Really?

a scalable pub-sub messaging system.. [NetDB 2011]

a real-time data pipeline.. [Hadoop Summit 2013]

a distributed and replicated log.. [VLDB 2015]

a unified data integration stack.. [CIDR 2015]

All of them!

Kafka: Streaming Platform

• Publish / Subscribe
  • Move data around as online streams

• Store
  • “Source-of-truth” continuous data

• Process
  • React / process data in real-time

How did we get here?

Lesson 1: Build evolvable systems

Upgrading Your Kafka Cluster is like ..

Kafka @ LI

• Release from trunk
  • Push frequency: daily

• Staging cluster
  • Full traffic of production
  • Full monitoring / alerting

• Production Ramp-up
  • Prepare to roll back anytime

Kafka: Evolvable System

• Zero downtime
  • Maintenance outage? No such thing.

• All protocols versioned
  • Brokers can talk to older versioned clients
  • And vice versa since 0.10.2!

• One should do no more than rolling bounces
  • Staging before production
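One way to picture "all protocols versioned": each side advertises the range of request versions it understands for an API, and a request is sent at the highest version both sides share. The sketch below illustrates that idea only. The names `VersionRange` and `pick_version` and the version numbers are hypothetical, and this is not Kafka's actual wire protocol or client code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VersionRange:
    """Inclusive range of request versions one side supports for one API."""
    min_version: int
    max_version: int


def pick_version(client: VersionRange, broker: VersionRange) -> Optional[int]:
    """Return the highest version both sides support, or None if disjoint."""
    highest_common = min(client.max_version, broker.max_version)
    lowest_common = max(client.min_version, broker.min_version)
    return highest_common if highest_common >= lowest_common else None


# Hypothetical numbers: a newer client talking to an older broker.
new_client = VersionRange(min_version=0, max_version=3)
old_broker = VersionRange(min_version=0, max_version=1)
print(pick_version(new_client, old_broker))  # 1
```

As long as the ranges overlap, either side can be upgraded first, which is what makes a rolling bounce sufficient.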

Server-Client Compatibility

[Diagram: servers and clients running mixed versions: 0.10.0, 0.8.2, 0.10.2, 0.8.0, 0.9.0, 0.8.2, 0.10.1, 0.8.1]

Lesson 2: What gets measured gets fixed

Audit Trail @ LI

.. and a LOT of Them

Metrics Reporter

• JMXReporter
  • Included in AK: sensor / metrics -> mbean / attributes

• KafkaReporter
  • Send metrics back to Kafka!

• More..
  • Ganglia, Graphite, statsd ..
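The pluggable-reporter idea above can be sketched as a registry that fans every metric update out to whatever reporters are configured. Everything here is a hypothetical Python analogue: `Reporter`, `MetricsRegistry`, and the two example reporters are made-up names, not Kafka's reporter interface, and the statsd-style reporter only formats gauge lines rather than sending them anywhere.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Reporter(ABC):
    """Hypothetical plug-in interface: called on every metric update."""

    @abstractmethod
    def report(self, name: str, value: float) -> None:
        ...


class InMemoryReporter(Reporter):
    """Keeps the latest value per metric, e.g. for a local dashboard."""

    def __init__(self) -> None:
        self.latest: Dict[str, float] = {}

    def report(self, name: str, value: float) -> None:
        self.latest[name] = value


class StatsdLineReporter(Reporter):
    """Formats updates as statsd gauge lines; collects instead of sending."""

    def __init__(self) -> None:
        self.lines: List[str] = []

    def report(self, name: str, value: float) -> None:
        self.lines.append(f"{name}:{value}|g")


class MetricsRegistry:
    """Fans each recorded value out to all configured reporters."""

    def __init__(self, reporters: List[Reporter]) -> None:
        self._reporters = reporters

    def record(self, name: str, value: float) -> None:
        for reporter in self._reporters:
            reporter.report(name, value)


memory, statsd = InMemoryReporter(), StatsdLineReporter()
registry = MetricsRegistry([memory, statsd])
registry.record("requests.rate", 42.0)
print(statsd.lines)  # ['requests.rate:42.0|g']
```

Adding a new monitoring backend then means adding one reporter class, with no change to the code that records metrics.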

Confluent Enterprise: Delivery Tracking

Confluent Enterprise: Cluster Health

Lesson 3: APIs stay forever

The Story of KAFKA-1481

Hey, we should stop allowing dashes / underscores in MBean names since they are used in hostnames / topics / etc.

Makes sense, let’s do this!

DONE! (a few lines of key changes)

OMG what happened?

We need to tell our SRE to change their monitoring metrics names now..

That’s N clusters in M data centers ..
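To see why delimiter characters inside metric names were worth fixing in the first place, consider a name built by joining a client id, a topic, and a metric with dashes. The example below uses made-up names and a made-up naming scheme, not the actual MBean format before or after KAFKA-1481. It only shows that a dash-joined name cannot be split back reliably, while explicit key=value tags can, assuming the values contain no "," or "=".

```python
def dash_joined(client_id, topic, metric):
    """Hypothetical old-style name: fields joined with dashes."""
    return f"{client_id}-{topic}-{metric}"


def tagged(client_id, topic, metric):
    """Hypothetical new-style name: explicit key=value tags."""
    return f"name={metric},clientId={client_id},topic={topic}"


def parse_tagged(name):
    """Split a tagged name back into its fields."""
    return dict(pair.split("=", 1) for pair in name.split(","))


# Two different (client id, topic) pairs collapse into the same dashed name.
a = dash_joined("app", "page-views", "BytesPerSec")
b = dash_joined("app-page", "views", "BytesPerSec")
print(a == b)  # True

# The tagged form keeps the fields apart, dashes and all.
fields = parse_tagged(tagged("app", "page-views", "BytesPerSec"))
print(fields["topic"])  # page-views
```

The fix is small in code, but every dashboard and alert keyed on the old names has to change with it, which is the point of the lesson.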

Lesson 4: Service needs gatekeepers

One naughty client can bother everyone ..

[Diagram: servers and clients running mixed versions: 0.10.0, 0.8.2, 0.10.2, 0.8.0, 0.9.0, 0.8.2, 0.10.1, 0.8.1]

Multi-tenancy Services

• Security from ground up [Release 0.9.0+]
  • Authentication
  • Authorization

• Resources under control [Release 0.9.0+]
  • Quota on byte rate / request rate on clientId / GroupId etc
  • Quota on CPU resources
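A byte-rate quota per client id can be sketched as follows: track the bytes each client sent in the current window and, when a client exceeds its quota, compute how long it should be delayed so that its average rate falls back to the limit. This is a simplified model under stated assumptions, not the broker's implementation; `ByteRateQuota`, its parameters, and the single fixed window are hypothetical.

```python
from collections import defaultdict


class ByteRateQuota:
    """Per-client byte-rate quota over a single fixed window (simplified)."""

    def __init__(self, bytes_per_sec: float, window_sec: float = 1.0):
        self.bytes_per_sec = bytes_per_sec
        self.window_sec = window_sec
        self._bytes_in_window = defaultdict(float)

    def record(self, client_id: str, num_bytes: int) -> float:
        """Record traffic for a client; return the delay in seconds (0 if OK)."""
        self._bytes_in_window[client_id] += num_bytes
        observed_rate = self._bytes_in_window[client_id] / self.window_sec
        if observed_rate <= self.bytes_per_sec:
            return 0.0
        # Delay long enough that the average rate drops back to the quota.
        excess = observed_rate - self.bytes_per_sec
        return excess / self.bytes_per_sec * self.window_sec

    def roll_window(self) -> None:
        """Start a new measurement window."""
        self._bytes_in_window.clear()


quota = ByteRateQuota(bytes_per_sec=1000)
print(quota.record("good-client", 500))      # 0.0
print(quota.record("naughty-client", 3000))  # 2.0
print(quota.record("good-client", 400))      # 0.0
```

Only the client that exceeds its quota is slowed down; the well-behaved client sharing the same cluster is unaffected.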

Lesson 5: Ecosystems are the key

Remember Hadoop-Producer/Consumer?

What Should Go into AK?

• Container Image?

• Kafka Manager GUI?

• REST Proxy APIs?

• Operation Tooling?

• Other Language Clients?

• Schema Registry?

• Hadoop / etc Integration?

• Stream Processing?

Layered Architecture, not Monolithic

[Diagram: components Messaging, Cross-DC, Data Schemas, Fast ETL, Search & Query, Processing, Consoles; legend Apache Kafka, Enterprise Offerings, OS Eco-Systems, Third-party Tools, Framework Impls]

API, coding

“Full stack” evaluation

Operations, debugging, …

Simple is Beautiful

Take-aways

• Build evolvable systems

• What gets measured gets fixed

• APIs stay forever

• (Multi-tenant) Services need gatekeepers

• Ecosystems are the key

THANKS!

Guozhang Wang | guozhang@confluent.io | @guozhangwang

Kafka Summit 2017 @ NYC & SF
