Click here to load reader

Apache Kafka Lesson Learned

  • View
    577

  • Download
    0

Embed Size (px)

Text of Apache Kafka Lesson Learned

  • 1

    Guozhang Wang Kafka Meetup Beijing, April 15, 2017

    Apache Kafka Development Experience and Lesson Learned

  • 2

    A Short History of Kafka

  • 3

    A Short History 2010.10: First commit of Kafka

  • 4

    A Short History 2010.10: First commit of Kafka

    2011.07: Enters Apache Incubator Release 0.7.0: compression, mirror-maker

  • 5

    A Short History 2010.10: First commit of Kafka

    2011.07: Enters Apache Incubator Release 0.7.0: compression, mirror-maker

    2012.10: Graduated to top-level project Release 0.8.0: intra-cluster replication

  • 6

    A Short History 2010.10: First commit of Kafka

    2011.07: Enters Apache Incubator Release 0.7.0: compression, mirror-maker

    2012.10: Graduated to top-level project Release 0.8.0: intra-cluster replication

    2014.11: Confluent founded Release 0.8.2: new producer, quota Release 0.9.0: Kafka Connect, new consumer, security Release 0.10.0: Kafka Streams, timestamps, rack awareness

  • 7

    What is Kafka, Really?

    [NetDB 2011]a scalable pub-sub messaging system..

  • 8

    Example: Pub-Sub Messaging

    Tracking Logs / Metrics

    Hadoop / DW

    Apache Kafka

  • 9

    What is Kafka, Really?

    [NetDB 2011]

    [Hadoop Summit 2013]

    a scalable pub-sub messaging system..

    a real-time data pipeline..

  • 10

    Example: Centralized Data Pipeline

    KV-Store Doc-Store RDBMSTracking Logs / Metrics

    Hadoop / DW Monitoring Rec. Engine Social GraphSearchingSecurity

    Apache Kafka

  • 11

    What is Kafka, Really?

    [NetDB 2011]

    [Hadoop Summit 2013]

    [VLDB 2015]

    a scalable pub-sub messaging system..

    a real-time data pipeline..

    a distributed and replicated log..

  • 12

    Example: Data Store Geo-Replication

    Apache Kafka

    Local Stores

    User Apps User Apps

    Local Stores

    Apache Kafka

    Region 2Region 1

    write read

    append log

    mirroring

    apply log

  • 13

    What is Kafka, Really?

    a scalable pub-sub messaging system.. [NetDB 2011]

    a real-time data pipeline.. [Hadoop Summit 2013]

    a distributed and replicated log.. [VLDB 2015]

    a unified data integration stack.. [CIDR 2015]

  • 14

    Example: Async. Micro-Services

  • 15

    What is Kafka, Really?

    a scalable pub-sub messaging system.. [NetDB 2011]

    a real-time data pipeline.. [Hadoop Summit 2013]

    a distributed and replicated log.. [VLDB 2015]

    a unified data integration stack.. [CIDR 2015]

  • 16

    What is Kafka, Really?

    a scalable Pub-sub messaging system.. [NetDB 2011]

    a real-time data pipeline.. [Hadoop Summit 2013]

    a distributed and replicated log.. [VLDB 2015]

    a unified data integration stack.. [CIDR 2015]

    All of them!

  • 17

    Kafka: Streaming Platform

    Publish / Subscribe Move data around as online streams

    Store Source-of-truth continuous data

    Process React / process data in real-time

  • 18

    How did we get here?

  • 19

    Lesson 1: Build evolvable systems

  • 20

    Upgrade Your Kafka Cluster is like ..

  • 21

    Kafka @ LI

    Release from trunk Push frequency: daily

    Staging cluster Full traffic of production Full monitoring / alerting

    Production Ramp-up Prepare to roll-back anytime

  • 22

    Kafka: Evolvable System Zero down-time Maintenance outage? No such thing.

    All protocols versioned Brokers can talk to older versioned clients And vice versa since 0.10.2!

    One should do no more than rolling bounces Staging before production

  • 23

    Server-Client Compatibility

    0.10.0

    0.8.2

    0.10.2

    0.8.0

    0.9.0

    0.8.2

    0.10.1

    0.8.1

  • 24

  • 25

    Lesson 2: What gets measured gets fixed

  • 26

    Audit Trail @ LI

  • 27

  • 28

    .. and a LOT of Them

  • 29

  • Metrics Reporter

    30

    JMXReporter Included in AK: sensor / metrics -> mbean / attributes

    KafkaReporter Send metrics back to Kafka!

    More.. Ganglia, Graphite, statsd ..

  • 31

    Confluent Enterprise: Delivery Tracking

  • 32

    Confluent Enterprise: Delivery Tracking

  • 33

    Confluent Enterprise: Cluster Health

  • 34

    Lesson 3: APIs stay forever

  • The Story of KAFKA-1481

    35

    Hey, we should stop allowing dashes / underscores in MBean name since they care used in hostnames / topics / etc.

    Makes sense, lets do this!

    DONE! (a few lines of key changes)

  • The Story of KAFKA-1481

    36

  • The Story of KAFKA-1481

    37

  • The Story of KAFKA-1481

    38

    OMG what happened?

    We need to tell our SRE to change their monitoring metrics names now..

    Thats N cluster on M data centers ..

  • 39

  • 40

  • 41

  • 42

    Lesson 4: Service needs gatekeepers

  • 43

    0.10.0

    0.8.2

    0.10.2

    0.8.0

    0.9.0

    0.8.2

    0.10.1

    0.8.1

    One naughty client can bother everyone ..

  • 44

    0.10.0

    0.8.2

    0.10.2

    0.8.0

    0.9.0

    0.8.2

    0.10.1

    0.8.1

    One naughty client can bother everyone ..

  • 45

    0.10.0

    0.8.2

    0.10.2

    0.8.0

    0.9.0

    0.8.2

    0.10.1

    0.8.1

    One naughty client can bother everyone ..

  • 46

    One naughty client can bother everyone ..

    0.10.0

    0.8.2

    0.10.2

    0.8.0

    0.9.0

    0.8.2

    0.10.1

    0.8.1

  • Multi-tenancy Services

    47

    Security from ground up [Release 0.9.0+] Authentication

    Authorization

    Resources under control [Release 0.9.0+] Quota on bytes rate / request rate on clientId / GroupId etc Quota on CPU resources

  • 48

    Lesson 5: Ecosystems are the key

  • 49

    Remember Hadoop-Producer/Consumer?

  • 50

    Container Image?

    Kafka Manager GUI?

    REST Proxy APIs?

    Operation Tooling?

    What Should Go into AK ?

    Other Language Clients?

    Schema Registry?

    Hadoop / etc Integration?

    Stream Processing?

  • 51

    Layered Architecture, not Monolithic

    Messa

    ging

    Cross-

    DC

    Data

    Schem

    as

    Fast E

    TL

    Search

    &Quer

    y

    Proces

    sing

    Conso

    les

    Apache Kafka Enterprise OfferingsOS Eco-Systems

    Third-party ToolsFramework Impls

  • 52

    API, coding

    Full stack evaluation

    Operations, debugging,

  • 53

    API, coding

    Full stack evaluation

    Operations, debugging,

    Simple is Beautiful

  • 54

  • 55

  • 56

  • 57

    Take-aways Build evolvable systems

    What gets measured gets fixed

    APIs stay forever

    (Multi-tenant) Services need gatekeepers

    Ecosystems are the key

    THANKS!

    Guozhang Wang | [email protected] | @guozhangwang

    Kafka Summit 2017 @ NYC & SF

    mailto:[email protected]

Search related