Emre Akış

Apache Kafka

Page 1: Apache Kafka

Emre Akış

Page 2: Apache Kafka

Outline

• Why do we use Apache Kafka?

• What is it?

• How does it work?

• Demo

• Ecosystem

Page 3: Apache Kafka

Big Data

• Data doesn’t fit on one computer

• Welcome to distributed systems

Page 4: Apache Kafka

(Near) Real-time Big Data & Analytics

• Events (e.g. clickstreams)

• Sensors

• Internet of Things (IoT)

• Data streams

Page 5: Apache Kafka

Messaging Queues

FIFO

Page 6: Apache Kafka

Distributed Messaging Queues

• Scalable

• Reliable

• High throughput (read & write)

Page 7: Apache Kafka

Why Apache Kafka?

• Clean and simple architecture

• Easy to use

• Easy to deploy

• High throughput

• Scalability

• High availability

• Persistence (for a configurable retention period)

Page 8: Apache Kafka

Apache Kafka 101

• Distributed, partitioned, replicated commit log service.

• Provides the functionality of a messaging system.

Page 9: Apache Kafka

Cluster

Language-agnostic TCP protocol

Cluster => group of servers (brokers)

Page 10: Apache Kafka

Topic

• Category or feed name to which messages are published.

• Partitioned log

• Each partition is

– Ordered

– An immutable sequence of messages

– Appended to

offset => sequential id number
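The partition-as-log idea can be sketched in a few lines. This is a toy in-memory model, not Kafka's storage format: the class name `PartitionLog`, its methods, and the sample messages are all made up for illustration. It only shows that messages are appended at the end and that the offset is the position in the sequence.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PartitionLog:
    """Toy in-memory model of one topic partition (hypothetical, not Kafka's API)."""

    records: List[bytes] = field(default_factory=list)

    def append(self, message: bytes) -> int:
        """Append a message at the end of the log and return its offset."""
        self.records.append(message)
        return len(self.records) - 1

    def read(self, offset: int, max_records: int = 10) -> List[bytes]:
        """Return up to max_records messages starting at the given offset."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        return self.records[offset:offset + max_records]


log = PartitionLog()
first = log.append(b"page_view:/home")   # first message gets offset 0
second = log.append(b"page_view:/cart")  # next message gets offset 1
replayed = log.read(0)                   # reading never removes messages
```

Because existing entries are never modified, a reader can re-read from any earlier offset and see the same messages in the same order.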

Page 11: Apache Kafka

Partition Distribution

• Distributed over the servers in the cluster

• Replicated for fault tolerance (configurable)

• Each partition has a leader server (handles reads & writes)

• The others act as followers (replicate the leader)

• If the leader fails, one of the followers becomes the new leader
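The leader/follower failover described above can be modelled roughly as follows. This is a simplified sketch under stated assumptions: the class `PartitionReplicas`, the method `fail_broker`, and the broker ids are hypothetical, and the rule "promote the first follower" is a stand-in for Kafka's real election, which is more involved.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PartitionReplicas:
    """Toy model of one partition's replica set; broker ids are made up."""

    leader: int
    followers: List[int]

    def fail_broker(self, broker_id: int) -> None:
        """Drop a failed broker; promote a follower if the leader was lost."""
        if broker_id == self.leader:
            if not self.followers:
                raise RuntimeError("no follower left to take over")
            # Assumption: simply promote the first follower in the list.
            self.leader = self.followers.pop(0)
        elif broker_id in self.followers:
            self.followers.remove(broker_id)


replicas = PartitionReplicas(leader=1, followers=[2, 3])
replicas.fail_broker(1)  # the leader dies, broker 2 takes over
```

After the call, `replicas.leader` is 2 and broker 3 remains as the only follower.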

Page 12: Apache Kafka

Producer

• Decides which partition each message goes to

– Round-robin

– Semantic partitioning
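The two strategies can be illustrated with a small sketch. The function names are hypothetical and `zlib.crc32` is used only because it is deterministic across runs; real Kafka clients use their own hash functions, so the partition numbers here will not match a real producer.

```python
import itertools
import zlib
from typing import Callable


def make_round_robin(num_partitions: int) -> Callable[[], int]:
    """Return a function that cycles through partitions 0..num_partitions-1."""
    counter = itertools.cycle(range(num_partitions))
    return lambda: next(counter)


def partition_by_key(key: bytes, num_partitions: int) -> int:
    """Semantic partitioning: the same key always maps to the same partition."""
    # Assumption: crc32 stands in for whatever hash a real client uses.
    return zlib.crc32(key) % num_partitions


next_partition = make_round_robin(3)
round_robin_choices = [next_partition() for _ in range(4)]  # 0, 1, 2, then 0 again

first_choice = partition_by_key(b"user-42", 4)
second_choice = partition_by_key(b"user-42", 4)  # same key, same partition
```

Round-robin spreads load evenly, while keyed partitioning keeps all messages for one key (for example one user id) in a single partition, which preserves their relative order.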

Page 13: Apache Kafka

Consumer

• Queue vs. Publish/Subscribe

• Traditional queue ordering vs. per-partition ordering
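Kafka covers both models through consumer groups, a concept the slide does not spell out: within one group each partition is read by a single consumer (queue-like), while separate groups each receive every message (publish/subscribe-like). The sketch below is a simplified illustration; `assign_partitions` and the consumer names are hypothetical, and the round-robin assignment is an assumption rather than Kafka's actual assignment logic.

```python
from typing import Dict, List


def assign_partitions(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Spread the partitions of a topic over the consumers of ONE group."""
    if not consumers:
        raise ValueError("a group needs at least one consumer")
    assignment: Dict[str, List[int]] = {name: [] for name in consumers}
    for index, partition in enumerate(sorted(partitions)):
        # Each partition goes to exactly one consumer in the group.
        assignment[consumers[index % len(consumers)]].append(partition)
    return assignment


# Queue-like: two workers share one group, so they split the partitions.
shared_group = assign_partitions([0, 1, 2, 3], ["worker-a", "worker-b"])

# Publish/subscribe-like: a consumer alone in its own group reads every partition.
solo_group = assign_partitions([0, 1, 2, 3], ["dashboard"])
```

Since a partition has only one reader per group, ordering holds within each partition but not across the whole topic, which is the per-partition ordering the slide contrasts with a traditional queue.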

Page 14: Apache Kafka

Guarantees

• Messages sent by a producer to a partition are appended in the order they are sent.

• Consumers see messages in the order they are stored in the log.

Page 15: Apache Kafka

Demo

• Basic Command Line Tools

– Start a server

– Create a topic

– Send a message

– Start a consumer

– Set up a multi-broker cluster

• Running a tool with no arguments displays usage information

Page 16: Apache Kafka

Clients

• Java

• Python

• Ruby

• Go

• C/C++

• .NET

• Clojure

• Node.js

• Scala

• JRuby

• Perl

• Erlang

• PHP

• Rust

• HTTP REST

https://cwiki.apache.org/confluence/display/KAFKA/Clients

Page 17: Apache Kafka

Administrative Tools

• Kafka Manager (powered by Yahoo)

• Kafkat: command-line administration for Kafka brokers.

• Kafka Web Console: displays information about your Kafka cluster, including which nodes are up and what topics they host data for.

• Kafka Offset Monitor: displays the state of all consumers and how far behind the head of the stream they are.
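"How far behind the head of the stream" is usually called consumer lag: the newest offset in a partition minus the offset the consumer has reached. The sketch below shows only that arithmetic; `consumer_lag` is a hypothetical helper, not part of any of the tools listed, and treating a missing committed offset as 0 is an assumption.

```python
from typing import Dict


def consumer_lag(log_end_offsets: Dict[int, int],
                 committed_offsets: Dict[int, int]) -> Dict[int, int]:
    """Per-partition lag: messages written but not yet consumed."""
    return {
        # Assumption: a partition with no committed offset counts from 0.
        partition: max(end - committed_offsets.get(partition, 0), 0)
        for partition, end in log_end_offsets.items()
    }


# Partition 0 has 120 messages and the consumer has processed 100 of them.
lag = consumer_lag({0: 120, 1: 80}, {0: 100, 1: 80})
total_lag = sum(lag.values())
```

Here partition 0 is 20 messages behind and partition 1 is fully caught up.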

Page 18: Apache Kafka

Ecosystem

• Samza

• Spark Streaming

• Storm

https://cwiki.apache.org/confluence/display/KAFKA/Ecosystem

Page 19: Apache Kafka

Use Cases

• Messaging

• Website activity tracking (at LinkedIn)

• Metrics

• Log aggregation

• Stream processing (with Storm or Samza)

• Event sourcing (state changes are logged by time)

• Commit log (like a database transaction log; see log compaction)
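Log compaction, mentioned in the last bullet, keeps at least the latest message for each key so the log can serve as a changelog. The following is a minimal sketch of that idea only: `compact`, the `Record` alias, and the sample data are hypothetical, and it ignores details of the real feature such as delete markers and when compaction runs.

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (key, value)


def compact(log: List[Record]) -> List[Record]:
    """Keep only the latest record for each key, preserving log order."""
    # Later entries overwrite earlier ones, so this holds each key's last index.
    last_index = {key: index for index, (key, _) in enumerate(log)}
    return [record for index, record in enumerate(log)
            if last_index[record[0]] == index]


changes = [
    ("user-1", "alice@old.example"),
    ("user-2", "bob@example.com"),
    ("user-1", "alice@new.example"),  # supersedes the first record
]
compacted = compact(changes)
```

After compaction only the second and third records remain, which is enough to rebuild the current state of both keys.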

Page 20: Apache Kafka

Who uses it?

• LinkedIn

• Yahoo

• Twitter

• Netflix

• Spotify

• Pinterest

• Uber

• Goldman Sachs

• Tumblr

• PayPal

• Box

• Airbnb

• Mozilla

• Cisco

• Etsy

• Foursquare

• StumbleUpon

• Coursera

• …

https://cwiki.apache.org/confluence/display/KAFKA/Powered+By

Page 21: Apache Kafka

Resources

• http://kafka.apache.org/

• https://cwiki.apache.org/confluence/display/KAFKA/Index

• https://cwiki.apache.org/confluence/display/KAFKA/Ecosystem

• http://www.confluent.io/blog

Page 22: Apache Kafka

Q & A

Page 23: Apache Kafka

About Me

• Twitter: @akisemre

• LinkedIn: https://tr.linkedin.com/in/emreakis