Page 1: Jiangjie (Becket) Qin LinkedIn

Jiangjie (Becket) Qin

LinkedIn

1

Page 3: Jiangjie (Becket) Qin LinkedIn

Introduction to Apache Kafka

Kafka based replication in Espresso

Message Integrity guarantees

Performance

Large message handling

Security

Q&A

3

Page 8: Jiangjie (Becket) Qin LinkedIn

[Figure: LinkedIn data infrastructure overview]

Tiers: Online, Online/Nearline, Offline

Components: Online Applications, Espresso (NoSQL DB), Voldemort/Venice (K-V store), Ambry (Blob store), Vector, Kafka (Messaging), Samza (Stream proc.), Brooklin (Change capture), Databus, Hadoop, Nuage (Our AWS)

Categories: Streams, Storage, Media

Data flow labels: Tracking events, App2App msg., Metrics, Data deployment, Logging, User data update, Change Logs, Media upload/download (Images/Docs/Videos), Media processing (Images/Docs/Videos), ETL / Data deployment, Processed data

Page 9: Jiangjie (Becket) Qin LinkedIn

[Figure: LinkedIn data infrastructure overview, same diagram as on the previous slide]

Session pointer shown on the slide: 4:40 PM, Baiyan Hall 1 (百宴厅1), "LinkedIn's real-time log analysis based on Kafka and ElasticSearch"

Page 15: Jiangjie (Becket) Qin LinkedIn

Introduction to Apache Kafka

Kafka based replication in Espresso

Message Integrity guarantees

Performance

Large message handling

Security

15

Page 16: Jiangjie (Becket) Qin LinkedIn

[Figure: Espresso architecture]

Data path: Client -> (HTTP) -> Routers -> (HTTP) -> Storage Nodes

Each Storage Node runs an API Server on top of MySQL.

Control path: Apache Helix and ZooKeeper; each router (r) holds a Routing Table.

Page 17: Jiangjie (Becket) Qin LinkedIn

MySQL instance level replication

[Figure: two groups of three nodes. Nodes 1 to 3 of the first group each host partitions P1, P2, P3; Nodes 1 to 3 of the second group each host P4, P5, P6. Legend: Master, Slave, Offline]

Page 18: Jiangjie (Becket) Qin LinkedIn

Kafka based partition level replication

[Figure: two Storage Nodes (Master and Slave), each containing an API Server, MySQL, Open Replicator, a Kafka Producer and a Kafka Consumer]

Master: Client -> (HTTP PUT/POST) -> API Server -> (SQL INSERT..UPDATE) -> MySQL -> (binlog) -> Open Replicator -> (binlog event) -> Kafka Producer -> (Kafka Message) -> Kafka Partition

Slave: Kafka Partition -> (Kafka Message) -> Kafka Consumer -> (SQL INSERT..UPDATE) -> MySQL

Page 19: Jiangjie (Becket) Qin LinkedIn

• Message Integrity: no message loss, in-order delivery, exactly-once semantics

• Performance: high throughput, low latency

• Handle large messages

• Security

Page 20: Jiangjie (Becket) Qin LinkedIn

Introduction to Apache Kafka

Kafka based replication in Espresso

Message Integrity guarantees

Performance

Large message handling

Security

20

Page 21: Jiangjie (Becket) Qin LinkedIn

Produce

Replicate

Consume

• No message loss

• In-order delivery

• Exactly-once semantics

21

Page 22: Jiangjie (Becket) Qin LinkedIn

A well-implemented producer usually supports:

• Batching the messages

• Sending the messages asynchronously

Example: org.apache.kafka.clients.producer.KafkaProducer

Page 23: Jiangjie (Becket) Qin LinkedIn

User: producer.send(new ProducerRecord("topic0", "hello"), callback);

[Figure: path of a record through the user thread]

Record (topic="topic0", value="hello") -> Serializer -> Partitioner (uses Topic Metadata) -> record (topic="topic0", partition=0, value="hello") -> Record Accumulator

Record Accumulator: batch1 for (topic0, 0), made up of messages (M), a compressor, callbacks (CB) and free space

Tasks done by the user thread:

• Serialization

• Partitioning

• Compression

Page 24: Jiangjie (Becket) Qin LinkedIn

[Figure: sender thread draining the Record Accumulator. The accumulator keeps one queue of batches (batch0, batch1, batch2, ...) per partition: (topic0, 0), (topic0, 1), (topic1, 0). Each batch holds messages (M), a compressor, callbacks (CB) and free space. Requests go to Broker 0 and Broker 1, which reply with responses (resp)]

Sender:

1. Polls batches from the batch queues (one batch per partition)

2. Groups batches based on the leader broker

3. Sends the grouped batches to the brokers

4. Fires callbacks after receiving the response
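Step 2 of the sender loop can be illustrated with a small sketch. It is not code from the Kafka client: the class name, the "topic-partition" string keys and the leader assignment are assumptions made for the example, and a partition whose leader is unknown is simply skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative stand-in for "group batches by leader broker"; not Kafka client code.
class SenderGroupingSketch {

    // partitionsWithBatch: partitions that have one batch ready to send (step 1).
    // leaderOf: partition name -> id of the broker that currently leads it (assumed known).
    static Map<Integer, List<String>> groupByLeader(List<String> partitionsWithBatch,
                                                    Map<String, Integer> leaderOf) {
        Map<Integer, List<String>> perBroker = new TreeMap<>();
        for (String partition : partitionsWithBatch) {
            Integer leader = leaderOf.get(partition);
            if (leader == null) {
                continue; // leader unknown: skipped in this sketch
            }
            perBroker.computeIfAbsent(leader, id -> new ArrayList<>()).add(partition);
        }
        return perBroker;
    }

    public static void main(String[] args) {
        List<String> ready = List.of("topic0-0", "topic0-1", "topic1-0");
        // Hypothetical leader assignment for the three partitions on the slide.
        Map<String, Integer> leaderOf = Map.of("topic0-0", 0, "topic0-1", 1, "topic1-0", 0);
        // One request per broker, each carrying the batches of the partitions it leads.
        System.out.println(groupByLeader(ready, leaderOf));
    }
}
```

Grouping by leader means each broker receives a single request per round, however many partitions it leads.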

Page 25: Jiangjie (Becket) Qin LinkedIn

• Enable retries in the sender (e.g. retries=5)

• Keep the messages not acked yet
  Espresso only checkpoints at the transaction boundary after the callback is fired

• acks=all
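As a minimal sketch, the two producer settings named on this slide can be collected in a java.util.Properties object. The keys "bootstrap.servers", "acks" and "retries" are standard producer configuration names; the class name, the method name and the broker address are placeholders for the example, and the serializer settings that a real KafkaProducer also requires are left out.

```java
import java.util.Properties;

// Collects the no-message-loss producer settings from the slide; names are illustrative.
class ReliableProducerConfigSketch {

    static Properties reliableProducerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("acks", "all");   // wait for all in-sync replicas
        props.setProperty("retries", "5");  // let the sender retry failed requests
        return props;
    }

    public static void main(String[] args) {
        // "broker0:9092" is a placeholder address.
        System.out.println(reliableProducerProps("broker0:9092"));
    }
}
```

These settings alone do not cover the slide's other point: the application still has to keep messages until their callback fires.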

Page 26: Jiangjie (Becket) Qin LinkedIn

[Figure: Producer, Broker 0 (leader) and Broker 1 (follower), each broker with its own log]

1. [Network] Send ProduceRequest

2. [Broker] Append messages to the leader's log

3. [Broker] Replication (before sending the response)

4. [Broker] ProduceResponse

Page 27: Jiangjie (Becket) Qin LinkedIn

In-Sync Replica (ISR)

A replica that can keep up with the leader

Semantics of acks

acks             Throughput  Latency  Durability
No ack (0)       high        low      No guarantee
Leader only (1)  medium      medium   Leader
All ISR (-1)     low         high     All ISR

Page 28: Jiangjie (Becket) Qin LinkedIn

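max.in.flight.requests.per.connection = 1

[Figure: request pipelining timeline between Producer and Kafka Broker with max.in.flight.requests.per.connection=2]

1. message 0 and message 1 are sent without waiting for a response

2. message 0 failed

3. retry message 0

4. message 1 ok

5. message 0 ok

With two requests in flight, message 1 is appended ahead of the retried message 0, so the order is broken.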
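The reordering can be reproduced with a toy simulation. This is a deliberately simplified model, not the client's real pipelining logic: requests go out in windows of at most maxInFlight, the broker handles a window in send order, and a message listed in failFirstAttempt fails once and is retried ahead of unsent messages. All names are made up for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of request pipelining with retries; not Kafka client code.
class InFlightOrderingSketch {

    // Returns the order in which messages end up in the broker log.
    static List<Integer> brokerLog(int messageCount, int maxInFlight, Set<Integer> failFirstAttempt) {
        Deque<Integer> pending = new ArrayDeque<>();
        for (int m = 0; m < messageCount; m++) {
            pending.add(m);
        }
        Set<Integer> alreadyFailed = new HashSet<>();
        List<Integer> log = new ArrayList<>();
        while (!pending.isEmpty()) {
            // Send up to maxInFlight requests before looking at any response.
            List<Integer> inFlight = new ArrayList<>();
            while (inFlight.size() < maxInFlight && !pending.isEmpty()) {
                inFlight.add(pending.poll());
            }
            // The broker handles the in-flight requests in the order they were sent.
            List<Integer> toRetry = new ArrayList<>();
            for (int m : inFlight) {
                if (failFirstAttempt.contains(m) && alreadyFailed.add(m)) {
                    toRetry.add(m);   // first attempt fails
                } else {
                    log.add(m);       // appended to the log
                }
            }
            // Retries go back to the head of the queue, ahead of unsent messages.
            for (int i = toRetry.size() - 1; i >= 0; i--) {
                pending.addFirst(toRetry.get(i));
            }
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(brokerLog(2, 2, Set.of(0))); // [1, 0]: order broken
        System.out.println(brokerLog(2, 1, Set.of(0))); // [0, 1]: order kept
    }
}
```

With a single request in flight, message 1 cannot leave the producer until message 0 has been acknowledged, which is why the slide sets the value to 1.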

Page 29: Jiangjie (Becket) Qin LinkedIn

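Close the producer in the callback with 0 timeout on failures

Callbacks are executed in the Sender thread

[Figure: timeline across User Thread, Record Accumulator, Sender Thread and Kafka Broker when the producer is closed from the user thread]

1. msg 0 is sent

2. callback(msg 0) receives an exception instead of an ack and notifies the user thread

3. msg 1 is sent before the user thread closes the producer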

Page 30: Jiangjie (Becket) Qin LinkedIn

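Close the producer in the callback with 0 timeout on failures

[Figure: timeline across User Thread, Record Accumulator, Sender Thread and Kafka Broker when the producer is closed inside the callback]

1. msg 0 is sent

2. callback(msg 0) receives an exception instead of an ack, calls close(0) and notifies the user thread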

Page 31: Jiangjie (Becket) Qin LinkedIn

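min.isr=2 (the min.insync.replicas setting)

• If acks=all, at least 2 copies of each message are required on the brokers

• replication.factor needs to be 3 to tolerate a single broker failure

[Figure: Producer sends req. to Broker 0 (Leader), which replicates to Broker 1 (In-sync follower) before sending resp. With both replicas in sync: Replica = {0, 1}, ISR = {0, 1}. After the follower drops out: Replica = {0, 1}, ISR = {0}]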
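The arithmetic behind these two bullets can be written down as a short sketch. The class and method names are made up, and the model assumes every replica starts out in sync and that each broker failure removes exactly one replica from the ISR.

```java
// Back-of-the-envelope model of min.isr with acks=all; names are illustrative.
class MinIsrSketch {

    // With acks=all, a write is accepted only while the ISR has at least minIsr members.
    static boolean acceptsAcksAllWrite(int isrSize, int minIsr) {
        return isrSize >= minIsr;
    }

    // Number of brokers that can fail while acks=all writes are still accepted.
    static int toleratedBrokerFailures(int replicationFactor, int minIsr) {
        return Math.max(0, replicationFactor - minIsr);
    }

    public static void main(String[] args) {
        System.out.println(toleratedBrokerFailures(3, 2)); // 1
        System.out.println(toleratedBrokerFailures(2, 2)); // 0
        System.out.println(acceptsAcksAllWrite(1, 2));     // false, the ISR = {0} case in the figure
    }
}
```

With replication.factor=2 and min.isr=2, losing either broker leaves an ISR of one and writes are rejected, which is why the slide asks for three replicas.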

Page 32: Jiangjie (Becket) Qin LinkedIn

unclean.leader.election.enable=false

Only in-sync replicas can become leader

[Figure: Producer exchanges req. and resp. with Broker 0 (Leader), which replicates to Broker 1 (In-sync follower); Broker 2 is an out-of-sync follower. Replica = {0, 1, 2}, ISR = {0, 1}]

Page 33: Jiangjie (Becket) Qin LinkedIn

• Disable auto offset commit

• Manually commit offsets
  Only commit offsets after successfully processing the messages
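The effect of committing only after processing can be shown with an in-memory stand-in for a consumer. Nothing here is the Kafka API: the class, its fields and the crash flag are assumptions for illustration. In the real client the corresponding pieces are the "enable.auto.commit" setting (set to false) and an explicit commit call such as commitSync() once processing has succeeded.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory model of "process first, commit afterwards"; not the Kafka consumer API.
class CommitAfterProcessingSketch {
    private final List<String> partitionLog;
    private int committedOffset = 0;                   // where a restarted consumer resumes
    final List<String> processed = new ArrayList<>();  // everything the application handled

    CommitAfterProcessingSketch(List<String> partitionLog) {
        this.partitionLog = partitionLog;
    }

    // Processes up to maxMessages starting at the committed offset.
    // If crashBeforeCommit is true, the run ends without committing.
    void runOnce(int maxMessages, boolean crashBeforeCommit) {
        int position = committedOffset;
        int end = Math.min(partitionLog.size(), position + maxMessages);
        while (position < end) {
            processed.add(partitionLog.get(position));
            position++;
        }
        if (!crashBeforeCommit) {
            committedOffset = position; // commit only after successful processing
        }
    }

    int committedOffset() {
        return committedOffset;
    }

    public static void main(String[] args) {
        CommitAfterProcessingSketch consumer =
                new CommitAfterProcessingSketch(List.of("a", "b", "c"));
        consumer.runOnce(2, true);   // crash after processing a and b, nothing committed
        consumer.runOnce(3, false);  // restart: a and b are processed again, then c
        System.out.println(consumer.processed);
    }
}
```

No message is lost across the crash, but "a" and "b" are processed twice, so this gives at-least-once delivery and duplicates still have to be handled by the application.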

Page 34: Jiangjie (Becket) Qin LinkedIn

Exactly once delivery (Espresso)

Consumer only applies a higher Generation:SCN

Generation:SCN = Master Generation : Transaction Seq.

Message flags: B = Begin txn, E = End txn, C = Control

[Figure: Master and Slave storage nodes, each with MySQL, a Producer and a Consumer, connected through Kafka partition DB_0. The messages shown are 3:100 (B,E), 3:101 (B,E), 3:102 (B), 3:102, 3:102 (E), 3:103 (B,E), 3:104 (B), 3:104, 3:104 (E)]
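A minimal sketch of the "only apply a higher Generation:SCN" rule follows. It works at transaction granularity and assumes the pair is ordered by generation first and sequence number second; that ordering, like the class and method names, is an assumption of the example and not taken from Espresso's code.

```java
// Tracks the last applied Generation:SCN and rejects anything that is not newer.
class GenerationScnSketch {
    private long lastGeneration = -1;
    private long lastScn = -1;

    // Returns true and records the position when the transaction is newer than
    // the last one applied; returns false for duplicates and stale transactions.
    boolean applyIfNewer(long generation, long scn) {
        boolean newer = generation > lastGeneration
                || (generation == lastGeneration && scn > lastScn);
        if (newer) {
            lastGeneration = generation;
            lastScn = scn;
        }
        return newer;
    }

    public static void main(String[] args) {
        GenerationScnSketch slave = new GenerationScnSketch();
        System.out.println(slave.applyIfNewer(3, 100)); // true
        System.out.println(slave.applyIfNewer(3, 101)); // true
        System.out.println(slave.applyIfNewer(3, 101)); // false: redelivered duplicate
    }
}
```

Because a redelivered transaction never compares as higher, at-least-once delivery from Kafka turns into exactly-once application on the slave.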

Page 35: Jiangjie (Becket) Qin LinkedIn

Introduction to Apache Kafka

Kafka based replication in Espresso

Message Integrity guarantees

Performance

Large message handling

Security

35

Page 36: Jiangjie (Becket) Qin LinkedIn

Performance tuning is case by case

• Traffic pattern sensitive

• Application requirement specific

Producer performance is more interesting

• Especially for acks=all

See more

▪ Producer performance tuning for Apache Kafka (http://www.slideshare.net/JiangjieQin/producer-performance-tuning-for-apache-kafka-63147600)

Page 37: Jiangjie (Becket) Qin LinkedIn

[Figure: Producer, Broker 0 (leader) and Broker 1 (follower), each broker with its own log]

1. [Network] Send ProduceRequest

2. [Broker] Append messages to the leader's log

3. [Broker] Replication (synchronously): increases latency

4. [Broker] ProduceResponse

Page 38: Jiangjie (Becket) Qin LinkedIn

Kafka replication is a pull model

Increase num.replica.fetchers

• Parallel fetch

• Not a perfect solution

  ▪ Diminishing effect (1/N)

  ▪ Scalability concern: replica fetchers per broker = (Cluster_Size - 1) * num.replica.fetchers
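A quick worked example of the fetcher formula, written as a small function. The class and method names and the sample numbers (a 10-broker cluster, 4 fetchers) are chosen only for illustration.

```java
// Evaluates the slide's formula for the number of replica fetcher threads per broker.
class ReplicaFetcherMath {

    // (Cluster_Size - 1) * num.replica.fetchers
    static int replicaFetchersPerBroker(int clusterSize, int numReplicaFetchers) {
        return (clusterSize - 1) * numReplicaFetchers;
    }

    public static void main(String[] args) {
        // 10 brokers with num.replica.fetchers=4: (10 - 1) * 4 = 36 fetchers on each broker.
        System.out.println(replicaFetchersPerBroker(10, 4));
    }
}
```

The count grows with both the cluster size and the setting, which is the scalability concern the slide raises.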

Page 39: Jiangjie (Becket) Qin LinkedIn

Introduction to Apache Kafka

Kafka based replication in Espresso

Message Integrity guarantees

Performance

Large message handling

Security

39

Page 40: Jiangjie (Becket) Qin LinkedIn

Kafka has a limit on the maximum size of a single message

• Enforced on the compressed wrapper message if compression is used

[Figure: Producer sends a message to the Broker; the broker checks the size and answers with RecordTooLargeException]

if (message.size > message.max.bytes) reject!

Page 41: Jiangjie (Becket) Qin LinkedIn

• Large messages increase the memory pressure in the broker.

• Large messages are expensive to handle and could slow down the brokers.

• A reasonable message size limit can handle the vast majority of use cases.

Page 42: Jiangjie (Becket) Qin LinkedIn

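• Reference based messaging

[Figure: the Producer writes the data to a Data Store and sends a reference (Ref.) through Kafka; the Consumer reads the Ref. from Kafka and fetches the data from the Data Store]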

Page 43: Jiangjie (Becket) Qin LinkedIn

• Unknown maximum row size

• Strict no data loss

• Strict message order guarantee

[Figure: reference based messaging. Producer -> (data) -> Data Store; Producer -> (Ref.) -> Kafka -> (Ref.) -> Consumer; Data Store -> (data) -> Consumer]

Works fine as long as the durability of the data store can be guaranteed.

Page 44: Jiangjie (Becket) Qin LinkedIn

Replicates a data store by using another data store....

• Sporadic large messages

• Low end to end latency

  ▪ There are more round trips in the system.

  ▪ Need to make sure the data store is fast

[Figure: reference based messaging. Producer -> (data) -> Data Store; Producer -> (Ref.) -> Kafka -> (Ref.) -> Consumer; Data Store -> (data) -> Consumer]

Page 46: Jiangjie (Becket) Qin LinkedIn

Aspect | Reference Based Messaging | In-line large message support
Operational complexity | Two systems to maintain | Only maintain Kafka
System stability | Depends on the consistency between Kafka and the external storage, and on the durability of the external storage | Only depends on Kafka
Cost to serve | Kafka + External Storage | Only maintain Kafka
End to end latency | Depends on the external storage | The latency of Kafka
Client complexity | Need to deal with envelopes | Much more involved
Functional limitations | Almost none | Some limitations

Page 47: Jiangjie (Becket) Qin LinkedIn

[Figure: in-line large message support in the client library]

Producer: MessageSplitter on top of KafkaProducer<byte[], byte[]>

Kafka brokers

Consumer: MessageAssembler, DeliveredMessageOffsetTracker and LargeMessageBufferPool on top of KafkaConsumer<byte[], byte[]>

Compatible interface with open source Kafka producer / consumer
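The split-and-reassemble idea behind MessageSplitter and MessageAssembler can be sketched in a few lines. This is not LinkedIn's implementation: the class and method names are invented, and a real segment would also carry metadata such as a message id, a segment index and the segment count so the consumer can tell segments of different messages apart.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative byte-level split and reassembly; not the LinkedIn client library.
class LargeMessageSplitSketch {

    // Cuts the payload into segments of at most maxSegmentBytes each.
    static List<byte[]> split(byte[] payload, int maxSegmentBytes) {
        if (maxSegmentBytes <= 0) {
            throw new IllegalArgumentException("maxSegmentBytes must be positive");
        }
        List<byte[]> segments = new ArrayList<>();
        for (int start = 0; start < payload.length; start += maxSegmentBytes) {
            int end = Math.min(payload.length, start + maxSegmentBytes);
            segments.add(Arrays.copyOfRange(payload, start, end));
        }
        return segments;
    }

    // Concatenates the segments, which must be given in their original order.
    static byte[] assemble(List<byte[]> segments) {
        int total = 0;
        for (byte[] segment : segments) {
            total += segment.length;
        }
        byte[] payload = new byte[total];
        int position = 0;
        for (byte[] segment : segments) {
            System.arraycopy(segment, 0, payload, position, segment.length);
            position += segment.length;
        }
        return payload;
    }

    public static void main(String[] args) {
        byte[] payload = new byte[10];
        List<byte[]> segments = split(payload, 4);
        System.out.println(segments.size() + " segments, round trip ok: "
                + Arrays.equals(payload, assemble(segments)));
    }
}
```

Each segment stays under the broker's message size limit, so the brokers never see a large message.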

Page 48: Jiangjie (Becket) Qin LinkedIn

Many interesting details

The offset of a large message

Offset Tracking

Rebalance and duplicates handling

Producer callback

Memory management

Performance overhead

Compatibility with existing messages

48

Page 50: Jiangjie (Becket) Qin LinkedIn

Each message in Kafka has an Offset

• The logical sequence in the log

Two options for a large message's offset

• The offset of the first segment

• The offset of the last segment

Page 51: Jiangjie (Becket) Qin LinkedIn

offset of a large message = offset of first segment

• First seen first serve

• Expensive for in-order delivery

[Figure: Broker log by offset: 0: msg0-seg0, 1: msg1-seg0, 2: msg1-seg1, 3: msg0-seg1. The Consumer buffers 0: msg0-seg0, 1: msg1-seg0 and 2: msg1-seg1 until 3: msg0-seg1 arrives, then delivers 0: msg0 and 1: msg1 to the User]

Max number of segments to buffer: 4

Page 52: Jiangjie (Becket) Qin LinkedIn

offset of a large message = offset of last segment

• Less memory consumption

• Better tolerance for partially sent large messages.

• Hard to seek()

[Figure: Broker log by offset: 0: msg0-seg0, 1: msg1-seg0, 2: msg1-seg1, 3: msg0-seg1. The Consumer delivers 2: msg1 to the User as soon as msg1-seg1 arrives, then 3: msg0 once msg0-seg1 arrives]

Max number of segments to buffer: 3
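The last-segment option can be traced with a small sketch that walks the broker log from the figure. The class name, the way segments are represented (one message id per log offset) and the assumption that the segment count of each message is known up front are all simplifications for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Delivers a large message at the offset of its last segment; illustrative only.
class LastSegmentOffsetSketch {

    // logMessageIds.get(offset) is the id of the message the segment at that offset belongs to.
    // segmentsPerMessage gives the total number of segments of every message in the log.
    static List<String> deliver(List<String> logMessageIds, Map<String, Integer> segmentsPerMessage) {
        Map<String, Integer> buffered = new HashMap<>();
        List<String> delivered = new ArrayList<>();
        for (int offset = 0; offset < logMessageIds.size(); offset++) {
            String id = logMessageIds.get(offset);
            int seen = buffered.merge(id, 1, Integer::sum);
            if (seen == segmentsPerMessage.get(id)) {
                buffered.remove(id);                 // message complete, buffer released
                delivered.add(offset + ": " + id);   // offset of the last segment
            }
        }
        return delivered;
    }

    public static void main(String[] args) {
        List<String> log = List.of("msg0", "msg1", "msg1", "msg0");
        System.out.println(deliver(log, Map.of("msg0", 2, "msg1", 2))); // [2: msg1, 3: msg0]
    }
}
```

A message is handed to the user the moment its final segment arrives, so completed messages never wait in the buffer behind an earlier, still incomplete one.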

Page 53: Jiangjie (Becket) Qin LinkedIn

• Offset Tracking

• Rebalance and duplicates handling

• Producer callback

• Memory management

• Performance overhead

• Compatibility with existing messages

http://www.slideshare.net/JiangjieQin/handle-large-messages-in-apache-kafka-58692297

Page 54: Jiangjie (Becket) Qin LinkedIn

Our client library will be open sourced shortly

• Large message support

• Auditing

Page 55: Jiangjie (Becket) Qin LinkedIn

Introduction to Apache Kafka

Kafka based replication in Espresso

Message Integrity guarantees

Performance

Large message handling

Security

55

Page 56: Jiangjie (Becket) Qin LinkedIn

• Authentication (SSL, Kerberos, SASL)

• Authorization (Unix-like permission)

  ▪ Resource: Cluster, Topic, Group

  ▪ Operation: Read, Write, Create, Delete, Alter, Describe, Cluster Operation, All

• TLS encryption

Page 57: Jiangjie (Becket) Qin LinkedIn

[Figure: request authorization inside the Broker]

Client -> SSL Authenticator -> (principal) -> Authorizer -> Allow: Request Handling -> Resp. / Deny

The Authorizer checks a Local ACL Cache filled with (group, resource, action) entries from an ACL service.

Authorizer performance is important
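One way to keep the authorizer fast is to answer repeated checks from a local cache, as the figure suggests. The sketch below is hypothetical: the class name, the string key format and the Predicate standing in for the remote ACL service are assumptions, and it leaves out cache expiry and invalidation, which a real deployment would need.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Authorizer that consults a local cache before calling a (hypothetical) ACL service.
class CachingAuthorizerSketch {
    private final Map<String, Boolean> localAclCache = new ConcurrentHashMap<>();
    private final Predicate<String> aclService; // stand-in for the remote lookup
    int remoteLookups = 0;                      // counted only to show the cache working

    CachingAuthorizerSketch(Predicate<String> aclService) {
        this.aclService = aclService;
    }

    // Returns true when (group, resource, action) is allowed.
    boolean isAllowed(String group, String resource, String action) {
        String key = group + "|" + resource + "|" + action;
        return localAclCache.computeIfAbsent(key, k -> {
            remoteLookups++;            // only reached on a cache miss
            return aclService.test(k);
        });
    }

    public static void main(String[] args) {
        CachingAuthorizerSketch authorizer =
                new CachingAuthorizerSketch(key -> key.equals("analytics|topic0|Read"));
        System.out.println(authorizer.isAllowed("analytics", "topic0", "Read"));  // true
        System.out.println(authorizer.isAllowed("analytics", "topic0", "Write")); // false
    }
}
```

Only the first check for a given (group, resource, action) triple pays for the remote call; later checks are a map lookup on the request path.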
