HOW TO COOK APACHE KAFKA WITH CAMEL AND SPRING BOOT
Java EE conference 2016
Ivan Vasyliev, Playtika Core Services Team
AGENDA
- Basics of Apache Kafka
- Apache Camel
- Spring Boot
- Demo
- Q&A
WHY APACHE KAFKA?
http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
- Designed for large scale
- Widely adopted by top tech companies
- Hardened, production-quality product
- Data replication out of the box
FEATURES
- At-most-once and at-least-once delivery guarantees
- Batching for high-throughput cases
- Efficient with DEFAULT settings
EVEN MORE FEATURES
- Mirroring between datacenters
- Connectors to various data warehouses (DWH)
- Complex event processing integrations
HIGH LEVEL VIEW
http://kafka.apache.org/documentation.html#introduction
- Publisher/subscriber and point-to-point models
- Producer: a client that sends messages
- Consumer: a client that receives messages
WHAT IS NOT INCLUDED - JMS
- Not a JMS-compliant server
- No message headers
  - Can use the message key instead
  - Can send header data in the payload
  - Or wait for it: headers are on the roadmap
- No transactions/JTA support
WHAT IS NOT INCLUDED - EXACTLY ONCE GUARANTEE
- No exactly-once guarantee
  - Duplicates can appear because of failures
  - De-duplication is on the roadmap
- De-duplication on the consumer
  - With a Camel EIP, by message ID/body
  - Or the consumer can tolerate duplicates
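The consumer-side de-duplication idea can be sketched in plain Java. This is only an illustration of the idempotent-consumer pattern, not the Camel API: the class name `DedupSketch` and the in-memory `HashSet` are assumptions made for the example, and a real setup would likely need a persistent store of seen IDs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of an idempotent consumer; not the Camel implementation.
class DedupSketch {
    // IDs that were already processed (in memory only, an assumption of this sketch).
    private final Set<String> seenIds = new HashSet<>();

    // Returns true when the message ID is seen for the first time.
    boolean firstTime(String messageId) {
        return seenIds.add(messageId); // Set.add returns false for a duplicate
    }

    // Keeps only the first occurrence of each message ID, preserving order.
    static List<String> deduplicate(List<String> messageIds) {
        DedupSketch filter = new DedupSketch();
        List<String> accepted = new ArrayList<>();
        for (String id : messageIds) {
            if (filter.firstTime(id)) {
                accepted.add(id);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // "m2" is delivered twice, as after a retry; the second copy is skipped.
        System.out.println(deduplicate(Arrays.asList("m1", "m2", "m2", "m3"))); // prints [m1, m2, m3]
    }
}
```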
APACHE KAFKA LANGUAGE
http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
- Topic: represents a stream of messages
  - Contains a set of partitions
- Partition: a subset of the messages in the stream
  - Partitioning is done by message key on the producer
- No “queue” in the dictionary
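Partitioning by message key can be sketched as a hash of the key modulo the number of partitions. The class name `PartitionSketch` and the use of `Arrays.hashCode` over the UTF-8 key bytes are assumptions for illustration; the real Kafka client uses its own hash function, so actual partition numbers will differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of key-based partition selection on the producer side.
class PartitionSketch {
    // Maps a message key to a partition index in [0, numPartitions).
    // Assumption: Arrays.hashCode stands in for the hash used by the real client.
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        return Math.floorMod(Arrays.hashCode(keyBytes), numPartitions);
    }

    public static void main(String[] args) {
        // The same key always lands in the same partition, which keeps its messages ordered.
        System.out.println(partitionFor("user-42", 6) == partitionFor("user-42", 6)); // prints true
    }
}
```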
TOPICS AND PARTITIONS
http://kafka.apache.org/documentation.html#intro_topics
- A partition is the smallest unit of storage in Kafka
- A partition is a data file with messages
- Producers always append to the end of the file
- Consumers scroll/seek over the file
- The consumer offset is persisted (in ZooKeeper or in Kafka)
- Strong ordering guarantees for the consumer within a partition
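A partition as an append-only file that consumers read by offset can be modelled with a simple in-memory list. The class `PartitionLogSketch` and its methods are hypothetical names for this sketch; a real broker stores the log on disk in segment files.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of a single partition.
class PartitionLogSketch {
    private final List<String> messages = new ArrayList<>();

    // Appends to the end of the log and returns the offset of the new message.
    long append(String message) {
        messages.add(message);
        return messages.size() - 1;
    }

    // Reads up to maxMessages starting at the given offset; reading never removes data.
    List<String> read(long offset, int maxMessages) {
        int from = (int) Math.max(0, Math.min(offset, messages.size()));
        int to = Math.min(from + maxMessages, messages.size());
        return new ArrayList<>(messages.subList(from, to));
    }

    public static void main(String[] args) {
        PartitionLogSketch log = new PartitionLogSketch();
        log.append("a");
        log.append("b");
        log.append("c");
        // A consumer that seeks back to offset 1 sees the same messages again.
        System.out.println(log.read(1, 10)); // prints [b, c]
    }
}
```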
QUEUE SEMANTIC IS DONE ON CLIENT
http://kafka.apache.org/documentation.html#intro_consumers
QUEUE
- The consumer offset is persisted per group ID and per partition
- Queue semantics inside a consumer group
- Topic semantics between consumer groups
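The two semantics can be illustrated with a small model: offsets are tracked per group and partition, so every group reads the whole stream independently, while inside one group each partition has exactly one owner. The names in `GroupOffsetsSketch` and the round-robin assignment are assumptions for this sketch; real assignment strategies may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of per-group offsets and partition ownership.
class GroupOffsetsSketch {
    // Committed offset keyed by "groupId/partition"; a missing entry means offset 0.
    private final Map<String, Long> committed = new HashMap<>();

    long position(String groupId, int partition) {
        return committed.getOrDefault(groupId + "/" + partition, 0L);
    }

    void commit(String groupId, int partition, long nextOffset) {
        committed.put(groupId + "/" + partition, nextOffset);
    }

    // Inside a group, each partition is owned by exactly one consumer (round-robin here).
    static int ownerOf(int partition, int consumersInGroup) {
        return Math.floorMod(partition, consumersInGroup);
    }

    public static void main(String[] args) {
        GroupOffsetsSketch offsets = new GroupOffsetsSketch();
        offsets.commit("billing", 0, 5);
        // Another group is unaffected and still starts from the beginning.
        System.out.println(offsets.position("billing", 0)); // prints 5
        System.out.println(offsets.position("audit", 0));   // prints 0
    }
}
```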
CONSUMPTION IS ALL ABOUT OFFSETS
https://hadoopabcd.wordpress.com/2015/04/11/kafka-building-a-real-time-data-pipeline/
- The consumer polls data from the broker
- The consumer offset is sent (committed) to the server
- Auto offset commit enabled
  - Committed by a separate thread, periodically
- Auto offset commit disabled
  - Committed by your code, when a batch of messages has been processed
CONSUMER OFFSET AND AUTO-COMMIT
- With “auto-commit” enabled you can lose messages
  - Step 1: One thread did not finish processing and failed
  - Step 2: The auto-commit thread does not care and commits the offset anyway
- Auto-commit is OK for status heartbeats
- Auto-commit is NOT OK if you need an “at least once” guarantee, e.g. payment processing
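The two steps above can be replayed in a small simulation. It is a simplified model, not the Kafka client: the class `AutoCommitSketch`, the single-batch poll and the assumption that the auto-commit fires before processing finishes are all made up for illustration (they represent the worst-case timing).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical simulation of a consumer that crashes mid-batch and then restarts.
class AutoCommitSketch {
    // Returns the messages processed across the first attempt and the restart.
    static List<String> run(List<String> log, int failAtOffset, boolean autoCommit) {
        List<String> processed = new ArrayList<>();
        int committed = 0;
        if (autoCommit) {
            // Assumption: the background commit fires right after the poll,
            // without knowing whether processing will succeed.
            committed = log.size();
        }
        boolean crashed = false;
        for (int offset = 0; offset < log.size(); offset++) {
            if (offset == failAtOffset) {
                crashed = true; // simulated failure before this message is processed
                break;
            }
            processed.add(log.get(offset));
        }
        if (!autoCommit && !crashed) {
            committed = log.size(); // manual commit only after the whole batch succeeded
        }
        if (crashed) {
            // Restart: consumption resumes from the last committed offset.
            for (int offset = committed; offset < log.size(); offset++) {
                processed.add(log.get(offset));
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList("a", "b", "c");
        System.out.println(run(log, 1, true));  // prints [a]          ("b" and "c" are lost)
        System.out.println(run(log, 1, false)); // prints [a, a, b, c] ("a" is duplicated, nothing is lost)
    }
}
```

The manual-commit run shows the price of “at least once”: nothing is lost, but the restarted consumer sees "a" twice, so the consumer has to de-duplicate or tolerate duplicates.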
DATA REPLICATION
- The leader receives all reads and writes
  - Decides when to commit a message
- A follower syncs messages from the leader
  - Takes over if the leader is down
- The replication controller maintains the leader
- ZooKeeper is used for coordination
  - Leader election
  - Consensus protocol
APACHE KAFKA PRODUCER
- Performs load balancing
- Uses the message key to select a partition
- Finds the appropriate Kafka broker (the leader) for the partition
- Has a few configurable acknowledgement modes
- Can do batching in async mode
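The acknowledgement and batching options are plain producer settings. The sketch below only builds a `Properties` object with standard Kafka producer keys (`bootstrap.servers`, `acks`, `retries`, `linger.ms`); the class name, the method name and the concrete values are assumptions, to be tuned for a real workload.

```java
import java.util.Properties;

// Hypothetical helper that collects producer settings; values are illustrative only.
class ProducerConfigSketch {
    static Properties durableProducerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("acks", "all");    // wait until all in-sync replicas have the message
        props.setProperty("retries", "3");   // retries on failure can produce duplicates
        props.setProperty("linger.ms", "5"); // short wait so that batches can fill up
        return props;
    }

    public static void main(String[] args) {
        // "localhost:9092" is a placeholder broker address.
        System.out.println(durableProducerProps("localhost:9092").getProperty("acks")); // prints all
    }
}
```

Setting `acks` to `0` or `1` instead trades durability for lower latency.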
DELIVERY GUARANTEED
- Durability with ack levels on the producer side
- Data replication between brokers
- No in-memory state, efficient persistence
- Manually committing the offset on the consumer side
ISSUES - OPS
- Ops is not free
- There is ZooKeeper on board
- Easy to set up with Docker/Rancher
- You need to learn the basics to set it up and monitor it
ISSUES – DATA
- Can’t auto-scale existing data
  - Option 1: Add new partitions, they will go to new nodes
  - Option 2: Do it manually, move partitions around
  - Option 3: Wait for it, on roadmap
- Mirroring seems to work in one direction only
- Can’t handle a very large number of topics
WHY APACHE CAMEL?
- Message routing DSL (Java/Scala/Groovy)
- Enterprise Integration Patterns
  - Idempotent consumer (de-duplication)
  - Aggregator
  - …
- Abstractions for testing
  - MockEndpoint
  - Route Advice
APACHE CAMEL
http://camel.apache.org/java-dsl.html
- Lightweight and embeddable
- Spring Boot integration
- Connectors to various message and data sources
SPRING BOOT
- Fat JAR deployment, no JEE container needed
- Autoconfiguration and conditionals
- Codeless usage of Spring Cloud/Netflix projects
GOTCHAS – PRODUCER FASTER THAN CONSUMERS, PRECONDITIONS
- It's not recommended to have lots of partitions
- Each partition is consumed by one consumer thread
- The producer is X times faster than the consumer
GOTCHAS – PRODUCER FASTER THAN CONSUMERS, ACTIONS
- Monitor Kafka lag
  - Lag is the number of messages not yet consumed by the group
- Add an intermediate multiplexing queue
  - See the Camel “seda” component
  - Think carefully, since in-memory state can lead to data loss
- Consider adding more partitions
  - This allows more consumption threads
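Consumer lag for a group can be computed as the sum, over partitions, of the log end offset minus the committed offset. The sketch below assumes both values are already available as maps keyed by partition number; the class `LagSketch` and its inputs are hypothetical, and fetching the real offsets is left to a monitoring tool.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lag calculation for one consumer group.
class LagSketch {
    // Sums (end offset - committed offset) over all partitions; a missing commit counts as 0.
    static long totalLag(Map<Integer, Long> endOffsets, Map<Integer, Long> committedOffsets) {
        long lag = 0;
        for (Map.Entry<Integer, Long> entry : endOffsets.entrySet()) {
            long committed = committedOffsets.getOrDefault(entry.getKey(), 0L);
            lag += Math.max(0L, entry.getValue() - committed);
        }
        return lag;
    }

    public static void main(String[] args) {
        Map<Integer, Long> end = new HashMap<>();
        end.put(0, 100L);
        end.put(1, 50L);
        Map<Integer, Long> committed = new HashMap<>();
        committed.put(0, 90L);
        committed.put(1, 50L);
        System.out.println(totalLag(end, committed)); // prints 10
    }
}
```

A lag that keeps growing over time is the signal that the producer is faster than the consumers.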
GOTCHAS – PRODUCER FASTER THAN CONSUMERS, TOOLS
https://github.com/quantifind/KafkaOffsetMonitor
GOTCHAS – AUTO OFFSET RESET
- When you start a test, you do not receive any messages
  - The producer sent the message before the consumer was up
- Check the auto.offset.reset setting in the unit test
  - Latest (or largest in the old API) can lead to consumption of only new messages
  - Earliest (or smallest in the old API) means “from the beginning”
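For a test consumer, the relevant settings can be collected as below. The keys (`group.id`, `auto.offset.reset`, `enable.auto.commit`) are standard consumer settings of the new API; the class name, method name and values are assumptions for this sketch. Note that `auto.offset.reset` only applies when the group has no committed offset yet, so a fresh group ID per test run is a common way to make it take effect.

```java
import java.util.Properties;

// Hypothetical helper that collects consumer settings for a test.
class TestConsumerConfigSketch {
    static Properties testConsumerProps(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        props.setProperty("auto.offset.reset", "earliest"); // read messages sent before the consumer started
        props.setProperty("enable.auto.commit", "false");   // commit from the test code instead
        return props;
    }

    public static void main(String[] args) {
        // The broker address and group ID are placeholders.
        Properties props = testConsumerProps("localhost:9092", "test-group-1");
        System.out.println(props.getProperty("auto.offset.reset")); // prints earliest
    }
}
```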
GOTCHAS – CLIENT VERSION MIGHT NEED TO MATCH SERVER
- Clients are supposed to be “backward compatible”, but …
- If you see weird things, check the classpath
GOTCHAS – WATCH THE CLASSPATH
- Multiple versions of the Kafka client
- Multiple versions of the Kafka client's dependencies
- Multiple versions of the ZooKeeper client
DEPENDENCY MANAGEMENT
- Use dependency management to force versions and exclusions
- Use the “Maven Helper” IntelliJ plugin to check for issues
https://github.com/krasa/MavenHelper
https://plugins.jetbrains.com/plugin/7179