Shipping YaaS logs with Apache Spark and Kafka
Dogukan Sonmez
Senior Software Engineer @hybris Software
@dogukansonmez
Agenda
- Introduction to YaaS
- Architecture of Logging pipeline
- Technology behind logging pipeline
- Challenges
- Recap
- Q&A
What is YaaS
SAP hybris as a Service (YaaS)
A microservice-based business PaaS
Integrated with hybris and SAP Solutions
Build
Publish
Fast
yaas.io
Architecture of Logging pipeline
Technology behind logging pipeline
Apache Kafka
- High-throughput messaging
- Broker
- Distributed
- Scalable
- Fault tolerant
- Topic
- Partition
- Replicated
- Offset
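To make the topic, partition, and offset terms above concrete, here is a minimal in-memory sketch. It is a toy model, not a Kafka client: the `ToyTopic` class and the CRC32-based partition choice are assumptions for illustration (Kafka's own default partitioner uses a different hash).

```python
from zlib import crc32


class ToyTopic:
    """In-memory stand-in for a partitioned topic (hypothetical, not a Kafka client)."""

    def __init__(self, num_partitions):
        # Each partition is an independent, append-only log.
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        # Records with the same key always land in the same partition.
        # CRC32 is an assumption made here to keep the sketch deterministic.
        partition = crc32(key.encode("utf-8")) % len(self.partitions)
        log = self.partitions[partition]
        log.append(value)
        # The offset is the record's position within its partition.
        return partition, len(log) - 1

    def read(self, partition, offset):
        # Consumers read forward from an offset that they track themselves.
        return self.partitions[partition][offset:]
```

Because ordering and offsets are per partition, adding partitions spreads load across brokers without requiring a global sequence number.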
Technology behind logging pipeline
Apache Spark
- Micro-batching
- RDD
- Streaming
- DAG
- Reliable
- ML
- Scalable
- Graph
- Fast
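The micro-batching idea above can be sketched in plain Python: records are grouped into fixed time windows, and each window is then processed as one small batch. This is only an illustration of the concept, not Spark code; the function names and the sample log lines are hypothetical.

```python
from itertools import groupby


def micro_batches(records, batch_interval):
    """Group (timestamp, value) pairs, already sorted by timestamp, into fixed windows.

    Unlike a real streaming engine, windows with no records are simply skipped.
    """
    by_window = groupby(records, key=lambda r: int(r[0] // batch_interval))
    for window, group in by_window:
        yield window, [value for _, value in group]


def count_by_level(batch):
    """Example per-batch job: count log lines by their leading level token."""
    counts = {}
    for line in batch:
        level = line.split(" ", 1)[0]
        counts[level] = counts.get(level, 0) + 1
    return counts


if __name__ == "__main__":
    logs = [(0.2, "INFO started"), (1.5, "ERROR failed"),
            (1.9, "INFO retry"), (4.1, "INFO done")]
    for window, batch in micro_batches(logs, batch_interval=2.0):
        print(window, count_by_level(batch))
```

Processing a window at a time trades a little latency (at most one batch interval) for the ability to reuse ordinary batch operations on streaming data.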
Big Data pipeline challenges
Reliability of Kafka
BEFORE
- 3 brokers
- 3 ZooKeeper instances
- default.replication.factor=2
- Mainly default configurations
AFTER
- 5 brokers
- 5 ZooKeeper instances
- unclean.leader.election.enable=false
- min.insync.replicas=2
- default.replication.factor=3
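A rough way to read the BEFORE and AFTER settings above is to work out what each tolerates. The sketch below assumes producers write with acks=all and that the BEFORE setup ran with Kafka's default min.insync.replicas of 1; neither is stated on the slide. The function names are hypothetical.

```python
def write_guarantees(replication_factor, min_insync_replicas):
    """Summarise what acks=all writes to one partition can tolerate.

    Assumes producers use acks=all (an assumption, not stated in the talk).
    """
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("need 1 <= min.insync.replicas <= replication factor")
    return {
        # Replicas that can be down while writes are still accepted.
        "replica_losses_tolerated_for_writes": replication_factor - min_insync_replicas,
        # Fewest copies an acknowledged record is guaranteed to have.
        "min_copies_of_acked_write": min_insync_replicas,
    }


def zookeeper_failures_tolerated(ensemble_size):
    """A ZooKeeper ensemble stays available while a strict majority is up."""
    return (ensemble_size - 1) // 2


if __name__ == "__main__":
    print("before:", write_guarantees(2, 1), zookeeper_failures_tolerated(3))
    print("after: ", write_guarantees(3, 2), zookeeper_failures_tolerated(5))
```

Under these assumptions both setups keep accepting writes with one replica down, but only the AFTER setup guarantees that every acknowledged record exists on at least two replicas. Disabling unclean leader election additionally prevents an out-of-sync replica from becoming leader and discarding acknowledged records.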
Big Data pipeline challenges
Spark Streaming Checkpointing
BEFORE
- Spark checkpointing
- All RDDs serialized and stored in HDFS
AFTER
- Custom Kafka checkpointing
  (only the latest offset stored in Kafka)
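The AFTER approach above, persisting only the latest offset rather than serialized RDDs, can be sketched as follows. This is a minimal in-memory illustration: `OffsetStore` and `process_partition` are hypothetical names, and the real pipeline's storage and commit mechanics are not described in the slides.

```python
class OffsetStore:
    """Hypothetical stand-in for wherever the latest offsets are persisted."""

    def __init__(self):
        self._offsets = {}

    def load(self, partition):
        # A partition with no stored offset starts from the beginning.
        return self._offsets.get(partition, 0)

    def save(self, partition, next_offset):
        self._offsets[partition] = next_offset


def process_partition(log, partition, store, handle, max_records):
    """Handle up to max_records from the stored offset, then persist the new offset."""
    start = store.load(partition)
    batch = log[start:start + max_records]
    for record in batch:
        handle(record)
    # Saving only after the whole batch is handled gives at-least-once delivery:
    # a crash before this line means the batch is replayed on restart.
    store.save(partition, start + len(batch))
    return len(batch)
```

The stored state is one integer per partition, so recovery cost does not grow with batch size. The trade-off is that a replayed batch may be handled twice, so the downstream write should be idempotent.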
Big Data pipeline challenges
Elasticsearch indexing big data
BEFORE
- Default mapping
- index.refresh_interval=1s
- indices.memory.index_buffer_size=10%
AFTER
- Custom mapping with disabled norms
- Mapping using simple analyzer
- index.refresh_interval=30s
- indices.memory.index_buffer_size=30%
- spark.streaming.kafka.maxRatePerPartition=10000
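As a sketch of how the AFTER settings above might be expressed, the code below builds an index-creation body and estimates the per-batch record cap implied by the Spark rate limit. Several things are assumptions: the `message` field name, the partition count, and the batch interval are hypothetical, and the mapping uses the typeless syntax of recent Elasticsearch releases, which may differ from the version used in the talk. indices.memory.index_buffer_size is a node-level setting, so it does not appear in the index body.

```python
import json


def index_body(text_field="message"):
    """Index-creation body with a slower refresh and a lean text mapping.

    The field name is hypothetical; adjust the syntax to your Elasticsearch version.
    """
    return {
        "settings": {"index": {"refresh_interval": "30s"}},
        "mappings": {
            "properties": {
                # Simple analyzer and no norms keep per-field indexing cost low.
                text_field: {"type": "text", "analyzer": "simple", "norms": False}
            }
        },
    }


def max_records_per_batch(rate_per_partition, partitions, batch_seconds):
    """Upper bound on records per batch when the per-partition rate limit applies."""
    return rate_per_partition * partitions * batch_seconds


if __name__ == "__main__":
    print(json.dumps(index_body(), indent=2))
    # Partition count and batch interval below are illustrative values only.
    print(max_records_per_batch(10000, partitions=4, batch_seconds=5))
```

Capping the read rate on the Spark side bounds how many documents each batch can push to Elasticsearch, which complements the longer refresh interval and larger indexing buffer.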
Recap
Q&A
https://hackingat.hybris.com