I’ll tell you about
• What is stream processing and why it matters
• What is Apache Kafka
• How Kafka helps stream processing
Stay awake for this part
Stream Processing Paradigm
• Data is generated at its own rate as “Streams”
• We can process as much or as little as we want
• Continuously
• Results are available in real-time
• But nothing waits for specific results
• Time for data availability?
  • More than “few ms”
  • Less than “hours”
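To make the paradigm concrete, here is a minimal sketch in plain Python, not Kafka code. The function name `running_counts` and the sample page-view events are assumptions made up for illustration. It shows a result being updated every time an event arrives, with nothing waiting for the stream to end.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def running_counts(events: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield an updated count each time an event arrives (hypothetical helper)."""
    counts = Counter()
    for event in events:
        counts[event] += 1
        # The result is available right away; we never wait for the stream to end.
        yield event, counts[event]


# Made-up page-view events standing in for a real stream.
clicks = ["home", "cart", "home", "checkout", "home"]
results = list(running_counts(clicks))
```

In a real deployment the list would be replaced by an unbounded source, and the loop would simply never finish.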
This is the world changing bit
• Most of the business is…
• Not urgent enough to require immediate response
• But can’t wait for the next day
• “Streams of events” represents something fundamental
• Same way relational tables are fundamental
But Logs are also a STREAM of events
And Kafka stores those logs
Allowing to read the past and keep getting updates on the future
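The idea of reading the past and then continuing with new events can be sketched with an append-only list and offsets. This is an in-memory toy under stated assumptions, not the Kafka client API: the class `EventLog` and its methods `append` and `read_from` are hypothetical names.

```python
from typing import Any, List, Tuple


class EventLog:
    """Toy append-only log; each record gets a sequential offset (hypothetical class)."""

    def __init__(self) -> None:
        self._records: List[Any] = []

    def append(self, record: Any) -> int:
        """Store a record and return the offset it was written at."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset: int) -> List[Tuple[int, Any]]:
        """Return (offset, record) pairs starting at a non-negative offset."""
        return list(enumerate(self._records[offset:], start=offset))


log = EventLog()
log.append("user-1 logged in")
log.append("user-2 logged in")

# A reader that starts at offset 0 sees everything that already happened.
past = log.read_from(0)
next_offset = past[-1][0] + 1

# New events keep arriving; the same reader resumes where it left off.
log.append("user-1 logged out")
updates = log.read_from(next_offset)
```

Because the reader only remembers an offset, a second reader can start from 0 later and replay the same history independently.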
Method 2: The Stream Processing Frameworks
• Storm
• Spark
• Flink
• Samza
• Apex
• NiFi
• StreamBase
• InfoSphere Streams
• Google Dataflow (AKA Beam)
• I can go on for 5 more pages…
What do I mean by too complex?
[Architecture diagram. Components: Clients (Swipe here!), Web App, Flume Agents, Kafka, Hadoop Cluster I, Hadoop Cluster II (Storage, Processing), Spark Streaming, HBase / Memory, Local Cache, HDFS, Hive/Impala, Map/Reduce, Spark, Solr, Search, HDFS Event Sink, Solr Sink. Flow labels: Fetching & Updating Profiles; Adjusting NRT Stats; Batch Time Adjustments; Automated & Manual Analytical Adjustments and Pattern Detection; Automated & Manual Review of NRT Changes and Counters.]
Why so many moving parts?
We needed…
• HBase to handle complex state
• Spark requires HDFS
• Ingest layer
• Batch layer to handle re-calculations
No Framework
• It is just a library that does transformations
• We can add languages on top
• Kafka does everything we needed the framework to do
• You don’t need “framework” to run queries, why do you need it to run queries continuously?
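As a rough illustration of "just a library that does transformations", the sketch below chains two ordinary functions inside a normal Python program, with no cluster or job scheduler involved. It is not the Kafka Streams API. The function names `parse` and `large_only`, the record format, and the threshold are all assumptions invented for this example.

```python
from typing import Iterable, Iterator, Tuple

Payment = Tuple[str, float]


def parse(lines: Iterable[str]) -> Iterator[Payment]:
    """Map step: turn 'user,amount' text records into typed tuples."""
    for line in lines:
        user, amount = line.split(",")
        yield user, float(amount)


def large_only(payments: Iterable[Payment], threshold: float) -> Iterator[Payment]:
    """Filter step: keep only payments at or above the threshold."""
    return (p for p in payments if p[1] >= threshold)


# Made-up input; in practice this would be an unbounded stream of records.
raw = ["alice,12.50", "bob,250.00", "carol,99.99", "dave,1000.00"]
flagged = list(large_only(parse(raw), threshold=100.0))
```

The transformations are lazy generators, so the same code works whether the input is a short list or a source that never ends.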
We can convert tables to streams and back:
Stream -> Apply -> Table
Table -> Change Capture -> Stream
This is called Table-Stream Duality.
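Both directions of the duality can be sketched with a dictionary standing in for the table and a list of (key, value) changes standing in for the stream. The helper names `apply_stream` and `capture_changes`, and the convention that a value of `None` means "delete this key", are assumptions for this sketch rather than anything the slides specify.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Change = Tuple[str, Optional[int]]


def apply_stream(changes: Iterable[Change]) -> Dict[str, int]:
    """Stream -> Apply -> Table: replay changes in order, latest value per key wins."""
    table: Dict[str, int] = {}
    for key, value in changes:
        if value is None:
            table.pop(key, None)  # assumed convention: None marks a delete
        else:
            table[key] = value
    return table


def capture_changes(old: Dict[str, int], new: Dict[str, int]) -> List[Change]:
    """Table -> Change Capture -> Stream: emit one change per key that differs."""
    changes: List[Change] = []
    for key in sorted(old.keys() | new.keys()):
        if key not in new:
            changes.append((key, None))
        elif key not in old or old[key] != new[key]:
            changes.append((key, new[key]))
    return changes


stream: List[Change] = [("alice", 10), ("bob", 5), ("alice", 7), ("bob", None)]
table = apply_stream(stream)

updated = {"alice": 7, "carol": 3}
changes = capture_changes(table, updated)

# Replaying the old table's rows plus the captured changes rebuilds the new table.
rebuilt = apply_stream(list(table.items()) + changes)
```

The round trip at the end is the point: a table is the result of applying a stream, and the differences between two versions of a table form a stream again.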