flink-forward
Ufuk Celebi (@iamuce)
The Stream Processor as a Database
The (Classic) Use Case: Realtime Counts and Aggregates
(Real-)Time Series Statistics
[Diagram: Stream of Events -> Real-time Statistics]
The Architecture
[Diagram: collect -> message queue -> analyze -> serve & store]
The Flink Job
case class Impressions(id: String, impressions: Long)
val events: DataStream[Event] = env.addSource(new FlinkKafkaConsumer09(…))
val impressions: DataStream[Impressions] = events
  .filter(evt => evt.isImpression)
  .map(evt => Impressions(evt.id, evt.numImpressions))
val counts: DataStream[Impressions] = impressions
  .keyBy("id").timeWindow(Time.hours(1)).sum("impressions")
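To make the windowed aggregation concrete, here is a minimal plain-Scala sketch of what keying by id, assigning one-hour tumbling windows, and summing impressions computes. It does not use Flink. The `TimedImpression` type, its `timestampMs` field, and the `HourlyCounts` object are hypothetical names introduced only for this illustration, and timestamps are assumed to be non-negative epoch milliseconds.

```scala
// Plain-Scala model of keyBy("id") + one-hour tumbling window + sum("impressions").
// All names here are hypothetical; this is not Flink code.
object HourlyCounts {
  final case class TimedImpression(id: String, timestampMs: Long, impressions: Long)

  val HourMs: Long = 60L * 60L * 1000L

  // Start of the one-hour tumbling window that contains the timestamp
  // (assumes a non-negative timestamp).
  def windowStart(timestampMs: Long): Long = timestampMs - (timestampMs % HourMs)

  // Sum impressions per (id, window start).
  def aggregate(events: Seq[TimedImpression]): Map[(String, Long), Long] =
    events
      .groupBy(e => (e.id, windowStart(e.timestampMs)))
      .map { case (key, group) => key -> group.map(_.impressions).sum }
}
```

Unlike this batch-style model, the streaming job updates each per-key sum incrementally as events arrive, which is the state the rest of the talk is about.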
The Flink Job
[Diagram: two parallel pipelines, each Kafka Source -> filter() -> map() -> keyBy() -> window()/sum() -> Sink, each with its own State]
Putting it all together
Periodically (every second) flush new aggregates to Redis
The Bottleneck
Writes to the key/value store take too long
Queryable State
[Diagram label on the writes to the key/value store] Optional, and only at the end of windows
Queryable State: Application View
[Diagram: the Application reads from two places]
- Query Service: realtime results (current time windows)
- Database: older results (past time windows)
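The split in the application view can be sketched as a small routing function: requests for the current window go to the live state, requests for past windows go to the database. This is only an illustration in plain Scala under stated assumptions. `ApplicationView`, `QueryService`, `liveState`, and `database` are hypothetical names, both backends are modeled as plain functions, and windows are assumed to be one hour long as in the job above.

```scala
// Hypothetical sketch of the application-side read path; not Flink's API.
object ApplicationView {
  val HourMs: Long = 60L * 60L * 1000L

  // liveState stands in for queryable state inside the stream processor,
  // database for the external store that holds results of closed windows.
  final class QueryService(
      liveState: (String, Long) => Option[Long],
      database: (String, Long) => Option[Long]) {

    // Route by window: the current window is served from live state,
    // any earlier window from the database.
    def impressions(id: String, windowStartMs: Long, nowMs: Long): Option[Long] = {
      val currentWindowStart = nowMs - (nowMs % HourMs)
      if (windowStartMs >= currentWindowStart) liveState(id, windowStartMs)
      else database(id, windowStartMs)
    }
  }
}
```

With this layout the database only needs to receive one write per key when a window closes, rather than a write for every update.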
Queryable State Enablers
- Flink has state as a first-class citizen
- State is fault tolerant (exactly-once semantics)
- State is partitioned (sharded) together with the operators that create/update it
- State is continuous (not mini-batched)
- State is scalable
State in Flink
[Diagram: Source/filter()/map() -> window()/sum(), with a state index (e.g., RocksDB)]
- Events are persistent and ordered (per partition/key) in the message queue (e.g., Apache Kafka)
- Events flow without replication or synchronous writes
State in Flink
[Diagram: Source/filter()/map() -> window()/sum()]
Trigger checkpoint: inject checkpoint barrier
State in Flink
[Diagram: Source/filter()/map() -> window()/sum()]
Take state snapshot: trigger state copy-on-write
State in Flink
[Diagram: Source/filter()/map() -> window()/sum()]
Persist state snapshots: durably persist snapshots asynchronously
Processing pipeline continues
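The checkpoint steps on these slides (barrier arrives, state is snapshotted copy-on-write, processing continues) can be illustrated with a toy operator. This is a sketch under assumptions, not how Flink implements checkpoints: an immutable Scala `Map` stands in for copy-on-write state, an in-memory map stands in for durable storage, and the asynchronous write is left out. `CheckpointSketch`, `SumOperator`, `Record`, and `Barrier` are hypothetical names.

```scala
// Toy model of barrier-triggered snapshots; all names are hypothetical.
object CheckpointSketch {
  sealed trait StreamElement
  final case class Record(id: String, impressions: Long) extends StreamElement
  final case class Barrier(checkpointId: Long) extends StreamElement

  final class SumOperator {
    // Immutable map: an update builds a new map and leaves old references untouched.
    private var state: Map[String, Long] = Map.empty
    // Snapshots by checkpoint id (stands in for durable storage).
    private var snapshots: Map[Long, Map[String, Long]] = Map.empty

    def process(element: StreamElement): Unit = element match {
      case Record(id, n) =>
        state = state.updated(id, state.getOrElse(id, 0L) + n)
      case Barrier(checkpointId) =>
        // The snapshot is a reference to the current immutable map; no data is copied here.
        snapshots = snapshots.updated(checkpointId, state)
    }

    def current: Map[String, Long] = state
    def snapshot(checkpointId: Long): Option[Map[String, Long]] = snapshots.get(checkpointId)
  }
}
```

Records processed after the barrier change only the live state, so the snapshot taken at the barrier stays consistent while the pipeline keeps running.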
Queryable State: Implementation
[Diagram components]
- Query Client
- JobManager: ExecutionGraph, State Location Server (deploy / status)
- TaskManager (two shown): State Registry, window()/sum() with local state (register)
Query: /job/state-name/key
(1) Get location of "key-partition" of "job"
(2) Look up location
(3) Respond location
(4) Query state-name and key
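The four-step lookup above can be sketched as follows. This is a simplified in-process model, not Flink's actual classes or network protocol. `QuerySketch`, `LocationServer`, `TaskManager`, `QueryClient`, and the hash-based `partitionOf` rule are assumptions made for illustration only.

```scala
// In-process model of the query path; all names and the partitioning rule are hypothetical.
object QuerySketch {
  // Assumed partitioning rule: hash the key into a fixed number of partitions.
  def partitionOf(key: String, numPartitions: Int): Int =
    java.lang.Math.floorMod(key.hashCode, numPartitions)

  // Stands in for a TaskManager's state registry: state name -> local key/value state.
  final class TaskManager(val address: String, registry: Map[String, Map[String, Long]]) {
    def query(stateName: String, key: String): Option[Long] =
      registry.get(stateName).flatMap(_.get(key))
  }

  // Stands in for the JobManager's location lookup: (job, partition) -> TaskManager.
  final class LocationServer(locations: Map[(String, Int), TaskManager]) {
    def locate(job: String, partition: Int): Option[TaskManager] =
      locations.get((job, partition))
  }

  final class QueryClient(server: LocationServer, numPartitions: Int) {
    // Steps (1)-(3): resolve which TaskManager holds the key's partition.
    // Step (4): query the state name and key on that TaskManager.
    def query(job: String, stateName: String, key: String): Option[Long] =
      server.locate(job, partitionOf(key, numPartitions)).flatMap(_.query(stateName, key))
  }
}
```

A real client would likely cache the resolved location so that repeated queries for the same partition skip steps (1) to (3); that optimization is not modeled here.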
Queryable State Performance
Conclusion
Takeaways
- Streaming applications are often not bound by the stream processor itself. Cross-system interaction is frequently the biggest bottleneck
- Queryable state mitigates a big bottleneck: communication with external key/value stores to publish realtime results
- Apache Flink's sophisticated support for state makes this possible
Takeaways: Performance of Queryable State
- Data persistence is fast with logs
  - Append only, and streaming replication
- Computed state is fast with local data structures and no synchronous replication
- Flink's checkpoint method makes computed state persistent with low overhead
Questions?
- eMail: [email protected]
- Twitter: @iamuce
- Code/Demo: https://github.com/dataArtisans/flink-queryable_state_demo
Appendix
Flink Runtime + APIs
[Diagram: layered stack]
- APIs: Table API & Stream SQL, DataStream API, ProcessFunction API
- Runtime: Distributed Streaming Data Flow
- Building Blocks: Streams, Time, State
Apache Flink Architecture Review