Real-Time Data Stream Computing with Spark Streaming on Azure 左继红
ACP-B205
Real-time computing on Azure
Spark Streaming concepts and features
EventHub example:
val stream = EventHubUtils.createStream(ssc, eventHubName, partitionNum, consumerGroupName)
val dataset: RDD[(Int, String)] = …
val metricsDS: DStream[(Int, SensorMetrics)] = stream.window(Seconds(3), Seconds(2))
val joinedDS: DStream[(Int, (SensorMetrics, String))] = metricsDS.transform(rdd => rdd.join(dataset))
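The window(Seconds(3), Seconds(2)) call above groups the last three seconds of data and emits a new window every two seconds. The following is a minimal sketch of that grouping on plain Scala collections, not Spark code. It assumes a one-second batch interval, which the slides do not state, and the names WindowSketch and slidingWindows are hypothetical.

```scala
object WindowSketch {
  // Simplified model (assumption): batches(i) holds the records of the
  // i-th one-second batch. A window closes after every `slide` batches and
  // contains the records of the most recent `windowLen` batches.
  def slidingWindows[A](batches: Vector[Seq[A]], windowLen: Int, slide: Int): Vector[Seq[A]] = {
    require(windowLen > 0 && slide > 0, "window length and slide must be positive")
    (slide to batches.length by slide).toVector.map { end =>
      // The first windows may be shorter because earlier batches do not exist.
      val start = math.max(0, end - windowLen)
      batches.slice(start, end).flatten
    }
  }
}
```

With a window of three batches and a slide of two, consecutive windows share one batch, so a record can be counted in two windows.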
val computeMeanFunc = (values: Seq[SensorMetrics], state: Option[SensorState]) => {
  val back_ax_vals = values.map(_.getSensorReading("back").get.ax)
  val back_ax_mean = back_ax_vals.reduce(_+_) / values.size
  val back_ax_dev = Math.pow(back_ax_vals.map(x => Math.pow(x-back_ax_mean, 2)).reduce(_+_) / values.size, 0.5)
  ...
}
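The statistics in computeMeanFunc are the mean and the population standard deviation of the readings in one window. The following is a minimal standalone sketch of the same arithmetic on plain Double values. The names MeanDevSketch and meanAndDev are hypothetical, and the empty-window guard is an addition, since reduce(_+_) throws on an empty sequence.

```scala
object MeanDevSketch {
  // Mean and population standard deviation of one window of readings.
  // Returns None for an empty window instead of throwing.
  def meanAndDev(values: Seq[Double]): Option[(Double, Double)] =
    if (values.isEmpty) None
    else {
      val mean = values.sum / values.size
      // Average squared distance from the mean, divided by N (not N - 1).
      val variance = values.map(x => math.pow(x - mean, 2)).sum / values.size
      Some((mean, math.sqrt(variance)))
    }
}
```

For example, MeanDevSketch.meanAndDev(Seq(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0)) yields a mean of 5.0 and a deviation of 2.0.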
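EventHub integration
Parallel structure that avoids resource contention
Events can be retained for multiple days and read repeatedly
Performance is controlled through Throughput Units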
EventData fields: Offset, Sequence number, Body, User properties, System properties
Event Hub: Partition1, Partition2, Partition3, Partition4
Events are stored in the order in which they are received
Offset: byte offset
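One way to picture a partition is as an append-only log in which each event records the byte position where it starts. The following is a minimal sketch of that idea, not the EventHub data model. Event and Partition are hypothetical types, the body is a String for simplicity, and the sketch assumes the offset advances by the body size only, which the real service need not do.

```scala
object PartitionLogSketch {
  // Hypothetical stand-in for EventData, reduced to three of its fields.
  final case class Event(offset: Long, sequenceNumber: Long, body: String)

  final case class Partition(events: Vector[Event] = Vector.empty, nextOffset: Long = 0L) {
    // Append in arrival order; the offset is the byte position of the event start.
    def append(body: String): Partition = {
      val event = Event(nextOffset, events.length.toLong, body)
      copy(events = events :+ event, nextOffset = nextOffset + body.getBytes("UTF-8").length)
    }

    // Return the events stored after the given offset. Reading never removes
    // events, so the same range can be read again later.
    def readAfter(offset: Long): Vector[Event] = events.filter(_.offset > offset)
  }
}
```

Because reads are non-destructive, a consumer that remembers only an offset can resume or replay from that point.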
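Each EventHubReceiver corresponds to one EventHub partition
Uses the EventHubs Java client
Under the hood, the Apache Qpid library accesses EventHub over the AMQP protocol
Data is stored durably in EventHub
ResilientEventHubReceiver recovers automatically
Offsets are checkpointed periodically
Metadata and RDD data are checkpointed periodically
Reliable Receiver: sends an acknowledgment to the source once data has been successfully received and reliably stored
Unreliable Receiver: does not send acknowledgments to the source
The unreliable receiver ensures reliable data reception through offset checkpointing
Offsets are stored in Azure Blob Storage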
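Offset checkpointing can be sketched as follows: the receiver periodically saves the offset of the last event it handled, and after a restart it resumes from the saved offset. This is a minimal sketch under stated assumptions, not the receiver's actual implementation. CheckpointStore is a hypothetical in-memory stand-in for Azure Blob Storage, checkpoints are taken every few events rather than on a timer, and failAfter is a hypothetical parameter that simulates a crash.

```scala
import scala.collection.mutable

object OffsetCheckpointSketch {
  // In-memory stand-in (assumption) for the durable offset store.
  final class CheckpointStore {
    private val saved = mutable.Map.empty[Int, Long]
    def save(partition: Int, offset: Long): Unit = saved.update(partition, offset)
    def load(partition: Int): Option[Long] = saved.get(partition)
  }

  // log holds (offset, body) pairs in arrival order. The receiver handles the
  // events after the last checkpoint and saves a checkpoint every `interval`
  // events. failAfter = Some(n) stops after n events to simulate a crash.
  def receive(partition: Int, log: Vector[(Long, String)], store: CheckpointStore,
              interval: Int, failAfter: Option[Int] = None): Vector[String] = {
    val start = store.load(partition)
    // With no checkpoint yet, start.forall is true and the whole log is pending.
    val pending = log.filter { case (offset, _) => start.forall(offset > _) }
    val toProcess = failAfter.fold(pending)(n => pending.take(n))
    toProcess.zipWithIndex.map { case ((offset, body), i) =>
      if ((i + 1) % interval == 0) store.save(partition, offset)
      body
    }
  }
}
```

Events handled after the last checkpoint but before a crash are delivered again on restart, so this scheme gives at-least-once rather than exactly-once delivery.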
Spark cluster deployment on Azure
Demo: analyzing motion signals with Spark Streaming
Comparison of real-time analytics tools on Azure
Post-session reminders
https://channel9.msdn.com/Events/Ignite/Microsoft-Ignite-China-2015
http://aka.ms/IgniteChina2015