Future of Apache StormHadoop Summit 2016, Melbourne
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
About usArun Iyer, Hortonworks
Apache Storm Committer/[email protected]
Satish Duggana, HortonworksApache Storm Committer/[email protected]
3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Apache Storm 1.0Maturity and Improved Performance
Release Date: April 12, 2016
5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Distributed Cache API
6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Distributed Cache API
Topology resources– Dictionaries, ML Models, Geolocation Data, etc.
Typically packaged in topology jar– Fine for small files– Large files negatively impact topology startup time– Immutable: Changes require repackaging and deployment
7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Distributed Cache API
Allows sharing of files (BLOBs) among topologies
Files can be updated from the command line
Allows for files from several KB to several GB in size
Files can change over the lifetime of the topology
Allows for compression (e.g. zip, tar, gzip)
8 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Distributed Cache API
Two implementations: LocalFsBlobStore and HdfsBlobStore Local implementation supports Replication Factor (not needed for HDFS-backed
implementation) Both support ACLs
9 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Distributed Cache API
Creating a blob:
– storm blobstore create --file dict.txt --acl o::rwa --repl-fctr 2 key1
Making it available to a topology:
– storm jar topo.jar my.topo.Class test_topo -c topology.blobstore.map=‘{"key1":{"localname":"dict.txt", "uncompress":"false"}}'
10 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Highly Available Nimbus
11 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Nimbus
Nimbus HA
ZooKeeper
Supervisor
Worker*
Supervisor
Worker*
Supervisor
Worker*
12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Nimbus
Leader
Nimbus
Nimbus
Nimbus HA
ZooKeeper
Supervisor
Worker*
Supervisor
Worker*
Supervisor
Worker*
13 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Nimbus
Leader
Nimbus
Nimbus
Nimbus HA
ZooKeeper
Supervisor
Worker*
Supervisor
Worker*
Supervisor
Worker*
Leader election
14 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Nimbus
Nimbus
Leader
Nimbus
Nimbus HA
ZooKeeper
Supervisor
Worker*
Supervisor
Worker*
Supervisor
Worker*
15 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Nimbus
Nimbus
Leader
Nimbus
Nimbus HA
ZooKeeper
Supervisor
Worker*
Supervisor
Worker*
Supervisor
Worker*
16 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Nimbus HA
Increase overall availability of Nimbus Nimbus hosts can join/leave at any time Leverages Distributed Cache API Topology JAR, Config, and Serialized Topology uploaded to Distributed Cache Replication guarantees availability of all files
17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
PacemakerHeartbeat Server
18 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Pacemaker
Replaces Zookeeper for Heartbeats In-Memory key-value store Allows Scaling to 2k-3k+ Nodes Secure: Kerberos/Digest Authentication
19 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Pacemaker
Compared to Zookeeper– Less Memory/CPU– No Disk– No overhead of maintaining consistency
20 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Automatic Back Pressure
21 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Automatic Back Pressure
In previous Storm versions, the only way to throttle topologies was to enable ACKing and set topology.spout.max.pending.
If you don’t require at-least-once guarantees, this imposed a significant performance penalty.
22 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Automatic Back Pressure
High/Low Watermarks (expressed as % of task’s buffer size) Back Pressure thread monitors task’s buffers If High Watermark reached, slow down Spouts If Low Watermark reached, stop throttling All Spouts Supported
23 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Performance
24 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Up to 16x faster throughputRealistically 3x -- Highly dependent on use case and fault tolerance settings
25 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
> 60% Latency Reduction
26 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Resource Aware Scheduler(RAS)
27 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Resource Aware Scheduler
Specify the resource requirements (Memory/CPU) for individual topology components (Spouts/Bolts)
Memory: On-Heap / Off-Heap (if off-heap is used) CPU: Point system based on number of cores Resource requirements are per component instance (parallelism
matters)
28 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Resource Aware Scheduler
CPU and Memory availability described in storm.yaml on each supervisor node. E.g.:
supervisor.memory.capacity.mb: 3072.0supervisor.cpu.capacity: 400.0
Convention for CPU capacity is to use 100 for each CPU core
29 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Resource Aware Scheduler
SpoutDeclarer spout = builder.setSpout("sp1", new TestSpout(), 10);//set cpu requirementspout.setCPULoad(20);//set onheap and offheap memory requirementspout.setMemoryLoad(64, 16);
BoltDeclarer bolt1 = builder.setBolt("b1", new MyBolt(), 3).shuffleGrouping("sp1");//sets cpu requirement. Not neccessary to set both CPU and memory.//For requirements not set, a default value will be usedbolt1.setCPULoad(15);
BoltDeclarer bolt2 = builder.setBolt("b2", new MyBolt(), 2).shuffleGrouping("b1");bolt2.setMemoryLoad(100);
30 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Native windowing support
31 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Native windowing support
Fundamental abstraction for processing continuous streams of events
Breaks the stream of events into sub-sets
The sub-sets can then be aggregated, joined or analyzed.
Why is it important?
32 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Native windowing support
Trending items– The top trending search queries in the last day
– Top tweets in the last 1 hour
Complex event processing– Windowing is fundamental to gain meaningful insights from a stream of events
– E.g. If same credit card is used from geographically distributed locations within the last one hour, it could be a fraud.
Some use cases
33 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Native windowing support
Sliding Window
time0 5 10 15
time0 5 10 15
Tumbling Window
Windows do not overlap
Windows can overlap
34 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Native windowing supportpublic class WindowMaxBolt extends BaseWindowedBolt { private OutputCollector collector; public void prepare(Map conf, TopologyContext context, OutputCollector collector) { this.collector = collector; }
public void execute(TupleWindow inputWindow) { int max = Integer.MIN_VALUE; for (Tuple tuple: inputWindow.get()) { max = Math.Max(max, tuple.getIntegerByField(”val")); } collector.emit(new Values(max)); }
public static void main(String[] args) { // ... builder.setBolt("slidingMax", new WindowMaxBolt().withWindow(Duration.minutes(10), Duration.minutes(2)));
// submit topology StormSubmitter.submitTopologyWithProgressBar(args[0], conf, builder.createTopology()); }}
35 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Native windowing supportpublic class WindowMaxBolt extends BaseWindowedBolt { private OutputCollector collector; public void prepare(Map conf, TopologyContext context, OutputCollector collector) { this.collector = collector; }
public void execute(TupleWindow inputWindow) { int max = Integer.MIN_VALUE; for (Tuple tuple: inputWindow.get()) { max = Math.Max(max, tuple.getIntegerByField(”val")); } collector.emit(new Values(max)); }
public static void main(String[] args) { // ... builder.setBolt("slidingMax", new WindowMaxBolt().withWindow(Duration.minutes(10), Duration.minutes(2)));
// submit topology StormSubmitter.submitTopologyWithProgressBar(args[0], conf, builder.createTopology()); }}
triggered when the window activates based on the window configuration
36 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Native windowing supportpublic class WindowMaxBolt extends BaseWindowedBolt { private OutputCollector collector; public void prepare(Map conf, TopologyContext context, OutputCollector collector) { this.collector = collector; }
public void execute(TupleWindow inputWindow) { int max = Integer.MIN_VALUE; for (Tuple tuple: inputWindow.get()) { max = Math.Max(max, tuple.getIntegerByField(”val")); } collector.emit(new Values(max)); }
public static void main(String[] args) { // ... builder.setBolt("slidingMax", new WindowMaxBolt().withWindow(Duration.minutes(10), Duration.minutes(2)));
// submit topology StormSubmitter.submitTopologyWithProgressBar(args[0], conf, builder.createTopology()); }}
the window configuration
37 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Native windowing supportTrident
topology.newStream("spout", spout) .window( SlidingDurationWindow.of(Duration.minutes(10), Duration.minutes(2)), // window configuration new InMemoryWindowsStoreFactory(), // window store factory new Fields("val"), // input field new Min("val"), // aggregate function new Fields("max") // output field );
38 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Native windowing support
Processing time Event time
– e.g. Processing events out of a log file based on when the event actually happened– Implemented based on watermarks
Out of order and late events– e.g. Mobile gaming data where the user goes offline and data arrives after sometime
Windowing considerations
6:026:01 6:05 6:08
7:017:00
6:10 6:12
7:02
6:15
Processing time
6:09 6:126:08
watermark watermark
6:10
late event
39 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
State ManagementStateful Bolts with Automatic Checkpointing
40 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
State Management
Allows storm bolts to store and retrieve the state of its computation
The framework automatically and periodically snapshots the state of the bolts
In-memory and Redis based state stores
Implementation based on– Asynchronous Barrier Snapshotting (ABS) algorithm [1]– Chandy-Lamport Algorithm [2]
1. http://arxiv.org/abs/1506.086032. http://research.microsoft.com/en-us/um/people/lamport/pubs/chandy.pdf
41 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
State Managementpublic class WordCountBolt extends BaseStatefulBolt<KeyValueState> {
private KeyValueState wordCounts; private OutputCollector collector;
public void prepare(Map conf, TopologyContext context, OutputCollector collector) { this.collector = collector; }
public void initState(KeyValueState state) { this.wordCounts = state; }
public void execute(Tuple tuple) { String word = tuple.getString(0); Integer count = (Integer) wordCounts.get(word, 0); count++; wordCounts.put(word, count); collector.emit(new Values(word, count)); }}
42 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
State Managementpublic class WordCountBolt extends BaseStatefulBolt<KeyValueState> {
private KeyValueState wordCounts; private OutputCollector collector;
public void prepare(Map conf, TopologyContext context, OutputCollector collector) { this.collector = collector; }
public void initState(KeyValueState state) { this.wordCounts = state; }
public void execute(Tuple tuple) { String word = tuple.getString(0); Integer count = (Integer) wordCounts.get(word, 0); count++; wordCounts.put(word, count); collector.emit(new Values(word, count)); }}
Initialize state
43 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
State Managementpublic class WordCountBolt extends BaseStatefulBolt<KeyValueState> {
private KeyValueState wordCounts; private OutputCollector collector;
public void prepare(Map conf, TopologyContext context, OutputCollector collector) { this.collector = collector; }
public void initState(KeyValueState state) { this.wordCounts = state; }
public void execute(Tuple tuple) { String word = tuple.getString(0); Integer count = (Integer) wordCounts.get(word, 0); count++; wordCounts.put(word, count); collector.emit(new Values(word, count)); }}
Update state
44 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
State Management
Checkpointing/Snapshotting: What you see
Spout Stateful Bolt1 Stateful Bolt2Bolt1
execute/update state execute/update stateexecute
45 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
State Management
Checkpointing/Snapshotting: What you get
Spout
Checkpoint Spout
Stateful Bolt1 Stateful Bolt2Bolt1
State Store
ACKER
ACK
ACKACK
chkpt chkpt
chkpt
46 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Streaming SQL(currently Beta/WIP)
47 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Streaming SQL
SQL is a well adopted standard– available in several projects like Drill, Hive, Phoenix, Spark
Faster development cycle Opportunity to unify batch and stream data analytics
Benefits
SELECT word, COUNT(*) AS totalFROM wordsGROUP BY word
48 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Streaming SQL
Leverages Apache calcite Compiles SQL statements into Trident topologies Usage:
– $ bin/storm sql <sql-file> <topo-name>
Storm implementation
49 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Streaming SQL
Example
CREATE EXTERNAL TABLE ORDERS (ID INT PRIMARY KEY, UNIT_PRICE INT, QUANTITY INT) LOCATION 'kafka://localhost:2181/brokers?topic=orders’
CREATE EXTERNAL TABLE LARGE_ORDERS (ID INT PRIMARY KEY, TOTAL INT) LOCATION 'kafka://localhost:2181/brokers?topic=large_orders’
INSERT INTO LARGE_ORDERSSELECT ID, UNIT_PRICE * QUANTITY AS TOTAL FROM ORDERS WHERE UNIT_PRICE * QUANTITY > 50
50 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Streaming SQL
Storm SQL workflow
51 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Streaming SQL
Storm SQL workflow
52 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Storm Usability ImprovementsEnhanced Debugging and Monitoring of Topologies
53 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Storm usability improvements
./bin/storm set_log_level [topology name]-l [logger name]=[LEVEL]:[TIMEOUT]
Dynamic Log Levels
Storm CLI:
Storm UI:
54 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Storm usability improvements
No more debug bolts or Trident functions Click on the “Events” link to view the sample log.
Tuple Sampling
55 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Storm usability improvements
Distributed Log search– Search across all topology logs, even archived (ZIP) logs.
Dynamic worker profiling– Heap dumps, JStack output, Profiler recordings
Restart workers from UI Supervisor health checks
56 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Integrations
New Cassandra Solr Elastic Search MQTT
Improvements Kafka HDFS spout Hbase Hive
57 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
What’s next for Storm?2.0 and Beyond
58 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Clojure to JavaBroadening the contributor base
Alibaba JStorm Contribution
59 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Apache Beam (incubating) Integration Unified API for dealing with
bounded/unbounded data sources (i.e. batch/streaming)
One API. Multiple implementations (execution engines). Called “Runners” in Beamspeak.
60 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Apache Storm 2.0 and beyond
Windowing enhancements Unified Stream API Revamped UI Metrics improvements
and many more…
61 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Q&AThank You