61
Future of Apache Storm Hadoop Summit 2016, Melbourne

Future of Apache Storm

Embed Size (px)

Citation preview

Page 1: Future of Apache Storm

Future of Apache StormHadoop Summit 2016, Melbourne

Page 2: Future of Apache Storm

2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

About usArun Iyer, Hortonworks

Apache Storm Committer/[email protected]

Satish Duggana, HortonworksApache Storm Committer/[email protected]

Page 3: Future of Apache Storm

3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Page 4: Future of Apache Storm

4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Apache Storm 1.0Maturity and Improved Performance

Release Date: April 12, 2016

Page 5: Future of Apache Storm

5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Distributed Cache API

Page 6: Future of Apache Storm

6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Distributed Cache API

Topology resources– Dictionaries, ML Models, Geolocation Data, etc.

Typically packaged in topology jar– Fine for small files– Large files negatively impact topology startup time– Immutable: Changes require repackaging and deployment

Page 7: Future of Apache Storm

7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Distributed Cache API

Allows sharing of files (BLOBs) among topologies

Files can be updated from the command line

Allows for files from several KB to several GB in size

Files can change over the lifetime of the topology

Allows for compression (e.g. zip, tar, gzip)

Page 8: Future of Apache Storm

8 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Distributed Cache API

Two implementations: LocalFsBlobStore and HdfsBlobStore Local implementation supports Replication Factor (not needed for HDFS-backed

implementation) Both support ACLs

Page 9: Future of Apache Storm

9 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Distributed Cache API

Creating a blob:

– storm blobstore create --file dict.txt --acl o::rwa --repl-fctr 2 key1

Making it available to a topology:

– storm jar topo.jar my.topo.Class test_topo -c topology.blobstore.map=‘{"key1":{"localname":"dict.txt", "uncompress":"false"}}'

Page 10: Future of Apache Storm

10 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Highly Available Nimbus

Page 11: Future of Apache Storm

11 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Nimbus

Nimbus HA

ZooKeeper

Supervisor

Worker*

Supervisor

Worker*

Supervisor

Worker*

Page 12: Future of Apache Storm

12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Nimbus

Leader

Nimbus

Nimbus

Nimbus HA

ZooKeeper

Supervisor

Worker*

Supervisor

Worker*

Supervisor

Worker*

Page 13: Future of Apache Storm

13 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Nimbus

Leader

Nimbus

Nimbus

Nimbus HA

ZooKeeper

Supervisor

Worker*

Supervisor

Worker*

Supervisor

Worker*

Leader election

Page 14: Future of Apache Storm

14 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Nimbus

Nimbus

Leader

Nimbus

Nimbus HA

ZooKeeper

Supervisor

Worker*

Supervisor

Worker*

Supervisor

Worker*

Page 15: Future of Apache Storm

15 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Nimbus

Nimbus

Leader

Nimbus

Nimbus HA

ZooKeeper

Supervisor

Worker*

Supervisor

Worker*

Supervisor

Worker*

Page 16: Future of Apache Storm

16 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Nimbus HA

Increase overall availability of Nimbus Nimbus hosts can join/leave at any time Leverages Distributed Cache API Topology JAR, Config, and Serialized Topology uploaded to Distributed Cache Replication guarantees availability of all files

Page 17: Future of Apache Storm

17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

PacemakerHeartbeat Server

Page 18: Future of Apache Storm

18 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Pacemaker

Replaces Zookeeper for Heartbeats In-Memory key-value store Allows Scaling to 2k-3k+ Nodes Secure: Kerberos/Digest Authentication

Page 19: Future of Apache Storm

19 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Pacemaker

Compared to Zookeeper– Less Memory/CPU– No Disk– No overhead of maintaining consistency

Page 20: Future of Apache Storm

20 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Automatic Back Pressure

Page 21: Future of Apache Storm

21 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Automatic Back Pressure

In previous Storm versions, the only way to throttle topologies was to enable ACKing and set topology.spout.max.pending.

If you don’t require at-least-once guarantees, this imposed a significant performance penalty.

Page 22: Future of Apache Storm

22 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Automatic Back Pressure

High/Low Watermarks (expressed as % of task’s buffer size) Back Pressure thread monitors task’s buffers If High Watermark reached, slow down Spouts If Low Watermark reached, stop throttling All Spouts Supported

Page 23: Future of Apache Storm

23 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Performance

Page 24: Future of Apache Storm

24 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Up to 16x faster throughputRealistically 3x -- Highly dependent on use case and fault tolerance settings

Page 25: Future of Apache Storm

25 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

> 60% Latency Reduction

Page 26: Future of Apache Storm

26 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Resource Aware Scheduler(RAS)

Page 27: Future of Apache Storm

27 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Resource Aware Scheduler

Specify the resource requirements (Memory/CPU) for individual topology components (Spouts/Bolts)

Memory: On-Heap / Off-Heap (if off-heap is used) CPU: Point system based on number of cores Resource requirements are per component instance (parallelism

matters)

Page 28: Future of Apache Storm

28 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Resource Aware Scheduler

CPU and Memory availability described in storm.yaml on each supervisor node. E.g.:

supervisor.memory.capacity.mb: 3072.0supervisor.cpu.capacity: 400.0

Convention for CPU capacity is to use 100 for each CPU core

Page 29: Future of Apache Storm

29 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Resource Aware Scheduler

SpoutDeclarer spout = builder.setSpout("sp1", new TestSpout(), 10);//set cpu requirementspout.setCPULoad(20);//set onheap and offheap memory requirementspout.setMemoryLoad(64, 16);

BoltDeclarer bolt1 = builder.setBolt("b1", new MyBolt(), 3).shuffleGrouping("sp1");//sets cpu requirement. Not neccessary to set both CPU and memory.//For requirements not set, a default value will be usedbolt1.setCPULoad(15);

BoltDeclarer bolt2 = builder.setBolt("b2", new MyBolt(), 2).shuffleGrouping("b1");bolt2.setMemoryLoad(100);

Page 30: Future of Apache Storm

30 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Native windowing support

Page 31: Future of Apache Storm

31 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Native windowing support

Fundamental abstraction for processing continuous streams of events

Breaks the stream of events into sub-sets

The sub-sets can then be aggregated, joined or analyzed.

Why is it important?

Page 32: Future of Apache Storm

32 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Native windowing support

Trending items– The top trending search queries in the last day

– Top tweets in the last 1 hour

Complex event processing– Windowing is fundamental to gain meaningful insights from a stream of events

– E.g. If same credit card is used from geographically distributed locations within the last one hour, it could be a fraud.

Some use cases

Page 33: Future of Apache Storm

33 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Native windowing support

Sliding Window

time0 5 10 15

time0 5 10 15

Tumbling Window

Windows do not overlap

Windows can overlap

Page 34: Future of Apache Storm

34 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Native windowing supportpublic class WindowMaxBolt extends BaseWindowedBolt { private OutputCollector collector; public void prepare(Map conf, TopologyContext context, OutputCollector collector) { this.collector = collector; }

public void execute(TupleWindow inputWindow) { int max = Integer.MIN_VALUE; for (Tuple tuple: inputWindow.get()) { max = Math.Max(max, tuple.getIntegerByField(”val")); } collector.emit(new Values(max)); }

public static void main(String[] args) { // ... builder.setBolt("slidingMax", new WindowMaxBolt().withWindow(Duration.minutes(10), Duration.minutes(2)));

// submit topology StormSubmitter.submitTopologyWithProgressBar(args[0], conf, builder.createTopology()); }}

Page 35: Future of Apache Storm

35 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Native windowing supportpublic class WindowMaxBolt extends BaseWindowedBolt { private OutputCollector collector; public void prepare(Map conf, TopologyContext context, OutputCollector collector) { this.collector = collector; }

public void execute(TupleWindow inputWindow) { int max = Integer.MIN_VALUE; for (Tuple tuple: inputWindow.get()) { max = Math.Max(max, tuple.getIntegerByField(”val")); } collector.emit(new Values(max)); }

public static void main(String[] args) { // ... builder.setBolt("slidingMax", new WindowMaxBolt().withWindow(Duration.minutes(10), Duration.minutes(2)));

// submit topology StormSubmitter.submitTopologyWithProgressBar(args[0], conf, builder.createTopology()); }}

triggered when the window activates based on the window configuration

Page 36: Future of Apache Storm

36 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Native windowing supportpublic class WindowMaxBolt extends BaseWindowedBolt { private OutputCollector collector; public void prepare(Map conf, TopologyContext context, OutputCollector collector) { this.collector = collector; }

public void execute(TupleWindow inputWindow) { int max = Integer.MIN_VALUE; for (Tuple tuple: inputWindow.get()) { max = Math.Max(max, tuple.getIntegerByField(”val")); } collector.emit(new Values(max)); }

public static void main(String[] args) { // ... builder.setBolt("slidingMax", new WindowMaxBolt().withWindow(Duration.minutes(10), Duration.minutes(2)));

// submit topology StormSubmitter.submitTopologyWithProgressBar(args[0], conf, builder.createTopology()); }}

the window configuration

Page 37: Future of Apache Storm

37 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Native windowing supportTrident

topology.newStream("spout", spout) .window( SlidingDurationWindow.of(Duration.minutes(10), Duration.minutes(2)), // window configuration new InMemoryWindowsStoreFactory(), // window store factory new Fields("val"), // input field new Min("val"), // aggregate function new Fields("max") // output field );

Page 38: Future of Apache Storm

38 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Native windowing support

Processing time Event time

– e.g. Processing events out of a log file based on when the event actually happened– Implemented based on watermarks

Out of order and late events– e.g. Mobile gaming data where the user goes offline and data arrives after sometime

Windowing considerations

6:026:01 6:05 6:08

7:017:00

6:10 6:12

7:02

6:15

Processing time

6:09 6:126:08

watermark watermark

6:10

late event

Page 39: Future of Apache Storm

39 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

State ManagementStateful Bolts with Automatic Checkpointing

Page 40: Future of Apache Storm

40 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

State Management

Allows storm bolts to store and retrieve the state of its computation

The framework automatically and periodically snapshots the state of the bolts

In-memory and Redis based state stores

Implementation based on– Asynchronous Barrier Snapshotting (ABS) algorithm [1]– Chandy-Lamport Algorithm [2]

1. http://arxiv.org/abs/1506.086032. http://research.microsoft.com/en-us/um/people/lamport/pubs/chandy.pdf

Page 41: Future of Apache Storm

41 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

State Managementpublic class WordCountBolt extends BaseStatefulBolt<KeyValueState> {

private KeyValueState wordCounts; private OutputCollector collector;

public void prepare(Map conf, TopologyContext context, OutputCollector collector) { this.collector = collector; }

public void initState(KeyValueState state) { this.wordCounts = state; }

public void execute(Tuple tuple) { String word = tuple.getString(0); Integer count = (Integer) wordCounts.get(word, 0); count++; wordCounts.put(word, count); collector.emit(new Values(word, count)); }}

Page 42: Future of Apache Storm

42 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

State Managementpublic class WordCountBolt extends BaseStatefulBolt<KeyValueState> {

private KeyValueState wordCounts; private OutputCollector collector;

public void prepare(Map conf, TopologyContext context, OutputCollector collector) { this.collector = collector; }

public void initState(KeyValueState state) { this.wordCounts = state; }

public void execute(Tuple tuple) { String word = tuple.getString(0); Integer count = (Integer) wordCounts.get(word, 0); count++; wordCounts.put(word, count); collector.emit(new Values(word, count)); }}

Initialize state

Page 43: Future of Apache Storm

43 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

State Managementpublic class WordCountBolt extends BaseStatefulBolt<KeyValueState> {

private KeyValueState wordCounts; private OutputCollector collector;

public void prepare(Map conf, TopologyContext context, OutputCollector collector) { this.collector = collector; }

public void initState(KeyValueState state) { this.wordCounts = state; }

public void execute(Tuple tuple) { String word = tuple.getString(0); Integer count = (Integer) wordCounts.get(word, 0); count++; wordCounts.put(word, count); collector.emit(new Values(word, count)); }}

Update state

Page 44: Future of Apache Storm

44 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

State Management

Checkpointing/Snapshotting: What you see

Spout Stateful Bolt1 Stateful Bolt2Bolt1

execute/update state execute/update stateexecute

Page 45: Future of Apache Storm

45 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

State Management

Checkpointing/Snapshotting: What you get

Spout

Checkpoint Spout

Stateful Bolt1 Stateful Bolt2Bolt1

State Store

ACKER

ACK

ACKACK

chkpt chkpt

chkpt

Page 46: Future of Apache Storm

46 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Streaming SQL(currently Beta/WIP)

Page 47: Future of Apache Storm

47 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Streaming SQL

SQL is a well adopted standard– available in several projects like Drill, Hive, Phoenix, Spark

Faster development cycle Opportunity to unify batch and stream data analytics

Benefits

SELECT word, COUNT(*) AS totalFROM wordsGROUP BY word

Page 48: Future of Apache Storm

48 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Streaming SQL

Leverages Apache calcite Compiles SQL statements into Trident topologies Usage:

– $ bin/storm sql <sql-file> <topo-name>

Storm implementation

Page 49: Future of Apache Storm

49 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Streaming SQL

Example

CREATE EXTERNAL TABLE ORDERS (ID INT PRIMARY KEY, UNIT_PRICE INT, QUANTITY INT) LOCATION 'kafka://localhost:2181/brokers?topic=orders’

CREATE EXTERNAL TABLE LARGE_ORDERS (ID INT PRIMARY KEY, TOTAL INT) LOCATION 'kafka://localhost:2181/brokers?topic=large_orders’

INSERT INTO LARGE_ORDERSSELECT ID, UNIT_PRICE * QUANTITY AS TOTAL FROM ORDERS WHERE UNIT_PRICE * QUANTITY > 50

Page 50: Future of Apache Storm

50 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Streaming SQL

Storm SQL workflow

Page 51: Future of Apache Storm

51 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Streaming SQL

Storm SQL workflow

Page 52: Future of Apache Storm

52 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Storm Usability ImprovementsEnhanced Debugging and Monitoring of Topologies

Page 53: Future of Apache Storm

53 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Storm usability improvements

./bin/storm set_log_level [topology name]-l [logger name]=[LEVEL]:[TIMEOUT]

Dynamic Log Levels

Storm CLI:

Storm UI:

Page 54: Future of Apache Storm

54 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Storm usability improvements

No more debug bolts or Trident functions Click on the “Events” link to view the sample log.

Tuple Sampling

Page 55: Future of Apache Storm

55 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Storm usability improvements

Distributed Log search– Search across all topology logs, even archived (ZIP) logs.

Dynamic worker profiling– Heap dumps, JStack output, Profiler recordings

Restart workers from UI Supervisor health checks

Page 56: Future of Apache Storm

56 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Integrations

New Cassandra Solr Elastic Search MQTT

Improvements Kafka HDFS spout Hbase Hive

Page 57: Future of Apache Storm

57 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

What’s next for Storm?2.0 and Beyond

Page 58: Future of Apache Storm

58 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Clojure to JavaBroadening the contributor base

Alibaba JStorm Contribution

Page 59: Future of Apache Storm

59 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Apache Beam (incubating) Integration Unified API for dealing with

bounded/unbounded data sources (i.e. batch/streaming)

One API. Multiple implementations (execution engines). Called “Runners” in Beamspeak.

Page 60: Future of Apache Storm

60 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Apache Storm 2.0 and beyond

Windowing enhancements Unified Stream API Revamped UI Metrics improvements

and many more…

Page 61: Future of Apache Storm

61 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Q&AThank You