STORM DISTRIBUTED AND FAULT-TOLERANT REALTIME COMPUTATION Jimmy Zöger CLC < FIB < UPC 2013-06-03

Short introduction to Storm



Presentation given in class for Cloud Computing at Universitat Politècnica de Catalunya


Page 1: Short introduction to Storm

STORM

DISTRIBUTED AND FAULT-TOLERANT REALTIME COMPUTATION

Jimmy Zöger

CLC < FIB < UPC

2013-06-03

Page 2: Short introduction to Storm

INTRODUCTION

• Like Hadoop, but for realtime processing instead of batch processing

• Open source

• Developed by BackType, which was later acquired by Twitter

• Developed for analyzing Twitter data

• Similar to Yahoo! S4

Page 3: Short introduction to Storm

STORM TOPOLOGY

Page 4: Short introduction to Storm

SPOUTS

Page 5: Short introduction to Storm

SPOUTS

• The component responsible for feeding messages into the topology

• Emits tuples

• Can be reliable or unreliable (ack() and fail())
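
The difference between a reliable and an unreliable spout can be sketched with a toy in-memory model. This is not the Storm API: the class `ReliableSpout` and its `queue` and `pending` fields are hypothetical, and only the method names loosely mirror Storm's `nextTuple()`, `ack()` and `fail()`. A reliable spout remembers every emitted tuple until it is acked and re-emits it on failure; an unreliable spout would simply skip the bookkeeping.

```python
from collections import deque


class ReliableSpout:
    """Toy in-memory model of a reliable spout (hypothetical, not the Storm API)."""

    def __init__(self, messages):
        # (msg_id, payload) pairs waiting to be emitted
        self.queue = deque(enumerate(messages))
        # tuples that were emitted but not yet acked
        self.pending = {}

    def next_tuple(self):
        """Emit the next tuple, or None when nothing is waiting."""
        if not self.queue:
            return None
        msg_id, payload = self.queue.popleft()
        self.pending[msg_id] = payload
        return msg_id, payload

    def ack(self, msg_id):
        """The tuple was fully processed: forget it."""
        self.pending.pop(msg_id, None)

    def fail(self, msg_id):
        """Processing failed: queue the tuple so it is emitted again."""
        if msg_id in self.pending:
            self.queue.append((msg_id, self.pending.pop(msg_id)))


spout = ReliableSpout(["a", "b"])
first = spout.next_tuple()
second = spout.next_tuple()
spout.ack(first[0])    # "a" is done and forgotten
spout.fail(second[0])  # "b" goes back into the queue
replayed = spout.next_tuple()
```

In this sketch a failed tuple is replayed with the same message id, so downstream processing may see it more than once.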

Page 6: Short introduction to Storm

INTEGRATION

• Kestrel

• RabbitMQ

• Kafka

• JMS

• Integration is easy with the simple Spout abstraction

Page 7: Short introduction to Storm

BOLTS

Page 8: Short introduction to Storm

BOLTS

• A component that takes tuples as input and produces tuples as output

• Can do filtering, joining, functions, aggregations etc.

• Does not have to process a tuple immediately; it may hold on to tuples and process them later

• Comparison with Hadoop: A bolt can be a mapper or a reducer (or anything)
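
As a rough illustration of these points, the sketch below models two bolts in plain Python rather than with the Storm API. All names (`filter_bolt`, `CountBolt`, `batch_size`, `held`) are hypothetical. The first bolt filters, much like a mapper; the second holds on to tuples until a batch is full and only then emits aggregated counts, much like a reducer.

```python
from collections import Counter


def filter_bolt(tuples, min_length=3):
    """Filtering: pass through only words with at least min_length characters."""
    for (word,) in tuples:
        if len(word) >= min_length:
            yield (word,)


class CountBolt:
    """Aggregation: holds tuples until a batch is full, then emits counts."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.held = []  # tuples received but not yet processed

    def execute(self, tup):
        """Buffer one tuple; return output tuples only when the batch is full."""
        self.held.append(tup)
        if len(self.held) < self.batch_size:
            return []
        counts = Counter(word for (word,) in self.held)
        self.held = []
        return sorted(counts.items())


words = [("storm",), ("a",), ("bolt",), ("storm",)]
counter = CountBolt(batch_size=3)
outputs = [counter.execute(t) for t in filter_bolt(words)]
```

Here the first two calls to `execute` emit nothing, and the third emits the counts for the whole batch.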

Page 9: Short introduction to Storm

STORM TOPOLOGY

Page 10: Short introduction to Storm

STORM TOPOLOGY

• Spouts, bolts and streams

• Distributed

• Runs indefinitely until it is stopped

• Arbitrary complexity

• Streams that require multiple processing steps also require multiple bolts

• No intermediate queues for streams
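
A minimal single-process sketch of such a topology, using Python generators instead of the Storm API, might look as follows. The function names and the two sample sentences are made up for illustration. Each step hands its tuples straight to the next one, with no queue in between; unlike a real topology, this one stops when the sample input runs out.

```python
from collections import Counter


def sentence_spout():
    """Spout: the source of the stream (a fixed sample instead of a live feed)."""
    yield from ["storm is fast", "storm is distributed"]


def split_bolt(sentences):
    """First step: split each sentence into one tuple per word."""
    for sentence in sentences:
        yield from sentence.split()


def count_bolt(words):
    """Second step: emit a running (word, count) tuple for every input word."""
    counts = Counter()
    for word in words:
        counts[word] += 1
        yield word, counts[word]


# Wire the topology: spout -> split -> count, with tuples passed along directly.
stream = count_bolt(split_bolt(sentence_spout()))
final_counts = dict(stream)  # the last count emitted per word wins
```

Two processing steps (splitting and counting) need two bolts; adding a third step would mean chaining a third function.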

Page 11: Short introduction to Storm

FAULT-TOLERANCE

• Nimbus daemon and Supervisor daemons are fail-fast and stateless (state is kept in ZooKeeper or on local disk)

• Each worker sends heartbeats to Nimbus

• Transactional topologies → Guaranteed processing

[Diagram: Nimbus, two ZooKeeper nodes and four Supervisors]
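
The idea behind heartbeat-based failure detection can be sketched in a few lines. This is only an illustration of the principle, not how Storm schedules work: the function names, the worker ids, the timeout value and the round-robin reassignment are all assumptions.

```python
def find_dead_workers(last_heartbeat, now, timeout):
    """Return ids of workers whose last heartbeat is older than timeout seconds."""
    return sorted(w for w, t in last_heartbeat.items() if now - t > timeout)


def reassign(assignments, dead, alive):
    """Move the tasks of dead workers to alive workers, round-robin."""
    new_assignments = {
        w: list(tasks) for w, tasks in assignments.items() if w not in dead
    }
    orphaned = [task for w in dead for task in assignments.get(w, [])]
    for i, task in enumerate(orphaned):
        target = alive[i % len(alive)]
        new_assignments.setdefault(target, []).append(task)
    return new_assignments


# Hypothetical state: timestamps (in seconds) of each worker's last heartbeat.
last_heartbeat = {"w1": 100, "w2": 70, "w3": 98}
assignments = {"w1": ["spout"], "w2": ["split", "count"], "w3": ["report"]}

dead = find_dead_workers(last_heartbeat, now=105, timeout=30)
alive = [w for w in assignments if w not in dead]
new_assignments = reassign(assignments, dead, alive)
```

In this example `w2` has been silent for 35 seconds, so its two tasks are spread over the remaining workers.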

Page 12: Short introduction to Storm

USE CASES

• Counting words!

• Realtime analytics - trending topics on Twitter

• Online machine learning

• Continuous computation

• Distributed RPC

• Extract, Transform and Load (ETL)

Page 13: Short introduction to Storm

FAST

One benchmark clocked it at over a million tuples processed per second per node

{x,y,z} ↠ {x,y,z} ↠ {x,y,z} ↠ {x,y,z} ↠ {x,y,z} ↠
