93
Nathan Marz Twitter Distributed and fault-tolerant realtime computation Storm

Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Nathan MarzTwitter

Distributed and fault-tolerant realtime computation

Storm

Page 2: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Basic info

• Open sourced September 19th

• Implementation is 15,000 lines of code

• Used by over 25 companies

• >2700 watchers on Github (most watched JVM project)

• Very active mailing list

• >2300 messages

• >670 members

Page 3: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Before Storm

Queues Workers

Page 4: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Example

(simplified)

Page 5: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Example

Workers schemify tweets and append to Hadoop

Page 6: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Example

Workers update statistics on URLs by incrementing counters in Cassandra

Page 7: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Example

Use mod/hashing to make sure same URL always goes to same worker

Page 8: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

ScalingDeploy

Reconfigure/redeploy

Page 9: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Problems

• Scaling is painful

• Poor fault-tolerance

• Coding is tedious

Page 10: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

What we want

• Guaranteed data processing

• Horizontal scalability

• Fault-tolerance

• No intermediate message brokers!

• Higher level abstraction than message passing

• “Just works”

Page 11: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Storm

Guaranteed data processing

Horizontal scalability

Fault-tolerance

No intermediate message brokers!

Higher level abstraction than message passing

“Just works”

Page 12: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Streamprocessing

Continuouscomputation

DistributedRPC

Use cases

Page 13: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Storm Cluster

Page 14: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Storm Cluster

Master node (similar to Hadoop JobTracker)

Page 15: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Storm Cluster

Used for cluster coordination

Page 16: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Storm Cluster

Run worker processes

Page 17: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Starting a topology

Page 18: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Killing a topology

Page 19: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Concepts

• Streams

• Spouts

• Bolts

• Topologies

Page 20: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Streams

Unbounded sequence of tuples

Tuple Tuple Tuple Tuple Tuple Tuple Tuple

Page 21: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Spouts

Source of streams

Page 22: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Spout examples

• Read from Kestrel queue

• Read from Twitter streaming API

Page 23: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Bolts

Processes input streams and produces new streams

Page 24: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Bolts

• Functions

• Filters

• Aggregation

• Joins

• Talk to databases

Page 25: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Topology

Network of spouts and bolts

Page 26: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Tasks

Spouts and bolts execute as many tasks across the cluster

Page 27: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Task execution

Tasks are spread across the cluster

Page 28: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Task execution

Tasks are spread across the cluster

Page 29: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Stream grouping

When a tuple is emitted, which task does it go to?

Page 30: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Stream grouping

• Shuffle grouping: pick a random task

• Fields grouping: mod hashing on a subset of tuple fields

• All grouping: send to all tasks

• Global grouping: pick task with lowest id

Page 31: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Topology

shuffle

[“url”]

shuffle

shuffle

[“id1”, “id2”]

all

Page 32: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Streaming word count

TopologyBuilder is used to construct topologies in Java

Page 33: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Streaming word count

Define a spout in the topology with parallelism of 5 tasks

Page 34: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Streaming word count

Split sentences into words with parallelism of 8 tasks

Page 35: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Consumer decides what data it receives and how it gets grouped

Streaming word count

Split sentences into words with parallelism of 8 tasks

Page 36: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Streaming word count

Create a word count stream

Page 37: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Streaming word count

splitsentence.py

Page 38: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Streaming word count

Page 39: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Streaming word count

Submitting topology to a cluster

Page 40: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Streaming word count

Running topology in local mode

Page 41: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Demo

Page 42: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Distributed RPC

Data flow for Distributed RPC

Page 43: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

DRPC Example

Computing “reach” of a URL on the fly

Page 44: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Reach

Reach is the number of unique peopleexposed to a URL on Twitter

Page 45: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Computing reach

URL

Tweeter

Tweeter

Tweeter

Follower

Follower

Follower

Follower

Follower

Follower

Distinct follower

Distinct follower

Distinct follower

Count Reach

Page 46: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Reach topology

Page 47: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Reach topology

Page 48: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Reach topology

Page 49: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Reach topology

Keep set of followers foreach request id in memory

Page 50: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Reach topology

Update followers set whenreceive a new follower

Page 51: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Reach topology

Emit partial count afterreceiving all followers for a

request id

Page 52: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Demo

Page 53: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Guaranteeing message processing

“Tuple tree”

Page 54: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Guaranteeing message processing

• A spout tuple is not fully processed until all tuples in the tree have been completed

Page 55: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Guaranteeing message processing

• If the tuple tree is not completed within a specified timeout, the spout tuple is replayed

Page 56: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Guaranteeing message processing

Reliability API

Page 57: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Guaranteeing message processing

“Anchoring” creates a new edge in the tuple tree

Page 58: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Guaranteeing message processing

Marks a single node in the tree as complete

Page 59: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Guaranteeing message processing

• Storm tracks tuple trees for you in an extremely efficient way

Page 60: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Transactional topologies

How do you do idempotent counting with anat least once delivery guarantee?

Page 61: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Won’t you overcount?

Transactional topologies

Page 62: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Transactional topologies solve this problem

Transactional topologies

Page 63: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Built completely on top of Storm’s primitivesof streams, spouts, and bolts

Transactional topologies

Page 64: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Enables fault-tolerant, exactly-once messaging semantics

Transactional topologies

Page 65: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Batch 1 Batch 2 Batch 3

Transactional topologies

Process small batches of tuples

Page 66: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Batch 1 Batch 2 Batch 3

Transactional topologies

If a batch fails, replay the whole batch

Page 67: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Batch 1 Batch 2 Batch 3

Transactional topologies

Once a batch is completed, commit the batch

Page 68: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Batch 1 Batch 2 Batch 3

Transactional topologies

Bolts can optionally be “committers”

Page 69: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Commit 1

Transactional topologies

Commits are ordered. If there’s a failure during commit, the whole batch + commit is retried

Commit 1 Commit 2 Commit 3 Commit 4 Commit 4

Page 70: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Example

Page 71: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Example

New instance of this objectfor every transaction attempt

Page 72: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Example

Aggregate the count forthis batch

Page 73: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Example

Only update database if transaction ids differ

Page 74: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Example

This enables idempotency sincecommits are ordered

Page 75: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Example

(Credit goes to Kafka devs for this trick)

Page 76: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Transactional topologies

Multiple batches can be processed in parallel, but commits are guaranteed to be ordered

Page 77: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

• Requires a more sophisticated source queue than Kestrel or RabbitMQ

• storm-contrib has a transactional spout implementation for Kafka

Transactional topologies

Page 78: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Storm UI

Page 79: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Storm on EC2

https://github.com/nathanmarz/storm-deploy

One-click deploy tool

Page 80: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Starter code

https://github.com/nathanmarz/storm-starter

Example topologies

Page 81: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Documentation

Page 82: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Ecosystem

• Scala, JRuby, and Clojure DSL’s

• Kestrel, Redis, AMQP, JMS, and other spout adapters

• Multilang adapters

• Cassandra, MongoDB integration

Page 83: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Questions?

http://github.com/nathanmarz/storm

Page 84: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Future work

• State spout

• Storm on Mesos

• “Swapping”

• Auto-scaling

• Higher level abstractions

Page 85: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Implementation

KafkaTransactionalSpout

Page 86: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Implementation

all

all

all

Page 87: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Implementation

all

all

all

TransactionalSpout is a subtopology consisting of a spout and a bolt

Page 88: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Implementation

all

all

all

The spout consists of one task that coordinates the transactions

Page 89: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Implementation

all

all

all

The bolt emits the batches of tuples

Page 90: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Implementation

all

all

all

The coordinator emits a “batch” streamand a “commit stream”

Page 91: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Implementation

all

all

all

Batch stream

Page 92: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Implementation

all

all

all

Commit stream

Page 93: Storm - assets.en.oreilly.comassets.en.oreilly.com/1/event/75/Storm_ distributed and fault-tolerant... · Storm Guaranteed data processing Horizontal scalability Fault-tolerance No

Implementation

all

all

all

Coordinator reuses tuple tree framework to detectsuccess or failure of batches or commits and replays

appropriately