Storm and Cassandra Cassandra NYC Meetup 11/5/2013 Jake Luciani (@tjake)

Slides from a talk given at the NYC Cassandra Meetup, discussing how Storm works and how it integrates well with Apache Cassandra. There is also a segue into an example project that uses Storm and Cassandra to implement a scalable reactive web crawler. http://github.com/tjake/stormscraper

Page 1: Storm and Cassandra

Storm and Cassandra

Cassandra NYC Meetup 11/5/2013

Jake Luciani (@tjake)

What is Storm?

• Distributed event processor

• Provides constructs to reliably process all events

• Simple conceptual model

• New to Apache Incubator: http://wiki.apache.org/incubator/StormProposal

Storm Concepts

Spout - Collects work and submits it to be processed. Tracks success or failure of each tuple.

Bolt - Processes tuples and optionally emits more tuples.

Tuple - A collection of data that is passed within Storm.

Stream - Identifies outputs from a Spout/Bolt. Forces tuples to have some declared structure.
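
To make the four terms concrete, here is a minimal in-memory sketch in plain Python. It does not use the Storm API: `ListSpout`, `UppercaseBolt`, and `run` are hypothetical names chosen for illustration, and the single-threaded loop stands in for what Storm distributes across workers.

```python
from collections import deque


class ListSpout:
    """Spout: hands out work and tracks success or failure of each tuple."""

    def __init__(self, items):
        self.queue = deque(enumerate(items))  # (message id, value)
        self.pending = {}                     # emitted, not yet acked or failed
        self.acked, self.failed = [], []

    def next_tuple(self):
        if not self.queue:
            return None
        msg_id, value = self.queue.popleft()
        self.pending[msg_id] = value
        # The stream's declared structure here is one field named "word".
        return msg_id, {"word": value}

    def ack(self, msg_id):
        self.acked.append(self.pending.pop(msg_id))

    def fail(self, msg_id):
        self.failed.append(self.pending.pop(msg_id))


class UppercaseBolt:
    """Bolt: processes a tuple and optionally emits more tuples."""

    def execute(self, tup):
        if not tup["word"].isalpha():
            raise ValueError("not a word")
        return [{"upper": tup["word"].upper()}]


def run(spout, bolt):
    """Drain the spout through one bolt, acking or failing each tuple."""
    emitted = []
    while (item := spout.next_tuple()) is not None:
        msg_id, tup = item
        try:
            emitted.extend(bolt.execute(tup))
            spout.ack(msg_id)
        except ValueError:
            spout.fail(msg_id)
    return emitted
```

Running `run(ListSpout(["storm", "42", "cassandra"]), UppercaseBolt())` acks the two words and fails `"42"`, which is the bookkeeping a real spout would use to decide what to replay.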

Storm Topologies

A directed graph of spouts and bolts connected via streams

Diagram labels: Host A, Host B, Host C; Zookeeper; A-F, G-P, Q-Z; Firehose; Cassandra (optional)

Example Topologies

• Track the top 10 most popular links being shared in the last N minutes.
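
The counting logic behind such a topology can be sketched in a few lines. This is a single-process illustration only, not Storm code: `RollingTopLinks` is a hypothetical name, and it assumes timestamps arrive in non-decreasing order. In a real topology this state would likely be split across counting and ranking bolts.

```python
from collections import Counter, deque


class RollingTopLinks:
    """Counts link shares seen within the last `window_seconds`."""

    def __init__(self, window_seconds, top_n=10):
        self.window = window_seconds
        self.top_n = top_n
        self.events = deque()   # (timestamp, url), oldest first
        self.counts = Counter()

    def _expire(self, now):
        # Drop events that have fallen out of the window.
        while self.events and self.events[0][0] <= now - self.window:
            _, url = self.events.popleft()
            self.counts[url] -= 1
            if self.counts[url] == 0:
                del self.counts[url]

    def record(self, url, now):
        self._expire(now)
        self.events.append((now, url))
        self.counts[url] += 1

    def top(self, now):
        self._expire(now)
        return self.counts.most_common(self.top_n)
```

Keeping the raw events in a deque makes expiry exact at the cost of memory; bucketing counts per time slot is the usual trade-off when the stream is large.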

Where does data end up?

• Storm supports built-in RPC, so client requests can effectively become a spout.

• Put the data into a database…

• Why Cassandra though?

Why Cassandra?

• Cassandra’s data model allows incremental modifications to rows.

• Different bolts can update different parts of a Cassandra row asynchronously.
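
Why this works can be shown with a toy model. Cassandra resolves writes per column using write timestamps (last write wins), so writers that touch different columns do not need to coordinate. The `Row` class below is a hypothetical in-memory stand-in, not a Cassandra client, and it does not model how ties between equal timestamps are broken.

```python
class Row:
    """Toy model of one row: each column keeps its own (timestamp, value)."""

    def __init__(self):
        self.cells = {}

    def write(self, column, value, timestamp):
        current = self.cells.get(column)
        # Last write wins, decided per column rather than per row.
        if current is None or timestamp > current[0]:
            self.cells[column] = (timestamp, value)

    def read(self):
        return {col: value for col, (_, value) in self.cells.items()}


# Two writers update different columns; arrival order does not matter.
first = Row()
first.write("html", "<p>hi</p>", timestamp=1)
first.write("text", "hi", timestamp=2)

second = Row()
second.write("text", "hi", timestamp=2)
second.write("html", "<p>hi</p>", timestamp=1)
```

Both rows end up with the same contents, which is the property that lets independent bolts fill in a row at their own pace.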

Example

StormScraper

A web crawling system built on Storm + Cassandra

http://github.com/tjake/stormscraper

StormScraper C* DataModel

CREATE TABLE pages (
    url text,
    scrape_date timestamp,
    title text,
    html text,
    text text,
    inbound_links set<text>,
    outbound_links set<text>,
    PRIMARY KEY (url, scrape_date)
);

CREATE TABLE scrape_list (
    url text PRIMARY KEY,
    last_update timestamp,
    depth int
);
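
As a sketch of how separate writers could each touch only their own columns of `pages`, the snippet below holds one parameterized CQL statement per writer. The statements and the `WRITES` and `bind` names are assumptions made for illustration; they are not taken from the StormScraper repository, and no Cassandra driver is involved.

```python
# Hypothetical statements: one per writer, each touching only its own columns.
WRITES = {
    "html": (
        "UPDATE pages SET html = ?, title = ? "
        "WHERE url = ? AND scrape_date = ?"
    ),
    "text": (
        "UPDATE pages SET text = ? "
        "WHERE url = ? AND scrape_date = ?"
    ),
    "outbound_links": (
        "UPDATE pages SET outbound_links = outbound_links + ? "
        "WHERE url = ? AND scrape_date = ?"
    ),
}


def bind(statement, *values):
    """Pair a statement with its values after checking the placeholder count."""
    expected = statement.count("?")
    if expected != len(values):
        raise ValueError(f"expected {expected} values, got {len(values)}")
    return statement, values
```

Because a CQL UPDATE is an upsert, whichever writer arrives first creates the row and later writers add their columns to it. Each statement must supply both `url` and `scrape_date`, since together they form the primary key.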

StormScraper Topology

Diagram, built up one component at a time over slides 11 to 30. Components in order of appearance: Cassandra, Url Spout, Scraper Bolt, Html Writer, Link Writer, Text Extraction Bolt, Text Writer. The last three slides add a "Fail" label.
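
The arrows of the topology diagram did not survive in this transcript, so the wiring below is a guess based only on the component names on the topology slides. It is a plain-Python sketch of a topology as a directed graph, with hypothetical helpers `downstream` and `process`. It also illustrates the general Storm behaviour that a failure anywhere in a tuple's tree is reported back to the spout; how StormScraper itself reacts to a failure is not shown on the slides.

```python
from collections import deque

# Assumed wiring, inferred only from the component names on the slides.
TOPOLOGY = {
    "Url Spout": ["Scraper Bolt"],
    "Scraper Bolt": ["Html Writer", "Link Writer", "Text Extraction Bolt"],
    "Text Extraction Bolt": ["Text Writer"],
    "Html Writer": [],
    "Link Writer": [],
    "Text Writer": [],
}


def downstream(topology, start):
    """Return every component reachable from `start`, in breadth-first order."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for nxt in topology[node]:
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


def process(topology, start, failing=()):
    """Return "fail" if any reachable component fails, else "ack"."""
    for node in [start] + downstream(topology, start):
        if node in failing:
            return "fail"
    return "ack"
```

Under this assumed wiring, `process(TOPOLOGY, "Url Spout", failing={"Text Writer"})` returns `"fail"`, so the spout would learn that the URL was not fully handled even though the failure happened two hops away.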

Code Walkthrough

http://github.com/tjake/stormscraper

Storm Summary

• Powerful

• But easy to make mistakes

• Wrong tuple expectation, names, types

• Bad topology wiring
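
One way to catch the mistakes listed above before deploying is a small consistency check over the wiring. The sketch below is plain Python, not part of Storm: `check_wiring` and its three inputs are hypothetical, and it only compares field names, not types.

```python
def check_wiring(declared, expected, edges):
    """Return a list of problems; an empty list means the wiring is consistent.

    declared: component -> set of field names it emits
    expected: component -> set of field names it reads
    edges:    list of (upstream, downstream) pairs
    """
    problems = []
    for up, down in edges:
        if up not in declared:
            problems.append(f"{down} subscribes to unknown component {up}")
            continue
        missing = expected.get(down, set()) - declared[up]
        for field in sorted(missing):
            problems.append(
                f"{down} expects field '{field}' not declared by {up}"
            )
    return problems
```

Running a check like this in a unit test turns a misnamed field or a dangling subscription into a test failure rather than a runtime surprise.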

Thank You! Q&A?