Jeremy Pierre, Yet Another Developer, @j14159
Lessons Learned, Traps Sprung, Stuff From The Trenches At HootSuite
Scalable Event-Based Systems
twitter.com/hootsuite facebook.com/hootsuite slideshare.com/hootsuite blog.hootsuite.com
Monday, 23 July, 12
Why?
REST breaks down at a certain volume.
Gaps happen when you don’t pull often enough.
WTF is a gap?
Given a stream of discrete data items, a chunk missing in the middle.
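A minimal sketch of how a gap appears, assuming a hypothetical REST endpoint that returns only the newest `page_size` items per call. The names `poll_latest` and `find_gaps` and all numbers are illustrative and not taken from any real API.

```python
def poll_latest(stream, upto, page_size):
    """Simulate a REST call: return at most page_size newest item ids <= upto."""
    visible = [i for i in stream if i <= upto]
    return visible[-page_size:]


def find_gaps(seen_ids):
    """Return (first_missing, last_missing) ranges in a sorted list of ids."""
    gaps = []
    for prev, cur in zip(seen_ids, seen_ids[1:]):
        if cur - prev > 1:
            gaps.append((prev + 1, cur - 1))
    return gaps


stream = list(range(1, 31))          # 30 items published, ids 1..30
seen = []
for upto in (5, 10, 30):             # the third poll comes too late
    for item in poll_latest(stream, upto, page_size=5):
        if item not in seen:
            seen.append(item)

print(find_gaps(seen))               # [(11, 25)]
```

The first two polls keep up with the stream. By the third poll, 20 items have arrived but only 5 fit in the response, so items 11 to 25 are never seen.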
Why Event Based at HootSuite?
Most public APIs have limitations:
• One user per API call
• Lots of requests in parallel: more than 4 million users, millions of Twitter accounts
• HTTP/HTTPS is expensive
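A rough back-of-the-envelope sketch of why one user per call hurts at this scale. The 4 million figure is the user count from the slide, used here as a round number of accounts; the 60-second polling interval is purely an assumption for illustration.

```python
def requests_per_second(accounts, poll_interval_s, calls_per_account=1):
    """Sustained request rate when every account needs its own API call."""
    return accounts * calls_per_account / poll_interval_s


# 4 million comes from the slide; the 60 s interval is an assumed example.
rate = requests_per_second(4_000_000, poll_interval_s=60)
print(f"{rate:,.0f} requests/second")   # 66,667 requests/second
```

Each of those requests also pays HTTP/HTTPS connection overhead, which is the third limitation on the slide.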
The First Steps
Use Twitter’s Streaming API...
• General data collection/categorization
• Push Notifications first
• Analytics
Halfway to Event Sourcing...
What is “Event Based”?
• Push vs pull
• Discrete domain object instances
• Time ordered
A system that relies on discrete, time-ordered data elements pushed to components, rather than components pulling one or more elements from a common source.
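The definition above can be sketched in a few lines, assuming a hypothetical `Mention` domain object and a toy in-process `Source`. Neither name comes from the talk, and the real system pushes over a message broker rather than through callbacks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """A discrete domain object: one mention at one point in time."""
    timestamp: int
    user: str
    text: str


class Source:
    """Pushes each event to subscribers instead of waiting to be polled."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, events):
        # Deliver in time order, one discrete event at a time.
        for event in sorted(events, key=lambda e: e.timestamp):
            for callback in self._subscribers:
                callback(event)


received = []
source = Source()
source.subscribe(received.append)
source.publish([Mention(2, "bob", "second"), Mention(1, "alice", "first")])
print([m.user for m in received])    # ['alice', 'bob']
```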
Looks a little bit like OOP when you squint...
Evented System Characteristics
• Generalized standalone components
• Well-known domain objects (interfaces)
• Inputs and outputs are message queues
And no, it’s not RPC
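A minimal sketch of a standalone component whose only inputs and outputs are queues. It uses Python's in-process `queue.Queue` as a stand-in for a real message broker; the `component` function, the `STOP` sentinel, and the categorization rule are all assumptions made for illustration.

```python
import queue
import threading

STOP = object()  # sentinel telling a component to shut down


def component(transform, inbox, outbox):
    """Run one standalone stage: read from inbox, write results to outbox."""
    while True:
        message = inbox.get()
        if message is STOP:
            outbox.put(STOP)
            return
        outbox.put(transform(message))


raw, categorized = queue.Queue(), queue.Queue()
worker = threading.Thread(
    target=component,
    args=(lambda text: {"text": text, "is_mention": "@" in text}, raw, categorized),
)
worker.start()

for text in ("hello @hootsuite", "no handle here"):
    raw.put(text)
raw.put(STOP)
worker.join()

results = []
while True:
    item = categorized.get()
    if item is STOP:
        break
    results.append(item)
print([r["is_mention"] for r in results])   # [True, False]
```

The producer never waits for a reply from the component, which is the difference from RPC: it only knows the queue and the shape of the message.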
The Players
• Scala: functional (cool) and objects (comfort)
• Akka: concurrency, fault tolerance
• RabbitMQ: client-agnostic messaging
• ZooKeeper: self-awareness
And this is growing fast.
The Streaming Base Layer
A collection of loosely coupled services
• Dispatch assigns work
• Harvesters collect data
• Processors perform loose categorization
General message bus
How RabbitMQ is Used
• Harvesters → single durable queue
• Processors consume from harvesters
• Processors → topic exchange
Push service consumes from topic exchange:
site.twitter.user123.mention
site.twitter.*.mention
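A small sketch of how a topic exchange decides whether a binding pattern matches a routing key. It follows the AMQP topic rules (`*` stands for exactly one word, `#` for zero or more words) but is a plain Python re-implementation for illustration, not RabbitMQ code. The `retweet` key and the `#` pattern are made-up examples.

```python
def topic_matches(pattern, routing_key):
    """AMQP topic match: '*' is exactly one word, '#' is zero or more words."""
    def match(p, k):
        if not p:
            return not k
        head, rest = p[0], p[1:]
        if head == "#":
            # '#' may absorb zero words, or one word and stay active.
            return match(rest, k) or (bool(k) and match(p, k[1:]))
        if not k:
            return False
        return (head == "*" or head == k[0]) and match(rest, k[1:])

    return match(pattern.split("."), routing_key.split("."))


print(topic_matches("site.twitter.*.mention", "site.twitter.user123.mention"))  # True
print(topic_matches("site.twitter.*.mention", "site.twitter.user123.retweet"))  # False
print(topic_matches("site.twitter.#", "site.twitter.user123.mention"))          # True
```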
When RabbitMQ falls over...
The Development Apocalypse
Each device & user combo gets a queue
+ 150,000 device & user combinations
+ Push server stops
= Maximum Badness
The Development Apocalypse
The solutions...
• Do some routing/filtering in your app
• Federation
• TCP load balancers
No silver bullet
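One way to read "do some routing/filtering in your app" is to keep the user-to-device mapping in the push service instead of creating a broker queue per combination. The sketch below assumes that reading; `PushRouter`, the device names, and the key layout check are hypothetical and not the talk's actual implementation.

```python
from collections import defaultdict


class PushRouter:
    """Route events in the application instead of one broker queue per device."""

    def __init__(self):
        self._devices_by_user = defaultdict(set)

    def register(self, user, device):
        self._devices_by_user[user].add(device)

    def unregister(self, user, device):
        self._devices_by_user[user].discard(device)

    def route(self, routing_key):
        """Return the devices to notify for a key like site.twitter.<user>.mention."""
        parts = routing_key.split(".")
        if len(parts) != 4 or parts[3] != "mention":
            return set()          # filtered out in the app, never queued per device
        return set(self._devices_by_user.get(parts[2], ()))


router = PushRouter()
router.register("user123", "iphone-a")
router.register("user123", "android-b")
print(sorted(router.route("site.twitter.user123.mention")))   # ['android-b', 'iphone-a']
print(router.route("site.twitter.user999.mention"))           # set()
```

With this shape the broker carries a small, fixed number of queues, and the 150,000 combinations live in application memory or a datastore.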
How ZooKeeper is Used
Self-Awareness:
• Lets dispatch watch for recovery/load balancing opportunities
• Less configuration
• Node health snapshots
How ZooKeeper is Used
Clustering and leader election:
• Dispatch nodes
• Push nodes
• Metrics master
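A toy model of the common ZooKeeper leader-election recipe, in which each candidate creates an ephemeral sequential node and the lowest sequence number leads. This is an in-memory simulation with no ZooKeeper client involved; `ElectionRegistry` and the node names are assumptions, and the talk does not say which recipe HootSuite used.

```python
class ElectionRegistry:
    """In-memory stand-in for a ZooKeeper election path (not a real client)."""

    def __init__(self):
        self._next_seq = 0
        self._members = {}            # sequence number -> node name

    def join(self, name):
        """Like creating an ephemeral sequential znode; returns its sequence."""
        seq = self._next_seq
        self._next_seq += 1
        self._members[seq] = name
        return seq

    def leave(self, seq):
        """Like the ephemeral znode vanishing when a session ends."""
        self._members.pop(seq, None)

    def leader(self):
        """The member with the lowest sequence number leads."""
        return self._members[min(self._members)] if self._members else None


registry = ElectionRegistry()
first = registry.join("dispatch-1")
registry.join("dispatch-2")
print(registry.leader())     # dispatch-1
registry.leave(first)        # dispatch-1's session expires
print(registry.leader())     # dispatch-2
```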
The Production Apocalypse
When ZooKeeper meets The Cloud:
Default (sane) session timeout
+ Major reaction to disappearing nodes
+ Unpredictable pseudo network
= Maximum Badness
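A simplified model of why a short session timeout and a flaky network combine badly. It treats a session as expired whenever the silence between two heartbeats exceeds the timeout; real ZooKeeper session handling is more involved, and every number here is made up.

```python
def session_expired(heartbeat_times, session_timeout):
    """True if any silence between heartbeats exceeds the session timeout."""
    return any(
        later - earlier > session_timeout
        for earlier, later in zip(heartbeat_times, heartbeat_times[1:])
    )


# Seconds at which the server hears from a node; all values are made up.
steady = [0, 2, 4, 6, 8]
stalled = [0, 2, 4, 13, 15]       # a 9 s network stall on a healthy node

print(session_expired(steady, session_timeout=6))     # False
print(session_expired(stalled, session_timeout=6))    # True
print(session_expired(stalled, session_timeout=10))   # False
```

In the middle case the node is healthy but looks dead, so anything that reacts strongly to disappearing nodes starts rebalancing work for no good reason.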
Events and LAMP
It’s not all bad news for REST...
...as long as it is only the source of events.
• Push subscriptions from public API
• Subscriptions to base layer from Analytics
• New message posts
Push Servers → HootSuite API
Feedback is sometimes necessary:
• UrbanAirship reports invalid devices
• HootSuite API uses a different DB
• API has to poll a durable queue (cron)
Streaming Base → Analytics
Up to date and fast required...
• Uses mention and retweet data
• High-frequency polling, daemon process
• Durable, non-autodelete, no auto-ack queue
The Speedbump
When RabbitMQ meets PHP
• cron and daemon are your friends
• non-autodelete, no auto-ack
• durable is optional
• consume as frequently as possible
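A toy illustration of why "no auto-ack" matters for a cron-driven or daemon consumer: a message that was delivered but never acknowledged goes back on the queue when the consumer dies. `AckQueue` is an in-memory stand-in written for this sketch, not a RabbitMQ client, and the message names are invented.

```python
from collections import deque


class AckQueue:
    """Toy queue with explicit acks; a stand-in, not a RabbitMQ client."""

    def __init__(self, messages):
        self._ready = deque(messages)
        self._unacked = {}
        self._next_tag = 0

    def get(self):
        """Hand out one message with a delivery tag, or None if empty."""
        if not self._ready:
            return None
        self._next_tag += 1
        self._unacked[self._next_tag] = self._ready.popleft()
        return self._next_tag, self._unacked[self._next_tag]

    def ack(self, tag):
        del self._unacked[tag]

    def recover(self):
        """Requeue everything delivered but never acked (consumer died)."""
        for tag in sorted(self._unacked, reverse=True):
            self._ready.appendleft(self._unacked.pop(tag))


q = AckQueue(["mention-1", "mention-2"])
tag, body = q.get()
q.ack(tag)                   # processed successfully
q.get()                      # cron job crashes before acking mention-2
q.recover()
print(q.get()[1])            # mention-2
```

With auto-ack the second message would have been dropped the moment it was handed to the crashed job.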
Metrics for the Masses: Graphite
Graph Everything
Metrics updating every 5-15 minutes is insufficient
• Instant insight
• See external problems before they’re reported
• Easy custom graphs
• UI’s not great but it works
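A minimal sketch of feeding a metric to Graphite using its plaintext protocol, one line of `path value timestamp` per metric. The metric name, the host, and the use of port 2003 (the usual Carbon plaintext port) are assumptions about a generic setup, not details from the talk.

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one metric for Graphite's plaintext protocol: 'path value ts\\n'."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"


def send_metric(path, value, host="localhost", port=2003):
    """Send one metric over TCP; host and port here are assumptions."""
    with socket.create_connection((host, port), timeout=2) as conn:
        conn.sendall(graphite_line(path, value).encode("ascii"))


print(graphite_line("push.notifications.sent", 42, timestamp=1343001600), end="")
# push.notifications.sent 42 1343001600
```

Calling `send_metric` from the hot path every few seconds, rather than batching every 5-15 minutes, is what gives the "instant insight" the slide asks for.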
Getting Rid of REST: ∅MQ
Decentralized Message Queues:
For internal APIs, timeliness > throughput
• Enterprise clients have shifting team structures
• Access control changes must propagate quickly
• Uniform API