
Jeremy Pierre

Yet Another Developer

@j14159

Lessons Learned, Traps Sprung, Stuff From The Trenches At HootSuite

Scalable Event-Based Systems

twitter.com/hootsuite facebook.com/hootsuite slideshare.com/hootsuite blog.hootsuite.com

Monday, 23 July, 12


REST breaks down at a certain volume.

Why?

Gaps happen when you don’t pull often enough.

WTF is a gap?

Given a stream of discrete data items, a chunk missing in the middle.
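
A gap, in code. This is a minimal sketch, not anything from HootSuite's stack: it assumes items carry contiguous integer ids and that a hypothetical REST call (`poll`) only ever returns the newest `page_size` items. Real APIs rarely hand out contiguous ids, so treat the detection logic as illustrative.

```python
def poll(stream, page_size):
    """Hypothetical REST call: return only the newest `page_size` items."""
    return stream[-page_size:]


def find_gaps(ids):
    """Return (first_missing, last_missing) ranges in a sorted list of integer ids."""
    gaps = []
    for prev, cur in zip(ids, ids[1:]):
        if cur - prev > 1:
            gaps.append((prev + 1, cur - 1))
    return gaps


# Ten items exist at the first poll; thirty exist by the time we poll again.
first = poll(list(range(1, 11)), page_size=10)
second = poll(list(range(1, 31)), page_size=10)

collected = sorted(set(first + second))
gaps = find_gaps(collected)  # the items that arrived between polls and fell off the page
```

Twenty items arrived between polls but the page only holds ten, so the middle chunk is never seen. Polling more often shrinks the window, at the cost of more requests.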


Why Event Based at HootSuite?

Most public APIs have limitations:

• One user per API call

• Lots of requests in parallel

> 4 million users

millions of Twitter accounts

• HTTP/HTTPS is expensive

The First Steps

Use Twitter’s Streaming API...

• General data collection/categorization

• Push Notifications first

• Analytics


Halfway to Event Sourcing...

What is “Event Based”?

• Push vs pull

• Discrete domain object instances

• Time ordered

A system that relies on discrete time-ordered data elements pushed to components rather than components pulling one or more elements from a common source.


Looks a little bit like OOP when you squint...

Evented System Characteristics

• Generalized standalone components

• Well-known domain objects (interfaces)

• Inputs and outputs are message queues


And no, it’s not RPC
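
The three characteristics above can be sketched with nothing but the standard library. This is an illustration under assumptions, not the HootSuite implementation (which the deck says is Scala, Akka and RabbitMQ): `Mention` is a hypothetical domain object, in-process `queue.Queue` objects stand in for broker queues, and `None` is used as a shutdown sentinel.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """Hypothetical well-known domain object shared by all components."""
    user: str
    text: str


def processor(inbox, outbox):
    """Standalone component: events are pushed in, results are pushed out."""
    while True:
        event = inbox.get()
        if event is None:          # sentinel: stop and tell downstream to stop
            outbox.put(None)
            return
        outbox.put((event.user, len(event.text)))


inbox, outbox = queue.Queue(), queue.Queue()
worker = threading.Thread(target=processor, args=(inbox, outbox))
worker.start()

# The producer pushes and moves on; it never waits for a reply.
for event in (Mention("user123", "hello"), Mention("user456", "hi")):
    inbox.put(event)
inbox.put(None)
worker.join()

results = []
while (item := outbox.get()) is not None:
    results.append(item)
```

The producer never gets a return value from `processor`, which is the difference from RPC: it only knows the shape of the message and the queue to put it on.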


The Players


• Scala: functional (cool) and objects (comfort)

• Akka: concurrency, fault tolerance

• RabbitMQ: client-agnostic messaging

• ZooKeeper: self-awareness

And this is growing fast.


The Streaming Base Layer


• Dispatch assigns work

• Harvesters collect data

• Processors perform loose categorization

A collection of loosely coupled services


How RabbitMQ is Used

General message bus

• Harvesters → single durable queue

• Processors consume from harvesters

• Processors → topic exchange

Push service consumes from topic exchange:

site.twitter.user123.mention

site.twitter.*.mention
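
How the two keys relate, as a sketch. The assumption here is that the first key is a routing key a processor publishes with and the second is a binding pattern the push service subscribes with; the deck does not spell that out. The matcher below is a stand-alone re-implementation of AMQP topic-exchange matching rules (`*` is exactly one word, `#` is zero or more words), not a call into RabbitMQ.

```python
def topic_matches(pattern, routing_key):
    """AMQP-style topic match: '*' is exactly one word, '#' is zero or more words."""

    def match(p, k):
        if not p:
            return not k
        head, rest = p[0], p[1:]
        if head == "#":
            # '#' may absorb any number of words, including none
            return any(match(rest, k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        return (head == "*" or head == k[0]) and match(rest, k[1:])

    return match(pattern.split("."), routing_key.split("."))


binding = "site.twitter.*.mention"
all_users = topic_matches(binding, "site.twitter.user123.mention")
wrong_type = topic_matches(binding, "site.twitter.user123.retweet")
everything = topic_matches("site.twitter.#", "site.twitter.user123.mention")
```

One wildcard binding receives mentions for every user; a binding with the literal user id receives them for one user only.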


The Development Apocalypse

When RabbitMQ falls over...

Each device & user combo gets a queue

+

150,000 device & user combinations

+

Push server stops

=

Maximum Badness

The Development Apocalypse

The solutions...

• Do some routing/filtering in your app

• Federation

• TCP load balancers

No silver bullet
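
The first option, routing and filtering in the app, might look like the sketch below. It assumes the push service consumes one shared queue and keeps its own user-to-device table in memory instead of asking the broker for a queue per device and user pair. `InAppRouter`, the event dictionary shape and the device names are all hypothetical.

```python
from collections import defaultdict


class InAppRouter:
    """Fans events from one shared queue out to many (user, device) subscribers."""

    def __init__(self):
        self._devices = defaultdict(set)  # user id -> set of device ids

    def subscribe(self, user, device):
        self._devices[user].add(device)

    def unsubscribe(self, user, device):
        self._devices[user].discard(device)

    def route(self, event):
        """Return the (device, payload) deliveries for one event."""
        devices = self._devices.get(event["user"], ())
        return [(device, event["payload"]) for device in sorted(devices)]


router = InAppRouter()
router.subscribe("user123", "iphone-a")
router.subscribe("user123", "android-b")

deliveries = router.route({"user": "user123", "payload": "new mention"})
ignored = router.route({"user": "user999", "payload": "nobody listening"})
```

The trade-off: the broker holds one queue instead of 150,000, but the subscription table now lives in the push service and has to be rebuilt or shared when a push node restarts.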


How ZooKeeper is Used

Self-Awareness:


• Lets dispatch watch for recovery/load balancing opportunities

• Less configuration

• Node health snapshots


How ZooKeeper is Used

Clustering and leader election:


• Dispatch nodes

• Push nodes

• Metrics master


The Production Apocalypse

When ZooKeeper meets The Cloud:

Default (sane) session timeout

+

Major reaction to disappearing nodes

+

Unpredictable pseudo network

=

Maximum Badness


Events and LAMP

It’s not all bad news for REST...

...as long as it is only the source of events.

• Push subscriptions from public API

• Subscriptions to base layer from Analytics

• New message posts

Push Servers → HootSuite API

Feedback is sometimes necessary:

• UrbanAirship reports invalid devices

• HootSuite API uses a different DB

• API has to poll a durable queue (cron)


Streaming Base → Analytics

Up to date and fast required...

• uses mention and retweet data

• high frequency polling, daemon process

• durable, non-autodelete, no auto-ack queue


The Speedbump

When RabbitMQ meets PHP

• cron and daemon are your friends

• non-autodelete, no auto-ack

• durable is optional

• consume as frequently as possible
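
Why "no auto-ack" matters for a short-lived cron or daemon consumer, as a toy model. `ToyQueue` is a hypothetical in-memory stand-in written for this example, not a RabbitMQ client; it only mimics the broker behaviour of putting unacknowledged messages back on the queue when the consumer's connection goes away.

```python
from collections import deque


class ToyQueue:
    """Toy model of a broker queue with manual acknowledgements (not a real client)."""

    def __init__(self, messages):
        self._ready = deque(messages)
        self._unacked = {}          # delivery tag -> message body
        self._next_tag = 0

    def get(self, auto_ack=False):
        """Fetch one message; returns (tag, body), or None if the queue is empty."""
        if not self._ready:
            return None
        body = self._ready.popleft()
        if auto_ack:
            return None, body       # broker forgets the message immediately
        self._next_tag += 1
        self._unacked[self._next_tag] = body
        return self._next_tag, body

    def ack(self, tag):
        del self._unacked[tag]

    def consumer_died(self):
        """Connection closed: unacked messages return to the front of the queue."""
        for tag in sorted(self._unacked, reverse=True):
            self._ready.appendleft(self._unacked.pop(tag))

    def __len__(self):
        return len(self._ready)


q = ToyQueue(["a", "b", "c"])

tag, body = q.get()        # a cron run fetches "a"...
q.consumer_died()          # ...and the script dies before acknowledging it
after_crash = len(q)       # "a" is back on the queue

tag, body = q.get()        # the next run gets "a" again
q.ack(tag)                 # and acks only after the work is done
after_ack = len(q)
```

With `auto_ack=True` the first fetch would have removed "a" for good, and the crash would have lost it. Non-autodelete covers the other half: the queue itself has to survive the moments when no PHP process is connected.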


Metrics for the Masses: Graphite

Graph Everything

Metrics updating every 5-15 minutes is insufficient

• Instant insight

• see external problems before they’re reported

• easy custom graphs

• UI’s not great but it works
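
Graphing everything is cheap because Graphite's Carbon daemon accepts a one-line plaintext format, `<metric path> <value> <timestamp>`, conventionally on TCP port 2003. A minimal sketch follows; the metric name and the host are placeholders invented for this example, and nothing is sent unless `send_metric` is called.

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one datapoint in Carbon's plaintext format."""
    ts = int(time.time() if timestamp is None else timestamp)
    return f"{path} {value} {ts}\n"


def send_metric(line, host="graphite.example.internal", port=2003):
    """Send one line to a Carbon plaintext listener (host is a placeholder)."""
    with socket.create_connection((host, port), timeout=2) as sock:
        sock.sendall(line.encode("ascii"))


# Hypothetical metric name; the fixed timestamp keeps the example deterministic.
line = graphite_line("push.notifications.sent", 42, timestamp=1343001600)
```

A component can call something like `send_metric(graphite_line(...))` every few seconds, which is what makes sub-minute graphs practical.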


Getting Rid of REST: ∅MQ

For internal APIs, timeliness > throughput

Decentralized Message Queues:

• Enterprise clients have shifting team structures

• Access control changes must propagate quickly

• Uniform API