Learning to Build Distributed Systems the Hard Way

DESCRIPTION

I’ve learned how to build distributed systems the hard way: I’ve failed, and failed again. I’ve made many of the common mistakes and tried a few other things that turned out to be disappointments. You shouldn’t have to make those mistakes too. In this talk I’ll tell the story of how I built a real-time advertising analytics platform that tracks and reports on millions of impressions every day, and all the things I did wrong before I got it to work. I’ll also tell you what I did right, and the choices I don’t regret.

LEARNING TO BUILD DISTRIBUTED SYSTEMS

THE HARD WAY

@iconara

speakerdeck.com/u/iconara (real time!)

Theo / @iconara

Chief Architect at

let’s make online advertising a great experience

MAKING THIS

INTO THIS

HOW HARD CAN IT BE?

TRACKING AD IMPRESSIONS

track page views and all their ads, track visibility, and send updates on changes

track events, track activity, sync cookies, and track visits

[Figure: ad state labels LOADED, VISIBLE, HIDDEN]

ASSEMBLING SESSIONS

assemble ad impressions, page views and visits, to be able to calculate things like total visible duration

mix in demographics, revenue, and third-party data

WAS LOADED

BECAME ACTIVE

BECAME VISIBLE

WAS HIDDEN

BECAME VISIBLE AGAIN

A CLICK!

{
  "user_id": "M9L6R5TD0YXK",
  "session_id": "MAI3QAGNAIYT",
  "timestamp": 1347896675038,
  "placement_name": "example",
  "category": "frontpage",
  "embed_url": "http://example.com/",
  "visible_duration": 1340,
  "browser": "Chrome",
  "device_type": "computer",
  "click": true,
  "ad_dimensions": "980x300"
}
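To illustrate how a field like visible_duration could be assembled from the state changes, here is a minimal Ruby sketch. The event format (a state plus milliseconds since load) and the timestamps are hypothetical; the slides do not show the real event format.

```ruby
# Hypothetical events for one ad impression: [state, milliseconds since load].
# The numbers are made up so that they add up to the 1340 in the example record.
EVENTS = [
  [:loaded, 0],
  [:visible, 500],
  [:hidden, 1200],
  [:visible, 2000],
  [:click, 2640],
].freeze

# Sums the time spent between each :visible event and the next :hidden event.
def visible_duration(events)
  total = 0
  visible_since = nil
  events.each do |state, at|
    case state
    when :visible
      visible_since ||= at
    when :hidden
      if visible_since
        total += at - visible_since
        visible_since = nil
      end
    end
  end
  # An ad that is still visible counts up to the last event seen.
  total += events.last[1] - visible_since if visible_since
  total
end

puts visible_duration(EVENTS)
```

The choice to count a still-visible ad up to the last event is an assumption of this sketch, not something the talk specifies.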

3rd PARTY DATA & OTHER GOODIES

ANALYTICS

precompute metrics, count uniques, build visitor histories for attribution

HOW HARD CAN IT BE?

25K REQUESTS PER SECOND

~1 billion requests per day, 1 TB raw data
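A back-of-the-envelope check of these figures, as a Ruby sketch. It assumes that 1 TB means 10^12 bytes and that 25K requests per second is a peak rate; the slides state neither.

```ruby
SECONDS_PER_DAY = 24 * 60 * 60             # 86_400

requests_per_day = 1_000_000_000           # "~1 billion requests per day"
bytes_per_day    = 10**12                  # "1 TB raw data", assuming decimal terabytes

average_rps       = requests_per_day / SECONDS_PER_DAY  # integer division
bytes_per_request = bytes_per_day / requests_per_day
day_at_peak_rate  = 25_000 * SECONDS_PER_DAY            # if 25K/s were sustained all day

puts "average requests per second: #{average_rps}"
puts "average bytes per request:   #{bytes_per_request}"
puts "requests per day at 25K/s:   #{day_at_peak_rate}"
```

Under these assumptions the daily average is a little under 12K requests per second at about 1 KB each, so 25K per second would be roughly twice the average load.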

ONE VISIT CAN CHANGE UP TO

100K COUNTERS

hundreds of millions of individual counters per day, plus counting uniques and visitor histories

IN REAL TIME

or near real time, if you want to be pedantic

START WITH TWO OF EVERYTHING

going from one to two is the hardest

GIVE A LOT OF THOUGHT TO YOUR

KEYS AND IDS

it will save you lots of pain

MANLO0 JME57Z (a timestamp, then something random)

monotonically increasing, sorts nicely

JME57Z MANLO0 (something random, then a timestamp)

uniformly distributed, works nicely with sharding
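The two key layouts can be sketched in Ruby as follows. The helper names (`random_part`, `timestamp_part`, `sortable_id`, `shardable_id`) and the six-character random part are assumptions for illustration; only the base-36 timestamp encoding comes from the slides.

```ruby
require "securerandom"

ALPHABET = (("0".."9").to_a + ("A".."Z").to_a).freeze

# Six random characters from 0-9, A-Z (the length is an assumption).
def random_part(length = 6)
  Array.new(length) { ALPHABET[SecureRandom.random_number(ALPHABET.size)] }.join
end

# Seconds since the epoch in base 36, as in the talk's timestamp example.
def timestamp_part(time = Time.now)
  time.to_i.to_s(36).upcase
end

# Timestamp first: keys created later sort after keys created earlier
# (at one-second resolution), which suits range scans and time-ordered storage.
def sortable_id(time = Time.now)
  timestamp_part(time) + random_part
end

# Random part first: keys spread evenly over the key space,
# which suits sharding by key prefix.
def shardable_id(time = Time.now)
  random_part + timestamp_part(time)
end

puts sortable_id
puts shardable_id
```

Plain string comparison only orders the timestamp-first keys correctly while all timestamps encode to the same number of characters, which holds for six characters until late 2038.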

PUT BUFFERS BETWEEN LAYERS

queues can even out peaks, let you scale layers independently, and let you restart services without losing data
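A minimal Ruby sketch of a buffer between two layers, using the core `SizedQueue` class. This is an in-process stand-in only: the talk does not name the queueing system used, and surviving a service restart would need a real message queue outside the process.

```ruby
# A bounded buffer between a producing layer and a consuming layer.
buffer = SizedQueue.new(100)
processed = []

# The consuming layer works at its own pace and stops at the :done marker.
consumer = Thread.new do
  while (event = buffer.pop) != :done
    processed << event
  end
end

# The producing layer sends a burst of 1,000 events. When the buffer is full,
# push blocks, which slows the producer instead of dropping events.
1000.times { |i| buffer.push({ id: i }) }
buffer.push(:done)

consumer.join
puts "processed #{processed.size} events"
```

Because the two sides only share the queue, either side could be scaled up (more producer or consumer threads) without changing the other.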

SEPARATE PROCESSING

FROM STORAGE

that way you can scale each independently

PLAN HOW TO GET RID OF YOUR DATA

deleting stuff is harder than you might think

NoDB

keep things streaming

STREAM PARTITIONING

RANDOMLY

when you have no interdependencies between things it’s easy to scale out (or round robin, it’s basically the same)

CONSISTENTLY

when there are interdependencies you need to route using some property of the objects, but make sure you get a uniform distribution
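The two routing strategies can be sketched in Ruby like this. The partition count of 12, the use of CRC32 as the hash, and routing on a session ID are all assumptions for illustration. Note that this is plain hash-modulo routing, not a consistent hashing ring.

```ruby
require "zlib"

PARTITIONS = 12

# Random routing: fine when events do not depend on one another.
def random_partition
  rand(PARTITIONS)
end

# Consistent routing: every event with the same key lands in the same
# partition, so one worker sees everything belonging to that key.
def consistent_partition(key)
  Zlib.crc32(key) % PARTITIONS
end

session_id = "MAI3QAGNAIYT"
puts "random:     #{random_partition}"
puts "consistent: #{consistent_partition(session_id)}"
```

Whether the distribution is uniform depends on the key as much as on the hash: routing on a property with few distinct values, or with a few very hot values, will still give uneven partitions.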

NUMEROLOGY

12

2 | 12, 3 | 12, 4 | 12, 6 | 12

8 | 24, 5 | 60

12, 60, 120, 360

superior highly composite numbers

for maximal flexibility, partition with multiples of 12
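One way to see why multiples of 12 are flexible is to list the worker counts that divide a given number of partitions evenly. A small Ruby sketch, where the comparison with 10 and 16 partitions is added for contrast:

```ruby
# All worker counts that split n partitions into equal shares.
def divisors(n)
  (1..n).select { |d| (n % d).zero? }
end

[10, 12, 16, 60].each do |partitions|
  puts "#{partitions} partitions divide evenly over #{divisors(partitions).inspect} workers"
end
```

With 12 partitions you can run 1, 2, 3, 4, 6 or 12 workers with equal shares, while 10 partitions only allow 1, 2, 5 or 10.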

A SHORT DIVERSION ABOUT COUNTING TO 60

the reason why there’s 60 seconds to a minute, and 360 degrees to a circle

3 SEGMENTS ON EACH FINGER

= 12

FIVE FINGERS ON OTHER HAND

= 60

log2(36^6) ≈ 31

$ (ASCII code 36)

six characters 0-9, A-Z can represent 31 bits, which is kind of almost very close to four bytes

MANLO0

a timestamp

Time.now.to_i.to_s(36).upcase
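The one-liner and the 31-bit claim can be checked with a short Ruby sketch. The helper names `encode_timestamp` and `decode_timestamp` are made up here; the fixed timestamp is the one from the click example, truncated to seconds.

```ruby
# Six characters from 0-9, A-Z give 36**6 distinct values.
CAPACITY = 36**6                  # 2_176_782_336
BITS     = Math.log2(CAPACITY)    # just over 31

# Same expression as on the slide, but for an explicit time.
def encode_timestamp(time)
  time.to_i.to_s(36).upcase
end

# String#to_i(36) reverses the encoding (it accepts upper and lower case).
def decode_timestamp(id)
  Time.at(id.to_i(36))
end

encoded = encode_timestamp(Time.at(1_347_896_675))
puts encoded                              # MAI3QB
puts decode_timestamp(encoded).to_i       # 1347896675
puts BITS.round(2)
```

Since 36^6 is slightly larger than 2^31, six characters cover every second up to late 2038 before a seventh character is needed.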

DO YOU REALLY NEED A BACKUP?

if you’ve got 3x replication over multiple availability zones, is that backup really worth it?

PRODUCTION IS THE ONLY REAL TEST ENVIRONMENT

when thousands of things happen every second, new, weird and unforeseen things happen all the time, and no test can anticipate everything (but testing is good anyway, just don’t think you’ve got everything covered)

KTHXBAI

@iconara

github.com/iconara

architecturalatrocities.com

burtcorp.com

COME TO SWEDEN IN MARCH AND

TALK ABOUT BIG DATA

scandevconf.se/2013/call-for-proposals

IDEMPOTENCE

f(f(x)) = f(x)

doing something again doesn’t change the outcome

if you don’t have to worry about things accidentally happening twice, everything becomes much simpler

COUNTING UNIQUES

when adding to a set it doesn’t matter how many times you do it, the end result is the same

INC X VS SET X

increments are not idempotent, and very scary; if you can avoid non-idempotent operations, try
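The difference shows up as soon as an event is delivered twice. A minimal Ruby sketch, with made-up events and an in-memory hash and set standing in for whatever store is actually used:

```ruby
require "set"

events = [
  { user_id: "M9L6R5TD0YXK", click: true },
  { user_id: "JME57ZMANLO0", click: false },
]

# Simulate a retry: the first event is delivered a second time.
delivered = events + [events.first]

impressions  = 0          # INC X
clicked      = {}         # SET X
unique_users = Set.new    # adding to a set

delivered.each do |event|
  impressions += 1                          # not idempotent: the retry is counted again
  clicked[event[:user_id]] = event[:click]  # idempotent: same key, same value
  unique_users << event[:user_id]           # idempotent: already in the set
end

puts "impressions (INC): #{impressions}"        # 3, although only 2 events happened
puts "users (SET):       #{clicked.size}"       # 2
puts "uniques (set add): #{unique_users.size}"  # 2
```

The counter is off by one after a single duplicate, while the hash and the set end up in the same state no matter how many times the event is replayed.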
