An Infrastructure in Transition Joe Stump, Lead Architect, Digg

Digg.com Software Architecture

DESCRIPTION

Digg is one of the largest sites on the internet, serving some 26 million unique visitors a month. Those unique visitors are only half the story, as hundreds of millions of requests from dozens of sources actually hit Digg's stack monthly. Join Joe Stump, Lead Architect at Digg, as he pulls back the curtain for a peek at the systems and software architecture that make Digg hum along. Watch a video at http://www.bestechvideos.com/2009/03/16/digg-an-infrastructure-in-transition

Page 1: Digg.com Software Architecture

An Infrastructure in Transition

Joe Stump, Lead Architect, Digg

Page 2: Digg.com Software Architecture

Introductions

Page 3: Digg.com Software Architecture

✓ 35,000,000 uniques
✓ 3,500,000 users
✓ 15,000 requests / sec
✓ Hundreds of servers

Page 4: Digg.com Software Architecture

“Web 2.0 sucks (for scaling).”
Joe Stump

Page 5: Digg.com Software Architecture

What’s Scaling?

Page 6: Digg.com Software Architecture

What’s Scaling?

Specialization

Page 7: Digg.com Software Architecture

What’s Scaling?

Severe Hair Loss

Page 8: Digg.com Software Architecture

What’s Performance?

Page 9: Digg.com Software Architecture

What’s Performance?

Who cares?

Page 10: Digg.com Software Architecture

4 Stages of Scaling

Page 11: Digg.com Software Architecture
Page 12: Digg.com Software Architecture
Page 13: Digg.com Software Architecture
Page 14: Digg.com Software Architecture
Page 15: Digg.com Software Architecture

As it stands ...

Page 16: Digg.com Software Architecture

Applications

Netscalers

MogileFS

Rec. Engine

ZOMG ROFLAFK WTFLULZ

Lucene

Page 17: Digg.com Software Architecture

Building Blocks

• MogileFS
  - 9 nodes
  - 2.8TB of files
• Gearman
  - Each application server
  - 400,000 jobs / day
• Memcached
  - 25 nodes
  - 2GB / node

Page 18: Digg.com Software Architecture

Moving forward ...

Page 19: Digg.com Software Architecture

MogileFS

Rec. Engine

Lucene

Netscalers

Applications

Services

IDDB

Netscalers

Applications

Services

IDDB

Messaging

Page 20: Digg.com Software Architecture

IDDB

✓ Elastic horizontal partitions
✓ Heterogeneous partition types
✓ Multi-homed
✓ IDs live in multiple places
✓ Partitioned result sets

Page 21: Digg.com Software Architecture

IDDB_ID_Int
  id            bigint
  date_created  timestamp
  status        tinyint
  version       bigint

IDDB_ID_Char
  charid        char
  name          char
  value         char
  intid         bigint
  date_created  timestamp

IDDB_ID_Int_Shards
  intid         bigint
  shardid       int
  status        tinyint

IDDB_Shards
  id            bigint
  type          char
  host          char
  port          mediumint
  user          char
  pass          char
  status        tinyint
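
One way to read the IDDB tables above: IDDB_ID_Int_Shards maps an ID to every shard that holds it, and IDDB_Shards says where each shard lives. The sketch below models that lookup with an in-memory SQLite database. The join, the meaning of status (1 = active), the shard types, and all hosts and ports are assumptions made for illustration; they are not taken from the talk.

```python
import sqlite3

# In-memory stand-in for two of the IDDB tables. Column names follow the
# slide; SQLite types only approximate the MySQL types shown there.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE IDDB_Shards (
    id INTEGER PRIMARY KEY, type TEXT, host TEXT, port INTEGER,
    user TEXT, pass TEXT, status INTEGER
);
CREATE TABLE IDDB_ID_Int_Shards (
    intid INTEGER, shardid INTEGER, status INTEGER
);
""")

# Hypothetical shards: two active, one disabled (status = 0 is an assumption).
conn.executemany(
    "INSERT INTO IDDB_Shards VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "mysql", "db1.example.com", 3306, "app", "secret", 1),
        (2, "memcachedb", "kv1.example.com", 21201, "", "", 1),
        (3, "mysql", "db2.example.com", 3306, "app", "secret", 0),
    ],
)
# ID 42 lives on three shards, ID 43 on one ("IDs live in multiple places").
conn.executemany(
    "INSERT INTO IDDB_ID_Int_Shards VALUES (?, ?, ?)",
    [(42, 1, 1), (42, 2, 1), (42, 3, 1), (43, 1, 1)],
)


def shards_for_id(conn, intid):
    """Return (id, type, host, port) for every active shard holding intid."""
    return conn.execute(
        """
        SELECT s.id, s.type, s.host, s.port
        FROM IDDB_ID_Int_Shards AS m
        JOIN IDDB_Shards AS s ON s.id = m.shardid
        WHERE m.intid = ? AND m.status = 1 AND s.status = 1
        ORDER BY s.id
        """,
        (intid,),
    ).fetchall()


for shard in shards_for_id(conn, 42):
    print(shard)
```

Because the mapping is a table rather than a hash function, an ID can be moved or copied to another shard by changing rows, which is one plausible reading of "elastic" partitions.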

Page 22: Digg.com Software Architecture

MemcacheDB

✓ Memcached + BDB
✓ 28,000+ writes a second
✓ Persistent key/value storage
✓ Works with Memcached clients
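
"Works with Memcached clients" means the store is addressed with the ordinary memcached text protocol. The sketch below only builds and parses the wire format for `set` and `get`, without opening a socket, so it runs anywhere. The helper names and the key `story:1` are made up for this example.

```python
def encode_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Build a memcached text-protocol `set` command for one key."""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def encode_get(key: str) -> bytes:
    """Build a memcached text-protocol `get` command for one key."""
    return f"get {key}\r\n".encode("ascii")


def parse_get_response(raw: bytes):
    """Return the value from a single-key `get` reply, or None on a miss."""
    if raw == b"END\r\n":
        return None
    header, _, rest = raw.partition(b"\r\n")
    parts = header.split()
    if len(parts) < 4 or parts[0] != b"VALUE":
        raise ValueError("unexpected reply: %r" % header)
    size = int(parts[3])  # VALUE <key> <flags> <bytes>
    return rest[:size]


print(encode_set("story:1", b"hello"))
print(encode_get("story:1"))
print(parse_get_response(b"VALUE story:1 0 5\r\nhello\r\nEND\r\n"))
```

The same bytes would go to a memcached node or a MemcacheDB node; the difference is that the latter keeps the value on disk.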

Page 23: Digg.com Software Architecture

War stories ...

Page 24: Digg.com Software Architecture

Digg Images

✓ 15,000 - 17,000 submissions per day
✓ Crawl for images, video embeds, source, other metadata
✓ Ran in parallel via Gearman
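
The shape of the Digg Images job, one crawl per submission with many running at once, can be sketched without a Gearman cluster. Here a thread pool stands in for Gearman workers, and the pages are inline strings rather than fetched URLs so the example runs offline. The class and function names and the sample pages are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser


class MediaExtractor(HTMLParser):
    """Collect image and embed sources from one submitted page."""

    def __init__(self):
        super().__init__()
        self.images = []
        self.embeds = []

    def handle_starttag(self, tag, attrs):
        src = dict(attrs).get("src")
        if not src:
            return
        if tag == "img":
            self.images.append(src)
        elif tag in ("embed", "iframe", "video"):
            self.embeds.append(src)


def crawl_submission(page):
    """Job body: parse one (url, html) pair and return what was found."""
    url, html = page
    parser = MediaExtractor()
    parser.feed(html)
    return {"url": url, "images": parser.images, "embeds": parser.embeds}


def crawl_all(pages, workers=4):
    """Fan the jobs out to a pool; results come back in submission order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(crawl_submission, pages))


PAGES = [
    ("http://example.com/a", '<html><img src="a.png"><embed src="clip.swf"></html>'),
    ("http://example.com/b", "<html><p>no media</p></html>"),
]

results = crawl_all(PAGES)
for result in results:
    print(result)
```

With a real job queue the workers would live on other machines, but the submitter's side looks the same: hand off a job per submission and collect the results.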

Page 25: Digg.com Software Architecture

Green Badges

✓ 230,000+ Diggs per day
✓ Most active Diggers are also most followed
✓ 3,000 writes per second
✓ Ran in background via Gearman
✓ Eventually consistent
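
A minimal sketch of the fan-out pattern these bullets describe, assuming a green badge marks a story that someone you follow has dugg. The digg itself only enqueues a job; a background worker then writes one row per follower, so followers see the badge a little later (eventual consistency). A thread and an in-process queue stand in for Gearman, and the follower graph is made up.

```python
import queue
import threading

# Hypothetical follower graph: digger -> users who follow that digger.
FOLLOWERS = {"kevin": ["alice", "bob", "carol"], "alice": ["bob"]}

badges = {}           # follower -> set of (digger, story_id) badge rows
jobs = queue.Queue()  # stand-in for a Gearman background queue


def record_digg(digger, story_id):
    """Request path: enqueue and return immediately, no follower writes."""
    jobs.put((digger, story_id))


def worker():
    """Background path: one write per follower of the digger."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel used to stop the worker
            jobs.task_done()
            break
        digger, story_id = job
        for follower in FOLLOWERS.get(digger, []):
            badges.setdefault(follower, set()).add((digger, story_id))
        jobs.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()

record_digg("kevin", 101)
record_digg("alice", 101)

jobs.join()     # wait until the queued fan-out has been applied
jobs.put(None)  # stop the worker
thread.join()

print(badges)
```

The cost of one digg grows with the digger's follower count, which is why "most active Diggers are also most followed" is the painful case.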

Page 26: Digg.com Software Architecture

user_ip_views

Page 27: Digg.com Software Architecture

Digg Comments

✓ Switched to explicit caching
✓ Intelligently grouped objects in cache
✓ Sorting, limiting, etc. done in the application layer
✓ 200% to 300% gains in performance
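
The comments approach can be sketched as follows: cache all comments for a story as one grouped object under an explicit key, then sort and limit in application code, so different views of the same story reuse one cache entry. A dict stands in for memcached, and the key format, field names, and sample rows are assumptions for illustration.

```python
from operator import itemgetter

cache = {}    # stand-in for memcached
db_reads = 0  # counts how often the "database" is actually hit

# Hypothetical comment rows.
DB_COMMENTS = [
    {"id": 1, "story_id": 7, "diggs": 5},
    {"id": 2, "story_id": 7, "diggs": 12},
    {"id": 3, "story_id": 7, "diggs": 1},
    {"id": 4, "story_id": 8, "diggs": 9},
]


def load_comments_from_db(story_id):
    """Unsorted, unlimited read: one simple query per story."""
    global db_reads
    db_reads += 1
    return [c for c in DB_COMMENTS if c["story_id"] == story_id]


def get_comments(story_id, sort_key="diggs", limit=2):
    """Fetch the grouped object from cache, then sort and limit in the app."""
    key = f"comments:story:{story_id}"
    comments = cache.get(key)
    if comments is None:
        comments = load_comments_from_db(story_id)
        cache[key] = comments  # explicit cache write, one entry per story
    ordered = sorted(comments, key=itemgetter(sort_key), reverse=True)
    return ordered[:limit]


top_by_diggs = get_comments(7)
newest = get_comments(7, sort_key="id", limit=1)
print([c["id"] for c in top_by_diggs], [c["id"] for c in newest], db_reads)
```

Both calls are served from a single database read, because the sort order and limit are not part of the cache key.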

Page 28: Digg.com Software Architecture

Data Migration

✓ Vertical partitioning
✓ Migrate in background processes
✓ Use the bots
✓ Keep track of migration
✓ Retry failed migrations automatically
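
"Keep track of migration" and "retry failed migrations automatically" suggest a per-record status table plus a retry loop. The sketch below shows one such loop under simple assumptions: records are identified by an integer, the status lives in a dict rather than a database, and `flaky_migrate` is a made-up stand-in for the real copy step.

```python
def migrate_all(record_ids, migrate_one, max_attempts=3):
    """Migrate every record, tracking state and retrying failures."""
    status = {rid: {"state": "pending", "attempts": 0} for rid in record_ids}
    for _ in range(max_attempts):
        todo = [rid for rid, s in status.items() if s["state"] != "done"]
        if not todo:
            break
        for rid in todo:
            status[rid]["attempts"] += 1
            try:
                migrate_one(rid)
            except Exception as exc:
                # Record the failure; the next pass will pick it up again.
                status[rid]["state"] = "failed"
                status[rid]["error"] = str(exc)
            else:
                status[rid]["state"] = "done"
                status[rid].pop("error", None)
    return status


calls = {}


def flaky_migrate(rid):
    """Hypothetical copy step: record 2 fails once, record 3 always fails."""
    calls[rid] = calls.get(rid, 0) + 1
    if rid == 2 and calls[rid] == 1:
        raise RuntimeError("temporary failure")
    if rid == 3:
        raise RuntimeError("bad row")


result = migrate_all([1, 2, 3], flaky_migrate)
for rid, info in result.items():
    print(rid, info)
```

Records still marked failed after the last pass are the ones that need a human to look at them.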

Page 29: Digg.com Software Architecture

Things to ponder ...

Page 30: Digg.com Software Architecture

CAP Theorem

Page 31: Digg.com Software Architecture

Have I run the numbers?
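
As an illustration of running the numbers, the daily figures quoted earlier in the deck can be turned into per-second rates. These are flat averages over a day, and real traffic is peaky. Treating the 3,000 writes per second as a sustained rate is an assumption made only to show the arithmetic.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(per_day):
    """Convert a daily count into a flat per-second average."""
    return per_day / SECONDS_PER_DAY


diggs_per_sec = per_second(230_000)        # Green Badges slide
gearman_jobs_per_sec = per_second(400_000)  # Building Blocks slide
submissions_per_sec = per_second(17_000)    # Digg Images slide, upper bound

# Total cache capacity: 25 Memcached nodes at 2GB each.
memcached_total_gb = 25 * 2

# If 3,000 writes/sec were sustained, how many writes would each digg cause?
writes_per_digg = 3_000 / diggs_per_sec

print(f"diggs/sec:        {diggs_per_sec:.2f}")
print(f"gearman jobs/sec: {gearman_jobs_per_sec:.2f}")
print(f"submissions/sec:  {submissions_per_sec:.2f}")
print(f"memcached total:  {memcached_total_gb} GB")
print(f"writes per digg:  {writes_per_digg:.0f}")
```

Fewer than three diggs arrive per second on average, so a write rate in the thousands comes from the fan-out, not from the diggs themselves.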

Page 32: Digg.com Software Architecture

Is MySQL the best solution?

Page 33: Digg.com Software Architecture

Can I do this later?

Page 34: Digg.com Software Architecture

How can I partition this data?

Page 35: Digg.com Software Architecture

How should I cache this data?

Page 36: Digg.com Software Architecture

Questions?!