Getting 100B Metrics to Disk


DESCRIPTION

Awesome! Traffic to your site is really picking up and everything is lookin’ good. Well, except for that database back in the corner, but it will hold… right? No one really wants to deal with scaling the database tier, but hopefully your customers will drag you (perhaps kicking and screaming) to some sort of distributed database architecture. This talk is all about scaling MySQL through hardware optimizations and sharding from a Site Engineering perspective. This includes real-world examples of finding pain points, identifying risks, and evaluating cloud vs. hardware scaling. I’ll also discuss distributed database management, dealing with data purging, making consistent backups, and how to keep the site up when things go bad.


G E T T I N G 1 0 0 B M E T R I C S T O D I S K

Jonathan Thurman - Site Reliability Engineer @jthurman42

1 9 4 B

http://www.flickr.com/photos/meteopassione/9157134653/

N E W R E L I C

• Performance Monitoring

• Web Apps

• Mobile Apps

• Servers

• Databases, Caches & More…

• Software Analytics

O K A Y, Y O U C O L L E C T D A T A

• 194 Billion Metrics

• 100,000 req/sec

• 2 Gbps Inbound

• 216 Terabytes

• All backed by MySQL

http://www.flickr.com/photos/bobsfever/6658919861/

H O W W E G O T H E R E

http://www.flickr.com/photos/auvet/853157494/

B U I L D I N G B L O C K S

• Hosted Environment

• Xen Virtual Machines

• Data storage

• ATA over Ethernet

• SATA drives

• MySQL 5.0

• Single Ruby on Rails Application

http://www.flickr.com/photos/riekhavoc/4648423297/

S H A R D I N G F R O M I N C E P T I O N

• Account Information

• Read heavy

• Single HA Instance

• Agent Data

• Write heavy

• 8 shards based on AccountId

http://www.flickr.com/photos/erikb/48221952/

T A L E O F T W O M O D E L S

• Ruby on Rails

• class ShardData < ActiveRecord::Base

• Look up shard for Account

• Override ConnectionHandler
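The routing idea on this slide (look up the shard for an account, then use that shard's connection) can be sketched in plain Ruby. This is only an illustration: `ShardRouter`, the lookup hash, and the connection labels are hypothetical names, and the real implementation overrides Rails' ConnectionHandler rather than using a standalone class.

```ruby
# Hypothetical sketch: route work to the shard that owns an account.
# ShardRouter and its arguments are illustrative, not the talk's code.
class ShardRouter
  def initialize(account_to_shard, connections)
    @account_to_shard = account_to_shard # account_id => shard number
    @connections = connections           # shard number => connection
  end

  # Look up the shard for an account, then return that shard's connection.
  def connection_for(account_id)
    shard = @account_to_shard.fetch(account_id) do
      raise KeyError, "no shard assigned to account #{account_id}"
    end
    @connections.fetch(shard)
  end
end

router = ShardRouter.new({ 72 => 3, 73 => 5 }, { 3 => "shard-3", 5 => "shard-5" })
puts router.connection_for(72)
```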

http://www.flickr.com/photos/jungle_boy/140279885/

T R I B B L E S T A B L E S

• Metric table name contains

• AccountID

• Year and Julian Day

• Resolution

• ts_72_13221_1h

• Currently ~200k tables per DB
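A small sketch of how such a table name might be assembled. It assumes that `13221` in the example is a two-digit year (13) followed by the three-digit day of the year (221); the helper name `metric_table_name` is made up for illustration.

```ruby
require "date"

# Build a per-account, per-day metric table name such as ts_72_13221_1h.
# Assumption: the middle field is two-digit year + three-digit day of year.
def metric_table_name(account_id, date, resolution)
  format("ts_%d_%s_%s", account_id, date.strftime("%y%j"), resolution)
end

# 9 August 2013 is day 221 of the year.
puts metric_table_name(72, Date.new(2013, 8, 9), "1h")
```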

http://www.flickr.com/photos/15942690@N00/4571141076/

B I N G E A N D P U R G E

• Purging data

• DELETE FROM …

• DROP TABLE …

• innodb_file_per_table

• innodb_lazy_drop_table (pre 5.5.30-30.2)
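With one table per account per day, purging can be a DROP TABLE per expired table instead of a large DELETE. A minimal sketch of picking those tables, assuming the name layout shown earlier (two-digit year plus day of year) and a hypothetical retention window; it only builds the statements and does not run them.

```ruby
require "date"

# Hypothetical helper: emit DROP TABLE for tables older than the retention
# window. The name layout and the retention value are assumptions.
TABLE_NAME = /\Ats_(\d+)_(\d{2})(\d{3})_(\w+)\z/

def purge_statements(table_names, today, retention_days)
  statements = table_names.map do |name|
    match = TABLE_NAME.match(name)
    next unless match # skip tables that do not follow the naming scheme

    table_date = Date.ordinal(2000 + match[2].to_i, match[3].to_i)
    "DROP TABLE `#{name}`;" if (today - table_date) > retention_days
  end
  statements.compact
end

puts purge_statements(%w[ts_72_13221_1h ts_72_13100_1h], Date.new(2013, 8, 9), 30)
```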

http://www.flickr.com/photos/exalthim/2261294871/

http://www.flickr.com/photos/davidmonro/8331755849/

http://www.flickr.com/photos/heliocentric/1571127347/

http://www.flickr.com/photos/aigle_dore/6225535459/

G R O W I N G P A I N S

http://www.flickr.com/photos/aigle_dore/5626285743/

M U L T I P L E P O I N T S O F F A I L U R E

• Single shard slows down

• App servers wait for response

• DB connection pool becomes full

• Site goes down

http://www.flickr.com/photos/boston_public_library/8204384670/

S H A R D G U A R D

• Monitor all databases

• Identify shard status:

• Bad? Mark as “wedged”

• Good? Clear “wedged” flag

• ShardData checks status!
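The "wedged" flag works like a circuit breaker: callers check it before touching a shard, so one slow database fails fast instead of filling the connection pool. A rough sketch under that reading; `ShardStatus`, `ShardWedgedError`, and `with_shard` are hypothetical names, and the monitoring side is left out.

```ruby
# Hypothetical sketch of the "wedged" flag. A monitor marks or clears
# shards; callers check the flag before issuing a query.
class ShardWedgedError < StandardError; end

class ShardStatus
  def initialize
    @wedged = {}
    @lock = Mutex.new
  end

  def mark_wedged(shard)
    @lock.synchronize { @wedged[shard] = true }
  end

  def clear_wedged(shard)
    @lock.synchronize { @wedged.delete(shard) }
  end

  def wedged?(shard)
    @lock.synchronize { @wedged.key?(shard) }
  end

  # Run the block only when the shard is healthy; otherwise fail fast.
  def with_shard(shard)
    raise ShardWedgedError, "shard #{shard} is wedged" if wedged?(shard)

    yield
  end
end
```

A caller would wrap each shard query in `with_shard(n) { ... }` and render a degraded page when `ShardWedgedError` is raised.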

http://www.flickr.com/photos/mac_filko/5486980804/

S T A B I L I T Y A N D P E R F O R M A N C E

• Degraded performance

• New Accounts => Shard 9!

• Old accounts remain as-is

http://www.flickr.com/photos/ejpphoto/7823027272/

D A T A C O L L E C T I O N

• Rails isn’t great for data collection

• Ruby isn’t great either…

• Rewritten in Java using Jetty

http://www.flickr.com/photos/autograt/224540606/

C A C H E I S K I N G

• Buffered, not queued

• RAM is cheaper than I/O

• Get creative with batch processing

http://www.flickr.com/photos/epsos/8474532085/

I N S E R T I N T O ( S E L E C T …

• Select rows and re-process

• Cache last hour in Java’s Heap

• Write a journal and post-process it

http://www.flickr.com/photos/esoteric_13/4741001804/

R E A D / W R I T E P R O B L E M

• Sequential Inserts

• Batched in 5k chunks

• Optimize for Throughput

• Must complete < 1 minute
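Batching in 5k chunks can be sketched as slicing the buffered rows and building one multi-row INSERT per slice. The table and column names below are made up, and the values are assumed to be integers, so no quoting or escaping is shown.

```ruby
# Hypothetical sketch: one multi-row INSERT per 5,000 buffered rows.
# Table and column names are illustrative; values must be integers.
BATCH_SIZE = 5_000

def batched_inserts(table, rows, batch_size = BATCH_SIZE)
  rows.each_slice(batch_size).map do |chunk|
    values = chunk.map do |metric_id, ts, value|
      "(#{Integer(metric_id)}, #{Integer(ts)}, #{Integer(value)})"
    end
    "INSERT INTO `#{table}` (metric_id, ts, value) VALUES #{values.join(', ')};"
  end
end

rows = (1..12_000).map { |i| [i, 1_376_000_000, i * 2] }
puts batched_inserts("ts_72_13221_1m", rows).size
```

Fewer, larger statements trade per-row latency for throughput, which matches the goal stated on the slide.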

R E A D / W R I T E P R O B L E M

• Scattered Reads

• Optimized for Latency

• Unique Covering Indexes

M O V E T O H A R D W A R E

• Instant performance!

• Just add…

• Datacenter - Chicago, US

• Servers - Dell

• Storage - Direct Attached

• Time - About 6 months

http://www.flickr.com/photos/zebble/9621007/

S P I N N I N G R U S T

• Dell MD1200 shelves

• 8 Disks per shelf

• RAID 5 virtual disk

• Dedicated Hot-spare

http://www.flickr.com/photos/walkn/5472536812/

T H E G R E A T E X P A N S E

• MD1200s support 12 disks

• Add four more!

• Online RAID expansion

http://www.flickr.com/photos/aigle_dore/5853807037/

# F A I L

• “On-line” expansion, not so much

• Added second 4 disk RAID 5

• LVM Concatenation for space

http://www.flickr.com/photos/fireflythegreat/2845637227/

N E E D M O R E C A P A C I T Y

• Tight on disk space

• Performance not an issue

• New Accounts => Shard 10!

• Old Accounts as-is

http://www.flickr.com/photos/seandreilinger/6289721616/

S H A R D P I T F A L L S

http://www.flickr.com/photos/21206761@N00/469110140/

M I G R A T I O N P R O B L E M

• Accounts cannot move

• Not all tables have the shard key

• Rails defaults to auto-increment IDs

• Massive primary key collisions

• Punt and move the metrics

http://www.flickr.com/photos/tzafrir/125380911/

B R E A K I N G U P I S H A R D T O D O

• Agent Databases

• Metadata / Notes / Errors

• Timeslice Databases

• Time-series metric data

• 1 Minute and 1 Hour resolution

http://www.flickr.com/photos/rsepulveda/4275236049/

R E S O U R C E P O O L S

• Distributed by Shard Key

• Distribution can CHANGE

• Lookup table, not hash

• Data can be MOVED
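A lookup table is what lets distribution change: new accounts can be pointed at the newest shard while existing accounts stay put, and a single account can be moved without rehashing anyone else. A minimal in-memory sketch; `ShardDirectory` and its methods are hypothetical, and a real directory would live in a database.

```ruby
# Hypothetical sketch of a lookup-table directory. Unlike
# account_id % shard_count, the explicit mapping survives adding shards.
class ShardDirectory
  attr_accessor :default_shard

  def initialize(default_shard)
    @default_shard = default_shard # where new accounts land
    @assignments = {}              # account_id => shard number
  end

  # Existing accounts keep their shard; unseen accounts get the default.
  def shard_for(account_id)
    @assignments[account_id] ||= @default_shard
  end

  # Reassign one account after its data has been copied.
  def move(account_id, new_shard)
    @assignments[account_id] = new_shard
  end
end

directory = ShardDirectory.new(9)
directory.shard_for(1)        # account 1 lands on shard 9
directory.default_shard = 10  # new accounts now go to shard 10
puts directory.shard_for(2)
puts directory.shard_for(1)
```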

http://www.flickr.com/photos/dclark3996/4971906528/

B A C K U P S

• Custom mysqldump wrapper

• Based on business need

• Backup per table

• Ignore tables to be purged
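One way a per-table wrapper might plan its work: one `mysqldump` invocation per table, skipping tables that are about to be purged. The database name, output directory, purge list, and the `--single-transaction` flag are assumptions for illustration, not the wrapper described in the talk; the commands are built as strings and not executed.

```ruby
require "shellwords"

# Hypothetical sketch: build one mysqldump command per table, skipping
# tables scheduled for purging. Nothing is executed here.
def backup_commands(database, tables, purge_soon, out_dir)
  (tables - purge_soon).map do |table|
    target = File.join(out_dir, "#{table}.sql")
    "mysqldump --single-transaction #{database.shellescape} " \
      "#{table.shellescape} > #{target.shellescape}"
  end
end

puts backup_commands("metrics", %w[ts_72_13221_1h ts_72_13100_1h],
                     %w[ts_72_13100_1h], "/backups")
```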

http://www.flickr.com/photos/usdagov/6896218334/

E V O L U T I O N

http://www.flickr.com/photos/pfsullivan_1056/3485953405/

S S D R E V O L U T I O N

• 600GB Intel 320 SSDs

• Dell MD1220 Direct Attached shelf

• Disks are no longer the bottleneck

• Inserts in Read-optimized order are “fast enough”

Y O U C A N U S E S S D W I T H D A T A B A S E S

• 6 of 420 drives RMA’d

• March 2012 to Aug 2013

• Average 180TB lifetime writes

• 91% wear remaining
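The slide's numbers support a quick back-of-the-envelope check. The sketch below assumes the write and wear figures are per-drive averages and that wear grows linearly with bytes written, which is a simplification; the function names are made up.

```ruby
# Back-of-the-envelope arithmetic from the slide's figures.
# Assumption: wear grows linearly with terabytes written.
def rma_rate(returned, total)
  returned.to_f / total
end

def projected_endurance_tb(written_tb, wear_remaining)
  written_tb / (1.0 - wear_remaining)
end

puts format("RMA rate: %.1f%%", rma_rate(6, 420) * 100)
puts format("Projected endurance: ~%d TB per drive",
            projected_endurance_tb(180, 0.91).round)
```

Under those assumptions, 6 of 420 drives is roughly a 1.4% return rate over about 17 months, and 180TB written at 9% wear extrapolates to about 2,000TB of writes per drive.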

http://www.flickr.com/photos/joeshlabotnik/3584172834/

R E D U N D A N T A R R A Y O F E X P E N S I V E D I S K S

• Rebuilds under load > 4 hours

• Migrated to RAID 60

• 2 x 12 disk span
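For a sense of what the RAID 60 layout costs in capacity: each RAID 6 span gives up two disks to parity, so two 12-disk spans leave 20 disks of usable space. The 600GB drive size below is an assumption carried over from the earlier SSD slide.

```ruby
# Usable capacity of RAID 60: each RAID 6 span loses two disks to parity.
# The 600 GB drive size is an assumption taken from the SSD slide.
def raid60_usable_gb(spans, disks_per_span, disk_gb)
  spans * (disks_per_span - 2) * disk_gb
end

puts raid60_usable_gb(2, 12, 600)
```

Each span tolerates two failed disks, which is one reason dedicated hot-spares become less necessary.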

• Ditch the Hot-spares

http://www.flickr.com/photos/mbk/27640225/

X F S T U N I N G

• mkfs.xfs -s size=4096

• Mount options

• noatime

• nobarrier

• inode64

• logbsize=256k

http://www.flickr.com/photos/rocketlass/5169004165/

S H A R D G U A R D P A R T D E U X

• Protect all the things!

• Kill UI queries over 75 seconds

• Kill background queries over 1 hour

• Yes, all of them

• No really, kill them, now
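The kill policy can be sketched as a filter over rows shaped like `SHOW PROCESSLIST` output. How UI and background queries are told apart is not stated in the talk; distinguishing them by MySQL user name, as below, is an assumption, and the user names are hypothetical. The sketch emits `KILL QUERY` statements rather than running them.

```ruby
# Hypothetical sketch: choose KILL targets from processlist-style rows.
# Telling UI from background work by user name is an assumption.
LIMITS = { "ui" => 75, "background" => 3600 }.freeze # seconds

def kill_statements(processlist)
  over_limit = processlist.select do |row|
    limit = LIMITS[row[:user]]
    limit && row[:command] == "Query" && row[:time] > limit
  end
  over_limit.map { |row| "KILL QUERY #{Integer(row[:id])};" }
end

rows = [
  { id: 11, user: "ui", command: "Query", time: 80 },
  { id: 12, user: "ui", command: "Query", time: 10 },
  { id: 13, user: "background", command: "Query", time: 4000 },
  { id: 14, user: "background", command: "Sleep", time: 9000 }
]
puts kill_statements(rows)
```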

http://www.flickr.com/photos/chiky/7194089194/

I F Y O U D O N ’ T B E L I E V E M E …

• Delayed Job

• Long running background query

• InnoDB History List Traversal

T O I N F I N I T Y A N D B E Y O N D

http://www.flickr.com/photos/temma2/1149223191/

H A R D W A R E V 2

• Dell R620

• 2 x Intel E5-2690 @ 2.90GHz

• 96GB RAM

• MD1220 Storage Shelf

• 800GB Intel SSD S3500

http://www.flickr.com/photos/tnarik/2590037637/

C O N T I N U O U S I M P R O V E M E N T

• EXT4 / ZFS / XFS

• RAID Card vs HBA

• Percona Server 5.6

• Multiple MySQL Instances

• Databases per Service

http://www.flickr.com/photos/shawnclover/8555834230/

JOIN THE TEAM NewRelic.com/jobs
