QconSP 2010
Distributed Systems
scalability and high availability
Renato Lucindo - lucindo.github.com - @rlucindo
Renato Lucindo
Call me Lucindo (or Linus)
2002 - Bachelor in Computer Science
2007 - M.Sc. Computer Science (Combinatorial Optimization)
7+ years developing Distributed Systems
My default answer: "I don't know."
Agenda
Scalability
High Availability
Problems
Tips and Tricks
Learning More
Distributed Systems
Multiple computers that interact with each other over a network to achieve a common goal
Purpose
  Scalability
  High availability
source: http://www.cnds.jhu.edu/
Scalability
A system's ability to gracefully handle a growing amount of work
Scale up (vertical)
  Add resources to a single node
  Improve existing code to handle more work
Scale out (horizontal)
  Add more nodes to a system
  Linear (or better) scalability
Scalability - Vertical
Add: CPU, Memory, Disks (bigger box)
Handling more simultaneous:
  Connections
  Operations
  Users
Choose a good I/O and concurrency model
  Non-blocking I/O
  Asynchronous I/O
  Threads (single, pool, per-connection)
  Event handling patterns (Reactor, Proactor, ...)
Memory model?
  STM (software transactional memory)
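As a rough illustration of the Reactor pattern over non-blocking I/O, the sketch below uses Python's standard `selectors` module. The `Reactor` class, the `demo` function, and the use of `socket.socketpair()` as stand-in client connections are assumptions made for this example and are not taken from the talk.

```python
import selectors
import socket


class Reactor:
    """Minimal Reactor: wait for readiness events, dispatch to handlers."""

    def __init__(self):
        self.selector = selectors.DefaultSelector()

    def register(self, sock, handler):
        sock.setblocking(False)  # never block the single event thread
        self.selector.register(sock, selectors.EVENT_READ, handler)

    def unregister(self, sock):
        self.selector.unregister(sock)

    def run_once(self, timeout=1.0):
        # Block until at least one registered socket is readable.
        events = self.selector.select(timeout)
        for key, _mask in events:
            key.data(key.fileobj)  # key.data holds the handler
        return len(events)


def demo():
    reactor = Reactor()
    received = []
    # Each socketpair stands in for one accepted client connection.
    pairs = [socket.socketpair() for _ in range(3)]

    def on_readable(sock):
        data = sock.recv(1024)
        received.append(data.decode())
        sock.sendall(data.upper())  # tiny reply, fits the socket buffer

    for server_side, _client in pairs:
        reactor.register(server_side, on_readable)
    for i, (_server, client) in enumerate(pairs):
        client.sendall(f"hello {i}".encode())

    while len(received) < len(pairs):
        reactor.run_once()

    replies = [client.recv(1024).decode() for _server, client in pairs]
    for server_side, client in pairs:
        reactor.unregister(server_side)
        server_side.close()
        client.close()
    return sorted(received), replies
```

One thread serves all three connections here; a slow client only delays its own handler being invoked, not the others.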
Scalability - Vertical
Careful with numbers
  Requests per second
  # of Connections
  Simultaneous operations
Event handling
  Think front-end
    Slow connections/clients
  It's slower than other options
  In doubt, go async
Back-end
  Thread pool (thread per-connection)
  No events
  Process per-core
Scalability - Horizontal
Add nodes to handle more work
Front-end
  Straightforward
  Stateless
Back-end
  Master/Slave(s)
  Partitioning
    DHT
    Volatile Index
Scalability - Horizontal
Master/Slave
  Write on single Master
  Read on Slaves (one or more)
  Scales reads
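The read/write split can be sketched with in-memory dictionaries. `ReplicatedStore` and its fields are hypothetical names for this example, and replication is done synchronously here only to keep the sketch short; real master/slave setups commonly replicate asynchronously, so slaves can lag behind the master.

```python
import itertools


class ReplicatedStore:
    """Toy master/slave store: one writable master, N read-only slaves."""

    def __init__(self, slave_count=2):
        self.master = {}
        self.slaves = [{} for _ in range(slave_count)]
        # Round-robin over slave indexes to spread the read load.
        self._next_slave = itertools.cycle(range(slave_count))
        self.reads_per_slave = [0] * slave_count

    def write(self, key, value):
        # Every write goes through the single master...
        self.master[key] = value
        # ...and is then copied to each slave (synchronously in this sketch).
        for slave in self.slaves:
            slave[key] = value

    def read(self, key):
        # Reads never touch the master, so adding slaves scales reads.
        index = next(self._next_slave)
        self.reads_per_slave[index] += 1
        return self.slaves[index].get(key)
```

Note that the master remains the limit for write throughput, which is why this layout scales reads only.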
Scalability - Horizontal
Partitioning (Sharding)
  Distribute data across nodes
  Generally involves data de-normalization
Where is some specific data?
  Master Index
  Hash (DHT, Consistent Hashing)
  Volatile Index
Joins done at the application level
NoSQL friendly
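One way to answer "where is some specific data?" with a hash is a consistent-hash ring. The sketch below is a minimal version under assumptions of this example: the `HashRing` name, SHA-256 as the hash function, and 64 virtual nodes per physical node are all illustrative choices.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._points = []  # sorted hash positions on the ring
        self._owner = {}   # hash position -> node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        # Each node is placed on the ring many times to even out the load.
        for i in range(self.vnodes):
            point = self._hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def node_for(self, key):
        if not self._points:
            raise LookupError("ring is empty")
        # The first point clockwise from the key's hash owns the key.
        index = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[index]]
```

The property that matters for sharding: when a node joins, the only keys that change owner are the ones the new node takes over, so most data stays where it is.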
Scalability - Horizontal
Volatile Index: build and maintain data index as cached information (all clients)
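A volatile index can be read as "treat the key-to-node map as a cache that may be wrong at any time". The sketch below is one possible interpretation, not the speaker's implementation: `Node`, `VolatileIndexClient`, and the ask-every-node discovery step are assumptions of this example.

```python
class Node:
    """Stand-in for a storage node holding part of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class VolatileIndexClient:
    """Client that keeps the key -> node index purely as cached information."""

    def __init__(self, nodes):
        self.nodes = {node.name: node for node in nodes}
        self.index = {}       # key -> node name; may be stale at any time
        self.discoveries = 0  # how often we had to ask every node

    def _discover(self, key):
        # Slow path: ask each node, then remember who answered.
        self.discoveries += 1
        for node in self.nodes.values():
            if key in node.data:
                self.index[key] = node.name
                return node
        self.index.pop(key, None)
        return None

    def get(self, key):
        name = self.index.get(key)
        node = self.nodes.get(name) if name else None
        if node is None or key not in node.data:
            # Index entry missing or stale: rebuild it on demand.
            node = self._discover(key)
        return node.data[key] if node else None
```

Because the index is only a cache, losing it costs extra lookups but never correctness.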
High Availability
"Processes, as well as people, die"
Handle hardware and software failures
Eliminate single points of failure
  Redundancy
  Failover
  Replicas
High Availability - Failover/Redundancy
High Availability - Replicas
Two or more copies of the same data
Replica granularity
  From node replica to "row" replica
Load balancing
Write concurrency
Replica updates
Key for high availability and root of several problems
Problems
Problems - CAP Theorem
Consistency: all operations (reads/writes) yield a globally consistent state
Availability: all requests (on non-failed servers) must have a response
Partition Tolerance: nodes may not be able to communicate with each other.
Pick Two
Problems - CAP Theorem
C + A: network problems might stop the system
Examples:
  Oracle RAC, IBM DB2 Parallel
  RDBMS (Master/Slave)
  Google File System
  HDFS (Hadoop)
Problems - CAP Theorem
C + P: clients can't always perform operations
Examples:
  Distributed lock systems: Chubby, ZooKeeper
  Paxos protocol (consensus)
  Bigtable, HBase
  Hypertable
  MongoDB
Problems - CAP Theorem
A + P: clients may read inconsistent (old or undone) data
Examples:
  Amazon Dynamo
  Cassandra
  Voldemort
  CouchDB
  Riak
  Caches
Problem with CAP Theorem
In practice, C + A and C + P systems are the same.
  C + A: not tolerant of network partitions
  C + P: not available when a network partition occurs
Big problem: network partition
  Not so big (how often does it happen?)
Pick two
  Availability
  Consistency
The forgotten: Latency
  Or, how long does the system wait before considering the network partitioned?
Problems - Real World
Every component may fail:
  Network failure
  Hardware failure
  Electricity
  Natural disasters
  Code failure
Tips & Tricks
Tips & Tricks - Pyramid
Capacity (connections, operations, ...) Pyramid
Tips & Tricks - Reply Fast
FAIL Fast
Break complex requests into smaller ones
Use timeouts
No transactions
Be aware that a single slow operation or component can generate contention
Self-denial attack
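"Use timeouts" and "fail fast" can be sketched with the standard `concurrent.futures` module. `call_with_timeout`, `slow_backend`, and `fast_backend` are hypothetical names for this example. One caveat applies: the timeout only stops the caller from waiting; the worker thread keeps running until the slow call returns, which is part of why a single slow component can still cause contention.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Shared pool; a bounded size also caps how many slow calls can pile up.
_pool = ThreadPoolExecutor(max_workers=4)


def call_with_timeout(func, timeout_seconds, *args):
    """Run func in the pool and give up waiting after timeout_seconds."""
    future = _pool.submit(func, *args)
    try:
        return future.result(timeout=timeout_seconds)
    except FutureTimeout:
        future.cancel()  # only helps if the call has not started yet
        raise TimeoutError(f"no reply within {timeout_seconds}s")


def slow_backend():
    time.sleep(0.5)  # stand-in for a stuck dependency
    return "late"


def fast_backend():
    return "ok"
```

A caller would typically catch `TimeoutError` and reply with an error or a degraded answer right away instead of holding the client connection open.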
Tips & Tricks - Cache
Cache: component location, data, DNS lookups, previous requests, etc.
Use negative cache for failed requests (low expiration)
Don't rely on cache
  Your system must work with no cache
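A negative cache with a low expiration might look like the sketch below. The `NegativeCache` class, its default TTL values, and the injectable `clock` parameter are assumptions of this example; the loader is whatever slow lookup the system would otherwise repeat.

```python
import time


class NegativeCache:
    """TTL cache that also remembers failures, but only briefly."""

    def __init__(self, loader, ttl=300.0, negative_ttl=5.0, clock=time.monotonic):
        self.loader = loader
        self.ttl = ttl                    # lifetime of successful lookups
        self.negative_ttl = negative_ttl  # much shorter lifetime for failures
        self.clock = clock
        self._entries = {}  # key -> (expires_at, ok, value_or_error_text)

    def get(self, key):
        now = self.clock()
        entry = self._entries.get(key)
        if entry and entry[0] > now:
            _expires_at, ok, payload = entry
            if ok:
                return payload
            # Cached failure: fail fast without hitting the backend again.
            raise LookupError(f"cached failure for {key!r}: {payload}")
        try:
            value = self.loader(key)
        except Exception as error:
            self._entries[key] = (now + self.negative_ttl, False, str(error))
            raise
        self._entries[key] = (now + self.ttl, True, value)
        return value
```

If the cache is empty or dropped entirely, every `get` simply falls through to the loader, so the system still works without it.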
Tips & Tricks - Queues
Easy way to add asynchronous processing and decouple your system.
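An in-process version of the idea, using the standard `queue` and `threading` modules, is sketched below. The `start_worker` helper and the squaring "work" are placeholders for this example; a production system would more likely put a message broker between separate processes.

```python
import queue
import threading


def start_worker(jobs, results):
    """Start one background consumer that drains the jobs queue."""

    def run():
        while True:
            job = jobs.get()
            try:
                if job is None:  # sentinel value: stop the worker
                    return
                results.append(job * job)  # stand-in for slow processing
            finally:
                jobs.task_done()

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker
```

The producer only pays the cost of `jobs.put(...)` and can reply to its client immediately; a bounded queue (`queue.Queue(maxsize=...)`) additionally applies back-pressure when the consumer falls behind.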
Tips & Tricks - DNS
Tips & Tricks - Logs
Log everything
Use several log levels
On every log message
  User
  Request host
  Component involved
  Version
  Filename and line
If log level not enabled do not process log message
Avoid lookup calls (gettimeofday)
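With Python's standard `logging` module, "do not process the message if the level is not enabled" can be sketched as follows. The logger name `orders`, the `describe_request` helper, and the `LOG_FORMAT` string are assumptions of this example.

```python
import logging

# Hypothetical component name; the format adds file name and line number.
logger = logging.getLogger("orders")
LOG_FORMAT = "%(asctime)s %(levelname)s %(name)s %(filename)s:%(lineno)d %(message)s"


def configure(level=logging.INFO):
    """Optional setup: attach a stderr handler using LOG_FORMAT."""
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    logger.addHandler(handler)
    logger.setLevel(level)


def describe_request(request):
    # Stand-in for an expensive serialization step.
    describe_request.calls += 1
    return ", ".join(f"{k}={v}" for k, v in sorted(request.items()))


describe_request.calls = 0


def handle(request):
    # Guard expensive argument construction behind the level check.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("request details: %s", describe_request(request))
    # %-style arguments are only formatted if the record is emitted.
    logger.info("handled request for user %s from %s",
                request["user"], request["host"])
```

Passing arguments separately instead of pre-formatting the string already avoids most of the cost; the explicit `isEnabledFor` guard matters when computing the arguments is itself expensive.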
Tips & Tricks - Domino Effect
Make sure your load balancer won't overload components
Use smart algorithms
  Load Balance
  Resource Allocation
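One example of a "smart" algorithm is a capacity-aware, least-utilized balancer that sheds load instead of pushing a backend past its limit. The talk does not name a specific algorithm, so `LeastLoadedBalancer`, `OverloadedError`, and the per-backend capacity numbers are assumptions of this sketch.

```python
class OverloadedError(Exception):
    """Raised when every backend is already at its capacity."""


class LeastLoadedBalancer:
    """Send each request to the least-utilized backend; never exceed capacity."""

    def __init__(self, capacities):
        # capacities: backend name -> max simultaneous requests (must be > 0)
        self.capacities = dict(capacities)
        self.in_flight = {name: 0 for name in self.capacities}

    def acquire(self):
        candidates = [name for name in self.capacities
                      if self.in_flight[name] < self.capacities[name]]
        if not candidates:
            # Shed load here rather than overloading a component downstream.
            raise OverloadedError("all backends at capacity")
        chosen = min(candidates,
                     key=lambda name: self.in_flight[name] / self.capacities[name])
        self.in_flight[chosen] += 1
        return chosen

    def release(self, name):
        self.in_flight[name] -= 1
```

Rejecting a request at the balancer is cheap; letting it through to a saturated backend can slow that backend for everyone and start the domino effect.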
Tips & Tricks - (Zero) Configuration
No configuration files
Use good defaults
Auto-discovery (multicast, gossip, ...)
Make everything configurable
  Administrative command
  No need to stop for changes
Automatic self-adjustment when possible
Tips & Tricks - STOP Test
With your system under load: kill -STOP <component>
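The same test can be scripted. The sketch below assumes a POSIX system and uses a sleeping child process as a stand-in component; `freeze`, `resume`, and `demo` are hypothetical helper names. A stopped process keeps its sockets open but answers nothing, so callers can only notice the problem through timeouts.

```python
import os
import signal
import subprocess
import sys


def freeze(pid):
    """Equivalent of `kill -STOP <pid>`: suspend without closing sockets."""
    os.kill(pid, signal.SIGSTOP)


def resume(pid):
    """Equivalent of `kill -CONT <pid>`: let the process run again."""
    os.kill(pid, signal.SIGCONT)


def demo():
    # Stand-in component: a child process that just sleeps.
    component = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"])
    try:
        freeze(component.pid)
        # The process is suspended, not dead: it has no exit status yet.
        still_running = component.poll() is None
        resume(component.pid)
        return still_running
    finally:
        component.kill()
        component.wait()
```

In a real STOP test, the interesting observation is what the rest of the system does between `freeze` and `resume`.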
Tips & Tricks - Know your tools
load average (uptime)
stats tools
  vmstat
  iostat
  mpstat
  tcpstat, tcprstat, etc.
tcpdump, nc, netstat
tuning
  /proc/net/*
  ulimit
  sysctl
oprofile
debugging tools (gdb, valgrind)
...
Tips & Tricks - Count
Count everything
  Connections
  Operations
  Failures
  Successes
  Request times (granularity)
    Total, average, standard deviation
Monitor counters
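Total, average, and standard deviation can be kept as running counters without storing every sample. The sketch below uses Welford's online algorithm, which is a choice made for this example rather than something the talk prescribes; `RequestStats` is a hypothetical name, and the standard deviation shown is the population form.

```python
import math


class RequestStats:
    """Running count, total, average and standard deviation of request times."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self._mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def record(self, seconds):
        # Welford's update: constant memory, one pass, numerically stable.
        self.count += 1
        self.total += seconds
        delta = seconds - self._mean
        self._mean += delta / self.count
        self._m2 += delta * (seconds - self._mean)

    @property
    def average(self):
        return self._mean

    @property
    def std_dev(self):
        # Population standard deviation; 0.0 before any sample is recorded.
        return math.sqrt(self._m2 / self.count) if self.count else 0.0
```

A high standard deviation next to a healthy-looking average is often the first visible sign of a component that is occasionally very slow.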
Tips & Tricks - Stability Patterns
Use Timeouts
Circuit Breaker
Bulkheads
Steady State
Fail Fast
Handshaking
Test Harness
Decoupling Middleware
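Of these patterns, the Circuit Breaker is the easiest to show in a few lines. The sketch below is a simplified version under assumptions of this example: the `CircuitBreaker` and `CircuitOpenError` names, the failure threshold, the cooldown, and the injectable `clock` are all illustrative.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying it."""


class CircuitBreaker:
    """Stop calling a failing dependency for a while, then retry once."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open, failing fast")
            # Cooldown elapsed: half-open, let one trial call through.
        try:
            result = func(*args)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the circuit is open, callers get an immediate error instead of waiting on a dependency that is already known to be failing, which combines naturally with timeouts and fail-fast.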
Tips & Tricks - Don't Panic!
Learning More - Books
TCP/IP Illustrated, Vol. 1: The Protocols
Learning More - Books
Unix Network Programming, Vol. 1: The Sockets Networking API
Learning More - Books
Pattern Oriented Software Architecture, Vol. 2
Learning More - Books
Release It!
Learning More - Papers
The Google File System
Bigtable: A Distributed Storage System for Structured Data
Dynamo: Amazon's Highly Available Key-Value Store
PNUTS: Yahoo!'s Hosted Data Serving Platform
MapReduce: Simplified Data Processing on Large Clusters
Towards robust distributed systems
Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services
BASE: An Acid Alternative
Looking up data in P2P systems
Thanks!!! Questions?
lucindo.github.com - @rlucindo