
Scalability without going nuts


James Cox Chief squirrel, smokeclouds

[email protected]


what this is ( just an overview )

This is an overview of some of the areas I’ve focused on when investigating scalability.

There are no easy answers - but hopefully these ideas will give you some directions for your own apps.

From something small comes something big - I just made that up. We’re going to have fun with making our apps work when there is more than one user by looking at code, ops and more.

Particularly, we’ll try and wade through some language improvements/tips, some infrastructure planning tips, stuff to make MySQL better, and so on.

We’ll also touch on proxy/app servers, file shares and some questions at the end, if we get that far.

I hope you’re all comfortable, mobile phones are all off as I know we’re all busy people.

Right. Let’s begin.

The Language Performance Race

Rails isn’t fastest ( assembler is )

Rails isn’t fastest - that’s OK.

Life is about tradeoff and compromise.

We pick Rails because of its ease and efficiency to code - and we can refactor, scale and improve later. Or just buy more servers.

Refer to recent rants on Ruby perf etc...

Planning Trumps All ( even donald )

A bit of planning and process mapping will do more for your ability to scale than any later improvements, usually ruling out a rewrite if you’ve got the core of the project in the right direction.

Analyze ( don’t guess )

Once you have your planning arranged, don’t guess as to where performance is struggling - actually try and get some numbers to benchmark against. To learn more go watch the excellent peepcast httperf tutorial - which I almost played instead of doing this talk!
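For a quick number from inside Ruby itself, the standard library's Benchmark module is one option. This is only a sketch: `time_block` is a made-up helper name, and the timed block is a stand-in for whatever code path you suspect. It complements an external load tool like httperf rather than replacing it.

```ruby
require 'benchmark'

# Hypothetical helper: run the given block `runs` times and report
# the mean and worst wall-clock time in seconds.
def time_block(runs)
  samples = Array.new(runs) { Benchmark.realtime { yield } }
  total = samples.inject(0.0) { |sum, s| sum + s }
  { :runs => runs, :mean => total / runs, :max => samples.max }
end

# Stand-in workload; replace with the code path you actually suspect.
stats = time_block(5) { (1..10_000).map { |i| i * i } }
puts "mean %.6fs, max %.6fs over %d runs" % [stats[:mean], stats[:max], stats[:runs]]
```

Several runs are taken because a single timing is easily skewed by whatever else the box is doing.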

Speed Perceived ( the easiest way )

There is always the “glamour” of making a high performance app which can handle all the requests you can possibly imagine.

Not everyone can be a LiveJournal and actually make their servers push 98MB/s on their 100MB network cards.

Find the areas of the app which the user base perceives to be the slowest: it may be that you can make your app appear ‘faster’ by improving the UI/UX.

Work on these areas and then radiate outwards: it’s easier to refactor in chunks than as a whole.

(tangent: SOA architecture is not a bad idea....)

Focus on your app ( it’s usually cheaper )

So how can we make our app faster?

There are a number of techniques we can employ to make our apps better.

Now to discuss some of them.....

Improving ActiveRecord:

:select, :limit, :offset ( take what you need )

- You don’t always need that data - this problem hides itself when you are first building - but as you add data, no limit/offset means you often end up grabbing too many rows.
- This is particularly important when using TCP connections to your database.
- Oftentimes an app is waiting for the data to transfer, so limit it to just the stuff you need.
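As a rough sketch of the idea, the helper below builds the options for one page of results. `page_options` is a made-up name, and the `Article.find` call in the comment assumes the hash-style finder options named on the slide; the helper itself is plain Ruby.

```ruby
# Hypothetical helper: turn a 1-based page number into :limit/:offset
# options, optionally restricting the columns fetched with :select.
def page_options(page, per_page = 20, columns = nil)
  page = 1 if page < 1
  options = { :limit => per_page, :offset => (page - 1) * per_page }
  options[:select] = columns.join(', ') if columns
  options
end

opts = page_options(3, 20, %w[id title])
# Assumed usage inside a Rails app of this era (Article is a placeholder model):
#   Article.find(:all, opts)
puts opts.inspect
```

Page 3 at 20 rows per page asks the database for rows 41 to 60 and only the two named columns, instead of the whole table.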

Improving ActiveRecord:

:include => :association ( keep it eager )

OK so eager loading changes your query from N+1 (where n is the number of rows multiplied by associations) to one query.

Under the hood, this works by causing a LEFT OUTER JOIN - SQL for joining the tables together. Outer joins work by including rows even when one half of the join is NULL.

High query counts are bad because they cause queueing for read/write on the table.
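To make the query counts concrete, here is a toy simulation in plain Ruby. `FakeDB` is invented for illustration and is not ActiveRecord; it only counts how many times it is asked for data, with `posts_with_authors` standing in for the single joined query.

```ruby
# Toy in-memory "database" that counts how many queries it receives.
# FakeDB and its methods are illustrative only, not a real library.
class FakeDB
  attr_reader :query_count

  def initialize(posts, authors)
    @posts = posts
    @authors = authors
    @query_count = 0
  end

  def all_posts
    @query_count += 1
    @posts
  end

  def author(id)
    @query_count += 1
    @authors[id]
  end

  # Stand-in for the single joined query that eager loading issues.
  def posts_with_authors
    @query_count += 1
    @posts.map { |post| post.merge(:author => @authors[post[:author_id]]) }
  end
end

authors = { 1 => 'Ann', 2 => 'Bob' }
posts = [{ :id => 1, :author_id => 1 },
         { :id => 2, :author_id => 2 },
         { :id => 3, :author_id => 1 }]

lazy = FakeDB.new(posts, authors)
lazy.all_posts.each { |post| lazy.author(post[:author_id]) }

eager = FakeDB.new(posts, authors)
eager.posts_with_authors

puts "lazy: #{lazy.query_count} queries, eager: #{eager.query_count} query"
# prints "lazy: 4 queries, eager: 1 query"
```

With three rows the difference is small; with a few thousand rows per page view it is the difference between one round trip and thousands.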

Improving ActiveRecord:

Model < CachedModel ( cache first, ask questions later )

So you’ve limited your query to the least amount of data necessary - or you’re just looking up a single row. What next?

Cache your data in a fast retrieval store such as memcached. Nice ActiveRecord extension for this (even if it is a bit hairy).

n.b. this only works with simple ID based lookups - for anything complex you need to use Cache.set and Cache.get
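The get-then-set pattern behind this can be sketched in a few lines. `TinyCache` below is an in-process Hash standing in for memcached, and its method names are made up for the example; it is not the CachedModel or memcache-client API.

```ruby
# Minimal in-process stand-in for a memcached client; a real client
# would talk to a server over the network. Names are illustrative.
class TinyCache
  def initialize
    @store = {}
  end

  def get(key)
    @store[key]
  end

  def set(key, value)
    @store[key] = value
  end

  # Return the cached value, or run the block (the expensive lookup),
  # store its result and return it.
  def fetch(key)
    value = get(key)
    return value unless value.nil?
    set(key, yield)
  end
end

cache = TinyCache.new
lookups = 0
2.times do
  cache.fetch('user:42') { lookups += 1; { :id => 42, :name => 'james' } }
end
puts "expensive lookups performed: #{lookups}"
# prints "expensive lookups performed: 1"
```

The hard part, which this sketch leaves out, is expiring the entry when the underlying row changes.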

Improving ActiveRecord:

acts_as_cached ( built from experience )

Better alternative to CachedModel, but you have to add this as a method to a Model.

This is a bit more structured than CachedModel. Built from CNET’s chow/chowhound team.

Improving ActiveRecord:

cache_fu ( in incubation )

Better alternative to CachedModel, but you have to add this as a method to a Model.

This is a bit more structured than CachedModel. Built from CNET’s chow/chowhound team.

Improving ActiveRecord:

@var ||= Model.find(...) ( keep your code dry )

Ever do a lookup - current_user, current_page, or some other check that happens more than once in a request?

The ||= operator says: use the instance variable if it is already set, otherwise define it via the query.
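A small plain-Ruby sketch of the effect. `FakeUserStore` and `RequestContext` are invented stand-ins for a model and a controller; the only real feature on show is `||=`.

```ruby
# FakeUserStore stands in for a model class; it just counts lookups.
class FakeUserStore
  attr_reader :lookups

  def initialize
    @lookups = 0
  end

  def find(id)
    @lookups += 1
    { :id => id, :name => "user#{id}" }
  end
end

# Stand-in for a controller handling one request.
class RequestContext
  def initialize(store, user_id)
    @store = store
    @user_id = user_id
  end

  # First call runs the lookup; later calls reuse @current_user.
  def current_user
    @current_user ||= @store.find(@user_id)
  end
end

store = FakeUserStore.new
request = RequestContext.new(store, 7)
3.times { request.current_user }
puts "lookups: #{store.lookups}"
# prints "lookups: 1"
```

One caveat: if the lookup returns nil or false, `||=` will run it again on the next call, since there is nothing truthy to reuse.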

Improving ActiveRecord:

@@modulo ||= (52 % 100) ( run once, save forever )

@@ is a class variable - a quick way to store a variable for the lifetime of the app...
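Here is the slide's one-liner inside a class, with a counter added to show that the right-hand side runs only once. `Settings` and `computations` are made-up names for the example.

```ruby
# Illustrative class: the value is computed on first use and then
# shared by every instance for the life of the process.
class Settings
  @@computations = 0

  def self.computations
    @@computations
  end

  def modulo
    @@modulo ||= begin
      @@computations += 1
      52 % 100
    end
  end
end

values = [Settings.new.modulo, Settings.new.modulo]
puts "values: #{values.inspect}, computed #{Settings.computations} time(s)"
# prints "values: [52, 52], computed 1 time(s)"
```

Note that "lifetime of the app" here means the lifetime of one Ruby process: each app server process keeps its own copy and computes it once for itself.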

Improving ActionView:

template optimizer ( non-lazy views )

If you use semantic views - markaby, builder - or lots of helpers - you have to spend way too much time to parse the file to get some HTML in the end...

link_to, image_tag, form_tags - all helpers for HTML functions which are, honestly, for people who’ve gotten bored writing HTML.

During each request the view rhtml has to be parsed and delivered - this is expensive. It’s so expensive to do this parsing that, in other languages - e.g. PHP - all optimizers focus on serving up byte-compiled scripts - and this goes back to our first comment that assembler is faster.

So get your views back to the ‘compiled’ form and ditch those helpers early by optimizing your templates.

This should bring down the ‘Render’ part of the query log.

Improving ActionView:

Publish Once ( caching always wins )

You’re going to have gotten your page load time to somewhat of an optimal level by now - improving your database queries, and then pre-compiling your templates.

Now consider if you can cache your pages.

Is this a highly trafficked content website? (caching is a must) Can you get away with profile etc pages being cached till updated? (social networking site)

Improving ActionView:

caches_page: bad ( nightmare to cleanup )

caches_page is the trick used to simply write out the entire page to disk... can be tricky to keep up to date, and also hard work for a slow disk.

This also falls down if you have a loose url schema: a site I’ve hacked on had about 500MB of content, but caches_page has generated 30GB of content - why? Spiders will pervert your url schema - and cause it to generate waaaay too much content.

Improving ActionView:

<% cache(:action => 'feature', :part => 'most_read') do %>
  <%= render :partial => 'article/most_read' %>
<% end %>

Drop a fragment cache into your view and save repetitive tasks.

Doesn’t yet work with robot-coop’s memcache-client as a fast store for fragments - but there is a memcache-backed fragment store gem - e.g., extended fragment cache.

Improving Sanity:

Follow Edge ( DHH Breaks Stuff )

Tuning Up

Avoid Shared Hosting ( there’s only so much to go around )

When I was living at my family home, my brothers always used to share my stuff - clothes, shower gel, aftershave - you name it.

Same is true for server resources - everyone’s gotta share.

Not all users play nice - that crazy crawler on your box is taking up all the RAM and the spammer is getting you blacklisted.

Too many variables you can’t control - VPS software is pretty harsh for setting process limits to save the box as a whole.

Under-configured software - all packages to make it work for everyone. Low performance: designed to encourage upgrades.

New Players ( always one )

SOME VPS are getting it right - Engine Yard, Rails Machine - high-performance focused servers.

Expects trusted users - won’t cater for the low-end user.

Expensive to buy into, low availability - but often a worthwhile investment.

Multiple Servers? ( work them hard )

One server or more?

It’s great if you have the infrastructure.... but do you know how to split them up?

Setup Hot ( universe is infinite )

There’s also performance in productivity - it makes sense to mirror setups on each machine for hot-backup as well as for predictability.

Capistrano will help you with this.

8 Server Gem

Proxy/Web Static (2)
Application Servers (4)
Database Layer (2)

It’s great if you have the infrastructure.... but do you know how to split them up?

Think of the shape of a ruby - the top is a bit of a plateau, and that’s where you put static and proxy servers. You’ll want to load balance these for high availability - but generally these scale very well as they don’t do much but route traffic and serve files.

The widest part - those are your application servers, and you can grow these out to as many as you can imagine. This is your workhorse layer - everything interesting happens here. Careful you don’t have too many of these for the proxy servers - if there are so many choices for each proxy some of these can sit idle.

The bottom, hidden part is the best bit - the database layer. This is a somewhat sacred layer: not many servers can play this part at once. Ensure you put your best machines at this level. You’re going to want to see high RAM, good I/O throughput, lots of CPU power and plentiful disk space.

Playing Well Together ( there is only one sandpit )

So you’ve gotten your servers tagged up - how do you assign them tasks?

With one of our clients, we had a situation where we have a mega busy ad-server and a busy CMS sharing the same database. It made sense to break them apart onto two servers - the query stats made sense.

... but we could put the admin and the frontend app and proxy servers on the same machines -

Why? Frontend/admin work well together. Databases are heavy read/write so two busy databases will fight/queue for filesystem access.

MySQL Tuning ( feed the beast )

OK, let’s cover some tips getting MySQL to play nice.

Why MySQL over others? Mostly business reasons rather than tech - it has a nice pathway to move onto a fully supported contract when you need it.

MySQL is also on the cusp of launching a really awesome NDB cluster - this is basically a high availability memory store database which retains integrity via the standard server.

mysql> \s

mysql Ver 14.7 Distrib 4.1.19, for pc-linux-gnu (i686) using readline 4.3

Uptime: 10 hours 11 min 47 sec

Threads: 3 Questions: 10,171,505 Slow queries: 334 Opens: 224 Flush tables: 1 Open tables: 106 Queries per second avg: 277.100

This is a single machine, dual 2.4GHz Xeon processor, hyperthreaded. 2GB RAM. Linux.

Yes it is possible to get some really high performance MySQL going - you just need to get the settings right - this is trial and error (mostly).

Had over a billion queries on an uptime of 60 days, but some ‘technician’ at the datacenter rebooted the wrong box. So I can’t show that off. Shame!

# query cache considered harmful
query_cache_size=0

# key_buffer_size is the size of the buffer used for index blocks.
key_buffer_size=100M

# The maximum size of one packet.
max_allowed_packet=1M

# the length of time (in seconds) that we want to log against.
#long-query-time=3
log-slow-queries=/var/log/mysql_slow_queries

Some key variables I always have set...

Query cache is not always as useful as it seems - OK for truly unoptimized badly indexed stuff, not so good for when you need to manage the stack - think of a logging table or a user table in a social network - when the data changes more quickly than the time it takes to create and query the cache - you’re in trouble.

It was also quickly written to make MySQL 4 less slow in response to a customer request.

Buffer size - set to be as much spare RAM as you have - this is the amount of memory it’ll allocate to fit in the buffer. If it has to keep allocating, then it’ll do the sort in chunks which takes FOREVER.

The message buffer is initialised to net_buffer_length bytes, but can grow up to max_allowed_packet bytes when needed. Good if you’re passing around large objects such as images, articles, and so on - set it high and forget about it (as long as your network can cope).

ALWAYS log slow queries - and regularly check. This is your first port of call for optimizing your DB!!!

# if you use network (tcp) based connections
wait_timeout=90
net_write_timeout=180
net_read_timeout=60
max_connections=500

mysql> SHOW FULL PROCESSLIST; (for more info)

If your DB server is different to your app server, it’s important to set these. Oftentimes I’ve seen servers where app servers are queuing due to long laggy timeouts and no available connections.

It’s OK to ditch AR ( DHH won’t get upset )

Sometimes it’s just simpler to drop out and craft a very focused query, use a stored procedure or function, mysql variables.... force an index.

Just because you can’t do it in a #find doesn’t mean you shouldn’t do it. (ie, don’t sacrifice ultimate performance for manageability every time)

Good example, and not easy using standard AR - using INSERT DELAYED is great for when you don’t need to know the id of the row inserted. Good for things like logs, stats etc.

Proxy > App ( warm up the pack, the engine’s running )

Best advice right now is to use nginx as a frontend to a mongrel cluster (or two).

It’s very fast and scalable - nginx is lightweight, and can handle upstream clusters with ease, as well as use fast onboard PCRE style regex for handling different paths based on their needs.

mongrel, while not being the fastest in the pack, lets you scale out easily. Plus Zed is pretty clever, and he’ll fix stuff quickly.

Why use them? Lots of these ‘new’ http servers are more focused towards a smaller goal set - they are designed to achieve one or two things. Apache HTTPD lets you embed almost any module imaginable in the chain set. It’s clear who’s going to be faster.

Event Driven? ( don’t presume your traffic )

You can use swiftiply and evented mongrel to move away from the high cost of threads. This is useful because Rails sits in one big loop for each request - so tying up expensive threads waiting for your app to get done is not necessarily efficient. Perhaps try running it in an event loop.

Haven’t tried this yet in any kind of real-world example - but really keen to see if it can scale (and stand up).

[Bar chart: Req/sec (mean), y-axis from 125.00 to 250.00. Servers, in chart order: nginx, litespeed, lighttpd (fcgi), apache (fcgi). Bar values, in chart order: 234, 220, 207, 187.]

Stats courtesy of http://blog.kovyrin.net/

Clear alternatives if you aren’t scaling past one app server - these numbers are sort of indicative.

litespeed (pay-for product) has some nice numbers and an apparently easy-to-use interface - live tool for adding new lsapis on the fly.

lighttpd + apache: yes, straight fastcgi is good but you can’t scale past four FCGI processes, mongrel can.

KeepAlive ( no point if you’re dead )

KeepAlive almost never works. 99% of the time, you’re going to benefit just making your app server/web server ignore it. Most browsers now work around this to help improve perceived performance.

You can get the same kind of benefit by parallelizing your asset requests - ie randomize from server1/server2 etc.

Edge Rails supports this natively.
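A sketch of spreading assets across hosts in plain Ruby. Note one deliberate swap: instead of picking a host at random, it hashes the asset path, so a given file always comes from the same host and browser caches stay useful. `asset_host_for` and the host names are placeholders, not the Edge Rails API.

```ruby
require 'zlib'

# Hypothetical helper: pick one of several asset hosts for a path.
# Hashing the path (rather than calling rand) keeps the choice stable
# across requests.
def asset_host_for(path, hosts)
  hosts[Zlib.crc32(path) % hosts.size]
end

# Host names are placeholders.
hosts = %w[assets1.example.com assets2.example.com]
%w[/images/logo.png /stylesheets/site.css /javascripts/app.js].each do |path|
  puts "http://#{asset_host_for(path, hosts)}#{path}"
end
```

Browsers limit concurrent connections per hostname, which is why splitting assets over two or more names lets more of them download in parallel.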

Hostname Lookup ( do not do this. ever. )

Anything that interferes with the business of serving your web page to the client is going to hurt your performance.

Turn off hostname lookup, excessive logs, unused modules - anything you really really don’t need.

Make sure your apps are compiled to perform the best with your setup (except for MySQL where you should always use their compiled versions).

Do you use stats packages? Make sure the JS calls are right before the end </body> tag - you may get lucky and browsers will deal with complicated stuff like styles and so on, or render the page to the screen whilst waiting - these calls typically block and the browser can’t do much till they return.

So be sure your stats package can handle your traffic before you stick it up there. (Hint: self-installable stuff like mint can’t handle millions of hits per day without lots of hardware to support it)

Really bad stats?

Perhaps use an async XMLHttpRequest to fire it, an IFrame or the onload handler....

NFS and Beyond ( sharing is good )

Are you pre-caching on every server? Then use a shared filestore!

It’s also easier to expire one store than many.

Be warned - NFS traditionally hasn’t been known to scale as well as it could - more recent versions are more performant.

Some NFS options you can turn off (you don’t always need to write, for example) and staying in sync is not always important for a small share you can just remount if it gets crazy.

Write over NFS ( be super efficient )

Zed pointed out this really brain-dead simple efficiency. If you use NFS - use it to write to your asset servers - disk is cheap but the network teardown/startup is expensive. Don’t saturate your net card just passing data around again and again.

Always look for the simplest path.

MogileFS, NFS Clusters ( brainy sharing )

If you’re struggling under the load of lots of static assets (think YouTube or Flickr) and you can’t quite afford a network attached storage device with a petabyte of disk space, consider using up the many multi-gigabyte disks you have in your servers!

Cluster up for NFS clusters (tricky but not impossible) where you can create a pseudo RAID over machines via software. Google for it.

Or use MogileFS and its HTTP DAV style API for grabbing your data chunks. Robot COOP have a working library.

Tuning Recap ( were you listening? )

1. Check for bottlenecks. Focus on perceived areas of slowness
2. Improve by making users happy
3. Look at your layout - are your servers fighting for CPU/RAM time?
4. Are you on a shared host and being kept in strict limits?
5. Is your code optimal - especially templates?
6. Can you get more servers?
7. Tuning your apps - is the MySQL process list showing lots of waiting queries?
8. Are you running the most optimal HTTP setup?
9. Is your cache causing you problems on the disk?
10. Attend one of our scalability talks - starting in May. Ask the Skills Matter team here for more info.
11. Hire me.... or someone like me :)

Any Questions?

Resources

talk: smokeclouds.com/scalability.pdf
me: smokeclouds.com :: imaj.es
blogs: brainspl.at :: blog.kovyrin.net :: caboo.se
app: mongrel.net :: litespeed.com
web: lighttpd.net :: nginx.net :: swiftcore.org
hosts: railsmachina.com :: engineyard.com