Gumball: A Race Condition Prevention Technique for Cache Augmented SQL Database Management Systems*

Shahram Ghandeharizadeh, Jason Yap

Database Laboratory Technical Report 2012-01
Computer Science Department, USC
Los Angeles, California 90089-0781
September 11, 2012

Abstract

Query intensive applications augment a Relational Database Management System (RDBMS) with a middle-tier cache to enhance performance. An example is memcached, in use by very large, well-known sites such as Facebook. In the presence of updates to the normalized tables of the RDBMS, invalidation-based consistency techniques delete the impacted key-value pairs residing in the cache. A subsequent reference for these key-value pairs observes a cache miss, re-computes the new values from the RDBMS, and inserts the new key-value pairs in the cache. These techniques suffer from race conditions that result in cache states that produce stale data. The Gumball Technique (GT) addresses this limitation by preventing race conditions. Experimental results show GT enhances the accuracy of an application by a factor of several hundred and, in some cases, may reduce system performance slightly.

* A short version of this paper appeared in the proceedings of the Second ACM SIGMOD Workshop on Databases and Social Networks (DBSocial), May 20, 2012.

A Introduction

The workload of certain application classes such as social networking is dominated by queries that read data [7]. An example is the user profile page. A user may update her profile page rarely, every few hours if not days and weeks. At the same time, these profile pages are referenced and displayed frequently: every time the user logs in and navigates between pages. To enhance system performance, these applications augment a standard SQL-based RDBMS, e.g., MySQL, with a cache manager. Typically, the cache manager is a Key-Value Store (KVS), materializing key-value pairs computed using the normalized relational data. A key-value pair might be finely tuned to the requirements of an application, e.g., dynamically generated HTML formatted pages [12, 27, 5, 26, 10, 17], and the KVS may manage a large number (billions) of such highly optimized representations. A Cache Augmented SQL RDBMS, CASQL, enhances performance dramatically because a KVS look up is significantly faster than processing SQL queries. This explains the popularity of memcached, an in-memory distributed KVS deployed by sites such as YouTube [14] and Facebook [35].

With CASQLs, a consistency technique maintains the relationship between the normalized data and its key-value representation, detects changes to the normalized data, and invalidates¹ the corresponding key-value(s) stored in the KVS. Almost all techniques suffer from race conditions, see Section B. The significance of these race conditions is highlighted in [33], which describes how a web site may decide not to materialize failed key-value lookups because the KVS may become inconsistent with the database permanently.

As an example, consider Alice, who is trying to retrieve her profile page while the web site's administrator is trying to delete her profile page due to her violation of the site's terms of use. Section B shows how an interleaved execution of these two logical operations leaves the KVS inconsistent with the database such that the KVS reflects the existence of Alice's profile page while the database is left with no records pertaining to Alice. A subsequent reference for the key-value pair corresponding to Alice's profile page succeeds, reflecting Alice's existence in the system.

This paper presents the Gumball Technique (GT) to detect race conditions and prevent them from causing the key-value pairs to become inconsistent with the tabular data. GT is an application-transparent, deadlock-free (non-blocking) technique implemented by the KVS. Its key insight is that it is permissible to ignore cache put operations to prevent inconsistent states. Its advantages are severalfold. First, it applies to all applications that use a CASQL, freeing each application from implementing its own race condition detection technique. This reduces the complexity of the application software, minimizing costs. Second, in our experiments, it reduced the number of observed inconsistencies dramatically (more than tenfold). Third, GT requires no specific feature from an RDBMS and can be used with all off-the-shelf RDBMSs. Fourth, GT adjusts to varying system loads and has no external knobs that require adjustment by an administrator. Fifth, while GT employs time stamps, it does not suffer from clock drift and requires no synchronized clocks because its time stamps are local to a (partitioned) cache server. The disadvantage of using GT is that it may slow down an application slightly when almost all (99%) requests are serviced using the KVS. This is because GT re-directs requests that may retrieve stale cache data to process SQL queries using the RDBMS.

The primary contribution of this paper is the design of GT and how it detects and prevents race conditions. To the best of our knowledge, GT is novel and not presented elsewhere. A secondary contribution is an implementation and evaluation of GT using a social networking benchmark. Obtained results show GT imposes negligible overhead while reducing the percentage of inconsistencies dramatically.

¹ Other possibilities include refreshing [12, 23] or incrementally updating [24] the corresponding key-value.

Figure 1: Architecture of a CASQL. Arrows show processing of a $CS_{fuse}$ with a key reference whose value is not found in the KVS.

The rest of this paper is organized as follows. The race condition is formalized in Section B, and Section C describes GT to detect and prevent it. We highlight the tradeoffs associated with GT in Section D. Section E presents related work. Brief conclusions are offered in Section F.

B Problem Statement

Figure 1 shows the architecture of a typical CASQL. Both the KVS and the RDBMS consist of a client and a server component that communicate using the TCP protocol. The application employs the RDBMS client (e.g., JDBC) to issue SQL queries to the RDBMS server. Similarly, it employs the KVS client to issue insert, get, and delete commands to one or more instances of a KVS server deployed on several (potentially thousands of) PCs with a substantial amount of memory.

The key-value pairs in the KVS might pertain to either the results of a query [23] or semi-structured data obtained by executing several queries and gluing their results together using application-specific logic [12, 10, 33, 23]. With the former, the query string is the key and its result set is the value. The latter might be the output of either a developer-designated read-only function [33] or a code segment that consumes some input to produce an output [23]. In the presence of updates to the RDBMS, a consistency technique deployed either at the application or the RDBMS may delete the impacted cached key-value pairs. This delete operation may race with a look up that observes a cache miss, resulting in stale cached data.

To illustrate a race condition, assume the user issues a request that invokes a segment of code ($CS_{fuse}$) that references a $K_i$-$V_i$ pair that is not KVS resident because it was just deleted by an update issued to the RDBMS. This corresponds to Alice in the example of Section A referencing her profile page after updating her profile information. The administrator who is trying to delete Alice from the system invokes a different code segment ($CS_{mod}$) to delete $K_i$-$V_i$. Even though both $CS_{mod}$ and $CS_{fuse}$ employ the concept of transactions, their KVS and RDBMS operations are non-transactional and may leave the KVS inconsistent. One scenario is shown in Figure 2.b. $CS_{fuse}$ looks up the KVS and observes a miss, Arrows 1 and 2 of Figure 1, and computes $K_i$-$V_i$ by processing its body of code that issues SQL queries (a transaction) to the RDBMS to compute $V_i$, Arrows 3 and 4 of Figure 1. Prior to $CS_{fuse}$ executing Arrow 5, $CS_{mod}$ issues both its transaction to update the RDBMS and its delete command to update the KVS. Next, $CS_{fuse}$ inserts $K_i$-$V_i$ into the KVS. This schedule, see Figure 2.b, renders the KVS inconsistent with the RDBMS. A subsequent look up of $K_i$ from the KVS produces a stale value $V_i$ with no corresponding tabular data in the RDBMS.

Figure 2: Two interleaved processing of $CS_{fuse}$ and $CS_{mod}$ referencing the same key-value pair: 2.a) Acceptable, 2.b) Undesirable. In both panels, $CS_{fuse}$ performs (1) Get($K_i$), which returns a cache miss at $T_{miss}$, (2) issue queries to the DBMS to compute $K_i$-$V_i$, and (3) insert $K_i$-$V_i$ in the cache, while $CS_{mod}$ performs (1) update the DBMS and (2) the consistency technique deletes $K_i$ from the cache at $T_{delete}$.

In sum, a race condition is an interleaved execution of $CS_{fuse}$ and $CS_{mod}$ with both referencing the same key-value pair. Not all race conditions are undesirable; only those that cause the key-value pairs to become inconsistent with the tabular data. An undesirable race condition is an interleaved execution of one or more threads executing $CS_{mod}$ with one or more threads executing $CS_{fuse}$ that satisfies the following criteria. First, the thread(s) executing $CS_{fuse}$ must construct a key-value pair prior to the threads executing $CS_{mod}$ updating the RDBMS. Second, the $CS_{mod}$ threads must delete their impacted key-value pair from the KVS prior to the $CS_{fuse}$ threads inserting their computed key-value pairs in the KVS. Figure 2.b shows an interleaved processing that satisfies these conditions, resulting in an undesirable race condition. The race condition of Figure 2.a does not result in an inconsistent state and is acceptable.
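The undesirable schedule of Figure 2.b can be replayed sequentially in a few lines. The following is a minimal sketch, assuming two plain dictionaries stand in for the RDBMS and the KVS; the key name and values are illustrative and do not come from the paper.

```python
# Minimal sketch of the Figure 2.b schedule. The dicts stand in for the
# RDBMS and the KVS; all names and values here are illustrative assumptions.
rdbms = {"alice": "profile-v1"}   # tabular data
kvs = {}                          # the cache holds no entry for Alice

# CS_fuse, steps 1-2: observe a cache miss, then compute the value.
assert kvs.get("alice") is None
computed_value = rdbms["alice"]

# CS_mod runs in between: update the RDBMS, then invalidate the key.
del rdbms["alice"]
kvs.pop("alice", None)

# CS_fuse, step 3: insert the value computed before the update.
kvs["alice"] = computed_value

# The cache now serves a profile that the database no longer has.
stale = "alice" in kvs and "alice" not in rdbms
print(stale)  # True
```

Swapping the order so that the whole of $CS_{mod}$ runs before $CS_{fuse}$ reads the RDBMS leaves the two stores consistent, which is why only some interleavings are undesirable.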

C Gumball Technique

Gumball Technique (GT) is designed to prevent the race conditions of Section B from causing the key-value pairs to become inconsistent with tabular data. It is implemented within the KVS by extending its simple operations (delete, get, and put) to manage gumballs, see Figure 3. Its details are as follows. When the server receives a delete($K_i$) request, and there is no value for $K_i$ in the KVS, GT stores the arrival time of the delete ($T_{delete}$) in a gumball $G_i$ and inserts it in the KVS with key $K_i$. With several delete($K_i$) requests issued back to back, GT maintains only one $G_i$ denoting the time stamp of the latest delete($K_i$). GT assigns a fixed time to live, $\Delta$, to each $K_i$-$G_i$ to prevent them from occupying KVS memory longer than necessary. The value of $\Delta$ is computed dynamically, see Section C.1.1.

delete($K_i$)
  1. If $K_i$-$V_i$ exists then delete $K_i$-$V_i$ and generate gumball $G_i$, i.e., $K_i$-$G_i$, with $T_{gumball}$ set to current time.
  2. If $K_i$-$G_i$ exists then change $T_{gumball}$ to the current time.
  3. If no entry exists for $K_i$ then generate $G_i$, i.e., $K_i$-$G_i$, with $T_{gumball}$ set to current time.

get($K_i$)
  1. If $K_i$-$V_i$ exists then return $V_i$.
  2. If either $K_i$-$G_i$ exists or no entry exists for $K_i$ then report a cache miss with current time as the $T_{miss}$ time stamp.

put($K_i$, $V_i$, $T_{miss}$)
  1. Let $T_c$ be the server system time.
  2. If ($T_c - T_{miss} > \Delta$) then ignore the put operation.
  3. If ($G_i$ exists and $T_{miss}$ is before $T_{gumball}$) then ignore the put operation.
  4. If ($V_i$ exists and its time stamp is after $T_{miss}$) then ignore the put.
  5. If ($T_{miss} < T_{adjust}$) then ignore the put operation.
  6. Otherwise, insert $K_i$-$V_i$ with its time stamp set to $T_{miss}$.

Figure 3: GT enabled delete, get, and put pseudo-code. All time stamps are local to the server containing $K_i$-$V_i$.

When the server processes a get($K_i$) request and observes a KVS miss, GT provides the KVS client component (client for short) with the miss time stamp, $T_{miss}$. The client maintains $K_i$ and its $T_{miss}$ time stamp. Once $CS_{fuse}$ computes a value for $K_i$ and performs a put operation, the client extends this call with $T_{miss}$. With this put($K_i$, $V_i$, $T_{miss}$), a GT enabled KVS server compares $T_{miss}$ with the current time ($T_c$). If their difference exceeds $\Delta$, $T_c - T_{miss} > \Delta$, then it ignores the put operation. This is because a gumball might have existed and it is no longer in the KVS as it timed out. Otherwise, there are three possibilities: Either (1) there exists a gumball for $K_i$, $K_i$-$G_i$, (2) the KVS server has no entry for $K_i$, or (3) there is an existing value for $K_i$, $K_i$-$V_i$. Consider each case in turn. With the first, the server compares $T_{miss}$ with the time stamp of the gumball. If the miss happened before the $G_i$ time stamp, $T_{miss} < T_{gumball}$, then there is a race condition and the put operation is ignored. Otherwise, the put operation succeeds. This means $G_i$ (i.e., the gumball) is overwritten with $V_i$. Moreover, the server maintains $T_{miss}$ as metadata for this $K_i$-$V_i$ (this $T_{miss}$ is used in the third scenario to detect stale put operations, see discussions of the third scenario).

In the second scenario, the server inserts $K_i$-$V_i$ in the KVS and maintains $T_{miss}$ as metadata of this key-value pair.

In the third scenario, a KVS server may implement one of two possible solutions. With the first, the server compares $T_{miss}$ of the put operation with the metadata of the existing $K_i$-$V_i$ pair. The former must be greater in order for the put operation to overwrite the existing value. Otherwise, there might be a race condition and the put operation is ignored. A more expensive alternative is for the KVS to perform a byte-wise comparison of the existing value with the incoming value. If they differ then it may delete $K_i$-$V_i$ to force the application to produce a consistent value.

GT ignores the put operation with both acceptable and undesirable race conditions, see discussions of Figure 2 in Section B. For example, with the acceptable race condition of Figure 2.a, GT rejects the put operation of $CS_{fuse}$ because its $T_{miss}$ is before $T_{gumball}$. These rejections reduce the number of requests serviced using the KVS. Instead, the affected requests execute the fusion code that issues SQL queries to the RDBMS. This is significantly slower than a KVS look up, degrading system performance.
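The delete, get, and put steps of Figure 3 can be sketched as a small in-memory class. This is an illustrative sketch rather than the authors' implementation: the class name `GumballKVS`, the `Entry` record, and the injectable `clock` parameter are assumptions, $\Delta$ is held fixed, and the eviction of gumballs after $\Delta$ is omitted for brevity.

```python
import time
from dataclasses import dataclass


@dataclass
class Entry:
    value: object     # cached value; None for a gumball
    stamp: float      # T_gumball for a gumball, T_miss for a value
    is_gumball: bool


class GumballKVS:
    """Hypothetical single-server KVS following the steps of Figure 3."""

    def __init__(self, delta, clock=time.monotonic):
        self.delta = delta                # time to live of a gumball
        self.clock = clock                # server-local clock (injectable)
        self.t_adjust = float("-inf")     # last time delta was increased
        self.store = {}

    def delete(self, key):
        # Steps 1-3 of delete all end with a gumball stamped "now".
        self.store[key] = Entry(None, self.clock(), True)

    def get(self, key):
        entry = self.store.get(key)
        if entry is not None and not entry.is_gumball:
            return entry.value, None      # hit
        return None, self.clock()         # miss, with T_miss for the client

    def put(self, key, value, t_miss):
        t_c = self.clock()
        entry = self.store.get(key)
        if t_c - t_miss > self.delta:                       # step 2
            return False
        if entry is not None and entry.is_gumball and t_miss < entry.stamp:
            return False                                    # step 3
        if entry is not None and not entry.is_gumball and entry.stamp > t_miss:
            return False                                    # step 4
        if t_miss < self.t_adjust:                          # step 5
            return False
        self.store[key] = Entry(value, t_miss, False)       # step 6
        return True


# Replay the schedule of Figure 2.b with a manually advanced clock.
now = [0.0]
kvs = GumballKVS(delta=5.0, clock=lambda: now[0])
_, t_miss = kvs.get("alice")              # CS_fuse misses at time 0
now[0] = 1.0
kvs.delete("alice")                       # CS_mod invalidates at time 1
now[0] = 2.0
stale_put = kvs.put("alice", "old-profile", t_miss)
_, t_miss2 = kvs.get("alice")             # a later miss at time 2
now[0] = 3.0
fresh_put = kvs.put("alice", "new-profile", t_miss2)
print(stale_put, fresh_put)  # False True
```

The put that carries the earlier miss time stamp is refused because the gumball is newer than the miss, while the put computed after the delete is accepted and overwrites the gumball.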

C.1 Value of $\Delta$

Ideally, $\Delta$ should be set to the elapsed time from when $CS_{fuse}$ observes a KVS miss for $K_i$ to the time it issues a put($K_i$, $V_i$, $T_{miss}$) operation. $\Delta$ values greater than ideal are undesirable because they cause gumballs to occupy memory longer than necessary, reducing the KVS hit rate of the application. $\Delta$ values lower than ideal cause GT to reject KVS insert operations unnecessarily, see Step 2 of the put pseudo-code in Figure 3. They slow down a CASQL significantly because they prevent the server from caching key-value pairs. In one experiment, GT configured with a small $\Delta$ value slowed the system down tenfold by causing the KVS to sit empty and re-direct all requests to the RDBMS for processing. This section describes how GT computes the value of $\Delta$ dynamically.

C.1.1 Dynamic computation of $\Delta$

GT adjusts the value of $\Delta$ dynamically in response to CASQL load to avoid values that render the KVS empty and idle. The dynamic technique is based on the observation that the KVS server may estimate the $CS_{fuse}$ response time, RT, by subtracting $T_{miss}$ from the current time ($T_c$): $RT = T_c - T_{miss}$. When a put is rejected because its RT is higher than $\Delta$, GT sets the value of $\Delta$ to this RT multiplied by an inflation ($\alpha$) value, $\Delta = RT \times \alpha$. For example, $\alpha$ might be set to 1.1 to inflate $\Delta$ to be 10% higher than the maximum observed response time. (See below for a discussion of $\alpha$ and its value.)

Increasing the value of $\Delta$ means requests that observed a miss prior to this change may now pass the race condition detection check. This is because GT may have rejected one or more of these put requests with the smaller $\Delta$ value when performing the check $T_c - T_{miss} > \Delta$. To prevent such requests from polluting the cache, GT maintains the time stamp of when it increased the value of $\Delta$, $T_{adjust}$. It ignores all put operations with $T_{miss}$ prior to $T_{adjust}$.

GT reduces the value of $\Delta$ when a $K_i$-$G_i$ is replaced with a $K_i$-$V_i$. It maintains the maximum response time, $RT_{max}$, using a sliding window of 60 seconds (the duration of the sliding window is a configuration parameter of the KVS server). If this maximum multiplied by an inflation value ($\alpha$) is lower than the current value of $\Delta$ then it resets $\Delta$ to this lower value, $\Delta = RT_{max} \times \alpha$. Note that decreasing the value of $\Delta$ does not require setting $T_{adjust}$ to the current time stamp: Those put requests that satisfy the condition $T_c - T_{miss} > \Delta$ will continue to satisfy it with the smaller $\Delta$ value.

The dynamic $\Delta$ computation technique uses $\alpha$ values greater than 1 to maintain $\Delta$ slightly higher than its ideal value. In essence, it trades memory to minimize the likelihood of Step 2 of the put pseudo-code (see Figure 3) ignoring its cache insert unnecessarily and redirecting future references to the RDBMS. This prevents the possibility of an application observing degraded system performance due to a burst of requests that incur KVS misses and are slowed down by competing with one another for RDBMS processing. Moreover, gumballs have a small memory footprint. This, in combination with the low probability of updates, minimizes the likelihood of extra gumballs impacting the KVS hit rate adversely.
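The two adjustment rules for $\Delta$ can be sketched as follows. This is a hedged illustration, not the authors' code: the class name `DeltaController` and its method names are hypothetical, and the text does not say which response times feed the 60 second window, so the sketch assumes that both rejected puts and puts that replace a gumball contribute samples.

```python
from collections import deque


class DeltaController:
    """Hypothetical helper that tracks delta, T_adjust, and RT_max."""

    def __init__(self, delta, alpha=1.1, window=60.0):
        self.delta = delta
        self.alpha = alpha                 # inflation value, greater than 1
        self.window = window               # sliding window, in seconds
        self.t_adjust = float("-inf")
        self.samples = deque()             # (arrival time, RT) pairs

    def _record(self, t_c, rt):
        # Assumes arrival times are non-decreasing (one server-local clock).
        self.samples.append((t_c, rt))
        while self.samples[0][0] < t_c - self.window:
            self.samples.popleft()

    def on_put_rejected(self, t_c, t_miss):
        """A put failed the check T_c - T_miss > delta: inflate delta."""
        rt = t_c - t_miss
        self._record(t_c, rt)
        if rt > self.delta:
            self.delta = rt * self.alpha
            self.t_adjust = t_c            # puts that missed earlier are refused

    def on_gumball_replaced(self, t_c, t_miss):
        """A K-G entry was overwritten by K-V: consider lowering delta."""
        self._record(t_c, t_c - t_miss)
        rt_max = max(rt for _, rt in self.samples)
        if rt_max * self.alpha < self.delta:
            self.delta = rt_max * self.alpha   # T_adjust is left unchanged


c = DeltaController(delta=1.0)
c.on_put_rejected(t_c=10.0, t_miss=8.0)        # RT of 2 exceeds delta of 1
raised = c.delta
c.on_gumball_replaced(t_c=20.0, t_miss=19.5)   # RT of 2 is still in the window
held = c.delta
c.on_gumball_replaced(t_c=80.0, t_miss=79.5)   # the RT of 2 has aged out
lowered = c.delta
```

In this run $\Delta$ rises to about 2.2 after the rejected put, stays there while the slow sample remains inside the window, and falls to about 0.55 once only the fast samples remain; $T_{adjust}$ changes only on the increase.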

D Evaluation

This section analyzes the performance of an application consistency technique with and without GT using a realistic social networking benchmark based on a web site named RAYS. We considered using other popular benchmarking tools such as RUBiS [3], YCSB [13], and YCSB++ [32]. We could not use RUBiS and YCSB [13] because neither quantifies the amount of stale data. The inconsistency window metric quantified by YCSB++ [32] measures the delay from when an update is issued until it is consistently reflected in the system. This metric is inadequate because it does not measure the amount of stale data produced due to race conditions by multiple threads. Below, we start with a description of our workload. Subsequently, we present performance results and characterize the performance of Gumball with different $\Delta$ values.

D.1 RAYS and a Social Networking Benchmark

Recall All You See (RAYS) [22] envisions a social networking system that empowers its users to store, retrieve, and share data produced by devices that stream continuous media, audio and video data. Example devices include the popular Apple iPhone and inexpensive cameras from Panasonic and Linksys. Similar to other social networking sites, a user registers a profile with RAYS and proceeds to invite others as friends. A user may register streaming devices and invite others to view and record from them. Moreover, the user's profile consists of a "Live Friends" section that displays those friends with a device that is actively streaming. The user may contact these friends to view their streams.

We use two popular navigation paths of RAYS to evaluate GT: Browse and Toggle streaming (Toggle for short). While Browse is a read-only workload, Toggle results in updates to the database, requiring the key-value pairs to remain consistent with the tabular data. We describe each in turn.

Browse emulates four clicks to model a user viewing her profile, her invitations to view streams, and her list of friends followed by the profile of a friend. It issues 38 SQL queries to the RDBMS. With a CASQL, Browse issues 8 KVS get operations. For each get that observes a miss, it performs a put operation. With an empty KVS, the get operations observe no hits and this sequence performs 8 put operations.

Database parameters
  Number of users in the database.
  Number of friends per user.
Workload parameters
  $N$: Number of simultaneous users/threads.
  Number of users emulated by a thread.
  Think time between user clicks executing a sequence.
  Inter-arrival time between users emulated by a thread.
  Probability of a user invoking the Toggle sequence.

Table 1: Workload parameters and their definitions.

Toggle corresponds to a sequence of three clicks where a user views her profile, views her list of registered devices, and toggles the state of a device. The first two result in a total of 23 SQL queries. With a CASQL, Toggle issues 7 get operations and, with an empty KVS, observes a miss for all 7. This causes Toggle to perform 7 put operations to populate the KVS. With the last user click, if the device is streaming then the user stops this stream. Otherwise, the user initiates a stream from the device. This results in 3 update commands to the database. With Trig, these updates invoke triggers that delete KVS entries corresponding to both the profile² and devices pages. With a populated KVS, the number of deletes is higher because each toggle invalidates the "Live Friends" section of those friends with a KVS entry.

Our multi-threaded workload generator targets a database with a fixed number of users. A thread simulates the sequential arrival of a fixed number of users, each performing one sequence at a time. There is a fixed delay, the inter-arrival time, between two users issued by the thread. A thread selects the identity of a user by employing a random number generator conditioned using a Zipfian distribution with a mean of 0.27. $N$ threads model $N$ simultaneous users accessing the system. In the single user (1 thread, $N$=1) experiments, this means 20% of users have an 80% likelihood of being selected. Once a user arrives and her identity is selected, she picks a Toggle sequence with a fixed probability and a Browsing sequence with the complementary probability. There is a fixed think time between the user clicks that constitute a sequence.

The workload generator maintains both the structure of the synthetic database and the activities of different users to detect key-value pairs (HTML pages) that are not consistent with the state of the tabular data, termed stale data. The workload generator produces unique users accessing RAYS simultaneously. This means a more uniform distribution of access to data with a larger number of threads. While this is no longer a true Zipfian distribution, obtained results from a system with and without GT are comparable because the same workload is used with each alternative.

² The user's profile page displays the count of devices that are streaming, and a toggle of a streaming device either increments or decrements this counter.

Figure 4: Number of stale reads in one experiment, for a system with GT and a system without GT. The x-axis is the total execution time in hours (0 to 6) and the y-axis is the number of stale reads on a log scale (1 to 1,000,000).

To measure the amount of stale data with 100% accuracy, the workload generator must maintain the status of different devices managed by RAYS, serialize simultaneous user requests, and issue one to the CASQL at a time. This is unrealistic and would eliminate all race conditions. Instead, we allowed the workload generator to issue requests simultaneously and used time stamps to detect its internal race conditions. This results in false positives where an identified stale data is due to an in-progress change to a time stamp. These false positives are observed when the workload generator is using the RDBMS only.

D.2 Performance Results

We conducted many experiments to quantify a) the amount of stale data eliminated by GT, b) the impact of GT on system performance, and c) how quickly GT adapts $\Delta$ to changing workload characteristics. In all experiments, GT reduced the amount of stale data 100-fold or more. Below, we present one experiment with a 300-fold reduction in stale data and discuss the other two metrics in turn.

In this experiment, we focus on an invalidation-based technique implemented in the application. We target a small database that fits in memory to quantify the overhead of GT. With larger data sets that result in cache misses, the application must issue queries to the RDBMS. This results in higher response times that hide the overhead of GT. If race conditions occur frequently then GT will slow down a CASQL by reducing its cache hit rate. In our experiments, race conditions occur less than 3% of the time³. Thus, GT's impact on system performance is negligible.

³ Approximated by the amount of stale data produced without GT.

Figure 5: Varying system load, with $N$ stepped through 1, 10, 20, 50, 100, 50, 20, 10, and 1. The x-axis is time in seconds (0 to 5000) and the y-axis is $\Delta$ in seconds (0 to 70); the plot compares the value of $\Delta$ with the maximum RT.

Figure 4 shows the number of requests that observe stale data when a system is configured to either use GT or not use GT. The x-axis of this figure is the execution time of the workload. The y-axis is log scale and shows the number of requests that observe stale data. With GT, only 343 requests (less than 0.009% of the total number of requests) observe stale data. We attribute these to the false positives produced by our workload generator, see Section D.1. Without GT, more than 100,000 requests (2.4% of the total requests) observe stale data. The cache hit rate is approximately 85% with and without GT. Even though the database is small enough to fit in memory, the cache hit rate cannot approximate 100% because 10% of requests execute Toggle, which invalidates cache entries.
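The reported counts and percentages for Figure 4 can be cross-checked with a few lines of arithmetic. This is a rough sketch: it treats "more than 100,000" as exactly 100,000, so the derived total and reduction factor are lower-bound estimates rather than figures stated in the paper.

```python
# Values quoted in the text; the 100,000 figure is a stated lower bound.
stale_with_gt = 343
stale_without_gt = 100_000
frac_without_gt = 0.024          # 2.4% of the total requests
frac_with_gt_bound = 0.00009     # "less than 0.009%"

# Estimated total number of requests implied by the run without GT.
total_est = stale_without_gt / frac_without_gt

# Fraction of stale requests with GT, using that estimated total.
frac_with_gt = stale_with_gt / total_est

# Reduction factor in stale reads; it grows if the true count exceeds 100,000.
reduction = stale_without_gt / stale_with_gt
```

Under these assumptions the total comes to roughly 4.2 million requests, the stale fraction with GT stays below the quoted 0.009% bound, and the reduction is a little over 290-fold, in line with the roughly 300-fold reduction reported for this experiment.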

GT adapts to changing workloads by adjusting the value of $\Delta$. We varied system load by varying the number of simultaneous users accessing the system, $N$. We looked at different patterns ranging from those that change the load abruptly (switch from 1 to 100 simultaneous users) to those that change the load gracefully. In each case, GT adjusts the value of $\Delta$ quickly, minimizing the number of KVS inserts rejected due to a small value of $\Delta$. Such rejections are typically a negligible percentage of the total number of requests processed. Below, we report on one experiment.

This experiment varied the number of simultaneous users ($N$) from 1 to 10, 20, 50, 100 and back to 50, 20, 10, and 1. For each setting, a user issues 1000 requests. Figure 5 shows the value of $\Delta$ when compared with the maximum observed RT, see discussions of Section C.1.1. As we increase the load, GT increases the value of $\Delta$ to prevent rejection of KVS inserts unnecessarily. Similarly, when we decrease the load, GT reduces the value of $\Delta$ to free memory by preventing gumballs from occupying the cache longer than necessary. $\Delta$ is higher than the observed maximum response time because its inflation value is set to 2. This experiment issues more than two hundred thousand put requests and GT rejects fewer than 600 due to small $\Delta$ values.

E Related Work

Cache augmented RDBMSs have been an active area of research dating back to the 1990s [27, 12, 41, 18, 15, 10, 31, 29, 1, 40, 30, 9, 17, 2, 4, 5, 33, 24]. The focus of this paper is on a subset with the following characteristics. First, the cache must support simple insert, get, and delete operations [6, 33]. There exist complex caches with the ability to process SQL queries, e.g., TimesTen [40], DBProxy [2], DBCache [31, 9], Cache Tables [1], MTCache [30], Ferdinand [21]. While these fall beyond the focus of GT, a KVS might be their building component, which may render GT relevant. Second, the cache (KVS) must be at the same abstraction as the RDBMS where security and privacy of content is guaranteed by the application and its infrastructure [27, 12, 41, 18, 15, 29, 4, 5, 33, 24, 23]. It does not apply to proxy caches [10, 16, 17] that are external to the application.

TxCache [33] presents the concept of race conditions and how they may leave the cache (KVS) inconsistent with the RDBMS. It proposes a slight modification to those RDBMSs that implement MVCC [8] to (a) maintain a validity interval for read-only queries whose results are represented as key-value pairs, and (b) generate invalidation streams in the presence of update transactions. This stream contains the transaction's time stamp and all invalidation tags corresponding to key-value pairs in the cache. This enables TxCache to implement transactions with snapshot isolation. GT is different in several ways. First, while both GT and TxCache employ time stamps to detect race conditions, the semantics of these time stamps are different. GT's time stamp pertains to when the KVS fails to produce a value for a referenced key while TxCache's time stamp is the update transaction time stamp issued by the RDBMS. Second, GT is a building block of a consistency technique while TxCache is a consistency technique. Certain properties of a transaction (atomicity, consistency, isolation, durability) may impose significant overhead and not be applicable to an application [38, 11, 24, 32]. A consistency technique may abandon these properties while utilizing GT to prevent race conditions that leave the cache inconsistent permanently. The same cannot be realized with TxCache because it is a consistency technique that implements transactions with snapshot isolation. Third, GT can be used with all SQL RDBMSs because it neither modifies the RDBMS nor requires a prespecified functionality from it.

One approach to prevent race conditions is to serialize all writes using the RDBMS [24]. If the RDBMS is the system bottleneck, GT is a better alternative because it imposes no additional load on the RDBMS. Its entire implementation is by the cache.

Tombstones [20] mark deleted objects in systems that allow replica content to diverge [37]. A key design challenge is when to delete them. Some systems employ a two-phase commit protocol to delete tombstones safely [25, 34, 36]. To continue operation in the presence of failures and intermittent network connectivity, some studies have proposed deletion of tombstones at each site after a fixed period, long enough for most updates to complete propagation, but short enough to keep the space overhead low [39, 28]. Clearinghouse [19] removes tombstones from most sites after the expiration period and retains them on a few designated sites indefinitely. When a stale operation arrives after the expiration period, some sites may incorrectly apply that operation. However, the designated sites will distribute an operation that re-installs tombstones on these sites.
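
The hazard of expiring tombstones after a fixed period can be illustrated with a minimal Python sketch. It is not taken from any of the cited systems: the `Replica` class, its method names, and the use of caller-supplied logical times are assumptions made purely for illustration, and the designated sites that retain tombstones indefinitely are not modeled.

```python
class Replica:
    """Hypothetical replica that purges tombstones after a fixed expiration period."""

    def __init__(self, expiry):
        self.expiry = expiry      # fixed tombstone lifetime (assumed logical time units)
        self.data = {}            # key -> value
        self.tombstones = {}      # key -> time at which the key was deleted

    def delete(self, key, now):
        # Remove the object and leave a tombstone recording when it was deleted.
        self.data.pop(key, None)
        self.tombstones[key] = now

    def purge(self, now):
        # Drop every tombstone older than the expiration period.
        self.tombstones = {
            k: t for k, t in self.tombstones.items() if now - t <= self.expiry
        }

    def apply_write(self, key, value, issued_at, now):
        # Reject a write issued before a deletion this replica still remembers.
        self.purge(now)
        deleted_at = self.tombstones.get(key)
        if deleted_at is not None and issued_at < deleted_at:
            return False
        self.data[key] = value
        return True
```

With `expiry=10`, a write issued at time 1 for a key deleted at time 5 is rejected if it arrives at time 8, but the same write is accepted if it arrives at time 20, after the tombstone has been purged. This is the incorrect application of a stale operation that the retained tombstones on designated sites are meant to repair.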

A gumball is similar to a tombstone because it identifies a deleted key-value pair. However, GT’s management of gumballs is novel. First, GT deletes a gumball after a fixed time interval k. GT controls the value of k dynamically using the response time of the key-value pair, i.e., the RDBMS service time and queuing delays attributed to system load. Second, GT ignores put operations for a key-value pair that incur response times greater than k (independent of gumball existence). GT is not a substitute for tombstone management algorithms and their assumed optimistic replication environment and vice versa.
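
The two gumball rules above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `GumballCache`, its method signatures, the injectable `clock`, and the fixed `ttl` (standing in for the interval k) are assumptions made for illustration, and the dynamic adjustment of k from observed response times is not modeled.

```python
import time


class GumballCache:
    """Hypothetical in-memory KVS sketch that rejects racing put operations."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl            # stands in for the interval k (fixed here)
        self.clock = clock        # injectable clock, an assumption for testability
        self.values = {}          # key -> cached value
        self.gumballs = {}        # key -> time stamp of the delete (the gumball)

    def get(self, key):
        # On a hit return (value, None); on a miss return (None, miss time stamp).
        if key in self.values:
            return self.values[key], None
        return None, self.clock()

    def delete(self, key):
        # Invalidation: drop the value and leave a gumball with the current time.
        self.values.pop(key, None)
        self.gumballs[key] = self.clock()

    def put(self, key, value, miss_ts):
        now = self.clock()
        self._expire(now)
        # Rule 2: ignore a put whose response time exceeds ttl,
        # whether or not a gumball exists for the key.
        if now - miss_ts > self.ttl:
            return False
        # Ignore a put whose miss was observed at or before the delete;
        # its value may have been computed from pre-update data.
        deleted_at = self.gumballs.get(key)
        if deleted_at is not None and miss_ts <= deleted_at:
            return False
        self.values[key] = value
        return True

    def _expire(self, now):
        # Rule 1: a gumball lives for ttl time units and is then removed.
        for key in [k for k, t in self.gumballs.items() if now - t > self.ttl]:
            del self.gumballs[key]
```

In this sketch the two rules work together. Once a gumball has expired, any put whose miss preceded that gumball necessarily has a response time greater than `ttl`, so the second rule rejects it even though the gumball is gone. Treating a tie between the miss and delete time stamps as a race is a conservative choice of this sketch.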

F Conclusion

GT is a race condition detection and prevention technique for mid-tier in-memory caches that complement an RDBMS to enhance performance. This simple technique works with all RDBMSs and alternative invalidation-based approaches to cache consistency. Its evaluation highlights the need for a benchmark that can accurately measure the amount of stale data produced by a framework. Our social networking benchmark suffers from a few false positives (thousandth of one percent of issued requests). These should be eliminated without slowing down the workload generator.

G Acknowledgments

We wish to thank the anonymous referees of DBSocial 2012 for their valuable comments.

References

[1] M. Altinel, C. Bornhovd, S. Krishnamurthy, C. Mohan, H. Pirahesh, and B. Reinwald. Cache Tables: Paving the Way for an Adaptive Database Cache. In VLDB, 2003.

[2] K. Amiri, S. Park, and R. Tewari. DBProxy: A dynamic data cache for Web applications. In ICDE, 2003.

[3] C. Amza, A. Chanda, A. Cox, S. Elnikety, R. Gil, K. Rajamani, W. Zwaenepoel, E. Cecchet, and J. Marguerite. Specification and Implementation of Dynamic Web Site Benchmarks. In Workshop on Workload Characterization, 2002.

[4] C. Amza, A. L. Cox, and W. Zwaenepoel. A comparative evaluation of transparent scaling techniques for dynamic content servers. In ICDE, 2005.

[5] C. Amza, G. Soundararajan, and E. Cecchet. Transparent Caching with Strong Consistency in Dynamic Content Web Sites. In Supercomputing, ICS ’05, pages 264–273, New York, NY, USA, 2005. ACM.

[6] Six Apart. Memcached Specification, http://code.sixapart.com/svn/memcached/trunk/server/doc/protocol.txt.

[7] Fabrício Benevenuto, Tiago Rodrigues, Meeyoung Cha, and Virgílio A. F. Almeida. Characterizing user behavior in online social networks. In Internet Measurement Conference, 2009.


[8] P. Bernstein and N. Goodman. Multiversion Concurrency Control - Theory and Algorithms. ACM Transactions on Database Systems, 8:465–483, February 1983.

[9] C. Bornhovd, M. Altinel, C. Mohan, H. Pirahesh, and B. Reinwald. Adaptive Database Caching with DBCache. IEEE Data Engineering Bull., pages 11–18, 2004.

[10] K. S. Candan, W. Li, Q. Luo, W. Hsiung, and D. Agrawal. Enabling dynamic content caching for database-driven web sites. In SIGMOD Conference, pages 532–543, 2001.

[11] R. Cattell. Scalable SQL and NoSQL Data Stores. SIGMOD Rec., 39:12–27, May 2011.

[12] J. Challenger, P. Dantzig, and A. Iyengar. A Scalable System for Consistently Caching Dynamic Web Data. In Proceedings of the 18th Annual Joint Conference of the IEEE Computer and Communications Societies, 1999.

[13] B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears. Benchmarking Cloud Serving Systems with YCSB. In Cloud Computing, 2010.

[14] C. D. Cuong. YouTube Scalability. Google Seattle Conference on Scalability, June 2007.

[15] A. Datta, K. Dutta, H. Thomas, D. VanderMeer, K. Ramamritham, and D. Fishman. A Comparative Study of Alternative Middle Tier Caching Solutions to Support Dynamic Web Content Acceleration. In VLDB, pages 667–670, 2001.

[16] A. Datta, K. Dutta, H. Thomas, D. VanderMeer, Suresha, and K. Ramamritham. Proxy-Based Acceleration of Dynamically Generated Content on the World Wide Web: An Approach and Implementation. In SIGMOD, pages 97–108, 2002.

[17] A. Datta, K. Dutta, H. M. Thomas, D. E. VanderMeer, and K. Ramamritham. Proxy-based Acceleration of Dynamically Generated Content on the World Wide Web: An Approach and Implementation. ACM Transactions on Database Systems, pages 403–443, 2004.

[18] L. Degenaro, A. Iyengar, I. Lipkind, and I. Rouvellou. A Middleware System Which Intelligently Caches Query Results. In IFIP/ACM International Conference on Distributed systems platforms, 2000.

[19] A. Demers, D. Greene, C. Hauser, W. Irish, and J. Larson. Epidemic Algorithms for Replicated Database Maintenance. In Symposium on Principles of Distributed Computing, pages 1–12, 1987.

[20] M. J. Fischer and A. Michael. Sacrificing Serializability to Attain Availability of Data in an Unreliable Network. In Symposium on Principles of Database Systems (PODS) Conference, pages 70–75, 1982.


[21] C. Garrod, A. Manjhi, A. Ailamaki, B. Maggs, T. Mowry, C. Olston, and A. Tomasic. Scalable Query Result Caching for Web Applications. August 2008.

[22] S. Ghandeharizadeh, S. Barahmand, A. Ojha, and J. Yap. Recall All You See, http://rays.shorturl.com, 2010.

[23] S. Ghandeharizadeh and J. Yap. SQL Query To Trigger Translation: A Novel Consistency Technique for Cache Augmented DBMSs. Submitted for Publication, 2012.

[24] P. Gupta, N. Zeldovich, and S. Madden. A Trigger-Based Middleware Cache for ORMs. In Middleware, 2011.

[25] R. Guy, G. Popek, and T. Page. Consistency Algorithms for Optimistic Replication. In IEEE International Conference on Network Protocols, 1993.

[26] V. Holmedahl, B. Smith, and T. Yang. Cooperative Caching of Dynamic Content on a Distributed Web Server. In HPDC, pages 243–250, 1998.

[27] A. Iyengar and J. Challenger. Improving Web Server Performance by Caching Dynamic Data. In Proceedings of the USENIX Symposium on Internet Technologies and Systems, pages 49–60, 1997.

[28] J. Kistler and M. Satyanarayanan. Disconnected Operation in the Coda File System. ACM Transactions on Computer Systems, 10:3–25, February 1992.

[29] A. Labrinidis and N. Roussopoulos. Exploring the Tradeoff Between Performance and Data Freshness in Database-Driven Web Servers. The VLDB Journal, 2004.

[30] P. Larson, J. Goldstein, and J. Zhou. MTCache: Transparent Mid-Tier Database Caching in SQL Server. In ICDE, pages 177–189, 2004.

[31] Q. Luo, S. Krishnamurthy, C. Mohan, H. Pirahesh, H. Woo, B. G. Lindsay, and J. F. Naughton. Middle-Tier Database Caching for e-Business. In SIGMOD, 2002.

[32] S. Patil, M. Polte, K. Ren, W. Tantisiriroj, L. Xiao, J. Lopez, G. Gibson, A. Fuchs, and B. Rinaldi. YCSB++: Benchmarking and Performance Debugging Advanced Features in Scalable Table Stores. In Cloud Computing, New York, NY, USA, 2011. ACM.

[33] D. R. K. Ports, A. T. Clements, I. Zhang, S. Madden, and B. Liskov. Transactional consistency and automatic management in an application data cache. In OSDI. USENIX, October 2010.

[34] D. Ratner. Roam: A Scalable Replication System for Mobile and Distributed Computing, Ph.D. thesis, UC Los Angeles, Tech Report UCLA-CSD-970044, 1998.


[35] P. Saab. Scaling memcached at Facebook, http://www.facebook.com/note.php?note_id=39391378919, Dec. 2008.

[36] Y. Saito and H. Levy. Optimistic Replication for Internet Data Services. In Conference on Distributed Computing (DISC), pages 297–314, 2000.

[37] Y. Saito and M. Shapiro. Optimistic Replication. ACM Computing Surveys, 5:12–27, 2005.

[38] M. I. Seltzer. Beyond Relational Databases. Commun. ACM, 51(7):52–58, 2008.

[39] H. Spencer and D. Lawrence. Managing Usenet, O’Reilly & Associates, Sebastopol, CA, ISBN 1-56592-198-4, 1998.

[40] The TimesTen Team. Mid-Tier Caching: The TimesTen Approach. In Proceedings of the SIGMOD, 2002.

[41] K. Yagoub, D. Florescu, V. Issarny, and P. Valduriez. Caching Strategies for Data-Intensive Web Sites. In VLDB, pages 188–199, 2000.
