Upload
mehab
View
219
Download
0
Embed Size (px)
Citation preview
8/6/2019 Middle Tier
1/24
Middle-tierDatabaseCachingfore-Business
QiongLuo1SaileshKrishnamurthy2C.Mohan3HamidPirahesh3
HongukWoo4BruceG.Lindsay3JeffreyF.Naughton1
Contactemail:{[email protected],[email protected]}
Abstract
Scaling up to the enormous and growing Internet population with unpredictable usage patterns, E-
commerceapplicationsface severe challenges in cost andmanageability, especially fordatabaseservers
that are deployed as those applications back-ends in a multi-tier configuration. Middle-tier database
caching is one solution to this problem. In this paper, we present a simple extension to the existing
federatedfeaturesinDB2UDB,whichenablesa"regular"DB2instancetobecomeaDBCachewithout
any application modification. On deployment of aDBCache at an application server, arbitrary SQL
statementsgeneratedfromtheunchangedapplicationhithertointendedforaback-enddatabaseserver,can
beanswered:atthecache,attheback-enddatabaseserver,oratbothlocationsinadistributedmanner.The
factorsthatdeterminethedistributionofworkloadincludetheSQLstatementtype,thecachecontent,the
applicationrequirementondatafreshness,andcost-basedoptimizationatthecache.Wehavedevelopeda
research prototype of DBCache, and conducted an extensive set of experiments with an E-Commerce
benchmarktoshowthebenefitsofthisapproachandillustratetradeoffsincachingconsiderations.
1 Introduction
Variouscachingtechniqueshavebeendeployedtoincreasetheperformanceofmulti-tierweb-basedapplicationsin
responseto theever-increasingscaleof theInternet. Suchapplications typically achieve a measureof scalability
withapplication serversrunningon multiple (relativelycheaper)systems connecting toasingle databasesystem.
This,however,doesnotsolvethescalabilityproblemforback-enddatabaseservers.Middle-tierdatabasecaching
(shownasthegrayboxinFigure1)isoneofthesecachingtechniques,whichisdeployedinthemiddle,usuallyat
theapplicationserver,ofa multi-tierwebsiteinfrastructure when theback-enddatabaseisa bottleneck.Example
commercial products include the Database Cache of Oracle 9i Internet Application Server [19] and TimesTen's
Front-Tier[22].
Inamulti-tiere-Businessinfrastructure,middle-tierdatabasecachingisattractivebecauseofimprovementsto
thefollowingattributes:
WorkdoneatIBMAlmadenResearchCenter1ComputerSciencesDepartment,UniversityofWisconsin-Madison,Madison,WI53706
2ComputerScienceDivision,DepartmentofEECS,UniversityofCaliforniaBerkeley,Berkeley,CA94720
3IBMAlmadenResearchCenter,650HarryRoad,SanJose,CA95120
4DepartmentofComputerSciences,UniversityofTexas-Austin,Austin,TX78712
8/6/2019 Middle Tier
2/24
2
(1) Scalability:bydistributingqueryworkloadfromback-endtomultiplecheapfront-endsystems.
(2) Flexibility:withQoS(QualityOfService)controlwhereeachcachehostsdifferentpartsofthebackend
data,e.g.,thedataofPlatinumcustomersiscachedwhilethatofordinarycustomersisnot.
(3) Availability:bycontinuedserviceforapplicationsthatdependonlyoncachedtablesevenif theback-end
serverisunavailable.
(4) Performance:bypotentiallyrespondingtolocalitypatternsintheworkloadandsmoothingoutloadpeaks.
Usingageneral-purposeindustrial-strengthDBMSformiddle-tierdatabasecachingisespeciallyattractivetoe-
Businesses,eventhoughtherehavebeenspecial-purposesolutions(forexample,e-Bayusesitsownfront-enddata
cache[18]).Thisismainlyduetocrucialbusinessrequirementssuchasreliability,scalability,andmanageability.
Forinstance,an industrial-strengthDBMScloselytracksSQLenhancements,andprovidesa varietyof tools for
application development. More importantly, it provides transactional support, multiple consistency levels, and
efficient recoveryservices.Finally,anidealcacheshouldbe transparentto theapplicationthat uses it,andthis is
difficulttoachievewithaspecial-purposesolution.
Theexistingcommercialapproaches([19],[22])ofusingageneral-purposeindustrialstrengthDBMSinvolve
anarchitecturesuchasinFigure2.Inthisapproach,thereisasmalllayer(CacheManager)thatdirectsapplication
database queries to different databases, and the DBCache instance has no knowledge that it is being used as a
DBCache.Inpractice,theCacheManagercouldbepartofacall-levelinterfacelibrary[19]orspecialpurposecode
embeddedintheapplicationitself[22].Sincetheapplicationhastwoseparateconnectionstodifferentdatabases,we
willcallthisthedouble-connectionapproachtodistinguishitfromtheconfigurationinFigure1,whichwewillcall
thesingle-connectionapproach.Inthelattercase,theDBCacheisfullyawarethatitservesasaproxyfortheback-
Figure1:Single-connectionApproachtoDatabaseCaching
Figure2:Double-connectionApproachtoDatabaseCaching
Web/App.
Server
Application
DBServerBrowserHTTP
Middle-tier
DBCache
Web/App.
Server
Application
DBServerBrowserHTTP
Cache
Manager
Middle-tier
DBCache
8/6/2019 Middle Tier
3/24
3
enddatabaseserver.Wewillfurtherdistinguishtheformercaseintoapplication-awaredouble-connection[22]and
application-transparentdoubleconnection [19].Aswewillseerepeatedlyin thispaper,thisdesignchoiceaffects
manyaspectsofthedatabase.
Manyresearchquestionsariseinusingafull-fledgeddatabaseengineformiddle-tierdatabasecaching,andthe
answerstosomeofthemaffecttherelevanceofothers.Indecreasingorderofimportance,theseare:
(1) Whataretheperformancebottlenecksine-Businessapplications,orinotherwords,areweaddressingthe
rightproblemsbyfocusingondatabasecaching?
(2) WillperformancebeacceptableusingacommercialDBMSasa middle-tierdatacache?Featuressuchas
transactional semantics, consistency, and recovery come with some overhead. What features can be
dispensedwithinsuchanenvironment?
(3) Whatdatabasecachingschemesaresuitablefore-Businessapplications?
(4) How can a databasecaching scheme beimplemented in a commercialdatabase engine and how does it
performunderrealistice-Commerceworkloads?
(5) Whatistheimpactofrunningadatabaseserverinthesamecomputerasanapplicationserver?
(6) Howcanwegeneralizetheseresultstootherkindsofwebapplications?
Mostofthesequestionsremainopen,partlyduetothediversityofe-Commerceapplicationsandthecomplexity
ofthesesystems.
Inthispaper,weattempttoanswersomeofthesequestions.Westartwithexaminingtheopportunitiesine-
Commerceapplicationsformiddle-tierdatabasecachingbyrunningane-Commercebenchmarkontypicalwebsite
architectures.Weobservethatthisbenchmarkgenerateda largenumberofsimpleOLTP-stylequeries,theirtable
accesses were highly skewed on a few read-dominant tables, and there was a clear separation between write-
dominanttablesand read-dominanttables.We findthatwebapplicationclonescouldscaleup toheavyloadsand
thisleavesthebackenddatabaseservertoeventuallybecometheperformancebottleneckinthesystem.
WethenexplorehowtoextendDB2sothatitcanbeusedasamiddle-tierdatabasecache.Wetakethesingle-
connection approach,becausewe think itis important toleverage existing featuresof theDBMS engineandnot
produce special purpose code outside of the engine. By extending DB2s federated features we turned a DB2
instanceintoa tableleveldatabasecachewithoutchanginguserapplications.Thenoveltyofthisextensionisthat
query plans at the cache may involve both the cache and the remote server based on cost estimation. Through
experiments, we showed thatthe overhead of adding a full-strength DBMSas a middle-tierdatabasecache was
insignificantfore-Commerceworkloads.Consequently,middle-tierdatabasecachingimprovedusersresponsetime
significantlywhenthebackenddatabaseserverwasheavilyloaded.
Theremainderofthepaperisorganizedasfollows.InSection2,wepresentourprototypemiddle-tierdatabase
cache (calledDBCache) that leverages existing features of federated technology in a commercial DBMSengine
(DB2). In Section3, wedescribeour overallevaluation methodologywith an e-Commerce benchmark. We then
8/6/2019 Middle Tier
4/24
4
presentourexperimentalresultsto showtheperformanceimpact ofmiddle-tierdatabasecaching(Section4). We
discussrelatedworkinSection5andconcludeinSection6withouragendaforfutureresearch.
2 TurningDB2intoaDBCache
In this section, we will first discuss our design considerations for the DBCache. Then, we present our cache
initializationtoolandourmodificationtotheDB2engine.
2.1 DesignConsiderations
ThegoalofourDBCacheistoimprovetheperformanceandscalability
ofweb-basedapplicationsb ydistributingqueryprocessingto theclones
ofthe applicationsand theunderlyingapplicationservers(as Figure3).
WhentheDBCacheiscollocatedwiththeapplicationserver,itservesas
adatabaseserverforthenodeitresidesin.EachDBCachecancachepart
ofthedatainthebackenddatabaseserverandsharetheserverworkload.
WithserverworkloaddistributedtomultipleDBCachenodes,theoverall
userresponsetimeisimprovedwhenthebackendserverisunderheavy
load.
Withthisgoalinmind,weexaminethedesignrequirements,andour
choiceofcachingschemesandexistingmechanisms.
2.1.1 Requirements
ThefirstrequirementinourdesignofDBCacheisthatneithertheapplication,northeunderlyingdatabaseschema
shouldhavetochange.Firstly,itisdesiredthatthedecisiontodeployaDBCachecouldbemadeforanarbitrary
shrink-wrapped application by local administrators who do not have access to the application source code.
Secondly,requiringtheapplicationtobecognizantoftheDBCachewouldresultinincreasedcomplexity,whichis
undesirableespeciallygiventhatcostandmaintainabilityarealreadymajorproblemsinsuchenvironments.Weaim
to makeit easy for databaseadministrators to set up the cachedatabase schema, and makethe DBCache to be
transparenttotheapplicationsatruntime.Noticethatin[22]anypartoftheapplicationthatusestheDBCachefor
performance needs to retarget SQL queries to the specific dialect supported by the DBCache. In general, it is
desirablethatthe DBCacheinstanceiscapableofunderstandinganySQLstatementthattheback-endcan handle.
Withapplication-transparentdoubleconnection [19]thisisnotmuchofanissueastheCacheManagercaneasily
redirectspecialcasequeriestotheback-endserver.
The second requirement is to support reasonable update semantics. Update transactions increase resource
Figure3:DeployingDBCache
DBServer
Browser
NetworkDispatcher
Web/App.
Server
Application
Web/App.
Server
Application
DBCache DBCache
BrowserBrowser
8/6/2019 Middle Tier
5/24
5
contentionatthebackenddatabaseserveraswellasthecostforcacheconsistencymaintenance.Fortunately,E-
commerceapplicationshave"highbrowse-to-buy"ratios(read-dominant)andhightoleranceforslightlyout-of-date
data. This allows us to defer update propagation to the cache so that it affects on-line transactions as little as
possible.Nevertheless,time limits on thedeferralof cache synchronization arenecessaryto ensure "reasonable"
freshnessofcacheddata.
Otherrequirementsincludesupportforfailoverofincomingrequests toa failedcachenodeto anothercache-
node, and dynamic cache node addition and removal. These requirements are aimed at increasing system
availability,manageability,andincrementalchangestocapacity.
2.1.2 ChoiceofCachingSchemes
Given the requirements, we have the task of choosing a caching scheme for DBCache. We categorize caching
schemesby the unitof logicaldata(baseor derived) thatis cached asfollows: fulltable, asubsetof atable, an
intermediatequeryresult,orafinalqueryresult.Although(full)tablelevelcachingcanbeviewedasafulltable
scan query, it is the only scheme among the four that needs only schema information of the cached table. In
contrast,otherschemesneedtoknowextrainformation,suchasthequerydefinitionsthatcorrespondtothecurrent
cachecontent.So,weregardthelatterthreeasqueryresultcachingandconsidercachingasubsetofatableasa
specialcaseofanintermediatequeryresult.
Tablelevelcachinghasseveraladvantages,withthemostdefinitiveonesbeingeasydeploymentandtheability
toanswerarbitraryqueriesoncachedtables.However,updatesonatablemustreachallnodesthatcachethistable
withinareasonableamountoftime.Ifthequeriesareexpensive,tablelevelcachingdoesnotsaveanycomputation
evenonacachehitunlesstheaccesspathsaredifferentand/orthecachenodeislessloaded.Incomparison,queryresult caching schemes may save expensive computation on a cache hit. But the downside is that they require
complex schema definitions for deployment, complex query rewriting at runtime, and complex update logic to
minimizethenumberofcachenodestosynchronize.
Note: Wedo not consider the single-connection anddouble-connection approaches to be different schemes.
Theyaremerelydifferentdesignsforthesameschemei.e.,tablelevelcaching.
The relative performance of these caching schemes is determined by the characteristics of the data and
workload. From our observations on an E-Commerce benchmark and anecdotal knowledge of real world e-
Businesses,tablelevelcachingseemstobesufficientfortheseapplications.ThesimpleOLTP-typequeriesdonot
needcomplexintermediateresultcaching,thesmallnumberoffrequentlyqueriedtablesserveaseasycandidatesfor
tablelevelcaching;andthecleanseparationofread-dominanttablesandwrite-dominanttablesenablesselectively
cachingtablestoreduceupdatepropagationcosts.
Accordingly,asthefirststepofturningDB2intoamiddle-tierdatacache,weexploretablelevelcaching.Other
techniquessuchassubsetcachingandintermediateresultcachingintheformofmaterializedviewsandfinalquery
resultcachingatacall-levelinterfacelibrary(suchasaJDBCdriver)arealsoon-goingworkatIBMbutareoutof
8/6/2019 Middle Tier
6/24
6
thescopeofthispaper.
2.1.3 LeveragingExistingMechanisms
Having decided on table level caching, our problem is the following: given a web application that cannot bechanged,itsbackenddatabaseschema,anditsdatabaseworkload,generatethemiddle-tiercachedatabaseschema,
andprocesstheSQLstatementsutilizingboththecacheandthebackenddatabase.Asdiscussedearlier,wehave
twoalternatives:oneistobuilda specialpurposecachequeryprocessorfromscratch,andtheotheris toleverage
existingtechnologyin DB2.Inourresearchprototype,we chosethelatterrouteforthefollowingreasons.Firstly,
thereisaninterestingmatchbetweenexistingfederatedfeaturesandthequeryroutingfunctionneededinourcache.
Secondly,weprefertoreuseexistingfeaturesandnotbuildyetanotherqueryprocessor.Finally,ourapproachalso
allowsustohandledistributequerieseffectively,whereitispossiblefortheoptimizertodecidewhatportionofthe
queryshouldbeprocessedinthefrontendandwhatportionisinthebackend.Incomparison,intheapplication-
transparent double-connection approach, new functionality is limited to the Cache Manager and outside the
databaseengineproper.
ThefederatedfeaturesinDB2V7 firstappearedinIBMsDataJoiner[10]product.UserscanaccessIBMand
some non-IBM databases, relational dataand non-relational data,as well as local data or remote data througha
singlefederatedDB2database.Thelocaldatabase,calledafederator,translatesauserqueryoveralocalaliasfor
remotedataintoa distributed query toremotedatasources. When settingup a federateddatabase, users needto
createreferencesinthelocaldatabasetoremotedatasources,forexample,anodetoidentifyaremotehost,aserver
foraremotedatabaseonthehost,andanicknameforatableorviewintheremotedatabase.
Ifweuseafederatortomodelacache,andaremotedatasourcetobetheback-enddatabase,weinstantlyhavealmostallthedesiredqueryroutingcapabilitythatweneed.Wecantherefore,designthecacheschematobe such
that all cached tables are local tables, and all un-cached tables to be nicknames of the back-end tables. SQL
statementssubmittedtothecachearecompiledasusual;ifastatementinvolvesnicknames,thefederatedfeaturesof
DB2willestimatethecostofqueryexecutionattheremoteserver,decideonpredicatepushdownbasedonthecost
estimation,andgenerateadistributedqueryplan.
Ourprototypecurrentlydoesnotsupportupdatesinthefront-end.Instead,usersneedtoissuethe setpassthru
command and then issue the updates on the referenced remote tables. Inpassthru mode, the
federator blindlyforwards thestatementstothespecified remoteserver.This allows users toupdate remotedata
sources directly and frees the federator from maintaining multi-site updates. As we will see, this restriction
necessitatessomemodificationtotheDB2engineinordertouseitasamiddle-tierDBCachewithoutchanginguser
applications.Nevertheless,evenwithoutthisrestriction,wewouldstillneedsomethingspecialtorouteallupdates
tothebackendevenwhentheaffectedtablesexistinthecache.Itshouldbenotedthatthepassthrufunctionalityis
presentinDataJoinertohandlethosecaseswhereusersmaychooseto entirelybypassthefrondendandsendthe
requestsdirectlytothebackend.DBMSsoftenhaveuniqueextensionstothestandardSQL.Usingpassthru,onecan
8/6/2019 Middle Tier
7/24
7
exploitthesefeatures.
Like the approaches in [19] and [22], we aim to keep the data in our DBCache consistent using standard
replicationtechniques.Inourapproach,weuseDataPropagator/Relational(DPropR)[11].DPropRisIBMstoolfor
asynchronousdatareplication forrelational databases. In conjunction with DataJoiner, DPropR canalso support
non-relational data replication and replication between non-IBM and IBM databases. DPropR consists of three
independentprograms:anadministrationprogram,adata changecaptureprogram,andanupdateapply program.
The three programs communicate with one another through a set of tables called control tables. Users can
subscribe replication requests using GUI tools andthesubscriptioninformation is stored in thecontrol tables.
Subscriptionscanbeonasetoftables,possiblywithsomeselectionpredicatesontables.Userscanalsospecifythe
frequency of update propagation, the minimum size of each data transfer, among other options. For IBM data
sources, DPropR uses a log reader tocapturechanges; for non-IBM data sources, DPropR uses triggerson the
sourcedatabasetocapturechanges.Theapplyprogramisahumbledatabaseapplicationoperatingonthecontrol
tablesandthetargettables.
Inourapproach,asin[19],allupdates musthappen inthe back-end,and so wechose a replication scheme
(basedonDPropR)forupdatepropagationbetweentheback-enddatabaseandtheDBCache.Thisensuresthatall
write operations take place at the same location with no possibilities for asynchronously-performed conflicting
updates.Infuturework,wewilladdressotheroptionsforhandlingupdatestoimproveavailabilityandperformance.
Inaddition,evenifanactivetransactionholdsawritelockagainstacacheddataitemintheback-end,thatdoesnot
precludeanothertransactionfromreadingatransaction-consistent olderversionofthesamedataiteminanother
front-end.Thisway,weexploitrelaxedconsistencyrequirementsofe-Businessapplicationsandavoidtheoverhead
oftwo-phasecommittransactionalconsistencymaintenance.
2.2 CacheInitialization
ImplementingtablelevelcachingusingDB2consistsof twopiecesof work. Thefirstisatoolforinitializingthe
cache, including choosing tables to cache, and setting up update propagation tools for them. The second is to
modifytheDB2enginesothatitcanconsideruserspecifieddatafreshnessrequirementsandrouterequeststothe
cacheortothebackenddatabase.Inaddition,forreasonsexplainedbelow,DPropRalsohastobemodifiedslightly.
OneofourdesignrequirementsforDBCacheisthattheexistingwebapplicationsandtheirdatabaseschemashould
notchangewhendeployinga DBCache.Asthisisabasicrequirementforusability,weimplementeda toolcalled
DBCacheInittoautomaticallycreatethedatabaseschemaforacachedatabaseandinitializeit.
First,thetoolcollectsnecessaryaccessauthorizationinformationaboutthebackenddatabase,suchastheserver
name, thebackenddatabasename,anduser/passwordinformation.Then, itusesthat information toexaminethe
catalogof thebackenddatabase,and furthercollectsinformationaboutexistingtables,views, indexes,referential
integrity constraints, and so on. Information about triggers is not collected, as triggers are only involved with
updates,andinthisversion,wewantallupdatestohappenonlyattheback-end.StoredProceduresmaycontain
8/6/2019 Middle Tier
8/24
8
update statements and so we dont deploy themat theDBCache either. InDB2, user-defined functions maynot
updatedatabasecontent.However,ingeneral,theback-endandfront-endnodesmaybeondifferentplatforms,and
soforsimplicitythetooldoesnotattempttore-deployuser-definedfunctionsinthefront-end.
Ideally,aftercollectinginformationaboutthebackenddatabase,thetoolshouldexamineasnapshotofatypical
workloadconsistingofSQLqueries,anddecidewhichtablestocache.Thisisa similarproblemtothatsolvedby
DB2's IndexAdvisor,which recommends indexes basedon query workloadand availabledisk space. So, inour
tool,wepresumethatselectionofthecachedtablesversusuncachedtablesisprovideda-priori.
Oncethetablestocachearedetermined,thetoolthencreatesacachedatabasewiththesamenameasthatofthe
backenddatabase,unlessspecifiedotherwise,andreplicatestheto-be-cachedtablesatthecachedatabase.Foreach
tablethatisnottobecached,thetoolcreatesanicknameforitatthecachedatabase,withthesamenameasthe
correspondingtableattheback-end.Inaddition,allviewsintheback-endarerecreatedinthefront-end.Bysetting
upnamesofcachedtablesandnicknamesidenticaltotheircounterpartsatthebackenddatabase,theuserapplication
doesnotneedtochangeorevenbeawareoftheexistenceofthecachedatabase.TheDB2federatedqueryprocessor
willdecidehowtoprocessqueries.
Inaddition,indexesoncachedtablescanbereplicatedtothecachedatabase.Indexesonnon-cachedtables,are
replicated as"indexspecification only", where actual indexesare notcreated, butthe cache databaseis informed
abouttheexistenceofarealindexattheback-end.Thisletsthecachedatabasechooseapotentiallybetterplan.
Finally,forthecachedtables,weneedtoloadtheirinitialdata.Wealsoneedtosetupreplicationsubscriptions
for them so that when the tables in the backend database change, the cached tables will be brought up to date
asynchronously.WeuseDPropR forthispurpose.When thecachedtablesaresubscribedfor updatepropagation,
andthecaptureandapplyprogramsstartrunning,thecachedtablesareloadedwiththedatafromtheircounterparts
inthebackenddatabaseandareupdatedasynchronouslyatthespecifiedfrequency.
2.3 InsideDBCache
2.3.1 DBCacheMode
BecausewearemodifyingaregularDB2intoaDBCache,weneedawaytoindicatetotheDBMSinstancethatit
willbeusedasacache.Wehavetwochoicesoverwheretosetthiscachemode:eitherattheDBMSinstancelevel,
oronaper-databasebasis.Acachemodeattheinstancelevelissimplertoimplementwhileacachemodeatthe
databaselevelismoreflexible.Ifthecachemodeisonaperdatabasebasis,theDBMSwillmakequeryrouting
decisionsaccording to which databasethe queriesareon. Thisallowsa DBCache to managesboth reallocal
databasesandlocaldatabasesthatserveasacacheofaremotedatabase.Thedownsideisthatitislessefficientthan
aDBMSlevelsettingiftheinstanceisdeployedpurelyasa cache.Inourcurrentprototype,weuseaDBMSlevel
cachemodesetting,andmaychangetoadatabaselevelsettinginthefuture.
AnotherissuerelatedtothecachemodesettingiswhetherweallowacacheDBMSinstancetobeservedby
8/6/2019 Middle Tier
9/24
9
multipleremoteDBMSinstances.Ideally,thisshouldbethecasesincefederatedDB2canservequeriesthataccess
multipleremotedatasources.Unfortunately,federatedDB2doesnotsupportmultiplesiteupdatesyetwhileour
DBCache will facethis problemif itallows formultiple remote servers.Therefore, wesupportonlyone remote
serverperDBCacheinstanceinthecurrentstage.Thecachedatabaseisawareoftheidentityof theremoteserver
andusesitinitsqueryprocessingdecisions.
Finally, even in the cache mode, applications sometimes may need to access the most-up-to-date data (for
instance, shoppingcart), and sometimes need toissue updates thatare targetedto the cachedatabaseitself (for
example, the applyprogram ofDPropR). For the case of accessing the most-up-to-date data, we allow users to
changethevalueofanexistingspecialregistercalledREFRESHAGEtospecifytheirdatacurrencyrequirement.
Fortheoperationstargetedatthelocaldatabase,weprovideacommand setpassthrulocalsothatanyoperations
afterthatandbeforeasetpassthruresetareexecutedlocallyeveninacachemode.Wewillpresentmoredetails
forpassthruinthefollowingsection.
2.3.2 Auto-Passthru
When the DBCache instance detects the cachemodesetting, itwillroute requeststo the cachedatabase, tothe
backenddatabase,ortobothplacesfordistributedqueryprocessing.Weachievethisbyintroducinganautomatic
passthru,orauto-passthrumechanismbasedonDB2sexistingpassthrumechanism.
The existing mechanism relies on the commands set passthru and set passthru
reset.Allstatementssubmittedafterapassthrusessionhasbeenturnedonandbeforeithasbeenreset,aresentto
thespecifiedremoteserverdirectly.Theexceptiontothisisanothersetpassthrucommand.Ifausersetspassthru
toServerA andthensetspassthrutoServerBbeforeresettingpassthru,thestatementsbeforesetpassthruBaresenttoServerAdirectly,andaftersetpassthruBtoServerBdirectly,implicitlyendingthepassthrutoServerA.
Whena setpassthruresetisissuedlater,thepassthrumodetoB isthenended.Notethatthismodelisdifferent
fromatrulynestedorastackmodel.
Unliketheexistingpassthrumechanism,wedonotdependonexplicitpassthrusetandresetcommands,asthat
will requireapplic\ationmodification.Instead,auto-passthru takes place inthecache mode. Three factorsaffect
whereastatementisexecuted:(1)statementtype,whetheritisaUDI(Update/Delete/Insert)oraquery(Select),(2)
thecurrentvalueoftheREFRESHAGEregister,whichindicatestheuserstoleranceforout-of-datedata,and(3)
anynicknamesinthequery.
Morespecifically,ifastatementisaUDI,auto-passthrusendsitthroughtotheremoteserver.Thisistoavoid
conflictsduetomulti-siteupdatesaswellastobypasstheupdaterestrictionsonnicknames.Ifastatementisaread-
onlyquery, auto-passthruexaminesthecurrentvalueofREFRESHAGEtodecidefurther:ifthevalueis0,it
means that the user has requested the most-up-to-date data. In this case, auto-passthru will send the statement
Thisvaluecanbesetorchangedwithoutnecessarilymodifyingtheapplicationprogram.
8/6/2019 Middle Tier
10/24
10
throughtothebackenddatabaseservertoensurethefreshnessofthedata.Otherwise,theauto-passthrumechanism
willallowthequerytobeexecutedlocallyatthecache.Interestingly,ifa queryisroutedtothelocaldatabasebut
involvesnicknames,thentheexistingfederatedqueryprocessingtakesover:ifthequeryinvolvesanycachedtables,
thenadistributedqueryplanisgenerated;otherwise,aremote-onlyplanischosen.Finally,statementsotherthana
UDIoraquery,suchasDataDefinitionLanguage(DDL)statements,aredirectlypassedtothebackenddatabaseon
theassumptionthatitiswhattheuserdesired.Thephilosophyhereisthattheuserisingeneralunawareofthe
existenceofthecache,andsoanyschemachangeshouldbeeffectedattheback-enddatabase.
Forsituationswhereauserisawareofthecachesexistence(suchascreationoflocalindexes),weneedaway
tocapturetheusersintent.Anexampleofasituationwhereoperationsareexplicitlytargetedatthecachedatabase
is the DPropR apply program. This application
propagatesdataupdatedontheback-enddatabasetothe
cachedatabase. SinceDpropRreusestheSQLAPI,the
cacheDBMSenginehasnowayofknowingthatthese
updates are targeted at the cache database, and so we
havetoensurethatauto-passthrudoesnotsendthemto
the back-end database again. Other administration
activitiesoverthecachedatabasefacethesameproblem.
We solve this problem byprovidingan SQLstatement
setpassthrulocal.Applicationsusethiscommandto
indicate that the following statements should be
executedlocally evenin thecache mode. Like a normal
passthru, this can be turned off with the set passthru
resetcommand.InourcasewemodifiedtheDPropRapplyprogramtoletitissue"setpassthrulocal"command
rightafteritsetsupaconnectiontothecachedatabaseforapplyingchanges.
Finally,wealsoneedtosolvetheproblemofanexplicitpassthruthatisspecifiedbytheuserapplication.In
other words, the use of passthru for a DBCache should not be intrusive. The following scenario (in Figure 4)
illustratestheproblem.Supposea userapplicationusesitsbackenddatabaseHasa federatorforsomeotherdata
source A andhas a setpassthru statement set passthru A.Now theadministrator decides todeploy a database
cacheCinfrontofitsbackenddatabaseH.WhenthesetpassthruAcommandcomestoC,auto-passthruneedsto
recordthisinformation,andpassthisstatementitselfandthefollowingstatementstothebackenddatabaseH.This
isbecausethecachedatabaseCshouldnotbypassthebackenddatabaseHandforwardsubsequentSQLrequestsdirectly to A. In fact, the cache database C might have no knowledge of the data source A. In a real-world
environment,webapplicationserversoftenliveinaDMZ(De-MilitarizedZone)andmaynotbeabletoaccessthe
sameserversthattheoriginaldatabaseHcan.
Toputittogether,weshowthestatetransitiongraphcomparingtheoriginalpassthruandourauto-passthruin
Figure 5. Note: (1)The italics(e.g. read-only) above each boxdenotes which SQLrequests(excluding passthru
Figure4:HandlingUser-SpecifiedPassthru
H
setpassthruA;
updateA_table;
A
updateA_table;
H
setpassthruA;updateA_table;
A
updateA_table;
C
setpassthruA;
updateA_table;
W/OCache WithCache
8/6/2019 Middle Tier
11/24
11
set/reset and commit/rollback, which are processed specially) are handled at the cache node. The remaining
commandsarehandledatthebackendnode.(2)ThePTDS(i*)statemeansamodeofpassingthrutothemost
recentlyspecifiedpassthrudatasource.
DB2Local PTDS(i*)setpassthruDS(u)
setpassthru resetsetpassthruDS(v)
DBCache
db2set/resetDBCACHE
DBCacheLocalsetpassthrulocal
setpassthrureset
DBCache PTDS(j*)
setpassthruDS(x)
setpassthrureset setpassthruDS(y)
setpassthrulocal
read-only
selective
readsonly
none
everything
none
setpassthrulocal
SetpassthruDS(z)
OriginalPassthruinFederatedDB2
Auto-Passthru inDBCache
Figure5:ComparisonofPassthruStateTransition
ThetwostatesabovethedottedlineareforDB2initsnon-DBCachemode(regularfederatedfeature).Inthe
DB2Localstate,thelocalDB2 processesalmostallkindsofSQLstatements,exceptgivingerrormessagesfor
UDIoperationsonnicknames. OnasetpassthruDS(u)request,thelocalDB2 transitsintoaPT DS(i*)state
wherei*=u.Inthisstate,thelocalDB2passeseveryfollowingrequestto thebackendDS(i*)untila setpassthru
resetrequestcomes.IfanothersetpassthruDS(v)requestcomesinthisstate,thelocalDB2staysinthePT
DS(i*)statewherei*=v.Inthefederatedenvironment,thelocalDB2talkstoonedatasourceatanypointoftime
(norealnesting)andremembersonlythemostrecentlyspecifieddatasourceforpassthru.Onasetpassthrureset,
thelocalDB2willgobacktotheDB2Localstate.
ThethreestatesbelowthedottedlineareforDB2initsDBCachemode.AftertheDBAissuesthedbcache-
mode-settingcommandviadb2set,thelocalDB2enterstheDBCachestate,inwhichthelocalDB2willhandle
readsonlocaltablesselectively(onlywhenrefresh-age=any).Uponasetpassthrulocalcommand,thelocalDB2
enters the DBCache Local state, in which all later requests including UDIs are handled locally. From this
DBCacheLocalstate,uponasetpassthruDS(z),thelocalDB2enterstheDBCachePTDS(j*)state,inwhich
thelocalDB2passescommandstothebackenddatabaseandthebackenddatabaseinturnpassesthecommandsto
thespecifieddatasource.TheDBCachePTDS(j*)statecanalsobereachedfromtheDBCachestateupona
Set Pasthru DS(x) command. Proper handling of application issued set passthru is important since, as we
explained before, even in DataJoiner passthru was supported to allow exploitation of non-standard features of
remotedatasources.
8/6/2019 Middle Tier
12/24
12
3 EvaluationMethodology
Thereareatleasttwoalternativewaystoexaminetheperformanceimpactofmiddle-tierdatabasecachingine-
Businessapplications.Onealternativeistoapplyourprototypeinthefieldandperformcasestudies.Unfortunately,
thisisseldomviableforvariousbusinessreasons.Moreover,withthediversityoftheseapplicationsandworkloads,
itmaybedifficulttogaininsightsfromcasestudies.Theotheralternativeistopursuesimulationstudies.The
problem there is that it may be extremely difficult to model thecomplex running environmentsof e-Commerce
applications.
Therefore, we chose to pursue a middle-of-the-road approach to test out our ideas. We used hardware and
softwarecomponentsthatarepopularinreale-Commerceapplicationstobuildourtestingenvironment.Wechose
an e-Commerce benchmark called ECDW (Electronic Commerce Division Workload) to be the test target
application. This benchmark was developed and is used internally by the WebSphere Commerce Server
PerformancegroupatIBMTorontoLab.ItissimilartotheTPC-Wbenchmark[23],buthasmorefeaturesthataretypicaline-Commerceapplications.
WetestedtheECDWworkloadinseveraltypicalserver-sideconfigurations.WechosetouseproductionDB2
inallconfigurations,insteadofusingproductionDB2forsomeconfigurationsandusingourDBCacheprototypefor
someotherconfigurationswithamiddle-tierdatabasecache.Themainreasonwasthatwe wantedto comparethe
caching scheme results with the non-caching results without worrying about the effects of implementation
differences.Therefore,forconfigurationswithmiddle-tierdatabasecaching,wecreatedspecialdatabaseschemaat
the middle-tier to simulate the effects of caching. By simulating table level caching using the existing DB2
federatedfeatures,itissufficienttoshowhowthisschemeperforms.
Disclaimer:TheusageoftheECDWbenchmarkthroughoutthispaperisforustogaininsightsinane-Commerce
applicationandtestourideas.Itwasneitherintendedfornorshouldbeinanywayviewedasthebestpossible
performanceresultsforanyspecificIBMornon-IBMproducts.
3.1 BenchmarkDescription
TheECDWbenchmarkwasdesignedthroughcloseinteractionswitha widerangeofcustomersinordertoreflect
thekeycharacteristicsofrealworlde-Commerceapplications.Itsimulateswebusersaccessinganon-lineshopping
mall.TheapplicationusedisIBMsWebSphereCommerceSuite(WCS)[13],andSegueSoftwaresSilkPerformer
[21]tooltosimulatetheuserworkloadandtestperformance.WedescribeWCS,SilkPerformer,andECDWitselfin
order.
WCSisanintegratedsolutionusedby e-commercesitesinvariousindustries.A samplesetofcustomersare:
BuyUSA, InfinityQS, Mazdas Competition Parts Program, Milwaukee Electronic Corporation, and IBMs own
shopIBM site (www.ibm.com). It provides services for creating, customizing, running, and maintaining on-line
8/6/2019 Middle Tier
13/24
13
storesthroughouttheirentirelifespanofoperations.Onthedatabaseside,ithasmorethan500tables,manyindexes,
constraints,andtriggers.Therearetoolstocreatethedatabaseschemaandloadindata.Ontheapplicationside,it
uses an Enterprise JavaBean (EJB) framework so thatdevelopers canprogram databaseaccesseswithout being
directlyboundtotheunderlyingdatabaseschema.Furthermore,customerscananddoextendthedatabaseschema
aswellastheapplication,bycreatingnewcolumnsandtablesandnewEJBs.
SilkPerformer isa loadandperformancetesting toolfor e-business webapplications. Itemulates workloads
thattestersspecify,suchasnumberofconcurrentusers,testingperiod,testingscenarios,andotheroptions.Thetool
warms up thetesting environment, doesthemeasurement,and finisheswith a proper shutdown process. Allthe
measurementsareperformedontheclientside;inthenormalconfiguration,noinstrumentationisdoneat theweb
server,applicationserver,orbackenddatabaseserver.Themeasurementoutputincludesthroughput,responsetime,
user-definedcounters,user-definedtimers,andothernumbers,bothonarunningbasisandonanaggregationbasis.
TheECDWbenchmarkusesaround300tablesintheWCSschema.Thedatabasesizecanbesmall(10,000
items,650MB),medium(30,000items,2GB),orlarge(50,000items,3.5GB).The"regularshoppingscenario"is
definedinaSilkPerformerscript,whichdependsontwovariablesusertypeandshopflow.Theusertypesare:
new registering user (5%), existing registered user (10%), and guest user (85%). The shop flowsare: browsing
(88%), browsing and adding toa shoppingcart(5%), browsingand preparing anorder (2%),and browsing and
buying (5%). These ratios were obtained through customer interactions, and thus attempt to mimic common
"browsetobuy"ratiosatrealshoppingsites.
The benchmark measures web interactions and web transactions. Aweb interaction corresponds to a user
conductingaspecificoperationatabrowserthatmayinvolveafewmouseclicksandpossiblysomeuserinput.For
instance,aLogOnwebinteractionisoneinwhicharegistered
userclicksonthe"Logon"link,fillsoutherinformation,and
clicksonthesubmitbutton.Oraguestuserclicksonalinkto
a product item to view the details of that item. A web
transactioncorrespondstoanHTTP sessionaseriesofuser
operations,whichincludesmultiplewebinteractions.
Figure 6 shows the regular shopping scenario of the
benchmark. Each box represents a web interaction. A web
transaction in the regular shopping scenario consists of the
following sequence of web interactions: 1) Go to the front
pageof the store. 2)Iftheuseris anew registered shopper,
register with the site; if the user is an existing registered
shopper,logontothesite;otherwise(aguestshopper),goto
thenextstepdirectly.3)Browseafewitems.4)Ifthecurrent
shopflowis notjustbrowsing,butalsopreparinganorderor
buying,loopafewtimesaddingsomeproductsbrowsedinto
Figure6:RegularShoppingScenario
Register
Registereduser
StoreFront
LogOn
AddToShopCart
Guest Newregisteringuser
Browse
ShoppingBrowsingonly Continue
browsing
GuestorderingRegistereduserordering
DisplayOrderInfo FillInAddressBook
ShippingDetails
Buying
Order
8/6/2019 Middle Tier
14/24
14
theshoppingcart,andbrowsingafewitemsagain.5)Ifthecurrentshopflowneedstoprepareanorder,openand
fillouttheaddressbookforaguestshopperordirectlydisplayorderinformationforregisteredusers,andgenerate
thedetailedshippinginformationforallusers.6)Ifthecurrentshopflowistobuy,finishtheorder.
Sincetherearefourdifferentshopflowsintheregularshoppingscenario,awebtransactionmayendafterone
ofthefollowingfourwebinteractions(grayedboxesasinFigure6):Browse,AddingToShopCart,ShippingDetails,
andOrdercorrespondingly.
3.2 Server-sideTopologies
We tested on five server-side topologies, two of
them with a middle-tierdatabasecache,and three
ofthem without.Thethree topologies that do not
have a middle-tier cache(shown in Figure 7) are
the following: (1) single box, in which the
web/application server and the backend database
serverareonthesamemachine;(2)remoteDB,in
whichthesetwocomponentsareontwomachines;
(3) clustered remote DB, in which there are
multiple web/application server machines to
communicatewiththesamebackenddatabaseserver.
The HTTP requests are distributed to the web
application server machines in round-robin
fashion.throughanetworkdispatcherorsomemechanismslikethat.
Figure7:ThreeNon-CachingServer-sideTopologies
Figure8:TwoCachingServer-sideTopologies
Web/App.
Server
Application
DBServer
Browser
HTTP
DBServerDBServer
Web/App.
Server
Application
Browser
HTTP
Browser
HTTP
NetworkDispatcher
Web/App.
Server
Application
Web/App.
Server
Application
SingleBox ClusteredRemoteDBRemoteDB
DBServerDBServer
Browser
HTTP
Browser
HTTP
NetworkDispatcher
ClusteredDBCacheDBCache
Web/App.
Server
Application
DBCache
Web/App.
Server
Application
DBCache
Web/App.
Server
Application
DBCache
8/6/2019 Middle Tier
15/24
15
Thetwotopologiesthatdohavemiddle-tiercachingareshowninFigure8.DBCachetopologyaddsamiddle-
tierdatabasecachetotheRemoteDBtopology,andClusteredDBCacheaddsamiddle-tierdatabasecachetoeach
web application server in the Clustered Remote DB topology. Note that the single box topology in Figure 7 is
essentiallyaDBCachetopologywitha100%cachehitratio.
3.3 TestEnvironmentDetails
We used six computers in the tests. Four of them were IBM Netfinity 3500 server machines with an 800MHz
PentiumIIICPUand1GBmemory,andtwoofthemwereIBMIntelliStationworkstationswith930MHzPentium
IIICPUand512MBmemory.ThefourservermachineshadWindows2000Serverandthetwoworkstationshad
Windows2000Professional.Allmachineshad20-30GBdiskspace.AllmachineswereonaLANwithbandwidth
of100Mbits/second.
We installed an instance of IBM WebSphere Commerce Suite (WCS) V5.1 on each server system, which
includestheIBMHTTP Server(repackagedApache),WebSphereApplicationServer(WAS),andDB2V7.1.We
also deployedthe ECDW storeapplication (JSP files, HTML files, Java classfiles, EJBfiles,etc.) inthe WCS
instanceoneachseversystem,andcreatedthestoredatabasewiththelargedatasize(3.5GB)usingthescriptsand
datathatcomewiththebenchmark. Ononeworkstationcomputer,weinstalledSilkPerformerV3.5to beusedas
thetestdriver(webclient).Ontheotherworkstationmachine,weinstalledIBMWebSphereEdgeServerV3.6tobe
usedasanetworkdispatchertodistributeHTTPrequeststomultipleWASservers.
WeconfiguredDB2asspecifiedbytheECDWbenchmarkwithappropriatebufferpoolandlogbuffersizes.To
intensifythetestingforthedatabaseserver,wesetthethinktime(waitingtimebetweenwebinteractions)tobezero
intheECDWclienttestdriver.
3.4 DatabaseWorkloadDetails
Weareespeciallyinterestedinthecharacteristicsofthedatabaseworkloadfromtheapplication.ThisWCS-based
benchmarkisacannedapplication.WeusedDB2sdynamicSQLstatementsnapshottool[12]tocapturetheSQL
executioninformationatthedatabaseserverside.ItreportstheSQLstatementtext(with?representingparameter
markers input variables that are bound at query run-time as opposed to compile-time), the total number of
executions,totalexecutiontime,numberofrowsread,andotherinformation.
WeexaminedtheSQLstatementsthatwerecapturedthroughthesnapshottool.Forthe1-userregularshopping
workload, there were151 distinct query templates (with parameter markersand literals),consisting of 125 read
queries,14insertstatements,and11updatestatements.Forthe30-userregularshoppingworkload,therewere388
distinctquerytemplates,butstillthesame14insertsand11updatesasinthe1-userworkload.Thiswasbecause
whilealltheinsertionsand updateswereissuedaspreparedstatementswithparametermarkers,somequeries(for
example,checkingordersofaparticularregistereduser)wereissuedwithliteralsandnotparametermarkers.Asa
8/6/2019 Middle Tier
16/24
16
result, thedifferent workloadshad a fixed numbersof insert andupdatetemplates, buthad differentnumbers of
selectiontemplates.Thislargeandvaryingnumberof querytemplates madeitdifficultforustoanalyzetheSQL
query characteristics of the workloads. Since ordering and registering comprised only a small fraction of the
workload,inlaterexperimentswefocusedonbrowsing-onlyworkloads,whichwecreatedbymodifyingtheoriginal
regularshoppingworkloads.
In a browsing-only scenario, all users are guest shoppers and all transactions are browsing only. We also
examinedtheSQLsnapshotsof1-userand30-userbrowsing-onlyworkloads.Thequerytemplatesinthebrowsing-
onlyworkloadswerefixed. Therewerea totalof 47querytemplates,with27ofthemhavingparametermarkers,
andtheother20not.AllofthemweresimpleOLTPstylequeries,withonly15ofthemhavingjoinsamongtwoto
fourtablesandtheother32beingsingle-tableselectionqueries.Intotal,thebrowsingonlyworkloadsinvolved51
tables.
Thenumberofexecutionsofthesequerytemplatesinbrowsing-onlyworkloadisshowninFigure9.Thetop12
mostaccessedquerytemplatesallhadparametermarkersinthem.Fourofthemwerejoinsandtheothereightwere
selection queries.Collectivelythey accessed15 tables(less than1/3 oftheinvolved tables ofthe workload) and
consistedof88%ofthetotalnumberofSQLexecutions.
Moreover,inregularshoppingworkloads,weobservedthattherewasalargedegreeofoverlapbetweentables
withinsertsandupdatesofthe11tableswith
updates and 14 with inserts (there was no
deletion in the workload), 9 had both inserts
andupdates.Weobservedthattherewaslittle
overlapbetweenread-onlytableswithqueries
andtableswithupdatesandinserts.Onlytwo
tables(userregandusers)weresubjectto
inserts, updates and selects, only one table
(member) was both queried from and
inserted into, and one table (keys) was
queried from and updated to. We also
examined if any updates/inserts happened on the tables that were involved in the top 12 most frequent query
templates.Wefoundthattherewasonlyonesuchtable(thetableusersaccessedbythe6th
mostfrequentquery
template).
In summary, we observed that e-Commerce workloads had short query execution time, highly skewed
popularityoftables,andcleanseparationof read-dominantand write-dominant tables.Thesecharacteristics make
middle-tierdatabasecachingveryattractive.
Figure9:NumberofExecutionsofQueryTemplatesinBrowsing
#executionsofquerytemplates/xact
0
20
40
60
querytemplates
#ex
ecutions
8/6/2019 Middle Tier
17/24
17
4 ExperimentalResults
First, we compared performance of regular shopping scenarios with that of browse-only scenarios. Then, we
measuredtheoverheadofaddingamiddle-tierdatabasecachebyusingadatabasecachewith0%hitrate.Then,we
cached tables for the top 6 most frequent queries at the middle-tier and measured its performance varying the
workloadonthebackenddatabaseserver.Wethenexploredupdatepropagationcostinthecachingscheme.Finally,
weexaminedtheperformanceofclusteredtopologies.
4.1 ComparingWorkloadCharacteristics
Wetestedtheregularshoppingscenariointhesingleboxtopology.Wevariedthenumberofconcurrentusersatthe
simulatorandmeasuredeachuserexecuting100webtransactions.Thebackenddatabasewasrestoredaftereachrun
sothateachrunstartedwiththesamedatabasecontent.The throughputisreportedintermsof averagenumberof
webtransactionspersecond,andtheaverageresponsetimeisreportedintermsofthenumberofsecondsperweb
transaction. We also report the average response times (in seconds) perweb interaction as tBrowse, tBuy,
tOthers,andtOverAll.tBrowsereferstothetimespentinbrowsing.tBuyincludesthetimespentinadding
itemstoshoppingcarts,filling outaddressbook,displaying orderinformation, workingout shippingdetails, and
ordering. tOthersreferstothetimespentinregistereduserloggingonandnew usersregistering.tOverAllis
theweightedaverageresponsetimeperwebinteractionforalltypesofwebinteractions,withtheweightsbeingthe
occurrences of each type of interaction. These numbers are shown in Table 1. We also ran the browsing only
workloadonasingleboxservertopologyvaryingthenumberofconcurrentusersasshowninTable2.
#users 1 5 10 30
#xacts/sec 0.5 0.8 0.8 1.0
secs/xact 1.9 6.2 11.2 30.1
TBrowse 0.1 0.5 0.8 3.0
TBuy 1.0 1.9 6.1 5.5
TOthers 0.2 1.0 3.7 3.0
TOverAll 0.2 0.7 1.2 3.2
Notsurprisingly,boththethroughputandresponsetimes(perwebtransactionandperwebinteraction)ofthebrowsing-onlyworkloadimprovedoverthoseoftheregular shoppingworkload. Butbothscenariosfollowedthe
same pattern: the throughput increased slightly from 1 user to 30 users, while the response time consistently
increased proportionalto theincreaseof thenumberof users.Sincebrowsingrepresentsthe majority ofthe total
workload (tOverAll in regular shopping scenario follows closely with the tBrowse value), and browsing-only
scenarioismuchsimplertotest,mostofthefollowingexperimentsarefocusedonbrowsing-onlyworkloads.
#users 1 5 10 30
#xacts/sec 1.3 1.6 1.6 1.6
secs/xact 0.8 3.1 6.4 19.5
tOverAll 0.1 0.4 0.8 2.5
Table1:RegularShoppingScenarioonSingleBox Table2:Browsing-onlyScenarioonSingleBox
8/6/2019 Middle Tier
18/24
18
4.2 ExaminingOverheadofAddingaFrontEndCache
Adding a middle-tier database cache at the application server creates overhead by consuming resources on the
application server machine. This isa common concern,especially when themiddletier databasecache isa full-
strengthDBMSnotalightweightqueryprocessor,soweexaminethisoverhead.
WeconfiguredWAS(WebSphereApplicationServer)onaservermachinetoletitusealocalDB2server,and
usedtheDBCacheInittooltocreateadatabaseofallnicknamesinthelocalDB2referencingtheback-enddatabase
inaremoteDB2onanotherservermachine.ThismakesthelocalDB2actasamiddle-tierDBCachewitha0%
cachehitrate.Wecomparedtheperformanceofthis dbcache0configurationandtheremoteDBconfigurationto
examinetheoverhead.
Wecomparethethroughputandwebtransactionresponsetimeofthesethreeconfigurationsforvaryingnumber
ofconcurrentusersinFigure10.TheremoteDB configurationisshownasremote,andthedatabasecachewith
allnicknamesdbcache0.Asexpected,dbcache0wasalwaysworsethantheremoteDBcase,becausethebackend
serverwasnotoverloaded.Thisisbecauseallthequeriesatthecachearemissesandeveryquerygoesthroughtwo
databaseservers.Thismadetheperformanceofdbcache0aroundonehalfoftheremoteDBcaseunderalightload
(lessthan10users).Butwhenthenumberofconcurrentusersincreasedto30,thedifferencebecamemuchless
significant. This shows that although using a full strength DBMS as a middle-tier database cache adds some
overhead,thisoverheadisinsignificantwhentheserverisfullyloaded.
4.3 ExaminingServerWorkloadSharing
Inrealworldscenarios,e-businessapplicationshavealargenumberofonlineusers,andtheloadcanbevarybya
factorof100indailyoperations[5].Whenthebackenddatabaseserverismoreheavilyloaded,cachinginthefront
endsisevenmoreimportanttoimproveusers'responsetime.Therefore,wesetupamiddle-tiercachedatabaseata
WASmachineasthefrontendandmeasureditsperformancevaryingtheworkloadonthebackendserver.
Figure10:OverheadofAddingaFrontEndCachewitha0%CacheHitRate
0
0.5
1
1.5
2
1 5 10 30 100#usersThroughput(xacts/sec)
remote dbcache0
0
20
40
60
80
1 5 10 30 100#users
Respons
eTime
(secs/x
act)
remote dbcache0
8/6/2019 Middle Tier
19/24
19
From previousinvestigation onthebrowse-onlyworkloads, weobservedthatthe accessesof different query
templateswerehighlyskewed.Weselectedthetop6mostusedquerytemplatesandcachedinthelocaldatabasethe
eighttablesthattheyaccessed:inthelocaldatabase.Queriesontheseeighttablesconsistedof 71%ofthedatabase
queriesinthebrowse-onlyscenario.Inthelaterexperiments,wealsousedthiscacheconfigurationforallDBCache
cases.
ThesetupforvaryingthebackendserverworkloadintheDBCachetopologyisshowninFigure11.Thegoalis
tovarythebackendserverworkload and examine how cachingcan help.We achievedthisby sending anextra
browsingonlyworkloadtothebackenddirectly.ThesetupforvaryingthebackendserverworkloadintheRemote
topologyissimilarexceptthatthereisnomiddle-tierdbcacheatthefrontend.Boththefrontendresponsetimeand
thebackendresponsetimeweremeasuredatthetestdriver(webclientside).Wecomparetheresponsetimesinthe
dbcachecasewiththoseintheremotecase.
InFigure12,weseethatwhentheextraworkloadonthebackenddatabasewas10or50users,addingacache
at theapplicationserverdid nothelp. Butwhen thenumberofextra users on thebackendreached100,caching
startedtomakeadifference.Thefrontendresponsetimewasimprovedbecausethecachesheltereditsusersfrom
theoverloadedbackenddatabaseserver,andthebackendresponsetimewasimprovedbecausethecachesharedits
serverworkload.Duetoresourceconstraints,wewerenotabletotestmorethan100users.Butwebelievethatthis
cachingbenefitwillbeevenmoresignificantwhenthebackenddatabaseserverismoreheavilyloaded.
Figure11:SetupforVaryingServerWorkload
Figure12:CachingEffectWithVaryingServerWorkload
UsersConnectingtotheFrontEnd
0
2
4
6
8
10
10users 50users 100users
ExtraWorkloadatBackend
ResponseTime
(secs/xact)
w/dbcache w/odbcache
UsersConnectingtotheBackEnd
0
20
40
60
80
100
10users 50users 100users
ExtraWorkloadatBackend
ResponseTime
(secs/xact)
w/dbcache w/odbcache
Web/App.
Server
Application
Backend
DBServerBrowserMiddle-tier
DBCache
Browser Web/App.
Server
Application
HTTP
HTTP
ExtraWorkload
10-userload
8/6/2019 Middle Tier
20/24
20
4.4 ExaminingUpdatePropagationCost
Thisexperimentwastoexaminehowmuchperformanceimpacttheasynchronousupdatepropagationprocesshad
ontheon-linequeryperformance.WesetupDPropRonthebackenddatabaseserverandtheWASserverwitha
DBCache.The captureprogramwasrunningon thebackenddatabaseserver,andtheapplyprogramonthecache
database. The cache database still had the eight tables cached and the other tables uncached. Since the cached
"users" table was updated frequently in a regular-shopping scenario to update the "lastSession" field with the
timestampofthelastlog-onsessionforeachregistereduser,wesubscribedthe"users"tableforupdatepropagation
withtheminimumfrequencyof1minute.
Westillusedthesameextraworkloadsonthebackendasinthepreviousexperiment,andexaminedthe
updatepropagationcostinthissetting. Besidessendinga 10-userbrowsing-onlyworkloadtothefrontend
WASserverwithaDBCacheandsendingextrabrowsing-onlyworkloadtothebackenddatabaseserverwith
aWASclonedirectly,wealsosentanupdateworkloadonthe"lastSession"fieldofthe"users"tabletothe
backend database server (shown in Figure 13). This "lastSession" field was updated with the current
timestamptosimulatetheactualupdateinaregular-shoppingscenariowhenaregistereduserloggedon.The
Figure13:SetupofDPropRwithVaryingServerWorkload
Figure14:UpdatePropagationCostwithVaryingServerWorkload
Web/App.Server
Application
BackendDBServerBrowserMiddle-tierDBCache
Browser Web/App.
Server
Application
FE BE
HTTP
HTTP
ExtraWorkload
10-userLoad
TestDriver
Ca
pture
Apply
UpdaterUpdateWorkload
UsersConnectingtotheFrontEnd
02468
10
10users 50users 100users
ExtraWorkloadatBackend
ResponseTime
(secs/xact)
w/odpropr w/dpropr
UsersConnectingtotheBackend
0
50
100
10users 50users 100users
ExtraWorkloadatBackend
ResponseTime
(secs/xact)
w/odpropr w/dpropr
8/6/2019 Middle Tier
21/24
21
updateworkloadwasexecutedwithoutanywaitingtimebetweenconsecutiveupdates.Theupdatethroughputwas
measuredtobe40-60updates/seconddependingontheserverload.Wemeasuredtheperformanceimpactofupdate
propagationonthebrowsingworkloadsbymeasuringtheresponsetimesatthesimulatedbrowsers.
Wecomparedthewith-dpropr-runningcasewiththe without-dpropr-runningcasewhenthesameupdaterwas
updatingthebackenddatabaseserver.Figure14showsthatingeneralasynchronousupdatepropagationdidnotadd
significantoverheadto theresponse time, althoughthe captureprogramon thebackenddatabase serverincurred
around20%overheadtothequeryworkloadwhentheextraloadontheserverwas100users.Theoverheadcaused
by the apply program was low because the apply program batched up updates for each propagation interval (1
minute),andeachexperimentlasted20-30minutes.Theoverheadcausedbythecaptureprogram wasreadingthe
logand,whenalogrecordrelatingtoasubscribedtableisfound,performinganSQLinsertintothe"changeddata"
table(oneofthecontroltablesusedbyDPropR).
4.5 ClusteringWebApplicationServers
Finally,wecomparedtheperformanceonclusteredtopologies("2-WAS"and"3-WAS")withthatoncorresponding
non-clusteredtopologies("1-WAS")toseehowtheywerescalingwiththenumberofWASmachines.Figure15
shows the throughput and response times of a 30-user browsing-only workload when the backend database is
heavilyloadedunderaCPUhogprogram.
WhenthenumberofWASmachinesincreased,bothclusteredtopologiesimprovedtheuserresponsetime,but
clustereddbcacheimprovedthethroughputmorethanclusteredremoteDBtopology.Thisimpliesthat(1)Forthe
clustered remote DB topology,simply increasing the number of application servers doesnot scale up the entiresystem under a heavy load, and causes the back-end database server to become the bottleneck. (2) Clustered
DBCachetopologysharesthebackenddatabaseserverworkload,anditcanscaleupthroughputbetterbyadding
morecachenodes.WeareinterestedinaddingmoreWASnodestofurtherexaminethescale-upeffectforclustered
topologies.
4.6 Discussion
Figure15:VaryingNumberofWASmachines
0
1
2
3
4
1-WAS 2-WAS 3-WASThroughput(xacts/second)
remote dbcache
0
5
10
15
20
25
30
1-WAS 2-WAS 3-WAS
Respons
eTimes
(secs
/xact)
r emote dbc ac he
8/6/2019 Middle Tier
22/24
22
Byrunningthe benchmarkand itsmodificationsin variousconfigurations,weshow that bothapplication servers
andback-enddatabaseservers canbe bottlenecksunder different workloads. Applicationserversare mostlyCPU
intensiveundere-Commerceworkloads,buttheycanscaletoalargenumberofusersbyreplicating(togetherwith
replicatedapplications)tomultiplenodes.Ingeneral,singlenodecommercialdatabaseserversconsumemuchless
CPUresourcethanapplicationservers,buttheycanalsobecomeabottleneckunderheavyloads.
Oneapproachtoscalingtheback-enddatabasewhenitisabottleneckistouseamoreexpensiveSMP/MPP
system while this approach helps increase the scalability of the system, it does not address the performance,
flexibility and availability concerns listed in Section 1. It is also more expensive compared to the DBCache
approachwherecheaperandlessreliablemachinescanbeusedtoruntheapplicationserverswithDBCache.
Duetoresourceconstraints,wewerenotabletotestmorethan100simulatedusers,morethan3WCSnodes,or
separatingtheapplicationserversfromtheDBserverbyawideareanetwork.However,fromthetrendsshownin
the experiments, we believe that middle-tier database caching on the application servers can improve server
scalability. If these datacachesaredeployedwith edgeservers,they canalso bring contentcloser to users and
improveperformanceintermsofresponsetime.Performancecanalsobeenhancedwhentheapplicationserverand
thedatabasearegeographicallyseparatedbya wide-areanetwork,asiscommonformanycustomers.Finally,by
continuingtoprovidelimitedservicebasedoncacheddata,thisapproachalsoincreasesavailabilityofthewebsite.
Manyissuesrelatingtodatabasecachinginapplicationandedgeserversarediscussedin[18].
5 RelatedWork
ProductsmostrelevanttooursaretheDatabaseCacheofOracles9iIAS(InternetApplicationServer)[19]and
TimesTen's Front-Tier [22]. Oracle's Database Cachecaches full tables using a full-fledgedOracle DBMS, and
reliesonreplicationtoolstoasynchronouslypropagateupdatesfromthebackenddatabasetothecache.TimesTen's
Front-Tierisacachingproductbasedontheirin-memorydatabasetechnology.OneadvancedfeatureofFront-Tier
isthatuserscancreatecacheviewsattheFront-Tier,whichcanbea subsetoftablesandjoinviews.UnlikeOracle
and our DBCache, updates are performed at the Front-Tier cache, and propagated to the backend database at
transactioncommittime(orthepropagationtothebackendcanalsobedoneasynchronously).
A major difference between our work and these existing products is that our cache has distributed query
processing capability. Thisis because weleverageDB2's federatedfeatures so that query plans at thecache can
involvebothsitesinacost-basedmanner[20].Incontrast,OraclesqueryroutinghappensattheOCIlayerbeforea
statement reaches the cache database. Consequently, the statement is either entirely executed at the backend
databaseorentirelyatthecachedatabase.Similarly,applicationsusingTimesTen'sFront-Tiermustbeawareofthe
cachecontentandissuequeriesoncachedcontentandonthebackenddatabaseseparately.
Caching for data-intensive web sites have been recently studied in [2], [3], [7], [16], [17], and [24]. They
focused on cachingdynamically generatedweb pages, HTMLfragments, XMLfragments, or query resultsfrom
outside of a DBMS(except [24]investigated usingthe backend database to cache intermediate query results as
8/6/2019 Middle Tier
23/24
23
materializedviews). Ourfocusis toengineera full-strengthDBMSintoa middle-tierdatabasecachefrominside
out,andimproveavailabilityandperformanceforapplicationswithoutmakinganychangestothem.
Finally,previousworkonmaterializedviews[9]andcachingforheterogeneoussystems([1],[4]),clientserver
databasesystems([6],[14]),andOLAPsystems[8]arealsorelevanttoourwork.Mostofthetechniquesproposed
inthesepapersaresuitableforspecifictypesofapplications,forexample,keywordbasedsearch,mobilenavigation,
orcomputationintensiveOLAPqueries.Comparetotheseapplications,e-Commerceapplicationsareusuallysimple
OLTP-stylequeriesbutrequiresreliability,scalability,and maintainability. Consequently,we choosesimpletable
levelcachingusinganindustrialstrengthDBMS.
6 ConclusionsandFutureWork
Wehaveexaminedtheopportunitiesine-Commerceapplicationsformiddle-tierdatabasecachingbyrunningane-
Commercebenchmark on typical website architectures.We observedthat e-Commerceapplications generated a
largenumberofsimpleOLTP-stylequeries,theirtableaccessesarehighlyskewedona fewread-dominanttables,
andthere wasa clear separation between write-dominant tables andread-dominant tables. We demonstratedthat
web application clones could scale up to heavy loads and the backend database server eventually becomes the
performancebottleneckinthesystem.
We have presented our prototype implementation of a middle-tier database cache. By extending DB2s
federatedfeaturesweturnedaDB2instanceintoaDBCachewithoutchanginguserapplications.Thenoveltyofthis
extensionisthatqueryplansatthecachemayinvolveboththecacheandtheremoteserverbasedoncostestimation.
Throughexperiments,weshowedthattheoverheadofaddingafull-strengthDBMSasamiddle-tierdatabasecache
wasinsignificantfore-Commerceworkloads.Consequently,middle-tierdatabasecachingimprovedusersresponse
timesignificantlywhenthebackenddatabaseserverwasheavilyloaded.
FutureworkincludesextendingtheDBCacheprototypetohandlespecialSQLdatatypes,statements,anduser
defined functions. We are also investigating alternatives for handling updates. Usability enhancements, such as
cache performance monitoring, and dynamically identification of candidate tables for caching are important
directionsforustopursue.
Acknowledgements
WewouldliketothankMehmetAltinelforhishelpfulsuggestionsonthepaper,LarryBrownforsettingupandtestingtheprototypeonNTplatforms,andKiranMehtaandDanWolfsonfordiscussionsaboutourarchitecture.
We would also like to thank Satya Dash, Joseph Fung, Jim Kleewein, Peter Schwarz, and David Tolleson for
answeringourquestionsontheECDWbenchmark,DB2federatedfeatures,andDPropR.
References
8/6/2019 Middle Tier
24/24
[1] Sibel Adali, K. Seluk Candan, Yannis Papakonstantinou, and V. S. Subrahmanian. Query Caching andOptimizationinDistributedMediatorSystems.SIGMODConference1996:137-148.
[2] K.SelukCandan,Wen-SyanLi,QiongLuo,Wang-PinHsiung,andDivyakantAgrawal.EnablingDynamicContentCachingforDatabase-DrivenWebSites.SIGMODConference2001.
[3] JimChallenger,ArunIyengar,andPaulDantzig.AScalableSystemforConsistentlyCachingDynamicWebData.IEEEINFOCOM99.
[4] Boris Chidlovskii, Claudia Roncancio, and Marie-Luise Schneider. Cache Mechanism for HeterogeneousWebQuerying.Proc.8thWorldWideWebConference(WWW8),1999.
[5] Mike Conner, George Copeland and Greg Flurry. Scaling Up e-Business Applications with Caching.DeveloperToolbox Technical Magazine, August 2000.
http://service2.boulder.ibm.com/devtools/news0800/art7.htm
[6] ShaulDar,MichaelJ.Franklin,BjrnrJnsson,andDiveshSrivastava,MichaelTan.DataCachingandReplacement.VLDB1996.
[7] AnindayaDatta,KaushikDutta,HelenM.Thomas,DebraE.VanderMeer,KrithiRamamritham,andDanFishman. A Comparative Study of Alternative Middle Tier Caching Solutions to Support Dynamic Web
ContentAcceleration.VLDB2001.
[8] Prasad Deshpande, Karthikeyan Ramasamy, Amit Shukla, and Jeffrey F. Naughton. CachingMultidimensionalQueriesUsingChunks.SIGMODConference1998:259-270.
[9] AshishGuptaandInderpalSinghMumick(Editors).MaterializedViews:Techniques,Implementations,and
Applications.TheMITPress,1999.[10] IBM.DB2DataJoiner.http://www-4.ibm.com/software/data/datajoiner/[11] IBM.DB2DataPropagator.http://www-4.ibm.com/software/data/DPropR/[12] IBM. DB2 System Monitor Guide and Reference. http://www-4.ibm.com/cgi-
bin/db2www/data/db2/udb/winos2unix/support/document.d2w/report?fn=db2v7f0frm3toc.htm
[13] IBM.WebSphereCommerceSuite.http://www-4.ibm.com/software/webservers/commerce/wcs51.html[14] Arthur M. Keller and Julie Basu. A Predicate-based Caching Scheme for Client-Server Database
Architectures.VLDBJournal5(1):35-47(1996).
[15] Donald Kossmann, Michael J. Franklin, and Gerhard Drasch. Cache Investment: Integrating QueryOptimizationandDistributedDataPlacement.ACMTODS,December2000
[16] AlexandrosLabrinidisandNickRoussopoulos.WebViewMaterialization.SIGMODConference2000.[17] Qiong Luo and Jeffrey F. Naughton. Form-Based Proxy Caching forDatabase-Backed Web Sites.VLDB
2001.
[18] C. Mohan. Caching Technologies for Web Applications. VLDB 2001.http://www.almaden.ibm.com/u/mohan/Caching_VLDB2001.pdf
[19] Oracle Corporation. Oracle Internet Application Server Documentation Library.http://technet.oracle.com/docs/products/ias/doc_index.htm
[20] Mary TorkRoth, Fatma Ozcan, Laura M. Haas: CostModelsDO Matter:Providing CostInformationforDiverseDataSourcesinaFederatedSystem.VLDB1999
[21] SegueSoftware,Inc.SilkPerformer.http://www.segue.com/html/s_solutions/s_performer/s_performer.htm[22] TimesTen.TimesTenFront-Tier.http://www.timesten.com/products/fronttier/index.html[23] TransactionProcessingPerformanceCouncil.TPC-WBenchmark.http://www.tpc.org/tpcw/default.asp[24] KhaledYagoub,DanielaFlorescu,ValrieIssarny,and PatrickValduriez.Buildingand CustomizingData-
IntensiveWebSitesUsingWeave.VLDB2000.