Upload
others
View
11
Download
0
Embed Size (px)
Citation preview
PeekingBehindtheCurtainsofServerless Platforms
LiangWang1 ,Mengyuan Li2,Yinqian Zhang2 ,ThomasRistenpart3,MichaelSwift1
1UW-Madison, 2 TheOhioStateUniversity,3 CornellTech
Providersdomore,tenantdoless
…
PhysicalMachine
VM
…
Server Scaling Uptime
APP
Serverless(FaaS)
Non-controllable Controllable
PhysicalMachine
VM
PaaS
…
Server Scaling Uptime
APP
PhysicalMachine
APP
VM
IaaS
…
Server Scaling Uptime
APP
3
Benefitsofserverless
FunctionTenant
Serverless provider
Function:Standalone,smallapplicationdedicatedtospecifictasks
FunctionDeploy
• Minimalconfiguration• Noeffortsonservermanagement• Lowcost
4
Serverless ecosystem
Source:https://venturebeat.com/2017/10/22/the-big-opportunities-in-serverless-computing/
5
Lotsofquestionsaboutserverless
• AreapplicationsresistanttoDDos attacksinserverless?
• Arefunctionssecureinserverless?
• Canserverless providersdeliverguaranteedperformance?
…
Weneedbettermethodologyandmoresystematicmeasurementtoanswerthesequestions
6
Contributions• In-depthstudyofresourcemanagementandperformanceisolationin
• Identifyopportunitiestoimproveserverless platformso AWS:Badperformance isolation, functionconsistencyissue,…o Azure:Unpredictableperformance, tenantisolationissues,…o Google:Resourceaccountingbug,…
• Open-sourcemeasurementtool(https://github.com/liangw89/faas_measure)
Azure Functions Google Cloud FunctionsAWS Lambda
7
Overview• Background
• Methodology
• Highlightedresultso Serverless architectureso Resourceschedulingo Performanceisolationo Bugs
8
Howserverless works
Serverless provider
FunctionUser Request
VM
FunctionContainerResponse
A function runsinacontainer(functioninstance) launchedbytheproviderwithlimitedCPU/memory/executiontime
Launch
VM
9
Howserverless works
Serverless provider
FunctionUser
FunctionContainer
Thefunctioninstancewillbefrozenafterreturningfrominvocation
PauseNewrequests: Reactivated
Tenantsdon’tneedtopaywhileinstancesarepaused
VM
10
Howserverless works
Serverless provider
Function
UserConcurrentrequests
VM
Responses
Providersmanagebackendinfrastructuresandresourcefortenants
Scaleup
11
Methodology
Azure Functions
Google Cloud Functions
AWS Lambda
Measurementfunction• Collectinformationviaprocfs/cmd/env• Executeperformancetests
Settingvariables:• Functionmemory• Function language• Requestfrequency• Concurrentrequest
Invokemeasurementfunctionsmanytimes(50K+)undervarioussettingsfromvantagepointsinthesamecloudregion
Time:July–Dec2017,May2018
12
Tool1: Maprequeststoinstances
Inst1Request1
Instanceidentification:Writeaunique fileon/tmp à persistentduring instancelifetime
Result+“inst1.txt”(newinst!)
inst1.txt
Inst2
inst2.txtRequest2
Result+“inst1.txt”(inst1ranagain!)
Request3
Result+“inst2.txt”(newinst!)
Whichinstancehandledtherequest?
VM2
13
Tool2:MapinstancestoVMs
Requests
VMidentification:• AWS:Anentryinthe/proc/self/cgroup• Azure:TheWEBSITE_INSTANCE_ID environmentvariable• Google:Unknown
Results+Inst ID+VMID
VM1VMID=abc
VMID=abc
VMID=xyz
VerifiedviaI/O-basedandFlush-Reloadcoresidency tests
AreinstancesonthesameVM?
14
Highlightedresults
• Serverless architectures
• Resourcescheduling
• Performanceisolation
• Bugs
15
Domultipletenants’instancesrunonthesameVM?
AWS:Noà VMonlyhostsfunctions fromsingletenant
AWS
VM1 VM2TenantAfunc1
TenantAfunc2
TenantBfunc3
Azure
VM1
TenantA
func1 func2
TenantB
Azure:• 2017:Yesà VMhosts functionsfrommultipletenants• 2018:No.Butotherplatformsstilldothis:Spotinst,stdlib,webtask.io
Google:Unknown
Cross-tenantVMsharingmakeapplicationsvulnerabletoside-channelattacks
16
DoVMshavethesameconfigurations?Methodology:Examineprocfs andenv variablesofthehostVMsof50Kfunction instances
AWS:5CPUconfigurations(1or2vCPUs,4CPUmodels)Azure:9configurations(1or2or4vCPUs,4CPUmodels)Google:4configurations(4CPUmodels)
2x2.9GHz59%
2x2.8GHz38%
2x2.4GHz3%
2x2.3GHz0.09%
1x2.4GHz0.01%
AWS
7947%
8545%
634%
454%
Googlemodelversion
1 vCPU54%2vCPU
25%
4vCPU21%
Azure
DifferenttypesofVMscouldresultindifferentinstanceperformance
17
Highlightedresults
• Serverless architectures
• Resourcescheduling
• Performanceisolation
• Bugs
18
Cantheplatformseffectivelyhandleconcurrentrequests?
Azure/Google:Don’tdeliverpromisedscalability
Methodology: sendNconcurrent requestsandexaminethenumberofinstancesrunning concurrently
N #Requests
#Instance
AWS:N
Google:N/2
Azure:10
19
Howlongdoesittaketolaunchaninstance?
0
50
100
150
200
1 12 24 36 48 60 72 84 96 108 120 132 144 156 168
AWS
0 200 400 600 800
1000 1200 1400 1600
1 12 24 36 48 60 72 84 96 108 120 132 144 156 168
b
0 2000 4000 6000 8000
10000 12000 14000 16000 18000
1 12 24 36 48 60 72 84 96 108 120 132 144 156 168Mon Tue Wed Thu Fri Sat Sun
Azure
AWS: 160 ms
Google: 500ms (2017)à 2000ms (2018)
Azure: 3600ms (2017)à 300ms (2018)
Coldstart mightaffecttaillatencies
Mediancoldstart latencyperhourover7days(2017)
ms
ms
ms
Mediancoldstart latencyof1000instances
20
Highlightedresults
• Serverless architectures
• Resourcescheduling
• Performanceisolation
• Bugs
21
Whatcanaffectperformance?• CPUshare:fractionof1000-mstimeperiodforwhichtheinstancecanuseCPU
• IOthroughput:Write512KBofdatatothelocaldisk1,000times(viadd orscripts)
• Networkthroughput:Useiperf3torunthethroughputtestfor10seconds
AWS Azure Google
Coresidency Yes Yes UnknownVM configuration No Yes No
Factorsaffectingperformance:
22
HowinstancesareplacedonVMsAWS:Bin-packing;useatmost3328 MBVMmemory
Azure:Random
Google:Unknown
AWSLambdaVMmemoryutilization:85-100%
AWS:Easyforinstancesfromthesametenanttobecoresident
25*128MBinsts:1VM50*128MBinsts:2VMs…200*128MBinsts:8VMs
0
2
4
6
8
10
0 50 100 150 200
No.o
fVMs
No.ofinstances
AWS:No.ofVMsbeingused foragivennumberofinstances(128MB)
23
(Estimatedbasedonthemedianperformanceacrosscoresident instances,over50rounds)
CPU IO Network
AWS
1instance 20instances
CPU IO Netowrk
Azure
1instance 6instances
same
4x-19x-
3x-5x- 6x-
Coresident instancescontendforVMresources
ResourcesareallocatedperVMMoreco-residencydecreasesresourcesperfunction
24
(Estimatedbasedonthemedianperformanceacrosscoresident instances,over50rounds)
CPU IO Network
AWS
1instance 20instances
CPU IO Netowrk
Azure
1instance 6instances
same
4x-19x-
3x-5x- 6x-
Coresident instancescontendforVMresources
ResourcesareallocatedperVMMoreco-residencydecreasesresourcesperfunction
25
AWS/Google:CPUshareisproportionaltomemory
AWS Google
Morememory-->MoreCPU-->Betterperformance
0 500 1,000 1,5000
0.2
0.4
0.6
0.8
1
Function memory (MB)
Fra
ction
CPU share Mem*2/3328
0 1,000 2,000
0.5
1
Function memory (MB)
Fra
ction
CPU share
AWS:Functionsof128MBmemorycanuseCPUfor80ms in1000msFunctions of1.5GBmemorycanuseCPUfor900ms in1000ms
AWS Azure Google
Coresidency Yes Yes UnknownVM configuration No Yes No
26
Whatcanaffectperformance?• CPUshare:fractionof1000-mstimeperiodforwhichtheinstancecanuseCPU
• IOthroughput:Write512KBofdatatothelocaldisk1,000times(viadd orscripts)
• Networkthroughput:Useiperf3torunthethroughputtestfor10seconds
Factorsaffectingperformance:
27
Azure:VMconfigurationsaffectperformance
32.2% 0.5%
67.8%
99.5%
0%
20%
40%
60%
80%
100%
1or2vCPUs 4vCPUs
%ofinstances
0-60% 60%-80%
Azure:
Samefunction+fewerresources=longerrunningtime=moremoney
4-vCPUVMsget1.5x IOthroughput, 2x networkthroughput,andmoreCPUthanothertypesofVMs
CPUshare
28
Highlightedresults
• Serverless architectures
• Resourcescheduling
• Performanceisolation
• Bugs
29
CanAWSpropagatefunctionupdatescorrectly?
50concurrentrequests
InstancesetA
MemoryIAMrolesEnvironmentvariableFunction code
Update1of:
50concurrentrequests
1
2
3
DidanyinstancesinsetBrunfunc insteadoffunc’?
func
func func’
func’
InstancesetB
Methodology:
30
AWS:Inconsistentfunctionusage
3.8%(outof20K)rananinconsistentoroutdatedfunction
31
AWS:Inconsistentfunctionusage
3.8%(outof20K)rananinconsistentoroutdatedfunction
• Case1:Newinstancesranoutdatedfunctions(0.1%)
32
AWS:Inconsistentfunctionusage
3.8%(outof20K)rananinconsistentoroutdatedfunction
• Case1:Newinstancesranoutdatedfunctions(0.1%)
• Case2:Requestshandledbytheinstancesforoutdatedfunctions(3.7%)
33
AWS:Inconsistentfunctionusage
3.8%(outof20K)rananinconsistentoroutdatedfunction
• Case1:Newinstancesranoutdatedfunctions(0.1%)
• Case2:Requestshandledbytheinstancesforoutdatedfunctions(3.7%)
Inconsistentresponsestousers
34
Google:StealthybackgroundprocessProcessescanrunafterfunction invocationconcluded
exports.handler =function handler(req, res){//runasynchronous taskhere.
lineA: user_task();//sendbackresults.
lineB: res.status(http_code).send(user_data);}
Nodejs willexecutelineBwithoutwaitingforuser_task returns
• Processescanstayaliveforto21hours• Nobillingà Useextraresourcesforfree!
Method:
35
Google:StealthybackgroundprocessProcessescanrunafterfunction invocationconcluded
exports.handler =function handler(req, res){//runasynchronous taskhere.
lineA: user_task();//sendbackresults.
lineB: res.status(http_code).send(user_data);}
Nodejs willexecutelineBwithoutwaitingforuser_task returns
Method:
GoogleshouldmonitortheresourceusageoftheentirefunctioninstanceratherthantheNodejs processes
36
Summary• In-depthmeasurementstudythatdiscovervariousissuesinthreeserverless computingplatformso Unpredictableperformanceo Badperformanceisolationo Consistencyissues
• Performancebaselinesanddesignconsiderationsforfuturedesignofserverless platforms
• Responsibledisclosure