Upload
fabien-hermenier
View
82
Download
0
Tags:
Embed Size (px)
Citation preview
Higher SLA Satisfaction in Datacenters with Continuous Placement Constraints
Huynh Tu Dang [email protected]
Fabien Hermenier [email protected]
SLA for a virtualised application
SLA for a virtualised application
spread the replicas
SLA for a virtualised application
performance guarantee
SLA for a virtualised application
low latency
N3
N2
time
SLA: spread(VM1, VM2)
VM1
VM2
reconfiguration algorithm
N1
N3
N2
time
SLA: spread(VM1, VM2)sys-admin query: offline(N1)
VM2VM1
VM2
reconfiguration algorithm
VM1
N1
Rec
onfig
urat
ion
proc
ess
N3
N2
time
SLA: spread(VM1, VM2)sys-admin query: offline(N1)
VM2VM1
VM2
reconfiguration algorithm
VM1
N1
with discrete restrictions
Discrete restriction is not enough
not an unpredictable situation,an algorithmic issue
Evaluating the reliability of discrete placement constraints
• simulate a 256-server datacenter
• running 350 HA webapp (5,200 VMs)
• BtrPlace as the reconfiguration algorithm
• 4 reconfiguration scenarios that mimic industrial use case
• 100 instances per scenario
Studied constraints
among
singleResourceCapacity
maxOnline
splitAmong
spread
DBs on a same edge-switch for a fast synchronisation.
keep resource for hypervisor management operations
webapp split over 2 clusters for disaster recovery
240 nodes online at maximum to fit licensing policy
replicas on distinct servers for fault tolerance
verti
cal e
lastic
ity
Tiers 1
Tiers 2
Tiers 3
scenario
verti
cal e
lastic
ity
Tiers 1
Tiers 2
Tiers 3
scenario
scenariohorizontal elasticity
Tiers 1
Tiers 2
Tiers 3
scenariohorizontal elasticity
Tiers 1
Tiers 2
Tiers 3
scenarioboot storm x 400
scenarioserver failure
Scenario Violated SLAs
ActionsVM Boot Migrate Node Boot Node Shutdown
Vertical Elasticity 40.72 0% 99.99% 0.005% 0.005%
Horizontal Elasticity 0.19 99.82% 0.18% 2.82% 0%
Server Failure 29.56 61.29% 35.89% 2.82% 0%
Boot Storm 0.35 98.57% 1.43% 0% 0%
Migrations lead to unanticipated placements
0
25
50
75
100
VerticalElasticity
HorizontalElasticity
ServerFailure
BootStorm
Violations
spread among splitAmong maxOnline
performance lossspof
failure
Migrations tend to violate relative placement constraints
spread(VM[1,2])
Trading unreliable discrete constraints …
we addressed an assignment problem
spread(VM[1,2])
… for safe continuous constraints
we must address a scheduling problem
Continuous placement constraints with
from discrete to continuous among|simpleAmong
N1N2 N3
N4 N5N6
stay on a same partition by the end of the reconfiguration process
from discrete to continuous among|simpleAmong
N1N2 N3
N4 N5N6
stay on a same partition by the end of the reconfiguration process
from discrete to continuous among|simpleAmong
N1N2 N3
N4 N5N6
stay on a same partition by the end of the reconfiguration process
Disallow movements between partitions • basic knowledge of a reconfiguration process • still an assignment problem
N1N2 N3
N4 N5N6
from discrete to continuous among|simpleAmong
Disallow movements between partitions • basic knowledge of a reconfiguration process • still an assignment problem
N1N2 N3
N4 N5N6
from discrete to continuous among|simpleAmong
allDi↵erent(dhost
1 , d
host
2 )
discrete spread(VM[1,2]) ::= continuous spread(VM[1,2]) ::=
allDi↵erent(dhost
1 , d
host
2 ) ^d
host
1 = c
host
2 =) a
start
1 � a
end
2 ^d
host
2 = c
host
1 =) a
start
2 � a
end
1
continuous spread
Disallow temporary overlapping • require to know this may happen • scheduling 101
continuous maxOnlinediscrete maxOnline(N[1..10], 7)::=
continuous maxOnline(N[1..10], 7)::=
10X
i=1
nqi 7
8i 2 [1, 10], n
on
i
=
⇢0 if n
q
i
= 1
a
start
i
otherwise
n
off
i
=
⇢max (T ) if n
q
i
= 0
a
end
i
otherwise
8t 2 T, card({i|non
i
� t ^ n
off
i
}) 7
scheduling 201
detailed knowledge of a reconfiguration process
harder to imagine, model & implement
5 10 20 50 100Duration (sec.)
Solve
d in
stan
ces
discretecontinuous
020
4060
8010
0
10 20 50 100Duration (sec.)
Solve
d in
stan
ces
discretecontinuous
020
4060
8010
0
5 10 20 50 100Duration (sec.)
Solve
d in
stan
ces
discretecontinuous
020
4060
8010
0
boot storm
horitzontal elasticity server failure
Performance overhead
5 10 20 50 100 200Duration (sec.)
Solve
d in
stan
ces
discretecontinuous
020
4060
8010
0
vertical elasticity
5 10 20 50 100Duration (sec.)
Solve
d in
stan
ces
discretecontinuous
020
4060
8010
0
10 20 50 100Duration (sec.)
Solve
d in
stan
ces
discretecontinuous
020
4060
8010
0
5 10 20 50 100Duration (sec.)
Solve
d in
stan
ces
discretecontinuous
020
4060
8010
0
boot storm
horitzontal elasticity server failure
Performance overhead
5 10 20 50 100 200Duration (sec.)
Solve
d in
stan
ces
discretecontinuous
020
4060
8010
0
vertical elasticity
• discrete restriction is not enough
• continuous restriction is a solution
• a different view on the problem
• challenging, but still possible to implement
Conclusions
Future Work
• a broader range of constraints and objectives
• reducing performance overhead
• static analysis to detect un-necessary continuous constraints
• controlled relaxation to handle hard situations
open source, 20+ placement constraints, demo, tutorials, everything for reproducibility
http://btrp.inria.fr