Budapest University of Technology and Economics
Department of Measurement and Information Systems
Risk assessment based cloudification
Szilárd Bozóki, Gábor Koronka
Supervisor: Prof. András Pataricza
Department of Measurement and Information Systems
2
Cloud computing around the globe
Source: Cisco Global Cloud Index: Forecast and Methodology 2013–2018
3
Mission critical cloud computing
August 31, 2015
o Federal Aviation Administration
o $108 million now, $1 billion in 10 years
o Source: CSC news
Network Functions Virtualization
o The “telco” cloud
o Source: NFV
Problem statement
Moving to the cloud (cloudification): “How large a pool of VMs is needed?”
o It depends…
• SLA?
• Application and platform characteristics
Aim: mission critical and high value applications
Approach:
o Modeling distinctive cloud features
o Revisiting decades-old modeling and fault tolerance techniques, leveraging distinctive cloud features
4
5
From dependability to risk
A definition of dependability
o “the ability of a system to avoid service failures that are more frequent or more severe than is acceptable.”
Core idea: mission insurance vs. mission incident probability × value of asset
e.g. $3 vs. 50% × $10
expected value of loss (risk): $5
since insurance < expected value of loss → good insurance
insurance ↔ investment in redundancy
o diminishing returns
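The insurance comparison above fits in a few lines of code. This is a minimal Python sketch and not the authors' tool. The replica model in `risk_with_replicas` is a hypothetical assumption added only to illustrate diminishing returns: the mission is lost only when every independent replica fails, and each replica has the same incident probability.

```python
def expected_loss(p_incident: float, asset_value: float) -> float:
    """Risk: the expected value of loss = incident probability * asset value."""
    return p_incident * asset_value


def is_good_insurance(premium: float, p_incident: float, asset_value: float) -> bool:
    """Insurance pays off when it costs less than the expected loss it covers."""
    return premium < expected_loss(p_incident, asset_value)


def risk_with_replicas(p_incident: float, asset_value: float, replicas: int) -> float:
    """Hypothetical model: the mission is lost only if all independent replicas fail."""
    return (p_incident ** replicas) * asset_value


if __name__ == "__main__":
    print(expected_loss(0.5, 10.0))           # 5.0
    print(is_good_insurance(3.0, 0.5, 10.0))  # True
    # Each additional replica buys a smaller reduction of risk than the previous one.
    for n in range(1, 5):
        gain = risk_with_replicas(0.5, 10.0, n) - risk_with_replicas(0.5, 10.0, n + 1)
        print(f"replica {n + 1}: risk reduced by {gain}")
```

The gains halve with every replica (2.5, 1.25, 0.625, ...). This is the diminishing return that makes "investment in redundancy" comparable to buying insurance.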
6
Risk interpretation
Risk: the expected value of loss due to potential service outages
o Failure frequency
• Cloud fault model
– Fault tree or reliability block diagram
o Severity
• Cost of application downtime
Risk mitigation
o Design for resiliency, cloudification
o Reduction of failure frequency or severity
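The two risk components and the two mitigation levers can be sketched as follows. This minimal Python example assumes that risk is approximated as failure frequency times the cost of one outage, where one outage costs its mean duration × the downtime cost per hour. `OutageProfile` and all numbers are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class OutageProfile:
    failures_per_year: float       # failure frequency
    mean_outage_hours: float       # typical duration of one outage
    downtime_cost_per_hour: float  # severity: cost of application downtime


def annual_risk(p: OutageProfile) -> float:
    """Expected yearly loss = failure frequency * severity of one outage."""
    severity_per_outage = p.mean_outage_hours * p.downtime_cost_per_hour
    return p.failures_per_year * severity_per_outage


# Hypothetical baseline values, for illustration only.
baseline = OutageProfile(failures_per_year=4, mean_outage_hours=2, downtime_cost_per_hour=1000)
# Mitigation lever 1: reduce failure frequency.
fewer_failures = replace(baseline, failures_per_year=2)
# Mitigation lever 2: reduce severity (shorter outages).
faster_recovery = replace(baseline, mean_outage_hours=0.5)

for name, profile in [("baseline", baseline),
                      ("fewer failures", fewer_failures),
                      ("faster recovery", faster_recovery)]:
    print(f"{name}: {annual_risk(profile):.0f} per year")
```

Either lever lowers the expected loss. Cloudification with redundant VMs, as discussed in these slides, acts mainly on the frequency term.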
7
Critical services over ordinary clouds?
Environment
o HW/SW stack
o Cloud service models
Research objective
o Carrier grade IaaS
o Fast resiliency (SDN)
o Redundancy architectural pattern
• HW, VM, App, else
o Measures
• Availability (downtime)
• Cost
[Figure: HW/SW stack (Hardware; VMM (Hypervisor) + optional Host OS; VM; Guest OS; Container; Application) annotated with IaaS, PaaS and SaaS provider scopes, and redundancy at three levels: HW 1–3 (Redundancy), VM 1–3 (Replication), APP 1–3 (Replication)]
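Replication at the HW, VM or application level is commonly evaluated with a reliability block diagram: the layers of one stack act in series, and the replicas act in parallel. The Python sketch below uses hypothetical per-layer availabilities and assumes that replicas fail independently. That assumption does not hold when replicas share a server or an availability zone, which is why the slides move on to a physical cloud model.

```python
from math import prod


def series(availabilities):
    """Every layer must work (e.g. hardware, hypervisor/VM, application)."""
    return prod(availabilities)


def parallel(availabilities):
    """Replicas: the service is up while at least one replica is up."""
    return 1 - prod(1 - a for a in availabilities)


# Hypothetical availabilities of the layers of one HW/SW stack.
stack = [0.999, 0.999, 0.995]  # hardware, VMM/VM, application
one_replica = series(stack)

for n in range(1, 4):
    a = parallel([one_replica] * n)
    print(f"{n} replica(s): availability {a:.9f}")
```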
8
Physical cloud model
A distinctive cloud feature to begin with.
[Figure: physical cloud layout: Client, Internet, Data Center, Availability Zone, HW cluster, HW server, VM]
9
Related work
SLA requirement from the provider view
o Provider manages and optimizes the SLA portfolio
Concrete, application specific cloud user view
o Concentrates on queuing and scheduler based job execution services
Deploying a scientific grid on a private cloud
o Profitability analysis
Our focus: the user view on an IaaS public cloud
10
Cost resilience trade-off
To reduce total cost, how many redundant virtual machines are needed?
Risk
o Failure frequency
• Cloud fault model based on the physical layout of IaaS
o Severity
• Cost of application downtime (SLA)
Risk mitigation → failure frequency reduction → redundant (hot-running) VMs → cost overhead of VMs (VM price/hour)
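The trade-off reduces to minimizing the sum of two terms: risk, which falls with more VMs, and VM cost, which grows linearly. The following is a simplified Python sketch, not the authors' model. It assumes that VMs fail independently, that the service is down only when all VMs are down, and that VMs run hot all year (8760 hours). It ignores shared region and AV zone failures. All parameter values are hypothetical.

```python
HOURS_PER_YEAR = 8760


def annual_total_cost(n_vms: int, vm_unavailability: float,
                      downtime_cost_per_hour: float, vm_price_per_hour: float) -> float:
    """Risk (expected downtime cost) plus the price of running n hot VMs all year.

    Simplification: independent VM failures; service down only if all VMs are down;
    shared region / AV zone failures are ignored.
    """
    system_unavailability = vm_unavailability ** n_vms
    risk = system_unavailability * HOURS_PER_YEAR * downtime_cost_per_hour
    vm_cost = n_vms * vm_price_per_hour * HOURS_PER_YEAR
    return risk + vm_cost


def cheapest_pool(max_vms: int, **params) -> int:
    """Number of VMs (1..max_vms) with the lowest annual expected total cost."""
    return min(range(1, max_vms + 1), key=lambda n: annual_total_cost(n, **params))


# Hypothetical parameters.
params = dict(vm_unavailability=0.01, downtime_cost_per_hour=10_000, vm_price_per_hour=0.10)
for n in range(1, 6):
    print(n, round(annual_total_cost(n, **params), 2))
print("cheapest:", cheapest_pool(7, **params))
```

Raising the downtime cost pushes the optimum toward more VMs. This toy model omits the region and AV zone failures of the fault tree, so its optimum is not comparable to the numerical results reported later in the slides.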
11
Abstract cloud model based on the physical cloud
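[Figure: abstract cloud model with four levels (Level 1: Client, Level 2: Region, Level 3: AV zone, Level 4: VM); components annotated with the time-dependent functions $f_1(t), \ldots, f_4(t)$, $g_1(t), g_2(t), g_3(t)$ and $h_1(t), h_2(t)$]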
12
Top level fault tree - basic model
[Figure: top-level fault tree: System down = all paths down; Path down = Region down, AV zone down or VM down; levels: Level 1 Client, Level 2 Region, Level 3 AV zone, Level 4 VM]
N+7 replication based redundancy - basic model
13
[Figure: fault tree of the basic model for two regions, each with two AV zones of two VMs. In each AV zone, AVZone fail or All VMs fail (VM 1 fail, VM 2 fail) gives AVzone path fail; Region fail or Total AVzone path fail gives Region path fail; System failure (all paths down) occurs when both Region 1 and Region 2 paths have failed. Levels: Level 1 Client, Level 2 Region, Level 3 AV zone, Level 4 VM]
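The basic-model fault tree (region → AV zone → VM, two of each) can be evaluated bottom-up with AND/OR gates. The Python sketch below makes three assumptions: basic events are independent, the time-dependent functions of the abstract model are replaced by constant unavailabilities, and the probability values are hypothetical. Each basic event appears only once in this tree, so the gate formulas are exact under independence.

```python
from math import prod


def and_gate(probs):
    """Output fails only if all independent inputs fail."""
    return prod(probs)


def or_gate(probs):
    """Output fails if any independent input fails."""
    return 1 - prod(1 - p for p in probs)


def av_zone_path_fail(q_zone, q_vm, vms_per_zone):
    # AV zone path fails if the zone fails OR all of its VMs fail.
    return or_gate([q_zone, and_gate([q_vm] * vms_per_zone)])


def region_path_fail(q_region, q_zone, q_vm, zones, vms_per_zone):
    # Region path fails if the region fails OR every AV zone path fails.
    all_zone_paths = and_gate([av_zone_path_fail(q_zone, q_vm, vms_per_zone)] * zones)
    return or_gate([q_region, all_zone_paths])


def system_fail(q_region, q_zone, q_vm, regions=2, zones=2, vms_per_zone=2):
    # System failure: all region paths are down.
    return and_gate([region_path_fail(q_region, q_zone, q_vm, zones, vms_per_zone)] * regions)


# Hypothetical per-component unavailabilities.
q = system_fail(q_region=1e-4, q_zone=1e-3, q_vm=1e-2)
print(f"P(system failure) = {q:.3e}")
```

With these values the region term dominates. Adding VMs inside a zone therefore helps little once zone and region failures are the bottleneck, which is the effect the physical model is meant to capture.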
14
Variance of resource quality
A distinctive cloud feature with impact: if any resource underperforms → VM down
o Interference
o Noisy neighbors
[Figure: extended VM model at Level 4 (VM): VM down if Compute down or Network down]
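The extended VM model treats underperformance as failure. The Python sketch below estimates each resource's "down" probability as the fraction of response-time samples that exceed a threshold, then combines compute and network through an OR gate. The thresholds and samples are hypothetical, not measured data. Independence between compute and network degradation is also an assumption, and a noisy neighbor may in fact degrade both at once.

```python
def exceedance_probability(samples, threshold):
    """Fraction of measurements in which the resource underperforms (exceeds the threshold)."""
    return sum(s > threshold for s in samples) / len(samples)


def vm_down_probability(p_compute_down, p_network_down):
    """Extended model: VM is down if compute OR network underperforms (independence assumed)."""
    return 1 - (1 - p_compute_down) * (1 - p_network_down)


# Hypothetical response-time samples in milliseconds.
compute_ms = [10, 12, 11, 50, 13, 12, 11, 10, 60, 12]
network_ms = [5, 5, 6, 30, 5, 5, 5, 5, 5, 5]

p_c = exceedance_probability(compute_ms, threshold=40)  # 2 of 10 samples exceed
p_n = exceedance_probability(network_ms, threshold=20)  # 1 of 10 samples exceeds
print(p_c, p_n, round(vm_down_probability(p_c, p_n), 4))
```

The resulting probability would take the place of the single VM failure probability used in the basic model.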
15
Network resource quality variation
Source: Gorbenko, A., Kharchenko, V., Mamutov, S., Tarasyuk, O., Romanovsky, A. “Exploring Uncertainty of Delays as a Factor in End-to-End Cloud Response Time.” 2012
16
Compute resource quality variation
Source: Gorbenko, A., Kharchenko, V., Mamutov, S., Tarasyuk, O., Romanovsky, A. “Exploring Uncertainty of Delays as a Factor in End-to-End Cloud Response Time.” 2012
17
Results – availability (number of 9s)
[Figure: number of 9s (0–12) vs. number of VMs (0–7); series: A basic, A extended]
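The slides do not state how the number of 9s is computed. This Python sketch assumes the usual convention, −log10(1 − A), and also converts availability to expected downtime per year.

```python
import math

HOURS_PER_YEAR = 8760


def number_of_nines(availability: float) -> float:
    """E.g. 0.999 -> 3 nines; computed as -log10(unavailability)."""
    return -math.log10(1.0 - availability)


def annual_downtime_hours(availability: float) -> float:
    """Expected downtime per year implied by a steady-state availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


for a in (0.99, 0.999, 0.99999):
    print(f"A={a}: {number_of_nines(a):.2f} nines, {annual_downtime_hours(a):.3f} h/year downtime")
```

Near the 10 to 12 nines shown in the plots, 1 − A computed in double precision loses accuracy. In that range, computing with the unavailability directly (e.g. from the fault tree) is more reliable than subtracting the availability from 1.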
18
Results – availability (number of 9s)
[Figure: number of 9s (0–12) vs. number of VMs (0–7); series: A basic, A extended, B basic, B extended, C basic, C extended]
19
Results – annual expected total cost
[Figure: annual total cost (logarithmic scale, 100.00 to 100,000,000.00) vs. number of VMs (0–7); series: A basic model, A extended]
20
Results – annual expected total cost
[Figure: annual total cost (logarithmic scale, 100.00 to 100,000,000.00) vs. number of VMs (0–7); series: A basic model, A extended, B basic, B extended, C basic, C extended]
21
Concluding remarks
Aim: mission critical and high value applications
Modeling distinctive cloud features
Risk based interpretation of dependability → fault tree based on the physical cloud
Revisiting decades-old modeling and fault tolerance techniques, leveraging distinctive cloud features → redundant VMs on the cloud
Numerical analysis: redundant VMs N+(4–6 VMs) → extra availability → reduction of risk → reduction of total cost
Future work
o Model refinement
o Data acquisition, measurement
o Dynamic replication for critical mission phases