RECON: A TOOL TO RECOMMEND DYNAMIC SERVER CONSOLIDATION IN MULTI-CLUSTER DATACENTERS
Sameep Mehta, Anindya Neogi
IBM India Research Lab
IEEE Network Operations and Management Symposium, 2008
Presented by: Yun Liaw
Outline
Introduction
ReCon Overview
Mapping Logic
Experimental Validation
Related Works
Discussions
Conclusion and Comments
112/04/19
Introduction
Server virtualization has regained popularity for various reasons
  Virtual machines (VMs) support more flexible and finer-grained resource allocation
  The management cost and total cost of ownership (TCO) of physical servers have gone up drastically
Virtualization enables consolidation of a number of smaller machines as VMs on a large server
  Leads to more efficient utilization of hardware resources
  Saves floor space and management cost
Introduction (cont’d)
ReCon: a tool that uses historical resource usage monitoring data to recommend a dynamic or static consolidation plan on servers
Figure legend: white = high utilization; black = low utilization
ReCon Overview
Trace data: a set of measurements taken from the system, typically in a time-series format
  E.g., CPU, memory, etc.
Cost
  Static cost: the base cost of running a physical server with its associated workload
  Dynamic cost: the cost that varies with the utilization
  VM migration cost
ReCon Overview (cont’d)
Constraints: restrict the space of possible mappings between VMs and physical servers
  System constraints
  Application-level constraints
  Legal constraints
“What-if” input configuration: lets users tweak the input parameters and review the impact of consolidation
  Time window size of dynamic consolidation
  The period for which a server must have no workload before it is considered for being turned off
ReCon Overview (cont’d)
Optimal Mapping Algorithm: takes all parameters, costs, constraints, and configurations, and processes the trace data to generate a static or dynamic server consolidation plan
  Consolidation window: the non-overlapping time window used to divide the historical data for dynamic consolidation
  For each time window, an optimal mapping from VMs to physical servers is created
  In static consolidation, the time window is assumed to be the entire trace
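The division of a trace into consolidation windows can be sketched as follows. This is a minimal illustration, not ReCon's implementation: the function name, the sample values, and the choice to keep a shorter final window are all assumptions.

```python
from typing import List, Sequence


def split_into_windows(trace: Sequence[float], window_size: int) -> List[List[float]]:
    """Divide a trace into consecutive, non-overlapping windows.

    Assumption: a trailing window shorter than window_size is kept as is.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    return [list(trace[start:start + window_size])
            for start in range(0, len(trace), window_size)]


# Hypothetical CPU utilization samples (percent) for one VM.
cpu_trace = [10, 12, 55, 60, 58, 15, 11, 9]

# Dynamic consolidation: one mapping is computed per window.
dynamic_windows = split_into_windows(cpu_trace, window_size=3)

# Static consolidation: a single window spanning the entire trace.
static_window = split_into_windows(cpu_trace, window_size=len(cpu_trace))
```

With this view, static consolidation is just the special case in which the window size equals the trace length.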
Mapping Logic – Basic Notations
Let VM = {VM1, VM2, …, VMN}
Each VMi observes and stores K variables O = {O(1,i), O(2,i), …, O(K,i)}
Each VMi is monitored for T time steps; the time series generated by the jth sensor of VMi is $O_{(j,i)} = \{o_{(j,i)}^{1}, o_{(j,i)}^{2}, \ldots, o_{(j,i)}^{T}\}$
Informal Problem Statement
Given N application VMs, find n physical machines, where n < N, such that each VM is assigned to one physical machine while satisfying domain-specified constraints
Mapping Logic - Constraints
Virtual machine constraints: each VMi is associated with a list of Mi constraints
  $VC_i = \{VC_{(1,i)}, VC_{(2,i)}, \ldots, VC_{(M_i,i)}\}$
Physical server constraints: each physical server Pi is associated with a list of Li constraints
  $PC_i = \{PC_{(1,i)}, PC_{(2,i)}, \ldots, PC_{(L_i,i)}\}$
The jth constraint of VMi that should hold in the interval [t1, t2] is said to be satisfied if
  $eval(C_{(j,i)}^{[t_1,t_2]}, P) = 1$
  where P is the properties of the environment/architecture in the interval [t1, t2]
Mapping Logic – Optimization Problem Formulation
Assume that initially each VM (application) is hosted by one physical machine, and each physical machine hosts exactly one VM
  |VM| = |P| = N
  n is not known a priori, and N is the upper bound of n
A: an N×N matrix such that Ai,j = 1 specifies that VMi is assigned to Pj
  A is initially a diagonal matrix
Y: a |P|-bit vector such that Yi = 1 implies that Pi is currently running some VMs
  Y is initially a vector of all ones
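The initial state of A and Y, and the effect of one consolidation step on them, can be illustrated with a small sketch. The helper name `active_servers`, the value N = 4, and the zero-based indexing are assumptions made for the example.

```python
N = 4  # hypothetical number of VMs and physical servers

# Initially VM_i runs on P_i, so A has ones on its diagonal and Y is all ones.
A = [[1 if i == j else 0 for j in range(N)] for i in range(N)]
Y = [1] * N


def active_servers(assignment):
    """Derive Y from A: Y[j] = 1 if at least one VM is assigned to P_j."""
    n = len(assignment)
    return [1 if any(assignment[i][j] for i in range(n)) else 0
            for j in range(n)]


# Consolidate the second VM onto the first server (indices 1 -> 0).
A[1][1] = 0
A[1][0] = 1
Y = active_servers(A)  # the second server no longer hosts any VM
```

Each row of A still sums to one after the move, which reflects the requirement that every VM is assigned to exactly one physical machine.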
Mapping Logic – Optimization Problem Formulation
Costi: the fixed cost incurred if Pi is active
MCosti,j: the cost of migrating VMi to Pj
F: a function that calculates the dynamic cost of a physical server when one or more VMs are assigned to it
  Currently this function uses the CPU utilization to compute the dynamic cost
The benefit function B attained by the consolidation is built from the following terms
  The cost of the initial setting
  The fixed cost of running the physical servers
  The cost of migrating VMs to Pj
Mapping Logic – Optimization Problem Formulation
The first term of B is fixed and does not change while maximizing the function; the objective can therefore be transformed into minimizing the remaining cost terms
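As a hedged sketch, one form of the benefit function that is consistent with the term labels (cost of the initial setting, fixed cost of running physical servers, cost of VMs migrating to Pj) and with the definitions of A, Y, Costi, MCosti,j and F is given below. The symbol $C_{\mathrm{init}}$, the per-server notation $F_j(A)$ for the dynamic cost, and the exact arrangement of terms are assumptions, not the authors' formula.

```latex
\begin{align*}
B &= C_{\mathrm{init}} - \Bigl( \sum_{j=1}^{N} Y_j \,\mathit{Cost}_j
      + \sum_{i=1}^{N} \sum_{j=1}^{N} A_{i,j} \,\mathit{MCost}_{i,j}
      + \sum_{j=1}^{N} F_j(A) \Bigr) \\
\min_{A,\,Y} \quad & \sum_{j=1}^{N} Y_j \,\mathit{Cost}_j
      + \sum_{i=1}^{N} \sum_{j=1}^{N} A_{i,j} \,\mathit{MCost}_{i,j}
      + \sum_{j=1}^{N} F_j(A) \\
\text{s.t.} \quad & \sum_{j=1}^{N} A_{i,j} = 1 \ \ \forall i, \qquad
      A_{i,j} \le Y_j \ \ \forall i, j, \qquad
      A_{i,j},\, Y_j \in \{0, 1\}
\end{align*}
```

Because $C_{\mathrm{init}}$ is a constant, maximizing B and minimizing the bracketed sum select the same A and Y. The two constraints restate the definitions: each VM is assigned to exactly one physical machine, and a machine that hosts any VM must be active.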
Mapping Logic – Dynamic/Static Consolidation
Dynamic Consolidation
  Assume the consolidation window size is Ts
  First, minimize the objective function over the time interval [1, Ts] and generate the assignment matrix A[1,Ts]
  When consolidating for the time interval [Ts+1, 2*Ts], use the new set of constraints and A[1,Ts] as the starting point for the optimization
Static Consolidation
  Set all migration costs to zero
  Set the consolidation window to cover the whole time period
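The window-by-window procedure can be sketched with a toy solver. This is a simplified illustration under stated assumptions, not ReCon's method: ReCon uses AMPL and CPLEX, whereas this sketch brute-forces all assignments (feasible only for a handful of VMs), uses a single capacity constraint on the sum of per-window peak CPU demands, charges a uniform migration cost, and omits the dynamic cost F. All names and numbers are hypothetical.

```python
from itertools import product


def consolidate_window(peaks, prev, capacity=100.0, fixed_cost=10.0, mig_cost=1.0):
    """Brute-force the cheapest VM -> server assignment for one window.

    peaks[i]: peak CPU demand of VM i within the window
    prev[i]:  server hosting VM i when the window starts
    """
    n = len(peaks)
    best, best_cost = None, float("inf")
    for assign in product(range(n), repeat=n):
        load = [0.0] * n
        for vm, server in enumerate(assign):
            load[server] += peaks[vm]
        if max(load) > capacity:  # capacity constraint violated
            continue
        migrations = sum(a != p for a, p in zip(assign, prev))
        cost = fixed_cost * len(set(assign)) + mig_cost * migrations
        if cost < best_cost:
            best, best_cost = assign, cost
    if best is None:
        raise ValueError("no feasible assignment for this window")
    return list(best), best_cost


def consolidation_plan(traces, window_size, static=False, **costs):
    """Run the per-window optimization, carrying each result into the next window."""
    n, horizon = len(traces), len(traces[0])
    if static:
        window_size = horizon      # one window covering the whole trace
        costs["mig_cost"] = 0.0    # migration costs set to zero
    prev = list(range(n))          # initially VM i runs on server i
    plan = []
    for start in range(0, horizon, window_size):
        peaks = [max(trace[start:start + window_size]) for trace in traces]
        prev, cost = consolidate_window(peaks, prev, **costs)
        plan.append((start, prev, cost))
    return plan


# Hypothetical CPU traces (percent of one server) for three VMs.
traces = [[30, 40, 80, 90], [20, 30, 10, 15], [40, 35, 60, 70]]
dynamic = consolidation_plan(traces, window_size=2)
static = consolidation_plan(traces, window_size=2, static=True)
```

In the dynamic run the assignment found for the first window becomes the starting point of the second, so a VM is moved again only when the saving outweighs the migration cost.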
Experimental Validation – Data Set
The trace data was collected using the Model Driven Monitoring System (MDMS) [8]
  186 physical servers
  35 clusters, with each cluster supporting one application
  Approximately 15 parameters are monitored for every server
    But in this paper, the authors use CPU utilization data only
  Parameters are sampled at 5-minute intervals
The optimization problem solver: AMPL and CPLEX
[8] B. Krishnamurthy, et al., “Data tagging architecture for system monitoring in dynamic environments,” in IEEE NOMS, 2008
Experimental Validation – Evaluation Metrics
Time efficiency
  Measures how fast the tool works given the size of the data
  The efficiency ε is defined as TR/TS
    TS: the consolidation window size
    TR: the time taken by ReCon to generate the consolidation plan
    ε ~ 0 for a highly efficient tool
    ε ≥ 1 renders the tool useless for all practical purposes
Effectiveness
  The fraction of physical machines that can be turned off by packing N VMs onto ni physical machines while satisfying all constraints in the corresponding consolidation window i
  The effectiveness S is given by (N - ni)/N
    S ~ 1 implies that most of the physical machines can be turned off
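Both metrics are simple ratios; a short sketch with made-up numbers shows how they would be computed. The function names and the sample values are assumptions, not results from the paper.

```python
def time_efficiency(t_recon, t_window):
    """epsilon = TR / TS, with both times expressed in the same unit."""
    if t_window <= 0:
        raise ValueError("t_window must be positive")
    return t_recon / t_window


def effectiveness(n_vms, n_active):
    """S = (N - n_i) / N, the fraction of machines that can be turned off."""
    if n_vms <= 0:
        raise ValueError("n_vms must be positive")
    return (n_vms - n_active) / n_vms


# Hypothetical: a 6-minute solve for a 300-minute window,
# and 100 VMs packed onto 40 physical machines.
eps = time_efficiency(t_recon=6, t_window=300)
s = effectiveness(n_vms=100, n_active=40)
```

Here ε is well below 1, so the plan is ready long before the window it applies to has elapsed, and S indicates that more than half of the machines could be switched off.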
Experimental Validation – Change in recommendations VS migration cost
Recommendation: the merging of two VMs onto the same physical server
Migration cost:
  Inter-cluster migration cost is normalized to 100
  Intra-cluster migration cost is varied as a percentage of the inter-cluster migration cost
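The migration cost setup in this experiment can be sketched as a matrix builder. The function name, the cluster labels, and the zero cost for a VM that stays on its initial server are assumptions made for illustration.

```python
def migration_cost_matrix(cluster_of, intra_pct, inter_cost=100.0):
    """Build MCost, where MCost[i][j] is the cost of migrating VM i to server j.

    cluster_of[i] is the cluster of VM i and of its initial server i.
    intra_pct is the intra-cluster cost as a percentage of inter_cost.
    Assumption: staying on the initial server (i == j) costs nothing.
    """
    n = len(cluster_of)
    intra_cost = inter_cost * intra_pct / 100.0
    mcost = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                same_cluster = cluster_of[i] == cluster_of[j]
                mcost[i][j] = intra_cost if same_cluster else inter_cost
    return mcost


# Hypothetical layout: two servers in a "web" cluster and one in a "db" cluster,
# with intra-cluster migration priced at 20% of inter-cluster migration.
mcost = migration_cost_matrix(["web", "web", "db"], intra_pct=20)
```

Sweeping `intra_pct` while keeping the inter-cluster cost fixed at 100 mirrors the way the experiment varies the intra-cluster cost.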
Experimental Validation – Efficiency Results
Figure: time taken, plotted against the number of VMs (1~175) and the consolidation window (10~240 min)
Experimental Validation – Change in recommendations over time period
To study how the recommendations vary with the consolidation window size
Results:
  As the time window is increased, the number of recommendations decreases
  More samples make it more difficult to satisfy the constraints
Time Window Size
  The time window should not be too big, so that the dynamic behavior is captured
  The time window should not be too small, so that the optimization engine is not run repeatedly without any gain
  Recommended value: T = 300 minutes
Figure: X axis: time window; Y axis: cost saving
Experimental Validation – Change in Cluster Over Time
To study the effect of recommendations on individual clusters
Based on the mean and standard deviation of the saving, the clusters can be categorized into four groups
  Low Variation – Low Saving
  Low Variation – High Saving
  High Variation – Low Saving
  High Variation – High Saving
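The four-way grouping can be sketched as a threshold test on the mean and standard deviation of a cluster's per-window savings. The slides do not state the thresholds, so the cut-off values, the function name, and the sample data below are assumptions.

```python
from statistics import mean, pstdev


def categorize_cluster(savings, saving_threshold, variation_threshold):
    """Label a cluster from the mean and standard deviation of its savings."""
    variation = "High" if pstdev(savings) > variation_threshold else "Low"
    saving = "High" if mean(savings) > saving_threshold else "Low"
    return f"{variation} Variation - {saving} Saving"


# Hypothetical per-window cost savings (as fractions) for two clusters.
steady_cluster = [0.60, 0.62, 0.58, 0.60]
bursty_cluster = [0.0, 0.4, 0.0, 0.2]

steady_label = categorize_cluster(steady_cluster, saving_threshold=0.3,
                                  variation_threshold=0.1)
bursty_label = categorize_cluster(bursty_cluster, saving_threshold=0.3,
                                  variation_threshold=0.1)
```

A steady, high-saving cluster is a natural candidate for static consolidation, while a high-variation cluster benefits only if the plan is recomputed per window.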
Figure: four panels (Low Variation, Low Saving; Low Variation, High Saving; High Variation, Low Saving; High Variation, High Saving); X axis: time window; Y axis: cost saving
Related Works
There is significant work in the capacity planning and runtime resource management domain that does not bring in the aspect of virtualization
VMware’s Distributed Resource Scheduler (DRS)
Bobroff et al. describe algorithms for reconsolidation in a dynamic setting while managing SLA violations
In static consolidation, several bin-packing heuristics have been used to map VMs to physical servers
Discussion and Future Work
Handling multiple attributes
  The implementation cannot exploit the relationships or correlations among attributes
    E.g., the time-lag relation between high CPU utilization and high I/O utilization
Runtime reconfiguration tool
  To convert the planning tool into a real-time decision module, a highly efficient implementation and forecasting logic are needed
  Machine learning and time-series forecasting techniques are the candidates for the authors’ next step
Extending what-if analysis
Conclusion
A VM consolidation planning tool called ReCon is presented
  It analyzes the historical resource consumption data
The consolidation problem is formulated in an optimization framework
Time-varying constraints are easily incorporated to reflect temporal changes in workload characteristics
Different migration cost functions and virtualization models can be plugged into the tool
Comment
The problem is well-formulated
But the mentioned cost functions are mysterious