vCacheShare: Automated Server Flash Cache Space Management in
a Virtualization Environment
Fei Meng, Xiaosong Ma, Li Zhou∗
Sandeep Uttamchandani, Deng Liu∗
∗With VMware during this work
Background
• Virtualization
• Server Flash Cache (SFC)
– I/O acceleration
– Increased VM-to-server consolidation ratios
– Reduced disk I/O load
– Reduced inter-server I/O contention
[Diagram: apps inside guest OSes running on a hypervisor, with a server flash cache attached to the hypervisor]
USENIX ATC'14 2
Problem Setting and Motivations
• SFC management
– Static space partitioning
– Globally shared
• Issues
– Different access behavior across VMs
– Temporal changes in locality
[Diagram: VM1–VM5 over a hypervisor-managed SFC, shown first statically partitioned into SFC1–SFC5, then globally shared; vCacheShare instead partitions the SFC dynamically per VM]
USENIX ATC'14 3
vCacheShare Design Goal
Automated and continuously optimized SFC space allocation, considering
– VM priority
– Locality of workloads
– IO access characteristics
– Backend device service time
– Configuration events (e.g., migration)
USENIX ATC'14 4
vCacheShare (vCS) Overview
Dynamic cache partitioning based on
– Continuous per-VM access history collection and analysis
– Comprehensive cache utility model
• "How useful is more cache space to this workload?"
• Combining hit ratio and reuse intensity in cache utility calculation
– Long-term vs. short-term observation
– Different turnaround requirements for access history analysis
USENIX ATC'14 5
vCacheShare Design
[Architecture diagram: within the ESXi hypervisor (VMkernel), a cache module intercepts each VM's I/O to its vmdk, backed by an SSD and SAN/NAS storage, and performs cache management (bookkeeping, hashing, LRU list management). It feeds I/O logs into trace buffers read by the user-world vCacheShare components: a Monitor, a Cache utility analyzer producing reuse distance (RD) arrays and cache utilities (CUs), an Optimizer (linear/non-linear solver) that emits a recommended cache allocation plan, and an Actuator that issues resize actions back to the cache module. A cluster management node and database supply administration events such as vMotion.]
USENIX ATC'14 6
vCS Component 1: Cache Module
• Block-level cache targeting hypervisor
– Dynamically resizable
– VM-aware
• LRU replacement
• Write-around read cache
[Cache policy illustration: a VM's reads are served from the SFC on a hit and fetched from the disk array, then cached, on a miss; writes bypass the SFC and go directly to the disk array]
USENIX ATC'14 7
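The write-around read cache with LRU replacement described above can be sketched as follows. This is a minimal illustration, not the paper's kernel module: the class and method names are invented here, and a Python dict stands in for the disk array. It also shows the dynamic resizing hook that vCacheShare's actuator relies on.

```python
from collections import OrderedDict

class WriteAroundReadCache:
    """Sketch of a write-around read cache: reads are cached with
    LRU replacement; writes bypass the cache and go to the backend
    (invalidating any stale cached copy)."""

    def __init__(self, capacity, backend):
        self.capacity = capacity        # max number of cached blocks
        self.backend = backend          # dict-like stand-in for the disk array
        self.cache = OrderedDict()      # LBA -> data, kept in LRU order
        self.hits = self.misses = 0

    def read(self, lba):
        if lba in self.cache:           # hit: serve from SFC, refresh LRU position
            self.hits += 1
            self.cache.move_to_end(lba)
            return self.cache[lba]
        self.misses += 1                # miss: fetch from disk, insert into cache
        data = self.backend[lba]
        self.cache[lba] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict least-recently-used block
        return data

    def write(self, lba, data):
        self.backend[lba] = data        # write-around: go straight to disk
        self.cache.pop(lba, None)       # drop any stale cached copy

    def resize(self, new_capacity):
        """Dynamic resizing, as the actuator's resize actions would request."""
        self.capacity = new_capacity
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)
```

Because writes never populate the cache, write-heavy workloads consume no SFC space, which matches the read-cache design choice on the slide.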
vCS Component 2: Monitor
• Location: in the I/O path, between VMs and SFC
• Intercepts I/O accesses for trace collection
– In-memory plus on-SSD circular buffer
– Copies segments (sliding window) to the Analyzer's buffers
• Overhead tolerable for the workloads tested
– Sampling possible if necessary
Cache stat entry:
struct cache_io_stats {
    uint16 VM_UUID;
    uint16 VMDK_UUID;
    uint16 timestamp;
    uint8  isRead;
    uint32 LBA;
    uint8  len;
    uint16 latency;
    uint8  isCached;
};
USENIX ATC'14 8
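The trace entry above and the circular-buffer behavior can be mirrored in a short sketch: a packed binary format for the entry, and a bounded deque that overwrites the oldest entries once full, with a sliding window copied out for the Analyzer. The packing format, class name, and resulting 15-byte entry size are assumptions of this sketch; the slide only gives the field widths.

```python
import struct
from collections import deque

# Packed layout mirroring cache_io_stats (field widths from the slide):
# uint16 VM_UUID, uint16 VMDK_UUID, uint16 timestamp, uint8 isRead,
# uint32 LBA, uint8 len, uint16 latency, uint8 isCached.
ENTRY_FMT = "<HHHBIBHB"                   # little-endian, no implicit padding
ENTRY_SIZE = struct.calcsize(ENTRY_FMT)   # 15 bytes per packed trace entry

class TraceMonitor:
    """Sketch of the in-memory circular trace buffer: once capacity
    is reached the oldest entries are overwritten, and a window of
    recent entries can be copied out for the Analyzer."""

    def __init__(self, max_entries):
        self.buf = deque(maxlen=max_entries)   # circular: drops oldest on append

    def record(self, vm, vmdk, ts, is_read, lba, length, lat, cached):
        self.buf.append(struct.pack(ENTRY_FMT, vm, vmdk, ts,
                                    is_read, lba, length, lat, cached))

    def window(self, n):
        """Copy and decode the n most recent entries for analysis."""
        return [struct.unpack(ENTRY_FMT, e) for e in list(self.buf)[-n:]]
```

The compact fixed-width entry is what keeps the stated monitoring overhead tolerable: each intercepted I/O adds only a few bytes to the buffer.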
vCS Component 3: Analyzer
• Processes trace segments from Monitor
• Output
– Reuse distance CDF, used in calculating Hit Ratio (HR)
• Measures long-term locality behavior
• Commonly used in cache partitioning [Mattson1970][Ding2003]
• But incomplete: does not capture access speed
– Reuse Intensity (RI)
• Measures how fast data are re-visited
• Estimated as RI = S_total / (t_w × S_unique): total accessed volume over footprint within the observation window t_w
• Coarse and more frequent measurement
[Illustration: VM1 and VM2 issue the identical access sequence, but VM1 takes 2 hours while VM2 takes 10 seconds; their reuse distance CDFs match, yet their reuse intensities differ greatly at optimization time]
USENIX ATC'14 9
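A minimal sketch of the two analyzer outputs, assuming a per-block access trace: reuse distances computed by the direct O(n²) definition (production analyzers use the tree-based O(n log n) algorithms of [Ding2003]), the hit ratio HR(c) they imply for an LRU cache of c blocks, and the reuse intensity RI = S_total / (t_w × S_unique). Function names and the block-size parameter are illustrative.

```python
def reuse_distances(trace):
    """Reuse distance of each access: the number of *distinct* blocks
    touched since the previous access to the same block; first
    accesses get infinite distance."""
    last_seen, dists = {}, []
    for i, blk in enumerate(trace):
        if blk in last_seen:
            dists.append(len(set(trace[last_seen[blk] + 1:i])))
        else:
            dists.append(float("inf"))
        last_seen[blk] = i
    return dists

def hit_ratio(dists, cache_blocks):
    """HR(c) from the reuse distance distribution: an access hits in
    an LRU cache of c blocks iff its reuse distance is < c."""
    return sum(1 for d in dists if d < cache_blocks) / len(dists)

def reuse_intensity(trace, block_size, window_seconds):
    """RI = S_total / (t_w * S_unique): total accessed volume over
    footprint, normalized by the observation window length."""
    s_total = len(trace) * block_size
    s_unique = len(set(trace)) * block_size
    return s_total / (window_seconds * s_unique)
```

Note how the illustration on the slide falls out of this code: the same trace replayed over 2 hours versus 10 seconds yields identical `reuse_distances` but a reuse intensity that is 720× larger for the faster VM.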
vCS Component 4: Optimizer
• For each VM i, cache utility at cache size c:
CU_i(c) = l_i × RR_i × (HR_i(c) + α × RI_i)
where l_i is the backend device latency, RR_i the read ratio, HR_i(c) the estimated hit ratio, RI_i the reuse intensity, and α a tuning knob
• Optimization
– Objective function: max Σ_{i=1..n} priority_i × CU_i(c_i)
– Constraints: c_1 + c_2 + … + c_n = C and c_min ≤ c_i ≤ c_max
– Solved with simulated annealing, yielding the adjusted cache sizes <c_1, c_2, c_3, …>
USENIX ATC'14 10
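The constrained search can be sketched with a simple simulated-annealing loop over integer allocations. The move proposal (shift one cache unit between two VMs, which preserves Σc_i = C and the per-VM bounds by construction), the geometric cooling schedule, and all parameters here are illustrative assumptions, not the paper's solver; cache utilities are passed in as per-VM callables CU_i(c).

```python
import math, random

def total_utility(alloc, utilities, priorities):
    """Objective: sum of priority_i * CU_i(c_i) over all VMs."""
    return sum(p * cu(c) for c, cu, p in zip(alloc, utilities, priorities))

def optimize(utilities, priorities, C, c_min, c_max,
             steps=5000, t0=1.0, seed=0):
    rng = random.Random(seed)
    n = len(utilities)
    alloc = [C // n] * n                 # start from an even split...
    alloc[0] += C - sum(alloc)           # ...that exactly sums to C
    best, cur = list(alloc), total_utility(alloc, utilities, priorities)
    best_val, T = cur, t0
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)   # propose moving one unit j -> i
        if alloc[i] + 1 > c_max or alloc[j] - 1 < c_min:
            continue                     # move would violate the bounds
        alloc[i] += 1; alloc[j] -= 1
        val = total_utility(alloc, utilities, priorities)
        # Accept improvements; accept worse moves with probability exp(delta/T).
        if val >= cur or rng.random() < math.exp((val - cur) / T):
            cur = val
            if val > best_val:
                best_val, best = val, list(alloc)
        else:
            alloc[i] -= 1; alloc[j] += 1  # reject: undo the move
        T *= 0.999                        # geometric cooling
    return best, best_val
```

For example, with one VM whose utility saturates at 30 units and another whose utility grows slowly but without bound, the search shifts space toward the saturating VM only up to its knee, then gives the rest to the other VM.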
vCacheShare ESX Event Handling
VM Events AcNons by vCacheShare Framework
VM Power-‐Off, Suspend, vMoNon Source
Trigger opNmizaNon to re-‐allocate; free IO trace buffers for all associated VMDKs
VM Bootstrap, Power-‐On, Resume
Make iniNal allocaNon based on reserved cache space and priority seings; start trace collecNon
vMoNon DesNnaNon
Trigger opNmizaNon to re-‐allocate based on IO characterisNcs migrated with the VMDKs involved in vMoNon
Storage vMoNon (runNme backend de-‐ vice change)
Suspend analysis/opNmizaNon Nll compleNon; evict device service latency history; trigger re-‐ opNmizaNon upon vMoNon compleNon
VM Fast Suspend, vMoNon Stun
Reserve cached data; lock cache space allocaNon to involved VMs by subtracNng allocated size from total available cache size
USENIX ATC'14 11
Evaluation Setup
• Prototype
– VMware ESXi 5.0 with ~2800 LOC of C++ user space code and ~2500 LOC of kernel C code
• Hardware
– Two AMD Opteron 8-core CPUs, 16GiB memory, Intel 400GiB SSD
– EMC Clariion array
• VM configuration
– 1 vCPU, 1GiB memory, and 8/80GiB VMDK
– Win2k8, Ubuntu 11.04
USENIX ATC'14 12
Micro-Benchmark: Iometer
[Figure: cache allocation (%) over time (150–1500 s) for VM1 and VM2 under global LRU (GLRU) vs. vCS; annotations mark where VM1's footprint shrunk and where VM2's replay begins]
[Figure: IO latency (ms, 1–6) over the same interval for VM1 and VM2 under GLRU vs. vCS]
USENIX ATC'14 13
Macro-benchmark: VDI
100 VDI VMs going through boot storm + 1 Iometer VM
[Figure: VDI latency (ms, 0–500) over time (0–350 s) for GLRU, vCS, Static, and No cache; boot storm completion: Static 72s, vCS 114s, GLRU 242s, No cache 298s]
[Figure: Iometer latency (ms, 0–5) over time (20–160 s) for vCS vs. Static]
USENIX ATC'14 14
Closing Remarks
• Dynamic SFC partitioning is affordable and worthwhile
• Long sampling windows are desirable for accurate cache hit ratio estimation
– But cause slow response to locality spikes
– Can be compensated by simultaneous short-term locality monitoring
• Future work: relationship between sampling window size and cache hit ratio estimation accuracy
USENIX ATC'14 15
Q&A
Fei Meng fmeng@ncsu.edu
North Carolina State University
USENIX ATC'14 16
References
• [Mattson1970] R. L. Mattson, J. Gecsei, D. R. Slutz, and I. L. Traiger. Evaluation techniques for storage hierarchies. IBM Syst. J., 9(2):78–117, June 1970.
• [Ding2003] C. Ding and Y. Zhong. Predicting Whole-program Locality through Reuse Distance Analysis. In ACM SIGPLAN Notices 2003.
USENIX ATC'14 17