Differentiated Storage Services. Tian Luo, The Ohio State University. Michael Mesnier, Jason Akers, Feng Chen, Intel Corporation. 23rd ACM Symposium on Operating Systems Principles (SOSP), October 23-26, 2011, Cascais, Portugal.
1
Differentiated Storage Services
Michael Mesnier, Jason Akers, Feng Chen (Intel Corporation)
Tian Luo (The Ohio State University)
23rd ACM Symposium on Operating Systems Principles (SOSP)
October 23-26, 2011, Cascais, Portugal
2
An analogy: moving & shipping
Why should computer storage be any different?
Technology overview
Classification, policy assignment, policy enforcement
3
Differentiated Storage Services
(offline)
Classifier     QoS policy
Metadata       Low latency
Boot files     Low latency
Small files    High throughput
Media files    High bandwidth
…              …
[Figure: a computer system (applications or DB, file system, operating system), with an I/O classification label at each layer, sends classified I/O to a storage system (management firmware, storage controller) that applies QoS policies and QoS mechanisms across storage pools A, B, and C. Legend: marked components = current & future research.]
Classify each I/O in-band
4
The SCSI CDB
5 bits = 32 classes
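As a minimal sketch of what carrying a 5-bit class in a CDB could look like, the code below packs a 10-byte READ(10)/WRITE(10) command. The slide only says "5 bits"; placing the class in the low five bits of byte 6 follows the standard group number field of these commands and is an assumption about the prototype, as are the helper names `build_cdb10` and `cdb_class`.

```python
import struct

READ_10 = 0x28     # standard SCSI opcodes
WRITE_10 = 0x2A
GROUP_MASK = 0x1F  # low 5 bits of byte 6 (assumed location of the class)


def build_cdb10(opcode, lba, blocks, group):
    """Pack a 10-byte CDB with the I/O class in the group number field."""
    if not 0 <= group <= GROUP_MASK:
        raise ValueError("class must fit in 5 bits (0-31)")
    # opcode, flags, 32-bit LBA, group number, 16-bit transfer length, control
    return struct.pack(">BBIBHB", opcode, 0, lba, group, blocks, 0)


def cdb_class(cdb):
    """Recover the I/O class on the storage-system side."""
    return cdb[6] & GROUP_MASK


cdb = build_cdb10(WRITE_10, lba=231495, blocks=8, group=9)
print(cdb.hex(), "class", cdb_class(cdb))
```

Because the class travels inside the command itself, the storage system needs no side channel to learn it.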
5
Motivation: disk caching with SSDs
Universal challenges in the industry
– Keeping the right data cached
– Avoiding thrash under cache pressure
Conventional approaches
– Cache bypass for large/sequential requests
– Evict cold data (LRU commonly used)
How I/O classification can help
– Identify cacheable I/O classes
– Assign relative caching priorities
6
Filesystem prototypes (Ext3 & NTFS)
Classify each I/O in-band
Classifier       Cache priority
Metadata         0
Journal          0
Directories      0
Files <= 4KB     1
Files <= 16KB    2
Files <= 64KB    3
…                …
Files > GB       Lowest
[Figure: the same system diagram (applications or DB, file system, operating system, with I/O classification labels; storage system with management firmware, storage controller, QoS policies and QoS mechanisms), here with FS classification, FS policy assignment, and FS policy enforcement over a disk and an SSD. Legend: marked components = current & future research.]
7
Database prototype (PostgreSQL)
Classify each I/O in-band
Classifier                 Cache priority
System tables              0
Temp. tables (on write)    1
Random tables              2
Temp. tables (on read)     3
Sequential tables          Bypass
Index files                Bypass
[Figure: the same system diagram (applications or DB, file system, operating system, with I/O classification labels; storage system with management firmware, storage controller, QoS policies and QoS mechanisms), here with DB classification, DB policy assignment, and DB policy enforcement over a disk and an SSD. Legend: marked components = current & future research.]
8
Selective cache algorithms
Selective allocation
– Always allocate high-priority classes
– E.g., FS metadata and DB system tables are always allocated
– Conditionally allocate low-priority classes
– Depends on cache pressure, cache contents, etc.
– High/low cutoff is a tunable parameter
Selective eviction
– Evict in priority order (lowest priority first)
– E.g., temporary DB tables are evicted before system tables
– Trivially implemented by managing one LRU list per class
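The two policies above can be sketched in a few lines. This is an illustration only, not the prototype's code: the class name `SelectiveCache`, the `cutoff` and `pressure` parameters, and the definition of cache pressure as a fill fraction are all assumptions. Priority 0 is treated as the highest priority, as in the tables in this deck.

```python
from collections import OrderedDict


class SelectiveCache:
    """Toy LRU-S: selective allocation plus priority-ordered eviction."""

    def __init__(self, capacity, cutoff, pressure=0.9):
        if capacity < 1:
            raise ValueError("capacity must be at least 1 block")
        self.capacity = capacity  # cache size in blocks
        self.cutoff = cutoff      # priorities <= cutoff count as high (assumed)
        self.pressure = pressure  # fill fraction treated as "under pressure"
        self.lists = {}           # priority -> OrderedDict of LBAs in LRU order
        self.where = {}           # lba -> priority

    def __len__(self):
        return len(self.where)

    def lookup(self, lba):
        prio = self.where.get(lba)
        if prio is None:
            return False
        self.lists[prio].move_to_end(lba)  # refresh recency within its class
        return True

    def _evict_one(self):
        # Lowest priority first (largest number), then least recently used.
        victim_prio = max(p for p, lru in self.lists.items() if lru)
        lba, _ = self.lists[victim_prio].popitem(last=False)
        del self.where[lba]
        return lba

    def insert(self, lba, prio):
        if self.lookup(lba):
            return True
        under_pressure = len(self) >= self.pressure * self.capacity
        if prio > self.cutoff and under_pressure:
            return False  # selective allocation: skip low-priority data
        while len(self) >= self.capacity:
            self._evict_one()
        self.lists.setdefault(prio, OrderedDict())[lba] = True
        self.where[lba] = prio
        return True
```

Keeping one LRU list per class means eviction never has to scan: the victim is always the head of the lowest-priority non-empty list.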
9
Technology development
10
Ext3 prototype
OS changes (block layer)
– Add classifier to I/O requests
– Only coalesce like-class requests
– Copy classifier into SCSI CDB
Ext3 changes
– 18 classes identified
– Optimized for a file server
Small files & metadata
A small kernel patch
A one-time change to the FS
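The "only coalesce like-class requests" rule can be illustrated with a toy merge pass. This is a sketch under assumptions, not the kernel patch: the `Request` type and `coalesce` function are hypothetical, and real block-layer merging also considers direction, size limits, and queue state.

```python
from dataclasses import dataclass


@dataclass
class Request:
    lba: int     # starting block address
    blocks: int  # length in blocks
    cls: int     # I/O class carried with the request


def coalesce(requests):
    """Merge contiguous requests, but only when their classes match."""
    merged = []
    for req in sorted(requests, key=lambda r: r.lba):
        last = merged[-1] if merged else None
        contiguous = last is not None and last.lba + last.blocks == req.lba
        if contiguous and last.cls == req.cls:
            last.blocks += req.blocks  # same class: extend the previous request
        else:
            merged.append(Request(req.lba, req.blocks, req.cls))
    return merged
```

Merging across classes would force a single class onto data with different policies, which is why contiguity alone is not enough here.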
Ext3 class        Group number    Cache priority
Unclassified      0               12
Superblock        1               0
Group desc.       2               0
Bitmap            3               0
Inode             4               0
Indirect block    5               0
Directories       6               0
Journal           7               0
File <= 4KB       8               1
File <= 16KB      9               2
File <= 64KB      10              3
…                 …               …
File > 1GB        18              11
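The mapping from file size to class and cache priority could be written as below. The table elides the rows between 64KB and 1GB; this sketch assumes they continue the same 4x progression (256KB, 1MB, and so on up to 1GB), which happens to fill classes 8 through 17 exactly but is not stated on the slide. The function names are hypothetical.

```python
KB = 1024

# Assumption: thresholds grow 4x per class, 4KB ... 1GB (10 limits, classes 8-17).
SIZE_LIMITS = [4 * KB * 4**i for i in range(10)]

FIRST_FILE_CLASS = 8
LAST_FILE_CLASS = 18  # "File > 1GB"


def file_class(size_bytes):
    """Return the Ext3 class (group number) for file data of a given size."""
    for i, limit in enumerate(SIZE_LIMITS):
        if size_bytes <= limit:
            return FIRST_FILE_CLASS + i
    return LAST_FILE_CLASS


def cache_priority(cls):
    """Return the cache priority for a class, following the table."""
    if cls == 0:
        return 12            # unclassified
    if 1 <= cls <= 7:
        return 0             # superblock ... journal
    if FIRST_FILE_CLASS <= cls <= LAST_FILE_CLASS:
        return cls - 7       # class 8 -> 1, ..., class 18 -> 11
    raise ValueError("not an Ext3 class")


print(file_class(13), cache_priority(file_class(13)))
```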
11
Ext3 classification illustrated
echo 'Hello, world!' >> foo; sync
– READ_10(lba 231495 len 8 grp 9) <=4KB
– WRITE_10(lba 231495 len 8 grp 9) <=4KB
– WRITE_10(lba 16519223 len 8 grp 8) Journal
– WRITE_10(lba 16519231 len 8 grp 8) Journal
– WRITE_10(lba 16519239 len 8 grp 8) Journal
– WRITE_10(lba 16519247 len 8 grp 8) Journal
– WRITE_10(lba 8279 len 8 grp 5) Inode
7 I/Os (28KB) to write 13 bytes
– Metadata accounts for most of the overhead
I/O classification shows read-modify-write and metadata updates
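The 28KB figure can be checked by tallying the trace. The sketch below assumes 512-byte blocks, which the slide does not state but which makes 7 I/Os of 8 blocks come to 28KB; `summarize` is a hypothetical helper, and the group numbers are taken from the trace as printed.

```python
import re
from collections import Counter

TRACE = """
READ_10(lba 231495 len 8 grp 9)
WRITE_10(lba 231495 len 8 grp 9)
WRITE_10(lba 16519223 len 8 grp 8)
WRITE_10(lba 16519231 len 8 grp 8)
WRITE_10(lba 16519239 len 8 grp 8)
WRITE_10(lba 16519247 len 8 grp 8)
WRITE_10(lba 8279 len 8 grp 5)
"""

BLOCK_SIZE = 512  # bytes per block (assumption)
PATTERN = re.compile(r"(READ_10|WRITE_10)\(lba (\d+) len (\d+) grp (\d+)\)")


def summarize(trace):
    """Return (number of I/Os, total bytes, bytes per group number)."""
    per_group = Counter()
    count = 0
    for _op, _lba, length, group in PATTERN.findall(trace):
        per_group[int(group)] += int(length) * BLOCK_SIZE
        count += 1
    return count, sum(per_group.values()), dict(per_group)


count, total, per_group = summarize(TRACE)
print(count, "I/Os,", total // 1024, "KB")
for group, nbytes in sorted(per_group.items()):
    print("grp", group, nbytes // 1024, "KB")
```

In this tally the journal and inode writes account for 20KB of the 28KB, consistent with the slide's point that metadata dominates the overhead.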
NTFS classification is implemented with Windows filter drivers
12
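PostgreSQL prototype
Classification API: scatter/gather I/O
OS changes (block layer)
– Add O_CLASSIFIED file flag
– Extract classifier from SG I/O
A small OS & DB patch
A one-time change to the OS & DB

Preliminary DB classes
PostgreSQL class    Group number
Unclassified        0
Transaction log     19
System table        20
Free space map      21
Temporary table     22
Random table        23
Sequential table    24
Index file          25
Reserved            26-31

#include <fcntl.h>
#include <sys/uio.h>
/* O_CLASSIFIED is the flag added by the prototype's OS patch; it is not a standard open(2) flag. */
unsigned char class = 19;             /* transaction log */
struct iovec myiov[2];
int fd = open("foo", O_RDWR | O_CLASSIFIED, 0666);
myiov[0].iov_base = &class;           /* first element: the 1-byte classifier */
myiov[0].iov_len = 1;
myiov[1].iov_base = "Hello, world!";  /* second element: the data */
myiov[1].iov_len = 13;
writev(fd, myiov, 2);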
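The receiving side of this API, "extract classifier from SG I/O", might look like the following. It is a sketch of the convention only (first vector element is a one-byte class, the remaining elements are data), and `split_classified_iov` is a hypothetical name, not a function from the prototype.

```python
MAX_CLASS = 31  # 5-bit classifier


def split_classified_iov(iov):
    """Split a scatter/gather list into (class, payload).

    Assumes the convention shown on the slide: element 0 holds a single
    byte with the class, and the remaining elements hold the data.
    """
    if len(iov) < 2 or len(iov[0]) != 1:
        raise ValueError("expected a 1-byte classifier followed by data")
    cls = iov[0][0]
    if cls > MAX_CLASS:
        raise ValueError("class does not fit in 5 bits")
    return cls, b"".join(iov[1:])


cls, data = split_classified_iov([bytes([19]), b"Hello, world!"])
print(cls, len(data))
```

Stripping the first element before the write reaches the file keeps the class out of the stored data, so the on-disk contents are unchanged.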
13
Cache implementations
Fully associative read/write LRU cache
– Insert(), Lookup(), Delete(), etc.
– Hash table maps disk LBA to SSD LBA
– Syncer daemon asynchronously cleans cache
Monitors cache pressure for selective allocation
Maintains multiple LRU lists for selective eviction
Front-ends: iSCSI (OS independent) and Linux MD
MD cache module (RAID-9)
Striping: mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdd /dev/sde
Mirroring: mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdd /dev/sde
RAID-9: mdadm --create /dev/md0 --level=9 --raid-devices=2 <cache> <base
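The cache bookkeeping listed on this slide (a hash table from disk LBA to SSD LBA, plus a syncer that cleans dirty blocks) can be sketched as follows. The class `LbaMapCache`, its method names, and the batch-based `sync` pass are assumptions for illustration; the real syncer runs asynchronously as a daemon.

```python
class LbaMapCache:
    """Toy mapping layer: disk LBA -> SSD LBA, with dirty-block tracking."""

    def __init__(self, ssd_blocks):
        self.free = list(range(ssd_blocks))  # unused SSD block addresses
        self.map = {}                        # disk LBA -> SSD LBA
        self.dirty = set()                   # disk LBAs not yet written back

    def lookup(self, disk_lba):
        return self.map.get(disk_lba)

    def insert(self, disk_lba, dirty=False):
        ssd_lba = self.map.get(disk_lba)
        if ssd_lba is None:
            if not self.free:
                return None                  # full: caller must evict first
            ssd_lba = self.free.pop()
            self.map[disk_lba] = ssd_lba
        if dirty:
            self.dirty.add(disk_lba)
        return ssd_lba

    def delete(self, disk_lba):
        if disk_lba in self.dirty:
            raise ValueError("block is dirty; clean it before deleting")
        ssd_lba = self.map.pop(disk_lba, None)
        if ssd_lba is not None:
            self.free.append(ssd_lba)

    def sync(self, write_back, batch=64):
        """One syncer pass: write back up to `batch` dirty blocks."""
        cleaned = 0
        for disk_lba in sorted(self.dirty)[:batch]:
            write_back(disk_lba, self.map[disk_lba])
            self.dirty.discard(disk_lba)
            cleaned += 1
        return cleaned
```

Cleaning blocks ahead of time lets eviction drop a block without waiting on a disk write, which is the overhead the evaluation slides label "syncer overhead".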
14
Evaluation
15
Experimental setup
Host OS (Xeon, 2-way, quad-core, 12GB RAM)
– Linux 2.6.34 (patched as described)
Target storage system
– HW RAID array + X25-E cache
Workloads and cache sizes
– SPECsfs: 18GB (10% of 184GB working set)
– TPC-H: 8GB (28% of 29GB working set)
Comparison
– LRU versus LRU-S (LRU with selective caching)
16
SPECsfs I/O breakdown
[Chart, LRU] Large files pollute LRU cache (metadata and small files evicted)
[Chart, LRU-S] LRU-S fences off large file I/O
17
SPECsfs performance metrics
[Charts comparing LRU and LRU-S, with HDD as an additional bar label: hit rate, syncer overhead, I/O throughput, running time]
1.8x speedup
18
SPECsfs file latencies
[Chart, LRU vs. LRU-S] Reduction in write latency over HDD
LRU suffers from write outliers (from eviction overheads)
[Chart, LRU vs. LRU-S] Reduction in read latency over HDD
LRU-S reduces read latency (most small files are cached)
19
TPC-H I/O breakdown
[Chart, LRU] Indexes pollute LRU cache (user tables evicted)
[Chart, LRU-S] LRU-S fences off index files
20
TPC-H performance metrics
[Charts comparing LRU and LRU-S, with HDD as an additional bar label: hit rate, syncer overhead, I/O throughput, running time]
1.2x speedup
21
Conclusion & future work
Intelligent caching is just the beginning
– Other types of performance differentiation
– Security, reliability, retention, …
Other applications we’re looking at
– Databases
– Hypervisors
– Cloud storage
– Big Data (NoSQL DB)
Work already underway in T10
Open source coming soon…
Thank you!
Questions?