2016 Storage Developer Conference. © Intel Corp. All Rights Reserved.
Hardware-Based Compression in Ceph OSD with BTRFS
Weigang Li ([email protected])
Tushar Gohad ([email protected])
Data Center Group
Intel Corporation
Credits
This work wouldn’t have been possible without contributions from:
Reddy Chagam ([email protected])
Brian Will ([email protected])
Praveen Mosur ([email protected])
Edward Pullin ([email protected])
Agenda
Ceph
A Quick Primer
Storage Efficiency and Security Features
Storage Workload Acceleration
Software and Hardware Approaches
Compression in Ceph OSD with BTRFS
Compression in BTRFS and Ceph
Hardware Acceleration with QAT
PoC implementation
Performance Results
Key Takeaways
Ceph
Open-source, object-based scale-out storage system
Software-defined, hardware-agnostic – runs on commodity hardware
Object, Block and File support in a unified storage cluster
Highly durable, available – replication, erasure coding
Replicates and re-balances dynamically
Image source: http://ceph.com/ceph-storage
Ceph
Scalability – CRUSH data placement, no single point of failure
Enterprise features – snapshots, cloning, mirroring
Most popular block storage for OpenStack use cases
10 years of hardening, vibrant community
Source: http://www.openstack.org/assets/survey/April-2016-User-Survey-Report.pdf
Ceph: Architecture
[Diagram: commodity servers each run multiple OSDs over disks; each OSD sits on a backend – a POSIX filesystem (btrfs, xfs, ext4) or Bluestore with a KV store – with monitors (M) alongside.]
Ceph: Storage Efficiency, Security – Erasure Coding, Compression, Encryption
[Diagram: Ceph cluster and client data path.]
Erasure Coding – at the OSD backend
Compression – Filestore (BTRFS), Bluestore (native)
Encryption – dm-crypt/LUKS, self-encrypting drives; end-to-end encryption (E2EE) at the RGW client
Client interfaces – RGW object (S3/Swift), RBD block, native RADOS
Storage Workload Acceleration – Software and Hardware-based Approaches
[Diagram: acceleration options arranged by distance from the CPU core – on-core SIMD (ISA-L, software-based), on-chip / QPI-attach / PCIe-attach fixed-function ASIC (QAT), reconfigurable FPGA, and GPGPU – trading off application flexibility, ease of programming, latency and granularity.]
Software-based Acceleration with ISA-L – Intel® Intelligent Storage Acceleration Library
Collection of optimized low-level storage functions
Uses SIMD primitives to accelerate storage functions in software
Primitives supported
Compression – gzip
Encryption – AES block ciphers (XTS, CBC, GCM)
Erasure Coding – Reed-Solomon
CRC (iSCSI, IEEE, T10DIF), RAID5/6
Multi-buffer hashes (SHA1, SHA256, SHA512, MD5), Multi-hash
OS agnostic: Linux, FreeBSD, etc.
Commercially compatible, free, open source under the BSD license
https://github.com/01org/isa-l_crypto
https://github.com/01org/isa-l
https://software.intel.com/en-us/storage/ISA-L
Hardware-based Acceleration – Intel® QuickAssist Technology
Bulk Cryptography – security (symmetric encryption and authentication) for data in flight and at rest
Public Key Cryptography – secure key establishment (asymmetric encryption, digital signatures, key exchange)
Compression – lossless data compression for data in flight and at rest
Form factors:
Chipset – connects to the CPU via on-board PCI Express* lanes
PCI Express* plug-in card
SoC – connects to the CPU via on-chip interconnect
Hardware acceleration of compute-intensive workloads (cryptography and compression)
Open-source software support:
Cryptography – OpenSSL libcrypto, Linux Kernel Crypto Framework
Data compression – zlib (user API), BTRFS/ZFS (kernel), Hadoop, databases
Support for Offloads in Upstream Ceph – Intel® ISA-L and QAT
Erasure Coding
ISA-L offload support for Reed-Solomon codes
Supported since Hammer
Compression
Filestore
QAT offload for BTRFS compression (kernel patch submission in progress)
Bluestore
ISA-L offload for zlib compression supported in upstream master
QAT offload for zlib compression (work-in-progress)
Encryption
Client-side (E2EE) RADOS GW encryption with ISA-L and QAT offloads (work-in-progress)
Qemu-RBD encryption with ISA-L and QAT offloads (under investigation)
Today’s Focus: Compression
More Data – 10x data growth from 2013 to 2020*
Digital universe doubling every two years
Data from the Internet of Things will grow 5x
The percentage of data that is analyzed grows from 22% to 37%
Compression can save storage capacity.
* Source: April 2014, EMC* Digital Universe with Research & Analysis by IDC*
Compression: Cost
Compress a 1 GB Calgary Corpus* file on one CPU core (HT).
Compression ratio – less is better: cRatio = compressed size / original size
Compression is CPU intensive; a better compression ratio requires more CPU time.
            lzo     gzip-1   gzip-6   bzip2
real (s)    6.37    22.75    55.15    83.74
user (s)    4.07    22.09    54.51    83.18
sys (s)     0.79     0.64     0.59     0.52
cRatio     51%     38%      33%      28%
[Chart: elapsed time (s) and cRatio % per compression tool.]
*The Calgary corpus is a collection of text and binary data files, commonly used for comparing data compression algorithms.
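The cost/ratio trade-off in the table can be reproduced in miniature with Python's zlib module: higher levels spend more CPU time for a smaller output. A minimal sketch (the input here is synthetic repeated text, not the Calgary corpus, so absolute times and ratios will differ):

```python
import time
import zlib

# Synthetic, compressible input; ratios will not match the Calgary corpus.
data = b"the quick brown fox jumps over the lazy dog " * 4096

for level in (1, 6, 9):
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    # cRatio = compressed size / original size (less is better)
    cratio = len(compressed) / len(data)
    print(f"level {level}: {elapsed * 1e3:.2f} ms, cRatio {cratio:.1%}")
```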
Source as of August 2016: Intel internal measurements with dual E5-2699 v3 (18C, 2.3GHz, 145W), HT & Turbo Enabled, Fedora 22 64 bit, DDR4-128GB.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Any difference in system hardware or software design or configuration may affect actual performance. Results have been estimated based on internal Intel analysis and are provided for informational purposes only. For more information go to http://www.intel.com/performance
Benefit of Hardware Acceleration
Compress a 1 GB Calgary Corpus file:
            lzo    accel-1 *  accel-6 **  gzip-1   gzip-6   bzip2
real (s)    6.37    4.01       8.01       22.75    55.15    83.74
user (s)    4.07    0.49       0.45       22.09    54.51    83.18
sys (s)     0.79    1.31       1.22        0.64     0.59     0.52
cRatio     51%    40%        38%         38%      33%      28%
[Chart: elapsed time (s) and cRatio % per compression tool.]
Source as of August 2016: Intel internal measurements with dual E5-2699 v3 (18C, 2.3GHz, 145W), HT & Turbo Enabled, Fedora 22 64 bit, 1 x DH8955 adaptor, DDR4-128GB
Less CPU load, better compression ratio.
* Intel® QuickAssist Technology DH8955, level-1
** Intel® QuickAssist Technology DH8955, level-6
Transparent Compression in Ceph: BTRFS
Copy on Write (CoW) filesystem for Linux.
“Has the correct feature set and roadmap to serve
Ceph in the long-term, and is recommended for
testing, development, and any non-critical
deployments… This compelling list of features
makes btrfs the ideal choice for Ceph clusters”*
Native compression support:
Mount with “compress” or “compress-force”.
ZLIB and LZO supported.
Compresses up to 128 KB at a time.
* http://docs.ceph.com/docs/hammer/rados/configuration/filesystem-recommendations/
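For illustration, transparent compression on a hypothetical OSD data partition could be enabled with mount options along these lines (device paths and mount points are placeholders, not taken from the benchmark setup; `compress` skips data that does not compress well, while `compress-force` compresses everything):

```
# /etc/fstab entries (illustrative devices and mount points)
/dev/sdb1  /var/lib/ceph/osd/ceph-0  btrfs  compress=zlib        0  0
/dev/sdc1  /var/lib/ceph/osd/ceph-1  btrfs  compress-force=zlib  0  0
```

The same options can also be passed ad hoc, e.g. `mount -o compress=zlib`.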
BTRFS Compression Accelerated with Intel® QuickAssist Technology
BTRFS currently supports LZO and ZLIB:
Lempel-Ziv-Oberhumer (LZO) – a portable, lossless compression library that focuses on compression speed rather than compression ratio.
ZLIB – lossless data compression based on the DEFLATE algorithm (LZ77 + Huffman coding); good compression ratio, but slow.
Intel® QuickAssist Technology supports:
DEFLATE: LZ77 compression followed by Huffman coding
with GZIP or ZLIB header.
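Since GZIP and ZLIB are just two containers around the same DEFLATE bit stream, the header choice is independent of the compression itself. A small illustration with Python's zlib module, where the `wbits` parameter selects the container (this mirrors the format relationship only, not the QAT API):

```python
import zlib

data = b"DEFLATE payload: LZ77 followed by Huffman coding " * 100

def deflate(payload: bytes, wbits: int) -> bytes:
    # wbits selects the container: 15 -> ZLIB header (RFC 1950),
    # 31 -> GZIP header (RFC 1952), -15 -> raw DEFLATE (RFC 1951).
    comp = zlib.compressobj(6, zlib.DEFLATED, wbits)
    return comp.compress(payload) + comp.flush()

zlib_stream = deflate(data, 15)   # starts with the 0x78 ZLIB byte
gzip_stream = deflate(data, 31)   # starts with the 0x1f 0x8b GZIP magic
raw_stream = deflate(data, -15)   # bare DEFLATE bit stream, no header
```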
Hardware Compression in BTRFS
BTRFS compresses the page buffers before writing them to the storage media.
The Linux Kernel Crypto Framework (LKCF) selects the hardware engine for compression.
Data compressed by hardware can be decompressed by the software library, and vice versa.
[Diagram: application → syscall → VFS → page cache → BTRFS → Linux Kernel Crypto API → ZLIB/LZO in software or the Intel® QuickAssist Technology driver → storage media. BTRFS issues an async compress and flushes once the job is done.]
[Diagram: through the Linux Kernel Crypto API, zlib_compress_pages_async passes an input buffer of up to 128 KB of uncompressed data (4 KB pages) to Intel® QuickAssist Technology, which writes compressed data into a pre-allocated output buffer; a completion interrupt triggers the callback.]
Hardware Compression in BTRFS (Cont.)
BTRFS submits an “async” compression job (btrfs_compress_pages) with an sg-list containing up to 32 x 4 KB pages; input and output are transferred by DMA.
The BTRFS compression thread is put to sleep when the “async” compression API is called.
The BTRFS compression thread is woken up when the hardware completes the compression job.
The hardware can be fully utilized when multiple BTRFS compression threads run in parallel.
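The submit / sleep / wake flow above can be sketched as a user-space model (a conceptual illustration only; the real implementation lives in the kernel and reaches the QAT hardware through the crypto API, and all names here are made up):

```python
import threading
import zlib
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the offload engine: a worker pool that runs compression
# jobs and fires a completion callback, like the hardware's DONE interrupt.
engine = ThreadPoolExecutor(max_workers=4)

def compress_pages_async(pages):
    """Submit an 'async' job and sleep until the completion callback."""
    done = threading.Event()
    result = {}

    def job():
        # The "hardware" compresses the whole sg-list in one pass.
        result["out"] = zlib.compress(b"".join(pages), 1)

    # The completion callback wakes the sleeping submitter.
    engine.submit(job).add_done_callback(lambda fut: done.set())

    done.wait()  # the submitting thread sleeps here, freeing the CPU
    return result["out"]

# An sg-list of up to 32 x 4 KB pages, as in the BTRFS path.
pages = [bytes([i]) * 4096 for i in range(32)]
compressed = compress_pages_async(pages)
```

Because the submitter sleeps rather than spins, several such threads running in parallel keep the engine busy, which is how BTRFS keeps the hardware fully utilized.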
Compression in Ceph OSD
Ceph OSD on BTRFS supports built-in compression:
Transparent, real-time compression at the filesystem level.
Reduces the amount of data written to the local disk, and reduces disk I/O.
A hardware accelerator can be plugged in to free up the OSDs’ CPU.
[Diagram: compute nodes reach OSDs over the network; each OSD runs on BTRFS over local disks, with an Intel® DH8955 accelerator in the OSD node.]
Benchmark - Hardware Setup
[Diagram: Client – 2 x Xeon® E5-2699 v3 (Haswell) @ 2.30 GHz, 64 GB DDR4, Linux, FIO, 40 Gb NIC. Ceph cluster server – 2 x Xeon® E5-2699 v3 (Haswell) @ 2.30 GHz, 128 GB DDR4, Linux, NVMe, 2 x Intel® DH8955 plug-in cards (PCIe), LSI00300 HBA to a 12 x SSD JBOD.]
Benchmark - Ceph Configuration
Deploy Ceph OSDs on top of BTRFS as the backend filesystem.
Deploy 2 OSDs per SSD – 24 OSDs in total (OSD-1 … OSD-24 on SSD-1 … SSD-12).
2 x NVMe devices hold the journals.
Data written to the Ceph OSDs is compressed by Intel® QuickAssist Technology (Intel® DH8955 plug-in cards).
A monitor (MON) runs alongside.
Test Methodology
[Diagram: FIO threads on the client drive the Ceph cluster (CephFS / RBD via LIBRADOS over RADOS); a MON runs alongside the OSDs.]
Start 64 FIO threads on the client; each writes / reads a 2 GB file to / from the Ceph cluster over the network.
Drop caches before tests.
For write tests, all files are synchronized to the OSDs’ disks before the tests complete.
The average CPU load and disk utilization on the Ceph OSDs, and the FIO throughput, are measured.
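The workload above might be expressed as an FIO job file along these lines (a sketch using FIO's ini syntax; the directory is a placeholder for wherever the Ceph mount actually lives, not taken from the original setup):

```
; Illustrative FIO job file for the sequential-write pass
; directory is a placeholder mount point, not the original setup's path
[global]
ioengine=libaio
direct=1
bs=64k
size=2g
numjobs=64
directory=/mnt/cephfs

[seq-write]
rw=write
; sync files to the OSDs' disks before the test completes
end_fsync=1
```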
Benchmark Configuration Details
Client
  CPU: 2 x Intel® Xeon CPU E5-2699 v3 (Haswell) @ 2.30 GHz (36 cores / 72 threads)
  Memory: 64 GB
  Network: 40 GbE, jumbo frames (MTU=8000)
  Test tool: FIO 2.1.2, engine=libaio, bs=64 KB, 64 threads
Ceph Cluster
  CPU: 2 x Intel® Xeon CPU E5-2699 v3 (Haswell) @ 2.30 GHz (36 cores / 72 threads)
  Memory: 128 GB
  Network: 40 GbE, jumbo frames (MTU=8000)
  HBA: LSI00300
  OS: Fedora 22 (kernel 4.1.3)
  OSD: 24 x OSD, 2 per SSD (S3700), no replica; 2 x NVMe (P3700) for journal; 2400 PGs
  Accelerator: 2 x Intel® QuickAssist Adapter 8955, dynamic compression, level-1
  BTRFS ZLIB: software ZLIB, level-3
Sequential Write
              off       accel *   lzo      zlib-3
CPU util     13.62%    15.25%    28.30%   90.95%
cRatio      100%      40%       50%      36%
BW (MB/s)  2910      2960      3003     1157
[Chart: bandwidth, CPU utilization and cRatio per mode; CPU util % over ~105 seconds for each mode.]
60% disk saving, with minimal CPU overhead.
* Intel® QuickAssist Technology DH8955, level-1
** Dataset is random data generated by FIO
Source as of August 2016: Intel internal measurements with dual E5-2699 v3 (18C, 2.3GHz, 145W), HT & Turbo Enabled, Fedora 22 64 bit, kernel 4.1.3, 2 x DH8955 adaptor, DDR4-128GB.
Sequential Read
              off      accel *   lzo      zlib-3
CPU util      7.33%    8.76%    11.81%   26.20%
BW (MB/s)  2557     2915     3042     2913
[Chart: bandwidth and CPU utilization per mode; CPU util % over ~40 seconds for each mode.]
Minimal CPU overhead for decompression.
* Intel® QuickAssist Technology DH8955, level-1
Key Takeaways
Data compression offers improved storage efficiency.
Filesystem-level compression in the OSD is transparent to the Ceph software stack.
Data compression is CPU intensive; a better compression ratio requires more CPU cost.
Software and hardware offload methods are available for accelerating compression, including ISA-L, Intel® QuickAssist Technology, and FPGAs.
Hardware offloading can greatly reduce CPU cost and optimize disk utilization and I/O in the storage infrastructure.
Additional Sources of Information
More information on Intel® QuickAssist Technology and Intel® QuickAssist software solutions can be found here:
Software package and engine are available at 01.org: Intel QuickAssist Technology | 01.org
For more details on Intel® QuickAssist Technology visit: http://www.intel.com/quickassist
Intel Network Builders: https://networkbuilders.intel.com/ecosystem
Intel® QuickAssist Technology storage testimonials:
IBM v7000Z with QuickAssist: http://www-03.ibm.com/systems/storage/disk/storwize_v7000/overview.html
https://builders.intel.com/docs/networkbuilders/Accelerating-data-economics-IBM-flashSystem-and-Intel-quick-assist-technology.pdf
Intel’s QuickAssist Adapter for Servers: http://ark.intel.com/products/79483/Intel-QuickAssist-Adapter-8950
DEFLATE Compressed Data Format Specification version 1.3: http://tools.ietf.org/html/rfc1951
BTRFS: https://btrfs.wiki.kernel.org
Ceph: http://ceph.com/
Additional Information
QAT Attach Options
Legal Disclaimer
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS
GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR
IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT
OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves
these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this
information.
The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.
Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm Performance tests
and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or
configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and
on the performance of Intel products, visit Intel Performance Benchmark Limitations
All products, computer systems, dates and figures specified are preliminary based on current expectations, and are subject to change without notice.
Intel, Intel logo, Intel Core, Intel Inside, Intel Inside logo, Intel Ethernet, Intel QuickAssist Technology, Intel Flow Director, Intel Solid State Drives, Intel Intelligent Storage Acceleration Library, Itanium, Xeon, and Xeon Inside are
trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
64-bit computing on Intel architecture requires a computer system with a processor, chipset, BIOS, operating system, device drivers and applications enabled for Intel® 64 architecture. Performance will vary depending on your
hardware and software configurations. Consult with your system vendor for more information.
No computer system can provide absolute security under all conditions. Intel® Trusted Execution Technology is a security technology under development by Intel and requires for operation a computer system with Intel® Virtualization
Technology, an Intel Trusted Execution Technology-enabled processor, chipset, BIOS, Authenticated Code Modules, and an Intel or other compatible measured virtual machine monitor. In addition, Intel Trusted Execution Technology
requires the system to contain a TPMv1.2 as defined by the Trusted Computing Group and specific software for some uses. See http://www.intel.com/technology/security/ for more information.
Intel® Virtualization Technology requires a computer system with an enabled Intel® processor, BIOS, virtual machine monitor (VMM) and, for some uses, certain platform software enabled for it. Functionality, performance or other
benefits will vary depending on hardware and software configurations and may require a BIOS update. Software applications may not be compatible with all operating systems. Please check with your application vendor.
* Other names and brands may be claimed as the property of others.
Other vendors are listed by Intel as a convenience to Intel's general customer base, but Intel does not make any representations or warranties whatsoever regarding quality, reliability, functionality, or compatibility of these
devices. This list and/or these devices may be subject to change without notice.
Copyright © 2016, Intel Corporation. All rights reserved.