Isilon Gen6 Performance
© Copyright 2017 Dell Inc.
Address Customer Challenges
PERFORMANCE & SCALE
• Performance at PB scale
– Increased performance per usable TB
– Enable applications requiring lower latency to leverage scale-out NAS
– Predictable performance at scale
AGILITY AND LOWER TCO
• Support evolving needs with lower TCO
– Cluster in a box and simple growth path
– Customizable solution: same building blocks irrespective of cluster profile
ENTERPRISE GRADE RESILIENCE
• Data Protection & Availability
– Smaller fault domains enabling rapid rebuilds
– Predictable failure handling at PB densities
– Eliminate single points of failure
Introducing Isilon Infinity
[Figure: chassis front (without faceplate), back, and internal view]
• Sled: future-proofed for drive technology and drive count
• Compute suitcase: performance/cost-optimized CPU and memory
• Cache SSD: dedicated slots (capacity, count)
Introducing Isilon Gen6
Isilon Family
[Chart: Gen6 models positioned by capacity vs. performance, alongside the Gen5 S-Series, X-Series, NL-Series, and HD-Series]
• F800: 250k ops, 15 GB/s
• H600: 117k ops, 12 GB/s
• H500: 40k ops, 5 GB/s
• H400: 3.5 GB/s, 480 TB/chassis
• A200: 120 TB-480 TB/chassis
• A2000: 800 TB/chassis
Isilon Gen6 hardware performance view
• Improved resource scaling
– More CPU, RAM, journal, and network per disk than with Gen5
• CPU sizing
– 2-16 core Broadwell Xeon
– Up to 30% faster clock
• CPU efficiency
– 5-10% generational improvement
– Single socket for 30% lower memory latency
• Journal
– Up to 4x larger, 8 GB effective space
– Lower latency, higher throughput, DMA offload
• Bandwidth
– 8 x PCIe (64 Gbit/s) to all I/O (front-end, back-end, SAS, peer)
– Dual 40 Gbit Ethernet, front-end and back-end
– 12 Gbps SAS with an over-provisioned switch
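As a rough check on the "8 x PCIe (64 Gbit/s)" figure: assuming PCIe 3.0 signaling (8 GT/s per lane with 128b/130b line encoding; the slide names neither the generation nor the encoding), eight lanes give 64 Gbit/s raw and slightly less after encoding. A minimal sketch; the function name is hypothetical:

```python
def pcie_bandwidth(lanes: int, gt_per_s: float = 8.0, encoding: float = 128 / 130) -> tuple[float, float]:
    """Return (raw, post-encoding) one-direction bandwidth in Gbit/s.

    Defaults assume PCIe 3.0: 8 GT/s per lane and 128b/130b line encoding.
    Protocol overhead (packet headers, flow control) is ignored.
    """
    raw = lanes * gt_per_s
    return raw, raw * encoding


raw, usable = pcie_bandwidth(8)
print(f"raw {raw:.0f} Gbit/s, after encoding ~{usable:.1f} Gbit/s (~{usable / 8:.2f} GB/s)")
```

This prints `raw 64 Gbit/s, after encoding ~63.0 Gbit/s (~7.88 GB/s)`, so the 64 Gbit/s on the slide is the raw lane rate.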
[Chart: CPU efficiency, MB/s per GHz, S210 vs. H600]
OneFS 8.1.0 “Freight Trains”
• Software must evolve to realize the hardware’s potential; hardware gains don’t come for “free”
• An H600 running OneFS 7.2 would perform the same as an S210
• 8.1.0 brings dozens of software improvements to enable Gen6 hardware
– Latency reduction
– Threading and lock scaling
– I/O concurrency
– Prefetching
[Chart: SAS (HDD) namespace ops, a history: S210 on OneFS 7.2, 8.0, and 8.1 vs. H600]
Life of an Op
• Operation flow in Isilon’s distributed architecture
– It’s all about minimizing and mitigating latency
• There is no single performance bottleneck; everything must improve in concert
• What are the primary sources of latency?
– Network round-trips, front-end and back-end
– I/O devices (journal, disk subsystem)
– Compute time
– Waiting on execution resources (threads, queues)
• Let’s look more closely at each of these sources…
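The latency sources listed above add up along an op’s path, and whichever term dominates is the one worth attacking next, which is why everything has to improve in concert. A minimal sketch of that bookkeeping; the class, field names, and all input values are hypothetical placeholders, not OneFS measurements:

```python
from dataclasses import dataclass


@dataclass
class OpLatency:
    """Hypothetical per-op latency budget in microseconds (illustrative inputs only)."""
    frontend_rtt_us: float  # client <-> initiator node round trip
    backend_rtts: int       # node-to-node round trips the op needs
    backend_rtt_us: float   # cost of one back-end round trip
    io_us: float            # journal and disk time
    compute_us: float       # CPU time
    wait_us: float          # time queued for threads or other execution resources

    def components(self) -> dict:
        return {
            "front-end network": self.frontend_rtt_us,
            "back-end network": self.backend_rtts * self.backend_rtt_us,
            "I/O devices": self.io_us,
            "compute": self.compute_us,
            "waiting on resources": self.wait_us,
        }

    def total_us(self) -> float:
        return sum(self.components().values())

    def dominant(self) -> str:
        parts = self.components()
        return max(parts, key=parts.get)


op = OpLatency(frontend_rtt_us=100, backend_rtts=3, backend_rtt_us=50,
               io_us=0, compute_us=250, wait_us=20)
faster_cpu = OpLatency(frontend_rtt_us=100, backend_rtts=3, backend_rtt_us=50,
                       io_us=0, compute_us=50, wait_us=20)
print(op.total_us(), op.dominant())
print(faster_cpu.total_us(), faster_cpu.dominant())
```

With these placeholder numbers, shrinking compute simply moves the bottleneck to the back-end network round trips.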
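Isilon Node Stack
[Figure: layers of one node, with neighboring nodes N-1… and N+1…]
• Client protocols: NFS, SMB, HDFS, FTP, HTTP
• Front-end network (1GigE, 10GigE, 40GigE)
• BAM: FS ops, cache
• RBM: transactions with Node N-1… and Node N+1… over the back-end network (InfiniBand, 10 GigE, 40 GigE)
• LBM: cache
• NVRAM journal and disk subsystem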
Networking
• Responsible for communication with clients (front-end) or other nodes in the cluster (back-end)
• Lower latency through custom adaptive interrupt moderation
• High-throughput TCP
– Very complex changes, all RFC compliant
– Microsecond timer resolution
– Aggressive but safe congestion control
• Ethernet back-end
– Uses Explicit Congestion Notification (ECN) to avoid tail loss and lower latency
– Higher bandwidth than QDR IB (40 Gbit vs. 32 Gbit)
– Latency parity at high loads
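The slide doesn’t say how the back-end TCP reacts to ECN marks. As a swapped-in illustration only, the sketch below follows the published DCTCP approach (RFC 8257): the sender keeps a moving average of the fraction of ECN-marked data and shrinks its window in proportion to it, instead of halving after a loss. The class name, the simplified one-call-per-window interface, and the initial `alpha` are hypothetical, not OneFS code:

```python
class DctcpSender:
    """DCTCP-style congestion window reaction to ECN marks (after RFC 8257).

    Call on_window_end() once per round trip (one window of data).
    """

    def __init__(self, cwnd: float, g: float = 1 / 16, alpha: float = 0.0):
        self.cwnd = cwnd    # congestion window, in segments
        self.g = g          # estimation gain for the moving average
        self.alpha = alpha  # running estimate of the fraction of marked bytes

    def on_window_end(self, acked_bytes: int, marked_bytes: int) -> float:
        fraction = marked_bytes / acked_bytes if acked_bytes else 0.0
        # Exponentially weighted moving average of the marked fraction.
        self.alpha = (1 - self.g) * self.alpha + self.g * fraction
        if marked_bytes:
            # Cut in proportion to the extent of congestion instead of always halving.
            self.cwnd = max(1.0, self.cwnd * (1 - self.alpha / 2))
        else:
            self.cwnd += 1  # additive increase: one segment per round trip
        return self.cwnd


sender = DctcpSender(cwnd=100)
print(sender.on_window_end(100_000, 0))        # no marks: window grows
print(sender.on_window_end(100_000, 100_000))  # fully marked: small proportional cut
```

Compared with the 50% cut of classic loss-based TCP, a marked window here produces only a modest reduction, which keeps switch queues short before they overflow and drop the tail of a burst, the failure mode in the 20:1 incast test.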
Networking Throughput
[Chart: throughput in a 20:1 “incast” test (tail loss mitigation), MB/s and stddev, standard TCP vs. back-end TCP]
[Chart: throughput, MB/s, single client, single thread: S210 on 8.0, S210 on 8.1, H600, F800]
Front-end Execution (BAM)
• BAM: Block Allocation Manager
• Responsible for:
– Executing filesystem operations
– Logical state
– Distributed locking and coordination
– Enforcing POSIX semantics
• Hardware changes reduce create CPU time from 400 μs to 250 μs (a 38% reduction)
– Namespace ops spend 75-90% of their time on CPU
• Targeted CPU optimizations
– Local locks
– Create efficiency
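To see what the 400 μs to 250 μs CPU saving means for a whole create, one can apply Amdahl’s-law-style reasoning to the 75-90% CPU share quoted above. A minimal sketch, assuming the non-CPU portion of the op (network, journal, disk, queueing) is unchanged; the function name is hypothetical:

```python
def create_latency_cut(cpu_before_us: float, cpu_after_us: float,
                       cpu_fraction: float) -> tuple[float, float]:
    """Return (CPU-time reduction, whole-op latency reduction), both as fractions.

    Assumes only the CPU portion of the op gets faster; everything else
    is left as it was.
    """
    cpu_cut = 1 - cpu_after_us / cpu_before_us
    return cpu_cut, cpu_fraction * cpu_cut


for share in (0.75, 0.90):
    cpu_cut, op_cut = create_latency_cut(400, 250, share)
    print(f"CPU share {share:.0%}: CPU time -{cpu_cut:.1%}, op latency -{op_cut:.1%}")
```

Under that assumption the 37.5% CPU saving becomes roughly a 28-34% cut in create latency; the remaining non-CPU share caps what faster CPUs alone can deliver.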
Gen6 Low Concurrency
[Chart: small-file developer workload, seconds to complete (lower is better): S210 on 8.0, S210 on 8.1, H600, F800]
Prefetch (aka Readahead)
• Prefetch reads blocks ahead of client reads
– Fills the L2 cache on the participant nodes and L1 on the initiator node (the node handling the read request)
• Increase average I/O size from 256 KiB to 1 MiB
– Fewer disk seeks, less potential for disk contention
• Layout
– “Streaming” layout improvements for optimal drive selection
– Better contiguity over longer runs
• Prefetch improvements
– Maintain large requests end-to-end
› Initiator -> BAM/RBM -> Participant -> LBM -> Disk
– Better align prefetch I/O with allocated stripes by fetching greedily
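The slide describes the goals rather than the algorithm. A minimal sketch of a sequential readahead window, assuming a hypothetical 128 KiB stripe unit and a window that grows toward 1 MiB on sequential access and rounds each prefetch up to a stripe boundary; it illustrates the idea, not OneFS’s prefetcher:

```python
KIB = 1024
MIN_WINDOW = 256 * KIB   # starting readahead size
MAX_WINDOW = 1024 * KIB  # cap: 1 MiB requests
STRIPE = 128 * KIB       # hypothetical stripe unit used for alignment


class Prefetcher:
    """Sequential-readahead sketch for a single file stream."""

    def __init__(self):
        self.next_expected = None  # offset a sequential reader would ask for next
        self.window = MIN_WINDOW
        self.prefetched_to = 0     # end offset of data already requested ahead

    def on_read(self, offset: int, length: int):
        """Return a (start, end) byte range to prefetch, or None."""
        sequential = offset == self.next_expected
        self.next_expected = offset + length
        if not sequential:
            # Random access: reset the window and do not prefetch.
            self.window = MIN_WINDOW
            self.prefetched_to = self.next_expected
            return None
        self.window = min(self.window * 2, MAX_WINDOW)
        if self.prefetched_to - self.next_expected >= self.window:
            return None  # already a full window ahead of the reader
        start = max(self.prefetched_to, self.next_expected)
        # Greedily extend the request to the next stripe boundary.
        end = -(-(start + self.window) // STRIPE) * STRIPE
        self.prefetched_to = end
        return start, end


p = Prefetcher()
for off in range(0, 512 * KIB, 128 * KIB):
    print(off // KIB, p.on_read(off, 128 * KIB))
```

Each sequential read after the first either returns nothing (the window is already covered) or a stripe-aligned range of up to about 1 MiB, so the disks see a few large requests instead of many small ones.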
Streaming I/O Efficiency
[Chart: per-drive read throughput, OneFS 8.1, MB/s: S210, H600, F800]
Message Execution
• RBM: Remote Block Manager
• Responsible for the transactional block protocol between nodes; marries BAM to LBM
• Major focus of node-level scalability
• More CPU cores require more threads and more concurrency to use those cores
– Can’t just increase threads without addressing thread contention issues!
• Concurrency work resulted in a 4x improvement, from 256k messages/s on Gen5 to over 1M messages/s on Gen6
• More operations with less latency lead to more throughput and more ops/sec
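The link between concurrency, latency, and messages/s in the last bullets is Little’s law: throughput equals the number of requests in flight divided by the latency of each. A minimal sketch; the function names and the 100 μs per-message latency are hypothetical placeholders, not OneFS measurements:

```python
def rate(in_flight: float, latency_s: float) -> float:
    """Little's law: throughput = concurrency / latency."""
    return in_flight / latency_s


def concurrency_needed(rate_per_s: float, latency_s: float) -> float:
    """Little's law rearranged: concurrency = throughput x latency."""
    return rate_per_s * latency_s


LATENCY_S = 100e-6  # hypothetical per-message latency, not a measured value

for label, target in (("Gen5", 256_000), ("Gen6", 1_000_000)):
    needed = concurrency_needed(target, LATENCY_S)
    print(f"{label}: {target:,} msgs/s needs ~{needed:.0f} messages in flight")
```

At a fixed latency, going from 256k to 1M messages/s needs roughly 4x as many messages in flight, which is why more threads only help once contention between them is addressed; lowering latency reduces the concurrency needed for the same rate.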
Back-end Execution Resources
[Chart: software build ops/sec: S210 on 8.0, S210 on 8.1, H600, F800]
[Chart: per-node concurrent read, MB/s: S210, H600, F800]
Journal & Transactions
• Gen6 uses M.2 flash for persistent storage
• The OneFS journal records all modifications to the filesystem in a fault-tolerant fashion
• Journal transaction latency slows all modifying ops (e.g. SETATTR)
• A rewrite of the journal threading and locking model dramatically lowered latency
• Example: the OneFS Job Engine job SnapshotDelete does 10,000 transactions/sec
– 8 ms average create time during the job on OneFS 7.x
› Bottlenecked on the journal lock
– < 1 ms average create time on OneFS 8.1
› Due to the locking and threading rewrite
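Applying Little’s law, N = λW (N: average number of transactions in the system, λ: transaction rate, W: average time each spends in the system), to the SnapshotDelete example shows how many transactions must be outstanding to sustain 10,000 per second. This treats the reported average create time as W, which is an assumption: the slide reports create latency, not journal residence time.

```latex
\begin{aligned}
N &= \lambda W \\
N_{\text{7.x}} &= 10{,}000\ \text{s}^{-1} \times 8\ \text{ms} = 80 \\
N_{\text{8.1}} &< 10{,}000\ \text{s}^{-1} \times 1\ \text{ms} = 10
\end{aligned}
```

Under that reading, roughly 80 transactions pile up behind the journal lock on 7.x, versus fewer than 10 on 8.1 at the same rate.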
Journal Performance
[Chart: latency (ms) vs. transactions, S210 on 7.2 vs. F800]
[Chart: transactions per second: S210 on 7.2, S210 on 8.0, H600, F800]
Disk I/O
• Eliminate a major source of I/O latency
– Reads immediately following writes to disk
• Gen6 uses write-cache-disabled disks
– More of an enterprise standard
– More predictable latency
• I/O scheduler rewrite
– Predictable latency using write-cache-disabled disks
– Increased queue depth to compensate for write-cache-disabled disks
– Dynamically limit queue depth
› Ensures better completion times for reads
› Better fairness for writes
– Scalable to hundreds of thousands of ops
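The slide gives no algorithm for dynamically limiting queue depth. As a hypothetical sketch only (the class, its parameters, and the AIMD adjustment rule are assumptions, not the OneFS scheduler), here is one way to combine a depth limit, read preference, and write fairness:

```python
from collections import deque


class IoScheduler:
    """Hypothetical sketch of a queue-depth-limited disk scheduler.

    Reads are preferred, but after `write_share` consecutive reads a
    pending write is dispatched so writes cannot starve. The device
    queue depth adapts AIMD-style to observed completion latency.
    """

    def __init__(self, depth=8, min_depth=1, max_depth=64,
                 target_latency_ms=10.0, write_share=3):
        self.depth = depth
        self.min_depth = min_depth
        self.max_depth = max_depth
        self.target_latency_ms = target_latency_ms
        self.write_share = write_share
        self.reads: deque = deque()
        self.writes: deque = deque()
        self.in_flight = 0
        self.reads_since_write = 0

    def submit(self, kind: str, request) -> None:
        (self.reads if kind == "read" else self.writes).append(request)

    def dispatch(self) -> list:
        """Issue queued requests to the device until the depth limit is reached."""
        issued = []
        while self.in_flight < self.depth and (self.reads or self.writes):
            write_due = self.writes and self.reads_since_write >= self.write_share
            if self.reads and not write_due:
                issued.append(self.reads.popleft())
                self.reads_since_write += 1
            else:
                issued.append(self.writes.popleft())
                self.reads_since_write = 0
            self.in_flight += 1
        return issued

    def complete(self, latency_ms: float) -> None:
        """Record one completion and adapt the queue depth."""
        self.in_flight -= 1
        if latency_ms > self.target_latency_ms:
            self.depth = max(self.min_depth, self.depth // 2)  # back off
        else:
            self.depth = min(self.max_depth, self.depth + 1)   # probe upward


sched = IoScheduler(depth=4)
for i in range(6):
    sched.submit("read", f"r{i}")
for i in range(2):
    sched.submit("write", f"w{i}")
print(sched.dispatch())
```

Halving the depth when completions exceed a latency target keeps the device queue short enough for reads to finish promptly on write-cache-disabled disks, while the additive increase lets the depth grow back when the disk keeps up.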
I/O Optimizations
[Chart: mixed I/O ops (reads and writes): S210 on 8.0 vs. S210 on 8.1]
[Chart: tar extract with L3 cache, seconds to complete (lower is better): S210 on 8.0 vs. S210 on 8.1]
OK, so what have we learned?
• OneFS 8.1.0 is a big software release with many performance improvements
– Optimizations throughout the Isilon software stack
• Gen6 is also a big hardware release with many improvements
– Not just the next hardware iteration!
– An Isilon game changer
• Both hardware AND software optimizations are needed to unlock a system’s full performance potential
• The Isilon portfolio is now much more differentiated, as the next two slides highlight…
Portfolio Comparison
Performance variation across the lineup
[Chart: “Home Directory” ops/sec, OneFS 8.1: S210 (8 RU), X410 (16 RU), H400 (4 RU), H500 (4 RU), H600 (4 RU), F800 (4 RU)]
Portfolio Comparison
Performance variation across the lineup
[Chart: streaming reads and writes, OneFS 8.1, MB/s (read and write series): S210 (8 RU), X410 (16 RU), H400 (4 RU), H500 (4 RU), H600 (4 RU), F800 with IB (4 RU), F800 with Ethernet (4 RU)]
Isilon all-flash is new, tell me more! (or: Why all-flash? SAS (HDD) vs. F800)
• Let’s end by looking more closely at the all-flash offering, which is new to the Isilon portfolio
– Let’s compare it to the H600, the “next fastest” system
• About 29% more throughput (15.5 GB/s vs. 12 GB/s)
• 27% more namespace ops (140k vs. 110k ops/sec in SWBUILD)
• 1/16th the latency (500 μs vs. 8 ms random read)
• 12x drive rebuild speed (FlexProtect, 1800 GB/hr vs. 150 GB/hr)
• Reliable, low latency with mixed workloads
• Surplus of ops…
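Recomputing the headline ratios from the figures quoted in each bullet is a quick sanity check on the percentages. This uses only the slide’s own numbers:

```python
# Figures as quoted on the slide: (F800, H600).
comparisons = {
    "throughput, GB/s": (15.5, 12.0),
    "SWBUILD namespace ops/sec": (140_000, 110_000),
    "FlexProtect rebuild, GB/hr": (1800, 150),
}

for name, (f800, h600) in comparisons.items():
    ratio = f800 / h600
    print(f"{name}: F800/H600 = {ratio:.2f}x ({ratio - 1:+.0%})")

latency_ratio = 500e-6 / 8e-3  # F800 random-read latency over H600's
print(f"random-read latency: F800 at 1/{1 / latency_ratio:.0f} of H600")
```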
Transactional workloads
[Chart: 100% 4 KiB random read, MB/s: H600 vs. F800]
[Chart: EDA benchmark, latency vs. ops: (20) S210 vs. F800; annotated “15x increase in ops, 500 μs vs. 8 ms per op”]
Portfolio Summary (external name, drive capacities, compute per node, I/O)
• Isilon F800
– Drives: 1.6TB SSD, 3.2TB SSD, 15.4TB SSD
– Compute: Ultra (CPU: 2.6GHz 16c, Mem: 256GB)
– I/O: front-end 10GbE, 40GbE; back-end IB, 40Gb Ethernet
• Isilon H600
– Drives: 600GB SAS, 1.2TB SAS
– Compute: Turbo (CPU: 2.4GHz 14c, Mem: 256GB)
– I/O: front-end 10GbE, 40GbE; back-end IB, 40Gb Ethernet
• Isilon H500
– Drives: 2TB SATA, 4TB SATA, 8TB SATA
– Compute: High (CPU: 2.2GHz 10c, Mem: 128GB)
– I/O: front-end 10GbE, 40GbE; back-end IB, 40Gb Ethernet
• Isilon H400
– Drives: 2TB SATA, 4TB SATA, 8TB SATA
– Compute: Med (CPU: 2.2GHz 4c, Mem: 64GB)
– I/O: front-end 10GbE; back-end IB, 10Gb Ethernet
• Isilon A200
– Drives: 2TB SATA, 4TB SATA, 8TB SATA
– Compute: Low (CPU: 2.2GHz 2c, Mem: 16GB)
– I/O: front-end 10GbE; back-end IB, 10Gb Ethernet
• Isilon A2000
– Drives: 10TB SATA
– Compute: Low (CPU: 2.2GHz 2c, Mem: 16GB)
– I/O: front-end 10GbE; back-end IB, 10Gb Ethernet