Best practices for using flash in hyperscale software storage architectures

Best practices for using flash in hyperscale software storage architectures

Brandon HoangSolutions Architect

Flash Memory Summit 2015Santa Clara, CA 1


Software-Defined Storage (SDS)


Software Software-Defined Storage

+ =

Double-digit growthA $2.8 billion market by 2017

Commodity Servers

IDC 2015

Who is Hedvig?


• Founded in 2012 by Avinash Lakshman – Co-inventor of Amazon Dynamo

and inventor of Apache Cassandra

• Develop the Hedvig Distributed Storage Platform – A software-defined storage solution

Anatomy of a distributed, hyperscale storage system

Each node hosts:• Metadata• Data


Hosts access storage tier via storage proxies

Scale-out storage tier with

commodity servers

Applicationtier

Taking advantage of flash w/ SDS


• At the storage server node– Store metadata on SSD: fast lookups and tracking– Write-optimization: sequentialize random I/O– Auto-tier and cache active data on SSDs: speed access

to hot data and buffer HDD capacity tier– Provision volumes on “all-flash” persistent storage (aka

“pin to flash”): dedicated, consistent performance for latency sensitive apps

• At the application host– Cache hot data on local SSD or PCIe flash to

accelerate access and avoid network latencies

Biggest flash benefit to hyperscale: Sequentializing random I/O


Application writes data in random blocks, and gets immediate ack from cluster

1

Storage cluster sequentializes incoming blocks (in RAM+SSD) into larger chunks

2

Larger sequentialized data chunks written to underlying disks according to policy

3

Hyperscale client

Hyperscale nodes

Application Server

Three ways flash is used in hyperscale systems

Read/write cache on storage nodes

“Pin to flash” dedicated primary storage volume Client side read cache

Option #1: Node OS storage



Type of flash:SLC/MLC SSDs or PCIe Flash

Use of flash:-- Store metadata for fastoperations – dedupe, compression, snaps, clones-- Autotiering to ensure hot data is migrated to flash-- Write logs for metadata and data

Typical configuration:2x 300GB MLC SAS/SATA SSDs

Option #2: Node volume storage




Use of flash:-- All-flash virtual volumesfor dedicated, consistentperformance on a per-appbasis-- Flash performance for read and write operations

Typical configuration:2x 800GB MLC SAS/SATA SSDs

Option #3: Client-side cache




Use of flash:-- Write-through cache to store hot blocks-- Local metadata storage

Typical configuration:800GB MLC SAS/SATA SSDs

Typical application server can deliver 65K IOPs vs. 15K

without flash

Results: Law Firm• Challenge:

– Needed quick, reliable indexing and lookups of massive 100 million active client legal docs

– Traditional NAS underperformed required access time – Standalone servers with flash performed well, but

predictably ran out of space• Solution/Result:

– Hedvig software-defined storage with SSD/HDD and client-side flash caching

– ~9x faster performance with flash– Scale-out architecture simplifies growth and

expansion



Thank you!

For more on Hedvig visit: • Web: hedviginc.com• Twitter: @hedviginc