27
M emo r y C h a n n e l S t o r a g e ( MCS ) D e m y s t i f i e d J e r om e M c F a r l a n d P r i n c i p a l P r o d u c t M ar ke t e r

Memo ry Chanel Storage MCS Demystified - Storage ... Channel Storage...SO, WHAT IS MEMORY CHANNEL STORAGE ? + An Arch itecture (not a single product) + Enables Fl ash Storage to Dir

  • Upload
    lehuong

  • View
    215

  • Download
    2

Embed Size (px)

Citation preview

Memory Channel Storage™

( M C S ™ ) Demystified

Jerome McFarland

Principal Product Marketer

AGENDA

+ INTRO AND ARCHITECTURE

+ PRODUCT DETAILS

+ APPLICATIONS

THE COMPUTE-STORAGE DISCONNECT

+ Compute And Data

Have Far Outgrown

Storage Advancements

2000 2010

Enterprises need a solution to close the gap . . .

2020

FL ASH STORAGE EVOLUTION THUS FAR

2005 2010NOW

??

??

—P

CI

SS

Ds

b

rin

g F

LA

SH

c

lo

se

r to

th

e C

PU

—S

AT

A S

SD

s m

ov

e F

LA

SH

in

to

th

e E

nte

rp

ris

e

2015

TODAY’S SOLUTION . . .

MEMORY

MEMORY

MEMORY

MEMORY

+ Processor, memor y and applications are tightly coupled

+ Storage not local to processor and application execution

+ SSDs deployed to minimize the disconnect, but critical issues remain:

- Long trips required for data retrieval

- Resource contention

- Response T ime (Latency) suf fers

CPU

WHAT IS THE IMPACT

OF RESPONSE TIME (A .K .A . LATENCY) ?

RESPONSE TIME

SLOW RESPONSE TIME

FAST RESPONSE TIME

DETERM INISTIC

RESPONSE TIME

USER EXPERIENCE

APPLICATION L AG

CRITICAL BUSINESS

ADVANTAGES

QUALITY OF SERVICE

THE PERFORMANCE TRADE-OFF

+ Traditionally customers have faced a suboptimal

trade-of f in storage system design:

OPTIM IZE IOPS SACRIFICE LATENCY

OPTIM IZE L ATENCY SACRIFICE IOPS

A Painful Workaround . . .

+ When SSD “IOPS vs . Latency” trade-offs are unacceptable,

adding expensive RAM is a traditional recourse .

+ However, adding RAM can create an imbalance between

incremental performance requirements

and rap idly growing solution cost .

PERFORMANCE

REQUIREMENTS

SO

LU

TIO

N C

OS

TS

+ MCS is coupled with the processor, application,

and system memory

+ Distance and contention issues are eliminated

+ Data stays within memor y subsystem for local access

+ Achieves ultra-low latenc y

+ Enables linear “performance vs . cost’ scalabilit y

+ Enabled by a unique architectural approachPERFORMANCE

REQUIREMENTS

SO

LU

TIO

N C

OS

TS

MEMORY CHANNEL STORAGE SOLUTION

CPU

MEMORY

MEMORY

MEMORY

MEMORY

+ Massive Flash capacity exposed through the low-latency memor y subsystem .

MCS SYSTEM VIEW

Leveraging the

Power of

Parallelism...

SOFTWARE ARCHITECTURE

MCS Firmware

Hardware

Flash Controller Firmware

MCS Kernel DriverBIOS/UEFI

OS Stack

Block Layer

Management

SoftwareApplications

User Space

Kernel Space

Diablo SanDisk OEM 3rd Party

HARDWARE ARCHITECTURE

MCS-based DIMM

MCS Chipset Storage Subsystem

Power System: Detection / Protection

MCS Controller

FL ASH

Conroller

FL ASH

ConrollerDD

R3

PH

Y

AP

PL

ICA

TIO

N D

AT

A

NAND

Flash

NAND

Flash

PERSISTENT MEMORY

WITHIN NUMA

I/O Controller

+ The best location for application data is within the NUMA architecture

+ Highly parallel (threaded) applicatons running on parallel processors and cores,

with highly parallel memory and storage access

CPU0

QPI

QPI

QPI

QPI

QPI QPI

QPI QPI

CPU1

CPU2 CPU3

Mem

Chan

Mem

Chan

Mem

Chan

Mem

Chan

Mem

Chan

Mem

Chan

Mem

Chan

Mem

Chan

Mem

Chan

Core0

Core2

Core4

Core6 Core7

Core1

Core3

Core5

Core0

Core2

Core4

Core6

Core1

Core3

Core5

Core7

Mem

Chan

Mem

Chan

Mem

Chan

Mem

Chan

Mem

Chan

Mem

Chan

Mem

Chan

I/O Controller

Core0 Core1

Core2 Core3

Core4 Core5

Core6 Core7

Core0 Core1

Core2 Core3

Core4 Core5

Core6 Core7

SO, WHAT IS MEMORY CHANNEL STORAGE?

+ An Architecture (not a single product)

+ Enables Flash Storage to Directly Interface on the Memor y Channel

+ Presents as a Block I/O Device

+ Can be Managed just like Existing Storage Devices

+ DDR3 Interface, Standard RDIMM Physical Form Factor

+ Plugs into Standard DIMM Slots

+ Self-contained, No External Connections Required

STORAGE SUBSYSTEM STORAGE SUBSYSTEMCONTROLLER

NAND NAND NAND NAND

CONTROLLER

NAND NAND NAND NAND

SYSTEM REQUIREMENTS

& COMPATIBILITY

+ Hardware and BIOS Requirements

+ Server enabled with MCS UEFI BIOS modif ications

+ DDR3-compatible processor

+ MCS is compatible with standard JEDEC-compliant 240-pin RDIMMs

+ Supports DDR3-800 through DDR3-1600

+ 8GB of standard memory (RDIMM) installed in the system

+ MCS follows standard server DIMM population rules

+ Initial OS Support

+ L inux (RHEL , SLES)

+ Windows Server

+ VMware ESXi

TECHNOLOGY COLLABORATION TO CREATE

THE FIRST MCS-ENABLED PRODUCT

+ Reference architecture design

+ DDR3 to SSD ASIC/firmware

+ Kernel and application level

software development

+ OEM System Integration and

enterprise application domain

knowledge

+ Guardian Technology for

enterprise applications

+ SSD controller & FTL firmware

development and test

+ Supply Chain and Manufacturing

with flash partner

+ System Validation

+

SanDisk™ “ULLtraDIMM™” POWERED BY MCS

GUARDIAN™ TECHNOLOGY

+ 19nm MLC

+ 10 DRIVE WRITES PER DAY

+ 5 YEAR WARRANTY

ENTERPRISE CLASS

RELIABILITY

+ BACKUP POWER CIRCUITRY

+ END-TO-END DATA PROTECTION

+ 2M HOURS MTBF

ADDITIONAL FEATURES

+ S.M.A .R .T. MONITORING

+ SUPPORTS TRIM

+ MAINTENANCE TOOLS

MEMORY CHANNEL INTERFACE

+ DDR3 PROTOCOL

+ CONFIGURABLE AS BLOCK DEVICE

+ STANDARD RDIMM FORM FACTOR

SCALABLE &

COST EFFECTIVE MEDIA

+ 200, 400GB CAPACITIES

+ SCALABLE ARCHITECTURE

+ 19nm MLC NAND

SUPERIOR PERFORMANCE

ACROSS WORKLOADS

+ Enables standardization

on a f lexible, low-latency

platform

0

0.01

0.02

0.03

0.04

0.05

0.06

0.07

0.08

0.09

0.1

0 200000 400000 600000

Late

ncy

(m

s)

IOPS

4K Random 100% Write

8x 200G MCS PCIe Competitor A PCIe Competitor B

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

0 500000 1000000

Late

ncy

(m

s)

IOPS

4K Random 100% Read

8x 200G MCS PCIe Competitor A PCIe Competitor B

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

0 100000 200000 300000 400000 500000

Late

ncy

(m

s)

IOPS

4K Random 70/30 Read/Write

8x 200G MCS PCIe Competitor A PCIe Competitor B

Tested using MCS prototype modules.

IT’S ALL ABOUT THE APPLICATIONS!

LOW LATENCY

APPLICATIONS

+ 3X MESSAGE RATE

+ 40X REDUCTION

OF MAX LATENCY

VIRTUAL

DESKTOP

+ LOW σ =

CONSISTENT

USER EXPERIENCE

+ MEET QoS/SLA

REQUIREMENTS

DATABASE/CLOUD

+ 7X TPSE

IMPROVEMENT

+ 3X REDUCTION OF

AVERAGE LATENCY

BIG DATA

ANALYTICS

+ MINIMIZE

QUERY TIMES

+ EXTEND WORKING

SET BEYOND RAM

ALLOCATION

SERVER

VIRTUALIZATION

+ 2X VMs PER

NODE...

+ ...USING 1/6

THE RAM PER VM

REDUCED LATENCY

ENABLES REAL-TIME ANALYTICS

+ THE APPLICATION HAS BECOME THE BOTTLENECK IN E-TRADING

15% Read Mix 15% Read/Write Ratio Overview

VIRTUAL DESKTOPS

SCAL ABLE ARCHITECTURE

+ 200 SERVERS, 8x SAS SSD

+ 3 GB RAM / VM

+ 100 SERVERS, 4x MCS Modules

+ 1/2 GB RAM / VM

+ IT’S NOW EASY AND COST EFFECTIVE TO

ADD USERS AND SCALE INFRASTRUCTURE

10,000

VIRTUAL DESK TOP

DEPLOYMENTVM VM VM

VM VM VM

VM VM VM

VMware ESXi

VM VM VM

VM VM VM

VM VM VM

VMware ESXi+ 2X VIRTUAL DESKTOPS PER HOST

+ 75% DRAM REDUCTION PER HOST

+ MAINTAIN WORKLOAD PERFORMANCE

MEMORY MAPPED I/O ACCELERATION

10 million records (20GB mmap) using synchronous msync calls

microsecond floor bins

mmap Random Write: Write Latency Histogram

oc

cu

ra

nc

es

(lo

gs

ca

le)

MCS

PCIe Competitor 1

PCIeCompetitor 2

mmap Random Write: Write Latency Percentiles

percentiles

mic

ro

se

co

nd

ce

ilin

g(l

og

sc

ale

)

MCS

PCIe Competitor 1

PCIeCompetitor 2

+ MCS 99th-percentile latency is 2x lower than Competitor 2

and 10x lower than Competitor 1

+ MCS has the tightest latency distribution

TODAY’S

FRAGMENTED STORAGE SOLUTIONS

INDEXING

ANALYTICS

SWAP AREA

CACHING

LOGGING

COLD STORAGE

METADATA

MORE...

MCS-ENABLED

HOMOGENOUS PL ATFORM

INDEXING

ANALYTICS

SWAP AREA

CACHING

LOGGING

COLD STORAGE

METADATA+

MORE...

SUMMARY

Memory Channel Storage

+ Leverages parallelism and scalability of the

memory channel

+ Signif icantly reduces data persistence

latencies and improves single thread throughput

Benefits of MCS

+ 200GB to tens of TB’s of flash in standard

DIMM form factor and DDR3-CPU interface

+ Disruptive performance accelerates existing

applications and enables new flash use cases

+ Scalability facilitates economic, “right-sized”

system solutions

+ Form factor enables high-performance flash

in servers, blades, and storage arrays

+ Future proofed with ability to utilize

NAND-flash and future non-volatile memories

Near-DRAM

Response Time

Deterministic

Reliable

Performance Scalable

Form Factor,

Capacity,

Performance

Thank You!

Jerome McFarland

jmcfarland@diablo-technologies .com

Q&A

1. What is the difference between Memory Channel Storage and NVDIMMs?

2. What is the difference between Memory Channel Storage and SATADIMMs?

3. Though being on the memory channel will reduce latency to the CPU, isn’t I/O performance

still limited by the IOPS & BW of the SSD NAND flash and controller?

4. How do Enterprise customers purchase Memory Channel Storage?

5. Will Memory Channel Storage support DDR4?

6. Will the Linux driver be open sourced?

7. Is there a limit to the number of MCS modules that can be deployed in single system?

8. What is the cost per GB for a Memory Channel Storage device?

Thank You!

Jerome McFarland

jmcfarland@diablo-technologies .com