Enabling NVMe WRR support in Linux Block Layer

USENIX HotStorage’17

Kanchan Joshi, Praval Choudhary, Kaushal YadavMemory solutions, Samsung Semiconductor India R&D

Outline

NVMe I/O queues

Arbitration methods and WRR

What it takes to build differentiated I/O service

Affinity based method and its drawback

Proposed method

Results

Summary

NVMe I/O QueuesHOST IO Queues NVMe SSD

Per-CPU queue pair Parallel I/O distribution Fast core-local path

Arbitration Methods

Arbitrate

Round-Robin (RR)

Controller

Arbitration Methods

Arbitrate

Weight 3

Weight 2

Weight 1

Medium

Arbitrate

Round-Robin (RR)

Weighted Round-Robin with urgent priority (WRR)

Controller

Problem Statement

How to make prioritization capability (WRR) benefits reach to Applications!

WRR Support Requirements

I/O Prioritization

Need to create prioritized I/O queues

Retain NUMA-friendly path

I/O classification

How application can specify I/O service?

Per-application or per I/O?

Non-prioritized queues

Prioritized queues

URGENT HIGH

MEDIUM LOW

I/O Prioritization

I/O classification

Non-prioritized queues

Prioritized queues

URGENT HIGH

MEDIUM LOW

IO classification method

I/O Prioritization

I/O classification

Affinity-based Method

Prioritization method: Each core hosts one type of submission queue (1:1 mapping)

Classification method: Affine applications to particular core(s)

CORE 3

CORE 2

CORE 1

CORE 0

NVMe Controller

Affinity-based Method

Prioritization method: Each core hosts one type of submission queue (1:1 mapping)

Classification method: Affine applications to particular core(s)

CORE 3

CORE 2

CORE 1

CORE 0

NVMe Controller

Affine Affine Affine Affine

URGENT HIGH MEDIUM LOW

Drawbacks

All running applications must be affined (Arbitrary I/O performance otherwise)

Drawbacks

All running applications must be affined (Arbitrary I/O performance otherwise)

HIGH PRIORITY

LOW PRIORITY

MEDIUM PRIORITY

Reduction in compute-ability

Mandatory affinity leading to asymmetric core-utilization

Proposed Method: I/O Priority-based

I/O Prioritization

Create prioritized I/O queues on each core

I/O Classification

Link NVMe priorities to existing I/O priority classes

Per-application

I/O Prioritization

I/O Classification

Per-application

CORE 0

SQ SQ SQ SQ CQ

IO scheduling class

NVMe queue priority

I/O Prioritization

I/O Classification

Per-application

CORE 0

SQ SQ SQ SQ CQ

I/O Priority-based Method

Prioritization Method: Each core hosts four type of submission queues (4:1 mapping) Classification Method: Reuse existing I/O scheduling classes

CORE 0

SQ SQ SQ SQ CQ

CORE 1

SQ SQ SQ SQ CQ

CORE 2

SQ SQ SQ SQ CQ

CORE 3

SQ SQ SQ SQ CQ

NVMe Controller

Compute-ability unaffected Does not require modifying applications

CORE 0

SQ SQ SQ SQ CQ

CORE 1

SQ SQ SQ SQ CQ

CORE 2

SQ SQ SQ SQ CQ

CORE 3

SQ SQ SQ SQ CQ

NVMe Controller

Real-time

CORE 0

SQ SQ SQ SQ CQ

CORE 1

SQ SQ SQ SQ CQ

CORE 2

SQ SQ SQ SQ CQ

CORE 3

SQ SQ SQ SQ CQ

NVMe Controller

Real-time Best-effort None Idle

Modified NVMe Stack (4.10 Kernel)

VFS/Page cache

Single-queue Multi-queue

deadline

NVMe driver(Modified)

SATA driver

Block Layer

Specify io-priority class value while running (ionice)

This is stored in io_contextinside task_struct

Obtain io-class value from io_context (or from request)

Map io-class to queue-priority value and place command in corresponding SQ

Real-time Urgent

Best-effort

Medium

Ionice example on NVMe

Best-effort

Medium

Experimental Setup

Linux 4.10 Kernel (Modified NVMe Driver)

Dell R720 server 32 CPUs (2 NUMA nodes) 32 GB RAM

Samsung PM1725a SSD (With WRR arbitration)

Result #1

IOPS distribution among 3 applications

Application configuration 4 FIO jobs QD 64 4K record

Result #1

Weight-based

distribution

Result #1

Weight-based

distribution

420 420 423 419

Result #2

Bandwidth distribution among 3 applications

Result #2

Bandwidth distribution among 3 applications

Weight-based

distribution

Result #3

Foreground/Background IO control

Result #3

Foreground/Background IO control

Foreground Read IOPS

WRR mode Background process can be throttled 16:1 = Throttle BG process 128:1 = Further throttling. Retains

foreground performance

RR mode Sharp decline in IOPS Background process cannot be throttled

Summary

Differentiated I/O service for applications can be built using WRR arbitration

Scheduler-independent prioritization: Applications get the advantage of the prioritization natively present inside the device

Proposed method does not reduce compute-ability of applications

By not introducing new interface/API, need of rebuilding application is avoided

Future work Kernel patch Sysfs support for run-time WRR configuration

AcknowledgementsRajesh Sahoo, Anshul Sharma, Sungyoung Ahn, Manoj Thapliyal, Vikram Singh, and Seunguk Shin

Enabling NVMe WRR support in Linux Block Layer · Rajesh Sahoo, Anshul Sharma, Sungyoung Ahn, Manoj...

Documents

Mobility Robustness in 5G Networks · 2017. 4. 28. · Mobility Robustness in 5G Networks Ashish Thapliyal School of Electrical Engineering Thesis submitted for examination for the

1 INTEGRATED HEADQUARTERS MINISTRY OF DEFENCE (NAVY) … · 2019. 5. 3. · 2 Vice Admiral Anurag G Thapliyal, AVSM. Chief of Personnel Integrated Headquarters . Tele: 23011407 Ministry

Robert Arthur Kevin Lee Xing Liu Pushkar Pande Gena Tang Racchit Thapliyal Tianjun Ye

Cross-modal Language Generation using Pivot Stabilization ...Web-scale Language Coverage Ashish V. Thapliyal Google Research asht@google.com Radu Soricut Google Research rsoricut@google.com

Change Management in Evolving Web Ontologies · Change Management in Evolving Web Ontologies Asad Masood Khattaka, Khalid Latifb, Sungyoung Leea, aUbiquitous Computing Lab, Department

UX STRAT 2013: Elizabeth Thapliyal, Reena Merchant, START THE INVENTION ENGINE: CREATING AN INNOVATION PROGRAM WITHIN A LARGE COMPANY

Cyber Kill Chain based Threat Taxonomy and its Application ... · Cyber Kill Chain based Threat Taxonomy and its Application on Cyber Common Operational Picture Sungyoung Cho, Insung

REUTERS/Mansi Thapliyal Womb Outsourcing · 286 volume 40 | number 5 September/October 2015 In 1992, Georgia, formerly of the Union of Soviet Social-ist Republics, deemed surrogacy

Going Beyond Tomorrow SPENTEX INDUSTRIES LIMITED Annual Report 2008-09.pdf · Deepak Diwan Prem Malik Ram Kumar Thapliyal Shyamal Ghosh Vivek Chhachhi Dhananjaya Prasad Singh SECRETARY

TAS 2016- Akshay Thapliyal

Gena Tang Pushkar Pande Tianjun Ye Xing Liu Racchit Thapliyal Robert Arthur Kevin Lee

APPNO NAME FatherName Gender Cat. SubCat Q.% W MP S1 … · 195276403 abhay thapliyal rishi prasad thapliyal 75 m obc 82 79 78 86 96 84.20 0 84.20 76 195168403 avantika kaintura man

Priyank tHAPLiYAL - Jupiter Mines · Youfu (Andrew) ZHou (Non-Executive DIrector) eugene Xie (Alternate Non-Executive Director for Mr Zhou) chief executive officer Greg Durack company

3- 6 December 2019 | Pragati Maidan, New Delhi WORLD’S ...india.paperex-expo.com/Paperex/media/ITEGroup/... · Chief Executive Officer, Hyve Group PLC Dr. Bipin Prakash Thapliyal

ETRI기관특별세션 ETRI와 국내 기업의 인공지능 프로세서 ...kcs.cosar.or.kr/2020/download/program_s/KCS2020_Program... · 2020. 2. 6. · Seunguk Song1, Yeoseon Sim1,

Thapliyal 1MAPLD 2005/1011 A High Speed and Efficient Method of Elliptic Curve Encryption Using Ancient Indian Vedic Mathematics Himanshu Thapliyal and

EDGARD MUÑOZ-COREAS AND HIMANSHU THAPLIYAL, … · Author’s address: Edgard Muñoz-Coreas and Himanshu Thapliyal, University of Kentucky, Department of Electrical and Computer

Project progress and outlook Bhupandra Singh Rana (INHERE), Kuldeep Thapliyal (CHIRAG) Thanammal Ravichandran, Nils Teufel (ILRI) MilkIT – Advisory Council

PERFORMANCE EVALUATION OF CORRUGATED PACKAGING - A KEY TO SUCCESS FOR PAPERMAKERS AND CORRUGATORS Sanjay Tyagi, B. P. Thapliyal, R. D. Godiyal, Prachi

ISSIP 2nd Quarter 2017 Board of Directors Meeting July 26, 2017 · 2017-07-26 · GE Saurabh Thapliyal (GE Digital) PK Agrawal (Northeastern University) IBM Hamid Motahari (IBM) Rama