IBA Software Architecture SCSI RDMA Protocol (SRP) Storage Driver High Level Design Draft 2 June, 2002




Revision History and Disclaimers

Rev.      Date            Notes
Draft 1   <May, 2002>     Internal review.
Draft 2   <June, 2002>    Integrated Draft 1 review comments. Open to group-wide review.

THIS SPECIFICATION IS PROVIDED "AS IS" WITH NO WARRANTIES WHATSOEVER, INCLUDING ANY WARRANTY OF MERCHANTABILITY, NONINFRINGEMENT, FITNESS FOR ANY PARTICULAR PURPOSE, OR ANY WARRANTY OTHERWISE ARISING OUT OF ANY PROPOSAL, SPECIFICATION OR SAMPLE. Intel disclaims all liability, including liability for infringement of any proprietary rights, relating to use of information in this specification. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted herein. This Specification as well as the software described in it is furnished under license and may only be used or copied in accordance with the terms of the license. The information in this document is furnished for informational use only, is subject to change without notice, and should not be construed as a commitment by Intel Corporation. Intel Corporation assumes no responsibility or liability for any errors or inaccuracies that may appear in this document or any software that may be provided in association with this document. Except as permitted by such license, no part of this document may be reproduced, stored in a retrieval system, or transmitted in any form or by any means without the express written consent of Intel Corporation. Intel is a trademark or registered trademark of Intel Corporation or its subsidiaries in the United States and other countries. *Other names and brands may be claimed as the property of others.

Copyright © 2002 Intel Corporation.


Approval

Role Signature Date

Responsible Engineer

Engineering Group Leader

Software Engineering Manager


Abstract

The Linux SRP driver, srpl, is a low level Linux SCSI driver. It provides user applications access to storage resources on InfiniBand fabric attached SRP storage I/O Units, either directly through a device file, or through a transparent mount point in the file system. Srpl registers with the Linux SCSI mid layer as does any other low level Linux SCSI driver for a SCSI host bus adapter.

An application's access to Linux SCSI devices is abstracted through the Linux SCSI mid layer. I/O requests from the file system or block driver go to the SCSI mid layer which then delivers SCSI commands to the driver in control of the target SCSI device. The mid layer keeps track of all the SCSI devices and their controllers. As controller drivers start, they register with the mid layer, and the mid layer scans for devices then maps those devices to device file nodes (names) in the file system. For a low level driver to register with the mid layer, it passes a set of driver entry points for the mid layer to use when passing commands. When an application opens a device, the mid layer arranges to pass SCSI commands intended for that device to the appropriate driver.

An SRP storage I/O unit on an InfiniBand fabric is any device on the fabric that provides block storage services using SRP over InfiniBand.

Srpl provides the SCSI mid layer with access to the InfiniBand attached SRP storage resources. As far as the mid layer is concerned, it behaves like any other low level Linux SCSI driver. It differs from most low level drivers in two important ways. First, srpl does not directly manage any specific storage controller hardware. Second, srpl registers with the InfiniBand plug and play manager. This allows the plug and play manager to inform srpl when an SRP storage controller becomes available on the fabric. When this happens, srpl establishes an InfiniBand connection with the storage unit and then registers the newly discovered controller with the mid layer.

After registration with the mid layer, srpl is ready to service I/O requests from the mid layer. This is done in the following steps (see Figure 1–1):

1. Srpl receives an I/O request from the SCSI mid layer.

2. Srpl translates the I/O request from its native Linux SCSI mid layer form to an SRP information unit command request.

3. Srpl sends the command request in an InfiniBand message, to the I/O unit.

4. After the I/O unit has completed the request (successfully or unsuccessfully), srpl will receive the corresponding SRP command response in an InfiniBand message from the I/O unit.

5. Srpl translates the command response message to find the completion status.

6. Srpl completes the I/O by reporting the completion status from the command response back to the SCSI mid layer.
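The six steps above can be sketched as a pair of functions around a table of outstanding requests. This is only a minimal illustration under stated assumptions: the structure layouts, the function names srpl_submit() and srpl_handle_response(), and the use of a table slot as the tag are all hypothetical, and neither the Linux mid layer structures nor the SRP wire formats are reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; NOT the real mid layer or SRP formats. */
enum { QUEUE_DEPTH = 4 };

struct scsi_request { uint8_t opcode; int done; int status; };
struct srp_cmd_req  { uint64_t tag; uint8_t opcode; };   /* command request IU  */
struct srp_cmd_rsp  { uint64_t tag; uint8_t status; };   /* command response IU */

/* Requests that have been sent and are waiting for a response. */
static struct scsi_request *outstanding[QUEUE_DEPTH];

/* Steps 1-3: accept a request, translate it, and hand back the IU to send. */
static int srpl_submit(struct scsi_request *rq, struct srp_cmd_req *iu)
{
    assert(rq != NULL && iu != NULL);
    for (size_t slot = 0; slot < QUEUE_DEPTH; slot++) {
        if (outstanding[slot] == NULL) {
            outstanding[slot] = rq;
            iu->tag = slot;            /* the tag ties the response to the request */
            iu->opcode = rq->opcode;
            return 0;
        }
    }
    return -1;                         /* no free slot: queue is full */
}

/* Steps 4-6: match the response to its request and report the status. */
static int srpl_handle_response(const struct srp_cmd_rsp *rsp)
{
    if (rsp->tag >= QUEUE_DEPTH || outstanding[rsp->tag] == NULL)
        return -1;                     /* unknown or stale tag */
    struct scsi_request *rq = outstanding[rsp->tag];
    outstanding[rsp->tag] = NULL;
    rq->status = rsp->status;
    rq->done = 1;                      /* stands in for notifying the mid layer */
    return 0;
}
```

Using the slot index as the tag keeps the response lookup constant time; a real driver would also have to carry the data buffer description and handle send completions.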


[Figure 1–1: block diagram. Labeled blocks: SCSI Midlayer; srpl InfiniBand Storage Driver, containing Resource Management, SRP protocol engine (Command Translation, Response Translation), Connection Management, Initialization and Shutdown Management, Error Handler, and I/O Request Management; I/O Unit. Arrows numbered 1 through 6 are labeled SCSI Command, SRP Command Request, SRP Command Response, and SCSI Command result.]

Figure 1–1. Srpl provides SCSI mid layer with access to IB storage resources.


Contents

1. Introduction
    1.1 Purpose and Scope
    1.2 Audience
    1.3 Acronyms and Terms
    1.4 References
    1.5 Conventions
    1.6 Stakeholders
    1.7 Before You Begin
2. Features
3. Goals
4. Design Assumptions & Rules
5. Design Overview
    5.1 Major Components
        5.1.1 Interfaces
        5.1.2 Other components
    5.2 Operation
6. Design Details
    6.1 Plug and Play Manager Interface
        6.1.1 Inbound Data Flow
        6.1.2 Outbound Data Flow
    6.2 Fabric Attached SRP Controller Interface
        6.2.1 Outbound Data Flow
        6.2.2 Inbound Data Flow
    6.3 SCSI Mid Layer Interface
        6.3.1 Inbound Data Flow
        6.3.2 Outbound Data Flow
    6.4 I/O Request Management
    6.5 Resource Management
    6.6 Threading Model
    6.7 Locking
    6.8 Buffer Strategy
    6.9 Error Handling
    6.10 Major Data Structures
        6.10.1 srpl globals
        6.10.2 srpl host
        6.10.3 srpl request
7. System Resource Usage
    7.1 Memory
    7.2 Other Resources
8. Internal Compatibility
    8.1 Interaction with Other Components
    8.2 System Requirements
    8.3 Imported Interfaces
    8.4 Exported Interfaces
9. External Compatibility
    9.1 Standards
    9.2 Deviations from Standards
10. Other Dependencies
11. Initialization & Shutdown
    11.1 Initialization
    11.2 Shutdown
12. Installing, Configuring, and Uninstalling
    12.1 Installing
    12.2 Configuring
    12.3 Uninstalling
13. Unresolved Issues
14. Data Structures and APIs

Figures

Figure 1–1. Srpl provides SCSI mid layer with access to IB storage resources.
Figure 5–1. Interfaces of srpl
Figure 6–1. SCSI command execution
Figure 6–2. Srpl's request state machine
Figure 6–3. Controller resources
Figure 6–4. Major data structures
Figure 11–1. Controller discovery

Tables

Table 5-1. Srpl Responses to Events


1. Introduction

1.1 Purpose and Scope

This document is one of a set of High Level Designs (HLDs) that supplement the Software Architecture Specification (SAS) by providing further levels of decomposition and design detail. Please refer to the SAS for the first level design decomposition and architectural description. This HLD defines the implementation of one component in the SAS, including inter-component dependencies.

When completed, this HLD will enable the product development team (PDT) to complete the low-level design, coordinate commitments, make good estimates of the required effort, start test planning, and schedule for the Plan of Record (POR).

1.2 Audience

Anyone interested in understanding this implementation of the SAS should read this document, including:

• Software developers who are integrating the separate modules into their own software projects
• Hardware developers who need an understanding of the software behavior to optimize their designs
• Evaluation engineers who are developing tests for InfiniBand-compliant devices
• Others in similar roles who need more than a basic understanding of the software

1.3 Acronyms and Terms

Information Unit: An SRP formatted request or response. Information Units are exchanged between SCSI initiators and targets across an RDMA channel (such as InfiniBand). The most common of these are the command requests (sent by the initiator to the target) and command responses (sent by the target to the initiator).

IU: Information Unit

SCSI RDMA Protocol: A protocol defined by the ANSI T10 Committee that describes an encapsulation scheme by which the SCSI I/O protocol is mapped to an RDMA capable transport.

SRP: SCSI RDMA Protocol

Srpl: The name of the Intel Linux SRP InfiniBand storage driver.


1.4 References

SRP Specification: American National Standard for Information Systems – Information Technology – Working draft SCSI RDMA Protocol (SRP). (Current draft is revision 15.) http://www.t10.org/pubs.htm

1.5 Conventions

This document uses the following typographical conventions and icons:

Italic is used for book titles, manual titles, URLs, and new terms.

Bold is used for user input (in the Installation section).

Fixed width is used for code definitions, data structures, function definitions, and system console output. Fixed width text is always in Courier font.

NOTE Is used to alert you to an item of special interest.

DESIGN ISSUE Is used to alert you to unresolved design issues that may impact the module’s design, function, or usage.

1.6 Stakeholders

The stakeholders in this design are:

• Manager of Software Development
• Program Management
• Evaluation
• Inspection
• Technical Publications
• Technical Marketing
• Software Quality
• InfiniBand Linux System SW Manager

1.7 Before You Begin

Please note the following:

This document assumes that you are familiar with the InfiniBand Architecture Specification, which is available from the InfiniBand Trade Association at http://www.infinibandta.org.


For a complete list of acronyms, terms, and references for all the HLDs, see the InfiniBand* Architecture Glossary and References.


2. Features

Srpl will be loadable by the InfiniBand Access Layer's plug and play manager. Srpl will register with the plug and play manager as it initializes, allowing the plug and play manager to notify srpl of new SRP resources as they become available on the fabric. In addition, srpl will have an entry in the device configuration file, which maps I/O controllers to driver modules. Thus, if a resource becomes available and the driver is not loaded, the plug and play manager will be able to select the srpl module file and load it on demand.

Srpl will support proc file system controls to modify behavior and extract counter values.


3. Goals

The performance and other goals have not been defined for the first draft of this document.


4. Design Assumptions & Rules

The following are assumed in order to support the design of srpl:

- The plug and play manager will be present and capable of loading the srpl driver and notifying srpl of changes in the status of relevant resources on the fabric.

- The Access Layer and Connection Manager will be present and capable of supporting InfiniBand connection establishment, message passing, and RDMA services.

- The Linux SCSI mid layer will serve as request broker between the application (or driver) using the disk services and srpl. This mid layer will manage flow control, I/O request backlog, and error handling above the srpl driver.

- The version of Linux is Red Hat 7.2.

- The physical machine is capable of supporting an Intel HCA and the software stack to run it. Srpl requires about 40 to 60 Kbytes per connected I/O controller, depending on the request queue depth and the number of disks on that controller.

Srpl is designed as a Linux low level SCSI driver. It conforms to the interface requirements of the Linux SCSI mid layer whose purpose is to abstract a common usage model of all low level SCSI drivers for the benefit of those programs higher on the storage stack (e.g. file system drivers, user applications).


5. Design Overview

5.1 Major Components

5.1.1 Interfaces

Srpl has three interfaces: one to interact with the Linux SCSI mid layer, another for the fabric attached SRP storage controller, and a third to communicate with the plug and play manager. (The path to the storage controller goes through the InfiniBand access layer, which delivers messages to the fabric and receives messages from it.) To the Linux SCSI mid layer, srpl behaves like any other low level Linux SCSI driver. It registers with the mid layer to advertise its entry points in the standard way for a low level driver. The mid layer uses these entry points to deliver SCSI commands and error recovery notifications to srpl. Figure 5–1 shows the relationship between srpl and its neighboring components. The heavy connectors in the figure represent the interfaces discussed here.
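Registration by advertising entry points can be pictured as handing the mid layer a table of function pointers. The sketch below is a user-space illustration only: struct lld_entry_points, midlayer_register(), midlayer_issue(), and the srpl_* handlers are hypothetical names invented for this example and are not the Linux kernel interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified command; not the kernel's SCSI command structure. */
struct scsi_cmd { int target; int lun; int status; };

/* Entry points a low level driver advertises when it registers. */
struct lld_entry_points {
    const char *name;
    int (*queue_command)(struct scsi_cmd *cmd);   /* deliver a SCSI command  */
    int (*abort_command)(struct scsi_cmd *cmd);   /* error recovery callback */
};

enum { MAX_DRIVERS = 4 };
static const struct lld_entry_points *registered[MAX_DRIVERS];
static size_t n_registered;

/* Toy mid layer: record the entry points, return a host number or -1. */
static int midlayer_register(const struct lld_entry_points *ep)
{
    if (ep == NULL || ep->queue_command == NULL || n_registered == MAX_DRIVERS)
        return -1;
    registered[n_registered] = ep;
    return (int)n_registered++;
}

/* Toy mid layer: route a command to the driver that owns the host. */
static int midlayer_issue(int host, struct scsi_cmd *cmd)
{
    assert(host >= 0 && (size_t)host < n_registered);
    return registered[host]->queue_command(cmd);
}

/* Placeholder srpl handlers; a real driver would translate and send here. */
static int srpl_queue_command(struct scsi_cmd *cmd) { cmd->status = 0;  return 0; }
static int srpl_abort_command(struct scsi_cmd *cmd) { cmd->status = -1; return 0; }

static const struct lld_entry_points srpl_entry_points = {
    .name          = "srpl",
    .queue_command = srpl_queue_command,
    .abort_command = srpl_abort_command,
};
```

Because the mid layer only sees the table, it needs no knowledge that srpl forwards commands over a fabric rather than to local hardware.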

Srpl relies on the InfiniBand Access Layer for establishment of InfiniBand connections with the fabric attached storage controller service on an I/O unit. The interface with the I/O unit is defined by the SRP specification, which determines formats of messages, and expected behavior. Srpl opens a connection in an SRP specific way then sends messages to the I/O unit containing SRP information units (IU) that represent the SCSI command received from the mid layer. Srpl interprets responses from the I/O unit and notifies the mid layer of I/O completion.

Srpl logically sits between the SCSI mid layer and the fabric attached SRP controller. Its primary job is to translate SCSI requests (from the mid layer) into SRP information units, and information units received from the controller into responses that are delivered to the mid layer. Srpl can do this for several fabric attached controllers simultaneously (limited only by system memory resources). It can also manage many I/O transactions for each of these controllers.

The third interface is used to communicate with the plug and play manager. When srpl starts up, it immediately registers with the plug and play manager, advertising entry points for notification, so that srpl can be informed of newly available fabric attached resources (or that some resource has become unavailable).


[Figure 5–1: block diagram. Labeled blocks: User Applications, File System Drivers, Linux Block Driver, Linux SCSI Mid Layer, Linux SRP Storage Driver, Other Low Level SCSI drivers, IB Access Layer, Connection Services, PnP Services, Verbs Provider, Linux Kernel, InfiniBand Fabric, Storage I/O Unit.]

Figure 5–1. Interfaces of srpl


5.1.2 Other components

In addition to these interfaces, other major elements of srpl are the SRP protocol engine, the I/O request manager, the error handler, and the resource manager. The SRP protocol engine translates SCSI commands into SRP requests and SRP responses from the target into command results and completions for the SCSI mid layer. Error handling in the srpl driver is mostly concerned with InfiniBand port fail over, but also includes special interfaces to facilitate target error recovery by the mid layer. The resource manager sets up pools of resources (messages and I/O request structures) for newly discovered controllers, and manages those pools during execution.

5.2 Operation

Srpl is event driven: all of its behavior is in response to events. Table 5-1 describes srpl's action for each type of event it might receive. Once the driver is running, any of these events could happen at any time. The I/O request manager and resource manager elements of the driver keep track of transactions in progress, so that the state of the driver remains coherent.

Event: Srpl Action

Driver load: Driver initialization; registration with plug and play manager.

Controller assigned: Connect through InfiniBand; allocate message and I/O request structure resources; register controller with SCSI mid layer.

I/O request from mid layer: Translate SCSI command into SRP message to I/O unit; send message over fabric to controller service on I/O unit.

Send message completion: Advance state of I/O request; if complete, notify mid layer and recycle I/O request and messages.

Recv message completion: Advance state of I/O request; if complete, notify mid layer and recycle I/O request and messages.

Controller revoked: [Currently the design doesn't support this event. Srpl will generate an error and continue. More work needs to be done to understand how to handle this in Linux.]

Driver unload: Return all resources; close all InfiniBand connections; driver exit.

IB connection failure: Fail over. Srpl suspends action on all unfinished I/O requests and attempts to re-connect, possibly using a different path. If this fails, each outstanding I/O is closed in error and the mid layer is notified. If it succeeds, srpl re-issues the outstanding I/Os in the order they were initially issued. It is possible that an HCA will support automatic path migration. In a future release, srpl can use this feature to improve fail over performance.

SCSI error: Report error to mid layer; issue command abort or reset to controller, under mid layer direction.

Table 5-1. Srpl Responses to Events
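The event-driven structure in Table 5-1 amounts to a dispatch from event type to action. The following is a rough sketch under that reading; the enumerator names and the function srpl_action_for() are hypothetical labels for the table rows, and a real driver would perform the work rather than return an action code.

```c
#include <assert.h>

/* One enumerator per row of Table 5-1 (names are hypothetical). */
enum srpl_event {
    EV_DRIVER_LOAD,
    EV_CONTROLLER_ASSIGNED,
    EV_IO_REQUEST,
    EV_SEND_COMPLETION,
    EV_RECV_COMPLETION,
    EV_CONTROLLER_REVOKED,
    EV_DRIVER_UNLOAD,
    EV_CONNECTION_FAILURE,
    EV_SCSI_ERROR,
    EV_COUNT
};

enum srpl_action {
    ACT_INIT_AND_REGISTER,      /* initialize, register with plug and play  */
    ACT_CONNECT_AND_ALLOCATE,   /* connect, allocate, register controller   */
    ACT_TRANSLATE_AND_SEND,     /* SCSI command -> SRP message -> fabric    */
    ACT_ADVANCE_REQUEST,        /* advance I/O state, complete if finished  */
    ACT_REPORT_UNSUPPORTED,     /* generate an error and continue           */
    ACT_RELEASE_AND_EXIT,       /* return resources, close connections      */
    ACT_FAIL_OVER,              /* suspend, re-connect, re-issue in order   */
    ACT_REPORT_ERROR            /* report to mid layer, abort or reset      */
};

/* Map an event to the action the table assigns to it. */
static enum srpl_action srpl_action_for(enum srpl_event ev)
{
    assert((int)ev >= 0 && (int)ev < EV_COUNT);
    switch (ev) {
    case EV_DRIVER_LOAD:         return ACT_INIT_AND_REGISTER;
    case EV_CONTROLLER_ASSIGNED: return ACT_CONNECT_AND_ALLOCATE;
    case EV_IO_REQUEST:          return ACT_TRANSLATE_AND_SEND;
    case EV_SEND_COMPLETION:     /* both completions advance the request */
    case EV_RECV_COMPLETION:     return ACT_ADVANCE_REQUEST;
    case EV_CONTROLLER_REVOKED:  return ACT_REPORT_UNSUPPORTED;
    case EV_DRIVER_UNLOAD:       return ACT_RELEASE_AND_EXIT;
    case EV_CONNECTION_FAILURE:  return ACT_FAIL_OVER;
    case EV_SCSI_ERROR:          return ACT_REPORT_ERROR;
    default:                     return ACT_REPORT_UNSUPPORTED;
    }
}
```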


6. Design Details

This section gives details on the major components of the driver.

6.1 Plug and Play Manager Interface

Srpl registers with the plug and play manager, advertising a set of entry points that are used to notify srpl of events related to the arrival or departure of SRP resources on the fabric.

The relationship between srpl (or any other channel driver) and the plug and play manager can be established even before the driver loads. The plug and play manager uses a configuration file in the file system to associate InfiniBand I/O controllers with specific channel drivers. Thus when the plug and play manager learns about a new SRP resource on the fabric, and there is no driver module loaded which has registered and associated itself with that resource, the plug and play manager can choose the srpl driver and load it. When this happens, the driver gets a chance to initialize and register with the plug and play manager, so it can be notified of the new SRP resource on the fabric.
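The on-demand load decision can be illustrated as a lookup from an I/O controller's identity to a driver module name. This is a sketch under assumptions: the document does not give the configuration file format, so the in-memory table, the use of vendor and device IDs as the key, the ID values, and the name pnp_find_module() are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One hypothetical configuration entry: which module serves which IOC. */
struct ioc_driver_map {
    uint32_t    vendor_id;
    uint32_t    device_id;
    const char *module;     /* name of the channel driver module to load */
};

/* Return the module configured for this IOC, or NULL if none matches. */
static const char *pnp_find_module(const struct ioc_driver_map *map, size_t n,
                                   uint32_t vendor_id, uint32_t device_id)
{
    assert(map != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (map[i].vendor_id == vendor_id && map[i].device_id == device_id)
            return map[i].module;
    }
    return NULL;            /* no driver configured for this controller */
}
```

If the lookup succeeds and no loaded driver has claimed the resource, the plug and play manager would load the named module, which then registers and receives the notification.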

6.1.1 Inbound Data Flow

Data inbound from the plug and play manager is an event notification and takes the form of a call to one of srpl's two plug and play entry points (advertised at plug and play registration). The first of these is the add unit entry point, srpl_add_unit(). The plug and play manager passes an IOC profile structure that describes the characteristics of the service. The access layer provides facilities for finding paths to the IOC. When srpl is notified of a new SRP resource on the fabric, it establishes a connection with that service and makes it available to the Linux SCSI mid layer by calling the mid layer function to register a new SCSI controller.

The other entry point the plug and play manager may use is the remove unit entry point, srpl_remove_unit(). This notification informs srpl that a previously available SRP resource has become unavailable. The argument to this call is an IOC profile structure pointer. This is enough information for srpl to identify which resource has been removed.

DESIGN ISSUE

Linux doesn’t handle device removal very well. This is an unresolved issue at this point. For the moment, srpl will ignore these notifications. See chapter 13.

6.1.2 Outbound Data Flow

Srpl's outbound interaction with the plug and play manager consists of two events. The first is registration: srpl calls RegisterChannelDriver() as it is initializing. This registration informs the plug and play manager of the entry points to use to notify srpl. It also lists the vendor and device IDs of the fabric-attached services for which this driver should be used. The second is de-registration: when srpl is shutting down, it calls UnregisterChannelDriver() to inform the plug and play manager that the driver is being unloaded and the entry points are no longer valid.


Review comment (Jerrie Coffman): This has changed from Gen1. There are no separate add_unit and remove_unit entry points. The plug and play manager allows a client to register for notification of events on a specified class of IOC. The client provides a callback function and context for IOC notifications. There is only one callback per registration. When an event occurs, the callback is invoked with an event type, such as IOC_ADD or IOC_REMOVE. This change will affect several areas of your HLD.
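The single-callback model described in the review comment might look roughly like the sketch below. Only the event names IOC_ADD and IOC_REMOVE come from the comment; the callback signature, the fields of struct ioc_profile, and struct srpl_ctx are assumptions made for illustration, not the plug and play manager's actual API.

```c
#include <assert.h>
#include <stddef.h>

enum ioc_event { IOC_ADD, IOC_REMOVE };   /* event types named in the comment */

/* Hypothetical fields; the real IOC profile structure is not shown here. */
struct ioc_profile { unsigned vendor_id; unsigned device_id; };

/* Hypothetical per-registration context handed back on every callback. */
struct srpl_ctx { int controllers_added; int removals_ignored; };

/* One callback per registration; the event type selects the behavior. */
static void srpl_ioc_callback(enum ioc_event ev,
                              const struct ioc_profile *ioc, void *context)
{
    struct srpl_ctx *ctx = context;
    assert(ctx != NULL && ioc != NULL);

    switch (ev) {
    case IOC_ADD:
        /* Would connect to the IOC and register it with the mid layer. */
        ctx->controllers_added++;
        break;
    case IOC_REMOVE:
        /* The design currently ignores removal notifications. */
        ctx->removals_ignored++;
        break;
    }
}
```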

6.2 Fabric Attached SRP Controller Interface

This section describes how srpl communicates with the SRP storage service on the fabric. Connection to the target is initiated when the plug and play manager notifies srpl that a fabric attached controller has become available for use.

Srpl issues a connect request containing a structure of connection parameters specified by the SRP protocol (e.g., an SRP login request that contains the message size and queue depth). If successful, the target accepts the connection (possibly with modified connection parameters), and finally srpl accepts the target’s connection response. In this step srpl posts receive messages on the receive queue, one for each slot of the queue depth. These will be used to receive messages (command responses) from the I/O unit on this connection. Once this has happened, the connection is ready to support message traffic in both directions.
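
As a rough illustration of the receive posting described above, the following user-space model keeps one posted receive buffer per queue slot and replaces each buffer as it is consumed (the replacement step is described in section 6.3.2). The class name, method names and the 64-byte message size are assumptions made for the sketch; they are not Access Layer interfaces.

```python
from collections import deque

MSG_SIZE = 64  # hypothetical message size; really negotiated at connection time


class ConnectionModel:
    """Toy model of a connection's receive queue."""

    def __init__(self, queue_depth):
        self.queue_depth = queue_depth
        self.posted_receives = deque()

    def accept_response(self):
        """Post one receive buffer per slot of the negotiated queue depth."""
        for _ in range(self.queue_depth):
            self.posted_receives.append(bytearray(MSG_SIZE))

    def receive(self, payload):
        """Consume a posted buffer for an inbound message, then repost one."""
        if not self.posted_receives:
            raise RuntimeError("no receive buffer posted")
        if len(payload) > MSG_SIZE:
            raise ValueError("payload larger than posted buffer")
        buf = self.posted_receives.popleft()
        buf[:len(payload)] = payload
        # Keep the number of posted receives equal to the queue depth.
        self.posted_receives.append(bytearray(MSG_SIZE))
        return bytes(buf[:len(payload)])
```

Keeping the posted count equal to the queue depth means a response can always land, provided the number of outstanding commands never exceeds that depth.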

6.2.1 Outbound Data Flow

Outbound data from srpl to the target SRP service is contained in InfiniBand messages. The message payload contains an SRP information unit of type command request, translated from the Linux SCSI command data structure. The message specifies the location in memory, the location on the disk (which are the endpoints of the user buffer movement), size of the user buffer, and other information such as the SCSI target id and LUN, and command tag. The message is interpreted by the SRP service on the other end of the connection. The SRP target will then execute the command, using RDMA to move the user data (without srpl’s direct involvement). When the I/O unit has completed the SRP request, it replies to srpl with a message containing an SRP command response. See section 6.2.2.

6.2.2 Inbound Data Flow

Inbound data through the target connection comes in the form of InfiniBand messages containing SRP information units (command responses). They report the status of the command request with the same tag. Any error information and auto sense data are also contained within the information unit. The srpl driver uses the information in command responses to complete the requested I/O for the Linux mid layer.
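
The tag relationship between command requests and command responses can be modeled as a lookup table. This is a simplified sketch under stated assumptions: tags are plain integers, a SCSI command is a dictionary, and the class and method names are hypothetical. Section 6.4 describes the driver's own mechanism, which passes the request structure's address as the completion context.

```python
class OutstandingRequests:
    """Toy table of commands awaiting a response, keyed by tag."""

    def __init__(self):
        self._by_tag = {}
        self._next_tag = 1

    def send(self, scsi_cmd):
        """Assign a tag to an outbound command request and remember it."""
        tag = self._next_tag
        self._next_tag += 1
        self._by_tag[tag] = scsi_cmd
        return tag

    def on_response(self, tag, status, sense=b""):
        """Match a command response to its request by tag and record results."""
        scsi_cmd = self._by_tag.pop(tag)
        scsi_cmd["status"] = status
        scsi_cmd["sense"] = sense  # auto sense data, if any
        return scsi_cmd

    def pending(self):
        return len(self._by_tag)
```

Because matching is by tag, responses may arrive in any order relative to the order in which the requests were sent.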

SRP supports a small number of special commands from the target. These messages are distinguished by srpl as they arrive. See the SRP specification for the use of these and more details on the use of commands and command responses in the protocol.

Srpl’s I/O request manager keeps track of all (perhaps many) outstanding I/O requests. See section 6.4 for details.

6.3 SCSI Mid Layer Interface

This section describes the interface used to exchange I/O requests and results between the Linux SCSI mid layer and srpl. This interface is the standard one any low level SCSI host bus adapter uses with the Linux operating system.

For each connection to a fabric attached resource, srpl registers with the Linux SCSI mid layer using scsi_register_module() to pass a SCSI host template data structure pointer to the mid layer. This template describes the attributes of the low level driver, including various parameters such as the maximum number of SCSI target ids and LUNs. The structure also contains a set of pointers to the driver’s entry points, which constitute the mid layer’s interface to srpl. There are pointers to routines for delivering SCSI commands to the driver, aborting commands, resetting, and handling errors. An exhaustive list and description of these entry points is beyond the scope of this document. Those details can be found in the Linux kernel source, specifically the file /usr/src/linux/drivers/scsi/hosts.h, which contains the definition of the SHT (SCSI host template), many of whose fields are function pointers into the low level driver. The structure fields are commented with information about how the functions pointed to are to be used. This section deals with the normal flow of SCSI commands and results across this interface.

6.3.1 Inbound Data Flow

Commands flow into srpl from the SCSI mid layer via calls made to srpl_queuecommand(), the main entry point advertised upon driver registration with the mid layer. There are two arguments to srpl_queuecommand(): a pointer to a Linux SCSI command data structure, and a pointer to a mid layer callback function. The SCSI command structure contains the SCSI command block which identifies the request the mid layer is making, (usually) a pointer to a user buffer which holds the data to be moved to or from the target device, the buffer’s size, and space to be filled with result status and (possibly) auto-sense data. It should be noted that srpl doesn’t explicitly move data in or out of the user buffer. The I/O unit manages that data movement. Srpl facilitates this by getting a memory handle for the user buffer and making that available to the I/O unit. The I/O unit, after it interprets the command request information unit in the message, will have enough information to initiate an InfiniBand RDMA request to move the user data.

When the mid layer calls srpl_queuecommand(), it expects srpl to queue the request and return from the call before the request has completed. The mid layer is then free to call srpl_queuecommand() again (and again) until the maximum queue depth of outstanding requests is reached.

For each command dispatched to a low level driver, the mid layer sets a timer so that it can detect missing request completions. Normally this timer doesn’t expire; it is destroyed when the command is completed. If a command timer should expire, however, the mid layer will interpret this to mean that the SCSI controller is unable to complete the command. The mid layer will take steps to recover by first issuing a command abort to the low level driver and, if successful, retrying the command. If this doesn’t result in a completion of the command, the mid layer will try resetting first the device, then the SCSI bus, and finally the adapter. To do this, the SCSI mid layer may call srpl’s command abort handler, or any of three reset routines (for device reset, SCSI bus reset, or adapter reset) in attempts to recover from detected errors or command time out events. When calling any of these srpl entry points, the mid layer expects the request (abort or reset) to be completed by the time srpl returns control back to the mid layer. Each of these entry points has only one argument: a pointer to a SCSI command structure. Srpl will abort the command identified by this SCSI command or reset the device, bus or controller associated with the SCSI command.

To abort a previously received SCSI command, srpl will create a message containing an SRP task management request information unit with the code abort, and send that to the I/O unit. Upon receiving the reply to the abort, srpl cleans up the data structures associated with the I/O request and returns from the abort handler to the mid layer. The mid layer will not expect to see a completion for the aborted command. The mid layer can now re-issue the aborted command.

Resets are handled in a similar way. Reset notification is forwarded to the appropriate controller on the fabric. After the reset has completed, srpl cleans up all requests associated with the object that was reset, and returns from the reset handler to the mid layer. The mid layer will not expect completions for commands issued to the unit (disk, bus, or controller) being reset after the reset success is reported.
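
The recovery sequence described in this section (abort first, then progressively wider resets) can be pictured as a simple escalation loop. The sketch below models the mid layer's behavior only loosely; the step names and the try_step callable are hypothetical stand-ins for srpl's abort and reset entry points.

```python
# Escalation order taken from the text: abort, then device, bus and adapter reset.
RECOVERY_STEPS = ("abort", "device_reset", "bus_reset", "adapter_reset")


def recover(try_step):
    """Try each recovery step in order until one reports success.

    try_step is a hypothetical callable taking a step name and returning
    True when that step recovered the command. Returns the step that
    worked (or None) and the list of steps attempted.
    """
    attempted = []
    for step in RECOVERY_STEPS:
        attempted.append(step)
        if try_step(step):
            return step, attempted
    return None, attempted
```

For example, recover(lambda step: step == "bus_reset") stops at the bus reset after first trying the abort and the device reset.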

6.3.2 Outbound Data Flow

After the I/O unit has completed a request (that is, the reply message containing the command response has been received), srpl fills in the appropriate fields of the associated SCSI command, including error status and perhaps SCSI sense data. Srpl then calls the mid layer completion function associated with the command, passing the SCSI command structure pointer as its argument. This completes the I/O and reduces the outstanding queue depth by one. Srpl also posts a new InfiniBand message receive request to replace the one consumed by receiving the I/O unit’s message.

For the life of the I/O request, srpl uses its own request data structure to contain all the associated resources, information and the state of the request. This structure holds a pointer to the SCSI command structure, the mid layer’s completion handler pointer, and a pointer to the message containing the SRP information unit that was sent to the I/O unit. See section 6.4 for details on how srpl manages the I/O requests.

6.4 I/O Request Management

Srpl receives SCSI commands from the mid layer, translates them to SRP information units, and puts them into messages bound for the target I/O unit that is providing the service. As responses come in from the I/O unit, srpl coordinates message reception, completions of I/O requests, and other updates of I/O requests. Figure 6–1 shows the steps that occur during the life of a SCSI command from the mid layer.

Here is a description of the steps taken during the life of an I/O request.

1. SCSI mid layer delivers SCSI command structure (pointer) through the interface srpl registered with the mid layer (srpl_queuecommand()).

2. Srpl formats an SRP command request information unit and puts it in a message and calls Connection Service to send it to the I/O unit.

3. Connection Service sends the message.

4. Connection Service notifies srpl of the message send completion by calling srpl’s send completion handler.

5. I/O unit executes the I/O request, managing an RDMA transaction to move the data from (or to) the user buffer.

6. I/O unit sends message containing an SRP command response information unit to srpl.

7. Connection Service notifies srpl of a message receive completion by calling srpl’s receive completion handler.

8. Srpl interprets the I/O completion message, reports status in the SCSI command structure associated with the I/O request, then calls the mid layer’s completion routine.

For each command delivered to srpl, an I/O request structure is allocated from this controller’s free list and is put on the work-in-progress list until the command has completed. At that time the I/O request structure is returned to the free list.
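
The free list and work-in-progress list handling described above can be sketched as follows. This is a user-space model with hypothetical names; in particular, returning None when the free list is empty is an assumption, since the text expects the mid layer never to exceed the queue depth.

```python
class HostModel:
    """Toy model of a controller's request free list and wip list."""

    def __init__(self, queue_depth):
        # One request structure per slot of the queue depth.
        self.free = [{"id": i, "cmd": None} for i in range(queue_depth)]
        self.wip = []

    def queuecommand(self, cmd):
        """Take a request structure from the free list and put it on the wip list."""
        if not self.free:
            return None  # hypothetical: queue depth exhausted
        req = self.free.pop()
        req["cmd"] = cmd
        self.wip.append(req)
        return req

    def complete(self, req):
        """Return a finished request structure to the free list."""
        self.wip.remove(req)
        req["cmd"] = None
        self.free.append(req)
```

Since every structure is allocated up front, queuing a command only moves a structure between lists and never allocates memory.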

Srpl uses a state machine to track the progress of each I/O request. As steps in the process happen, the state of the request is updated until it reaches the final state, at which point the I/O request is completed and the resources (including the I/O request structure and the InfiniBand message) associated with it are freed for future use.

The I/O request structure contains the state of the I/O transaction (which changes through its life), and pointers to the resources in use by the transaction, such as InfiniBand messages, as well as pointers to the SCSI command structure and the mid layer’s completion routine. Figure 6–2 is a diagram showing the I/O request state machine.

Each I/O request traverses through the machine from its initialization (which happens as srpl is given a new SCSI command) to the time the command is completed and the I/O request structure freed. This machine has the following states: free, init, send_pend_recv_pend, send_comp_recv_pend, send_pend_recv_comp and complete.

The state of the I/O request structure starts as free. At this time the structure is on the free list belonging to the controller (srpl host) structure.

When a SCSI command is received from the mid layer, an I/O request structure is removed from the free list and its state is updated to init. Srpl translates the SCSI command into an information unit (which is in the payload of an InfiniBand Message). The resources associated with the request (the SCSI command structure pointer, the outbound request message, etc.) are stored in the I/O request structure. This request structure is then added to the work-in-progress queue of the controller (srpl host) structure to join any other requests that have not yet completed.

[Diagram: SCSI Mid Layer, srpl, Connection Services, Fabric, I/O Unit and User Data Buffer, with arrows numbered 1 through 8.]

Figure 6–1. SCSI command execution

[Diagram: states free, init, send_pend_recv_pend, send_comp_recv_pend, send_pend_recv_comp and complete, connected by transitions labeled Allocate Request Structure, Message Send Requested, Message Send Completion, Message Receive Completion and Free Request Structure.]

Figure 6–2. Srpl’s request state machine

Srpl calls the message send routine of the Access Layer’s Connection Service, and updates the state of the I/O request to send_pend_recv_pend. The message has been sent, but neither the completion for the send nor the reply from the I/O unit has been detected.

From the send_pend_recv_pend state, the request will transition to one of two states, send_comp_recv_pend or send_pend_recv_comp, depending on which completion event handler is called first. (It is possible that the reply receive completion handler is called before the send completion handler for the outbound request.) Both completion handlers get a context argument, whose value is the address of the I/O request structure to be updated.

From the send_comp_recv_pend state, the request will transition to complete when its command response is received. The receive completion handler will complete the request for the Linux SCSI mid layer.

From the send_pend_recv_comp state, the request will transition to complete when the send completion arrives (the response has already been received). In this path the send completion handler will complete the request for the Linux SCSI mid layer.

When the request is in the complete state, then either the send handler or the receive handler has initiated completion with the mid layer. The mid layer’s completion routine is called, and after its return, the request structure is returned to the free list, and its state is returned to free.
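
The request state machine described in this section can be written down as a transition table. The state names are those listed in the text; the event names are shortened, hypothetical labels for the transitions, and the shelved state used during fail over is left out of this sketch.

```python
from enum import Enum


class State(Enum):
    FREE = "free"
    INIT = "init"
    SEND_PEND_RECV_PEND = "send_pend_recv_pend"
    SEND_COMP_RECV_PEND = "send_comp_recv_pend"
    SEND_PEND_RECV_COMP = "send_pend_recv_comp"
    COMPLETE = "complete"


# (current state, event) -> next state
TRANSITIONS = {
    (State.FREE, "allocate"): State.INIT,
    (State.INIT, "send_requested"): State.SEND_PEND_RECV_PEND,
    (State.SEND_PEND_RECV_PEND, "send_completion"): State.SEND_COMP_RECV_PEND,
    (State.SEND_PEND_RECV_PEND, "recv_completion"): State.SEND_PEND_RECV_COMP,
    (State.SEND_COMP_RECV_PEND, "recv_completion"): State.COMPLETE,
    (State.SEND_PEND_RECV_COMP, "send_completion"): State.COMPLETE,
    (State.COMPLETE, "free"): State.FREE,
}


def step(state, event):
    """Return the next state, rejecting events not valid in the current state."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not valid in state {state.value}") from None


def run(events, state=State.FREE):
    """Apply a sequence of events starting from the given state."""
    for event in events:
        state = step(state, event)
    return state
```

Both completion orders reach complete, which reflects the point that the receive completion may be reported before the send completion.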

At any time the connection may be interrupted. When that happens, srpl attempts to fail over to a new connection. During this recovery, normal operation of srpl is suspended until the connection is re-established and the outstanding command requests are re-sent to the I/O unit. As part of the process, all outstanding requests representing work in progress (those on the wip list) transition to a special state, shelved (not shown in Figure 6–2), and are queued on this host’s shelf. If any new requests arrive from the mid layer while srpl is failing over, the new requests go immediately to the shelf, with the state of shelved.

If srpl is successful at creating a new connection to the I/O unit, then all requests on the shelf are re-transmitted to the I/O unit over the new connection, and their state is changed to send_pend_recv_pend. At this time normal operation of the driver resumes, and requests traverse the state machine as described above.
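
The shelving behavior during fail over can be sketched with three small functions. This is a model under stated assumptions: requests are dictionaries, the wip list and shelf are plain Python lists, and send is a hypothetical callable that retransmits a request on the new connection.

```python
def shelve_all(wip, shelf):
    """On connection loss, move every work-in-progress request to the shelf."""
    while wip:
        req = wip.pop(0)
        req["state"] = "shelved"
        shelf.append(req)


def shelve_new(req, shelf):
    """A request arriving during fail over goes straight to the shelf."""
    req["state"] = "shelved"
    shelf.append(req)


def resume(wip, shelf, send):
    """After reconnecting, retransmit shelved requests and return them to wip."""
    while shelf:
        req = shelf.pop(0)
        send(req)  # hypothetical retransmission over the new connection
        req["state"] = "send_pend_recv_pend"
        wip.append(req)
```

Draining the shelf from the front preserves the order in which the requests were shelved; the text does not say whether the driver guarantees this, so it is a choice made for the sketch.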

6.5 Resource Management

When an I/O controller on the fabric is assigned to a Linux host, srpl creates a new data structure, called an srpl host structure, to represent that controller. Associated with the srpl host structure are a set of srpl request structures and a pool of messages to be used to exchange SRP requests and responses with the I/O controller service on the I/O unit. The number of srpl request structures is the same as the queue depth for that controller (established at channel connection), and the number of messages is twice that (one each for sending the command requests and receiving the command responses). Figure 6–3 illustrates the relationship between the major resources associated with each controller, managed by the srpl host structure.

Creation of these resources happens only when the controller is assigned and is managed by the plug and play interface for adding new units. Dynamic management of these resources is handled by a set of routines not exposed outside of the driver. These routines ensure that the state of the free list and of the data structures remains coherent.

[Diagram: an srpl_host structure labeled with req freelist, req wiplist and free message pool, together with four srpl_request structures.]

Figure 6–3. Controller resources

6.6 Threading Model

This driver does not explicitly manage any threads. At controller initialization time, srpl creates event managers for handling notification of message send and receive completions on its InfiniBand connections. Each of these managers has a thread that runs when signaled by the Access Layer. This ensures that when servicing message completions (that is, updating request state and possibly calling the mid layer’s completion handler), the thread has a context (i.e. the context is not within an interrupt service routine). This allows the completion handlers to be free of restrictions that exist for code that might run in the context of an interrupt.

The driver advertises entry points to two other local services, the SCSI mid layer and the plug and play manager. When any of these routines is called, it runs in the context of the calling thread. Any threading model used by those services is implicitly extended to include execution of functions they call in srpl.

6.7 Locking

Locks are fine grained and implemented to serialize access to common data structures. First, there is the srpl globals structure. This structure holds parameters global to the entire instance of the driver. When a controller is assigned to this host, a new host structure is initialized and added to a list headed by a field in this structure. A lock is taken while the addition of such a list element is in progress.

Secondly, there are locks associated with each host structure instance. As operations and events occur, there are counters in the host structure that are incremented. Write access to these fields is guarded by a lock. In addition there are locks guarding access to the request free and work-in-progress queues (both owned by the host structure).

Lastly, there is a lock in the request structure, used to serialize changes of state of the request. For example, once a command request has been sent to the I/O unit, two events are expected: the message send completion and the message receive completion (indicating a response from the I/O unit). In a multiprocessor environment, these two handlers could be running simultaneously, each updating the state of the same request structure. The lock in the request structure is there to ensure that once an update to state has started, it will finish before the other update can start.

All locks are spin locks provided by a public library; at their core, these are Linux kernel spin locks. Taking a lock involves spinning (holding the processor) until the lock is available. For this reason certain locking principles are observed. First, hold a lock only for a short time (or a short piece of code path); second, never yield the processor while holding a lock (either directly by blocking or scheduling, or by calling a function that may block); and third, hold as few locks as possible at any given time. In this implementation, a thread running in this driver never holds more than one lock at a time.
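
The role of the per-request lock can be illustrated with two threads standing in for the send and receive completion handlers. Note the assumptions: Python's threading.Lock is a blocking mutex rather than a spin lock, the class and function names are hypothetical, and the model does not guard against the same handler being called twice.

```python
import threading


class Request:
    """Toy request whose state changes are serialized by its own lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.state = "send_pend_recv_pend"
        self.completed_by = None

    def on_completion(self, kind):
        """Called by either handler; kind is 'send' or 'recv'.

        Returns True when this call moved the request to complete.
        """
        with self.lock:  # held only for the short state update
            if self.state == "send_pend_recv_pend":
                if kind == "send":
                    self.state = "send_comp_recv_pend"
                else:
                    self.state = "send_pend_recv_comp"
                finished = False
            else:
                self.state = "complete"
                self.completed_by = kind
                finished = True
        # The lock is released before any call out to the mid layer would happen.
        return finished


def race():
    """Run both completion handlers concurrently on one request."""
    req = Request()
    results = {}

    def worker(kind):
        results[kind] = req.on_completion(kind)

    threads = [threading.Thread(target=worker, args=(k,)) for k in ("send", "recv")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return req, results
```

Whichever handler runs second observes the first handler's update and is the one that completes the request, so exactly one of the two calls returns True.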

6.8 Buffer Strategy

Srpl does not buffer data moved between memory and disk. The client data is identified by a pointer to the buffer’s location in user memory and by its size. This information is passed in the SCSI command structure to srpl through the srpl_queuecommand() interface. Srpl creates a memory handle to the buffer and sends that in the command request message to the I/O unit. The I/O unit manages the movement of data between user memory and the disk, using the InfiniBand RDMA service. The buffer strategies used by I/O units will vary and are beyond the scope of this document.

6.9 Error Handling

There are three major classes of errors that srpl is prepared to handle: local resource shortage, InfiniBand connection errors and target I/O unit errors.

The first class of errors is lack of resources on the local system, which causes an allocation request to fail. When this happens srpl returns the appropriate error code to the calling routine. The caller might be the mid layer, in which case further attempts will be made until, after repeated failures, the mid layer gives up. Resource allocation errors are not expected when the mid layer is queuing a command. This is because the queue depth is known in advance, and the necessary resources are allocated before the first command is queued. If the caller is the plug and play manager attempting to notify srpl of a newly assigned controller, then it is possible for resource allocation to fail, as this is the time when the driver is asking the operating system for more memory resources, which might not be currently available. If the system should be so limited that resources cannot be allocated to support the new controller assignment, then srpl returns, and the controller is not initialized. The controller will not be registered with the mid layer, and Linux will not queue commands to that controller.

The second class is errors on the InfiniBand connection to the I/O unit. Any sort of error on the connection results in the connection being destroyed. The I/O unit controller service is free to throw away any requests in progress and reset itself. Srpl attempts to establish a new connection, possibly using the same path record or, alternatively, a different one if another is known. All of the outstanding requests are moved temporarily to the shelf, and any new requests go directly to the shelf. Next, srpl will attempt to open a new connection, trying repeatedly on each of the paths to the I/O unit that it knows or can discover. After the new connection is established, requests are re-sent to the I/O unit (using the new connection), and the requests are moved back to the srpl host structure’s work-in-progress queue, where they will be found when the completion notifications are delivered.
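
The repeated connection attempts over the known paths might look like the loop below. This is a sketch only: try_connect is a hypothetical callable, and the max_rounds limit is an assumption added so the example terminates, since the text does not say how long srpl keeps trying.

```python
def reconnect(paths, try_connect, max_rounds=3):
    """Cycle through the known paths until a connection attempt succeeds.

    paths is a list of path records (any values); try_connect(path)
    returns True on success. Returns the path that worked, or None if
    every round failed.
    """
    for _ in range(max_rounds):
        for path in paths:
            if try_connect(path):
                return path
    return None
```

Trying the paths in list order means the previously used path record can be retried first simply by placing it at the head of the list.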

Errors of the third class occur on the I/O unit itself. In some cases, the I/O unit will indicate an error condition in the response it sends back to srpl. While this is unusual, the protocols allow for it. Srpl will facilitate negotiation between the mid layer and the I/O unit. In that event, the error condition notice and any sense data are sent up to the SCSI mid layer for processing. The mid layer may choose to reissue the command, or take some other course of action (such as requesting a reset).

In other cases, a message may get lost or stuck in the I/O unit, and the host never receives a response from the I/O unit. In that case, the mid layer timer associated with the request expires, notifying the mid layer that a response is missing. The mid layer’s response to this is to first issue an abort request, and then reissue the request to srpl. If this fails, then the mid layer starts issuing reset requests to progressively larger domains until one works. The first reset attempt is directed at the device, the next at the channel, and finally at the controller interface as a whole.

Srpl’s role in this is to forward the abort or reset requests to the I/O unit, clean up resources associated with abandoned I/O requests, and report abort or reset results back to the mid layer.

6.10 Major Data Structures

This section describes the major data structures employed by srpl and their relationships to each other. Figure 6–4 shows the relationships of the major data structures. The sections that follow describe those structures.

[Diagram: an srpl_host structure (labels: spinlock, scsi_host, shtp, req_free, req_wip, connection attributes, IOC attributes, resource pools) linked to srpl request structures (labels: next, host, state, Scsi_Cmnd, mid layer callback, SRP msg).]

Figure 6–4. Major data structures

6.10.1 srpl globals

The srpl globals data structure holds the heads of two linked lists of structures. The first is the list of srpl host structures (described in section 6.10.2). The other is the list of Linux SCSI host templates. These are defined by the Linux SCSI mid layer and filled out by srpl to identify driver attributes, including driver entry points. The Linux SCSI host template is passed to the mid layer’s scsi_register_module() routine (see section 6.3) when srpl registers a controller. The srpl globals structure is locked as items are added to or removed from these lists.

6.10.2 srpl host

For each discovered InfiniBand fabric-attached SRP controller, srpl creates an srpl host structure and adds it to the host list maintained within the srpl globals structure. The srpl host structure organizes the resources in use for that controller: connection attributes, IOC profile, path records to the IOC, and I/O request structures (described in section 6.10.3). The I/O request structures are kept on three lists. The free list keeps the I/O request structures not currently in use; the wip (work in progress) list keeps the I/O request structures representing unfinished I/Os. The third list, the shelf, is used only to manage connection recovery during port fail over.

The srpl host structure also contains event counters to track number of requests received and completed, among other events, and the locks necessary to ensure coherent access to the I/O request structure lists.

6.10.3 srpl request

Each time srpl receives a SCSI command from the mid layer, it takes an I/O request structure from the host’s free list, and uses it to track the I/O specific information and resources. This structure has a pointer to the original SCSI command, a pointer to the mid layer’s completion callback routine, pointers to the messages sent to and received from the I/O unit, the address and size of the user buffer, and finally a state field indicating the stage of progress of this I/O request.
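
The relationships among the three structures described in section 6.10 can be summarized with a few data classes. This is a rough model, not the driver's actual layout: all field names are hypothetical, locks and counters are omitted, and the 64-byte message size is an arbitrary placeholder. The pool sizes follow section 6.5 (one request structure per queue slot, two messages per request).

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class SrplRequest:
    state: str = "free"
    scsi_cmd: Any = None                       # original SCSI command
    done: Optional[Callable[[Any], None]] = None  # mid layer completion callback
    send_msg: Any = None                       # message sent to the I/O unit
    recv_msg: Any = None                       # message received from the I/O unit
    buf_addr: int = 0                          # user buffer address
    buf_size: int = 0                          # user buffer size


@dataclass
class SrplHost:
    queue_depth: int
    req_free: List[SrplRequest] = field(default_factory=list)
    req_wip: List[SrplRequest] = field(default_factory=list)
    shelf: List[SrplRequest] = field(default_factory=list)
    message_pool: List[bytearray] = field(default_factory=list)

    def __post_init__(self):
        # All resources are created once, when the controller is assigned.
        self.req_free = [SrplRequest() for _ in range(self.queue_depth)]
        self.message_pool = [bytearray(64) for _ in range(2 * self.queue_depth)]


@dataclass
class SrplGlobals:
    hosts: List[SrplHost] = field(default_factory=list)
    host_templates: List[Any] = field(default_factory=list)
```

For example, SrplHost(queue_depth=8) starts with eight free request structures and a pool of sixteen messages.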

7. System Resource Usage

7.1 Memory

The total amount of memory required by srpl depends, of course, on the number of fabric attached SRP controllers it is managing, the number of disks on those controllers and the total number of possible outstanding I/O requests (the sum of all the queue depths over all the controllers). The controller cost is about 4 KB per controller (connection). The cost per disk and I/O request structure (allocated when the remote service is reported available by the plug and play manager) is about 256 bytes and 650 bytes, respectively. This includes the amount of memory that srpl allocates directly as well as memory that is allocated by the SCSI mid layer in order to manage command configuration in its layer.

To estimate the size of the memory footprint on a given host on a fabric, apply the following formula:

MF = c * 4000 + r * 650 + d * 250

where MF is the memory footprint in bytes, c is the number of controllers, r is the total number of I/O request structures allocated (e.g., 64 * c when each controller has a queue depth of 64), and d is the total number of disks srpl discovers.

For example, in a system with one controller, four disks and a queue depth of 64, the total memory footprint is about 46 KB of kernel physical memory (4000 + 64 * 650 + 4 * 250 = 46,600 bytes). If there are three controllers, each with six disks and a command queue depth of 64, the formula gives a memory footprint of about 141 KB of kernel physical memory (3 * 4000 + 192 * 650 + 18 * 250 = 141,300 bytes).
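
The formula can be evaluated directly. The short function below is a convenience sketch (the function and parameter names are hypothetical) that assumes every controller has the same queue depth; applied to the two configurations in the examples, the formula as written yields 46,600 bytes and 141,300 bytes.

```python
def memory_footprint(controllers, queue_depth, disks_total):
    """Estimate srpl's memory footprint in bytes: MF = c*4000 + r*650 + d*250.

    r is taken as controllers * queue_depth, i.e. every controller is
    assumed to have the same queue depth.
    """
    requests = controllers * queue_depth
    return controllers * 4000 + requests * 650 + disks_total * 250


one_controller = memory_footprint(controllers=1, queue_depth=64, disks_total=4)
three_controllers = memory_footprint(controllers=3, queue_depth=64, disks_total=3 * 6)
print(one_controller, three_controllers)
```

The request structures dominate both estimates, so the queue depth is the parameter with the largest effect on the footprint.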

7.2 Other Resources

Srpl depends on the presence of an HCA, and a stack of drivers to run the HCA, the Access Layer, and Subnet management services. Therefore the system must have enough resources to support those functions.

8. Internal Compatibility

8.1 Interaction with Other Components

As a channel driver, srpl sits on top of the Intel InfiniBand driver stack. It depends on the Access Layer for sending and receiving InfiniBand messages to and from the SRP I/O unit.

Srpl also interacts with the plug and play manager (part of the Access Layer). The plug and play manager is able to load srpl when it detects an SRP service on the fabric (if srpl is not already loaded). It will notify srpl via srpl’s advertised entry point srpl_add_unit(). The plug and play manager will also notify srpl in the event an existing SRP service disappears from the fabric. The plug and play manager does this by calling srpl_remove_unit().

As srpl initializes, it calls the plug and play manager function RegisterChannelDriver() to register, which here means identifying the add unit and remove unit entry points, as well as listing the InfiniBand vendor and device ids that this driver is intended to manage.

Srpl behaves as a low level SCSI controller driver to the Linux SCSI mid layer. It conforms to the standard Linux model by which SCSI host bus adapters provide their services to the operating system.

8.2 System Requirements

In order for a system to support srpl, it must have an HCA that supports RDMA. It must also be capable of driving that interface; that is, the system must have adequate memory, processor resources, and system bus resources to support the HCA and its driver stack, including the Access Layer.

8.3 Imported Interfaces

The Intel InfiniBand storage driver, srpl, depends on the following software interfaces: the Linux SCSI mid layer as found in Linux release 2.4.10, the Intel InfiniBand Access Layer’s channel service, and plug and play manager interfaces. Srpl’s use of these interfaces is discussed in detail in chapter 6.

In addition, this driver depends on the Linux kernel environment of 2.4.10 for system facilities such as locks, events, and memory management.

8.4 Exported Interfaces

See Section 14.

9. External Compatibility

9.1 Standards

The wire protocol used by srpl to communicate with I/O units on the fabric is SCSI RDMA Protocol (SRP) and is defined by the ANSI T10 working group (see section 1.4 for reference).

The InfiniBand specifications define the method by which hosts and I/O resources are discovered on the fabric, as well as the connection establishment procedures and the protocol for packet exchange over the fabric. Srpl uses many of these features directly, and depends on software that uses others.

9.2 Deviations from Standards

Srpl is intended to implement version 2 of the SRP ANSI specification. That document does not yet exist. Until it is released, srpl will comply with various working drafts leading up to that version.

10. Other Dependencies

None.

11. Initialization & Shutdown

11.1 Initialization

From the point of view of srpl, initialization starts when the driver module is loaded. Loading of the driver can be controlled by the plug and play manager, or may be initiated manually through Linux’s modprobe facility. Figure 11–1 shows the sequence of steps that happen upon controller discovery, starting with the driver’s registration with the plug and play service.

The following steps are taken when srpl registers with the plug and play manager, and a controller is assigned to srpl.

1. Srpl initializes, and registers with the Plug and Play service.

2. Plug and Play manager notifies srpl when a controller is assigned.

3. Srpl uses Connection Service to connect to the I/O Unit.

4. Connection Service manages connection establishment with I/O Unit.

5. I/O Unit completes connection establishment.

6. Connection Service calls back srpl with new connection.

7. Srpl registers new controller with the Linux SCSI mid layer.
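The seven steps above can be traced with a small simulation. This is an illustrative Python sketch only: every class and method name is hypothetical, and the calls are synchronous here, whereas the text describes connection establishment as completing through a callback from the Connection Service. Each component appends the number of the step it performs to a shared log, so the log shows the order in which control passes between components.

```python
from typing import Callable, List


class IOUnit:
    def __init__(self, log: List[int]) -> None:
        self.log = log

    def accept(self) -> None:
        self.log.append(5)      # step 5: I/O unit completes connection establishment


class ConnectionService:
    def __init__(self, log: List[int]) -> None:
        self.log = log

    def connect(self, io_unit: IOUnit, on_connected: Callable[[IOUnit], None]) -> None:
        self.log.append(4)      # step 4: manage connection establishment
        io_unit.accept()
        self.log.append(6)      # step 6: call back srpl with the new connection
        on_connected(io_unit)


class ScsiMidLayer:
    def __init__(self) -> None:
        self.hosts: List[IOUnit] = []

    def register_host(self, host: IOUnit) -> None:
        self.hosts.append(host)


class Srpl:
    def __init__(self, log: List[int], pnp: "PnpService",
                 conn: ConnectionService, mid: ScsiMidLayer) -> None:
        self.log, self.pnp, self.conn, self.mid = log, pnp, conn, mid

    def init_module(self) -> None:
        self.log.append(1)      # step 1: initialize and register with plug and play
        self.pnp.register(self)

    def add_unit(self, io_unit: IOUnit) -> None:
        self.log.append(3)      # step 3: use the Connection Service to connect
        self.conn.connect(io_unit, self.on_connected)

    def on_connected(self, io_unit: IOUnit) -> None:
        self.log.append(7)      # step 7: register the controller with the mid layer
        self.mid.register_host(io_unit)


class PnpService:
    def __init__(self, log: List[int]) -> None:
        self.log = log
        self.driver = None

    def register(self, driver: Srpl) -> None:
        self.driver = driver

    def assign(self, io_unit: IOUnit) -> None:
        self.log.append(2)      # step 2: notify srpl that a controller is assigned
        self.driver.add_unit(io_unit)


def run_discovery() -> List[int]:
    """Wire the components together and return the order in which steps ran."""
    log: List[int] = []
    pnp, conn, mid = PnpService(log), ConnectionService(log), ScsiMidLayer()
    Srpl(log, pnp, conn, mid).init_module()
    pnp.assign(IOUnit(log))
    return log
```

Calling `run_discovery()` returns the step numbers in the order they were recorded, which makes it easy to see that the driver does not register with the SCSI mid layer until the connection callback has fired.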

The driver’s initial entry point is a routine called init_module(); it is called immediately by Linux as part of module loading. The main job of init_module() is to initialize the driver and to register it with the plug and play manager. To do this, the cd template structure is filled with a table listing the fabric devices srpl is able to control, and with srpl’s entry points for add unit and remove unit (srpl_add_unit() and srpl_remove_unit(), respectively). This structure is then passed as an argument to the plug and play manager’s RegisterChannelDriver() routine.

Loading of srpl will fail if the plug and play manager is not already running. (In that event, the failure happens even before init_module() is called, because the loader will fail to resolve the reference to the symbol RegisterChannelDriver(). This will not happen, of course, if it is the plug and play manager that loads srpl.) Loading will also fail if RegisterChannelDriver() itself fails for any reason.

When the plug and play manager is notified of a new SRP resource, it will call srpl’s add unit entry point (srpl_add_unit()) passing an IOC profile and an IOC GUID as arguments. Srpl is able to use this information to find the I/O unit and establish an InfiniBand connection to it. Srpl allocates memory for a new host structure to manage the connection, a pool of messages and a set of I/O request structures to manage individual transactions between Linux and the fabric attached controller.
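The per-controller allocation described above can be pictured as a host structure that owns two preallocated pools. The sketch below is a Python model under stated assumptions: the names `Pool`, `Host`, and `new_host` are hypothetical, and sizing both pools to the command queue depth (64 by default, the figure used in the resource examples) is a guess for illustration rather than the driver's actual sizing rule.

```python
from dataclasses import dataclass
from typing import List, Optional


class Pool:
    """Fixed-size pool: every item is created up front, none at I/O time."""

    def __init__(self, kind: str, count: int) -> None:
        self._free: List[dict] = [{"kind": kind, "index": i} for i in range(count)]

    def get(self) -> Optional[dict]:
        # Returns None when the pool is exhausted instead of allocating more.
        return self._free.pop() if self._free else None

    def put(self, item: dict) -> None:
        self._free.append(item)

    def available(self) -> int:
        return len(self._free)


@dataclass
class Host:
    """Hypothetical host structure managing one connection to an SRP controller."""
    ioc_guid: int
    messages: Pool
    requests: Pool


def new_host(ioc_guid: int, queue_depth: int = 64) -> Host:
    # Assumption: one message and one I/O request structure per outstanding command.
    return Host(ioc_guid, Pool("message", queue_depth), Pool("io_request", queue_depth))
```

Allocating everything when the unit is added means the I/O path only takes items from, and returns items to, a free list; an exhausted pool then acts as a natural limit on outstanding commands.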

After establishing a connection, srpl registers the new controller with the Linux SCSI mid layer. Linux immediately polls the controller’s bus to find any disk devices controlled by the controller. These devices are associated with block device nodes in the file system, so that they can be opened and accessed by applications or mounted as additions to the file system.

[Figure 11–1. Controller discovery. The diagram shows the PnP Service, srpl, Connection Services, the SCSI Mid Layer, and the I/O Unit on the Fabric, connected by arrows numbered 1 through 7.]

Finally srpl_add_unit() returns success to the plug and play manager, indicating that the controller has been initialized. The system is now capable of exchanging block data with SCSI devices controlled by a fabric attached SRP storage service.

11.2 Shutdown

The srpl driver module may be shut down and unloaded as the machine is shutting down entirely, or when the system administrator decides to unload the driver. Before srpl can be unloaded, it must have a usage count of zero. Thus, no device managed by the driver may be open by any application or by the file system. (If a device can be mounted as a file system, it must not be mounted when the driver is unloaded.) If an attempt is made to unload the srpl driver while a file system device is mounted from a fabric attached controller, or while a fabric attached device is open, the attempt will fail.

When the operating system is shutting down, it terminates any running applications (closing any files and devices they may have open) and unmounts the file system devices. Then the driver modules are unloaded. Unloading srpl during operating system shutdown is therefore expected to succeed.

Srpl handles driver module unload requests in the routine cleanup_module(). This routine closes each of the InfiniBand connections to SRP I/O units. Each host structure is recycled; memory for all of its resources (e.g. messages and I/O request structures) is freed.
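The unload rule and the cleanup work can be summarized in a short model. This Python sketch makes several assumptions: `SrplModule`, `Host`, `open_device()`, `close_device()`, and `unload()` are hypothetical names, and the usage-count check is shown inside the same object for brevity, whereas the text attributes that check to Linux module unloading rather than to the driver itself. Only cleanup_module() is a name taken from this document.

```python
from typing import List


class Host:
    """Hypothetical per-controller state: one connection plus its resources."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.connected = True
        self.messages = [object() for _ in range(4)]
        self.io_requests = [object() for _ in range(4)]


class SrplModule:
    def __init__(self, hosts: List[Host]) -> None:
        self.hosts = list(hosts)
        self.use_count = 0      # raised by each open or mount, lowered on close
        self.loaded = True

    def open_device(self) -> None:
        self.use_count += 1

    def close_device(self) -> None:
        self.use_count -= 1

    def cleanup_module(self) -> None:
        # Close every connection, then release the resources of each host.
        for host in self.hosts:
            host.connected = False
            host.messages.clear()
            host.io_requests.clear()
        self.hosts.clear()

    def unload(self) -> bool:
        # The unload attempt fails while any managed device is open or mounted.
        if self.use_count != 0:
            return False
        self.cleanup_module()
        self.loaded = False
        return True
```

A refused unload leaves all connections and resources untouched, which matches the description that the attempt simply fails while a device is in use.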

12. Installing, Configuring, and Uninstalling

12.1 Installing

TBD

12.2 Configuring

Srpl will have some parameters that can be modified at driver load time to change the behavior of the driver. The standard Linux techniques for doing this are employed: specifying the parameter and new value on the load command line, or specifying the parameter and value in the modules.conf file, using an options entry in that configuration file. A full list of these parameters can be found in the release notes.

12.3 Uninstalling

TBD

13. Unresolved Issues

Hot removal of SRP resources from the fabric causes problems for Linux. The function srpl_remove_unit() simply returns an error if it is ever called. Linux will eventually determine that the driver is not completing I/O requests to that resource, and it will stop sending them. This is the best behavior available at present. As Linux develops a plug and play strategy that works for SCSI controllers, this driver may be modified to become capable of handling hot controller remove requests.

Linux is not designed to have its SCSI controllers removed while the system is running. It may be possible to create a patch to the Linux kernel to mitigate this circumstance. Even so, with surprise removals or I/O unit failure, the caching nature of the file system will almost certainly leave the disk image in an inconsistent state. Use of special file system types, such as AFS, NFS, or EXT3 (journaling), may help reduce data loss due to sudden removal of service. Even then, Linux still thinks it has a controller present, even if a disk has gone away. This is evidenced by the following observation: if one loads the srpl driver and Linux finds four disks, /dev/sdb through /dev/sde, and one then unloads and reloads the driver, Linux finds the same four disks and binds them to /dev/sdf through /dev/sdi. The original bindings still exist, but access through /dev/sdb (and c, d, and e) results in I/O errors, because those device objects have become invalid.
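The rebinding symptom can be reproduced with a toy model. The Python sketch below only mimics the observed behavior, namely that device names are handed out in order and are not reused after the driver is unloaded; it is not a description of how the kernel assigns names. The class `SdNamer`, its methods, and the disk labels are hypothetical, and the model assumes /dev/sda is already taken by a local disk.

```python
import errno
import string
from typing import Dict, List, Optional


class SdNamer:
    """Toy model: names are assigned in order and never recycled."""

    def __init__(self, first_letter: str = "b") -> None:
        self._next = string.ascii_lowercase.index(first_letter)
        self._bindings: Dict[str, Optional[str]] = {}

    def attach(self, disks: List[str]) -> List[str]:
        """Bind each discovered disk to the next unused device name."""
        names = []
        for disk in disks:
            name = "/dev/sd" + string.ascii_lowercase[self._next]
            self._next += 1
            self._bindings[name] = disk
            names.append(name)
        return names

    def driver_unloaded(self) -> None:
        # The bindings stay in place, but the objects behind them become invalid.
        for name in self._bindings:
            self._bindings[name] = None

    def read(self, name: str) -> str:
        disk = self._bindings.get(name)
        if disk is None:
            raise OSError(errno.EIO, "I/O error", name)
        return disk
```

Attaching four disks, unloading, and attaching the same four disks again yields /dev/sdb through /dev/sde on the first pass and /dev/sdf through /dev/sdi on the second, while a read through any of the original names raises an I/O error.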

USB offers some hope: Linux seems to be able to handle dynamic additions and removals of USB devices. Perhaps something in that logic can help in developing a solution for SCSI controller removal. This will be explored.

14. Data Structures and APIs

To view the data structures and APIs associated with this component, click on <hyperlink>.

[Author’s Note: This driver has no APIs per se. It does have some data structures that can be displayed here after the RoboDoc formatting is done. As yet, though, the header files have not been modified to produce RoboDoc output. The link will be added here when that step is taken.]
