DAPL: Direct Access Transport Libraries Introduction and Example Yufei 10/01/2010


• Review

• Result

– reasons

– methods

• mr and cm_id

• Examples

Why DAPL

• A common set of APIs for all RDMA networks

– Reader and Writer

uDAPL API classification

• Local Resource Model

• Connection Management

• Data Transfer Operations Initiation

• Data Transfer Operation Completions

• Memory Management

• Error Detection and Notification

• Event Model

• Name Service

• High Availability

Migrating ftpd/ftp to an RDMA environment

The server starts listening and waits for new connection requests.

The client opens a new connection, then logs in to the server.

The server calls fork(); the child uses the established connection to exchange COMMANDs and REPLIES (USER, PASS, PORT, PASV, RETR, STOR) with the client.

Two channels: the communication channel carries commands and replies; the data transfer channel carries file contents.

• Use librdmacm to establish the RDMA data transfer channel instead of a TCP socket.

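Bad result

The RDMA version was slower than TCP for both file sizes, and the gap grew with size: about 4% longer for 1GB and about 43% longer for 100GB.

| File Size | RDMA | TCP |
|---|---|---|
| 1GB | 1.87 secs (5.5e+05 Kbytes/sec) | 1.8 secs (5.7e+05 Kbytes/sec) |
| 100GB | 444 secs (2.3e+05 Kbytes/sec) | 311 secs (3.3e+05 Kbytes/sec) |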

Example – rput file

1. The client tells the server that it will use RDMA_WRITE to transfer the data, so the server has to prepare an mr to receive it.

2. The server replies that preparation is finished and sends the client the data sink address and the maximum data length it can receive.

3. The client starts the data transfer with RDMA_WRITE, including the information header.

4. The client waits on the cq for the RDMA_WRITE completion (RDMA_WRITE_COMPLETE).

5. The client tells the server that the RDMA_WRITE has finished.

6. The server writes the data to the file system.

7. The server tells the client to continue.
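The seven steps above can be sketched as a single-process C simulation of the control exchange. This is a minimal sketch, not the author's implementation: `struct ctrl_msg`, the `MSG_*` types, `SERVER_MAX_LEN`, `server_handle` and `client_rput` are hypothetical, lengths travel in the control message rather than in a header inside the data, and a `memcpy` into the advertised address stands in for the RDMA_WRITE and its completion (steps 3 and 4), so the logic runs without an RDMA device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical control messages for the rput exchange. */
enum msg_type { MSG_RPUT_REQ, MSG_SINK_READY, MSG_WRITE_DONE, MSG_CONTINUE };

struct ctrl_msg {
    enum msg_type type;
    uint64_t addr;   /* step 2: data sink address on the server */
    uint32_t len;    /* step 2: max length the server accepts; step 5: bytes written */
};

#define SERVER_MAX_LEN 4096   /* hypothetical size of the server's mr */

struct server {
    unsigned char sink[SERVER_MAX_LEN]; /* stands in for the registered mr */
    unsigned char *file;                /* stands in for the file system (caller sizes it) */
    size_t file_len;
};

/* Server side: handle one control message and return the reply. */
static struct ctrl_msg server_handle(struct server *s, struct ctrl_msg m)
{
    struct ctrl_msg r = {0};
    switch (m.type) {
    case MSG_RPUT_REQ:                  /* steps 1-2: prepare the mr, advertise it */
        r.type = MSG_SINK_READY;
        r.addr = (uint64_t)(uintptr_t)s->sink;
        r.len = SERVER_MAX_LEN;
        break;
    case MSG_WRITE_DONE:                /* steps 6-7: persist data, tell client to go on */
        memcpy(s->file + s->file_len, s->sink, m.len);
        s->file_len += m.len;
        r.type = MSG_CONTINUE;
        break;
    default:
        assert(0 && "unexpected message");
    }
    return r;
}

/* Client side: push len bytes in rounds bounded by the advertised length.
 * Returns the number of RDMA_WRITE rounds used. */
static int client_rput(struct server *s, const unsigned char *data, size_t len)
{
    struct ctrl_msg req = { .type = MSG_RPUT_REQ };
    struct ctrl_msg ready = server_handle(s, req);          /* steps 1-2 */
    assert(ready.type == MSG_SINK_READY);

    int rounds = 0;
    for (size_t off = 0; off < len; off += ready.len) {
        size_t n = len - off < ready.len ? len - off : ready.len;
        /* steps 3-4: RDMA_WRITE to ready.addr and wait for its completion;
         * simulated by a synchronous memcpy */
        memcpy((void *)(uintptr_t)ready.addr, data + off, n);
        struct ctrl_msg done = { .type = MSG_WRITE_DONE, .len = (uint32_t)n };
        struct ctrl_msg cont = server_handle(s, done);     /* steps 5-7 */
        assert(cont.type == MSG_CONTINUE);
        rounds++;
    }
    return rounds;
}
```

In this sketch steps 1 and 2 run once, while steps 3 to 7 repeat until the whole file is sent; a 10000-byte file with a 4096-byte sink takes three rounds.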

Method

• Independent Data Loading Module

– Reader and Writer

• Independent Data Transfer Module

– Sender and Receiver

• Difficulties

– data structure design (encapsulation)

– task decomposition

– thread communication and synchronization: signals and condition variables

– error handling
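The split into a loading module and a transfer module, coordinated through condition variables, can be sketched as a reader thread and a sender thread sharing a small ring of buffers. A minimal sketch assuming POSIX threads; `struct chunk_queue`, `reader_thread`, `sender_thread`, `run_pipeline`, `QCAP` and `CHUNK` are hypothetical, the reader generates synthetic bytes instead of reading a file, and the sender only sums them where real code would post RDMA operations.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define QCAP 4       /* hypothetical number of buffers in flight */
#define CHUNK 1024   /* hypothetical buffer size in bytes */

struct chunk_queue {
    unsigned char buf[QCAP][CHUNK];
    size_t len[QCAP];
    int head, count, done;          /* ring state, guarded by lock */
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
};

struct pipeline {
    struct chunk_queue q;
    size_t total;                   /* bytes the reader produces */
    unsigned long checksum;         /* bytes summed by the sender */
};

/* Reader (data loading module): fill the tail slot outside the lock, then publish it. */
static void *reader_thread(void *arg)
{
    struct pipeline *p = arg;
    struct chunk_queue *q = &p->q;
    size_t produced = 0;

    while (produced < p->total) {
        pthread_mutex_lock(&q->lock);
        while (q->count == QCAP)
            pthread_cond_wait(&q->not_full, &q->lock);
        int slot = (q->head + q->count) % QCAP;   /* only the reader touches the tail */
        pthread_mutex_unlock(&q->lock);

        size_t n = p->total - produced < CHUNK ? p->total - produced : CHUNK;
        for (size_t i = 0; i < n; i++)
            q->buf[slot][i] = (unsigned char)(produced + i);  /* synthetic file data */
        q->len[slot] = n;
        produced += n;

        pthread_mutex_lock(&q->lock);
        q->count++;
        pthread_cond_signal(&q->not_empty);
        pthread_mutex_unlock(&q->lock);
    }
    pthread_mutex_lock(&q->lock);
    q->done = 1;                                  /* end of file */
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
    return NULL;
}

/* Sender (data transfer module): drain the head slot outside the lock, then free it. */
static void *sender_thread(void *arg)
{
    struct pipeline *p = arg;
    struct chunk_queue *q = &p->q;

    for (;;) {
        pthread_mutex_lock(&q->lock);
        while (q->count == 0 && !q->done)
            pthread_cond_wait(&q->not_empty, &q->lock);
        if (q->count == 0) {                      /* done and drained */
            pthread_mutex_unlock(&q->lock);
            return NULL;
        }
        int slot = q->head;
        pthread_mutex_unlock(&q->lock);

        for (size_t i = 0; i < q->len[slot]; i++) /* real code would post RDMA here */
            p->checksum += q->buf[slot][i];

        pthread_mutex_lock(&q->lock);
        q->head = (q->head + 1) % QCAP;
        q->count--;
        pthread_cond_signal(&q->not_full);
        pthread_mutex_unlock(&q->lock);
    }
}

/* Run one reader and one sender over total bytes; returns the sender's checksum. */
static unsigned long run_pipeline(size_t total)
{
    struct pipeline p;
    pthread_t r, s;

    memset(&p, 0, sizeof p);
    p.total = total;
    pthread_mutex_init(&p.q.lock, NULL);
    pthread_cond_init(&p.q.not_empty, NULL);
    pthread_cond_init(&p.q.not_full, NULL);
    int rc = pthread_create(&r, NULL, reader_thread, &p);
    rc |= pthread_create(&s, NULL, sender_thread, &p);
    assert(rc == 0);
    (void)rc;
    pthread_join(r, NULL);
    pthread_join(s, NULL);
    pthread_cond_destroy(&p.q.not_full);
    pthread_cond_destroy(&p.q.not_empty);
    pthread_mutex_destroy(&p.q.lock);
    return p.checksum;
}
```

Each side holds the mutex only to claim or release a slot and fills or drains the slot outside it, so loading and sending can overlap. Error handling, the last difficulty listed, is left out to keep the sketch short.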

Method cont’d: Batch mr post

• rdma.c

• ibv_post_send

– struct ibv_send_wr

• Struct field

– struct ibv_send_wr *next;

• In iperf and the previous version of netkit ftp, I always posted one task at a time. By posting a linked list of work requests instead, the application could save much of the time spent on information exchange.

The relationship between memory region (mr) and communication id (cm_id) from the programmer’s view

Separated or combined?

cm_id: communication identifier

mr: memory region

pd: protection domain

qp: queue pair

cm_id

• int rdma_create_id(struct rdma_event_channel *channel, struct rdma_cm_id **id, void *context, enum rdma_port_space ps);

pd

• struct ibv_pd *ibv_alloc_pd(struct ibv_context *context);

Example:

pd = ibv_alloc_pd(cm_id->verbs);

qp

• int rdma_create_qp(struct rdma_cm_id *id, struct ibv_pd *pd, struct ibv_qp_init_attr *qp_init_attr);

Example:

rdma_create_qp(cm_id, pd, …);

qp = cm_id->qp;

mr

• struct ibv_mr *ibv_reg_mr(struct ibv_pd *pd, void *addr, size_t length, int access);


The relationship between memory region (mr) and communication id (cm_id) from the programmer’s view

• Separated: deregister and re-register the mr for each transfer

• Combined: attach the mr to the established link

Q & A