Develop Application with Open Fabrics Yufei Ren Tan Li


Page 1

Develop Application with Open Fabrics

Yufei Ren, Tan Li

Page 2

Agenda

• RDMA concept review
• Modules in OFED-1.5.1 userspace
• librdmacm (RDMA communication manager)
• libibverbs (InfiniBand verbs)
• Installing OFED on Fedora 12/RHEL5
• About Lustre and future work

Page 3

RDMA?

• RDMA: networking technologies whose software interface has three features:
  – Remote DMA (RDMA write, RDMA read)
  – Asynchronous work queues (as Tan has illustrated)
  – Kernel bypass

Page 4

RDMA - Kernel bypass

[Figure: kernel-bypass data paths, non-iWARP vs. iWARP]

Page 5

RDMA Verbs and Objects

• Not quite an API

• Abstract definition of functionality

• “Resources (objects) operated on by verbs (functions).”
  – For example, a queue pair or completion queue is operated on by create/destroy verbs.
  – rdma_create_qp()/rdma_destroy_qp() in librdmacm/include/rdma_cma.h

• Roughly analogous to objects and methods in an OO language (C++/Java).

Page 6

What is OpenFabrics

• Includes:
  – Kernel-level drivers
  – Channel-oriented RDMA operations and kernel bypass
  – Application programming interfaces (APIs)

• Used for:
  – Parallel message passing (MPI)
  – Socket data exchange (SDP, Sockets Direct Protocol)
  – File systems (Lustre)

Page 7

Modules in OFED-1.5.1 userspace

• librdmacm: Linux library to abstract connection setup.

• libibverbs: a library that allows programs to use RDMA "verbs" for direct access to RDMA (currently InfiniBand and iWARP) hardware from userspace.

• Device-specific drivers:
  – IB: libmthca, libmlx4, libipathverbs, libehca
  – iWARP: libcxgb3, libamso

Page 8

librdmacm

• Linux library that abstracts connection setup. The same code runs on both IB and iWARP fabrics.

• Mimics the TCP socket model (socket, connect, bind, listen, accept, getaddrinfo, etc.); the rdma_cm_id is the socket analog.

• IP addressing can be used on iWARP and also on InfiniBand (via IPoIB).

• Additional address/route resolution steps:
  – rdma_resolve_addr()
  – rdma_resolve_route()

• Events are reported through “channels”:
  – rdma_create_event_channel()
  – rdma_get_cm_event()
  – rdma_ack_cm_event()

   

Page 9

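An example of FTP via OpenFabrics

[Figure: an RDMA FTP client and an RDMA FTP server exchange Put/Get requests over an RDMA connection; labels for the FTP protocol and the file system (FS).]

RDMA FTP Server:
  – rdma_getaddrinfo()
  – rdma_create_ep()
  – rdma_listen()
  – rdma_accept(): blocks until a connection arrives from the client
  – rdma_get_recv_comp()
  – rdma_post_send()
  – rdma_disconnect()
  – rdma_dereg_mr()
  – rdma_destroy_ep()

RDMA FTP Client:
  – rdma_getaddrinfo()
  – rdma_create_ep()
  – rdma_connect(): connection establishment with the server's rdma_accept()
  – rdma_post_send(): data to the server
  – rdma_get_recv_comp(): data from the server
  – rdma_disconnect()
  – rdma_dereg_mr()
  – rdma_destroy_ep()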

Page 10

librdmacm – initialization

• rdma_create_event_channel()
  – Opens a channel used to report communication events. Asynchronous events are reported to users through event channels; each event channel maps to a file descriptor.

• rdma_create_id()
  – Allocates a communication identifier that is used to track communication information, much like a socket fd.

Page 11

librdmacm – active connection steps

• rdma_resolve_addr()
  – Resolves destination (and optional source) IP addresses to an RDMA address. If successful, the specified rdma_cm_id is bound to a local device. Comparable to getaddrinfo() in the socket API.

• rdma_resolve_route()
  – Resolves the route information needed to establish a connection. Called on the client side after rdma_resolve_addr() but before rdma_connect().

• rdma_connect()
  – Initiates an active connection request.

Page 12

librdmacm – passive connection steps

• rdma_bind_addr()
  – Binds an RDMA identifier to a source address.

• rdma_listen()
  – Listens for incoming connection requests.

• rdma_accept()
  – Called to accept a connection request.

Page 13

librdmacm – data transfer

• rdma_post_read(), or ibv_post_send() with opcode == IBV_WR_RDMA_READ
  – RDMA read

• rdma_post_write(), or ibv_post_send() with opcode == IBV_WR_RDMA_WRITE
  – RDMA write

• Example: librdmacm/examples/rping.c

Page 14

librdmacm – Abbreviations

• QP: queue pair
• CQ: completion queue
• WQ: work queue
• MR: memory region
• PD: protection domain
• SRQ: shared receive queue
• AH: address handle
• MW: memory window

Page 15

libibverbs

• libibverbs is a library that allows programs to use RDMA "verbs" for direct access to RDMA (currently InfiniBand and iWARP) hardware from userspace.

• Linux implementation of RDMA verbs.

• Loads device-specific drivers for hardware support:
  – IB: libmthca, libmlx4, libipathverbs, libehca
  – iWARP: libcxgb3, libamso

Page 16

Install OFED on Fedora 12

http://docs.google.com/Doc?docid=0AYXBBIFwi6bqZGY5cm1jeGJfNjAzc2N6eGt2Mw&hl=en 

Page 17

Lustre

• File system clients
• Object Storage Servers (OSS): provide file I/O services
• Metadata Servers (MDS): manage the names and directories in the file system

Page 18

Lustre (cont.)

Page 19

Future work

• Run the OpenFabrics examples on netqos04.
• Configure Lustre on netqos04. A real cluster needs more machines; LPAR?
• OpenFabrics sources and RFC 5040/5041/5044.