Intel Confidential — Do Not Forward
SC’16 Intel OPA LNET Update
LNET Intel OPA
LNET
Designed to meet the needs of large-scale computing clusters
Optimized for very large node counts, high throughput
Works with most networks, supports RDMA
Intel® Omni-Path, Ethernet, InfiniBand*, ELAN, Myrinet*, etc.
LNET is independent of the Lustre file system
Abstracts network details from Lustre
Implemented as a set of kernel modules
LNET (continued)
Networks are given unique names
o2ib0, tcp0, tcp1
A Lustre Network Identifier (NID) identifies an interface as address@network
10.1.145.16@o2ib0
Includes native support for multiple networks
Accomplished via Lustre Network Drivers (LNDs)
InfiniBand via the o2ib verbs interface, with RDMA support
Ethernet via the TCP/IP interface
Lustre -> Network RPC API -> LNET -> LND -> Linux Driver
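The NID format above can be sketched as a small parser. This is an illustrative sketch, not part of Lustre: the names `Nid`, `parse_nid` and `network_name` are hypothetical, and only IPv4-based NIDs (such as o2ib and tcp) are handled. A network name is an LND type plus an optional network number, and an omitted number is taken as 0 (so `tcp` and `tcp0` name the same network).

```python
import ipaddress
import re
from typing import NamedTuple


class Nid(NamedTuple):
    address: str  # interface address (IPv4 only in this sketch)
    lnd: str      # network type served by an LND, e.g. "o2ib" or "tcp"
    net_num: int  # network number; an omitted number means 0


def parse_nid(nid: str) -> Nid:
    """Split a NID of the form address@network into its parts (hypothetical helper)."""
    address, sep, net = nid.rpartition("@")
    if not sep or not address:
        raise ValueError(f"not a NID: {nid!r}")
    ipaddress.IPv4Address(address)  # raises a ValueError subclass on a bad address
    # The LND type must end in a letter ("o2ib"); trailing digits are the network number.
    m = re.fullmatch(r"([a-z0-9]*?[a-z])(\d*)", net)
    if m is None:
        raise ValueError(f"bad network name: {net!r}")
    lnd, num = m.groups()
    return Nid(address, lnd, int(num) if num else 0)


def network_name(nid: Nid) -> str:
    """Network name with an explicit number, e.g. 'o2ib0'."""
    return f"{nid.lnd}{nid.net_num}"
```

For example, `parse_nid("10.1.145.16@o2ib0")` yields the address `10.1.145.16` on LND type `o2ib`, network number 0; the LND type is what selects the driver in the Lustre -> LNET -> LND stack.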
Intel® Omni-Path Architecture
Builds on the industry’s best technologies: highly leverages existing Aries and Intel® True Scale fabric
Adds innovative new features and capabilities to improve performance, reliability, and QoS
Re-uses existing OpenFabrics Alliance* software
Robust product offerings and ecosystem
End-to-end Intel product line
Strong ecosystem with 70+ Fabric Builders members
Software: open source host software and Fabric Manager
HFI adapters: single port, x8 and x16
• x8 adapter (58 Gb/s)
• x16 adapter (100 Gb/s)
Edge switches: 1U form factor, 24 and 48 port
• 24-port edge switch
• 48-port edge switch
Director switches: QSFP-based, 192 and 768 port
• 192-port director switch (7U chassis)
• 768-port director switch (20U chassis)
Cables: third-party vendors, passive copper and active optical
Silicon: OEM custom designs, HFI and switch ASICs
• Switch silicon: up to 48 ports (1200 GB/s total b/w)
• HFI silicon: up to 2 ports (50 GB/s total b/w)
LNET Intel® OPA Considerations
Base OS Support
Red Hat Enterprise Linux 7
SUSE Linux Enterprise Server 12
Lustre 2.7+ required for server-side OS support
Intel Fabric Suite (IFS) delivers driver updates
IFS updates base OS OFED components only as required
Enables concurrent use of other in-kernel drivers
LNET Intel® OPA Considerations (continued)
Intel® OPA Gen 1 supports RDMA verbs in OFED
LNET uses the existing InfiniBand* LND driver
Only LND and driver tuning is required for operation
LND settings are applied automatically at LNET install time
Intel® OPA Development in Lustre
New ko2iblnd-opa driver
Intel® OPA abstraction layer for LNET
ko2iblnd-opa default settings:
options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1
Allows the use of two different o2ib devices
Uses the same LND driver
Different settings for IB and Intel® OPA
Configured via Dynamic LNET Configuration
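Because IB and Intel® OPA use the same LND with different settings, it can help to read a settings line as data. The sketch below parses a modprobe.d `options` line, such as the ko2iblnd-opa defaults above, into a module name and a parameter map. It is a hypothetical helper (`parse_modprobe_options` and `OPA_DEFAULTS` are illustrative names, not Lustre tooling) and it assumes only the simple `options <module> key=value ...` form, including backslash line continuations.

```python
from __future__ import annotations

import shlex

# The ko2iblnd-opa defaults quoted on this slide.
OPA_DEFAULTS = (
    "options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 "
    "concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 "
    "fmr_flush_trigger=512 fmr_cache=1"
)


def parse_modprobe_options(text: str) -> tuple[str, dict[str, int | str]]:
    """Parse one modprobe.d 'options' line into (module, {param: value})."""
    # modprobe.d continues a line that ends in a backslash; join those first.
    joined = text.replace("\\\n", " ")
    # Shell-like splitting keeps quoted values such as networks="tcp1(eth1)" whole.
    tokens = shlex.split(joined)
    if len(tokens) < 2 or tokens[0] != "options":
        raise ValueError("expected: options <module> key=value ...")
    params: dict[str, int | str] = {}
    for tok in tokens[2:]:
        key, sep, value = tok.partition("=")
        if not sep:
            raise ValueError(f"parameter without a value: {tok!r}")
        # Numeric tunables become ints; everything else stays a string.
        params[key] = int(value) if value.isdigit() else value
    return tokens[1], params
```

Parsing `OPA_DEFAULTS` gives the module `ko2iblnd-opa` and nine tunables (for instance `peer_credits` = 128), which makes it easy to diff the OPA profile against whatever IB settings a site uses.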
Integration into Existing Fabrics
Challenge: Intel® OPA is not directly compatible with InfiniBand* (IB); Intel® OPA cannot plug into an IB switch
Solution: LNET Routers
Figure: Intel® OPA Lustre components connected to InfiniBand* Lustre components through LNet routers
LNET Routers Overview
LNET Routers: Lustre Software + Standard Hardware
Use off-the-shelf hardware
Software is part of standard LNET/Lustre
Clustered deployment recommended
Supported configurations with Red Hat* 7.2 and Lustre 2.7+
Intel® OPA -> Ethernet
Intel® OPA -> FDR (use in-kernel drivers for IB)
Intel® OPA -> EDR (use in-kernel drivers for IB)
See Intel® Enterprise Edition for Lustre* software: Configuration for LNET Routers
LNet Routing
LNet Routing
Clients and servers are endpoints, not routers
• Multi-homed servers can have multiple networks (dual-rail, etc.)
• Routers should be dedicated nodes with multiple networks
Module parameters identify routers
• All routing setups must be bidirectional, as with other routing setups
Initial route decision is based on destination NID
• NID on a local network, send directly
• NID on a remote network, consult routing table
The LNet routing table is in /proc
Routing decisions are based on hop count
• If there is a pool of routers, the message is sent to the router with the shortest queue
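The routing decision described above can be sketched as a pure function. This is a simplified model, not LNet's actual selection code: the `Route` record, its `queue_len` metric and the `next_hop` helper are hypothetical, and the real implementation weighs more state than shown here. It follows the slide's order: a local network is reached directly; otherwise the routing table is consulted, live routes with the fewest hops are kept, and the router with the shortest queue wins.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Route:
    gateway: str         # router NID reachable on a local network
    hops: int            # hop count to the remote network
    queue_len: int       # messages queued to this router (hypothetical metric)
    alive: bool = True   # as reported by router health checks


def next_hop(dest_nid: str, local_nets: set[str],
             routes: dict[str, list[Route]]) -> str:
    """Return the NID a message for dest_nid should be sent to."""
    dest_net = dest_nid.rpartition("@")[2]
    # NID on a local network: send directly.
    if dest_net in local_nets:
        return dest_nid
    # NID on a remote network: consult the routing table.
    candidates = [r for r in routes.get(dest_net, []) if r.alive]
    if not candidates:
        raise LookupError(f"no live route to {dest_net}")
    # Prefer the fewest hops, then the router with the shortest queue.
    fewest = min(r.hops for r in candidates)
    pool = [r for r in candidates if r.hops == fewest]
    return min(pool, key=lambda r: r.queue_len).gateway
```

Filtering on `alive` before comparing hop counts is what lets a pool route around a failed router: a dead gateway simply drops out of the candidate list.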
LNet Routing (cont.)
LNet Routers often connect different hardware (IB to Intel OPA, etc.)
• Dedicated hardware tends to be expensive
• Can run on any node(s) with both network interfaces
• LNet routers only route LNet traffic
Routers are fairly simple
• Have connections to more than one LNet network
• Have forwarding enabled, to forward traffic between LNets
Routing includes a health check
• Disabled by default, but it is best to enable it
• Router checker can revive dead routers
Watch LNet router statistics
• Use the /usr/sbin/routerstat command
LNet Routing - Configuration
Example using networks:
• Servers on LAN1 – 10.10.0.0/24
• Clients on LAN2 – 10.20.0.0/24
• Router on LAN1 and LAN2 at 10.10.0.20 and 10.20.0.29
Servers:
options lnet networks="tcp1(eth1)" routes="tcp2 10.10.0.20@tcp1"
Router:
options lnet networks="tcp1(eth1),tcp2(eth2)" forwarding="enabled"
Clients:
options lnet networks="tcp2(eth1)" routes="tcp1 10.20.0.29@tcp2"
Print the configured routes:
# lctl route_list
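Because every route must have a matching return path, it can help to generate the three `options lnet` lines from one description and check them together. The sketch below is hypothetical tooling (`lnet_options` and `missing_return_routes` are illustrative names). It assumes the `routes` string format of `"<net> <gateway NID> ..."` entries separated by `"; "` when a node has more than one remote network; verify that against your Lustre version before relying on it.

```python
from __future__ import annotations


def lnet_options(networks: dict[str, str],
                 routes: dict[str, list[str]] | None = None,
                 forwarding: bool = False) -> str:
    """Render one 'options lnet ...' line for a node."""
    nets = ",".join(f"{net}({iface})" for net, iface in networks.items())
    line = f'options lnet networks="{nets}"'
    if routes:
        # One entry per remote network: "<net> <gateway NID> [<gateway NID> ...]".
        entries = "; ".join(f"{net} {' '.join(gws)}" for net, gws in routes.items())
        line += f' routes="{entries}"'
    if forwarding:
        line += ' forwarding="enabled"'
    return line


def missing_return_routes(nodes: dict[str, tuple[set[str], dict[str, list[str]]]]
                          ) -> list[tuple[str, str]]:
    """List (peer, node) pairs where node routes to peer but peer cannot reply.

    nodes maps a node name to (its local networks, its routes).
    """
    missing = set()
    for name, (local, routes) in nodes.items():
        for remote in routes:
            for peer, (peer_local, peer_routes) in nodes.items():
                # Only peers on the remote network that share no network with us
                # need a route back.
                if remote in peer_local and not (local & peer_local):
                    if not any(net in peer_routes for net in local):
                        missing.add((peer, name))
    return sorted(missing)
```

With the server, router and client from this slide, `missing_return_routes` returns an empty list; dropping the client's `routes` entry makes it report `("client", "server")`, which is exactly the one-way setup the bidirectional rule warns against.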
LNet Routers – Pooling of Routers
Routers support a ‘pool’ model
• Routers discover each other, function as a pool (cluster)
• Monitor peer health and communicate state
• Route traffic around a failed peer
• Balance overall load across multiple routers
Router pooling does add some complexity versus a single router
• Routers feed back state information to the clients
• Clients process the state of each router in the pool
• Clients use this data to load-balance traffic across the entire pool of routers
Router pooling is easy to configure
• Clients are configured with the NIDs of all the routers
LNet Routers – Pool Configuration
Example configuration using networks:
• Servers on LAN1 – 10.10.0.0/24
• Clients on LAN2 – 10.20.0.0/24
• Routers on LAN1 and LAN2 at 10.10.0.20-29 and 10.20.0.20-29
Servers:
options lnet networks="tcp1(eth1)" routes="tcp2 10.10.0.[20-29]@tcp1"
Routers:
options lnet networks="tcp1(eth1),tcp2(eth2)" forwarding="enabled"
Clients:
options lnet networks="tcp2(eth1)" routes="tcp1 10.20.0.[20-29]@tcp2"
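The bracketed range in these `routes` values stands for a whole pool of gateway NIDs. The sketch below expands such an expression into individual NIDs so the pool can be inspected. It is a hypothetical helper (`expand_nid_range` is not a Lustre tool) that supports only one bracketed list per expression, with comma-separated numbers and `lo-hi` ranges; LNet's own syntax may accept more forms.

```python
import re


def expand_nid_range(expr: str) -> list[str]:
    """Expand e.g. '10.10.0.[20-29]@tcp1' into individual NIDs."""
    m = re.fullmatch(r"(.*)\[([0-9,\-]+)\](.*)", expr)
    if m is None:
        return [expr]  # a plain NID, nothing to expand
    prefix, body, suffix = m.groups()
    values = []
    for part in body.split(","):
        lo, sep, hi = part.partition("-")
        start = int(lo)
        end = int(hi) if sep else start
        if end < start:
            raise ValueError(f"descending range: {part!r}")
        values.extend(range(start, end + 1))
    return [f"{prefix}{v}{suffix}" for v in values]
```

Expanding the client's `10.20.0.[20-29]@tcp2` yields ten NIDs, 10.20.0.20@tcp2 through 10.20.0.29@tcp2, matching the ten routers listed at the top of this slide.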
LNet Routing – More Configurations
Configure router checker options on clients:
options lnet networks="tcp2(eth2)" \
auto_down=1 \
live_router_check_interval=60 \
dead_router_check_interval=60 \
check_routers_before_use=1 \
forwarding=disabled \
accept=none
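The router checker options above can be pictured with a small simulation. This is a simplified model under stated assumptions, not LNet's implementation: `RouterState`, `usable` and `checker_pass` are hypothetical names, `probe` stands in for pinging a router, and the `auto_down` behavior on failed sends is not modeled separately. It shows live and dead routers being pinged on their own intervals (`live_router_check_interval`, `dead_router_check_interval`), a dead router being revived when it answers, and `check_routers_before_use=1` keeping a router out of service until its first successful ping.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Callable


@dataclass
class RouterState:
    nid: str
    alive: bool = True
    last_ping: float = -math.inf  # never pinged yet


def usable(r: RouterState, check_before_use: bool) -> bool:
    """With check-before-use set, a router that was never pinged is not used."""
    if r.last_ping == -math.inf:
        return not check_before_use
    return r.alive


def checker_pass(routers: list[RouterState], now: float,
                 probe: Callable[[str], bool],
                 live_interval: float = 60, dead_interval: float = 60) -> None:
    """One pass of a simplified router checker (times in seconds)."""
    for r in routers:
        interval = live_interval if r.alive else dead_interval
        if now - r.last_ping >= interval:
            # A live router that fails is marked down; a dead one that answers revives.
            r.alive = probe(r.nid)
            r.last_ping = now
```

Running one pass at time 0 with only one router answering leaves the other marked dead; a pass 30 seconds later changes nothing, and the pass at 60 seconds revives the second router once it responds.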
Wrap up
Lustre on Intel® OPA is in production today
LNET routers provide flexible deployment options
Learn More
www.intel.com/Lustre