24
OFED for Linux: Status and Next Steps www.openfabrics.org 1 Betsy Zeller (Qlogic), Tziporet Koren (Mellanox) 3/16/2010

OFED for Linux: Status and Next Steps 1 Betsy Zeller (Qlogic), Tziporet Koren (Mellanox) 3/16/2010

Embed Size (px)

Citation preview

Page 1: OFED for Linux: Status and Next Steps  1 Betsy Zeller (Qlogic), Tziporet Koren (Mellanox) 3/16/2010

OFED for Linux: Status and Next Steps

www.openfabrics.org 1

Betsy Zeller (Qlogic), Tziporet Koren (Mellanox)3/16/2010

Page 2: OFED for Linux: Status and Next Steps  1 Betsy Zeller (Qlogic), Tziporet Koren (Mellanox) 3/16/2010

Agenda

• Linux OFED components• Releases completed in 2009:

– OFED 1.4.1 & 1.4.2– OFED 1.5

• Releases planned for 2010:– OFED 1.5.1– OFED 1.5.2/3

• Future releases:– OFED 1.6 and beyond

• How can you contribute?• Questions and discussion

– Planning for scalability and discussion on MPI inclusion

2www.openfabrics.org

Page 3: OFED for Linux: Status and Next Steps  1 Betsy Zeller (Qlogic), Tziporet Koren (Mellanox) 3/16/2010

Linux OFED Components

• HCA/NIC Drivers – IB: IBM, Mellanox, QLogic– iWARP: Chelsio, Intel– RoCEE: Mellanox

• Core: Verbs, mad, SMA, CM, CMA• IPoIB• SDP• SRP and SRP Target• iSER and iSER Target• RDS• NFS-RDMA • Qlogic_VNIC• uDAPL• OSM• Diagnostic tools

Bonding module Open iSCSI MPI Components

MVAPICH Open MPI MVAPICH2 Benchmark tests

Proprietary MPIs: Intel, Platform MPI

Proprietary SMs: Sun, Voltaire, Qlogic, Mellanox

OFA Development Add on

Tested with

3

Page 4: OFED for Linux: Status and Next Steps  1 Betsy Zeller (Qlogic), Tziporet Koren (Mellanox) 3/16/2010

OFED 1.4.1• Released on June 3, 2009• Distro integration:

– Integrated into RHEL 5.4– SLES 11

• New features – Added support for RHEL 5.3 and SLES11– NFS/RDMA: In beta quality with support for RHEL 5.2, 5.3 and SLES 10 SP2– Updated MPI packages: MVAPICH 1.1.0-3355, Open MPI 1.3.2– Updated bonding package: ib-bonding-0.9.0-40– Updated DAPL: compat-dapl-1.2.14 and dapl-2.0.19– Updated OpenSM version to include critical bug fixes– Fixed RDS iWARP support– Low level drivers updated: ehca, mlx4, cxgb3, nes, ipath, mthca– Added a module parameter to control number of MTTs per segment in Mellanox HCAs– mstflint update– Enhanced OpenSM and management tools, user interface, HA, routing enhancements,

and more

• Used in Intel ® Cluster Ready Solutions 4

Page 5: OFED for Linux: Status and Next Steps  1 Betsy Zeller (Qlogic), Tziporet Koren (Mellanox) 3/16/2010

OFED 1.4.2

• Released on Aug 6, 2009– Critical bug fixes, including: – Fixes to NES (Intel iWarp) driver– Fixes to support running with Lustre installed– NFS/RDMA critical bug fixes (still technology preview)

• Distro integration:– RHEL 5.5 1.4.2+– SLES 11 SP1 1.4.2 (kernel from 2.6.32)

• Passed Oracle 11g certification with RDS

www.openfabrics.org 5

Page 6: OFED for Linux: Status and Next Steps  1 Betsy Zeller (Qlogic), Tziporet Koren (Mellanox) 3/16/2010

OFED 1.5

• Released on December 30, 2009– Kernel base: 2.6.30

• To be included in RHEL 5.6• Used in Intel ® Cluster Ready Solutions• List of Supported Kernels for OFED 1.5

– RHEL4: up6, up7, up8 – RHEL5: up2, up3, up4 – SLES10: SP2, SP3 – SLES 11 – Fedora Core 11* – OpenSuSE 11* – Kernel.org: 2.6.18-2.6.32** minimal QA for these versions.

www.openfabrics.org 6

Page 7: OFED for Linux: Status and Next Steps  1 Betsy Zeller (Qlogic), Tziporet Koren (Mellanox) 3/16/2010

OFED 1.5 – New features• New OSes: RedHat EL5.4, EL4.8 and SLES 10 SP 3• Kernel Base: 2.6.30

– New supported kernels: 2.6.29-2.6.32– Forward port to support kernels above 2.6.30

• Hardware driver (qib) for new Qlogic QDR HCA• Added ConnectX2 support to mlx4 driver• User space packages as tar balls for easier distro integration

– All libraries available under: http://www.openfabrics.org/downloads/

• SDP:– Zero Copy (beta); more performance improvements

• Bonding: Included only for older distros that does not include the IB support– RHEL 5.3 / SLES 10 SP2 or older version

7

Page 8: OFED for Linux: Status and Next Steps  1 Betsy Zeller (Qlogic), Tziporet Koren (Mellanox) 3/16/2010

1.5 – New features – Cont.

• uDAPL: – Scalability enhancements– New UCM provider– Common code base with WinOF 2.1.

• MPI updates:– OSU MVAPICH 1.2.0– Open MPI 1.4– OSU MVAPICH2 1.4– MPI tests 3.2

• NFS-RDMA in beta• Many bug fixes

www.openfabrics.org 8

Page 9: OFED for Linux: Status and Next Steps  1 Betsy Zeller (Qlogic), Tziporet Koren (Mellanox) 3/16/2010

1.5 – Open SM new features

Scalability & performance

• Optimized SL2VL setup

• Parallel LFTs setup

• Parallel MFTs setup

Routing & multicast

• FTree improvements

• Routing engine reloading

• Mesh switch reordering

optimizations

• MGID to MLID compression

QoS improvements

SL2VL setup optimization

QoS/LASH co-exist

Major bug fix

MCG join/leave fixes

Clean delayed MCG deletion

9

Page 10: OFED for Linux: Status and Next Steps  1 Betsy Zeller (Qlogic), Tziporet Koren (Mellanox) 3/16/2010

OFED 1.5.1

• To be released within the next week• Main new features:

– Added RoCEE support– Updated Open MPI to rev 1.4.2– Updated MVAPICH2 to rev 1.4-2– Updated DAPL to rev 2.0.26– Added extended atomic operations to ConnectX (kernel)– Improved NFS/RDMA stability

• Need more testing

– Added IPv6 support to RDMACM– Bug fixes

www.openfabrics.org 10

Page 11: OFED for Linux: Status and Next Steps  1 Betsy Zeller (Qlogic), Tziporet Koren (Mellanox) 3/16/2010

OFED 2010 Plans

• Proposal:– Provide additional dot releases throughout the year

• 1.5.2 – Jun• 1.5.3 – Sep

– Delay 1.6, and changing the kernel base, till Q1 2011

• Pros: – Improve the stability of OFED 1.5.x, faster bug fix turnaround– Integrate support for new distros as they become available– Logo program: Vendors will be able to fix issues found

• Cons: – Many patches in the fixes directory

• Opinions? …

www.openfabrics.org 11

Page 12: OFED for Linux: Status and Next Steps  1 Betsy Zeller (Qlogic), Tziporet Koren (Mellanox) 3/16/2010

OFED 1.5.2 New Features

• Add new OSes: RHEL 5.5, SLES11 SP1• Update the management package• Add-on packages that does not touch the core• New low level drivers for new HW• Critical bug fixes

• Schedule: June/July– Decide in EWG meetings

* OFED 1.5.3 – will decide if needed after 1.5.2www.openfabrics.org 12

Page 13: OFED for Linux: Status and Next Steps  1 Betsy Zeller (Qlogic), Tziporet Koren (Mellanox) 3/16/2010

OFED 1.6 New Features

• kernel base 2.6.35 or 36• Virtualization and SRIOV support• Mellanox Vnic/vHBA for BridgeX• New HW from vendors (if any)• New scalability features (if any)• Soft-RoCEE & Soft-iWarp drivers

– If ready can go into 1.5.x before

• New management features– 3D torus support by OpenSM

• MMU notifier for MPI – Solution that will be accepted by the kernel

• Improve connection management scheme?• More features – Based on input from customers

www.openfabrics.org 13

Page 14: OFED for Linux: Status and Next Steps  1 Betsy Zeller (Qlogic), Tziporet Koren (Mellanox) 3/16/2010

OFED 1.6 Schedule & OS support

• OS support reduction– Stop supporting old OSes (e.g. RHEL 4.x, SLES 10)

• Schedule suggestion:– Start to work during Q4 2010– Release in Q1 2011

• Opinions?

www.openfabrics.org 14

Page 15: OFED for Linux: Status and Next Steps  1 Betsy Zeller (Qlogic), Tziporet Koren (Mellanox) 3/16/2010

Reminder: How do you contribute?

• Develop new code and features• Bug fixes• Performance tuning• Contribute backports for new OSes and forward ports for

new kernels• QA and testing• Send patches and comments to the mailing lists:

[email protected] – OFED for Linux specific only– [email protected] – General Linux development

• Open bugs in Bugzilla (https://bugs.openfabrics.org/)– Choose OpenFabrics Linux when opening a new bug– Test old bugs with new releases and update on bugzilla

• Participate in EWG bi-weekly meetings15

Page 16: OFED for Linux: Status and Next Steps  1 Betsy Zeller (Qlogic), Tziporet Koren (Mellanox) 3/16/2010

Open Discussion

• Should we have a scalability roadmap for OFED? How do we plan for scalability?

• Should we continue to include MPIs in the OFED releases?

• What’s the next step on MMU notifier kernel support for MPI?

• Questions and feedback from the community

16

Page 17: OFED for Linux: Status and Next Steps  1 Betsy Zeller (Qlogic), Tziporet Koren (Mellanox) 3/16/2010

Scalability Challenges

• Scale out to 10K-20K or more nodes– Performance– Reliability– Subnet management

• Focus additional attention/resources on issues– Should we have a BOF on this?

www.openfabrics.org 17

Page 18: OFED for Linux: Status and Next Steps  1 Betsy Zeller (Qlogic), Tziporet Koren (Mellanox) 3/16/2010

Infrastructure Scalability/Features • Improve multicore affinity/awareness• Multicast• Flow control for SRQ• Fault tolerance• IB monitoring

– Performance counters, throughput, hotspots, degraded links

• Adaptive routing– HCA out-of-order delivery– Switch logic for state info & adaptive algorithm, etc.

• Path Record Support• RDMA CM

18

Page 19: OFED for Linux: Status and Next Steps  1 Betsy Zeller (Qlogic), Tziporet Koren (Mellanox) 3/16/2010

MPI Distribution in OFED: Rationale

• Open source MPI’s initially included to “bootstrap” the OFED project

• MPI was the main user for OFED, so this seemed like a natural pairing

– Made it (significantly) easier for customers to get their MPI jobs running on InfiniBand

– However, distros don’t like it – they package their own

• QA testing of MPI + OFED is still extremely valuable

– This is not a discussion of removing MPI + OFED QA

• How many use MPI from OFED, vs compiling own?19

Page 20: OFED for Linux: Status and Next Steps  1 Betsy Zeller (Qlogic), Tziporet Koren (Mellanox) 3/16/2010

MPI Distribution in OFED: Pros

• MPI is still the most common OFED “customer”

• HPC customers get network stack + MPI in one package

– Helps rapid MPI deployment on new clusters (out-of-box)

– MPI-selector function allows to select MPI stack of choice during the

installation

• Customers get QA assurance of specific MPI + OFED

version tuples

• Helps to test multiple functionalities of the OFED stack and

IB/iWARP fabric with comprehensive suite of MPI-level

benchmarks20

Page 21: OFED for Linux: Status and Next Steps  1 Betsy Zeller (Qlogic), Tziporet Koren (Mellanox) 3/16/2010

MPI Distribution in OFED: Cons

• MPI’s have their own QA cycles

– MPI+OFED QA testing is more for OFED, not MPI

• Bundling induces project scheduling difficulties between OFED and various MPI packages

• RedHat and SuSE both say “Don’t do this!”

– They both already include the open source MPI’s

– Makes it more difficult for them to take OFED drops

• Many users will download the latest-n-greatest MPIs anyway – not the ones included in OFED

21

Page 22: OFED for Linux: Status and Next Steps  1 Betsy Zeller (Qlogic), Tziporet Koren (Mellanox) 3/16/2010

User Level MMU Notification

• Userlevel registration caches need to be notified when registered memory is freed– Problem for middleware that hides memory registration– For example: MPI

• Roland prototyped “ummunotify”– Jeff Sq. adapted Open MPI to use ummunotify– …but ummunotify was rejected by the kernel maintainers– KM’s want same functionality on performance counters

• MPI’s still need this functionality– Roland not available; Jeff Sq. will do the MPI work– Who can do the kernel side?

www.openfabrics.org 22

Page 23: OFED for Linux: Status and Next Steps  1 Betsy Zeller (Qlogic), Tziporet Koren (Mellanox) 3/16/2010

Questions and feedback from the community

www.openfabrics.org 23

Page 24: OFED for Linux: Status and Next Steps  1 Betsy Zeller (Qlogic), Tziporet Koren (Mellanox) 3/16/2010

Thank You!

www.openfabrics.org 24