
Page 1

Complete InfiniBand From QLogic

Wenhao Wu, HPC Systems Engineer
919-699-2951
[email protected]

Page 2

Meeting Agenda

• QLogic in the HPC market
• Sun/QLogic Engagements
• QLogic IB Portfolio – “Complete InfiniBand”
• QLogic IB stack and Protocols
• Discussions

Page 3

History of InfiniBand

In 1999, two separate initiatives merged to form the InfiniBand Trade Association (IBTA):

• NGIO, backed by Intel
• FIO, backed by IBM, Compaq, and HP

Charter members include Sun, HP, IBM, Dell, Microsoft, and Intel. A spec was written that all companies in the IB space strive to maintain (the current version is 1.2) for industry-standard interoperability.

Page 4

Common InfiniBand Terminology

Switches
• Provide “any-to-any” high-speed access within the IB network
• Currently available in sizes from 12 nodes to 244 nodes
• Single Data Rate (SDR) at 10 Gb/s, Double Data Rate (DDR) at ~20 Gb/s, and Quad Data Rate (QDR) at 40 Gb/s

HCA – Host Channel Adapter
• Resides within a server
• Configurations include:
  - PCI-X: dual port with memory (133 MHz optimal, 64-bit)
  - PCI Express (PCI-e x8): dual port with memory, dual port memory-free, single port memory-free
  - HyperTransport

Page 5

QLogic Provides Leading InfiniBand Technology

Acquisition and business-group timeline (from the slide graphic):
• 1993 – Emulex Micro Devices
• 2000 – Ancor Communications
• 2001 – Little Mountain Group
• 2005 – Troika Networks
• April 2006 and November 2006 – the two InfiniBand acquisitions (PathScale and SilverStorm, respectively)
• Business groups: Storage Solutions Group, Switch Products Group, Computer Systems Group

Page 6

Positioned for Success in InfiniBand

Financially strong public company
• $586M revenue in FY07
• $500M cash
• Revenue ($ millions): FY05 428.7, FY06 494.1, FY07 586.1

Acquired two leading InfiniBand companies in 2006
• Over 200 employees focused on InfiniBand and HPC

Strong and growing HPC business development team
Well-established global support team
Major investments in DVT and Signal Integrity

Demonstrated success in delivering the current generation of end-to-end solutions
• Directors offered over a year ahead of the nearest competition
• Over 55,000 external DDR InfiniBand switch ports shipped

Page 7

QLogic Supercomputing TOP100

QLogic is installed in 10 of 26 IB clusters in the Supercomputing TOP100:

Rank | Site | CPUs | RMax (TFLOPS) | Partner
11 | NNSA/Sandia | 9024 | 53 | Cisco/Dell
15 | TACC | 5848 | 46.7 | Cisco/Dell
16 | Maui MHPCC | 5200 | 42.4 | Cisco/Dell
17 | ARL | 4416 | 40.6 | Linux Networx
23 | Louisiana ONI | 5440 | 34.8 | Cisco/Dell
44 | Cambridge | 2340 | 18.3 | Dell
54 | Stanford | 2208 | 15.6 | Cisco/Dell
57 | ARL | 3368 | 15.2 | Linux Networx
71 | Virginia Tech | 2200 | 12.2 | SilverStorm Direct
81 | Intel | 1140 | 10.8 | Alliance

Page 8

Industry’s First 500+ Node SDR Fabric – RIKEN

2004: #111 on the TOP500 after 3 years in operation

• Installation: Q1 2004. Operational: March 1, 2004
• 512-node cluster (1,024 CPUs) with InfiniBand, leveraging the combined technologies of Fujitsu and InfiniCon Systems
• Fujitsu rigorously evaluated InfiniCon’s products for data integrity, robustness, component selection, management interfaces, mechanical design, reliability, and numerous other criteria
• Benefits
  - Created one of the world's top-performing supercomputers (6.2 TeraFlops) using commodity components and industry-standard technologies
  - First 500-node cluster
  - Accelerated performance, completing compute tasks 20-45% faster than competitive clustering solutions

Page 9

Industry’s First 800+ Node DDR Network Deployment

2006: TI-06 with ARL & Linux Networx

The first and largest (at the time) DDR network deployment:
• 2 systems (1,100 and 842 compute nodes), 70 TFLOPS: June 2006
• #57 on the TOP500
• Largest DDR cluster at the time
• 842 compute nodes
  - 3,368 processors
  - 17 TFLOPS

*1 HABU = equivalent performance of the weighted DoD benchmark suite on a 1024-CPU IBM Power3 system

Page 10

First 1100+ Node DDR Cluster

June 2007: #17 on the TOP500

• Largest DDR cluster at the time
• 1,100 compute nodes (MJM)
  - 4,400 3.0 GHz Intel Woodcrest cores for computation
  - 53 TFLOPS
  - Increased computational capability by more than 64 HABUs* and 50 TFLOPS
• 28 management nodes
  - 112 3.0 GHz cores
  - Login, storage, and administration
  - 8.8 TB of memory and 200 TB of disk
• A global file system of 200 TB
  - IBM GPFS using QLogic Sockets Direct Protocol (SDP) for InfiniBand
  - Performance in excess of 9.5 GB/sec
• All nodes communicate via 4X DDR (20 Gbps) QLogic IB
• IB to 10GbE Gateway Module provides all nodes with uplink capability

Page 11

Engagements with Sun

Page 12

Sun/QLogic – Joint Efforts

• Sun Partner Advantage Program member
• Sun HPTC Alliance Partner
• Sun Solaris Ready Program member
• InfiniBand infrastructure supplier for the Sun Solution Center for HPC in Oregon: 1000+ node capability, high-density cluster, TOP500 SC listing
• InfiniBand infrastructure supplier for the Sun Standards Benchmark Lab in Oregon: 128 nodes plus FC I/O capability
• Lustre
  - Total at Houston: 512 cores with QLogic SDR InfiniBand adapters
  - QuickSilver stack certified with Lustre

Page 13

Sun/QLogic – Joint Efforts

• Schlumberger: joint efforts produced a certified Oil & Gas offering used in Sun RFQ bids

• Oracle RAC: joint efforts produced a finance reference demo

• Installs at Kuwaiti Oil Company, University of Granada, University of Oslo, Conoco-Alaska, SDSC, Penn State, Princeton, Oregon State, Univ. Catholique de Louvain, Univ. de Liege, BioInformatic Institute, UAE Weather

Page 14

InfiniBand Product Portfolio

Page 15

Anatomy of a 1000 Node Cluster

The slide diagram shows:
• 3-4 management nodes, 968 compute nodes, and 32 I/O nodes, each fitted with HCAs
• An IB Director plus IB edge switches forming the fabric
• Storage: xx TB FC SAN, xx TB NAS, IB SAN, and xx TB DAS

Page 16

Complete InfiniBand Solution

Widest range of host InfiniBand adapters

OFED Plus software stacks

High Capacity Multi-protocol Director

InfiniBand Edge Switches

Multi-protocol gateways

Cables

InfiniBand Fabric Management

Page 17

QLogic Host Channel Adapters

• First silicon alternative for IB HCAs
• First x16 DDR PCIe HCA
• Complete offering of the HCA IB family
  - x8, x16, SDR, DDR, dual port, single port, Mem, MemFree
• Next-generation HCA technology leadership
  - Very low latency coupled with a very high message rate
  - Industry-leading SPEC MPI2007 results
  - Lowest power consumption: less than 1/2 the power of alternatives

Page 18

Series 9000 InfiniBand Products

Core Fabric Switches and Multi-Protocol Fabric Director (4X SDR and 4X DDR; EVIC and FVIC gateway modules):
• 9240 - 288 Ports
• 9120 - 144 Ports
• 9080 - 96 Ports
• 9040 - 48 Ports
• 9024 - 24 Ports
• 9020 - 20 Ports
(Chassis form factors shown in the slide graphic: 2U, 4U, 7U, and 14U, in a 19” rack)

Common across the family:
• Same modules – IB, FC, Ethernet
• Same spine
• Same power supply unit
• Same fan tray unit
• Same OOB management interface
• Same serial interface
• Same running image

Page 19

Series 9000 Usage

# of Nodes | Switch Type
1 – 22 | 9020
1 – 24 | 9024 (Managed)
21 – 48 | 9040
49 – 96 | 9080
97 – 144 | 9120
145 – 288 | 9240
289 – 1,728 | 9120 (Core) + 9024 Unmanaged (Edge)
1,729 – 3,456 | 9240 (Core) + 9024 Unmanaged (Edge)
3,457 – 10,368 | 9240 (Core) + 9120 (Edge)
10,369 – 20,736 | 9240 (Core) + 9240 (Edge)

Page 20

QLogic IB Stack and Protocols

Page 21

IB Protocol

Page 22

Linux Software Stacks: Standards +

The slide diagram shows three host software stacks side by side:

• Accelerated stacks (InfiniPath adapters only): the iPath driver exposes the PSM API, used by QLogic MPICH, Open MPI, HP-MPI, Scali MPI, MVAPICH, and MVAPICH2. For InfiniPath adapters, additional acceleration is available!

• Standard stack: the Verbs provider driver feeds OpenFabrics Verbs and the OpenFabrics stack (VNIC, SDP, SRP, iSER, uDAPL), with IP transport through the standard Linux TCP/IP stack; MPIs include Intel MPI, Open MPI, and MVAPICH.

• QuickSilver stack: QuickSilver Verbs / VAPI and uDAPL support Intel MPI, Open MPI, QS-MVAPICH, Scali MPI, and HP-MPI, plus the enterprise features: Oracle RAC Accelerator (via RDS), GPFS Accelerator (via SDP), Enterprise SDP (NFS, inet, etc.), Enterprise Ethernet IO Controller (via VNIC), Enterprise IB Storage & FC IO Controller (via SRP), Enterprise Fabric Management, and FastFabric Tools.

Page 23

QLogic OFED+ Host Software Stack

All OFED benefits plus optional value-added capabilities.

Page 24

Host Software Support Model

The slide diagram shows how QLogic releases track the OpenFabrics Alliance development streams: periodic snapshots of the OFA stream become OFED 1.1, OFED 1.2, and OFED 1.3, and each of these is picked up, together with the QLogic value-adds, as QLogic OFED 1.1, 1.2, and 1.3. Bug fixes flow back along the same path, with maintenance split across the three stages: OFA maintained, OFED maintained, and QLogic maintained.

The model provides for the rapid resolution of customer issues!

Page 25

Rebuilding QuickSilver & QuickSilver MPI

Page 26

Build QuickSilver from Source

Get a copy of InfiniServ*GPL.tgz (and InfiniServNonGPL*.tgz if IFS2008 was purchased). Extract the packages, GPL first. Common missing packages include kernel-headers, kernel-source, x11-devel, g77, expect, and tcl. If an error occurs, parse make.res for “ERROR” or “Error”, or send a copy of make.res to [email protected].

[root@tsg67 ~]$ tar zxf InfiniServBasicGPL.4.1.1.0.15.tgz
[root@tsg67 ~]$ cd InfiniServ.4.1.1.0.15/ALL_HOST/
[root@tsg67 ALL_HOST]$ ./do_build
ICS CDE Environment Settings:
  Build Target    : X86_64
  Build Target OS : redhat LINUX 2.6.9-67.ELsmp
  Build Platform  : redhat LINUX
…
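If the build fails, make.res can be scanned for the error strings mentioned above; a minimal sketch using standard grep (the log file name is the one from this slide):

[root@tsg67 ALL_HOST]$ grep -n -i error make.res     # list every ERROR/Error line with its line number
[root@tsg67 ALL_HOST]$ grep -c -i error make.res     # count the matches; 0 indicates a clean log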

Page 27

Building QuickSilver from Source

Once complete, you will see a summary report of the build. The software is located in release/<OS>/<arch>/.

From this point you can proceed with a standard install. InfiniServ*G.tgz – the G suffix means this was a compilation from source.

[root@tsg67 ALL_HOST]$ ls -al release/redhat/X86_64/
total 258748
drwxr-xr-x 4 root root      4096 Apr 9 13:36 .
drwxr-xr-x 3 root root      4096 Apr 9 13:35 ..
drwxr-xr-x 9 root root      4096 Apr 9 13:35 InfiniServ.4.1.1.0.15G
-rw-r--r-- 1 root root 100045684 Apr 9 13:36 InfiniServ.4.1.1.0.15G.tgz
lrwxrwxrwx 1 root root        22 Apr 9 13:36 InfiniServBasic.4.1.1.0.15G -> InfiniServ.4.1.1.0.15G
-rw-r--r-- 1 root root 100044283 Apr 9 13:36 InfiniServBasic.4.1.1.0.15G.tgz

Page 28

Rebuilding QuickSilver MPI for Third-Party Compiler Support

Ensure the MPI source was installed (iba_config).

The following compiler build options are available if installed. (Ensure the compiler you need is in $PATH; a quick check is sketched after the list.)

[root@tsg67 mpich]$ cd /opt/iba/src/InfiniServMPI/mpich

Compiler options are:
• gnu_autoselect: choose gnu or gnu4 based on g77/gfortran availability (default)
• gnu: gcc, g77 (default if g77 is on the system)
• gnu4: gcc, gfortran (default if gfortran is on the system)
• path_x86_64: PathScale compiler for x86_64
• pgi_x86_64: Portland compiler for x86_64 (link symbols have two underscores)
• pgi_x86_64_nsu: Portland compiler for x86_64 (link symbols have a single underscore)
• pgi_x86_32: Portland compiler for Opteron with 32-bit OS (link symbols have two underscores)
• pgi_x86_32_nsu: Portland compiler for Opteron with 32-bit OS (link symbols have a single underscore)
• pgi_ia32: Portland compiler for IA32
• pgi_ia32_nsu: Portland compiler for IA32 (link symbols have a single underscore)
• ifc_ia32: Intel Fortran compiler for IA32
• ifc_x86_64: Intel Fortran compiler for X86_64
• ifc_ia64: Intel Fortran compiler for IA64
• ifcc_ia32: Intel Fortran and C compiler for IA32
• ifcc_x86_64: Intel Fortran and C compiler for X86_64
• ifcc_ia64: Intel Fortran and C compiler for IA64
• lf95_ia32: Fujitsu F95 compiler for IA32
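Since the build menu only offers compilers it can actually find, it helps to confirm the desired ones are visible in $PATH first. A small sketch; the compiler executable names below are illustrative and depend on which suites are installed on the build host:

[root@tsg67 mpich]$ command -v gcc g77 gfortran      # GNU toolchain, used by the gnu/gnu4 options
[root@tsg67 mpich]$ command -v pgcc pgf90            # Portland Group, used by the pgi_* options
[root@tsg67 mpich]$ command -v icc ifort             # Intel, used by the ifc_*/ifcc_* options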

Page 29

Rebuilding QuickSilver MPI

./do_config gives you command-line control to select the installation location and compiler type (for example, ./do_config pgi_x86_64 /opt/mpich_pgi). ./do_build automates this by searching your $PATH for additional compilers.

[root@tsg67 mpich]$ ./do_build
InfiniServ MPI Library/Tools rebuild
1) GNU_g77
2) pgi_x86_64
3) pgi_x86_64_nsu
Select Compiler: 1
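After the rebuild completes, a hedged sanity check that the new installation is the one being picked up; /opt/mpich_pgi is just the example prefix from the ./do_config line above, and mpicc -show is standard MPICH behaviour for printing the underlying compiler invocation:

[root@tsg67 mpich]$ export PATH=/opt/mpich_pgi/bin:$PATH
[root@tsg67 mpich]$ which mpicc mpif77 mpirun      # confirm the freshly built wrappers come first in $PATH
[root@tsg67 mpich]$ mpicc -show                    # print the underlying compile line (should name the PGI compiler here)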

Page 30

QuickSilver SM

Page 31

QLogic Subnet Manager - SM

What is it?
• The backbone of the InfiniBand spec
• The SM is responsible for initializing the fabric and coordinating subnet management responsibilities with other redundant subnet managers. Fabric initialization includes:
  1. Setting up the routing tables for unicast and multicast operation
  2. Setting up user-specified partitions for security/provisioning
  3. Setting up SL-to-VL mapping tables to configure QOS (quality of service) policies
  4. Setting up VL arbitration tables to configure packet priorities

Without an SM, the InfiniBand fabric will not be active. It will stay in an “init” state.

Page 32

QLogic Fabric Manager

Key Capabilities
• Very high performance
• Rapid response to fabric changes
• Highly scalable SM and SA
• Fully redundant operation
• Supports fabric verification and diagnosis tools
• Sophisticated routing algorithms
• Required for adaptive routing
• Will support virtual fabrics

The QLogic Fabric Manager comprises the SM (Subnet Manager), SA (Subnet Administrator), PM (Performance Manager), BM (Baseboard Manager), and FE (Fabric Executive) components.

Page 33

Fabric Manager: Scalability to Meet the Challenge

2048-node fabric initialization in <30 seconds
• 1024 nodes: <15 seconds
• 512 nodes: <6 seconds
• 128 nodes: <2 seconds
• Significant performance improvements every year

Rapid identification of fabric changes
• Essential to keeping the fabric up in the face of failures
• Trap-based mechanism allows identification in well under 1 second
• No need for frequent bandwidth-wasting sweeps

Highly scalable SA implementation
• Rapid response time speeds up application startup
• Handles a worst-case burst of queries without dropping any

Page 34

How to Enable the Subnet Manager

• Runs on the switch itself
• An SM key needs to be generated for the switch’s specific GUID and the serial number from the CD case
• Contact support

Page 35

Diagnostic Files

Page 36

Diagnostic Files - Review Port Information

[root@compute-0-1 port1]# p1info (FastFabric Tools shortcut)

[root@compute-0-1 port1]# cat /proc/iba/mt23108/1/port1/info

Port 1 Info

PortState: Active PhysState: LinkUp DownDefault: Polling

LID: 0x002D LMC: 0

Subnet: 0xfe80000000000000 GUID: 0x00066a00a00006ff

SMLID: 0x0000 SMSL: 0 RespTimeout: 32 ms SubnetTimeout: 4096 ns

M_KEY: 0x0000000000000000 Lease: 0 s Protect: Readonly

MTU: Active: 512 Supported: 2048

LinkWidth: Active: 4x Supported: 1-4x Enabled: 1-4x

LinkSpeed: Active: 2.5Gbps Supported: 2.5Gbps Enabled: 2.5Gbps

VLs: Active: 4+1 Supported: 4+1 HOQLife: 4096 ns

Capability 0x02010048: CR CM SL Trap

Violations: M_Key: 0 P_Key: 0 Q_Key: 0

ErrorLimits: Overrun: 0 LocalPhys: 15 DiagCode: 0x0000

P_Key Enforcement: In: Off Out: Off FilterRaw: In: Off Out: Off

Note: IPoIB will not be able to register if it is 1X
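When a node has more than one HCA or port, the same /proc files can be checked in one pass. A minimal sketch, assuming the path layout shown above (the mt23108 component will differ for other HCA models):

[root@compute-0-1 ~]# grep -H "PortState" /proc/iba/*/*/port*/info    # every fabric-facing port should report Active / LinkUp
[root@compute-0-1 ~]# grep -H "LinkWidth" /proc/iba/*/*/port*/info    # watch for a 1X active width, which prevents IPoIB from registering (see the note above)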

Page 37

Diagnostic Files - Reviewing Port Statistics

[root@compute-0-1 port1]# p1stats

[root@compute-0-1 port1]# cat stats

Port 1 Counters

Performance: Transmit

Xmit Data 78625 MB (0 Quads)

Xmit Pkts 92830192

Performance: Receive

Rcv Data 8792 MB (0 Quads)

Rcv Pkts 8292051

Errors: Async Events:

Symbol Errors 12 State Change 0

Link Error Recovery 0 Traps:

Link Downed 0 Link Integrity 0

Port Rcv Errors 0 Exc. Buffer Overrun 0

Port Rcv Rmt Phys Err 0 Flow Control Watchdog 0

Port Rcv Sw Relay Err 0 Capability Mask Chg 0

Port Xmit Discards 0 Platform Guid Chg 0

Port Xmit Constraint 0 Bad M-Key 0

Port Rcv Constraint 0 Bad P-Key 0

Local Link Integrity 0 Bad Q-Key 0

Exc. Buffer Overrun 0 Other 0

VL15 Dropped 0

To Clear counters: echo stats
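The counters above are cumulative, so the simplest way to see whether an error counter such as Symbol Errors is still climbing is to re-read the stats file at intervals. A small sketch using the standard watch utility and the path layout from the info example (the mt23108 component differs with other HCA models):

[root@compute-0-1 ~]# watch -n 10 -d cat /proc/iba/mt23108/1/port1/stats    # redisplay every 10 s, highlighting counters that changed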

Page 38

Diagnostic Files – iba_capture capture.tgz

iba_capture whatever.tgz will provide support with all relevant host information needed to diagnose a problem.

[root] less whatever.tgz
etc/sysconfig
tmp/capture19723
proc/sys
var/log
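A hedged example of producing a capture on a suspect node and peeking at its contents before sending it to support; the output file name is arbitrary, and tar tzf simply lists the archive contents:

[root@compute-0-1 ~]# iba_capture compute-0-1_capture.tgz        # gather configuration, /proc state, and logs into one archive
[root@compute-0-1 ~]# tar tzf compute-0-1_capture.tgz | head     # quick look at the captured directories (etc/, proc/, var/log, ...)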

Page 39

iba_capture – etc/

The etc/ directory of the capture file contains all the configuration information that the QuickSilver drivers use. This gives support a look at the system’s configuration and lets them verify parts of the installation.

[root@tsg68 whatever]$ ls -aR etc/
etc/:
.  ..  hosts  modprobe.conf  modprobe.conf~  modprobe.conf.dist  sysconfig

etc/sysconfig:
.  ..  firstboot  iba  ipoib.cfg  ipoib.cfg-sample  network-scripts

etc/sysconfig/iba:
.  ..  busdrv.conf  iba_mon.conf  iba_mon.conf-sample  iba_stat.conf  iba_stat.conf-sample  uvp.conf  version

etc/sysconfig/network-scripts:
.  ..  ifcfg-eth0  ifcfg-eth1  ifcfg-ib1  ifcfg-lo

Page 40

Discussions

Page 41

Topics to Discuss

• Which stack? QuickSilver or OFED+?
• Which MPI/tools/etc.?
• Multiple HCAs?
• Performance discussion: kernel benchmarks or application benchmarks?

Page 42

How to Utilize Multiple HCAs

Bonding? (The current state of the art is IPoIB only.)

Load balancing:
• By default, VIADEV_PATH_METHOD=4 is used. This causes all HCAs in the system to be used equally.
  - If a system has more than one HCA, this is equivalent to VIADEV_PATH_METHOD=3.
  - If a system has only one HCA, this is equivalent to VIADEV_PATH_METHOD=0.
• The MPI processes on a node will be evenly distributed among all HCAs with active ports.
• All HCAs with active ports must be connected to the same fabric as all other nodes in the job.
• Jobs may fail to start if the MPI job includes some nodes with one HCA and some nodes with multiple HCAs.
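For a specific run, the path method can be set explicitly on the launch command line. A minimal sketch, assuming the MVAPICH-style mpirun_rsh launcher that QS-MVAPICH derives from; the host names and application name are illustrative, and the VIADEV_PATH_METHOD values are the ones described above:

[root@tsg67 ~]$ mpirun_rsh -np 4 node01 node01 node02 node02 VIADEV_PATH_METHOD=0 ./my_mpi_app   # use a single HCA per node
[root@tsg67 ~]$ mpirun_rsh -np 4 node01 node01 node02 node02 VIADEV_PATH_METHOD=4 ./my_mpi_app   # default: balance across all HCAs with active ports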

Page 43

MPI Performance with QLogic HCAs

Comparison | Benchmark | QLE7280 DDR | ConnectX DDR | QLE7140 SDR | QuickSilver SDR~DDR
Point-to-Point Latency (0 byte) | OSU Latency | 1.4 µs | 1.5 µs | 1.56 µs | 3.5~2.6 µs
Point-to-Point Message Rate (non-coalesced) | OSU multi_bw @ 4 ppn | 11 M/s | 6 M/s | 8 M/s | 0.45~1.4 M/s
Scalable Message Rate (non-coalesced) | HPCC MPI Random Access @ 32 cores | .09 GUPs | .02 GUPs | .08 GUPs | –
Point-to-Point Bandwidth | OSU Bandwidth | 1.9 GB/s | 1.5 GB/s | 0.9 GB/s | 0.9~1.4 GB/s
Scalable Latency | HPCC Random Ring Latency @ 32 cores | 1.4 µs | 2.3 µs | 2.0 µs | –

Scaling results are for 8 nodes / 32 cores.
Latency results include a single switch crossing.

Page 44

Thank You!