InfiniBand diagnostics tools
HPC Advisory Council
Switzerland Workshop
March 21-23, 2011
Erez Cohen - Sr. Director of Field Engineering
© 2011 MELLANOX TECHNOLOGIES - MELLANOX CONFIDENTIAL - 2
OFED Tools
IBDIAG and other OFA tools
Single Node: ibv_devinfo, ibstat, ibportstate, ibroute, smpquery, perfquery
SRC/DST Pair: ibdiagpath, ibtracert, ibv_rc_pingpong, ibv_srq_pingpong, ibv_ud_pingpong, ib_send_bw, ib_write_bw
Network: ibdiagnet, ibnetdiscover, ibhosts, ibswitches, saquery, sminfo, smpdump
HCA Device information
ibstat • Displays basic information obtained from the local IB driver.
• Normal output includes firmware version, GUIDs, LID, SMLID, port state, link width active, and port physical state.
• Has options to list CAs and/or ports.
ibv_devinfo • Reports information similar to ibstat.
• Also includes the PSID and an extended verbose mode (-v).
/sys/class/infiniband • File system that reports driver and other ULP information.
- e.g. [root@ibd001 /]# cat /sys/class/infiniband/mlx4_0/board_id
MT_04A0110002
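The same sysfs tree can be read from a script. The following is a minimal sketch, not a definitive tool: only board_id appears on the slide, and the other attribute names (fw_ver, hca_type, node_guid, and the per-port state, phys_state, rate, lid, sm_lid files) are assumptions about what the driver exposes. Missing files are reported as None rather than treated as errors.

```python
from pathlib import Path

# board_id comes from the slide; the remaining names are assumptions
# about the attribute files the driver exposes.
DEVICE_ATTRS = ("board_id", "fw_ver", "hca_type", "node_guid")
PORT_ATTRS = ("state", "phys_state", "rate", "lid", "sm_lid")


def read_attr(path):
    """Return the stripped contents of one attribute file, or None if unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def hca_summary(root="/sys/class/infiniband"):
    """Collect device and per-port attributes for every HCA found under root."""
    summary = {}
    root = Path(root)
    if not root.is_dir():
        return summary  # no InfiniBand driver loaded, or a different layout
    for dev in sorted(root.iterdir()):
        info = {name: read_attr(dev / name) for name in DEVICE_ATTRS}
        ports = {}
        ports_dir = dev / "ports"
        if ports_dir.is_dir():
            for port in sorted(ports_dir.iterdir()):
                ports[port.name] = {n: read_attr(port / n) for n in PORT_ATTRS}
        info["ports"] = ports
        summary[dev.name] = info
    return summary
```

Calling hca_summary() on a host with an HCA returns one entry per device (for example mlx4_0); passing a different root makes the function easy to try against a copied directory tree.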
Node management utilities
perfquery • Obtains and/or clears the basic performance and error counters from the specified node.
• Can be used to check the port counters of any port in the cluster using 'perfquery <lid> <port number>'.
ibportstate • Queries or changes the state (e.g. disable) or the speed of a port.
- ibportstate 38 1 query
ibroute • Dumps routes within a switch.
smpquery • Dumps SMP query parameters, including:
- nodeinfo, nodedesc, switchinfo, pkeys, sl2vl, vlarb, guids
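Counter output from perfquery is easier to compare between runs once it is parsed. The sketch below assumes the text has one "Name:....value" pair per line with dot padding; that layout, and the sample text itself, are illustrative assumptions and were not captured from a real fabric.

```python
import re

# One "Name:.....value" pair per line; the dot padding is an assumption
# about the perfquery text layout.
COUNTER_LINE = re.compile(r"^(\w+):\.*(\S+)$")


def parse_perfquery(text):
    """Parse perfquery-style text into a dict; numeric values become ints."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line.strip())
        if not match:
            continue  # skips the "# Port counters" header and blank lines
        name, value = match.groups()
        try:
            counters[name] = int(value, 0)  # base 0 accepts decimal and 0x hex
        except ValueError:
            counters[name] = value  # keep non-numeric values as text
    return counters


# Illustrative sample only; the values are made up.
SAMPLE = """\
# Port counters: Lid 38 port 1
PortSelect:......................1
SymbolErrors:....................3
LinkRecovers:....................0
XmtData:.........................1024
"""

counters = parse_perfquery(SAMPLE)
nonzero_errors = {k: v for k, v in counters.items()
                  if k in ("SymbolErrors", "LinkRecovers") and v}
```

On a live system the text would come from running 'perfquery <lid> <port number>' and capturing its standard output.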
Cluster utilities
ibswitches • Lists all switches in the cluster.
ibhosts • Lists all HCAs in the cluster.
ibtracert • Shows the path between two LIDs (or between two port GUIDs when -G is given).
- [root@ibd001 mft-2.5.0]# ibtracert -G 0x0002c90300001481 0x0002c90300001489
From ca {0x0002c90300001480} portnum 1 lid 12-12 "ibd017 HCA-1"
[1] -> switch port {0x000b8cffff002772}[5] lid 39-39 "MT47396 Infiniscale-III Mellanox Technologies"
[6] -> ca port {0x0002c90300001489}[1] lid 15-15 "ibd012 HCA-1"
To ca {0x0002c90300001488} portnum 1 lid 15-15 "ibd012 HCA-1"
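The hop lines in the ibtracert output above have a regular shape, so they can be turned into records for scripting. This is a sketch under assumptions: the regular expression is fitted to the sample shown here, and the field names out_port and in_port reflect one reading of the bracketed numbers (exit port of the previous node, entry port of the next), which is not confirmed by the slide.

```python
import re

# Fitted to the hop lines of the sample output; other ibtracert versions
# may print a different layout.
HOP_LINE = re.compile(
    r'^\[(?P<out_port>\d+)\] -> (?P<kind>\w+) port \{(?P<guid>0x[0-9a-fA-F]+)\}'
    r'\[(?P<in_port>\d+)\] lid (?P<lid>\d+)-\d+ "(?P<name>[^"]*)"$'
)


def parse_hops(text):
    """Return one dict per hop line; 'From' and 'To' lines are ignored."""
    hops = []
    for line in text.splitlines():
        m = HOP_LINE.match(line.strip())
        if m:
            hops.append({
                "out_port": int(m["out_port"]),
                "kind": m["kind"],
                "guid": m["guid"],
                "in_port": int(m["in_port"]),
                "lid": int(m["lid"]),  # first value of the "lid a-b" range
                "name": m["name"],
            })
    return hops


# Output copied from the ibtracert example.
SAMPLE = '''\
From ca {0x0002c90300001480} portnum 1 lid 12-12 "ibd017 HCA-1"
[1] -> switch port {0x000b8cffff002772}[5] lid 39-39 "MT47396 Infiniscale-III Mellanox Technologies"
[6] -> ca port {0x0002c90300001489}[1] lid 15-15 "ibd012 HCA-1"
To ca {0x0002c90300001488} portnum 1 lid 15-15 "ibd012 HCA-1"
'''

hops = parse_hops(SAMPLE)
switch_lids = [h["lid"] for h in hops if h["kind"] == "switch"]
```

The list of switch LIDs can then be fed to 'perfquery <lid> <port number>' to inspect each hop along a suspect path.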
Cluster utilities - ibdiagnet / ibdiagpath
Integrated diagnostic tools
• Query the cluster topology and report any port errors and any link width or link speed mismatches.
• Automate calls to many “low level” operations.
Easy to use
• Similar flags, logs and reports for both tools.
• Reports use meaningful names when a topology file is provided.
ibdiagnet - Optional flags
-i <dev-index> -p <port-num> • Device index (0..N) and port number connected to the network
-o <out-dir> • Directory to write the reports to
-lw <1x|4x|12x> -ls <2.5|5|10> • Link width and link speed expected on every port in the network
-pm -pc • Perform an extensive error counter check, or clear the counters, respectively
-r • Perform extensive additional checks
-P • Sets the threshold for error levels and checks error counters against their absolute value. Without the -P flag, error thresholds are triggered only by the number of errors that accumulated DURING the ibdiagnet run.
-c • Number of packets to send on each link for error level checking
-h -v -V • Help, verbosity and revision flags respectively
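The -lw and -ls values translate into an expected link rate by simple arithmetic: lanes times per-lane signalling rate, reduced by the 8b/10b encoding overhead that applies at these speeds. The helper below is a small worked example; the function and table names are illustrative and are not part of ibdiagnet.

```python
# Values accepted by -lw (lane count) and -ls (per-lane rate in Gb/s).
LANE_COUNTS = {"1x": 1, "4x": 4, "12x": 12}
LANE_RATES_GBPS = {"2.5": 2.5, "5": 5.0, "10": 10.0}


def link_rates(width, speed):
    """Return (signalling_gbps, data_gbps) for an -lw/-ls pair.

    With 8b/10b encoding, 8 data bits are carried in every 10 bits
    on the wire, so the data rate is 8/10 of the signalling rate.
    """
    signalling = LANE_COUNTS[width] * LANE_RATES_GBPS[speed]
    return signalling, signalling * 8 / 10


ddr_4x = link_rates("4x", "5")    # the pair used with -lw 4x -ls 5
qdr_4x = link_rates("4x", "10")
```

For a 4x link at 5 Gb/s per lane this gives 20 Gb/s of signalling and 16 Gb/s of data, which is the ceiling to keep in mind when reading bandwidth test results.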
ibdiagnet usage
ibdiagnet is particularly useful for finding misconfigured links (speed/width), topology mismatches, and marginal link/cable issues.
Typical usage:
• Clear all port counters using 'ibdiagnet -pc'
• Stress the cluster
• Check the cluster using 'ibdiagnet -lw 4x -ls 5 -P all=1'
- Checks link speed, link width, and port error counters greater than 1
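The clear, stress, check cycle can be wrapped in a short script. The sketch below only uses the ibdiagnet flags quoted on the slide; the stress step is a hypothetical callable supplied by the caller, and the choice not to raise on a non-zero exit status of the final check is an assumption about how the result would be consumed.

```python
import subprocess


def ibdiagnet_clear_cmd():
    """Command that clears all port counters (step 1)."""
    return ["ibdiagnet", "-pc"]


def ibdiagnet_check_cmd(width="4x", speed="5", threshold=1):
    """Command that checks link width, link speed and error counters (step 3)."""
    return ["ibdiagnet", "-lw", width, "-ls", speed, "-P", f"all={threshold}"]


def run_cycle(stress, runner=subprocess.run):
    """Clear counters, run the caller's stress step, then check the fabric."""
    runner(ibdiagnet_clear_cmd(), check=True)
    stress()  # hypothetical callable, e.g. one that launches a benchmark job
    # check=False: leave interpretation of the exit status to the caller.
    return runner(ibdiagnet_check_cmd(), check=False)
```

The runner parameter exists so the sequence can be exercised without an InfiniBand fabric by passing a stand-in that records the commands.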
Cluster utilities - ibnetdiscover
Reports the complete topology of the cluster.
Shows all interconnect connections, reporting:
• Port LIDs
• Port GUIDs
• Host names
• Link Speed
A GUID-to-name file can be used to make the topology more readable with regard to switch devices.
Simple usage: ibnetdiscover --node-name-map <guid to name file>
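A GUID-to-name file is plain text with one GUID and one quoted name per line, plus optional comment lines starting with '#'. The sketch below generates such a file from a dictionary. The GUID is reused from the ibtracert example purely for illustration, and the name "rack1-leaf1" is hypothetical.

```python
def format_node_name_map(names):
    """Render {guid_as_int: name} as lines of the form: <guid> "<name>"."""
    lines = ["# GUID to name map for ibnetdiscover --node-name-map"]
    for guid, name in sorted(names.items()):
        lines.append(f'0x{guid:016x} "{name}"')
    return "\n".join(lines) + "\n"


# Hypothetical mapping: one switch GUID and a made-up friendly name.
NAME_MAP = {0x000B8CFFFF002772: "rack1-leaf1"}
name_map_text = format_node_name_map(NAME_MAP)
```

Writing name_map_text to a file and passing its path to --node-name-map should replace the generic switch description with the friendly name in the topology output.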
Error counter review
SymbolErrors • Total number of minor link errors. Usually an 8b/10b error due to a bit error.
LinkRecovers • Total number of times the Port Training state machine has successfully completed the link error recovery process.
LinkDowned • Total number of times the Port Training state machine has failed the link error recovery process and downed the link.
RcvErrors • Total number of packets containing an error that were received on the port. Usually due to a CRC error caused by a bit error within the packet.
RcvSwRelayErrors • Total number of packets received on the port that were discarded because they could not be forwarded by the switch relay. This counter should typically be ignored, since Anafa-II has a bug that counts these when it receives a multicast packet on a port that also belongs to the packet's multicast group.
XmtDiscards • Total number of outbound packets discarded by the port because the port is down or congested. Usually due to the output port HOQ (head-of-queue) lifetime being exceeded.
VL15Dropped • Number of incoming VL15 packets dropped due to resource limitations (e.g., lack of buffers) in the port.
XmtData, RcvData • Total number of 32-bit data words transmitted and received on the port.
XmtPkts, RcvPkts • Total number of data packets transmitted and received on the port.
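Because XmtData and RcvData count 32-bit words rather than bytes, turning two counter samples into a throughput figure needs a factor of 4 bytes (32 bits) per word. The helper below is a worked example of that arithmetic; the function name and the sample numbers are illustrative.

```python
WORD_BYTES = 4  # XmtData and RcvData count 32-bit words


def throughput_gbps(words_before, words_after, seconds):
    """Average data rate in Gb/s between two counter samples."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    delta_words = words_after - words_before
    if delta_words < 0:
        raise ValueError("counter decreased; it was probably cleared in between")
    return delta_words * WORD_BYTES * 8 / seconds / 1e9


# Illustrative numbers: 500 million words sent in one second.
rate = throughput_gbps(0, 500_000_000, 1.0)
```

In this example the result is 16 Gb/s. At that rate a 32-bit counter reaches its maximum in under nine seconds, so it helps to clear the counters first and keep the sampling interval short.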
Performance tests
Run performance tests
• /usr/bin/ib_write_bw
• /usr/bin/ib_write_lat
• /usr/bin/ib_read_bw
• /usr/bin/ib_read_lat
• /usr/bin/ib_send_bw
• /usr/bin/ib_send_lat
Usage
• Server: <test name> <options>
• Client: <test name> <options> <server IP address>
Note: The same options must be passed to both server and client. Use -h for all options.
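One way to respect the rule that server and client receive the same options is to build both command lines from a single option list. The sketch below does only that; the device option and the server address in the example are illustrative assumptions, so check the test's -h output before relying on any particular flag.

```python
def perftest_commands(test, options, server):
    """Return (server_cmd, client_cmd) built from one shared option list."""
    opts = list(options)
    server_cmd = [test, *opts]
    client_cmd = [test, *opts, server]  # the client adds the server address
    return server_cmd, client_cmd


# Hypothetical values: device name from the sysfs example, made-up address.
server_cmd, client_cmd = perftest_commands(
    "ib_write_bw", ["-d", "mlx4_0"], "10.0.0.1"
)
```

The server command is started first on one node and the client command on the other; because both lists come from the same options argument, they cannot drift apart.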
UFM Unified Fabric Management
Today's HPC Fabric Challenges
Troubleshooting • Undetected issues, unutilized fabric
• Troubleshooting takes a long time
• Separate systems
Performance • Unnoticed performance degradation
• Application-based class of service
• Multitenancy: tenants affecting each other
Operational • Size and complexity
• Separate systems
• Manual, error-prone change management
UFM Essence
Provides Deep Visibility • Real-time and historical monitoring of fabric health and performance
• Central fabric dashboard
• Unique fabric-wide congestion map
Optimizes performance • Quality of Service
• Traffic Aware Routing Algorithm (TARA)
• Multicast routing optimization
Eliminates Complexity • One pane of glass to monitor and configure fabrics of thousands of nodes
• Enable advanced features like segmentation and QoS by automating provisioning
• Abstract the physical layer into user friendly entities such as jobs and resource groups
Maximizes Fabric Utilization • Threshold based alerts to quickly identify issues
• Performance optimization for maximum link utilization
• Master-standby HA architecture synchronized in real-time
Open system
Extensible architecture based on Web-services • Open API for users or 3rd party extensions
• Expose entire fabric and datacenter object model
• API Documentation and example tools
Provides enhanced functionality in various areas • Group/batch device management tasks
• Enhanced functionality (e.g. e-mail event notifications)
• Export information to external portals to view system information
Integrated with Job Schedulers • Adaptive Computing: Moab
• Platform Computing: LSF
• Altair: PBS Pro
Features Detailed Overview
Dashboard Tab
View Tab
View Tab - Internal Structure & Properties
Internal structure
Properties
Common Tasks
Manage Devices Tab
Lists all the physical hardware components (servers or switches) for the selected site.
The information is displayed in tabular form and includes the following device information:
• State, ID, name, IP address, vendor, CPU type, RAM
• The rack it belongs to, the FW version, temperature
• Agent, and the logical server it belongs to
Design Window
Advanced Monitoring and Analysis
Monitor & analyze fabric performance • B/W utilization
• Unique congestion monitoring
• Dashboard for aggregated fabric view
Real-time fabric-wide health monitoring • Monitor events and errors throughout the fabric
• Threshold based alarms
• Granular monitoring of host and switch parameters
Innovative congestion mapping • One view for fabric-wide congestion and traffic patterns
• Enables root cause analysis for routing, job placement or resource allocation inefficiencies
Unique Monitoring Engine
Sessions per logical group, with no need to know the physical nodes
Multiple sessions
On demand
Correlate switch and host information
Various graphs (linear, bar, histogram, pie…)
Keep historical data, from 1 minute to 1 month
Formulas (AVG, Max, Min, Sum)
UFM’s Unique Traffic & Congestion Map
Traffic pattern and overall fabric condition
Identify many-to-one scenarios
… or non-optimized routing
… or slow receivers
… or non-optimized links
Saves many hours of troubleshooting
Innovative bandwidth and congestion representation that shows fabric health at a glance
Granular Fabric Control
Active tables: sortable, searchable and filterable
Real-time monitoring of port health and performance counters
Alarms per device view
Automate device management tasks
QDR CCM Aggregation from switches
All connected to the logical model
Event Management
Dozens of traffic and health events
Easy central drill-down to counters, alerts and events, down to the port level
Configurable thresholds and criticality levels
Alerts correlated to the application level
SNMP traps to 3rd party systems
Script-based actions
Performance Optimization Toolbox
Quality of Service
Application isolation
Collective offload RDMA messaging bus
Congestion Control
Traffic Aware Routing Algorithm
isolation
Quality of Service Optimization
UFM Enables Isolation and QoS Optimizations
Traffic Aware Routing Algorithm (TARA)
A unique new routing algorithm on top of OpenSM • TARA optimizes routing according to topology, jobs and traffic direction
TARA provides the following benefits • Reduces competition for fabric resources, thus decreasing congestion
• Increases available bandwidth, resulting in improved fabric utilization
• Delivers lower latency and shorter application runtime
Customer case • TARA improved performance by up to 300% (up to 4 times the bandwidth available to the application)
• The average improvement was 100% (available bandwidth doubled on average)
• The magnitude of the improvement depends on traffic patterns and available links
TARA increased the bandwidth available to the application fourfold
UFM TARA Improves Fabric Utilization
[Figure panels: "No UFM TARA" and "UFM TARA is ON"]
Integration with Job Schedulers
Automatic fabric provisioning per job
QoS and TARA performance optimization
Job-oriented monitoring and events
Supported schedulers:
• Moab (Adaptive Computing)
• LSF (Platform Computing)
• PBS Pro (Altair)
The first integrated solution that correlates fabric management and workload management for dynamic data centers
UFM in HPC Cluster
Workload submitted in the workload manager
Matching workloads automatically created in UFM
Application-level monitoring and optimization measurements
Fabric-wide policy pushed to match application requirements
Scaling Out
Large clusters pose management challenges
• The topology map is overloaded with devices and becomes inefficient for fabric analysis
• Slow discovery and updates
Optimizations made:
• Load the physical map only on demand
• User experience: the user sees the correlation between an action and the GUI response time
• Display only switch connectivity when the “switch” tree is selected
• Shorter update time
• “Cleaner” map view, no unnecessary clutter
The GUI map is populated only when the “play” button is pressed
“Only switches” view
Screenshot from a 4K-node cluster in the US
Summary: UFM Benefits
Fabrics w/o UFM
Little Fabric Visibility • Unnoticed performance degradation
• Difficult to assess impact
Low-Performing, Underutilized Fabrics • Arbitrary routing algorithms, QoS seldom implemented
• Congested fabrics, latency affected
Complex and Manual Processes • Needs admin skills
• Many options left entirely unused
Ineffective Troubleshooting • Long troubleshooting time
• Performance issues take days to analyze
UFM Customers
In-Depth Visibility and Control • Clear health and performance visualization
• Business-oriented impact and root cause analysis
Increased Performance • Reduced congestion, lower latency
• Quicker application runtime
Simple and Automated • Lowers administration task time from days to minutes
Quick Issue Resolution • Dashboard, alarms, congestion map
• Reduces downtime, high fabric utilization
Hands On
InfiniBand diagnostics tools – Hands On
Set up
• 2 servers with ConnectX HCA running SLES 11
• 8-port QDR IB switch based on InfiniScale IV switch silicon
Steps
• Check HCA state
• Review /sys/class/infiniband filesystem
• Inventory: ibswitches, ibhosts
• ibnetdiscover
• perfquery, ibportstate, smpquery
• ibdiagnet
• Performance test
Thank You www.mellanox.com