Sonexion 1600 Mellanox InfiniBand Switch Firmware Update 1.5


Contents

  About Sonexion 1600 Mellanox InfiniBand Switch Firmware Update 1.5
  Introduction
  Configure the Switch for the First Time
  Update Firmware on the Switch


About Sonexion 1600 Mellanox InfiniBand Switch Firmware Update 1.5

This publication describes procedures to update and configure the firmware in the Mellanox InfiniBand switch in a Sonexion 1600 system running release 1.5. The procedures are intended for Cray service technicians.

Typographic Conventions

Monospace            A monospace font indicates program code, reserved words or library functions, screen output, file names, path names, and other software constructs.
Monospaced Bold      A bold monospace font indicates commands that must be entered on a command line.
Oblique or Italics   An oblique or italic font indicates user-supplied values for options in the syntax definitions.
Proportional Bold    A proportional bold font indicates a user interface control, window name, or graphical user interface button or control.
Alt-Ctrl-f           Monospaced hyphenated text typically indicates a keyboard combination.

Record of Revision: publication HR5-6122

Note: For some releases, this subject was incorporated in the Release Upgrade Guide.

Publication Number  Date            Sonexion Release  Comments
HR5-6122-0          December 2012   1.2.0             First publication
HR5-6122-A          September 2015  1.5


Introduction

This procedure describes the steps required to install the required firmware level on the Mellanox IS5035 InfiniBand switches for Sonexion 900 and Sonexion 1600 systems running release 1.3.1, 1.4.0, or 1.5.0.

For Sonexion 900 systems, use this procedure only for Mellanox IS5035 switches that were purchased directly from Cray. If site personnel are uncertain whether the Sonexion 900 system's IS5035 switches were purchased from Cray, contact Cray Support.

Prerequisites

This section specifies the prerequisite information required before starting the procedure. The sections below detail the system access requirements, service interruption level, and estimated service interval required for the procedure.

System Access Requirements
For most Sonexion procedures, it is recommended that the technician log in as an administrative (admin) user and perform the procedure using CSCLI commands. However, root access is required to perform this procedure on a Sonexion system. If you do not have root system access, contact Cray Support.

Service Interruption Level
This procedure has a service interruption level of Interrupt. The procedure requires taking the Lustre file system offline.

Estimated Service Interval
The estimated length of time to complete this procedure is one hour. Schedule an appropriate service interval with the customer.

Required Files
The firmware image contains both management software and the 7.4.2200 firmware image. Contact Cray Support to obtain the latest firmware version.

Required Tools and Equipment

▪ Serial cable, RJ45 to DB9. Example: Cisco P/N: 72-3383-01

These null modem cables are "rolled over," meaning that pin 1 on connector A connects to pin 8 on connector B, pin 2 on connector A connects to pin 7 on connector B, and so on. The wiring on one connector is opposite to the other.

▪ Ethernet cable to connect the switch to the cluster’s management network

You may need more than one cable, depending on the number of Mellanox switches present in your cluster.

▪ USB-to-Serial adapter


If your laptop does not have a dedicated serial (DB9) port, you may need to use a USB-to-Serial adapter, shown in Figure 1. Refer to the adapter's instructions for installation and configuration details.

Figure 1. USB-to-Serial Adapter


Configure the Switch for the First Time

If the management interface has already been configured on the switch, go to the next topic, Update Firmware on the Switch.

1. Using the supplied serial cable, connect the host PC to the CONSOLE port (RJ-45) of the switch system. The CONSOLE ports for MTS3610, IS5030 and IS5XXX systems are shown in the following figure as examples. Use the serial cable shipped with the Mellanox switch or another compatible cable that has an RJ45 connector (switch end) and a DB9 connector (PC end).

Figure 2. CONSOLE Ports for MTS3610, IS5030 and IS5XXX systems

Figure 3. Settings for the Serial Connection

2. Log in (from a serial terminal program) as admin, using the site's password.


a. Configure the switch by running the following commands:

Do you want to use the wizard for initial configuration? no
switch-11a11a [standalone: master] > enable
switch-11a11a [standalone: master] # config t
switch-11a11a [standalone: master] (config) # interface eth0
switch-11a11a [standalone: master] (config interface eth0) # no dhcp
switch-11a11a [standalone: master] (config interface eth0) # ip address 172.16.250.10 255.255.0.0
switch-11a11a [standalone: master] (config interface eth0) # no shutdown
switch-11a11a [standalone: master] (config interface eth0) # exit
switch-11a11a [standalone: master] (config) # exit
switch-11a11a [standalone: master] # write memory
switch-11a11a [standalone: master] # show interface eth0

Following is an example output:

Do you want to use the wizard for initial configuration? no
switch-11a11a [standalone: master] > enable
switch-11a11a [standalone: master] # config t
switch-11a11a [standalone: master] (config) # interface eth0
switch-11a11a [standalone: master] (config interface eth0) # no dhcp
switch-11a11a [standalone: master] (config interface eth0) # ip address 172.16.250.10 255.255.0.0
switch-11a11a [standalone: master] (config interface eth0) # no shutdown
switch-11a11a [standalone: master] (config interface eth0) # exit
switch-11a11a [standalone: master] (config) # exit
switch-11a11a [standalone: master] # write memory
switch-11a11a [standalone: master] # show interface eth0
Interface eth0 state
Admin up: yes
Link up: no
IP address: 172.16.250.10
Netmask: 255.255.0.0
Speed: 10Mb/s (auto)
Duplex: half (auto)
Interface type: Ethernet
Interface source: physical
MTU: 900
HW address: 00:02:C9:11:A1:1A
Comment:
RX bytes: 0            TX bytes: 0
RX packets: 0          TX packets: 0
RX mcast packets: 0    TX discards: 0
RX discards: 0         TX errors: 0
RX errors: 0           TX overruns: 0
RX overruns: 0         TX carrier: 0
RX frame: 0            TX collisions: 0
                       TX queue len: 1000

The procedure above configures the IP address for the lower switch in the base rack. If your installation has more than one switch or more than one rack, assign IP addresses to the switches in the following order:

Base rack - lower switch - 172.16.250.10
Base rack - upper switch - 172.16.250.11

For the remaining racks, add "10" to the last octet of the IP addresses above, e.g.,

1st expansion rack - lower switch - 172.16.250.20
1st expansion rack - upper switch - 172.16.250.21
2nd expansion rack - lower switch - 172.16.250.30
2nd expansion rack - upper switch - 172.16.250.31

And so on. The netmask for all switches is 255.255.0.0.
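The addressing scheme above is plain arithmetic on the last octet, so it can be sketched as a small shell function. This is only an illustration: the function name switch_ip and the convention of numbering the base rack as 0 are assumptions made for this example and are not part of the Sonexion or Mellanox tooling.

```shell
# Hypothetical helper, not part of the Sonexion tooling.
# Usage: switch_ip RACK POSITION
#   RACK:     0 for the base rack, 1 for the 1st expansion rack, and so on
#   POSITION: lower or upper
switch_ip() {
    rack=$1
    position=$2

    # Accept only a plain non-negative integer without leading zeros.
    case "$rack" in
        ''|*[!0-9]*|0?*)
            echo "switch_ip: rack must be a non-negative integer" >&2
            return 1 ;;
    esac

    case "$position" in
        lower) offset=0 ;;
        upper) offset=1 ;;
        *)
            echo "switch_ip: position must be lower or upper" >&2
            return 1 ;;
    esac

    # Base rack starts at .10; each further rack adds 10 to the last octet.
    last_octet=$(( (rack + 1) * 10 + offset ))
    if [ "$last_octet" -gt 254 ]; then
        echo "switch_ip: rack $rack does not fit in the last octet" >&2
        return 1
    fi

    echo "172.16.250.$last_octet"
}

switch_ip 0 lower
switch_ip 2 upper
```

The two calls at the end print 172.16.250.10 and 172.16.250.31, matching the base rack lower switch and the 2nd expansion rack upper switch in the list above.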

The procedure to configure the switch is complete.


Update Firmware on the Switch

1. Stop client I/O and unmount all Lustre clients. The steps to complete these actions can vary by site configuration. In a basic configuration, run the client unmount command as the root user:

[Client]# umount lustre_mount_point

2. Log in to the primary MGMT node via SSH:

[Client]$ ssh -l admin primary_mgmt_node

3. Obtain the hostname of the MDS node (Sonexion 1600) or MGS/MDS node (Sonexion 900). The command output also shows the Lustre file system name, in case it is not already known:

[admin@n000]$ cscli fs_info

Sample output for a Sonexion 1600 system:

[admin@snx11000n000 ~]$ cscli fs_info
------------------------------------------------------------------------------------
Information about "ssetest" file system:
------------------------------------------------------------------------------------
Node          Node type  Targets  Failover partner  Devices
------------------------------------------------------------------------------------
snx11000n005  oss        4 / 4    snx11000n004      /dev/md1, /dev/md3, /dev/md5, /dev/md7
snx11000n004  oss        4 / 4    snx11000n005      /dev/md0, /dev/md2, /dev/md4, /dev/md6
snx11000n003  mds        1 / 1    snx11000n002      /dev/md66
snx11000n002  mgs        0 / 0    snx11000n003

Sample output for a Sonexion 900 system:

[admin@snx11000n000 ~]$ cscli fs_info
------------------------------------------------------------------------------------
OST Redundancy style: Declustered Parity (MDRAID)
Disk I/O Integrity guard (ANSI T10-PI) is not supported by hardware
------------------------------------------------------------------------------------
Information about "ssetest" file system:
------------------------------------------------------------------------------------
Node          Role    Targets  Failover partner  Devices
------------------------------------------------------------------------------------
snx11000n000  mgmt    0 / 0    snx11000n001
snx11000n001  mgsmds  1 / 1    snx11000n000      /dev/md66
snx11000n002  oss     4 / 4    snx11000n003      /dev/md0, /dev/md2, /dev/md4, /dev/md6
snx11000n003  oss     4 / 4    snx11000n002      /dev/md1, /dev/md3, /dev/md5, /dev/md7

If the MDS node (Sonexion 1600) or MGSMDS node (Sonexion 900) is in a failover state, its Targets value displays as 0 / 1 and the failover node displays as 1 / 0. In this case, use the listed failover partner in the next step.
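The choice between the MDS (or MGSMDS) node and its failover partner can be sketched as a filter over the cscli fs_info output. This is a rough illustration only: the function name mds_host is hypothetical, and the parsing assumes the column layout of the sample output above (node name, a one-word node type, Targets written as "N / M", then the failover partner). It is meaningful only while Lustre is still running, because a stopped file system also reports 0 in the first Targets position.

```shell
# Hypothetical helper: read "cscli fs_info" output on standard input and
# print the node to use in the next step.
# Assumes the column layout of the sample output:
#   Node, Node type, Targets (as "N / M"), Failover partner, Devices
mds_host() {
    awk '
        ($2 == "mds" || $2 == "mgsmds") && $4 == "/" {
            # First Targets value above 0: the node is serving its target.
            # Otherwise assume failover and use the listed partner.
            if ($3 + 0 > 0) print $1; else print $6
            exit
        }
    '
}

# Example (run on the primary MGMT node):
#   cscli fs_info | mds_host
```

Reading from standard input keeps the sketch independent of how the output is obtained, so it can also be tried against a saved copy of the command output.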

4. On the MDS node (Sonexion 1600) or MGSMDS node (Sonexion 900), confirm that all Lustre clients are unmounted:

[admin@n000]$ ssh mds/mgsmds_nodename "lctl get_param '*.*.exports.*.uuid'"

Sample output showing one client and Lustre running:

[admin@snx11000n000 ~]$ ssh snx11000n003 "lctl get_param '*.*.exports.*.uuid'"
[email protected]=2412630a-db8d-806a-09eb-0690c8e1e86b

Make every effort to identify and stop the Lustre clients. In the sample output, 172.18.1.188@o2ib represents a client with the Lustre file system mounted that needs to be addressed. If all clients cannot be unmounted, refer to the NOTE in Step 5 for command options.

Sample output showing no clients running and Lustre started on the Sonexion configuration:

[admin@snx11000n000 ~]$ ssh snx11000n003 "lctl get_param '*.*.exports.*.uuid'"
[admin@snx11000n000 ~]$

Sample output showing no clients running and Lustre not started on the Sonexion configuration:

[admin@snx11000n000 ~]$ ssh snx11000n003 "lctl get_param '*.*.exports.*.uuid'"
error: get_param: /proc/{fs,sys}/{lnet,lustre}/*/*/exports/*/uuid: Found no match
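The three cases above differ only in whether the command prints any export entries, so the check can be sketched as a filter that extracts client NIDs from the output. This is a minimal sketch under an assumption: each entry is taken to have the form <prefix>.exports.<NID>.uuid=<UUID>, inferred from the parameter pattern in the command. The function name list_client_nids and the prefix used in the usage comment are hypothetical.

```shell
# Hypothetical helper: read "lctl get_param '*.*.exports.*.uuid'" output on
# standard input and print each client NID once.
# Assumes entries of the form <prefix>.exports.<NID>.uuid=<UUID>.
# Lines that do not match, such as the "Found no match" error, print nothing.
list_client_nids() {
    sed -n 's/^.*\.exports\.\(.*\)\.uuid=.*$/\1/p' | sort -u
}

# Example (run on the primary MGMT node; mds_nodename is a placeholder):
#   ssh mds_nodename "lctl get_param '*.*.exports.*.uuid'" 2>/dev/null | list_client_nids
# Empty output means no client exports were reported.
```

An empty result covers both "no clients" cases in the sample output; any NID printed identifies a client that still needs to be unmounted or evicted.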

5. Once all clients have been identified and unmounted, stop the Lustre file system:

[admin@n000]$ cscli unmount -f filesystem_name

NOTE: If specific clients were not identified and unmounted, the Lustre file system unmount process can be forced by adding --evict --force to the unmount command.

Sample output:

[admin@snx11000n000 ~]$ cscli unmount -f testfs
unmount: No resources found on nodes snx11000n[000-001] for "testfs" file system
unmount: stopping testfs on snx11000n[002-003]...
unmount: stopping testfs on snx11000n[004-005]...
unmount: testfs is stopped on snx11000n[002-003]!
unmount: testfs is stopped on snx11000n[004-005]!
unmount: MGS is stopping...
unmount: MGS is stopped!
unmount: File system testfs is unmounted.


Sample output showing the use of forced client evictions:

[admin@snx11000n000 ~]$ cscli unmount -f testfs --force --evict
unmount: evicting lustre clients...
unmount: clients are evicted
unmount: No resources found on nodes snx11000n[000-001] for "testfs" file system
unmount: stopping testfs on snx11000n[002-003]...
unmount: stopping testfs on snx11000n[004-005]...
unmount: testfs is stopped on snx11000n[002-003]!
unmount: testfs is stopped on snx11000n[004-005]!
unmount: MGS is stopping...
unmount: MGS is stopped!
unmount: File system testfs is unmounted.

6. Verify that the Lustre file system has stopped on all nodes:

[admin@n000]$ cscli fs_info

Sample output for a Sonexion 1600 system:

[admin@snx11000n000 ~]$ cscli fs_info
------------------------------------------------------------------------------------
Information about "ssetest" file system:
------------------------------------------------------------------------------------
Node          Node type  Targets  Failover partner  Devices
------------------------------------------------------------------------------------
snx11000n005  oss        0 / 4    snx11000n004      /dev/md1, /dev/md3, /dev/md5, /dev/md7
snx11000n004  oss        0 / 4    snx11000n005      /dev/md0, /dev/md2, /dev/md4, /dev/md6
snx11000n003  mds        0 / 1    snx11000n002      /dev/md66
snx11000n002  mgs        0 / 1    snx11000n003

Sample output for a Sonexion 900 system:

[admin@snx11000n000 ~]$ cscli fs_info
------------------------------------------------------------------------------------
OST Redundancy style: Declustered Parity (MDRAID)
Disk I/O Integrity guard (ANSI T10-PI) is not supported by hardware
------------------------------------------------------------------------------------
Information about "ssetest" file system:
------------------------------------------------------------------------------------
Node          Role    Targets  Failover partner  Devices
------------------------------------------------------------------------------------
snx11000n000  mgmt    0 / 0    snx11000n001
snx11000n001  mgsmds  0 / 1    snx11000n000      /dev/md66
snx11000n002  oss     0 / 4    snx11000n003      /dev/md0, /dev/md2, /dev/md4, /dev/md6
snx11000n003  oss     0 / 4    snx11000n002      /dev/md1, /dev/md3, /dev/md5, /dev/md7


In the command output, under the Targets heading, look for "0" in the first character position, for example 0 / 4 or 0 / 1. If all values are "0", continue; otherwise, repeat Steps 5 and 6.
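The check on the Targets column can be sketched as a filter that succeeds only when every node row reports 0 in the first position. This is an illustration rather than a supported tool: the function name fs_stopped is hypothetical, and the parsing assumes the column layout of the sample output (node name, a one-word node type or role, then Targets written as "N / M").

```shell
# Hypothetical helper: read "cscli fs_info" output on standard input.
# Returns 0 if every node row shows 0 as the first Targets value.
# Returns 1 if any row shows a non-zero value or if no node rows are found.
fs_stopped() {
    awk '
        $4 == "/" && $3 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ {
            seen = 1
            if ($3 + 0 != 0) {
                # Report the node whose first Targets value is not 0.
                print $1 ": Targets " $3 " / " $5
                bad = 1
            }
        }
        END { if (bad || !seen) exit 1 }
    '
}

# Example (run on the primary MGMT node):
#   cscli fs_info | fs_stopped && echo "Lustre is stopped on all nodes"
```

Treating "no node rows found" as a failure is a deliberate choice in this sketch, so that an empty or unexpected output is never mistaken for a stopped file system.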

7. Change to the root user:

[admin@n000]$ sudo su -

8. Verify that the switch is connected to the cluster’s management network switch using the Ethernet cable.

9. Connect your laptop to the management network switch (using any available port) and launch a browser.

10. Enter the IP address assigned to the switch’s MGMT port.

11. When the Management Console appears, enter the login credentials (default login entries are admin/admin or admin/Cray). The Mellanox FabricIT Management Console launches, as shown in the following figure.

Figure 4. Mellanox FabricIT Management Console

12. Access the system modules by clicking the SYSTEM icon (second icon in the toolbar). The System Modules screen displays, as shown in the following figure.


Figure 5. Mellanox FabricIT: System Modules Screen

13. From the System Modules screen, select “FabricIT Upgrade” from the left-side menu, shown in the following figure.


Figure 6. Mellanox FabricIT: FabricIT Upgrade Screen

14. Load the new image-PPC file.

a. Select the “Install from local file” radio button.

b. Click “Choose File”.

c. Browse to the new firmware file (image-PPC) on your laptop and click “Open”. In this example, the new image-PPC file is version 1.1.2700, as shown in the preceding figure. The FabricIT Upgrade screen updates and displays the selected image-PPC file, as shown in the following figure.


Figure 7. Mellanox FabricIT: Install from Local File

15. Install the new image-PPC file by clicking the Install Image button at the bottom of the screen. The progress bar displays the image’s installation status, as shown in the following figure.


Figure 8. Mellanox FabricIT: Install Image (Progress Bar)

When the image is fully installed, the FabricIT Upgrade Status screen updates to indicate all steps are finished and the installation was successful, as shown in the following figure.


Figure 9. Mellanox FabricIT: Image Installation Success

16. Click the OK button to advance to the next screen.


Figure 10. Mellanox FabricIT: Installed FabricIT Image

17. Activate the newly installed image by clicking the reboot link (blue) at the bottom of the screen, as shown in the preceding figure. The System Reboot screen loads, as shown in the following figure.


Figure 11. Mellanox FabricIT: System Reboot Screen

18. Click the Reboot button to start the switch reboot process. The Reboot dialog box displays.

19. Confirm the reboot operation by clicking the Reboot button in the dialog box. Wait approximately 5 minutes for the switch to reboot. When the reboot is underway, the Management Console updates to indicate you are logged out of the switch, as shown in the following figure.

Figure 12. Mellanox FabricIT: Logged Out Message

When the reboot operation is complete, the switch firmware has been successfully updated.

20. Verify that the correct firmware level is running by logging back in to the switch and checking that the new image (correct image-PPC version) is indicated on the summary page.

21. If the Sonexion system contains additional IB network switches, repeat this procedure on each switch to update its firmware level.

After firmware is updated on the last switch, disconnect your laptop from the management switch, log in to the primary MGMT node, and start the Lustre file system:

[admin@n000]$ cscli mount -f filesystem_name

The Lustre file system restarts. This completes the procedure to upgrade firmware on a Mellanox 5035 IB network switch.
