Sonexion™ 3000 Release Notes (2.1.0-002) S-2533

Contents

1 About Sonexion™ 3000 Release Notes (2.1.0-002) S-2533
2 Sonexion 3000 Terms, Abbreviations, and Definitions
3 Software Versions and Requirements
4 What Is Supported in Sonexion 2.1.0
5 Sonexion 3000 Components and Hardware List
6 Bug Fixes, Features, Improvements, and Known Issues for 2.1.0-002
7 Firmware
8 Notices and Precautions

1 About Sonexion™ 3000 Release Notes (2.1.0-002) S-2533

This guide includes information about bugs, features, and components in the Sonexion 3000 (2.1.0-002) release.

Release 2.1.0 SU 002

This is the initial release of this publication. This version includes updated information about Sonexion software release 2.1.0-002, released November 2016. This information pertains only to model Sonexion 3000, not to earlier models.

Table 1. Record of Revision

Publication Title: Sonexion™ 3000 Release Notes 2.1.0-002 S-2533
Date: November 2016
Updates: This is the original release of this document, released for 2.1.0 SU 002.

Scope and Audience

This publication is written for Cray personnel and users to familiarize them with this release and model. It does not include information about installation, repair, or day-to-day operation of a Sonexion 3000 system.

Feedback

Visit the Cray Publications Portal at http://pubs.cray.com and make comments online using the Contact Us button in the upper-right corner, or email [email protected]. Your comments are important to us, and we will respond within 24 hours.

Typographic Conventions

Monospace           Indicates program code, reserved words, library functions, command-line prompts, screen output, file/path names, and other software constructs.

Monospaced Bold     Indicates commands that must be entered on a command line or in response to an interactive prompt.

Oblique or Italics  Indicates user-supplied values in commands or syntax definitions.

Proportional Bold   Indicates a GUI window, GUI element, cascading menu (Ctrl→Alt→Delete), or keystrokes (press Enter).

\ (backslash)       At the end of a command line, indicates the Linux® shell line continuation character (lines joined by a backslash are parsed as a single line).
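For example (a generic Linux command shown only to illustrate the continuation character; the command itself is not Sonexion-specific):

    # The trailing backslash joins the two physical lines into a single command line.
    grep -i "lustre" /var/log/messages \
        | tail -n 20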


Trademarks

The following are trademarks of Cray Inc. and are registered in the United States and other countries: CRAY and design, SONEXION, URIKA, and YARCDATA. The following are trademarks of Cray Inc.: APPRENTICE2, CHAPEL, CLUSTER CONNECT, CRAYDOC, CRAYPAT, CRAYPORT, DATAWARP, ECOPHLEX, LIBSCI, NODEKARE. The following system family marks, and associated model number marks, are trademarks of Cray Inc.: CS, CX, XC, XE, XK, XMT, and XT. The registered trademark LINUX is used pursuant to a sublicense from LMI, the exclusive licensee of Linus Torvalds, owner of the mark on a worldwide basis. Other trademarks used in this document are the property of their respective owners.


2 Sonexion 3000 Terms, Abbreviations, and Definitions

2U4N, 2U 4-Node Intel server
Generic term referring to the Intel 2U 4-Node server used for the Sonexion 2000 CMU and the CNG.

ADU, Additional DNE Unit
Deprecated term that refers to the additional MDS nodes and MDT storage supported by the Lustre DNE (Distributed Namespace) Phase 1 feature. For the 3000, this term is synonymous with the additional MMUs that may optionally be installed in up to 8 storage racks, 1 per rack.

Base MMU, Base Metadata Management Unit
The base MMU is the MMU that is always installed in the base rack. The base MMU provides two MDS nodes along with two MDTs. MDT0 functions as the default MDT and as the root MDT for DNE Phase 1. MDT1 requires DNE in order to be utilized.

Base Rack
The first rack in a Sonexion storage cluster, which contains the SMU and base MMU along with the rack networking infrastructure and from 1 to 6 SSUs.

CLI, Command Line Interface
A text-based interface that is used to operate software and operating systems.

CLP, Sonexion Linux Platform
Base OS used by all the rack components.

CNG, CIFS/NFS Gateway
2U4N server configured to export the Lustre file system to CIFS and NFS clients.

CMU, Cluster Management Unit
The Sonexion component that provides the physical deployment of the MDS, MGS, and MGMT server nodes and associated storage. This term is deprecated for the 3000 platform and is functionally equivalent to the combination of the SMU and base MMU installed in the base rack.

Critical, Critical Array State
The state of a GridRAID or MDRAID array where the subsequent failure of one more storage component may lead to data loss.

CSI, Cray Sonexion Installer
Sonexion software used for manufacturing and installing Sonexion systems.

CSSM, Cray Sonexion System Manager
Sonexion platform, software, and hardware management system.

CSMS, Cray Sonexion Management Server
Sonexion MGMT node. The primary and secondary instances of the CSSM software and all associated components and services running on a server node in the CMU.


CMU Storage
Sonexion storage enclosure dedicated to the CMU. Deprecated for the Sonexion 3000, as the SMU and MMU have their own storage resources.

Data Block
A component of a “parity group” (or “stripe”) containing actual user data, also referred to as a “data chunk” or “data unit.”

Degraded, Degraded Array State
The state of a GridRAID or MDRAID array operating with one failed storage component.

Distributed Spare, Distributed Spare Volume
The aggregate collection of distributed spare data blocks in a GridRAID array that comprises a single logical spare volume for the specific GridRAID array that contains it. Each distributed spare contains the equivalent of one physical drive’s worth of distributed spare space and is used as the target of the GridRAID reconstruction process and the primary data source for the GridRAID rebalance process.

DMN, Dual LMN
Refers to the “Dual Local Management Networks” (aka “Dual Management Networks”) feature.

DNE, Distributed Namespace
Lustre DNE Phase 1 feature supported in Lustre 2.5 that allows multiple MDS/MDT components to operate within a single file system.

EAC, Embedded Application Controller
SBB form factor x86-based application controller that provides the CPU platform for code executing as part of the Sonexion file system cluster components.

EAN, External Administration Network
Customer administration network, external to the Sonexion solution. Connected to the CSMS nodes in order to provide access to the CSSM software.

ECN, Enterprise Client Network
Refers to the 10GbE or 40GbE data network connecting non-Lustre enterprise clients to the optional CIFS/NFS Gateway (CNG).

ESM, Embedded Server Module
Deprecated term for an Embedded Application Controller (EAC), because it implies general server functionality that is not supported on the dedicated Sonexion Embedded Application Controllers (EACs).

ESU, Expansion Storage Unit
A 5U84 storage enclosure with two SAS EBOD controllers installed in place of the EACs.

Expansion Rack
The additional racks (beyond the base rack) in a Sonexion storage cluster that contain the rack networking infrastructure and some number of SSUs. Sometimes called "storage rack."

Failed, Failed Array State
The state of a GridRAID or MDRAID array that has experienced data loss and has been failed by the system.

GB/sec, Gigabytes per Second
10^9 bytes per second

Gbit/sec, Gigabit per Second
10^9 bits per second


GbE, Gigabit Ethernet
Ethernet standard that transmits at 1 gigabit per second.

GridRAID
Sonexion implementation of parity declustered RAID. A RAID level organization that combines RAID 6 data protection with a declustering methodology. GridRAID overcomes single-drive throughput bottlenecks by distributing parity groups and spare space across all storage components in an array.

ICL, Inter-Controller Link
A link that connects two controllers or two servers together. Used in Sonexion as a dedicated HA communication path.

ISL, Inter-Switch Link
A connection between two related switches.

KiB, Kibibyte
1024 bytes

LCN, Lustre Client Network
High-speed data network connecting Lustre clients to the Sonexion Local Data Switches (LDS).

ldiskfs, Lustre Disk File System
Lustre version of a patched Ext4 file system.

LDN, Local Data Network
A dual InfiniBand or 40GbE network with switches installed in all racks, connecting all servers and enclosures as needed and used as uplink points to the end-user client infrastructure.

LDS, Local Data Switch
A dual InfiniBand or 10GbE network switch installed in a Sonexion rack as part of the LDN and used for providing high-speed data connectivity. Used as uplink points to the end-user client infrastructure.

LMN, Local Management Network
A private 1GbE network connecting all Sonexion servers and enclosures.

LMS, Local Management Switch
A 1GbE switch installed in a Sonexion rack as part of the LMN and used for providing private management network connectivity for all Sonexion servers and enclosures.

Lustre®
Open source clustered file system trademarked by Xyratex/Seagate.

Lustre Servers
The set of Lustre servers that comprise the Lustre file system; includes the MGS, MDS, and multiple OSSes.

MDS, Metadata Server
Lustre server component that manages the Lustre file system metadata.

MDT, Metadata Target
Lustre component, a storage volume that holds the Lustre file system metadata.

MGMT, Management Server Node
One of two Sonexion management servers that provide management functions for the storage cluster.

MGMT0
The primary Sonexion management server, typically used for web access and SSH logins for managing the storage cluster.

MGMT1
The secondary management server, typically used to provide boot services to nodes in the storage cluster.

MGS, Management Server
Lustre server component that manages the Lustre MGT.

MGT, Management Target
Lustre component, the storage volume holding the Lustre file system management data that allows clients to discover, mount, and operate the file system.

MMU, Metadata Management Unit
2U24 with two EACs and associated storage; provides dual MDS nodes and dual MDTs. Lustre requires the use of the built-in DNE Phase 1 feature in order to make use of multiple MDTs and multiple concurrent MDSes.

NIS, Network Information Service
Maintains and distributes a central directory of user and group information in a network.

Normal, Normal Array Activity
Characterizes the activity of a GridRAID or MDRAID array that is engaged in processing I/O only and is not conducting any recovery, sync, or RAID checking activities.

Offline, Array Is Offline
The array is not available.

Optimal, Optimal Array State
The state of a GridRAID or MDRAID array where all drives in the array are operational without the involvement of spare volumes or dedicated hot spares. For GridRAID this is equivalent to the “Redundant 0/2” terminology.

OSS, Object Storage Server
Lustre server component that operates and manages the Lustre OSTs.

OST, Object Storage Target
Lustre component, a storage volume that holds Lustre file system data.

Parity Block
Component of a parity group that contains protection information for the group, derived from the set of data blocks in the parity group. Also referred to as a “parity chunk” or “parity unit.”

Parity Group
The set of “data blocks” and derivative “parity blocks” that together comprise a protected data set. Also referred to as a “stripe.”

RAID Check, RAID Consistency Check
The process whereby the system periodically checks that the parity information is consistent for every “parity group” (stripe) in the array. This process is sometimes referred to as “parity scrubbing.”
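As an illustration of the mechanism only (not a Sonexion-specific procedure; on Sonexion systems the scheduled raid-check performs this), a consistency check on a generic Linux MDRAID array is driven through the md sysfs interface; the device name below is a placeholder:

    # Trigger a parity consistency check on an MDRAID array (placeholder device md0).
    echo check > /sys/block/md0/md/sync_action
    # Observe progress and any detected parity mismatches.
    cat /proc/mdstat
    cat /sys/block/md0/md/mismatch_cnt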

RAS System, Reliability, Availability, Serviceability System
Sonexion feature providing system RAS features.


Rebalance, Rebalance Process
Phase 2 of the 2-phase GridRAID recovery process, whereby a GridRAID array essentially copies reconstructed data from a distributed spare volume in the array to a physical replacement drive, freeing the distributed spare volume when complete for future reuse.

Rebalancing, Rebalancing Array Activity
Characterizes the activity of a GridRAID array that is engaged in the rebalance phase of the recovery process.

Reconstructing, Reconstructing Array Activity
Characterizes the activity of a GridRAID array that is engaged in the reconstruction phase of the recovery process.

Reconstruction, Reconstruction Process
Phase 1 of the 2-phase GridRAID recovery process, whereby a GridRAID array reconstructs the data from a missing storage component onto one of the distributed spare volumes.

Recovering, Recovering Array Activity
Characterizes the activity of a GridRAID or MDRAID array that is engaged in the recovery process.

Recovery, Recovery Process
The process whereby a GridRAID or MDRAID array recovers from a storage component failure.

Rebuild, Rebuild Process
The single-phase recovery process whereby an MDRAID array reconstructs data for a failed drive and copies it to a dedicated replacement drive.

SED, Self-Encrypted Drive
A disk drive that automatically encrypts/decrypts data to/from the media.

SMU, System Management Unit
2U24 with dual EACs that provides two MGMT nodes and associated storage. There is always only one SMU in a Sonexion file system cluster, and it is always installed in the base rack. In conjunction with the base MMU, the SMU replaces the functionality of the earlier Sonexion CMU component.

Spare Volume, GridRAID Spare Volume or Distributed Spare Volume
The aggregation of the equivalent of one drive's worth of distributed spare space, considered collectively as a logical spare drive or volume and used as the target of the GridRAID repair operation.

SSU, Scalable Storage Unit
5U84 storage enclosure and dual EACs (Embedded Application Controllers); provides dual OSSes and associated storage.

SSU Addition
Refers to the process of increasing the storage capacity of a Sonexion file system by incorporating additional SSUs into the cluster.

SSU Expansion
Refers to the attachment of an ESU to each SSU, thus increasing the amount of storage managed by each SSU.

Storage Component
Refers to an individual drive when considered as part of a configured GridRAID or MDRAID array.


Storage Rack
See "Expansion Rack."

TB, Terabyte
10^12 bytes


3 Software Versions and Requirements

This section provides information about the environment and software required for the Sonexion 2.1.0-002 software release.

Cray Sonexion System Manager (CSSM) Version

Version             CSSM                                                            Sonexion 3000
Current Revision    CSSM 2.1.0 Build v2.1.0-r29315, 2016-06-30                      yes
                    SMU/MMU/SSU: GOBI OneStor USM STX_GOBI_R1.16; ESU: USM r4.1.16

Lustre Server (x86_64 Architecture)

Version            Operating System        Kernel                             File System
Current Version    Scientific Linux 6.5    2.6.32-431.17.1.x2.1.32.x86_64     lustre-2.5.1.x7-241_2.6.32_431.17.1.x2.1.32.x86_64_g541638b

Required Customer-Supplied Network Infrastructure*

DHCP Server    Provides the MGMT nodes’ IP addresses for browser connections (the customer can choose to use a static IP address configuration for the “public” interfaces on the MGMT nodes)
NTP Server     Synchronizes clocks across the cluster’s nodes
DNS Server     Resolves LDAP and NTP hostnames on the MGMT nodes

* Manual workarounds may be available for environments without these servers. Contact your support representative for more information.

To view Lustre performance information, port 3306 must be open between the browser and the server hosting CSSM (GUI).
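For example, reachability of port 3306 can be checked from the browser host with a standard utility such as netcat; the hostname below is a placeholder for the node hosting the CSSM GUI at your site:

    # Placeholder hostname; substitute the CSSM (primary MGMT) node.
    nc -vz cssm-mgmt0.example.com 3306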


4 What Is Supported in Sonexion 2.1.0

Qualified Functionality

● Installation and deployment of Sonexion 3000 systems with the following SSU configurations:

○ Single or Multi-SSU (SSU Only)

○ SSU + Single ESU (SSU+1)

● High Availability:

○ SSU Node Failover/Failback

○ SMU Node Failover/Failback

○ MMU Node Failover/Failback

○ Dual Management Network Switch Redundancy (DMN)

○ Dual PDU Redundancy

● Lustre 2.5.1: Lustre Performance Monitoring of LMT

● Kernel/OS: Scientific Linux (SL) 6.5 / OS 6.2

● CSCLI - Sonexion Command-Line Interface

● CSSM - Sonexion Manager

● Chrome, Firefox, Safari, and Internet Explorer 11 browsers for Windows, Linux, and MacOS

● Support File Bundle Collection

● High Speed Interface: Mellanox CX-4 - EDR / FDR HCA

● RAID Stack: Updates to optimize GridRAID and SCSI performance.

● Drives: All 4K Native, T10-PI Format Type 2

○ Seagate Thunderbolt 10K 900GB

○ Seagate Valkyrie 15K, 300GB

○ Seagate Tardis (HPC) 10K, 4TB (SED)

○ Seagate Makara 4TB, 6TB (SED)

○ Seagate Makara+ 8TB (SED)

○ Seagate Gibson SSD 800GB (SED)

● FRU Replacements

● RAS (Reliability, Availability, Serviceability)

○ RAS Infrastructure: CLI, Nagios and Ganglia plugins, REST API

○ Guided Walkthrough Repairs: 2U24 / 4U24 / 5U84 Drives, 2U24 / 4U24 PCMs, 2U Quad Server PSUs

○ Fault Isolation: 5U84 cooling module, 2U24 / 5U84 I/O controllers


○ Telemetry: service events, IEMs (Interesting Event Messages), inventory snapshots

● Extra MMU (ADU/DNE) Addition Procedure

● SSU Addition Procedure (SSU Only and SSU+1)

New Functionality

● Introduction of split SMU/MMU architecture that combines to make a CMU (see Sonexion 3000 Hardware Guide H-6144)

● Introduction of Mellanox IB EDR capabilities

● Introduction of next generation AP controller platform (Laguna Seca) using Intel Haswell CPUs

● Support for Mellanox SB7790 EDR Switches

● Support for Mellanox CX-4 HCAs

● Support for 4K Native and SED Hard Drives

● Support for 10K RPM HPC Drive

● Support for GOBI OneStor USM STX_GOBI_R1.16

NOT Supported in Sonexion 2.1.0

● CNG

● 40Gb Ethernet Data Fabric

● Intel Omni-Path Data Fabric


5 Sonexion 3000 Components and Hardware List

Sonexion 3000 is a next-generation HPC storage platform that delivers industry-leading performance and durability using a 12Gb SAS architecture and the Intel Grantley/Haswell platform. The Sonexion 3000 platform builds upon Sonexion’s history of HPC excellence by offering substantial upgrades and enhancements to system components and hardware. For a more thorough discussion of this hardware, see the Sonexion 3000 Hardware Guide H-6144.

CMU
The re-engineered CMU consists of two sub-components: the System Management Unit (SMU) and the Metadata Management Unit (MMU), housed in separate 2U24 enclosures, which replace the Intel quad server and adjacent EBOD from the previous models.

SMU -- dual MGMT nodes in an HA pair:

● 2U24 enclosure

○ Dual PSUs

● Dual basic EACs

● 12 drives

○ 7 x Thunderbolt 10K HDDs (900 GB 2.5-inch)

○ 5 x Valkyrie 15K HDDs (300 GB 2.5-inch)

MMU -- dual MDS nodes in an HA pair:

● 2U24 enclosure

○ Dual PSUs

● Dual standard EACs

● 22 drives

○ 22 x Thunderbolt 10K HDDs (900 GB 2.5-inch)

SSU
Each SSU hosts dual OSS nodes in an HA pair.

● 5U84 G2 enclosure

○ Dual PSUs

● Dual standard EACs

● 84 drives

○ 82 x HDDs (SAS 3.5-inch)

○ 2 x SSDs (SAS 2.5 inch)

ESU
The ESU uses the 5U84 G2 enclosure with 6Gb EBOD controllers and high-capacity HDDs.

● 5U84 G2 enclosure

○ Dual PSUs

● 6Gb EBOD controllers

● 82 HDDs (SAS 3.5 inch)

Management Switches
Dual Brocade ICX6610 switches are used for the management network (LMN).

● Base rack

○ Dual Brocade ICX6610 switches (24-port or 48-port, 1 GbE)

● Expansion rack

○ Dual Brocade ICX6610 switches (24-port, 1 GbE)

Network Switches
Dual Mellanox SB7790 EDR switches are used for the Lustre client network (LCN).

● Dual Mellanox SB7790 EDR (36-port, 100Gb InfiniBand)

5U84 G2
The re-engineered 5U84 enclosure (5U84 G2) offers the following features:

● Enhanced LED display

● Improved drawer release

● Redesigned side card cover

● Improved sensor placement

EACs
Two EACs, basic and standard, are supported:

● Basic EAC

○ 64 GB DRAM

○ E5-2609 v3 CPU

○ 12Gb SAS controller

○ Dual 128GB SSDs

○ FDR IB


Figure 1. EAC for SMU Nodes (Basic EAC)

● Standard EAC

○ 64 GB DRAM

○ E5-2618L v3 CPU

○ 12Gb SAS controller

○ Single 128GB SSD

○ EDR IB

○ 12 Gb SAS card

Figure 2. EAC for MMU and SSU (Standard EAC)


6 Bug Fixes, Features, Improvements, and Known Issues for 2.1.0-002

This section lists the bug fixes, features, improvements, and known issues for 2.1.0-002 at the time of this writing.

Bug Fixes

799286 R35 SSU6 Disk Monitor reports SSDs as Hot Spares

808825 Heartbeat loss: CPUs executing ldlm_bl tasks on behalf of drop_caches

810767 Multiple Deactivated and Nearly Full OSTs Resulted in High OSS Node Load Averages and Unusability of FS

814266 Pool modification commands

815953 1.5.0 Qualification - When removing a controller fail-over does not occur.

816617 crashed with LBUG on ASSERTION( lock != NULL )

820179 CSSM warns of critical firmware issue

820215 Multiple Nodes Powered Off

820639 SMP after perf of mds-survey was backgrounded

821046 Pool modification commands

821304 Slow raid rebuilds on MDRAID

821657 SU 17 upgrade problems

821763 powered down

821931 Sonexion 1.5 Control+c "exits" instead of "interrupts"

822520 n002 failed over to n003 - LBUG: (ldlm_flock.c:849:ldlm_export_flock_put()) ASSERTION( flock->blocking_export != ((void *)0) ) failed

822661 t0db database issues after 3 SSU add

822717 App hung in cl_sync_io_wait; bulk io rpcs stuck in unregistering phase

823580 The spacing in the output from "cscli show_nodes" is not correct on the Sonexion 900.

823581 Error messages when beSystemNetConfig script was run on 1.3.1 system

823633 stonith and failover - HA Timeout

823922 MDS stalls, cpu soft lockup while running mdtest


824493 mds failovers with kdump during purie rhine rel-runs

824993 Unable to mount Filesystem from the Cray

825073 During 2.0 SU-06 install, 3 OSS nodes went down, install hung in beFreezeHA

825622 Unable to power up/communicate with SSU node

825638 Services in "pending" status in CSSM

825854 Sonexion cscli errors

826067 failover following disk failure and md0 had to be manually assembled

826087 Seagate SSDs too small to replace Hitachi SSDs

826317 ASSERTION( lock->l_export == opd->opd_exp ) failed:

826698 RAID Check Disabled- getting ***Error: I/O timeouts*** on multiple nodes

826806 8 Disk(s) failure(s) Slot 70 SSD I/O errors

826856 S7 - n007 & n006 panicked and md3 resources fail to start

827614 twistd memory usage (snx11128n000)

827656 db.py , inventory.py : ERROR updating t0db database error after drive replacement

827730 1.5.0 Upgrade beUpdatePuppet failure

827828 powered off and failed to fail-over

828105 Numerous OSTs fail to run monthly raid-check

828474 NEO 1.5 Upgrade Planning / Resource Request

828499 Sonexion fails to persist CLOSE event in changelog mask after unmount

828609 MMU and ADU/DNE drives replaced with spare drive, but spare was marked as failed as soon as it was inserted.

828782 Nodes shows up as unknown

828958 Sonexion node falsely reports as non-responsive

829002 MDT corruption on the main Lustre filesystem

829283 stopping raid-check causes multiple nodes to go down, full power cycle needed to recover f/s

829453 Sonexion fails to reinstall replaced node n030

829576 filesystem inconsistency

829750 8 failed slots, OST not starting

829787 node fails to kdump after lbug crash

830030 SWO - Changelog index count

830925 failover testing failure

831490 LBUG/ASSERTION "fid_is_sane(&md.body->fid1) ) failed:"


831540 mds, mgs crashes, ldiskfs panic following "ldiskfs_xattr_inode_iget: error while reading EA inode"

831793 Kernel panic, Failover failed

832154 frequent sas driver messages

832511 Ping rpc hung in unregistering phase with rq_receiving_reply set

832809 Lustre not starting, resource failed actions "not installed"

833268 Failback taking a lot longer under heavy load.

833608 Staging a directory sometimes results in zero length files

834135 hlus01 - n018 - "CRITICAL:device OSS reported unhealthy"

834414 ASSERTION( get_current()->journal_info == ((void *)0) ) failed

834486 fs down after MDT crash "ldiskfs_xattr_inode_iget: Backpointer from EA inode 2300579986 to parent invalid."

834793 n009 failed over to n008 for unknown reason, failover was successful

834796 mds controller downed.

834805 MDS node crashed

834945 n211 failed to failover to n210

835090 soft lockup on MDS lead to fs failure, MDS and MGS nodes both down

835240 Application failure due to client eviction

835282 S072 - n007 Kernel panic - not syncing: LDISKFS-fs (device md3)

835444 LBUG: (osc_page.c:333:osc_page_delete()) Trying to teardown failed: -16; ASSERTION( 0 )

835485 Sonexion stripe 8 files missing OST parts

835883 Client evicted from expired blocking callback timer

836705 failback resulted in _md66-fsys (ocf::heartbeat:XYMNTR)

838390 SU 15 install

838584 client crash in osc_cache.c:3107:discard_cb() LBUG after OST failover or failback

838602 single-shared file IOR jobs hang during OST failover/failback

838832 IOR jobs fail during IB cable pull test

839072 IOR data compare error during OST failover test

839147 Snx 1.3.1 SU25 caused serious upgrade delay due to incompatible DB entries

839275 Lustre recovery issues on OST failback, recovery sometimes hits hard limit

839678 2.0SU19 cscli fs_info and show_nodes not working

839743 IOR fails w/data check errors


840345 md66-fsys unmount hang during mdt failback

841983 stonithed for HA failure in mdadm_conf_regenerate

842237 cscli failover triggers STONITH

826317 OSS node crashed with assertion failure: ASSERTION( lock->l_export == opd->opd_exp ) failed

831827, 832154 Multiple Instances of nodes being unexpectedly powered off

832809 Add error message to catch the situation where drives are swapped between two enclosures.

838602 ldlm: lost BL AST during failover

840984 We need to validate all RPM packages from both base and SU repos before installation of SU to avoid issues described in CSLTR-6550. Additionally, we need to add a check for successful diskless image creation.

Seagate Internal Unable to see failed disk slot location in GUI

Seagate Internal During Resiliency Testing, pulled 2 drives and pdrepair did not start

Seagate Internal 2.0 SU-11.68, errors from post-install step

Seagate Internal Max write performance sometimes requires re-reading block bitmaps into memory

Seagate Internal support bundle collection hanging

Seagate Internal log rotate not running correctly on 1.2 system

Seagate Internal extraneous node entries in t0db netdev table caused beSystemNetConfig to fail

Seagate Internal CSSM did not allow configuration of LDAP via Configuration tab

Seagate Internal su-1.3.1-023.87 exposes CLSTR-4175 on systems upgraded from NEO 1.2.x

Seagate Internal 2.0 SU11-61 node powered off during fail over testing.

Seagate Internal file per process IOR jobs fail w/short write during failover/failback testing

Seagate Internal file per process IOR jobs fail w/short write during failover/failback testing

Seagate Internal 3 down OSS nodes around the time of n000 failure, OST resources being stopped/restarted

Seagate Internal 2 MDS nodes crash on DNE system while running beSystemNetConfig script, "osp_obd_connect()) ASSERTION( osp->opd_connects == 1 ) failed"

Seagate Internal 2.0 SU11.6X GUI and CSCLI not reporting lustre status correctly

Seagate Internal MGS failover nid problems on 2.0 systems

Seagate Internal file per process IOR jobs fail w/short write during failover/failback testing

Seagate Internal MGS n002 node stonithed when setting lustre parameter

Seagate Internal CLONE - manual MDS failover on 180-ssu fs failed this morning; admin intervention required to complete failover. Timeout in start_xyraid too short


Seagate Internal beSystemNetConfig.sh should avoid erase_params if possible, or should warn about params erased

Seagate Internal 2.0 SU11.6X GUI and CSCLI not reporting lustre status correctly

Seagate Internal cscli show_nodes shows inaccurate target status

Seagate Internal Very slow to read file inode. `ls` shows file metadata with '?' marks.

Seagate Internal Unable to mount lustre clients, even from n01, during installation.

Seagate Internal 3 disks on single md failed during RAID check ... node stonithed, md4 didn't start on the partner node

Seagate Internal S005 - cron.hourly issues - MySQL /var/lib/mysql/mysql.sock

Seagate Internal stonith problems, stonith of MDS fails (after MDS crash that failed to cleanly panic)

Seagate Internal Remove ‘lctl notransno’ and ‘lctl readonly’ commands from the XYMNTR stop operation.

Seagate Internal In order to support switch in image handling in SU, Trinity code has to be updated to handle new versioning schema - currently it relies only on cs-release package version instead of considering SU version

Seagate Internal Generate special lustre_config for Sonexion 3000 case. Incorrect lustre configuration of primary MGS node is fixed.

Seagate Internal The problem is active targets are updated based on the resources active on that node, not considering the primary roles of the node. Fixed that part.

Seagate Internal tests: conflicting locks are not flushed properly

Seagate Internal llite: Lustre I/O hung waiting for page

Seagate Internal “MRP-3603 osp: wakeup osp_precreate_reserve on umount”

Seagate Internal ofd: handle last_rcvd file can’t update properly

Seagate Internal tests: race MDT->OST reconnection with create

Seagate Internal llite: add forgotten copy_from/to_user

Seagate Internal Need to enforce that management node arrays are comprised of uniform drive types.

Seagate Internal plex service restarts on active management node due to exceeding memory usage threshold allocated to a single service on management node.

Seagate Internal Due to fault in puppet manifest regeneration logic in SU script, it can throw false errors that can be interpreted by user as actual issues.

Seagate Internal when applying SUs the Pacemaker config is not updated, so all updates related to adjustments in HA (e.g. ustonith..) are not in effect.

Seagate Internal Update between different builds of same SU version don’t work

Seagate Internal tests: In interop, ensure to save/restore correct debug flags

Seagate Internal tests: lnet-selftest Error inserting modules

Seagate Internal Modified ll_find_alias to avoid cache corruption


Seagate Internal ldlm: fixing a server crash with ASSERTION(flock->blocking_export != 0)) failure

Seagate Internal test: wait on MDS for ost-pool proc entry to update

Seagate Internal scrub: NOT assign LMA for EA inode

Seagate Internal osd: Add nodelalloc to ldisk mount options

Seagate Internal tests: customise ior, simul cmds and MPIRUN

Seagate Internal osd-ldiskfs: pass uid/gid/xtime directly to ldiskfs

Seagate Internal Port LU-7130 changes b_neo_stable_2.x

Seagate Internal nrs: add lock to protect TBF rule linkage

Seagate Internal tests: skip several tests for CLIENTONLY mode

Seagate Internal test: racer on NFS

Seagate Internal ldlm: soft lockup in ldlm_plain_compat_queue

Seagate Internal Remove force option from XYMNTR Lustre lazy umount path

Seagate Internal MGMT framework fails to restart successfully on active management node when memory usage threshold allocated to a single service on management node has been exceeded.

Seagate Internal Failed Laguna Seca not resolving once it was repaired when the firmware alert is unknown.

Seagate Internal MRPD Collection failing with permission denied

Seagate Internal Due to having two MDTs (combined with MGS and separate one) on Laguna Seca systems, Lustre upcall may be set incorrectly

Seagate Internal Increase timeout for crash memory dumping procedure

Seagate Internal Repetitive error messages while communicating with GEM are suppressed appropriately.

Seagate Internal DDUMP monitoring not able to clear statesave bit in ses_page 2, triggering DDUMP collection frequently.

Seagate Internal Mgmt node should not have lustre status as Started. It should be N/a. Added hotfix to not check lustre status for mgmt nodes.

Seagate Internal ldlm: Wrong evict during failover

Features

797089 Monitor/measure PDU power consumption

793717 Sonexion - heartbeat is insufficient to detect MDS failure

793269 Access LMT data on Sonexion filesystem

793583 SNMP Monitoring

800652 Enable Nagios Notifications


Improvements

833368 Many OSS nodes will not come online after SU10 and FW updates

831400 ldap does not appear to be functioning on the sonexion, user can not create files

Seagate Internal ADU add failing after usm 3.26 upgrade

822718 CSSM GUI does not allow failover/failback of MGS (n002) to its partner

828842 Disk watcher daemon KILLDRIVE alert trigger under high IO load

833048 OST down after drive problem; OSS stonith'd, OST fails to start on partner

826047 S14 slow array/disk errors.

818646 raid-check and rebuilds should have minimal impact on production jobs

827265 SWO due to multiple drives going offline following sled reset

834222 9 disk failed --- OST down

835920 reports several disk disabled and n041 does not mount all of the disks

835764 several disk failures & OST failed-over to n224.

832398 disk problems, failover, n250 couldn't reassemble disk.

836071 lost disks due to a raid disassemble

825255 OSS nodes is down:

834918 powered down.

827027 monitor timeout, node stonithed

827028 monitor timeout, node stonithed

Seagate Internal MDS nodes stopped

Seagate Internal MDS nodes stopped

Seagate Internal 2.0 SU10 pm -q not working as admin

827716 After disk failure during raidcheck, sync_min is left at a non-zero value

835952 SWO Lustre - FS offline after split brain resulted in double mount

830809 OST should not be allowed to failback to node without infiniband connectivity

838419 stonithed after HA timeout - prm-snmp-heartbeat:0_monitor_10000 Timed Out

819194 Disk Watcher Daemon interrogating all disks


823922 MDS stalls, cpu soft lockup while running mdtest


Seagate Internal Documented procedure to back up, drop, and recreate the t0db, mysql, and LMT databases.

813897 Puppet is not starting after cold boot of snx11003 and LMT database corruption

827425 Change the behavior of Lustre changelogs so that a client changelog config problem (or some other unexpected client issue) cannot take out the file system

Seagate Internal 1.2.1 install, mds start hangs during beSystemNetConfig.sh

Seagate Internal tests: customise the list of loads

Seagate Internal Preserve timestamps in the Ganglia plugin when streaming data to the Ganglia server. This ensures the accuracy of plotted data in ganglia-web.

Known Issues

DOC-1323
Sonexion 3000 - Daily Mode CLI commands list from online help is out of date and needs to be updated.
Workaround: No workaround at this time.

FMW-18954
After BIOS update of GOBI, node does not always boot up correctly.
Workaround: No workaround at this time.

MRP-3515
I/O from all clients is halted when one client loses power.
Workaround: I/O resumes after approximately half an hour, dependent on the workload prior to the event.

MRP-3559
2.1 Aero RC13 LustreError general protection fault panics.
Workaround: No workaround at this time.

NEO-2690
On Sonexion 3000 platform, controllers will power up automatically when power is applied.
Workaround: No workaround at this time.

NEO-2715
Intermittently, after mgmt node(s) come back up after being shut down, xybridge does not start up correctly on both nodes.
Workaround: No workaround at this time.

NEO-2782
Firmware for Thunderbolt and Valkyrie HDDs is missing in 2.1 release.
Workaround: No workaround at this time.

NEO-2789
MDS and MGS nodes died while doing failover of MGS node.
Workaround: No workaround at this time.

NSIT-12
Drives accessible via left side expander may intermittently disappear from the system. Root cause is under investigation.
Workaround: No workaround at this time.

NSIT-17
"Verify" capability is missing from GOBI usmtool in Sonexion 2.1.
Workaround: No workaround at this time.

OSG-1773
Can't boot with Live key on Sonexion 3000 system. Likely problem with drive setup mismatch between BIOS and kickstart options.
Workaround: No workaround at this time.


OSG-1850
xybridge link down on secondary mgmt node after install. Possible mismatch with enclosure firmware and driver.
Workaround: No workaround at this time.

OSG-1852
Both mgmt nodes went down during FOFB. Unknown cause at this time.
Workaround: No workaround at this time.

OSG-1937
Watchdog timer is too short for kdumps to complete.
Workaround: No workaround at this time.

OSG-1946
Can't set LDAP after RC18 install. TRT-4571.
Workaround: No workaround at this time.

OSG-1947
Gemhpi and ses_monitor spinning on unresponsive enclosure.
Workaround: No workaround at this time.

OSG-1950
/tmp disk full preventing Lustre from mounting.
Workaround: No workaround at this time.

OSG-1957
There are two variations of HDDs within the SMU enclosure (300GB 15K RPM and 900GB 10K RPM). The RAID arrays created during installation may not have respected the differences of the HDDs and constructed RAID arrays containing HDDs from both drive variants.
Physical Impact: SMU RAID arrays consist of mixed drive capacities/variants and, as such, the size of the array is based on the lowest capacity point present in the array (RAID10 = 600GB instead of 1.8TB; RAID1 = 300GB instead of 900GB).
Functional Impact: SMU RAID array sizes are lower than detailed in the Architectural Specification, and, as such:
● Large systems (~50+ nodes) may see an impact from consuming all available storage prior to a cleanup operation.
● Small systems (~<50 nodes) should see little impact, as the storage is cleaned regularly.
RAS Impact: No impact.
Corrective Action: OEM re-install of 2.1-GA-RMJT, or utilize the Seagate script/procedure available via Seagate FAEs.
Workaround: No workaround at this time.

OSG-1961
Watchdog timer is too short for vmcore dumps to complete.
Workaround: No workaround at this time.

OSG-1978
Rebuild failed to start on md64 when drive was failed.
Workaround: No workaround at this time.

RAMA-907
MGS node shows green in GUI heatmap when Lustre not started.
Workaround: No workaround at this time.

RAMA-908
Dashboard shows downed nodes as green on heatmap.
Workaround: No workaround at this time.

RAS-473
RAS treats the SBB FRU Health status ‘unknown’ value inconsistently.
Workaround: No workaround at this time.

RAS-484
No alert notification from RAS after triggering an uncorrectable ECC error on LS.
Workaround: No workaround at this time.

SCRUF-1348
Nodes n02 & n03 are physically swapped, contradicting the Architectural Specification, which details n02 as the top node and n03 as the bottom node within the MMU enclosure.
Physical Impact: MMU nodes will contradict the Sonexion 3000 Architectural Specification, with n02 being the bottom physical node and n03 being the top physical node.
Functional Impact: System will continue to operate as intended with no functional impact on performance or operation.
RAS Impact: RAS will continue to be functional and identify the correct node, in the case of a failure, if the nodes were identified as per below at point of OEM installation.
Top Node: purpose: [mgs=primary, mds=secondary]
Bottom Node: purpose: [mds=primary, mgs=secondary]
Corrective Action: OEM re-install of 2.1-GA-RMJT, or utilize the Seagate script/procedure available via Seagate FAEs.
Workaround: No workaround at this time.

SCRUF-1369
ipmi-stonith is not being configured to use ipmi-sec on L300.
Workaround: No workaround at this time.

SCRUF-1371
SU-002.13 script doesn't update mgmt HA config.
Workaround: No workaround at this time.

TRT-4361
Active/Active MDTs are not properly configured on Sonexion 3000.
Physical Impact: Sonexion 3000 MDT not initialized as active/active.
Functional Impact: Sonexion 3000 MDT not initialized as active/active.
RAS Impact: No impact.
Corrective Action: OEM re-install of 2.1-GA-RMJT, or utilize the Seagate script/procedure available via Seagate FAEs.
Workaround: No workaround at this time.

TRT-4398
When accepting LNET routing files from the user, the DOS file format of CR and LF is not handled and gives an error. Workaround is to replace this with only LF using an external utility.
Workaround: No workaround at this time.

TRT-4416
ses_monitor.py displaying "Possible bad or baulky drive messages" in logs.
Workaround: No workaround at this time.

TRT-4534
In certain scenarios, while doing repeated failover and failback of ADU node, cscli failback -n on ADU may not work as expected.
Workaround: No workaround at this time.

TRT-4545
While doing OEM install on Sonexion 3000 system, intermittently the installation screen will not show the status bar for all nodes. Refreshing the browser tends to make the status bar accurate.
Workaround: No workaround at this time.

TRT-4631
Upload ssl certificate results in JAVA error: Invalid process stage: expected 49, actual 50.
Workaround: No workaround at this time.


7 Firmware

This section specifies component firmware qualified for the Sonexion 2.1.0-002 release.

SSU Firmware Versions

This table lists qualified versions of firmware sub-components for the 5U84 G2 storage system, released under GOBI OneStor USM STX_GOBI_R1.16, for the SSU enclosure.

SSU Component    Firmware Sub-Component    Version Number

BMC Firmware: 0.01.0013

CPLD Firmware: 0.03.0004

BIOS Firmware: 0.02.0024

GEM

Firmware 4.3.1.19

Firmware date Mar 22 2016 18:30:05

ConfigCRC 0x00000000

VPD structure 0x06

VPD CRC 0XD7CA1702

Eth Switch EEPROM CRC 0x45CD694A

GEMSat

Firmware 4.3.1.19

Firmware date Mar 22 2016 18:30:05

Bootloader 1.00

ConfigCRC Not present

VPD structure 0x06

VPD CRC 0x992781BB

CPLD 2.1

Midplane

CPLD 0x13

VPD structure 0x0C

VPD crc 0x1E7457CB

PCM1

Firmware 1.00 | 1.05

VPD structure 0x05

VPD CRC 0x41BEF99C

PCM2

Firmware 1.00 | 1.05

VPD structure 0x05

VPD CRC 0x41BEF99C

Fan Controller 1

Device FW 01.0F

VPD version 0x05

Config 0x636B4986

Fan Controller 2

Device FW 01.0F

VPD version 0x05

Config 0x636B4986

Fan Controller 3

Device FW 01.0F

VPD version 0x05

Config 0x636B4986

Fan Controller 4

Device FW 01.0F

VPD version 0x05

Config 0x636B4986

Fan Controller 5

Device FW 01.0F

VPD version 0x05

Config 0x636B4986

Sideplane

Element0 Firmware: 4.0.0.67|BL=6.10|FC=0x9E928EC0|VR=0x06|VC=0x699F059B|CR=0x12|PC=N/A|EV=0x80040002|SV=3.06-B032

Element1 Firmware: 4.0.0.67|BL=6.10|FC=0xF3F1DF4D|VR=0x06|VC=0x42845E7B|CR=0x12|PC=N/A|EV=0x80050002|SV=3.06-B032

Element2 Firmware: 4.0.0.67|BL=6.10|FC=0xD82BA4C7|VR=0x06|VC=0x25D4B564|CR=0x12|PC=N/A|EV=0x80040002|SV=3.06-B032

Element3 Firmware: 4.0.0.67|BL=6.10|FC=0x1709EDAC|VR=0x06|VC=0xAC2E8A42|CR=0x12|PC=N/A|EV=0x80050002|SV=3.06-B032


SMU Firmware Versions

This table lists qualified versions of firmware sub-components for the 2U24 storage system under GOBI OneStor USM STX_GOBI_R1.16 for the SMU enclosure, when the product leaves the factory.

SMU Component Firmware Sub-Component Version Number

BMC Firmware: 0.01.0013

CPLD Firmware: 0.03.0004

BIOS Firmware: 0.02.0024

GEM

Firmware 4.3.1.19

Firmware date Mar 22 2016 18:30:05

ConfigCRC 0x00000000

VPD structure 0x06

VPD CRC 0XD7CA1702

Eth Switch EEPROM CRC 0x45CD694A

GEMSat

Firmware 4.3.1.19

Firmware date Mar 22 2016 18:30:05

Bootloader 1.00

ConfigCRC Not present

VPD structure 0x06

VPD CRC 0x992781BB

CPLD 2.1

Midplane

CPLD 0x13

VPD structure 0x0C

VPD crc 0x1E7457CB

PCM1

Firmware 1.00 | 1.05

VPD structure 0x05

VPD CRC 0x41BEF99C

PCM2

Firmware 1.00 | 1.05

VPD structure 0x05

VPD CRC 0x41BEF99C

MMU Firmware Versions

This table lists qualified versions of firmware sub-components for the MMU enclosures, when the product leaves the factory.


MMU Component Firmware Sub-Component Version Number

BMC Firmware: 0.01.0013

CPLD Firmware: 0.03.0004

BIOS Firmware: 0.02.0024

GEM

Firmware 4.3.1.19

Firmware date Mar 22 2016 18:30:05

ConfigCRC 0x00000000

VPD structure 0x06

VPD CRC 0XD7CA1702

Eth Switch EEPROM CRC 0x45CD694A

GEMSat

Firmware 4.3.1.19

Firmware date Mar 22 2016 18:30:05

Bootloader 1.00

ConfigCRC Not present

VPD structure 0x06

VPD CRC 0x992781BB

CPLD 2.1

Midplane

CPLD 0x13

VPD structure 0x0C

VPD crc 0x1E7457CB

PCM1

Firmware 1.00 | 1.05

VPD structure 0x05

VPD CRC 0x41BEF99C

PCM2

Firmware 1.00 | 1.05

VPD structure 0x05

VPD CRC 0x41BEF99C

ESU Firmware Versions

ESU Component    Firmware Sub-Component    Version Number

EBOD

Firmware 4.0.0.75

Bootloader 5.07


VPD Structure 0x06

VPD CRC 0xB8D3D512

ConfigCRC 0x5BD2C2E8

GEM CPLD 0x14

Power CPLD 0x00176CF8

Midplane

CPLD 0x03

VPD structure 0x10

VPD CRC 0x7BE4F602

PCM1

Firmware 2.29|2.1E|2.00

VPD structure 0x03

VPD CRC 0x486003DF

PCM2

Firmware 2.29|2.1E|2.00

VPD structure 0x03

VPD CRC 0x486003DF

Fan Controller 1

Device FW 01.0F

VPD version 0x05

Config 0x636B4986

Fan Controller 2

Device FW 01.0F

VPD version 0x05

Config 0x636B4986

Fan Controller 3

Device FW 01.0F

VPD version 0x05

Config 0x636B4986

Fan Controller 4

Device FW 01.0F

VPD version 0x05

Config 0x636B4986

Fan Controller 5

Device FW 01.0F

VPD version 0x05

Config 0x636B4986

Sideplane

Element0 Firmware: 4.0.0.75|BL=0610|FC=0x9E928EC0|VR=0x06|VC=0x699F059B|CR=0x12|PC=N/A|EV=0x80040002|SV=3.06-B032


Element1 Firmware: 4.0.0.75|BL=0610|FC=0xF3F1DF4D|VR=0x06|VC=0x42845E7B|CR=0x12|PC=N/A|EV=0x80050002|SV=3.06-B032

Element2 Firmware: 4.0.0.75|BL=0610|FC=0xF2F500C7|VR=0x06|VC=0x25D4B564|CR=0x12|PC=N/A|EV=0x80040002|SV=3.06-B032

Element3 Firmware: 4.0.0.75|BL=0610|FC=0x4E77DD42|VR=0x06|VC=0xAC2E8A42|CR=0x12|PC=N/A|EV=0x80050002|SV=3.06-B032

Rack Component Firmware Versions

This table lists qualified versions of firmware for the Sonexion 3000 rack components (switches), when the product leaves the factory.

Rack Component Version Number

Mellanox SB7790 EDR (36-port IB) 11.0300.0354

Brocade ICX-6610-24 (24-port) 08.0.30

Brocade ICX-6610-48 (48-port) 08.0.30

Disk Drive Firmware Versions

This table lists qualified versions of disk drive firmware, when the product leaves the factory.

Drive Model Firmware Version

Seagate 300GB HDD (ST300MP0065) [SMU/MMU 2U24] K003

Seagate 900GB HDD (ST900MM0008) [SMU/MMU 2U24] K002

Seagate 800GB SSD (ST800FM0053) [SSU] XGEG

Seagate 4TB HDD (ST4000NM0074) [SSU & ESU] KT05

Seagate 4TB HPC HDD (ST4000NM0031) [SSU & ESU] KTF2

Seagate 6TB HDD (ST6000NM0074) [SSU & ESU] KT05

Seagate 8TB HDD (ST8000NM0095) [SSU & ESU] KT01


8 Notices and Precautions

The following statements list specific known issues and are provided to ensure safety and safe operation.

Electrical Considerations

● This equipment is designed to be installed on a dedicated circuit.

● The dedicated circuit must have circuit breaker or fuse protection. Protection of capacity equal to the current rating of the distribution unit must be provided and must meet all applicable codes and regulations.

● Warning! HIGH LEAKAGE CURRENT. Ground (earth) connection is essential before connecting a supply.

● The Sonexion rack has multiple input power connectors. Disconnect all supply power for complete isolation.

● A safe electrical ground (earth) connection must be provided to the power supply cords.

● When power cycling any enclosure, wait approximately 10 seconds before re-applying power. Use the power supply's ON/OFF switch to manage the power.

● After completing all assemblies and prior to powering on any system, perform a ground (earth) continuity and dielectric strength test.

● Verify that any circuit breakers installed in the facility are adequately sized, to avoid the possibility of the facility's circuit breakers tripping in the event of a fault within the Sonexion rack and causing downtime.

● When handling disk drives or components, avoid touching the printed circuit boards. You must observe all conventional ESD precautions.

Load Ratings and General Precautions

The rack has load ratings as described below:

● Base Rack with 6 SSUs:

○ Static Load (HDDs installed in SSUs): 1004 kg

○ Dynamic Load (no HDDs installed in SSUs): 568 kg

● Storage Rack with 7 SSUs (no Additional MMUs):

○ Static Load (HDDs installed in SSUs): 1034 kg

○ Dynamic Load (no HDDs installed in SSUs): 581 kg

● Frame load ratings are not dependent on side panels, doors, or other components for structural support.

● The customer is responsible for ensuring that the floor will support the static and dynamic load rating of the rack. This is especially important for installations that involve raised flooring.

● After removing the packaging and before moving the Sonexion rack to the final location, the bottom two SSUs must be fully populated with disk drives.


● With the weight and size of the Sonexion rack, it is possible for the rack to topple over while it is being moved. Do not tip the rack more than 10 degrees from a level surface or when rolling down an incline or ramp. Ensure that the outriggers are properly installed to prevent possible toppling.

● When moving the Sonexion rack, the drives must be removed from all except the bottom two SSUs. When removing the disk drives, you must ensure each drive is re-installed in the exact same drive slot and the exact same enclosure.

● When loading the rack, fill from the bottom up and empty from the top down.

● Do not slide more than one drawer out of any SSU enclosure in the rack at a time, to avoid the danger of the rack toppling over.

● Do not leave any enclosure bay empty.

● Contact Cray Service for firmware upgrades.

● Only trained service personnel may service any field replacement unit (FRU), and must follow the approved documented procedures for the FRU.

● Replacement of a cooling fan in any SSU enclosure must be completed within 2 minutes.

● When opening a drawer on any SSU enclosure, do not leave the drawer open longer than 2 minutes.

● When replacing a disk drive in any SSU enclosure, unlatch the drive and wait 5 seconds for the drive to spin down before removal.
