
DB15-001053-01

Syncro® CS 9271-8i Solution User Guide

Version 2.0, October 2013


LSI, the LSI & Design logo, Storage.Networking.Accelerated., Syncro, CacheCade, CacheVault, and MegaRAID are trademarks or registered trademarks of LSI Corporation in the United States and/or other countries. All other brand and product names may be trademarks of their respective companies.

PCI Express and PCIe are registered trademarks of PCI-SIG.

LSI Corporation reserves the right to make changes to the product(s) or information disclosed herein at any time without notice. LSI Corporation does not assume any responsibility or liability arising out of the application or use of any product or service described herein, except as expressly agreed to in writing by LSI Corporation; nor does the purchase, lease, or use of a product or service from LSI Corporation convey a license under any patent rights, copyrights, trademark rights, or any other of the intellectual property rights of LSI Corporation or of third parties. LSI products are not intended for use in life-support appliances, devices, or systems. Use of any LSI product in such applications without written consent of the appropriate LSI officer is prohibited. This document contains proprietary information of LSI Corporation. The information contained herein is not to be used by or disclosed to third parties without the express written permission of LSI Corporation.

Corporate Headquarters: San Jose, CA, 800-372-2447
Website: www.lsi.com

Document Number: DB15-001053-01
Copyright © 2013 LSI Corporation
All Rights Reserved


Revision History

Version 2.0, October 2013: Added sections for Linux® operating system support.

Version 1.0, March 2013: Initial release of this document.

Table of Contents


Chapter 1: Introduction
1.1 Concepts of High-Availability DAS
1.2 HA-DAS Terminology
1.3 Syncro CS 9271-8i Solution Features
1.4 Hardware Compatibility
1.5 Overview of the Hardware Installation, Cluster Setup, and Configuration
1.6 Performance Considerations

Chapter 2: Hardware and Software Setup
2.1 Syncro CS Cluster-in-a-Box Hardware Setup
2.2 Syncro CS Cluster-in-a-Box Software Setup

Chapter 3: Creating the Cluster
3.1 Creating Virtual Drives on the Controller Nodes
3.1.1 Creating Shared or Exclusive VDs with the WebBIOS Utility
3.1.2 Creating Shared or Exclusive VDs with StorCLI
3.1.3 Creating Shared or Exclusive VDs with MSM
3.1.3.1 Unsupported Drives
3.2 HA-DAS CacheCade Support
3.3 Creating the Cluster in Windows
3.3.1 Prerequisites for Cluster Setup
3.3.1.1 Clustered RAID Controller Support
3.3.1.2 Enable Failover Clustering
3.3.1.3 Configure Network Settings
3.3.2 Creating the Failover Cluster
3.3.3 Validating the Failover Cluster Configuration
3.4 Creating the Cluster in Red Hat Enterprise Linux (RHEL)
3.4.1 Prerequisites for Cluster Setup
3.4.1.1 Configure Network Settings
3.4.1.2 Install and Configure the High Availability Add-On Features
3.4.1.3 Configure SELinux
3.4.2 Creating the Cluster
3.4.3 Configure the Logical Volumes and Apply GFS2 File System
3.4.4 Add a Fence Device
3.4.5 Create a Failover Domain
3.4.6 Add Resources to the Cluster
3.4.7 Create Service Groups
3.4.8 Create a Disk Quorum
3.4.9 Edit the Cluster Configuration File
3.4.10 Mount the NFS Resource from the Remote Client
3.5 Creating the Cluster in SuSE Linux Enterprise Server (SLES)
3.5.1 Prerequisites for Cluster Setup
3.5.1.1 Prepare the Operating System
3.5.1.2 Configure Network Settings
3.5.1.3 Connect to the NTP Server for Time Synchronization
3.5.2 Creating the Cluster
3.5.2.1 Cluster Setup
3.5.3 Bringing the Cluster Online
3.5.4 Configuring the NFS Resource with STONITH SBD Fencing
3.5.4.1 Install NFSSERVER
3.5.4.2 Configure the Partition and the File System
3.5.4.3 Configure stonith_sbd Fencing
3.5.5 Adding NFS Cluster Resources
3.5.6 Mounting NFS in the Remote Client

Chapter 4: System Administration
4.1 High Availability Properties
4.2 Understanding Failover Operations
4.2.1 Understanding and Using Planned Failover
4.2.1.1 Planned Failover in Windows Server 2012
4.2.1.2 Planned Failover in Windows Server 2008 R2
4.2.1.3 Planned Failover in Red Hat Enterprise Linux
4.2.1.4 Planned Failover in SuSE Linux Enterprise Server
4.2.2 Understanding Unplanned Failover
4.3 Updating the Syncro CS 9271-8i Controller Firmware
4.4 Updating the MegaRAID Driver
4.4.1 Updating the MegaRAID Driver in Windows Server 2008 R2
4.4.2 Updating the MegaRAID Driver in Windows Server 2012
4.4.3 Updating the Red Hat Linux System Driver
4.4.4 Updating the SuSE Linux Enterprise Server 11 Driver
4.5 Performing Preventive Measures on Disk Drives and VDs

Chapter 5: Troubleshooting
5.1 Verifying HA-DAS Support in Tools and the OS Driver
5.2 Confirming SAS Connections
5.2.1 Using WebBIOS to View Connections for Controllers, Expanders, and Drives
5.2.2 Using WebBIOS to Verify Dual-Ported SAS Addresses to Disk Drives
5.2.3 Using StorCLI to Verify Dual-Ported SAS Addresses to Disk Drives
5.2.4 Using MSM to Verify Dual-Ported SAS Addresses to Disk Drives
5.3 Understanding CacheCade Behavior During a Failover
5.4 Error Situations and Solutions
5.5 Event Messages and Error Messages


Chapter 1: Introduction

This document explains how to set up and configure the hardware and software for the Syncro® CS 9271-8i high-availability direct-attached storage (HA-DAS) solution.

The Syncro CS 9271-8i solution provides fault tolerance capabilities as a key part of a high-availability data storage system. The Syncro CS 9271-8i solution combines redundant servers, LSI® HA-DAS RAID controllers, computer nodes, cable connections, common SAS JBOD enclosures, and dual-ported SAS storage devices.

The redundant components and software technologies provide a high-availability system with ongoing service that is not interrupted by the following events:

 The failure of a single internal node does not interrupt service because the solution has multiple nodes with cluster failover.

 An expander failure does not interrupt service because the dual expanders in every enclosure provide redundant data paths.

 A drive failure does not interrupt service because RAID fault tolerance is part of the configuration.

 A system storage expansion or maintenance activity can be completed without requiring an interruption of service because of redundant components, management software, and maintenance procedures.

1.1 Concepts of High-Availability DAS

In terms of data storage and processing, High Availability (HA) means a computer system design that ensures a high level of operational continuity and data access reliability over a long period of time. High-availability systems are critical to the success and business needs of small and medium-sized business (SMB) customers, such as retail outlets and health care offices, who cannot afford to have their computer systems go down. An HA-DAS solution enables customers to maintain continuous access to and use of their computer system. Shared direct-attached drives are accessible to multiple servers, thereby maintaining ease of use and reducing storage costs.

A cluster is a group of computers working together to run a common set of applications and to present a single logical system to the client and application. Failover clustering provides redundancy to the cluster group to maximize up-time by utilizing fault-tolerant components. In the example of two servers with shared storage that comprise a failover cluster, when a server fails, the failover cluster automatically moves control of the shared resources to the surviving server with no interruption of processing. This configuration allows seamless failover capabilities in the event of planned failover (maintenance mode) for maintenance or upgrade, or in the event of a failure of the CPU, memory, or other server failures.

The Syncro CS 9271-8i solution is specifically designed to provide HA-DAS capabilities for a class of server chassis that include two server motherboards in one chassis. This chassis architecture is often called a cluster in a box (CiB).

Because multiple initiators exist in a clustered pair of servers (nodes) with a common shared storage domain, there is a concept of device reservations in which physical drives, drive groups, and virtual drives (VDs) are managed by a selected single initiator. For HA-DAS, I/O transactions and RAID management operations are normally processed by a single Syncro CS 9271-8i controller, and the associated physical drives, drive groups, and VDs are only visible to that controller. To assure continued operation, all other physical drives, drive groups, and VDs are also visible to, though not normally controlled by, the Syncro CS controller. This key functionality allows the Syncro CS 9271-8i solution to share VDs among multiple initiators as well as exclusively constrain VD access to a particular initiator without the need for SAS zoning.

LSI Corporation- 6 -

Syncro CS 9271-8i Solution User GuideOctober 2013

Chapter 1: IntroductionHA-DAS Terminology

Node downtime in an HA system can be either planned or unplanned. Planned node downtime is the result of management-initiated events, such as upgrades and maintenance. Unplanned node downtime results from events that are not within the direct control of IT administrators, such as failed software, drivers, or hardware. The Syncro CS 9271-8i solution protects your data and maintains system up-time through both planned and unplanned node downtime. It also enables you to schedule node downtime to update hardware or firmware, and so on. When you bring one controller node down for scheduled maintenance, the other node takes over with no interruption of service.

1.2 HA-DAS Terminology

This section defines some additional important HA-DAS terms.

 Cache Mirror: A cache coherency term describing the duplication of write-back cached data across two controllers.

 Exclusive Access: A host access policy in which a VD is only exposed to, and accessed by, a single specified server.

 Failover: The process in which the management of drive groups and VDs transitions from one controller to the peer controller to maintain data access and availability.

 HA Domain: A type of storage domain that consists of a set of HA controllers, cables, shared disk resources, and storage media.

 Peer Controller: A relative term to describe the HA controller in the HA domain that acts as the failover controller.

 Server/Controller Node: A processing entity composed of a single host processor unit or multiple host processor units that is characterized by having a single instance of a host operating system.

 Server Storage Cluster: An HA storage topology in which a common pool of storage devices is shared by two computer nodes through dedicated Syncro CS 9271-8i controllers.

 Shared Access: A host access policy in which a VD is exposed to, and can be accessed by, all servers in the HA domain.

 Virtual Drive (VD): A storage unit created by a RAID controller from one or more physical drives. Although a virtual drive can consist of multiple drives, it is seen by the operating system as a single drive. Depending on the RAID level used, the virtual drive might retain redundant data in case of a drive failure.

1.3 Syncro CS 9271-8i Solution Features

The Syncro CS 9271-8i solution supports the following HA features.

 Server storage cluster topology, enabled by the following supported operating systems:
— Microsoft® Windows Server® 2008 R2
— Microsoft Windows Server 2012
— Red Hat® Enterprise Linux 6.3
— Red Hat Enterprise Linux 6.4
— SuSE® Linux Enterprise Server 11 SP3
— SuSE Linux Enterprise Server 11 SP2

 Clustering/HA services support:
— Microsoft failover clustering
— Red Hat High Availability Add-on
— SuSE High Availability Extensions

 Dual-active HA with shared storage

 Controller-to-controller intercommunication over SAS


 Write-back cache coherency

 CacheCade 1.0 (Read)

 Shared and exclusive VD I/O access policies

 Operating system boot from the controller (exclusive access)

 Controller hardware and property mismatch detection, handling, and reporting

 Global hot spare support for all volumes in the HA domain

 Planned and unplanned failover modes

 CacheVault® cached data protection in case of host power loss or server failure

 Full MegaRAID® features, with the following exceptions:
— T10 Data Integrity Field (DIF) is not supported.
— Self-encrypting drives (SED) and full disk encryption (FDE) are not supported.
— CacheCade 2.0 (write back) is not supported.
— Dimmer switch is not supported.
— SGPIO sideband signaling for enclosure management is not supported.

1.4 Hardware Compatibility

The servers, disk drives, and optional JBOD enclosures you use in the Syncro CS 9271-8i solution must be selected from the list of approved components that LSI has tested for compatibility. Refer to the web page for the compatibility lists at http://www.lsi.com/channel/support/pages/interoperability.aspx.

1.5 Overview of the Hardware Installation, Cluster Setup, and Configuration

Chapters 2 and 3 describe how to install the hardware and software so that you can use the fault tolerance capabilities of HA-DAS to provide continuous service in the event of a drive failure or server failure and to expand the system storage.

Chapter 2 describes how to install the Syncro CS 9271-8i controllers and connect them by cable to the CiB enclosure. In addition, it lists the steps required after controller installation and cable connection, which include the following:

 Configure the drive groups and the virtual drives on the two controllers

 Install the operating system driver on both server nodes

 Install the operating system on both server nodes, following the instructions from the manufacturer

 Install the StorCLI and MegaRAID Storage Manager™ utilities

Chapter 3 describes how to perform the following actions while using a supported OS:

 Install and enable the Cluster feature on both servers

 Set up a cluster under the supported operating systems

 Configure drive groups and virtual drives

 Create a CacheCade® 1.0 virtual drive as part of a Syncro CS 9271-8i configuration


1.6 Performance Considerations

SAS technology offers throughput-intensive data transfers and low latency times. Throughput is crucial during failover periods, when the system needs to process reconfiguration activity in a fast, efficient manner. SAS offers a throughput rate of 12 Gb/s over a single lane. SAS controllers and enclosures typically aggregate 4 lanes into an x4 wide link, giving an available bandwidth of 48 Gb/s across a single connector, which makes SAS ideal for HA environments.
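As a quick check of these figures, the x4 bandwidth follows directly from the per-lane rate: 4 lanes × 12 Gb/s per lane = 48 Gb/s per x4 wide link.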

Syncro CS controllers work together across a shared SAS fabric to achieve sharing, cache coherency, heartbeat monitoring, and redundancy by using a set of protocols to carry out these functions. At any point in time, a particular VD is accessed or owned by a single controller. This owned VD is termed a local VD. The second controller is aware of the VD on the first controller, but it has only indirect access to it; the VD is a remote VD for the second controller. In a configuration with multiple VDs, the workload is typically balanced across controllers to provide a higher degree of efficiency.

When a controller requires access to a remote VD, the I/Os are shipped to the remote controller, which processes the I/O locally. I/O requests that are handled by local VDs are much faster than those handled by remote VDs.

The preferred configuration is for the controller to own the VD that hosts the clustered resource (the MegaRAID Storage Manager utility shows which controller owns this VD). If the controller does not own this VD, it must issue a request to the peer controller to ship the data to it, which affects performance. This situation can occur if the system has been configured incorrectly or if the system is in a failover situation.

NOTE Performance tip: You can reduce the impact of I/O shipping by locating the VD or drive groups with the server node that is primarily driving the I/O load. Avoid drive group configurations with multiple VDs whose I/O load is split between the server nodes.

MSM has no visibility to remote VDs, so all VD management operations must be performed locally. A controller that has no direct access to a VD must use I/O shipping to access the data if it receives a client data request. Accessing the remote VD affects performance because of the I/O shipping overhead.

Performance tip: Use the MSM utility to verify correct resource ownership and load balancing. Load balancing is a method of spreading work between two or more computers, network links, CPUs, drives, or other resources. Load balancing is used to maximize resource use, throughput, or response time. Load balancing is the key to ensuring that client requests are handled in a timely, efficient manner.


Chapter 2: Hardware and Software Setup

This chapter explains how to set up the hardware and software for a Syncro CS 9271-8i solution with two controller nodes and shared storage. For this implementation you use a Cluster-in-a-Box (CiB) configuration in which the two server nodes with Syncro CS controllers and the shared disk drives are installed and connected inside a single CiB enclosure.

The Syncro CS cluster-in-a-box (CiB) system design combines two servers and a common pool of direct attached drives within one custom-designed enclosure. After you initially set it up, the CiB system design simplifies the deployment of two-node clusters because all components and connections are contained in one closed unit.

The Syncro CS cluster-in-a-box configuration requires a specially designed server and storage chassis that includes two Syncro CS 9271-8i controller boards and multiple SAS disks selected from the LSI disk compatibility list (see the URL in Section 1.4, Hardware Compatibility).

The LSI Syncro CS 9271-8i controller kit includes the following items:

— Two Syncro CS 9271-8i controllers
— Two CacheVault Flash Modules 02 (CVFM02) (pre-installed on the controllers)
— Two CacheVault Power Modules 02 (CVPM02)
— Two CacheVault Power Module mounting clips and hardware
— Two CacheVault Power Module extension cables
— Two low-profile brackets
— Syncro CS 9271-8i Controller Quick Installation Guide
— Syncro CS Resource CD

The following figure shows a high-level diagram of a CiB Syncro CS 9271-8i configuration connected to a network. The diagram includes the details of the SAS interconnections.


Figure 1 CiB Syncro CS 9271-8i Configuration

The Syncro CS 9271-8i controllers in the kit include the CacheVault Module Kit LSICVM01, which consists of a CacheVault Flash Module 02 (CVFM02), a CacheVault Power Module 02 (CVPM02), a clip, and a cable.

The CVFM02 module is an onboard 1-GB DDR3 1333 MT/s USB flash module that is pre-installed on the Syncro CS controller. The CVPM02 module is a super-capacitor pack that offers an intelligent backup power supply solution. The CVPM02 module provides both capacitor charge maintenance and capacitor health monitoring functions similar to those of an intelligent battery backup unit.

If a power failure occurs, the CVPM02 module unit provides power to transfer the cache memory contents from DRAM to a nonvolatile flash memory array on the CVFM02 module the next time the controller is powered on. Cached data can then be written to the storage devices.

NOTE The CVFM02 modules come pre-installed on the Syncro CS 9271-8i controllers.


2.1 Syncro CS Cluster-in-a-Box Hardware Setup

Follow these steps to set up the hardware for a Syncro CS CiB configuration. The setup procedure might be somewhat different for CiB models from different vendors, so be sure to also read the CiB manufacturer’s documentation.

NOTE The Syncro CS solution includes two Syncro CS controllers, so you must perform the installation procedure for both controllers, one in each server module inside the CiB enclosure.

1. Unpack the controller kit in a static-free environment and check the contents. Remove the components from the antistatic bags and inspect them for damage. If any components are missing or damaged, contact LSI or your Syncro CS OEM support representative.

2. Turn off power to the CiB enclosure, if necessary, and unplug the power cords from the power supply. Remove the cover from the CiB.

CAUTION Before you install the Syncro CS controllers, make sure that the CiB enclosure is disconnected from the power and from any networks.

3. Review the Syncro CS connectors and the jumper settings and change them if necessary.

You usually do not need to change the default factory settings of the jumpers. The following figure shows the location of the jumpers and connectors on the controller board.

The CVFM02 module comes preinstalled on your Syncro CS controller; however, the module is not attached in the following figure so that you can see all of the connectors and headers on the controller. Figure 3 and Figure 4 show the controller with the CVFM02 module installed.

Figure 2 Layout of the Syncro CS 9271-8i Controller

In the figure, Pin 1 is highlighted in red for each jumper.


The following table describes the jumpers and connectors on the Syncro CS 9271-8i controller.

Table 1 Syncro CS 9271-8i Controller Jumpers and Connectors

J1A2 SCS Backplane Management connector (3-pin shielded header): Implements an enclosure management module that is responsible for providing enclosure management functions for storage enclosures and server backplanes.

J1A3 Local Battery Backup Unit connector (20-pin connector): Connects the LSIiBBU09 unit directly to the RAID controller.

J1A4 SCS Backplane Management connector (3-pin shielded header): Implements an enclosure management module that is responsible for providing enclosure management functions for storage enclosures and server backplanes.

J1B1 Individual PHY and Drive Fault Indication header, Ports 3 to 0 and Ports 7 to 4 (2x8-pin header): Connects to an LED that indicates whether a drive is in a fault condition. There is one LED per port. When lit, each LED indicates that the corresponding drive has failed or is in the Unconfigured-Bad state. The LEDs function in a direct-attach configuration (there are no SAS expanders). Direct attach is defined as a maximum of one drive connected directly to each port.

J2B4 Standard Edge Card connector: The interface between the RAID controller and the host system. This interface provides power to the board and an I2C interface connected to the I2C bus for the Intelligent Platform Management Interface (IPMI).

J3A1 Drive Activity LED header (2-pin connector): Connects to an LED that indicates activity on the drives connected to the controller.

J3L1 Remote Battery Backup connector (20-pin connector): Not used.

J4B1 CacheVault Flash Module DDR3 interface (70-pin connector): Connects the Syncro CS controller to the CVFM04 module.

J5A1 Internal x4 SFF-8087 mini SAS connector: Internal x4 SAS connector.

J5A2 Write-Pending LED header (2-pin connector): Connects to an LED that indicates when the data in the cache has yet to be written to the storage devices. Used when the write-back feature is enabled.

J5B1 Internal x4 SFF-8087 mini SAS connector: Internal x4 SAS connector.

J6B1 Advanced Software Options Hardware Key header (3-pin header): Enables support for the Advanced Software Options features.

J6B2 SBR Bypass header (2-pin connector): Lets you bypass the SBR EEPROM during boot in case the EEPROM contents are corrupt or missing.

J6B3 Global Activity LED header (2-pin header): Connects to a single LED that indicates drive activity on either port.

J6B4 Onboard Serial UART connector: Reserved for LSI use.

J6B5 Global Drive Fault LED indicator (2-pin header): Connects to a single LED that indicates drive activity on either port.

J6B6 Onboard Serial UART connector: Reserved for LSI use.


4. Place the Syncro CS controller on a flat, clean, static-free surface after grounding yourself.

NOTE If you want to replace a bracket, refer to the Replacing Brackets on MegaRAID SAS+SATA RAID Controllers Quick Installation Guide for instructions.

5. Take the cable included with the kit and insert the smaller of the two 6-pin cable connectors on the cable into the 6-pin connector on the remote CVPM02 module, as shown in the following figure.

Figure 3 Connecting the Cable to the Remote CVPM02 Module

6. Mount the CVPM02 module inside the CiB enclosure, based on the location and the type of mounting option.




NOTE Because CiB enclosures vary from one vendor to another, no standard mounting option exists for the CVPM02 module that is compatible with all CiB configurations. Authorized resellers and chassis manufacturers can customize the location of the power module to provide the most flexibility within various environments. Refer to the instructions from the CiB vendor to determine how to mount the CVPM02 module.

7. Make sure that the power to the CiB is still turned off and that the power cord is unplugged.

8. Insert the controller into a PCIe® slot on the motherboard of the server module, as shown in the following figure.

Press down gently, but firmly, to seat the controller correctly in the slot.

NOTE This controller is a PCIe x8 card that can operate in x8 or x16 slots. Some x16 PCIe slots, however, support only PCIe graphics cards; if a controller is installed in one of these slots, the controller will not function. Refer to the guide for your motherboard for information about the PCIe slot.

Figure 4 Installing the Syncro CS 9271-8i Controller and Connecting the Cable

9. Secure the controller to the computer chassis with the bracket screw.

10. Insert the other connector on the cable into the 6-pin connector on the onboard CVFM02 module, as shown in Figure 4.

11. Repeat step 4 to step 10 to install the second Syncro CS controller and the CVPM02 module in the second server module of the CiB enclosure.



12. If necessary, install SAS disk drives in the CiB enclosure.

NOTE Drives that are used in the LSI Syncro CS solution must be selected from the compatibility list that LSI maintains on its web site. See the URL for this compatibility list in Section 1.4, Hardware Compatibility.

13. Use SAS cables to connect the internal connectors of the Syncro CS controllers to SAS devices in the CiB enclosure. Refer to the CiB manufacturer’s documentation for information on connecting SAS cables.

NOTE CiB chassis manufacturers determine the availability and configuration of external JBOD storage expansion connections. Refer to the instructions from the CiB vendor on how to properly configure SAS cable connections for external JBOD enclosures, if your CiB chassis offers this capability. Also, see the LSI hardware compatibility list at the URL listed in Section 1.4, Hardware Compatibility.

14. Reinstall the cover of the CiB enclosure and reconnect the power cords. Turn on the power to the CiB enclosure.

The firmware takes several seconds to initialize. During this time, the controllers scan the ports.

2.2 Syncro CS Cluster-in-a-Box Software Setup

Perform the following steps to set up the software for a Syncro CS CiB configuration.

1. Configure the drive groups and the virtual drives on the two controllers.

For specific instructions, see Section 3.1, Creating Virtual Drives on the Controller Nodes. You can use the WebBIOS, StorCLI, or MegaRAID Storage Manager configuration utilities to create the groups and virtual drives.

2. Install the operating system driver on both server nodes.

You must install the software driver first, before you install the operating system.

You can view the supported operating systems and download the latest drivers for the Syncro CS controllers from the LSI website at http://www.lsi.com/support/Pages/download-search.aspx. Access the download center, and follow the steps to download the appropriate driver.

Refer to the MegaRAID SAS Device Driver Installation User Guide on the Syncro CS Resource CD for more information about installing the driver. Be sure to review the readme file that accompanies the driver.

3. Install the operating system on both server nodes, following the instructions from the operating system vendor.

Make sure you apply all of the latest operating system updates and service packs to ensure proper functionality.

You have two options for installing the operating system for each controller node:

— Install it on a private volume connected to the system-native storage controller. The recommended best practice is to install the operating system on this private volume because the disks in the clustering configuration cannot see this volume. Therefore, no danger exists of accidentally overwriting the operating system disk when you set up clustering.

— Install it on an exclusive virtual drive connected to the Syncro CS 9271-8i controller. Exclusive host access is required for a boot volume so the volume is not overwritten accidentally when you create virtual drives for data storage. For instructions on creating exclusive virtual drives using the WebBIOS utility, see Section 3.1.1, Creating Shared or Exclusive VDs with the WebBIOS Utility.

NOTE The Syncro CS 9271-8i solution does not support booting from a shared operating system volume.

4. Install StorCLI and MegaRAID Storage Manager for the Windows® and Linux® operating systems following the installation steps outlined in the StorCLI Reference Manual and MegaRAID SAS Software User Guide on the Syncro CS Resource CD.


Chapter 3: Creating the Cluster

This chapter explains how to set up HA-DAS clustering on a Syncro CS 9271-8i configuration after you configure the hardware and install the operating system.

3.1 Creating Virtual Drives on the Controller Nodes

The next step is creating VDs on the disk drives.

The HA-DAS cluster configuration requires a minimum of one shared VD to be utilized as a quorum disk to enable operating system support for clusters. Refer to the MegaRAID SAS Software User Guide for information about the available RAID levels and the advantages of each one.

As explained in the instructions in the following sections, VDs created for storage in an HA-DAS configuration must be shared. If you do not designate them as shared, the VDs are visible only from the controller node from which they were created.

You can use the WebBIOS pre-boot utility to create the VDs. You can also use the LSI MegaRAID Storage Manager (MSM) utility or the StorCLI utility to create VDs after the OS has booted. Refer to the MegaRAID SAS Software User Guide for complete instructions on using these utilities.

3.1.1 Creating Shared or Exclusive VDs with the WebBIOS Utility

To coordinate the configuration of the two controller nodes, both nodes must be booted into the WebBIOS pre-boot utility. The two nodes in the cluster system boot simultaneously after power on, so you must rapidly access both consoles. One of the systems is used to create the VDs; the other system simply remains in the pre-boot utility. This approach keeps the second system in a state that does not fail over while the VDs are being created on the first system.

NOTE The WebBIOS utility cannot see boot sectors on the disks. Therefore, be careful not to select the boot disk for a VD. Preferably, unshare the boot disk before doing any configuration with the pre-boot utility. To do this, select Logical Drive Properties and deselect the Shared Virtual Disk property.

Follow these steps to create VDs with the WebBIOS utility.

1. When prompted during the POST on the two systems, use the keyboard to access the WebBIOS pre-boot BIOS utility (on both systems) by pressing Ctrl-H.

Respond quickly, because the system boot times are very similar and the time-out period is short. When both controller nodes are running the WebBIOS utility, follow these steps to create RAID 5 arrays.

NOTE To create a RAID 0, RAID 1, or RAID 6 array, modify the instructions to select the appropriate number of disks.

2. Click Start.


3. On the WebBIOS main page, click Configuration Wizard, as shown in the following figure.

Figure 5 WebBIOS Main Page

The first Configuration Wizard window appears.

4. Select Add Configuration and click Next.

5. On the next wizard screen, select Manual Configuration and click Next.

The Drive Group Definition window appears.

6. In the Drives panel on the left, select the first drive, then hold down the Ctrl key and select more drives for the array, as shown in the following figure.

Figure 6 Selecting Drives


7. Click Add To Array, click ACCEPT, and click Next.

8. On the next screen, click Add to SPAN, then click Next.

9. On the next screen, click Update Size.

10. Select Provide Shared Access on the bottom left of the window, as shown in the following figure.

Alternatively, deselect this option to create an exclusive VD as a boot volume for this cluster node.

Figure 7 Virtual Drive Definition

The Provide Shared Access option enables a shared VD that both controller nodes can access. If you uncheck this box, the VD has a status of Exclusive, and only the controller node that created this VD can access it.

11. On this same page, click Accept, then click Next.

12. On the next page, click Next.

13. Click Yes to accept the configuration.

14. Repeat the previous steps to create the other VDs.

As the VDs are configured on the first controller node, the other controller node’s drive listing is updated to reflect the use of the drives.

15. When prompted, click Yes to save the configuration, and click Yes to confirm that you want to initialize it.

16. Define hot spare disks for the VDs to maximize the level of data protection.

NOTE The Syncro CS 9271-8i solution supports global hot spares and dedicated hot spares. Global hot spares are global for the cluster, not for a controller.

17. When all VDs are configured, reboot both systems as a cluster.


3.1.2 Creating Shared or Exclusive VDs with StorCLI

StorCLI is a command-line-driven utility used to create and manage VDs. StorCLI can run from any directory on the server. The following procedure assumes that a current copy of the 64-bit StorCLI executable is located in a common directory on the server and that the commands are run from that directory with administrator privileges.

1. At the command prompt, run the following command:

storcli /c0/vall show

The c0 parameter presumes that there is only one Syncro CS 9271-8i controller in the system or that these steps reference the first Syncro CS 9271-8i controller in a system with multiple controllers.

The following figure shows some sample configuration information that appears in response to the command.

Figure 8 Sample Configuration Information

The command generates many lines of information that scroll down in the window. You need to use some of this information to create the shared VD.

2. Find the Device ID for the JBOD enclosure for the system and the Device IDs of the available physical drives for the VD you will create.

In the second table in the preceding figure, the enclosure device ID of 252 appears under the heading EID, and the device ID of 0 appears under the heading DID. Use the scroll bar to find the device IDs for the other physical drives for the VD.

Detailed drive information, such as the drive group, capacity, and sector size, follows the device ID in the table and is explained in the text below the table.
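If the output of the previous command does not clearly identify the available drives, you can also list them directly. The following commands are a hedged illustration based on standard StorCLI usage rather than on this guide; verify them against the StorCLI Reference Manual for your release.

storcli show

storcli /c0/eall/sall show

The first command lists the controllers in the system so that you can confirm which /cX index to use. The second command lists every physical drive behind controller 0, including its enclosure ID and slot values, capacity, sector size, and state.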

3. Create the shared VD using the enclosure and drive device IDs with the following command line syntax:

storcli /c0 add vd rX drives=e:s

The HA-DAS version of StorCLI creates, by default, a shared VD that is visible to all cluster nodes.


The following notes explain the command line parameters.

— The /c0 parameter selects the first Syncro CS 9271-8i controller in the system.
— The add vd parameter configures and adds a VD (logical disk).
— The rX parameter selects the RAID level, where X is the level.
— The drives= parameter defines the list of drives for the VD. Each drive is listed in the form enclosure device ID:drive device ID (the e:s values in the syntax above).

NOTE To create a VD that is visible only to the node that created it (such as creating a boot volume for this cluster node), add the [ExclusiveAccess] parameter to the command line.

For more information about StorCLI command line parameters, refer to the MegaRAID SAS Software User Guide.
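As a worked illustration of this syntax, the following commands create a shared RAID 5 VD from three drives and an exclusive RAID 1 VD from two drives. The enclosure device ID (252) and drive device IDs are hypothetical values carried over from the earlier example output, not a prescription for your system; substitute the IDs reported on your hardware and confirm the exact drive-list format against the StorCLI Reference Manual for your release.

storcli /c0 add vd r5 drives=252:0,252:1,252:2

storcli /c0 add vd r1 drives=252:3,252:4 ExclusiveAccess

The first command uses the default shared access policy, so the new VD is visible to both cluster nodes. The second command appends the ExclusiveAccess parameter described in the preceding note, so only the node that created the VD can see it.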

3.1.3 Creating Shared or Exclusive VDs with MSM

Follow these steps to create VDs for data storage with MSM. When you create the VDs, you assign the Share Virtual Drive property to them to make them visible from both controller nodes. This example assumes you are creating a RAID 5 redundant VD. Modify the instructions as needed for other RAID levels.

NOTE Not all versions of MSM support HA-DAS. Check the release notes to determine if your version of MSM supports HA-DAS. Also, see Section 5.1, Verifying HA-DAS Support in Tools and the OS Driver.

1. In the left panel of the MSM Logical pane, right-click the Syncro CS 9271-8i controller and select Create Virtual Drive from the pop-up menu.

The Create Virtual Drive wizard appears.

2. Select the Advanced option and click Next.

3. In the next wizard screen, select RAID 5 as the RAID level, and select unconfigured drives for the VD, as shown in the following figure.

Figure 9 Drive Group Settings


4. Click Add to add the VD to the drive group.

The selected drives appear in the Drive groups window on the right.

5. Click Create Drive Group. Then click Next to continue to the next window.

The Virtual Drive Settings window appears.

6. Enter a name for the VD.

7. Select Always Write Back as the Write policy option, and select other VD settings as required.

8. Select the Provide Shared Access option, as shown in the following figure.

NOTE If you do not select Provide Shared Access, the VD is visible only from the server node on which it is created. Leave this option unselected if you are creating a boot volume for this cluster node.

Figure 10 Provide Shared Access Option

9. Click Create Virtual Drive to create the virtual drive with the settings you specified.

The new VD appears in the Drive groups window on the right.

10. Click Next to continue.


The Create Virtual Drive Summary window appears, as shown in the following figure.

Figure 11 Create Virtual Drive Summary

11. Click Finish to complete the VD creation process.

12. Click OK when the Create Virtual Drive - complete message appears.

3.1.3.1 Unsupported Drives

Drives that are used in the Syncro CS 9271-8i solution must be selected from the list of approved drives on the LSI web site (see the URL in Section 1.4, Hardware Compatibility). If the MegaRAID Storage Manager (MSM) utility finds a drive that does not meet this requirement, it marks the drive as Unsupported, as shown in the following figure.

Figure 12 Unsupported Drive in MSM


3.2 HA-DAS CacheCade Support

The Syncro CS 9271-8i controller includes support for CacheCade 1.0, a feature that uses SAS SSD devices for caching of frequently accessed read data. When a VD is enabled for the CacheCade feature, frequently read data regions of the VD are copied into the SSD when the CacheCade algorithm determines the region is a good candidate. When the data region is in the CacheCade SSD volume, the firmware can service related reads from the faster-access SSD volume instead of the higher-latency VD. The CacheCade feature uses a single SSD to service multiple VDs.

The Syncro CS 9271-8i solution requires the use of SAS SSDs that support SCSI-3 persistent reservations (PR) for CacheCade VDs. LSI maintains a list of SAS SSD drives that meet the HA-DAS requirements (see Section 1.4, Hardware Compatibility).

NOTE A CacheCade VD is not presented to the host operating system, and it does not move to the peer controller node when a failover occurs. A CacheCade VD possesses properties that are similar to a VD with exclusive host access. Therefore, the CacheCade volume does not cache read I/Os for VDs that are managed by the peer controller node.

Follow these steps to create a CacheCade 1.0 VD as part of a Syncro CS 9271-8i configuration. The procedure automatically associates the CacheCade volume with all existing shared VDs in the configuration. Be sure that one or more SAS SSD drives are already installed in the system. Also, be sure you are using a version of MSM that supports HA-DAS.

1. In MSM, open the physical view, right-click on the controller name and select Create CacheCade SSD Caching.

2. In the Drive Group window, set the CacheCade RAID level and select one or more unconfigured SSD drives. Use the Add button to place the selected drives into the drive group.

RAID 0 is the recommended RAID level for the CacheCade volume.

The following figure shows the CacheCade drive group.

Figure 13 Creating a CacheCade Drive Group: 1


3. Click Create Drive Group and then click Next.

4. In the Create CacheCade SSD Caching Virtual Drive window, update the SSD Caching VD name and set the size as necessary.

The maximum allowable size for the CacheCade volume is 512 GB. To achieve optimal read cache performance, the recommended best practice is to make the size as large as possible with the available SSDs, up to this limit.

Figure 14 Creating a CacheCade Drive Group: 2

5. Click Create Virtual Drive and then click Next.

6. In the Create CacheCade SSD Caching Summary window, review the configuration and then click Finish.

Figure 15 Reviewing the Configuration

7. In the Create CacheCade SSD Caching Complete box, click OK.


The CacheCade VD now appears on the Logical tab of MSM, as shown in the following figure. The CacheCade volume association with the drive groups appears in this view.

Figure 16 New CacheCade Drive Group

3.3 Creating the Cluster in Windows

The following subsections describe how to enable cluster support, and how to enable and validate the failover configuration while running a Windows operating system.

3.3.1 Prerequisites for Cluster Setup

3.3.1.1 Clustered RAID Controller Support

Support for clustered RAID controllers is not enabled by default in Microsoft Windows Server 2012 or Microsoft Windows Server 2008 R2.

To enable support for this feature, consult your server vendor. For additional information, see Microsoft Knowledge Base (KB) article 2839292 and the Cluster in a Box Validation Kit for Windows Server site on the Microsoft Windows Server TechCenter website.

3.3.1.2 Enable Failover Clustering

The Microsoft Windows Server 2012 operating system installation does not enable the clustering feature by default. Follow these steps to view the system settings and, if necessary, to enable clustering.

1. From the desktop, launch Server Manager.

2. Click Manage and select Add Roles and Features.

3. If the Introduction box is enabled (and appears), click Next.

4. In the Select Installation Type box, select Role-based or feature-based installation.

5. In the Select Destination Server box, select the system and click Next.

6. In the Select Server Roles list, click Next to present the Features list.

7. Make sure that failover clustering is installed, including the tools. If necessary, run the Add Roles and Features wizard to install the features dynamically from this user interface.

8. If the cluster nodes need to support I/O as iSCSI targets, expand File and Storage Services > File Services, and check for iSCSI Target Server and Server for NFS.


During creation of the cluster, Windows automatically defines and creates the quorum, a configuration database that contains metadata required for the operation of the cluster. To create a shared VD for the quorum, see the instructions in Section 3.1, Creating Virtual Drives on the Controller Nodes.

NOTE The recommended best practice is to create a small redundant VD for the quorum. A size of 500 MB is adequate for this purpose.

To determine if the cluster is active, run MSM and look at the Dashboard tab for the controller. The first of two nodes that boots shows the cluster status as Inactive until the second node is running and the MSM dashboard on the first node has been refreshed.

NOTE To refresh the MSM dashboard, press F5 or select Manage > Refresh on the menu.

The following figure shows the controller dashboard with Active peer controller status.

Figure 17 Controller Dashboard: Active Cluster Status

3.3.1.3 Configure Network Settings

To establish inter-node communication within the cluster, each server node must be contained within a common network domain that is served by a DNS server.

1. Set the IP addresses of each server node within the same domain.

2. Use the same DNS server, and log on to both server nodes as members of the same domain.

See the following example network configuration settings.

Server 1:

IP address: 135.15.194.21

Subnet mask: 255.255.255.0

Default gateway: 135.15.194.1

DNS server: 135.15.194.23

Server 2:

IP address: 135.15.194.22

Subnet mask: 255.255.255.0

Default gateway: 135.15.194.1

DNS server: 135.15.194.23


3.3.2 Creating the Failover Cluster

After all of the cluster prerequisites have been fulfilled, you can create a failover cluster by performing the following steps.

1. Launch the Failover Cluster Manager Tool from Server Manager: Select Server Manager > Tools > Failover Cluster Manager.

2. Launch the Create Cluster wizard: Click Create Cluster... from the Actions panel.

3. Select Servers: Use the Select Server wizard to add the two servers you want to use for clustering.

4. Validation Warning: To ensure the proper operation of the cluster, Microsoft recommends validating the configuration of your cluster.

See Section 3.3.3, Validating the Failover Cluster Configuration for additional details.

5. Access Point for Administering the Cluster: Enter the name that you want to assign to the Cluster in the Cluster Name field.

6. Confirmation: A brief report containing the cluster properties appears. If no other changes are required, you have the option to specify available storage by selecting the Add all eligible Storage to the cluster check box.

7. Creating the New Cluster: Failover Cluster Manager uses the selected parameters to create the cluster.

8. Summary: A cluster creation report summary appears; this report includes any errors or warnings encountered.

9. Click on the View Report… button for additional details about the report.

3.3.3 Validating the Failover Cluster Configuration

Microsoft recommends that you validate the failover configuration before you set up failover clustering. To do this, run the Validate a Configuration wizard for Windows Server 2008 R2 or Windows Server 2012, following the instructions from Microsoft. The tests in the validation wizard include simulations of cluster actions. The tests fall into the following categories:

System Configuration tests. These tests analyze whether the two server modules meet specific requirements, such as running the same operating system version with the same software updates.

Network tests. These tests analyze whether the planned cluster networks meet specific requirements, such as requirements for network redundancy.

Storage tests. These tests analyze whether the storage meets specific requirements, such as whether the storage correctly supports the required SCSI commands and handles simulated cluster actions correctly.

Follow these steps to run the Validate a Configuration wizard.

NOTE You can also run the Validate a Configuration wizard after you create the cluster.

1. In the failover cluster snap-in, in the console tree, make sure Failover Cluster Management is selected and then, under Management, click Validate a Configuration.

The Validate a Configuration wizard starts.

2. Follow the instructions for the wizard and run the tests.

Microsoft recommends that you run all available tests in the wizard.

NOTE Storage Spaces does not currently support Clustered RAID controllers. Therefore, do not include the Validate Storage Spaces Persistent Reservation storage test in the storage test suite. For additional information, visit the Cluster in a Box Validation Kit for Windows Server site on the Microsoft Windows Server TechCenter website.

3. When you arrive at the Summary page, click View Reports to view the results of the tests.


4. If any of the validation tests fails or results in a warning, correct the problems that were uncovered and run the test again.

3.4 Creating the Cluster in Red Hat Enterprise Linux (RHEL)

The following subsections describe how to enable cluster support, create a two-node cluster and configure NFS-clustered resources for a Red Hat operating system.

Please note that the Syncro CS solution requires the Red Hat Enterprise Linux High Availability add-on in order for dual-active HA functionality to operate properly and ensure data integrity through fencing. Product information regarding the Red Hat Enterprise Linux High Availability add-on can be found at http://www.redhat.com/products/enterprise-linux-add-ons/high-availability/.

3.4.1 Prerequisites for Cluster Setup

Before you create a cluster, perform the following tasks so that all of the necessary modules and settings are pre-configured. Additional details regarding Red Hat High Availability Add-On configuration and management can be found at https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/pdf/Cluster_Administration/Red_Hat_Enterprise_Linux-6-Cluster_Administration-en-US.pdf.

3.4.1.1 Configure Network Settings

Perform the following steps to configure the network settings.

1. Activate the network connections for eth0 and eth1 on each node by selecting the following paths:

System > Preferences > Network Connections > System eth0 > Edit > Check Connect automatically
System > Preferences > Network Connections > System eth1 > Edit > Check Connect automatically

2. NetworkManager is not supported on cluster nodes and should be removed or disabled. To disable, enter the following commands at the command prompt for both nodes:

service NetworkManager stop

chkconfig NetworkManager off

3. Perform the following steps to assign static IP addresses for both nodes (a total of four IP addresses).

a. Select Setup > Network Configuration > Device Configuration.
b. Use DNS: YOUR_IP_ADDRESS.
c. Edit the /etc/hosts file to include the IP address and the hostname for both nodes and the client.
d. Make sure you can ping each hostname from both nodes.

The following are examples of the hosts file:

YOUR_IP_ADDRESS Node1

YOUR_IP_ADDRESS Node2

YOUR_IP_ADDRESS Client

4. Configure the following iptables firewall settings to allow cluster services to communicate (an example set of iptables commands follows the list):

— cman (Cluster Manager): UDP ports 5404 and 5405
— dlm (Distributed Lock Manager): TCP port 21064
— ricci (part of Conga remote agent): TCP port 11111
— modclusterd (part of Conga remote agent): TCP port 16851
— luci (Conga User Interface server): TCP port 8084
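A minimal iptables sketch for opening these ports is shown below (an illustrative example only, assuming the standard iptables service on RHEL 6; adapt it to your site's firewall policy):

iptables -I INPUT -p udp --dport 5404:5405 -j ACCEPT   # cman/corosync cluster traffic
iptables -I INPUT -p tcp --dport 21064 -j ACCEPT       # dlm
iptables -I INPUT -p tcp --dport 11111 -j ACCEPT       # ricci
iptables -I INPUT -p tcp --dport 16851 -j ACCEPT       # modclusterd
iptables -I INPUT -p tcp --dport 8084 -j ACCEPT        # luci
service iptables save                                  # persist the rules across reboots

Run the commands on both nodes; the luci rule is needed only on the system that hosts the luci web interface.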


3.4.1.2 Install and Configure the High Availability Add-On Features

The Syncro CS solution requires that the Red Hat Enterprise Linux High Availability add-on be applied to the base RHEL OS.

Perform the following steps to install and configure the add-on feature.

1. Install the Red Hat Cluster Resource Group Manager, Logical Volume Manager (LVM), and GFS2 utilities, and then update to the latest version by typing the following commands at the command prompt:

yum install rgmanager lvm2-cluster gfs2-utils

yum update

NOTE This step assumes that both nodes have been registered with Red Hat using the Red Hat Subscription Manager.

2. Ricci is a daemon that runs on both server nodes and allows the cluster configuration commands to communicate with each cluster node.

Perform the following steps to change the ricci password for both server nodes.

a. Enter the following at the command prompt:

passwd ricci

b. Specify your password when prompted.
c. Start the ricci service by entering the following at the command prompt for both nodes:

service ricci start

d. (Optional) Configure the ricci service to start on boot for both nodes by entering the following at the command prompt:

chkconfig ricci on

3. Luci is a user interface server that allows you to configure the cluster using the High Availability management web interface, Conga.

Best Practice: You can run the luci web interface on either node but it is best to run it on a remote management system.

Install luci by entering the following at the command prompt:

yum install luci

service luci start
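(Optional) The steps above do not configure luci to start automatically at boot. If you want that behavior, you can enable it in the same way as the ricci service:

chkconfig luci on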

3.4.1.3 Configure SELinux

You need to configure SELinux policies to allow for clustering. Refer to the Red Hat documentation to properly configure your application.
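As a starting point, and not a substitute for the Red Hat SELinux documentation, you can check the current SELinux state on each node before making policy changes:

sestatus      # reports whether SELinux is enabled, its mode, and the loaded policy
getenforce    # prints Enforcing, Permissive, or Disabled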

3.4.2 Creating the Cluster

Cluster software is typically configured on a single node and then pushed to the remaining nodes in the cluster. Multiple methods exist to configure the cluster, such as using the command line, directly editing configuration files, and using a GUI. The procedures in this document use the Conga GUI tool to configure the cluster. After the cluster is created, the steps that follow allow you to specify cluster resources, configure fencing, create a failover domain, and add cluster service groups.

Perform the following steps to create the cluster.

1. Launch the luci web interface by going to https://YOUR_LUCI_SERVER_HOSTNAME:8084 from your web browser.


The following window appears.

Figure 18 Create New Cluster Window

2. Log in with root as the user name, and enter the associated root password for the host server node.

3. Go to the Manage Cluster tab.

4. Click Create.

5. Enter a name in the Cluster Name field.

NOTE The Cluster Name field identifies the cluster and is referenced in subsequent steps.

6. Add each server node in the Node Name field.

7. In the password field, enter the ricci password for each server node participating in the cluster.

8. Select the check box for the Enable Shared Storage Support option.

9. Click Create Cluster to complete.

3.4.3 Configure the Logical Volumes and Apply GFS2 File System

Perform the following steps to create a virtual drive volume that can be managed by the Linux kernel Logical Volume Manager. All of the commands in the following procedure are entered at the command line prompt.

1. Create a virtual drive with Shared access policy based on the steps defined in Section 3.1, Creating Virtual Drives on the Controller Nodes.

2. Create a physical volume label for use with LVM by entering the following command:

pvcreate /dev/sdb

3. Create a volume group (vol_grp0) and map /dev/sdb to the volume group by entering the following command:

vgcreate vol_grp0 /dev/sdb


4. Create a virtual drive volume of size X (in gigabytes) from the volume group by entering the following command:

lvcreate --size XG vol_grp0

Best Practice: Use the vgdisplay command to check how much space is available in the volume group before you choose the size X.

The system now has the following device file (BlockDevice): /dev/vol_grp0/lvol0. The GFS2 file system is a cluster file system that allows for shared storage access.

NOTE The Cluster Name is the name that you specified in Section 3.4.2, Creating the Cluster.

5. Apply the GFS2 file system to the volume created in the previous step by entering the following command:

mkfs.gfs2 -p lock_dlm -t ClusterName:FSName -j NumberJournals BlockDevice

For example, using the virtual drive created in the previous step, the result is:

mkfs.gfs2 -p lock_dlm -t YOUR_CLUSTER_NAME:V1 -j 2 /dev/vol_grp0/lvol0

6. Create mount points on each server node.

For example, you can create a mount point at /root/mnt/vol1 by entering the following command:

mkdir -p /root/mnt/vol1


3.4.4 Add a Fence Device

Fencing ensures data integrity on the shared storage file system by removing (by a power-down) any problematic node from the cluster before the node compromises a shared resource.

Perform the following steps to add a Fence Device.

1. Select the Fence Device tab and then click Add on the following window.

2. Select SCSI Reservation Fencing.

3. Select the Nodes tab and then perform the following steps for both nodes.

4. Select a cluster node name.

5. Under the section for Fencing Devices, select Add Fence Method > Submit.

6. Select Add Fence Instance > Choose Fence Devices.

7. Select Create > Submit.

Figure 19 Fence Devices Window


3.4.5 Create a Failover Domain

By default, all of the nodes can run any cluster service. To provide better administrative control over cluster services, Failover Domains limit which nodes are permitted to run a service or establish node preference.

Perform the following steps to create a failover domain.

1. Click the Failover Domains tab and click Add on the following window.

2. Enter a failover domain name in the Name text box, and click the No Failback check box.

3. Select the nodes that you want to make members of the failover domain.

4. Specify any options needed for this resource in the Prioritized, Restricted, and No Failback check boxes.

5. Click Create to complete.

Figure 20 Add Failover Domain to Cluster Window


3.4.6 Add Resources to the Cluster

Shared resources can be shared directories or properties, such as the IP address, that are tied to the cluster. These resources can be referenced by clients as though the cluster were a single server/entity. This section describes how to add GFS2 and IP address cluster resources.

Perform the following steps to create a GFS2 cluster resource:

1. Select the Resources tab and click Add on the following window.

2. Select GFS2 from the pull-down menu.

3. Specify the name of the GFS2 resource in the Name field.

4. Specify the mount point of the resource by using the mount point that you created for the shared storage logical volume in the Mount Point field.

5. Specify an appropriate reference for this resource in the Device, FS label, or UUID field.

6. Select GFS2 from the pull-down menu for the Filesystem Type field.

7. Specify any options needed for this volume in the Mount Options field.

8. Specify any options needed for this resource in the Filesystem ID field, and the Force Unmount, Enable NFS daemon and lockd workaround, or Reboot Host Node if Unmount Fails check boxes.

9. Select Submit to complete.

Figure 21 Add GFS2 Resource to Cluster Window


Perform the following steps to create an IP Address cluster resource:

1. Select the Resources tab and click Add on the following window.

2. Select IP Address from the pull-down menu.

3. Specify the address of the cluster resource in the IP Address field.

4. Specify any options needed for this resource in the Netmask Bits, Monitor Link, Disable Updates to Static Routes, and Number of Seconds to Sleep After Removing an IP Address fields.

5. Select Submit to complete.

Figure 22 Add IP Address Resource to Cluster Window


Perform the following steps to create an NFSv3 Export cluster resource:

1. Select the Resources tab and click Add on the following window.

2. Select NFS v3 Export from the pull-down menu.

3. Specify the name of the resource in the Name field.

4. Select Submit to complete.

Figure 23 Add the NFSv3 Export Resource to Cluster Window


Perform the following steps to create an NFS Client cluster resource:

1. Select the Resources tab and click Add on the following window.

2. Select NFS Client from the pull-down menu.

3. Specify the name of the resource in the Name field.

4. Specify the address of the resource in the Target Hostname, Wildcard, or Netgroup field.

5. Specify any options needed for this resource in the Allow Recovery of This NFS Client check box and Options field.

6. Select Submit to complete.

Figure 24 Add NFS Client Resource to Cluster Window


3.4.7 Create Service Groups

Service groups allow for greater organization and management of cluster resources and services that are associated with the cluster.

Perform the following steps to create Service Groups:

1. Select the Services Group tab and click Add on the following window.

2. Choose a service name that describes the function for which you are creating the service.

3. Select a previously created failover domain from the pull-down menu.

4. Click the Add a Resource tab.

5. From the drop-down menu, select the IP Address resource that you created earlier (all of the created resources appear at the top of the menu).

6. Click Add Resource, and then select the GFS File System resource created earlier from the drop-down menu.

7. Click Add a child resource to the added GFS File System resource, and select the NFS Export resource created earlier.

8. Click Add a child resource to the newly added NFS Export resource, and select the NFS Client resource created earlier.

Figure 25 Create Service Group Window


3.4.8 Create a Disk Quorum

The disk quorum allows the cluster manager to determine which nodes in the cluster are dominant, using a shared storage disk (block device) as the medium. A shared virtual drive of at least 10 MB must be configured for the disk quorum device.

Perform the following steps to make a disk quorum (qdisk):

1. Create or choose a small-capacity shared VD for the disk quorum (this example assumes a RAID 0 VD presented as /dev/sda, with mr_qdisk as the quorum disk label) by entering the following at the command line prompt:

#mkqdisk -c /dev/sda -l mr_qdisk

2. Check whether qdisk was created at both nodes by entering the following at the command line prompt:

#mkqdisk -L

3. From the QDisk tab in the Cluster Management GUI shown in the following window, under the Heuristics section enter the following in the Path to Program field:

ping -c3 -w1 <YOUR_GATEWAY_IP_ADDRESS>

4. Enter 2 in the Interval field, 1 in the Score field, 10 or 20 in the TKO field, and 1 in the Minimum Total Score field.

5. Click Apply when complete.

Figure 26 Quorum Disk Configuration Window


3.4.9 Edit the Cluster Configuration File

For the Syncro CS solution to function properly, make the following parameter changes to the cluster configuration file:

1. Edit the cluster configuration file (/etc/cluster/cluster.conf) with the following changes:

a. Increment the config_version value by 1.
b. Enter the following settings:

<quorumd label="mr_qdisk" min_score="1" interval="1" tko="52">
<heuristic [….]/>
</quorumd>
<totem token="120000"/>

c. Propagate the new cluster.conf file to both nodes by entering the following command:

cman_tool version -r

d. Reboot both nodes to apply changes to the cluster.
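After both nodes reboot, one way to confirm that the quorum disk and cluster membership are healthy (a verification sketch, not part of the procedure above) is to run the standard status tools on either node:

clustat            # shows member status and the quorum disk state
cman_tool status   # shows the cluster name, quorum, and node votes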

3.4.10 Mount the NFS Resource from the Remote Client

Mount the NFS volume from the remote client by entering the following at the command line prompt:

mount -t nfs -o rw,nfsvers=3 exportname:/pathname /mntpoint
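For example, assuming the export path /root/mnt/vol1 created earlier and a hypothetical cluster IP address resource of 135.15.194.30 (both are placeholders; substitute your own values), the client-side commands might look like the following:

mkdir -p /mnt/ha_nfs
mount -t nfs -o rw,nfsvers=3 135.15.194.30:/root/mnt/vol1 /mnt/ha_nfs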

3.5 Creating the Cluster in SuSE Linux Enterprise Server (SLES)

The following subsections describe how to enable cluster support, create a two-node cluster, and configure an NFS clustered resource for the SLES 11 SP2/SP3 operating system. Note that the Syncro CS solution requires the SuSE Linux Enterprise High Availability (SLE-HA) extensions in order to operate properly. Additional product details regarding SuSE High Availability Extensions can be found at https://www.suse.com/products/highavailability/.

3.5.1 Prerequisites for Cluster Setup

Before you create a cluster, you need to perform the following tasks to ensure that all of the necessary modules and settings are pre-configured.

3.5.1.1 Prepare the Operating System

Perform the following steps to prepare the operating system:

1. Make sure that all of the maintenance updates for the SLES 11 Service Pack 2/3 are installed.

2. Install the SLE-HA extension by performing the following steps.

a. Download the SLE-HA Extension ISO to each node.
b. To install this SLE-HA add-on, start YaST and select Software > Add-On Products.
c. Select the local ISO image, and then enter the path to the ISO image.
d. From the filter list, select Patterns, and activate the High Availability pattern in the pattern list.
e. Click Accept to start installing the packages.
f. Install the High Availability pattern on node 2, which is part of the cluster.


3.5.1.2 Configure Network Settings

Each node should have two Ethernet ports, with one (em1) connected to the network switch, and the other (em2) connected to the em2 Ethernet port on the other node.

Perform the following steps to configure network settings:

1. Perform the following steps to assign the static IP addresses:

a. On each node, set up the static IP address for both the em1 and em2 Ethernet ports by selecting Application > YaST > Network Settings.
b. Select the Ethernet port em1, and then select Edit.
c. Select the statically assigned IP address option; enter the IP address, subnet mask, and hostname; and then confirm the changes.
d. Repeat steps b and c for Ethernet port em2.

2. You need to open the following ports in the firewall on each node so that the cluster services can communicate between nodes (a SuSEfirewall2 sketch follows the list):

— TCP ports: 30865, 5560, 7630, 21064
— UDP port: 5405
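The following sketch shows one way to open these ports with SuSEfirewall2, assuming the default SLES firewall is in use (an illustrative example only; adapt it to your site's policy). Add the ports to /etc/sysconfig/SuSEfirewall2 on each node:

FW_SERVICES_EXT_TCP="30865 5560 7630 21064"
FW_SERVICES_EXT_UDP="5405"

Then restart the firewall:

rcSuSEfirewall2 restart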

3. Create the file /etc/sysconfig/network/routes, and enter the following text:

default YOUR_GATEWAY_IPADDRESS - -

4. Edit the file /etc/resolv.conf to change the DNS IP address to the IP address of the DNS server in your network in the following format:

nameserver YOUR_DNS_IPADDRESS

Alternatively, you can set the DNS server and default gateway on the Global Options, Hostname/DNS, or Routing tab of the Network Settings screen, as shown in the following figures.

Figure 27 Network Settings on Global Options Tab


Figure 28 Network Settings on Hostname/DNS Tab

Figure 29 Network Settings on Routing Tab


5. Restart the network service if needed by entering the following command:

/etc/init.d/network restart

6. Edit the /etc/hosts file to include the IP address and the hostname for node 1, node 2, and the remote client. Make sure that you can access both nodes through the public and private IP addresses.

The following examples use sles-ha1 to denote node 1 and sles-ha2 to denote node 2.

YOUR_IP_ADDRESS_1 sles-ha1.yourdomain.com sles-ha1

YOUR_IP_ADDRESS_2 sles-ha1.yourdomain.com sles-ha1

YOUR_IP_ADDRESS_3 sles-ha2.yourdomain.com sles-ha2

YOUR_IP_ADDRESS_4 sles-ha2.yourdomain.com sles-ha2

YOUR_CLIENT_IP_ADDRESS client.yourdomain client

7. Establish ssh keyless entry from node 1 to node 2, and vice versa, by entering the following commands on each node (substitute the hostname of the peer node in the ssh-copy-id command):

ssh-keygen

ssh-copy-id -i ~/.ssh/id_rsa.pub root@node1

3.5.1.3 Connect to the NTP Server for Time Synchronization

Perform the following steps to configure both nodes to use the NTP server in the network for synchronizing the time.

1. Access Yast > Network Services > NTP configuration, and then select Now & on boot.

2. Select Add, check Server, and select the local NTP server.

3. Add the IP address of the NTP server in your network.
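To confirm that time synchronization is working (a quick check, not part of the YaST procedure above), you can query the configured NTP peers from the command line on each node:

ntpq -p    # lists the NTP servers and their reachability and offset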

3.5.2 Creating the Cluster

You can use multiple methods to configure the cluster directly, such as using the command line, editing configuration files, and using a GUI. The procedures in this document use a combination of the Yast GUI tool and the command line to configure the cluster. After the cluster is online, you can perform the following steps to add NFS cluster resources.

3.5.2.1 Cluster Setup

Perform the following steps to run the automated cluster setup.

1. On node1, start the bootstrap script by entering the following command:

sleha-init

NOTE If NTP is not configured on the nodes, a warning appears. You can address the warning by configuring NTP by following the steps in Section 3.5.1.3, Connect to the NTP Server for Time Synchronization.

2. Specify the Worldwide Identifier (WWID) for a shared virtual drive in your node when prompted.

The following WWID is shown as an example.

/dev/disk/by-id/wwn-0x600605b00316386019265c4910e9a343

3. On Node 2, start the bootstrap script by entering the following command.

sleha-join

4. Complete the cluster setup on node 2 by specifying the Worldwide Identifier (WWID) for a shared virtual drive in your node when prompted.

After you perform the initial cluster setup using the bootstrap scripts, you need to make changes to the cluster settings that you could not make during bootstrap.


Perform the following steps to revise the cluster Communication Channels, Security, Service, Csync2 and conntrackd settings.

1. Start the cluster module from command line by entering the following command.

yast2 cluster

2. After the fields in the following screen display the information (according to your setup), click the check box next to the Auto Generate Node ID field to automatically generate a unique ID for every cluster node.

3. If you modified any options for an existing cluster, confirm your changes, and close the cluster module. YaST writes the configuration to /etc/corosync/corosync.conf.

Figure 30 Cluster Setup on the Communication Channels Tab


4. Click Finish.

5. After the fields in the following screen display the information, click Generate Auth Key File on Node1 only.

This action creates an authentication key that is written to /etc/corosync/authkey.

To make node 2 join the existing cluster, do not generate a new key file. Instead, copy the /etc/corosync/authkey file from node 1 to node 2 manually.

Figure 31 Cluster Setup on Security Tab


6. Click Finish.

The following screen appears.

Figure 32 Cluster Setup on Service Tab


7. Click Finish.

The following screen appears.

Figure 33 Cluster Setup on Csync2 Tab

8. To specify the synchronization group, click Add in the Sync Host Group, and enter the local hostnames of all nodes in your cluster. For each node, you must use exactly the strings that are returned by the hostname command.

9. Click Generate Pre-Shared-Keys to create a key file for the synchronization group.

The key file is written to /etc/csync2/key_hagroup. After the key file has been created, you must copy it manually to node2 of the cluster by performing the following steps:

a. Make sure the same Csync2 configuration is available on all nodes. To do so, copy the file /etc/csync2/csync2.cfg manually to node 2 after you complete the cluster configuration on node 1.

Include this file in the list of files to be synchronized with Csync2.

b. Copy the file /etc/csync2/key_hagroup that you generated on node1 to node2 in the cluster, as it is needed for authentication by Csync2. However, do not regenerate the file on node2; it needs to be the same file on all nodes.

c. Both Csync2 and xinetd must be running on all nodes. Execute the following commands on all nodes to make both services start automatically at boot time and to start xinetd now:

chkconfig csync2 on

chkconfig xinetd on

rcxinetd start

d. Copy the configuration from node1 to node2 by using the following command:

csync2 -xv

This action places all of the files on all of the nodes. If all files are copied successfully, Csync2 finishes with no errors.


10. Activate Csync2 by clicking Turn Csync2 ON.

This action executes chkconfig csync2 to start Csync2 automatically at boot time.

11. Click Finish.

The following screen appears.

Figure 34 Cluster Setup on Conntrackd Tab

12. After the information appears, click Generate /etc/conntrackd/conntrackd.conf to create the configuration file for conntrackd.

13. Confirm your changes and close the cluster module.

If you set up the initial cluster exclusively with the YaST cluster module, you have now completed the basic configuration steps.

14. Starting at step 1 in this procedure, perform the steps for Node2.

Some keys do not need to be regenerated on Node2, but they have to be copied from Node1.

3.5.3 Bringing the Cluster Online

Perform the following steps to bring the cluster online.

1. Check if the openais service is already running by entering the following command:

rcopenais status

2. If the openais service is already running, go to step 3. If not, start OpenAIS/Corosync now by entering the following command:

rcopenais start


3. Repeat the steps above for each of the cluster nodes. On each of the nodes, check the cluster status with the following command:

crm_mon

If all of the nodes are online, the output should be similar to the following:

============

Last updated: Thu May 23 04:28:26 2013

Last change: Mon May 20 09:05:29 2013 by hacluster via crmd on sles-ha1

Stack: openais

Current DC: sles-ha2 - partition with quorum

Version: 1.1.6-b988976485d15cb702c9307df55512d323831a5e

2 Nodes configured, 2 expected votes

1 Resources configured.

Online: [ sles-ha2 sles-ha1 ]

stonith-sbd (stonith:external/sbd): Started sles-ha2

============

This output indicates that the cluster resource manager is started and is ready to manage resources.
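By default, crm_mon runs continuously and refreshes the display. If you prefer a one-shot status check that prints the output once and exits (a convenience, not required by the procedure), you can use the following command:

crm_mon -1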

3.5.4 Configuring the NFS Resource with STONITH SBD Fencing

The following subsections describe how to set up an NFS resource by installing the NFS kernel server, configuring the shared VD by partitioning, applying the ext3 file system, and configuring the stonith_sbd fencing.

3.5.4.1 Install NFSSERVER

Use Yast to install nfs-kernel-server and all of the required dependencies.

3.5.4.2 Configure the Partition and the File System

Perform the following steps to configure the partition and the file system.

1. Use fdisk or any other partition modification tool to create partitions on the virtual drive.

For this example, /dev/sda is on a shared virtual drive with two partitions created: sda1 (part1) for sbd and sda2 (part2) for the NFS mount (the actual data-sharing partition). One way to create these partitions is shown in the sketch after step 2.

2. Use mkfs to apply the ext3 file system to the partitions:

sles-ha1:~ # mkfs.ext3 /dev/sda1

sles-ha1:~ # mkfs.ext3 /dev/sda2
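If you prefer a non-interactive alternative to fdisk, the following parted sketch creates a layout similar to the one described above (the device name and partition sizes are illustrative assumptions only):

parted -s /dev/sda mklabel msdos
parted -s /dev/sda mkpart primary 1MiB 513MiB    # sda1, small partition used for sbd
parted -s /dev/sda mkpart primary 513MiB 100%    # sda2, data partition for the NFS mount
partprobe /dev/sda                               # re-read the partition table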

3.5.4.3 Configure stonith_sbd Fencing

Stonith_sbd is the fencing mechanism used in SLES-HA. Fencing ensures data integrity on the shared storage by preventing problematic nodes from accessing the cluster resources. Before you create another resource, you must configure this mechanism correctly.

For this example, the World Wide Name (WWN - 0x600605b00316386019265c4910e9a343) refers to /dev/sda1.

NOTE Use only the wwn-xyz device handle to configure stonith_sbd. The /dev/sda1 device handle is not persistent, and using it can cause sbd unavailability after a reboot.


Perform the following step to set up the stonith_sbd fencing mechanism.

1. Create the sbd header, and set the watchdog timeout to 52 seconds and the msgwait timeout to 104 seconds by entering the following at the command prompt:

sles-ha1:~ # sbd -d /dev/disk/by-id/wwn-0x600605b00316386019265c4910e9a343-part1 -4 104 -1 52 create

The following output appears.

Initializing device /dev/disk/by-id/wwn-0x600605b00316386019265c4910e9a343-part1

Creating version 2 header on device 3

Initializing 255 slots on device 3

Device /dev/disk/by-id/wwn-0x600605b00316386019265c4910e9a343-part1 is initialized.

2. Verify that the sbd header was created and timeout set properly by entering the following at the command prompt:

sles-ha1:~ # sbd -d /dev/disk/by-id/wwn-0x600605b00316386019265c4910e9a343-part1 dump

The following output appears.

==Dumping header on disk /dev/disk/by-id/wwn-0x600605b00316386019265c4910e9a343-part1

Header version : 2

Number of slots : 255

Sector size : 512

Timeout (watchdog) : 10

Timeout (allocate) : 2

Timeout (loop) : 1

Timeout (msgwait) : 104

==Header on disk /dev/disk/by-id/wwn-0x600605b00316386019265c4910e9a343-part1 is dumped

3. Add the following contents to /etc/sysconfig/sbd. You can verify the file by entering the following at the command prompt:

sles-ha1:~ # cat /etc/sysconfig/sbd

The following output appears.

SBD_DEVICE="/dev/disk/by-id/wwn-0x600605b00316386019265c4910e9a343-part1"

SBD_OPTS="-W"

4. Copy the sbd configuration file to node 2, and then allocate a slot for node 1 by entering the following at the command prompt:

sles-ha1: # scp /etc/sysconfig/sbd root@sles-ha2:/etc/sysconfig/

sles-ha1:/etc/sysconfig # sbd -d /dev/disk/by-id/wwn-0x600605b00316386019265c4910e9a343-part1 allocate sles-ha1

The following output appears.

Trying to allocate slot for sles-ha1 on device /dev/disk/by-id/wwn-0x600605b00316386019265c4910e9a343-part1.

slot 0 is unused - trying to own

Slot for sles-ha1 has been allocated on /dev/disk/by-id/wwn-0x600605b00316386019265c4910e9a343-part1.

5. Allocate a slot for node 2 by entering the following at the command prompt:

sles-ha1:/etc/sysconfig # sbd -d /dev/disk/by-id/wwn-0x600605b00316386019265c4910e9a343-part1 allocate sles-ha2


The following output appears.

Trying to allocate slot for sles-ha2 on device /dev/disk/by-id/wwn-0x600605b00316386019265c4910e9a343-part1.

slot 1 is unused - trying to own

Slot for sles-ha2 has been allocated on /dev/disk/by-id/wwn-0x600605b00316386019265c4910e9a343-part1.

6. Verify that both nodes have allocated slots for sbd by entering the following at the command prompt:

sles-ha1:/etc/sysconfig # sbd -d /dev/disk/by-id/wwn-0x600605b00316386019265c4910e9a343-part1 list

The following output appears:

0 sles-ha1 clear

1 sles-ha2 clear

7. Restart the corosync daemon on node 1 by entering the following at the command prompt:

sles-ha1: # rcopenais restart

The following output appears.

Stopping OpenAIS/corosync daemon (corosync): Stopping SBD - done OK

Starting OpenAIS/Corosync daemon (corosync): Starting SBD - starting... OK

8. Restart the corosync daemon on node 2 by entering the following at the command prompt:

sles-ha2:# rcopenais restart

The following output appears.

Stopping OpenAIS/corosync daemon (corosync): Stopping SBD - done OK

Starting OpenAIS/Corosync daemon (corosync): Starting SBD - starting... OK

9. Check whether both the nodes are able to communicate with each other through sbd by entering the following commands.

sles-ha2:# sbd -d /dev/disk/by-id/wwn-0x600605b00316386019265c4910e9a343-part1 message sles-ha1 test

sles-ha1: # tail -f /var/log/messages

Output from node 1 similar to the following appears.

Jun 4 07:45:15 sles-ha1 sbd: [8066]: info: Received command test from sles-ha2 on disk /dev/disk/by-id/wwn-0x600605b00316386019265c4910e9a343-part1

10. Send a message from node 1 to node 2 to confirm that the message can be sent both ways by entering the following command.

sles-ha1: # sbd -d /dev/disk/by-id/wwn-0x600605b00316386019265c4910e9a343-part1 message sles-ha2 test

11. After you confirm the message can be sent either way, configure stonith_sbd as a resource by entering the following commands in crm, the command line utility.

sles-ha1:# crm configure

crm(live)configure# property stonith-enabled="true"

crm(live)configure# property stonith-timeout="120s"

crm(live)configure# primitive stonith_sbd stonith:external/sbd params sbd_device="/dev/disk/by-id/wwn-0x600605b00316386019265c4910e9a343-part1"

crm(live)configure# commit

crm(live)configure# quit


12. Revise the global cluster policy settings by entering the following commands.

sles-ha1:# crm configure

crm(live)configure# property no-quorum-policy="ignore"

crm(live)configure# rsc_defaults resource-stickiness="100"

crm(live)configure# commit

crm(live)configure# quit

3.5.5 Adding NFS Cluster Resources

This section describes how to add NFS cluster resources by using the command line. Alternatively, you can use the Pacemaker GUI tool.

1. Create the mount folders in both sles-ha1 and sles-ha2 according to your requirements, by using the following commands.

sles-ha1:# mkdir /nfs

sles-ha1:# mkdir /nfs/part2

sles-ha2:# mkdir /nfs

sles-ha2:# mkdir /nfs/part2

ATTENTION Do not manually mount the ext3 partition on this folder. The cluster takes care of that action automatically. Mounting the partition manually would corrupt the file system.

2. Add the following contents to /etc/exports on both sles-ha1 and sles-ha2. You can verify the file by using the following command.

sles-ha2:~ # cat /etc/exports

The following output appears.

/nfs/part2 YOUR_SUBNET/YOUR_NETMASK(fsid=1,rw,no_root_squash,mountpoint)

3. Configure the NFSSERVER to be started and stopped by the cluster by using the following commands.

sles-ha2:~ # crm configure

crm(live)configure# primitive lsb_nfsserver lsb:nfsserver op monitor interval="15s" timeout="15s"

4. Configure a Filesystem service by using the following command.

crm(live)configure# primitive p_fs_part2 ocf:heartbeat:Filesystem params device=/dev/disk/by-id/wwn-0x600605b00316386019265c4910e9a343-part2 directory=/nfs/part2 fstype=ext3 op monitor interval="10s"

5. Configure a Virtual IP address. This IP address is different from the IP address that connects to the Ethernet ports. This IP address can move between both nodes. Also, enter the netmask according to your network.

crm(live)configure# primitive p_ip_nfs ocf:heartbeat:IPaddr2 params ip="YOUR_VIRTUAL_IPADDRESS" cidr_netmask="YOUR_NETMASK" op monitor interval="30s"

6. Create a group and add the resources part of the same group by using the following commands.

NOTE The stonith_sbd should not be part of this group. Make sure that all added shared storage is listed at the beginning of the group order because migration of the storage resource is a dependency for other resources.

crm(live)configure# group g_nfs p_fs_part2 p_ip_nfs lsb_nfsserver

crm(live)configure# edit g_nfs


The following output appears.

group g_nfs p_fs_part2 p_ip_nfs lsb_nfsserver \

meta target-role="Started"

7. Commit the changes and exit crm by entering the following commands:

crm(live)configure# commit

crm(live)configure# quit

8. Check whether the resources are added and the parameters are set to the correct values by using the following command. If the output is not correct, modify the resources and parameters accordingly.

sles-ha2:~ # crm configure show

The following output appears.

node sles-ha1

node sles-ha2

primitive lsb_nfsserver lsb:nfsserver \

operations $id="lsb_nfsserver-operations" \

op monitor interval="15" timeout="15"

primitive p_fs_part2 ocf:heartbeat:Filesystem \

params device="/dev/disk/by-id/wwn-0x600605b00316386019265c4910e9a343-part2" directory="/nfs/part2" fstype="ext3" \

op monitor interval="10s"

primitive p_ip_nfs ocf:heartbeat:IPaddr2 \

params ip="YOUR_VIRTUAL_IPADDRESS" cidr_netmask="YOUR_NETMASK" \

op monitor interval="30s"

primitive stonith_sbd stonith:external/sbd \

params sbd_device="/dev/disk/by-id/wwn-0x600605b00316386019265c4910e9a343-part1"

group g_nfs p_ip_nfs p_fs_part2 lsb_nfsserver \

meta target-role="Started"

property $id="cib-bootstrap-options" \

dc-version="1.1.6-b988976485d15cb702c9307df55512d323831a5e" \

cluster-infrastructure="openais" \

expected-quorum-votes="2" \

stonith-timeout="120s" \

no-quorum-policy="ignore" \

last-lrm-refresh="1370644577" \

default-action-timeout="120s" \

default-resource-stickiness="100"


9. (Optional) You can use the Pacemaker GUI as an alternative to configure the CRM parameters.

Perform the following steps to use the Pacemaker GUI:

a. Before you use the cluster GUI for the first time, set the password for the hacluster account on both nodes by entering the following commands. This password is needed to connect to the GUI.

sles-ha2:~ # passwd hacluster

sles-ha1:~ # passwd hacluster

b. Enter the following command to launch the GUI.

sles-ha2:~ # crm_gui

The Pacemaker GUI appears as shown in the following figures.

Figure 35 Pacemaker GUI on Policy Engine Tab (CRM Config)


The following figure shows the Pacemaker GUI with the CRM Daemon Engine tab selected.

Figure 36 Pacemaker GUI on CRM Daemon Engine Tab (CRM Config)

10. Check to confirm that the CRM is running by entering the following command.

sles-ha2:# crm_mon

The following output appears.

============

Last updated: Mon Jun 10 12:19:47 2013

Last change: Fri Jun 7 17:13:20 2013 by hacluster via mgmtd on sles-ha1

Stack: openais

Current DC: sles-ha2 - partition with quorum

Version: 1.1.6-b988976485d15cb702c9307df55512d323831a5e

2 Nodes configured, 2 expected votes

4 Resources configured.

============

3.5.6 Mounting NFS in the Remote Client

In the remote system, use the following command to mount the exported NFS partition:

mount -t nfs "YOUR_VIRTUAL_IPADDRESS":/nfs/part2 /srv/nfs/part2
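The mount point must already exist on the remote client; if it does not (the /srv/nfs/part2 path is an assumption about your client layout), create it first:

mkdir -p /srv/nfs/part2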


Chapter 4: System Administration

This chapter explains how to perform system administration tasks, such as planned failovers and updates of the Syncro CS 9271-8i controller firmware.

4.1 High Availability Properties

The following figure shows the high availability properties that MSM displays on the Controller Properties tab for a Syncro CS 9271-8i controller.

Figure 37 Controller Properties: High Availability Properties

Following is a description of each high availability property:

Topology Type – A descriptor of the HA topology for which the Syncro CS 9271-8i controller is currently configured (the default is Server Storage Cluster).

Maximum Controller Nodes – The maximum number of concurrent Syncro CS 9271-8i controllers within the HA domain that the controller supports.

Domain ID – A unique number that identifies the HA domain in which the controller is currently included. This field has a number if the cluster or peer controller is in active state.

Peer Controller Status – The current state of the peer controller:
Active: The peer controller is present and is participating in the HA domain.
Inactive: The peer controller is missing or has failed.
Incompatible: The peer controller is detected, but it has an incompatibility with the controller.

Incompatibility Details – If the peer controller is incompatible, this field lists the cause of the incompatibility.


4.2 Understanding Failover Operations

A failover operation in HA-DAS is the process by which VD management transitions from one server node to the peer server node. A failover operation might result from a user-initiated, planned action to move an application to a different controller node so that maintenance activities can be performed, or the failover might be unintended and unplanned, resulting from hardware component failure that blocks access to the storage devices. Figure 38 and Figure 39 show an example of a failover operation of various drive groups and VDs from Server A to Server B. The following figure shows the condition of the two server nodes before the failover.

Figure 38 Before Failover from Server A to Server B

Before failover, the cluster status is as follows in terms of managing the drive group and VDs:

All VDs in A-DG0 (Server A – Drive Group 0) are managed by Server A.
VD3 in B-DG0 (Server B – Drive Group 0) is managed by Server B.
The CacheCade VD (CC-VD) in A-CC is managed by Server A and services VDs in drive group A-DG0.

Before failover, the operating system perspective is as follows:

The operating system on Server A only sees VDs with shared host access and exclusive host access to Server A.
The operating system on Server B only sees VDs with shared host access and exclusive host access to Server B.

Before failover, the operating system perspective of I/O transactions is as follows:

Server A is handling I/O transactions that rely on A-DG0:VD1 and A-DG0:VD2.
Server B is handling I/O transactions that rely on A-DG0:VD0 and B-DG0:VD3.


The following figure shows the condition of the two server nodes after the failover.

Figure 39 After Failover from Server A to Server B

After failover, the cluster status is as follows, in terms of managing the drive group and the VDs:

All shared VDs in A-DG0 have failed over and are now managed by Server B.
VD3 in B-DG0 is still managed by Server B.
The CacheCade VD (CC-VD) in A-CC now appears as a foreign VD on Server B but does not service any VDs in A-DG0 or B-DG0.

After failover, the operating system perspective is as follows:

The operating system on Server B manages all shared VDs and any exclusive Server B VDs.

After failover, the operating system perspective of I/O transactions is as follows:

Failover Cluster Manager has moved the I/O transactions for VD2 on A-DG0 to Server B.
Server B continues to run I/O transactions on B-DG0:VD3.
I/O transactions that rely on the exclusive A-DG0:VD1 on Server A fail because exclusive volumes do not move with a failover.

NOTE When Server A returns, the management and I/O paths of the pre-failover configurations are automatically restored.

The following sections provide more detailed information about planned failover and unplanned failover.


4.2.1 Understanding and Using Planned Failover

A planned failover occurs when you deliberately transfer control of the drive groups from one controller node to the other. The usual reason for initiating a planned failover is to perform some kind of maintenance or upgrade on one of the controller nodes—for example, upgrading the controller firmware, as described in the following section. A planned failover can occur when there is active data access to the shared drive groups.

Before you start a planned failover on a Syncro CS system, be sure that no processes are scheduled to run during that time. Be aware that system performance might be impacted during the planned failover.

NOTE Failed-over VDs with exclusive host access cannot be accessed unless the VD host access is set to SHARED. Do not transition operating system boot volumes from EXCLUSIVE to SHARED access.

4.2.1.1 Planned Failover in Windows Server 2012

Follow these steps to perform a planned failover on a Syncro CS 9271-8i system running Windows Server 2012.

1. Create a backup of the data on the Syncro CS 9271-8i system.

2. In the Failover Cluster Manager snap-in, if the cluster that you want to manage is not displayed in the console tree, right-click Failover Cluster Manager, click Manage a Cluster, and then select or specify the cluster that you want.

3. If the console tree is collapsed, expand the tree under the cluster that you want to configure.

4. Expand Services and Applications, and click the name of the virtual machine.

5. On the right-hand side of the screen, under Actions, click Move this service or application to another node, and click the name of the other node.

As the virtual machine is moved, the status is displayed in the results panel (center panel). Verify that the move succeeded by inspecting the details of each node in the RAID management utility.


4.2.1.2 Planned Failover in Windows Server 2008 R2

Follow these steps to perform a planned failover on a Syncro CS 9271-8i system running Windows Server 2008 R2.

1. Create a backup of the data on the Syncro CS 9271-8i system.

2. Open the Failover Cluster Manager, as shown in the following figure.

Figure 40 Failover Cluster Manager

 


3. In the left panel, expand the tree to display the disks, as shown in the following figure.

Figure 41 Expand Tree

4. Right-click on the entry in the Assigned To column in the center panel of the window.

5. On the pop-up menu, select Move > Select Node, as shown in the following figure.

Figure 42 Expand Tree

6. Select the node for the planned failover.


4.2.1.3 Planned Failover in Red Hat Enterprise Linux

Follow these steps to perform a planned failover on a Syncro CS 9271-8i system running Red Hat Enterprise Linux.

1. Back up the data that is on the Syncro CS 9271-8i system.

2. On the High Availability management web interface, select the Service Groups tab and select the service group that you want to migrate to the other node.

3. Select the node to migrate the service group to by using the drop-down menu by the Status field.

4. Click on the play radio button to migrate the service group.

Figure 43 Migrate Service Groups
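If you manage the cluster from a shell rather than the High Availability web interface, the rgmanager command-line tools can typically perform the same relocation. This is a sketch only; the service group name "ha_service" and node name "node2" are placeholders.

# Show the current owner and state of each service group
clustat

# Relocate the service group to the other member node (placeholder names)
clusvcadm -r ha_service -m node2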


4.2.1.4 Planned Failover in SuSE Linux Enterprise Server

Follow these steps to perform a planned failover on a Syncro CS 9271-8i system running SuSE Linux Enterprise Server.

1. Back up your data that is on the Syncro CS 9271-8i system.

2. From the Pacemaker GUI (crm_gui) click Management in the left pane.

3. As shown in the following figure, right-click the respective resource in the right pane that you want to migrate to the other node and select Migrate Resource.

Figure 44 Migrate Resource Option

4. In the Migrate Resource window, shown in the following figure, select the node to which to move the resource from the To Node pull-down menu.

5. Click OK to confirm the migration.


Figure 45 Migrate Resource Window
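The same migration can typically be performed from the crm shell instead of the Pacemaker GUI. This is a sketch only; the resource name "ha_resource" and node name "node2" are placeholders.

# Show the status and current location of the cluster resources
crm resource status

# Move (migrate) the resource to the other node (placeholder names)
crm resource migrate ha_resource node2

# After the maintenance is complete, clear the location constraint created by the migration
crm resource unmigrate ha_resource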

4.2.2 Understanding Unplanned Failover

An unplanned failover might occur if the controller in one of the server nodes fails, or if the cable from one controller node to the JBOD enclosure is accidentally disconnected. The Syncro CS 9271-8i solution is designed to automatically switch to the other controller node when such an event occurs, without any disruption of access to the data on the drive groups.

NOTE When the failed controller node returns, the management and I/O paths of the pre-failover configurations are automatically restored.


4.3 Updating the Syncro CS 9271-8i Controller Firmware

Follow these steps to update the firmware on the Syncro CS 9271-8i controller board. You must perform the update only on the controller node that is not currently accessing the drive groups.

NOTE Be sure that the firmware version selected for the update is specified for Syncro CS controllers. If you update to a version of controller firmware that does not support Syncro CS controllers, you lose HA-DAS functionality.

1. If necessary, perform a planned failover as described in the previous section to transfer control of the drive groups to the other controller node.

2. Start the MSM utility on the controller node that does not currently own the cluster.

NOTE To determine which node currently owns the cluster in Windows Server 2012, follow the steps in Section 4.2.1.2, Planned Failover in Windows Server 2008 R2, up to step 3, where information about the cluster disks is displayed in the center panel. The current owner of the cluster is listed in the Owner Node column.

3. In the left panel of the MSM window, click the icon of the controller that requires an upgrade.

4. In the MSM window, select Go To > Controller > Update Controller Firmware.

5. Click Browse to locate the .rom update file.

6. After you locate the file, click OK.

The MSM software displays the version of the existing firmware and the version of the new firmware file.

7. When you are prompted to indicate whether you want to upgrade the firmware, click Yes.

The controller is updated with the new firmware code contained in the .rom file.

8. Reboot the controller node after the new firmware is flashed.

The new firmware does not take effect until reboot.

9. If desired, use planned failover to transfer control of the drive groups back to the controller node you just upgraded.

10. Repeat this process for the other controller.

11. Restore the cluster to its non-failed-over mode.
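As an alternative to flashing through MSM, the firmware image can typically be downloaded to the controller with StorCLI. This is a sketch only; the controller index /c0 and the file name syncro_cs_fw.rom are placeholders, and the node must still be rebooted afterward as described in step 8.

# Display the current firmware version on controller 0
storcli /c0 show

# Flash the Syncro CS firmware image (placeholder file name); reboot the node afterward
storcli /c0 download file=syncro_cs_fw.rom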


4.4 Updating the MegaRAID Driver

To update the MegaRAID driver used in the clustering configuration, download the latest version of the driver from the LSI website. Then follow the instructions for Windows Server 2008 R2, Windows Server 2012, Red Hat Enterprise Linux, or SuSE Linux Enterprise Server.

4.4.1 Updating the MegaRAID Driver in Windows Server 2008 R2

As a best practice, back up the system data and then perform a planned failover before you update the driver, because the driver update requires a system reboot.

1. Right-click on Computer and select Properties.

2. Click Change Settings, as shown in the following figure.

Figure 46 Windows Server 2008 R2 System Properties


3. Select the Hardware tab and click Device Manager.

4. Click Storage to expose the Syncro CS 9271-8i controller.

5. Right-click the Syncro CS 9271-8i controller and select Update Driver Software to start the Driver Update wizard, as shown in the following figure.

Figure 47 Updating the Driver Software

6. Follow the instructions in the wizard.


4.4.2 Updating the MegaRAID Driver in Windows Server 2012

As a best practice, back up the system data and then perform a planned failover before you update the driver, because the driver update requires a system reboot.

1. Run Server Manager and select Local Server on the left panel.

2. Click the Tasks selection list on the right-hand side of the window, as shown in the following figure.

Figure 48 Updating the Driver Software


3. Select Computer Management, then click Device Manager.

4. Click Storage controllers to display the Syncro CS controller.

5. Right-click on the Syncro CS controller and select Update Driver Software, as shown in the following figure, to start the Driver Update wizard.

Figure 49 Updating the Driver Software

6. Follow the instructions in the wizard.
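If you prefer to stage the driver from an elevated command prompt or PowerShell session instead of the Device Manager wizard, pnputil can typically add the package to the driver store and install it. This is a sketch only; the .inf path is a placeholder for the location of the extracted driver package.

# Add the driver package to the driver store and install it on matching devices (placeholder path)
pnputil -i -a C:\Drivers\MegaSAS\megasas2.inf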

4.4.3 Updating the Red Hat Linux System Driver

Perform the following steps to install or update to the latest version of the MegaSAS driver:

1. Boot the system.

2. Open a console (terminal) window.

3. Install the Dynamic Kernel Module Support (DKMS) driver RPM.

Uninstall the earlier version first, if needed.

4. Install the MegaSAS driver RPM.

Uninstall the earlier version first, if needed.

5. Reboot the system to load the driver.
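In practice, steps 3 and 4 might look like the following; the RPM file names and versions are placeholders and depend on the driver release downloaded from the LSI web site. The same commands apply to the SuSE Linux Enterprise Server procedure in the next section.

# Remove the earlier driver and DKMS packages, if they are installed (placeholder package names)
rpm -e megaraid_sas
rpm -e dkms

# Install the DKMS framework RPM and the MegaSAS driver RPM shipped with the driver release
rpm -ivh dkms-2.x.y-1.noarch.rpm
rpm -ivh megaraid_sas-vXX.YY.ZZ-1dkms.noarch.rpm

# Reboot so that the new driver module is loaded
reboot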

4.4.4 Updating the SuSE Linux Enterprise Server 11 Driver

Perform the following steps to install or upgrade to the latest version of the MegaSAS driver:

1. Boot the system.

2. Open a console (terminal) window.

3. Install the Dynamic Kernel Module Support (DKMS) driver RPM.

Uninstall the earlier version first, if needed.


4. Install the MegaSAS driver RPM.

Uninstall the earlier version first, if necessary.

5. Reboot the system to load the driver.

4.5 Performing Preventive Measures on Disk Drives and VDs

The following drive and VD-level operations help to proactively detect disk drive and VD errors that could potentially cause the failure of a controller node. For more information about these operations, refer to the MegaRAID SAS Software User Guide.

Patrol Read – A patrol read periodically verifies all sectors of disk drives that are connected to a controller, including the system reserved area in the RAID configured drives. You can run a patrol read for all RAID levels and for all hot spare drives. A patrol read is initiated only when the controller is idle for a defined time period and has no other background activities.

Consistency Check – You should periodically run a consistency check on fault-tolerant VDs (RAID 1, RAID 5, RAID 6, RAID 10, RAID 50, and RAID 60 configurations; RAID 0 does not provide data redundancy). A consistency check scans the VDs to determine whether the data has become corrupted and needs to be restored.

For example, in a VD with parity, a consistency check computes the data on one drive and compares the results to the contents of the parity drive. You must run a consistency check if you suspect that the data on the VD might be corrupted.

NOTE Be sure to back up the data before running a consistency check if you think the data might be corrupted.
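Both operations can typically be started from StorCLI as well as from the MSM GUI. This is a sketch only; the controller index /c0 and the virtual drive index /v0 are placeholders for your configuration.

# Enable automatic patrol read, or start one manually and monitor its progress
storcli /c0 set patrolread=on mode=auto
storcli /c0 start patrolread
storcli /c0 show patrolread

# Start a consistency check on virtual drive 0 and monitor its progress
storcli /c0/v0 start cc
storcli /c0/v0 show cc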


Chapter 5: Troubleshooting

This chapter has information about troubleshooting a Syncro CS system.

5.1 Verifying HA-DAS Support in Tools and the OS Driver

Not all versions of MegaRAID Storage Manager (MSM) support HA-DAS. The MSM versions that include support for HA-DAS have specific references to clustering. It is not always possible to determine the level of support from the MSM version number. Instead, look for the MSM user interface features that indicate clustering support. If the second item in the MSM Properties box on the dashboard for the HA-DAS controller is High Availability Cluster status, the version supports HA-DAS. This entry does not appear on versions of MSM without HA-DAS support.

You can also verify HA-DAS support in the MSM Create Virtual Drive wizard. A Provide Shared Access check box appears only if the MSM version supports clustering, as shown in the following figure.

Figure 50 Provide Shared Access Property

Versions of MSM that support HA-DAS also require an HA-DAS-capable OS driver to present HA-DAS features. The in-box drivers for Windows Server 2012, RHEL 6.4, and SLES 11 SP3 do not present HA-DAS features in MSM.

To determine if your version of StorCLI supports HA-DAS, enter this help command:

Storcli /c0 add vd ?

If the help text that is returned includes information about the Host Access Policy: Exclusive to peer Controller / Exclusive / Shared parameter, your version of StorCLI supports HA-DAS.


5.2 Confirming SAS Connections

The high availability functionality of HA-DAS is based on redundant SAS data paths between the clustered nodes and the disk drives. If all of the components in the SAS data path are configured and connected properly, every drive presents two SAS addresses when viewed from each HA-DAS controller.

This section explains how to use three tools (StorCLI, WebBIOS, and MSM) to confirm the correctness of the SAS data paths.

5.2.1 Using WebBIOS to View Connections for Controllers, Expanders, and Drives

Use the Physical View in WebBIOS to confirm the connections between the controllers and expanders in the Syncro CS system. If both expanders are running, the view in WebBIOS from one of the nodes includes the other HA-DAS RAID controller (Processor 8 in the figure), the two expanders, and any drives, as shown in the following figure.

Figure 51 WebBIOS Physical View

If the other node is powered off, the other RAID controller does not appear in WebBIOS. Devices can appear and disappear while the system is running, as connections are changed. Use the WebBIOS rescan feature to rediscover the devices and topology after a connection change.

5.2.2 Using WebBIOS to Verify Dual-Ported SAS Addresses to Disk Drives

Use the Drive Properties view in WebBIOS to confirm that each SAS drive displays two SAS addresses. In a Syncro CS 9271-8i system that is properly cabled and configured, every drive should have two SAS addresses. If the system lacks redundant SAS data paths, WebBIOS shows only one SAS address for the drive.

To check the drive SAS addresses, open the Physical View on the home page of WebBIOS and click on a drive link. On the Disk Properties page, click Next. When the redundant SAS data paths are missing, this second view of drive properties shows only one SAS address in the left panel, as in the following figure.


Figure 52 Redundant SAS Data Paths Are Missing

The following figure shows the correct view with two drive SAS addresses.

Figure 53 Redundant SAS Data Paths Are Present

5.2.3 Using StorCLI to Verify Dual-Ported SAS Addresses to Disk Drives

The StorCLI configuration display command (show all) returns many lines of information, including a summary for each physical disk. To confirm that the controller has discovered both SAS addresses for a drive, examine the StorCLI output for the drive information following the Physical Disk line. If only one of the drive’s SAS ports is discovered, the second SAS address is listed as 0x0. If both drive SAS ports are discovered, the second SAS address is identical to the first except for the last hexadecimal digit, which differs from SAS Address(0) by plus or minus 1.

The syntax of the StorCLI command is as follows:

Storcli /c0/ex/sx show all


The following example shows the returned information for a physical disk; some of the preceding output is omitted for brevity. The SAS addresses are listed at the end. In this example, only one of the drive’s SAS ports is discovered, so the undiscovered port’s SAS address is listed as 0x0.

Drive /c0/e14/s9 :
================
-------------------------------------------------------------------------
EID:Slt DID State DG       Size Intf Med SED PI SeSz Model       Sp
-------------------------------------------------------------------------
14:9      8 Onln   2 278.875 GB SAS  HDD N   N  512B ST3300657SS U
-------------------------------------------------------------------------

Drive /c0/e14/s9 - Detailed Information :
=======================================

Drive /c0/e14/s9 State :
======================
Shield Counter = 0
Media Error Count = 0

Drive /c0/e14/s9 Device attributes :
==================================
SN = 6SJ2VR1N
WWN = 5000C5004832228C
Firmware Revision = 0008

Drive /c0/e14/s9 Policies/Settings :
==================================
Drive position = DriveGroup:2, Span:1, Row:1
Enclosure position = 0
Connected Port Number = 0(path1)

Port Information :
================
-----------------------------------------
Port Status Linkspeed SAS address
-----------------------------------------
   0 Active 6.0Gb/s   0x0
   1 Active 6.0Gb/s   0x5000c5004832228e
-----------------------------------------

5.2.4 Using MSM to Verify Dual-Ported SAS Addresses to Disk Drives

When the Syncro CS system is running, you can use MSM to verify the dual SAS paths to disk drives in the HA-DAS configuration by following these steps:

1. Start MSM and access the Physical tab for the controller.

2. Click on a drive in the left panel to view the Properties tab for the drive.

3. Look at the SAS Address fields.

As shown in the following figure, a correctly configured and running HA-DAS cluster with both nodes active displays dual SAS addresses on the drives and dual 4-lane SAS connections on the controller.


Figure 54 Redundant SAS Connections Displayed in MSM

5.3 Understanding CacheCade Behavior During a Failover

A CacheCade VD possesses properties that are similar to a VD with exclusive host access, and it is not presented to the other host operating system. Therefore, the CacheCade volume does not cache read I/Os for VDs that are managed by the peer controller node.

Foreign import of a CacheCade VD is not permitted. To migrate a CacheCade VD from one controller node to another, you must delete it from the controller node that currently manages it and then recreate the CacheCade VD on the peer controller node.
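The delete-and-recreate sequence can typically be done with StorCLI. This is a sketch only; the virtual drive index /v1, the RAID type, and the enclosure:slot values for the CacheCade SSDs are placeholders for your configuration.

# On the node that currently manages the CacheCade VD, delete it
storcli /c0/v1 del cachecade

# On the peer controller node, recreate the CacheCade VD on the SSDs (placeholder drive list)
storcli /c0 add vd cachecade type=raid1 drives=14:10-11 WB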


5.4 Error Situations and Solutions

The following table lists some problems that you might encounter in a Syncro CS configuration, along with possible causes and solutions.

Table 2 Error Situations and Solutions

Problem Possible Cause Solution

A drive is reported as Unsupported, and the drive cannot be used in a drive group.

The drive is not a SAS drive, or it does not support SCSI-3 PR.

Be sure you are using SAS drives that are included on the list of compatible SAS drives on the LSI web site, or ask your drive vendor.

One or more of the following error messages appear after you run the Microsoft Cluster Validation tool:
Disk bus type does not support clustering.
Disk partition style is MBR.
Disk partition type is BASIC.
No disks were found on which to perform cluster validation tests.

Possible causes: Two I/O paths are not established between the controller and the drive, or this build of the Windows operating system does not natively support RAID controllers for clustering.

Solutions: Confirm that the device ports and all cabling connections between the controller and the drive are correct and functioning properly (see Section 5.2, Confirming SAS Connections). Also confirm that the version (or the current settings) of the operating system supports clustered RAID controllers.

When booting a controller node, the controller reports that it is entering Safe Mode. After entering Safe Mode, the controller does not report the presence of any drives or devices.

An incompatible peer controller parameter is detected. The peer controller is prevented from entering the HA domain.

A peer controller is not compatible with the controller in the HA domain. Entering Safe Mode protects the VDs by blocking access to the controller to allow for correction of the incompatibility.

The peer controller might have settings that do not match the controller. To correct this situation, update the firmware for the peer controller and the other controller, or both, to ensure that they are at the same firmware version.

The peer controller hardware does not exactly match the controller. To correct this situation, replace the peer controller with a unit that matches the controller hardware.

The LSI management applications do not present or report the HA options and properties.

The version of the management applications might not be HA-compatible.

Obtain an HA-compatible version of the management application from the LSI web site, or contact an LSI support representative.

The management application does not report a VD or disk group, but the VD or disk group is visible to the OS.

The shared VD is managed by the peer controller.

The VD or drive group can be seen and managed on the other controller node. Log in to, or open a terminal on, the other controller node.

In Windows clustered environments, I/O stops on the remote client when both SAS cable connections from one controller node are severed. The clustered shared volumes appear in offline state even when both cables have been reconnected.

Behavior outlined in Microsoft Knowledge Base article 2842111 might be encountered.

1. Reconnect the severed SAS cables.
2. Open Failover Cluster Manager > Storage > Disk > Cluster Disk to check the status of the cluster.
3. If the disks are online, you can restart your client application. If the disks are not online, right-click Disks > Refresh and bring them online manually. If the disks do not come online through manual methods, reboot the server node.
4. Restart the Server Role associated with the disks.
5. Apply the hotfix for Microsoft Knowledge Base article 2842111 to both server nodes.


5.5 Event Messages and Error Messages

Each message that appears in the MegaRAID Storage Manager event log has an error level that indicates the severity of the event, as listed in the following table.

Table 3 Event Error Levels

Error Level Meaning

Information Informational message. No user action is necessary.

Warning Some component might be close to a failure point.

Critical A component has failed, but the system has not lost data.

Fatal A component has failed, and data loss has occurred or will occur.

The following table lists MegaRAID Storage Manager event messages that might appear in the MSM event log when the Syncro CS system is running.

Table 4 HA-DAS MSM Events and Messages

Number Severity Level Event Text Cause Resolution

0x01cc Information Peer controller entered HA Domain

A compatible peer controller entered the HA domain.

None - Informational

0x01cd Information Peer controller exited HA Domain

A peer controller is not detected or has left the HA domain.

Planned conditions such as a system restart due to scheduled node maintenance are normal. Unplanned conditions must be further investigated to resolve.

0x01ce Information Peer controller now manages PD: <PD identifier>

A PD is now managed by the peer controller.

None - Informational

0x01cf Information Controller ID: <Controller identifier> now manages PD: <PD identifier>

A PD is now managed by the controller.

None - Informational

0x01d0 Information Peer controller now manages VD: <VD identifier>

A VD is now managed by the peer controller.

None - Informational

0x01d1 Information Controller ID: <Controller identifier> now manages VD: <VD identifier>

A VD is now managed by the controller.

None - Informational

0x01d2 Critical Target ID conflict detected. VD: <VD identifier> access is restricted from Peer controller

Multiple VD target IDs are in conflict due to scenarios that might occur when the HA domain has a missing cross-link that establishes direct controller-to-controller communication (called split-brain condition).

The Peer controller cannot access VDs with conflicting IDs. To resolve, re-establish the controller-to-controller communication path to both controllers and perform a reset of one system.

0x01d3 Information Shared access set for VD: <VD identifier>

A VD access policy is set to Shared. None - Informational

0x01d4 Information Exclusive access set for VD: <VD identifier>

A VD access policy is set to Exclusive. None - Informational


0x01d5 Warning VD: <VD identifier> is incompatible in the HA domain

The controller or peer controller does not support the VD type.

Attempts to create a VD that is not supported by the peer controller result in a creation failure. To resolve, create a VD that aligns with the peer controller VD support level. Attempts to introduce an unsupported VD that is managed by the peer controller result in rejection of the VD by the controller. To resolve, convert the unsupported VD to one that is supported by both controllers, or migrate the data to a VD that is supported by both controllers.

0x01d6 Warning Peer controller settings are incompatible

An incompatible peer controller parameter is detected. The peer controller is rejected from entering the HA domain.

The peer controller might have settings that do not match the controller. These settings can be corrected by a firmware update. To resolve, update the firmware for the peer controller and/or controller to ensure that they are at the same version.

0x01d7 Warning Peer Controller hardware is incompatible with HA Domain ID: <Domain identifier>

An incompatible peer controller is detected. The peer controller is rejected from entering the HA domain.

The peer controller hardware does not exactly match the controller. To resolve, replace the peer controller with a unit that matches the controller hardware.

0x01d8 Warning Controller property mismatch detected with Peer controller

A mismatch exists between the controller properties and the peer controller properties.

Controller properties do not match between the controller and peer controller. To resolve, set the mismatched controller property to a common value.

0x01d9 Warning FW version does not match Peer controller

A mismatch exists between the controller and peer controller firmware versions.

This condition can occur when an HA controller is introduced to the HA domain during a controller firmware update. To resolve, upgrade or downgrade the controller or peer controller firmware to the same version.

0x01da Warning Advanced Software Option(s) <option names> mismatch detected with Peer controller

A mismatch exists between the controller and peer controller advanced software options.

This case does not result in an incompatibility that can affect HA functionality, but it can impact the effectiveness of the advanced software options. To resolve, enable an identical level of advanced software options on both controllers.

0x01db Information Cache mirroring is online Cache mirroring is established between the controller and the peer controller. VDs with write-back cache enabled are transitioned from write-through mode to write-back mode.

None - Informational




0x01dc Warning Cache mirroring is offline Cache mirroring is not active between the controller and the peer controller. VDs with write-back cache enabled are transitioned from write-back mode to write-through mode.

This condition can occur if cache coherency is lost, for example when communication with the peer controller fails, when a VD in write-back mode with pending writes goes offline, or in pinned-cache scenarios. To resolve, reestablish proper cabling and hardware connections to the peer controller, or resolve the controller's pinned cache.

0x01dd Critical Cached data from peer controller is unavailable. VD: <VD identifier> access policy is set to Blocked.

The peer controller has cached data for the affected VDs, but is not present in the HA domain. The VD access policy is set to Blocked until the peer controller can flush the cache data to the VD.

This condition can occur when cache coherency is lost due to failure of communication with the peer controller. To resolve, bring the peer controller online and reestablish communication paths to the peer controller. If the peer controller is unrecoverable, restore data from a backup or manually set the access policy (data is unrecoverable).

0x01e9 Critical Direct communication with peer controller(s) was not established. Please check proper cable connections.

The peer controller might be passively detected, but direct controller-to-controller communication could not be established due to a split brain condition. A split brain condition occurs when the two server nodes are not aware of each other’s existence but can access the same end device/drive.

A cross link that establishes direct peer controller communication is not present. To resolve, check all SAS links in the topology for proper routing and connectivity, and re-establish the controller-to-controller communication path.

The following table shows HA-DAS boot events and messages.

Table 5 HA-DAS Boot Events & Messages

Boot Event Text Generic Conditions when each event occurs Actions to resolve

Peer controller firmware is not HA compatible. Please resolve firmware version/settings incompatibility or press 'C' to continue in Safe Mode (all drives will be hidden from this controller).

An incompatible peer controller parameter is detected. The peer controller is rejected from entering the HA domain.

The peer controller might have settings that do not match the controller. These settings might be corrected by a firmware update. To resolve, update the firmware for the peer controller and/or controller to ensure that they are at the same version.

Peer controller hardware is not HA compatible. Please replace peer controller with compatible unit or press 'C' to continue in Safe Mode (all drives will be hidden from this controller).

A peer controller is not compatible with the controller in the HA domain. Entering Safe Mode protects the VDs by blocking access to the controller to allow the incompatibility to be corrected.

The peer controller hardware does not exactly match the controller. To resolve, replace the peer controller with a unit that matches the controller hardware.

Direct communication with peer controller(s) was not established. Please check proper cable connections.

The peer controller can be passively detected, but direct controller-to-controller communication could not be established due to split-brain conditions caused by a missing cross link.

A cross link to establish direct peer controller communication is not present. To resolve, check all SAS links in the topology for proper routing and connectivity.
