TruCluster Available Server
Configuration and Management
Course Guide
Order Number: EY-V355E-SG.0001

© Digital Equipment Corporation October 1996.

This document is confidential and proprietary and is the property of Digital Equipment Corporation.

The information in this document is subject to change without notice and should not be construed as a commitment by Digital Equipment Corporation. Digital Equipment Corporation assumes no responsibility for any errors that may appear in this document.

Possession, use, duplication, or dissemination of the software described in this documentation is authorized only pursuant to a valid written license from Digital or the third-party owner of the software copyright.

No responsibility is assumed for the use or reliability of software on equipment that is not supplied by Digital Equipment Corporation or its affiliated companies.

Restricted Rights: Use, duplication, or disclosure by the U.S. Government is subject to restrictions as set forth in subparagraph (c)(1)(ii) of the Rights in Technical

All Rights Reserved. Printed in U.S.A.

AdvantageCluster, AlphaGeneration, AlphaServer, AXP, Bookreader, CDA, DEC, DECevent, DECnet, DECnsr, DECsafe, DECwindows, Digital, Digital UNIX, HSC, LAT, LinkWorks, OpenVMS, PATHWORKS, POLYCENTER, PrintServer, StorageWorks, TruCluster Software, TURBOchannel, ULTRIX, VAX, VAXcluster, VAX Notes, VMS, VMScluster, XMI and the DIGITAL logo are trademarks of Digital Equipment Corporation.

AIX and IBM are registered trademarks of International Business Machines Corporation. AppleTalk is a registered trademark of Apple Computer, Inc. Global Knowledge Network and the Global Knowledge Network logo are trademarks of Global Knowledge Network, Inc. Hewlett-Packard, HP and HP-UX are registered trademarks of Hewlett-Packard Company. MEMORY CHANNEL is a trademark of Encore Computer Corporation. Microsoft is a registered trademark of Microsoft Corporation. MIPS is a trademark of MIPS Computer Systems, Inc. Motif, OSF and OSF/1 are registered trademarks of the Open Software Foundation. NFS, NEWS, Solaris and Sun are registered trademarks of Sun Microsystems, Inc. Novell and NetWare are registered trademarks of Novell, Inc. ORACLE is a registered trademark of Oracle Corporation. ORACLE Parallel Server and ORACLE7 are trademarks of Oracle Corporation. POSIX is a registered trademark of IEEE. PostScript is a registered trademark of Adobe Systems, Inc. Sony is a registered trademark of Sony Corporation. SunOS is a trademark of Sun Microsystems, Inc. UNIX is a registered trademark licensed exclusively through X/Open Company Ltd. Windows and Windows NT are trademarks of Microsoft Corporation. X/Open is a trademark of X/Open Company Ltd. X Window System is a trademark of the Massachusetts Institute of Technology.

This document was prepared using VAX DOCUMENT Version 2.1.

Contents

About This Course

1 Introducing TruCluster Available Server Configuration and Management

About This Chapter
  Introduction
  Objectives
  Resources

Describing the TruCluster Available Server Product
  Overview
  Product Position

Configuring the TruCluster Available Server
  Overview
  Hardware Requirements
  Software Requirements
  Sample Available Server Configuration

Presenting the TruCluster Software
  Software Components
  ASE Services

Planning TruCluster Available Server Configurations
  Overview
  Network Configuration
  Storage Configuration
  Service Availability Configuration

Determining Configuration and Maintenance Phases
  Overview
  Planning the Available Server Configuration
  Configuring ASE Hardware
  Installing and Setting Up the Base Operating System
  Installing the TruCluster Software
  Configuring ASE Services
  Testing the TruCluster Software Failover Sequences
  Monitoring and Managing Available Server Configurations
  Troubleshooting an Existing Available Server Configuration

Summary
  Introduction to the Available Server Software
  Configuring the Available Server Software
  Presenting the TruCluster Software
  Planning Available Server Configurations
  Determining Configuration and Maintenance Phases

Exercises
  Describing the TruCluster Available Server Software Product: Exercise
  Describing the TruCluster Available Server Software Product: Solution
  Configuring the TruCluster Software: Exercise
  Configuring the TruCluster Software: Solution
  Presenting the TruCluster Software: Exercise
  Presenting the TruCluster Software: Solution
  Planning Available Server Configurations: Exercise
  Planning Available Server Configurations: Solution
  Determining Configuration and Maintenance Phases: Exercise
  Determining Configuration and Maintenance Phases: Solution

2 Understanding TruCluster Software Interactions

About This Chapter
  Introduction
  Objectives
  Resources

Introducing Highly Available Services
  Overview
  Independence of Services from Servers
  Action Scripts

Introducing the TruCluster Software Components
  Overview
  TruCluster Software Components
  TruCluster Software Component Interaction

Understanding TruCluster Failure Detection and Response
  Overview
  Failure Events that Trigger a Response
  Member Node Failure
  SCSI Bus Failures
  Critical SCSI Path Failure
  Device Failure
  ASE_PARTIAL_MIRRORING Parameter
  Network Failures
  Network Interface Failure Response
  Network Partition Response
  Monitored Network Failures
  Service Failover
  Reserving Devices
  Choosing a New Director
  Action Script Errors
  LSM and TruCluster Failover

Summary
  Introducing Highly Available Services
  Introducing the TruCluster Software Components
  Understanding TruCluster Software Failure Detection and Response

Exercises
  Introducing Highly Available Services: Exercise
  Introducing Highly Available Services: Solution
  Introducing the TruCluster Software Components: Exercise
  Introducing the TruCluster Software Components: Solution
  TruCluster Software Failure Detection and Response: Exercise
  TruCluster Software Failure Detection and Response: Solution

3 Configuring TruCluster Available Server Hardware

About This Chapter
  Introduction
  Objectives
  Resources

Examining TruCluster Available Server General Hardware Configuration Rules and Restrictions
  Overview
  Rules and Restrictions
  SCSI Bus Overview

Determining Available Server Hardware Components
  Overview
  TruCluster Available Server Supported Systems
  DECsafe Supported SCSI Controllers
  BA350, BA353, and BA356 Storage Expansion Units
  Supported Controllers for DEC RAID Subsystems
  Supported Disk Devices
  Signal Converters
  SCSI Cables and Terminators for Available Server Configurations
  Network Options

Configuring TruCluster Available Server Hardware
  Overview
  Installing the Network Interfaces
  Firmware Update
  Starting Your TruCluster Available Server Configuration
  Setting Up a Single-Ended Available Server Configuration for Use with PMAZCs and a BA350 or BA353
  Setting Up a Differential Available Server Configuration for Use with PMAZCs and a BA350 or BA353
  Setting Up a Differential Available Server Configuration for Use with PMAZCs and a BA356
  Setting Up an Available Server Configuration for Use with PMAZCs and an HSZ40
  PMAZC Dual SCSI Module Jumpers
  Verifying and Setting PMAZC and KZTSA SCSI ID and Bus Speed
  Setting Up an Available Server Configuration Using a KZTSA TURBOchannel to SCSI Adapter
  Setting Up an Available Server Configuration with KZMSA SCSI Controllers
  Preparing a KZMSA for Use in an Available Server Environment
  Setting Up an Available Server Configuration Using KZPSA PCI to SCSI Adapters
  Setting KZPSA SCSI ID and Bus Speed
  Setting Up an Available Server Configuration with Mixed Adapter Types and a BA350, BA353, or BA356
  Setting Up an Available Server Configuration with Mixed Adapter Types and an HSZ40

Summary
  Examining TruCluster Available Server General Hardware Configuration Rules and Restrictions
  Determining Available Server Hardware Components
  Configuring TruCluster Available Server Hardware

Exercises
  Examining TruCluster Available Server General Hardware Configuration Rules and Restrictions: Exercise
  Examining TruCluster Available Server General Hardware Configuration Rules and Restrictions: Solution
  Determining Available Server Hardware Components: Exercise
  Determining Available Server Hardware Components: Solution
  Configuring TruCluster Available Server Hardware: Exercise
  Configuring TruCluster Available Server Hardware: Solution

4 Installing TruCluster Software

About This Chapter
  Introduction
  Objectives
  Resources

Performing Preliminary Setup Tasks
  Overview
  Hardware
  Subsets Required for TruCluster Available Server Operation
  Before Installing TruCluster Software
  Network Services

Preparing to Install TruCluster Software
  Overview
  Choosing the TruCluster Software Installation Procedure
  Setting Up an ASE for the First Time
  Performing a Rolling Upgrade
  Simultaneous Upgrade
  Adding a Member System to an Existing ASE with ASE V1.4 Operating Software

Installing TruCluster Software
  Overview
  Installing TruCluster Available Server Software Version 1.4

Summary
  Performing Preliminary Setup Tasks
  Preparing to Install TruCluster Software
  Installing TruCluster Software

Exercises
  Performing Preliminary Setup Tasks: Exercise
  Performing Preliminary Setup Tasks: Solution
  Preparing to Install TruCluster Software: Exercise
  Preparing to Install TruCluster Software: Solution
  Installing TruCluster Software: Exercise
  Installing TruCluster Software: Solution

5 Setting Up and Managing ASE Members

About This Chapter
  Introduction
  Objectives
  Resources

Introducing the asemgr Utility
  Overview
  asemgr Command Syntax
  Running Multiple Instances of the asemgr

Setting Up and Managing Members
  Overview
  Using asemgr the First Time
  Initializing ASE Member Systems
  Using asemgr to Manage Members
  Adding a Member
  Deleting a Member
  Managing ASE Networks
  Displaying ASE Member Status
  Resetting the TruCluster Software Daemons
  TruCluster Software Daemon Scheduling

Using TruCluster Event Logging
  Overview
  Starting the Logger
  Stopping the Logger
  Setting System Logging
  Displaying Logger Location
  Setting Log Level
  Using an Alert Script
  Examining Log Messages

Summary
  Introducing the asemgr Utility
  Setting Up and Managing Members
  Using TruCluster Software Event Logging

Exercises
  Introducing the asemgr Utility: Exercise
  Introducing the asemgr Utility: Solution
  Using asemgr to Manage Members: Exercise
  Using asemgr to Manage Members: Solution
  Using TruCluster Software Event Logging: Exercise
  Using TruCluster Software Event Logging: Solution

6 Writing and Debugging Action Scripts

About This Chapter
  Introduction
  Objectives
  Resources

Introducing Action Scripts
  Overview
  Types of Action Scripts
  Available Services and Scripts
  Script Exit Codes
  Script Output
  Skeleton Scripts
  Start and Stop Scripts

Creating Action Scripts
  Overview
  Methods to Create Scripts
  Specifying Your Own Script
  Editing the Default Script
  Pointing to an External Script
  Additional Script Information

Testing and Debugging Action Scripts
  Overview
  Test First
  Debugging Scripts in ASE

Summary
  Introducing Action Scripts
  Creating Action Scripts
  Testing and Debugging Action Scripts

Exercises
  Introducing Action Scripts: Exercise
  Introducing Action Scripts: Solution
  Creating Action Scripts: Exercise
  Creating Action Scripts: Solution
  Testing and Debugging Action Scripts: Exercise
  Testing and Debugging Action Scripts: Solution

7 Setting Up ASE Services

About This Chapter
  Introduction
  Objectives
  Resources

Understanding Highly Available Services
  Overview
  Introducing Supported Services
  Describing Clients and Services
  Setting Up a Service

Preparing to Set Up Services
  Overview
  Documentation
  Automatic Service Placement Policy
  Services and Disks
  Using NFS
  Using UFS
  Using AdvFS
  Quotas
  Using LSM
  Installing the Application
  Configuration Example

Setting Up NFS Services
  Overview
  Describing an NFS Service
  Discussing the NFS Service Setup Procedure
  Setting Up an NFS Service for a Public Directory
  Discussing the /etc/exports.ase File
  NFS Mail Service

Setting Up a Disk Service
  Overview
  Describing a Disk Service
  Describing the Set Up Procedure for a Disk Service
  Using a Network Alias
  Setting Up a Disk Service for a Database Application

Setting Up a User-Defined Service
  Overview
  User-Defined Service Setup Procedure
  Adding a User-Defined Service
  User-Defined Login Service

Using asemgr to Manage Services
  Overview
  Managing Services Menu
  Displaying Service Status
  Relocating a Service
  Modifying a Service

Summary . . . 7–41
  Understanding Highly Available Services . . . 7–41
  Preparing to Set Up Services . . . 7–41
  Setting Up NFS Services . . . 7–41
  Setting Up a Disk Service . . . 7–41
  Setting Up a User-Defined Service . . . 7–41
  Using asemgr to Manage Services . . . 7–42

Exercises . . . 7–43
  Understanding Highly Available Services: Exercise . . . 7–43
  Understanding Highly Available Services: Solution . . . 7–43
  Preparing to Set Up Services: Exercise . . . 7–43
  Preparing to Set Up Services: Solution . . . 7–43
  Setting Up NFS Services: Exercise . . . 7–44
  Setting Up NFS Services: Solution . . . 7–44
  Setting Up a Disk Service: Exercise . . . 7–44
  Setting Up a Disk Service: Solution . . . 7–44
  Setting Up a User-Defined Service: Exercise . . . 7–44
  Setting Up a User-Defined Service: Solution . . . 7–44
  Using asemgr to Manage Services: Exercise . . . 7–45
  Using asemgr to Manage Services: Solution . . . 7–45

8 Using the Cluster Monitor

About This Chapter . . . 8–2
  Introduction . . . 8–2
  Objectives . . . 8–2
  Resources . . . 8–2

Setting Up the Cluster Monitor . . . 8–3
  Overview . . . 8–3
  Setup Procedure . . . 8–3
  Sample Setup Script . . . 8–4
  Updating the Cluster Map . . . 8–4

Using the Cluster Monitor . . . 8–5
  Overview . . . 8–5
  Starting the Cluster Monitor . . . 8–6
  Top View . . . 8–6
  Device View . . . 8–7
  Services View . . . 8–11

Launching Other Tools . . . 8–15
  Overview . . . 8–15
  Included Tools . . . 8–15
  External Tools . . . 8–15

Monitoring Available Server Configurations with the Cluster Monitor . . . 8–16
  Overview . . . 8–16
  Top View . . . 8–16
  Device View . . . 8–16

  Services View . . . 8–17
  What to Do when You See an Error . . . 8–17

Summary . . . 8–19
  Setting Up the Cluster Monitor . . . 8–19
  Using the Cluster Monitor . . . 8–19
  Launching Other Tools . . . 8–19
  Monitoring Available Server Configurations with the Cluster Monitor . . . 8–20

Exercises . . . 8–21
  Setting Up the Cluster Monitor: Exercise . . . 8–21
  Setting Up the Cluster Monitor: Solution . . . 8–21
  Using the Cluster Monitor: Exercise . . . 8–22
  Using the Cluster Monitor: Solution . . . 8–22
  Launching Other Tools: Exercise . . . 8–23
  Launching Other Tools: Solution . . . 8–23
  Monitoring Available Server Configurations with the Cluster Monitor: Exercise . . . 8–23
  Monitoring Available Server Configurations with the Cluster Monitor: Solution . . . 8–24

9 Testing, Recovering, and Maintaining TruCluster Configurations

About This Chapter . . . 9–2
  Introduction . . . 9–2
  Objectives . . . 9–2
  Resources . . . 9–2

Performing TruCluster Testing Procedures . . . 9–3
  Overview . . . 9–3
  System Configuration Assumptions . . . 9–3
  Observing System Response . . . 9–3
  System Power Off Test . . . 9–4
  DWZZA-AA Power Off . . . 9–5
  Removing a Shared Disk . . . 9–5
  Removing Power from BA350 . . . 9–6
  Removing One Member from the Network . . . 9–7
  Removing All Members from the Network . . . 9–7

Recovering from Failures in the ASE . . . 9–9
  Instructional Strategy . . . 9–9
  Overview . . . 9–9
  LSM Mirrored Disk Replacement . . . 9–9
  Obtaining LSM Disk Group Information . . . 9–10
  Removing Faulty Disk from LSM Database . . . 9–11
  Restoring the Partition Table . . . 9–12
  Initializing the Disk for LSM . . . 9–12
  Associating the New Disk . . . 9–12
  Recovering the Plex . . . 9–12
  Rereserving the Service . . . 9–13
  Replacing a Nonmirrored Disk . . . 9–13
  Unassigned Service . . . 9–13
  Resetting TruCluster Daemons . . . 9–14

Performing Ongoing Maintenance Tasks . . . 9–15
  Overview . . . 9–15
  Changing Hardware Configuration . . . 9–15
  Stopping and Restarting TruCluster Daemon Activity . . . 9–16

  Adding and Removing Member Nodes . . . 9–16
  Adding and Removing Storage Boxes . . . 9–16
  Adding and Removing Disks . . . 9–17

Summary . . . 9–18
  Performing TruCluster Software Testing Procedures . . . 9–18
  Performing Disk Recovery Procedures . . . 9–18
  Performing Ongoing Maintenance Tasks . . . 9–18

Exercises . . . 9–19
  Member Node Failure: Exercise . . . 9–19
  Member Node Failure: Solution . . . 9–19
  Network Interface Test: Exercise . . . 9–19
  Network Interface Test: Solution . . . 9–19
  Recovering from Failures in the ASE: Exercise . . . 9–20
  Recovering from Failures in the ASE: Solution . . . 9–20
  Performing Ongoing Maintenance Tasks: Exercise . . . 9–20
  Performing Ongoing Maintenance Tasks: Solution . . . 9–21

10 Troubleshooting TruCluster Configurations

About This Chapter . . . 10–2
  Introduction . . . 10–2
  Objectives . . . 10–2
  Resources . . . 10–2

Introducing ASE Troubleshooting Techniques . . . 10–3
  Overview . . . 10–3
  Troubleshooting Topics . . . 10–3
  Troubleshooting Sequence . . . 10–4

Interpreting TruCluster Error Log Messages . . . 10–5
  Overview . . . 10–5
  TruCluster Logger Daemon . . . 10–5
  Interpreting Log Messages . . . 10–6
  Alert Messages . . . 10–7

Learning Troubleshooting Procedures . . . 10–8
  Diagnosing Active Systems . . . 10–8
  Diagnosing Nonactive Systems . . . 10–10

Using System Monitoring Tools . . . 10–12
  Overview . . . 10–12
  Using asemgr to Monitor Member Status . . . 10–13
  Using asemgr to Monitor Service Status . . . 10–15
  Using asemgr to Monitor the Network Configuration . . . 10–15
  Determining Host Adapter Settings . . . 10–16
  Using the uerf Utility to Monitor SCSI Bus Errors . . . 10–16
  Monitoring Daemons . . . 10–17
  Monitoring the Network . . . 10–18
  Monitoring Disk I/O with iostat . . . 10–19
  Monitoring LSM Configurations . . . 10–19
  Monitoring AdvFS Configurations . . . 10–19

Summary . . . 10–21
  Introducing ASE Troubleshooting Techniques . . . 10–21
  Interpreting TruCluster Software Error Log Messages . . . 10–21
  Learning Troubleshooting Procedures . . . 10–22
  Using System Monitoring Tools . . . 10–23

Exercises . . . 10–24
  Introducing ASE Troubleshooting Techniques: Exercise . . . 10–24

  Introducing ASE Troubleshooting Techniques: Solution . . . 10–24
  Interpreting TruCluster Software Error Log Messages: Exercise . . . 10–24
  Interpreting TruCluster Software Error Log Messages: Solution . . . 10–24
  Diagnosing a Nonactive System: Exercise . . . 10–25
  Diagnosing a Nonactive System: Solution . . . 10–25
  Generating CAM Error Information: Exercise . . . 10–25
  Generating CAM Error Information: Solution . . . 10–25
  Monitoring TruCluster Daemons: Exercise . . . 10–26
  Monitoring TruCluster Daemons: Solution . . . 10–26

11 Resolving Common TruCluster Problems

About This Chapter . . . 11–2
  Introduction . . . 11–2
  Objectives . . . 11–2
  Resources . . . 11–2

Recognizing Common Problems and Their Symptoms . . . 11–3
  Overview . . . 11–3
  Improperly Configured SCSI Bus . . . 11–5
  Host Adapter Failure . . . 11–6
  Member Crash . . . 11–7
  Storage Device Failure . . . 11–8
  Network Interface Failure . . . 11–9
  Network Partition . . . 11–10
  Invalid Script Format . . . 11–12
  Multiple asemgr Processes . . . 11–13
  Removing Disk Without Updating asemgr . . . 11–14
  NFS Service and ASE Member with Same Name . . . 11–15
  Service Alias not in /etc/hosts on All Members . . . 11–16
  ASEROUTING not Set in NFS Service . . . 11–17
  ASE Member not Added to TruCluster Database . . . 11–18
  LSM not Configured on New Member . . . 11–19
  Known TruCluster Limitations . . . 11–19
  Users Occupying Mount Points . . . 11–20
  Non-TruCluster Processes with Higher Priority . . . 11–21
  Using BC09 Cables with KZTSA Controller . . . 11–22

Applying TruCluster Configuration Guidelines . . . 11–23
  TruCluster Configuration Guidelines . . . 11–23
  General Hardware Configuration . . . 11–24
  SCSI Bus . . . 11–25
  Termination . . . 11–25
  Host Adapters . . . 11–26
  Disk Storage Enclosures . . . 11–26
  Signal Converters . . . 11–27
  Tri-link Connectors . . . 11–27
  Network Connections . . . 11–27
  TruCluster Software Installation . . . 11–28
  General Software Configuration . . . 11–29
  TruCluster Software Configuration . . . 11–30
  Service Configuration . . . 11–31
  Disk Services . . . 11–32
  LSM Configuration . . . 11–33
  AdvFS Configuration . . . 11–34
  Action Scripts . . . 11–35

Summary . . . 11–36
  Recognizing Common Problems . . . 11–36
  Applying TruCluster Configuration Guidelines . . . 11–36

Exercises . . . 11–37
  TruCluster Message Interpretation: Exercise . . . 11–37
  TruCluster Message Interpretation: Solution . . . 11–37
  Problem Relocating Service: Exercise . . . 11–37
  Problem Relocating Service: Solution . . . 11–37
  SCSI ID Limits: Exercise . . . 11–37
  SCSI ID Limits: Solution . . . 11–37
  Applying TruCluster Configuration Guidelines: Exercise . . . 11–38
  Applying TruCluster Configuration Guidelines: Solution . . . 11–38

12 Test

Questions . . . 12–2
Answers . . . 12–13

Index

Examples

3–1 Displaying DEC 3000 Configuration . . . 3–42
3–2 Displaying PMAZC Bus Speed and SCSI ID . . . 3–42
3–3 Displaying KZTSA SCSI ID and Bus Speed . . . 3–43
3–4 Setting PMAZC SCSI ID and Bus Speed . . . 3–43
3–5 Setting KZTSA SCSI ID and Bus Speed . . . 3–44
3–6 Booting the LFU Utility . . . 3–57
3–7 Using the LFU Utility to Display Hardware Configuration . . . 3–58
3–8 Using the LFU Utility to Update KZMSA Firmware . . . 3–59
3–9 Using the LFU Utility to Modify KZMSA Options . . . 3–60
3–10 Displaying Devices on AlphaServer 1000, 2000 or 2100 Systems . . . 3–66
3–11 Setting KZPSA SCSI ID and Bus Speed . . . 3–68
4–1 Installing TruCluster Available Server Software Version 1.4 . . . 4–17
5–1 asemgr Menus for Members . . . 5–7
5–2 System Logging Configuration File . . . 5–18
5–3 Displaying Logger Daemon Location . . . 5–19
5–4 Displaying Log Level . . . 5–20
5–5 Setting the Log Level . . . 5–20
5–6 Editing the Alert Script . . . 5–21
5–7 Testing the Alert Script . . . 5–21
5–8 daemon.log Entries . . . 5–22
6–1 Skeleton Start Action Script . . . 6–5
6–2 Skeleton Check Action Script . . . 6–6
6–3 Start Script . . . 6–7
6–4 Stop Script . . . 6–8
6–5 Specifying Your Own Action Script . . . 6–10
6–6 Editing the Default Action Script . . . 6–10
6–7 Pointing to an External Script . . . 6–11

7–1 Providing a Mirrored Stripe Set Using LSM . . . 7–9
7–2 Creating a File Domain Using AdvFS . . . 7–10
7–3 Adding an NFS Service . . . 7–12
7–4 /etc/exports.ase File . . . 7–18
7–5 Using a Network Alias . . . 7–20
7–6 Adding a Disk Service . . . 7–21
7–7 Setting Up a User-Defined Service . . . 7–27
7–8 Managing ASE Services Menu . . . 7–33
7–9 Displaying Service Status . . . 7–33
7–10 Relocating a Service . . . 7–35
7–11 Modifying a Service . . . 7–37
9–1 System Power Off Messages . . . 9–5
9–2 DWZZA Disconnection Error Messages . . . 9–5
9–3 Failed Device Error Messages . . . 9–6
9–4 BA350 Power Off Error Messages . . . 9–6
9–5 Error Messages When One Member Removed from Network . . . 9–7
9–6 Error Messages When All Members Removed from Network . . . 9–8
9–7 LSM Disk Group Information . . . 9–10
9–8 Confirming Failed Disk Information . . . 9–11
10–1 Member Status . . . 10–14
10–2 Service Status . . . 10–15
10–3 Network Configuration Status . . . 10–15
10–4 uerf CAM Error Display . . . 10–16
10–5 Using ps to Determine Daemon Status . . . 10–17
10–6 Using rpcinfo to Display Daemon and Port Information . . . 10–17
10–7 scu show edt Display . . . 10–18
10–8 volprint Display . . . 10–19
10–9 showfdmn Display . . . 10–19
10–10 showfsets Display . . . 10–20
11–1 Display when asemgr Cannot Modify Service . . . 11–12

Figures

1 TruCluster Available Server Configuration and Management Course Map . . . xxiii

1–1 Sample Available Server Configuration . . . 1–7
1–2 TruCluster Software Components . . . 1–10
1–3 Available Server Configuration and Maintenance Phases . . . 1–14
2–1 ASE Software Component Interaction . . . 2–7
2–2 Member Down Scenario . . . 2–10
2–3 Critical SCSI Path Failure Scenario . . . 2–12
2–4 Service Failover Scenario . . . 2–16
3–1 One SCSI Bus, Two Transmission Methods . . . 3–5
3–2 Legend for SCSI Bus Termination . . . 3–6
3–3 SCSI Buses with Devices on Bus Ends Only . . . 3–6
3–4 SCSI Bus with Device in the Middle of the Bus . . . 3–7

3–5 SCSI Buses Using Bus Segments with Different Transmission Methods . . . 3–8
3–6 Using External Termination on the SCSI Bus . . . 3–9
3–7 Disconnecting a Device from the SCSI Bus . . . 3–9
3–8 BA350 SCSI Bus . . . 3–12
3–9 BA353 Device Address Switches and SCSI Input and Output Connectors . . . 3–13
3–10 BA356 Storage Shelf SCSI Bus . . . 3–14
3–11 DWZZA-AA Signal Converter SCSI Bus Termination . . . 3–17
3–12 DWZZA-VA Signal Converter SCSI Bus Termination . . . 3–18
3–13 DWZZB-AA Signal Converter SCSI Bus Termination . . . 3–19
3–14 DWZZB-VW Signal Converter SCSI Bus Termination . . . 3–20
3–15 Available Server Configuration with Two DEC 3000 Model 500 Systems and a Single-Ended Shared Bus with a BA350 . . . 3–30
3–16 Available Server Configuration with Two DEC 3000 Model 500 Systems and Two Single-Ended Shared Buses Each with a BA350 . . . 3–30
3–17 Available Server Configuration with Three DEC 3000 Model 500 Systems with PMAZCs, Differential Shared Bus, and a BA350 . . . 3–33
3–18 Available Server Configuration with Three DEC 3000 Model 500 Systems with PMAZCs, Differential Shared Bus, and a BA356 . . . 3–37
3–19 Available Server Configuration with Two DEC 3000 Model 500 Systems with PMAZC SCSI Controllers and an HSZ40 . . . 3–40
3–20 PMAZC Dual SCSI Module Jumpers . . . 3–41
3–21 Available Server Configuration with Two DEC 3000 Model 500 Systems Using KZTSA SCSI Adapters and a Single-Ended Shared Bus with a BA350 . . . 3–46
3–22 Available Server Configuration with Two DEC 3000 Model 500 Systems Using KZTSA SCSI Adapters and a Single-Ended Shared Bus with a BA356 . . . 3–47
3–23 Two DEC 3000 Model 500 Systems with KZTSA TURBOchannel SCSI Adapters in an Available Server Configuration with an HSZ40 . . . 3–48
3–24 KZTSA Jumpers and Termination . . . 3–49
3–25 Available Server Configuration with Two DEC 7000 Systems with KZMSA XMI SCSI Adapters and a BA350 . . . 3–53
3–26 Available Server Configuration with Two DEC 7000 Systems with KZMSA XMI SCSI Adapters and a BA356 . . . 3–54
3–27 Available Server Configuration with Two DEC 7000 Systems Using KZMSA XMI to SCSI Adapters with an HSZ40 . . . 3–56
3–28 Available Server Configuration with Two AlphaServer 2100 Systems Using KZPSA PCI to SCSI Adapters with a BA350 . . . 3–63
3–29 Available Server Configuration with Two AlphaServer 2100 Systems Using KZPSA PCI to SCSI Adapters with an HSZ40 . . . 3–65
3–30 KZPSA Termination Resistor Locations . . . 3–65
3–31 Mixed Host Adapter Available Server Configuration with BA350 Storage Expansion Unit . . . 3–70
3–32 Mixed Host Adapter Available Server Configuration with an HSZ40 . . . 3–72
4–1 Upgrade Paths for Existing DECsafe Available Server Installation . . . 4–7
7–1 Client View of ASE . . . 7–4

8–1 Cluster Monitor Top View . . . 8–7
8–2 Cluster Monitor Configuration View . . . 8–8
8–3 SCSI Bus Configuration . . . 8–9
8–4 All Shared Connections . . . 8–10
8–5 Local Connections . . . 8–11
8–6 Cluster Monitor Services View . . . 8–12
8–7 Service Devices . . . 8–14
10–1 Troubleshooting Strategy . . . 10–4
10–2 A daemon.log File Entry . . . 10–6
10–3 Alert Message Paths . . . 10–7
10–4 Troubleshooting an Active TruCluster Configuration . . . 10–9
10–5 Troubleshooting Nonactive TruCluster Configurations . . . 10–11

Tables

1 Conventions Used in this Course
3–1 SCSI Bus Lengths in Some Devices and Systems
3–2 Available Server Supported System Types
3–3 DECsafe-Supported SCSI Controllers
3–4 TruCluster Available Server Supported Storage Expansion Units
3–5 Supported DEC RAID Subsystems
3–6 TruCluster Available Server Supported Disk Devices
3–7 Supported Signal Converters
3–8 Cables Used for Available Server Configurations
3–9 Terminators and Special Connectors
3–10 Supported Network Adapters
3–11 Setting Up TruCluster Available Server Configurations
3–12 PMAZC SCSI Controllers and a BA350 or BA353 with Single-Ended Available Server Configuration
3–13 Hardware Needed for a Single-Ended Available Server Configuration with PMAZC SCSI Controllers and a BA350 or BA353 (No DWZZAs)
3–14 Setting Up an Available Server Configuration with PMAZC SCSI Controllers and a BA350 or BA353 in a Differential Available Server Configuration
3–15 Hardware Needed for a Differential Available Server Configuration with PMAZC or KZMSA SCSI Controllers and a BA350 or BA353
3–16 Setting Up an Available Server Configuration with PMAZC SCSI Controllers and a BA356 in a Differential Available Server Configuration
3–17 Hardware Needed for a Differential Available Server Configuration with PMAZC or KZMSA SCSI Controllers and a BA356
3–18 Setting Up an Available Server Configuration with PMAZC SCSI Controllers and an HSZ10 or HSZ40
3–19 Hardware Needed for a Differential Available Server Configuration with PMAZC or KZMSA SCSI Controllers and an HSZ40 or PMAZCs and an HSZ10
3–20 Setting Up an Available Server Configuration with KZTSA TURBOchannel to SCSI Adapters and a BA350, BA353, or BA356


3–21 Hardware Needed for a KZPSA or KZTSA and BA350, BA353, or BA356 Available Server Configuration
3–22 Setting Up an Available Server Configuration with KZTSA TURBOchannel to SCSI Adapters and an HSZ40
3–23 Hardware Needed for an Available Server Configuration with a KZPSA or KZTSA and an HSZ40
3–24 KZMSA Boot ROM Part Numbers
3–25 Setting Up an Available Server Configuration with KZMSA XMI to SCSI Adapters and a BA350, BA353, or BA356
3–26 Setting Up an Available Server Configuration with KZMSA XMI to SCSI Adapters and an HSZ40
3–27 Setting Up an Available Server Configuration Using KZPSA Adapters and a BA350, BA353, or BA356
3–28 Setting Up an Available Server Configuration Using KZPSA Adapters and an HSZ40
3–29 Setting Up an Available Server Configuration with Mixed Host Adapters and a BA350, BA353, or BA356
3–30 Hardware Needed for a Mixed Adapter Available Server Configuration with BA350 or BA353
3–31 Setting Up an Available Server Configuration with Mixed Host Adapters and an HSZ40
3–32 Hardware Needed for a Mixed Adapter Available Server Configuration with an HSZ40
3–33 Phase 1: Installing the KZPSA SCSI Adapters
3–34 Creating a Shared Bus with the BA350
3–35 Creating a Shared Bus with the HSZ40
4–1 Upgrade Paths for Existing DECsafe Available Server Installation
6–1 Script Exit Codes
8–1 Available Service Icons
8–2 Main Window Failure Indicators
8–3 Device View Failure Indicators
8–4 Services View Failure Indicators
10–1 System Monitoring Utilities
10–2 Host Status Values
10–3 Agent Status Values
11–1 Frequently Reported Hardware Problems
11–2 Frequently Reported Software Problems
11–3 Known Limitations of TruCluster Available Server Software Version 1.4


About This Course


Introduction

The TruCluster Available Server Configuration and Management course covers configuring and managing the various configurations of the TruCluster Available Server Software Version 1.4 product.

This section describes the contents of the course, suggests ways in which you can use the materials most effectively, and sets up the conventions for the use of terms in the course. It includes:

• Course description: a brief overview of the course contents

• Target audience: who should take this course

• Prerequisites: the skills and knowledge needed to ensure your success in this course

• Course goals and nongoals: what skills or knowledge the course will and will not provide

• Course organization: the structure of the course

• Course map: the sequence in which you should take each chapter

• Chapter descriptions: brief descriptions of each chapter

• Time schedule: an estimate of the amount of time needed to cover the chapter material and lab exercises

• Course conventions: an explanation of symbols and signs used throughout this course

• Resources: manuals and books to help you successfully complete this course

Course Description

This course describes the tasks that support personnel and system administrators must perform to install, configure, and manage TruCluster Available Server configurations of up to four member systems.

The course discusses in detail how to plan TruCluster Available Server configurations, how to install the associated peripherals and cabling, and how to upgrade to or install TruCluster Available Server Version 1.4 software.

Target Audience

This course is designed for system management and support people who have experience managing a Digital UNIX® environment and require instruction in advanced system management topics associated with configuring and maintaining TruCluster Available Server.

® Digital UNIX is a registered trademark of Digital Equipment Corporation.


Prerequisites

To get the most from this course, students should be able to:

• Install and manage a Digital UNIX system

• Install layered products and register PAKs

• Troubleshoot the operating system and make adjustments to improve performance

• Connect to a TCP/IP network and understand subnetworking concepts

• Set up and maintain distributed network services such as time synchronization

• Set up a Network File System

• Set up a LAT server

• Install and maintain AdvFS, LSM, and RAID functionality

These prerequisites can be satisfied by taking the following courses:

• DEC OSF/1 System Administration lecture lab or self-paced course

• DEC OSF/1 Network Management lecture lab or self-paced course

• AdvFS, LSM, and RAID Configuration and Management lecture lab or self-paced course

These courses are available internally, for personal use, in PostScript format on the UNIX Course Development home page on the World Wide Web. The URL for this location is:

http://mmstuf.zko.dec.com/

This course will also be taught by Global Knowledge Network, Inc.

Course Goals

To perform TruCluster Available Server tasks, the system manager should be able to:

• Determine the hardware needs of various TruCluster Available Server Software Version 1.4 configurations

• Install and configure TruCluster Available Server hardware

• Install and configure the TruCluster Available Server Software Version 1.4 on all member nodes

• Set up and maintain ASE services

• Generate and test ASE start and stop scripts

• Manage and maintain TruCluster Available Server hardware and software

• Use other layered and complementary product software in a TruCluster Available Server configuration, such as:

  – The Logical Storage Manager (LSM)

  – The Advanced File System (AdvFS) and its utilities

• Identify and resolve TruCluster Available Server-related problems

Nongoals

This course does not cover the following topics:

• Digital UNIX hardware installation

• AdvFS, LSM, or RAID configuration and management

• Configuring and setting up other cluster servers such as the Production Server

• Installing or maintaining other cluster-related software such as the Oracle Parallel Server

• TruCluster Production Server functionality such as distributed raw disk or Distributed Lock Manager functionality

These topics are covered in other courses:

• AdvFS, LSM, and RAID Configuration and Management

• TruCluster Software Configuration and Management

Course Organization

This Course Guide is divided into chapters, each designed to cover a skill or related group of skills required to fulfill the course goals. Illustrations present conceptual material, and examples demonstrate concepts and commands.

In this course, each chapter consists of:

• An introduction to the subject matter of the chapter

• One or more objectives that describe the goals of the chapter

• A list of resources, or materials for further reference. Some of these manuals are included with your course materials; others may be available for reference in your classroom or lab.

• The text of the chapter, which includes outlines, tables, figures, and examples

• A summary that highlights the main points presented in the chapter

• Exercises at the end of the chapter that enable you to practice your skills and measure your mastery of the information learned during the course


Course Map

The Course Map shows how each chapter is related to other chapters and to the course as a whole. Before studying a chapter, you should master all of its prerequisite chapters. On the Course Map, prerequisite chapters appear before the chapters that depend on them, and the direction of the arrows shows the order in which the chapters should be covered.

This material can be presented effectively in many different ways. By providing prerequisite information as needed, your instructor may choose a chapter organization different from the one shown in the following Course Map.

Figure 1 TruCluster Available Server Configuration and Management Course Map

[Figure: one box for each of the eleven course chapters, connected by arrows that show the order in which the chapters should be covered.]


Chapter Descriptions

Each chapter is briefly described below.

• Introducing TruCluster Available Server Configuration and Management: Provides an overview of the TruCluster Available Server Software Version 1.4 product and briefly describes what you should know when planning an Available Server implementation.

• Understanding TruCluster Software Interactions: Discusses the interactions among the TruCluster Software components.

• Configuring TruCluster Available Server Hardware: Describes the general hardware configuration rules and restrictions, lists the supported hardware used to set up TruCluster Available Server Software Version 1.4, and explains how to set up the hardware for TruCluster Available Server.

• Installing TruCluster Software: Describes how to upgrade to or install the TruCluster Available Server Software Version 1.4.

• Setting Up and Managing ASE Members: Describes how to set up and manage ASE member systems.

• Writing and Debugging Action Scripts: Discusses the types of action scripts that TruCluster Available Server uses, and the guidelines and conventions for creating and debugging action scripts.

• Setting Up ASE Services: Explains ASE services and describes how to set them up.

• Using the Cluster Monitor: Describes how to install and use the Cluster Monitor.

• Testing, Recovering, and Maintaining TruCluster Configurations: Describes how to verify that your ASE services will behave as you expect when hardware failures occur.

• Troubleshooting TruCluster Configurations: Describes ways to troubleshoot problems that occur in TruCluster Software configurations.

• Resolving Common TruCluster Problems: Examines ways to recognize and solve common TruCluster problems, focusing on two main topics: common problems and configuration guidelines.


Time Schedule

The amount of time required for this course depends on each student’s background knowledge, experience, and interest in the various topics. Use the following table as a guideline.

Day  Course Chapter                                                Lecture Hours  Lab Hours
1    About This Course                                             0.5            -
1    Introducing ASE Configuration and Management                  1.5            1
1    Understanding TruCluster Software Interactions                1              1
1    Configuring TruCluster Available Server Hardware              2              2
2    Installing TruCluster Software                                2              2
2    Setting Up and Managing ASE Members                           1              1
2    Writing and Debugging Action Scripts                          1              1
3    Setting Up ASE Services                                       1              1
3    Using the Cluster Monitor                                     0.5            1
3    Testing, Recovering, and Maintaining the TruCluster Software  1              0.5
3    Troubleshooting the TruCluster Software                       0.5            0.5
3    Resolving Common TruCluster Software Problems                 0.5            0.5


Course Conventions

Table 1 describes the conventions used in this course.

Table 1 Conventions Used in this Course

Convention    Meaning
keyword       Keywords and new concepts are displayed in boldface type.
examples      Examples, commands, options, and pathnames appear in monospace type.
command(x)    Cross-references to command documentation include the section number in the reference pages. For example, fstab(5) means that fstab is documented in Section 5.
$             A dollar sign represents the user prompt.
#             A number sign represents the superuser prompt.
bold          Within interactive examples, boldface type indicates user input.
key           The box symbol indicates that the named key on the keyboard is pressed.
⋮             In examples, a vertical ellipsis indicates that not all lines in the example are shown.
[ ]           In syntax descriptions, brackets indicate items that are optional.
variable      In syntax descriptions, italics indicate items that are variable.
. . .         In syntax descriptions, an ellipsis indicates that the item may be repeated.
)             Used to separate items to be selected by clicking a mouse button.


Resources

For more information on the topics in this course, see the following documentation:

• TruCluster Available Server Software Version 1.4 SPD

• TruCluster Available Server Software Release Notes

• TruCluster Available Server Software Hardware Configuration and Software Installation

• TruCluster Available Server Software Available Server Environment Administration

• Digital UNIX Installation Guide

• Digital UNIX Network Administration

• Cluster Monitor online help


1 Introducing TruCluster Available Server Configuration and Management

Introducing TruCluster Available Server Configuration and Management 1–1

About This Chapter

Introduction

This chapter provides an overview of the TruCluster Available Server Software (abbreviated as TruCluster Available Server) and a high-level discussion of the complexities of planning and configuring Available Server implementations. The TruCluster Available Server product is a high-availability solution that minimizes, but does not eliminate, the impact of hardware and software failures.

A TruCluster Available Server implementation consists of the following elements:

• TruCluster Available Server Software

• Supported hardware for an Available Server implementation

• Supported system software for an Available Server implementation

Collectively, this group of elements constitutes an Available Server Environment (ASE).

TruCluster Available Server configurations significantly reduce downtime due to system hardware or software failures. They provide multihost access to SCSI disks and a generic failover mechanism for network-based services such as NFS, mail, and login, and for applications such as databases.

Failover is accomplished by a set of daemons that monitor the health of the systems in the environment, together with an infrastructure for moving services from one node to another. System managers can also use this service migration capability on demand, for example to move services off a system before a planned maintenance shutdown or to balance the load during periods of peak demand.

The TruCluster Available Server Software can be integrated with the POLYCENTER Advanced File System (AdvFS) and the Logical Storage Manager (LSM) to provide high availability for disks, storage reliability (disk mirroring, striping, and concatenation), and fast file system recovery.


Objectives

To understand TruCluster Available Server configurations, you should be able to:

• Describe the TruCluster Available Server Software product, including its features, components, and position in the Digital UNIX cluster program

• Describe the basic hardware and software requirements for an Available Server configuration

• Explain the concepts associated with the TruCluster Software

• Describe the three most important considerations when configuring a TruCluster Software solution

• Describe the configuration and management phases involved in establishing an Available Server Software implementation

Resources

For more information on the topics in this chapter, see the following:

• TruCluster Available Server Software Available Server Environment Administration

• TruCluster Available Server Software Hardware Configuration and Software Installation

• TruCluster Available Server Software Version 1.4 Release Notes

• TruCluster Available Server Software Software Product Description, SPD 44.17.xx

• Reference Pages

Describing the TruCluster Available Server Product

Overview

The TruCluster Available Server Software product significantly reduces downtime caused by hardware and software failures by responding to predefined failure events within an Available Server Environment (ASE). An ASE is an integrated organization of systems and external disks, connected to one or more shared SCSI buses and networks, that together provide highly available software and disk data to client systems.

The TruCluster Available Server provides multihost access to SCSI disks and a generic failover mechanism for disks and applications. After configuring the ASE and installing the TruCluster Software, administrators can set up services that make disks and applications highly available to client systems. For example, users can set up services for exported NFS disks, raw disks, disk-based applications such as database programs or mail, and nondisk-based applications such as a remote login service.

The TruCluster Available Server includes the following features:

Concurrently active servers

All the systems and disks in the ASE are connected to at least one shared SCSI bus. The services you set up in the ASE can use the disks, but a service must have exclusive access to a disk. A service runs on one system at a time, but any system can run a service.

Master/Standby configuration

You can set up a Master/Standby configuration in which the Master system runs all the services. If a failure prevents the Master system from running the services, the TruCluster Available Server relocates the services to the Standby system.

Transparent NFS failover

If the TruCluster Available Server relocates a service that provides access to exported NFS data, the change in the system exporting the data is virtually transparent. Clients experience only a temporary NFS server timeout.

Fast file system recovery

The POLYCENTER Advanced File System (AdvFS) provides rapid crash recovery, high performance, and a flexible structure.

Increased data integrity

Digital RAID technology and the Logical Storage Manager (LSM) software provide high data availability for disk storage devices, protecting against data loss and improving disk input/output (I/O) performance. You can also perform disk management tasks without disrupting access to disks.

Global event logging

You can log messages about events that occur in the ASE to one or more systems. You can also receive notification of critical problems through electronic mail.


Network failover and network monitoring

The systems in the ASE can be configured with multiple network adapters. This provides better availability and performance, because all known paths and routing connections are used when network path failures occur. Multiple network adapters also provide greater flexibility in client access to ASE services.

Networks can also be monitored for proper operation. The status of monitored networks is passed to a TruCluster script that can be customized.

Cluster Monitor

The Cluster Monitor provides a graphical view of the Available Server implementation and can be used to determine the current state of availability and connectivity in the ASE.
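The Master/Standby configuration and the rule that a service runs on only one system at a time can be modeled as an ordered preference list: the service runs on the first member in its list that is currently available. The following Python sketch illustrates that rule under simplifying assumptions. The function name place_service and the member names are hypothetical; in the product, placement follows the Automatic Service Placement (ASP) policy and is carried out by the TruCluster daemons, not by code like this.

```python
from typing import Collection, Optional, Sequence


def place_service(preferred: Sequence[str], up: Collection[str]) -> Optional[str]:
    """Return the single member that should run a service, or None.

    Hypothetical model: `preferred` lists members in order of preference
    (for a Master/Standby setup, ["master", "standby"]), and `up` holds the
    members currently able to run the service. Because the function returns
    one member, the service never runs on two systems at once.
    """
    for member in preferred:
        if member in up:
            return member
    return None  # no listed member can run the service


if __name__ == "__main__":
    order = ["master", "standby"]
    print(place_service(order, {"master", "standby"}))  # master
    print(place_service(order, {"standby"}))            # standby (failover)
    print(place_service(order, set()))                  # None
```

Returning a single member, rather than a set, is what encodes the exclusive-access rule in this sketch.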

Product Position

The TruCluster Available Server Software product was originally conceived as a mechanism for restarting NFS services in a new location following a hardware failure. Its capabilities have since expanded into a general failover product: if a server providing a network-based service or application crashes or cannot access the disks or network, the service is relocated to another server. The TruCluster Available Server is also a first step toward full clustering capabilities on Digital UNIX platforms.

TruCluster Available Server configurations make distributed services such as NFS, mail, or remote login highly available and manageable. When the system providing a service fails, the TruCluster Software automatically relocates the service to another server. Failover occurs quickly and is virtually transparent to users. The TruCluster Software also lets system managers relocate services from one server to another on demand.

The TruCluster Software provides a tool set for building highly available applications. Any application that runs on one system and can be stopped and started by scripts can be made highly available in the ASE. You provide the scripts, and the TruCluster Software provides the support for running and relocating services.

The TruCluster Available Server offers unified configuration and control functions and provides availability for distributed services. This is a step toward an integrated clustering capability. A cluster is a group of systems that work collectively as a single system to provide fast, uninterrupted computing service. A cluster should provide high availability of services, scalable performance, and centralized management. You can cluster from two to four Alpha systems in a TruCluster Available Server configuration, using SCSI and TCP/IP as interconnects and supporting multivendor client workstations. The Digital AdvantageCluster Available Server package includes the TruCluster Software in a preconfigured hardware package.

TruCluster Available Server configurations are ideal for supporting high-availability applications such as order processing, point-of-sale transaction processing, online customer service, reservation, catalog, and database query systems.

Configuring the TruCluster Available Server

Overview

The TruCluster Available Server coordinates groups of computer systems and their disks so that they function as a unified service environment rather than as an unintegrated collection of individual computers.

The Available Server hardware configuration consists of two to four member systems connected to at least one shared SCSI bus with external disks. The systems communicate with each other and monitor the shared devices through both the shared bus and the network.

Hardware Requirements

An Available Server hardware configuration can contain from two to four systems, referred to as member systems. The systems are connected to at least one shared SCSI bus, and they access external disks that are connected to the shared bus. The systems communicate with each other and monitor the shared devices through both the shared bus and the configured networks.

The TruCluster Software supports a variety of Digital systems, as listed in the Software Product Description (SPD 44.17.xx). A SCSI bus has only eight SCSI IDs (0-7), and each member system's SCSI adapter on the shared bus uses one of them, so the number of shared disks a bus can hold depends on the number of member systems:

• One to six shared disks per SCSI bus in a two-node environment

• One to five shared disks per SCSI bus in a three-node environment

• One to four shared disks per SCSI bus in a four-node environment
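These per-bus limits are simple subtraction: of the eight IDs on a shared bus, one goes to each member system's adapter, and the rest are left for shared disks. A minimal Python sketch of that arithmetic follows; the function name is hypothetical, and it assumes one shared disk per remaining ID, as the list above implies.

```python
SCSI_IDS_PER_BUS = 8  # IDs 0-7 on a shared SCSI bus


def max_shared_disks(member_systems: int) -> int:
    """Return the maximum number of shared disks on one shared SCSI bus.

    Assumes each member system's adapter occupies one SCSI ID and that
    each remaining ID is used by one shared disk.
    """
    if not 2 <= member_systems <= 4:
        raise ValueError("an ASE has two to four member systems")
    return SCSI_IDS_PER_BUS - member_systems


if __name__ == "__main__":
    for nodes in (2, 3, 4):
        print(f"{nodes}-node environment: up to {max_shared_disks(nodes)} shared disks per bus")
```

Running it prints limits of 6, 5, and 4 shared disks, matching the three cases listed.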

The external disks are mounted in a storage expansion box. This enables all the member systems to access the disks and provides a separate power source that does not depend on any system’s power.

The TruCluster Software supports Ethernet and FDDI network controllers. The TruCluster Software does not support Prestoserve NVRAM failover, because data cached in the NVRAM of a failed system would be stranded there.

See the Software Product Description for detailed hardware requirements.


Software Requirements

The TruCluster Available Server Software Version 1.4 requires Version 4.0a of the Digital UNIX operating system. Before you can install Digital UNIX V4.0a, you must first upgrade your systems to Digital UNIX Version 3.2g. The Digital UNIX Version 3.2g patches are located on the Complementary Products CD-ROM.

You must install the following subsets for the TruCluster Software to work correctly:

• Basic Networking Services

• Software Development Environment

The POLYCENTER Advanced File System (see SPD 46.16.xx) is recommended for fast file system recovery. The Logical Storage Manager (see SPD 51.24.xx) is recommended for creating mirrored volumes, striped volumes, or both.

Sample Available Server Configuration

Figure 1–1 depicts an Available Server implementation with two member systems in the ASE, two network connections, two shared SCSI buses, and four configured ASE services.

Figure 1–1 Sample Available Server Configuration

[Figure: Server 1 and Server 2, each with private nonshared disks, connected through SCSI controllers to shared disks on shared buses and serving clients. The ASE services shown are nfs_service, mail_service, dbase_service, and login_service.]

The chapters in this course contain extensive system configuration descriptions and information.


You can also refer to TruCluster Available Server Software Hardware Configuration and Software Installation for other descriptions and examples of Available Server Software configurations.

Presenting the TruCluster Software

Software Components

The TruCluster Software must be installed on each member system. The components interact by means of TCP/IP socket connections. The TruCluster Software components include:

• Manager utility (asemgr): user interface that allows system administrators to send commands to the Director daemon

• Director daemon (asedirector): daemon that manages and coordinates activities within the ASE domain

• Agent daemon (aseagent): daemon that runs on each ASE member and oversees ASE activities for that member

• Host status monitor daemon (asehsm): daemon that runs on each ASE member and monitors the status of the other members in the ASE domain

• Logger daemon (aselogger): daemon that writes error messages to the system log files

• Availability manager driver: driver that supports server-to-server messages on the SCSI bus and reports I/O subsystem failures to the Agent daemon

• ASE driver: driver that supports UFS file systems exported from LSM volumes

• asecdb: database that describes all member nodes and services within the ASE domain

• Action scripts: scripts that initialize, start, and stop ASE services

• Cluster Monitor: GUI that allows you to monitor activity within the ASE

Figure 1–2 shows the distribution of the TruCluster Software components in a configuration containing three ASE members. The asemgr utility and the Director daemon run on only one member system at a time. The Logger daemon can run on more than one system.


Figure 1–2 TruCluster Software Components

[Figure: three member systems serving clients. Each member runs an Agent daemon, a Host status monitor daemon, the Availability manager driver, and action scripts; one member also runs the Director daemon and the Logger daemon. The Manager utility is also shown.]

ASE Services

To make an application highly available, you must set up an ASE service for that application. Each service is assigned a unique name. The TruCluster Software supports three types of services:

• NFS service

• Disk service

• User-defined service

The TruCluster Software automatically relocates a service to another member system if the member running it fails, or if a network or I/O bus failure prevents that member from providing the service to clients. Specifically, relocation occurs under the following conditions:

• A member system fails

• Members cannot access a device due to a SCSI bus failure or a device failure

• All network connections on a member system fail, while the network is still available to other ASE members

• A member system that was down comes back up and has a more favored status to run a particular service, as defined in the TruCluster Software Automatic Service Placement (ASP) policy
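The relocation conditions above can be modeled as a placement function that picks the most favored member able to run a service. This is a hypothetical sketch: the `Member` fields and the strict favored-member ordering stand in for the real ASP policy, whose options are configured through asemgr and are not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Member:
    """Hypothetical view of one ASE member as seen by the placement logic."""
    up: bool = True             # member system is running
    device_access: bool = True  # can reach the service's shared devices
    network_ok: bool = True     # at least one working network connection


def place_service(members: Dict[str, Member], favored: List[str]) -> Optional[str]:
    """Return the most favored member that can run the service, or None.

    favored lists member names from most to least preferred (assumed policy).
    """
    for name in favored:
        m = members[name]
        if m.up and m.device_access and m.network_ok:
            return name
    return None


members = {"alpha": Member(), "beta": Member()}
favored = ["alpha", "beta"]
print(place_service(members, favored))  # → alpha
members["alpha"].up = False             # member failure
print(place_service(members, favored))  # → beta
members["alpha"].up = True              # favored member comes back up
print(place_service(members, favored))  # → alpha
```

Comparing the result with the member currently running the service tells you whether a relocation is needed. A `None` result means that no member currently qualifies, for example when a shared device has failed for every member.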

You can use the asemgr utility to manually relocate a service from one member system to another.


Planning TruCluster Available Server Configurations

Overview

Planning an Available Server implementation requires you to consider three main areas of configuration:

• Network configuration

• Storage configuration

• Service Availability configuration

Network Configuration

When planning an Available Server network configuration, you must determine:

• How clients will receive the services provided by Available Server

• How failover will be performed so that the clients can continue receiving the service

Available Server implementations can be configured with a single “primary” network, or with primary and backup networks. Backup networks provide higher availability by allowing services and daemons to switch to auxiliary network paths if a network failure occurs. The TruCluster Software will declare a network partition or interface failure only after all configured network paths are found to be nonfunctional.
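The rule that a network failure is declared only after every configured path has failed can be expressed in a few lines. In this hypothetical sketch, the path names and the ordered list of paths (primary first, then backups) are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Each entry is (path name, currently working?); primary first, then backups.
Paths = List[Tuple[str, bool]]


def active_path(paths: Paths) -> Optional[str]:
    """Return the first working path, preferring the primary network."""
    for name, working in paths:
        if working:
            return name
    return None


def network_failure_declared(paths: Paths) -> bool:
    """Declare a partition or interface failure only when no path works."""
    if not paths:
        raise ValueError("at least one network path must be configured")
    return active_path(paths) is None


paths = [("primary", False), ("backup", True)]
print(active_path(paths), network_failure_declared(paths))  # → backup False
```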

To work properly, the primary and backup networks in the ASE must provide equal access from any member to any client requiring the available services. This gives any member the ability to serve any client during a failover recovery sequence. This is sometimes described as a symmetric network configuration between clients and ASE members.

The TruCluster Software uses separate IP addresses, called service addresses, to give clients a common access point for IP services such as NFS. These service addresses must be reserved so that no other system accidentally uses the same addresses.
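Reserving service addresses is an administrative task, but a quick consistency check helps catch collisions before the ASE is configured. The sketch below uses Python's standard `ipaddress` module; the service names and the 192.0.2.0/24 documentation addresses are hypothetical.

```python
import ipaddress
from typing import Dict, Iterable, List


def address_collisions(host_addrs: Iterable[str],
                       service_addrs: Dict[str, str]) -> List[str]:
    """Report service addresses that clash with hosts or with each other."""
    hosts = {ipaddress.ip_address(a) for a in host_addrs}
    seen = set()
    problems = []
    for service, addr in service_addrs.items():
        ip = ipaddress.ip_address(addr)  # raises ValueError if malformed
        if ip in hosts:
            problems.append(f"{service}: {addr} is already a host address")
        elif ip in seen:
            problems.append(f"{service}: {addr} is used by another service")
        seen.add(ip)
    return problems


hosts = ["192.0.2.10", "192.0.2.11"]
services = {"nfs1": "192.0.2.50", "mailer": "192.0.2.11", "db": "192.0.2.50"}
for line in address_collisions(hosts, services):
    print(line)
```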


Storage Configuration

When planning an Available Server configuration, you must also determine the shared storage requirements for the highly available set of services. For example, a configuration may need to provide a highly available database to a set of clients. You must know the size and availability constraints of the database to set up an appropriate Available Server configuration.

The Available Server storage configuration requires that there be at least one shared SCSI bus between all members. All storage used to support highly available services must be commonly accessible by each cluster member through a shared SCSI bus connection. The configuration of the ASE, with respect to shared SCSI buses and devices, must be such that each device is addressable from each member by the same name. This configuration of common/shared devices and SCSI buses is often referred to as a symmetric storage configuration.
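The same-name requirement can be checked by comparing each member's view of the shared devices. In this hypothetical sketch, each member's view maps a physical device identifier (an invented label such as `bus1-t0`) to the device name that member uses; the rz-style names are illustrative only.

```python
from typing import Dict, List

# member name -> {physical device id -> device name seen on that member}
Views = Dict[str, Dict[str, str]]


def asymmetric_devices(views: Views) -> List[str]:
    """Return device ids that are missing on some member or named differently."""
    all_ids = set().union(*(v.keys() for v in views.values()))
    bad = []
    for dev in sorted(all_ids):
        names = {view.get(dev) for view in views.values()}
        if len(names) != 1 or None in names:
            bad.append(dev)
    return bad


views = {
    "alpha": {"bus1-t0": "rz8", "bus1-t1": "rz9"},
    "beta": {"bus1-t0": "rz8", "bus1-t1": "rz17"},
}
print(asymmetric_devices(views))  # → ['bus1-t1']
```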

You must also know the amount of storage to be shared and the degree to which it must withstand failures to successfully plan Available Server storage requirements. For example, if an ASE service must survive failures of a single bus and single disk at the same time, the storage associated with that service must be mirrored across at least two SCSI buses and each of those buses must be directly accessible from all ASE members.
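Whether a given mirror layout meets this kind of requirement can be checked by brute force: try every combination of one failed bus and one failed disk and confirm that some mirror copy (plex) survives. This is a simplified, hypothetical model in which each plex lives on a single disk on a single bus; real LSM plexes can span several disks. The bus, disk, and member names are invented.

```python
from itertools import product
from typing import Dict, Set, Tuple

# plex name -> (bus, disk) holding that mirror copy
Plexes = Dict[str, Tuple[str, str]]


def survives_bus_and_disk_failure(plexes: Plexes,
                                  bus_members: Dict[str, Set[str]],
                                  members: Set[str]) -> bool:
    """True if every single-bus plus single-disk failure leaves a plex intact."""
    if not plexes:
        return False
    # Each bus carrying a plex must be directly reachable from every member.
    for bus, _ in plexes.values():
        if not members <= bus_members.get(bus, set()):
            return False
    buses = {bus for bus, _ in plexes.values()}
    disks = {disk for _, disk in plexes.values()}
    for failed_bus, failed_disk in product(buses, disks):
        intact = [p for p, (bus, disk) in plexes.items()
                  if bus != failed_bus and disk != failed_disk]
        if not intact:
            return False
    return True


members = {"alpha", "beta"}
bus_access = {"bus1": {"alpha", "beta"}, "bus2": {"alpha", "beta"}}
two_way = {"p1": ("bus1", "rz8"), "p2": ("bus2", "rz16")}
four_way = {"p1": ("bus1", "rz8"), "p2": ("bus1", "rz9"),
            "p3": ("bus2", "rz16"), "p4": ("bus2", "rz17")}
print(survives_bus_and_disk_failure(two_way, bus_access, members))   # → False
print(survives_bus_and_disk_failure(four_way, bus_access, members))  # → True
```

In this model, a two-way mirror on two buses fails the check when the disk failure lands on the bus that is still working, so it is worth running the check against the actual layout rather than counting buses alone.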

Service Availability Configuration

Finally, when planning any Available Server configuration you must determine the services that are being made available. If there are no services, an Available Server solution is not required. Knowing the nature of the service and its availability requirements is essential when creating a TruCluster Available Server implementation to best suit the service.

Two characteristics are required for an application to be suitable for an ASE service:

• The application must run on only one system at a time.

• The application must be able to be started and stopped using a set of commands issued in a specific order so that the commands can be used in an action script.

One of the more common applications suited to an ASE service is a database application for a set of clients. Anytime the database becomes inaccessible to the clients, the customer will complain that the TruCluster Available Server configuration is failing. Making the database highly available requires an understanding of the database design, storage requirements, and the communications paths used to provide access to clients.

A breakdown in any of the communications paths can prevent access to the ASE database service. To the customer, this is an availability problem, and since the TruCluster Available Server Software is the primary product that provides availability, it becomes a Digital problem. Unfortunately, the problem may not be directly related to the TruCluster Available Server; it is often caused by poorly configured networks that become noisy or saturated.

Be careful when configuring failover storage associated with a service. Ask the customer if the storage needs are likely to grow. If storage needs are expected to grow, the storage configuration plan should take this into account and provide for expansion. Storage can be expanded in many ways depending on the application's storage requirements.


Determining Configuration and Maintenance Phases

Overview

Establishing and maintaining an Available Server implementation requires that you go through a number of distinct phases. You must know which phase you or the customer is in when a problem occurs. Problems often occur because phases have been skipped or issues associated with a particular phase have not been completely addressed.

As shown in Figure 1–3, the phases associated with establishing and maintaining an Available Server implementation are:

• Planning the hardware and software configuration

• Configuring the hardware

• Installing and setting up the base operating system software

• Installing the TruCluster Software

• Configuring ASE services

• Testing ASE failover sequences

• Monitoring and managing the environment

• Troubleshooting

Figure 1–3 Available Server Configuration and Maintenance Phases

(Figure: the configuration and maintenance phases listed above, from planning through troubleshooting.)

To help determine whether a particular phase is complete, you must develop worksheets to collect the pertinent information for each phase. The following sections provide lists of questions associated with each phase. You can use these questions to develop the necessary information.
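A worksheet can be as simple as a list of questions per phase plus the recorded answers, with a phase counted as complete when every question has an answer. The class below is a hypothetical sketch of that idea, populated with the questions for installing the TruCluster Software from this section.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PhaseWorksheet:
    """Hypothetical worksheet: the questions for one phase and their answers."""
    phase: str
    questions: List[str]
    answers: Dict[str, str] = field(default_factory=dict)

    def record(self, question: str, answer: str) -> None:
        if question not in self.questions:
            raise KeyError(f"not on the {self.phase} worksheet: {question}")
        self.answers[question] = answer

    def open_questions(self) -> List[str]:
        """Questions that still have no (non-blank) answer."""
        return [q for q in self.questions if not self.answers.get(q, "").strip()]

    def complete(self) -> bool:
        return not self.open_questions()


install = PhaseWorksheet("Installing the TruCluster Software", [
    "Is the TruCluster Software PAK registered?",
    "Is the Logger daemon running on this node?",
    "Is the membership list up and operational (pinging)?",
])
install.record("Is the TruCluster Software PAK registered?", "yes")
print(install.complete())             # → False
print(len(install.open_questions()))  # → 2
```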


Planning the Available Server Configuration

When planning the hardware and software needs for an Available Server configuration, you must ask:

• What services are to be made available?

• What is the network configuration?

• What are the survivable failures?

  Server => 2 to 4 nodes per ASE

  Disk => mirroring or RAID (what kind?)

  SCSI bus => mirroring across multiple SCSI buses

  Power => UPS

  Network => dual network controllers per served net (not currently supported on primary net)

• How much available storage is required?

• What is the frequency of client requests?

• What other services are provided to clients?

• What is the security policy?

• Do any available services require custom scripts?

• Is this Available Server configuration expected to grow in members or storage? If so, how soon and to what extent?

Configuring ASE Hardware

When configuring the hardware for the ASE, you must ask:

• What is the primary network?

• What are the backup networks (if any)?

• If subnets are being used, are all members on the same subnet?

• What are the client connectivity networks?

• How many shared SCSI buses are required?

• How many systems or members are required?

• What are the storage requirements?

• How will the SCSI buses be configured (SNS, FWD)?

• Are DWZZAs required? If so, how many?

• What are the cable length restrictions?

• Is UPS required?

• What are the cabling needs? Y cables?

• Are all hardware component revisions correct?


Installing and Setting Up the Base Operating System

When setting up the base operating system, you must answer the following questions:

• Are there any licensing issues?

• Which kernel options should be chosen?

• Which network addresses map to which interfaces?

• Which member system will be the NTP server to synchronize time?

• What service addresses are set up?

• Are mirrored or striped volumes required?

• Are RAID setups required?

• How will the security policy be set up?

• Will BIND and/or NIS be used to distribute IP addresses?

• Are all members in each others’ /etc/hosts file?

Installing the TruCluster Software

When installing the TruCluster Software, you must ask:

• Is the TruCluster Software Product Authorization Key (PAK) registered?

• Is the Logger daemon running on this node?

• Is the membership list up and operational (pinging)?

Configuring ASE Services

When configuring the ASE services, you must ask:

• What types of services are going to be set up to fail over?

• Which networks do you want the TruCluster Software to monitor?

• What is the automatic service placement policy?

• What are the storage needs (NFS, AdvFS, LSM, UFS)?

Testing the TruCluster Software Failover Sequences

When you have installed the TruCluster Software and set up services, you must test your system to verify that it fails over correctly. The questions you must answer are:

• How to cause a server failure?

• How to create a storage container failure?

• How to invoke a SCSI bus failure?

• How to test a network interface failure?

• How to test a network partition?


Monitoring and Managing Available Server Configurations

After your system has been running for a while, you will be performing maintenance tasks. The questions you may be asking are:

• How will I get status on services and members?

• Will I be relocating services for load-balancing purposes?

• Will I be taking services off line?

• How do I restart services after a disk failure?

• How do I rereserve LSM disks after a disk failure?

• How do I upgrade the TruCluster Software?

Troubleshooting an Existing Available Server Configuration

When problems occur in an existing Available Server configuration, you must be able to answer these troubleshooting questions:

• How do I read and evaluate the kern.log and the daemon.log files?

• What corrective actions must I take?

• What are some of the most commonly experienced problems?


Summary

Introduction to the Available Server Software

The TruCluster Available Server Software product is a high availability solution that minimizes, but does not eliminate, the impact of hardware and software failures.

The TruCluster Available Server Software was originally conceived as a mechanism for restarting NFS services in a new location following a hardware failure. Its capabilities have expanded into a general failover product. It is also a first step in implementing full clustering capabilities on the Digital UNIX platforms.

Configuring the Available Server Software

The TruCluster Software coordinates groups of computer systems and their disks so they can function as a unified service environment rather than as an unintegrated collection of loosely coupled processors. For the TruCluster Software to function properly, only supported hardware configurations are allowed.

An Available Server hardware configuration consists of from two to four member systems connected to at least one shared SCSI bus, with external disks. The systems communicate with each other and monitor the shared devices through both the shared bus and the network.

Presenting the TruCluster Software

The TruCluster Software components include:

• Manager utility (asemgr) — user interface allowing system administrators to send commands to the Director daemon

• Director daemon (asedirector) — daemon that manages and coordinates activities within the ASE domain

• Agent daemon (aseagent) — daemon that runs on each ASE member; oversees ASE activities for a specific member

• Host status monitor daemon (asehsm) — daemon that runs on each ASE member; monitors the status of the other members in the ASE domain

• Logger daemon (aselogger) — daemon that writes error messages to the system log files

• Availability manager driver — driver that supports server-to-server messages on the SCSI bus; reports I/O subsystem failures to the Agent daemon

• ASE driver — driver that supports UFS file systems exported from LSM volumes


• asecdb — database that describes all member nodes and services within the ASE domain

• Action scripts — scripts that initialize, start, and stop ASE services

• Cluster Monitor — GUI that allows you to monitor activity within the ASE

Planning Available Server Configurations

There are three main areas you need to consider when you plan a TruCluster Software configuration:

• Network configuration

• Storage configuration

• Service Availability configuration

To work properly, the Available Server network configuration must provide equal access from any member to any client requiring the available services. This gives any member the ability to serve any client during a failover recovery sequence. This is sometimes described as a symmetric network configuration between clients and ASE members.

The Available Server storage configuration requires that there be at least one shared SCSI bus between all members. All storage used to support highly available services must be commonly accessible by each cluster member through a shared SCSI bus connection.

When planning any Available Server configuration, you must determine the services that are being made available. If there are no services, an Available Server Software solution is not required. Knowing the nature of the service and its availability requirements is essential to configuring the TruCluster Software to best suit the service.

Determining Configuration and Maintenance Phases

The phases associated with establishing and maintaining an Available Server implementation are:

• Plan the hardware and software requirements for the Available Server implementation

• Configure the hardware for the ASE

• Install and set up the base operating system software

• Install the TruCluster Software and configure the ASE Services

• Test ASE failover sequences

• Monitor and manage the ASE

• Troubleshoot the Available Server configuration if a problem occurs


Exercises

Describing the TruCluster Available Server Software Product: Exercise

1. Describe the purpose of the TruCluster Available Server Software product.

2. Identify the TruCluster Software features.

3. Describe the TruCluster Software's position in the Digital UNIX cluster program.

Describing the TruCluster Available Server Software Product: Solution

1. The TruCluster Software is a high availability solution that minimizes, but does not eliminate, the impact of hardware and software failures.

2. The TruCluster Software features include concurrently active servers, master/standby configuration, transparent NFS failover, automatic restart, fast file system recovery (when used with AdvFS), and increased data integrity.

3. The TruCluster Software offers unified configuration and control functions as well as availability for distributed services. This is a step in the direction of an integrated clustering capability. The Digital AdvantageCluster Available Server package includes the TruCluster Software in a preconfigured hardware package.

Configuring the TruCluster Software: Exercise

Describe a basic Available Server hardware configuration.

Configuring the TruCluster Software: Solution

A basic Available Server hardware configuration consists of two to four member systems, connected to at least one shared SCSI bus, with external disks. The systems communicate with each other and monitor the shared devices through both the shared bus and the network.


Presenting the TruCluster Software: Exercise

1. Identify the TruCluster Software components.

2. Which components are located on all member systems? On only one member system?

Presenting the TruCluster Software: Solution

1. Available Server software components include the Manager utility (asemgr), the Director daemon (asedirector), the Agent daemon (aseagent), the Host Status Monitor daemon (asehsm), the Logger daemon (aselogger), the Availability Manager driver (AM), and action scripts.

2. The Agent daemon, Host Status Monitor daemon, and Availability Manager driver must be located on each member system. The Manager utility and Director daemon must be run on only one member system. The Logger daemon can run on one or more member systems. The action scripts are on all member systems that can be servers for those services.

Planning Available Server Configurations: Exercise

List and describe the three areas of consideration when planning an Available Server implementation.

Planning Available Server Configurations: Solution

The three areas to consider when planning an Available Server Software implementation are:

• Network Configuration - Available Server network configurations must provide equal access from any member to any client requiring the available services.

• Storage Configuration - All storage used to support highly available services must be commonly accessible by each ASE member through a shared SCSI bus connection.

• Service Availability Configuration - One of the more common application services is providing highly available access to a database for a set of clients.


Determining Configuration and Maintenance Phases: Exercise

List the phases for establishing and maintaining an Available Server configuration.

Determining Configuration and Maintenance Phases: Solution

The phases associated with establishing and maintaining an Available Server implementation are:

• Plan the hardware and software configuration for the Available Server implementation

• Configure the hardware for the ASE

• Install and set up the base operating system software

• Install the TruCluster Software and configure ASE Services

• Test ASE failover sequences

• Monitor and maintain the environment

• Troubleshoot the existing Available Server configuration


2 Understanding TruCluster Software Interactions

Understanding TruCluster Software Interactions 2–1


About This Chapter

Introduction

This chapter discusses the interactions among the TruCluster Software components. It concentrates on three major topics:

• Basic concepts of highly available services

• Descriptions of the TruCluster Software components

• Analysis of how the software components interact to detect failure events and respond when failure events occur

Objectives

To understand the TruCluster Software interactions, you should be able to:

• Define the basic concepts of highly available services

• Describe the TruCluster Software components

• Describe the ways in which the TruCluster Software components interact to detect and respond to failure events

Resources

For more information on the topics in this chapter, see the following:

• TruCluster Available Server Software Available Server Environment Administration

• TruCluster Available Server Software Hardware Configuration and Software Installation

• TruCluster Available Server Software Version 1.4 Release Notes

• Reference Pages


Introducing Highly Available Services

Overview

The TruCluster Available Server provides an infrastructure that makes applications and system services “highly available” to clients, minimizing downtime. High availability is achieved by decoupling application downtime from system downtime. If critical resources on a given server become unavailable, the TruCluster Software runs action scripts that can restart an application on another server. Different scripts are run in response to different failure events.

The TruCluster Software detects system failures only; it cannot detect software failures within an application, or corrupted data files. However, the TruCluster Software can recover from system software failures, provided they are severe enough to interrupt inter-server communication over at least one of the redundant communications paths monitored by the software. In some cases, the server that has suffered the failure may be capable of initiating the recovery process. However, when the damaged server is unable to recover, it is up to the surviving servers to realize a server “went down” and react appropriately.

The services that the TruCluster Software supports are configured within an Available Server Environment (ASE). An ASE consists of from two to four servers that are loosely coupled through the sharing of one or more networks and one or more SCSI buses. By sending message packets across these redundant paths at regular intervals, the TruCluster Software can determine the current status of the ASE members and initiate appropriate actions when failures occur.

Independence of Services from Servers

An ASE service is an application that is provided to clients, such as a database, an NFS service, or an electronic mail service. Each ASE service has a unique name. Administrators use the name to manage the service, and clients also use the name when specifying their requests for service. This arrangement has the benefit that clients need not know which server is currently providing the service. For example, incoming mail addressed to a user on the machine named server will be queued and delayed while server is taken off line for repair, whereas mail addressed to a user on the service named mailer will not be affected, because another server can provide the mailer service while server is unavailable. A service's name is similar to a machine's hostname, although it cannot be a machine's hostname.
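The separation between service names and machine hostnames can be illustrated with a small lookup table. The class below is a hypothetical model of the idea, not the mechanism the TruCluster Software uses; it reuses the server and mailer names from the example above and invents a second member called backup.

```python
from typing import Dict, Iterable


class ServiceDirectory:
    """Hypothetical map from ASE service names to the member now serving them."""

    def __init__(self, hostnames: Iterable[str]) -> None:
        self.hostnames = set(hostnames)
        self.placement: Dict[str, str] = {}

    def add_service(self, service: str, server: str) -> None:
        if service in self.hostnames:
            raise ValueError(f"{service!r} is a hostname; pick another service name")
        if service in self.placement:
            raise ValueError(f"service {service!r} already exists")
        self._check_host(server)
        self.placement[service] = server

    def relocate(self, service: str, new_server: str) -> None:
        if service not in self.placement:
            raise KeyError(service)
        self._check_host(new_server)
        self.placement[service] = new_server  # clients keep using the service name

    def resolve(self, service: str) -> str:
        return self.placement[service]

    def _check_host(self, server: str) -> None:
        if server not in self.hostnames:
            raise ValueError(f"unknown member {server!r}")


ase = ServiceDirectory(["server", "backup"])
ase.add_service("mailer", "server")
ase.relocate("mailer", "backup")  # server taken off line for repair
print(ase.resolve("mailer"))      # → backup
```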


Action Scripts

ASE services are started and stopped through action scripts. Consequently, any application that is placed in an ASE service must be manageable through script-based commands. If the TruCluster Software determines that a resource critical to your service has failed and a redundant resource is available to replace it, the TruCluster Software executes a script that stops the running instance of the service, replaces or reassigns the failed resource, and executes other scripts to start a new instance of the service on another server.

The TruCluster Software does not promise that clients will not notice the interruption, nor does it promise that the client will not need to take action to resume use of the service. The TruCluster Software simply promises to try to restart a broken service.
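The stop, reassign, start sequence described above can be sketched as follows. Everything here is hypothetical: the component names, the reverse-order stop (a common convention, assumed rather than taken from the TruCluster documentation), and the `reassign` callback that stands in for moving disks or service addresses. Real action scripts run commands instead of appending to a trace.

```python
from typing import Callable, List


class Application:
    """Hypothetical application started by an ordered list of commands."""

    def __init__(self, components: List[str], trace: List[str]) -> None:
        self.components = components  # start order
        self.trace = trace

    def start(self, member: str) -> None:
        for c in self.components:
            self.trace.append(f"start {c} on {member}")

    def stop(self, member: str) -> None:
        for c in reversed(self.components):  # assumed: stop in reverse order
            self.trace.append(f"stop {c} on {member}")


def fail_over(app: Application, old: str, new: str,
              reassign: Callable[[str, str], None],
              old_reachable: bool = True) -> None:
    """Stop on the old member (if it still runs), move resources, start on new."""
    if old_reachable:
        app.stop(old)
    reassign(old, new)
    app.start(new)


trace: List[str] = []
app = Application(["filesystem", "service-ip", "database"], trace)
fail_over(app, "alpha", "beta",
          reassign=lambda a, b: trace.append(f"reassign disks {a}->{b}"))
print("\n".join(trace))
```

The `old_reachable` flag covers the case in which the failed member has crashed and cannot run its stop script.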


Introducing the TruCluster Software Components

Overview

The TruCluster Software consists of four daemons, several scripts, a user interface, and two drivers. The presence of drivers in the kit requires that the kernel be rebuilt once the software is copied from the distribution media.

The TruCluster Software is installed in the following directories, some of which are links:

/dev, /etc, /opt, /sbin, /sbin/init.d, /sbin/rc*.d, /usr/opt, /usr/bin, /usr/share/man, /usr/sys/kits, /usr/var/adm, /usr/var/ase, /usr/var/opt, /usr/var/run, /var/ase, and /var/opt.

TruCluster Software Components

The TruCluster Software contains the following components:

• /usr/sbin/asedirector

The Director daemon has a global view of the state of the services provided by the servers in the ASE domain. Only one instance of the Director can run within a given domain; the Agents collaborate to start a new instance if the member running the Director goes down. The Director assigns services to servers according to the state of the domain, honoring preferences and constraints imposed by the system administrator.

• /usr/sbin/aseagent

The Agent daemon oversees a single ASE member server. Each member runs an instance of the Agent, which maintains a near real-time view of the status of its host, communicating this information to the Director.

The Agent deduces the status of service availability from information reported by the Host Status Monitor and the Availability Manager. It starts and stops services on the local member under the Director's direction. The Agent daemon uses the Availability Manager driver interfaces to reserve disks and to receive notification of lost reservations.

The Agents on each member system are responsible for electing a Director. If an Agent exits, the rest of the ASE members consider the affected server to be “down.”

• /usr/sbin/asehsm

The Host Status Monitor (HSM) daemon monitors the “up/down” status of the other member nodes and the state of the local network interfaces, reporting any changes to the Agent. On the member where the Director is running, the HSM reports to the Director as well. Each member node runs a single instance of the HSM.


• /usr/sbin/aselogger

The aselogger (Logger) daemon writes messages to the system log files under /var/adm/syslog.dated. One instance of the Logger can run on each server within the ASE, but the Logger is not required to run on any member. However, running an instance of the Logger on each ASE member is recommended.

• Availability Manager Driver

The Availability Manager (AM) is a pseudodriver that lies on top of the base system's SCSI CAM drivers. It implements functions needed by the HSM that support server-to-server messages on the SCSI bus. The AM also reports I/O subsystem failures to the Agent.

• ASE Driver

The ASE driver is a pseudodriver needed to support UFS file systems exported from LSM volumes. If no UFS file systems have LSM volumes assigned to them, the ASE driver is not used.

• /usr/sbin/asemgr

The asemgr is the user interface to the Director. The system administrator can run the asemgr from any member node to control the operation of the TruCluster Software. If more than one instance of the asemgr is running within an ASE domain, the state of the database is protected by locks.

Unlike the TruCluster Software daemons, the asemgr is not run continuously. It can be accessed through a text-based interactive interface, and it can also be invoked noninteractively, as from a script.

• /usr/var/ase/config/asecdb

The asecdb is the binary database that describes all member nodes and services known to the ASE. It is maintained using the asemgr.

• Action Scripts

Action Scripts act as the interface between the TruCluster Software and the ASE services. To make an application ready for use within an ASE, you first write scripts to start and stop the application. You can then use the asemgr to create the service, assign the application's resources to the service, and copy the Action Scripts to the ASE database. The TruCluster Software invokes the script to restart the application in the event of a failure. Since Action Scripts are copied into the ASE database, editing the original does not alter the TruCluster Software's copy. Action Script templates are provided in the /var/opt/ASE*/ase/lib directory.


You can also create user-defined Action Scripts to control the TruCluster Software through the asemgr command line interface. For instance, a cron script could be used to take an ASE service off line, back it up, and place it on line again.

• Cluster Monitor

The Cluster Monitor, contained in an optional software subset, provides graphical assistance for the TruCluster Software administration based on event reports from the TruCluster Software.
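The Director and Agent descriptions above imply a simple invariant: at most one Director runs, it stays where it is while its member's Agent is running, and the surviving Agents choose a replacement when it is lost. The election protocol itself is not described in this guide, so the rule below (keep the current Director if possible, otherwise pick the alphabetically first member with a running Agent) is a hypothetical stand-in.

```python
from typing import Dict, Optional


def elect_director(agents_running: Dict[str, bool],
                   current: Optional[str]) -> Optional[str]:
    """Return the member that should run the Director (hypothetical rule)."""
    # An exited Agent means its member is considered down.
    if current is not None and agents_running.get(current, False):
        return current
    candidates = sorted(m for m, running in agents_running.items() if running)
    return candidates[0] if candidates else None


state = {"alpha": True, "beta": True, "gamma": True}
director = elect_director(state, None)      # alpha is chosen
state["alpha"] = False                      # Agent on alpha exits
director = elect_director(state, director)  # beta takes over
state["alpha"] = True                       # alpha rejoins
print(elect_director(state, director))      # → beta (the Director stays put)
```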

TruCluster Software Component Interaction

The TruCluster Software components work together to monitor the state of the ASE domain and to detect and respond to failures in hardware and system software.

Figure 2–1 depicts an Available Server configuration with two ASE member systems linked by two networks and two shared SCSI buses.

Figure 2–1 ASE Software Component Interaction

(Figure: ASE Member 1 and ASE Member 2, each with aseagent, asehsm, action scripts, a copy of asecdb, and the AM/CAM drivers in its kernel. The members exchange ICMP pings over the network and SCSI pings over the shared bus. The asedirector, which receives asemgr commands, and aselogger instances are also shown.)


The two ASE member systems shown in Figure 2–1 are configured in a symmetric manner. They share a common database, asecdb. Member 1 is running the asedirector and the aselogger. The Director daemon interacts with the Agent daemons on both members to provide coordination of ASE activities.

The Host Status Monitor communicates the status of the SCSI buses and the networks to the Agents and the Director. It receives information about the state of the SCSI bus from the AM driver, which is configured into the kernel.

User commands are sent to the asedirector from both member systems through the asemgr utility. However, if more than one instance of the asemgr is running at a given time, the asecdb becomes locked and management operations cannot be performed.
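The effect of this lock can be modeled with a non-blocking lock: the first asemgr session acquires it, and a second session is refused until the first releases it. This is a hypothetical in-process sketch using Python's `threading.Lock`; how the real asemgr implements its locking is not described here, and the member names are invented.

```python
import threading
from typing import Optional


class AsecdbLock:
    """Hypothetical model of the lock that serializes asemgr sessions."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.holder: Optional[str] = None

    def open_session(self, who: str) -> bool:
        """Try to start a management session; False if another holds the lock."""
        if not self._lock.acquire(blocking=False):
            return False
        self.holder = who
        return True

    def close_session(self, who: str) -> None:
        if self.holder != who:
            raise RuntimeError(f"{who} does not hold the asecdb lock")
        self.holder = None
        self._lock.release()


db = AsecdbLock()
print(db.open_session("alpha"))  # → True
print(db.open_session("beta"))   # → False (management operations blocked)
db.close_session("alpha")
print(db.open_session("beta"))   # → True
```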

The Logger daemon logs errors detected on the SCSI bus by the AM driver to the kern.log file. It logs errors detected by the other software components to the daemon.log file.
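The routing rule amounts to one decision per message. The sketch below is a hypothetical illustration: the source labels such as `am_driver` and the message texts are invented, and only the two log file names come from the text.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def route_messages(messages: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (source, text) messages by the log file the Logger would use."""
    routed: Dict[str, List[str]] = defaultdict(list)
    for source, text in messages:
        # SCSI errors from the AM driver go to kern.log; all else to daemon.log.
        log_file = "kern.log" if source == "am_driver" else "daemon.log"
        routed[log_file].append(text)
    return dict(routed)


msgs = [("am_driver", "SCSI bus reset"), ("asehsm", "member beta unreachable")]
print(route_messages(msgs))
# → {'kern.log': ['SCSI bus reset'], 'daemon.log': ['member beta unreachable']}
```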


Understanding TruCluster Failure Detection and Response

Overview The TruCluster Software monitors the health of the membernodes within the ASE domain, and determines if and how to reactwhen a member appears to change. The decision depends uponthe state of the two communication paths between the membernodes: the network and the shared SCSI bus. When a significantfailure is detected on a server, the TruCluster Software reassignsthe services that were being provided from the affected memberto another server. Such an automatic reassignment is called afailover.

The Host Status Monitor (HSM) on each system monitors the state of the other member systems through Internet Control Message Protocol (ICMP) pings over the network and pings generated by the Availability Manager (AM) over the shared SCSI bus. The HSM on each ASE member determines the status of the other ASE members as follows:

• Every few seconds, one member node issues a SCSI send command to another member through their respective SCSI host adapters,¹ and the other node responds immediately.

• Over the network path, an ICMP echo, or ping, is exchanged. TruCluster-initiated network echoes also carry data, which distinguishes them from echoes initiated by other software.

• The failure of a node to respond to these periodic checks (with retries) may ultimately cause the alerted node to invoke the failover logic.
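The detection rule above can be sketched in Python. This is a minimal model, not the HSM's actual implementation: the probe callables, the default of 2 retries, and the status labels are hypothetical, and a real probe would be an ICMP echo or an AM-generated SCSI ping.

```python
from typing import Callable, List

# A probe sends one ping over one path and reports whether the peer answered.
# (Hypothetical stand-in for an ICMP echo or an AM-generated SCSI ping.)
Probe = Callable[[], bool]


def path_alive(probe: Probe, retries: int) -> bool:
    """A path is alive if any of 1 + retries attempts gets an answer."""
    return any(probe() for _ in range(1 + retries))


def peer_status(scsi_paths: List[Probe], net_paths: List[Probe],
                retries: int = 2) -> str:
    """Combine the SCSI and network results for one remote member."""
    scsi_ok = any(path_alive(p, retries) for p in scsi_paths)
    net_ok = any(path_alive(p, retries) for p in net_paths)
    if scsi_ok and net_ok:
        return "up"
    if not scsi_ok and not net_ok:
        return "down"  # both kinds of ping timed out: member failure
    return "scsi-only" if scsi_ok else "network-only"
```

Only the "down" result corresponds to the member-failure case; a one-sided result is not by itself a member failure and is left to the more specific network and SCSI rules described later in this section.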

This section examines how the TruCluster Software responds to different types of failures within the ASE.

Failure Events that Trigger a Response

The TruCluster Software detects and responds to five types of failure conditions:

• Member node failure: An ASE member becomes inoperable (host down).

• Critical SCSI path failure: A member detects an I/O error to a shared disk, even though the disk is available.

• Device failure: A disk on the shared SCSI bus fails to respond to I/O.

• Network interface failure: A member's network connection fails due to a bad controller, a pulled network cable, or a system crash.

• Network partition: At least two members cannot communicate with each other over the network, even though all members' network interfaces are functional.

¹ The TruCluster Software uses the ICMP term “ping” to describe this interaction.

Member Node Failure

Figure 2–2 depicts the failure of an ASE domain member. In this situation, the Host Status Monitor daemon on the local member determines that the remote host is down because the host pings on both the network and the SCSI bus have timed out.

Figure 2–2 Member Down Scenario

[Figure: a local host and a remote host connected by a shared I/O bus and a network.]

When a member failure is detected, the TruCluster Software:

• Notifies the administrator that a member is unavailable by running the Alert script

• Restarts the Director on a surviving member (if it was running on the failed member)

• Restarts each affected service according to its Automatic Service Placement (ASP) policy

When a system crashes, the failed node does not run any of the stop scripts for the services that it was running. Consequently, when the member becomes available again, the ASE Agent runs the stop script of every service in the ASE database to ensure that any necessary “clean-up” has been performed.


SCSI Bus Failures

The TruCluster Software uses the SCSI buses for two purposes: carrying I/O to devices and communicating state information between servers. A SCSI bus failure is not necessarily a reason to invoke a failover. If another working bus is available, the server state information can pass over it just as well. If no storage devices on a failed bus are in active use, there is no reason to fail over the services associated with those devices (the bus may become available again before a disk I/O arrives). Furthermore, an unresponsive disk may belong to an LSM mirrored volume, in which case LSM may be able to handle the failure and avoid a failover. This reasoning leads to the following conditions, which are necessary and sufficient to initiate a failover on a SCSI bus:

• Failure of I/O to a device on a shared bus

• LSM cannot recover from the failure

Note that even if a member node fails to respond to SCSI pings over all SCSI buses, this alone is not sufficient to trigger a failover. In the case of a device failure, whether the service fails over or is shut down depends on whether another member can reach the disk.
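The two conditions above amount to a single predicate. The sketch below is hypothetical (the ScsiEvent fields are illustrative names, not TruCluster data structures); it makes explicit that losing SCSI pings, even on every bus, never enters the decision.

```python
from dataclasses import dataclass


@dataclass
class ScsiEvent:
    io_failed_on_shared_device: bool       # a real I/O to a shared disk failed
    lsm_recovered: bool = False            # LSM handled it through another plex
    pings_lost_on_all_buses: bool = False  # recorded, but never decisive


def scsi_failover_required(ev: ScsiEvent) -> bool:
    # Necessary and sufficient: I/O failure on a shared device AND LSM could
    # not recover. The ping field is deliberately ignored.
    return ev.io_failed_on_shared_device and not ev.lsm_recovered
```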

As indicated, failures on a SCSI bus occur under two basic scenarios:

• Critical SCSI path failure

• Device failure

The next two sections discuss the TruCluster Software's response to each of these conditions.

Critical SCSI Path Failure

A critical SCSI path failure occurs when an ASE service cannot access a shared device even though the device itself is available. At the time of the I/O failure, the TruCluster Software cannot distinguish a SCSI path failure from an actual device failure.

Figure 2–3 depicts a critical SCSI path failure. In this situation, the Availability Manager on the local host member notifies the Host Status Monitor daemon that the ping over the SCSI bus has timed out. In addition, the Availability Manager notifies the Agent daemon of a device path failure.

When an I/O failure is detected, the TruCluster Software:

• Runs an Alert script to notify the administrator that a device cannot be reached.

• Determines whether the service can continue to run without accessing the affected device (if it is part of an LSM mirrored volume), and either:

  – Leaves the service running, if possible, or

  – Stops the affected service and attempts to restart the service on each eligible ASE member until successful or until all eligible members are tried. The TruCluster Software selects members according to the affected service's ASP policy.

Figure 2–3 Critical SCSI Path Failure Scenario

[Figure: a local host and a remote host connected by a shared I/O bus and a network.]

If the SCSI path failure does not affect all eligible members, the service will be made available on one of the unaffected members. If the SCSI path failure affects all eligible members, the service will remain unassigned.
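The restart loop described above can be sketched as a search over the eligible members in ASP preference order. The try_start callable and the member names are hypothetical; the sketch only shows the control flow, including the case where every attempt fails and the service is left unassigned.

```python
from typing import Callable, List, Optional


def relocate(service: str, eligible_in_asp_order: List[str],
             try_start: Callable[[str, str], bool]) -> Optional[str]:
    """Return the member that started the service, or None if unassigned."""
    for member in eligible_in_asp_order:
        if try_start(service, member):
            return member  # first member that can reach the device wins
    return None  # all eligible members tried: service unassigned
```

With a path failure confined to some members, a later member in the list succeeds; with a true device failure, every try_start call fails and the result is None.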

Note that if a service cannot be stopped after an I/O failure, the TruCluster Software reboots the member running the service in order to make the service available on another member. This occurs if the file systems or filesets associated with the affected service cannot be unmounted.

Device Failure

A device failure occurs when a device cannot respond to I/O. The TruCluster Software responds to a device failure as it does to a critical SCSI path failure, with one difference: when it tries to restart the service on other eligible members, every attempt fails because the device itself has failed, and the service remains unassigned.

In the case where a failing disk is not mirrored, the TruCluster Software:

• Logs an alert message

• Stops the affected service

• Marks the service as “unassigned” and issues an alert message

If LSM mirroring is being used and a plex is available, the TruCluster Software continues to make the corresponding service available; no further response is necessary. While the service remains available from the current member, it cannot be started on another member, which minimizes the chance of accessing stale data. This is a minor tradeoff between availability and data integrity.

ASE_PARTIAL_MIRRORING Parameter

The TruCluster Software has a runtime parameter named ASE_PARTIAL_MIRRORING which, when set to “on”, enables an LSM service with a failed plex to start on another member. Enabling it is not recommended.

When you set up a service that uses an LSM mirrored volume, the asemgr asks whether to allow the service to move if a more highly favored member becomes available. The recommendation is to not allow the TruCluster Software to move the service in this case. That way, if a plex has failed, the service remains available on the member where it is running. Otherwise, when the TruCluster Software tries to move the service, it will be unable to start it on any member, and the service will remain unavailable.

If you choose to enable ASE_PARTIAL_MIRRORING, be aware that you are accepting a small risk to data integrity in exchange for service availability. To enable ASE_PARTIAL_MIRRORING, enter the following command on all ASE members:

# rcmgr set ASE_PARTIAL_MIRRORING on
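The tradeoff around ASE_PARTIAL_MIRRORING can be summarized as a small decision function. This is an illustrative model only; the function and argument names are hypothetical, and the real behavior is controlled by the rcmgr setting shown above together with the relocation choice made in the asemgr.

```python
def move_outcome(all_plexes_ok: bool, partial_mirroring_on: bool,
                 allow_move_to_favored: bool) -> str:
    """What happens to a mirrored-LSM service when a more highly favored
    member becomes available (hypothetical model)."""
    if not allow_move_to_favored:
        return "stays"       # recommended choice: keep running where it is
    if all_plexes_ok or partial_mirroring_on:
        return "moved"       # the service may start on the other member
    return "unassigned"      # stopped here, cannot start anywhere else
```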

Network Failures

Any network to which all ASE members are connected may be used for daemon communications or pings. If there are several such fully connected networks, one is designated as the primary network, and the others are considered backup networks. Unless you specify otherwise, the hostname network is the primary network.

An ASE member always uses its primary network if it is operating and automatically switches to a backup upon failure. If one server switches to a backup, the other members use the same backup to communicate with it but continue to use the primary network to communicate with each other. When a failed primary network is restored, the TruCluster Software resumes using it automatically.
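The per-pair network choice described above can be modeled as follows. The network names, the ordering of the backup list, and the use of sets of working interfaces are assumptions made for illustration. The point is that the route between two members is chosen from the networks both of them can currently use, so one member's switch to a backup does not move the other pairs off the primary.

```python
from typing import List, Optional, Set


def pick_route(primary: str, backups: List[str],
               a_working: Set[str], b_working: Set[str]) -> Optional[str]:
    """Network used between members A and B: the primary if both can use it,
    otherwise the first backup both can use (None if there is none)."""
    usable = a_working & b_working
    if primary in usable:
        return primary
    for net in backups:
        if net in usable:
            return net
    return None
```

Because the choice is recomputed per pair, restoring the primary interface on the affected member makes the function return the primary again, which mirrors the automatic return to the primary network.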


When a network ping failure occurs, the Host Status Monitor waits to determine whether the failure is local or whether the network has been partitioned. A network interface failure occurs when the HSM finds that none of the local network interfaces can transmit onto the network (the TruCluster Software contains interface test state machines that periodically test each monitored network controller to determine its status).

In contrast, a network partition is declared between two members when one member cannot ping the other over any of the available networks, even though the other member still responds to SCSI pings. Therefore, a network partition may or may not involve a local network interface failure.

When a network interface failure is detected, the TruCluster Software fails ASE services over to another member whose network interface is still working. However, if the network has been partitioned, failing over services does not make sense, because all other members are affected by the partition. The TruCluster Software's response to a partition is passive: it disables failovers. In short, the necessary and sufficient conditions to initiate a failover due to a network failure are as follows:

• A member node's network interface is down, as indicated by its failure to respond to pings

• A network ping has timed out

Services cannot be restarted while the network is partitioned. The TruCluster Software continues to attempt pings over the failed network at a reduced frequency until the partition is repaired.
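The distinction between an interface failure and a partition can be sketched from one member's point of view. The inputs are hypothetical summaries of what the HSM has learned, and the order of the checks is a choice made for this sketch. The mapping to actions follows the text: an interface failure permits failover, a partition disables it, and loss of both ping paths is the member-down case.

```python
def classify_network_failure(local_can_transmit: bool,
                             peer_answers_network: bool,
                             peer_answers_scsi: bool) -> str:
    if peer_answers_network:
        return "ok"
    if not local_can_transmit:
        return "interface-failure"  # no local interface can transmit
    if peer_answers_scsi:
        return "partition"          # reachable over SCSI, not over any network
    return "member-down"            # both ping paths have timed out


FAILOVER_ALLOWED = {
    "ok": False,
    "interface-failure": True,  # services move to a member with a working interface
    "partition": False,         # passive response: failovers disabled
    "member-down": True,
}
```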

Network Interface Failure Response

When a network interface failure is detected, the TruCluster Software:

• Notifies the administrator that a network interface has failed by running the Alert script

• Moves the Director to an unaffected member (if it was running on the affected member)

• Stops all services that were running on the affected member

• Restarts each affected service according to its Automatic Service Placement (ASP) policy

It is important to note that if a service cannot be stopped after a network interface failure, the TruCluster Software will reboot the member running the service to make the service available on another member. This occurs if the file systems or filesets associated with the affected service cannot be unmounted.


Network Partition Response

When a network partition is detected, the TruCluster Software:

• Notifies the administrator that a network partition has occurred by running the Alert script

• Continues to run the existing services

• Ceases to provide further failover or administrative services

The TruCluster Software continues to attempt pings over the failed network at a reduced frequency until the network partition is repaired.

During a network partition, the asemgr cannot be used.

When the network partition is repaired, the Agent daemons select a member node to run the Director daemon, and the Director daemon restarts services.
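The three response lists in this section (member failure, network interface failure, and network partition) can be collected into one dispatch function for comparison. The action strings paraphrase the text, and the function itself is hypothetical.

```python
from typing import List


def respond(failure: str, director_on_affected_member: bool) -> List[str]:
    actions = ["run the Alert script"]  # every case notifies the administrator
    if failure == "member-down":
        if director_on_affected_member:
            actions.append("restart the Director on a surviving member")
        actions.append("restart affected services per ASP policy")
    elif failure == "interface-failure":
        if director_on_affected_member:
            actions.append("move the Director to an unaffected member")
        actions.append("stop all services on the affected member")
        actions.append("restart affected services per ASP policy")
    elif failure == "partition":
        actions.append("keep existing services running")
        actions.append("suspend failover and asemgr until the partition is repaired")
    else:
        raise ValueError(f"unknown failure type: {failure}")
    return actions
```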

Monitored Network Failures

By default, all networks used by the TruCluster Software for daemon communications are monitored. However, you can instead specify the networks whose local interfaces the TruCluster Software watches. A monitored network need not be fully connected to all ASE members; any network's interface may be monitored, provided it is an Ethernet or FDDI interface. You use the asemgr to designate which networks you want monitored. When a monitored network's interface fails, a script is run to give your services the opportunity to react.

A network used for daemon communications does not necessarily have to be monitored, nor does a monitored network necessarily have to be used for daemon communications. The notions of primary and backup network are independent of the monitor and ignore settings. For example, you may choose to ignore a primary network if you do not need it for your services. In this case, the TruCluster Software will not notify your service if the interface fails, but it will route its own pings and daemon messages through a backup. Conversely, if you choose to monitor a network that is not shared by all the ASE members, the TruCluster Software will notify your services of a failure, but the failure will not affect the TruCluster Software's ability to function.
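The independence of the two settings can be made concrete with a small model. The NetConfig fields are hypothetical names (monitoring is actually configured through the asemgr). The sketch shows which consequences follow from each setting when a local interface fails, and it enforces the Ethernet/FDDI restriction on monitored interfaces.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class NetConfig:
    name: str
    kind: str              # e.g. "ethernet", "fddi"
    monitored: bool        # run the services' script if the local interface fails
    fully_connected: bool  # all ASE members attached: usable for daemon traffic

    def __post_init__(self) -> None:
        if self.monitored and self.kind not in ("ethernet", "fddi"):
            raise ValueError("only Ethernet or FDDI interfaces can be monitored")


def on_interface_failure(net: NetConfig) -> Dict[str, bool]:
    return {
        "notify_services": net.monitored,
        # Only a fully connected network carries pings and daemon messages,
        # so only its loss makes the TruCluster Software fall back to a backup.
        "daemon_traffic_affected": net.fully_connected,
    }
```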

Service Failover

When the TruCluster Software relocates or fails over a service to another member, it runs a stop script to shut down the application (except in the case of a panic). If applicable, the file systems are then unmounted and the SCSI disk reservations released. The server designated to acquire the service then reserves the disks, checks (UFS) and mounts the file systems, and executes a start script to restart the application. For example, in the NFS service bundled with the product, the stop and start scripts manage IP address aliasing, whereby a server answers to the service's IP address as well as to its own genuine address.
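The relocation sequence above can be written out as an ordered list of steps. The step strings and the function are hypothetical; they only encode the order the text gives, plus two assumptions stated in the comments: a panicked member performs none of the old-member steps, and the file system check applies only to UFS.

```python
from typing import List


def relocation_steps(service: str, old: str, new: str,
                     ufs: bool, old_member_panicked: bool = False) -> List[str]:
    steps: List[str] = []
    if not old_member_panicked:
        # Assumption: a panicked member performs none of these steps.
        steps += [f"{old}: run stop script for {service}",
                  f"{old}: unmount file systems",
                  f"{old}: release SCSI reservations"]
    steps.append(f"{new}: reserve disks")
    if ufs:
        steps.append(f"{new}: check file systems")  # checking applies to UFS
    steps += [f"{new}: mount file systems",
              f"{new}: run start script for {service}"]
    return steps
```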


If the served application is inherently stateless, simple one-line additions to the start and stop action scripts included in the TruCluster Software distribution will often prove sufficient. Where state information must be maintained, however, the scripts may need substantial enhancement to place an application in the “state” it was in prior to the relocation or failover. For example, an application that keeps a journal file will likely need to be restarted with an option switch that makes it read and execute the journal. This is the administrator's responsibility. Of course, the journal must be available to the application from the new server; the file should reside on a disk that migrates with the application.
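As an illustration of the journal case, the following sketch builds the command that a start action script might run. The application path, the --replay-journal option, and the journal location are all hypothetical. The only point taken from the text is that the journal must sit on a disk that moves with the service, so the start script on the new server can find it.

```python
import os
from typing import List

APP = "/usr/local/bin/myapp"        # hypothetical application
JOURNAL = "/sources/myapp/journal"  # hypothetical; must be on a service disk


def start_command(journal_path: str = JOURNAL) -> List[str]:
    cmd = [APP]
    if os.path.exists(journal_path):
        # Restarting after a relocation or failover: replay the journal first.
        cmd.append("--replay-journal=" + journal_path)  # hypothetical option
    return cmd
```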

Figure 2–4 depicts a failure in an NFS service. In this scenario, a bad controller on ASE Member 2 causes the ASE service sources to fail over to ASE Member 3 (on which another ASE service is already running).

Figure 2–4 Service Failover Scenario

[Figure: ASE Members 1, 2, and 3 in an ASE domain, connected by a network and a shared SCSI bus, providing the NFS services users, sources, and mail (/usr/users, /sources, /spool/mail) to NFS clients.]

It is not currently possible to control the order in which several services being moved together are stopped and started. If order is important, the services should be managed together as a single service, with a single pair of start and stop scripts.


Note

It is important to understand that as each member enters multiuser run level and runs asemember start, the Agent, as part of its initialization, runs the stop script of each service in its database, even though the service, if it is running at all, may be running on another server.
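The startup behavior in this note can be sketched as an unconditional loop. The run_stop_script callable is hypothetical. The sketch makes visible that service placement is never consulted, which suggests writing each stop script so that it is harmless on a member where the service is not running.

```python
from typing import Callable, Dict, List


def agent_startup_cleanup(database_services: List[str],
                          run_stop_script: Callable[[str], int]) -> Dict[str, int]:
    """Run the stop script of every service in the database and collect the
    exit codes. Where each service is actually running is never checked."""
    results: Dict[str, int] = {}
    for service in database_services:
        results[service] = run_stop_script(service)
    return results
```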

As previously described, only a few well-defined events trigger a service failover. General I/O errors are not reported to the TruCluster Software by the system, so a crashed application or corrupted file system will not trigger a failover. For example, if an application's I/O fails because the file system is full or because of insufficient privileges, the system returns an error to the application, which may in turn notify the user, but the system does not react further. An I/O failure at a lower level, however, such as a disk not responding to commands, will trigger a failover. (However, if a disk is truly inoperable, no other member would be able to use it either, and the failover would not succeed.) In such a case, the service is left “unassigned” and an alert message is logged.

Reserving Devices

SCSI reservations are taken out on a service's disks when the service is started. A SCSI device holding a reservation rejects most commands coming from hosts other than the one that holds the reservation. The inquiry command, however, is always allowed.

When establishing a reservation on a device, the Agent uses the following procedure:

1. The Agent sends a SCSI inquiry command to the device (this is the only time the TruCluster Software “pings” a device). If the ping fails, the device is unreachable, and the Agent registers a path failure.

2. If the inquiry succeeds, the Agent next attempts to open the device's special file.

3. If the open succeeds, the Agent tries to reserve the device.

4. If this also succeeds, the event is logged and the process is complete.

If the call to open or the attempt to reserve the device fails, the Agent sends an ASE_RESERVATION_FAILURE message to the Director, and the Director simply writes a log message in the daemon.log file. No action is taken until a service actually tries to write to the device; only at that time does the TruCluster Software decide what, if anything, to do.
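The four-step procedure and its failure branches can be sketched as follows. The inquiry, open_special_file, reserve, log, and notify_director callables are hypothetical stand-ins for the SCSI and driver operations; ASE_RESERVATION_FAILURE is the message name given in the text.

```python
from typing import Callable

Op = Callable[[str], bool]


def establish_reservation(device: str, inquiry: Op, open_special_file: Op,
                          reserve: Op, log: Callable[[str], None],
                          notify_director: Callable[[str, str], None]) -> str:
    # Step 1: the inquiry is the only "ping" of a device, and it is allowed
    # even when another host holds the reservation.
    if not inquiry(device):
        return "path-failure"
    # Steps 2 and 3: open the special file, then reserve. The "and"
    # short-circuits, so no reservation is attempted after a failed open.
    if not (open_special_file(device) and reserve(device)):
        notify_director("ASE_RESERVATION_FAILURE", device)  # Director only logs it
        return "reservation-failure"
    # Step 4: success is logged.
    log(f"reservation established on {device}")
    return "reserved"
```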


There are two ways to “break” a reservation: by sending the device a bus device reset message, or by resetting the SCSI bus, which breaks all reservations on the bus. Either action causes an Agent to attempt to reestablish its reservations on all registered devices.

Choosing a New Director

When an Agent on an ASE domain member loses contact with the Director, it uses the following procedure to reconnect to a Director:

1. It walks down the list of members, attempting to connect to a Director on each.

2. If no Director is found, it walks down the list a second time, looking for at least one member that satisfies three criteria:

  • The HSM considers the member to be “up.”

  • The member's IP address is greater than the local host's (this criterion prevents multiple Directors from being started).

  • An Agent on that host responds to a test message (so it is not likely to be “hung”).

3. If no member meeting all three criteria is found, the local member starts a Director.

4. When the Agent connects to a Director, it sends the Director an “online” message, which causes the Director to run its database consistency algorithm to ensure that all members use identical copies of the database.
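Steps 1 through 3 can be sketched as below. The Member record and its fields are hypothetical summaries of what the Agent knows (HSM state, IP address, whether a test message was answered). The text does not say what the Agent does when a qualifying member is found, so the sketch simply returns "defer" in that case; step 4 is not modeled.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import List


@dataclass
class Member:
    name: str
    ip: str
    hsm_up: bool
    agent_responds: bool
    runs_director: bool = False


def find_or_start_director(local: Member, members: List[Member]) -> str:
    # Pass 1: connect to an existing Director if any member runs one.
    for m in members:
        if m.runs_director:
            return m.name
    # Pass 2: leave the job to an up, responsive member with a higher IP
    # address, so only the highest qualifying member starts a Director.
    for m in members:
        if m.hsm_up and m.agent_responds and IPv4Address(m.ip) > IPv4Address(local.ip):
            return "defer"
    # Step 3: no such member, so start a Director locally.
    local.runs_director = True
    return local.name
```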

Action Script Errors

When the TruCluster Software invokes an action script, it usually treats an exit code of 0 (zero) as success. An exit code of 1 indicates that the script failed. A stop script can return exit code 99, which indicates that the service could not be stopped because it was busy. The exception is the check action script, which exits with a code between 100 and 200 to indicate that the service is running, and with a code less than 100 to indicate that the service is not running.

All standard output and standard error output from your script goes to the syslog daemon log. If a script exits with 0 (zero), its output is logged as an informational message. If a script exits with a nonzero exit code, the messages are logged as errors.

The timeout value for a script is the length of time the TruCluster Software waits for your script to finish running. Set this value to the maximum amount of time that the script needs. If your script runs longer than the timeout value (for example, because it hangs), the TruCluster Software considers the script failed and reports the failure as a script timeout.
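The exit-code conventions and the timeout rule can be sketched with Python's subprocess module. The interpret function and its result labels are hypothetical. The handling of check codes above 200 and of nonzero codes other than 1 and 99 is an assumption, since the text does not specify them, and logging of the script's output to syslog is omitted.

```python
import subprocess
from typing import List


def interpret(action: str, code: int) -> str:
    if action == "check":
        if 100 <= code <= 200:
            return "running"
        return "not-running" if code < 100 else "unspecified"
    if code == 0:
        return "success"
    if action == "stop" and code == 99:
        return "busy"  # service could not be stopped because it was busy
    return "failed"    # 1, and (by assumption) any other nonzero code


def run_action(argv: List[str], action: str, timeout_s: float) -> str:
    try:
        proc = subprocess.run(argv, capture_output=True, text=True,
                              timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "timeout"  # treated as a failed script
    return interpret(action, proc.returncode)
```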


LSM and TruCluster Failover

LSM complements the TruCluster Software in that it can protect users and applications from the types of failures from which TruCluster Available Server cannot recover. LSM can mirror a fileset so that if one of the underlying disks fails, the data is still available through the mirror. For maximum protection, the mirror should reside on an I/O subsystem whose hardware substrate is completely independent of the failed disk's substrate. By placing the mirrors on different shared SCSI buses, using different cables and other components, the TruCluster Software and LSM together can protect the application from any single point of I/O subsystem hardware failure. LSM protects against disk failure, and TruCluster Available Server protects against failures in the path to the disk.

When I/O to a disk fails, the Agent consults the database to see whether that disk belongs to a mirrored LSM plex. If so, the Agent runs a script that causes LSM to manage the failure and continue I/O through the remaining plex(es). If the LSM script exits with error status, indicating that no other plex is available, the Agent stops the service and marks it unassigned.
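The Agent's handling of a failed disk that may be mirrored can be sketched as below. The mirrored_disks set stands in for the database lookup, and run_lsm_script and stop_service are hypothetical callables. A nonzero exit from the LSM script is taken to mean that no other plex is available, as the text describes. Disks that are not mirrored fall through to the path and device failure handling described earlier.

```python
from typing import Callable, Set


def handle_disk_io_failure(disk: str, service: str, mirrored_disks: Set[str],
                           run_lsm_script: Callable[[str], int],
                           stop_service: Callable[[str], None]) -> str:
    if disk not in mirrored_disks:
        return "not-mirrored"  # handled by the SCSI path/device failure logic
    if run_lsm_script(disk) == 0:
        return "continue"      # LSM carries on with the remaining plex(es)
    stop_service(service)      # no other plex available
    return "unassigned"
```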


Summary


Introducing Highly Available Services

The TruCluster Software provides an infrastructure that makes applications and system services “highly available” to clients, minimizing downtime. High availability is achieved by decoupling application downtime from system downtime.

The services that the TruCluster Software supports are configured within an Available Server Environment (ASE). An ASE consists of two to four servers that are loosely coupled through the sharing of one or more networks and one or more SCSI buses.

An ASE service is an application provided to clients, such as a database, an NFS service, or an electronic mail service. Each ASE service has a unique name. Administrators use the name to manage the service, and clients also use the name when specifying their requests for service. The benefit of this arrangement is that clients need not know which server is currently providing the service.

ASE services are started and stopped through action scripts. Consequently, any application placed in an ASE service must be manageable through script-based commands. If the TruCluster Software determines that a resource critical to your service has failed and a redundant resource is available to replace it, the TruCluster Software executes a script that stops the running instance of the service, replaces or reassigns the failed resource, and executes other scripts to start a new instance of the service on another server.

Introducing the TruCluster Software Components

All ASE member systems are configured in a symmetric manner. They share a common database, asecdb. One member runs the asedirector. The Director daemon interacts with the Agent daemons on all members to coordinate ASE activities.

The Host Status Monitor communicates the status of the SCSI buses and the networks to the Agents and the Director. It receives information about the state of the SCSI bus from the AM driver, which is configured into the kernel.

User commands are sent to the asedirector from any member system through the asemgr utility. However, if more than one instance of the asemgr is running at a given time, the asecdb becomes locked and management operations cannot be performed.

The Logger daemon logs errors detected on the SCSI bus by the AM driver to the kern.log file. It logs errors detected by the other software components to the daemon.log file. It is recommended, but not required, that you run the aselogger on all ASE members.


Understanding TruCluster Software Failure Detection and Response

The TruCluster Software monitors the health of the member nodes within the ASE domain and determines whether and how to react when a member's state appears to change. The decision depends on the state of the two communication paths between the member nodes: the network and the shared SCSI bus. When a significant failure is detected on a server, the TruCluster Software reassigns the services that the affected member was providing to another server. Such an automatic reassignment is called a failover.

The TruCluster Software detects and responds to five types of failure conditions:

• Member node failure: An ASE member becomes inoperable (host down).

• Critical SCSI path failure: A member detects an I/O error to a shared disk, even though the disk is available.

• Device failure: A disk on the shared SCSI bus fails to respond to I/O.

• Network interface failure: A member's network connection fails due to a bad controller, a pulled network cable, or a system crash.

• Network partition: At least two members cannot communicate with each other over the network, even though all members' network interfaces are functional.

Exercises

Introducing Highly Available Services: Exercise

1. Describe the mechanism through which the TruCluster Software determines the status of the ASE members.

2. Discuss some limitations of ASE services.

Introducing Highly Available Services: Solution

1. An ASE consists of two to four servers that are loosely coupled through the sharing of one or more networks and one or more SCSI buses. By sending message packets across these redundant paths at regular intervals, the TruCluster Software can determine the current status of the ASE members and initiate appropriate actions when failures occur.

2. The TruCluster Software cannot detect software failures within an application, or corrupted data files. In addition, when ASE services are relocated, the TruCluster Software does not promise that clients will not notice any interruption, nor does it promise that clients will not need to take action to resume use of the service. The TruCluster Software simply promises to try to restart a broken service.

Introducing the TruCluster Software Components: Exercise

Describe the function of the aseagent daemon.

Introducing the TruCluster Software Components: Solution

The Agent daemon oversees a single member server. Each member runs an instance of the Agent, which maintains a near-real-time view of the status of its host and communicates this information to the Director.

The Agent deduces the status of service availability from information reported by the Host Status Monitor and the Availability Manager. It starts and stops services on the local member under the Director's supervision. The Agent daemon uses the Availability Manager driver interfaces to reserve disks and to receive notification of lost reservations.

The Agents on each member system are responsible for electing a Director. If an Agent exits, the rest of the ASE members consider the affected server to be “down.”


TruCluster Software Failure Detection and Response: Exercise

Describe the way in which the TruCluster Software monitors the state of the systems in an Available Server configuration.

TruCluster Software Failure Detection and Response: Solution

Each member of the ASE domain is connected to all the other members through both SCSI bus and network communication paths. The member systems use these paths to detect the state of the other systems as follows:

• Every few seconds, one member node issues a SCSI send command to another member through their respective SCSI host adapters.

• Over the network path, an ICMP echo, or “ping,” is exchanged. TruCluster-initiated network echoes also carry data, which distinguishes them from echoes initiated by other software.

The failure of a node to respond to these periodic checks (with retries) may ultimately cause the alerted node to invoke the failover procedures.


3 Configuring TruCluster Available Server Hardware

About This Chapter

Introduction

This chapter describes the general hardware configuration rules and restrictions. It also lists the supported hardware used to set up TruCluster Available Server Software Version 1.4 and describes how to set up the hardware for TruCluster Available Server. Hardware configuration includes installing the cables, terminators, and signal converters necessary to connect a supported controller to a storage shelf subsystem containing Available Server-supported disks.

Detailed information about installing a computer system, SCSI controller, or storage box is not provided. See the documentation for the specific hardware for information on installing the hardware itself.

Objectives

To set up a TruCluster Available Server environment, you should be able to:

• Describe TruCluster Available Server general configuration rules and restrictions

• Determine the supported SCSI controllers, cables, terminators, and storage boxes

• Configure the hardware necessary to support various TruCluster Available Server configurations

Resources

For more information on the topics in this chapter, see the following TruCluster Available Server Software documentation:

• Available Server Environment Administration

• Hardware Configuration and Software Installation

• TruCluster Available Server Software Version 1.4 SPD

Examining TruCluster Available Server General Hardware Configuration Rules and Restrictions

Overview

There are configuration rules and restrictions for each of the supported components, but there are also general rules and restrictions that every Available Server configuration must adhere to. This topic covers the general rules and restrictions.

Additionally, this topic contains a SCSI bus overview to ensure that you are familiar with SCSI bus terminology, cabling, and termination.

Rules andRestrictions

Following are the general TruCluster Available Server rules andrestrictions that govern Available Server configurations:

• The number of systems allowed on a shared bus in anAvailable Server configuration ranges from two to four. If anysystem uses a KZMSA XMI to SCSI adapter or a PMAZCSCSI controller, the maximum number of systems is reducedto three.

• All the systems in a TruCluster Available Server configurationmust be connected to at least one common IP subnet.Available Server allows you to connect redundant networks forfailover situations.

• All Available Server member systems must be connected to atleast one shared SCSI bus.All systems in the Available Server configuration must see thedisks on the shared SCSI bus as the same device number. Forinstance, an RZ26 installed in BA350 slot 1 on SCSI bus 2(scsi2) will have a device address of rz17. All member systemsin the Available Server configuration must see this disk asrz17.Therefore, in this case, the shared SCSI bus must be SCSI bus2 on all systems in the Available Server configuration.

• SCSI bus specifications limit an 8-bit (narrow) SCSI bus to 8 devices and a 16-bit (wide) SCSI bus to 16 devices. Although TruCluster Available Server supports wide, differential SCSI devices, Digital UNIX supports only 8 devices on a SCSI bus.

• The length of the shared SCSI bus cannot exceed:

3 meters for single-ended fast SCSI

6 meters for single-ended slow SCSI

25 meters for differential SCSI

The bus length includes the length of the bus within the system, the host adapter, and any storage box. Table 3–1 lists the internal SCSI bus lengths of some devices and systems that you must consider for Available Server configurations.

Table 3–1 SCSI Bus Lengths in Some Devices and Systems

Device       SCSI Bus Length
BA350        0.9 meter
BA353        0.9 meter
BA356        1.0 meter
DEC 7000     0.8 meter
DEC 10000    0.8 meter

• For TURBOchannel-based systems, you must set the boot_reset console variable to ON. If the variable is not set, the TURBOchannel option self-test may fail and the system may not reboot automatically.

• When a system with a PMAZC installed is turned on, it may hang during the PMAZC self-tests if the PMAZC is not properly terminated.

• Although it is not necessary, you should use DWZZA-AA signal converters whenever KZMSA XMI to SCSI adapters (DEC 7000 and DEC 10000) are used, because:

You cannot remove the KZMSA internal terminators. The DWZZA-AA allows the system to be isolated from the shared SCSI bus without bringing down the entire shared SCSI bus.

Because the KZMSA uses almost 1 meter of SCSI cable internally, only about 2 meters of the 3-meter fast single-ended limit remain outside the cabinet. On machines the size of the DEC 7000 or DEC 10000, 2 meters are not enough to connect to storage devices and another host.

• You should use DWZZA-AA signal converters when using PMAZC SCSI host adapters in fast mode.

• You must remove the DWZZA-AA cover to remove termination. When you replace the cover, ensure that star washers are in place on all four screws that hold the cover in place. If the star washers are missing, the DWZZA-AA is susceptible to problems caused by excessive noise.

• The devices connected to the shared SCSI bus must be installed so that you can disconnect them without affecting bus operation.

An improperly terminated SCSI bus segment will cause problems. It may operate properly for a period of time with no error conditions, but then fail under heavy load. An unterminated SCSI bus will not operate.

• You cannot use a tape drive on the shared SCSI bus in an Available Server configuration.
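The device-name rule above can be checked before the systems are cabled. The following is a minimal Python sketch, assuming the numbering implied by the rz17 example: eight target IDs per bus, so the rz unit number is bus * 8 + SCSI ID (2 * 8 + 1 = 17). The function names and member names are hypothetical and are not part of any TruCluster tool.

```python
def rz_device_name(bus: int, scsi_id: int) -> str:
    """Return the rz device name for a disk at (bus, SCSI ID).

    Assumes 8 target IDs (0-7) per bus, so unit number = bus * 8 + SCSI ID.
    """
    if not 0 <= scsi_id <= 7:
        raise ValueError("Digital UNIX supports only 8 devices (IDs 0-7) per SCSI bus")
    return f"rz{bus * 8 + scsi_id}"


def same_name_on_all_members(shared_bus_by_member: dict[str, int], scsi_id: int) -> bool:
    """True if every member would see the shared disk under the same rz name."""
    names = {rz_device_name(bus, scsi_id) for bus in shared_bus_by_member.values()}
    return len(names) == 1


# Disk in BA350 slot 1 (SCSI ID 1) on the shared bus.
print(rz_device_name(2, 1))                                         # rz17
print(same_name_on_all_members({"member1": 2, "member2": 2}, 1))    # True
print(same_name_on_all_members({"member1": 2, "member2": 3}, 1))    # False
```

The second check fails because the shared bus is scsi2 on one member and scsi3 on the other, so the same disk would appear as rz17 and rz25.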

SCSI Bus Overview

Figure 3–1 shows the SCSI host adapter as a single-ended device. The signal converter (DWZZA) converts the single-ended bus to a differential bus.

Figure 3–1 One SCSI Bus, Two Transmission Methods

[Figure: a single-ended host adapter on a single-ended bus, a DWZZA signal converter, and a differential device on a differential bus.]

• Single-ended: This transmission method uses one wire for the signal and a second wire for ground for each signal. This method is susceptible to noise. The data path can be 8 or 16 bits. Single-ended devices include:

BA350 (narrow)

BA353 (narrow)

BA356 (wide)

PMAZC

KZMSA

Single-ended side of a DWZZA or DWZZB signal converter

• Differential: The signal in this transmission method is determined by the potential difference between two wires and is less susceptible to noise than single-ended transmission. This enables the use of longer cables, for a total bus length of up to 25 meters.

The data path used in a differential SCSI bus is usually 16 bits. Differential devices that can be used in a TruCluster Available Server environment include:

KZPSA

KZTSA

HSZ10

HSZ40

Differential side of a DWZZA or DWZZB signal converter

A SCSI bus can consist of multiple bus segments, but each segment must be terminated at both ends.

Note

Figure 3–2 shows the legend used to indicate SCSI bus termination in the figures in this course.

Figure 3–2 Legend for SCSI Bus Termination

[Figure: three termination symbols, meaning: SCSI bus is terminated externally to a host adapter or device; SCSI bus is terminated on the host adapter or device; the host adapter or device termination has been removed.]

Figure 3–3 shows examples of SCSI buses with a host adapter and a device at the ends of the bus only.

Figure 3–3 SCSI Buses with Devices on Bus Ends Only

[Figure: a single-ended host adapter and a single-ended device on a single-ended bus (bus length 6 meters slow, 3 meters fast); a differential host adapter and a differential device on a differential bus (bus length 25 meters).]

Host adapters and devices may be added to the SCSI bus as long as the bus is terminated only at the bus ends. In Figure 3–4, a host adapter has been added to the middle of the bus. Note that the termination has been removed from the host adapter in the middle of the bus.
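The rule that a segment is terminated only at its ends can be stated as a simple check. A minimal Python sketch, assuming a bus segment is modeled as an ordered list of attachment points, each flagged with whether termination is active there; `BusNode` and the names in the usage lines are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BusNode:
    name: str
    terminated: bool  # True if termination is active at this point on the segment


def segment_correctly_terminated(segment: list[BusNode]) -> bool:
    """Correct when both end points terminate the segment and no middle point does."""
    if len(segment) < 2:
        return False
    ends_terminated = segment[0].terminated and segment[-1].terminated
    middle_clear = not any(node.terminated for node in segment[1:-1])
    return ends_terminated and middle_clear


# Host adapter added in the middle with its termination removed: correct.
good = [BusNode("host1", True), BusNode("host2", False), BusNode("disk", True)]
# Same bus, but the middle adapter's termination was left in place: incorrect.
bad = [BusNode("host1", True), BusNode("host2", True), BusNode("disk", True)]
print(segment_correctly_terminated(good))  # True
print(segment_correctly_terminated(bad))   # False
```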

Figure 3–4 SCSI Bus with Device in the Middle of the Bus

[Figure: a single-ended bus with single-ended host adapters and a single-ended device, and a differential bus with differential host adapters and a differential device; in each, the host adapter in the middle of the bus has its termination removed.]

A single-ended SCSI bus can be connected to a differential SCSI bus through a signal converter.

The DWZZA signal converter is a single-ended SCSI to differential SCSI 8-bit bus converter capable of data transfer rates up to 10M bytes/sec. The DWZZA can be installed between an 8-bit, single-ended SCSI bus and a 16-bit, differential SCSI bus operating in 8-bit mode.

The DWZZB signal converter is a single-ended SCSI to differential SCSI 16-bit bus converter capable of data transfer rates up to 20M bytes/sec. The DWZZB can be installed between a 16-bit, single-ended SCSI bus and a 16-bit, differential SCSI bus.

A DWZZA or DWZZB has a single-ended end and a differential end, is bidirectional, and has termination on each end.
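Which converter fits a given segment follows from the width of its single-ended side. This is a minimal Python sketch that only encodes the widths and rates stated above; the `converter_for` function is hypothetical, and it does not check other constraints, such as the differential side having to operate in 8-bit mode with a DWZZA.

```python
# Converter data from the text: (single-ended width in bits, max rate in MB/s).
CONVERTERS = {
    "DWZZA": (8, 10),
    "DWZZB": (16, 20),
}


def converter_for(single_ended_width_bits: int) -> str:
    """Pick the signal converter whose single-ended side matches the bus width."""
    for name, (width, _max_rate) in CONVERTERS.items():
        if width == single_ended_width_bits:
            return name
    raise ValueError(f"no converter for a {single_ended_width_bits}-bit single-ended bus")


print(converter_for(8))   # DWZZA, for example toward a narrow BA350
print(converter_for(16))  # DWZZB, for example toward a wide BA356
```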

Figure 3–5 shows four examples of SCSI buses using mixed transmission methods.

Figure 3–5 SCSI Buses Using Bus Segments with Different Transmission Methods

[Figure: four numbered examples of single-ended and differential bus segments joined by DWZZA or DWZZB signal converters.]

1 Connecting a single-ended host adapter to a differential device (HSZ40) through a DWZZA

2 Connecting a differential host adapter (KZPSA) to an 8-bit single-ended device (BA350) through a DWZZA

3 Connecting a differential host adapter (KZPSA) to a 16-bit single-ended device (BA356) through a DWZZB

4 Connecting a single-ended host adapter to a single-ended device using DWZZAs to increase the bus length and allow fast bus speed

In an Available Server configuration, there is a minimum of two computer systems with supported SCSI adapters and one shared storage device. This means that there is always at least one host adapter or storage device in the middle of the SCSI bus.

One of the objectives of an Available Server environment is to allow the removal of a host adapter or device without affecting overall bus termination.

In Figure 3–4, disconnecting a host adapter or device from the SCSI bus would either open the bus or leave it improperly terminated, and the bus would stop functioning properly.

A "Y" cable or tri-link connector allows the SCSI bus to be terminated externally to a host adapter or device, so the host adapter or device can be removed from the bus without disrupting SCSI bus operation. To remove a device from the bus, disconnect the "Y" cable from that device.

In Figure 3–6, "Y" cables are used to terminate the SCSI bus externally to the host adapters and device. Either host adapter can be removed from the SCSI bus without disrupting SCSI bus operation by removing the "Y" cable from the host adapter. Notice that the host adapter and device internal termination is removed when external termination is used.

Figure 3–6 Using External Termination on the SCSI Bus

[Figure: two differential host adapters and a differential device on a differential bus, each attached through a "Y" cable; the bus is terminated externally.]

In Figure 3–7, a "Y" cable is disconnected from one of the host adapters to remove that host adapter from the SCSI bus. This allows you to perform maintenance on the host adapter or system without affecting the SCSI bus. Note that the SCSI bus still has termination at both ends and will continue to function normally.
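The difference between internal termination (Figure 3–4) and external termination on "Y" cables (Figures 3–6 and 3–7) shows up when a node is disconnected. In this Python sketch, which is an illustration rather than a wiring tool, the bus is an ordered list of attachment points; a point attached without a "Y" cable carries the bus through it, so unplugging it from the middle opens the bus. The `Point` type, the point names, and their ordering are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Point:
    name: str
    terminates: bool   # termination is active at this point
    via_y_cable: bool  # attached through a "Y" cable, so the bus does not pass through it


def bus_ok(bus: list[Point]) -> bool:
    """Both ends terminate the bus and no middle point does."""
    if len(bus) < 2:
        return False
    ends = bus[0].terminates and bus[-1].terminates
    return ends and not any(p.terminates for p in bus[1:-1])


def bus_ok_after_disconnect(bus: list[Point], name: str) -> bool:
    """Check termination after disconnecting the named point.

    Without a "Y" cable the bus runs through the device, so unplugging a
    middle device opens the bus; unplugging an end device takes its
    termination with it.
    """
    index = next(i for i, p in enumerate(bus) if p.name == name)
    if not bus[index].via_y_cable and 0 < index < len(bus) - 1:
        return False  # the bus is opened
    return bus_ok(bus[:index] + bus[index + 1:])


# Internal termination on the end host adapters, in the style of Figure 3-4.
internal = [Point("host1", True, False), Point("host2", False, False), Point("device", True, False)]
# External terminators, with adapters attached through "Y" cables (Figure 3-6 style).
external = [
    Point("terminator1", True, False),
    Point("host1", False, True),
    Point("device", False, True),
    Point("host2", False, True),
    Point("terminator2", True, False),
]

print(bus_ok_after_disconnect(internal, "host1"))  # False: an end loses its termination
print(bus_ok_after_disconnect(internal, "host2"))  # False: the bus is opened
print(bus_ok_after_disconnect(external, "host1"))  # True: the terminators stay on the bus
```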

Figure 3–7 Disconnecting a Device from the SCSI Bus

[Figure: the bus from Figure 3–6 with one differential host adapter disconnected from its "Y" cable; the bus remains terminated at both ends.]

Determining Available Server Hardware Components

Overview

The tables in this section list the supported ASE hardware components and, where applicable, their hardware and firmware revisions.

TruCluster Available Server Supported Systems

A TruCluster Available Server environment can consist of two to four systems. Table 3–2 lists the systems supported by Available Server, their minimum firmware revisions, the type of I/O bus, and the SCSI controllers that can be used in each system.

Table 3–2 Available Server Supported System Types

System Type                              Minimum Firmware  I/O Bus       SCSI Controller
AlphaServer 400 4/166, 4/233             4.5-0 (1)         PCI           KZPSA-BB
AlphaServer 1000 4/200, 4/233, 4/266     4.5-69 (1)        PCI           KZPSA-BB
AlphaServer 1000A 4/233, 4/266, 5/300    4.5-72 (1)        PCI           KZPSA-BB
AlphaServer 2000 4/200, 4/233, 4/275,    4.5-51 (1)        PCI           KZPSA-BB
  5/250, 5/300, 5/350
AlphaServer 2100 4/200, 4/233, 4/275,    4.5-51 (1)        PCI           KZPSA-BB
  5/250, 5/300, 5/350
AlphaServer 2100A 4/275                  4.5-60 (1)        PCI           KZPSA-BB
AlphaServer 2100A 5/250, 5/300, 5/350    4.5-64 (1)        PCI           KZPSA-BB
DEC 3000 Models 300, 300L, 300X, 300LX,  6.7 (1)           TURBOchannel  PMAZC, KZTSA
  400, 400S, 500, 500S, 500X, 600, 600S,
  700, 800, 800S, and 900
AlphaServer 4000 5/300, 5/300E           2.0 (2)           PCI           KZPSA-BB
AlphaServer 4100 5/300, 5/300E           2.0 (2)           PCI           KZPSA-BB
DEC 7000 Models 6xx and 7xx              4.4 (1)           XMI           KZMSA
AlphaServer 8200/8400 5/300, 5/350       3.1 (1)           PCI           KZPSA
DEC 10000 Models 6xx and 7xx             4.4 (1)           XMI           KZMSA

(1) Firmware disk Version 3.6
(2) Firmware disk Version 3.7
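Checking a system against the minimum firmware revisions in Table 3–2 amounts to a field-by-field numeric comparison. A minimal Python sketch, assuming revision strings of the form used in the table (such as 4.5-69) compare as tuples of integers; only a few rows are included, the helper names are hypothetical, and the firmware disk versions in the footnotes are not modeled.

```python
import re

# A few rows from Table 3-2: system type -> minimum console firmware revision.
MIN_FIRMWARE = {
    "AlphaServer 1000 4/200, 4/233, 4/266": "4.5-69",
    "AlphaServer 2100A 4/275": "4.5-60",
    "AlphaServer 4100 5/300, 5/300E": "2.0",
    "DEC 7000 Models 6xx and 7xx": "4.4",
}


def revision_key(revision: str) -> tuple[int, ...]:
    """Turn '4.5-69' into (4, 5, 69) so revisions compare numerically."""
    return tuple(int(part) for part in re.split(r"[.-]", revision))


def meets_minimum(system: str, installed: str) -> bool:
    """True if the installed revision is at or above the table's minimum."""
    return revision_key(installed) >= revision_key(MIN_FIRMWARE[system])


print(meets_minimum("AlphaServer 1000 4/200, 4/233, 4/266", "4.5-72"))  # True
print(meets_minimum("AlphaServer 1000 4/200, 4/233, 4/266", "4.5-51"))  # False
```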

DECsafe Supported SCSI Controllers

Each system must have one or more SCSI controllers dedicated to a shared SCSI bus, in addition to any SCSI controller shipped with the system (which is used for the internal SCSI bus). Table 3–3 lists the supported SCSI controllers along with firmware and hardware revision information, as well as the transmission method used by the controller.

Table 3–3 DECsafe-Supported SCSI Controllers

KZMSA (1)
  Applicable system: DEC 7000 or DEC 10000
  Data path: Fast or slow, narrow, single-ended
  Minimum firmware revision: 5.6
  Minimum hardware revision: F03
  Number of SCSI ports/channels: Two

KZPSA (2)
  Applicable system: AlphaServers
  Data path: Fast, wide, differential
  Minimum firmware revision: A10
  Minimum hardware revision: F01
  Number of SCSI ports/channels: One

KZTSA
  Applicable system: DEC 3000 series
  Data path: Fast, wide, differential
  Minimum firmware revision: A09
  Minimum hardware revision: F01
  Number of SCSI ports/channels: One

PMAZC (1)
  Applicable system: DEC 3000 series
  Data path: Fast or slow, narrow, single-ended
  Minimum firmware revision: 1.8 (3)
  Minimum hardware revision: N/A
  Number of SCSI ports/channels: Two

(1) If you have a DECsafe configuration that includes a KZMSA XMI to SCSI adapter or a PMAZC TURBOchannel SCSI controller, you can have only three systems in the configuration.
(2) You must have the Version 1.1 Firmware Update utility and the Version 1.0 Configuration Diagnostics utility.
(3) If you change the PMAZC firmware revision, the SCSI ID and bus speed may be reset to the defaults of 7 and slow.
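The system-count limit in footnote 1, which also appears in the general rules and restrictions, can be expressed as a small check. This Python sketch assumes each member's shared-bus adapters are listed by model name; the function names and the member names are hypothetical.

```python
# Adapters that limit a configuration to three systems (footnote 1 of Table 3-3).
THREE_SYSTEM_LIMIT_ADAPTERS = {"KZMSA", "PMAZC"}


def max_members(adapters_by_member: dict[str, list[str]]) -> int:
    """Maximum number of systems allowed, given each member's shared-bus adapters."""
    all_adapters = {a for adapters in adapters_by_member.values() for a in adapters}
    return 3 if all_adapters & THREE_SYSTEM_LIMIT_ADAPTERS else 4


def member_count_ok(adapters_by_member: dict[str, list[str]]) -> bool:
    """Two to four systems, or at most three if any KZMSA or PMAZC is used."""
    return 2 <= len(adapters_by_member) <= max_members(adapters_by_member)


four_kzpsa = {f"node{i}": ["KZPSA"] for i in range(1, 5)}
mixed = {"node1": ["KZPSA"], "node2": ["KZPSA"], "node3": ["KZPSA"], "node4": ["PMAZC"]}
print(member_count_ok(four_kzpsa))  # True
print(member_count_ok(mixed))       # False: a PMAZC limits the configuration to three systems
```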

BA350, BA353, and BA356 Storage Expansion Units

Shared disks for an Available Server configuration must be housed externally in storage expansion units (Table 3–4) or DEC RAID subsystems (Table 3–5).

These storage subsystems house only single-ended disk drives; you cannot use a tape drive on a shared SCSI bus in an Available Server configuration.

The BA350, BA353, and BA356 storage expansion units each have an internal single-ended SCSI bus whose length must be included when you compute the overall SCSI bus length.

Table 3–4 TruCluster Available Server Supported Storage Expansion Units

Storage Box   Data Path              Internal SCSI Bus Length
BA350         Single-ended, narrow   0.9 meter
BA353         Single-ended, narrow   0.9 meter
BA356         Single-ended, wide     1.0 meter
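Because the internal bus lengths in Table 3–1 and Table 3–4 count against the length limits in the general rules, it can help to budget each segment before ordering cables. A minimal Python sketch, assuming a segment's length is simply the sum of the internal lengths of the systems and boxes on it plus the external cable lengths; the keys and function name are hypothetical, and a segment separated from the rest of the bus by a signal converter would be budgeted on its own.

```python
# Maximum bus length per segment, in meters (general rules and restrictions).
MAX_LENGTH_M = {"single-ended fast": 3.0, "single-ended slow": 6.0, "differential": 25.0}

# Internal SCSI bus lengths, in meters (Tables 3-1 and 3-4).
INTERNAL_LENGTH_M = {"BA350": 0.9, "BA353": 0.9, "BA356": 1.0, "DEC 7000": 0.8, "DEC 10000": 0.8}


def remaining_cable_m(transmission, components, cables_m=()):
    """Cable length still available on a segment; negative means over the limit."""
    used = sum(INTERNAL_LENGTH_M[c] for c in components) + sum(cables_m)
    return MAX_LENGTH_M[transmission] - used


# Two DEC 7000 systems and one BA350 on a fast single-ended segment.
left = remaining_cable_m("single-ended fast", ["DEC 7000", "DEC 7000", "BA350"])
print(round(left, 2))  # 0.5 meter left for all external cables
```

Half a meter for all external cabling illustrates why a fast single-ended bus between systems of this size is impractical without DWZZA-AA converters.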

BA350

Up to seven narrow (8-bit) StorageWorks building blocks (SBBs) can be installed in the BA350, and their SCSI IDs are based on the slot they are installed in. For instance, a disk installed in BA350 slot 0 has SCSI ID 0, a disk installed in BA350 slot 1 has SCSI ID 1, and so forth.
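Because a BA350 disk's SCSI ID equals its slot number, the usable slots are those whose IDs are not already taken on the shared bus. A minimal Python sketch, assuming the host adapters use the hypothetical IDs 7, 6, and 5; the optional `blocked_ids` parameter covers any ID that a disk cannot use for some other reason, such as SCSI ID 0 when a DWZZA-VA occupies slot 0.

```python
BA350_SLOTS = range(7)  # slots 0-6; a disk in slot N has SCSI ID N


def usable_slots(host_ids, blocked_ids=()):
    """BA350 slots whose SCSI IDs are not taken by host adapters or otherwise blocked."""
    taken = set(host_ids) | set(blocked_ids)
    return [slot for slot in BA350_SLOTS if slot not in taken]


print(usable_slots([7, 6]))                  # [0, 1, 2, 3, 4, 5]  two systems
print(usable_slots([7, 6, 5]))               # [0, 1, 2, 3, 4]     three systems
print(usable_slots([7, 6, 5], blocked_ids=[0]))  # [1, 2, 3, 4]    slot 0 unavailable
```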

The BA350 storage expansion unit contains internal SCSI bus termination and a SCSI bus jumper. In some cases, such as when two BA350s are daisy-chained together, the termination must be removed from the BA350. The jumper is not removed during normal operation.

The BA350 can be set up for two-bus operation, but that option is not very useful for a shared SCSI bus and is not covered in this course.

Figure 3–8 BA350 SCSI Bus

[Figure: BA350 with connectors JA1 and JB1, slots 0 through 6, the power supply in position 7, the terminator (T), and the SCSI bus jumper (J).]

BA353

The SCSI ID for disks installed in a BA353 is defined by device address switches on the back of the BA353. The switches are located to the left of the SCSI input and SCSI output connectors, as shown in Figure 3–9. The switches are marked as Left (Slot 1), Center (Slot 2), and Right (Slot 3). Slot 1 is the left-most slot when the BA353 is viewed from the front.

The On position of a switch generates a logic "1" in the device address, and switch 1 is the least significant bit in the device address. The SCSI IDs shown in Figure 3–9 would be 0, 1, and 2, left to right.
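The switch encoding above is ordinary binary. A minimal Python sketch, assuming each BA353 slot has three address switches (labeled 1, 2, and 3 in Figure 3–9), which gives SCSI IDs 0 through 7; the function name is hypothetical.

```python
def ba353_scsi_id(switch1_on: bool, switch2_on: bool, switch3_on: bool) -> int:
    """Decode one BA353 device address switch bank into a SCSI ID.

    On = logic 1, and switch 1 is the least significant bit.
    """
    return int(switch1_on) | (int(switch2_on) << 1) | (int(switch3_on) << 2)


# Switch settings that produce the IDs 0, 1, and 2 shown in Figure 3-9.
print(ba353_scsi_id(False, False, False))  # 0
print(ba353_scsi_id(True, False, False))   # 1
print(ba353_scsi_id(False, True, False))   # 2
```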

Figure 3–9 BA353 Device Address Switches and SCSI Input and Output Connectors

[Figure: the Left, Center, and Right switch banks, each with switches 1, 2, and 3 set On or Off, next to the SCSI Input and SCSI Output connectors.]

The BA353 has an internal SCSI bus with internal SCSI bus termination on the output end of the bus. If a cable (BN21H) is connected to the output connector to connect two BA353 boxes together, the termination is disabled. Some configurations, such as a DWZZA-VA installed in the BA353, require a terminator on the BA353 SCSI input connector.

BA356

The BA356, like the BA350, can hold up to seven StorageWorks building blocks. Unlike the BA350, however, the BA356 SBBs are wide devices. As in the BA350, the SBB SCSI IDs are based on the slot they are installed in, but the switches on the personality module must be set as follows: switches 4, 5, and 6 On, and switches 1, 2, 3, 7, and 8 Off. These are the default switch positions.

Figure 3–10 shows the relative location of the BA356 SCSI bus jumper, BA35X-MF. The jumper is accessed from the rear of the box. For Available Server operations, the jumper, J, should always be installed in the normal position, behind slot 6. Note that the SCSI bus jumper is not in the same position in the BA356 as in the BA350.

Termination for the BA356 is on the personality module and is active unless a cable is connected to JB1 to daisy-chain two BA356s together; connecting that cable disables the personality module terminator.

Like the BA350, the BA356 can be set up for two-bus operation by installing a SCSI bus terminator (BA35X-ME) in place of the SCSI bus jumper. However, as with the BA350, two-bus operation in the BA356 is not very useful for an Available Server environment.

The position behind slot 1 can be used to store the SCSI bus jumper.

Figure 3–10 shows the relative locations of the BA356 SCSI bus jumper and the position for storing the SCSI bus jumper if you do install the terminator. For Available Server operations, the jumper, J, should always be installed.

Figure 3–10 BA356 Storage Shelf SCSI Bus

[Figure: BA356 with connectors JA1 and JB1, slots 0 through 6, the power supply in position 7, and the jumper positions marked J and J/T.]

Note that JA1 and JB1 are located on the personality module (in the top of the box when it is standing vertically). JB1, on the front of the module, is visible. JA1 is on the left side of the personality module as you face the front of the BA356, and is hidden from the normal view.

Supported Controllers for DEC RAID Subsystems

Table 3–5 lists the supported controllers for DEC RAID subsystems.

Table 3–5 Supported DEC RAID Subsystems

Controller     Firmware Revision    Data Path
HSZ10 (1)      N/A                  Wide
HSZ40-Ax (2)   2.0, 2.5 or later    Wide
HSZ40-Bx (2)   2.5                  Wide
HSZ40-Cx (2)   2.5                  Wide

(1) The HSZ10 must be used in a TruCluster Available Server environment that uses only PMAZC SCSI controllers.
(2) TruCluster Available Server Software supports dual-redundant HSZ40 configurations. Both HSZ40 controllers must be on the same shared SCSI bus.

Supported Disk Devices

Table 3–6 lists supported disk devices. The RZ series of single-ended disks can be housed in the BA350, BA353, or BA356 storage expansion units or HSZ40 DEC RAID controllers.

Table 3–6 TruCluster Available Server Supported Disk Devices

Disk     Firmware Revision Supported   Data Path
RZ26     T392 and 392A                 Narrow
RZ26L    442D                          Narrow
RZ26L    442E                          Wide
RZ26N    0466 or later                 Narrow
RZ26N    0568 or later                 Wide
RZ28     442C                          Narrow
RZ28     442E                          Wide
RZ28B    0006 or later                 Narrow
RZ28D    0008 or later                 Narrow and wide
RZ28M    0466 or later                 Narrow
RZ28M    0568 or later                 Wide
RZ29B    0011 or later                 Narrow

Signal Converters

A signal converter is used to convert a single-ended SCSI bus to a differential bus or a differential bus to a single-ended bus. In a TruCluster Available Server environment, a signal converter is typically used to convert a differential bus to single-ended, allowing differential devices (such as the KZPSA) to be used with the single-ended storage expansion units (BA350, BA353, and BA356).

The DWZZA and DWZZB signal converters are used in Available Server configurations. The supported models are shown in Table 3–7. The DWZZA is an 8-bit signal converter and the DWZZB is a 16-bit signal converter.

Table 3–7 Supported Signal Converters

DWZZA-AA (1)
  Hardware revision: E01 or later
  Single-ended connector: 50-pin low density
  Differential connector: 68-pin high density
  Comments: A standalone box requiring connection to a power source.

DWZZA-VA (1)
  Hardware revision: F01 or later
  Single-ended connector: StorageWorks-compatible 96-pin DIN connector; plugs into the BA350 or BA353 backplane connector
  Differential connector: 68-pin high density
  Comments: Installed in BA350 slot 0 (2), or any slot in a BA353. Receives power from the BA350 or BA353. Does not take up a SCSI ID, but its presence prevents having a disk at SCSI ID 0 in a BA350 box.

DWZZB-AA (3)
  Hardware revision: A01 or later
  Single-ended connector: 68-pin high density
  Differential connector: 68-pin high density
  Comments: A standalone box requiring connection to a power source.

DWZZB-VW (3)
  Hardware revision: A01 or later
  Single-ended connector: StorageWorks-compatible 96-pin DIN connector; plugs into the BA356 backplane connector
  Differential connector: 68-pin high density
  Comments: Installed in slot 0 (4) of a BA356. Receives power from the BA356. Does not take up a SCSI ID, but its presence prevents having a disk at SCSI ID 0 in the BA356 box.

(1) DWZZA-AAs and DWZZA-VAs with serial numbers in the range CX444xxxxx to CX449xxxxx must be upgraded. See FCO DWZZA-AA-F002 or DWZZA-VA-F001.
(2) If you plug a DWZZA-VA into any BA350 slot other than slot 0, you must install external terminator P/N 12-37004-04 into the JA1 connector and remove the DWZZA-VA's internal, single-ended termination.
(3) The DWZZB-series SCSI bus converters are SCSI-2 and draft SCSI-3 compliant single-ended SCSI to differential 16-bit converters capable of data transfer rates of up to 20M bytes/s.
(4) If you plug a DWZZB-VW into any BA356 slot other than slot 0, you must install external terminator P/N 12-41768-02 (FR-PCXAR-WJ) into the JA1 connector and remove the DWZZB-VW's internal, single-ended termination.

DWZZA-AA

When you use a DWZZA-AA (standalone DWZZA) in an Available Server configuration, you typically:

• Remove the differential termination resistor SIPs. See Figure 3–11.

• Ensure that the single-ended SCSI-2 termination jumper, J2, is installed.

• Attach an H885-AA tri-link connector (or BN21W-0B "Y" cable) to the differential connector to allow daisy-chaining the differential bus or the installation of bus termination if the DWZZA is attached to a controller or device on the end of the SCSI bus.

Figure 3–11 shows the location of the DWZZA-AA single-ended termination SCSI bus jumper, J2, and the differential terminator resistor SIPs.

Caution

After removing the DWZZA-AA cover to remove termination, when you replace the cover, ensure that the star washers are in place on all four screws that hold the cover in place. If the star washers are missing, the DWZZA-AA is susceptible to problems caused by excessive noise.

Figure 3–11 DWZZA-AA Signal Converter SCSI Bus Termination

[Figure: DWZZA-AA showing the single-ended SCSI-2 termination jumper J2, the differential terminator resistor SIPs, the 50-pin low-density single-ended connector, and the 68-pin high-density differential connector.]

DWZZA-VA

When you use a DWZZA-VA (Figure 3–12) in a BA350 storage expansion unit:

• Remove the DWZZA differential terminator resistor SIPs.

• Ensure that the DWZZA single-ended SCSI-2 termination jumper, J2, is installed to provide single-ended termination.

• Attach a BN21W-0B "Y" cable (or H885-AA tri-link connector) to the differential connector to allow daisy-chaining the differential bus or bus termination if the BA350 is on the end of the SCSI bus.

• Install the DWZZA-VA in slot 0 of the BA350.

If you use the DWZZA-VA in a BA353 storage expansion unit:

• Remove the differential termination resistor SIPs.

• Remove the single-ended SCSI termination jumper, J2.

• Attach a BN21W-0B "Y" cable (or H885-AA tri-link connector) to the DWZZA differential connector to allow daisy-chaining the differential bus.

• Install the DWZZA-VA in any BA353 slot.

• Install terminator part number 12-37004-04 on the BA353 input connector to terminate the input end of the BA353 single-ended SCSI bus.

Note

The BA353 single-ended bus extends from the input connector to the output connector. It has an internal active terminator on the output end of the bus. The termination is disabled if a cable is attached to the output connector to connect one BA353 to another storage box.

The DWZZA-VA and any disks installed in the BA353 are in the middle of the BA353 SCSI bus. Because the bus must be terminated on both ends, you must install a terminator on the SCSI input connector. If the DWZZA-VA single-ended termination is enabled, you create a stub between the DWZZA-VA and the input connector, and the SCSI bus will not operate properly.

Figure 3–12 DWZZA-VA Signal Converter SCSI Bus Termination

[Figure: DWZZA-VA showing the single-ended SCSI-2 termination jumper J2, the differential terminator resistor SIPs, the 96-pin single-ended connector, and the 68-pin high-density differential connector.]

DWZZB-AA

The DWZZB-AA is a 16-bit, single-ended SCSI to differential converter. It can be connected to either a 16-bit SBB shelf personality module (BA356-SB) or an SBB shelf SCSI bus. It is used with the BA356 storage expansion unit to support the use of wide disks such as the RZ26L-VW. It is a standalone unit and has its own power supply.

When you use a DWZZB-AA (Figure 3–13) in an Available Server configuration, you typically:

• Remove the differential termination resistor SIPs.

• Ensure that the single-ended SCSI-2 termination jumpers, W1 and W2, are installed to provide single-ended termination.

• Attach an H885-AA tri-link connector (or BN21W-0B "Y" cable) to the differential connector to allow daisy-chaining the differential bus or the installation of bus termination if the DWZZB is attached to a controller or device on the end of the SCSI bus.

Figure 3–13 shows the location of the DWZZB-AA single-ended termination SCSI bus jumpers, W1 and W2, and the differential terminator resistor SIPs.

Figure 3–13 DWZZB-AA Signal Converter SCSI Bus Termination

[Figure: DWZZB-AA showing the single-ended SCSI-2 termination jumpers W1 and W2, the differential terminator resistor SIPs, the 68-pin single-ended connector, and the 68-pin differential connector.]

DWZZB-VW

The DWZZB-VW is a single-ended SCSI to differential 16-bit converter that you plug into slot 0 of a BA356-SB storage expansion unit for use with wide disks such as the RZ26L-VW.

When you use the DWZZB-VW (Figure 3–14) in the BA356 storage expansion unit, you typically:

• Remove the DWZZB-VW differential terminator resistor SIPs.

• Ensure that the DWZZB-VW single-ended SCSI-2 termination jumpers, W1 and W2, are installed to provide single-ended termination.

• Attach a BN21W-0B "Y" cable (or H885-AA tri-link connector) to the differential connector to allow daisy-chaining the differential bus or bus termination if the BA356 is on the end of the SCSI bus.

• Install the DWZZB-VW in slot 0 of the BA356.

Figure 3–14 DWZZB-VW Signal Converter SCSI Bus Termination

[Figure: DWZZB-VW showing the single-ended SCSI-2 termination jumpers W1 and W2, the differential terminator resistor SIPs, the 96-pin single-ended connector, and the 68-pin differential connector.]

SCSI Cables and Terminators for Available Server Configurations

You must know the proper cable to use for each connection. Always check the part number on the cable before you make the connection.

One of the most critical aspects of cabling an Available Server configuration is providing proper termination for each segment of the SCSI bus.

Some configurations require a special connector that accepts both a cable and a terminator.

Table 3–8 describes cables supported for use in an Available Server configuration.

Table 3–8 Cables Used for Available Server Configurations

BC06P
  Connectors: Two 50-pin, low density.
  Use: Connect one leg of a BN21V-0B "Y" cable to one leg of another BN21V-0B "Y" cable (for instance, to connect two PMAZC SCSI controllers together).

BN21H/BN21J
  Connectors: Two 50-pin, high density. The BN21J has a right-angle connector.
  Use: Connect two BA350 or BA353 storage expansion units together.

BN21K, BN21L
  Connectors: Two 68-pin, high density. The BN21K has one right-angle connector. The BN21L has two right-angle connectors.
  Use: Connect differential adapters, for instance the KZPSA or KZTSA, to the differential side of a DWZZA. Connect one side of a BN21W-0B "Y" cable or an H885-AA tri-link connector to another BN21W-0B "Y" cable or H885-AA tri-link connector. Connect a BA356 to the single-ended end of a DWZZB-AA, or connect two BA356s together.

BN21R/BN23G
  Connectors: Two 50-pin, one high density and one low density.
  Use: Connect one leg of a BN21V-0B "Y" cable (for instance, on a PMAZC) to a BA350 or BA353. Connect a single-ended device (KZMSA, PMAZC, BA350, or BA353) to the single-ended side of a DWZZA-AA.

BN21V-0B "Y" cable
  Connectors: Three 50-pin, one high density and two low density.
  Use: Attach the high-density connector to a PMAZC or KZMSA port. Attach a terminator to one low-density connector of the "Y" cable, and attach a cable to the other low-density connector: a BN21R between the "Y" cable and the storage box, or a BC06P to one leg of another "Y" cable. A "Y" cable allows the system to be powered down without affecting SCSI bus termination.

BN21W-0B "Y" cable
  Connectors: Three 68-pin, high density.
  Use: Attach to a differential device. Attach an H879-AA terminator to one leg of the "Y" cable. Connect a BN21K cable between this "Y" cable and another BN21W-0B "Y" cable (or a tri-link connector) attached to the differential side of a DWZZA or another differential SCSI controller. The BN21W-0B "Y" cable is equivalent to the H885-AA tri-link connector. The BN21W-0B "Y" cable enables you to disconnect a system from the shared SCSI bus without affecting bus termination.
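For the two-ended cables in Table 3–8, the connectors on the two ends narrow the choice to one cable family. This Python sketch encodes only the connector descriptions from the table; it ignores right-angle variants and the three-connector "Y" cables, the function name is hypothetical, and the table's Use descriptions and the part number on the cable remain the authority.

```python
# Two-ended cables from Table 3-8, keyed by the (unordered) pair of connector types.
CABLES = {
    frozenset({"50-pin low density"}): "BC06P",
    frozenset({"50-pin high density"}): "BN21H/BN21J",
    frozenset({"68-pin high density"}): "BN21K/BN21L",
    frozenset({"50-pin high density", "50-pin low density"}): "BN21R/BN23G",
}


def cable_for(end_a: str, end_b: str) -> str:
    """Return the cable family whose two connectors match end_a and end_b."""
    key = frozenset({end_a, end_b})  # a same-type pair collapses to one element
    if key not in CABLES:
        raise ValueError(f"no two-ended cable in Table 3-8 joins {end_a} to {end_b}")
    return CABLES[key]


print(cable_for("50-pin low density", "50-pin low density"))   # BC06P
print(cable_for("50-pin high density", "50-pin low density"))  # BN21R/BN23G
```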

The terminators and connectors supported for Available Server configurations are shown in Table 3–9. Note that the tri-link connector has the same function as a "Y" cable.

Table 3–9 Terminators and Special Connectors

H8574-A or H8860-AA
  Number of pins: 50
  Density: Low
  Use: With a BN21V-0B "Y" cable to terminate a single-ended SCSI bus.

H879-AA
  Number of pins: 68
  Density: High
  Use: With an H885-AA tri-link connector or BN21W-0B "Y" cable to terminate a differential SCSI bus.

12-37004-04
  Number of pins: 50
  Density: High
  Use: Installed on the BA353 input connector when a DWZZA-VA is installed in the BA353, to terminate the input end of the BA353 single-ended SCSI bus.

H885-AA tri-link connector
  Number of pins: Three connectors, each with 68 pins
  Density: High
  Use: Attaches to a wide device. Attach an H879-AA terminator to one jack of the tri-link connector. Connect a BN21K cable between this tri-link connector and another tri-link connector (or a BN21W-0B "Y" cable) attached to the differential side of a DWZZA or another differential SCSI controller. The tri-link connector enables you to disconnect a system from the shared SCSI bus without affecting bus termination. The H885-AA tri-link connector is equivalent to the BN21W-0B "Y" cable.

[Figure: front and rear views of the H885-AA tri-link connector.]

Note

Care must be used with the H885-AA tri-link connectorson AlphaServers due to the spacing between modules.

Network Options

The systems in the Available Server environment can be configured with multiple network adapters. This improves the availability of Available Server functionality. If a network path fails, connections between Available Server members can fail over to another adapter and be routed around the failure.

Multiple network adapters also provide greater flexibility in client access to Available Server services.

Table 3–10 lists the network adapters supported for TruCluster Available Server Software Version 1.4. Note that TruCluster Available Server supports only Ethernet and FDDI network adapters.


Table 3–10 Supported Network Adapters

Adapter      I/O Bus Type    Network Type
DEFEA        EISA            FDDI
DE422        EISA            Lance Ethernet
DE425        EISA            Ethernet
DEFPA        PCI             FDDI
DE435 [1]    PCI             Ethernet
DE500-XB     PCI             Fast Ethernet
DEMFA        XMI             FDDI
DEMNA        XMI             Ethernet
PMAD         TURBOchannel    Ethernet
DEFTA        TURBOchannel    FDDI
DEFZA        TURBOchannel    FDDI

[1] Occasional system failures have been seen when LAT functionality is enabled on AlphaServer 8200 and 8400 systems that have DE435 PCI/Ethernet network adapters. Refer to the release notes for further information.
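For planning purposes, the supported list in Table 3–10 can be kept as a simple lookup. The sketch below is illustrative only. adapter_info is a hypothetical shell function and is not a TruCluster Available Server command. It prints the I/O bus and network type for an adapter listed in the table and repeats the DE435 LAT caution from the footnote on standard error. For any adapter not in the table, it returns a nonzero status.

```sh
# Hypothetical lookup transcribed from Table 3-10; not a TruCluster command.
# Prints "<I/O bus>: <network type>"; returns 1 for adapters not in the table.
adapter_info() {
    case "$1" in
        DEFEA)       echo "EISA: FDDI" ;;
        DE422)       echo "EISA: Lance Ethernet" ;;
        DE425)       echo "EISA: Ethernet" ;;
        DEFPA)       echo "PCI: FDDI" ;;
        DE435)       echo "PCI: Ethernet"
                     # Table 3-10, footnote 1
                     echo "note: LAT on AlphaServer 8200/8400 with a DE435 has caused occasional failures; see the release notes" >&2 ;;
        DE500-XB)    echo "PCI: Fast Ethernet" ;;
        DEMFA)       echo "XMI: FDDI" ;;
        DEMNA)       echo "XMI: Ethernet" ;;
        PMAD)        echo "TURBOchannel: Ethernet" ;;
        DEFTA|DEFZA) echo "TURBOchannel: FDDI" ;;
        *)           echo "$1: not in the supported adapter list" >&2
                     return 1 ;;
    esac
}
```

For example, adapter_info DEFPA prints PCI: FDDI.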


Configuring TruCluster Available Server Hardware

Overview

This section describes the steps needed to set up TruCluster Available Server hardware. The steps refer to specific sections and tables. Perform the steps in each table in order, referring to the figures and examples as necessary.

You must answer the following questions to plan an Available Server configuration and order the correct hardware for it.

• How many systems are there in the Available Server configuration (two to four)? Keep in mind that you can have only eight devices on the shared SCSI bus, and each system's SCSI adapter counts as one of them. If you have three systems and are using a BA350 storage expansion unit, you can have only five disks in the storage shelf (or storage shelves, if you daisy-chain more than one box). If you have a DWZZA-VA in slot 0 of the BA350, you cannot have a disk at SCSI ID 0.

• What type of controllers do you have?

• What are the constraints on the placement of the systems and storage devices? In other words, how long is the shared SCSI bus? If you are using PMAZCs, do you have to run slow SCSI?

• Are you using single-ended SCSI, differential SCSI, or both?

• For greater availability of mirrored volumes, will you run two SCSI buses with separate SCSI controllers?

• Will you use a BA350, BA353, or BA356 storage expansion unit, or an HSZ10 or HSZ40 with a DEC RAID subsystem?

• Will the Available Server configuration use multiple network adapters?
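The disk-count arithmetic in the first question can be expressed as a small shell function. This is a planning sketch only. max_shared_disks is a hypothetical helper and is not a product command. It makes three assumptions: the shared bus is narrow (SCSI IDs 0 through 7), each member system's host adapter uses one ID, and, as stated above, a DWZZA-VA in BA350 slot 0 makes ID 0 unavailable for a disk. It does not check the physical slot count of the storage shelf.

```sh
# Hypothetical planning helper; not a TruCluster Available Server command.
# Assumes a narrow shared bus (SCSI IDs 0-7), one ID per member system's
# host adapter, and that a DWZZA-VA in BA350 slot 0 leaves ID 0 unusable
# for a disk. The storage shelf's physical slot count is not checked.
max_shared_disks() {
    nsys=$1        # number of member systems on the shared bus (2 to 4)
    va_slot0=$2    # "yes" if a DWZZA-VA occupies BA350 slot 0, else "no"
    case "$nsys" in
        2|3|4) ;;
        *) echo "error: an Available Server configuration has 2 to 4 systems" >&2
           return 1 ;;
    esac
    free_ids=$((8 - nsys))            # IDs left after the host adapters
    if [ "$va_slot0" = "yes" ]; then
        free_ids=$((free_ids - 1))    # ID 0 is lost to the DWZZA-VA slot
    fi
    echo "$free_ids"
}

max_shared_disks 3 no     # prints 5, the three-system BA350 case above
max_shared_disks 3 yes    # prints 4
```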

Installing the Network Interfaces

TruCluster Available Server Software Version 1.4 supports multiple network adapters. These adapters can be used both for client access to Available Server services and for Available Server daemon-to-daemon communications.

If the systems did not arrive with the additional network adapters installed, install each adapter using the documentation included with it. Then configure the new network interface with netsetup.

All systems in an Available Server Environment (ASE) must be on the same primary and backup networks. The host names of the network interfaces on all the common networks must be in each system's /etc/hosts file.
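A short script can check each member's hosts file against the list of interface host names. The sketch below works under stated assumptions. check_hosts_entries is a hypothetical shell function, and the names in the example comment are placeholders for your own primary and backup interface names. The check confirms only that each name is present, not that its address is correct.

```sh
# Hypothetical check; not a TruCluster Available Server command.
# Usage: check_hosts_entries HOSTS_FILE NAME...
# Reports each NAME that is not listed as a host name or alias in
# HOSTS_FILE. Comment lines and trailing "#" comments are ignored.
# Only the presence of the name is checked, not its address.
check_hosts_entries() {
    hosts_file=$1
    shift
    rc=0
    for name in "$@"; do
        if ! awk -v n="$name" '
            $1 !~ /^#/ {
                for (i = 2; i <= NF; i++) {
                    if ($i ~ /^#/) break        # rest of the line is a comment
                    if ($i == n) found = 1
                }
            }
            END { exit !found }' "$hosts_file"; then
            echo "missing from $hosts_file: $name"
            rc=1
        fi
    done
    return "$rc"
}

# Example (placeholder names), run on every member system:
# check_hosts_entries /etc/hosts member1 member1-bk member2 member2-bk
```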


Firmware Update

When you install a SCSI adapter in a system that will be a member of an ASE, check the SCSI adapter firmware, because it may be out of date. The update procedures may also have changed since this course was written. Obtain the release notes for the applicable system and SCSI adapter, and use them to update the firmware.

Note

To obtain the firmware release notes from the Firmware Update Utility CD–ROM, your kernel must be configured for the ISO 9660 Compact Disk File System (CDFS).

To obtain the release notes for the firmware update:

• At the console prompt, determine the drive number of the CD–ROM.

• Boot the Digital UNIX operating system.

• Log in as root.

• Place the Alpha Systems Firmware Update CD–ROM into the drive.

• Mount the CD–ROM as follows (/dev/rz4c is used as an example CD–ROM drive).

# mount -rt cdfs -o noversion /dev/rz4c /mnt

• Copy the appropriate release notes to your system disk. The example copies the release notes file alpha_2100_v45_relnote.txt; substitute the file for your system.

# cp "/mnt/cdrom/doc/alpha_2100_v45_relnote.txt" alpha_2100_v45_relnote.txt

• Unmount the CD–ROM drive.

# umount /mnt

• Print the release notes.

Starting Your TruCluster Available Server Configuration

This section describes the steps needed to set up the Available Server hardware. It assumes that the correct hardware has already been purchased. You can also use the information provided to determine the hardware components you need for any configuration.

Before you start, verify that the systems, controllers, and other components are supported by the version of TruCluster Available Server you plan to install. Refer to Table 3–2 through Table 3–7.

When you are ready to begin your configuration, find your SCSI controller and storage device combination in Table 3–11 and refer to the section and table listed for it.


Table 3–11 Setting Up TruCluster Available Server Configurations

PMAZC in a single-ended Available Server configuration with a BA350 or BA353 storage expansion unit
    Refer to: Section "Setting Up a Single-Ended Available Server Configuration for Use with PMAZCs and a BA350 or BA353" and Table 3–12

PMAZC in a differential Available Server configuration with a BA350 or BA353
    Refer to: Section "Setting Up a Differential Available Server Configuration for Use with PMAZCs and a BA350 or BA353" and Table 3–14

PMAZC in a differential Available Server configuration with a BA356
    Refer to: Section "Setting Up a Differential Available Server Configuration for Use with PMAZCs and a BA356" and Table 3–16

PMAZC in a differential Available Server configuration with an HSZ10 or HSZ40 and a DEC RAID subsystem
    Refer to: Section "Setting Up an Available Server Configuration for Use with PMAZCs and an HSZ40" and Table 3–18

KZTSA in an Available Server configuration with a BA350, BA353, or BA356
    Refer to: Section "Setting Up an Available Server Configuration Using a KZTSA TURBOchannel to SCSI Adapter" and Table 3–20

KZTSA in an Available Server configuration with an HSZ40
    Refer to: Section "Setting Up an Available Server Configuration Using a KZTSA TURBOchannel to SCSI Adapter" and Table 3–22

KZMSA in an Available Server configuration with a BA350, BA353, or BA356
    Refer to: Section "Setting Up an Available Server Configuration with KZMSA SCSI Controllers" and Table 3–25

KZMSA in an Available Server configuration with an HSZ40
    Refer to: Section "Setting Up an Available Server Configuration with KZMSA SCSI Controllers" and Table 3–26

KZPSA in an Available Server configuration with a BA350, BA353, or BA356
    Refer to: Section "Setting Up an Available Server Configuration Using KZPSA PCI to SCSI Adapters" and Table 3–27

KZPSA in an Available Server configuration with an HSZ40
    Refer to: Section "Setting Up an Available Server Configuration Using KZPSA PCI to SCSI Adapters" and Table 3–28

Mixed configurations with single-ended and differential adapters and a BA350, BA353, or BA356
    Refer to: Section "Setting Up an Available Server Configuration with Mixed Adapter Types and a BA350, BA353, or BA356" and Table 3–29

Mixed configurations with single-ended and differential adapters and an HSZ40
    Refer to: Section "Setting Up an Available Server Configuration with Mixed Adapter Types and an HSZ40" and Table 3–31

Setting Up a Single-Ended Available Server Configuration for Use with PMAZCs and a BA350 or BA353

Note

If you have an Available Server configuration that includes a PMAZC SCSI controller, you can have at most three systems in the configuration.


Table 3–12 covers the steps needed to configure PMAZC SCSI controllers in a single-ended Available Server configuration using a BA350 or BA353. In this configuration there is no differential SCSI bus, so there are no DWZZAs.

Figure 3–15 shows two DEC 3000 Model 500 systems with PMAZC TURBOchannel SCSI controllers in an Available Server configuration with a BA350 storage expansion unit.

Figure 3–16 shows two DEC 3000 Model 500 systems in which both channels of each PMAZC TURBOchannel SCSI controller are used. Each channel serves a separate shared bus with its own BA350 storage expansion unit.

Table 3–13 lists the hardware needed for configurations of two or three systems with PMAZC TURBOchannel SCSI controllers and a BA350 or BA353 storage expansion unit.

Table 3–12 PMAZC SCSI Controllers and a BA350 or BA353 with Single-Ended Available Server Configuration

Step 1: For each system using a PMAZC on the shared bus, shut down the system and install the PMAZC.
    Dual SCSI Module (PMAZC-AA): If necessary, install jumper W1 to enable the setid console utility to set the PMAZC SCSI ID or bus speed, or to update the firmware.
    Refer to: Section "PMAZC Dual SCSI Module Jumpers" and Figure 3–20

Step 2: Turn on system power and set the PMAZC SCSI ID and speed if necessary.
    Refer to: Section "Verifying and Setting PMAZC and KZTSA SCSI ID and Bus Speed"; Examples 3–1, 3–2, and 3–4

Step 3: If the firmware has to be updated, boot from the Alpha Systems Firmware Update CD–ROM.
    Refer to: The firmware release notes for the system

Step 4: Turn off system power and remove jumper W1. Store it on an empty jumper rest.
    Disable the PMAZC internal termination by removing the jumper for the appropriate port (W2 for port A, W3 for port B).
    Refer to: Figure 3–20

Step 5: Install a BN21V-0B "Y" cable on the appropriate PMAZC port for each PMAZC.

Step 6: For any system containing a PMAZC on either end of the shared bus, attach an H8574-A terminator to one leg of the "Y" cable.

Step 7: Connect adjacent PMAZC controllers by installing a BC06P cable between their BN21V-0B "Y" cables.

Step 8: If the BA350 or BA353 is on one end of the shared bus:
    BA350: Ensure that the BA350 SCSI terminator (behind slot 1) is installed. Install a BN21R cable between the "Y" cable on the PMAZC adjacent to the BA350 and BA350 input connector JA1. You might have to remove the disks in slots 0 and 1 to provide room to attach the cable. Note that for normal single-bus operation, the BA350 bus jumper (behind slot 5) must be installed.
    BA353: Install a BN21R cable between the "Y" cable on the PMAZC adjacent to the BA353 and the BA353 SCSI input connector.
    Refer to: Figure 3–8 (BA350), Figure 3–9 (BA353)

Step 9: If the BA350 or BA353 is in the middle of the shared bus:
    BA350: Remove the BA350 SCSI bus terminator (behind slot 1). Note that the BA350 bus jumper (behind slot 5) must be installed. Install a BN21R cable from each of the two adjacent PMAZC "Y" cables to BA350 SCSI connectors JA1 and JA2.
    BA353: Install a BN21R cable from each of the two adjacent PMAZC "Y" cables to the BA353 SCSI input and SCSI output connectors.
    Refer to: Figure 3–8 (BA350), Figure 3–9 (BA353)


Figure 3–15 Available Server Configuration with Two DEC 3000 Model 500 Systems and a Single-Ended Shared Bus with a BA350

[Figure: labels include two DEC 3000 Model 500 systems, PMAZCs, BN21V-0B "Y" cables, H8574-A terminators, a BN21R or BN23G cable, a BA350, and the network interface]

Figure 3–16 Available Server Configuration with Two DEC 3000 Model 500 Systems and Two Single-Ended Shared Buses Each with a BA350

[Figure: labels include two DEC 3000 Model 500 systems, PMAZCs, two BA350s, BN21V-0B "Y" cables, H8574-A terminators, BN21R or BN23G cables, and the network interface]

Table 3–13 lists the hardware needed for single-ended Available Server configurations using PMAZC SCSI bus controllers and BA350 or BA353 storage expansion units. Because there is no differential SCSI bus, there are no DWZZAs.


Table 3–13 Hardware Needed for a Single-Ended Available Server Configuration with PMAZC SCSI Controllers and a BA350 or BA353 (No DWZZAs)

                                                        Number of systems [1]
Component                                               2         3
BN21V-0B "Y" cables                                     2         3
H8574-A terminators, storage unit on bus end            1         1
H8574-A terminators, storage unit in middle [2]         2         2
BN21R or BN23G SCSI cables, storage unit on bus end     1         1
BN21R or BN23G SCSI cables, storage unit in middle [2]  2         2
BC06P SCSI cables, storage unit on bus end              1         2
BC06P SCSI cables, storage unit in middle [2]           0         1

Storage unit: the BA350 or BA353 storage expansion unit.

[1] You can have only three systems in an Available Server configuration that includes a PMAZC SCSI controller.
[2] If the storage unit is a BA350 in the middle of the bus, its internal terminator must be removed.
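The counts in Table 3–13 follow from the cabling steps in Table 3–12:

- one BN21V-0B "Y" cable per system
- a BC06P cable between each pair of adjacent PMAZCs
- a BN21R (or BN23G) cable from each adjacent PMAZC to the storage unit
- an H8574-A terminator on each end of the bus that is a PMAZC rather than the storage unit

The sketch below encodes that arithmetic as a hypothetical shell function. The name se_pmazc_parts and its output format are illustrative only.

```sh
# Hypothetical helper reproducing Table 3-13; not a TruCluster command.
# Usage: se_pmazc_parts SYSTEMS POSITION
#   SYSTEMS  - 2 or 3 (PMAZC configurations allow at most three systems)
#   POSITION - "end" if the BA350/BA353 is on one end of the shared bus,
#              "middle" if it sits between two systems
se_pmazc_parts() {
    n=$1
    case "$n" in
        2|3) ;;
        *) echo "error: use 2 or 3 systems with PMAZC controllers" >&2; return 1 ;;
    esac
    case "$2" in
        end)    term=1; bn21r=1; bc06p=$((n - 1)) ;;  # one PMAZC bus end
        middle) term=2; bn21r=2; bc06p=$((n - 2)) ;;  # both bus ends are PMAZCs
        *) echo "error: position must be end or middle" >&2; return 1 ;;
    esac
    echo "Y=$n H8574-A=$term BN21R/BN23G=$bn21r BC06P=$bc06p"
}

se_pmazc_parts 3 end    # prints Y=3 H8574-A=1 BN21R/BN23G=1 BC06P=2
```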

Setting Up a Differential Available Server Configuration for Use with PMAZCs and a BA350 or BA353

Note

If you have an Available Server configuration that includes a PMAZC SCSI controller, you can have a maximum of three systems in the configuration.

Use Table 3–14 to set up PMAZC SCSI controllers in a differential shared SCSI Available Server configuration with a BA350 or BA353.

Figure 3–17 shows three DEC 3000 Model 500 systems with PMAZC TURBOchannel SCSI controllers in a differential Available Server configuration with a BA350 storage expansion unit.

Table 3–15 lists the hardware needed for configurations of two or three systems that use PMAZC TURBOchannel SCSI controllers, DWZZAs, and a BA350 or BA353 storage expansion unit.

Table 3–14 Setting Up an Available Server Configuration with PMAZC SCSI Controllers and a BA350 or BA353 in a Differential Available Server Configuration

Step 1: For each system using a PMAZC on the shared bus, shut down the system and install the PMAZC.
    Dual SCSI Module (PMAZC-AA): If necessary, install jumper W1 to enable the setid console utility to set the PMAZC SCSI ID or bus speed, or to update the firmware.
    Refer to: Figure 3–20

Step 2: Turn on system power and set the SCSI ID and speed as necessary.
    Refer to: Example 3–1, Example 3–2, and Example 3–4

Step 3: If the firmware has to be updated, boot from the Alpha Systems Firmware Update CD–ROM.
    Refer to: The firmware release notes for the system

Step 4: Turn off system power and remove jumper W1. Store it on an empty jumper rest.
    Ensure that the appropriate PMAZC termination jumpers (W2 for port A or W3 for port B) are installed to provide termination for one end of each single-ended SCSI bus.
    Refer to: Figure 3–20

Step 5: You will need one DWZZA-AA for each system with a PMAZC SCSI controller and one DWZZA for the BA350 or BA353. You can use a DWZZA-VA for the BA350 or BA353.
    For each DWZZA, remove the five differential terminator resistor SIPs. Refer to: Figure 3–11
    For each DWZZA-AA, ensure that the single-ended SCSI termination jumper, J2, is installed to provide termination for that end of the single-ended SCSI bus segment. Refer to: Figure 3–11
    For a DWZZA-VA that is to be installed in a:
        BA350: Ensure that the DWZZA-VA single-ended SCSI termination jumper, J2, is installed.
        BA353: Remove the DWZZA-VA single-ended SCSI termination jumper, J2, and install terminator 12-37004-04 on the BA353 SCSI input connector. Refer to: Figure 3–9

Step 6: Connect a BN21W-0B "Y" cable or an H885-AA tri-link connector to the 68-pin differential connector on each DWZZA.

Step 7: Connect an H879-AA terminator to one leg of the BN21W-0B "Y" cable or one side of the H885-AA tri-link connector for the two DWZZAs on the ends of the differential SCSI bus.

Step 8: Connect a BN21R or BN23G cable between each PMAZC and the single-ended connector of a DWZZA-AA.

Step 9: If you are using a DWZZA-AA to connect to the BA350 or BA353, install a BN21R or BN23G cable between the DWZZA-AA single-ended connector and JA1 on the BA350 or the BA353 SCSI input connector.
    If you are using a DWZZA-VA with a BA350, install it in slot 0 of the BA350. Remember that SCSI ID 0 is then no longer available for a disk in the BA350. Refer to: Figure 3–8
    If you are using a DWZZA-VA with a BA353, install it in any slot in the BA353. Verify that terminator 12-37004-04 has been installed on the BA353 SCSI input connector. Refer to: Figure 3–9
    You will need one fewer BN21R (or BN23G) cable if you are using a DWZZA-VA.

Step 10: Connect a BN21K or BN21L cable between the BN21W-0B "Y" cables (or H885-AA tri-link connectors) of all the DWZZAs. Start at the "Y" cable or tri-link connector on one end of the differential bus (the one with a terminator) and daisy-chain until you reach the "Y" cable or tri-link connector with the other terminator. The number of these cables will be the same as the number of PMAZC controllers in the Available Server configuration.

Step 11: Ensure that the BA350 terminator and SCSI jumper are both installed.
    Refer to: Figure 3–8
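Steps 7 and 10 describe a daisy-chained differential bus with an H879-AA terminator at each end and none in between. That rule can be checked against a bus written out node by node. The sketch below is a hypothetical shell function, check_bus_termination, that uses its own input convention: T for a "Y" cable or tri-link connector with a terminator, and - for one without. It checks only terminator placement, not cable lengths or device types.

```sh
# Hypothetical check; not a TruCluster command.
# Usage: check_bus_termination NODE...
# List the "Y" cables or tri-link connectors on one shared bus in cable
# order: "T" if an H879-AA terminator is attached, "-" if not.
check_bus_termination() {
    if [ "$#" -lt 2 ]; then
        echo "error: a shared bus needs at least two nodes" >&2
        return 1
    fi
    pos=1
    for node in "$@"; do
        if [ "$pos" -eq 1 ] || [ "$pos" -eq "$#" ]; then
            # Both ends of the daisy chain must be terminated.
            [ "$node" = "T" ] || { echo "node $pos: bus end is not terminated"; return 1; }
        else
            # Terminators belong only at the two ends.
            [ "$node" = "-" ] || { echo "node $pos: expected no terminator in the middle of the bus"; return 1; }
        fi
        pos=$((pos + 1))
    done
    echo "ok"
}

check_bus_termination T - - T    # four DWZZAs in a three-system bus; prints ok
```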

Figure 3–17 Available Server Configuration with Three DEC 3000 Model 500 Systems with PMAZCs, Differential Shared Bus, and a BA350

[Figure: labels include three PMAZCs, BN21R or BN23G cables, H885-AA tri-link connectors with H879-AA terminators, a DWZZA-VA with an H885-AA tri-link connector, BN21K or BN21L SCSI differential cables, a BA350, and the network interface; callouts 1 to 4]

Table 3–15 lists the hardware needed for differential Available Server configurations using PMAZC SCSI bus controllers or KZMSA XMI to SCSI bus adapters and BA350 or BA353 storage expansion units. These configurations use DWZZA signal converters, one of which can be a DWZZA-VA. A DWZZA-VA used with a BA350 must be installed in slot 0 of the BA350. In a BA353, it can be installed in any slot.


Table 3–15 Hardware Needed for a Differential Available Server Configuration with PMAZC or KZMSA SCSI Controllers and a BA350 or BA353

                                                    Number of systems [1]
Component                                           2         3
DWZZA-AA or DWZZA-VA [2]                            3         4
BN21W-0B "Y" cables or H885-AA tri-link connectors  3         4
H879-AA terminators                                 2         2
BN21K or BN21L cables                               2         3
BN21R or BN23G SCSI cables, using a DWZZA-VA [3]    2         3
BN21R or BN23G SCSI cables, no DWZZA-VA             3         4

[1] You can have only three systems in an Available Server configuration that includes a PMAZC SCSI controller or KZMSA XMI to SCSI adapter.
[2] One of the DWZZAs can be a DWZZA-VA. The rest are DWZZA-AAs.
[3] If you use a DWZZA-VA and a BA353 storage expansion unit, you must install terminator 12-37004-04 on the BA353 SCSI input connector.
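Table 3–15 can be derived from the steps in Table 3–14:

- one DWZZA per system, plus one for the storage unit
- a "Y" cable or tri-link connector on every DWZZA
- two H879-AA terminators for the bus ends
- one BN21K or BN21L cable per link in the differential daisy chain
- one BN21R or BN23G cable per PMAZC, plus one more when an external DWZZA-AA connects to the storage unit

The hypothetical shell function below (diff_ba35x_parts, an illustrative name) applies these rules to reproduce the table.

```sh
# Hypothetical helper reproducing Table 3-15; not a TruCluster command.
# Usage: diff_ba35x_parts SYSTEMS VA
#   SYSTEMS - 2 or 3 (PMAZC or KZMSA configurations)
#   VA      - "yes" if a DWZZA-VA is installed in the BA350/BA353, else "no"
diff_ba35x_parts() {
    n=$1
    case "$n" in
        2|3) ;;
        *) echo "error: use 2 or 3 systems" >&2; return 1 ;;
    esac
    case "$2" in
        yes) bn21r=$n ;;           # one cable per PMAZC-to-DWZZA-AA link
        no)  bn21r=$((n + 1)) ;;   # plus DWZZA-AA to the storage unit
        *)   echo "error: VA must be yes or no" >&2; return 1 ;;
    esac
    # One DWZZA per system plus one for the storage unit, each with a
    # "Y" cable or tri-link; n+1 connectors need n daisy-chain cables.
    echo "DWZZA=$((n + 1)) Y/tri-link=$((n + 1)) H879-AA=2 BN21K/BN21L=$n BN21R/BN23G=$bn21r"
}
```

For example, diff_ba35x_parts 3 no prints the three-system row for a configuration without a DWZZA-VA.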


Setting Up a Differential Available Server Configuration for Use with PMAZCs and a BA356

Note

If you have an Available Server configuration that includes a PMAZC SCSI controller, you can have only three systems in the configuration.

Use Table 3–16 to set up PMAZC SCSI controllers in a differential shared SCSI Available Server configuration with a BA356. There must be a signal converter for each system and for the BA356. You must use a DWZZA-AA for each PMAZC, but you can use a DWZZA-VA, DWZZB-VW, or DWZZB-AA at the BA356. A DWZZB-VW is recommended because it allows conversion to wide SCSI. This section covers only the use of a DWZZB-VW with the BA356.

Figure 3–18 shows three DEC 3000 Model 500 systems with PMAZC TURBOchannel SCSI controllers in a differential Available Server configuration with a BA356 storage expansion unit.

Table 3–17 lists the hardware needed for configurations of two or three systems that use PMAZC TURBOchannel SCSI controllers, DWZZAs, and a BA356 storage expansion unit.

Table 3–16 Setting Up an Available Server Configuration with PMAZC SCSI Controllers and a BA356 in a Differential Available Server Configuration

Step 1: For each system using a PMAZC on the shared bus, shut down the system and install the PMAZC.
    Dual SCSI Module (PMAZC-AA): If necessary, install jumper W1 to enable the setid console utility to set the PMAZC SCSI ID or bus speed, or to update the firmware.
    Refer to: Figure 3–20

Step 2: Turn on system power and set the SCSI ID and speed as necessary.
    Refer to: Example 3–1, Example 3–2, and Example 3–4

Step 3: If the firmware has to be updated, boot from the Alpha Systems Firmware Update CD–ROM.
    Refer to: The firmware release notes for the system

Step 4: Turn off system power and remove jumper W1. Store it on an empty jumper rest.
    Ensure that the appropriate PMAZC termination jumpers (W2 for port A or W3 for port B) are installed to provide termination for one end of each single-ended SCSI bus.
    Refer to: Figure 3–20

Step 5: You will need one DWZZA-AA for each PMAZC on the shared SCSI bus and a DWZZB for the BA356. You can use a DWZZB-AA or a DWZZB-VW on the BA356 end of the shared SCSI bus.
    For each DWZZB, remove the five differential terminator resistor SIPs. Refer to: Figure 3–13 or Figure 3–14
    For each DWZZA-AA, ensure that the single-ended SCSI termination jumper, J2, is installed to provide termination for the single-ended SCSI bus segment. Refer to: Figure 3–11
    If a DWZZB-AA is to be used (external to the BA356), ensure that the single-ended SCSI termination jumpers, W1 and W2, are installed. Refer to: Figure 3–13
    If a DWZZB-VW is to be installed in a BA356 (slot 0), ensure that the single-ended SCSI termination jumpers, W1 and W2, are installed. Refer to: Figure 3–14

Step 6: Connect a BN21W-0B "Y" cable or an H885-AA tri-link connector to the 68-pin differential connector on each DWZZA or DWZZB.

Step 7: Connect an H879-AA terminator to one leg of the BN21W-0B "Y" cable or one side of the H885-AA tri-link connector for the two DWZZ*s on the ends of the differential SCSI bus.

Step 8: Connect a BN21R or BN23G cable between each PMAZC and the single-ended connector of a DWZZA-AA.

Step 9: If you are using a DWZZB-VW, install it in slot 0 of the BA356. Remember that SCSI ID 0 is then no longer available for a disk in the BA356.
    Refer to: Figure 3–10

Step 10: Connect a BN21K or BN21L cable between the BN21W-0B "Y" cables (or H885-AA tri-link connectors) of all the DWZZ*s. Start at the "Y" cable or tri-link connector on one end of the differential bus (the one with a terminator) and daisy-chain until you reach the "Y" cable or tri-link connector with the other terminator. The number of these cables will be the same as the number of PMAZC controllers in the Available Server configuration.

Step 11: If you are using a DWZZB-AA, connect the DWZZB-AA to BA356 JA1 with a BN21K or BN21L cable.


Figure 3–18 Available Server Configuration with Three DEC 3000 Model 500 Systems with PMAZCs, Differential Shared Bus, and a BA356

[Figure: labels include three PMAZCs, three DWZZA-AAs, BN21R or BN23G cables, H885-AA tri-link connectors with H879-AA terminators, a DWZZB-VA with an H885-AA tri-link connector, BN21K or BN21L SCSI differential cables, a BA356, and the network interface; callouts 1 to 4]

Table 3–17 lists the hardware needed for differential Available Server configurations using PMAZC SCSI bus controllers or KZMSA XMI to SCSI bus adapters and a BA356. DWZZA-AA, DWZZB-AA, or DWZZB-VW signal converters are used. If a DWZZB-VW is used, it must be installed in slot 0 of the BA356.

Table 3–17 Hardware Needed for a Differential Available Server Configuration with PMAZC or KZMSA SCSI Controllers and a BA356

                                                    Number of systems [1]
Component                                           2         3
DWZZ* signal converters [2]                         3         4
BN21W-0B "Y" cables or H885-AA tri-link connectors  3         4
H879-AA terminators                                 2         2
BN21R or BN23G SCSI cables                          2         3
BN21K or BN21L SCSI cables, using a DWZZB-AA        3         4
BN21K or BN21L SCSI cables, no DWZZB-AA [3]         2         3

[1] You can have only three systems in an Available Server configuration that includes a PMAZC SCSI controller or KZMSA XMI to SCSI adapter.
[2] There must be one DWZZA-AA for each system in the Available Server configuration. The other DWZZ* can be a DWZZB-AA or DWZZB-VW.
[3] If you do not use a DWZZB-AA, install the DWZZB-VW in BA356 slot 0.
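Table 3–17 can be derived from the steps in Table 3–16:

- one DWZZ* per system, plus one at the BA356
- a "Y" cable or tri-link connector on each DWZZ*
- two H879-AA terminators
- one BN21R or BN23G cable from each PMAZC to its DWZZA-AA
- one BN21K or BN21L cable per daisy-chain link, plus one more from an external DWZZB-AA to BA356 JA1

The hypothetical shell function below (diff_ba356_parts, an illustrative name) encodes these rules.

```sh
# Hypothetical helper reproducing Table 3-17; not a TruCluster command.
# Usage: diff_ba356_parts SYSTEMS CONVERTER
#   SYSTEMS   - 2 or 3 (PMAZC or KZMSA configurations)
#   CONVERTER - "aa" for an external DWZZB-AA, "vw" for a DWZZB-VW in BA356 slot 0
diff_ba356_parts() {
    n=$1
    case "$n" in
        2|3) ;;
        *) echo "error: use 2 or 3 systems" >&2; return 1 ;;
    esac
    case "$2" in
        aa) bn21k=$((n + 1)) ;;   # daisy chain plus the DWZZB-AA-to-JA1 cable
        vw) bn21k=$n ;;           # daisy chain only
        *)  echo "error: CONVERTER must be aa or vw" >&2; return 1 ;;
    esac
    # One DWZZA-AA per system plus the DWZZB; one BN21R/BN23G per PMAZC.
    echo "DWZZ*=$((n + 1)) Y/tri-link=$((n + 1)) H879-AA=2 BN21R/BN23G=$n BN21K/BN21L=$bn21k"
}
```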


Setting Up an Available Server Configuration for Use with PMAZCs and an HSZ40

Note

You can have only three systems in an Available Server configuration that includes a PMAZC SCSI controller.

Use Table 3–18 to set up PMAZC SCSI controllers in an Available Server configuration with an HSZ10 or HSZ40. Because the HSZ10 and HSZ40 are differential devices, DWZZAs and a differential shared SCSI bus are required.

Figure 3–19 shows two DEC 3000 Model 500 systems with PMAZC TURBOchannel SCSI controllers in a differential Available Server configuration with an HSZ40 and a DEC RAID subsystem.

Table 3–19 lists the hardware needed for configurations of two or three systems that use PMAZC TURBOchannel SCSI controllers (or KZMSA XMI SCSI adapters), DWZZAs, and an HSZ10 or HSZ40.


Table 3–18 Setting Up an Available Server Configuration with PMAZC SCSI Controllers and an HSZ10 or HSZ40

Step 1: For each system using a PMAZC on the shared bus, shut down the system and install the PMAZC.
    Dual SCSI Module (PMAZC-AA): If necessary, install jumper W1 to enable the setid console utility to set the PMAZC SCSI ID or bus speed, or to update the firmware.
    Refer to: Figure 3–20

Step 2: Turn on system power and set the SCSI ID and speed as necessary.
    Refer to: Examples 3–1, 3–2, and 3–4

Step 3: If the firmware has to be updated, boot from the Alpha Systems Firmware Update CD–ROM.
    Refer to: The firmware release notes for the system

Step 4: Turn off system power and remove jumper W1. Store it on an empty jumper rest.
    Ensure that the appropriate PMAZC termination jumpers (W2 or W3) are installed to provide termination for one end of each single-ended SCSI bus.
    Refer to: Figure 3–20

Step 5: You will need one DWZZA-AA for each PMAZC SCSI controller.
    For each DWZZA-AA, ensure that the single-ended SCSI termination jumper, J2, is installed to provide termination for that end of the single-ended SCSI bus segment.
    Remove the five differential terminator resistor SIPs from each DWZZA.
    Refer to: Figure 3–11

Step 6: Connect a BN21W-0B "Y" cable or an H885-AA tri-link connector to the input of the HSZ10 or HSZ40 and to the 68-pin differential connector on each DWZZA.

Step 7: Connect an H879-AA terminator to one leg of the BN21W-0B "Y" cable or one side of the H885-AA tri-link connector for the host adapters (or the HSZ10 or HSZ40) that will be on the ends of the shared bus.

Step 8: Connect a BN21R or BN23G cable between each PMAZC and the single-ended connector of a DWZZA-AA.

Step 9: Connect a BN21K or BN21L cable between the BN21W-0B "Y" cables (or H885-AA tri-link connectors) of the DWZZA-AAs and the HSZ10 or HSZ40. The number of these cables will be the same as the number of PMAZC controllers in the Available Server configuration. Daisy-chain the connections so that the terminators remain on both ends of the shared bus.


Figure 3–19 Available Server Configuration with Two DEC 3000 Model 500 Systems with PMAZC SCSI Controllers and an HSZ40

[Figure: labels include DEC 3000 Model 500 systems with PMAZCs, DWZZA-AAs, BN21R or BN23G cables, H885-AA tri-link connectors (some with H879-AA terminators), BN21K or BN21L cables, an HSZ40 and DEC RAID subsystem, and the network interface; callouts 1 to 5]

Table 3–19 lists the hardware needed for differential Available Server configurations using PMAZC SCSI bus controllers or KZMSA XMI to SCSI adapters and an HSZ40 with a DEC RAID subsystem. This table can also be used for configurations using PMAZC SCSI controllers and an HSZ10. †

Table 3–19 Hardware Needed for a Differential Available Server Configuration with PMAZC or KZMSA SCSI Controllers and an HSZ40 or PMAZCs and an HSZ10

                                                    Number of systems [1]
Component                                           2         3
DWZZA-AA                                            2         3
BN21W-0B "Y" cables or H885-AA tri-link connectors  3         4
H879-AA terminators                                 2         2
BN21K or BN21L cables                               2         3
BN21R or BN23G cables                               2         3

[1] You can have only three systems in an Available Server configuration that includes a PMAZC SCSI controller or a KZMSA XMI to SCSI adapter.

† You cannot use an HSZ10 in Available Server configurations with a KZMSA XMI to SCSI adapter.
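Table 3–19 can be derived from the steps in Table 3–18:

- one DWZZA-AA per system
- a "Y" cable or tri-link connector on every DWZZA-AA and on the HSZ10 or HSZ40
- two H879-AA terminators
- one BN21K or BN21L cable per daisy-chain link
- one BN21R or BN23G cable per system

The hypothetical shell function below (diff_hsz_parts, an illustrative name) reproduces the table. It also rejects the HSZ10 with KZMSA combination ruled out in the note above.

```sh
# Hypothetical helper reproducing Table 3-19; not a TruCluster command.
# Usage: diff_hsz_parts SYSTEMS ADAPTER CONTROLLER
#   SYSTEMS    - 2 or 3
#   ADAPTER    - "pmazc" or "kzmsa"
#   CONTROLLER - "hsz10" or "hsz40"
diff_hsz_parts() {
    n=$1
    case "$n" in
        2|3) ;;
        *) echo "error: use 2 or 3 systems" >&2; return 1 ;;
    esac
    case "$2:$3" in
        pmazc:hsz10|pmazc:hsz40|kzmsa:hsz40) ;;
        kzmsa:hsz10) echo "error: an HSZ10 cannot be used with KZMSA adapters" >&2; return 1 ;;
        *) echo "error: unknown adapter or controller" >&2; return 1 ;;
    esac
    # n+1 connectors (one per DWZZA-AA plus the HSZ) need n daisy-chain cables.
    echo "DWZZA-AA=$n Y/tri-link=$((n + 1)) H879-AA=2 BN21K/BN21L=$n BN21R/BN23G=$n"
}
```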


PMAZC Dual SCSI Module Jumpers

Figure 3–20 shows the jumpers on a PMAZC SCSI controller.

Figure 3–20 PMAZC Dual SCSI Module Jumpers

[Figure: PMAZC dual SCSI module showing jumpers W1, W2, and W3, callouts 1 to 3]

1 W2 and W3 terminator jumpers: When installed, these jumpers provide the required termination at one end of each of the two SCSI buses. W2 is for Port A and is the left-most jumper in the figure. W3 is for Port B. They are shown removed.

2 Jumper rests used for storing jumpers that have been removed.

3 W1 is the flash memory write jumper. It should not be installed except to update the ROM code or when using the setid utility to change the SCSI ID or bus speed.

Verifying and Setting PMAZC and KZTSA SCSI ID and Bus Speed

This topic provides examples of how to display and change the SCSI ID or bus speed of a PMAZC, or the SCSI ID of a KZTSA SCSI controller.

To display the SCSI ID and bus speed of a PMAZC, or the SCSI ID of a KZTSA, shut down the system. Then use the console show config command to determine where the PMAZCs and KZTSAs are installed. Example 3–1 shows that the DEC 3000 Model 500 has PMAZC-AA SCSI controllers in two TURBOchannel slots, TC0 and TC1, and a KZTSA in TURBOchannel slot TC3.


Example 3–1 Displaying DEC 3000 Configuration

>>> show config
DEC 3000 - M500
Digital Equipment Corporation
VPP PAL X5.48-82000101/OSF PAL X1.35-82000201 - Build on 20-JUL-1994 11:07:03.31

TCINFO  DEVNAM    DEVSTAT
------  --------  -------
        CPU       OK      KN15-AA -V5.1-5748-t19D-sV;.?-DECchip 21064 P2.1
                          OSC 150 MHZ
        ASIC      OK
        MEM       OK
8       CXT       OK
7       NVR       OK
        SCC       OK
        NI        OK
        ISDN      OK
6       SCSI      OK
1-PMAZC-AA  TC1
0-PMAZC-AA  TC0
3-KZTSA-AA  TC3
>>>

To display the SCSI ID or bus speed for a specific PMAZC or KZTSA, use the t tc# cnfg console command shown in Example 3–2 and Example 3–3. In this command, the number sign (#) specifies the TURBOchannel slot number. For instance, Example 3–2 shows that the PMAZC-AAs in TURBOchannel slots 0 and 1 both have SCSI ID 7 on each port and are set to slow speed.
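Suppose the console output of show config has been captured to a file; how you capture it, for example from a console terminal, is outside this procedure. The adapter lines at the end of Example 3–1 can then be turned into the matching t tc# cnfg commands. The sketch below is hypothetical. cnfg_commands is an illustrative name. The sketch assumes that each adapter line has the form 1-PMAZC-AA TC1, as in Example 3–1, and that only PMAZC and KZTSA adapters are of interest.

```sh
# Hypothetical helper; not a console or TruCluster command.
# Reads captured "show config" output on standard input and prints one
# "t tc# cnfg" command per PMAZC or KZTSA line of the form "1-PMAZC-AA TC1".
cnfg_commands() {
    awk '$1 ~ /^[0-9]+-(PMAZC|KZTSA)-/ && $2 ~ /^TC[0-9]+$/ {
        print "t " tolower($2) " cnfg"
    }'
}

# Example (placeholder file name): cnfg_commands < show_config.log
```

For the adapter lines in Example 3–1, this would print t tc1 cnfg, t tc0 cnfg, and t tc3 cnfg.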

Example 3–2 Displaying PMAZC Bus Speed and SCSI ID

>>> t tc0 cnfg
DEC PMAZC-AA V2.0 Port A Slow Port B Slow (Dual SCSI [53CF96])

BOOTDEV     ADDR   DEVTYPE  NUMBYTES  RM/FX  WP  DEVNAM  REV
-------     ----   -------  --------  -----  --  ------  ---
..HostID..  A/7    INITR
..HostID..  B/7    INITR
>>> t tc1 cnfg
DEC PMAZC-AA V2.0 Port A Slow Port B Slow (Dual SCSI [53CF96])

BOOTDEV     ADDR   DEVTYPE  NUMBYTES  RM/FX  WP  DEVNAM  REV
-------     ----   -------  --------  -----  --  ------  ---
..HostID..  A/7    INITR
DKB000      B/0/0  DISK     1GB       FX         RZ26    T386
DKB100      B/1/0  DISK     1GB       FX         RZ26    392A
..HostID..  B/7    INITR
>>>

Example 3–3 shows how to display the SCSI ID and bus speed for the KZTSA. Note that the KZTSA has only one port.


Example 3–3 Displaying KZTSA SCSI ID and Bus Speed

>>> t tc3 cnfg
DEC KZTSA-AA A09 (SCSI = 7, Slow)

--------------------------------------------------
DEV      PID               VID       REV      SCSI DEV
=======  ================  ========  =======  ========
dka0000  HSZ40-Bx (C) DEC  DEC       V21Z     DIR
dka0100  RZ28     (C) DEC  DEC       442D     DIR
dka0300  RZ28B    (C) DEC  DEC       0006     DIR
>>>

To set the SCSI ID or bus speed for both PMAZC ports, use the t command with the following format:

t tc# setid x y

The number sign (#) is the TURBOchannel slot, x is the SCSI ID or speed (s = slow, f = fast) for port A, and y is the SCSI ID or speed for port B.
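As an illustration of these argument rules, the following Python sketch assembles a t tc# setid console string before you type it. The helper name setid_command is hypothetical. The 0 to 7 ID range is an assumption for a narrow SCSI bus (it matches the range the LFU utility reports for the KZMSA in Example 3–9), and requiring both values to be of the same kind simply mirrors Example 3–4, which sets IDs and speeds in separate commands. Omitting the second value covers the single-port KZTSA.

```python
from typing import Optional

VALID_IDS = {str(i) for i in range(8)}  # assumed: narrow SCSI IDs 0 to 7
VALID_SPEEDS = {"s", "f"}               # s = slow, f = fast


def setid_command(slot: int, port_a: str, port_b: Optional[str] = None) -> str:
    """Build a 't tc# setid x [y]' console string (hypothetical helper).

    Pass port_b for a two-port PMAZC; omit it for a single-port KZTSA.
    """
    if slot < 0:
        raise ValueError("TURBOchannel slot number must be non-negative")

    values = [port_a.strip().lower()]
    if port_b is not None:
        values.append(port_b.strip().lower())

    for v in values:
        if v not in VALID_IDS | VALID_SPEEDS:
            raise ValueError(f"not a SCSI ID or speed letter: {v!r}")

    # Like Example 3-4, set IDs and speeds in separate commands.
    kinds = {"speed" if v in VALID_SPEEDS else "id" for v in values}
    if len(kinds) > 1:
        raise ValueError("set IDs and speeds in separate setid commands")

    return f"t tc{slot} setid " + " ".join(values)


if __name__ == "__main__":
    print(setid_command(1, "6", "6"))  # PMAZC in slot 1, both ports to ID 6
    print(setid_command(1, "f", "f"))  # PMAZC in slot 1, both ports fast
    print(setid_command(3, "5"))       # KZTSA in slot 3, ID 5
```

The helper only builds and checks the string; the console itself remains the authority on which values it accepts.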

Example 3–4 shows commands for setting the SCSI ID to 6 for both ports and setting the speed to fast for the PMAZC in TURBOchannel slot 1, then verifying the changes.

Example 3–4 Setting PMAZC SCSI ID and Bus Speed

>>> t tc1 setid 6 6
Precharging..................................................................
Erasing..................................................................
Programming..................................................................
Checksum GOOD
>>> t tc1 setid f f
Precharging..................................................................
Erasing..................................................................
Programming..................................................................
Checksum GOOD
>>> t tc1 cnfg
DEC PMAZC-AA V2.0 Port A Fast Port B Fast (Dual SCSI [53CF96])

BOOTDEV     ADDR    DEVTYPE  NUMBYTES  RM/FX  WP  DEVNAM  REV
-------     ----    -------  --------  -----  --  ------  ---
..HostID..  A/6     INITR
..HostID..  B/6     INITR
>>>


Use the same command to set the SCSI ID or bus speed for the KZTSA, except that the KZTSA has only one port, so you give only one value. Example 3–5 shows how to set the SCSI ID to 5 and set the bus speed to fast for the KZTSA in TURBOchannel slot 3.

Note in Example 3–5 that after you change the KZTSA SCSI ID, you must reset the SCSI bus (the example uses the console INIT command) before the new ID takes effect; the first t tc3 cnfg still reports SCSI = 7. A bus reset is not needed to change the speed.

Example 3–5 Setting KZTSA SCSI ID and Bus Speed

>>> t tc3 setid 5
>>> t tc3 setid f
>>> t tc3 cnfg
DEC KZTSA-AA A09 (SCSI = 7, Fast)

--------------------------------------------------
DEV      PID               VID       REV      SCSI DEV
=======  ================  ========  =======  ========
dka0000  HSZ40-Bx (C) DEC  DEC       V21Z     DIR
dka0100  RZ28     (C) DEC  DEC       D41C     DIR
dka0300  RZ28B    (C) DEC  DEC       0006     DIR
>>> INIT
>>> t tc3 cnfg
DEC KZTSA-AA A09 (SCSI = 5, Fast)

--------------------------------------------------
DEV      PID               VID       REV      SCSI DEV
=======  ================  ========  =======  ========
dka0000  HSZ40-Bx (C) DEC  DEC       V21Z     DIR
dka0100  RZ28     (C) DEC  DEC       D41C     DIR
dka0300  RZ28B    (C) DEC  DEC       0006     DIR
>>>

Setting Up an Available Server Configuration Using a KZTSA TURBOchannel to SCSI Adapter

This section is specific to an Available Server configuration using KZTSA TURBOchannel to SCSI adapters. The KZTSA is a differential, single-channel SCSI adapter. The use of a KZTSA in a DEC 3000 system simplifies hardware configuration and reduces the total number of required DWZZAs.

Use Table 3–20 to set up an Available Server configuration using KZTSA TURBOchannel to SCSI adapters with a BA350, BA353, or BA356 storage expansion unit.

Figure 3–21 shows an Available Server configuration with two DEC 3000 Model 500 systems with KZTSA TURBOchannel SCSI adapters on a shared bus with a BA350 storage expansion unit.

Figure 3–22 shows an Available Server configuration with two DEC 3000 Model 500 systems with KZTSA TURBOchannel SCSI adapters on a shared bus with a BA356 storage expansion unit.


Table 3–21 lists the hardware necessary to generate configurations using two, three, or four systems. These systems use KZTSA TURBOchannel SCSI adapters or KZPSA PCI SCSI adapters in an Available Server configuration with a BA350, BA353, or BA356 storage expansion unit on the shared bus.

Table 3–20 Setting Up an Available Server Configuration with KZTSA TURBOchannel to SCSI Adapters and a BA350, BA353, or BA356

Step Action Refer to:

1 For each DEC 3000 system that will have a KZTSA on the shared SCSI bus, shut down the system and install the KZTSA. (Refer to: KZTSA SCSI Storage Adapter Installation and User’s Guide)

Disable the KZTSA internal SCSI termination by removing the J1, J2, J3, J6, and J7 terminator packs. (Refer to: Figure 3–24)

2 The default SCSI ID for a KZTSA is 7. Turn on the system power and set the KZTSA SCSI ID if necessary. (Refer to: Examples 3–1, 3–3, and 3–5)

3 If the firmware must be updated, boot from the Alpha Systems Firmware Update CD–ROM. (Refer to: the firmware release notes for the system)

4 You will need one DWZZ*. It can be a DWZZA-AA, DWZZA-VA, or DWZZB-VW.

For each DWZZ*, remove the five differential terminator resistor SIPs. (Refer to: Figure 3–11, Figure 3–12, or Figure 3–14, as appropriate)

The DWZZ* single-ended termination depends on the type of StorageWorks device used:

BA350: Ensure that the DWZZA-AA or DWZZA-VA single-ended termination jumper, J2, is installed.

BA353: For a DWZZA-AA, ensure that the single-ended SCSI termination jumper, J2, is installed. For a DWZZA-VA, remove the single-ended SCSI termination jumper, J2, and install terminator 12-37004-04 on the BA353 SCSI input connector.

BA356: Ensure that the DWZZB-VW single-ended termination jumpers, W1 and W2, are installed.

5 Install a BN21W-0B "Y" cable or H885-AA tri-link connector on the external SCSI p-connector of each KZTSA and on the differential connector of the DWZZA or DWZZB.

6 Connect an H879-AA terminator to one leg of the "Y" cable or tri-link connector on each of the two devices that will be at the ends of the shared bus (a KZTSA, or the DWZZ* for the BA350, BA353, or BA356).

7 Connect a BN21K or BN21L cable between all open connectors on the BN21W-0B "Y" cables or tri-link connectors, daisy-chaining the devices.


8 If you are using a DWZZA-AA, connect a BN21R or BN23G cable from the DWZZA-AA single-ended jack to JA1 on the BA350 or to the BA353 input connector.

9 If you are using a BA350, ensure that the BA350 terminator and SCSI jumper are both installed. (Refer to: Figure 3–8)

If you are using a BA356, ensure that the BA356 SCSI jumper is installed. (Refer to: Figure 3–10)

If you are using a BA353 with a DWZZA-VA, ensure that terminator 12-37004-04 is installed on the BA353 SCSI input connector. (Refer to: Figure 3–9)

Figure 3–21 Available Server Configuration with Two DEC 3000 Model 500 Systems Using KZTSA SCSI Adapters and a Single-Ended Shared Bus with a BA350

[Figure: two DEC 3000 Model 500 systems, each with a KZTSA, linked by an Ethernet interface. BN21K or BN21L cables daisy-chain H885-AA tri-link connectors on the two KZTSAs and on a DWZZA-VA installed in the BA350; the tri-link connectors at the ends of the bus carry H879-AA terminators. Art ID ZKOX-3927-15-RGS.]


Figure 3–22 Available Server Configuration with Two DEC 3000 Model 500 Systems Using KZTSA SCSI Adapters and a Single-Ended Shared Bus with a BA356

[Figure: two DEC 3000 Model 500 systems, each with a KZTSA, linked by an Ethernet interface. BN21K or BN21L cables daisy-chain H885-AA tri-link connectors on the two KZTSAs and on a DWZZB-VW installed in the BA356; the tri-link connectors at the ends of the bus carry H879-AA terminators. Art ID ZKOX-5481-03-RGS.]

Table 3–21 Hardware Needed for a KZPSA or KZTSA and BA350, BA353, or BA356 Available Server Configuration

Number of  BN21W-0B "Y" Cables  H879-AA      BN21K or BN21L  DWZZA-AA and BN21R   DWZZA-VA (1)(3)
Systems    or H885-AA Tri-link  Terminators  SCSI Cables     or BN23G Cable       or DWZZB-VW (1)
           Connectors                                        (1)(2)
2          3                    2            2               1                    1
3          4                    2            3               1                    1
4          5                    2            4               1                    1

(1) Use either a DWZZA-AA or a DWZZA-VA with a BA350 or BA353 (one converter in total). Use a DWZZB-VW with a BA356.
(2) The BN21R (BN23G) cable is not needed if you use a DWZZA-VA or DWZZB-VW.
(3) If you use a DWZZA-VA with a BA353, you need a 12-37004-04 terminator installed on the BA353 SCSI input connector.
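The rows of Table 3–21 follow a simple pattern: one "Y" cable or tri-link connector per system plus one for the storage-side DWZZ*, always two H879-AA terminators, one BN21K or BN21L cable per system, and a single DWZZ* for the storage unit. The Python sketch below reproduces that arithmetic as a planning aid. It is derived from the table rows, not from any Digital tool, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedBusParts:
    tri_links: int    # BN21W-0B "Y" cables or H885-AA tri-link connectors
    terminators: int  # H879-AA terminators (one at each end of the bus)
    cables: int       # BN21K or BN21L SCSI cables
    dwzz: int         # signal converters for the storage unit


def storage_unit_parts(systems: int) -> SharedBusParts:
    """Parts for KZTSA or KZPSA systems sharing a BA350, BA353, or BA356."""
    if not 2 <= systems <= 4:
        raise ValueError("Table 3-21 covers two, three, or four systems")
    return SharedBusParts(
        tri_links=systems + 1,  # one per adapter plus one on the DWZZ*
        terminators=2,
        cables=systems,         # daisy chain of (systems + 1) devices
        dwzz=1,                 # DWZZA-AA (with BN21R/BN23G), DWZZA-VA, or DWZZB-VW
    )


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, storage_unit_parts(n))
```

Dropping the dwzz field gives the three columns of Table 3–23, because there the HSZ40 takes the place of the DWZZ* on the differential bus.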

Use Table 3–22 to set up an Available Server configuration using KZTSA TURBOchannel to SCSI adapters with an HSZ40. Remember, the KZTSA is a TURBOchannel to fast-wide differential SCSI adapter; therefore, when it is used with an HSZ40, you do not have to use a DWZZA in the Available Server configuration.

Figure 3–23 shows an Available Server configuration with two DEC 3000 Model 500 systems with KZTSA TURBOchannel SCSI adapters on a shared bus with an HSZ40.

Table 3–23 lists the hardware necessary to generate configurations using two, three, or four systems. These systems use KZTSA TURBOchannel SCSI adapters or KZPSA PCI SCSI adapters in an Available Server configuration with an HSZ40 on the shared bus.


Table 3–22 Setting Up an Available Server Configuration with KZTSA TURBOchannel to SCSI Adapters and an HSZ40

Step Action Refer to:

1 For each DEC 3000 system that will have a KZTSA on the shared SCSI bus, shut down the system and install the KZTSA. (Refer to: KZTSA SCSI Storage Adapter Installation and User’s Guide)

Disable the KZTSA internal SCSI termination by removing the J1, J2, J3, J6, and J7 terminator packs. (Refer to: Figure 3–24)

2 The default SCSI ID for a KZTSA is 7. Turn on the system power and set the KZTSA SCSI ID if necessary. (Refer to: Examples 3–1, 3–3, and 3–5)

3 Install a BN21W-0B "Y" cable or H885-AA tri-link connector on the external SCSI p-connector of each KZTSA and on the HSZ40 input connector.

4 Connect an H879-AA terminator to one leg of the "Y" cable or tri-link connector on each of the two devices (KZTSA or HSZ40) that will be at the ends of the shared bus.

5 Connect a BN21K or BN21L cable between all open connectors on the BN21W-0B "Y" cables or tri-link connectors, daisy-chaining the devices.

Figure 3–23 Two DEC 3000 Model 500 Systems with KZTSA TURBOchannel SCSI Adapters in an Available Server Configuration with an HSZ40

[Figure: two DEC 3000 Model 500 systems, each with a KZTSA, linked by a network interface. BN21K or BN21L cables daisy-chain H885-AA tri-link connectors on the two KZTSAs and on an HSZ40 with DEC RAID subsystem; the tri-link connectors at the ends of the bus carry H879-AA terminators. Art ID ZKOX-3927-16-RGS.]


Table 3–23 Hardware Needed for an Available Server Configuration with a KZPSA or KZTSA and an HSZ40

Number of  BN21W-0B "Y" Cables  H879-AA      BN21K or BN21L
Systems    or H885-AA Tri-link  Terminators  SCSI Cables
           Connectors
2          3                    2            2
3          4                    2            3
4          5                    2            4

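Figure 3–24 KZTSA Jumpers and Termination

[Figure: KZTSA module showing terminator packs J1, J2, J3, J6, and J7, jumpers W1 to W4, and the status LEDs; callouts 1 to 9 are described below. Art ID NUO_010_000581_09_RGS.]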

1 Internal SCSI bus p-connector

2 Near-end SCSI bus terminator packs

3 Yellow LED: power-on self-test passed

4 Red LED: power-on self-test failed

5 Green LED: SCSI bus terminator power is functional

6 Jumper W1 (installed): in-line fuse that protects the onboard SCSI bus terminator power supply

7 Jumper W2 (not installed): manufacturing use only

8 Jumper W3 (installed): enables terminator power onto the SCSI bus

9 Jumper W4 (not installed): manufacturing use only

Setting Up an Available Server Configuration with KZMSA SCSI Controllers

The KZMSA is an XMI to SCSI adapter used in DEC 7000 or DEC 10000 systems. It is a dual-channel, single-ended SCSI controller. The KZMSA internal termination cannot be removed, so the KZMSA must be used with a DWZZA-AA signal converter to provide the proper SCSI bus termination and to allow a KZMSA and its associated system to be isolated for maintenance. When using a KZMSA for a shared SCSI bus in an Available Server configuration, make sure that you connect the bus to the same channel that is used on the other KZMSA or PMAZC SCSI controllers.

Each KZMSA used for a shared SCSI bus in an Available Server configuration must have the revision F03 boot ROM. If necessary, replace a revision F01 or F02 boot ROM with a revision F03 boot ROM. The part numbers for the KZMSA boot ROM revisions are shown in Table 3–24.

Table 3–24 KZMSA Boot ROM Part Numbers

Part Number Revision

23-368E9-01 F01

23-386E9-01 F02

23-419E9-01 F03

You can determine the KZMSA hardware revision by booting the LFU utility and using its display command, or by examining the 23-class part number printed on the boot ROM located at module position E7. The LFU utility is described later in this section.

Only KZMSAs with Rev D NCR 53C710 chips can be used in an Available Server configuration. The chip must have part number 609-3400546 or 609-3400563.
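The two KZMSA requirements above (an F03 boot ROM and an accepted Rev D 53C710 chip) can be checked from the part numbers printed on the module. The Python sketch below assumes you have read both part numbers off the board; the lookup table copies Table 3–24 and the chip list copies the paragraph above, while the function name is hypothetical.

```python
from typing import List, Tuple

# Boot ROM part number -> revision (Table 3-24)
BOOT_ROM_REVISION = {
    "23-368E9-01": "F01",
    "23-386E9-01": "F02",
    "23-419E9-01": "F03",
}

# Rev D NCR 53C710 part numbers accepted for Available Server use
ACCEPTED_53C710 = {"609-3400546", "609-3400563"}


def kzmsa_shared_bus_ready(boot_rom_part: str, chip_part: str) -> Tuple[bool, List[str]]:
    """Return (ready, problems) for one KZMSA (hypothetical helper)."""
    problems: List[str] = []
    revision = BOOT_ROM_REVISION.get(boot_rom_part)
    if revision is None:
        problems.append(f"unknown boot ROM part number {boot_rom_part}")
    elif revision != "F03":
        problems.append(f"boot ROM is {revision}; replace it with F03 (23-419E9-01)")
    if chip_part not in ACCEPTED_53C710:
        problems.append(f"53C710 part {chip_part} is not a Rev D part on the accepted list")
    return (not problems, problems)


if __name__ == "__main__":
    print(kzmsa_shared_bus_ready("23-368E9-01", "609-3400546"))
```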

Follow the steps in Table 3–25 to set up an Available Server configuration using only KZMSAs and a BA350, BA353, or BA356 storage expansion unit.

Figure 3–25 shows an Available Server configuration with two DEC 7000 systems with KZMSA XMI to SCSI adapters on a shared bus with a BA350. Figure 3–26 shows the same configuration with a BA356.

Table 3–15 provides a list of hardware needed for differential Available Server configurations using PMAZC SCSI bus controllers or KZMSA XMI to SCSI bus adapters and BA350, BA353, or BA356 storage expansion units. These configurations use DWZZ* signal converters, one of which may be a DWZZA-VA or DWZZB-VW. A DWZZA-VA must be installed in slot 0 of the BA350 (or in any slot of a BA353), and a DWZZB-VW must be installed in slot 0 of the BA356.

Note

You can have only three systems in an Available Server configuration that includes a KZMSA XMI to SCSI adapter.

Table 3–25 Setting Up an Available Server Configuration with KZMSA XMI to SCSI Adapters and a BA350, BA353, or BA356

Step Action Refer to:

1 For each DEC 7000 or DEC 10000 system using a KZMSA on the shared bus, shut down the system and install the KZMSA in an XMI slot, keeping in mind that all SCSI controllers on the shared SCSI bus in an Available Server configuration must be on the same logical SCSI bus. (Refer to: KZMSA Adapter Installation Guide)

Boot the Loadable Firmware Update (LFU) utility to configure the KZMSA hardware. (Refer to: Example 3–6)

Update the KZMSA firmware if necessary. (Refer to: Examples 3–7 and 3–8)

Set the SCSI IDs for the KZMSA. (Refer to: Examples 3–7 and 3–9)

Enable the Disable Reset configuration option for any KZMSA channel that will be used for a shared SCSI bus, and disable the option for any channel not used on a shared SCSI bus. (Refer to: Examples 3–7 and 3–9)

Enable (or disable) fast SCSI speed for the KZMSA. (Refer to: Examples 3–7 and 3–9)

2 You will need one DWZZA-AA for each KZMSA on the shared SCSI bus and one DWZZ* for the BA350, BA353, or BA356.

For each DWZZ*, remove the five differential terminator resistor SIPs. (Refer to: Figure 3–11, Figure 3–12, Figure 3–13, or Figure 3–14, as appropriate)

For each DWZZA-AA, ensure that the single-ended SCSI termination jumper, J2, is installed to provide termination for the single-ended SCSI bus segment. (Refer to: Figure 3–11)

If a DWZZA-VA is to be installed in the BA350, make sure that the single-ended SCSI termination jumper, J2, is installed. If it is to be installed in a BA353, remove J2 and install terminator 12-37004-04 on the BA353 SCSI input connector. (Refer to: Figure 3–12)

If a DWZZB-VW is to be installed in a BA356, make sure that the single-ended SCSI termination jumpers, W1 and W2, are installed. (Refer to: Figure 3–14)

If a DWZZB-AA is to be used (external to the BA356), ensure that the single-ended SCSI termination jumpers, W1 and W2, are installed. (Refer to: Figure 3–13)

3 For each KZMSA used in the Available Server configuration, install a BN21R or BN23G cable between the KZMSA connector for the appropriate channel and the single-ended connector on a DWZZA-AA.


4 Install a BN21W-0B "Y" cable or an H885-AA tri-link connector on the differential connector of each DWZZ*.

5 If you use a DWZZA-VA, install it in slot 0 of the BA350 or in any BA353 slot.

For a BA356, use a DWZZB-VW and install it in slot 0 of the BA356.

6 Install an H879-AA differential terminator on one leg of the BN21W-0B "Y" cable or tri-link connector on each of the two devices that will be at the ends of the shared SCSI bus.

7 If you are using a DWZZA-AA for the connection to the BA350 or BA353, connect a BN21R or BN23G cable between the single-ended DWZZA-AA connector and BA350 JA1 or the BA353 SCSI input connector.

If you are using a DWZZB-AA for the connection to the BA356, connect a BN21K or BN21L cable between the single-ended DWZZB-AA connector and BA356 JA1.

8 Install a BN21K or BN21L cable between the open connections on the BN21W-0B "Y" cables or tri-link connectors, creating a daisy chain from one DWZZ* to another. Make sure that the "Y" cables or tri-link connectors with terminators are at the ends of the bus.

9 If you are using a BA350, ensure that the BA350 terminator and SCSI jumper are both installed. If you are using a BA356, ensure that the BA356 SCSI jumper is installed. (Refer to: Figure 3–8 and Figure 3–10)

If you are using a BA353 with a DWZZA-VA, ensure that terminator 12-37004-04 is installed on the BA353 SCSI input connector. (Refer to: Figure 3–9)
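Steps 6 and 8 state the rule that every shared-bus table in this section repeats: H879-AA terminators go only on the "Y" cables or tri-link connectors at the two ends of the daisy chain, and BN21K or BN21L cables join neighboring devices. The Python sketch below checks a planned bus against that rule. Modeling the bus as a list of per-device terminator flags, and the function names, are assumptions made for illustration.

```python
from typing import List, Sequence


def check_shared_bus(terminated: Sequence[bool]) -> List[str]:
    """Check terminator placement on a daisy-chained shared SCSI bus.

    terminated[i] is True if the "Y" cable or tri-link connector on the
    device at position i (counting from 0 along the chain) carries an
    H879-AA terminator. Returns a list of problems; empty means OK.
    """
    if len(terminated) < 2:
        return ["a shared bus needs at least two devices"]
    problems: List[str] = []
    if not terminated[0]:
        problems.append("device at position 0 (bus end) is not terminated")
    if not terminated[-1]:
        last = len(terminated) - 1
        problems.append(f"device at position {last} (bus end) is not terminated")
    for i in range(1, len(terminated) - 1):
        if terminated[i]:
            problems.append(f"device at position {i} is mid-bus but has a terminator")
    return problems


def cables_needed(devices: int) -> int:
    """BN21K or BN21L cables needed to daisy-chain the devices."""
    return max(devices - 1, 0)


if __name__ == "__main__":
    # Two DWZZA-AA converters (one per KZMSA) plus the DWZZ* for the storage unit
    plan = [True, False, True]
    print(check_shared_bus(plan) or "termination OK", cables_needed(len(plan)))
```

For two systems this gives three devices and two cables, which agrees with the two-system rows of Tables 3–21 and 3–23.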


Figure 3–25 Available Server Configuration with Two DEC 7000 Systems with KZMSA XMI SCSI Adapters and a BA350

[Figure: two DEC 7000 systems, linked by a network interface, each with a KZMSA cabled by a BN21R or BN23G cable to a DWZZA-AA. BN21K or BN21L cables daisy-chain H885-AA tri-link connectors on the two DWZZA-AA converters and on a third DWZZA-AA, which connects to the BA350 (slots 0 to 5 shown) through a BN21R or BN23G cable; the tri-link connectors at the ends of the bus carry H879-AA terminators. Art ID ZKOX-3927-17-RGS.]


Figure 3–26 Available Server Configuration with Two DEC 7000 Systems with KZMSA XMI SCSI Adapters and a BA356

[Figure: two DEC 7000 systems, linked by a network interface, each with a KZMSA cabled by a BN21R or BN23G cable to a DWZZA-AA. BN21K or BN21L cables daisy-chain H885-AA tri-link connectors on the two DWZZA-AA converters and on a DWZZB-VW installed in the BA356; the tri-link connectors at the ends of the bus carry H879-AA terminators. Art ID ZKOX-5481-07-RGS.]

Follow the steps in Table 3–26 to set up an Available Server configuration using only KZMSAs and an HSZ40.

Figure 3–27 shows an Available Server configuration with two DEC 7000 systems with KZMSA XMI SCSI adapters on a shared bus with an HSZ40.

Table 3–19 provides a list of hardware needed for differential Available Server configurations using PMAZC SCSI bus controllers or KZMSA XMI to SCSI adapters and an HSZ40 with DEC RAID subsystem.

Table 3–26 Setting Up an Available Server Configuration with KZMSA XMI to SCSI Adapters and an HSZ40

Step Action Refer to:

1 For each DEC 7000 or DEC 10000 system using a KZMSA on the shared bus, shut down the system and install the KZMSA in an XMI slot, keeping in mind that all SCSI controllers on the shared SCSI bus in an Available Server configuration must be on the same logical SCSI bus. (Refer to: KZMSA Adapter Installation Guide)


Boot the Loadable Firmware Update (LFU) utility to configure the KZMSA hardware. (Refer to: Example 3–6)

Update the KZMSA firmware if necessary. (Refer to: Examples 3–7 and 3–8)

Set the SCSI IDs for the KZMSA. (Refer to: Examples 3–7 and 3–9)

Enable the Disable Reset configuration option for any KZMSA channel that will be used for a shared SCSI bus, and disable the option for any channel not used on a shared SCSI bus. (Refer to: Examples 3–7 and 3–9)

Enable (or disable) fast SCSI speed for the KZMSA. (Refer to: Examples 3–7 and 3–9)

2 You will need a DWZZA-AA for each KZMSA XMI to SCSI adapter in the Available Server configuration.

For each DWZZA-AA, ensure that the single-ended SCSI jumper, J2, is installed, and remove the five differential terminator resistor SIPs. (Refer to: Figure 3–11)

3 For each KZMSA used in the Available Server configuration, install a BN21R or BN23G cable between the KZMSA connector for the appropriate channel and the DWZZA-AA single-ended connector.

4 Install a BN21W-0B "Y" cable or an H885-AA tri-link connector on the HSZ40 input connector and on each DWZZA-AA differential connector.

5 Install an H879-AA differential terminator on one leg of the BN21W-0B "Y" cable or H885-AA tri-link connector on each of the two devices that will be at the ends of the shared SCSI bus.

6 Install a BN21K or BN21L cable between the unused connections on the BN21W-0B "Y" cables or tri-link connectors, creating a daisy chain from one DWZZA-AA to another and to the HSZ40. Make sure that the "Y" cables or tri-link connectors with the terminators are at the ends of the bus.


Figure 3–27 Available Server Configuration with Two DEC 7000 Systems Using KZMSA XMI to SCSI Adapters with an HSZ40

[Figure: two DEC 7000 systems, linked by a network interface, each with a KZMSA cabled by a BN21R or BN23G cable to a DWZZA-AA. BN21K or BN21L cables daisy-chain H885-AA tri-link connectors on the two DWZZA-AA converters and on an HSZ40 with DEC RAID subsystem; the tri-link connectors at the ends of the bus carry H879-AA terminators. Art ID ZKOX-3927-18-RGS.]

Preparing a KZMSA for Use in an Available Server Environment

If you are using a DEC 7000 or DEC 10000 system with a KZMSA in your Available Server configuration, you may have to update the KZMSA firmware, change the SCSI ID or bus speed, or enable or disable the Disable Reset option.

For the DEC 7000 and DEC 10000 systems, use the Loadable Firmware Update (LFU) utility to perform these hardware tasks. Shut down the system, then load the LFU utility as shown in Example 3–6.

1 At the console prompt, use the show device kzmsa0 command to determine the name of the RRD42 drive. Load the CD–ROM into an RRD42 caddy and insert the caddy into the RRD42 drive. The CD–ROM that includes both the LFU utility and the KZMSA revision 5.6 firmware has the label: Alpha AXP Systems Firmware Update 2.9

2 Boot the LFU utility.


Example 3–6 Booting the LFU Utility

>>> show device kzmsa0                                     (1)
polling for units on kzmsa0, slot2, xmi0...
dka100.1.0.2.0     dka100     RRD42
>>> boot DKA100 -flag 0,80                                 (2)

Boot File: KZMSA_LFU.EXE                                   (3)

Booting...

****** Loadable Firmware Update Utility ******

--------------------------------------------------------------
Function     Description
--------------------------------------------------------------
Display      Displays the system’s configuration table.
Exit         Returns to loadable offline operating environment.
List         Lists the device types and firmware revisions
             supported by this revision of LFU.
Modify       Modifies port parameters and device attributes.
Show         Displays device mnemonic, hardware and firmware
             revisions.
Update       Replaces current firmware with loadable data image.
Verify       Compares loadable and device images.
? or Help    Scrolls the function table.
--------------------------------------------------------------

Function?                                                  (4)

3 When prompted, specify the name of the secondary bootstrap file (KZMSA_LFU.EXE).

4 Enter the command for the task you want to perform.

You can display information about the hardware configuration with the LFU display command, as shown in Example 3–7.


Example 3–7 Using the LFU Utility to Display Hardware Configuration

Function? display                                          (1)
Name            Type     Rev    Mnemonic   FW Rev   HW Rev

LSB
0+   KN7AA      (8001)   0000   kn7aa0     1.0      E04
5+   MS7AA      (4000)   0000   ms7aa0     N/A      A01
7+   MS7AA      (4000)   0000   ms7aa1     N/A      A01
8+   IOP        (2000)   0001   iop0       N/A      A

C0 XMI                          xmi0
8+   DWLMA      (102A)   A5A6   dwlma0     N/A      A
B+   KZMSA      (0C36)   5143   kzmsa0     4.3      F01      (2)
C+   KZMSA      (0C36)   5143   kzmsa1     4.3      F01      (2)
E+   DEMNA      (0C03)   060B   demna0     6.8

C1 XMI
1+   KZMSA      (0C36)   5343   kzmsa2     4.3      F03      (3)
2+   KZMSA      (0C36)   5343   kzmsa3     4.3      F03      (3)
8+   DWLMA      (102A)   A5A6   dwlma1     N/A      A
Function?

1 Enter the display command to display the configuration.

2 kzmsa0 and kzmsa1 have revision 4.3 firmware and revision F01 hardware. Their boot ROMs must be replaced with revision F03 boot ROMs before they can be used on a shared SCSI bus.

3 kzmsa2 and kzmsa3 have revision 4.3 firmware and revision F03 hardware.
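If you capture the LFU display output (for example, in a console log), the check described by callouts 2 and 3 can be automated. The Python sketch below assumes rows laid out as in Example 3–7 with the callout markers removed. It treats F03 as the required hardware revision (from Table 3–24) and revision 5.6, the firmware on the Alpha AXP Systems Firmware Update 2.9 CD–ROM, as the target; the regular expression and function names are hypothetical.

```python
import re
from typing import Dict, List

# Assumed row layout: <node>+ KZMSA (<type>) <rev> <mnemonic> <FW rev> <HW rev>
KZMSA_ROW = re.compile(
    r"^\s*\w+\+\s+KZMSA\s+\(\w+\)\s+\w+\s+(kzmsa\d+)\s+(\d+\.\d+)\s+(F\d\d)\s*$"
)


def _version(text: str) -> tuple:
    """Turn '4.3' into (4, 3) so revisions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def kzmsa_actions(display_text: str, target_fw: str = "5.6") -> Dict[str, List[str]]:
    """Map each KZMSA mnemonic to the work it needs (empty list = ready)."""
    actions: Dict[str, List[str]] = {}
    for line in display_text.splitlines():
        match = KZMSA_ROW.match(line)
        if not match:
            continue  # not a KZMSA row
        name, fw_rev, hw_rev = match.groups()
        todo: List[str] = []
        if hw_rev != "F03":
            todo.append(f"replace {hw_rev} boot ROM with F03")
        if _version(fw_rev) < _version(target_fw):
            todo.append(f"update firmware {fw_rev} -> {target_fw}")
        actions[name] = todo
    return actions


if __name__ == "__main__":
    sample = (
        "B+   KZMSA      (0C36)   5143   kzmsa0     4.3      F01\n"
        "1+   KZMSA      (0C36)   5343   kzmsa2     4.3      F03\n"
    )
    for name, todo in kzmsa_actions(sample).items():
        print(name, todo or "ready")
```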

If the KZMSA firmware is not at the required revision, use the LFU update command to update it. Note that the CD–ROM containing the firmware must be in the RRD42 drive. The update command has the format:

update kzmsa#

where the number sign (#) is the number of the KZMSA whose firmware is to be updated.

Example 3–8 shows how to update the firmware for kzmsa2 to revision 5.6.


Example 3–8 Using the LFU Utility to Update KZMSA Firmware

Function? update kzmsa2                                    (1)

Update kzmsa2? [Y/(N)] Return

WARNING: updates may take several minutes to complete for each device.

                    DO NOT ABORT!
kzmsa2     Updating to 5.6...  Reading Device...  Verifying 5.6...  PASSED.

Function? display                                          (2)
Name            Type     Rev    Mnemonic   FW Rev   HW Rev

LSB
0+   KN7AA      (8001)   0000   kn7aa0     1.0      E04
5+   MS7AA      (4000)   0000   ms7aa0     N/A      A01
7+   MS7AA      (4000)   0000   ms7aa1     N/A      A01
8+   IOP        (2000)   0001   iop0       N/A      A

C0 XMI                          xmi0
8+   DWLMA      (102A)   A5A6   dwlma0     N/A      A
B+   KZMSA      (0C36)   5143   kzmsa0     4.3      F01      (3)
C+   KZMSA      (0C36)   5143   kzmsa1     4.3      F01      (3)
E+   DEMNA      (0C03)   060B   demna0     6.8

C1 XMI
1+   KZMSA      (0C36)   5356   kzmsa2     5.6      F03      (4)
2+   KZMSA      (0C36)   5343   kzmsa3     4.3      F03      (5)
8+   DWLMA      (102A)   A5A6   dwlma1     N/A      A
Function?

1 Update the firmware for kzmsa2.

2 Display the configuration to verify that the firmware has been updated.

3 kzmsa0 and kzmsa1 are still at firmware revision 4.3.

4 kzmsa2 is now at firmware revision 5.6.

5 kzmsa3 is still at firmware revision 4.3.

Use the LFU modify kzmsa# command to display detailed information about a specific KZMSA and to:

• Change the SCSI ID.

• Enable or disable the fast SCSI option for a particular channel.

• Enable or disable the Disable Reset option.

Example 3–9 shows how to use the LFU modify command to display detailed information, set the SCSI ID, enable fast SCSI bus speed, and enable the Disable Reset option for kzmsa2.
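Before running modify, it can help to work out which answers you will give at each Change? prompt. The Python sketch below compares the current parameter values (names copied from the Example 3–9 display) with the settings this section calls for: Disable Reset enabled only on channels used for a shared bus, plus the chosen SCSI ID and speed on those channels. It only plans the answers and does not drive the LFU utility; the function name is hypothetical.

```python
from typing import Dict, Iterable


def kzmsa_modify_plan(current: Dict[str, str], shared_channels: Iterable[int],
                      scsi_id: int, fast: bool) -> Dict[str, str]:
    """Return {parameter: new value} for parameters that must change."""
    shared = set(shared_channels)
    if not 0 <= scsi_id <= 7:
        raise ValueError("Valid ID is a value from 0 to 7")
    desired: Dict[str, str] = {}
    for ch in (0, 1):
        # Disable Reset: enabled on shared-bus channels, disabled otherwise
        desired[f"Disable Reset Channel {ch}"] = "ENABLED" if ch in shared else "DISABLED"
    for ch in sorted(shared):
        desired[f"Chnl {ch} Fast SCSI"] = "ENABLED" if fast else "DISABLED"
        desired[f"Channel_{ch} ID"] = f"{scsi_id:02d}"
    return {name: value for name, value in desired.items() if current.get(name) != value}


if __name__ == "__main__":
    before = {  # values from the first display in Example 3-9
        "Disable Reset Channel 0": "DISABLED", "Disable Reset Channel 1": "DISABLED",
        "Chnl 0 Fast SCSI": "DISABLED", "Chnl 1 Fast SCSI": "DISABLED",
        "Channel_0 ID": "07", "Channel_1 ID": "07",
    }
    for name, value in kzmsa_modify_plan(before, [0, 1], 6, True).items():
        print(f"{name}: change to {value}")
```

With both channels on shared buses, an ID of 6, and fast speed, the plan lists the same six changes that Example 3–9 makes.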


Example 3–9 Using the LFU Utility to Modify KZMSA Options

Function? modify kzmsa2                                         (1)
kzmsa2
Local Console:                    ENABLED
Log Selftest Errors:              ENABLED
Log NRC 53C710 RBD Errors:        ENABLED
Log XMI RBD Errors:               ENABLED
Log XZA RBD Errors:               ENABLED
RBD Error Logging:                DISABLED
RBD Error Frame Overflow:         DISABLED   Read Only
Hard Error Frame Overflow:        DISABLED   Read Only
Soft Error Frame Overflow:        DISABLED   Read Only
FW Update Error Frame Overflow:   DISABLED   Read Only
Disable Reset Channel 0:          DISABLED                      (2)
Disable Reset Channel 1:          DISABLED                      (2)
Chnl 0 Fast SCSI:                 DISABLED                      (3)
Chnl 1 Fast SCSI:                 DISABLED                      (3)
Channel_0 ID:                     07                            (4)
Channel_1 ID:                     07                            (4)
Module Serial Numbers:            *SG90XXX455*
Do you wish to modify any of these parameters? [y/(n)] Return

Local Console: ENABLED Change? [y/(n)] Return

Log Selftest Errors: ENABLED Change? [y/(n)] Return
.
.
.
Disable Reset Channel 0: DISABLED Change? [y/(n)] y             (5)
Disable Reset Channel 1: DISABLED Change? [y/(n)] y             (5)
Chnl 0 Fast SCSI: DISABLED Change? [y/(n)] y                    (6)
Chnl 1 Fast SCSI: DISABLED Change? [y/(n)] y                    (6)
Channel_0 ID: 07 Change? [y/(n)] y                              (7)
Valid ID is a value from 0 to 7.
Enter new Channel ID: 6                                         (7)
Channel_1 ID: 07 Change? [y/(n)] y                              (7)
Valid ID is a value from 0 to 7.
Enter new Channel ID: 6                                         (7)
Module Serial Numbers: *SG90XXX455* Change? [y/(n)] n
Local Console:                    ENABLED
Log Selftest Errors:              ENABLED
Log NRC 53C710 RBD Errors:        ENABLED
Log XMI RBD Errors:               ENABLED
Log XZA RBD Errors:               ENABLED
RBD Error Logging:                DISABLED
RBD Error Frame Overflow:         DISABLED   Read Only
Hard Error Frame Overflow:        DISABLED   Read Only
Soft Error Frame Overflow:        DISABLED   Read Only
FW Update Error Frame Overflow:   DISABLED   Read Only
Disable Reset Channel 0:          ENABLED                       (8)
Disable Reset Channel 1:          ENABLED                       (8)
Chnl 0 Fast SCSI:                 ENABLED                       (9)
Chnl 1 Fast SCSI:                 ENABLED                       (9)
Channel_0 ID:                     06                            (10)
Channel_1 ID:                     06                            (10)
Module Serial Numbers:            *SG909T1455*
Modify kzmsa2 with these parameter values? [y/(n)] y            (11)

Function? exit
>>>

1 Execute the LFU modify command to modify the options for kzmsa2. The present options are displayed first.

2 The Disable Reset option for both channels is disabled.

3 The fast SCSI option is disabled for both channels.

4 The ID for both channels is 7.

5 Enable the Disable Reset option for channels 0 and 1.

6 Enable the fast SCSI option for channels 0 and 1.

7 Change the SCSI ID for channels 0 and 1 to 6.

8 The LFU utility is set up to enable the Disable Reset option.

9 The LFU utility is set up to enable the fast SCSI option.

10 The LFU utility is set up to set the SCSI ID for both channels to 6.

11 Answer y to change the options to the requested values.

Setting Up an Available Server Configuration Using KZPSA PCI to SCSI Adapters

The KZPSA PCI to SCSI bus adapter is installed in a PCI slot of any supported AlphaServer for use in an Available Server environment.

The KZPSA is a fast, wide, differential adapter with only a single port, so only one differential shared SCSI bus can be connected to a KZPSA adapter.

The KZPSA operates at fast or slow speed and is compatible with narrow or wide SCSI. The fast speed is 10 MB/sec for a narrow SCSI bus and 20 MB/sec for a wide SCSI bus. The slow speed is 5 MB/sec for a narrow SCSI bus and 10 MB/sec for a wide SCSI bus.
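The four figures above follow from multiplying the transfer rate by the bus width: fast SCSI moves 10 million transfers per second and slow SCSI 5 million, while a narrow (8-bit) bus carries 1 byte per transfer and a wide (16-bit) bus 2 bytes. A short Python sketch of that arithmetic, with a hypothetical function name:

```python
# Million transfers per second for each speed setting
TRANSFER_RATE_M = {"fast": 10, "slow": 5}
# Bytes moved per transfer: 8-bit narrow bus, 16-bit wide bus
BUS_WIDTH_BYTES = {"narrow": 1, "wide": 2}


def scsi_bandwidth_mb_s(speed: str, width: str) -> int:
    """Peak SCSI bus bandwidth in MB/sec for a speed and bus width."""
    return TRANSFER_RATE_M[speed] * BUS_WIDTH_BYTES[width]


if __name__ == "__main__":
    for speed in ("fast", "slow"):
        for width in ("narrow", "wide"):
            print(f"{speed} {width}: {scsi_bandwidth_mb_s(speed, width)} MB/sec")
```

These are peak bus rates; sustained throughput on a shared bus is lower.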

Use Table 3–27 to set up an Available Server configuration with KZPSA adapters and a BA350, BA353, or BA356.

Figure 3–28 shows an Available Server configuration with two AlphaServer 2100 systems with KZPSA PCI to SCSI adapters on a shared bus with a BA350 storage expansion unit.

Table 3–21 shows the hardware components needed for configurations using KZPSA PCI to SCSI adapters (or KZTSA TURBOchannel to SCSI adapters) and a BA350, BA353, or BA356.


Table 3–27 Setting Up an Available Server Configuration Using KZPSA Adapters and a BA350, BA353, or BA356

Step Action Refer to:

1 Install a KZPSA PCI to SCSI bus adapter in the PCI slot corresponding to the logical bus to be used for the shared SCSI bus. (Refer to: KZTSA SCSI Storage Adapter Installation and User’s Guide)

Remove the KZPSA internal termination resistors, Z1, Z2, Z3, Z4, and Z5. (Refer to: Figure 3–30)

2 Use the show config, show device, and show pk#* console commands to display the installed devices and information about the KZPSAs on the AlphaServer 1000, 2000, or 2100 systems. (Refer to: Example 3–10)

If necessary, update the KZPSA firmware by booting from the Alpha Systems Firmware Update CD–ROM. (Refer to: the firmware release notes for the system)

Set the KZPSA SCSI bus ID and bus speed as necessary for this configuration. (Refer to: Example 3–11)

3 You will need one DWZZ*. It can be a DWZZA-AA, DWZZA-VA, or DWZZB-VW; a DWZZB-VW is recommended.

For each DWZZ*, remove the five differential terminator resistor SIPs. (Refer to: Figure 3–11, Figure 3–12, or Figure 3–14, as appropriate)

The DWZZ* single-ended termination depends on the type of StorageWorks device used:

BA350: Ensure that the DWZZA-AA or DWZZA-VA single-ended termination jumper, J2, is installed.

BA353: For a DWZZA-AA, ensure that the single-ended SCSI termination jumper, J2, is installed. For a DWZZA-VA, remove the single-ended SCSI termination jumper, J2, and install terminator 12-37004-04 on the BA353 SCSI input connector.

BA356: Ensure that the DWZZB-VW single-ended termination jumpers, W1 and W2, are installed.

4 Install a BN21W-0B "Y" cable or H885-AA tri-link connector on each KZPSA in the configuration and on the differential end of the DWZZA or DWZZB.

5 Install an H879-AA terminator on the two tri-link connectors or "Y" cables attached to the two devices (adapters or DWZZ*) that will be at the ends of the shared bus.

6 Connect the other "Y" cables or tri-link connectors together with BN21K or BN21L cables. You will need one cable for each KZPSA in the configuration. Daisy-chain from one adapter or device to the next, keeping the "Y" cables or tri-link connectors with the installed terminators at the ends of the bus.

7 If you are using a DWZZA-VA, install it in slot 0 of the BA350 or in any BA353 slot. Install a DWZZB-VW in slot 0 of a BA356.

8 If you are using a BA350, ensure that the BA350 terminator and jumper are both installed. (Refer to: Figure 3–8)

(continued on next page)

3–62 Configuring TruCluster Available Server Hardware

Page 137: Truclu Ase

Configuring TruCluster Available Server Hardware

Table 3–27 (Cont.) Setting Up an Available Server Configuration Using KZPSA Adapters and aBA350, BA353, or BA356

Step Action Refer to:

For a BA353 with a DWZZA-VA, ensure that terminator 12-37004-04 isinstalled on the BA353 SCSI input connector.

Figure 3–9

For a BA356, ensure that the SCSI jumper is installed. Figure 3–10

9 If you are using a DWZZA-AA, connect a BN21R or BN23G cablebetween the DWZZA-AA single-ended connector and the BA350 JA1input connector or the BA353 SCSI input connector.

Figure 3–28 Available Server Configuration with Two AlphaServer 2100 Systems Using KZPSA PCI to SCSI Adapters with a BA350

[Illustration labels: AlphaServer 2100 systems, BA350, Network Interface, DWZZA-VA with H885-AA tri-link connector and H879-AA terminator, BN21W-0B "Y" cables, BN21K or BN21L cables, H879-AA terminator.]

Table 3–28 shows how to use KZPSA adapters in an Available Server configuration with an HSZ40. An example hardware configuration is shown in Figure 3–29.

Table 3–23 shows the hardware components needed for configurations using KZPSA PCI to SCSI adapters (or KZTSA TURBOchannel to SCSI adapters) and an HSZ40.

Table 3–28 Setting Up an Available Server Configuration Using KZPSA Adapters and an HSZ40

1  Install a KZPSA PCI to SCSI bus adapter in the PCI slot corresponding to the logical bus to be used for the shared SCSI bus.
   Refer to: KZTSA SCSI Storage Adapter Installation and User's Guide

   Remove the KZPSA internal termination resistors, Z1, Z2, Z3, Z4, and Z5.
   Refer to: Figure 3–30

2  Use the show config, show device, and show pk#* console commands to display the installed devices and information about the KZPSAs on the AlphaServer 1000, 2000, or 2100 systems.
   Refer to: Example 3–10

   If necessary, boot from the Alpha Systems Firmware Update CD–ROM and update the KZPSA firmware.
   Refer to: the firmware release notes for the applicable system

   Set the KZPSA SCSI bus ID and bus speed as necessary for this configuration.
   Refer to: Example 3–11

3  Install a BN21W-0B "Y" cable or H885-AA tri-link connector on the HSZ40 input connector and on each KZPSA in the configuration.

4  Install an H879-AA terminator on the two tri-link connectors or "Y" cables attached to the two adapters (or the adapter and the HSZ40) that will be on the ends of the shared bus.

5  Connect the "Y" cables or tri-link connectors to each other with BN21K or BN21L cables. You will need one cable for each KZPSA in the configuration. Daisy-chain from one device to the next, making sure that the "Y" cables or tri-link connectors with installed terminators stay at the ends of the bus.


Figure 3–29 Available Server Configuration with Two AlphaServer 2100 Systems Using KZPSA PCI to SCSI Adapters with an HSZ40

[Illustration labels: AlphaServer 2100 systems, Network Interface, HSZ40 with DECRAID Subsystem, BN21W-0B "Y" cables, BN21K or BN21L cables, H879-AA terminator, H885-AA tri-link connector with H879-AA terminator.]

Figure 3–30 KZPSA Termination Resistor Locations

[Illustration shows the locations of the Z1, Z2, Z3, Z4, and Z5 terminators on the KZPSA.]

Use the show config and show device console commands, as shown in Example 3–10, to display information about installed devices on an AlphaServer 1000, 2000, or 2100 system. The show config command shows which slots the KZPSAs are installed in and their SCSI IDs, but it does not indicate the hardware or firmware revision. The show device output does not identify the KZPSA by name, but it does provide the hardware and firmware revisions.
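The show config and show device listings identify each KZPSA only by its console device name, so it helps to read that name field by field. The Python sketch below parses a KZPSA line from show device output, such as the pkb0 line in Example 3–10. It assumes the usual Alpha SRM console naming convention, name.bus-node.channel.slot.hose. Under that convention pkb0.4.0.6.0 is adapter pkb0 at SCSI ID 4 in logical slot 6, which matches the show config listing. It also assumes the whitespace-separated column layout shown in the example. KzpsaInfo and parse_show_device_line are hypothetical names.

```python
"""Sketch: decode a KZPSA line from SRM "show device" output.

Assumes the Alpha SRM naming convention name.bus_node.channel.slot.hose and
the column layout of Example 3-10, for example:
    pkb0.4.0.6.0 PKB0 SCSI Bus ID 4 C01 A04
"""
from dataclasses import dataclass


@dataclass
class KzpsaInfo:
    name: str      # console device name, e.g. "pkb0"
    scsi_id: int   # bus node number, i.e. the adapter's own SCSI ID
    channel: int
    slot: int      # logical I/O slot, as listed by show config
    hose: int
    hw_rev: str    # hardware revision, e.g. "C01"
    fw_rev: str    # firmware revision, e.g. "A04"


def parse_show_device_line(line: str) -> KzpsaInfo:
    fields = line.split()
    if len(fields) != 8 or fields[2:5] != ["SCSI", "Bus", "ID"]:
        raise ValueError(f"not a KZPSA show device line: {line!r}")
    name, node, channel, slot, hose = fields[0].split(".")
    info = KzpsaInfo(name, int(node), int(channel), int(slot), int(hose),
                     hw_rev=fields[6], fw_rev=fields[7])
    # The dotted device name and the "SCSI Bus ID" column should agree.
    if info.scsi_id != int(fields[5]):
        raise ValueError(f"SCSI ID mismatch in line: {line!r}")
    return info


if __name__ == "__main__":
    print(parse_show_device_line("pkb0.4.0.6.0 PKB0 SCSI Bus ID 4 C01 A04"))
```

Lines without revision fields, such as the pka0 entry for the onboard NCR 53C810, are rejected rather than partially parsed.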


Unfortunately, neither the show config nor the show device command provides the KZPSA bus speed. Use the show pk#* console command to determine the bus speed. The number sign (#) is the letter designation for the KZPSA from the show config or show device command output. Example 3–10 shows the results for pkb0.

Example 3–10 Displaying Devices on AlphaServer 1000, 2000, or 2100 Systems

>>> show config                                                    (1)
Digital Equipment Corporation
AlphaServer 2100 4/200

SRM Console X3.10-3020   VMS PALcode X5.48-91, OSF PALcode X1.35-59

Component   Status   Module ID
CPU 0       P        B2020-AA   DECchip (tm) 21064-3
CPU 1       P        B2020-AA   DECchip (tm) 21064-3
Memory 0    P        B2021-BA   64 MB
Memory 1    P        B2021-BA   64 MB
I/O                  B2110-AA
                     dva0.0.0.1000.0   RX26

SLOT   Option              Hose 0, Bus 0, PCI
0      DECchip 21040-AA    ewa0.0.0.0.0      08-00-2B-E2-7C-81
1      NCR 53C810          pka0.7.0.1.0      SCSI Bus ID 7
                           dka0.0.0.1.0      RZ28
                           dka100.1.0.1.0    RZ28
                           dka600.6.0.1.0    RRD43
2      Intel 82375EB       Bridge to Bus 1, EISA
6      DEC KZPSA           pkb0.4.0.6.0      SCSI Bus ID 4             (2)
                           dkb0.0.0.6.0      HSZ40-Bx
                           dkb100.1.0.6.0    RZ28
                           dkb300.3.0.6.0    RZ28B
8      DEC KZPSA           pkc0.7.0.8.0      SCSI Bus ID 7             (3)
>>> show device                                                    (4)
dka0.0.0.1.0       DKA0     RZ28       D41C
dka100.1.0.1.0     DKA100   RZ28       D41C
dka600.6.0.1.0     DKA600   RRD43      1084
dkb0.0.0.6.0       DKB0     HSZ40-Bx   V21Z
dkb100.1.0.6.0     DKB100   RZ28       D41C
dkb300.3.0.6.0     DKB300   RZ28B      0006
dva0.0.0.1000.0    DVA0     RX26
ewa0.0.0.0.0       EWA      08-00-2B-E2-7C-81
pka0.7.0.1.0       PKA0     SCSI Bus ID 7
pkb0.4.0.6.0       PKB0     SCSI Bus ID 4   C01 A04                (5)
pkc0.7.0.8.0       PKC0     SCSI Bus ID 7   C01 A04                (6)
>>> show pkb*                                                      (7)
pkb0_fast       1                                                  (8)
pkb0_host_id    4                                                  (9)
pkb0_termpwr    1                                                  (10)
>>>


1 Use the show config command to show the system configuration.

2 The first KZPSA available for Available Server is pkb0, which has a SCSI ID of 4.

3 This system has a second KZPSA, pkc0, which has SCSI ID 7.

4 Use the show device command to get more information.

5 KZPSA pkb0 is hardware revision C01 and has revision A04 firmware.

6 KZPSA pkc0 also has hardware revision C01 and revision A04 firmware.

7 Use the show pkb* command to show all variables set for KZPSA pkb0.

8 The KZPSA bus speed is fast (1 = fast, 0 = slow).

9 The KZPSA SCSI ID is 4.

10 The KZPSA is generating termination power.
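The show pk#* listing is a set of name and value pairs, so it can be interpreted mechanically. The following Python sketch turns output like the show pkb* lines in Example 3–10 into a dictionary and summarizes it using the meanings given in callouts 8 through 10. The helper names are hypothetical. Reading a _termpwr value of 0 as "not supplying termination power" is an assumption.

```python
"""Sketch: interpret "show pk#*" console output for one KZPSA.

The meaning of 1 for _fast and _termpwr follows callouts 8 and 10 of
Example 3-10, and 0 = slow comes from callout 8. Treating a _termpwr of 0
as "off" is an assumption.
"""


def parse_show_pk(output: str) -> dict:
    """Map each 'name value' line to {name: int(value)}; skip prompt lines."""
    settings = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 2 or fields[0].startswith(">>>"):
            continue  # blank line, bare prompt, or echoed command
        name, value = fields
        settings[name] = int(value)
    return settings


def describe(settings: dict, adapter: str) -> str:
    """Summarize the settings of one adapter, e.g. adapter="pkb0"."""
    speed = "fast" if settings[f"{adapter}_fast"] == 1 else "slow"
    termpwr = "on" if settings[f"{adapter}_termpwr"] == 1 else "off"
    scsi_id = settings[f"{adapter}_host_id"]
    return f"{adapter}: SCSI ID {scsi_id}, {speed} bus, termination power {termpwr}"


SAMPLE = """\
>>> show pkb*
pkb0_fast 1
pkb0_host_id 4
pkb0_termpwr 1
>>>
"""

if __name__ == "__main__":
    print(describe(parse_show_pk(SAMPLE), "pkb0"))
```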

Setting KZPSA SCSI ID and Bus Speed

If the SCSI ID is not correct, if it was reset to 7 by the firmware update utility, or if you need to change the KZPSA speed, use the set console command.

Use the set command with the following format to set the SCSI bus ID:

set pkn0_host_id #

The n is the letter designation for the KZPSA (for example, c for pkc0), which you obtain from the show config or show device console command output. The number sign (#) is the SCSI bus ID for the KZPSA.

Use the set command with the following format to set the bus speed:

set pkn0_fast #

The number sign (#) specifies the bus speed. Use 0 for slow and 1 for fast.
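The two command formats above differ only in the variable suffix and the value. As a minimal sketch, the hypothetical Python helper below builds both commands for one KZPSA from its letter designation, SCSI ID, and speed. For pkc0 with SCSI ID 5 and fast speed, it produces the same set commands used in Example 3–11. The 0 to 15 SCSI ID range check is an assumption based on the KZPSA being a wide adapter; narrow it if your configuration requires.

```python
"""Sketch: build the SRM "set" commands for one KZPSA.

Command forms come from this guide: set pkn0_host_id # and set pkn0_fast #.
The 0-15 SCSI ID range is an assumption for a wide SCSI adapter.
"""
from typing import List


def kzpsa_set_commands(letter: str, scsi_id: int, fast: bool) -> List[str]:
    """Return console commands that set the SCSI ID and bus speed of pk<letter>0."""
    if len(letter) != 1 or not ("a" <= letter <= "z"):
        raise ValueError(f"adapter letter must be one lowercase letter: {letter!r}")
    if not 0 <= scsi_id <= 15:
        raise ValueError(f"SCSI ID out of range: {scsi_id}")
    adapter = f"pk{letter}0"
    return [
        f"set {adapter}_host_id {scsi_id}",
        f"set {adapter}_fast {1 if fast else 0}",  # 1 = fast, 0 = slow
    ]


if __name__ == "__main__":
    for command in kzpsa_set_commands("c", 5, fast=True):
        print(">>>", command)
```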

Example 3–11 shows how to determine the present SCSI ID and bus speed, then set the KZPSA SCSI ID to 5 and the bus speed to fast for pkc0.


Example 3–11 Setting KZPSA SCSI ID and Bus Speed

>>> show pkc0_host_id          (1)
7
>>> show pkc0_fast             (2)
0
>>> set pkc0_host_id 5         (3)
>>> set pkc0_fast 1            (4)
>>> show pkc0_host_id          (5)
5
>>> show pkc0_fast             (6)
1
>>>

1 Display the present SCSI ID.

2 Display the present bus speed, which is slow (pkc0_fast is 0).

3 Set the SCSI ID to 5.

4 Set the bus speed to fast.

5 Verify that the SCSI ID is now 5.

6 Verify that the bus speed is now fast (pkc0_fast is 1).

Setting Up an Available Server Configuration with Mixed Adapter Types and a BA350, BA353, or BA356

This section describes how to install an Available Server hardware configuration consisting of multiple host adapters that are not all of the same type, using a BA350, BA353, or BA356 storage expansion unit. For instance, you may have an Available Server configuration consisting of two DEC 3000 Model 500 systems, one with a PMAZC TURBOchannel SCSI controller and the other with a KZTSA TURBOchannel SCSI adapter, and an AlphaServer 2100 with a KZPSA PCI SCSI adapter.

Table 3–29 provides the steps necessary to install the hardware for a mixed configuration with a BA350, BA353, or BA356. Note that you will be referring to steps in previous tables for host adapter installation and setup.

Table 3–30 lists the hardware necessary for a mixed configuration with a BA350, BA353, or BA356. Figure 3–31 illustrates a sample mixed configuration with a BA350 storage expansion unit.

Table 3–29 Setting Up an Available Server Configuration with Mixed Host Adapters and a BA350, BA353, or BA356

1  For each system with a:
     PMAZC  Refer to: Table 3–14, Steps 1, 2, and 3
     KZTSA  Refer to: Table 3–20, Steps 1 and 2
     KZMSA  Refer to: Table 3–25, Step 1
     KZPSA  Refer to: Table 3–27, Steps 1 and 2

2  You will need one DWZZA-AA for each system with a PMAZC TURBOchannel SCSI controller or KZMSA SCSI adapter, and a DWZZA for the BA350 or BA353. You can use a DWZZA-VA for the BA350 or BA353. Use a DWZZB-VW for a BA356.

   For each DWZZ*, remove the five differential terminator resistor SIPs.
   Refer to: Figure 3–11

   For each DWZZA-AA, ensure that the single-ended SCSI termination jumper, J2, is installed to provide termination for that end of the single-ended SCSI bus segment.
   Refer to: Figure 3–11

   For a DWZZA-VA that will be installed in a:
     BA350  Ensure that the single-ended SCSI termination jumper, J2, is installed.
     BA353  Remove the single-ended SCSI termination jumper, J2. Install terminator 12-37004-04 on the BA353 SCSI input connector.

   For a DWZZB-VW that will be installed in a BA356, ensure that the single-ended termination jumpers, W1 and W2, are installed.

3  For each single-ended host adapter, install a BN21R or BN23G cable between the host adapter and the single-ended connector on a DWZZA-AA.

4  Install a BN21W-0B "Y" cable or an H885-AA tri-link connector on the differential connector of each DWZZ* and on the connector for each differential host adapter.

5  Install an H879-AA terminator on the two "Y" cables or tri-link connectors on the two devices that will be on the ends of the shared bus.

6  Connect the "Y" cables or tri-link connectors together with BN21K or BN21L cables. You will need one cable for each system in the configuration. Daisy-chain from one host adapter or device to the next, keeping the "Y" cables or tri-link connectors with the installed terminators on the ends of the bus.

7  If you are using a DWZZA-AA for the connection to the BA350 or BA353, connect a BN21R or BN23G cable between the DWZZA-AA single-ended connector and BA350 connector JA1 or the BA353 SCSI input connector.

   If you are using a DWZZA-VA, install it in slot 0 of the BA350 or in any BA353 slot.

   Install a DWZZB-VW in slot 0 of a BA356.


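Table 3–30 Hardware Needed for a Mixed Adapter Available Server Configuration with BA350 or BA353

Host adapters [1]: 1 single-ended, 1 differential
  DWZZA-AA, DWZZA-VA, or DWZZB-VW [2]: 2
  BN21W-0B "Y" cables or H885-AA tri-link connectors: 3
  H879-AA terminators: 2
  BN21K or BN21L cables: 2
  BN21R or BN23G SCSI cables: 1 using a DWZZA-VA [3] or DWZZB-VW; 2 with no DWZZA-VA or DWZZB-VW

Host adapters [1]: 2 single-ended, 1 differential
  DWZZA-AA, DWZZA-VA, or DWZZB-VW [2]: 3
  BN21W-0B "Y" cables or H885-AA tri-link connectors: 4
  H879-AA terminators: 2
  BN21K or BN21L cables: 3
  BN21R or BN23G SCSI cables: 2 using a DWZZA-VA [3] or DWZZB-VW; 3 with no DWZZA-VA or DWZZB-VW

Host adapters [1]: 1 single-ended, 2 differential
  DWZZA-AA, DWZZA-VA, or DWZZB-VW [2]: 2
  BN21W-0B "Y" cables or H885-AA tri-link connectors: 4
  H879-AA terminators: 2
  BN21K or BN21L cables: 3
  BN21R or BN23G SCSI cables: 1 using a DWZZA-VA [3] or DWZZB-VW; 2 with no DWZZA-VA or DWZZB-VW

[1] You can have only three systems in an Available Server configuration that includes a PMAZC TURBOchannel SCSI controller or KZMSA XMI to SCSI adapter.
[2] One of the DWZZ*s can be a DWZZA-VA or DWZZB-VW. The rest are DWZZA-AA.
[3] If you use a DWZZA-VA and a BA353 storage expansion unit, you must install terminator 12-37004-04 on the BA353 SCSI input connector.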
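The rows of Table 3–30 follow a simple pattern, so the counts can be extended to other adapter mixes within the limit in footnote 1. The Python sketch below encodes that pattern. The rules are inferred from the three table rows; the guide does not state them as rules. The names are hypothetical. The sketch reproduces each row of the table.

```python
"""Sketch: hardware counts for one mixed-adapter shared bus to a BA350,
BA353, or BA356, following the pattern of Table 3-30.

Rules inferred from the table rows (an interpretation, not stated rules):
  * one DWZZA-AA per single-ended adapter, plus one DWZZ* for the storage
  * one "Y" cable or tri-link per DWZZ* and per differential adapter
  * two H879-AA terminators, one at each end of the shared bus
  * daisy chain: one BN21K/BN21L cable fewer than connectors
  * one BN21R/BN23G per single-ended adapter, plus one more when the
    storage-side converter is an external DWZZA-AA
Footnote 1 is read here as: at most three systems when a PMAZC or KZMSA
(single-ended adapter) is present.
"""
from typing import NamedTuple


class MixedBusParts(NamedTuple):
    dwzz: int
    y_or_trilink: int
    h879_terminators: int
    bn21k_or_bn21l: int
    bn21r_or_bn23g: int


def mixed_ba_parts(single_ended: int, differential: int,
                   shelf_converter: bool) -> MixedBusParts:
    """shelf_converter=True means a DWZZA-VA or DWZZB-VW inside the storage unit."""
    systems = single_ended + differential
    if not 2 <= systems <= 4:
        raise ValueError("a shared bus has two to four member systems")
    if single_ended and systems > 3:
        raise ValueError("only three systems allowed with a PMAZC or KZMSA")
    dwzz = single_ended + 1
    connectors = dwzz + differential
    bn21r = single_ended if shelf_converter else single_ended + 1
    return MixedBusParts(dwzz, connectors, 2, connectors - 1, bn21r)


if __name__ == "__main__":
    for se, diff in ((1, 1), (2, 1), (1, 2)):
        print(se, diff, mixed_ba_parts(se, diff, shelf_converter=True))
```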

Figure 3–31 Mixed Host Adapter Available Server Configuration with BA350 Storage Expansion Unit

[Illustration labels: DEC 3000 Model 500 systems (PMAZC, KZTSA), AlphaServer 2100, BA350, Network Interface, DWZZA-AA, DWZZA-VA with H885-AA, H885-AA tri-link connectors, H885-AA tri-link connector with H879-AA terminator, BN21W-0B "Y" cable, BN21R or BN23G cable, BN21K or BN21L cables, H879-AA terminator.]

Setting Up an Available Server Configuration with Mixed Adapter Types and an HSZ40

This section describes how to install an Available Server hardware configuration consisting of multiple host adapters that are not all of the same type, using an HSZ40 DEC RAID subsystem. For instance, you may have an Available Server configuration consisting of two DEC 3000 Model 500 systems with PMAZC TURBOchannel SCSI controllers and an AlphaServer 2100 with a KZPSA PCI SCSI adapter.


Table 3–31 provides the steps necessary to install the hardware for a mixed configuration with an HSZ40.

Table 3–32 lists the hardware necessary for a mixed configuration with an HSZ40, and Figure 3–32 illustrates a sample mixed configuration with an HSZ40.

Table 3–31 Setting Up an Available Server Configuration with Mixed Host Adapters and an HSZ40

1  For each system with a:
     PMAZC  Refer to: Table 3–14, Steps 1, 2, and 3
     KZTSA  Refer to: Table 3–20, Steps 1 and 2
     KZMSA  Refer to: Table 3–25, Step 1
     KZPSA  Refer to: Table 3–27, Steps 1 and 2

2  You will need one DWZZA-AA for each single-ended host adapter.
   Refer to: Figure 3–11

   For each DWZZA-AA, ensure that the single-ended SCSI termination jumper, J2, is installed to provide termination for that end of the single-ended SCSI bus segment.

   Remove the five differential terminator resistor SIPs from each DWZZA-AA.

3  Connect a BN21W-0B "Y" cable or an H885-AA tri-link connector to the:
   • HSZ40 input connector
   • Differential end of each DWZZA-AA
   • Connector for each differential host adapter

4  Connect an H879-AA terminator to one leg of the BN21W-0B "Y" cable or one side of the H885-AA tri-link connector for the host adapters (or HSZ40) that will be on the ends of the shared bus.

5  Connect a BN21R or BN23G cable between each single-ended host adapter and the single-ended connector of a DWZZA-AA.

6  Connect BN21K or BN21L cables between the BN21W-0B "Y" cables (or H885-AA tri-link connectors) on the DWZZA-AAs, the differential host adapters, and the HSZ40. The number of these cables equals the number of systems on this shared bus in the Available Server configuration. Daisy-chain the connections so that the terminators remain on both ends of the shared bus.


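Table 3–32 Hardware Needed for a Mixed Adapter Available Server Configuration with an HSZ40

Host adapters [1]: 1 single-ended, 1 differential
  DWZZA-AA: 1
  BN21W-0B "Y" cables or H885-AA tri-link connectors: 3
  H879-AA terminators: 2
  BN21K or BN21L cables: 2
  BN21R or BN23G cables: 1

Host adapters [1]: 2 single-ended, 1 differential
  DWZZA-AA: 2
  BN21W-0B "Y" cables or H885-AA tri-link connectors: 4
  H879-AA terminators: 2
  BN21K or BN21L cables: 3
  BN21R or BN23G cables: 2

Host adapters [1]: 1 single-ended, 2 differential
  DWZZA-AA: 1
  BN21W-0B "Y" cables or H885-AA tri-link connectors: 4
  H879-AA terminators: 2
  BN21K or BN21L cables: 3
  BN21R or BN23G cables: 1

[1] You can have only three systems in an Available Server configuration that includes a PMAZC SCSI controller or KZMSA XMI to SCSI adapter.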
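Table 3–32 follows the same pattern as Table 3–30 but has no storage-side converter, because the HSZ40 input connector attaches directly to the differential shared bus. The Python sketch below encodes it. The rules are inferred from the table rows, and the names are hypothetical.

```python
"""Sketch: hardware counts for one mixed-adapter shared bus to an HSZ40,
following the pattern of Table 3-32 (rules inferred from the table rows).
"""
from typing import NamedTuple


class Hsz40BusParts(NamedTuple):
    dwzza_aa: int
    y_or_trilink: int
    h879_terminators: int
    bn21k_or_bn21l: int
    bn21r_or_bn23g: int


def mixed_hsz40_parts(single_ended: int, differential: int) -> Hsz40BusParts:
    systems = single_ended + differential
    if not 2 <= systems <= 4:
        raise ValueError("a shared bus has two to four member systems")
    if single_ended and systems > 3:
        raise ValueError("only three systems allowed with a PMAZC or KZMSA")
    # No storage-side DWZZ*: the HSZ40 input sits on the differential bus.
    connectors = single_ended + differential + 1  # +1 for the HSZ40 input
    return Hsz40BusParts(
        dwzza_aa=single_ended,          # one per single-ended adapter
        y_or_trilink=connectors,
        h879_terminators=2,             # one at each end of the bus
        bn21k_or_bn21l=connectors - 1,  # daisy chain between connectors
        bn21r_or_bn23g=single_ended,    # adapter to its DWZZA-AA
    )


if __name__ == "__main__":
    for se, diff in ((1, 1), (2, 1), (1, 2)):
        print(se, diff, mixed_hsz40_parts(se, diff))
```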

Figure 3–32 Mixed Host Adapter Available Server Configuration with an HSZ40

[Illustration labels: DEC 3000 Model 500 systems with PMAZC controllers, AlphaServer 2100, HSZ40 and DECRAID Subsystem, Network Interface, DWZZA-AA converters, H885-AA tri-link connectors, H885-AA tri-link connector with H879-AA terminator, H879-AA terminator, BN21W-0B "Y" cable, BN21R or BN23G cable, BN21K or BN21L cables.]


Summary

Examining TruCluster Available Server General Hardware Configuration Rules and Restrictions

Some of the more important general rules include:

• A maximum of four systems is allowed on each shared bus in a TruCluster Available Server configuration.

• A maximum of eight devices (SCSI host adapters and disks) is allowed on each bus.

• All the systems in a TruCluster Available Server configuration must be on the same network subnet.

• The SCSI host adapter must be installed in a logically equivalent I/O bus slot on each system. When the kernel boots, the SCSI bus number is determined by the order in which the SCSI host adapters are installed in I/O bus slots, starting with the first slot.

• For dual-ported SCSI host adapters, the shared SCSI bus must be on equivalent ports, for instance, KZMSA channel 0 (or 1) and PMAZC port A (or B).

• You must keep the shared SCSI bus within the length requirements.

• Improper termination will eventually cause problems.

• You should use DWZZA-AA signal converters when using PMAZC SCSI host adapters in fast mode.

• After removing the terminator resistor SIPs, ensure that the DWZZA-AA star washers are in place on all four screws that hold the cover in place.
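Several of the rules above can be checked mechanically before any cables are connected. The Python sketch below tests a planned shared bus against the four-system and eight-device limits. It also checks for duplicate SCSI IDs, which is a general SCSI requirement rather than a rule from this list. Finally, it computes a SCSI bus number from slot order, as a simplified reading of the logically equivalent slot rule. BusDevice, check_shared_bus, and scsi_bus_number are hypothetical names.

```python
"""Sketch: check a planned shared bus against some general rules.

Limits (4 systems, 8 devices) come from the rules above; unique SCSI IDs is
a general SCSI requirement. scsi_bus_number is a simplified reading of the
"bus number follows slot order" rule, not the kernel's actual algorithm.
"""
from dataclasses import dataclass
from typing import List


@dataclass
class BusDevice:
    name: str
    scsi_id: int
    is_host_adapter: bool


def check_shared_bus(devices: List[BusDevice]) -> List[str]:
    """Return a list of rule violations (empty if none found)."""
    problems = []
    hosts = sum(1 for d in devices if d.is_host_adapter)
    if hosts > 4:
        problems.append(f"{hosts} systems on the bus; the maximum is 4")
    if len(devices) > 8:
        problems.append(f"{len(devices)} devices on the bus; the maximum is 8")
    owner = {}
    for d in devices:
        if d.scsi_id in owner:
            problems.append(f"SCSI ID {d.scsi_id} used by {owner[d.scsi_id]} and {d.name}")
        else:
            owner[d.scsi_id] = d.name
    return problems


def scsi_bus_number(adapter_slots: List[int], shared_slot: int) -> int:
    """Position of the shared adapter when SCSI adapter slots are taken in order."""
    return sorted(adapter_slots).index(shared_slot)


if __name__ == "__main__":
    bus = [BusDevice("member1 KZPSA", 6, True),
           BusDevice("member2 KZPSA", 7, True),
           BusDevice("HSZ40", 0, False)]
    print(check_shared_bus(bus) or "no problems found")
    # SCSI adapters in slots 1, 6, and 8 (as in Example 3-10): slot 6 is bus 1.
    print(scsi_bus_number([1, 6, 8], 6))
```

Running scsi_bus_number on each member's slot list and comparing the results is one way to confirm that the shared adapters sit in logically equivalent slots.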

Determining Available Server Hardware Components

• Only certain systems and SCSI controllers are supported for a TruCluster Available Server environment, and unsupported devices should not be used.

• When connecting the devices to form the shared SCSI bus, you must use the correct cables and keep within the cable length limits for your particular configuration.

• Each SCSI bus segment must be properly terminated, or you will have problems.

• Each Available Server member system or storage device should be connected to the shared SCSI bus using a "Y" cable or H885-AA tri-link connector to allow removal of a device without affecting SCSI bus operation.


Configuring TruCluster Available Server Hardware

Some of the most important things to consider when preparing for a TruCluster Available Server configuration are:

• How many systems are in the Available Server configuration?

• What kind of controllers do you have?

• What constraints does system placement impose on the length of the SCSI buses?

• Are you using single-ended or differential SCSI controllers?

• Do you need to use DWZZA signal converters?

• Is the SCSI controller dual-ported?

• What kind of device is being used to house the disk devices?

• Ensure that the single-ended SCSI bus termination is correct between a single-ended controller and a DWZZA signal converter.

  - Check that SCSI controller internal termination is present for a single-ended controller connected to a DWZZA.

  - Check that the DWZZA single-ended termination jumper, J2, is installed.

• Ensure that the single-ended SCSI bus termination is correct between a DWZZ* (either a DWZZA-AA, DWZZA-VA, or DWZZB-VW) and a BA350 or BA356.

  - The single-ended terminator jumper, J2, must be installed in the DWZZA.

  - The single-ended terminator jumpers, W1 and W2, must be installed in the DWZZB.

  - The BA350 terminator must be installed.

• Ensure that the single-ended SCSI bus termination is correct between a DWZZA-AA and a BA353.

  - The single-ended terminator jumper, J2, must be installed in the DWZZA.

  - The BA353 has internal termination for the other end of the single-ended bus.

• Ensure that the single-ended SCSI bus termination is correct between a DWZZA-VA and a BA353.

  - The single-ended terminator jumper, J2, must be removed from the DWZZA-VA.

  - Terminator 12-37004-04 must be installed on the BA353 SCSI input connector.

• If you have a BN21V-0B or BN21W-0B "Y" cable or an H885-AA tri-link connector attached to the SCSI bus controller:

  - The controller internal termination must be removed.

  - A terminator must be installed on the "Y" cables or tri-link connectors for the two controllers (or the controller and device) that are on the ends of the SCSI bus.

• If you are using a BA350 or BA356, the internal jumper must be installed.

• If you are using a DWZZA-VA, it must be installed in slot 0 of a BA350 (any slot for a BA353).

• If you are using a DWZZB-VW, it must be installed in slot 0 of a BA356.

• If you have installed a BN21W-0B "Y" cable or an H885-AA tri-link connector on the differential end of a DWZZA-AA, DWZZA-VA, or DWZZB-VW, you must remove the differential termination from the DWZZ*.

• SCSI bus IDs must be properly set.

• SCSI bus speeds must be properly set.

• If you are using a PMAZC or KZMSA, the shared bus must be on the same port (channel) on each system.

• The shared SCSI bus must be on the same logical bus on each system.

• The correct firmware must be installed in the SCSI bus controller.
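The termination requirements above depend on which converter is paired with which enclosure. The following Python sketch transcribes them into a lookup table so that a checklist can be printed for a given pairing. The table contents come from this summary and the preceding setup tables; the function and constant names are hypothetical. Pairings not covered in this chapter are rejected rather than guessed.

```python
"""Sketch: single-ended termination checklist per DWZZ*/enclosure pairing.

Contents transcribed from this chapter's summary and setup tables; the
lookup function itself is a hypothetical helper.
"""
from typing import Dict, List, Tuple

# Applies whenever a "Y" cable or tri-link is on the differential end.
DIFFERENTIAL_STEP = "Remove the five differential terminator resistor SIPs from the DWZZ*."

CHECKLISTS: Dict[Tuple[str, str], List[str]] = {
    ("DWZZA-AA", "BA350"): [
        "Install single-ended termination jumper J2 in the DWZZA-AA.",
        "Install the BA350 terminator and jumper.",
    ],
    ("DWZZA-VA", "BA350"): [
        "Install single-ended termination jumper J2 in the DWZZA-VA.",
        "Install the DWZZA-VA in slot 0 of the BA350.",
        "Install the BA350 terminator and jumper.",
    ],
    ("DWZZA-AA", "BA353"): [
        "Install single-ended termination jumper J2 in the DWZZA-AA.",
        "No extra terminator: the BA353 terminates the other end internally.",
    ],
    ("DWZZA-VA", "BA353"): [
        "Remove single-ended termination jumper J2 from the DWZZA-VA.",
        "Install terminator 12-37004-04 on the BA353 SCSI input connector.",
        "Install the DWZZA-VA in any BA353 slot.",
    ],
    ("DWZZB-VW", "BA356"): [
        "Install single-ended termination jumpers W1 and W2 in the DWZZB-VW.",
        "Install the DWZZB-VW in slot 0 of the BA356.",
        "Install the BA356 internal SCSI jumper.",
    ],
}


def termination_checklist(converter: str, enclosure: str) -> List[str]:
    """Return the setup steps for one converter/enclosure pairing."""
    key = (converter.upper(), enclosure.upper())
    if key not in CHECKLISTS:
        raise ValueError(f"{converter} with {enclosure} is not covered by this chapter")
    return [DIFFERENTIAL_STEP, *CHECKLISTS[key]]


if __name__ == "__main__":
    for step in termination_checklist("DWZZA-VA", "BA353"):
        print("-", step)
```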


Exercises

Examining TruCluster Available Server General Hardware Configuration Rules and Restrictions: Exercise

1. You must use a DWZZA signal converter with a KZMSA because:

a. The KZMSA has only one channel

b. The KZMSA uses the differential mode of signal transmission

c. You cannot remove the KZMSA internal terminators

d. The KZMSA operates only on a wide SCSI bus

2. You should use a DWZZA signal converter with a PMAZC SCSI host adapter because:

a. The use of the DWZZA increases the maximum shared SCSI bus length

b. The PMAZC is a dual-ported SCSI host adapter

c. The PMAZC operates as either a fast or slow SCSI host adapter

d. Signal conversion is necessary to connect the PMAZC to a BA350 storage box

3. The maximum length of the shared SCSI bus in an Available Server configuration for Version 1.4 is:

a. 3 meters

b. 6 meters

c. 25 meters

d. 31 meters

4. How many systems may be used in an Available Server configuration for Version 1.4?

a. 1

b. 2

c. 3

d. 4


Examining TruCluster Available Server General Hardware Configuration Rules and Restrictions: Solution

1. c You must use a DWZZA signal converter with a KZMSA because:

a. The KZMSA has only one channel

b. The KZMSA uses the differential mode of signal transmission

c. You cannot remove the KZMSA internal terminators

d. The KZMSA operates only on a wide SCSI bus

2. a You should use a DWZZA signal converter with a PMAZC SCSI host adapter because:

a. The use of the DWZZA increases the maximum shared SCSI bus length

b. The PMAZC is a dual-ported SCSI host adapter

c. The PMAZC operates as either a fast or slow SCSI host adapter

d. Signal conversion is necessary to connect the PMAZC to a BA350 storage box

3. d The maximum length of the shared SCSI bus in an Available Server configuration for Version 1.4 is:

a. 3 meters

b. 6 meters

c. 25 meters

d. 31 meters

4. d How many systems may be used in an Available Server configuration for Version 1.4?

a. 1

b. 2

c. 3

d. 4

Determining Available Server Hardware Components: Exercise

1. Which cable do you attach to a single-ended device to enable you to disconnect the system without affecting SCSI bus termination?

a. BN21J

b. BN21H

c. BN21V-0B

d. BN21W-0B


2. Which cable do you attach to a differential device to enable you to disconnect the system without affecting SCSI bus termination?

a. BN21J

b. BN21H

c. BN21V-0B

d. BN21W-0B

3. Which of the following is a signal converter that contains its own power supply?

a. DWZZA-AA

b. DWZZA-VA

c. DWZZB-VW

d. H885-AA

4. Which of the following could you use in place of a BN21W-0B?

a. H8574-A

b. H8660-AA

c. H879-AA

d. H885-AA

Determining Available Server Hardware Components: Solution

1. c Which cable do you attach to a single-ended device to enable you to disconnect the system without affecting SCSI bus termination?

a. BN21J

b. BN21H

c. BN21V-0B

d. BN21W-0B

2. d Which cable do you attach to a differential device to enable you to disconnect the system without affecting SCSI bus termination?

a. BN21J

b. BN21H

c. BN21V-0B

d. BN21W-0B


3. a Which of the following is a signal converter that contains its own power supply?

a. DWZZA-AA

b. DWZZA-VA

c. DWZZB-VW

d. H885-AA

4. d Which of the following could you use in place of a BN21W-0B?

a. H8574-A

b. H8660-AA

c. H879-AA

d. H885-AA

Configuring TruCluster Available Server Hardware: Exercise

Install the hardware necessary to create an Available Server configuration with two shared SCSI buses, using the following hardware:

• Two AlphaServer 2100 systems, each with two KZPSA PCI to SCSI adapters to provide two shared buses on each system

• One shared bus will have a BA350 with two RZ26L disks

• The other shared bus will have an HSZ40 RAID controller with four RZ26L disks

Configuring TruCluster Available Server Hardware: Solution

The hardware needed is:

• 2 AlphaServer 2100 systems

• 4 KZPSA PCI to SCSI adapters

• 1 BA350 StorageWorks Enclosure

• 1 HSZ40 RAID controller

• 6 RZ26L 1.05 GB SCSI disk drives

• 4 BN21K (or BN21L) SCSI cables

• 6 H885-AA tri-link connectors, 6 BN21W-0B "Y" cables, or any combination of the two that makes 6 in total

• 4 H879-AA terminators

Note

Each system needs an Ethernet controller to provide network communications.

The solution will be performed in three phases:

1. Install the KZPSA SCSI adapters, two to a system.


2. Install the remaining hardware for the shared SCSI bus with the BA350.

3. Install the remaining hardware for the shared SCSI bus with the HSZ40.

Table 3–33 outlines the steps necessary to install the KZPSA SCSI adapters in the systems.

Table 3–33 Phase 1: Installing the KZPSA SCSI Adapters

1  Remove the KZPSA internal termination resistors, Z1, Z2, Z3, Z4, and Z5.
   Refer to: Figure 3–30

2  Install two KZPSA PCI to SCSI bus adapters in each AlphaServer 2100, in the PCI slots corresponding to the logical buses to be used for the shared SCSI buses.
   Refer to: KZPSA PCI-to-SCSI Storage Adapter

3  Power up the systems and use the show config, show device, and show pk#* console commands to display the installed devices and information about the KZPSAs on the AlphaServer 2100.
   Refer to: Example 3–10

4  If necessary, update the KZPSA firmware.
   Refer to: the Firmware Release Notes for the AlphaServer 2100 system

5  Set the SCSI bus ID of both KZPSAs on one system to 6 and of both KZPSAs on the other system to 7, and set the bus speed to fast on all KZPSA SCSI adapters.
   Refer to: Example 3–11

Table 3–34 provides the steps necessary to install the hardware for the shared SCSI bus with the BA350.

Table 3–34 Creating a Shared Bus with the BA350

1  Remove the five differential terminator resistor SIPs from the DWZZA-VA.
   Refer to: Figure 3–11

2  Ensure that the single-ended termination jumper, J2, is installed in the DWZZA-VA.
   Refer to: Figure 3–11

3  Install the DWZZA-VA in slot 0 of the BA350.
   Refer to: Figure 3–8

4  Ensure that the BA350 terminator and jumper are both installed.
   Refer to: Figure 3–8

5  Install an RZ26L SCSI disk in each of BA350 slots 1 and 2.

6  Install an H885-AA tri-link connector on the KZPSA in each system that will be on this shared SCSI bus.

7  Install an H885-AA tri-link connector on the differential end of the DWZZA-VA.

8  Install an H879-AA terminator on the two tri-link connectors attached to the two KZPSA SCSI adapters, because they will be on the ends of the shared bus.

9  Install a BN21K cable from the open connector of the tri-link connector on each of the two KZPSAs to the tri-link connector on the DWZZA-VA.

Table 3–35 provides the steps necessary to install the hardware for the shared SCSI bus with the HSZ40.

Table 3–35 Creating a Shared Bus with the HSZ40

1  Install an H885-AA tri-link connector on the HSZ40 input connector and on both of the KZPSAs for this shared bus.

2  Install an H879-AA terminator on the two tri-link connectors attached to the two KZPSA SCSI adapters, because they will be on the ends of the shared bus.

3  Install BN21K cables between the tri-link connectors on the KZPSAs and the tri-link connector on the HSZ40.


4 Installing TruCluster Software

Installing TruCluster Software 4–1


About This Chapter

Introduction

The TruCluster Available Server environment uses two to four member systems, running TruCluster Software, as highly available servers. This chapter provides an overview of the installation of TruCluster Software.

Objectives

To set up and manage TruCluster Available Server systems, you should be able to:

• List TruCluster Available Server system prerequisites

• Determine the correct installation procedure

• Install TruCluster Software on all member systems

Resources

For more information on the topics in this chapter, see the following:

• TruCluster Available Server Software Version 1.4 SPD

• TruCluster Available Server Software Release Notes

• TruCluster Available Server Software Hardware Configuration and Software Installation

• Digital UNIX Installation Guide

• Digital UNIX Network Administration


Performing Preliminary Setup Tasks

Overview

Before you install TruCluster Available Server Software, you must determine whether you are properly prepared for the installation.

Hardware

Before installing the TruCluster Software, you should read the SPD and release notes and verify any system prerequisites.

Make sure the hardware is supported by TruCluster Available Server Software Version 1.4. The hardware should be set up and tested. Use console commands to make sure that all the devices on the shared bus or buses are recognized.

Subsets Required for TruCluster Available Server Operation

Several Digital UNIX and TruCluster Software subsets must be installed on each system that will be an Available Server Environment (ASE) member.

Digital UNIX Version 4.0A must be installed, with the following optional subsets:

• OSFCLINET405: Basic Networking Services

• OSFPGMR405: Standard Programmer Commands

• OSFCMPLRS405: Compiler Back End

The following TruCluster Available Server Software Version 1.4 subsets must be installed:

• TCRCOMMON140: TruCluster Common Components

• TCRASE140: TruCluster Available Server Software

• TCRCONF140: TruCluster Configuration Software

If you want to use the Cluster Monitor, you will also need to install the following subsets. Note that the C++ Class Shared Libraries and CDE Minimum Runtime Environment subsets must be installed before TruCluster Available Server Software Version 1.4 is installed.

• CXLSHRDA405: C++ Class Shared Libraries

• OSFCDEMIN405: CDE Minimum Runtime Environment

• TCRCMS140: TruCluster Cluster Monitor

You may also want to install the following optional subsets and products:

• NFS utilities

• POLYCENTER Advanced File System utilities

• Logical Storage Manager

• Networker Version 3.2 client software
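The subset requirements above can be expressed as a small check. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes you have already collected the installed subset names (for example, from setld -i output) into a set.

```python
# Digital UNIX V4.0A optional subsets required on every ASE member.
REQUIRED_OS_SUBSETS = {"OSFCLINET405", "OSFPGMR405", "OSFCMPLRS405"}

# TruCluster Available Server Software Version 1.4 subsets.
REQUIRED_TCR_SUBSETS = {"TCRCOMMON140", "TCRASE140", "TCRCONF140"}

# Needed only for the Cluster Monitor; these two must be installed
# before the TruCluster subsets are loaded.
MONITOR_PREREQS = {"CXLSHRDA405", "OSFCDEMIN405"}
MONITOR_SUBSET = "TCRCMS140"


def missing_subsets(installed, want_monitor=False):
    """Return the sorted list of required subsets absent from `installed`."""
    required = REQUIRED_OS_SUBSETS | REQUIRED_TCR_SUBSETS
    if want_monitor:
        required = required | MONITOR_PREREQS | {MONITOR_SUBSET}
    return sorted(required - set(installed))


def ready_for_tcr_install(installed, want_monitor=False):
    """True if every subset that must precede the TruCluster load is present."""
    needed = set(REQUIRED_OS_SUBSETS)
    if want_monitor:
        needed |= MONITOR_PREREQS
    return needed <= set(installed)
```

For example, a system with only the three Digital UNIX subsets would report the three TruCluster subsets as missing, and ready_for_tcr_install would return False once the Cluster Monitor is requested.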


Before Installing TruCluster Software

Before you install TruCluster Software, be aware of the following installation requirements and restrictions:

• The TruCluster Software subsets for TruCluster Available Server Software Version 1.4 can be installed only on systems running the Digital UNIX Version 4.0A operating system. Therefore, you must upgrade the Digital UNIX operating system, and the system and SCSI interface adapter firmware, before you install TruCluster Software.

The route you choose to get to Digital UNIX Version 4.0A depends upon the current operating system version and the version of DECsafe Available Server. You may want to perform an update installation, or you may decide to perform a complete installation.

• Do not use a disk connected to a shared SCSI bus as the root installation disk, even though such a disk may be listed in the root installation menu.

• TruCluster Available Server relies on synchronized time on all member systems, because it needs accurate timestamps for database versions. You must run a distributed time service such as the Network Time Protocol (NTP) daemon (xntpd).

Note

If you are using NTP to synchronize system times, ensure that the setting for the NTP version in the /etc/ntp.conf file on client systems matches the actual version of NTP running on the server.

Digital UNIX Version 4.0A supports NTP Version 3. The Digital UNIX update installation procedure sets the value for the NTP version in the /etc/ntp.conf file to Version 3 if an entry for the setting does not exist. If your server is running NTP Version 2, change the setting in the /etc/ntp.conf file as follows, where alpha is the name of the NTP server:

server alpha version 2

• You must register the ASE-OA PAK before you install TruCluster Available Server Software Version 1.4; otherwise, you will not be allowed to install the TruCluster Software subsets.

• Do not install TruCluster Available Server Software Version 1.4 into a dataless environment.

• Use console commands to determine which SCSI buses your shared disks are on before you start the TruCluster Software installation.

• If you generate a new system configuration file, either with the doconfig command without the -c option or with the sizer -n command, you must run the /var/ase/sbin/ase_fix_config script.
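The NTP note above amounts to comparing each server entry in /etc/ntp.conf with the NTP version that server actually runs. A minimal Python sketch, assuming the file uses lines of the form "server alpha version 2" as shown in the note; the function names, the actual_versions mapping, and the treatment of an entry without a version clause as Version 3 are assumptions for illustration.

```python
import re

# Matches e.g. "server alpha version 2"; the version clause is optional.
_SERVER_LINE = re.compile(r"^\s*server\s+(\S+)(?:.*?\bversion\s+(\d+))?")


def ntp_server_versions(conf_text):
    """Map each `server` host in ntp.conf text to its declared version, or None."""
    versions = {}
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0]  # drop comments
        match = _SERVER_LINE.match(line)
        if match:
            version = match.group(2)
            versions[match.group(1)] = int(version) if version else None
    return versions


def mismatched_servers(conf_text, actual_versions):
    """Return the servers whose declared version differs from the one they run.

    actual_versions: hypothetical mapping of server host -> NTP version it runs.
    Assumption: an entry with no version clause is treated as Version 3.
    """
    bad = []
    for host, declared in ntp_server_versions(conf_text).items():
        effective = 3 if declared is None else declared
        actual = actual_versions.get(host)
        if actual is not None and actual != effective:
            bad.append(host)
    return sorted(bad)
```

Any host returned by mismatched_servers needs its version setting edited as described in the note.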


Network Services

Each member system must be included in the /etc/hosts file on every member system. The members must be able to communicate with each other even if the network name resolution service you may be using becomes unavailable.
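You can verify the /etc/hosts requirement on each member before adding it to the ASE. This Python sketch is a hypothetical helper, not part of TruCluster Software: it checks only that each member name appears in the file, not that its IP address is correct, and the host names used with it are examples.

```python
def hosts_names(hosts_text):
    """Return every host name and alias listed in /etc/hosts text."""
    names = set()
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) >= 2:                     # IP address, then one or more names
            names.update(fields[1:])
    return names


def missing_members(hosts_text, members):
    """Return, in input order, the member names absent from the hosts text."""
    present = hosts_names(hosts_text)
    return [name for name in members if name not in present]
```

For example, missing_members(open("/etc/hosts").read(), ["tinker", "tailor"]) lists the members whose entries you still need to add on that system.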

Starting with DECsafe Available Server Version 1.3, you can use multiple networks in an ASE to increase the availability of applications and data. Configuring multiple network paths between member systems reduces the chance that a member system will be erroneously considered unavailable, and if a network interface fails, member systems can continue to communicate over another network path.

All member systems must be on the same network subnets, and all member systems must be able to access each network, so clients can access services from any member.

You must set up the local network on each member system (see netsetup). You should set up NIS or BIND if you intend to use it for network name resolution on your network (see ypsetup and bindsetup). You should set up NFS and start the daemons if you intend to use NFS services (see nfssetup). You should set up mail so root can receive alert messages from TruCluster Software (see mailsetup).


Preparing to Install TruCluster Software


Overview

The TruCluster Software installation procedure offers several ways to install the software. The one you use depends upon whether DECsafe Available Server is installed and, if it is, which version is installed.

Choosing the TruCluster Software Installation Procedure

The procedure you use to install TruCluster Available Server Software Version 1.4 depends upon whether you are installing TruCluster Available Server Software Version 1.4 for the first time, upgrading an existing version of DECsafe Available Server, or adding a new member to an existing ASE with member systems already at TruCluster Available Server Software Version 1.4.

Use your present ASE configuration to determine the installationprocedure you will use:

• Setting up an ASE for the first time: Use this procedure if you are installing TruCluster Software on systems that are not currently in an ASE. You will use this procedure if none of the systems has TruCluster Software installed.

• Rolling upgrade: This procedure allows you to upgrade member systems without shutting down the ASE. You delete from the ASE the member system that you will upgrade, upgrade the system, then add the system back into the ASE.

• Simultaneous upgrade: This procedure requires that you shut down the ASE. Depending upon the current version of DECsafe Available Server, you may be able to preserve the existing ASE database.

• Adding a member system to an existing ASE: If you have an existing ASE, you can add a new member without shutting down the ASE.
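The four choices above reduce to a short decision. In this hypothetical Python sketch, each argument is a yes/no answer about your present configuration; it deliberately ignores version constraints, which Table 4–1 covers (a rolling upgrade, for instance, is possible only from certain starting versions).

```python
from enum import Enum


class Procedure(Enum):
    FIRST_TIME = "Setting up an ASE for the first time"
    ROLLING = "Rolling upgrade"
    SIMULTANEOUS = "Simultaneous upgrade"
    ADD_MEMBER = "Adding a member system to an existing ASE"


def choose_procedure(ase_exists, members_at_v14, new_system, can_shut_down):
    """Pick an installation procedure from the present ASE configuration.

    ase_exists:     any of the systems is already in an ASE
    members_at_v14: existing members already run TruCluster V1.4
    new_system:     the system is joining the ASE rather than being upgraded
    can_shut_down:  the ASE may be shut down during the upgrade
    """
    if not ase_exists:
        return Procedure.FIRST_TIME
    if members_at_v14 and new_system:
        return Procedure.ADD_MEMBER
    return Procedure.SIMULTANEOUS if can_shut_down else Procedure.ROLLING
```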

Figure 4–1 provides an overview of the various paths an upgrade may take for an existing DECsafe Available Server installation.


Figure 4–1 Upgrade Paths for Existing DECsafe Available Server Installation

[Flowchart: upgrade decisions by installed ASE version (not installed, prior to V1.2, V1.2 or V1.2A, V1.3), leading to a fresh installation of Digital UNIX V4.0A and ASE V1.4 or to rolling or simultaneous upgrades through Digital UNIX V3.2G and ASE V1.3.]

To upgrade to TruCluster Available Server Software Version 1.4, existing DECsafe Available Server configurations must be at, or upgraded to, Digital UNIX Version 3.2G and DECsafe Available Server Version 1.3. Table 4–1 provides more information about the upgrade paths.

Before starting any upgrade, consider the present Digital UNIX and DECsafe Available Server configuration, and decide whether services must remain available without disruption throughout the upgrade. Also compare the time needed to upgrade through each intermediate software version with the time it would take to install Digital UNIX Version 4.0A and TruCluster Available Server Software Version 1.4 from scratch.

Remember that the format of the ASE configuration database (asecdb) changed in DECsafe Available Server Version 1.3. The ASE V1.3 upgrade procedures take this format change into account, but upgrades to versions prior to ASE V1.3, and the upgrade to ASE V1.4, do not. The V1.4 rolling upgrade procedure assumes that there is no change in database format.


Table 4–1 Upgrade Paths for Existing DECsafe Available Server Installation

ASE V1.3 on Digital UNIX V3.2G:
Rolling upgrade to Digital UNIX Version 4.0A and TruCluster Available Server Software Version 1.4.

ASE V1.3 on Digital UNIX V3.2D or V3.2F:
Rolling upgrade to Digital UNIX Version 3.2G and DECsafe Available Server V1.3, then rolling upgrade to Digital UNIX Version 4.0A and TruCluster Available Server Software Version 1.4.

ASE V1.2A on Digital UNIX V3.2C:
Simultaneous upgrade to Digital UNIX Version 3.2G and DECsafe Available Server V1.3, then rolling upgrade to Digital UNIX Version 4.0A and TruCluster Available Server Software Version 1.4. Preserve the database. Upgrade install (setld) to Digital UNIX Version 3.2G.

ASE V1.2 on DEC OSF/1 V3.2A:
Simultaneous upgrade to Digital UNIX Version 3.2G and DECsafe Available Server V1.3, then rolling upgrade to Digital UNIX Version 4.0A and TruCluster Available Server Software Version 1.4. Preserve the database. You must update install (installupdate) to Digital UNIX V3.2C before you can upgrade (setld) to Digital UNIX V3.2G. Or, you can do a complete installation of Digital UNIX V3.2C, then upgrade to Digital UNIX V3.2G. You can also do a complete installation of Digital UNIX Version 4.0A.

ASE V1.1 on DEC OSF/1 V3.0:
Simultaneous upgrade to Digital UNIX Version 3.2G and DECsafe Available Server V1.3, then rolling upgrade to Digital UNIX Version 4.0A and TruCluster Available Server Software Version 1.4. You cannot reuse the database. You must update install (installupdate) to DEC OSF/1 V3.2, then update install to Digital UNIX V3.2C, before you can upgrade (setld) to Digital UNIX V3.2G. Or, you can do a complete installation of Digital UNIX V3.2C, then upgrade to Digital UNIX V3.2G. You can also do a complete installation of Digital UNIX Version 4.0A.

ASE V1.0A on DEC OSF/1 V2.1, or ASE V1.0 on DEC OSF/1 V2.0:
Simultaneous upgrade to Digital UNIX Version 3.2G and DECsafe Available Server V1.3, then rolling upgrade to Digital UNIX Version 4.0A and TruCluster Available Server Software Version 1.4. You cannot reuse the database. You must update install (installupdate) to DEC OSF/1 V3.0, then update install to DEC OSF/1 V3.2, then update install to Digital UNIX V3.2C, before you can upgrade (setld) to Digital UNIX V3.2G. Or, you can do a complete installation of Digital UNIX V3.2C, then upgrade to Digital UNIX V3.2G. You can also do a complete installation of Digital UNIX Version 4.0A.

Note

Because of the amount of time it would take to upgrade to TruCluster Available Server Software Version 1.4 from versions of DECsafe prior to V1.2, it is recommended that Digital UNIX Version 4.0A and TruCluster Available Server Software Version 1.4 be installed from scratch.
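Table 4–1 can also be encoded as a lookup, which makes the two-stage nature of most paths visible. This Python sketch uses hypothetical names, records only the ASE-level upgrade steps and what happens to the database, and omits the intermediate operating system update installations listed in the table.

```python
ROLLING, SIMULTANEOUS = "rolling", "simultaneous"
TO_V13 = "Digital UNIX V3.2G with DECsafe Available Server V1.3"
TO_V14 = "Digital UNIX V4.0A with TruCluster Available Server V1.4"

# Simultaneous upgrade to V3.2G/V1.3, then rolling upgrade to V4.0A/V1.4.
VIA_V13 = ((SIMULTANEOUS, TO_V13), (ROLLING, TO_V14))
# Rolling upgrade to V3.2G/V1.3, then rolling upgrade to V4.0A/V1.4.
ROLL_TWICE = ((ROLLING, TO_V13), (ROLLING, TO_V14))

# (ASE version, operating system) -> (ordered steps, ASE database outcome).
# "not stated" means Table 4-1 says nothing about the database for that row.
UPGRADE_PATHS = {
    ("V1.3", "Digital UNIX V3.2G"): (((ROLLING, TO_V14),), "not stated"),
    ("V1.3", "Digital UNIX V3.2D"): (ROLL_TWICE, "not stated"),
    ("V1.3", "Digital UNIX V3.2F"): (ROLL_TWICE, "not stated"),
    ("V1.2A", "Digital UNIX V3.2C"): (VIA_V13, "preserve"),
    ("V1.2", "DEC OSF/1 V3.2A"): (VIA_V13, "preserve"),
    ("V1.1", "DEC OSF/1 V3.0"): (VIA_V13, "cannot reuse"),
    ("V1.0A", "DEC OSF/1 V2.1"): (VIA_V13, "cannot reuse"),
    ("V1.0", "DEC OSF/1 V2.0"): (VIA_V13, "cannot reuse"),
}

# Per the note with the table, installing from scratch is recommended
# for versions prior to V1.2.
PRE_V12 = {"V1.1", "V1.0A", "V1.0"}


def upgrade_path(ase_version, os_version):
    """Return (steps, database outcome) for a starting configuration."""
    try:
        return UPGRADE_PATHS[(ase_version, os_version)]
    except KeyError:
        raise ValueError(
            f"no upgrade path listed for ASE {ase_version} on {os_version}"
        ) from None


def recommend_fresh_install(ase_version):
    """True when a complete installation is recommended over upgrading."""
    return ase_version in PRE_V12
```

For example, upgrade_path("V1.2", "DEC OSF/1 V3.2A") returns the simultaneous-then-rolling sequence with a "preserve" database outcome.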

Setting Up an ASE for the First Time

If you are installing TruCluster Available Server Software Version 1.4 on systems that are not currently in an ASE, perform the following tasks on each system that will be an ASE member:

1. Install Digital UNIX Version 4.0A along with the appropriate utility subsets, and register the associated licenses.


2. Register the TruCluster Available Server Software Version 1.4 software license before you install the software.

3. Use the setld -l command to load the TruCluster Available Server Software Version 1.4 subsets. When the subsets have been installed, the installation procedure starts.

4. Enter the necessary information at each of the prompts. If prompted, do not use a saved ASE database.

5. Rebuild the kernel. The doconfig utility is automatically run. The ase_fix_config utility also runs, providing you the opportunity to renumber the shared SCSI buses.

6. Move the new kernel to the root directory and reboot the system.

7. Ensure that the host name and IP address for each member system is listed in the /etc/hosts file of each member system.

8. After all the system software (Digital UNIX Version 4.0A and TruCluster Available Server Software Version 1.4) has been installed, run the asemgr utility on one system and add all the member systems to the ASE, then set up the ASE services.

Performing a Rolling Upgrade

A rolling upgrade allows you to upgrade one member of the ASE at a time, without having to shut down the ASE. Rolling upgrades can be performed only from:

• Digital UNIX Version 3.2D or 3.2F/DECsafe Available Server V1.3 to Digital UNIX Version 3.2G/DECsafe Available Server V1.3

• Digital UNIX Version 3.2G/DECsafe Available Server V1.3 to Digital UNIX Version 4.0A/TruCluster Available Server Software Version 1.4

To perform a rolling upgrade, you must:

• Delete one member system at a time from the ASE

• Delete the ASE software subsets

• Update the operating system on that system

• Install the ASE software subsets

• Add the system back into the ASE

The new ASE functionality is not enabled until all member systems have been upgraded to the same version of DECsafe/TruCluster Available Server.

Because the rolling upgrade tasks depend upon the Digital UNIX and ASE versions, there are two different rolling upgrade procedures. The first is for systems in an ASE running Digital UNIX Version 3.2G/DECsafe Available Server V1.3. The second is for systems in an ASE running Digital UNIX Version 3.2D or 3.2F/DECsafe Available Server V1.3.


Rolling Upgrade from Digital UNIX Version 3.2G/ASE V1.3 to Digital UNIX Version 4.0A/ASE V1.4

Perform the following tasks for a rolling upgrade if the ASE member systems are running Digital UNIX Version 3.2G and DECsafe Available Server V1.3.

1. On one member system, run the asemgr utility and delete the member system that is to be upgraded.
If the system is included in the list of members favored to run a service, according to the service's automatic service placement (ASP) policy, you cannot delete the member. Change the ASP policy, then delete the member.

2. On the system being upgraded, use the /sbin/init.d/asemember stop command to stop the DECsafe Available Server daemons.

3. Use the setld -i | grep ASE command to determine which DECsafe Available Server V1.3 subsets are installed.

4. Delete the DECsafe Available Server V1.3 subsets with the setld -d command. If desired, restore the original kernel.

5. Perform an update installation (installupdate) of Digital UNIX Version 4.0A and register the appropriate licenses.

6. Register the TruCluster Available Server software license before you install the TruCluster Available Server Software Version 1.4 software.

7. Mount the Associated Products CD–ROM and use setld -l to load the TruCluster Available Server Software Version 1.4 subsets.

8. Enter the necessary information at each of the prompts. If you are asked whether you want to use a saved ASE database, answer n.

9. Rebuild the kernel, then move the new kernel to the root directory.

10. Reboot the system.

11. Ensure that the host name and IP address of each existing ASE member system is in the /etc/hosts file.

12. Run the asemgr utility on an existing ASE member and add the upgraded system to the ASE.

13. Repeat the previous steps for the remaining systems in the ASE.

Rolling Upgrade from Digital UNIX Version 3.2D or 3.2F/ASE V1.3 to Digital UNIX Version 4.0A/ASE V1.4

Perform the following tasks for a rolling upgrade if the ASE member systems are running Digital UNIX Version 3.2D or 3.2F and DECsafe Available Server V1.3.


Assuming a two-member ASE, you have the following ASE configuration when you start the upgrade:

Node A: ASE 1.3/Digital UNIX V3.2D
Node B: ASE 1.3/Digital UNIX V3.2D

1. On member system B, run the asemgr utility and delete member system A, the member that is to be upgraded first.
If the system is included in the list of members favored to run a service, according to the service's automatic service placement (ASP) policy, you cannot delete the member. Change the ASP policy, then delete the member to be upgraded.

2. On system A, use the /sbin/init.d/asemember stop command to stop the DECsafe Available Server daemons.

3. Use the setld -i | grep ASE command to determine which DECsafe Available Server V1.3 subsets are installed.

4. Delete the DECsafe Available Server V1.3 subsets with the setld -d command. If desired, restore the original kernel.

5. Upgrade (setld -l) to Digital UNIX Version 3.2G and register the appropriate licenses.

6. Mount the Digital UNIX V3.2D and Complementary Products CD–ROM and use setld -l to load the DECsafe Available Server V1.3 subsets.

7. Enter the necessary information at each of the prompts. If you are asked whether you want to use a saved ASE database, answer n.

8. Rebuild the kernel, then move the new kernel to the root directory.

9. Reboot the system.

10. Ensure that the host name and IP address of each existing ASE member system is in the /etc/hosts file.

11. Run the asemgr utility on an existing ASE member and add the upgraded system to the ASE.
At this point, you have upgraded system A to ASE V1.3/Digital UNIX V3.2G. Note that this procedure specifically reinstalls DECsafe Available Server V1.3 and adds the upgraded system back into the ASE. In a two-member ASE, you now have:

Node A: ASE 1.3/Digital UNIX V3.2G
Node B: ASE 1.3/Digital UNIX V3.2D

You may be tempted to skip reinstalling ASE V1.3 after upgrading to Digital UNIX Version 3.2G, and instead perform an update installation to Digital UNIX Version 4.0A. Do not do it. In that case, you would have:

Node A: ASE 1.4/Digital UNIX V4.0A
Node B: ASE 1.3/Digital UNIX V3.2D

This is an unqualified and unsupported configuration. You must reinstall ASE V1.3 and add the system back into the ASE.
For rolling upgrades, the supported configurations are:

Node A: ASE 1.3/Digital UNIX V3.2G
Node B: ASE 1.3/Digital UNIX V3.2D

and

Node A: ASE 1.3/Digital UNIX V3.2G
Node B: ASE 1.4/Digital UNIX V4.0A

12. Once the upgraded system has been made a member of the ASE, run the asemgr utility, preferably on the upgraded system, and delete another member system that is yet to be upgraded (in this case, system B).

13. On system B, use the /sbin/init.d/asemember stop command to stop the DECsafe Available Server daemons.

14. Use the setld -i | grep ASE command to determine which DECsafe Available Server V1.3 subsets are installed.

15. Delete the DECsafe Available Server V1.3 subsets with the setld -d command.

16. Upgrade (setld -l) to Digital UNIX Version 3.2G and register the appropriate licenses.

17. Perform an update installation (installupdate) of Digital UNIX Version 4.0A and register the appropriate licenses.

18. Mount the Associated Products CD–ROM and use setld -l to load the TruCluster Available Server Software Version 1.4 subsets.
With the second system upgraded, you now have:

Node A: ASE 1.3/Digital UNIX V3.2G
Node B: ASE 1.4/Digital UNIX V4.0A

19. Enter the necessary information at each of the prompts. If you are asked whether you want to use a saved ASE database, answer n.

20. Rebuild the kernel, then move the new kernel to the root directory.

21. Reboot the system.

22. Run the asemgr utility on an existing ASE member (in this example, Node A) and add the upgraded system to the ASE.

23. On the member system just upgraded (system B), run the asemgr utility and delete member system A so that you can complete the upgrade for that system.

24. On system A, use the /sbin/init.d/asemember stop command to stop the DECsafe Available Server daemons.


25. Use the setld -i | grep ASE command to determine which DECsafe Available Server V1.3 subsets are installed.

26. Delete the DECsafe Available Server V1.3 subsets with the setld -d command.

27. Perform an update installation (installupdate) of Digital UNIX Version 4.0A and register the appropriate licenses.

28. Mount the Associated Products CD–ROM and use setld -l to load the TruCluster Available Server Software Version 1.4 subsets.
Your ASE configuration is now:

Node A: ASE 1.4/Digital UNIX V4.0A
Node B: ASE 1.4/Digital UNIX V4.0A

29. Enter the necessary information at each of the prompts. If you are asked whether you want to use a saved ASE database, answer n.

30. Rebuild the kernel, then move the new kernel to the root directory.

31. Reboot the system.

32. Run the asemgr utility on an existing ASE member (Node B) and add the upgraded system to the ASE.
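The rule in step 11 about mixed-version members can be checked mechanically. This Python sketch is hypothetical: it covers only the versions used in the two-member example above (V3.2D, V3.2G, V4.0A), and it assumes that a configuration in which every member is in the same known state, as at the start and end of the upgrade, is supported.

```python
# A member state is (ASE version, Digital UNIX version).
V13_32D = ("1.3", "3.2D")
V13_32G = ("1.3", "3.2G")
V14_40A = ("1.4", "4.0A")
KNOWN_STATES = {V13_32D, V13_32G, V14_40A}

# The two mixed-version configurations listed as supported for rolling upgrades.
SUPPORTED_MIXED = {
    frozenset({V13_32G, V13_32D}),
    frozenset({V13_32G, V14_40A}),
}


def is_supported(member_states):
    """Return True if the members' states form a supported configuration."""
    distinct = frozenset(member_states)
    if len(distinct) == 1:
        # Assumption: uniform configurations (start and end of the upgrade) are fine.
        return next(iter(distinct)) in KNOWN_STATES
    return distinct in SUPPORTED_MIXED
```

Checking the state after each step shows why skipping the ASE V1.3 reinstallation on Node A is rejected: the pair of V1.4/V4.0A and V1.3/V3.2D is not in the supported set.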

Simultaneous Upgrade

If you are willing to shut down the ASE, you can perform a simultaneous upgrade on all the member systems of an existing ASE.

You can use the simultaneous upgrade and preserve the ASE database if you are currently running DECsafe Available Server Version 1.2 or 1.2A. For versions of DECsafe prior to 1.2, you cannot preserve the ASE database.

For configurations earlier than DECsafe Available Server Version 1.2, it is recommended that you install Digital UNIX Version 4.0A and TruCluster Available Server Software Version 1.4 directly, instead of performing multiple simultaneous upgrades to reach them. It will take much less time.

This section covers two simultaneous upgrades to reach Digital UNIX Version 4.0A and TruCluster Available Server Software Version 1.4:

• From Digital UNIX Version 3.2G and DECsafe Available Server Version 1.3

• From Digital UNIX Version 3.2A and DECsafe Available Server Version 1.2. A simultaneous upgrade from Digital UNIX Version 3.2C/DECsafe Available Server 1.2A would be very similar.


Simultaneous Upgrade from Digital UNIX Version 3.2G/ASE V1.3 to Digital UNIX Version 4.0A/ASE V1.4

Perform the following tasks for a simultaneous upgrade from Digital UNIX Version 3.2G and DECsafe Available Server Version 1.3 to Digital UNIX Version 4.0A and TruCluster Available Server Software Version 1.4:

1. Use the asemgr utility to put all the services off line if you want to preserve the ASE database. If you do not want to preserve the existing ASE database, delete all the services; this allows you to keep any AdvFS and LSM disks configured.

2. Use the setld -i | grep ASE command to determine which DECsafe Available Server subsets are installed.

3. Delete the DECsafe Available Server subsets with the setld -d command. You will be asked whether you want to retain the existing ASE database. If you perform a full installation of Digital UNIX Version 4.0A rather than an update installation, move the database file (/usr/var/ase/config/asecdb) to a safe place before installing Digital UNIX Version 4.0A. This prevents the database from being overwritten during the installation.

4. Upgrade to Digital UNIX Version 4.0A, or install Digital UNIX Version 4.0A, and register the appropriate licenses.

5. If necessary, move the database file back to /usr/var/ase/config/asecdb after the operating system has been installed. Ensure that the protections, owner, and group are -rw-r--r-- root system.

6. Register the TruCluster Available Server Software Version 1.4 software license before you install the TruCluster Software.

7. Install the TruCluster Available Server Software Version 1.4 software subsets.

8. Specify the appropriate information at the prompts.

9. Rebuild the kernel for each system.

10. Ensure that the host name and IP address of each existing ASE member system is in the /etc/hosts file.

11. Reboot each system.

12. Run the asemgr utility on only one system to add the member systems and set up the ASE services.
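Steps 3 and 5 preserve the database across a full operating system installation. The following Python sketch shows the copy and restore with the required -rw-r--r-- root system protections; the function names and the saved-copy location are assumptions, and the saved copy must be on a disk that the installation does not overwrite. Setting the owner requires superuser privileges, so the sketch lets you skip that step when testing.

```python
import os
import shutil

ASECDB = "/usr/var/ase/config/asecdb"  # database path given in this chapter


def save_asecdb(saved_copy, src=ASECDB):
    """Copy the ASE database to a safe location before the OS installation."""
    shutil.copy2(src, saved_copy)


def restore_asecdb(saved_copy, dest=ASECDB, set_owner=True):
    """Copy the saved database back and set -rw-r--r-- root system."""
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    shutil.copy2(saved_copy, dest)
    os.chmod(dest, 0o644)  # -rw-r--r--
    if set_owner:          # requires superuser privileges
        shutil.chown(dest, user="root", group="system")
```

Because shutil.copy2 also copies the original mode bits, the explicit os.chmod call is what guarantees the required protections after the restore.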

Simultaneous Upgrade from Digital UNIX Version 3.2A/ASE V1.2 to Digital UNIX Version 4.0A/ASE V1.4

Perform the following tasks for a simultaneous upgrade from Digital UNIX Version 3.2A and DECsafe Available Server Version 1.2 to Digital UNIX Version 4.0A and TruCluster Available Server Software Version 1.4.


Note that the steps provided here only upgrade to Digital UNIX Version 3.2G and DECsafe Available Server Version 1.3. After completing these steps, you must complete the steps of the simultaneous upgrade from Digital UNIX Version 3.2G/ASE V1.3 to Digital UNIX Version 4.0A and TruCluster Available Server Software Version 1.4.

1. Use the asemgr utility to put all the services off line to preserve the ASE database. If you will not preserve the existing ASE database, delete all the services (this allows you to keep any AdvFS and LSM disks configured).

2. Use the setld -i | grep ASE command to determine which DECsafe Available Server subsets are installed.

3. Delete the DECsafe Available Server subsets with the setld -d command. You will be asked whether you want to retain the existing ASE database. Move the database file (/usr/var/ase/config/asecdb) to a safe place before upgrading to Digital UNIX Version 3.2C.

4. Perform an update installation (installupdate) of Digital UNIX Version 3.2C and register the appropriate licenses.

5. Upgrade (setld -l) to Digital UNIX Version 3.2G and register the appropriate licenses.

6. Move the database file back to /usr/var/ase/config/asecdb after the operating system has been upgraded. Ensure that the protections, owner, and group are -rw-r--r-- root system.

7. Register the DECsafe Available Server Version 1.3 software license before you install the DECsafe Available Server software.

8. Install the DECsafe Available Server Version 1.3 software subsets.

9. At the following prompt, enter s to select "Performing a simultaneous upgrade":

ASE Installation Menu

f) Setting up an ASE for the first time
r) Performing a rolling upgrade
s) Performing a simultaneous upgrade
e) Adding to an ASE with a V1.3 operating version

x) Quit installation
?) Help

Enter your choice: s

10. Specify the appropriate information at the prompts.

11. When prompted, specify whether you want to use the preserved ASE database.

12. Rebuild the kernel for each system.

13. Reboot each system.

14. If you preserved an ASE database, run the asemgr utility and put the services on line.


If you did not preserve a database, use the asemgr utility to add the member systems and set up the ASE services.

15. Perform the necessary steps for a simultaneous upgrade from Digital UNIX Version 3.2G/ASE V1.3 to Digital UNIX Version 4.0A/ASE V1.4.

Adding a Member System to an Existing ASE with ASE V1.4 Operating Software

You can add a new member to an existing TruCluster Available Server Software Version 1.4 configuration without shutting down the ASE.

To install TruCluster Available Server Software Version 1.4 on a system and add it to an existing ASE, perform the following tasks:

1. Install the Digital UNIX Version 4.0A operating system and register the appropriate licenses.

2. Register the TruCluster Available Server Software Version 1.4 license before you install the TruCluster Software.

3. Use the setld -l command to load the TruCluster Available Server Software Version 1.4 subsets on the new system.

4. Specify the appropriate information at the installation prompts.

5. When the installation has completed and the kernel has been rebuilt, move the new kernel to the root file system.

6. Reboot the system.

7. Ensure that the host name and IP address of each existing ASE member system is in the /etc/hosts file.

8. Run the asemgr utility on an existing member system and add the new system to the ASE.


Installing TruCluster Software


Overview

This section contains an example of installing TruCluster Available Server Software Version 1.4 on a system that was previously a member of an ASE.

Installing TruCluster Available Server Software Version 1.4

Use the following steps to install TruCluster Available Server Software Version 1.4:

1. Log in as superuser.

2. Load the Associated Products CD–ROM into the appropriate CD–ROM drive.

3. Mount the CD–ROM on the /mnt (or other appropriate) directory, for example:

# mount -r /dev/rz4c /mnt

4. Load the TruCluster Software subsets using the setld -l utility. Specify the mount point and the directory where the TruCluster Available Server Software Version 1.4 kit is located:

# setld -l /mnt/TCR140

5. The setld utility and installation procedure provide the output shown in Example 4–1.

Example 4–1 Installing TruCluster Available Server Software Version 1.4

# setld -l /mnt/TCR140    (1)

*** Enter subset selections ***

The following subsets are mandatory and will be installed automatically unless you choose to exit without installing any subsets:

* TruCluster Available Server Software
* TruCluster Common Components
* TruCluster Configuration Software

The subsets listed below are optional:

There may be more optional subsets than can be presented on a singlescreen. If this is the case, you can choose subsets screen by screenor all at once on the last screen. All of the choices you make willbe collected for your confirmation before any subsets are installed.

- TruCluster(TM) Software:
1) TruCluster Cluster Monitor

--- MORE TO FOLLOW ---
Enter your choices or press RETURN to display the next screen.

Choices (for example, 1 2 4-6): Return


2) TruCluster Reference Pages

The following choices override your previous selections:

3) ALL mandatory and all optional subsets
4) MANDATORY subsets only
5) CANCEL selections and redisplay menus
6) EXIT without installing any subsets

Add to your choices, choose an overriding action or
press RETURN to confirm previous selections.

Choices (for example, 1 2 4-6): 3

You are installing the following mandatory subsets:

TruCluster Available Server Software
TruCluster Common Components
TruCluster Configuration Software

You are installing the following optional subsets:

- TruCluster(TM) Software:
TruCluster Cluster Monitor
TruCluster Reference Pages

Is this correct? (y/n): y

Checking file system space required to install selected subsets:

File system space checked OK.

5 subset(s) will be installed.

Loading 1 of 5 subset(s)....

TruCluster Common Components
Copying from . (disk)
Verifying

Loading 2 of 5 subset(s)....

TruCluster Available Server Software
Copying from . (disk)

Working....Thu Sep 5 13:45:40 EDT 1996
Verifying

Loading 3 of 5 subset(s)....

TruCluster Cluster Monitor
Copying from . (disk)

Working....Thu Sep 5 13:46:13 EDT 1996
Verifying

Loading 4 of 5 subset(s)....

TruCluster Reference Pages
Copying from . (disk)
Verifying

Loading 5 of 5 subset(s)....

TruCluster Configuration Software
Copying from . (disk)
Verifying


5 of 5 subset(s) installed successfully.

Configuring "TruCluster Common Components " (TCRCOMMON140)

Configuring "TruCluster Available Server Software" (TCRASE140)

Configuring "TruCluster Cluster Monitor " (TCRCMS140)

Configuring "TruCluster Reference Pages " (TCRMAN140)

Configuring "TruCluster Configuration Software " (TCRCONF140)

Enter the IP name for the member network interface [tinker]: Return

You chose "tinker," IP 16.30.80.33 using interface ln0
Is this correct? [y]: Return

Do you want to run the ASE logger on this node? [n]: y 2

An old ASE database file has been found. Do you want to use this (y/n): n 3
Removing the local disk copy of the ASE database (services and members) ...

Initializing a new ASE V1.4 database ...

The kernel will now be configured using "doconfig". 4

Enter the name of the kernel configuration file. [TINKER]: Return

*** KERNEL CONFIGURATION AND BUILD PROCEDURE ***

Saving /sys/conf/TINKER as /sys/conf/TINKER.bck

Do you want to edit the configuration file? (y/n) [n]: n

*** PERFORMING KERNEL BUILD ***

The ASE I/O Bus Renumbering Tool has been invoked.

Select the controllers that define the shared ASE I/O buses.

   Name   Controller  Slot  Bus  Slot
0) scsi0  tcds0       0     tc0  6
1) scsi1  tcds0       1     tc0  6
2) scsi2  tcds1       0     tc0  1
3) scsi3  tcds1       1     tc0  1
4) scsi4  tza0        0     tc0  4

q) Quit without making changes

Enter your choices (comma or space separated): 2 4 5

scsi2 tcds1 0 tc0 1
scsi4 tza0 0 tc0 4

Are the above choices correct (y|n)? [y]: y

I/O Controller Name Specification Menu

All controllers connected to an I/O bus must be named the same on all ASE members. Enter the controller names for all shared ASE I/O buses by assigning them one at a time or all at once with the below options.

   Name   New Name  Controller  Slot  Bus  Slot
2) scsi2  scsi2     tcds1       0     tc0  1
4) scsi4  scsi4     tza0        0     tc0  4


f) Assign buses starting at a given number
p) Assign buses as was done in pre-ASE V1.3
v) View non shared controllers
s) Show previous assignments
r) Reapply previous assignments
q) Quit without making any changes
x) Exit (done with modifications)

Enter your choice [f]: q 6
No changes made

Working....Thu Sep 5 13:51:44 EDT 1996
Working....Thu Sep 5 13:53:46 EDT 1996
Working....Thu Sep 5 13:55:48 EDT 1996

The new kernel is /sys/TINKER/vmunix

The kernel build was successful. Please perform the following actions: 7

o Move the new kernel to /.
o Before rebooting make sure that the member network interface IP
  addresses for all cluster members are recorded in each member’s
  /etc/hosts file.
o Reboot the system.
#

1 Use the setld utility to install the TruCluster Available Server Software Version 1.4.

2 This enables the ASE Logger daemon, which tracks all messages generated by the member systems.

3 The system on which the installation is taking place was previously in an ASE and has an existing database. You would not normally reuse the existing database.

4 A new kernel is automatically built.

5 Select the external controllers on the system that are to be used as the shared SCSI buses in the ASE.

6 All member systems must recognize the disks on a shared SCSI bus at the same device number. Because different systems have different numbers of internal SCSI buses, the /var/ase/sbin/ase_fix_config script is used to assign a specific bus number to each external SCSI controller installed on a system. You are given the opportunity to change the shared SCSI bus numbers to ensure that the shared buses are the same on all systems in the ASE. In this case, the shared SCSI buses are on SCSI buses 2 and 4, and no changes are necessary.

7 After installation is complete, move the new kernel to the root file system, ensure that the member network interface IP addresses for all cluster members are recorded in each member’s /etc/hosts file, then reboot the system.
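Note 6 is easier to see with the device-naming arithmetic worked out. Digital UNIX systems of this generation are commonly documented as deriving the rz unit number of a disk (LUN 0) from its SCSI bus number and target ID as 8 * bus + target; treat that formula as an assumption to confirm against your own system documentation. The POSIX sh sketch below, using the hypothetical helper name rz_unit, applies it:

```sh
# rz_unit BUS TARGET : print the rz device name for LUN 0 at BUS/TARGET.
# Assumption: unit number = 8 * bus + target, with targets 0-7 per bus.
rz_unit() {
    bus=$1 target=$2
    if [ "$target" -lt 0 ] || [ "$target" -gt 7 ]; then
        echo "rz_unit: target must be 0-7" >&2
        return 1
    fi
    printf 'rz%d\n' $((8 * bus + target))
}
```

Under that assumption, a disk at target 1 on shared bus 2 is rz17 on every member that numbers the bus 2, while a member that numbered the same bus 3 would see it as rz25. That mismatch is what the bus renumbering step prevents.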


After TruCluster Available Server Software Version 1.4 has been installed, you need to run the asemgr on one (and only one) ASE member system to add the newly installed system to the ASE.


Summary

Performing Preliminary Setup Tasks

Before you install TruCluster Available Server Software Version 1.4, you must determine whether you are properly prepared for the installation:

• Read the release notes.

• Verify any system prerequisites.

• Set up and test the hardware.

• Use console commands to make sure all the devices on the shared bus(es) are recognized.

Install Digital UNIX including the following subsets:

• OSFCLINET405: Basic Networking Services

• OSFPGMR405: Standard Programmer Commands

• OSFCMPLRS405: Compiler Back End

• NFS, LSM and AdvFS subsets if you will use those services.

Set up the local network, BIND, NFS, mail, and NTP.

Add every member system to the /etc/hosts file on each member system.

To use the Cluster Monitor, you must also install the following subsets. Install the C++ Class Shared Libraries and CDE Minimum Runtime Environment subsets before installing TruCluster Available Server Software Version 1.4.

• CXLSHRDA405: C++ Class Shared Libraries

• OSFCDEMIN405: CDE Minimum Runtime Environment

• TCRCMS140: TruCluster Cluster Monitor
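Before installing, it can help to confirm that the prerequisite subsets listed above are present. The sketch below uses a hypothetical helper, check_subsets, that reads a subset listing on standard input and reports required subsets not marked as installed. It assumes a listing in the style of setld -i, where an installed subset's line shows the word installed in the second column; verify that layout against your system's output before relying on it.

```sh
# check_subsets SUBSET... : read a "setld -i"-style listing on stdin and
# report each named subset that is not shown as installed.
# Assumption: installed lines look like "NAME installed Description...".
check_subsets() {
    listing=$(cat)
    rc=0
    for subset in "$@"; do
        if ! printf '%s\n' "$listing" |
            awk -v s="$subset" '$1 == s && $2 == "installed" { ok = 1 }
                                END { exit (ok ? 0 : 1) }'
        then
            printf '%s\n' "not installed: $subset"
            rc=1
        fi
    done
    return $rc
}
```

A possible invocation is setld -i | check_subsets OSFCLINET405 OSFPGMR405 OSFCMPLRS405 CXLSHRDA405 OSFCDEMIN405. TCRCMS140 is left out because it is installed together with the TruCluster kit.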

Preparing to Install TruCluster Software

Choose one of the following installation procedures, based on your configuration:

• Setting up an ASE for the first time, if you are installing TruCluster Available Server Software Version 1.4 on systems that are not currently in an ASE. Use this procedure if none of the systems have DECsafe Available Server or TruCluster Available Server Software Version 1.4 installed.

• Performing a rolling upgrade if any of the following is true:

DECsafe Available Server Version 1.2A/Digital UNIX Version 3.2C is installed on a member of an existing ASE.

DECsafe Available Server Version 1.3/Digital UNIX Version 3.2D or 3.2F is installed on a member of an existing ASE.


DECsafe Available Server Version 1.3/Digital UNIX Version 3.2G is installed on a member of an existing ASE.

This procedure allows you to upgrade member systems without shutting down the ASE.

In the first two cases above, a rolling upgrade to Digital UNIX Version 3.2G and DECsafe Available Server Version 1.3 is required before a rolling upgrade to Digital UNIX Version 4.0A and TruCluster Available Server Software Version 1.4 can be completed.

• Performing a simultaneous upgrade if DECsafe Available Server Version 1.2A, Version 1.2, or a version earlier than 1.2 is installed on an existing ASE. If the DECsafe version is 1.2A or 1.2, you can preserve the existing ASE database if desired. If the DECsafe version is earlier than 1.2, you cannot preserve the ASE database; you must add the member systems and services after installation.

• Adding to an ASE with a V1.4 operating version, if you are adding a new member to an existing ASE whose members run TruCluster Available Server Software Version 1.4. You can add the new member without shutting down the existing ASE.
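The selection rules above can be collected into a small lookup. The sketch below, with the hypothetical helper name choose_procedure, encodes only the combinations stated in this summary. It takes the ASE software version on the existing members (none if there is no ASE) and the Digital UNIX version, and prints each procedure that applies. It is a study aid, not a substitute for the release notes.

```sh
# choose_procedure ASE_VERSION [UNIX_VERSION]
# Prints the installation procedure(s) that the summary rules allow.
choose_procedure() {
    ase=$1 unix=$2
    case "$ase/$unix" in
    none/*)             echo "first-time setup" ;;
    1.4/*)              echo "add member to V1.4 ASE" ;;
    1.3/3.2G)           echo "rolling upgrade" ;;
    1.3/3.2D|1.3/3.2F)  echo "two-stage rolling upgrade (via Digital UNIX 3.2G and DECsafe V1.3)" ;;
    1.2A/3.2C)
        echo "two-stage rolling upgrade (via Digital UNIX 3.2G and DECsafe V1.3)"
        echo "simultaneous upgrade (ASE database can be preserved)" ;;
    1.2A/*|1.2/*)       echo "simultaneous upgrade (ASE database can be preserved)" ;;
    1.0/*|1.0A/*|1.1/*) echo "simultaneous upgrade (ASE database cannot be preserved)" ;;
    *)  echo "unknown combination: $ase/$unix" >&2; return 1 ;;
    esac
}
```

Matching on a combined "version/version" string keeps each rule on one line. The more specific 1.2A/3.2C pattern must precede 1.2A/* because case uses the first pattern that matches.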

Installing TruCluster Software

Use the following steps to install TruCluster Available Server Software Version 1.4:

1. Log in as superuser.

2. Load the Complimentary Products CD–ROM into the appropriate CD–ROM drive.

3. Mount the CD–ROM on the /mnt directory (or another appropriate directory), for example:

# mount -r /dev/rz4c /mnt

4. Load the TruCluster Software subsets using the setld -l utility. Specify the mount point and the directory where the TruCluster Available Server Software Version 1.4 kit is located:

# setld -l /mnt/TCR140

5. Answer the questions at the prompts throughout the installation.

6. Rebuild the kernel.

7. Reboot the system.
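Steps 3 and 4 can be wrapped in a small function so that they can be previewed before running them as superuser. The sketch below uses the hypothetical names install_tcr and RUN. Setting RUN=echo prints the mount and setld commands instead of executing them. The kernel rebuild and reboot in steps 6 and 7 are left to the interactive installation, as shown in Example 4–1.

```sh
# install_tcr DEVICE [MOUNTPOINT] : mount the kit CD-ROM read-only and load
# the TruCluster subsets with setld -l. Set RUN=echo for a dry run.
install_tcr() {
    if [ $# -lt 1 ]; then
        echo "usage: install_tcr device [mountpoint]" >&2
        return 2
    fi
    dev=$1
    mnt=${2:-/mnt}
    $RUN mount -r "$dev" "$mnt" || return 1
    $RUN setld -l "$mnt/TCR140"
}
```

For example, RUN=echo followed by install_tcr /dev/rz4c previews the two commands shown in steps 3 and 4.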


Exercises

Performing Preliminary Setup Tasks: Exercise

1. Before installing the TruCluster Software, you should:

a. Read the release notes

b. Verify system prerequisites

c. Install, set up, and test hardware

d. All of the above

2. Which subset is not required for TruCluster Available Server Software Version 1.4?

a. OSFCLINET405

b. OSFPGMR405

c. OSFCMPLRS405

d. None of the above; they are all required.

3. To use the Cluster Monitor, you must install which of these subsets?

a. CXLSHRDA405

b. OSFCDEMIN405

c. TCRCMS140

d. All of the above

Performing Preliminary Setup Tasks: Solution

1. d Before installing the TruCluster Software, you should:

a. Read the release notes

b. Verify system prerequisites

c. Install, set up, and test hardware

d. All of the above

2. d Which subset is not required for TruCluster Available Server Software Version 1.4?

a. OSFCLINET405

b. OSFPGMR405

c. OSFCMPLRS405

d. None of the above; they are all required.


3. d To use the Cluster Monitor, you must install which of these subsets?

a. CXLSHRDA405

b. OSFCDEMIN405

c. TCRCMS140

d. All of the above

Preparing to Install TruCluster Software: Exercise

1. For which environment can you use a rolling upgrade?

a. ASE V1.2A/Digital UNIX Version 3.2C

b. ASE V1.1/Digital UNIX Version 3.0

c. ASE V1.0A/Digital UNIX Version 2.1

d. ASE V1.0/Digital UNIX Version 2.0

2. A rolling upgrade allows you to upgrade ASE member systems without shutting down the ASE.

a. True

b. False

3. You can perform a rolling upgrade to Digital UNIX Version 4.0A from which operating system version?

a. Digital UNIX Version 3.2C

b. Digital UNIX Version 3.2D

c. Digital UNIX Version 3.2F

d. Digital UNIX Version 3.2G

4. If ASE member systems are at DECsafe Available Server Version 1.2, you can preserve the ASE database if desired.

a. True

b. False

5. To add a new member to an existing TruCluster Available Server Software Version 1.4 configuration, you must shut down the ASE before adding the new member.

a. True

b. False


6. Which configuration is supported during a rolling upgrade to TruCluster Available Server Software Version 1.4?

a. ASE V1.3/Digital UNIX Version 3.2D and TruCluster Available Server Software Version 1.4/Digital UNIX Version 4.0A

b. ASE V1.3/Digital UNIX Version 3.2F and TruCluster Available Server Software Version 1.4/Digital UNIX Version 4.0A

c. ASE V1.3/Digital UNIX Version 3.2G and TruCluster Available Server Software Version 1.4/Digital UNIX Version 4.0A

d. All of the above

Preparing to Install TruCluster Software: Solution

1. a For which environment can you use a rolling upgrade?

a. ASE V1.2A/Digital UNIX Version 3.2C

b. ASE V1.1/Digital UNIX Version 3.0

c. ASE V1.0A/Digital UNIX Version 2.1

d. ASE V1.0/Digital UNIX Version 2.0

2. a A rolling upgrade allows you to upgrade ASE member systems without shutting down the ASE.

a. True

b. False

3. d You can perform a rolling upgrade to Digital UNIX Version 4.0A from which operating system version?

a. Digital UNIX Version 3.2C

b. Digital UNIX Version 3.2D

c. Digital UNIX Version 3.2F

d. Digital UNIX Version 3.2G

4. a If ASE member systems are at DECsafe Available Server Version 1.2, you can preserve the ASE database if desired.

a. True

b. False

5. b To add a new member to an existing TruCluster Available Server Software Version 1.4 configuration, you must shut down the ASE before adding the new member.

a. True


b. False

6. c Which configuration is supported during a rolling upgrade to TruCluster Available Server Software Version 1.4?

a. ASE V1.3/Digital UNIX Version 3.2D and TruCluster Available Server Software Version 1.4/Digital UNIX Version 4.0A

b. ASE V1.3/Digital UNIX Version 3.2F and TruCluster Available Server Software Version 1.4/Digital UNIX Version 4.0A

c. ASE V1.3/Digital UNIX Version 3.2G and TruCluster Available Server Software Version 1.4/Digital UNIX Version 4.0A

d. All of the above

Installing TruCluster Software: Exercise

Register the ASE-OA license PAK and install the TruCluster Available Server Software Version 1.4 software on all member systems.

Installing TruCluster Software: Solution

If the system is presently a member of an ASE, remove the system from the ASE and delete the ASE software subsets before installing TruCluster Available Server Software Version 1.4.

Use lmfsetup, lmf, or the License Manager GUI to register the PAK.

Use setld to install the TruCluster Available Server Software Version 1.4 software subsets and rebuild the kernel. See Example 4–1 for a sample installation session.


5 Setting Up and Managing ASE Members

Setting Up and Managing ASE Members 5–1


About This Chapter

Introduction

This chapter describes how to set up and administer your ASE member systems.

Objectives

To understand how to set up and manage ASE members, you should be able to:

• Describe the purpose and syntax of the asemgr utility

• Set up ASE member systems

• Manage TruCluster Software event logging

Resources

For more information on the topics in this chapter, see the following:

• TruCluster Available Server Software Available Server Environment Administration

• TruCluster Available Server Software Hardware Configuration and Software Installation

• TruCluster Available Server Software Version 1.4 Release Notes

• Reference Pages


Introducing the asemgr Utility

Overview

You use the asemgr utility to set up and administer the ASE. Tasks you can perform with the asemgr include the following:

• Adding and deleting member systems

• Adding and deleting network interfaces

• Creating and managing ASE services

• Displaying the status of member systems and ASE services

• Specifying logger locations and levels

asemgr Command Syntax

The asemgr utility has an interactive menu mode and a limited command line interface. If you enter asemgr without options, it displays menus and prompts you for information. The command line interface allows you to use asemgr in shell scripts.

The syntax for the asemgr command is as follows:

/usr/sbin/asemgr [ options ]

Command line options are as follows:

• -d [ -h ] [ member ] [ -v ] [ service ] [ -l ]

Displays the status of all the member systems (-h) and services (-v), or of specific member systems and services. Also displays the member systems that are running the logger daemon (-l).

• -m service member

Relocates the specified service to the specified member system. When you relocate a service, you stop the service on the member system currently running it and start the service on another member system.

• -r service . . .

Restarts a service.

• -s service . . .

Starts the specified service and sets it on line, making it available to clients.

• -x service . . .

Stops the specified service and sets it off line, making it unavailable to clients.
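Because the command line options can be scripted, a common use is to move every service off a member before maintenance. The sketch below is a hypothetical helper, relocate_all, that calls the documented asemgr -m service member form once per service. It assumes that asemgr exits with a nonzero status when a relocation fails. This guide does not state that behavior, so check it before relying on the error reporting. The ASEMGR variable is also a hypothetical override for dry runs.

```sh
#!/bin/sh
# Path to asemgr; override ASEMGR to preview or test (hypothetical convention).
ASEMGR=${ASEMGR:-/usr/sbin/asemgr}

# relocate_all TARGET SERVICE... : relocate each service to TARGET using
# "asemgr -m service member". Assumes asemgr exits nonzero on failure.
relocate_all() {
    if [ $# -lt 2 ]; then
        echo "usage: relocate_all target-member service..." >&2
        return 2
    fi
    target=$1
    shift
    rc=0
    for svc in "$@"; do
        if "$ASEMGR" -m "$svc" "$target"; then
            echo "relocated $svc to $target"
        else
            echo "failed to relocate $svc" >&2
            rc=1
        fi
    done
    return $rc
}
```

For example, relocate_all tailor nfs_svc db_svc would move two hypothetical services to tailor before tinker is shut down. The loop continues past a failed service so that one problem does not leave the remaining services in place.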


Running Multiple Instances of the asemgr

Some ASE administrative tasks can lock the ASE. If you try to run the asemgr utility and the ASE is locked, the following message is displayed:

ASE is locked by ’hostname’

This message indicates that the task cannot be performed because another member system is running the asemgr utility.


Setting Up and Managing Members

Overview

After you have installed the TruCluster Software and rebooted the members, you can use the asemgr to add all the ASE member systems at the same time and from the same system. The configuration database created by asemgr (/usr/var/ase/config/asecdb) will be copied to each member system.

You can also add members one at a time from an existing member system once the TruCluster Software is installed and running.

Using asemgr the First Time

The first time you invoke the asemgr utility, you will not see the main menu. Instead, you will be prompted for member system names and asked to confirm the configuration, as shown in the following example:

# /usr/sbin/asemgr

Enter a comma separated list of all the host names you want as ASE servers.

Enter Members: tinker, tailor

Member List: tinker, tailor

Is this correct (y/n) [y]: y

Would you like to define any other network interfaces to tinker for ASE use (y/n)? [n]: n

Would you like to define any other network interfaces to tailor for ASE use (y/n)? [n]: n

ASE Network Configuration

Member Name   Interface Name   Member Net   Monitor
___________   ______________   __________   _______
tinker        tinker           Primary      Yes
tailor        tailor           Primary      Yes

Is this configuration correct (y|n)? [y]: y

After you enter member names and verify the configuration, the asemgr main menu is displayed:

TruCluster Available Server (ASE)

ASE Main Menu

a) Managing the ASE -->
m) Managing ASE Services -->
s) Obtaining ASE Status -->

x) Exit ?) Help

Enter your choice:


Initializing ASE Member Systems

If an ASE does not function properly when you attempt to add members, first make sure that you have met the installation requirements. If that does not fix the problem, you can initialize one or all of the member systems in the ASE.

Initializing a system stops any running ASE daemons and removes any member system and service information from the ASE database on the system. After you initialize a system, it can be added to an existing ASE or used in a new ASE.

Initializing a Single Member System

To initialize one member system in an ASE, use the following procedure:

1. If the system is already a member system, use the asemgr utility to delete the member system from the ASE. If you cannot delete the member system, you cannot initialize only that member.

2. If the system is not an ASE member system, delete the /usr/var/ase/config/asecdb ASE database file, if it exists, from the system.

3. Invoke the /usr/sbin/asesetup command on the system.

4. Run the asemgr utility on an existing member system and add the initialized system to the ASE.

Initializing All the Member Systems

Initializing all the member systems returns the ASE to a state that includes no member systems or services. After you do this, you must add the member systems and set up your services again.

To initialize all the member systems in an ASE, follow these steps:

1. If possible, use the asemgr utility to display the status of the member systems, networks, and services in the ASE. This information will help you to recreate your ASE.

2. If possible, use the asemgr utility to delete all the services from the ASE. This allows you to save any Logical Storage Manager (LSM) or Advanced File System (AdvFS) configurations on a specific system.

3. Delete the /usr/var/ase/config/asecdb ASE database file from all the systems.

4. Invoke the /usr/sbin/asesetup command on each system.

5. Run the asemgr utility on one system, add the other initialized systems to the ASE one at a time, and set up your services.
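The per-system part of both procedures (delete the ASE database file, then run asesetup) can be captured in one function. The sketch below is a hypothetical helper, reinit_member, meant only for a system that is not, or is no longer, an ASE member. The ROOT and ASESETUP variables are hypothetical overrides used for dry runs and testing. By default the function acts on /usr/var/ase/config/asecdb and runs /usr/sbin/asesetup with no arguments, as the steps describe.

```sh
# reinit_member : remove the local ASE database (if present) and rerun
# asesetup. Run only on a system that is not currently an ASE member.
# ROOT and ASESETUP are hypothetical overrides for dry runs/tests.
reinit_member() {
    db="${ROOT:-}/usr/var/ase/config/asecdb"
    if [ -f "$db" ]; then
        rm -f "$db" || return 1
        echo "removed $db"
    fi
    "${ASESETUP:-/usr/sbin/asesetup}"
}
```

After reinit_member completes on each system, the members and services are added back with the asemgr utility as described in the final step.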


Using asemgr to Manage Members

To manage ASE member systems, invoke the asemgr utility and choose the Managing the ASE item from the main menu. The following example shows the Managing the ASE menu.

Example 5–1 asemgr Menus for Members

# /usr/sbin/asemgr

TruCluster Available Server (ASE)

ASE Main Menu

a) Managing the ASE -->
m) Managing ASE Services -->
s) Obtaining ASE Status -->

x) Exit ?) Help

Enter your choice: a

Managing the ASE

a) Add a member
d) Delete a member
n) Modify the network configuration
m) Display the status of the members
l) Set the logging level
e) Edit the error alert script
t) Test the error alert script

x) Exit to the Main Menu ?) Help

Enter your choice [x]:


Adding a Member

Choose the Add a member item from the Managing the ASE menu to add a member. The screen displays the current member systems and prompts you to enter a new member name and to confirm the new configuration, as shown in the following example:

Managing the ASE

a) Add a member
d) Delete a member
n) Modify the network configuration
m) Display the status of the members
l) Set the logging level
e) Edit the error alert script
t) Test the error alert script

x) Exit to the Main Menu

Enter your choice [x]: a

Member List: tinker, tailor

Enter a new member: weaver

Member List: tinker, tailor, weaver

Is this correct (y/n)? [y]: y

Would you like to define any other network interfaces to weaver for ASE use? (y/n)? [n]: n

ASE Network Configuration

Member Name   Interface Name   Member Net   Monitor
___________   ______________   __________   _______
tinker        tinker           Primary      Yes
tailor        tailor           Primary      Yes
weaver        weaver           Primary      Yes

Is this configuration correct (y|n)? [y]: y

Deleting a Member

Choose the Delete a member item from the Managing the ASE menu to remove a member system. The screen displays the current member systems and prompts you to identify the system to remove. The following example shows the screen display for deleting a member:

Managing the ASE

a) Add a member
d) Delete a member
n) Modify the network configuration
m) Display the status of the members
l) Set the logging level
e) Edit the error alert script
t) Test the error alert script

x) Exit to the Main Menu

Enter your choice [x]: d

Select the member to delete:

1) tailor
2) weaver

) tinker


x) Exit without deleting a member ?) Help

Enter your choice [x]: 2

New member list: tinker, tailor

Member to delete: weaver

Is this correct? (y/n) [y]: y
Deleting member ’weaver’ ...
Member deleted

You cannot delete a member that is selected as a favored member for a service using the favor members or restrict to favored members placement policy. You must modify the service and remove that favored member from the list before you can delete the member from the available server environment.

You cannot delete the member running the asemgr utility. If there is only one member system in the ASE, you cannot delete that member using the asemgr utility. You can use setld -d to delete the TruCluster Software subsets from the last member system.

Managing ASE Networks

When you add a member system to the ASE, the asemgr utility prompts you for additional network interface names. Before you add an interface, you must use the netsetup utility to define the network interface on the system.

Primary and backup networks in an ASE must be subnets that are common to all member systems. Network interface names used in an ASE for common networks must be included in the local /etc/hosts file on each member system.

The following example is part of an /etc/hosts file and shows two member systems, tinker and tailor, and multiple network interfaces for the systems.

# ASE member systems
#
16.140.64.121 tinker.abc.def.com tinker
16.140.64.122 tailor.abc.def.com tailor
#
#
# FDDI ring #1
#
16.140.64.121 tinker1.abc.def.com tinker1
16.140.64.122 tailor1.abc.def.com tailor1
#
#
# FDDI ring #2
#
16.140.64.121 tinker2.abc.def.com tinker2
16.140.64.122 tailor2.abc.def.com tailor2
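A missing /etc/hosts entry on one member is an easy mistake when several interface names are involved. The sketch below is a hypothetical helper, check_hosts, that reports any interface name that does not appear as a host name or alias in a given hosts file. Comments after # are ignored. It checks names only and cannot tell whether the addresses are correct.

```sh
# check_hosts FILE NAME... : report each NAME that is not listed as a host
# name or alias (any field after the address) in FILE. Returns 1 if any
# name is missing.
check_hosts() {
    hosts_file=$1
    shift
    rc=0
    for name in "$@"; do
        if ! awk -v n="$name" '
                { sub(/#.*/, "") }
                { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                END { exit (found ? 0 : 1) }' "$hosts_file"
        then
            printf '%s\n' "missing: $name"
            rc=1
        fi
    done
    return $rc
}
```

Running check_hosts /etc/hosts tinker tailor tinker1 tailor1 tinker2 tailor2 on each member is one way to confirm that every member lists the same interface names.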

You must specify the interface names for the primary and backup networks in the local /etc/routes file on each member system. For each member system, you must define a host route to all other member systems. This definition is needed to fail over IP traffic between member systems when a network path fails.


For example, if your member systems tinker and tailor also have interface names tinker1 and tailor1, and tinker2 and tailor2, where the number in the name refers to the subnet, then each member system’s /etc/routes file must contain the following information:

-host tinker tinker
-host tailor tailor
-host tinker1 tinker1
-host tinker2 tinker2
-host tailor1 tailor1
-host tailor2 tailor2
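With more members or subnets, these entries are easier to generate than to type. The sketch below is a hypothetical helper, gen_routes, that assumes the naming scheme used in this example. Each member's primary interface name is its host name, and each additional subnet adds a name formed by appending a suffix taken from SUFFIXES (default "1 2").

```sh
# gen_routes MEMBER... : print "-host name name" lines for /etc/routes.
# Assumption (hypothetical naming scheme): primary name = host name; extra
# subnet names = host name + each suffix in SUFFIXES (default "1 2").
gen_routes() {
    suffixes=${SUFFIXES-"1 2"}
    for member in "$@"; do
        printf '%s\n' "-host $member $member"
        for s in $suffixes; do
            printf '%s\n' "-host $member$s $member$s"
        done
    done
}
```

The entries come out grouped by member rather than in the order shown above; the order of lines in /etc/routes does not change which routes are defined. The ${SUFFIXES-...} form (without a colon) lets SUFFIXES="" generate primary-name routes only. Review the output, for example with gen_routes tinker tailor > /tmp/routes.new, before merging it into /etc/routes.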

Modifying the Network Configuration

Choose the Modify the network configuration item from the Managing the ASE menu to manage the network interfaces in an ASE configuration. The ASE Network Modify menu appears, containing options that allow you to add and delete network interfaces, display the current network configuration, specify primary and backup networks, and specify networks to be monitored or ignored.

Enter your choice [x]: n

ASE Network Modify Menu

a) Add network interfaces
d) Delete network interfaces
s) Show the current configuration

p) Specify the primary ASE member network
b) Specify a backup ASE member network
i) Specify an ASE member network to be ignored
m) Specify network interfaces to be monitored

q) Quit without making changes
x) Exit

Adding and Deleting Network Interfaces

Before you specify a network interface for a member system, the interface must be defined and configured on the system.

Choose the Add network interfaces item from the ASE Network Modify menu to add a network interface:

ASE Member Menu

Select a member to add an interface to:

0) tinker
1) tailor
q) Quit without making changes

Enter your choice [q]: 1

Enter interface names for member ’tailor’
Interface name (return to exit): tailor1


To delete network interfaces, choose the Delete network interfaces item:

ASE Member Menu

Select a member to delete an interface from:

0) tinker
1) tailor
q) Quit without making changes

Enter your choice [q]: 1

Network interfaces for member ’tailor’

Choose one or more network interfaces to delete:

1) tailor 16.140.64.121
2) tailor1 16.142.112.121
3) tailor2 16.142.96.122

q) Quit to previous menu

Enter your choices (comma or space separated): 1

Displaying the Current Network Configuration

Choose the Show the current configuration item from the ASE Network Modify menu to display the member systems, their interface names, whether an interface is designated as a primary or a backup network, and whether monitoring is enabled. For example:

ASE Network Configuration

Member Name   Interface Name   Member Net   Monitor
___________   ______________   __________   _______
tinker        tinker           Primary      Yes
tinker1       tinker1          Backup       Yes
tinker2       tinker2          Backup       Yes

tailor        tailor           Primary      Yes
tailor1       tailor1          Backup       Yes
tailor2       tailor2          Backup       Yes

Specifying Primary and Backup Networks

The primary network in an ASE is the network most frequently used to query other member systems. Backup networks are also used for queries, but at a slower rate. Interfaces for primary and backup networks must be common to all the ASE member systems and included in each member system’s local /etc/hosts and /etc/routes files.

Choose the Specify the primary ASE member network menu item to select an interface for the primary network:

ASE Member Primary Network Menu

Choose one of the networks to be the ASE member primary network:

0) 16.140.64.121 (tinker, tailor)
1) 16.142.112.121 (tinker1, tailor1)
2) 16.142.96.122 (tinker2, tailor2)


q) Quit to previous menu

Enter your choice: 1

16.142.112.121 (tinker1, tailor1)
Is the above choice correct? (y/n) [y]: y

Choose the Specify a backup ASE member network menu item to select backup network interfaces for the ASE:

ASE Member Backup Network Menu

Choose the networks you want to be the ASE member backup networks:

0) 16.140.64.121 (tinker, tailor)
1) 16.142.112.121 (tinker1, tailor1)
2) 16.142.96.122 (tinker2, tailor2)

q) Quit to previous menu

Enter your choices (comma or space separated): 0,2

16.140.64.121 (tinker, tailor)
16.142.96.122 (tinker2, tailor2)

Are the above choices correct? (y/n) [y]: y

Specifying a Network to be Ignored

Choose the Specify an ASE member network to be ignored menu item to specify a network that you want to configure but do not currently want the member systems to use:

Ignore ASE Member Network Menu

Choose a network not to be used as an ASE member network:

0) 16.140.64.121 (tinker, tailor)
1) 16.142.112.121 (tinker1, tailor1)
2) 16.142.96.122 (tinker2, tailor2)

q) Quit to previous menu

Enter your choices (comma or space separated): 2

16.142.96.122 (tinker2, tailor2)

Are the above choices correct? (y/n) [y]: y

Specifying a Network to be Monitored

You should monitor an interface if you are concerned with client access on a particular interface. Monitoring an interface allows you to customize a TruCluster Available Server operation when a network interface fails. You can monitor the primary and backup network interfaces in an ASE. You can also monitor a network interface that is configured on only one system and is not common to all the ASE member systems.

If a monitored network interface fails, the TruCluster Software runs the error Alert script, which invokes the /var/ase/lib/ni_status_awk script located on the member system. The default script causes the TruCluster Software to stop all the services running on that member system and start them on another member system if all the network interfaces on the first member system fail.

However, you can edit the /var/ase/lib/ni_status_awk script on each member system to specify a different action to take. For example, you can edit the script so that services relocate to another member system if any network interface fails or if a particular interface fails. In addition, because the error Alert script is propagated to all the member systems, you can edit the error Alert script itself so that the actions are the same on all systems. Use the asemgr utility to edit the error Alert script.

Choose the Specify network interfaces to be monitored menu item to modify specific interfaces:

ASE Member Menu

Choose a member to modify:

0) tinker
1) tailor
q) Quit without making changes

Enter your choice: 0

Network Interfaces for Member ’tinker’

Choose one or more network interfaces:

0) tinker 16.140.64.121 (monitored)
1) tinker1 16.142.112.121 (not monitored)
2) tinker2 16.142.96.122 (monitored)
q) Quit to previous menu
n) Do not monitor any interfaces

Enter your choice: 1

tinker1 16.142.112.121

Are the above choices correct? (y/n) [y]: y


Displaying ASE Member Status

Choose the Display the status of the members menu item to display the status of member systems. The screen displays the host status (for example, UP or DOWN) and the status of the Agent daemon for each member system, as shown in the following example:

Managing the ASE

a) Add a member
d) Delete a member
n) Modify the network configuration
m) Display the status of the members
l) Set the logging level
e) Edit the error alert script
t) Test the error alert script

x) Exit to the Main Menu

Enter your choice [x]: m

Member Status

Member: Host Status: Agent Status:tinker UP RUNNINGtailor UP RUNNING

The Director daemon obtains system status from the Host StatusMonitor (HSM) daemons running on all the member systems. Thefollowing table describes the information for the Host Status field:

Host Status Description

UP The member system is up and can be accessed by the memberthat is running the Director daemon using the primary network.

DOWN The member system cannot be accessed by the member that isrunning the Director daemon using the primary network or theSCSI bus.

DISCONNECTED The member system is disconnected from all monitored networks.Any services running on the member system are stopped, and noservices can be added, deleted, or started on the member system.

NETPAR There is a network partition between the member system andthe member system running the Director daemon, althoughthe member systems can communicate using SCSI bus queries.Services that are currently running on the member systemremain running, but the member system cannot start or stop anyservice until it leaves this state.

The Director determines the status of the Agent daemons runningon the member systems. The following table describes theinformation in the Agent Status field:

Agent Status Description

RUNNING        The ASE Agent daemon is running on the member system.

DOWN           The ASE Agent daemon is not running on the member system.

INITIALIZING   The ASE Agent daemon on the member system is in its initialization phase and will be running soon.

UNKNOWN        The ASE Director daemon cannot determine the state of the Agent daemon on the member system.

INVALID        The ASE Director daemon reports an invalid state for the Agent daemon on the member system.

Resetting the TruCluster Software Daemons

If you experience problems in your ASE, you can reset the TruCluster Software daemons. This stops the Director and Host Status Monitor daemons and initializes the Agent daemons. The Agent daemons then restart the other daemons to make the TruCluster Software fully operational. If resetting the TruCluster Software daemons does not fix the problem, you can initialize or reboot the system.

To reset the TruCluster Software daemons on a member system, use the following command:

/sbin/init.d/asemember restart

TruCluster Software Daemon Scheduling

The TruCluster Software daemons need to run with a scheduling priority higher than that of normal system processes, because the daemons must be able to respond to administrative commands and time-sensitive events in the ASE. The Agent daemon (aseagent) and Logger daemon (aselogger) are started in the /sbin/init.d/asemember script with a nice value of -5, which raises the priority of those daemons and of all processes they start.

If you see an ASE timeout error message in the daemon.log file, the TruCluster Software daemons are timing out while waiting to run.

Non-TruCluster processes with a higher scheduling priority might be forcing the TruCluster Software daemons to wait. In this case, you can raise the scheduling priority of the TruCluster Software daemons by lowering the nice value in the /sbin/init.d/asemember script, that is, by making it more negative than -5. (A lower nice value means a higher priority.) Refer to nice(1) for more information about scheduling priorities.

Note that TruCluster Software daemons started with a "nice" priority do not always stay at that priority. Over time, if the member systems do not reboot, a daemon's priority may return to the average run priority. When the member systems reboot, the daemon's priority is raised again according to the "nice" value in the /sbin/init.d/asemember script.

Therefore, the default /sbin/init.d/asemember script contains the following command, which supersedes the "nice" value for the asehsm daemon and runs the daemon with a fixed high priority that does not degrade over time:

aseagent -p hsm

If you do not want the fixed high priority for the asehsm daemon, remove this command from the /sbin/init.d/asemember script.

You can also raise and fix the priority of the other TruCluster Software daemons, not just asehsm, by including the following command in the /sbin/init.d/asemember script:

aseagent -p all

Using TruCluster Event Logging

Overview

The Logger daemon (aselogger) tracks the TruCluster Available Server messages generated by all the member systems. It is recommended that you run an instance of the Logger daemon on each ASE member system.

The Logger daemon uses the Digital UNIX event logging facility, syslog, which collects messages logged by various kernel, command, utility, and application programs.

Alert level messages are critical and need immediate attention. In an Alert script, you can specify the actions to take when an alert error occurs, such as sending a mail message to specified users.

You can use the asemgr utility to manage TruCluster Software event logging and perform the following tasks:

• Display the logger location

• Set and display the level of message logging

• Edit and test the Alert script

Starting the Logger

During installation, the TruCluster Software prompts you to start the Logger daemon on that member system. If you choose not to run the Logger daemon during installation, you can start it later with the following commands:

# rcmgr set ASELOGGER 1
# /sbin/init.d/asemember restart

Stopping the Logger

You can stop TruCluster Software logging on a member system by entering the following commands:

# rcmgr set ASELOGGER 0
# /sbin/init.d/asemember restart

Setting System Logging

System event messages processed by syslog are logged to a local file or forwarded to a remote system, as specified in the /etc/syslog.conf file. If you use the default configuration, all asemgr utility and TruCluster Software daemon messages are logged to the /var/adm/syslog.dated/date/daemon.log file on the system running the logger. The Availability Manager driver messages are logged to the kern.log file in the same directory.

If no Logger daemon is running, or if the member running the logger goes down, all TruCluster Software messages are logged locally.

Example 5–2 System Logging Configuration File

1    2        3
kern.debug    /var/adm/syslog.dated/kern.log
daemon.debug  /var/adm/syslog.dated/daemon.log
*.emerg       *

Example 5–2 shows the format of the /etc/syslog.conf file. For more information, see syslogd(8) and Digital UNIX System Administration, Chapter 14.

1 Specifies the part of the system generating the message (the facility). The asterisk (*) represents all parts of the system.

2 Specifies the severity level. The syslogd daemon logs all messages of the specified level or greater severity. Severity levels include emerg (panic), alert, crit, err, warn, notice, info, and debug.

3 Specifies the destination where the messages are logged. You can specify a full pathname to log to a file, @hostname to forward messages to that host, a comma-separated list of users to receive messages, or an asterisk (*) to write messages to all users who are logged in.
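To see quickly where messages from one facility end up, the following shell sketch prints the level and destination of each /etc/syslog.conf rule that applies to a given facility. The function name syslog_dest is hypothetical, and the sketch assumes the simple one-rule-per-line layout of Example 5–2; selector lists separated by semicolons or commas are not handled.

```shell
#!/bin/sh
# syslog_dest FACILITY [FILE]
# Hypothetical helper: print "level destination" for every rule in a
# syslog.conf-style FILE (default /etc/syslog.conf) whose facility is
# FACILITY or "*".
# Assumption: one "facility.level<whitespace>destination" rule per line,
# as in Example 5-2; ";" or "," selector lists are not handled.
syslog_dest() {
    awk -v fac="$1" '
        /^[ \t]*#/ || NF < 2 { next }   # skip comments and blank lines
        {
            split($1, sel, ".")          # sel[1] = facility, sel[2] = level
            if (sel[1] == fac || sel[1] == "*")
                print sel[2], $2
        }' "${2:-/etc/syslog.conf}"
}
```

Run against the rules in Example 5–2, syslog_dest daemon prints two lines, debug /var/adm/syslog.dated/daemon.log and emerg *, showing that daemon messages go to daemon.log and that emergencies are also written to all logged-in users.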

Displaying Logger Location

To find the TruCluster Software message logs, identify a member system running the Logger daemon, then check its system log files. You can determine which member systems are running the Logger daemon by choosing the Obtaining ASE Status item from the asemgr main menu and then choosing the Display the location of the logger(s) item. The following example shows how to display the location of the Logger daemon.

Example 5–3 Displaying Logger Daemon Location

Obtaining ASE Status

m) Display the status of the members
s) Display the status of a service
l) Display the location of the logger(s)
v) Display the level of logging

x) Exit to the Main Menu ?) Help

Enter your choice [x]: l

Location of logger(s)

The following member(s) are logging ASE information:

tinker
tailor

Setting Log Level

You can use the asemgr utility to specify the level of the messages you want logged by the Logger daemon. The Logger daemon uses four logging levels, as described in the following table.

Logging Level   Description

Error           Logs messages with Error and Alert severity levels. These are critical conditions that need immediate attention.

Warning         Logs messages with Warning and Error severity levels. Includes potential error conditions.

Notice          Logs messages with Notice, Warning, and Error severity levels. Includes informational messages about significant activity. This is the default.

Informational   Logs messages of all severity levels. This level is very verbose and is intended for debugging.
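The logging levels in the table behave like a threshold: each level logs messages of its own severity and of every more severe one. The shell sketch below encodes that reading with a hypothetical helper, ase_logged, which reports whether a message of a given severity would be logged at a given level. It assumes the ordering Informational < Notice < Warning < Error < Alert and that Alert messages are logged at every level; the table lists Alert explicitly only for the Error level.

```shell
#!/bin/sh
# ase_rank SEVERITY: print a numeric rank, lowest (informational) to
# highest (alert). The ordering is an assumption based on the table above.
ase_rank() {
    case $1 in
    informational) echo 0 ;;
    notice)        echo 1 ;;
    warning)       echo 2 ;;
    error)         echo 3 ;;
    alert)         echo 4 ;;
    *)             echo "unknown severity: $1" >&2; return 1 ;;
    esac
}

# ase_logged LEVEL SEVERITY: succeed if a message of SEVERITY would be
# logged when the ASE logging level is LEVEL (hypothetical helper).
ase_logged() {
    level=$(ase_rank "$1") || return 2
    sev=$(ase_rank "$2") || return 2
    [ "$sev" -ge "$level" ]
}
```

For example, ase_logged notice warning succeeds, because the default Notice level also logs Warning messages, while ase_logged warning notice fails.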

You can display your current logging level by choosing the Display the level of logging item from the Obtaining ASE Status menu, as shown in the following example.

Example 5–4 Displaying Log Level

Obtaining ASE Status

m) Display the status of the members
s) Display the status of a service
l) Display the location of the logger(s)
v) Display the level of logging

x) Exit to the Main Menu ?) Help

Enter your choice [x]: v

Level of ASE Logging:

Notice, warning and error logging

To set the severity level, choose the Set the logging level item from the Managing the ASE menu, as shown in the following example.

Example 5–5 Setting the Log Level

Managing the ASE

a) Add a member
d) Delete a member
m) Display the status of the members
l) Set the logging level
e) Edit the error alert script
t) Test the error alert script

res) Reset the ASE daemons

x) Exit to the Main Menu ?) Help

Enter your choice [x]: l

Enter the logging level for the ASE:

i) Informational (log everything)
n) Notice, warning and error logging
w) Warning and error logging
e) Error logging only

x) Exit to Managing the ASE

Enter your choice [n]: n

Using an Alert Script

When an error of severity level alert occurs, the TruCluster Software uses a special script to determine additional actions to perform. The default Alert script sends mail to root. You can edit the script to specify other users to receive mail, or to specify some other action to take.

To edit the Alert script, choose the Edit the error Alert script item from the Managing the ASE menu. The asemgr utility puts you into the vi editor, or the editor defined by the EDITOR environment variable, so that you can edit the script as needed. The following example shows how to edit the Alert script.

Example 5–6 Editing the Alert Script

Managing the ASE

a) Add a member
d) Delete a member
m) Display the status of the members
l) Set the logging level
e) Edit the error alert script
t) Test the error alert script

res) Reset the ASE daemons

x) Exit to the Main Menu ?) Help

Enter your choice [x]: e

#! /bin/sh
#
# Script to log critical ASE errors
#
# Define ADMIN on next line to get mail
ADMIN="root,tom"

PATH=/sbin:/usr/sbin:/usr/bin
export PATH
ERR_FILE=/var/ase/tmp/alertMsg
TIME=`date +"%D %T"`
if [ -n "${ADMIN}" ]; then
    if [ ! -f "${ERR_FILE}" ]; then
        echo "Critical ASE error detected \
on `date`" > ${ERR_FILE}
    fi
    mailx -s "Critical ASE error detected." \
        ${ADMIN} < ${ERR_FILE}
fi
rm -f ${ERR_FILE}

:wq

After editing the Alert script, you should test it. To test the Alert script, choose the Test the error Alert script item from the Managing the ASE menu. A test message is sent to the Logger daemon and the Alert script is invoked, as shown in the following example.

Example 5–7 Testing the Alert Script

Managing the ASE

a) Add a member
d) Delete a member
m) Display the status of the members
l) Set the logging level
e) Edit the error alert script
t) Test the error alert script

res) Reset the ASE daemons

x) Exit to the Main Menu ?) Help

Enter your choice [x]: t

Enter ’y’ to send a test alert message to logger (y/n):y

--- Test alert message sent to logger

Examining Log Messages

The system event logs are ASCII text files that are placed under the /var/adm/syslog.dated directory. The files can be displayed with commands such as cat, more, and tail.

The following example shows the messages logged to daemon.log on system tinker when an NFS service running on system tinker is relocated to system tailor.

Example 5–8 daemon.log Entries

Sep 10 09:24:07 tinker ASE: tinker Agent Notice: stopping service nfsusers
Sep 10 09:24:12 tinker ASE: tailor Director Notice: stopped nfsusers on tinker
Sep 10 09:24:12 tinker ASE: tailor Agent Notice: starting service nfsusers
Sep 10 09:24:22 tinker ASE: tailor Director Notice: started nfsusers on tailor
Sep 10 09:24:22 tinker ASE: tailor AseMgr Notice: Relocated service nfsusers to tailor

TruCluster Software messages include the following information:

• Time stamp

• Local system name

• ASE identifier (not used in messages from the Availability Manager driver)

• System that generated the message (or local)

• Source of the message:

AseMgr asemgr utility

Director Director daemon

Agent Agent daemon

HSM HSM daemon

AseLogger Logger daemon

AM Availability manager driver

vmunix Availability manager driver

AseUtility Command executed by an action script

• Severity of the message

• Message text

The AseUtility source indicates that the message was produced by a command or daemon not directly related to the TruCluster Software, that is, by some other software that the TruCluster Software is using. For example, the following message was produced by the LSM software and captured by a TruCluster Software action script:

AseUtility Error: voldisk: Volume daemon is not accessible

You must examine the messages in the logs to determine their severity and source. Alert messages and their meanings are discussed further in the chapter on troubleshooting. For more information on alert messages, see TruCluster Available Server Software Available Server Environment Administration, Appendix B.
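When a log contains many entries, it can help to split each line into the fields listed above and then filter by source or severity. The following sketch does this with awk. The function name ase_fields and its pipe-separated output are illustrative choices, not part of the TruCluster Software. The parsing assumes the layout shown in Example 5–8 (timestamp, local system, the ASE: identifier, originating system, source, severity, text), so lines without the ASE identifier, such as Availability Manager driver messages, are skipped.

```shell
#!/bin/sh
# ase_fields [FILE ...]
# Hypothetical helper: print TruCluster log entries as
#   timestamp|local system|originating system|source|severity|text
# Assumes the Example 5-8 layout:
#   Mon DD HH:MM:SS local ASE: origin source severity: text
# Lines without the "ASE:" identifier are skipped. Reads stdin if no FILE.
ase_fields() {
    awk '$5 == "ASE:" {
        sev = $8
        sub(/:$/, "", sev)                 # strip trailing colon from severity
        text = ""
        for (i = 9; i <= NF; i++)          # rejoin the message text
            text = text (i > 9 ? " " : "") $i
        printf "%s %s %s|%s|%s|%s|%s|%s\n", $1, $2, $3, $4, $6, $7, sev, text
    }' "$@"
}
```

For example, piping the output of ase_fields through awk -F'|' '$4 == "Director"' lists only the messages from the Director daemon, and '$5 == "Error"' lists only Error messages.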

Summary

Introducing the asemgr Utility

You use the asemgr utility to set up and administer the ASE. Tasks you can perform with the asemgr include the following:

• Adding and deleting member systems

• Adding and deleting network interfaces

• Creating and locating ASE services

• Displaying the status of member systems and ASE services

• Specifying logger locations and message levels

Setting Up and Managing Members

After you have installed the TruCluster Software and rebooted the members, you can use asemgr to add all the ASE member systems at the same time and from the same system. The configuration database created by asemgr (/usr/var/ase/config/asecdb) is copied to each member system.

The asemgr Managing the ASE menu contains options that allow you to add and delete ASE members, modify the network configuration, and display the status of the ASE members.

Using TruCluster Software Event Logging

The Logger daemon (aselogger) tracks the TruCluster Software messages generated by all the member systems. You can start the Logger daemon during TruCluster Software installation or later. The Logger daemon uses the Digital UNIX event logging facility, syslog. You can specify special actions to take when an error of level alert occurs, including sending a mail message to specified users.

Exercises

Introducing the asemgr Utility: Exercise

If you run more than one instance of the asemgr utility on different ASE member systems, what may happen?

Introducing the asemgr Utility: Solution

Some ASE administrative tasks can lock the ASE. If you try to run the asemgr utility and the ASE is locked, the following message is displayed:

ASE is locked by ’hostname’

This message indicates that the task cannot be performed because another member system is running the asemgr utility.

Using asemgr to Manage Members: Exercise

After installing and rebooting all member systems, run the asemgr utility on one member to do the following:

1. Add all member system names

2. Display member status

Using asemgr to Manage Members: Solution

1. Sample solution. The first time you run asemgr, you are prompted for member names instead of seeing the menu.

Enter a comma separated list of all the host names you want as ASE servers.

Enter Members: alpha, omega

Member List: alpha, omega

Is this correct (y/n) [y]: y

Would you like to define any other network interfaces to alpha for ASE use (y/n)? [n]: n

Would you like to define any other network interfaces to omega for ASE use (y/n)? [n]: n

ASE Network Configuration

Member Name   Interface Name   Member Net   Monitor
___________   ______________   __________   _______
alpha         omega            Primary      Yes
omega         alpha            Primary      Yes

Is this configuration correct (y|n)? [y]: y

2. Sample solution.

Available Server Environment (ASE)

ASE Main Menu

a) Managing the ASE -->
m) Managing ASE Services -->
s) Obtaining ASE Status -->

x) Exit ?) Help

Enter your choice: a

Managing the ASE

a) Add a member
d) Delete a member
n) Modify the network configuration
m) Display the status of the members
l) Set the logging level
e) Edit the error alert script
t) Test the error alert script

x) Exit to the Main Menu ?) Help

Enter your choice [x]: m

Member Status

Member:     Host Status:   Agent Status:
tinker      UP             RUNNING
tailor      UP             RUNNING

Using TruCluster Software Event Logging: Exercise

1. If you did not start the Logger daemon during TruCluster Software installation, do so now with the following commands:

# rcmgr set ASELOGGER 1
# /sbin/init.d/asemember restart

2. Identify where system event messages are logged on your system.

3. Identify a member system running the Logger daemon.

4. Display the current TruCluster Software logging level and change it to Informational level.

5. Add your user name to the error Alert script to receive mail and then test the script. (Create a user account if you do not already have one.)

6. Shut down a member system to cause an alert.

7. Examine the daemon.log file and find the alert messages you caused in the previous steps. Also verify that root and your user name received mail about the alert.

Using TruCluster Software Event Logging: Solution

1. Sample solution.

# ps ax | grep aselogger
262 ?? I <     0:00.19 /usr/sbin/aselogger

In this case, the logger is running.

2. By default, asemgr utility and ASE daemon messages are logged to the /var/adm/syslog.dated/date/daemon.log file and AM driver messages are logged to the kern.log file. Check the /etc/syslog.conf file for kern and daemon message destinations.

3. Choose the Obtaining ASE Status item from the asemgr main menu, then choose the Display the location of the logger(s) item.

4. Choose the Display the level of logging item from the Obtaining ASE Status menu to display the current level. Choose the Set the logging level item from the Managing the ASE menu to set the severity level.

5. Choose the Edit the error Alert script item from the Managing the ASE menu. Add your user name to the comma-separated list on the ADMIN= line. Choose the Test the error Alert script item from the Managing the ASE menu. You should receive a mail message similar to the following.

tinker AseMgr ***ALERT: Test of alert script

6. Turn the power off at another member system.

7. Sample solution.

Jan 18 14:11:48 tinker ASE: tailor AseMgr ***ALERT: Member tailor is not available

6 Writing and Debugging Action Scripts

About This Chapter

Introduction

Action scripts control starting and stopping available services. Application services require scripts to start and stop the application; disk and NFS services may not need any scripts.

This chapter discusses the types of action scripts that TruCluster Available Server uses and the guidelines and conventions for creating and debugging them. It shows how to use the asemgr utility to add scripts to a service.

Objectives

To write and debug action scripts, you should be able to:

• Describe the types of action scripts and the action script conventions

• Create action scripts

• Test and debug action scripts

Resources

For more information on the topics in this chapter, see the following:

• TruCluster Available Server Software Available Server Environment Administration

Introducing Action Scripts

Overview

Action scripts contain the operations to set up, start, and stop a service in an Available Server Environment (ASE), so that it can fail over from one member to another.

TruCluster Available Server defines several types of action scripts. Some available services do not require any scripts; others require only two, a start script and a stop script. There are a number of conventions to follow when writing scripts.

Types of Action Scripts

An action script specifies a series of operations telling the systemwhat to do to manage an available service.

There are five types of action scripts:

• Add action script contains all the commands to configure the service on a system; for example, set up a system parameter, create a device special file, or edit a file. You may not need an add script for your service. This script is executed on all member systems when you configure the service and when a member system reboots.

• Delete action script contains the commands to reverse the service setup, undoing the add script; for example, if the add script sets a system parameter, the delete script resets the parameter. This script is executed on all member systems when you delete a service from the ASE.

• Start action script contains the commands to start the service on a member system; for example, invoke the application. This script is executed only on the member selected to provide the service, to start or restart (bring online) the service.

• Stop action script contains the commands to stop the service; for example, stop the application. The script must stop all processes accessing disks used in the service; otherwise, the disks cannot be unmounted and the service cannot be stopped. This script is executed on the member selected to provide the service, to stop (take offline) or relocate the service. It is also executed when a member reboots, in case the server crashed.

• Check action script determines whether a service is running on a member. The default check action script checks for the existence of the file /var/opt/TCR*/ase/tmp/service_name_IS_RUNNING, which is created when the TruCluster Software starts the service on that system. This script is executed when the TruCluster Software Director daemon starts, when you check the status of a service, and when you stop or delete a service.

All members run the add script, while only the member selected to offer the service runs the start script.

All the stop scripts are run on reboot to make sure the services are cleaned up. (TruCluster Available Server does not know whether this system was running a service before the reboot.)

Available Services and Scripts

For NFS services, you usually specify service-specific information in response to asemgr prompts and need no other scripts. TruCluster Available Server includes internal scripts for NFS and disk services that create devices, set up AdvFS file domains and LSM logical volumes, and mount and unmount file systems.

You must create the action scripts for a user-defined (application) service to define the operations that control your application. You need at least start and stop scripts.

If your disk service includes an application, you must create scripts to start and stop the application; the scripts that fail over control of the disks are created from your responses to asemgr prompts.

Script Exit Codes

The following table shows how TruCluster Software interprets script exit codes.

Table 6–1 Script Exit Codes

Script   Exit Code             Meaning

Add      0 (zero)              Success
         non-zero              Failure

Delete   0                     Success
         non-zero              Failure

Start    0                     Success
         non-zero              Failure

Stop     0                     Success
         1                     Failure
         99                    Failure because service or device busy

Check    Between 100 and 200   Service is running
         Less than 100         Service is not running
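Table 6–1 can be read as a small decision procedure. The following sketch expresses it as a shell function; the name ase_exit_meaning and its output words are hypothetical, and it assumes the Check range of 100 to 200 is inclusive (the skeleton check script exits with exactly 100). Codes that the table does not define, such as a Stop exit of 2 or a Check exit above 200, are reported as unspecified.

```shell
#!/bin/sh
# ase_exit_meaning SCRIPT CODE
# Hypothetical helper: print how an exit CODE from an action SCRIPT
# (add, delete, start, stop, check) is interpreted according to Table 6-1.
ase_exit_meaning() {
    script=$1
    code=$2
    case $script in
    add|delete|start)
        if [ "$code" -eq 0 ]; then echo success; else echo failure; fi
        ;;
    stop)
        case $code in
        0)  echo success ;;
        1)  echo failure ;;
        99) echo busy ;;            # failure because service or device busy
        *)  echo unspecified ;;
        esac
        ;;
    check)
        if [ "$code" -lt 100 ]; then
            echo not-running
        elif [ "$code" -le 200 ]; then
            echo running
        else
            echo unspecified        # the table does not cover codes above 200
        fi
        ;;
    *)
        echo "unknown script type: $script" >&2
        return 2
        ;;
    esac
}
```

Note the asymmetry the table encodes: for Stop, only 1 and 99 are defined failures, whereas Add, Delete, and Start treat any non-zero code as a failure.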

Script Output

All standard output and error output from a script goes to the TruCluster Software Logger daemon, if one is running in the environment. The Logger daemon passes messages to the syslog daemon on the same system. If a Logger daemon is not running in the environment, all messages are logged locally.

If a script exits with 0 (zero), its output is logged as an informational message. Otherwise, it is logged as an error.

Skeleton Scripts

The Available Server skeleton scripts can be used as a base for application-specific commands. They are located in the /var/opt/TCR*/ase/lib directory. The file names are:

• addAction

• checkAction

• deleteAction

• startAction

• stopAction

A sample skeleton start action script is shown in Example 6–1, and a sample skeleton check action script is shown in Example 6–2.

Example 6–1 Skeleton Start Action Script

# more /var/ase/lib/startAction
...
#
# A skeleton example of a start action script.
#

PATH=/sbin:/usr/sbin:/usr/bin
export PATH
ASETMPDIR=/var/ase/tmp

if [ $# -gt 0 ]; then
    svcName=$1          # Service name to start
else
    svcName=
fi

#
# Any non zero exit will be considered a failure.
#
exit 0

Example 6–2 Skeleton Check Action Script

# more /var/ase/lib/checkAction
...
#
# A skeleton example of a check action script.
#

PATH=/sbin:/usr/sbin:/usr/bin
export PATH
ASETMPDIR=/var/ase/tmp

if [ $# -gt 0 ]; then
    svcName=$1          # Service name to check
else
    svcName=
fi

#
# For check, exit with 100 to 200 if service is running on this member,
# else exit with < 100 if not running.
#
if [ -f ${ASETMPDIR}/${svcName}_IS_RUNNING ]; then
    exit 100
else
    exit 0
fi
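To exercise the check logic without a running ASE, you can wrap it in a function and let ASETMPDIR be overridden, as in the following sketch. The function name check_service, the environment override, and the guard against an empty service name are additions for testing; the skeleton in Example 6–2 hard-codes /var/ase/tmp and has none of them.

```shell
#!/bin/sh
# Test-friendly variant of the skeleton check action (Example 6-2).
# ASETMPDIR can be overridden from the environment; the skeleton itself
# hard-codes /var/ase/tmp.
ASETMPDIR=${ASETMPDIR:-/var/ase/tmp}

# check_service SERVICE (hypothetical name): return 100 if
# SERVICE_IS_RUNNING exists in ASETMPDIR (service running on this
# member), otherwise return 0 (not running), following Table 6-1.
check_service() {
    svcName=$1
    # Guard added for this sketch: with an empty name, the skeleton would
    # test for a file called "_IS_RUNNING".
    if [ -n "$svcName" ] && [ -f "${ASETMPDIR}/${svcName}_IS_RUNNING" ]; then
        return 100
    fi
    return 0
}
```

In an actual check script, you would finish with check_service "$1" followed by exit $?, so that the TruCluster Software receives the 100 or 0 exit code.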

Start and Stop Scripts

Example 6–3 shows a sample start script for an ASE 1.2 disk service, a disk-based database application. This script is based on the skeleton start action script. Example 6–4 shows the corresponding stop script.

Example 6–3 Start Script

#!/bin/sh                                                    (1) (2)
PATH=/sbin:/usr/sbin:/usr/bin                                (3)
export PATH
ASETMPDIR=/var/ase/tmp

if [ $# -gt 0 ]; then                                        (4)
    svcName=$1
else
    svcName=
fi

echo "ASE: Starting $svcName start script..." | tee /dev/console       (5)
echo "ASE: Setting dbserver internet address ..." | tee /dev/console
#
# This sets up the alias: ifconfig alias dbserver
#
/var/ase/sbin/nfs_ifconfig $svcName start dbserver                     (6)

echo "ASE: Starting $svcName service ..." | tee /dev/console
. /usr/local/dbadm/oracle_version/def_ora_syslog
/bin/su oracle -c $DBSTART | tee /dev/console                          (7)

echo "ASE: Service $svcName is running on `hostname`" | tee /dev/console
#
# Any non zero exit will be considered a failure.
#
exit 0                                                                 (1) (8)

The following example shows the corresponding stop script.

Example 6–4 Stop Script

#!/bin/sh                                                    (1)
PATH=/sbin:/usr/sbin:/usr/bin
export PATH
ASETMPDIR=/var/ase/tmp

if [ $# -gt 0 ]; then
    svcName=$1
else
    svcName=
fi

echo "ASE: Starting $svcName stop script" | tee /dev/console           (5)
echo "ASE: Killing polyserver processes ..." | tee /dev/console
/usr/local/bin/psvkill | tee /dev/console
echo "ASE: Stopping the database service ..." | tee /dev/console
. /usr/local/dbadm/oracle_version/def_ora_syslog
/bin/su oracle -c $DBSHUT | tee /dev/console

echo "ASE: Deleting the dbserver internet address ..." | tee /dev/console
/var/ase/sbin/nfs_ifconfig $svcName stop dbserver
#
# exit 0 = success
# exit 1 = failure
# exit 99 = failure because busy
#
exit 0

(1) This is part of the skeleton provided by TruCluster Available Server.

(2) You can use any shell. It is good programming practice to identify the shell in the first line; otherwise, root's login shell executes the script.

(3) Define the path needed for your commands.

(4) If arguments were passed to the script, the first argument should be the service name.

(5) This is a user-defined action.

(6) TruCluster Available Server provides the nfs_ifconfig script to alias the service name/address to the member system.

(7) Start the application.

(8) It would be more accurate to check the result of the user-defined action and return that result, rather than always returning success.
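Callout 8 suggests returning the result of the user-defined action instead of always exiting with 0. One way to do that is sketched below. START_CMD is a hypothetical variable standing in for the application start command (for example, the su command in Example 6–3), and start_service is a hypothetical function name. Note that the examples pipe commands through tee /dev/console; in the Bourne shell the exit status of a pipeline is that of its last command, so piping the application through tee hides its failure. The sketch therefore runs the command on its own.

```shell
#!/bin/sh
# start_service SERVICE
# Sketch of a start action whose result reflects the application's own
# exit status (callout 8). START_CMD is a hypothetical variable holding
# the application start command; it is not part of TruCluster.
start_service() {
    svcName=$1
    if [ -z "$START_CMD" ]; then
        echo "ASE: no start command defined for $svcName" >&2
        return 1
    fi
    echo "ASE: Starting $svcName service ..."
    # Run the command by itself, not as "cmd | tee /dev/console": the exit
    # status of a pipeline is that of its last command (tee), which would
    # hide a failure of the application.
    status=0
    $START_CMD || status=$?
    if [ "$status" -ne 0 ]; then
        echo "ASE: $svcName failed to start (exit status $status)" >&2
        return 1
    fi
    echo "ASE: Service $svcName is running on `uname -n`"
    return 0
}
```

A start action script built this way would end with start_service "$1" followed by exit $?. If you still want the application output on the console, redirect it instead of piping it (for example, $START_CMD > /dev/console 2>&1), which keeps the command's own exit status.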

Creating Action Scripts

Overview

To fail over an application, you must create a start action script and a stop action script. If you need to set up the system environment to run the service, you must also create add and delete action scripts. Create a check action script so that TruCluster Available Server can determine whether the service is running.

When you add or modify a service, you can use the asemgr utility to specify your action scripts. You can create a script and specify its pathname to asemgr, or you can edit a skeleton script through asemgr.

You should test your action scripts outside the Available Server Environment (ASE) to ensure they work correctly before making them part of a service. To debug your script in TruCluster Available Server, set the event logging level to ensure that all significant messages are logged, start and stop the service, and then check the system event logs.

The asemgr utility allows you to specify the arguments passed to the script and a timeout value representing the maximum length of time the script needs to run. If the script takes longer than the timeout value, Available Server considers the script to have failed.

Methods to Create Scripts

Use the asemgr utility to specify any user-defined action scripts when adding or modifying a service.

There are three ways to create and specify a user-defined action script for a service:

• Create the script outside TruCluster Available Server by copying a skeleton script and modifying it. Test the script thoroughly before running asemgr. Specify the pathname when asemgr prompts for the script name. Your script is then copied into the ASE database. Any further changes to the script must be made using the asemgr utility, because Available Server uses only the database copy of the script.

• When asemgr prompts you for a script name, specify default. You can then edit the skeleton action script provided by TruCluster Available Server, adding the commands the system needs to perform.

• Create the action script outside Available Server by copying the skeleton (default) script and modifying it. When the script is complete, copy it to all member nodes. Test the script before running asemgr. Specify default when asemgr prompts for the script name, then edit the default script so that it calls the pathname of the script you created. This method lets you change your script without going through the asemgr utility, and therefore without interrupting the service. However, you must redistribute copies of the script to all member nodes after making changes.

Specifying Your Own Script

The following example shows how to specify an action script at the asemgr prompt. The script must already exist at the specified pathname. Verify that the script operates as expected before you run asemgr.

Example 6–5 Specifying Your Own Action Script

Service Configuration

a) Add a new service
m) Modify a service

.

.

Modifying user-defined scripts for ‘service1‘:

1) Start action..

Modifying the start action script for ‘service1‘:

f) Replace the start action script
e) Edit the start action script
g) Modify the start action script arguments [service1]
t) Modify the start action script timeout [60]
r) Remove the start action script
x) Exit - done with changes

Enter your choice [x]: f

Enter the full pathname of your start action script or "default" for the default script (x to exit): /usr/sbin/dbase_account

Editing the Default Script

The following example shows how to edit the default action script. When you choose the "Edit the start action script" menu option (e), asemgr loads the appropriate skeleton action script and places you in the vi editor, or the editor defined by the EDITOR environment variable. You can then make the appropriate changes to the file.

Example 6–6 Editing the Default Action Script

Enter the full pathname of your start action script or "default"
for the default script (x to exit): default

e) Edit the start action script..


#!/bin/sh
#
PATH=/sbin:/usr/sbin:/usr/bin
export PATH
ASETMPDIR=/var/ase/tmp

if [ $# -gt 0 ]; then
    svcName=$1          # Service name to start
else
    svcName=
fi

# Start prophecy_1 application:
su - prophecy -c dbstart

#
# Any non zero exit will be considered a failure.
#
exit 0
:wq

Pointing to an External Script

The following example shows how to edit the default action script to point to an external script. Be sure to copy the external script to all member systems.

Example 6–7 Pointing to an External Script

Enter the full pathname of your start action script or "default"
for the default script (x to exit): default

e) Edit the start action script..

#!/bin/sh
#
PATH=/sbin:/usr/sbin:/usr/bin
export PATH
ASETMPDIR=/var/ase/tmp

if [ $# -gt 0 ]; then
    svcName=$1          # Service name to start
else
    svcName=
fi

# Start application with script
/usr/local/adm/start.ts

#
# Any non zero exit will be considered a failure.
#
exit 0
:wq
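The skeleton in Example 6–7 ends with exit 0, so ASE sees success even if the external script fails. The following is a minimal sketch of one alternative, assuming you want ASE to see the external script's own exit status. The helper name run_external is hypothetical, and passing the service name to the external script is an assumption about how that script is written.

```shell
#!/bin/sh
# Sketch: call an external action script from the skeleton and hand its
# exit status back to ASE. run_external is a hypothetical helper name.

# usage: run_external PATHNAME [ARGS...]
run_external() {
    ext=$1
    shift
    # The external script must be present on every member system;
    # fail clearly if it was not copied to this one.
    if [ ! -x "$ext" ]; then
        echo "$0: $ext is missing or not executable on `uname -n`"
        return 1
    fi
    # Run it with the remaining arguments (for example, the service
    # name); its exit status becomes the function's return value.
    "$ext" "$@"
}
```

In the skeleton, the last lines would then become run_external /usr/local/adm/start.ts "$svcName" followed by exit $?. The helper's message goes to standard output, so it ends up in the log along with any other script output.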


Additional Script Information

In addition to script pathnames, the asemgr utility allows you to specify:

• Arguments passed to the script. These are useful if you have a generic script that needs the service name or an action (for example, start or stop) passed to it in order to work.

• Timeout value, the maximum number of seconds Available Server waits for the script to complete. If the script runs longer than the timeout value (for example, because it has hung), Available Server considers the script to have failed and reports the failure as a timeout of the script.


Testing and Debugging Action Scripts

Overview

You should test your action scripts outside the TruCluster Available Server service to ensure they work correctly. To debug your script in Available Server, set the event logging level to ensure all significant messages are logged, start and stop the service, then check the system event logs.

Test First

Test your action scripts before you include them in the service. A bug in a script can cause the service to hang, so that it neither starts nor stops cleanly. Restarting a service that did not stop cleanly can cause data corruption or panic the system.

Your scripts should run without error. Some general debugging tips include:

• Add echo commands to display variables or messages indicating activity.

• Use the -n option (Bourne and C shells) to read commands and check them for syntax errors without executing them.

• Use the -x option (Bourne, Korn and C shells) to print commands and their arguments as they are executed.
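The -n and -x tips can be combined into a small harness for trying a script outside ASE. The following is a minimal sketch, assuming a Bourne-compatible sh: it checks syntax with -n, runs the script with -x, and treats the script as failed if it is still running after a given number of seconds. That timeout handling only approximates the asemgr timeout; it is not ASE's own logic. The function name test_action_script and the marker file under /tmp are hypothetical.

```shell
#!/bin/sh
# Sketch: try an action script outside ASE before adding it to a service.
# test_action_script and the marker file name are hypothetical.
#   usage: test_action_script SCRIPT TIMEOUT [ARGS...]
# Returns the script's exit status, 2 for a syntax error, or 3 if the
# script is still running after TIMEOUT seconds.

test_action_script() {
    script=$1
    limit=$2
    shift
    shift

    # -n: read the commands and check syntax without executing them.
    if sh -n "$script"; then
        :
    else
        echo "$script: syntax error"
        return 2
    fi

    # -x: print each command and its arguments as it is executed.
    marker=/tmp/ase_test.$$
    rm -f "$marker"
    sh -x "$script" "$@" &
    pid=$!

    # Watchdog: if the script is still running after $limit seconds,
    # leave a marker file and stop the script.
    ( sleep "$limit"; touch "$marker"; kill $pid ) >/dev/null 2>&1 &
    watchdog=$!

    wait $pid
    status=$?
    kill $watchdog 2>/dev/null

    if [ -f "$marker" ]; then
        rm -f "$marker"
        echo "$script: still running after $limit seconds (timeout)"
        return 3
    fi
    echo "$script: exit status $status"
    return $status
}
```

For example, test_action_script /usr/local/adm/calc-start 60 calc would run the calc-start script from the exercises with the service name calc and a 60-second limit. A separate watchdog process is used instead of polling so that the harness does not depend on how the shell reaps finished background jobs.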

Debugging Scripts in ASE

You can set the Available Server Environment (ASE) logging level to informational to log all messages, including messages reporting that a script ran successfully. Once you add the scripts, examine the syslog daemon log for any problems. Any debugging echo commands you added to the script show up in the log.
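Because script output ends up in the log, debugging echo commands are easier to use if they identify the service and can be switched on and off. The following is a minimal sketch under those assumptions; msg, debug, and the ASE_DEBUG switch are hypothetical names, and svcName is assumed to be set as in the skeleton scripts.

```shell
#!/bin/sh
# Sketch: helpers that tag each message with the service name so the
# messages are easy to find in the log. msg, debug, and the ASE_DEBUG
# switch are hypothetical names; svcName is assumed to be set as in the
# skeleton scripts.

# msg: always written to standard output (and so to the log).
msg() {
    echo "${svcName:-unknown}: $*"
}

# debug: written only when ASE_DEBUG is set to 1.
debug() {
    if [ "${ASE_DEBUG:-0}" = 1 ]; then
        msg "DEBUG: $*"
    fi
}
```

With this approach, a line such as debug "PID file is /var/run/${svcName}.pid" can stay in the script and appears in the log only after you set ASE_DEBUG=1 near the top of the script.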


Summary

Introducing Action Scripts

Action scripts contain the operations to set up, start, and stop a service in an Available Server Environment (ASE), so that it can fail over from one member to another. TruCluster Available Server supports the following types of action scripts:

• Add (configure) service

• Delete service

• Start service

• Stop service

• Check if a service is running

You need at least start and stop action scripts for application services.

Creating Action Scripts

When you add or modify a service, you can use the asemgr utility to:

• Specify your action scripts

• Create a script and specify the pathname to asemgr

• Edit a skeleton script through asemgr

  – Add your commands to the skeleton script

  – Add a pointer to an external script

The asemgr utility allows you to specify arguments passed to the script and a timeout value, the maximum number of seconds the script is allowed to run.

Testing and Debugging Action Scripts

You should test your action scripts outside the TruCluster Available Server service to ensure they work correctly. To debug your script, set the event logging level to ensure all significant messages are logged, start and stop the service, then check the system event logs.


Exercises

Introducing Action Scripts: Exercise

1. Describe the five types of action scripts.

2. For each type of TruCluster Available Server service, which scripts must you define?

3. Where does script output go? How is it treated when the script succeeds, and when it fails?

4. TruCluster Available Server provides skeleton scripts in the /var/opt/TCR*/ase/lib directory. Examine these scripts and determine what each does.

Introducing Action Scripts: Solution

1. There are five types of action scripts:

• Add action script contains all the commands to set up the system environment for the service to run.

• Delete action script contains all the commands to reverse the actions in the add script.

• Start action script contains all the commands to start a service on a member system.

• Stop action script reverses the actions in the start script; for example, it stops the application.

• Check action script enables TruCluster Available Server to determine if a service is running.

2. You do not generally need scripts for an NFS service. If a disk service includes an application, you must create scripts to start and stop the application. You need at least start and stop scripts for a user-defined (application) service.

3. Standard output and error output from a script go to aselogger (and then to syslog). Script success is logged as an informational message; script failure is logged as an error.

4. The skeleton action scripts addAction, deleteAction, startAction, and stopAction are similar. Here is the addAction script:


#
# A skeleton example of an add action script.
#
PATH=/sbin:/usr/sbin:/usr/bin
export PATH
ASETMPDIR=/var/ase/tmp

if [ $# -gt 0 ]; then
    svcName=$1          # Service name to add
else
    svcName=
fi
#
# Any non zero exit will be considered a failure.
#
exit 0

They define a command path and a temporary directory, determine the service name from the first argument, and exit with a success code.

The stopAction script runs when the service is stopped, and also when TruCluster Available Server is initializing on a member as the member boots. Here is the stopAction script:

#
# A skeleton example of a stop action script.
#
PATH=/sbin:/usr/sbin:/usr/bin
export PATH
ASETMPDIR=/var/ase/tmp

if [ $# -gt 0 ]; then
    svcName=$1          # Service name to stop
else
    svcName=
fi

case "${MEMBER_STATE}" in
BOOTING)
    # Stopping ${svcName} as ASE member boots.
    ;;
RUNNING)
    # This is a true stop of ${svcName}.
    ;;
esac

#
# exit 0  = success - service stopped successfully
# exit 1  = failure - could not stop service
# exit 99 = failure - could not stop service (service busy)
#
exit 0
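One way to fill in the two MEMBER_STATE cases is shown below. This is a minimal sketch, assuming the application's start script saved its process ID in a PID file named after the service, as the exercises later suggest. The function name stop_service and the PIDDIR setting are hypothetical, and treating the BOOTING case as cleanup only is an assumption about the application.

```shell
#!/bin/sh
# Sketch: one way to fill in the skeleton stop script's MEMBER_STATE
# cases for an application whose start script saved its PID in
# $PIDDIR/<service>.pid. stop_service and PIDDIR are hypothetical;
# /var/run is the conventional PID file directory.
PIDDIR=${PIDDIR:-/var/run}

stop_service() {
    svcName=$1
    pidfile=$PIDDIR/$svcName.pid
    case "${MEMBER_STATE}" in
    BOOTING)
        # Assumption: while the member boots, this member is not yet
        # running the application, so only discard a stale PID file.
        rm -f "$pidfile"
        ;;
    *)
        # RUNNING: a true stop. Any other value (for example, an unset
        # MEMBER_STATE when testing by hand) is handled the same way.
        if [ -f "$pidfile" ]; then
            pid=`cat "$pidfile"`
            # Signal the process only if it still exists.
            if kill -0 "$pid" 2>/dev/null; then
                kill "$pid" || return 1
            fi
            rm -f "$pidfile"
        fi
        ;;
    esac
    return 0
}
```

The skeleton would then end with stop_service "$svcName" followed by exit $?, so a failed kill is reported as exit 1 (could not stop service).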

Creating Action Scripts: Exercise

1. Write start and stop scripts for the calculator application. Place the scripts in the /usr/local/adm directory.

a. Create a start script (calc-start) to display the calculator on a client workstation. Use the command:

/usr/bin/X11/dxcalc -d client &


(Suggestion: you can use the skeleton scripts in /var/opt/TCR*/ase/lib.)

b. Create a stop script (calc-stop) to stop the calculator. Use the command:

kill `ps -e | grep "dxcalc" | grep -v grep | awk '{print $1}'`

c. What are the disadvantages of this kill command?

d. An alternative is for the start script to write the process ID (PID) of the application to a file (by convention, in the /var/run directory). Then the stop script can kill the right process. Update your scripts. (Hint: use the service name for the file name.)

2. Write a single action script (calc-start-stop) that starts and stops the calculator application, depending on the argument passed in.

3. There are problems with the following script. Can you identify them?

#
# calc start action script.4
#
PATH=/sbin:/usr/sbin:/usr/bin
export PATH

if [ $# -gt 0 ]; then
    svcName=$1
else
    svcName=
fi

dxcalc -d tinker &

exit 0

4. The following start script fails because of a timeout error. Can you identify the reason?

#!/bin/sh
# calc start action script.5
#
PATH=/sbin:/usr/sbin:/usr/bin
export PATH
ASETMPDIR=/var/ase/tmp

if [ $# -gt 0 ]; then
    svcName=$1
else
    svcName=
fi

/usr/bin/X11/dxcalc -d tinker

exit 0


Creating Action Scripts: Solution

1. Write start and stop scripts for the calculator application.

a. Sample solution

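#!/bin/sh
# calc start action script.
#
PATH=/sbin:/usr/sbin:/usr/bin
export PATH
ASETMPDIR=/var/ase/tmp

if [ $# -gt 0 ]; then
    svcName=$1
else
    svcName=
fi

/usr/bin/X11/dxcalc -d tinker &

# Note: $? after a command started with & is always 0, so this exit
# reports only that dxcalc was launched, not that it started correctly.
exit $?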

b. Sample solution

#!/bin/sh
# calc stop action script.
#
PATH=/sbin:/usr/sbin:/usr/bin
export PATH
ASETMPDIR=/var/ase/tmp

if [ $# -gt 0 ]; then
    svcName=$1
else
    svcName=
fi

kill `ps -e | grep "dxcalc" | grep -v grep | awk '{print $1}'`

exit $?

c. The disadvantage of this kill command is that if several dxcalc processes are running, it kills all of them, including processes the service did not start, so it may stop the wrong one.

d. Sample start script

#!/bin/sh
# calc start action script.2
#
PATH=/sbin:/usr/sbin:/usr/bin
export PATH
ASETMPDIR=/var/ase/tmp

if [ $# -gt 0 ]; then
    svcName=$1
else
    svcName=
fi

/usr/bin/X11/dxcalc -d tinker &
pid=$!
echo $pid > /var/run/${svcName}.pid

exit $?


Sample stop script

#!/bin/sh
# calc stop action script.2
#
PATH=/sbin:/usr/sbin:/usr/bin
export PATH
ASETMPDIR=/var/ase/tmp

if [ $# -gt 0 ]; then
    svcName=$1
else
    svcName=
fi

kill `cat /var/run/${svcName}.pid`

exit $?

2. Sample solution

#!/bin/sh
# calc action script.3
#
PATH=/sbin:/usr/sbin:/usr/bin
export PATH
ASETMPDIR=/var/ase/tmp

if [ $# -gt 0 ]; then
    svcName=$1          # Service name
else
    echo "$0: Insufficient arguments"
    echo "usage: $0 service start|stop"
    exit 1
fi

if [ $# -gt 1 ]; then
    action=$2
else
    echo "$0: Insufficient arguments"
    echo "usage: $0 $1 start|stop"
    exit 1
fi

case "${action}" in
'start')
    /usr/bin/X11/dxcalc -d rdwngs:0 &
    pid=$!
    echo "Service name is " $svcName | tee /dev/console
    echo "$svcName PID is " $pid | tee /dev/console
    echo $pid > /var/run/${svcName}.pid
    exit $?
    ;;
'stop')
    echo "Stopping " $svcName "service" | tee /dev/console
    kill `cat /var/run/${svcName}.pid`
    exit $?
    ;;
*)
    # Any other action is an error, not a silent success.
    echo "$0: unknown action '${action}'"
    echo "usage: $0 $svcName start|stop"
    exit 1
    ;;
esac


3. The shell is not defined; if the default shell of TruCluster Available Server (root) is not sh, the script may not run correctly. The application path is not defined: dxcalc is invoked without a full pathname, and its directory (/usr/bin/X11) is not in the PATH the script sets. Even if the dxcalc command fails, the script returns zero (success).

4. The application is invoked in the foreground rather than the background, so it does not return: the script cannot exit until dxcalc exits, and the script times out.
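In a Bourne shell, $? after a command started with & is always 0, so exit $? in a start script reports only that the command was launched. The following is a minimal sketch of a start action that reports the failures it can detect, assuming the PID-file convention from exercise 1d. The function name start_service and the APP and PIDDIR settings are hypothetical.

```shell
#!/bin/sh
# Sketch: a start action that reports the failures it can detect. $? after
# a command started with & is always 0, so the sketch checks the program
# before starting it and checks that the PID file was written.
# start_service, APP, and PIDDIR are hypothetical names.
APP=${APP:-/usr/bin/X11/dxcalc}
PIDDIR=${PIDDIR:-/var/run}

# usage: start_service SERVICE [APP-ARGS...]
start_service() {
    svcName=$1
    shift
    # Catch a missing or misnamed program before starting anything.
    if [ ! -x "$APP" ]; then
        echo "$svcName: cannot execute $APP"
        return 1
    fi
    "$APP" "$@" &
    pid=$!
    # Without a PID file the stop script cannot find the process, so
    # treat a failure to write it as a failed start.
    if echo $pid > "$PIDDIR/$svcName.pid"; then
        return 0
    fi
    kill $pid
    return 1
}
```

A calc start script could end with start_service "$svcName" -d tinker followed by exit $?. The sketch still cannot tell whether dxcalc fails after it has started (for example, if the display cannot be opened); it only removes the failures that are detectable up front.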

Testing and Debugging Action Scripts: Exercise

1. The following start script fails. Can you identify the problem?

#!/bin/sh
PATH=/sbin:/usr/sbin:/usr/bin
export PATH
ASETMPDIR=/var/ase/tmp

if [ $# -gt 0 ]; then
    svcName=$1
else
    svcName=
fi

if [ -f /usr/local/appstart ]; then
    su - dbmaster -c /usr/local/appstart

exit 0

2. The following stop script stops the application and unmounts the disk. Can you identify any potential problems?

#!/bin/sh
PATH=/sbin:/usr/sbin:/usr/bin
export PATH

/usr/local/application stop
/sbin/umount /appdata
exit 0

Testing and Debugging Action Scripts: Solution

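1. Syntax error: 'if' unmatched (the if statement has no closing fi). Running sh -n on the script reports this error without executing the script.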

2. There is no error checking. If a file is open, the disk fails to unmount, but the stop script still succeeds. The start script on another member then mounts and reserves the disk, which can then be seen as mounted on both systems. (However, only the system with the SCSI reserve can perform I/O to the disk.)
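The following is a minimal sketch of the same stop script with error checking, so that a failure to stop the application or to unmount the disk is reported to ASE instead of being hidden behind exit 0. The function name stop_and_unmount and the APP_STOP, MOUNTPT, and UMOUNT settings are hypothetical; their defaults are the commands from the exercise.

```shell
#!/bin/sh
# Sketch: the exercise's stop script with error checking, so a failure is
# reported to ASE instead of hidden behind exit 0. stop_and_unmount,
# APP_STOP, MOUNTPT, and UMOUNT are hypothetical names; the defaults are
# the commands from the exercise.
APP_STOP=${APP_STOP:-"/usr/local/application stop"}
MOUNTPT=${MOUNTPT:-/appdata}
UMOUNT=${UMOUNT:-/sbin/umount}

stop_and_unmount() {
    # $APP_STOP is left unquoted so that it splits into a command and
    # its argument.
    if $APP_STOP; then
        :
    else
        echo "application did not stop cleanly"
        return 1
    fi
    # umount fails if any process still has a file open on the file
    # system; pass that failure back to ASE.
    if $UMOUNT "$MOUNTPT"; then
        :
    else
        echo "cannot unmount $MOUNTPT (file system busy?)"
        return 1
    fi
    return 0
}
```

The script would then end with stop_and_unmount followed by exit $?, which matches the skeleton's convention that exit 1 means the service could not be stopped.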


7 Setting Up ASE Services

Setting Up ASE Services 7–1


About This Chapter

Introduction

To make an application highly available, set up an Available Server Environment (ASE) service for that application. ASE selects a member system to provide the service to client systems. Clients refer to the service name rather than the server name. If the server fails, ASE relocates the service to another member system.

This chapter discusses the types of services that ASE supports and how to set them up. It also examines the action scripts that control a service and describes how to use the asemgr utility to manage services.

Objectives

To set up and manage ASE services, you should be able to:

• Describe the services ASE supports

• Describe the service control structure: the automatic service placement policy

• Set up an NFS service

• Set up a disk service

• Set up a user-defined service

• Manage services in the Available Server environment

Resources

For more information on the topics in this chapter, see the following:

• TruCluster Available Server Software Available Server Environment Administration


Understanding Highly Available Services

Overview

To make an application highly available, you set up an Available Server Environment (ASE) service for that application.

Each service is assigned a unique name.

Introducing Supported Services

ASE supports three types of services:

• A Network File System (NFS) service provides highly available access to exported disk data. When you create an NFS service, specify the UNIX file systems, AdvFS filesets, or LSM logical volumes to export.

• A disk service provides highly available access to disks or a disk-based application, such as a database program. A disk service is similar to an NFS service except that no data is exported. When you create a disk service, specify the UNIX file systems, AdvFS filesets, or LSM logical volumes and any application that you want to make available.

• A user-defined service provides highly available access to an application that is not disk based; for example, a login service. When you create a user-defined service, specify the application.

Describing Clients and Services

Clients refer to service names rather than server names. The following figure shows the client view of the ASE environment.


Figure 7–1 Client View of ASE

[Figure: three clients access services by service name on two member systems. One member runs nfs_service (NFS programs) and dbase_service (database application); the other runs mail_service and login_service (other labels shown: NFS programs, ifconfig, sendmail).]

To access the NFS service nfs_service, a client has a line such as the following in its /etc/fstab file:

/project@nfs_service /usr/project nfs rw,bg 0 0

It also has an entry in its /etc/hosts file that gives nfs_service an Internet address. This is not the address associated with the host name of any system in the network; it is a floating address aliased to the member system currently running the service.

If the service is relocated to another member system, the new member system responds to that Internet address. Clients are unaware of the change in the system exporting the file systems and experience only a temporary NFS server timeout.

Setting Up a Service

To set up a service, you must prepare your application and disks. Provide information to ASE about the application, such as commands to start and stop the application. You can restrict the service so that it runs only on a select group of member systems.

After you use the asemgr utility to set up a service, ASE chooses a member system to run the service for clients. Although each member system can run any service, only one member system runs a given service at a time.


Preparing to Set Up Services

Overview

Before adding a service, you must plan how to set up the service and perform some preparatory tasks. For example, you may need to set up NFS, AdvFS, or LSM. You must install any application you want to make highly available before you set up the service.

You must assign each service a unique name and an automatic service placement (ASP) policy.

You may also need to write start and stop action scripts to start and stop an application.

Documentation

Chapter 2 of TruCluster Available Server Software Available Server Environment Administration describes many of the tasks to prepare to set up services. It includes information on using AdvFS and LSM.

Automatic Service Placement Policy

You must designate an automatic service placement (ASP) policy when you create a service. The ASP policy lets you control which members can run the service. ASE uses the ASP policy when it automatically starts or relocates a service. You can override the ASP policy by using the asemgr utility to relocate a service manually.

There are three automatic service placement (ASP) policies:

• Balanced service distribution tries to balance the service load. ASE chooses the member running the fewest services at the time the new service is started.

• Favor members checks the specified members in order. If one of them is available, it is selected to run the service. If none of the favored members is available, ASE chooses the member running the fewest services.

• Restrict to favored members checks the specified members in order. If none of the favored members is available, ASE does not start the service. This policy ensures that ASE never moves the service to a member not on the list. You can, however, manually relocate the service to a member not on the list.

In addition, you must specify how you want ASE to react when a more highly favored member system becomes available. You can choose to relocate the service to the more highly favored member or keep it running on the current member.
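The three policies amount to a simple selection rule. The following is a minimal sketch of that rule for illustration, not ASE's implementation: AVAILABLE is a hypothetical list of "member count" pairs for the members currently up, and the sketch does not model relocation when a more highly favored member becomes available.

```shell
#!/bin/sh
# Sketch: a simplified model of the three ASP policies, for illustration
# only (not ASE's implementation). It ignores relocation when a more
# favored member becomes available.
# AVAILABLE is a hypothetical list of "member count" pairs for the members
# that are up, where count is the number of services each is running.

# least_loaded: print the available member running the fewest services.
least_loaded() {
    best=
    best_count=
    set -- $AVAILABLE
    while [ $# -ge 2 ]; do
        if [ -z "$best" ] || [ "$2" -lt "$best_count" ]; then
            best=$1
            best_count=$2
        fi
        shift
        shift
    done
    [ -n "$best" ] && echo "$best"
}

# is_available MEMBER: succeed if MEMBER appears in AVAILABLE.
is_available() {
    want=$1
    set -- $AVAILABLE
    while [ $# -ge 2 ]; do
        [ "$1" = "$want" ] && return 0
        shift
        shift
    done
    return 1
}

# choose_member POLICY [FAVORED...]
#   POLICY: b = balanced, f = favor members, r = restrict to favored.
choose_member() {
    policy=$1
    shift
    case $policy in
    b)
        least_loaded
        ;;
    f|r)
        # Check the favored members in order.
        for m in "$@"; do
            if is_available "$m"; then
                echo "$m"
                return 0
            fi
        done
        # None available: f falls back to balancing; r does not start.
        if [ "$policy" = f ]; then
            least_loaded
        else
            return 1
        fi
        ;;
    *)
        echo "unknown policy: $policy"
        return 2
        ;;
    esac
}
```

For example, with AVAILABLE="tinker 2 tailor 1", choose_member b prints tailor (the member running fewer services), choose_member f soldier tinker prints tinker, and choose_member r soldier prints nothing and returns 1, because soldier (a hypothetical member) is not available.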


Services and Disks

NFS and disk services can use both applications and disks. User-defined services use applications only.

A disk cannot be used in more than one service, because a service must have exclusive access to the disk. When you use a disk in a service, you use the entire disk. The availability manager driver reserves a disk using the SCSI reserve command.

Once a disk is used in an ASE service, it must be managed within the ASE.

To stop a disk-based service, ASE must be able to unmount the file systems. This means ASE must be able to stop all processes accessing the mounted file systems. Make sure that the stop action script stops all processes started by the start action script. To keep users from accessing the local mount point (and so preventing the unmount), allow access only to the exported directory.

Using NFS

NFS service names cannot be member system names; service names and member names must be unique. Service names and member names must have addresses on the same IP subnet, and must be in all members' /etc/hosts files. The service name and IP address must also be in all clients' /etc/hosts files.

Do not use the automount command option /net -hosts on any client system to access an NFS service. It may cause a stale file handle error if the service is relocated.

To enable client access through NFS on a member system, create an entry in each member system's /etc/fstab file specifying the exported path and the service name as the remote host.

Using UFS

To use UFS with ASE, set up the disks in the usual way with disklabel and newfs. Do not mount the file systems locally, because ASE mounts them for you when the service is started.

Using AdvFS

To use AdvFS with ASE, set up the domains and filesets on the same member on which you will run asemgr to add the service. A service can use more than one AdvFS domain, but a domain cannot be used by more than one service. A service should control all the filesets in the domain; do not put one fileset in a service and mount another locally. Do not mount the filesets locally, because ASE mounts them for you when the service is started.

All member systems should have AdvFS software installed.


Quotas

You can enable quotas on UFS file systems or AdvFS filesets used in an ASE service. To use quotas, you must mount the /proc file system on each member system to which you want the service to be able to fail over.

Add the /proc file system to the /etc/fstab file as follows:

/proc /proc procfs rw

There are two methods to set up disk quotas on a UNIX file system:

• Using the quotacheck command:

1. Set up the quota.user or quota.group file.

2. Use the asemgr utility to specify the file system in a service. When prompted for quota files, specify the pathname of the quota.user or quota.group file.

• Using the asemgr and edquota utilities:

1. Use asemgr to specify the file system.

2. When prompted for quota files, specify the pathname for quota.user or quota.group.

3. After you start the service, use the edquota command to specify the quota limits. The edquota command must be run on the system running the service.

For AdvFS filesets, when you use the mkfset command to create an AdvFS fileset, the command sets up quota files in the root of the fileset. To set up quotas on an AdvFS fileset in ASE:

• Use the asemgr utility to specify the fileset.

• Specify the quota.user or quota.group file.

• After adding the service, use the vedquota command to specifyquota limits in the files.

Using LSM

You can use LSM logical volumes in NFS and disk services. You can also use a UNIX file system or an AdvFS fileset on top of an LSM volume. Set up the disk groups, logical volumes, and file systems or filesets on the same member on which you will run asemgr to add the service.

All member systems need LSM software so that any of them can run the service. Each member system needs a rootdg disk group set up on a local (nonshared) disk. The rootdg disk group must be active (imported) whenever ASE is active, to provide an active disk group for LSM. Set up other disk groups using the shared disks.

See Chapter 2 of TruCluster Available Server Software Available Server Environment Administration for more information on using LSM with ASE.


Installing the Application

If your service includes an application, install the application before setting up the service. For example, if you are setting up an NFS mail service, set up the mail hubs; if you are setting up a database service, install the database program on all member systems.

An application must have the following characteristics to be made highly available with ASE:

• The application must run on only one system at a time.

• The application must be able to be started and stopped usinga series of commands (action script).

Configuration Example

This example configuration is based on information in SUPER::USER6:[SES$PUBLIC]PARIBAS_DIR.TAR.

Consider a sample configuration (Example 7–1) using two DEC 7610 systems with four shared SCSI buses, each with three RZ26L disks, to provide a disk service consisting of a 6-gigabyte, mirrored Oracle database made up of two stripe sets of six disks.

This configuration uses LSM to provide a mirrored stripe set, the logical volume voldb, as shown in the following volprint report.

The configuration also uses an AdvFS domain (db_dom) on the logical volume voldb, containing the fileset db_fs (Example 7–2).


Example 7–1 Providing a Mirrored Stripe Set Using LSM

Disk group: rootdg (Local disk group)

TYPE  NAME          ASSOC   KSTATE   LENGTH    COMMENT
dg    rootdg        rootdg  -        -
dm    rz1g          rz1g    -        3584

Disk group: db (Shared disk group)

TYPE  NAME          ASSOC   KSTATE   LENGTH    COMMENT
dg    db            db      -        -
dm    pd-rz33       rz33    -        2050347
dm    pd-rz34       rz34    -        2050347
dm    pd-rz35       rz35    -        2050347
dm    pd-rz41       rz41    -        2050347
dm    pd-rz42       rz42    -        2050347
dm    pd-rz43       rz43    -        2050347
dm    pd-rz49       rz49    -        2050347
dm    pd-rz50       rz50    -        2050347
dm    pd-rz51       rz51    -        2050347
dm    pd-rz57       rz57    -        2050347
dm    pd-rz58       rz58    -        2050347
dm    pd-rz59       rz59    -        2050347

vol   voldb         fsgen   ENABLED  12301824  (Logical volume)
plex  db-01         voldb   ENABLED  12301824  (Mirror 1)
sd    pd-rz33-log   db-01   -        1
sd    pd-rz33-data  db-01   -        2050304
sd    pd-rz41-data  db-01   -        2050304
sd    pd-rz34-data  db-01   -        2050304
sd    pd-rz42-data  db-01   -        2050304
sd    pd-rz35-data  db-01   -        2050304
sd    pd-rz43-data  db-01   -        2050304
plex  db-02         voldb   ENABLED  12301824  (Mirror 2)
sd    pd-rz49-log   db-02   -        1
sd    pd-rz49-data  db-02   -        2050304
sd    pd-rz57-data  db-02   -        2050304
sd    pd-rz50-data  db-02   -        2050304
sd    pd-rz58-data  db-02   -        2050304
sd    pd-rz51-data  db-02   -        2050304
sd    pd-rz59-data  db-02   -        2050304
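The sizes in Example 7–1 can be cross-checked with a little arithmetic. The following is a minimal sketch, assuming the usual LSM convention that lengths are counted in 512-byte blocks: each plex stripes six 2050304-block data subdisks, and because the two plexes mirror each other, the volume holds one plex's worth of data.

```shell
#!/bin/sh
# Sketch: check the sizes in Example 7-1. LSM lengths are in 512-byte
# blocks. Each plex stripes six 2050304-block data subdisks (the 1-block
# log subdisks are left out), and the two plexes are mirrors, so the
# volume holds one plex's worth of data.
SD_BLOCKS=2050304
DATA_SD_PER_PLEX=6

plex_blocks=`expr $SD_BLOCKS \* $DATA_SD_PER_PLEX`
volume_bytes=`expr $plex_blocks \* 512`
volume_kb=`expr $plex_blocks / 2`

echo "plex and volume length: $plex_blocks blocks"
echo "volume size: $volume_bytes bytes"
echo "volume size: $volume_kb 1K blocks"
```

It prints 12301824 blocks, 6298533888 bytes, and 6150912 1K blocks. The first value matches the length of voldb and of each plex in the volprint report, the last matches the 1K-Blks value that showfdmn reports for db_dom in Example 7–2, and about 6.3 × 10^9 bytes corresponds to the 6-gigabyte database described above.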


Example 7–2 Creating a File Domain Using AdvFS

# showfdmn -k db_dom

Id                 Date Created              LogPgs  Domain Name
2e6b3030.000b9620  Mon Sep 5 16:50:24 1994   512     db_dom

Vol  1K-Blks  Free     % Used  Cmode  Rblks  Wblks  Vol Name
1L   6150912  4851072  21%     on     128    128    /dev/vol/db/voldb

# showfsets db_dom
db_fs

        Id           : 2e6b3030.000b9620.1.8001
        Files        :      529,  SLim=        0,  HLim=        0
        Blocks (512) :  2587658,  SLim=        0,  HLim=        0
        Quota Status : user=on group=on


Setting Up NFS Services

Overview

This topic describes how to use asemgr to add a Network File System (NFS) service to an Available Server Environment (ASE).

Describing an NFS Service

An NFS service exports one or more file systems, AdvFS filesets, or LSM logical volumes located on one or more entire disks. If a hardware or software failure occurs, ASE relocates the file systems, filesets, or logical volumes to another member system for export to clients. To enable clients to access an NFS service, the NFS service name is assigned its own Internet address. The member system that runs the service responds to this address.

Both the client and member systems must be running NFS Version 2.0 or Version 3.0 and use the Address Resolution Protocol (ARP).

NFS Version 3.0 has an option to use TCP connections for NFS mounts. This option cannot be used with an ASE service.

Discussing the NFS Service Setup Procedure

To set up an NFS service:

1. Specify the service name and its Internet address in all member and client systems' /etc/hosts files. The service name is a virtual host name that makes the service independent of the availability of any member system.

2. Use the asemgr utility to specify the following:

• Service name (already in /etc/hosts)

• Disks

  – UFS device special file names

  – AdvFS filesets

  – LSM volumes

• Mount pathname to be exported to clients

• (Optional) netgroups or system names allowed access

• Read-write or read-only access

• (Optional) mount options

• NFS locking area. If you have more than one writable disk area in a service, you need to specify a writable space to store the NFS locking state information that must be failed over.

• Service automatic service placement (ASP) policy and favored members

3. Add the service name and export mount point to the clients' /etc/fstab files.
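Steps 1 and 3 on a client amount to adding one line to /etc/hosts and one line to /etc/fstab. The following is a minimal sketch that does this without adding duplicates if it is run twice. The function names, the HOSTS and FSTAB settings, and the address 192.0.2.10 (a documentation placeholder) are hypothetical; the fstab line follows the /project@nfs_service example earlier in this chapter.

```shell
#!/bin/sh
# Sketch: steps 1 and 3 on a client, written so that running it twice
# does not add duplicate lines. setup_nfs_client, HOSTS, and FSTAB are
# hypothetical names; set HOSTS and FSTAB to scratch copies to try it.
HOSTS=${HOSTS:-/etc/hosts}
FSTAB=${FSTAB:-/etc/fstab}

# add_line_once FILE LINE: append LINE to FILE unless an identical
# line is already present.
add_line_once() {
    if grep -x -F -e "$2" "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "$2" >> "$1"
}

# usage: setup_nfs_client SERVICE ADDRESS EXPORTED-PATH MOUNT-POINT
setup_nfs_client() {
    add_line_once "$HOSTS" "$2 $1" &&
    add_line_once "$FSTAB" "$3@$1 $4 nfs rw,bg 0 0"
}
```

For example, setup_nfs_client nfs_service 192.0.2.10 /project /usr/project adds the line 192.0.2.10 nfs_service to the hosts file and /project@nfs_service /usr/project nfs rw,bg 0 0 to the fstab file. Use the address actually assigned to the service (the same one in the members' /etc/hosts files), and create the client mount point before mounting. The duplicate check compares whole lines exactly, so an existing entry with different spacing is not recognized.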


Setting Up an NFS Service for a Public Directory

Example 7–3 in this section shows how to use asemgr to set up an NFS service for a public directory to be exported to other systems. This example uses an AdvFS domain, but could use a UFS file system or LSM volume.

The procedure in Example 7–3 assumes that the AdvFS domain has already been set up.

In Example 7–3, the asemgr utility adds the NFS service nfspublic, which exports the directory /usr/nfspublic from the AdvFS fileset public-domain#public.

Example 7–3 Adding an NFS Service

# asemgr 1

TruCluster Available Server (ASE)

ASE Main Menu

a) Managing the ASE -->
m) Managing ASE Services -->
s) Obtaining ASE Status -->

x) Exit ?) Help

Enter your choice: m 2

Managing ASE Services

c) Service Configuration -->
r) Relocate a service

on) Set a service on line
off) Set a service off line
res) Restart a service

s) Display the status of a service
a) Advanced Utilities -->

x) Exit to the Main Menu ?) Help

Enter your choice [x]: c 3

Service Configuration

a) Add a new service
m) Modify a service
d) Delete a service
s) Display the status of a service

x) Exit to Managing ASE Services ?) Help

Enter your choice [x]: a 4

Adding a service

Select the type of service:

1) NFS service
2) Disk service
3) User-defined service

x) Exit to Service Configuration ?) Help


Enter your choice [1]: 1 5

You are now adding a new NFS service to ASE.

An NFS service consists of an IP host name and disk configuration that are failed over together. The disk configuration can include UFS file systems, AdvFS filesets, or LSM volumes.

NFS Service Name

The name of an NFS service is a unique IP host name that has been set up for this service. This host name must exist in the local hosts database on all ASE members.

Enter the NFS service name ('q' to quit): nfspublic 6

Checking to see if nfspublic is a valid host...

Specifying Disk Information

Enter one or more UFS device special files, AdvFS filesets, or LSM volumes to define the disk storage for this service.

For example: UFS device special file: /dev/rz3c
             AdvFS fileset: domain1#set1
             LSM volume: /dev/vol/dg1/vol01

To end entering disk information, press the 'Return' key at the prompt.

Enter a device special file, an AdvFS fileset, or an LSM volume as storage for this service (press 'Return' to end): public-domain#public 7

ADVFS domain `public-domain` has the following volume(s):
/dev/vol/public-dg/public-vol

Is this correct (y/n) [y]: y 8

Following is a list of device(s) and pubpath(s) for disk group public-dg:

DEVICE   PUBPATH

rz18     /dev/rz18g
rz34     /dev/rz34g

Is this correct (y/n) [y]: y 9

Enter a directory pathname(s) to be NFS exported from the storage area "public-domain#public". Press 'Return' when done.

Directory pathname: /usr/nfspublic 10

Enter a host name, NIS netgroup, or IP address for the NFS exports list. (press 'Return' for all hosts):

Directory pathname: Return 11

AdvFS Fileset Read-Write Access and Quota Management

Mount `public-domain#public` fileset with read-write or read-only access?

1) Read-write
2) Read-only

Enter your choice [1]: 1 12

You may enable user, and group and fileset quotas on this file system by specifying the full pathnames for the quota files. Quota files must reside within the fileset. Enter "none" to disable quotas.


User quota file path [/var/ase/mnt/nfspublic/usr/nfspublic/quota.user]: Return 13
Group quota file path [/var/ase/mnt/nfspublic/usr/nfspublic/quota.group]: Return

AdvFS Mount Options Modification

Enter a comma-separated list of any mount options you want to use for the `public-domain#public` fileset (in addition to the defaults listed in the mount.8 reference page). If none are specified, only the default mount options are used.

Enter options (Return for none): Return 14

Specifying Disk Information

Enter one or more UFS device special files, AdvFS filesets, or LSM volumes to define the disk storage for this service.

For example: UFS device special file: /dev/rz3c
             AdvFS fileset: domain1#set1
             LSM volume: /dev/vol/dg1/vol01

To end entering disk information, press the 'Return' key at the prompt.

Enter a device special file, an AdvFS fileset, or an LSM volume as storage for this service (press 'Return' to end): Return 15

Modifying user-defined scripts for `nfspublic`:

1) Start action
2) Stop action
3) Add action
4) Delete action

x) Exit - done with changes

Enter your choice [x]: Return 1 6

Selecting an Automatic Service Placement (ASP) Policy

Select the policy you want ASE to use when choosing a memberto run this service:

b) Balanced Service Distributionf) Favor Membersr) Restrict to Favored Members

x) Exit to Service Configuration ?) Help

Enter your choice [b]: b 1 7

Selecting an Automatic Service Placement (ASP) Policy

Do you want ASE to consider relocating this service to another memberif one becomes available while this service is running (y/n/?): y 1 8

Enter ’y’ to add Service ’nfspublic’ (y/n): y 1 9

Adding service...Starting service...Service nfspublic successfully added...

Service Configuration

a) Add a new servicem) Modify a serviced) Delete a services) Display the status of a service


x) Exit to Managing ASE Services ?) Help

Enter your choice [x]: s 20

Service Status

Select the service whose status you want to display:

1) nfsusers on tinker
2) nfspublic on tailor

x) Exit to previous menu ?) Help

Enter your choice [x]: 2 21

Status for NFS service ‘nfspublic‘

Status:    Relocate:  Placement Policy:  Favored Member(s):
on tailor  yes        Balance Services   None

Storage configuration for NFS service ‘nfspublic‘

NFS Exports list
/usr/nfspublic

Mount Table (device, mount point, type, options)
public-domain#public /var/ase/mnt/nfspublic/usr/nfspublic advfs rw,groupquota,userquota

Advfs Configuration
Domain:        Volume(s):
public-domain  /dev/vol/public-dg/public-vol

LSM Configuration
Disk Group:  Device(s):
public-dg    rz18 rz34

Press ’Return’ to continue: Return

Service Status

Select the service whose status you want to display:

1) nfsusers on tinker
2) nfspublic on tailor

x) Exit to previous menu ?) Help

Enter your choice [x]: Return

Service Configuration

a) Add a new service
m) Modify a service
d) Delete a service
s) Display the status of a service

x) Exit to Managing ASE Services ?) Help

Enter your choice [x]: Return

Managing ASE Services


c) Service Configuration -->
r) Relocate a service

on) Set a service on line
off) Set a service off line
res) Restart a service

s) Display the status of a service
a) Advanced Utilities -->

x) Exit to the Main Menu ?) Help

Enter your choice [x]: Return

TruCluster Available Server (ASE)

ASE Main Menu

a) Managing the ASE -->
m) Managing ASE Services -->
s) Obtaining ASE Status -->

x) Exit ?) Help

Enter your choice: x
#

1 Invoke the asemgr utility.

2 From the ASE Main menu, choose the Managing ASE Services item.

3 From the Managing ASE Services menu, choose the Service Configuration item.

4 From the Service Configuration menu, choose the Add a new service item.

5 From the Adding a service menu, choose the NFS service.

6 Enter the service name, which is a virtual host name already in the /etc/hosts file with an Internet address.

7 Enter the UFS device special file, AdvFS fileset, or LSM volume that defines the disk storage for this service. There can be more than one UFS device special file, and so forth. Other storage is added later on.

8 Because the storage is an AdvFS fileset on an LSM volume, you must verify that the LSM volume is correct.

9 Verify the devices that make up the disk group.

10 Enter the directory to be exported to clients.

11 You can specify the host names, NIS netgroups, or IP addresses allowed access, or press Return to allow all hosts.

12 Select the read-write or read-only mount option.

13 Enter the pathnames for the user and group quota files, or use the defaults. Enter "none" to disable quotas.

14 Enter any other mount options you want to use.

15 You can enter another UFS device special file, AdvFS fileset, or LSM volume for this service.


16 Action scripts are not generally needed for NFS services.

17 Choose a placement policy for this service. The help screens describe the various options. If you choose one of the favored members policies, you will be prompted to select the members.

18 Determine whether the service will be relocated to another member.

19 Confirm that you want to add the service. This updates the ASE database. If there are errors, the service will not be added. Check for errors in the /var/adm/syslog.dated daemon log.

20 If desired, select Display the status of a service from the Service Configuration menu to display the status of the new service, or any existing service.

21 Select the service you want to display.

Discussing the /etc/exports.ase File

To export the NFS service to clients, the asemgr creates an /etc/exports.ase file, which is included in the /etc/exports file. The /etc/exports.ase file in turn includes an exports file for each service, /etc/exports.ase.servicename. Each service file specifies the device special file, the pathname to export, the local mount point (-m option), and the mount options.

Example 7–4 shows the exports files for the NFS service added in Example 7–3, along with the entry for the existing NFS service.


Example 7–4 /etc/exports.ase File

# more /etc/exports
...
.INCLUDE /etc/exports.ase

# more /etc/exports.ase
.INCLUDE /etc/exports.ase.nfsusers
.INCLUDE /etc/exports.ase.nfspublic

# more /etc/exports.ase.nfspublic
#
# ASE exports file for service nfspublic (ONLY EDIT THIS FILE WITH asemgr)
#
#public-domain#public exports (after this line) - DO NOT DELETE THIS LINE
/usr/nfspublic -m=/var/ase/mnt/nfspublic/usr/nfspublic
#
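With several services configured, the chain of .INCLUDE files can be tedious to follow by hand. The following is a minimal, read-only sketch (a hypothetical helper, not an asemgr feature) that starts at one exports file, follows each `.INCLUDE path` line of the form shown in Example 7–4, and prints the effective non-comment entries. It never writes to any file, since the ASE exports files should be changed only with asemgr; the include-depth limit of 10 is an arbitrary choice.

```sh
# Hypothetical helper: print the export entries reachable from FILE,
# expanding ".INCLUDE path" lines recursively. Read-only.

# expand_exports FILE [DEPTH]
expand_exports() {
    file=$1
    depth=${2:-0}
    # Guard against include loops (limit chosen arbitrarily).
    if [ "$depth" -gt 10 ]; then
        echo "expand_exports: include depth exceeded at $file" >&2
        return 1
    fi
    if [ ! -r "$file" ]; then
        echo "expand_exports: cannot read $file" >&2
        return 1
    fi
    # IFS= and -r keep each line intact; the -n test also handles a
    # final line that has no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
        .INCLUDE*)
            inc=$(printf '%s\n' "$line" | awk '{ print $2 }')
            # Recurse in a subshell so this call's variables survive.
            ( expand_exports "$inc" $((depth + 1)) ) || return 1
            ;;
        ''|'#'*)
            # Skip blank lines and comments, including the marker line.
            ;;
        *)
            printf '%s\n' "$line"
            ;;
        esac
    done < "$file"
}
```

Run against the files in Example 7–4, `expand_exports /etc/exports` would list the locally defined entries followed by one line per ASE-exported directory, such as the /usr/nfspublic line with its -m= mount point.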

NFS Mail Service

An NFS mail service fails over mail hubs (servers) so that mail service is highly available. If a hardware or software error occurs, ASE relocates the queued mail and reroutes any new mail to a new hub.

To set up a highly available mail service with ASE, the file systems containing the mailbox directory /var/spool/mail and the mail queue area /var/spool/mqueue must be set up as an NFS service, and the mail hubs’ sendmail.cf configuration file must be modified to ensure that the service name is treated as a virtual host. The mail hub member systems then NFS mount the mail directories from the service.


Setting Up a Disk Service

Overview

This topic describes how to use the asemgr to add a disk service in an Available Server Environment (ASE).

Describing a Disk Service

A disk service includes one or more file systems, AdvFS filesets, or LSM logical volumes on one or more entire disks, and usually an application that uses the disks.

Database applications are a common type of disk service. Most commercial database programs provide a commit/rollback function: after a system failure, uncommitted transactions are rolled back when the database is restarted. This makes them good candidates for ASE failover.

Describing the Set Up Procedure for a Disk Service

Before setting up a disk service, install the application software on all member systems and prepare the shared disks. If you want to fail over an application as well as disks, you need action scripts to start and stop the application.

Write and debug the start and stop action scripts before adding the disk service that uses them.

A disk service is not the same as a distributed raw disk (DRD) service, which is supported only if you have the TruCluster Production Server Software product. Disk services typically involve file system usage, while DRD services provide clusterwide access to raw physical disks.

Before using the asemgr to set up a disk service, you must:

• Include the disk service name, with its Internet address, in the /etc/hosts file of each member system.

• Ensure that the Internet address associated with a disk service is on the same subnet as the member systems.

• Ensure that client systems that will access the disk service have the disk service name and Internet address in their /etc/hosts file.
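The first two requirements can be checked from the shell before running asemgr. The sketch below is hypothetical and not part of ASE: `lookup_host` reads a hosts file in the usual "address name [aliases]" layout, and `same_subnet` compares two dotted-decimal addresses under a netmask you supply. It assumes addresses are written without leading zeros, because shell arithmetic reads a value such as 010 as octal.

```sh
# Hypothetical pre-flight checks; not part of ASE or asemgr.

# lookup_host NAME [HOSTSFILE]: print the address listed for NAME
# (as a host name or alias); fail if NAME is not found.
lookup_host() {
    awk -v name="$1" '
        $1 !~ /^#/ {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break    # rest of the line is a comment
                if ($i == name) { print $1; found = 1; exit }
            }
        }
        END { exit (found ? 0 : 1) }
    ' "${2:-/etc/hosts}"
}

# ip_to_int A.B.C.D: print a dotted-decimal address as a 32-bit integer.
# Assumes no leading zeros in any octet.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# same_subnet ADDR1 ADDR2 NETMASK: succeed if both addresses fall in
# the same subnet under NETMASK.
same_subnet() {
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    m=$(ip_to_int "$3")
    [ $(( a & m )) -eq $(( b & m )) ]
}
```

For example, `same_subnet "$(lookup_host dbase-01)" "$(lookup_host tailor)" 255.255.255.0` succeeds only if the service and the member share a subnet under that mask; the netmask here is an illustrative value, so use the one configured on your members.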

Use the asemgr utility to specify the following:

• Unique service name

• Disks

UFS device special file names

AdvFS filesets

LSM logical volumes

• (Optional) mount points for each file system, fileset, or volume


Specify NONE if you do not want ASE to automatically mount the devices. For example, an application using raw disks does not use mounted devices. Client systems specify this mount point in their /etc/fstab file to access the service's file system.

• Read-write or read-only access and (optional) mount options

• Automatic Service Placement (ASP) policy and favored members

• Action scripts to fail over the application

Using a Network Alias

Clients may be able to access the application through a service name that is aliased to the member system providing the service.

Add the service (pseudo host) name and Internet address to the /etc/hosts file of all member and client systems. Use the /var/ase/sbin/nfs_ifconfig script in the start and stop action scripts to establish and remove the alias.

Example 7–5 shows a start script that establishes the servicealias.

Example 7–5 Using a Network Alias

#!/bin/sh
PATH=/sbin:/usr/sbin:/usr/bin
export PATH
ASETMPDIR=/var/ase/tmp

if [ $# -gt 0 ]; then
    svcName=$1
else
    svcName=
fi

# This sets up the alias: ifconfig interface alias $svcName
/var/ase/sbin/nfs_ifconfig $svcName start aliasname
status=$?
if [ $status != 0 ]
then
    echo "$0: Can not set alias; exit status of nfs_ifconfig = $status"
    exit 1
fi
exit 0

• nfs_ifconfig is a script that uses ifconfig to establish and remove the alias.

• $svcName is a variable set to the service name, which the script receives as its first argument.

• start is the keyword to establish the alias; stop is the keyword to remove the alias.

• aliasname is the pseudo host name in /etc/hosts.
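Example 7–5 shows only the start side. A matching stop action script would call nfs_ifconfig with the stop keyword. The following is a minimal sketch under the same assumptions as Example 7–5: the service name arrives as $1, and aliasname is a placeholder for the pseudo host name. The NFS_IFCONFIG variable is a hypothetical override added only so the logic can be tried without ASE installed, and the script acts only when given an argument so its function can be loaded and tested on its own. A production script should also set PATH as Example 7–5 does.

```sh
# Sketch of a stop action script that removes the alias created by the
# Example 7-5 start script. "aliasname" stands for the pseudo host name.

# Hypothetical override for testing; defaults to the ASE-supplied script.
NFS_IFCONFIG=${NFS_IFCONFIG:-/var/ase/sbin/nfs_ifconfig}

remove_alias() {
    svcName=$1
    status=0
    # "stop" is the nfs_ifconfig keyword that removes the alias.
    "$NFS_IFCONFIG" "$svcName" stop aliasname || status=$?
    if [ "$status" -ne 0 ]; then
        echo "$0: Can not remove alias; exit status of nfs_ifconfig = $status" >&2
        return 1
    fi
    return 0
}

# Act only when called with the service name, so the file can also be
# sourced for testing.
if [ $# -gt 0 ]; then
    remove_alias "$1"
    exit $?
fi
```

As in Example 7–5, a nonzero exit status reports that the alias could not be changed.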


Setting Up a Disk Service for a Database Application

Consider a sample configuration with two member systems, each with two shared SCSI buses. Each SCSI bus has two RZ28 disks to provide a disk service consisting of a 4-gigabyte, mirrored database, made up of two stripe sets of two disks each. This configuration uses LSM to provide a mirrored stripe set, the disk group disks-dbase, and AdvFS to create a file domain dbase-domain and fileset dbase.

Example 7–6 shows how to use the asemgr utility to add the disk service dbase-01 for this database, using the AdvFS fileset dbase-domain#dbase.

Example 7–6 Adding a Disk Service

# asemgr

TruCluster Available Server (ASE)

ASE Main Menu

a) Managing the ASE -->
m) Managing ASE Services -->
s) Obtaining ASE Status -->

x) Exit ?) Help

Enter your choice: m 1

Managing ASE Services

c) Service Configuration -->
r) Relocate a service

on) Set a service on line
off) Set a service off line
res) Restart a service

s) Display the status of a service
a) Advanced Utilities -->

x) Exit to the Main Menu ?) Help

Enter your choice [x]: c 2

Service Configuration

a) Add a new service
m) Modify a service
d) Delete a service
s) Display the status of a service

x) Exit to Managing ASE Services ?) Help

Enter your choice [x]: a 3

Adding a service

Select the type of service:

1) NFS service
2) Disk service
3) User-defined service

x) Exit to Service Configuration ?) Help


Enter your choice [1]: 2 4

You are now adding a new disk service to ASE.

A disk service consists of a disk-based application and disk configuration that are failed over together. The disk configuration can include UFS file systems, AdvFS filesets, LSM volumes, or raw disk information.

Disk Service Name

The name of a disk service must be a unique service name. Optionally, an IP address may be assigned to a disk service. In this case, the name must be a unique IP host name set up for this service and present in the local hosts database on all ASE members.

Enter the disk service name (’q’ to quit): dbase-01 5

Assign an IP address to this service? (y/n): y 6

Checking to see if dbase-01 is a valid host...

Specifying Disk Information

Enter one or more device special files, AdvFS filesets, or LSM volumes to define the disk storage for this service.

For example: Device special file: /dev/rz3c
             AdvFS fileset: domain1#set1
             LSM volume: /dev/vol/dg1/vol01

To end the list, press the Return key at the prompt.

Enter a device special file, an AdvFS fileset, or an LSM volume as storage for this service (press ’Return’ to end): dbase-domain#dbase 7
ADVFS domain ‘dbase-domain‘ has the following volume(s):

/dev/vol/disks-dbase/database-vol

Is this correct (y/n) [y]: Return 8

Following is a list of device(s) and pubpath(s) for disk group disks-dbase:

DEVICE PUBPATH

rz19 /dev/rz19g
rz20 /dev/rz20g
rz35 /dev/rz35g
rz36 /dev/rz36g

Is this correct (y/n) [y]: Return 8

Mount Point

The mount point is the directory on which to mount ‘dbase-domain#dbase‘. If you do not want it mounted, enter "NONE".

Enter the mount point or NONE: /usr/dbase 9

AdvFS Fileset Read-Write Access and Quota Management

Mount ‘dbase-domain#dbase‘ fileset with read-write or read-only access?

1) Read-write
2) Read-only

Enter your choice [1]: 1 10


You may enable user, and group and fileset quotas on this file system by specifying the full pathnames for the quota files. Quota files must reside within the fileset. Enter "none" to disable quotas.

User quota file path [/usr/dbase/quota.user]: Return 11

Group quota file path [/usr/dbase/quota.group]: Return

AdvFS Mount Options Modification

Enter a comma-separated list of any mount options you want to use for the ‘dbase-domain#dbase‘ fileset (in addition to the defaults listed in the mount.8 reference page). If none are specified, only the default mount options are used.

Enter options (Return for none): Return 12

Specifying Disk Information

Enter one or more device special files, AdvFS filesets, or LSM volumes to define the disk storage for this service.

For example: Device special file: /dev/rz3c
             AdvFS fileset: domain1#set1
             LSM volume: /dev/vol/dg1/vol01

To end the list, press the Return key at the prompt.

Enter a device special file, an AdvFS fileset, or an LSM volume as storage for this service (press ’Return’ to end): Return 13

Modifying user-defined scripts for ‘dbase-01‘:

1) Start action
2) Stop action
3) Add action
4) Delete action

x) Exit - done with changes

Enter your choice [x]: 1 14

Modifying the start action script for ‘dbase-01‘:

a) Add a start action script
) Edit the start action script
) Modify the start action script arguments []
) Modify the start action script timeout [60]
) Remove the start action script

x) Exit - done with changes

Enter your choice [x]: a 15

Enter the full pathname of your start action script or "default" for the default script (x to exit): /usr/local/adm/database 16

Enter the argument list for the start action script (x to exit): start 17

Enter the timeout in seconds for the start action script [60]: Return 18


Modifying the start action script for ‘dbase-01‘:

f) Replace the start action script
e) Edit the start action script
g) Modify the start action script arguments [start]
t) Modify the start action script timeout [60]
r) Remove the start action script
x) Exit - done with changes

Enter your choice [x]: Return 19

Modifying user-defined scripts for ‘dbase-01‘:

1) Start action
2) Stop action
3) Add action
4) Delete action

x) Exit - done with changes

Enter your choice [x]: 2 20

Modifying the stop action script for ‘dbase-01‘:

a) Add a stop action script
) Edit the stop action script
) Modify the stop action script arguments []
) Modify the stop action script timeout [60]
) Remove the stop action script

x) Exit - done with changes

Enter your choice [x]: a 20

Enter the full pathname of your stop action script or "default" for the default script (x to exit): /usr/local/adm/database 21

Enter the argument list for the stop action script (x to exit): stop 22

Enter the timeout in seconds for the stop action script [60]: Return 23

Modifying the stop action script for ‘dbase-01‘:

f) Replace the stop action script
e) Edit the stop action script
g) Modify the stop action script arguments [stop]
t) Modify the stop action script timeout [60]
r) Remove the stop action script
x) Exit - done with changes

Enter your choice [x]: Return

Modifying user-defined scripts for ‘dbase-01‘:

1) Start action
2) Stop action
3) Add action
4) Delete action

x) Exit - done with changes


Enter your choice [x]: Return

Selecting an Automatic Service Placement (ASP) Policy

Select the policy you want ASE to use when choosing a member to run this service:

b) Balanced Service Distribution
f) Favor Members
r) Restrict to Favored Members

x) Exit to Service Configuration ?) Help

Enter your choice [b]: b 24

Selecting an Automatic Service Placement (ASP) Policy

Do you want ASE to consider relocating this service to another member if one becomes available while this service is running (y/n/?): y 25

Enter ’y’ to add Service ’dbase-01’ (y/n): y 26

Adding service...
Starting service...
Service dbase-01 successfully added...

.

.

.

1 After invoking asemgr, from the ASE main menu, choose the Managing ASE Services item.

2 From the Managing ASE Services menu, choose the Service Configuration item.

3 From the Service Configuration menu, choose the Add a new service item.

4 From the Adding a service menu, choose the Disk service.

5 Enter the service name, which is a virtual host name already in the /etc/hosts file of all member systems with an Internet address.

6 If you assign an IP address to the service, POLYCENTER NetWorker Save and Restore (NetWorker) can back up the disks associated with this service.

7 Enter the UFS device special file, AdvFS fileset, or LSM volume that defines the disk storage for this service. There can be more than one UFS device special file, and so forth. Additional storage is added later.

8 Verify that the LSM volume and the list of disks being used are correct.

9 Enter the mount point for the disk service.

10 Select the read-write or read-only mount option.

11 Enter the pathnames for the user and group quota files; use the default, or enter "none" to disable quotas.


12 Enter any other mount options you want to use.

13 You can enter another UFS device special file, AdvFS fileset, or LSM volume for this service.

14 Add action scripts if needed. To fail over the application, there should be at least start and stop action scripts. Select the start action script.

15 Add the start action script.

16 Provide the full pathname for the start action script.

17 Provide the arguments needed to start the service.

18 Enter the timeout value or take the default of 60 seconds.

19 Finish the start action script.

20 Add the stop action script.

21 Provide the full pathname for the stop action script.

22 Provide the arguments needed to stop the service.

23 Enter the timeout value or take the default of 60 seconds.

24 Choose a placement policy for this service. The help screens describe the various options. If you choose one of the favored members policies, you will be prompted to select the members.

25 Determine whether the service will be relocated to another member.

26 Confirm that you want to add the service. This updates the ASE database.
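In Example 7–6 the same script, /usr/local/adm/database, is registered as both the start and the stop action, and its argument list (start or stop) tells it what to do. A skeleton for such a script might look like the following sketch. DB_START_CMD and DB_STOP_CMD are hypothetical placeholders for the database's own commands; they default to false so the script fails until they are filled in. The sketch also assumes that ASE has already mounted the service fileset (here the Example 7–6 mount point /usr/dbase) when the start action runs, and it follows the Example 7–5 convention that a nonzero exit status signals failure.

```sh
# Sketch of an action script like /usr/local/adm/database (Example 7-6),
# which receives "start" or "stop" as its argument.
DB_DIR=${DB_DIR:-/usr/dbase}            # mount point from Example 7-6
DB_START_CMD=${DB_START_CMD:-false}     # hypothetical: real start command
DB_STOP_CMD=${DB_STOP_CMD:-false}       # hypothetical: real stop command

database_action() {
    case $1 in
    start)
        # Assumes the fileset is already mounted; a missing directory
        # means the storage is not available on this member.
        if [ ! -d "$DB_DIR" ]; then
            echo "database: $DB_DIR is not available" >&2
            return 1
        fi
        $DB_START_CMD || { echo "database: start failed" >&2; return 1; }
        ;;
    stop)
        $DB_STOP_CMD || { echo "database: stop failed" >&2; return 1; }
        ;;
    *)
        echo "usage: database start|stop" >&2
        return 2
        ;;
    esac
    return 0
}

# Nonzero exit signals failure, as in the Example 7-5 script. Act only
# when given an argument, so the file can also be sourced for testing.
if [ $# -gt 0 ]; then
    database_action "$1"
    exit $?
fi
```

Keeping start and stop in one script, as Example 7–6 does, means both actions share the same settings; only the argument list entered in asemgr differs.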


Setting Up a User-Defined Service

Overview

A user-defined service consists only of an application that can fail over.

User-Defined Service Setup Procedure

Before setting up a user-defined service, install the application on all member systems, and write and test the action scripts that start and stop the application.

A user-defined service cannot include disks. If your application is disk-based, set up a disk service or an NFS service instead.

Use the asemgr utility to specify the following:

• Unique service name

• Action scripts to fail over the application

• Service placement policy and favored members

Adding a User-Defined Service

Example 7–7 shows how to set up a user-defined service. The example uses dxcalc as the application.

Example 7–7 Setting Up a User-Defined Service

$ asemgr
...

Adding a service

Select the type of service:

1) NFS service
2) Disk service
3) User-defined service

x) Exit to Service Configuration ?) Help

Enter your choice [1]: 3 1

You are now adding a new user-defined service to ASE.

User-defined Service Name

The name of a user-defined service must be a unique service name within the ASE environment.

Enter the user-defined service name (’q’ to quit): dxcalc 2

Modifying user-defined scripts for ‘dxcalc‘:


1) Start action
2) Stop action
3) Add action
4) Delete action
5) Check action
x) Exit - done with changes

Enter your choice [x]: 1 3

Modifying the start action script for ‘dxcalc‘:

f) Replace the start action script
e) Edit the start action script
g) Modify the start action script arguments [dxcalc]
t) Modify the start action script timeout [60]
r) Remove the start action script
x) Exit - done with changes

Enter your choice [x]: f 4

Enter the full pathname of your start action script or "default" for the default script (x to exit): /usr/local/adm/calc-start-stop 5

Modifying the start action script for ‘dxcalc‘:

f) Replace the start action script
e) Edit the start action script
g) Modify the start action script arguments [dxcalc]
t) Modify the start action script timeout [60]
r) Remove the start action script
x) Exit - done with changes

Enter your choice [x]: g 6

Enter the argument list for the start action script (x to exit, NONE for none) [dxcalc]: dxcalc start 7

Modifying the start action script for ‘dxcalc‘:

f) Replace the start action script
e) Edit the start action script
g) Modify the start action script arguments [dxcalc start]
t) Modify the start action script timeout [60]
r) Remove the start action script
x) Exit - done with changes

Enter your choice [x]: Return

Modifying user-defined scripts for ‘dxcalc‘:

1) Start action
2) Stop action
3) Add action
4) Delete action
5) Check action
x) Exit - done with changes

Enter your choice [x]: 2 8


Modifying the stop action script for ‘dxcalc‘:

f) Replace the stop action script
e) Edit the stop action script
g) Modify the stop action script arguments [dxcalc]
t) Modify the stop action script timeout [60]
r) Remove the stop action script
x) Exit - done with changes

Enter your choice [x]: f 9

Enter the full pathname of your stop action script or "default" for the default script (x to exit): /usr/local/adm/calc-start-stop 10

Modifying the stop action script for ‘dxcalc‘:

f) Replace the stop action script
e) Edit the stop action script
g) Modify the stop action script arguments [dxcalc]
t) Modify the stop action script timeout [60]
r) Remove the stop action script
x) Exit - done with changes

Enter your choice [x]: g 11

Enter the argument list for the stop action script (x to exit, NONE for none) [dxcalc]: dxcalc stop 12

Modifying the stop action script for ‘dxcalc‘:

f) Replace the stop action script
e) Edit the stop action script
g) Modify the stop action script arguments [dxcalc stop]
t) Modify the stop action script timeout [60]
r) Remove the stop action script
x) Exit - done with changes

Enter your choice [x]: Return

Modifying user-defined scripts for ‘dxcalc‘:

1) Start action
2) Stop action
3) Add action
4) Delete action
5) Check action
x) Exit - done with changes

Enter your choice [x]: Return

Selecting an Automatic Service Placement (ASP) Policy

Select the policy you want ASE to use when choosing a member to run this service:

b) Balanced Service Distribution
f) Favor Members
r) Restrict to Favored Members

x) Exit to Service Configuration ?) Help


Enter your choice [b]: Return 13

Selecting an Automatic Service Placement (ASP) Policy

Do you want ASE to consider relocating this service to another member if one becomes available while this service is running (y/n/?): n 14

Enter ’y’ to add Service ’dxcalc’ (y/n): y 15

Adding service...
Starting service...
Service dxcalc successfully added...

.

.

.
$

1 From the Adding a service menu, choose the User-defined service item.

2 Enter the service name, which must be unique within the ASE environment.

3 Select Start action to add the start action script.

4 Select Replace the start action script to use a script you have already written and debugged.

5 Provide the complete pathname of the start action script.

6 You must provide arguments to the start action script.

7 Provide the arguments to the start action script: the name of the service and the action to be taken (start).

8 Select Stop action to add the stop action script.

9 Select Replace the stop action script to use a script you have already written and debugged.

10 Provide the complete pathname of the stop action script.

11 You must provide arguments to the stop action script.

12 Provide the arguments to the stop action script: the name of the service and the action to be taken (stop).

13 Select the ASP policy.

14 Determine if you want ASE to relocate the service.

15 Confirm that you want to add the service. This adds the service and modifies the ASE database.
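In Example 7–7 both actions run /usr/local/adm/calc-start-stop, with the argument lists dxcalc start and dxcalc stop. A minimal sketch of such a script follows. The APP_CMD variable and the PID file under PIDDIR are assumptions made for this sketch, not ASE conventions, and a real dxcalc service would also need a DISPLAY setting, which is not handled here.

```sh
# Sketch of /usr/local/adm/calc-start-stop (Example 7-7), called as
# "calc-start-stop dxcalc start" or "calc-start-stop dxcalc stop".
APP_CMD=${APP_CMD:-dxcalc}      # assumption: command that runs the app
PIDDIR=${PIDDIR:-/var/tmp}      # assumption: where the PID is recorded

calc_start_stop() {
    svc=$1
    action=$2
    pidfile=$PIDDIR/$svc.pid
    case $action in
    start)
        # Run the application in the background and record its PID.
        $APP_CMD > /dev/null 2>&1 &
        echo $! > "$pidfile" || return 1
        ;;
    stop)
        # Design choice of this sketch: "not running" counts as success.
        if [ -f "$pidfile" ]; then
            kill "$(cat "$pidfile")" 2>/dev/null || true
            rm -f "$pidfile"
        fi
        ;;
    *)
        echo "usage: calc-start-stop service start|stop" >&2
        return 2
        ;;
    esac
    return 0
}

# Act only when given both arguments, so the file can also be sourced.
if [ $# -ge 2 ]; then
    calc_start_stop "$1" "$2"
    exit $?
fi
```

Passing the service name as the first argument lets one script serve several user-defined services, each with its own PID file. Treating a stop of an application that has already exited as success is a judgment call; a stricter script could report it as an error instead.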


User-Defined Login Service

You can set up a user-defined network or login service that uses a pseudo host name for user login and network operations. The pseudo host name is the name of the service and has an Internet address; it is used as an alias of a member system. This service name must be different from every member system name. Users can log in to the pseudo host name.

To set up a user-defined login service, perform the following steps:

1. Specify the pseudo host name and its Internet address in the /etc/hosts files of all member and client systems.

2. Use the asemgr utility to specify the following:

• Service (pseudo host) name

• Action scripts to start and stop the service
ASE provides a script, /var/ase/sbin/nfs_ifconfig, to establish and remove the host name alias. Refer to the description of Adding a User-Defined Login Service in Chapter 3 of TruCluster Available Server Software Available Server Environment Administration.

• Service placement policy and favored members


Using asemgr to Manage Services

Overview

Use the asemgr utility to manage services, including the following activities:

• Add a new service

• Modify an existing service

• Delete a service

• Display the status of a service

• Manually relocate a service to a specific member system

• Temporarily stop and restart a service

• Restart a stopped service

• Rereserve a service’s devices (LSM only)

Managing Services Menu

To manage services, invoke the asemgr utility and choose the Managing ASE Services item from the main menu. The following example shows the menus dealing with services.


Example 7–8 Managing ASE Services Menu

# asemgr

TruCluster Available Server (ASE)

ASE Main Menu

a) Managing the ASE -->
m) Managing ASE Services -->
s) Obtaining ASE Status -->

x) Exit ?) Help

Enter your choice: m

Managing ASE Services

c) Service Configuration -->
r) Relocate a service

on) Set a service on line
off) Set a service off line
res) Restart a service

s) Display the status of a service
a) Advanced Utilities -->

x) Exit to the Main Menu ?) Help

Enter your choice [x]:

Displaying Service Status

To display the status of a service, choose the Display the status of a service item from various ASE menus, as shown in the following example, then select the service you want to display.

Example 7–9 Displaying Service Status

# asemgr

TruCluster Available Server (ASE)

ASE Main Menu

a) Managing the ASE -->
m) Managing ASE Services -->
s) Obtaining ASE Status -->

x) Exit ?) Help

Enter your choice: m

Managing ASE Services


c) Service Configuration -->
r) Relocate a service

on) Set a service on line
off) Set a service off line
res) Restart a service

s) Display the status of a service
a) Advanced Utilities -->

x) Exit to the Main Menu ?) Help

Enter your choice [x]: s

Service Status

Select the service whose status you want to display:

1) nfsusers on tinker
2) nfspublic on tailor
3) dxcalc on tailor

x) Exit to previous menu ?) Help

Enter your choice [x]: 1

Status for NFS service ‘nfsusers‘

Status:    Relocate:  Placement Policy:  Favored Member(s):
on tinker  yes        Balance Services   None

Storage configuration for NFS service ‘nfsusers‘

NFS Exports list
/usr/nfsusers

Mount Table (device, mount point, type, options)
users-domain#users /var/ase/mnt/nfsusers/usr/nfsusers advfs rw,groupquota,userquota

Advfs Configuration
Domain:       Volume(s):
users-domain  /dev/vol/users-dg/users-vol

LSM Configuration
Disk Group:  Device(s):
users-dg     rz17 rz33

Press ’Return’ to continue: Return

Service Status

Select the service whose status you want to display:

1) nfsusers on tinker
2) nfspublic on tailor
3) dxcalc on tailor

x) Exit to previous menu ?) Help


Enter your choice [x]:...

#

Relocating a Service

A service is automatically relocated by ASE if a failure stops a member system from providing the service. You can also use the asemgr utility to manually relocate a service. This stops the service on the member currently running it and starts the service on the member you select. You can override the service's placement policy when you select a member system to run the service.

To relocate a service, choose the Relocate a service item from the Managing ASE Services menu, as shown in Example 7–10.

Example 7–10 Relocating a Service

# asemgr

TruCluster Available Server (ASE)

ASE Main Menu

a) Managing the ASE -->
m) Managing ASE Services -->
s) Obtaining ASE Status -->

x) Exit ?) Help

Enter your choice: m

Managing ASE Services

c) Service Configuration -->
r) Relocate a service

on) Set a service on line
off) Set a service off line
res) Restart a service

s) Display the status of a service
a) Advanced Utilities -->

x) Exit to the Main Menu ?) Help

Enter your choice [x]: r

Select the service you want to relocate

Services:

1) nfsusers on tinker
2) nfspublic on tailor
3) dxcalc on tailor


x) Exit to Managing ASE Services ?) Help

Enter your choice [x]: 1

Select member to run ’nfsusers’ service:

1) tailor) tinker

x) Exit without making changes ?) Help

Enter your choice: 1
Relocating service ‘nfsusers‘ to member ‘tailor‘...
Relocation successful.

Managing ASE Services

c) Service Configuration -->
r) Relocate a service

on) Set a service on line
off) Set a service off line
res) Restart a service

s) Display the status of a service
a) Advanced Utilities -->

x) Exit to the Main Menu ?) Help

Enter your choice [x]: s

Service Status

Select the service whose status you want to display:

1) nfsusers on tailor
2) nfspublic on tailor
3) dxcalc on tailor

x) Exit to previous menu ?) Help

Enter your choice [x]:...

Modifying a Service

You can use the asemgr utility to modify any information that was specified when a service was added to ASE, including:

• Disk configuration: You can add UFS file systems, AdvFS filesets, or LSM volumes to the service, or delete them from the service. You can change any disk information that was specified when the service was added, including:

Name of the file system, fileset, or volume

Mount point

Access mode and mount options

Owner and mode of the mount point

Exports file and locking area for an NFS service


• Service information:

Service name

Automatic Service Placement (ASP) policy and favored members

User-defined action scripts

Exports file for an NFS service

To modify an LSM disk group or logical volume, or an AdvFS domain or fileset, that is used in a service, you can change the configuration while the service is on line. First, relocate the service to the member on which you will change the configuration, and ensure that the service will not relocate if a more highly favored member becomes available. Use the AdvFS or LSM commands to change the configuration. Then use the asemgr utility to modify the service. When you select the fileset or volume, asemgr displays the changed configuration. Enter y if the information is correct, and the ASE database is updated.

After you modify the service with asemgr, the TruCluster Software:

• Stops the service

• Deletes the service

• Propagates the database to all member systems

• Starts the modified service

Example 7–11 uses asemgr to modify the Automatic Service Placement (ASP) policy of a service.

Example 7–11 Modifying a Service

# asemgr

TruCluster Available Server (ASE)

ASE Main Menu

a) Managing the ASE -->
m) Managing ASE Services -->
s) Obtaining ASE Status -->

x) Exit ?) Help

Enter your choice: m

Managing ASE Services


c) Service Configuration -->
r) Relocate a service

on) Set a service on line
off) Set a service off line
res) Restart a service

s) Display the status of a service
a) Advanced Utilities -->

x) Exit to the Main Menu ?) Help

Enter your choice [x]: c

Service Configuration

a) Add a new service
m) Modify a service
d) Delete a service
s) Display the status of a service

x) Exit to Managing ASE Services ?) Help

Enter your choice [x]: m

Modifying a Service

Select the service you want to modify:

1) nfsusers on tinker
2) nfspublic on tailor
3) dxcalc on tinker

x) Exit to Service Configuration ?) Help

Enter your choice [x]: 2

Select what you want to modify in service ‘nfspublic‘:

g) General service information
a) Automatic service placement (ASP) policy

x) Exit without modifications ?) Help

Enter your choice [g]: a

Selecting an Automatic Service Placement (ASP) Policy

Select the policy you want ASE to use when choosing a member to run this service:

b) Balanced Service Distribution
f) Favor Members
r) Restrict to Favored Members

x) Exit to Service Configuration ?) Help

Enter your choice [b]: f

Selecting an Automatic Service Placement (ASP) Policy

Select the favored member(s) IN ORDER for service ’nfspublic’:

1) tinker
2) tailor

x) No favored members ?) Help


Enter a comma-separated list [x]: 2

Selecting an Automatic Service Placement (ASP) Policy

Do you want ASE to relocate this service to a more highly favored member if one becomes available while this service is running (y/n/?): y

NOTE: Modifying a service causes it to stop and then restart. If you do not want to interrupt the service availability, do not modify the service.

Enter ’y’ to modify service ’nfspublic’ (y/n): y
Stopping service...
Deleting service...
Adding service...
Starting service...
Service successfully updated.

Service Configuration

a) Add a new service
m) Modify a service
d) Delete a service
s) Display the status of a service

x) Exit to Managing ASE Services ?) Help

Enter your choice [x]:

Managing ASE Services

c) Service Configuration -->
r) Relocate a service

on) Set a service on line
off) Set a service off line
res) Restart a service

s) Display the status of a service
a) Advanced Utilities -->

x) Exit to the Main Menu ?) Help

Enter your choice [x]: s

Service Status

Select the service whose status you want to display:

1) nfsusers on tinker
2) nfspublic on tailor
3) dxcalc on tinker

x) Exit to previous menu ?) Help

Enter your choice [x]: 2

Status for NFS service ‘nfspublic‘


Status:      Relocate:   Placement Policy:    Favored Member(s):
on tailor    yes         Favor Member(s)      tailor

.

.

.


Summary

Understanding Highly Available Services

To make an application highly available, set up an ASE servicefor that application.

ASE supports three types of services:

• NFS service provides highly available access to exported disk data.

• Disk service provides highly available access to disks or a disk-based application.

• User-defined service provides highly available access to an application.

Clients refer to service names rather than server names.

Preparing to Set Up Services

Before adding a service, you must plan how to set up the service and perform some preparatory tasks. For example, you may need to set up NFS, AdvFS, or LSM, or install an application. You must assign each service a unique name and an automatic service placement policy.

Setting Up NFS Services

To set up an NFS service, you must add the service name and its Internet address to the /etc/hosts file on all member and client systems. Use the asemgr utility to specify the service, device, export path, and mount options, and add the service name and export mount point to the clients’ /etc/fstab files.
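Because every member and client system resolves the service name from its own /etc/hosts file, it can be worth checking each copy before you add the service. The following Python sketch is an illustration only, not a TruCluster tool. It assumes the standard /etc/hosts layout (an Internet address followed by a host name and optional aliases, with # starting a comment) and reports which required names a given copy does not define. The addresses and the nfspublic entry are hypothetical.

```python
def hosts_names(text):
    """Return the set of host names and aliases defined in /etc/hosts-style text."""
    names = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        fields = line.split()
        # First field is the Internet address; the rest are the name and aliases.
        names.update(fields[1:])
    return names


def missing_names(text, required):
    """Return the required names that the hosts text does not define, sorted."""
    return sorted(set(required) - hosts_names(text))


# Hypothetical /etc/hosts copy on a client system.
sample = """
127.0.0.1      localhost
199.155.0.10   tinker
199.155.0.11   tailor
199.155.0.20   nfsusers     # ASE NFS service (hypothetical address)
"""

print(missing_names(sample, ["tinker", "tailor", "nfsusers", "nfspublic"]))  # prints ['nfspublic']
```

Running the same check against the /etc/hosts copy from each member and client shows which systems still need the service entry.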

To set up a mail service with ASE, the file systems containing the mailbox directory /var/spool/mail and the mail queue area /var/spool/mqueue must be set up as an NFS service, and the mail hubs’ sendmail.cf configuration file must be modified to ensure that the service name is treated as a virtual host.

Setting Up a Disk Service

To set up a disk service with ASE, install the application software on all member systems, prepare the shared disks, develop action scripts to start and stop the disk-based application, and use the asemgr utility to specify the service information.

Ensure that the service name is in the /etc/hosts file of all member and client systems.

Setting Up a User-Defined Service

To set up a user-defined service with ASE, install the application software on all member systems, develop action scripts to start and stop the application, and use the asemgr utility to specify the service information.


To set up a user-defined login service with ASE, add the pseudo host name and its Internet address to the /etc/hosts file on all member and client systems, and use the asemgr utility to specify the service information. You can use the nfs_ifconfig script in your start and stop action scripts to set up and remove the host name alias.

Using asemgr to Manage Services

Use the asemgr utility to manage services, including the following activities:

• Add a new service

• Modify an existing service

• Delete a service

• Display the status of a service

• Manually relocate a service to a specific member system

• Temporarily stop and restart a service

• Restart a stopped service

• Rereserve a Logical Storage Manager (LSM) device


Exercises

Understanding Highly Available Services: Exercise

1. Describe the three types of services supported by ASE.

2. Explain the significance of the service name.

Understanding Highly Available Services: Solution

1. ASE supports three types of services:

• NFS service provides highly available access to exported disk data.

• Disk service provides highly available access to disks or a disk-based application.

• User-defined service provides highly available access to an application.

2. The service name, which is what clients refer to, is distinct from any system name. This means the service is not tied to any system and can be relocated if necessary.

Preparing to Set Up Services: Exercise

1. Describe the three automatic service placement policies.

2. Identify the characteristics an application must have to be made highly available with ASE.

Preparing to Set Up Services: Solution

1. The three automatic service placement policies are:

• Balanced service distribution tries to balance the service load. ASE chooses the member running the fewest services at the time the service is started.

• Favor members checks the specified members in order first. If one of them is available, it is selected to run the service. If none of the favored members is available, ASE chooses the member running the fewest services.

• Restrict to favored members checks the specified members. However, if none of the favored members is available, ASE does not start the service. This policy ensures that ASE never moves the service to a member that is not on the list.

2. An application must have the following characteristics to be made highly available with ASE:

• The application must run on only one system at a time.

• The application must be able to be started and stopped using a series of commands (an action script).
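The three automatic service placement policies described in the solution above can be modeled in a few lines. This Python sketch is a hypothetical model, not ASE code: the policy labels, the function name choose_member, and the alphabetical tie-break between equally loaded members are assumptions, because the course does not say how ASE breaks ties.

```python
from typing import Dict, List, Optional


def choose_member(policy: str, running: Dict[str, int],
                  favored: Optional[List[str]] = None) -> Optional[str]:
    """Pick a member to run a service, or return None if it cannot be started.

    policy  -- "balanced", "favor", or "restrict" (hypothetical labels)
    running -- available members mapped to the number of services each runs
    favored -- ordered list of favored members (used by "favor" and "restrict")
    """
    favored = favored or []
    if policy in ("favor", "restrict"):
        # Check the favored members in order; the first available one wins.
        for member in favored:
            if member in running:
                return member
        if policy == "restrict":
            return None  # never place the service outside the favored list
    elif policy != "balanced":
        raise ValueError("unknown policy: %s" % policy)
    if not running:
        return None
    # Balanced distribution, and the fallback for "favor": fewest services.
    # Ties are broken alphabetically (an assumption of this sketch).
    return min(running, key=lambda m: (running[m], m))


loads = {"tinker": 2, "tailor": 1}
print(choose_member("balanced", loads))             # prints tailor
print(choose_member("favor", loads, ["tinker"]))    # prints tinker
print(choose_member("restrict", loads, ["rusty"]))  # prints None (rusty is hypothetical and down)
```

The only difference between Favor Members and Restrict to Favored Members is what happens after the favored list is exhausted: the first falls back to balanced placement, the second leaves the service unassigned.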


Setting Up NFS Services: Exercise

Set up an NFS service.

1. Set up a UFS file system or AdvFS fileset on a shared disk.

2. Choose a unique service name. Enter the service name and an Internet address into each server and client /etc/hosts file.

3. Use asemgr to specify the service name; the device special file, fileset, or logical volume; the mount pathname; and read-write access.

4. Use the balanced placement policy.

5. Add the service name and export mount point to the clients’ /etc/fstab files.

6. Log in to a client system and create a file on this file system.

Setting Up NFS Services: Solution

Use Example 7–3 from the course guide as an example solution.

Setting Up a Disk Service: Exercise

Use the asemgr utility to set up a disk service. You do not have to include a disk-based application (therefore no start/stop action script is necessary).

Setting Up a Disk Service: Solution

No solution is necessary. Refer to Example 7–6 for a sample solution.

Setting Up a User-Defined Service: Exercise

Set up the /usr/bin/X11/dxcalc program as a user-defined service. If you have not already done so, write an action script that will both start and stop the service. Place the script in the /usr/local/adm directory. Verify that the script executes correctly before running asemgr to add the service.

When you write the script, keep in mind that the command to start /usr/bin/X11/dxcalc may need to include the -d option to designate your workstation.

Setting Up a User-Defined Service: Solution

No solution required. Refer to Example 7–7 as a sample solution.


Using asemgr to Manage Services: Exercise

Use the asemgr utility to:

1. Set a service off line.

2. Set the same service back on line.

3. Relocate a service to another member.

4. Change the ASP policy of a service.

Using asemgr to Manage Services: Solution

No solution necessary. Refer to the examples in this section.


8 Using the Cluster Monitor


About This Chapter

Introduction

The Cluster Monitor monitors the status of an Available Server configuration and displays the configuration, including member systems, available services, storage devices, and interconnects. The Cluster Monitor provides a graphical interface to the cluster configuration map. It simplifies TruCluster Available Server management by allowing you to view the status of members and services, to relocate a service, and to launch other tools such as dxlsm and asemgr.

This chapter shows you how to set up and run the Cluster Monitor and launch other tools.

Objectives

To use the Cluster Monitor to monitor an Available Server configuration, you should be able to:

• Set up the Cluster Monitor

• Use the Cluster Monitor to view the status of devices and services

• Launch other management tools through the Cluster Monitor

• Use the Cluster Monitor to identify problems in a TruCluster Software configuration

Resources

For more information on the topics in this chapter, see the following:

• TruCluster Available Server Software Available Server Environment Administration

• TruCluster Available Server Software Hardware Configuration and Software Installation

• TruCluster Available Server Software Version 1.4 Release Notes

• Reference Pages

• Cluster Monitor online help


Setting Up the Cluster Monitor

Overview

Before you can run the Cluster Monitor, the hardware and software must be installed, the ASE must be properly configured, and all member systems and devices must be up and available. On a new ASE configuration, or after you change the hardware configuration, you must also create the cluster map.

The Cluster Monitor obtains the ASE hardware configuration information from the cluster configuration map. The cluster map is formed by gathering hardware configuration information from each of the member systems in the ASE. This information is compiled into a text file, /etc/CCM, which is copied to each member system.

Setup Procedure

To set up the Cluster Monitor, follow these steps:

1. Make sure that the required subsets for the product are installed:

• TCRCMS140 (TruCluster Cluster Monitor)

• CXLSHRDA405 (DEC C++ Class Shared Libraries)

To use dxadvfs, you must also have the AFAADVGUI401 subset installed. In addition, LSM and other GUI tools may require licenses to be loaded on the member system running the monitor.

2. Set up the /.rhosts file to allow root access for the rsh command between any two member systems. You must include all member systems, including the local system, in the /.rhosts file. For example, the /.rhosts file on system tinker should list systems tailor and tinker. Refer to .rhosts(4) for more information.

3. Check that all member systems are ‘‘UP’’ by running the asemgr utility and displaying member status.

4. Use the cluster_map_create command to create the cluster configuration map on one member system in the ASE domain. To issue the cluster_map_create command, log in as superuser and use the following syntax:

/usr/sbin/cluster_map_create clustername -full

The following table describes the usage for the clustername variable and the -full option.


clustername   Name of the ASE domain, up to 64 characters; used to label the title bar of the Cluster Monitor main window. You can use any name you want for this variable.

-full         Forces all member systems to rescan for new components and update their cluster map.

After you issue the cluster_map_create command, the configuration information is merged into the cluster map, /etc/CCM, which is distributed to the kernels of all member systems. If any configuration errors are discovered, or if any member systems are down, the utility generates error messages. Refer to cluster_map_create(8) and CCM(5) for more information.
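Step 2 of the setup procedure requires every member's /.rhosts file to name every member, including the member itself. A quick way to audit a set of these files is sketched below in Python. The helper names are hypothetical, and the sketch assumes each non-blank line begins with a host name, optionally followed by a user name; the file contents shown are examples only.

```python
def rhosts_hosts(text):
    """Return the host names listed in /.rhosts-style text (first field of each line)."""
    return {line.split()[0] for line in text.splitlines() if line.split()}


def audit_rhosts(rhosts_by_member):
    """Map each member to the members missing from its /.rhosts file.

    rhosts_by_member -- dict of member name -> contents of that member's /.rhosts
    """
    members = sorted(rhosts_by_member)
    problems = {}
    for member, text in rhosts_by_member.items():
        listed = rhosts_hosts(text)
        missing = [m for m in members if m not in listed]
        if missing:
            problems[member] = missing
    return problems


# Example contents: tailor's file omits tailor itself.
files = {
    "tinker": "tailor\ntinker\n",
    "tailor": "tinker root\n",
}
print(audit_rhosts(files))  # prints {'tailor': ['tailor']}
```

An empty result means every member lists every member, which is the condition step 2 describes.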

Sample Setup Script

The following is a sample script showing the commands you issue and the system prompts that are displayed when you set up the Cluster Monitor application.

# /usr/sbin/cluster_map_create ASE0 -full

Members running are: ( tinker tailor )
Doing device table scans....
Doing symmetry checks...
Processing map input file....
Calling makeclmap to create /etc/CCM.
Distributing cluster map to all members.
Processing member tinker.
Processing member tailor.
Successful cluster map creation and distribution.

Updating the Cluster Map

If you add a member system later, you must perform the preceding tasks on that new system. If you make other changes to your hardware configuration, invoke the cluster_map_create command with the -full and -append options on one member system. This updates each member’s cluster configuration map.
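If you script this step, it can help to build the command line in one place so that the 64-character limit on clustername and the choice of options are checked before anything runs. The Python sketch below is hypothetical: the helper name is invented, and it only assembles the argument list from the syntax and options given in this section (clustername, -full, -append); the order of -full and -append is an assumption. It does not run the command. To execute it, you would pass the list to subprocess.run as superuser on a member system.

```python
MAX_NAME = 64  # clustername limit given in the option table


def cluster_map_create_args(clustername, full=True, append=False):
    """Build the argument list for /usr/sbin/cluster_map_create (hypothetical helper).

    full   -- add -full (rescan all members for new components)
    append -- add -append (used with -full after other hardware changes)
    """
    if not clustername or len(clustername) > MAX_NAME:
        raise ValueError("clustername must be 1 to %d characters" % MAX_NAME)
    args = ["/usr/sbin/cluster_map_create", clustername]
    if full:
        args.append("-full")
    if append:
        args.append("-append")
    return args


print(cluster_map_create_args("ASE0"))  # prints ['/usr/sbin/cluster_map_create', 'ASE0', '-full']
print(cluster_map_create_args("ASE0", append=True))
```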


Using the Cluster Monitor

Overview

The Cluster Monitor provides a graphical interface for managing an Available Server configuration and detecting Available Server-related problems. It shows the current state of availability and connectivity, and visually alerts the administrator to problems. The Cluster Monitor performs the following tasks:

• Displays the status of each member system

• Reports member system failures, ASE service failures, and hard and soft disk errors

• Displays the configuration of an Available Server implementation, including its member systems, ASE services, storage devices, and network interfaces

• Displays the devices on a member system’s private SCSI buses

• Displays the shared storage reserved by an ASE service

• Stops, starts and relocates ASE services

• Launches external tools:

asemgr

dxadvfs

dxlsm

pmgr


Starting the Cluster Monitor

To start the Cluster Monitor:

1. Set the session security to allow X access, or use the xhost + command to disable all X client security checks.

2. Log in to an ASE member system as root.

3. Set your DISPLAY variable to point to the desired workstation or PC (one that supports the X Window System).

4. Run /usr/bin/X11/cmon .

For more information on the Cluster Monitor, see cmon(8) and the online help.

Top View

The top view, or main window, presents an overview of the status of all the ASE member systems. If you have a four-member ASE domain, all four member systems are visible in the top view.

For each member system, icons show the status of the system, its ASE services, its interconnects, and its storage. If the Cluster Monitor detects a problem with one of these subsystems, it draws a line through the corresponding icon.


Figure 8–1 is a representation of the Cluster Monitor top view.

Figure 8–1 Cluster Monitor Top View

[Main window titled "Cluster Monitor: ASE 0: tinker" with Monitor, Options, and Help menus, showing member systems tinker and tailor; callouts 1 to 4 mark the icons described below.]

The icons in each server indicate the following:

1 System status (up or down)

2 Status of ASE services

3 Status of shared storage devices

4 Local area network interfaces

Click a member system to display the device view for that member. Click a service icon to display the services view.

Device View

The device view displays the hardware configuration in the ASE domain. It displays the network interconnects, the member systems, the shared SCSI buses, and the shared storage devices.


Figure 8–2 is a representation of the Cluster Monitor configuration view.

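Figure 8–2 Cluster Monitor Configuration View

[Window titled "Cluster Monitor: ASE: ASE 0" in device view, with View, Action, Tools, and Help menus, Device and Service buttons, and a toolbar (Xterm, Asemgr, Dxadvfs, Dxlsm, Pmgr). It shows members tinker and tailor, networks Net_199.155.0.0 and Net_199.156.0.0, shared buses SCSI_2 and SCSI_3, and disks rz17, rz18, rz25, and rz26. Status log: "08:44:57 Cluster Monitor started".]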


Click MB1 on a device to display its connections. For example, click a SCSI bus and the monitor displays all devices attached to that bus, as shown in Figure 8–3.

Figure 8–3 SCSI Bus Configuration

[Device view of ASE 0 (members tinker and tailor, buses SCSI_2 and SCSI_3, disks rz17, rz18, rz25, and rz26) showing the devices attached to the selected SCSI bus.]


Press Ctrl while clicking MB1 to add connections from another component, as shown in Figure 8–4.

Figure 8–4 All Shared Connections

[The same device view of ASE 0, with the shared connections of additional components displayed.]


Double-clicking an icon may open a dialog box with more detailed information. For example, double-clicking a member system displays that system’s local bus and devices, as shown in Figure 8–5.

Figure 8–5 Local Connections

[Device view of ASE 0 with the "Private Devices view: tailor" dialog open, showing bus SCSI0_tailor and devices rz0, rz1, tz5, and Cdrom6, with Close and Help buttons.]

Services View

The services view displays the services that are registered in the ASE domain. It displays the member systems with the online services that are running on them. It also displays the offline and unavailable services.


Figure 8–6 is a representation of the Cluster Monitor services view.

Figure 8–6 Cluster Monitor Services View

[Window titled "Cluster Monitor: ASE: ASE 0" in services view, showing members tinker and tailor and the services users, projectX, and database, with the same menus and Tools toolbar as the device view.]


Each type of available service has an icon, as shown in Table 8–1.

Table 8–1 Available Service Icons

Disk service

NFS service

User-defined service

Unknown type of service

Double-click a service to display the shared storage devices associated with the service, as shown in Figure 8–7.


Figure 8–7 Service Devices

[Services view with the "Service Details View: users" dialog open, listing the devices /dev/rz17c and /dev/rz25c.]

You can relocate a service visually by dragging its icon to another member system in this window.

The Action menu (available on the menu bar or by pressing MB3) provides operations that can be performed on an ASE service, including:

• Placing a service on line

• Placing a service off line

• Restarting a service

• Relocating a service


Launching Other Tools

Overview

The Cluster Monitor provides a graphical view of the Available Server configuration, where ASE member systems and tools are represented as icons. It allows the administrator to launch tools through a drag-and-drop interface. The Cluster Monitor is extensible to allow integration of other graphical management applications.

With this capability, an administrator can manage a set of systems from a local workstation.

Included Tools

The Cluster Monitor configuration and services windows include toolbar icons for the following utilities:

• LSM Visual Administrator (dxlsm )

• AdvFS manager (dxadvfs )

• Performance Manager (pmgr)

• ASE manager (asemgr )

• X terminal window (xterm )

You can click a tool icon to activate the utility on a system in the current ASE domain, or drag the tool icon to a member system icon to run the utility on that system.

External Tools

You can drag icons from the CDE desktop or Application Manager to the Cluster Monitor window to invoke the application on that cluster member system. For example, dragging the System Information icon to the Cluster Monitor window and dropping it onto a member system causes the Cluster Monitor to run that command on the member system and display the results.


Monitoring Available Server Configurations with the Cluster Monitor

Overview

The Cluster Monitor provides a useful tool for monitoring the health of the components of an Available Server configuration. Each view provides indicators of the status of the components it displays.

You can use the Cluster Monitor to:

• Determine which components are failing

• Launch diagnostic tools to check the setup

• Launch management tools to reconfigure hardware or ASE services

Top View

The top view presents an overview of the status of an Available Server configuration. Problems or state changes are indicated by changes in the icons displayed, as shown in Table 8–2.

Table 8–2 Main Window Failure Indicators

Icon or Display                  Meaning
Outline around system graphic    Failure of that system, one of its devices, or one of its services
                                 Failure of the system
                                 Failure of one or more services on that system
                                 Failure of one or more shared devices

In addition, an attention light appears at the bottom of the main window when a failure has been reported.

Device View

The device view displays the status of the hardware configuration of the Available Server configuration. Problems or state changes are indicated by changes in the icons displayed, as shown in Table 8–3.


Table 8–3 Device View Failure Indicators

Icon or Display    Meaning
                   Available Server Availability Manager is not reporting that the system is a member of the ASE domain
                   Ten soft errors have occurred in a 15-minute interval; this indicates the disk may be deteriorating
                   A hard error has occurred on the device; data on the disk may be corrupted

In addition, the status log area reports status changes.
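Table 8–3 flags a disk after ten soft errors in a 15-minute interval. The Python sketch below models that rule with a sliding window. It illustrates the threshold only and is not the Cluster Monitor's implementation; the class name is hypothetical, and the sketch assumes error times arrive in non-decreasing order and that an error exactly 15 minutes older than the newest one falls outside the window.

```python
from collections import deque

WINDOW_SECONDS = 15 * 60  # 15-minute interval from Table 8-3
THRESHOLD = 10            # soft errors that trigger the warning


class SoftErrorWatch:
    """Track soft-error times for one disk and apply the ten-in-15-minutes rule."""

    def __init__(self):
        self.times = deque()

    def record(self, t):
        """Record a soft error at time t (seconds); return True if the disk is flagged."""
        self.times.append(t)
        # Drop errors that are 15 minutes or more older than the newest one.
        # The newest error itself always stays, so the deque is never empty here.
        while t - self.times[0] >= WINDOW_SECONDS:
            self.times.popleft()
        return len(self.times) >= THRESHOLD


watch = SoftErrorWatch()
flags = [watch.record(minute * 60) for minute in range(10)]
print(flags[-1])  # prints True: ten errors within nine minutes
```

Errors spread out over time never accumulate to ten inside one window, which is why a slowly failing disk and an occasional transient error are treated differently.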

Services View

The services view displays the status of services registered in the ASE domain. Problems or state changes are indicated by changes in the icons displayed, as shown in Table 8–4.

Table 8–4 Services View Failure Indicators

Icon or Display    Meaning
                   Service is off line
                   Service is unassigned due to a missing resource
                   Available Server Availability Manager is not reporting that the system is a member of the ASE domain
                   Unknown type of service

In addition, the status log area reports status changes.

What to Do when You See an Error

If the Cluster Monitor indicates a problem, use the various views to gather more detailed information about the failed component. Double-click the component to see if further details are available. See the Cluster Monitor online help to find out what the displayed symbol means for that component. The online help may suggest some tools you can use to further diagnose the problem.


You can run the clu_ivp script to check the cluster setup. Run the asemgr utility to check the ASE member and service setup. Check the system event logs for cluster messages. Run other system monitoring utilities, such as ifconfig, netstat, and ping.

You can run character-based utilities without leaving the Cluster Monitor by selecting the Xterm option on the Tools menu, by clicking the xterm icon on the toolbar, or by using MB2 to drag the xterm icon to any member system.


Summary

Setting Up the Cluster Monitor

To set up the Cluster Monitor:

1. Make sure all systems and devices are properly connected.

2. Make sure the Cluster Management subset (TCRCMS140) is installed.

3. On each member system, set up the /.rhosts file to allow root access for the rsh command between any two member systems. You must include all member systems, including the local system, in the /.rhosts file.

4. Check that all member systems are up by running the asemgr utility and displaying member status.

5. Create the cluster configuration map on one ASE member system.

/usr/sbin/cluster_map_create clustername -full

Using the Cluster Monitor

The Cluster Monitor shows the current state of availability and connectivity, and visually alerts the administrator to problems.

Run the Cluster Monitor by executing /usr/bin/X11/cmon as root on an ASE member system.

The Cluster Monitor uses the following display windows:

• The top view presents an overview of the status of the member systems. For each member system, icons show the status of the system, its ASE services, its interconnects, and its storage.

• The device view displays the hardware configuration. It displays the network interconnects, the member systems, the shared SCSI buses, and the shared storage devices.

• The services view displays the services that have been registered in the current environment.

Launching Other Tools

The Cluster Monitor displays ASE member systems, devices, and tools as icons. It allows you to drag a tool icon to a system icon to execute the utility on that system.

The Cluster Monitor includes toolbar icons for the following utilities:

• LSM Visual Administrator (dxlsm )

• AdvFS manager (dxadvfs )

• Performance Manager (pmgr)

• ASE manager (asemgr )

• X terminal window (xterm )


In CDE, you can also drag application icons from the system management applications window in the Application Manager to the Cluster Monitor window.

Monitoring Available Server Configurations with the Cluster Monitor

The Cluster Monitor provides a useful tool for monitoring the health of the components of an Available Server configuration. Each view provides indicators of the status of the components it displays. In general, a diagonal line across an icon indicates the failure of that component.


Exercises

Setting Up the Cluster Monitor: Exercise

Set up the Cluster Monitor by following these steps:

1. Make sure the appropriate subsets are installed.

2. Set up the /.rhosts file to allow root access for the rsh command between any two member systems. You must include all member systems, including the local system, in the /.rhosts file.

3. Check that all member systems are "UP" by running the asemgr utility and displaying member status.

4. Create the cluster configuration map on one cluster member system.

Setting Up the Cluster Monitor: Solution

1. Sample solution for TruCluster Software

# /usr/sbin/setld -i | grep TCRCMS140

You should also check for CXLSHRDA and any other subsets you require.

2. Sample solution

tinker# cat > /.rhosts
tailor
tinker

Ctrl/D

3. Sample solution (in each ASE)

# /usr/sbin/asemgr

ASE Main Menu

a) Managing the ASE -->
m) Managing ASE Services -->
s) Obtaining ASE Status -->

x) Exit ?) Help

Enter your choice: a

Managing the ASE

a) Add a member
d) Delete a member
n) Modify the network configuration
m) Display the status of the members

.

.

.

Enter your choice : m

Member Status

Member:  Host Status:  Agent Status:
tinker   UP            RUNNING
tailor   UP            RUNNING

4. Sample solution

# /usr/sbin/cluster_map_create myase -full
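Steps 2 and 3 can be partly scripted. The following is a minimal sketch, not part of TruCluster; the function names rhosts_missing and members_not_up are hypothetical. It assumes that each /.rhosts entry starts with the host name, as in the sample solution, and that the asemgr Member Status display has been saved or piped in with the three-column layout shown above.

```shell
# Hypothetical pre-flight helpers for the Cluster Monitor setup exercise.
# They are not part of TruCluster; the names are illustrative only.

# rhosts_missing FILE MEMBER...
# Print each member host name that has no entry in FILE (for example
# /.rhosts). Assumes each entry starts with the host name.
rhosts_missing() {
    file=$1
    shift
    for member in "$@"; do
        # awk exits 0 only if some line has the member as its first field.
        if ! awk -v h="$member" '$1 == h { found = 1 } END { exit !found }' "$file"; then
            echo "$member"
        fi
    done
}

# members_not_up
# Read the asemgr Member Status display on standard input and print each
# member whose host status is not UP or whose agent status is not RUNNING.
# Assumes the three-column layout shown in the sample solution:
#   tinker   UP   RUNNING
members_not_up() {
    awk 'NF == 3 && $1 !~ /:$/ && ($2 != "UP" || $3 != "RUNNING") { print $1 }'
}
```

For example, rhosts_missing /.rhosts tinker tailor prints nothing when both members are listed. Run it on every member system, because each member needs its own /.rhosts entries.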

Using the Cluster Monitor: Exercise

1. Start the Cluster Monitor on your workstation.

a. Log in to an ASE member system as root.

b. Set your DISPLAY variable to point to the workstation.

c. Set the session security to allow X access, or use the xhost + command to disable all X client security checks.

d. Run /usr/bin/X11/cmon.

2. Click a member system to show the device view for that member.

3. Click a SCSI bus to display the devices attached to that bus.

4. Press Ctrl while clicking to display more connections.

5. Double-click a member system to display that system’s local SCSI buses and devices.

6. Click the Service icon in the device view above the Cluster Map to switch to the services view.

7. Double-click an ASE service to display its shared storage devices.

8. Relocate an ASE service by dragging its icon to another member system. Use the Action menu (from the menu bar or by pressing MB3) to put a service off line.

Using the Cluster Monitor: Solution

1. The Cluster Monitor top view should appear. If you get a display error, check your DISPLAY environment variable and the session security.

2. See the online help for more information.

3. When you click a bus, the name should highlight, and lines should be drawn showing the connections.

4. When you press Ctrl while clicking, additional connections should be drawn.

5. When you double-click a member system, the host details dialog box appears, displaying the system’s local bus and devices.

6. See the online help for more information.

7. When you double-click a service, the service details dialog box appears, displaying the service’s shared storage devices.

8. See the online help for more information.

Launching Other Tools: Exercise

1. Start the Cluster Monitor and check the online help for information about launching tools.

2. Run each of the tools included in the Cluster Monitor by clicking its icon.

3. Drag the asemgr icon to a service icon to get detailed status of the service. Drag the xterm icon to a member system icon and use the hostname command to verify you are executing on that member system.

4. In CDE, launch the Application Manager by clicking its icon on the CDE Front Panel. Drag the System Information icon from the System Admin group and drop it on a member system icon.

Launching Other Tools: Solution

1. Use the Help menu to access online help. The Tasks section includes a topic on running applications by drag and drop.

2. Each tool should activate its own window.

3. The asemgr utility will run in an xterm window. The result of the hostname command should be the name of the system on which you dropped the xterm icon.

4. The Application Manager icon is a file drawer with tools in it. The System Information icon is in the Daily Admin group. The System Information application should run in its own window.

Monitoring Available Server Configurations with the Cluster Monitor: Exercise

1. Match each of the following Cluster Monitor failure indications with its correct meaning.

Indication                                                Meaning
a. Blank area in the shape of a system icon               a. Service is off line
b. Diagonal line across a system icon                     b. Service is unassigned
c. Diagonal line across a storage icon                    c. System is not reported as an ASE member
d. Circle in the corner of a service icon                 d. Disk error (may be soft errors)
e. Diagonal lines crossing in the corner of service icon  e. Failure of the system

2. Run the Cluster Monitor and check the online help to see what troubleshooting information it provides.

3. Pull a disk or shut down a member system and check the Cluster Monitor to see the effect.

Monitoring Available Server Configurations with the Cluster Monitor: Solution

1. Solution to matching

a. c.

b. e.

c. d.

d. b.

e. a.

2. An introduction to troubleshooting information is provided in the Reference section of the Cluster Monitor online help. It explains each of the icons and suggests tools to use to further diagnose problems.

3. Any services that depend on the failed resource should try to restart on another member in the same ASE domain. The failure of a member system should be indicated by an icon containing a blank space in the shape of a server.

9 Testing, Recovering, and Maintaining TruCluster Configurations

Testing, Recovering, and Maintaining TruCluster Configurations 9–1

About This Chapter

Introduction

This chapter describes how to verify that your ASE services will behave as you expect when hardware failures occur. The chapter also describes how to recover from hardware failures and how to modify your ASE hardware configuration.

The following failures are discussed:

• Member node crash

• Failure of a DWZZA

• Shared disk failure

• Removal of power from a storage enclosure

• Network interface failure

• Network partition

Knowing how ASE fails over services will help you understand how best to configure your ASE services.

You must also know what steps are needed to perform ongoing maintenance tasks.

Objectives

To test TruCluster Available Server failover capability, you should be able to:

• Test the hardware failure conditions that the TruCluster Software detects, and predict how the Available Server Environment responds to these failures

• Recover from hardware failures in an ASE

• Change the hardware configuration in an ASE

Resources

For more information on the topics in this chapter, see the following:

• TruCluster Available Server Software Available Server Environment Administration

• TruCluster Available Server Software Version 1.4 Release Notes

• TruCluster Available Server Software Software Product Description, SPD 44.17.xx

• Reference Pages

Performing TruCluster Testing Procedures

Overview

This section describes how to test whether your Available Server Environment configuration responds properly when failure events occur. The following six failure events are tested:

• System Power Off

• DWZZA-AA Power Off

• Removal of a Shared Disk

• Removing Power from a BA350

• Removal of the Network From One Member

• Removal of the Network From All Members

System Configuration Assumptions

Tests in this section assume that the configuration meets minimum requirements: there must be at least two Alpha system member nodes, with a shared SCSI bus that is properly terminated. Either external or internal DWZZAs can be used. Testing can be performed with shared disks mirrored with LSM, with disks that are not mirrored, or with both.

Observing System Response

Use the Obtaining ASE Status submenu on the ASE Manager’s Main menu to see where each service resides before and after you perform a test. This verifies that the services failed over as you expect.

You can also observe the messages that the TruCluster Software produces by running the tail -f command on a member node that runs the Logger daemon, specifying the current daemon.log file in the /var/adm/syslog.dated/date directory, where date is a date and time stamp directory name such as 11-Jan-10:00.

For example, to observe messages in the directory for January 11 at 10:00 a.m., enter the following on a member node running the Logger daemon:

# tail -f /var/adm/syslog.dated/11-Jan-10:00/daemon.log

As you introduce system failures, the Logger daemon sends messages to the daemon.log file and tail displays them. This gives you immediate feedback as you perform each test. Be aware that network failures can interfere with the sending of log messages.
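Because the dated directory name changes whenever the logs roll over, it can help to locate the current daemon.log automatically. The sketch below assumes that the most recently modified directory under /var/adm/syslog.dated is the current one; the function name latest_daemon_log is hypothetical.

```shell
# latest_daemon_log [BASE]
# Hypothetical helper: print the path of daemon.log in the most recently
# modified dated directory under BASE (default /var/adm/syslog.dated).
# Assumption: the newest directory by modification time is the current one.
latest_daemon_log() {
    base=${1:-/var/adm/syslog.dated}
    # ls -t sorts by modification time, newest first; -d lists the
    # directories themselves rather than their contents.
    newest=$(ls -td "$base"/*/ 2>/dev/null | head -n 1)
    [ -n "$newest" ] || return 1
    # Strip the trailing slash that the */ pattern adds.
    echo "${newest%/}/daemon.log"
}
```

With this helper defined, tail -f "$(latest_daemon_log)" follows the current log without typing the directory name.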

Whenever a message of severity level alert occurs, the message is logged and the Alert script is invoked. This script mails a "Critical ASE error" message to the users specified in the script. The default user specified in the script is root.
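To review alert-level messages after a test without waiting for mail, you can filter daemon.log for them. This sketch assumes, based on the sample messages in this chapter, that alert messages contain the tag ***ALERT: and that the reporting component (HSM, Agent, Director, AseMgr) is the seventh whitespace-separated field of each line. Both function names are hypothetical.

```shell
# alert_messages [FILE...]
# Hypothetical helper: print only the lines tagged ***ALERT: (reads
# standard input when no file is given).
alert_messages() {
    grep -F '***ALERT:' "$@"
}

# alert_summary [FILE...]
# Count alert messages per reporting component. Assumes the component
# name is field 7, as in "Sep 16 14:24:26 tinker ASE: local HSM ***ALERT: ...".
alert_summary() {
    alert_messages "$@" |
        awk '{ count[$7]++ } END { for (c in count) print c, count[c] }' |
        sort
}
```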

Instructor Note

Relocation of each service is based on the service’s Automatic Service Placement (ASP) policy. In general, all tests in this section are highly dependent on how the hardware and software have been configured. The test results you achieve will vary depending on the configuration, but the error message examples should be consistent.

System Power Off Test

This test consists of turning off the power on any member node running a service to produce a Host Down condition. When you power off a member node, the TruCluster Software relocates services to surviving members based on the Automatic Service Placement (ASP) policies in the Available Server Environment configuration database. If the Director daemon was running on the failed system, the Agent daemons on the remaining member nodes elect a new node to start the Director daemon.

In a two-member environment, all services are relocated to the surviving member node unless a service is restricted to run only on the failed member.

The daemon.log file will contain messages similar to those in the following example. In the example, tailor is the member that fails and tinker is a remaining member node. Note that the Director daemon was running on tailor, so tinker starts a new Director.

Example 9–1 System Power Off Messages

Sep 16 14:24:22 tinker ASE: local HSM Warning: Can’t ping tailor over the SCSI bus
Sep 16 14:24:25 tinker ASE: local HSM Warning: Can’t ping tailor over the network
Sep 16 14:24:26 tinker ASE: local HSM ***ALERT: HSM_PATH_STATUS:30.14.80.33:DOWN
Sep 16 14:24:26 tinker ASE: local HSM Warning: member tailor is DOWN
Sep 16 14:24:46 tinker ASE: tinker AseMgr Warning: timeout waiting on Reply
    to ASE_INQ_SERVICES
Sep 16 14:24:47 tinker ASE: tinker AseMgr Notice: director request timed out,
    retrying...
Sep 16 14:24:56 tinker ASE: tinker Agent Notice: agent on tailor should start
    director, but isn’t in RUN state
Sep 16 14:24:56 tinker ASE: tinker AseMgr Warning: timeout waiting on Reply
    to ASE_INQ_SERVICES
Sep 16 14:24:57 tinker ASE: tinker Agent Notice: starting a new director
Sep 16 14:25:00 tinker ASE: tinker Agent Notice: starting service nfsusers
Sep 16 14:25:13 tinker ASE: tinker Director Notice: started service nfsusers
    on tinker
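To confirm where each service was restarted after a test, you can extract the Director's "started service" notices from daemon.log. This is a sketch that assumes each syslog record occupies one line (Example 9–1 is wrapped only for printing) and that the message wording matches Example 9–1; service_placements is a hypothetical name.

```shell
# service_placements [FILE...]
# Hypothetical helper: print "service member" for every
# "Director Notice: started service SERVICE on MEMBER" message.
# Reads standard input when no file is given.
service_placements() {
    sed -n 's/.*Director Notice: started service \([^ ]*\) on \([^ ]*\).*/\1 \2/p' "$@"
}
```

Agent "starting service" notices are ignored, so each line of output reflects a start the Director reported as completed.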

DWZZA-AA Power Off

This test consists of removing power from one of the DWZZA-AA signal converters located on a shared bus connected to a controller. The purpose of the test is to simulate a SCSI path failure. You can also perform this test by disconnecting a tri-link connector on the DWZZA-AA connected to a controller.

As a result, the disks located on the corresponding SCSI bus become inaccessible to the affected member.

The TruCluster Software logs an error message, but takes no further action until I/O to the affected devices is attempted.

The following example shows the error messages produced when a DWZZA is disconnected.

Example 9–2 DWZZA Disconnection Error Messages

Sep 20 10:32:41 tinker ASE: local HSM Warning: Can’t ping tailor over the SCSI bus
Sep 20 10:32:41 tinker ASE: local HSM ***ALERT: HSM_PATH_STATUS:30.14.80.33:UP
Sep 20 10:32:42 tinker ASE: local HSM ***ALERT: network ping to host tailor is
    working but SCSI ping is not
Sep 20 10:22:53 tinker ASE: tinker Agent ***ALERT: device access failure on
    /dev/rz17g from tinker
Sep 20 10:22:58 tinker ASE: tinker Agent Warning: AM can’t ping /dev/rz17g
Sep 20 10:22:58 tinker ASE: tinker Agent Warning: can’t reach device ’/dev/rz17g’

Removing a Shared Disk

This test consists of removing a shared disk during I/O activity without powering off the disk, which is also referred to as a hot disk removal. This test creates a device failure condition.

If LSM mirroring is being used, the volumes should still be available from the mirrored disk so that the service continues to run. The Director daemon logs a message.

If mirroring is not being used, the Director daemon stops the service.

When a service is using mirrored disks and fails because no plexes are available, use the asemgr rereserve function to restart the service. When a service is not using mirrored disks, use the asemgr restart function to restart the service.

The following example shows sample device error messages in a configuration that has mirrored volumes.

Example 9–3 Failed Device Error Messages

Sep 20 11:54:20 tailor ASE: tailor Agent ***ALERT: device access failure on
    /dev/rz17g from tailor
Sep 20 11:54:25 tailor ASE: tailor Agent Warning: AM can’t ping /dev/rz17g
Sep 20 11:54:25 tailor ASE: tailor Agent Warning: can’t reach device
    ’/dev/rz17g’
Sep 20 11:54:35 tailor ASE: tailor Agent Notice: /var/ase/sbin/lsm_lv_action:
    Using default setting of ASE_PARTIAL_MIRRORING=OFF
Sep 20 11:54:35 tailor ASE: tailor Agent Notice: /var/ase/sbin/lsm_lv_action:
    LSM plex "users-01" is not enabled
Sep 20 11:54:35 tailor ASE: tailor Agent Notice: /var/ase/sbin/lsm_lv_action:
    LSM plex "users-02" is OK, volume "users-vol" can continue to run
Sep 20 11:54:35 tailor ASE: tailor Agent Notice: /var/ase/sbin/lsm_lv_action:
    Device "/dev/vol/users-dg/users-vol" passed the LSM volume query
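For mirrored volumes, the lsm_lv_action notices show which plexes are still usable. The following sketch reduces those notices to one line per plex. It assumes the message wording shown in Example 9–3 and one syslog record per line; the function name plex_report is hypothetical.

```shell
# plex_report [FILE...]
# Hypothetical helper: print "plex disabled" for each
# 'LSM plex "X" is not enabled' notice and "plex ok" for each
# 'LSM plex "X" is OK' notice. Other messages are ignored.
plex_report() {
    sed -n \
        -e 's/.*LSM plex "\([^"]*\)" is not enabled.*/\1 disabled/p' \
        -e 's/.*LSM plex "\([^"]*\)" is OK.*/\1 ok/p' \
        "$@"
}
```

If every plex of a volume is reported disabled, the service cannot continue on mirrored data, and the recovery procedures later in this chapter apply.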

Disk recovery procedures are discussed later in this chapter.

Removing Power from a BA350

This test removes power from a BA350 on a shared SCSI bus, which also simulates a device failure. This is similar to removing a shared disk. In a configuration with a single BA350, no member node can access any of the disks. Therefore, any associated disk services become "unassigned" until power is reapplied to the BA350.

If a second BA350 had been configured to mirror the failed BA350, the service would continue to run, with data I/O continuing on the mirrored BA350.

The following example shows a portion of the messages generated when a BA350 is powered off.

Example 9–4 BA350 Power Off Error Messages

Sep 20 10:42:07 tailor ASE: tailor Agent Warning: AM can’t ping /dev/rz33g
Sep 20 10:42:07 tailor ASE: tailor Agent Warning: can’t reach device
    ’/dev/rz33g’
Sep 20 10:42:07 tailor ASE: tailor Agent Warning: can’t reserve /dev/rz33g
Sep 20 10:42:07 tailor ASE: tinker Agent Error: can’t unreserve device
Sep 20 10:42:09 tailor ASE: tailor Agent Warning: AM can’t ping /dev/rz33g
Sep 20 10:42:09 tailor ASE: tailor Agent Warning: can’t reach device
    ’/dev/rz33g’
Sep 20 10:42:09 tailor ASE: tailor Agent ***ALERT: possible device failure:
    /dev/rz33g

Removing One Member from the Network

This test causes a network interface failure by disconnecting all configured ASE network connections from one of the ASE members. Because the ASE software continuously pings the network, it detects the loss of network connection to a member almost immediately.

When the network is lost to a member, the Agent daemon stops all services on that member and the Director daemon relocates them to a remaining member, as defined by each service’s Automatic Service Placement (ASP) policy.

The following example shows the error messages for a network interface failure.

Example 9–5 Error Messages When One Member Removed from Network

Sep 20 12:07:45 tinker ASE: local HSM Warning: Can’t ping tailor over the network
Sep 20 12:07:45 tinker ASE: local HSM ***ALERT: HSM_PATH_STATUS:30.14.80.33:DOWN
Sep 20 12:07:49 tinker ASE: local HSM Warning: member tailor is disconnected
    from the network
Sep 20 12:07:50 tinker ASE: tinker Agent ***ALERT: member ’tailor’ cut off from net
Sep 20 12:08:02 tinker ASE: tinker Agent Notice: starting service nfsusers
Sep 20 12:08:20 tinker ASE: tinker Director Notice: started service nfsusers
    on tinker
Sep 20 12:08:20 tinker ASE: tinker Director Notice: finished processing agent
    state change from HSM: agent tailor state NIT_DOWN

Removing All Members from the Network

This test causes a network partition by disconnecting all configured ASE network connections from all the ASE members.

When the network is lost among all the ASE members, the ASE services continue to run, but the Director daemon exits, so the TruCluster Available Server configuration no longer provides failover or administrative functions until the network partition is repaired.

The following example shows the error messages for a network partition.

Example 9–6 Error Messages When All Members Removed from Network

Sep 20 12:18:35 tinker ASE: local HSM Warning: \
    Network interface ln0 30.14.80.33:DOWN
Sep 20 12:18:35 tinker ASE: local HSM ***ALERT: \
    HSM_NI_STATUS:30.14.80.33:DOWN
Sep 20 12:18:37 tinker ASE: local Simulator Notice: \
    snd: exiting...
Sep 20 12:18:38 tinker ASE: tinker Director ***ALERT: \
    Network connection down... exiting
Sep 20 12:18:38 tinker ASE: tinker Director Warning: \
    Director exiting...
Sep 20 12:18:45 tinker ASE: tinker Agent Notice: \
    /var/ase/sbin/lsm_dg_action: voldg: Disk group \
    users-dg: Some volumes in the disk group are in use
Sep 20 12:18:45 tinker ASE: tinker Agent Notice: \
    /var/ase/sbin/lsm_dg_action: voldg deport \
    of disk group users-dg failed
Sep 20 12:18:45 tinker ASE: tinker Agent Notice: \
    /var/ase/sbin/lsm_dg_action: fsgen/volume: \
    Warning: Volume users-vol in use by another utility
Sep 20 12:18:50 tinker ASE: tinker AseMgr Error: \
    we’re net partitioned from the director
Sep 20 12:18:51 tinker ASE: tinker AseMgr ***ALERT: \
    Net partition or disconnect - cannot find a director.
Sep 20 12:18:51 tinker ASE: tinker AseMgr Error: \
    Unable to open the database.

Recovering from Failures in the ASE

Instructional Strategy

Instructor Note

This section describes procedures that a system administrator may need to use when certain disk errors occur. These procedures are provided as examples for student consideration when recovering a failed disk. The TruCluster Software automatically recovers from other failures.

Overview

Recovery from the following failures is performed automatically:

• Member node

• SCSI path

• Network interface

• Network partition

Disk device failures require manual recovery.

This section describes procedures for replacing an LSM mirrored disk and for replacing a nonmirrored disk if either disk fails during normal operations.

LSM Mirrored Disk Replacement

When a disk under LSM control in an ASE becomes faulty, the Available Server Environment Alert script generates Alert messages. The messages identify the faulty disk by including its /dev/rz device name along with error-specific information.

With mirrored disks, the remaining plexes (the mirrors of the data on the faulty disk) provide users with transparent access to the data. However, field service engineers must still replace the bad disk. When the disk is replaced, you must bring the new disk on line. The following steps explain how to use LSM bottom-up commands to accomplish this task.

Perform the following steps to replace a shared disk under LSM control in an ASE.

1. Obtain LSM disk group information.

2. Remove faulty disk from the LSM database.

3. Restore the partition table.

4. Initialize the new disk for use with LSM.

5. Associate the new disk with a disk media name and a disk group.

6. Recover the plex from the working plex.

7. Rereserve the corresponding service devices.

These steps are described in the following sections.
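Steps 2 through 6 use only LSM and disklabel commands, so they can be collected into a dry-run helper that prints the commands for review before you run them. This is a sketch, assuming the arguments (disk group, device, disk media name, disk type, and saved label file) come from the volprint output of step 1; lsm_replace_cmds is a hypothetical name, and steps 1 and 7 remain manual. The voldg subcommand used in step 5 is adddisk.

```shell
# lsm_replace_cmds DISKGROUP DEVICE MEDIANAME DISKTYPE LABELFILE
# Hypothetical dry-run helper: print, but do not run, the commands for
# steps 2 through 6 of the LSM disk replacement procedure.
lsm_replace_cmds() {
    if [ $# -ne 5 ]; then
        echo "usage: lsm_replace_cmds diskgroup device medianame disktype labelfile" >&2
        return 2
    fi
    dg=$1 dev=$2 dm=$3 dtype=$4 label=$5
    echo "voldg -g $dg rmdisk $dm"              # step 2: remove the disk media name
    echo "voldisk rm $dev"                      # step 2: remove the disk access record
    echo "disklabel -r -R $dev $label $dtype"   # step 3: restore the partition table
    echo "voldisk -f init $dev"                 # step 4: initialize the disk for LSM
    echo "voldg -g $dg -k adddisk $dm=$dev"     # step 5: associate media name and group
    echo "volrecover -g $dg -sb"                # step 6: recover in the background
}
```

For the example in this section, lsm_replace_cmds db rz49 pd-rz49 rz26l /usr/ASE_SERVICES/disklabel_rz49 prints the six commands in order. After checking them on the member running the service, you could pipe the output to sh.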

Obtaining LSM Disk Group Information

Use the volprint command on the member running the service to obtain information on the failed disk. The following example displays all records in all disk groups and their associations, for a configuration in which an RZ26L disk named rz49 is faulty.

Example 9–7 LSM Disk Group Information

# volprint -hA

TYPE  NAME          ASSOC   KSTATE    LENGTH    COMMENT
dg    rootdg        rootdg  -         -         Disk group
dm    rz2g          rz2g    -         4096      Disk media

dg    db            db      -         -         Disk group
dm    pd-rz33       rz33    -         2050347   Disk media
dm    pd-rz34       rz34    -         2050347   Disk media
dm    pd-rz35       rz35    -         2050347   Disk media
dm    pd-rz41       rz41    -         2050347   Disk media
.
.
.
dm    pd-rz49       -       -         2050347   Disk media
dm    pd-rz50       rz50    -         2050347   Disk media
dm    pd-rz51       rz51    -         2050347   Disk media
.
.
.
vol   vold          fsgen   ENABLED   12301824  Volume
plex  db-01         vold    ENABLED   12301824  Plex nbr 1
SD    pd-rz33-data  db-01   -         2050347   Sub-disk
SD    pd-rz34-data  db-01   -         2050347   Sub-disk
SD    pd-rz35-data  db-01   -         2050347   Sub-disk
SD    pd-rz41-data  db-01   -         2050347   Sub-disk
plex  db-02         vold    DISABLED  12301824  Plex nbr 2
SD    pd-rz49-log   db-02   -         2050347   Sub-disk
SD    pd-rz49-data  db-02   -         2050347   Sub-disk
.
.
.

The listing provides the following information on rz49 :

• Disk group name - db

• Physical disk name - rz49

• Disk media name - pd-rz49

• Subdisk names - pd-rz49-log and pd-rz49-data

• Plex name - db-02

• Volume name - vold

The listing also shows that because the disk failed during I/O, the db-02 plex was automatically disabled, as indicated in the KSTATE column.
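In a large configuration, you can pick out the affected plex and subdisks from volprint -hA output mechanically. This sketch assumes the column layout of Example 9–7 (TYPE, NAME, ASSOC, KSTATE, and so on) and that each plex record precedes its subdisk records; disabled_plex_subdisks is a hypothetical name.

```shell
# disabled_plex_subdisks
# Hypothetical helper: read volprint -hA output on standard input and
# print "plex subdisk" for every subdisk whose plex has KSTATE DISABLED.
disabled_plex_subdisks() {
    awk '
        # Remember plexes that the kernel has disabled.
        $1 == "plex" && $4 == "DISABLED" { bad[$2] = 1 }
        # Subdisk records name their plex in the ASSOC column.
        tolower($1) == "sd" && ($3 in bad) { print $3, $2 }
    '
}
```

For the listing above, volprint -hA | disabled_plex_subdisks would report the two pd-rz49 subdisks of plex db-02.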

You can confirm the failed disk information using the voldisk command, as shown in the following example.

Example 9–8 Confirming Failed Disk Information

# voldisk list

DEVICE  TYPE    DISK     GROUP   STATUS
rz1g    simple  rz1g     rootdg  online
rz33    sliced  pd-rz33  db      online
rz34    sliced  pd-rz34  db      online
rz35    sliced  pd-rz35  db      online
rz36    sliced  pd-rz36  db      online
rz41    sliced  pd-rz41  db      online
rz42    sliced  pd-rz42  db      online
rz43    sliced  pd-rz43  db      online
rz49    sliced  pd-rz49  db      online
.
.
.
rz50    sliced  pd-rz50  db      online
rz51    sliced  pd-rz51  db      online

                pd-rz49  db      failed - was rz49

Note that the rz49 device is disassociated from the disk media name pd-rz49, and that the STATUS column for pd-rz49 shows failed.
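The same check can be scripted against voldisk list output. The sketch below prints the disk media name, disk group, and former device of each failed disk. It accepts both the failed - was rz49 form shown above and a failed was:rz49 form, since the exact layout may differ between LSM versions; failed_lsm_disks is a hypothetical name.

```shell
# failed_lsm_disks
# Hypothetical helper: read "voldisk list" output on standard input and
# print "medianame group formerdevice" for each disk whose status is failed.
failed_lsm_disks() {
    awk '{
        for (i = 3; i <= NF; i++) {
            if ($i == "failed") {
                # The former device name is the last field,
                # possibly written as was:rzNN.
                dev = $NF
                sub(/^was:/, "", dev)
                # Media name and group are the two fields before "failed".
                print $(i - 2), $(i - 1), dev
                break
            }
        }
    }'
}
```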

Removing Faulty Disk from LSM Database

You must remove the faulty disk’s disk media name from the disk group, and its disk access record from the LSM configuration database, before you can use the disklabel command. The following example shows how to remove rz49 from the LSM configuration databases.

# voldg -g db rmdisk pd-rz49

# voldisk rm rz49

Restoring the Partition Table

Once the disk is removed from the LSM database, you must restore the partition table on the new disk. For an RZ26L with a device name of rz49, use the disklabel command as follows:

# disklabel -r -R rz49 /usr/ASE_SERVICES/disklabel_rz49 rz26l

Initializing the Disk for LSM

To initialize rz49 as a sliced disk, with rz49g as the public region and rz49h as the private region, enter:

# voldisk -f init rz49

Associating the New Disk

Next, associate the new disk with a disk media name and a disk group. Here is an example of associating rz49 with the db disk group under the disk media name pd-rz49.

# voldg -g db -k adddisk pd-rz49=rz49

Recovering the Plex

The last LSM step is to recover the plex from the working plex. This operation does not interfere with normal database service provided by the TruCluster Available Server, except for a temporary degradation in system performance while the plex is restored. The recovery process can take several hours to complete, depending on the size of the volume and the processing capability of the system.

The volrecover command restores the plex. For example, to recover the volumes in the db disk group in the background:

# volrecover -g db -sb

Note

You must never interrupt a volrecover operation with a kill -9 command, because the volume and the plex will become locked. If kill -9 is invoked, you must issue commands similar to the following to restart the recovery process:

# voledit -g db -P set tutil0="" db-02
# voledit -g db -v set tutil0="" vold

Use the volprint command to check the recovery process as follows:

# volprint -g db -mp db-02 | grep iomode

When the iomode equals RW (Read/Write), the recovery operation is complete.
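Because recovery can take hours, you may prefer to poll rather than rerun volprint by hand. This sketch assumes that the volprint -m output contains the text iomode=RW once recovery is complete; verify this against the output on your system. The function name wait_for_plex_rw and its default 60-second interval are hypothetical choices.

```shell
# wait_for_plex_rw DISKGROUP PLEX [INTERVAL]
# Hypothetical helper: poll volprint every INTERVAL seconds (default 60)
# until the plex reports iomode=RW, then print a completion message.
# Assumption: recovered plexes show the literal text "iomode=RW".
wait_for_plex_rw() {
    dg=$1 plex=$2 interval=${3:-60}
    until volprint -g "$dg" -mp "$plex" | grep -q 'iomode=RW'; do
        sleep "$interval"
    done
    echo "plex $plex in disk group $dg: iomode=RW, recovery complete"
}
```

For the example above, wait_for_plex_rw db db-02 returns once plex db-02 is fully recovered.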

Rereserving the Service

You must finish incorporating the LSM disk into the ASE service by rereserving the devices associated with the service.

1. Run the asemgr utility.

2. Choose the Advanced Utilities submenu item from the Managing ASE Services menu.

3. Choose Rereserve a service’s devices (LSM only).

4. Select the service associated with the disk that you want to rereserve.

Replacing a Nonmirrored Disk

If a disk that is not part of an LSM mirrored volume or a mirrored RAID device needs to be replaced, you must do one of the following:

• Remove the disk from the service by modifying the service and deleting the disk. Once you replace the disk, modify the service and add the disk. Select the Modify a service item on the Service Configuration menu.

• Temporarily stop the service by setting the service off line. Once you replace the disk, set the service on line. Select the Set a service off line and the Set a service on line items in the Managing ASE Services menu.

Unassigned Service

A service becomes unassigned when the TruCluster Software cannot start the service on any eligible member, such as when there is a disk failure. The Availability Manager notifies the Agent daemon of a device path failure or device failure. If the affected disk is not mirrored with LSM, the Agent stops the service and notifies the Director daemon. The Director daemon then attempts to start the service on each of the other members in turn.

If asemgr shows a service status of unassigned, check the log files that the Logger daemon writes to, such as the daemon.log file, to determine the cause.

Once the hardware is working, you must restart the service manually by selecting the Restart a service item in the Managing ASE Services menu.

Resetting TruCluster Daemons

If you experience problems, you can reset the TruCluster daemons. This stops the Director and Host Status Monitor daemons and initializes the Agent daemons. The Agent daemons then restart the other daemons to make the TruCluster Software operational.

To reset the TruCluster daemons, issue the following command:

# /sbin/init.d/asemember restart

Performing Ongoing Maintenance Tasks

Overview

After you have set up your TruCluster Available Server configuration and it has been running for a while, events may occur that require you to:

• Change the hardware configuration

• Add or remove systems

• Add or remove storage boxes

• Add or remove disks

This section discusses how to accomplish these tasks.

Changing Hardware Configuration

The biggest concern when changing the hardware configuration is proper termination of the shared buses. If the original configuration used Y cables and DWZZAs, you may be able to isolate a device while keeping the bus terminated, so that you can perform maintenance without affecting the rest of the system.

If the original system was not configured to isolate devices, you must stop all TruCluster Software daemon activity before changing the configuration, and then restart the TruCluster Software once you have made the changes.

Stopping and Restarting TruCluster Daemon Activity

Before performing maintenance on a device that you cannot isolate while keeping the shared bus terminated, you must set all the ASE services off line and then stop all TruCluster Software daemon activity. To stop all TruCluster Software daemon activity, invoke the asemember utility on all ASE members as follows:

# /sbin/init.d/asemember stop

Once you have reconfigured the system, restart TruCluster Software daemon activity as follows:

# /sbin/init.d/asemember start

After you issue the asemember start command, you can set the ASE services back on line.
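Because /.rhosts already allows root rsh access between members (as set up for the Cluster Monitor), the stop and start commands can be issued to every member from one system. This is a sketch: asemember_all is a hypothetical name, it does not set services off line or on line for you, and the exit status of rsh generally reflects whether the remote connection succeeded, not whether asemember itself succeeded.

```shell
# asemember_all ACTION MEMBER...
# Hypothetical helper: run /sbin/init.d/asemember ACTION (stop or start)
# on each listed member through rsh. Stops at the first member where
# rsh itself fails; check each member's output for asemember errors.
asemember_all() {
    action=$1
    shift
    case $action in
        stop|start) ;;
        *) echo "usage: asemember_all stop|start member..." >&2; return 2 ;;
    esac
    for member in "$@"; do
        echo "$member: asemember $action"
        rsh "$member" /sbin/init.d/asemember "$action" || return 1
    done
}
```

For example, run asemember_all stop tinker tailor after setting the services off line, and asemember_all start tinker tailor once the hardware change is complete.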

Adding and Removing Member Nodes

When you add a member node, the shared bus is unterminated at some point while the new member is connected. Therefore, you must stop all TruCluster Software daemon activity before you add the new member.

When removing a system, you do not have to stop TruCluster Software daemon activity if the shared bus remains properly terminated. With a properly terminated bus, you can:

• Delete the member system from the ASE

• Shut down and power off the system

• Disconnect the system from the shared bus

Adding and Removing Storage Boxes

When adding a storage box, you must stop all TruCluster Software daemon activity, because the shared bus will be unterminated at some point in the configuration and will need to be reconfigured.

When removing a storage box, you do not have to shut down TruCluster Software daemon activity, provided that removing the storage box will not leave the bus unterminated. Make sure that no services are using the disks in the storage box that you are removing, unless the disks are part of an LSM mirrored volume or a hardware-mirrored RAID device.

To take a shared disk off line, ensure that a running service is not using the disk, unless the disk is part of a mirrored volume or a mirrored RAID device. If the disk is being used by a service, use asemgr to put the service off line. Use asemgr to restart the service when you bring the disk back on line.

If the disk is part of an LSM mirrored set and the service is running, you must use the ASE Manager rereserve function when you bring the disk back on line. To use this function, choose a) Advanced Utilities from the Managing ASE Services menu. The Advanced Utilities submenu has the following menu item:

r) Rereserve a service’s devices (LSM only)

Adding and Removing Disks

To add a disk to your hardware configuration:

1. Install the disk in the storage box.

2. Note the disk’s unique SCSI ID. The SCSI ID for a disk in a BA350 storage box corresponds to its slot number. The SCSI IDs for disks in a BA353 storage box are set by the device address switch on the back of the box. On an 8-bit SCSI bus, the SCSI specification limits the number of nodes or devices to eight, and each SCSI device or controller must have a unique SCSI ID from 0 to 7. The HSZ10 uses only one SCSI ID, while the HSZ40 can be configured with 1 to 4 SCSI IDs.

3. If necessary, update the system configuration files to ensure that the system recognizes the new disk.

4. If you are using the disk in an AdvFS domain or LSM volume, you must also perform the appropriate steps to add the disk to the file system or volume.
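Before installing a disk, you can check the planned IDs for the whole shared bus against the 0 to 7 rule. This sketch assumes an 8-bit bus and that you list every ID in use on it, including the host adapter on each member and each ID configured on an HSZ40; check_scsi_ids is a hypothetical name.

```shell
# check_scsi_ids ID...
# Hypothetical helper: check that every SCSI ID planned for an 8-bit
# shared bus is in the range 0 to 7 and that no ID appears twice.
# Prints one line per problem and returns 1 if any problem is found.
check_scsi_ids() {
    rc=0
    seen=" "
    for id in "$@"; do
        # Only the single digits 0 through 7 are valid on a narrow bus.
        case $id in
            [0-7]) ;;
            *) echo "invalid SCSI ID: $id"; rc=1; continue ;;
        esac
        # seen holds the IDs accepted so far, each surrounded by spaces.
        case $seen in
            *" $id "*) echo "duplicate SCSI ID: $id"; rc=1 ;;
            *) seen="$seen$id " ;;
        esac
    done
    return $rc
}
```

For example, two members with host adapters at IDs 7 and 6 and disks at 0, 1, and 2 pass the check with check_scsi_ids 7 6 0 1 2.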

If you are removing a disk from a storage box, ensure that a running service is not using the disk, unless the disk is part of an LSM mirrored volume or a mirrored RAID device. Use asemgr to stop any services using a disk that you want to remove or replace.

Summary

PerformingTruClusterSoftwareTestingProcedures

Testing procedures give you confidence that the TruClusterSoftware configuration you have built will fail over as expected.Some of the tests that you can run are:

• Power off a member node to create a Host Down condition.

• Power off a DWZZA-AA to cause a SCSI path failure.

• Remove a shared disk to create a device failure condition.

• Remove power from the BA350 to produce another device failure.

• Remove a member node from the network to create a network interface failure.

• Remove all member nodes from the network to create a network partition.

Performing Disk Recovery Procedures

When a disk becomes faulty, the procedure you must follow to recover from the failure depends on whether the disk is mirrored with LSM or is nonmirrored.

Performing Ongoing Maintenance Tasks

There are a number of maintenance tasks you must perform as conditions change on your system. These tasks include:

• Changing the hardware configuration

• Adding and removing disk storage boxes

• Adding and removing disks

• Adding and deleting members


Exercises

Member Node Failure: Exercise

Perform a member node failure test. How does the TruCluster Software respond?

Member Node Failure: Solution

To test a member node failure:

a. Use the tail utility on an unaffected member node running the Logger daemon to observe the entries in the daemon.log file. Locate the messages associated with services that failed over to another member node (for example, tinker).

# tail -f /var/adm/syslog.dated/26-Mar-16:22/daemon.log

b. Turn off the power on one of the ASE members (for example, tailor) running a particular service (such as the ase4 service).

c. Observe the error messages that were generated.

d. Confirm that services on the powered-off member node tailor failed over to another member node, tinker, as defined by the Automatic Service Placement policy. This assumes that the service is not restricted to run only on tailor. Use the asemgr menu selections to display the status of the ase4 service:

Status for NFS service ‘ase4’

Status:      Relocate:   Placement Policy:   Favored Member(s):
on tinker    no          Favor Member(s)     tinker

NFS service ‘ase4’ exports
#
# ASE exports file for service ase4
#

#loopback#fset1 exports (after this line) - DO NOT DELETE THIS LINE
/share/test/loopback -r=0 cluster_dev

File system(s):
AdvFS set: loopback#fset1      mount options: rw
AdvFS domain: loopback         devices: /dev/vol/dg2/vol01
LSM Disk Group: dg2            devices: rz19c rz25c

Network Interface Test: Exercise

Perform a network interface test. How does the TruCluster Software respond?

Network Interface Test: Solution

To test a network interface failure:

a. Use the tail utility on an unaffected ASE member to observe the entries in the daemon.log file.

b. Disconnect all configured ASE network connections from an ASE member node (for example, tailor) running a particular service (such as the ase4 service).


c. Confirm that services on the disconnected member node tailor failed over to another member node, tinker, as defined by the Automatic Service Placement policy. This assumes that the service is not restricted to run only on tailor. Use the asemgr menu selections to display the status of the ase4 service:

Status for NFS service ‘ase4’

Status:      Relocate:   Placement Policy:   Favored Member(s):
on tinker    no          Favor Member(s)     tinker

NFS service ‘ase4’ exports
#
# ASE exports file for service ase4
#

#loopback#fset1 exports (after this line) - DO NOT DELETE THIS LINE
/share/test/loopback -r=0 cluster_dev

File system(s):
AdvFS set: loopback#fset1      mount options: rw
AdvFS domain: loopback         devices: /dev/vol/dg2/vol01
LSM Disk Group: dg2            devices: rz19c rz25c

Recovering from Failures in the ASE: Exercise

What procedure would you use to remove a nonmirrored disk from an ASE service?

Recovering from Failures in the ASE: Solution

To remove a nonmirrored disk from an ASE service:

a. Choose Modify a service from the Service Configuration item in the asemgr Managing ASE Services menu and delete the disk from the service.

b. Set the service off line.

c. Replace the disk.

d. Choose Modify a service from the Service Configuration item in the asemgr Managing ASE Services menu and add the new disk to the service.

e. Set the service back on line.

Performing Ongoing Maintenance Tasks: Exercise

Describe the commands used to start and stop all TruCluster Software daemon activity.


Performing Ongoing Maintenance Tasks: Solution

• To stop all TruCluster Software daemon activity, use the following command:

# /sbin/init.d/asemember stop

• To restart TruCluster Software daemon activity, use the following command:

# /sbin/init.d/asemember start


10 Troubleshooting TruCluster Configurations


About This Chapter

Introduction

This chapter describes ways to troubleshoot a TruCluster Available Server configuration that is not functioning properly. It focuses on three main topics:

• Interpreting error messages

• Learning troubleshooting procedures

• Using system monitoring tools to diagnose problems

Used in combination, the information covered in these topics can help you to isolate and resolve problems in TruCluster Available Server configurations.

Objectives

To understand how to troubleshoot a TruCluster Available Server configuration, you should be able to:

• Describe basic techniques for troubleshooting TruCluster configurations

• Interpret TruCluster Software error log messages and use that information to assess problem causes

• Describe the recommended procedures for troubleshooting TruCluster Software configurations

• Use the system monitoring tools to gauge whether a TruCluster Software implementation is functioning properly

Resources

For more information on the topics in this chapter, see the following:

• TruCluster Available Server Software Available Server Environment Administration

• TruCluster Available Server Software Hardware Configuration and Software Installation

• TruCluster Available Server Software Version 1.4 Release Notes

• Reference Pages


Introducing ASE Troubleshooting Techniques

Overview

TruCluster Available Server implementations involve the complex interaction of hardware and software in customized solutions. Because TruCluster configurations vary so widely, it is difficult to generalize about the malfunctions that may occur.

This chapter presents a general overview of the techniques and tools available for TruCluster Available Server troubleshooting.

Troubleshooting Topics

The three main troubleshooting topics in this chapter cover the following:

• Interpreting error messages
The TruCluster Software keeps a record of significant events by generating error log messages. This section describes how to interpret these messages.

• Following troubleshooting procedures
It is important to identify and resolve failures in TruCluster Available Server configurations with as little down time as possible. This section describes procedures that promote efficient troubleshooting of TruCluster problems.

• Using system monitoring tools
System monitoring tools allow you to determine whether your TruCluster Available Server configuration is functioning properly. This section describes some of the most useful tools for monitoring TruCluster configurations.


Troubleshooting Sequence

The key to successful troubleshooting is to isolate the source of the problem quickly. To support this goal, the information in this chapter is presented in an order that follows this recommended troubleshooting strategy:

1. If possible, examine the error log messages to isolate the component of your TruCluster configuration that is malfunctioning.

2. Follow troubleshooting procedures to pinpoint the problem cause.

3. Use system monitoring tools to diagnose the problem.

4. Formulate and apply a recovery plan.

Following this strategy can help you to find and correct a problemwith a minimum of extraneous effort.

Figure 10–1 Troubleshooting Strategy

[Flowchart: Examine Error Log and Alert Messages; Follow Troubleshooting Procedures; Apply System Monitoring Tools; Formulate and Apply Recovery Plan]


Interpreting TruCluster Error Log Messages

Overview

Examining the TruCluster log messages is one of the best ways to determine the cause of a problem in an ASE. This section reviews how to access and interpret these messages.

TruCluster Logger Daemon

The TruCluster Logger daemon (aselogger) tracks the messages generated by all the ASE member systems. The Logger daemon uses the event logging facility, syslog, which collects messages logged by various kernel, command, utility, and application programs.

Messages sent to the daemon.log file can be generated by the following software components:

• AseMgr — The asemgr utility

• Director — The Director daemon

• Agent — The Agent daemon

• AseLogger — The Logger daemon

• HSM — The Host Status Monitor daemon

• AseUtility — A process or daemon unrelated to TruCluster

Error messages generated by the Availability Manager (AM) driver are sent to the kern.log file.

Log messages are logged to a local file or forwarded to a remote system, as specified in the system’s /etc/syslog.conf file. Log messages generated by the HSM daemon and the AM driver are logged only to the local system. If all Logger daemons in the ASE stop, daemon messages continue to be logged, but only locally.

To find the TruCluster message logs, identify a member system running the Logger daemon, then check its system log files. Log files are located in /var/adm/syslog.dated/date, where date is a date and time stamp subdirectory (such as 10-Feb-09:07). The daemon.log files contain messages generated by the various TruCluster Software daemons according to the severity level set with asemgr. Error messages generated by the Availability Manager do not contain severity levels.
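On a member running the Logger daemon, a script can locate the newest daemon.log for you. The following Python sketch is one way to do it, assuming the directory layout described above; the function name is hypothetical. It picks the date subdirectory with the latest modification time, because names such as 10-Feb-09:07 carry no year and therefore do not sort chronologically by name.

```python
import os


def latest_daemon_log(base="/var/adm/syslog.dated"):
    """Return the path of daemon.log in the most recently modified
    date subdirectory of `base`, or None if no such file exists."""
    candidates = []
    with os.scandir(base) as entries:
        for entry in entries:
            log_path = os.path.join(entry.path, "daemon.log")
            if entry.is_dir() and os.path.isfile(log_path):
                # Compare modification times, not names: "10-Feb-09:07"
                # has no year, so lexical order is not chronological.
                candidates.append((entry.stat().st_mtime, log_path))
    if not candidates:
        return None
    return max(candidates)[1]
```

The returned path can then be watched with tail -f, as in the failover exercises in the previous chapter.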


Interpreting Log Messages

TruCluster log messages include the following information:

• Time stamp

• Local system name

• ASE identifier (not used in messages from the Availability Manager driver)

• System that generated the message (or local)

• Source of the message:

AseMgr asemgr utility

Director Director daemon

Agent Agent daemon

HSM HSM daemon

AseLogger Logger daemon

AM Availability manager driver

vmunix Kernel image

AseUtility Command executed by an action script

• Severity of the message (not provided in AM messages)

• Message text

Figure 10–2 shows the location of the information in a sample daemon.log message.

Figure 10–2 A daemon.log File Entry

Sep 27 15:38:20 tailor ASE: tailor Agent Notice: AseMgr on 'tailor' disconnected

[Figure callouts, left to right: date and time stamp (Sep 27 15:38:20); name of the local system (tailor); ASE identifier (ASE:); member node on which the event originated (tailor); daemon that generated the message (Agent); event severity level (Notice:); message text (AseMgr on 'tailor' disconnected)]
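When you search a long daemon.log, it can help to split each entry into the fields shown in Figure 10–2. The following Python sketch does this with a regular expression. It assumes the field order of the figure and that the ASE identifier and the severity are each followed by a colon; it handles only that daemon.log format, so lines in other formats (such as Availability Manager messages, which lack an ASE identifier and a severity) may fail to match or may be split incorrectly.

```python
import re

# Field order follows Figure 10-2. Assumptions: fields are separated by
# whitespace, and the ASE identifier and severity are each followed by ":".
DAEMON_LOG_ENTRY = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<local_system>\S+)\s+"
    r"(?P<ase_id>\S+):\s+"
    r"(?P<origin>\S+)\s+"
    r"(?P<source>\S+)\s+"
    r"(?P<severity>\w+):\s+"
    r"(?P<text>.*)$"
)


def parse_daemon_log_entry(line):
    """Split one daemon.log line into named fields, or return None."""
    match = DAEMON_LOG_ENTRY.match(line.rstrip("\n"))
    return match.groupdict() if match else None


entry = parse_daemon_log_entry(
    "Sep 27 15:38:20 tailor ASE: tailor Agent Notice: AseMgr on 'tailor' disconnected"
)
print(entry["source"], entry["severity"], entry["text"])
# → Agent Notice AseMgr on 'tailor' disconnected
```

Filtering the parsed entries on the source or severity field then narrows the log to, for example, Agent messages from one member.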


Alert Messages

An alert is triggered when the TruCluster Software detects a circumstance that requires user intervention to maintain normal operation of the ASE. Such events include disk access failure, network access failure, and failure to read the TruCluster configuration database.

When an alert condition occurs, the TruCluster Software prints an Alert message in the log. It also writes the Alert message into the file /var/ase/tmp/alertMsg and executes a user-defined script. The example script shipped with TruCluster mails the error text to a list of users. However, you can write your own scripts to perform other tasks in response to an alert, such as dialing a beeper number. Scripts can also attempt to parse the alert text in the alertMsg file.

Figure 10–3 depicts the two paths an Alert message takes when an alert condition occurs.

Figure 10–3 Alert Message Paths

[Figure: a TruCluster alert error condition sends the Alert message along two paths: to the Logger daemon, which writes it to the log files in /var/adm/syslog.dated/date, and to the alert script actions]
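The example alert script shipped with TruCluster is not reproduced here. As an illustration of what a replacement script might do, the following Python sketch reads /var/ase/tmp/alertMsg and builds, but does not send, a mail message for a list of recipients. The recipient addresses, the sender, and the subject format are hypothetical, and the sketch does not cover how ASE invokes the alert script (its name, location, or arguments).

```python
from email.message import EmailMessage

ALERT_FILE = "/var/ase/tmp/alertMsg"  # written by TruCluster when an alert occurs


def build_alert_mail(recipients, alert_path=ALERT_FILE, sender="root@localhost"):
    """Build (but do not send) a mail message carrying the alert text."""
    with open(alert_path) as f:
        text = f.read()
    # Use the first non-blank line of the alert as the subject.
    first_line = next((line for line in text.splitlines() if line.strip()), "(empty alert)")
    msg = EmailMessage()
    msg["Subject"] = "ASE alert: " + first_line.strip()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content(text)
    return msg


# To send the message, a script could hand it to a local mail server, for example:
#   import smtplib
#   with smtplib.SMTP("localhost") as smtp:
#       smtp.send_message(build_alert_mail(["ops@example.com"]))
```

Separating message construction from delivery keeps the parsing logic testable; a beeper or paging action could reuse the same subject line.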


Learning Troubleshooting Procedures

Diagnosing Active Systems

This section discusses procedures that can help you to diagnose and fix TruCluster Available Server problems with a minimum of wasted effort.

You should try to identify and resolve TruCluster Available Server problems while the TruCluster Software is still up and running. That way, the TruCluster Available Server configuration can continue providing ASE services (if possible), and you have access to data about the ongoing status of the system.

As Figure 10–4 indicates, troubleshooting active TruCluster Available Server configurations involves the following steps:

• Check Error Logs: Start by examining the error log files. Error log messages can often identify a problem cause and suggest a solution.

• Apply System Monitoring Tools: Use available system monitoring tools to look for problems. Utilities that can be helpful in troubleshooting TruCluster Available Server configurations are discussed in the system monitoring section of this chapter.

• Reset Daemons: If all else fails, you can reset the TruCluster Software daemons. This stops the Director, Logger, and Host Status Monitor (HSM) daemons and initializes the Agent daemons. The Agent daemons then restart all the daemons to make the TruCluster Software fully operational.


Figure 10–4 Troubleshooting an Active TruCluster Configuration

[Figure: Examine Error Log Messages; Use System Monitoring Tools (asemgr for ASE member and service status; uerf to examine errors on the SCSI bus; file and scu/show edt to check SCSI devices; AdvFS and LSM utilities to monitor disk services; netstat to diagnose network problems); Reset TruCluster Daemons]


Diagnosing Nonactive Systems

If a problem cannot be resolved while the TruCluster Software is up and running, you may need to shut down some or all of the member systems to look for the problem cause. As Figure 10–5 indicates, you should use the following procedure:

1. Stop all TruCluster activity (procedures for stopping TruCluster activity are described later in this section).

2. Shut down the systems (procedures for turning off member systems are described later in this section).

3. From the console of each ASE member, check that all devices are recognized. Verify that the basic hardware setup, SCSI bus configuration, and network setup are all correct.

4. Boot each machine, then make sure that each disk is reachable by the operating system. Check that you can ping each host member over the network. Use uerf to make sure devices are numbered consistently. If you discover a problem, you may need to reconfigure the kernel.

Note

If steps 3 and 4 reveal no problems, you can be highly confident that the basic setup is correct.

5. Do some data transfers to make sure that the system can do reliable I/O. If you get parity errors, you probably have a cabling problem.

6. Do some network transfers to make sure that the network is up and running properly.

7. If these steps reveal no problems, but failures in the TruCluster configuration persist, follow the procedures for diagnosing active systems.
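The consistency check in step 4 amounts to a set comparison. The following Python sketch assumes you have already collected, for each member, the set of shared-bus device names that member reports (for example, by reading its uerf boot output); the member and device names are illustrative. Any device that is not seen under the same name on every member is reported together with the members that lack it.

```python
def devices_not_seen_everywhere(seen_by_member):
    """Given {member: set_of_device_names}, return
    {device: sorted list of members that do not see it} for every device
    that is missing from at least one member."""
    all_devices = set().union(*seen_by_member.values())
    report = {}
    for device in sorted(all_devices):
        missing = sorted(m for m, devices in seen_by_member.items() if device not in devices)
        if missing:
            report[device] = missing
    return report


# Illustrative input: the second shared disk shows up as rz18 on tailor
# but as rz26 on tinker, so the shared bus is not numbered consistently.
seen = {
    "tailor": {"rz17", "rz18"},
    "tinker": {"rz17", "rz26"},
}
print(devices_not_seen_everywhere(seen))
# → {'rz18': ['tinker'], 'rz26': ['tailor']}
```

An empty report suggests consistent numbering. A non-empty one points to the kernel reconfiguration mentioned in step 4 or to a device that one member cannot reach.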


Figure 10–5 Troubleshooting Nonactive TruCluster Configurations

[Flowchart: Stop All TruCluster Activity; Shut Down ASE Member Systems; Use Console Commands to Check Devices; Boot Member Systems, and Check Network and SCSI Bus; If Necessary, Reconfigure the Kernel; Try Some Data Transfers; Follow Procedures for Diagnosing Active Systems]


Using System Monitoring Tools

Overview

You can use various system monitoring tools to examine your TruCluster Available Server configuration and gauge the status of system components. This section identifies system monitoring tools that can be helpful in diagnosing TruCluster problems. It also describes procedures for applying these tools.

Table 10–1 lists some of the system monitoring utilities that can be useful in troubleshooting TruCluster configurations.

Table 10–1 System Monitoring Utilities

Utility        Function
asemgr         Displays ASE member, service, and network status
ps             Displays daemon status
rpcinfo        Displays daemon status and port information
uerf           Searches CAM logs for SCSI bus problems
netstat        Diagnoses network problems
iostat         Determines disk status
file           Checks devices on the shared SCSI bus
scu/show edt   Checks shared SCSI bus devices listed in the Equipment Device Table (edt)
showfdmn       Monitors AdvFS file domains
showfsets      Monitors AdvFS file sets
volprint       Monitors LSM disk services


Using asemgr to Monitor Member Status

You can use the asemgr utility to display the host status and the status of the Agent for each member system in the ASE domain. Table 10–2 describes possible values displayed in the Host Status field.

Table 10–2 Host Status Values

Host Status    Description
UP             The member system is up and can be accessed by the member running the Director using the network and the SCSI bus.
DOWN           The member system cannot be accessed by the member running the Director using the network or the SCSI bus.
DISCONNECTED   The member system is disconnected from the network.
NETPAR         There is a network partition between the member system running the Director and the specified member.


Table 10–3 describes possible values displayed in the Agent Status field.

Table 10–3 Agent Status Values

Agent Status   Description
RUNNING        The Agent is running on the member system.
DOWN           The Agent is not running on the member system.
INITIALIZING   The Agent running on the member system is in its initialization phase and will be running soon.
UNKNOWN        The Director cannot determine the state of the Agent on the member system.
INVALID        The Director reports an invalid state for the Agent on the member system.

Example 10–1 shows the asemgr member status display of a two-member TruCluster configuration where one member is disconnected.

Example 10–1 Member Status

Member:    Host Status:    Agent Status
tailor     DISCONNECTED    UNKNOWN
tinker     UP              RUNNING
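asemgr presents member status through its menus. If you capture that display as text (for example, by saving a copy of the screen), a short script can flag members that need attention. The following Python sketch treats any member whose host status is not UP, or whose Agent status is not RUNNING, as a problem. It assumes each data row has exactly three whitespace-separated columns, as in Example 10–1; the function name is hypothetical.

```python
def unhealthy_members(status_text):
    """Return {member: (host_status, agent_status)} for every member whose
    host status is not UP or whose Agent status is not RUNNING."""
    problems = {}
    for line in status_text.splitlines():
        fields = line.split()
        # Assumption: data rows have exactly three columns; the header
        # row ("Member: Host Status: Agent Status") has more and is skipped.
        if len(fields) != 3:
            continue
        member, host_status, agent_status = fields
        if host_status != "UP" or agent_status != "RUNNING":
            problems[member] = (host_status, agent_status)
    return problems


EXAMPLE_10_1 = """\
Member:    Host Status:    Agent Status
tailor     DISCONNECTED    UNKNOWN
tinker     UP              RUNNING
"""

print(unhealthy_members(EXAMPLE_10_1))
# → {'tailor': ('DISCONNECTED', 'UNKNOWN')}
```

Tables 10–2 and 10–3 explain what each flagged value means. INITIALIZING, for instance, is usually transient.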


Using asemgr to Monitor Service Status

You can also use asemgr to display the status of existing ASE services. The service status includes the following information:

• The type of service: NFS, disk, or user-defined

• The service name

• The member on which the service is running, or OFFLINE if the service is off line

• The service’s ASP policy

• The disk configuration that the service uses

Example 10–2 shows the asemgr status display for an NFS service named nfsusers.

Example 10–2 Service Status

Status for NFS service ‘nfsusers’

Status:      Relocate:   Placement Policy:   Favored Member(s):
on tinker    yes         Balance Services    None

Storage configuration for NFS service ‘nfsusers’

NFS Exports list
/usr/nfsusers

Mount Table (device, mount point, type, options)
users-domain#users /var/ase/mnt/nfsusers/usr/nfsusers \
        advfs rw,groupquota,userquota

Advfs Configuration
Domain:          Volume(s):
users-domain     /dev/vol/users-dg/users-vol

LSM Configuration
Disk Group:      Device(s):
users-dg         rz17 rz33

Using asemgr to Monitor the Network Configuration

You can choose the Show the current configuration item from the asemgr ASE Network Configuration menu to show the current network status, as shown in Example 10–3.

Example 10–3 Network Configuration Status

ASE Network Configuration

Member Name   Interface Name   Member Net   Monitor
___________   ______________   __________   _______
tinker        tinker           Primary      Yes
tailor        tailor           Primary      Yes

Is this configuration correct (y|n)? [y]: y


Determining Host Adapter Settings

To determine host adapter settings, follow the instructions for the particular host adapter component you are using in your TruCluster configuration. There is no generic way to obtain this information; each host adapter has its own set of command requirements.

Using the uerf Utility to Monitor SCSI Bus Errors

You can use the uerf utility to inspect information about CAM errors on the SCSI bus. In addition, when a system reboots, uerf displays useful information about the bus configuration. Note that the information displayed by the uerf utility varies according to the host adapter being used.

To generate uerf information, log in as superuser and enter the following command string:

# uerf -o full | more

Example 10–4 shows uerf output indicating a bus reset.

Example 10–4 uerf CAM Error Display

----- EVENT INFORMATION -----

EVENT CLASS                 ERROR EVENT
OS EVENT TYPE         199.  CAM SCSI
SEQUENCE NUMBER         2.
OPERATING SYSTEM            DEC OSF/1
OCCURRED/LOGGED ON          Mon Apr 17 18:10:09 1995
OCCURRED ON SYSTEM          tinker
SYSTEM ID             x00020004   CPU TYPE: DEC 3000
SYSTYPE               x00000000

----- UNIT INFORMATION -----

CLASS                 x0022  DEC SIM
SUBSYSTEM             x0000  DISK
BUS #                 x0003
                      x00C8  LUN x0
                             TARGET x1

----- CAM STRING -----

ROUTINE NAME                ss_device_reset_done
                            Bus device reset has been performed

If the uerf output displays a repeated pattern of messages that has no obvious explanation, this often indicates a termination problem. It can also be useful to look for errors registered by different targets (devices) on the same bus; this indicates a problem with that bus.
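Both heuristics above can be applied mechanically once the errors are tallied. Because uerf output varies with the host adapter, the following Python sketch does not parse it. Instead it assumes you have transcribed the BUS # and TARGET fields of each CAM error event (as in Example 10–4) into (bus, target) pairs. It reports buses on which more than one distinct target logged errors and counts how often each pair recurs.

```python
from collections import Counter, defaultdict


def summarize_cam_errors(events):
    """events: iterable of (bus, target) pairs, one per CAM error event.

    Returns (suspect_buses, repeats):
      suspect_buses - sorted buses on which more than one distinct target
                      logged errors (suggesting a problem with the bus itself)
      repeats       - Counter of how often each (bus, target) pair occurred
    """
    events = list(events)
    targets_by_bus = defaultdict(set)
    for bus, target in events:
        targets_by_bus[bus].add(target)
    suspect_buses = sorted(b for b, targets in targets_by_bus.items() if len(targets) > 1)
    return suspect_buses, Counter(events)


# Illustrative tally: three errors on bus 3 from targets 1 and 4, one on bus 2.
buses, repeats = summarize_cam_errors([(3, 1), (3, 1), (3, 4), (2, 0)])
print(buses)                   # → [3]
print(repeats.most_common(1))  # → [((3, 1), 2)]
```

A pair that recurs many times without an obvious cause is the "repeated pattern" that often points to termination.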


Monitoring Daemons

Use the ps command to determine where the TruCluster daemons are running. The following command string isolates TruCluster-related process information:

# ps -ax | grep ase

This command generates a display similar to the following.

Example 10–5 Using ps to Determine Daemon Status

tailor.zko.dec.com> ps -ax | grep ase
  PID TTY  S    TIME     COMMAND
  265 ??   I <  0:01.02  /usr/sbin/aselogger
  267 ??   I <  0:02.00  /usr/sbin/aseagent -b
  358 ??   S < 23:09.48  asehsm
11491 ??   I <  0:00.40  asedirector
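To check the daemons from a script rather than by eye, the following Python sketch extracts the TruCluster daemons and their process IDs from ps -ax output like Example 10–5. The daemon names come from that example; the function name is hypothetical. Because Table 10–2 refers to "the member running the Director," an absent asedirector on a given member is not by itself an error.

```python
ASE_DAEMONS = ("aselogger", "aseagent", "asehsm", "asedirector")


def running_ase_daemons(ps_output):
    """Return {daemon_name: pid} for the TruCluster daemons found in
    `ps -ax` output."""
    found = {}
    for line in ps_output.splitlines():
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue  # skip the header line and blank lines
        for token in fields[1:]:
            name = token.rsplit("/", 1)[-1]  # "/usr/sbin/aseagent" -> "aseagent"
            if name in ASE_DAEMONS:
                found[name] = int(fields[0])
                break
    return found


EXAMPLE_10_5 = """\
  PID TTY  S    TIME     COMMAND
  265 ??   I <  0:01.02  /usr/sbin/aselogger
  267 ??   I <  0:02.00  /usr/sbin/aseagent -b
  358 ??   S < 23:09.48  asehsm
11491 ??   I <  0:00.40  asedirector
"""

running = running_ase_daemons(EXAMPLE_10_5)
# Daemons that were not seen on this member:
print(sorted(set(ASE_DAEMONS) - set(running)))
```

The grep process itself (for example, "grep ase") is ignored, because ase is not one of the daemon names.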

In addition, you can use rpcinfo -p to display a list of the daemons running and the ports the daemons are using.

Example 10–6 Using rpcinfo to Display Daemon and Port Information

tailor.zko.dec.com> rpcinfo -p | grep ase
   program vers proto   port
    395179    1   tcp   1023  aselogger
    395177    1   tcp   1021  asehsm
    395176    1   tcp   1022  aseagent
    395175    1   tcp   1017  asedirector

You can also use rpcinfo to determine whether applications are registered and running. For more information, see the rpcinfo(8nfs) man page.
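A monitoring script can turn rpcinfo -p output into a lookup table, which makes it easy to confirm that each daemon is registered and to record the port it uses. The following Python sketch assumes the five-column layout of Example 10–6 (program, version, protocol, port, name). The program numbers and ports shown there are sample values and may differ on your system.

```python
def ase_rpc_registrations(rpcinfo_output):
    """Return {name: (program, version, protocol, port)} for lines of
    `rpcinfo -p` output that carry a service name."""
    registrations = {}
    for line in rpcinfo_output.splitlines():
        fields = line.split()
        if len(fields) == 5 and fields[0].isdigit():  # skips the header row
            program, version, protocol, port, name = fields
            registrations[name] = (int(program), int(version), protocol, int(port))
    return registrations


EXAMPLE_10_6 = """\
   program vers proto   port
    395179    1   tcp   1023  aselogger
    395177    1   tcp   1021  asehsm
    395176    1   tcp   1022  aseagent
    395175    1   tcp   1017  asedirector
"""

for name, (program, version, protocol, port) in sorted(ase_rpc_registrations(EXAMPLE_10_6).items()):
    print(f"{name:12} program {program} {protocol} port {port}")
```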


Monitoring the Network

If a problem exists with a service using an IP "pseudo-address", you can go to the server on which the service is running and use netstat to make sure the address is configured. For example, the netstat -i command generates the following information about service nfs2:

Name   Mtu    Network    Addr   Ipkts   Ierrs   Opkts   Oerrs   Coll
nfs2   1500   10.30.50   none   32221   0       12843   6       6989

You can also use the scu command to monitor devices listed in the Equipment Device Table (edt). Example 10–7 displays information about devices at bus 4, LUN 0, generated by the scu show edt command.

Example 10–7 scu show edt Display

# scu
scu> show edt bus 4 lun 0 full
Inquiry Information:

SCSI Bus ID: 4
SCSI Target ID: 1
SCSI Target LUN: 0
Peripheral Device Type: 0 (Direct Access)
Peripheral Qualifier: 0 (Peripheral Device Connected)
Device Type Modifier: 0
Removable Media: No
ANSI Version: 2 (Complies to ANSI X3.131-1994, SCSI-2)
ECMA Version: 0
ISO Version: 0
Response Data Format: 2 (SCSI-2)
Terminate I/O Process: 0
Asynchronous Notification: 0
Additional Length: 91
Soft Reset Support: No
Command Queuing Support: Yes
Target Transfer Disable: Yes
Linked Command Support: No
Synchronous Data Transfers: Yes
Support for 16 Bit Transfers: No
Support for 32 Bit Transfers: No
Relative Addressing Support: No
Vendor Identification: DEC
Product Identification: RZ28 (C) DEC
Firmware Revision Level: 442D


Monitoring Disk I/O with iostat

You can use iostat to determine whether you are able to read from and write to a specific disk. The following output was generated by the iostat rz24 command.

tailor.zko.dec.com> iostat rz24
      tty      rz1      rz2     rz24        cpu
 tin tout  bps tps  bps tps  bps tps  us ni sy id
   0    5    1   0    0   0    0   0  23  0  3 74

For more information about iostat , see the iostat(1) man page.

Monitoring LSM Configurations

Use the volprint command to generate information about an LSM configuration. Example 10–8 shows the information generated by the volprint command.

Example 10–8 volprint Display

# /usr/sbin/volprint
TYPE NAME     ASSOC    KSTATE   LENGTH   COMMENT
dg   rootdg   rootdg   -        -
dm   rz3a     rz3a     -        130544

You can also use the dxlsm utility to display the current status of an LSM configuration. For more information, refer to the section on LSM System Administration.

Monitoring AdvFS Configurations

To monitor an AdvFS configuration, use the showfdmn and showfsets commands. Example 10–9 shows the output of the showfdmn command.

Example 10–9 showfdmn Display

tinker.zko.dec.com> showfdmn dbfdmn1

               Id              Date Created  LogPgs  Domain Name
2f620925.00038f30  Sat Mar 11 15:33:41 1995     512  dbfdmn1

  Vol  512-Blks    Free  %Used  Cmode  Rblks  Wblks  Vol Name
   1L    102400   93760     8%     on    128    128  /dev/vol/dbdg/vol2
    2    102400  101456     1%     on    128    128  /dev/vol/dbdg/vol3
         ------  ------  -----
         204800  195216     5%
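The %Used figures in Example 10–9 agree with (512-Blks - Free) / 512-Blks rounded to the nearest percent. The following Python sketch uses that relationship to flag AdvFS volumes that are filling up. The formula is inferred from the example, not taken from showfdmn documentation. The 90 percent threshold is an arbitrary example value, and the function names are hypothetical.

```python
def percent_used(blocks_512, free_512):
    """Percentage of a volume in use, rounded to a whole percent.
    Assumption: matches showfdmn's %Used, (512-Blks - Free) / 512-Blks."""
    return round(100 * (blocks_512 - free_512) / blocks_512)


def nearly_full(volumes, threshold=90):
    """Return names of volumes at or above `threshold` percent used.
    `volumes` is a list of (name, 512-Blks, Free) rows."""
    return [name for name, blocks, free in volumes if percent_used(blocks, free) >= threshold]


# Rows transcribed from Example 10-9.
dbfdmn1 = [
    ("/dev/vol/dbdg/vol2", 102400, 93760),
    ("/dev/vol/dbdg/vol3", 102400, 101456),
]
print([percent_used(b, f) for _, b, f in dbfdmn1])  # → [8, 1], as shown by showfdmn
print(percent_used(204800, 195216))                 # → 5, the domain total
print(nearly_full(dbfdmn1))                         # → []
```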

Example 10–10 shows the output of the showfsets command.


Example 10–10 showfsets Display

tailor.zko.dec.com> /sbin/showfsets dbfdmn1
dbfs1
    Id           : 2f620925.00038f30.1.8001
    Files        :        2,  SLim=        0,  HLim=        0
    Blocks (512) :       32,  SLim=        0,  HLim=        0
    Quota Status : user=on group=on

dbfs2
    Id           : 2f620925.00038f30.2.8001
    Files        :        2,  SLim=        0,  HLim=        0
    Blocks (512) :       32,  SLim=        0,  HLim=        0
    Quota Status : user=on group=on

You can also use the dxadvfs command to display the current status of an AdvFS configuration.


Summary

Introducing ASE Troubleshooting Techniques

Troubleshooting TruCluster Software configurations involves three main activities:

• Interpreting error messages

• Learning troubleshooting procedures

• Using system monitoring tools to diagnose problems

Use the following strategy to troubleshoot an ASE configuration:

• Examine error log and alert messages

• Follow troubleshooting procedures

• Apply system monitoring tools

• Formulate and apply a recovery plan

Interpreting TruCluster Software Error Log Messages

The TruCluster Logger daemon (aselogger) tracks the messages generated by all the ASE member systems.

Log files are located in /var/adm/syslog.dated/date, where date is a date and time stamp subdirectory (such as 23-Sep-09:07).

TruCluster Software messages include the following information:

• Time stamp

• Local system name

• ASE identifier (not used in messages from the Availability Manager driver)

• System that generated the message (or local)

• Source of the message:

AseMgr asemgr utility

Director Director daemon

Agent Agent daemon

HSM HSM daemon

AseLogger Logger daemon

AM Availability manager driver

vmunix Kernel image

AseUtility Command executed by an action script

• Severity of the message (not provided in AM messages)

• Message text

When an alert condition occurs, the TruCluster Software prints an Alert message in the log. It also writes the Alert message into the file /var/ase/tmp/alertMsg and executes a user-defined script.


Learning Troubleshooting Procedures

Troubleshooting active TruCluster implementations involves the following steps:

• Check Error Logs: First, examine the error log files. Error log messages can often identify a problem cause and suggest a solution.

• Apply System Monitoring Tools: Use available system monitoring tools to look for problems.

• Reset Daemons: If all else fails, you can reset the TruCluster Software daemons. This stops the Director, Logger, and Host Status Monitor (HSM) daemons and initializes the Agent daemons. The Agent daemons then restart all the daemons to make the TruCluster Software fully operational.

Use the following procedure to diagnose inoperative TruCluster Software implementations.

1. Stop all TruCluster activity.

2. Shut down the systems.

3. From the console of each ASE member, check that all devices are recognized. Verify that the basic hardware setup, SCSI bus configuration, and network setup are all correct.

4. Boot each machine, then make sure that each disk is reachable by the operating system. Check that you can ping each host member over the network. Use uerf to make sure devices are numbered consistently. If you discover a problem, you may need to reconfigure the kernel.

Note

If steps 3 and 4 reveal no problems, you can be highly confident that the basic setup is correct.

5. Do some data transfers to make sure that the system can do reliable I/O. If you get parity errors, you probably have a cabling problem.

6. Do some network transfers to make sure that the network is up and running properly.


Using System Monitoring Tools

The following system monitoring tools can be helpful in diagnosing problems with ASE configurations:

• asemgr for viewing ASE member and service status

• uerf for monitoring the SCSI bus

• ps and rpcinfo for determining daemon status

• netstat for monitoring the network

• iostat for monitoring disk I/O

• volprint and dxlsm for monitoring LSM configurations

• showfdmn and dxadvfs for monitoring AdvFS configurations


Exercises

Introducing ASE Troubleshooting Techniques: Exercise

Describe the recommended troubleshooting strategy for isolating problems in a TruCluster Software configuration.

Introducing ASE Troubleshooting Techniques: Solution

The following troubleshooting strategy is recommended:

1. If possible, examine the error log messages to isolate the component of your TruCluster configuration that is malfunctioning.

2. Follow troubleshooting procedures to pinpoint the problem cause.

3. Use system monitoring tools to diagnose the problem.

4. Formulate and apply a recovery plan.

Interpreting TruCluster Software Error Log Messages: Exercise

Describe the format of a TruCluster error log message.

Interpreting TruCluster Software Error Log Messages: Solution

TruCluster Software error messages include the following information:

• Time stamp

• Local system name

• ASE identifier (not used in messages from the Availability Manager driver)

• System that generated the message (or local)

• Source of the message

• Severity of the message (not provided in AM messages)

• Message text
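Because every ASE message follows this layout, the fields can be separated mechanically when you sift through a large daemon.log. The following is a minimal sketch, assuming the layout seen in the daemon.log excerpts reproduced in this course (timestamp, local host, the literal ASE: tag, originating system or local, source, severity, text). The function name parse_ase_log is hypothetical. Lines without the ASE: tag, such as kernel messages, are skipped, and lines that deviate from the layout (for example, the "unknown client" notices) may be split incorrectly.

```sh
# parse_ase_log (hypothetical helper): split TruCluster daemon.log lines into
# tab-separated fields. Assumed layout, taken from the excerpts in this course:
#   Mon DD HH:MM:SS <local-host> ASE: <origin> <source> <severity>: <text>
# Output columns: timestamp, local host, origin, source, severity, text.
# Lines without "ASE:" as the fifth field are skipped.
parse_ase_log() {
    awk '
    $5 == "ASE:" && NF >= 8 {
        stamp = $1 " " $2 " " $3
        sev = $8
        sub(/:$/, "", sev)      # "Warning:" -> "Warning"
        sub(/^\*+/, "", sev)    # "***ALERT" -> "ALERT"
        # Message text is every field after the severity; runs of blanks
        # collapse to a single space.
        text = ""
        for (i = 9; i <= NF; i++)
            text = text (i > 9 ? " " : "") $i
        printf "%s\t%s\t%s\t%s\t%s\t%s\n", stamp, $4, $6, $7, sev, text
    }'
}
```

Feed it a copy of daemon.log on standard input, then filter a column, for example by piping the result into awk -F'\t' '$5 == "ALERT"' to keep only Alert messages.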

Diagnosing a Nonactive System: Exercise

Describe the procedure for diagnosing a nonactive TruCluster configuration.

Diagnosing a Nonactive System: Solution

Use the following procedure to diagnose a nonactive system.

1. Stop all TruCluster activity (procedures for stopping TruCluster activity are described later in this section).

2. Shut down the systems (procedures for turning off member systems are described later in this section).

3. From the console of each ASE member, check that all devices are recognized. Verify that the basic hardware setup, SCSI bus configuration, and network setup are all correct.

4. Boot each machine, then make sure that each disk is reachable by the operating system. Check that you can ping each host member over the network. Use uerf to make sure that devices are numbered consistently. If you discover a problem, you may need to reconfigure the kernel.

5. Perform some data transfers to make sure that the system can do reliable I/O. Parity errors usually indicate a cabling problem.

6. Perform some network transfers to make sure that the network is up and running properly.

7. If these steps reveal no problems but failures in the TruCluster configuration persist, follow the procedures for diagnosing active systems.

Generating CAM Error Information: Exercise

Describe the command sequence you use to generate CAM error information about activity on the SCSI bus.

Generating CAM Error Information: Solution

Use the following command sequence to generate CAM error information about activity on the SCSI bus:

# uerf -o full | more

Monitoring TruCluster Daemons: Exercise

Describe the utilities you can use to monitor the TruCluster Software daemons.

Monitoring TruCluster Daemons: Solution

You can use the ps and rpcinfo utilities to monitor the TruCluster Software daemons. The command syntax for determining TruCluster daemon status with the ps utility is as follows:

# ps -ax | grep ase

The command syntax for determining TruCluster daemon status with the rpcinfo utility is as follows:

# /usr/sbin/rpcinfo -p | grep ase
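Rather than reading the grep output by eye, you can check for each daemon by name. The sketch below assumes the three daemon names this course associates with /sbin/init.d/asemember (asedirector, aseagent, and aselogger) and reads a process listing, such as ps -ax output, on standard input. The function name ase_daemon_status is hypothetical. Because the TruCluster Director runs on only one member at a time and fails over between members, an absent asedirector on a given member is not by itself an error.

```sh
# ase_daemon_status (hypothetical helper): read a process listing (for
# example, "ps -ax" output) on standard input and report, for each assumed
# TruCluster daemon name, whether it appears anywhere in the listing.
ase_daemon_status() {
    listing=$(cat)
    for d in asedirector aseagent aselogger; do
        if printf '%s\n' "$listing" | grep "$d" >/dev/null; then
            echo "$d present"
        else
            echo "$d absent"
        fi
    done
}
```

Typical use is ps -ax | ase_daemon_status on each member, comparing the results across members.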

11 Resolving Common TruCluster Problems

About This Chapter

Introduction

This chapter describes ways to diagnose and recover from frequently reported problems in TruCluster Available Server configurations. The information is divided into two main topics:

• Recognizing and solving common problems

• Applying TruCluster configuration guidelines

Objectives

To understand how to troubleshoot a TruCluster Available Server configuration, you should be able to:

• Describe some of the most common problems found in TruCluster Software configurations, and steps you can take to remedy them

• Apply TruCluster Software configuration guidelines to determine if an implementation is properly set up

Resources

For more information on the topics in this chapter, see the following:

• TruCluster Available Server Software Available Server Environment Administration

• TruCluster Available Server Software Hardware Configuration and Software Installation

• TruCluster Available Server Software Version 1.4 Release Notes

• Reference Pages

Recognizing Common Problems and Their Symptoms

Overview

This section identifies some of the more common problems that can arise in TruCluster Available Server implementations.

TruCluster problems can be grouped into two main categories:

• Hardware problems

Hardware problems usually involve either a faulty hardware configuration or the failure of a hardware component. Table 11–1 lists common hardware problems and their symptoms.

• Software problems

Software problems usually involve a faulty configuration or a failure to comply with TruCluster requirements. Table 11–2 lists common software problems and their symptoms.

Problems can also arise from limitations of the hardware and software used in a TruCluster configuration. Table 11–3 lists problems related to such limitations. Note that these limitations may not apply to future versions.

After each table, discussions of the listed problems contain the following information:

• Problem description

• Symptoms

• Message patterns

• Possible solutions

If a malfunction in your TruCluster Available Server configuration corresponds to one of the listed entries, you can use this information to diagnose and recover from the problem.

Table 11–1 lists frequently reported hardware problems.

Table 11–1 Frequently Reported Hardware Problems

Problem                                 Symptom
Improperly configured SCSI bus          Faulty I/O
                                        Cannot ping member over SCSI bus
                                        Cannot reach devices
                                        SCSI CAM errors in uerf
Host adapter failure                    Cannot ping member over SCSI bus
Member crash                            Cannot ping member over network and SCSI bus
Disk failure                            Device unreachable
Improperly configured storage device    Frequent SCSI bus resets
                                        I/O timeouts
                                        Device unreachable
Network interface failure               Cannot ping member over network
Network partition                       Cannot ping member over network
                                        Cannot access network services
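The ping-related rows of Table 11–1 amount to a small decision table: knowing which of the two pings (over the network and over the SCSI bus) fails narrows down the likely problem. The sketch below encodes only those rows. The function name suspect_from_pings and its ok/fail arguments are hypothetical, and obtaining the two results (for example, from the HSM warnings in daemon.log) is left to you.

```sh
# suspect_from_pings (hypothetical helper): map the two ping results for a
# member to the candidate problems listed in Table 11-1.
#   $1 = result of pinging the member over the network  (ok | fail)
#   $2 = result of pinging the member over the SCSI bus (ok | fail)
suspect_from_pings() {
    case "$1/$2" in
        ok/ok)     echo "no ping failure; check disks and storage devices" ;;
        ok/fail)   echo "improperly configured SCSI bus or host adapter failure" ;;
        fail/ok)   echo "network interface failure or network partition" ;;
        fail/fail) echo "member crash" ;;
        *)         echo "usage: suspect_from_pings ok|fail ok|fail" >&2
                   return 2 ;;
    esac
}
```

For example, suspect_from_pings fail fail prints member crash. The table lists symptoms, not proofs, so treat the output as a starting point for the procedures that follow.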

Improperly Configured SCSI Bus

• Problem Description

SCSI bus difficulties are among the most frequently encountered problems in TruCluster Available Server configurations. Common causes include:

Faulty bus connections

Improperly terminated bus segments

Cable lengths that are too long

Incorrect bus configuration

Note that an improperly configured bus may operate properly for a period of time without error conditions, but then cause problems under heavy load.

• Symptoms

Common symptoms of SCSI bus problems include:

TruCluster cannot ping devices over the SCSI bus

I/O errors and device failures

Services taken off line or undefined

• Message Patterns

The following daemon.log excerpt contains TruCluster messages sent during a SCSI bus failure.

Apr 17 18:54:16 tinker ASE: local HSM Warning: Can’t ping tailor over the SCSI bus
Apr 17 18:54:17 tinker ASE: local HSM ***ALERT: network ping to host tailor is working but SCSI ping is not

• Possible Solutions

Review the SCSI bus configuration guidelines to ensure that all requirements are met.

Check that the SCSI bus segments are properly connected and terminated.

Verify that the cable lengths are within the prescribed limits.

From a member console, use system monitoring tools to ensure that the bus setup is correct.

Note

If many otherwise unrelated failures occur on a single bus (for example, failures on disks belonging to different services), it is a strong indication of a problem with that bus.

Host Adapter Failure

• Problem Description

A host adapter failure isolates the affected ASE member from the SCSI bus(es) to which the adapter is connected. However, a host adapter failure does not cause the TruCluster network to stop functioning.

• Symptoms

Cannot ping affected member over SCSI bus

Services and/or TruCluster Director fail over

If service is running on affected member under Restrict to Favored Member policy, the service goes off line

• Message Patterns

The following daemon.log message is generated when a host adapter failure occurs.

Apr 17 14:05:44 tailor ASE: local HSM Warning: Can’t ping tinker over the SCSI bus

• Possible Solutions

If you suspect that a host adapter has failed, first check the cable(s) connecting the adapter to the SCSI bus. If this does not resolve the problem, disconnect the affected member from the TruCluster configuration. If the TruCluster configuration now functions properly, examine the suspect host adapter as follows:

Ensure that the proper firmware is installed.

Confirm that the host adapter is recognized by the system.

If an adapter has been removed and reinstalled, make sure it has been placed in the proper slot.

If the configuration is correct and the host adapter still does not work properly, replace the host adapter.

Member Crash

• Problem Description

A member system crash removes the affected member from the TruCluster configuration, eliminating any failover capability or performance enhancement provided by that member. However, if the TruCluster system is properly configured, a member system crash does not cause TruCluster to fail.

• Symptoms

Common symptoms of a member system crash include:

Cannot ping member over SCSI bus or network

Services and/or TruCluster Director fail over

If service is running on crashed member under Restrict to Favored Member policy, the service goes off line

uerf error reports indicate SCSI bus resets

• Message Patterns

The following daemon.log excerpt shows the message pattern that occurs when member tinker fails and the TruCluster Director and service ds1 fail over to member tailor.

Apr 17 18:09:58 tailor ASE: local HSM Warning: Can’t ping tinker over the SCSI bus
Apr 17 18:10:05 tailor ASE: local HSM Warning: Can’t ping tinker over the network
Apr 17 18:10:05 tailor ASE: local HSM Warning: member tinker is DOWN
Apr 17 18:10:05 tailor ASE: tailor Agent Notice: starting a new director
Apr 17 18:10:07 tailor ASE: local Director ***ALERT: Member tinker is not available
Apr 17 18:10:08 tailor ASE: tailor Agent Notice: starting service ds1
Apr 17 18:10:27 tailor ASE: tailor Agent Notice: /var/ase/sbin/ase_filesystem: /dev/rvol/dbdg/vol1: 4 files, 23857 used, 24974 free (14 frags, 3120 blocks, 0.0% fragmentation)
Apr 17 18:10:27 tailor ASE: tailor Agent Notice: /var/ase/sbin/ase_filesystem: /sbin/ufs_fsck -P /dev/rvol/dbdg/vol1
Apr 17 18:10:27 tailor ASE: tailor Director Notice: started service ds1 on tailor

• Possible Solutions

Potential causes of member crashes are too numerous to be discussed in this chapter. One TruCluster-specific member crash, however, should be mentioned: a reboot that occurs when a service relocation is attempted while a user is occupying a mount point. This problem is discussed at greater length in the Known TruCluster Limitations section.
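When you review a crash after the fact, it helps to pull the failover events out of daemon.log. This sketch keys on two message wordings from the excerpt above: member <name> is DOWN, and Director Notice: started service <service> on <member>. The function name failover_summary is hypothetical, and messages worded differently will not be matched.

```sh
# failover_summary (hypothetical helper): read daemon.log text on standard
# input and print members reported DOWN and services the Director started,
# relying on the message wording shown in the Member Crash excerpt.
failover_summary() {
    awk '
    / is DOWN$/ {
        # "... member <name> is DOWN": print the word after "member"
        for (i = 1; i < NF; i++)
            if ($i == "member") { print "down: " $(i + 1); break }
    }
    /Director Notice: started service / {
        # "... started service <name> on <member>"
        for (i = 1; i <= NF - 4; i++)
            if ($i == "started" && $(i + 1) == "service") {
                print "started: " $(i + 2) " on " $(i + 4)
                break
            }
    }'
}
```

Applied to the excerpt above, it reports tinker as down and ds1 as started on tailor.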

Storage Device Failure

• Problem Description

When a failing disk is not mirrored, the TruCluster Software:

Logs an Alert message

Stops the affected service

Marks the service as ‘‘unassigned’’ and issues an Alert message

If the service is mirrored on a device that has not failed, storage device failures may not become immediately apparent. However, a storage device or disk failure still causes an Alert message to be sent.

• Symptoms

Disk or NFS service fails to start up

Device access failure messages

The following asemgr status display shows the status of a disk service in which the storage device has failed.

Status for DISK service ‘ds1‘

Status:        Relocate:   Placement Policy:   Favored Member(s):
- UNKNOWN      yes         Balance Services    None

disk service "ds1"

File system(s):
UFS device: /dev/vol/dbdg/vol1        mount options: rw
AdvFS set: dbfdmn1#dbfs2              mount options: rw
AdvFS set: dbfdmn1#dbfs1              mount options: rw
AdvFS domain: dbfdmn1                 devices: /dev/vol/dbdg/vol2 /dev/vol/dbdg/vol3

• Message Patterns

The following daemon.log error message pattern indicates a storage device failure.

Apr 19 17:42:06 tailor ASE: tinker Agent ***ALERT: device access failure on /dev/rz24g from tinker.zko.dec.com
Apr 19 17:42:13 tailor ASE: tinker Agent Warning: AM can’t ping /dev/rz24g
Apr 19 17:42:13 tailor ASE: tinker Agent Warning: can’t reach device ’/dev/rz24g’
Apr 19 17:42:19 tailor ASE: tinker Agent Warning: AM can’t ping /dev/rz25g
Apr 19 17:42:19 tailor ASE: tinker Agent Warning: can’t reach device ’/dev/rz25g’

• Possible Solutions

Check the storage devices for proper connections and configuration.

Determine the location of the failed disk and follow the procedures for replacing a disk described in the section on TruCluster failure recovery.
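To find out which devices to check, you can collect the device names from the can't reach device warnings. The sketch below assumes that wording (taken from the excerpt above) and that each such line names one /dev path. It relies only on the path, so the quote characters around it do not matter. The function name unreachable_devices is hypothetical.

```sh
# unreachable_devices (hypothetical helper): read daemon.log text on standard
# input and print each distinct /dev path named in a "reach device" warning.
unreachable_devices() {
    awk '/reach device/ {
        # match() sets RSTART/RLENGTH to the first /dev/... path on the line
        if (match($0, "/dev/[A-Za-z0-9_/]+"))
            print substr($0, RSTART, RLENGTH)
    }' | sort -u
}
```

Applied to the excerpt above, it lists /dev/rz24g and /dev/rz25g once each, however many times the warnings repeat.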

Network Interface Failure

• Problem Description

A network interface failure occurs when an individual member becomes isolated from the primary TruCluster network. Although this problem may cause services to fail over from the isolated member, the other members of a properly configured TruCluster system should continue to function.

• Symptoms

The following asemgr display occurs during a network interface failure on member tinker.

Member Status

Member:     Host Status:     Agent Status:
tinker      DISCONNECTED     UNKNOWN
tailor      UP               RUNNING

If you try to examine the network status by choosing the Show the current configuration item from the ASE Network Modify menu, the TruCluster Software issues the message: ‘‘Net partition or disconnect - cannot find a director.’’

• Message Patterns

The following daemon.log error message pattern was recorded when member tinker experienced a network interface failure.

Apr 17 19:02:54 tailor ASE: local HSM Warning: Can’t ping tinker over the network
Apr 17 19:02:56 tailor ASE: local HSM Warning: Can’t ping tinker over the network
Apr 17 19:04:39 tailor last message repeated 2 times
Apr 17 19:04:43 tailor ASE: local HSM Warning: member tinker is disconnected from the network
Apr 17 19:04:44 tailor ASE: tailor Agent ***ALERT: member ’tinker’ cut off from net
Apr 17 19:04:45 tailor ASE: tailor Director Notice: finished processing agent state change from HSM: agent tinker state NIT_DOWN

• Possible Solutions

Check the cable connections from the affected ASE member system to the network.

Use netstat to diagnose the problem on the affected system.

If necessary, replace the network controller or cable connections.
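On a busy system, the warning that names the disconnected member can be buried among repeated ping warnings. The following sketch lists the members reported as disconnected, assuming the member <name> is disconnected from the network wording shown in the excerpt above. The function name disconnected_members is hypothetical.

```sh
# disconnected_members (hypothetical helper): read daemon.log text on standard
# input and print each member reported as disconnected from the network.
disconnected_members() {
    awk '/is disconnected from the network/ {
        # print the word that follows "member"
        for (i = 1; i < NF; i++)
            if ($i == "member") { print $(i + 1); break }
    }' | sort -u
}
```

The member it reports is the one whose cabling and network controller you check first.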

Network Partition

• Problem Description

A network partition occurs when all ASE member systems become isolated from the local subnet.

• Symptoms

The following symptoms occur during a network partition:

Cannot ping member over network

Cannot access network services

If you try to examine the network status by choosing the Show the current configuration item from the ASE Network Modify menu, the TruCluster Software issues the message: ‘‘Net partition or disconnect - cannot find a director.’’

• Message Patterns

The following is an example of the daemon.log message pattern recorded during a network partition on node tailor, the member on which the TruCluster Director was running.

Apr 17 19:23:15 tailor ASE: local HSM Warning: Network interface DOWN
Apr 17 19:23:15 tailor ASE: tailor Director ***ALERT: Network connection down... exiting
Apr 17 19:23:16 tailor ASE: tailor Director Warning: Director exiting...
Apr 17 19:23:20 tailor ASE: local HSM Warning: Can’t ping tinker over the network
Apr 17 19:23:26 tailor ASE: local HSM Warning: Can’t ping tinker over the network

The next example shows the daemon.log message pattern recorded during a network partition on node tinker (on which the TruCluster Director was not running).

Apr 17 19:25:56 tinker ASE: local HSM Warning: Can’t ping tailor over the network
Apr 17 19:25:58 tinker ASE: local HSM Warning: Can’t ping tailor over the network
Apr 17 19:26:00 tinker ASE: local HSM Warning: network partition detected between local host and member tailor
Apr 17 19:26:01 tinker ASE: tinker Agent ***ALERT: network is partitioned between local host and tailor

The following kern.log excerpt shows the kernel (vmunix) messages recorded during a network partition.

Apr 17 19:23:09 tailor vmunix: ln0: lost carrier: check connector
Apr 17 19:23:39 tailor last message repeated 17 times
Apr 17 19:24:25 tailor last message repeated 21 times

• Possible Solutions

Use the following procedure to troubleshoot a network partition:

Check that all network cable connections are properly attached.

Use netstat to troubleshoot the problem.
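The kern.log excerpt shows how syslog condenses identical consecutive messages into last message repeated N times lines, so counting raw lines understates how often the carrier was lost. The following sketch adds those repeat counts to the lines that contain a fixed string. The function name count_with_repeats is hypothetical, and the sketch assumes each repeat line directly follows the message it refers to, as in the excerpt.

```sh
# count_with_repeats (hypothetical helper): count syslog lines on standard
# input that contain the fixed string $1, adding N for every
# "last message repeated N times" line that follows a matching line.
count_with_repeats() {
    awk -v pat="$1" '
    /last message repeated [0-9]+ times/ {
        if (prev_match) {
            for (i = 1; i < NF; i++)
                if ($i == "repeated") { total += $(i + 1); break }
        }
        next            # a repeat line never changes prev_match
    }
    {
        prev_match = (index($0, pat) > 0)
        if (prev_match) total++
    }
    END { print total + 0 }'
}
```

Applied to the kern.log excerpt above with the string lost carrier, it reports 39 (1 + 17 + 21) lost-carrier messages rather than the single line that appears verbatim.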

Table 11–2 lists frequently reported software problems.

Table 11–2 Frequently Reported Software Problems

Problem                                           Symptom
Invalid script format                             Script fails
Multiple asemgr processes                         asemgr locks
Removing a disk without updating asemgr           System crashes
NFS service and ASE member with same name         Agent daemon fails to initialize
Service alias not in /etc/hosts on all members    Cannot add service
ASEROUTING not set in NFS service                 NFS service cannot be added
ASE member not added to ASE database              New member cannot connect to ASE services
LSM not configured on new member                  Disk service relocation fails

Invalid Script Format

• Problem Description

An action script that contains an error makes it impossible to start or stop the associated service.

• Symptoms

The following output is generated by the asemgr when an invalid script is added to a service.

Example 11–1 Display when asemgr Cannot Modify Service

NOTE: Modifying a service causes it to stop and then restart. If you do not want to interrupt the service availability, do not modify the service.

Enter ’y’ to modify service ’ds1’ (y/n): y
Stopping service...
Deleting service...
Adding service...
Starting service...
Start failed - Unable to start service.
Check syslog’s daemon log to determine the error.

This service uses either AdvFS or LSM in its storage configuration.
You must select a member on which to leave the storage configured.

1) tinker
2) tailor

x) Exit to Service Configuration ?) Help

Enter your choice [1]: x

Enter ’o’ to restore the old service configuration, ’n’ to retry the new service configuration, or ’d’ to delete the service [n]:

• Message Patterns

The following log message pattern is generated when an invalid script is added to a service.

Apr 20 10:38:03 tinker ASE: tailor Agent Notice: starting service ds1
Apr 20 10:38:12 tinker ASE: tailor Agent Notice: /var/ase/sbin/ase_filesystem: /dev/rvol/dbdg/vol1: File system unmounted cleanly - no fsck needed
Apr 20 10:38:12 tinker ASE: tailor Agent Notice: /var/ase/sbin/ase_filesystem: /sbin/ufs_fsck -P /dev/rvol/dbdg/vol1
Apr 20 10:38:12 tinker ASE: tailor Agent Error: user script: /tmp/ase_sh5164[20]: B: not found
Apr 20 10:38:16 tinker ASE: tailor AseMgr Error: Start failed - Unable to start service.

• Possible Solutions

To solve a problem with an invalid action script, fix the script and test it by executing it outside of the TruCluster configuration. As a general rule, test all scripts this way before adding them to an ASE service.
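One inexpensive first test is a syntax check. The sketch below assumes the action script is a Bourne shell script and uses sh -n, which reads a script without executing its commands. The function name check_action_script is hypothetical. A syntax check cannot catch run-time failures such as the B: not found error in the excerpt above, so you still need to run the script by hand outside the TruCluster configuration, as described.

```sh
# check_action_script (hypothetical helper): verify that a Bourne shell
# action script at least parses before it is added to an ASE service.
# "sh -n" reads the script without executing any commands, so only
# syntax errors are detected.
check_action_script() {
    if [ ! -r "$1" ]; then
        echo "cannot read $1" >&2
        return 2
    fi
    if sh -n "$1"; then
        echo "syntax OK: $1"
    else
        echo "syntax error: $1" >&2
        return 1
    fi
}
```

Run it on each start and stop script first, then execute the scripts themselves on a test system.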

Multiple asemgr Processes

• Problem Description

If you run multiple asemgr processes, the TruCluster database becomes locked, and you cannot make any modifications to the configuration.

• Symptoms

The asemgr produces the following display if you try to access the database while multiple asemgr processes are running.

ASE is locked by ‘tailor.zko.dec.com‘

ASE is locked - you have the option of forcing ASE to reset

Enter ’y’ to force ASE to reset (y/n): n

ASE is locked - you have the option of forcing ASE to reset

Enter ’y’ to force ASE to reset (y/n): y

• Message Patterns

The following message pattern is recorded in the daemon.log file when you try to access the TruCluster database while multiple asemgr processes are running.

Apr 17 17:05:48 tinker ASE: tinker Director Notice: DB lock is in use by an ASEmgr on tailor
Apr 17 17:05:48 tinker ASE: tinker AseMgr Error: ASE is locked by ‘tailor.zko.dec.com‘
Apr 17 17:05:50 tinker ASE: tinker AseMgr Error: Unable to free up ASE database.
Apr 17 17:05:50 tinker ASE: unknown client Notice: restarting Agent!
Apr 17 17:05:50 tinker ASE: tinker Agent Notice: restarting Agent!

• Possible Solutions

To resolve this problem, stop the extra asemgr processes so that only one is running.
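The DB lock notice in daemon.log names the member whose asemgr holds the lock, which tells you where to look for the extra process. The following sketch extracts that member name, assuming the DB lock is in use by an ASEmgr on <member> wording shown above, with the member name as the last word. The function name lock_holders is hypothetical.

```sh
# lock_holders (hypothetical helper): read daemon.log text on standard input
# and print each member named as holding the ASE database lock. The member
# name is assumed to be the last word of the notice.
lock_holders() {
    awk '/DB lock is in use by an ASEmgr on / { print $NF }' | sort -u
}
```

In the excerpt above it reports tailor, the member on which to look for the asemgr that should be stopped.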

Removing a Disk Without Updating asemgr

• Problem Description

If you remove a disk from an ASE service and forget to use the asemgr to redefine the service, the service will fail.

• Symptoms

If the service is running when you remove the disk, the system running the service will crash when you try to access a mount point. However, if the service is mirrored, it may fail over to another member.

If the service is not running when you remove the disk, the service becomes unreachable.

• Message Patterns

The following daemon.log message pattern is generated when you access a mount point on a disk that has been removed.

Apr 28 12:18:47 tailor ASE: tinker Agent Error: can’t unreserve device
Apr 28 12:18:50 tailor ASE: tinker Agent Warning: AM can’t ping /dev/rz24g
Apr 28 12:18:50 tailor ASE: tinker Agent Warning: can’t reach device ’/dev/rz24g’
Apr 28 12:18:51 tailor ASE: tinker Agent ***ALERT: possible device failure: /dev/rz24g
Apr 28 12:18:51 tailor ASE: tinker Agent Error: can’t unreserve device /dev/rz24g
Apr 28 12:18:54 tailor ASE: tinker Agent Error: can’t unreserve device
Apr 28 12:18:57 tailor ASE: tinker Agent Warning: AM can’t ping /dev/rz24g
Apr 28 12:18:57 tailor ASE: tinker Agent Warning: can’t reach device ’/dev/rz24g’
Apr 28 12:18:58 tailor ASE: tinker Agent ***ALERT: possible device failure: /dev/rz24g
Apr 28 12:18:58 tailor ASE: tinker Agent Error: can’t unreserve device /dev/rz24g
Apr 28 12:19:15 tailor ASE: tinker Agent Notice: can’t unreserve ds1’s devices, stopping it anyway
Apr 28 12:19:25 tailor ASE: local HSM Warning: Can’t ping tinker over the network

• Possible Solutions

If possible, set a service off line before you replace a disk used by the service. In any case, you must ensure that no users are occupying the mount point or otherwise attempting to access the disk while it is being replaced.

After you replace the disk and turn on the storage device, rereserve the device (if necessary) and set the service back on line.

NFS Service and ASE Member with Same Name

• Problem Description

NFS services require the configuration of a pseudo host that is associated with an internet address so that the service can be failed over. If the service name is the same as one of the ASE members, the Agent daemon process for that host name will become confused and fail to initialize properly.

• Symptoms

If this problem arises, the asemgr host status for the affected host will be UP, while the Agent status will be DOWN. However, the Agent daemon will be running on the host.

• Message Patterns

The following message is recorded when a system with ASE member name tinker has an NFS service named tinker.

Apr 22 11:40:38 tinker ASE: tinker AseMgr Error: AseMgr failed to initialize

• Possible Solutions

Reconfigure the NFS service with a unique name.
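Because a collision is easy to miss when services are added one at a time, you may want to compare the two name lists before adding an NFS service. The following sketch assumes that you keep the member host names and the NFS service names in two plain files with one name per line; both the file layout and the function name name_collisions are hypothetical. Names are compared case-insensitively, since host names are not case-sensitive.

```sh
# name_collisions (hypothetical helper): print every NFS service name in
# file $2 that is also an ASE member name in file $1 (one name per line,
# compared case-insensitively; blank lines ignored).
name_collisions() {
    awk '
    NF == 0 { next }
    FILENAME == ARGV[1] { member[tolower($1)] = 1; next }
    (tolower($1) in member) { print $1 }' "$1" "$2"
}
```

Any name it prints should be changed before the service is added.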

Service Alias not in /etc/hosts on All Members

• Problem Description

ASE services require that you associate the service name with an IP address by placing an entry in the /etc/hosts file on all ASE member systems. If you do not do this, you will not be able to configure the service.

• Symptoms

If this problem arises, the attempt to add the service will fail.

• Message Patterns

The following message pattern is recorded when the service nfsusers is added, but the service and IP address are not properly entered in the /etc/hosts files on all the member systems:

Sep 20 16:24:25 tailor ASE: tinker Agent Notice: adding service nfsusers
Sep 20 16:24:26 tailor ASE: tailor Agent Notice: adding service nfsusers
Sep 20 16:24:28 tailor ASE: tinker Agent Error: /var/ase/sbin/nfs_ifconfig: nfsusers not in hosts database
Sep 20 16:24:29 tailor ASE: tailor Agent Error: /var/ase/sbin/nfs_ifconfig: nfsusers not in hosts database
Sep 20 16:24:30 tailor ASE: tinker Agent Error: /var/ase/sbin/nfs_ifconfig: nfsusers not in hosts database
Sep 20 16:24:31 tailor ASE: tailor Agent Error: /var/ase/sbin/nfs_ifconfig: nfsusers not in hosts database
Sep 20 16:24:31 tailor ASE: tinker Director Error: can’t add service
Sep 20 16:24:31 tailor ASE: tailor AseMgr Error: Add failed - Unable to add service.

• Possible Solutions

Add the proper service alias and IP address to the /etc/hosts file on all member systems.
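Before adding a service, you can confirm on each member that the alias is present. The sketch below checks a hosts file for the name as a host name or alias field on any non-comment line. The function name in_hosts is hypothetical, and because it reads only the file, it says nothing about other name services.

```sh
# in_hosts (hypothetical helper): succeed if NAME ($1) appears as a host
# name or alias on a non-comment line of the hosts file ($2, default
# /etc/hosts); fail otherwise.
in_hosts() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v n="$name" '
    { sub(/#.*/, "") }          # drop comments
    NF >= 2 {
        # field 1 is the address; fields 2..NF are names and aliases
        for (i = 2; i <= NF; i++)
            if ($i == n) { found = 1; exit }
    }
    END { exit !found }' "$file"
}
```

For example, in_hosts nfsusers || echo "nfsusers missing" run on every member shows which members still need the entry.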

ASEROUTING not Set in NFS Service

• Problem Description

As described in Available Server Environment Administration, it is possible to configure an NFS service to broadcast host names on networks that are not native to the ASE NFS service name. If you set up this configuration but fail to set ASEROUTING=yes in the rc.config file on all ASE member systems, the NFS service cannot be added to the configuration database.

• Symptoms

The asemgr issues a message that the NFS service cannot be added.

• Message Patterns

The following message is recorded when ASEROUTING is not set for an NFS service.

tailor ASE: tailor Agent Error: Must be IP router and run gated to use ASE routing; run netsetup.

• Possible Solutions

Run netsetup again, following the instructions in Available Server Environment Administration for setting up TruCluster routing. After you have completed the netsetup process, make sure you run the following command on all ASE member systems:

# rcmgr set ASEROUTING yes

ASE Member not Added to TruCluster Database

• Problem Description

If the TruCluster Software is installed on a new member system, but the member is not added to the TruCluster configuration database on the original member system, the new system will not be recognized as a legitimate member.

• Symptoms

If the asemgr is run on one of the preexisting ASE members, the new member will not be displayed in the member status display. Because the new member is essentially a foreign system, any requests by the new member to connect to ASE services will fail.

• Message Patterns

The following message pattern is recorded when system tinker tries to access the TruCluster configuration without having been added to the TruCluster database.

May 9 07:31:50 tailor ASE: tailor Agent ***ALERT: ***Possible security breach attempt: connect request from non-member node ’tinker’
May 9 07:31:50 tailor ASE: tailor Agent Notice: connection refused by connect callback

• Possible Solutions

Run the asemgr on the original member and add the new member to the TruCluster configuration database.

LSM not Configured on New Member

• Problem Description

For a disk service to fail over to a new member, LSM must be set up on the new system, with rootdg configured on a local disk.

• Symptoms

If LSM has not been configured on a new member, and a service using LSM attempts to relocate to the new system, the relocation will fail.

• Message Patterns

The following daemon.log message patterns are displayed if LSM has not been configured on a system and a service using LSM attempts to relocate.

May 9 10:27:06 tailor ASE: tinker Agent Error: /var/ase/sbin/lsm_dg_action: voldg: Volume daemon is not accessible
May 9 10:27:07 tailor ASE: tinker Agent Error: /var/ase/sbin/lsm_dg_action: voldg deport of disk group dbdg failed
May 9 10:27:07 tailor ASE: tinker Agent Error: /var/ase/sbin/lsm_dg_action: forcing a deport of disk group
May 9 10:27:07 tailor ASE: tinker Agent Error: /var/ase/sbin/lsm_dg_action: voldg: Volume daemon is not accessible
May 9 10:27:07 tailor ASE: tinker Agent Error: /var/ase/sbin/lsm_dg_action: voldg deport of disk group dbdg failed
May 9 10:27:07 tailor ASE: tinker Agent Error: /var/ase/sbin/lsm_dg_action: force deport of dbdg failed

• Possible Solutions

Use volsetup to configure the rootdg disk group on the new system, then restart the vold daemon.

Known TruCluster Limitations

Table 11–3 lists known limitations of TruCluster Available Server Software Version 1.4. Note that these limitations may be addressed in future versions.

Table 11–3 Known Limitations of TruCluster Available Server Software Version 1.4

Problem                                          Symptom
Users occupying mount points                     System reboots when service relocated
Non-TruCluster processes with higher priority    ASE services time out
BC09D cables used with KZTSA controller          Errors on SCSI bus

Users Occupying Mount Points

If you try to relocate a service while users are occupying a mount point, the system on which the service is running will shut down.

• Problem Description

To stop a disk-based service, the TruCluster Software must be able to unmount the file systems. This means that the TruCluster Software must be able to stop all processes accessing the mounted file systems. You should ensure that all processes invoked by the start action script are stopped by the stop action script. To keep users from occupying the local mount point (and thereby preventing the unmount), recommend that users access only the exported directory.

• Symptoms

The system shuts down when you try to relocate a service.

• Message Patterns

The following daemon.log message pattern is recorded when a relocation fails because a mount point is busy.

Mar 1 10:05:05 member1 ASE: member1 Agent Notice: stopping service service1
Mar 1 10:05:37 member1 ASE: member1 Agent Error: /var/ase/sbin/ase_mount_action: /share/test/service1: Device busy
Mar 1 10:05:38 member1 last message repeated 9 times
Mar 1 10:05:38 member1 ASE: member1 Agent Error: /var/ase/sbin/ase_mount_action: Unable to umount /share/test/service1
Mar 1 10:05:39 member1 ASE: member1 Agent Notice: /var/ase/sbin/lsm_dg_action: voldisk: Device rz16c: Device is in use
Mar 1 10:05:39 member1 ASE: member1 Agent Notice: /var/ase/sbin/lsm_dg_action: voldisk: Device rz19c: Device is in use
Mar 1 10:05:39 member1 ASE: member1 Agent Notice: /var/ase/sbin/lsm_dg_action: voldisk: Device rz25c: Device is in use
Mar 1 10:05:39 member1 ASE: member1 Agent Notice: /var/ase/sbin/lsm_dg_action: voldg: Disk group dg2: import failed: Disk group exists and is imported
Mar 1 10:05:41 member1 ASE: member1 Director Error: can’t stop service
Mar 1 10:05:41 member1 ASE: member1 AseMgr Error: Stop failed - Unable to stop service.
Mar 1 10:05:41 member1 ASE: member1 AseMgr Error: Unable to stop service ’service1’ - Relocation not successful.
Mar 1 10:05:41 member1 ASE: member1 AseMgr Error: Unable to relocate ‘service1‘ to ‘member2‘.

• Possible Solutions

Avoid accessing the mount point on systems running ASE services.
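When a relocation fails this way, the mount points that could not be unmounted appear in the Device busy errors. The following sketch lists them, assuming the <mount point>: Device busy wording in the excerpt above. The function name busy_mount_points is hypothetical. Once you know the mount point, identify and stop the processes using it before you retry the relocation.

```sh
# busy_mount_points (hypothetical helper): read daemon.log text on standard
# input and print each distinct mount point reported as "Device busy".
busy_mount_points() {
    awk '/: Device busy/ {
        # strip ": Device busy" and what follows; the mount point is then
        # the last field of the line
        sub(/: Device busy.*/, "")
        print $NF
    }' | sort -u
}
```

Applied to the excerpt above, it reports /share/test/service1.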

Non-TruClusterProcesses withHigher Priority

• Problem DescriptionIf there are non-TruCluster processes with a schedulingpriority higher than the priority of the TruCluster daemons,the daemons could time out while waiting to run. If thisoccurs, a TruCluster timeout error appears in the daemon.logfile.

• SymptomsIf the TruCluster Software daemons do not start even thoughthe TruCluster Software is properly configured, you may havea timeout problem.

• Message PatternsThe following message pattern indicates a timeout error dueto a scheduling priority problem.

Mar 8 13:09:28 surry ASE: surry AseMgr error:ASE timeout - Unable to stop service.

• Possible Solutions

If necessary, you can raise the scheduling priority of the TruCluster daemons by changing the lines in the /sbin/init.d/asemember file that start the asedirector, aseagent, and aselogger daemons, or by setting a higher priority with the aseagent -p command. For more information about scheduling priorities, see the Available Server Environment Administration manual.


Using BC09 Cables with KZTSA Controller

• Problem Description

BC09D narrow cables are built to an earlier SCSI specification and are conditionally supported in TruCluster configurations; the newer BN family of cables is preferred. If BC09D cables are used in an ASE configuration, limit them to slow-speed operation with a maximum length of 3 meters.

If you use BC09D cables with a KZTSA controller, you may get errors on the SCSI bus.

• Symptoms

If you have a problem with a BC09D cable, common symptoms include irregular I/O and reports of CAM errors.

• Message Patterns

Possible daemon.log message patterns associated with problems with a BC09D cable include errors accessing devices on the shared SCSI bus, such as the following:

May 02 12:42:06 tailor ASE: tinker Agent ***ALERT: device access failure on /dev/rz24g from tinker.zko.dec.com

• Possible Solutions

Replace the BC09D cable(s) with equivalent cables from the BNxx family of cables.
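The message patterns in this chapter lend themselves to a quick first-pass scan of daemon.log. The following Python sketch is hypothetical: the pattern table, classify, and scan are illustrations, not TruCluster tools. The sketch matches only the example messages shown in this chapter, and real logs may use other wording.

```python
import re
from typing import List

# Hypothetical pattern table built from the daemon.log examples in this
# chapter; it is an illustration, not part of the TruCluster Software.
PATTERNS = [
    (re.compile(r"ASE timeout"),
     "timeout: check the scheduling priority of the TruCluster daemons"),
    (re.compile(r"device access failure on /dev/\S+"),
     "shared SCSI bus error: check cables (for example, BC09D)"),
    (re.compile(r"Device busy|Unable to umount"),
     "unmount failed: a process is still using the mount point"),
    (re.compile(r"Possible security breach attempt"),
     "connect request from a system not in the ASE database"),
]


def classify(line: str) -> List[str]:
    """Return the likely problems suggested by one daemon.log line."""
    return [hint for pattern, hint in PATTERNS if pattern.search(line)]


def scan(lines):
    """Yield (line, hints) for every line that matches at least one pattern."""
    for line in lines:
        hints = classify(line)
        if hints:
            yield line, hints
```

A match is only a starting point; confirm the cause by checking the symptoms and solutions listed for each problem.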


Applying TruCluster Configuration Guidelines

TruCluster Configuration Guidelines

This section contains a series of checklists you can use to help determine whether a TruCluster Available Server implementation is properly configured. The information in this section is divided into the following sections:

• General Hardware Configuration

• SCSI Bus

• Termination

• Host Adapters

• Disk Storage Enclosures

• Signal Converters

• Tri-link Connectors

• Network Connections

• TruCluster Software Installation

• General Software Configuration

• TruCluster Configuration

• Service Configuration

• Disk Services

• LSM Configuration

• AdvFS Configuration

• Action Scripts

For specific information about configuring TruCluster components, see the appropriate section in the TruCluster Available Server Software documentation.


General Hardware Configuration

Use the following checklist to determine if your basic hardware configuration complies with TruCluster requirements:¹

• TruCluster Available Server configurations can have from two to four member systems (if your system has a PMAZC or KZMSA host adapter, the upper limit is three).

• All member systems and disk storage boxes must be connected by means of a shared SCSI bus.

• Only eight devices (SCSI controllers, disks, or RAID controllers) are allowed on each shared bus.

• Devices must be connected in a way that enables you to disconnect them from the bus without affecting the bus termination.

• All members in your ASE must be configured to be on at least one common network subnet.

• A member system may not boot properly if it does not have a graphics head attached. If a member does not have a graphics head, you must set the SERVER console variable to ON.

¹ These checklists assume that your configuration uses only hardware and software supported for TruCluster Available Server Software.
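The member-count rule in the first item can be expressed as a small check. This Python sketch is hypothetical: member_limit and member_count_ok are illustrative names, and the sketch assumes that the three-member limit applies when any member uses a PMAZC or KZMSA host adapter.

```python
from typing import Dict, List

# Adapters that lower the member limit, per the checklist above.
LIMITING_ADAPTERS = {"PMAZC", "KZMSA"}


def member_limit(members: Dict[str, List[str]]) -> int:
    """Return 3 if any member uses a PMAZC or KZMSA host adapter, else 4."""
    for adapters in members.values():
        if LIMITING_ADAPTERS.intersection(adapters):
            return 3
    return 4


def member_count_ok(members: Dict[str, List[str]]) -> bool:
    """members maps each member system name to its host adapter types."""
    return 2 <= len(members) <= member_limit(members)
```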


SCSI Bus

Use this checklist to determine whether your SCSI bus configuration complies with TruCluster requirements.

• Configurations can use no more than 8 SCSI IDs (0-7) on each shared SCSI bus (devices that require SCSI IDs include host adapters and storage devices, but not signal converters).

• Member systems must see all devices on a given SCSI bus at the same SCSI ID.

• If you are using a dual-ported controller, the shared bus must be on the same channel.

• If narrow and wide devices are used, a signal converter must be placed between them.

• Tape devices are not permitted on the shared SCSI bus.

• The length of the shared SCSI bus must be within the TruCluster limits:

Sub-buses cannot exceed 3 meters for single-ended fast, 6 meters for single-ended slow, and 25 meters for differential.

Bus length calculations must include the internal bus lengths of devices.

• The SCSI controllers on a bus do not all have to be set to the same bus speed. However, setting any SCSI controller on a bus to fast SCSI bus speed reduces the total amount of single-ended cable that you can use for that bus from 6 meters to 3 meters. If one or more SCSI controllers are set to fast SCSI bus speed, your shared bus must adhere to this cable length restriction.
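The ID and length rules for a shared bus can be checked mechanically once you have recorded each sub-bus and the SCSI IDs that each member sees. In this hypothetical Python sketch, SubBus, length_problems, and id_problems are illustrative names. The device names and lengths are placeholders for values you measure or read from the console. Applying the fast-speed 3-meter limit to each single-ended sub-bus is a simplification of the rules above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SubBus:
    kind: str          # "single-ended" or "differential"
    cable_m: float     # external cable length, in meters
    internal_m: float  # internal bus length of attached devices, in meters


def length_limit(kind: str, fast: bool) -> float:
    """Sub-bus limit: 3 m single-ended fast, 6 m single-ended slow, 25 m differential."""
    if kind == "differential":
        return 25.0
    return 3.0 if fast else 6.0


def length_problems(sub_buses: List[SubBus], any_controller_fast: bool) -> List[str]:
    problems = []
    for index, bus in enumerate(sub_buses):
        total = bus.cable_m + bus.internal_m  # internal lengths count too
        limit = length_limit(bus.kind, any_controller_fast)
        if total > limit:
            problems.append(f"sub-bus {index}: {total:.1f} m exceeds {limit:.1f} m")
    return problems


def id_problems(views: Dict[str, Dict[str, int]]) -> List[str]:
    """views maps member name -> {device name: SCSI ID as that member sees it}.
    List host adapters and storage devices, but not signal converters."""
    problems = []
    if not views:
        return problems
    first, *others = views
    reference = views[first]
    for member in others:
        if views[member] != reference:
            problems.append(f"{member} and {first} see different SCSI IDs")
    ids = list(reference.values())
    if any(not 0 <= scsi_id <= 7 for scsi_id in ids):
        problems.append("SCSI IDs must be in the range 0-7")
    if len(ids) != len(set(ids)):
        problems.append("two devices share a SCSI ID")
    return problems
```

Because IDs must be unique and lie in the range 0-7, passing both ID checks also enforces the limit of eight devices per shared bus.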

Termination

Use this checklist to determine whether your SCSI bus is properly terminated:

• An improperly terminated bus segment will cause problems. The bus may operate properly for a period of time with no error conditions, but cause problems when under heavy load conditions.

• Check to see that the shared SCSI buses are properly terminated at both ends (this includes sub-buses or bus segments).

• Make sure that terminators that should be removed are not in place.

• Check to see that all movable resistors are installed correctly (they must not be installed backwards).


Host Adapters

Use this checklist to determine whether the host adapters in your configuration comply with TruCluster requirements.

• The SCSI host adapters must be installed in logically equivalent I/O bus slots in each system. When the kernel boots, the SCSI bus number is determined by the order in which the SCSI controllers are installed in the I/O bus slots, starting with the first slot.

• The correct firmware must be installed in all SCSI host adapters.

• SCSI IDs and mode settings must be correct for each host adapter.

• Depending on your configuration, you may have to remove the internal termination for a SCSI controller port.

• Unused ports on dual-ported host adapters must be terminated.

Disk Storage Enclosures

Use this checklist to determine whether the storage devices in your configuration comply with TruCluster requirements.

• Check to see that all systems on a given shared SCSI bus see the disks at the same device names.

• If you alter the logical configuration of a bus, allow for the possibility that previously configured disk device numbers have changed.

• Be careful when performing maintenance on any device located on the shared bus because of the constant activity on the bus. In general, to perform maintenance on a device without shutting down the ASE, you must isolate the device from the shared bus without affecting bus termination.

• To take a disk that is on the shared bus off line, you must ensure that no service is using the disk, unless the disk is part of an LSM-mirrored logical volume or a mirrored RAID device.

• If you disconnect a storage box from the SCSI bus, TruCluster behavior is unpredictable unless one of the following conditions is met:

The disks are part of a mirrored Logical Storage Manager (LSM) volume.

The disks are contained in a RAID set.

• When an offline disk comes back on line, you must use the asemgr utility to restart the service manually. Do this even if the disk was part of a mirrored volume, because the disk may not be reserved.


Signal Converters

Use this checklist to determine whether the signal converters in your configuration comply with TruCluster requirements.

• If the SCSI host adapter on an ASE member system is connected to a DWZZA, be sure to follow the proper sequence when turning the system on or off:

Turn on the system and allow it to complete startup diagnostics before you turn on any DWZZA attached to the system.

Before turning off a system, you must turn off all DWZZAs connected to the system.

• If you take off the DWZZA-AA cover to remove termination, ensure that star washers are in place on all four screws that hold the cover in place when you replace the cover. (If the star washers are missing, the DWZZA-AA is susceptible to problems caused by excessive noise.)

Tri-link Connectors

Tri-link connectors must adhere to the following restrictions:

• If you connect a cable to a tri-link connector, do not block access to the screws that mount the tri-link, or you will be unable to disconnect the tri-link from a device.

• If a device to which a tri-link connector is connected is at the end of a shared bus, you can attach a terminator to one of the tri-link's connectors to terminate the bus.

Network Connections

Use this checklist to determine whether the network in your configuration complies with TruCluster requirements:

• Systems must be on the same network subnet.

• Member system names must correspond to the name returned by the /sbin/hostname command.

• A DEMFA network controller cannot be the network controller associated with the host name of a member system.

• If the primary network connected to the systems becomes saturated, TruCluster operation is impaired. If you receive messages indicating that you are out of mbufs, you can:

Use a dedicated network as the primary network for the member systems.

Adjust the ubcmaxpercent and ubcminpercent configuration file parameters. Refer to the Digital UNIX manual System Tuning and Performance Management for more information.
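One way to confirm that member addresses share a subnet is to use Python's standard ipaddress module. The sketch below assumes that you know the prefix length of the subnet in use. The same_subnet function is a hypothetical helper, and the example addresses are documentation addresses, not real member addresses.

```python
import ipaddress
from typing import Iterable


def same_subnet(addresses: Iterable[str], prefix_length: int) -> bool:
    """True if every address lies in the same network of the given prefix length."""
    networks = {
        ipaddress.ip_interface(f"{address}/{prefix_length}").network
        for address in addresses
    }
    return len(networks) == 1
```

For example, same_subnet(["192.0.2.10", "192.0.2.11"], 24) is true, while adding an address such as 198.51.100.5 makes it false. The same check can be applied to NFS service addresses, which must be on the same IP subnet as the members.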


TruCluster Software Installation

Use this checklist to determine that the software installation of your TruCluster Available Server implementation meets TruCluster requirements.

• The TruCluster Software subset can be installed only on systems running the supported operating system version.

• Each member system should have at least 64 MB of memory.

• You must install all required operating system subsets.

• The TruCluster Software subset and license must be installed on each member system.

• If you install the PAK after you install the TruCluster Software, you must reboot the system after installing the PAK.

• If you use AdvFS, LSM, NFS, or RIS, additional subsets are required:

For AdvFS: POLYCENTER AdvFS, Utilities, and GUI

For LSM: Logical Storage Manager Advanced Utilities and GUI

For NFS: NFS Utilities (client and server)

For RIS: Remote Installation Service (RIS) subset

Use setld -i to determine which subsets are installed on your system.

• Check the TruCluster Available Server Software release notes and cover letter to see if there are operating system patches for TruCluster.


General Software Configuration

Use this checklist to determine that the general software configuration for your TruCluster implementation meets TruCluster requirements.

• You must set up the local network on each member system (see netsetup(8)).

• Set up NIS or BIND if you intend to use them for network name resolution on your network (see bindsetup(8) and ypsetup(8)).

• If you intend to use NFS services, you should set up NFS and start the daemons (see nfssetup(8)).

• Set up MAIL so that root can receive alert messages from TruCluster (see mailsetup(8)).

• You must set up a distributed time service such as the Network Time Protocol (NTP) daemon (xntpd). TruCluster relies on synchronized time on all member systems because it needs accurate timestamps for database versions.

• The host names and IP addresses of each member system must be included in each member system's /etc/hosts file.
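Every member's /etc/hosts file must list all members (and, for NFS services, the service names), so a small parser can flag missing entries. This Python sketch is hypothetical: parse_hosts and missing_names are illustrative names, and the sample file contents use placeholder addresses.

```python
from typing import Dict, List


def parse_hosts(text: str) -> Dict[str, str]:
    """Map every host name and alias in /etc/hosts-style text to its address."""
    names = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *hostnames = line.split()
        for name in hostnames:
            names[name] = address
    return names


def missing_names(hosts_text: str, required: List[str]) -> List[str]:
    """Return the required names (members, NFS services) absent from the file."""
    known = parse_hosts(hosts_text)
    return [name for name in required if name not in known]
```

Run the check against the hosts file from each member system, because each member needs its own complete copy.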


TruCluster Software Configuration

Use this checklist to determine that your TruCluster Software configuration meets TruCluster requirements.

• When you use the asemgr utility to add the member systems to the ASE, add all the member systems at the same time and from the same system. Do not run the asemgr utility on one system and add one member system, then run the asemgr utility on another system and add a different member system.

• When adding a new member to an existing TruCluster configuration, do not run asemgr on the new member; doing so creates a new ASE domain and a new database.

• To change the name of a member system, you must delete the member system from the ASE, initialize the system, and then add the new member system to the ASE.

• You cannot delete a member if it is included in the list of members that are favored to run a service, according to the service's Automatic Service Placement (ASP) policy. You must use the asemgr utility to change the service's ASP policy before deleting the member. This restriction does not apply if the service's policy allows you to change the only favored member to the one specified when you manually relocate a service; in that situation, use the asemgr utility to relocate the service before deleting the member.

• You cannot delete the member running the asemgr utility. If there is only one member system in the ASE, you cannot delete that member using asemgr. Use setld -d to delete the TruCluster Software subset from the last member system.
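The deletion rules in this checklist can be summarized as a set of preconditions. The Python sketch below is hypothetical: Service and deletion_blockers are illustrative names, not asemgr internals. For simplicity, the sketch ignores the manual-relocation exception described above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Service:
    name: str
    favored: List[str] = field(default_factory=list)  # ASP favored members


def deletion_blockers(member: str, asemgr_member: str,
                      members: List[str], services: List[Service]) -> List[str]:
    """Return the reasons the checklist gives for refusing to delete a member."""
    reasons = []
    if member == asemgr_member:
        reasons.append("cannot delete the member running asemgr")
    if members == [member]:
        reasons.append("last member: remove the subset with setld -d instead")
    for service in services:
        if member in service.favored:
            reasons.append(f"favored member of {service.name}: change its ASP first")
    return reasons
```

An empty list means that none of these checklist rules blocks the deletion.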


Service Configuration

Use this checklist to determine that the service configuration for your TruCluster implementation meets TruCluster requirements.

• The maximum number of services TruCluster can handle is 256.

• Only one member system at a time can run a given service.

• You cannot use an NFS service name that is the same as the name of a member system. You cannot use a service name that contains a slash (/).

• Only certain types of applications can be made available with an ASE service. The application must:

Run on only one system at a time.

Be able to be started and stopped using a set of commands performed in a specific order.

When you set up a service, these commands are included in the action scripts for the service.

• The balanced service policy attempts to balance the service load at the time a new service is started. It does not relocate services later to keep the load balanced.

• TruCluster Software clients refer to service names rather than server names. For instance, to access an NFS service nfs_service, a client has a line such as the following in its /etc/fstab file:

/project@nfs_service /usr/project nfs rw,bg 0 0

The client must also have an entry in its /etc/hosts file for nfs_service with an Internet address. This is a floating address aliased to the member system currently running the service.

• When adding a service, be sure to configure the service so that it can run on all member systems.
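The naming and capacity rules above, plus the path@host form of the client's /etc/fstab entry, can be sketched in Python. The function names are hypothetical, and the checks cover only the rules listed in this checklist.

```python
from typing import List

MAX_SERVICES = 256  # limit stated in the checklist


def service_name_problems(name: str, is_nfs: bool, member_names: List[str],
                          existing_services: List[str]) -> List[str]:
    """Return the checklist rules that a proposed new service would break."""
    problems = []
    if "/" in name:
        problems.append("service names cannot contain a slash (/)")
    if is_nfs and name in member_names:
        problems.append("an NFS service name cannot match a member system name")
    if len(existing_services) >= MAX_SERVICES:
        problems.append("the ASE already has the maximum of 256 services")
    return problems


def nfs_server_name(fstab_line: str) -> str:
    """Return the host part of a path@host /etc/fstab entry (the service name)."""
    spec = fstab_line.split()[0]
    return spec.partition("@")[2]
```

For the fstab example above, nfs_server_name returns nfs_service. That name, not a member name, is the one that must resolve through the client's /etc/hosts file.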


Disk Services

Use this checklist to determine that the disk services in your TruCluster Available Server implementation meet TruCluster requirements.

• NFS service names and member names must have addresses on the same IP subnet, and must be in all members' /etc/hosts files.

• Do not manually edit the /etc/exports.ase.service file (where service is the service name); you must use the asemgr utility to edit it.

• To use UFS with TruCluster, set up the disks in the usual way with disklabel and newfs. Do not locally mount the file systems, because TruCluster mounts them for you when the service is started.

• A disk cannot be used in more than one service because a service must have exclusive access to the disk. When you use a disk in a service, you use the entire disk.

• Applications in user-defined services cannot use disks. If your application is disk based, set up a disk service instead.
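A disk, LSM disk group, or AdvFS domain can belong to only one service, so you can check a planned service layout for overlaps before you run asemgr. This Python sketch is hypothetical: shared_resources is an illustrative name, and the resource names are taken from the example log earlier in this chapter.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def shared_resources(services: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """services maps a service name to the disks, LSM disk groups, or AdvFS
    domains it uses. Returns each resource claimed by more than one service."""
    owners = defaultdict(set)
    for service, resources in services.items():
        for resource in resources:
            owners[resource].add(service)
    return {resource: sorted(names)
            for resource, names in owners.items() if len(names) > 1}
```

A service may list several resources. Only a resource that appears under two or more services is reported.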


LSM Configuration

Use this checklist to determine that the LSM configuration in your TruCluster Available Server implementation meets TruCluster requirements.

• When using LSM in a TruCluster Available Server configuration, all member systems need LSM software so that any of them can run the service.

• All member systems need the rootdg disk group set up on a local (nonshared) disk. The rootdg disk group must be active (imported) whenever TruCluster is active, to provide an active disk group for LSM. Set up another disk group using the shared disks.

• You must set up a service's LSM disk groups and volumes on the same member on which you will run the asemgr utility to set up the service.

• All LSM disk group names in the TruCluster configuration must be unique.

• When adding a service that uses LSM or modifying a service's LSM configuration, the disk groups used in the service must be imported to the machine on which you are running the asemgr utility. LSM configuration changes can be made only on an imported disk group.

• A disk or disk group can be used in only one service, but a service can use more than one disk or disk group.

• Rereserving an LSM device allows you to replace a newly synchronized part of an LSM mirrored volume without stopping the service.

• If a service uses LSM mirrored volumes, do not modify the service while a mirrored volume is resynchronizing, because the resynchronization will abort and then restart. The abort will not corrupt the volume, but it will delay the volume resynchronization.


AdvFS Configuration

Use this checklist to determine that the AdvFS configuration in your TruCluster Available Server implementation meets TruCluster requirements.

• When using AdvFS in a TruCluster configuration, all member systems must have AdvFS software installed.

• To use AdvFS with TruCluster, set up the domains and filesets on the same member on which you will run asemgr to add the service.

• If you create a disk service that uses AdvFS and choose not to have the TruCluster Software automatically mount the filesets, a member system may panic unless the following conditions are met:

Before you add the disk service, make sure that the fileset is not already mounted.

If you mount a fileset in your own user-defined action scripts, make sure that the user-defined stop action script unmounts the file system and returns an error if the unmount fails.

• To modify an AdvFS configuration that a service uses, the disks must be configured on the system on which you make the modifications.

• AdvFS domain names must be unique on all the member systems.

• A service can use more than one AdvFS domain, but a domain cannot be used by more than one service.

• A service should control all the filesets in the domain; do not put one fileset in a service and mount another locally.

• Do not locally mount the filesets, because TruCluster mounts them for you when the service is started.

• Do not use quotas on a UFS file system or enable quotas on an AdvFS fileset that you want to fail over.


Action Scripts

Use this checklist to determine that the action scripts in your TruCluster Available Server implementation meet TruCluster requirements.

• When you add a disk service, you are not prompted for action scripts. To fail over an application, modify the service and specify the action scripts.

• If you create your own user-defined action scripts, you must install them locally on each member system.

• If you specify a pathname for a script when prompted by the asemgr utility, you can edit the script only by using asemgr. This is because the TruCluster Software uses the copy of the script that is in the TruCluster database, not the one located on the system.

• Ensure that only processes started and stopped by the service's action scripts can access the disks used in a service. Make sure that all the processes invoked by the start action script are stopped by the stop action script. The actions in the start script must be reversed by the actions in the stop script.

• If you must allow users access to the local mount point, ensure that the service's stop action script is able to stop the processes those users start.

• The TruCluster Software does not report messages generated by applications running within an ASE service. However, TruCluster action scripts capture any output from the commands that they execute. If the action script fails, the command output is logged in the daemon.log file.
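The rule that a stop script must reverse the start script amounts to undoing the start steps in last-in, first-out order. In this hypothetical Python sketch, each start action is paired with the stop action that undoes it. The action strings (including app_daemon) are placeholders, not commands that TruCluster runs.

```python
from typing import List, Tuple


def stop_sequence(start_steps: List[Tuple[str, str]]) -> List[str]:
    """start_steps lists (start action, matching stop action) pairs in start order.
    The stop script undoes them last-in, first-out."""
    return [undo for _, undo in reversed(start_steps)]


def unreversed_steps(start_steps: List[Tuple[str, str]],
                     stop_script: List[str]) -> List[str]:
    """Return start actions whose matching stop action is missing from the stop script."""
    return [do for do, undo in start_steps if undo not in stop_script]
```

With the steps mount /share/test/service1 and then start app_daemon, the stop sequence is stop app_daemon followed by umount /share/test/service1. The order matters: if a process still has files open on the mount point, the unmount fails with a Device busy error and the service cannot be stopped or relocated.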


Summary

Recognizing Common Problems

You should be familiar with the common TruCluster hardware and software problems that have been described in this chapter. In addition, you should be aware of the TruCluster limitations that can cause problems.

Applying TruCluster Configuration Guidelines

Use the TruCluster configuration guideline checklists to help determine whether your TruCluster Software configuration is properly set up. Checklists are provided for the following topics:

• General Hardware Configuration

• SCSI Bus

• Termination

• Host Adapters

• Disk Storage Enclosures

• Signal Converters

• Tri-Link Connectors

• Network Connections

• TruCluster Software Installation

• General Software Configuration

• ASE Configuration

• Service Configuration

• Disk Services

• LSM Configuration

• AdvFS Configuration

• Action Scripts


Exercises

TruCluster Message Interpretation: Exercise

Describe the common problem that generates the following TruCluster error message:

May 9 07:31:50 tailor ASE: tailor Agent ***ALERT:*** Possible security breach attempt: connect request from non-member node 'tinker'
May 9 07:31:50 tailor ASE: tailor Agent Notice: connection refused by connect callback

TruCluster Message Interpretation: Solution

A common cause of this message is that ASE has been installed on a system, but that system has not been added to the ASE configuration database on the original member system. The new system is therefore not recognized as a legitimate member.

Problem Relocating Service: Exercise

If you try to relocate a disk service and get a device busy message, what is the likely cause?

Problem Relocating Service: Solution

The message may indicate that the TruCluster Software could not unmount a disk, possibly because it could not stop all the processes accessing the disk.

The TruCluster Software cannot stop a service that uses mounted file systems, filesets, or volumes unless it can unmount them. TruCluster may not be able to unmount a disk in the following situations:

• A process that accesses the disk was started by the service's start action script, but was not stopped by the service's stop action script.

• A process that the start action script did not start, and that is unrelated to TruCluster, is accessing the disk. This could occur if a user logs in to the system on which the file system is locally mounted and changes directory to the mount point.

SCSI ID Limits: Exercise

Describe the limitations that TruCluster places on the use of SCSI IDs.

SCSI ID Limits: Solution

TruCluster Software configurations can use no more than 8 SCSI IDs (0-7) on each shared SCSI bus (devices that require SCSI IDs include host adapters and storage devices, but not signal converters).


Applying TruCluster Configuration Guidelines: Exercise

Describe important service configuration guidelines.

Applying TruCluster Configuration Guidelines: Solution

Service configuration guidelines are as follows:

• The maximum number of services TruCluster can handle is 256.

• Only one member system at a time can run a given service.

• You cannot use an NFS service name that is the same as the name of a member system. You cannot use a service name that contains a slash (/).

• Only certain types of applications can be made available with an ASE service. The application must:

Run on only one system at a time.

Be able to be started and stopped using a set of commands performed in a specific order.

When you set up a service, these commands are included in the action scripts for the service.

• The balanced service policy attempts to balance the service load at the time a new service is started. It does not relocate services later to keep the load balanced.

• TruCluster Software clients refer to service names rather than server names. For instance, to access an NFS service nfs_service, a client has a line such as the following in its /etc/fstab file:

/project@nfs_service /usr/project nfs rw,bg 0 0

The client must also have an entry in its /etc/hosts file for nfs_service with an Internet address. This is a floating address aliased to the member system currently running the service.

• When adding a service, be sure to configure the service so that it can run on all member systems.


12 Test


Questions

In the space provided, write the letter corresponding with the best answer to each multiple-choice, matching, or true/false question.

1. What is not a feature of the TruCluster Software product?

a. Concurrently Active Servers

b. Network Failover

c. Distributed Lock Manager

d. Transparent NFS Failover

2. Which of the following is a hardware requirement for TruCluster Software?

a. A shared SCSI bus

b. External disks in an expansion box

c. Ethernet or FDDI network

d. All of the above

3. Identify the Available Server component that controls Available Server operations on one member system.

a. aseagent

b. asedirector

c. asehsm

d. asemgr

4. The Available Server component that controls the ASE and coordinates the ASE activities.

a. aseagent

b. asedirector

c. asehsm

d. asemgr

5. The following characteristic is not required for an application to be suitable for an ASE service:

a. The application must be able to be stopped using a set of commands issued in a specific order

b. The application must read and write data from an NFS file system

c. The application must be able to be started using a set of commands issued in a specific order

d. The application must run on only one system at a time


6. When you make initial plans for an Available Server implementation, you must consider:

a. Services to be made available

b. Survivable failures

c. Whether services require custom scripts

d. All of the above

7. Which feature is not provided by the TruCluster Software?

a. Failover of services

b. Decoupling of host name and service name

c. Restarting failed applications

d. Determining status of ASE members

8. What provides the user interface to the ASE software?

a. asedirector

b. asemgr

c. Cluster Monitor

d. Availability Manager driver

9. A failure condition that does not result in a service relocation:

a. Network Interface Failure

b. Host Down Scenario

c. Device Failure

d. Network Partition

10. You must use a DWZZA signal converter with a KZMSA because:

a. The KZMSA has only one channel

b. The KZMSA uses the differential mode of signal transmission

c. You cannot remove the KZMSA internal terminators

d. The KZMSA operates on only a wide SCSI bus

11. You should use a DWZZA signal converter with a PMAZC SCSI host adapter because:

a. The DWZZA increases the maximum shared SCSI bus length

b. The PMAZC is a dual-ported SCSI host adapter

c. The PMAZC operates as either a fast or slow SCSI host adapter


d. Signal conversion is necessary to connect the PMAZC to a BA350 storage box

12. The maximum length of the shared SCSI bus in an Available Server configuration for Version 1.4 is:

a. 3 meters

b. 6 meters

c. 25 meters

d. 31 meters

13. Which cable do you attach to a single-ended device to enable you to disconnect the system without affecting SCSI bus termination?

a. BN21J

b. BN21H

c. BN21V-0B

d. BN21W-0B

14. You cannot mix single-ended and differential SCSI bus segments on a shared SCSI bus for ASE configurations.

a. True

b. False

15. What could you use in place of a BN21W-0B?

a. H8574-A

b. H8660-AA

c. H879-AA

d. H885-AA

16. You are configuring a shared SCSI bus on port A of a PMAZC. Which jumper do you remove to disable the PMAZC single-ended bus termination?

a. W1

b. W2

c. W3

d. W4


17. What is the most important thing to consider when configuring DEC 3000 Model 500 systems in a single-ended Available Server configuration with a BA350 without using a DWZZA?

a. Remove the PMAZC single-ended termination jumper for the port being used

b. Remove the flash memory write jumper

c. SCSI bus length is appropriate for the bus speed

18. Which console command sets the SCSI ID for a KZTSA?

a. t tc cnfg

b. t tc setid

c. t tc speed

d. t tc id

19. When using a KZMSA XMI to SCSI bus adapter in an Available Server configuration, you must use a DWZZA because you cannot remove the KZMSA single-ended bus termination.

a. True

b. False

20. The lfu utility modifies the SCSI ID or bus speed for which adapter?

a. KZMSA

b. KZPSA

c. KZTSA

d. PMAZC

21. Use the set console command to set the SCSI ID or bus speed for which adapter?

a. KZMSA

b. KZPSA

c. KZTSA

d. PMAZC


22. You have an Available Server configuration with two AlphaServer 2100 systems with KZPSA PCI to SCSI adapters, an HSZ40, a DWZZA-VA, and a BA350 containing four RZ28s. Why is the HSZ40 likely to be assigned SCSI ID zero (0)?

a. The DWZZA-VA is installed in BA350 slot 0

b. SCSI ID 0 is reserved for the HSZ40

c. The HSZ40 must have a higher priority than the RZ28s

d. SCSI ID 0 is reserved for the HSZ40

e. None of the above

23. Before installing the TruCluster Software:

a. Read the release notes

b. Verify system prerequisites

c. Install, set up, and test hardware

d. All of the above

24. Which subsets are not required for TruCluster Available Server Software Version 1.4?

a. OSFCLINET405

b. OSFPGMR405

c. OSFCMPLRS405

d. None of the above; all are required

25. To use the Cluster Monitor, you must install which subsets?

a. CXLSHRDA405

b. OSFCDEMIN405

c. TCRCMS140

d. All of the above

26. A rolling upgrade allows you to upgrade ASE member systems without shutting down the ASE.

a. True

b. False

27. For which environment can you use a rolling upgrade?

a. ASE V1.2A/Digital UNIX Version 3.2C

b. ASE V1.1/Digital UNIX Version 3.0

c. ASE V1.0A/Digital UNIX Version 2.1


d. ASE V1.0/Digital UNIX Version 2.0

28. You can perform a rolling upgrade to Digital UNIX Version 4.0A from which operating system version?

a. Digital UNIX Version 3.2C

b. Digital UNIX Version 3.2D

c. Digital UNIX Version 3.2F

d. Digital UNIX Version 3.2G

29. If ASE member systems are at DECsafe Available Server Version 1.2, you can preserve the ASE database if desired.

a. True

b. False

30. To add a new member to an existing TruCluster Available Server Software Version 1.4 configuration, you must shut down the ASE before adding the new member.

a. True

b. False

31. Which configuration is supported during a rolling upgrade to TruCluster Available Server Software Version 1.4?

a. ASE V1.3/Digital UNIX Version 3.2D and TruCluster Available Server Software Version 1.4/Digital UNIX Version 4.0A

b. ASE V1.3/Digital UNIX Version 3.2F and TruCluster Available Server Software Version 1.4/Digital UNIX Version 4.0A

c. ASE V1.3/Digital UNIX Version 3.2G and TruCluster Available Server Software Version 1.4/Digital UNIX Version 4.0A

d. All of the above

32. Action script containing commands to stop an application.

a. Add

b. Delete

c. Start

d. Stop

e. Check

33. Action script containing commands to determine if a service is running.

a. Add

b. Delete

c. Start

d. Stop

e. Check

34. Minimum scripts you must create for an application service.

a. Add and delete

b. Start and stop

c. Add, delete, start, and stop

d. Add, delete, start, stop, and check

35. How do you specify an action script that will remain external to TruCluster Available Server in asemgr?

a. Specify the pathname when asemgr prompts for the script name

b. Specify default, then add the commands to the skeleton script

c. Specify default, then add a pointer to the external script

d. Copy the external script to the ASE database

36. You create an action script and specify its pathname to asemgr. The next day you make changes to the script, but TruCluster Available Server does not use the new version. What action must you take?

a. Delete and add the service again

b. Modify the script through asemgr to update its database

c. Edit the default script

d. Stop and start the service

37. If you restrict ASE service A to one favored member and that member crashes:

a. ASE relocates service A on the member running the least number of services

b. ASE relocates service A on another favored member

c. ASE finds no other favored member and relocates service A on the member running the least number of services

d. ASE will not relocate service A

38. If you select the balanced service distribution policy for service A and the member system running it crashes:

a. ASE relocates service A on the member running the least number of services

b. ASE relocates service A on another favored member

c. ASE finds no other favored member and relocates service A on the member running the least number of services

d. ASE will not relocate service A

39. Before using asemgr to add an NFS service, you must:

a. Specify the service name and Internet address in all member /etc/hosts files

b. Specify the service name and Internet address in all client system /etc/hosts files

c. Set up the UFS device special files, AdvFS, or LSM volumes

d. All of the above

40. You are adding a disk service and the service name is in the /etc/hosts file only for the member system on which asemgr is being run. The service will be set up and started.

a. True

b. False

41. When you write an action script to both start and stop the user-defined service, what parameters will it require?

a. Service name only

b. Action only

c. Service name and action

d. None of the above

42. When setting up a highly available user-defined application, the application must be installed on all member systems.

a. True

b. False

43. If you stop a service that uses the Logical Storage Manager, the disk groups are deported, are inaccessible, and the volumes are deleted.

a. True

b. False

44. If the status of a service is unassigned, how would you manually restart the service?

a. Relocate a service

b. Restart a service

c. Set a service on line

d. Set a service off line

45. The command to create the cluster map.

a. /etc/CCM

b. cluster_map_create

c. cmon

d. tractd

46. The command to start the Cluster Monitor.

a. asemgr

b. cluster_map_create

c. cmon

d. monitor

47. You are logged in to ASE member alpha from workstation gamma and run the Cluster Monitor. You see ASE domain members alpha and beta. To run the LSM utility on beta you must:

a. Drag the beta icon and drop it on the dxlsm icon

b. Drag the dxlsm icon and drop it on the alpha icon

c. Drag the dxlsm icon and drop it on the beta icon

d. Drag the dxlsm icon and drop it on the gamma icon

48. To check the status of the shared disks associated with a particular ASE service, go to this view of the Cluster Monitor.

a. Top view, or main window

b. Devices view

c. Services view

d. Any of the above

49. In the Cluster Monitor, this symbol indicates the system is not reported as an ASE member.

a. Blank area in the shape of the system icon

b. Outline around the system graphic

c. Diagonal line across the system icon

d. Question mark in the middle of the system icon

50. When a network partition occurs, the TruCluster Software:

a. Stops all services

b. Fails over services to the member on which the Director is running

c. Continues running all services on the servers on which they are located

d. Reboots all systems on which services are running

51. Which command initializes a disk for LSM?

a. voldisk init

b. voldg init

c. voldg -g db

d. volrecover -g db -sb

52. If your ASE has a properly terminated bus, without stopping TruCluster Software activity, you can:

a. Add a storage box to the system

b. Add a new member system

c. Remove a DWZZA from the bus

d. Disconnect a member system from the bus

53. The file in which the Availability Manager logs error messages:

a. daemon.log

b. asecdb

c. kern.log

d. asemgr.log

54. When troubleshooting an active TruCluster implementation, the first thing you should do is:

a. Reset the daemons

b. Examine the error log messages

c. Stop the services

d. Turn off the DWZZAs

55. Which utility displays the status of an ASE service?

a. uerf

b. showfdmn

c. ps

d. asemgr

56. One common cause of problems on a TruCluster SCSI bus:

a. Improperly terminated bus segments

b. Cable lengths too long

c. Improperly configured SCSI IDs

d. All of the above

57. The host names and IP addresses of each member system must be included in which file on each member system?

a. rc.local

b. asecdb

c. /etc/hosts

d. /etc/fstab

Answers

1. c What is not a feature of the TruCluster Software product?

a. Concurrently Active Servers

b. Network Failover

c. Distributed Lock Manager

d. Transparent NFS Failover

2. d The following is a hardware requirement for TruCluster Software.

a. A shared SCSI bus

b. External disks in an expansion box

c. Ethernet or FDDI network

d. All of the above

3. a Identify the Available Server component that controls Available Server operations on one member system.

a. aseagent

b. asedirector

c. asehsm

d. asemgr

4. b The Available Server component that controls the ASE and coordinates the ASE activities.

a. aseagent

b. asedirector

c. asehsm

d. asemgr

5. b The following characteristic is not required for an application to be suitable for an ASE service:

a. The application must be able to be stopped using a set of commands issued in a specific order

b. The application must read and write data from an NFS file system

c. The application must be able to be started using a set of commands issued in a specific order

d. The application must run on only one system at a time

6. d When you make initial plans for an Available Server implementation, you must consider:

a. Services to be made available

b. Survivable failures

c. Do services require custom scripts?

d. All of the above

7. c Which feature is not provided by the TruCluster Software?

a. Failover of services

b. Decoupling of host name and service name

c. Restarting failed applications

d. Determining status of ASE members

8. b What provides the user interface to the ase software?

a. asedirector

b. asemgr

c. Cluster Monitor

d. Availability Manager driver

9. d A failure condition that does not result in a service relocation:

a. Network Interface Failure

b. Host Down Scenario

c. Device Failure

d. Network Partition

10. c You must use a DWZZA signal converter with a KZMSA because:

a. The KZMSA has only one channel

b. The KZMSA uses the differential mode of signal transmission

c. You cannot remove the KZMSA internal terminators

d. The KZMSA operates on only a wide SCSI bus

11. a You should use a DWZZA signal converter with a PMAZC SCSI host adapter because:

a. The DWZZA increases the maximum shared SCSI bus length

b. The PMAZC is a dual-ported SCSI host adapter

c. The PMAZC operates as either a fast or slow SCSI host adapter

d. Signal conversion is necessary to connect the PMAZC to a BA350 storage box

12. d The maximum length of the shared SCSI bus in an Available Server configuration for Version 1.4 is:

a. 3 meters

b. 6 meters

c. 25 meters

d. 31 meters

13. c Which cable do you attach to a single-ended device to enable you to disconnect the system without affecting SCSI bus termination?

a. BN21J

b. BN21H

c. BN21V-0B

d. BN21W-0B

14. b You cannot mix single-ended and differential SCSI bus segments on a shared SCSI bus for ASE configurations.

a. True

b. False

15. d What could you use in place of a BN21W-0B?

a. H8574-A

b. H8660-AA

c. H879-AA

d. H885-AA

16. b You are configuring a shared SCSI bus on port A of a PMAZC. Which jumper do you remove to disable the PMAZC single-ended bus termination?

a. W1

b. W2

c. W3

d. W4

17. c What is the most important thing to consider when configuring DEC 3000 Model 500 systems in a single-ended Available Server configuration with a BA350 without using a DWZZA?

a. Remove the PMAZC single-ended termination jumper for the port being used

b. Remove the flash memory write jumper

c. SCSI bus length is appropriate for the bus speed
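Question 17 turns on matching SCSI bus length to bus speed. The minimal sketch below checks one bus segment against the commonly cited SCSI-2 limits (single-ended: 6 meters at slow speed, 3 meters at fast speed; differential: 25 meters). Those limits are an assumption drawn from the SCSI-2 standard rather than from this guide, and the table and function names are hypothetical. The sketch does not model the Version 1.4 total of 31 meters from question 12, which depends on the supported DWZZA configurations.

```python
# Assumed general SCSI-2 maximum segment lengths in meters; these figures
# come from the SCSI-2 standard, not from this course guide.
MAX_SEGMENT_M = {
    ("single-ended", "slow"): 6.0,
    ("single-ended", "fast"): 3.0,
    ("differential", "slow"): 25.0,
    ("differential", "fast"): 25.0,
}


def segment_within_limit(mode: str, speed: str, length_m: float) -> bool:
    """Return True if one bus segment's length is within the limit for its mode and speed.

    mode is "single-ended" or "differential"; speed is "slow" or "fast".
    Any other combination raises KeyError.
    """
    return length_m <= MAX_SEGMENT_M[(mode, speed)]


# A 4 m single-ended segment passes at slow speed but fails at fast speed.
print(segment_within_limit("single-ended", "slow", 4.0))  # True
print(segment_within_limit("single-ended", "fast", 4.0))  # False
```

Checking each segment separately reflects the fact that a signal converter splits the bus into segments that each have their own electrical limit.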

18. b Which console command sets the SCSI ID for a KZTSA?

a. t tc cnfg

b. t tc setid

c. t tc speed

d. t tc id

19. a When using a KZMSA XMI to SCSI bus adapter in an Available Server configuration, you must use a DWZZA because you cannot remove the KZMSA single-ended bus termination.

a. True

b. False

20. a The lfu utility modifies the SCSI ID or bus speed for which adapter?

a. KZMSA

b. KZPSA

c. KZTSA

d. PMAZC

21. b Use the set console command to set the SCSI ID or bus speed for which adapter?

a. KZMSA

b. KZPSA

c. KZTSA

d. PMAZC

22. a You have an Available Server configuration with two AlphaServer 2100 systems with KZPSA PCI to SCSI adapters, an HSZ40, a DWZZA-VA, and a BA350 containing four RZ28s. Why is the HSZ40 likely to be assigned SCSI ID zero (0)?

a. The DWZZA-VA is installed in BA350 slot 0

b. SCSI ID 0 is reserved for the HSZ40

c. The HSZ40 must have a higher priority than the RZ28s

d. SCSI ID 0 is reserved for the HSZ40

e. None of the above

23. d Before installing the TruCluster Software:

a. Read the release notes

b. Verify system prerequisites

c. Install, set up, and test hardware

d. All of the above

24. d Which subsets are not required for TruCluster Available Server Software Version 1.4?

a. OSFCLINET405

b. OSFPGMR405

c. OSFCMPLRS405

d. None of the above; all are required

25. d To use the Cluster Monitor, you must install which subsets?

a. CXLSHRDA405

b. OSFCDEMIN405

c. TCRCMS140

d. All of the above

26. a A rolling upgrade allows you to upgrade ASE member systems without shutting down the ASE.

a. True

b. False

27. a For which environment can you use a rolling upgrade?

a. ASE V1.2A/Digital UNIX Version 3.2C

b. ASE V1.1/Digital UNIX Version 3.0

c. ASE V1.0A/Digital UNIX Version 2.1

d. ASE V1.0/Digital UNIX Version 2.0

28. d You can perform a rolling upgrade to Digital UNIX Version 4.0A from which operating system version?

a. Digital UNIX Version 3.2C

b. Digital UNIX Version 3.2D

c. Digital UNIX Version 3.2F

d. Digital UNIX Version 3.2G

29. a If ASE member systems are at DECsafe Available Server Version 1.2, you can preserve the ASE database if desired.

a. True

b. False

30. b To add a new member to an existing TruCluster Available Server Software Version 1.4 configuration, you must shut down the ASE before adding the new member.

a. True

b. False

31. c Which configuration is supported during a rolling upgrade to TruCluster Available Server Software Version 1.4?

a. ASE V1.3/Digital UNIX Version 3.2D and TruCluster Available Server Software Version 1.4/Digital UNIX Version 4.0A

b. ASE V1.3/Digital UNIX Version 3.2F and TruCluster Available Server Software Version 1.4/Digital UNIX Version 4.0A

c. ASE V1.3/Digital UNIX Version 3.2G and TruCluster Available Server Software Version 1.4/Digital UNIX Version 4.0A

d. All of the above

32. d Action script containing commands to stop an application.

a. Add

b. Delete

c. Start

d. Stop

e. Check

33. e Action script containing commands to determine if a service is running.

a. Add

b. Delete

c. Start

d. Stop

e. Check

34. b Minimum scripts you must create for an application service.

a. Add and delete

b. Start and stop

c. Add, delete, start, and stop

d. Add, delete, start, stop, and check

35. c How do you specify an action script that will remain external to TruCluster Available Server in asemgr?

a. Specify the pathname when asemgr prompts for the script name

b. Specify default, then add the commands to the skeleton script

c. Specify default, then add a pointer to the external script

d. Copy the external script to the ASE database

36. b You create an action script and specify its pathname to asemgr. The next day you make changes to the script, but TruCluster Available Server does not use the new version. What action must you take?

a. Delete and add the service again

b. Modify the script through asemgr to update its database

c. Edit the default script

d. Stop and start the service

37. d If you restrict ASE service A to one favored member and that member crashes:

a. ASE relocates service A on the member running the least number of services

b. ASE relocates service A on another favored member

c. ASE finds no other favored member and relocates service A on the member running the least number of services

d. ASE will not relocate service A

38. a If you select the balanced service distribution policy for service A and the member system running it crashes:

a. ASE relocates service A on the member running the least number of services

b. ASE relocates service A on another favored member

c. ASE finds no other favored member and relocates service A on the member running the least number of services

d. ASE will not relocate service A
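Questions 37 and 38 contrast where service A goes after its member crashes. The toy model below sketches those two outcomes and, as an assumption, a "favor" policy that falls back to the least-loaded member, the behavior described in option c of question 37. The function, the policy labels, and the member names are hypothetical. This is not the Automatic Service Placement (ASP) implementation or the asemgr interface.

```python
from typing import Dict, List, Optional


def choose_member(policy: str,
                  running: Dict[str, int],
                  favored: Optional[List[str]] = None) -> Optional[str]:
    """Pick the surviving member that should run a service after its member crashes.

    policy  -- "balanced", "favor", or "restrict" (hypothetical labels)
    running -- surviving member name -> number of services it currently runs
    favored -- favored members in order of preference
    Returns a member name, or None if the service is not relocated.
    """
    if policy in ("favor", "restrict"):
        for member in favored or []:
            if member in running:      # first surviving favored member wins
                return member
        if policy == "restrict":
            return None                # no favored member left: do not relocate
    if not running:
        return None                    # no surviving member at all
    # Balanced placement (and the "favor" fallback): fewest services wins,
    # with ties broken by member name so the result is deterministic.
    return min(running, key=lambda m: (running[m], m))


survivors = {"beta": 2, "gamma": 1}    # alpha, which ran service A, has crashed
print(choose_member("restrict", survivors, ["alpha"]))  # None (question 37)
print(choose_member("balanced", survivors))             # gamma (question 38)
```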

39. d Before using asemgr to add an NFS service, you must:

a. Specify the service name and Internet address in all member /etc/hosts files

b. Specify the service name and Internet address in all client system /etc/hosts files

c. Set up the UFS device special files, AdvFS, or LSM volumes

d. All of the above

40. b You are adding a disk service and the service name is in the /etc/hosts file only for the member system on which asemgr is being run. The service will be set up and started.

a. True

b. False

41. c When you write an action script to both start and stop the user-defined service, what parameters will it require?

a. Service name only

b. Action only

c. Service name and action

d. None of the above
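Question 41 notes that a single action script that handles both start and stop needs two parameters: the service name and the action. ASE action scripts are normally shell scripts, so this Python sketch only illustrates the argument handling. Several details are assumptions: the argument order, the helper functions, the service name websvc, and the convention that a zero exit status means success.

```python
import sys
from typing import Callable, Dict, List


def start_service(service_name: str) -> int:
    # Hypothetical placeholder for the commands that start the application.
    print(f"starting {service_name}")
    return 0


def stop_service(service_name: str) -> int:
    # Hypothetical placeholder for the commands that stop the application.
    print(f"stopping {service_name}")
    return 0


ACTIONS: Dict[str, Callable[[str], int]] = {
    "start": start_service,
    "stop": stop_service,
}


def main(argv: List[str]) -> int:
    """Dispatch on argv == [script, service_name, action]; return an exit status."""
    if len(argv) != 3 or argv[2] not in ACTIONS:
        prog = argv[0] if argv else "action_script"
        print(f"usage: {prog} service_name start|stop", file=sys.stderr)
        return 2
    service_name, action = argv[1], argv[2]
    return ACTIONS[action](service_name)


# Simulate an invocation such as "action_script websvc start".
print(main(["action_script", "websvc", "start"]))
```

As a standalone script, the entry point would end with `sys.exit(main(sys.argv))` so that the exit status reaches the caller.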

42. a When setting up a highly available user-defined application, the application must be installed on all member systems.

a. True

b. False

43. b If you stop a service that uses the Logical Storage Manager, the disk groups are deported, are inaccessible, and the volumes are deleted.

a. True

b. False

44. b If the status of a service is unassigned, how would you manually restart the service?

a. Relocate a service

b. Restart a service

c. Set a service on line

d. Set a service off line

45. b The command to create the cluster map.

a. /etc/CCM

b. cluster_map_create

c. cmon

d. tractd

46. c The command to start the Cluster Monitor.

a. asemgr

b. cluster_map_create

c. cmon

d. monitor

47. c You are logged in to ASE member alpha from workstation gamma and run the Cluster Monitor. You see ASE domain members alpha and beta. To run the LSM utility on beta you must:

a. Drag the beta icon and drop it on the dxlsm icon

b. Drag the dxlsm icon and drop it on the alpha icon

c. Drag the dxlsm icon and drop it on the beta icon

d. Drag the dxlsm icon and drop it on the gamma icon

48. c To check the status of the shared disks associated with a particular ASE service, go to this view of the Cluster Monitor.

a. Top view, or main window

b. Devices view

c. Services view

d. Any of the above

49. a In the Cluster Monitor, this symbol indicates the system is not reported as an ASE member.

a. Blank area in the shape of the system icon

b. Outline around the system graphic

c. Diagonal line across the system icon

d. Question mark in the middle of the system icon

50. c When a network partition occurs, the TruCluster Software:

a. Stops all services

b. Fails over services to the member on which the Director is running

c. Continues running all services on the servers on which they are located

d. Reboots all systems on which services are running

51. a Which command initializes a disk for LSM?

a. voldisk init

b. voldg init

c. voldg -g db

d. volrecover -g db -sb

52. d If your ASE has a properly terminated bus, without stopping TruCluster Software activity, you can:

a. Add a storage box to the system

b. Add a new member system

c. Remove a DWZZA from the bus

d. Disconnect a member system from the bus

53. c The file in which the Availability Manager logs error messages:

a. daemon.log

b. asecdb

c. kern.log

d. asemgr.log

54. b When troubleshooting an active TruCluster implementation, the first thing you should do is:

a. Reset the daemons

b. Examine the error log messages

c. Stop the services

d. Turn off the DWZZAs

55. d Which utility displays the status of an ASE service?

a. uerf

b. showfdmn

c. ps

d. asemgr

56. d One common cause of problems on a TruCluster SCSI bus:

a. Improperly terminated bus segments

b. Cable lengths too long

c. Improperly configured SCSI IDs

d. All of the above

57. c The host names and IP addresses of each member system must be included in which file on each member system?

a. rc.local

b. asecdb

c. /etc/hosts

d. /etc/fstab
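Questions 39, 40, and 57 all depend on every member's /etc/hosts file defining the member host names and any service names. The minimal sketch below performs that check. It assumes the usual hosts-file layout: an address followed by one or more names, with # starting a comment. The helper names, addresses, and service names are hypothetical, and the check reads only the text it is given, not DNS or NIS. Running it against each member's copy would catch a gap like the one in question 40.

```python
from typing import Iterable, Set


def hosts_names(text: str) -> Set[str]:
    """Collect every host name and alias defined in /etc/hosts-style text."""
    names: Set[str] = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and surrounding blanks
        fields = line.split()
        if len(fields) >= 2:                  # address followed by one or more names
            names.update(fields[1:])
    return names


def missing_names(text: str, required: Iterable[str]) -> Set[str]:
    """Return the required names that the hosts text does not define."""
    present = hosts_names(text)
    return {name for name in required if name not in present}


# Hypothetical hosts file for members alpha and beta plus one NFS service name.
sample = """
127.0.0.1   localhost
10.0.0.10   alpha      # ASE member
10.0.0.11   beta       # ASE member
10.0.0.20   nfsserv    # NFS service address
"""
print(sorted(missing_names(sample, ["alpha", "beta", "nfsserv", "dbserv"])))  # ['dbserv']
```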

Index

A

action scripts, 2–6, 6–9, 7–20
  add, 6–3
  check, 6–3
  delete, 6–3
  start, 6–3
  stop, 6–3
Address Resolution Protocol. See ARP
AdvFS, 1–2, 7–21
  Use with ASE, 7–3, 7–6, 7–11, 7–37
  Use with Available Server, 1–7
  Use with the TruCluster Available Server, 1–4
Alert messages, 10–7
arc, 3–67
ARC console, 3–67
ARP, 7–11
ASE database
  /usr/var/ase/config/asecdb, 4–14
ase driver, 2–6
ASE Logger daemon, 4–20
aseagent daemon, 2–5, 5–15
asecdb, 4–7
asedirector daemon, 2–5, 2–8
asehsm daemon, 2–5
aselogger daemon, 2–6, 5–15, 5–17, 6–4
asemgr, 7–2
asemgr utility, 4–10, 4–14, 4–16, 5–5, 6–4, 6–9, 6–12, 7–4, 7–5, 7–11, 7–12, 7–17, 7–19, 7–21, 7–27, 7–32, 7–35, 7–36, 10–13
aseprod, 1–5
ase_fix_config, 4–4, 4–9
ase_fix_config script, 4–20
ASP, 7–5, 7–20
Automatic Service Placement policy. See ASP
automount, 7–6
Availability Manager driver, 2–6
Available Server, 1–2
  troubleshooting, 1–17
Available Server configuration
  differential with PMAZC, 3–31, 3–35
  Single-ended with PMAZC, 3–28
Available Server configuration planning, 1–15
Available Server Environment. See ASE
Available Server Management Phases, 1–14

B

BA350, 3–11
  jumper, 3–12
  termination, 3–12
BA353, 3–11
BA356, 3–11
  jumper, 3–13
  termination, 3–13
Base operating system setup, 1–16
BC06P, 3–21
bindsetup, 4–5
BN21H, 3–21
BN21K, 3–21
BN21L, 3–21
BN21R, 3–21
BN21V-0B, 3–21
BN21W-0B, 3–21
BN23G, 3–21
Bus speed
  setting for KZPSA, 3–67

C

Cables, 3–20
  BC06P, 3–21
  BC09, 11–22
  BN21H, 3–21
  BN21K, 3–21
  BN21L, 3–21
  BN21R, 3–21
  BN21V-0B, 3–21
  BN21W-0B, 3–21
  BN23G, 3–21
CDFS, 3–26
cluster, 1–5
cluster configuration map. See /etc/CCM
cluster map, 8–3

Cluster Monitor, 8–6
  setup, 8–3
cluster_map_create command, 8–3
cmon utility, 8–6
Commands
  arc, 3–67
  iostat, 10–19
  netstat, 10–18
  ps, 10–17
  rpcinfo, 10–17
  scu, 10–18
  set, 3–67
  set pkn, 3–67
  show config, 3–64, 3–66
  show device, 3–64, 3–66
  show pk#*, 3–64, 3–66
  t, 3–42
  t Test TURBOchannel command, 3–42
  uerf, 10–16
Compact Disk File System. See CDFS
Configuration
  starting, 3–26
Configuring ASE hardware, 1–15
Configuring ASE Services, 1–16
Configuring Available Server
  with KZMSA and BA350, 3–51
  with KZMSA and HSZ40, 3–54
  with PMAZC and an HSZ10 or HSZ40, 3–38
  with PMAZC, differential bus, and BA350, 3–31
  with PMAZC, differential bus, and BA356, 3–35
Configuring TruCluster Available Server
  with PMAZC, single-ended bus, and BA350, 3–27
Console Utility
  t Test TURBOchannel command, 3–42

D

daemon.log file, 10–5
Database format change, 4–7
director daemon, 2–5
Disk devices, 3–15
disk service, 6–4, 7–3, 7–19, 7–21
disklabel, 7–6
Displaying devices on an AlphaServer 1000, 2000 or 2100, 3–66
doconfig, 4–4, 4–9
DWZZA, 3–15
  in BA350 slot 0, 3–32
  termination, 3–17, 3–32, 3–35, 3–39, 3–51, 3–71
DWZZA-AA, 3–36
DWZZA-VA
  in BA356 slot 0, 3–36
DWZZB, 3–15
  termination, 3–19, 3–51
DWZZB-VW
  in BA356 slot 0, 3–36

E

edquota, 7–7
/etc/CCM, 8–3, 8–4
/etc/exports, 7–17
/etc/exports.ase, 7–17
/etc/exports.ase.servicename, 7–17
/etc/fstab, 7–4, 7–6, 7–7, 7–11, 7–20
/etc/host, 3–25
/etc/hosts, 4–5, 4–9, 4–10, 4–11, 7–4, 7–11
/etc/ntp.conf, 4–4
/etc/syslog.conf, 5–17
event logging, 10–5

F

failover, 2–9
Fast SCSI, 3–3
firmware update
  KZPSA, 3–67
Firmware Update utility, 3–26
fwupdate.exe, 3–67

G

Global Event Logging, 1–4

H

H6660-AA, 3–21, 3–22
H8574-A, 3–21
H879-AA, 3–21, 3–22
H885-AA, 3–21, 3–22
Hardware components, 3–10
  disk devices, 3–15
  SCSI cables, 3–20
  SCSI controllers, 3–10
  signal converters, 3–15
  storage expansion units, 3–11
  systems, 3–10
  terminators, 3–20
  tri-link connector, 3–20
Host Status Monitor, 2–5
HSM daemon, 2–5

I

Installing TruCluster Software, 1–16, 4–17, 4–23
installupdate, 4–10, 4–11

K

kern.log file, 10–5
Kernel build, 4–20
KZMSA, 3–50
  Available Server configuration with BA350, 3–51
  Available Server configuration with HSZ40, 3–54
  boot ROM part numbers, 3–50
  Disable Reset configuration option, 3–51, 3–55
  hardware revision, 3–50
  NCR chips, 3–50
  setting SCSI ID, 3–51, 3–55
  setting SCSI speed, 3–51, 3–55
  updating firmware, 3–51, 3–55
KZMSA and DWZZA, 3–4
KZPSA
  bus speed, 3–67
  SCSI bus ID, 3–67
  setting bus speed, 3–67
  setting SCSI ID, 3–67
KZTSA
  displaying and changing SCSI ID, 3–41
  setting up an Available Server configuration, 3–44, 3–47
  t command, 3–42

L

LFU, 3–56
LFU utility, 3–50, 3–58
Loadable Firmware Update utility. See LFU
login service, 7–31
LSM, 1–2, 7–21
  Use with ASE, 7–3, 7–7, 7–11, 7–37
  Use with the TruCluster Available Server, 2–19

M

mailsetup, 4–5
member systems, 1–6
mirrored stripe set, 7–21
mirroring, 7–21
mkfset, 7–7
mount command, 4–17, 4–23

N

netsetup, 3–25, 4–5
Network adapters, 3–25
Network Time Protocol. See xntpd
newfs, 7–6
NFS, 1–5, 7–4, 7–6, 7–7, 7–11
NFS service, 6–4, 7–3, 7–11
nfssetup, 4–5
NTP. See xntpd

P

PMAZC
  Available Server configuration with an HSZ10 or HSZ40, 3–38
  Available Server differential configuration with BA350, 3–31
  Available Server differential configuration with BA356, 3–35
  Available Server single-ended configuration with BA350, 3–27
  configuring for a differential configuration, 3–31, 3–35, 3–38
  displaying and changing SCSI ID, 3–41
  displaying and changing speed, 3–41
  in single-ended configuration, 3–27
  install, 3–28, 3–31, 3–35, 3–39
  internal jumpers, 3–41
  jumpers, 3–28, 3–31, 3–35, 3–39, 3–41
  setting SCSI bus speed, 3–28
  setting SCSI ID, 3–28
  t command, 3–42
  termination, 3–28, 3–31, 3–35, 3–39
  used with DWZZA, 3–32, 3–39
  used with DWZZA-AA, 3–35
/proc, 7–7
pseudo host name, 7–31

Q

quota, 7–7, 11–34
quota.group, 7–7
quota.user, 7–7
quotacheck, 7–7

R

Replacing an LSM Shared Disk, 9–9
Required subsets, 4–3
/.rhosts, 8–3

S

/sbin/init.d/asemember script, 5–15
/sbin/init.d/asemember stop, 4–10, 4–11
SCSI Bus ID
  setting for KZPSA, 3–67
SCSI bus length, 3–3
SCSI bus termination, 3–20
SCSI cables, 3–20
SCSI controllers, 3–10
sendmail.cf, 7–18
service
  highly available, 7–3
set, 3–67
setid, 3–28, 3–31, 3–35, 3–39
setld -d, 4–10, 4–11
setld -i, 4–10, 4–11
setld utility, 4–8, 4–10, 4–14, 4–16, 4–17, 4–23, 5–9
Setting bus speed
  KZPSA, 3–67
Setting SCSI ID
  KZPSA, 3–67
Shared SCSI bus selection, 4–20
show config, 3–64, 3–66
show device, 3–64, 3–66
show pk#*, 3–64, 3–66
showfdmn command, 10–19
showfsets command, 10–19
Signal converters, 3–15
sizer, 4–4
Slow SCSI, 3–3
Software subsets, 4–3
Starting an Available Server configuration, 3–26
Storage expansion units, 3–11
stripe set, 7–21
Supported hardware
  cables, 3–20
  disk devices, 3–15
  signal converters, 3–15
  storage expansion units, 3–11
  terminators, 3–20
  tri-link connector, 3–20
Supported systems, 3–10
syslog, 5–17, 6–4, 10–5

T

t, 3–42
Terminators, 3–20
  H6660-AA, 3–21, 3–22
  H8574-A, 3–21, 3–22
  H879-AA, 3–21, 3–22
tri-connector
  H885-AA, 3–21
tri-link connector, 3–20
  H885-AA, 3–22
Troubleshooting, 10–2. See TruCluster Available Server troubleshooting
TruCluster Available Server
  troubleshooting, 10–2
  troubleshooting procedures, 10–8
TruCluster Available Server installation
  Adding a member system to an existing ASE, 4–16
  Rolling upgrade, 4–10
  Setting up an ASE for the first time, 4–8
  Simultaneous upgrade, 4–14
TruCluster Available Server troubleshooting, 11–2
  common problems, 11–3
  configuration guidelines, 11–23
TruCluster Software
  configuring services, 1–16
  failover testing, 1–16
  installing, 1–16
TruCluster troubleshooting
  system monitoring tools, 10–12

U

user-defined service, 6–4, 7–3, 7–27, 7–31
/usr/bin/X11/cmon, 8–6
/usr/sbin/asemgr, 2–6
/usr/var/ase/config/asecdb, 2–6, 5–5
Utilities
  firmware update utility, 3–67
  LFU, 3–50, 3–56
  setid, 3–28, 3–31, 3–35, 3–39

V

/var/adm/syslog.dated, 7–12
/var/adm/syslog.dated/date/daemon.log, 5–15, 5–17, 5–22
/var/adm/syslog.dated/date/kern.log, 5–17
/var/spool/mail, 7–18
/var/spool/mqueue, 7–18
vedquota, 7–7
volprint, 10–19

X

X Window System, 8–6
xntpd, 4–4

Y

Y cable
  BN21V-0B, 3–21
  BN21W-0B, 3–21
ypsetup, 4–5
