VERITAS Cluster Server for UNIX, Fundamentals (Lessons) HA-VCS-410-101A-2-10-SRT (100-002149-A)


COURSE DEVELOPERS
Bilge Gerrits
Siobhan Seeger
Dawn Walker

LEAD SUBJECT MATTER EXPERTS

Geoff Bergren
Connie Economou
Paul Johnston
Dave Rogers
Jim Senicka
Pete Toemmes

TECHNICAL CONTRIBUTORS AND REVIEWERS

Billie Bachra
Barbara Ceran
Bob Lucas
Gene Henriksen
Margy Cassidy

Disclaimer

The information contained in this publication is subject to change without notice. VERITAS Software Corporation makes no warranty of any kind with regard to this guide, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. VERITAS Software Corporation shall not be liable for errors contained herein or for incidental or consequential damages in connection with the furnishing, performance, or use of this manual.

Copyright

Copyright © 2005 VERITAS Software Corporation. All rights reserved. No part of the contents of this training material may be reproduced in any form or by any means or be used for the purposes of training or education without the written permission of VERITAS Software Corporation.

Trademark Notice

VERITAS, the VERITAS logo, VERITAS FirstWatch, VERITAS Cluster Server, VERITAS File System, VERITAS Volume Manager, VERITAS NetBackup, and VERITAS HSM are registered trademarks of VERITAS Software Corporation. Other product names mentioned herein may be trademarks and/or registered trademarks of their respective companies.

VERITAS Cluster Server for UNIX, Fundamentals
Participant Guide

April 2005 Release

VERITAS Software Corporation
350 Ellis Street
Mountain View, CA 94043
Phone 650-527-8000
www.veritas.com


Table of Contents

Course Introduction
VERITAS Cluster Server Curriculum ................ Intro-2
Course Prerequisites ................ Intro-3
Course Objectives ................ Intro-4

Certification Exam Objectives ................ Intro-5
Cluster Design Input ................ Intro-6
Sample Design Input ................ Intro-7
Sample Design Worksheet ................ Intro-8

Lab Design for the Course ................ Intro-9
Lab Naming Conventions ................ Intro-10
Classroom Values for Labs ................ Intro-11

Course Overview ................ Intro-12
Legend ................ Intro-15

Lesson 1: VCS Building Blocks
Introduction ................ 1-2
Cluster Terminology ................ 1-4

A Nonclustered Computing Environment ................ 1-4
Definition of a Cluster ................ 1-5
Definition of VERITAS Cluster Server and Failover ................ 1-6
Definition of an Application Service ................ 1-7
Definition of Service Group ................ 1-8
Service Group Types ................ 1-9
Definition of a Resource ................ 1-10
Resource Dependencies ................ 1-11
Resource Attributes ................ 1-12
Resource Types and Type Attributes ................ 1-13
Agents: How VCS Controls Resources ................ 1-14
Using the VERITAS Cluster Server Bundled Agents Reference Guide ................ 1-15

Cluster Communication ................ 1-16
Low-Latency Transport ................ 1-17
Group Membership Services/Atomic Broadcast (GAB) ................ 1-18
The Fencing Driver ................ 1-19
The High Availability Daemon ................ 1-20
Comparing VCS Communication Protocols and TCP/IP ................ 1-21

Maintaining the Cluster Configuration ................ 1-22
VCS Architecture ................ 1-24

How does VCS know what to fail over? ................ 1-24
How does VCS know when to fail over? ................ 1-24

Supported Failover Configurations ................ 1-25
Active/Passive ................ 1-25
N-to-1 ................ 1-26
N + 1 ................ 1-27
Active/Active ................ 1-28
N-to-N ................ 1-29


Lesson 2: Preparing a Site for VCS
Planning for Implementation ................ 2-4

Implementation Needs ................ 2-4
The Implementation Plan ................ 2-5
Using the Design Worksheet ................ 2-6

Hardware Requirements and Recommendations ................ 2-7
SCSI Controller Configuration for Shared Storage ................ 2-9
Hardware Verification ................ 2-12

Software Requirements and Recommendations ................ 2-13
Software Verification ................ 2-15

Preparing Cluster Information ................ 2-16
VERITAS Security Services ................ 2-17

Lab 2: Validating Site Preparation ........................................................................ 2-19

Lesson 3: Installing VERITAS Cluster Server
Introduction ................ 3-2
Using the VERITAS Product Installer ................ 3-4

Viewing Installation Logs ................ 3-4
The installvcs Utility ................ 3-5
Automated VCS Installation Procedure ................ 3-6
Installing VCS Updates ................ 3-10

VCS Configuration Files ................ 3-11
VCS File Locations ................ 3-11
Communication Configuration Files ................ 3-12
Cluster Configuration Files ................ 3-13

Viewing the Default VCS Configuration ................ 3-14
Viewing Installation Results ................ 3-14
Viewing Status ................ 3-15

Other Installation Considerations ................ 3-16
Fencing Considerations ................ 3-16
Cluster Manager Java GUI ................ 3-17

Lab 3: Installing VCS ............................................................................................ 3-20

Lesson 4: VCS Operations
Introduction ................ 4-2
Managing Applications in a Cluster Environment ................ 4-4

Key Considerations ................ 4-4
VCS Management Tools ................ 4-5

Service Group Operations ................ 4-6
Displaying Attributes and Status ................ 4-7
Bringing Service Groups Online ................ 4-9
Taking Service Groups Offline ................ 4-11
Switching Service Groups ................ 4-12
Freezing a Service Group ................ 4-13
Bringing Resources Online ................ 4-14
Taking Resources Offline ................ 4-15
Clearing Resource Faults ................ 4-16


Using the VCS Simulator ................ 4-18
The Simulator Java Console ................ 4-19
Creating a New Simulator Configuration ................ 4-20
Simulator Command-Line Interface ................ 4-21
Using the Java GUI with the Simulator ................ 4-22

Lab 4: Using the VCS Simulator ........................................................................... 4-24

Lesson 5: Preparing Services for VCS
Introduction ................ 5-2
Preparing Applications for VCS ................ 5-4

Application Service Component Review ................ 5-4
Configuration and Migration Procedure ................ 5-5

One-Time Configuration Tasks ................ 5-6
Identifying Components ................ 5-6
Configuring Shared Storage ................ 5-7
Configuring the Network ................ 5-8
Configuring the Application ................ 5-12

Testing the Application Service ................ 5-13
Bringing Up Resources ................ 5-14
Verifying Resources ................ 5-18
Testing the Integrated Components ................ 5-19

Stopping and Migrating an Application Service ................ 5-20
Stopping Application Components ................ 5-20
Manually Migrating an Application Service ................ 5-21

Validating the Design Worksheet ................ 5-22
Documenting Resource Attributes ................ 5-22
Checking Resource Attributes ................ 5-23
Documenting Resource Dependencies ................ 5-24
Validating Service Group Attributes ................ 5-25

Lab 5: Preparing Application Services .................................................................. 5-27

Lesson 6: VCS Configuration Methods
Introduction ................ 6-2
Overview of Configuration Methods ................ 6-4

Effects on the Cluster ................ 6-5
Controlling Access to VCS ................ 6-6

Relating VCS and UNIX User Accounts ................ 6-6
Simplifying VCS Administrative Access ................ 6-7
User Accounts ................ 6-8
Changing Privileges ................ 6-10
VCS Access in Secure Mode ................ 6-11

Online Configuration ................ 6-12
How VCS Changes the Online Cluster Configuration ................ 6-13
Opening the Cluster Configuration ................ 6-14
Saving the Cluster Configuration ................ 6-15
Closing the Cluster Configuration ................ 6-16
How VCS Protects the Cluster Configuration ................ 6-17


Offline Configuration ................ 6-18
Offline Configuration Examples ................ 6-19

Starting and Stopping VCS ................ 6-22
How VCS Starts Up by Default ................ 6-22
VCS Startup with a .stale File ................ 6-25
Forcing VCS to Start from a Wait State ................ 6-26
Building the Configuration Using a Specific main.cf File ................ 6-28
Stopping VCS ................ 6-30

Lab 6: Starting and Stopping VCS ........................................................................ 6-32

Lesson 7: Online Configuration of Service Groups
Introduction ................ 7-2
Online Configuration Procedure ................ 7-4

Creating a Service Group ................ 7-4
Adding a Service Group ................ 7-5

Adding a Service Group Using the GUI ................ 7-5
Adding a Service Group Using the CLI ................ 7-6
Classroom Exercise: Creating a Service Group ................ 7-7
Design Worksheet Example ................ 7-8

Adding Resources ................ 7-9
Online Resource Configuration Procedure ................ 7-9
Adding Resources Using the GUI: NIC Example ................ 7-10
Adding an IP Resource ................ 7-12
Classroom Exercise: Creating Network Resources Using the GUI ................ 7-13
Adding a Resource Using the CLI: DiskGroup Example ................ 7-16
Classroom Exercise: Creating Storage Resources Using the CLI ................ 7-20
The Process Resource ................ 7-23
Classroom Exercise: Creating a Process Resource ................ 7-24

Solving Common Configuration Errors ................ 7-26
Flushing a Service Group ................ 7-27
Disabling a Resource ................ 7-28
Copying and Deleting a Resource ................ 7-29

Testing the Service Group ................ 7-30
Linking Resources ................ 7-31
Resource Dependencies ................ 7-32
Classroom Exercise: Linking Resources ................ 7-33
Design Worksheet Example ................ 7-34
Setting the Critical Attribute ................ 7-35
Classroom Exercise: Testing the Service Group ................ 7-36
A Completed Process Service Group ................ 7-37

Lab 7: Online Configuration of a Service Group ................................................... 7-41

Lesson 8: Offline Configuration of Service Groups
Introduction ................ 8-2
Offline Configuration Procedures ................ 8-4

New Cluster ................ 8-4
Example Configuration File ................ 8-5
Existing Cluster ................ 8-7


First System ................ 8-7
Using the Design Worksheet ................ 8-10

Resource Dependencies ................ 8-11
A Completed Configuration File ................ 8-12

Offline Configuration Tools ................ 8-14
Editing Configuration Files ................ 8-14
Using the VCS Simulator ................ 8-15

Solving Offline Configuration Problems ................ 8-16
Common Problems ................ 8-16
All Systems in a Wait State ................ 8-17
Propagating an Old Configuration ................ 8-17
Recovering from an Old Configuration ................ 8-18
Configuration File Backups ................ 8-19

Testing the Service Group ................ 8-20
Service Group Testing Procedure ................ 8-20

Lab 8: Offline Configuration of Service Groups..................................................... 8-22

Lesson 9: Sharing Network Interfaces
Introduction ................ 9-2
Sharing Network Interfaces ................ 9-4

Conceptual View ................ 9-4
Alternate Network Configurations ................ 9-6

Using Proxy Resources ................ 9-6
The Proxy Resource Type ................ 9-7

Using Parallel Service Groups ................ 9-8
Determining Service Group Status ................ 9-8
Phantom Resources ................ 9-9
The Phantom Resource Type ................ 9-10
Configuring a Parallel Service Group ................ 9-11
Properties of Parallel Service Groups ................ 9-12

Localizing Resource Attributes ................ 9-13
Localizing a NIC Resource Attribute ................ 9-13

Lab 9: Creating a Parallel Service Group.............................................................. 9-15

Lesson 10: Configuring Notification
Introduction ................ 10-2
Notification Overview ................ 10-4

Message Queue ................ 10-4
Message Severity Levels ................ 10-5

Configuring Notification ................ 10-6
The NotifierMngr Resource Type ................ 10-8
Configuring the ResourceOwner Attribute ................ 10-10
Configuring the GroupOwner Attribute ................ 10-11
Configuring the SNMP Console ................ 10-12

Using Triggers for Notification ................ 10-13
Lab 10: Configuring Notification ................ 10-15


Lesson 11: Configuring VCS Response to Resource Faults
Introduction ................ 11-2
VCS Response to Resource Faults ................ 11-4

Failover Decisions and Critical Resources ................ 11-4
How VCS Responds to Resource Faults by Default ................ 11-5
The Impact of Service Group Attributes on Failover ................ 11-7
Practice: How VCS Responds to a Fault ................ 11-10

Determining Failover Duration ................ 11-11
Failover Duration on a Resource Fault ................ 11-11
Adjusting Monitoring ................ 11-13
Adjusting Timeout Values ................ 11-14

Controlling Fault Behavior ................ 11-15
Type Attributes Related to Resource Faults ................ 11-15
Modifying Resource Type Attributes ................ 11-18
Overriding Resource Type Attributes ................ 11-19

Recovering from Resource Faults ................ 11-20
Recovering a Resource from a FAULTED State ................ 11-20
Recovering a Resource from an ADMIN_WAIT State ................ 11-22

Fault Notification and Event Handling ................ 11-24
Fault Notification ................ 11-24
Extended Event Handling Using Triggers ................ 11-25
The Role of Triggers in Resource Faults ................ 11-25

Lab 11: Configuring Resource Fault Behavior .................................................... 11-28

Lesson 12: Cluster Communications
Introduction ................ 12-2
VCS Communications Review ................ 12-4

VCS On-Node Communications ................ 12-4
VCS Inter-Node Communications ................ 12-5
VCS Communications Stack Summary ................ 12-5
Cluster Interconnect Specifications ................ 12-6

Cluster Membership ................ 12-7
GAB Status and Membership Notation ................ 12-7
Viewing LLT Link Status ................ 12-9
The lltstat Command ................ 12-9

Cluster Interconnect Configuration ................ 12-10
Configuration Overview ................ 12-10
LLT Configuration Files ................ 12-11
The sysname File ................ 12-15
The GAB Configuration File ................ 12-16

Joining the Cluster Membership ................ 12-17
Seeding During Startup ................ 12-17
LLT, GAB, and VCS Startup Files ................ 12-18
Manual Seeding ................ 12-19
Probing Resources During Startup ................ 12-20


Lesson 13: System and Communication Faults
Introduction ................ 13-2
Ensuring Data Integrity ................ 13-4

VCS Response to System Failure ................ 13-5
Failover Duration on a System Fault ................ 13-6

Cluster Interconnect Failures ................ 13-7
Single LLT Link Failure ................ 13-7
Jeopardy Membership ................ 13-8
Recovery Behavior ................ 13-11
Modifying the Default Recovery Behavior ................ 13-12
Potential Split Brain Condition ................ 13-13
Interconnect Failures with a Low-Priority Public Link ................ 13-14
Interconnect Failures with Service Group Heartbeats ................ 13-16
Preexisting Network Partition ................ 13-17

Changing the Interconnect Configuration ................ 13-18
Modifying the Cluster Interconnect Configuration ................ 13-19
Adding LLT Links ................ 13-20

Lab 13: Testing Communication Failures ................ 13-22
Optional Lab: Configuring the InJeopardy Trigger ................ 13-23

Lesson 14: I/O Fencing
Introduction ................ 14-2
Data Protection Requirements ................ 14-4

Understanding the Data Protection Problem ................ 14-4
Split Brain Condition ................ 14-7
Data Protection Requirements ................ 14-8

I/O Fencing Concepts and Components ................ 14-9
I/O Fencing Components ................ 14-10

I/O Fencing Operations ................ 14-12
Registration with Coordinator Disks ................ 14-12
Service Group Startup ................ 14-13
System Failure ................ 14-14
Interconnect Failure ................ 14-15
I/O Fencing Behavior ................ 14-19
I/O Fencing with Multiple Nodes ................ 14-20

I/O Fencing Implementation ................ 14-21
Communication Stack ................ 14-21
Fencing Driver ................ 14-23
Fencing Implementation in Volume Manager ................ 14-24
Fencing Implementation in VCS ................ 14-25
Coordinator Disk Implementation ................ 14-26

Configuring I/O Fencing ................ 14-27
Fencing Effects on Disk Groups ................ 14-31

Stopping and Recovering Fenced Systems ................ 14-32
Stopping Systems Running I/O Fencing ................ 14-32
Recovery with Running Systems ................ 14-33
Recovering from a Partition-In-Time ................ 14-34

Lab 14: Configuring I/O Fencing ......................................................................... 14-36


Lesson 15: Troubleshooting
Introduction ................ 15-2
Monitoring VCS ................ 15-4

VCS Logs ................ 15-5
UMI-Based Support ................ 15-7
Using the VERITAS Support Web Site ................ 15-8

Troubleshooting Guide ................ 15-9
Procedure Overview ................ 15-9
Using the Troubleshooting Job Aid ................ 15-10

Cluster Communication Problems ................ 15-11
Checking GAB ................ 15-11
Checking LLT ................ 15-12
Duplicate Node IDs ................ 15-13
Problems with LLT ................ 15-14

VCS Engine Problems ................ 15-15
Startup Problems ................ 15-15
STALE_ADMIN_WAIT ................ 15-16
ADMIN_WAIT ................ 15-17

Service Group and Resource Problems ................ 15-18
Service Group Problems ................ 15-18
Resource Problems ................ 15-27
Agent Problems and Resource Type Problems ................ 15-30

Archiving VCS-Related Files ................ 15-32
Making Backups ................ 15-32
The hasnap Utility ................ 15-33

Lab 15: Troubleshooting ..................................................................................... 15-35

Index


Course Introduction


VERITAS Cluster Server Curriculum
The VERITAS Cluster Server curriculum is a series of courses that are designed to provide a full range of expertise with VERITAS Cluster Server (VCS) high availability solutions—from design through disaster recovery.

VERITAS Cluster Server, Fundamentals
This course covers installation and configuration of common VCS configurations, focusing on two-node clusters running application and database services.

VERITAS Cluster Server, Implementing Local Clusters
This course focuses on multinode VCS clusters and advanced topics related to more complex cluster configurations.

VERITAS Cluster Server Agent Development
This course enables students to create and customize VCS agents.

High Availability Design Using VERITAS Cluster Server
This course enables participants to translate high availability requirements into a VCS design that can be deployed using VERITAS Cluster Server.

Disaster Recovery Using VVR and Global Cluster Option
This course covers cluster configurations across remote sites, including replicated data clusters (RDCs) and the Global Cluster Option for wide-area clusters.

Learning Path (slide diagram): the VERITAS Cluster Server curriculum begins with VERITAS Cluster Server, Fundamentals, continues with VERITAS Cluster Server, Implementing Local Clusters, and leads on to High Availability Design Using VERITAS Cluster Server, VERITAS Cluster Server Agent Development, and Disaster Recovery Using VVR 4.0 and Global Cluster Option.


Course Prerequisites
This course assumes that you have an administrator-level understanding of one or more UNIX platforms. You should understand how to configure systems, storage devices, and networking in multiserver environments.

To successfully complete this course, you should have the following expertise:
• UNIX operating system and network administration
• System and network device configuration
• VERITAS Volume Manager configuration


Course Objectives
In the VERITAS Cluster Server for UNIX, Fundamentals course, you are given a high availability design to implement in the classroom environment using VERITAS Cluster Server.

The course simulates the job tasks you perform to configure a cluster, starting with preparing the site and the application services that will be made highly available. Lessons build on each other, demonstrating the processes and recommended best practices you can apply when implementing any cluster design.

The core material focuses on the most common cluster implementations. Other cluster designs emphasizing additional VCS capabilities are provided to illustrate the power and flexibility of VERITAS Cluster Server.

After completing the VERITAS Cluster Server for UNIX, Fundamentals course, you will be able to:
• Manage services in an existing VCS environment.
• Install and configure a cluster according to a specified sample design.
• Use a design worksheet to put applications under VCS control.
• Customize cluster behavior to implement specified requirements.
• Respond to resource, system, and communication failures.


Certification Exam Objectives
The high-level objectives for the Implementation of HA Solutions certification exam are shown in the slide.

Note: Not all objectives are covered by the VERITAS Cluster Server for UNIX, Fundamentals course. The VERITAS Cluster Server for UNIX, Implementing Local Clusters course is also required to provide complete training on all certification exam objectives.

Detailed objectives are provided on the VERITAS Web site, along with sample exams.

The VERITAS Certified High Availability Implementation Exam objectives covered in this course are:
• Verify and adjust the preinstallation environment.
• Install VCS.
• Configure the high availability environment.
• Perform advanced cluster configuration.
• Validate the implementation and make adjustments for high availability.
• Document and maintain the high availability solution.

For the complete set of exam objectives, follow the Certification link from www.veritas.com/education.


Cluster Design Input
The staff responsible for deploying a VCS cluster are not necessarily the same people who developed the cluster design. To ensure a successful deployment, define the information that needs to be passed from the VCS design to the deployment team.

A VCS design includes the following information:
• Cluster information, including cluster communications:
  – The cluster name and ID number
  – Ethernet ports that will be used for the cluster interconnect
  – Any other VCS communication channels required
• Member system names
• High availability services information:
  – The service name and type
  – Systems where the service can start up and run
  – Startup policies
  – Failover policies
  – Interactions with other services
  – Resources required by the services, and their relationships
• User information and privilege levels
• Notification requirements: SNMP/SMTP notification and triggers
• Customization requirements: enterprise and custom agents; cluster, service group, system, resource, and agent attributes that are not VCS default values

A VCS cluster design includes:
• Cluster information, including cluster communications
• System information
• Application service information, including detailed information about required software and hardware resources
• User account information
• Notification requirements
• Customization requirements

This course provides cluster design information needed to prepare, install, and configure a cluster.
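As an illustration, the hypothetical main.cf sketch below shows where much of this design input lands in the cluster configuration file. All names and the password value are placeholders, and the configuration language itself is covered in later lessons; the cluster ID number and interconnect ports are configured separately in the LLT communication files rather than in main.cf.

    include "types.cf"

    cluster vcs1 (                        // cluster name from the design
            UserNames = { admin = password }   // user account (placeholder value)
            Administrators = { admin }         // privilege levels
            )

    system S1 (                           // member system names
            )

    system S2 (
            )

    group WebSG (                         // one high availability service
            SystemList = { S1 = 0, S2 = 1 }    // systems where it can run
            AutoStartList = { S1 }             // startup policy
            )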


Sample Design Input
A VCS design may come in many different formats with varying levels of detail.

In some cases, you may have only the information about the application services that need to be clustered and the desired operational behavior in the cluster. For example, you may be told that the application service uses multiple network ports and requires local failover capability among those ports before it fails over to another system.

In other cases, you may have the information you need as a set of service dependency diagrams with notes on various aspects of the desired cluster operations.

If you receive design information that does not detail the resources, develop a detailed design worksheet before starting the deployment, as shown in the following “Cluster Design Worksheet.”

Using a design worksheet to document all aspects of your high availability environment helps ensure that you are well-prepared to start implementing your cluster design.

You are provided with a design worksheet showing sample values to use throughout this course as a tool for implementing the cluster design in the lab exercises.

You can use a similar format to collect all the information you need before starting deployment at your site.

Sample Design Input (slide diagram): components required to provide the Web service.

Web Service requirements:
• Start up on system S1.
• Restart the Web server process 3 times before faulting it.
• Fail over to S2 if any resource faults.
• Notify [email protected] if any resource faults.

Web service resources shown in the dependency diagram:
• IP Address 192.168.3.132
• WebServer
• Mount /web
• Volume WebVol
• Disk Group WebDG
• NIC eri0


Sample Design Worksheet

Example: main.cf

group WebSG (
        SystemList = { S1 = 0, S2 = 1 }
        AutoStartList = { S1 }
        )

IP WebIP (
        Device = eri0
        Address = 192.168.3.132
        Netmask = 255.255.255.0
        )

Service Group Definition        Sample Value
Group                           WebSG
Required Attributes
  FailOverPolicy                Priority
  SystemList                    S1=0 S2=1
Optional Attributes
  AutoStartList                 S1

Resource Definition             Sample Value
Service Group                   WebSG
Resource Name                   WebIP
Resource Type                   IP
Required Attributes
  Device                        eri0
  Address                       192.168.3.132
Optional Attributes
  Netmask                       255.255.255.0
Critical?                       Yes (1)
Enabled?                        Yes (1)
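The same worksheet values can also be applied from the command line. The following is a sketch only; these commands are covered in detail in the online configuration lessons, and note that the bundled IP agent spells the attribute NetMask:

    haconf -makerw                              # open the cluster configuration
    hagrp -add WebSG
    hagrp -modify WebSG SystemList S1 0 S2 1
    hagrp -modify WebSG AutoStartList S1
    hares -add WebIP IP WebSG
    hares -modify WebIP Device eri0
    hares -modify WebIP Address 192.168.3.132
    hares -modify WebIP NetMask 255.255.255.0
    hares -modify WebIP Enabled 1
    haconf -dump -makero                        # save and close the configuration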


Lab Design for the Course
The diagram shows a conceptual view of the cluster design used as an example throughout this course and implemented in hands-on lab exercises.

Each aspect of the cluster configuration is described in greater detail, where applicable, in course lessons.

The cluster consists of:
• Two nodes
• Five high availability services: four failover service groups and one parallel network service group
• Fibre connections to SAN shared storage from each node through a switch
• Two private Ethernet interfaces for the cluster interconnect network
• Ethernet connections to the public network

Additional complexity is added to the design to illustrate certain aspects of cluster configuration in later lessons. The design diagram shows a conceptual view of the cluster design described in the worksheet.

Lab Design for the Course (slide diagram): two cluster nodes, each named trainxx, in a cluster named vcsx, running the service groups your_nameSG1, your_nameSG2, their_nameSG1, their_nameSG2, and NetworkSG.


Lab Naming Conventions
To simplify the labs, use your name or a nickname as a prefix for cluster objects created in the lab exercises. This includes Volume Manager objects, such as disk groups and volumes, as well as VCS service groups and resources.

Following this convention helps distinguish your objects when multiple students are working on systems in the same cluster and helps ensure that each student uses unique names. The lab exercises represent your name with the word name in italics. You substitute the name you select whenever you see the name placeholder in a lab step.

Service Group Definition        Sample Value
Group                           nameSG
Required Attributes
  SGAttribute1                  value
  SGAttribute2                  value
Optional Attributes
  SGAttribute3                  value

Resource Definition             Sample Value
Service Group Name              nameSG
Resource Name                   nameIP
Resource Type                   IP
Required Attributes
  ResAttribute1                 value
  ResAttribute2                 value
  . . .

Substitute your name, or a nickname, wherever tables or instructions indicate name in labs. Following this convention simplifies labs and helps prevent naming conflicts with your lab partner.
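For example, a student who picks the (hypothetical) name sue creates objects such as the following; the commands themselves are introduced in later lessons:

    vxassist -g suedg1 make suevol1 2g    # volume suevol1 in disk group suedg1
    hagrp -add sueSG1                     # service group sueSG1
    hares -add sueIP IP sueSG1            # IP resource sueIP in group sueSG1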


Classroom Values for Labs
Your instructor will provide the classroom-specific information you need to perform the lab exercises. You can record these values in your lab books using the tables provided, or your instructor may provide separate handouts showing the classroom values for your location.

In some lab exercises, sample values may be shown in tables as a guide to the types of values you must specify. Substitute the values provided by your instructor to ensure that your configuration is appropriate for your classroom.

If you are not sure of the configuration for your classroom, ask your instructor.

Network Definition              Your Value
Subnet
DNS Address

Software Location               Your Value
VCS installation dir
Lab files directory
. . .

Use the classroom values provided by your instructor at the beginning of each lab exercise. Tables are provided at the beginning of each lab to record these values; alternatively, your instructor may hand out printed tables. If sample values are provided as guidelines, substitute the classroom-specific values provided by your instructor.


Course Overview
This training provides comprehensive instruction on the installation and initial configuration of VERITAS Cluster Server (VCS). The course covers principles and methods that enable you to prepare, create, and test VCS service groups and resources using tools that best suit your needs and your high availability environment. You learn to configure and test failover and notification behavior, cluster additional applications, and further customize your cluster according to specified design criteria.

Lesson 1: VCS Building Blocks
Lesson 2: Preparing a Site for VCS
Lesson 3: Installing VCS
Lesson 4: VCS Operations
Lesson 5: Preparing Services for VCS
Lesson 6: VCS Configuration Methods
Lesson 7: Online Configuration of Service Groups
Lesson 8: Offline Configuration of Service Groups
Lesson 9: Sharing Network Interfaces
Lesson 10: Configuring Notification
Lesson 11: Configuring VCS Response to Faults
Lesson 12: Cluster Communications
Lesson 13: System and Communication Faults
Lesson 14: I/O Fencing
Lesson 15: Troubleshooting


Course Resources
This course uses this participant guide, which contains the lessons presented by your instructor and lab exercises that enable you to practice your new skills.

Lab materials are provided in three forms, with increasing levels of detail to suit a range of student expertise levels.
• Appendix A: Lab Synopses has high-level task descriptions and design worksheets.
• Appendix B: Lab Details includes the lab procedures and detailed steps.
• Appendix C: Lab Solutions includes the lab procedures and steps with the corresponding command lines required to perform each step.
• Appendix D: Job Aids provides supplementary material that can be used as on-the-job guides for performing some common VCS operations.
• Appendix E: Design Worksheet Template provides a blank design worksheet.

Additional supplements may be used in the classroom or provided to you by your instructor.

Participant Guide
– Lessons
– Appendix A: Lab Synopses
– Appendix B: Lab Details
– Appendix C: Lab Solutions
– Appendix D: Job Aids
– Appendix E: Design Worksheet Template

Supplements
– VCS Simulator: van.veritas.com
– Troubleshooting Job Aid
– VCS Command-Line Reference card
– Tips & Tricks: www.veritas.com/education


Course Platforms
This course material applies to the VCS platforms shown in the slide. Indicators are provided in slides and text where platforms differ.

Refer to the VERITAS Cluster Server user documentation for your platform and version to determine which features are supported in your environment.

This course covers the following versions of VCS:
• VCS 4.1, 4.0, and 3.5 for Solaris
• VCS 4.0 for Linux
• VCS 4.0 for AIX
• VCS 3.5 for HP-UX


Legend
These are common symbols used in this course.

The symbols represent:
• Server, node, or cluster system (terms used interchangeably)
• Server or cluster system that has faulted
• Storage
• Application service
• Cluster interconnect
• Wide area network (WAN) cloud

Page 26: VERITAS Cluster Server for UNIX Fundamentals

Intro–16 VERITAS Cluster Server for UNIX, FundamentalsCopyright © 2005 VERITAS Software Corporation. All rights reserved.

Client systems on a network

VCS service group

Offline service group

VCS resource

Symbol Description


Lesson 1: VCS Building Blocks


Introduction

Overview
This lesson introduces basic VERITAS Cluster Server terminology and concepts, and provides an overview of the VCS architecture and supporting communication mechanisms.

Importance
The terms and concepts covered in this lesson provide a foundation for learning the tasks you need to perform to deploy the VERITAS Cluster Server product, both in the classroom and in real-world applications.

Lesson Introduction

Lesson 1: VCS Building Blocks
Lesson 2: Preparing a Site for VCS
Lesson 3: Installing VCS
Lesson 4: VCS Operations
Lesson 5: Preparing Services for VCS
Lesson 6: VCS Configuration Methods
Lesson 7: Online Configuration of Service Groups
Lesson 8: Offline Configuration of Service Groups
Lesson 9: Sharing Network Interfaces
Lesson 10: Configuring Notification
Lesson 11: Configuring VCS Response to Faults
Lesson 12: Cluster Communications
Lesson 13: System and Communication Faults
Lesson 14: I/O Fencing
Lesson 15: Troubleshooting


Outline of Topics
• Cluster Terminology
• Cluster Communication
• Maintaining the Cluster Configuration
• VCS Architecture
• Supported Failover Configurations

Lesson Topics and Objectives
After completing this lesson, you will be able to:
• Cluster Terminology: Define clustering terminology.
• Cluster Communication: Describe cluster communication mechanisms.
• Maintaining the Cluster Configuration: Describe how the cluster configuration is maintained.
• VCS Architecture: Describe the VCS architecture.
• Supported Failover Configurations: Describe the failover configurations supported by VCS.


Cluster Terminology

A Nonclustered Computing Environment
An example of a traditional, nonclustered computing environment is a single server running an application, with public network links for client access and data stored on local or SAN storage.

If a single component fails, application processing and the business service that relies on the application are interrupted or degraded until the failed component is repaired or replaced.

A Nonclustered Computing Environment


Definition of a Cluster
A clustered environment includes multiple components configured such that if one component fails, its role can be taken over by another component to minimize or avoid service interruption.

This gives clients highly available access to their data and processing, which is not possible in nonclustered environments.

The term cluster, simply defined, refers to multiple independent systems or domains connected into a management framework for increased availability.

Clusters have the following components:
• Up to 32 systems, sometimes referred to as nodes or servers; each system runs its own operating system
• A cluster interconnect, which allows for cluster communications
• A public network, connecting each system in the cluster to a LAN for client access
• Shared storage (optional), accessible by each system in the cluster that needs to run the application

Definition of a Cluster

A cluster is a collection of multiple independent systems working together under a management framework for increased service availability.



Definition of VERITAS Cluster Server and Failover
In a highly available environment, HA software must perform a series of tasks in order for clients to access a service on another server in the event a failure occurs. The software must:
• Ensure that data stored on the disk is available to the new server, if shared storage is configured (Storage).
• Move the IP address of the old server to the new server (Network).
• Start up the application on the new server (Application).

VERITAS Cluster Server (VCS) is a software solution for automating these tasks. VCS monitors and controls applications running in the cluster and, if a failure is detected, automates application restart.

When another server is required to restart the application, VCS performs a failover: the process of stopping the application on one system and starting it on another system.

Definition of VERITAS Cluster Server and Failover

VCS detects faults and performs automated failover.



Definition of an Application Service
An application service is a collection of hardware and software components required to provide a service, such as a Web site that end users access by connecting to a particular network IP address or host name. Each application service typically requires components of the following three types:
• Application binaries (executables)
• Network
• Storage

If an application service needs to be switched to another system, all of the components of the application service must migrate together to re-create the service on another system.

Note: These are the same components that the administrator must manually move from a failed server to a working server to keep the service available to clients in a nonclustered environment.

Application service examples include:
• A Web service, consisting of a Web server program, IP addresses, associated network interfaces used to allow access into the Web site, a file system containing Web data files, and a volume and disk group containing the file system
• A database service, which may consist of one or more IP addresses, relational database management system (RDBMS) software, a file system containing data files, a volume and disk group on which the file system resides, and a NIC for network access

Definition of an Application Service
An application service is a collection of all the hardware and software components required to provide a service.
If the service must be migrated to another system, all components need to be moved in an orderly fashion. Examples include Web servers, databases, and applications.


Definition of a Service Group
A service group is a virtual container that enables VCS to manage an application service as a unit. The service group contains all the hardware and software components required to run the service, which enables VCS to coordinate failover of the application service resources in the event of failure or at the administrator's request.

A service group is defined by these attributes:
• The cluster-wide unique name of the group
• The list of the resources in the service group, usually determined by which resources are needed to run a specific application service
• The dependency relationships between the resources
• The list of cluster systems on which the group is allowed to run
• The list of cluster systems on which you want the group to start automatically

Definition of a Service Group
A service group is a virtual container that enables VCS to manage an application service as a unit.
All components required to provide the service, and the relationships between these components, are defined within the service group. A service group has attributes that define its behavior, such as where it can start and run.
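As a brief sketch of how these attributes translate into practice, a group like the WebSG example used in this course could be created from the command line as follows (the hagrp command and the full procedure are covered in the online configuration lesson; S1 and S2 are the classroom system names):

hagrp -add WebSG                             # create the service group
hagrp -modify WebSG SystemList S1 0 S2 1     # systems where the group can run, with priorities
hagrp -modify WebSG AutoStartList S1         # systems where the group starts automatically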


Service Group Types
Service groups can be one of three types:
• Failover: This service group runs on one system at a time in the cluster. Most application services, such as database and NFS servers, use this type of group.
• Parallel: This service group runs simultaneously on more than one system in the cluster. This type of service group requires an application that can be started on more than one system at a time without threat of data corruption.
• Hybrid (4.x): A hybrid service group is a combination of a failover service group and a parallel service group used in VCS 4.x replicated data clusters (RDCs), which are based on VERITAS Volume Replicator. This service group behaves as a failover group within a defined set of systems, and as a parallel group within a different set of systems. RDC configurations are described in the VERITAS Disaster Recovery Using VVR and Global Cluster Option course.

Service Group Types
Failover:
– The service group can be online on only one cluster system at a time.
– VCS migrates the service group at the administrator's request and in response to faults.
Parallel:
– The service group can be online on multiple cluster systems simultaneously.
– An example is Oracle Real Application Cluster (RAC).
Hybrid:
This is a special-purpose type of service group used to manage service groups in replicated data clusters (RDCs), which are based on VERITAS Volume Replicator.


Definition of a Resource
Resources are VCS objects that correspond to hardware or software components, such as the application, the networking components, and the storage components.

VCS controls resources through these actions:
• Bringing a resource online (starting)
• Taking a resource offline (stopping)
• Monitoring a resource (probing)

Resource Categories
• Persistent
  – None: VCS can only monitor persistent resources; they cannot be brought online or taken offline. The most common example of a persistent resource is a network interface card (NIC), because it must be present but cannot be stopped. FileNone and ElifNone are other examples.
  – On-only: VCS brings the resource online if required, but does not stop it if the associated service group is taken offline. NFS daemons are examples of on-only resources. FileOnOnly is another on-only example.
• Nonpersistent, also known as on-off: Most resources fall into this category, meaning that VCS brings them online and takes them offline as required. Examples are Mount, IP, and Process. FileOnOff is a test version of this kind of resource.

Definition of a Resource
Resources are VCS objects that correspond to the hardware or software components of an application service.
Each resource must have a unique name throughout the cluster. Choosing names that reflect the service group name makes it easy to identify all resources in that group, for example, WebIP in the WebSG group. Resources are always contained within service groups.
Resource categories include:
– Persistent: None (NIC), On-only (NFS)
– Nonpersistent: On-off (Mount)


Resource Dependencies
Resources depend on other resources because of application or operating system requirements. Dependencies are defined to configure VCS for these requirements.

Dependency Rules

These rules apply to resource dependencies:
• A parent resource depends on a child resource. In the diagram, the Mount resource (parent) depends on the Volume resource (child). This dependency illustrates the operating system requirement that a file system cannot be mounted without the Volume resource being available.
• Dependencies are homogeneous. Resources can only depend on other resources.
• No cyclical dependencies are allowed. There must be a clearly defined starting point.

Resource Dependencies
Resources in a service group have a defined dependency relationship, which determines the online and offline order of the resources.
• A parent resource depends on a child resource.
• There is no limit to the number of parent and child resources.
• Persistent resources, such as NIC, cannot be parent resources.
• Dependencies cannot be cyclical.
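As an illustration of the parent/child rule, a dependency between the Mount and Volume resources from the diagram could be defined and inspected as sketched below (using the WebMount and WebVol example names; the hares command is introduced later in the course):

hares -link WebMount WebVol    # WebMount (parent) depends on WebVol (child)
hares -dep WebMount            # display the dependencies of the resource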



Resource Attributes
Resource attributes define the specific characteristics of individual resources. As shown in the slide, the resource attribute values for the sample resource of type Mount correspond to the UNIX command line used to mount a specific file system. VCS uses the attribute values to run the appropriate command or system call to perform an operation on the resource.

Each resource has a set of required attributes that must be defined in order to enable VCS to manage the resource.

For example, the Mount resource on Solaris has four required attributes that must be defined for each resource of type Mount:
• The directory of the mount point (MountPoint)
• The device for the mount point (BlockDevice)
• The type of file system (FSType)
• The options for the fsck command (FsckOpt)

The first three attributes are the values used to build the UNIX mount command shown in the slide. The FsckOpt attribute is used if the mount command fails. In this case, VCS runs fsck with the specified options (-y) and attempts to mount the file system again.

Some resources also have additional optional attributes you can define to control how VCS manages a resource. In the Mount resource example, MountOpt is an optional attribute you can use to define options to the UNIX mount command. For example, if this is a read-only file system, you can specify -ro as the MountOpt value.

Resource Attributes
Resource attributes define an individual resource.
The attribute values are used by VCS to manage the resource. Resources can have required and optional attributes, as specified by the resource type definition.
WebMount resource (Solaris):
mount -F vxfs /dev/vx/dsk/WebDG/WebVol /Web
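As an illustrative sketch only, the four required attributes of the WebMount example could be set with commands like the following once the cluster configuration is open for writing (procedures appear later in the course; the % prefix protects the leading dash in the FsckOpt value):

hares -add WebMount Mount WebSG
hares -modify WebMount MountPoint "/web"
hares -modify WebMount BlockDevice "/dev/vx/dsk/WebDG/WebVol"
hares -modify WebMount FSType vxfs
hares -modify WebMount FsckOpt %-y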


Resource Types and Type Attributes
Resources are classified by resource type. For example, disk groups, network interface cards (NICs), IP addresses, mount points, and databases are distinct types of resources. VCS provides a set of predefined resource types (some bundled, some add-ons), in addition to the ability to create new resource types.

Individual resources are instances of a resource type. For example, you may have several IP addresses under VCS control. Each of these IP addresses individually is a single resource of resource type IP.

A resource type can be thought of as a template that defines the characteristics or attributes needed to define an individual resource (instance) of that type.

You can view the relationship between resources and resource types by comparing the mount command for a resource on the previous slide with the mount syntax on this slide. The resource type defines the syntax for the mount command. The resource attributes fill in the values to form an actual command line.

Resource Types
Resources are classified by type.
The resource type specifies the attributes needed to define a resource of that type. For example, a Mount resource has different properties than an IP resource.
Mount resource type syntax (Solaris):
mount [-F FSType] [options] block_device mount_point


Agents: How VCS Controls Resources
Agents are processes that control resources. Each resource type has a corresponding agent that manages all resources of that resource type. Each cluster system runs only one agent process for each active resource type, no matter how many individual resources of that type are in use.

Agents control resources using a defined set of actions, also called entry points. The four entry points common to most agents are:
• Online: Resource startup
• Offline: Resource shutdown
• Monitor: Probing the resource to retrieve status
• Clean: Killing the resource or cleaning up as necessary when a resource fails to be taken offline gracefully

The difference between offline and clean is that offline is an orderly termination and clean is a forced termination. In UNIX, this can be thought of as the difference between exiting an application and sending the kill -9 command to the process.

Each resource type needs a different way to be controlled. To accomplish this, each agent has a set of predefined entry points that specify how to perform each of the four actions. For example, the startup entry point of the Mount agent mounts a block device on a directory, whereas the startup entry point of the IP agent uses the ifconfig command to set the IP address on a unique IP alias on the network interface.

VCS provides both predefined agents and the ability to create custom agents.

Agents: How VCS Controls Resources
Each resource type has a corresponding agent process that manages all resources of that type.
Agents have one or more entry points that perform a set of actions on resources. Each system runs one agent for each active resource type.
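On an installed system, a quick way to see which resource types and agents are present is sketched below (hatype is a standard VCS utility; the directory path is the default agent location referenced in this course):

hatype -list           # list the resource types known to the cluster
ls /opt/VRTSvcs/bin    # one directory per agent, containing its entry points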



Using the VERITAS Cluster Server Bundled Agents Reference Guide
The VERITAS Cluster Server Bundled Agents Reference Guide describes the agents that are provided with VCS and defines the required and optional attributes for each associated resource type.

Excerpts of the definitions for the NIC, Mount, and Process resource types are included in the “Job Aids” appendix.

VERITAS also provides a set of agents that are purchased separately from VCS, known as enterprise agents. Some examples of enterprise agents are:
• Oracle
• NetBackup
• Informix
• iPlanet

Select the Agents and Options link on the VERITAS Cluster Server page at www.veritas.com for a complete list of agents available for VCS.

To obtain PDF versions of product documentation for VCS and agents, see the Support Web site at http://support.veritas.com.

Using the VERITAS Cluster Server Bundled Agents Reference Guide (Solaris, HP-UX, AIX, Linux)
The VERITAS Cluster Server Bundled Agents Reference Guide defines all VCS resource types for all bundled agents. See http://support.veritas.com for product documentation.


Cluster Communication

VCS requires a cluster communication channel between systems in a cluster to serve as the cluster interconnect. This communication channel is also sometimes referred to as the private network because it is often implemented using a dedicated Ethernet network.

VERITAS recommends that you use a minimum of two dedicated communication channels with separate infrastructures—for example, multiple NICs and separate network hubs—to implement a highly available cluster interconnect. Although recommended, this configuration is not required.

The cluster interconnect has two primary purposes:
• Determine cluster membership: Membership in a cluster is determined by systems sending and receiving heartbeats (signals) on the cluster interconnect. This enables VCS to determine which systems are active members of the cluster and which systems are joining or leaving the cluster. In order to take corrective action on node failure, surviving members must agree when a node has departed. This membership needs to be accurate and coordinated among active members, because nodes can be rebooted, powered off, faulted, and added to the cluster at any time.
• Maintain a distributed configuration: Cluster configuration and status information for every resource and service group in the cluster is distributed dynamically to all systems in the cluster.

Cluster communication is handled by the Group Membership Services/Atomic Broadcast (GAB) mechanism and the Low Latency Transport (LLT) protocol, as described in the next sections.

Cluster Communication
A cluster interconnect provides a communication channel between cluster nodes. The cluster interconnect serves to:
• Determine which systems are members of the cluster using a heartbeat mechanism.
• Maintain a single view of the status of the cluster configuration on all systems in the cluster membership.


Low-Latency Transport
VERITAS uses a high-performance, low-latency protocol for cluster communications. LLT is designed for the high-bandwidth and low-latency needs of not only VERITAS Cluster Server, but also VERITAS Cluster File System, in addition to Oracle Cache Fusion traffic in Oracle RAC configurations. LLT runs directly on top of the Data Link Provider Interface (DLPI) layer over Ethernet and has several major functions:
• Sending and receiving heartbeats over network links
• Monitoring and transporting network traffic over multiple network links to every active system
• Balancing cluster communication load over multiple links
• Maintaining the state of communication
• Providing a nonroutable transport mechanism for cluster communications

Low-Latency Transport (LLT)
LLT:
• Is responsible for sending heartbeat messages
• Transports cluster communication traffic to every active system
• Balances traffic load across multiple network links
• Maintains the communication link state
• Is a nonroutable protocol
• Runs on an Ethernet network
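LLT is configured on each system through the /etc/llttab file, which is examined later in the course. A minimal illustrative sketch for a system named S1 in cluster number 10, assuming two Solaris qfe interfaces are used for the interconnect:

set-node S1
set-cluster 10
link qfe0 /dev/qfe:0 - ether - -
link qfe1 /dev/qfe:1 - ether - -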


Group Membership Services/Atomic Broadcast (GAB)
GAB provides the following:
• Group Membership Services: GAB maintains the overall cluster membership by way of its Group Membership Services function. Cluster membership is determined by tracking the heartbeat messages sent and received by LLT on all systems in the cluster over the cluster interconnect. Heartbeats are the mechanism VCS uses to determine whether a system is an active member of the cluster, joining the cluster, or leaving the cluster. If a system stops sending heartbeats, GAB determines that the system has departed the cluster.
• Atomic Broadcast: Cluster configuration and status information are distributed dynamically to all systems in the cluster using GAB's Atomic Broadcast feature. Atomic Broadcast ensures all active systems receive all messages for every resource and service group in the cluster.

Group Membership Services/Atomic Broadcast (GAB)
GAB:
• Performs two functions:
– Manages cluster membership; referred to as GAB membership
– Sends and receives atomic broadcasts of configuration information
• Is a proprietary broadcast protocol
• Uses LLT as its transport mechanism
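GAB is started on each system from the /etc/gabtab file, which is covered in the cluster communications lesson. A hedged sketch for a two-node cluster, where -n 2 tells GAB to seed the cluster membership once two systems are visible:

/sbin/gabconfig -c -n 2    # typical contents of /etc/gabtab
/sbin/gabconfig -a         # run manually to display current GAB port memberships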



The Fencing Driver
The fencing driver prevents multiple systems from accessing the same Volume Manager-controlled shared storage devices in the event that the cluster interconnect is severed. In the example of a two-node cluster displayed in the diagram, if the cluster interconnect fails, each system stops receiving heartbeats from the other system.

GAB on each system determines that the other system has failed and passes the cluster membership change to the fencing module.

The fencing modules on both systems contend for control of the disks according to an internal algorithm. The losing system is forced to panic and reboot. The winning system is now the only member of the cluster, and it fences off the shared data disks so that only systems that are still part of the cluster membership (only one system in this example) can access the shared storage.

The winning system takes corrective action as specified within the cluster configuration, such as bringing service groups online that were previously running on the losing system.

The Fencing Driver
Fencing:
• Monitors GAB to detect cluster membership changes
• Ensures a single view of cluster membership
• Prevents multiple nodes from accessing the same Volume Manager 4.x shared storage devices



The High Availability Daemon
The VCS engine, also referred to as the high availability daemon (had), is the primary VCS process running on each cluster system.

HAD tracks all changes in cluster configuration and resource status by communicating with GAB. HAD manages all application services (by way of agents) whether the cluster has one or many systems.

Building on the knowledge that the agents manage individual resources, you can think of HAD as the manager of the agents. HAD uses the agents to monitor the status of all resources on all nodes.

This modularity between had and the agents allows for efficiency of roles:
• HAD does not need to know how to start up Oracle or any other applications that can come under VCS control.
• Similarly, the agents do not need to make cluster-wide decisions.

This modularity allows a new application to come under VCS control simply by adding a new agent; no changes to the VCS engine are required.

On each active cluster system, HAD keeps all the other cluster systems updated about changes to the configuration or status.

In order to ensure that the had daemon is highly available, a companion daemon, hashadow, monitors had and if had fails, hashadow attempts to restart it. Likewise, had restarts hashadow if hashadow stops.

The High Availability Daemon (HAD)
The VCS engine, the high availability daemon:
– Runs on each system in the cluster
– Maintains configuration and state information for all cluster resources
– Manages all agents
The hashadow daemon monitors HAD.
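A simple way to confirm that the engine pair is running on a system, and to view the state that HAD maintains, is sketched below (exact ps output varies by platform):

ps -ef | egrep 'had|hashadow'    # both had and hashadow should be listed
hastatus -sum                    # summary of system, service group, and resource status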



Comparing VCS Communication Protocols and TCP/IP
To illustrate the suitability and use of GAB and LLT for VCS communications, compare GAB running over LLT with TCP/IP, the standard public network protocols.

GAB Versus TCP

GAB is a multipoint-to-multipoint broadcast protocol; all systems in the cluster send and receive messages simultaneously. TCP is a point-to-point protocol.

GAB Versus UDP

GAB also differs from UDP, another broadcast protocol. UDP is a fire-and-forget protocol; it merely sends the packet and assumes it is received. GAB, however, checks and guarantees delivery of transmitted packets, because it requires that broadcasts reach all nodes, including the originator.

LLT Versus IP

LLT is driven by GAB; it has specific targets in its domain and assumes a constant connection between servers, making it a connection-oriented protocol. IP is a connectionless protocol; it assumes that packets can take different paths to reach the same destination.

Comparing VCS Communication Protocols and TCP/IP
The slide compares the two stacks: on the VCS side, HAD and hashadow run as user processes over GAB and LLT in the kernel, over a NIC; on the TCP/IP side, an application such as iPlanet runs as a user process over TCP and IP in the kernel, over a NIC.


Maintaining the Cluster Configuration

HAD maintains configuration and state information for all cluster resources in memory on each cluster system. Cluster state refers to tracking the status of all resources and service groups in the cluster. When any change to the cluster configuration occurs, such as the addition of a resource to a service group, HAD on the initiating system sends a message to HAD on each member of the cluster by way of GAB atomic broadcast, to ensure that each system has an identical view of the cluster.

Atomic means that all systems receive updates, or all systems are rolled back to the previous state, much like a database atomic commit.

When HAD is not currently running on any cluster system, there is no configuration in memory; in this case, the in-memory cluster configuration is created from the main.cf file on disk. When you start VCS on the first cluster system, HAD builds the configuration in memory on that system from the main.cf file. Changes to a running configuration (in memory) are saved to disk in main.cf when certain operations occur. These procedures are described in more detail later in the course.

Maintaining the Cluster Configuration
• HAD maintains a replica of the cluster configuration in memory on each system.
• Changes to the configuration are broadcast to HAD on all systems simultaneously by way of GAB using LLT.
• The configuration is preserved on disk in the main.cf file.

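One of the operations that saves the in-memory configuration to main.cf is closing the configuration from the command line. A brief sketch of the open/save cycle, which is covered in detail in the configuration methods lessons:

haconf -makerw          # open the in-memory configuration for writing
                        # ...make changes with hagrp, hares, and related commands...
haconf -dump -makero    # write the configuration to main.cf and return it to read-only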


VCS Configuration Files

Configuring VCS means conveying to VCS the definitions of the cluster, service groups, resources, and resource dependencies. VCS uses two configuration files in a default configuration:
• The main.cf file defines the entire cluster, including the cluster name, systems in the cluster, and definitions of service groups and resources, in addition to service group and resource dependencies.
• The types.cf file defines the resource types.

Additional files similar to types.cf may be present if agents have been added. For example, if the Oracle enterprise agent is added, a resource types file, such as OracleTypes.cf, is also present.

The cluster configuration is saved on disk in the /etc/VRTSvcs/conf/config directory, so the memory configuration can be re-created after systems are restarted.

Note: The VCS installation utility creates the $VCS_CONF environment variable containing the /etc/VRTSvcs path. The short path to the configuration directory is $VCS_CONF/conf/config.
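Before starting VCS, you can check that the on-disk configuration is syntactically valid with the hacf utility; a short sketch (hacf prints nothing when the configuration is valid):

cd /etc/VRTSvcs/conf/config
hacf -verify .    # verify main.cf and types.cf in this directory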

VCS Configuration Files
A simple text file is used to store the cluster configuration on disk. The file contents are described in detail later in the course.

Sample main.cf:

include "types.cf"
cluster vcs (
        UserNames = { admin = ElmElgLimHmmKumGlj }
        Administrators = { admin }
        CounterInterval = 5
        )
system S1 ()
system S2 ()
group WebSG (
        SystemList = { S1 = 0, S2 = 1 }
        )
        Mount WebMount (
                MountPoint = "/web"
                BlockDevice = "/dev/vx/dsk/WebDG/WebVol"
                FSType = vxfs
                FsckOpt = "-y"
                )


VCS Architecture

The slide shows how the major components of the VCS architecture work together to manage application services.

How does VCS know what to fail over?
Each cluster system has its own copy of configuration files, libraries, scripts, daemons, and executable programs that are components of VCS. Cluster systems share a common view of the cluster configuration.

An application service consists of all the resources that the application requires in order to run, including the application itself, and networking and storage resources. This application service provides the structure for a service group, which is the unit of failover.

Dependencies define whether a resource or service group failure impacts other resources or service groups. Dependencies also define the order in which VCS brings service groups and resources online or takes them offline.

How does VCS know when to fail over?
Agents communicate the status of resources to HAD, the VCS engine. The agents alert the engine when a resource has faulted. The VCS engine determines what to do and initiates any necessary action.

VCS Architecture
• Agents monitor resources on each system and provide status to HAD on the local system.
• HAD on each system sends status information to GAB.
• GAB broadcasts configuration information to all cluster members.
• LLT transports all cluster communications to all cluster nodes.
• HAD on each node takes corrective action, such as failover, when necessary.


Supported Failover Configurations

The following examples illustrate the wide variety of failover configurations supported by VCS.

Active/Passive
In this configuration, an application runs on a primary or master server. A dedicated redundant server is present to take over if a failover occurs. The redundant server is not configured to perform any other functions.

The redundant server is on standby with full performance capability. The next examples show types of active/passive configurations:

Active/Passive

Before Failover After Failover


N-to-1
This configuration reduces the cost of hardware redundancy while still providing a dedicated spare. One server protects multiple active servers, on the theory that simultaneous multiple failures are unlikely.

This configuration is used when shared storage is limited by the number of servers that can attach to it, and it requires that the original configuration is restored after the faulted system is repaired.

Active/Passive N-to-1

Before Failover

After Failover


N + 1
When more than two systems can connect to the same shared storage, as in a SAN environment, a single dedicated redundant server is no longer required.

When a server fails in this environment, the application service restarts on the spare. Unlike the N-to-1 configuration, after the failed server is repaired, it can then become the redundant server.

Before Failover

After Repair

Active/Passive N + 1

After Failover


Active/Active
In an active/active configuration, each server is configured to run a specific application service, as well as to provide redundancy for its peer.

In this configuration, hardware usage appears to be more efficient because there are no standby servers. However, each server must be robust enough to run multiple application services, increasing the per-server cost up front.

Active/Active

Before Failover After Failover


N-to-N
This configuration is an active/active configuration that supports multiple application services running on multiple servers. Each application service is capable of being failed over to different servers in the cluster.

Careful testing is required to ensure that all application services are compatible with the other application services that may fail over to the same server.

N-to-N

Before Failover

After Failover


Summary
This lesson introduced the basic VERITAS Cluster Server terminology and gave an overview of VCS architecture and supporting communication mechanisms.

Next Steps
Your understanding of basic VCS functions enables you to prepare your site for installing VCS.

Additional Resources
• High Availability Design Using VERITAS Cluster Server: This course will be available in the future from VERITAS Education if you are interested in developing custom agents or learning more about high availability design considerations for VCS environments.
• VERITAS Cluster Server Bundled Agents Reference Guide: This guide describes each bundled agent in detail.
• VERITAS Cluster Server User's Guide: This guide provides detailed information on procedures and concepts for configuring and managing VCS clusters.

Lesson Summary
Key Points
– HAD is the primary VCS process, which manages resources by way of agents.
– Resources are organized into service groups.
– Each system in a cluster has an identical view of the state of resources and service groups.
Reference Materials
– High Availability Design Using VERITAS Cluster Server course
– VERITAS Cluster Server Bundled Agents Reference Guide
– VERITAS Cluster Server User's Guide


Lesson 2: Preparing a Site for VCS


Introduction

Overview
This lesson describes guidelines and considerations for planning to deploy VERITAS Cluster Server (VCS). You also learn how to prepare your site for installing VCS.

Importance
Before you install VERITAS Cluster Server, you must prepare your environment to meet the requirements needed to implement a cluster. By following these guidelines, you can ensure that your system hardware and software are configured to install VCS.

Lesson Introduction

Lesson 1: VCS Building Blocks
Lesson 2: Preparing a Site for VCS
Lesson 3: Installing VCS
Lesson 4: VCS Operations
Lesson 5: Preparing Services for VCS
Lesson 6: VCS Configuration Methods
Lesson 7: Online Configuration of Service Groups
Lesson 8: Offline Configuration of Service Groups
Lesson 9: Sharing Network Interfaces
Lesson 10: Configuring Notification
Lesson 11: Configuring VCS Response to Faults
Lesson 12: Cluster Communications
Lesson 13: System and Communication Faults
Lesson 14: I/O Fencing
Lesson 15: Troubleshooting


Outline of Topics
• Planning for Implementation
• Hardware Requirements and Recommendations
• Software Requirements and Recommendations
• Preparing Cluster Information

Lesson Topics and Objectives
After completing this lesson, you will be able to:
• Planning for Implementation: Plan for VCS implementation.
• Hardware Requirements and Recommendations: Describe general VCS hardware requirements.
• Software Requirements and Recommendations: Describe general VCS software requirements.
• Preparing Cluster Information: Collect cluster design information to prepare for installation.


Planning for Implementation

Implementation Needs

Confirm Technical Personnel Availability

Verify that you have identified the key internal personnel required for deployment and ensure that they are available as needed. Also, consider including other IT staff who will be involved in ongoing maintenance of the cluster. The deployment phase is an opportunity for transferring knowledge about how to manage applications and cluster services.

Confirm System and Site Access

Consider site security and equipment access:
• What is the time window allowed for deployment?
• When will you have access to the systems?
• Is there special security in the server room that you need to be aware of?
• Which user name and password will you use to obtain access to the systems?

Consider that you will need initial access to systems for verification purposes prior to the onset of implementation.

Implementation Needs
• Access to staffing resources, as required:
– Network, system, and application administrators required for configuration and testing
– Future cluster operators and administrators, who should be involved in deployment in preparation for managing the cluster
• Physical access to the equipment in accordance with security policy
• Access to support resources, such as VERITAS, operating system, and application vendor telephone and Web sites


The Implementation Plan
VCS installation, configuration, and testing have an impact on running application services and operations. When preparing for VCS installation and configuration, develop an implementation plan that takes into account how VERITAS products can be installed with minimal impact on the services already running.

You can use an implementation plan to:
• Describe any actions necessary to prepare the environment for VCS installation.
• Describe the impacts on staff and services during the implementation.
• Determine how to minimize the time period during which services are not available.
• Determine the impact of clustering application services on operational procedures. For example, applications under VCS control should no longer be stopped or started without taking VCS into consideration, which may impact the way backups are taken on a server.

VERITAS recommends that you prepare a detailed design worksheet to be used during VCS installation and configuration if you are not provided with a completed worksheet resulting from the design phase. A sample design worksheet is provided for the deployment tasks that are carried out during this course.

The Implementation Plan
• Prepare a plan stating the impact of VCS implementation on running services and operations. For example, you may need to add network interfaces, patch the operating system, or upgrade applications.
• Determine how to minimize downtime for existing services, taking into consideration the time needed for operational testing.
• Plan any actions necessary to prepare the environment for VCS installation as described throughout this lesson. Consider how these activities may affect running services.
• Prepare or complete a design worksheet that is used during VCS installation and configuration, if this worksheet is not provided.


Using the Design Worksheet
This course assumes that you are given a completed design worksheet, which you can use to prepare the site for VCS installation and deployment. As you configure your cluster environment in preparation for installation, verify that the information in the design worksheet is accurate and complete. If you are not provided with a completed design worksheet, you can use the site preparation phase as an opportunity to record information in a new worksheet. You can then use this worksheet later when you are installing VCS.

Validate the design worksheet as you prepare the site. Sample worksheet:

Cluster Definition      Value
Cluster Name            vcs
Required Attributes
UserNames               admin=password
ClusterAddress          192.168.3.91
Administrators          admin

System Definition       Value
System                  S1
System                  S2


Hardware Requirements and Recommendations

See the hardware compatibility list (HCL) at the VERITAS Web site for the most recent list of supported hardware for VERITAS products.

Networking

VERITAS Cluster Server requires a minimum of two heartbeat channels for the cluster interconnect, one of which must be an Ethernet network connection. While it is possible to use a single network and a disk heartbeat, the best practice configuration is two or more network links.

Loss of the cluster interconnect results in downtime and, in nonfencing environments, can result in a split brain condition (described in detail later in the course).

For a highly available configuration, each system in the cluster must have a minimum of two physically independent Ethernet connections for the cluster interconnect:
• Two-system clusters can use crossover cables.
• Clusters with three or more systems require hubs or switches.
• You can use layer 2 switches; however, this is not a requirement.

Note: For clusters using VERITAS Cluster File System or Oracle Real Application Cluster (RAC), VERITAS recommends the use of multiple gigabit interconnects and gigabit switches.

Hardware Requirements and Recommendations
Hardware requirements:
– Supported hardware (HCL)
– Minimum configurations
– Redundant cluster interconnect
Hardware recommendations:
– Redundant public network interfaces and infrastructures
– Redundant HBAs for shared storage (Fibre or SCSI)
– Redundant storage arrays
– Uninterruptible power supplies
– Identically configured systems: system type, network interface cards, storage HBAs
References: support.veritas.com, www.veritas.com


Shared Storage

VCS is designed primarily as a shared data high availability product; however, you can configure a cluster that has no shared storage.

For shared storage clusters, consider these requirements and recommendations:
• One HBA minimum for nonshared disks, such as system (boot) disks. To eliminate single points of failure, it is recommended to use two HBAs to connect to the internal disks and to mirror the system disk.
• One HBA minimum for shared disks:
– To eliminate single points of failure, it is recommended to have two HBAs to connect to shared disks and to use dynamic multipathing software, such as VERITAS Volume Manager DMP.
– Use multiple single-port HBAs or SCSI controllers rather than multiport interfaces to avoid single points of failure.
• Shared storage on a SAN must reside in the same zone as all of the nodes in the cluster.
• Data residing on shared storage should be mirrored or protected by a hardware-based RAID mechanism.
• Use redundant storage and paths.
• Include all cluster-controlled data in your backup planning and implementation. Periodically test restoration of critical data to ensure that the data can be restored.


SCSI Controller Configuration for Shared Storage
If using shared SCSI disk arrays, the SCSI controllers on each system must be configured so that they do not conflict with any devices on the SCSI bus.

SCSI Interfaces

Additional considerations for SCSI implementations:
• Both differential and single-ended SCSI controllers require termination; termination can be either active or passive.
• All SCSI devices on a controller must be compatible with the controller; use only differential SCSI devices on a differential SCSI controller.
• Mirror disks on separate controllers for additional fault tolerance.
• Configurations with two systems can use standard cables; a bus can be terminated at each system with disks between systems.
• Configurations with more than two systems require cables with connectors that are appropriately spaced.

Cabling SCSI Devices

Use the following procedure when cabling SCSI devices:
1 Shut down all systems in the cluster.
2 If the cluster has two systems, cable shared devices in a SCSI chain with the systems at the ends of the chain.
3 If the cluster has more than two systems, disable SCSI termination on systems that are not at the end of a SCSI chain.

SCSI Controller Configuration Requirements (Solaris, HP-UX, AIX, Linux)
Not applicable for fibre attached storage. If using SCSI for shared storage:
– Use unique SCSI IDs for each system; for example, keep the typical default scsi-initiator-id of 7 on one system and set 5 on the other.
– Check the controller SCSI ID on both systems and the SCSI IDs of the disks in shared storage.
– Change the controller SCSI ID on one system, if necessary.
– Shut down, cable shared disks, and reboot.
– Verify that both systems can see all the shared disks.


Changing the Controller SCSI ID

Solaris
1 Use the eeprom command to check the SCSI initiator ID on each system.
2 If necessary, connect shared storage to one system only and check the SCSI IDs of disk devices using the probe-scsi-all command at the ok prompt from the system.
3 Select a unique SCSI ID for each system on the shared SCSI bus.
Note: SCSI is designed to monitor and respond to requests from SCSI IDs in this order: 7 to 0, then 15 to 8. Therefore, use high-priority IDs for the systems and lower-priority IDs for devices, such as disks. For example, use 7, 6, and 5 for the systems and use the remaining IDs for the devices.
a If the SCSI initiator IDs are already set to unique values, you do not need to make any changes.
b If it is necessary to change the SCSI ID for a system, bring the system to the ok prompt, and type this command:
ok setenv scsi-initiator-id id
For example:
ok setenv scsi-initiator-id 5
Notes:
– You can also change this parameter without suspending the system by typing the eeprom scsi-initiator-id=5 command from the command line. However, the change does not take place until you reboot.
– Because this command changes the SCSI ID of all the controllers on the system, you need to ensure that there are no conflicts with devices on the nonshared controllers, as well.
4 Reboot all of the systems by typing:
ok boot -r
Note: While this is a very quick and effective method, it changes the SCSI ID for all controllers on that system. To control the individual SCSI IDs for each controller in the system, refer to the VERITAS Cluster Server Installation Guide.

AIX
1 Determine the SCSI adapters on each system:
lsdev -C -c adapter | grep scsi
2 Verify the SCSI ID of each adapter:
lsattr -E -l scsi0 -a id
lsattr -E -l scsi1 -a id
3 Change the SCSI initiator ID, if needed, on one system only:
chdev -P -l scsi0 -a id=5
chdev -P -l scsi1 -a id=5
4 Shut down, cable disks, and reboot.
5 Verify shared storage devices from both systems:
lspv


HP-UX
1 Check the controller SCSI ID and the SCSI IDs of shared disk devices using the ioscan -fnC ctl command.
2 Change the controller SCSI ID, if needed. Some controller cards have a dip switch to set the controller SCSI ID; you may need to call an HP service technician to make this change. For PCI controllers that require a software setup:
– Reboot the system.
– Break out of the boot process.
– Change the SCSI initiator ID using the configuration menu:
Main Menu: Enter command> ser scsi init path value
Main Menu: Enter command> ser scsi init 8/4 5
3 Use the ioscan -fn command to verify shared disks after the system reboots.

Linux
1 Connect the disk to the first cluster system.
2 Power on the disk.
3 Connect a terminator to the other port of the disk.
4 Boot the system. The disk is detected while the system boots.
5 Press the key sequence for your adapter to bring up the SCSI BIOS settings for that disk.
6 Set Host adapter SCSI ID = 7, or to an appropriate value for your configuration.
7 Set Host Adapter BIOS in Advanced Configuration Options to Disabled.


Hardware Verification
Hardware may have been installed but not yet configured, or improperly configured. Basic hardware configuration considerations are described next.

Network

Test the network connections to ensure that each cluster system is accessible on the public network. Also verify that the cluster interconnect is working by temporarily assigning network addresses and using ping to verify communications. You must use different IP network addresses to ensure that traffic actually uses the correct interface.

Also, depending on the operating system, you may need to ensure that network interface speed and duplex settings are hard set and auto negotiation is disabled.
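A hedged Solaris sketch of this interconnect test, using temporary addresses on a private test subnet (the interface name and addresses are examples only; commands differ on other platforms):

ifconfig qfe0 plumb 192.168.30.1 netmask 255.255.255.0 up    # on the first system
ifconfig qfe0 plumb 192.168.30.2 netmask 255.255.255.0 up    # on the second system
ping 192.168.30.2        # from the first system; verifies the link carries traffic
ifconfig qfe0 unplumb    # on each system, remove the temporary address when done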

Storage

VCS is designed primarily as a shared data high availability product. In order to fail over an application from one system to another, both systems must have access to the data storage.

Other considerations when checking hardware include:
• Switched-fabric zoning configurations in a SAN
• Active-active versus active-passive configurations on disk arrays

Hardware Verification
Inspect the hardware:
• Confirm that the hardware being used in the implementation is supported.
• Cable the cluster interconnect.
• Ensure that the hardware is configured properly for the HA environment:
– Confirm public network connectivity for each system.
– Confirm that multiple channels to storage exist. Can the operating system detect all storage? Are arrays configured properly?


Software Requirements and Recommendations

For the latest software requirements, refer to the VERITAS Cluster Server Release Notes for the specific operating system. You can also obtain requirements from the VERITAS Web site or by calling Sales or Support.

Before installing VCS, add the directory containing the VCS binaries to the PATH environment variable. The installation and other commands are located in the /sbin, /usr/sbin, and /opt/VRTSvcs/bin directories. Add the path to the VCS manual pages to the MANPATH variable.
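For example, in a Bourne-style shell (the binary directories are those listed above; the manual page directory is an assumption, so verify it against your installation):

PATH=/sbin:/usr/sbin:/opt/VRTSvcs/bin:$PATH; export PATH
MANPATH=/opt/VRTS/man:$MANPATH; export MANPATH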

Follow these recommendations to simplify installation, configuration, and management of the cluster:
• Operating system: Although it is not a strict requirement to run the same operating system version on all cluster systems, doing so greatly reduces complexity, from the initial installation of VCS through ongoing maintenance of the cluster.
• Software configuration: Setting up identical operating system configurations helps ensure that your application services run properly on all cluster systems that are startup or failover targets for the service.
• Volume management software: Using storage management software, such as VERITAS Volume Manager and VERITAS File System, enhances high availability by enabling you to mirror data for redundancy and to change the configuration or physical disks without interrupting services.

Software requirements:
– Determine supported software.
– Modify the PATH environment variable.
Software recommendations:
– Use the same operating system version and patch level on all systems.
– Use identical configurations:
  Configuration files
  User accounts
  Disabled abort sequence (Solaris)
  ssh or rsh configured during installation
– Use volume management software for storage.

For the latest information:
• support.veritas.com
• www.veritas.com


• Consider disabling the abort sequence on Solaris systems. When a Solaris system in a VCS cluster is halted with the abort sequence (STOP-A), it stops producing VCS heartbeats. To disable the abort sequence on Solaris systems, add the following line to the /etc/default/kbd file (create the file if it does not exist):
  KEYBOARD_ABORT=disable
  After the abort sequence is disabled, reboot the system.
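Assuming a root Bourne-style shell, the setting can be added with a single command before rebooting; this is just one way to append the line:

# echo "KEYBOARD_ABORT=disable" >> /etc/default/kbd
# reboot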

• Enable ssh/rsh communication between systems. This enables you to install all cluster systems from the system on which you run the installation utility. If you cannot enable secure communications, you can install VCS individually on each system.

See the installation guide for your platform and version for the specific requirements for your environment.


Software Verification
Verify that the VERITAS products in the high availability solution are compatible with the operating system versions in use or with the planned upgrades.
• Verify that the required operating system patches are installed on the systems before installing VCS.
• Obtain VCS license keys.

You must obtain license keys for each cluster system to complete the license process. For new installations, use the VERITAS vLicense Web site, http://vlicense.veritas.com, or contact your VERITAS sales representative for license keys. For upgrades, contact VERITAS Support. Also, verify that you have the required licenses to run applications on all systems where the corresponding service can run.

• Verify that operating system and network configuration files are configured to enable application services to run identically on all target systems. For example, if a database needs to be started with a particular user account, ensure that user account, password, and group files contain the same configuration for that account on all systems that need to be able to run the database.

Software Verification
Inspect the software:
– Confirm that the operating system version is supported.
– Verify that the necessary patches are installed.
– Verify that software licenses are available or installed for VCS and applications.
– Verify that the operating system and network configuration files are the same.
License key sources:
• vlicense.veritas.com
• VERITAS sales representative
• VERITAS Support for upgrades


Preparing Cluster Information
Verify that you have the information necessary to install VCS. If necessary, document the information in your design worksheet.

Be prepared to supply:
• Names of the systems that will be members of the cluster
• A name for the cluster, beginning with a letter of the alphabet (a-z, A-Z)
• A unique ID number for the cluster in the range 0 to 255

Avoid using 0 because this is the default setting and can lead to conflicting cluster numbers if other clusters are added later using the default setting. All clusters sharing a private network infrastructure (including connection to the same public network if used for low-priority links) must have a unique ID.

• Device names of the network interfaces used for the cluster interconnect

You can opt to configure additional cluster services during installation.
• VCS user accounts: Add accounts or change the default admin account.
• Web GUI: Specify a network interface and virtual IP address on the public network during installation so that VCS configures a highly available Web management interface.

• Notification: Specify SMTP and SNMP information during installation to configure the cluster notification service.

• Broker nodes required for security (4.1): VCS can be configured to use VERITAS Security Services (VxSS) to provide secure communication between cluster nodes and clients, as described in the next section.

Preparing Cluster Information
Verify that you have the information needed for the VCS installation procedure:
– System names
– License keys
– Cluster name, ID number
– Network interfaces for cluster interconnect
Optional:
– VCS user names and passwords
– Network interface for Web GUI
– IP address for Web GUI
– SMTP server name and e-mail addresses
– SNMP console name and message levels
– Root and authentication broker nodes for security


VERITAS Security Services
VCS versions 4.1 and later can be configured to use VERITAS Security Services (VxSS) to provide secure communication between cluster nodes and clients, including the Java and the Web consoles. VCS uses digital certificates for authentication and uses SSL to encrypt communication over the public network.

In the secure mode, VCS uses platform-based authentication; VCS does not store user passwords. All VCS users are system users. After a user is authenticated, the account information does not need to be provided again to connect to the cluster (single sign-on).

Note: VERITAS Security Services are in the process of being implemented in all VERITAS products.

VxSS requires one system to act as a root broker node. This system serves as the main registration and certification authority and should be a system that is not a member of the cluster.

All cluster systems must be configured as authentication broker nodes, which can authenticate clients.

Security can be configured after VCS is installed and running. For additional information on configuring and running VCS in secure mode, see “Enabling and Disabling VERITAS Security Services” in the VERITAS Cluster Server User’s Guide.

VERITAS Security Services
If configured in secure mode, VCS uses VERITAS Security Services (VxSS) to provide secure communication:
– Among cluster systems
– Between VCS clients (Cluster Manager Java and Web consoles) and cluster systems
VCS uses digital certificates for authentication and Secure Socket Layer (SSL) to encrypt communication over the public network.
VxSS provides a single sign-on for authenticated user accounts.
All cluster systems must be authentication broker nodes. VERITAS recommends using a system outside the cluster to serve as the root broker node.


Summary
This lesson described how to prepare sites and application services for use in the VCS high availability environment. Performing these preparation tasks ensures that the site is ready to deploy VCS, and helps illustrate how VCS manages application resources.

Next Steps
After you have prepared your operating system environment for high availability, you can install VERITAS Cluster Server.

Additional Resources
• VERITAS Cluster Server Release Notes
  The release notes provide detailed information about hardware and software supported by VERITAS Cluster Server.
• VERITAS Cluster Server Installation Guide
  This guide provides detailed information about installing VERITAS Cluster Server.
• http://support.veritas.com
  Check the VERITAS Support Web site for supported hardware and software information.

Lesson Summary
Key Points
– Verify hardware and software compatibility and record information in a worksheet.
– Prepare cluster configuration values before you begin installation.
Reference Materials
– VERITAS Cluster Server Release Notes
– VERITAS Cluster Server Installation Guide
– http://support.veritas.com


Lab 2: Validating Site Preparation
Labs and solutions for this lesson are located on the following pages.

Appendix A provides brief lab instructions for experienced students.• “Lab 2 Synopsis: Validating Site Preparation,” page A-2

Appendix B provides step-by-step lab instructions.• “Lab 2: Validating Site Preparation,” page B-3

Appendix C provides complete lab instructions and solutions.• “Lab 2 Solutions: Validating Site Preparation,” page C-3

Goal
The purpose of this lab is to prepare the site, your classroom lab systems, for VCS installation.

Results
The system requirements are validated, the interconnect is configured, and the design worksheet is completed and verified.

Prerequisites
Obtain any classroom-specific values needed for your classroom lab environment and record these values in the design worksheet included with the lab exercise instructions.

Lab 2: Validating Site Preparation
– Visually inspect the classroom lab site.
– Complete and validate the design worksheet.
– Use the lab appendix best suited to your experience level:
  – Appendix A: Lab Synopses
  – Appendix B: Lab Details
  – Appendix C: Lab Solutions
See the next slide for lab assignments.

Sample design worksheet:
System Definition    Sample Value    Your Value
System               train1          ________
System               train2          ________


Lesson 3: Installing VERITAS Cluster Server


Introduction
Overview
This lesson describes the automated VCS installation process carried out by the VERITAS Common Product Installer.

Importance
Installing VCS is a simple, automated procedure in most high availability environments. The planning and preparation tasks you perform prior to starting the installation process ensure that VCS installs quickly and easily.

Lesson Introduction
– Lesson 1: VCS Building Blocks
– Lesson 2: Preparing a Site for VCS
– Lesson 3: Installing VCS
– Lesson 4: VCS Operations
– Lesson 5: Preparing Services for VCS
– Lesson 6: VCS Configuration Methods
– Lesson 7: Online Configuration of Service Groups
– Lesson 8: Offline Configuration of Service Groups
– Lesson 9: Sharing Network Interfaces
– Lesson 10: Configuring Notification
– Lesson 11: Configuring VCS Response to Faults
– Lesson 12: Cluster Communications
– Lesson 13: System and Communication Faults
– Lesson 14: I/O Fencing
– Lesson 15: Troubleshooting


Outline of Topics
• Using the VERITAS Common Product Installer
• VCS Configuration Files
• Viewing the Default VCS Configuration
• Other Installation Considerations

Lesson Topics and Objectives

Topic                                   After completing this lesson, you will be able to:
Using the VERITAS Product Installer     Install VCS using the VPI utility.
VCS Configuration Files                 Display the configuration files created during installation.
Viewing the Default VCS Configuration   View the VCS configuration created during installation.
Other Installation Considerations       Describe other components to consider at installation time.


Using the VERITAS Product Installer
VERITAS ships high availability and storage foundation products with a VERITAS product installer (VPI) utility that enables you to install these products using the same interface.

Viewing Installation Logs
At the end of every product installation, the VPI creates three text files:
• A log file containing any system commands executed and their output
• A response file to be used in conjunction with the -responsefile option of the VPI
• A summary file containing the output of the VERITAS Product Installer scripts

These files are located in /opt/VRTS/install/logs. The names and locations of each file are displayed at the end of each product installation. It is recommended that these logs be kept for auditing and debugging purposes.

Using the VERITAS Product Installer
The VERITAS product installer (VPI), the recommended installation procedure for VCS:
– Performs environment checking to ensure that prerequisites are met
– Enables you to add product licenses
– Is started from the installer file in the CD mount point directory
– Runs the installvcs utility
– Logs user input and program output to files in /opt/VRTS/install/logs


The installvcs Utility
The installvcs utility is used by the VPI to automatically install and configure a cluster. If remote root access is enabled, installvcs installs and configures all cluster systems you specify during the installation process.

The installation utility performs these high-level tasks:
• Installs VCS packages on all the systems in the cluster
• Configures cluster interconnect links
• Brings the cluster up without any application services

Make any changes to the new cluster configuration, such as the addition of any application services, after the installation is completed.

For a list of software packages that are installed, see the release notes for your VCS version and platform.

Options to installvcs

The installvcs utility supports several options that enable you to tailor the installation process. For example, you can:
• Perform an unattended installation.
• Install software packages without configuring a cluster.
• Install VCS in a secure environment.
• Upgrade an existing VCS cluster.

For a complete description of installvcs options, see the VERITAS Cluster Server Installation Guide.

The installvcs Utility
The installvcs utility, called by VPI:
– Uses the platform-specific operating system command to install the VCS packages on all the systems in the cluster
– Configures Ethernet network links for the VCS communications interconnect
– Brings the cluster up with the ClusterService group, which manages the VCS Web GUI (if configured during installation)
The installvcs utility requires remote root access to other systems in the cluster while the script is being run:
• The /.rhosts file: You can remove .rhosts files after VCS installation.
• ssh: No prompting is permitted.


Automated VCS Installation Procedure
If you use the VPI installer utility and select VCS from the product list, installvcs is started. You can also run installvcs directly from the command line. Using information you supply, the installvcs utility installs VCS and all bundled agents on each cluster system, installs a Perl interpreter, and sets up the LLT and GAB communication services. The utility also gives you the option to install and configure the Web-based Cluster Manager (Web Console) and to set up SNMP and SMTP notification features in the cluster.

As you use the installvcs utility, you can review summaries to confirm the information that you provide. You can stop or restart the installation after reviewing the summaries. Installation of VCS packages takes place only after you have confirmed the information. However, partially installed VCS files must be removed before running the installvcs utility again.

The installation utility is described in detail in the VERITAS Cluster Server Installation Guide for each platform. The next sections provide a summary of the steps involved.

Automated VCS Installation Procedure (flow diagram):

User input to the script:
– Invoke installer.
– Enter system names.
– Enter license keys.
– Select optional packages.
– Configure the cluster (name, ID, interconnect).
– Select root broker node.
– Set up VCS user accounts.
– Configure the Web GUI (device name, IP address, subnet mask).
– Configure SMTP and SNMP notification.

What the script does:
– Verifies communication and installs VERITAS infrastructure packages.
– Installs VCS packages.
– Configures VCS.
– Starts VCS.


Starting the Installation

To start the installation utility:
1 Log on as the root user on a system connected by the network to the systems where VCS is to be installed. The system from which VCS is installed does not need to be part of the cluster.
2 Insert the CD with the VCS software into a drive connected to the system.
3 Start the VCS installation utility by starting either VPI or the installvcs utility directly:
  ./installer
  or
  ./installvcs

The utility starts by prompting you for the names of the systems in the cluster.

The utility verifies that the systems you specify can communicate using ssh or rsh. If ssh binaries are found, the program confirms that ssh is set up to operate without requests for passwords or passphrases.
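For example, a quick manual check that passwordless ssh is working from the installation system to another cluster system (train2 is an illustrative name) might be:

# ssh train2 uname -a

If the remote system information is printed without a password or passphrase prompt, the installer can manage that system over ssh.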

Licensing VCS

The installation utility verifies the license status of each system. If a VCS license is found on the system, you can use that license or enter a new license.

If no VCS license is found on the system, or you want to add a new license, enter a license key when prompted.

Configuring the Cluster

After licensing is completed, the installation utility:
• Shows the list of VCS packages that will be installed
• Determines whether any VCS packages are currently installed
• Determines whether enough free disk space is available
• Stops any VCS processes that might be running

When these checks are complete, the installation utility asks if you want to configure VCS. If you choose to do so, you are prompted for the following cluster configuration information:
• A name for the cluster, beginning with a letter of the alphabet (a-z, A-Z)
• A unique ID number for the cluster in the range 0 to 255

Avoid using 0 because this is the default setting and can lead to conflicting cluster numbers if other clusters are added later using the default setting. All clusters sharing the private network infrastructure (including connection to the same public network if used for low-priority links) must have a unique ID.


Configuring the Cluster Interconnect

After you enter the cluster ID number, the installation utility discovers and lists all NICs on the first system to enable you to configure the private network interfaces.

Note: With VCS 4.x, you can configure more than two Ethernet links and low-priority network links using the installation utility. A low-priority network link is a private link used only for less-frequent heartbeat communications without any status information under normal operating conditions. The cluster interconnect is described in more detail later in the course.

If you are using the same NICs for private heartbeat links on all systems, installvcs automatically configures the same set of interfaces for the cluster interconnect.

If you are using different interfaces, enter n when prompted and the utility prompts for the NICs of each system.

A verification message then displays a summary of the user input:
Cluster information verification:
Cluster Name: mycluster
Cluster ID Number: 200
Private Heartbeat Links for train7: link1=dev0 link2=dev1
Private Heartbeat Links for train8: link1=dev0 link2=dev1

Configuring Security

If you choose to configure VxSS security, you are prompted to select the root broker node. The system acting as root broker node must be set up and running before installing VCS in the cluster. All cluster nodes are automatically set up as authentication broker nodes.

Configuring User Accounts

If you configured VxSS security, you are not prompted to add VCS users. When running in secure mode, system (UNIX) users and passwords are used to verify identity. VCS user names and passwords are no longer used in a secure cluster.

Configuring the Web Console

The installation utility describes the information required to configure Cluster Manager (Web Console). Configuring Cluster Manager is optional. To configure the Web Console, enter the following information when prompted:
• A public NIC used by each system in the cluster
• A virtual IP address and netmask for the Cluster Manager

The installation process creates a service group named ClusterService to make the Web Console highly available.


If you type n and do not configure Cluster Manager, the installation program advances you to the screen enabling you to configure SMTP/SNMP notification. If you choose to configure VCS to send event notifications to SMTP e-mail services or SNMP management consoles, you need to provide the SMTP server name and e-mail addresses of people to be notified or SNMP management console name and message severity levels. Note that it is also possible to configure notification after installation.

Configuring SMTP/SNMP notification is described later in this course.

Completing the Installation

After you have entered all configuration information, the installation utility:
1 Begins installing the packages on the first system. The same packages are installed on each machine in the cluster.
2 Creates configuration files and copies them to each system.
3 Asks for confirmation to start VCS and its components on each system.


Installing VCS Updates
Updates for VCS are periodically created in the form of patches or maintenance packs to provide software fixes and enhancements. Before proceeding to configure your cluster, check the VERITAS Support Web site at http://support.veritas.com for information about any updates that might be available.

Download the latest update for your version of VCS according to the instructions provided on the Web site.

The installation instructions for VCS updates are included with the update pack.

Before you install an update, ensure that all prerequisites are met. At the end of the update installation, you may be prompted to run scripts to update agents or other portions of the VCS configuration. Continue through any additional procedures to ensure that the latest updates are applied.

Installing VCS Updates
– Check for any updates before proceeding to configure your cluster: http://support.veritas.com
– Updates are usually provided as patches or maintenance packs.
– Read the installation instructions included with the update to ensure that all prerequisites are met before you start the installation process.


VCS Configuration Files
VCS File Locations
The VCS installation procedure creates several directory structures:
• Commands: /sbin, /usr/sbin, and /opt/VRTSvcs/bin
• Configuration files: /etc and /etc/VRTSvcs/conf/config
• GUI configuration files: /opt/VRTSvcs/gui/conf
• Logging directory: /var/VRTSvcs/log

The VCS installation procedure also adds several environment variables during installation, including these commonly used variables:
• $VCS_CONF: /etc/VRTSvcs
• $VCS_HOME: /opt/VRTSvcs/bin
• $VCS_LOG: /var/VRTSvcs

VCS File Locations

Directories                               Contents
/sbin, /usr/sbin, /opt/VRTSvcs/bin        Executables, scripts, libraries
/etc                                      Configuration files for the cluster interconnect
/etc/VRTSvcs/conf/config                  Cluster configuration files
/opt/VRTSvcs/gui/conf                     Apache servlet engine for Web GUI
/var/VRTSvcs/log                          Log files

Commonly used environment variables:
$VCS_CONF: /etc/VRTSvcs
$VCS_HOME: /opt/VRTSvcs/bin
$VCS_LOG: /var/VRTSvcs


Communication Configuration Files
The installvcs utility creates these VCS communication configuration files:
• /etc/llttab
  The llttab file is the primary LLT configuration file and is used to:
  – Set system ID numbers.
  – Set the cluster ID number.
  – Specify the network device names used for the cluster interconnect.
  – Modify LLT behavior, such as heartbeat frequency.
• /etc/llthosts
  The llthosts file associates a system name with a unique VCS cluster node ID number for every system in the cluster. This file is the same on all systems in the cluster.
• /etc/gabtab
  This file contains the command line that is used to start GAB.

Cluster communication is described in detail later in the course.
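For illustration, the following is a representative set of these files for a two-node Solaris cluster; the node names (train1, train2), cluster ID (10), and qfe interface devices are assumptions, and your files will reflect the values entered during installation.

/etc/llttab:
set-node train1
set-cluster 10
link qfe0 /dev/qfe:0 - ether - -
link qfe1 /dev/qfe:1 - ether - -

/etc/llthosts:
0 train1
1 train2

/etc/gabtab:
/sbin/gabconfig -c -n2

The -n2 argument seeds GAB membership when two systems are present.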

Communication Configuration Files
• LLT configuration files:
  – /etc/llttab: Specifies the cluster ID number, the host name, and the LLT network interfaces used for the cluster interconnect
  – /etc/llthosts: Lists the host names of each cluster system with its corresponding LLT node ID number
• GAB configuration file:
  – /etc/gabtab: Specifies how many systems are members of the cluster and starts GAB


Cluster Configuration Files
The following cluster configuration files are added as a result of package installation:
• /etc/VRTSvcs/conf/config/main.cf
• /etc/VRTSvcs/conf/config/types.cf

The installvcs utility modifies the main.cf file to configure the ClusterService service group, which includes the resources used to manage the Web-based Cluster Manager (Web Console). VCS configuration files are discussed in detail throughout the course.

Cluster Configuration Files
VCS configuration files:
– /etc/VRTSvcs/conf/config/types.cf
– /etc/VRTSvcs/conf/config/main.cf

Sample main.cf excerpt:
include "types.cf"
cluster VCS (
    …
    )
system train1 ()
system train2 ()
group ClusterService (
    …
    )
IP webip (
    Device = hme1
    Address = "192.168.105.101"
    NetMask = "255.255.255.0"
    )
…
The slide callouts highlight the cluster name (cluster VCS), all the systems where VCS is installed (system train1, train2), and the information entered for the Web-based Cluster Manager (group ClusterService).


Viewing the Default VCS Configuration
Viewing Installation Results
After the initial installation, you can perform the following tasks to view the cluster configuration created during the installation process.
• List the VERITAS packages installed on the system:
  Solaris:  pkginfo | grep -i vrts
  AIX:      lslpp -L | grep -i vrts
  HP-UX:    swlist | grep -i vrts
  Linux:    rpm -qa | grep -i vrts
• Log on to the VCS Web Console using the IP address specified during installation:
  http://IP_Address:8181/vcs
• View the product documentation:
  /opt/VRTSvcsdc

Viewing Installation Results
• List the VERITAS packages installed (pkginfo, lslpp, swlist, rpm).
• View VCS configuration files (llttab, llthosts, gabtab, main.cf, types.cf).
• Access the VCS Cluster Manager Web Console: http://IP_Address:8181/vcs
• Access the VCS documentation: /opt/VRTSvcsdc


Viewing Status
After installation is complete, you can check the status of VCS components.
• View VCS communications status on the cluster interconnect using LLT and GAB commands. This topic is discussed in more detail later in the course. For now, you can see that LLT is up by running the following command:
  # lltconfig
  llt is running
• View GAB port a and port h memberships for all systems:
  # gabconfig -a
  GAB Port Memberships
  ===============================================
  Port a gen   a36e003 membership 01
  Port h gen   fd57002 membership 01
• View the cluster status:
  # hastatus -sum

Sample status output:
# hastatus -sum
-- System    State     Frozen
A  S1        RUNNING   0
A  S2        RUNNING   0


Other Installation Considerations

Fencing Considerations
If you are using VCS with shared storage devices that support SCSI-3 Persistent Reservations, configure fencing after VCS is initially installed. You must have VCS 4.0 and Volume Manager 4.0 (or later) to implement fencing.

You can configure fencing at any time. However, if you set up fencing after you have service groups running, you must stop and restart the service groups for fencing to take effect.

The procedure for configuring fencing is provided later in the course.

I/O Fencing Considerations
I/O fencing is the recommended method for protecting shared storage in a cluster environment.
Configure fencing after initial VCS installation if:
– Your shared storage devices support SCSI-3 Persistent Reservations; and
– You installed a 4.0 or later version of VCS; and
– You are using Volume Manager 4.0 or later.

A detailed procedure is provided later in the course.


Cluster Manager Java GUI
You can install the VCS Java-based Cluster Manager GUI as part of the cluster installation process. You can also install Cluster Manager on any supported system manually using the appropriate operating system installation utility.

The next examples show how to install Cluster Manager on supported UNIX platforms.

Solaris
pkgadd -d /cdrom/pkgs VRTScscm

AIX
cd /cdrom
installp -a -d ./pkgs/VRTScscm.rte.bff VRTScscm.rte

HP-UX
swinstall -s /cdrom/cluster_server/pkgs VRTScscm

Linux
Insert the VCS CD into a drive on the system. The software automatically mounts the CD on /mnt/cdrom.
cd /mnt/cdrom/vcsgui
rpm -ihv VRTScscm-base-2.0.3-Linux.i386.rpm

Cluster Manager Java GUI
You can install VERITAS Cluster Manager on any supported system.
– Cluster Manager runs on UNIX and Windows systems.
– You can install Cluster Manager on cluster systems as part of the installvcs process. However, you are not required to have Cluster Manager on any cluster system.
– See the VERITAS Cluster Server Release Notes for details about platform support and installation procedures.
Access the Cluster Manager Java Console to verify installation:
– On UNIX systems, type hagui &.
– On Windows systems, start the GUI using the Cluster Manager desktop icon.


Installing the Java Console on Windows

You can also install and use the VCS Java Console remotely from a Windows workstation. You do not need to have the VCS software installed locally on the system to use the Java Console. To install the VCS Cluster Manager (Java Console) on a Windows workstation:
1 Insert the VCS CD into the drive on your Windows workstation.
2 Using Windows Explorer, select the CD drive.
3 Navigate to \pkgs\WindowsInstallers\WindowsClusterManager\EN.
4 Double-click Setup.exe.

The VCS InstallShield guides you through the installation process.


Summary
This lesson described the procedure for installing VCS and viewing the cluster configuration after the installation has completed.

Next Steps
After you install the VCS software, you can prepare your application services for the high availability environment.

Additional Resources
• VERITAS Cluster Server Release Notes
  This document provides important information regarding VERITAS Cluster Server (VCS) on the specified platform. It is recommended that you review this entire document before installing VCS.
• VERITAS Cluster Server Installation Guide
  This guide provides information on how to install VERITAS Cluster Server on the specified platform.
• Web Resources
  – To verify that you have the latest operating system patches before installing VCS, see the corresponding vendor Web site for that platform. For example, for Solaris, see http://sunsolve.sun.com.
  – To contact VERITAS Technical Support, see http://support.veritas.com.
  – To obtain VERITAS software licenses, see http://vlicense.veritas.com.

Lesson Summary
Key Points
– Use the VERITAS Common Product Installer to install VCS on UNIX systems.
– Familiarize yourself with the installed and running configuration.
Reference Materials
– VERITAS Cluster Server Release Notes
– VERITAS Cluster Server Installation Guide
– http://support.veritas.com
– http://vlicense.veritas.com


Lab 3: Installing VCS
Labs and solutions for this lesson are located on the following pages.

Appendix A provides brief lab instructions for experienced students.• “Lab 3 Synopsis: Installing VCS,” page A-6

Appendix B provides step-by-step lab instructions.• “Lab 3: Installing VCS,” page B-11

Appendix C provides complete lab instructions and solutions.• “Lab 3 Solutions: Installing VCS,” page C-13

Goal
The purpose of this lab exercise is to set up a two-system VCS cluster with a shared disk configuration and to install the VCS software using the installation utility.

Prerequisites
• Pairs of students work together to install the cluster. Select a pair of systems or use the systems designated by your instructor.
• Obtain installation information from your instructor and record it in the design worksheet provided with the lab instructions.

Results
A two-system cluster is running VCS with one system running the ClusterService service group.

Lab 3: Installing VCS
(Diagram: systems train1 and train2 form cluster vcs1. Record your classroom values: public interface, interconnect Link 1 and Link 2 for each system, subnet, and software location. Run ./installer for 4.x, or ./installvcs for pre-4.0.)

Lesson 4: VCS Operations

Introduction
Overview
In this lesson, you learn how to manage applications that are under the control of VCS. You are introduced to considerations to take into account when managing applications in a highly available clustered environment.

Importance
It is important to understand how to manage applications when they are under VCS control. An application is a member of a service group that also contains the resources necessary to run that application. Applications must be brought up and down using the VCS interface rather than a traditional direct interface with the application. Application upgrades and backups are handled differently in a cluster environment.

Course Overview
– Lesson 1: VCS Building Blocks
– Lesson 2: Preparing a Site for VCS
– Lesson 3: Installing VCS
– Lesson 4: VCS Operations
– Lesson 5: Preparing Services for VCS
– Lesson 6: VCS Configuration Methods
– Lesson 7: Online Configuration of Service Groups
– Lesson 8: Offline Configuration of Service Groups
– Lesson 9: Sharing Network Interfaces
– Lesson 10: Configuring Notification
– Lesson 11: Configuring VCS Response to Faults
– Lesson 12: Cluster Communications
– Lesson 13: System and Communication Faults
– Lesson 14: I/O Fencing
– Lesson 15: Troubleshooting


Outline of Topics
• Managing Applications in a Cluster Environment
• Service Group Operations
• Using the VCS Simulator

Lesson Topics and Objectives

Topic                                            After completing this lesson, you will be able to:
Managing Applications in a Cluster Environment   Describe key considerations for managing applications.
Service Group Operations                         Perform common cluster administrative operations.
Using the VCS Simulator                          Use the VCS Simulator to practice managing services.


Managing Applications in a Cluster Environment
Key Considerations
In a cluster environment, the application software is a resource that is a member of the service group. When an application is placed under control of VCS, you must change your standard administration practices for managing the application.

Consider a nonclustered, single-host environment running an Oracle database. A common method for shutting down the database is to log on as the database administrator (DBA) and use sqlplus to shut down the database.

In a clustered environment where Oracle is a resource in a failover service group, the same action causes a failover, which results in VCS detecting a fault (the database is offline) and bringing the database online on another system.

It is also normal and common to do other things in a nonclustered environment, such as forcibly unmounting a file system.

Under VCS, the manipulation of resources that are part of service groups and the service groups themselves need to be managed using VCS utilities, such as the GUI or CLI, with full awareness of resource and service group dependencies.

Alternatively, you can freeze the service group to prevent VCS from taking action when changes in resource status are detected, as described later in this lesson.

Warning: In clusters that do not implement fencing, VCS cannot prevent someone with proper permissions from manually starting another instance of the application on another system outside of VCS control. VCS will eventually detect this and take corrective action, but it may be too late to prevent data corruption.

Key Considerations
After an application is placed under VCS control, you must change your management practices. You have two basic administrative approaches:
– Use VCS to start and stop service groups and resources.
– Direct VCS not to intervene while you are performing administrative operations outside of VCS by freezing the service group.
Warning: You can mistakenly cause problems, such as forcing faults and preventing failover, if you manipulate resources outside of VCS.


VCS Management Tools
You can use any of the VCS interfaces to manage the cluster environment, provided that you have the proper VCS authorization. VCS user accounts are described in more detail in the “VCS Configuration Methods” lesson.

For details about the requirements for running the graphical user interfaces (GUIs), see the VERITAS Cluster Server Release Notes and the VERITAS Cluster Server User’s Guide.

Note: You cannot use the Simulator to manage a running cluster configuration.

VCS Management Tools

Tool            Description
VCS Simulator   Create, model, and test configurations; cannot be used to manage a running cluster configuration
Java GUI        Graphical user interface; runs on UNIX and Windows systems
Web GUI         Graphical user interface; runs on systems with supported Web browsers
CLI             Command-line interface; runs on the local system

Only authorized VCS user accounts have access to VCS administrative interfaces.


Service Group Operations
This section describes common service group operations that operators and administrators need to perform. Your instructor will demonstrate using the Java GUI to perform each task as it is discussed in class.

Common Operations
These common service group operations are described in more detail throughout this section:
– Displaying attributes and status
– Bringing service groups online
– Taking service groups offline
– Switching service groups
– Freezing service groups
– Bringing resources online
– Taking resources offline
– Clearing faults
Your instructor will demonstrate how you can use the VCS Java GUI to perform these tasks.


Displaying Attributes and Status
Knowing how to display attributes and status about a VCS cluster, service groups, and resources helps you monitor the state of cluster objects and, if necessary, find and fix problems. Familiarity with status displays also helps you build an understanding of how VCS responds to events in the cluster environment, and the effects on application services under VCS control.

You can display attributes and status using the GUI or CLI management tools.

Display Cluster Status Using the CLI

To display cluster status, use either form of the hastatus command:
• hastatus -sum[mary]
  Shows a static snapshot of the status of cluster objects.
• hastatus
  Shows a continuously updated display of the status of cluster objects.

Displaying Attributes and Status
You can view attributes for the following VCS objects using any VCS interface:
– Clusters
– Systems
– Service groups
– Resources
– Resource types
Display status information to:
– Determine the state of the cluster.
– Analyze the causes of errors and correct them, when necessary.
Tip: Use the hastatus command with no options for a continuous display of cluster status information.


Displaying Logs

You can display the HAD log to see additional status information about activity in the cluster. You can also display the command log to see how the activities you perform using the GUI are translated into VCS commands, and use the command log as a resource for creating batch files when performing repetitive configuration or administration tasks.

Note: Both the HAD log and command log can be viewed using the GUI.

The primary log file, the engine log, is located in /var/VRTSvcs/log/engine_A.log. Log files are described in more detail later in the course.

Displaying Logs
HAD (engine) log:
– Is located in /var/VRTSvcs/log/engine_A.log
– Tracks all cluster activity
– Is useful for solving configuration problems
Command log:
– Tracks each command issued using a GUI
– Is useful for learning the CLI
– Can be used for creating batch files
– Can be printed, but is not stored on disk in a file
Tip: Show a continuous hastatus display in one window and the command log in another to become familiar with VCS activities and operations.


Bringing Service Groups Online
When a service group is brought online, resources are brought online starting with the lowest (child) resources and progressing up the resource dependency tree to the highest (parent) resources.

In order to bring a failover service group online, VCS must verify that all nonpersistent resources in the service group are offline everywhere in the cluster. If any nonpersistent resource is online on another system, the service group is not brought online.

A service group is considered online if all of its autostart and critical resources are online.
• An autostart resource is a resource whose AutoStart attribute is set to 1.
• A critical resource is a resource whose Critical attribute is set to 1.

A service group is considered partially online if one or more nonpersistent resources are online and at least one autostart-enabled, critical resource is offline.

The state of persistent resources is not considered when determining the online or offline state of a service group because persistent resources cannot be taken offline.

Bringing Service Groups Online
– Service group attributes determine how service groups are brought online automatically by VCS.
– You may need to manually bring a service group online in some cases, for example, if a service group is taken offline for maintenance.
– Resources are brought online in dependency tree order, from bottom child resources to top parent resources.
– A service group can be partially online.
(Diagram: the WebSG service group on system S1, with a resource dependency tree of NIC, DiskGroup, and Volume children beneath IP, Mount, and Web parents; hagrp -online)


Bringing a Service Group Online Using the CLI

To bring a service group online, use either form of the hagrp command:
• hagrp -online service_group -sys system
  Provide the service group name and the name of the system where the service group is to be brought online.
• hagrp -online service_group -any
  Provide the service group name. The -any option, supported as of VCS 4.0, brings the service group online based on the group’s failover policy. Failover policies are described in detail later in the course.
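For example, using the illustrative WebSG group and S1 system names from the diagrams in this lesson:

# hagrp -online WebSG -sys S1
# hagrp -state WebSG

The hagrp -state command displays the resulting group state on each system.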


Taking Service Groups Offline
When a service group is taken offline, resources are taken offline starting with the highest (parent) resources in each branch of the resource dependency tree and progressing down the resource dependency tree to the lowest (child) resources.

Persistent resources cannot be taken offline. Therefore, the service group is considered offline when all nonpersistent resources are offline.

Taking a Service Group Offline Using the CLI

To take a service group offline, use either form of the hagrp command:
• hagrp -offline service_group -sys system
  Provide the service group name and the name of a system where the service group is online.
• hagrp -offline service_group -any
  Provide the service group name. The -any switch, supported as of VCS 4.0, takes a failover service group offline on the system where it is online. All instances of a parallel service group are taken offline when the -any switch is used.

Taking Service Groups Offline
– Service groups are taken offline manually, for maintenance, or automatically, by VCS as part of failover.
– A service group is considered offline when all nonpersistent resources on a system are offline.
– Resources are taken offline in dependency tree order, from top parent resources to bottom child resources.
(Diagram: the WebSG service group on system S1; hagrp -offline)


Switching Service Groups
In order to ensure that failover can occur as expected in the event of a fault, test the failover process by switching the service group between systems within the cluster.

Switching a Service Group Using the CLI

To switch a service group, type:
hagrp -switch service_group -to system

Provide the service group name and the name of the system where the service group is to be brought online.
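For example, a simple failover test using the illustrative WebSG group might switch the group to system S2 and then confirm the result:

# hagrp -switch WebSG -to S2
# hastatus -sum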

Switching Service Groups
A manual failover can be performed by switching the service group between systems. VCS performs these actions:
– Applies the rules for going offline and coming online
– Takes the service group offline on system S1
– Brings the group online on system S2
– Maintains state data: only resources that were online on system S1 are brought online on system S2
(Diagram: the WebSG service group moving to system S2; hagrp -switch)


Freezing a Service Group
When you freeze a service group, VCS continues to monitor the resources, but does not allow the service group (or its resources) to be taken offline or brought online. Failover is also disabled, even if a resource faults.

You can also specify that the freeze is in effect even if VCS is stopped and restarted throughout the cluster.

Warning: When frozen, VCS does not take action on the service group even if you cause a concurrency violation by bringing the service online on another system outside of VCS.

Freezing and Unfreezing a Service Group Using the CLI

To freeze and unfreeze a service group temporarily, type:
hagrp -freeze service_group
hagrp -unfreeze service_group
To freeze a service group persistently, you must first open the configuration:
haconf -makerw
hagrp -freeze service_group -persistent
hagrp -unfreeze service_group -persistent
To determine whether a service group is frozen, display the Frozen (for persistent) and TFrozen (for temporary) service group attributes:
hagrp -display service_group -attribute Frozen
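For example, a persistent freeze around a maintenance window might look like the following sketch; WebSG is an illustrative group name, and haconf -dump -makero saves and closes the configuration:

# haconf -makerw
# hagrp -freeze WebSG -persistent
(perform the maintenance outside of VCS)
# hagrp -unfreeze WebSG -persistent
# haconf -dump -makero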

Freezing a Service Group
Freezing a service group prevents it from being taken offline, brought online, or failed over.
Example uses:
– Enable a DBA to perform database operations outside of VCS control.
– Perform application updates that require the application to be stopped and restarted.
A persistent freeze remains in effect through VCS restarts.
Warning: If a service group is frozen, VCS does not take the service group offline even if you inadvertently start the service outside of VCS on another system (concurrency violation).


Bringing Resources Online
In normal day-to-day operations, you perform most management operations at the service group level.

However, you may need to perform maintenance tasks that require one or more resources to be offline while others are online. Also, if you make errors during resource configuration, you can cause a resource to fail to be brought online.

Bringing Resources Online Using the CLI

To bring a resource online, type:
hares -online resource -sys system

Provide the resource name and the name of a system that is configured to run the service group.

Bringing Resources Online
– When you bring a resource online, VCS calls the online entry point of the agent that corresponds to the resource type. For example, when you bring a Mount resource online, the Mount agent online entry point mounts the file system using values specified in the resource attributes.
– In most configurations, resources are brought online automatically when VCS starts up and brings the service groups online.
– You may need to bring a resource online if it has been taken offline manually or has faulted.
(Diagram: the WebSG resource dependency tree on system S1; hares -online)


Taking Resources Offline
Taking resources offline should not be a normal occurrence. Doing so causes the service group to become partially online, and availability of the application service is affected.

If a resource needs to be taken offline, for example, for maintenance of underlying hardware, then consider switching the service group to another system.

If multiple resources need to be taken offline manually, then they must be taken offline in resource dependency tree order, that is, from top to bottom.

Taking a resource offline and immediately bringing it online may be necessary if, for example, the resource must reread a configuration file due to a change.

Taking Resources Offline Using the CLI

To take a resource offline, type:
hares -offline resource -sys system

Provide the resource name and the name of a system.

Taking Resources Offline
– You may need to take an individual resource offline to perform application maintenance. For example, you may want to shut down just the Oracle database instance to perform an update that requires the Oracle data files to be available.
– You must take resources offline in the order determined by the dependency tree.
– You must bring all resources back online, or they are not brought online if the service group fails over.
(Diagram: the WebSG resource dependency tree on system S1; hares -offline)


Clearing Resource Faults
A fault indicates that the monitor entry point is reporting an unexpected offline state for an online resource. This indicates a problem with the resource. Before clearing a fault, you must resolve the problem that caused the fault.

A faulted resource status prevents VCS from considering that system as a possible target during service group failover. Therefore, a faulted resource must be cleared before VCS can bring the resource and the corresponding service group online on that system. The VCS logs help you determine which resource has faulted and why, as described in more detail in later lessons.

After fixing the problem that caused the fault, you can clear a faulted resource on a particular system, or on all systems defined in the service group’s SystemList attribute.

Note: Persistent resource faults cannot be cleared manually. You must probe the resource so that the agent monitors the resource. The fault is automatically cleared after the resource is probed and the agent determines that the resource is back online. When you probe a resource, VCS directs the agent to run the monitor entry point, which returns the resource status.

Clearing Resource Faults Using the CLI

To clear a faulted resource, type:
hares -clear resource [-sys system]

Faulted resources must be cleared after you fix the underlying problem:
• Nonpersistent resources must be explicitly cleared.
• Persistent resources (such as NIC) are cleared when the problem is fixed and they are subsequently monitored (probed) by the agent.
– Offline resources are probed periodically.
– You can manually force a probe.

You can bring the resource online again after you have fixed the problems and the fault is cleared.



Provide the resource name and the name of a system where the resource has the FAULTED status. If the system name is not specified, then the resource is cleared on all systems on which it is faulted.

Probing Resources Using the CLI

To probe a resource, type:
hares -probe resource -sys system

Provide the resource name and the name of the system where the resource status is to be checked.
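For example, after repairing the underlying problem, you might clear the fault on a nonpersistent resource and then force a probe of a persistent resource; the resource and system names here follow the WebSG example used earlier:

hares -clear Web -sys S1
hares -probe NIC -sys S1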


Using the VCS Simulator
You can use the VCS Simulator as a tool for learning how to manage VCS operations and applications under VCS control. You can perform all basic service group operations using the Simulator.

The Simulator also has other uses as a configuration and test tool. For this lesson, the focus of the Simulator discussion is on using a predefined VCS configuration to practice performing administration tasks.

You can install the Simulator while installing VCS, or you can download the Simulator from the VERITAS Web site at http://van.veritas.com. To locate the download site, search for the keyword Simulator. No additional licensing is required to install and use the Simulator.

You can use the VCS Simulator as a tool for learning how to manage VCS, in addition to creating and testing cluster configurations.

• The Simulator runs on UNIX and Windows systems.
• You can perform all common operator and administrator tasks using either the Java GUI or the Simulator-specific command-line interface.
• You can use predefined configuration files (main.cf and types.cf) or create service groups and resources.
• You can simulate faults to see how VCS responds.

Download the Simulator from http://van.veritas.com.


The Simulator Java Console
A graphical user interface, referred to as the Simulator Java Console, is provided to create and manage Simulator configurations. Using the Simulator Java Console, you can run multiple Simulator configurations simultaneously.

To start the Simulator Java Console:
• On UNIX systems:
a Set the PATH environment variable to /opt/VRTScssim/bin.
b Set VCS_SIMULATOR_HOME to /opt/VRTScssim.
c Type /opt/VRTSvcs/bin/hasimgui &
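A minimal sketch of these steps in a Bourne-style shell, using the paths listed above:

PATH=$PATH:/opt/VRTScssim/bin
VCS_SIMULATOR_HOME=/opt/VRTScssim
export PATH VCS_SIMULATOR_HOME
/opt/VRTSvcs/bin/hasimgui &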

• On Windows systems, environment variables are set during installation. Start the Simulator Java Console by double-clicking the icon on the desktop.

When the Simulator Java Console is running, a set of sample Simulator configurations is displayed, showing an offline status. You can start one or more existing cluster configurations and then launch an instance of the Cluster Manager Java Console for each running Simulator configuration.

You can use the Cluster Manager Java Console to perform all the same tasks as an actual cluster configuration. Additional options are available for Simulator configurations to enable you to test various failure scenarios, including faulting resources and powering off systems.

Use the Simulator Java Console to:
• Start and stop sample Simulator configurations.
• Launch the Cluster Manager Java Console.
• Create new Simulator configurations.
• Verify the configuration syntax.


Creating a New Simulator Configuration
When you add a Simulator cluster configuration, a new directory structure is created and populated with sample files based on the criteria you specify.

On UNIX systems, Simulator configurations are located in /opt/VRTSsim. On Windows, the Simulator repository is in C:\Program Files\VERITAS\VCS Simulator.

Within the Simulator directory, each Simulator configuration has a directory corresponding to the cluster name. When the Simulator is installed, several sample configurations are placed in the sim_dir, such as:

• SOL_ORACLE: A two-node cluster with an Oracle service group
• LIN_NFS: A two-node cluster with two NFS service groups
• WIN_SQL_VVR_C1: One of two clusters in a global cluster with a SQL service group

When you add a cluster:
• The default types.cf file corresponding to the selected platform is copied from sim_dir/types to the sim_dir/cluster_name/conf/config directory.
• A main.cf file is created based on the sim_dir/sample_clus/conf/config/main.cf file, using the cluster and system names specified when adding the cluster.

When you add a cluster configuration:
• A one-node cluster is created using a system name based on the cluster name.
• You must assign a unique port number.
• Platform selection determines which types.cf file is used.
• A directory structure with the cluster name is created in /opt/VRTSsim.

You can also copy a main.cf file to the /opt/VRTSsim/cluster_name/conf/config directory before starting the simulated cluster.

You can then start the cluster and launch the Cluster Manager Java GUI to test or modify the configuration.


Simulator Command-Line Interface
You can use the Simulator command-line interface (CLI) to add and manage simulated cluster configurations. While there are a few commands specific to Simulator activities, such as the cluster setup shown in the slide, in general the hasim command syntax follows the corresponding ha commands used to manage an actual cluster configuration.

The procedure used to initially set up a Simulator cluster configuration is shown below. The corresponding commands are displayed in the slide.

Note: This procedure assumes you have already set the PATH and VCS_SIMULATOR_HOME environment variables.

1 Change to the /opt/VRTSsim directory if you want to view the new structure created when adding a cluster.

2 Add the cluster configuration, specifying a unique cluster name and port. For local clusters, specify -1 as the WAC port.

3 Start the cluster on the first system.
4 Set the VCS_SIM_PORT and WAC_SIM_PORT environment variables to the values you specified when adding the cluster.

Now you can use hasim commands or Cluster Manager to test or modify the configuration.

Use a separate terminal window for each Simulator configuration.

# cd /opt/VRTSsim
# hasim -setupclus myclus -simport 16555 -wacport -1
# hasim -start myclus_sys1 -clus myclus
# VCS_SIM_PORT=16555
# WAC_SIM_PORT=-1
# export VCS_SIM_PORT WAC_SIM_PORT
# hasim -clus -display
< Output is equivalent to haclus -display >
# hasim -sys -state
#System        Attribute    Value
myclus_sys1    SysState     Running


Using the Java GUI with the Simulator
After the Simulator is started, you can use the Java GUI to connect to the simulated cluster. When the Cluster Monitor is running, select File—>New Simulator and select the following values:
• Host name: Enter the name of the system where the Simulator is running. You can use localhost as the host name if you are running the Simulator on the same system.
• Failover retries: Retain the default of 12.
• Configuration for: Select the same platform specified when you initially added the cluster configuration:
– Solaris
– Windows 2000
– Linux
– AIX
– HP-UX
If you do not select the platform that matches the types.cf file in the simulated cluster configuration, the wizards display error messages.

Note: If you receive a message that the GUI is unable to connect to the Simulator:

– Verify that the Simulator is running.
– Check the port number.

Using Cluster Manager with the Simulator
To administer a simulated cluster using Cluster Manager:
1. Start Cluster Manager.
2. Select File—>New Simulator.
3. Enter the host name of the local system.
4. Verify that the port for that cluster configuration is selected.
5. Select the platform from the drop-down list.


Summary
In this lesson, you learned how to manage applications that are under the control of VCS.

Next Steps
Now that you are more comfortable managing applications in a VCS cluster, you can prepare your application components and deploy your cluster design.

Additional Resources
• http://van.veritas.com
The VCS Simulator software is available for download from the VERITAS Web site.
• VERITAS Cluster Server Release Notes
The release notes provide detailed information about hardware and software supported by VERITAS Cluster Server.
• VERITAS Cluster Server User’s Guide
This guide provides detailed information on procedures and concepts for configuring and managing VCS clusters.

Lesson Summary
Key Points
– Use VCS tools to manage applications under VCS control.
– The VCS Simulator can be used to practice managing resources and service groups.
Reference Materials
– VERITAS Architect Network (VAN): http://van.veritas.com
– VERITAS Cluster Server Release Notes
– VERITAS Cluster Server User’s Guide


Lab 4: Using the VCS Simulator
Labs and solutions for this lesson are located on the following pages.

Appendix A provides brief lab instructions for experienced students.
• “Lab 4 Synopsis: Using the VCS Simulator,” page A-18
Appendix B provides step-by-step lab instructions.
• “Lab 4: Using the VCS Simulator,” page B-21
Appendix C provides complete lab instructions and solutions.
• “Lab 4 Solutions: Using the VCS Simulator,” page C-35

Goal
The purpose of this lab is to reinforce the material learned in this lesson by performing a directed series of operator actions on a simulated VCS configuration.

Prerequisites
Obtain the main.cf file for this lab exercise from the location provided by your instructor.

Results
Each student has a Simulator running with the main.cf file provided for the lab exercise.

1. Start the Simulator Java GUI:
hasimgui &
2. Add a cluster.
3. Copy the preconfigured main.cf file to the new directory.
4. Start the cluster from the Simulator GUI.
5. Launch the Cluster Manager Java Console.
6. Log in using the VCS account oper with password oper. This account demonstrates different privilege levels in VCS.

See the next slide for lab assignments.


Lesson 5
Preparing Services for VCS


Introduction
Overview
This lesson describes how to prepare application services for use in the VCS high availability environment. Performing these preparation tasks also helps illustrate how VCS manages application resources.

Importance
By following these requirements and recommended practices for preparing to configure service groups, you can ensure that your hardware, operating system, and application resources are configured to enable VCS to manage and monitor the components of the high availability services.

Course Overview

Lesson 1: VCS Building Blocks
Lesson 2: Preparing a Site for VCS
Lesson 3: Installing VCS
Lesson 4: VCS Operations
Lesson 5: Preparing Services for VCS
Lesson 6: VCS Configuration Methods
Lesson 7: Online Configuration of Service Groups
Lesson 8: Offline Configuration of Service Groups
Lesson 9: Sharing Network Interfaces
Lesson 10: Configuring Notification
Lesson 11: Configuring VCS Response to Faults
Lesson 12: Cluster Communications
Lesson 13: System and Communication Faults
Lesson 14: I/O Fencing
Lesson 15: Troubleshooting


Outline of Topics
• Preparing Applications for VCS
• One-Time Configuration Tasks
• Testing the Application Service
• Stopping and Migrating a Service
• Validating the Design Worksheet

Lesson Topics and Objectives
After completing this lesson, you will be able to:
• Preparing Applications for VCS: Prepare applications for the VCS environment.
• One-Time Configuration Tasks: Perform one-time configuration tasks.
• Testing the Application Service: Test the application services before placing them under VCS control.
• Stopping and Migrating a Service: Stop resources and manually migrate a service.
• Validating the Design Worksheet: Validate the design worksheet using configuration information.


Preparing Applications for VCS
Application Service Component Review
An application service is the service that the end user perceives when accessing a particular network address. An application service typically consists of multiple components, some hardware-based and some software-based, all cooperating to produce a service.

For example, a service can include application software (processes), a file system containing data files, a physical disk on which the file system resides, one or more IP addresses, and a NIC for network access.

If this application service needs to be migrated to another system for recovery purposes, all of the components that compose the service must migrate together to re-create the service on another system.

[Slide: Application Service Overview. End users connect over the network through a NIC and IP address to the application process, which uses a file system on shared storage.]


Configuration and Migration Procedure
Use the procedure shown in the diagram to prepare and test application services on each system before placing the service under VCS control. Use the design worksheet to obtain and record information about the service group and each resource. This is the information you need to configure VCS to control these resources.

Details are provided in the following section.

[Slide: Configuration and Migration Procedure flowchart. Perform one-time configuration tasks on each system; then start, verify, and stop services on one system at a time; when no more systems remain, the service is ready for VCS.]
• Create or prepare operating system and application resources initially on each system for each service that will be placed under VCS control; this is not part of VCS operation.
• Manually test each service on each system that is a startup or failover target before placing it under VCS control.
• Configuring services on a test cluster first is recommended to minimize impacts to production environments.


One-Time Configuration Tasks
Identifying Components
The first step in preparing services to be managed by VCS is to identify the components required to support the services. These components should be itemized in your design worksheet and may include the following, depending on the requirements of your application services:
• Shared storage resources:
– Disks or components of a logical volume manager, such as Volume Manager disk groups and volumes
– File systems to be mounted
– Directory mount points
• Network-related resources:
– IP addresses
– Network interfaces
• Application-related resources:
– Procedures to manage and monitor the application
– The location of application binary and data files

The following sections describe the aspects of these components that are critical to understanding how VCS manages resources.

• Shared storage resources:
– Disks or components of a logical volume manager, such as Volume Manager disk groups and volumes
– File systems to be mounted
– Mount point directories
• Network resources:
– IP addresses
– Network interfaces
• Application resources:
– Identical installation and configuration procedures
– Procedures to start, stop, and monitor
– Location of data, binary, and configuration files


Configuring Shared Storage
The diagram shows the procedure for configuring shared storage on the initial system. In this example, Volume Manager is used to manage shared storage on a Solaris system.

Note: Although examples used throughout this course are based on VERITAS Volume Manager, VCS also supports raw disks and other volume managers. VxVM is shown for simplicity—objects and commands are essentially the same on all platforms. The agents for other volume managers are described in the VERITAS Cluster Server, Implementing Local Clusters participant guide.

Preparing shared storage, such as creating disk groups, volumes, and file systems, is performed once, from one system. Then you must create mount point directories on each system.

The options to mkfs may differ depending on platform type, as displayed in the following examples.

Solaris
mkfs -F vxfs /dev/vx/rdsk/DemoDG/DemoVol
AIX
mkfs -V vxfs /dev/vx/rdsk/DemoDG/DemoVol
HP-UX
mkfs -F vxfs /dev/vx/rdsk/DemoDG/DemoVol
Linux
mkfs -t vxfs /dev/vx/rdsk/DemoDG/DemoVol

Configuring Shared Storage (Volume Manager example)
From one system:
1. Initialize disks: vxdisksetup -i disk_dev
2. Create a disk group: vxdg init DemoDG DemoDG01=disk_dev
3. Create a volume: vxassist -g DemoDG make DemoVol 1g
4. Make a file system: mkfs args vxfs /dev/vx/rdsk/DemoDG/DemoVol
On each system:
5. Make a mount point: mkdir /demo
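As an illustration, the complete sequence on a Solaris system might look like the following; the disk device name c1t1d0 is hypothetical, and the mkfs options vary by platform as shown above:

vxdisksetup -i c1t1d0
vxdg init DemoDG DemoDG01=c1t1d0
vxassist -g DemoDG make DemoVol 1g
mkfs -F vxfs /dev/vx/rdsk/DemoDG/DemoVol
mkdir /demo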


Configuring the Network
In a high availability environment, the IP address that is used by the client to access an application service should not be tied to a specific system because the same service can be provided by any system in the cluster. VCS uses the concept of virtual IP addresses to differentiate between IP addresses associated with a specific system and IP addresses associated with an application service. In order to configure a virtual IP address, an administrative IP address must be up on the network interface.

Administrative IP Addresses

Administrative IP addresses (also referred to as base IP addresses or maintenance IP addresses) are controlled by the operating system. The administrative IP addresses are associated with a physical network interface on the system, such as qfe1 on Solaris systems, and are configured whenever the system is brought up. These addresses are used to access a specific system over the network and can also be used to verify that the system is physically connected to the network even before an application is brought up.

Configuring an Administrative IP Address

The procedures for configuring an administrative IP address vary by platform. Examples are displayed on the following page.

Note: The administrative IP address is often already configured, in which case, you only need to verify that it is up.

For public network access to high availability services, you must configure an administrative IP address associated with the physical network interface.
• Each system needs a unique administrative IP address for each interface.
• Configure the operating system to bring up the administrative IP address during system boot.
• The IP addresses are used by VCS to monitor network interfaces.
• These addresses are also sometimes referred to as base, maintenance, or test IP addresses.
The administrative IP address may already be configured and only needs to be verified.


Solaris
1 Create /etc/hostname.interface with the desired interface name so that the IP address is configured during system boot:
train14_qfe1
2 Edit /etc/hosts and assign an IP address to the interface name.
166.98.112.14 train14_qfe1
3 Use ifconfig to manually configure the IP address to test the configuration without rebooting:
ifconfig qfe1 inet 166.98.112.114 netmask +
ifconfig qfe1 up

AIX
1 Use SMIT or mktcpip to configure the IP address to come up during system boot.
2 Edit /etc/hosts and assign an IP address to the interface name.
166.98.112.14 train14_en1
3 Use ifconfig to manually configure the IP address, or it will be configured during the next reboot.
ifconfig en1 inet 166.98.112.114 netmask +
ifconfig en1 up

HP-UX
1 Add an entry in /etc/rc.config.d/netconf to include the configuration information for the interface.
INTERFACE_NAME[0]=lan2
IP_ADDRESS[0]=192.12.25.3
SUBNET_MASK[0]="255.255.255.0"
BROADCAST_ADDRESS[0]=""
DHCP_ENABLE[0]="0"
2 Edit /etc/hosts and assign an IP address to the interface name.
166.98.112.14 train14_lan2
3 Use ifconfig to manually configure the IP address to test the configuration without rebooting:
ifconfig lan2 inet 166.98.112.114
ifconfig lan2 up


Linux
1 Add an entry in the appropriate file in /etc/sysconfig/networking/devices to include the configuration information for the interface.
# cd /etc/sysconfig/networking/devices
# ls
ifcfg-eth0 ifcfg-eth1 ifcfg-eth2 ifcfg-eth3 ifcfg-eth4
# more ifcfg-eth0
DEVICE=eth0
BOOTPROTO=static
BROADCAST=166.98.112.255
IPADDR=166.98.112.14
NETMASK=255.255.255.0
NETWORK=166.98.112.0
ONBOOT=yes
GATEWAY=166.98.112.1
TYPE=Ethernet
USERCTL=no
PEERDNS=no
2 Edit /etc/hosts and assign an IP address to the interface name.
166.98.112.14 train14_lan2
3 Use ifconfig to manually configure the IP address to test the configuration without rebooting:
ifconfig eth2 166.98.112.14 netmask 255.255.255.0
ifconfig eth2 up


Other Network Configuration Tasks

Depending on your environment, other tasks may be required to complete the network configuration for both administrative and virtual IP addresses. Examples are:
• Add administrative IP addresses to /etc/hosts files so that these addresses can be resolved without relying on an outside name service.
• Add entries to the name server:
– Include administrative IP addresses if you want these addresses to be accessible on the public network.
– Include the virtual IP addresses with virtual host names for the high availability services.
• Configure all other applicable files, such as:
– /etc/resolv.conf
– /etc/nsswitch.conf

Work with your network administrator to ensure that any necessary configuration tasks are completed.


Configuring the Application
You must ensure that the application is installed and configured identically on each system that is a startup or failover target, and then manually test the application after all dependent resources are configured and running.

This ensures that you have correctly identified the information used by the VCS agent scripts to control the application.

Note: The shutdown procedure should be a graceful stop, which performs any cleanup operations.

• Determine file locations:
– Shared or local storage
– Binaries, data, configuration
• Identify startup, monitor, and shutdown procedures.
• Depending on the application needs:
– Create user accounts.
– Configure environment variables.
– Apply licenses.
– Set up configuration files.
• Install and configure applications identically on each target system.

Sample resource definition:
Resource Definition    Sample Value
Service Group Name     DemoSG
Resource Type          Process
Resource Name          DemoProcess
Required Attributes:
  PathName             /bin/sh
Optional Attributes:
  Arguments            /sbin/orderproc up
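For reference, a resource with these sample values might later appear in main.cf roughly as follows; this is a sketch for orientation, not a complete service group definition:

Process DemoProcess (
    PathName = "/bin/sh"
    Arguments = "/sbin/orderproc up"
    )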


Testing the Application Service
Before configuring a service group in VCS to manage an application service, test the application components on each system that can be a startup or failover target for the service group. Following this best-practice recommendation ensures that VCS will successfully manage the application service after you configure a service group to manage the service.

This procedure emulates how VCS manages application services. The actual commands used may differ from those used in this lesson. However, conceptually, the same type of action is performed by VCS.

[Slide: Testing the Application Service flowchart. On each system (S1, S2, ... Sn) in turn, bring up the resources (shared storage, virtual IP address, application software) in dependency order, test the application, and stop the resources; when no more systems remain, the service is ready for VCS.]


Bringing Up Resources

Shared Storage

Verify that shared storage resources are configured properly and accessible. The examples shown in the slide are based on using Volume Manager.
1 Import the disk group.
2 Start the volume.
3 Mount the file system.

Mount the file system manually for the purposes of testing the application service. Do not configure the operating system to automatically mount any file system that will be controlled by VCS. If the file system is added to /etc/vfstab, it is mounted on the first system to boot, but VCS must control where the file system is mounted. Examples of mount commands are provided for each platform.

Solaris
mount -F vxfs /dev/vx/dsk/ProcDG/ProcVol /process
AIX
mount -V vxfs /dev/vx/dsk/ProcDG/ProcVol /process
HP-UX
mount -F vxfs /dev/vx/dsk/ProcDG/ProcVol /process
Linux
mount -t vxfs /dev/vx/dsk/ProcDG/ProcVol /process

Bringing Up Resources: Shared Storage (Solaris Volume Manager example)
1. Import the disk group:
vxdg import DemoDG
2. Start the volume:
vxvol -g DemoDG start DemoVol
3. Mount the file system:
mount -F vxfs /dev/vx/dsk/DemoDG/DemoVol /demo

Do not configure the operating system to automatically mount file systems that will be controlled by VCS. Verify that there are no entries in the file system startup table (for example, /etc/vfstab).


Configuring Application IP Addresses

Configure the application IP addresses associated with specific application services to ensure that clients can access the application service using the specified address.

Application IP addresses are configured as virtual IP addresses. On most platforms, the devices used for virtual IP addresses are defined as interface:number.

Solaris

The qfe1:1 device is used for the first virtual IP address on the qfe1 interface; qfe1:2 is used for the second.
1 Plumb the virtual interface and bring up the IP on the next available logical interface:
ifconfig qfe1 addif 192.168.30.132 up
2 Edit /etc/hosts to assign a virtual hostname (application service name) to the IP address.
192.168.30.132 process_services

Configuring Application (Virtual) IP Addresses
Application IP addresses are:
– Added as virtual IP addresses to the network interface
– Associated with an application service; resolved by the name service
– Controlled by the high availability software
– Migrated to other systems if the current system fails
– Also called service group or floating IP addresses
Application IP address considerations for HA:
– Verify that an administrative IP address is already configured on the interface.
– Do not configure the application IP to come online during system boot.


AIX

The en1 device is used for all virtual IP addresses with the alias keyword to ifconfig.
1 Plumb the virtual interface and bring up the IP on the next available logical interface:
ifconfig en1 inet 192.168.30.13 netmask 255.255.255.0 \
alias
2 Edit /etc/hosts to assign a virtual hostname (application service name) to the IP address.
192.168.30.13 process_services

HP-UX

The lan2:1 device is used for the first virtual IP address on the lan2 interface; lan2:2 is used for the second IP address.
1 Configure the IP address using the ifconfig command.
ifconfig lan2:1 inet 192.168.30.13
2 Bring the IP address up.
ifconfig lan2:1 up
3 Edit /etc/hosts to assign a virtual hostname (application service name) to the IP address.
192.168.30.13 process_services

Linux

The eth0:1 device is used for the first virtual IP address on the eth0 interface; eth0:2 is used for the second IP address.
1 Configure the IP address using the ifconfig command.
ifconfig eth0:1 192.168.30.13
ifconfig eth0:1 up
2 Edit /etc/hosts to assign a virtual hostname (application service name) to the IP address.
192.168.30.13 process_services


Starting the Application

When all dependent resources are available, you can start the application software. Ensure that the application is not configured to start automatically during system boot. VCS must be able to start and stop the application using the same methods you use to control the application manually.

Manually start the application for testing purposes. An example command line for a fictional application:
/sbin/orderproc up

Do not configure the operating system to automatically start the application during system boot. Verify that there are no startup files in the system startup directory (for example /etc/rc2.d).


Verifying Resources
You can perform some simple steps, such as those shown in the slide, to verify that each component needed for the application service to function is operating at a basic level.

This helps you identify any potential configuration problems before you test the service as a whole, as described in the “Testing the Integrated Components” section.

Verifying Resources
Verify the disk group: vxdg list DemoDG
Verify the volume: dd if=/dev/vx/rdsk/DemoDG/DemoVol of=/dev/null count=1 bs=128
Verify the file system: mount | grep /demo
Verify the admin IP: ping same_subnet_IP
Verify the virtual IP: ifconfig arguments
Verify the application: ps arguments | grep process
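As a concrete illustration on Solaris, using the demo names from this lesson (the ping target address and the process name are illustrative):

vxdg list DemoDG
dd if=/dev/vx/rdsk/DemoDG/DemoVol of=/dev/null count=1 bs=128
mount | grep /demo
ping 192.168.30.1
ifconfig qfe1:1
ps -ef | grep orderproc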


Testing the Integrated Components
When all components of the service are running, test the service in situations that simulate real-world use of the service.

The example in the slide describes how you can test a database service. Another example that illustrates how you can test your service is NFS. If you are preparing to configure a service group to manage an exported file system, verify that you can mount the exported file system from a client on the network. This is described in more detail later in the course.

Test the application service using simulated or real-world scenarios, if possible. For example, if you have an application with a back-end database, you can:
1. Start the database (and listener process).
2. Start the application.
3. Connect to the application from the public network using the client software to verify name resolution to the virtual IP address.
4. Perform user tasks, as applicable; perform queries, make updates, and run reports.


Stopping and Migrating an Application Service
Stopping Application Components
Stop resources in the order of the dependency tree, from the top down, after you have finished testing the service. You must have all resources offline in order to migrate the application service to another system for testing. The procedure also illustrates how VCS stops resources.

The ifconfig options are platform-specific, as displayed in the following examples.

Solaris
ifconfig qfe1:1 unplumb
AIX
ifconfig en1 192.168.30.13 delete
HP-UX
ifconfig lan2:1 down
Linux
ifdown eth0:1

Stopping Application Components
Stop resources in the order of the dependency tree:
1. Stop the application: /sbin/orderproc stop
2. Take down the virtual IP: ifconfig …
3. Unmount the file system: umount /demo
4. Stop the volume: vxvol -g DemoDG stop DemoVol
5. Deport the disk group: vxdg deport DemoDG


Manually Migrating an Application Service
After you have verified that the application service works properly on one system, manually migrate the service between all intended target systems. Performing these operations enables you to:
• Ensure that your operating system and application resources are properly configured on all potential target cluster systems.
• Validate or complete your design worksheet to document the information required to configure VCS to manage the services.

Use the procedures described in this lesson to configure and test the underlying operating system resources.

[Slide: Manually Migrating an Application Service. The NIC, IP address, file system, storage, and application process components migrate together from system S1 to system S2.]


Validating the Design Worksheet
Documenting Resource Attributes
In order to configure the operating system resources you have identified as requirements for the application service, you need the detailed configuration information from the design worksheet. A design diagram is helpful to show the relationships between the resources, which determine the order in which you configure, start, and stop resources. These relationships are also defined in the design worksheet as part of the service group definition, shown in the “Documenting Resource Dependencies” section.

Note: If your systems are not configured identically, you must note those differences in the design worksheet. The “Online Configuration of Service Groups” lesson shows how you can configure a resource with different attribute values for different systems.

Documenting Resource Attributes
• Use the design worksheet to document details for configuring resources.
• Note any attributes that are different among systems, for example, network interface device names.

Sample resource definition (for the IP address in a resource stack of DiskGroup, Volume, File System, NetworkInterface, IP Address, and Process):
Resource Definition    Sample Value
Service Group Name     DemoSG
Resource Type          IP
Resource Name          DemoIP
Required Attributes:
  Device               qfe1
  Address              192.168.30.13
Optional Attributes:
  NetMask              255.255.255.0
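A sketch of how this worksheet entry might later translate into a main.cf resource definition (illustrative only):

IP DemoIP (
    Device = qfe1
    Address = "192.168.30.13"
    NetMask = "255.255.255.0"
    )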


Checking Resource Attributes
Verify that the resources specified in your design worksheet are appropriate and complete for your platform. Refer to the VERITAS Cluster Server Bundled Agents Reference Guide before you begin configuring resources.

The examples displayed in the slides in this lesson are based on the Solaris operating system. If you are using another platform, your resource types and attributes may be different.

Use the VERITAS Cluster Server Bundled Agents Reference Guide to determine:
– Required attributes
– Optional attributes
– Allowed values
Examples in slides show Solaris resource types. Not all platforms have the same resources or attributes. Separate Bundled Agents Reference Guides are available for Solaris, AIX, HP-UX, and Linux.


Documenting Resource Dependencies
Ensure that the steps you perform to bring resources online and take them offline while testing the service are accurately reflected in the design worksheet. Compare the worksheet with service group diagrams you have created or that have been provided to you.

The slide shows the resource dependency definition for the application used as an example in this lesson.

• Verify that the resource dependencies are listed in the worksheet.
• Compare the parent and child resources to your service group diagrams.
• Verify that the relationships are correct according to the testing you performed.

Resource Dependency Definition for service group DemoSG:
Parent Resource    Requires    Child Resource
DemoVol            requires    DemoDG
DemoMount          requires    DemoVol
DemoIP             requires    DemoNIC
DemoProcess        requires    DemoMount
DemoProcess        requires    DemoIP
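In main.cf, these dependencies are expressed with requires statements; a sketch matching the table above:

DemoVol requires DemoDG
DemoMount requires DemoVol
DemoIP requires DemoNIC
DemoProcess requires DemoMount
DemoProcess requires DemoIP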


Validating Service Group Attributes
Check the service group attributes in your design worksheet to ensure that the appropriate startup and failover systems are listed. Other service group attributes may be included in your design worksheet, according to the requirements of each service.

Service group definitions consist of the attributes of a particular service group. These attributes are described in more detail later in the course.

Validating Service Group Attributes
[Slide: DemoSG runs on startup system S1 and can fail over to system S2.]

Service Group Definition    Sample Value
Group                       DemoSG
Required Attributes:
  SystemList                S1=0, S2=1
  FailoverPolicy            Priority
Optional Attributes:
  AutoStartList             S1
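A sketch of how this service group definition might appear in main.cf (illustrative; resource definitions omitted, and FailoverPolicy not shown because Priority is the default):

group DemoSG (
    SystemList = { S1 = 0, S2 = 1 }
    AutoStartList = { S1 }
    )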


Summary
This lesson described how to prepare sites and application services for use in the VCS high availability environment. Performing these preparation tasks ensures that the site is ready to deploy VCS and helps illustrate how VCS manages application resources.

Next Steps
After you have prepared your operating system environment and applications for high availability, you can install VERITAS Cluster Server and then configure service groups for your application services.

Additional Resources
• VERITAS Cluster Server Bundled Agents Reference Guide
This guide describes each bundled agent in detail.
• VERITAS Cluster Server User’s Guide
This guide provides detailed information on procedures and concepts for configuring and managing VCS clusters.
• High Availability Using VERITAS Cluster Server, Implementing Local Clusters
This course provides detailed information on advanced clustering topics, focusing on configurations of clusters with more than two nodes.

Lesson Summary
Key Points
– Prepare each component of a service and document attributes.
– Test services in preparation for configuring VCS service groups.
Reference Materials
– VERITAS Cluster Server Bundled Agents Reference Guide
– VERITAS Cluster Server User's Guide


Lab 5: Preparing Application Services
Labs and solutions for this lesson are located on the following pages.

Appendix A provides brief lab instructions for experienced students.
• “Lab 5 Synopsis: Preparing Application Services,” page A-24
Appendix B provides step-by-step lab instructions.
• “Lab 5: Preparing Application Services,” page B-29
Appendix C provides complete lab instructions and solutions.
• “Lab 5 Solutions: Preparing Application Services,” page C-51

Goal
The purpose of this lab is to prepare the loopy process service for high availability.

Prerequisites
Obtain any classroom-specific values needed for your classroom lab environment and record these values in your design worksheet included with the lab exercise instructions.

Results
Each student’s service can be started, monitored, and stopped on each cluster system.

[Slide: Lab 5 diagram. Two student services are prepared: bobDG1/bobVol1 mounted at /bob1 running /bob1/loopy, and sueDG1/sueVol1 mounted at /sue1 running /sue1/loopy. Each service uses a disk/LUN, a NIC, an IP address, and a loopy process (a shell while-true echo loop). See the next slide for classroom values.]


Lesson 6
VCS Configuration Methods


Introduction
Overview
This lesson provides an overview of the configuration methods you can use to create and modify service groups. This lesson also describes how VCS manages and protects the cluster configuration.

Importance
By understanding all methods available for configuring VCS, you can choose the tools and procedures that best suit your requirements.

Lesson Introduction

Lesson 1: VCS Building Blocks
Lesson 2: Preparing a Site for VCS
Lesson 3: Installing VCS
Lesson 4: VCS Operations
Lesson 5: Preparing Services for VCS
Lesson 6: VCS Configuration Methods
Lesson 7: Online Configuration of Service Groups
Lesson 8: Offline Configuration of Service Groups
Lesson 9: Sharing Network Interfaces
Lesson 10: Configuring Notification
Lesson 11: Configuring VCS Response to Faults
Lesson 12: Cluster Communications
Lesson 13: System and Communication Faults
Lesson 14: I/O Fencing
Lesson 15: Troubleshooting


Outline of Topics
• Overview of Configuration Methods
• Controlling Access to VCS
• Online Configuration
• Offline Configuration
• Starting and Stopping VCS

Lesson Topics and Objectives
After completing this lesson, you will be able to:
• Overview of Configuration Methods: Compare and contrast VCS configuration methods.
• Controlling Access to VCS: Set user account privileges to control access to VCS.
• Online Configuration: Describe the online configuration method.
• Offline Configuration: Describe the offline configuration method.
• Starting and Stopping VCS: Start and stop VCS.


Overview of Configuration Methods
VCS provides several tools and methods for configuring service groups and resources, generally categorized as:
• Online configuration: You can modify the cluster configuration while VCS is running using one of the graphical interfaces or the command-line interface. These online methods change the cluster configuration in memory. When finished, you write the in-memory configuration to the main.cf file on disk to preserve the configuration.
• Offline configuration: In some circumstances, you can simplify cluster implementation and configuration using an offline method, including:
– Editing configuration files manually
– Using the Simulator to create, modify, model, and test configurations
This method requires you to stop and restart VCS in order to build the new configuration in memory.

Configuration Methods
Online configuration: VCS does not need to be stopped.
– Cluster Manager Java graphical user interface
– Cluster Manager Web graphical user interface
– VCS command-line interface
– Command batch files
Offline configuration: VCS must be stopped and restarted.
– Manual modification of configuration files
– Modification of configuration files using the VCS Simulator


Effects on the Cluster
Whichever method you choose to use for configuring VCS to manage an application service, you must plan for application downtime. Online configuration refers to keeping VCS running, not the application service.

If you are configuring your first service group, you may not care whether VCS remains online during configuration. Stopping and restarting VCS has very little effect on your environment in this case.

If you already have service groups running in the cluster, you may want to use an online configuration method so that those services are protected while you are making modifications.

• Using an online method does not mean that you can configure a service group without incurring any application downtime.
• Both methods require application downtime. At a minimum, you must test failover to each system that can run the service group.
• Online configuration means that you can configure a service group while VCS continues to run and keeps other service groups highly available.


Controlling Access to VCS
Relating VCS and UNIX User Accounts
If you have not configured VxSS security in the cluster, VCS has a completely separate list of user accounts and passwords to control access to the VCS configuration.

When using the Cluster Manager to perform administration, you are prompted for a VCS account name and password. Depending on the privilege level of that VCS user account, VCS displays the Cluster Manager GUI with an appropriate set of options. If you do not have a valid VCS account, VCS does not display Cluster Manager.

When using the command-line interface for VCS, you are also prompted to enter a VCS user account and password and VCS determines whether the VCS user account has proper privileges to run the command. One exception is the UNIX root user. By default, only the UNIX root account is able to use VCS ha commands to administer VCS from the command line.

In nonsecure mode:
• There is no mapping between UNIX and VCS user accounts by default, except root, which has Cluster Administrator privileges.
• The default VCS admin account:
– Is created by default during cluster configuration.
– Has privileges to perform all cluster operations.
• Nonroot users are prompted for a VCS account name and password when executing VCS commands using the command-line interface.


Simplifying VCS Administrative Access

VCS 4.1
The halogin command is provided in VCS 4.1 to save authentication information so that users do not have to enter credentials every time a VCS command is run. The command stores authentication information in the user’s home directory. You must either set the VCS_HOST environment variable to the name of the node from which you are running VCS commands, or add the node name to the /etc/.vcshosts file. If you run halogin for different hosts, VCS stores authentication information for each host.

VCS 3.5 and 4.0

For releases prior to 4.1, halogin is not supported. When logged on to UNIX as a nonroot account, the user is prompted to enter a VCS account name and password every time a VCS command is entered.

To enable nonroot users to more easily administer VCS, you can set the AllowNativeCliUsers cluster attribute to 1. For example, type:
haclus -modify AllowNativeCliUsers 1

When set, VCS maps the UNIX user name to the same VCS account name to determine whether the user is valid and has the proper privilege level to perform the operation. You must explicitly create each VCS account name to match the UNIX user names and grant the appropriate privilege level.

You can configure VCS to simplify administration from the command line using one of these methods:
• For VCS 4.1 clusters:
– Set the VCS_HOST environment variable to the node name or add the node name to the /etc/.vcshosts file.
– Log on to VCS:
halogin vcs_user_name password
• For pre-4.1 clusters:
– Set the cluster attribute AllowNativeCliUsers to map UNIX account names to VCS accounts.
– A VCS account must exist with the same name as the UNIX user account, with appropriate privileges.


User Accounts
You can ensure that the different types of administrators in your environment have a VCS authority level to affect only those aspects of the cluster configuration that are appropriate to their level of responsibility.

For example, if you have a DBA account that is authorized to take a database service group offline or switch it to another system, you can make a VCS Group Operator account for the service group with the same account name. The DBA can then perform operator tasks for that service group, but cannot affect the cluster configuration or other service groups. If you set AllowNativeCliUsers to 1, then the DBA logged on with that account can also use the VCS command line to manage the corresponding service group.

Setting VCS privileges is described in the next section.

VCS User Account Privileges

Give VCS users the level of authorization needed to administer components of the cluster environment.

Cluster Administrator: Full privileges
Cluster Operator: All cluster, service group, and resource-level operations
Cluster Guest: Read-only access; new users are created as Cluster Guest accounts by default
Group Administrator: All service group operations for a specified service group, except deletion of service groups
Group Operator: Brings service groups and resources online and takes them offline; temporarily freezes or unfreezes service groups


Creating Cluster User Accounts

VCS users are not the same as UNIX users except when running VCS in secure mode. If you have not configured VxSS security in the cluster, VCS maintains a set of user accounts separate from UNIX accounts. In this case, even if the same user exists in both VCS and UNIX, this user account can be given a range of rights in VCS that does not necessarily correspond to the user’s UNIX system privileges.

To add a user account:
1 Open the cluster configuration:
haconf -makerw
2 Add a new account with the hauser command:
hauser -add username
For example, to add a user called DBSG_Op to the VCS configuration, type:
hauser -add DBSG_Op

In non-secure mode, VCS user accounts are stored in the main.cf file in encrypted format. If you use a GUI or wizard to set up a VCS user account, passwords are encrypted automatically. If you use the command line, you must encrypt the password using the vcsencrypt command.

Note: In non-secure mode, if you change a UNIX account, this change is not reflected in the VCS main.cf file automatically. You must manually modify accounts in both places if you want them to be synchronized.

Creating Cluster User Accounts

• The cluster configuration must be open.
• Add user accounts with the hauser command:
hauser -add user
• Additional privileges can then be added:
haclus -modify Operators -add user
hagrp -modify group Operators -add user
• The default VCS admin account created during installation is assigned the password "password".
• User accounts can also be created using the GUI.


Changing Privileges
A new account is given Cluster Guest privileges by default. Change the privileges for a user account with the haclus and hagrp commands using this syntax:
haclus -modify Administrators | Operators -add user
hagrp -modify group Administrators | Operators -add user

For example, to give Operator privileges for the DBSG service group to the user account DBSG_Op, type:
hagrp -modify DBSG Operators -add DBSG_Op

With VCS 4.x, you can also add privileges with the -addpriv and -deletepriv options of the hauser command.

Modifying User Accounts

Use the hauser command to make changes to a VCS user account:
• Display account information:
hauser -display
• Change the password for an account:
hauser -update user_name
• Delete a user account:
hauser -delete user_name

The cluster configuration must be open to update or delete a user account.


VCS Access in Secure Mode
When running in secure mode, VCS uses platform-based authentication; VCS does not store user passwords. All VCS users are system and domain users and are configured using fully qualified user names, for example, administrator@vcsdomain. VCS provides a single sign-on mechanism, so authenticated users need not sign on each time to connect to a cluster.

When running in secure mode, you can add system or domain users to VCS and assign them VCS privileges. However, you cannot assign or change passwords using a VCS interface.

When running a cluster in secure mode:
• VCS does not maintain user accounts separate from UNIX.
• The UNIX root user is granted Cluster Administrator privileges.
• Nonroot UNIX users are granted Guest privileges by default.
• Additional privileges can be granted using the CLI or User Manager GUI.
• You cannot change a user password from a VCS interface.


Online Configuration

Benefits

Online configuration has these advantages:
• The VCS engine is up and running, providing high availability of existing service groups during configuration.
• This method provides syntax checking, which helps protect you from making configuration errors.
• This step-by-step procedure is suitable for testing each object as it is configured, simplifying troubleshooting of configuration mistakes that you may make when adding resources.
• You do not need to be logged into the UNIX system as root to use the GUI and CLI to make VCS configuration changes.

Considerations

Online configuration has these considerations:
• Online configuration is more time-consuming for large-scale modifications.
• The online process is repetitive. You have to add service groups and resources one at a time.

Online Configuration Characteristics

• Cluster Manager and the VCS command-line interface enable you to create and test service groups and resources in a running cluster.
• The tools perform syntax checking as you are performing each configuration change.
• Online configuration is a step-wise procedure. You can create and test resources one at a time.
• VCS changes the in-memory configuration when you make changes while VCS is online. You must explicitly save the in-memory configuration to the main.cf and types.cf files on disk.


How VCS Changes the Online Cluster Configuration

When you use Cluster Manager to modify the configuration, the GUI communicates with had on the specified cluster system to which Cluster Manager is connected.

Note: Cluster Manager configuration requests are shown conceptually as ha commands in the diagram, but they are implemented as system calls.

The had daemon communicates the configuration change to had on all other nodes in the cluster, and each had daemon changes the in-memory configuration.

When the command to save the configuration is received from Cluster Manager, had communicates this command to all cluster systems, and each system’s had daemon writes the in-memory configuration to the main.cf file on its local disk.

The VCS command-line interface is an alternate online configuration tool. When you run ha commands, had responds in the same fashion.

Note: When two administrators are changing the cluster configuration simultaneously, each sees all changes as they are being made.

How VCS Changes the Online Cluster Configuration

(Diagram: an "hagrp -add" request issued on one system updates the in-memory configuration on every cluster system.)


Opening the Cluster Configuration

You must open the cluster configuration to add service groups and resources, make modifications, and perform certain operations.

When you open the cluster configuration, VCS creates a .stale file in the /etc/VRTSvcs/conf/config configuration directory on every system in the cluster. This file indicates that the configuration is open and that the configuration in memory may not match the configuration on disk in the main.cf file.

Opening the Cluster Configuration

haconf -makerw

(Diagram: the shared cluster configuration in memory becomes writable; a .stale file is created alongside main.cf on each system.)


Saving the Cluster Configuration

When you save the cluster configuration, VCS copies the configuration in memory to the main.cf file in the /etc/VRTSvcs/conf/config directory on all running cluster systems. The .stale file remains in the configuration directory because the configuration is still open.

If you save the cluster configuration after each change, you can view the main.cf file to see how the in-memory modifications are reflected in the main.cf file.

Saving the Cluster Configuration

haconf -dump

(Diagram: the shared in-memory configuration is written to main.cf on each system; the .stale file remains.)
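A minimal sketch of verifying a saved change from the command line (the path is from the section above):

haconf -dump
more /etc/VRTSvcs/conf/config/main.cf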


Closing the Cluster Configuration

When the administrator saves and closes the configuration, VCS writes the configuration in memory to the main.cf file and removes the .stale file on all running cluster systems.

Closing the Cluster Configuration

haconf -dump -makero

(Diagram: the shared in-memory configuration is written to main.cf, and the .stale file is removed, on each system.)


How VCS Protects the Cluster Configuration

The .stale file provides a protection mechanism needed for online configuration. When the .stale file is present, you cannot stop VCS without overriding the warning that the configuration is open.

If you ignore the warning and stop VCS while the configuration is open, the configuration in main.cf on disk may not be the same as the configuration in memory. When this occurs, VCS considers the configuration stale because the administrator may have changed the configuration in memory without writing it to disk and closing the configuration.

Although rare, a stale configuration can also result from all systems in the cluster crashing when the configuration is open.

To understand how this protection mechanism works, you must first understand the normal VCS startup procedure.

How VCS Protects the Cluster Configuration

The stale flag:
• Is a mechanism for protecting the cluster configuration when you make changes while VCS is running
• Indicates that the in-memory configuration may not match the configuration in the main.cf file on disk

When present:
• VCS warns you to close the configuration if you attempt to stop VCS.
• VCS does not start without administrative intervention.


Offline Configuration

In some circumstances, you can simplify cluster implementation or configuration tasks by directly modifying the VCS configuration files. This method requires you to stop and restart VCS in order to build the new configuration in memory.

The benefits of using an offline configuration method are that it:
• Offers a very quick way of making major changes or getting an initial configuration up and running
• Provides a means for deploying a large number of similar clusters

One consideration when choosing to perform offline configuration is that you must be logged into a cluster system as root.

This section describes situations where offline configuration is useful. The next section shows how to stop and restart VCS to propagate the new configuration throughout the cluster. The “Offline Configuration of Service Groups” lesson provides detailed offline configuration procedures and examples.

Offline Configuration Characteristics

You can change the cluster configuration by modifying the VCS configuration files and then restarting VCS.
• Configuration files can be modified using any text editor or the VCS Simulator.
• The offline configuration method enables you to create an entire cluster configuration by editing the main.cf file.
• You can also copy an existing configuration to create a new cluster configuration.
• In addition, you can use this method to create multiple similar resources or service groups within a cluster.
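As a condensed sketch, a typical offline cycle uses commands covered later in this lesson (treat this as an outline, not a full procedure; the startup order is explained in the sections that follow):

hastop -all -force                      # stop VCS, leave applications running
vi /etc/VRTSvcs/conf/config/main.cf     # edit the configuration on one system
hacf -verify /etc/VRTSvcs/conf/config   # check the syntax
hastart                                 # on the system with the edited main.cf
hastart -stale                          # on each remaining system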


Offline Configuration Examples

Example 1: Creating a New Cluster

You can create a new cluster configuration by creating a new main.cf file. The slide displays the beginning of a main.cf file that is being created from the values in a design worksheet.

You can define all cluster attributes, add service groups and resources, define relationships, and specify failover behavior—all aspects of cluster configuration—by modifying the main.cf file.

Example 1: Creating a New Cluster

Cluster Definition Sample Value
Cluster WebCluster
Attributes
UserNames admin=ElmElg
Administrators admin
Systems Web1, Web2

Service Group Definition Sample Value
Group WebSG
Required Attributes
FailoverPolicy Priority
SystemList Web1=0, Web2=1
Optional Attributes
AutoStartList Web1

main.cf:

include "types.cf"

cluster WebCluster (
UserNames = { admin = ElmElg }
Administrators = { admin }
)

system Web1 ()

system Web2 ()

group WebSG (
SystemList = { Web1 = 0, Web2 = 1 }
AutoStartList = { Web1 }
)


Example 2: Reusing a Cluster Configuration

One example where offline configuration is appropriate is when your high availability environment is expanding and you are adding clusters with similar configurations.

In the example displayed in the diagram, the original cluster consists of two systems, each running a database instance. Another cluster with essentially the same configuration is being added, but is managing different Oracle databases.

You can copy the configuration files from the original cluster, make the necessary changes, and then restart VCS as described later in this lesson. This method may be more efficient than creating each service group and resource using Cluster Manager or the VCS command-line interface.

Example 2: Reusing a Cluster Configuration

(Diagram: Cluster1 consists of systems S1 and S2 running service groups DB1 and DB2; Cluster2 consists of systems S3 and S4 running DB3 and DB4.)

Cluster1 main.cf:

group DB1 (
SystemList = { S1 = 1, S2 = 2 }
AutoStartList = { S1 }
)

Cluster2 main.cf:

group DB3 (
SystemList = { S3 = 1, S4 = 2 }
AutoStartList = { S3 }
)
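A sketch of the copy step (the host name s3 is illustrative; the new cluster's main.cf must then be edited to change the cluster name, system names, and group definitions before VCS is started):

scp /etc/VRTSvcs/conf/config/main.cf s3:/etc/VRTSvcs/conf/config/main.cf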


Example 3: Reusing a Service Group Configuration

Another example of using offline configuration is when you want to add a service group with a similar set of resources as another service group in the same cluster.

In the example displayed in the diagram, the portion of the main.cf file that defines the DemoSG service group is copied and edited as necessary to define a new AppSG service group.

Example 3: Reusing a Service Group Definition

(Diagram: the DemoSG resource tree (DemoProcess, DemoIP, DemoMount, DemoNIC, DemoVol, DemoDG) is copied and renamed to create AppSG with the corresponding App-prefixed resources.)

Service Group Definition Sample Value
Group DemoSG
Required Attributes
FailoverPolicy Priority
SystemList S1=0, S2=1
Optional Attributes
AutoStartList S1

Service Group Definition Sample Value
Group AppSG
Required Attributes
FailoverPolicy Priority
SystemList S1=0, S2=1
Optional Attributes
AutoStartList S1
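One hypothetical way to duplicate the definition offline (the snippet file demo_group.cf is an assumption; sed is a generic UNIX tool, not a VCS utility):

# Copy the DemoSG portion of main.cf into demo_group.cf by hand, then:
sed 's/Demo/App/g' demo_group.cf >> main.cf
hacf -verify /etc/VRTSvcs/conf/config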


Starting and Stopping VCS

There are a variety of options for starting and stopping VCS, which are described in this section. Understanding the effects of each method helps you decide which configuration method is best suited for your high availability environment.

How VCS Starts Up by Default

The default VCS startup process is demonstrated using a cluster with two systems connected by the cluster interconnect. To illustrate the process, assume that no systems have an active cluster configuration.
1 The hastart command is run on S1 and starts the had and hashadow processes.
2 HAD checks for a .stale file in the configuration directory. In this example, there is no .stale file present.
3 HAD checks for a valid configuration file (hacf -verify config_dir).
4 HAD checks for an active cluster configuration on the cluster interconnect.
5 Because there is no active cluster configuration, HAD on S1 reads the local main.cf file and loads the cluster configuration into local memory.
The S1 system is now in the VCS local build state, meaning VCS is building a cluster configuration in memory on the local system.
6 The hastart command is then run on S2 and starts had and hashadow on S2.
The S2 system is now in the VCS current discover wait state, meaning VCS is in a wait state while it is discovering the current state of the cluster.
7 HAD checks for a .stale file in the configuration directory.

How VCS Starts Up by Default

(Diagram: hastart launches had and hashadow on S1, which performs a local build from its main.cf; hastart on S2 then enters the current discover wait state and looks for a running configuration on the cluster interconnect.)


8 HAD on S2 checks for a valid configuration file on disk.
9 HAD on S2 checks for an active cluster configuration by sending a broadcast message out on the cluster interconnect, even if the main.cf file on S2 is valid.


10 HAD on S1 receives the request from S2 and responds.
11 HAD on S1 sends a copy of the cluster configuration over the cluster interconnect to S2.
The S1 system is now in the VCS running state, meaning VCS determines there is a running configuration in memory on system S1.
The S2 system is now in the VCS remote build state, meaning VCS is building the cluster configuration in memory on the S2 system from the cluster configuration that is in a running state on S1.
12 When the remote build process completes, HAD on S2 copies the cluster configuration into the local main.cf file.
If S2 has valid local configuration files (main.cf and types.cf), these are saved to new files with a name including a date and time stamp, before the active configuration is written to the main.cf file on disk.

The startup process is repeated on each system until all members have identical copies of the cluster configuration in memory and matching main.cf files on local disks.

Synchronization is maintained by data transfer through LLT and GAB.
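To watch systems move through these states as they start, you can display a cluster summary from any running system (hastatus and hasys are standard VCS utilities):

hastatus -sum     # cluster-wide summary of system and group states
hasys -state      # per-system HAD states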

Next System Startup

(Diagram: S1, now in the running state, sends the cluster configuration to S2; S2 performs a remote build and writes the configuration to its local main.cf.)


VCS Startup with a .stale File

To illustrate how VCS protects the cluster configuration, assume that no systems have an active cluster configuration and a .stale file is present.
1 The hastart command is run on S1 and starts the had and hashadow processes.
2 HAD checks for a .stale file and determines that the file is present.
3 HAD determines whether the configuration files are valid.
4 HAD determines that there is no active configuration anywhere in the cluster.
5 Because there is no active cluster configuration, HAD goes into the stale admin wait state.

The stale admin wait state indicates to you that you stopped VCS on all systems while the configuration was open. This also occurs if you start VCS and the main.cf file has a syntax error. This enables you to inspect the main.cf file and decide whether you want to start VCS with that main.cf file. You may have to modify the main.cf file if you made changes in the running cluster after saving the configuration to disk.

VCS Startup with a .stale File

(Diagram: hastart on S1 finds a .stale file and no active configuration, so had enters the stale admin wait state; S2 remains in the unknown state.)


Forcing VCS to Start from a Wait State

If all systems are in a wait state, you must force VCS to start on the system with the correct main.cf file. In this case, had is already started on each system, so you cannot use the hastart command to build the cluster configuration. Instead, use hasys -force to tell had to create the cluster configuration in memory on the appropriate system.
1 Run hasys -force S1 on S1. This starts the local build process.
Note: You must have a valid main.cf file to force VCS to a running state. If the main.cf file has a syntax error, running hasys -force results in VCS entering the Admin_Wait state. You can run hacf -verify to check the file syntax.
2 HAD removes the .stale flag, if present.
3 HAD checks for a valid main.cf file.
4 The had daemon on S1 reads the local main.cf file, and if it has no syntax problems, HAD loads the cluster configuration into local memory on S1.

Forcing VCS to Start from a Wait State

(Diagram: hasys -force S1 triggers a local build on S1 from its main.cf and removes the .stale file; S2 waits for a running configuration.)


5 When had is in a running state on S1, this state change is broadcast on the cluster interconnect by GAB.

6 S2 then performs a remote build to put the new cluster configuration into its memory.

7 The had process on S2 copies the cluster configuration into the local main.cf and types.cf files after moving the original files to backup copies with timestamps.

8 The had process on S2 removes the .stale file, if present, from the local configuration directory.

Next System Startup

(Diagram: S1, now running, broadcasts its state; S2 performs a remote build, updates its local main.cf and types.cf, and removes its .stale file.)


Building the Configuration Using a Specific main.cf File

The diagram illustrates how to start VCS to ensure that the cluster configuration in memory is built from a specific main.cf file.

Starting VCS Using a Stale Flag

By starting VCS with the -stale flag on all other systems, you ensure that VCS builds the new configuration in memory on the system where the changes were made to the main.cf file and all other systems wait for the build to successfully complete before building their in-memory configurations.
1 Run hastart on S1 to start the had and hashadow processes.
2 HAD checks for a .stale flag.
3 HAD checks for a valid main.cf file.
4 HAD checks for an active cluster configuration on the cluster interconnect.
5 Because there is no active cluster configuration, the had daemon on S1 reads the local main.cf file and loads the cluster configuration into local memory on S1.
6 Run hastart -stale on S2.
7 HAD starts and checks for a .stale flag, which is present because VCS writes the file when the -stale option is given to hastart.
The S2 system is now in the stale admin wait state while VCS checks for a valid configuration in memory on another cluster system.
8 HAD on S2 checks for an active cluster configuration on the cluster interconnect and waits until S1 has a running cluster configuration.

Building the Configuration Using a Specific main.cf File

(Diagram: hastart on S1 performs a local build from the edited main.cf; hastart -stale on S2 creates a .stale file, and S2 waits for a running configuration.)


9 When VCS is in a running state on S1, HAD on S1 sends a copy of the cluster configuration over the cluster interconnect to S2.

10 S2 performs a remote build to put the new cluster configuration in memory.
11 HAD on S2 copies the cluster configuration into the local main.cf and types.cf files after moving the original files to backup copies with timestamps.

12 HAD on S2 removes the .stale file from the local configuration directory.

Starting the Next System

(Diagram: S1 in the running state sends the configuration to S2; S2 performs a remote build, updates its local files, and removes the .stale file.)


Stopping VCS

There are three methods of stopping the VCS engine (had and hashadow daemons) on a cluster system:
• Stop VCS and take all service groups offline, stopping application services under VCS control.
• Stop VCS and evacuate service groups to another cluster system where VCS is running.
• Stop VCS and leave application services running.

VCS can also be stopped on all systems in the cluster simultaneously. The hastop command is used with different options and arguments that determine how running services are handled.

VCS Shutdown Examples

The three examples show the effect of using different options with the hastop command:
• Example 1: The -local option causes the service group to be taken offline on S1 and stops VCS services (had) on S1.
• Example 2: The -local -evacuate options cause the service group on S1 to be migrated to S2 and then stop VCS services (had) on S1.
• Example 3: The -all -force options stop VCS services (had) on both systems and leave the services running. Although they are no longer protected highly available services and cannot fail over, the services continue to be available to users.

Stopping VCS

1 hastop -local
2 hastop -local -evacuate
3 hastop -all -force

(Diagram: each example shows which systems stop had and what happens to the service groups on S1 and S2.)


Summary

This lesson introduced the methods you can use to configure VCS. You also learned how VCS starts and stops in a variety of circumstances.

Next Steps

Now that you are familiar with the methods available for configuring VCS, you can apply these skills by creating a service group using an online configuration method.

Additional Resources
• VERITAS Cluster Server User's Guide
This guide provides detailed information on starting and stopping VCS, and performing online and offline configuration.
• VERITAS Cluster Server Command Line Quick Reference
This card provides the syntax rules for the most commonly used VCS commands.

Lesson Summary

Key Points
– Online configuration enables you to keep VCS running while making configuration changes.
– Offline configuration is best suited for large-scale modifications.
Reference Materials
– VERITAS Cluster Server User's Guide
– VERITAS Cluster Server Command Line Quick Reference


Lab 6: Starting and Stopping VCS

Labs and solutions for this lesson are located on the following pages.

Appendix A provides brief lab instructions for experienced students.• “Lab 6 Synopsis: Starting and Stopping VCS,” page A-29

Appendix B provides step-by-step lab instructions.• “Lab 6: Starting and Stopping VCS,” page B-37

Appendix C provides complete lab instructions and solutions.• “Lab 6 Solutions: Starting and Stopping VCS,” page C-63

Goal

The purpose of this lab is to observe the effects of stopping and starting VCS.

Prerequisites

Students must work together to coordinate stopping and restarting VCS.

Results

The cluster is running and the ClusterService group is online.

Lab 6: Starting and Stopping VCS

(Diagram: the vcs1 cluster with systems train1 and train2; the lab uses hastop -all -force.)


Lesson 7: Online Configuration of Service Groups


Introduction

Overview

This lesson describes how to use the VCS Cluster Manager graphical user interface (GUI) and the command-line interface (CLI) to create a service group and configure resources while the cluster is running.

Importance

You can perform all tasks necessary to create and test a service group while VCS is running without affecting other high availability services.

Lesson Introduction
• Lesson 1: VCS Building Blocks
• Lesson 2: Preparing a Site for VCS
• Lesson 3: Installing VCS
• Lesson 4: VCS Operations
• Lesson 5: Preparing Services for VCS
• Lesson 6: VCS Configuration Methods
• Lesson 7: Online Configuration of Service Groups
• Lesson 8: Offline Configuration of Service Groups
• Lesson 9: Sharing Network Interfaces
• Lesson 10: Configuring Notification
• Lesson 11: Configuring VCS Response to Faults
• Lesson 12: Cluster Communications
• Lesson 13: System and Communication Faults
• Lesson 14: I/O Fencing
• Lesson 15: Troubleshooting


Outline of Topics
• Online Configuration Procedure
• Adding a Service Group
• Adding Resources
• Solving Common Configuration Errors
• Testing the Service Group

Lesson Topics and Objectives

After completing this lesson, you will be able to:
• Online Configuration Procedure: Describe an online configuration procedure.
• Adding a Service Group: Create a service group using online configuration tools.
• Adding Resources: Create resources using online configuration tools.
• Solving Common Configuration Errors: Resolve common errors made during online configuration.
• Testing the Service Group: Test the service group to ensure that it is correctly configured.


Online Configuration Procedure

The chart on the left in the diagram illustrates the high-level procedure you can use to modify the cluster configuration while VCS is running.

Creating a Service Group

You can use the procedures shown in the diagram as a standard methodology for creating service groups and resources. Although there are many ways you could vary this configuration procedure, following a recommended practice simplifies and streamlines the initial configuration and facilitates troubleshooting if you encounter configuration problems.

Online Configuration Procedure

(Flow chart: open the cluster configuration; modify the cluster configuration using the GUI or CLI; save the cluster configuration; close the cluster configuration. Resource flow chart: add the service group, set SystemList, set optional attributes, then add and test each resource, repeating until no resources remain.)

This procedure assumes that you have prepared and tested the application service on each system and it is offline everywhere, as described in the “Preparing Services for High Availability” lesson.


Adding a Service Group

Adding a Service Group Using the GUI

The minimum required information to create a service group is:
• Enter a unique name. Using a consistent naming scheme helps identify the purpose of the service group and all associated resources.
• Specify the list of systems on which the service group can run.

This is defined in the SystemList attribute for the service group, as displayed in the excerpt from the sample main.cf file. A priority number is associated with each system as part of the SystemList definition. The priority number is used for the default failover policy, Priority. VCS uses the priority number to choose a system for failover when more than one system is specified. The lower numbered system is selected first. Other failover policies are described in other lessons.

• The Startup box specifies that the service group starts automatically when the had daemon starts on the system, if the service group is not already online elsewhere in the cluster. This is defined by the AutoStartList attribute of the service group. In the example displayed in the slide, the S1 system is selected as the system on which DemoSG is started when VCS starts up.

• The Service Group Type selection is failover by default.

If you save the configuration after creating the service group, you can view the main.cf file to see the effect of had modifying the configuration and writing the changes to the local disk.

Adding a Service Group Using the GUI

main.cf:

group DemoSG (
SystemList = { S1 = 0, S2 = 1 }
AutoStartList = { S1 }
)


Note: You can click the Show Command button to see the commands that are run when you click OK.

Adding a Service Group Using the CLI

You can also use the VCS command-line interface to modify a running cluster configuration. The next example shows how to use hagrp commands to add the DemoSG service group and modify its attributes:

haconf -makerw
hagrp -add DemoSG
hagrp -modify DemoSG SystemList S1 0 S2 1
hagrp -modify DemoSG AutoStartList S1
haconf -dump -makero

The corresponding main.cf excerpt for DemoSG is shown in the slide.

Notice that the main.cf definition for the DemoSG service group does not include the Parallel attribute. When an attribute is set to its default value, it is not written to the main.cf file. To display all values for all attributes:
• In the GUI, select the object (resource, service group, system, or cluster), click the Properties tab, and click Show all attributes.
• From the command line, use the -display option of the corresponding ha command. For example:
hagrp -display DemoSG

See the command-line reference card provided with this course for a list of commonly used ha commands.


Classroom Exercise: Creating a Service Group

Create a service group using the Cluster Manager GUI. The service group should have these properties:
• Specify a name based on your name, or use a student name, such as S1 for the student using train1, as directed by your instructor.
• Select both systems, with priority given to the system you are assigned. For example, if you are working on train1, assign priority 0 to that system and 1 to the next system, train2.
• Select your system as the startup system.
• Retain the default of failover for the service group type.

Resources are added to the service group in a later exercise.

Appendix A provides brief lab instructions for experienced students.• “Creating a Service Group,” page A-32

Appendix B provides step-by-step lab instructions.• “Creating a Service Group,” page B-43

Appendix C provides complete lab instructions and solutions.• “Creating a Service Group,” page C-69

Classroom Exercise

Create a service group using the Java GUI. Your instructor may demonstrate the steps to perform this task.
1. Complete the design worksheet with values for your classroom (see the Design Worksheet Example that follows).
2. Add a service group using the Cluster Manager.
3. See Appendix A, B, or C for detailed instructions.


Design Worksheet Example

Corresponding main.cf Entry

group nameSG1 (

SystemList = { train1 = 0, train2 = 1 }

AutoStartList = { train1 }

)

Service Group Definition Sample Value Your Value

Group nameSG1

Required Attributes

FailOverPolicy Priority

SystemList train1=0 train2=1

Optional Attributes

AutoStartList train1


Adding Resources

Online Resource Configuration Procedure

Add resources to a service group in the order of resource dependencies, starting from the child resource (bottom up). This enables each resource to be tested as it is added to the service group.

Adding a resource requires you to specify:
• The service group name
• The unique resource name
If you prefix the resource name with the service group name, you can more easily identify the service group to which it belongs. When you display a list of resources from the command line using the hares -list command, the resources are sorted alphabetically.
• The resource type
• Attribute values

Use the procedure shown in the diagram to configure a resource.

Notes:
• It is recommended that you set each resource to be non-critical during initial configuration. This simplifies testing and troubleshooting in the event that you have specified incorrect configuration information. If a resource faults due to a configuration error, the service group does not fail over if resources are non-critical.
• Enabling a resource signals the agent to start monitoring the resource.

Resource Configuration Procedure

Considerations:
• Add resources in order of dependency, starting at the bottom.
• Configure all required attributes.
• Enable the resource.
• Bring each resource online before adding the next resource.
• It is recommended that you set resources as non-critical until testing has completed.

(Flow chart: add the resource, modify attributes, set it non-critical, enable it, and bring it online; if it does not come online, troubleshoot the resource; when it is online, continue with the next resource until done.)
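A generic command-line sketch of this flow, using placeholder names (myRes, MyType, mySG) and the hares syntax shown later in this lesson:

hares -add myRes MyType mySG        # add the resource to the group
hares -modify myRes Critical 0      # non-critical during testing
hares -modify myRes SomeAttr value  # set each required attribute
hares -modify myRes Enabled 1       # start monitoring
hares -online myRes -sys S1         # test before adding the next resource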


Adding Resources Using the GUI: NIC Example

The NIC resource has only one required attribute, Device, for all platforms other than HP-UX, which also requires NetworkHosts.

Optional attributes for NIC vary by platform. Refer to the VERITAS Cluster Server Bundled Agents Reference Guide for a complete definition. These optional attributes are common to all platforms.
• NetworkType: Type of network, Ethernet (ether)
• PingOptimize: Number of monitor cycles to detect if the configured interface is inactive. A value of 1 optimizes broadcast pings and requires two monitor cycles. A value of 0 performs a broadcast ping during each monitor cycle and detects the inactive interface within the cycle. The default is 1.
• NetworkHosts: The list of hosts on the network that are used to determine if the network connection is alive. It is recommended that you enter the IP address of the host rather than the host name to prevent the monitor cycle from timing out due to DNS problems.

Adding a Resource Using the GUI: NIC Example

• NIC is persistent; it shows as online as soon as you enable the resource.
• The agent monitors NIC using ping to the NetworkHosts address or broadcast to the administrative IP subnet.
• You must have an administrative IP configured on the interface to monitor NIC.

main.cf:

NIC DemoNIC (
Critical = 0
Device = qfe1
)
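The same resource can also be added from the command line; a sketch following the hares pattern shown later in this lesson:

hares -add DemoNIC NIC DemoSG
hares -modify DemoNIC Critical 0
hares -modify DemoNIC Device qfe1
hares -modify DemoNIC Enabled 1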


Example Device Attribute Values

Solaris: qfe1
AIX: en0
HP-UX: lan0
Linux: eth0


Adding an IP Resource

The slide shows the required attribute values for an IP resource in the DemoSG service group. The corresponding entry is made in the main.cf file when the configuration is saved.

Notice that the IP resource has two required attributes, Device and Address, which specify the network interface and IP address, respectively.

Optional Attributes
• NetMask: Netmask associated with the application IP address. The value may be specified in decimal (base 10) or hexadecimal (base 16). The default is the netmask corresponding to the IP address class.
• Options: Options to be used with the ifconfig command
• ArpDelay: Number of seconds to sleep between configuring an interface and sending out a broadcast to inform routers about this IP address. The default is 1 second.

• IfconfigTwice: If set to 1, this attribute causes an IP address to be configured twice, using an ifconfig up-down-up sequence. This behavior increases the probability of gratuitous ARPs (caused by ifconfig up) reaching clients. The default is 0.

Adding an IP Resource

• The agent uses ifconfig to configure the IP address.
• The virtual IP address set in the Address attribute must be different from the administrative IP address.

main.cf:

IP DemoIP (
Critical = 0
Device = qfe1
Address = "10.10.21.198"
)
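A command-line sketch of the same resource (the NetMask value is illustrative; NetMask is optional, as described above):

hares -add DemoIP IP DemoSG
hares -modify DemoIP Critical 0
hares -modify DemoIP Device qfe1
hares -modify DemoIP Address "10.10.21.198"
hares -modify DemoIP NetMask "255.255.255.0"
hares -modify DemoIP Enabled 1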


Classroom Exercise: Creating Network Resources Using the GUI

Create NIC and IP resources using the Cluster Manager GUI, using the values provided by your instructor for your classroom.

Specify resource names based on your name, or use a student name, such as S1 for the student using train1, as directed by your instructor.

Appendix A provides brief lab instructions for experienced students.• “Adding Resources to a Service Group,” page A-33

Appendix B provides step-by-step lab instructions.• “Adding Resources to a Service Group,” page B-44

Appendix C provides complete lab instructions and solutions.• “Adding Resources to a Service Group,” page C-71

Classroom Exercise

Create network resources using the Java GUI. Your instructor may demonstrate the steps to perform this task.
1. Complete the design worksheet with values for your classroom (see the Design Worksheet Example that follows; NetworkHosts is required only on HP-UX).
2. Add a NIC and IP resource using the Cluster Manager.
3. See Appendix A, B, or C for detailed instructions.


Design Worksheet Example

Resource Definition Sample Value Your Value

Service Group nameSG1

Resource Name nameNIC1

Resource Type NIC

Required Attributes

Device Solaris: eri0; Sol Mob: dmfe0; AIX: en1; HP-UX: lan0; Linux: eth1; VA: bge0

NetworkHosts* 192.168.xx.1 (HP-UX only)

Critical? No (0)

Enabled? Yes (1)

Resource Definition Sample Value Your Value

Service Group nameSG1

Resource Name nameIP1

Resource Type IP

Required Attributes

Device Solaris: eri0; Sol Mob: dmfe0; AIX: en1; HP-UX: lan0; Linux: eth1; VA: bge0

Address 192.168.xx.51* see table

Optional Attributes

Netmask 255.255.255.0

Critical? No (0)

Enabled? Yes (1)


Corresponding main.cf Entry

NIC nameNIC1 (
Critical = 0
Device = eri0
NetworkHosts = 192.168.xx.1 (required only on HP-UX)
)

IP nameIP1 (
Critical = 0
Device = eri0
Address = "192.168.xx.51"
)

System IP Address

train1 192.168.xx.51

train2 192.168.xx.52

train3 192.168.xx.53

train4 192.168.xx.54

train5 192.168.xx.55

train6 192.168.xx.56

train7 192.168.xx.57

train8 192.168.xx.58

train9 192.168.xx.59

train10 192.168.xx.60

train11 192.168.xx.61

train12 192.168.xx.62


Adding a Resource Using the CLI: DiskGroup Example

You can use the hares command to add a resource and configure the required attributes. This example shows how to add a DiskGroup resource, which is described in more detail in the next section.

The DiskGroup Resource

The DiskGroup resource has only one required attribute, DiskGroup.

Note: VCS uses the vxdg command with the -t option when importing a disk group to disable autoimport. This ensures that VCS controls the disk group. VCS deports a disk group if it was manually imported without the -t option (outside of VCS control).

Optional attributes:
• MonitorReservation: Monitors SCSI reservations. The default is 1; the agent monitors the SCSI reservation on the disk group. If the reservation is missing, the agent brings the resource offline.
• StartVolumes: Starts all volumes after importing the disk group. This also starts layered volumes by running vxrecover -s. The default is 1, enabled, on all UNIX platforms except Linux. This attribute is required on Linux.

Adding a Resource Using Commands: DiskGroup Example

You can use the hares command to add a resource and modify resource attributes:

haconf -makerw
hares -add DemoDG DiskGroup DemoSG
hares -modify DemoDG Critical 0
hares -modify DemoDG DiskGroup DemoDG
hares -modify DemoDG Enabled 1
haconf -dump -makero

The DiskGroup agent:
• Imports and deports a disk group
• Monitors the disk group using vxdg

main.cf:

DiskGroup DemoDG (
Critical = 0
DiskGroup = DemoDG
)


• StopVolumes: Stops all volumes before deporting the disk group. The default is 1, enabled, on all UNIX platforms except Linux. This attribute is required on Linux.

Note: Set StartVolumes and StopVolumes attributes to 0 (zero) if using VCS with VERITAS Volume Replicator.


The Volume Resource

The Volume resource can be used to manage a VxVM volume. Although the Volume resource is not strictly required, it provides additional monitoring. You can use a DiskGroup resource to start volumes when the DiskGroup resource is brought online. This has the effect of starting volumes more quickly, but only the disk group is monitored.

However, if you have a large number of volumes in a single disk group, the DiskGroup resource can time out when trying to start or stop all the volumes simultaneously. In this case, you can set the StartVolumes and StopVolumes attributes of the DiskGroup resource to 0, and create Volume resources to start the volumes individually.

Also, if you are using volumes as raw devices with no file systems, and, therefore, no Mount resources, consider using Volume resources for the additional level of monitoring.

The Volume resource has no optional attributes.

The Volume Resource

The Volume agent:
• Starts and stops a volume using vxvol
• Reads a block from the raw device interface using dd to determine status

Resource Definition Sample Value
Service Group Name DemoSG
Resource Name DemoVol
Resource Type Volume
Required Attributes
Volume DemoVol
DiskGroup DemoDG

main.cf:

Volume DemoVol (
Volume = DemoVol
DiskGroup = DemoDG
)


The Mount Resource

The Mount resource has the required attributes displayed in the main.cf file excerpt in the slide.

Optional attributes:
• MountOpt: Specifies options for the mount command
• SnapUmount: Determines whether VxFS snapshots are unmounted when the file system is taken offline (unmounted). The default is 0, meaning that snapshots are not automatically unmounted when the file system is unmounted.
Note: If SnapUmount is set to 0 and a VxFS snapshot of the file system is mounted, the unmount operation fails when the resource is taken offline, and the service group is not able to fail over. This is desired behavior in some situations, such as when a backup is being performed from the snapshot.

The Mount Resource

The Mount agent:
• Mounts and unmounts a block device on the directory; runs fsck to remount if the mount fails
• Uses stat and statvfs to monitor the file system

Resource Definition Sample Value
Service Group Name DemoSG
Resource Name DemoMount
Resource Type Mount
Required Attributes
MountPoint /demo
BlockDevice /dev/vx/dsk/DemoDG/DemoVol
FSType vxfs
FsckOpt -y

main.cf:

Mount DemoMount (
MountPoint = "/demo"
BlockDevice = "/dev/vx/dsk/DemoDG/DemoVol"
FSType = vxfs
FsckOpt = "-y"
)
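A command-line sketch of the same resource (note: VCS expects a percent sign before an attribute value that begins with a dash, so FsckOpt is passed as %-y; verify against the command reference):

hares -add DemoMount Mount DemoSG
hares -modify DemoMount Critical 0
hares -modify DemoMount MountPoint /demo
hares -modify DemoMount BlockDevice /dev/vx/dsk/DemoDG/DemoVol
hares -modify DemoMount FSType vxfs
hares -modify DemoMount FsckOpt %-y
hares -modify DemoMount Enabled 1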


Classroom Exercise: Creating Storage Resources Using the CLI

Create DiskGroup, Volume, and Mount resources using the command-line interface with the values provided by your instructor for your classroom.

Specify resource names based on your name, or use a student name, such as S1 for the student using train1, as directed by your instructor.

Appendix A provides brief lab instructions for experienced students.• “Adding Resources to a Service Group,” page A-33

Appendix B provides step-by-step lab instructions.• “Adding a DiskGroup Resource,” page B-48

Appendix C provides complete lab instructions and solutions.• “Adding a DiskGroup Resource,” page C-76

Classroom Exercise

Create storage resources using the CLI. Your instructor may demonstrate the steps to perform this task.
1. Complete the design worksheet with values for your classroom (see the Design Worksheet Example that follows).
2. Add DiskGroup, Volume, and Mount resources using hares.
3. See Appendix A, B, or C for detailed instructions.


Design Worksheet Example

Resource Definition Sample Value Your Value

Service Group nameSG1

Resource Name nameDG1

Resource Type DiskGroup

Required Attributes

DiskGroup nameDG1

Optional Attributes

StartVolumes 1

StopVolumes 1

Critical? No (0)

Enabled? Yes (1)

Resource Definition Sample Value Your Value

Service Group nameSG1

Resource Name nameVol1

Resource Type Volume

Required Attributes

Volume nameVol1

DiskGroup nameDG1

Critical? No (0)

Enabled? Yes (1)


Corresponding main.cf Entries

DiskGroup nameDG1 (
Critical = 0
DiskGroup = nameDG1
)

Volume nameVol1 (
Critical = 0
Volume = nameVol1
DiskGroup = nameDG1
)

Mount nameMount1 (
Critical = 0
MountPoint = "/name1"
BlockDevice = "/dev/vx/dsk/nameDG1/nameVol1"
FSType = vxfs
FsckOpt = "-y"
)

Resource Definition Sample Value Your Value

Service Group nameSG1

Resource Name nameMount1

Resource Type Mount

Required Attributes

MountPoint /name1

BlockDevice /dev/vx/dsk/nameDG1/nameVol1 (no spaces)

FSType vxfs

FsckOpt -y

Critical? No (0)

Enabled? Yes (1)


The Process Resource

The Process resource controls the application and is added last because it requires all other resources to be online in order to start. The Process resource is used to start, stop, and monitor the status of a process.
• Online: Starts the process specified in the PathName attribute, with options, if specified in the Arguments attribute
• Offline: Sends SIGTERM to the process. SIGKILL is sent if the process does not exit within one second.
• Monitor: Determines if the process is running by scanning the process table

The optional Arguments attribute specifies any command-line options to use when starting the process.

Notes:
• If the executable is a shell script, you must specify the script name followed by arguments. You must also specify the full path for the shell in the PathName attribute.
• The monitor script calls ps and matches the process name. The process name field is limited to 80 characters in the ps output. If you specify a path name to a process that is longer than 80 characters, the monitor entry point fails.

The Process Resource

The Process agent:
• Starts and stops a daemon-type process
• Monitors the process by scanning the process table

Property Value
Service Group DemoSG
Resource Name DemoProcess
Resource Type Process
Required Attributes
PathName /bin/sh
Optional Attributes
Arguments /demo/orderproc up

main.cf:

Process DemoProcess (
PathName = "/bin/sh"
Arguments = "/demo/orderproc up"
)
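A command-line sketch of the same resource, following the hares pattern used earlier in this lesson:

hares -add DemoProcess Process DemoSG
hares -modify DemoProcess Critical 0
hares -modify DemoProcess PathName /bin/sh
hares -modify DemoProcess Arguments "/demo/orderproc up"
hares -modify DemoProcess Enabled 1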


Classroom Exercise: Creating a Process Resource

Create a Process resource using either the Cluster Manager GUI or the command-line interface, using the values provided by your instructor for your classroom.

Specify resource names based on your name, or use a student name, such as S1 for the student using train1, as directed by your instructor.

Appendix A provides brief lab instructions for experienced students.• “Adding Resources to a Service Group,” page A-33

Appendix B provides step-by-step lab instructions.• “Adding a Process Resource,” page B-51

Appendix C provides complete lab instructions and solutions.• “Adding a Process Resource,” page C-82

Classroom Exercise

Create a Process resource. Your instructor may demonstrate the steps to perform this task.
1. Complete the design worksheet with values for your classroom (see the Design Worksheet Example that follows).
2. Add a Process resource using either the GUI or CLI.
3. See Appendix A, B, or C for detailed instructions.


Design Worksheet Example

Corresponding main.cf Entry

Process nameProcess1 (
PathName = "/bin/sh"
Arguments = "/name1/loopy name 1"
)

Resource Definition Sample Value Your Value

Service Group nameSG1

Resource Name nameProcess1

Resource Type Process

Required Attributes

PathName /bin/sh

Optional Attributes

Arguments /name1/loopy name 1

Critical? No (0)

Enabled? Yes (1)


Solving Common Configuration Errors

Verify that each resource is online on the local system before continuing the service group configuration procedure.

If you are unable to bring a resource online, use the procedure in the diagram to find and fix the problem. You can view the logs through Cluster Manager or in the /var/VRTSvcs/log directory if you need to determine the cause of errors.

Note: Some resources do not need to be disabled and reenabled. Only resources whose agents have open and close entry points, such as MultiNICA, require you to disable and enable again after fixing the problem. By contrast, a Mount resource does not need to be disabled if, for example, you incorrectly specify the MountPoint attribute.

However, it is generally good practice to disable and enable regardless because it is difficult to remember when it is required and when it is not.

More detail on performing tasks necessary for solving resource configuration problems is provided in the following sections.

Solving Common Configuration Errors

Common mistakes made during configuration:
• You have improperly specified an attribute value.
• The operating system configuration is incorrect for the resource.

(Flow chart: if the resource is not online, check the log; if it is waiting to go online, flush the group and verify at the operating system level that the resource is offline everywhere; if it is faulted, clear the resource; then disable the resource, modify its attributes, reenable it, and bring it online. The disable and enable steps apply to resources whose agents have open and close entry points, as noted above.)


Flushing a Service Group

Occasionally, agents for the resources in a service group can appear to become suspended while waiting for resources to be brought online or taken offline.

Generally, this condition occurs during initial configuration and testing because the required attributes for a resource are not defined properly or the underlying operating system resources are not prepared correctly. If it appears that a resource or group has become suspended while being brought online, you can flush the service group to enable corrective action.

Flushing a service group stops VCS from attempting to bring resources online or take them offline and clears any internal wait states. You can then check resources for configuration problems or underlying operating system configuration problems and then attempt to bring resources back online.

Note: Before flushing a service group, verify that the physical or software resource is actually stopped.

Flushing a Service Group

• Misconfigured resources can cause agent processes to appear to hang.
• Verify that the resource is stopped at the operating system level.
• Flush the service group to stop all online and offline processes:

hagrp -flush DemoSG -sys S1


Disabling a Resource

Disable a resource before you start modifying attributes to fix a misconfigured resource. When you disable a resource, VCS stops monitoring the resource, so it does not fault or wait to come online while you are making changes.

When you disable a resource, the agent calls the close entry point, if defined. The close entry point is optional.

When the close tasks are completed, or if there is no close entry point, the agent stops monitoring the resource.

• Nonpersistent resources must be taken offline before being disabled.
• VCS calls the agent on each system in the SystemList.
• The agent calls the close entry point, if present, to reset the resource.
• The agent stops monitoring disabled resources.

hares -modify DemoIP Enabled 0


Copying and Deleting a Resource

If you add a resource and later want to change the resource name in a running cluster, you must delete the resource.

Before deleting a resource, take all parent resources offline, take the resource offline, and then disable the resource. Also, remove any links to and from that resource.

A recommended practice is to delete all resources before removing a service group. This prevents possible resource faults and error log entries that can occur if a service group with online resources is deleted. After deleting the resources, you can delete the service group using the hagrp -delete service_group command.

You can copy and paste a resource to modify the resource name. You can either add a prefix or suffix to the existing name, or specify a completely different name.

You can also copy a partial or complete resource tree by right-clicking the topmost resource and selecting Copy > Self and Child Nodes.

To change a resource name, you must delete the existing resource and create a new resource with the correct name. Before deleting a resource:
1. Take parent resources offline, if any exist.
2. Take the resource offline.
3. Disable the resource.
4. Unlink any dependent resources.
Delete all resources before deleting a service group. You can copy and paste resources as a method of modifying the name.

hares -offline parent_res
hares -delete DemoDG
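Expanded into a complete sequence, the deletion might look like this sketch (parent_res and the system name S1 are placeholders, and the cluster configuration is assumed to be open for writing):

hares -offline parent_res -sys S1
hares -offline DemoDG -sys S1
hares -modify DemoDG Enabled 0
hares -unlink parent_res DemoDG
hares -delete DemoDG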


Testing the Service Group

After you have successfully brought each resource online, link the resources and switch the service group to each system on which the service group can run.

For simplicity, this service group uses the Priority failover policy, which is the default value. That is, if a critical resource in DemoSG faults, the service group is taken offline and brought online on the system with the highest priority.

The “Configuring VCS Response to Resource Faults” lesson provides additional information about configuring and testing failover behavior. Additional failover policies are also described in the High Availability Using VERITAS Cluster Server for UNIX, Implementing Local Clusters course.

Testing Procedure

[Flowchart: Start; Link Resources; Test Switching; Success? If no, Check Logs/Fix; if yes, Set Critical Res; Test Failover; Done.]

After all resources are online locally:
1. Link resources.
2. Switch the service group to each system on which it is configured to run.
3. Set resources to critical, as specified in the design worksheet.
4. Test failover.


Linking Resources

When you link a parent resource to a child resource, the dependency becomes a component of the service group configuration. When you save the cluster configuration, each dependency is listed at the end of the service group definition, after the resource specifications, in the format shown in the slide.

In addition, VCS creates a dependency tree in the main.cf file at the end of the service group definition to provide a more visual view of resource dependencies. This is not part of the cluster configuration, as denoted by the // comment markers.

// resource dependency tree
//
// group DemoSG
// {
// IP DemoIP
//     {
//     NIC DemoNIC
//     }
// }

hares -link DemoIP DemoNIC

Resulting main.cf entry:

DemoIP requires DemoNIC
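The remaining DemoSG dependencies shown in the completed configuration later in this lesson could be created the same way; as a sketch:

hares -link DemoVol DemoDG
hares -link DemoMount DemoVol
hares -link DemoProcess DemoMount
hares -link DemoProcess DemoIP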


Resource Dependencies

VCS enables you to link resources to specify dependencies. For example, an IP address resource is dependent on the NIC providing the physical link to the network.

Ensure that you understand the dependency rules shown in the slide before you start linking resources.

Parent resources depend on child resources:
– A child resource must be online before the parent resource can come online.
– The parent resource must go offline before the child resource can go offline.
• Parent resources cannot be persistent.
• You cannot link resources in different service groups.
• Resources can have an unlimited number of parent and child resources.
• Cyclical dependencies are not allowed.

Resource Dependency Definition

Service Group DemoSG

Parent Resource Requires Child Resource

DemoVol DemoDG

DemoMount DemoVol

DemoIP DemoNIC

DemoProcess DemoMount

DemoProcess DemoIP


Classroom Exercise: Linking Resources

Link resources in the nameSG1 service group according to the worksheet using either the GUI or CLI.

Appendix A provides brief lab instructions for experienced students.• “Linking Resources in the Service Group,” page A-37

Appendix B provides step-by-step lab instructions.• “Linking Resources in the Service Group,” page B-52

Appendix C provides complete lab instructions and solutions.• “Linking Resources in the Service Group,” page C-84

Your instructor may demonstrate the steps to perform this task.
1. Complete the design worksheet with values for your classroom.
2. Link resources according to the worksheet using either the GUI or CLI.
3. See Appendix A, B, or C for detailed instructions.



Design Worksheet Example

Corresponding main.cf Entries

nameProcess1 requires nameIP1

nameProcess1 requires nameMount1

nameMount1 requires nameVol1

nameVol1 requires nameDG1

nameIP1 requires nameNIC1

Resource Dependency Definition

Service Group nameSG1

Parent Resource Requires Child Resource

nameVol1 nameDG1

nameMount1 nameVol1

nameIP1 nameNIC1

nameProcess1 nameMount1

nameProcess1 nameIP1


Setting the Critical Attribute

The Critical attribute is set to 1, or true, by default. When you initially configure a resource, you set the Critical attribute to 0, or false. This enables you to test the resources as you add them without the resource faulting and causing the service group to fail over as a result of configuration errors you make.

Some resources may always be set to non-critical. For example, a resource monitoring an Oracle reporting database may not be critical to the overall service being provided to users. In this case, you can set the resource to non-critical to prevent downtime due to failover in the event that it was the only resource that faulted.

Note: When you set an attribute to a default value, the attribute is removed from main.cf. For example, after you set Critical to 1 for a resource, the Critical = 0 line is removed from the resource configuration because it is now set to the default value for the NIC resource type.

To see the values of all attributes for a resource, use the hares command. For example:

hares -display DemoNIC

When set to Critical:
• The Critical attribute is removed from main.cf (Critical = 1 is the default setting for all resources).
• The entire service group fails over if the resource faults.

main.cf:

NIC DemoNIC (
    Device = qfe1
)

hares -modify DemoNIC Critical 1


Classroom Exercise: Testing the Service Group

Set each resource to critical and then switch the service group between systems and verify that it operates properly on both systems in the cluster.

Appendix A provides brief lab instructions for experienced students.• “Testing the Service Group,” page A-37

Appendix B provides step-by-step lab instructions.• “Testing the Service Group,” page B-53

Appendix C provides complete lab instructions and solutions.• “Testing the Service Group,” page C-85

Test the service group. Your instructor may demonstrate the steps to perform this task.
1. Complete the design worksheet with values for your classroom.
2. Test switching the service group between cluster systems.
3. Set resources to Critical using either the GUI or CLI.
4. See Appendix A, B, or C for detailed instructions.
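As a minimal sketch of the switching test (assuming the group nameSG1 is currently online on system S1 and the second cluster system is S2):

hagrp -switch nameSG1 -to S2
hastatus -sum
hagrp -switch nameSG1 -to S1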


A Completed Process Service Group

You can display the completed resource diagram in Cluster Manager in the Resources view when a service group is selected. The main.cf file corresponding to this sample configuration for Solaris is shown here. An example main.cf corresponding to the classroom exercises is shown in Appendix B and Appendix C.

Corresponding main.cf Entries for DemoSG

include "types.cf"

cluster VCS (
    UserNames = { admin = "j5_eZ_^]Xbd^\\_Y_d\\" }
    Administrators = { admin }
    CounterInterval = 5
)

[Resource diagram for DemoSG: Process (/demo/orderproc) depends on Mount (/demo) and IP (10.10.21.198); Mount depends on Volume (DemoVol), which depends on DiskGroup (DemoDG); IP depends on NIC (qfe1).]


system S1 ()

system S2 ()

group DemoSG (
    SystemList = { S1 = 1, S2 = 2 }
    AutoStartList = { S1 }
)

DiskGroup DemoDG (
    Critical = 0
    DiskGroup = DemoDG
)

IP DemoIP (
    Critical = 0
    Device = qfe1
    Address = "10.10.21.198"
)

Mount DemoMount (
    Critical = 0
    MountPoint = "/demo"
    BlockDevice = "/dev/vx/dsk/DemoDG/DemoVol"
    FSType = vxfs
    FsckOpt = "-y"
)

NIC DemoNIC (
    Critical = 0
    Device = qfe1
)

Process DemoProcess (
    Critical = 0
    PathName = "/bin/sh"
    Arguments = "/sbin/orderproc up"
)


Volume DemoVol (
    Critical = 0
    Volume = DemoVol
    DiskGroup = DemoDG
)

DemoProcess requires DemoIP

DemoProcess requires DemoMount

DemoMount requires DemoVol

DemoVol requires DemoDG

DemoIP requires DemoNIC


Summary

This lesson described the procedure for creating a service group and two tools for modifying a running cluster: the Cluster Manager graphical user interface and VCS ha commands.

Next Steps

After you familiarize yourself with the online configuration methods and tools, you can modify configuration files directly to practice offline configuration.

Additional Resources
• VERITAS Cluster Server Bundled Agents Reference Guide: This guide describes each bundled agent in detail.
• VERITAS Cluster Server User's Guide: This guide provides detailed information on procedures and concepts for configuring and managing VCS clusters.
• VERITAS Cluster Server Command Line Quick Reference: This card provides the syntax rules for the most commonly used VCS commands.

Lesson Summary

Key Points
– Follow a standard procedure for creating and testing service groups.
– Recognize common configuration problems and apply a methodology for finding solutions.
Reference Materials
– VERITAS Cluster Server Bundled Agents Reference Guide
– VERITAS Cluster Server User's Guide
– VERITAS Cluster Server Command Line Quick Reference


Lab 7: Online Configuration of a Service Group

Labs and solutions for this lesson are located on the following pages.

Appendix A provides brief lab instructions for experienced students.• “Lab 7 Synopsis: Online Configuration of a Service Group,” page A-31

Appendix B provides step-by-step lab instructions.• “Lab 7: Online Configuration of a Service Group,” page B-41

Appendix C provides complete lab instructions and solutions.• “Lab 7 Solutions: Online Configuration of a Service Group,” page C-67

Goal
The purpose of this lab is to create a service group while VCS is running using either the Cluster Manager graphical user interface or the command-line interface.

Prerequisites
The shared storage and networking resources must be configured and tested. Disk groups must be offline on all systems.

Results
New service groups defined in the design worksheet are running and tested on both cluster systems.

Use the Java GUI to:
• Create a service group.
• Add resources to the service group from the bottom of the dependency tree.
• Substitute the name you used to create the disk group and volume.


Lesson 8
Offline Configuration of Service Groups


Introduction

Overview
This lesson describes how to create a service group and configure resources by modifying the main.cf configuration file.

Importance
In some circumstances, it is more efficient to modify the cluster configuration by changing the configuration files and restarting VCS to bring the new configuration into memory on each cluster system.

Lesson Introduction

Lesson 1: VCS Building Blocks
Lesson 2: Preparing a Site for VCS
Lesson 3: Installing VCS
Lesson 4: VCS Operations
Lesson 5: Preparing Services for VCS
Lesson 6: VCS Configuration Methods
Lesson 7: Online Configuration of Service Groups
Lesson 8: Offline Configuration of Service Groups
Lesson 9: Sharing Network Interfaces
Lesson 10: Configuring Notification
Lesson 11: Configuring VCS Response to Faults
Lesson 12: Cluster Communications
Lesson 13: System and Communication Faults
Lesson 14: I/O Fencing
Lesson 15: Troubleshooting


Outline of Topics
• Offline Configuration Procedures
• Using the Design Worksheet
• Offline Configuration Tools
• Solving Offline Configuration Problems
• Testing the Service Group

Lesson Topics and Objectives

After completing this lesson, you will be able to:
• Offline Configuration Procedures: Describe offline configuration procedures.
• Using the Design Worksheet: Use a design worksheet during configuration.
• Offline Configuration Tools: Create service groups and resources using offline configuration tools.
• Solving Offline Configuration Problems: Resolve common errors made during offline configuration.
• Testing the Service Group: Test the service group to ensure it is correctly configured.


Offline Configuration Procedures

New Cluster
The diagram illustrates a process for modifying the cluster configuration when you are configuring your first service group and do not already have services running in the cluster.

Stop VCS

Stop VCS on all cluster systems. This ensures that there is no possibility that another administrator is changing the cluster configuration while you are modifying the main.cf file.

Edit the Configuration Files

You must choose a system on which to modify the main.cf file. You can choose any system. However, you must then start VCS first on that system. Add service group definitions, as shown in the “Example Configuration File” section.

Verify the Configuration File Syntax

Run the hacf command in the /etc/VRTSvcs/conf/config directory to verify the syntax of the main.cf and types.cf files after you have modified them. VCS cannot start if the configuration files have syntax errors. Run the command in the config directory or specify the path.

Note: The hacf command only identifies syntax errors, not configuration errors.

New Cluster Procedure: No Services Running

First system:
1. Stop VCS on all systems: hastop -all
2. Edit the configuration file: vi main.cf
3. Verify configuration file syntax: hacf -verify .
4. Start VCS on this system: hastart
5. Verify that VCS is running: hastatus -sum

All other systems:
6. Start VCS on all other systems: hastart -stale

The complete example configuration file is shown in the next section.


Start VCS on the System with the Modified Configuration File

Start VCS first on the system with the modified main.cf file. Verify that VCS started on that system.

Verify that VCS Is Running

Verify that VCS started on that system before starting VCS on other systems.

Start Other Systems

After VCS is in a running state on the first system, start VCS on all other systems. If you cannot bring VCS to a running state on all systems, see the "Solving Offline Configuration Problems" section.

Example Configuration File

include "types.cf"

cluster vcs (

UserNames = { admin = ElmElgLimHmmKumGlj }

ClusterAddress = "192.168.27.51"

Administrators = { admin }

CounterInterval = 5

)

system S1 (

)

system S2 (

)

group WebSG (
    SystemList = { S1 = 1, S2 = 2 }
    AutoStartList = { S1 }
)

DiskGroup WebDG (
    Critical = 0
    DiskGroup = WebDG
)


IP WebIP (
    Critical = 0
    Device = qfe1
    Address = "10.10.21.200"
)

Mount WebMount (
    Critical = 0
    MountPoint = "/Web"
    BlockDevice = "/dev/vx/dsk/WebDG/WebVol"
    FSType = vxfs
)

NIC WebNIC (
    Critical = 0
    Device = qfe1
)

Process WebProcess (
    Critical = 0
    PathName = "/bin/ksh"
    Arguments = "/sbin/tomcat"
)

Volume WebVol (
    Critical = 0
    Volume = WebVol
    DiskGroup = WebDG
)

WebProcess requires WebIP

WebProcess requires WebMount

WebMount requires WebVol

WebVol requires WebDG

WebIP requires WebNIC


Existing Cluster
The diagram illustrates a process for modifying the cluster configuration when you already have service groups configured and want to minimize the time that VCS is not running to protect services that are running.

This procedure includes several built-in protections from common configuration errors and maximizes high availability.

First System

Close the Configuration

Close the cluster configuration before you start making changes. This ensures that the working copy you make has the latest in-memory configuration. This also ensures that you do not have a stale configuration when you attempt to start the cluster later.

Make a Staging Directory

Make a subdirectory of /etc/VRTSvcs/conf/config in which you can edit a copy of the main.cf file. This ensures that your edits cannot be overwritten if another administrator is making configuration changes simultaneously.

Copy the Configuration Files

Copy the main.cf file and types.cf from /etc/VRTSvcs/conf/config to the staging directory.

Existing Cluster Procedure: Part 1

On the first system:
1. Close the configuration: haconf -dump -makero
2. Change to the config directory: cd /etc/VRTSvcs/conf/config
3. Create a working directory: mkdir stage
4. Copy main.cf and types.cf: cp main.cf types.cf stage
5. Change to the stage directory: cd stage
6. Edit the configuration files: vi main.cf
7. Verify configuration file syntax: hacf -verify .


Modify the Configuration Files

Modify the main.cf file in the staging directory on one system. The diagram on the slide refers to this as the first system.

Verify the Configuration File Syntax

Run the hacf command in the staging directory to verify the syntax of the main.cf and types.cf files after you have modified them.

Note: The dot (.) argument indicates that the current working directory is used as the path to the configuration files. You can run hacf -verify from any directory by specifying the path to the configuration directory, as shown in this example:
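hacf -verify /etc/VRTSvcs/conf/config/stage

(The stage subdirectory name is an assumption here, matching the staging directory created earlier in this procedure.)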


Stop VCS

Note: If you have modified an existing service group, first freeze the service group persistently to prevent VCS from failing over the group. This simplifies fixing resource configuration problems—the service group is not being switched between systems.

Stop VCS on all cluster systems after making configuration changes. To leave applications running, use the -force option, as shown in the diagram.
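For example, to freeze a group named AppSG persistently and then stop VCS (a sketch; the group name is an assumption):

haconf -makerw
hagrp -freeze AppSG -persistent
haconf -dump -makero
hastop -all -force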

Copy the New Configuration File

Copy the modified main.cf file from the staging directory into the configuration directory.

Start VCS

Start VCS first on the system with the modified main.cf file.

Verify that VCS Is Running

Verify that VCS has started on that system.

Starting Other Systems

After VCS is in a running state on the first system, start VCS with the -stale option on all other systems. These systems wait until the first system has built a cluster configuration in memory, and then they build their in-memory configurations from the first system.

Existing Cluster Part 2: Restarting VCS

On the first system:
1. Stop VCS on all systems; leave services running: hastop -all -force
2. Copy the test main.cf file back: cp main.cf ../main.cf
3. Start VCS on this system: hastart
4. Verify that HAD is running: hastatus -sum

On the other systems:
5. Start VCS stale on the other systems: hastart -stale

If you are modifying an existing service group, freeze the group persistently before stopping VCS. This prevents the group from failing over when VCS restarts if there are problems with the configuration.


Using the Design Worksheet

Refer to the design worksheet for the values needed to create service groups and resources.

The slide displays a service group diagram and a portion of the design worksheet that describes an example resource used for the offline configuration example. A new service group named AppSG is created by copying the DemoSG service group definition within the main.cf file. The DemoSG service group was created as the online configuration example.

The diagram shows the relationship between the resources in the AppSG service group. In this example, the DemoSG service group definition, with all its resources and resource dependencies, is copied and modified. You change all resource names from DemoResource to AppResource. For example, DemoNIC is changed to AppNIC. As you edit each resource, modify the appropriate attributes. For example, the AppIP resource must have the Address attribute set to 10.10.21.199.

Using a Resource Worksheet and Diagram

[Resource diagram for AppSG: Process (/app) depends on Mount (/app) and IP (10.10.21.199); Mount depends on Volume (AppVol), which depends on DiskGroup (AppDG); IP depends on NIC (qfe1).]

Resource Definition (Sample Value)

Service Group Name: AppSG
Resource Name: AppIP
Resource Type: IP
Required Attributes: Device = qfe1; Address = 10.10.21.199
Optional Attributes: NetMask = 255.255.255.0 (*Required only on HP-UX)


Resource Dependencies

Document resource dependencies in your design worksheet and add the links at the end of the service group definition, using the syntax shown in the slide. A complete example service group definition is shown in the next section.

Resource Dependency Definition

AppSG Service Group

Parent Resource Requires Child Resource

AppVol AppDG

AppMount AppVol

AppProcess AppMount

AppProcess AppIP

AppIP AppNIC

Remember to add resource dependencies to the service group definition. Review these rules:
– Parent resources cannot be persistent.
– You cannot link resources in different service groups.
– Resources can have an unlimited number of parent and child resources.
– Cyclical dependencies are not allowed.

AppProcess requires AppIP
AppProcess requires AppMount
AppMount requires AppVol
AppVol requires AppDG
AppIP requires AppNIC


A Completed Configuration File

A portion of the completed main.cf file with the new service group definition for AppSG is displayed in the slide. The complete description of the AppSG service group, created by copying the DemoSG service group definition used in the previous lesson, is provided here.

Note: You cannot include comment lines in the main.cf file. The lines you see starting with // are generated by VCS to show resource dependencies. Any lines starting with // are stripped out during VCS startup.

group AppSG (
    SystemList = { S1 = 1, S2 = 2 }
    AutoStartList = { S1 }
)

DiskGroup AppDG (
    Critical = 0
    DiskGroup = AppDG
)

main.cf excerpt, as shown in the slide:

. . .
group AppSG (
    SystemList = { S1 = 0, S2 = 1 }
    AutoStartList = { S1 }
    Operators = { SGoper }
)

DiskGroup AppDG (
    Critical = 0
    DiskGroup = AppDG
)

IP AppIP (
    Critical = 0
    Device = qfe1
    Address = "10.10.21.199"
)
. . .


IP AppIP (
    Critical = 0
    Device = qfe1
    Address = "10.10.21.199"
)

Mount AppMount (
    Critical = 0
    MountPoint = "/app"
    BlockDevice = "/dev/vx/dsk/AppDG/AppVol"
    FSType = vxfs
)

NIC AppNIC (
    Critical = 0
    Device = qfe1
)

Process AppProcess (
    Critical = 0
    PathName = "/bin/ksh"
    Arguments = "/app/appd test"
)

Volume AppVol (
    Critical = 0
    Volume = AppVol
    DiskGroup = AppDG
)

AppProcess requires AppIP

AppProcess requires AppMount

AppMount requires AppVol

AppVol requires AppDG

AppIP requires AppNIC


Offline Configuration Tools

You can modify the configuration files directly from a staging directory, or use the VCS Simulator to modify copies of configuration files, which can then be used to build the configuration.

Editing Configuration Files
It is recommended that you create a staging directory and copy configuration files to that directory before making modifications. This helps prevent more than one administrator from making configuration changes simultaneously.

You can use any text editor to modify the main.cf or types.cf files. The example in the offline configuration procedure diagram shows the vi editor because it is a commonly used UNIX editor.

As a best practice, copy the main.cf or types.cf files into a staging directory before modifying them.
• VCS configuration files are simple text files. You can modify these files with any text editor.
• Save and close the cluster configuration before you edit files. This ensures that the main.cf file matches the in-memory configuration.
• Check the syntax after editing configuration files.
• After making changes, copy files back into the configuration directory.


Using the VCS Simulator

You can use the VCS Simulator to create or modify copies of VCS configuration files that are located in a simulator-specific directory. You can also test the new or modified configuration using the simulator and then copy the test configuration files into the /etc/VRTSvcs/conf/config VCS configuration directory.

In addition to the advantage of using a familiar interface, using the VCS Simulator ensures that your configuration files do not contain syntax errors that can more easily be introduced when manually editing the files directly.

When you have completed the configuration, you can copy the files into the standard configuration directory and restart VCS to build that configuration in memory on cluster systems, as described earlier in the “Offline Configuration Procedure” sections.

• You can create and modify main.cf and types.cf files using the VCS Simulator.
• This method does not affect the cluster configuration files; simulator configuration files are created in a separate directory.
• You can also use the simulator to test any main.cf file before putting the configuration into the actual cluster environment.


Solving Offline Configuration Problems

Common Problems
Although there are several protection mechanisms built into the offline configuration process shown at the beginning of this lesson, you may see these types of problems if you miss a step or perform steps in the wrong order.
• All systems in the cluster are in a wait state as a result of starting VCS on a system where the main.cf file has a syntax problem.
• You have started VCS from a system with an old configuration file and now have the wrong configuration running.

Solutions are provided in the next sections.

Two common problems can occur if you do not follow the recommended offline configuration procedures:
• All systems enter a wait state when you start VCS because the main.cf file has a syntax error.
• You start the cluster from the wrong system, and an old configuration is built in memory.


All Systems in a Wait State

Consider this scenario:
• Your new main.cf file has a syntax problem.
• You forget to check the file with hacf -verify.
• You start VCS on the first system with hastart.
• The first system cannot build a configuration and goes into a wait state, such as STALE_ADMIN_WAIT or ADMIN_WAIT.
• You forget to verify that had is running on the first system and start all other cluster systems using hastart -stale.

All cluster systems are now waiting and cannot start VCS.

Note: This can also occur if you stop had on all cluster systems while the configuration is open.

Propagating an Old Configuration

If your new main.cf file has a syntax problem and you forget to check the file, that system (S1) goes into a wait state. If you then start VCS on another system (S2) using hastart without the -stale option, that system builds the cluster configuration in memory from its old main.cf file on disk. The first system (S1) then builds its configuration from the in-memory configuration on S2, moves the new main.cf file to main.cf.previous, and then writes the old configuration that is now in memory to the main.cf file.

Systems in Wait States

If all systems are in a STALE_ADMIN_WAIT or ADMIN_WAIT state:
1. Run hacf -verify dir to identify the line with the syntax problem.
2. Fix the syntax problem in the main.cf file.
3. Verify the configuration.
4. Force the system to perform a local build using hasys -force sys_name.
5. Wait while all other systems perform a remote build automatically, and then verify that the systems are running.
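As a sketch (assuming the standard configuration directory and a first system named S1):

hacf -verify /etc/VRTSvcs/conf/config
# fix the reported line in main.cf, then verify again
hacf -verify /etc/VRTSvcs/conf/config
hasys -force S1
hastatus -sum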


Recovering from an Old Configuration

If you are running an old cluster configuration because you started VCS on the wrong system first, you can recover the main.cf file on the system where you originally made the modifications using the backup main.cf.previous file created automatically by VCS.

You then use the offline configuration procedure to restart VCS using the recovered configuration file, as shown with example commands below.
1. Close the configuration, if open.
   haconf -dump -makero
2. Stop VCS on all systems and keep applications running.
   hastop -all -force
3. On the system where you originally modified the main.cf file, copy the main.cf.previous file to the main.cf file.
   cp main.cf.previous main.cf
4. Verify the configuration.
   hacf -verify .
5. Start VCS on this system using the hastart command.
   hastart
6. Verify that VCS is running using hastatus.
   hastatus -sum
7. Start VCS stale on all other systems to ensure that they wait to build their configuration from the first system.
   hastart -stale



Configuration File Backups

Each time you save the cluster configuration, VCS maintains backup copies of the main.cf and types.cf files.

Although it is always recommended that you copy configuration files before modifying them, you can revert to an earlier version of these files if you damage or lose a file.

# ls -l /etc/VRTSvcs/conf/config
total 140
-rw------- root other Mar 21 13:09 main.cf
-rw------- root other Mar 14 17:22 main.cf.14Mar2004.17:22:25
-rw------- root other Mar 16 18:00 main.cf.16Mar2004.18:00:54
-rw------- root other Mar 20 11:37 main.cf.20Mar2004.11:37:49
-rw------- root other Mar 21 13:09 main.cf.21Mar2004.13:09:11
-rw------- root other Mar 21 13:10 main.cf.previous
-rw------- root other Mar 21 13:09 types.cf
-rw------- root other Mar 14 17:22 types.cf.14Mar2004.17:22:25
-rw------- root other Mar 16 18:00 types.cf.16Mar2004.18:00:54
-rw------- root other Mar 20 11:37 types.cf.20Mar2004.11:37:49
-rw------- root other Mar 21 13:09 types.cf.21Mar2004.13:09:11
-rw------- root other Mar 21 13:10 types.cf.previous


Testing the Service Group

Service Group Testing Procedure
After you restart VCS throughout the cluster, bring the new service groups online and use the procedure shown in the slide to verify that your configuration additions or changes are correct.

Note: This process is slightly different from online configuration, which tests each resource before creating the next and before creating dependencies.

Use the procedures shown in the “Online Configuration of Service Groups” lesson to solve configuration problems, if any.

If you need to make additional modifications, you can use one of the online tools or modify the configuration files using the offline procedure.

Service Group Testing Procedure

[Flowchart: Start; bring the service group online; Online? If no, troubleshoot; if yes, Test Switching; Success? If no, Check Logs/Fix; if yes, Set Critical; Test Failover; Done.]

1. Bring service groups online.
2. Switch the service group to each system on which it is configured to run.
3. Change critical attributes, as appropriate.
4. Test failover.
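Using the AppSG group from this lesson's example, the first steps might look like this sketch (system names S1 and S2 are from the sample configuration):

hagrp -online AppSG -sys S1
hagrp -switch AppSG -to S2
haconf -makerw
hares -modify AppProcess Critical 1
haconf -dump -makero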


Summary

This lesson introduced a methodology for creating a service group by modifying the main.cf configuration file and restarting VCS to use the new configuration.

Next Steps

Now that you are familiar with a variety of tools and methods for configuring service groups, you can apply these skills to more complex configuration tasks.

Additional Resources
• VERITAS Cluster Server Bundled Agents Reference Guide: This guide describes each bundled agent in detail.
• VERITAS Cluster Server User's Guide: This guide provides detailed information on procedures and concepts for configuring and managing VCS clusters.
• VERITAS Cluster Server Command Line Quick Reference: This card provides the syntax rules for the most commonly used VCS commands.

Lesson Summary

Key Points
– You can use a text editor or the VCS Simulator to modify VCS configuration files.
– Apply a methodology for modifying and testing a VCS configuration.
Reference Materials
– VERITAS Cluster Server Bundled Agents Reference Guide
– VERITAS Cluster Server User's Guide
– VERITAS Cluster Server Command Line Quick Reference


Lab 8: Offline Configuration of Service Groups

Labs and solutions for this lesson are located on the following pages.

Appendix A provides brief lab instructions for experienced students.• “Lab 8 Synopsis: Offline Configuration of a Service Group,” page A-38

Appendix B provides step-by-step lab instructions.• “Lab 8: Offline Configuration of a Service Group,” page B-57

Appendix C provides complete lab instructions and solutions.• “Lab 8 Solutions: Offline Configuration of a Service Group,” page C-89

Goal
The purpose of this lab is to add a service group by copying and editing the definition in main.cf for nameSG1.

Prerequisites
Students must coordinate when stopping and restarting VCS.

Results
The new service group defined in the design worksheet is running and tested on both cluster systems.

Lab 8: Offline Configuration of a Service Group

[Diagram: two service groups. nameSG1 contains nameProcess1, nameIP1, nameNIC1, nameMount1, nameVol1, and nameDG1; nameSG2 contains nameProcess2, nameIP2, nameNIC2, nameMount2, nameVol2, and nameDG2.]

Working together, follow the offline configuration procedure. Alternately, work alone and use the GUI to create a new service group.


Lesson 9
Sharing Network Interfaces


Introduction

Overview
This lesson describes how to create a parallel service group containing networking resources shared by multiple service groups.

Importance
If you have multiple service groups that use the same network interface, you can reduce monitoring overhead by using Proxy resources instead of NIC resources. If you have many NIC resources, consider using Proxy resources to minimize any potential performance impacts of monitoring.

Lesson Introduction

Lesson 1: VCS Building Blocks
Lesson 2: Preparing a Site for VCS
Lesson 3: Installing VCS
Lesson 4: VCS Operations
Lesson 5: Preparing Services for VCS
Lesson 6: VCS Configuration Methods
Lesson 7: Online Configuration of Service Groups
Lesson 8: Offline Configuration of Service Groups
Lesson 9: Sharing Network Interfaces
Lesson 10: Configuring Notification
Lesson 11: Configuring VCS Response to Faults
Lesson 12: Cluster Communications
Lesson 13: System and Communication Faults
Lesson 14: I/O Fencing
Lesson 15: Troubleshooting


Outline of Topics
• Sharing Network Interfaces
• Alternate Network Configurations
• Using Parallel Service Groups
• Localizing Resource Attributes

Lesson Topics and Objectives

After completing this lesson, you will be able to:
• Sharing Network Interfaces: Describe how multiple service groups can share network interfaces.
• Alternate Network Configurations: Describe alternate network configurations.
• Using Parallel Service Groups: Use parallel service groups with network resources.
• Localizing Resource Attributes: Localize resource attributes.


Sharing Network Interfaces

Conceptual View
This example shows a cluster system running three service groups using the same network interface. Each service group has a unique NIC resource with a unique name, but the Device attribute for each NIC resource is the same.

Because each service group has its own NIC resource for the interface, VCS monitors the same network interface—qfe1—many times, creating unnecessary overhead and network traffic.

Three service groups contain NIC resources to monitor the same network interface (qfe1) on the system.
• The NIC agent runs the monitor cycle every 60 seconds for each NIC resource that is online.
• Additional network traffic is generated by multiple monitor cycles running for the same device.


Configuration View

The example shows a configuration with many service groups using the same network interface specified in the NIC resource. Each service group has a unique NIC resource with a unique name, but the Device attribute for all is qfe1 in this Solaris example.

In addition to the overhead of many monitor cycles for the same resource, a disadvantage of this configuration is the effect of changes in NIC hardware. If you must change the network interface (for example in the event the interface fails), you must change the Device attribute for each NIC resource monitoring that interface.

Configuration View (Solaris)

Each service group's main.cf excerpt defines its own IP and NIC resources on the same device; for example:

DBSG:

IP DBIP (
    Device = qfe1
    Address = "10.10.21.199"
)
NIC DBNIC (
    Device = qfe1
)
DBIP requires DBNIC

WebSG:

IP WebIP (
    Device = qfe1
    Address = "10.10.21.198"
)
NIC WebNIC (
    Device = qfe1
)
WebIP requires WebNIC

Similar IP and NIC resource pairs appear in the NFSSG and Ora1SG service groups, all specifying Device = qfe1.


Alternate Network Configurations

Using Proxy Resources
You can use a Proxy resource to allow multiple service groups to monitor the same network interfaces. This reduces the network traffic that results from having multiple NIC resources in different service groups monitor the same interface.

Using Proxy Resources

[Diagram: WebSG contains WebProcess, WebIP, WebNIC, WebMount, WebVol, and WebDG. DBSG contains DBProcess, DBIP, DBMount, DBVol, and DBDG, with a DBProxy resource in place of a NIC resource; DBIP depends on DBProxy.]

A Proxy resource mirrors the state of another resource (for example, NIC).


The Proxy Resource Type

The Proxy resource mirrors the status of another resource in a different service group. The required attribute, TargetResName, is the name of the resource whose status is reflected by the Proxy resource.

Optional Attributes

TargetSysName specifies the name of the system on which the target resource status is monitored. If no system is specified, the local system is used as the target system.

Resource Definition (Sample Value)

Service Group Name: DBSG
Resource Name: DBProxy
Resource Type: Proxy
Required Attributes: TargetResName = WebNIC

The Proxy agent monitors the status of the specified resource on the local system, unless TargetSysName is specified. TargetResName must be in a separate service group.

main.cf:

Proxy DBProxy (
    Critical = 0
    TargetResName = WebNIC
)
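The same resource could also be added online with the CLI; a sketch, assuming the DBSG group already exists:

haconf -makerw
hares -add DBProxy Proxy DBSG
hares -modify DBProxy Critical 0
hares -modify DBProxy TargetResName WebNIC
hares -modify DBProxy Enabled 1
haconf -dump -makero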


Using Parallel Service Groups

You can further refine your configuration for multiple service groups that use the same network interface by creating a service group that provides only network services. This service group is configured as a parallel service group because the NIC resource is persistent and can be online on multiple systems. With this configuration, all the other service groups are configured with a Proxy to the NIC resource on their local system.

This type of configuration can be easier to monitor at an operations desk, because the state of the parallel service group can be used to indicate the state of the network interfaces on each system.

Determining Service Group Status
Service groups that do not include any OnOff resources as members are not reported as online, even if their member resources are online, because the status of the None and OnOnly resources is not considered when VCS reports whether a service group is online.

Network Resources in a Parallel Service Group

[Diagram: On each of systems S1 and S2, DBSG contains DBIP depending on DBProxy, and WebSG contains WebIP depending on WebProxy; a parallel service group, NetSG, contains the NetNIC resource that the proxies target.]

How do you determine the status of the parallel service group with only a persistent resource?


Phantom Resources

The Phantom resource is used to report the actual status of a service group that consists of only persistent resources. A service group shows an online status only when all of its nonpersistent resources are online. Therefore, if a service group has only persistent resources, VCS considers the group offline, even if the persistent resources are running properly. When a Phantom resource is added, the status of the service group is shown as online.

Note: Use this resource only with parallel service groups.

[Diagram: NetSG on S1 and S2 now contains NetNIC and NetPhantom; DBSG and WebSG contain DBIP/DBProxy and WebIP/WebProxy as before.]

A Phantom resource can be used to enable VCS to report the online status of a service group with only persistent resources.
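To confirm that NetSG reports an online state once the Phantom resource is added, you can check the group state; a quick sketch:

hagrp -state NetSG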


The Phantom Resource Type

The Phantom resource enables VCS to determine the status of service groups with no OnOff resources, that is, service groups with only persistent resources.
• Service groups that do not have any OnOff resources are not brought online unless they include a Phantom resource.
• The Phantom resource is used only in parallel service groups.

Resource Definition (Sample Value)

Service Group Name: NetSG
Resource Name: NetPhantom
Resource Type: Phantom
Required Attributes: none

The Phantom resource determines the status of a service group and requires no attributes.

main.cf:

Phantom NetPhantom ()


Configuring a Parallel Service Group

You cannot change an existing failover service group that contains resources to a parallel service group except by using the offline configuration procedure. In this case, you can add the Parallel attribute definition to the service group, as displayed in the diagram.

To create a parallel service group in a running cluster (see the CLI sketch after the note below):
1. Create a new service group using either the GUI or CLI.
2. Set the Parallel attribute to 1 (true).
3. Add resources.

Set the critical attributes after you have verified that the service group is online on all systems in SystemList.

Note: If you have a service group that already contains resources, you must set the Parallel attribute by editing the main.cf file and restarting VCS with the modified configuration file.
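A CLI sketch of these steps, using the NetSG example from this lesson (the Device value qfe1 matches the slide; adjust names for your cluster):

haconf -makerw
hagrp -add NetSG
hagrp -modify NetSG SystemList S1 0 S2 1
hagrp -modify NetSG Parallel 1
hagrp -modify NetSG AutoStartList S1 S2
hares -add NetNIC NIC NetSG
hares -modify NetNIC Device qfe1
hares -modify NetNIC Enabled 1
hares -add NetPhantom Phantom NetSG
hares -modify NetPhantom Enabled 1
haconf -dump -makero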

Service Group Definition (Sample Value)

Group: NetSG
Required Attributes: Parallel = 1; SystemList = { S1 = 0, S2 = 1 }
Optional Attributes: AutoStartList = { S1, S2 }

main.cf (Solaris):

group NetSG (
    SystemList = { S1 = 0, S2 = 1 }
    AutoStartList = { S1, S2 }
    Parallel = 1
)

NIC NetNIC (
    Device = qfe1
)

Phantom NetPhantom ()

• Use an offline process to set the Parallel attribute if the group has resources.
• Use an online method to set Parallel before adding resources.


Properties of Parallel Service Groups

Parallel service groups are managed like any other service group in VCS. The group is only started on a system if that system is listed in the AutoStartList and the SystemList attributes. The difference with a parallel service group is that it starts on multiple systems simultaneously if more than one system is listed in AutoStartList.

A parallel service group can also fail over if the service group faults on a system and there is an available system (listed in the SystemList attribute) that is not already running the service group.

Parallel service groups are handled the same way as failover service groups in most cases. All groups:
• Start up only on systems defined in AutoStartList
• Fail over to target systems defined in SystemList (where they are not already online)
• Are managed by the GUI/CLI in the same manner

The difference is that a parallel group can be online on more than one system without causing a concurrency fault.


Localizing Resource Attributes

An attribute whose value applies to all systems is global in scope. An attribute whose value applies on a per-system basis is local in scope. By default, all attributes are global. Some attributes can be localized to enable you to specify different values for different systems.

Localizing a NIC Resource Attribute

In the example displayed in the slide, the Device attribute for the NIC resource is localized to enable you to specify a different interface for each system.

After creating the resource, you can localize attribute values by using the hares command, the GUI, or an offline configuration method. For example, when using the CLI, type:

hares -local NetNIC Device
hares -modify NetNIC Device qfe1 -sys S1
hares -modify NetNIC Device hme0 -sys S2

Any attribute can be localized. Attributes of network-related resources are common examples of localized attributes.

You can localize the Device attribute for NIC resources when systems have different network interfaces.

[Diagram: DBSG and WebSG on systems S1 and S2 reference NetNIC in the parallel NetSG group; the NIC device is qfe1 on S1 and hme0 on S2.]

NIC NetNIC (
    Device@S1 = qfe1
    Device@S2 = hme0
)


Summary

This lesson introduced a methodology for sharing network resources among service groups.

Next Steps

Now that you are familiar with a variety of tools and methods for configuring service groups, you can apply these skills to more complex configuration tasks.

Additional Resources
• VERITAS Cluster Server Bundled Agents Reference Guide
  This guide describes each bundled agent in detail.
• VERITAS Cluster Server User's Guide
  This guide describes the behavior of parallel service groups and the advantages of using Proxy resources.

Lesson Summary
Key Points
– Proxy resources reflect the state of another resource without additional monitoring overhead.
– Network resources can be contained in a parallel service group for efficiency.
Reference Materials
– VERITAS Cluster Server Bundled Agents Reference Guide
– VERITAS Cluster Server User's Guide


Lab 9: Creating a Parallel Service Group

Labs and solutions for this lesson are located on the following pages.

Appendix A provides brief lab instructions for experienced students.
• “Lab 9 Synopsis: Creating a Parallel Service Group,” page A-47
Appendix B provides step-by-step lab instructions.
• “Lab 9: Creating a Parallel Service Group,” page B-73
Appendix C provides complete lab instructions and solutions.
• “Lab 9 Solutions: Creating a Parallel Service Group,” page C-109

Goal
The purpose of this lab is to add a parallel service group to monitor the NIC resource and replace the NIC resources in the failover service groups with Proxy resources.

Prerequisites
Students must coordinate when stopping and restarting VCS.

Results
A new parallel service group defined in the design worksheet is running on both cluster systems. The NIC resources in all other service groups are replaced with Proxy resources.

Lab 9: Creating a Parallel Service Group

[Lab diagram: Failover groups nameSG1 and nameSG2 each contain Process, IP, Mount, Vol, DG, and Proxy resources (nameProxy1, nameProxy2); the Proxy resources point to the NetworkNIC resource in the parallel NetworkSG group, which also contains NetworkPhantom.]


Lesson 10: Configuring Notification


Introduction

Overview
This lesson describes how to configure VCS to provide event notification using e-mail, SNMP traps, and triggers.

Importance
In order to maintain a high availability cluster, you must be able to detect and fix problems when they occur. By configuring notification, you can have VCS proactively notify you when certain events occur.

Course Overview

Lesson 1: VCS Building Blocks
Lesson 2: Preparing a Site for VCS
Lesson 3: Installing VCS
Lesson 4: VCS Operations
Lesson 5: Preparing Services for VCS
Lesson 6: VCS Configuration Methods
Lesson 7: Online Configuration of Service Groups
Lesson 8: Offline Configuration of Service Groups
Lesson 9: Sharing Network Interfaces
Lesson 10: Configuring Notification
Lesson 11: Configuring VCS Response to Faults
Lesson 12: Cluster Communications
Lesson 13: System and Communication Faults
Lesson 14: I/O Fencing
Lesson 15: Troubleshooting


Outline of Topics
• Notification Overview
• Configuring Notification
• Using Triggers for Notification

Lesson Topics and Objectives

After completing this lesson, you will be able to:

Topic                              Objective
Notification Overview              Describe how VCS provides notification.
Configuring Notification           Configure notification using the NotifierMngr resource.
Using Triggers for Notification    Use triggers to provide notification.


Notification Overview

When VCS detects certain events, you can configure the notifier to:
• Generate an SNMP (V2) trap to specified SNMP consoles.
• Send an e-mail message to designated recipients.

Message Queue
VCS ensures that no event messages are lost while the VCS engine is running, even if the notifier daemon stops or is not started. The had daemons throughout the cluster communicate to maintain a replicated message queue.

If the service group with notifier configured as a resource fails on one of the nodes, notifier fails over to another node in the cluster. Because the message queue is guaranteed to be consistent and replicated across nodes, notifier can resume message delivery from where it left off after it fails over to the new node.

Messages are stored in the queue until one of these conditions is met:
• The notifier daemon sends an acknowledgement to had that at least one recipient has received the message.
• The queue is full. The queue is circular; the oldest message is deleted to make room for the newest message.
• Messages that remain in the queue for one hour are deleted if notifier is unable to deliver them to the recipient.

Note: Before the notifier daemon connects to had, messages are stored permanently in the queue until one of the last two conditions is met.

How VCS performs notification:
1. The had daemon sends a message to the notifier daemon when an event occurs.
2. The notifier daemon formats the event message and sends an SNMP trap or e-mail message (or both) to designated recipients.

[Diagram: On each cluster system, had maintains a replicated message queue. The notifier daemon, managed by a NotifierMngr resource (paired with a NIC resource), receives messages from the local had and sends them to SMTP and SNMP recipients.]


Message Severity Levels

Event messages are assigned one of four severity levels by notifier:
• Information: Normal cluster activity is occurring, such as resources being brought online.
• Warning: Cluster or resource states are changing unexpectedly, such as a resource in an unknown state.
• Error: Services are interrupted, such as a service group faulting that cannot be failed over.
• SevereError: Potential data corruption is occurring, such as a concurrency violation.

The administrator can configure notifier to specify which recipients are sent messages based on the severity level.

A complete list of events and corresponding severity levels is provided in the “Job Aids” appendix.

[Diagram: had daemons feed notifier, which routes messages by severity to SMTP and SNMP recipients. Example events: a service group coming online (Information); a resource or agent fault; a concurrency violation (SevereError). A complete list of events and severity levels is included in the “Job Aids” appendix.]


Configuring Notification

While you can start and stop the notifier daemon manually outside of VCS, you can make the notifier component highly available by placing the daemon under VCS control.

Carry out the following steps to configure highly available notification within the cluster:
1 Add a NotifierMngr type of resource to the ClusterService group.
2 If SMTP notification is required:
  a Modify the SmtpServer and SmtpRecipients attributes of the NotifierMngr type of resource.
  b If desired, modify the ResourceOwner attribute of individual resources (described later in the lesson).
  c You can also specify a GroupOwner e-mail address for each service group.
3 If SNMP notification is required:
  a Modify the SnmpConsoles attribute of the NotifierMngr type of resource.
  b Verify that the SnmpdTrapPort attribute value matches the port configured for the SNMP console. The default is port 162.
  c Configure the SNMP console to receive VCS traps (described later in the lesson).
4 Modify any other optional attributes of the NotifierMngr type of resource, as desired.

To summarize the configuration flow:
• Add a NotifierMngr type of resource to the ClusterService group.
• If SMTP notification is required: modify the SmtpServer and SmtpRecipients attributes; optionally, modify the ResourceOwner and GroupOwner attributes.
• If SNMP notification is required: modify the SnmpConsoles attribute of NotifierMngr, and configure the SNMP console to receive VCS traps.
• Modify any other optional attributes of NotifierMngr as desired.

Note: A NotifierMngr resource is added to only one service group, the ClusterService group.
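As an illustration of the result, a NotifierMngr resource that combines SMTP and SNMP notification might look like the following in main.cf. The host names and recipient address are placeholders, not values from this course:

NotifierMngr notifier (
    SnmpConsoles = { "snmpconsole.example.com" = Error }
    SmtpServer = "smtp.example.com"
    SmtpRecipients = { "admin@example.com" = Warning }
)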


Starting the Notifier Manually

To test the notification component, you can start the notifier process from the command line on a system in the cluster. Note that notification is not under VCS control when the notifier process is started from the command line.

VCS notification is configured by starting the notifier daemon with arguments specifying recipients and corresponding message severity levels. For example:

notifier -t m=smtp.acme.com,e="[email protected]",l=Warning

In this example, an e-mail message is sent to [email protected] for each VCS event of severity level Warning or higher (including Error and SevereError). The notifier arguments shown in this example are:

-t   Indicates SMTP server configurations
m    Specifies the system name of the SMTP mail server
e    Specifies recipients’ e-mail addresses
l    Indicates the lowest event message severity level to include

This example shows a notifier configuration for SNMP:

notifier -s m=north -s m=south,p=2000,l=Error,c=company

The notifier arguments shown in this example are:

-s                                 Indicates SNMP server configurations
m=north                            Sends SNMP traps of all severity levels to the north system at the default SNMP port and community value (public)
m=south,p=2000,l=Error,c=company   Sends Error and SevereError traps to the south system at port 2000 and community value company

See the manual pages for notifier and hanotify for a complete description of notification configuration options.


The NotifierMngr Resource Type

The notifier daemon can run on only one system in the cluster, where it processes messages from the local had daemon. If the notifier daemon fails on that system, the NotifierMngr agent detects the failure and migrates the service group containing the NotifierMngr resource to another system.

Because the message queue is replicated throughout the cluster, any system that is a target for the service group has an identical queue. When the NotifierMngr resource is brought online, had sends the queued messages to the notifier daemon.

Adding a NotifierMngr Resource

You can add a NotifierMngr resource using one of the usual methods for adding resources to service groups:
• Edit the main.cf file and restart VCS.
• Use the Cluster Manager graphical user interface to add the resource dynamically.
• Use the hares command to add the resource to a running cluster.

Note: Before modifying resource attributes, ensure that you take the resource offline and disable it. The notifier daemon must be stopped and restarted with new parameters in order for changes to take effect.

Resource Definition      Sample Value
Resource Name            notifier
Resource Type            NotifierMngr
Service Group Name       ClusterService
Required Attributes*:
  SmtpServer             smtp.veritas.com
  SmtpRecipients         [email protected] = SevereError
  PathName               /opt/VRTSvcs/bin/notifier (required on AIX)

*Note: Either the SnmpConsoles or the SmtpXxx attributes must be specified. Both may be specified.

main.cf:

NotifierMngr notifier (
    SmtpServer = "smtp.veritas.com"
    SmtpRecipients = { "[email protected]" = SevereError }
    PathName = "/opt/VRTSvcs/bin/notifier"
)


The slide displays examples of the required attributes for Solaris and HP-UX platforms. The NotifierMngr resource on the AIX platform also requires an attribute called PathName, which is the absolute pathname of the notifier daemon.

Optional Attributes
• EngineListeningPort: The port that the VCS engine uses for listening. The default is 14141.
  Note: This optional attribute exists for VCS 3.5 for Solaris and for HP-UX. This attribute does not exist for VCS 3.5 for AIX or VCS 4.0 for Solaris.
• MessagesQueue: The number of messages in the queue. The default is 30.
• NotifierListeningPort: Any valid unused TCP/IP port number. The default is 14144.
• SnmpConsoles: The fully qualified host name of the SNMP console and the severity level. SnmpConsoles is a required attribute if SMTP is not specified.
• SnmpCommunity: The community ID for the SNMP manager. The default is public.
• SnmpdTrapPort: The port to which SNMP traps are sent. The value specified for this attribute is used for all consoles if more than one SNMP console is specified. The default is 162.
• SmtpFromPath: A valid e-mail address, if a custom e-mail address is desired for the FROM: field in the e-mail sent by notifier.
• SmtpReturnPath: A valid e-mail address, if a custom e-mail address is desired for the Return-Path: <> field in the e-mail sent by notifier.
• SmtpServerTimeout: The time, in seconds, that notifier waits for a response from the mail server for the SMTP commands it has sent. This value can be increased if the mail server takes too long to reply to the SMTP commands sent by notifier. The default is 10.
• SmtpServerVrfyOff: A toggle for sending SMTP VRFY requests. Setting this value to 1 results in notifier not sending an SMTP VRFY request to the mail server specified in the SmtpServer attribute while sending e-mails. Set this value to 1 if your mail server does not support the SMTP VRFY command. The default is 0.


Configuring the ResourceOwner Attribute

You can set the ResourceOwner attribute to define an owner for a resource. After the attribute is set to a valid e-mail address and notification is configured, an e-mail message is sent to the defined recipient when one of these resource-related events occurs:
• ResourceStateUnknown
• ResourceMonitorTimeout
• ResourceNotGoingOffline
• ResourceRestartingByAgent
• ResourceWentOnlineByItself
• ResourceFaulted

VCS also creates an entry in the log file in addition to sending an e-mail message. For example:

2003/12/03 11:23:48 VCS INFO V-16-1-10304 Resource file1 (Owner=daniel, Group=testgroup) is offline on machine1

ResourceOwner can be specified as an e-mail ID ([email protected]) or a user account (daniel). If a user account is specified, the e-mail address is constructed as login@smtp_system, where smtp_system is the system that was specified in the SmtpServer attribute of the NotifierMngr resource.

To set the ResourceOwner attribute:

hares -modify res_name ResourceOwner daniel


Configuring the GroupOwner Attribute

You can set the GroupOwner attribute to define an owner for a service group. After the attribute is set to a valid e-mail address and notification is configured, an e-mail message is sent to the defined recipient when one of these group-related events occurs:
• The service group caused a concurrency violation.
• The service group has faulted and cannot be failed over anywhere.
• The service group is online.
• The service group is offline.
• The service group is autodisabled.
• The service group is restarting.
• The service group is being switched.
• The service group is restarting in response to a persistent resource being brought online.

VCS also creates an entry in the log file of the form displayed in the slide in addition to sending an e-mail message.

GroupOwner can be specified as an e-mail ID ([email protected]) or a user account (chris). If a user account is specified, the e-mail address is constructed as login@smtp_system, where smtp_system is the system that was specified in the SmtpServer attribute of the NotifierMngr resource.

An entry is also created in the log file. For example:

2003/12/03 11:23:48 VCS INFO V-16-1-10304 Group SG1 (Owner=chris, Group=testgroup) is offline on sys1

To set the GroupOwner attribute:

hagrp -modify grp_name GroupOwner chris

Examples of service group events that cause VCS to send notification to GroupOwner: Faulted, Concurrency Violation, Autodisabled.


Configuring the SNMP Console

To enable an SNMP management console to recognize VCS traps, you must load the VCS MIB into the console. The textual MIB is located in the /etc/VRTSvcs/snmp/vcs.mib file.

For HP OpenView Network Node Manager (NNM), you must merge the VCS SNMP trap events contained in the /etc/VRTSvcs/snmp/vcs_trapd file. To merge the VCS events, type:

xnmevents -merge vcs_trapd

SNMP traps sent by VCS are then displayed in the HP OpenView NNM SNMP console.

To summarize:
• Load the MIB for VCS traps into the SNMP management console.
• For HP OpenView Network Node Manager, merge the events: xnmevents -merge vcs_trapd
• The VCS SNMP configuration files, vcs.mib and vcs_trapd, are located in /etc/VRTSvcs/snmp.


Using Triggers for Notification

VCS provides an additional method for notifying the administrator of important events. When VCS detects certain events, you can set a trigger to notify an administrator or perform other actions. You can use event triggers in place of, or in conjunction with, notification.

Triggers are executable programs, batch files, or Perl scripts that reside in $VCS_HOME/bin/triggers. The script name must be one of the predefined event types supported by VCS that are shown in the table.

For example, the ResNotOff trigger is a Perl script named resnotoff that resides in $VCS_HOME/bin/triggers.

Most triggers are configured (enabled) if the trigger program is present; no other configuration is necessary or possible. Each system in the cluster must have the script in this location.

These triggers apply to the entire cluster. For example, ResFault applies to all resources running on the cluster.

Some triggers are enabled by an attribute. Examples are:
• PreOnline: If the PreOnline attribute for the service group is set to 1, the PreOnline trigger is run as part of the online procedure, before the service group is actually brought online.
• ResStateChange: If the TriggerResStateChange attribute is set to 1 for a service group, the ResStateChange trigger is enabled for that service group.
These types of triggers must also have a corresponding script present in the $VCS_HOME/bin/triggers directory.

Triggers are scripts run by VCS when certain events occur and can be used as an alternate method of notification.

Examples of triggers enabled by the presence of a script file (these apply cluster-wide):
– ResFault
– ResNotOff
– SysOffline
– PostOffline
– PostOnline

Triggers configured by service group attributes (these apply only to service groups where they are enabled):
– PreOnline
– ResStateChange

Note: Other uses of triggers are described later in this course.
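As a rough illustration, a minimal trigger could be a shell script installed as $VCS_HOME/bin/triggers/resfault on every system. The argument order and log path shown here are assumptions; verify the exact trigger arguments for your release in the VERITAS Cluster Server User's Guide:

#!/bin/sh
# resfault trigger sketch: log resource faults to a local file.
# Assumed arguments: $1 = system where the resource faulted,
#                    $2 = name of the faulted resource.
SYSTEM="$1"
RESOURCE="$2"
# The log path is a placeholder; choose one appropriate for your site.
echo "`date`: resource ${RESOURCE} faulted on ${SYSTEM}" >> /var/tmp/vcs_resfault.log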


Summary

This lesson described how to configure VCS to provide notification using e-mail and SNMP traps.

Next Steps

The next lesson describes how VCS responds to resource faults and the options you can configure to modify the default behavior.

Additional Resources
• VERITAS Cluster Server Bundled Agents Reference Guide
  This document provides important reference information for the VCS agents bundled with the VCS software.
• VERITAS Cluster Server User's Guide
  This document provides information about all aspects of VCS configuration.

Lesson Summary
Key Points
– You can choose from a variety of notification methods.
– Customize the notification facilities to meet your specific requirements.
Reference Materials
– VERITAS Cluster Server Bundled Agents Reference Guide
– VERITAS Cluster Server User's Guide


Lab 10: Configuring Notification

Labs and solutions for this lesson are located on the following pages.

Appendix A provides brief lab instructions for experienced students.
• “Lab 10 Synopsis: Configuring Notification,” page A-52
Appendix B provides step-by-step lab instructions.
• “Lab 10: Configuring Notification,” page B-85
Appendix C provides complete lab instructions and solutions.
• “Lab 10 Solutions: Configuring Notification,” page C-125

Goal
The purpose of this lab is to configure notification.

Prerequisites
Students work together to add a NotifierMngr resource to the ClusterService group.

Results
The ClusterService group now has a NotifierMngr resource, and notification is working.

Lab 10: Configuring Notification

[Lab diagram: The ClusterService group (alongside nameSG1 and nameSG2) gains a NotifierMngr resource. An optional lab configures the resfault, nofailover, and resadminwait triggers.]

SMTP Server: ___________________________________


Lesson 11: Configuring VCS Response to Resource Faults


Introduction

Overview
This lesson describes how VCS responds to resource faults and introduces various components, such as resource type attributes, that you can configure to customize the VCS engine's response to resource faults. This lesson also describes how to recover after a resource is put into a FAULTED or ADMIN_WAIT state.

Importance
In order to maintain a high availability cluster, you must understand how service groups behave in response to resource failures and how you can customize this behavior. This enables you to configure the cluster optimally for your computing environment.

Lesson Introduction

Lesson 1: VCS Building Blocks
Lesson 2: Preparing a Site for VCS
Lesson 3: Installing VCS
Lesson 4: VCS Operations
Lesson 5: Preparing Services for VCS
Lesson 6: VCS Configuration Methods
Lesson 7: Online Configuration of Service Groups
Lesson 8: Offline Configuration of Service Groups
Lesson 9: Sharing Network Interfaces
Lesson 10: Configuring Notification
Lesson 11: Configuring VCS Response to Faults
Lesson 12: Cluster Communications
Lesson 13: System and Communication Faults
Lesson 14: I/O Fencing
Lesson 15: Troubleshooting


Outline of Topics
• VCS Response to Resource Faults
• Determining Failover Duration
• Controlling Fault Behavior
• Recovering from Resource Faults
• Fault Notification and Event Handling

Lesson Topics and Objectives

After completing this lesson, you will be able to:

Topic                                   Objective
VCS Response to Resource Faults         Describe how VCS responds to resource faults.
Determining Failover Duration           Determine failover duration for a service group.
Controlling Fault Behavior              Control fault behavior using resource type attributes.
Recovering from Resource Faults         Recover from resource faults.
Fault Notification and Event Handling   Configure fault notification and triggers.


VCS Response to Resource Faults

Failover Decisions and Critical Resources

Critical resources define the basis for failover decisions made by VCS. When the monitor entry point for a resource returns with an unexpected offline status, the action taken by the VCS engine depends on whether the resource is critical.

By default, if a critical resource in a failover service group faults, VCS determines that the service group is faulted and fails the service group over to another cluster system, as defined by a set of service group attributes. The rules for selecting a failover target are described in the “Service Group Workload Management” lesson in the High Availability Using VERITAS Cluster Server for UNIX, Implementing Local Clusters course.

The default failover behavior for a service group can be modified using one or more optional service group attributes. Failover determination and behavior are described throughout this lesson.

One or more resources in a service group must be set to critical in order for automatic failover to occur in response to a fault.

The default VCS behavior for a failover service group is:
– If a critical resource faults, the service group fails over.
– If any critical resource is taken offline as a result of a fault, the service group fails over.
Other attributes can be set to modify this behavior, as described throughout this lesson.


How VCS Responds to Resource Faults by Default

VCS responds in a specific and predictable manner to faults. When VCS detects a resource failure, it performs the following actions:
1 Instructs the agent to execute the clean entry point for the failed resource to ensure that the resource is completely offline. Both the service group and the resource transition to a FAULTED state.
2 Takes all resources in the path of the fault offline, starting from the faulted resource up to the top of the dependency tree.
3 If an online critical resource is part of the path that faulted or was taken offline, takes the entire group offline in preparation for failover. If no online critical resources are affected, no further action occurs.
4 Attempts to start the service group on another system in the SystemList attribute, according to the FailOverPolicy defined for that service group and the relationships between multiple service groups. Configuring failover policies to control how a failover target is chosen, and the impact of service group interactions during failover, are discussed in detail later in the course.
Note: The state of the group on the new system prior to failover must be GROUP_OFFLINE (not faulted).
5 If no other systems are available, the service group remains offline.

[Flowchart: Default behavior when a resource goes offline unexpectedly: execute the clean entry point for the failed resource; fault the resource; take all resources in the path offline. If a critical online resource is in the path, fault the service group and take the entire service group offline; otherwise, keep the group partially online. If a failover target is available, bring the service group online elsewhere; otherwise, keep the service group offline.]


VCS also executes certain event triggers and carries out notification while it performs the tasks displayed on the slide in response to resource faults. The roles of notification and event triggers in resource faults are explained in detail later in this lesson.


The Impact of Service Group Attributes on Failover

Several service group attributes can be used to change the default behavior of VCS while responding to resource faults.

Frozen or TFrozen

These service group attributes are used to indicate that the service group is frozen due to an administrative command. When a service group is frozen, all agent actions except for monitor are disabled. If the service group is temporarily frozen using the hagrp -freeze group command, the TFrozen attribute is set to 1, and if the service group is persistently frozen using the hagrp -freeze group -persistent command, the Frozen attribute is set to 1. When the service group is unfrozen using the hagrp -unfreeze group [-persistent] command, the corresponding attribute is set back to the default value of 0.

ManageFaults

This service group attribute can be used to prevent VCS from taking any automatic actions whenever a resource failure is detected. Essentially, ManageFaults determines whether VCS or an administrator handles faults for a service group.

If ManageFaults is set to the default value of ALL, VCS manages faults by executing the clean entry point for that resource to ensure that the resource is completely offline, as shown previously. The default setting of ALL provides the same behavior as VCS 3.5.

[Flowchart: When a resource goes offline unexpectedly: if the group is frozen, do nothing except fault the resource and the service group. Otherwise, check ManageFaults: if NONE, place the resource in an ADMIN_WAIT state; if ALL (the default path), execute the clean entry point and fault the resource and the service group. Then check FaultPropagation: if 0, do not take any other resource offline; if 1 (the default path), take all resources in the path offline.]


If this attribute is set to NONE, VCS places the resource in an ADMIN_WAIT state and waits for administrative intervention. This is often used for service groups that manage database instances. You may need to leave the database in its FAULTED state in order to perform problem analysis and recovery operations.

Note: This attribute is set at the service group level. This means that any resource fault within that service group requires administrative intervention if the ManageFaults attribute for the service group is set to NONE.

FaultPropagation

The FaultPropagation attribute determines whether VCS evaluates the effects of a resource fault on parents of the faulted resource.

If ManageFaults is set to ALL, VCS runs the clean entry point for the faulted resource and then checks the FaultPropagation attribute of the service group. If this attribute is set to 0, VCS does not take any further action. In this case, VCS fails over the service group only on system failures and not on resource faults.

The default value is 1, which means that VCS continues through the failover process shown in the next section. This is the same behavior as VCS 3.5 and earlier releases.

Notes:
• The ManageFaults and FaultPropagation attributes of a service group were introduced in VCS version 3.5 for AIX and VCS version 3.5 MP1 (or 2.0 P4) for Solaris. VCS 3.5 for HP-UX and any earlier versions of VCS on other platforms do not have these attributes. If these attributes do not exist, the VCS response to resource faults is the same as with the default values of these attributes.
• ManageFaults and FaultPropagation have essentially the same effect when enabled: service group failover is suppressed. The difference is that when ManageFaults is set to NONE, the clean entry point is not run and the resource is put in an ADMIN_WAIT state.


AutoFailOver

This attribute determines whether automatic failover takes place when a resource or system faults. The default value of 1 indicates that the service group should be failed over to other available systems if at all possible. However, if the attribute is set to 0, no automatic failover is attempted for the service group, and the service group is left in an OFFLINE|FAULTED state.
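As a combined sketch, the following commands change the fault-handling attributes described above for a single group; the group name appSG is a placeholder. In practice you would typically set only the attribute you need, because each one on its own suppresses failover at a different stage:

haconf -makerw
hagrp -modify appSG ManageFaults NONE     # place faulted resources in ADMIN_WAIT
hagrp -modify appSG FaultPropagation 0    # do not take parent resources offline
hagrp -modify appSG AutoFailOver 0        # do not fail the group over automatically
haconf -dump -makero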

[Flowchart, continued: If a critical online resource is in the path, take the entire group offline; otherwise, keep the group partially online. Then check AutoFailOver: if 1 (the default), choose a failover target from the SystemList based on FailOverPolicy and bring the service group online elsewhere if a target is available; if 0, or if no target is available, keep the service group offline.]


Practice: How VCS Responds to a Fault

The service group illustrated in the slide demonstrates how VCS responds to faults. In each case (A, B, C, and so on), assume that the group is configured as listed and that the service group is not frozen. As an exercise, determine what occurs if the fourth resource in the group fails.

For example, in case A, the clean entry point is executed for resource 4 to ensure that it is offline, and resources 7 and 6 are taken offline because they depend on 4. Because 4 is a critical resource, the rest of the resources are taken offline from top to bottom, and the group is then failed over to another system.

Practice Exercise: Resource 4 Faults

Case  Non-Critical  SG Attributes (F, M, A)  Offline  Taken offline due to fault  Starts on another system
A     -             1, ALL, 1                -        ___                         ___
B     4             1, ALL, 1                -        ___                         ___
C     4             1, ALL, 1                6,7      ___                         ___
D     4,6           1, ALL, 1                -        ___                         ___
E     4,6,7         1, ALL, 1                -        ___                         ___
F     4             1, ALL, 1                7        ___                         ___
G     -             1, NONE, 1               -        ___                         ___
H     -             0, ALL, 1                -        ___                         ___
I     -             1, ALL, 0                -        ___                         ___

(F, M, A = FaultPropagation, ManageFaults, AutoFailover; the last two columns are for you to complete.)

[Diagram: a dependency tree of resources 1 through 9; resources 6 and 7 depend on resource 4, which is the resource that faults.]


Determining Failover Duration

Failover Duration on a Resource Fault

When a resource failure occurs, application services may be disrupted until either the resource is restarted on the same system or the application services migrate to another system in the cluster. The time required to address the failure is a combination of the time required to:
• Detect the failure.
  This is related to how often you want to monitor each resource in the service group. A resource failure is detected only when the monitor entry point of that resource unexpectedly returns an offline status. The resource type attributes used to tune the frequency of monitoring a resource are MonitorInterval (default: 60 seconds) and OfflineMonitorInterval (default: 300 seconds).

• Fault the resource.
  This is related to two factors:
  – How much tolerance you want VCS to have for false failure detections. For example, in an overloaded network environment, the NIC resource can return an occasional failure even though there is nothing wrong with the physical connection. You may want VCS to verify the failure a couple of times before faulting the resource.

Service group failover time is the sum of the durations of the individual failover tasks. You can affect failover time by setting resource type attributes:

  Detect the resource failure (< MonitorInterval)
+ Fault the resource
+ Take the entire service group offline
+ Select a failover target
+ Bring the service group online on another system in the cluster
= Failover duration

Page 288: VERITAS Cluster Server for UNIX Fundamentals

11–12 VERITAS Cluster Server for UNIX, FundamentalsCopyright © 2005 VERITAS Software Corporation. All rights reserved.

  – Whether or not you want to attempt a restart before failing over. For example, it may be much faster to restart a failed process on the same system than to migrate the entire service group to another system.
  The resource type attributes related to these decisions are RestartLimit, ToleranceLimit, and ConfInterval. These attributes are described in more detail in the following sections.
• Take the entire service group offline.
  In general, the time required for a resource to be taken offline depends on the type of resource and what the offline procedure includes. However, VCS enables you to define the maximum time allowed for a normal offline procedure before attempting to force the resource offline. The resource type attributes related to this factor are OfflineTimeout and CleanTimeout. For more detailed information on these attributes, refer to the VERITAS Cluster Server Bundled Agents Reference Guide.
• Select a failover target.
  The time required for the VCS policy module to determine the target system is negligible (less than one second in all cases) in comparison to the other factors.
• Bring the service group online on another system in the cluster.
  This may be one of the more dominant factors in determining the total failover time. In most cases, in order to start an application service after a failure, you need to carry out some recovery procedures. For example, a file system's metadata needs to be checked if it was not unmounted properly, or a database needs to carry out recovery procedures, such as applying the redo logs, to recover from sudden failures. Take these considerations into account when you determine the amount of time you want VCS to allow for an online process. The resource type attributes related to bringing a service group online are OnlineTimeout, OnlineWaitLimit, and OnlineRetryLimit. For more information on these attributes, refer to the VERITAS Cluster Server Bundled Agents Reference Guide.
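As a rough worked example, with all values assumed for illustration: if MonitorInterval is 60 seconds and ToleranceLimit is 0, detection takes up to 60 seconds; if taking the group offline takes 30 seconds, selecting a target takes under 1 second, and bringing the group online (including file system checks or database recovery) takes 120 seconds, the total failover duration is approximately 60 + 30 + 1 + 120 = 211 seconds.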


Adjusting Monitoring

You can change some resource type attributes to facilitate failover testing. For example, you can change the monitor interval to see the results of faults more quickly. You can also adjust these attributes to affect how quickly an application fails over when a fault occurs.

MonitorInterval

This is the duration (in seconds) between two consecutive monitor calls for an online or transitioning resource.

The default is 60 seconds for most resource types.

OfflineMonitorInterval

This is the duration (in seconds) between two consecutive monitor calls for an offline resource. If set to 0, offline resources are not monitored.

The default is 300 seconds for most resource types.

Refer to the VERITAS Cluster Server Bundled Agents Reference Guide for the applicable monitor interval defaults for specific resource types.

MonitorInterval: frequency of online monitoring
– The default value is 60 seconds for most resource types.
– Consider reducing the value to 10 or 20 seconds for testing.
– Use caution when changing this value: lower values increase the load on cluster systems, and false resource faults can occur if resources cannot respond in the interval specified.

OfflineMonitorInterval: frequency of offline monitoring
– The default value is 300 seconds for most resource types.
– Consider reducing the value to 60 seconds for testing.

Caution: If you change a resource type attribute, you affect all resources of that type.
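For example, to tighten monitoring of all NIC resources during failover testing (the values shown are illustrative, not recommendations):

haconf -makerw
hatype -modify NIC MonitorInterval 20
hatype -modify NIC OfflineMonitorInterval 60
haconf -dump -makero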


Adjusting Timeout Values

The attributes MonitorTimeout, OnlineTimeout, and OfflineTimeout indicate the maximum time (in seconds) within which the monitor, online, and offline entry points must finish or be terminated. The default for the MonitorTimeout attribute is 60 seconds. The defaults for the OnlineTimeout and OfflineTimeout attributes are 300 seconds.

For best results, measure the length of time required to bring a resource online, take it offline, and monitor it before modifying the defaults. Simply issue an online or offline command to measure the time required for each action. To measure how long it takes to monitor a resource, fault the resource and then issue a probe, or bring the resource online outside of VCS control and issue a probe.

Timeout interval values define the maximum time within which the entry points must finish or be terminated.
• OnlineTimeout and OfflineTimeout: The default value is 300 seconds. Increase the value if all resources of a type require more time to be brought online or taken offline in your environment.
• MonitorTimeout: The default value is 60 seconds for most resource types.

Before modifying defaults:
• Measure the online and offline times outside of VCS.
• Measure the monitor time by faulting the resource and then issuing a probe.
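For example, one way to measure an online duration outside of VCS before tuning OnlineTimeout for Mount resources; the file system type, device path, and mount point are placeholders for your environment:

/usr/bin/time mount -F vxfs /dev/vx/dsk/datadg/datavol /data
/usr/bin/time umount /data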


Controlling Fault Behavior

Type Attributes Related to Resource Faults

Although the failover capability of VCS helps to minimize the disruption of application services when resources fail, the process of migrating a service to another system can be time-consuming. In some cases, you may want to attempt to restart a resource on the same system before failing it over to another system.

Whether a resource can be restarted depends on the application service:
• The resource must be successfully cleared (taken offline) after failure.
• The resource must not be a child resource that has dependent parent resources that must be restarted.

If you have determined that a resource can be restarted without impacting the integrity of the application, you can potentially avoid service group failover by configuring these resource type attributes:
• RestartLimit
  The restart limit determines how many times a resource can be restarted within the confidence interval before the resource faults. For example, you may want to restart a resource such as the Oracle listener process several times before it causes an Oracle service group to fault.
• ConfInterval
  When a resource has remained online for the specified time (in seconds), previous faults and restart attempts are ignored by the agent. When this clock expires, the restart or tolerance counter is reset to zero.

RestartLimit
– Controls the number of times a resource is restarted on the same system before it is marked as FAULTED
– Default: 0
ConfInterval
– Determines the amount of time that must elapse before restart and tolerance counters are reset to zero
– Default: 600 seconds
ToleranceLimit
– Enables the monitor entry point to return OFFLINE several times before the resource is declared FAULTED
– Default: 0


• ToleranceLimit
  This attribute determines how many times, within the confidence interval, the monitor can return offline before the agent attempts to either restart the resource or mark it as FAULTED.


Restart Example

This example illustrates how the RestartLimit and ConfInterval attributes can be configured to modify the behavior of VCS when a resource is faulted.

Setting RestartLimit = 1 and ConfInterval = 180 has this effect when a resource faults:
1 The resource stops after running for 10 minutes.
2 The next monitor returns offline.
3 The ConfInterval counter is set to 0.
4 The agent checks the value of RestartLimit.
5 The resource is restarted because RestartLimit is set to 1, which allows one restart within the ConfInterval.
6 The next monitor returns online.
7 The ConfInterval counter is now 60 (one monitor cycle has completed).
8 The resource stops again.
9 The next monitor returns offline.
10 The ConfInterval counter is now 120 (two monitor cycles have completed).
11 The resource is not restarted because the RestartLimit counter is now 2 and the ConfInterval counter is only 120 seconds. Because the resource has not been online for the ConfInterval time of 180 seconds, it is not restarted.
12 VCS faults the resource.
If the resource had remained online for 180 seconds, the internal RestartLimit counter would have been reset to 0.

RestartLimit = 1: The resource is restarted one time within the ConfInterval time frame.
ConfInterval = 180: The resource can be restarted once within a three-minute interval.
MonitorInterval = 60 seconds (default value): The resource is monitored every 60 seconds.

[Timeline diagram: the resource goes offline and is restarted once; it goes offline again before remaining online for a full ConfInterval, so it is faulted.]
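A sketch of the corresponding type configuration, using the Process type purely as an example:

haconf -makerw
hatype -modify Process RestartLimit 1
hatype -modify Process ConfInterval 180
haconf -dump -makero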


Modifying Resource Type Attributes

You can modify resource type attributes to affect how an agent monitors all resources of a given type. For example, agents usually check their online resources every 60 seconds. You can modify that period so that the resource type is checked more often. This is good for either testing situations or time-critical resources.

You can also change the period so that the resource type is checked less often. This reduces the load on VCS overall, as well as on the individual systems, but increases the time it takes to detect resource failures.

For example, to change the ToleranceLimit attribute for all NIC resources so that the agent ignores occasional network problems, type:

hatype -modify NIC ToleranceLimit 2

types.cf:

type NIC (
    static int MonitorInterval = 15
    static int OfflineMonitorInterval = 60
    static int ToleranceLimit = 2
    static str ArgList[] = { Device, …
    …
)

Modifying resource type attributes in this way:
• Can be used to optimize agents
• Is applied to all resources of the specified type


Overriding Resource Type Attributes

VCS 4.0 provides the ability to change resource type attributes on a per-resource basis. Unless the resource type attribute is overridden, the value applies to all resources of the same type. If you override the resource type attribute, you can change its value for a specific resource.

Some predefined static resource type attributes (those that do not appear in types.cf unless their value is changed, such as MonitorInterval) and all static attributes that are not predefined (static attributes that are defined in the type definition file) can be overridden. For a detailed list of predefined static attributes that can be overridden, refer to the VERITAS Cluster Server User's Guide.

To override a resource type attribute, use the hares -override command as shown on the slide. You cannot override attributes from the GUI. After the resource type attribute is overridden, you can change its value for that specific resource using the hares -modify command, as shown on the slide. Note that this change is stored in the main.cf file.

You can use the hares -display -ovalues command to display the overridden attributes for a specific resource.

When you restore the default settings of the attribute by running the hares -undo_override command, the entry for that resource type attribute is removed from the main.cf file.

Note: The configuration must be in read-write mode for you to modify and override resource type attributes. The changes are reflected in the main.cf file only after you dump the configuration using the haconf -dump command.

You can override resource type attributes by setting different values in the resource definition. You must use the CLI to override attributes.

main.cf:

Mount myMount (
    MountPoint = "/mydir"
    . . .
    MonitorInterval = 10
    . . .
)

hares commands:

hares -override myMount MonitorInterval          Override MonitorInterval
hares -modify myMount MonitorInterval 10         Modify the overridden attribute
hares -display -ovalues myMount                  Display overridden values
hares -undo_override myMount MonitorInterval     Restore default settings


Recovering from Resource Faults

When a resource failure is detected, the resource is put into a FAULTED or an ADMIN_WAIT state, depending on the cluster configuration, as described in the previous sections. In either case, administrative intervention is required to bring the resource status back to normal.

Recovering a Resource from a FAULTED State

A resource in the FAULTED state cannot be brought online on a system. When a resource is FAULTED on a system, the service group status also changes to FAULTED on that system, and that system can no longer be considered an available target during a service group failover.

You have to clear the FAULTED status of a nonpersistent resource manually. Before clearing the FAULTED status, ensure that the resource is completely offline and that the fault is fixed outside of VCS.

To clear a nonpersistent resource fault using the command line:

hares -clear resource [-sys system]

Provide the resource name and the name of the system that the resource is faulted on. If the system name is not specified, the resource is cleared on all systems where it is faulted.

Note: You can also run hagrp -clear group [-sys system] to clear all FAULTED resources in a service group. However, you have to ensure that all of the FAULTED resources are completely offline and the faults are fixed on all the corresponding systems before running this command.

When a resource is FAULTED on a system, you cannot bring the resource online without clearing the fault status.

To clear a nonpersistent resource fault:
1. Ensure that the fault is fixed outside of VCS and that the resource is completely offline.
2. Use the hares -clear resource command to clear the FAULTED status.

To clear a persistent resource fault:
1. Ensure that the fault is fixed outside of VCS.
2. Either wait for the periodic monitoring or probe the resource manually using the command:
   hares -probe resource -sys system


The FAULTED status of a persistent resource is cleared when the monitor returns an online status for that resource. Note that offline resources are monitored according to the value of OfflineMonitorInterval, which is 300 seconds (5 minutes) by default. To avoid waiting for the periodic monitoring, you can initiate the monitoring of the resource manually by probing the resource.

To probe a resource using the command line:

hares -probe resource -sys system

Provide the resource name and the name of the system on which you want to probe the resource.
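For example, to trigger an immediate monitor cycle for a hypothetical NIC resource named webnic on system S1, rather than waiting up to OfflineMonitorInterval:

hares -probe webnic -sys S1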


Recovering a Resource from an ADMIN_WAIT State

If the ManageFaults attribute of a service group is set to NONE, VCS does not take any automatic action when it detects a resource fault. VCS places the resource into the ADMIN_WAIT state and waits for administrative intervention. There are two primary reasons to configure VCS in this way:
• You want to analyze and recover from the failure manually with the aim of continuing operation on the same system.
  In this case, fix the fault and bring the resource back to the online state manually outside of VCS. After the resource is back online, you can inform VCS to take the resource out of the ADMIN_WAIT state and put it back into the ONLINE state using this command:
  hagrp -clearadminwait group -sys system
  Notes:
  – If the next monitor cycle does not report an online status, the resource is placed back into the ADMIN_WAIT state. If the next monitor cycle reports an online status, VCS continues normal operation without any failover.
  – If the resource is restarted outside of VCS and the monitor cycle runs before you can run hagrp -clearadminwait group -sys system, the resource returns to an online status automatically.
  – You cannot clear the ADMIN_WAIT state from the GUI.

Recovering a Resource from an ADMIN_WAIT State

When a resource is in the ADMIN_WAIT state, VCS waits for administrative intervention and takes no further action until the status is cleared.

Two possible solutions:

To continue operation without a failover:
1. Fix the fault and bring the resource online outside of VCS.
2. Tell VCS to clear the ADMIN_WAIT status. Type:
   hagrp –clearadminwait group –sys system
Note: If the next monitor cycle for the resource returns offline, the resource is placed back into the ADMIN_WAIT state.

To initiate failover after collecting debug information:
1. Analyze the log files and collect debug information.
2. Tell VCS to clear the ADMIN_WAIT status and continue with the failover process as normal. Type:
   hagrp –clearadminwait –fault group –sys system


• You want to collect debugging information before any action is taken.
  The intention in this case is to let VCS wait until the failure is analyzed. After the analysis is completed, you can let VCS continue with the normal failover process by running this command:
  hagrp -clearadminwait -fault group -sys system
  Note: As a result of this command, the clean entry point is executed on the resource in the ADMIN_WAIT state, and the resource changes status to OFFLINE | FAULTED. VCS then continues with the service group failover, depending on the cluster configuration.
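To summarize the two recovery paths as commands, assume a group named websg with a resource in ADMIN_WAIT on system S1 (names are illustrative). To continue operation on the same system after manually fixing and restarting the application:

hagrp -clearadminwait websg -sys S1

To run the clean entry point and let the normal failover process continue after collecting diagnostics:

hagrp -clearadminwait -fault websg -sys S1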


Fault Notification and Event Handling

Fault Notification

As a response to a resource fault, VCS carries out tasks to take resources or service groups offline and to bring them back online elsewhere in the cluster. While carrying out these tasks, VCS generates messages with a variety of severity levels, and the VCS engine passes these messages to the notifier daemon. Whether these messages are used for SNMP traps or SMTP notification depends on how the notification component of VCS is configured, as described in the "Configuring Notification" lesson.

The following events are examples that result in a notification message being generated:
• A resource goes offline unexpectedly; that is, a resource is faulted.
• VCS cannot take the resource offline.
• A service group is faulted and there is no failover target available.
• The service group is brought online or taken offline successfully.
• The service group has faulted on all nodes where the group could be brought online, and there are no nodes to which the group can fail over.

Fault Notification

Event                                                               Response
A resource goes offline unexpectedly.                               Send notification (Error). E-mail ResourceOwner (if configured).
A resource cannot be taken offline.                                 Send notification (Warning). E-mail ResourceOwner (if configured).
The service group is faulted due to a critical resource fault.      Send notification (SevereError). E-mail GroupOwner (if configured).
The service group is brought online or taken offline successfully.  Send notification (Information). E-mail GroupOwner (if configured).
The failover target does not exist.                                 Send notification (Error). E-mail GroupOwner (if configured).


Extended Event Handling Using Triggers

You can use triggers to customize how VCS responds to events that occur in the cluster.

For example, you could use the ResAdminWait trigger to automate collecting application diagnostics as part of the failover and recovery process. If you set ManageFaults to NONE for a service group, VCS places faulted resources into the ADMIN_WAIT state. If the ResAdminWait trigger is configured, VCS runs the script when a resource enters ADMIN_WAIT. Within the trigger script, you can run a diagnostic tool and log information about the resource, and then take a desired action, such as clearing the state and faulting the resource:

hagrp -clearadminwait -fault group -sys system
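The following is a minimal sketch of such a trigger script. The argument layout and diagnostic commands are assumptions; verify them against the samples in /opt/VRTSvcs/bin/sample_triggers before adapting the script:

#!/bin/sh
# Hypothetical resadminwait trigger sketch.
# Assumed arguments: $1 = system name, $2 = resource name.
SYSTEM=$1
RESOURCE=$2

# Collect diagnostics before any failover action is taken.
ps -ef > /var/tmp/diag.$RESOURCE.$$ 2>&1

# Look up the group that owns the resource.
GROUP=`hares -value $RESOURCE Group`

# Clear the ADMIN_WAIT state, run the clean entry point, and let
# the normal failover process continue.
hagrp -clearadminwait -fault $GROUP -sys $SYSTEM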

The Role of Triggers in Resource Faults

As a response to a resource fault, VCS carries out tasks to take resources or service groups offline and to bring them back online elsewhere in the cluster. While these tasks are being carried out, certain events take place. If corresponding event triggers are configured, VCS executes these trigger scripts.

Extended Event Handling Using Triggers

Event                                                        Trigger Action
A resource goes offline unexpectedly.                        Call resfault (if present). Call resstatechange (if present and configured).
A resource cannot be taken offline.                          Call resnotoff (if present).
A resource is brought online or taken offline successfully.  Call resstatechange (if present and configured).
The failover target does not exist.                          Call nofailover (if present).
A resource is placed in an ADMIN_WAIT state.                 Call resadminwait (if present).


The following events result in a trigger being executed if it is configured:
• When a resource goes offline unexpectedly, that is, a resource is faulted, both the ResFault and the ResStateChange event triggers are executed.
• If VCS cannot take the resource offline, the ResNotOff trigger is executed.
• If a resource is placed in an ADMIN_WAIT state due to a fault (ManageFaults = NONE), the ResAdminWait trigger is executed.
  Note: The ResAdminWait trigger exists only with VCS 3.5 for AIX and VCS 3.5 MP1 for Solaris. VCS 3.5 for HP-UX and earlier versions of VCS on other platforms do not support this event trigger.
• The ResStateChange trigger is executed every time a resource changes its state from online to offline or from offline to online.
• If the service group has faulted on all nodes where the group can be brought online and there are no nodes to which the group can fail over, the NoFailover trigger is executed.

Triggers are placed in the /opt/VRTSvcs/bin/triggers directory. Sample trigger scripts are provided in /opt/VRTSvcs/bin/sample_triggers. Trigger configuration is described in the VERITAS Cluster Server User’s Guide and the High Availability Design Using VERITAS Cluster Server instructor-led training course.


Summary

This lesson described how VCS responds to resource faults and introduced various components of VCS that enable you to customize the VCS response to resource faults.

Next Steps

The next lesson describes how the cluster communication mechanisms work to build and maintain the cluster membership.

Additional Resources
• VERITAS Cluster Server Bundled Agents Reference Guide
  This document provides important reference information for the VCS agents bundled with the VCS software.
• VERITAS Cluster Server User's Guide
  This document provides information about all aspects of VCS configuration.
• High Availability Design Using VERITAS Cluster Server instructor-led training
  This course provides configuration procedures and practical exercises for configuring triggers.

Lesson Summary
Key Points
– You can customize how VCS responds to faults by configuring attributes.
– Failover duration can also be adjusted to meet your specific requirements.
Reference Materials
– VERITAS Cluster Server Bundled Agents Reference Guide
– VERITAS Cluster Server User's Guide
– High Availability Design Using VERITAS Cluster Server instructor-led training


Lab 11: Configuring Resource Fault Behavior

Labs and solutions for this lesson are located on the following pages.

Appendix A provides brief lab instructions for experienced students.
• "Lab 11 Synopsis: Configuring Resource Fault Behavior," page A-55

Appendix B provides step-by-step lab instructions.
• "Lab 11: Configuring Resource Fault Behavior," page B-93

Appendix C provides complete lab instructions and solutions.
• "Lab 11 Solutions: Configuring Resource Fault Behavior," page C-133

Goal
The purpose of this lab is to observe how VCS responds to faults in a variety of scenarios.

Results
Each student observes the effects of failure events in the cluster.

Prerequisites
Obtain any classroom-specific values needed for your classroom lab environment and record these values in your design worksheet included with the lab exercise instructions.

Lab 11: Configuring Resource Fault Behavior

nameSG1               nameSG2
Critical=0            Critical=1
FaultPropagation=0    FaultPropagation=1
ManageFaults=NONE     ManageFaults=ALL
RestartLimit=1

Note: Network interfaces for virtual IP addresses are unconfigured to force the IP resource to fault. In your classroom, the interface you specify is: ______
Replace the variable interface in the lab steps with this value.


Lesson 12
Cluster Communications


Introduction

Overview
This lesson describes how the cluster interconnect mechanism works. You also learn how the GAB and LLT configuration files are set up during installation to implement the communication channels.

Importance
Although you may never need to reconfigure the cluster interconnect, developing a thorough knowledge of how the cluster interconnect functions is key to understanding how VCS behaves when systems or network links fail.

Lesson Introduction

Lesson 1: VCS Building Blocks
Lesson 2: Preparing a Site for VCS
Lesson 3: Installing VCS
Lesson 4: VCS Operations
Lesson 5: Preparing Services for VCS
Lesson 6: VCS Configuration Methods
Lesson 7: Online Configuration of Service Groups
Lesson 8: Offline Configuration of Service Groups
Lesson 9: Sharing Network Interfaces
Lesson 10: Configuring Notification
Lesson 11: Configuring VCS Response to Faults
Lesson 12: Cluster Communications
Lesson 13: System and Communication Faults
Lesson 14: I/O Fencing
Lesson 15: Troubleshooting


Outline of Topics
• VCS Communications Review
• Cluster Membership
• Cluster Interconnect Configuration
• Joining the Cluster Membership

Lesson Topics and Objectives

After completing this lesson, you will be able to:

Topic                                 Objective
VCS Communications Review             Describe how components communicate in a VCS environment.
Cluster Membership                    Describe how VCS determines cluster membership.
Cluster Interconnect Configuration    Describe the files that specify the cluster interconnect configuration.
Joining the Cluster Membership        Describe how systems join the cluster membership.


VCS Communications Review

VCS maintains the cluster state by tracking the status of all resources and service groups in the cluster. The state is communicated between had processes on each cluster system by way of the atomic broadcast capability of Group Membership Services/Atomic Broadcast (GAB). HAD is a replicated state machine, which uses the GAB atomic broadcast mechanism to ensure that all systems within the cluster are immediately notified of changes in resource status, cluster membership, and configuration.

Atomic means that all systems receive updates, or all systems are rolled back to the previous state, much like a database atomic commit. If a failure occurs while transmitting status changes, GAB’s atomicity ensures that, upon recovery, all systems have the same information regarding the status of any monitored resource in the cluster.

VCS On-Node Communications

VCS uses agents to manage resources within the cluster. Agents perform resource-specific tasks on behalf of had, such as online, offline, and monitoring actions. These actions can be initiated by an administrator issuing directives using the VCS graphical or command-line interface, or by other events that require had to take some action. Agents also report resource status back to had. Agents do not communicate with one another, but only with had.

The had processes on each cluster system communicate cluster status information over the cluster interconnect.

On-Node and Off-Node Communication

[Diagram: each cluster system runs agents, had, GAB, and LLT; the had processes communicate over the cluster interconnect.]
• Broadcast heartbeat on each interface every ½ second.
• Each LLT module tracks the status of heartbeats from each peer on each interface.
• LLT forwards the heartbeat status of each node to GAB.


VCS Inter-Node Communications

In order to replicate the state of the cluster to all cluster systems, VCS must determine which systems are participating in the cluster membership. This is accomplished by the Group Membership Services mechanism of GAB.

Cluster membership refers to all systems configured with the same cluster ID and interconnected by a pair of redundant Ethernet LLT links. Under normal operation, all systems configured as part of the cluster during VCS installation actively participate in cluster communications.

Systems join a cluster by issuing a cluster join message during GAB startup. Cluster membership is maintained by heartbeats. Heartbeats are signals sent periodically from one system to another to determine system state. Heartbeats are transmitted by the LLT protocol.

VCS Communications Stack Summary

The hierarchy of VCS mechanisms that participate in maintaining and communicating cluster membership and status information is shown in the diagram.
• Agents communicate with had.
• The had processes on each system communicate status information by way of GAB.
• GAB determines cluster membership by monitoring heartbeats transmitted from each system over LLT.


Cluster Interconnect Specifications

LLT can be configured to designate links as high-priority or low-priority links. High-priority links are used for cluster communications (GAB) as well as heartbeats. Low-priority links carry only heartbeats unless all configured high-priority links fail. At that point, LLT switches cluster communications to the first available low-priority link. Traffic reverts to high-priority links as soon as they are available.

Later lessons provide more detail about how VCS handles link failures in different environments.

Cluster Interconnect Specifications
VCS supports up to eight links for the cluster interconnect.
Links can be specified as:
– High-priority:
  • Heartbeats every half-second
  • Cluster status information carried over links
  • Usually configured for dedicated cluster network links
– Low-priority:
  • Heartbeats every second
  • No cluster status sent
  • Automatically promoted to high priority if no high-priority links are functioning
  • Can be configured on public network interfaces


Cluster Membership

GAB Status and Membership Notation

To display the cluster membership status, type gabconfig on each system. For example:
gabconfig -a

If GAB is operating, the following GAB port membership information is returned:
• Port a indicates that GAB is communicating, a36e0003 is a randomly generated number, and membership 01 indicates that systems 0 and 1 are connected.
• Port h indicates that VCS is started, fd570002 is a randomly generated number, and membership 01 indicates that systems 0 and 1 are both running VCS.

Note: The port a and port h generation numbers change each time the membership changes.

GAB Status and Membership Notation

# gabconfig -a
GAB Port Memberships
===============================================
Port a gen a36e003 membership 01 ; ;12
Port h gen fd57002 membership 01 ; ;12

In this output, the port a line shows that GAB is communicating, and the port h line shows that HAD is communicating. The membership field "01" indicates nodes 0 and 1. The first semicolon is a placeholder for the 10s digit (a 0 is displayed there if node 10 is a member of the cluster); "12" after the second semicolon, the 20s placeholder, indicates nodes 21 and 22.


GAB Membership Notation

A positional notation is used by gabconfig to indicate which systems are members of the cluster. Only the last digit of the node number is displayed relative to semicolons that indicate the 10s digit.

For example, if systems 21 and 22 are also members of this cluster, gabconfig displays the following output, where the first semicolon marks the 10s digit and the second marks the 20s:

GAB Port Memberships
======================================================
Port a gen a36e0003 membership 01 ; ;12
Port h gen fd570002 membership 01 ; ;12


Viewing LLT Link Status

The lltstat Command

Use the lltstat command to verify that links are active for LLT. This command returns information about the LLT links for the system on which it is typed. In the example shown in the slide, lltstat -nvv is typed on the S1 system to produce the LLT status in a cluster with two systems.

The -nvv options cause lltstat to list systems with very verbose status:
• Link names from llttab
• Status
• MAC address of the Ethernet ports

Other lltstat uses:
• Without options, lltstat reports whether LLT is running.
• The -c option displays the values of LLT configuration directives.
• The -l option lists information about each configured LLT link.

You can also create a script that runs lltstat -nvv and checks for the string DOWN. Run this from cron periodically to report failed links.
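A minimal sketch of such a script follows; the mail recipient and overall structure are illustrative choices to adapt for your site:

#!/bin/sh
# Report any LLT link that lltstat marks DOWN; run periodically from cron.
DOWN=`lltstat -nvv 2>&1 | grep DOWN`
if [ -n "$DOWN" ]; then
    echo "$DOWN" | mailx -s "LLT link DOWN on `hostname`" root
fi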

Use the exclude directive in llttab to eliminate information about nonexistent systems.

Note: This level of detailed information about LLT links is only available through the CLI. Basic status is shown in the GUI.

LLT Link Status: The lltstat Command

S1# lltstat -nvv | pg
LLT node information:
Node          State   Link   Status   Address
* 0 S1        OPEN
                      qfe0   UP       08:00:20:AD:BC:78
                      hme0   UP       08:00:20:AD:BC:79
  1 S2        OPEN
                      qfe0   UP       08:00:20:B4:0C:3B
                      hme0   UP       08:00:20:B4:0C:3C

This is a Solaris example; the asterisk shows which system runs the command.


Cluster Interconnect Configuration

Configuration Overview

The VCS installation utility sets up all cluster interconnect configuration files and starts LLT and GAB. You may never need to modify the communication configuration files. Understanding how these files work together to define the cluster communication mechanism helps you understand how VCS responds if a fault occurs.

Configuration Overview
• The cluster interconnect is automatically configured during installation.
• You may never need to modify any portion of the interconnect configuration.
• Details about the configuration and functioning of the interconnect are provided to give you a complete understanding of the VCS architecture.
• Knowing how a cluster membership is formed and maintained is necessary for understanding the effects of system and communication faults, described in later lessons.


LLT Configuration Files

The LLT configuration files are located in the /etc directory.

The llttab File

The llttab file is the primary LLT configuration file and is used to:
• Set system ID numbers.
• Set the cluster ID number.
• Specify the network device names used for the cluster interconnect.
• Modify LLT behavior, such as heartbeat frequency.

The example llttab file shown in the slide describes this cluster system (S2):
• The system (node) ID is set to 1.
• The cluster ID is set to 10.
• The cluster interconnect uses the hme0 and qfe0 Ethernet ports.

This is the minimum recommended set of directives required to configure LLT. The basic format of the file is an LLT configuration directive followed by a value. These directives and their values are described in more detail in the next sections.

For a complete list of directives, see the sample llttab file in the /opt/VRTSllt directory and the llttab manual page.

Note: Ensure that there is only one set-node line in the llttab file.

LLT Configuration Files: The llttab File
This is the primary LLT configuration file, used to:
• Assign node numbers to systems.
• Set the cluster ID number to specify which systems are members of a cluster.
• Specify the network devices used for the cluster interconnect.
• Modify default LLT behavior, such as heartbeat frequency.

Solaris example:
# cat /etc/llttab
set-node 1
set-cluster 10
link qfe0 /dev/qfe:0 - ether - -
link hme0 /dev/hme:0 - ether - -


AIX:
set-node 1
set-cluster 10
link en1 /dev/en:1 - ether - -
link en2 /dev/en:2 - ether - -

HP-UX:
set-node 1
set-cluster 10
link lan1 /dev/lan:1 - ether - -
link lan2 /dev/lan:2 - ether - -

Linux (/etc/llttab):
set-node 1
set-cluster 10
link link1 eth1 - ether - -
link link2 eth2 - ether - -


How Node and Cluster Numbers Are Specified

A unique number must be assigned to each system in a cluster using the set-node directive.

The value of set-node can be one of the following:
• An integer in the range of 0 through 31 (32 systems per cluster maximum)
• A system name matching an entry in /etc/llthosts

If a number is specified, each system in the cluster must have a unique llttab file, which has a unique value for set-node. Likewise, if a system name is specified, each system must have a different llttab file with a unique system name that is listed in llthosts, which LLT maps to a node ID.

The set-cluster Directive

LLT uses the set-cluster directive to assign a unique number to each cluster. Although a cluster ID is optional when only one cluster is configured on a physical network, you should always define a cluster ID. This ensures that each system only joins other systems with the same cluster ID to form a cluster.

If LLT detects multiple systems with the same node ID and cluster ID on a private network, the LLT interface is disabled on the node that is starting up. This prevents a split-brain condition in which a service group could be brought online on two systems with the same node ID.

Note: You can use the same cluster interconnect network infrastructure for multiple clusters. You must ensure the llttab file specifies the appropriate cluster ID to ensure that there are no conflicting node IDs.
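For example, two clusters sharing the same interconnect infrastructure are kept separate purely by cluster ID. This llttab sketch shows only the relevant directives; the node and cluster numbers are illustrative:

Node 0 of cluster A:
set-node 0
set-cluster 10

Node 0 of cluster B, on the same network:
set-node 0
set-cluster 20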

How Node and Cluster Numbers Are Specified

# cat /etc/llttab
set-cluster 10
set-node S1
link qfe0 /dev/qfe:0 - ether - -
link hme0 /dev/hme:0 - ether - -

# cat /etc/llthosts
0 S1
1 S2

This is a Solaris example. Node numbers range from 0 to 31; cluster ID numbers range from 0 to 255.


The llthosts File

The llthosts file associates a system name with a VCS cluster node ID number. This file must be present in the /etc directory on every system in the cluster. It must contain a line with the unique name and node ID for each system in the cluster. The format is:
node_number name

The critical requirements for llthosts entries are:
• Node numbers must be unique. If duplicate node IDs are detected on the Ethernet LLT cluster interconnect, LLT in VCS 4.0 is stopped on the joining node. In VCS versions before 4.0, the joining node panics.
• The system name must match the name in llttab if a name is configured for the set-node directive (rather than a number).
• System names must match those in main.cf, or VCS cannot start.

Note: The system (node) name does not need to be the UNIX host name found using the hostname command. However, VERITAS recommends that you keep the names the same to simplify administration, as described in the next section.

See the llthosts manual page for a complete description of the file.

The llthosts File
This file associates a system name with a VCS cluster node ID.
• Have the same entries on all systems.
• Use unique node numbers, which are required.
• Have system names match llttab and main.cf.
• Have system names match sysname, if used.

# cat /etc/llthosts
0 S1
1 S2


The sysname File

The sysname file is an optional LLT configuration file. This file is used to store the system (node) name. In later versions, the VCS installation utility creates the sysname file on each system, which contains the host name for that system.

The purpose of the sysname file is to remove VCS dependence on the UNIX uname utility for determining the local system name. If the sysname file is not present, VCS determines the local host name using uname. If uname returns a fully qualified domain name (sys.company.com), VCS cannot match the name to the systems in the main.cf cluster configuration and therefore cannot start on that system.

If uname returns a fully qualified domain name on your cluster systems, ensure that the sysname file is configured with the local host name in /etc/VRTSvcs/conf.
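As a sketch, assuming the system name configured in main.cf is S1, you can create the file as follows:

echo S1 > /etc/VRTSvcs/conf/sysname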

Note: Although you can specify a name in the sysname file that is completely different from the UNIX host name shown in the output of uname, this can lead to problems and is not recommended. For example, consider a scenario where system S1 fails and you replace it with another system named S3. You configure VCS on S3 to make it appear to be S1 by creating a sysname file with S1. While this has the advantage of minimizing VCS configuration changes, it can create a great deal of confusion when troubleshooting problems. From the VCS point of view, the system is shown as S1. From the UNIX point of view, the system is S3.

See the sysname manual page for a complete description of the file.

The sysname File
The sysname file is used to provide VCS with the short-form system name.
• Some UNIX systems return a fully qualified domain name that does not match the main.cf system name and, therefore, prevents VCS from starting.
• Using a sysname file removes dependence on the UNIX uname utility.
• The name in the sysname file must:
  – Be unique within the cluster; each system has a different entry in the file
  – Match main.cf system attributes
  – Be specified in the SystemList and AutoStartList attributes for service groups

# cat /etc/VRTSvcs/conf/sysname
S1


The GAB Configuration File

GAB is configured with the /etc/gabtab file. This file contains one line, which is used to start GAB. For example:
/sbin/gabconfig -c -n 4

This example starts GAB and specifies that four systems must be communicating by way of GAB before the cluster seeds and VCS can start.

A sample gabtab file is included in /opt/VRTSgab.

Note: Other gabconfig options are discussed later in this lesson. See the gabconfig manual page for a complete description of the file.

The GAB Configuration File
GAB configuration is specified in the /etc/gabtab file.
The file contains the command to start GAB:
/sbin/gabconfig -c -n number_of_systems
The value specified by number_of_systems determines the number of systems that must be communicating by way of GAB to allow VCS to start.

# cat /etc/gabtab
/sbin/gabconfig –c –n 4


Joining the Cluster Membership

GAB and LLT are started automatically when a system starts up. HAD can start only after GAB membership has been established among all cluster systems. The mechanism that ensures that all cluster systems are visible on the cluster interconnect is GAB seeding.

Seeding During Startup

Seeding is a mechanism to ensure that systems in a cluster are able to communicate before VCS can start. Only systems that have been seeded can participate in a cluster. Seeding is also used to define how many systems must be online and communicating before a cluster is formed.

By default, a system is not seeded when it boots. This prevents VCS from starting, which prevents applications (service groups) from starting. If the system cannot communicate with the cluster, it cannot be seeded.

Seeding is a function of GAB and is performed automatically or manually, depending on how GAB is configured. GAB seeds a system automatically in one of two ways:
• When an unseeded system communicates with a seeded system
• When all systems in the cluster are unseeded and able to communicate with each other

The number of systems that must be seeded before VCS is started on any system is also determined by the GAB configuration.

GAB starts on each system with a seed value equal to the number of systems in the cluster. In the three-system example shown in the diagram, when GAB sees three members, the cluster is seeded.

Seeding During Startup

[Diagram: systems S1, S2, and S3, each running LLT, GAB, and HAD, exchange "I am alive" heartbeats. Applies to Solaris, AIX, HP-UX, and Linux.]
1. LLT starts on each system.
2. GAB starts and the systems exchange heartbeats.
3. HAD starts only after GAB is communicating on all systems.


When the cluster is seeded, each node is listed in the port a membership displayed by gabconfig -a. For example:

# gabconfig -a
GAB Port Memberships
=======================================================
Port a gen a356e003 membership 0123

LLT, GAB, and VCS Startup Files

These startup files are placed on the system when VCS is installed.

Solaris:
  /etc/rc2.d/S70llt        Checks for /etc/llttab and runs /sbin/lltconfig -c to start LLT
  /etc/rc2.d/S92gab        Calls /etc/gabtab
  /etc/rc3.d/S99vcs        Runs /opt/VRTSvcs/bin/hastart

AIX:
  /etc/rc.d/rc2.d/S70llt   Checks for /etc/llttab and runs /sbin/lltconfig -c to start LLT
  /etc/rc.d/rc2.d/S92gab   Calls /etc/gabtab
  /etc/rc2.d/S99vcs        Runs /opt/VRTSvcs/bin/hastart

HP-UX:
  /sbin/rc2.d/S680llt      Checks for /etc/llttab and runs /sbin/lltconfig -c to start LLT
  /sbin/rc2.d/S920gab      Calls /etc/gabtab
  /sbin/rc3.d/S990vcs      Runs /opt/VRTSvcs/bin/hastart

Linux:
  /etc/rc.d/rc3.d/S66llt   Checks for /etc/llttab and runs /sbin/lltconfig -c to start LLT
  /etc/rc.d/rc3.d/S67gab   Calls /etc/gabtab
  /etc/rc.d/rc3.d/S99vcs   Runs /opt/VRTSvcs/bin/hastart


Manual Seeding

You can override the seed values in the gabtab file and manually force GAB to seed a system using the gabconfig command. This is useful when one of the systems in the cluster is out of service and you want to start VCS on the remaining systems.

To seed the cluster, start GAB on one node with -x to override the -n value set in the gabtab file. For example, type:

gabconfig -c -x

Warning: Only manually seed the cluster when you are sure that no other systems have GAB seeded. In clusters that do not use I/O fencing, you can potentially create a split-brain condition by using gabconfig improperly.

After you have started GAB on one system, start GAB on other systems using gabconfig with only the -c option. You do not need to force GAB to start with the -x option on other systems. When GAB starts on the other systems, it determines that GAB is already seeded and starts up.
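Putting the steps together, a minimal sketch for starting a cluster with one system out of service (system names are illustrative):

On S1, force GAB to seed, overriding the -n value from gabtab:
gabconfig -c -x
On S2, start GAB normally; it seeds because it can see S1:
gabconfig -c
Then start VCS on each running system:
hastart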

Manual Seeding

[Diagram: S1 and S2 each run LLT, GAB, and HAD; S3 is down. S1 seeds S2.]
1. S3 is down for maintenance. S1 and S2 are rebooted.
2. LLT starts on S1 and S2. GAB cannot seed with S3 down.
3. Start GAB on S1 manually and force it to seed: gabconfig –c –x. Start GAB on S2: gabconfig –c; it seeds because it can see another seeded system (S1).
4. Start HAD on S1 and S2.


Probing Resources During Startup

During initial startup, VCS autodisables a service group until all its resources are probed on all systems in the SystemList that have GAB running. When a service group is autodisabled, VCS sets the AutoDisabled attribute to 1 (true), which prevents the service group from starting on any system. This protects against a situation where enough systems are running LLT and GAB to seed the cluster, but not all systems have HAD running.

In this case, port a membership is complete, but port h is not. VCS cannot detect whether a service is running on a system where HAD is not running. Rather than allowing a potential concurrency violation to occur, VCS prevents the service group from starting anywhere until all resources are probed on all systems.

After all resources are probed on all systems, a service group can come online by bringing offline resources online. If the resources are already online, as in the case where had has been stopped with the hastop -all -force option, the resources are marked as online.
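To check whether a service group is still autodisabled for any system during startup, you can display the attribute; the group name is illustrative, and the exact display options are documented in the hagrp manual page:

hagrp -display websg -attribute AutoDisabled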

Probing Resources During Startup

[Diagram: two systems, S1 and S2, each running HAD and agents; service groups A and B.]
1. During startup, HAD autodisables service groups (A and B are autodisabled for S1 and S2).
2. HAD tells agents to probe (monitor) all resources on all systems in the SystemList to determine their status.
3. If agents successfully probe resources, HAD brings service groups online according to the AutoStart and AutoStartList attributes.


Summary

This lesson described how the cluster interconnect mechanism works and the format and content of the configuration files.

Next Steps

The next lesson describes how system and communication failures are handled in a VCS cluster environment that does not support I/O fencing.

Additional Resources
• VERITAS Cluster Server User's Guide
  This guide provides detailed information on procedures and concepts for configuring and managing VCS clusters.
• VERITAS Cluster Server Installation Guide
  This guide provides detailed information on configuring VCS communication mechanisms.

Lesson Summary
Key Points
– The cluster interconnect is used for cluster membership and status information.
– The cluster interconnect configuration may never require modification, but can be altered for site-specific requirements.
Reference Materials
– VERITAS Cluster Server Installation Guide
– VERITAS Cluster Server User's Guide


Lesson 13
System and Communication Faults


Introduction

Overview
This lesson describes how VCS handles system and communication failures in clusters that do not implement I/O fencing.

Importance
A thorough understanding of how VCS responds to system and communication faults ensures that you know how services and their users are affected in common failure situations.

Lesson Introduction

Lesson 1: VCS Building Blocks
Lesson 2: Preparing a Site for VCS
Lesson 3: Installing VCS
Lesson 4: VCS Operations
Lesson 5: Preparing Services for VCS
Lesson 6: VCS Configuration Methods
Lesson 7: Online Configuration of Service Groups
Lesson 8: Offline Configuration of Service Groups
Lesson 9: Sharing Network Interfaces
Lesson 10: Configuring Notification
Lesson 11: Configuring VCS Response to Faults
Lesson 12: Cluster Communications
Lesson 13: System and Communication Faults
Lesson 14: I/O Fencing
Lesson 15: Troubleshooting


Outline of Topics
• Ensuring Data Integrity
• Cluster Interconnect Failures
• Changing the Interconnect Configuration

Lesson Topics and Objectives

After completing this lesson, you will be able to:

Topic                                      Objective
Ensuring Data Integrity                    Describe VERITAS recommendations for ensuring data integrity.
Cluster Interconnect Failures              Describe how VCS responds to cluster interconnect failures.
Changing the Interconnect Configuration    Change the cluster interconnect configuration.


Ensuring Data Integrity

A cluster implementation must protect data integrity by preventing a split-brain condition.

The simplest implementation is to disallow any corrective action in the cluster when contact is lost with a node. However, this means that automatic recovery after a legitimate node fault is not possible.

In order to support automatic recovery after a legitimate node fault, you must have an additional mechanism to verify that a node failed.

VCS 4.0 introduced I/O fencing as the primary mechanism to verify node failure, which ensures that data is safe above all other criteria. For environments that do not support I/O fencing, several additional methods for protecting data integrity are also supported.

Ensuring Data Integrity
For VCS 4.x, VERITAS recommends using I/O fencing to protect data on shared storage that supports SCSI-3 persistent reservations (PR).
For environments that do not have SCSI-3 PR support, VCS supports additional protection mechanisms for membership arbitration:
– Redundant communication links
– Separate heartbeat infrastructures
– Jeopardy cluster membership
– Autodisabled service groups


VCS Response to System Failure

The example cluster used throughout most of this section contains three systems, S1, S2, and S3, which can each run any of the three service groups, A, B, and C. The abbreviated system and service group names are used to simplify the diagrams.

In this example, there are two Ethernet LLT links for the cluster interconnect.

Prior to any failures, systems S1, S2, and S3 are part of the regular membership of cluster number 1.

System Failure Example

[Diagram: S1 runs service group A, S2 runs B, and S3 runs C.]
S3 faults, and C is started on S1 or S2.
Regular membership: S1, S2. No membership: S3.


Failover Duration on a System Fault

When a system faults, application services that were running on that system are disrupted until the services are started on another system in the cluster. The time required to address a system fault is a combination of the time required to:
• Detect the system failure.
  A system is determined to be faulted according to these default timeout periods:
  – LLT timeout: If LLT on a running system does not receive a heartbeat from a system for 16 seconds, LLT notifies GAB of a heartbeat failure.
  – GAB stable timeout: GAB determines that a membership change is occurring, and after five seconds, GAB delivers the membership change to HAD.
  With the default values, detection therefore takes 16 + 5 = 21 seconds.
• Select a failover target.
  The time required for the VCS policy module to determine the target system is negligible, less than one second in all cases, in comparison to the other factors.
• Bring the service group online on another system in the cluster.
  As described in an earlier lesson, the time required for the application service to start up is a key factor in determining the total failover time.

Failover Duration on a System Failure
In the case of a system failure, service group failover time is the sum of the duration of each of these tasks:
+ Detect the system failure (21 seconds for heartbeat timeouts)
+ Select a failover target (less than one second)
+ Bring the service group online on another system in the cluster
= Failover duration


Cluster Interconnect Failures

Single LLT Link Failure

In the case where a node has only one functional LLT link, the node is a member of both the regular membership and the jeopardy membership. Being in a regular membership and a jeopardy membership at the same time changes only the failover behavior on system fault. All other cluster functions remain unaffected. This means that failover due to a resource fault or switchover of service groups at operator request is unaffected.

The only change is that other systems are prevented from starting the service groups of a jeopardy system if that system then faults. VCS continues to operate as a single cluster when at least one network channel exists between the systems.

In the example shown in the diagram, where one LLT link fails:
• A jeopardy membership is formed that includes just system S3.
• System S3 is also a member of the regular cluster membership with systems S1 and S2.
• Service groups A, B, and C continue to run, and all other cluster functions remain unaffected.
• Failover due to a resource fault or an operator request to switch a service group is unaffected.
• If system S3 now faults or its last LLT link is lost, service group C is not started on systems S1 or S2.

Single LLT Link Remaining

[Diagram: S1 runs A, S2 runs B, S3 runs C; one of S3's LLT links has failed.]
Regular membership: S1, S2, S3. Jeopardy membership: S3.


Jeopardy Membership

When a system is down to a single LLT link, VCS can no longer reliably discriminate between loss of a system and loss of the last LLT connection. Systems with only a single LLT link are put into a special cluster membership known as jeopardy.

Jeopardy is a mechanism for preventing a split-brain condition if the last LLT link fails. If a system is in a jeopardy membership and then loses its final LLT link:
• Service groups in the jeopardy membership are autodisabled in the regular cluster membership.
• Service groups in the regular membership are autodisabled in the jeopardy membership.

Jeopardy membership also occurs in the case where had stops and hashadow is unable to restart had.

Recovering from a Jeopardy Membership

Recovery from a single LLT link failure is simple: fix and reconnect the link.

When GAB detects that the link is functioning again and the system in jeopardy has reliable (redundant) communication with the other cluster systems, the jeopardy membership is removed.

Jeopardy Membership
A special type of cluster membership called jeopardy is formed when one or more systems have only a single LLT link.
• Service groups continue to run, and the cluster functions normally. Failover due to resource faults and switching at operator request are unaffected.
• The service groups running on a system in jeopardy cannot fail over to another system if that system then faults or loses its last link.
• Reconnect the link to recover from the jeopardy condition.


Transition from Jeopardy to Network Partition

If the last LLT link fails:
• A new regular cluster membership is formed that includes only systems S1 and S2. This is referred to as a mini-cluster.
• A new separate membership is created for system S3, which is a mini-cluster with a single system.
• Because system S3 was in a jeopardy membership prior to the last link failing:
  – Service group C is autodisabled in the mini-cluster containing systems S1 and S2 to prevent either system from starting it.
  – Service groups A and B are autodisabled in the cluster membership for system S3 to prevent system S3 from starting either one.
• Service groups A and B can still fail over between systems S1 and S2.

In this example, the cluster interconnect has partitioned and two separate cluster memberships have formed as a result, one on each side of the partition.

Each of the mini-clusters continues to operate. However, because they cannot communicate, each maintains and updates only its own version of the cluster configuration and the systems on different sides of the network partition have different cluster configurations.

Transition from Jeopardy to Network Partition

[Diagram: S1 runs A, S2 runs B, S3 runs C; the last LLT link to S3 fails.]
1. Jeopardy membership: S3.
2. The last link fails: a mini-cluster with regular membership S1, S2 forms, and a separate mini-cluster with regular membership S3 forms (no jeopardy membership).
3. Service groups are autodisabled: A and B are autodisabled for S3; C is autodisabled for S1 and S2.


Recovering from a Network Partition

After a cluster is partitioned, reconnecting the LLT links must be undertaken with care because each mini-cluster has its own separate cluster configuration.

You must enable the cluster configurations to resynchronize by stopping VCS on the systems on one side of the network partition. When you reconnect the interconnect, GAB rejoins the regular cluster membership, and you can then start VCS using hastart so that VCS rebuilds the cluster configuration from the other running systems in the regular cluster membership.

To recover from a network partition:
1. On the mini-cluster with the fewest systems (S3, in this example), stop VCS and leave services running.
2. Recable or fix the LLT links.
3. Restart VCS. VCS autoenables all service groups so that failover can occur.
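As a command-level sketch of this procedure, run on the smaller mini-cluster (S3 in this example): hastop -local -force stops HAD while leaving the applications running, and hastart restarts VCS after the links are repaired:

hastop -local -force
(recable or fix the LLT links)
hastart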

Recovering from a Network Partition

[Diagram: S1 runs A, S2 runs B, S3 runs C after the partition.]
1. Stop HAD on S3; the mini-cluster with S1 and S2 continues to run.
2. Fix the LLT links.
3. Start HAD on S3.
4. A, B, and C are autoenabled by HAD (A and B autoenabled for S3; C autoenabled for S1 and S2).


Recovery Behavior

When a cluster partitions because the cluster interconnect has failed, each of the mini-clusters continues to operate. However, because they cannot communicate, each maintains and updates only its own version of the cluster configuration, and the systems on different sides of the network partition have different cluster configurations.

If you reconnect the LLT links without first stopping VCS on one side of the partition, GAB automatically stops HAD on selected systems in the cluster to protect against a potential split-brain scenario.

GAB protects the cluster as follows:
• In a two-system cluster, the system with the lowest LLT node number continues to run VCS, and VCS is stopped on the higher-numbered system.
• In a multisystem cluster, the mini-cluster with the most systems running continues to run VCS. VCS is stopped on the systems in the smaller mini-clusters.
• If a multisystem cluster is split into two equal-size mini-clusters, the mini-cluster containing the lowest node number continues to run VCS.

Recovery Behavior
If you did not stop HAD before reconnecting the cluster interconnect after a network partition, VCS and GAB are automatically stopped and restarted as follows:
Two-system cluster:
– The system with the lowest LLT node number continues to run VCS.
– VCS is stopped on the higher-numbered system.
Multisystem cluster:
– The mini-cluster with the most systems running continues to run VCS. VCS is stopped on the systems in the smaller mini-clusters.
– If split into two equal-size mini-clusters, the mini-cluster containing the lowest node number continues to run VCS.


Modifying the Default Recovery Behavior

As an added protection against a split-brain condition, you can configure GAB to force systems to reboot immediately, without a system shutdown, after a partitioned cluster is reconnected. To do so, specify the -j option to gabconfig in /etc/gabtab.

In this case, if you reconnect the LLT links and do not stop VCS, GAB prevents conflicts by halting systems according to these rules:
• In a two-system cluster, the system with the lowest LLT node number continues to run and the higher-numbered system is halted (panics).
• In a multisystem cluster, the mini-cluster with the most systems running continues to run. The systems in the smaller mini-clusters are halted.
• If a multisystem cluster is split into two equal-size mini-clusters, the mini-cluster containing the lowest node number continues to run.

Modifying the Default Recovery Behavior
You can configure GAB to force an immediate reboot without a system shutdown in the case where LLT links are reconnected after a network partition.
Modify gabtab to start GAB with the –j option. For example:
gabconfig -c -n 2 –j
This causes the higher-numbered node to shut down if GAB tries to start after all LLT links simultaneously stop and then restart.


Potential Split Brain Condition

When both LLT links fail simultaneously:
• The cluster partitions into two separate clusters.
• Each cluster determines that the other systems are down and tries to start the service groups.

If an application starts on multiple systems and can gain control of what are normally exclusive resources, such as disks in a shared storage device, a split-brain condition results and data can be corrupted.

Potential Split Brain Condition

[Diagram: both LLT links fail at once, partitioning S1 and S2 from S3.]
1. S1 and S2 determine that S3 is faulted; S3 determines that S1 and S2 are faulted. No jeopardy occurs, so no service groups are autodisabled.
2. If all systems are in each service group's SystemList, VCS tries to bring the groups online on a failover target on each side of the partition.


Interconnect Failures with a Low-Priority Public Link

LLT can be configured to use a low-priority network link as a backup to the normal heartbeat channels. Low-priority links are typically configured on the public network or an administrative network.

In normal operation, the low-priority link carries only heartbeat traffic for cluster membership and link state maintenance. The frequency of heartbeats is reduced by half to minimize network overhead.

When the low-priority link is the only remaining LLT link, LLT switches all cluster status traffic over the link. Upon repair of any configured link, LLT switches cluster status traffic back to the high-priority link.

Notes:
• Nodes must be on the same public network segment in order to configure low-priority links. LLT is a non-routable protocol.
• You can have up to eight LLT links in total, which can be a combination of low- and high-priority links. If you have three high-priority links in the scenario shown in the slide, you would have the same progression to jeopardy membership. The difference is that all three links are used for regular heartbeats and cluster status information.
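The following llttab fragment is a minimal sketch of this configuration; the Solaris device names are illustrative, and the link-lowpri directive syntax should be verified against the llttab manual page for your platform:

set-node 1
set-cluster 10
link qfe0 /dev/qfe:0 - ether - -
link hme0 /dev/hme:0 - ether - -
link-lowpri hme1 /dev/hme:1 - ether - -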

Interconnect Failures with a Low-Priority Public Link

[Diagram: S1 runs A, S2 runs B, S3 runs C; a low-priority link is configured on the public network.]
1. The first dedicated LLT link fails: no change in membership.
2. The second dedicated LLT link fails: regular membership remains S1, S2, S3, with a jeopardy membership for S3; the public link is now used for heartbeat and status.




Simultaneous Interconnect Failure with a Low-Priority Link

If the dedicated Ethernet LLT links fail when a low-priority link is still functioning, a jeopardy membership is formed.

The public network link is then used for all VCS membership and configuration data until a private Ethernet LLT network is restored.

Simultaneous Interconnect Failure with a Low-Priority Link
Both dedicated LLT links to S3 fail at once. Jeopardy membership: S3; the public link is now used for heartbeat and status. Regular membership: S1, S2, S3.



Interconnect Failures with Service Group Heartbeats

VCS provides another type of heartbeat communication channel called service group heartbeats. The heartbeat disks used for service group heartbeats must be accessible from each system that can run the service group.

VCS includes the ServiceGroupHB resource type to implement this type of heartbeat. You add a ServiceGroupHB resource to the service group at the bottom of the dependency tree to ensure that no other nonpersistent resource can come online unless the ServiceGroupHB resource is already online.

Bringing the ServiceGroupHB resource online starts an internal process that periodically writes heartbeats to the disk. This process increments a counter, which enables other systems to recognize that the ServiceGroupHB resource is online.

Only one system can initiate this process. When VCS attempts to bring a ServiceGroupHB resource online on a system, that system monitors the disk to detect if heartbeats are being written by another system. If heartbeats are being written, VCS faults the ServiceGroupHB resource, thereby preventing the service group from being brought online.

In the example shown in the diagram, both LLT links fail simultaneously, creating a network partition. VCS tries to bring service groups up on each side of the partition, but the ServiceGroupHB resources fault while coming online because the counter on the disk continues to increase.
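As a sketch of the dependency structure described above, the following main.cf fragment places a ServiceGroupHB resource below a DiskGroup resource so that the disk group cannot come online unless the heartbeat resource is online. The group, resource, and disk group names are hypothetical, and the ServiceGroupHB attributes are omitted; consult the agent reference for the actual attribute names:

group appsg (
    SystemList = { S1 = 0, S2 = 1 }
)

// Heartbeat resource at the bottom of the dependency tree;
// attribute names omitted (see the ServiceGroupHB agent reference).
ServiceGroupHB apphb (
)

DiskGroup appdg (
    DiskGroup = appdg
)

// The disk group cannot come online unless the
// heartbeat resource is already online.
appdg requires apphb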

Interconnect Failure with Service Group Heartbeats
1 A network partition forms. Regular membership: S1, S2; regular membership: S3.
2 The ServiceGroupHB resource faults when brought online: C faults on S1 or S2, and A faults on S3.




Preexisting Network Partition

A preexisting network partition occurs if LLT links fail while a system is down. If the system comes back up and starts running services without being able to communicate with the rest of the cluster, a split brain condition can result.

When a preexisting network partition occurs, VCS prevents systems on one side of the partition from starting applications that may already be running by preventing HAD from starting on those systems.

In the scenario shown in the diagram, system S3 cannot start HAD when it reboots because the network failure prevents GAB from communicating with any other cluster systems; therefore, system S3 cannot seed.
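On a node in this state, you can confirm that GAB has not seeded by listing the port memberships; with no seeded membership, no port a entry is displayed (output details vary by version):

gabconfig -a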

Preexisting Network Partition
1 LLT links to S3 are disconnected while S3 is down: S3 faults, and C is started on S1 or S2. Regular membership: S1, S2.
2 No membership: S3.
3 S3 reboots; S3 cannot start HAD because GAB on S3 can detect only one member.



Changing the Interconnect Configuration

You may never need to perform any manual configuration of the cluster interconnect because the VCS installation utility sets up the interconnect based on the information you provide about the cluster.

However, certain configuration tasks require you to modify VCS communication configuration files, as shown in the slide.

Example Scenarios
These are some examples of when you may need to change the cluster interconnect configuration:
• Adding or removing cluster nodes
• Merging clusters
• Changing communication parameters, such as the heartbeat time interval
• Changing recovery behavior
• Changing or adding interfaces used for the cluster interconnect
• Configuring additional disk or network heartbeat links to increase heartbeat redundancy




Modifying the Cluster Interconnect Configuration

The overall process shown in the diagram is the same for any type of change to the VCS communications configuration.

Although some types of modifications do not require you to stop both GAB and LLT, using this procedure ensures that any type of change you make takes effect.

For example, if you added a system to a running cluster, you can change the value of -n in the gabtab file without having to restart GAB. However, if you added the -j option to change the recovery behavior, you must either restart GAB or execute the gabtab command manually for the change to take effect.

Similarly, if you add a host entry to llthosts, you do not need to restart LLT. However, if you change llttab, or you change a host name in llthosts, you must stop and restart LLT, and, therefore, GAB.

Regardless of the type of change made, the procedure shown in the slide ensures that the changes take effect. You can also use the scripts in the /etc/rc*.d directories to stop and start services.

Note: On Solaris, you must also unload the LLT and GAB kernel modules if you are removing a system from the cluster or upgrading the LLT or GAB binaries. For example:
modinfo | grep gab
modunload -i gab_id
modinfo | grep llt
modunload -i llt_id

Procedure for Modifying the Configuration
1 Stop VCS: hastop -all -force
2 Stop GAB: gabconfig -U
3 Stop LLT: lltconfig -U
4 Edit the communication configuration files, for example:
# gabtab file
[path]/gabconfig -c -n #
# llttab file
set-node S1
set-cluster 10
# llthosts
0 S1
1 S2
# sysname
S1
5 Start LLT: lltconfig -c
6 Start GAB: gabconfig -c -n #
7 Start VCS: hastart



Adding LLT Links

You can add links to the LLT configuration as additional layers of redundancy for the cluster interconnect. You may want an additional interconnect link for:
• VCS, for heartbeat redundancy
• Storage Foundation for Oracle RAC, for additional bandwidth

To add an Ethernet link to the cluster interconnect:
1 Cable the link on all systems.
2 Use the process on the previous page to modify the llttab file on each system to add the new link directive.

To add a low-priority public network link, add a link-lowpri directive using the same syntax as the link directive, as shown in the llttab file example in the slide.

VCS uses the low-priority link only for heartbeats (at half the normal rate), unless it is the only remaining link in the cluster interconnect.

# cat /etc/llttab
set-node S1
set-cluster 10
# Solaris example
link qfe0 /dev/qfe:0 - ether - -
link hme0 /dev/hme:0 - ether - -
link ce0 /dev/ce:0 - ether - -
link-lowpri qfe1 /dev/qfe:1 - ether - -
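After restarting LLT with the modified file, you can confirm that the new link is recognized using the lltstat utility; a quick check might look like this (output details vary by platform and version):

lltstat -n
lltstat -nvv | more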

Adding LLT Links
The fields of a link directive are the tag name, the device:unit, the node range ('-' means all nodes), the link type (ether), the SAP, and the MTU ('-' selects the default). The device syntax shown applies to Solaris and differs on AIX, HP-UX, and Linux.




Summary

This lesson described how VCS protects data in shared storage environments that do not support I/O fencing. You also learned how you can modify the communication configuration.

Next Steps

Now that you know how VCS behaves when faults occur in a non-fencing environment, you can learn how VCS handles system and communication failures in a fencing environment.

Additional Resources
• VERITAS Cluster Server Installation Guide
This guide describes how to configure the cluster interconnect.
• VERITAS Cluster Server User's Guide
This guide provides detailed information on procedures and concepts for configuring and managing VCS clusters.

Lesson Summary
Key Points
– Use redundant cluster interconnect links to minimize interruption to services.
– Use a standard procedure for modifying the interconnect configuration when changes are required.
Reference Materials
– VERITAS Cluster Server Installation Guide
– VERITAS Cluster Server User's Guide



Lab 13: Testing Communication Failures

Labs and solutions for this lesson are located on the following pages.

Appendix A provides brief lab instructions for experienced students.
• "Lab 13 Synopsis: Testing Communication Failures," page A-60
Appendix B provides step-by-step lab instructions.
• "Lab 13 Details: Testing Communication Failures," page B-101
Appendix C provides complete lab instructions and solutions.
• "Lab 13 Solutions: Testing Communication Failures," page C-149

Goal
The purpose of this lab is to configure a low-priority link, and then pull network cables and observe how VCS responds.

Prerequisites
Work together to perform the tasks in this lab.

Results
All interconnect links are up when the lab is completed.

Lab 13: Testing Communication Failures
1. Configure the InJeopardy trigger (optional).
2. Configure a low-priority link.
3. Test failures.




Optional Lab: Configuring the InJeopardy Trigger

Goal
The purpose of this lab is to configure the InJeopardy trigger and see which communication failures cause the trigger to run.

Prerequisites
Obtain any classroom-specific values needed for your lab environment and record these values in the design worksheet included with the lab exercise.

Results
All network links are up and monitored, and the InJeopardy trigger has been run.




Lesson 14
I/O Fencing



Introduction

Overview
This lesson describes how the VCS I/O fencing feature protects data in a shared storage environment.

Importance
Having a thorough understanding of how VCS responds to system and communication faults when I/O fencing is configured ensures that you know how services and their users are affected in common failure situations.

Course Overview
Lesson 1: VCS Building Blocks
Lesson 2: Preparing a Site for VCS
Lesson 3: Installing VCS
Lesson 4: VCS Operations
Lesson 5: Preparing Services for VCS
Lesson 6: VCS Configuration Methods
Lesson 7: Online Configuration of Service Groups
Lesson 8: Offline Configuration of Service Groups
Lesson 9: Sharing Network Interfaces
Lesson 10: Configuring Notification
Lesson 11: Configuring VCS Response to Faults
Lesson 12: Cluster Communications
Lesson 13: System and Communication Faults
Lesson 14: I/O Fencing
Lesson 15: Troubleshooting




Outline of Topics
• Data Protection Requirements
• I/O Fencing Concepts and Components
• I/O Fencing Operations
• I/O Fencing Implementation
• Configuring I/O Fencing
• Recovering Fenced Systems

Lesson Topics and Objectives
After completing this lesson, you will be able to:
• Data Protection Requirements: Define requirements for protecting data in a cluster.
• I/O Fencing Concepts and Components: Describe I/O fencing concepts and components.
• I/O Fencing Operations: Describe I/O fencing operations.
• I/O Fencing Implementation: Describe how I/O fencing is implemented.
• Configuring I/O Fencing: Configure I/O fencing in a VCS environment.
• Recovering Fenced Systems: Recover systems that have been fenced off.



Data Protection Requirements

Understanding the Data Protection Problem
In order to understand how VCS protects shared data in a high availability environment, it helps to see the problem that needs to be solved: how a cluster goes from normal operation to responding to various failures.

Normal VCS Operation

When the cluster is functioning normally, one system is running one or more service groups and has the storage objects for those services imported or accessible from that system only.

Normal VCS Operation
Heartbeats travel on the cluster interconnect, sending "I am alive" messages. Applications (service groups) run in the cluster, and their current status is known.




System Failure

In order to keep services highly available, the cluster software must be capable of taking corrective action when a system fails. Most cluster implementations are lights-out environments: the HA software must automatically respond to faults without administrator intervention.

Example corrective actions are:
• Starting an application on another node
• Reconfiguring parallel applications to no longer include the departed node in locking operations

The animation shows conceptually how VCS handles a system fault. The yellow service group that was running on Server 2 is brought online on Server 1 after GAB on Server 1 stops receiving heartbeats from Server 2 and notifies HAD.

System Failure
System failure is detected when the "I am alive" heartbeats are no longer seen coming from a given node. VCS then takes corrective action to fail over the service groups from the failed server.



Interconnect Failure

A key function of a high availability solution is to detect and respond to system faults. However, the system may still be running but unable to communicate heartbeats due to a failure of the cluster interconnect. The other systems in the cluster have no way to distinguish between the two situations.

This problem is faced by all HA solutions—how can the HA software distinguish a system fault from a failure of the cluster interconnect? As shown in the example diagram, whether the system on the right side (Server 2) fails or the cluster interconnect fails, the system on the left (Server 1) no longer receives heartbeats from the other system.

The HA software must have a method to prevent an uncoordinated view among systems of the cluster membership in any type of failure scenario.

In the case where nodes are running but the cluster interconnect has failed, the HA software needs to have a way to determine how to handle the nodes on each side of the network split, or partition.

Network Partition

A network partition is formed when one or more nodes stop communicating on the cluster interconnect due to a failure of the interconnect.

Interconnect Failure
If the interconnect fails between the clustered systems, the symptoms look the same as a system failure. However, VCS should not take corrective action and fail over the service groups.




Split Brain Condition

A network partition can lead to split brain condition, an issue faced by all cluster implementations. This problem occurs when the HA software cannot distinguish between a system failure and an interconnect failure; the symptoms look identical.

For example, in the diagram, if the right-side system fails, it stops sending heartbeats over the private interconnect. The left node then takes corrective action. Failure of the cluster interconnect presents identical symptoms. In this case, both nodes determine that their peer has departed and attempt to take corrective action. This can result in data corruption if both nodes are able to take control of storage in an uncoordinated manner.

Other scenarios can cause this situation. If a system is so busy that it appears to be hung, it can be declared failed and its services started on another system. This can also happen on systems where the hardware supports a break and resume function: if the system is dropped to command-prompt level with a break and subsequently resumed, the system can appear to have failed, the cluster is re-formed, and then the system recovers and begins writing to shared storage again.

The remainder of this lesson describes how the VERITAS fencing mechanism prevents split brain condition in failure situations.

Split Brain Condition
If each system were to take corrective action and bring the other system's service groups online, each application would be running on each system (each changing, for example, block 1024), and data corruption can occur.



Data Protection Requirements

The key to protecting data in a shared storage cluster environment is to guarantee that there is always a single consistent view of cluster membership. In other words, when one or more systems stop sending heartbeats, the HA software must determine which systems can continue to participate in the cluster membership and how to handle the other systems.

Data Protection Requirements
A cluster environment requires a method to guarantee:
• Cluster membership: The membership must be consistent.
• Data protection: Upon a membership change, only one cluster can survive and have exclusive control of shared data disks.
When the heartbeats stop, VCS needs to take action, but both failures have the same symptoms. Which failure is it? What action should be taken?




I/O Fencing Concepts and Components

In order to guarantee data integrity in the event of communication failures among cluster members, full protection requires determining which systems should remain in the cluster (membership), as well as guaranteed blocking of storage access from any system that is not an acknowledged member of the cluster.

VCS 4.0 uses a mechanism called I/O fencing to guarantee data protection. I/O fencing uses SCSI-3 persistent reservations (PR) to fence off data drives to prevent split-brain condition, as described in detail in this lesson.

I/O fencing uses an enhancement to the SCSI specification known as SCSI-3 persistent reservations (SCSI-3 PR, or just PR). SCSI-3 PR is designed to resolve the issues of using SCSI reservations in a modern clustered SAN environment. SCSI-3 PR supports multiple nodes accessing a device while at the same time blocking access for other nodes. Persistent reservations survive SCSI bus resets, and PR also supports multiple paths from a host to a disk.

I/O Fencing
VERITAS Cluster Server 4.x uses a mechanism called I/O fencing to guarantee data protection. I/O fencing uses SCSI-3 persistent reservations (PR) to fence off data drives to prevent split brain condition.



I/O Fencing Components

VCS uses fencing to allow write access to members of the active cluster and to block access to nonmembers.

I/O fencing in VCS consists of several components. The physical components are coordinator disks and data disks. Each has a unique purpose and uses different physical disk devices.

Coordinator Disks

The coordinator disks act as a global lock mechanism, determining which nodes are currently registered in the cluster. This registration is represented by a unique key associated with each node that is written to the coordinator disks. In order for a node to access a data disk, that node must have a key registered on coordinator disks.

When system or interconnect failures occur, the coordinator disks ensure that only one cluster survives, as described in the “I/O Fencing Operations” section.

I/O Fencing Components
Coordinator disks:
• Act as a global lock device
• Determine which nodes are currently registered in the cluster
• Ensure that only one cluster survives in the case of a network partition




Data Disks

Data disks are standard disk devices used for shared data storage. These can be physical disks or RAID logical units (LUNs). These disks must support SCSI-3 PR. Data disks are incorporated into standard VM disk groups. In operation, Volume Manager is responsible for fencing data disks on a disk group basis.

Disks added to a disk group are automatically fenced, as are new paths to a device when they are discovered.

Data Disks
• Are located on a shared storage device
• Store application data for service groups
• Must support SCSI-3
• Must be in a Volume Manager 4.x disk group



I/O Fencing Operations

Registration with Coordinator Disks
After GAB has started and port a membership is established, each system registers with the coordinator disks. HAD cannot start until registration is complete.

Registration keys are based on the LLT node number. Each key is eight characters—the left-most character is the ASCII character corresponding to the LLT node ID. For example, Node 0 uses key A-------, Node 1 uses B-------, Node 2 is C, and so on. The right-most seven characters are dashes. For simplicity, these are shown as A and B in the diagram.

Note: The registration key is not actually written to disk, but is stored in the drive electronics or RAID controller.

All systems are aware of the keys of all other systems, forming a membership of registered systems. This fencing membership—maintained by way of GAB port b—is the basis for determining cluster membership and fencing data drives.
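You can observe this membership directly. The vxfenadm utility, used later in this lesson for recovery, lists the keys registered on the coordinator disks; on a healthy two-node cluster you would expect to see the A------- and B------- keys on each coordinator disk:

vxfenadm -g all -f /etc/vxfentab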

Registration with Coordinator Disks
1. At GAB startup, just after port a membership is established, nodes must register with the coordinator disks before starting HAD.
2. Nodes register keys based on the LLT node number (0=A, 1=B, and so on).
3. Nodes can only register with the fencing membership if the coordinator disks have the expected keys on them.
4. Fencing membership uses GAB port b.




Service Group Startup

After each system has written registration keys to the coordinator disks, the fencing membership is established and port b shows all systems as members. In the example shown in the diagram, the cluster has two members, Node 0 and Node 1, so port b membership shows 0 and 1.

At this point, HAD is started on each system. When HAD is running, VCS brings service groups online according to their specified startup policies. When a disk group resource associated with a service group is brought online, the Volume Manager disk group agent (DiskGroup) imports the disk group and writes a SCSI-3 registration key to the data disk. This registration is performed in a similar way to coordinator disk registration. The key is different for each node; Node 0 uses AVCS, Node 1 uses BVCS, and so on.

In the example shown in the diagram, Node 0 is registered to write to the data disks in the disk group belonging to the DB service group. Node 1 is registered to write to the data disks in the disk group belonging to the App service group.

After registering with the data disk, Volume Manager sets a Write Exclusive Registrants Only reservation on the data disk. This reservation means that only the registered system can write to the data disk.

Service Group Startup
HAD is allowed to start after fencing is finished. As service groups are started, VM disk groups are imported; importing a disk group writes a registration key and places a reservation on its disks. The disk group for DB has the key AVCS and a reservation for Node 0 exclusive access; the disk group for App has the key BVCS and a reservation for Node 1 exclusive access.



System Failure

The diagram shows the fencing sequence when a system fails.
1 Node 0 detects that Node 1 has failed when the LLT heartbeat times out and informs GAB. At this point, port a on Node 0 (GAB membership) shows only 0.
2 The fencing driver is notified of the change in GAB membership, and Node 0 races to win control of a majority of the coordinator disks. This means Node 0 must eject the Node 1 keys (B) from at least two of the three coordinator disks. In coordinator disk serial number order, the fencing driver ejects the registration of Node 1 (B keys) using the SCSI-3 Preempt and Abort command. This command allows a registered member on a disk to eject the registration of another. Because I/O fencing uses the same key for all paths from a host, a single preempt and abort ejects a host from all paths to storage.
3 In this example, Node 0 wins the race for each coordinator disk by ejecting the Node 1 keys from each coordinator disk.
4 Now port b (fencing membership) shows only Node 0 because the Node 1 keys have been ejected. Therefore, fencing has a consistent membership and passes the cluster reconfiguration information to HAD.
5 GAB port h reflects the new cluster membership containing only Node 0, and HAD now performs whatever failover operations are defined for the service groups that were running on the departed system.
Fencing takes place when a service group is brought online on a surviving system as part of the disk group importing process. When the DiskGroup resources come online, the agent online entry point instructs Volume Manager to import the disk group with options to remove the Node 1 registration and reservation and place a SCSI-3 registration and reservation for Node 0.

System Failure
1. Node 0 detects no more heartbeats from Node 1.
2. Node 0 races for the coordinator disks, ejecting all "B" keys.
3. Node 0 wins all coordinator disks.
4. Node 0 knows it has a perfect membership.
5. VCS can now fail over the App service group and import the disk group, changing the reservation: the disk group for App now has the key AVCS and a reservation for Node 0 exclusive access.




Interconnect Failure

The diagram shows how VCS handles fencing if the cluster interconnect is severed and a network partition is created. In this case, multiple nodes race for control of the coordinator disks.
1 LLT on Node 0 informs GAB that it has not received a heartbeat from Node 1 within the timeout period. Likewise, LLT on Node 1 informs GAB that it has not received a heartbeat from Node 0.
2 When the fencing drivers on both nodes receive a cluster membership change from GAB, they begin racing to gain control of the coordinator disks. The node that reaches the first coordinator disk (based on disk serial number) ejects the failed node's key. In this example, Node 0 wins the race for the first coordinator disk and ejects the B------- key. After the B key is ejected by Node 0, Node 1 cannot eject the key for Node 0 because the SCSI-3 PR protocol says that only a member can eject a member. SCSI command tag queuing creates a stack of commands to process, so there is no chance of these two ejects occurring simultaneously on the drive. This condition means that only one system can win.
3 Node 0 also wins the race for the second coordinator disk, because Node 0 is favored according to the algorithm used by the fencing driver. Because Node 1 lost the race for the first coordinator disk, Node 1 has to reread the coordinator disk keys a number of times before it tries to eject the other node's key. This favors the winner of the first coordinator disk to win the remaining coordinator disks. Therefore, Node 1 does not gain control of the second or third coordinator disks.

Interconnect Failure
1. Node 0 detects no more heartbeats from Node 1; Node 1 detects no more heartbeats from Node 0.
2. Nodes 0 and 1 race for the coordinator disks, ejecting each other's keys. Only one node can win each disk.
3. Node 0 wins the majority of the coordinator disks.
4. Node 1 panics.
5. Node 0 now has a perfect membership.
6. VCS fails over the App service group, importing the disk group and changing the reservation: the disk group for App now has the key AVCS and a reservation for Node 0 exclusive access.



4 After Node 0 wins control of the majority of coordinator disks (all three in this example), Node 1 loses the race and calls a kernel panic to shut down immediately and reboot.

5 Now port b (fencing membership) shows only Node 0 because Node 1 keys have been ejected. Therefore, fencing has a consistent membership and passes the cluster reconfiguration information to HAD.

6 GAB port h reflects the new cluster membership containing only Node 0, and HAD now performs the defined failover operations for the service groups that were running on the departed system. When a service group is brought online on a surviving system, fencing takes place as part of the disk group importing process.




Interconnect Failure on Node Restart

A preexisting network partition occurs when the cluster interconnect is severed and a node subsequently reboots to attempt to form a new cluster. After the node starts up, it is prevented from gaining control of shared disks.

In this example, the cluster interconnect remains severed. Node 0 is running and has key A------- registered with the coordinator disks.
1 Node 1 starts up.
2 GAB cannot seed because it detects only Node 1 and the gabtab file specifies gabconfig -c -n2. GAB can only seed if two systems are communicating. Therefore, HAD cannot start and service groups do not start.
3 At this point, an administrator mistakenly forces GAB to seed Node 1 using the gabconfig -x command.
4 As part of the initialization of fencing, the fencing driver receives a list of current nodes in the GAB membership, reads the keys present on the coordinator disks, and performs a comparison. In this example, the fencing driver on Node 1 detects keys from Node 0 (A-------) but does not detect Node 0 in the GAB membership because the cluster interconnect has been severed.
gabconfig -a
GAB Port Memberships
===================================================
Port a gen b7r004 membership 1

Interconnect Failure on Node Restart
1. Node 1 reboots.
2. Node 1 cannot join; gabconfig -c -n2 is set in gabtab and only one node is seen.
3. The administrator mistakenly seeds Node 1 with gabconfig -c -x.
4. The fence driver expects to see only "B" keys or no keys on the coordinator disks.
5. When the fence driver sees "A" keys, it is disabled. No further cluster startup can occur without a reboot.
6. To start the cluster, fix the interconnect and reboot Node 1.



5 Because Node 1 can detect keys on the coordinator disks for systems not in the GAB membership, the fencing driver on Node 1 determines that a preexisting network partition exists and prints an error message to the console. The fencing driver prevents HAD from starting, which, in turn, prevents disk groups from being imported.

To enable Node 1 to rejoin the cluster, you must repair the interconnect and restart Node 1.
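After the restart, you can verify that fencing initialized correctly by displaying the fencing driver state; as a sketch, the vxfenadm -d option reports the fencing mode and membership (availability and output format vary by version):

vxfenadm -d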




I/O Fencing Behavior

As demonstrated in the example failure scenarios, I/O fencing behaves the same regardless of the type of failure:
• The fencing drivers on each system race for control of the coordinator disks, and the winner determines cluster membership.
• Reservations are placed on the data disks by Volume Manager when disk groups are imported.

I/O Fencing Behavior
I/O fencing behavior is the same for both scenarios:
– System failure
– Cluster interconnect failure
I/O fencing makes no assumptions: the driver races for control of the coordinator disks to form a perfect membership.
Data disks are fenced when imported by Volume Manager.
Nodes that have departed from the cluster membership are not allowed access to the data disks until they rejoin the normal cluster membership and the service groups are started on those nodes.



I/O Fencing with Multiple Nodes

In a multinode cluster, the lowest-numbered (LLT ID) node always races on behalf of the remaining nodes. This means that at any time only one node is the designated racer for any mini-cluster.

If a designated racer wins the coordinator disk race, it broadcasts this success on port b to all other nodes in the mini-cluster.

If the designated racer loses the race, it panics and reboots. All other nodes immediately detect another membership change in GAB when the racing node panics. This signals all other members that the racer has lost and they must also panic.

Majority Clusters

The I/O fencing algorithm is designed to give priority to larger clusters in any arbitration scenario. For example, if a single node is separated from a 16-node cluster due to an interconnect fault, the 15-node cluster should continue to run. The fencing driver uses the concept of a majority cluster. The algorithm determines whether the number of nodes remaining in the cluster is greater than or equal to the number of departed nodes. If so, the larger cluster is considered a majority cluster.

The majority cluster begins racing immediately for control of the coordinator disks on any membership change. The fencing drivers on the nodes in the minority cluster delay the start of the race to give an advantage to the larger cluster. This delay is accomplished by reading the keys on the coordinator disks a number of times. This algorithm ensures that the larger cluster wins, but also allows a smaller cluster to win if the departed nodes are not actually running.

I/O Fencing with Multiple Nodes
• The lowest LLT node number on each side of the mini-cluster races for the coordinator disks.
• Two systems race in this scenario.
• The winner broadcasts success on GAB port b.
• All nodes in the losing mini-cluster panic.
• With an even split, generally the highest LLT node number wins.




I/O Fencing Implementation

Communication Stack
I/O fencing uses GAB port b for communications. The following steps describe the startup sequence used by vxfen and associated communications:
1 Fencing is started with the vxfenconfig -c command. The fencing driver vxfen is started during system startup by way of the /etc/rc2.d/S97vxfen file, whether or not fencing is configured for VCS. The fencing driver:
a Passes in the list of coordinator disks
b Checks for other members on port b
c If this is the first member to register, reads serial numbers from the coordinator disks and stores them in memory
d If this is the second or later member to register, obtains the serial numbers of the coordinator disks from the first member
e Reads and compares the local serial numbers
f Errors out if the serial numbers differ
g Begins a preexisting network partition check
h Reads the current keys registered on the coordinator disks
i Determines that all keys match the current port b membership
j Registers the key with the coordinator disks
2 Membership is established (port b).
3 HAD is started and port h membership is established.

Communication Stack
I/O fencing:
• Is implemented by the fencing driver (vxfen)
• Uses GAB port b for communication
• Determines coordinator disks on vxfen startup
• Intercepts RECONFIG messages from GAB destined for the VCS engine
• Controls fencing actions by Volume Manager
On each node, the stack is LLT, GAB (port b), vxfen, and Volume Manager.



4 HAD starts service groups.
5 The DiskGroup resource is brought online, and control is passed to VxVM to import disk groups with SCSI-3 reservations.
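Once all of these steps are complete on a healthy two-node cluster, the GAB port memberships reflect each layer; a sketch of the expected gabconfig -a output (the generation numbers are illustrative):

gabconfig -a
GAB Port Memberships
===================================================
Port a gen a36e0001 membership 01
Port b gen a36e0004 membership 01
Port h gen a36e0008 membership 01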




Fencing Driver

Fencing in VCS is implemented in two primary areas:
• The vxfen fencing driver, which directs Volume Manager
• Volume Manager, which carries out the actual fencing operations at the disk group level

The fencing driver is a kernel module that connects to GAB to intercept cluster membership changes (reconfiguration messages). If a membership change occurs, GAB passes the new membership in the form of a reconfiguration message to vxfen on GAB port b. The fencing driver on the node with lowest node ID in the remaining cluster races for control of the coordinator disks, as described previously. If this node wins, it passes the list of departed nodes to VxVM to have these nodes ejected from all shared disk groups.

After carrying out required fencing actions, vxfen passes the reconfiguration message to HAD.

Fencing Driver
The VERITAS fencing driver (vxfen):
• Coordinates membership with the race for coordinator disks
• Is called by other modules for authorization to continue
• Is installed by VCS and started during system startup



Fencing Implementation in Volume Manager

Volume Manager 4.0 handles all fencing of data drives for disk groups that are controlled by the VCS DiskGroup resource type. After a node successfully joins the GAB cluster and the fencing driver determines that a preexisting network partition does not exist, the VCS DiskGroup agent directs VxVM to import disk groups using SCSI-3 registration and a Write Exclusive Registrants Only reservation. This ensures that only the registered node can write to the disk group.

Each path to a drive represents a different I/O path. I/O fencing in VCS places the same key on each path. For example, if node 0 has four paths to the first disk group, all four paths have key AVCS registered. Later, if node 0 must be ejected, VxVM preempts and aborts key AVCS, effectively ejecting all paths.

Because VxVM controls access to the storage, adding or deleting disks is not a problem. VxVM fences any new drive added to a disk group and removes keys when drives are removed. VxVM also determines if new paths are added and fences these, as well.

Fencing Implementation in Volume Manager
VM 4.x disk groups are fenced by the VCS DiskGroup agent.
When the disk group agent brings a disk group online, Volume Manager:
– Imports the disk group, placing the node key on each disk in the disk group
– Places SCSI-3 reservations on the disks:
vxdg -o groupreserve -o clearreserve -t import group
When the disk group agent takes a disk group offline, Volume Manager removes the node key and SCSI-3 reservation from each disk.
VxVM allows a disk or path to be added or removed.




Fencing Implementation in VCS

In VCS 4.0, had is modified to enable the use of fencing for data protection in the cluster. When the UseFence cluster attribute is set to SCSI3, had cannot start unless the fencing driver is operational. This ensures that services cannot be brought online by VCS unless fencing is already protecting shared storage disks.

Note: With I/O fencing configured, GAB disk heartbeats are not supported.

Fencing Implementation in VCS
HAD
– VCS is modified to use fencing for coordination.
– The UseFence attribute must be set to SCSI3 in the main.cf file.
– HAD does not start unless the fencing driver is operational when UseFence is set to SCSI3.
Legacy behavior is available: set UseFence=None.
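For reference, the attribute appears in the cluster definition at the top of the main.cf file; a minimal sketch with an illustrative cluster name:

cluster vcs1 (
    UseFence = SCSI3
)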



Coordinator Disk Implementation

Coordinator disks are special-purpose disks in a VCS environment: three standard disks or LUNs that are set aside for use by I/O fencing during cluster reconfiguration.

You cannot use coordinator disks for any other purpose in the VCS configuration. Do not store data on these disks or include the disks in disk groups used for data. The coordinator disks can be any three disks that support persistent reservations. VERITAS typically recommends the smallest possible LUNs for coordinator use.

Note: Discussion of coordinator disks in metropolitan area (campus) clusters is provided in the Disaster Recovery Using VERITAS Cluster Server course.

Coordinator Disk Implementation
Three coordinator disks are required, with these properties:
• Use standard disks or LUNs (the smallest possible LUNs, on separate spindles).
• Ensure the disks support SCSI-3 persistent reservations.
• Create a separate disk group used only for fencing.
– Do not store data on the coordinator disks.
– Deport the disk group.
• Configure hardware mirroring of the coordinator disks; they cannot be replaced without stopping the cluster.




Configuring I/O Fencing

The diagram shows the basic procedure used to configure I/O fencing in a VCS environment. Additional information is provided as follows:
1 Create a disk group for the coordinator disks. You can choose any name; the name in this example is fendg.
2 Use the vxfentsthdw utility to verify that the shared storage array supports SCSI-3 persistent reservations. Assuming that the same array is used for coordinator and data disks, you need only check one of the disks in the array.
Warning: The vxfentsthdw utility overwrites and destroys existing data on the disks by default. You can change this behavior using the -r option to perform read-only testing. Other commonly used options include:
– -f file_name (Verify all disks listed in the file.)
– -g disk_group (Verify all disks in the disk group.)
Note: You can check individual LUNs for SCSI-3 support to ensure that you have the array configured properly before checking all disk groups. To determine the paths on each system for a disk, use the vxfenadm utility to check the serial number of the disk. For example:
vxfenadm -i disk_dev_path
After you have verified the paths to that disk on each system, you can run vxfentsthdw with no arguments, which prompts you for the systems and then for the path to that disk from each system. A verified path means that the SCSI inquiry succeeds. For example, vxfenadm returns a disk serial number from a SCSI-3 disk and an "ioctl failed" message from a non-SCSI-3 disk.

I/O Fencing Configuration Procedure
1 Configure the coordinator disk group:
vxdisksetup -i disk1
vxdisksetup -i disk2
vxdisksetup -i disk3
vxdg init fendg disk1 disk2 disk3
2 Verify SCSI-3 support:
vxfentsthdw -g fendg
vxfentsthdw -rg dataDG1
vxfentsthdw -rg dataDG2
. . .
3 Deport the disk group:
vxdg deport fendg
4 Create /etc/vxfendg on all systems:
echo "fendg" > /etc/vxfendg



3 Deport the coordinator disk group.
4 Create the /etc/vxfendg fencing configuration file on each system in the cluster. The file must contain the coordinator disk group name.




5 Start the fencing driver on each system using the /etc/init.d/vxfen startup file with the start option. Upon startup, the script creates the vxfentab file with a list of all paths to each coordinator disk. This is accomplished as follows:
a Read the vxfendg file to obtain the name of the coordinator disk group:
vxdisk -o alldgs list
b Run grep to create a list of each device name (path) in the coordinator disk group.
c For each disk device in this list, run vxdisk list disk_dev and create a list of each device that is in the enabled state.
d Write the list of enabled devices to the vxfentab file.
This ensures that any time a system is rebooted, the fencing driver reinitializes the vxfentab file with the current list of all paths to the coordinator disks.
Note: This is the reason coordinator disks cannot be dynamically replaced. The fencing driver must be stopped and restarted to populate the vxfentab file with the updated paths to the coordinator disks.

6 Save and close the cluster configuration before modifying main.cf to ensure that the changes you make to main.cf are not overridden.

7 Stop VCS on all systems. Do not use the -force option. You must stop and restart service groups to reimport disk groups to place data under fencing control.

8 Set the UseFence cluster attribute to SCSI3 in the main.cf file.

Note: You cannot set UseFence dynamically while VCS is running.

Configuring Fencing (Continued)
5 Run the start script for fencing on each system:
/etc/init.d/vxfen start
6 Save and close the configuration:
haconf -dump -makero
7 Stop VCS on all systems:
hastop -all
8 Set UseFence in main.cf:
UseFence = SCSI3
9 Restart VCS:
hastart [-stale]
Note: You must stop and restart service groups so that the disk groups are imported using SCSI-3 reservations.



9 Start VCS on the system with the modified main.cf file and propagate that configuration to all cluster systems. As a best practice, start all other systems with the -stale option to ensure that all other systems wait to build their configuration from the system where you modified the main.cf file. See the "Offline Configuration of a Service Group" lesson for more information.
Note: You must stop VCS and take service groups offline. Do not use the -force option to leave services running. You must deport and reimport the disk groups to bring them under fencing control. These disk groups must be imported with SCSI-3 reservations. The disk groups are automatically deported when you stop VCS, which takes service groups offline. The disk groups are automatically imported when VCS is restarted and the service groups are brought back online.




Fencing Effects on Disk Groups

When SCSI reservations have been set on disk groups, the vxdisk -o alldgs list command no longer shows the disk groups that have been imported on non-local cluster systems. Also, the format command then shows the disks as type unknown. Therefore, you cannot run vxdisk -o alldgs list to find which disk groups are in a deported state on the local system. Instead, you can run vxdg -C import diskgroup and observe that it fails because of the SCSI reservation.

Fencing Effects on Disk Groups
After SCSI reservations have been set on disk groups:
• The vxdisk -o alldgs list command no longer shows the disk groups that are imported on other systems.
• The format command shows disks with a SCSI-3 reservation as type unknown.
• You can use vxdg -C import disk_group to determine whether a disk group is imported on another system; the command fails because of the SCSI reservation if it is.
• The vxdisk list command shows imported disks.
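A short illustrative session, using a hypothetical disk group named appdg that is imported and reserved on another node:

vxdisk -o alldgs list
vxdg -C import appdg

The first command does not list appdg, and the import fails because of the SCSI-3 reservation held by the other node.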



Stopping and Recovering Fenced Systems

Stopping Systems Running I/O Fencing
To ensure that keys held by a system are removed from disks when you stop a cluster system, use the shutdown command. If you use the reboot command, the fencing shutdown scripts do not run to clear keys from disks.

If you inadvertently use reboot to shut down, you may see a message about a pre-existing split brain condition when you try to restart the cluster. In this case, you can use the vxfenclearpre utility described in the “Recovering from a Partition-In-Time” section to remove keys.

Stopping Systems Running I/O Fencing
To shut down a cluster system that has fencing configured and running:
• Use the shutdown -r command to ensure that keys are removed from the disks reserved by the node being shut down.
• Do not use the reboot command. Keys are not removed from the reserved disks because the shutdown scripts are bypassed.




Recovery with Running Systems

If one or more nodes are fenced out due to system or interconnect failures, and some part of the cluster remains running, recover as follows:
1 Shut down the systems that are fenced off.
2 Fix the system or network problem.
3 Start up the systems.

When the systems start communicating heartbeats, they are included in the cluster membership and participate in fencing again.

Recovering from a Single-Node Failure
1 Node 2 is cut off from the heartbeat network, loses the race, and panics.
2 Shut down Node 2.
3 Fix the system or interconnect.
4 Start Node 2.



Recovering from a Partition-In-Time

VERITAS provides the vxfenclearpre script to clear keys from the coordinator and data disks in the event of a partition-in-time where all nodes are fenced off.

The following procedure shows how to recover in an example scenario where:
• Node 1 fails first.
• Node 0 fails before Node 1 is repaired.
• Node 1 is repaired and boots while Node 0 is down.
• Node 1 cannot access the coordinator disks because Node 0's keys are still on the disks.

To recover:
1 Verify that Node 0 is actually down to prevent the possibility of corruption when you manually clear the keys.
2 Verify the systems currently registered with the coordinator disks:
vxfenadm -g all -f /etc/vxfentab
The output of this command identifies the keys registered with the coordinator disks.
3 Clear all keys on the coordinator disks in addition to the data disks:
/opt/VRTSvcs/vxfen/bin/vxfenclearpre
4 Repair the faulted system.
5 Reboot all systems in the cluster.

Recovering from a Multiple-Node Failure
1 Node 1 fails. Node 0 fails before Node 1 is repaired.
2 Repair and boot Node 1. Verify that Node 0 is actually down.
3 On Node 1:
a. Ensure that the other node is not running, or you can cause split brain condition.
b. View the keys currently registered on the coordinator disks:
vxfenadm -g all -f /etc/vxfentab
c. Clear the keys on the coordinator and data disks:
vxfenclearpre
d. Repair Node 0.
e. Reboot all systems.




Summary

This lesson described how VCS protects data in a shared storage environment, focusing on the concepts and basic operations of the I/O fencing feature available in VCS version 4.

Next Steps

Now that you understand how VCS behaves normally and when faults occur, you can gain experience performing basic troubleshooting in a cluster environment.

Additional Resources
• VERITAS Cluster Server Installation Guide
This guide describes I/O fencing configuration.
• VERITAS Cluster Server User's Guide
This guide provides detailed information on procedures and concepts for configuring and managing VCS clusters.
• VERITAS Volume Manager User's Guide
This guide provides detailed information on procedures and concepts for configuring and managing storage using Volume Manager.
• http://van.veritas.com
The VERITAS Architect Network provides access to technical papers describing I/O fencing.

Lesson Summary
Key Points
– I/O fencing ensures that data is protected in a cluster environment.
– Disk devices must support SCSI-3 persistent reservations to implement I/O fencing.
Reference Materials
– VERITAS Cluster Server Installation Guide
– VERITAS Cluster Server User's Guide
– VERITAS Volume Manager User's Guide
– http://van.veritas.com


Lab 14: Configuring I/O Fencing
Labs and solutions for this lesson are located on the following pages.
• Appendix A provides brief lab instructions for experienced students: "Lab 14 Synopsis: Configuring I/O Fencing," page A-66.
• Appendix B provides step-by-step lab instructions: "Lab 14: Configuring I/O Fencing," page B-111.
• Appendix C provides complete lab instructions and solutions: "Lab 14 Solutions: Configuring I/O Fencing," page C-163.

Goal
The purpose of this lab is to set up I/O fencing in a two-node cluster and simulate node and communication failures.

Prerequisites
Work with your lab partner to complete the tasks in this lab exercise.

Results
Each student observes the failure scenarios and performs the tasks necessary to bring the cluster back to a running state.

Lab 14: Configuring I/O Fencing
Work with your lab partner to configure fencing.
Cluster systems: trainxx, trainxx
Disk groups: nameDG1, nameDG2
Coordinator Disks:
Disk 1: ___________________
Disk 2: ___________________
Disk 3: ___________________


Lesson 15: Troubleshooting


Introduction

Overview
In this lesson, you learn an approach for detecting and solving problems with VERITAS Cluster Server (VCS) software. You work with specific problem scenarios to gain a better understanding of how the product works.

Importance
To successfully deploy and manage a cluster, you need to understand the significance and meaning of errors, faults, and engine problems. This helps you detect and solve problems efficiently and effectively.

Course Overview
Lesson 1: VCS Building Blocks
Lesson 2: Preparing a Site for VCS
Lesson 3: Installing VCS
Lesson 4: VCS Operations
Lesson 5: Preparing Services for VCS
Lesson 6: VCS Configuration Methods
Lesson 7: Online Configuration of Service Groups
Lesson 8: Offline Configuration of Service Groups
Lesson 9: Sharing Network Interfaces
Lesson 10: Configuring Notification
Lesson 11: Configuring VCS Response to Faults
Lesson 12: Cluster Communications
Lesson 13: System and Communication Faults
Lesson 14: I/O Fencing
Lesson 15: Troubleshooting


Outline of Topics
• Monitoring VCS
• Troubleshooting Guide
• Cluster Communication Problems
• VCS Engine Problems
• Service Group and Resource Problems
• Archiving VCS-Related Files

Lesson Topics and Objectives
After completing this lesson, you will be able to:
• Monitoring VCS: Describe tools for monitoring VCS operations.
• Troubleshooting Guide: Apply troubleshooting techniques in a VCS environment.
• Cluster Communication Problems: Resolve cluster communication problems.
• VCS Engine Problems: Identify and solve VCS engine problems.
• Service Group and Resource Problems: Correct service group and resource problems.
• Archiving VCS-Related Files: Create an archive of VCS-related files.


Monitoring VCS
VCS provides numerous resources you can use to gather information about the status and operation of the cluster. These include:
• VCS log files
  – VCS engine log file, /var/VRTSvcs/log/engine_A.log
  – Agent log files
  – hashadow log file, /var/VRTSvcs/log/hashadow_A.log
• System log files:
  – /var/adm/messages (/var/adm/syslog on HP-UX)
  – /var/log/syslog
• The hastatus utility
• Notification by way of SNMP traps and e-mail messages
• Event triggers
• Cluster Manager

The information sources that have not been covered elsewhere in the course are discussed in more detail in the next sections.

Monitoring Facilities
• VCS log files
• System log files
• The hastatus utility
• Notification
• Event triggers
• VCS GUIs


VCS Logs
In addition to the engine_A.log primary VCS log file, VCS logs information for had, hashadow, and all agent programs in these locations:
• had: /var/VRTSvcs/log/engine_A.log
• hashadow: /var/VRTSvcs/log/hashadow_A.log
• Agent logs: /var/VRTSvcs/log/AgentName_A.log

Messages in VCS logs have a unique message identifier (UMI) built from product, category, and message ID numbers. Each entry includes a text code indicating severity, from CRITICAL entries indicating that immediate attention is required, to INFO entries with status information.

The log entries are categorized as follows:
• CRITICAL: VCS internal message requiring immediate attention
  Note: Contact Customer Support immediately.
• ERROR: Messages indicating errors and exceptions
• WARNING: Messages indicating warnings
• NOTICE: Messages indicating normal operations
• INFO: Informational messages from agents

Entries with CRITICAL and ERROR severity levels indicate problems that require troubleshooting.
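Because only CRITICAL and ERROR entries demand action, a quick first pass is to pull just those severities out of the logs. A minimal sketch, using the log locations and the hamsg viewer described in this section (the egrep filters are simply one convenient way to narrow the output):

   # View the engine log through the VCS log viewer
   hamsg engine_A

   # Or scan the raw logs for entries that need attention
   egrep "CRITICAL|ERROR" /var/VRTSvcs/log/engine_A.log
   egrep "CRITICAL|ERROR" /var/VRTSvcs/log/*_A.log   # engine, hashadow, and agent logs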

VCS Logs
Engine log: /var/VRTSvcs/log/engine_A.log
View logs using the GUI or the hamsg command: hamsg engine_A
Agent logs are kept in /var/VRTSvcs/log.
Example entries, each beginning with a unique message identifier (UMI); the most recent entries appear at the end of the log:
2003/05/20 16:00:09 VCS NOTICE V-16-1-10322 System S1 (Node '0') changed state from STALE_DISCOVER_WAIT to STALE_ADMIN_WAIT
2003/05/20 16:01:27 VCS INFO V-16-1-50408 Received connection from client Cluster Manager - Java Console (ID:400)
2003/05/20 16:01:31 VCS ERROR V-16-1-10069 All systems have configuration files marked STALE. Unable to form cluster.


Changing the Log Level and File Size

You can change the amount of information logged by agents for resources being monitored. The log level is controlled by the LogDbg resource type attribute. Changing this value affects all resources of that type.

Use the hatype command to change the LogDbg value and then write the in-memory configuration to disk to save the results in the types.cf file.

Note: Only increase agent log levels when you experience problems. The performance impacts and disk space usage can be substantial.

You can also change the size of the log file from the default of 32 MB. When a log file reaches the size limit defined in the LogFileSize cluster attribute, a new log file is created with B, C, D, and so on appended to the file name; the letter A indicates the first log file, B the second, C the third, and so on.
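For example, raising the debug level for all resources of one type and saving the change might look like the following sketch. The Mount type and the DBG_1 tag are illustrative assumptions; check your agent documentation for the debug tags it honors.

   # Open the cluster configuration for writing
   haconf -makerw

   # Raise the agent log level for every resource of type Mount
   hatype -modify Mount LogDbg DBG_1

   # Write the in-memory configuration (including types.cf) to disk
   haconf -dump -makero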


UMI-Based Support
UMI support in all VERITAS 4.x products, including VCS, provides a mapping between the message ID number and technical notes provided on the Support Web site. This helps you quickly find solutions to the specific problem indicated by the message ID.

UMI-Based Support
UMIs map to Tech Note IDs.
Command-line error message:
VCS ERROR V-16-1-10069 All systems have configuration files marked STALE. Unable to form cluster.
System log (syslog):
Jul 10 16:24:21 johndoe unix: VCS ERROR V-16-1-10069 All systems have configuration files marked STALE. Unable to form cluster.
VCS Engine Log (engine_A.log):
Jul 10 16:24:21 VCS ERROR V-16-1-10069 All systems have configuration files marked STALE. Unable to form cluster.


Using the VERITAS Support Web Site
The VERITAS Support Web site contains product and patch information, a searchable knowledge base of technical notes, access to product-specific newsgroups and e-mail notification services, and other information about contacting technical support staff.

The VERITAS Architect Network (VAN) provides a portal for accessing technical resources, such as product documentation, software, technical articles, and discussion groups. You can access VAN from http://van.veritas.com.

Using the VERITAS Technical Web Sites
Use the Support Web site to:
– Download patches.
– Track your cases.
– Search for tech notes.
The VERITAS Architect Network (VAN) is another forum for technical information.


Troubleshooting Guide
VCS problems are typically one of three types:
• Cluster communication
• VCS engine startup
• Service groups, resources, or agents

Procedure Overview
To start troubleshooting, determine which type of problem is occurring based on the information displayed by hastatus -summary output.
• Cluster communication problems are indicated by the message:
  Cannot connect to server -- Retry Later
• VCS engine startup problems are indicated by systems in the STALE_ADMIN_WAIT or ADMIN_WAIT state.
• Other problems are indicated when the VCS engine, LLT, and GAB are all running on all systems, but service groups or resources are in an unexpected state.
Each type of problem is discussed in more detail in the following sections.

The slide presents this as a decision flow starting from hastatus -sum output: the message "Cannot connect to server" points to communication problems; a system stuck in a WAIT state points to a VCS startup problem; and a message such as "SG1 Autodisabled . . ." points to service group problems.


Using the Troubleshooting Job Aid
You can use the troubleshooting job aid provided with this course to assist you in solving problems in your VCS environment. This lesson provides the background for understanding the root causes of problems, as well as the effects of applying solutions described in the job aid.

Ensure that you understand the consequences of the commands and methods you use for troubleshooting when using the job aid.

Using the Troubleshooting Job Aid
• A troubleshooting job aid is provided with this course.
• Use the job aid after familiarizing yourself with the details of the troubleshooting techniques shown in this lesson.


Cluster Communication Problems
When hastatus displays a message that it is unable to connect to the server, a cluster communication or VCS engine problem is indicated. Start by verifying that the cluster communication mechanisms are working properly.

Checking GAB
Check the status of GAB using gabconfig:
gabconfig -a
• If no port memberships are present, GAB is not seeded. This indicates a problem with GAB or LLT.
• Check LLT (next section). If all systems can communicate over LLT, check /etc/gabtab and verify that the seed number is specified correctly.
• If port h membership is not present, the VCS engine (had) is not running.

Checking GAB
Check GAB by running gabconfig -a:
• No port a membership indicates a GAB or LLT problem.
  – Check the seed number in /etc/gabtab.
  – If a node is not operational, and, therefore, the cluster is not seeded, force GAB to start: gabconfig -x
  If GAB starts and immediately shuts down, check LLT and cluster interconnect cabling.
• No port h membership indicates a VCS engine (had) startup problem.

GAB or LLT problem (no port memberships):
# gabconfig -a
GAB Port Memberships
========================

HAD not running; GAB and LLT functioning (port a only):
# gabconfig -a
GAB Port Memberships
===================================
Port a gen 24110002 membership 01


Checking LLT
Run the lltconfig command to determine whether LLT is running. If it is not running:
• Check the console and system log for messages indicating missing or misconfigured LLT files.
• Check the LLT configuration files, llttab, llthosts, and sysname, to verify that they contain valid and matching entries.
• Use other LLT commands to check the status of LLT, such as lltstat and lltconfig -a list.
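A minimal first pass on each node might look like this; all of these commands are named in this section:

   # Is LLT running at all?
   lltconfig

   # Can this node see its peers over the private links?
   lltstat -n

   # Inspect link status and the active LLT configuration
   lltstat -l
   lltconfig -a list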

Checking LLT
Run lltconfig to determine if LLT is running. If LLT is not running:
• Check console and system log messages.
• Check LLT configuration files:
  – Check the /etc/llttab file:
    Verify that the node number is within range (0-31).
    Verify that the cluster number is within range (0-255).
    Determine whether the link directive is specified correctly (for example, qf3 should be qfe3).
  – Check the /etc/llthosts file:
    Verify that the node numbers are within range.
    Verify that the system names match the entries in the llttab or sysname files.
  – Check the /etc/VRTSvcs/conf/sysname file:
    Ensure that the name listed is the local host (system) name.
    Verify that the system name matches the entry in the llthosts file.


Duplicate Node IDs
How VCS responds to duplicate node IDs in a cluster configuration depends on the version of VCS you are running.
• 4.x: If LLT detects a duplicate node ID, LLT shuts down on the links where duplicate IDs were detected.
• 3.5: If LLT detects a duplicate node ID, it informs GAB, and GAB panics the system only if the duplicate IDs are detected on the high-priority links.
• 2.0: If LLT detects a duplicate node ID on any LLT link, whether the link is high- or low-priority, it informs GAB, and GAB panics the system.

Duplicate Node IDs
VCS responds to duplicate node IDs on LLT links differently depending on the version.
• 4.x: VCS shuts down LLT on the links where duplicates are detected.
• 3.5: VCS panics the first system to detect the duplicate ID if another system is started with the same ID on high-priority links only.
• 2.0: VCS panics the first system that detects another system starting with the same node ID on any LLT link, high- or low-priority.


Problems with LLT
If LLT is running on each system, verify that each system can detect all other cluster systems by running lltstat -n. Check the physical connections if you determine that systems cannot detect each other.

There are several options to lltstat that may be helpful when troubleshooting LLT problems.

-z Reset statistical counters

-v Verbose output

-vv Very verbose output

-l Display current status of links

-n Display current status of peer systems

Problems with LLT
If LLT is running:
• Run lltstat -n to determine if systems can detect each other on the LLT link.
• Check the physical network connections if LLT cannot communicate with each node.

train11# lltconfig
LLT is running
train11# lltstat -n
LLT node information:
Node State Links
* 0 train11 OPEN 2
  1 train12 CONNWAIT 2

train12# lltconfig
LLT is running
train12# lltstat -n
LLT node information:
Node State Links
  0 train11 CONNWAIT 2
* 1 train12 OPEN 2

lltstat options: -l: link status; -z: reset counters; -vv: very verbose


VCS Engine Problems

Startup Problems
The VCS engine fails to start in some circumstances, such as when:
• VCS is not properly licensed.
• The llthosts file does not exist, or contains entries that do not match the cluster configuration.
• The cluster is not seeded.
• The engine state is STALE_ADMIN_WAIT or ADMIN_WAIT, indicating a problem building the configuration in memory, as discussed in the following pages.

Startup Problems
The VCS engine (had) does not start under certain conditions related to licensing, seeding, and misconfigured files.
Run hastatus -sum:
– Check GAB and LLT if you see this message:
  Cannot connect to server -- Retry Later
  Improper licensing can also cause this problem.
– Verify that the main.cf file is valid and that system names match llthosts and llttab:
  hacf -verify /etc/VRTSvcs/conf/config
– Check for systems in WAIT states.


STALE_ADMIN_WAIT
If you try to start VCS on a system where the local disk configuration is stale and there are no other running systems, the VCS engine transitions to the STALE_ADMIN_WAIT state. This signals that administrator intervention is required in order to get the VCS engine into the running state, because the main.cf may not match the configuration that was in memory when the engine stopped.

If the VCS engine is in the STALE_ADMIN_WAIT state:
1. Visually inspect the main.cf file to determine if it is up-to-date (reflects the current configuration).
2. Edit the main.cf file, if necessary.
3. Verify the main.cf file syntax, if you modified the file:
   hacf -verify config_dir
4. Start the VCS engine on the system with the valid main.cf file:
   hasys -force system_name

The other systems perform a remote build from the system now running.

STALE_ADMIN_WAIT
A system can enter the STALE_ADMIN_WAIT state when:
– There is no other system in a RUNNING state.
– The system has a .stale flag.
– You start VCS on that system.
To recover from the STALE_ADMIN_WAIT state:
1. Visually inspect the main.cf file to determine if it is valid.
2. Edit the main.cf file, if necessary.
3. Verify the syntax of main.cf, if modified:
   hacf -verify config_dir
4. Start VCS on the system with the valid main.cf file:
   hasys -force system_name
All other systems perform a remote build from the system now running.
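Concretely, on a two-node cluster the recovery might look like the session below; train11 stands in for whichever system holds the known-good main.cf. (hacf -verify reports syntax errors; no output generally means the file parsed cleanly.)

   # On the system with the valid configuration (here train11):
   hacf -verify /etc/VRTSvcs/conf/config

   # Force a local build from this main.cf; the other systems
   # then perform a remote build from this node
   hasys -force train11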


ADMIN_WAIT
The ADMIN_WAIT state results when a system is performing a remote build and the last running system in the cluster fails before the configuration is delivered. It can also occur if VCS is performing a local build and the main.cf is missing or invalid (syntax errors).

In either case, fix the problem as follows:
1. Locate a valid main.cf file from a main.cf.previous file on disk or a backup on tape or other media.
2. Replace the invalid main.cf with the valid version on the local node.
3. Use the procedure specified for a stale configuration to force VCS to start.

ADMIN_WAIT
A system can be in the ADMIN_WAIT state under these circumstances:
– A .stale flag exists, and the main.cf file has a syntax problem.
– A disk error occurs, affecting main.cf during a local build.
– The system is performing a remote build, and the last running system fails.
To fix this, restore main.cf and use the procedure for STALE_ADMIN_WAIT.
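A minimal sketch of that restore, assuming the automatically saved main.cf.previous is intact (otherwise restore main.cf from backup media instead):

   cd /etc/VRTSvcs/conf/config
   cp main.cf.previous main.cf      # fall back to the last saved configuration
   hacf -verify /etc/VRTSvcs/conf/config
   hasys -force train11             # force a local build; train11 is a placeholder name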


Service Group and Resource Problems

Service Group Problems

Service Group Not Configured to AutoStart on the System

To enable VCS to bring a service group online automatically after engine startup, the service group AutoStart attribute must be set to 1 and the target system must be listed in the AutoStartList attribute of the service group.

Service Group Not Configured to Run on the System

If the system is not included in the SystemList attribute of a service group, you cannot bring the service group online on that system even if the system is part of the same cluster. The systems listed in the SystemList attribute should be a superset of the systems listed in the AutoStartList attribute.

Service Group Does Not Come Online
If a service group does not come online automatically when VCS starts, check the AutoStart and AutoStartList attributes:
hagrp -display service_group
The service group must also be configured to run on the system: check the SystemList attribute and verify that the system name is included.
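For example, the checks might look like this; websg is a hypothetical group name, and the egrep filter is just a convenient way to narrow the hagrp -display output:

   # Inspect the attributes that control automatic startup
   hagrp -display websg | egrep "AutoStart|SystemList"

   # If the group is simply not configured to autostart, bring it
   # online by hand on a system in its SystemList
   hagrp -online websg -sys train11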


Service Group AutoDisabled

VCS automatically disables service groups under these conditions:
• GAB detects the system, but the VCS engine is not running.
• Resources in a service group are not fully probed.

The autodisable feature is a mechanism used by VCS to prevent a split-brain condition. If VCS cannot verify that the resources are offline everywhere, it sets the AutoDisabled attribute to prevent the service group from coming online on more than one system.

If a service group was autodisabled because HAD could not probe all of its critical resources, HAD clears the service group's autodisabled flag after it has successfully probed them.

In contrast, if a system that was in a jeopardy membership fails, VCS does not enable you to bring the service group online on other systems until you manually clear the AutoDisabled attribute for the service group. Before clearing the AutoDisabled attribute:
• Ensure that the service group is offline on all running systems in the cluster.
• Ensure that the resources are not running outside of VCS control.
• Verify that there are no network partitions in the cluster.

To clear the AutoDisabled attribute, type:

hagrp -autoenable service_group -sys system_name

Service Group AutoDisabled
Autodisabling occurs when:
– GAB sees a system, but HAD is not running on the system.
– Resources of the service group are not fully probed on all systems in the SystemList attribute.
Before re-enabling, ensure that the service group is offline on all systems listed in the SystemList attribute and that resources are not running outside of VCS. Verify that there are no network partitions.
Clear the AutoDisabled attribute:
hagrp -autoenable service_group -sys system
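Putting the checks and the command together, a session might look like the following; websg and train12 are hypothetical names:

   # Confirm that the group is offline everywhere and that nothing
   # is running outside VCS control
   hastatus -sum

   # Re-enable the group on the system where it was autodisabled
   hagrp -autoenable websg -sys train12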


Service Group Not Fully Probed

A service group must be probed on all systems in the SystemList attribute before VCS attempts to bring the group online. This ensures that even if the service group was online prior to VCS being brought up, VCS does not inadvertently bring the service group online on another system.

If the agents have not monitored each resource, the service group does not come online. Resources that cannot be probed usually have incorrect values specified for one or more attributes.

Follow these guidelines to determine whether resources are probed:
• Check the ProbesPending attribute:
  hagrp -display service_group
  A value of 0 indicates that each resource in the service group has been successfully probed. If any resources cannot be probed successfully, the ProbesPending attribute is nonzero and the service group cannot be brought online.
• Check which resources are not probed:
  hastatus -sum
• Check the Probes attribute for resources:
  hares -display
• Probe the resources:
  hares -probe resource -sys system

See the engine and agent logs in /var/VRTSvcs/log for more information.

Service Group Not Fully Probed
All resources in a service group must be probed on all systems in SystemList before the group can be brought online.
If the service group cannot be fully probed:
– Check the ProbesPending attribute: hagrp -display service_group
– Check which resources are not probed: hastatus -sum
– Check the Probes attribute for resources: hares -display
– Probe the resources: hares -probe resource -sys system
Resources with improperly configured attributes cannot be probed.
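For example (websg and webip are hypothetical group and resource names; the grep filters are a convenience):

   # How many probes are still pending for the group?
   hagrp -display websg | grep ProbesPending

   # Which resources have not been probed, and on which systems?
   hastatus -sum

   # After correcting the attribute values, probe the resource
   hares -probe webip -sys train11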


Service Group Frozen

A service group can be frozen in the online or offline state. When a service group is frozen, no further agent actions can take place on any resources in the service group, including failover.

Use the output of the hagrp command to check the value of the Frozen and TFrozen attributes. For example, type:
hagrp -display service_group
• The Frozen attribute shows whether a service group is frozen persistently or not. If set to 1, it is a persistent freeze.
• The TFrozen attribute shows whether a service group is frozen temporarily or not. If set to 1, it is a temporary freeze.

Use the command hagrp -unfreeze to unfreeze the group.

Note: If you freeze persistently, you must unfreeze persistently.

Service Group Frozen
When a service group is frozen, no further agent actions can take place on any resources in the service group.
Verify the value of the Frozen and TFrozen attributes:
hagrp -display service_group
Unfreeze the service group:
hagrp -unfreeze group [-persistent]
If you freeze persistently, you must unfreeze persistently.
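A quick check-and-unfreeze session might look like this (websg is a hypothetical group name):

   # Is the group frozen, and is the freeze persistent or temporary?
   hagrp -display websg | egrep "Frozen|TFrozen"

   # Persistent freeze (Frozen = 1): the configuration must be
   # writable to unfreeze persistently
   haconf -makerw
   hagrp -unfreeze websg -persistent
   haconf -dump -makero

   # Temporary freeze (TFrozen = 1)
   hagrp -unfreeze websg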


Service Group Is Not Offline Elsewhere in the Cluster

VCS does not allow you to bring a service group online if the service group is partially or fully online elsewhere in the cluster. If you want to bring a service group online elsewhere, switch the service group using hagrp -switch.

Service Group Waiting for Resource to Be Brought Online

Because VCS brings resources online hierarchically according to the dependency diagram, a service group cannot come online successfully if any resource cannot come online. This can be due to:
• Problems with the physical resource
• Errors in the resource attribute values
• Incorrectly specified resource dependencies

If the resource is stuck in an internal state (IState attribute), such as Waiting to Go Online, you may need to flush the service group before taking any corrective measures. Flushing clears the internal state and enables you to bring the service group online after correcting the error.

Service Group Is Not Offline Elsewhere or Waiting for Resource
Case 1: The service group is not offline elsewhere.
Solution: Switch the service group rather than bringing it online.
Case 2: The service group is waiting for a resource that is stuck in a wait state.
Solution:
1. Determine which resources are online/offline:
   hastatus -sum
2. Ensure that the resources are offline outside of VCS on all systems.
3. Verify the State attribute:
   hagrp -display service_group
4. Flush the service group:
   hagrp -flush service_group -sys system


Incorrect Local Name for System

A service group cannot be brought online if VCS has an incorrect local name for the system. This occurs when the name returned by the command uname -n does not match the system name in the llthosts, llttab, or main.cf files.

This is typically the case when uname -n returns a fully qualified domain name and you are not using the sysname file to define the system name to VCS. Check this using hasys -list to display the system names known to VCS.

Incorrect Local Name
A service group cannot be brought online if the system name is inconsistent in the llthosts, llttab, sysname, or main.cf files.
1. Check each file for consistent use of system names.
2. Correct any discrepancies.
3. If main.cf is changed, stop and restart VCS.
4. If llthosts or llttab is changed:
   a. Stop VCS, GAB, and LLT.
   b. Restart LLT, GAB, and VCS.
This most commonly occurs if you are not using a sysname file and someone changes the UNIX host name.
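The comparison is quick to run by hand; a minimal sketch using the commands and files named above:

   # What the operating system thinks this host is called
   uname -n

   # What VCS thinks the cluster systems are called
   hasys -list

   # The names LLT is using
   cat /etc/llthosts
   cat /etc/VRTSvcs/conf/sysname

All of these sources should agree on the local system name.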


Concurrency Violations

A concurrency violation occurs when a failover service group becomes fully or partially online on more than one system. When this happens, VCS takes the service group offline on the system that caused the concurrency violation and invokes the violation event trigger on that system.

The Violation trigger is configured by default during installation. The violation trigger script is placed in /opt/VRTSvcs/bin/triggers and no other configuration is required.

The script notifies the administrator and takes the service group offline on the system where the trigger was invoked.

The script can send a message to the system log and console on all cluster systems and can be customized to send additional messages or e-mail messages.

Concurrency Violation
This occurs when you bring a failover service group online outside of VCS when it is already online on another system.
Notification is provided by the violation trigger. This trigger:
• Is configured by default, with the violation script in the /opt/VRTSvcs/bin/triggers directory
• Is invoked on the system that caused the concurrency violation
• Notifies the administrator and takes the service group offline on the system causing the violation
To prevent concurrency violations, do not manage resources outside of VCS.


Problems Taking Service Groups Offline

You can occasionally encounter problems when trying to take VCS service groups offline. If this happens during a failover, it can prevent the service group from coming online on another system. Use the following recommendations to solve problems you may encounter.

Service Group Waiting for a Resource to Be Taken Offline

If a resource is stuck in the internal state of WAITING TO GO OFFLINE, none of the child resources can be taken offline, and this situation can prevent a failover. This situation often results from a resource being controlled outside of VCS; for example, a file system is unmounted manually before the Mount resource is taken offline.

The ResNotOff trigger can be configured to notify an administrator or, in case of very critical services, to reboot or halt the system so that another system can start the service group.

However, a careful analysis of the systems and the applications is required, because halting a system causes failover, interrupting other service groups that were online on that system.

Problems Taking Service Groups Offline
If a service group is waiting for a resource to come offline:
• Identify which resource is not offline:
  hastatus -summary
• Check logs.
• Manually take the resource offline, if necessary.
• Configure the ResNotOff trigger for notification or action.
Example: NFS service groups have this problem if an NFS client does not disconnect. The Share resource cannot come offline when a client is connected. You can configure ResNotOff to forcibly stop the share.


Problems with Service Group Failover

If a service group does not fail over as you expect when a fault occurs, check all resource and service group attributes that can affect failover. Examples are listed in the slide.

Refer to the "Configuring VCS Response to Faults" lesson for detailed information about how VCS handles resource faults. Also, see the "System and Communication Faults" lesson to understand how those faults affect service groups.

Problems with Service Group Failover
If a service group does not fail over when a fault occurs, check:
• Critical attributes for resources
• Service group attributes:
  – ManageFaults and FaultPropagation
  – Frozen and TFrozen
  – AutoFailover
  – FailOverPolicy
  – SystemList
Also, check timeout values if you have modified timeout-related attributes.


Resource Problems

Critical Resource Faults

A service group does not come online on a system where a critical resource is marked as FAULTED. Persistent resource faults are cleared automatically after the underlying software or hardware problem is fixed. The next monitor cycle determines that the resource is responding properly and reports the resource as online. You can also probe the resource to force a monitor cycle.

Nonpersistent resource faults need to be explicitly cleared.

Resource Problems: Critical Resource Faults
A service group cannot come online on a system where a critical resource is marked as FAULTED.
1. Determine which critical resource has faulted:
   hastatus -summary
2. Ensure that the resource is offline.
3. Examine the engine log to analyze the problem.
4. Fix the problem.
5. Freeze the service group.
6. Verify that the resources work properly outside of VCS.
7. Clear the fault in VCS.
8. Unfreeze the service group.
Note: Persistent resource faults are cleared automatically after the underlying problem is fixed.
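Steps 5 through 8 map onto VCS commands as in the sketch below; websg, webip, and train11 are hypothetical names:

   # Freeze the group so VCS takes no action while you test
   hagrp -freeze websg

   # ...fix the problem and verify the resource works outside VCS...

   # Clear the fault and unfreeze the group
   hares -clear webip -sys train11
   hagrp -unfreeze websg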


Problems Bringing Resources Online

If VCS is unable to bring a resource online, these are the likely causes:
• The resource is waiting for a child resource to come online.
• The resource is stuck in a WAIT state.
• The agent is not running.

Waiting for Child Resources

VCS does not bring a resource online if one or more of its child resources cannot be brought online. You need to solve the problem with the child resource and bring it online first before attempting to bring the parent online.

Note: The resource waiting for its child resources has an internal wait state of Waiting for Children Online. As soon as all the children are brought online, the resource transitions to Waiting to Go Online.

Resource Waiting to Come Online

You can encounter this situation if VCS has directed the agent to run the online entry point for the resource, but the resource is stuck in the internal state Waiting to Go Online. Check the VCS engine and agent logs to identify the problem and solve it.

Unable to Bring Resources Online or Offline
Resources cannot be brought online or taken offline when:
• A resource is waiting for child or parent resources.
• A resource is stuck in an internal Waiting state.
• The agent is not running.


Problems Taking Resources Offline

If VCS is unable to take a resource offline, these are the likely causes:
• The resource is waiting for a parent resource.
• The resource is waiting for a resource to respond.
• The agent is not running (as discussed in the previous section).

Waiting for the Parent Resource

VCS does not take a resource offline if one or more of its parent resources cannot be taken offline. Solve the problem with the parent resource and take it offline first before attempting to bring the child offline.

Waiting for a Resource to Respond

You can encounter this situation if VCS has directed the agent to run the offline entry point for the resource, but the resource is stuck in the internal state Waiting to Go Offline. Check the VCS engine and agent logs to identify the problem and solve it. VCS allows the offline entry point to run until the OfflineTimeout value is reached. After that, it stops the entry point process and runs the clean entry point. If the resource still does not go offline, it runs the ResNotOff trigger, if configured.


Agent Problems and Resource Type Problems

Agent Problems

An agent process should be running on the system for each configured resource type. If the agent process is stopped for any reason, VCS cannot carry out operations on any resource of that type. Check the VCS engine and agent logs to identify what caused the agent to stop or prevented it from starting. It could be an incorrect path for the agent binary, the wrong agent name, or a corrupt agent binary.

Use the haagent command to restart the agent. Ensure that you start the agent on all systems in the cluster.

Agent Problems: Agent Not Running
• Determine whether the agent for that resource is FAULTED:
  hastatus -summary
• Use the ps command to verify that the agent process is not running.
• Check the log files for:
  – Incorrect path name for the agent binary
  – Incorrect agent name
  – Corrupt agent binary
• Verify that the agent is installed on all systems.
• Restart the agent after fixing the problem:
  haagent -start agent -sys system
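For example, for the Mount agent (the ps filter assumes the usual convention that the agent binary is named after the resource type):

   # Is the agent process running?
   ps -ef | grep MountAgent

   # Restart the agent on this system after fixing the cause
   haagent -start Mount -sys train11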


Resource Type Problems

Another common problem preventing VCS from bringing a resource online is an invalid specification of the agent argument list in a resource type ArgList attribute. If you inadvertently select a resource type rather than a resource in the Cluster Manager, and change the ArgList attribute, the agent cannot function properly.

Perform these tasks to determine if this problem is occurring:
• Verify that the resource attributes are correctly specified:
  hares -display resource
• Verify that the agent is running:
  haagent -display
• Verify that the resource works properly outside of VCS.
• Display values for the ArgList and ArgListValues type attributes:
  hatype -display res_type

If ArgList is corrupted in types.cf:
1. Stop VCS on all systems:
   hastop -all -force
2. Fix types.cf or replace it with types.cf.previous. For example:
   /etc/VRTSvcs/conf/config# cp types.cf.previous types.cf
   Note: Check each *types.cf file if you have multiple types definition files.
3. Start VCS on the repaired system, and then start VCS stale on the other systems:
   hastart [-stale]

Resource Type Problems
If a resource does not come online after you check all other possible causes, check the resource type definition:
• Verify that the resource attributes are correctly specified using hares -display resource.
• Verify that the agent is running using haagent -display.
• Verify that the resource works properly outside of VCS.
• Display values for the ArgList and ArgListValues type attributes using hatype -display res_type.
If ArgList is corrupted in *types.cf:
1. Stop VCS on all systems.
2. Fix types.cf or replace it with types.cf.previous.
3. Restart VCS.


Archiving VCS-Related Files

Making Backups
Include VCS configuration information in your regular backup scheme. Consider archiving these types of files and directories:
• /etc/VRTSvcs/conf/config/types.cf and any other custom types files
• /etc/VRTSvcs/conf/config/main.cf
• main.cmd, generated by:
  hacf -cftocmd /etc/VRTSvcs/conf/config
• LLT and GAB configuration files in /etc:
  – llthosts
  – llttab (unique on each system)
  – gabtab
• Customized triggers in /opt/VRTSvcs/bin/triggers
• Customized agents in /opt/VRTSvcs/bin

Notes:
• You can use the hagetcf command in VCS 3.5, or hasnap in VCS 4.0, to create a directory structure containing all cluster configuration files. This helps ensure that you do not inadvertently miss archiving any key files.
• The VCS software distribution includes the VRTSspt package, which provides vxexplorer, a tool for gathering system information that may be needed by Support to troubleshoot a problem.

Making Backups
Back up key VCS files as part of your regular backup procedure:
• types.cf and customized types files
• main.cf
• main.cmd
• sysname
• LLT and GAB configuration files in /etc
• Customized trigger scripts in /opt/VRTSvcs/bin/triggers
• Customized agents in /opt/VRTSvcs/bin
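On versions without hasnap, a simple tar archive of the files listed above is enough; a minimal sketch (the archive name and the assumption that all types files match *types.cf are illustrative):

   # Collect the key VCS files into one dated archive
   tar cvf /var/tmp/vcs-config-`date +%Y%m%d`.tar \
       /etc/VRTSvcs/conf/config/main.cf \
       /etc/VRTSvcs/conf/config/*types.cf \
       /etc/VRTSvcs/conf/sysname \
       /etc/llttab /etc/llthosts /etc/gabtab \
       /opt/VRTSvcs/bin/triggers

   # Repeat on each system; llttab is unique on each node.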


The hasnap Utility
The hasnap utility backs up and restores predefined and custom VCS files on each node in a cluster. A snapshot is a collection of predefined VCS configuration files and any files added to a custom file list. A snapshot also contains information such as the snapshot name, description, time, and file permissions.

In the example shown in the slide, hasnap is used to:
• Create a single file containing all backed up files (-f vcs.tar).
• Specify no prompts for user input (-n).
• Create a description for the snapshot (-m Oracle_Cluster).

The following table shows samples of hasnap options:

Option     Purpose
-backup    Copies the files to a local predefined directory
-restore   Copies the files in the specified snapshot to a directory
-display   Lists all snapshots and the details of a specified snapshot
-sdiff     Shows differences between configuration files in a snapshot and the files on a specified system
-fdiff     Shows differences between a specified file in a snapshot and the file on a specified system
-export    Exports a snapshot to a single file
-custom    Adds specified files along with predefined VCS files

The hasnap Utility
The hasnap utility backs up and restores VCS configuration files. This utility also serves as a support tool for collecting information needed for problem analysis.

# hasnap -backup -f /tmp/vcs.tar -n -m Oracle_Cluster
V-8-1-15522 Initializing file "vcs.tar" for backup.
V-8-1-15526 Please wait...
Checking VCS package integrity
Collecting VCS information
.....
Compressing /tmp/vcs.tar to /tmp/vcs.tar.gz
Done.


Summary
This lesson described how to detect and solve VCS problems and faults. Common problem scenarios and their solutions were described, along with a general-purpose troubleshooting methodology.

Next Steps
Now that you have learned how to configure, manage, and troubleshoot high availability services in the VCS environment, you can learn how to manage more complex cluster configurations, such as multinode clusters.

Additional Resources
• Troubleshooting Job Aid
  This quick reference is included with this participant guide.
• VERITAS Cluster Server User's Guide
  This guide provides detailed information on procedures and concepts for configuring and managing VCS clusters.
• VERITAS Cluster Server Bundled Agents Reference Guide
  This guide describes each bundled agent in detail.
• http://support.veritas.com
  This Web site provides troubleshooting information about all VERITAS products.

Lesson Summary
Key Points
– Develop an understanding of common problem causes and solutions using the background provided in this lesson.
– Use the troubleshooting job aid as a guide.
Reference Materials
– Troubleshooting Job Aid
– VERITAS Cluster Server User's Guide
– VERITAS Cluster Server Bundled Agents Reference Guide
– http://support.veritas.com


Lab 15: Troubleshooting

Goal
During this lab exercise, your instructor will create several problems on your cluster. Your goal is to diagnose, fix, and document them for the instructor during the allotted time. If you finish early, your instructor may give you additional problems to solve.

Prerequisites
Wait for your instructor to indicate that your systems are ready for troubleshooting.

Results
The cluster is running with all service groups online.

Optional Lab
Some classrooms have access to the VERITAS Support Web site. If your instructor indicates that your classroom network can access VERITAS Support, search http://support.veritas.com for technical notes to help you solve the problems created as part of this lab exercise.

Lab 15: Troubleshooting
Your instructor will run scripts that cause problems within your cluster environment.
• Apply the troubleshooting techniques provided in the lesson to identify and fix the problems.
• Notify your instructor when you have restored your cluster to a functional state.
Optional lab: If your instructor indicates that your classroom has access to the VERITAS Support Web site, search http://support.veritas.com for technical notes to help you solve the problems created as part of this lab exercise.


Index-1Copyright © 2004 VERITAS Software Corporation. All rights reserved.

Aabort sequence 2-14access control 6-8access, controlling 6-6adding license 3-7admin account 2-16ADMIN_WAIT state

definition 15-17in ManageFaults attribute 11-8in ResAdminWait trigger 11-25recovering resource from 11-22

administration application 4-4administrative IP address 5-8administrator, network 5-11agent

clean entry point 1-14, 11-5close entry point 7-28communication 12-4custom 15-32definition 1-14logs 15-5monitor entry point 1-14offline entry point 1-14online entry point 1-14troubleshooting 15-30

AIXconfigure IP address 5-9configure virtual IP address 5-16llttab 12-12lslpp command 3-14SCSI ID 2-10startup files 12-18

AllowNativeCliUsers attribute 6-7application

clean 5-12component definition 5-4configure 5-12IP address 5-15management 4-4managing 4-4manual migration 5-21preparation procedure 5-13prepare 5-4service 5-4

shutdown 5-12start 5-17

application componentsstopping 5-20

application servicedefinition 1-7testing 5-13

atomic broadcast mechanism 12-4attribute

display 4-7local 9-13override 11-19resource 1-12, 7-10resource type 11-13, 11-15service group failover 11-7service group validation 5-25verify 5-23

autodisabledefinition 15-19in jeopardy 13-8service group 12-20

AutoDisabled attribute 12-20, 15-19AutoFailover attribute 11-9AutoStart attribute 4-9, 15-18AutoStartList attribute 15-18

Bbackup configuration files 8-19base IP address 5-8best practice

application management 4-4application service testing 5-13cluster interconnect 2-7

boot disk 2-8Bundled Agents Reference Guide 1-15

Ccable, SCSI 2-9child resource

configuration 7-9dependency 1-11linking 7-31

Index

Page 424: VERITAS Cluster Server for UNIX Fundamentals

Index-2 VERITAS Cluster Server for UNIX, FundamentalsCopyright © 2004 VERITAS Software Corporation. All rights reserved.

clean entry point 11-5clear

autodisable 15-19resource fault 4-16

CLIonline configuration 6-13resource configuration 7-16service group configuration 7-6

closecluster configuration 6-16entry point 7-28

clustercampus 14-26communication 12-4configuration 1-22configuration files 3-13configure 3-5create configuration 6-19definition 1-5design Intro-6, 2-6duplicate configuration 6-20duplicate service group configuration 6-21ID 2-16, 12-11installation preparation 2-16interconnect 1-16interconnect configuration 13-18maintenance 2-4managing applications 4-4member systems 12-8membership 1-16, 12-7membership seeding 12-17membership status 12-7name 2-16Running state 6-24simulator 4-18terminology 1-4troubleshooting 8-16

cluster communicationconfiguration files 3-12overview 1-16

cluster configurationbuild from file 6-28close 6-16in memory 6-22in-memory 6-24modification 8-14offline 6-18open 6-14

protection 6-17save 6-15

cluster interconnectconfiguration files 3-12configure 3-8definition 1-16VCS startup 6-23, 6-24

Cluster Managerinstallation 3-17online configuration 6-13Windows 3-18

Cluster Monitor 4-22cluster state

GAB 6-27, 12-4remote build 6-24, 6-27running 6-27Stale_Admin_Wait 6-25unknown 6-25Wait 6-26

ClusterService groupinstallation 3-8main.cf file 3-13notification 10-6

command-line interface 7-6communication

agent 12-4between cluster systems 12-5cluster problems 15-9configure 13-18fencing 14-21within a system 12-4

component testing 5-19concurrency violation

in failover service group 15-24in frozen service group 4-13prevention 12-20

configurationapplication 5-12application IP address 5-15application service 5-6backup files 8-19build from file 6-28cluster 3-5cluster interconnect 3-8downtime 6-5fencing 3-16, 14-27files 1-23GAB 12-16

Page 425: VERITAS Cluster Server for UNIX Fundamentals

VERITAS Cluster Server for UNIX, Fundamentals Index-3Copyright © 2004 VERITAS Software Corporation. All rights reserved.

GroupOwner attribute 10-11in-memory 6-12interconnect 12-10, 13-18LLT 13-20main.cf file 1-23methods 6-4network 5-8notification 10-6NotifierMngr 10-8offline method 6-18overview 1-22protection 6-17resource type attribute 11-18shared storage 5-7troubleshooting 7-26, 8-16types.cf file 1-23Web console 3-8

configuration filesbackup 8-19, 15-32installation 3-11llttab 12-11network 2-15operating system 2-15

ConfInterval attribute 11-15, 11-17coordinator disk

definition 14-10disk group 14-27requirements 14-26

Critical attributein critical resource 4-9setting 7-35

critical resourcefaults 15-27role of in failover 11-4

crossover cable 2-7custom

agents 15-32triggers 15-32

Ddata

corruption 13-13disk 14-11storage 2-8

data diskreservation 14-13

data protectiondefinition 14-4

fencing 1-19, 13-4HAD 14-25jeopardy membership 13-8requirement definition 14-8service group heartbeats 13-16

dependencyoffline order 4-11online order 4-9resource 1-11, 5-24, 7-31resource offline order 4-15resource rules 7-32resource start order 5-13resource stop order 5-20rule 1-11

designcluster 2-6offline configuration 8-10resource dependency 5-24validate 5-22worksheet 2-6

differential SCSI 2-9disable

resource 7-28disk

coordinator 14-10data 14-11fencing 14-10shared 2-8

disk group fencing 14-11DiskGroup resource 7-16display

service group 7-6displaying

cluster membership status 12-7LLT status 12-9

DMP 2-8downtime

cluster configuration 6-5system fault 13-6

dynamic multipathing 2-8

Eeeprom command 2-10e-mail notification

configuration 10-4from GroupOwner attribute 10-11from ResourceOwner attribute 10-10

Page 426: VERITAS Cluster Server for UNIX Fundamentals

Index-4 VERITAS Cluster Server for UNIX, FundamentalsCopyright © 2004 VERITAS Software Corporation. All rights reserved.

entry pointclean 11-5close 7-28definition 1-14offline 11-14online 11-14

environment variableMANPATH 2-13PATH 2-13

Ethernet interconnect network 2-7Ethernet ports 12-9event

notification 11-24severity level 10-5trigger 11-25

event messages 10-5

Ffailover

active/active 1-28active/passive 1-25automatic 11-9configurations supported 1-25critical resource 7-35, 11-4default behavior 11-4definition 1-6duration 11-11, 13-6manual 4-12, 11-9N + 1 1-27N-to-1 1-26N-to-N 1-29policy 11-5service group fault 11-4service group problems 15-26service group type 1-9

FailOverPolicy attribute 11-5failure

communication 14-6fencing 14-15HAD startup 15-15interconnect recovery 14-33LLT link 13-7system 14-5

faultcritical resource 7-35, 15-27detection duration 11-11effects of resource type attributes 11-15

failover duration 13-6ManageFaults attribute 11-8manual management 11-7notification 11-24recover 11-20resource 4-16system 13-5trigger 11-25

FaultPropagation attribute 11-8fencing

communication 14-21components 14-10configure 14-27coordinator disk requirements 14-26data protection 14-9definition 1-19disk groups 14-30, 14-31GAB communication 14-15I/O 13-4installation 3-16interconnect failure 14-15partition-in-time 14-34race 14-14recovering a system 14-32startup 14-29system failure 14-14vxfen driver 14-23

flush service group 7-27force

VCS stop 6-30freeze

persistent 4-13service group 4-13temporary 4-13

Frozen attribute 4-13, 11-7, 15-21

GGAB

cluster state change 6-27communication 12-4configuration file 12-16definition 1-18fencing 14-23manual seeding 12-19membership 12-7, 12-17, 14-17Port a 12-7Port b 14-21Port h 12-7

Page 427: VERITAS Cluster Server for UNIX Fundamentals

VERITAS Cluster Server for UNIX, Fundamentals Index-5Copyright © 2004 VERITAS Software Corporation. All rights reserved.

seeding 12-17startup files 12-18status 12-7timeout 13-6troubleshooting 15-11

gabconfig command 12-7gabtab file 12-16Group Membership Services/Atomic Broadcast

definition 1-18GroupOwner attribute 10-11GUI

adding a service group 7-5resource configuration 7-10

Hhacf command 8-4haconf command 7-6HAD

data protection 14-25definition 1-20log 4-8notifier 10-4online configuration 6-13Stale_Admin_Wait state 6-25startup 6-22, 14-25

hagetcf command 15-32hagrp command 4-10, 4-11, 4-12, 4-13, 7-6hardware

requirements 2-7SCSI 2-9storage 2-8support 2-7verify 2-12

hardware compatibility list 2-7hares command 4-14hashadow daemon 1-20hasim command 4-21hasimgui 4-19hasnap command 15-33hastart command 6-22hastatus command 4-7hastop command 6-30hauser command 6-9HBA 2-8HCL 2-7

heartbeatdefinition 1-18, 12-5disk 13-16frequency reduction 13-14loss of 14-6, 14-15low-priority link 12-6, 13-20network requirement 2-7public network 13-14service group 13-16

high availabilitydefinition 1-5notification 10-6online configuration 6-12

high availability daemon 1-20high-priority link 12-6HP OpenView Network Node Manager 10-12HP-UX

configuring IP address 5-9configuring virtual IP address 5-16llttab 12-12SCSI ID 2-11startup files 12-18swlist command 3-14

hub 2-7hybrid service group type 1-9

II/O fencing 13-4ID

cluster 2-16duplicate node numbers 15-13initiator 2-10message 15-5

ifconfig command 5-15initiator ID 2-10installation

Cluster Manager 3-17fencing 3-16Java GUI 3-17log 3-4VCS preparation 2-16view cluster configuration 3-14

installer command 3-7Installing 3-18installvcs command 3-5interconnect

cable 13-10

Page 428: VERITAS Cluster Server for UNIX Fundamentals

Index-6 VERITAS Cluster Server for UNIX, FundamentalsCopyright © 2004 VERITAS Software Corporation. All rights reserved.

cluster communication 12-5configuration 13-18configuration procedure 13-19configure 12-10Ethernet 2-7failure 14-6, 14-15, 14-17failure recovery 14-33link failures 13-7network partition 13-9, 14-17partition 14-6recover from network partition 13-10requirement 2-7specifications 12-6

IPadding a resource 7-12address configuration 5-8administrative address 5-8application address configuration 5-15

JJava GUI

installation 3-17installation on Windows 3-18simulator 4-22Windows 3-18

jeopardy membershipafter interconnect failure 13-15autodisabled service groups 13-8definition 13-8system failure 15-19

join cluster membership 12-17

K

key
    SCSI registration 14-12

L

license
    adding 3-7
    HAD startup problem 15-15
    upgrading 2-15
    VCS 3-7
    verification 2-15

link
    high-priority 12-6
    low-priority 12-6
    resource 7-31
Linux
    configuring IP address 5-10
    configuring virtual IP address 5-16
    llttab 12-12
    rpm command 3-14
    SCSI ID 2-11
    startup files 12-18

LLT
    adding links 13-20
    configuration 13-20
    definition 1-17
    link failure 13-8
    link failures 13-13
    link status 12-9
    links 12-5
    low-priority link 13-14
    node name 12-14
    simultaneous link failure 13-9
    startup files 12-18
    timeout 13-6
    troubleshooting 15-12, 15-14

lltconfig command 3-14
llthosts file 3-12, 12-14
lltstat command 12-9
llttab file 3-12, 12-11
local build 6-22
local resource attribute 9-13
log
    agent 15-5
    display 4-8
    installation 3-4
    troubleshooting 7-26

log file
    agent 15-4
    engine 15-4
    hashadow 15-4
    system 15-4

low priority link 12-6
low-latency transport 1-17
LUN 14-11

M

MAC address 12-9
main.cf file
    backup 6-27
    backup files 8-19
    Critical attribute 7-35
    definition 1-23
    editing 8-14
    example 8-5
    fencing 14-30
    installation 3-13
    network resources 7-15
    offline configuration 8-4
    old configuration 8-18
    online configuration 6-14
    Process resource 7-25
    resource dependencies 7-34
    service group example 7-8, 7-37, 8-12
    storage resources 7-22
    syntax 8-17
    troubleshooting 8-16

main.previous.cf file 8-18
maintenance
    cluster 2-13
    staffing 2-4

ManageFaults attribute 11-7, 11-22
MANPATH environment variable 2-13
manual
    application migration 5-21
    application start 5-17
    fault management 11-7
    mount file system 5-14
    seeding 12-19
    starting notifier 10-7

member systems 12-8
membership
    cluster 12-4
    GAB 12-7
    jeopardy 13-8
    joining 12-17
    regular 13-8

message severity levels 10-5
migration
    application 5-5
    application service 5-21
    VCS stop 6-30

mini-cluster 13-9
mkfs command 5-7
modify
    cluster interconnect 13-19
    service group 7-5

monitor
    adjusting 11-13, 11-14
    interval 11-13
    network interface 9-6
    probe 15-27
    VCS 15-4

MonitorInterval attribute 11-11, 11-13
MonitorTimeout attribute 11-14
mount command 5-14
Mount resource 7-19, 7-22
mounting a file system 5-14

N

name
    cluster 2-16
    convention 7-9
    resource 7-9
    service group 7-5

network
    administrator 5-11
    cluster interconnect 2-12
    configuration files 2-15
    configure 5-8
    interconnect interfaces 12-11
    interface monitoring 9-6
    interface sharing 9-4
    LLT link 13-14
    partition 13-9, 14-17
    preexisting partition 13-17

network partition 13-10
    definition 14-6

NIC resource 7-10, 7-14, 9-13
    changing ToleranceLimit attribute 11-18
    false failure detections 11-11
    parallel service groups 9-8
    sharing network interface 9-5

NoFailover trigger 11-26
nonpersistent resource 9-9
notification
    ClusterService group 10-6
    concurrency violation 15-24
    configuration 10-6
    configure 2-16, 3-9, 10-8
    e-mail 10-4
    event messages 10-5
    fault 11-24
    GroupOwner attribute 10-11
    message queue 10-4
    overview 10-4
    ResourceOwner 10-10
    severity level 10-5
    support e-mail 15-8
    test 10-7
    trigger 10-13

notifier daemon
    configuration 10-6
    message queue 10-4
    starting manually 10-7

NotifierMngr resource
    configuring notification 10-6
    definition 10-8

nsswitch.conf file 5-11

O

offline
    entry point 11-14
    nonpersistent resource 4-11
    resource 4-15
    resource problems 15-29
    service group problems 15-25
    troubleshooting 9-9

offline configuration
    benefits 6-18
    cluster 6-4
    examples 6-19
    procedure for a new cluster 8-4
    procedure for an existing cluster 8-7
    troubleshooting 8-16

OfflineMonitorInterval attribute 11-11, 11-13
OfflineTimeout attribute 11-14
online
    definition 4-9
    entry point 11-14
    nonpersistent resource 4-9
    resource 4-14
    resource problems 15-28

online configuration
    benefits 6-12
    cluster 6-4
    overview 6-13
    procedure 7-4
    service group 7-4

OnlineTimeout attribute 11-14
operating system
    configuration files 2-15
    patches 2-15
    VCS support 2-13
override attributes 11-19
override seeding 12-19

P

package installation 3-5
parallel
    service group 9-8
    service group configuration 9-11
    service group type 1-9

parent
    resource 1-11, 7-31

partial online 15-22, 15-24
    definition 4-9
    when resources taken offline 4-15

partition
    cluster 13-13
    interconnect failure 13-9
    preexisting 13-17
    recovery 13-10

partition-in-time 14-34
path
    coordinator disk 14-29
    fencing data disks 14-24
    storage 2-8

PATH environment variable 2-13
persistent resource 4-11, 9-8, 15-27
Phantom resource 9-9, 9-10
plan, implementation 2-5
Port 12-7
preexisting network partition 13-17
    severed cluster interconnect 14-17
PreOnline trigger 10-13
prepare
    applications 5-4
    identify application components 5-6
    site 2-6
    VCS installation 2-16

private network 1-16
privilege
    UNIX user account 6-7
    VCS 6-10

probe
    clear autodisable 15-19
    clear resource 15-27
    persistent resource fault 4-16
    resource 4-17, 11-21, 12-20
    service group not probed 15-20

Process resource 7-23, 7-25
Proxy resource 9-6, 9-7

R

RAID 2-8
raw disks 5-7
recover
    network partition 13-10
recovery
    fenced system 14-33
    from ADMIN_WAIT state 11-22
    partition-in-time 14-34
    resource fault 11-20

registration
    with coordinator disks 14-13

regular membership 13-8
Remote Build state 6-27
replicated state machine 12-4
requirements
    hardware 2-7
    software 2-13

ResAdminWait trigger 11-26
reservation 14-13
ResFault trigger 10-13
ResNotOff trigger 10-13
resolv.conf file 5-11
resource
    attribute 1-12
    attribute verification 5-23
    child 1-11
    clear fault 4-16
    CLI configuration 7-16
    configuration procedure 7-9
    copying 7-29
    critical 11-4
    Critical attribute 7-35
    definition 1-10
    deletion 7-29
    dependency 5-24
    dependency definition 1-11
    dependency rules 1-11
    disable 7-28
    event handling 11-25
    fault 4-16, 11-5, 15-27
    fault detection 11-11
    fault recovery 11-20
    GUI configuration 7-10
    local attribute 9-13
    name 7-9
    nonpersistent 9-9
    offline definition 4-15
    offline order 4-11
    offline problems 15-29
    online definition 4-14
    online order 4-9
    online problems 15-28
    parent 1-11
    persistent 4-11, 9-8
    probe 12-20
    recover 11-22
    restart 11-16
    restart example 11-17
    troubleshooting 7-26
    type 1-13
    type attribute 1-13
    verify 5-18

resource type
    controlling faults 11-15
    None 9-8
    OnOff 9-8
    OnOnly 9-8
    testing failover 11-13
    troubleshooting 15-31

ResourceOwner attribute 10-6, 10-10
ResStateChange trigger 10-13, 11-26
RestartLimit attribute 11-15, 11-17
root user account 6-6, 6-12
rsh 2-14, 3-7
rules
    resource dependency 7-32
Running state 6-27

S

SAN 2-8, 2-12
SCSI
    cable 2-9
    controller configuration 2-9
    termination 2-9

seeding
    definition 12-17
    manual 12-19
    override 12-19
    split brain condition 12-19
    troubleshooting 15-15

service group
    autodisable 12-20
    CLI configuration 7-6
    concurrency violation 15-24
    data protection 14-14, 14-16
    definition 1-8
    evacuation 6-30
    event handling 11-25
    failover attributes 11-7
    failover type 1-9
    failure to come offline 15-25
    failure to come online 15-22, 15-23
    fault 11-4
    flush 7-27, 15-22
    freeze 4-13, 15-21
    GUI configuration 7-5
    heartbeat 13-16
    hybrid type 1-9
    manage 4-6
    name 7-5
    network interface 9-4
    offline 4-11
    offline configuration 8-7
    online 4-9
    online configuration 7-4
    parallel 9-8, 9-11
    parallel type 1-9
    status 9-8, 9-9
    test procedure 7-30
    testing 8-20
    troubleshooting 7-26, 9-9, 15-18
    types 1-9
    unable to fail over 15-26
    unable to probe 15-20
    validate attributes 5-25
    worksheet 7-8

ServiceGroupHB resource 13-16
shutdown
    application 5-12
    VCS 6-30

simulator
    Java GUI 4-22
    offline configuration 8-14
    test 8-15
    VCS 4-18

Simulator, command line interface 4-21

Simulator, configuration files 4-20
Simulator, Java Console 4-19
Simulator, sample configurations 4-20
single point of failure 2-8
single-ended SCSI controller 2-9
site access 2-4
SMTP notification configuration 3-9
SNMP
    console configuration 10-12
    notification 3-9
    notification configuration 10-6

software
    configuration 2-13
    managing applications 4-4
    requirements 2-13
    verification 2-15

Solaris
    abort sequence 2-14
    configure IP address 5-9
    configure virtual IP address 5-15
    llttab 12-11
    pkginfo command 3-14
    SCSI ID 2-10
    startup files 12-18

split brain condition 13-13, 13-17, 15-19
    definition 14-7

split-brain condition 13-8
ssh 2-14, 3-7
.stale file
    close configuration 6-16
    open configuration 6-14
    protecting the cluster configuration 6-17
    save configuration 6-15
    startup check 6-22
    VCS startup 6-25

stale flag in starting VCS 6-28
Stale_Admin_Wait state 6-25, 15-16
start
    volumes 5-14
    with a .stale file 6-28

startup
    by default 6-22
    fencing 14-21, 14-29
    files 12-18
    probing 12-20

state
    cluster 12-5
    cluster membership 12-4
    Stale_Admin_Wait 6-25
    unknown 6-25

status
    display 4-7
    license 3-7
    LLT link 12-9
    service group 9-8

storage
    requirement 2-8
    shared
        bringing up 5-14
        configuring 5-7

    verification 2-12
switch
    network 2-7
    service group 4-12

sysname file 12-15
system
    cluster member 2-16
    failure 14-5, 14-14
    failure recovery 14-33
    fault 13-5, 13-6
    GAB startup specification 12-16
    ID 12-11, 12-13
    incorrect local name 15-23
    join cluster membership 12-17
    LLT node name 12-14
    local attribute 9-13
    seeding 12-17
    state 12-5

SystemList attribute 7-5, 15-18
systems
    service group configuration 7-5

T

termination 2-9
test
    service group 7-30
testing
    application service 5-13
    integrated components 5-19
    network connections 2-12
    notification 10-7
    service group 8-20

TFrozen attribute 4-13, 11-7, 15-21
timeouts
    adjusting 11-14
    GAB 13-6
    LLT 13-6

ToleranceLimit attribute 11-16
tools
    offline configuration 8-14
    online configuration 6-13

traps 10-12
trigger
    event handling 11-25
    fault handling 11-25
    NoFailover 11-26
    notification 10-13
    PreOnline 10-13
    ResFault 10-13, 11-26
    ResNotOff 10-13, 11-26
    resnotoff 10-13
    ResStateChange 10-13, 11-26
    Violation 15-24

troubleshooting 13-10
    agents 15-30
    configuration 7-26
    configuration backup files 8-19
    duplicate node IDs 15-13
    flush service group 7-27
    GAB 15-11
    guide 15-9
    HAD startup 15-15
    LLT 15-12, 15-14
    log 7-26
    log files 15-5
    main.cf file 8-16
    message ID 15-7
    offline configuration 8-16
    recovering the cluster configuration 8-18
    resource types 15-31
    VCS 15-9

types.cf file
    backup 6-27
    backup files 8-19
    definition 1-23
    installation 3-13
    simulator 4-22

U

UMI 15-5, 15-7
unique message identifier 15-5

UNIX
    root user account 6-6
    user account 6-7

UseFence attribute 14-25
user account
    creating 6-9
    modifying 6-10
    privileges 6-10
    root 6-7
    UNIX 6-7
    VCS 6-8

V

validation
    design 5-22
    service group attributes 5-25

VCS
    access control authority 6-8
    administration 4-5
    administration tools 4-5
    administrator 6-7
    architecture 1-24
    communication 12-4
    communication overview 12-5
    engine startup problems 15-15
    fencing configuration 3-16
    fencing implementation 14-25
    forcing startup 6-26
    installation preparation 2-16
    installation procedure 3-6
    license 3-7
    management tools 4-5
    membership and configuration data 13-15
    response to system fault 13-5
    SNMP MIB 10-12
    SNMP traps 10-12
    starting 6-22
    starting stale 6-28
    starting with .stale file 6-25
    startup 12-20
    startup files 12-18
    stopping 6-30
    support 15-7
    system name 12-15
    troubleshooting 15-5
    user accounts 6-6

vcs.mib file 10-12
verification
    resources 5-18
    software 2-15

VERITAS
    Support 15-7, 15-8

VERITAS Product Installer 3-4
Violation trigger 15-24
vLicense Web site 2-15
volume management 5-7
volume management software 2-13
Volume resource 7-18, 7-21
VPI 3-4
vxfen driver 14-21, 14-23
vxfenadm command 14-27
vxfenclearpre command 14-34
vxfenconfig command 14-21
vxfendg file 14-28
vxfentab file 14-29
vxfentsthdw command 14-27
VxVM
    fencing 14-22
    fencing implementation 14-24
    resources 7-18

W

Wait state
    resource 15-29
    troubleshooting 8-17

Web GUI
    address 3-14
    configuration 2-16
    configure 3-8

worksheet
    design 2-6
    offline configuration 8-10
    resource dependencies 7-34
    resource dependency 5-24
    resource example 7-21
    resource preparation 5-22
    service group 7-8