VMware vCenter Server High Availability
Product Support Engineering
VMware Confidential
VI4 - Mod 2-1 - Slide 2
Module 2 Lessons
Lesson 1 – vCenter Server High Availability
Lesson 2 – Distributed Resource Scheduler
Lesson 3 – Fault Tolerance Virtual Machines
Lesson 4 – Enhanced vMotion Compatibility
Lesson 5 – DPM - IPMI
Lesson 6 – vApps
Lesson 7 – Host Profiles
Lesson 8 – Reliability, Availability, Serviceability ( RAS )
Lesson 9 – Web Access
Lesson 10 – vCenter Update Manager
Lesson 11 – Guided Consolidation
Lesson 12 – Health Status
VI4 - Mod 2-1 - Slide 3
Module 2-1 Lessons
Lesson 1 – Overview of High Availability
Lesson 2 – VMware HA Clusters
Lesson 3 – Creating HA Clusters
Lesson 4 – Monitoring HA Clusters
Lesson 5 – HA Clusters Best Practices
Lesson 6 – Troubleshooting VMware HA
Lesson 7 – Customizing VMware HA
VI4 - Mod 2-1 - Slide 5
High Availability Solutions
Unplanned downtime
VMware Infrastructure builds fault tolerance capabilities into datacenter infrastructure.
These features can be easily configured, thus reducing the cost and complexity of providing higher availability.
Key fault-tolerance capabilities built into VMware Infrastructure include:
Network interface (NIC) teaming to provide tolerance of individual network card failures
Storage multipathing to tolerate storage path failures
VI4 - Mod 2-1 - Slide 6
High Availability Solutions
VMware High Availability and VMware Fault Tolerance, implemented through VMware Infrastructure, offer simple, cost-effective solutions that help mitigate situations that could otherwise make data or services unavailable to users.
VMware HA - Checks that ESX/ESXi hosts are functioning. If an ESX/ESXi host fails, another ESX/ESXi host restarts any virtual machines that were running on the server that failed.
VMware Fault Tolerance (FT) - Checks that individual virtual machines are functioning and deals with failures without any interruption in service. VMware FT creates hidden duplicate copies of running virtual machines so if a virtual machine fails due to hardware or software failures, the duplicate virtual machine can immediately replace the one that was lost.
VI4 - Mod 2-1 - Slide 7
High Availability Solutions
High availability and fault tolerance are different from other business continuity offerings in that the solution:
Exists within a single datacenter. Other solutions exist across physical locations.
Uses shared storage for holding the machines' data. Other solutions use multiple copies of the data, which are regularly replicated.
Fault tolerance addresses a number of common problematic situations
VI4 - Mod 2-1 - Slide 10
Understanding the Resource Allocation Tab in Clusters
If the host being used to start a virtual machine is in a cluster, you can view information about reserved resources on the Resource Allocation tab for that cluster.
The information for the CPU and Memory reservations indicates that reservations have been made
Summary reservation information displays information about reservations on the cluster root, where all reservations occurred.
Individual virtual machines do not actually have any reservations.
VI4 - Mod 2-1 - Slide 11
VMware HA Cluster Prerequisites
This section describes the prerequisites for establishing VMware HA clusters.
A number of conditions must be established for VMware HA to be used.
All virtual machines and their configuration files must reside on shared storage (such as a SAN)
Hosts must also be configured to have access to the same virtual machine network.
Each host in a VMware HA cluster must have a host name assigned and a static IP address.
VMware recommends redundant Service Console and VMkernel networking
NOTE After you have added a NIC to a host in your VMware HA cluster, you must reconfigure VMware HA on that host.
VI4 - Mod 2-1 - Slide 12
A note on VMware HA ‘slot’ Calculation
Slot calculation is still done by the vCenter HA service. It gives the HA service the capacity of the cluster as a whole.
For VirtualCenter 2.x, the VM with the maximum resource consumption was chosen as the basis of the slot calculation.
This posed a problem if there was only one heavily resourced virtual machine and the other VMs used far fewer resources.
The result was a skewed calculation of the remaining resources.
This has been changed for vCenter 4.
The slot size is shown in the UI.
VI4 - Mod 2-1 - Slide 13
A note on VMware HA ‘slot’ Calculation
When you use the Host failures cluster tolerates option, it is most effective if all virtual machines have a similar CPU and memory requirement.
If you have highly variable configurations, consider using the Percentage of cluster resources reserved as failover spare capacity option.
When tolerating a specific number of host failures, VMware HA plans for a worst-case scenario by considering all powered-on virtual machines in a cluster and finding the maximum memory and CPU reservations.
These maximums are the basis for what is called a slot, which is a logical representation of the largest virtual machine in the cluster.
If no reservations are set on a virtual machine, default requirements of 256MB and 256MHz are assigned.
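The slot definition above can be illustrated with a minimal Python sketch. The `VM` class and `slot_size` function are hypothetical names introduced here for illustration; only the rule (per-resource maximum reservation over powered-on virtual machines, with 256MHz and 256MB substituted when no reservation is set) comes from the slides.

```python
from dataclasses import dataclass

# Defaults quoted in the slides for VMs that have no reservation set.
DEFAULT_CPU_MHZ = 256
DEFAULT_MEM_MB = 256


@dataclass
class VM:
    """Hypothetical record of a VM's reservations (0 means none set)."""
    name: str
    cpu_reservation_mhz: int = 0
    mem_reservation_mb: int = 0
    powered_on: bool = True


def slot_size(vms):
    """Return (cpu_mhz, mem_mb) for one slot.

    Each dimension is the maximum reservation across powered-on VMs;
    a VM without a reservation contributes the default value instead.
    """
    on = [vm for vm in vms if vm.powered_on]
    cpu = max((vm.cpu_reservation_mhz or DEFAULT_CPU_MHZ for vm in on),
              default=DEFAULT_CPU_MHZ)
    mem = max((vm.mem_reservation_mb or DEFAULT_MEM_MB for vm in on),
              default=DEFAULT_MEM_MB)
    return cpu, mem


cluster_vms = [
    VM("db01", cpu_reservation_mhz=2000, mem_reservation_mb=1024),
    VM("web01"),  # no reservations, so the defaults apply
    VM("app01", cpu_reservation_mhz=500, mem_reservation_mb=4096),
    VM("old01", 9000, 9000, powered_on=False),  # ignored: powered off
]
print(slot_size(cluster_vms))  # (2000, 4096)
```

Note that the CPU and memory maximums can come from different virtual machines, which is why a single large reservation inflates the slot for the whole cluster.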
VI4 - Mod 2-1 - Slide 14
A note on VMware HA ‘slot’ Calculation
VMware HA determines how many slots are available in each ESX/ESXi host based on the host’s CPU and memory capacity.
VMware HA then determines how many ESX/ESXi hosts could fail with the cluster still having at least as many slots as powered on virtual machines.
When you use the Percentage of cluster resources reserved as failover spare capacity option, each time a request is made to power on a virtual machine, admission control determines the amount of resources the virtual machine needs and how much uncommitted capacity remains in the cluster for failovers.
If sufficient resources are available, the virtual machine is powered on. This process does not guarantee maintaining a level of service if a number of hosts fail, but it is a more flexible and less conservative approach to assessing whether or not to power on machines.
This policy does not use slots. It uses the actual reservations of the virtual machines. If a virtual machine does not have reservations, meaning that the reservation is 0, a default of 256MB and 256MHz is applied. This is controlled by the same HA advanced options used for the failover level policy.
VI4 - Mod 2-1 - Slide 15
Monitoring Availability
You can monitor changes in your high-availability deployment using events and alarms
Use the functionality included in Alarms and Actions to determine what actions are taken when VMware HA events occur.
VI4 - Mod 2-1 - Slide 16
Creating a VMware HA Cluster
Clusters enable a collection of ESX/ESXi hosts to work together.
This provides higher levels of availability for virtual machines
You can create a new cluster using the Cluster Creation Wizard
VI4 - Mod 2-1 - Slide 22
Admission Control Policy
VMware HA provides options for what policy is enforced if admission control is enabled.
Host failures cluster tolerates – VMware HA reserves a certain amount of resources across a set of hosts. These reserved resources are sufficient to sustain performance even if the specified number of hosts fail.
Percentage of cluster resources reserved as failover spare capacity – VMware HA reserves a certain percentage of aggregate resources in the cluster to accommodate failures.
Specify a failover host – VMware HA reserves a specific host to accommodate failures. This is a more static solution, where a single host is designated as the host that will be the target for virtual machines if one of the other hosts fails.
VI4 - Mod 2-1 - Slide 23
Tolerate Some Number of Host Failures
You can configure VMware HA to tolerate a specified number of host failures.
When using the “Host failures cluster tolerates” option, it is most effective if all virtual machines have a similar CPU and memory requirement.
If you have highly variable configurations, consider using the “Percentage of cluster resources reserved as failover spare capacity” option
Each host has some amount of memory and CPU that it can make available for use by virtual machines.
Each virtual machine must be guaranteed its CPU and memory reservation requirements.
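As a rough sketch of how host capacity translates into a failover level, the Python below counts slots per host and then removes hosts until the remaining slots no longer cover the powered-on virtual machines. The function names and the sample capacities are hypothetical, and removing the largest hosts first is an assumption made here to model a worst case; the slides do not specify the order.

```python
def slots_per_host(host_cpu_mhz, host_mem_mb, slot_cpu_mhz, slot_mem_mb):
    """A host holds as many slots as its scarcer resource allows."""
    return min(host_cpu_mhz // slot_cpu_mhz, host_mem_mb // slot_mem_mb)


def tolerated_host_failures(hosts, slot, powered_on_vms):
    """Count how many hosts can fail while slots >= powered-on VMs.

    hosts: list of (cpu_mhz, mem_mb) capacities.
    slot:  (cpu_mhz, mem_mb) slot size.
    Assumption: the hosts with the most slots are assumed to fail first.
    """
    slot_cpu, slot_mem = slot
    counts = sorted(slots_per_host(c, m, slot_cpu, slot_mem) for c, m in hosts)
    failures = 0
    # Drop the largest remaining host while the rest can still hold every VM.
    while len(counts) > 1 and sum(counts[:-1]) >= powered_on_vms:
        counts.pop()
        failures += 1
    return failures


hosts = [(9000, 9000), (9000, 6000), (6000, 6000)]  # slots: 4, 4, 3
slot = (2000, 1000)
print(tolerated_host_failures(hosts, slot, powered_on_vms=6))  # 1
```

With six powered-on virtual machines, losing one of the four-slot hosts leaves seven slots, but losing a second host leaves only three, so this cluster tolerates a single host failure.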
VI4 - Mod 2-1 - Slide 24
Tolerate Some Number of Host Failures
VI4 - Mod 2-1 - Slide 25
Tolerate Some Number of Host Failures
VI4 - Mod 2-1 - Slide 26
Reserve a Percentage of Cluster Resources
You can configure VMware HA to reserve a specific percentage of cluster resources for recovery from host failures
When using the “Percentage of cluster resources reserved as failover spare capacity” option, each time a request is made to power on a virtual machine, admission control determines the amount of resources the virtual machine would need and how much uncommitted capacity remains in the cluster for failovers.
This policy does not use slots, but rather it uses the actual reservations of the virtual machines.
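A minimal sketch of this percentage-based check, in Python, might look as follows. The function names, the cluster totals, and the exact comparison (uncommitted percentage after power-on must stay at or above the configured percentage for both CPU and memory) are assumptions for illustration; the 256MHz/256MB defaults for virtual machines without reservations are taken from the earlier slot slides.

```python
DEFAULT_MHZ = 256
DEFAULT_MB = 256


def effective(reservation):
    """Substitute the default when a reservation is 0 (not set)."""
    cpu, mem = reservation
    return (cpu or DEFAULT_MHZ, mem or DEFAULT_MB)


def can_power_on(cluster_mhz, cluster_mb, running, candidate, failover_pct):
    """Return True if powering on `candidate` keeps enough spare capacity.

    running:   list of (cpu_mhz, mem_mb) reservations of powered-on VMs.
    candidate: (cpu_mhz, mem_mb) reservation of the VM to power on.
    """
    demands = [effective(r) for r in list(running) + [candidate]]
    used_mhz = sum(cpu for cpu, _ in demands)
    used_mb = sum(mem for _, mem in demands)
    free_cpu_pct = 100 * (cluster_mhz - used_mhz) / cluster_mhz
    free_mem_pct = 100 * (cluster_mb - used_mb) / cluster_mb
    return free_cpu_pct >= failover_pct and free_mem_pct >= failover_pct


running = [(3000, 2000), (0, 0)]  # second VM has no reservations
print(can_power_on(10000, 10000, running, (2000, 4000), failover_pct=25))  # True
print(can_power_on(10000, 10000, running, (2000, 4000), failover_pct=40))  # False
```

In the second call the memory left uncommitted (about 37%) falls below the 40% that is to be held back, so the power-on request is refused even though CPU would still be within limits.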
VI4 - Mod 2-1 - Slide 27
Specify a Failover Host
You can configure VMware HA to reserve a specific host as failover capacity.
When using the Specify a failover host option, if one host fails, VMware HA first attempts to restart its virtual machines on the reserved host. If this is not possible, for example because of insufficient resources or because the reserved host has itself failed, VMware HA attempts to restart the virtual machines on any available host in the cluster.
This option does not guarantee a level of availability.
It establishes a spare host to use in case of failover
If a failover host is specified, HA admission control prevents users from powering on a virtual machine on the failover host or VMotioning virtual machines to the failover host
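The placement order described on this slide (reserved host first, then any other available host) can be sketched in Python as below. The data layout and the name `choose_restart_host` are hypothetical, and the "fits" test is a simplification that only compares free CPU and memory.

```python
def choose_restart_host(vm, failover_host, hosts):
    """Pick a host on which to restart `vm`, preferring the failover host.

    vm:    (cpu_mhz, mem_mb) the VM needs.
    hosts: dict of host name -> {"alive": bool, "free_mhz": int, "free_mb": int}.
    Returns a host name, or None if no host can take the VM.
    """
    def fits(name):
        h = hosts[name]
        return h["alive"] and h["free_mhz"] >= vm[0] and h["free_mb"] >= vm[1]

    # First attempt: the designated failover host.
    if fits(failover_host):
        return failover_host
    # Fallback: any other available host in the cluster.
    for name in hosts:
        if name != failover_host and fits(name):
            return name
    return None


hosts = {
    "esx-spare": {"alive": False, "free_mhz": 8000, "free_mb": 8000},
    "esx01": {"alive": True, "free_mhz": 500, "free_mb": 4000},
    "esx02": {"alive": True, "free_mhz": 3000, "free_mb": 4000},
}
print(choose_restart_host((1000, 2048), "esx-spare", hosts))  # esx02
```

Because the fallback is best effort, the result can be `None`, which matches the slide's point that this option does not guarantee a level of availability.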
VI4 - Mod 2-1 - Slide 29
VM Restart Priority
VM restart priority determines the relative order in which virtual machines are restarted after a host failure.
Assign higher restart priority to the virtual machines that host the most important services.
For example, in the case of a multi-tier application you might opt to rank assignments according to functions hosted on the virtual machines:
High: Database servers that will provide data for applications.
Medium: Application servers that consume data in the database and provide results on web pages.
Low: Web servers that receive user requests, pass queries to application servers, and return results to users.
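To make the ordering concrete, here is a small Python sketch that sorts virtual machines by restart priority. The VM names and the `restart_order` helper are hypothetical; only the High/Medium/Low ranking comes from the slide.

```python
# Lower number means restarted earlier.
PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}


def restart_order(vms):
    """Return VM names in restart order.

    vms: list of (name, priority) pairs. sorted() is stable, so VMs with
    the same priority keep the order in which they were listed.
    """
    ranked = sorted(vms, key=lambda vm: PRIORITY_ORDER[vm[1]])
    return [name for name, _ in ranked]


failed_host_vms = [
    ("web01", "low"),
    ("db01", "high"),
    ("app01", "medium"),
    ("db02", "high"),
]
print(restart_order(failed_host_vms))  # ['db01', 'db02', 'app01', 'web01']
```

The priority only sets a relative order; this sketch says nothing about whether a tier has finished booting before the next one is started.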
VI4 - Mod 2-1 - Slide 30
Host Isolation Response
Determines what happens when a host in a VMware HA cluster loses its service console network (or VMkernel network, in ESXi) connection but continues running.
Values are: Leave VM powered on (the default), Power off VM, and Shut down VM.
When a host in a HA cluster loses its console network (or VMkernel network, in ESXi) connectivity, the host is isolated from other hosts in the cluster.
Virtual Machine Settings
You can override the default settings established for the cluster. For each virtual machine, you can establish individual settings for Restart Priority and Isolation Response.
VI4 - Mod 2-1 - Slide 31
Virtual Machine Monitoring Sensitivity
The degree to which VMware HA is sensitive to virtual machine failures can be configured to different levels.
If you select Enable VM Monitoring, the VM Monitoring service uses VMware Tools to evaluate whether each virtual machine in the cluster is running, by checking for regular heartbeats from the guest operating system.
If no heartbeats are received within the configured interval, the VM Monitoring service determines that the virtual machine has failed, and the virtual machine is rebooted to restore service.
Click on the Custom box to configure advanced features for Monitoring Sensitivity
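The heartbeat check can be pictured with the short Python sketch below. The function name, the timestamps, and the 30-second interval are placeholders chosen for illustration, not values taken from the slides; monitoring sensitivity is modeled simply as the length of the allowed gap between heartbeats.

```python
def vms_to_reset(last_heartbeat, now, failure_interval):
    """Return the VMs whose last heartbeat is older than the interval.

    last_heartbeat:   dict of VM name -> time (seconds) of last heartbeat.
    now:              current time in seconds.
    failure_interval: allowed gap in seconds; a shorter interval means
                      more sensitive (more aggressive) monitoring.
    """
    return sorted(
        name for name, seen in last_heartbeat.items()
        if now - seen > failure_interval
    )


heartbeats = {"db01": 995, "app01": 940, "web01": 970}
print(vms_to_reset(heartbeats, now=1000, failure_interval=30))  # ['app01']
```

Raising the interval to 60 seconds in this example would leave `app01` alone, which is the trade-off behind the sensitivity setting: faster detection against a higher chance of resetting a virtual machine that is only briefly unresponsive.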
VI4 - Mod 2-1 - Slide 32
Best Practices for Configuring VMware HA Clusters
Networking Best Practices
If your switches support the PortFast (or an equivalent) setting, enable it on the physical network switches that connect servers.
This helps to prevent a host from incorrectly determining that a network is isolated during the execution of lengthy spanning-tree algorithms
On ESX hosts, HA automatically opens the firewall ports that are needed for it to function. The following ports are opened:
Incoming port: TCP/UDP 8042-8045
Outgoing port: TCP/UDP 2050-2250
VI4 - Mod 2-1 - Slide 35
Best Practices for Configuring VMware HA Clusters
Selection of Networks
The networks that HA uses by default are:
ESX: all Service Console Networks
ESXi: All VMKernel networks, *except* the VMotion network, unless there is only one network and it is a VMotion network
By default, the network isolation address is the default gateway, so it is a best practice to add a das.isolationaddress[...] for each network.
To override the networks that HA selects by default, use the advanced option das.allowNetwork[...]; HA then uses only networks whose port group names match.
ESXi by default uses all VMKernel networks, except the VMotion Network unless there is only one network defined. Use das.AllowVmotionNetworks to override this default behavior. Also, you can use das.allowNetwork[...] to specify the networks that will be used for HA.
VI4 - Mod 2-1 - Slide 36
Clusters with both ESX and ESXi hosts
In mixed ESX and ESXi clusters, using the das.allowNetwork[...] advanced options may be necessary to ensure compatible networks are selected for hosts.
HA configuration enforces that all hosts in the cluster have compatible networks.
The first node added to the cluster dictates the networks that all subsequent hosts must have for them to be allowed into the cluster
Networks are deemed compatible if the IP address and subnet mask combine to result in a network that matches another host's
Use das.allowNetwork[...] advanced options to control which networks are to be used to ensure compatibility between all hosts in the cluster
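The compatibility rule (IP address combined with subnet mask yields a network that must match another host's) can be checked with Python's standard `ipaddress` module, as in this sketch. The function names and the sample addresses are hypothetical, and treating the first host's networks as a required set is this example's reading of the slide.

```python
import ipaddress


def network_of(ip, netmask):
    """Combine an IP address and subnet mask into the network they belong to."""
    # strict=False lets a host address (not only a network address) be given.
    return ipaddress.ip_network(f"{ip}/{netmask}", strict=False)


def is_compatible(first_host_nets, new_host_nets):
    """Check that a new host has every network the first host established.

    Each argument is a list of (ip, netmask) pairs for the HA networks.
    """
    required = {network_of(ip, mask) for ip, mask in first_host_nets}
    offered = {network_of(ip, mask) for ip, mask in new_host_nets}
    return required <= offered


first = [("10.0.0.11", "255.255.255.0")]
print(is_compatible(first, [("10.0.0.12", "255.255.255.0")]))  # True
print(is_compatible(first, [("10.0.1.12", "255.255.255.0")]))  # False
```

A host that carries additional networks still passes this check, which is the situation where restricting HA to named port groups becomes useful in mixed clusters.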
VI4 - Mod 2-1 - Slide 37
Setting Up Networking Redundancy
Networking redundancy between cluster nodes is important for VMware HA reliability.
Redundant service console networking on ESX4 (or VMkernel networking on ESXi) allows the reliable detection of failures and prevents isolation conditions from occurring.
NIC Teaming
Using a team of two NICs connected to separate physical switches improves the reliability of a service console (or, in ESXi, VMkernel) network.
To configure a NIC team for the service console, configure the vNICs in vSwitch configuration for Active or Standby configuration. The recommended parameter settings for the vNICs are:
Default load balancing = route based on originating port ID
Failback = No
VI4 - Mod 2-1 - Slide 38
Secondary Service Console Network
You can create a secondary service console (or VMkernel port for ESXi), which is attached to a separate virtual switch
The primary service console is used for network and management purposes.
With a secondary service console network created, VMware HA sends heartbeats over both the primary and secondary service consoles.
When you set up service console redundancy, you must specify an additional isolation response address (das.isolationaddress2) for the service console networks
When you specify a secondary isolation address, you should increase the das.failuredetectiontime setting to 20000 milliseconds or greater
You can add the secondary service console network to the VMotion vSwitch: a virtual switch can be shared between VMotion networks and a secondary service console network.
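The two recommendations on this slide can be expressed as a small consistency check over the advanced options, sketched here in Python. The option names das.isolationaddress2 and das.failuredetectiontime and the 20000 millisecond figure come from the slide; the `check_isolation_options` helper and the example address (taken from the 192.0.2.0/24 documentation range) are hypothetical.

```python
MIN_DETECTION_MS = 20000  # recommended minimum when a second address is set


def check_isolation_options(options):
    """Return a list of warnings for a dict of advanced option -> string value."""
    warnings = []
    if "das.isolationaddress2" in options:
        raw = options.get("das.failuredetectiontime")
        # Warn if the detection time is missing or below the recommended value.
        if raw is None or int(raw) < MIN_DETECTION_MS:
            warnings.append(
                "das.isolationaddress2 is set: raise das.failuredetectiontime "
                f"to {MIN_DETECTION_MS} ms or greater"
            )
    return warnings


options = {
    "das.isolationaddress2": "192.0.2.1",  # example address only
    "das.failuredetectiontime": "15000",
}
for message in check_isolation_options(options):
    print(message)
```

The check treats a missing das.failuredetectiontime as a reason to warn, because the slides do not state what value applies when the option is left unset.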
VI4 - Mod 2-1 - Slide 39
Other VMware HA Cluster Considerations
Use larger groups of homogeneous servers to allow higher average levels of utilization across a VMware HA-enabled cluster.
More nodes per cluster can tolerate multiple host failures while still guaranteeing failover capacities.
The failover level policy used in admission control heuristics is conservatively weighted, so that virtual machines on large servers can fail over to smaller servers.
VI4 - Mod 2-1 - Slide 41
Viewing Information about VMware HA Clusters
You can view current settings for a cluster
The cluster Summary page displays summary information for the cluster.
VI4 - Mod 2-1 - Slide 44
Primary and Secondary Hosts
Some hosts in a VMware HA cluster are designated as primary hosts.
They maintain information about the cluster such as membership.
The first five hosts in the cluster are designated primary hosts, and all subsequent hosts are designated secondary hosts.
When you add a host to a VMware HA cluster, that host communicates with an existing primary host in the same cluster to complete its configuration
When a primary host becomes unavailable or is removed from the cluster, VMware HA promotes one of the secondary hosts to primary status.
Primary hosts help provide redundancy by replicating the cluster's configuration information and virtual machine states and are used to initiate failover actions
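The designation and promotion rules on this slide can be modeled with a few lines of Python. The `HACluster` class and host names are hypothetical; the limit of five primary hosts is from the slide, while promoting the longest-standing secondary is an assumption, since the slides do not say which secondary is chosen.

```python
MAX_PRIMARIES = 5  # "the first five hosts" per the slide


class HACluster:
    """Toy model of primary/secondary host bookkeeping."""

    def __init__(self):
        self.primaries = []
        self.secondaries = []

    def add_host(self, name):
        # A host joins as primary while fewer than five primaries exist.
        if len(self.primaries) < MAX_PRIMARIES:
            self.primaries.append(name)
        else:
            self.secondaries.append(name)

    def remove_host(self, name):
        if name in self.secondaries:
            self.secondaries.remove(name)
        elif name in self.primaries:
            self.primaries.remove(name)
            if self.secondaries:
                # Assumption: promote the secondary that joined earliest.
                self.primaries.append(self.secondaries.pop(0))


cluster = HACluster()
for i in range(1, 8):
    cluster.add_host(f"esx{i}")
cluster.remove_host("esx2")
print(cluster.primaries)    # ['esx1', 'esx3', 'esx4', 'esx5', 'esx6']
print(cluster.secondaries)  # ['esx7']
```

Because a new host must contact an existing primary to complete its configuration, a cluster in this model with an empty `primaries` list could not accept additional hosts.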
VI4 - Mod 2-1 - Slide 45
VMware HA Clusters and Maintenance Mode
Put a host in maintenance mode in preparation for completing administrative tasks that would otherwise cause unwanted HA responses.
Putting a host into maintenance mode effectively disables the HA service.
You cannot power on a virtual machine on a host that is in maintenance mode.
VMware HA does not fail over any virtual machines to a host that is in maintenance mode
When a host exits maintenance mode, the VMware HA service is reenabled on that host, so it becomes available for failover again
If the host is in a cluster, when it enters maintenance mode the user is given the option to evacuate powered-off virtual machines
VI4 - Mod 2-1 - Slide 46
VMware HA Clusters and Disconnected Hosts
Users may initiate state changes, such as during network maintenance.
An ESX/ESXi host in a cluster may no longer be able to communicate with other hosts in the cluster.
That host becomes disconnected.
The unresponsive host continues to function, but its state is unknown
When a host is disconnected, VMware HA cannot use it as a guaranteed failover target.
VMware HA does not consider disconnected hosts when making calculations related to admission control.
When the host becomes reconnected, the host becomes available for failover again
VI4 - Mod 2-1 - Slide 47
VMware HA Clusters and Disconnected Hosts
The difference between a disconnected host and a host that is not responding is that:
A disconnected host has been explicitly disconnected by the user. As part of disconnecting a host, VMware HA is disabled on that host. The virtual machines on that host are not failed over and not considered when the current failover level is computed.
If a host is not responding, no other hosts receive heartbeats from it. This might happen, for example, because of a network problem or because the host failed.
Disconnected and unresponsive hosts are not included in computations of the current failover level, but any virtual machines running on an unresponsive host will be failed over if the host fails.
VI4 - Mod 2-1 - Slide 51
Monitoring Individual Virtual Machines
You can specify behavior for individual virtual machines for:
VM Restart Priority — Indicates relative priority for restarting the virtual machine in case of host failure.
Host Isolation Response — Specifies what the ESX/ESXi host that has lost connection with its cluster should do with running virtual machines.
Monitoring Sensitivity — Specifies how quickly failures are detected. Settings can be changed so certain virtual machines are more or less aggressively monitored. Specific custom values can also be set using advanced options.
VI4 - Mod 2-1 - Slide 56
Troubleshooting VMware HA
If no hosts in a cluster are responding, when you attempt to add a new host, VMware HA configuration fails because the new host cannot communicate with any of the primary hosts.
Disconnect all hosts that are not responding before adding the new host.
After disconnecting all other hosts and adding a new host, that host becomes the first primary host.
When other hosts become available again, their VMware HA service is reconfigured and they then become primary or secondary hosts depending on the existing number of primary hosts.
VI4 - Mod 2-1 - Slide 57
Customizing VMware HA
After you have established a cluster, you may need to modify settings. There are specific attributes that affect how VMware HA behaves
VI4 - Mod 2-1 - Slide 58
Customizing VMware HA
VI4 - Mod 2-1 - Slide 59
Customizing VMware HA
VI4 - Mod 2-1 - Slide 60
Set Advanced VMware HA Options
To precisely customize VMware HA behavior, set advanced VMware HA options.
Prerequisites
You must have a VMware HA cluster for which to modify settings.
To modify advanced VMware HA settings, you must have cluster administrator privileges.
In the cluster’s Settings dialog box, select VMware HA.
Click the Advanced Options button to open the dialog box.
Enter each advanced attribute you want to change in a text box
Click OK.
VI4 - Mod 2-1 - Slide 61
Lesson 2-1 Summary
Learn how to create an HA Cluster
Learn how to monitor an HA Cluster
Learn how to modify HA Cluster settings
Learn how to troubleshoot HA Clusters
VI4 - Mod 2-1 - Slide 62
Lab – VMware High Availability
Lab 1 Part 1 - Creating a vCenter High Availability (HA) Cluster
Lab 1 Part 2 – Adding Hosts to High Availability (HA) Cluster
Lab 1 Part 3 – Viewing High Availability (HA) Cluster Settings
Lab 1 Part 4 – Modifying High Availability (HA) Cluster Settings