VMware vCenter Server High Availability

VMware vCenter Server High Availability

Product Support Engineering

VMware Confidential

VI4 - Mod 2-1 - Slide 2

Module 2 Lessons

Lesson 1 – vCenter Server High Availability

Lesson 2 – Distributed Resource Scheduler

Lesson 3 – Fault Tolerance Virtual Machines

Lesson 4 – Enhanced vMotion Compatibility

Lesson 5 – DPM - IPMI

Lesson 6 – vApps

Lesson 7 – Host Profiles

Lesson 8 – Reliability, Availability, Serviceability (RAS)

Lesson 9 – Web Access

Lesson 10 – vCenter Update Manager

Lesson 11 – Guided Consolidation

Lesson 12 – Health Status

VI4 - Mod 2-1 - Slide 3

Module 2-1 Lessons

Lesson 1 – Overview of High Availability

Lesson 2 – VMware HA Clusters

Lesson 3 – Creating HA Clusters

Lesson 4 – Monitoring HA Clusters

Lesson 5 – HA Clusters Best Practices

Lesson 6 – Troubleshooting VMware HA

Lesson 7 – Customizing VMware HA

VI4 - Mod 2-1 - Slide 5

High Availability Solutions

Unplanned downtime

VMware Infrastructure builds fault tolerance capabilities into datacenter infrastructure.

These features can be easily configured, thus reducing the cost and complexity of providing higher availability.

Key fault-tolerance capabilities built into VMware Infrastructure include:

Network interface (NIC) teaming to provide tolerance of individual network card failures

Storage multipathing to tolerate storage path failures

VI4 - Mod 2-1 - Slide 6

High Availability Solutions

VMware High Availability and VMware Fault Tolerance, implemented through VMware Infrastructure, offer simple, cost effective solutions that help mitigate situations that could otherwise make data or services unavailable to users.

VMware HA - Checks that ESX/ESXi hosts are functioning. If an ESX/ESXi host fails, another ESX/ESXi host restarts any virtual machines that were running on the server that failed.

VMware Fault Tolerance (FT) - Checks that individual virtual machines are functioning and deals with failures without any interruption in service. VMware FT creates hidden duplicate copies of running virtual machines so if a virtual machine fails due to hardware or software failures, the duplicate virtual machine can immediately replace the one that was lost.

VI4 - Mod 2-1 - Slide 7

High Availability Solutions

High availability and fault tolerance are different from other business continuity offerings in that the solution:

Exists within a single datacenter. Other solutions exist across physical locations.

Uses shared storage for holding the machines' data. Other solutions use multiple copies of the data, which are regularly replicated.

Fault tolerance addresses a number of common problematic situations

VI4 - Mod 2-1 - Slide 10

Understanding the Resource Allocation Tab in Clusters

If the host being used to start a virtual machine is in a cluster, you can view information about reserved resources on the Resource Allocation tab for that cluster.

The information for the CPU and Memory reservations indicates that reservations have been made

Summary reservation information displays information about reservations on the cluster root, where all reservations occurred.

Individual virtual machines do not actually have any reservations.

VI4 - Mod 2-1 - Slide 11

VMware HA Cluster Prerequisites

This section describes the prerequisites for establishing VMware HA clusters.

A number of conditions must be established for VMware HA to be used.

All virtual machines and their configuration files must reside on shared storage (such as a SAN)

Hosts must also be configured to have access to the same virtual machine network.

Each host in a VMware HA cluster must have a host name assigned and a static IP address.

VMware recommends redundant Service Console and VMkernel networking

NOTE After you have added a NIC to a host in your VMware HA cluster, you must reconfigure VMware HA on that host.

VI4 - Mod 2-1 - Slide 12

A note on VMware HA ‘slot’ Calculation

Slot calculation is still done by the vCenter HA service. It gives the HA service the capacity of the cluster as a whole.

In VirtualCenter 2.x, the virtual machine with the maximum resource consumption was chosen as the basis of the slot calculation.

This posed a problem if only one virtual machine was heavily resourced and the other virtual machines used far fewer resources.

You would get an unfairly low estimate of the remaining resources.

This has been changed for vCenter 4.

The slot size is shown in the UI.

VI4 - Mod 2-1 - Slide 13

A note on VMware HA ‘slot’ Calculation

When you use the Host failures cluster tolerates option, it is most effective if all virtual machines have a similar CPU and memory requirement.

If you have highly variable configurations, consider using the Percentage of cluster resources reserved as failover spare capacity option.

When tolerating a specific number of host failures, VMware HA plans for a worst-case scenario by considering all powered-on virtual machines in a cluster and finding the maximum memory and CPU reservations.

These maximums are the basis for what is called a slot, which is a logical representation of the largest virtual machine in the cluster.

If no reservations are set on a virtual machine, default requirements of 256MB and 256MHz are assigned.
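
The slot-size rule described above can be sketched in Python. This is a minimal illustration, not VMware's implementation: the `VM` tuple and the `slot_size` function are hypothetical names, and the defaults are the 256MB and 256MHz figures quoted on this slide.

```python
from typing import NamedTuple

# Defaults quoted on the slide for VMs that have no reservation set.
DEFAULT_CPU_MHZ = 256
DEFAULT_MEM_MB = 256

class VM(NamedTuple):
    name: str
    cpu_reservation_mhz: int  # 0 means no reservation
    mem_reservation_mb: int   # 0 means no reservation

def slot_size(powered_on_vms):
    """Return (cpu_mhz, mem_mb): the largest reservation in each dimension."""
    cpu = max((vm.cpu_reservation_mhz or DEFAULT_CPU_MHZ for vm in powered_on_vms),
              default=DEFAULT_CPU_MHZ)
    mem = max((vm.mem_reservation_mb or DEFAULT_MEM_MB for vm in powered_on_vms),
              default=DEFAULT_MEM_MB)
    return cpu, mem

vms = [VM("web01", 0, 0), VM("app01", 500, 1024), VM("db01", 1000, 512)]
print(slot_size(vms))  # (1000, 1024)
```

Note that the CPU and memory maximums can come from different virtual machines, so the slot may be larger than any single virtual machine in the cluster.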

VI4 - Mod 2-1 - Slide 14

A note on VMware HA ‘slot’ Calculation

VMware HA determines how many slots are available in each ESX/ESXi host based on the host’s CPU and memory capacity.

VMware HA then determines how many ESX/ESXi hosts could fail with the cluster still having at least as many slots as powered on virtual machines.
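
The two steps above (slots per host, then tolerable host failures) might look like the following Python sketch. The function names are hypothetical, and failing the largest hosts first is an assumption made here to model the worst case; the slide does not state the order.

```python
def slots_per_host(host_cpu_mhz, host_mem_mb, slot_cpu_mhz, slot_mem_mb):
    """A host holds as many slots as its scarcer resource allows."""
    return min(host_cpu_mhz // slot_cpu_mhz, host_mem_mb // slot_mem_mb)

def tolerable_host_failures(host_slots, powered_on_vms):
    """Count how many hosts can fail, largest first (assumed worst case),
    while the remaining slots still cover every powered-on virtual machine."""
    remaining = sorted(host_slots)  # ascending, so the largest host is last
    failures = 0
    while remaining:
        remaining.pop()  # assume the host with the most slots fails next
        if sum(remaining) < powered_on_vms:
            break
        failures += 1
    return failures

# Example with a slot of 1000 MHz / 1024 MB and three hosts.
host_slots = [
    slots_per_host(9000, 16384, 1000, 1024),  # 9 slots, CPU-bound
    slots_per_host(6000, 8192, 1000, 1024),   # 6 slots
    slots_per_host(6000, 8192, 1000, 1024),   # 6 slots
]
print(host_slots)                               # [9, 6, 6]
print(tolerable_host_failures(host_slots, 10))  # 1
```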

When you use the Percentage of cluster resources reserved as failover spare capacity option, each time a request is made to power on a virtual machine, admission control determines the amount of resources the virtual machine needs and how many uncommitted resources remain in the cluster for failovers.

If sufficient resources are available, the virtual machine is powered on. This process does not guarantee maintaining a level of service if a number of hosts fail, but it is a more flexible and less conservative approach to assessing whether or not to power on machines.

This policy does not use slots. It uses the actual reservations of the virtual machines. If a virtual machine does not have reservations, meaning that the reservation is 0, a default of 256MB and 256MHz is applied. This is controlled by the same HA advanced options used for the failover level policy.
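
A rough Python sketch of the percentage-based check follows. The `can_power_on` function and its parameters are hypothetical, and the exact comparison (the uncommitted fraction after power-on must stay at or above the reserved percentage, for both CPU and memory) is an assumption; the slide only says that sufficient resources must be available.

```python
# Defaults quoted on the slide for VMs whose reservation is 0.
DEFAULT_CPU_MHZ = 256
DEFAULT_MEM_MB = 256

def can_power_on(total_cpu_mhz, total_mem_mb, powered_on, new_vm, reserved_pct):
    """powered_on and new_vm hold (cpu_mhz, mem_mb) reservations; 0 means none."""
    def effective(vm):
        cpu, mem = vm
        return (cpu or DEFAULT_CPU_MHZ, mem or DEFAULT_MEM_MB)

    demand = [effective(vm) for vm in list(powered_on) + [new_vm]]
    used_cpu = sum(cpu for cpu, _ in demand)
    used_mem = sum(mem for _, mem in demand)
    # Fraction of each resource that would stay uncommitted after power-on.
    free_cpu = (total_cpu_mhz - used_cpu) / total_cpu_mhz
    free_mem = (total_mem_mb - used_mem) / total_mem_mb
    return min(free_cpu, free_mem) >= reserved_pct / 100

running = [(1000, 2048), (0, 0), (2000, 4096)]
print(can_power_on(20000, 32768, running, (4000, 8192), 25))   # True
print(can_power_on(20000, 32768, running, (4000, 20000), 25))  # False
```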

VI4 - Mod 2-1 - Slide 15

Monitoring Availability

You can monitor changes in your high-availability deployment using events and alarms

Use the functionality included in Alarms and Actions to determine what actions are taken when VMware HA events occur.

VI4 - Mod 2-1 - Slide 16

Creating a VMware HA Cluster

Clusters enable a collection of ESX/ESXi hosts to work together.

This provides higher levels of availability for virtual machines

You can create a new cluster using the Cluster Creation Wizard

VI4 - Mod 2-1 - Slide 22

Admission Control Policy

VMware HA provides options for what policy is enforced if admission control is enabled.

Host failures cluster tolerates – VMware HA reserves a certain amount of resources across a set of hosts. These reserved resources are sufficient to sustain performance even if the specified number of hosts fail.

Percentage of cluster resources reserved as failover spare capacity – VMware HA reserves a certain percentage of aggregate resources in the cluster to accommodate failures.

Specify a failover host – VMware HA reserves a specific host to accommodate failures. This is a more static solution, where a single host is designated as the host that will be the target for virtual machines if one of the other hosts fails.

VI4 - Mod 2-1 - Slide 23

Tolerate Some Number of Host Failures

You can configure VMware HA to tolerate a specified number of host failures.

When using the “Host failures cluster tolerates” option, it is most effective if all virtual machines have a similar CPU and memory requirement.

If you have highly variable configurations, consider using the “Percentage of cluster resources reserved as failover spare capacity” option

Each host has some amount of memory and CPU that it can make available for use by virtual machines.

Each virtual machine must be guaranteed its CPU and memory reservation requirements.

VI4 - Mod 2-1 - Slide 24

Tolerate Some Number of Host Failures

VI4 - Mod 2-1 - Slide 25

Tolerate Some Number of Host Failures

VI4 - Mod 2-1 - Slide 26

Reserve a Percentage of Cluster Resources

You can configure VMware HA to reserve a specific percentage of cluster resources for recovery from host failures

When you use the “Percentage of cluster resources reserved as failover spare capacity” option, each time a request is made to power on a virtual machine, admission control determines the amount of resources the virtual machine would need and how many uncommitted resources remain in the cluster for failovers.

This policy does not use slots, but rather it uses the actual reservations of the virtual machines.

VI4 - Mod 2-1 - Slide 27

Specify a Failover Host

You can configure VMware HA to reserve a specific host as failover capacity.

When you use the Specify a failover host option and a host fails, VMware HA first attempts to restart its virtual machines on the reserved host. If this is not possible, for example because of insufficient resources or because the reserved host has itself failed, it attempts to restart the virtual machines on any available host in the cluster.

This option does not guarantee a level of availability.

It establishes a spare host to use in case of failover

If a failover host is specified, HA admission control prevents users from powering on a virtual machine on the failover host or VMotioning virtual machines to the failover host
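
The restart-target order described on this slide can be sketched as follows. This is an illustration only: `choose_restart_host` and the host dictionary layout are hypothetical, and checking free memory alone is a simplification of "sufficient resources".

```python
def choose_restart_host(vm_mem_mb, failover_host, hosts):
    """hosts maps host name -> {"alive": bool, "free_mem_mb": int}.

    Try the designated failover host first, then any other usable host.
    Returns None when no host can take the virtual machine.
    """
    def usable(name):
        host = hosts[name]
        return host["alive"] and host["free_mem_mb"] >= vm_mem_mb

    if usable(failover_host):
        return failover_host
    for name in hosts:
        if name != failover_host and usable(name):
            return name
    return None

hosts = {
    "esx-spare": {"alive": False, "free_mem_mb": 32768},  # reserved host is down
    "esx01": {"alive": True, "free_mem_mb": 1024},
    "esx02": {"alive": True, "free_mem_mb": 8192},
}
print(choose_restart_host(4096, "esx-spare", hosts))  # esx02
```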

VI4 - Mod 2-1 - Slide 29

VM Restart Priority

VM restart priority determines the relative order in which virtual machines are restarted after a host failure.

Assign higher restart priority to the virtual machines that host the most important services.

For example, in the case of a multi-tier application you might opt to rank assignments according to functions hosted on the virtual machines:

High: Database servers that will provide data for applications.

Medium: Application servers that consume data in the database and provide results on web pages.

Low: Web servers that receive user requests, pass queries to application servers, and return results to users.
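
As a small illustration of the ordering in the multi-tier example above, the following Python sketch sorts virtual machines by restart priority. The names `PRIORITY_ORDER` and `restart_order` and the VM names are hypothetical.

```python
# Lower number means restarted earlier.
PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}

def restart_order(vms):
    """vms is a list of (name, priority) pairs; returns names in restart order.

    sorted() is stable, so VMs with equal priority keep their input order.
    """
    ranked = sorted(vms, key=lambda vm: PRIORITY_ORDER[vm[1]])
    return [name for name, _ in ranked]

vms = [("web01", "low"), ("db01", "high"), ("app01", "medium"), ("db02", "high")]
print(restart_order(vms))  # ['db01', 'db02', 'app01', 'web01']
```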

VI4 - Mod 2-1 - Slide 30

Host Isolation Response

Determines what happens when a host in a VMware HA cluster loses its service console network (or VMkernel network, in ESXi) connection but continues running.

Values are: Leave VM powered on (the default), Power off VM, and Shut down VM.

When a host in a HA cluster loses its console network (or VMkernel network, in ESXi) connectivity, the host is isolated from other hosts in the cluster.

Virtual Machine Settings

You can override the default settings established for the cluster. For each virtual machine, you can establish individual settings for Restart Priority and Isolation Response.
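
The override behavior can be pictured as a simple merge of cluster defaults and per-VM settings. The sketch below is a hypothetical model: the setting keys and value strings are invented labels for the options named on this slide, not identifiers from any VMware API.

```python
CLUSTER_DEFAULTS = {
    "restart_priority": "medium",
    "isolation_response": "leave_powered_on",
}

def effective_settings(cluster_defaults, vm_overrides):
    """Per-VM values win; a missing or None value falls back to the cluster default."""
    merged = dict(cluster_defaults)
    merged.update({key: value for key, value in vm_overrides.items()
                   if value is not None})
    return merged

# db01 overrides only its restart priority.
print(effective_settings(CLUSTER_DEFAULTS, {"restart_priority": "high"}))
# {'restart_priority': 'high', 'isolation_response': 'leave_powered_on'}
```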

VI4 - Mod 2-1 - Slide 31

Virtual Machine Monitoring Sensitivity

The degree to which VMware HA is sensitive to virtual machine failures can be configured to different levels.

If you select Enable VM Monitoring, the VM Monitoring service uses VMware Tools to evaluate whether each virtual machine in the cluster is running by checking for regular heartbeats from the guest operating system (GOS).

If heartbeats stop arriving, the VM Monitoring service determines that the virtual machine has failed, and the virtual machine is rebooted to restore service.

Click on the Custom box to configure advanced features for Monitoring Sensitivity
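
The heartbeat check on this slide amounts to a timeout test. The Python sketch below is a hypothetical model of that idea; the function names and the 30-second interval in the example are illustrative values, not settings taken from the slides.

```python
def vm_has_failed(last_heartbeat_s, now_s, failure_interval_s):
    """True when no heartbeat has arrived within the failure interval."""
    return now_s - last_heartbeat_s > failure_interval_s

def vms_to_reset(last_heartbeats, now_s, failure_interval_s):
    """last_heartbeats maps VM name -> time of the last heartbeat, in seconds."""
    return sorted(name for name, seen in last_heartbeats.items()
                  if vm_has_failed(seen, now_s, failure_interval_s))

heartbeats = {"web01": 95.0, "db01": 40.0}
print(vms_to_reset(heartbeats, now_s=100.0, failure_interval_s=30.0))  # ['db01']
```

A shorter failure interval corresponds to a more sensitive monitoring setting: failures are detected sooner, at the cost of more false positives when a guest is merely slow.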

VI4 - Mod 2-1 - Slide 32

Best Practices for Configuring VMware HA Clusters

Networking Best Practices

If your switches support the PortFast (or an equivalent) setting, enable it on the physical network switches that connect servers.

This helps to prevent a host from incorrectly determining that a network is isolated during the execution of lengthy spanning-tree algorithms

On ESX hosts, HA automatically opens the firewall ports that are needed for it to function. The following ports are opened:

Incoming port: TCP/UDP 8042-8045

Outgoing port: TCP/UDP 2050-2250

VI4 - Mod 2-1 - Slide 35

Best Practices for Configuring VMware HA Clusters

Selection of Networks

By default, HA uses the following networks:

ESX: all Service Console Networks

ESXi: All VMKernel networks, *except* the VMotion network, unless there is only one network and it is a VMotion network

By default, the network isolation address is the default gateway, so it is a best practice to add a das.isolationaddress[...] for each network.

To override the default network selection, use the advanced option das.allowNetwork[...]; HA then uses only networks whose port group names match.

On ESXi, use das.AllowVmotionNetworks to override the default exclusion of the VMotion network.

VI4 - Mod 2-1 - Slide 36

Clusters with both ESX and ESXi hosts

In mixed ESX and ESXi clusters, using the das.allowNetwork[...] advanced options may be necessary to ensure compatible networks are selected for hosts.

HA configuration enforces that all hosts in the cluster have compatible networks.

The first node added to the cluster dictates the networks that all subsequent hosts must have for them to be allowed into the cluster

Networks are deemed compatible if the IP address and subnet mask combine to result in a network that matches another host's

Use das.allowNetwork[...] advanced options to control which networks are to be used to ensure compatibility between all hosts in the cluster
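
The compatibility rule above (IP address combined with subnet mask yields a network that must match another host's) can be illustrated with Python's standard `ipaddress` module. The function names and addresses are hypothetical, and treating two hosts as compatible only when their sets of networks are equal is an assumption about how the per-network rule extends to hosts with several networks.

```python
import ipaddress

def network_of(ip, netmask):
    """Combine an IP address and subnet mask into the network they belong to."""
    return ipaddress.ip_network(f"{ip}/{netmask}", strict=False)

def networks_compatible(host_a, host_b):
    """Each host is a list of (ip, netmask) pairs for its HA networks."""
    nets_a = {network_of(ip, mask) for ip, mask in host_a}
    nets_b = {network_of(ip, mask) for ip, mask in host_b}
    return nets_a == nets_b

first_node = [("10.0.0.11", "255.255.255.0")]
same_subnet = [("10.0.0.12", "255.255.255.0")]
other_subnet = [("10.0.1.12", "255.255.255.0")]

print(network_of("10.0.0.11", "255.255.255.0"))       # 10.0.0.0/24
print(networks_compatible(first_node, same_subnet))   # True
print(networks_compatible(first_node, other_subnet))  # False
```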

VI4 - Mod 2-1 - Slide 37

Setting Up Networking Redundancy

Networking redundancy between cluster nodes is important for VMware HA reliability.

Redundant service console networking on ESX4 (or VMkernel networking on ESXi) allows the reliable detection of failures and prevents isolation conditions from occurring.

NIC Teaming

Using a team of two NICs connected to separate physical switches improves the reliability of a service console (or, in ESXi, VMkernel) network.

To configure a NIC team for the service console, configure the vNICs in the vSwitch configuration as Active or Standby. The recommended parameter settings for the vNICs are:

Default load balancing = route based on originating port ID

Failback = No

VI4 - Mod 2-1 - Slide 38

Secondary Service Console Network

You can create a secondary service console (or VMkernel port for ESXi), which is attached to a separate virtual switch.

The primary service console is used for network and management purposes.

With a secondary service console network created, VMware HA sends heartbeats over both the primary and secondary service consoles.

When you set up service console redundancy, you must specify an additional isolation response address (das.isolationaddress2) for the service console networks

When you specify a secondary isolation address, you should increase the das.failuredetectiontime setting to 20000 milliseconds or greater

You can add the secondary service console network to the VMotion vSwitch: a virtual switch can be shared between VMotion networks and a secondary service console network.
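
The pairing of das.isolationaddress2 with a longer das.failuredetectiontime can be expressed as a small consistency check. The sketch below is hypothetical: the option names come from this slide, but the `check_isolation_options` helper and the addresses are illustrative and not part of any VMware tool.

```python
RECOMMENDED_MIN_DETECTION_MS = 20000  # figure quoted on the slide

def check_isolation_options(options):
    """Return warnings for a dict of HA advanced options (name -> value)."""
    warnings = []
    if "das.isolationaddress2" in options:
        detection = options.get("das.failuredetectiontime")
        if detection is None or int(detection) < RECOMMENDED_MIN_DETECTION_MS:
            warnings.append(
                "das.isolationaddress2 is set; raise das.failuredetectiontime "
                "to 20000 ms or more"
            )
    return warnings

# Hypothetical address, for illustration only.
options = {
    "das.isolationaddress2": "192.168.2.1",
    "das.failuredetectiontime": "20000",
}
print(check_isolation_options(options))  # []
```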

VI4 - Mod 2-1 - Slide 39

Other VMware HA Cluster Considerations

Use larger groups of homogeneous servers to allow higher average levels of utilization across a VMware HA-enabled cluster.

More nodes per cluster can tolerate multiple host failures while still guaranteeing failover capacities.

The failover level policy used in admission control heuristics is conservatively weighted, so that virtual machines on large servers can fail over to smaller servers.

VI4 - Mod 2-1 - Slide 41

Viewing Information about VMware HA Clusters

You can view current settings for a cluster.

The cluster Summary page displays summary information for the cluster.

VI4 - Mod 2-1 - Slide 44

Primary and Secondary Hosts

Some hosts in a VMware HA cluster are designated as primary hosts.

They maintain information about the cluster such as membership.

The first five hosts in the cluster are designated primary hosts, and all subsequent hosts are designated secondary hosts.

When you add a host to a VMware HA cluster, that host communicates with an existing primary host in the same cluster to complete its configuration

When a primary host becomes unavailable or is removed from the cluster, VMware HA promotes one of the secondary hosts to primary status.

Primary hosts help provide redundancy by replicating the cluster's configuration information and virtual machine states and are used to initiate failover actions
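
The primary and secondary bookkeeping described on this slide can be modeled in a few lines of Python. The `HACluster` class is a hypothetical toy model; in particular, promoting the longest-standing secondary is an assumption, since the slide says only that one of the secondary hosts is promoted.

```python
MAX_PRIMARIES = 5  # the slide: the first five hosts become primary hosts

class HACluster:
    def __init__(self):
        self.primaries = []
        self.secondaries = []

    def add_host(self, name):
        """New hosts become primary until five primaries exist."""
        if len(self.primaries) < MAX_PRIMARIES:
            self.primaries.append(name)
        else:
            self.secondaries.append(name)

    def remove_host(self, name):
        """Remove a host; losing a primary promotes a secondary if one exists."""
        if name in self.secondaries:
            self.secondaries.remove(name)
        elif name in self.primaries:
            self.primaries.remove(name)
            if self.secondaries:
                # Assumption: promote the secondary that joined earliest.
                self.primaries.append(self.secondaries.pop(0))

cluster = HACluster()
for i in range(1, 8):
    cluster.add_host(f"esx{i}")
cluster.remove_host("esx2")
print(cluster.primaries)    # ['esx1', 'esx3', 'esx4', 'esx5', 'esx6']
print(cluster.secondaries)  # ['esx7']
```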

VI4 - Mod 2-1 - Slide 45

VMware HA Clusters and Maintenance Mode

Put a host in maintenance mode in preparation for completing administrative tasks that would otherwise cause unwanted HA responses.

Putting a host into maintenance mode effectively disables the HA service.

You cannot power on a virtual machine on a host that is in maintenance mode.

VMware HA does not fail over any virtual machines to a host that is in maintenance mode

When a host exits maintenance mode, the VMware HA service is reenabled on that host, so it becomes available for failover again

If the host is in a cluster, when it enters maintenance mode the user is given the option to evacuate powered-off virtual machines

VI4 - Mod 2-1 - Slide 46

VMware HA Clusters and Disconnected Hosts

Users may initiate host state changes, for example during network maintenance.

An ESX/ESXi host in a cluster may no longer be able to communicate with the other hosts in the cluster.

That host becomes disconnected.

The unresponsive host continues to function, but its state is unknown.

When a host is disconnected, VMware HA cannot use it as a guaranteed failover target.

VMware HA does not consider disconnected hosts when making calculations related to admission control.

When the host becomes reconnected, the host becomes available for failover again

VI4 - Mod 2-1 - Slide 47

VMware HA Clusters and Disconnected Hosts

The difference between a disconnected host and a host that is not responding is that:

A disconnected host has been explicitly disconnected by the user. As part of disconnecting a host, VMware HA is disabled on that host. The virtual machines on that host are not failed over and not considered when the current failover level is computed.

If a host is not responding, no other hosts receive heartbeats from it. This might happen, for example, because of a network problem or because the host failed.

Disconnected and unresponsive hosts are not included in computations of the current failover level, but any virtual machines running on an unresponsive host will be failed over if the host fails.

VI4 - Mod 2-1 - Slide 51

Monitoring Individual Virtual Machines

You can specify behavior for individual virtual machines for:

VM Restart Priority – Indicates relative priority for restarting the virtual machine in case of host failure.

Host Isolation Response – Specifies what the ESX/ESXi host that has lost connection with its cluster should do with running virtual machines.

Monitoring Sensitivity – Specifies how quickly failures are detected. Settings can be changed so that certain virtual machines are monitored more or less aggressively. Specific custom values can also be set using advanced options.

VI4 - Mod 2-1 - Slide 56

Troubleshooting VMware HA

If no hosts in a cluster are responding, when you attempt to add a new host, VMware HA configuration fails because the new host cannot communicate with any of the primary hosts.

Disconnect all hosts that are not responding before adding the new host.

After disconnecting all other hosts and adding a new host, that host becomes the first primary host.

When other hosts become available again, their VMware HA service is reconfigured and they then become primary or secondary hosts depending on the existing number of primary hosts.

VI4 - Mod 2-1 - Slide 57

Customizing VMware HA

After you have established a cluster, you may need to modify settings. Specific attributes affect how VMware HA behaves.

VI4 - Mod 2-1 - Slide 58

Customizing VMware HA

VI4 - Mod 2-1 - Slide 59

Customizing VMware HA

VI4 - Mod 2-1 - Slide 60

Set Advanced VMware HA Options

To precisely customize VMware HA behavior, set advanced VMware HA options.

Prerequisites

You must have a VMware HA cluster for which to modify settings.

To modify advanced VMware HA settings, you must have cluster administrator privileges.

In the cluster’s Settings dialog box, select VMware HA.

Click the Advanced Options button to open the dialog box.

Enter each advanced attribute you want to change in a text box

Click OK.

VI4 - Mod 2-1 - Slide 61

Lesson 2-1 Summary

Learn how to create an HA cluster

Learn how to monitor an HA cluster

Learn how to modify HA cluster settings

Learn how to troubleshoot HA clusters

VI4 - Mod 2-1 - Slide 62

Lab – VMware High Availability

Lab 1 Part 1 – Creating a vCenter High Availability (HA) Cluster

Lab 1 Part 2 – Adding Hosts to High Availability (HA) Cluster

Lab 1 Part 3 – Viewing High Availability (HA) Cluster Settings

Lab 1 Part 4 – Modifying High Availability (HA) Cluster Settings