VMware Presentation @ VMUGIT UserCon 2015

Page 1: Presentazione VMware @ VMUGIT UserCon 2015

© 2014 VMware Inc. All rights reserved.

Zero Downtime Application Mobility with Site Recovery Manager

Lee Dilworth @leedilworth

Page 2: Presentazione VMware @ VMUGIT UserCon 2015

Cross-site Availability – Typical choices today

Two sites, treat as one = stretched cluster

[Diagram: Site A (Active) and Site B (Active) on Stretched Storage, single vCenter]

Two sites, treat as two = Disaster Recovery

[Diagram: Site A (Active) and Site B (Passive) on Replicated Storage, vCenter and SRM at each site]

Page 3: Presentazione VMware @ VMUGIT UserCon 2015

Active-Active datacenters – use cases

Planned Maintenance

• Planned maintenance of one site without any service downtime

• Transparent to app owners and end users

• Avoid lengthy approval processes

• Ability to migrate applications back after maintenance is complete

Automated Recovery

• Automated initiation of VM restart or recovery

• Very low RTO for majority of unplanned failures

• Allows users to focus on app health after recovery, not how to recover VMs

Disaster Avoidance

• Prevent service outages before an impending disaster (e.g. hurricane, rising flood levels)

• Avoid downtime, not recover from it

• Zero data loss possible if you have the time

Page 4: Presentazione VMware @ VMUGIT UserCon 2015

What you need for an Active-Active datacenter model

Stretched Storage solution

– Storage clustering solution that supports distributed data mirroring

– Read/write access to the same volumes from both sites

– Some tie-break mechanism to avoid split-brain

– Examples: EMC VPLEX, IBM SVC, NetApp MetroCluster, etc.

– Stretched Network

[Diagram: storage controllers at each site in front of Backend Array (Site 1) and Backend Array (Site 2)]
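The tie-break requirement listed above can be illustrated with a minimal sketch. It assumes a third-site witness and a per-volume preferred site, which is one common way stretched storage products avoid split-brain; the function name `pick_winner` and the site names are hypothetical and not taken from any vendor product.

```python
from typing import Optional, Set


def pick_winner(reachable_from_witness: Set[str], preferred: str) -> Optional[str]:
    """Witness-side decision after the inter-site link fails (hypothetical model).

    Exactly one site may keep the stretched volume writable, so both sites
    can never accept writes independently (split-brain).
    """
    if not reachable_from_witness:
        return None  # witness sees nobody: it cannot grant the volume
    if preferred in reachable_from_witness:
        return preferred  # partition with both sites alive: preferred site wins
    # The preferred site is unreachable: the surviving site takes over.
    return sorted(reachable_from_witness)[0]


print(pick_winner({"site-1", "site-2"}, preferred="site-1"))
print(pick_winner({"site-2"}, preferred="site-1"))
```

Real arrays implement this decision inside the storage cluster; the sketch only shows why a preference plus a tie-breaker is enough to keep a single writer.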

Page 5: Presentazione VMware @ VMUGIT UserCon 2015

Active-active datacenter network model

[Diagram: Multi-Site Single vC with Stretched Clusters; Web, App and DB VMs on Stretched Storage; N-S Connectivity at both sites]

Page 6: Presentazione VMware @ VMUGIT UserCon 2015

Active-Passive datacenters – use cases

Unplanned Failover

• Recover from unexpected site failure (full or partial)

• Most common use case

• Fast and accurate recovery usually critical to customers

• Workflow driven

• High degree of confidence if regular test failovers have been performed

Planned Migration

• Planned datacenter maintenance

• Global load balancing or distribution of service

• Using test feature to minimize risk

• Execute partial failovers

• Automated failback enables bi-directional migrations

Preventative Failover

• Anticipate potential datacenter outages

• Initiate preventative failover for smooth migration of services

• Graceful shutdown of services to be migrated, zero data loss

Page 7: Presentazione VMware @ VMUGIT UserCon 2015

What you need for an Active-passive datacenter model

Replicated Storage solution

– Storage or software based replication configured between sites

– vCenter per site

– SRM server per site

– Network can be stretched or not

– Concept is referred to as Active-passive; in reality each site is active and simply acts as the passive DR location for its counterpart

[Diagram: Site A (Active) and Site B (Passive) on Replicated Storage, vCenter and SRM at each site]

Page 8: Presentazione VMware @ VMUGIT UserCon 2015

NSX 6.2 Integration with SRM 6.1

[Diagram: Implicit Mapping. SRM A and SRM B, a Distributed Switch at each site, connected by an NSX Universal Logical Switch]

Page 9: Presentazione VMware @ VMUGIT UserCon 2015

Cake and eat it

• Most common requests

– Use both stretched and non-stretched storage in same design

– Leverage operational benefits of SRM for stretched storage

– Use SRM to drive large scale migrations where needed on stretched solutions

• Can this be done?

– Prior to vSphere 6.0 the answer was NO, it's one or the other

– Reaction from customers was usually this…

Page 10: Presentazione VMware @ VMUGIT UserCon 2015

What has changed? – vMotion anywhere!

• Support introduced in vSphere 6.0

• Requires vCenter & ESXi 6.0 or later

• Simultaneously changes

– Compute

– Storage

– Network

– vCenter

• vMotion without shared storage

• Increased scale

– Pool resources across vCenter servers

[Diagram: two vCenter instances, each managing VMware vSphere, connected by Stretched Networks]

Page 11: Presentazione VMware @ VMUGIT UserCon 2015

This Cross vCenter vMotion layout seems familiar…

[Diagram, side by side: Cross vCenter vMotion layout (Site A (Active), Site B (Active), Stretched Storage) and SRM layout (Site A (Active), Site B (Passive), Replicated Storage); vCenter and SRM at each site in both layouts]

Page 12: Presentazione VMware @ VMUGIT UserCon 2015

So what can be done now?

[Diagram: two sites, each with vCenter and SRM, combining Replicated Storage, Stretched Storage and vSphere Replication]

Page 13: Presentazione VMware @ VMUGIT UserCon 2015

Why customers ask for SRM integration with stretched clusters

• vCenter Availability

– Failure of the site where vCenter is running disrupts management of both sites

• Operational Watchdogs

– Availability specific alarms, alerts and events

– Configuration validation on the fly

• DRS and HA are not site aware

– VMs are recovered and migrated to any site – may not be what you want!

– Could result in additional East-West traffic when your network is not designed to handle it

• No Orchestration or Testability

– Stretched Clusters lack a repeatable, testable procedure to handle unplanned failures

– HA will restart VMs based on VM restart order – but doesn’t give you granular control of VM dependencies or customization

Page 14: Presentazione VMware @ VMUGIT UserCon 2015

active/active datacenters – a new approach

Page 15: Presentazione VMware @ VMUGIT UserCon 2015

active-active datacenters with SRM 6.1

[Diagram: vCenter, SRM and VMware vSphere at each site; Volume A with full R/W access at Site 1 and at Site 2; Stretched Networks]

Page 16: Presentazione VMware @ VMUGIT UserCon 2015

scenario 1: local host failures in one site

[Diagram: vCenter, SRM and VMware vSphere at each site; Volume A with full R/W access at Site 1 and at Site 2; Stretched Networks]

HA handles local failures

Page 17: Presentazione VMware @ VMUGIT UserCon 2015

scenario 2: disaster avoidance at one site

[Diagram: vCenter, SRM and VMware vSphere at each site; Volume A with full R/W access at Site 1 and at Site 2; Stretched Networks]

SRM invokes vMotion as per VM priority and dependencies

Page 18: Presentazione VMware @ VMUGIT UserCon 2015

scenario 3: faster recovery from unplanned failures

[Diagram: vCenter, SRM and VMware vSphere at each site; Volume A with full R/W access at Site 1 and at Site 2; Stretched Networks]

SRM orchestrates entire site failover including dependencies

Page 19: Presentazione VMware @ VMUGIT UserCon 2015

Technical Deep-dive and Demo

Page 20: Presentazione VMware @ VMUGIT UserCon 2015

SRM with stretched storage: Initial setup

• New SRA interface

– Contact your array vendor for the SRA availability and supported array models

– Most vendors will provide a single SRA to manage both stretched and non-stretched volumes

– Existing SRAs (with no stretched storage support) will continue to work as is with new SRM

• Configure stretched storage volumes using the array UI/tools

• SRM will discover stretched arrays/volumes through the SRA

– Use the SRM UI to verify stretched volumes and how they map to datastores

– Use the SRM UI to verify the site preference for stretched volumes

[Diagram: SRM, SRA and Vendor Management Interface at each site; Array Manager and Replication Manager; Stretched Storage alongside Non Stretched storage]

Page 21: Presentazione VMware @ VMUGIT UserCon 2015

SRM with stretched storage: Initial setup

Page 22: Presentazione VMware @ VMUGIT UserCon 2015

Introducing - Storage Policy Protection Groups (SPPG)

• New style of protection group that leverages storage profiles (SRM 6.1)

• Adds a level of indirection and automation compared to traditional protection groups

• Policy based approach reduces OpEx by handling VM protection lifecycle automatically

• Simpler integration of VM provisioning, migration, and decommissioning with other solutions such as vRealize Automation

[Diagram: Profile Driven Protection Group linked to a Storage Policy]

Page 23: Presentazione VMware @ VMUGIT UserCon 2015

SRM with stretched storage: configuring protection

• Stretched storage supported ONLY with Storage-Profile based Protection Groups (SPPG)

• Configure storage profiles for stretched volumes at each site

• Configure protection groups for the storage profiles

• SRM will automatically protect all VMs assigned to the storage profiles in the group

• Can mix stretched and non-stretched storage in the same protection group

• Protection groups with stretched devices MUST have a preferred direction

• Preferred direction MUST match any site preference defined at the storage layer

• Cannot create two groups in opposing directions using same devices
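The three constraints on stretched devices listed above can be expressed as a small validation sketch. This is not SRM code: the `Device` and `ProtectionGroup` classes and the `validate` function are hypothetical, and the sketch assumes that "matching the site preference" means the storage-layer preferred site is the protected site of the group's direction.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Direction = Tuple[str, str]  # (protected site, recovery site)


@dataclass(frozen=True)
class Device:
    name: str
    stretched: bool
    site_preference: Optional[str] = None  # defined at the storage layer


@dataclass(frozen=True)
class ProtectionGroup:
    name: str
    devices: Tuple[Device, ...]
    preferred_direction: Optional[Direction] = None


def validate(groups: List[ProtectionGroup]) -> List[str]:
    """Return one message per violated rule (empty list means valid)."""
    errors: List[str] = []
    seen: Dict[str, Direction] = {}  # stretched device name -> direction in use
    for group in groups:
        stretched = [d for d in group.devices if d.stretched]
        if not stretched:
            continue  # non-stretched devices carry no direction rules here
        if group.preferred_direction is None:
            errors.append(f"{group.name}: stretched devices need a preferred direction")
            continue
        protected, recovery = group.preferred_direction
        for dev in stretched:
            if dev.site_preference not in (None, protected):
                errors.append(f"{group.name}: {dev.name} prefers {dev.site_preference}")
            if seen.get(dev.name) == (recovery, protected):
                errors.append(f"{group.name}: {dev.name} already used in opposing direction")
            seen[dev.name] = group.preferred_direction
    return errors
```

A group that mixes stretched and non-stretched devices passes as long as the stretched ones satisfy the rules, which mirrors the "can mix" bullet.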

Page 24: Presentazione VMware @ VMUGIT UserCon 2015

SRM with stretched storage: configuring recovery

• Create one or more recovery plans for all or some protection groups

– Same protection group can belong to multiple recovery plans

– Can mix stretched, non-stretched storage (only if they are SPPG group types)

• Configure recovery settings for each VM

– Stretched VMs** can depend on non-stretched VMs and vice versa

– Can configure IP customization for stretched VMs that do not have stretched networks (for DR/Test)

– Can assign scripts to stretched VMs

– Can opt out of vMotion for some VMs even if they reside on stretched storage

** - Stretched VMs refers to VMs on Stretched Storage datastores

Page 25: Presentazione VMware @ VMUGIT UserCon 2015

SRM with stretched storage: test failover

• Use test failover to make sure an unplanned failover would succeed

• Stretched VMs are included in the test failover

• Stretched VMs are powered on in an isolated network

• SRM will perform vMotion host compatibility tests as part of the test failover

• SRM will not perform vMotion as part of the test failover

• Complicated environmental issues (e.g. network latency) may remain undetected

• Test failover for stretched storage requires array support

– Not all arrays support snapshotting of stretched devices

– Contact your array vendor for specific compatibility requirements

[Diagram: test failover between two sites, each with vCenter and SRM]

Page 26: Presentazione VMware @ VMUGIT UserCon 2015

SRM with stretched storage: planned migration

• Use planned failover to avoid an expected outage

• Choose whether to use vMotion when initiating a planned migration

– Enabled: SRM uses vMotion for VMs on stretched storage

– Disabled: SRM powers VMs off and on as normal

• SRM will reassign site preference to recovery site for all stretched volumes (if array supports it)

• SRM will perform a storage sync to make sure no blocks are left at the protected site

• Planned failover with mix of stretched and non-stretched VMs is ok

[Diagram: Site A (Active) and Site B (Passive) on Stretched Storage, vCenter and SRM at each site]
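The per-VM outcome of a planned migration, as described on this slide and in the recovery settings, can be sketched as a simple decision. The `VM` class and `migration_method` function are hypothetical illustrations of the rules, not an SRM API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    name: str
    on_stretched_storage: bool
    vmotion_opt_out: bool = False  # per-VM recovery setting


def migration_method(vm: VM, plan_uses_vmotion: bool) -> str:
    """Pick how a VM moves to the recovery site during a planned migration."""
    if plan_uses_vmotion and vm.on_stretched_storage and not vm.vmotion_opt_out:
        return "vmotion"  # live migration, no service downtime
    return "power-off/power-on"  # classic SRM behaviour


plan = [VM("web01", True), VM("db01", False), VM("app01", True, vmotion_opt_out=True)]
for vm in plan:
    print(vm.name, migration_method(vm, plan_uses_vmotion=True))
```

Only VMs on stretched storage that have not opted out are moved live; everything else in the same plan follows the normal shutdown and restart path.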

Page 27: Presentazione VMware @ VMUGIT UserCon 2015

SRM with stretched storage: unplanned failover

• Use unplanned failover to recover from a disaster

– Initiated when the protected site is no longer functional

• Stretched VMs are powered on at the recovery site

– Stretched and non-stretched VMs are recovered together

– Priority tiers and VM dependencies are honored across all VMs

– SRM will coordinate with the array to guard against any VMs still running at the protected site

• Stretched volumes are recovered faster – shorter RTO

– Stretched volumes are already visible at the recovery site

– No need to wait for costly surfacing, mounting and host rescan operations
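How priority tiers and VM dependencies might combine into a single power-on order can be sketched as follows. This is an assumption-laden illustration, not SRM's implementation: it treats lower tier numbers as earlier, applies dependencies only between VMs in the same tier, and the function name `power_on_order` is hypothetical. It requires Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def power_on_order(priority: Dict[str, int], depends_on: Dict[str, Set[str]]) -> List[str]:
    """Order VMs tier by tier, honoring dependencies inside each tier.

    Whether a VM sits on stretched or non-stretched storage plays no role:
    all VMs in the plan are ordered together.
    """
    order: List[str] = []
    for tier in sorted(set(priority.values())):
        members = {vm for vm, p in priority.items() if p == tier}
        # Map each VM to the prerequisites that live in the same tier.
        graph = {vm: depends_on.get(vm, set()) & members for vm in sorted(members)}
        # static_order() yields prerequisites before dependents and raises
        # graphlib.CycleError if the dependencies are circular.
        order.extend(TopologicalSorter(graph).static_order())
    return order


print(power_on_order(
    priority={"db": 1, "app": 1, "web": 2},
    depends_on={"app": {"db"}, "web": {"app"}},
))
```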

Page 28: Presentazione VMware @ VMUGIT UserCon 2015

SRM with stretched storage: reprotect and failback

• Use reprotect to reverse the roles of sites after a successful planned failover

– Repairs replication for stretched devices

– Ensures the new protected site (former recovery site) has the site preference for stretched devices

• Reprotect after an unplanned failover

– Rerun planned failover once the protected site becomes available

• Failback

– Initiate a planned failover after the reprotect to migrate all VMs back to the former protected site

– It is recommended to perform a test failover first to make sure everything is ready
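The failover, reprotect and failback cycle described above can be modelled as a small state sketch. The `PlanState` class and both functions are hypothetical; the sketch assumes the array supports reassigning site preference during planned failover.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PlanState:
    protected_site: str
    recovery_site: str
    vms_running_at: str
    site_preference: str  # storage-layer preference for stretched devices


def planned_failover(state: PlanState) -> PlanState:
    """Move VMs to the recovery site; site roles are not reversed yet."""
    return replace(
        state,
        vms_running_at=state.recovery_site,
        site_preference=state.recovery_site,
    )


def reprotect(state: PlanState) -> PlanState:
    """Reverse site roles after a successful failover."""
    if state.vms_running_at != state.recovery_site:
        raise ValueError("reprotect requires a completed failover")
    return PlanState(
        protected_site=state.recovery_site,
        recovery_site=state.protected_site,
        vms_running_at=state.recovery_site,
        site_preference=state.recovery_site,
    )


start = PlanState("site-A", "site-B", vms_running_at="site-A", site_preference="site-A")
moved = reprotect(planned_failover(start))  # site-B is now the protected site
back = reprotect(planned_failover(moved))   # failback, then reprotect again
print(back == start)
```

Failback is simply a second planned failover in the reversed direction, which is why the slide recommends running a test failover before it.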

Page 29: Presentazione VMware @ VMUGIT UserCon 2015

Key Takeaways Roadmap

• SRM is a great solution for Active-Active datacenters

• SRM enhances Continuous Availability with rich orchestration

• SRM with vMotion enables ZERO service downtime for disaster avoidance

• No longer trade off testability and repeatability when choosing the Active-Active model

• SRM + Live Migration is a game changer in IT operations

Page 30: Presentazione VMware @ VMUGIT UserCon 2015

Thank you
