WHITE PAPER Datrium ControlShift™ Mobility and DR Orchestration


© 2019 Datrium, Inc. All rights reserved Datrium | 385 Moffett Park Dr. Sunnyvale, CA 94089 | 844-478-8349 | www.Datrium.com

Contents

1. Introduction

2. Legacy Data Protection Architectures

2.1 RPO/RTO and Compute Resource Challenges

2.2 Complexity and Inefficiency of Juggling Multiple Products

2.3 Stretched Clusters and CDP Alternatives

2.4 Data Integrity Risks

3. ControlShift Integrates All Backup & DR Components Into One System

4. DR Orchestration As-a-Service

5. Elimination of the DR Site

5.1 The Same VMs On-Premises and in the Cloud

5.2 Ahead-of-Time Deployment of a Cloud DR Site

5.3 Just-in-Time Deployment of a Cloud DR Site

6. Summary


1. Introduction

Datrium ControlShift is a cloud-based workload mobility and DR orchestration service for on-premises and cloud environments. ControlShift provides end-to-end orchestration for workload protection, backup and replication to cloud or other on-premises sites, DR plan definition, workflow execution, testing, compliance checks and report generation.

ControlShift DR plans operate on 3 different types of sites: Protected Site, Backup Site and Failover Site. Separate Backup and Failover site designations enable ControlShift to pass down the economic benefits of cloud pay-per-use elasticity to the user via just-in-time creation of a Cloud Failover Site - a Software-Defined Data Center (SDDC) in VMware Cloud on AWS¹. With ControlShift, VMware Cloud on AWS becomes a foundational piece for a complete cloud DR solution.

Within ControlShift, the protected site is a Datrium DVX system executing workloads covered by a DR plan. The backup site is a physical DVX or a Cloud DVX instance receiving backups from the protected site. The failover site is a physical site or a cloud-based site which is designated to take over workload execution following a disaster. The three base ControlShift use cases are described in the table below:

¹ Technical Preview


Use Case             Protected Site    Backup Site    Failover Site
Prem→Prem→Prem       On-premises       On-premises    On-premises
Prem→Cloud→Prem      On-premises       Cloud DVX      On-premises
Prem→Cloud→Cloud     On-premises       Cloud DVX      VMware Cloud on AWS
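The three-site model in the table can be sketched as a small data structure. This is an illustrative sketch only: the names `Location`, `Site`, `DRPlan`, and `use_case`, as well as the site labels, are hypothetical and are not part of any ControlShift interface.

```python
from dataclasses import dataclass
from enum import Enum


class Location(Enum):
    PREM = "Prem"
    CLOUD = "Cloud"


@dataclass(frozen=True)
class Site:
    name: str            # hypothetical label, e.g. "hq-dvx"
    location: Location


@dataclass(frozen=True)
class DRPlan:
    protected: Site      # runs the workloads covered by the plan
    backup: Site         # receives backups from the protected site
    failover: Site       # takes over workload execution after a disaster

    def use_case(self) -> str:
        """Label in the table's Protected→Backup→Failover form."""
        sites = (self.protected, self.backup, self.failover)
        return "→".join(site.location.value for site in sites)


# Hypothetical plan matching the third row of the table.
plan = DRPlan(
    protected=Site("hq-dvx", Location.PREM),
    backup=Site("cloud-dvx", Location.CLOUD),
    failover=Site("vmc-sddc", Location.CLOUD),
)
print(plan.use_case())
```

Keeping the backup and failover sites as separate fields mirrors the separate site designations described above: the same plan shape covers all three rows of the table.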

ControlShift enables administrators to organize sites into flexible topologies, so that they can adapt these basic use cases to the availability needs of their own unique environments. Some examples of these topologies:

Prem→Prem→Prem

In a simple 2-site topology, a single DVX System serves as both a backup and failover site. The protected site sends snapshot replicas to the backup/failover site. ControlShift orchestrates instant failover to the failover site with no additional data transfer at recovery time.

Prem→Prem→Prem, Prem→Cloud→Prem

In this common 3-site topology, the protected site sends backups to both the failover site and a cloud backup site. This topology is useful for extra protection, for longer archiving, or as a method of recovering from ransomware attacks that require deep history. When executing failover, ControlShift can either use backups available on the failover site (zero RTO) or fetch them from the backup site.

Prem→Cloud→Prem

This solution is based on a single cloud backup site. It can be used for operational recovery or for disaster recovery when the on-premises hardware remains operational (e.g. user errors, ransomware attacks). ControlShift failover will use either the local backups on the protected site (zero RTO) or cloud backups.

This extension of the previous topology enables recovery from cloud backups even when the disaster destroys the protected site. A new system is procured, and ControlShift then performs failover using snapshots stored in the cloud. While the RTO is longer (the backups must first be retrieved to the new system), this is a very cost-efficient solution that makes recovery possible without maintaining a secondary on-premises DR site.


Prem→Cloud→Cloud

This topology combines the economic benefit of eliminating the secondary DR site with low-RTO recovery to the public cloud. Cloud DVX serves as the cloud backup site. Following a disaster event, a failover site is deployed in the public cloud, and ControlShift performs failover to the newly created cloud failover site. Most cloud infrastructure charges apply only while the cloud failover site is deployed; only cloud backup charges apply during normal operation.

Failing over to VMware Cloud on AWS is currently a technical preview feature.

2. Legacy Data Protection Architectures

Legacy storage protection architectures rely on tiers of specialized primary and secondary storage appliances and accompanying backup software. In many scenarios, DR needs are addressed by dedicated DR orchestration software that is separate from backup software. These architectures have evolved during the client-server era and present RPO/RTO, resource utilization, and risk mitigation challenges for modern hybrid cloud environments. To provide a complete data protection solution, it is necessary to tie together many different products from multiple vendors with inherent operational complexity.

2.1 RPO/RTO and Compute Resource Challenges

Traditional backup software runs once a day and delivers 24-hour RPO and RTO. Because backups are resource intensive and affect production workloads, they are usually confined to a single off-hours “backup window” each day. While this might be sufficient for some backup scenarios, DR generally has more demanding RPO and RTO requirements. Application owners demand better SLAs for DR that cannot be met by legacy backup software because of associated performance bottlenecks and the impact of backups on production workload execution.

Recovery from backups involves retrieving full backups from the backup array to the primary storage array, resulting in an inferior RTO that might even exceed the RPO. This data transfer may go on for many days following a disaster. A 24-hour backup RPO and RTO does not meet modern DR requirements, forcing administrators to deploy dedicated DR solutions in parallel with backups. A common method for implementing DR with lower RPO and RTO is primary-array LUN or volume mirroring.
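To see why a full restore can run for days, consider a back-of-the-envelope estimate of transfer time from data size and link speed. The figures below (100 TB over a 1 Gbit/s link) are assumptions chosen for illustration and do not come from any measured deployment; the function name is likewise hypothetical.

```python
def restore_time_hours(data_tb: float, link_gbps: float,
                       efficiency: float = 1.0) -> float:
    """Hours needed to copy `data_tb` decimal terabytes over a `link_gbps` link.

    `efficiency` is the assumed fraction of the link that is actually usable.
    """
    bits = data_tb * 1e12 * 8                    # terabytes -> bits
    effective_bps = link_gbps * 1e9 * efficiency
    return bits / effective_bps / 3600           # seconds -> hours


# Assumed example: 100 TB restored over a fully utilized 1 Gbit/s link.
hours = restore_time_hours(data_tb=100, link_gbps=1)
print(f"{hours:.0f} hours, about {hours / 24:.1f} days")
```

Under these assumptions the copy alone takes roughly 222 hours, more than nine days, before any data transformation or application restart is counted.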

Array-based LUN mirroring is more efficient for protecting entire sites than backup software replication, because it involves replicating data in its native storage format without rehydrating and transforming the data multiple times by the backup software. Because of its lower resource usage and a smaller impact on production workloads, array LUN mirroring could run on a more aggressive schedule than traditional backups (e.g. every 30 minutes).

While commonly used for DR, array LUN mirroring does not eliminate backups because LUN or volume replicas don’t provide full backup capabilities to satisfy regulatory and operational requirements for data protection:

• no deep backup storage: arrays can accommodate only a modest number of LUN snapshots
• no backup catalog
• no visibility inside a LUN
• no recovery of individual VMs or files


The lack of visibility inside the LUN is one of the main reasons for the existence of parallel backup and DR stacks. The types of data protection provided by the legacy backup and DR solutions are often complementary and make up for mutual deficiencies. For example, moving a mission-critical application between array LUNs might adversely affect LUN-based DR (application lost upon the LUN recovery from a snapshot), but it is generally handled correctly by backup software policies that are attached to user visible entities as opposed to storage array LUNs.

2.2 Complexity and Inefficiency of Juggling Multiple Products

Over time, DR orchestration software has evolved to coordinate recovery based on native array mirroring. DR orchestration products such as VMware Site Recovery Manager (SRM) are complex distributed systems that integrate with native array mirroring via installable, array-specific 3rd party agents (Storage Replication Adapters, or SRAs).

The following diagram illustrates the number of data transfers required for a typical data protection architecture integrating best-of-breed backup and DR products. The data protection part alone involves 5 different data transfers with a majority requiring I/O intensive data transformations. Restoring from backups or a DR failover requires a number of additional data transformations and data transfers not shown in this diagram.

Backup software keeps its data on a backup appliance (a specialized array - a Purpose Built Backup Appliance, per Gartner nomenclature). As a part of the backup process, the backup software copies recent changes from the primary array to the backup array with the help of hypervisor changed block tracking APIs. Primary storage and backup arrays have different filesystems. In addition, backup software normally utilizes its own client file system layered on top of the backup array and managing snapshots of protected entities.

This is an example of a common backup and DR stack deployed on both primary and backup sites. These products come from different vendors and require 4 independent management consoles.

Primary Storage Array        Dell EMC Unity 450F
Backup Array                 Data Domain DD6300
Backup Software              Commvault
DR Orchestration Software    VMware SRM + Array SRAs + database


2.3 Stretched Clusters and CDP Alternatives

Stretched Clusters and Continuous Data Protection (CDP) are the basis for an alternative mechanism for Disaster Recovery. Stretched Clusters aim to provide zero RPO by synchronously replicating every write from the primary to the secondary site. They impose strict requirements on inter-site network latency, in the 1-5 ms range, and therefore cannot protect against regional disasters.
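The link between a latency budget and regional protection can be made concrete with a rough distance bound. The sketch below assumes that the latency figure is a round-trip time and that it is spent entirely on propagation through optical fiber at about 200 km per millisecond; real links lose additional time to switching and indirect routing, so practical distances are shorter.

```python
FIBER_KM_PER_MS = 200.0  # light in optical fiber covers roughly 200 km per ms


def max_fiber_distance_km(round_trip_ms: float) -> float:
    """Upper bound on site separation for a given round-trip latency budget."""
    one_way_ms = round_trip_ms / 2
    return one_way_ms * FIBER_KM_PER_MS


for rtt_ms in (1, 5):
    print(f"{rtt_ms} ms round trip: at most {max_fiber_distance_km(rtt_ms):.0f} km")
```

Under these assumptions the two sites can be at most about 100 to 500 km apart, which is within reach of a single regional event such as a hurricane or a large grid failure.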

Because each write is replicated over the network in its entirety, Stretched Clusters have high network bandwidth requirements. Similar to LUN replication, Stretched Clusters require DR orchestration software to coordinate recovery on the secondary site. For example, VMware SRM was extended to support Stretched Clusters from several vendors in a way reminiscent of SRM array support. Similar to array LUN mirroring, Stretched Clusters do not eliminate backups for operational recovery. The resulting architecture remains similar to that described in the previous section.

CDP addresses the rigidity of Stretched Cluster network requirements by relaxing replication from synchronous to semi-synchronous. CDP solutions gained some popularity for providing high levels of data protection for a few carefully chosen workloads. However, they are seldom used as a complete DR solution for an entire enterprise site. CDP products are available as 3rd party software and do not eliminate the need for backup storage appliances.

2.4 Data Integrity Risks

Multiple transfers of data with extensive data transformations between different complex products from multiple vendors have inherent data integrity risks. How can the administrator be sure that the backup created by reading the blocks changed between two vSphere VM snapshots stored on a Dell EMC storage array, copied into a Commvault backup stored on a Data Domain appliance, and subsequently replicated over a WAN to a DR site, actually represents the original point-in-time application state? Similarly, DR orchestration software that relies on 3rd party storage and replication has little chance of detecting in-transit data corruption due to misconfiguration or a software or hardware fault. There are no global end-to-end data integrity checks or APIs that can apply across all the multiple hardware and software products from different vendors.

In the end, administrators are left with a complex web of solutions integrating components from 3 or more vendors, leading to increased complexity, ample opportunity for misconfiguration, and staggering resource inefficiency due to multiple data transformations with no end-to-end integrity checks.

A study by Dell EMC found that businesses using three or more vendors to supply data protection solutions lost three times as much data as those that unified their data protection strategy around a single vendor.²

3. ControlShift Integrates All Backup & DR Components Into One System

Use cases: Prem→Prem→Prem, Prem→Cloud→Prem, Prem→Cloud→Cloud

ControlShift integrates all aspects of backup and DR into a single, centrally managed system. The resulting solution has all the benefits of best-of-breed backup and DR products without the associated complexities and inefficiencies of navigating a web of management consoles and excessive resource usage due to multiple data transfers with expensive data transformations.

² https://www.emc.com/about/news/press/2014/20141202-01.htm


3.1 Low RPO/RTO and Minimal Resource Requirements

ControlShift leverages backups based on native storage-level snapshots with an RPO of minutes, not hours or days. DVX unifies primary and secondary storage environments and natively supports forever incremental replication with no data transformations. This enables very aggressive backup and replication schedules with low resource usage and a minimal impact on the executing workloads.
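The idea behind forever incremental replication can be illustrated with a toy model in which a snapshot is a map from block number to block contents, and only blocks that changed since the previous snapshot are sent. This is a conceptual sketch, not Datrium's implementation: the names are hypothetical, and a real system would also retain multiple restore points and apply compression and deduplication.

```python
from typing import Dict

Snapshot = Dict[int, bytes]  # block number -> block contents


def changed_blocks(previous: Snapshot, current: Snapshot) -> Snapshot:
    """Blocks that are new or different in `current` relative to `previous`."""
    return {
        block: data
        for block, data in current.items()
        if previous.get(block) != data
    }


def replicate(replica: Snapshot, previous: Snapshot, current: Snapshot) -> int:
    """Apply only the delta to the replica; return the number of blocks sent."""
    delta = changed_blocks(previous, current)
    replica.update(delta)
    for block in previous.keys() - current.keys():
        replica.pop(block, None)   # blocks deleted since the last snapshot
    return len(delta)


snap1 = {0: b"boot", 1: b"data-v1", 2: b"logs"}
snap2 = {0: b"boot", 1: b"data-v2", 2: b"logs", 3: b"new"}

replica: Snapshot = {}
sent_first = replicate(replica, {}, snap1)       # initial full copy
sent_second = replicate(replica, snap1, snap2)   # incremental copy
print(sent_first, sent_second, replica == snap2)
```

In this example the first replication sends three blocks and the second sends only two, yet the replica matches the latest snapshot. Because the data stays in the same block format on both sides, nothing has to be transformed in transit.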

A DR failover requires no additional data transfer - VMs are restarted directly from backups for any available restore point. Since no data transfer is required for recovery, and protected workloads are restarted directly from backups on the DR site, the resulting RTO is near zero, similar to RTO of array LUN mirroring used with 3rd party DR orchestration products.

However, unlike array LUN mirroring, DVX also provides a full-featured backup solution: backups are accessed via a searchable catalogue and are kept on cost-effective SATA HDDs with modern data reduction technologies applied at all times. In addition, primary copies and local backups share the same storage pool, drastically cutting down physical storage requirements for data protection.


3.2 Simplicity of a Single Data Stack

ControlShift completely eliminates the need for parallel hardware and software backup and DR stacks by integrating all components and aspects of backup and DR into a single system with unified management. A protected DVX and one or more accompanying DVX systems deployed at another location or in the cloud are managed by a unified cloud orchestration service.

DVX integrates primary and secondary storage, making it possible to use a single management console to establish backup and replication policies and to configure, test, and execute DR plans. Both backup policies and DR plans operate on exactly the same abstractions: backups for VMs and groups of VMs. Because snapshots are at the storage level, ControlShift delivers consistent point-in-time backups across many VMs executing on different servers. Such advanced functionality requires native storage integration and is not available from 3rd party backup software that relies on hypervisor APIs to take snapshots and copy snapshot state into backups.

The system’s built-in health checks can pinpoint problems anywhere in the backup and DR stack. For example, replication failures due to network connectivity losses will automatically flag all affected DR plans. ControlShift also automatically performs DR plan compliance checks to assure that the changes in the execution environment do not invalidate DR plans.
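A DR plan compliance check of the kind described above can be sketched as a function that compares a plan against the current state of the failover site and of replication. All names, fields, and resource labels below are hypothetical and chosen for illustration; the checks that ControlShift actually performs are not specified in this paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class PlanSpec:
    name: str
    mappings: Dict[str, str]   # protected-site resource -> failover-site resource
    rpo_minutes: int


@dataclass
class FailoverInventory:
    resources: Set[str] = field(default_factory=set)


def compliance_issues(plan: PlanSpec, inventory: FailoverInventory,
                      minutes_since_last_replica: int) -> List[str]:
    """Return human-readable problems that would invalidate the plan."""
    issues = []
    for source, target in sorted(plan.mappings.items()):
        if target not in inventory.resources:
            issues.append(f"{source}: mapped target '{target}' is missing")
    if minutes_since_last_replica > plan.rpo_minutes:
        issues.append(
            f"last replica is {minutes_since_last_replica} min old, "
            f"RPO is {plan.rpo_minutes} min"
        )
    return issues


plan = PlanSpec(
    name="tier-1-apps",
    mappings={"prem-network-A": "cloud-network-A", "prem-pool-1": "cloud-pool-1"},
    rpo_minutes=30,
)
inventory = FailoverInventory(resources={"cloud-network-A"})
issues = compliance_issues(plan, inventory, minutes_since_last_replica=45)
for issue in issues:
    print(issue)
```

Here the plan is flagged twice: one mapped target no longer exists on the failover site, and the newest replica is older than the plan's RPO, as would happen after a replication failure caused by a network outage.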


3.3 End-to-End Data Integrity Checks

A single data stack backup and DR solution eliminates the risks associated with multiple data transformations and misconfigurations. Because Datrium controls protected, backup, and recovery site endpoints and orchestrates all movements of data, it also automatically performs end-to-end integrity checks to verify backup fidelity regardless of data location or past replication history. Datrium employs an efficient algorithm to calculate cryptographic hashes of backups and primary storage to continuously validate data integrity across the entire distributed environment, both on-premises and in the cloud.
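The principle of hash-based verification can be shown with a minimal sketch that uses SHA-256 from the Python standard library. This is not Datrium's algorithm, which the paper does not describe; the function names and the length-prefixed block framing are assumptions made for the example.

```python
import hashlib
from typing import Iterable


def backup_digest(blocks: Iterable[bytes]) -> str:
    """SHA-256 over the ordered blocks of a backup.

    Each block is prefixed with its length so that different block
    boundaries over the same bytes produce different digests.
    """
    digest = hashlib.sha256()
    for block in blocks:
        digest.update(len(block).to_bytes(8, "big"))
        digest.update(block)
    return digest.hexdigest()


def verify_replica(source_blocks: Iterable[bytes],
                   replica_blocks: Iterable[bytes]) -> bool:
    """True when the replica hashes to the same value as the source."""
    return backup_digest(source_blocks) == backup_digest(replica_blocks)


source = [b"block-0", b"block-1", b"block-2"]
intact = list(source)
corrupted = [b"block-0", b"block-X", b"block-2"]
print(verify_replica(source, intact), verify_replica(source, corrupted))
```

Because only digests need to be compared, the check can run across sites without moving the backup data itself, provided both endpoints compute the hash in the same way.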


4. DR Orchestration As-a-Service

Use cases: Prem→Prem→Prem, Prem→Cloud→Prem, Prem→Cloud→Cloud

DR orchestration software products are complex distributed systems composed of dedicated DR orchestration servers and internal databases often augmented with 3rd party array software agents. These servers and databases are provisioned per-site and need to be licensed, secured, monitored, managed, and upgraded, which requires additional maintenance effort and operational skills. The initial installation and configuration of DR products often require professional services engagements, making the overall solution costly. DR roll-out and upgrade processes are lengthened by the intricate interactions among multiple cross-vendor products and components.

Datrium ControlShift is delivered as-a-service: there is nothing to install and nothing to manage. The ControlShift orchestration engine runs as an AWS-based service and leverages the public cloud infrastructure to achieve high availability for its internal operation. DR plans and execution states are replicated across multiple availability zones with an automatic failover to a healthy availability zone, without any data loss in the event of a disaster affecting the public cloud. Monitoring and upgrades are automated and performed by Datrium as a part of the service offering.

The ControlShift service is activated online, making it immediately operational and allowing users to focus on designing and testing their DR plans instead of managing the internal complexities of the DR orchestration software itself. ControlShift includes all necessary network connectivity and encryption software and establishes a secure bidirectional channel between protected sites and the orchestration engine. No external VPN is required.

This diagram shows the main ControlShift service components. ControlShift employs serverless Lambda functions to automate the initial deployment of other service components and to subsequently monitor and heal all deployed services. In the extreme case of an entire Availability Zone going down, the Lambda functions redeploy all ControlShift components in another Availability Zone, restoring service availability.
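The monitor-and-heal behavior can be pictured as a reconciliation step that compares a registry of components against health information and reassigns anything that is unhealthy. The sketch below is plain Python with hypothetical names and zone labels; it makes no AWS API calls and is not the actual Lambda code, which this paper does not show.

```python
from typing import Callable, Dict, List

Registry = Dict[str, str]  # component name -> availability zone it runs in


def reconcile(registry: Registry,
              is_healthy: Callable[[str, str], bool],
              healthy_zones: List[str]) -> Registry:
    """Return an updated registry with failed components moved to a healthy zone."""
    if not healthy_zones:
        raise RuntimeError("no healthy availability zone available")
    updated = {}
    for component, zone in registry.items():
        if zone in healthy_zones and is_healthy(component, zone):
            updated[component] = zone               # leave as deployed
        else:
            updated[component] = healthy_zones[0]   # redeploy elsewhere
    return updated


# Assumed scenario: zone "az-a" is down, "az-b" and "az-c" are healthy.
registry = {"orchestrator": "az-a", "vpn-gateway": "az-a", "cloud-dvx": "az-b"}
down = {"az-a"}
result = reconcile(
    registry,
    is_healthy=lambda component, zone: zone not in down,
    healthy_zones=["az-b", "az-c"],
)
print(result)
```

Running such a step periodically from a function that lives outside the monitored components is one way to recover from the loss of a whole zone, since the function itself does not depend on the failed zone.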


DynamoDB keeps a cloud service registry used by the Lambda functions. It is also used by Cloud DVX for auxiliary metadata indexing. All Datrium services are deployed as AMIs into a Datrium-created VPC and Subnet. VPC endpoints used to access all other external services required by ControlShift and Cloud DVX are created automatically, including the endpoints for DynamoDB, S3, and Internet Gateway used by the Datrium VPN.

ControlShift uses Amazon Aurora, part of Amazon RDS, for its internal transactions, such as saving plans and committing plan execution states. Aurora is highly available, with data replicated six ways across several Availability Zones.

Cloud DVX uses S3 as a repository of backups in a Datrium native forever incremental compressed and deduplicated form. Cloud DVX instances run a copy of the Datrium file system designed for efficient handling of backups on cost-optimized spinning disk medium such as S3.

5. Elimination of the DR Site

Use cases: Prem→Cloud→Prem, Prem→Cloud→Cloud

Replacing an on-premises DR site with a Cloud-based DR site has significant CAPEX and OPEX implications. However, the practicality of existing solutions is severely limited by the lack of hypervisor interoperability between private and public clouds and the associated costs of the public cloud infrastructure.

5.1 The Same VMs On-Premises and in the Cloud

While the VMware ESX hypervisor dominates on-premises private cloud deployments, public clouds use several other incompatible hypervisors: AWS relies on Xen and, more recently, on KVM, as does Google Cloud; Azure relies on Microsoft's proprietary hypervisor. The translation between VM formats is a brittle and time-consuming process which goes beyond VM disk format conversion. Complex vSphere enterprise environments rely on many other virtualization abstractions which have no immediate analogues in the public cloud: clusters, resource pools, datastores, virtual switches, port groups, etc. vSphere also offers a set of widely used services based on these abstractions that have no equivalent in the public cloud: vSphere HA, FT, vMotion, DRS, etc.

VMware Cloud on AWS finally makes the transition between private and public clouds robust by presenting an execution environment in AWS that is similar to the on-premises execution environment. No VM conversion needs to take place, VMs retain their native vSphere format, and users get access to the familiar abstractions and management tools following a failover to the cloud - the same management tools that are used on-premises prior to the failover.


As part of DR plan creation, users map their on-premises virtual infrastructure abstractions (networks, resource pools, folders, datastores, IP addresses, etc.) to the corresponding entities in VMware Cloud, following a process that is identical to that of Prem → Prem DR. The native on-premises VM geometry is fully preserved, as are all virtual hardware devices. The existing in-guest OS drivers continue to function the same way following a migration to the cloud, eliminating all risks of VM conversion between different hypervisor types and the associated virtual hardware and guest OS driver changes.

5.2 Ahead-of-Time Deployment of a Cloud DR Site

This diagram shows an example of a deployed Cloud DR site that maps to a Software Defined Data Center (SDDC) in VMware Cloud on AWS. In cases where a DR site has a secondary function of executing non-DR workloads during normal operation, an SDDC can be provisioned prior to failover.


If the sole purpose of the Cloud DR site is to take over workload execution in the event of a disaster, and it otherwise remains unutilized, significant further cost savings are possible through just-in-time deployment.

5.3 Just-in-Time Deployment of a Cloud DR Site

While replacing an on-premises DR site with a virtual site hosted in the public cloud is attractive for many reasons, by itself this does not necessarily reduce the total cost of the overall DR solution because of the recurring charges for maintaining a cloud DR site. The DR costs are merely shifted from on-premises capital and operational expenses to the recurring costs of maintaining an always-on cloud DR site. A careful total cost of ownership (TCO) analysis is needed to ensure that the overall cloud DR solution is price competitive with the original on-premises DR solution.

DR-related activities don’t contribute to the company’s top-line performance, but they are necessary to mitigate risk. Optimizing the costs of DR is, therefore, an important TCO consideration. Just-in-time deployment of a cloud DR site presents an attractive alternative to continuously maintaining a warm stand-by cloud DR site. With just-in-time deployment, the recurring costs of a cloud DR site are eliminated in their entirety until a failover occurs and cloud resources are provisioned.


Dedicated on-premises DR sites are normally minimally utilized, which wastes resources: real estate, power, cooling, capital expenditure on compute, and the skilled labor needed to keep DR sites operational.

The on-demand nature of public clouds enables ControlShift to drastically reduce the operating costs of disaster recovery by deploying the bulk of the DR infrastructure programmatically following a DR event. During steady state operation, ControlShift maintains a minimal low-cost AWS cloud footprint to accommodate cloud backups with no ongoing charges for the cloud DR site. The backups are sent to the cloud backup site, and after some processing, land in a cost-effective compressed and deduplicated form in an S3 bucket. In just-in-time mode of deployment, a cloud DR site is created only following a disaster. VMware Cloud Software-Defined Data Center (SDDC), a Cloud DR site with a significantly larger server footprint and associated costs, is deployed only immediately prior to executing a DR plan.

To make this possible, ControlShift leverages the space and cost efficiencies of Datrium Cloud DVX. The protected site replicates VMs or protection groups in their forever incremental format to Cloud DVX, which in turn stores them in a compressed and deduplicated native format within the low-cost S3. During normal operation, the costs of data protection are limited to the costs of the Cloud DVX backup service and the cost of the S3 medium.


Following a DR event, ControlShift deploys a new SDDC in VMware Cloud on AWS and orchestrates the failover to this SDDC as part of DR plan execution. This process utilizes a fast high-bandwidth network link from VMware Cloud to AWS S3 to get access to backups. The recurring charges for the Cloud DR site start accumulating only after the SDDC is deployed. The just-in-time deployment of SDDC reduces DR TCO by over an order of magnitude.

ControlShift supports an efficient orchestrated failback following an on-premises site recovery. If upon recovery the on-premises site retains some pre-disaster data, only data changes incurred while executing in the cloud DR site are transferred back to the on-premises protected site.

Ahead-of-time vs. just-in-time provisioning of SDDC is a trade-off between costs and RTO. Ahead-of-time provisioning removes SDDC creation latency from the recovery path but incurs recurring SDDC charges. Just-in-time provisioning drastically lowers costs but increases RTO, because the SDDC is deployed only after a disaster has occurred.
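The trade-off can be illustrated with simple arithmetic. Every number below (monthly backup charge, hourly SDDC charge, failover duration, restart and provisioning times) is a hypothetical input chosen for the example, not a published price or a measured latency, and the function names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365


def annual_dr_cost(backup_per_month: float, sddc_per_hour: float,
                   sddc_hours_per_year: float) -> float:
    """Yearly cost: cloud backups all year plus SDDC hours actually used."""
    return 12 * backup_per_month + sddc_per_hour * sddc_hours_per_year


def failover_rto_minutes(restart_minutes: float, provisioning_minutes: float,
                         sddc_ready: bool) -> float:
    """RTO grows by the SDDC provisioning time when no SDDC is standing by."""
    return restart_minutes + (0 if sddc_ready else provisioning_minutes)


# Hypothetical inputs.
backup_per_month = 1_000.0
sddc_per_hour = 25.0

ahead_cost = annual_dr_cost(backup_per_month, sddc_per_hour, HOURS_PER_YEAR)
jit_cost = annual_dr_cost(backup_per_month, sddc_per_hour, 7 * 24)  # one week of failover

ahead_rto = failover_rto_minutes(10, 120, sddc_ready=True)
jit_rto = failover_rto_minutes(10, 120, sddc_ready=False)

print(f"ahead-of-time: {ahead_cost:,.0f} per year, RTO {ahead_rto:.0f} min")
print(f"just-in-time:  {jit_cost:,.0f} per year, RTO {jit_rto:.0f} min")
```

With these inputs the always-on SDDC costs 231,000 per year against 16,200 for just-in-time deployment, while the RTO rises from 10 to 130 minutes. The ratio depends entirely on the assumed prices and on how many hours per year the failover site actually runs.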

6. Summary

Datrium ControlShift is a cloud-based DR and workload mobility orchestration service that leverages the execution and operational efficiencies of a single integrated data stack to orchestrate all aspects of Disaster Recovery. ControlShift is dramatically simpler and significantly less resource intensive than legacy DR solutions, resulting in lower RPO and RTO for cloud and on-premises environments. The single integrated data and orchestration stack enables consistency checking of the entire environment, which drastically reduces errors at the time of disaster. Just-in-time DR to the cloud provides further transformational economics.
