RightScale Webinar: Safeguard Your Cloud Apps by Ensuring High Availability & Disaster Recovery Plans with RightScale & AWS

<ol><li>Safeguard Your Cloud Applications: Ensuring High Availability and Disaster Recovery Plans. March 14, 2013.</li><li>Your Speakers Today. Presenting: Miles Ward, Advanced Solutions Architecture, AWS; Brian Adler, Sr. Services Architect, RightScale. Q&amp;A: Ryan Geyer, Cloud Solutions Engineer, RightScale; Greg Goodwin, Account Manager, RightScale. Please use the Questions window to ask questions at any time!</li><li>Learn at Our Events. RightScale Annual Conference: special offer for webinar attendees of 10% off conference registration and 10% off any training with code COMPUTE_Webinar_10 (expires March 22). AWS Summits in a city near you: NYC, SF, London, Sydney and more (https://aws.amazon.com/aws-summit-2013/).</li><li>Agenda: terminology/level-setting; takeaways; cloud and component definitions; designing for failure; architectural options and considerations; high availability; disaster recovery; conclusions / Q&amp;A.</li><li>Faults? Facilities, hardware, networking, code, people.</li><li>What Is Fault-Tolerant? Degrees of risk mitigation, not binary. Automated. Tested!</li><li>Old-School Fault Tolerance: Build Two.</li><li>Cloud Computing Benefits: no up-front capital expense; low cost; pay only for what you use; self-service infrastructure; easily scale up and down; improve agility and time-to-market.</li><li>Cloud Computing Fault-Tolerance Benefits: no up-front HA capital expense; low-cost backups; pay for DR only when you use it; self-service DR infrastructure; easily deliver fault-tolerant applications; improve agility and time-to-recovery.</li><li>AWS Cloud Allows Overcast Redundancy: have the shadow duplicate of your infrastructure ready to go when you need it, but only pay for what you actually use.</li><li>Old Barriers to HA Are Now Surmountable: cost, complexity, expertise.</li><li>
AWS Building Blocks: Two Strategies. Inherently fault-tolerant services: Amazon S3, Amazon SimpleDB, Amazon DynamoDB, Amazon CloudFront, Amazon SWF, Amazon SQS, Amazon SNS, Amazon SES, Amazon Route 53, Elastic Load Balancing, AWS Elastic Beanstalk, Amazon ElastiCache, Amazon Elastic MapReduce, AWS Identity and Access Management (IAM). Services that are fault-tolerant with the right architecture: Amazon EC2, Amazon Virtual Private Cloud (Amazon VPC), Amazon Elastic Block Store (EBS), Amazon Relational Database Service (Amazon RDS).</li><li>The Stack: resources, deployment, management, configuration, networking, facilities, geographies.</li><li>Terminology. Fault tolerance: the ability of a system to continue operating properly (perhaps at a degraded level) if one or more of its components fail. High availability: fault-tolerant systems are measured by their availability in terms of planned and unplanned service outages for end users. Disaster recovery: the process, policies and procedures related to restoring critical systems after a catastrophic event. The goal is to get the application back up and running within a defined time period (RTO) and within a certain data-loss window (RPO).</li><li>Terminology, continued. RTO (Recovery Time Objective): the time period in which service must be restored to meet BCP (Business Continuity Planning) objectives. RPO (Recovery Point Objective): the acceptable data loss as a result of recovering from a disaster/catastrophic event. RTO and RPO are often at odds, and tradeoffs need to be made to find an acceptable middle ground.</li><li>Takeaways: understand the core concepts behind HA and DR; an introduction to architectural options for designing HA, fault-tolerant applications and DR environments and procedures; best practices for implementing these architectural options within AWS (independent of RightScale); multi-Availability Zone (AZ) and multi-region architectural options and considerations, with the pros and cons of each; an understanding of the tools RightScale brings to AWS to simplify the creation of these HA and DR environments.</li><li>
Regions &amp; Availability Zones. [Diagram: AWS regions (US East, US West, EU West, Japan, Singapore), each containing multiple Availability Zones. Source: AWS.] Zones within a region share a LAN (high bandwidth, low latency, private IP access). Zones use separate power sources and are physically segregated. Regions are islands and share no resources.</li><li>Designing for Failure. "Everything fails, all the time." (Werner Vogels, CTO, Amazon.com). Large-scale failures in the cloud are rare but do happen. Application owners are ultimately responsible for availability and recoverability. Balance the cost and complexity of HA efforts against the risk(s) you are willing to bear. Cloud infrastructure has made DR and HA remarkably affordable compared with past options: multi-server, multi-AZ (Availability Zone), multi-region.</li><li>Designing for Failure: Basic Concepts. Fault tolerance is the goal: degradation of service may occur, but the application continues to function. Avoid single points of failure (SPOF). Assume everything fails (remember Werner's mantra) and design accordingly. Plan and practice your recovery process (for both HA and DR). Remember that better HA and DR equals more $$$, so find an acceptable balance.</li><li>High Availability. "Don't sweat the small stuff. And it's all small stuff.*" *(until it's not) Follow a few general best practices to absorb application component outages.</li><li>General HA Best Practices. Avoid single points of failure. Always place at least one of each component (load balancers, app servers, databases) in each of at least two AZs. Replicate data across AZs (HA) and back up or replicate across regions for failover (DR). Set up monitoring, alerts and operations to identify problems and automate their resolution or the failover process.</li><li>
High availability for top web properties with 270M visitors/month. Migration from datacenter to AWS. RightScale provides: self-service access to developers; consistency and low maintenance; usage and cost accounting; multi-region architectures to avoid downtime.</li><li>Multi-Zone HA. [Diagram: DNS entries (172.168.7.31, 172.168.8.62) point to load balancers in US-EAST 1a and US-EAST 1b; autoscaled app servers sit behind them; a master DB replicates to a slave DB; EBS snapshots are stored in S3.] Consider distributed NoSQL databases with the same distribution considerations. Consider local storage for an additional slave database to remove the dependency on attached volumes. Snapshot the data volume for backups so the database can be readily recovered within the region. Place slave databases in one or more zones for failover.</li><li>Disaster Recovery. "Don't sweat the small stuff. And it's all small stuff.*" *(until it's not) DR presents a few new wrinkles compared to HA, but there are multiple options depending on your needs and budget.</li><li>HA/DR Checklist for Risk Mitigation. Determine who owns the architecture, DR process and testing. Develop expertise in-house and/or get outside help. Conduct a risk assessment for each application. Specify your target RTO and RPO. Design for failure starting with the application architecture; this will help drive the infrastructure architecture.</li><li>HA/DR Checklist for Risk Mitigation (continued). Implement HA best practices, balancing cost, complexity and risk. Automate infrastructure for consistency and reliability. Document operational processes and automations. Test the failover... then test it again. Release the Chaos Monkey.</li><li>Multi-Region/Cloud DR Options.<table><tr><th>Option</th><th>Availability</th><th>Downtime</th><th>Relative cost</th></tr><tr><td>Multi-Cloud HA (Live/Live config)</td><td>99.999%</td><td>0</td><td>$$$$</td></tr><tr><td>Hot DR (least common)</td><td>99.9%</td><td>&lt; 5 mins</td><td>$$$</td></tr><tr><td>Warm DR (recommended)</td><td>99.5%</td><td>&lt; 1 hour</td><td>$$</td></tr><tr><td>Cold DR (most common)</td><td>99%</td><td>&gt; 1 hour</td><td>$</td></tr></table></li><li>
Multi-Region Cold DR. Staged server configuration and generally no staged data. Not recommended if rapid recovery is required: it is slow to replicate data to the other cloud and bring the database online. [Diagram: DNS (172.168.7.31) points to US EAST, which runs load balancers, app servers, a master DB and a replicating slave DB, with EBS snapshots stored in S3; US WEST holds staged load balancers, app servers and a slave DB.]</li><li>Multi-Region Warm DR. Staged server configuration, pre-staged data and a running slave database server. Generally the recommended DR solution: minimal additional cost, and it allows fairly rapid recovery. [Diagram: DNS points to US EAST, which runs load balancers, app servers and a master DB; the master replicates to a slave DB in US EAST and to a running slave DB in US WEST, where load balancers and app servers are staged; EBS snapshots from both regions are stored in S3.]</li><li>Multi-Region Hot DR. Parallel deployment with all servers running but all traffic going to the primary. Not recommended: very high additional cost to allow rapid recovery. [Diagram: DNS (172.168.7.31) points to US EAST; US WEST runs a parallel set of load balancers and app servers; the master DB replicates to slave DBs in both regions; EBS snapshots from both regions are stored in S3.]</li><li>Hybrid HA. Live/Live configuration: use geo-targeted IP services to direct traffic to regional LBs. Possible, but not recommended (more to follow). Maximum additional cost and maximum availability, but complex to implement and manage. [Diagram: DNS (172.168.7.31) directs traffic to load balancers and app servers at two sites, one of them Chicago; the master DB replicates to slave DBs at both sites; snapshots are stored in S3 and Swift.]</li><li>Hybrid HA (continued). It looks similar to Multi-Zone, but there are additional problems to solve because some resources are not shared. You need DNS management or a global load balancer. Security requires additional effort because security groups are region-specific. Machine images are specific to the cloud/region. [Diagram: US-EAST and CHICAGO sites, each with load balancers, app servers and databases; the master DB replicates to slave DBs; EBS volume snapshots are stored in S3 and Swift.]</li><li>Procurement software: the SLAs to their customers require HA. The Subway chain is a customer that procures perishable goods through Coupa.</li><li>
In the Dashboard. [Screenshot callouts: cost forecasting; multi-region for DR or cloud environment; multi-region Warm DR; staged servers.]</li><li>Automating HA and DR. Use dynamic DNS for your database servers: allow app servers to use a single FQDN, and use a low TTL to allow rapid failover when the master database changes. Automatically connect app servers to load-balancing servers: app servers can connect to all load balancers automatically at launch, with no manual intervention and no DNS modifications. Automate promotion of a slave to master: the process is automated, but the decision to run it is manual.</li><li>How RightScale Makes It Possible: MultiCloud Images. MultiCloud Images can be launched across regions and hybrid clouds without modification. (1) A ServerTemplate contains a list of MultiCloud Images (MCIs). (2) When the Server is created, a specific MCI is chosen. (3) The appropriate RightImage is used at launch. [Diagram: a MultiCloud Image maps Cloud A to RightImage 1, Cloud B to RightImage 2 and Cloud C to RightImage 3, giving stability across clouds.]</li><li>How RightScale Makes It Possible: ServerTemplates, Tags, and Inputs. Automated load balancer registration and database connections; autoscaling across zones; dynamic configuration.</li><li>DR Cost Comparison Example.<table><tr><th></th><th>Multi-Region Cold DR</th><th>Multi-Region Warm DR</th><th>Multi-Region Hot DR</th></tr><tr><th>Total</th><td>$4480 / month</td><td>$5630 / month</td><td>$8800 / month</td></tr><tr><th>Running</th><td>$4470 / month: 3 Load Balancers (Large), 6 App Servers (XLarge), 1 Master DB (2XLarge), 1 Slave DB (2XLarge)</td><td>$5540 / month: 3 Load Balancers (Large), 6 App Servers (XLarge), 1 Master DB (2XLarge), 2 Slave DBs (2XLarge)</td><td>$8440 / month: 6 Load Balancers (Large), 12 App Servers (XLarge), 1 Master DB (2XLarge), 2 Slave DBs (2XLarge)</td></tr><tr><th>Staged</th><td>$0 / month: 3 Load Balancers (Large), 6 App Servers (XLarge), 1 Slave DB (2XLarge)</td><td>$0 / month: 3 Load Balancers (Large), 6 App Servers (XLarge)</td><td>None</td></tr><tr><th>Replication</th><td>$10 / month (25 GB/day cross-zone)</td><td>$90 / month (25 GB/day cross-region)</td><td>$360 / month (100 GB/day cross-region)</td></tr></table></li><li>
Outage-Proofing Best Practices. Place load balancers, app servers and databases in more than one zone. Replicate data across zones. Back up across regions. Design stateless apps for resilience to reboot/relaunch. Maintain capacity to absorb zone or region failures. Monitor, alert, and automate operations to speed up failover.</li><li>Resources and Q&amp;A. RightScale: try RightScale Free Edition (www.rightscale.com/free); contact Toll Free 1.866.720.0208 or Intl 1.805.855.0265; attend http://www.rightscalecompute.com/. AWS: contact aws.amazon.com/contact-us; attend https://aws.amazon.com/aws-summit-2013/.</li></ol>
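The Terminology slides define RTO as the time allowed to restore service and RPO as the acceptable data loss, and they note that the two are often at odds. The minimal Python sketch below checks a plan against both targets. It makes two simplifying assumptions: worst-case data loss equals the interval between backups or replication points, and recovery steps run one after another. The `DrPlan` class, its fields and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DrPlan:
    """Hypothetical DR plan description; every field is illustrative."""
    name: str
    backup_interval_min: float        # minutes between snapshots/replication points
    recovery_steps_min: list[float]   # duration of each sequential recovery step

    def worst_case_data_loss_min(self) -> float:
        # Writes made after the last backup/replication point can be lost.
        return self.backup_interval_min

    def estimated_recovery_min(self) -> float:
        # Steps run one after another, so recovery time is their sum.
        return sum(self.recovery_steps_min)


def meets_objectives(plan: DrPlan, rto_min: float, rpo_min: float) -> dict[str, bool]:
    """Report whether the plan satisfies the RTO and RPO targets (in minutes)."""
    return {
        "rto": plan.estimated_recovery_min() <= rto_min,
        "rpo": plan.worst_case_data_loss_min() <= rpo_min,
    }


if __name__ == "__main__":
    # Hypothetical cold plan: hourly snapshots; launch servers, restore data, update DNS.
    cold = DrPlan("cold", backup_interval_min=60, recovery_steps_min=[20, 30, 5])
    print(meets_objectives(cold, rto_min=60, rpo_min=15))
```

In this example the plan recovers in 55 minutes, which is within a 60-minute RTO. Hourly snapshots, however, cannot meet a 15-minute RPO. Tightening RPO means replicating more often, for example to a running slave as in Warm DR, and that is where the extra cost comes from.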
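The General HA Best Practices slide asks for at least one of each component in each of at least two AZs, and the ServerTemplates slide mentions autoscaling across zones. The sketch below spreads instances round-robin over a list of zones and flags any tier that ends up in fewer than two distinct zones. Round-robin is an assumed placement policy chosen for illustration, and the zone and tier names are placeholders.

```python
from itertools import cycle


def spread_across_zones(count: int, zones: list[str]) -> list[str]:
    """Assign `count` instances to zones round-robin so each zone gets a fair share."""
    if not zones:
        raise ValueError("at least one zone is required")
    zone_cycle = cycle(zones)
    return [next(zone_cycle) for _ in range(count)]


def tiers_missing_redundancy(placement: dict[str, list[str]], min_zones: int = 2) -> list[str]:
    """Return the tiers whose instances occupy fewer than `min_zones` distinct zones."""
    return sorted(tier for tier, zones in placement.items() if len(set(zones)) < min_zones)


if __name__ == "__main__":
    zones = ["us-east-1a", "us-east-1b"]  # placeholder zone names
    placement = {
        "load_balancer": spread_across_zones(2, zones),
        "app_server": spread_across_zones(6, zones),
        "database": ["us-east-1a"],  # master only: a single point of failure
    }
    print(tiers_missing_redundancy(placement))
```

Adding a slave DB in `us-east-1b`, as in the Multi-Zone HA diagram, would clear the warning for the database tier.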
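Both the General HA Best Practices and Outage-Proofing slides call for monitoring and alerts that speed up failover. One common way to avoid failing over on a single transient blip is to act only after several consecutive failed health checks. The `FailoverMonitor` class and its default threshold of three are hypothetical. How each check is performed (HTTP probe, TCP connect and so on) is left out.

```python
class FailoverMonitor:
    """Hypothetical monitor: signal failover after `threshold` consecutive failed checks."""

    def __init__(self, threshold: int = 3) -> None:
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.consecutive_failures = 0

    def record(self, check_passed: bool) -> bool:
        """Record one health-check result; return True only when failover should start."""
        if check_passed:
            self.consecutive_failures = 0  # any success resets the streak
            return False
        self.consecutive_failures += 1
        # Fire exactly once per outage, when the streak first reaches the threshold.
        return self.consecutive_failures == self.threshold


if __name__ == "__main__":
    monitor = FailoverMonitor(threshold=3)
    results = [False, False, True, False, False, False, False]
    print([monitor.record(ok) for ok in results])
```

For the database tier, the Automating HA and DR slide keeps the promotion decision manual. A signal like this would therefore page an operator rather than promote a slave on its own.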
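The Multi-Region/Cloud DR Options slide pairs each availability level with a downtime figure. For comparison, the sketch below converts each percentage into the total downtime it permits over a year. It assumes a 365-day year and ignores leap years. The option labels are copied from the slide.

```python
# Assumes a 365-day year (8,760 hours); leap years are ignored.
HOURS_PER_YEAR = 365 * 24


def allowed_downtime_hours(availability_pct: float) -> float:
    """Return the maximum yearly downtime, in hours, for a given availability."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return HOURS_PER_YEAR * (1.0 - availability_pct / 100.0)


# Availability levels from the Multi-Region/Cloud DR Options slide.
OPTIONS = {
    "Multi-Cloud HA (Live/Live)": 99.999,
    "Hot DR": 99.9,
    "Warm DR": 99.5,
    "Cold DR": 99.0,
}

if __name__ == "__main__":
    for name, pct in OPTIONS.items():
        hours = allowed_downtime_hours(pct)
        print(f"{name:28} {pct:>7}%  {hours:8.2f} h/year ({hours * 60:9.1f} min)")
```

At these levels, 99.9% allows about 8.76 hours of downtime per year, 99% about 87.6 hours, and 99.999% about 5.3 minutes.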
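The Automating HA and DR slide recommends a single FQDN for the master database with a low TTL. As one concrete illustration, the sketch below builds the change batch that Amazon Route 53's ChangeResourceRecordSets API expects when repointing an A record. Route 53 is an assumed choice of DNS provider, since the deck does not name one. The FQDN, IP address and 60-second TTL are placeholders. The sketch also estimates the worst-case time before every app server uses the new master, assuming clients honor the TTL and ignoring provider propagation delay.

```python
import ipaddress


def master_db_upsert(fqdn: str, new_master_ip: str, ttl_seconds: int = 60) -> dict:
    """Build a Route 53 ChangeBatch that points `fqdn` at the promoted master."""
    ipaddress.IPv4Address(new_master_ip)  # raises ValueError for a malformed address
    if ttl_seconds <= 0:
        raise ValueError("TTL must be positive")
    return {
        "Comment": f"Point {fqdn} at new master {new_master_ip}",
        "Changes": [
            {
                "Action": "UPSERT",  # create the record or overwrite the existing one
                "ResourceRecordSet": {
                    "Name": fqdn,
                    "Type": "A",
                    "TTL": ttl_seconds,
                    "ResourceRecords": [{"Value": new_master_ip}],
                },
            }
        ],
    }


def worst_case_switch_seconds(detect_s: float, promote_s: float, ttl_seconds: int) -> float:
    """Detection + promotion + one full TTL for cached records to expire."""
    return detect_s + promote_s + ttl_seconds


if __name__ == "__main__":
    batch = master_db_upsert("db-master.example.com.", "10.0.1.25", ttl_seconds=60)
    print(batch["Changes"][0]["ResourceRecordSet"])
    print(worst_case_switch_seconds(detect_s=90, promote_s=120, ttl_seconds=60))
```

With boto3 installed and credentials configured, the batch could be submitted with `boto3.client("route53").change_resource_record_sets(HostedZoneId=zone_id, ChangeBatch=batch)`, where `zone_id` is your hosted zone's ID. A lower TTL shortens the last term of the estimate but increases DNS query volume.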
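The same slide says slave-to-master promotion is automated while the decision to run it stays manual. The minimal sketch below shows that split. It assumes the automation picks the healthy slave with the least replication lag, on the reasoning that this slave has likely lost the least data, and it refuses to run without an explicit operator confirmation. `Replica`, `promote` and the replica names are hypothetical, and the promotion steps themselves are only described in a comment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Replica:
    """Hypothetical view of one slave database server."""
    name: str
    zone: str
    healthy: bool
    lag_seconds: float  # how far this slave trails the failed master


def choose_promotion_candidate(replicas: list[Replica]) -> Optional[Replica]:
    """Pick the healthy slave with the least replication lag, or None if none is healthy."""
    healthy = [r for r in replicas if r.healthy]
    return min(healthy, key=lambda r: r.lag_seconds, default=None)


def promote(replicas: list[Replica], operator_confirmed: bool) -> str:
    """Run the automated promotion only after a human has decided to fail over."""
    if not operator_confirmed:
        raise PermissionError("promotion requires an explicit operator decision")
    candidate = choose_promotion_candidate(replicas)
    if candidate is None:
        raise RuntimeError("no healthy slave available to promote")
    # A real runbook would now stop replication on the candidate, make it writable,
    # repoint the master DB FQDN to it, and re-point the remaining slaves.
    return candidate.name


if __name__ == "__main__":
    replicas = [
        Replica("slave-1b", "us-east-1b", healthy=True, lag_seconds=2.0),
        Replica("slave-west", "us-west-1a", healthy=True, lag_seconds=30.0),
        Replica("slave-1c", "us-east-1c", healthy=False, lag_seconds=0.5),
    ]
    print(promote(replicas, operator_confirmed=True))
```

The unhealthy `slave-1c` is skipped even though it has the smallest lag, so `slave-1b` is promoted.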
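The DR Cost Comparison Example slide lists running, staged and replication costs for each option. The sketch below recomputes each monthly total from those components and then computes the premium over Cold DR. The figures are copied from the slide. Hot DR's staged cost is entered as zero because all of its servers are already running.

```python
# Monthly costs in USD, copied from the DR Cost Comparison Example slide.
COSTS = {
    "Cold DR": {"running": 4470, "staged": 0, "replication": 10},
    "Warm DR": {"running": 5540, "staged": 0, "replication": 90},
    "Hot DR": {"running": 8440, "staged": 0, "replication": 360},  # nothing staged
}


def monthly_total(option: str) -> int:
    """Sum running, staged and replication costs for one DR option."""
    return sum(COSTS[option].values())


def premium_over(option: str, baseline: str = "Cold DR") -> int:
    """Extra monthly cost of `option` compared with `baseline`."""
    return monthly_total(option) - monthly_total(baseline)


if __name__ == "__main__":
    for option in COSTS:
        print(f"{option}: ${monthly_total(option)}/month, +${premium_over(option)} vs Cold DR")
```

The recomputed totals ($4480, $5630 and $8800) match the slide. Most of Warm DR's $1150 premium is the $1070 in extra running cost for its second slave DB, which runs in the DR region. The remaining $80 is the higher cross-region replication charge.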