SQLCAT: SQL Server 2012 AlwaysOn Lessons Learned from Early Customer Deployments
Sanjay Mishra
Program Manager
Microsoft Corporation
DBI360
Setting the Stage
Assumed prerequisites for this presentation: basic knowledge of
• AlwaysOn Failover Cluster Instances (FCI)
• AlwaysOn Availability Groups (AG)
There is much more to each of these deployments than we can discuss in this session. Come by the SQL Server Technical Learning Center (TLC) / Booth and discuss with us.
AlwaysOn ≠ Availability Groups
AlwaysOn = { SQL Server Failover Cluster Instances, Availability Groups }
Availability Groups ≠ Database Mirroring
Key Learnings from Early Customer Deployments
• Windows Cluster
  • is the foundation for HA and DR in SQL Server 2012 AlwaysOn
  • AlwaysOn inherits all "characteristics" of Windows Cluster
  • every single AlwaysOn deployment is a Windows Cluster deployment
  • understanding Windows Cluster is essential to successfully deploy, operate, monitor, troubleshoot, and administer AlwaysOn
  • key areas are: quorum model, cluster network communication, DR procedures, cluster.exe, PowerShell
  • Windows Cluster ≠ SQL Cluster (SQL Server Failover Cluster Instance); therefore, it is NOT necessarily a shared-storage cluster
  • many key enhancements have been made to Windows Cluster specifically for SQL Server 2012 AlwaysOn:
    • Asymmetric Disk
    • Node Votes
    • Asymmetric Disk as Quorum resource
Key Learnings from Early Customer Deployments
• Organizational structure
  • Typically, teams and skills are organized into separate groups: the SQL Server DBA team and the Windows Server admin team
  • AlwaysOn reaches beyond the SQL Server DBA
  • DBAs need to work closely with the Windows / Network administration teams, not just for initial deployment but also for troubleshooting and disaster recovery
• Historical experience
  • If you are already experienced with Windows Cluster but new to AlwaysOn, you need to unlearn and relearn a few things
  • For example, if you haven't read the Windows Cluster documentation in the last few months, it is worth a re-read now
• New/different tools for administration and troubleshooting
  • Windows cluster log
  • Failover Cluster Manager
  • Knowledge of PowerShell and cluster.exe command lines will come in very handy
SQL Server 2012 AlwaysOn Customer Examples
#  Customer              SQL Server 2012 AlwaysOn HA+DR Solution
1  Microsoft IT          Availability Group for HA and DR
2  bwin.party            Availability Group for HA and DR
3  CareGroup             Availability Group for HA and DR
4  ServiceU Corporation  Failover Cluster Instance for local HA + Availability Group for DR
5  Edgenet               Multi-site Failover Cluster Instance (FCI) for HA and DR
Customer: Microsoft IT SAP ERP Deployment
Microsoft IT SAP ERP Usage Statistics
• ~6 TB in a single, central, row/page compressed database
• Database growth around 120 GB/month
• Live in ~85 countries
• 4,000 named GUI users
• ~100K internal web users plus external web users
• Up to 1500+ concurrent users
• 2 million dialog steps per business day
• 240K+ batch job executions per month
• 80+ million transaction steps per month (100+ million during year end)
• 0.8 seconds user response time
• 99.995% availability since SQL Server 2005
• Database servers: 4 × 8 cores, 256 GB of memory
http://www.microsoft.com/casestudies/Microsoft-SQL-Server-2012-Enterprise/Microsoft-IT/Microsoft-Ensures-Smooth-Operation-of-ERP-System-and-Cuts-Disaster-Recovery-Time/710000000493
HA/DR Deployment Prior to SQL Server 2012: Database Mirroring for local HA, Log Shipping for DR
[Diagram: Primary Site with Production (synchronous DBM with a witness) and Test; log shipping to the DR Site; SAP Volume Test and Integration System as an image of production.]
SQL Server 2012 AlwaysOn Deployment: Availability Group for HA and DR
[Diagram: Primary Site and DR Site. Production and Test systems, with synchronous replicas and an asynchronous replica at the DR site. Cluster node votes are shown as 1, 1, 1 and 0. A file share is used for cluster quorum. SAP Volume Test and Integration System as an image of production. Storage on EMC CX3-80 SANs (5+ TB and 7+ TB).]
• Production Availability Group on production DBMS cluster
• SAP production CI cluster containing File Share quorum for DBMS cluster
• Test Availability Group on test DBMS cluster
• SAP test CI cluster containing File Share quorum for test DBMS cluster
Customer: bwin.party digital entertainment plc
The System
• Online gaming and gambling
• Real money handling system for bwin.party
• Authoritative system for Responsible Gaming Limitations
• Includes a specialized Data Warehouse
• Multiple databases and multiple availability groups in the topology
• 4 servers in the topology
• Each server hosts the primary replica of one AG and secondary replicas of other AGs
• This presentation focuses on 1 AG
http://www.microsoft.com/casestudies/Microsoft-SQL-Server-2012/bwin.party/Company-Cuts-Reporting-Time-by-up-to-99-Percent-to-3-Seconds-and-Boosts-Scalability/710000000087
HA/DR objectives
• >99.99% availability in recent years
• >99.99% availability even with maintenance
• RPO: zero data loss
• RTO: 10 seconds or less
• Plan for the worst-case scenario: loss of a complete datacenter
• Must still be able to do maintenance during the worst case
Deployment Architecture: Prior to SQL Server 2012
Pre-SQL Server 2012 Challenges
• Can't easily glue together databases that need to run on the same node
• Data Warehouse load restrictions due to limitations in Log Shipping
• Maintaining database mirroring connection strings (failover_partner) in all applications is painful, and in some cases (some 3rd-party applications) not even supported
Deployment Architecture: SQL Server 2012 AlwaysOn Availability Groups
Key Points
• Quorum model: Node and File Share Majority
• Each node has a vote
• File share in a 3rd datacenter
• Automatic failover between datacenters
• Avoiding downtime in case of datacenter failure
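Why does placing the file share in a third datacenter avoid downtime? Simple vote arithmetic shows it. The cluster stays online while the surviving voters hold a strict majority of all configured votes. The sketch below assumes the four servers are split two per datacenter, with the file share witness in the third datacenter. That split and the member names are hypothetical; the slides only say there are 4 servers and that each node has a vote. The model is simplified and ignores how the cluster arbitrates the witness.

```python
from typing import Dict, Set


def has_quorum(votes: Dict[str, int], online: Set[str]) -> bool:
    """True if the online members hold a strict majority of all configured votes."""
    total = sum(votes.values())
    surviving = sum(weight for member, weight in votes.items() if member in online)
    return surviving > total // 2


# Hypothetical topology: two voting nodes per datacenter plus a file share witness in DC3.
votes = {
    "DC1-NODE1": 1, "DC1-NODE2": 1,
    "DC2-NODE1": 1, "DC2-NODE2": 1,
    "FSW-DC3": 1,
}

all_members = set(votes)
dc1_lost = all_members - {"DC1-NODE1", "DC1-NODE2"}  # a complete datacenter fails
dc1_and_witness_lost = dc1_lost - {"FSW-DC3"}        # the third site is also unreachable

print(has_quorum(votes, dc1_lost))              # True: 3 of 5 votes survive
print(has_quorum(votes, dc1_and_witness_lost))  # False: 2 of 5 votes survive
```

With five votes, losing either datacenter still leaves three votes. This is the property behind the "loss of a complete datacenter" objective.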
Gains
• Faster failover
• Maintenance is now easy to do during a failure condition
• Reduced system load on the primary due to backup offloading
• Ability to run read-only workload on the secondary without interfering with OLTP production
Considerations for Migration
• Migration involves other teams, not just the DBA team
• Connection strings need to change, because the DBM connection string (with failover_partner) only works with one secondary
• Previously, different machines ran different OS versions; this is no longer possible
• All machines in the topology now need to be in the same Active Directory domain
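The connection string change can be sketched as a small rewrite. It drops the database mirroring `Failover Partner` keyword and points `Server` (or `Data Source`) at an availability group listener. This is a minimal sketch under several assumptions:
• connection strings are ADO.NET-style `key=value;` pairs with no quoted semicolons;
• the listener name `AG1-LISTENER` and the other names in the example are hypothetical;
• appending `MultiSubnetFailover=True` is an optional choice, usually relevant when the listener spans subnets; whether it applies depends on the deployment.

```python
from typing import List, Tuple


def parse_conn_str(conn_str: str) -> List[Tuple[str, str]]:
    """Split a simple 'key=value;key=value' string into ordered (key, value) pairs.

    Assumption: values never contain quoted semicolons.
    """
    pairs = []
    for part in conn_str.split(";"):
        if not part.strip():
            continue
        key, _, value = part.partition("=")
        pairs.append((key.strip(), value.strip()))
    return pairs


def _norm(key: str) -> str:
    # "Failover Partner", "failover_partner" and "FailoverPartner" normalize alike.
    return key.lower().replace(" ", "").replace("_", "")


def to_listener(conn_str: str, listener: str, multi_subnet: bool = True) -> str:
    """Rewrite a DBM-style connection string so it targets an AG listener."""
    out = []
    for key, value in parse_conn_str(conn_str):
        norm = _norm(key)
        if norm in ("failoverpartner", "multisubnetfailover"):
            # Drop the DBM-only partner and any existing MultiSubnetFailover
            # (the latter is re-added below when requested).
            continue
        if norm in ("server", "datasource"):
            value = listener  # connect to the listener instead of a specific node
        out.append(f"{key}={value}")
    if multi_subnet:
        out.append("MultiSubnetFailover=True")
    return ";".join(out) + ";"


old = "Server=SQLNODE1;Failover Partner=SQLNODE2;Database=Payments;Integrated Security=True;"
print(to_listener(old, "AG1-LISTENER"))
# Server=AG1-LISTENER;Database=Payments;Integrated Security=True;MultiSubnetFailover=True;
```

Because the listener name stays the same wherever the primary replica runs, applications no longer need to track which node is the partner. That is the maintenance burden the slides describe.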
Customer: CareGroup Healthcare Systems
• Among the top 5 large healthcare systems in the USA
• Four hospitals located in Boston, MA
• 16,000 employees
• 146 mission-critical clinical applications
• 2+ million patient medical records
• Annual revenue: $2 billion
• All mission-critical applications are enabled for high availability and DR
• Ranked #1: Most Innovative Healthcare IT nationwide (InformationWeek)
http://www.microsoft.com/casestudies/Microsoft-SQL-Server-2012/Beth-Israel-Deaconess-Medical-Center/Hospital-Improves-Availability-and-Speeds-Performance-to-Deliver-High-Quality-Care/5000000011
CareGroup Database Classification and SLA
• 80+ databases rated “AAA”: RPO 0 and RTO 0; standard HA/DR solution: FCI + AG; storage: EMC Clariion SAN with SSD disks
• 300+ databases rated “AA”: RPO ≤ 1 hour and RTO 1 hour; standard HA/DR solution: Hyper-V and AlwaysOn AG; storage: EMC Clariion SAN
• Remaining databases rated “A”: RPO and RTO 1 day; selective HA/DR
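The three-tier classification can be written as a small lookup table. The tier values come from the slide. The selection rule is a hypothetical illustration, not CareGroup's documented process: pick the least protected (and presumably least costly) tier whose RPO and RTO targets still meet a database's requirements.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List


@dataclass(frozen=True)
class Tier:
    name: str
    rpo: timedelta  # maximum tolerated data loss
    rto: timedelta  # maximum tolerated time to restore service
    solution: str


# Tier values from the CareGroup slide, ordered from most to least protected.
TIERS: List[Tier] = [
    Tier("AAA", timedelta(0), timedelta(0), "FCI + AG on EMC Clariion SAN with SSD"),
    Tier("AA", timedelta(hours=1), timedelta(hours=1), "Hyper-V + AlwaysOn AG on EMC Clariion SAN"),
    Tier("A", timedelta(days=1), timedelta(days=1), "Selective HA/DR"),
]


def classify(required_rpo: timedelta, required_rto: timedelta) -> Tier:
    """Hypothetical rule: return the least protected tier that still meets both targets."""
    for tier in reversed(TIERS):
        if tier.rpo <= required_rpo and tier.rto <= required_rto:
            return tier
    raise ValueError("requirements must be non-negative durations")


print(classify(timedelta(hours=2), timedelta(hours=4)).name)  # AA
print(classify(timedelta(0), timedelta(minutes=5)).name)      # AAA
```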
SQL Server 2012 HA / DR Architecture for “AA” applications
[Diagram: a Windows 2008 R2 host cluster of Hyper-V nodes (Node A, Node B, Node C) runs a Windows 2008 R2 guest cluster. Availability group BillingSys has replicas Denali_A, Denali_B and Denali_C across the primary site and the DR site, using synchronous and asynchronous data movement. Protection layers: HW & OS failure protection; OS & SQL failure protection; Disk & DB failure protection.]
Customer: ServiceU Corporation, part of the Active Network
ServiceU Solution Overview
ServiceU provides web-based online scheduling, event management, payment processing, and other services to customers in 15 countries.
Architecture Goals:
• 99.99% uptime (a maximum allowable downtime of 52 minutes per year, including scheduled maintenance)
• Security: Level 1 PCI Service Provider
• Performance
Architecture Decision Drivers
• Technologies should provide more uptime, even if only a few seconds more
• Try to eliminate manual intervention
• Eliminate single points of failure
• Keep it simple! Make sure troubleshooting can be done easily
Approach to High Availability
• Highly trained personnel, extensive monitoring, good documentation, standardization across the enterprise
http://www.microsoft.com/casestudies/Microsoft-SQL-Server-2012/ServiceU/Online-Company-Reduces-Downtime-and-Helps-Its-Customers-to-Improve-Service/4000011506
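The "52 minutes per year" figure follows from simple arithmetic: the unavailable fraction times the minutes in a year. The sketch assumes a 365-day year. It also applies the same formula to the 99.995% availability quoted for Microsoft IT, for comparison.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def allowed_downtime_minutes(availability_pct: float) -> float:
    """Maximum downtime per year permitted by an availability percentage."""
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR


print(round(allowed_downtime_minutes(99.99), 1))   # 52.6
print(round(allowed_downtime_minutes(99.995), 1))  # 26.3
```

The slide's 52 minutes is 52.56 rounded down. Because this budget includes scheduled maintenance, shaving even a few seconds off each failover matters.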
ServiceU FCI + DBM Solution (Pre-SQL Server 2012): FCI for local HA, DBM for DR
[Diagram: Windows Server Failover Cluster #1 hosts SQL Server 2008 FCI #1 and Windows Server Failover Cluster #2 hosts SQL Server 2008 FCI #2. Each runs Windows Server 2008 and SQL Server 2008 with a Disk Only Quorum, and the two are connected by asynchronous database mirroring.]
• 3 nodes in each FCI
• SQL Server is available with NO user intervention! (unless there is a disaster)
• "Last Man Standing"
• Disk Only Quorum provides benefits, but the quorum disk must be fully protected and always available
ServiceU FCI + AG Solution (SQL Server 2012): FCI for local HA, AG for DR
[Diagram: PRIMARY: SQL Server 2012 FCI #1; SECONDARY: SQL Server 2012 FCI #2, as the availability group's asynchronous secondary; Disk Only Quorum.]
• Windows Server 2008 and later: support added for Asymmetric Disk Only Quorum
• Must be configured with cluster.exe; not supported in the GUI or PowerShell
• Requires testing and thorough knowledge of clustering
• With a primary site loss, getting the cluster online at the remote site involves forcing quorum, changing to Node Majority, then to Disk Only
• Allows "Last Man Standing"
This is a single Windows cluster instead of a Windows cluster at each site. Asymmetric storage is the key to this architecture.
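The "Last Man Standing" property can be illustrated by comparing two quorum models for a three-node FCI:
• Node Majority needs more than half of the nodes online.
• Disk Only Quorum keeps the cluster up as long as any surviving node can reach the quorum disk.

This is a simplified illustration with hypothetical node names. It treats the quorum disk as a single resource, which is exactly why the slide stresses that the disk must be fully protected: losing it takes the cluster down.

```python
from typing import Set


def node_majority_up(all_nodes: Set[str], online: Set[str]) -> bool:
    """Node Majority: more than half of the configured nodes must be online."""
    return len(online & all_nodes) > len(all_nodes) // 2


def disk_only_up(online: Set[str], can_reach_disk: Set[str]) -> bool:
    """Disk Only: the cluster survives while any online node can reach the quorum disk."""
    return bool(online & can_reach_disk)


nodes = {"NODE1", "NODE2", "NODE3"}
survivor = {"NODE3"}  # two of the three nodes have failed

print(node_majority_up(nodes, survivor))  # False: 1 of 3 nodes is not a majority
print(disk_only_up(survivor, nodes))      # True: the last node can still reach the disk
print(disk_only_up(survivor, set()))      # False: quorum disk unavailable
```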
Setup for Availability Groups across FCIs
• In an FCI + AG setup, the SQL Server instance names must be unique within the Windows cluster.
• In an FCI + AG setup, the data and log file paths should be the same on all instances; by default, the instance name is part of the file path, which makes the paths differ.
Instance names
         Site 1   Site 2     Note
WRONG    INST01   INST01     This was correct with the FCI + DBM configuration
RIGHT    INST01   DRINST01   Default data and log file paths now differ, because the instance name is part of the path (see below)
Data and log file paths
                  Site 1                          Site 2
NOT recommended   F:\MSSQL11.INST01\MSSQL\DATA    F:\MSSQL11.DRINST01\MSSQL\DATA
RIGHT             F:\DATA                         F:\DATA
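Both setup rules can be checked mechanically before an FCI is added as an AG replica:
• instance names must be unique within the Windows cluster;
• every replica should use the same data and log file paths.

The sketch below is hypothetical. The function names, the per-site config layout and the `F:\LOG` log path are assumptions. The default path pattern `F:\MSSQL11.<instance>\MSSQL\DATA` is taken from the table above, where `MSSQL11` corresponds to SQL Server 2012. Comparisons are case-insensitive, matching Windows naming.

```python
from typing import Dict, List


def default_data_path(drive: str, instance: str) -> str:
    """Default data directory pattern shown in the table (SQL Server 2012 = MSSQL11)."""
    return f"{drive}\\MSSQL11.{instance}\\MSSQL\\DATA"


def check_replicas(replicas: Dict[str, Dict[str, str]]) -> List[str]:
    """Return problems for a hypothetical {site: {"instance", "data", "log"}} config."""
    problems = []
    instances = [cfg["instance"].upper() for cfg in replicas.values()]
    if len(set(instances)) != len(instances):
        problems.append("instance names must be unique within the Windows cluster")
    for kind in ("data", "log"):
        paths = {cfg[kind].upper() for cfg in replicas.values()}
        if len(paths) > 1:
            problems.append(f"{kind} file paths differ between replicas")
    return problems


right = {
    "Site 1": {"instance": "INST01", "data": r"F:\DATA", "log": r"F:\LOG"},
    "Site 2": {"instance": "DRINST01", "data": r"F:\DATA", "log": r"F:\LOG"},
}
defaults = {
    "Site 1": {"instance": "INST01", "data": default_data_path("F:", "INST01"), "log": r"F:\LOG"},
    "Site 2": {"instance": "DRINST01", "data": default_data_path("F:", "DRINST01"), "log": r"F:\LOG"},
}

print(check_replicas(right))     # []
print(check_replicas(defaults))  # ['data file paths differ between replicas']
```

Renaming the DR instance fixes the first rule but breaks the second unless the paths are set explicitly at install time. This explains why the table pairs `DRINST01` with `F:\DATA`.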
Customer: Edgenet, Inc.
About Edgenet
• Leader in Data Services, Guided Selling and Marketing Solutions
• Consumers and businesses want details about products. At Edgenet, we organize that product information to increase sales.
• Provide retail applications
  • Help retailers sell configurable products
  • Help consumers compare and purchase the right product for them
• Collect, certify and distribute product data
  • Google Search & Shopping
  • Bing Search & Shopping
  • Retailers
  • One of four active US GDSN-certified pools
  • Rigorous certification and data quality scoring process
http://www.microsoft.com/casestudies/Microsoft-SQL-Server-2012/Edgenet/Data-Provider-Supports-Growth-and-Gains-Competitive-Advantage-with-Microsoft/4000011528
Edgenet Multi-site FCI Solution: Configuration
• SLA: 99.99% annual uptime
• Provides high availability and disaster recovery for our data pool applications
• Near real-time data replication with MSDTC support
• Additional, read-only secondary to offload exports & BI workload
Software / Hardware
• SQL Server 2012 Enterprise
• Windows Server 2008 R2 Datacenter
• Brocade 5300 8 Gb FC switches
• EMC Clariion CX4-80
• EMC RecoverPoint CE: disk-based replication
• NEC Express 5800/A1080a-D GX
Edgenet HA / DR Topology Diagram: Multi-site FCI for HA/DR + AG readable secondary replica
[Diagram: Primary Site (Milwaukee) and DR Site (Atlanta), 850 miles apart, linked by a 300 Mb Ethernet connection. EMC RecoverPoint CE appliances at each site perform asynchronous SAN replication of hardware-replicated LUNs. WSFC Node A is the FCI active node and WSFC Node B the FCI passive node. WSFC Node C hosts the availability group secondary replica (synchronous, readable) on its own LUNs. Subnets: 10.10.10.0/24 and 11.11.11.0/24.]
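The 850-mile distance helps explain why SAN replication between Milwaukee and Atlanta is asynchronous: a synchronous write would have to wait for at least one round trip. The sketch estimates only the physical lower bound. It assumes light travels through optical fiber at roughly 200,000 km/s and that the fiber path equals the straight-line distance. Real routes are longer and add switching and storage latency on top.

```python
KM_PER_MILE = 1.609344
FIBER_KM_PER_MS = 200.0  # assumption: ~200,000 km/s propagation speed in fiber


def min_round_trip_ms(distance_miles: float) -> float:
    """Lower bound on round-trip propagation delay over fiber, in milliseconds."""
    one_way_km = distance_miles * KM_PER_MILE
    return 2 * one_way_km / FIBER_KM_PER_MS


print(round(min_round_trip_ms(850), 1))  # 13.7
```

Even in this best case, every synchronously replicated write would add roughly 14 ms.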
Edgenet HA / DR Solution: Cluster, Disk and Instances
• 3-node Windows Server Failover Cluster
  • 2 nodes (one at each data center) in a SQL stretch cluster (multi-site FCI), 850 mi. from Milwaukee to Atlanta
  • 1 node in the primary DC hosting the AG readable secondary
• 4 clustered SQL instances, 1 clustered MSDTC
• 11 TB of usable SAN-replicated storage across 54 LUNs
• Multi-subnet (two subnets)
• TempDB on local disk
  • Saves money on storage replication licensing
  • Reduces cross-data center storage replication traffic
  • Enables use of local solid state storage to improve performance
… And there is more …
Please come by the booth if you would like a deep dive discussion on any of
these or other customer deployments
DBI Track Resources
• @sqlserver, @teched_europe, #msTechEd
• Microsoft Virtual Academy (MVA)
• SQL Server 2012 Eval Copy
• Get Certified!
• Hands-On Labs
Resources
• Connect. Share. Discuss.: http://europe.msteched.com
• Learning: Microsoft Certification & Training Resources, www.microsoft.com/learning
• TechNet: Resources for IT Professionals, http://microsoft.com/technet
• Resources for Developers: http://microsoft.com/msdn
Evaluations
• Submit your evals online: http://europe.msteched.com/sessions
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.