2
Agenda
Basic SAP Architecture
SAP HA Architecture in Azure
Pacemaker
Azure Load Balancer
Demo of unplanned failover
4
Basic SAP architecture
[Diagram: Application Server, Central Services, and Database Server, with Shared Disk and Database Storage]
10
SUSE High Availability Overview
corosync (cluster membership)
pacemaker (crm)
Resource Agents (RAs)
Fencing (stonith)
[Diagram: cluster nodes, each with Kernel, SAP, and vIP layers, plus Storage (SBD)]
14
Resource Agents
Provide the ‘intelligence’ for Pacemaker
A resource agent is a script used to start, stop, and monitor a resource
• Ideally should be Open Cluster Framework compliant
• Well defined return values
• Mandatory operations
• Return value passed back to Pacemaker
• Many providers of RAs
• Ships with around 140 RAs out of the box
• Resource Agents for SAP HANA included in SLES for SAP Applications
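To make the start/stop/monitor contract concrete, here is a minimal sketch of an OCF-style agent. The `demo_*` function names and the state file are hypothetical stand-ins for a real service; only the return codes follow the OCF convention. A real agent also implements at least the meta-data action.

```shell
#!/bin/sh
# Minimal OCF-style resource agent sketch (illustrative, not a shipped agent).
# Return codes follow the OCF resource agent convention.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_ERR_UNIMPLEMENTED=3
OCF_NOT_RUNNING=7

# Hypothetical state file standing in for a real service (assumption).
STATE_FILE="${STATE_FILE:-${TMPDIR:-/tmp}/demo_ra.state}"

demo_monitor() {
    # "Running" here simply means the state file exists.
    if [ -f "$STATE_FILE" ]; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}

demo_start() {
    # Start is idempotent: succeed if the resource is already running.
    demo_monitor && return $OCF_SUCCESS
    touch "$STATE_FILE" || return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}

demo_stop() {
    # Stop also succeeds when the resource is already stopped.
    rm -f "$STATE_FILE" || return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}

demo_ra() {
    # Dispatch on the requested action, as the cluster would pass it.
    case "$1" in
        start)   demo_start ;;
        stop)    demo_stop ;;
        monitor) demo_monitor ;;
        *)       return $OCF_ERR_UNIMPLEMENTED ;;
    esac
}
```

Calling `demo_ra monitor` before `demo_ra start` returns 7 (not running), and 0 afterwards; this return value is what gets passed back to Pacemaker.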
16
Why Do We Need Fencing?
To a cluster node, loss of a peer node is indistinguishable from loss of
communication with that node.
In the former case, is it safe to fail over resources?
And in the latter case?
17
Split Brain
• When a cluster partitions due to network failure
• Neither side knows if the other is still alive
• Worst case scenario: each side attempts to fail over the other's resources
• Better scenario: neither side does anything
(But then, why do we have a cluster?)
• Best scenario: one side is able to guarantee that the other is down
• Fencing is about moving from an UNKNOWN state to a KNOWN state
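The three scenarios above can be sketched as a small decision function. The state labels and action names are hypothetical, chosen for this example only; a real cluster makes this decision inside Pacemaker, not in a script.

```shell
# Sketch of the takeover decision a surviving node has to make.
# peer_state values are hypothetical labels for this example:
#   online  - peer still answers cluster communication
#   fenced  - fencing confirmed the peer is down (KNOWN state)
#   unknown - peer unreachable and not yet fenced (UNKNOWN state)
takeover_decision() {
    case "$1" in
        online)  echo "no-action" ;;
        fenced)  echo "take-over" ;;
        unknown) echo "fence-peer-first" ;;
        *)       echo "invalid-state"; return 1 ;;
    esac
}

takeover_decision unknown   # prints fence-peer-first
```

The point of the sketch is that resources are taken over only from the `fenced` state, never directly from `unknown`.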
21
Architecture options for SAP on Azure
File system
• BYO SUSE cluster
• Azure NetApp Files (ANF)
• NFS (future)
Azure availability options
• Availability Set
• Availability Zone
Fencing agent
• SBD
• Azure Fencing agent (future)
22
Floating IP: Two Basic Architectures Possible
[Diagram: two variants, each with HANA 1, HANA 2, and PAS: one uses an Azure Load Balancer, the other a floating IP]
23
Let’s look at the first case
[Diagram: PAS reaches HANA 1 and HANA 2 through a floating IP; labels: HANA System Replication, sr_takeover]
• “Floating” IP can be moved from one machine to another via API/CLI
• IP move takes approximately 2 minutes
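One way such a CLI-driven move could look is sketched below, assuming the floating IP is modeled as a secondary IP configuration on a network interface. This is an assumption, not necessarily the setup shown on the slide. The function is a dry run that only prints Azure CLI calls, all resource names are hypothetical placeholders, and a real call may need further arguments.

```shell
# Dry run: prints the Azure CLI calls instead of executing them.
# All resource names passed in are hypothetical placeholders.
move_floating_ip() {
    rg="$1"; from_nic="$2"; to_nic="$3"; cfg="$4"; ip="$5"
    # Remove the secondary IP configuration from the current owner...
    echo "az network nic ip-config delete --resource-group $rg --nic-name $from_nic --name $cfg"
    # ...then recreate it with the same private address on the new owner.
    echo "az network nic ip-config create --resource-group $rg --nic-name $to_nic --name $cfg --private-ip-address $ip"
}

move_floating_ip rg-sap nic-hana1 nic-hana2 ipconfig-float 10.0.0.13
```

Because the move is a delete followed by a create against the Azure control plane, it is not instantaneous.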
25
[Diagram: PAS reaches HANA 1 and HANA 2 through a floating IP on an Azure Load Balancer, with a health probe to each HANA node; labels: HANA System Replication, sr_takeover]
26
SOCAT & Virtual IP Network Resource
# Virtual IP that follows the active HANA node
sudo crm configure primitive rsc_ip_HN1_HDB03 ocf:heartbeat:IPaddr2 \
  meta target-role="Started" is-managed="true" \
  operations \$id="rsc_ip_HN1_HDB03-operations" \
  op monitor interval="10s" timeout="20s" \
  params ip="10.0.0.13"
# socat listener on TCP 62503 that answers the load balancer health probe
sudo crm configure primitive rsc_nc_HN1_HDB03 anything \
  params binfile="/usr/bin/socat" \
  cmdline_options="-U TCP-LISTEN:62503,backlog=10,fork,reuseaddr /dev/null" \
  op monitor timeout=20s interval=10 depth=0
# Group both resources so they always run on the same node
sudo crm configure group g_ip_HN1_HDB03 rsc_ip_HN1_HDB03 rsc_nc_HN1_HDB03
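The listener port in the commands above matches the instance number in the resource names (HDB03, port 62503). A small helper can derive it, assuming the 625<instance number> convention that this pairing suggests. The function name `probe_port` is made up for this sketch.

```shell
# Derive the health probe port from a two-digit HANA instance number.
# Assumes the 625<instance number> convention (e.g. HDB03 -> 62503).
probe_port() {
    case "$1" in
        [0-9][0-9]) printf '625%s\n' "$1" ;;
        *) echo "instance number must be two digits" >&2; return 1 ;;
    esac
}

probe_port 03   # prints 62503
```

The same port has to be used for the socat listener and for the load balancer health probe, otherwise the probe never sees the active node.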
28
Unplanned Failover
Several mechanisms for testing:
• Shut down the machine from the Azure portal
• Find the SBD inquisitor process (ps aux | grep sbd) and kill it
• Stop Pacemaker (service pacemaker stop)
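For the SBD test, picking the inquisitor out of the process list can be scripted. The helper below is a sketch: the function name is hypothetical, and it assumes the process title contains "sbd: inquisitor" and that the PID is the second column, as in `ps aux` output.

```shell
# Print the PID of the SBD inquisitor from `ps aux`-style lines on stdin.
# The [i] in the pattern keeps awk from matching its own command line.
sbd_inquisitor_pid() {
    awk '/sbd: [i]nquisitor/ { print $2 }'
}

# Typical use on a cluster node (test systems only):
#   ps aux | sbd_inquisitor_pid
```

Killing the printed PID on a test cluster simulates a failed node; watch the cluster status from the surviving node while doing so.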
29
Takeaways
Read the Documentation
Set up and test your configuration, and keep testing
Understand the operations
Monitoring & Alerts
30
Resources
Links to documentation
https://documentation.suse.com/sbp/all/
Links to automation
https://github.com/SUSE/ha-sap-terraform-deployments
Training & certification
https://training.suse.com/training/sap/
Azure training & certifications
[TUT-1226] SAP HA on SUSE: All you need to know
[TUT-1396] "Day 2" Operations of SAP HANA Cluster using SUSE High Availability on Public Cloud
[HOL-1064] SAP HANA scale-out with high availability NFS using DRBD
[BP-1351] SUSE High Availability for SAP HANA: Tales from the real world, tips, tricks, & troubleshooting
[HOL-1225] High Availability for SAP application servers using ENSA2 enqueue replication