Conducting a HANA Symphony - SUSECON · Conducting a HANA Symphony SAP HANA Scale-Out Automation...


Conducting a HANA Symphony: SAP HANA Scale-Out Automation

Fabian Herschel, Senior Architect SAP LinuxLab, fabian.herschel@suse.com
Peter Schinagl, Senior Architect, peter.schinagl@suse.com

Agenda

SUSE Linux Enterprise Server Overview

Automate SAP HANA System Replication

SAPHanaSR Scale-Out

Scenarios and Use-Cases

SUSE Linux Enterprise Server for SAP Applications

- Evolution -

Timeline axis on the slide: 1999, 2006, 2008, 2010, 2011, 2012, 2013, 2014

SLES
• SAP's development platform
• Certified
• SAP specific packages
• Priority Support for SAP

SLES for BAiO Fast Start
• Installation Wizard
• Hardware bundles

SLES for SAP Enterprise Search
• Installation Framework with Cluster Support
• ESPOS
• SLES 10 SP3

SLES for SAP Applications
• Installation Framework supports generic SAP installations
• SLES 11 SP1

SLES for SAP Applications
• SAP HA certification
• ClamSAP
• SLES 11 SP2

SLES for SAP Applications
• SAP BOne HANA
• SLES 11 SP3

SLES for SAP Applications
• SAP HANA System-replication automation
• SAP HANA security

Simplify Linux for SAP Workloads: SUSE Linux Enterprise Server for SAP Applications 11

• Reliable, Scalable and Secure Operating System: SUSE Linux Enterprise Server
• High Availability: SAP NetWeaver & SAP HANA
• SAP HANA HA Resource Agent
• Page Cache Management
• Antivirus: ClamSAP
• SAP HANA Security
• Simplified Operations Management
• Installation Wizard: Faster Installation
• Extended Service Pack Support: 18 Month Grace Period
• 24x7 Priority Support for SAP

The recommended and supported operating system for SAP HANA

SUSE Linux Enterprise Server


Example Hardware Evolution

Sockets             2      4          8
Cores per socket    2      Up to 8    Up to 15
Hyper-Threading     No     Yes        Yes
CPUs                4      32         120
CPUs with HT        -      64         240
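The CPU counts on this slide follow from multiplying sockets by cores per socket, and doubling the result when Hyper-Threading is enabled. A minimal sketch of that arithmetic, assuming two hardware threads per core with HT (the function name `logical_cpus` is only illustrative):

```python
def logical_cpus(sockets: int, cores_per_socket: int, hyperthreading: bool) -> int:
    """Return the number of CPUs the operating system would see."""
    # Assumption: Hyper-Threading exposes two hardware threads per core.
    threads_per_core = 2 if hyperthreading else 1
    return sockets * cores_per_socket * threads_per_core


# The three hardware generations from the slide: (sockets, cores per socket, HT).
generations = [(2, 2, False), (4, 8, True), (8, 15, True)]

for sockets, cores, ht in generations:
    print(f"{sockets} sockets x {cores} cores, HT={ht}: "
          f"{logical_cpus(sockets, cores, ht)} CPUs")
```

For the 8-socket system this gives 120 CPUs without and 240 CPUs with Hyper-Threading, matching the figures above.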

#1 Platform for SAP HANA: SUSE Linux Enterprise Server for SAP Applications

Automate SAP HANA System Replication

SAP HANA Business Continuity

High Availability (HA) per datacenter
• SAP HANA Host Auto Failover (scale-out with standby)
• SAP HANA System Replication

Disaster Recovery (DR) between datacenters
• SAP HANA System Replication
• SAP HANA Storage Replication

The slide labels each option as provided either by SAP or by the hardware (HW).

Automate SAP HANA System Replication

• SAP HANA System Replication: “sr_takeover” is a manual process.
• The SUSE High Availability Solution automates “sr_takeover”.
• The SUSE High Availability Solution improves the Service Level Agreement.

SAP HANA System Replication: Powered by SUSE High Availability Solution

[Diagram: node 1 runs the HANA PR1 primary and node 2 the HANA PR1 secondary with memory preload. System Replication connects the two HANA databases; the cluster is labeled active / active with resource failover.]

Performance optimized
• The secondary system is used entirely to prepare for a possible take-over
• Resources on the secondary are used for data pre-load
• Take-over and performance ramp-up times are shortened as far as possible

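From Concept to Implementation: SUSE High Availability Solution for SAP HANA

[Diagram: the cluster nodes suse01 and suse02 host the SAP HANA primary and the SAP HANA secondary. The SAPHana master/slave resource runs as master on one node and as slave on the other; the SAPHanaTopology clone resource runs as a clone on both nodes. A virtual IP (vIP), cluster communication and fencing complete the picture.]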

Four Steps to Install and Configure

Install SAP HANA

Configure SAP HANA System Replication

Install and initialize SUSE Cluster

Configure SR Automation using HAWK wizard


SAPHanaSR HAWK Wizard


What is the Delivery? SUSE Linux Enterprise Server for SAP Applications

The package SAPHanaSR
● the two resource agents
  ● SAPHanaTopology
  ● SAPHana
● HAWK setup Wizard (as technical preview)

The package SAPHanaSR-doc
● the important SetupGuide

Use-cases

Allowed Scenarios

Notation: => synchronous replication, -> asynchronous replication, + additional non-production system, % multi tenancy

• Scale-Up performance-optimized: A => B
• Scale-Up in a chain or multi tier: A => B -> C
• Scale-Up in a cost-optimized scenario: A => B + Q
• Scale-Up in a mixed scenario: A => B -> C + Q
• Now all with multi tenancy, here cost optimized: %A => %B + %Q
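The scenario notation used on this slide is regular enough to be read mechanically. The following is only an illustrative sketch, not part of SAPHanaSR: the `Scenario` class and `parse_scenario` function are hypothetical names, and the parser assumes that systems and operators are separated by spaces, as in `A => B -> C + Q`.

```python
from dataclasses import dataclass, field

# Replication operators from the slide: => synchronous, -> asynchronous.
REPLICATION_MODES = {"=>": "sync", "->": "async"}


@dataclass
class Scenario:
    replications: list = field(default_factory=list)  # (source, target, mode)
    colocated: list = field(default_factory=list)      # (host system, non-prod system)
    multi_tenant: bool = False


def parse_scenario(text: str) -> Scenario:
    """Parse a scenario string such as 'A => B -> C + Q'."""
    tokens = text.split()
    if len(tokens) % 2 == 0:
        raise ValueError(f"expected alternating systems and operators: {text!r}")
    systems, operators = tokens[0::2], tokens[1::2]
    scenario = Scenario()
    # A leading % marks a multi-tenant system; strip it to get the plain name.
    scenario.multi_tenant = any(name.startswith("%") for name in systems)
    names = [name.lstrip("%") for name in systems]
    for left, op, right in zip(names, operators, names[1:]):
        if op == "+":
            scenario.colocated.append((left, right))
        elif op in REPLICATION_MODES:
            scenario.replications.append((left, right, REPLICATION_MODES[op]))
        else:
            raise ValueError(f"unknown operator {op!r}")
    return scenario


mixed = parse_scenario("A => B -> C + Q")
print(mixed.replications)  # the sync link A to B, then the async link B to C
print(mixed.colocated)     # the non-prod system Q next to C
```

Reading the mixed scenario this way makes the chain explicit: one synchronous link, one asynchronous link, and a non-production system sharing the last tier.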

Single-tier System Replication

[Diagram: Pacemaker cluster with node 1 (SAP HANA PR1 primary) and node 2 (SAP HANA PR1 secondary), System Replication between the two PR1 systems, and a virtual IP (vIP).]

Performance optimized ( A => B )
● Secondary system completely used for the preparation of a possible take-over
● Resources used for data pre-load on Secondary
● Take-overs and Performance Ramp shortened maximally

starting with version 0.149

Single-tier System Replication and DEV / QAS

[Diagram: Pacemaker cluster with node 1 (SAP HANA PR1 primary) and node 2 (SAP HANA DEV and PR1 secondary), System Replication between the two PR1 systems, a virtual IP (vIP), and the additional DEV system on node 2.]

Cost optimized ( A => B + C )
● Operating non-prod systems on Secondary
● Resources freed (no data pre-load) to be offered to one or more non-prod installations
● During take-over the non-prod operation has to be ended
● Take-over performance similar to cold start-up
● Needs another disk stack for non-prod usage load

Multi Tier System Replication – Cascading Systems ( A => B -> C )

[Diagram: three cascading systems across two datacenters. Production replicates synchronously to a local standby with data preload, which replicates asynchronously to a remote standby system with or without preload (mixed usage with non-prod).]

Available since SAP HANA SPS7

Multi Tenancy (MCOD): Synchronizing multiple Databases within one System Replication

Multiple Components One Database (MCOD)
● Performance optimized: %A => %B
● Cost optimized: %A => %B + %Q
● Multi tier: %A => %B -> %C

[Diagram: Pacemaker cluster with node 1 (SAP HANA PR1 primary) and node 2 (SAP HANA PR1 secondary), each holding a system database (Sys) and the tenants A and B, System Replication between the two PR1 systems, and a virtual IP (vIP).]

● Tenants are databases within the SAP HANA database system
● System replication only replicates the complete database

beginning with version 0.151

SAP HANA Scale-Out

Current development, available with SP1

[Diagram: Pacemaker, Cluster 1 and Cluster 2 connected by System Replication, and a virtual IP (vIP).]

SAPHanaSR for Scale-Out

SAP HANA Scale-Out Explained: Worker and Standby Nodes

An SAP HANA scale-out database consists of multiple nodes and SAP HANA instances.

Each worker node has its own data partition.

Standby nodes do not have a data partition.

SAP HANA Scale-Out Explained: Master and Slave Nodes

An SAP HANA scale-out database consists of several services, such as the master nameserver (M).

The active master nameserver takes all client connections and redirects each client to the proper worker node. It always holds data partition 1.

Master candidates can be worker or standby nodes.

Typically, three nodes can become the active master nameserver.

SAP HANA Scale-Out – Worker Failure: Failing Worker Node or Instance

If a normal worker node fails, clients can still connect to the SAP HANA database.

However, requests that need data from the failed node cannot be processed.

SAP HA tries to repair this situation using a standby node.

SAP HANA Scale-Out – Worker Failure: Failing Worker Node

First, the SAP HANA HA storage API must guarantee that the old node no longer has access to the data (SAP STONITH).

Once the data partition is “free”, the failover can proceed.

SAP HANA Scale-Out – Worker Failure: Failing Worker Node

Any available standby node can take the “lost” data partition.

The standby node is now a worker node and loads the data.

The active master nameserver now redirects clients to the new node.

Once it is available again, the old worker becomes a standby.
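The worker failover sequence described on these slides (fence the old node, hand its data partition to a standby, turn the old worker into a standby) can be modelled in a few lines. This is a toy simulation under stated assumptions, not SAP's or SUSE's implementation: the landscape dictionary, the node names and `fail_over_worker` are all hypothetical.

```python
def fail_over_worker(landscape: dict, failed_node: str) -> str:
    """Move the data partition of a failed worker to a standby node.

    `landscape` maps a node name to {"role": "worker" or "standby",
    "partition": partition number or None}. Returns the node that took over.
    """
    failed = landscape[failed_node]
    if failed["role"] != "worker":
        raise ValueError(f"{failed_node} is not a worker")

    standbys = [name for name, node in landscape.items() if node["role"] == "standby"]
    if not standbys:
        raise RuntimeError("no standby node available, failover not possible")

    # Step 1: the old node must lose access to the data (SAP STONITH on the
    # slides). Here this is modelled simply by releasing the partition.
    partition = failed["partition"]
    failed["partition"] = None

    # Step 2: any available standby takes the "lost" partition and becomes a worker.
    new_worker = standbys[0]
    landscape[new_worker] = {"role": "worker", "partition": partition}

    # Step 3: the old worker is available again as a standby.
    failed["role"] = "standby"
    return new_worker


landscape = {
    "hana01": {"role": "worker", "partition": 1},
    "hana02": {"role": "worker", "partition": 2},
    "hana03": {"role": "standby", "partition": None},
}
print(fail_over_worker(landscape, "hana02"))
print(landscape)
```

The sketch deliberately refuses to fail over when no standby exists, which mirrors the point made later in the deck that a lost standby degrades the HA capacity of the site.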

SAP HANA Scale-Out – Worker Failure: Failing Worker Node

Summary

SAPHanaSR detects all failovers of worker nodes.

SAPHanaSR checks the overall landscape status of the SAP HANA database.

SAPHanaSR “follows” the decision of SAP HA and checks whether the failover is successful.

SAP HANA Scale-Out – Master Failure: Failing Master Node

The active master nameserver fails. All client connections are blocked.

Because the active master nameserver is also a worker node, SAP HA needs to fail over the active master role together with the worker part.

SAP HANA Scale-Out – Master Failure: Failing Master Node

Data partition 1 needs to be released (SAP STONITH).

One of the master nameserver candidates tries to take over the active master nameserver role.

Ideally this is a standby node, because otherwise its own data partition would also need to fail over.

SAP HANA Scale-Out – Master Failure: Failing Master Node

One of the master nameserver candidates wins, mounts data partition 1 and loads the data.

In the SAP HANA landscape, this new node is shown as the active master nameserver.
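The slides note that the new active master should ideally be a standby node, because a worker that wins the role would also have to hand over its own data partition. The sketch below only illustrates that preference; SAP HANA itself decides which candidate wins, and `pick_new_master` with its candidate dictionaries is a hypothetical helper.

```python
def pick_new_master(candidates: list, failed_master: str) -> dict:
    """Choose the next active master nameserver among the remaining candidates.

    Each candidate is a dict with "name" and "role" ("worker" or "standby").
    """
    remaining = [c for c in candidates if c["name"] != failed_master]
    if not remaining:
        raise RuntimeError("no master nameserver candidate left")
    # Prefer a standby: False sorts before True, and sorted() is stable,
    # so candidates with the same role keep their configured order.
    return sorted(remaining, key=lambda c: c["role"] != "standby")[0]


candidates = [
    {"name": "hana01", "role": "worker"},   # current active master
    {"name": "hana02", "role": "worker"},
    {"name": "hana03", "role": "standby"},
]
winner = pick_new_master(candidates, failed_master="hana01")
print(winner["name"], "mounts data partition 1 and becomes active master")
```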

SAP HANA Scale-Out – Master Failure: Failing Master Node

Summary

SAPHanaSR detects the failover of the active master nameserver and migrates the virtual IP address to that node.

SAPHanaSR allows clients to reconnect transparently, so they do not need to be configured for multiple access addresses.

SAPHanaSR also enables high availability for software that is not able to connect to different IP addresses.

SAP HANA Scale-Out – Standby Failure: Failing Standby Node or Instance

An SAP HANA standby fails. It can be either a master nameserver candidate or a “plain” standby.

SAP HA typically does not repair this situation.

The running SAP HANA database is not directly affected, but the HA capacity of the site is degraded.

SAP HANA Scale-Out – Standby Failure: Failing Standby Node

Summary

SAPHanaSR detects the outage of the SAP HANA standby node or instance.

SAPHanaSR restarts the failed SAP HANA standby instance if the node is still part of the Pacemaker cluster or is rejoining the cluster.

SAPHanaSR takes care of the SAP HA failover “capacity” and increases the built-in SAP high availability.

SAPHanaSR checks whether the situation allows the restart of the standby.
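The restart rule for a failed standby can be written as a small predicate. This is a hedged sketch of the rule as stated in the summary, not the resource agent's actual code: the `Membership` enum, the `restart_allowed` flag and `should_restart_standby` are assumptions made for illustration.

```python
from enum import Enum


class Membership(Enum):
    MEMBER = "member"        # node is still part of the Pacemaker cluster
    REJOINING = "rejoining"  # node is coming back into the cluster
    LOST = "lost"            # node is gone, nothing can be restarted on it


def should_restart_standby(instance_running: bool,
                           membership: Membership,
                           restart_allowed: bool = True) -> bool:
    """Decide whether a failed standby instance should be restarted."""
    if instance_running:
        return False  # nothing to repair
    if not restart_allowed:
        return False  # the overall situation does not permit a restart
    return membership in (Membership.MEMBER, Membership.REJOINING)


print(should_restart_standby(False, Membership.MEMBER))
print(should_restart_standby(False, Membership.LOST))
```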

SAP HANA Scale-Out System Replication: Scale-Out with System Replication (SR)

A Scale-Out SR scenario consists of two SAP HANA Scale-Out database systems.

SAP HANA Scale-Out System Replication: Nodes and Services

Synchronization of a Scale-Out system is done pairwise for all worker nodes and for services such as tenants.

System replication status: SOK

SAP HANA Scale-Out System Replication: Failing Synchronization

Each single replication could fail. SAPHanaSR detects such failures and excludes the secondary from site takeover.

System replication status: SFAIL

SAP HANA Scale-Out System Replication: Failing Primary

System replication status: SOK

SAPHanaSR detects the failing primary. Depending on the configuration and the system replication status, a takeover is processed.
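How configuration and replication status combine can be sketched as a decision function. This is an assumption-laden illustration rather than the agent's real logic: `react_to_primary_failure` is a hypothetical name, and the `prefer_site_takeover` flag is modelled on the "site takeover preference" mentioned later in the deck.

```python
def react_to_primary_failure(sr_status: str, prefer_site_takeover: bool) -> str:
    """Return the reaction to a failed primary site.

    sr_status is the last known system replication status: "SOK" when the
    secondary is in sync, "SFAIL" when replication has failed.
    """
    if sr_status not in ("SOK", "SFAIL"):
        raise ValueError(f"unknown system replication status {sr_status!r}")
    # A secondary with failed replication is excluded from site takeover,
    # because it may not hold the latest data.
    if sr_status == "SOK" and prefer_site_takeover:
        return "takeover"
    return "restart_primary"


for status in ("SOK", "SFAIL"):
    for prefer in (True, False):
        print(status, prefer, react_to_primary_failure(status, prefer))
```

Only the combination of an in-sync secondary and a configured takeover preference leads to a takeover; in every other case the sketch falls back to restarting the failed primary.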

SAP HANA Scale-Out System Replication: Failing Primary

System replication status: SOK → SFAIL

SAPHanaSR processes the takeover to the secondary site and switches the virtual IP address so that clients can reconnect transparently.

SAP HANA Scale-Out System Replication: Failing Primary

System replication status: SFAIL → SOK

Depending on the configuration, SAPHanaSR can register the failed primary and checks whether the new SR pair gets in sync.

SAP HANA Scale-Out System Replication: Failing Secondary

System replication status: SOK → SFAIL

SAPHanaSR detects failing secondary sites and tracks the system replication status to prevent sub-optimal takeovers.

SAP HANA Scale-Out System Replication: Failing Secondary

System replication status: SFAIL → SOK

SAPHanaSR processes the restart of the secondary site and checks the system replication status to allow optimal takeovers.

SAPHanaSR Scale-Out Conducting: Typical Failures and Reactions

Failure: Worker fails (node or instance)
SAPHanaSR: SAP HA processes the failover. If SAP HA fails, SAPHanaSR processes a takeover or restart.

Failure: Active master nameserver fails (node or instance)
SAPHanaSR: Like the worker failure. In addition, SAPHanaSR migrates the virtual IP address to the new active master nameserver.

Failure: Standby fails (node or instance)
SAPHanaSR: Processes an instance restart to re-establish the full SAP HA capacity.

Failure: Primary site fails
SAPHanaSR: Processes a takeover on the secondary or a restart of the failed primary, depending on configuration and system replication status.

Failure: Standby site fails
SAPHanaSR: Processes a database system restart to re-establish SAP HANA system replication.

SUSE SAPHanaSR in 3 Facts

Reduces complexity
- provides a wizard for easy configuration with just SID, instance number and IP address
- automates the sr_takeover and IP failover ("bind")

Reduces risk
- always includes a consistent picture of the SAP HANA topology
- provides a choice for automatic registrations and site takeover preference

Increases reliability
- provides short takeover times, especially for table preload scenarios
- includes the monitoring of the system replication status to increase data consistency

Thank you.

Start your SAPHanaSR project today and visit www.suse.com/products/sles-for-sap/

Corporate Headquarters
Maxfeldstrasse 5
90409 Nuremberg
Germany

+49 911 740 53 0 (Worldwide)
www.suse.com

Join us on: www.opensuse.org

Unpublished Work of SUSE LLC. All Rights Reserved.
This work is an unpublished work and contains confidential, proprietary and trade secret information of SUSE LLC. Access to this work is restricted to SUSE employees who have a need to know to perform tasks within the scope of their assignments. No part of this work may be practiced, performed, copied, distributed, revised, modified, translated, abridged, condensed, expanded, collected, or adapted without the prior written consent of SUSE. Any use or exploitation of this work without authorization could subject the perpetrator to criminal and civil liability.

General Disclaimer
This document is not to be construed as a promise by any participating company to develop, deliver, or market a product. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. SUSE makes no representations or warranties with respect to the contents of this document, and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. The development, release, and timing of features or functionality described for SUSE products remains at the sole discretion of SUSE. Further, SUSE reserves the right to revise this document and to make changes to its content, at any time, without obligation to notify any person or entity of such revisions or changes. All SUSE marks referenced in this presentation are trademarks or registered trademarks of Novell, Inc. in the United States and other countries. All third-party trademarks are the property of their respective owners.