Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Exadata Maximum Availability Architecture: Best Practices and Recommendations
Michael Nowak, MAA Solutions Architect, Oracle Server Technologies
Eric Bezille, Chief Technologist, Oracle Cloud Infrastructure, Oracle France
October 23, 2018
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Program Agenda
1. Exadata & Maximum Availability Architecture (MAA) Overview
2. Exadata MAA: New Features and Best Practices *
3. Exadata MAA: Sneak Peek Into Future Best Practices and Features
4. BONUS! Customer HA Success Stories from Oracle France
* Includes a sampling of some new lifecycle operations best practices and expectations
Exadata MAA: Overview
Cost of Downtime
https://devops.com/real-cost-downtime/
For the Fortune 1000, the average total cost of unplanned application downtime per year is $1.25 billion to $2.5 billion.
The average hourly cost of an infrastructure failure is $100,000 per hour. The average cost of a critical application failure per hour is $500,000 to $1 million.
British Airways lost £80 million: https://www.reuters.com/article/us-iag-ceo/british-airways-ceo-puts-cost-of-recent-it-outage-at-80-million-pounds-idUSKBN1961H2
Beyond the monetary cost, company reputation and customer loyalty are also affected.
Example from CNBC today, 10/23: "Amazon's move off Oracle caused Prime Day outage in big Ohio warehouse, internal report says"
Oracle Maximum Availability Architecture (MAA)
• Applying 25+ years of lessons learned solving the toughest HA problems around the world
• Solutions to reduce downtime for planned & unplanned outages for enterprise customers with the most demanding workloads and requirements
• Service level oriented architectures
• MAA integrated Engineered Systems and Cloud
• Continuous feedback into products and Cloud
• Books, white papers, blueprints
High Availability, Disaster Recovery and Data Protection
[Diagram: production copy protected by database replication. Protect your data; maintain your service level.]
https://oracle.com/goto/maa
Oracle MAA Availability Tiers
Availability Service Levels for Unplanned and Planned Maintenance
• BRONZE (Dev, Test, Prod): Backup and Recovery. Local & remote backups; Zero Data Loss Backup to the Cloud use case.
• SILVER (Prod/Departmental): Bronze + Zero Downtime High Availability. Active/active database clustering (RAC) + backup & recovery.
• GOLD (Business Critical): Silver + Zero Data Loss HA and DR. Remote replication, zero data loss, reduced downtime; Zero Data Loss DR to the Cloud use case.
• PLATINUM (Mission Critical): Gold + Zero Downtime Maintenance / Migration. Zero Downtime GoldenGate Cloud Svc.; advanced capabilities for zero application outages and zero data loss.
http://www.oracle.com/technetwork/database/availability/maa-reference-architectures-2244929.pdf
Exadata: Built-in High Availability
• Redundant Database Servers
– Active-active highly available clustered servers
– Hot-swappable power supplies and fans
– Redundant power distribution units
– Integrated HA software/firmware stack
• Redundant Network
– Redundant 40Gb/s InfiniBand connections and switches
– Client access using HA bonded networks
– Integrated HA software/firmware stack
• Redundant Storage Grid
– Data mirrored across storage servers
– Redundant, non-blocking I/O paths
– Integrated HA software/firmware stack
Happy Birthday Exadata MAA
10 Years, Countless HA Features and Best Practices, World Class HA
https://www.oracle.com/technetwork/database/features/availability/exadata-maa-best-practices-155385.html
High Availability for Maximum Application Uptime
"Exadata and SuperCluster both achieve AL4 fault tolerance in a Maximum Availability Architecture configuration"
FIVE NINES (5x9): 99.999% availability. A new gold standard.
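Five-nines arithmetic is worth making concrete: the downtime budget implied by an availability level follows directly from the percentage. A minimal sketch (365-day year assumed):

```python
# Downtime budget implied by an availability percentage (365-day year).
def downtime_minutes_per_year(availability_pct: float) -> float:
    minutes_per_year = 365 * 24 * 60
    return (1 - availability_pct / 100) * minutes_per_year

print(round(downtime_minutes_per_year(99.999), 2))  # 5.26 -- "five nines"
print(round(downtime_minutes_per_year(99.9), 1))    # 525.6 -- "three nines"
```

So AL4 fault tolerance at 99.999% leaves roughly five minutes of unplanned downtime per year.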
Exadata MAA Evolution: Customer vs. Oracle Responsibilities
• On-Premises
– Customer: infrastructure management; architecture; configuration, tuning; database management; lifecycle operations; application performance
– Oracle: blueprints; feedback into products & features
• On-Premises Exadata
– Customer: infrastructure management; architecture; database management; configuration, tuning; lifecycle operations; application performance
– Oracle: blueprints; Exadata is the best integrated MAA DB platform
• Database / Exadata Cloud
– Customer: architecture; database management (tooling); configuration, tuning; lifecycle operations (tooling); application performance
– Oracle: owns and manages the best integrated MAA DB platform; cloud automation for provisioning and lifecycle operations
• Autonomous Database
– Customer: choosing the SLA policy; application performance
– Oracle: owns and manages infrastructure; policy-driven deployments; MAA integrated cloud; fully automated Self-Driving, Self-Securing, Self-Repairing Database
Exadata MAA: New Features and Best Practices
Smart Handshake For Storage Server Shutdown
• Clear communication to the diskmon process on the database servers when storage is shut down prevents errors and application blackouts. Your service level will smile!
[Diagram: database tier and storage tier coordinating during shutdown]
Grid Infrastructure 12c / Exadata 12.1+
Summary: Smart Handshake For Storage Server Shutdown
• Feature Oracle Has Provided: graceful database tier handling during storage server shutdown
• Best Practices You Can Implement (Tips!): use graceful shutdown procedures. Related: use patchmgr for storage server software updates, as it ensures grid disks are handled properly.
• Service Level Impact Expectations: no blackouts when the storage tier is shut down for maintenance; no false positive errors/alerts
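A graceful storage server shutdown can be sketched with CellCLI as below. This is a hedged outline of the flow, not the full MOS procedure; confirm against the current Exadata documentation, and prefer patchmgr for software updates since it performs these checks for you.

```shell
# 1) Confirm ASM can tolerate taking this cell's grid disks offline;
#    proceed only if every disk reports asmdeactivationoutcome = Yes
cellcli -e "LIST GRIDDISK ATTRIBUTES name, asmmodestatus, asmdeactivationoutcome"

# 2) Inactivate grid disks so diskmon on the database servers is informed
cellcli -e "ALTER GRIDDISK ALL INACTIVE"

# 3) Stop cell services, then power off or patch the storage server
cellcli -e "ALTER CELL SHUTDOWN SERVICES ALL"
```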
Database Tier IO Cancel (Oracle Grid Infrastructure & Database 18c)
[Diagram: database tier issuing IOs to the storage tier. Slow IO? Hung IO? Sick disk? Undiscovered hardware/software issue? Existing storage-tier protections: Cell IO Latency Capping, IO hang detection/repair, disk confinement. New database-tier protection: Database Tier IO Latency Capping keeps the IOs pumping.]
Summary: Database Tier IO Cancel
• Feature Oracle Has Provided: protection from uncommon storage tier stalls/hangs
• Best Practices You Can Implement (Tips!): nothing! Completely transparent.
• Service Level Impact Expectations: stable service level achieved through IO redirection on stalls/hangs
Smart OLTP Caching (Oracle Grid Infrastructure 19c)
Under the covers of this quarter rack with high redundancy:
• Cell with the primary mirror populated in the super low latency DRAM cache
• Cell with the secondary mirror populated in the low latency flash cache
• Cell with the tertiary mirror populated on high latency hard disk
• DBWR evicts the buffer while freeing up space in the database buffer cache
Smart OLTP Caching: Maintaining SLAs During Storage Failures
• SaaS application reading data from the primary mirror
• Storage failure on the cell containing the primary mirror
• No problem: just retrieve data from the secondary mirror on flash with low latency
• The tertiary mirror continues to provide protection, just in case it's one of those days
• After the storage failure is repaired and the cell caching state is deemed healthy again, return to the primary mirror
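The failover order above can be sketched as a toy read path: try the lowest-latency healthy mirror first, then fall back. This is purely illustrative, not Oracle code; the cell names and latency figures are invented.

```python
# Hypothetical latencies (microseconds) per cache tier -- invented figures.
LATENCY_US = {"dram": 20, "flash": 200, "disk": 5000}

def read_block(mirrors, healthy):
    """Return (cell, latency) of the first healthy mirror.

    mirrors: list of (cell, tier) ordered primary -> secondary -> tertiary.
    healthy: set of cells currently serving IO.
    """
    for cell, tier in mirrors:
        if cell in healthy:
            return cell, LATENCY_US[tier]
    raise IOError("no healthy mirror available")

mirrors = [("cel01", "dram"), ("cel02", "flash"), ("cel03", "disk")]
print(read_block(mirrors, {"cel01", "cel02", "cel03"}))  # ('cel01', 20)
print(read_block(mirrors, {"cel02", "cel03"}))           # ('cel02', 200)
```

The point of the real feature is that the fallback read still lands in a cache tier, so latency degrades gently rather than dropping to spinning disk.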
Summary: Smart OLTP Caching
• Features Oracle Has Provided: low latency IO even after storage failures; smart access to repaired storage only after it is fully cached
• Best Practices You Can Implement (Tips!): nothing! This feature works automatically and transparently.
• Service Level Impact Expectations: no cache misses on storage failure or repair = no performance related service level interruptions
Dynamic HugePages (Oracle Database 19c)
• 18c and lower: the buffer cache uses hugepages only if the operating system (sysctl -w ...) and the database (alter system set ...) are manually kept in sync.
• 19c: the DISM background process manages hugepages dynamically; use oedacli for the related lifecycle operations.
Summary: Dynamic HugePages
• Features Oracle Has Provided: Dynamic HugePages remove the need for manual configuration; oedacli automates many Exadata lifecycle operations including database drop/add
• Best Practices You Can Implement (Tips!): in 18c and lower, set use_large_pages=ONLY. In 19c, set use_large_pages=AUTO_ONLY. Use oedacli for lifecycle operations including database creation.
• Service Level Impact Expectations: stable service level achieved through proper use of hugepages
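Concretely, the settings referenced above look roughly like this. The hugepage count is a hypothetical value sized for one SGA (2 MB pages assumed); treat this as a sketch, not a sizing recommendation.

```
# OS side, 18c and lower: reserve hugepages manually
# (24576 x 2 MB pages ~ 48 GB -- hypothetical sizing)
sysctl -w vm.nr_hugepages=24576

-- Database side, 18c and lower: refuse to start unless the SGA fits in hugepages
ALTER SYSTEM SET use_large_pages=ONLY SCOPE=SPFILE;

-- Database side, 19c: let the instance manage hugepages dynamically
ALTER SYSTEM SET use_large_pages=AUTO_ONLY SCOPE=SPFILE;
```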
Smart Rebalance For High Redundancy Diskgroups (Oracle Grid Infrastructure 19c)
• Problem: rebalance runs out of space after disk failure (ORA-15041)
• Solution for 18c and lower: reserve free space, and run exachk, which reports on compliance with the MAA best practice:
– 15% free with a normal or high redundancy diskgroup having < 5 Exadata cells (GI versions 12.2 and 18c)
– 9% free with a normal or high redundancy diskgroup having 5 or more Exadata cells (GI versions 12.2 and 18c)
• Solution for 19c with high redundancy diskgroups: smart rebalance, with 0% free space required
– If there is not enough space to rebalance at the time of failure, offline the disk
– Upon replacement, efficiently repopulate it from partner disks automatically
– Like they say in New Jersey at the gas station: "Fill 'er up!"
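The reservation rule above can be captured in a small helper; the thresholds come straight from the slide, and the version check is a simplified sketch.

```python
def required_free_pct(gi_version: str, redundancy: str, num_cells: int) -> float:
    """Minimum ASM diskgroup free space (%) to reserve so a rebalance
    after disk failure does not hit ORA-15041, per the MAA rule above."""
    if gi_version.startswith("19") and redundancy == "high":
        return 0.0                         # 19c smart rebalance: no reservation
    return 15.0 if num_cells < 5 else 9.0  # GI 12.2 / 18c rule

print(required_free_pct("18.0", "high", 3))    # 15.0
print(required_free_pct("18.0", "normal", 7))  # 9.0
print(required_free_pct("19.3", "high", 3))    # 0.0
```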
Summary: Smart Rebalance For High Redundancy Diskgroups
• Feature Oracle Has Provided: elimination of the need to reserve free space for rebalance when using high redundancy
• Best Practices You Can Implement (Tips!): use high redundancy diskgroups. Use the MAA Exadata best practice power limit of 4. If desired, the ASM REPLACE DISK issued by Exadata auto disk management can be monitored in gv$asm_operation.
• Service Level Impact Expectations: high redundancy and seamless repair without risk of out-of-space errors = no service level impact
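The monitoring tip above looks like the following in SQL*Plus. The diskgroup, disk, and path names are hypothetical; on Exadata the REPLACE DISK is normally issued by auto disk management, so you would typically only run the query.

```sql
-- Watch a running rebalance / replace operation
SELECT inst_id, group_number, operation, state, power, est_minutes
  FROM gv$asm_operation;

-- The kind of statement Exadata auto disk management issues after a
-- failed disk is physically replaced (names and path are illustrative):
ALTER DISKGROUP data
  REPLACE DISK data_cd_03_cel01
  WITH 'o/192.168.10.1/DATA_CD_03_cel01'
  POWER 4;
```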
Exadata OVM Best Practices
• Out of scope for today, but see our recently updated recommendations (https://www.oracle.com/technetwork/database/availability/exadata-ovm-2795225.pdf) covering the following:
– Use Cases
– Exadata OVM Software Requirements
– Exadata Isolation Considerations
– Exadata OVM Sizing and Prerequisites
– Exadata OVM Deployment Overview
– Exadata OVM Administration and Operational Life Cycle
– Migration, HA, Backup/Restore, Upgrading/Patching
– Monitoring, Resource Management
Exadata: MAA Exadata Lifecycle Operations
• Software Maintenance
• Compute Elasticity
• Storage Elasticity
Recommended Update Schedule (Software Maintenance)

Frequency     Database / Grid          Exadata
3-12 months   Release Update (RU)      Sustaining Release
1-4 years     Annual Feature Release   Feature Release

• All software maintenance for Exadata: MOS note 888828.1
• Quality maintenance readiness with exachk: version recommendation; critical issues exposure report
• Late-breaking issues: MOS Alerts for Hot Topics
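A cadence like the one in the table is easy to self-audit. The sketch below is an illustrative helper (not an Oracle tool); the component names, window lengths, and dates are assumptions chosen to match the table's ranges.

```python
from datetime import date

# Outer bound of each recommended window from the table above (days).
CADENCE_DAYS = {"release_update": 365, "feature_release": 4 * 365}

def overdue(kind: str, last_applied: date, today: date) -> bool:
    """True when a component has gone longer than its recommended window."""
    return (today - last_applied).days > CADENCE_DAYS[kind]

print(overdue("release_update", date(2017, 1, 15), date(2018, 10, 23)))  # True
print(overdue("feature_release", date(2016, 5, 1), date(2018, 10, 23)))  # False
```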
Drop/Add Cells and Diskgroup Resizing
• Best Practices (Tips!)
– Use the Exadata best practice default asm_power_limit of 4 (total across clusters)
– For drop cell, follow the MAA best practice for space reserved to restore redundancy during rebalance (i.e. avoid ORA-15041)
– Keep the number of diskgroups per cluster to a minimum (e.g. DATA and RECO), both for simplicity and to avoid rebalances getting queued (only one rebalance can run per db node at a time)
– Leverage oedacli to simplify/automate the process
– Run exachk
• Service Level Impact Expectations
– Zero to low impact, because data cached in the original cell's flash cache prior to the operation is proactively cached in the new cell's flash cache during rebalance
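In SQL terms, a drop-cell operation following the tips above might look like this sketch. The diskgroup and failgroup names are hypothetical, and on Exadata the grid-disk side is best driven through oedacli rather than hand-typed ASM statements.

```sql
-- Confirm the cluster-wide rebalance ceiling (MAA best practice: 4)
SHOW PARAMETER asm_power_limit;

-- Drop one cell's disks from a diskgroup ahead of removing the cell
-- (diskgroup and failgroup names are illustrative)
ALTER DISKGROUP data DROP DISKS IN FAILGROUP cel05 REBALANCE POWER 4;

-- Only one rebalance runs per node at a time, so watch for queuing here
SELECT group_number, operation, state, power FROM gv$asm_operation;
```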
Real-World MAA Project Example: Add Cell
[Figure: cell single block read histograms from the Exadata AWR report, comparing the baseline histogram with the histogram during the add-cell operation]
Exadata MAA Solution Integration
• On-Premises Exadata: all Exadata MAA configuration best practices baked in; Exadata MAA operational best practices implemented by the customer
• Exadata Cloud / Autonomous DB: all Exadata MAA configuration best practices baked in; all Exadata MAA operational best practices baked in
Exadata MAA: Sneak Peek Into Future Best Practices and Features
Customer HA Success Stories from Oracle France
Utilities Customer
• Customer Context
– Critical BPEL application managing the deployment of 10+ million devices (and growing), with a high level of database IO load
– Deadline for the next device deployment at high risk
– High SLA, 24/7
[Diagram: BPEL front-end servers on the front-end network; 4x (32 cores, 396 GB RAM) Oracle DB RAC nodes attached to high-end SAN storage]
[Chart: device count growing to 40 million, a x4 increase]
Utilities Customer: Solution
• Solution
– Replace the existing high-end x86 servers + SAN storage with Exadata
– 1 production Exadata, 1 DRP Exadata, 1x dev/integration, 1x performance tests
– Configuration: 5x compute nodes, 3x Extreme Flash, 3x High Capacity, high redundancy
– Active Data Guard
[Diagram: OSB and BPEL front end; 5x (48 cores, 768 GB RAM) Oracle DB RAC nodes; 3x Extreme Flash (EF) and 3x High Capacity (HC) storage cells on InfiniBand, with routed external links (R.EXT.)]
Utilities Customer: Results
• Results
– Deployment in 8 weeks
– Database improvements exposed bottlenecks at the application layer => moved the front end from virtualized servers to bare metal
– Implemented backup to the FRA: HC = 4 GB/sec, EF = 12 GB/sec
– Tested against hyperconverged infrastructure: Exadata delivered x2 to x11 better performance
– Customer sleeps well
Architecture
[Diagram: two datacenters > 50 km apart, connected by Active Data Guard over routed external links (R.EXT.)
• PROD: OSB + BPEL tiers, 5x (48 cores, 768 GB RAM) Oracle DB RAC nodes on InfiniBand; Extreme Flash storage cells, triple mirror (44 TB flash (EF) + 106 TB disk (HC)); local FRA backup
• PRE-PROD / local failover and PERF./TESTS: High Capacity storage cells, triple mirror; backup
• DRP: OSB + BPEL stack with Extreme Flash cells; backup]
Fashion Customer
• Customer Context
– Securing the SAP ERP database backend
– Providing scalability and performance for the enterprise DWH
– Securing the POS database backend for the new 24/7 retail shop
– Providing capabilities to evolve gradually to a Database as a Service solution
• Solution
– Dual datacenter deployment
– Upgrade their existing Exadata deployment to support full rolling upgrade
– Introduce virtualization
– Implement a 5-day backup in the FRA
– Configuration: 3x compute nodes, 3x High Capacity cells, high redundancy
• Results
– Improvement of the patching procedure
– Improvement in the ability to restore very quickly
– Ability to segregate workloads and options
– Customer has room for growth
[Diagram: Production site (3 compute nodes, 3 HC cells hosting Prod1, Prod2, DR Test) and a DRP / non-prod site (2 compute nodes, 3 HC cells)]