Data Protection, Recovery, and HA for Private Cloud Deployments
Lawrence To, Sr. Director, MAA Development, Oracle High Availability Systems
Joseph Meeks, Sr. Director, Product Management, Oracle High Availability Systems
Seungtaek Lee, Principal Engineer, CI-TEC, Samsung
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
Program Agenda
1. Data Protection and Oracle MAA
2. Bronze and Silver Data Protection
3. Gold and Platinum Data Protection
4. Data Protection as a Service
5. Samsung MAA Architecture for Private Cloud
A Sky-is-Blue Statement
High Availability
Data Protection
Inadequate Data Protection = Downtime
U.S. State Government – SAN memory failure, problem mirrored to standby SAN.
European Cloud Infrastructure Provider – Storage array failed, unable to read tape backups used for DR
5-day outage
Global Specialty Retailer – Disk failure, followed by mirrored disk failure. Restore from local backup failed. Restore using copy at DR site also failed.
8-day outage
5-day outage
Oracle Data Protection and Availability Design Principles
Data Protection at Every Level
Strong Fault Isolation: Real-Time Validation
Real-time HA/DR: All Components Active
Oracle Maximum Availability Architecture (MAA)
Production Site
RAC – Scalability – Server HA
ASM – Local storage protection
Flashback – Human error correction
Active Replica
Active Data Guard – Data Protection, DR – Query Offload
GoldenGate – Active-active replication – Heterogeneous
RMAN, Oracle Secure Backup, Recovery Appliance – Backup to disk, tape or cloud
Edition-based Redefinition, Online Redefinition, Data Guard, GoldenGate – Minimal downtime maintenance, upgrades, migrations
Enterprise Manager Cloud Control – Site Guard, Coordinated Site Failover
Application Continuity – Application HA
Global Data Services – Service Failover / Load Balancing
Oracle MAA Availability Tiers: Availability Service Levels for Unplanned and Planned Outages
BRONZE
• Basic Service Restart
• Backups plus redo for Oracle data protection
SILVER
• High Availability (HA) for Recoverable Local Outages
• Backups plus redo for Oracle data protection
GOLD
• Comprehensive HA and Disaster Protection
• Recovery in seconds with zero or near-zero data loss
PLATINUM
• Zero Outage for Platinum Ready Applications
• Zero data loss
Reference Architectures: Oracle MAA Availability Tiers
BRONZE: Single Instance, Backups
SILVER: Clusters, Backups
GOLD: Clusters, Replication
PLATINUM: Platinum-Ready Apps, Clusters and Replication
Problem Statement: Lack of Intelligent Data Validation
Data can be corrupted anywhere and at any time, and the corruption can go undetected until the data is touched. A checksum alone is not sufficient.
Backups and DR without validation are an enormous risk: they do not guarantee that recovery will work or that it will meet recovery SLAs.
Validation is helpful everywhere: I/O, memory, storage, Oracle data block, inter-block, database and application.
How do we know restore and recovery will succeed? Is my mirrored copy corrupt too? Can I achieve recovery SLAs?
Third-Party Backups Lack Data Protection and Validation
• A backup is meaningless if it does not result in successful recovery
• A recover operation can fail if:
– Backup script is incorrect or incomplete (e.g. missing data files, archives, control files)
– Backup operation is incorrect (e.g. online backups without database in backup mode)
– Backups are corrupted (from source, from storage or media)
• Most backup appliances do not have ongoing checks and validations
• Reality: The inability to recover successfully results in extended downtime, lost revenue, damaged reputations…and career changes.
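The first failure mode above, an incomplete backup script, can be caught long before a restore is attempted by comparing what the database needs with what the backup contains. The following is a minimal sketch in Python under stated assumptions: the file names and the `missing_from_backup` helper are hypothetical, made up for this illustration, and this is not how RMAN or any backup appliance performs validation.

```python
def missing_from_backup(required_files, backed_up_files):
    """Return the required files that the backup does not contain, sorted."""
    return sorted(set(required_files) - set(backed_up_files))


# Hypothetical file lists, for illustration only.
required = {"system01.dbf", "users01.dbf", "control01.ctl"}
backed_up = {"system01.dbf", "control01.ctl"}

gaps = missing_from_backup(required, backed_up)
if gaps:
    print("Backup is incomplete, missing:", ", ".join(gaps))
```

A check of this kind only covers completeness. It says nothing about whether the backed-up blocks are themselves readable or uncorrupted, which is the third failure mode in the list.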
A Story about Backup and Recovery...
• The story starts out great.
• Backups completed successfully with no errors.
• Everything appears fine.
RMAN-03091: Finished backup at 11-SEP-14
RMAN>
Recovery Manager complete
Then a Failure Occurs...
I hope this works....
Restore and Recovery is Required
Recovery Gone Wrong, Example 1: Missing Data File or Archive Gap
RMAN-03002: failure of restore command at 09/11/2014 07:48:13
RMAN-06026: some targets not found – aborting restore
RMAN-06023: no backup or copy of datafile 4 found to restore
Recovery Gone Wrong, Example 2: Backups are corrupt, likely due to corrupt backup media
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00001 to /SHARED1/ORADATA/DBF/orcloow1/system01.dbf
channel ORA_DISK_1: reading from backup piece /SHARED1/ORADATA/FRA/Orpiafbb_1_1ORCL00W1
channel ORA_DISK_1: ORA-19870: error while restoring backup piece /SHARED1/ORADATA/FRA/Orpiafbb_1_1O...
ORA-19612: datafile 1 not restored due to missing or corrupt data
Bronze and Silver Data Protection
We can do a much better job preventing and repairing corruptions in real time.
I’d like to know that my backups are validated when they are created, and on a regular basis to make sure they are good. I want to be alerted whenever a database can NOT meet my recovery SLAs.
Bronze - Single Instance Oracle Database (MOS 1302539.1): Oracle Data Protection
Type | Capability | Physical Block Corruption | Logical Block Corruption
Manual | Dbverify, Analyze | Physical block checks | Logical checks for intra-block and inter-object consistency
Manual | RMAN, ASM | Physical block checks | Intra-block logical checks
Runtime | Database | In-memory block and redo checksum | In-memory intra-block checks
Runtime | ASM | Automatic corruption detection and repair using extent pairs |
Runtime | Exadata | HARD checks on write, automatic disk scrubbing and repair | HARD checks on write
MAA: Data Protection for All Databases
Validation, Detection and Repair in Memory, during I/O and on Disk
DB_BLOCK_CHECKSUM=FULL (Optional DB_BLOCK_CHECKING)
• Computes checksum on change and catches corruptions in memory
• Validates checksum on read and update (DETECTION)
• Prevents a corrupted block from being written to disk (PREVENTION)
• Recovers using good data block and redo (REPAIR)
Automatic Storage Management
• Data corruption or I/O error triggers repair (DETECTION/REPAIR)
• Oracle semantics aware
• Reads extent copies for good copy (PREVENTION of ERROR)
• Good writes can correct existing corruptions (REPAIR)
Exadata HARD and Automatic Disk Scrub and Repair
• Prevents physical corruption during writes (OS to storage) (PREVENTION)
• Inspects and repairs hard disk corruption that resides on storage (DETECTION)
• Calls ASM to repair using good extent copy (REPAIR)
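The detect, read-good-copy, and repair cycle described above can be illustrated with a toy model. This is only a sketch under stated assumptions: CRC32 stands in for the block checksum, `read_with_repair` is a hypothetical helper, and the mirror copies are plain in-memory byte arrays. It does not reproduce the actual database, ASM, or Exadata logic.

```python
import zlib


def checksum(data: bytes) -> int:
    # Assumption: CRC32 is a stand-in for the real block checksum.
    return zlib.crc32(data)


def read_with_repair(copies, expected):
    """Return (good_data, repaired_count) for a mirrored block.

    copies:   list of bytearray mirror copies of the same block
    expected: checksum recorded when the block was last written
    Corrupt copies are overwritten in place with the first valid copy.
    """
    good = next((bytes(c) for c in copies if checksum(bytes(c)) == expected), None)
    if good is None:
        raise IOError("all mirror copies are corrupt")  # DETECTION, no repair possible
    repaired = 0
    for c in copies:
        if checksum(bytes(c)) != expected:
            c[:] = good  # REPAIR from the good mirror copy
            repaired += 1
    return good, repaired


block = b"row data"
expected = checksum(block)
copies = [bytearray(b"row dat\x00"), bytearray(block)]  # first copy is corrupt

data, repaired = read_with_repair(copies, expected)
print(data, repaired)
```

The caller receives valid data even though the first copy was bad, which mirrors the point of the slide: the repair happens underneath the reader.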
Bad SCN
Good SCN
With ASM Redundancy: Corruption Detection, Mirror Read, and Automatic Repair
Database update encounters corruption
Database reads ASM mirror copy and repairs corruption
Oracle logs the following for the administrator:
Corrupt block relative dba: 0x16400087 (file 89, block 135)
Bad check value found during multiblock buffer read
Data in bad block:
type: 6 format: 2 rdba: 0x16400087
last change scn: 0x0000.b6702b33 seq: 0x1 flg: 0x04
spare1: 0x0 spare2: 0x0 spare3: 0x0
consistency value in tail: 0x2b330601
check value in block header: 0xa07a
computed block checksum: 0x3
Reading datafile '+DATA/qs/datafile/c.257.825768683' for corruption at rdba: 0x16400087 (file 89, block 135)
Read datafile mirror 'DATA_CD_08_CELL13' (file 89, block 135) found same corrupt data (no logical check)
Read datafile mirror 'DATA_CD_07_CELL14' (file 89, block 135) found valid data
Hex dump of (file 89, block 135) in trace file /u01/app/oracle/diag/… /qs1_ora_60475.trc
Repaired corruption at (file 89, block 135)
Application continues to run without ever noticing the failure
Exadata Disk Scrubbing Combined with ASM Auto Repair: I/O Error Prevention
Disk sector goes bad
Cell disk scrub finds bad sector and ASM repairs it
Application never encounters the I/O error
Oracle logs the following for the administrator:
Wed Jul 16 17:00:06 2014
Begin scrubbing CellDisk:CD_06_cell06.
Begin scrubbing CellDisk:CD_07_cell06.
..
Wed Jul 16 18:33:05 2014
Read Error on Cell Disk CD_06_cell06 (/dev/sdg) at device offset 2794140467200 bytes with size 1048576 bytes (errno: Input/output error [5])
Read Error on Grid Disk RECOC1_CD_06_cell06 at grid disk offset 423268188160 bytes with size 1048576 bytes from disk scrub
Wed Jul 16 18:33:12 2014
Broadcast: 1 events ASM REPAIR diskgroup of opcode 10 for diskgroup RECOC1 to:
...
Finished scrubbing CellDisk:CD_06_cell06, scrubbed blocks (1MB):2860960, found bad blocks:2
Finished scrubbing CellDisk:CD_07_cell06, scrubbed blocks (1MB):2860960, found bad blocks:0
..
Exadata Hardware Assisted Resilient Data (HARD): Corruption Prevention with Automatic Retry
Network packet containing database write is corrupted
Cell prevents write of corrupt block and ASM retries write
Application never encounters a corruption
Oracle logs the following for the administrator:
Cell side:
Thu Sep 11 08:42:33 2014
HARD CHECK FAILED for ftyp=0 blksiz=512 blkno=0checks=1 startblk=33182326784 nblks=16
Database side:
Errors in file /u01/app/oracle/diag/rdbms/qs/qs1/trace/qs1_dbwf_41262.trc:
ORA-27603: Cell storage I/O error, I/O failed on disk o/192.168.10.29;192.168.10.30/DATAC1_CD_02_CELL7 at offset 151396352 for data length 8192
ORA-27626: Exadata error: 205 (HARD check failed)
WARNING: Write Failed, will retry. group:1 disk:74 AU:36 offset:401408 size:8192
Zero Data Loss Recovery Appliance Overview
Data Protection for Your Backups, Recovery for Your Business
Protected Databases: Protects all Oracle Databases
• Petabytes of data, any release
• No expensive backup agents
Delta Push
• Access and send only changes
• Minimal impact on production
• Data Guard-like real-time redo ship instantly protects new transactions
ZDLRA Delta Store – virtual full backups
• Stores validated, compressed changes on disk
• Fast restores to any point-in-time using deltas and redo
• Built on Exadata scaling and resilience
• Enterprise Manager end-to-end control
Offloads Tape Backup
Replicates to Remote ZDLRA for disaster recovery
ZDLRA Understands and Validates Database Formats: End-to-end Data Loss Protection from Corruptions
Recovery Appliance
• Data validated on receive
• Data periodically revalidated
• Data validated on restore
• Built using MAA practices
• ASM auto repair
• Exadata HARD checks and automatic disk scrub/repair
Remote Replica
• Data validated on receive, restore, and periodically
• ASM and Exadata checks and repair
Tape Archive
• Data validated when copied to and restored from tape
Policy-Based Database Protection as a Service
Protection Policies
• Easy-to-deploy
• Standardized
• Alerts when not meeting Recovery SLAs
Platinum and Gold Policy, Mission Critical: Disk 90 days, Tape 2 years, RPO 5 secs
Silver Policy, Business Critical: Disk 30 days, Tape 45 days, RPO 15 mins
Bronze Policy, Test/Dev: Disk 5 days, Tape 30 days, RPO 1 hour
Targets: Tape, Replica
Replica ZDLRA also policy based
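The policy tiers on this slide lend themselves to a small data structure plus an alert check. The following is a hedged sketch in Python: the `ProtectionPolicy` class, the `rpo_at_risk` function, and the conversion of 2 years to 730 days are assumptions made for illustration, not the Recovery Appliance's policy interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtectionPolicy:
    name: str
    disk_days: int      # recovery window kept on disk
    tape_days: int      # retention on tape
    rpo_seconds: int    # maximum tolerated data-loss exposure


# Values taken from the slide; "2 years" is approximated as 730 days.
POLICIES = {
    "gold": ProtectionPolicy("Platinum and Gold, Mission Critical", 90, 730, 5),
    "silver": ProtectionPolicy("Silver, Business Critical", 30, 45, 15 * 60),
    "bronze": ProtectionPolicy("Bronze, Test/Dev", 5, 30, 60 * 60),
}


def rpo_at_risk(policy: ProtectionPolicy, unprotected_seconds: float) -> bool:
    """True when the newest unprotected change is older than the policy's RPO."""
    return unprotected_seconds > policy.rpo_seconds


# A database whose last protected change is 20 minutes old:
lag = 20 * 60
for key, policy in POLICIES.items():
    print(key, "ALERT" if rpo_at_risk(policy, lag) else "ok")
```

With a 20-minute lag the Gold and Silver policies would raise an alert while Bronze would not, which is the kind of per-policy SLA alert the slide describes.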
Single Console for End-to-End Visibility and Control
Immediate Alerts when Recovery Windows are at Risk
Let’s Summarize: BRONZE and SILVER
MAA parameters
ASM redundancy
RMAN Backups
ZDLRA so we can count on successful restore when required
Exadata-unique capabilities for the best database protection and availability
Storage Remote Mirroring Architecture
Problem: No real-time validation. Corruption and other problems are mirrored.
Oracle Instance (in memory)
Primary Database: Database Files, Recovery Files
SYNC or ASYNC block replication
Remote Volumes
Data corruptions are replicated
• Zero Oracle validation
• No Oracle block checks
• No database recovery checks
• No application validation
ORA-01578: ORACLE data block corrupt (file # 27, block # 331214)
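Why a content-blind mirror propagates corruption can be shown with a toy comparison. This sketch is an illustration only, under loudly stated assumptions: blocks carry a CRC32 prefix as a stand-in for a block checksum, and `mirror` and `validated_copy` are hypothetical functions. It is not a model of any storage array, and it is not how Data Guard works (Data Guard ships redo rather than copying blocks).

```python
import zlib


def make_block(payload: bytes) -> bytes:
    """Prefix the payload with a 4-byte CRC32 (assumed checksum format)."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def is_valid(block: bytes) -> bool:
    """Check the stored CRC32 prefix against the payload."""
    return int.from_bytes(block[:4], "big") == zlib.crc32(block[4:])


def mirror(blocks):
    """Content-blind replication: every byte is copied, good or bad."""
    return list(blocks)


def validated_copy(blocks):
    """Content-aware replication: return indexes of accepted and rejected blocks."""
    accepted, rejected = [], []
    for index, block in enumerate(blocks):
        (accepted if is_valid(block) else rejected).append(index)
    return accepted, rejected


good = make_block(b"abc")
bad = good[:-1] + b"\x00"          # last payload byte damaged in flight
source = [good, bad]

remote = mirror(source)
print([is_valid(b) for b in remote])   # the corrupt block arrived at the remote site
print(validated_copy(source))          # the corrupt block was caught before copying
```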
Gold and Platinum – Comprehensive Data Protection: Oracle Data Protection
Type | Capability | Physical Block Corruption | Logical Block Corruption
Manual | Dbverify, Analyze | Physical block checks | Logical checks for intra-block and inter-object consistency
Manual | RMAN, ASM | Physical block checks | Intra-block logical checks
Runtime | Active Data Guard | Continuous physical block checking at standby; strong isolation to prevent single point of failure; automatic repair of physical corruptions; automatic database failover | Detect lost write corruption, auto shutdown and failover; intra-block logical checks at standby
Runtime | Database | In-memory block and redo checksum | In-memory intra-block checks
Runtime | ASM | Automatic corruption detection and repair using extent pairs |
Runtime | Exadata | HARD checks on write, automatic disk scrub and repair | HARD checks on write
Active Data Guard Architecture
Oracle Aware Process Maintains an Exact Physical Copy of Production
Primary Database: Oracle Instance (in memory), Database Files, Recovery Files
SYNC or ASYNC database redo
Active Standby Database, open read-only: Oracle Instance (in memory), Redo Apply
Data corruption is isolated to primary
• Comprehensive run-time validation
– By Data Guard apply
– By read-only application workload
• Automatic repair of primary using good copy from standby
Automatic block media recovery requested for (file#6, block #8738)
Automatic block media recovery successful for (file#6, block #8738)
Active Data Guard Demonstrations
Environment: Primary, RAC Far Sync, Active Data Guard Standby
Primary Database
• Production instance
SYNC: limited distance
Far Sync Instance
• Oracle control file and log files
• No database files, no media recovery
• Offload transport compression
• Supports up to thirty remote destinations
ASYNC: any distance, transport compression over WAN
Active Standby Database
• DR and reporting instance
• Open read-only
• Continuous Oracle validation
• Zero data loss failover target
• Manual or automatic failover
Demonstration: Automatic Repair of Primary Data Corruption
Active Data Guard Configuration
• Data Guard Broker
• Primary Database
• Far Sync Instance
– Zero data loss over WAN
• Active standby
– Workload on both primary and standby
Configuration – DGconfig
Protection Mode: MaxAvailability
Members:
primary – Primary Database
farsync – Far sync instance
standby – Physical standby database
Primary Workload
13:46:54 175 8
13:46:55 173 8
...
Standby Workload
13:46:54 443 2
13:46:55 446 2
...
Corruption Detected at the Primary Database
Fri Sep 12 13:47:50 2014
Corrupt block relative dba: 0x00002222 (file 6, block 8738)
Completely zero block found during multiblock buffer read
Automatic Repair, No Application Error, Zero Downtime
Primary Workload
13:47:49 180 6
13:47:50 185 6
13:47:51 174 6
...
Standby Workload
13:47:49 446 1
13:47:50 450 1
13:47:51 455 1
...
Fri Sep 12 13:47:50 2014
Automatic block media recovery is requested for (file# 6, block 8738)
Fri Sep 12 13:47:51 2014
Automatic block media recovery successful for (file# 6, block 8738)
Demonstration: Automatic Repair of Standby Data Corruption
Corruption Detection and Auto-Repair at the Standby
Fri Sep 12 13:54:33 2014
Corrupt block relative dba: 0x00002222 (file 6, block 8738)
Completely zero block found during multiblock buffer read
Fri Sep 12 13:54:33 2014
Automatic block media recovery is requested for (file# 6, block 8738)
Fri Sep 12 13:54:34 2014
Automatic block media recovery successful for (file# 6, block 8738)
Primary Workload
13:54:33 181 5
13:54:34 180 5
...
Standby Workload
13:54:33 450 3
13:54:34 442 3
...
Demonstration: HA for Far Sync Instance
Primary disconnects and reconnects to second RAC Far Sync instance
Fri Sep 12 13:57:04 2014
LGWR: Error 1041 disconnecting from dest LOG_ARCHIVE_DEST_2 standby host 'farsyncp'
LGWR: RFS network connection re-established at host 'farsyncp'
LGWR: RFS destination opened for reconnect at host 'farsyncp'
Shutdown abort of the first RAC Far Sync Instance
Fri Sep 12 13:57:06 2014
Instance shutdown complete
Primary has brief brownout while network connection transitions from one node to the next
Data Protection Mode remains at Maximum Availability and RPO=0 is maintained
Primary Workload | Standby Workload
13:57:03 179 5 | 13:57:03 441 1
13:57:04 188 5 | 13:57:04 450 1
13:57:05 27 5 | 13:57:05 449 1
13:57:06 148 5 | 13:57:06 438 1
13:57:07 177 5 | 13:57:07 442 1
Demonstration: Automatic Failover with WAN Zero Data Loss
With Fast-Start Failover enabled, kill the LGWR process on the primary to induce failover
Oracle-> kill -9 3808
Data Guard Observer initiates and completes automatic failover
14:31:03.02 Initiating Fast-Start Failover to database “standby”...
Performing failover NOW, please wait...
Failover succeeded, new primary is “standby”
14:31:43.29
Total outage lasts 56 seconds, including detection, failover, and reconnect time
Primary Workload | Standby Workload
14:30:52 181 3 | 14:30:52 444 1
14:30:54 0 3 | 14:30:54 450 1
14:31:06 0 3 | 14:31:06 0 1
14:31:48 0 3 | 14:31:49 74 1
14:31:50 122 4 | 14:31:51 182 1
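The 56-second figure can be read off the primary workload samples: it is the time from the first zero-throughput sample to the first non-zero sample after it. Below is a small Python sketch of that arithmetic. It assumes that the first number after each timestamp is a throughput figure (the slide does not label the columns), and `outage_seconds` is a hypothetical helper written for this illustration.

```python
from datetime import datetime


def outage_seconds(samples):
    """Seconds from the first zero-throughput sample to the next non-zero one.

    samples: list of ("HH:MM:SS", throughput) tuples in time order.
    Returns None if throughput never drops to zero or never recovers.
    """
    start = None
    for stamp, throughput in samples:
        moment = datetime.strptime(stamp, "%H:%M:%S")
        if throughput == 0 and start is None:
            start = moment                      # outage begins
        elif throughput > 0 and start is not None:
            return int((moment - start).total_seconds())  # service is back
    return None


# Primary workload samples copied from the slide.
primary = [
    ("14:30:52", 181),
    ("14:30:54", 0),
    ("14:31:06", 0),
    ("14:31:48", 0),
    ("14:31:50", 122),
]

print(outage_seconds(primary))  # prints 56
```

Measured this way the outage covers detection, the failover itself (about 40 seconds between the two Observer timestamps), and client reconnect.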
Let’s Summarize: GOLD and PLATINUM
MAA parameters + lost_write + db_block_checking (standby only)
Active Data Guard provides continuous validation
Auto data block repair for primary and standby
Full utilization of standby for queries and reports
Fast database and application failover in seconds
Zero data loss with SYNC (LAN/MAN) or FAR SYNC (WAN)
Enterprise Manager Cloud Control 12c: Self-Service Provisioning
• Comprehensive support for all methods of consolidation
• Automated, intelligent placement
• Complete self-service catalog
• Flexible cloning architecture
• Integrated database lifecycle management
• API-driven (RESTful and command line)
Comprehensive Database Service Catalog: Enterprise Manager Cloud Control 12c
Primary and standby combinations (EM12c R4):
Primary | Standbys
SI | -
SI | SI
RAC | -
RAC | SI
RAC | RAC
RON | -
RON | RON
SI – Single Instance, RAC – Real Application Clusters, RON – RAC One Node
• High Availability offerings with Active Data Guard
• Supports Single Instance, RAC One Node, and RAC standby; Multiple standby environments allowed
• Support for Oracle Database 10.2.0.5, 11.1.0.7, 11.2+, 12.1+
• Define your own custom or MAA Service Levels / Metals, and also allow users to upgrade or downgrade across these levels
• Define different Database sizes based on CPU, Memory, Storage, IOPS, etc
BRONZE
SILVER
GOLD
Restart and FSFO in Cloud
□ Cloud Architecture before Restart and FSFO
[Diagram: a hypervisor pool of three hypervisors hosting the DB, protected only by Hypervisor HA]
Weak points
- No support for OS level HA solutions
- Only Hypervisor HA is applied
. Does not detect DB crashes
. After a DB crash, an OS reboot or Hypervisor failover is needed; the DBA must start the DB manually
- DR solution is not applied in case of:
. Hypervisor Pool Down
. Storage or Database file Failure
□ Architectural Improvement using Oracle Restart and Data Guard Fast-Start Failover (FSFO)
[Diagram: two groups of three hypervisors, each with Hypervisor HA and a DB under Oracle Restart; the two DBs are linked by FSFO with SYNC transport]
Hypervisor HA : Hypervisor Down
Oracle Restart : DB Crash in 30sec
Oracle FSFO : Hypervisor Down, Hypervisor Pool Down, DB File Corruption, DB Crash over 30sec, OS Reboot
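The division of labor between the three HA layers on this slide can be written down as a lookup table, which makes it easy to check that every failure type has at least one handler. This is a sketch only: the dictionary layout, the lowercase failure labels, and the `layers_for` and `uncovered` helpers are assumptions made for illustration, and reading "DB Crash in 30sec" as a crash that Restart resolves within 30 seconds is an interpretation of the slide.

```python
# Coverage as listed on the slide (labels paraphrased, layout assumed).
COVERAGE = {
    "Hypervisor HA": {"hypervisor down"},
    "Oracle Restart": {"db crash within 30 sec"},
    "Oracle FSFO": {
        "hypervisor down",
        "hypervisor pool down",
        "db file corruption",
        "db crash over 30 sec",
        "os reboot",
    },
}


def layers_for(failure):
    """Return the HA layers that list the given failure, in slide order."""
    return [layer for layer, failures in COVERAGE.items() if failure in failures]


def uncovered(failures):
    """Return the failures that no layer handles, sorted."""
    return sorted(f for f in failures if not layers_for(f))


print(layers_for("hypervisor down"))
print(layers_for("hypervisor pool down"))
print(uncovered(["os reboot", "site power loss"]))
```

The last call uses a made-up failure type, "site power loss", purely to show how a gap in coverage would surface.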
[Chart: Failover Duration Time (min), axis 0 to 35, for Server Down, OS Reboot, and DB Down, comparing Server Cloud (without Restart/FSFO), Restart, and Restart/FSFO. Bar labels in extraction order: 1, 1, 0.5; 10, 5, 0.5; 30, 25, 20]
Restart and FSFO Reduce Failover Time !!
Restart can reduce detect/start DB time (our assumption: 20 min)
Item | Server Cloud (As-Was) | Only Restart | Restart+FSFO
Block Corruption | Manual Recovery (over 3 hr) | Manual Recovery (over 3 hr) | Auto Repair
DB File Corruption | Manual Recovery (over 3 hr) | Manual Recovery (over 3 hr) | Failover to Standby (in 1 min)
Storage Down | Manual Recovery (over 3 hr) | Manual Recovery (over 3 hr) | Failover to Standby (in 1 min)
Hypervisor Pool Down | Manual Recovery (over 3 hr) | Manual Recovery (over 3 hr) | Failover to Standby (in 1 min)
FSFO Reduces Recovery Time Significantly !!
□ The Result of Availability Test
Category | Item | Recovery Time | HA Solution | Description
OS | DB Server Reboot | 46sec | FSFO | Executed Failover to Standby and Standby Reinstatement automatically
 | Observer Server Reboot | 0sec | MonObserver.sh | Observer restarted automatically after rebooting
 | DB LAN Card Fail | 44sec | FSFO | Executed Failover to Standby and Standby Reinstatement automatically
DB | DB Instance Crash | 26sec | Restart | DB Instance was restarted automatically
 | DB Listener Crash | 0sec | Restart | Listener was restarted automatically
 | GI Stop | 39sec | FSFO | Executed Failover to Standby, but Standby should be reinstated manually
 | Datafile Write Fail | 32sec | FSFO | Executed Failover to Standby, but Standby should be reinstated manually
Observer | Observer Fail | 0sec | MonObserver.sh | Observer restarted automatically
DG Broker | Manual Switch Over | 15sec | DG Broker | Executed Switch Over by DG Broker
 | Manual Fail Over | 15sec | DG Broker | Executed Failover and Automatic Standby Reinstatement
Hypervisor | Live Migration | 0sec | Hypervisor | Migrated to other Hypervisor online
Maximize Availability using Restart and FSFO
MonObserver.sh : Observer Restart Script, registered as a cron job
Non Stop Cloud
[Diagram: Active Data Center with A Zone, B Zone, and C Zone. Restart Pods consist of hypervisor pools (three hypervisors each) in which every DB runs under Oracle Restart; RAC Pods consist of hypervisor pools running three-node RAC databases (DB1, DB2, DB3). The pools include a Finance Pool. Links between pods are labeled FSFO with SYNC, and ADG (Manual) with SYNC/ASYNC.]
Key Takeaways
• Generic solutions for backup and replication put data at risk and make recovery uncertain
• Continuous validation using knowledge of Oracle block and redo structures is required
– Physical and logical validation
– Regularly scheduled background checks
– Run-time database checks
– Auto repair, transparent to the user when possible
– Automatic recovery and failover when required
• Cloud-enabled to deliver Data Protection as a Service
References
• Oracle Maximum Availability Architecture
– www.oracle.com/goto/maa
• Oracle Enterprise Manager 12c Cloud Management
– http://www.oracle.com/technetwork/oem/cloud-mgmt/em-dbaas-2104694.html
• The complete Samsung presentation
– www.oracle.com/technetwork/database/availability/restart-fsf-in-cloud-oow-2301693.pdf
Key HA Sessions and Demos by Oracle Development
Monday, 29 September, Moscone South
11:45 MAA with Oracle Multitenant – Seeing is Believing, 104
1:30 Oracle Database 12c HA for Consolidation and Cloud, 306
2:45 Zero Data Loss Recovery Appliance, New Era in Data Protection, 307
4:00 Oracle GoldenGate 12c for Oracle Database 12c, 305
5:15 Maximizing Oracle RAC Uptime, 103
Tuesday, 30 September, Moscone South
10:45 Active Data Guard and GoldenGate HA Best Practices, 308
12:00 Zero-Downtime Mantra for Applications with Oracle RAC, 309
3:45 Zero Data Loss Recovery Appliance Best Practices, 305
5:00 Oracle WebLogic Server 12c: Oracle Database Integration, 304
5:00 Geodistributed Oracle GoldenGate and Active Data Guard: Global Data Services, 307
Wednesday, 1 October, Moscone South
10:15 Resource Manager Best Practices
11:30 RMAN Best Practices in Oracle Database 12c, 104
12:45 Active Data Guard: Best Practices and Deep Dive, 104
2:00 Expert High-Availability Best Practices for Oracle Exadata, 102
4:45 GoldenGate Performance and Tuning for Oracle, NORTH 130
Thursday, 2 October, Moscone South
9:30 Best Practices for Zero Downtime, 103
12:00 Data Protection, Recovery and HA for Private Cloud, 103
Demos – Moscone South
Oracle Maximum Availability Architecture, SLD-140
Oracle Active Data Guard, SLD-145
Global Data Services, SLD-144
Continuous Availability, SLD-125
RMAN, Database Backup Cloud Service, Flashback, SLD-141
Oracle Secure Backup, SLD-142
Oracle Real Application Clusters, SLD-128
oracle.com/goto/availability
https://blogs.oracle.com/MAA
@OracleMAA