
Deploying, Managing, and Administering the Oracle Internet Platform

Paper #227 / Page 1

ORACLE BACKUP AND RECOVERY STRATEGIES: HOW I LEARNED TO LOVE RECOVERY MANAGER

Francisco M. Sánchez, Oracle Corporation

This paper presents backup and recovery strategies for medium and low-end customers. It assumes that you are using a media manager and making backups to tape to complement backups to disk. It does not describe Recovery Manager fundamentals or concepts; prior familiarity with Recovery Manager by the reader is assumed. More information about Recovery Manager can be found at http://www.oracle.com/st/products/features/backup.html.

BACKUP STRATEGIES

Any production database requires protection against potential media failures. If you do not have a backup strategy, you may not be able to recover your database should a disk failure arise. A successful backup strategy requires the following elements:

• Redundancy

• Frequent and regular backups

REDUNDANCY

The best recovery strategy is to maintain your system so that you never have to perform recovery. The perfect recovery is the one that you never need to perform. The key to this perfect recovery is redundancy. By utilizing disk redundancy (RAID), operating system redundancy (mirroring), and Oracle redundancy (multiplexing), you decrease the probability that you need to perform recovery. Other redundancy methods out of the scope of this presentation include node redundancy (failover nodes) and database redundancy (standby databases).

If despite your efforts at redundancy a media failure forces you to perform recovery, then you need to identify the pieces that require recovery. The set of files needed to perform recovery, or redundancy set, is composed of:

• The last backup of all the database files

• All archived redo logs generated after the last backup was taken

• A duplicate of the online redo log files generated by Oracle multiplexing, O/S mirroring, or both

• A duplicate of the current control file generated by Oracle multiplexing, O/S mirroring, or both

• Configuration files such as init.ora, tnsnames.ora, and listener.ora

GOLDEN RULE

The golden rule of backup and recovery can be formulated as:

• The set of disks or other media that contain the redundancy set should be separate from the disks that contain the datafiles, online redo logs, and control files.

This rule implies that you are using at least two disks, one holding the main database files and the other holding the redundancy set. The redundancy set should be kept separated from the primary copies in every way possible: on separate volumes, separate file systems, and separate RAID devices.


Oracle recommends that you achieve the golden rule by following these guidelines:

• Multiplex the online redo log files and current control file at the Oracle level, not only at the operating system or hardware level. By multiplexing at the Oracle level, an I/O failure or lost write only corrupts one of the copies.

• Use operating system or hardware mirroring for at least the control file, because Oracle does not provide complete support for control file multiplexing: if one multiplexed copy of the control file fails, then Oracle shuts down.

• Use operating system or hardware mirroring for the primary datafiles if possible to avoid having to apply media recovery for simple disk failures.

• Keep at least one copy of the entire redundancy set—including the most recent backup—on hard disk. On the other hand, a redundancy set that is generated by splitting a mirror is not as good as a real backup, because it relies on the mirroring subsystem for both the primary copy and the redundancy set copy. If you use this strategy, consider the last real backup to tape, not the mirrors, to be the redundancy set.

• If your database is stored on a RAID device, then place the redundancy set on devices that are not part of the same RAID device.
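As a concrete illustration of these guidelines, a minimal init.ora sketch that multiplexes the control file across two disks and keeps the archived logs, part of the redundancy set, on a separate disk (all paths and names here are hypothetical):

```ini
# Oracle-level multiplexing of the control file across two disks
control_files = (/disk1/oradata/prod/control01.ctl, /disk2/oradata/prod/control02.ctl)

# Keep the archived logs on a disk separate from the datafiles
log_archive_start  = true
log_archive_dest   = /disk2/arch/prod
log_archive_format = arch_%s.arc
```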

RMAN AND THE GOLDEN RULE

RMAN automates the backup and recovery of archived logs, datafiles, and control files. RMAN does not handle configuration files: backups and restores for these files need to be performed directly by the media manager.

Online redo logs are not backed up by RMAN and should not be backed up using the media manager or any other method. The danger in backing up online redo logs is that you may unintentionally restore them. In a number of situations, restoring the online logs would cause significant problems in the recovery operation. One common problem is restoring not just the database but also the online redo logs, overwriting the current online logs with the useless backup. This type of mistake forces you to perform incomplete recovery rather than the intended complete recovery, causing you to lose valuable transactions contained in the overwritten redo logs.

A more serious problem is caused when multiple parallel redo log timelines (database incarnations) are created. This scenario occurs when you restore a consistent database backup along with its corresponding online redo logs as a way of avoiding a RESETLOGS operation. The problem is that Oracle will generate archived logs with the same sequence numbers as the logs generated before the recovery. Later, if another disaster requires you to restore the database and roll forward, you may find it difficult to identify which archived log is the correct one and may corrupt the database accidentally. If you had performed the RESETLOGS operation, thus creating a new incarnation of the database, it would not be possible to apply archived redo logs from the previous incarnation to the new one.

Any time an online redo log is lost, incomplete recovery is required. After performing the incomplete recovery, you should open the database with the RESETLOGS option. The best method for protecting the online logs against this scenario is to multiplex them, that is, to maintain multiple log members per group on different disks and disk controllers. If your database is in ARCHIVELOG mode, the archiver (ARCH) process is already archiving the redo logs. If your database is in NOARCHIVELOG mode, then the only type of backup that you should perform is a closed, consistent, whole database backup. The files in this type of backup are all consistent and do not need recovery, so the online logs are not needed. The potential problems of avoiding the RESETLOGS in this situation defeat its perceived convenience.
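Multiplexing the online logs as recommended above can be sketched as follows (the group number and path are hypothetical); the query confirms the archiving mode of the database:

```sql
-- Add a second member to log group 1, on a different disk and controller
ALTER DATABASE ADD LOGFILE MEMBER '/disk2/oradata/prod/redo01b.log' TO GROUP 1;

-- Check whether the database runs in ARCHIVELOG or NOARCHIVELOG mode
SELECT log_mode FROM v$database;
```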


ARCHIVED LOGS REDUNDANCY

Archived logs are the most critical element of the backup strategy. Even if a backup of a database is lost or corrupted, recovery is still possible as long as a previous backup of the datafiles is available and all the logs archived after that backup are also available; on the other hand, if a backup of an archived log is lost and you have no other copy, then recovery is no longer possible. Consequently, you should always have redundant backups of archived logs.

You can achieve redundant backups for archived logs by using the SET DUPLEX command. For a duplexed backup to be effective, you should not write multiple backup sets to the same physical tape. Duplexed backups imply that at least two tape devices are available and that no hardware multiplexing is used. After a duplexed backup of the archived logs, you can automatically delete them using the DELETE INPUT option of the BACKUP command. In the case where only one tape is available, you can accomplish redundancy by backing up the same archived logs after a tape change. In this case, it is best to perform the archived log deletion manually.

BACKUP REDUNDANCY
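The duplexed archived log backup described above might be sketched as follows (8i-era RMAN syntax; the channel names are hypothetical, and duplexing to tape assumes that the media manager supports it and that tape I/O slaves are enabled):

```rman
run {
  allocate channel t1 type 'SBT_TAPE';
  allocate channel t2 type 'SBT_TAPE';
  # write every backup piece twice, to two different tapes
  set duplex = 2;
  # back up all archived logs, then delete the disk copies just backed up
  backup archivelog all delete input;
}
```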

You should have at least two whole database backups and all the necessary archived logs to perform complete recovery of the database. This precaution is necessary so that if the latest database backup is damaged, the previous one can be used for recovery. If you perform whole database backups once a week, the worst case recovery scenario requires you to apply two weeks' worth of archived redo logs.

If you store backups on disk and a complete media failure occurs, then you can use only tape backups to recover the database, up to the latest archived log that was backed up to tape. Hence, it is very important to back up the archived logs frequently. The amount of acceptable data loss drives the archived log backup frequency.

As an added safeguard, keep more than two backups along with their corresponding archived logs on tape. You can use the REPORT NEED BACKUP REDUNDANCY command to find out which datafiles are in need of backup to achieve the desired redundancy.

FREQUENT AND REGULAR BACKUPS

The frequency and regularity of the backups is determined by the database archiving mode. You can run a database in ARCHIVELOG or NOARCHIVELOG mode. Running in NOARCHIVELOG mode has the following consequences:

• You cannot recover the database to arbitrary points in time.

• The database is subject to loss of data in case of media failure.

• You must perform backups when the database is shut down.

If any of these conditions is not acceptable, then you should run the database in ARCHIVELOG mode.

BACKUPS FOR NOARCHIVELOG DATABASES

When the database runs in NOARCHIVELOG mode, backups are the only protection against media failure. These backups need to be performed when the database has been cleanly shut down. Make whole database backups according to the amount of work that is acceptable to lose. If one week is an acceptable loss, then take a whole database backup once a week; if only one day is an acceptable loss, then take a whole database backup at the end of each business day. If it is not acceptable to lose any work, then run the database in ARCHIVELOG mode.

Even if the database is running in NOARCHIVELOG mode, you can still perform incremental backups. For example, if you are implementing a daily backup strategy, you can take a level 0 backup every Sunday and take cumulative incremental backups during the week.

BACKUPS FOR ARCHIVELOG DATABASES

When the database runs in ARCHIVELOG mode, two elements require backups: the datafiles and the archived logs. Oracle archives logs continuously during normal database operation. To recover to an arbitrary point in time without data loss, all the logs archived after the backup need to be available after a media failure.

As long as the free space in the disk is not consumed by the archived logs, they can be kept on disk. If the disk space is scarce, however, space usage dictates how frequently you should back up logs to tape. If very few archived logs are generated, it is advisable to wait until at least a few hundred MB of redo have been created before taking a backup.


This strategy minimizes the number of file markers on the tape and makes the backup process more effective. Weigh this guideline against the possibility of data loss in case of complete media failure and modify the frequency of backups accordingly.

You can make database backups with the database open or closed. This decision depends on the availability requirements of the data. Open datafile backups are the only choice if the data being backed up must always be available. If the backup is taken with the database open, then use the following sequence of commands to ensure that the backups are self-contained:

• Backup database (or datafiles)

• Execute the SQL command: ALTER SYSTEM ARCHIVE LOG CURRENT

• Backup archived logs

• Backup the control file

By flushing the current online log after the backup of the database, you archive all the changes that were made in the middle of the backup and ensure that the backup does not depend on the online redo logs. If this is not done, and a media failure that causes the loss of the online logs occurs immediately after the backup, this backup will not be recoverable.

The backup of the control file taken after the backups of the database and archived logs contains all the entries for the previous backups. This control file backup is crucial if you are using RMAN without a catalog, and a convenience if you are using a recovery catalog, because a disaster can wipe out both the target database and the recovery catalog, forcing you to use the control file backup during recovery. It is not enough to include the current control file in the database backup—as in the case where datafile 1 is included or where you specify INCLUDE CURRENT CONTROLFILE—because this backup control file does not contain records describing the database backup set that contains it. You must have a backup control file that contains a complete set of records for the backup that was just created, which is only possible when those records are created after the backup set is complete; at the time the control file is copied into the backup set, the backup set is not complete yet. By taking a separate backup of the control file following any backup operation, this problem is avoided: that backup control file will contain the records RMAN needs to restore the backup sets just created.

INCREMENTAL BACKUPS
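The four-step open backup sequence above might be sketched as a single RMAN run block (8i-era syntax; the channel name is hypothetical):

```rman
run {
  allocate channel t1 type 'SBT_TAPE';
  # 1. back up the database with the files open
  backup database;
  # 2. flush and archive the current online log
  sql 'alter system archive log current';
  # 3. back up the archived logs, including the one just created
  backup archivelog all;
  # 4. back up the control file, which now records the backups above
  backup current controlfile;
}
```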

When performing database backups, the backups can be full (that is, non-incremental) or incremental. During a full backup, RMAN directs the channel to read the whole datafile and to back up all blocks that are not empty. Currently, RMAN also asks the channel performing the backup to scan the entire datafile when performing incremental backups. Blocks that changed after the last backup are detected and then written to the backup set.

If the tape output is not a bottleneck, then the time it takes to perform a full backup of a datafile is roughly the same as the time it takes to perform an incremental backup. When the tape output is a bottleneck, the incremental backup may take less time. As long as the number of changed blocks keeps the tape streaming, the incremental backup will be faster than the full backup.

If only a few blocks have changed, then the channel has to read several buffers from the datafile before it accumulates enough blocks to fill a buffer and write it to tape. Consequently, the tape drive may not be kept streaming. When the tape is streaming, the tape drive is 100% busy; when a tape drive is not kept streaming, it becomes inefficient, because the tape drive must stop and restart between each write, causing a performance hit. In most scenarios, a single datafile cannot keep the tape streaming, and thus the actual performance of the incremental backup is not as good as the performance of the full backup.

One way to overcome the single datafile syndrome is to scan many datafiles in parallel. This strategy makes the output buffers for the tape drive fill quickly, allowing the channel to write them frequently enough to keep the tape drive streaming. The number of files scanned in parallel is controlled by the FILESPERSET parameter. For an incremental backup, setting FILESPERSET = 50 is usually sufficient, but for a full or incremental level 0 backup, FILESPERSET should be set to a lower value such as 4 or 8.
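Combining these guidelines, a weekly incremental strategy might be sketched as follows (8i-era syntax; the channel name, schedule, and FILESPERSET values are illustrative):

```rman
# Sunday: level 0 backup; a low FILESPERSET keeps backup sets manageable
run {
  allocate channel t1 type 'SBT_TAPE';
  backup incremental level 0 filesperset 4 database;
}

# Weekdays: cumulative level 1; a high FILESPERSET keeps the tape streaming
run {
  allocate channel t1 type 'SBT_TAPE';
  backup incremental level 1 cumulative filesperset 50 database;
}
```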


In general, network traffic and backup size should be the main considerations when choosing between full and incremental backups—not the overall time of the backup. When the backup devices are not local to the channel performing the backup, an incremental backup reduces the amount of data sent through the network, making an incremental backup a better choice. On the other hand, as long as only portions of the database are changed, incremental backups are smaller than full backups, which is of particular interest if either the backups are on disk or tape usage is a concern. You should consider using full backups when the incremental backups are backing up 30% or more of the blocks of the datafiles.

The two types of incremental backups are differential and cumulative. Cumulative backups copy all blocks changed after the most recent backup at level n-1 or lower, whereas differential backups copy all blocks changed after the most recent backup at level n or lower. Cumulative backups take more space and network bandwidth than differential backups, but during recovery fewer backup sets need to be restored from tape and applied, which decreases recovery time.

FREQUENCY OF BACKUPS

You should make a whole database backup to tape at least once a week. For ARCHIVELOG databases, the available disk space dictates how often archived log backups need to be taken; at the very least, you should back up the database and logs together once a week. You can use the REPORT NEED BACKUP DAYS command to find out which datafiles need backups. A direct, proportional relationship exists between database transactions and datafile backups: the higher the number of transactions in the database, the more frequent the datafile and archived log backups; conversely, the lower the number of transactions in the database, the less frequent the backups.

Besides these regularly scheduled backups, the following events should trigger a new backup:

• Structural changes to the database

• UNRECOVERABLE or UNLOGGED operations in the database

• RESETLOGS operations

• Heavy updates to the database

For NOARCHIVELOG databases, you should make a new whole database backup after any alteration to the physical structure of the database so that the backup reflects the new database structure. For ARCHIVELOG databases, make a control file backup and a recovery catalog RESYNC after any structural change to the database. Structural changes to a database include:

• Creating or dropping a tablespace

• Adding or renaming a datafile in an existing tablespace

• Adding, renaming, or dropping an online redo log group or member

For ARCHIVELOG databases, make a backup of any newly created tablespaces or datafiles. This is necessary because RMAN currently does not use CREATE DATAFILE to re-create the datafile using just the archived redo log.

Whenever you perform an UNRECOVERABLE or UNLOGGED operation, make a backup of the affected datafiles. This backup can be either full or incremental. Failure to make the backup compromises the recovery, because the changes of these operations are not recorded in the archived logs. The REPORT UNRECOVERABLE command indicates which datafiles are in need of backup due to these kinds of operations.

Always take a whole database backup after a RESETLOGS operation. Recovery is not possible using a pre-RESETLOGS backup because resetting the online redo logs creates a new incarnation. Oracle performs these actions during the RESETLOGS operation:

• A new RESETLOGS SCN is placed in the headers of all database files.

• The log sequence number is reset to 1.

• The online redo log files are reformatted if they exist, and otherwise created.


Oracle performs these operations so that it can uniquely identify which archived redo logs apply to which incarnations of the database and prevent multiple timeline problems.

Backups made before a RESETLOGS operation are not usable, except in the following special cases:

• Backups of read-only tablespaces that were not made read-write again before the RESETLOGS.

• Backups of offline normal tablespaces that were not brought online again before the RESETLOGS.

• Backups of read-write tablespaces made after an incomplete recovery and immediately before the RESETLOGS.

None of the previous backups of the database will meet these criteria. New backups have to be taken after recovery and before opening the database.

After a RESETLOGS operation in an ARCHIVELOG database, you can perform a backup with the database open or closed. Oracle Corporation recommends that you perform the whole database backup with the database cleanly shut down. If you make the backup with the database open and the backup does not finish before another media failure, then the changes after the RESETLOGS operation are lost, because the previous backups are not usable. For NOARCHIVELOG databases, the only option is to perform the backup with the database closed.

You should also make backups after heavy usage of the database, because otherwise a failure necessitates a longer recovery time or, in the case of NOARCHIVELOG databases, loss of the new data. If the database is running in ARCHIVELOG mode, you can back up only the datafiles that were heavily modified. Depending on the number of changes, the backup can be either full or incremental.
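The REPORT commands mentioned in this section can drive these event-triggered backups; for example (8i-era syntax; the seven-day window, channel name, and tablespace name are hypothetical):

```rman
# datafiles that have not been backed up in the last 7 days
report need backup days 7 database;

# datafiles needing a backup because of UNRECOVERABLE or UNLOGGED operations
report unrecoverable database;

# back up just the datafiles of a heavily modified tablespace
run {
  allocate channel t1 type 'SBT_TAPE';
  backup incremental level 1 tablespace users;
}
```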

RECOVERY STRATEGIES

Recovery is not something that you should try for the first time in a disaster situation. A crisis is not the time to discover that a critical piece of the redundancy set is not available and that recovery is not possible. Performing regular test recoveries ensures that your backup strategy is working, and it also helps you stay familiar with recovery procedures, so that you are less likely to make a mistake in a crisis. It can also show whether the backup strategy needs to be modified and whether more frequent or different types of backups are needed.

RMAN provides two commands to test recovery:

• VALIDATE

• DUPLICATE

VALIDATION OF BACKUPS

You can use the VALIDATE command in two ways:

• VALIDATE BACKUPSET

• RESTORE DATABASE VALIDATE

The VALIDATE BACKUPSET command directs RMAN to examine specific backup sets and report whether they can be restored. RMAN directs the channel to scan all of the backup pieces in the specified backup sets and looks at the checksums to verify that the contents are intact, so that the backup can be successfully restored if necessary. This option should be used when it is suspected that one or more backup pieces in a backup set are missing or have been damaged.

The RESTORE DATABASE VALIDATE command lets RMAN decide, based on the specified objects and the allocated channels, which backups need to be restored, and then scans them to verify their contents.

Neither of the commands creates output files; each is the equivalent of using a RESTORE command for the requested objects. A trace file is created describing the outcome of the operation. By running these commands periodically, you can verify that the copies and backup sets required for a restore are intact and usable. Incremental backups cannot be validated with RESTORE VALIDATE, as incremental backups are restored only during recovery.

DATABASE DUPLICATION
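The two validation commands described above might be used as in this sketch (8i-era syntax; the channel name is hypothetical, and the backup set key would come from a LIST command):

```rman
run {
  allocate channel t1 type 'SBT_TAPE';
  # check specific backup sets suspected of missing or damaged pieces
  validate backupset 101;
  # let RMAN choose the backups needed to restore the database and scan them
  restore database validate;
}
```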

While VALIDATE verifies that the restore is possible, it does not perform recovery of any kind. By using the DUPLICATE command, RMAN not only restores but also tests recovery. The DUPLICATE command is equivalent to performing disaster recovery because it requires that the whole database be restored and recovered. Also, because the database created by DUPLICATE does not have online redo logs, RMAN performs an incomplete recovery—in the default case, recovery proceeds through the most recent archived log.

You can perform the duplication on the same host where the database resides or on a different host. If using only tape backups, the host can even be disconnected from the original host, which simulates losing the entire database and performing recovery in a new location.

A successful database duplication is a good indication that your backup strategy is working. Because the DUPLICATE command performs not only restore operations but also recovery, the time to recover can be measured very precisely. In light of this information, you can amend your backup strategy to decrease this time if necessary.
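A test duplication as described above might be sketched as follows (8i-era syntax; the auxiliary instance name dupdb and the channel name are hypothetical, and the auxiliary instance must already be prepared and started):

```rman
# connect with: rman target / auxiliary sys/pwd@dupdb (recovery catalog optional)
run {
  allocate auxiliary channel a1 type 'SBT_TAPE';
  duplicate target database to dupdb;
}
```

Because the duplication restores and recovers every file, timing this run gives a realistic estimate of your disaster recovery window.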

TOP TEN REASONS TO LOVE RMAN

1. Powerful (RESTORE DATABASE, BACKUP DATABASE)
2. Reliable (VALIDATE)
3. Light (no need to do BEGIN BACKUP/END BACKUP)
4. Flexible (DISK, 'SBT_TAPE')
5. Versatile (COPY, BACKUP, CATALOG previous backups)
6. Customizable (FULL, INCREMENTAL, CUMULATIVE INCREMENTAL)
7. Helpful (REPORT NEED BACKUP, REPORT UNRECOVERABLE)
8. Clone-enabled (DUPLICATE)
9. Integrated (part of the RDBMS)
10. Well-timed (SET UNTIL)