03/30/22 1 Archiving & Restoring John F. Miller III


Archiving & Restoring

John F. Miller III


TOC

• Term & History

• Disaster Recovery Planning

• Backup & Restore Procedures

• Architecture (XPS differences)

• The grab bag


Terminology

• Serial Backup
  – Archives the entire system at a single point in time using only one data stream

• Parallel Backup
  – Archives the requested dbspaces individually (one dbspace per backup object) across N data streams

• External Backup
  – Allows a third-party application to back up the database server while maintaining logical consistency


Terminology

• Cold Restore
  – Restoring the server while the database engine is offline

• Warm Restore
  – Restoring dbspaces while the database engine is online

• Mixed Restore
  – A cold restore of some dbspaces followed by a warm restore of the other dbspaces


Terminology

• Imported Restore
  – Transferring an archive taken on one computer and restoring it on a second computer

• Point-in-Time Restore
  – Restoring the entire system to a single point in time

• Restartable Restore
  – Allows the DBA to pick up the restore from the point of failure


Early Backup and Restore History

• 1.X (Turbo)
  – Quiescent-mode archives only

• 4.X: named OnLine for its advanced archiving technology

• 5.X: same core technology
  – Limitations revealed (scalability & extensibility)


DSA Backup and Restore History

• 6.0: new client/server model developed

• 7.1 & 7.20: same core technology

• 7.21: new client (onbar)

• 7.3: server API rewrite

• 9.2: onbar usability features added


Pre-DSA Archive: "Bad Grammar" Archive

• Archive Checkpoint (get timestamp)

• Free extents recorded

• Reserve pages saved

• Chunks backed up in ascending chunk-number order

• Before-images of pages modified during the archive are placed in the physical log

• tbtape routinely scans the physical log for unarchived before-images

• Pages written directly to tape


Pre-DSA Restore

• Begins with OnLine offline

• Reads the configuration file and matches its parameters against the configuration parameters recorded on the archive tape

• Zeroes out the logs (physical & logical)

• Validates the size of all chunks

• Reads the tape, copying pages directly to disk based on their addresses


DSA Archive Architecture

Major Differences

• True client-server architecture

• Archived pages logically grouped by dbspaces

• Granularity of archive creation

• Granularity of restores

• Warm restores

• Physical log pages kept in temp tables


Server Algorithm Changes: "Good Grammar" Archive

• A list is made of all pages that should be archived
  – Cost vs. benefit

• Before-images are queued by the modifying thread

• A new thread is responsible for before-image handling


Disaster Recovery

• Goals

• Planning


What is a Successful Recovery?

• “Successful” recovery is defined by your business needs


Goals For Recovery

• Determine acceptable recovery time
  – How long can your business function without the data?
  – How long can your production system be down during a restore?


Determine Acceptable Data Loss

• Type

• Time

• Distribution

• Quantity


Recovery Strategy

• Plan Recovery Goals

• Select Tools

• Implement the Strategy

• Analyze/Test the Strategy

• Tune the Strategy


Data Layout

• Poor data layout can hurt BAR performance

• Isolating the different types of data makes it easier to prioritize restores

• Example
  – 8 dbspaces each with 2 chunks, but one dbspace has 68 chunks
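To see why one oversized dbspace hurts, here is a rough Python model. It assumes (our simplification, not a documented ON-Bar rule) that a parallel backup assigns whole dbspaces to streams, that the chunks of one dbspace are read one after another, and that every chunk takes the same time. The layout below is one hypothetical reading of the example: seven dbspaces with 2 chunks each plus one dbspace with 68.

```python
from typing import Dict


def serial_time(chunks_per_dbspace: Dict[str, int], secs_per_chunk: float) -> float:
    """One data stream: every chunk is archived back to back."""
    return sum(chunks_per_dbspace.values()) * secs_per_chunk


def parallel_time(chunks_per_dbspace: Dict[str, int], secs_per_chunk: float,
                  streams: int) -> float:
    """Whole dbspaces go, largest first, to the least busy stream."""
    loads = [0.0] * streams
    for chunks in sorted(chunks_per_dbspace.values(), reverse=True):
        idx = loads.index(min(loads))          # least loaded stream so far
        loads[idx] += chunks * secs_per_chunk
    return max(loads)                          # elapsed time = busiest stream


layout = {f"dbs{i}": 2 for i in range(1, 8)}  # seven small dbspaces
layout["dbs_big"] = 68                         # one oversized dbspace
print(serial_time(layout, 10.0))               # 820.0
print(parallel_time(layout, 10.0, 8))          # 680.0
```

Under this model eight streams only cut 820 s down to 680 s, because the 68-chunk dbspace cannot be split across streams; spreading that data over several dbspaces is what lets the parallelism pay off.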


Data Layout Examples

• Keep important, frequently modified data in its own dbspaces
  – Important data such as orders should be placed in its own dbspace, e.g. dbspace_orders
  – A dbspace containing zip codes and other lightly modified data can be backed up less frequently


Right, Fast or Cheap?

Choose Two!


Select Tools

• Backup Utilities
  – ontape
  – ON-Bar
  – External Backup/Restore

• Fault Tolerance Mechanisms
  – Mirroring
  – High Availability Data Replication (HDR)
  – Enterprise Data Replication (DR)

• Load/Unload
  – High Performance Loader (HPL)
  – dbexport/dbimport
  – dbschema
  – SQL load/unload
  – onload/onunload
  – dbload
  – Customer ESQL programs


Ontape Backup Features

• Backup at the server level

• Support for incremental backups

• Manual or continuous logical log backup

• Restore entire system or single dbspace

• Backup is self-describing


On-Bar Backup Features

• Parallel backup and restore

• System and dbspace level backup and restore

• Support for incremental backups

• Manual or automatic backup of logical logs

• Instance point-in-time recovery

• Open interface for communication with storage managers (XBSA)


External Backup Features

• EBR allows administrators to make a consistent copy of their dbspaces using external tools

• Used with many third-party backup products

• Allows for both cold and warm restores


EBR - Examples

• Planned uses:
  – File system snapshots
  – Breaking of mirrors
  – Third-party “raw” backup

• Basic steps:
  – Block coserver(s) at a checkpoint
  – Back up dbspaces using third-party tools
  – Unblock coserver(s)
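The important property of the three basic steps is that the unblock must happen even when the third-party copy fails. A minimal Python sketch of that control flow follows; `block_server`, `copy_dbspaces` and `unblock_server` are hypothetical stand-ins for whatever commands your platform and backup product actually use, not real APIs.

```python
from typing import Callable, Iterable, List


def external_backup(block_server: Callable[[], None],
                    copy_dbspaces: Callable[[Iterable[str]], List[str]],
                    unblock_server: Callable[[], None],
                    dbspaces: Iterable[str]) -> List[str]:
    """Run the three EBR steps and guarantee that the unblock step runs."""
    block_server()                      # 1. block at a checkpoint
    try:
        return copy_dbspaces(dbspaces)  # 2. copy with the third-party tool
    finally:
        unblock_server()                # 3. unblock even if the copy raised


# Usage with stand-in steps that only record the order they ran in.
calls: List[str] = []


def fake_copy(spaces: Iterable[str]) -> List[str]:
    calls.append("copy")
    return list(spaces)


copied = external_backup(lambda: calls.append("block"), fake_copy,
                         lambda: calls.append("unblock"), ["rootdbs", "dbspace1"])
print(calls)  # ['block', 'copy', 'unblock']
```

Using `try`/`finally` rather than unblocking at the end of the happy path keeps a failed snapshot from leaving the server blocked.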


Restoring

• Logical Logs required

• Restore looks hung; nothing seems to be happening

• Handling unanticipated problems


Logical Logs Required for a Restore

• Cold Parallel Restore
  – Starting log: the log that contains the beginning of the oldest transaction active when the first archive checkpoint occurred
  – Ending log: at least the logical log that contains the last archive checkpoint

• Cold Whole System (Non-Parallel)
  – No logical logs required
  – Logs included with archive


Logical Logs Required for a Restore

• Warm Restore
  – Starting log: the log that contains the beginning of the oldest transaction active when the first archive checkpoint occurred
  – All logs up to the current point in time

• If you are using DR, you must include the replay point
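The rules above can be written as a small Python function that returns the range of logs a restore needs. This is a sketch: log ids are assumed to be consecutive unique ids, the restore-type strings are hypothetical labels, and the DR handling simply widens the range so that the log holding the replay point is included, which is our reading of the last bullet.

```python
from typing import Optional


def required_logs(restore_type: str,
                  oldest_begin_log: int,
                  last_checkpoint_log: int,
                  current_log: int,
                  dr_replay_log: Optional[int] = None) -> Optional[range]:
    """Return the unique ids of the logical logs a restore needs, or None."""
    if restore_type == "cold_whole_system":
        return None                                          # logs travel with the archive
    if restore_type == "cold_parallel":
        first, last = oldest_begin_log, last_checkpoint_log  # later logs are optional
    elif restore_type == "warm":
        first, last = oldest_begin_log, current_log          # everything up to "now"
    else:
        raise ValueError(f"unknown restore type: {restore_type}")
    if dr_replay_log is not None:
        # Assumption: the DR replay point must fall inside the restored range.
        first, last = min(first, dr_replay_log), max(last, dr_replay_log)
    return range(first, last + 1)


print(list(required_logs("cold_parallel", 10, 12, 13)))  # [10, 11, 12]
```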


Example of Logical Logs Required for a Restore

Figure: logical logs 10 through 13, with the archive checkpoints and the oldest BEGIN WORK (B) marked

Logs required:

• Cold restore: all of logs 10-12; log 13 optional

• Warm restore: logs 11 onward; no optional logs


Restartable vs. Suspended Restores

• Restartable Restore
  – When the database engine shuts down prematurely, the engine may be restarted in recovery mode

• Suspended Restore
  – When the archive client receives a restartable error and the database engine does not shut down

Restartable Restore

• Turned OFF by default

• What can restart when?
  – Whole system
  – Partial restore
  – Logical recovery from a cold restore

• Only available with On-BAR

• onbar -RESTART


Architecture

• Overview

• Archive Clients

• Moving Data
  – IDS
  – XPS

• Server Threads

• XPS Architecture


What Pages are Sent to the Archive

• If a page’s timestamp is older than Max-Stamp and newer than Min-Stamp, the page is put to tape

• If a page’s timestamp is greater than the current stamp but older than Min-Stamp, the page is put to tape and its timestamp is updated to Max-Stamp - 1

• Pages newer than Max-Stamp but older than the current stamp are considered to have been modified after the archive started, and are ignored


Understanding Timestamps

Figure: timestamp wheel starting at 0, marking Max-Stamp and the Current Stamp, with the "Not Archived" regions labeled


OnLine Wheel-O-Death

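Figure: timestamp wheel starting at 0, marking Min-Stamp, Max-Stamp and the Current Stamp, with the "Not Archived" regions labeled

• Max-Stamp: the timestamp at the start of the archive

• Current Stamp: the timestamp at the current point in time

• Min-Stamp: the timestamp 50% of the way around the wheel from Max-Stamp, i.e. Max-Stamp - 2GB

• All pages in the red region of the figure (between the Current Stamp and Min-Stamp) have their timestamp updated as they are archived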
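The page-selection rules can be expressed as a small Python function. This is a sketch, not the server's arc_very_old_pages() code: it assumes 32-bit page timestamps that wrap around (consistent with Min-Stamp sitting 2GB, half the wheel, behind Max-Stamp), and the function name and action labels are hypothetical. Distances are measured with modular arithmetic, so the comparisons keep working after the counter wraps past 0.

```python
from typing import Tuple

WHEEL = 2 ** 32    # assumption: 32-bit page timestamps that wrap back to 0
HALF = WHEEL // 2  # Min-Stamp = Max-Stamp - HALF ("Max-Stamp - 2GB")


def classify_page(page_stamp: int, max_stamp: int, current_stamp: int) -> Tuple[str, int]:
    """Return (action, timestamp to keep) for one page during an archive."""
    lag = (current_stamp - max_stamp) % WHEEL   # stamps issued since the archive began
    age = (current_stamp - page_stamp) % WHEEL  # how far the page is behind "now"
    if age < lag:
        # Between Max-Stamp and Current Stamp: modified after the archive started.
        return "skip", page_stamp
    behind_max = age - lag                      # equals (max_stamp - page_stamp) % WHEEL
    if behind_max <= HALF:
        # Between Min-Stamp and Max-Stamp: archive the page unchanged.
        return "archive", page_stamp
    # Older than Min-Stamp (the red region): archive it and refresh its stamp
    # so it cannot be mistaken for a recently modified page later on.
    return "archive_refresh", (max_stamp - 1) % WHEEL


print(classify_page(950, max_stamp=900, current_stamp=1000))  # ('skip', 950)
print(classify_page(500, max_stamp=900, current_stamp=1000))  # ('archive', 500)
```

Measuring every distance backwards from the current stamp is what makes the "greater than current but older than Min-Stamp" case fall out naturally: such a page is simply more than half a wheel behind Max-Stamp.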


Archive Clients

Figure: archive client components: EBR, ontape, onbar, XBSA, common archive code, and SMV


DSA Client Server Model

Figure: the archive client connects to the archive back end (ArchiveBE) over SQLI/ASF for a network connection or over streams for a local connection


Moving Data between Client and Server

Figure: the archive client sends SQLI requests to ONINIT; data moves through an archive data buffer in shared memory, and the SQLI reply returns that buffer's shared-memory address


Moving Data between Client/Server

• The size of the buffers used to transmit data
  – ontape: controlled by onconfig’s TAPEBLK
  – onbar: BAR_XFER_BUFSIZE; the maximum size is one online page smaller than 64 KB

• The number of buffers
  – ontape
  – onbar: BAR_XPORT_COUNT, min 3, max 99

• Monitoring the data transfer
  – onstat -g stq
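The two onbar limits can be checked before a backup is started. This Python sketch assumes BAR_XFER_BUFSIZE is expressed in pages and reads "one online page smaller than 64 KB" as a ceiling of 64 KB / page size - 1 pages; the lower bound of 1 and the function names are our assumptions.

```python
from typing import List

MAX_TRANSFER_BYTES = 64 * 1024  # the 64 KB ceiling quoted on the slide


def max_xfer_bufsize(page_size: int) -> int:
    """Largest BAR_XFER_BUFSIZE, in pages: one page smaller than 64 KB."""
    return MAX_TRANSFER_BYTES // page_size - 1


def check_onbar_buffers(xfer_bufsize: int, xport_count: int, page_size: int) -> List[str]:
    """Return the problems found with the two settings (empty list if they look valid)."""
    problems = []
    limit = max_xfer_bufsize(page_size)
    if not 1 <= xfer_bufsize <= limit:
        problems.append(
            f"BAR_XFER_BUFSIZE={xfer_bufsize} outside 1..{limit} for {page_size}-byte pages")
    if not 3 <= xport_count <= 99:
        problems.append(f"BAR_XPORT_COUNT={xport_count} outside 3..99")
    return problems


print(max_xfer_bufsize(2048), max_xfer_bufsize(4096))  # 31 15
print(check_onbar_buffers(16, 2, 4096))
```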


What Data is Shipped to the Archive Client

• Server sends raw online pages exactly as they exist on disk


Example of onstat -g stq

Stream Queue: (session 11 cnt 10) 0:ad91400 1:ada1400 2:adb1400 3:adc1400 4:add1400 5:ade1400 6:adf1400 7:ae01400 8:ae11400 9:ae21400

Full Queue: (cnt 0 waiters 0) 0:0 1:ada1400 2:adb1400 3:adc1400 4:add1400 5:ade1400 6:adf1400 7:ae01400 8:ae11400

Empty Queue: (cnt 0 waiters 1)

Stream Queue: (session 10 cnt 10) 0:ac8d400 1:ac9d400 2:acad400 3:acbd400 4:accd400 5:acdd400 6:aced400 7:acfd400 8:ad0d400 9:ad1d400

Full Queue: (cnt 9 waiters 0) 0:ac9d400 1:acad400 2:0 3:accd400 4:acdd400 5:aced400 6:acfd400 7:ad0d400 8:ad1d400

Empty Queue: (cnt 0 waiters 1)
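Because onstat -g stq prints one Stream/Full/Empty group per session, a short Python parser can turn it into numbers you can track over time. This is a sketch built only from the sample output above: the meaning of the fields (cnt as the number of buffers in that queue, waiters as threads blocked on it) is our assumption, and other server versions may print a different layout.

```python
import re
from typing import Dict, List

# Matches "Stream Queue: (session 11 cnt 10)" and "Full Queue: (cnt 9 waiters 0)".
QUEUE_LINE = re.compile(
    r"(Stream|Full|Empty) Queue: \((?:session (\d+) )?cnt (\d+)(?: waiters (\d+))?\)"
)


def parse_stq(text: str) -> List[Dict[str, int]]:
    """Summarize each session's stream, full and empty queues."""
    sessions: List[Dict[str, int]] = []
    for line in text.splitlines():
        match = QUEUE_LINE.search(line)
        if not match:
            continue
        kind, session, count, waiters = match.groups()
        if kind == "Stream":
            # A new session group starts; cnt is its total number of buffers.
            sessions.append({"session": int(session), "buffers": int(count)})
        elif sessions:
            key = kind.lower()                     # "full" or "empty"
            sessions[-1][key] = int(count)
            sessions[-1][key + "_waiters"] = int(waiters or 0)
    return sessions


sample = """\
Stream Queue: (session 10 cnt 10) 0:ac8d400 1:ac9d400
Full Queue: (cnt 9 waiters 0) 0:ac9d400 1:acad400
Empty Queue: (cnt 0 waiters 1)"""
print(parse_stq(sample))
```

In the output above, session 10 has 9 of its 10 buffers in the full queue while session 11 has none; sampling that ratio repeatedly is one way to spot which side of a transfer is falling behind.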


Server Threads

• ontape

• Scanner

• Before Image Processor


Ontape Thread

• Always called ontape regardless of archive client

• Responsible for all communication with the archive client


Scanner Thread (arc_backup1)

• The “dummy” thread, geared for I/O performance rather than decision making

• Handed a list of pages to back up

• Scans data from disk into shared-memory buffers

• Makes NO decisions about the data

• Ensures the page address is correct


Before Image Processor Thread (arc_backup2)

• Monitors the before-image queues

• Determines whether each before-image needs to be saved or discarded

• Drains the before-image memory queue by storing the page images in temp tables

• Creates multiple temp tables if required
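The split between modifier threads and arc_backup2 is essentially a producer/consumer queue. Below is a toy Python model of it. The class, the `pending_pages` set and the `rows_per_table` limit are hypothetical, the scanner thread is not modeled, and keeping only the first before-image per page is our assumption about the save-or-discard decision (the first image taken after the archive began is the one that matches the archived state).

```python
import queue
import threading
from typing import List, Optional, Set, Tuple

BeforeImage = Tuple[int, bytes]  # (page number, page contents before the change)


class BeforeImageProcessor(threading.Thread):
    """Toy model of a before-image thread: drain, decide, store."""

    def __init__(self, pending_pages: Set[int], rows_per_table: int = 2) -> None:
        super().__init__(daemon=True)
        self.images: "queue.Queue[Optional[BeforeImage]]" = queue.Queue()
        self.pending_pages = set(pending_pages)  # pages still to be archived
        self.rows_per_table = rows_per_table     # hypothetical temp-table size limit
        self.temp_tables: List[List[BeforeImage]] = []

    def run(self) -> None:
        while True:
            item = self.images.get()
            if item is None:                     # sentinel: the archive has finished
                return
            page, image = item
            if page not in self.pending_pages:
                continue                         # not needed or already saved: discard
            self.pending_pages.discard(page)     # keep only the first image per page
            if not self.temp_tables or len(self.temp_tables[-1]) >= self.rows_per_table:
                self.temp_tables.append([])      # current temp table is full: add another
            self.temp_tables[-1].append((page, image))


proc = BeforeImageProcessor(pending_pages={1, 2, 3, 4})
proc.start()
for page in (1, 1, 2, 9, 3, 4):                  # modifier threads queue before-images
    proc.images.put((page, b"old-%d" % page))
proc.images.put(None)
proc.join()
print(proc.temp_tables)  # [[(1, b'old-1'), (2, b'old-2')], [(3, b'old-3'), (4, b'old-4')]]
```

Because modifiers only enqueue, they never wait on temp-table I/O; all the deciding and storing happens on the one consumer thread.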


XPS Differences & Architecture Overview

• Basic XPS Architecture
  – Client Sub-Systems
  – Server Sub-Systems

• Differences
  – sysutils
  – Configuration


Basic XPS Architecture

Figure: onbar working with Storage Manager 1 and Storage Manager 2 against OnLine XPS running on Coservers 1 through 4


Client Sub-Systems

Executable      Function
onbar           Shell script wrapper
onbar_d         The driver
start_worker    Shell script wrapper
onbar_w         Worker process
onbar_m         Distributes bootfiles
onbar_s         Checks server state


Client Sub-Systems

Figure: the onbar client sub-systems (onbar, onbar_d, onbar_w) with Storage Manager 1 and OnLine XPS running on Coservers 1 through 4


Server Sub-Systems

• ASF/local streams
  – Send/receive commands and data buffers

• Backup Scheduler (BUS) (new)
  – Distributes tasks to workers

• XBAR (new)
  – Communicates between coservers

• RSAM
  – Only sees a single coserver
  – Manages all I/O to disk (dbspaces/chunks)


XBAR

• Interfaces with both BUS and RSAM

• Manages distributed execution of backups and restores
  – Transfers data from the object’s coserver (the coserver where the dbspace/chunk exists) to onbar_w’s coserver (the output coserver)
  – Uses XMF between coservers
  – Uses a local stream between onbar_w and the output coserver


Backup Scheduler (BUS)

• Manages user requests, workers, storage managers and coservers

• Farms out work to onbar_w

• Reports success or failure to onbar_d after each work item has been attempted

• onbar_w creates a new worker queue in the BUS when it is started


XBAR/BUS support in SMI

• New tables for BUS data structures:
  – sysbusession: list of sessions
  – sysbuobject: what’s in the queue
  – sysbuobjses: for which session
  – sysbusm: BAR_SM paragraphs
  – sysbusmdbspace: space to BAR_SM map
  – sysbusmlog: logstream to BAR_SM map
  – sysbusmworker: worker to BAR_SM map
  – sysbuworker: info about each onbar_w


Moving Data between Client/Server Version 8

Figure: onbar, onbar_d and onbar_w exchanging data with OnLine XPS (Coservers 1 through 4) through SQLI and shared memory, alongside Storage Manager 1


Differences Between 8 and 7

• Multiple nodes

• Non-locality of devices and data
  – Backup data may be shipped between nodes

• Multiple storage managers
  – One storage manager can serve the entire system
  – Multiple storage managers can eliminate performance bottlenecks for large systems


Differences Immediately Seen by DBAs

• Command line is slightly different

• Configuration parameters are very different
  – Version 7 has 6 configuration parameters, none of which needs to be set
  – Version 8 has 15 configuration parameters, most of which must be configured


Differences Immediately Seen by DBAs

• sysutils has more columns

• Emergency bootfiles
  – More columns
  – 1 boot file per coserver
  – Merge boot files

• Additional onstat options


arc_very_old_pages(): Why Do It?


arc_very_old_pages()

• Permanent solution #1
  – No longer use timestamps for recovery
  – Disk timestamps do not need to be refreshed
  – Memory and disk timestamps are different
  – Bitmaps used to keep track of foreground writes

• Permanent solution #2
  – Multiple instances of the same page in the physical log
  – Only the oldest instance of a page is restored during physical recovery


7.31 Solution #1

• Must be enabled through CCFLAGS

Physical Recovery Started at Page(1:1065).
Physical Recovery Complete: 0 Pages Examined 0 Pages Restored.


9.21 Solution #2

Physical Recovery Started at Page(1:1065).
Physical Recovery Complete: 0 Pages Examined 0 Pages Restored.


Override Internal Error Checks

• The -O option is much like -f for UNIX rm

• Does many different things:
  – Allows restore of a space that is still online
  – Creates a file system entry for each chunk if there isn’t one
  – Allows expiration of objects from sysutils and the storage manager that may be needed in a restore


Archive Utilities

• Explaining onstat & oncheck options
  – onstat -d
  – onstat -g arc
  – onstat -g stq

• Validating Archive

• Managing the archive catalogs


onstat -g arc

num  DBSpace   Q Size  Q Len  Buffer  partnum   size  scanner
2    dbspace1  92      0      4       0x100085  240   0x2033ee
3    dbspace2  69      0      1       0x100084  150   0x302f1a

Dbspaces - Archive Status
name      number  level  date              log  log-position
rootdbs   1       0      10/04/2001.10:17  5    0x10b608
dbspace1  2       0      10/04/2001.10:17  5    0x10b608
dbspace2  3       0      10/04/2001.10:17  5    0x10b608
sbspace1  4       0      10/04/2001.10:17  5    0x10b608
sbspace2  5       0      10/04/2001.10:17  5    0x10b608


onstat -d information

• D Chunk is down

• L Storage space is being logically restored

• O Chunk is online

• P Storage space is physically restored

• R Storage space is being restored


oncheck -pr

Validating PAGE_1DBSP & PAGE_2DBSP...

DBspace number 2

DBspace name dbspace1

. . . . . DBspace archive status

Archive Level 0

Real Time Archive Began 10/04/2001 10:33:09

Time Stamp Archive Began 306128

Logical Log Unique Id 6

Logical Log Position 0x3d2018

Archive Level 1

Real Time Archive Began 10/04/2001 10:35:28

Time Stamp Archive Began 323695

Logical Log Unique Id 8

Logical Log Position 0x208018


Validating Archives

• Uses an executable called archecker


Validating Archives

• What is actually validated

• What other information is there for me

• What else can go wrong with my validated restore

• How do I validate my archives?


What is actually validated

• Format of each page on the archive is checked (similar to oncheck -cd)

• Tape control pages are sanity checked

• Each table is checked to ensure all pages of the table exist on the archive tape

• Reserve page format is validated

• Each chunk free list is verified

• Table extents are checked for overlap (oncheck -pe)


Other Information for the DBA

• AC_MSGPATH: message log for archecker

• {AC_STORAGE}/INFO
  – Extent list for each dbspace, in oncheck -pe format: DBS.{dbspace_#}
  – Time to process each tape/object
  – Information about the number and type of pages processed: profile.{pid}

• {AC_STORAGE}/SAVE
  – Contains a binary image of control information


Profile Information

Profile Information

=======================

Total pages processed 51227

Total Data pages 49327

Total index pages 828

Total smart blob pages 6

Total blob space pages 0

Total partition pages 328

Total chunk free list pages 5

Total Reserve pages 12

Total bit map pages 335

MORE . . .


Extent Information

db1:sysprocedures 0x00200235 8

db1:sysprocbody 0x0020023D 32

db1:sysprocauth 0x0020025D 8

db1:sysprocedures 0x00200265 8

db1:sysprocbody 0x0020026D 32

db1:t1 0x0020028D 24344

FREE 0x002061A5 3
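The extent-overlap check that archecker performs can be sketched in Python using the extent list above. We assume the second column is the extent's starting page address and the third its size in pages; the sample supports that reading, since each extent ends exactly where the next begins (for example 0x00200235 + 8 = 0x0020023D, and 0x0020028D + 24344 = 0x002061A5). The function name is hypothetical, not archecker's code.

```python
from typing import List, Tuple

Extent = Tuple[str, int, int]  # (owner, first page address, size in pages)


def find_overlaps(extents: List[Extent]) -> List[Tuple[str, str]]:
    """Return (earlier owner, later owner) pairs whose page ranges overlap."""
    overlaps: List[Tuple[str, str]] = []
    furthest_end, furthest_owner = -1, ""
    for owner, start, size in sorted(extents, key=lambda e: e[1]):
        if start < furthest_end:
            # This extent begins before the furthest-reaching earlier extent ends.
            overlaps.append((furthest_owner, owner))
        if start + size > furthest_end:
            furthest_end, furthest_owner = start + size, owner
    return overlaps


extent_list = [
    ("db1:sysprocedures", 0x00200235, 8),
    ("db1:sysprocbody", 0x0020023D, 32),
    ("db1:sysprocauth", 0x0020025D, 8),
    ("db1:sysprocedures", 0x00200265, 8),
    ("db1:sysprocbody", 0x0020026D, 32),
    ("db1:t1", 0x0020028D, 24344),
    ("FREE", 0x002061A5, 3),
]
print(find_overlaps(extent_list))  # []
```

Tracking the furthest end seen so far, rather than only comparing neighbors, also catches a small extent hidden inside a large one.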


Validating Archives

• ontape
  – archecker -tdvs
  – AC_TAPEBLK, AC_TAPEDEV

• onbar
  – onbar -r -v (version 7.3X)
  – onbar -v (9.20 & 8.30)
  – onbar -b -v (8.30)


onsmsync

• Adds entries from ixbar files to sysutils

• Removes objects from sysutils

• Three expiration policies
  – -g: remove objects older than the Nth generation
  – -t: remove objects from before a datetime
  – -i: remove objects older than an interval
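A toy model of the three expiration policies, in Python. The `CatalogEntry` record, the policy strings and the way generations are counted (a new generation starts at each level-0 backup, and incrementals belong to the preceding level 0) are assumptions for illustration; the real onsmsync may apply further rules, such as protecting objects a restore still depends on.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Union


@dataclass
class CatalogEntry:
    """Hypothetical stand-in for one backup object recorded in sysutils."""
    name: str
    level: int          # archive level: 0, 1 or 2
    end_time: datetime  # when the backup finished


def expired(entries: List[CatalogEntry], policy: str,
            value: Union[int, datetime, timedelta],
            now: Optional[datetime] = None) -> List[CatalogEntry]:
    """Return the entries a policy would remove (nothing is deleted here)."""
    ordered = sorted(entries, key=lambda e: e.end_time)
    if policy == "-t":                   # before a given datetime
        return [e for e in ordered if e.end_time < value]
    if policy == "-i":                   # older than an interval
        cutoff = (now or datetime.now()) - value
        return [e for e in ordered if e.end_time < cutoff]
    if policy == "-g":                   # older than the Nth generation
        generation, tagged = 0, []
        for e in ordered:
            if e.level == 0:
                generation += 1          # each level-0 backup opens a generation
            tagged.append((generation, e))
        oldest_kept = generation - value + 1
        return [e for g, e in tagged if g < oldest_kept]
    raise ValueError(f"unknown policy: {policy}")


def day(d: int) -> datetime:
    return datetime(2001, 10, d)


catalog = [CatalogEntry("L0-a", 0, day(1)), CatalogEntry("L1-a", 1, day(2)),
           CatalogEntry("L0-b", 0, day(3)), CatalogEntry("L0-c", 0, day(5)),
           CatalogEntry("L1-c", 1, day(6))]
print([e.name for e in expired(catalog, "-g", 2)])  # ['L0-a', 'L1-a']
```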


Understanding ixbar Files

• Server name
• Object name
• Object type
• is_serial
• Action id
• Archive level
• SMV copy id high
• SMV copy id low

• Backup start date
• Backup start time
• Backup end date
• Backup end time


Storage Manager Snafus

• Timeout of onbar

• Error 131 Object not found

• Salvaging logs and getting the wrong object


Recovery Snafus

• Check that the devices are linked properly
  – KAIO only uses raw I/O
  – Overlapping data

• While restoring, the database appears hung


Preparing to Call Support


Restore Seems Hung

• The tape is done

• onstat -D shows no I/O

• Very little CPU activity

• While the system clears the physical and logical logs there is very little activity, and the system appears to be hung


Improvements

• A message into the online log indicating this phase of the restore started and completed.

• Intelligent parallelism: all the logs in a single chunk are cleared by one thread, with one disk-clear thread per chunk

Clearing the physical and logical logs has started
Cleared 2100 MB of the physical and logical logs in 612 seconds
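The "one disk-clear thread per chunk" idea can be sketched in Python with a thread pool. The `(log name, chunk, size)` tuples and the `clear_segment` callback are hypothetical stand-ins for whatever actually zeroes the pages; the design point is that no two threads write to the same chunk, while different chunks are cleared concurrently.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

LogSegment = Tuple[str, int, int]  # (log name, chunk number, size in MB)


def clear_logs_by_chunk(segments: List[LogSegment],
                        clear_segment: Callable[[LogSegment], None]) -> Dict[int, int]:
    """Clear every log segment with one thread per chunk; return MB cleared per chunk."""
    by_chunk: Dict[int, List[LogSegment]] = defaultdict(list)
    for segment in segments:
        by_chunk[segment[1]].append(segment)

    def clear_chunk(chunk: int) -> Tuple[int, int]:
        cleared = 0
        for segment in by_chunk[chunk]:  # one thread walks its chunk sequentially
            clear_segment(segment)
            cleared += segment[2]
        return chunk, cleared

    # One worker per chunk, so different disks are cleared concurrently.
    with ThreadPoolExecutor(max_workers=max(1, len(by_chunk))) as pool:
        return dict(pool.map(clear_chunk, list(by_chunk)))


segments = [("physical log", 1, 100), ("log 1", 2, 50), ("log 2", 2, 50), ("log 3", 3, 75)]
totals = clear_logs_by_chunk(segments, lambda segment: None)
print(totals)  # {1: 100, 2: 100, 3: 75}
print(f"Cleared {sum(totals.values())} MB of the physical and logical logs")
```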


Parallel Archive Procedures

• The archive is broken down into archive jobs with each dbspace being its own backup

• An onbar_d is started to backup a single dbspace

• Connects to the database server and the storage manager, requesting a backup session

• Updates sysutils and the ixbar file
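The parallel archive flow maps naturally onto a worker pool: one job per dbspace, each job running its own backup session, and a shared catalog updated as each job finishes. In this Python sketch `backup_one` is a hypothetical callback standing in for the onbar_d work (connect to the server and storage manager, stream one dbspace, return an object id), and a lock-guarded dict stands in for sysutils and the ixbar file.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def parallel_archive(dbspaces: List[str],
                     backup_one: Callable[[str], str],
                     max_parallel: int) -> Dict[str, str]:
    """Back up each dbspace as its own job; return {dbspace: object id}."""
    catalog: Dict[str, str] = {}  # stands in for sysutils and the ixbar file
    lock = threading.Lock()

    def job(dbspace: str) -> None:
        object_id = backup_one(dbspace)  # one backup session per dbspace
        with lock:                       # record the object once the job succeeds
            catalog[dbspace] = object_id

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # list() consumes the results so that any job failure is re-raised here.
        list(pool.map(job, dbspaces))
    return catalog


print(parallel_archive(["rootdbs", "dbspace1", "dbspace2"], lambda d: "obj-" + d, 2))
```

Recording each dbspace only after its own job succeeds mirrors the per-dbspace granularity of the archive: a failed dbspace leaves no catalog entry, while the others are still usable.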


Parallel Restore Procedures