Page 1: Joonwon Lee joon@kaist.ac.kr Recovery. Lightweight Recoverable Virtual Memory Rio Vista

Joonwon Lee, joon@kaist.ac.kr

Recovery

Recovery
• Lightweight Recoverable Virtual Memory
• Rio Vista

Introduction
• failure
  – when a system does not perform in the manner defined
• erroneous state
  – a state that could lead the system to a failure
• fault
  – an anomalous physical condition
  – causes
    • design/manufacturing error
    • damage/fatigue
    • external disturbance
• a fault leads the system to an erroneous state, which may or may not result in a failure

Failures
• process failure
  – deadlock, timeout, protection violation, ...
  – the OS should confine this failure to the process
• system failure
  – software and hardware
  – amnesia failure: cannot recover the state just before the failure
  – pause failure: the state can be reinstated
  – halting failure: the system never restarts
• disk failure
  – a serious problem when the disk is the last backup storage
  – usually backed up by tape, OR
  – mirrored (mirroring also improves read throughput)
• communication medium failure
  – does not cause total system failure

Error Recovery
• Forward Error Recovery
  – allow the process to proceed after fixing errors
  – difficult to remove all the errors (in software, procedures to cope with every kind of error would have to be prepared, which is almost impossible)
• Backward Error Recovery
  – the process restarts from a saved (or predefined) state
  – a roll-back mechanism is needed
  – easy to cope with any kind of error (it is not necessary to anticipate all kinds of errors)
  – overhead to restore the previous state
    • checkpointing is needed
  – the same error may occur again

Backward Error Recovery
• Operation-based approach
  – using a log, undo (roll back) what has been done until an error-free state is restored
  – write-ahead log (for a write to X)
    • records the new value of X in a log
    • then updates X
• State-based approach
  – checkpoint
    • a complete state of a process
    • at a crash, roll back to the most recent safe state
  – needs many checkpoints
  – shadow page
    • a copy of a page that is to be updated
    • updates are done only on the original page
    • at a crash, go back to the shadow page
    • at commit, keep using the original page
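
The operation-based approach can be sketched as follows. This is a minimal illustration, assuming a toy in-memory key-value store; the class `UndoLoggedStore` and its method names are hypothetical and not from the slides. The sketch logs the old value before each in-place update, because that is what undo needs; a redo-style log would record the new value instead.

```python
class UndoLoggedStore:
    """Toy key-value store with an undo log (hypothetical names)."""

    def __init__(self):
        self.data = {}
        # Each record is (key, existed_before, old_value), oldest first.
        self.log = []

    def write(self, key, value):
        # Write-ahead rule: append the log record before updating in place.
        self.log.append((key, key in self.data, self.data.get(key)))
        self.data[key] = value

    def commit(self):
        # The current state becomes the new error-free state.
        self.log.clear()

    def rollback(self):
        # Undo in reverse order until the last committed state is restored.
        while self.log:
            key, existed, old = self.log.pop()
            if existed:
                self.data[key] = old
            else:
                self.data.pop(key, None)


store = UndoLoggedStore()
store.write("x", 1)
store.commit()
store.write("x", 2)   # uncommitted update
store.write("y", 3)   # uncommitted insert
store.rollback()
print(store.data)
```

After the rollback only the committed write to `x` survives, so no knowledge of what went wrong is needed to get back to a safe state.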

Issues in Recovery (1)
• failure and recovery of a process affect other processes that exchange data with the failed process
• orphan message
  – arises when a process rolls back to a point before it sent a message
  – actions of other processes that depend on the orphan message must be rolled back, too (domino effect)
• lost message
  – node Y receives a message from X
  – Y rolls back to a point before receiving the message
  – the effect is the same as when the message is lost

Issues in Recovery (2)
• livelocks
  – Y sends out m1, receives an orphan message n1, and rolls back
  – m1 becomes an orphan message
  – on receiving m1, X rolls back

[Figure: time lines of X and Y exchanging m1 and n1, with failure points marked x. Labels: 1. failure, and roll back; 2. orphan message, roll back]

Checkpoints
• local checkpoint
  – snapshot of a single node
  – superscalar CPUs and out-of-order memory operations make checkpointing difficult
• global checkpoint
  – strongly consistent set of checkpoints
    • all the checkpoints are taken inside a given interval
    • no information is exchanged between any processes during this interval
    • this is the last place any process should roll back to

Checkpoints (2)
  – consistent set of checkpoints
    • a message recorded as “received” in one checkpoint must be recorded as “sent” in another checkpoint
      – no orphan message
    • a message recorded as “sent” may NOT be recorded as “received” in another checkpoint
      – possible lost message
• simple to make this set
  – take a checkpoint after sending every message
  – or after sending every N messages, for better efficiency but with a greater chance of the domino effect
• lost messages can be dealt with as in other network protocols
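
The two conditions above can be checked mechanically. The sketch below assumes each checkpoint stores cumulative per-peer counts of messages sent and received; the dictionary layout and the function name `classify` are illustrative choices, not something the slides define.

```python
from itertools import permutations


def classify(checkpoints):
    """Compare sent/received counts for every ordered pair of nodes.

    checkpoints: {node: {"sent": {peer: n}, "recv": {peer: n}}}
    Returns (orphans, lost) as lists of (sender, receiver) pairs.
    """
    orphans, lost = [], []
    for sender, receiver in permutations(checkpoints, 2):
        sent = checkpoints[sender]["sent"].get(receiver, 0)
        received = checkpoints[receiver]["recv"].get(sender, 0)
        if received > sent:
            # Recorded as received but never recorded as sent: orphan.
            orphans.append((sender, receiver))
        elif received < sent:
            # Recorded as sent but not as received: possible lost message.
            lost.append((sender, receiver))
    return orphans, lost


cps = {
    "X": {"sent": {"Y": 2}, "recv": {"Y": 1}},
    "Y": {"sent": {}, "recv": {"X": 1}},
}
orphans, lost = classify(cps)
print("consistent:", not orphans, "orphans:", orphans, "lost:", lost)
```

A set is consistent exactly when the orphan list is empty; entries in the lost list are tolerated and left to retransmission.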

Synchronous Checkpointing
• Assumptions
  – FIFO delivery of messages
  – no lost messages
• Operations
  – an initiating node P broadcasts a message
  – all the other nodes
    • take temporary checkpoints if necessary
    • reply OK to P
    • do not send any messages until they hear from P
  – P broadcasts either
    • GO: if all the nodes reply OK to P
    • Fail: otherwise
  – nodes make the temporary checkpoint permanent or discard it
    • start to send messages from this point
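
The two phases above can be simulated in a single process. This is only a sketch: the `Node` class, its fields, and the `coordinate` function are hypothetical, message passing is replaced by direct method calls, and the initiator P is treated as just another node in the list.

```python
class Node:
    def __init__(self, name, can_checkpoint=True):
        self.name = name
        self.can_checkpoint = can_checkpoint
        self.state = 0
        self.tentative = None      # temporary checkpoint
        self.permanent = None      # last permanent checkpoint
        self.sending_blocked = False

    def on_request(self):
        # Phase 1: take a temporary checkpoint, stop sending, reply to P.
        if not self.can_checkpoint:
            return False
        self.tentative = self.state
        self.sending_blocked = True
        return True

    def on_decision(self, go):
        # Phase 2: make the temporary checkpoint permanent or discard it.
        if go and self.tentative is not None:
            self.permanent = self.tentative
        self.tentative = None
        self.sending_blocked = False   # normal messages may resume


def coordinate(nodes):
    # Collect every reply first so that all nodes see the request.
    replies = [node.on_request() for node in nodes]
    go = all(replies)
    for node in nodes:
        node.on_decision(go)
    return go


nodes = [Node("P"), Node("A"), Node("B")]
for i, node in enumerate(nodes):
    node.state = 10 + i
print(coordinate(nodes), [n.permanent for n in nodes])
```

If any node replies negatively, every temporary checkpoint is discarded and the previous permanent checkpoints remain the recovery line.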

Synchronous Checkpointing
• advantages
  – easy recovery: all processes restart from the checkpoint
• disadvantages
  – message overhead
  – hinders normal progress (no computational messages are allowed during checkpointing)

Asynchronous Checkpointing
• the checkpoint at each node is made independently
  – no guarantee of a consistent set
  – recovery is complex: it must find the nearest consistent set
• optimization: all incoming messages are logged after a checkpoint
  – the recovery algorithm analyzes the log and finds the most recent consistent set of checkpoints

Asynchronous Checkpointing (2)
  – Y crashes
    • Y restarts from the last checkpoint
    • Y sends ROLLBACK(Y,2) to X, since the last checkpoint records that Y has sent 2 msgs to X
    • Y sends ROLLBACK(Y,1) to Z (red lines)
  – other nodes send ROLLBACK msgs similarly (blue lines)
    • X sends out (X,2) and (X,0) to Y and Z, respectively
  – each node sets its chkpnt so as to prevent orphan msgs (red brackets)
    • the number of msgs received from i recorded in the chkpnt must be ≤ N, where a ROLLBACK(i,N) msg has arrived
  – loop until a consistent set of checkpoints comes up
    • bounded by N (?)

[Figure: time lines of X, Y (crash marked x), and Z, with brackets marking the checkpoints]
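
The rollback loop can be sketched as below. This is a simplification under stated assumptions: nodes roll back only to whole checkpoints (the logged messages between checkpoints are ignored), each checkpoint stores cumulative per-peer sent and received counts, and the first checkpoint of every node is its initial state with no messages. The function name `recover` and the data layout are hypothetical.

```python
def recover(histories):
    """Find a checkpoint per node such that no orphan messages remain.

    histories: {node: [checkpoint, ...]}, oldest first; each checkpoint is
    {"sent": {peer: n}, "recv": {peer: n}} with cumulative counts.
    Returns {node: index of the checkpoint the node rolls back to}.
    """
    current = {node: len(cps) - 1 for node, cps in histories.items()}
    changed = True
    while changed:                      # loop until the set is consistent
        changed = False
        for sender, cps in histories.items():
            sent = cps[current[sender]]["sent"]
            for receiver, recv_cps in histories.items():
                if receiver == sender:
                    continue
                n = sent.get(receiver, 0)   # payload of ROLLBACK(sender, n)
                # Roll back while the receiver has seen more than n messages.
                while recv_cps[current[receiver]]["recv"].get(sender, 0) > n:
                    current[receiver] -= 1
                    changed = True
    return current


EMPTY = {"sent": {}, "recv": {}}
histories = {
    "X": [EMPTY, {"sent": {"Z": 1}, "recv": {"Y": 2}}],
    "Y": [EMPTY, {"sent": {"X": 1}, "recv": {}}],   # Y crashed, restarts here
    "Z": [EMPTY, {"sent": {}, "recv": {"X": 1}}],
}
print(recover(histories))
```

In this example X has received one more message from Y than Y's checkpoint records as sent, so X rolls back; that in turn orphans the message X sent to Z, and Z rolls back as well.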

Free Transactions with Rio Vista
• crash taxonomy
  – hardware: not frequent
  – software: frequent, due to bugs in the OS
  – power: UPS
• motivations
  – transactions are useful but have high overhead (disk accesses)
  – the file cache is useful, but vulnerable to system crashes

Traditional Approach: RVM

• at the beginning of a transaction, RVM copies the page to the undo log (shadow page)
  – a user abort is serviced by the undo log
• at commit, RVM reclaims the undo space and writes the updated pages to the redo log on disk
  – a system/process failure is serviced by the redo log
• at leisure time, the database is updated from the redo log
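
The interplay of the undo log and the redo log can be sketched as follows. This is a toy model under loud assumptions: dictionaries stand in for the on-disk database and redo log, keys stand in for pages, and the class `ToyRVM` with its method names is hypothetical and is not the RVM library interface.

```python
_MISSING = object()   # marks a key that did not exist before the transaction


class ToyRVM:
    def __init__(self):
        self.disk_db = {}     # stand-in for the database on disk
        self.redo_log = []    # stand-in for the redo log on disk
        self.memory = {}      # in-memory image, lost at a crash
        self.undo = {}        # old values, kept only in memory

    def begin(self):
        self.undo = {}

    def write(self, key, value):
        # Save the old value once per transaction, then update in memory.
        if key not in self.undo:
            self.undo[key] = self.memory.get(key, _MISSING)
        self.memory[key] = value

    def abort(self):
        # A user abort is serviced by the undo log.
        for key, old in self.undo.items():
            if old is _MISSING:
                self.memory.pop(key, None)
            else:
                self.memory[key] = old
        self.undo = {}

    def commit(self):
        # Write the updated values to the redo log, reclaim undo space.
        self.redo_log.append({k: self.memory[k] for k in self.undo})
        self.undo = {}

    def truncate(self):
        # "Leisure time": apply the redo log to the database.
        for record in self.redo_log:
            self.disk_db.update(record)
        self.redo_log.clear()

    def recover(self):
        # After a crash, rebuild memory from the database plus the redo log.
        self.memory = dict(self.disk_db)
        for record in self.redo_log:
            self.memory.update(record)
        self.undo = {}


rvm = ToyRVM()
rvm.begin(); rvm.write("a", 1); rvm.commit()
rvm.begin(); rvm.write("a", 2)      # crash before commit
rvm.recover()
print(rvm.memory)
```

The commit path is where the disk write happens, which is the overhead the Rio Vista slides set out to remove.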

Rio file cache
• protect cached data from system crashes
  – the cache is then as reliable as a disk
  – so a write-ahead log for recovery is not needed
  – writes to disk can be delayed indefinitely
• OS errors can corrupt any part of the system
  – the issue is how to reduce the chances
• at a crash
  – the warm reboot process writes the cache to disk

file cache vs disk
• why do people view memory as more vulnerable than disk?
  – a memory access is a simple write
    • an error in the address bits will overwrite the file cache
  – the interface to access disk is complex and explicit
    • the hardware controller is accessed only through a device driver
    • calls to device drivers are checked for their arguments
    • it is extremely unlikely that an accidental error reproduces the logic of a device driver

How to protect from system crashes?

• prevent the OS from accidentally overwriting the file cache
• virtual memory mapping
  – turn off the write-permission bits in the page table for the pages in the file cache
  – unauthorized accesses will encounter a protection violation
  – the file cache module enables the bit before writing and disables it afterwards
• the file cache is vulnerable to crashes while being written
  – disk has the same problem
  – solutions
    • verify after writes
    • use a shadow copy for atomic writes
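
The write-permission window can be modelled in user space. The sketch below is only an analogy: a list of booleans stands in for the page-table write bits, a Python exception stands in for the protection violation, and the class `ProtectedCache` with its methods is hypothetical.

```python
class ProtectedCache:
    def __init__(self, npages):
        self.pages = [b""] * npages
        self.writable = [False] * npages   # stand-in for write-permission bits

    def _store(self, page, data):
        # Every store passes the permission check, as it would in the MMU.
        if not self.writable[page]:
            raise PermissionError(f"protection violation on page {page}")
        self.pages[page] = data

    def cache_write(self, page, data):
        # Authorized path: enable the bit, write, disable the bit again.
        self.writable[page] = True
        try:
            self._store(page, data)
        finally:
            self.writable[page] = False

    def stray_write(self, page, data):
        # Models a buggy kernel store that bypasses the file cache module.
        self._store(page, data)


cache = ProtectedCache(4)
cache.cache_write(0, b"good")
try:
    cache.stray_write(0, b"junk")
except PermissionError as err:
    print("blocked:", err)
print(cache.pages[0])
```

The page is exposed only inside `cache_write`, which mirrors the point above that the cache remains vulnerable while it is being written.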

How to protect from system crashes?

• some kernels bypass the address translation (TLB)
  – many systems can disable such bypasses
  – otherwise, code insertion (sandboxing)
    • check every kernel write that uses a physical address
    • 20-50% slower
• memory-mapped files
  – kernel procedures that modify a memory-mapped file should be changed as above
  – a faulty user program can still corrupt files to which it has write access

Warm Reboot
• recovery needs to access many data structures
  – internal file cache lists
  – page tables (memory-mapped files)
  – all these data must be protected from crashes, but they are scattered inside the kernel
• Registry
  – a separate physical memory region
  – contains all the information needed to recover the file cache
  – it is updated only when a buffer is replaced (reloaded)

File System Modifications
• writes to disk can be saved
  – most disk writes are reliability-induced
• writes to disk are needed only when the file cache overflows
• writing back dirty copies when the system is idle
  – reduces the time needed when a buffer is replaced

Vista Recoverable Memory

Recovery
• operations
  – prepare the undo log
  – write directly to the DB’s mapped image in Rio
    • these updates are persistent
  – at commit, discard the undo log
  – at abort, restore the mapped DB from the undo log
• at recovery
  – Rio writes back the Vista segments that were mapped at the time of the crash
  – Vista examines each segment for uncommitted transactions
    • roll back (apply the undo log)
  – the recovery process should be idempotent
    • a crash can happen while recovering
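
The idempotence requirement can be illustrated with a small sketch. It assumes that both the mapped segment and the undo log survive a crash (as they would in Rio), models them as plain Python objects, and uses a `crash_after` parameter to simulate a crash in the middle of recovery. The class `ToyVista` and all its names are hypothetical, not the Vista interface.

```python
class ToyVista:
    def __init__(self):
        self.segment = {}    # stand-in for the DB image mapped in Rio
        self.undo_log = []   # (key, existed_before, old_value), oldest first

    def write(self, key, value):
        # Log the old value, then update the mapped image in place.
        self.undo_log.append((key, key in self.segment, self.segment.get(key)))
        self.segment[key] = value

    def commit(self):
        self.undo_log.clear()          # at commit, discard the undo log

    def recover(self, crash_after=None):
        """Roll back the uncommitted transaction, newest record first.

        The log is discarded only after every record has been applied, so
        rerunning recovery after an interruption gives the same result.
        """
        for applied, (key, existed, old) in enumerate(reversed(self.undo_log)):
            if crash_after is not None and applied == crash_after:
                return False           # simulated crash during recovery
            if existed:
                self.segment[key] = old
            else:
                self.segment.pop(key, None)
        self.undo_log.clear()
        return True


db = ToyVista()
db.write("a", 1)
db.commit()
db.write("a", 2)                 # uncommitted
db.write("b", 3)                 # uncommitted
db.recover(crash_after=1)        # recovery itself is interrupted
db.recover()                     # second attempt runs to completion
print(db.segment)
```

Because restoring an old value twice has the same effect as restoring it once, the interrupted first attempt does no harm.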

Persistent Heap
• only transactions can use it
  – when they abort, all the heap space they used is returned
• the undo records mentioned above are stored here
• programs can store their original data structures
  – usually these are converted to a record style when stored in a file
• metadata for the heap is in user space
  – why?
  – needs protection from corruption
    • reduce the risk by using an isolated range of addresses
    • software fault isolation
    • virtual memory protection

Fault Tolerance with DSM
• DSM maintains multiple copies of a page
  – if a copy is lost, it can be recovered from another copy
• maintain at least two copies of each page
  – copes with a single failure
  – can be extended to cope with n failures
• what about state information?
  – can be rebuilt
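
The replication idea can be sketched as below, assuming a toy directory that places each page on a fixed number of nodes and re-replicates after a node failure. The class `ReplicatedPages`, the placement policy, and the `copies` parameter are illustrative assumptions; setting `copies` to n + 1 corresponds to tolerating n failures.

```python
class ReplicatedPages:
    def __init__(self, nodes, copies=2):
        self.copies = copies
        self.store = {node: {} for node in nodes}   # node -> {page: data}

    def holders(self, page):
        return [n for n, pages in self.store.items() if page in pages]

    def write(self, page, data):
        # Keep the existing holders and add nodes until enough copies exist.
        targets = self.holders(page)
        for node in self.store:
            if len(targets) >= self.copies:
                break
            if node not in targets:
                targets.append(node)
        for node in targets:
            self.store[node][page] = data

    def fail(self, node):
        # Drop the failed node, then restore the replication level
        # of every page it held from a surviving copy.
        lost = self.store.pop(node)
        for page in lost:
            survivors = self.holders(page)
            if not survivors:
                raise RuntimeError(f"page {page!r} lost all copies")
            self.write(page, self.store[survivors[0]][page])


dsm = ReplicatedPages(["A", "B", "C"])
dsm.write("p", "v1")
dsm.fail("A")
print(dsm.holders("p"))
```

With two copies per page, the page survives the loss of node A and is copied to another node so that a second, later failure can also be tolerated.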