Recovery Oriented Programming
Olga Brukman and Shlomi Dolev
Ben-Gurion University
Beer-Sheva, Israel
Towards Correct Software
• Software should respect its specifications
  – Safety, Liveness
• Example: an atomic power station
  – Safety: the atomic power station should not explode
  – Liveness: the atomic power station should produce some electricity
Recovery Oriented Design
• "Software performs substantially in accordance with specifications for a period of 90 days..." (IEEE Computer, October 2006)
• How to cope with such software? Recovery Oriented Computing [PBB'02]
• Recovery actions
  – Reboot, wait, reschedule
  – Non-intrusive: avoid rewriting the program (which may introduce new bugs)
Recovery Oriented Programming
• Specifications composer (project manager)
  – Invariants and predicates: important properties of the program IO
  – Recovery actions
• Programmer
  – Best-effort implementation
  – Uses the same IO variables as the specifications composer
• Still: bugs and unexpected states are possible
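To make the division of labour concrete, here is a minimal Python sketch, assuming the specifications composer's output can be held as plain data: a predicate over the shared IO variables plus the name of a recovery action. `RecoveryTuple`, `check`, `best_effort_step` and the variable names are hypothetical illustrations, not part of the framework.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, int]  # IO variables shared by the composer and the programmer


@dataclass(frozen=True)
class RecoveryTuple:
    """Hypothetical record written by the specifications composer."""
    name: str
    predicate: Callable[[State], bool]  # property the IO variables must satisfy
    recovery_action: str                # e.g. "restart", "wait", "reschedule"


def check(state: State, tuples: List[RecoveryTuple]) -> List[str]:
    """Return the recovery actions whose predicates the current state violates."""
    return [t.recovery_action for t in tuples if not t.predicate(state)]


# Composer: x must never be 7.
spec = [RecoveryTuple("x_not_7", lambda s: s["x"] != 7, "restart")]


# Programmer: best-effort code that writes the same IO variable x.
def best_effort_step(state: State, a: int) -> None:
    state["x"] = a


state: State = {"x": 0}
best_effort_step(state, 7)  # a bug: the implementation produces a forbidden value
print(check(state, spec))   # ['restart']
```

The programmer never has to call `check`; in the framework it is the pre-compiler that weaves such checks into the code.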
Recovery Oriented Programming: Assumptions
• Self-stabilizing processor
• Self-stabilizing OS
• Infrastructure for robust monitoring and recovery
  – Processes exist and execute their code
• Not immediately Byzantine
  – Eventually Byzantine program: it behaves correctly long enough to do a sufficient job
Our Framework
[Figure: the pre-compiler takes the code, the recovery tuples and the subsystems hierarchy, and generates event-driven monitoring and an external monitor for every process, plus an external monitor for every subsystem.]
The system is able to recover from any state.
Generated Code: One Process
[Figure: the process code with its generated event-driven monitoring, derived from the recovery tuples, and its external monitor.]
Generated Code: Subsystem
[Figure: several processes, each with its code, event-driven monitoring and external monitor, grouped according to the subsystems hierarchy into a subsystem with its own external monitor.]
Our Framework: Transforming Recovery Tuples into Code
[Figure: the pre-compiler turns the code, the recovery tuples and the subsystems hierarchy into event-driven monitoring and external monitors for processes and subsystems.]
Safety Recovery Tuple
• Original code (1 process): ...x=a;...
• Recovery tuple: PRED: x!=7, RA: this.restart()
• Code generated by the pre-compiler:
    temp_x=a;
    if (temp_x!=7) x=temp_x;
    else this.restart();
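The transformation on this slide is purely syntactic, so it can be imitated with a toy text rewriter. The Python sketch below assumes a single guarded variable, assignments of the form `var=expr;`, and a predicate written over the variable name; `precompile` is a hypothetical helper, and a regular expression is no substitute for the real pre-compiler's parser.

```python
import re


def precompile(source: str, var: str, pred: str, ra: str) -> str:
    """Toy pre-compiler: guard every assignment 'var=expr;' with the tuple's predicate.

    The new value is computed into temp_<var> first and only committed if the
    predicate holds on it; otherwise the recovery action runs instead.
    """
    tmp = f"temp_{var}"
    # Evaluate PRED on the temporary (whole-word textual substitution).
    check = re.sub(rf"\b{re.escape(var)}\b", tmp, pred)
    # Match 'var=' but not 'var==' (a comparison) and not 'temp_var=' (\b).
    assign = re.compile(rf"\b{re.escape(var)}=(?!=)([^;]+);")

    def guard(m: re.Match) -> str:
        return f"{tmp}={m.group(1)}; if ({check}) {var}={tmp}; else {ra}"

    return assign.sub(guard, source)


print(precompile("y=1; x=a; z=2;", var="x", pred="x!=7", ra="this.restart();"))
# y=1; temp_x=a; if (temp_x!=7) x=temp_x; else this.restart(); z=2;
```

Writing to a temporary first means a forbidden value is never stored in `x`, so the in-code check prevents the violation instead of merely detecting it.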
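Safety Recovery Tuple in the Scope of Stabilization: External Monitoring
• Original code (1 process): ...x=a;...
• Recovery tuple: PRED: x!=7, RA: this.restart();
• Code generated by the pre-compiler:
    In the process code:
      temp_x=a;
      if (temp_x!=7) x=temp_x;
      else this.restart();
      ...
    In the external monitor:
      if (!(ps.x!=7)) ps.restart();
• The external check is needed once no more x=... assignments are executed: a corrupted x would then never reach the in-code check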
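The external check runs on a schedule, independently of whether the process ever executes another `x=...`. A minimal single-threaded Python simulation follows, assuming the recovery action resets the process to a trusted initial state; `Process` and `external_monitor_check` are hypothetical names.

```python
from typing import Callable, Dict

State = Dict[str, int]


class Process:
    """Toy process whose variables may be corrupted by a transient fault."""

    def __init__(self, initial: State) -> None:
        self.initial = dict(initial)
        self.state = dict(initial)
        self.restarts = 0

    def restart(self) -> None:
        # Assumed recovery action: return to the trusted initial state.
        self.state = dict(self.initial)
        self.restarts += 1


def external_monitor_check(ps: Process, pred: Callable[[State], bool]) -> None:
    """One scheduled check, mirroring: if (!(ps.x!=7)) ps.restart();"""
    if not pred(ps.state):
        ps.restart()


ps = Process({"x": 0})
ps.state["x"] = 7  # transient fault; no later 'x=...' will run the in-code check
external_monitor_check(ps, lambda s: s["x"] != 7)
print(ps.state, ps.restarts)  # {'x': 0} 1
```

A further check on the now-correct state leaves the process alone, so in fault-free runs the monitor costs only the predicate evaluation.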
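Liveness Recovery Tuple
• Original code (1 process): x=x+2; ... y=y+5; ...
• Recovery tuple: INV: eventually x+y=15, RA: this.restart(), HTR: history={}
• Code generated by the pre-compiler:
    In the process code:
      x=x+2; if (x+y==15) this.history={};
      ...
      y=y+5; if (x+y==15) this.history={};
    History = [ ..., {.., x=1, y=2, ..}, {.., x=3, y=7, ..}, ... ]
    External monitor (scheduled check of the history log; ▪ appends the current state):
      history=history▪this.state();
      if (loop in history and CPU(this)) ps.restart();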
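One way to read the scheduled check: if the process got CPU time and still returned to a state already in its history without reaching the invariant, it may loop forever, so it is restarted. The Python sketch below assumes exactly that reading of `loop in history` and `CPU(this)`, puts the in-code `history={}` reset in the same hypothetical `LivenessMonitor` class for brevity, and also clears the log on restart (an assumption, not taken from the slide).

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
Record = Tuple[Tuple[str, int], ...]  # hashable copy of a state


class LivenessMonitor:
    """Hypothetical monitor for 'INV: eventually x+y=15, RA: restart'."""

    def __init__(self, inv: Callable[[State], bool]) -> None:
        self.inv = inv
        self.history: List[Record] = []

    def invariant_reached(self, state: State) -> None:
        """In-code check generated after each assignment: reset the history."""
        if self.inv(state):
            self.history = []

    def scheduled_check(self, state: State, got_cpu: bool) -> bool:
        """history = history ▪ this.state(); True means ps.restart() should run."""
        record = tuple(sorted(state.items()))
        looped = record in self.history  # assumed meaning of 'loop in history'
        self.history.append(record)
        if looped and got_cpu:           # assumed meaning of CPU(this)
            self.history = []            # assumption: a restart also trims the log
            return True
        return False


mon = LivenessMonitor(lambda s: s["x"] + s["y"] == 15)
print(mon.scheduled_check({"x": 1, "y": 2}, got_cpu=True))  # False: new state
print(mon.scheduled_check({"x": 3, "y": 7}, got_cpu=True))  # False: progress
print(mon.scheduled_check({"x": 3, "y": 7}, got_cpu=True))  # True: no progress despite CPU
```

Requiring `got_cpu` avoids blaming a process that merely was not scheduled between two checks.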
Generated Monitoring Code for Subsystem
• Recovery tuples for subsystem sub: p1, p2
• History = [ ..., distributed snapshot(sub), ... ], kept by the external monitor for sub
[Figure: the pre-compiler generates the code for p1 and p2, each with event-driven monitoring and an external monitor, plus the external monitor for sub.]
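A subsystem's history records snapshots of all its processes, which suggests why a loop should be judged on the combined state: one process may repeat its local state (for example while waiting) as long as another keeps making progress. In the Python sketch below, `snapshot` and `has_loop` are hypothetical; `snapshot` simply reads every member's state in one thread, whereas a real subsystem would need a consistent global snapshot algorithm such as Chandy and Lamport's.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

State = Dict[str, int]
Snapshot = Tuple[Tuple[str, Tuple[Tuple[str, int], ...]], ...]


def snapshot(sub: Dict[str, State]) -> Snapshot:
    """Stand-in for 'distributed snapshot(sub)': a hashable copy of every member.

    This toy reads all states at once; it is not a distributed algorithm."""
    return tuple(sorted((name, tuple(sorted(st.items()))) for name, st in sub.items()))


def has_loop(history: Sequence[Hashable]) -> bool:
    """'loop in history': some recorded state occurs more than once."""
    return len(set(history)) < len(history)


# p1 is waiting (its local state repeats) while p2 keeps making progress.
run: List[Dict[str, State]] = [{"p1": {"pc": 3}, "p2": {"k": k}} for k in range(4)]
p1_history = [tuple(sorted(sub["p1"].items())) for sub in run]
sub_history = [snapshot(sub) for sub in run]

print(has_loop(p1_history))   # True: judged alone, p1 looks stuck
print(has_loop(sub_history))  # False: the subsystem as a whole progresses
```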
Generic Correctness Theorem
• In the program produced by the pre-compiler, every rsf (restart supporting fair) execution E has a suffix in which the program respects its specification function
  – An rsf-execution is an execution in which the system is trusted to behave according to its specifications after a restart
Generic Correctness Proof
• Assumption: processes and external monitors are scheduled fairly, due to the presence of a self-stabilizing software platform
• Safety: a process either reaches the monitoring section in its code or its external monitor makes a scheduled check
  – Subsystem: the external monitor makes a scheduled check
Generic Correctness Proof Cont.
• Liveness: the external monitor of the process (subsystem) makes a scheduled check of the history log
• Corrupted history:
  – If it causes an (unnecessary) recovery, it is trimmed
  – New correct records are eventually accumulated and reflect the real state of the system
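The corrupted-history argument can be replayed in a few lines. The Python sketch below reads the history check as a repeated-state test and assumes the process always had CPU time; the hypothetical `HistoryLog` starts with a forged record, which triggers one unnecessary restart, after which the log is trimmed and only real states accumulate.

```python
from typing import List, Tuple

Record = Tuple[int, int]  # (x, y)


class HistoryLog:
    """Hypothetical history log whose initial contents may be arbitrary."""

    def __init__(self, initial: List[Record]) -> None:
        self.records = list(initial)  # possibly corrupted by a transient fault
        self.restarts = 0

    def scheduled_check(self, state: Record) -> None:
        looped = state in self.records
        self.records.append(state)
        if looped:                 # process assumed to have had CPU time
            self.restarts += 1     # recovery action: restart the process ...
            self.records = []      # ... and trim the (possibly bogus) log


log = HistoryLog(initial=[(3, 2)])       # forged record, never really visited
for state in [(1, 2), (3, 2)]:           # first run, cut short by the restart
    log.scheduled_check(state)
for state in [(1, 2), (3, 2), (5, 2)]:   # run after the restart
    log.scheduled_check(state)
print(log.restarts, log.records)         # 1 [(1, 2), (3, 2), (5, 2)]
```

The cost of the corruption is bounded by one extra restart; from then on the log reflects only states the process actually reached.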
Related Work: Perfect Software
• Formal specification languages
  – ASM [GRS'04], IO Automata [L'96], NuPRL [CKB'84]
  – Gradually and manually translated into a fully verified program
• Model checking
  – Does not scale
• Specification-embedding programming languages
  – SCR (Software Cost Reduction) language [RLHL'06]
  – Programmer bugs
Related Work: Programming Tools
• Design by Contract
  – Eiffel, iContract for Java
  – Checking invariants on an object state and pre-/post-conditions on object methods, recovery by a predefined recovery action
  – Partial monitoring of liveness, based on timeouts
  – Monitoring of safety outside the scope of stabilization
• Exceptions
  – Suitable for a single process only
  – Impractical for changing the program flow
Related Work: Online Recovery
• Recovery blocks (N-programming) [RX'94]
• ROC [PBB'02], Java MOP [CR'05], Kinesthetics eXtreme [KPGV'03], "On Modeling and Tolerating Incorrect Software" [AT'03]
  – Monitoring/correcting layer that alters the behaviour of the failed component
Related Work: Online Recovery Cont.
• Assumption that the monitoring/correcting layer is stable
  – ROC [PBB'02], Java MOP [CR'05], Kinesthetics eXtreme [KPGV'03]
• Intrusive correcting actions
  – Empty program: the correcting actions define the program
  – "On Modeling and Tolerating Incorrect Software" [AT'03]
Conclusions
• Recovery Oriented Programming paradigm for a programming language
• Full monitoring of safety and liveness properties in the scope of stabilization
• Formal correctness proof scheme for the resulting code