8/6/2019 SADIYA FARHEEN
1/25
Session :
Feb-Jun 2011
FAULT TOLERANCE & FAULT TOLERANCE ARCHITECTURES
In Critical Systems Development
Under the guidance of
Mr. Manjunath C.R.
Asst. Prof., SBMJCE
By, Sadiya Farheen
10MT6ECS10
SBMJCE, Jain University
FAULT TOLERANCE
In critical situations, software systems must be
fault tolerant.
Fault tolerance is required where there are
high availability requirements or where system
failure costs are very high.
Fault tolerance means that the system can
continue in operation in spite of software
failure.
FAULT TOLERANCE
ACTIONS
Fault detection
Damage assessment
Fault recovery
Fault repair
FAULT DETECTION
The first stage of fault tolerance is to detect that a fault (an
erroneous system state) has occurred or will occur.
Ex. Insulin pump software:
// The dose of insulin to be delivered must always be greater
// than zero and less than some defined maximum single dose
insulin_dose >= 0 & insulin_dose <= insulin_reservoir_contents
// The total amount of insulin delivered in a day must be less
// than or equal to a defined daily maximum dose
cumulative_dose <= (defined daily maximum dose)
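The two state constraints above could be checked in code before a dose is delivered. The following is a minimal Java sketch, not the pump's actual software: the class name InsulinDoseChecker, its methods, and the maxDailyDose parameter are all assumptions introduced for illustration.

```java
// Illustrative sketch only: class, method, and parameter names are assumptions.
final class InsulinDoseChecker {
    private final int maxDailyDose;   // assumed: the defined daily maximum dose
    private int cumulativeDose = 0;   // total delivered so far today

    InsulinDoseChecker(int maxDailyDose) {
        this.maxDailyDose = maxDailyDose;
    }

    // Returns true only if delivering the dose keeps both constraints satisfied.
    boolean isSafe(int insulinDose, int reservoirContents) {
        boolean singleDoseOk = insulinDose >= 0 && insulinDose <= reservoirContents;
        boolean dailyTotalOk = cumulativeDose + insulinDose <= maxDailyDose;
        return singleDoseOk && dailyTotalOk;
    }

    // Records a delivery; throws if a constraint would be violated.
    void deliver(int insulinDose, int reservoirContents) {
        if (!isSafe(insulinDose, reservoirContents)) {
            throw new IllegalStateException("insulin dose violates a state constraint");
        }
        cumulativeDose += insulinDose;
    }

    int cumulativeDose() {
        return cumulativeDose;
    }
}
```

Because the check runs before the cumulative total is updated, an erroneous dose is rejected without the state ever becoming invalid.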
Types of fault detection
Preventative fault detection
- The fault detection mechanism is initiated
before the state change is committed.
Retrospective fault detection
- The fault detection mechanism is initiated after
the system state has been changed.
Implementation of
preventative fault detection
class PositiveEvenInteger {
    // NumericException is an application-defined exception (not shown).
    int val = 0;

    PositiveEvenInteger(int n) throws NumericException
    {
        // reject the value before the state change is committed
        if (n < 0 || n % 2 != 0)
            throw new NumericException();
        else
            val = n;
    } // PositiveEvenInteger

    public void assign(int n) throws NumericException
    {
        if (n < 0 || n % 2 != 0)
            throw new NumericException();
        else
            val = n;
    } // assign

    int toInteger()
    {
        return val;
    } // toInteger

    boolean equals(PositiveEvenInteger n)
    {
        return (val == n.val);
    } // equals
} // PositiveEvenInteger
Damage Assessment
Analyse system state to judge the extent of corruption
caused by a system failure.
The assessment must check what parts of the state
space have been affected by the failure.
Generally based on validity functions that can be
applied to the state elements to assess if their value is
within an allowed range.
interface CheckableObject {
    public boolean check();
}

class RobustArray {
    // Checks that all the objects in an array of objects
    // conform to some defined constraint
    boolean[] checkState;
    CheckableObject[] theRobustArray;

    RobustArray(CheckableObject[] theArray)
    {
        checkState = new boolean[theArray.length];
        theRobustArray = theArray;
    } // RobustArray

    public void assessDamage() throws ArrayDamagedException
    {
        boolean hasBeenDamaged = false;
        for (int i = 0; i ...
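The listing above breaks off inside the loop. One plausible way to complete it is sketched below: flag each element whose check() fails, then throw once the whole array has been scanned. The loop body and the ArrayDamagedException class are assumptions, not recovered text from the slides.

```java
// Sketch under assumptions: the loop body and the exception class are not from the slides.
interface CheckableObject {
    boolean check();
}

class ArrayDamagedException extends Exception {
    ArrayDamagedException() {
        super("array contains damaged elements");
    }
}

class RobustArray {
    boolean[] checkState;             // true at index i means element i failed its check
    CheckableObject[] theRobustArray;

    RobustArray(CheckableObject[] theArray) {
        checkState = new boolean[theArray.length];
        theRobustArray = theArray;
    }

    public void assessDamage() throws ArrayDamagedException {
        boolean hasBeenDamaged = false;
        for (int i = 0; i < theRobustArray.length; i++) {
            checkState[i] = !theRobustArray[i].check();
            if (checkState[i]) {
                hasBeenDamaged = true;
            }
        }
        if (hasBeenDamaged) {
            throw new ArrayDamagedException();
        }
    }
}
```

Scanning the whole array before throwing means checkState records every damaged element, so a later recovery step knows which parts of the state to repair.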
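Damage assessment techniques
Checksums
- Detect corruption of stored or transmitted data.
Redundant pointers
- Check the integrity of data structures.
Watchdog timers
- Detect processes that fail to respond or terminate in time.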
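As an illustration of the checksum technique, a checksum stored alongside the data can later reveal that the data has been corrupted. This is a minimal sketch using the standard java.util.zip.CRC32 class; the ChecksummedRecord wrapper is a hypothetical name introduced here.

```java
import java.util.zip.CRC32;

// Hypothetical wrapper: stores a CRC-32 of the payload at construction time.
final class ChecksummedRecord {
    final byte[] payload;
    private final long storedChecksum;

    ChecksummedRecord(byte[] data) {
        this.payload = data.clone();
        this.storedChecksum = crcOf(this.payload);
    }

    private static long crcOf(byte[] bytes) {
        CRC32 crc = new CRC32();
        crc.update(bytes, 0, bytes.length);
        return crc.getValue();
    }

    // Damage assessment: true if the payload no longer matches its checksum.
    boolean isDamaged() {
        return crcOf(payload) != storedChecksum;
    }
}
```

A checksum only detects damage; it does not say which bytes changed, so it is normally paired with a recovery mechanism.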
Fault recovery and repair
Forward recovery
- Apply repairs to a corrupted system state.
Backward recovery
- Restore the system state to a known safe state.
Forward recovery is usually application specific
- domain knowledge is required to compute
possible state corrections.
Backward error recovery is simpler. Details of a
safe state are maintained and this replaces the
corrupted system state.
Forward recovery
Corruption of data coding
- Error coding techniques which add redundancy to coded
data can be used for repairing data corrupted during
transmission.
Redundant pointers
- When redundant pointers are included in data structures
(e.g. two-way lists), a corrupted list or filestore may be
rebuilt if a sufficient number of pointers are uncorrupted.
- Often used for database and file system repair.
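To make the redundant-pointer idea concrete, the sketch below repairs a two-way list whose backward pointers have been corrupted, using the forward pointers as the redundant information. It assumes the forward pointers are intact. The Node and ListRepair names are hypothetical.

```java
// Hypothetical two-way list node; names are illustrative.
final class Node {
    final int value;
    Node next;
    Node prev;

    Node(int value) {
        this.value = value;
    }
}

final class ListRepair {
    // Forward recovery: rebuild every prev pointer from the (assumed intact) next pointers.
    // Returns the number of prev pointers that had to be corrected.
    static int repairPrevPointers(Node head) {
        int repaired = 0;
        Node previous = null;
        for (Node current = head; current != null; current = current.next) {
            if (current.prev != previous) {
                current.prev = previous;
                repaired++;
            }
            previous = current;
        }
        return repaired;
    }
}
```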
Backward recovery
Transactions are a frequently used method of
backward recovery. Changes are not applied until
computation is complete. If an error occurs, the
system is left in the state preceding the transaction.
Periodic checkpoints allow the system to 'roll back' to a
correct state.
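Both ideas can be sketched in a few lines of Java. In this hypothetical CheckpointedLedger (all names are assumptions), applyAll behaves like a transaction by working on a copy and committing only on success, while checkpoint and rollBack save and restore a known state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical ledger used only to illustrate transactions and checkpoints.
final class CheckpointedLedger {
    private List<Integer> entries = new ArrayList<>();
    private List<Integer> checkpoint = new ArrayList<>();

    // Save a copy of the current (assumed correct) state.
    void checkpoint() {
        checkpoint = new ArrayList<>(entries);
    }

    // Backward recovery: discard the current state and restore the saved copy.
    void rollBack() {
        entries = new ArrayList<>(checkpoint);
    }

    // Transaction-style update: changes go to a working copy and are
    // committed only if every amount passes the validity check.
    void applyAll(List<Integer> amounts) {
        List<Integer> working = new ArrayList<>(entries);
        for (int amount : amounts) {
            if (amount <= 0) {
                throw new IllegalArgumentException("invalid amount: " + amount);
            }
            working.add(amount);
        }
        entries = working; // commit
    }

    List<Integer> entries() {
        return new ArrayList<>(entries);
    }
}
```

If applyAll throws partway through, entries is untouched, which is the "state preceding the transaction" behaviour described above.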
Safe sort procedure
A sort operation monitors its own execution and
assesses if the sort has been correctly executed.
It maintains a copy of its input so that if an error
occurs, the input is not corrupted.
Based on identifying and handling exceptions.
Possible in this case as the condition for a valid sort is
known. However, in many cases it is difficult to write
validity checks.
class SafeSort {
    // Sort, Sort.ascending, and SortError are assumed to be defined elsewhere.
    static void sort(int[] intarray, int order) throws SortError
    {
        int[] copy = new int[intarray.length];
        // copy the input array
        for (int i = 0; i < intarray.length; i++)
            copy[i] = intarray[i];
        try {
            Sort.bubblesort(intarray, intarray.length, order);
            // braces stop the else from binding to the inner if
            if (order == Sort.ascending) {
                for (int i = 0; i <= intarray.length - 2; i++)
                    if (intarray[i] > intarray[i + 1])
                        throw new SortError();
            } else {
                for (int i = 0; i <= intarray.length - 2; i++)
                    if (intarray[i + 1] > intarray[i])
                        throw new SortError();
            }
        } // try block
        catch (SortError e)
        {
            // restore the original input before reporting the failure
            for (int i = 0; i < intarray.length; i++)
                intarray[i] = copy[i];
            throw new SortError("Array not sorted");
        } // catch
    } // sort
} // SafeSort
Fault tolerant architecture
Defensive programming cannot cope with faults that
involve interactions between the hardware and the
software.
Where systems have high availability requirements, a
specific architecture designed to support fault
tolerance may be required.
This must tolerate both hardware and software failure.
Hardware fault tolerance
Triple Modular Redundancy (TMR) copes with hardware failure.
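In TMR, three identical units process the same input and a comparator passes on the output that at least two of them agree on. The sketch below models only that voting step in software. TmrVoter is a hypothetical name, and a real voter would be a hardware component.

```java
// Illustrative sketch: a software model of the TMR output comparator (voter).
final class TmrVoter {
    // Returns the value produced by at least two of the three units.
    // Throws if all three disagree, since no majority exists.
    static int vote(int a, int b, int c) {
        if (a == b || a == c) {
            return a;
        }
        if (b == c) {
            return b;
        }
        throw new IllegalStateException("no majority among the three units");
    }
}
```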
Software analogies to TMR
N-version programming
- The same specification is implemented in a number of
different versions by different teams. All versions compute
simultaneously and the majority output is selected using a
voting system.
- This is the most commonly used approach e.g. in many
models of the Airbus commercial aircraft.
Recovery blocks
- A number of explicitly different versions of the same
specification are written and executed in sequence.
- An acceptance test is used to select the output to be
transmitted.
N-version programming
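A rough software sketch of the scheme: each version receives the same input, and an output is accepted only if a strict majority of versions produced it. NVersionExecutor is a hypothetical name, and for simplicity the versions run one after another here, whereas the approach described in this deck has them compute simultaneously.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.IntUnaryOperator;

// Sketch only: versions run in sequence here rather than simultaneously.
final class NVersionExecutor {
    // Runs every version on the same input and returns the strict-majority output, if any.
    static Optional<Integer> run(List<IntUnaryOperator> versions, int input) {
        Map<Integer, Integer> votes = new HashMap<>();
        for (IntUnaryOperator version : versions) {
            try {
                votes.merge(version.applyAsInt(input), 1, Integer::sum);
            } catch (RuntimeException e) {
                // A crashed version simply casts no vote.
            }
        }
        for (Map.Entry<Integer, Integer> entry : votes.entrySet()) {
            if (entry.getValue() * 2 > versions.size()) {
                return Optional.of(entry.getKey());
            }
        }
        return Optional.empty();
    }
}
```

Returning an empty Optional when no majority exists leaves the decision about what to do next (retry, shut down safely, raise an alarm) to the caller.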
Recovery blocks
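A recovery block can be sketched as a loop over alternative implementations: each is tried in turn, and the first result that passes the acceptance test is returned. The RecoveryBlock class and its execute method are hypothetical names. A fuller version would also restore the system state before trying the next alternative.

```java
import java.util.List;
import java.util.function.IntPredicate;
import java.util.function.IntUnaryOperator;

// Sketch only: alternatives are tried in order until one passes the acceptance test.
final class RecoveryBlock {
    static int execute(List<IntUnaryOperator> alternatives, IntPredicate acceptanceTest, int input) {
        for (IntUnaryOperator alternative : alternatives) {
            try {
                int result = alternative.applyAsInt(input);
                if (acceptanceTest.test(result)) {
                    return result;
                }
            } catch (RuntimeException e) {
                // Treat a crash like a failed acceptance test and try the next alternative.
            }
        }
        throw new IllegalStateException("all alternatives failed the acceptance test");
    }
}
```

Unlike N-version programming, later alternatives run only when earlier ones fail, so the normal-case cost is a single execution plus the acceptance test.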
Key points
Exceptions are used to support error management in
dependable systems.
The four aspects of program fault tolerance are fault
detection, damage assessment, fault recovery and
fault repair.
N-version programming and recovery blocks are
alternative approaches to fault-tolerant architectures.
QUERIES??
THANK YOU FOLKS!!!!