Dealing with Byzantine Faults: CS 686 Final Project, brought to you by Chris Sosa

Handling Byzantine Faults

My presentation on handling Byzantine faults in distributed systems, given for my graduate dependability course.

Overview

Motivation in Dependable Systems

Common Types of Byzantine Faults

Solutions in Real Systems

The Myths
• Hardware cannot be “traitorous”!
  • Anthropomorphic model
• Any system with consensus is susceptible
• It’s never happened before
  • Often misclassified
  • Legionnaires’ disease: a real problem that went unrecognized until it was identified

The Awful Truth
• Time-Triggered Architecture
  • Radioactive fault injection into one node
  • Messed up the timing protocol (SOS: slightly-off-specification faults)
  • Nodes formed cliques until the system failed
• Quad-Redundant Control System
  • No message exchange
  • Lots of redundancy
  • One fault propagated to look like many
• Professor Knight’s Computer
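The quad-redundant control system failure can be illustrated with a toy model. The Python below is a minimal sketch under assumed conditions, not a model of the actual system: four channels each read one shared source directly (no message exchange), and a hypothetical `vote` monitor compares their outputs to find the faulty channel.

```python
from collections import Counter


def vote(outputs):
    """Majority-vote monitor over redundant channel outputs.

    Returns (agreed_value, suspect_channels). With no strict majority it
    returns (None, every channel), because the monitor cannot tell which
    channels are wrong.
    """
    value, count = Counter(outputs).most_common(1)[0]
    if count > len(outputs) // 2:
        return value, [i for i, out in enumerate(outputs) if out != value]
    return None, list(range(len(outputs)))


# Symmetric fault: one channel is simply wrong, and voting isolates it.
print(vote([1, 1, 1, 0]))  # (1, [3])

# Asymmetric (Byzantine) fault: a single faulty source is read as 1 by
# channels 0-1 and as 0 by channels 2-3. With no message exchange the
# channels cannot compare notes, so one fault looks like many failed channels.
print(vote([1, 1, 0, 0]))  # (None, [0, 1, 2, 3])
```

Adding more channels does not fix the second case: as long as a single source can hand different values to different channels, the split can be arranged to defeat the vote.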

Trends in Dependable Systems
1. Device Physics
   • Smaller and faster is not always better
   • Cosmic rays, etc.
2. Movement to Distributed Topologies
3. Usage of Commercial Off-the-Shelf (COTS) Technology

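Common Types of Observed Faults
1. Value
   • Digital values are really the extremes of an analog signal, so a marginal level can be read as 1 by one receiver and 0 by another
   • Propagation
2. Temporal
   • Different observations at the same time (a message arriving near a timing boundary is on time for some nodes and late for others)
   • Synchronization doesn’t help very much
3. Value + Temporal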
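Both fault types come down to receivers disagreeing about the same signal. This Python sketch is a toy illustration with made-up numbers: the per-receiver thresholds and clock offsets are hypothetical, chosen only to show how small, individually harmless differences produce disagreement on a marginal input.

```python
# Hypothetical per-receiver logic thresholds (volts), e.g. from manufacturing variation.
RECEIVER_THRESHOLDS = [2.45, 2.50, 2.55]
# Hypothetical residual clock skew of each receiver (microseconds).
RECEIVER_CLOCK_OFFSETS = [-0.5, 0.0, 0.5]


def observe_value(voltage):
    """Each receiver digitizes the same analog level against its own threshold."""
    return [1 if voltage >= t else 0 for t in RECEIVER_THRESHOLDS]


def observe_timing(arrival_us, deadline_us):
    """Each receiver judges 'on time' against its own slightly skewed clock."""
    return [arrival_us <= deadline_us + offset for offset in RECEIVER_CLOCK_OFFSETS]


# Value faults
print(observe_value(5.0))  # [1, 1, 1]  clean high level: all receivers agree
print(observe_value(2.5))  # [1, 1, 0]  marginal level: receivers disagree

# Temporal faults
print(observe_timing(90.0, 100.0))   # [True, True, True]    well before the deadline
print(observe_timing(100.2, 100.0))  # [False, False, True]  near the deadline
```

Shrinking the clock offsets narrows the window in which receivers disagree but never closes it, which is one way to read why synchronization doesn’t help very much.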

Solutions?

Solutions (1): Full Exchange
• Uses classical Byzantine agreement
• SPIDER: bus (ROBUS) design
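Full exchange builds on classical Byzantine agreement. As a minimal sketch, here is Lamport, Shostak and Pease’s Oral Messages algorithm OM(m) in Python. The “attack”/“retreat” values, the default decision and the traitor’s lying pattern are illustrative assumptions; this is not how SPIDER or ROBUS implements its exchange.

```python
from collections import Counter

DEFAULT = "retreat"  # assumed default decision when no strict majority exists


def majority(values):
    """Strict-majority value of a list, or DEFAULT if there is none."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) // 2 else DEFAULT


def send(sender, receiver, value, traitors):
    """Deliver a message. A loyal sender is faithful; a traitor (hypothetical
    behaviour) tells even-numbered receivers 'attack' and odd ones 'retreat'."""
    if sender in traitors:
        return "attack" if receiver % 2 == 0 else "retreat"
    return value


def om(m, commander, lieutenants, value, traitors):
    """Oral Messages algorithm OM(m); returns {lieutenant: decided value}."""
    # Step 1: the commander sends its value to every lieutenant.
    received = {lt: send(commander, lt, value, traitors) for lt in lieutenants}
    if m == 0:
        return received
    # Step 2: each lieutenant acts as commander in OM(m - 1), relaying
    # the value it received to the other lieutenants.
    relayed = {}
    for lt in lieutenants:
        others = [o for o in lieutenants if o != lt]
        relayed[lt] = om(m - 1, lt, others, received[lt], traitors)
    # Step 3: each lieutenant decides on the majority of its direct value
    # and the values relayed to it by every other lieutenant.
    return {
        lt: majority([received[lt]] + [relayed[o][lt] for o in lieutenants if o != lt])
        for lt in lieutenants
    }


# n = 4 processors and f = 1 traitor, so OM(1) suffices.
print(om(1, 0, [1, 2, 3], "attack", traitors={3}))  # loyal 1 and 2 both decide 'attack'
print(om(1, 0, [1, 2, 3], "attack", traitors={0}))  # all loyal lieutenants decide alike
```

Every lieutenant re-broadcasts what it heard, which is exactly the message exchange that plain redundancy lacks; the recursion is also why the cost grows so quickly with m.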

Solutions (2): Hierarchical
• Uses a hierarchy of different fault-tolerant techniques, including Byzantine agreement
• Seen with:
  • Fail-stop processors
  • SAFEbus
    • Communication backplane for the Boeing 777
    • Uses two buses, which are themselves dual redundant; different forms of parity detect errors
    • Uses self-checking pairs on top of the buses
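The hierarchical idea can be sketched with self-checking pairs. The Python below is a minimal, hypothetical illustration of the fail-stop principle, not SAFEbus’s actual design: each pair compares two copies of the same computation and stays silent on disagreement, and a higher level simply uses any pair that produced output. The `healthy` and `faulty` functions are invented stand-ins.

```python
from typing import Callable, Optional, Sequence

Compute = Callable[[int], int]


def self_checking_pair(half_a: Compute, half_b: Compute, x: int) -> Optional[int]:
    """Run the same computation on both halves and compare.

    On a mismatch the pair goes silent (returns None) instead of emitting a
    possibly wrong value, turning an arbitrary internal fault into fail-stop.
    """
    a, b = half_a(x), half_b(x)
    return a if a == b else None


def first_live_output(pair_outputs: Sequence[Optional[int]]) -> Optional[int]:
    """Next level of the hierarchy: take the output of any pair that spoke."""
    for out in pair_outputs:
        if out is not None:
            return out
    return None


def healthy(x: int) -> int:
    return 2 * x


def faulty(x: int) -> int:
    return 2 * x + 1  # hypothetical corrupted result


pair_1 = self_checking_pair(healthy, faulty, 21)   # halves disagree -> None
pair_2 = self_checking_pair(healthy, healthy, 21)  # halves agree -> 42
print(pair_1, pair_2, first_live_output([pair_1, pair_2]))  # None 42 42
```

The guarantee rests on the two halves not failing identically; a common-mode fault that corrupts both halves the same way would still pass the comparison.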

Solutions (3): Filtering
• Targets propagation of Byzantine faults
• Tries to either:
  • Mask faults by forcing the output to a clean, definite value (removes value-type faults), or
  • Segment the system into Fault Containment Regions (FCRs), with protections at their boundaries to stop propagation
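Masking by forcing a definite value can be sketched as a filter placed at an FCR boundary. This Python is a hypothetical illustration: the thresholds, the 2.5 V decision point and the 0 V / 5 V output levels are assumptions, and `boundary_filter` is an invented name rather than a real component.

```python
# Hypothetical per-receiver logic thresholds (volts) inside a fault containment region.
RECEIVER_THRESHOLDS = [2.45, 2.50, 2.55]


def receivers_read(voltage):
    """Each downstream receiver digitizes the level with its own threshold."""
    return [1 if voltage >= t else 0 for t in RECEIVER_THRESHOLDS]


def boundary_filter(voltage, threshold=2.5, high=5.0, low=0.0):
    """Hypothetical filter at an FCR boundary: decide once, then re-drive a
    clean full-swing level so every receiver sees the same thing."""
    return high if voltage >= threshold else low


marginal = 2.5
print(receivers_read(marginal))                   # [1, 1, 0]  asymmetric
print(receivers_read(boundary_filter(marginal)))  # [1, 1, 1]  symmetric
```

The filter may well pick the wrong value, but every receiver now sees the same one, so an asymmetric fault becomes a symmetric one that ordinary voting can handle.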

Ignorance is not Bliss
• Can invalidate the failure model
• Propagation of one fault can be disastrous
  • No amount of redundancy can help
• Large economic factor
  • Possible costs of recall and redeployment

Conclusions
• Byzantine faults are real!
• Problems with ignoring them:
  • No amount of redundancy can tolerate them without message exchange
• Three categories of solutions to deal with them

Questions?

Byzantine Generals Problem (BGP) Quick Review
• Algorithm is expensive:
  • Each processor has to broadcast its values for many rounds (f + 1 rounds to tolerate f faults)
  • Chooses the majority value
  • Requires n > 3f, where f is the number of faulty processors and n is the number of processors
• With signed messages:
  • Can tolerate more failures
  • Still expensive
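The expense can be made concrete with a small calculator. This Python sketch assumes the unsigned Oral Messages algorithm OM(m) run with m = f; `min_processors`, `rounds` and `om_messages` are hypothetical helper names. The message count follows OM’s recursion: the commander sends n - 1 messages, then each of the n - 1 lieutenants runs OM(m - 1) among n - 1 processors.

```python
def min_processors(f):
    """Oral-messages agreement needs n > 3f, i.e. at least 3f + 1 processors."""
    return 3 * f + 1


def rounds(f):
    """OM(f) uses f + 1 rounds of message exchange."""
    return f + 1


def om_messages(n, m):
    """Total messages sent by OM(m) among n processors (commander included)."""
    if m == 0:
        return n - 1  # commander sends directly to each lieutenant
    # Commander's messages, plus one OM(m - 1) run per lieutenant.
    return (n - 1) + (n - 1) * om_messages(n - 1, m - 1)


for f in range(1, 4):
    n = min_processors(f)
    print(f"f={f}: n={n}, rounds={rounds(f)}, messages={om_messages(n, f)}")
# f=1: n=4, rounds=2, messages=9
# f=2: n=7, rounds=3, messages=156
# f=3: n=10, rounds=4, messages=3609
```

Going from one to three tolerated faults multiplies the message count by roughly 400, which is why practical designs look for cheaper structures such as hierarchy and filtering.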