Crash Fault Detection in Celerating Environments

Srikanth Sastry (sastry@cs.tamu.edu) and Scott M. Pike (pike@cs.tamu.edu)

Implementing ◊P

• Implementable under (some models of) partial synchrony.
• Popular model: Unknown bounds on message delay (Δ) and relative process speeds (Φ).

Round Trip Time (RTT) = Outgoing message delay + message processing time + incoming message delay

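[Figure: the local ◊P module sends a PING and receives an ACK]

Outgoing message delay ≤ Δ
ACK generation time ≤ f(Φ)
Incoming message delay ≤ Δ

RTT ≤ Δ + f(Φ) + Δ, where Δ bounds message delay and Φ bounds relative process speeds. RTT is bounded above! This bound on RTT can be adaptively estimated.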

Measuring Time

• Two techniques:
– Action clocks: Counting the number of actions
– Real-time clocks: Independent device to measure time (e.g., hardware clocks, NTP).
• Either technique works in environments that do NOT accelerate or decelerate arbitrarily.
• But in Celerating environments, where processes can accelerate or decelerate arbitrarily, each technique fails independently.

Action Clocks in Accelerating Environments

[Figure: timeline in real time. Labels: de facto bound on Round-Trip Time (RTT); estimated bound on RTT of k action ticks; k action-clock ticks; 2k action-clock ticks; Timeout! False suspicion; new estimate on RTT is now 2k action ticks]

And so on, leading to an infinite stream of mistakes!

Faster processes → More action-clock ticks per RTT → Action clock timer continually times out

Local Adaptive Estimation of RTT

• Start the timer with some arbitrary (small) value.
• If the timer expires without receiving a message, suspect the process.
• If a message arrives after timer expiry, trust the process and increase the timer value.
• Eventually the timer value exceeds the bound on RTT.
• From then on, correct processes are never suspected.
• Any crashed process is permanently suspected.
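The adaptive timeout rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and function names are hypothetical, the timeout is doubled on each mistake (mirroring the k to 2k step shown in the poster's figures, although any increase would do), and each round trip is reduced to a single number expressed in the timer's own units.

```python
class AdaptiveTimeoutDetector:
    """Timeout-based suspicion of one remote process (illustrative sketch)."""

    def __init__(self, initial_timeout=1):
        self.timeout = initial_timeout  # arbitrary small starting value
        self.suspected = False
        self.mistakes = 0               # false suspicions that were retracted

    def on_timer_expired(self):
        # No message arrived within the current timeout: suspect the process.
        self.suspected = True

    def on_message(self):
        # A message from a suspected process shows the suspicion was wrong:
        # trust it again and use a larger timeout from now on.
        if self.suspected:
            self.suspected = False
            self.mistakes += 1
            self.timeout *= 2  # assumption: doubling; any increase works


def run(detector, rtts):
    """Feed a sequence of round-trip times (in timer units) to the detector."""
    for rtt in rtts:
        if rtt > detector.timeout:
            detector.on_timer_expired()
        detector.on_message()
    return detector


d = run(AdaptiveTimeoutDetector(initial_timeout=1), [5, 7, 6, 8, 7, 8])
print(d.timeout, d.mistakes, d.suspected)
```

With round-trip times bounded by 8 units, the timeout grows until it reaches the bound and the mistakes stop. A process that crashes sends no further messages, so `on_message` is never called again and it stays suspected.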

But how do processes measure time?
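The choice of clock matters, as a small simulation suggests. The sketch below assumes a constant real round-trip time and a process whose speed doubles every round (both values are invented for illustration), and compares the same adaptive timeout rule when time is counted in action ticks versus real-time ticks.

```python
def count_false_suspicions(rtt_in_timer_units, initial_timeout):
    """Apply the adaptive timeout rule; return how many rounds timed out."""
    timeout, mistakes = initial_timeout, 0
    for rtt in rtt_in_timer_units:
        if rtt > timeout:   # timer expired before the ACK arrived
            mistakes += 1
            timeout *= 2    # adapt: double the estimate (k -> 2k)
    return mistakes


REAL_RTT = 10                        # assumed: real-time units per round trip
speeds = [2 ** i for i in range(8)]  # assumed: actions per real-time unit, doubling each round

# The same round trips, measured by an action clock and by a real-time clock.
rtt_in_action_ticks = [REAL_RTT * s for s in speeds]
rtt_in_real_ticks = [REAL_RTT for _ in speeds]

action_mistakes = count_false_suspicions(rtt_in_action_ticks, initial_timeout=10)
real_mistakes = count_false_suspicions(rtt_in_real_ticks, initial_timeout=10)
print(action_mistakes, real_mistakes)
```

Under sustained acceleration the action-clock estimate never catches up, so every round after the first produces a false suspicion, while the real-time estimate is unaffected. Swapping the roles (a constant tick count per round trip but ever-growing real-time duration) would show the symmetric failure of real-time clocks under deceleration.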

Distributed Systems

A collection of autonomous computers (processes) connected through a communication network.

[Figure: processes in a network, some marked "Crash!"]

• But processes can crash!
• Maintain correctness despite crashes
• Fault tolerance through crash detection
• Crash detection determined by synchronism in the system

Crash Detection and System Models

[Figure: system models, labeled "Greater Fidelity to Real World Systems"]
• Synchrony: Restrictive model. Crash detection possible.
• Partial Synchrony: Crash detection possible.
• Asynchrony: Permissive model. Crash detection impossible.

Failure Detectors

• Failure detectors: Distributed system service to detect process crashes.
• Failure detectors provide (potentially) incorrect information.
• Still powerful enough to solve important problems.
• E.g., distributed consensus, leader election, wait-free scheduling, contention management.
• Failure detector implementations often require partial synchrony.

Eventually Perfect Failure Detector

• One well-known failure detector is ◊P, the eventually perfect failure detector.

[Figure: ◊P outputs ("Live" / "Crashed") over time for Fault Pattern 1 and Fault Pattern 2]
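As a rough illustration of what "eventually perfect" means, the sketch below scans a finite record of a detector's outputs and reports the point after which the suspect list exactly matches the set of crashed processes. The function name and the trace are hypothetical, and a finite trace can only approximate a property that is defined over infinite runs.

```python
def stabilization_index(suspect_history, crashed):
    """Return the first index from which every output equals `crashed`,
    or None if the trace does not end in such a stable suffix."""
    idx = None
    for i, suspects in enumerate(suspect_history):
        if set(suspects) == set(crashed):
            if idx is None:
                idx = i      # a candidate stable suffix starts here
        else:
            idx = None       # a mistake: any earlier candidate is void
    return idx


# Process r has crashed; q is live but is wrongly suspected early on.
history = [{"q"}, set(), {"q", "r"}, {"r"}, {"r"}, {"r"}]
stable_from = stabilization_index(history, crashed={"r"})
print(stable_from)
```

Before the reported index the detector may suspect live processes or miss crashed ones; from that index on, in this trace, it suspects exactly the crashed process.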

Real-time Clocks in Decelerating Environments

[Figure: timeline in real time with Msg Send and Msg Recv events. Labels: estimate on round-trip time is k real-time ticks; new estimate on round-trip time is now 2k real-time ticks…]

And so on, leading to an infinite stream of mistakes!

Slower processes → Longer duration to generate and process messages → Unbounded RTT (in real time)

Solving the Celeration Problem

• Bi-chronal timer
– A vectored composition of an action timer and a real-time timer.
– Measures time in terms of actions as well as real time.
– All processes use separate local bi-chronal timers.
– Timer expires only when both the action timer and the real-time timer expire.
• The action timer insulates ◊P from deceleration.
• The real-time timer insulates ◊P from acceleration.

Bi-Chronal Timers in Non-Celerating Environments

• Bi-chronal clocks insulate ◊P from transient network behavior.
• Hardware upgrades often accelerate process speeds
– Action clocks precipitate ◊P mistakes during acceleration
– Bi-chronal clocks are immune to acceleration
• Multiple process crashes (in a server farm), DoS attacks, and such can decelerate processes to a crawl
– Real-time clocks precipitate ◊P mistakes during deceleration
– Bi-chronal clocks are immune to deceleration
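A bi-chronal timer can be sketched as a pair of counters that must both run out. The class below is a minimal illustration, not the authors' implementation: the name `BiChronalTimer`, its methods, and the injectable `now` function are all assumptions, and the caller is expected to invoke `tick()` once per local action.

```python
import time


class BiChronalTimer:
    """Expires only when BOTH the action budget and the real-time budget are spent."""

    def __init__(self, action_timeout, realtime_timeout, now=time.monotonic):
        self.action_timeout = action_timeout      # in local actions
        self.realtime_timeout = realtime_timeout  # in seconds
        self._now = now                           # injectable clock, useful for testing
        self.reset()

    def reset(self):
        self._actions = 0
        self._start = self._now()

    def tick(self):
        """Record one local action (step) of the process."""
        self._actions += 1

    def expired(self):
        action_expired = self._actions >= self.action_timeout
        realtime_expired = self._now() - self._start >= self.realtime_timeout
        return action_expired and realtime_expired


fake_time = [0.0]  # hypothetical controllable clock for the demonstration
timer = BiChronalTimer(action_timeout=5, realtime_timeout=1.0, now=lambda: fake_time[0])

# Acceleration: many actions in very little real time.
for _ in range(100):
    timer.tick()
fake_time[0] = 0.1
accel = timer.expired()

# Deceleration: a long real-time interval but only one action.
timer.reset()
timer.tick()
fake_time[0] += 50.0
decel = timer.expired()

# Only after the action budget is also spent does the timer expire.
for _ in range(4):
    timer.tick()
both = timer.expired()
print(accel, decel, both)
```

In the demonstration neither a burst of actions nor a long stall expires the timer on its own, which is the behavior the two insulation bullets describe.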

Conclusion

• Many existing ◊P implementations are subtly broken
• Bi-chronal clocks provide a simple solution
• Additionally, they insulate systems from transient behavior
• Future work:
– Properties and behavior of Bi-chronal clocks
– Use of Bi-chronal clocks in other applications
– Other approaches to dealing with Celeration

• Asynchrony: Unbounded message delay and process speeds

• Synchrony: Known bounds on message delay and process speeds

• Partial Synchrony: Between synchrony and asynchrony
