Crash Fault Detection in Celerating Environments
Srikanth Sastry (sastry@cs.tamu.edu), Scott M. Pike (pike@cs.tamu.edu)
Implementing ◊P
• Implementable under (some models of) partial synchrony.
• Popular model: unknown bounds on message delay and relative process speeds.
Round Trip Time (RTT) = Outgoing message delay + message processing time + incoming message delay
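[Figure: the local ◊P module sends a PING and receives an ACK. Outgoing message delay ≤ the bound on message delay; ACK generation time ≤ f(the bound on relative process speeds); incoming message delay ≤ the bound on message delay.]
RTT ≤ (bound on message delay) + f(bound on relative process speeds) + (bound on message delay), so RTT is bounded above!
This bound on RTT can be adaptively estimated.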
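The bound above can be written compactly once the two unknown bounds are named. The symbols below are assumptions made for illustration, since the slide's own symbols did not survive: Δ stands for the unknown bound on message delay and Φ for the unknown bound on relative process speeds; f is the function from the slide that bounds ACK generation time.

```latex
\mathrm{RTT}
  \;\le\; \underbrace{\Delta}_{\text{outgoing delay}}
  + \underbrace{f(\Phi)}_{\text{ACK generation}}
  + \underbrace{\Delta}_{\text{incoming delay}}
  \;=\; 2\Delta + f(\Phi)
```

Because Δ and Φ are finite (though unknown), the right-hand side is a finite constant, which is what makes adaptive estimation possible.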
Action Clocks in Accelerating Environments
[Figure: process speed plotted against real time. The de facto bound on round-trip time (RTT) is k action-clock ticks, and the estimated bound on RTT is k action ticks. As the process speeds up, a round trip spans 2k action-clock ticks: timeout, false suspicion; the new estimate on RTT is now 2k action ticks. A later round trip spans 4k action-clock ticks: timeout, false suspicion. And so on, leading to an infinite stream of mistakes!]
Faster processes → more action-clock ticks per RTT → action-clock timer continually times out
Measuring Time
• Two techniques:
– Action clocks: counting the number of actions.
– Real-time clocks: an independent device to measure time (e.g., hardware clocks, NTP).
• Either technique works in environments that do NOT accelerate or decelerate arbitrarily.
• But in Celerating environments, where processes can accelerate or decelerate arbitrarily, each technique fails independently.
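The action-clock failure can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function name `count_false_suspicions`, the constant `K`, the doubling increase policy, and the "speed doubles every round" schedule are all hypothetical choices made for illustration, not part of the poster.

```python
def count_false_suspicions(rtts_in_ticks, timeout):
    """Count timeouts when each false suspicion doubles the timeout estimate.

    rtts_in_ticks: round-trip lengths as measured by the local clock.
    timeout: initial estimate of the RTT bound, in the same ticks.
    """
    mistakes = 0
    for rtt in rtts_in_ticks:
        if rtt > timeout:      # timer expired before the ACK arrived
            mistakes += 1      # false suspicion of a live process
            timeout *= 2       # assumed increase policy: double the estimate
    return mistakes


K = 8          # hypothetical RTT in action ticks at the initial speed
ROUNDS = 20

# Steady speed: every round trip spans K action ticks.
steady = count_false_suspicions([K] * ROUNDS, timeout=K)

# Accelerating process: the speed doubles every round, so the same
# real-time round trip spans twice as many action ticks each time.
accelerating = count_false_suspicions(
    [K * 2 ** (i + 1) for i in range(ROUNDS)], timeout=K
)

print(steady, accelerating)
```

In the steady case the estimate is never exceeded, while in the accelerating case every round ends in a false suspicion, however many rounds are run. Feeding the same function round-trip lengths measured in real-time ticks that grow as a process slows down models the corresponding failure of real-time clocks under deceleration.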
Local Adaptive Estimation of RTT
• Start timer with some arbitrary (small) value.
• If timer expires without receiving a message, suspect the process.
• If a message arrives after timer expiry, trust the process and increase the timer value.
• Eventually timer value exceeds the bound on RTT.
• After which correct processes will never be suspected.
• Any crashed process is permanently suspected.
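The adaptive rule above can be sketched for a single monitored process as follows. The class name `AdaptiveTimeoutDetector`, the helper `run`, and the doubling policy are assumptions for illustration; the poster only says that the timer value is increased.

```python
class AdaptiveTimeoutDetector:
    """Sketch of the adaptive-timeout rule for monitoring one remote process."""

    def __init__(self, initial_timeout=1):
        self.timeout = initial_timeout  # arbitrary small starting value
        self.suspected = False
        self.mistakes = 0               # number of false suspicions so far

    def on_timer_expired(self):
        # No message arrived within the current timeout: suspect the process.
        self.suspected = True

    def on_message(self):
        # A message from a suspected process shows the suspicion was wrong:
        # trust the process again and increase the timeout.
        if self.suspected:
            self.suspected = False
            self.mistakes += 1
            self.timeout *= 2  # assumed policy; any unbounded increase works


def run(detector, rtts):
    """Feed a sequence of observed round-trip times (in timer ticks)."""
    for rtt in rtts:
        if rtt > detector.timeout:
            detector.on_timer_expired()
        detector.on_message()
    return detector


# A correct process whose RTT is bounded by 5 ticks.
live = run(AdaptiveTimeoutDetector(initial_timeout=1), [5] * 10)

# A crashed process never sends a message, so it stays suspected.
crashed = AdaptiveTimeoutDetector(initial_timeout=1)
crashed.on_timer_expired()
```

With a bounded RTT the detector makes finitely many mistakes and then trusts the live process permanently; the crashed process is never trusted again because `on_message` is never called for it.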
But how do processes measure time?
Distributed Systems
A collection of autonomous computers (processes) connected through a communication network
• But processes can crash!
• Maintain correctness despite crashes
• Fault tolerance through crash detection
• Crash detection determined by synchronism in the system
Failure Detectors
• Failure detectors: Distributed system service to detect process crashes.
• Failure detectors provide (potentially) incorrect information.
• Still powerful enough to solve important problems.
• E.g., distributed consensus, leader election, wait-free scheduling, contention management.
• Failure detector implementations often require partial synchrony.
• One well-known failure detector is ◊P, the eventually perfect failure detector.
Eventually Perfect Failure Detector
[Figure: ◊P outputs (Live or Crashed) over time for two example fault patterns, Fault Pattern 1 and Fault Pattern 2.]
Crash Detection and System Models
• Synchrony: restrictive model; crash detection possible.
• Partial synchrony: crash detection possible; greater fidelity to real-world systems.
• Asynchrony: permissive model; crash detection impossible.
Real-time Clocks in Decelerating Environments
[Figure: process step time (process speed) plotted against real time, with Msg Send and Msg Recv events marked. The estimate on round-trip time is k real-time ticks. As the process slows down, a round trip outlasts the estimate: timeout, false suspicion; the new estimate on round-trip time is now 2k real-time ticks. A later round trip outlasts the new estimate as well: timeout, false suspicion. And so on, leading to an infinite stream of mistakes!]
Slower processes → longer duration to generate and process messages → unbounded RTT (in real time)
Solving the Celeration Problem
• Bi-chronal timer
– A vectored composition of an action timer and a real-time timer.
– Measures time in terms of actions as well as real time.
– All processes use separate local bi-chronal timers.
– The timer expires only when both the action timer and the real-time timer expire.
• The action timer insulates ◊P from deceleration.
• The real-time timer insulates ◊P from acceleration.
• Bi-chronal clocks insulate ◊P from transient network behavior.
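A bi-chronal timer can be sketched as a pair of timeouts that must both be exceeded before a timeout is declared. Everything named here is hypothetical: the class `BiChronalTimer`, the helper `count_mistakes`, the constant `K`, the policy of doubling both components, and the modeling assumption that a round trip stays bounded in real time under acceleration and bounded in action ticks under deceleration.

```python
class BiChronalTimer:
    """Times out only when BOTH the action and the real-time budget are exceeded."""

    def __init__(self, action_timeout=1, real_timeout=1):
        self.action_timeout = action_timeout  # measured in local actions
        self.real_timeout = real_timeout      # measured in real-time ticks

    def times_out(self, rtt_actions, rtt_real):
        # The round trip outlasts the timer only if it exceeds both components.
        return (rtt_actions > self.action_timeout
                and rtt_real > self.real_timeout)

    def increase(self):
        # Assumed policy: double both components after a false suspicion.
        self.action_timeout *= 2
        self.real_timeout *= 2


def count_mistakes(timer, round_trips):
    """round_trips: (length in action ticks, length in real-time ticks) pairs."""
    mistakes = 0
    for rtt_actions, rtt_real in round_trips:
        if timer.times_out(rtt_actions, rtt_real):
            mistakes += 1
            timer.increase()
    return mistakes


K = 8
ROUNDS = 20

# Acceleration: each round trip takes K real-time ticks but spans ever
# more action ticks.
accelerating = [(K * 2 ** i, K) for i in range(ROUNDS)]

# Deceleration: each round trip spans K action ticks but takes ever more
# real time.
decelerating = [(K, K * 2 ** i) for i in range(ROUNDS)]

mistakes_accel = count_mistakes(BiChronalTimer(), accelerating)
mistakes_decel = count_mistakes(BiChronalTimer(), decelerating)
mistakes_both = count_mistakes(BiChronalTimer(), accelerating + decelerating)
```

Under these assumptions the number of mistakes stays finite in all three runs, because whichever component is not affected by the current celeration eventually covers the round trip and blocks the timeout.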
Bi-Chronal Timers in Non-Celerating Environments
• Hardware upgrades often accelerate process speeds
– Action clocks precipitate ◊P mistakes during acceleration
– Bi-chronal clocks are immune to acceleration
• Multiple process crashes (in a server farm), DoS attacks, and such can decelerate processes to a crawl
– Real-time clocks precipitate ◊P mistakes during deceleration
– Bi-chronal clocks are immune to deceleration
Conclusion
• Many existing ◊P implementations are subtly broken
• Bi-chronal clocks provide a simple solution
• Additionally, they insulate systems from transient behavior
• Future work:
– Properties and behavior of Bi-chronal clocks
– Use of Bi-chronal clocks in other applications
– Other approaches to dealing with Celeration
• Asynchrony: Unbounded message delay and process speeds
• Synchrony: Known bounds on message delay and process speeds
• Partial Synchrony: Between synchrony and asynchrony