Experiences with Formal Specifications of Fault-Tolerant File Systems

Roxana Geambasu (University of Washington)

Andrew Birrell (Microsoft Research)

John MacCormick (Dickinson College)

Fault-Tolerant File Systems (FTFSs)

FTFSs are crucial components in today’s datacenters

They underlie most of what we do on the Web

Dependability & correctness of FTFSs are paramount

[Figure: FTFSs such as the Google File System (GFS), Niobe, and Dynamo underlie Web services, including Google Earth, Google Analytics, and Amazon services]

FTFSs Are Extremely Complex

Contain sophisticated protocols for:
    replica consistency
    recovery (replica addition to compensate for failures)
    reconfiguration (replica removal due to failure)
    load balancing, etc.

Hence, FTFS protocols and implementations are hard to get right

Formal Methods (FM)

Formal methods have been used extensively to increase trust in complex systems
    Formal specification languages are unambiguous
    Model checking and formal proofs are reliable

However, FTFS designers still rely solely on prose and intuitive reasoning
    Prose may be ambiguous or inaccurate
    Intuitive reasoning may be faulty

FTFS Design and Analysis Challenges

Without formal methods, it is hard to:

Understand FTFS behavior and semantics
    Intuitive reasoning is hard and error-prone

Explore alternative designs
    Alternative designs may affect semantics in complex ways

Compare various FTFSs
    Prose is ambiguous and code bases are huge (tens of thousands of lines of code)

Goal: Convince FTFS Builders to Use FM

Previous studies showed how and for what purposes to use FM for many classes of systems, e.g.:
    Local/distributed FSs, processor caches, TCP congestion

Our work:
    Shows how and for what purposes to use FM for another specific class of important systems: fault-tolerant file systems
    Identifies convenient ways in which FM help in understanding, designing & comparing FTFSs

Our Experience

We wrote TLA+ specifications for three protocols:
    Chain replication (Cornell University)
    Niobe (Microsoft)
    GFS (Google)

Our experience shows that FM help solve FTFS challenges:

1. Comparing system mechanisms & tradeoffs

2. Understanding and proving semantics

3. Exploring alternative designs


Outline

Specification effort

Experiences with formal specifications for FTFS:

1. Comparing system mechanisms

2. Understanding FTFS consistency

3. Exploring alternative designs

Conclusions

Specification Effort

Question: How hard is it to build specifications?

Answer: Moderately precise specifications are reasonably easy to produce

                 Chain      Niobe     GFS
Time to write    3 weeks    1 week    2 weeks


1. Comparing System Mechanisms

Case study: GFS vs. Niobe

From prose, they seemed very different systems
    GFS: trades some consistency for throughput
    Niobe: designed for strong consistency

Our TLA+ specifications highlight significant mechanism overlap and also key differences


Capturing Similarities & Differences

[Figure: Venn diagram of the TLA+ specifications. Common part (single-master, primary-secondary replication): 291 lines; Niobe-specific part: 287 lines; GFS-specific part: 189 lines]

More than half of the TLA+ code-base is common

Specifications are small due to TLA+ expressiveness
    Compare their total sizes to the tens of thousands of LOC of the systems’ implementations

Differences Stand Out Clearly in TLA+

Example: Write completion in GFS and Niobe

[Figure: message diagrams of a write (w) propagating among replicas 1-4 and the returning ACKs; one variant is labeled “Group reconfiguration”]

Understanding Tradeoffs

GFS: smaller write latency, but writes may leave the replica group inconsistent

Niobe: a write never leaves the replica group in an inconsistent state

Tradeoff: write latency vs. replica-group consistency

Lesson: Formalism Helps in Comparison

Formal specifications distill key differences and similarities between systems

Understanding the key differences enables us to understand tradeoffs


2. Understanding FTFS Consistency

Hard to prove consistency models for FTFSs
    For weakly consistent systems, it can be even harder

Solution: use refinement mapping

1. Reduce system to a really simple model (SimpleStore)

2. Prove the correctness of the reduction

3. Reason about the SimpleStore

For convenience, we use model-checking instead of full manual proofs at Step 2

[Figure: System reduces to SimpleStore; each is annotated with its consistency model]
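
The reduce-then-check workflow can be illustrated with a small Python sketch (Python rather than TLA+, so it can be run directly). It is not taken from the authors' specifications: the two-replica system, the value domain `VALUES`, and the function names are all hypothetical. The checker explores every reachable system state and confirms that each system step maps, under the refinement mapping, to either a SimpleStore step or a stuttering step.

```python
from collections import deque

VALUES = (0, 1, 2)  # hypothetical finite value domain


def system_steps(state):
    """Toy two-replica chain: writes enter at the head, reads are served by the tail."""
    head, tail = state
    if head == tail:
        # No write is pending, so the head may accept a new one.
        for v in VALUES:
            yield (v, tail)
    else:
        # Forward the pending write from the head to the tail.
        yield (head, head)


def simple_store_steps(value):
    """SimpleStore: a single register whose only action is commit(v)."""
    for v in VALUES:
        yield v


def refinement_map(state):
    """Client-visible value: whatever the tail currently stores."""
    _head, tail = state
    return tail


def check_refinement(init):
    """Check that each reachable system step maps to a SimpleStore step or a stutter."""
    seen, frontier = {init}, deque([init])
    while frontier:
        state = frontier.popleft()
        for nxt in system_steps(state):
            a, b = refinement_map(state), refinement_map(nxt)
            if a != b and b not in simple_store_steps(a):
                return False, (state, nxt)   # counterexample step
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return True, len(seen)


ok, info = check_refinement((0, 0))
print(ok, info)
```

On success `info` is the number of reachable states explored; on failure it is the offending step, which plays the role of a model checker's counterexample.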

SimpleStores for the Three FTFSs

SimpleStores capture only client-visible behaviors and abstract out all protocol mechanisms

SimpleStores are easy to reason about

[Figure: Chain reduces to Chain_SS, Niobe reduces to Niobe_SS, and GFS reduces to GFS_SS]

Chain’s Consistency Semantics

Using convenient methods, we gained reliable insight into Chain’s consistency model

[Figure: Chain reduces to Chain_SS; Chain_SS is linearizable, so Chain is linearizable]

Proof is straightforward (half a page)

Niobe’s Consistency Semantics

[Figure: Chain reduces to Chain_SS and Niobe reduces to Niobe_SS; each SimpleStore is linearizable, so each system is linearizable]

Similar experience as with Chain

Thus, formal methods help in verifying standard consistency models for strongly-consistent FTFSs

GFS’ Consistency Semantics

Formal methods proved helpful in several ways

An interesting conclusion (details in the paper):
    Using refinement mappings, we were able to show that, under a small set of assumptions, GFS has regular-register semantics

[Figure: GFS with assumptions reduces to GFS_SS; both are labeled “regular register”, a well-defined intermediate-level consistency model]

Lesson: Formalism Helps Understand Semantics

Refinement mappings help in understanding & reliably verifying consistency models of FTFS

They are useful for both strongly consistent and weakly consistent FTFSs


3. Exploring Alternative Designs

Exploring alternative designs is much easier using our framework (TLA+ specs, SimpleStores, reductions)

[Figure: System model reduces to SimpleStore]

Case-Study: Changing Niobe’s Design

Currently, Niobe’s clients read from primary only

Reading from any replica may improve throughput

Design question:

What happens to Niobe if it adopts a read-any policy?

[Figure: Niobe_SS is linearizable and GFS_SS is a regular register; Niobe with read-any is shown with regular-register semantics]
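
Why a read-any policy can weaken linearizability to regular-register behavior can be seen in a toy Python simulation. This is not Niobe's protocol: the `ReplicaGroup` class, the replica names, and the propagation order are hypothetical simplifications. While a write has reached only one replica, two back-to-back reads may see the new value and then the old one. A regular register permits this "new-old inversion" because both reads overlap the write, but a linearizable store does not.

```python
class ReplicaGroup:
    """Toy primary-secondary group; names and structure are illustrative only."""

    def __init__(self, initial):
        self.replicas = {"primary": initial, "secondary": initial}

    def apply(self, replica, value):
        """Apply a write at one replica (one step of write propagation)."""
        self.replicas[replica] = value

    def read(self, replica="primary"):
        return self.replicas[replica]


def run(read_targets):
    """Issue back-to-back reads while a write of 'new' is only half propagated."""
    group = ReplicaGroup("old")
    group.apply("primary", "new")            # write in progress: primary only
    observed = [group.read(t) for t in read_targets]
    group.apply("secondary", "new")          # write completes
    return observed


def new_old_inversion(observed):
    """True if some read returns 'old' after an earlier read returned 'new'."""
    return any(a == "new" and b == "old"
               for i, a in enumerate(observed)
               for b in observed[i + 1:])


primary_only = run(["primary", "primary"])
read_any = run(["primary", "secondary"])
print(primary_only, new_old_inversion(primary_only))
print(read_any, new_old_inversion(read_any))
```

In this sketch only the read-any run exhibits the inversion, which is the kind of client-visible difference a SimpleStore is meant to expose.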

Conclusions

FTFSs are extremely important in today’s Web

We showed how formal methods can help improve our understanding and trust in FTFSs

Lessons from our experience with three FTFSs:
    Writing formal specifications is relatively easy
    Formal methods enable:
        Insightful comparison of mechanisms & tradeoffs
        Reliable verification of consistency properties
        Convenient investigation of alternative designs

Appendix


Related Work

FM are extensively used to reason about software [Bickford et al., 96] and hardware [Shimizu et al., 02]
    However, FTFS builders have not adopted them yet
    By sharing our experience, we hope to convince FTFS builders of the utility of specifying their systems formally

Using FM to improve understanding and trust in systems:
    Previous works apply FM to various classes of systems: [Chkliaev et al., 00], [Crow et al., 98], [Joshi et al., 03], [Houston et al., 91]
    The closest works are those looking at distributed FS (AFS, Coda): [Sivathanu et al., 05], [Wing et al., 97], [Yang et al., 04]
    We show how to apply them in the specific context of FTFS

Reducing complex systems to simple ones in order to reason about semantics has been used before [Joshi et al., 03]
    We apply this method to FTFSs

GFS Assumptions

If:

1. A write never crosses chunk boundaries
    GFS client library offers chunk-level operations

2. A write never goes to a stale replica
    Implement this assumption using a lease mechanism

Then:

[Figure: GFS with assumptions reduces to GFS_SS; both are labeled “regular register”]

Standard Consistency Models

Linearizability (atomic register semantics)
    Any client-visible history H generated by the system is equivalent to a legal sequential interleaving S
    The sequential interleaving S preserves the real-time ordering of operations from H

Serializability
    Any client-visible history H generated by the system is equivalent to a legal sequential interleaving S

Regular register semantics
    A read not concurrent with any write returns the most recently written value
    A read concurrent with some writes returns either the value of one of the in-progress writes or the most recently written value

Safe register semantics
    A read not concurrent with any write returns the most recently written value
    A read concurrent with some writes can return anything
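
The regular and safe register definitions can be turned into a small history checker. The Python sketch below is only an illustration: the `Op` record, the interval representation, and the single-writer assumption (writes never overlap each other, so "most recently written" is well defined) are assumptions made for the example, not part of the original slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    start: float    # invocation time
    end: float      # response time
    value: object   # value written, or value returned by a read


def overlaps(a, b):
    """Two operations are concurrent if their time intervals intersect."""
    return a.start <= b.end and b.start <= a.end


def allowed_by_regular(read, writes, initial=None):
    """Values a regular register may return for `read`.

    Assumes a single writer, so writes do not overlap each other.
    """
    before = [w for w in writes if w.end < read.start]
    latest = max(before, key=lambda w: w.end).value if before else initial
    concurrent = {w.value for w in writes if overlaps(w, read)}
    return {latest} | concurrent


def is_regular(read, writes, initial=None):
    return read.value in allowed_by_regular(read, writes, initial)


def is_safe(read, writes, initial=None):
    """Safe register: only reads that overlap no write are constrained."""
    if any(overlaps(w, read) for w in writes):
        return True
    return is_regular(read, writes, initial)


writes = [Op(0, 1, "a"), Op(4, 8, "b")]
print(is_regular(Op(2, 3, "a"), writes))   # read after write "a" completed
print(is_regular(Op(5, 6, "b"), writes))   # read overlapping write "b"
print(is_regular(Op(5, 6, "z"), writes))   # value that nobody wrote
print(is_safe(Op(5, 6, "z"), writes))      # safe register tolerates it
```

Checking linearizability is harder, because it requires searching for a legal sequential order of all operations rather than judging each read in isolation.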


Summary of Contributions

Identified a new important class of extremely complex systems: FTFSs

Showed three aspects of FTFS design & analysis for which FM prove especially valuable
    Mechanism comparison, semantics understanding, and design space exploration

Showed how to apply specific FMs to FTFSs
    Showed how to construct SimpleStores and what can be learned from them
    SimpleStores are reusable between systems

We believe that our study, tailored toward FTFSs, can be more relevant to FTFS designers than more general studies

Lessons from Our Experience

Building high-level specifications for FTFS is relatively easy
    It is also remarkably useful for understanding the system

The exercise of writing specifications exposes similarities in seemingly dissimilar systems (GFS, Niobe)
    Formal specifications also distill the key design differences

Specifications enable convenient verification of consistency for both strongly and weakly consistent systems
    Niobe and Chain are both linearizable
    GFS can be upgraded to regular register via a clear set of assumptions
    GFS’ design to read from any replica heavily influences its consistency

Intuition can often fail
    Niobe seemed to be reducible to Chain_SS, but actually was not

Chain SimpleStore

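[Figure: Chain_SS. Write requests (w5, w6, w7) wait in a write channel, from which a write is either committed to the SerialDB (commit(w5)) or dropped (drop(w7)). Read requests (r1, r2, r3) wait in a read channel and are served by read() against the SerialDB. Responses are returned for writes and reads.]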
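
The structure in the Chain_SS diagram can be approximated by a few lines of Python. This is a loose sketch based only on the labels in the figure (write channel, read channel, SerialDB, commit, drop, read). The class name, the method signatures, the FIFO handling of reads, and the choice that dropped writes receive no response are assumptions, not details of the authors' TLA+ specification.

```python
class ChainSimpleStore:
    """Toy SimpleStore: pending requests plus one serial database value."""

    def __init__(self, initial=None):
        self.serial_db = initial     # the single client-visible value
        self.write_channel = {}      # write id -> value, not yet applied
        self.read_channel = []       # read ids awaiting a response
        self.responses = []          # (request id, result) pairs

    def request_write(self, wid, value):
        self.write_channel[wid] = value

    def request_read(self, rid):
        self.read_channel.append(rid)

    def commit(self, wid):
        """Apply a pending write to the serial database and acknowledge it."""
        self.serial_db = self.write_channel.pop(wid)
        self.responses.append((wid, "ok"))

    def drop(self, wid):
        """Discard a pending write; it never becomes visible (assumed: no reply)."""
        self.write_channel.pop(wid)

    def read(self):
        """Answer the oldest pending read with the current database value."""
        rid = self.read_channel.pop(0)
        self.responses.append((rid, self.serial_db))


store = ChainSimpleStore(initial="v0")
store.request_write("w5", "v5")
store.request_write("w7", "v7")
store.request_read("r1")
store.commit("w5")
store.drop("w7")
store.read()
print(store.responses)
```

Because every committed write and every read acts atomically on one value, reasoning about the consistency of such a store is far simpler than reasoning about the replication protocol it abstracts.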


The Temporal Logic of Actions (TLA+)

Formalism that combines a temporal logic with a logic of actions

Especially designed for specification of distributed asynchronous systems

TLA+ specifications model the system as a state machine:
    Define system variables (state)
    Model actions that the system can take as state transitions
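
The state-machine view can be mimicked in Python (this is not TLA+ syntax). In the sketch below a state is a tuple of variables, each action is a function yielding successor states, and a tiny explicit-state explorer visits all reachable states and checks an invariant. The two-replica system, the action names, and the bound `MAX_VERSION` are hypothetical and chosen only to keep the example small.

```python
from collections import deque

MAX_VERSION = 2   # hypothetical bound that keeps the state space finite

# State variables: (primary_version, secondary_version).
INIT = (0, 0)


def write_primary(state):
    """Action: the primary accepts a new write once the secondary has caught up."""
    p, s = state
    if p == s and p < MAX_VERSION:
        yield (p + 1, s)


def propagate(state):
    """Action: the secondary applies the write the primary already holds."""
    p, s = state
    if s < p:
        yield (p, s + 1)


ACTIONS = [write_primary, propagate]


def reachable(init, actions):
    """Breadth-first exploration of every state reachable from init."""
    seen, frontier = {init}, deque([init])
    while frontier:
        state = frontier.popleft()
        for action in actions:
            for nxt in action(state):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
    return seen


def invariant(state):
    """Safety property: the secondary lags the primary by at most one write."""
    p, s = state
    return 0 <= p - s <= 1


states = reachable(INIT, ACTIONS)
violations = [s for s in states if not invariant(s)]
print(len(states), "reachable states;", len(violations), "violations")
```

A model checker performs the same kind of exploration over a real specification, which is why bounding the model (here with `MAX_VERSION`) matters.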

Understanding Tradeoffs

Smaller write latency, but writes may leave group inconsistent

A write never leaves replica group in inconsistent state

[Figure: replicas 1-4 in each of the two cases, with two reads and the labels “Error” and “Old value”]
