49
The Mysteries of Dropbox John Hughes

The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

The Mysteries ofDropbox

John Hughes

Page 2: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

File Synchronizer Usage

400 million (June 2015)

240 million (Oct 2014)

250 million (Nov 2014)

Page 3: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

What do they do?

How can we test them?

Are they trustworthy?

Page 4: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

TESTING

Hand written test cases

Generated test cases

Page 5: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

QuickCheck

1999—invented by Koen Claessen and myself, for Haskell

2006—Quviq founded marketing Erlang version

Many extensions

Finding deep bugs for Ericsson, Volvo Cars, Basho, etc…

Page 6: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

Why Generate Tests?

•Much wider variety!

•More confidence!

•Less work!

Page 7: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

Example: Testing a Queue withQuickCheck

• API:• q:new(Size) – create a queue

• q:put(Q, N) – put N into the queue

• q:get(Q) – remove and return the firstelement

Page 8: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

A Generated Failing Testq:new(1) -> {ptr,...}q:put({ptr,...}, 1) -> okq:get({ptr,...}) -> 1q:put({ptr,...}, -1) -> okq:get({ptr,...}) -> -1q:put({ptr,...}, 0) -> okq:put({ptr,...}, 1) -> okq:put({ptr,...}, 0) -> okq:put({ptr,...}, -1) -> okq:get({ptr,...}) -> -1

Reason:Post-condition failed:-1 /= 0

Quitelong and boring!

Page 9: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

A Shrunk Failing Test

q:new(1) -> {ptr, ...}q:put({ptr, ...}, 0) -> okq:put({ptr, ...}, 1) -> okq:get({ptr, ...}) -> 1

Reason:Post-condition failed:1 /= 0

We made a queue of size 1…

…and put TWO things into it!

Page 10: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

Bug in the code

We should have got an exception

Bug in the test

We shouldn’tgenerate nonsensetests that abuse the API

Page 11: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

QuickCheck

API under test

A minimal failingexample

Page 12: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo
Page 13: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

How can we tell if a test passed?

State transitions

Postconditions

Page 14: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

Modelling a queue

new(1)

put(…,0)

put(…,1)

get(…)

Model

[]

[0]

[0,1]

[1]

Postcondition

result/=NULL

result==0

Page 15: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

But what about Dropbox?

VM

VM

VM

Laptop

Dropbox service

Read and write files

Page 16: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

Goals

• A simple model of what a file synchronizer does, that the user can understand without reference to implementation details

• Ideally, a model that works for many different synchronizers

Page 17: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

What’s the model?

• Contents of each file?

• Contents of each file on each node?

write(”a”)

read()missing

write(”a”)

read()”a”

Backgroundaction

0.5s

1.0s

Page 18: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

A new approach

Observatione.g. read() ”a”

State transitione.g. write(”b”)

Initial state

All possible background actions

Page 19: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

No matchingobservations means the test fails

Explanation

Page 20: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

What background operations?

write(”a”)

read()”a”

write(”b”)

write(”c”)

read()”b”

Page 21: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

What background operations?

write(”a”)

read()”a”

write(”b”)

write(”c”)

read()”b”

Server

Page 22: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

Conflicts

write(”a”)

read()”a”

write(”b”)

”b” will appear in a conflict file

Page 23: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

Conflicts

write(”a”)

read()”b”

write(”b”)

”a” may or may not appear in a conflict file

”a”

Observe the value overwritten

will not

Page 24: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

Conflicts

write(”a”)

read()”b”

write(”b”)

”a” may or may not appear in a conflict file

missing

will

Page 25: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

Observing conflict files

• Conflict files do not appear immediately!Wait for a stable state to check for them

stabilize() (V,C)

The final value in the file(that all nodes converge to)

The set of values in conflict files

Page 26: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

stabilize() (V,C)How long should we wait?

• Until all nodes agree on file contents (V) and conflict files (C)

• Until the Dropbox daemon on each node claims to be idle

• Until we see what we expect!

BUT NOT TOO

LONG!!!

Page 27: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

Our model

• Global value• Global conflict set

• For each node:• Local value• ”Fresh” or ”Stale”• ”Clean” or ”Dirty”

On the ”server”

”Stale” meansneeds to downloadfrom the server

”Dirty” meansneeds to upload to the server

Page 28: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

read() V

Observes:Local value is V

State transition:None

Page 29: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

write(Vnew) Vold

Observes:Local value is Vold

State transition:Local value becomes VnewThis node becomes dirty

Page 30: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

stabilize() (V,C)

Observes:Global value is VConflict set is CAll nodes are fresh and clean

State transition:None

Page 31: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

download()

Observes:This node is stale and clean

State transition:Local value becomes global valueThis node becomes fresh

Page 32: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

upload()

Observes:This node is dirty

State transition:This node becomes cleanif this node is freshthen Global value becomes local value

All other nodes become staleelse Local value is added to conflicts

First upload wins

Page 33: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

Does the model match reality?

write(”a”) missingwrite(”a”) missing

stabilize() (”a”,{})

in conflict

Where is it?

Page 34: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

A value does not conflict with itself

Page 35: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

upload()Observes:

This node is dirty

State transition:This node becomes cleanif local value /= global value then

if this node is freshthen Global value becomes local value

All other nodes become staleelse Local value is added to conflicts

Page 36: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

Another inconsistency

Wr(”a”)●

Rd()”a””a”

Wr(●)”a”

Rd()●●

Wr(”b”)”a”

in conflict

Rd()”b””b”

But ”b” should be in a conflict file!

Page 37: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

’missing’ loses everyconflict

Page 38: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

upload()Observes:

This node is dirty

State transition:This node becomes cleanif local value /= global value then

if this node is fresh or global value is missingthen Global value becomes local value

All other nodes become staleelse Local value is added to conflicts

Page 39: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

So far…

• We’re fitting the model to the implementation

Why?• Because Dropbox have thought harder about

synchronization than we have!

For each inconsistency:• Ask ”Is this the intended behaviour?”

Page 40: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

Surprises

Page 41: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

Dropbox can delete a newlycreated file

Wr(”a”)●

Wr(●)”a”

Wr(”b”)”a”

Wr(”c”)●

Rd()●

It’s gone!!

We’d expect”b” or ”c”!!

Page 42: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

Dropbox can recreate deleted files

Wr(”a”)●

Wr(●)”a”

Rd()”a”

What??

stabilize() (”a”,{})

Page 43: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

Dropbox can lose data completely

Wr(”a”)●Wr(”b”)”a”Rd()”b”

stabilize() (”b”,{}) (”a”,{})

Dirty, butbehaves as clean

Page 44: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

Dropbox can lose data completely

Wr(”a”)●Wr(”b”)”a”Rd()”b”

stabilize() (”c”,{})

Wr(”c”)”a”Lostaltogether!!

Page 45: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

What did we do?

• Tested a non-deterministic system by searching for explanations using a model with hidden actions

• Used QuickCheck’s minimal failing tests to refinethe model, until it matched the intended behaviour

• Now minimal failing tests reveal unintended system behaviour

Page 46: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

What do Dropbox say?

• The synchronization team has reproduced the buggy behaviours

• They’re rare failures which occur under very special circumstances

• They’re developing fixes

Page 47: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

Synchronization is subtle!

• There’s much more to do…

• Add directories!• Directories and files with the same names• Conflicts between deleting a directory and writing a file

in it• …

• More file synchronizers!

Page 48: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

Coming out in April,

IEEE International Conference on Software Testing,

Chicago

Page 49: The Mysteries of Dropbox - Lambda Days...The Mysteries of Dropbox John Hughes. File Synchronizer Usage 400 million (June 2015) 240 million (Oct 2014) 250 million (Nov 2014) Whatdo

www.quviq.com